Airscraper

A simple scraper to programmatically download a CSV from any Airtable shared view. Use it if:

  • You want to download a shared view periodically
  • You don't mind the shared view being accessible essentially without authorization

Requirements

Because it's a simple scraper, basically only BeautifulSoup (plus Pandas) is needed:

  • BeautifulSoup4
  • Pandas
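
For reference, a minimal requirements.txt covering these dependencies might look like the sketch below (whether to pin versions is up to you):

# requirements.txt: sketch based on the dependencies listed above
beautifulsoup4
pandas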

Installation

Using pip (Recommended)

pip install airscraper

Build From Source

  • Install build dependencies:
pip install --upgrade pip setuptools wheel
pip install tqdm
pip install --user --upgrade twine
  • Build the Package
    • python setup.py bdist_wheel
  • Install the built Package
    • pip install --upgrade dist/airscraper-0.1-py3-none-any.whl
  • Use it directly, without prefixing the command with python
    • airscraper [url]

Direct Execution (for Testing)

  • Clone this project
  • Install the requirements
    • pip install -r requirements.txt
  • Run the code
    • python airscraper/airscraper.py [url]

Usage

Create a shared view link and use that link to download the shared view as a CSV. Every [url] mentioned in the examples refers to the shared view link you get from this step.
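
For illustration, Airtable shared view links typically look something like the line below (the shrXXXXXXXXXXXXXX identifier is just a placeholder, not a real view):

https://airtable.com/shrXXXXXXXXXXXXXX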

As CLI

# Print Result to Terminal
python airscraper/airscraper.py [url]

# Pipe the result to csv file
python airscraper/airscraper.py [url] > [filename].csv

As Python Package

from airscraper import AirScraper

client = AirScraper([url])
data = client.get_table().text

# print the result
print(data)

# save as file
with open('data.csv','w') as f:
  f.write(data)

# use it with pandas
from io import StringIO
import pandas as pd

df = pd.read_csv(StringIO(data), sep=',')
df.head()

Help

usage: airscraper [-h] [-l LOCALE] [-tz TIMEZONE] view_url

Download CSV from Airtable Shared View Link, You can pass the result to file using
'> name.csv'

positional arguments:
  view_url              url generated from sharing view using link in airtable

optional arguments:
  -h, --help            show this help message and exit
  -l LOCALE, --locale LOCALE
                        Your locale, default to 'en'
  -tz TIMEZONE, --timezone TIMEZONE
                        Your timezone, use URL encoded string, default to
                        'Asia/Jakarta'
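
For example, based on the options above, downloading a view with an explicit locale and URL-encoded timezone could look like this ([url] is still your shared view link, and the values shown are just the defaults):

airscraper -l en -tz Asia%2FJakarta [url] > data.csv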

What's next

Currently I have several things in mind:

  • Making this an installable package
  • Adding the ability to use it on FaaS platforms (most use cases I could think of are related to this)
  • Creating a proper package that can be imported (so I could use it in my ETL scripts)
  • Filling in LICENSE and setup.py (to be honest I have no idea yet what to put into them)
    • It turns out there are a lot of resources out there if you know what to look for :)

Contributing

If you have a similar problem or any ideas to improve this package, please let me know in the issues or just hit me up on Twitter @BanditelolRP

Development

If you want to develop it yourself, here's my overall workflow:

1. Create a virtual environment

I usually use venv on Python 3.8 to create a new virtual environment

python -m venv venv
# and activate the environment
source venv/bin/activate

2. Install the requirements

Install the necessary requirements and install the package in editable mode for development

pip install wheel pytest -q
pip install -r requirements.txt
pip install -e .

3. Play around with the code

You can browse the notebook for an explanation of how it works and some example use cases, and I really appreciate help with documentation and testing. Have fun!
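
If you want to run the tests as well (pytest is installed in step 2), invoking it from the project root should pick them up:

pytest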
