🚆This web scraper builds a dataset for São Paulo subway operation status
.gitignore
Procfile
README.md
email_debug.py
encode_data.py
requirements.txt
scraper.py

README.md

What it is

This project is essentially a single Python script that writes the status of the São Paulo subway lines to a Google Sheets worksheet.

The sheets can be viewed (and freely used for any data science project) here.

How it works

Every 5 minutes the script fetches the official subway company page using the 'requests' module and extracts the operation status shown in the right-hand column of the page using the 'beautiful soup' module. The last-update time shown on the page is also stored and later associated with each subway line.
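The fetch-and-parse step could look roughly like the sketch below. It is not the code in scraper.py: the URL, the CSS selectors and the function names are all hypothetical placeholders, since the real page layout is not described here.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical URL: replace with the real status page address.
STATUS_URL = "https://example.com/subway-status"


def fetch_page(url=STATUS_URL, timeout=30):
    """Download the status page and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_status(html):
    """Return (last_update, {line_name: status}) extracted from the HTML.

    The selectors below are assumptions made for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")

    update_tag = soup.select_one(".last-update")
    last_update = update_tag.get_text(strip=True) if update_tag is not None else ""

    statuses = {}
    for item in soup.select(".line-status li"):
        name = item.select_one(".name")
        status = item.select_one(".status")
        if name is not None and status is not None:
            statuses[name.get_text(strip=True)] = status.get_text(strip=True)
    return last_update, statuses
```

Calling `parse_status(fetch_page())` would then give the last-update time plus one status per line, ready to be paired up before storage.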

Once everything is properly parsed, the information is stored in the worksheet using the 'gspread' module.
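As a rough illustration of the storage step, the sketch below turns the parsed data into one row per subway line and appends the rows with 'gspread'. The row layout (last-update time, line, status), the credentials file and the spreadsheet key are assumptions, not details taken from the script.

```python
def build_rows(last_update, statuses):
    """One row per subway line: [last_update, line_name, status]."""
    return [[last_update, line, status] for line, status in sorted(statuses.items())]


def open_worksheet(credentials_file, spreadsheet_key):
    """Open the first worksheet of a spreadsheet (both arguments are hypothetical)."""
    # Imported here so build_rows can be used without gspread installed.
    import gspread

    client = gspread.service_account(filename=credentials_file)
    return client.open_by_key(spreadsheet_key).sheet1


def store(worksheet, last_update, statuses):
    """Append the rows to the worksheet and return how many were written."""
    rows = build_rows(last_update, statuses)
    if rows:
        worksheet.append_rows(rows)
    return len(rows)
```

Keeping `build_rows` separate from the gspread calls makes the row layout easy to check without touching a real spreadsheet.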

The script runs indefinitely on Heroku.
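One simple way to get this "run every 5 minutes, forever" behaviour is a plain loop with a sleep, sketched below. The function name and the `max_runs` and `sleep` parameters are hypothetical additions that make the loop easy to try out; the actual script may schedule its runs differently.

```python
import time

INTERVAL_SECONDS = 5 * 60  # 5 minutes between runs


def run_forever(job, interval=INTERVAL_SECONDS, sleep=time.sleep, max_runs=None):
    """Call job() repeatedly, sleeping `interval` seconds after each call.

    A failing run is logged and does not stop the loop. With max_runs=None
    the loop never ends; a number limits it, which is handy for testing.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:
            print(f"run failed: {exc!r}")
        runs += 1
        sleep(interval)
    return runs
```

Catching exceptions inside the loop means a single bad fetch does not end a process that is meant to stay up.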

Unavailability or other issues

If for some reason the registered data points are empty, an e-mail is sent with the fetched page attached, so I can inspect the page and, if necessary, the logs to find out what happened.
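A minimal sketch of such an alert, using only the standard library, is shown below. The subject line, the addresses, the SMTP settings and the emptiness check are assumptions for illustration and are not taken from email_debug.py.

```python
import smtplib
from email.message import EmailMessage


def needs_alert(statuses):
    """True when nothing was parsed or any line has a blank status."""
    return not statuses or any(not status for status in statuses.values())


def build_alert(html, sender, recipient):
    """Build an e-mail with the fetched page attached as page.html."""
    msg = EmailMessage()
    msg["Subject"] = "Subway scraper: empty data points"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The scraper parsed no data. The fetched page is attached.")
    msg.add_attachment(html, subtype="html", filename="page.html")
    return msg


def send_alert(msg, host, port, user, password):
    """Send the message through an SMTP server (all settings are hypothetical)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Attaching the exact HTML that was fetched makes it possible to tell a layout change on the page apart from a temporary outage.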

If this data is ever useful to you, let me know. Enjoy! 🍻

Data Analysis

Paulo wrote an analysis of the data! You can read it here.
