Skip to content

davelovesdata/01-MJ-webscraper---sales

Repository files navigation

MJ-webscraper---sales

Webscraper to retrieve marijuana sales files

This project uses Python Beautifulsoup and regex to identify and download excel files containing county level marijuana sales data from the following public facing website: https://www.colorado.gov/pacific/revenue/colorado-marijuana-sales-reports.

Due to the number of files (65 at the time of this writing) and the wait time, socket errors occasionally and randomly occur. Requests.get() is set to verify=False to prevent the program from stopping. A 30 second wait is implemented to prevent overwhelming the colorado.gov server.

The file 'Web Scraper Sales.ipynb' is a jupyterLab notebook that I will be adding narrative to as I get the chance. The file 'web scraper sales.py' is the shortened script.

Files are saved in the format mmyy_sales.xlsx.

Resources:
https://pythonspot.com/extract-links-from-webpage-beautifulsoup/
https://www.geeksforgeeks.org/downloading-files-web-using-python/

About

scraper to retrieve marijuana sales files

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published