MJ-webscraper---sales

Web scraper that retrieves Colorado marijuana sales report files.

This project uses Python, Beautiful Soup, and regular expressions to identify and download Excel files containing county-level marijuana sales data from the following public-facing website: https://www.colorado.gov/pacific/revenue/colorado-marijuana-sales-reports.
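
The link-finding step could look roughly like the sketch below. It is an illustration, not the project's code. It swaps Python's built-in html.parser.HTMLParser in for Beautiful Soup so it runs without extra packages, and it assumes that report links end in .xls or .xlsx. With Beautiful Soup, a comparable filter is soup.find_all('a', href=re.compile(r'\.xlsx?$', re.I)). The names LinkCollector and find_excel_links are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: report links end in .xls or .xlsx (case-insensitive).
EXCEL_LINK = re.compile(r"\.xlsx?$", re.IGNORECASE)


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_excel_links(html, base_url):
    """Return absolute URLs of links that appear to point at Excel files."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    links = []
    for href in parser.hrefs:
        # Strip any query string or fragment before testing the file extension.
        path = href.split("?", 1)[0].split("#", 1)[0]
        if EXCEL_LINK.search(path):
            # Resolve relative links against the page URL.
            links.append(urljoin(base_url, href))
    return links
```

For example, find_excel_links(page_html, "https://www.colorado.gov/pacific/revenue/colorado-marijuana-sales-reports") would return absolute URLs for every matching link, with relative hrefs resolved against the page URL.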

Because there are many files (65 at the time of this writing) and the full run takes a long time, socket errors occasionally occur at random. requests.get() is called with verify=False, which turns off SSL certificate verification, to keep the program from stopping. A 30-second wait between requests prevents the scraper from overwhelming the colorado.gov server.
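
A minimal sketch of such a download loop, assuming the 30-second pause goes between requests. The retry logic is an addition for illustration: the README does not say the script retries. download_all, fetch_with_requests, and the retries and timeout values are hypothetical. The fetch and sleep functions are parameters so the loop can be exercised without touching the network.

```python
import time


def fetch_with_requests(url):
    """Download one file; verify=False mirrors the README and disables SSL certificate checks."""
    import requests  # imported lazily so the sketch loads even without requests installed

    # Hypothetical: the 60-second timeout is illustrative, not taken from the project.
    response = requests.get(url, verify=False, timeout=60)
    response.raise_for_status()
    return response.content


def download_all(urls, save, fetch=fetch_with_requests, pause=30.0,
                 retries=3, sleep=time.sleep):
    """Fetch each URL in turn, pausing between requests and retrying on errors.

    Returns the URLs that still failed after every retry.
    """
    failed = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(pause)  # wait before every request after the first
        for attempt in range(1, retries + 1):
            try:
                save(url, fetch(url))
                break
            except OSError:
                # OSError covers socket errors, requests' exceptions
                # (subclasses of IOError/OSError), and write errors from save().
                if attempt == retries:
                    failed.append(url)
                else:
                    sleep(pause)  # back off before retrying the same URL
    return failed
```

Returning the failed URLs, rather than raising, keeps one bad download from stopping the whole run; the list can be passed back to download_all later.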

The file 'Web Scraper Sales.ipynb' is a JupyterLab notebook to which I will add narrative as time allows. The file 'web scraper sales.py' contains the shortened script.

Downloaded files are saved with names in the format mmyy_sales.xlsx (for example, January 2019 becomes 0119_sales.xlsx).
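
The naming convention can be expressed as a small helper. This sketch assumes the month and year are already known, for example parsed from a link label such as "January 2019". The label format, the helper names, and the "files" output folder are assumptions, not details confirmed by the README.

```python
from datetime import datetime
from pathlib import Path


def sales_filename(month, year):
    """Build an mmyy_sales.xlsx name, e.g. month=1, year=2019 -> '0119_sales.xlsx'."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return f"{month:02d}{year % 100:02d}_sales.xlsx"


def filename_from_label(label):
    """Parse a label like 'January 2019' (assumed format) into an mmyy file name."""
    parsed = datetime.strptime(label.strip(), "%B %Y")
    return sales_filename(parsed.month, parsed.year)


def save_report(content, month, year, folder="files"):
    """Write downloaded bytes to <folder>/mmyy_sales.xlsx and return the path (folder is assumed)."""
    path = Path(folder) / sales_filename(month, year)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return path
```

For example, save_report(content, 1, 2019) would write files/0119_sales.xlsx. Note that %B matches English month names only under the default C or English locale.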

Resources:
https://pythonspot.com/extract-links-from-webpage-beautifulsoup/
https://www.geeksforgeeks.org/downloading-files-web-using-python/

Repository: davelovesdata/01-MJ-webscraper---sales

Folders and files

Latest commit

History

Repository files navigation

MJ-webscraper---sales

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages