Beyond webscraping with the Beautiful Soup package, Python also supports the follow-up analyses: preprocessing of hidden characters, merging different datasets, summary statistics, and visualizations.
The Jupyter notebook Webscraping-script.ipynb can be found in the GitHub repository.
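
As a rough illustration of the scraping and hidden-character preprocessing steps mentioned above, here is a minimal sketch. It is not taken from the notebook: the HTML snippet, the names `clean_cell` and `scrape_table`, and the table values are all hypothetical, and it assumes Beautiful Soup is installed as the `bs4` package.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (made-up values).
# The cells contain a non-breaking space (\u00a0) and a zero-width space (\u200b).
HTML = u"""
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Alpha\u00a0</td><td>\u200b1,000</td></tr>
  <tr><td>Beta</td><td>2,500\u00a0</td></tr>
</table>
"""

# Zero-width space, zero-width (non-)joiners, and byte-order mark.
ZERO_WIDTH = re.compile(u"[\u200b\u200c\u200d\ufeff]")


def clean_cell(text):
    """Turn non-breaking spaces into plain spaces, drop zero-width characters, trim."""
    text = text.replace(u"\u00a0", u" ")
    return ZERO_WIDTH.sub(u"", text).strip()


def scrape_table(html):
    """Return the table as a list of rows, each row a list of cleaned cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [clean_cell(cell.get_text()) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


rows = scrape_table(HTML)
header, records = rows[0], rows[1:]

# Convert the cleaned population strings to integers for later statistics.
populations = [int(record[1].replace(",", "")) for record in records]
print(header, records, populations)
```

Without the cleaning step, the invisible characters would make `int()` fail on some cells or leave keys that look identical but do not match when merging.
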
- Python 2.7 (Python 3.5 may produce some errors)
- pandas
- BeautifulSoup
- requests
- csv
- re
- urllib2
- datetime
- os
- sys
- matplotlib
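
One known source of errors under Python 3 is `urllib2`, which exists only in Python 2; in Python 3 its functionality moved to `urllib.request` and `urllib.error`. The sketch below shows one possible way to import `urlopen` under either version. It is an assumption about how the notebook could be adapted, not code from the notebook, and `fetch_html` is a hypothetical helper name.

```python
try:
    # Python 2: urlopen is provided by urllib2.
    from urllib2 import urlopen
except ImportError:
    # Python 3: urllib2 was split into urllib.request and urllib.error.
    from urllib.request import urlopen


def fetch_html(url):
    """Download a page and return its raw content (bytes under Python 3)."""
    response = urlopen(url)
    try:
        return response.read()
    finally:
        response.close()
```

Other differences between the two versions, such as `print` being a function in Python 3, may also need attention.
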
A link to the original blog: https://rrighart.github.io/Webscraping/
Remote data science service for small and larger projects: https://www.rrighart.com
For any questions or remarks, reach out to me: rrighart@googlemail.com