Repository for the scraping bootcamp.
.DS_Store
HODP Python Scraping Exercise.ipynb
HODP Python Scraping Solutions.ipynb
README.md
demo_scraping.ipynb
geckodriver.log
sample_text.txt fixing sample text Oct 1, 2019

Scraping Bootcamp

This is the repository for the HODP Week 3 Data Scraping Bootcamp.

Here's what you need to do:

  1. Ignore demo_scraping.ipynb. We use it to demonstrate what scraping looks like, so you don't need to change anything there!
  2. Open regex101.com and copy the text from sample_text.txt into the test string box.
  3. Make sure your flavour (on the left-hand side of the page) is set to Python.
  4. Feel free to refer to the helpful resources listed below during the bootcamp!
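
Once a pattern works on regex101 with the Python flavour, it should behave the same way in Python's built-in `re` module. Here is a minimal sketch of that step. The sample string and both patterns are made up for illustration and are not the contents of sample_text.txt.

```python
import re

# Hypothetical sample text; the real sample_text.txt is not reproduced here.
sample_text = """Contact alice@example.com by 2019-10-01.
Backup contact: bob@example.org, available from 2019-10-15."""

# Raw strings (r"...") keep backslashes intact, so the pattern is exactly
# what you would type into the regex101 pattern box.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
email_pattern = re.compile(r"[\w.]+@[\w.]+\.\w+")  # deliberately simple, not a full email validator

# findall returns every non-overlapping match as a list of strings.
dates = date_pattern.findall(sample_text)
emails = email_pattern.findall(sample_text)

print(dates)
print(emails)
```

If a pattern matches on regex101 but not in your notebook, check that you wrote it as a raw string and that the flavour was set to Python.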

Helpful resources

  1. Regex reference sheet: https://www.regular-expressions.info/quickstart.html or http://www.rexegg.com/regex-quickstart.html#ref
  2. Most of the regex lessons were taken from https://regexone.com/. Definitely return to them if you need a refresher or want more lessons.
  3. Great tutorial on how to use BeautifulSoup: https://www.datacamp.com/community/tutorials/web-scraping-using-python