GCDigitalFellows/bs4_workshop

 
 

Introduction to Web Scraping with Python (bs4)

This workshop introduces web scraping with the Python library Beautiful Soup 4 (bs4). It can be taught asynchronously or synchronously, either as a standalone workshop or as part of the Python track at the digital institutes.

Credits

This workshop was written by Filipa Calado.

Workshops

It was first taught at CUNY GC by Filipa Calado in Spring 2021 as a two-hour online synchronous workshop.

Abstract:

This workshop covers web scraping with the Python library Beautiful Soup 4, or bs4. In short, bs4 is a Python library for "web scraping," that is, pulling data out of HTML and XML files. In this workshop, we will use bs4 to scrape news data from the New York Times website. By the end of the workshop, you will have a Python script that grabs data from a website and exports that data to a CSV file. At the very end, I will show you a couple of other ways to scrape websites that go beyond bs4 and are suited to scraping social media.
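The workflow in the abstract (parse HTML, pull out the pieces you want, write them to CSV) can be sketched roughly as follows. This is a minimal illustration, not the workshop's actual script: the HTML snippet, the tag structure, and the function names `extract_headlines` and `write_csv` are all made up for the example and do not reflect the real markup of the New York Times site. It uses Python's built-in `html.parser` so that it runs without lxml, and it writes to an in-memory buffer instead of a file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded news page.
# The real site's markup will differ; inspect it before scraping.
SAMPLE_HTML = """
<html><body>
  <article><h2><a href="/story-one">First headline</a></h2></article>
  <article><h2><a href="/story-two">Second headline</a></h2></article>
</body></html>
"""


def extract_headlines(html):
    """Return a list of {"title", "url"} dicts, one per <article> link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article"):
        link = article.find("a")
        if link is None:
            continue  # skip articles that have no link
        rows.append({
            "title": link.get_text(strip=True),
            "url": link.get("href", ""),
        })
    return rows


def write_csv(rows, handle):
    """Write the rows to an open text handle with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)


rows = extract_headlines(SAMPLE_HTML)

# Write to an in-memory buffer; swap in open("headlines.csv", "w", newline="")
# to produce a file on disk instead.
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In a real script, the HTML string would typically come from something like `requests.get(url).text`, and the tags passed to `find_all` and `find` would be chosen after inspecting the target page.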

Requirements

  • Students need to be familiar with the Python language and should have completed the Introduction to Python workshop before taking this one.
  • Students should install the most recent Anaconda Python distribution on their computers, as well as the Python libraries requests, bs4, and lxml. The csv module is also used, but it is part of the Python standard library and needs no separate installation.
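One possible way to confirm the setup before the workshop is a short check that each required library can be imported. This is only a sketch; the helper name `find_missing` is hypothetical and not part of the workshop materials.

```python
import importlib.util

# Libraries named in the requirements. csv ships with Python itself,
# so it should never show up as missing.
REQUIRED = ["requests", "bs4", "lxml", "csv"]


def find_missing(names):
    """Return the names that cannot be located on this Python installation."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are importable.")
```

Anything reported as missing can usually be installed with `conda install` or `pip install` followed by the library name (for bs4, the package is published as `beautifulsoup4`).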

Reception and Feedback

Feedback was very good. Students thought the pace and content were effective.

Needed/Desired Changes

There is some interest in expanding this workshop into a two- or three-part web scraping series.

License

Workshop leader: Filipa Calado, Graduate Center Digital Fellows

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
