Add a Section Introducing the Requests and BeautifulSoup Packages #11

Closed
runderwood opened this issue Jun 1, 2017 · 2 comments

Comments

@runderwood
Collaborator

runderwood commented Jun 1, 2017

Provide an introductory example, aligned with the exercises in the browser console, that fetches a document from a given URL and scrapes data from its markup using the Python Requests and BeautifulSoup packages.

@vphill
Collaborator

vphill commented Jun 1, 2017

A little more context on this ticket.

This ticket will build on the basic structure of a previously created scraping tutorial from the University of Oklahoma - http://ouinformatics.github.io/swc_beautiful_soup/

Instead of the URLs supplied in that tutorial, we will use the United Nations Security Council resolutions found here - http://www.un.org/en/sc/documents/resolutions/

Here is the basic idea for the lesson:

  • Load the page with Requests and BeautifulSoup
  • Extract the links from the page
  • Limit the links to just the years of resolutions
  • Output the links to the page that lists resolutions by year
  • Load one year and display page contents
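
The steps above could be sketched roughly as follows. This is only a minimal sketch, not the lesson's actual code: the function names `fetch`, `extract_links`, `year_links`, and `page_text` are hypothetical, and it assumes a year link can be recognized by link text that is a four-digit year, which may not match the real markup of the UN page.

```python
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# URL taken from this ticket; the page layout may differ from what is assumed here.
INDEX_URL = "http://www.un.org/en/sc/documents/resolutions/"

# Assumption: a "year" link is one whose visible text is a four-digit year.
YEAR_PATTERN = re.compile(r"^(19|20)\d{2}$")


def fetch(url):
    """Download a page with Requests and return its HTML as text."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_links(html, base_url):
    """Return (link text, absolute URL) pairs for every <a href=...> in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in soup.find_all("a", href=True)
    ]


def year_links(links):
    """Keep only the links whose text looks like a year."""
    return [(text, url) for text, url in links if YEAR_PATTERN.match(text)]


def page_text(html):
    """Return the visible text of a page, with whitespace collapsed between nodes."""
    return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
```

With network access, `year_links(extract_links(fetch(INDEX_URL), INDEX_URL))` would give the per-year links to print, and passing one of those URLs through `fetch` and then `page_text` would display that year's page contents. Keeping the parsing functions separate from `fetch` means they can be tried on a small HTML string without hitting the site.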

The next section in the lesson will go deeper into Python and BeautifulSoup and explore more of the data on the site.

@jnothman
Contributor

It turns out the UNSC resolutions dataset that was suggested as a scraping target is very quirky: its markup varies considerably from year to year. See ctds-usyd#2, where I've built scrapers for it in multiple frameworks.
