Add a Section Introducing the Requests and BeautifulSoup Packages #11

runderwood · 2017-06-01T20:03:30Z

Provide an introductory example, aligned with the exercises in the browser console, that fetches a document from a given url and scrapes data from its markup using Python requests and BeautifulSoup.
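A minimal sketch of the fetch-then-parse pattern this ticket asks for. It assumes the `requests` and `beautifulsoup4` packages are installed and uses BeautifulSoup's built-in `html.parser` so no extra parser is needed. The helper names `fetch_html` and `parse_page` are hypothetical, chosen for illustration only.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=30):
    """Download a page with Requests and return its HTML text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail early on 4xx/5xx responses
    return response.text


def parse_page(html):
    """Parse HTML markup into a BeautifulSoup tree using the stdlib parser."""
    return BeautifulSoup(html, "html.parser")
```

Usage: `soup = parse_page(fetch_html(url))` gives a tree you can query, e.g. `soup.title.get_text(strip=True)` for the page title or `soup.find_all("a", href=True)` for every link, mirroring what the browser-console exercises do with `document.querySelectorAll`. Keeping download and parsing in separate functions lets learners practise parsing on a saved HTML string without hitting the network.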


vphill · 2017-06-01T20:24:49Z

A little more context on this ticket.

This ticket will build off the basic structure of a previously created scraping tutorial from the University of Oklahoma - http://ouinformatics.github.io/swc_beautiful_soup/

Instead of the supplied URLs we will use the United Nations Security Council Resolutions found here - http://www.un.org/en/sc/documents/resolutions/

Here is the basic idea for the lesson:

Load the page with Requests and BeautifulSoup
Extract the links from the page
Limit the links to just the years of resolutions
Output the links to the page that lists resolutions by year
Load one year and display page contents
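The five lesson steps above can be sketched as follows. This is a hedged illustration, not the lesson's final code. The URL is the one given in this ticket, but its markup may have changed since 2017. The year filter rests on an assumption: links to the per-year listing pages are taken to have a bare four-digit year as their link text. As jnothman notes below, the site varies from year to year, so the filter may need adjusting. The function names are hypothetical.

```python
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Index page named in this ticket; its structure may differ today.
INDEX_URL = "http://www.un.org/en/sc/documents/resolutions/"

# Assumption: year links use a bare four-digit year as their text.
YEAR_PATTERN = re.compile(r"\d{4}")


def get_soup(url):
    """Step 1: load a page with Requests and parse it with BeautifulSoup."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")


def extract_links(soup, base_url):
    """Step 2: collect (link text, absolute URL) pairs for every <a href>."""
    return [
        (a.get_text(strip=True), urljoin(base_url, a["href"]))
        for a in soup.find_all("a", href=True)
    ]


def year_links(links):
    """Step 3: keep only links whose whole text is a four-digit year."""
    return {text: url for text, url in links if YEAR_PATTERN.fullmatch(text)}


def run_lesson(index_url=INDEX_URL):
    """Steps 4 and 5: print the year links, then one year's page text."""
    years = year_links(extract_links(get_soup(index_url), index_url))
    for year, url in sorted(years.items()):
        print(year, url)
    if years:
        latest = max(years)  # four-digit strings sort like the years they encode
        print(get_soup(years[latest]).get_text("\n", strip=True))
```

Calling `run_lesson()` performs live requests. Steps 2 and 3 take already-parsed input, so learners can test them against a small HTML snippet first. `urljoin` turns relative hrefs such as `2016.shtml` into absolute URLs that can be fetched in step 5.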

The next section in the lesson will dive deeper into using Python and BeautifulSoup and go a little further into the data on the site.

jnothman · 2017-06-21T07:52:27Z

It turns out the UNSC resolutions dataset that was suggested as a scraping target is very quirky, varying considerably from year to year. See ctds-usyd#2, where I've built scrapers for it in multiple frameworks.

ldko added the mozsprint label Jun 1, 2017

runderwood added this to Identified in Moz-Sprint-2017 Jun 1, 2017

runderwood mentioned this issue Jun 1, 2017

Include Beautiful Soup instead of Scrapy (or as an add-on) #6

Closed

runderwood mentioned this issue Jun 1, 2017

Add a Section Providing an Advanced Exercise in Scraping with Requests and BeautifulSoup #12

Closed

vphill moved this from Identified to Discussed in Moz-Sprint-2017 Jun 1, 2017

ldko mentioned this issue Jun 5, 2017

Review web scraping lesson and get it ready for publication data-lessons/librarycarpentry#35

Closed

jnothman mentioned this issue Jun 21, 2017

Trying to extract UNSC Resolutions with various Python scraping frameworks ctds-usyd/library-webscraping#2

Open


jnothman mentioned this issue Jul 4, 2017

Episode on scraping UNSC data with requests/lxml #47

Merged

weaverbel mentioned this issue Jun 1, 2018

Lesson content carpentries-incubator/lc-webscraping#21

Open

weaverbel closed this as completed Jun 1, 2018
