Review web scraping lesson and get it ready for publication #35

Closed
ostephens opened this issue May 31, 2017 · 2 comments

Comments

@ostephens

The web scraping lesson https://github.com/data-lessons/library-webscraping was initially developed by @timtomch.

The contents of @timtomch's lesson have been copied to this repository to be reviewed and amended before the lesson is ready for publication as part of the Library Carpentry materials.

Use https://github.com/data-lessons/library-webscraping/issues for issues to be worked on during the sprint.

@drjwbaker

When someone gets a minute, please report back on progress during the sprint and close this issue. Ta!

@ldko

ldko commented Jun 5, 2017

During the 2017 sprint we worked on the setup and README files and outlined a plan to:

  • Remove the Chrome extensions section, to focus more on scraping with Python as a way to learn and apply programming concepts rather than on an available tool that may not be supported for very long.

  • Add CSS selector examples alongside the existing XPath examples, as they will lead into the episodes about selecting items with BeautifulSoup.

  • Replace the Scrapy instructions with Requests and BeautifulSoup (#6, #14).

  • Teach BeautifulSoup (#11, #12) along a general outline that follows the structure of a scraping tutorial used by the University of Oklahoma, but uses the UN site to get data about Security Council resolutions for the examples.
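To make the plan above a bit more concrete, here is a minimal sketch of what a Requests and BeautifulSoup example with CSS selectors could look like. It is not the lesson's actual code: the sample markup, the `resolutions` table id, the class names, and the function names are all hypothetical stand-ins, and the real UN pages will be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page that lists resolutions.
# The ids, classes, and values are made up for illustration.
SAMPLE_HTML = """
<html><body>
  <table id="resolutions">
    <tr><th>Symbol</th><th>Topic</th></tr>
    <tr>
      <td class="symbol"><a href="/doc/example-1">S/RES/EXAMPLE-1</a></td>
      <td class="topic">Example topic A</td>
    </tr>
    <tr>
      <td class="symbol"><a href="/doc/example-2">S/RES/EXAMPLE-2</a></td>
      <td class="topic">Example topic B</td>
    </tr>
  </table>
</body></html>
"""


def fetch_html(url):
    """Download a page with Requests (not called in this sketch)."""
    import requests  # imported here so the parsing part runs without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def extract_resolutions(html):
    """Return one dict per table row that has a symbol link and a topic."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # CSS selector; a roughly equivalent XPath would be
    # //table[@id="resolutions"]//tr
    for tr in soup.select("table#resolutions tr"):
        link = tr.select_one("td.symbol a")
        topic = tr.select_one("td.topic")
        if link is None or topic is None:
            continue  # skip the header row
        rows.append(
            {
                "symbol": link.get_text(strip=True),
                "href": link["href"],
                "topic": topic.get_text(strip=True),
            }
        )
    return rows


resolutions = extract_resolutions(SAMPLE_HTML)
for item in resolutions:
    print(item["symbol"], item["href"], item["topic"])
```

Parsing an inline string keeps the example runnable offline; in a live episode `fetch_html` would supply the HTML instead.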

We also discussed modifying where and how ethics are brought up in the lesson, and the possible benefits of using URLs from the archive.org Wayback Machine for the scraping examples, since those should be static, whereas a live production site may change at any time. In addition, we set up a GitHub Project in the library-webscraping repo for tracking progress.
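As a rough illustration of the Wayback Machine idea, the sketch below builds an archived URL from an original URL and a capture timestamp, using the `https://web.archive.org/web/<timestamp>/<url>` pattern. The helper name `wayback_url` and the example URL and timestamp are hypothetical; which snapshots the lesson would actually use was not decided.

```python
def wayback_url(original_url, timestamp):
    """Build a Wayback Machine URL for a capture of original_url.

    timestamp is a string of digits in YYYYMMDDhhmmss form; a shorter
    prefix such as YYYYMMDD is also accepted by this helper.
    """
    if not (timestamp.isdigit() and 4 <= len(timestamp) <= 14):
        raise ValueError("timestamp must be 4 to 14 digits (YYYY[MMDDhhmmss])")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Hypothetical example values, not a snapshot chosen for the lesson.
archived = wayback_url("http://example.org/resolutions", "20170601000000")
print(archived)
```

Pinning examples to a fixed capture like this means the HTML that learners scrape should stay the same between workshops.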

Because fewer people worked on the web scraping lesson on day 2 of the sprint, little was done to actually make the content structure changes that were proposed on the first day.
