amlsf/scrapy_workshop

Scrapy Workshop Demo Spiders

This repo contains the Scrapy spiders demonstrated in Radius Intelligence's workshop "Data Collection with Scrapy: Build & Manage Production Web Scraping Pipelines".

#### Presentation materials available here

The spiders collect wine product data from www.wine.com and are organized into levels, each of which builds on the concepts introduced in the previous one.

  • L0 (wine_example/spiders/L0_barespider.py)

    • Set up a basic spider that fetches a wine.com URL.
  • L1 (wine_example/spiders/L1_wine.py)

    • Create a spider that returns an item of type 'Wine' with three fields: 1) the link to the specific product page, 2) the product name, and 3) the current sell price. Do this only for the first page of 25 wine products at www.wine.com/v6/wineshop.
  • L2 (wine_example/spiders/L2_wine_meta.py)

    • Add to the Wine item the following fields: 1) wine type and 2) region.
  • L3 (wine_example/spiders/L3_wine_pagination.py)

    • Teach your spider to crawl through every product listing page so that it gathers all 5000+ products.
  • Wine_login.py (wine_example/spiders/wine_login.py)

    • Create a spider that handles login authentication.

##### Take-Home Challenge:

  • L4 (wine_example/spiders/L4_wine_reviews.py)
    • Complete this part on your own. Teach your spider to crawl one more page level deep to scrape all ratings and reviews for each product. Good luck and have fun!

Development Environment Setup Instructions

  • For those who do not have pip installed:

    curl -O https://bootstrap.pypa.io/get-pip.py
    sudo python get-pip.py # writes to system Python

  • Install & activate virtualenv

    sudo pip install virtualenv # writes to system Python
    virtualenv scrapy_learn # isolated from system Python
    source scrapy_learn/bin/activate

  • Install Scrapy & Dependencies

    pip install wheel
    pip install scrapy

  • You will also need Chrome

Additional Resources
