This repo contains the Scrapy spiders demoed in Radius Intelligence's workshop "Data Collection with Scrapy: Build & Manage Production Web Scraping Pipelines".
#### Presentation materials available here
The spiders collect data about wine products from www.wine.com and are broken out into levels, each building new concepts on top of the previous one.
- **L0** (`wine_example/spiders/L0_barespider.py`)
  - Set up a basic spider that fetches a page from a wine.com URL.
- **L1** (`wine_example/spiders/L1_wine.py`)
  - Create a spider that returns an item type named `Wine` containing three fields: 1) the product page link, 2) the product name, and 3) the current sell price. Only do this for the first page of 25 wine products at www.wine.com/v6/wineshop.
- **L2** (`wine_example/spiders/L2_wine_meta.py`)
  - Add two more fields to the `Wine` item: 1) wine type and 2) region.
- **L3** (`wine_example/spiders/L3_wine_pagination.py`)
  - Teach your spider to crawl through all product pages to gather all 5,000+ products.
- **wine_login.py** (`wine_example/spiders/wine_login.py`)
  - Create a spider that can authenticate through a login form.
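The usual Scrapy pattern for form logins is `FormRequest.from_response`, which reads the form off the login page (including any hidden CSRF fields) and submits it. The login URL and field names below are hypothetical placeholders, not wine.com's actual form:

```python
import scrapy


class LoginSpider(scrapy.Spider):
    name = "wine_login"
    # Hypothetical login URL -- replace with the site's real login page.
    start_urls = ["https://www.wine.com/login"]

    def parse(self, response):
        # Fill in the form found on the page; hidden inputs (e.g. CSRF
        # tokens) are carried over automatically. Field names are placeholders.
        return scrapy.FormRequest.from_response(
            response,
            formdata={"email": "you@example.com", "password": "secret"},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Session cookies from the login response are kept by Scrapy's
        # cookie middleware, so every request from here on is authenticated.
        yield response.follow("/v6/wineshop", callback=self.parse_shop)

    def parse_shop(self, response):
        pass  # scrape as usual, now behind the login wall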
##### Take-Home Challenge
- **L4** (`wine_example/spiders/L4_wine_reviews.py`)
  - Complete this part on your own. Teach your spider to crawl one more page level deep and scrape all ratings and reviews for each product. Good luck and have fun!
- For those who do not have pip installed:

  ```
  curl -O https://bootstrap.pypa.io/get-pip.py
  sudo python get-pip.py            # writes to system Python
  ```

- Install & activate virtualenv:

  ```
  sudo pip install virtualenv       # writes to system Python
  virtualenv scrapy_learn           # isolated from system Python
  source scrapy_learn/bin/activate
  ```

- Install Scrapy & dependencies:

  ```
  pip install wheel
  pip install scrapy
  ```
- You will also need Chrome
- Scrapy Documentation
- CSS Selectors
- XPath
- Regex
- Beautiful Soup