Scrapy Web Mining

Scraping Mobile Phones from Flipkart

Details
Name	Ritvik Gupta
Registration Number	19BCE0397
Assignment	5th - Web Scraping

The scraped data includes the following attributes for both phone types:

Image URL - The main photo of the phone
Phone URL - Link to the phone's product page on Flipkart
Name
Rating - Average rating of the phone by reviewers
Total Reviews
Price
Colors - Model colors available
Storages - Model storage space available (eg: 64GB)
General Specs - Specifications such as In The Box, SIM Type, Hybrid Sim Slot, Touchscreen, OTG Compatible.
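As a sketch of how these attributes could be modelled in code, the dataclass below declares one field per attribute, with General Specs kept as a nested mapping. Scrapy accepts dataclass objects as items (supported since Scrapy 2.2), but the class name `PhoneItem`, the snake_case field names, and the field types are hypothetical and not taken from the project's spiders.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class PhoneItem:
    """One scraped phone listing (hypothetical item; names and types are assumptions)."""

    image_url: str                        # main photo of the phone
    phone_url: str                        # product page on Flipkart
    name: str
    rating: Optional[float] = None        # average rating by reviewers
    total_reviews: Optional[int] = None
    price: Optional[int] = None
    colors: List[str] = field(default_factory=list)     # available model colors
    storages: List[str] = field(default_factory=list)   # e.g. ["64GB", "128GB"]
    # Nested mapping such as {"SIM Type": "Dual Sim", "Touchscreen": "Yes"}
    general_specs: Dict[str, str] = field(default_factory=dict)


item = PhoneItem(
    image_url="https://example.com/phone.jpg",
    phone_url="https://example.com/phone",
    name="Example Phone",
    rating=4.5,
    storages=["64GB"],
    general_specs={"SIM Type": "Dual Sim"},
)
print(asdict(item)["general_specs"])  # {'SIM Type': 'Dual Sim'}
```

Keeping General Specs as a dict matches the nested JSON output; the CSV spider has to flatten it.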

Like BeautifulSoup (bs4), Scrapy is a tool for web scraping, but unlike bs4 it is a full framework with many more features, including crawling multiple web pages concurrently and recursively scraping paginated sites.

The project includes two spider scripts:

Scrape a limited number of Samsung Galaxy phones from the first page and store the scraped data in JSON format, with multiple fields in a nested structure.
Recursively scrape all iPhones listed on Flipkart across all 15 pages (starting from the first page), covering different models. Each page follows its "Next" link until the last page is reached. The scraped data is stored in CSV format, which cannot hold a nested structure, so "General Specs" is flattened.
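The flattening step for the CSV spider could look roughly like the helpers below. This is a minimal sketch, assuming each item is a plain dict whose specifications sit under a hypothetical `general_specs` key; joining list fields such as colors into one cell is likewise an assumption about how the project handles them.

```python
import csv
import io
from typing import Dict, List


def flatten_item(item: Dict, nested_key: str = "general_specs") -> Dict:
    """Spread item[nested_key] into top-level 'nested_key.spec' columns (hypothetical helper)."""
    flat = {k: v for k, v in item.items() if k != nested_key}
    for spec_name, spec_value in item.get(nested_key, {}).items():
        flat[f"{nested_key}.{spec_name}"] = spec_value
    # A CSV cell holds one string, so list values are joined as well.
    return {k: ", ".join(v) if isinstance(v, list) else v for k, v in flat.items()}


def items_to_csv(items: List[Dict]) -> str:
    """Write flattened items to CSV text (hypothetical helper)."""
    rows = [flatten_item(item) for item in items]
    # Phones may list different specs, so take the union of columns in first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)  # missing cells become ""
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


phone = {
    "name": "Example iPhone",
    "colors": ["Black", "White"],
    "general_specs": {"SIM Type": "Dual Sim", "Touchscreen": "Yes"},
}
print(flatten_item(phone))
# {'name': 'Example iPhone', 'colors': 'Black, White',
#  'general_specs.SIM Type': 'Dual Sim', 'general_specs.Touchscreen': 'Yes'}
```

Taking the union of columns keeps every spec even when different iPhone models list different fields; cells a phone does not have are left empty.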

The comments in each spider script explain in detail how every individual field is extracted during scraping.
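Raw text pulled from a listing usually needs cleaning before it is stored as a number. The helpers below are a hypothetical sketch, assuming Flipkart renders a price like `₹1,29,900` and review counts like `12,345 Ratings & 1,234 Reviews`; the function names and both text formats are assumptions, not the project's code.

```python
import re
from typing import Optional, Tuple


def parse_price(text: str) -> Optional[int]:
    """'₹1,29,900' -> 129900 by keeping only the digits (hypothetical helper).

    Pass the text of a single price element; Indian digit grouping does not matter here.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def parse_ratings_reviews(text: str) -> Tuple[Optional[int], Optional[int]]:
    """'12,345 Ratings & 1,234 Reviews' -> (12345, 1234); (None, None) if no match."""
    # The text format is an assumption; \s also matches non-breaking spaces.
    match = re.search(r"(\d[\d,]*)\s+Ratings\s*&\s*(\d[\d,]*)\s+Reviews", text)
    if match is None:
        return None, None
    ratings, reviews = (int(group.replace(",", "")) for group in match.groups())
    return ratings, reviews


print(parse_price("₹1,29,900"))                                 # 129900
print(parse_ratings_reviews("12,345 Ratings & 1,234 Reviews"))  # (12345, 1234)
```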

Tools Used

The main and only tool used is Scrapy for Python (following the tutorial).

Generating and Running Spiders

To generate the two spiders, the following command was used:

scrapy genspider <spider-name> <main-url-used>

Note: Spider names must be unique so that each spider can be identified. In this project they are flipkart_iphones and flipkart_galaxys.

To run a specific spider:

scrapy crawl <spider-name-provided> -O <output-file>.<csv|json>

Note: The -O flag overwrites any existing content in the output file, while -o appends to it.
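The same output behaviour can also be configured in code through Scrapy's `FEEDS` setting, where the `overwrite` option (Scrapy 2.4+) plays the role of `-O` versus `-o`. The file name below is hypothetical; placing the dict in a spider's `custom_settings` would scope it to that one spider. Keep in mind that appending with `-o` to an existing `.json` file produces invalid JSON, which is why the Scrapy docs suggest JSON Lines (`.jsonl`) when appending.

```python
# Hypothetical settings roughly equivalent to:
#   scrapy crawl flipkart_iphones -O iphones.csv
FEEDS = {
    "iphones.csv": {
        "format": "csv",
        # True mirrors -O (replace the file); False mirrors -o (append to it).
        "overwrite": True,
    },
}
```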

Repository: Ritvik-Gupta/scrapy_tutorial