
Amazon-Crawler

Crawl all laptop listings from Amazon and save them locally as XML files for further processing. Written in Python using Scrapy, scrapy-user-agents, and scrapy-rotating-proxies.

Install Packages:

Scrapy:

Install Scrapy using the following command: pip install Scrapy

See the documentation here: https://doc.scrapy.org/

scrapy-user-agents:

Install the package using the following command: pip install scrapy-user-agents

See the documentation here: https://github.com/hyan15/crawler-demo/tree/master/crawling-basic/scrapy_user_agents
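
For reference, scrapy-user-agents is enabled through the project's settings.py. The snippet below follows the package's own documentation (disable Scrapy's built-in user-agent middleware and register the random one); this repo already contains such a configuration, and its exact priorities may differ:

```python
# settings.py -- user-agent rotation, as documented by scrapy-user-agents.
DOWNLOADER_MIDDLEWARES = {
    # Disable Scrapy's default UserAgentMiddleware ...
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # ... and let scrapy-user-agents pick a random user agent per request.
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}
```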

scrapy-rotating-proxies:

Install the package using the following command: pip install scrapy-rotating-proxies

See the documentation here: https://github.com/TeamHG-Memex/scrapy-rotating-proxies
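
Similarly, proxy rotation is configured in settings.py. This sketch follows the scrapy-rotating-proxies documentation; the proxy addresses are placeholders, not real endpoints, and the repo's own settings may differ:

```python
# settings.py -- proxy rotation, as documented by scrapy-rotating-proxies.
# The proxies below are placeholders; supply your own list.
ROTATING_PROXY_LIST = [
    'proxy1.example.com:8000',
    'proxy2.example.com:8031',
]
DOWNLOADER_MIDDLEWARES = {
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}
```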

Run the spider:

Navigate to Amazon-Crawler/crawler/crawler, then run the command: scrapy crawl amazon
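
Alternatively, the spider can be launched from a Python script using Scrapy's CrawlerProcess. A minimal sketch (the filename run_spider.py is just an example; run it from the project directory so Scrapy can find scrapy.cfg):

```python
# run_spider.py -- optional alternative to `scrapy crawl amazon`.
# Must be run from inside the Scrapy project directory.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl("amazon")  # spider name as registered in spiders/amazon.py
process.start()          # blocks until the crawl finishes
```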

Result:

After the spider finishes, you can find the XML files inside Amazon-Crawler/crawler/crawler/product-xml-files and the links of the crawled products in Amazon-Crawler/crawler/crawler/laptops.txt
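
Since the point of saving XML is further processing, here is a minimal sketch of iterating over the saved files. The directory path is taken from this README; the element structure inside each file depends on what the spider writes, so only the root element is inspected here:

```python
# Minimal sketch: walk the saved product XML files for further processing.
# Only the root element is touched, since the per-file schema is spider-defined.
import xml.etree.ElementTree as ET
from pathlib import Path

xml_dir = Path("Amazon-Crawler/crawler/crawler/product-xml-files")
for xml_file in sorted(xml_dir.glob("*.xml")):
    root = ET.parse(xml_file).getroot()
    print(f"{xml_file.name}: root element <{root.tag}> with {len(root)} children")
```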

Scrape other products:

To crawl a different department, paste its link into the start_urls list in Amazon-Crawler/crawler/crawler/spiders/amazon.py, as sketched below.
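
The edit looks roughly like this. Apart from the spider name and start_urls, which the README describes, the class name and body here are assumptions, and the URL is a placeholder for whatever department page you choose:

```python
# spiders/amazon.py -- illustrative sketch; the real spider in this repo
# contains parsing logic that is omitted here. Only start_urls needs editing.
import scrapy

class AmazonSpider(scrapy.Spider):
    name = "amazon"  # the name used with `scrapy crawl amazon`
    start_urls = [
        # Placeholder: replace with the Amazon department page you want to crawl.
        "https://www.amazon.com/s?i=your-department-here",
    ]

    def parse(self, response):
        ...  # product extraction and XML writing live here in the actual spider
```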
