How to run the quotes crawler? (basic Scrapy task)

Fetches quotes from quotes.toscrape.com and saves them to quotes.jl.

cd tutorial
rm -f quotes.jl; scrapy crawl quotes -o quotes.jl
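
The -o option writes one JSON object per line (the JSON Lines format, which Scrapy selects from the .jl extension). Scrapy's -o appends to an existing file instead of replacing it, which is presumably why the command deletes quotes.jl first; newer Scrapy releases also offer -O to overwrite. A minimal sketch for loading the output in Python; the function name is illustrative, and this README does not list which fields the spider yields.

```python
import json
from pathlib import Path


def load_jsonlines(path):
    """Return a list of dicts, one per non-empty line of a JSON Lines file."""
    items = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                items.append(json.loads(line))
    return items
```

For example, load_jsonlines("quotes.jl") returns one dict per scraped item, in the order the spider yielded them.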

How to run the author crawler? (basic Scrapy task)

Fetches the biography of each quote's author from quotes.toscrape.com and saves them to authors.jl.

cd tutorial
rm -f authors.jl; scrapy crawl author -o authors.jl
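
To reach each biography, the spider has to follow the author links on the quote pages. A stdlib-only sketch of that link-extraction step, assuming the author links on quotes.toscrape.com look like <a href="/author/Albert-Einstein">. Inside a real spider you would more likely use response.css(...) and response.follow(...), and Scrapy's built-in duplicate filter already drops repeated requests; the function name and the order-preserving de-duplication here are illustrative choices, not taken from this repo.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _AuthorLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at /author/ pages."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("/author/"):  # assumed URL pattern
                self.hrefs.append(href)


def author_urls(html, base_url):
    """Return absolute author URLs in first-seen order, without duplicates."""
    parser = _AuthorLinkParser()
    parser.feed(html)
    seen = {}  # dicts keep insertion order, so this de-duplicates stably
    for href in parser.hrefs:
        seen.setdefault(urljoin(base_url, href), None)
    return list(seen)
```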

How to run the GitHub emails crawler? (Scrapy session task)

Enter your GitHub username and password in spiders/github_emails.py. The crawler logs in and outputs the email addresses listed in your GitHub account.

cd tutorial
rm -f emails.jl; scrapy crawl emails -o emails.jl
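
Logging in is what makes this a session task: the spider submits the login form, and Scrapy's cookie middleware keeps the resulting session cookie for later requests. Scrapy's FormRequest.from_response pre-fills a form's hidden inputs (such as a CSRF token) from the page. The stdlib sketch below shows that same idea outside Scrapy. It assumes the login form carries a hidden authenticity_token field and takes login and password inputs; those field names are an assumption about GitHub's login page, not something this README states, and the helper names are hypothetical.

```python
from html.parser import HTMLParser


class _HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_login_formdata(login_page_html, username, password):
    """Merge the page's hidden fields with the credentials.

    Simplification: hidden inputs are collected from the whole page, whereas
    FormRequest.from_response works on a single selected form. The "login"
    and "password" field names are assumptions.
    """
    parser = _HiddenInputParser()
    parser.feed(login_page_html)
    formdata = dict(parser.fields)
    formdata.update({"login": username, "password": password})
    return formdata
```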

How to run the js_track_scrap crawler? (Scrapy JS tracking and replicating task)

It finds all JavaScript files referenced by <script> tags on a given URL and downloads them.

cd tutorial
scrapy crawl jstrackscrap
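
The core of this task is collecting the src of every <script> tag and resolving it against the page URL, so that relative paths can be downloaded. A stdlib sketch of that step follows; the function name is hypothetical, and inline scripts (no src) are skipped. In a Scrapy spider the same list would typically come from response.css("script::attr(src)").getall() combined with response.urljoin(...).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _ScriptSrcParser(HTMLParser):
    """Collect the src attribute of every <script> tag that has one."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:  # inline scripts have no src; skip them
                self.srcs.append(src)


def script_urls(html, page_url):
    """Return absolute URLs of external JS files referenced by the page."""
    parser = _ScriptSrcParser()
    parser.feed(html)
    return [urljoin(page_url, src) for src in parser.srcs]
```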

How to run the github_repos crawler? (POST request, session management and recovery task)

The github-repos spider logs in to GitHub and fetches the links to all of the user's repositories. It then visits each repository's settings page and fetches the repository name. The twist is that, with some random probability, we break the session (remove the cookies) before accessing a repository's settings, simulating a website that drops our session for some reason. The crawler must detect that the session is broken and re-establish it.

Note: Insert your GitHub username and password at line 46 of spiders/github-repos.py.

clear; rm -f repo_description.jl; scrapy crawl github-repos -o repo_description.jl
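
The break-and-recover loop can be sketched without a network. Everything below is hypothetical: FakeSite stands in for GitHub, the user_session cookie and the redirect to /login are assumed signals of a lost session, and break_prob plays the role of the random probability described above. In a real Scrapy spider the equivalent check would inspect the response (for example, whether it landed on the login page), send a new login request, and re-issue the original request with dont_filter=True so the duplicate filter does not drop the retry.

```python
import random


class FakeSite:
    """Hypothetical stand-in for GitHub: settings pages require a session cookie."""

    def __init__(self, repos):
        self.repos = repos  # repo slug -> repo name

    def login(self, cookies):
        cookies["user_session"] = "valid"  # assumed cookie name

    def get_settings(self, cookies, slug):
        # Without a valid session cookie, the site redirects to the login page.
        if cookies.get("user_session") != "valid":
            return 302, "/login"
        return 200, self.repos[slug]


def crawl(site, slugs, break_prob, rng, max_retries=3):
    """Fetch each repo name, re-logging in whenever the session is lost."""
    cookies = {}
    site.login(cookies)
    names, relogins = {}, 0
    for slug in slugs:
        if rng.random() < break_prob:
            cookies.clear()  # simulate the site dropping our session
        for _ in range(max_retries):
            status, body = site.get_settings(cookies, slug)
            if status == 200:
                names[slug] = body
                break
            relogins += 1  # session is broken: re-establish it and retry
            site.login(cookies)
        else:
            raise RuntimeError(f"could not recover session for {slug}")
    return names, relogins
```

With break_prob=1.0 every settings request first hits the login redirect, so the crawler logs in again once per repository; with break_prob=0.0 it never needs to.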

Source repository: ahmedbilal/scrapy_tutorial