multithreaded_crawler

A compact crawler framework built on a multithreaded model.

Dependencies

At present, the framework depends only on modules from the Python standard library.

Usage

cd threaded_spider
python run.py --help

Running python run.py with no arguments produces a demo: it crawls sina.com.cn using five threads, with the crawl depth limited to 2 by default (tested with Python 2.7).
The threaded_spider directory also contains extra log files named “spider.*.log”, each generated by running python run.py --thread=* with a different thread count.
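
To illustrate the model described above (a fixed pool of worker threads plus a depth limit), here is a minimal sketch that uses only the standard library. It is not the code in threaded_spider: the names crawl and fetch_links and the in-memory LINKS graph are hypothetical, and it is written for Python 3 rather than the Python 2.7 the project was tested with.

```python
import queue
import threading


def crawl(start_url, fetch_links, num_threads=5, max_depth=2):
    """Visit pages with a pool of worker threads, up to max_depth.

    fetch_links(url) is a caller-supplied (hypothetical) function that
    returns the list of URLs found on a page. Returns the set of URLs seen.
    """
    tasks = queue.Queue()
    seen = {start_url}
    seen_lock = threading.Lock()  # protects `seen` across threads
    tasks.put((start_url, 0))

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work
                tasks.task_done()
                return
            url, depth = item
            try:
                if depth < max_depth:  # pages at max_depth are not expanded
                    for link in fetch_links(url):
                        with seen_lock:
                            if link in seen:
                                continue
                            seen.add(link)
                        tasks.put((link, depth + 1))
            except Exception:
                pass  # a real crawler would log the failure here
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    tasks.join()  # returns once every queued page has been processed
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    # Hypothetical link graph standing in for real HTTP fetches.
    LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["d", "a"], "d": ["e"], "e": []}
    print(sorted(crawl("a", lambda url: LINKS.get(url, []))))
```

Passing fetch_links in as a parameter keeps the threading logic separate from networking, so the sketch can be exercised without touching a real site. Because workers run concurrently, a page reachable by several paths is recorded at whichever depth discovers it first.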

Community

QQ Group: 4704309
Contributions are welcome.
