Checks whether the results of an automated crawler belong to hooks, and keeps crawling the URLs it finds.
Name              Latest commit message          Commit time
lib               update KeyboardInterrupt stop  Jun 2, 2017
LICENSE           update                         May 24, 2017
README.md         update readme                  May 24, 2017
hooks.txt         update                         May 24, 2017
main.py           update KeyboardInterrupt stop  Jun 2, 2017
requirements.txt  update                         May 24, 2017

README.md

AutoHookSpider

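Checks whether each result from the automated crawler belongs to hooks, stores it in the database if it does, and keeps crawling the URLs it finds.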
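
The check described above can be sketched roughly as follows. This is an illustration only, not the code in main.py or lib/common.py. It assumes that hooks.txt holds one domain per line and that a URL "belongs to hooks" when its host equals a hook domain or is a subdomain of one. The names `load_hooks`, `matching_hook`, and `crawl_step` are hypothetical.

```python
from urllib.parse import urlparse


def load_hooks(lines):
    """Build a set of lowercase hook domains from an iterable of text lines.

    Assumption: one domain per line, blank lines ignored.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def matching_hook(url, hooks):
    """Return the hook domain that the URL's host belongs to, or None."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host first, then each parent domain (a.b.c, then b.c, then c).
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in hooks:
            return candidate
    return None


def crawl_step(found_urls, hooks, seen):
    """Split newly found URLs into hook matches (to store) and a queue (to crawl next)."""
    matches, queue = [], []
    for url in found_urls:
        if url in seen:
            continue  # already handled in an earlier step
        seen.add(url)
        hook = matching_hook(url, hooks)
        if hook is not None:
            matches.append((url, hook))  # the real tool would write this to MySQL
        queue.append(url)  # every new URL is crawled again, match or not
    return matches, queue
```

In this sketch the caller would insert each entry of `matches` into the table created from lib/record.sql and feed `queue` back into the crawler.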

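AutoHookSpider
├── LICENSE
├── README.md
├── hooks.txt   # hooks dictionary; 200 randomly chosen entries are included, and you can collect your own
├── lib
│   ├── __init__.py
│   ├── common.py   # miscellaneous helper functions
│   └── record.sql  # create this table in MySQL first, then update the database connection in common.py
├── main.py # main program
└── requirements.txt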
  1. sudo pip install -r requirements.txt
  2. Import lib/record.sql into MySQL.
  3. Usage: python main.py {Options} [ google.com,twitter.com,facebook.com | -t 20 ]
  4. Alternatively, run python main.py with no arguments; it then draws (thread_cnt) entry domains directly from hooks.txt.
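
The choice of entry domains in steps 3 and 4 might look roughly like the sketch below. It is not the argument handling in main.py. It assumes the positional argument is a comma-separated domain list, that `-t` sets the thread count, and that the fallback samples randomly from the hooks.txt entries. The function name `pick_seeds` and the default of 10 threads are hypothetical.

```python
import argparse
import random


def pick_seeds(argv, hook_domains, default_threads=10, rng=random):
    """Return the entry domains for a run.

    default_threads=10 is an assumed value, not taken from the project.
    """
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("domains", nargs="?", default=None,
                        help="comma-separated entry domains")
    parser.add_argument("-t", dest="thread_cnt", type=int,
                        default=default_threads, help="number of threads")
    args = parser.parse_args(argv)

    if args.domains:
        # Explicit entry domains given on the command line.
        return [d.strip() for d in args.domains.split(",") if d.strip()]

    # No domains given: draw thread_cnt distinct entries from the hooks list.
    count = min(args.thread_cnt, len(hook_domains))
    return rng.sample(list(hook_domains), count)
```

For example, `pick_seeds(["google.com,twitter.com"], hooks)` would return the two named domains, while `pick_seeds(["-t", "20"], hooks)` would return up to 20 domains drawn from `hooks`.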