Using Scrapy to crawl LinkedIn public profiles of people.
- Get all public profiles
- Using Scrapy
- Enable auto throttle
- Enable naive proxy provisioning
- User-agent rotation
- Support Unicode
- Using MongoDB as the backend
- ...
- Improve speed
- Improve availability
- Add AJAX load support
- More complex proxy provisioning algorithm
- Scrapy == 0.16
- pymongo
- BeautifulSoup, UnicodeDammit
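
The agent rotation and naive proxy items listed above could look roughly like the sketch below. It is not this project's actual middleware: the class name `RotatingAgentProxyMiddleware`, the constructor arguments, and the `FakeRequest` stand-in are hypothetical. What it does rely on is that Scrapy calls `process_request(request, spider)` on downloader middlewares and that its built-in proxy middleware reads `request.meta["proxy"]`.

```python
import random


class RotatingAgentProxyMiddleware(object):
    """Hypothetical downloader-middleware-style class.

    Picks a random User-Agent (and, if any are configured, a random proxy)
    for every outgoing request.
    """

    def __init__(self, user_agents, proxies=None, rng=None):
        if not user_agents:
            raise ValueError("at least one user agent is required")
        self.user_agents = list(user_agents)
        self.proxies = list(proxies or [])
        # An injectable random source keeps the class easy to test.
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite the User-Agent header with a randomly chosen one.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # "Naive" proxy choice: uniform random, no health checks.
        if self.proxies:
            request.meta["proxy"] = self.rng.choice(self.proxies)
        # Returning None lets the request continue through the chain.
        return None


class FakeRequest(object):
    """Stand-in for a Scrapy request, only for trying the class out."""

    def __init__(self):
        self.headers = {}
        self.meta = {}


if __name__ == "__main__":
    mw = RotatingAgentProxyMiddleware(
        user_agents=["agent-one", "agent-two"],
        proxies=["http://127.0.0.1:8080"],
    )
    req = FakeRequest()
    mw.process_request(req, spider=None)
    print(req.headers["User-Agent"], req.meta["proxy"])
```

To use something like this in the crawler it would still need to be registered in `DOWNLOADER_MIDDLEWARES` and given a way to load its lists; that wiring is left out here.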
1. start a MongoDB instance, `mongod`
2. run the crawler, `scrapy crawl LinkedinSpider`
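
As a rough idea of how scraped profiles might end up in the MongoDB instance from step 1, here is a minimal item-pipeline sketch. It is an illustration, not the pipeline shipped with this project: the class name `MongoProfilePipeline`, the database and collection names, and the use of an `_id` field as the key are all assumptions. It also assumes a recent pymongo (3.0 or later) for `replace_one`, which is newer than what a Scrapy 0.16 setup would have used.

```python
class MongoProfilePipeline(object):
    """Hypothetical item pipeline that upserts each profile by its _id."""

    def __init__(self, collection):
        # Any object with a pymongo-style replace_one() method works here.
        self.collection = collection

    @classmethod
    def from_local_mongod(cls, db_name="linkedin", coll_name="profiles"):
        # Imported lazily so the class can be exercised without pymongo.
        import pymongo

        # Assumes mongod is listening on the default host and port.
        client = pymongo.MongoClient("localhost", 27017)
        return cls(client[db_name][coll_name])

    def process_item(self, item, spider):
        doc = dict(item)
        # Upsert: re-crawling a profile replaces the stored document
        # instead of creating a duplicate.
        self.collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        return item
```

Keying on `_id` is only one option; the real project may identify profiles differently.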
You may find the Rakefile useful.
You can change the MongoDB settings and other things in
If you just need some public profiles and do not care which ones, there are better ways to do it. Check out these URLs: [a-z].html
Our strategy is to follow the also-viewed links in public profiles.
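
The link-following strategy amounts to a graph traversal over profiles. The sketch below shows the idea as a plain breadth-first crawl, independent of Scrapy. The function name `crawl_also_viewed`, the `get_also_viewed` callback, and the `max_profiles` cap are hypothetical; in the real spider, Scrapy's scheduler normally takes care of queueing and duplicate filtering.

```python
from collections import deque


def crawl_also_viewed(seed_urls, get_also_viewed, max_profiles=100):
    """Visit profiles breadth-first by following also-viewed links.

    get_also_viewed(url) must return an iterable of profile URLs found
    in the also-viewed section of the given profile.
    """
    seen = set()
    queue = deque()
    for url in seed_urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)

    visited = []
    while queue and len(visited) < max_profiles:
        url = queue.popleft()
        visited.append(url)
        for link in get_also_viewed(url):
            # Skip profiles that are already queued or visited.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # Toy link graph standing in for real profile pages.
    graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": []}
    print(crawl_also_viewed(["a"], lambda u: graph.get(u, [])))
    # prints ['a', 'b', 'c', 'd']
```

A crawl like this only reaches profiles connected to the seeds through also-viewed links, so the choice of seed profiles affects coverage.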