A crawler for Instagram.
A project for crawling Instagram. The crawler logs in to a user account and fetches general information about that account's followers and following, such as:
- Full Name
- Username
- Biography text
- Followers count
- Following count
The dumped data looks like this:
{
"full_name" : "Full Name",
"username" : "username",
"followers" : [ user1_data, user2_data, ...],
"following" : [ user1_data, user2_data, ...]
}
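As a rough illustration of the structure above, here is a minimal Python sketch that assembles one dump record and serializes it to JSON. The helper names make_profile and build_dump are hypothetical, and so are the per-user key names biography, followers_count and following_count: the project lists those fields but does not show their exact keys.

```python
import json


def make_profile(full_name, username, biography="", followers_count=0, following_count=0):
    """Build the per-user record (key names below are assumed, not from the project)."""
    return {
        "full_name": full_name,
        "username": username,
        "biography": biography,              # assumed key name
        "followers_count": followers_count,  # assumed key name
        "following_count": following_count,  # assumed key name
    }


def build_dump(full_name, username, followers, following):
    """Assemble the top-level dump with the four keys shown in the README."""
    return {
        "full_name": full_name,
        "username": username,
        "followers": list(followers),
        "following": list(following),
    }


if __name__ == "__main__":
    user1 = make_profile("User One", "user1", "Hello", 10, 20)
    dump = build_dump("Full Name", "username", [user1], [])
    print(json.dumps(dump, indent=2))
```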
Since Instagram pages are rendered dynamically and its API is sandboxed (limited), Selenium is used to automate the login, click the followers/following links, and extract all the usernames.
Finally, Scrapy is used to crawl all the relevant data for the usernames collected.
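The hand-off between the two stages can be pictured as turning the usernames collected by Selenium into profile URLs for Scrapy to request. The sketch below only assumes the public profile URL pattern https://www.instagram.com/<username>/; the function name profile_urls is hypothetical and is not taken from the project.

```python
from urllib.parse import quote


def profile_urls(usernames):
    """Turn collected usernames into de-duplicated profile URLs, keeping order."""
    seen = set()
    urls = []
    for name in usernames:
        # Tolerate stray whitespace and a leading "@" in the scraped text.
        name = name.strip().lstrip("@")
        if not name or name in seen:
            continue
        seen.add(name)
        urls.append(f"https://www.instagram.com/{quote(name)}/")
    return urls
```

A list like this could then serve as the start URLs of a Scrapy spider; how instaspider actually passes the usernames along is not described here.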
It uses Python 3 along with Scrapy and Selenium. Install the dependencies with:
pip install scrapy
pip install selenium
Run the Scrapy spider with:
scrapy crawl instaspider
The spider, instaspider.py, also requires a path to a JSON file containing the login credentials. The JSON format is:
{
"USERNAME" : "your_user_name",
"PASSWORD" : "your password"
}
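For reference, a file in this format could be read and checked along the following lines. This is a minimal sketch, not the code used by instaspider.py: the function name load_credentials is hypothetical, and only the USERNAME and PASSWORD keys come from the format above.

```python
import json
from pathlib import Path

# The two keys shown in the JSON format above.
REQUIRED_KEYS = ("USERNAME", "PASSWORD")


def load_credentials(path):
    """Read the credentials file and return (username, password).

    Raises ValueError if the file is not a JSON object or a key is missing or empty.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("credentials file must contain a JSON object")
    missing = [key for key in REQUIRED_KEYS if not data.get(key)]
    if missing:
        raise ValueError("missing credential keys: " + ", ".join(missing))
    return data["USERNAME"], data["PASSWORD"]
```

Calling load_credentials with the path to the JSON file returns the username and password as a tuple. Because the file holds a plain-text password, it is worth keeping it out of version control.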