Crawler base script using Python and Selenium. Examples for reddit.com and instagram.com.
crawler
data
docker
.gitignore
LICENSE
README.md
auth.json.example
crawl.py
docker-compose.yml
local.env.example
requirements.txt
util.py

README.md

crawler base

Requirements

  • Python 3.x
  • docker-ce
  • docker-compose
  • VNC client (e.g. vinagre)

Installation

$ pip install -r requirements.txt

Usage

$ docker-compose up -d

To crawl the feed of a logged-in Reddit user:

$ echo '{"username": "YOUR_USERNAME", "password": "YOUR_PASSWORD"}' > auth.json
$ python crawl.py -d

On the host machine, open a VNC client and connect to localhost:5900 with the password "secret".
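
The credentials file created above is plain JSON with a username and a password key. As a minimal sketch of how a script might read and check it (the function name load_auth is hypothetical and is not taken from crawl.py or util.py):

```python
import json
from pathlib import Path


def load_auth(path="auth.json"):
    """Read the credentials file and return (username, password).

    Hypothetical helper for illustration; the real crawl.py may load
    auth.json differently.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Fail early with a clear message if a key is missing or empty.
    for key in ("username", "password"):
        if not data.get(key):
            raise ValueError(f"auth file {path!r} is missing {key!r}")
    return data["username"], data["password"]
```

Checking the keys up front makes a malformed auth.json fail before the browser session starts, rather than partway through the login form.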

Example

  • RedditCrawler: get item links from user's feed of reddit.com
  • InstagramCrawler: get feed photos and captions from user's feed of instagram.com. May not work anymore.
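
As a rough sketch of the idea behind RedditCrawler (collecting item links from a loaded page), the snippet below connects to a Selenium server and gathers unique link targets. It is not the repository's implementation: the function names, the server URL http://localhost:4444/wd/hub, the choice of Chrome, and the a[href] selector are all assumptions, and it presumes Selenium 4.

```python
def connect_remote(url="http://localhost:4444/wd/hub"):
    """Open a browser session on a Selenium server.

    The URL and the use of Chrome are assumptions; check docker-compose.yml
    for the actual port and browser image.
    """
    # Imported lazily so collect_links() can be used without selenium installed.
    from selenium import webdriver

    return webdriver.Remote(command_executor=url, options=webdriver.ChromeOptions())


def collect_links(driver, selector="a[href]"):
    """Return the unique href values on the current page, in page order."""
    seen = {}
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    for element in driver.find_elements("css selector", selector):
        href = element.get_attribute("href")
        if href and href not in seen:
            seen[href] = None
    return list(seen)
```

A typical call would be driver = connect_remote(), then driver.get(...) on the feed page, then collect_links(driver), followed by driver.quit(). A real feed crawler would also need to log in and scroll to load more items, which this sketch leaves out.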