calebalem/pin-crawler

Pinterest-scraper

Tool Description

This tool scrapes images from Pinterest. It runs in 4 stages:

Stage 1 - Board Search

Given a `search term`, the crawler searches for boards matching that term and stores the collected board links in a `sqlite` database, where they are used to collect the pin URLs in the second stage.
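
As a rough illustration of the storage step, the sketch below saves board links in a `sqlite` table using Python's standard `sqlite3` module. The table name `boards`, its columns, the helper `store_board_links`, and the example URLs are all assumptions made for this example; the tool's actual schema may differ.

```python
import sqlite3


def store_board_links(conn, search_term, links):
    """Store board links for a search term, skipping links already present."""
    # Hypothetical schema: one row per board URL, keyed by the URL itself.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS boards ("
        "url TEXT PRIMARY KEY, search_term TEXT NOT NULL)"
    )
    # The connection context manager commits on success, rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO boards (url, search_term) VALUES (?, ?)",
            [(link, search_term) for link in links],
        )


# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
store_board_links(
    conn,
    "bears",
    [
        "https://www.pinterest.com/example_user/bears/",
        "https://www.pinterest.com/other_user/grizzly-bears/",
        "https://www.pinterest.com/example_user/bears/",  # repeated link
    ],
)
stored_boards = [row[0] for row in conn.execute("SELECT url FROM boards ORDER BY url")]
```

Using the URL as the primary key together with `INSERT OR IGNORE` means a board found twice is stored only once.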

Stage 2 - Board Url Scraping

Given the board URLs stored in the database by `stage 1`, the crawler goes through those links, collects the pin links, and stores them in the `sqlite` database, where they are used for scraping and downloading the pin images in `stage 4`.

Stage 3 - Get Unique Pins

Before `stage 4` runs, this stage excludes any duplicated pin URLs so that the fourth and last stage downloads only unique pins.
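
The idea can be sketched in a few lines of Python. This is only an illustration: the helper name `unique_pin_urls` and the example URLs are hypothetical, and the tool itself may deduplicate inside the database rather than in memory.

```python
def unique_pin_urls(pin_urls):
    """Return the pin URLs with duplicates removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(pin_urls))


collected = [
    "https://www.pinterest.com/pin/1/",
    "https://www.pinterest.com/pin/2/",
    "https://www.pinterest.com/pin/1/",  # same pin found on a second board
]
unique = unique_pin_urls(collected)
```

Here only exact string matches count as duplicates; two URLs that differ in a trailing slash or query string would both be kept.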

Stage 4 - Download Images

Given the pin URLs stored by `stage 2`, with the duplicated URLs excluded in `stage 3`, this last stage goes through those pin links and downloads the images inside those pins, then compresses the downloaded images and uploads them to `Mega Upload`.
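
The compression step might look something like the sketch below, which packs a folder of downloaded images into a single zip archive with the standard `zipfile` module. The helper name `compress_images`, the folder layout, and the choice of zip are assumptions for illustration; the upload step is left out because it depends on the tool's own upload code.

```python
import tempfile
import zipfile
from pathlib import Path


def compress_images(image_dir, archive_path):
    """Write every file under image_dir into one zip archive and return its path."""
    image_dir = Path(image_dir)
    archive_path = Path(archive_path)
    # Keep the archive outside image_dir, or it would try to include itself.
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(image_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to image_dir so the archive has no absolute paths.
                archive.write(path, arcname=path.relative_to(image_dir).as_posix())
    return archive_path


# Demonstration with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    images.mkdir()
    (images / "pin_1.jpg").write_bytes(b"placeholder image bytes")
    (images / "pin_2.jpg").write_bytes(b"placeholder image bytes")
    archive = compress_images(images, Path(tmp) / "images.zip")
    with zipfile.ZipFile(archive) as zf:
        archived_names = sorted(zf.namelist())
```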

Requirements

This tool uses the Chrome web driver. Once it is installed, run the following command to install the tool's required dependencies.

pip install -r ./requirements.txt

Example Usages

  • This command will execute all 4 stages, searching for images of bears
python ./PinterestScraper.py --search_term='bears'
  • This command will execute only the 3rd and 4th stages, using the URL links already stored in the sqlite database
python ./PinterestScraper.py --stages_to_execute=[3,4]

CLI Arguments and Options

  • search_term [string] - [optional] - The term used to search for boards. If stage 1 is among the stages to execute, this must be a valid string; otherwise the tool raises an error.

  • stages_to_execute [list[int]] - [optional] - A list of the stage numbers to execute; default is a list containing all 4 stages, [1,2,3,4]

  • maximum_scrape_theads [int] - [optional] - Maximum number of threads used when scraping the pins; default is 2 threads

  • maximum_pin_threads [int] - [optional] - Maximum number of threads used when scraping boards; default is 2 threads

  • use_proxy [bool] - [optional] - Whether to use a proxy server when scraping; default is False

  • board_limit [int] - [optional] - Maximum number of boards to scrape; default is no limit
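
To make the options above concrete, here is a minimal sketch of how they could be parsed with Python's standard `argparse` module. It is a stand-in, not the tool's actual parser: the option names and defaults come from the list above, while the helpers `parse_stage_list`, `parse_bool`, and `parse_args`, the accepted spellings of boolean values, and the error message are assumptions.

```python
import argparse
import json


def parse_stage_list(text):
    """Turn a string such as '[3,4]' into a list of stage numbers between 1 and 4."""
    stages = json.loads(text)  # invalid JSON raises ValueError, which argparse reports
    valid = isinstance(stages, list) and all(
        isinstance(s, int) and not isinstance(s, bool) and 1 <= s <= 4 for s in stages
    )
    if not valid:
        raise argparse.ArgumentTypeError("expected a list of stage numbers, e.g. [3,4]")
    return stages


def parse_bool(text):
    """Accept 'true'/'false' in any letter case (an assumed convention)."""
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    raise argparse.ArgumentTypeError("expected True or False")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Pinterest scraper options (sketch)")
    parser.add_argument("--search_term", type=str, default=None)
    parser.add_argument("--stages_to_execute", type=parse_stage_list, default=[1, 2, 3, 4])
    # Spelled as it appears in the option list above.
    parser.add_argument("--maximum_scrape_theads", type=int, default=2)
    parser.add_argument("--maximum_pin_threads", type=int, default=2)
    parser.add_argument("--use_proxy", type=parse_bool, default=False)
    parser.add_argument("--board_limit", type=int, default=None)  # None means no limit
    args = parser.parse_args(argv)
    # Stage 1 searches for boards, so it cannot run without a search term.
    if 1 in args.stages_to_execute and not args.search_term:
        parser.error("--search_term is required when stage 1 is executed")
    return args


args = parse_args(["--stages_to_execute=[3,4]"])
```

With this sketch, omitting `--search_term` while stage 1 is selected exits with an error, mirroring the behaviour described for `search_term`.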
