PH_miner

A ProductHunt.com miner in Python3.

Installation

Execute the following commands:

$ git clone https://github.com/collab-uniba/PH_miner.git
$ cd PH_miner
$ git submodule init
$ git submodule update

Setup

  1. Register two apps, PH_miner and PH_updater, using the Product Hunt API dashboard.

  2. For the first app, create the file credentials_miner.yml in the project root folder, with the following structure:

api:
  key: CLIENT_KEY
  secret: CLIENT_SECRET
  redirect_uri: APP_REDIRECT_URI
  dev_token: DEVELOPER_TOKEN
  3. For the second app, repeat the previous step with that app's credentials to create the file credentials_updater.yml.
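Before the first run, it can help to check that both credentials files contain the four fields shown above. The following is a minimal sketch, not part of PH_miner: the helper names `missing_credentials` and `load_credentials` are hypothetical, and loading a file assumes PyYAML is installed.

```python
from typing import Any, Dict, List

# Keys expected under the top-level "api" section, as listed in this README.
REQUIRED_API_KEYS = ("key", "secret", "redirect_uri", "dev_token")


def missing_credentials(cfg: Dict[str, Any]) -> List[str]:
    """Return the dotted names of required keys that are absent or empty."""
    api = cfg.get("api") or {}
    return [f"api.{k}" for k in REQUIRED_API_KEYS if not api.get(k)]


def load_credentials(path: str) -> Dict[str, Any]:
    """Parse a credentials YAML file (hypothetical helper; needs PyYAML)."""
    # Imported lazily so missing_credentials() works even without PyYAML.
    import yaml

    with open(path, encoding="utf-8") as fh:
        return yaml.safe_load(fh) or {}
```

For example, `missing_credentials(load_credentials("credentials_updater.yml"))` returns an empty list when the file is complete, and otherwise names the fields still to be filled in.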

  4. Create the folder db/cfg/ and, inside it, the file dbsetup.yml, which configures the connection to the MySQL database:

mysql:
    host: 127.0.0.1
    user: root
    passwd: DB_PASSWORD
    db: producthunt
    recycle: 3600

NOTE: If you're using MySQL, the default pool_recycle interval for resetting database connections (recycle: 3600 seconds above) is fine, since wait_timeout defaults to 28800 seconds. If you're using MariaDB, however, wait_timeout is set to 600 seconds by default: edit the my.cnf file and raise wait_timeout to any value larger than the one chosen for pool_recycle.
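The settings in dbsetup.yml map onto a database URL plus a pool_recycle argument. The sketch below assumes PH_miner connects through SQLAlchemy with the PyMySQL driver (the `mysql+pymysql` URL scheme is an assumption); `build_engine_args` and `check_recycle` are hypothetical helpers, the latter encoding the rule from the note: pool_recycle must stay below the server's wait_timeout.

```python
from typing import Any, Dict, Tuple
from urllib.parse import quote_plus


def build_engine_args(cfg: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (url, kwargs) for sqlalchemy.create_engine(url, **kwargs)."""
    m = cfg["mysql"]
    # Percent-escape credentials so characters such as '@' or '/' cannot break the URL.
    url = "mysql+pymysql://{user}:{pw}@{host}/{db}".format(
        user=quote_plus(str(m["user"])),
        pw=quote_plus(str(m["passwd"])),
        host=m["host"],
        db=m["db"],
    )
    # Assumption: 'recycle' is the pool_recycle interval, in seconds.
    return url, {"pool_recycle": int(m.get("recycle", 3600))}


def check_recycle(pool_recycle: int, wait_timeout: int) -> None:
    """Raise if the server would drop idle connections before the pool recycles them."""
    if pool_recycle >= wait_timeout:
        raise ValueError(
            f"pool_recycle={pool_recycle}s must be smaller than "
            f"wait_timeout={wait_timeout}s; raise wait_timeout in my.cnf"
        )
```

With SQLAlchemy installed, the result can be passed as `create_engine(url, **kwargs)`. The server's current value can be read with `SHOW VARIABLES LIKE 'wait_timeout';` and handed to `check_recycle`.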

  5. Install the required packages via pip:
$ pip install -r requirements.txt
  6. Enable scheduled execution via crontab:
$ crontab -e

Add the following lines, replacing the /path/.../to/ placeholder with the actual location of your PH_miner clone.

SHELL=/bin/bash
# New products are uploaded at 12:01 a.m. PST (just past midnight, i.e. 9:01 a.m. CET):
# minute hour day-of-month month day-of-week command
    35     8       *          *       *       /path/.../to/PH_miner/cronjob.sh >> /var/log/ph_miner.log 2>&1
    05    20       *          *       *       /path/.../to/PH_miner/cronjob.sh --update -c credentials_updater.yml >> /var/log/ph_miner_updates.log 2>&1
    */30   *       *          *       *       /path/.../to/PH_miner/cronjob.sh --newest -c credentials_updater.yml >> /var/log/ph_miner.log 2>&1
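The time conversion in the crontab comment can be checked with Python's standard `datetime` module. This sketch uses fixed offsets (PST = UTC-8, CET = UTC+1) and therefore ignores daylight saving time; `launch_in` is a hypothetical helper.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets only: PDT (UTC-7) and CEST (UTC+2) are not modelled.
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")


def launch_in(tz: timezone, day: date) -> datetime:
    """Return the 12:01 a.m. PST launch time of `day`, expressed in zone `tz`."""
    launch = datetime(day.year, day.month, day.day, 0, 1, tzinfo=PST)
    return launch.astimezone(tz)


for zone in (PST, timezone.utc, CET):
    # prints 00:01 PST, then 08:01 UTC, then 09:01 CET
    print(launch_in(zone, date(2018, 1, 15)).strftime("%H:%M %Z"))
```

Keep in mind that cron reads the schedule fields in the server's local time zone.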
  7. Enable rotation of the log files:
$ sudo ln -s /fullpath/to/../ph_miner.logrotate /etc/logrotate.d/ph_miner 
  8. Install the Chromium browser and ChromeDriver:

This step depends on the OS. On Ubuntu boxes, run:

$ sudo apt-get install chromium-browser chromium-chromedriver
$ sudo ln -s /usr/lib/chromium-browser/chromedriver /usr/bin/chromedriver
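To confirm this step worked, one can check that both executables are on the PATH and that Selenium can drive a headless Chromium. This is a minimal sketch, not part of PH_miner: `find_tools` and `page_title` are hypothetical names, and `page_title` assumes the Selenium 4 API (`Service` plus `options=`) and the /usr/bin/chromedriver symlink created above.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def page_title(url: str, driver_path: str = "/usr/bin/chromedriver") -> str:
    """Open `url` in headless Chromium through Selenium and return the page title."""
    # Imported lazily so find_tools() works even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    opts = Options()
    opts.add_argument("--headless")
    driver = webdriver.Chrome(service=Service(driver_path), options=opts)
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always shut down the browser and driver processes
```

On a correctly set-up machine, `find_tools(["chromium-browser", "chromedriver"])` maps both names to paths, and `page_title("https://www.producthunt.com")` returns the site's title.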

Resources & Libraries

  • Product Hunt API
  • ph_py - ProductHunt.com API wrapper in Python
  • Scrapy - A scraping and web-crawling framework
  • Selenium - A suite of tools for automating web browsers
  • ChromeDriver - WebDriver server that lets Selenium control the Chromium web browser
  • Beautiful Soup 4 - HTML parser

License

The project is licensed under the MIT license.