A web crawler/scraper that finds broken links starting from a target seed URL, based on keywords matched around those broken links.


Rotto-Links-Scraper

A web crawler/scraper that finds broken links starting from a target seed URL, based on keywords matched in the pages that contain those broken links.
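The idea described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual implementation: the names `LinkExtractor`, `http_status`, and `find_broken_links` are hypothetical, and treating any status of 400 or above (or an unreachable host) as "broken" is an assumption.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it is unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        return err.code
    except (URLError, ValueError, OSError):
        return None


def find_broken_links(page_url, html, keywords, get_status=http_status):
    """Return broken links on a page, but only if the page matches a keyword."""
    text = html.lower()
    if not any(keyword.lower() in text for keyword in keywords):
        return []  # page is not relevant to the requested keywords
    parser = LinkExtractor()
    parser.feed(html)
    broken = []
    for href in parser.links:
        absolute = urljoin(page_url, href)  # resolve relative links
        status = get_status(absolute)
        # Assumption: unreachable or 4xx/5xx responses count as broken.
        if status is None or status >= 400:
            broken.append(absolute)
    return broken
```

Passing `get_status` as a parameter keeps the sketch testable without network access; a real crawler would also need to follow links from the seed URL and avoid revisiting pages.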

##Installation

  1. Redis
  2. Fabric
  3. Python 2.7+

##Instructions

  1. Install all dependencies listed in requirements.txt using pip:
    $ pip install -r requirements.txt
  2. Set the DATABASE_PATH environment variable in your shell config file (e.g. .bashrc or .zshrc):
    # your shell config file
    export DATABASE_PATH='/path/to/database/'
  3. Set the two environment variables required by the SMTP server, which is used to send email to users:
    # your shell config file
    export SMTP_USER='smtp-username'
    export SMTP_PASSWORD='smtp-password'
  4. Set the LOGS_DIR environment variable to the directory where the app should save its logs:
    # your shell config file
    export LOGS_DIR='path/to/logs'
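
To check that the environment variables above are set before starting the app, something like the following sketch could be used. The function name `load_settings` and the choice to fail with a `RuntimeError` are assumptions for illustration; the project may read these variables differently.

```python
import os

# Variable names taken from the instructions above.
REQUIRED_VARS = ("DATABASE_PATH", "SMTP_USER", "SMTP_PASSWORD", "LOGS_DIR")


def load_settings(environ=None):
    """Return the required settings as a dict, or raise if any are missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` with no arguments reads from the current process environment, so the shell config file must be sourced (or a new shell opened) after editing it.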

##Commands

Note: Install Fabric first to run the commands below.

To run the GUI app:

    $ fab app

To run a dispatcher:

    $ fab dispatcher

To run a worker:

    $ fab worker

##Developers

  1. Akshay Pratap Singh
  2. Sunny Gupta