BoroDevMeetup/webcrawler-ml-2018


Overview

This is an open-source project that grew out of the BoroDev Meetup and will continue at future meetups (Hack Night).


Here are a few TO-DOs before we get started:

  [0] Write up goal for projects
  [1] Write up Features list
  [2] Figure out some form of MVP
  [3] Steps to get to milestone
  [4] Extra features not included in MVP

https://www.meetup.com/BoroDev/

Pre-Reqs

  1. Clone the repo
  2. Install the latest version of Python 3 (from the Python website or with brew install)
  3. Follow the steps here to install pipenv

To Test

  1. cd into the repo
  2. Unzip test-site.zip
  3. Set up a local web server using:
    python3 -m http.server
  4. Install the dependencies:
    pipenv install
  5. Run with the pre-configured settings:
    pipenv run python3 crawler.py
    or
    pipenv run python3 . <url-entry-point> --max_pages <number-of-pages> --restrict_domain <bool>
    • where url-entry-point is a URL (e.g. http://localhost:8000/test-site/a.html), --max_pages is an optional int, and --restrict_domain is an optional bool
    • example: pipenv run python3 . http://www.google.com --max_pages 3 --restrict_domain True
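The command-line options described above could be parsed roughly as follows. This is a minimal sketch, not the code in crawler.py: the helper names (str_to_bool, build_parser, same_domain) and the default values are assumptions made for illustration, and only the argument names url-entry-point, --max_pages, and --restrict_domain come from the usage line above.

```python
import argparse
from urllib.parse import urlparse


def str_to_bool(value: str) -> bool:
    """Convert 'True'/'False'-style strings to a bool.

    argparse's type=bool would treat any non-empty string, including
    'False', as True, so an explicit converter is used here.
    """
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented usage (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Crawl pages starting from a URL.")
    parser.add_argument("url", help="entry-point URL to start crawling from")
    parser.add_argument("--max_pages", type=int, default=None,
                        help="optional cap on the number of pages to visit")
    parser.add_argument("--restrict_domain", type=str_to_bool, default=False,
                        help="optional flag: only follow links on the entry URL's domain")
    return parser


def same_domain(entry_url: str, candidate_url: str) -> bool:
    """Hypothetical check for --restrict_domain: compare host[:port] of both URLs."""
    return urlparse(entry_url).netloc == urlparse(candidate_url).netloc


# Parse an explicit argument list equivalent to the documented invocation.
args = build_parser().parse_args(
    ["http://localhost:8000/test-site/a.html", "--max_pages", "3", "--restrict_domain", "True"]
)
print(args.url, args.max_pages, args.restrict_domain)
```

Passing an explicit list to parse_args keeps the sketch runnable without a shell. A real entry point would call parse_args() with no arguments so that it reads sys.argv.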
