omni-scrape

This is the repository for omni_scrape, a new Ruby gem (your all-purpose web scraper).

Description

This gem is meant to be an all-purpose web crawler and scraper. My current development focus is as follows (completed items are marked -done):

Successfully follow links from the first page and scrape them. -done
Store scraped information as HTML documents. -done
Handle bad links when scraping. -done
Allow a partial URL to be passed. -done
Handle both internal and external links. -done
Scrape the main page, rewrite its links to point to the generated HTML, then store it as HTML. -done
Create a file structure for storing the HTML. -done
Provide a recursive, depth-limited implementation for more targeted crawling. -done
Link each page to the local HTML documents as they are scraped and stored. -done
Manage duplication of stored documents. -done
Provide methods for explicitly internal and explicitly external links. -done
Provide a method for scraping only the initial page. -done (just pass a depth of 0)
Provide a method for passing a CSS selector to apply to all pages. -done
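The link-handling items above (bad links, partial URLs, internal versus external links) can be illustrated with Ruby's standard `uri` library. This is a minimal sketch, not omni_scrape's own code: `resolve_link` and `internal_link?` are hypothetical names, and it assumes that a "partial URL" means a relative href such as `page2.html` and that "internal" means "same host as the starting page".

```ruby
require "uri"

# Hypothetical helper (not omni_scrape's API): turns an href found on
# `page_url` into an absolute http(s) URL, or returns nil for a bad link.
def resolve_link(page_url, href)
  return nil if href.nil? || href.strip.empty?

  absolute = URI.join(page_url, href.strip) # resolves relative ("partial") hrefs
  return nil unless %w[http https].include?(absolute.scheme)

  absolute.fragment = nil # "#section" anchors refer to the same document
  absolute.to_s
rescue URI::Error
  nil # malformed hrefs are skipped instead of crashing the crawl
end

# Hypothetical helper: a link counts as internal when it shares the
# starting page's host; everything else is treated as external.
def internal_link?(start_url, link_url)
  URI(start_url).host == URI(link_url).host
end
```

For example, `resolve_link("http://example.com/docs/index.html", "page2.html")` yields `"http://example.com/docs/page2.html"`, while a `mailto:` link or a malformed href yields `nil` and can simply be skipped.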
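Depth-limited crawling with duplicate handling, as described in the list above, can be sketched as follows. This is a hypothetical illustration rather than the gem's implementation: it swaps recursion for a breadth-first loop, and `crawl` and its `fetch_links` callable are assumed names. `fetch_links` stands in for downloading a page and returning the absolute URLs it links to. A depth of 0 returns only the starting page.

```ruby
require "set"

# Hypothetical sketch: breadth-first crawl limited to `max_depth` link hops.
# `fetch_links` is any callable mapping a URL to the absolute URLs it links to.
# Returns the URLs in the order they were discovered.
def crawl(start_url, max_depth, fetch_links)
  visited = Set.new([start_url]) # every URL already queued for scraping
  order = [start_url]
  frontier = [start_url]         # pages at the current depth

  max_depth.times do
    next_frontier = []
    frontier.each do |url|
      fetch_links.call(url).each do |link|
        # Set#add? returns nil when the URL was already present, so
        # duplicate pages are never fetched or stored twice.
        next unless visited.add?(link)

        order << link
        next_frontier << link
      end
    end
    frontier = next_frontier
  end

  order
end
```

With a toy link graph such as `{ "a" => ["b", "c"], "b" => ["a", "d"], "c" => ["d"] }`, `crawl("a", 1, ->(u) { graph.fetch(u, []) })` returns `["a", "b", "c"]`. Breadth-first order means each page is reached at its shallowest depth, which a naive recursive version with a shared visited set does not guarantee.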
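Storing pages in a file structure and pointing links at the local copies, as the list above describes, might look like the sketch below. It is an assumption-laden illustration, not omni_scrape's API: `local_path_for` and `rewrite_links` are hypothetical, pages are assumed to be saved under one folder per host, and a regular expression stands in for a proper HTML parser to keep the example dependency-free.

```ruby
require "uri"
require "pathname"

# Hypothetical helper: maps an absolute URL to a file path under one folder
# per host, e.g. "http://example.com/docs/" -> "example.com/docs/index.html".
# Query strings are ignored, so "?page=1" and "?page=2" would collide.
def local_path_for(url)
  uri = URI(url)
  path = uri.path.to_s
  path = "/" if path.empty?
  path += "index.html" if path.end_with?("/")
  path += ".html" unless File.extname(path) == ".html"
  File.join(uri.host, path)
end

# Hypothetical helper: rewrites each href="..." whose absolute target is in
# `saved_urls` so it points at the local copy, relative to where the page
# at `page_url` is itself stored. Other links are left untouched.
def rewrite_links(html, page_url, saved_urls)
  page_dir = Pathname(local_path_for(page_url)).dirname

  html.gsub(/href="([^"]*)"/) do
    original = Regexp.last_match(0)
    href = Regexp.last_match(1)
    target = begin
      URI.join(page_url, href).to_s
    rescue URI::Error
      nil
    end

    if target && saved_urls.include?(target)
      relative = Pathname(local_path_for(target)).relative_path_from(page_dir)
      %(href="#{relative}")
    else
      original
    end
  end
end
```

Because `local_path_for` is deterministic, the same URL always maps to the same file, which also gives a simple way to detect pages that have already been stored.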

More coming soon!
