This is the repository for omni_scrape, a new Ruby gem (your all-purpose web scraper).
This gem is meant to be an all-purpose web crawler and scraper. My current development focus is as follows.
-
Follow links from the first page and scrape the linked pages. -done
-
Store scraped information as HTML documents. -done
-
Handle bad links when scraping. -done
-
Allow a partial URL to be passed. -done
-
Handle both internal and external links. -done
-
Scrape the main page, rewrite its links to point to the generated HTML files, then store it as HTML. -done
-
Create a file structure for storing the HTML. -done
-
Provide a recursive depth setting for more targeted crawling. -done
-
Link each page to the local HTML documents as they are scraped and stored. -done
-
Manage duplication of stored documents. -done
-
Provide methods for internal-only and external-only links. -done
-
Provide a method for scraping only the initial page. -done (just pass a depth of 0)
-
Provide a method for passing a CSS selector for all pages. -done
MORE COMING SOON! ...
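
To make a few of the items above concrete (partial URLs, bad links, internal vs. external links, crawl depth, and duplicate handling), here is a minimal sketch using only the Ruby standard library. It is not omni_scrape's actual code or API: the helper names `resolve_link`, `internal_link?`, and `crawl` are hypothetical, the `link_graph` hash is an assumed stand-in for fetching real pages, and the crawl uses a simple breadth-first loop rather than recursion.

```ruby
require 'uri'
require 'set'

# Hypothetical helper (not part of omni_scrape): turn a possibly partial
# href into an absolute URL, or return nil when the link is malformed.
def resolve_link(base_url, href)
  URI.join(base_url, href).to_s
rescue URI::Error
  nil
end

# Hypothetical helper: treat a link as internal when its host matches
# the host of the base URL.
def internal_link?(base_url, url)
  URI.parse(url).host == URI.parse(base_url).host
end

# Depth-limited crawl over an in-memory link graph (an assumed stand-in
# for real HTTP fetching). A depth of 0 visits only the start page, and
# the Set keeps any page from being visited twice.
def crawl(start_url, depth, link_graph)
  visited = Set[start_url]
  frontier = [start_url]

  depth.times do
    next_frontier = []
    frontier.each do |page|
      link_graph.fetch(page, []).each do |href|
        url = resolve_link(page, href)
        next if url.nil? || visited.include?(url) # skip bad or duplicate links
        visited << url
        next_frontier << url
      end
    end
    frontier = next_frontier
  end

  visited
end
```

With a made-up graph such as `{ 'http://example.com/' => ['/about', 'ht tp://bad link'] }`, calling `crawl('http://example.com/', 1, graph)` would visit the start page and the resolved `/about` page while skipping the malformed link; passing a depth of 0 would visit only the start page.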