bmaynard1991/omni-scrape

omni-scrape

This is the repository for a new Ruby gem called omni_scrape (your all-purpose web scraper).

description

This gem is meant to be an all-purpose web crawler and scraper. Development is currently focused on the following goals:

  1. Follow links from the first page and scrape the linked pages. -done

  2. Store scraped pages as HTML documents. -done

  3. Handle bad links when scraping. -done

  4. Allow a partial URL to be passed. -done

  5. Handle both internal and external links. -done

  6. Scrape the main page, rewrite its links to point to the generated HTML, then store it as HTML. -done

  7. Create a file structure for storing the HTML. -done

  8. Provide a recursive depth setting for more targeted crawling. -done

  9. Link each page to the local HTML documents as they are scraped and stored. -done

  10. Manage duplication of stored documents. -done

  11. Provide methods for internal-only and external-only links. -done

  12. Provide a method for scraping only the initial page. -done (just pass 0)

  13. Provide a method for passing a CSS accessor for all pages. -done
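
Several of the goals above (bad links, partial URLs, internal vs. external links, crawl depth, duplicate handling) can be illustrated with a short sketch. This is not the gem's actual API: `resolve_link`, `internal?`, `crawl`, the `scope:` option, and the `PAGES` data are all hypothetical names made up for this example, and treating the `0` in item 12 as a crawl depth is an assumption. The sketch walks the site level by level instead of recursively, and it takes the link extraction as a block so that it runs without network access.

```ruby
require "uri"
require "set"

# Hypothetical helpers for illustration only; not part of omni_scrape.

# Resolve a possibly partial link (e.g. "/about") against the page it was
# found on. Returns nil for links that cannot be parsed (bad links).
def resolve_link(page_url, href)
  URI.join(page_url, href).to_s
rescue URI::Error
  nil
end

# A link counts as internal when its host matches the start page's host.
def internal?(start_url, url)
  URI(url).host == URI(start_url).host
end

# Depth-limited crawl. The block receives a page URL and returns the hrefs
# found on that page. Depth 0 visits only the start page.
# scope is :all, :internal, or :external.
def crawl(start_url, depth, scope: :all, &fetch_links)
  visited = Set[start_url]            # guards against duplicate visits
  frontier = [start_url]
  depth.times do
    frontier = frontier.flat_map do |page|
      fetch_links.call(page).filter_map do |href|
        url = resolve_link(page, href)
        next if url.nil?                                        # skip bad links
        next if scope == :internal && !internal?(start_url, url)
        next if scope == :external && internal?(start_url, url)
        url if visited.add?(url)      # add? returns nil for URLs already seen
      end
    end
  end
  visited.to_a
end

# In-memory stand-in for real pages (made-up data).
PAGES = {
  "http://example.com/"      => ["/about", "http://other.org/x", "bad link", "/about"],
  "http://example.com/about" => ["/team"]
}.freeze

puts crawl("http://example.com/", 2, scope: :internal) { |url| PAGES.fetch(url, []) }
```

In a real scraper the block would fetch the page and extract its `href` attributes; keeping that step outside `crawl` makes the depth and duplicate logic easy to test on its own.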

MORE COMING SOON! ...
