AlexanderAntov/scraper-js

Scraper API

Scraper API is a modular web scraper that analyzes the data it collects. The API is deployed on Heroku. It is consumed by the Scraper Android and Scraper web applications, which are thin UI wrappers around the API and make no further modifications to the data.

Data is currently collected from the CNN, BBC, Google News, Reuters, TechCrunch, Medium and The Morning Brew RSS feeds, the New York Times, The Verge, TechRadar, the Open Weather API, AirTube, and two public utility services that are handled by plain HTML scraping. The scraped data is analyzed with TF-IDF over n-grams (with small weight modifications), with the RAKE algorithm for larger blocks of text that share the same context, and with the Flesch–Kincaid readability tests. Whole news pieces are summarized with TextRank.
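To illustrate the keyword scoring mentioned above, here is a minimal sketch of plain TF-IDF over word n-grams. It is not taken from the project's code: the function names (`tokenize`, `ngrams`, `tfidf`, `topKeywords`), the tokenization rule, and the choice of unigrams plus bigrams are all assumptions made for the example, and the "small weight modifications" are left out because they are not described here.

```javascript
// Illustrative TF-IDF over word n-grams. All names are hypothetical
// and are not part of the scraper-js codebase.

// Lowercase the text and split it into alphanumeric word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build every contiguous n-gram for n = 1..maxN, joined by spaces.
function ngrams(tokens, maxN) {
  const grams = [];
  for (let n = 1; n <= maxN; n++) {
    for (let i = 0; i + n <= tokens.length; i++) {
      grams.push(tokens.slice(i, i + n).join(" "));
    }
  }
  return grams;
}

// Score each n-gram of each document with tf * idf, where
//   tf  = occurrences of the n-gram / total n-grams in the document
//   idf = ln(number of documents / documents containing the n-gram)
function tfidf(documents, maxN = 2) {
  const docGrams = documents.map((d) => ngrams(tokenize(d), maxN));

  // Document frequency: count each n-gram once per document.
  const docFreq = new Map();
  for (const grams of docGrams) {
    for (const g of new Set(grams)) {
      docFreq.set(g, (docFreq.get(g) || 0) + 1);
    }
  }

  return docGrams.map((grams) => {
    const counts = new Map();
    for (const g of grams) counts.set(g, (counts.get(g) || 0) + 1);

    const scores = new Map();
    for (const [g, c] of counts) {
      const tf = c / grams.length;
      const idf = Math.log(documents.length / docFreq.get(g));
      scores.set(g, tf * idf);
    }
    return scores;
  });
}

// Return the k highest-scoring n-grams of one document.
function topKeywords(scores, k) {
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([gram]) => gram);
}

// Example usage with made-up headlines.
const headlines = [
  "Storm warning issued for the coast",
  "Storm damage closes coast road",
  "Markets rally after rate decision",
];
const scores = tfidf(headlines);
console.log(topKeywords(scores[2], 3));
```

With this unsmoothed formula, an n-gram that appears in every document gets an IDF of zero and so never ranks as a keyword. A smoothed IDF, or per-source weighting, would be one place where weight modifications could be applied.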
