is a modular web scraper that analyses the data it collects. The API is deployed on Heroku. It is used by the Scraper Android and Scraper web applications, which are UI wrappers around the API and do not modify the data further.
Data is currently collected from the CNN, BBC, Google News, Reuters, TechCrunch, Medium, and The Morning Brew RSS feeds, the New York Times, The Verge, TechRadar, the Open Weather API, AirTube, and two public utility services handled by clean HTML scraping. The scraped data is analysed with TF-IDF over n-grams (with small weight modifications), with the RAKE algorithm for bigger blocks of text that share the same context, and with Flesch–Kincaid readability scoring. Whole news pieces are summarised with TextRank.
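
To make two of the analysis steps above more concrete, here is a minimal sketch of plain TF-IDF over n-grams and of the Flesch–Kincaid grade level, written with the Python standard library only. It is an illustration, not the project's actual code: the function names (`tokenize`, `ngrams`, `tfidf`, `flesch_kincaid_grade`) are hypothetical, the "small weight modifications" are left out because they are not described here, the syllable counter is a rough vowel-group heuristic, and RAKE and TextRank are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams of length 1..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(documents, n_max=2):
    """Score each n-gram of each document with plain (unweighted) TF-IDF."""
    grams_per_doc = [ngrams(tokenize(doc), n_max) for doc in documents]

    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for grams in grams_per_doc:
        doc_freq.update(set(grams))

    total_docs = len(documents)
    scores = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        scores.append({
            # Term frequency times inverse document frequency.
            gram: (count / len(grams)) * math.log(total_docs / doc_freq[gram])
            for gram, count in counts.items()
        })
    return scores


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level; returns 0.0 for text without words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(word) for word in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    headlines = ["the cat sat on the mat", "the dog sat on the log"]
    for doc_scores in tfidf(headlines):
        # Print the three highest-scoring n-grams of each headline.
        print(sorted(doc_scores, key=doc_scores.get, reverse=True)[:3])
    print(flesch_kincaid_grade("The cat sat. The dog barked loudly."))
```

N-grams that occur in every document get a score of zero, so the ranking favours phrases that distinguish one article from the others. A production pipeline would likely also filter stop words and use a dictionary-based syllable counter.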