Scraper API

Scraper API is a modular web scraper that analyses the data it collects. The API is deployed on Heroku and is consumed by the Scraper Android and Scraper web applications. Both are UI wrappers around the API and do not modify the data further.

Data is currently collected from the CNN, BBC, Google News, Reuters, TechCrunch, Medium and The Morning Brew RSS feeds, The New York Times, The Verge, TechRadar, the OpenWeather API, AirTube, and two public utility services that are handled by plain HTML scraping.

The scraped data is analysed with tf-idf over n-grams (with small weight modifications), with the RAKE algorithm for larger blocks of text that share the same context, and with the Flesch–Kincaid readability tests. Whole news pieces are summarised with TextRank.
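To make the keyword-scoring step concrete, here is a minimal sketch of plain tf-idf over unigrams and bigrams. It is not taken from the scraper-js source: the function names (`tokenize`, `ngrams`, `tfidf`, `topKeywords`) are hypothetical, the tokenizer is a simple ASCII one, and the "small weight modifications" mentioned above are not reproduced because the README does not describe them.

```typescript
// Illustrative sketch only; all names here are hypothetical,
// not the identifiers used in the scraper-js code base.

type Scores = Record<string, number>;

/** Lowercase a text and split it into ASCII alphanumeric word tokens. */
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

/** Build all n-grams of sizes 1..maxN, each joined by a single space. */
function ngrams(tokens: string[], maxN: number): string[] {
  const grams: string[] = [];
  for (let n = 1; n <= maxN; n++) {
    for (let i = 0; i + n <= tokens.length; i++) {
      grams.push(tokens.slice(i, i + n).join(" "));
    }
  }
  return grams;
}

/** Score every n-gram of every document with unmodified tf-idf. */
function tfidf(docs: string[], maxN: number = 2): Scores[] {
  const perDoc = docs.map((d) => ngrams(tokenize(d), maxN));

  // Document frequency: in how many documents each n-gram occurs.
  // Object.create(null) avoids clashes with keys such as "constructor".
  const df: Scores = Object.create(null);
  for (const grams of perDoc) {
    const seen: Record<string, boolean> = Object.create(null);
    for (const g of grams) {
      if (!seen[g]) {
        seen[g] = true;
        df[g] = (df[g] ?? 0) + 1;
      }
    }
  }

  return perDoc.map((grams) => {
    const counts: Scores = Object.create(null);
    for (const g of grams) counts[g] = (counts[g] ?? 0) + 1;

    const scores: Scores = Object.create(null);
    for (const g of Object.keys(counts)) {
      const tf = counts[g] / grams.length;         // relative frequency in this document
      const idf = Math.log(docs.length / df[g]);   // 0 when the n-gram is in every document
      scores[g] = tf * idf;
    }
    return scores;
  });
}

/** Return the k highest-scoring n-grams of one document. */
function topKeywords(scores: Scores, k: number): string[] {
  return Object.keys(scores)
    .sort((a, b) => scores[b] - scores[a])
    .slice(0, k);
}

// Usage: n-grams shared by every document score 0 and drop out of the ranking.
const sampleDocs = ["the cat sat on the mat", "the dog sat on the log"];
const sampleScores = tfidf(sampleDocs, 2);
console.log(topKeywords(sampleScores[0], 5));
```

Treating bigrams as terms alongside single words lets a multi-word phrase rank as a keyword in its own right. A production version would likely also remove stop words and handle non-ASCII text, both of which this sketch leaves out.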

AlexanderAntov/scraper-js