Trawler

A job scheduler and analysis tool for webscraping (and other) tasks.

Datasources

Currently the following datasources are implemented:

tiktok: get video metadata per hashtag, download the videos and analyse their text using EasyOCR
gab (nazi-twitter): crawl posts for a user
onionlist: download the Tor catalogue from onionlist.org
google dorking: find interesting files and download them
facebook posts and reactions: scrape Facebook posts, comments and reactions (like, heart, etc.)

Can be distributed (workers and c&c in different locations or on different servers)
Jobs are managed through JSON files (and can be distributed with an adapter like PouchDB)
Multithreaded
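
To illustrate what managing jobs through JSON files could look like, here is a minimal Node.js sketch. It is not Trawler's actual implementation: the job fields (id, datasource, params, status) and the helper names writeJob, readJobs and claimNextJob are hypothetical and chosen only for this example. It also makes no attempt at locking, so two workers sharing one directory could claim the same job.

```javascript
// Hypothetical sketch: the job schema and helper names below are assumptions,
// not Trawler's real format or API.
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write one job as a pretty-printed JSON file named after its id.
function writeJob(dir, job) {
  fs.writeFileSync(path.join(dir, `${job.id}.json`), JSON.stringify(job, null, 2));
}

// Read every *.json file in the directory, in file name order.
function readJobs(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json"))
    .sort()
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), "utf8")));
}

// Mark the first pending job as running and write it back.
// Returns the claimed job, or null when nothing is pending.
function claimNextJob(dir) {
  const job = readJobs(dir).find((j) => j.status === "pending");
  if (!job) return null;
  job.status = "running";
  writeJob(dir, job);
  return job;
}

// Demo: queue two jobs in a temporary directory and claim the first one.
const jobsDir = fs.mkdtempSync(path.join(os.tmpdir(), "trawler-jobs-"));
writeJob(jobsDir, { id: "job-001", datasource: "tiktok", params: { hashtag: "example" }, status: "pending" });
writeJob(jobsDir, { id: "job-002", datasource: "gab", params: { user: "example" }, status: "pending" });

const claimed = claimNextJob(jobsDir);
console.log(claimed.id, claimed.status);
```

Keeping one file per job means a sync adapter only has to replicate small documents, which is one plausible reason a document store such as PouchDB fits this layout.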

Install using docker-compose by running:

docker-compose up
