Squirrel - Crawler of linked data

Introduction

Squirrel is a crawler for the linked web. It provides several tools to search and collect data from the heterogeneous content of the linked web.


Clone the repository in a directory of your choice with:

git clone https://github.com/dice-group/Squirrel

Change into the Squirrel directory and start the RabbitMQ and MongoDB containers:

docker-compose up -d mongodb rabbit

Set up your seeds in the file seed/seeds.txt and start the frontier and one worker instance with:

docker-compose up frontier worker1
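
The seed step above can be scripted. The following is a minimal sketch, not part of the Squirrel repository: the function name write_seeds is hypothetical, and it assumes that seed/seeds.txt expects one seed URI per line, which the text above does not confirm.

```shell
# Hypothetical helper, not part of Squirrel.
# Assumption: the seed file holds one seed URI per line.

# write_seeds FILE URI [URI...]
# Overwrites FILE with the given URIs, one per line.
write_seeds() {
  seeds_file="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "write_seeds: at least one seed URI is required" >&2
    return 1
  fi
  # Create the target directory if it does not exist yet.
  mkdir -p "$(dirname "$seeds_file")"
  # printf repeats the format once per argument, giving one URI per line.
  printf '%s\n' "$@" > "$seeds_file"
}
```

From inside the Squirrel directory, a call such as write_seeds seed/seeds.txt https://example.org/resource/1 (a placeholder URI) would be followed by the two docker-compose commands shown above.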

Go to https://master.project-hobbit.eu/
Register an account or log in to an existing one
Go to "Benchmarks"
Select "ORCA" in the Benchmark list
Select the system and set the parameters as listed below (the values can also be found by following the links in the paper):

Parameter	Effectiveness	Efficiency
Average crawl delay	0	0
Average node degree	20	20
Average ratio of disallowed resources	0	0
Average resource degree	9	9
Disallowed resources	0	0
Dump file compression ratio	0.3	0
Node size definition	Static	Static
Number of nodes	100	200
RDF dataset size	1000	1000
Seed	20200318	20200318
Use N3 dumps	true	true
Use NT dumps	true	true
Use RDF/XML dumps	true	true
Use TTL dumps	true	true
Weight of CKAN node occurrence	5	0
Weight of dereferencing HTTP node occurrence	21	100
Weight of HTTP dump file node occurrence	40	0
Weight of RDFa node occurrence	4	0
Weight of SPARQL node occurrence	30	0

Use "Submit" to queue the experiment
Follow the link you receive to see the experiment results. The "Experiments → Experiment Status" page shows whether the experiment is still running.
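
When copying the values from the parameter table into the form, one quick check is possible: in both columns, the five "Weight of ... node occurrence" values add up to 100. Whether the benchmark requires this is not stated here, so treat it as an observation about these two configurations only. The function name sum_weights is hypothetical.

```shell
# Hypothetical helper: add up the integers passed as arguments.
sum_weights() {
  total=0
  for weight in "$@"; do
    total=$((total + weight))
  done
  echo "$total"
}

# Weights in table order: CKAN, dereferencing HTTP, HTTP dump file, RDFa, SPARQL.
effectiveness_total="$(sum_weights 5 21 40 4 30)"
efficiency_total="$(sum_weights 0 100 0 0 0)"

echo "Effectiveness weights sum: $effectiveness_total"   # prints 100
echo "Efficiency weights sum: $efficiency_total"         # prints 100
```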

It is also possible to deploy your own HOBBIT platform. Refer to the HOBBIT platform manual: https://hobbit-project.github.io/. In this case you may need system adapters for ORCA as well: https://github.com/topics/orca-system-adapter.
