-
Create folder .secrets and set up MongoDB, RabbitMQ, and Chrome credentials:
cp -r .secrets_example .secrets
-
Start all containers:
docker-compose up -d
docker-compose ps
-
Install package:
pip install --editable '.'
-
Launch a process to generate predicates:
python -m webly.predicates \
--output amqp \
--amqp-url amqp://user@localhost \
--amqp-pass-file .secrets/rabbitmq_default_pass_file \
data/vrd/predicates.txt
-
Launch one or more of the expander processes:
python -m webly.expander \
--ngrams 4 5 \
--ngrams-dir data/ngrams/processed \
--ngrams-max 2 \
--languages fr it \
--input amqp \
--output amqp \
--amqp-url amqp://user@localhost \
--amqp-pass-file .secrets/rabbitmq_default_pass_file
-
Launch one or more scraper processes:
python -m webly.scraper \
--engines google yahoo flickr \
--chrome-url http://localhost:3000/webdriver \
--chrome-token-file .secrets/chrome_token \
--input amqp \
--output mongo \
--amqp-url amqp://user@localhost \
--amqp-pass-file .secrets/rabbitmq_default_pass_file \
--mongo-url mongodb://user@localhost \
--mongo-pass-file .secrets/mongo_initdb_root_password
-
Kill expander and scraper processes, then stop containers:
docker-compose stop
-
Download images:
python -m webly.downloader \
--mongo-url mongodb://user@localhost \
--mongo-pass-file .secrets/mongo_initdb_root_password \
--output-dir images
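The commands above pass each service URL without a password and point to a separate password file. How webly combines the two internally is not documented here; the following is only a minimal sketch of one way to do it, handy when connecting to the same services from your own scripts. The helper name `url_with_password` is hypothetical and not part of webly, and the sketch assumes the file holds just the password, optionally followed by a newline.

```python
from pathlib import Path
from urllib.parse import quote, urlsplit, urlunsplit


def url_with_password(url: str, pass_file: str) -> str:
    """Return `url` with the password from `pass_file` added after the user name.

    Hypothetical helper, not part of webly. Assumes the file contains only the
    password, optionally followed by a trailing newline.
    """
    password = Path(pass_file).read_text().rstrip("\r\n")
    parts = urlsplit(url)
    if parts.username is None:
        raise ValueError(f"URL has no user name: {url!r}")

    # Rebuild the network location as user:password@host[:port].
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Percent-encode the password so characters such as '@' or '/' stay valid.
    netloc = f"{parts.username}:{quote(password, safe='')}@{host}"

    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

For example, `url_with_password("amqp://user@localhost", ".secrets/rabbitmq_default_pass_file")` yields a URL of the form `amqp://user:<password>@localhost`. Keeping the password in a file rather than on the command line keeps it out of the shell history and process listings.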
Monitoring:
Test queries manually (input from stdin):
python -m webly.scraper \
--engines google yahoo flickr \
--chrome-url http://localhost:3000/webdriver \
--chrome-token-file .secrets/chrome_token \
--input text \
--output text
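If the manual test cannot connect, it may help to first confirm that the containers accept connections. The sketch below is not part of webly; it only tries to open a TCP connection to each service. Port 3000 comes from the `--chrome-url` used above, while 5672 (RabbitMQ) and 27017 (MongoDB) are the services' usual default ports and are an assumption about how docker-compose publishes them.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


# Assumed ports: 3000 is taken from --chrome-url; 5672 and 27017 are the
# default RabbitMQ and MongoDB ports and may differ in your compose file.
SERVICES = {"chrome": 3000, "rabbitmq": 5672, "mongodb": 27017}

if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "up" if port_open("localhost", port) else "unreachable"
        print(f"{name:<10} localhost:{port} {state}")
```

A port reported as unreachable usually means the container is stopped or the port is not published to the host; `docker-compose ps` shows which of the two applies.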