This is an example of a system that requests URLs from a list and stores the results in a database. The system is split into a producer and a consumer, which run separately.
- Producer collects data using `aiohttp` and sends it to `Kafka`. It optionally checks page contents for a provided regexp pattern and adds matched text to the result message.
- Consumer reads messages from `Kafka` and stores data into `PostgreSQL`, overwriting data for existing URLs.
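
The data flow described above can be sketched without Kafka or PostgreSQL. This is a minimal illustration, not the project's actual code: the function names `build_message` and `store`, the message fields, and the `results` table are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL (the `INSERT ... ON CONFLICT ... DO UPDATE` form shown is also accepted by PostgreSQL).

```python
import re
import sqlite3
from typing import Optional


def build_message(url: str, status: int, body: str,
                  pattern: Optional[str] = None) -> dict:
    """Producer side (hypothetical): build a result message and add the
    matched text when a regexp pattern was provided and found."""
    match = re.search(pattern, body) if pattern else None
    return {"url": url, "status": status,
            "matched": match.group(0) if match else None}


def store(conn: sqlite3.Connection, msg: dict) -> None:
    """Consumer side (hypothetical): insert the row, overwriting any
    existing row for the same URL."""
    conn.execute(
        """
        INSERT INTO results (url, status, matched)
        VALUES (:url, :status, :matched)
        ON CONFLICT (url) DO UPDATE
        SET status = excluded.status, matched = excluded.matched
        """,
        msg,
    )


# In-memory SQLite is only a stand-in for PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (url TEXT PRIMARY KEY, status INTEGER, matched TEXT)"
)

# Two results for the same URL: the second one replaces the first.
store(conn, build_message("https://example.com", 200,
                          "<h1>Hello</h1>", r"<h1>.*?</h1>"))
store(conn, build_message("https://example.com", 503, "unavailable"))

rows = conn.execute("SELECT url, status, matched FROM results").fetchall()
print(rows)
```

Keying the table on the URL is what makes a later message overwrite an earlier one instead of adding a second row.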
- Store config files for Kafka and PostgreSQL in `~/.kafka` and `~/.postgres`:

      $ ls ~/.kafka
      ca.pem service.cert service.key
      $ ls ~/.postgres
      ca.pem password.txt

- Run `python3 setup.py bdist_wheel`. It builds `.whl` binaries for the producer and consumer in `dist/`. Running with `--build_package [wsc-consumer, wsc-producer]` builds only the specified `.whl`.
- Install packages with `pip3 install dist/<package>.whl`.
- Add `~/.local/bin` to `PATH` and run in separate terminals:

      $ wsc_consumer
      $ wsc_producer -i input.txt

  where `input.txt` is a file without a header, containing lines of tab-separated URLs and optional corresponding regexp patterns.
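
To make the input format concrete, here is a small sketch of how such a file could be read. The helper `parse_input` is hypothetical and is not taken from the producer's code; it assumes one URL per line, optionally followed by a tab and a regexp pattern, and it skips blank lines.

```python
import io
from typing import Iterable, Iterator, Optional, Tuple


def parse_input(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (url, pattern) pairs; pattern is None when the line has no
    second tab-separated field."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        url, _, pattern = line.partition("\t")
        yield url.strip(), (pattern or None)


# Example content of a hypothetical input.txt, held in memory here.
sample = io.StringIO(
    "https://example.com\t<title>.*?</title>\n"
    "https://example.org\n"
)
entries = list(parse_input(sample))
print(entries)
```

The same function accepts an open file object, for example `parse_input(open("input.txt"))`.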
Run the tests with `python3 -m unittest discover test`.