Install the Python environment with pipenv (run this in the project directory):
pipenv install
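Activate the environment. This path assumes pipenv created the virtualenv inside the project, for example because PIPENV_VENV_IN_PROJECT=1 is set; otherwise use "pipenv shell" instead: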
source .venv/bin/activate
Start Splash, the JavaScript rendering service that Scrapy talks to through scrapy-splash:
docker-compose up
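Running the Splash container is only half of the setup: Scrapy also needs the scrapy-splash plugin wired into the project's settings.py. The excerpt below is a sketch of the configuration recommended in the scrapy-splash README, not a copy of this project's settings. SPLASH_URL is an assumption: it presumes docker-compose publishes Splash's default port 8050 on localhost, which this README does not state.

```python
# settings.py (excerpt): scrapy-splash wiring as recommended by the plugin's README.
# Assumption: docker-compose maps Splash's default port 8050 to localhost.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    # Priority changed to 810, as the scrapy-splash README recommends.
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    # Avoids sending identical Splash arguments (e.g. Lua scripts) repeatedly.
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Make duplicate filtering and the HTTP cache aware of Splash arguments.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

If the container publishes a different host port, only SPLASH_URL needs to change.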
Run the spider from the products project directory, with the environment active:
scrapy crawl products_spider
Goals:
- save screenshots of the rendered page
- save the complete html source
- save the extracted text from the html source
- [stretch] save statistics on the structure of the DOM
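The last two items in the list above (extracted text and DOM statistics) can be prototyped with the standard library alone. The sketch below is not the spider's actual code: the names PageAnalyzer and analyze_html, and the choice of statistics (tag counts and maximum nesting depth), are hypothetical. Depth is approximate, because real-world HTML often omits end tags.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements whose content is not visible page text.
SKIP_TAGS = {"script", "style", "noscript"}
# Void elements never have an end tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PageAnalyzer(HTMLParser):
    """Collects visible text and simple DOM statistics in a single pass."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.tag_counts = Counter()
        self.depth = 0
        self.max_depth = 0
        self._skip = 0  # > 0 while inside script/style/noscript

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in VOID_TAGS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        if tag in SKIP_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        self.depth = max(0, self.depth - 1)
        if tag in SKIP_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip and data.strip():
            self.text_parts.append(data.strip())


def analyze_html(html):
    """Return the visible text and basic DOM statistics of an HTML string."""
    parser = PageAnalyzer()
    parser.feed(html)
    parser.close()
    return {
        "text": "\n".join(parser.text_parts),
        "tag_counts": dict(parser.tag_counts),
        "max_depth": parser.max_depth,
    }


if __name__ == "__main__":
    sample = ("<html><body><h1>Shop</h1><script>var x = 1;</script>"
              "<ul><li>A</li><li>B</li></ul><br></body></html>")
    print(analyze_html(sample))
```

Inside a Scrapy callback, the same function could be applied to response.text (the rendered HTML returned through Splash), and the resulting dictionary written to disk next to the saved HTML source.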