GitHub - WorkShoft/scrapnik: Scraping project with Django 3.0 and Scrapy

This project uses the Django web framework and the Scrapy framework to show how scraping can be automated with periodic Celery tasks and workers, and how easily Scrapy integrates into a Django project.
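The periodic scheduling mentioned above could be expressed as a Celery beat schedule. The following is a minimal sketch, not the project's actual configuration: the entry name, the task path `task_runner.tasks.run_all_spiders`, and the hourly interval are all assumptions made for illustration.

```python
from datetime import timedelta

# Hypothetical beat schedule: the entry name, task path and interval are
# assumptions chosen for illustration, not values taken from this project.
CELERY_BEAT_SCHEDULE = {
    "run-all-spiders-hourly": {
        "task": "task_runner.tasks.run_all_spiders",  # assumed task name
        "schedule": timedelta(hours=1),  # Celery also accepts seconds or a crontab
    },
}
```

If the Celery app loads its configuration from Django settings with the `CELERY` namespace, a dictionary like this would typically live in the Django settings module, and a running beat process would enqueue the task once per interval for the workers to pick up.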

The scraped sites are furniture catalogs from retailers such as Carrefour and Ikea. They make interesting targets because they use pagination and complex page elements.

Some of the interesting features are:

A custom Django command that runs a specific spider
A custom Django command that runs all spiders
Use of Cloudflare-scrapy to scrape Cloudflare-protected sites
Use of Scrapy-DjangoItem to build a pipeline that converts the scraped data into Django objects, validates them, and saves them to the Django project's database
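The validate-and-save pipeline from the last feature above might look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code: the class name `ValidateAndSavePipeline` is hypothetical, and the item is assumed to expose `is_valid()`, `errors`, and `save()` in the way a scrapy-djangoitem `DjangoItem` does. The guarded import is only there so the sketch runs without Scrapy installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so the sketch can be run without Scrapy installed.
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class ValidateAndSavePipeline:
    """Hypothetical pipeline: validate an item, then persist it."""

    def process_item(self, item, spider):
        # Assumes `item` exposes is_valid(), errors and save(), as a
        # scrapy-djangoitem DjangoItem is expected to.
        if not item.is_valid():
            # Dropping the item stops it from reaching later pipelines.
            raise DropItem(f"Invalid item: {item.errors}")
        item.save()  # persists the underlying Django model instance
        return item
```

A pipeline of this kind would be enabled through the `ITEM_PIPELINES` setting in the Scrapy settings module. Raising `DropItem` for invalid items keeps bad rows out of the database while letting the crawl continue.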

This is the complete stack:

Docker with Docker Compose
Python 3.7
PostgreSQL
Django
Celery
RabbitMQ
Scrapy

Folders and files (21 commits):

.idea
logs
scraper
scrapnik
scrapy_examples
tables
task_runner
.gitignore
Dockerfile
README.MD
__init__.py
celerybeat-schedule
docker-compose.yml
docker-entrypoint.sh
init.sh
local.env
manage.py
python3.7
requirements-local.txt
test_celery_spiders.json
