
eksisozluk-scraper

A complete eksi-sozluk scraper management system

Requirements

Docker 19.03.6+
Docker Compose 1.24.1+

Setup

  1. Start the system
docker-compose up
  2. Create a superuser in the web app
docker exec -it web sh
python manage.py createsuperuser

Deployment

docker-compose -f docker-compose.yml -f docker-compose.prod.yml up
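The production compose file layers settings on top of the base docker-compose.yml. Its actual contents are not shown here; a minimal sketch of what such an override typically contains (the service name, port, and environment variable are assumptions, not the repository's real docker-compose.prod.yml):

```yaml
# Hypothetical production override -- service names and settings are
# assumptions, not this repository's actual docker-compose.prod.yml.
version: "3.7"
services:
  web:
    restart: always
    environment:
      - DEBUG=0          # disable Django debug mode in production
    ports:
      - "8000:8000"
```

Docker Compose merges files left to right, so values in docker-compose.prod.yml override those in docker-compose.yml.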

Creating scheduled task

  1. Open http://localhost:8000/admin/django_celery_beat/periodictask/ in your browser

  2. Click the Add button

  3. Name the scheduled task and select one of the registered tasks

  4. Create a schedule (fill in only one of the schedule fields)

  5. Expand the Arguments section and fill in Keyword arguments as shown below

{"keywords": [["apache kafka", 5], ["winter wine", 2]]}

Note: in each pair, the first element is the keyword and the second is the number of pages to parse, counting from the end.

  6. Click the Save button
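These keyword arguments are passed to the registered Celery task when the schedule fires. A minimal sketch of how a task might consume them (the function name and structure are hypothetical, not this project's actual task code):

```python
# Hypothetical sketch -- the real task and scraper interface live in
# this repository's Celery task module.
def scrape_keywords(keywords):
    """Each item pairs a search keyword with how many pages to parse
    from the end of its topic."""
    results = []
    for keyword, page_count in keywords:
        # The real task would fetch and parse entries here; this sketch
        # just records the plan so the argument structure is visible.
        results.append({"keyword": keyword, "pages": page_count})
    return results

plan = scrape_keywords([["apache kafka", 5], ["winter wine", 2]])
print(plan[0])  # {'keyword': 'apache kafka', 'pages': 5}
```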

Viewing scraped entries

Open http://localhost:8000/admin/scraper/entry/ in your browser

Monitoring tasks

Tasks can be monitored at http://localhost:8888

Architecture

(architecture diagram)

To do

  1. REST API for obtaining scraped data programmatically
  2. React or Vue.js frontend for monitoring data
  3. Remote database support

Notes

I am thinking of building a scraper management framework that extends the idea in this project to scraping other websites and data sources. Please reach out to me if you are interested in this kind of project.