L2 - 2019
- Docker - Dockerfile, docker-compose, containers in general
- Python - pip, requirements
- Task queue
- Investigate existing codebase:
  - why we use a broker
  - why no broker URL is defined in the code
  - how the broker URL is built (what is guest, etc.)
  - change RabbitMQ logs to an appropriate severity (warning)
  - do tasks need to return results?
  - can we schedule periodic tasks?
  - why is the worker logging twice? Can we fix that?
  - why do we see Celery errors at startup?
  - what is the build context of the Docker image defined in the docker-compose file?
  - can we exclude some files from the Docker image build context?
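For the broker URL questions above, it may help to see how such a URL is put together. The following is a minimal sketch, not the code used in this repository: the `RABBITMQ_*` environment variable names and the `build_broker_url` helper are hypothetical. The defaults mirror the values RabbitMQ ships with (user `guest`, password `guest`, port 5672, virtual host `/`), which is also roughly what Celery falls back to when no broker URL is configured.

```python
import os
from urllib.parse import quote


def build_broker_url(env=None):
    """Assemble an AMQP broker URL from (hypothetical) environment variables."""
    env = os.environ if env is None else env
    user = env.get("RABBITMQ_USER", "guest")
    password = env.get("RABBITMQ_PASSWORD", "guest")
    host = env.get("RABBITMQ_HOST", "localhost")
    port = env.get("RABBITMQ_PORT", "5672")
    vhost = env.get("RABBITMQ_VHOST", "/")
    # Percent-encode the credentials so characters like "@" do not break the URL.
    return "amqp://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, vhost
    )


if __name__ == "__main__":
    print(build_broker_url())
```

With no variables set, this yields a URL of the form `amqp://user:password@host:port/vhost` pointing at a local broker with the guest account.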
- Decide whether to use an API-based approach or a scraping-based approach. Create fine-grained tasks for everything.
  - Scraping approach
    - Implement a Reddit submission URL provider that creates the appropriate tasks
    - Implement a submission scraper that uses the provided URL to fetch submission data:
      - check the later tasks to see what data you will need
  - API approach
    - Get credentials
    - Select a client library
    - Write code that creates the tasks required for fetching submissions
    - Write code that consumes the tasks and fetches submissions
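Whichever approach is chosen, the consuming task ends up turning a raw response into a small record with the fields the later tasks need. The sketch below assumes the payload follows Reddit's JSON listing layout (`data.children[].data`, with kind `t3` for submissions); the `Submission` class, the `parse_listing` helper, and the chosen fields are illustrative only and should be checked against real responses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    id: str
    title: str
    created_utc: float
    score: int
    num_comments: int


def parse_listing(payload):
    """Turn a Reddit-style listing payload into Submission records."""
    submissions = []
    for child in payload.get("data", {}).get("children", []):
        # "t3" is the kind prefix for submissions; skip anything else.
        if child.get("kind") != "t3":
            continue
        data = child["data"]
        submissions.append(
            Submission(
                id=data["id"],
                title=data["title"],
                created_utc=float(data["created_utc"]),
                score=int(data.get("score", 0)),
                num_comments=int(data.get("num_comments", 0)),
            )
        )
    return submissions
```

Keeping the parsing step separate from the network call makes it easy to unit test and to reuse from either the scraping or the API code path.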
- Handle fetching of new submissions. Add task scheduling that periodically fetches new submissions. (How do you check whether a submission is new? Use the current time, the submission creation time, and the schedule interval.)
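One possible reading of the hint above: a submission counts as new if it was created within the last schedule interval. A minimal sketch, assuming all times are Unix timestamps in seconds (the function name is hypothetical):

```python
def is_new_submission(created_utc, now_utc, interval_seconds):
    """Return True if the submission was created within the last interval.

    The window is half-open, (now - interval, now], so two runs spaced
    exactly one interval apart do not both pick up the same submission.
    """
    return now_utc - interval_seconds < created_utc <= now_utc
```

If the scheduler drifts or a run is delayed, submissions can fall between windows, so storing the timestamp of the last successful run and using it as the lower bound may be more robust than relying on the fixed interval.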
- Add process monitoring:
  - Use Prometheus or InfluxDB (add them to docker-compose; remember volumes for data persistence)
  - Publish metrics about the data fetching process (submission counts, lengths, property distributions, timings, etc.), at least:
    - submission fetch times (average, histogram)
    - 2 counters
    - 2 distributions (histogram)
  - Publish general Celery metrics (you can use a library)
  - Visualize the metrics using Grafana (add it to docker-compose; remember a volume for dashboard persistence)
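To make the counter and histogram requirements concrete, here is a small in-memory sketch that uses only the standard library. It is a stand-in for a real metrics client, not a replacement: the `FetchMetrics` class and the bucket boundaries are hypothetical, and it keeps per-bucket counts, whereas Prometheus histograms expose cumulative buckets.

```python
import bisect
import time
from contextlib import contextmanager


class FetchMetrics:
    """In-memory counter plus histogram for submission fetch times."""

    def __init__(self, buckets=(0.1, 0.5, 1.0, 5.0)):
        self.buckets = tuple(sorted(buckets))
        # One slot per bucket upper bound, plus a final overflow slot.
        self.bucket_counts = [0] * (len(self.buckets) + 1)
        self.fetch_total = 0
        self.fetch_seconds_sum = 0.0

    def observe(self, seconds):
        # bisect_left finds the first bucket whose upper bound is >= seconds.
        self.bucket_counts[bisect.bisect_left(self.buckets, seconds)] += 1
        self.fetch_total += 1
        self.fetch_seconds_sum += seconds

    def average(self):
        if not self.fetch_total:
            return 0.0
        return self.fetch_seconds_sum / self.fetch_total

    @contextmanager
    def time_fetch(self):
        # Record the elapsed time even if the wrapped fetch raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.observe(time.perf_counter() - start)
```

A task body would wrap its fetch call in `with metrics.time_fetch():`. With a real client library the same shape applies: a counter is incremented per fetch and a histogram observes each duration.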