👾 gdq-collector

Data Collection Utilities for GDQStatus

(Successor to sgdq-collector)

Explanation

gdq-collector is an amalgamation of services and utilities designed to be a serverless-ish backend for gdq-stats. There are two distinct components of the project:

  • gdq_collector (the Python module) - a Python scraping module designed to run continuously on a compute platform like EC2, updating a Postgres database with new timeseries and GDQ schedule data.
  • lambda_suite - Lambda application that caches the Postgres database to a JSON file in S3 (to reduce the load on the database). Also includes a simple API to query recent timeseries data that doesn't appear in the cached JSON. The Lambda Suite has three stages (separate configurations that are deployed independently):
    • The API stage (dev/prod) - Serves recent data to a publicly facing REST endpoint (sketched at the end of this section)
    • The Caching stage (cache_databases) - Queries the Postgres database and stores the query results in S3 as a cache (see the sketch just after this list)
    • The Monitoring stage (monitoring) - Queries the API stage to run periodic health checks on the system
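
As a rough illustration of the caching stage, the handler below queries Postgres and writes the result to S3 as a JSON blob. It is only a sketch: the table, columns, bucket name, and object key are hypothetical placeholders, not the project's actual schema or configuration.

```python
# Illustrative caching-stage handler; table, bucket, and key names are hypothetical.
import json

import boto3
import psycopg2

S3_CACHE_BUCKET = "my-gdq-cache-bucket"  # assumption: must match your S3_CACHE_BUCKET config


def cache_timeseries(event, context):
    """Query timeseries rows from Postgres and store them in S3 as JSON."""
    conn = psycopg2.connect(host="...", dbname="gdq", user="...", password="...")
    with conn, conn.cursor() as cur:
        # Hypothetical table/columns -- the real schema lives in schema.sql.
        cur.execute("SELECT time, viewers, donations FROM gdq_timeseries ORDER BY time")
        rows = [
            {"time": t.isoformat(), "viewers": v, "donations": float(d)}
            for (t, v, d) in cur.fetchall()
        ]
    conn.close()

    boto3.client("s3").put_object(
        Bucket=S3_CACHE_BUCKET,
        Key="gdq_timeseries.json",
        Body=json.dumps(rows),
        ContentType="application/json",
    )
```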

gdq_collector uses APScheduler to schedule and execute the scraping / refreshing tasks.
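
A minimal sketch of that pattern (the job functions and intervals here are illustrative, not the collector's actual configuration):

```python
# Minimal APScheduler sketch -- the job functions and intervals are illustrative.
from apscheduler.schedulers.blocking import BlockingScheduler


def refresh_timeseries():
    # scrape viewer counts / donation totals and insert a new Postgres row
    ...


def refresh_schedule():
    # re-scrape the GDQ schedule and update the schedule table
    ...


sched = BlockingScheduler()
sched.add_job(refresh_timeseries, "interval", minutes=1)
sched.add_job(refresh_schedule, "interval", minutes=10)
sched.start()  # blocks, which suits a long-running process on EC2
```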

The Lambda applications use Zappa for deployment.
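
The README doesn't pin down a web framework for the API stage, but Zappa deploys ordinary WSGI applications, so it could look roughly like this Flask sketch. The route, query, and column names are assumptions:

```python
# Hypothetical Flask app for the API stage; Zappa can deploy any WSGI app.
import psycopg2
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/recentEvents")  # the route name is an assumption
def recent_events():
    """Return rows newer than the cached JSON blob in S3."""
    conn = psycopg2.connect(host="...", dbname="gdq", user="...", password="...")
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT time, viewers, donations FROM gdq_timeseries "
            "WHERE time > NOW() - INTERVAL '1 hour' ORDER BY time"
        )
        rows = [
            {"time": t.isoformat(), "viewers": v, "donations": float(d)}
            for (t, v, d) in cur.fetchall()
        ]
    conn.close()
    return jsonify(rows)
```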

Architecture Diagram

[Architecture diagram: aws_setup]

Building / Running

gdq_collector

Note: If you're running this on an Ubuntu EC2 instance, bootstrap_aws.sh will be more useful than the following steps for the specific setup.

  1. Clone the repo and cd into the root project directory.
  2. Pull down the dependencies with pip install -r requirements.txt --user
    • You may wish to run aws/install.sh, as there will be necessary system dependencies to install some of the python packages.
  3. Copy credentials_template.py to credentials.py. Fill in your credentials for Twitch and your Postgres server.
    • You'll need to register a new Twitch application to get your clientid.
    • You'll want to use this site to generate an oauth code for Twitch.
  4. Ensure your Postgres server is running and that your credentials are valid. Create the necessary tables by executing the SQL commands in schema.sql (see the sketch after this list).
  5. Run python -m gdq_collector to start the collector.
    • You can run python -m gdq_collector --help to learn about the optional command line args.
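
If you'd rather apply schema.sql from Python than from the psql client, a sketch like the following works; the connection parameters below stand in for whatever you put in credentials.py:

```python
# Sketch: apply schema.sql with psycopg2 (connection details are placeholders).
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="gdq", user="gdq", password="...")
with conn, conn.cursor() as cur:
    with open("schema.sql") as f:
        cur.execute(f.read())  # psycopg2 sends the whole script in one call
conn.close()
```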

lambda_suite

  1. Clone the repo and cd into the root project directory.
  2. Pull down the dependencies with pip install -r requirements.txt --user
    • You may wish to run aws/install.sh, as there will be necessary system dependencies to install some of the python packages.
  3. Copy credentials_template.py to credentials.py. Fill in your credentials for your Postgres database. Add the ARN of the monitoring SNS topic to sns_arn if you want to use the lambda functions to send you notifications when monitoring alarms occur.
  4. Update zappa_settings.json to fit your AWS configuration. Of particular note is that you'll need to update your vpc_config. You'll also need to make an S3 bucket that matches your S3_CACHE_BUCKET config.
  5. Run zappa deploy dev to deploy the application and schedule the caching operations.
  6. Run zappa deploy cache_databases to deploy the lambdas that cache the Postgres data to JSON blobs in S3.
  7. Run zappa deploy monitoring to deploy the lambdas that check the output of the APIs to detect problems with the collector.
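
As a rough picture of what a monitoring check might do, the sketch below polls the public API and publishes to the SNS topic from credentials.py when the check fails. The endpoint URL, topic ARN, and health criterion are assumptions:

```python
# Illustrative monitoring check -- the URL, topic ARN, and criterion are assumptions.
import boto3
import requests

API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/recentEvents"  # hypothetical
SNS_ARN = "arn:aws:sns:us-east-1:123456789012:gdq-monitoring"  # your sns_arn from credentials.py


def check_api(event, context):
    """Publish an SNS alarm if the API is unreachable or returns no data."""
    try:
        resp = requests.get(API_URL, timeout=10)
        resp.raise_for_status()
        healthy = len(resp.json()) > 0
    except Exception:
        healthy = False

    if not healthy:
        boto3.client("sns").publish(
            TopicArn=SNS_ARN,
            Subject="gdq-collector monitoring alarm",
            Message="Health check against {} failed".format(API_URL),
        )
```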
