dentropy/discord-binding

Discord Binding

The goal of this project is to take the data exported from Tyrrrz/DiscordChatExporter and load it into a relational database, so that aggregations can be calculated easily and the data can be used in other parts of an ETL pipeline.
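As a rough illustration of the kind of aggregation a relational layout makes easy, the sketch below counts messages per author. It uses an in-memory SQLite database in place of Postgres so that it runs without any setup, and the `messages` table and its columns are hypothetical, not this project's actual schema.

```python
import sqlite3


def messages_per_author(rows):
    """Count messages per author from (id, channel_id, author_id, content) rows.

    The table layout here is an assumption made for illustration only.
    """
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Postgres
    conn.execute(
        "CREATE TABLE messages ("
        "id TEXT PRIMARY KEY, channel_id TEXT, author_id TEXT, content TEXT)"
    )
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT author_id, COUNT(*) AS n FROM messages "
        "GROUP BY author_id ORDER BY n DESC, author_id"
    ).fetchall()
    conn.close()
    return result
```

The same `GROUP BY` query shape should carry over to Postgres once the exported data is loaded, with table and column names adjusted to the real schema.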

Additional Reference Docs

  • Scraping Discord
    • This page explains how to get your own Discord data to feed into this ETL pipeline
  • Setup Postgres
    • This doc contains instructions to set up and access a local Postgres server
  • Setup Postgraphile
    • Postgraphile generates and runs a GraphQL API by inspecting a Postgres database
  • neo4j Docs
    • Explains how to set up neo4j and contains some example queries, including how to reset the database

Transforming the data from DiscordChatExporter

Requirements:

  • S3 Bucket loaded with data from DiscordChatExporter
  • Postgres database; you can use postgres.dockercompose.yml if you do not have one already set up

Steps:

Set up a Python virtual environment and install the dependencies from requirements.txt

Python 3.10 is the minimum version, unless you install the dependencies manually

# Install pip if it is not already available
curl https://bootstrap.pypa.io/get-pip.py | python3
python3 -m pip install virtualenv
sudo apt install python3-venv # Debian-based distros only
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
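To confirm that the interpreter inside the virtual environment meets the 3.10 minimum, a small check like the following can be run. This is a minimal sketch; the helper name `meets_minimum` is made up for illustration and is not part of this repository.

```python
import sys

MINIMUM = (3, 10)  # minimum version stated in this README


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    info = version_info if version_info is not None else sys.version_info
    return tuple(info)[:2] >= minimum


if __name__ == "__main__":
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```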

Set environment variables using a .env file

cp .env_example .env
$EDITOR .env

Update the environment variables in the DB Select and S3 sections, shown below

# DB Select
db_select='postgres'
db_url='psql://$USER:$PASS@$HOSTNAME:$PORT/$DATABASE_NAME'

# S3
aws_access_key_id=''
aws_secret_access_key=''
endpoint_url=''
bucket_name=''
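Before starting the pipeline, it can be useful to check that none of these values were left empty. The sketch below parses the simple `key='value'` format shown above using only the standard library. How the project itself reads the .env file is not shown here, so `parse_env`, `missing_keys`, and `describe_db_url` are hypothetical helpers.

```python
from urllib.parse import urlsplit

# Keys listed in the DB Select and S3 sections of .env
REQUIRED_KEYS = (
    "db_select",
    "db_url",
    "aws_access_key_id",
    "aws_secret_access_key",
    "endpoint_url",
    "bucket_name",
)


def parse_env(text):
    """Parse key='value' lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def describe_db_url(db_url):
    """Split a connection URL into its parts for a quick sanity check."""
    parts = urlsplit(db_url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```

For example, `missing_keys(parse_env(open(".env").read()))` would list any values that still need to be filled in.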

Run the ETL pipeline. For long runs, consider starting it inside tmux so the job keeps running if your terminal session disconnects.

# Using Bash
source env/bin/activate
python3 run_dag.py &
cat *.log