# Kirby Data Analysis

## Introduction

In April 2019, I created a video called `Discord "Big" Data`. For this video, I wrote a Discord bot, code-named `Kirby`, that records events such as every message sent, every user joining or leaving the guild, every status change, and much more. All the data was collected, anonymized, and then saved to a MongoDB database.

Here, you can find the repository of `Kirby`:  
https://github.com/dev-schueppchen/Kirby

This bot was then set up on my *(mostly German)* [development Discord guild](https://dc.zekro.de), where it has been collecting data ever since.

And yes, the bot is still collecting data every day. This notebook connects directly to the `kirby` database via a publicly available read-only account.  
Here you can see an example query which shows the current count of documents for each collection:

In [15]:
%run -i scripts/setup-mongo.py

import pandas

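data = {}  # collection name -> document count
total_count = 0
for c in mongo_db.list_collection_names():  # mongo_db comes from setup-mongo.py
    count = mongo_db[c].count_documents({})
    data[c] = count
    total_count += count

data['summ'] = total_count  # extra row: total across all collections
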
df = pandas.DataFrame({ 'Document Count': data })
df.sort_values(by='Document Count', inplace=True, ascending=False)
df

| | Document Count |
|---|---|
| summ | 6963547 |
| statuschanges | 3245164 |
| status | 1691846 |
| statusroles | 1691537 |
| messages | 317908 |
| reactions | 13158 |
| memberchange | 2567 |
| voicechannels | 1367 |


## Accessing the Database

To access the data, you can use a publicly available account with the following credentials to connect to the database:

| | |
|---|---|
| Server | `zekro.de` |
| Port | `27071` |
| Database | `kirby` |
| Authorization Database | `kirby` |
| Username | `kirby` |
| Password | `D3v5chupp3n` |

You can use something like [MongoDB Compass](https://www.mongodb.com/products/compass) to get an overview of the collections and the data structure.
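
If you prefer to connect from Python, the credentials above can be assembled into a standard MongoDB connection URI. The following is only a minimal sketch: the helper names `build_mongo_uri` and `connect` are made up for illustration, it assumes `pymongo` is installed, and it is not necessarily how `scripts/setup-mongo.py` does it.

```python
from urllib.parse import quote_plus

# Credentials copied from the table above.
HOST = "zekro.de"
PORT = 27071
DATABASE = "kirby"
AUTH_DATABASE = "kirby"
USERNAME = "kirby"
PASSWORD = "D3v5chupp3n"


def build_mongo_uri(host, port, database, username, password, auth_database):
    """Build a MongoDB connection URI (hypothetical helper, not part of Kirby)."""
    # Percent-encode the credentials in case they contain reserved characters.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/{database}?authSource={auth_database}"


def connect():
    """Return a handle to the kirby database (needs pymongo and network access)."""
    # Imported here so the URI helper above also works without pymongo installed.
    from pymongo import MongoClient

    uri = build_mongo_uri(HOST, PORT, DATABASE, USERNAME, PASSWORD, AUTH_DATABASE)
    return MongoClient(uri)[DATABASE]
```

Calling `mongo_db = connect()` should then give you a database handle that can be used like the `mongo_db` object in the query above.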


## Getting Discord Metadata

As you might notice, all objects like channels or roles are stored as Discord snowflake IDs. To recover the actual names of these objects, you need to query the Discord API.

For this, you can use the script `scripts/get-discord-data.py`, which generates a JSON file at `data/idtable.json` containing all IDs with their clear names.

The catch is that, to execute this script, you need to pass the API token of a bot application that is connected to the guild as the `DISCORD_TOKEN` environment variable. So you will not be able to obtain the data yourself unless you have a bot on the guild.

But I try to keep the JSON file as up-to-date as possible. ;)
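
Once you have `data/idtable.json`, translating IDs back to names is a simple lookup. The sketch below assumes the file is a flat JSON object mapping ID strings to clear names; that layout is an assumption, so check the real file and adapt the lookup if it is nested differently. The function names `load_id_table` and `resolve_name` are hypothetical.

```python
import json
from pathlib import Path


def load_id_table(path="data/idtable.json"):
    """Load the ID table generated by scripts/get-discord-data.py."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def resolve_name(id_table, snowflake):
    """Return the clear name for a snowflake ID, or the ID itself if unknown.

    Assumes a flat mapping {"<id>": "<name>"}; the real file may differ.
    """
    key = str(snowflake)  # JSON object keys are always strings
    return id_table.get(key, key)
```

Falling back to the raw ID keeps plots and tables usable even when an object was created after the JSON file was last refreshed.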

## Video

Here you can find the video I created.

*Note: the video is entirely in German!*

In [16]:
%%html
<iframe width="900" height="506" src="https://www.youtube.com/embed/mvTeEEeb0jM" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>