# Discord Chat Export For Music

This script is designed to take a .csv file created with [DiscordChatExporter](https://github.com/Tyrrrz/DiscordChatExporter) by Tyrrrz and do the following:

- Flag all posts that contain a valid URL
    - Have a categorical column that further identifies the URL as Spotify, YouTube, or Bandcamp
        - Determine if there are other valid categories through experimentation
- Filter to only posts that contain URLs
    - From a design standpoint, this will remove some music talk and recommendations, but we are specifically concerned with things we can listen to
- Flag posts that contain a single URL
    - This will be the easiest case to deal with
    - Based on the valid categories for a URL type, we can further create columns and store that URL
        - e.g. SpotifyURL, YoutubeURL, etc.
- Flag posts that contain *multiple* URLs
    - This will be the most complicated data to handle. My current thought is to 'duplicate' that post ID so that each URL gets its own row in the CSV, exploding the one post into individual URLs to organize
- We will determine more cases to test for as we explore
- Thought: If there is markdown syntax surrounding the URL (brackets and parentheses), then we can pull a 'label' out of it. I imagine this is a rare case and most people just post raw URLs
- Thought: We could have a 'context' column, which could be a lead / lag. Four columns, 2 before and 2 after, as long as:
    - They are by the same poster
    - Within the same Day
- If there IS an attachment (non-null value in "Attachments"), we should have a categorical column for the type of file it is, e.g. a .png, an .mp3, a .mov
- We can extract the number of reactions and get a total count. In theory this gives a sense of a 'quality' post: the more reactions, the more attention, and the more people 'suggest' the post
- **Will need to handle returns and carriage returns**
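
The URL steps in the list above (flag, count, categorize, explode) could be sketched roughly as follows. This is only a minimal sketch under several assumptions: the regular expression is a deliberately loose URL matcher rather than a validator, the host-to-category mapping in `CATEGORY_BY_HOST` is hypothetical, the column names (`urls`, `url_count`, `has_url`, `single_url`, `multiple_urls`, `category`) are made up for illustration, and the Spotify and Bandcamp links in the sample data are invented.

```python
import re
from urllib.parse import urlparse

import pandas as pd

# Loose URL matcher (an assumption, not a validator). Brackets and
# parentheses are excluded so markdown-style [label](url) wrappers are
# not swallowed into the URL.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")

# Hypothetical host -> category mapping; extend as new sources turn up.
CATEGORY_BY_HOST = {
    "spotify.com": "Spotify",
    "youtube.com": "YouTube",
    "youtu.be": "YouTube",
    "bandcamp.com": "Bandcamp",
}


def categorize(url):
    """Return a category name for a URL based on its host, else 'Other'."""
    host = (urlparse(url).hostname or "").lower()
    for known, category in CATEGORY_BY_HOST.items():
        # Match the bare domain or any subdomain (www., open., artist., ...).
        if host == known or host.endswith("." + known):
            return category
    return "Other"


# Tiny stand-in for the export; only the Content column matters here.
posts = pd.DataFrame({
    "Content": [
        "Pinned a message.",
        "https://youtu.be/M5gGZ9pl_dY",
        "https://open.spotify.com/track/abc and https://artist.bandcamp.com/album/xyz",
    ]
})

# One list of URLs per post, plus the flags described in the list above.
posts["urls"] = posts["Content"].fillna("").apply(URL_PATTERN.findall)
posts["url_count"] = posts["urls"].str.len()
posts["has_url"] = posts["url_count"] > 0
posts["single_url"] = posts["url_count"] == 1
posts["multiple_urls"] = posts["url_count"] > 1

# Keep only posts with URLs, then give every URL its own row.
# The original index is repeated, so it still identifies the source post.
links = posts[posts["has_url"]].explode("urls").rename(columns={"urls": "url"})
links["category"] = links["url"].apply(categorize)

print(links[["url", "category"]])
```

Because `explode` repeats the original index for each URL, the index can stand in for the 'duplicated' post ID until a real ID column is available. Trailing punctuation such as a period directly after a URL would still be captured by this pattern, so that is one case worth testing.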

# Python Packages

We are using a virtual environment (venv). For future Keliff's convenience, we catalog all the packages used:

```bash
pip install pandas
pip install pytest
```

# Checklist

- [x] Read in .csv - 2024-10-26
- [x] Understand the shape of the data - 2024-10-26
- [ ] Determine what test cases we have
- [ ] Set up testing environment
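
As a starting point for the two open checklist items, a first batch of test cases might look something like the sketch below. The helper `count_urls` and its loose pattern are hypothetical placeholders for whatever the real script ends up using. The test functions use plain `assert` statements, so pytest can collect them from a file named like `test_urls.py` (the file name is also an assumption).

```python
import re

# Placeholder URL matcher; the real script may use something stricter.
URL_PATTERN = re.compile(r"https?://\S+")


def count_urls(content):
    """Count URL-looking substrings in a single post."""
    return len(URL_PATTERN.findall(content))


def test_post_without_url():
    assert count_urls("Pinned a message.") == 0


def test_post_with_single_url():
    assert count_urls("https://youtu.be/M5gGZ9pl_dY") == 1


def test_post_with_multiple_urls():
    content = "https://youtu.be/M5gGZ9pl_dY https://youtu.be/HYFW3nwXHVE"
    assert count_urls(content) == 2


def test_url_followed_by_newlines():
    # Newlines after the URL should end the match, not extend it.
    assert count_urls("https://youtu.be/HYFW3nwXHVE\n\n:HOTJAMS:") == 1
```

Running `pytest` in the project folder would pick these up. The cases mirror the plan above: no URL, a single URL, multiple URLs, and a URL followed by line breaks.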

In [3]:
import pandas as pd

music_file = "keliff_music_export_2024-10-25.csv"
df = pd.read_csv(music_file)

df.head()

Unnamed: 0,AuthorID,Author,Date,Content,Attachments,Reactions
0,179026370880339979,keliff,2021-09-23T08:32:10.9160000-05:00,Playlist Link: https://www.youtube.com/playlis...,,
1,179026370880339979,keliff,2021-09-23T08:32:16.3800000-05:00,Pinned a message.,,
2,179026370880339979,keliff,2021-09-23T08:33:01.7830000-05:00,I made this channel in part to remind myself t...,,
3,171070205139746816,huskerfu,2021-09-23T09:01:31.2950000-05:00,You're aiming for things that won't get hit wi...,,
4,68825251546402816,thehearth,2021-09-23T09:16:44.9540000-05:00,oh don't get me started on video game music -F,,
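
With the `AuthorID`, `Date`, and `Content` columns shown above, the lead / lag 'context' idea could be sketched as below. This is a rough sketch with assumptions: 'same day' is taken to mean the first ten characters of the `Date` string (the date as recorded in the export, with its original offset), 'context' is taken to mean the same author's neighbouring posts that day rather than adjacent posts in the channel, and the column names `Day`, `prev_1`, `prev_2`, `next_1`, `next_2` are made up. The sample rows are simplified stand-ins, not real export data.

```python
import pandas as pd

# Simplified stand-in rows using the same column names as the export.
posts = pd.DataFrame({
    "AuthorID": [1, 1, 1, 2, 1],
    "Date": [
        "2021-09-23T08:32:10.9160000-05:00",
        "2021-09-23T08:32:16.3800000-05:00",
        "2021-09-23T08:33:01.7830000-05:00",
        "2021-09-23T09:01:31.2950000-05:00",
        "2021-09-24T10:00:00.0000000-05:00",
    ],
    "Content": ["a", "b", "c", "d", "e"],
})

# Assumption: the calendar day is the YYYY-MM-DD prefix of the timestamp.
posts["Day"] = posts["Date"].str[:10]

# Shifting within (author, day) groups guarantees both conditions:
# context never crosses to another poster or another day.
grouped = posts.groupby(["AuthorID", "Day"])["Content"]
for offset in (1, 2):
    posts[f"prev_{offset}"] = grouped.shift(offset)    # lag
    posts[f"next_{offset}"] = grouped.shift(-offset)   # lead

print(posts[["Content", "prev_2", "prev_1", "next_1", "next_2"]])
```

Posts with no neighbour inside their group simply get missing values in the context columns.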


In [24]:
# What are attachments? Does that include images? Seems to always be 'not available'
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('expand_frame_repr', True)
pd.set_option('display.max_colwidth', 1000)
df["Content"].sample(n=10)

295                                                                                                                                                           ah yes my favorite artist "" and their magnum opus ""
271                                                                                                                                                                                    https://youtu.be/M5gGZ9pl_dY
660                                                                                                                                                                       https://youtu.be/HYFW3nwXHVE\n\n:HOTJAMS:
530     Once I've got it properly tagged etc. to my standards :3\n\nI'm not gonna be uploading it to sites like khinsider etc. 'cus I don't feel comfortable doing that, but happy to share it round :keliffTeeHee:
1177                                                                                          just had the urge to play this an' dance to it in my offic
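
The sampled rows above show line breaks inside `Content` (displayed as `\n`), which ties back to the note about handling returns and carriage returns. One possible approach, sketched here with hypothetical helper names `unify_newlines` and `flatten`, is to first normalize every line-ending style to `\n` and, where a single-line value is wanted, collapse all whitespace runs to one space.

```python
import re


def unify_newlines(text):
    """Convert Windows (\\r\\n) and bare carriage-return line endings to \\n."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def flatten(text):
    """Collapse every run of whitespace, including line breaks, to one space."""
    return re.sub(r"\s+", " ", text).strip()


sample = "https://youtu.be/HYFW3nwXHVE\n\n:HOTJAMS:"
print(repr(unify_newlines("line one\r\nline two\rline three")))
print(repr(flatten(sample)))
```

Keeping both helpers separate leaves the choice open: `unify_newlines` preserves the post's line structure, while `flatten` is more convenient for one-row-per-post CSV output.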

In [52]:
import json

test = json.loads(
    """
    {
    "messages": [
    {
      "id": "1064973902150897706",
      "type": "Default",
      "timestamp": "2023-01-17T12:26:04.531-06:00",
      "timestampEdited": null,
      "callEndedTimestamp": null,
      "isPinned": false,
      "content": "(mild epilepsy warning) https://www.youtube.com/watch?v=V-uTOWL8X0w",
      "author": {
        "id": "204793922424274944",
        "name": "goldenpelt",
        "discriminator": "0000",
        "nickname": "Goldenpelt",
        "color": "#FD7875",
        "isBot": false,
        "roles": [
          {
            "id": "893948505998106684",
            "name": "Roo Crew",
            "color": "#FD7875",
            "position": 9
          },
          {
            "id": "980964075464978482",
            "name": "Twitch Subscriber",
            "color": null,
            "position": 6
          },
          {
            "id": "980964075464978483",
            "name": "Twitch Subscriber: Tier 1",
            "color": null,
            "position": 5
          },
          {
            "id": "1128812376226013224",
            "name": "Nardo Enjoyers",
            "color": "#F1C40F",
            "position": 2
          }
        ],
        "avatarUrl": "https://cdn.discordapp.com/avatars/204793922424274944/8e126388ac194196fd5136f91617e5ba.png?size=512"
      },
      "attachments": [],
      "embeds": [
        {
          "title": "Anonymouz - River (ヴィンランド・サガ [VINLAND SAGA] SEASON 2 OPテーマ)",
          "url": "https://www.youtube.com/watch?v=V-uTOWL8X0w",
          "timestamp": null,
          "description": "Anonymouz - River\nStream Now!▶https://Anonymouz.lnk.to/River\n*English subtitles available♡\n*日本語(佐賀弁SAGA)字幕もご覧ください♡\n\nTVアニメ「ヴィンランド・サガ」\nVINLAND SAGA SEASON 2 オープニング・テーマ\nOrder your package here▶https://www.amazon.co.jp/River-%E3%83%A1%E3%82%AC%E3%82%B8%E3%83%A3%E3%82%B1%E4%BB%98-Anonymouz/dp/B0BRZWB729/ref=sr_1_2?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%...",
          "color": "#FF0000",
          "author": {
            "name": "Anonymouz アノニムーズ",
            "url": "https://www.youtube.com/channel/UCe0h32Y4Po84iLubCviKRug"
          },
          "thumbnail": {
            "url": "https://images-ext-1.discordapp.net/external/W9_EMSzbmQBnx1oazzm4XA5QNrMfZQdW_ORNZtdH2wc/https/i.ytimg.com/vi/V-uTOWL8X0w/sddefault.jpg",
            "width": 640,
            "height": 480
          },
          "video": {
            "url": "https://www.youtube.com/embed/V-uTOWL8X0w",
            "width": 960,
            "height": 720
          },
          "images": [],
          "fields": []
        }
      ],
      "stickers": [],
      "reactions": [],
      "mentions": []
    }]
    }""",strict=False)

#pd.json_normalize(test["messages"],sep='_')

# Pull out Embed information w/ a key to join on 
temp_df = pd.json_normalize(test["messages"],record_path="embeds",meta=["id"],sep='_')
print(temp_df.columns)
temp_df.add_prefix("embed_")

Index(['title', 'url', 'timestamp', 'description', 'color', 'images', 'fields',
       'author_name', 'author_url', 'thumbnail_url', 'thumbnail_width',
       'thumbnail_height', 'video_url', 'video_width', 'video_height', 'id'],
      dtype='object')


Unnamed: 0,embed_title,embed_url,embed_timestamp,embed_description,embed_color,embed_images,embed_fields,embed_author_name,embed_author_url,embed_thumbnail_url,embed_thumbnail_width,embed_thumbnail_height,embed_video_url,embed_video_width,embed_video_height,embed_id
0,Anonymouz - River (ヴィンランド・サガ [VINLAND SAGA] SEASON 2 OPテーマ),https://www.youtube.com/watch?v=V-uTOWL8X0w,,Anonymouz - River\nStream Now!▶https://Anonymouz.lnk.to/River\n*English subtitles available♡\n*日本語(佐賀弁SAGA)字幕もご覧ください♡\n\nTVアニメ「ヴィンランド・サガ」\nVINLAND SAGA SEASON 2 オープニング・テーマ\nOrder your package here▶https://www.amazon.co.jp/River-%E3%83%A1%E3%82%AC%E3%82%B8%E3%83%A3%E3%82%B1%E4%BB%98-Anonymouz/dp/B0BRZWB729/ref=sr_1_2?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%...,#FF0000,[],[],Anonymouz アノニムーズ,https://www.youtube.com/channel/UCe0h32Y4Po84iLubCviKRug,https://images-ext-1.discordapp.net/external/W9_EMSzbmQBnx1oazzm4XA5QNrMfZQdW_ORNZtdH2wc/https/i.ytimg.com/vi/V-uTOWL8X0w/sddefault.jpg,640,480,https://www.youtube.com/embed/V-uTOWL8X0w,960,720,1064973902150897706
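
Since the embed table keeps the message `id` as a join key, one way it might be attached back to the message-level rows is a left merge. The sketch below uses a trimmed, partly invented pair of messages (only the second one mirrors the example above, with a shortened title), and it uses `record_prefix` instead of `add_prefix` so that the `id` key keeps its name. The names `messages_df`, `embeds_df`, and `merged` are illustrative only.

```python
import pandas as pd

# Trimmed stand-in for test["messages"]; the first message is invented
# to show what happens when a post has no embeds.
messages = [
    {"id": "1", "content": "no link here", "embeds": []},
    {
        "id": "1064973902150897706",
        "content": "(mild epilepsy warning) https://www.youtube.com/watch?v=V-uTOWL8X0w",
        "embeds": [
            {
                "title": "Anonymouz - River",
                "url": "https://www.youtube.com/watch?v=V-uTOWL8X0w",
            }
        ],
    },
]

# Message-level table; the raw list column is dropped once it is split out.
messages_df = pd.json_normalize(messages, sep="_").drop(columns=["embeds"])

# One row per embed. record_prefix only renames the embed fields,
# so the meta column "id" stays available as the join key.
embeds_df = pd.json_normalize(
    messages, record_path="embeds", meta=["id"], record_prefix="embed_", sep="_"
)

# Left merge keeps messages without embeds (their embed_* columns are empty).
merged = messages_df.merge(embeds_df, on="id", how="left")
print(merged)
```

A message with several embeds would appear once per embed after the merge, which matches the 'explode' idea for posts with multiple URLs.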


In [62]:
attachment_example = json.loads(
    """
    {
    "messages": [
    {
      "id": "1091115849500336229",
      "type": "Default",
      "timestamp": "2023-03-30T16:44:50.473-05:00",
      "timestampEdited": null,
      "callEndedTimestamp": null,
      "isPinned": false,
      "content": ":lgwWoah:",
      "author": {
        "id": "185504812874465280",
        "name": "deltawhiskey",
        "discriminator": "0000",
        "nickname": "Delta Whiskey",
        "color": "#FD7875",
        "isBot": false,
        "roles": [
          {
            "id": "893948505998106684",
            "name": "Roo Crew",
            "color": "#FD7875",
            "position": 9
          },
          {
            "id": "980964075464978482",
            "name": "Twitch Subscriber",
            "color": null,
            "position": 6
          },
          {
            "id": "980964075464978483",
            "name": "Twitch Subscriber: Tier 1",
            "color": null,
            "position": 5
          },
          {
            "id": "1128812376226013224",
            "name": "Nardo Enjoyers",
            "color": "#F1C40F",
            "position": 2
          }
        ],
        "avatarUrl": "https://cdn.discordapp.com/avatars/185504812874465280/6e34a2302ee2444951336c61ac0fa9a0.png?size=512"
      },
      "attachments": [
        {
          "id": "1091115849068335114",
          "url": "https://cdn.discordapp.com/attachments/890591223625162792/1091115849068335114/20230330_171036.jpg?ex=671e0b92&is=671cba12&hm=c4e7b8945a4587cc9cd361c935a354b2645fb026391c766b7c975f3f01f69e76&",
          "fileName": "20230330_171036.jpg",
          "fileSizeBytes": 2519070
        }
      ],
      "embeds": [],
      "stickers": [],
      "reactions": [],
      "mentions": []
    }
    ]
    }
    """
    )

temp_df = pd.json_normalize(attachment_example["messages"],record_path="attachments",meta=["id"],sep='_',record_prefix="attachments_")
temp_df.columns

Index(['attachments_id', 'attachments_url', 'attachments_fileName',
       'attachments_fileSizeBytes', 'id'],
      dtype='object')
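
From the `attachments_fileName` column above, the categorical 'attachment type' column from the plan could be derived from the file extension. The sketch below is only one possible approach: the helper names and the `EXTENSION_GROUPS` mapping are hypothetical, and the grouping into image / audio / video is an assumption about what will be useful.

```python
from pathlib import PurePosixPath

# Hypothetical extension -> type grouping; adjust once real data is explored.
EXTENSION_GROUPS = {
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".gif": "image",
    ".mp3": "audio",
    ".wav": "audio",
    ".mov": "video",
    ".mp4": "video",
}


def attachment_extension(file_name):
    """Return the lower-cased extension, e.g. '.jpg', or '' if there is none."""
    return PurePosixPath(file_name).suffix.lower()


def attachment_type(file_name):
    """Map a file name to a coarse type, falling back to 'other'."""
    return EXTENSION_GROUPS.get(attachment_extension(file_name), "other")


print(attachment_extension("20230330_171036.jpg"))
print(attachment_type("20230330_171036.jpg"))
```

Applied to the frame above, this could look like `temp_df["attachments_fileName"].map(attachment_type)`, with the result stored in a new column.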