<a href="https://colab.research.google.com/github/GemmaGorey/Dissertation/blob/main/Dissertation_GG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Initial Colab setup below. Run the top two cells only once per session.**


In [5]:
!pip install -q condacolab
import condacolab
condacolab.install()
# Installs Miniforge (conda and mamba) so mamba can be used instead of pip; the kernel restarts afterwards.

⏬ Downloading https://github.com/jaimergp/miniforge/releases/download/24.11.2-1_colab/Miniforge3-colab-24.11.2-1_colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:09
🔁 Restarting kernel...


In [2]:
# creates the config file and builds the environment.
yaml_content = """
name: dissertation
channels:
  - pytorch
  - conda-forge
dependencies:
  - python=3.11
  - pytorch=2.2.2
  - torchvision=0.17.2
  - torchaudio
  - librosa
  - numpy<2
  - pandas
  - jupyter
  - wandb
"""

# Write the YAML string to 'environment.yml'.
with open('environment.yml', 'w') as f:
    f.write(yaml_content)

print("environment.yml file created successfully.")

# creates the environment using mamba from the yml file.
print("\n Creating environment")

!mamba env create -f environment.yml --quiet && echo -e "\n 'dissertation' environment is ready to use."

environment.yml file created successfully.

 Creating environment
Channels:
 - pytorch
 - conda-forge
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done


    current version: 24.11.2
    latest version: 25.5.1

Please update conda by running

    $ conda update -n base -c conda-forge conda



Downloading and Extracting Packages:
mkl-2022.2.1         | 157.3 MB
pytorch-2.2.2        | 82.5 MB
pillow-11.2.1        | 41.5 MB
python-3.11.13       | 29.2 MB
llvmlite-0.44.0      | 28.6 MB
wandb-0.20.1         | 20.2 MB
scipy-1.15.2         | 16.4 MB
pandas-2.3.0         | 14.6 MB   | :   
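
The environment file pins specific versions, so it can be useful to check what the running interpreter actually sees. The following is a minimal sketch, not part of the original notebook: `installed_versions` is a hypothetical helper, and it reports on whichever Python is executing the cell, which may not be the `dissertation` environment created above. It also assumes the conda `pytorch` package registers its metadata under the name `torch`.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Package names taken from environment.yml ("pytorch" is looked up as "torch").
report = installed_versions(
    ["torch", "torchvision", "torchaudio", "librosa", "numpy", "pandas", "wandb"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed in this interpreter'}")
```

A `None` entry means only that the current interpreter cannot see the package; it does not show that the environment build failed.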

In [3]:
# Clone the project repository, configure the Kaggle API key, and download the DEAM dataset

# clone project repository from GitHub
print("⏳ Cloning GitHub repository...")
!git clone https://github.com/GemmaGorey/Dissertation.git
print("Repository cloned.")

# Get Kaggle API key from Google Drive
print("\n⏳ Setting up Kaggle API key...")
from google.colab import drive
drive.mount('/content/drive')

!mkdir -p ~/.kaggle
!cp /content/drive/MyDrive/kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
print("Kaggle key configured.")

# download and unzip the DEAM dataset
print("\n Downloading dataset from Kaggle...")
!kaggle datasets download -d imsparsh/deam-mediaeval-dataset-emotional-analysis-in-music
print(" Dataset downloaded. Unzipping...")
!unzip -q deam-mediaeval-dataset-emotional-analysis-in-music.zip
print(" Dataset unzipped.")

# Check the files in the dataset
print("\n--- Current files: ---")
!ls

print("\n Project setup is complete.")


⏳ Cloning GitHub repository...
Cloning into 'Dissertation'...
remote: Enumerating objects: 67, done.
remote: Counting objects: 100% (67/67), done.
remote: Compressing objects: 100% (58/58), done.
remote: Total 67 (delta 26), reused 5 (delta 1), pack-reused 0 (from 0)
Receiving objects: 100% (67/67), 25.21 KiB | 3.60 MiB/s, done.
Resolving deltas: 100% (26/26), done.
Repository cloned.

⏳ Setting up Kaggle API key...
Mounted at /content/drive
Kaggle key configured.

 Downloading dataset from Kaggle...
Dataset URL: https://www.kaggle.com/datasets/imsparsh/deam-mediaeval-dataset-emotional-analysis-in-music
License(s): CC-BY-NC-SA-4.0
Downloading deam-mediaeval-dataset-emotional-analysis-in-music.zip to /content
 98% 1.79G/1.83G [00:10<00:00, 43.4MB/s]
100% 1.83G/1.83G [00:10<00:00, 179MB/s] 
 Dataset downloaded. Unzipping...
 Dataset unzipped.

--- Current files: ---
condacolab_install.log					drive
DEAM_Annotations					environment.yml
DEAM_audio						features
deam-mediaeval-d
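
The `mkdir`, `cp`, and `chmod 600` shell steps in the cell above continue to the "Kaggle key configured." message even when `kaggle.json` is missing from Drive. A minimal Python sketch of the same three steps that fails loudly instead is shown below. `install_kaggle_key` is a hypothetical helper, not part of the notebook or of the Kaggle CLI.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_key(src, dest_dir=None):
    """Copy a Kaggle API key into dest_dir and restrict it to the owner (mode 600)."""
    src = Path(src)
    # Default to ~/.kaggle, the directory used by the shell commands above.
    dest_dir = Path(dest_dir) if dest_dir is not None else Path.home() / ".kaggle"
    if not src.is_file():
        raise FileNotFoundError(f"Kaggle key not found at {src}")
    dest_dir.mkdir(parents=True, exist_ok=True)  # equivalent of mkdir -p
    dest = dest_dir / "kaggle.json"
    shutil.copyfile(src, dest)                   # equivalent of cp
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)  # equivalent of chmod 600
    return dest
```

In Colab, after `drive.mount('/content/drive')`, this could be called as `install_kaggle_key('/content/drive/MyDrive/kaggle.json')`.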

In [None]:
import pandas as pd
from IPython.display import display  # explicit import so the cell also runs outside a notebook

# Load the musical features from the unzipped 'features' folder
# The 'song_id' is in the first column, so we'll use it as the index.
features_df = pd.read_csv('features/features_30_sec.csv', index_col='song_id')

# Load the static emotion annotations from the unzipped 'DEAM_Annotations' folder
annotations_df = pd.read_csv('DEAM_Annotations/annotations/static/annotations.csv', index_col='song_id')

print("✅ Datasets loaded successfully.")

# --- Inspect the DataFrames ---
print("\n--- First 5 rows of the Features DataFrame ---")
display(features_df.head())

print("\n--- First 5 rows of the Annotations DataFrame ---")
display(annotations_df.head())

# --- Combine the Features and Labels (inner join on the shared 'song_id' index) ---
dataset_df = pd.merge(features_df, annotations_df, on='song_id')

print("\n✅ Features and annotations merged successfully.")
print("\n--- First 5 rows of the final combined DataFrame ---")
display(dataset_df.head())

total 1.9G
drwxr-xr-x 3 root root 4.0K Jun 10 19:14 DEAM_Annotations
drwxr-xr-x 3 root root 4.0K Jun 10 19:14 DEAM_audio
-rw-r--r-- 1 root root 1.9G Jun 23  2021 deam-mediaeval-dataset-emotional-analysis-in-music.zip
drwxr-xr-x 9 root root 4.0K Jun 10 18:15 Dissertation
drwx------ 7 root root 4.0K Jun 10 18:54 drive
drwxr-xr-x 3 root root 4.0K Jun 10 19:15 features
-rw-r--r-- 1 root root   66 Jun 10 18:44 kaggle.json
drwxr-xr-x 1 root root 4.0K Jun  9 13:37 sample_data
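
`pd.merge` defaults to an inner join, so the merge in the cell above drops any `song_id` that appears in only one of the two files, without warning. The following is a minimal sketch of how that could be audited. It uses invented toy frames: `features_toy`, `annotations_toy`, and their column names and values are assumptions for illustration, not DEAM data.

```python
import pandas as pd

# Toy stand-ins for features_df and annotations_df, both indexed by song_id.
features_toy = pd.DataFrame(
    {"tempo": [120.0, 95.0, 140.0]},
    index=pd.Index([2, 3, 4], name="song_id"),
)
annotations_toy = pd.DataFrame(
    {"valence_mean": [3.1, 5.4, 4.8]},
    index=pd.Index([3, 4, 5], name="song_id"),
)

# An outer merge with indicator=True adds a '_merge' column recording whether
# each song_id was found in the left frame, the right frame, or both.
audit = pd.merge(
    features_toy, annotations_toy, on="song_id", how="outer", indicator=True
)
merge_counts = audit["_merge"].value_counts().to_dict()
print(merge_counts)

# The inner merge, with validate= raising an error if a song_id is duplicated
# on either side.
merged_toy = pd.merge(
    features_toy, annotations_toy, on="song_id", validate="one_to_one"
)
print(f"{len(merged_toy)} of {len(features_toy)} feature rows kept")
```

Running the same audit on `features_df` and `annotations_df` would show how many songs lack either features or annotations before training on `dataset_df`.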
