# Project:
This notebook is being written by DPAG as part of the DFG-funded research project "Resolving the cognitive and neural basis of affective sound-meaning associations", under the supervision of Dr. Arash Aryani, postdoctoral researcher at FU Berlin.

Goal: We aim to fine-tune existing auditory DNNs to model and predict the arousal and valence ratings of non-words from their audio.

## Models:
The model currently being tested is the XLSR-53 version of the large wav2vec 2.0 model.

Huggingface: https://huggingface.co/facebook/wav2vec2-large-xlsr-53

Short description: XLSR-53 is a transformer-based model that learned speech representations from unlabeled audio, through self-supervised pre-training on speech in 53 languages.
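The model outputs one representation per audio frame (roughly every 20 ms), whereas each non-word has a single valence and arousal rating, so frame-level outputs must be summarized before regression. One common option is masked mean pooling followed by a linear head. The NumPy sketch below illustrates that idea only; it is an assumption, not necessarily what this project's training code does, and `masked_mean_pool`, `linear_head` and the toy shapes are hypothetical (for the large model the hidden size would be 1024).

```python
import numpy as np


def masked_mean_pool(hidden_states: np.ndarray, frame_mask: np.ndarray) -> np.ndarray:
    """Average frame-level features over valid (non-padded) frames.

    hidden_states: (batch, frames, hidden) per-frame representations.
    frame_mask:    (batch, frames), 1 for real frames and 0 for padding.
    Returns a (batch, hidden) array with one vector per utterance.
    """
    mask = frame_mask[..., None].astype(hidden_states.dtype)  # (batch, frames, 1)
    summed = (hidden_states * mask).sum(axis=1)                # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1.0, None)              # avoid division by zero
    return summed / counts


def linear_head(pooled: np.ndarray, weights: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map pooled vectors to ratings, e.g. 2 outputs for (valence, arousal)."""
    return pooled @ weights + bias


# Toy example: 2 utterances, 4 frames, 3 hidden dims; the second one has 2 padded frames.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 3))
mask = np.array([[1, 1, 1, 1], [1, 1, 0, 0]])
pooled = masked_mean_pool(hidden, mask)
ratings = linear_head(pooled, rng.normal(size=(3, 2)), np.zeros(2))
print(pooled.shape, ratings.shape)  # (2, 3) (2, 2)
```

Masking matters because batches of non-words of different lengths are padded; a plain mean would mix padding frames into the shorter items' representations.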

Why it's fitting for the project:

+ It is pre-trained on speech units shorter than phonemes, which should make it better suited to processing non-words than models built around larger units such as words.
+ There is existing literature showing that the model's layers encode acoustic and phonetic information.
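The "shorter than phonemes" point can be made concrete by looking at the convolutional feature encoder that turns raw audio into frames before the transformer. The sketch below assumes the default wav2vec 2.0 settings (`conv_kernel=(10, 3, 3, 3, 3, 2, 2)` and `conv_stride=(5, 2, 2, 2, 2, 2, 2)` in `Wav2Vec2Config`); it is worth confirming these against `model.config` for the checkpoint in use.

```python
# Assumed conv feature-encoder settings (wav2vec 2.0 defaults)
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def receptive_field_and_hop(kernels, strides):
    """Return (receptive field, hop) in input samples for stacked unpadded 1-D convolutions."""
    receptive_field, jump = 1, 1
    for k, s in zip(kernels, strides):
        receptive_field += (k - 1) * jump  # each layer widens the field by (k - 1) input-level steps
        jump *= s                          # cumulative stride of the stack so far
    return receptive_field, jump


def num_frames(num_samples, kernels=CONV_KERNEL, strides=CONV_STRIDE):
    """Number of output frames for an input of num_samples samples (no padding)."""
    length = num_samples
    for k, s in zip(kernels, strides):
        length = (length - k) // s + 1
    return length


sr = 16_000
rf, hop = receptive_field_and_hop(CONV_KERNEL, CONV_STRIDE)
print(rf, hop)                          # 400 320
print(1000 * rf / sr, 1000 * hop / sr)  # 25.0 20.0  (milliseconds)
print(num_frames(sr))                   # 49 frames for 1 s of audio
```

Under these settings each frame summarizes about 25 ms of audio and a new frame starts every 20 ms, so a short non-word is represented by a few dozen frames at most.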

## Dataset:
The data used here was provided by Dr. Aryani; a fuller description is TBA. It contains 1103 audio files with matching valence and arousal ratings.

### 1. Import libraries

In [None]:
# Packages to manage, load & save data 
import tqdm as notebook_tqdm
from datasets import load_dataset
import copy
import pickle

# ML packages
import numpy as np
import math
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split # For dataloader split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from scipy import stats

# DL packages
import torch
import torch.nn as nn
import transformers
import torchaudio
import librosa
import optuna
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader



In [None]:
# Import local functions and classes
import os
import sys
from pathlib import Path
root = os.path.abspath("..") # Go up to root folder

if root not in sys.path:
    sys.path.append(root)

from src.load_data import load_data

In [2]:
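# Optional: report CUDA errors at the failing call by making kernel launches synchronous.
# This must be set before the first CUDA operation. (TORCH_USE_CUDA_DSA only takes effect
# when PyTorch itself is compiled with it, so setting it at runtime does nothing.)
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Use the GPU if available (only needed for training)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

print("CUDA debugging enabled!")
print(f"Using device: {torch.cuda.get_device_name() if torch.cuda.is_available() else 'CPU'}")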

Using device: cpu
CUDA debugging enabled!
Using device: CPU


In [3]:
# Set base dir with data files
base_dir = Path(r"C:\Users\blxck\Desktop\nn_acoustic")
data_dir = base_dir / "data" / "processed"

In [4]:
# Load wav files + labels

model_sr = 16000  # wav2vec2 models expect 16 kHz audio

batch_sample = 1103  # Number of audio files that have a matching entry in the labels table

data = load_data(data_dir, batch_size=batch_sample, target_sr=model_sr)

# Access dictionary variables
waveforms = data["waveforms"]
valences = data["valences"]
arousals = data["arousals"]

 1. Found 1103 matching audio-label pairs.

 2. Loading audio files


Loading audio files: 100%|██████████| 1103/1103 [00:15<00:00, 70.08it/s]


 No normalization applied

Data ready as a Torch object
Total samples: 1103
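`batch_sample` is hard-coded to the number of audio files that have a rating. `src/load_data.py` is not shown here, so the sketch below only assumes that files are matched to ratings by filename stem; `match_audio_to_labels` and the example file names and ratings are hypothetical. Deriving the count from the data in this way would avoid a stale constant if files are added or removed.

```python
from pathlib import Path


def match_audio_to_labels(wav_paths, labels):
    """Pair each audio file with its ratings by filename stem (hypothetical helper).

    wav_paths: iterable of paths to .wav files.
    labels:    dict mapping a stimulus name (file stem) to a (valence, arousal) tuple.
    Returns a list of (path, valence, arousal) for stems present in both sources.
    """
    pairs = []
    for path in sorted(Path(p) for p in wav_paths):
        if path.stem in labels:
            valence, arousal = labels[path.stem]
            pairs.append((path, valence, arousal))
    return pairs


# Hypothetical file names and ratings, for illustration only
wavs = ["data/processed/bupa.wav", "data/processed/kilo.wav", "data/processed/zort.wav"]
ratings = {"bupa": (0.4, 0.7), "kilo": (-0.2, 0.3), "mefi": (0.1, 0.5)}
pairs = match_audio_to_labels(wavs, ratings)
print(len(pairs))  # 2 -> only "bupa" and "kilo" have both audio and a rating
```

With a pandas labels table, the `labels` dict could be built with something like `dict(zip(df["name"], zip(df["valence"], df["arousal"])))`, where the column names are again assumptions.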






### Load model 

In [None]:
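from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Load the feature extractor of the pretrained checkpoint.
# (This checkpoint was pre-trained on audio only and ships without a tokenizer,
# so a Wav2Vec2Processor cannot be loaded from it.)
model_name = "facebook/wav2vec2-large-xlsr-53"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)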
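The loading step reports that no normalization was applied, so normalization is left to the feature extractor. Assuming its `do_normalize` option is enabled for this checkpoint (check `feature_extractor.do_normalize`), it scales each waveform to zero mean and unit variance, then right-pads the batch. The NumPy sketch below reproduces that behaviour with a hypothetical helper, `normalize_and_pad`; in practice `feature_extractor(list_of_arrays, sampling_rate=16000, padding=True, return_tensors="pt")` should give comparable `input_values` and, depending on the config, an `attention_mask`. Waveforms loaded as torch tensors would first need `.numpy()`.

```python
import numpy as np


def normalize_and_pad(waveforms, padding_value=0.0, eps=1e-7):
    """Zero-mean/unit-variance normalize each waveform, then right-pad to a common length.

    Returns a (batch, max_len) float32 array and a matching attention mask
    (1 = real sample, 0 = padding).
    """
    max_len = max(len(w) for w in waveforms)
    batch = np.full((len(waveforms), max_len), padding_value, dtype=np.float32)
    attention_mask = np.zeros((len(waveforms), max_len), dtype=np.int64)
    for i, w in enumerate(waveforms):
        w = np.asarray(w, dtype=np.float32)
        # Statistics use the real samples only, so padding does not distort them
        batch[i, : len(w)] = (w - w.mean()) / np.sqrt(w.var() + eps)
        attention_mask[i, : len(w)] = 1
    return batch, attention_mask


clip_a = np.sin(np.linspace(0, 20, 16_000))       # 1.0 s at 16 kHz
clip_b = 0.1 * np.sin(np.linspace(0, 20, 8_000))  # 0.5 s, quieter
values, mask = normalize_and_pad([clip_a, clip_b])
print(values.shape, mask.sum(axis=1).tolist())  # (2, 16000) [16000, 8000]
```

Per-utterance normalization removes loudness differences between recordings (here the quieter `clip_b` ends up on the same scale as `clip_a`), which matters if loudness is not meant to drive the predicted ratings.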

### Run optimization study

In [None]:
from src.optimization_utils import setup_and_run_optimization

valence_study = setup_and_run_optimization(
    target_type="valence",
    waveforms=waveforms,
    targets=valences,
    n_trials=50,
    timeout=7200
)
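Cell 1 imports `r2_score`, `mean_squared_error`, `mean_absolute_error` and `scipy.stats`, which suggests predictions are scored with these metrics; which one the study actually optimizes is defined in `src.optimization_utils` and is not shown here. For reference, a minimal NumPy version of the same quantities (for 1-D inputs these should match the sklearn functions and `stats.pearsonr(y_true, y_pred)[0]`); `regression_metrics` is a hypothetical helper:

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Common metrics for comparing predicted and human ratings."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(residuals)),
        # Coefficient of determination: 1 - SS_res / SS_tot
        "r2": 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2),
        # Pearson correlation between predictions and ratings
        "pearson_r": np.corrcoef(y_true, y_pred)[0, 1],
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print({k: round(float(v), 3) for k, v in metrics.items()})
# {'mse': 0.125, 'rmse': 0.354, 'mae': 0.25, 'r2': 0.9, 'pearson_r': 0.956}
```

R² and Pearson r can disagree: r ignores a constant offset or scaling in the predictions, while R² penalizes it, so reporting both helps when comparing valence and arousal models.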

### Save best model

In [None]:
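# # Save final model - Optional
# # Note: `optimized_arousal_model` is not defined in this notebook (the study above targets
# # valence); replace it with the retrained model to be saved and adjust the path.
# aro_full_model = "/content/drive/MyDrive/Arash Projects/aro_full_model.pth"
# torch.save(optimized_arousal_model.state_dict(), aro_full_model)
# print(f"Retrained best arousal model saved at {aro_full_model}")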