# ARAUS-extended dataset generation - preparing original ARAUS
This script adapts the augmented audio files and the data CSV of the ARAUS dataset so that they are ready for generating the new ARAUS-extended dataset.
The responses.csv file provided by the ARAUS authors contains the data associated with the augmented soundscapes (participant answers, audio features, the fold each audio belongs to, the base soundscape and masker used for the augmentation...).
However, we add some new columns to the dataframe so that it is complete and convenient for our operations:
1) We add the sound source of the maskers (bird, traffic, construction...) as new features --> 6 more columns
2) We add Pleasantness and Eventfulness values calculated from the participants' ratings --> 2 more columns
3) We add the wav gain that has to be applied to each digital signal to convert it to a pressure signal in Pascals --> 1 more column

In [31]:
import os
import sys
code_dir="/Users/amaiasagastimartinez/Desktop/Master/Master-Thesis/Code/src"
# src directory to sys.path
if code_dir not in sys.path:
    sys.path.insert(0, code_dir)

# data directory
current_dir = os.getcwd()
data_dir = os.path.abspath(os.path.join(current_dir, '..', '..', '..', 'data'))

In [32]:
import numpy as np
import pandas as pd
from maad.util import mean_dB
from maad.spl import pressure2leq
from SoundLights.dataset.Mosqito.loadFiles import load

In [None]:
# Path to original ARAUS dataset
path_csv="main_files/responses.csv"
# Path to save the new adapted dataset
saving_path="responses_SoundLights_delete.csv"

## Load ARAUS original csv file
Obtained directly from the ARAUS repository.

In [33]:
responses = pd.read_csv(os.path.join(data_dir, path_csv), dtype = {'participant':str})

## Maskers as features

One-hot encoding is a technique used to convert categorical variables into a numerical format that can be used for machine learning algorithms. It is particularly useful when dealing with categorical data that has no inherent order or hierarchy among its categories.

Here's how one-hot encoding works:

1) Identify Unique Categories:
First, you identify all the unique categories present in the categorical variable.

2) Create Binary Columns:
For each unique category, you create a new binary column. Each binary column corresponds to one unique category.

3) Assign Values:
In each binary column, you assign a value of 1 if the observation belongs to the category represented by that column, and 0 otherwise.
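
The three steps above can be sketched on a toy example. The masker file names below are hypothetical and only follow the `type_number.wav` naming pattern; this is an illustration under that assumption, not the dataset itself.

```python
import pandas as pd

# Hypothetical masker file names following the type_number.wav pattern
maskers = pd.Series(
    ["bird_00001.wav", "traffic_00012.wav", "bird_00007.wav", "silence_00001.wav"]
)

# Step 1: identify the categories (the part of the name before the underscore)
masker_types = maskers.str.split("_").str[0]

# Steps 2 and 3: one binary column per category, 1 where the row belongs to it
one_hot = pd.get_dummies(masker_types, prefix="masker", dtype=int)
print(one_hot)
```

Each row has exactly one column set to 1, so the encoding does not impose any order among the masker types.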



In [34]:
# Extract only the maskers column to generate the one-hot encoding
maskers=responses["masker"]
# Extract the masker type from the file name (type_number.wav)
maskers_type=maskers.str.split("_").str[0]
# List the distinct masker types, in order of first appearance
maskers_variety=maskers_type.unique().tolist()
print(maskers_variety)
# Generate the one-hot encoded dataframe (one "masker_<type>" column per masker type)
one_hot_encoded=pd.get_dummies(maskers_type, prefix="masker", dtype=int)
# Finally, concatenate the one-hot encoded dataframe with the original one
responses_with_maskers=pd.concat([responses, one_hot_encoded], axis=1)
print(responses_with_maskers.shape, responses_with_maskers)

['silence', 'water', 'traffic', 'construction', 'wind', 'bird']
(27255, 166)        participant  fold_r                          soundscape  \
0      ARAUS_00001      -1  R0091_segment_binaural_44100_1.wav   
1      ARAUS_00001       1  R0079_segment_binaural_44100_1.wav   
2      ARAUS_00001       1  R0056_segment_binaural_44100_2.wav   
3      ARAUS_00001       1  R0046_segment_binaural_44100_2.wav   
4      ARAUS_00001       1  R0092_segment_binaural_44100_1.wav   
...            ...     ...                                 ...   
27250  ARAUS_10005       0    R1007_segment_binaural_44100.wav   
27251  ARAUS_10005       0    R1006_segment_binaural_44100.wav   
27252  ARAUS_10005       0    R1008_segment_binaural_44100.wav   
27253  ARAUS_10005       0    R1007_segment_binaural_44100.wav   
27254  ARAUS_10005      -1  R0091_segment_binaural_44100_1.wav   

                       masker  smr  stimulus_index  time_taken  is_attention  \
0           silence_00001.wav    0               1

## Calculate P and E

Ground truth labels refer to the actual, true, or correct values of the target variable (or labels) in a supervised machine learning task. In other words, these are the known outcomes or responses associated with the input data points. The purpose of ground truth labels is to provide a basis for training and evaluating machine learning models.

<img src="../../../data/images/PandE_axis.png" alt="Image Description" width="500">

<img src="../../../data/images/PandE_formulas.png" alt="Image Description" width="800">






Weights for ISO pleasantness:
- Pleasant: 1
- Eventful: 0
- Chaotic: -sqrt(2)/2
- Vibrant: sqrt(2)/2
- Uneventful: 0
- Calm: sqrt(2)/2
- Annoying: -1
- Monotonous: -sqrt(2)/2

Weights for ISO eventfulness:
- Pleasant: 0
- Eventful: 1
- Chaotic: sqrt(2)/2
- Vibrant: sqrt(2)/2
- Uneventful: -1
- Calm: -sqrt(2)/2
- Annoying: 0
- Monotonous: -sqrt(2)/2
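
As a minimal sketch of how these weights turn one set of answers into P and E values, the example below applies them to a single hypothetical response. It assumes ratings on a 1-5 scale, in which case the largest possible weighted sum is 4 + sqrt(32) and dividing by it keeps the result in [-1, 1]. The function name and the ratings are illustrative only.

```python
import numpy as np

# Attribute order must match the order of the weights below
attributes = ["pleasant", "eventful", "chaotic", "vibrant",
              "uneventful", "calm", "annoying", "monotonous"]
s = np.sqrt(2) / 2
pleasantness_weights = np.array([1, 0, -s, s, 0, s, -1, -s])
eventfulness_weights = np.array([0, 1, s, s, -1, -s, 0, -s])

def pleasantness_eventfulness(ratings):
    """Return normalised (P, E) for eight ratings given in the order of `attributes`."""
    ratings = np.asarray(ratings, dtype=float)
    # Normalisation constant, assuming a 1-5 rating scale (an assumption of this sketch)
    norm = 4 + np.sqrt(32)
    P = ratings @ pleasantness_weights / norm
    E = ratings @ eventfulness_weights / norm
    return P, E

# Hypothetical answers: maximally pleasant, neutral in eventfulness
ratings = [5, 3, 1, 5, 3, 5, 1, 1]
P, E = pleasantness_eventfulness(ratings)
print(P, E)
```

With these answers the weighted pleasantness sum reaches its maximum, so P is 1, while the eventfulness terms cancel out and E is 0 (up to floating-point rounding).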

In [35]:
# Define attributes to extract from dataframes
attributes = ['pleasant', 'eventful', 'chaotic', 'vibrant', 'uneventful', 'calm', 'annoying', 'monotonous'] 
# Define weights for each attribute in attributes in computation of ISO Pleasantness
ISOPl_weights = [1,0,-np.sqrt(2)/2,np.sqrt(2)/2, 0, np.sqrt(2)/2,-1,-np.sqrt(2)/2] 
# Define weights for each attribute in attributes in computation of ISO Eventfulness
ISOEv_weights = [0,1,np.sqrt(2)/2,np.sqrt(2)/2, -1, -np.sqrt(2)/2,0,-np.sqrt(2)/2] 
# Copy 
responses_with_maskers_PE = responses_with_maskers.copy() 
# These are normalised ISO Pleasantness values (in [-1,1])
responses_with_maskers_PE['P_ground_truth'] = ((responses[attributes] * ISOPl_weights).sum(axis=1)/(4+np.sqrt(32))).values
# These are normalised ISO Eventfulness values (in [-1,1])
responses_with_maskers_PE['E_ground_truth'] = ((responses[attributes] * ISOEv_weights).sum(axis=1)/(4+np.sqrt(32))).values
print(responses_with_maskers_PE.shape, responses_with_maskers_PE)

(27255, 168)        participant  fold_r                          soundscape  \
0      ARAUS_00001      -1  R0091_segment_binaural_44100_1.wav   
1      ARAUS_00001       1  R0079_segment_binaural_44100_1.wav   
2      ARAUS_00001       1  R0056_segment_binaural_44100_2.wav   
3      ARAUS_00001       1  R0046_segment_binaural_44100_2.wav   
4      ARAUS_00001       1  R0092_segment_binaural_44100_1.wav   
...            ...     ...                                 ...   
27250  ARAUS_10005       0    R1007_segment_binaural_44100.wav   
27251  ARAUS_10005       0    R1006_segment_binaural_44100.wav   
27252  ARAUS_10005       0    R1008_segment_binaural_44100.wav   
27253  ARAUS_10005       0    R1007_segment_binaural_44100.wav   
27254  ARAUS_10005      -1  R0091_segment_binaural_44100_1.wav   

                       masker  smr  stimulus_index  time_taken  is_attention  \
0           silence_00001.wav    0               1      98.328             0   
1           silence_00001.wav    6

## Wav gains for each augmented soundscape

In the ARAUS dataset, responses.csv describes more than 25k augmented soundscapes labelled with psychoacoustic and acoustic parameters. Among these is Leq_R_r, the Leq of channel R of each audio.

To generate certain features (the ones we call "ARAUS features", as they aim to replicate the original ARAUS features), we need to know the gain or calibration factor that was applied to the wav files (audios) to obtain the specified Leq. This linear gain (which converts wav values to Peak-Pascals), one for each audio, is calculated in this section and must be stored.

For the other two sets of features (the ones we call "Freesound features" and the CLAP embeddings), the audios need to be consistent with each other in terms of energy: audios that were played at a lower volume should have a lower amplitude than those played at a higher volume. The factor that gives this proportional relation is the gain mentioned in the paragraph above, so the gain is also needed for these sets of features.

This gain is stored in a new column of the new csv.
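
The relation between the stored Leq and the gain can be illustrated with a small sketch. It is a simplification: it uses a plain whole-signal RMS level with a 20 µPa reference instead of the windowed `pressure2leq` and `mean_dB` computation used in this notebook, and the tone, the target Leq and the helper `leq_db` are hypothetical.

```python
import numpy as np

P_REF = 20e-6  # reference pressure in Pa

def leq_db(signal):
    """Level in dB of a signal interpreted as pressure in Pa (whole-signal RMS)."""
    rms = np.sqrt(np.mean(np.square(signal)))
    return 20 * np.log10(rms / P_REF)

# Hypothetical uncalibrated audio: 1 s of a 1 kHz tone with digital amplitude 0.1
fs = 44100
t = np.arange(fs) / fs
raw_audio = 0.1 * np.sin(2 * np.pi * 1000 * t)

true_leq = 65.0                 # hypothetical Leq taken from the csv, in dB
raw_leq = leq_db(raw_audio)     # "raw" Leq of the uncalibrated samples
gain_db = true_leq - raw_leq    # level difference in dB
gain = 10 ** (gain_db / 20)     # linear gain to apply to the samples

calibrated_audio = gain * raw_audio
print(gain, leq_db(calibrated_audio))
```

Multiplying the samples by the linear gain shifts their level by exactly `gain_db`, so the calibrated signal has the target Leq.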

In [None]:
# Path to folder containing original augmented soundscapes
audioFolderPath=os.path.join(data_dir, "soundscapes_augmented")
# Prepare output dataframe
columns=responses_with_maskers_PE.columns
newDF=pd.DataFrame(columns=columns)
newDF.insert(loc=6, column='wav_gain', value=None)
# Go over all the audio files in the given directory 
count_total=0
for dirpath, dirnames, files in os.walk(audioFolderPath):
    # Iterate over all files in the current directory, in alphabetical order
    for file in sorted(files):
        if file.endswith((".mp3", ".wav")):
            print("file ", file)
            print("count total ", count_total)
            # Find the row in responses.csv corresponding to current audio:
            # fold, participant and stimulus index are the 2nd, 4th and 6th
            # underscore-separated fields of the file name
            audio_path = os.path.join(dirpath, file)
            file_split = file.split("_")
            file_fold = int(file_split[1])
            file_participant = "ARAUS_" + file_split[3]
            file_stimulus = int(file_split[5].split(".")[0])
            row_mask = (
                (responses_with_maskers_PE["fold_r"] == file_fold)
                & (responses_with_maskers_PE["stimulus_index"] == file_stimulus)
                & (responses_with_maskers_PE["participant"] == file_participant)
            )
            # Copy so that the new columns are not written into a view of the original dataframe
            audio_info_aug = responses_with_maskers_PE[row_mask].copy()
            # Get the original Leq (channel R) of this audio 
            true_Leq=audio_info_aug["Leq_R_r"].values[0]
            # Load the right channel (ch=1) of the stereo audio file, without calibration
            audio_r,fs=load(audio_path, wav_calib=1.0, ch=1)
            # Calculate gain from true Leq and "raw" Leq
            rawR_Leq=mean_dB(pressure2leq(audio_r, fs, 0.125))
            gain_dB=true_Leq-rawR_Leq
            gain=10**(gain_dB/20)
            # Add gain info
            audio_info_aug["wav_gain"]=gain
            # Add audio file name (without extension)
            audio_info_aug["file"]=file.split(".")[0]
            newDF = pd.concat([newDF, audio_info_aug], ignore_index=True)
            # Prepare next iteration
            count_total=count_total+1

## Save the newly generated dataset

In [38]:
newDF.to_csv(os.path.join(data_dir,saving_path), index=False)