# Reproducibility of ARAUS-extended dataset generation (part 1) - preparing original ARAUS
This script adapts the augmented audios and the data csv from the ARAUS dataset so that they are ready for generating the new ARAUS-extended dataset. It consists of 2 parts:
1) Apply gain to the augmented audios and re-save them
2) Complete responses.csv


In [2]:
import numpy as np
import pandas as pd
import os
from maad.util import mean_dB
from maad.spl import pressure2leq
from Mosqito.loadFiles import load
from src.SoundLights.dataset.wav_files import save_wav

# 1) Adapt augmented audios gains
ARAUS augmented audios should be found in folder /data/soundscapes_augmented, distributed in 25 folders according to the fold they belong to. 
Each of these audios was played (in the listening tests) at a certain Leq. This Leq value is provided in responses.csv by the ARAUS authors. 

To generate certain features (the ones we call "ARAUS features", as they aim to replicate the original ARAUS features), we need to know the gain that was applied to the wav files (audios) to reach the specified Leq. This linear gain (which converts the wav signal into peak-Pascals), one for each audio, is calculated in this section, and it must be stored. This is needed because this set of features is acoustical or psychoacoustical, and is therefore linked to the physical signal, not the digital one.

However, for the second set of features (the ones we call "Freesound features", as they are calculated with FreesoundExtractor() from the Essentia library), the audios need to be coherent with each other in terms of energy, meaning that audios that were played at lower volume should have lower amplitude than those that were played at higher volume. The factor that gives us this proportional relation is the gain mentioned in the paragraph above. Therefore, we re-generate the whole set of augmented audios, applying the corresponding gain to each soundscape. To limit clipping (signal values outside the range [-1, +1]), each signal is additionally scaled by 1/norm_gain (1/4 in the code below); any samples that still fall outside the range are clipped.
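The relation between the target Leq, the "raw" Leq of the digital signal and the linear gain can be illustrated with a minimal sketch. It is not the notebook's implementation: it uses plain Python and a whole-signal RMS level as a stand-in for maad's `pressure2leq` and `mean_dB`, and the synthetic sine, the helper names `leq_db` and `calibration_gain`, and the 65 dB target are assumptions made only for illustration.

```python
import math

P_REF = 20e-6  # reference pressure in Pascals (20 micropascals)


def leq_db(signal):
    """Level in dB re 20 uPa computed from the RMS of the whole signal."""
    mean_square = sum(x * x for x in signal) / len(signal)
    return 10 * math.log10(mean_square / P_REF ** 2)


def calibration_gain(signal, target_leq):
    """Linear gain that moves the level of `signal` to `target_leq`."""
    difference = target_leq - leq_db(signal)
    return 10 ** (difference / 20)


# Synthetic digital signal: 1 s of a 440 Hz sine at 48 kHz, amplitude 0.1
fs = 48000
raw = [0.1 * math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]

target_leq = 65.0  # hypothetical playback level, in dB
gain = calibration_gain(raw, target_leq)

norm_gain = 4  # same normalisation factor as in the cell below
adapted = [x * gain / norm_gain for x in raw]   # signal that would be saved
pascals = [x * norm_gain for x in adapted]      # pressure signal recovered later

print(round(leq_db(pascals), 2))            # level of the recovered pressure signal
print(max(abs(x) for x in adapted) <= 1.0)  # True if the saved signal does not clip
```

Dividing by `norm_gain` lowers the level of every saved file by the same amount (20·log10(norm_gain) dB), so the relative energy between soundscapes is preserved, and multiplying by `norm_gain` again restores the pressure signal.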

In [4]:
# Path to folder containing original augmented soundscapes
audioFolderPath="../data/soundscapes_augmented/"
# Path to original responses.csv
csvPath="../data/csv_files/responses.csv"
ARAUScsv = pd.read_csv(csvPath)
# Path to save
folder="ARAUS-extended_soundscapes_v2"

count_clip=0
count_total=0
clipping=[]

for dirpath, dirnames, files in os.walk(audioFolderPath):
    dirpath_split=dirpath.split("soundscapes_augmented")
    # Iterate over all files in the current directory
    files.sort()
    for file in files:
        if file.endswith(".mp3") or file.endswith(".wav"):
            # Find the row in responses.csv corresponding to current audio
            audio_path = dirpath + "/"+file
            file_split = file.split("_")
            file_fold = int(file_split[1])
            file_participant = "ARAUS_" + file_split[3]
            file_stimulus = int(file_split[5].split(".")[0])
            audio_info_aug = ARAUScsv[ARAUScsv["fold_r"] == file_fold]
            audio_info_aug = audio_info_aug[
                audio_info_aug["stimulus_index"] == file_stimulus
            ]
            audio_info_aug = audio_info_aug[
                audio_info_aug["participant"] == file_participant
            ]
            # Get the original Leq of this audio 
            true_Leq=audio_info_aug["Leq_R_r"].values[0]
            # Load the stereo audio file
            audio_r,fs=load(audio_path, wav_calib=1.0, ch=1)
            audio_l,fs=load(audio_path, wav_calib=1.0, ch=0)
            # Calculate gain from true Leq and "raw" Leq
            rawR_Leq=mean_dB(pressure2leq(audio_r, fs, 0.125))
            difference=true_Leq-rawR_Leq
            gain=10**(difference/20)
            # Normalisation gain to avoid a lot of clipping
            norm_gain=4
            # Apply gain to audio
            safe_gain=gain/norm_gain
            adapted_audio_r=audio_r*safe_gain
            adapted_audio_l=audio_l*safe_gain
            adapted_signal=np.column_stack((adapted_audio_l, adapted_audio_r))
            max_gain=np.max(adapted_audio_r)
            min_gain=np.min(adapted_audio_r)
            # Clipping?
            if(max_gain>1 or min_gain<-1):
                count_clip=count_clip+1
                clipping.append([file, gain, max_gain,min_gain])
                adapted_signal=np.clip(adapted_signal, -1, 1)
            # Save audio
            savingPath=dirpath_split[0]+folder+dirpath_split[1]+"/"
            if not os.path.exists(savingPath):
                os.makedirs(savingPath)
            savingPathComplete=savingPath+file
            save_wav(adapted_signal, fs, savingPathComplete)
            
            count_total=count_total+1
            print("Done audio ", count_total,"/25440")

Done audio  1 /25440
Done audio  2 /25440
Done audio  3 /25440
...

# 2) Completing responses.csv
responses.csv, provided by the ARAUS authors, contains the data associated with the augmented soundscapes (participant answers, features of the audio, fold to which the audio belongs, base soundscape and masker used for the augmentation...).
However, we add some new columns to the dataframe, so that it is complete and handy for our operations.
1) We add the sound source of the maskers (bird, traffic, construction...) as new features --> 6 more columns
2) We add Pleasantness and Eventfulness values calculated from the participants' ratings --> 2 more columns
3) We add the wav gain that has to be applied to each digital signal to convert it to a pressure signal in Pascals --> 1 more column

In [4]:
responses = pd.read_csv(os.path.join('..','data/csv_files','responses.csv'), dtype = {'participant':str})

## 2.1) Maskers as features

One-hot encoding is a technique used to convert categorical variables into a numerical format that can be used for machine learning algorithms. It is particularly useful when dealing with categorical data that has no inherent order or hierarchy among its categories.

Here's how one-hot encoding works:

1) Identify Unique Categories:
First, you identify all the unique categories present in the categorical variable.

2) Create Binary Columns:
For each unique category, you create a new binary column. Each binary column corresponds to one unique category.

3) Assign Values:
In each binary column, you assign a value of 1 if the observation belongs to the category represented by that column, and 0 otherwise.
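
The three steps above can be sketched by hand on a toy list before applying them to the real data. The list of masker types below is made up for illustration (it is not read from responses.csv), and sorting the categories is only a choice to make the column order deterministic.

```python
# Toy list of masker types (hypothetical values, not read from responses.csv)
masker_types = ["silence", "water", "traffic", "water", "bird"]

# Step 1: identify the unique categories (sorted for a deterministic order)
categories = sorted(set(masker_types))

# Steps 2 and 3: one binary column per category,
# holding 1 where the row belongs to that category and 0 otherwise
one_hot = {
    f"masker_{category}": [int(value == category) for value in masker_types]
    for category in categories
}

for column, values in one_hot.items():
    print(column, values)
```

Each row has exactly one 1 across the binary columns, which is the property that makes the encoding "one-hot".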



Extract only the maskers column to generate the one-hot encoding

In [5]:
maskers=responses["masker"]

Now, extract the masker type from each masker file name (type_number.wav), then list the distinct masker types in order of first appearance.

In [6]:
# Generate maskers column with just masker type
maskers_type=maskers.str.split("_").str[0]
print(maskers_type)

0             silence
1             silence
2               water
3             traffic
4             traffic
             ...     
27250         traffic
27251         silence
27252    construction
27253         silence
27254         silence
Name: masker, Length: 27255, dtype: object


In [7]:
# Now count different maskers
maskers_variety=maskers_type.unique().tolist()
print(maskers_variety)

['silence', 'water', 'traffic', 'construction', 'wind', 'bird']


Now, generate the one-hot encoded dataframe

In [8]:
# maskers_type is a Series, so get_dummies encodes it directly
# (the `columns` argument only applies to DataFrame inputs)
one_hot_encoded=pd.get_dummies(maskers_type, prefix="masker", dtype=int)
print(one_hot_encoded)

       masker_bird  masker_construction  ...  masker_water  masker_wind
0                0                    0  ...             0            0
1                0                    0  ...             0            0
2                0                    0  ...             1            0
3                0                    0  ...             0            0
4                0                    0  ...             0            0
...            ...                  ...  ...           ...          ...
27250            0                    0  ...             0            0
27251            0                    0  ...             0            0
27252            0                    1  ...             0            0
27253            0                    0  ...             0            0
27254            0                    0  ...             0            0

[27255 rows x 6 columns]


Finally, concatenate the one-hot-encoded dataframe with the original one (the resulting csv is saved in section 2.4)

In [9]:
# Concatenate
responses_with_maskers=pd.concat([responses, one_hot_encoded], axis=1)
print(responses_with_maskers.shape, responses_with_maskers)

(27255, 166)        participant  fold_r  ... masker_water masker_wind
0      ARAUS_00001      -1  ...            0           0
1      ARAUS_00001       1  ...            0           0
2      ARAUS_00001       1  ...            1           0
3      ARAUS_00001       1  ...            0           0
4      ARAUS_00001       1  ...            0           0
...            ...     ...  ...          ...         ...
27250  ARAUS_10005       0  ...            0           0
27251  ARAUS_10005       0  ...            0           0
27252  ARAUS_10005       0  ...            0           0
27253  ARAUS_10005       0  ...            0           0
27254  ARAUS_10005      -1  ...            0           0

[27255 rows x 166 columns]


## 2.2) Calculate P and E

Ground truth labels refer to the actual, true, or correct values of the target variable (or labels) in a supervised machine learning task. In other words, these are the known outcomes or responses associated with the input data points. The purpose of ground truth labels is to provide a basis for training and evaluating machine learning models. Here, the ground truth labels are the Pleasantness (P) and Eventfulness (E) values computed from the participants' ratings of the eight perceptual attributes.

<img src="../data/images/PandE_axis.png" alt="Image Description" width="500">

<img src="../data/images/PandE_formulas.png" alt="Image Description" width="800">






Weights for ISO pleasantness:
- Pleasant: 1
- Eventful: 0
- Chaotic: -sqrt(2)/2
- Vibrant: sqrt(2)/2
- Uneventful: 0
- Calm: sqrt(2)/2
- Annoying: -1
- Monotonous: -sqrt(2)/2

Weights for ISO eventfulness:
- Pleasant: 0
- Eventful: 1
- Chaotic: sqrt(2)/2
- Vibrant: sqrt(2)/2
- Uneventful: -1
- Calm: -sqrt(2)/2
- Annoying: 0
- Monotonous: -sqrt(2)/2
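
As a worked example of how these weights turn one set of ratings into P and E, here is a minimal sketch in plain Python. It assumes ratings on a 1 to 5 scale, which is what the normaliser 4+sqrt(32) used in this notebook implies (it is the largest weighted sum such ratings can reach); the function name `iso_coordinates` and the two example rating sets are hypothetical.

```python
import math

ATTRIBUTES = ["pleasant", "eventful", "chaotic", "vibrant",
              "uneventful", "calm", "annoying", "monotonous"]
S = math.sqrt(2) / 2
P_WEIGHTS = [1, 0, -S, S, 0, S, -1, -S]   # weights for ISO pleasantness
E_WEIGHTS = [0, 1, S, S, -1, -S, 0, -S]   # weights for ISO eventfulness
NORM = 4 + math.sqrt(32)                  # maximum weighted sum for ratings in 1..5


def iso_coordinates(ratings):
    """Return normalised (P, E) for a dict mapping attribute name to rating."""
    values = [ratings[name] for name in ATTRIBUTES]
    p = sum(w * v for w, v in zip(P_WEIGHTS, values)) / NORM
    e = sum(w * v for w, v in zip(E_WEIGHTS, values)) / NORM
    return p, e


# Hypothetical answers: every attribute rated at the midpoint of the scale
neutral = dict.fromkeys(ATTRIBUTES, 3)

# Hypothetical answers: as pleasant as possible, neutral on the eventfulness axis
most_pleasant = {"pleasant": 5, "eventful": 3, "chaotic": 1, "vibrant": 5,
                 "uneventful": 3, "calm": 5, "annoying": 1, "monotonous": 1}

print(iso_coordinates(neutral))
print(iso_coordinates(most_pleasant))
```

Because the weights of each axis sum to zero, a uniform rating lands at the origin, while the most pleasant combination reaches P = 1 (up to floating-point rounding), which is why the normalised values stay in [-1, 1].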

In [10]:
attributes = ['pleasant', 'eventful', 'chaotic', 'vibrant', 'uneventful', 'calm', 'annoying', 'monotonous'] # Define attributes to extract from dataframes
ISOPl_weights = [1,0,-np.sqrt(2)/2,np.sqrt(2)/2, 0, np.sqrt(2)/2,-1,-np.sqrt(2)/2] # Define weights for each attribute in attributes in computation of ISO Pleasantness
ISOEv_weights = [0,1,np.sqrt(2)/2,np.sqrt(2)/2, -1, -np.sqrt(2)/2,0,-np.sqrt(2)/2] # Define weights for each attribute in attributes in computation of ISO Eventfulness

In [11]:
responses_with_maskers_PE = responses_with_maskers.copy() 
responses_with_maskers_PE['P_ground_truth'] = ((responses[attributes] * ISOPl_weights).sum(axis=1)/(4+np.sqrt(32))).values # These are normalised ISO Pleasantness values (in [-1,1])
responses_with_maskers_PE['E_ground_truth'] = ((responses[attributes] * ISOEv_weights).sum(axis=1)/(4+np.sqrt(32))).values # These are normalised ISO Eventfulness values (in [-1,1])
print(responses_with_maskers_PE.head())
print(responses_with_maskers_PE)

   participant  fold_r  ... P_ground_truth E_ground_truth
0  ARAUS_00001      -1  ...       0.603553       0.207107
1  ARAUS_00001       1  ...       0.457107      -0.500000
2  ARAUS_00001       1  ...       0.353553      -0.250000
3  ARAUS_00001       1  ...       0.457107      -0.189340
4  ARAUS_00001       1  ...       0.530330      -0.116117

[5 rows x 168 columns]
       participant  fold_r  ... P_ground_truth E_ground_truth
0      ARAUS_00001      -1  ...   6.035534e-01       0.207107
1      ARAUS_00001       1  ...   4.571068e-01      -0.500000
2      ARAUS_00001       1  ...   3.535534e-01      -0.250000
3      ARAUS_00001       1  ...   4.571068e-01      -0.189340
4      ARAUS_00001       1  ...   5.303301e-01      -0.116117
...            ...     ...  ...            ...            ...
27250  ARAUS_10005       0  ...  -2.299347e-17       0.207107
27251  ARAUS_10005       0  ...  -3.964466e-01       0.560660
27252  ARAUS_10005       0  ...  -9.267767e-01       0.383883
27253  A

## 2.3) Wav gains for each augmented soundscape

In the ARAUS dataset, responses.csv describes the more than 25k augmented soundscapes, labelled with psychoacoustic and acoustic parameters. Among these is Leq_R_r, the Leq of channel R for each audio.
The wav calibration needed to obtain this Leq was calculated and already applied in the "Adapt augmented audios gains" section, where we applied gain/norm_gain. Therefore, to transform the new ARAUS-extended audio wavs into peak-Pascals signals, norm_gain still needs to be applied. This gain is stored in a new column of the new csv.

In [13]:
norm_gain=6.44
n_audios=responses.shape[0]
gain_values=np.ones(n_audios)*norm_gain


In [14]:
# Create a new column with the generated values
responses_with_maskers_PE_gain=responses_with_maskers_PE.copy(deep=True)
responses_with_maskers_PE_gain.insert(loc=6, column='wav_gain', value=gain_values)
print(responses_with_maskers_PE_gain.columns)

Index(['participant', 'fold_r', 'soundscape', 'masker', 'smr',
       'stimulus_index', 'wav_gain', 'time_taken', 'is_attention', 'pleasant',
       ...
       'Leq_L_r', 'Leq_R_r', 'masker_bird', 'masker_construction',
       'masker_silence', 'masker_traffic', 'masker_water', 'masker_wind',
       'P_ground_truth', 'E_ground_truth'],
      dtype='object', length=169)


## 2.4) Save new generated dataset

In [15]:
# Save new dataset
responses_with_maskers_PE_gain.to_csv("../data/responses_SoundLights.csv", index=False)