# Reproducibility of ARAUS-extended dataset generation (part 1) - preparing original ARAUS
This script adapts the augmented audios and the data csv of the ARAUS dataset so that they are ready for generating the new ARAUS-extended dataset. It consists of 2 parts:
1) Apply gain to the augmented audios and re-save them
2) Complete responses.csv


In [2]:
import numpy as np
import pandas as pd
import os
from maad.util import mean_dB
from maad.spl import pressure2leq
from Mosqito.loadFiles import load

# 1) Adapt the gains of the augmented audios
The ARAUS augmented audios should be located in the folder ../data/soundscapes_augmented/, distributed in 25 folders according to the fold they belong to. 
Each of these audios was played (in the listening tests) at a certain Leq. This Leq value is provided in responses.csv by the ARAUS authors. 

To generate certain features (the ones we call "ARAUS features", as they aim to replicate the original ARAUS features), we need to know the gain that was applied to each wav file (audio) to reach the specified Leq. This linear gain, which converts the digital wav signal into a pressure signal in Pascals (peak), is calculated in this section, one per audio, and it must be stored. It is needed because this set of features is acoustical or psychoacoustical, and is therefore linked to the physical signal, not the digital one.

However, for the second set of features (the ones we call "Freesound features", as they are calculated with FreesoundExtractor() from the Essentia library), the audios need to be coherent with each other in terms of energy: audios that were played at a lower level should have a lower amplitude than those that were played at a higher level. The factor that gives this proportional relation is the gain described in the paragraph above. Therefore, we regenerate the whole set of augmented audios, applying the corresponding gain to each soundscape. To limit clipping (signal values outside the range [-1, +1]), every signal is also divided by a fixed normalisation gain (norm_gain, set to 6.44 in the code below), and any samples still out of range are clipped.
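
As a minimal sketch of the gain computation described above, the following example uses NumPy only. The helper level_db is a hypothetical simplification (one whole-signal level re 20 µPa) and is not the pressure2leq / mean_dB pipeline used in this notebook; the test tone and the target Leq are made-up values.

```python
import numpy as np

P_REF = 20e-6  # reference pressure in Pa (standard for dB SPL)

def level_db(signal):
    """Whole-signal level in dB re 20 uPa (hypothetical simplification)."""
    return 10 * np.log10(np.mean(signal ** 2) / P_REF ** 2)

def gain_for_target_leq(signal, target_leq):
    """Linear gain that scales `signal` so that its level equals `target_leq`."""
    return 10 ** ((target_leq - level_db(signal)) / 20)

# Made-up example: 1 s, 1 kHz tone with digital amplitude 0.1
fs = 44100
t = np.arange(fs) / fs
wav = 0.1 * np.sin(2 * np.pi * 1000 * t)

target_leq = 65.0  # dB, made-up value standing in for Leq_R_r
gain = gain_for_target_leq(wav, target_leq)
pressure = wav * gain  # signal expressed in Pascals

# Dividing every signal by the same constant keeps their relative energies
norm_gain = 6.44  # value used in the cell below
normalised = pressure / norm_gain
clips = bool(np.max(np.abs(normalised)) > 1)
```

Because the same constant divides every audio, the level differences between soundscapes are preserved in the normalised signals.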

In [4]:
""" # Path to folder containing original augmented soundscapes
audioFolderPath="../data/soundscapes_augmented/"
# Path to original responses.csv
csvPath="../data/csv_files/responses.csv"
ARAUScsv = pd.read_csv(csvPath)
# Path to save
folder="ARAUS-extended_soundscapes_v2"

count_clip=0
count_total=0
clipping=[]

for dirpath, dirnames, files in os.walk(audioFolderPath):
    dirpath_split=dirpath.split("soundscapes_augmented")
    # Iterate over all files in the current directory
    files.sort()
    for file in files:
        if file.endswith(".mp3") or file.endswith(".wav"):
            # Find the row in responses.csv corresponding to current audio
            audio_path = dirpath + "/"+file
            file_split = file.split("_")
            file_fold = int(file_split[1])
            file_participant = "ARAUS_" + file_split[3]
            file_stimulus = int(file_split[5].split(".")[0])
            audio_info_aug = ARAUScsv[ARAUScsv["fold_r"] == file_fold]
            audio_info_aug = audio_info_aug[
                audio_info_aug["stimulus_index"] == file_stimulus
            ]
            audio_info_aug = audio_info_aug[
                audio_info_aug["participant"] == file_participant
            ]
            # Get the original Leq of this audio 
            true_Leq=audio_info_aug["Leq_R_r"].values[0]
            # Load the stereo audio file
            audio_r,fs=load(audio_path, wav_calib=1.0, ch=1)
            audio_l,fs=load(audio_path, wav_calib=1.0, ch=0)
            # Calculate gain from true Leq and "raw" Leq
            rawR_Leq=mean_dB(pressure2leq(audio_r, fs, 0.125))
            difference=true_Leq-rawR_Leq
            gain=10**(difference/20)
            # Normalisation gain to avoid a lot of clipping
            norm_gain=6.44
            # Apply gain to audio
            safe_gain=gain/norm_gain
            adapted_audio_r=audio_r*safe_gain
            adapted_audio_l=audio_l*safe_gain
            adapted_signal=np.column_stack((adapted_audio_l, adapted_audio_r))
            # Peak amplitudes over both channels
            max_amp=np.max(adapted_signal)
            min_amp=np.min(adapted_signal)
            # Clipping?
            if(max_amp>1 or min_amp<-1):
                count_clip=count_clip+1
                clipping.append([file, gain, max_amp,min_amp])
                adapted_signal=np.clip(adapted_signal, -1, 1)
            # Save audio
            savingPath=dirpath_split[0]+folder+dirpath_split[1]+"/"
            if not os.path.exists(savingPath):
                os.makedirs(savingPath)
            savingPathComplete=savingPath+file
            save_wav(adapted_signal, fs, savingPathComplete)
            
            count_total=count_total+1
            print("Done audio ", count_total,"/25440") """

Done audio  1 /25440
Done audio  2 /25440
Done audio  3 /25440
Done audio  4 /25440
Done audio  5 /25440
...

# 2) Completing responses.csv
responses.csv, provided by the ARAUS authors, contains the data associated with the augmented soundscapes (participant answers, features of the audio, fold to which the audio belongs, base soundscape and masker used for the augmentation...).
However, we add some new columns to the dataframe, so that it is complete and handy for our operations.
1) We add the sound source of the maskers (bird, traffic, construction...) as new features --> 6 more columns
2) We add the Pleasantness and Eventfulness values calculated from the participants' ratings --> 2 more columns
3) We add the wav gain that has to be applied to each digital signal to convert it to a pressure signal in Pascals --> 1 more column

In [3]:
responses = pd.read_csv(os.path.join('..','data/main_files','responses.csv'), dtype = {'participant':str})

## 2.1) Maskers as features

One-hot encoding is a technique used to convert categorical variables into a numerical format that can be used for machine learning algorithms. It is particularly useful when dealing with categorical data that has no inherent order or hierarchy among its categories.

Here's how one-hot encoding works:

1) Identify Unique Categories:
First, you identify all the unique categories present in the categorical variable.

2) Create Binary Columns:
For each unique category, you create a new binary column. Each binary column corresponds to one unique category.

3) Assign Values:
In each binary column, you assign a value of 1 if the observation belongs to the category represented by that column, and 0 otherwise.
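
The three steps above can be sketched by hand on a toy column and compared with the single pandas call that performs the same transformation. The values in masker_type are made up for illustration.

```python
import pandas as pd

# Toy categorical column (made-up values)
masker_type = pd.Series(["silence", "water", "traffic", "water"])

# 1) Identify the unique categories
categories = sorted(masker_type.unique())

# 2) + 3) One binary column per category: 1 where the row matches, 0 otherwise
manual = pd.DataFrame(
    {f"masker_{c}": (masker_type == c).astype(int) for c in categories}
)

# pandas offers the same transformation in a single call
auto = pd.get_dummies(masker_type, prefix="masker", dtype=int)

print(manual)
```

Each row contains exactly one 1, in the column of its own category.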



Extract only the maskers column to generate the one-hot encoding

In [4]:
maskers=responses["masker"]

Now, extract the masker type from each masker file name (type_number.wav) and list the distinct masker types that appear in the dataset

In [5]:
# Generate maskers column with just masker type
maskers_type=maskers.str.split("_").str[0]
print(maskers_type)

0             silence
1             silence
2               water
3             traffic
4             traffic
             ...     
27250         traffic
27251         silence
27252    construction
27253         silence
27254         silence
Name: masker, Length: 27255, dtype: object


In [6]:
# Now count different maskers
maskers_variety=maskers_type.unique().tolist()
print(maskers_variety)

['silence', 'water', 'traffic', 'construction', 'wind', 'bird']


Now, generate the one-hot encoded dataframe

In [7]:
# The categories are detected automatically and the resulting columns are sorted alphabetically
one_hot_encoded=pd.get_dummies(maskers_type, prefix="masker", dtype=int)
print(one_hot_encoded)

       masker_bird  masker_construction  masker_silence  masker_traffic  \
0                0                    0               1               0   
1                0                    0               1               0   
2                0                    0               0               0   
3                0                    0               0               1   
4                0                    0               0               1   
...            ...                  ...             ...             ...   
27250            0                    0               0               1   
27251            0                    0               1               0   
27252            0                    1               0               0   
27253            0                    0               1               0   
27254            0                    0               1               0   

       masker_water  masker_wind  
0                 0            0  
1                 0          

Finally, concatenate the one-hot-encoded dataframe with the original one (the complete dataframe is saved as a new csv in section 2.4)

In [8]:
# Concatenate
responses_with_maskers=pd.concat([responses, one_hot_encoded], axis=1)
print(responses_with_maskers.shape, responses_with_maskers)

(27255, 166)        participant  fold_r                          soundscape  \
0      ARAUS_00001      -1  R0091_segment_binaural_44100_1.wav   
1      ARAUS_00001       1  R0079_segment_binaural_44100_1.wav   
2      ARAUS_00001       1  R0056_segment_binaural_44100_2.wav   
3      ARAUS_00001       1  R0046_segment_binaural_44100_2.wav   
4      ARAUS_00001       1  R0092_segment_binaural_44100_1.wav   
...            ...     ...                                 ...   
27250  ARAUS_10005       0    R1007_segment_binaural_44100.wav   
27251  ARAUS_10005       0    R1006_segment_binaural_44100.wav   
27252  ARAUS_10005       0    R1008_segment_binaural_44100.wav   
27253  ARAUS_10005       0    R1007_segment_binaural_44100.wav   
27254  ARAUS_10005      -1  R0091_segment_binaural_44100_1.wav   

                       masker  smr  stimulus_index  time_taken  is_attention  \
0           silence_00001.wav    0               1      98.328             0   
1           silence_00001.wav    6

## 2.2) Calculate P and E

Ground truth labels are the actual, known values of the target variable in a supervised machine learning task, that is, the known responses associated with the input data points. They provide the basis for training and evaluating machine learning models. Here, the ground truth labels are the Pleasantness (P) and Eventfulness (E) values computed from each participant's ratings of the eight attributes listed below.

<img src="../data/images/PandE_axis.png" alt="Pleasantness and Eventfulness axes" width="500">

<img src="../data/images/PandE_formulas.png" alt="Pleasantness and Eventfulness formulas" width="800">

Weights for ISO pleasantness:
- Pleasant: 1
- Eventful: 0
- Chaotic: -sqrt(2)/2
- Vibrant: sqrt(2)/2
- Uneventful: 0
- Calm: sqrt(2)/2
- Annoying: -1
- Monotonous: -sqrt(2)/2

Weights for ISO eventfulness:
- Pleasant: 0
- Eventful: 1
- Chaotic: sqrt(2)/2
- Vibrant: sqrt(2)/2
- Uneventful: -1
- Calm: -sqrt(2)/2
- Annoying: 0
- Monotonous: -sqrt(2)/2
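
As a worked example of how these weights are used, the sketch below computes normalised P and E for single responses. It assumes that every attribute is rated on a 1 to 5 scale, which is consistent with the normalisation constant 4+sqrt(32) used in this notebook; the helper iso_pe and the example ratings are made up for illustration.

```python
import numpy as np

attributes = ["pleasant", "eventful", "chaotic", "vibrant",
              "uneventful", "calm", "annoying", "monotonous"]
s = np.sqrt(2) / 2
w_pl = np.array([1, 0, -s, s, 0, s, -1, -s])   # pleasantness weights
w_ev = np.array([0, 1, s, s, -1, -s, 0, -s])   # eventfulness weights
norm = 4 + np.sqrt(32)  # largest attainable weighted sum for ratings in 1..5

def iso_pe(ratings):
    """Normalised (P, E) for one response, given the 8 ratings in `attributes` order."""
    r = np.asarray(ratings, dtype=float)
    return float(r @ w_pl / norm), float(r @ w_ev / norm)

# Most pleasant response: 5 on positive-weight attributes, 1 on negative-weight ones
most_pleasant = [5, 3, 1, 5, 3, 5, 1, 1]
p_max, e_at_p_max = iso_pe(most_pleasant)   # P reaches its upper bound

# Neutral response: all ratings equal, so positive and negative weights cancel
p_mid, e_mid = iso_pe([3] * 8)
```

For the most pleasant response the weighted sum is (5 - 1) + (sqrt(2)/2)(5 + 5 - 1 - 1) = 4 + sqrt(32), which is why dividing by this constant keeps P and E in [-1, 1].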

In [9]:
attributes = ['pleasant', 'eventful', 'chaotic', 'vibrant', 'uneventful', 'calm', 'annoying', 'monotonous'] # Define attributes to extract from dataframes
ISOPl_weights = [1,0,-np.sqrt(2)/2,np.sqrt(2)/2, 0, np.sqrt(2)/2,-1,-np.sqrt(2)/2] # Define weights for each attribute in attributes in computation of ISO Pleasantness
ISOEv_weights = [0,1,np.sqrt(2)/2,np.sqrt(2)/2, -1, -np.sqrt(2)/2,0,-np.sqrt(2)/2] # Define weights for each attribute in attributes in computation of ISO Eventfulness

In [10]:
responses_with_maskers_PE = responses_with_maskers.copy() 
responses_with_maskers_PE['P_ground_truth'] = ((responses[attributes] * ISOPl_weights).sum(axis=1)/(4+np.sqrt(32))).values # These are normalised ISO Pleasantness values (in [-1,1])
responses_with_maskers_PE['E_ground_truth'] = ((responses[attributes] * ISOEv_weights).sum(axis=1)/(4+np.sqrt(32))).values # These are normalised ISO Eventfulness values (in [-1,1])
print(responses_with_maskers_PE.head())
print(responses_with_maskers_PE)

   participant  fold_r                          soundscape             masker  \
0  ARAUS_00001      -1  R0091_segment_binaural_44100_1.wav  silence_00001.wav   
1  ARAUS_00001       1  R0079_segment_binaural_44100_1.wav  silence_00001.wav   
2  ARAUS_00001       1  R0056_segment_binaural_44100_2.wav    water_00047.wav   
3  ARAUS_00001       1  R0046_segment_binaural_44100_2.wav  traffic_00006.wav   
4  ARAUS_00001       1  R0092_segment_binaural_44100_1.wav  traffic_00016.wav   

   smr  stimulus_index  time_taken  is_attention  pleasant  eventful  ...  \
0    0               1      98.328             0         5         4  ...   
1    6               2      77.446             0         5         2  ...   
2   -3               3      67.102             0         4         2  ...   
3    6               4      56.640             0         5         4  ...   
4   -6               5      51.311             0         5         4  ...   

     Leq_L_r    Leq_R_r  masker_bird  masker_const

## 2.3) Wav gains for each augmented soundscape

In the ARAUS dataset, responses.csv describes the 25k+ augmented soundscapes, labelled with psychoacoustic and acoustic parameters. Among these, we can find Leq_R_r, which is the Leq of channel R for each audio.

To generate certain features (the ones we call "ARAUS features", as they aim to replicate the original ARAUS features), we need to know the gain that was applied to each wav file (audio) to reach the specified Leq. This linear gain, which converts the digital wav signal into a pressure signal in Pascals (peak), is calculated in this section, one per audio, and it must be stored. It is needed because some of the features are acoustical or psychoacoustical, and are therefore linked to the physical signal, not the digital one.

However, for the second set of features (the ones we call "Freesound features", as they are calculated with FreesoundExtractor() from the Essentia library), or for the CLAP embedding generation, the audios need to be coherent with each other in terms of energy: audios that were played at a lower level should have a lower amplitude than those that were played at a higher level. The factor that gives this proportional relation is the gain described in the paragraph above. Therefore, this gain value is also needed for this set of features.

This gain is stored in the new csv, in a new column.
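
The loop in the next cell matches each audio file to its row in the dataframe using the fold, participant and stimulus index encoded in the file name. The sketch below isolates that lookup. The file name, the toy dataframe and the helper parse_augmented_name are hypothetical; the name pattern is only inferred from the split indices used in the loop.

```python
import pandas as pd

def parse_augmented_name(file_name):
    """Split a name such as 'fold_1_participant_00001_stimulus_02.wav' into lookup keys."""
    parts = file_name.split("_")
    fold = int(parts[1])
    participant = "ARAUS_" + parts[3]
    stimulus = int(parts[5].split(".")[0])
    return fold, participant, stimulus

# Made-up stand-in for the responses dataframe
responses_toy = pd.DataFrame({
    "participant": ["ARAUS_00001", "ARAUS_00001", "ARAUS_00002"],
    "fold_r": [1, 1, 2],
    "stimulus_index": [2, 3, 2],
})

fold, participant, stimulus = parse_augmented_name(
    "fold_1_participant_00001_stimulus_02.wav"
)
mask = (
    (responses_toy["fold_r"] == fold)
    & (responses_toy["participant"] == participant)
    & (responses_toy["stimulus_index"] == stimulus)
)
row = responses_toy[mask].copy()   # copy, so a new column can be added safely
row["wav_gain"] = 0.5              # placeholder value, not a real gain
```

Combining the three conditions in one boolean mask and copying the result avoids assigning a new column to a view of the original dataframe.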

In [14]:
# Path to folder containing original augmented soundscapes
audioFolderPath="../data/soundscapes_augmented/"
# Path to original responses.csv
csvPath="../data/main_files/responses.csv"
ARAUScsv = pd.read_csv(csvPath)

# prepare output dataframe
columns=ARAUScsv.columns
newDF=pd.DataFrame(columns=columns)
newDF.insert(loc=6, column='wav_gain', value=None)


count_clip=0
count_total=0
clipping=[]

for dirpath, dirnames, files in os.walk(audioFolderPath):
    dirpath_split=dirpath.split("soundscapes_augmented")
    # Iterate over all files in the current directory
    files = list(files)
    files.sort()
    for file in files:
        if file.endswith(".mp3") or file.endswith(".wav"):
            print("count total ", count_total)
            # Find the row in responses.csv corresponding to current audio
            audio_path = dirpath + "/"+file
            file_split = file.split("_")
            file_fold = int(file_split[1])
            file_participant = "ARAUS_" + file_split[3]
            file_stimulus = int(file_split[5].split(".")[0])
            # Select the matching row (as a copy, so that adding a column does not modify a view)
            audio_info_aug = responses_with_maskers_PE[
                (responses_with_maskers_PE["fold_r"] == file_fold)
                & (responses_with_maskers_PE["stimulus_index"] == file_stimulus)
                & (responses_with_maskers_PE["participant"] == file_participant)
            ].copy()
            # Get the original Leq of this audio 
            true_Leq=audio_info_aug["Leq_R_r"].values[0]
            # Load the stereo audio file
            audio_r,fs=load(audio_path, wav_calib=1.0, ch=1)
            audio_l,fs=load(audio_path, wav_calib=1.0, ch=0)
            # Calculate gain from true Leq and "raw" Leq
            rawR_Leq=mean_dB(pressure2leq(audio_r, fs, 0.125))
            difference=true_Leq-rawR_Leq
            gain=10**(difference/20)
            #Add gain info
            audio_info_aug["wav_gain"]=gain
            #print(audio_info_aug)
            newDF = pd.concat([newDF, audio_info_aug], ignore_index=True)
            #gain_values[count_total]=gain
            count_total=count_total+1

count total  0
count total  1
count total  2
count total  3
count total  4


...

## 2.4) Save new generated dataset

In [15]:
# Save new dataset
newDF.to_csv("../data/responses_SoundLights.csv", index=False)