# Generate files for End2You

This notebook shows how to generate the label files and the `input_file.csv` that End2You requires, using the RECOLA database (used in AVEC 2016) as an example.

The AVEC2016 (RECOLA) folder is structured as follows:
```
AVEC2016
|─── features_audio
|─── features_video_appearance
|─── features_video_geometric
|─── ratings_gold_standard
|   |─── arousal
|   |   |─── train_1.arff
|   |   |─── train_2.arff
|   |   |─── ...
|   |─── valence
|   |   |─── train_1.arff
|   |   |─── train_2.arff
|   |   |─── ...
|─── ratings_individual
|─── ratings_individual_centred
|─── recordings_audio
|   |─── train_1.wav
|   |─── train_2.wav
|   |─── ...
|─── recordings_video
|   |─── train_1.mp4
|   |─── train_2.mp4
|   |─── ...
```
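
Before setting any paths, it can help to confirm that a local copy matches this layout. The sketch below is a hypothetical helper (`missing_subdirs` is not part of End2You) that only checks the sub-folders this notebook reads from; the folder names are taken from the tree above.

```python
from pathlib import Path

# Sub-folders this walkthrough reads from; names taken from the tree above.
REQUIRED_SUBDIRS = (
    'recordings_audio',
    'recordings_video',
    'ratings_gold_standard/arousal',
    'ratings_gold_standard/valence',
)

def missing_subdirs(root_dir):
    """Return the required sub-folders that are absent under root_dir."""
    root_dir = Path(root_dir)
    return [sub for sub in REQUIRED_SUBDIRS if not (root_dir / sub).is_dir()]
```

Calling `missing_subdirs('/path/to/AVEC2016/')` returns an empty list when every folder is in place, and otherwise lists the ones that still need attention.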

## Set Paths

In [None]:
import numpy as np
from pathlib import Path

root_dir = Path('/path/to/AVEC2016/')

audio_dir = root_dir / 'recordings_audio'
video_dir = root_dir / 'recordings_video'
ratings = root_dir / 'ratings_gold_standard'

arousal_path = ratings / 'arousal'
valence_path = ratings / 'valence'

In [None]:
# Choose which recordings to pair with the labels: 'audio' (.wav) or 'video' (.mp4).
modality = 'audio'
ext = 'mp4' if modality == 'video' else 'wav'
modality_dir = video_dir if modality == 'video' else audio_dir

## Read ARFF files

In [None]:
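import arff  # assumed to be the PyPI `arff` package, whose rows expose attributes by name

def _get_data(arff_path):
    """Read an ARFF rating file and return (timestamps, ratings) as (N, 1) float32 arrays."""
    raw_data = arff.load(arff_path)
    data, timestamp = [], []
    # Skip the first row, then collect the rating and its frame time from each remaining row.
    for x in list(raw_data)[1:]:
        data.append(x.GoldStandard)
        timestamp.append(x.frameTime)

    # Column vectors, so they can be stacked side by side later.
    timestamp = np.array(timestamp).reshape(-1, 1)
    data = np.array(data).reshape(-1, 1)

    return timestamp.astype(np.float32), data.astype(np.float32)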
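
`_get_data` relies on each row exposing a `frameTime` and a `GoldStandard` attribute. To make that assumption concrete, here is a dependency-free sketch that parses a tiny dense ARFF document. The sample is made up for illustration: the `Instance_name` attribute and all values are assumptions, and `parse_arff_text` is a hypothetical helper that handles only this simple case (no quoted values, no sparse rows).

```python
def parse_arff_text(text):
    """Parse a tiny dense ARFF document into (attribute names, rows)."""
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):  # skip blank lines and comments
            continue
        if in_data:
            # One comma-separated record per line, keyed by attribute name.
            rows.append(dict(zip(names, line.split(','))))
        elif line.lower().startswith('@attribute'):
            names.append(line.split()[1])
        elif line.lower().startswith('@data'):
            in_data = True
    return names, rows

# Illustrative sample only; real rating files may differ in header and values.
sample = """@relation GOLDSTANDARD
@attribute Instance_name string
@attribute frameTime numeric
@attribute GoldStandard numeric
@data
train_1,0.00,0.10
train_1,0.04,0.12
train_1,0.08,0.15
"""

names, rows = parse_arff_text(sample)
timestamps = [float(r['frameTime']) for r in rows]
ratings = [float(r['GoldStandard']) for r in rows]
```

Each entry of `timestamps` lines up with the entry of `ratings` at the same index, which is the pairing the label files depend on.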

## Create label files for End2You

In [None]:
save_end2you_files = Path('/path/to/save/end2you_files')
save_end2you_files.mkdir(parents=True, exist_ok=True)
save_label_files = save_end2you_files / 'labels'
save_label_files.mkdir(exist_ok=True)  # np.savetxt does not create missing folders

In [None]:
for mod_file in modality_dir.glob(f'*.{ext}'):
    file_stem = mod_file.stem  # file name without its extension, e.g. 'train_1'
    if 'test' in file_stem:
        continue  # skip recordings whose name contains 'test'

    arousal_label_path = arousal_path / (file_stem + '.arff')
    valence_label_path = valence_path / (file_stem + '.arff')

    # The timestamps are taken from the valence file, which is read last.
    timestamp, arousal_ratings = _get_data(str(arousal_label_path))
    timestamp, valence_ratings = _get_data(str(valence_label_path))

    # One row per frame: timestamp, arousal, valence.
    data = np.hstack([timestamp, arousal_ratings, valence_ratings])
    label_file = save_label_files / (file_stem + '.csv')

    np.savetxt(str(label_file), data, header='timestamp,arousal,valence', fmt='%f', delimiter=',')
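
Once the label files are written, a quick read-back can confirm their shape. Note that `np.savetxt` prefixes the header line with `# ` by default, so the first line of each file reads `# timestamp,arousal,valence`. The sketch below uses only the standard library; `read_label_file` is a hypothetical helper, and the demo file contents are made-up values in the same format.

```python
import csv
import tempfile
from pathlib import Path

def read_label_file(path):
    """Return (header, rows) from a label CSV, with rows as lists of floats."""
    header, rows = None, []
    with open(path, newline='') as f:
        for record in csv.reader(f):
            if not record:
                continue
            if record[0].startswith('#'):
                # Header line: drop the leading '# ' added by np.savetxt.
                record[0] = record[0].lstrip('# ')
                header = [column.strip() for column in record]
                continue
            rows.append([float(value) for value in record])
    return header, rows

# Demo on a temporary file with illustrative values.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'train_1.csv'
    demo.write_text(
        '# timestamp,arousal,valence\n'
        '0.000000,0.100000,-0.200000\n'
        '0.040000,0.120000,-0.180000\n'
    )
    header, rows = read_label_file(demo)
```

Every row should hold exactly three numbers; a row of a different length would point to a mismatch between the arousal and valence files.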

## Write `input_file.csv`

In [None]:
files = []
for mod_file in modality_dir.glob(f'*.{ext}'):
    if 'test' in mod_file.stem:
        continue  # skip recordings whose name contains 'test'

    # Pair each recording with the label file that shares its name.
    label_file = save_label_files / (mod_file.stem + '.csv')
    files.append([str(mod_file), str(label_file)])

In [None]:
save_inp_file = save_label_files / 'input_file.csv'

In [None]:
np.savetxt(str(save_inp_file), np.array(files), header='file,label_file', fmt='%s', delimiter=',')
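
As a final sanity check, the paths listed in `input_file.csv` can be verified against the disk. This is a minimal standard-library sketch; `missing_paths` is a hypothetical helper, and it assumes that no path contains a comma, since the file is plain comma-delimited text.

```python
import csv
from pathlib import Path

def missing_paths(input_file):
    """Return every path listed in input_file that does not exist on disk."""
    missing = []
    with open(input_file, newline='') as f:
        for record in csv.reader(f):
            if not record or record[0].startswith('#'):
                continue  # skip blank lines and the '# file,label_file' header
            missing.extend(p for p in record if not Path(p).exists())
    return missing
```

An empty list from `missing_paths(save_inp_file)` means every recording and label file referenced by the input file was found.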