## List of Datasets:

  * DISFA
  * CK+
  * UNBC Shoulder Pain Dataset
  * AM-FED (Affectiva)
  * FERA 2015 and 2017 Challenge Datasets

#### This notebook demonstrates how the FAU (facial action unit) codings and the images can be loaded for facial analysis

**conda env:** faus_dl *with Python 3 kernel*

Importing libraries:

In [12]:
import glob
import os
import sys
import pandas as pd
import numpy as np

In [13]:
from tqdm import tqdm #for visualising progress bar

### **DISFA**

http://mohammadmahoor.com/wp-content/uploads/2017/06/DiSFA_Paper_andAppendix_Final_OneColumn1-1.pdf

* 16.5 GB
* Intensity labels for 12 FAUs
* 27 subjects (15 male, 12 female)
* About 130,000 frames in total; each video has 4845 frames at 20 fps
* Page 5 of the paper gives the distribution of FAU occurrences; each FAU occurs in at least 5000 frames

#### Contains videos from the left and the right cameras in avi format, and also FAU labels.
**Videos Name formatting:**

*Videos_LeftCamera/{L,l}eftVideoSN001_{c,C}omp.avi* **or** *Videos_RightCamera/{R,r}ightVideoSN001_{c,C}omp.avi*

**FAU label formatting:**

*ActionUnit_Labels/SN001/SN001_au1.txt*

Inside this file, each row has the form *frame_no,intensity*, where intensity is one of {0,1,2,3,4,5}.
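As a minimal sketch of this row format, the snippet below parses a few label rows from an in-memory string. The rows are made up for illustration and are not real DISFA data; the column names match the ones used later in this notebook.

```python
import io
import pandas as pd

# Made-up rows in the DISFA label format: frame_no,intensity
sample_text = "1,0\n2,0\n3,2\n4,5\n"

# The files have no header row, so column names are supplied explicitly
labels = pd.read_csv(io.StringIO(sample_text), names=["frameNo", "value"])

# Frames where the AU has a non-zero intensity
active_frames = labels.loc[labels["value"] > 0, "frameNo"].tolist()
print(active_frames)
```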

In [14]:
DISFA_path='/media/amogh/Stuff/CMU/datasets/DISFA_data/'

In [15]:
% ls {DISFA_path}

ls: cannot access /media/amogh/Stuff/CMU/datasets/DISFA_data/: No such file or directory


In [16]:
DISFA_AU_path=DISFA_path+'ActionUnit_Labels/'
print(DISFA_AU_path)
Videos_right_path=DISFA_path+'Videos_RightCamera/'
print(Videos_right_path)

/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/


**DISFA FAUs**

Let's save a dataframe for each relevant FAU.

List of FAUs for which a csv file should be created (DISFA does not annotate every FAU we care about, so only those it provides are listed):

In [17]:
relevant_fau=[1,2,4,5,12,25,26]

In [18]:
sample_path=DISFA_AU_path+'SN001/SN001_au4.txt'
sample_path

'/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN001/SN001_au4.txt'

In [27]:
all_dict={}
for fau in relevant_fau:
    all_dict[f"dict_{fau}"]={}
all_dict

{'dict_1': {},
 'dict_2': {},
 'dict_4': {},
 'dict_5': {},
 'dict_12': {},
 'dict_25': {},
 'dict_26': {}}

In [28]:
subject_files_list=glob.glob(DISFA_AU_path+"/*")
subject_files_list

['/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN009',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN001',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN002',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN003',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN004',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN005',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN006',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN007',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN008',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN010',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN011',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN012',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN013',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/ActionUnit_Labels/SN016',
 '/med

In [29]:
for fau in tqdm(relevant_fau):
    for subject_path in subject_files_list:
        subject=os.path.basename(subject_path)
        reqd_file=subject_path+f"/{subject}_au{fau}.txt"
        reqd_df=pd.read_csv(reqd_file,names=["frameNo","value"])
        #frames with intensity > 1 are treated as occurrences of the FAU
        all_dict[f"dict_{fau}"][f"{subject}"]=(reqd_df['frameNo'][reqd_df["value"]>1]).values
        #frames with intensity 0 or 1 are treated as non-occurrences
        all_dict[f"dict_{fau}"][f"{subject}_neg"]=(reqd_df['frameNo'][reqd_df["value"]<=1]).values

100%|██████████| 7/7 [00:00<00:00, 11.39it/s]


Saving each per-FAU dictionary as a csv file:

In [30]:
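os.makedirs('DISFA_FAUs',exist_ok=True) #make sure the output folder exists
for key in all_dict.keys():
    fau_no=key.split('_')[1]
    #wrap each array in a Series so that columns of unequal length are padded with NaN
    reqd_df=pd.DataFrame(dict([(k,pd.Series(v)) for k,v in (all_dict[key]).items()]))
    reqd_df.to_csv('DISFA_FAUs/'+f'FAU{fau_no}.csv')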
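The `pd.Series` wrapper in the cell above matters because the per-subject frame arrays have different lengths. A small sketch with toy arrays (the values are made up) shows how the shorter column is padded with NaN:

```python
import numpy as np
import pandas as pd

# Toy frame-number arrays of unequal length (made-up values)
toy = {
    "SN001": np.array([10, 11, 12]),
    "SN001_neg": np.array([1, 2]),
}

# Building the frame from Series aligns on the index and pads with NaN
toy_df = pd.DataFrame({k: pd.Series(v) for k, v in toy.items()})
print(toy_df)
```

This padding is also why frame numbers can show up as floats (for example 414.0) when a csv is read back: a column that contains NaN cannot keep an integer dtype.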

This is what the data finally looks like:

In [41]:
sample_df=pd.read_csv("DISFA_FAUs/FAU1.csv")
sample_df.head()

Unnamed: 0.1,Unnamed: 0,SN001,SN001_neg,SN002,SN002_neg,SN003,SN003_neg,SN004,SN004_neg,SN005,...,SN028,SN028_neg,SN029,SN029_neg,SN030,SN030_neg,SN031,SN031_neg,SN032,SN032_neg
0,0,,1,414.0,1.0,1629.0,1.0,937.0,1.0,1011.0,...,,1,128.0,1.0,384.0,1.0,2684.0,1.0,553.0,1.0
1,1,,2,415.0,2.0,1630.0,2.0,938.0,2.0,1012.0,...,,2,129.0,2.0,385.0,2.0,2685.0,2.0,554.0,2.0
2,2,,3,416.0,3.0,1631.0,3.0,939.0,3.0,1013.0,...,,3,130.0,3.0,386.0,3.0,2686.0,3.0,555.0,3.0
3,3,,4,417.0,4.0,1632.0,4.0,940.0,4.0,1014.0,...,,4,131.0,4.0,387.0,4.0,2687.0,4.0,556.0,4.0
4,4,,5,418.0,5.0,1633.0,5.0,941.0,5.0,1015.0,...,,5,132.0,5.0,388.0,5.0,2688.0,5.0,557.0,5.0


**Extracting frames**

In [21]:
os.getcwd()

'/home/amogh/cmu/notebooks'

In [22]:
f'{Videos_right_path}*.avi'

'/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/*.avi'

In [23]:
glob.glob(f'{Videos_right_path}*.avi')[0].split('/')[-1].split('_')[0]

'RightVideoSN013'

Create one output folder per video (kept as a raw cell so that rerunning the notebook does not overwrite existing folders)

View files in Videos_right_path:

In [24]:
glob.glob(f'{Videos_right_path}*')

['/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN013_comp.avi',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN001',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN001_comp.avi',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN002',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN002_comp.avi',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN003',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN003_comp.avi',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN004',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN004_comp.avi',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN005',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN005_comp.avi',
 '/media/amogh/Stuff/CMU/datasets/DISFA_data

Use the following command to convert all frames of a video to bitmap images:<br>
ffmpeg -i "/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN001_comp.avi" "/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN001/out-%03d.bmp"
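To apply the same command to every video, the argument list can be built in Python. The sketch below is an assumption about how one might automate this, not code from the original notebook: `ffmpeg_command` is a hypothetical helper, the example path is shortened for illustration, and the output folder is derived from the file name in the same way as the `split('_')[0]` expression used earlier.

```python
import os

def ffmpeg_command(video_path, pattern="out-%03d.bmp"):
    """Return (output folder, ffmpeg argument list) for one video file."""
    video_dir, file_name = os.path.split(video_path)
    # 'RightVideoSN001_comp.avi' -> 'RightVideoSN001'
    folder_name = file_name.split("_")[0]
    out_dir = os.path.join(video_dir, folder_name)
    return out_dir, ["ffmpeg", "-i", video_path, os.path.join(out_dir, pattern)]

# Illustrative path only; substitute the real video location
out_dir, cmd = ffmpeg_command("/data/Videos_RightCamera/RightVideoSN001_comp.avi")
print(out_dir)
print(cmd)

# To actually run the conversion (requires ffmpeg on the PATH):
#   import subprocess
#   os.makedirs(out_dir, exist_ok=True)
#   subprocess.run(cmd, check=True)
```

Note that `%03d` pads frame indices to at least three digits, so with several thousand frames per video the file names will not sort lexicographically in frame order; a wider pattern such as `%05d` avoids that if sorted listings matter.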


In [25]:
for path in glob.glob(f'{Videos_right_path}*.avi'):
    print(path)

/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN013_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN001_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN002_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN003_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN004_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN005_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN006_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN007_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN008_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN009_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/Videos_RightCamera/RightVideoSN010_comp.avi
/media/amogh/Stuff/CMU/datasets/DISFA_data/

## **FERA 2015 and 2017 Challenges**

### **BP4D**

In [32]:
BP4D_base_path='/home/amogh/cmu/dataset/BP4D/'

**Loading FAUs:**
<br>
AUCoding has one csv file per sequence, e.g. F001_T1.csv
<br>
Each csv file has one row per frame; the first column holds the frame number, and columns 1-27 hold the codes for the corresponding FAUs.
<br>
Occurrence codes: 0 for absent, 1 for present, or 9 for missing data (unknown).
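Because 9 marks missing data rather than a third class, it is usually convenient to turn it into NaN before computing statistics. The sketch below does this on a tiny made-up table with the same layout (frame number in column '0', one column per AU); the values are illustrative, not real BP4D data.

```python
import io
import pandas as pd

# Made-up rows: frame number in column '0', then AU columns '1', '2', '3'
sample_csv = "0,1,2,3\n2440,0,1,9\n2441,1,1,9\n"
df = pd.read_csv(io.StringIO(sample_csv))

au_columns = list(df.columns[1:])

# Replace the 'missing' code 9 with NaN so it is ignored by mean()
clean = df[au_columns].mask(df[au_columns] == 9)

# Fraction of labelled frames in which each AU is present
occurrence_rate = clean.mean()
print(occurrence_rate)
```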



In [33]:
BP4D_AU_path=BP4D_base_path+'AUCoding/AUCoding/'

#### **BP4D example functions**

In [34]:
example_subject='F001'
example_sequence='T1'
example_file=f'{example_subject}_{example_sequence}.csv'
example_file

'F001_T1.csv'

In [7]:
df_example=pd.read_csv(BP4D_AU_path+example_file)
df_example.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,90,91,92,93,94,95,96,97,98,99
0,2440,0,0,9,0,0,0,0,9,0,...,9,9,9,9,9,9,9,9,9,9
1,2441,0,0,9,0,0,0,0,9,0,...,9,9,9,9,9,9,9,9,9,9
2,2442,0,0,9,0,0,0,0,9,0,...,9,9,9,9,9,9,9,9,9,9
3,2443,0,0,9,0,0,0,0,9,0,...,9,9,9,9,9,9,9,9,9,9
4,2444,0,0,9,0,0,0,0,9,0,...,9,9,9,9,9,9,9,9,9,9


In [8]:
frame_numbers=df_example['0']
frame_numbers

0      2440
1      2441
2      2442
3      2443
4      2444
5      2445
6      2446
7      2447
8      2448
9      2449
10     2450
11     2451
12     2452
13     2453
14     2454
15     2455
16     2456
17     2457
18     2458
19     2459
20     2460
21     2461
22     2462
23     2463
24     2464
25     2465
26     2466
27     2467
28     2468
29     2469
       ... 
523    2963
524    2964
525    2965
526    2966
527    2967
528    2968
529    2969
530    2970
531    2971
532    2972
533    2973
534    2974
535    2975
536    2976
537    2977
538    2978
539    2979
540    2980
541    2981
542    2982
543    2983
544    2984
545    2985
546    2986
547    2987
548    2988
549    2989
550    2990
551    2991
552    2992
Name: 0, Length: 553, dtype: int64

**Looking up the occurrence code (0 = not present, 1 = present, 9 = unlabelled) of a given FAU in a given frame**: the value for AU *n* is in the *n*th column of that frame's row

In [9]:
frame_no=2441
fau_no=3
df2=(df_example.loc[df_example['0']==frame_no]).iloc[:,fau_no]
df2
# df_example.loc[(frame_no),str(fau_no)]

1    9
Name: 3, dtype: int64

Counting the **number of frames in which each FAU occurs (1), does not occur (0), or isn't labelled (9)**: df_example.iloc[:,0:28] selects the frame-number column plus the columns for the 27 FAUs, and applying value_counts to each column gives the frequency of every value that occurs in it. The frame-number column adds one row per frame number to the result, so the iloc at the end keeps only the rows for '0', '1' and '9'.

In [10]:
df_example.iloc[:,0:28].apply(pd.Series.value_counts).iloc[:3,:]

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,18,19,20,21,22,23,24,25,26,27
0,,233.0,415.0,,527.0,432.0,263.0,553.0,,553.0,...,553.0,553.0,537.0,,553.0,553.0,553.0,,,496.0
1,,320.0,138.0,,26.0,121.0,290.0,,,,...,,,16.0,,,,,,,57.0
9,,,,553.0,,,,,553.0,,...,,,,553.0,,,,553.0,553.0,
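The same `apply(pd.Series.value_counts)` idiom is easier to follow on a toy table. In this sketch (made-up values, three hypothetical AU columns), each cell of the result is the number of rows holding that code in that column, and codes that never occur in a column come out as NaN:

```python
import pandas as pd

# Toy occurrence codes for three AU columns (made-up values)
toy = pd.DataFrame({
    "1": [0, 1, 1, 0],
    "2": [9, 9, 9, 9],
    "3": [0, 0, 0, 1],
})

# One value_counts per column, aligned on the union of observed codes
counts = toy.apply(pd.Series.value_counts)
print(counts)
```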


In [17]:
def label_getter(subj_req,seq_req,frame_req):
    au_file_reqd=BP4D_AU_path+f'{subj_req}_{seq_req}.csv'
    print (au_file_reqd)
    df_reqd=pd.read_csv(au_file_reqd)
    #row for the requested frame in the file just loaded
    list_of_faus=list(df_reqd.loc[df_reqd['0']==frame_req].iloc[0]) #list: [frame_no, fau1, fau2, fau3....]
    #choose FAUs for which you want the labels.
    list_mask=[1,2,4,6,7,10,12,14,15,17,23]
    list_final=[list_of_faus[i] for i in list_mask]
    return list_final

#running on an example.
label_getter('F001','T1',2440)

/home/amogh/cmu/dataset/BP4D/AUCoding/AUCoding/F001_T1.csv


[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]

**Analysing occurrence of FAUs**

In [None]:
list_of_AU_files=glob.glob(f'{BP4D_AU_path}/*')
list_of_AU_files

Let's try to create csv files for each relevant FAU.<br>
Rows - frameNo<br>
Columns - subject_sequence

FAUs for which you want a csv file to be created

In [36]:
relevant_fau=[1,2,4,5,7,12,25,26,43]

Initialising FAU dictionaries for the above FAUs

In [37]:
all_dict={}
for fau in relevant_fau:
    all_dict[f"dict_{fau}"]={}
all_dict

{'dict_1': {},
 'dict_2': {},
 'dict_4': {},
 'dict_5': {},
 'dict_7': {},
 'dict_12': {},
 'dict_25': {},
 'dict_26': {},
 'dict_43': {}}

For each csv file, collect the frames where each FAU occurs and does not occur:

In [38]:
for csv_file in tqdm(list_of_AU_files):
    csv_df=pd.read_csv(csv_file)
    csv_basename=os.path.basename(csv_file)
    user_seq=os.path.splitext(csv_basename)[0]
    #1st column is for frame_no, returns the occurrence dataframe for relevant FAUs
    csv_relevant_df=csv_df.iloc[:,np.insert(relevant_fau,0,0)] 
    #adding to dictionary for each FAU (frames coded 9, i.e. unlabelled, go into neither list):
    for fau in relevant_fau:
        occurrence_frames=(csv_relevant_df['0'][csv_relevant_df[str(fau)]==1]).values #array of all frames where the AU occurs
        not_occurrence_frames=(csv_relevant_df['0'][csv_relevant_df[str(fau)]==0]).values #array of all frames where the AU is absent
        all_dict[f'dict_{fau}'][f'{user_seq}']=occurrence_frames
        all_dict[f'dict_{fau}'][f'{user_seq}_neg']=not_occurrence_frames

100%|██████████| 328/328 [00:10<00:00, 32.69it/s]


Saving each per-FAU dictionary as a csv file:

In [39]:
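os.makedirs('BP4D_FAUs',exist_ok=True) #make sure the output folder exists
for key in all_dict.keys():
    fau_no=key.split('_')[1]
    #wrap each array in a Series so that columns of unequal length are padded with NaN
    reqd_df=pd.DataFrame(dict([(k,pd.Series(v)) for k,v in (all_dict[key]).items()]))
    reqd_df.to_csv('BP4D_FAUs/'+f'FAU{fau_no}.csv')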

Now that we have a csv file for each action unit, it is much easier to balance the data.
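One possible way to balance, sketched here as an assumption rather than the method used in this project, is to draw equally many positive and negative frames per sequence. `balanced_frames` is a hypothetical helper, and the toy table only mimics the layout of the per-FAU csv files (a positive column and a matching `_neg` column, padded with NaN).

```python
import numpy as np
import pandas as pd

def balanced_frames(fau_df, column, seed=0):
    """Sample equally many positive and negative frame numbers for one sequence."""
    pos = fau_df[column].dropna().astype(int).to_numpy()
    neg = fau_df[f"{column}_neg"].dropna().astype(int).to_numpy()
    n = min(len(pos), len(neg))  # size of the smaller class
    rng = np.random.default_rng(seed)
    return (rng.choice(pos, size=n, replace=False),
            rng.choice(neg, size=n, replace=False))

# Toy table with made-up frame numbers: 2 positive frames, 4 negative frames
toy = pd.DataFrame({
    "F001_T1": [5.0, 6.0, np.nan, np.nan],
    "F001_T1_neg": [1.0, 2.0, 3.0, 4.0],
})
pos_sample, neg_sample = balanced_frames(toy, "F001_T1")
print(pos_sample, neg_sample)
```

Undersampling the larger class is only one option; weighting the loss or oversampling the smaller class are alternatives, and the choice depends on how much data each FAU has.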

This is what the data finally looks like:

In [40]:
sample_df=pd.read_csv("BP4D_FAUs/FAU1.csv")
sample_df.head()

Unnamed: 0.1,Unnamed: 0,F001_T1,F001_T1_neg,F001_T2,F001_T2_neg,F001_T3,F001_T3_neg,F001_T4,F001_T4_neg,F001_T5,...,M018_T4,M018_T4_neg,M018_T5,M018_T5_neg,M018_T6,M018_T6_neg,M018_T7,M018_T7_neg,M018_T8,M018_T8_neg
0,0,2451.0,2440.0,836.0,721.0,,1.0,,664.0,,...,,1075.0,,237.0,,649.0,610.0,567.0,169.0,1.0
1,1,2452.0,2441.0,837.0,722.0,,2.0,,665.0,,...,,1076.0,,238.0,,650.0,611.0,568.0,170.0,2.0
2,2,2453.0,2442.0,838.0,723.0,,3.0,,666.0,,...,,1077.0,,239.0,,651.0,612.0,569.0,171.0,3.0
3,3,2454.0,2443.0,839.0,724.0,,4.0,,667.0,,...,,1078.0,,240.0,,652.0,613.0,570.0,172.0,4.0
4,4,2455.0,2444.0,840.0,725.0,,5.0,,668.0,,...,,1079.0,,241.0,,653.0,614.0,571.0,173.0,5.0


**Loading images:**
<br>
The BP4D training set is organised as *subject/sequence/frame_no.jpg*

In [126]:
BP4D_training_folder=BP4D_base_path+'BP4D-training/'

In [136]:
!ls $BP4D_training_folder

F001
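Given the *subject/sequence/frame_no.jpg* layout, the image for a frame number taken from the AU csv files can be located on disk. The sketch below is hypothetical: `find_frame_image` is not part of this notebook, and because the zero-padding of the frame file names is not stated here, it matches on the integer value of the file stem instead of assuming a fixed width. The demo runs on a throwaway temporary folder, not on BP4D itself.

```python
import os
import tempfile

def find_frame_image(training_folder, subject, sequence, frame_no):
    """Return the path of the .jpg whose numeric stem equals frame_no, or None."""
    seq_dir = os.path.join(training_folder, subject, sequence)
    for name in os.listdir(seq_dir):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".jpg" and stem.isdigit() and int(stem) == frame_no:
            return os.path.join(seq_dir, name)
    return None

# Demo on a temporary directory that mimics the folder layout
with tempfile.TemporaryDirectory() as root:
    seq_dir = os.path.join(root, "F001", "T1")
    os.makedirs(seq_dir)
    open(os.path.join(seq_dir, "2440.jpg"), "w").close()  # empty placeholder file
    found = find_frame_image(root, "F001", "T1", 2440)
    missing = find_frame_image(root, "F001", "T1", 1)

print(os.path.basename(found), missing)
```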


### **SEMAINE**

* Recordings of 150 participants; 959 conversations in total, ~5 minutes each
* FACS annotations for 181 frames
