<center><h1>1. Adience Overview</h1></center>

## About this notebook

In order to evaluate our model, we need to split our available data into training, validation and testing portions. That way, we can use the training split to learn parameters, the validation set to decide on hyperparameters and the testing set to determine the final performance of our models on unseen data. This kind of model evaluation on held-out data is the idea behind <b>Cross Validation</b>.

However, it is often difficult to decide which portions of the dataset should be used for training, testing and validation, because the quality of the splits has a non-trivial effect on the model's measured performance. Luckily, the question of which portion to use for testing is already answered by the Adience Benchmark guidelines: the 5th fold (that is, fold 4, because the folds are indexed starting at 0) is to be used as the testing set.

With that question out of the way, we still need to decide how to split the remaining data into training and validation splits. The technique of <b>K-Fold Cross Validation</b> answers this question by creating K different training and validation splits out of the remaining data, then evaluating our model on all of them and taking the average performance across all splits as our measure of accuracy / 'goodness'. We'll be using a specific variant of K-Fold Cross Validation called <b>Stratified K-Fold Cross Validation</b>.

Stratified K-Fold Cross Validation forces the K different training-validation splits to have roughly the same distribution of classes in each of them. The idea here is to prevent any fold from having a non-trivial excess of a given class that would then bias the classifier created on it.
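
As a rough illustration of stratification, the sketch below runs scikit-learn's `StratifiedKFold` on a made-up, imbalanced label array and records the share of the minority class in each validation fold. The toy labels and variable names are assumptions for illustration only and are not part of the Adience data; note that the cells later in this notebook use `StratifiedShuffleSplit` rather than `StratifiedKFold`.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy, imbalanced labels: 80 samples of class "a" and 20 of class "b" (made up)
toy_y = np.array(["a"] * 80 + ["b"] * 20)
toy_X = np.zeros((len(toy_y), 1))  # placeholder features, not used by the splitter

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Share of class "b" in each validation fold
valid_fracs_b = []
for train_idx, valid_idx in skf.split(toy_X, toy_y):
    valid_fracs_b.append(float(np.mean(toy_y[valid_idx] == "b")))

print(valid_fracs_b)
```

Every validation fold should contain a share of class "b" close to the overall 20%, which is the property stratification is meant to provide.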

Lastly, after our folds have been created, we'll organize them into directories in such a way that Keras's image processing tools can make use of them without extra (often hacky) workarounds.
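
To make the target layout concrete, here is a small sketch of the one-subdirectory-per-class structure that Keras's `flow_from_directory` expects. The temporary directory, class names and file names are placeholders chosen for illustration, not the paths used by this notebook.

```python
import os
import tempfile

# Build a throwaway directory tree: <root>/train/<class_name>/<image_id>.jpg
root = tempfile.mkdtemp()
toy_images = [("f", 0), ("f", 1), ("m", 2)]  # (class label, image id), made up

for label, image_id in toy_images:
    class_dir = os.path.join(root, "train", label)
    if not os.path.isdir(class_dir):
        os.makedirs(class_dir)
    # Empty placeholder files stand in for real images
    open(os.path.join(class_dir, "%d.jpg" % image_id), "w").close()

# Each class ends up in its own subdirectory of the split directory
class_dirs = sorted(os.listdir(os.path.join(root, "train")))
print(class_dirs)
```

With this layout, the class of an image is implied by the directory it sits in, so no separate label file is needed at training time.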

## Suggested Sources

If the explanation above didn't quite make sense, I recommend reviewing the sources below.

1. https://www.youtube.com/watch?v=TIgfjmp-4BA
2. https://stats.stackexchange.com/questions/117643/why-use-stratified-cross-validation-why-does-this-not-damage-variance-related-b

## Creating the foundational splits on the Adience Benchmark

Adience Benchmark source: http://www.openu.ac.il/home/hassner/Adience/data.html

In [1]:
# Necessary imports
import os
import pandas as pd
import numpy  as np
from sklearn.model_selection import StratifiedShuffleSplit

In [2]:
# Ensure reproducibility
np.random.seed(0)

In [3]:
# Path constants
ADIENCE_TEMPLATE  = "../../data/adience/%s"
METADATA_TEMPLATE = "../../data/adience/meta/fold_%s_data.txt"
IMG_TEMPLATE      = "../../data/adience/aligned/%s/landmark_aligned_face.%d.%s"

AGE_TRAIN_TEMPLATE = "../../data/adience/keras_format/age/train/%d/%s/%d.jpg"
AGE_VALID_TEMPLATE = "../../data/adience/keras_format/age/valid/%d/%s/%d.jpg"
AGE_TEST_TEMPLATE  = "../../data/adience/keras_format/age/test/%s/%d.jpg"

GENDER_TRAIN_TEMPLATE = "../../data/adience/keras_format/gender/train/%d/%s/%d.jpg"
GENDER_VALID_TEMPLATE = "../../data/adience/keras_format/gender/valid/%d/%s/%d.jpg"
GENDER_TEST_TEMPLATE  = "../../data/adience/keras_format/gender/test/%s/%d.jpg"

RELEVANT_COLS = ["user_id","face_id","original_image","gender","age"]

META_SAVE_TEMPLATE = "../../data/adience/meta/%s_%d.csv"

In [4]:
# Independent constants
NUM_TRAIN_FOLDS = 4
IDX_TEST_FOLD   = 4
NUM_SPLITS      = 5

In [5]:
# Dependent constants
METADATA_TEST = METADATA_TEMPLATE % IDX_TEST_FOLD

In [6]:
# Extracting the test partition and 
# creating a combined train_validation (trvl) superset

folds = []
for index in range(NUM_TRAIN_FOLDS):
    path = METADATA_TEMPLATE % index
    folds.append(pd.read_csv(filepath_or_buffer=path, sep="\t"))
    
trvl_meta = pd.concat(folds, ignore_index=True)
test_meta = pd.read_csv(filepath_or_buffer=METADATA_TEST, sep="\t")

trvl_meta = trvl_meta[RELEVANT_COLS]
test_meta = test_meta[RELEVANT_COLS]

At this point, we've created an initial partition of our dataset into a testing split and its complement. Both splits still need more processing before they're ready for primetime.

## Overview of current splits

### Testing split

In [7]:
test_meta.head()

Unnamed: 0,user_id,face_id,original_image,gender,age
0,115321157@N03,1744,12111738395_a7f715aa4e_o.jpg,m,"(4, 6)"
1,115321157@N03,1745,12112413505_0aea8e17c6_o.jpg,m,"(48, 53)"
2,115321157@N03,1744,12112392255_995532c2f0_o.jpg,m,"(4, 6)"
3,115321157@N03,1746,12112392255_995532c2f0_o.jpg,m,"(25, 32)"
4,115321157@N03,1747,12112392255_995532c2f0_o.jpg,m,"(25, 32)"


In [8]:
test_meta.shape

(3816, 5)

In [9]:
test_meta["gender"].value_counts()

f    1848
m    1597
u     286
Name: gender, dtype: int64

In [10]:
test_meta["age"].value_counts()

(25, 32)     1056
(4, 6)        570
(38, 43)      502
(0, 2)        483
(8, 12)       340
(60, 100)     257
(48, 53)      241
(15, 20)      227
None           62
35             36
57             17
55             11
45              6
(38, 48)        5
32              3
Name: age, dtype: int64

Aha! There are 'None'-valued ages. That's problematic, and we'll need to do something about those. Thankfully, there are apparently only a few.

In [11]:
test_meta.isnull().values.any()

True

This is unexpected: there are null / NaN values in this split. Let us inspect that further.

In [12]:
test_meta[test_meta.isnull().any(axis=1)].head()

Unnamed: 0,user_id,face_id,original_image,gender,age
2805,7285955@N06,2059,9489513876_86d04ff460_o.jpg,,
3091,8007224@N07,2118,8917875562_c7925a4e2b_o.jpg,,"(25, 32)"
3092,8007224@N07,2118,8755673180_d6945bff9f_o.jpg,,"(25, 32)"
3096,8007224@N07,2118,11866643475_3a8d5ef09f_o.jpg,,"(25, 32)"
3103,8007224@N07,2118,8917875226_98976f714e_o.jpg,,"(25, 32)"


### Test complement set

In [13]:
trvl_meta.head()

Unnamed: 0,user_id,face_id,original_image,gender,age
0,30601258@N03,1,10399646885_67c7d20df9_o.jpg,f,"(25, 32)"
1,30601258@N03,2,10424815813_e94629b1ec_o.jpg,m,"(25, 32)"
2,30601258@N03,1,10437979845_5985be4b26_o.jpg,f,"(25, 32)"
3,30601258@N03,3,10437979845_5985be4b26_o.jpg,m,"(25, 32)"
4,30601258@N03,2,11816644924_075c3d8d59_o.jpg,m,"(25, 32)"


In [14]:
trvl_meta.shape

(15554, 5)

In [15]:
trvl_meta["gender"].value_counts()

f    7524
m    6523
u     813
Name: gender, dtype: int64

In [16]:
trvl_meta["age"].value_counts()

(25, 32)     3948
(0, 2)       2005
(38, 43)     1791
(8, 12)      1784
(4, 6)       1570
(15, 20)     1415
None          686
(60, 100)     615
(48, 53)      589
35            257
13            168
22            149
34            105
23             96
45             82
(27, 32)       77
55             65
36             56
(38, 42)       46
3              18
29             11
57              7
58              5
2               3
56              2
42              1
(38, 48)        1
46              1
(8, 23)         1
Name: age, dtype: int64

This is weird. There should not be ages outside the expected ranges, yet some labels are single numbers or non-standard ranges. Let's look into this as well.

In [17]:
trvl_meta.isnull().values.any()

True

It is true that there are NaN values in this set as well. We'll have to fix that too.

## Fixing dataset inconsistencies and filling missing values

The summaries above show problems with the dataset. Namely, NaNs as gender values, None as age values and inconsistent / overlapping labels for age. We address those issues right here.

### Dropping rows with NaN gender and None age simultaneously

Rows with NaN gender and None age are the most problematic because we cannot average any values in order to 'guesstimate' their real values. A possible solution would be to fill in those values by running a state-of-the-art model such as Face++ or Microsoft's Facial Features model, but we think that'd be unnecessary given that there aren't that many rows with these two characteristics at the same time.

In [18]:
# Conditions for dropping
NaN_gender_trvl = trvl_meta["gender"].isnull()
None_age_trvl   = trvl_meta["age"] == "None"
trvl_bad_rows    = NaN_gender_trvl & None_age_trvl
trvl_bad_indices = trvl_meta[trvl_bad_rows].index.values

# Dropping rows when two conditions are present
trvl_meta.drop(labels=trvl_bad_indices, inplace=True)

In [19]:
# Conditions for dropping
NaN_gender_test = test_meta["gender"].isnull()
None_age_test   = test_meta["age"] == "None"
test_bad_rows   = NaN_gender_test & None_age_test
test_bad_indices = test_meta[test_bad_rows].index.values

# Dropping rows when two conditions are present
test_meta.drop(labels=test_bad_indices, inplace=True)

### Fixing bad age ranges

We already know that some of the age labels are, for some reason, not declared as a range like most other labels, but as a single number. This is problematic. The next section transforms single-number ages into their matching ranges.

#### Testing complement

In [20]:
trvl_meta["age"].value_counts()

(25, 32)     3948
(0, 2)       2005
(38, 43)     1791
(8, 12)      1784
(4, 6)       1570
(15, 20)     1415
(60, 100)     615
(48, 53)      589
35            257
13            168
22            149
34            105
23             96
45             82
(27, 32)       77
55             65
36             56
(38, 42)       46
None           40
3              18
29             11
57              7
58              5
2               3
56              2
42              1
(38, 48)        1
46              1
(8, 23)         1
Name: age, dtype: int64

So we see that a number of age labels are not in range format. Since there are only a handful of distinct labels of this kind, we'll fix them manually in the cell below.

In [21]:
trvl_meta["age"] = trvl_meta["age"].replace("13","(8, 13)")
trvl_meta["age"] = trvl_meta["age"].replace("(8, 12)","(8, 13)")
trvl_meta["age"] = trvl_meta["age"].replace("42","(38, 43)")
trvl_meta["age"] = trvl_meta["age"].replace("2","(0, 2)")
trvl_meta["age"] = trvl_meta["age"].replace("29","(25, 32)")

What follows is a non-trivial change. For some reason, some of the ages don't fit into any of the labeled ranges in the dataset. So we're going to take those ages and simply group them under their closest label, even though the age does not necessarily fall within that label's range.
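
For comparison, the idea of assigning a single age to its closest range could also be expressed programmatically. The helper below is a hypothetical sketch, not what this notebook does: `AGE_RANGES` and `closest_age_range` are names introduced here for illustration, and the manual replacements in the next cell do not follow this distance rule in every case (for example, "45" is mapped to "(48, 53)" there, while this rule would pick "(38, 43)").

```python
# Canonical age ranges used as targets in this notebook (inclusive bounds)
AGE_RANGES = [(0, 2), (4, 6), (8, 13), (15, 20),
              (25, 32), (38, 43), (48, 53), (60, 100)]

def closest_age_range(age):
    """Return the label of the range nearest to a single numeric age."""
    def distance(bounds):
        low, high = bounds
        if low <= age <= high:
            return 0
        return min(abs(age - low), abs(age - high))
    # min() keeps the first range on ties, i.e. the lower one
    low, high = min(AGE_RANGES, key=distance)
    return "(%d, %d)" % (low, high)

print(closest_age_range(29))  # 29 lies inside (25, 32)
print(closest_age_range(22))  # 22 is 2 away from (15, 20) and 3 away from (25, 32)
```

Ties between two neighbouring ranges are an arbitrary design choice; here they resolve to the lower range.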

In [22]:
trvl_meta["age"] = trvl_meta["age"].replace("35","(25, 32)")
trvl_meta["age"] = trvl_meta["age"].replace("22","(15, 20)")
trvl_meta["age"] = trvl_meta["age"].replace("34","(25, 32)")
trvl_meta["age"] = trvl_meta["age"].replace("23","(25, 32)")
trvl_meta["age"] = trvl_meta["age"].replace("45","(48, 53)")
trvl_meta["age"] = trvl_meta["age"].replace("55","(48, 53)")
trvl_meta["age"] = trvl_meta["age"].replace("36","(38, 43)")
trvl_meta["age"] = trvl_meta["age"].replace("3","(0, 2)")


trvl_meta["age"] = trvl_meta["age"].replace("57","(60, 100)")
trvl_meta["age"] = trvl_meta["age"].replace("58","(60, 100)")
trvl_meta["age"] = trvl_meta["age"].replace("56","(60, 100)")
trvl_meta["age"] = trvl_meta["age"].replace("46","(48, 53)")
trvl_meta["age"] = trvl_meta["age"].replace("(27, 32)","(25, 32)")
trvl_meta["age"] = trvl_meta["age"].replace("(38, 48)","(38, 43)")
trvl_meta["age"] = trvl_meta["age"].replace("(38, 42)","(38, 43)")
trvl_meta["age"] = trvl_meta["age"].replace("(8, 23)","(8, 13)")

In [23]:
trvl_meta["age"].value_counts()

(25, 32)     4494
(0, 2)       2026
(8, 13)      1953
(38, 43)     1895
(4, 6)       1570
(15, 20)     1564
(48, 53)      737
(60, 100)     629
None           40
Name: age, dtype: int64

In [24]:
# Dropping None ages. Not worth manually tagging them
trvl_meta = trvl_meta[trvl_meta["age"] != "None"]

In [25]:
trvl_meta["age"].value_counts()

(25, 32)     4494
(0, 2)       2026
(8, 13)      1953
(38, 43)     1895
(4, 6)       1570
(15, 20)     1564
(48, 53)      737
(60, 100)     629
Name: age, dtype: int64

Now we do the same procedure as above, but with the testing set.

#### Testing set

In [26]:
test_meta["age"].value_counts()

(25, 32)     1056
(4, 6)        570
(38, 43)      502
(0, 2)        483
(8, 12)       340
(60, 100)     257
(48, 53)      241
(15, 20)      227
35             36
57             17
55             11
45              6
(38, 48)        5
32              3
Name: age, dtype: int64

In [27]:
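test_meta["age"] = test_meta["age"].replace("35", "(25, 32)")
test_meta["age"] = test_meta["age"].replace("57", "(60, 100)")
test_meta["age"] = test_meta["age"].replace("55", "(48, 53)")
# NOTE: "45" goes to "(38, 43)" here but to "(48, 53)" in the test complement above
test_meta["age"] = test_meta["age"].replace("45", "(38, 43)")
test_meta["age"] = test_meta["age"].replace("32", "(25, 32)")
# NOTE: this split has no "(8, 14)" label, so the next line changes nothing and
# "(8, 12)" is kept as is, unlike the "(8, 13)" label used in the test complement
test_meta["age"] = test_meta["age"].replace("(8, 14)", "(8, 13)")
test_meta["age"] = test_meta["age"].replace("(38, 48)", "(38, 43)")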

In [28]:
test_meta["gender"].value_counts()

f    1848
m    1597
u     286
Name: gender, dtype: int64

In [29]:
test_meta["age"].value_counts()

(25, 32)     1095
(4, 6)        570
(38, 43)      513
(0, 2)        483
(8, 12)       340
(60, 100)     274
(48, 53)      252
(15, 20)      227
Name: age, dtype: int64

### NaN genders

In [30]:
# Despite the name, this mask is True where gender is NOT null
NaN_gender_trvl = trvl_meta["gender"].notnull()
NaN_gender_trvl.value_counts()

True     14820
False       48
Name: gender, dtype: int64

In [31]:
# Despite the name, this mask is True where gender is NOT null
NaN_gender_test = test_meta["gender"].notnull()
NaN_gender_test.value_counts()

True     3731
False      23
Name: gender, dtype: int64

Since the number of NaN genders is quite small, there's no problem with dropping those as well.

In [32]:
test_meta = test_meta[NaN_gender_test]
trvl_meta = trvl_meta[NaN_gender_trvl]

Checking the value counts again as a sanity check.

In [33]:
NaN_gender_trvl = trvl_meta["gender"].notnull()
NaN_gender_trvl.value_counts()

True    14820
Name: gender, dtype: int64

In [34]:
NaN_gender_test = test_meta["gender"].notnull()
NaN_gender_test.value_counts()

True    3731
Name: gender, dtype: int64

Perfect! The dataset has been fully curated.

## No U gender

In [35]:
binary_gender = test_meta["gender"] != "u"
test_meta = test_meta[binary_gender]
test_meta.reset_index(inplace=True)

In [36]:
test_meta["gender"].value_counts()

f    1848
m    1597
Name: gender, dtype: int64

In [37]:
binary_gender = trvl_meta["gender"] != "u"
trvl_meta = trvl_meta[binary_gender]
trvl_meta.reset_index(inplace=True)

In [38]:
trvl_meta["gender"].value_counts()

f    7484
m    6523
Name: gender, dtype: int64

## Generating k-folds of train and validation splits

In this section we create stratified training and validation splits out of the complement of the testing set. The Adience Benchmark README file suggests 5 splits; note, however, that the cells below set num_splits to 1, so a single stratified training/validation split is generated.

Let us take this moment to discuss proportions. Each fold of the original, unprocessed dataset contains roughly the same amount of data. Given that there are a total of 5 folds and one of them is reserved for testing, the percentage of the dataset used for testing will be roughly 20%. Out of the remaining 80%, we reserve 25% for validation, which is 20% of the whole dataset. This means that our dataset split follows roughly the following proportions.

<ul>
<li><b>Testing:</b> 20%</li>
<li><b>Validation:</b> 20%</li>
<li><b>Training:</b> 60%</li>
</ul>

This configuration is not accidental: we've chosen these proportions because they're common practice.
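
The arithmetic behind these proportions can be checked in a few lines. This is only a sketch of the calculation described above; the variable names other than `validation_prop` are introduced here for illustration.

```python
# Fractions taken from the text: 1 of 5 folds for testing,
# then 25% of the remaining data for validation
test_frac = 1.0 / 5
validation_prop = 0.25

remaining = 1.0 - test_frac
valid_frac = remaining * validation_prop
train_frac = remaining * (1.0 - validation_prop)

# prints: test=0.20 valid=0.20 train=0.60
print("test=%.2f valid=%.2f train=%.2f" % (test_frac, valid_frac, train_frac))
```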

In [39]:
num_splits = 1
validation_prop = 0.25

In [40]:
sss = StratifiedShuffleSplit(n_splits=num_splits, test_size=validation_prop, random_state=0)

### Prepping the test complement set

For the stratified partitioning tool to function, every class needs to appear at least twice in the data being split. Given the amount of data we've dropped, we should make sure this is still the case. Observe below the current frequency of targets for age.

In [41]:
trvl_meta["age"].value_counts()

(25, 32)     4461
(8, 13)      1946
(38, 43)     1895
(4, 6)       1569
(15, 20)     1564
(0, 2)       1219
(48, 53)      732
(60, 100)     621
Name: age, dtype: int64

Now we're good to go! 
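
The same requirement (every class appearing at least twice) can also be checked programmatically instead of by eye. The helper below is a minimal sketch on made-up labels; `rare_classes` and `toy_labels` are hypothetical names introduced for illustration.

```python
from collections import Counter

def rare_classes(labels, min_count=2):
    """Return the classes that occur fewer than min_count times."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n < min_count)

toy_labels = ["(0, 2)", "(0, 2)", "(4, 6)", "(4, 6)", "(8, 13)"]  # made up
print(rare_classes(toy_labels))  # prints ['(8, 13)']
```

An empty result means every class is frequent enough for a stratified split.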

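Here I'll introduce a minor but necessary fix to the dataset. Parentheses are special characters for the UNIX shell, so directory names containing them would break the unquoted mkdir and cp commands used later in this notebook. Thus, I'll replace the parentheses in the age column with '_' (and also replace spaces with '-' and drop commas).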
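
On a single label, the renaming amounts to the chain of plain string replacements sketched below. The function name `age_label_to_dirname` is hypothetical and only used for illustration; the notebook itself applies the same replacements column-wise with pandas.

```python
def age_label_to_dirname(label):
    """Turn an age label such as "(25, 32)" into a shell-friendly name."""
    return (label.replace("(", "_")
                 .replace(")", "_")
                 .replace(" ", "-")
                 .replace(",", ""))

print(age_label_to_dirname("(25, 32)"))  # -> _25-32_
```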

In [42]:
trvl_meta["age"] = trvl_meta["age"].str.replace("(", "_")
trvl_meta["age"] = trvl_meta["age"].str.replace(")", "_")
trvl_meta["age"] = trvl_meta["age"].str.replace(" ", "-")
trvl_meta["age"] = trvl_meta["age"].str.replace(",", "")
trvl_meta["age"].value_counts()

_25-32_     4461
_8-13_      1946
_38-43_     1895
_4-6_       1569
_15-20_     1564
_0-2_       1219
_48-53_      732
_60-100_     621
Name: age, dtype: int64

In [43]:
test_meta["age"] = test_meta["age"].str.replace("(", "_")
test_meta["age"] = test_meta["age"].str.replace(")", "_")
test_meta["age"] = test_meta["age"].str.replace(" ", "-")
test_meta["age"] = test_meta["age"].str.replace(",", "")
test_meta["age"].value_counts()

_25-32_     1072
_4-6_        570
_38-43_      513
_8-12_       340
_60-100_     274
_48-53_      252
_15-20_      225
_0-2_        199
Name: age, dtype: int64

In [44]:
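trvl_meta_X     = trvl_meta
# .values replaces as_matrix(), which newer pandas versions no longer provide
trvl_meta_X_arr = trvl_meta_X.values
# Drop the first column (the old index added by reset_index)
trvl_meta_X_arr = trvl_meta_X_arr[:,1:]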

In [45]:
# .values replaces as_matrix(), which newer pandas versions no longer provide
trvl_meta_Y_gender  = trvl_meta["gender"].values
trvl_meta_Y_age     = trvl_meta["age"].values

### Splits for gender-only classification

In [46]:
from sklearn.model_selection import StratifiedShuffleSplit
X = trvl_meta_X_arr
y = trvl_meta_Y_gender
sss = StratifiedShuffleSplit(n_splits=num_splits, test_size=validation_prop, random_state=0)
sss.get_n_splits(X, y)

fold_count   = 1
fold_container_gender = []
for train_index, test_index in sss.split(X, y):
    # print() with a single pre-formatted string works in Python 2 and 3
    print("FOLD #: %d" % fold_count)
    print("TRAIN      : %s" % (train_index,))
    print("VALIDATION : %s" % (test_index,))
    print("=========================================================")
    fold_count += 1
    X_train, X_valid = X[train_index], X[test_index]
    fold_container_gender.append([X_train,X_valid])

FOLD #: 1
TRAIN      : [ 8731 13493 11406 ...,  2407  1246 11011]
VALIDATION : [6691 3259 9592 ..., 9279 5220 2231]


### Splits for age-only classification

In [47]:
from sklearn.model_selection import StratifiedShuffleSplit
X = trvl_meta_X_arr
y = trvl_meta_Y_age
sss = StratifiedShuffleSplit(n_splits=num_splits, test_size=validation_prop, random_state=0)
sss.get_n_splits(X, y)

fold_count   = 1
fold_container_age = []
for train_index, test_index in sss.split(X, y):
    # print() with a single pre-formatted string works in Python 2 and 3
    print("FOLD #: %d" % fold_count)
    print("TRAIN      : %s" % (train_index,))
    print("VALIDATION : %s" % (test_index,))
    print("=========================================================")
    fold_count += 1
    X_train, X_valid = X[train_index], X[test_index]
    fold_container_age.append([X_train,X_valid])

FOLD #: 1
TRAIN      : [13782  9718 12814 ..., 13946 11385  6384]
VALIDATION : [ 7028  4560  9145 ..., 12546   423  2167]


## Generating dataframes for each classification task

In [48]:
headers = ["user_id","face_id","original_image","gender","age"]

In [49]:
gender_fold1_train = pd.DataFrame(fold_container_gender[0][0], columns=headers)
gender_fold1_valid = pd.DataFrame(fold_container_gender[0][1], columns=headers)

In [50]:
age_fold1_train = pd.DataFrame(fold_container_age[0][0], columns=headers)
age_fold1_valid = pd.DataFrame(fold_container_age[0][1], columns=headers)

## Augment Dataframes to contain the current image path and the image path to generate

### Image path

In [51]:
gender_fold1_train["img_path"] = gender_fold1_train[RELEVANT_COLS].apply(lambda x: IMG_TEMPLATE % (x[0],x[1],x[2]), axis=1)
gender_fold1_valid["img_path"] = gender_fold1_valid[RELEVANT_COLS].apply(lambda x: IMG_TEMPLATE % (x[0],x[1],x[2]), axis=1)

In [52]:
age_fold1_train["img_path"] = age_fold1_train[RELEVANT_COLS].apply(lambda x: IMG_TEMPLATE % (x[0],x[1],x[2]), axis=1)
age_fold1_valid["img_path"] = age_fold1_valid[RELEVANT_COLS].apply(lambda x: IMG_TEMPLATE % (x[0],x[1],x[2]), axis=1)

In [53]:
test_meta["img_path"] = test_meta[RELEVANT_COLS].apply(lambda x: IMG_TEMPLATE % (x[0],x[1],x[2]), axis=1)

### Keras path

In [54]:
age_test_set = test_meta.copy()
gender_test_set = (test_meta).copy()

In [55]:
gender_test_set["keras_path"] = test_meta[RELEVANT_COLS].apply(lambda x: GENDER_TEST_TEMPLATE % (x[3],x.name), axis=1)
age_test_set["keras_path"] = test_meta[RELEVANT_COLS].apply(lambda x: AGE_TEST_TEMPLATE % (x[4],x.name), axis=1)

In [56]:
gender_fold1_train["keras_path"] = gender_fold1_train[RELEVANT_COLS].apply(lambda x: GENDER_TRAIN_TEMPLATE % (1, x[3], x.name), axis=1)
gender_fold1_valid["keras_path"] = gender_fold1_valid[RELEVANT_COLS].apply(lambda x: GENDER_VALID_TEMPLATE % (1, x[3], x.name), axis=1)

In [57]:
age_fold1_train["keras_path"] = age_fold1_train[RELEVANT_COLS].apply(lambda x: AGE_TRAIN_TEMPLATE % (1, x[4], x.name), axis=1)
age_fold1_valid["keras_path"] = age_fold1_valid[RELEVANT_COLS].apply(lambda x: AGE_VALID_TEMPLATE % (1, x[4], x.name), axis=1)

In [58]:
def prep_adience():
    all_gender_train = [gender_fold1_train]
    all_gender_valid = [gender_fold1_valid]
    
    all_age_train = [age_fold1_train]
    all_age_valid = [age_fold1_valid]
    
    full_meta = pd.concat(all_gender_train + all_gender_valid + all_age_train + all_age_valid)

    for index, row in full_meta.iterrows():
        
        mkdir_template = "mkdir -p %s"
        command_template = "cp -p %s %s"
        # Create the parent directory, not a directory named after the image file
        wkspFldr = os.path.dirname(row["keras_path"])
        command0 = mkdir_template % wkspFldr
        command1 = command_template % (row["img_path"], row["keras_path"])
        
        os.system(command0)
        os.system(command1)
        


In [59]:
full_meta = pd.concat([gender_test_set] + [age_test_set])
for index, row in full_meta.iterrows():
        
    mkdir_template = "mkdir -p %s"
    command_template = "cp -p %s %s"
    # Create the parent directory, not a directory named after the image file
    wkspFldr = os.path.dirname(row["keras_path"])
    command0 = mkdir_template % wkspFldr
    command1 = command_template % (row["img_path"], row["keras_path"])
        
    os.system(command0)
    os.system(command1)

In [60]:
prep_adience()

In [61]:
all_gender_train = [gender_fold1_train]
all_gender_valid = [gender_fold1_valid]
    
all_age_train = [age_fold1_train]
all_age_valid = [age_fold1_valid]
    
fold_num = 0
for df in all_gender_train:
    path  = META_SAVE_TEMPLATE % ("gender_train",fold_num)
    fold_num += 1
    df.to_csv(path)
    
fold_num = 0
for df in all_gender_valid:
    path  = META_SAVE_TEMPLATE % ("gender_valid",fold_num)
    fold_num += 1
    df.to_csv(path)
    
fold_num = 0
for df in all_age_train:
    path  = META_SAVE_TEMPLATE % ("age_train",fold_num)
    fold_num += 1
    df.to_csv(path)

fold_num = 0
for df in all_age_valid:
    path  = META_SAVE_TEMPLATE % ("age_valid",fold_num)
    fold_num += 1
    df.to_csv(path)
    
fold_num = 0
path = META_SAVE_TEMPLATE % ("test", fold_num)
test_meta.to_csv(path)

