#### How to work with this script
I assume that the new dataset has the same structure as the embeddings supplied so far. <br>
To run the script, change the paths of the embedding files and the path of the trained model. <br>
The script produces two outputs: <br>
A. The predicted face embeddings derived from the audio inputs by the trained model (`predicted_emb` below) <br>
B. An estimate of the identification accuracy
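
Since the script relies on the new files matching the structure of the earlier ones, a quick check before running can help. The sketch below assumes, based on how the data is handled later in this notebook, that each pickle holds a dict mapping a `'Name/file'` key to a 1-D embedding vector. The helper `check_embeddings` and the toy data are hypothetical and are not part of the original script.

```python
import numpy as np

def check_embeddings(emb):
    """Check a dict of 'Name/file' -> 1-D embedding; return the embedding length."""
    if not isinstance(emb, dict) or not emb:
        raise ValueError('expected a non-empty dict of embeddings')
    shapes = set()
    for key, vec in emb.items():
        # The identity is expected before the first slash of each key
        if '/' not in str(key):
            raise ValueError(f'key without a Name/file slash: {key!r}')
        shapes.add(np.asarray(vec).shape)
    if len(shapes) != 1:
        raise ValueError(f'inconsistent embedding shapes: {shapes}')
    (shape,) = shapes
    if len(shape) != 1:
        raise ValueError(f'expected 1-D embeddings, got shape {shape}')
    return shape[0]

# Toy data standing in for a loaded pickle file (assumed layout)
toy_emb = {'Alice/clip_a.wav': np.zeros(4), 'Bob/clip_x.wav': np.ones(4)}
emb_len = check_embeddings(toy_emb)
```

Calling the helper on `audio_emb` and `face_emb` right after loading them would surface a structural mismatch early, before the model is applied.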

#### 1. Load packages

In [None]:
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pickle, random
from pandasql import sqldf
from sklearn.neighbors import KNeighborsClassifier

from keras.models import load_model

import warnings
warnings.filterwarnings('ignore')

#### 2. Load data and convert to pandas DataFrames

In [None]:
audio_emb_file = '/home/drorco/DC_corsound_submission/audio_embeddings.pickle'
face_emb_file = '/home/drorco/DC_corsound_submission/image_embeddings.pickle'

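# Open the files in a context manager so they are closed after loading
with open(audio_emb_file, "rb") as f:
    audio_emb = pickle.load(f)
with open(face_emb_file, "rb") as f:
    face_emb = pickle.load(f)

# One row per embedding; the original key ('Name/file') ends up in the 'index' column
audio_df = pd.DataFrame.from_dict(audio_emb).T.reset_index()
face_df = pd.DataFrame.from_dict(face_emb).T.reset_index()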



#### 3. Get the identity (Name) in each modality and number the files of each name
The Name (ID) appears in the 'index' string, just before the first slash. <br> The file identifier follows the slash. <br> Hence I split the index of both modalities at the first slash and number the files of each ID sequentially.
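
As an illustration of the split-and-number step, here is a minimal sketch on toy data. The keys in `toy` are made up, and `cumcount` is used for the numbering instead of the `rank(method='first')` call in the cells below; on rows already sorted by name and key, both are expected to give the same 1, 2, ... numbering within each name.

```python
import pandas as pd

# Hypothetical keys in the 'Name/file' layout described above
toy = pd.DataFrame({'index': ['Alice/clip_b.wav', 'Alice/clip_a.wav', 'Bob/dir/clip_x.wav']})

# n=1 splits only at the first slash, so later slashes stay in the file part
toy[['s_Name', 's_file']] = toy['index'].str.split('/', n=1, expand=True)

# Number the files of each name 1, 2, ... in sorted order of the original key
toy = toy.sort_values(by=['s_Name', 'index'])
toy['s_fNum'] = toy.groupby('s_Name').cumcount() + 1

print(toy[['s_Name', 's_file', 's_fNum']])
```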

In [None]:
audio_df[['s_Name','s_file']] = audio_df['index'].str.split('/', n=1, expand=True)
audio_df = audio_df.sort_values(by=['s_Name','index'])
audio_df['s_audio_fNum'] = audio_df.groupby('s_Name')['index'].rank(method='first').astype('int')
audio_df = audio_df.drop(columns=['index','s_file'])
audio_df= audio_df.set_index(['s_Name','s_audio_fNum']).add_prefix('audio_col_').reset_index()

audio_cols = [col for col in audio_df.columns if 'col_' in col]


In [None]:
face_df[['s_Name','s_file']] = face_df['index'].str.split('/', n=1, expand=True)
face_df = face_df.sort_values(by=['s_Name','index'])
face_df['s_face_fNum'] = face_df.groupby('s_Name')['index'].rank(method='first').astype('int')
face_df = face_df.drop(columns=['index','s_file'])
face_df= face_df.set_index(['s_Name','s_face_fNum']).add_prefix('face_col_').reset_index()

face_cols = [col for col in face_df.columns if 'col_' in col]

#### 4. Load the trained model

In [None]:
model_path = '/home/drorco/DC_corsound_submission/checkpnt1.pkl'
loaded_model = load_model(model_path)

#### 5. Create a predicted face embedding for each of the audio embeddings in the test dataset

In [None]:
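# Copy the key columns so that adding the prediction columns does not write into a slice of audio_df
predicted_emb = audio_df[['s_Name','s_audio_fNum']].copy()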

face_cols_preds = [col+'_pred' for col in face_cols]

predicted_emb[face_cols_preds] = loaded_model.predict(audio_df[audio_cols])

#### 6. Evaluate the identification accuracy
Each PREDICTED face embedding (the anchor) is tested against one random positive and one random negative ACTUAL face embedding. <br>
A positive sample is one of the face embeddings of the same ID as the audio embedding from which the prediction was made. <br>
A negative sample is one of the face embeddings of a different ID. <br>
If the anchor is closer to the positive sample, it counts as a correct identification. <br>
If it is closer to the negative sample, it counts as a false identification. <br>
The accuracy is the percentage of correct identifications over the entire test dataset. <br>
To save time, I chose not to generate all possible combinations of audio inputs, positive face embeddings, and negative face embeddings, as the original identification accuracy metric does. Instead, I generated predicted face embeddings from all audio inputs and compared each one with randomly selected positive and negative face embeddings, as an approximation of the identification accuracy.
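
The comparison for a single anchor can be sketched without scikit-learn. With its default settings, `KNeighborsClassifier(n_neighbors=1)` uses Euclidean distance, so fitting it on one positive and one negative sample should amount to checking which of the two is nearer to the anchor (ties aside). The function `is_correct` and the 2-D vectors below are made up for illustration.

```python
import numpy as np

def is_correct(anchor, positive, negative):
    """Return True when the anchor is strictly closer to the positive sample."""
    d_pos = np.linalg.norm(anchor - positive)  # Euclidean distance to the positive
    d_neg = np.linalg.norm(anchor - negative)  # Euclidean distance to the negative
    return bool(d_pos < d_neg)

# Toy 2-D embeddings (real embeddings have many more dimensions)
anchor = np.array([0.9, 0.1])
positive = np.array([1.0, 0.0])
negative = np.array([0.0, 1.0])

# One correct and one false identification, by swapping the two samples
results = [is_correct(anchor, positive, negative),
           is_correct(anchor, negative, positive)]
accuracy = 100 * sum(results) / len(results)
```

Because the positive and negative samples are drawn at random, the estimate varies between runs; calling `random.seed` and passing `random_state` to `DataFrame.sample` would make it repeatable.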

In [None]:
face_names = face_df['s_Name'].unique().tolist()
name_list = predicted_emb['s_Name'].unique().tolist()
l = []

for name in name_list:
    if name in face_names:

        # Draw the negative ID only from names that have actual face embeddings
        rndom_l = [rnd_name for rnd_name in face_names if rnd_name != name]
        rndom = random.choice(rndom_l)

        positive = face_df[face_df['s_Name'] == name]
        negative = face_df[face_df['s_Name'] == rndom]

        anchor = predicted_emb[predicted_emb['s_Name'] == name].sort_values(by=['s_Name', 's_audio_fNum'])

        for audio_fNum in anchor['s_audio_fNum'].unique().tolist():

            # One random positive and one random negative actual face embedding
            # (DataFrame.append was removed in pandas 2.0, hence pd.concat)
            curr_pair = pd.concat([positive.sample(), negative.sample()])

            # 1-NN over the pair returns the name of the closer sample;
            # NumPy arrays avoid a feature-name mismatch between fit and predict
            knn = KNeighborsClassifier(n_neighbors=1)
            knn.fit(curr_pair[face_cols].to_numpy(), curr_pair['s_Name'])

            curr_anchor = anchor[anchor['s_audio_fNum'] == audio_fNum]
            pred_name = knn.predict(curr_anchor[face_cols_preds].to_numpy())

            # 1 for a correct identification, 0 for a false one
            l.append(int(pred_name[0] == name))

print('The Accuracy is: ', round(100 * sum(l) / len(l), 2), '%')