# Embedder extraction tutorial
This notebook shows how to extract part of a model, a technique used in `Transfer Learning` and `Fine-Tuning`. <br /> 
The source instructions can be found [here](https://www.tensorflow.org/guide/keras/transfer_learning)

## Roadmap

The steps are:
1. Setting up the [notebook's variables](#/Environment-setup)
2. Loading the `model` from an `.h5` file
3. Extracting the embedding layer from the model
4. Adding a new input layer
5. Testing the new model
6. Saving the new model as an `.h5` file

If you have already generated the new model, you can load it from its file in the last section.

## Environment setup

In [2]:
import numpy as np
import tensorflow as tf
from keras.layers import Layer, Input, Dense
from keras.models import Model

In [1]:
from typing import Final, Tuple, Any
import os

TUTORIAL_WORKSPACE: Final[str] = 'tutorial_workspace'
MODEL_INPUT_PATH: Final[str] = os.path.join(TUTORIAL_WORKSPACE, 'outputs', 'siamesemodel.h5')
EMBEDDING_LAYER_NAME: Final[str] = 'embedding'
POSITIVE_FOLDER_PATH: Final[str] = os.path.join(TUTORIAL_WORKSPACE, 'positives')
NEGATIVE_FOLDER_PATH: Final[str] = os.path.join(TUTORIAL_WORKSPACE, 'negatives')

NEW_MODEL_PATH: Final[str] = os.path.join(TUTORIAL_WORKSPACE, 'outputs', 'embedding_model.h5')
VECTORS_FILE_PATH: Final[str] = os.path.join(TUTORIAL_WORKSPACE, 'outputs', 'vectors.json')

## Loading the model

To load the model successfully, we need to define the custom classes that were created alongside it.

In [64]:
# Siamese L1 Distance class
class L1Dist(Layer):
    
    # Init method - forwards Keras arguments (name, dtype, ...) to the base Layer
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)

    # Similarity calculation
    def call(self, input_embeddings, validation_embeddings) -> Any:
        return tf.math.abs(input_embeddings - validation_embeddings)
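
To make the layer's arithmetic concrete, here is a minimal NumPy sketch of the same element-wise absolute difference. The two vectors are made-up values used only for illustration, not real embeddings from the model.

```python
import numpy as np

# Two made-up 4-dimensional "embeddings" (illustrative values only)
input_embedding = np.array([0.2, 0.9, 0.0, 0.5])
validation_embedding = np.array([0.1, 0.4, 0.3, 0.5])

# Same operation as L1Dist.call: element-wise absolute difference
l1_distance = np.abs(input_embedding - validation_embedding)

# The result has one entry per embedding dimension
print(l1_distance.shape)
```

The layer does not reduce the vectors to a single number; it keeps one distance per dimension.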

In this step we will load the model from its `.h5` file and use it in our program

In [65]:
# Model reload from file
model: Model = tf.keras.models.load_model(
    MODEL_INPUT_PATH, 
    
    # Adding the custom layers we created on our own
    custom_objects={
        'L1Dist': L1Dist, # Custom layer that needs to be loaded
        'BinaryCrossentropy': tf.losses.BinaryCrossentropy
    }
)



In [66]:
model.summary()

Model: "SiameseNetwork"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_image (InputLayer)       [(None, 100, 100, 3  0           []                               
                                )]                                                                
                                                                                                  
 validation_image (InputLayer)  [(None, 100, 100, 3  0           []                               
                                )]                                                                
                                                                                                  
 embedding (Functional)         (None, 4096)         38960448    ['input_image[0][0]',            
                                                                  'validation_image[0

## Extract the embedding layer

Get the embedding layer from the model

In [67]:
embedding_layer: Layer = model.get_layer(EMBEDDING_LAYER_NAME)

Create a new input layer

In [68]:
INPUT_IMAGE_SHAPE: Final[Tuple[int, int, int]] = (100, 100, 3)

# Anchor image input in the network
input_image = Input(name='input_image', shape=INPUT_IMAGE_SHAPE)

Let's connect them up

In [69]:
_embedding_layer = embedding_layer(input_image)

In [70]:
new_model = Model(inputs=[input_image], outputs=[_embedding_layer], name='face_embedder')
new_model.summary()

Model: "face_embedder"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_image (InputLayer)    [(None, 100, 100, 3)]     0         
                                                                 
 embedding (Functional)      (None, 4096)              38960448  
                                                                 
Total params: 38,960,448
Trainable params: 38,960,448
Non-trainable params: 0
_________________________________________________________________


## Testing the new model

Let's take our new model for a spin

In [8]:
def preprocess(file_path: str) -> tf.Tensor:
    '''
        Load an image with TensorFlow and resize it
        to the dimensions our neural network expects.
        
        args:
            file_path: str - path to the target image file

        returns: 
            float tensor of shape (100, 100, 3) with pixel values between 0 and 1
    '''

    # Read image bytes from file path
    byte_image = tf.io.read_file(file_path)
    
    # Decode the bytes as a 3-channel (RGB) image
    image = tf.io.decode_jpeg(byte_image, channels=3)

    # Resize the image to 100x100 pixels
    image = tf.image.resize(image, (100, 100))
    
    # Scale each pixel to the range [0, 1] instead of [0, 255]
    image /= 255.0

    return image
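
The scaling step, and the batch dimension the model expects at prediction time, can be sketched with NumPy alone. The random array below is only a stand-in for a decoded and resized image, not real data.

```python
import numpy as np

# Stand-in for a decoded and resized RGB image (random pixels, illustration only)
rng = np.random.default_rng(seed=0)
fake_image = rng.integers(0, 256, size=(100, 100, 3)).astype(np.float32)

# Scale pixel values from [0, 255] to [0, 1], as preprocess() does
scaled_image = fake_image / 255.0

# Add a leading batch dimension so the shape matches the model input
batch = np.expand_dims(scaled_image, axis=0)

print(batch.shape)  # (1, 100, 100, 3)
```

This is the same shape that `np.array([first_shot])` produces before the call to `predict`.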

In [6]:
from typing import Optional

def calculate_embeddings(positive_folder_path: str, negative_folder_path: Optional[str] = None) -> Tuple[np.ndarray, np.ndarray]:
    '''
        Embed two images with the new model.
        Without a negative folder, both images come from the positive folder;
        otherwise the second image comes from the negative folder.
    '''
    all_positive_files: list[str] = os.listdir(positive_folder_path)
    first_shot = preprocess(os.path.join(positive_folder_path, all_positive_files[0]))

    if negative_folder_path is None:
        # Positive test: the second image is another picture from the same folder
        second_shot = preprocess(os.path.join(positive_folder_path, all_positive_files[1]))
        second_label = 'positive_embeddings:'
    else:
        # Negative test: the second image comes from the negative folder
        all_negative_files: list[str] = os.listdir(negative_folder_path)
        second_shot = preprocess(os.path.join(negative_folder_path, all_negative_files[0]))
        second_label = 'negative_embedding:'

    # The model expects a batch, so wrap each image in an array of length 1
    print(np.array([first_shot]).shape)

    first_embeddings: np.ndarray = new_model.predict(np.array([first_shot]))
    print('anchor_embeddings:', first_embeddings.shape)

    second_embeddings: np.ndarray = new_model.predict(np.array([second_shot]))
    print(second_label, second_embeddings.shape)

    return (first_embeddings, second_embeddings)
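
`os.listdir` returns entries in arbitrary order, so which images count as the "first" and "second" file can differ between machines. A minimal sketch of a deterministic alternative follows. The helper name `list_image_files` and the extension filter are assumptions made for illustration and are not part of the original notebook.

```python
import os
from typing import List, Tuple

def list_image_files(folder_path: str, extensions: Tuple[str, ...] = ('.jpg', '.jpeg')) -> List[str]:
    """Return the image file paths inside folder_path, sorted by file name."""
    return [
        os.path.join(folder_path, file_name)
        for file_name in sorted(os.listdir(folder_path))
        if file_name.lower().endswith(extensions)
    ]
```

Indexing into such a sorted list would pick the same two images on every run, and files that are not JPEG images would be skipped instead of failing in `tf.io.decode_jpeg`.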

Let's check the distance between the vectors. Remember that both embedding vectors come from the same person (a *positive test*).

In [9]:
anchor_embeddings, positive_embeddings = calculate_embeddings(POSITIVE_FOLDER_PATH)

l1_val = tf.math.abs(anchor_embeddings - positive_embeddings)
match_percentage = 1 - np.average(l1_val)

# The score is a fraction, so format it as a percentage
print('match_percentage: {:.1%}'.format(match_percentage))

(1, 100, 100, 3)
anchor_embeddings: (1, 4096)
positive_embeddings: (1, 4096)
match_percentage: 92.5%


Now let's check a negative test

In [10]:
anchor_embeddings, negative_embeddings = calculate_embeddings(POSITIVE_FOLDER_PATH, NEGATIVE_FOLDER_PATH)

l1_val = tf.math.abs(anchor_embeddings - negative_embeddings)
match_percentage = 1 - np.average(l1_val)

# The score is a fraction, so format it as a percentage
print('match_percentage: {:.1%}'.format(match_percentage))

(1, 100, 100, 3)
anchor_embeddings: (1, 4096)
negative_embedding: (1, 4096)
match_percentage: 57.6%
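
Both tests use the same formula: one minus the mean absolute difference between the two embedding vectors. The sketch below wraps it in a small function and runs it on made-up vectors. The function name `match_score` and the sample values are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def match_score(first: np.ndarray, second: np.ndarray) -> float:
    """Return 1 minus the mean absolute difference between two embeddings."""
    return float(1.0 - np.mean(np.abs(first - second)))

# Made-up embeddings with a leading batch dimension, like the model output
anchor = np.array([[0.5, 0.25, 1.0, 0.0]])
same = anchor.copy()
different = np.array([[0.0, 0.75, 0.0, 1.0]])

print(match_score(anchor, same))       # 1.0
print(match_score(anchor, different))  # 0.25
```

Identical vectors give a score of 1, and the score drops as the average per-dimension difference grows.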


For later use, let's save those embeddings in a file

In [15]:
import json

file_content: dict[str, list] = {
    'anchor': anchor_embeddings.tolist()[0],
    'positive': positive_embeddings.tolist()[0],
    'negative': negative_embeddings.tolist()[0]
}

# Save the vectors in .json format. Mode 'w' creates the file or overwrites
# an existing one, and the with-block closes the file when writing is done
with open(VECTORS_FILE_PATH, 'w', encoding='utf-8') as vectors_file:
    json.dump(
        file_content, vectors_file,
        separators=(',', ':'), 
        sort_keys=True, 
        indent=4
    )
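
To use the saved vectors later, they have to be read back from the file. Here is a minimal sketch of the write-and-read round trip using only the standard library. The keys mirror `file_content`, while the short vectors and the temporary folder are stand-ins chosen for illustration.

```python
import json
import os
import tempfile

# Made-up vectors standing in for the real 4096-dimensional embeddings
vectors = {'anchor': [0.1, 0.2], 'positive': [0.1, 0.25], 'negative': [0.9, 0.7]}

with tempfile.TemporaryDirectory() as workspace:
    vectors_path = os.path.join(workspace, 'vectors.json')

    # 'w' creates the file or overwrites an existing one
    with open(vectors_path, 'w', encoding='utf-8') as vectors_file:
        json.dump(vectors, vectors_file, sort_keys=True, indent=4)

    # Read the vectors back as plain Python lists
    with open(vectors_path, 'r', encoding='utf-8') as vectors_file:
        loaded_vectors = json.load(vectors_file)

print(loaded_vectors == vectors)  # True
```

The loaded values are plain lists; wrapping one in `np.array(...)` turns it back into an array for distance calculations.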
    

## Save the modified model

In [87]:
new_model.save(NEW_MODEL_PATH)



## Loading the new model

You can try to load the generated model from the saved file.

In [5]:
if os.path.exists(NEW_MODEL_PATH):
    new_model = tf.keras.models.load_model(NEW_MODEL_PATH)
    print('Model loaded successfully:', new_model)
else:
    print('No model generated by this notebook was found')

Model loaded successfully: <keras.engine.functional.Functional object at 0x0000029A1B6EED70>
