## Intro
This workbook describes how I converted weights from Keras to Torch. It starts with the issues I ran into and how I solved them, followed by the conversion code.

My motivation, or how the story began:
I downloaded a model with pretrained weights from https://github.com/uzh-rpg/rpg_public_dronet. But I am a Torch user and wanted to convert the model from Keras to Torch. The original Keras model can be found in *'keras_model.py'* and the weights in *'model_weights.h5'*.

## My path
For some reason the converter available online (https://github.com/Microsoft/MMdnn) didn't help me. It converted the model successfully, but the converted model gave different results for the same input data. So I decided to reimplement the model in Torch by hand and then convert the weights from Keras to Torch. The Torch copy of the original Keras model can be found in *torch_model.py*. My weight conversion is based on a mapping between Keras and Torch parameter names, so the next task was to create that mapping. I couldn't find a smart solution for it, so I built it by hand as well. I extracted the names of the Keras weights like this:
> [(weight.name, weight.shape) for weight in keras_model.weights]

and the names of the Torch parameters like this:
> [(key, torch_model.state_dict()[key].shape) for key in torch_model.state_dict().keys()]

I used the shapes only as a hint for finding the correct mapping.
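
The shape hint can be partly automated. The sketch below is not the code used in this workbook. It assumes lists of (name, shape) pairs like the ones extracted above, converts each Keras shape to the layout torch would use, and lists the torch parameters with a matching shape as candidates. The function names and the example weight names are hypothetical.

```python
def to_torch_shape(keras_shape):
    """Convert a Keras weight shape to the shape torch would use."""
    if len(keras_shape) == 4:          # conv kernel: (h, w, i, o) -> (o, i, h, w)
        h, w, i, o = keras_shape
        return (o, i, h, w)
    if len(keras_shape) == 2:          # dense kernel: (i, o) -> (o, i)
        i, o = keras_shape
        return (o, i)
    return tuple(keras_shape)          # 1-D weights (biases etc.) keep their shape


def candidate_matches(keras_weights, torch_weights):
    """For each Keras weight name, list torch names with a compatible shape."""
    candidates = {}
    for k_name, k_shape in keras_weights:
        target = to_torch_shape(k_shape)
        candidates[k_name] = [t_name for t_name, t_shape in torch_weights
                              if tuple(t_shape) == target]
    return candidates


# Hypothetical names and shapes, for illustration only.
keras_weights = [("conv2d_1/kernel:0", (5, 5, 1, 32)), ("conv2d_1/bias:0", (32,))]
torch_weights = [("conv1.weight", (32, 1, 5, 5)), ("conv1.bias", (32,))]
matches = candidate_matches(keras_weights, torch_weights)
print(matches)
```

Several parameters often share a shape, so the candidate lists still need a manual review.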

## Next problems

With the mapping in place, I ran into several problems:

Here: [w=width, h=height, i=input_channel_size, o=output_channel_size]

1) **Convolution form issue.** A convolution kernel in Keras has shape (h, w, i, o), while in torch it has shape (o, i, h, w).

2) **Dense form issue.** A dense kernel in Keras has shape (i, o), while the weight of a linear layer in torch has shape (o, i).

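3) **batch_norm issue.** Batch normalization has different default parameters in the two frameworks, in particular momentum (0.1 in torch, 0.99 in keras) and eps (1e-5 in torch, 1e-3 in keras). The two momentum values are also defined in opposite ways: keras weights the running statistic, torch weights the new batch statistic, so keras 0.99 corresponds to torch 0.01.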

4) **padding issue.** A keras convolution has a padding parameter which can be 'same', 'valid', etc. But 'same' is tricky: it can add one column on the left side of a picture and two on the right. In torch an integer padding of a convolution is applied equally on both sides, so it cannot add one column on the left and two on the right. For this purpose I used nn.ZeroPad2d, which takes a tuple of four elements: left, right, top and bottom padding.

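5) **ReLU() issue.** With inplace=True, ReLU modifies its input. This means that once you run ReLU(inplace=True)(x), x itself has been changed, which matters if x is used again later. Note that nn.ReLU defaults to inplace=False.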

6) **Flatten issue.** Keras convolutions take and return tensors with the channel dimension last, i.e. (batch_size, h, w, channel_size), while torch uses (batch_size, channel_size, h, w). As a result, flattening gives differently ordered vectors in torch and keras.
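
The two form issues (1 and 2) come down to a fixed axis permutation. The NumPy sketch below uses made-up kernels with small, distinct dimensions so that each axis can be told apart. It only illustrates the layout change and does not touch the real model weights.

```python
import numpy as np

# Keras conv kernel layout: (h, w, i, o); torch expects (o, i, h, w).
keras_conv = np.arange(3 * 3 * 2 * 4, dtype=np.float32).reshape(3, 3, 2, 4)
torch_conv = np.transpose(keras_conv, (3, 2, 0, 1))

# Keras dense kernel layout: (i, o); torch nn.Linear expects (o, i).
keras_dense = np.arange(6 * 5, dtype=np.float32).reshape(6, 5)
torch_dense = np.transpose(keras_dense, (1, 0))

print(torch_conv.shape)   # (4, 2, 3, 3)
print(torch_dense.shape)  # (5, 6)

# The same weight is reachable under both index orders.
assert keras_conv[1, 2, 0, 3] == torch_conv[3, 0, 1, 2]
assert keras_dense[4, 2] == torch_dense[2, 4]
```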

Comments like "# related to ___ issue" in this workbook and in the .py files mark the places where each issue is handled.
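
For the padding issue, the amount of padding that 'same' adds on each side can be computed ahead of time. The sketch below assumes the rule used by the TensorFlow backend (output size is ceil(input / stride), and any odd extra pixel goes to the right or bottom). The helper name and the example sizes are illustrative and are not taken from the model files.

```python
import math


def same_padding_1d(in_size, kernel, stride):
    """Return (pad_before, pad_after) for one dimension under 'same' padding."""
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel - in_size, 0)
    before = total // 2
    return before, total - before


# Example: a 200-pixel side, a 5-wide kernel and stride 2.
left, right = same_padding_1d(200, kernel=5, stride=2)
top, bottom = same_padding_1d(200, kernel=5, stride=2)

# nn.ZeroPad2d takes its tuple in the order (left, right, top, bottom).
zero_pad_arg = (left, right, top, bottom)
print(zero_pad_arg)  # (1, 2, 1, 2)
```

The resulting tuple could be passed to nn.ZeroPad2d in front of a convolution that itself uses no padding.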

In [1]:
KERAS_MODEL_PATH="./model_weights.h5"
DICT_PATH="./keras_torch_mapping.csv"

In [2]:
import keras, torch
import numpy as np

Using TensorFlow backend.


In [3]:
%run ./keras_model.py
keras_model = resnet8(200,200,1,1)
keras_model.load_weights(KERAS_MODEL_PATH)

W0728 21:27:42.165055 139965437531968 deprecation.py:506] From /home/artem/.local/lib/python3.6/site-packages/tensorflow/python/training/moving_averages.py:210: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0728 21:27:42.524592 139965437531968 deprecation.py:506] From /home/artem/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.


In [4]:
%run ./torch_model.py

torch_model = ResNet8()
torch_model.eval();
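
In eval mode, batch normalization uses its stored running mean and variance, so for comparing outputs the eps default is the one that matters. The NumPy sketch below writes out the inference formula with made-up statistics to show how much the two default eps values can differ when the variance is small. It is an illustration, not code from torch_model.py.

```python
import numpy as np


def batch_norm_inference(x, gamma, beta, mean, var, eps):
    """Batch norm at inference time: normalize with stored statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


x = np.array([0.5, 1.0, 2.0], dtype=np.float32)
gamma, beta, mean, var = 1.0, 0.0, 0.0, 0.01   # made-up values

keras_like = batch_norm_inference(x, gamma, beta, mean, var, eps=1e-3)  # keras default eps
torch_like = batch_norm_inference(x, gamma, beta, mean, var, eps=1e-5)  # torch default eps

# Largest absolute difference caused by eps alone.
print(np.max(np.abs(keras_like - torch_like)))
```

One way to align the two models is to pass the Keras value explicitly when building the torch layer, for example nn.BatchNorm2d(num_features, eps=1e-3).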

In [5]:
img=np.random.uniform(size=(200,200,1))[None]
keras_img=img
torch_img = torch.tensor(img.transpose((0, 3, 1, 2)), dtype=torch.float32, device="cpu")
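
The transpose above moves the channel axis from last to second position. The same layout difference is behind the flatten issue: flattening a channels-first tensor gives a different element order than flattening the channels-last one. The NumPy sketch below uses a made-up feature map. Moving the channels last again before flattening is one way to reproduce the Keras order.

```python
import numpy as np

# Made-up feature map in Keras layout: (batch, h, w, channels).
nhwc = np.arange(1 * 2 * 2 * 3).reshape(1, 2, 2, 3)
# The same data in torch layout: (batch, channels, h, w).
nchw = nhwc.transpose(0, 3, 1, 2)

keras_flat = nhwc.reshape(1, -1)
torch_flat = nchw.reshape(1, -1)
print(np.array_equal(keras_flat, torch_flat))  # False

# Moving channels last again before flattening restores the Keras order.
fixed_flat = nchw.transpose(0, 2, 3, 1).reshape(1, -1)
print(np.array_equal(keras_flat, fixed_flat))  # True
```

On a torch tensor the equivalent would be x.permute(0, 2, 3, 1).reshape(x.size(0), -1).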

In [6]:
import pandas as pd 

def get_dict():
    """Load the hand-made mapping from Keras weight names to torch state_dict keys."""
    krs2trch = pd.read_csv(DICT_PATH)
    return dict(zip(krs2trch["keras"], krs2trch["torch"]))


def keras_to_pyt(km, pm):
    """Copy the weights of Keras model km into torch model pm."""
    name_mapping = get_dict()
    weight_dict = dict()
    for layer in km.layers:
        # layer.weights and layer.get_weights() are in the same order
        for weight, value in zip(layer.weights, layer.get_weights()):
            torch_name = name_mapping[weight.name]
            if "conv2d_" in weight.name and "kernel" in weight.name:  # related to convolution form issue
                weight_dict[torch_name] = np.transpose(value, (3, 2, 0, 1))
            elif "dense_" in weight.name and "kernel" in weight.name:  # related to dense form issue
                weight_dict[torch_name] = np.transpose(value, (1, 0))
            else:
                weight_dict[torch_name] = value

    pyt_state_dict = pm.state_dict()
    for key in weight_dict:
        pyt_state_dict[key] = torch.from_numpy(weight_dict[key])
    pm.load_state_dict(pyt_state_dict)

keras_to_pyt(keras_model, torch_model)
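
Because the mapping was written by hand, it can help to check every converted array against the shape the torch model expects before loading, and to see all problems in one list. The sketch below is a hypothetical helper working on plain dictionaries, and the names and shapes in the example are made up. With the real objects, the expected shapes could be built as {k: tuple(v.shape) for k, v in pm.state_dict().items()}.

```python
import numpy as np


def find_shape_mismatches(converted, expected_shapes):
    """Return (name, converted_shape, expected_shape) for every problem found."""
    problems = []
    for name, array in converted.items():
        expected = expected_shapes.get(name)      # None if the key is unknown
        if expected is None or tuple(array.shape) != tuple(expected):
            problems.append((name, tuple(array.shape), expected))
    return problems


# Made-up example: the dense kernel was not transposed.
converted = {"conv1.weight": np.zeros((32, 1, 5, 5)), "fc.weight": np.zeros((6272, 1))}
expected = {"conv1.weight": (32, 1, 5, 5), "fc.weight": (1, 6272)}
print(find_shape_mismatches(converted, expected))
# [('fc.weight', (6272, 1), (1, 6272))]
```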

In [7]:
keras_output=keras_model.predict_on_batch(img)
torch_model.train(False)  # inference mode: batch norm uses its running statistics
torch_output=torch_model(torch_img)
# Only the last expression of a cell is displayed, so this allclose result is not shown.
np.allclose(torch_output[0].detach().cpu().numpy(), keras_output[0],atol=1e-5)
# Largest signed difference for each of the two model outputs.
np.max(torch_output[1].detach().cpu().numpy() - keras_output[1]), np.max(torch_output[0].detach().cpu().numpy() - keras_output[0])

(0.0, -9.536743e-07)
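
The values printed above are signed maxima, which only report the largest positive deviation. A stricter check is the maximum absolute difference. The sketch below uses made-up arrays as stand-ins for the two model outputs, and the helper name is hypothetical.

```python
import numpy as np


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two arrays."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))


reference = np.array([[0.10, 0.20, 0.30]], dtype=np.float32)
candidate = np.array([[0.10, 0.25, 0.10]], dtype=np.float32)

print(np.max(candidate - reference) < 0.1)       # True: the signed max misses the -0.2 deviation
print(max_abs_diff(candidate, reference) < 0.1)  # False
```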

In [8]:
torch.save(torch_model.state_dict(), "./torch_weights.pth")