# Automated measurement of skin, fat and muscle thickness from ultrasound images

<strong>FILE:</strong> phase1.ipynb

<strong>PROJECT PHASE:</strong> Phase 1

<strong>PYTHON_VERSION:</strong> 3.8.3

<strong>AUTHOR:</strong> <a href="https://www.linkedin.com/in/sebastianjr/">Sebastian Janampa Rojas</a>

<strong>EMAIL:</strong> sebastian.janampa@utec.edu.pe

<strong>CREATE DATE:</strong> /03/2021 (DD/MM/YYYY)

<strong>COMPLEMENTED FILES:</strong> utils.py

<strong>AVAILABILITY OF DATA:</strong> <a href="https://multisbeta.stanford.edu/">https://multisbeta.stanford.edu/</a>




<strong>ACKNOWLEDGEMENT</strong>

This project is possible thanks to the <strong>Erdemir Lab</strong>, which provided the ultrasound images and annotated them manually. More information about their project is available at 

---

This Jupyter Notebook is divided into:
1. Libraries
1. Import Dataset
1. Pre-processing
1. Data Normalization
1. Custom loss function
1. Categorical data

---

## Libraries
In the next cell, we import the libraries used in this project.

In [None]:
import numpy as np
import pandas as pd
import os
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
import utils
# # For reproducibility
# np.random.seed(1)
# tf.random.set_seed(1234)

## Import Dataset

_**Directory**_

    Thesis:
        ProjectCode:
            Phase1:
                utils.py
                phase1.ipynb
        NN:
            data:
                Multisxxx-1:
                    Ultrasound_minFrame:
                        IMA files
                    categorical_values.csv
                    thickness.csv


In [None]:
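cur_dir = os.getcwd() # Current directory (Thesis/ProjectCode/Phase1)
main_dir = os.path.dirname(os.path.dirname(cur_dir)) # Main directory, two levels up (portable across OSes)
dat_dir = os.path.join(main_dir, 'NN', 'data') # Data directory

# Keep the directory level that contains the 100 subject folders
for _, folders, _ in os.walk(dat_dir):
    if len(folders) == 100:
        subjects = folders
print('Total number of subjects: %i' % len(subjects))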

#### Split data ####
perc_train = 0.75 # train percentage
subjects_train, subjects_test = train_test_split(subjects, 
                                                 train_size=perc_train,
                                                 random_state=3)
subjects_val,subjects_test=train_test_split(subjects_test, 
                                            train_size=15/25, 
                                            random_state=3)
print('# of subjects used in training: %i'%len(subjects_train))
print('# of subjects used in validation: %i'%len(subjects_val))
print('# of subjects used in testing: %i'%len(subjects_test))

In [None]:
# Loading data
datasets = {'training': subjects_train, 'validation': subjects_val, 'test':subjects_test}
training, validation, testing = utils.load_data(dat_dir, **datasets)

# Unpacking
x_img_train, x_ctg_train, y_train = training
x_img_val, x_ctg_val, y_val = validation
x_img_test, x_ctg_test, y_test = testing
del training, validation, testing, datasets

## Pre-processing

In [None]:
# Normalization
[training_outputs, validation_outputs, testing_outputs], params = utils.normalization_tech([y_train, y_val, y_test],['std', 'lin', 'dec'])
del y_train, y_val, y_test

# Packing in datasets
training = {'images': x_img_train, 'categories': x_ctg_train, 'thickness': training_outputs}
validation = {'images': x_img_val, 'categories': x_ctg_val, 'thickness': validation_outputs}
testing = {'images': x_img_test, 'categories': x_ctg_test, 'thickness': testing_outputs}
del training_outputs, validation_outputs, testing_outputs 
del x_img_train, x_img_val, x_img_test
del x_ctg_train, x_ctg_val, x_ctg_test

# Removing outliers
training, validation, testing = utils.remove_outliers([training, validation, testing], params)

## Data Normalization
In this section, the objective is to determine if normalizing the data is beneficial for the model.

First, four models with the same architecture are created.

- non -> no normalization technique was applied
- std -> z-score normalization was applied
- lin -> linear scaling was applied
- dec -> decimal scaling was applied
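
The three techniques listed above can be sketched in NumPy as follows. This is a minimal illustration using the textbook definitions of each technique; the actual implementation in `utils.normalization_tech` is not shown in this notebook and may differ. The function names and the sample thickness values are hypothetical.

```python
import numpy as np

def z_score(y):
    """Standardize each column to zero mean and unit standard deviation."""
    mean, std = y.mean(axis=0), y.std(axis=0)
    return (y - mean) / std, {"mean": mean, "std": std}

def linear_scaling(y):
    """Rescale each column linearly to the [0, 1] range."""
    mini, maxi = y.min(axis=0), y.max(axis=0)
    return (y - mini) / (maxi - mini), {"mini": mini, "maxi": maxi}

def decimal_scaling(y):
    """Divide each column by the smallest power of ten that brings |y| below 1."""
    exponent = np.floor(np.log10(np.abs(y).max(axis=0))) + 1
    factor = 10.0 ** exponent
    return y / factor, {"factor": factor}

# Hypothetical thickness values: one row per sample, one column per tissue
y = np.array([[1.5, 4.0, 20.0],
              [2.5, 8.0, 30.0],
              [2.0, 6.0, 25.0]])

y_std, std_params = z_score(y)
y_lin, lin_params = linear_scaling(y)
y_dec, dec_params = decimal_scaling(y)
```

Each function returns the parameters it used, so that predictions can later be mapped back to the original units.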

---

The models use the _**Mean-Squared Error**_ as the loss function and the _**Adam**_ optimizer with a learning rate of $10^{-4}$. In addition, the _**number of epochs**_ and the _**batch size**_ are $100$ and $64$, respectively.
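
As a rough sketch of this training configuration in plain Keras: the network below is a hypothetical placeholder (not `utils.ModelAB`, whose architecture is defined in `utils.py`), and the data are random stand-ins. Only the loss, optimizer, learning rate, number of epochs and batch size are taken from the description above.

```python
import numpy as np
from tensorflow import keras

LEARNING_RATE = 1e-4
EPOCHS = 100
BATCH_SIZE = 64

# Random stand-in data: 128 tiny "images" and 3 thickness targets per sample
rng = np.random.default_rng(0)
x = rng.random((128, 8, 8, 1)).astype("float32")
y = rng.random((128, 3)).astype("float32")

# Hypothetical placeholder network, NOT the notebook's ModelAB
model = keras.Sequential([
    keras.Input(shape=x.shape[1:]),
    keras.layers.Flatten(),
    keras.layers.Dense(3),
])

# Mean-Squared Error loss with the Adam optimizer
model.compile(loss="mse",
              optimizer=keras.optimizers.Adam(learning_rate=LEARNING_RATE))

# The last 25% of the samples are held out for validation
history = model.fit(x, y, epochs=EPOCHS, batch_size=BATCH_SIZE,
                    validation_split=0.25, verbose=0)
```

`history.history` then holds one training and one validation loss value per epoch, which is the kind of information needed to compare the four normalization variants.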

In [None]:
print('Dims of the input data: %s' % ', '.join(map(str, training['images'].shape[1:])), '(height, width, channels)')
# Creating models
tf.random.set_seed(1234)
models = utils.create_models(utils.ModelAB, input_shape=training['images'].shape[1:], dic_weights=None, methods=['non','std', 'lin', 'dec'])
# 
# Training the models
# tf.random.set_seed(1234)
results = utils.myFit(models, num_epochs=2, verbose=0, training=training, validation=validation)
del models

# Showing results
utils.show_results(results, parameters=params, training=training,validation=validation, testing=testing)

#### Note
Run the next cell to visualize the architecture of the model.

In [None]:
utils.plot_modelsv1(results, params,'datanormalizationv1.pdf')

In [None]:
utils.plot_modelsv2(results, params, 'datanormalizationv2.pdf')

## Custom Loss Function

In this section, a custom loss function is implemented in the networks whose data is normalized.

---

The custom loss function is:

$J(Y, \hat{Y}) = W*\frac{1}{m}\sum_{i=1}^{m}(Y_i - \hat{Y}_i)^2$

where $Y_i$ and $\hat{Y}_i$ are the $i$-th rows of $Y$ and $\hat{Y}$, i.e. the real and the predicted values for sample $i$, respectively, and the square is taken element-wise. The loss function is the _**Mean-Squared Error**_ of each tissue multiplied by a **Weight Vector** ($W$). Finally, $m$ represents the number of samples. The $*$ is the dot product.

---
Dimensions of the variables

$Y=
\begin{pmatrix}
y_{1,1} & y_{1,2} & y_{1,3} \\
y_{2,1} & y_{2,2} & y_{2,3} \\
\vdots & \vdots & \vdots  \\
y_{m,1} & y_{m,2} & y_{m,3}
\end{pmatrix}$     Each row is a sample and each column represents a type of tissue.

$W=
\begin{pmatrix}
w_{1,1} & w_{1,2} & w_{1,3}
\end{pmatrix}$
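
The weighted loss described in this section can be checked with a small NumPy sketch. The function name `weighted_mse` and the numbers are hypothetical and chosen only for illustration; the loss actually used by the models lives in `utils.py` and is not reproduced here.

```python
import numpy as np

def weighted_mse(y_true, y_pred, weights):
    """Dot product of the weight vector with the per-tissue Mean-Squared Error."""
    # Average the squared error over the m samples (rows): one value per tissue
    per_tissue_mse = np.mean((y_true - y_pred) ** 2, axis=0)
    return float(np.dot(weights, per_tissue_mse))

# m = 2 samples, 3 tissues (hypothetical values)
y_true = np.array([[1.0, 2.0, 3.0],
                   [2.0, 4.0, 6.0]])
y_pred = np.array([[1.0, 2.0, 3.0],
                   [0.0, 4.0, 3.0]])
w = np.array([1.0, 2.0, 2.0])

# Per-tissue MSE is [2.0, 0.0, 4.5], so the loss is 1*2.0 + 2*0.0 + 2*4.5 = 11.0
loss = weighted_mse(y_true, y_pred, w)
```

Because $W$ has one entry per tissue, a larger weight makes errors on that tissue count more in the scalar loss.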

In [None]:
# 'dic_weights' variable: weight vector used by the custom loss of each model
dic_weights = {
    'non': None,
    'std': params['std'],
    'lin': params['maxi'] - params['mini'],
    'dec': params['dec_vals']
}

# Creating models
models = utils.create_models(utils.ModelAB, input_shape=training['images'].shape[1:], dic_weights=dic_weights, methods=['non','std', 'lin', 'dec'])

# Training the models
tf.random.set_seed(1234)
results = utils.myFit(models, num_epochs=2, verbose=0, training=training, validation=validation)
del models

# Showing results
utils.show_results(results, parameters=params, training=training,validation=validation, testing=testing)