# vLife Virtusa
## DeepScap
### Use Case Description
_The model predicts anatomical features of the scapula from its projections onto the first 10 principal components of the Statistical Shape Model (SSM)._


### Dataset Source
Data for this use case can be found [here](https://www.kaggle.com/iham97/deepscapulassm).

### Dataset Description
<p>Each row represents a mesh (a 3D model of a scapula) with its features (Critical Shoulder Angle, Tilt, Version, Width, Length) and the SSM parameters obtained by projecting the mesh onto the first 10 principal components.</p>
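
To make the idea of "parameters projected on the first 10 principal components" concrete, here is a minimal sketch using NumPy. The random `shapes` array is a hypothetical stand-in for aligned, flattened meshes; it is not the Kaggle data, and the actual SSM may have been built differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for aligned meshes: 200 shapes, each with 50 vertices
# flattened to a 150-dimensional vector (x1, y1, z1, x2, ...).
n_shapes, n_vertices = 200, 50
shapes = rng.normal(size=(n_shapes, 3 * n_vertices))

# The shape model is the mean shape plus the principal directions of variation.
mean_shape = shapes.mean(axis=0)
centered = shapes - mean_shape

# Rows of vt are the principal components, ordered by explained variance.
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

n_components = 10
components = vt[:n_components]            # shape (10, 150)

# Project every shape onto the first 10 components: these are the SSM parameters.
coefficients = centered @ components.T    # shape (200, 10)

# A shape can be approximated back from its 10 coefficients.
reconstruction = mean_shape + coefficients @ components

print(coefficients.shape, reconstruction.shape)
```

Each row of `coefficients` plays the role of the ten "PC" columns in the dataset.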

### Import Section

In [1]:

import statsmodels.api as sm
from sklearn import linear_model
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


import os
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams

import plotly.tools as tls
import plotly.offline as py
from plotly.offline import init_notebook_mode, iplot, plot
import plotly.graph_objs as go
init_notebook_mode(connected=True)
import warnings

%matplotlib inline

# figure size in inches
rcParams['figure.figsize'] = 12,6


## Exploratory Data Analysis
*We first delete the ID column from the dataset, then check what the data looks like by loading the first three rows.*

In [2]:
df_features = pd.read_csv("scapFeaturesGauss1_5.csv" )
# Delete the MeshID column from the dataset
del df_features['MeshID']
df_features.head(n=3).transpose()


| | 0 | 1 | 2 |
|---|---|---|---|
| CSA | 35.377494 | 41.238047 | 43.131545 |
| Version | 14.893928 | 9.521284 | 22.434351 |
| Tilt | 7.700076 | 0.83524 | 25.815892 |
| Glene Width | 24.500691 | 23.780445 | 23.259885 |
| Glene Length | 27.317544 | 35.993161 | 24.567649 |
| Scapula Length | 145.339661 | 133.78995 | 123.298976 |
| Spine Length | 134.368393 | 156.821989 | 115.796984 |
| Lat Acromion Angle | 92.145481 | 97.72093 | 105.685891 |
| Glene Radius | 21.871616 | 33.551018 | 20.319127 |
| Acromion Shape | 1.0 | 2.0 | 2.0 |


> **Visualizing the feature distributions**

The features appear to follow a Gaussian distribution. Since the SSM parameters also follow a Gaussian distribution, this suggests that a simple linear regression may be enough to obtain good predictions.
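
One hedged way to back up the visual impression is to look at skewness and excess kurtosis, which are both close to 0 for Gaussian data. The sketch below uses synthetic samples, not the real columns; the helper name `skewness_and_excess_kurtosis` is ours.

```python
import numpy as np

def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis); both are about 0 for Gaussian data."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return float(np.mean(z ** 3)), float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)

# Synthetic stand-ins for a feature column (assumption: not the real dataset).
gaussian_like = rng.normal(loc=33.8, scale=7.2, size=10_000)
skewed = rng.exponential(scale=7.2, size=10_000)

g_skew, g_kurt = skewness_and_excess_kurtosis(gaussian_like)
s_skew, s_kurt = skewness_and_excess_kurtosis(skewed)

print("gaussian-like: skew=%.3f, excess kurtosis=%.3f" % (g_skew, g_kurt))
print("skewed:        skew=%.3f, excess kurtosis=%.3f" % (s_skew, s_kurt))
```

Applied to a real column, one would pass something like `df_features['CSA']` to the helper.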

In [3]:
trace1 = go.Histogram(
    x=np.log(df_features['CSA']).sample(800), histnorm='percent', autobinx=True,
    showlegend=True, name='CSA')
    
trace2 = go.Histogram(
    x=np.log(df_features['Version']).sample(800), histnorm='percent', autobinx=True,
    showlegend=True, name='Version')

trace3 = go.Histogram(
    x=np.log(df_features['Tilt']).sample(800), histnorm='percent', autobinx=True,
    showlegend=True, name='Tilt')
    
trace4 = go.Histogram(
    x=np.log(df_features['Glene Width']).sample(800), histnorm='percent', autobinx=True,
    showlegend=True, name='Glene Width')
    
trace5 = go.Histogram(
    x=np.log(df_features['Glene Length']).sample(800), histnorm='percent', autobinx=True,
    showlegend=True, name='Glene Length')

#Creating the grid
fig = tls.make_subplots(rows=2, cols=3, specs=[[{'colspan': 2}, None, {}], [{}, {}, {}]],
                          subplot_titles=("CSA",
                                          "Version", 
                                          "Tilt",
                                          "Glene Width", 
                                          "Glene Length"))

fig.append_trace(trace1, 1, 1)
fig.append_trace(trace2, 1, 3)
fig.append_trace(trace3, 2, 1)
fig.append_trace(trace4, 2, 2)
fig.append_trace(trace5, 2, 3)

fig['layout'].update(showlegend=True, title="Features Distribution")
iplot(fig)


invalid value encountered in log


plotly.tools.make_subplots is deprecated, please use plotly.subplots.make_subplots instead
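
The "invalid value encountered in log" warning appears because some angles are negative, and the logarithm of a negative number is undefined. The sketch below reproduces this with the first five Tilt values of the dataset (the fifth is negative); plotting the raw values instead of their logarithm would be one way to avoid the problem.

```python
import numpy as np

# First five Tilt values from the dataset; the last one is negative.
tilt = np.array([7.700076, 0.83524, 25.815892, 5.640584, -1.585274])

# np.log returns NaN for negative inputs and emits the warning seen above;
# the warning is silenced here so the example runs quietly.
with np.errstate(invalid="ignore"):
    logged = np.log(tilt)

n_invalid = int(np.isnan(logged).sum())
print(logged)
print("values lost to NaN:", n_invalid)
```

Any NaN produced this way is silently dropped from the histogram, so the plotted distribution no longer covers the whole sample.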



## Predictive Models

**Baseline Model with sklearn linear regression**

In [4]:
print('The standard deviation and the mean for the CSA are %(stdCS)f and %(meanCS)f .' %{'stdCS':df_features["CSA"].std() , "meanCS": df_features["CSA"].mean()})
print('The standard deviation and the mean for the version are %(stdV)f and %(meanV)f .' %{'stdV':df_features["Version"].std() , "meanV": df_features["Version"].mean()})
print('The standard deviation and the mean for the tilt are %(stdT)f and %(meanT)f .' %{'stdT':df_features["Tilt"].std() , "meanT": df_features["Tilt"].mean()})
print('The standard deviation and the mean for the glene width are %(stdW)f and %(meanW)f .' %{'stdW':df_features["Glene Width"].std() , "meanW": df_features["Glene Width"].mean()})
print('The standard deviation and the mean for the glene length are %(stdL)f and %(meanL)f .' %{'stdL':df_features["Glene Length"].std() , "meanL": df_features["Glene Length"].mean()})
# We define the targets
target = pd.DataFrame(df_features, columns=["CSA","Version","Tilt","Glene Width","Glene Length"])

# We define the predictors. Note that "Eighth PC" is not in this list, so 9 of the 10 components are used
df = pd.DataFrame(df_features, columns=["First PC","Second PC","Third PC","Fourth PC","Fifth PC","Sixth PC","Seventh PC","Ninth PC","Tenth PC"])

The standard deviation and the mean for the CSA are 7.242213 and 33.830566 .
The standard deviation and the mean for the version are 6.379106 and 9.561111 .
The standard deviation and the mean for the tilt are 6.393997 and 5.621756 .
The standard deviation and the mean for the glene width are 3.118340 and 24.671350 .
The standard deviation and the mean for the glene length are 4.342510 and 33.834401 .


In [5]:
X = df
y = target

# Fit the linear regression model
lm = linear_model.LinearRegression()
model = lm.fit(X,y)

In [6]:
predictions = lm.predict(X)

In [7]:
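# Predictions for the first five samples (one row per target feature)
print(predictions[0:5].transpose())
# R^2 score on the training data
print(lm.score(X,y))
# First five rows and first five columns, for comparison with the predictions
df_features.iloc[:5, :5]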

[[33.57630835 42.32735162 35.53569074 32.967171   29.42087543]
 [12.66159381  6.08905132 12.17035461  4.70665788  8.04686583]
 [ 3.67687517 -2.53213279 10.1338706   4.53927649  1.68111773]
 [23.69342893 23.57062431 21.84560022 28.16153087 25.05298773]
 [31.6883114  34.46503049 26.73342932 35.7850591  35.48364497]]
0.5806166950445226






| | CSA | Version | Tilt | Glene Width | Glene Length |
|---|---|---|---|---|---|
| 0 | 35.377494 | 14.893928 | 7.700076 | 24.500691 | 27.317544 |
| 1 | 41.238047 | 9.521284 | 0.83524 | 23.780445 | 35.993161 |
| 2 | 43.131545 | 22.434351 | 25.815892 | 23.259885 | 24.567649 |
| 3 | 25.09548 | 5.012612 | 5.640584 | 29.32409 | 38.052437 |
| 4 | 26.290516 | 5.908531 | -1.585274 | 25.558747 | 38.942607 |
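
The single number printed by `lm.score` is an R² aggregated over the five targets (recent scikit-learn versions average the per-target values uniformly). The following NumPy sketch, on synthetic data with the same 9-predictor, 5-target layout, shows how the per-target R² values can be computed; looking at them separately may reveal which features the baseline predicts well. All arrays here are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 9 predictors, 5 targets (not the real dataset).
n_samples, n_predictors, n_targets = 500, 9, 5
X = rng.normal(size=(n_samples, n_predictors))
true_weights = rng.normal(size=(n_predictors, n_targets))
y = X @ true_weights + rng.normal(scale=2.0, size=(n_samples, n_targets))

# Ordinary least squares with an intercept column.
X1 = np.column_stack([np.ones(n_samples), X])
weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ weights

# R^2 = 1 - (residual sum of squares / total sum of squares), per target.
ss_res = ((y - y_hat) ** 2).sum(axis=0)
ss_tot = ((y - y.mean(axis=0)) ** 2).sum(axis=0)
r2_per_target = 1.0 - ss_res / ss_tot
r2_mean = float(r2_per_target.mean())

print("R^2 per target:", np.round(r2_per_target, 3))
print("uniform average:", round(r2_mean, 3))
```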


### Using neural networks to improve the results

*We now use a more complex model for the predictions.*

In [8]:
# We do the necessary imports
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, BatchNormalization
from keras.optimizers import Adam

Using TensorFlow backend.

Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.





### Define NN Architecture

In [9]:
model = Sequential()
# Our input is a 9-element vector of PC coefficients (input_dim=9 matches the 9 predictor columns selected above)
model.add(Dense(100, input_dim=9))
model.add(Activation('relu'))
model.add(Dense(200))
model.add(Dropout(0.1))
model.add(Dense(100))
model.add(BatchNormalization())
model.add(Dense(100))
model.add(Dense(5))
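# Note: 'accuracy' is a classification metric; for this regression task the MSE loss is the meaningful quantity
model.compile(loss='mean_squared_error', optimizer=Adam(lr=0.01,decay=0.1), metrics=['accuracy'])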
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 100)               1000      
_________________________________________________________________
activation_1 (Activation)    (None, 100)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 200)               20200     
_________________________________________________________________
dropout_1 (Dropout)          (None, 200)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 100)               20100     
_________________________________________________________________
batch_normalization_1 (Batch (None, 100)               400       
_________________________________________________________________
dense_4 (Dense)              (None, 100)               10100     
__________
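
The parameter counts in the summary can be checked by hand: a dense layer has `inputs * units + units` parameters, and a batch normalization layer has four per feature. The sketch below recomputes them from the architecture defined above. The summary is cut off before the last layer, so the name `dense_5` and its count are derived from `Dense(5)` in the code rather than read from the output.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def batch_norm_params(n_features):
    """gamma, beta, moving mean and moving variance for each feature."""
    return 4 * n_features

layer_params = {
    "dense_1": dense_params(9, 100),
    "dense_2": dense_params(100, 200),
    "dense_3": dense_params(200, 100),
    "batch_normalization_1": batch_norm_params(100),
    "dense_4": dense_params(100, 100),
    "dense_5": dense_params(100, 5),  # assumed name; not visible in the truncated summary
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(name, count)
print("total", total_params)
```

The first count, 1000, confirms that the network receives 9 inputs rather than 10.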

In [11]:
hist = model.fit(X, y, epochs=2, verbose=1, validation_split=0.2)
y_pred = model.predict(X) 


Train on 80000 samples, validate on 20000 samples
Epoch 1/2
Epoch 2/2
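
It is worth checking how quickly the learning rate shrinks with `decay=0.1`. In the Keras 2 optimizers used here, the rate at a given update is `lr / (1 + decay * iteration)`. The sketch below evaluates this schedule, assuming the default batch size of 32 (the notebook does not set one), which gives 2500 updates per epoch on 80000 training samples.

```python
import math

def decayed_lr(initial_lr, decay, iteration):
    """Time-based decay as applied by the legacy Keras optimizers."""
    return initial_lr / (1.0 + decay * iteration)

initial_lr, decay = 0.01, 0.1
batch_size = 32                                  # assumption: Keras default for fit()
steps_per_epoch = math.ceil(80_000 / batch_size)

for iteration in (0, 10, 100, steps_per_epoch, 2 * steps_per_epoch):
    print("update %5d: lr = %.2e" % (iteration, decayed_lr(initial_lr, decay, iteration)))
```

Under these assumptions the rate falls by more than two orders of magnitude within the first epoch, so most of the learning happens in the earliest updates.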


## Model Prediction

In [12]:
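# Predictions for the first five samples (one row per target feature)
print(y_pred[0:5].transpose())
# evaluate returns [loss, metric]; scores[1] is the Keras 'accuracy' metric
scores = model.evaluate(X, y, verbose=1)
print('%(score)f percent accuracy.'%{'score':scores[1]*100})
# First five rows and first five columns, for comparison with the predictions
df_features.iloc[:5, :5]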

[[33.2945    41.865818  35.791363  31.860296  29.636238 ]
 [12.633472   5.84797   12.050914   5.1741257  7.1873565]
 [ 3.1195915 -1.1202148 11.258458   4.501017   1.4387058]
 [23.487263  23.54908   22.016079  28.102386  25.07198  ]
 [31.640682  34.578243  26.984264  36.76205   35.40877  ]]
79.202000 percent accuracy.


| | CSA | Version | Tilt | Glene Width | Glene Length |
|---|---|---|---|---|---|
| 0 | 35.377494 | 14.893928 | 7.700076 | 24.500691 | 27.317544 |
| 1 | 41.238047 | 9.521284 | 0.83524 | 23.780445 | 35.993161 |
| 2 | 43.131545 | 22.434351 | 25.815892 | 23.259885 | 24.567649 |
| 3 | 25.09548 | 5.012612 | 5.640584 | 29.32409 | 38.052437 |
| 4 | 26.290516 | 5.908531 | -1.585274 | 25.558747 | 38.942607 |
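
To compare the two models on the same footing, we can compute a regression metric on the five samples printed in this notebook. The sketch below uses the mean absolute error per feature; five rows are far too few to draw conclusions, so this only illustrates the procedure. The last line assumes that Keras resolves `'accuracy'` to categorical accuracy for a 5-column output, that is, the share of rows where the largest predicted column is also the largest true column.

```python
import numpy as np

features = ["CSA", "Version", "Tilt", "Glene Width", "Glene Length"]

# True values of the first five rows (samples x features).
y_true = np.array([
    [35.377494, 14.893928, 7.700076, 24.500691, 27.317544],
    [41.238047, 9.521284, 0.83524, 23.780445, 35.993161],
    [43.131545, 22.434351, 25.815892, 23.259885, 24.567649],
    [25.09548, 5.012612, 5.640584, 29.32409, 38.052437],
    [26.290516, 5.908531, -1.585274, 25.558747, 38.942607],
])

# Printed predictions are transposed (features x samples), hence the .T
pred_linear = np.array([
    [33.57630835, 42.32735162, 35.53569074, 32.967171, 29.42087543],
    [12.66159381, 6.08905132, 12.17035461, 4.70665788, 8.04686583],
    [3.67687517, -2.53213279, 10.1338706, 4.53927649, 1.68111773],
    [23.69342893, 23.57062431, 21.84560022, 28.16153087, 25.05298773],
    [31.6883114, 34.46503049, 26.73342932, 35.7850591, 35.48364497],
]).T
pred_nn = np.array([
    [33.2945, 41.865818, 35.791363, 31.860296, 29.636238],
    [12.633472, 5.84797, 12.050914, 5.1741257, 7.1873565],
    [3.1195915, -1.1202148, 11.258458, 4.501017, 1.4387058],
    [23.487263, 23.54908, 22.016079, 28.102386, 25.07198],
    [31.640682, 34.578243, 26.984264, 36.76205, 35.40877],
]).T

# Mean absolute error per feature, same metric for both models.
mae_linear = np.abs(pred_linear - y_true).mean(axis=0)
mae_nn = np.abs(pred_nn - y_true).mean(axis=0)
for name, lin, nn in zip(features, mae_linear, mae_nn):
    print("%-13s linear MAE = %.3f   NN MAE = %.3f" % (name, lin, nn))

# Assumed meaning of Keras 'accuracy' here: argmax agreement between rows.
argmax_agreement = float(np.mean(pred_nn.argmax(axis=1) == y_true.argmax(axis=1)))
print("argmax agreement (NN):", argmax_agreement)
```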


### Conclusive Analysis

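At face value, the neural network's 79.2% looks like quite an improvement over the 0.58 obtained by the baseline linear regression. The two figures are not directly comparable, however: `lm.score` returns the coefficient of determination R², while the Keras `accuracy` metric is designed for classification and is not a meaningful measure for a regression task. On the five samples printed above, the predictions of the two models are in fact close to each other, so a fair comparison should use the same regression metric (for example MSE or R²) for both models, ideally on held-out data.

It would also be interesting to test the model on outliers, such as pathological scapulae or samples from our model with parameters far from the mean. We could then evaluate whether or not our model is overfitting the data we have.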
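
A minimal sketch of such a test is given below. It fits a linear model on synthetic data, then compares the error on held-out samples drawn like the training data with the error on samples whose parameters are spread three times wider. The data generator, the matrices `W` and `V`, and the mildly nonlinear relation are all assumptions made for illustration; none of this is the real scapula data.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(9, 5))  # hypothetical linear part of the relation
V = rng.normal(size=(9, 5))  # hypothetical nonlinear part of the relation

def make_data(n_samples, spread):
    """Draw 9 PC-like parameters with the given spread and 5 noisy targets."""
    X = rng.normal(scale=spread, size=(n_samples, 9))
    y = X @ W + 0.1 * (X ** 2) @ V + rng.normal(scale=0.5, size=(n_samples, 5))
    return X, y

def fit_linear(X, y):
    """Least-squares fit with an intercept column."""
    X1 = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights

def mse(weights, X, y):
    X1 = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((X1 @ weights - y) ** 2))

X_train, y_train = make_data(4000, spread=1.0)
X_test, y_test = make_data(1000, spread=1.0)   # held-out, same distribution
X_far, y_far = make_data(1000, spread=3.0)     # parameters far from the mean

weights = fit_linear(X_train, y_train)
mse_train = mse(weights, X_train, y_train)
mse_test = mse(weights, X_test, y_test)
mse_far = mse(weights, X_far, y_far)

print("train MSE:        %.3f" % mse_train)
print("held-out MSE:     %.3f" % mse_test)
print("far-from-mean MSE: %.3f" % mse_far)
```

A large gap between the training and held-out errors would point to overfitting, while a large gap between the held-out and far-from-mean errors would point to poor extrapolation.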

## END