# Homework 3
---

*imports*

In [54]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

from sklearn.preprocessing import scale
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.metrics import accuracy_score

%matplotlib inline


## Problem 1:

### Diagnosis of Fetal Cardio-Vascular Disease from Cardiotocography using ANN in SciKitLearn

---

#### a.
  * In this problem, I selected a new medical dataset from the UCI ML repository. The dataset includes 2126 fetal cardiotocograms (CTGs) collected at the Biomedical Engineering Institute in Porto, Portugal.
  
  The dataset includes **21 feature elements**, and the output (NSP) has **3 classes**:
  
  `1: Normal; 2: Suspect; 3: Pathologic`.
  
  For more information, see the dataset page: https://archive.ics.uci.edu/ml/datasets/Cardiotocography
  
  You can download either the original raw data from that page or the clean version of the data as a CSV file from CSNS in the HW package (“CTG_clean.csv”).
  
  The goal is to build an artificial neural network that can diagnose (classify) the fetal health status. The last column in the dataset, NSP (fetal state class code: `1=normal; 2=suspect; 3=pathologic`), is the label, and the first 21 columns are the features. To see what each feature represents, take a look at the dataset info link above.

---

*import data*

In [4]:

ctg_data = pd.read_csv("ctg_data/CTG_clean.csv")


In [5]:

ctg_data.head()


Unnamed: 0,LB,AC,FM,UC,DL,DS,DP,ASTV,MSTV,ALTV,...,Min,Max,Nmax,Nzeros,Mode,Mean,Median,Variance,Tendency,NSP
0,120,0.0,0.0,0.0,0.0,0.0,0.0,73,0.5,43,...,62,126,2,0,120,137,121,73,1,2
1,132,0.006,0.0,0.006,0.003,0.0,0.0,17,2.1,0,...,68,198,6,1,141,136,140,12,0,1
2,133,0.003,0.0,0.008,0.003,0.0,0.0,16,2.1,0,...,68,198,5,1,141,135,138,13,0,1
3,134,0.003,0.0,0.008,0.003,0.0,0.0,16,2.4,0,...,53,170,11,0,137,134,137,13,1,1
4,132,0.007,0.0,0.008,0.0,0.0,0.0,16,2.4,0,...,53,170,9,0,137,136,138,11,1,1


---

*get features names*

In [15]:

# every column name except the label (last) column
ctg_features = list(ctg_data.columns[:-1])

print(ctg_features)
print("\nNumber of features:", len(ctg_features))


['LB', 'AC', 'FM', 'UC', 'DL', 'DS', 'DP', 'ASTV', 'MSTV', 'ALTV', 'MLTV', 'Width', 'Min', 'Max', 'Nmax', 'Nzeros', 'Mode', 'Mean', 'Median', 'Variance', 'Tendency']

Number of features: 21


---

*build feature and label dataframe/series*

In [21]:

label = "NSP" # target

X_ctg = ctg_data[ctg_features]
y_ctg = ctg_data[label]


In [19]:

X_ctg.head()


Unnamed: 0,LB,AC,FM,UC,DL,DS,DP,ASTV,MSTV,ALTV,...,Width,Min,Max,Nmax,Nzeros,Mode,Mean,Median,Variance,Tendency
0,120,0.0,0.0,0.0,0.0,0.0,0.0,73,0.5,43,...,64,62,126,2,0,120,137,121,73,1
1,132,0.006,0.0,0.006,0.003,0.0,0.0,17,2.1,0,...,130,68,198,6,1,141,136,140,12,0
2,133,0.003,0.0,0.008,0.003,0.0,0.0,16,2.1,0,...,130,68,198,5,1,141,135,138,13,0
3,134,0.003,0.0,0.008,0.003,0.0,0.0,16,2.4,0,...,117,53,170,11,0,137,134,137,13,1
4,132,0.007,0.0,0.008,0.0,0.0,0.0,16,2.4,0,...,117,53,170,9,0,137,136,138,11,1


In [20]:

y_ctg.head()


0    2
1    1
2    1
3    1
4    1
Name: NSP, dtype: int64

---

#### b.

  * Use the `sklearn.preprocessing.scale` function to normalize the features (zero mean and unit variance per column).

In [30]:

X_ctg_scaled = scale(X_ctg)

X_ctg_scaled



array([[-1.35222005, -0.8223883 , -0.20320955, ..., -1.18164215,
         1.87056871,  1.11298001],
       [-0.1325256 ,  0.73013282, -0.20320955, ...,  0.13203796,
        -0.23499819, -0.52452553],
       [-0.03088439, -0.04612774, -0.20320955, ..., -0.00624416,
        -0.2004807 , -0.52452553],
       ...,
       [ 0.68060404, -0.56363478, -0.20320955, ...,  0.96173066,
        -0.51113811,  1.11298001],
       [ 0.68060404, -0.56363478, -0.20320955, ...,  0.8925896 ,
        -0.51113811,  1.11298001],
       [ 0.88388645, -0.30488126, -0.16034157, ...,  0.47774325,
        -0.61469058, -0.52452553]])
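As a rough illustration of what `scale` does, the sketch below standardizes a small made-up matrix (the values are illustrative stand-ins, not the real `X_ctg`) and compares the result with a manual computation and with `StandardScaler`. The names `X_toy`, `X_manual`, and `X_std` are assumptions introduced here.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

# Toy feature matrix standing in for X_ctg (values are made up for illustration).
X_toy = np.array([
    [120.0, 0.000, 73.0],
    [132.0, 0.006, 17.0],
    [133.0, 0.003, 16.0],
    [134.0, 0.003, 16.0],
])

# scale() standardizes each column independently.
X_scaled = scale(X_toy)

# Manual equivalent: subtract the column mean, divide by the population std (ddof=0).
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

# StandardScaler applies the same transform but stores the mean/std,
# so the fitted scaler can be reused on new data.
X_std = StandardScaler().fit_transform(X_toy)

print("column means:", X_scaled.mean(axis=0))
print("column stds: ", X_scaled.std(axis=0))
```

After scaling, each column should have a mean close to 0 and a standard deviation close to 1, which keeps features with large raw ranges from dominating the network's weighted sums.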

---

#### c.

  * Design an ANN with one hidden layer of 30 neurons to predict the fetal health status.
  
  For your ANN, use `random_state=1, learning_rate_init = 0.02, solver='adam', alpha=1, verbose=True, activation='logistic'`.
  
  Use 10-fold cross-validation to evaluate your model. Make sure to add “`verbose=True`” to see the training process.
  
  Then test your ANN on the testing set, and calculate and report the accuracy.

---

*create ANN*

In [74]:

ctg_ann = MLPClassifier(
    hidden_layer_sizes = (30,),
    random_state = 1,
    learning_rate_init = 0.02,
    solver = "adam",
    alpha = 1,
    verbose = True,
    activation = "logistic"
)


---

*evaluate model*

In [75]:

ctg_cv = cross_val_score(
    ctg_ann,
    X_ctg_scaled,
    y_ctg,
    cv = 10,
    scoring = "accuracy"
)


Iteration 1, loss = 0.73766896
Iteration 2, loss = 0.47313277
Iteration 3, loss = 0.40673341
Iteration 4, loss = 0.37862112
Iteration 5, loss = 0.35653807
Iteration 6, loss = 0.34688054
Iteration 7, loss = 0.33881202
Iteration 8, loss = 0.33734346
Iteration 9, loss = 0.33001538
Iteration 10, loss = 0.32797160
Iteration 11, loss = 0.32636487
Iteration 12, loss = 0.32981298
Iteration 13, loss = 0.33724941
Iteration 14, loss = 0.32235960
Iteration 15, loss = 0.32934261
Iteration 16, loss = 0.32732860
Iteration 17, loss = 0.32549387
Iteration 18, loss = 0.32180319
Iteration 19, loss = 0.31969682
Iteration 20, loss = 0.32140984
Iteration 21, loss = 0.32508722
Iteration 22, loss = 0.32357771
Iteration 23, loss = 0.32789266
Iteration 24, loss = 0.32023943
Iteration 25, loss = 0.31725165
Iteration 26, loss = 0.31710651
Iteration 27, loss = 0.31695576
Iteration 28, loss = 0.31466339
Iteration 29, loss = 0.31995488
Iteration 30, loss = 0.31936823
Iteration 31, loss = 0.32175113
...

Iteration 14, loss = 0.33636461
Iteration 15, loss = 0.34211502
Iteration 16, loss = 0.34191172
Iteration 17, loss = 0.34046828
Iteration 18, loss = 0.33759081
Iteration 19, loss = 0.33529847
Iteration 20, loss = 0.33395675
Iteration 21, loss = 0.33512641
Iteration 22, loss = 0.33600379
Iteration 23, loss = 0.33708578
Iteration 24, loss = 0.33178146
Iteration 25, loss = 0.33799551
Iteration 26, loss = 0.33247678
Iteration 27, loss = 0.33500343
Iteration 28, loss = 0.33065744
Iteration 29, loss = 0.33334097
Iteration 30, loss = 0.32682821
Iteration 31, loss = 0.32819239
Iteration 32, loss = 0.33652888
Iteration 33, loss = 0.33479845
Iteration 34, loss = 0.33364076
Iteration 35, loss = 0.33103325
Iteration 36, loss = 0.32781014
Iteration 37, loss = 0.32959929
Iteration 38, loss = 0.33367673
Iteration 39, loss = 0.32899824
Iteration 40, loss = 0.32582238
Iteration 41, loss = 0.32638636
Iteration 42, loss = 0.33155639
Iteration 43, loss = 0.32795611
Iteration 44, loss = 0.33051310
...

Iteration 18, loss = 0.30130195
Iteration 19, loss = 0.29814558
Iteration 20, loss = 0.30327728
Iteration 21, loss = 0.29563131
Iteration 22, loss = 0.29530685
Iteration 23, loss = 0.29582007
Iteration 24, loss = 0.29966337
Iteration 25, loss = 0.29831362
Iteration 26, loss = 0.29742867
Iteration 27, loss = 0.29331838
Iteration 28, loss = 0.29564841
Iteration 29, loss = 0.30268597
Iteration 30, loss = 0.29672495
Iteration 31, loss = 0.29841232
Iteration 32, loss = 0.29443509
Iteration 33, loss = 0.29420303
Iteration 34, loss = 0.29553871
Iteration 35, loss = 0.29740811
Iteration 36, loss = 0.29406175
Iteration 37, loss = 0.29544967
Iteration 38, loss = 0.29452128
Training loss did not improve more than tol=0.000100 for 10 consecutive epochs. Stopping.
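The cross-validation above runs on features that were scaled once over the whole dataset, so each validation fold contributes to the mean and standard deviation used for its own training folds. One possible alternative, sketched below under the assumption that a `Pipeline` is acceptable for this homework, refits the scaler inside every fold. The data is a synthetic stand-in generated with `make_classification`, not the CTG data, and `X_demo`, `y_demo`, and `pipe` are hypothetical names; 5 folds are used only to keep the sketch fast.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the CTG data (21 features, 3 classes); not the real dataset.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=21, n_informative=8,
    n_classes=3, random_state=0,
)

# The scaler and the network are fit together on the training part of each fold,
# so the validation part never influences the scaling statistics.
pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(30,),
        random_state=1,
        learning_rate_init=0.02,
        solver="adam",
        alpha=1,
        activation="logistic",
    ),
)

scores = cross_val_score(pipe, X_demo, y_demo, cv=5, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```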


---

*accuracy*

In [80]:

print("Accuracy list:\n", ctg_cv)
print("\nAccuracy mean:", ctg_cv.mean() )


Accuracy list:
 [0.8317757  0.79906542 0.87850467 0.90654206 0.88317757 0.93396226
 0.86729858 0.92890995 0.83886256 0.68246445]

Accuracy mean: 0.8550563229735388


---

#### d.
  * Fix the random state for reproducibility:
  
  `seed = 1; np.random.seed(seed)`.
  
  Now, use `GridSearchCV` to find the best number of neurons for your 1-hidden layer network.
  
  Search the range 5 to 250 with a step size of 5 for the number of neurons (i.e. 5 neurons, 10 neurons, 15 neurons, ... , 245 neurons, 250 neurons).
  
  As for other parameters, use the same arguments as part (c) for your network.
  
  What is the best accuracy, and best number of neurons?

---

*random state*

In [81]:

seed = 1
np.random.seed(seed) # seeds NumPy's global RNG; the MLP is seeded separately via random_state=1
print("Seed:", seed)


Seed: 1
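To show how the two seeding mechanisms relate, here is a small sketch on synthetic data. It assumes that scikit-learn estimators created with `random_state=None` draw from NumPy's global random state, which is how `np.random.seed` can affect them. The helper `first_layer_weights` and the `X_demo`/`y_demo` data are hypothetical names used only for this example.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# The short training runs below are not expected to converge; silence that warning.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

def first_layer_weights(random_state):
    """Train a tiny MLP and return its input-to-hidden weight matrix."""
    clf = MLPClassifier(hidden_layer_sizes=(5,), max_iter=20, random_state=random_state)
    return clf.fit(X_demo, y_demo).coefs_[0]

# An explicit integer random_state gives the same weights on every run,
# regardless of the global seed.
same_with_random_state = np.allclose(first_layer_weights(1), first_layer_weights(1))

# With random_state=None the estimator uses the global NumPy RNG,
# so reseeding before each fit reproduces the result.
np.random.seed(1)
w_first = first_layer_weights(None)
np.random.seed(1)
w_second = first_layer_weights(None)
same_with_global_seed = np.allclose(w_first, w_second)

# Without reseeding, the global RNG has advanced and the weights differ.
w_third = first_layer_weights(None)
differs_without_reseed = not np.allclose(w_second, w_third)

print(same_with_random_state, same_with_global_seed, differs_without_reseed)
```

Since the network in part (c) already sets `random_state=1`, the `np.random.seed` call mainly matters for any component left at `random_state=None`.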


In [89]:

# note: range(5, 250, 5) stops at 245; range(5, 251, 5) would also include 250
neuron_number = [(i,) for i in range(5, 250, 5)]

param_grid = dict(hidden_layer_sizes = neuron_number)

print(param_grid)


{'hidden_layer_sizes': [(5,), (10,), (15,), (20,), (25,), (30,), (35,), (40,), (45,), (50,), (55,), (60,), (65,), (70,), (75,), (80,), (85,), (90,), (95,), (100,), (105,), (110,), (115,), (120,), (125,), (130,), (135,), (140,), (145,), (150,), (155,), (160,), (165,), (170,), (175,), (180,), (185,), (190,), (195,), (200,), (205,), (210,), (215,), (220,), (225,), (230,), (235,), (240,), (245,)]}


---

*create/fit grid search*

In [88]:

ctg_grid = GridSearchCV(
    ctg_ann,
    param_grid,
    cv=10,
    scoring="accuracy",
    verbose=True,
    n_jobs=-1
)


In [None]:

ctg_grid.fit(X_ctg_scaled, y_ctg)


---

*report best accuracy*

In [85]:
print("\nScore:", ctg_grid.best_score_)


Score: 0.8748824082784572


---

*best number of neurons*

In [87]:
ctg_grid.best_params_

{'hidden_layer_sizes': (130,)}
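Beyond `best_score_` and `best_params_`, the fitted search object also stores per-candidate results in `cv_results_`, which helps show how close the other layer sizes came to the winner. The sketch below runs a much smaller search (four sizes, 3 folds) on synthetic stand-in data so that it finishes quickly; `X_demo`, `y_demo`, `sizes`, and `results` are hypothetical names, and the numbers it prints say nothing about the CTG results above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the CTG data (21 features, 3 classes); not the real dataset.
X_demo, y_demo = make_classification(
    n_samples=240, n_features=21, n_informative=8,
    n_classes=3, random_state=0,
)

# Small grid for illustration: (5,), (10,), (15,), (20,).
# The stop value 21 makes the range include 20.
sizes = [(n,) for n in range(5, 21, 5)]

grid = GridSearchCV(
    MLPClassifier(
        random_state=1,
        learning_rate_init=0.02,
        solver="adam",
        alpha=1,
        activation="logistic",
    ),
    {"hidden_layer_sizes": sizes},
    cv=3,
    scoring="accuracy",
)
grid.fit(X_demo, y_demo)

# Collect one row per candidate: its layer size and its cross-validated accuracy.
results = pd.DataFrame({
    "hidden_layer_sizes": [p["hidden_layer_sizes"] for p in grid.cv_results_["params"]],
    "mean_test_score": grid.cv_results_["mean_test_score"],
    "std_test_score": grid.cv_results_["std_test_score"],
    "rank_test_score": grid.cv_results_["rank_test_score"],
})

print(results.sort_values("rank_test_score"))
print("Best:", grid.best_params_, grid.best_score_)
```

Comparing `mean_test_score` against `std_test_score` gives a sense of whether the best size is clearly better or within fold-to-fold noise.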

## Problem 2:

### Face Recognition Using SVM

---