# Neural Networks - Practical 2

After having trained and used your first neural network, you should be able to apply these skills to another data set. 

The dataset used in this exercise is from the UCI machine learning repository. It consists of measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms classified by expert obstetricians.

A more detailed description can be found here: https://archive.ics.uci.edu/ml/datasets/Cardiotocography

We have already extracted the main data table and the csv file can be found in the data subdirectory. For reference, the original Excel file is also supplied. 

The task is to classify the recordings based on the measurements. Here, we use the three-class setting (defined by the NSP column):

Normal     = 1
Suspect    = 2
Pathologic = 3

Feel free to try the 10-class version, which is based on the CLASS column, instead.

Your task in this exercise is to load the data and train a simple neural network to predict either the three classes or, later, the 10 classes.
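The NSP codes listed above start at 1, whereas Keras losses such as `sparse_categorical_crossentropy` expect class indices starting at 0. A minimal sketch of one way to convert the codes, using a small made-up array (`nsp` is a hypothetical stand-in, not the real column):

```python
import numpy as np

# Hypothetical stand-in for a few values of the NSP column (1, 2 or 3).
nsp = np.array([1, 2, 3, 1, 1, 2])

# Shift the codes to 0-based class indices: Normal=0, Suspect=1, Pathologic=2.
y_index = nsp - 1

# Optional one-hot version, as needed for the (non-sparse)
# categorical_crossentropy loss.
y_onehot = np.eye(3, dtype="float32")[y_index]

print(y_index)
print(y_onehot.shape)
```

Either representation should work; the 0-based indices are simply the more compact option.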

For training purposes, a simple train-test split is acceptable. 
(However, you might want to evaluate the performance using 5-fold cross-validation. Because the presented approach uses the Keras module of TensorFlow, the grid search of sklearn cannot easily be applied.)
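If you want to try cross-validation without sklearn's grid search, one option is to write the fold loop by hand. The sketch below is only an illustration under assumptions: `cross_validate` and `fit_and_score` are hypothetical names, the data is synthetic, and a majority-class baseline stands in for building and training a fresh Keras model in each fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler


def cross_validate(X, y, fit_and_score, n_splits=5, seed=0):
    """Run a stratified k-fold loop and return one score per fold.

    fit_and_score is a hypothetical callback that receives
    (X_train, y_train, X_val, y_val) and returns a float score.
    """
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        # Fit the scaler on the training fold only to avoid information leakage.
        scaler = StandardScaler().fit(X[train_idx])
        X_train = scaler.transform(X[train_idx])
        X_val = scaler.transform(X[val_idx])
        scores.append(fit_and_score(X_train, y[train_idx], X_val, y[val_idx]))
    return np.array(scores)


def majority_baseline(X_train, y_train, X_val, y_val):
    # Stand-in for "build a new model, fit it, evaluate it on the fold".
    majority = np.bincount(y_train).argmax()
    return float(np.mean(y_val == majority))


# Synthetic stand-in data: 100 samples, 4 features, 3 imbalanced classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = np.repeat([0, 1, 2], [70, 20, 10])

fold_scores = cross_validate(X_demo, y_demo, majority_baseline)
print(fold_scores.mean())
```

To use this with a neural network, the callback would create a new model in every fold so that no weights carry over from one fold to the next.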




## Required imports

Please note that this practical also switches off some warnings. 

In [None]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import tensorflow as tf

In [None]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix



# Use the public tensorflow.keras API; tensorflow.python.* is a private module path
from tensorflow import keras
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Sequential


## Read in the data 

This file contains the cardiotocography data table described above. Reading it in requires a bit of pre-processing, a task that will potentially be required for next week's task as well. 

A more detailed introduction to data wrangling will be given in another lecture. 


In [None]:
df = pd.read_csv('./data/CTG.csv')
# drop unused information
df = df.drop(['b', 'e', 'Unnamed: 9', 'Unnamed: 31','Unnamed: 42','Unnamed: 44','A','B','C','D','E','AD','DE','LD','FS','SUSP'],axis=1)


In [None]:
df.describe()

In [None]:
df.columns.values


## Remove the CLASS attribute

The column CLASS contains a more detailed classification than NSP. Hence, we do not want to use it for learning, and the column is removed. Rows with missing values are dropped as well. The result is saved in a new dataframe called df_new.

In [None]:
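# remove the detailed 10-class label
df_new = df.drop(['CLASS'],axis=1)
# drop rows with missing values (apply to df_new, otherwise CLASS would come back)
df_new = df_new.dropna()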


## On your own

From here on, please use the skills you have learned so far to:

1. Split the data into X and y
2. Split the result into training and test (or even a 5-fold cross validation)
3. Apply scaling for numerical variables and an appropriate encoding for categorical ones
4. Set up a (multi-)layer neural network
5. Train the network and report on its performance.
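
As a rough orientation, the five steps could look like the sketch below. It is one possible approach, not a reference solution: it runs on a synthetic stand-in table (`demo`, with made-up feature names `f0` to `f4`) instead of `df_new`, assumes that all features are numeric, and uses an arbitrary layer size and number of epochs.

```python
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for df_new: numeric features plus an NSP column in {1, 2, 3}.
rng = np.random.default_rng(42)
n = 300
demo = pd.DataFrame(rng.normal(size=(n, 5)), columns=[f"f{i}" for i in range(5)])
demo["NSP"] = rng.choice([1, 2, 3], size=n, p=[0.7, 0.2, 0.1])

# 1. Split into features X and target y (shifted to 0-based class indices).
X = demo.drop(columns=["NSP"]).to_numpy(dtype="float32")
y = demo["NSP"].to_numpy(dtype="int64") - 1

# 2. Stratified train-test split so that rare classes appear in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 3. Scale the numeric features; fit the scaler on the training data only.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# 4. Small network: one hidden layer and a softmax output with one unit per class.
tf.random.set_seed(0)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(X_train.shape[1],)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# 5. Train, then report accuracy and a confusion matrix on the test set.
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)
loss, acc = model.evaluate(X_test, y_test, verbose=0)
y_pred = model.predict(X_test, verbose=0).argmax(axis=1)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(f"test accuracy: {acc:.3f}")
print(cm)
```

Because the synthetic features are pure noise, the reported accuracy is not meaningful here; the point is only the order of the steps. If you decide to treat a column as categorical, it would additionally need an encoding such as `OneHotEncoder`, for example inside a `ColumnTransformer`.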
