
# IMPLEMENTATION OF THE NEURAL NETWORK

In [None]:
#Corresponding imports
import pandas as pd
import numpy
from sklearn.model_selection import StratifiedKFold
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from keras.models import Sequential
from keras.layers import Dense

## Preparing the data



In [None]:
#We load the data from the database URL
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/dermatology/dermatology.data"
new_names = ['erythema','scaling','definite borders','itching','koebner phenomenon','polygonal papules','follicular papules','oral mucosal involvement','knee and elbow involvement','scalp involvement','family history','melanin incontinence',
'eosinophils in the infiltrate','PNL infiltrate','fibrosis of the papillary dermis','exocytosis','acanthosis','hyperkeratosis','parakeratosis','clubbing of the rete ridges','elongation of the rete ridges','thinning of the suprapapillary epidermis','spongiform pustule','munro microabcess','focal hypergranulosis','disappearance of the granular layer','vacuolisation and damage of basal layer','spongiosis',
'saw-tooth appearance of retes','follicular horn plug','perifollicular parakeratosis','inflammatory monoluclear inflitrate','band-like infiltrate','Age','Class']
dermatologyDB = pd.read_csv(url, names=new_names, skiprows=0, delimiter=',')
dermatologyDB.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 35 columns):
 #   Column                                    Non-Null Count  Dtype 
---  ------                                    --------------  ----- 
 0   erythema                                  366 non-null    int64 
 1   scaling                                   366 non-null    int64 
 2   definite borders                          366 non-null    int64 
 3   itching                                   366 non-null    int64 
 4   koebner phenomenon                        366 non-null    int64 
 5   polygonal papules                         366 non-null    int64 
 6   follicular papules                        366 non-null    int64 
 7   oral mucosal involvement                  366 non-null    int64 
 8   knee and elbow involvement                366 non-null    int64 
 9   scalp involvement                         366 non-null    int64 
 10  family history                            366 non-

In [None]:
#To feed the data to the neural network we must first clear the "?" values from Age, so the following lambda replaces them with 0.
dermatologyDB["Age"]=dermatologyDB["Age"].apply(lambda x: 0 if x == "?" else int(x))

#Here we split the data to work with X (everything except for the class) and Y (only the class of the sample)
X=dermatologyDB.drop(labels='Class', axis=1)
y=dermatologyDB.Class
y

0      2
1      1
2      3
3      1
4      3
      ..
361    4
362    4
363    3
364    3
365    1
Name: Class, Length: 366, dtype: int64
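Replacing "?" with 0 keeps the column numeric, but it also makes a missing age look like a real age of zero. As an alternative, here is a minimal sketch that marks the missing entries as NaN and fills them with the median of the known ages. The `age` Series and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the "Age" column; the values are invented for this example.
age = pd.Series(["55", "8", "?", "26", "40"])

# Convert to numbers; anything that cannot be parsed (the "?") becomes NaN.
age_numeric = pd.to_numeric(age, errors="coerce")

# Fill the missing entries with the median of the known ages.
age_imputed = age_numeric.fillna(age_numeric.median())

print(age_imputed.tolist())
```

Whether the median is a better placeholder than 0 depends on the data, so this is one option rather than a required change.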

## Training and testing

In [None]:
#Because this is an imbalanced problem, the StratifiedKFold cross-validator is the best option.
#This validator generates train/test splits that preserve the percentage of samples of each class.
y.unique()
y2=pd.get_dummies(y)
skf = StratifiedKFold(n_splits=2)
skf.get_n_splits(X, y)


2
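To see what "preserving the percentage of samples for each class" means in practice, the following sketch runs `StratifiedKFold` on a small imbalanced toy label vector and counts the classes in each test fold. The arrays `X_toy` and `y_toy` are assumptions made up for the example and are unrelated to the dermatology data.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy imbalanced labels: 8 samples of class 0 and 4 samples of class 1.
y_toy = np.array([0] * 8 + [1] * 4)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

skf_toy = StratifiedKFold(n_splits=2)

# Count how many samples of each class land in every test fold.
fold_counts = []
for train_idx, test_idx in skf_toy.split(X_toy, y_toy):
    fold_counts.append(np.bincount(y_toy[test_idx]).tolist())

print(fold_counts)
```

Each test fold keeps the 2:1 class ratio of the full toy set, which is the property the cross-validator is chosen for here.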

In [None]:
#Generating the model

#Corresponding imports
from keras.models import Sequential
from keras.layers import Dense
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

accuracy_metrics=[]

for train_index, test_index in skf.split(X, y):

  #Generation of training and testing samples
  X_train, X_test = X.to_numpy()[train_index], X.to_numpy()[test_index]
  y_train, y_test = y2.to_numpy()[train_index], y2.to_numpy()[test_index]

  nb_classes = 6 #Number of unique medical conditions

  #Cast the training arrays to float32 so that they can be converted to tensors
  y_train = numpy.asarray(y_train).astype(numpy.float32)
  X_train = numpy.asarray(X_train).astype(numpy.float32)

  # now we can generate the model
  # create model
  #We create a sequential model with an input dimension of 34 and layers with 17, 10 and 6 neurons
  model = Sequential()
  model.add(Dense(17, input_dim=34, activation='relu'))
  model.add(Dense(10, activation='relu'))
  model.add(Dense(6, activation='sigmoid'))
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

  #train the model
  print("\n-------------------------------------------------------------------------------------------------------\n")
  model.fit(X_train, y_train, epochs=100, batch_size=5)

  # calculating the metrics
  print("\nEvaluate on test data\n")
  results = model.evaluate(X_test, y_test)
  print("test loss, test acc:", results)
  accuracy_metrics.append(results[1])


  #Making predictions

  y_pred = model.predict(X_test)
  predicted_classes=numpy.argmax(y_pred,axis=1)
  Y_test_onecolumn=numpy.argmax(y_test,axis=1)

  #Creating the confusion matrix

  print("\n ----CONFUSION MATRIX----\n",confusion_matrix(Y_test_onecolumn, predicted_classes))

  #Calculating other metrics

  precision=precision_score(Y_test_onecolumn, predicted_classes, average='weighted',zero_division=1)
  accuracy_metrics.append(precision)
  print("\nThe precision of the model is: ",precision)

  recall=recall_score(Y_test_onecolumn, predicted_classes, average='weighted',zero_division=1)
  accuracy_metrics.append(recall)
  print("\nThe recall of the model is: ",recall)

  f1=f1_score(Y_test_onecolumn, predicted_classes, average='weighted',zero_division=1)
  accuracy_metrics.append(f1)
  print("\nThe f1 score of the model is: ",f1)



-------------------------------------------------------------------------------------------------------

Epoch 1/100 through Epoch 69/100 (per-epoch progress lines; the output is cut off after epoch 69)
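The model above ends in a sigmoid layer and is trained with categorical cross-entropy. For mutually exclusive classes, a softmax output is the more common pairing, because its scores form a probability distribution over the classes. The sketch below uses plain NumPy, not Keras, to compare the two activations on made-up scores. It also shows how `argmax` turns a row of scores into a class index, assuming the one-hot columns are ordered by label (1 to 6). The helper functions and the `logits` array are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function; every output is squashed independently.
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the row maximum for numerical stability, then normalise each row.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up raw scores for two samples and six classes.
logits = np.array([[2.0, 0.5, -1.0, 0.0, 0.1, -0.3],
                   [-0.2, 0.0, 3.0, 0.4, -1.5, 0.2]])

sigmoid_out = sigmoid(logits)
softmax_out = softmax(logits)

print(sigmoid_out.sum(axis=1))  # row sums are not constrained to 1
print(softmax_out.sum(axis=1))  # each row sums to 1

# argmax returns a column index in 0..5; adding 1 maps it back to labels 1..6.
predicted_index = np.argmax(softmax_out, axis=1)
predicted_label = predicted_index + 1
print(predicted_label)
```

In Keras the corresponding change would be `Dense(6, activation='softmax')` for the last layer. The notebook's reported results were obtained with the sigmoid layer, so they would need to be recomputed after such a change.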

In [None]:
#Mean of all the metrics stored for every fold (accuracy, precision, recall and f1 score)
numpy.mean(accuracy_metrics)

0.9645314970648131
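The value above is a single mean over accuracy, precision, recall and F1 from both folds, because all of them are appended to the same list. If a separate figure per metric is wanted, one possible approach is to store each fold's scores in a dictionary and average by name. In this sketch the numbers in `fold_scores` are placeholders made up for illustration, not results from the notebook.

```python
import numpy as np

# Placeholder per-fold scores; in the notebook these would come from
# model.evaluate, precision_score, recall_score and f1_score.
fold_scores = [
    {"accuracy": 0.90, "precision": 0.80, "recall": 0.70, "f1": 0.60},
    {"accuracy": 0.80, "precision": 0.70, "recall": 0.60, "f1": 0.50},
]

# Average each metric over the folds, keeping the metrics separate.
mean_scores = {
    name: float(np.mean([fold[name] for fold in fold_scores]))
    for name in fold_scores[0]
}

for name, value in mean_scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting the metrics separately makes it easier to compare them with the per-metric results of the other algorithms.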

# CONCLUSIONS

After implementing the neural network, it is time to compare its results against the ones obtained previously in order to reach some conclusions:

After some tests we can see that the metrics of the neural network vary with the number of layers, the number of neurons per layer, and the number of epochs used. Even when these values are kept constant, the result may change with the behaviour of the neural network, which gives different values on every run.

Depending on the run, the number of layers, and the values chosen for epochs and neurons, the results can be better or worse than the ones obtained in previous tasks.
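Much of the run-to-run variation typically comes from random choices such as the initial weights and the order of the training batches. The sketch below uses NumPy only, not Keras, to show the underlying idea: the same seed reproduces the same random weights, while a different seed does not. The helper `init_weights` and the seed values are hypothetical and not part of the notebook.

```python
import numpy as np

def init_weights(seed, shape=(34, 17)):
    """Draw a random weight matrix, similar in spirit to a layer's random initialisation."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.0, scale=0.1, size=shape)

same_a = init_weights(seed=42)
same_b = init_weights(seed=42)
different = init_weights(seed=7)

print(np.array_equal(same_a, same_b))     # same seed, same weights
print(np.array_equal(same_a, different))  # different seed, different weights
```

Recent Keras versions provide `keras.utils.set_random_seed` for seeding a whole training run. Whether it is available depends on the installed version, so treat that as an assumption to verify.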

If we analyze the neural network from the perspective of transparency, performance and metrics, we can conclude that:
1.   Regarding **metrics**, as said before, the values change with every run. In general the results obtained are similar to the ones obtained before, with the chance of improving them by finding the right number of layers, epochs and neurons.
2.   Regarding **performance**, the neural network is by far the most demanding algorithm. This shows in the time needed to run it, and it also follows from its nature: training is a trial-and-error process over a large amount of data. The other algorithms used before, on the other hand, are far faster, but may not be as scalable or as powerful.
3.   Finally, regarding **transparency**, the neural network is the least transparent of all. We can understand the logical function of the algorithm and the mathematical process behind it, but it is by far the most complicated of the ones we have been using.

It is clearly not the most "simple" or "friendly" algorithm to use or implement, but in the long run it is the most powerful one. If you have the hardware to run it, enough data to train and test it, and the knowledge to implement it and tune its values at will, you have a really powerful tool to work with. Like almost every algorithm it has its faults, but if you have a large amount of data and performance/time is not an issue, this is definitely the algorithm to use. It is the most powerful, and it gets better over time as you give it more data.
(more data = more accurate predictions)
