# Sick or not ?

Neural networks are also used and work very well on more traditional data sets than images. Here is a set of data that contains a column with the white blood cell rate, another with the red blood cell rate and a last one with an indication of who is sick or not. (Not sick = 0, sick = 1)

The (fictionals) dataset is located in the dataset folder, here is the path :  

``"./dataset/sick_or_not.csv"`` 

Rows : 40000

## Your task:

Design a model that recognizes if the person is sick based on white and red blood cells.  
Use neural networks to perform this task. 

![](https://d418bv7mr3wfv.cloudfront.net/s3/W1siZiIsIjIwMTcvMDUvMzAvMDYvNTMvNTcvODk3L2dpcmwtMjE3MTA1Ml85NjBfNzIwLmpwZyJdLFsicCIsInRodW1iIiwiOTgweDU4MCMiXV0)

### Score to beat :
Accuracy: **% 96.025**  
Loss : **0.1151**  
Epochs : **40**  
That means that out of **8000** test samples, this model got **7682** correct and **318** incorrect.

In [1]:
%pip install --upgrade tensorflow

Note: you may need to restart the kernel to use updated packages.


c:\Python312\python.exe: No module named pip


In [None]:
import tensorflow as tf

print(tf.__version__)


2.19.0


In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow import keras
from keras.layers import Dense, Dropout
import matplotlib.pyplot as plt
import random as rd

# %matplotlib inline

In [3]:
# import dataset in variable df

df = pd.read_csv("./dataset/sick_or_not.csv")
df

Unnamed: 0,white_blood_cell,red_blood_cell,sick
0,1.178028,0.464315,0.0
1,0.844175,2.440351,0.0
2,2.878409,-1.438124,1.0
3,-0.057521,2.054928,1.0
4,-1.232600,-2.722805,0.0
...,...,...,...
39995,-2.641717,2.356235,1.0
39996,3.675737,2.956299,0.0
39997,-2.192320,-3.356272,0.0
39998,3.100980,-2.561397,1.0


In [4]:
# describe
df.describe()

Unnamed: 0,white_blood_cell,red_blood_cell,sick
count,40000.0,40000.0,40000.0
mean,-0.004351,-0.007324,0.5
std,2.229326,2.235759,0.500006
min,-5.834724,-5.781232,0.0
25%,-1.995645,-2.013,0.0
50%,0.004485,-0.01454,0.5
75%,1.983753,1.985451,1.0
max,5.541181,5.947922,1.0


In [8]:
# info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40000 entries, 0 to 39999
Data columns (total 3 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   white_blood_cell  40000 non-null  float64
 1   red_blood_cell    40000 non-null  float64
 2   sick              40000 non-null  float64
dtypes: float64(3)
memory usage: 937.6 KB


In [9]:
# Create a mask with df["white_blood_cell"] < df["white_blood_cell"].mean()]
mask = df["white_blood_cell"] < df["white_blood_cell"].mean()
mask

0        False
1        False
2        False
3         True
4         True
         ...  
39995     True
39996    False
39997     True
39998    False
39999     True
Name: white_blood_cell, Length: 40000, dtype: bool

In [10]:
df[mask]

Unnamed: 0,white_blood_cell,red_blood_cell,sick
3,-0.057521,2.054928,1.0
4,-1.232600,-2.722805,0.0
5,-1.866536,-1.052549,0.0
10,-0.264878,-0.516184,1.0
12,-2.461048,-2.301644,0.0
...,...,...,...
39992,-1.130755,-2.727139,0.0
39994,-1.042973,-0.530867,1.0
39995,-2.641717,2.356235,1.0
39997,-2.192320,-3.356272,0.0


In [11]:
#  Create X
X = df.drop(columns=["sick"])
X

Unnamed: 0,white_blood_cell,red_blood_cell
0,1.178028,0.464315
1,0.844175,2.440351
2,2.878409,-1.438124
3,-0.057521,2.054928
4,-1.232600,-2.722805
...,...,...
39995,-2.641717,2.356235
39996,3.675737,2.956299
39997,-2.192320,-3.356272
39998,3.100980,-2.561397


In [12]:
# Create y
y = df["sick"]
y

0        0.0
1        0.0
2        1.0
3        1.0
4        0.0
        ... 
39995    1.0
39996    0.0
39997    0.0
39998    1.0
39999    1.0
Name: sick, Length: 40000, dtype: float64

In [13]:
# Split train and test Set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

In [None]:
# Check the shaping of Training and Testing Features

print(X_train.shape)
print(X_test.shape)

(32000, 2)
(8000, 2)


In [None]:
# Check the shaping of train labels and Testing labels Features
print(y_train.shape)
print(y_test.shape)

(32000,)
(8000,)


In [None]:
# check white and red cell columns n
X_train

Unnamed: 0,white_blood_cell,red_blood_cell
14307,-1.876101,-2.909080
17812,-3.277303,3.445362
11020,0.319613,1.781244
15158,-2.098732,-0.711272
24990,3.539814,-2.830110
...,...,...
6265,3.793211,-0.496116
11284,1.818764,1.396741
38158,2.274097,-0.844513
860,-2.852641,-3.643892


In [16]:
# NO NEED
from sklearn.preprocessing import StandardScaler
# Just Normalize the data


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.fit_transform(X_test)


### Create the Model Architecture
- input layer with 48 neurones
- hidden layer with 36 neurones
- choose correctly the output layer an the correct activation function


### Compilation
- optimizer
- correct loss
- accuracy metric

### Story your epochs in history
- 30 epochs
- batch_size 16



In [None]:
import tensorflow as tf
from tensorflow import keras

# Build a neural network model
model = tf.keras.Sequential()

model.add(Dense(48, input_shape=(2,), activation="relu"))
model.add(Dense(24, activation="softmax"))
model.add(Dropout(0.2))  # on simule 20% de neurones qui ne seront pas utilisés( morts)
model.add(
    Dense(1, activation="sigmoid")
)  # une seule sortie booleen car 2 états pour l'output


opt = keras.optimizers.Adam(learning_rate=0.01)
# Compile the model
model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
# Train the model
history = model.fit(X_train, y_train, epochs=30, validation_split=0.05, batch_size=16)


Epoch 1/30


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m1900/1900[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m10s[0m 4ms/step - accuracy: 0.9114 - loss: 0.2708 - val_accuracy: 0.9469 - val_loss: 0.1297
Epoch 2/30
[1m1900/1900[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 4ms/step - accuracy: 0.9487 - loss: 0.1464 - val_accuracy: 0.9581 - val_loss: 0.1114
Epoch 3/30
[1m1900/1900[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 4ms/step - accuracy: 0.9511 - loss: 0.1320 - val_accuracy: 0.9450 - val_loss: 0.1191
Epoch 4/30
[1m1900/1900[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 5ms/step - accuracy: 0.9526 - loss: 0.1326 - val_accuracy: 0.9538 - val_loss: 0.1129
Epoch 5/30
[1m1900/1900[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 6ms/step - accuracy: 0.9493 - loss: 0.1368 - val_accuracy: 0.9550 - val_loss: 0.1095
Epoch 6/30
[1m1900/1900[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m9s[0m 5ms/step - accuracy: 0.9498 - loss: 0.1306 - val_accuracy: 0.9581 - val_loss: 0.1056
Epoch 7/30
[1m1900/1900

KeyboardInterrupt: 