
# Libraries

In [0]:
import numpy as np
import pandas as pd
import tensorflow as tf

from tensorflow import feature_column
from tensorflow.keras import layers
from sklearn.model_selection import train_test_split

# Make floating-point numbers easier to read
np.set_printoptions(precision=3, suppress=True)

# Dataframe

## Downloading the dataset

In [0]:
TRAIN_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/train.csv"
TEST_DATA_URL = "https://storage.googleapis.com/tf-datasets/titanic/eval.csv"

train_file_path = tf.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
test_file_path = tf.keras.utils.get_file("eval.csv", TEST_DATA_URL)

train_dataframe = pd.read_csv(train_file_path)
test_dataframe = pd.read_csv(test_file_path)

# Merge the two files into a single dataframe
dataframe = pd.concat([train_dataframe, test_dataframe], ignore_index=True)
dataframe = dataframe.replace('unknown', np.nan)
target_label = "survived"

## Overview of the dataset

In [3]:
dataframe.head()

Unnamed: 0,survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,0,male,22.0,1,0,7.25,Third,,Southampton,n
1,1,female,38.0,1,0,71.2833,First,C,Cherbourg,n
2,1,female,26.0,0,0,7.925,Third,,Southampton,y
3,1,female,35.0,1,0,53.1,First,C,Southampton,n
4,0,male,28.0,0,0,8.4583,Third,,Queenstown,y


## Dataset optimization

A neural network operates on numbers, not strings. For this reason, every column whose values are textual identifiers should be converted to integers that represent the corresponding category.

In [0]:
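for column in dataframe.columns:
  # Encode every non-numeric (text) column as integer category codes.
  # Missing values (NaN) receive the code -1.
  if not pd.api.types.is_numeric_dtype(dataframe[column]):
    dataframe[column] = pd.Categorical(dataframe[column]).codes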
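
To illustrate the encoding on a small scale, the sketch below applies `pd.Categorical` to a hypothetical toy column (the name `toy` and its values are made up for illustration and are not part of the Titanic data). Categories are numbered in sorted order, and a missing value is given the code -1, which is why `deck` shows -1 for passengers whose deck is not known.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value (illustration only).
toy = pd.DataFrame({"class": ["Third", "First", "Third", np.nan]})

# Categories are sorted alphabetically: "First" -> 0, "Third" -> 1.
# The missing value has no category, so its code is -1.
codes = pd.Categorical(toy["class"]).codes
print(list(codes))  # prints [1, 0, 1, -1]
```

Because -1 is fed to the network like any other number, it acts here as an implicit "missing" category rather than as a true absence of data.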

## Dataset after optimization

In [5]:
dataframe.head()

Unnamed: 0,survived,sex,age,n_siblings_spouses,parch,fare,class,deck,embark_town,alone
0,0,1,22.0,1,0,7.25,2,-1,2,0
1,1,0,38.0,1,0,71.2833,0,2,0,0
2,1,0,26.0,0,0,7.925,2,-1,2,1
3,1,0,35.0,1,0,53.1,0,2,2,0
4,0,1,28.0,0,0,8.4583,2,-1,1,1


# Neural network modeling

## Preparing the model

In [0]:
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1)
])

model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy'])
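
The last `Dense(1)` layer has no activation, so the model outputs a raw score (a logit) rather than a probability, and `from_logits=True` tells the loss to apply the sigmoid itself. The following is a minimal plain-Python sketch of what that amounts to for a single example. The helper names are hypothetical, and Keras uses a numerically more stable formulation internally, so this is an illustration of the idea and not the library's implementation.

```python
import math

def sigmoid(logit):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))

def binary_crossentropy_from_logit(logit, label):
    """Cross-entropy for one example; label is 0 (died) or 1 (survived)."""
    probability = sigmoid(logit)
    return -(label * math.log(probability)
             + (1 - label) * math.log(1.0 - probability))

# A logit of 0 corresponds to a 50% predicted survival probability.
print(sigmoid(0.0))

# A confident positive logit is cheap when the label is 1 and costly when it is 0.
print(binary_crossentropy_from_logit(3.0, 1))
print(binary_crossentropy_from_logit(3.0, 0))
```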

## Preparing the dataset

In [7]:
train, test = train_test_split(dataframe, test_size=0.1)
train, val = train_test_split(train, test_size=0.1)
print(len(train), 'train examples')
print(len(test), 'test examples')
print(len(val), 'val examples')

720 train examples
90 test examples
81 val examples
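
The three counts add up to 891 rows, and they can be reproduced by hand. The sketch below assumes that `train_test_split` rounds the held-out fraction up and gives the remainder to the training part; the helper `split_sizes` is hypothetical and only mirrors that arithmetic.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_held_out), assuming the held-out part is rounded up."""
    n_held_out = math.ceil(test_size * n_samples)
    return n_samples - n_held_out, n_held_out

total_rows = 720 + 90 + 81  # 891, from the counts printed above

# First split: 10% of all rows is set aside for testing.
n_rest, n_test = split_sizes(total_rows, 0.1)

# Second split: 10% of the remaining rows is set aside for validation.
n_train, n_val = split_sizes(n_rest, 0.1)

print(n_train, n_test, n_val)  # prints 720 90 81
```

Note that the second split takes 10% of the remaining 801 rows, so the validation set is about 9% of the full dataset, not 10%.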


## Training

In [8]:
# Build the TensorFlow training dataset
train_target = train.pop(target_label)
train_dataset = (tf.data.Dataset.from_tensor_slices((train.values, train_target.values))).shuffle(len(train)).batch(1)

# Training phase
model.fit(train_dataset, epochs=500)

Epoch 1/500


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Epoch 2/500
...
Epoch 54/500
Epoch 

<tensorflow.python.keras.callbacks.History at 0x7f1a44e993c8>

## Test

In [9]:
# Build the TensorFlow test dataset
test_target = test.pop(target_label)
test_dataset = (tf.data.Dataset.from_tensor_slices((test.values, test_target.values))).shuffle(len(test)).batch(1)

# Test phase
test_loss, test_accuracy = model.evaluate(test_dataset)
print('\n\nTest Loss {}, Test Accuracy {}'.format(test_loss, test_accuracy))



Test Loss 0.46275702118873596, Test Accuracy 0.800000011920929
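
One way to read the loss value: since it is the mean binary cross-entropy over the test examples, `exp(-loss)` is the geometric mean of the probability the model assigned to the correct outcome. The short sketch below does that arithmetic on the value printed above; the variable names are illustrative.

```python
import math

test_loss = 0.46275702118873596  # value printed by the test cell above

# Mean cross-entropy is the average of -log(p_correct), so exponentiating
# its negative gives the geometric mean of p_correct over the test set.
typical_probability = math.exp(-test_loss)
print(f"{typical_probability:.2f}")  # prints 0.63
```

So although 80% of the test passengers are classified correctly, the model typically places only about 63% probability on the true outcome.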


## Evaluation

In [10]:
# Build the evaluation dataset
val_target = val.pop(target_label)
val_dataset = (tf.data.Dataset.from_tensor_slices(val.values)).batch(1)

# Evaluation phase
predictions = model.predict(val_dataset)

# Show some results
for prediction, survived in zip(predictions[:10], list(val_target)[:10]):
  prediction = tf.sigmoid(prediction).numpy()
  print("Predicted survival: {:.2%}".format(prediction[0]),
        " | Actual outcome: ",
        ("SURVIVED" if bool(survived) else "DIED"))


Predicted survival: 96.14%  | Actual outcome:  SURVIVED
Predicted survival: 43.42%  | Actual outcome:  DIED
Predicted survival: 16.03%  | Actual outcome:  DIED
Predicted survival: 0.15%  | Actual outcome:  DIED
Predicted survival: 18.77%  | Actual outcome:  DIED
Predicted survival: 16.71%  | Actual outcome:  DIED
Predicted survival: 99.99%  | Actual outcome:  SURVIVED
Predicted survival: 10.17%  | Actual outcome:  DIED
Predicted survival: 64.30%  | Actual outcome:  SURVIVED
Predicted survival: 11.62%  | Actual outcome:  DIED
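
To turn these probabilities into yes/no predictions, a common choice is a 0.5 threshold. The sketch below applies that threshold to the ten rows printed above (the values are copied from that output; the 0.5 cut-off and the variable names are assumptions made for illustration). It covers only these ten rows and says nothing about the rest of the validation set.

```python
# (predicted survival probability, actual outcome) for the ten rows shown above;
# 1 means SURVIVED, 0 means DIED.
shown = [
    (0.9614, 1), (0.4342, 0), (0.1603, 0), (0.0015, 0), (0.1877, 0),
    (0.1671, 0), (0.9999, 1), (0.1017, 0), (0.6430, 1), (0.1162, 0),
]

# Predict "survived" when the probability is at least 0.5.
predicted_labels = [int(probability >= 0.5) for probability, _ in shown]

correct = sum(
    predicted == actual
    for predicted, (_, actual) in zip(predicted_labels, shown)
)
print(f"{correct}/{len(shown)} correct")  # prints 10/10 correct
```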
