# Titanic

The goal of this notebook is to predict whether a Titanic passenger survived or not.

![](https://upload.wikimedia.org/wikipedia/commons/9/95/Titanic_sinking%2C_painting_by_Willy_St%C3%B6wer.jpg)

## Load packages

In [1]:
import pandas as pd  # Work with Excel-like tables
import numpy as np   # Numerical and algebraic operations

## Load data

| Variable | Definition                                 | Key                                            |
|----------|--------------------------------------------|------------------------------------------------|
| survival | Survival                                   | 0 = No, 1 = Yes                                |
| pclass   | Ticket class                               | 1 = 1st, 2 = 2nd, 3 = 3rd                      |
| sex      | Sex                                        |                                                |
| Age      | Age in years                               |                                                |
| sibsp    | # of siblings / spouses aboard the Titanic |                                                |
| parch    | # of parents / children aboard the Titanic |                                                |
| ticket   | Ticket number                              |                                                |
| fare     | Passenger fare                             |                                                |
| cabin    | Cabin number                               |                                                |
| embarked | Port of Embarkation                        | C = Cherbourg, Q = Queenstown, S = Southampton |

In [2]:
# By default the share link looks like this: https://drive.google.com/file/d/1kjrZOj2ExEDcR5QGX0CQTgjg5AXRr7nU/view?usp=sharing
# Rewrite it so that it looks like this:
train_data = pd.read_excel('https://drive.google.com/uc?id=1kjrZOj2ExEDcR5QGX0CQTgjg5AXRr7nU')
train_data

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,712833.00,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7925.00,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.10,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.00,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.00,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.45,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.00,C148,C
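
The link rewrite described in the comments of cell `In [2]` can be sketched as a small helper. This is only an illustration: the function name `drive_direct_url` is hypothetical, and it assumes the share link has the `/file/d/<id>/` shape shown in that comment.

```python
import re


def drive_direct_url(share_url: str) -> str:
    """Turn a Google Drive share link into the uc?id= form (hypothetical helper)."""
    # The file id is the path segment right after /file/d/
    match = re.search(r"/file/d/([^/]+)", share_url)
    if match is None:
        raise ValueError(f"No file id found in: {share_url}")
    return f"https://drive.google.com/uc?id={match.group(1)}"


share_url = "https://drive.google.com/file/d/1kjrZOj2ExEDcR5QGX0CQTgjg5AXRr7nU/view?usp=sharing"
direct_url = drive_direct_url(share_url)
print(direct_url)
```

The resulting string is what gets passed to `pd.read_excel` in the cell above.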


## Method 1: Rule-based

In [3]:
# Build a boolean filter that is True for women
filtro_mujeres = (train_data.Sex == 'female')

# Keep only the rows where the passenger is female
train_data[filtro_mujeres]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,712833.00,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7925.00,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.10,C123,S
8,9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27.0,0,2,347742,111333.00,,S
9,10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14.0,1,0,237736,300708.00,,C
...,...,...,...,...,...,...,...,...,...,...,...,...
880,881,1,2,"Shelley, Mrs. William (Imanita Parrish Hall)",female,25.0,0,1,230433,26.00,,S
882,883,0,3,"Dahlberg, Miss. Gerda Ulrika",female,22.0,0,0,7552,105167.00,,S
885,886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39.0,0,5,382652,29125.00,,Q
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.00,B42,S


### Exercise 1:
Build a rule-based model to predict whether a passenger survives or not before the ship sinks.

Hint: use the filter above.


### **Answer**

Let's assume that women are the only ones who survive, and evaluate that model.

In [4]:
tabla_mujeres = train_data[filtro_mujeres]                       # Keep only the women
mujeres_sobrevivientes = tabla_mujeres['Survived'].sum()         # Women who survived = 233
mujeres_totales = len(tabla_mujeres)                             # Total women = 314
sensibilidad_mujeres = mujeres_sobrevivientes / mujeres_totales  # Share of women who survived (the rule's precision, despite the variable name)
sensibilidad_mujeres


0.7420382165605095

In [5]:
from sklearn.metrics import accuracy_score
sobreviviente_pred = filtro_mujeres.astype(int)  # Rule: predict 1 (survives) for women, 0 for men
sobreviviente_real = train_data.Survived         # Actual outcome
pd.DataFrame({'true': sobreviviente_real, 'pred': sobreviviente_pred})  # Compare both side by side

Unnamed: 0,true,pred
0,0,0
1,1,1
2,1,1
3,1,1
4,0,0
...,...,...
886,0,0
887,1,1
888,0,1
889,1,0


In [6]:
accuracy_score(sobreviviente_real, sobreviviente_pred)

0.7867564534231201
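
Accuracy is simply the fraction of rows where the prediction matches the real outcome. A minimal sketch of that arithmetic on ten made-up labels (the values below are illustrative, not taken from the Titanic table):

```python
import pandas as pd

# Ten made-up passengers: real outcome vs. what a rule predicted
real = pd.Series([0, 1, 1, 1, 0, 0, 1, 0, 1, 0])
pred = pd.Series([0, 1, 1, 1, 0, 0, 1, 1, 0, 0])

aciertos = (real == pred)       # True where the prediction was right
accuracy = aciertos.mean()      # Mean of booleans = share of correct rows
print(int(aciertos.sum()), "correct out of", len(real))
print(accuracy)
```

Here 8 of the 10 predictions match, so the accuracy is 0.8. `accuracy_score` with its default settings reports this same fraction.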

### Homework:
Try to improve the previous result by assuming that women or children survive (as the movie says!).

In [None]:
# Write your code here
# Hint:
# Build a filter for the children
# Combine the filters with the element-wise OR operator: https://stackoverflow.com/questions/24775648/element-wise-logical-or-in-pandas

372
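
The hint can be sketched on a small made-up table before applying it to `train_data`. Everything here is an assumption for illustration: the toy rows, the name `filtro_ninos`, and the choice of "younger than 18" as the definition of a child.

```python
import pandas as pd

# Toy table with the same column names as train_data (values are made up)
toy = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male"],
    "Age": [40.0, 30.0, 8.0, 5.0, None],
    "Survived": [0, 1, 1, 1, 0],
})

filtro_mujeres = toy.Sex == "female"
filtro_ninos = toy.Age < 18          # Assumed cut-off; a missing Age compares as False

# Element-wise OR: True if the passenger is a woman OR a child
filtro_mujeres_o_ninos = filtro_mujeres | filtro_ninos

pred = filtro_mujeres_o_ninos.astype(int)
accuracy = (pred == toy.Survived).mean()
print(pred.tolist(), accuracy)
```

Note that passengers with a missing age fall outside the children filter, so for them the rule depends on sex alone.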

### Machine Learning model

In [7]:
from sklearn.ensemble import RandomForestClassifier

y_train = train_data.Survived  # Target: 1 = survived, 0 = did not

features = ['Pclass', 'Sex', 'SibSp', 'Parch']
X_train = pd.get_dummies(train_data[features])  # One-hot encode the text column Sex

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(X_train, y_train)
y_pred = model.predict(X_train)  # Predictions on the same rows used for training

pd.DataFrame({'true': y_train, 'pred': y_pred})

Unnamed: 0,true,pred
0,0,0
1,1,1
2,1,1
3,1,1
4,0,0
...,...,...
886,0,0
887,1,1
888,0,0
889,1,0


In [8]:
from sklearn.metrics import accuracy_score
accuracy_score(y_train, y_pred)

0.8159371492704826

What was done above is not a valid way to evaluate a model: the accuracy was measured on the same rows the model was trained on. The model must instead be tested on data it has never seen. An exam never repeats the exact questions from the practice workshop!
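
One common way to get unseen data with known answers is to hold out part of the labeled table before training. The sketch below is not part of the original notebook: it uses a synthetic table (made-up rows and a made-up survival rule with noise) and scikit-learn's `train_test_split`, with a 25% hold-out chosen arbitrarily.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the passenger table (all values are made up)
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Pclass": rng.integers(1, 4, size=n),
    "Sex": rng.choice(["male", "female"], size=n),
})
# Made-up label: women survive, men do not, flipped at random for about 20% of rows
ruido = rng.random(n) < 0.2
toy["Survived"] = ((toy.Sex == "female") ^ ruido).astype(int)

X = pd.get_dummies(toy[["Pclass", "Sex"]])
y = toy.Survived

# Keep 25% of the rows aside; the model never sees them during fit
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

model_val = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model_val.fit(X_fit, y_fit)

# Accuracy measured only on the held-out rows
val_accuracy = accuracy_score(y_val, model_val.predict(X_val))
print(len(X_fit), len(X_val), val_accuracy)
```

The accuracy on the held-out rows is the number to trust, since those rows played no part in fitting the model.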

In [9]:
test_data = pd.read_excel('https://drive.google.com/uc?id=1V61MGXURd7HrkedW5Z3Jzl2A2kKmEvU6')
test_data

Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,3,"Kelly, Mr. James",male,34.5,0,0,330911,78292.00,,Q
1,893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.00,,S
2,894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,96875.00,,Q
3,895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,86625.00,,S
4,896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,122875.00,,S
...,...,...,...,...,...,...,...,...,...,...,...
413,1305,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.05,,S
414,1306,1,"Oliva y Ocana, Dona. Fermina",female,39.0,0,0,PC 17758,108.90,C105,C
415,1307,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.25,,S
416,1308,3,"Ware, Mr. Frederick",male,,0,0,359309,8.05,,S


Now our goal is to run the model on these 418 passengers to see how well it does.
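
A fitted model expects a new feature table to have the same columns, in the same order, as the table it was trained on. Both tables shown in this notebook contain men and women, so `pd.get_dummies` should produce matching columns here, but that is not guaranteed in general. A minimal sketch with made-up toy tables of how `reindex` can enforce the match (the `*_toy` names are hypothetical):

```python
import pandas as pd

# Toy training and test tables (made up); the test table has no 'male' rows
train_toy = pd.DataFrame({"Pclass": [1, 3, 2], "Sex": ["male", "female", "male"]})
test_toy = pd.DataFrame({"Pclass": [3, 1], "Sex": ["female", "female"]})

X_train_toy = pd.get_dummies(train_toy)   # Columns: Pclass, Sex_female, Sex_male
X_test_toy = pd.get_dummies(test_toy)     # Columns: Pclass, Sex_female (Sex_male is missing)

# Force the test columns to match the training columns; fill missing dummies with 0
X_test_aligned = X_test_toy.reindex(columns=X_train_toy.columns, fill_value=0)
print(list(X_test_aligned.columns))
```

After the `reindex`, the aligned table can be passed to `predict` without a column mismatch.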

In [10]:
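# test_data has no Survived column, so there are no true labels to compare against
# y_train = test_data.Survived

features = ['Pclass', 'Sex', 'SibSp', 'Parch']
X_test = pd.get_dummies(test_data[features])  # Same encoding as for X_train
model.predict(X_test)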

array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0,
       1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1,
       1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1,
       1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
       0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
       1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
       0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1,
       0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0,
       0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0,