# Spaceship Titanic

- Using the [data available on Kaggle](https://www.kaggle.com/competitions/spaceship-titanic)
    - **Competition** dataset
    - Results are evaluated by **accuracy**

### Re-importing the datasets with the processing already applied
- Importing what was done in [Part 3 - Feature Engineering](https://github.com/PedroALage/Projetos/blob/main/Data_Science/Spaceship_Titanic/Parte3_EngRecursos.ipynb)

In [57]:
# Importing pandas
import pandas as pd

In [58]:
# Viewing the training set
treino = pd.read_csv('treino_trat3.csv')
treino.head(3)

| | PassengerId | Age | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | Transported | VIPCheck | CryoSCheck | HomePlanet_Earth | HomePlanet_Europa | HomePlanet_Mars | Destination_55 Cancri e | Destination_PSO J318.5-22 | Destination_TRAPPIST-1e | JovemSono |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0001_01 | 39.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | False | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0002_01 | 24.0 | 1.786885 | 0.096774 | 0.78125 | 7.418919 | 0.758621 | True | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0003_01 | 58.0 | 0.704918 | 38.451613 | 0.0 | 90.743243 | 0.844828 | False | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |


In [59]:
# Viewing the test set
teste = pd.read_csv('teste_trat3.csv')
teste.head(3)

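| | PassengerId | Age | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | VIPCheck | CryoSCheck | HomePlanet_Earth | HomePlanet_Europa | HomePlanet_Mars | Destination_55 Cancri e | Destination_PSO J318.5-22 | Destination_TRAPPIST-1e | JovemSono |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0013_01 | 27.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0018_01 | 19.0 | 0.0 | 0.088235 | 0.0 | 44.809524 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0019_01 | 31.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |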


### Using other models to make the prediction

- Selecting algorithms different from those used in the previous parts
- Considering the [other algorithms available in scikit-learn](https://scikit-learn.org/stable/supervised_learning.html#supervised-learning)
    - **Logistic Regression**
        - https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression
    - **Random Forest**
        - https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier
    - **MLPClassifier (Neural Networks)**
        - https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
- Before using the algorithms, the training set must be split into **training and validation** sets
    - Using **train_test_split**
        - https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
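
With a float `test_size`, scikit-learn sizes the validation set as the ceiling of `test_size * n_samples` and keeps the remaining rows for training. The sketch below checks this on dummy data, assuming the training set still has the 8,693 rows of Kaggle's `train.csv` (an assumption, not something this notebook states). Under that assumption the validation set would have 2,869 rows, which matches the total of each confusion matrix further down.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

# Assumption: number of rows in `treino` (size of Kaggle's train.csv)
n_samples = 8693

# Dummy features and labels, only used to observe the split sizes
X_demo = np.arange(n_samples).reshape(-1, 1)
y_demo = np.arange(n_samples) % 2

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42
)

# Validation size is ceil(0.33 * 8693) = 2869; training gets the rest
print(math.ceil(0.33 * n_samples))  # 2869
print(len(X_tr), len(X_va))  # 5824 2869
```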

In [60]:
# Importing train_test_split
from sklearn.model_selection import train_test_split

In [61]:
# Splitting the training set into X (features) and y (target)
X = treino.drop(['PassengerId', 'Transported'], axis=1)
y = treino.Transported

In [62]:
# Splitting into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.33, random_state=42)

- For **Logistic Regression**

In [63]:
# Importing
from sklearn.linear_model import LogisticRegression

In [64]:
# Creating the classifier
clf_rl = LogisticRegression(random_state=42, max_iter=1000)

In [65]:
# Fitting the classifier to the training data
clf_rl = clf_rl.fit(X_train, y_train)

In [66]:
# Predicting on the validation set
y_pred_rl = clf_rl.predict(X_val)

- For **Random Forest**

In [67]:
# Importing
from sklearn.ensemble import RandomForestClassifier

In [68]:
# Creating the classifier
clf_rf = RandomForestClassifier(random_state=42)

In [69]:
# Fitting the classifier to the training data
clf_rf = clf_rf.fit(X_train, y_train)

In [70]:
# Predicting on the validation set
y_pred_rf = clf_rf.predict(X_val)

- And for **MLPClassifier (Neural Networks)**

In [71]:
# Importing
from sklearn.neural_network import MLPClassifier

In [72]:
# Creating the classifier
clf_mp = MLPClassifier(random_state=42, max_iter=1000)

In [73]:
# Fitting the classifier to the training data
clf_mp = clf_mp.fit(X_train, y_train)

In [74]:
# Predicting on the validation set
y_pred_mp = clf_mp.predict(X_val)

- **Evaluating the models**
    - Accuracy (the evaluation metric used in the competition):
        - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
    - Confusion matrix (helps visualize how the errors are distributed):
        - https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
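
For boolean targets, `confusion_matrix` sorts the labels as `[False, True]`. Rows are the true classes and columns the predicted classes, so `ravel()` returns true negatives, false positives, false negatives and true positives, in that order. A toy example with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels, only to show how the matrix is laid out
y_true = [False, False, True, True, True]
y_pred = [False, True, True, True, False]

cm = confusion_matrix(y_true, y_pred)
print(cm)

# Row = true class, column = predicted class, labels ordered [False, True]
tn, fp, fn, tp = cm.ravel()
print(tn, fp, fn, tp)  # 1 1 1 2
```

Read this way, the Logistic Regression matrix below has 376 false positives (passengers predicted as transported who were not) and 264 false negatives.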

- Evaluating **accuracy**

In [75]:
# Importing
from sklearn.metrics import accuracy_score

In [76]:
# For Logistic Regression
accuracy_score(y_val, y_pred_rl)

0.776925758103869

In [77]:
# For Random Forest
accuracy_score(y_val, y_pred_rf)

0.7765772046009063

In [78]:
# For the MLPClassifier (Neural Networks)
accuracy_score(y_val, y_pred_mp)

0.7807598466364587

- Evaluating the **confusion matrix**

In [79]:
# Importing
from sklearn.metrics import confusion_matrix

In [80]:
# For Logistic Regression
confusion_matrix(y_val, y_pred_rl)

array([[1048,  376],
       [ 264, 1181]], dtype=int64)

In [81]:
# For Random Forest
confusion_matrix(y_val, y_pred_rf)

array([[1084,  340],
       [ 301, 1144]], dtype=int64)

In [82]:
# For the MLPClassifier (Neural Networks)
confusion_matrix(y_val, y_pred_mp)

array([[1049,  375],
       [ 254, 1191]], dtype=int64)
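
Accuracy is the diagonal of the confusion matrix (the correct predictions) divided by the total number of predictions, so the three accuracy values printed earlier can be recovered from these matrices. The sketch copies the matrices shown above:

```python
import numpy as np

# Confusion matrices copied from the outputs above
# (rows = true class [False, True], columns = predicted class [False, True])
matrices = {
    "LogisticRegression": np.array([[1048, 376], [264, 1181]]),
    "RandomForest": np.array([[1084, 340], [301, 1144]]),
    "MLPClassifier": np.array([[1049, 375], [254, 1191]]),
}

for name, cm in matrices.items():
    # Accuracy = correct predictions (diagonal) / all predictions
    acc = np.trace(cm) / cm.sum()
    print(f"{name}: {acc:.4f}")
# LogisticRegression: 0.7769  (2229 / 2869)
# RandomForest: 0.7766  (2228 / 2869)
# MLPClassifier: 0.7808  (2240 / 2869)
```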

### Making predictions on the test data
- We will use the model with the best validation accuracy (the MLPClassifier, 0.7808) to predict on the test set
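
Here the best model is picked by reading the accuracies above. The same choice can be automated by keeping the fitted models in a dictionary and taking the one with the highest validation accuracy. A minimal sketch, using synthetic data from `make_classification` as a stand-in for the real features (so its scores say nothing about this competition):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the real training data (illustration only)
X, y = make_classification(n_samples=600, n_features=15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Same three classifiers and settings used in this notebook
models = {
    "LogisticRegression": LogisticRegression(random_state=42, max_iter=1000),
    "RandomForest": RandomForestClassifier(random_state=42),
    "MLPClassifier": MLPClassifier(random_state=42, max_iter=1000),
}

# Fit each model and record its validation accuracy
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_val, model.predict(X_val))

# Keep the model with the highest validation accuracy
best_name = max(scores, key=scores.get)
best_model = models[best_name]
print(scores)
print("Best model:", best_name)
```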

In [83]:
# Viewing X_train
X_train.head(3)

| | Age | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | VIPCheck | CryoSCheck | HomePlanet_Earth | HomePlanet_Europa | HomePlanet_Mars | Destination_55 Cancri e | Destination_PSO J318.5-22 | Destination_TRAPPIST-1e | JovemSono |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4696 | 35.0 | 21.918033 | 0.526882 | 1.78125 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 5946 | 28.0 | 0.0 | 1.634409 | 6.71875 | 0.405405 | 8.793103 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 227 | 43.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |


In [84]:
# Viewing the test set
teste.head(3)

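| | PassengerId | Age | RoomService | FoodCourt | ShoppingMall | Spa | VRDeck | VIPCheck | CryoSCheck | HomePlanet_Earth | HomePlanet_Europa | HomePlanet_Mars | Destination_55 Cancri e | Destination_PSO J318.5-22 | Destination_TRAPPIST-1e | JovemSono |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0013_01 | 27.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0018_01 | 19.0 | 0.0 | 0.088235 | 0.0 | 44.809524 | 0.0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0019_01 | 31.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |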


In [85]:
# To make the test set match the training features, drop the id column
X_teste = teste.drop('PassengerId', axis=1)
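
After dropping `PassengerId`, the test set should have exactly the columns of `X_train`. Models fitted on a DataFrame remember the column names, and recent scikit-learn versions warn or raise an error if the names or their order differ at `predict` time, so a quick check is cheap insurance. A sketch with hypothetical miniature DataFrames (`X_train_demo` and `teste_demo` are illustrative names, not variables from this notebook):

```python
import pandas as pd

# Hypothetical miniature versions of X_train and teste
X_train_demo = pd.DataFrame(
    {"Age": [35.0, 28.0], "VIPCheck": [0, 0], "JovemSono": [0, 0]}
)
teste_demo = pd.DataFrame(
    {
        "PassengerId": ["0013_01", "0018_01"],
        "JovemSono": [0, 0],
        "Age": [27.0, 19.0],
        "VIPCheck": [0, 0],
    }
)

X_teste_demo = teste_demo.drop("PassengerId", axis=1)
# Same columns, but in a different order than during training
print(list(X_teste_demo.columns) == list(X_train_demo.columns))  # False

# Reorder the test columns to follow the training columns
X_teste_demo = X_teste_demo[X_train_demo.columns]
print(list(X_teste_demo.columns) == list(X_train_demo.columns))  # True
```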

In [86]:
# Applying the best model to the test set
y_pred = clf_mp.predict(X_teste)

In [87]:
# Creating a new column with the predictions in the test set
teste['Transported'] = y_pred

In [88]:
# Selecting only the PassengerId and Transported columns for the submission
base_envio = teste[['PassengerId','Transported']]

In [89]:
# Exporting to a CSV file
base_envio.to_csv('resultados4.csv', index=False)
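
The submission file only needs the `PassengerId` and `Transported` columns, and `index=False` keeps pandas from writing the row index as an extra unnamed column. A sketch that writes a hypothetical three-row submission to an in-memory buffer so the resulting CSV can be inspected (the predictions are made up):

```python
import io

import pandas as pd

# Hypothetical test rows and predictions, for illustration only
teste_demo = pd.DataFrame(
    {"PassengerId": ["0013_01", "0018_01", "0019_01"], "Age": [27.0, 19.0, 31.0]}
)
teste_demo["Transported"] = [True, False, True]

# Keep only the columns the submission needs
submission = teste_demo[["PassengerId", "Transported"]]

# Write to memory instead of disk to inspect the CSV text
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
print(buffer.getvalue())
```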

## Result

- The model achieved 79.33% accuracy on the Kaggle submission
- The neural network model (MLPClassifier) achieved a slightly better accuracy

<img src="pkgImages/tentativa4.png" width=900>