**Developing an Artificial Neural Network application for the Titanic dataset**


This notebook presents a neural network application developed as the final project for the Artificial Neural Networks course (2024/1) at the Universidade Federal de Minas Gerais, taught by Prof. Antônio Pádua Braga.


**Importing the Titanic dataset**

In [434]:
import pandas as pd
titanic_train = pd.read_csv('/kaggle/input/titanic/train.csv')

**Data exploration**
The first five rows of the dataset are printed below to get to know its main features:
1. PassengerId: Unique identifier of each passenger.
1. Survived: Whether the passenger survived (1) or not (0).
1. Pclass: Passenger's ticket class (1 = 1st class, 2 = 2nd class, 3 = 3rd class).
1. Name: Passenger's name.
1. Sex: Passenger's sex.
1. Age: Passenger's age.
1. SibSp: Number of siblings/spouses aboard.
1. Parch: Number of parents/children aboard.
1. Ticket: Ticket number.
1. Fare: Ticket price.
1. Cabin: Cabin number.
1. Embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).



In [435]:
titanic_train.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


**Statistical summary of the data**
> Analysis of the characteristics of each feature, except the individual identifiers PassengerId and Name
* I assume a priori that PassengerId, Name, Ticket and Cabin will not be relevant for building the prediction model;
* PassengerId and Name are only individual identifiers, so I assume they cannot be useful to the model. Their number of unique values is high, and with the techniques I know I cannot extract information from them;
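
The cardinality argument can be checked directly. The following is a minimal sketch, assuming a small hypothetical DataFrame `toy` in place of `titanic_train` (its values are illustrative only). It compares the number of distinct values in each column with the number of rows.

```python
import pandas as pd

# Hypothetical toy data standing in for titanic_train (illustrative values only)
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Name": ["A", "B", "C", "D"],
    "Pclass": [3, 1, 3, 1],
    "Sex": ["male", "female", "female", "male"],
})

# Fraction of rows holding a distinct value: 1.0 means every row is unique
unique_ratio = toy.nunique() / len(toy)
print(unique_ratio.sort_values(ascending=False))
```

A ratio close to 1.0 suggests the column behaves like an identifier, which supports dropping it when no further feature extraction is planned.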

In [436]:
# Why the Cabin feature was dropped
print("Unique values in Cabin:")
print(titanic_train["Cabin"].unique())
print(titanic_train["Cabin"].describe())

Unique values in Cabin:
[nan 'C85' 'C123' 'E46' 'G6' 'C103' 'D56' 'A6' 'C23 C25 C27' 'B78' 'D33'
 'B30' 'C52' 'B28' 'C83' 'F33' 'F G73' 'E31' 'A5' 'D10 D12' 'D26' 'C110'
 'B58 B60' 'E101' 'F E69' 'D47' 'B86' 'F2' 'C2' 'E33' 'B19' 'A7' 'C49'
 'F4' 'A32' 'B4' 'B80' 'A31' 'D36' 'D15' 'C93' 'C78' 'D35' 'C87' 'B77'
 'E67' 'B94' 'C125' 'C99' 'C118' 'D7' 'A19' 'B49' 'D' 'C22 C26' 'C106'
 'C65' 'E36' 'C54' 'B57 B59 B63 B66' 'C7' 'E34' 'C32' 'B18' 'C124' 'C91'
 'E40' 'T' 'C128' 'D37' 'B35' 'E50' 'C82' 'B96 B98' 'E10' 'E44' 'A34'
 'C104' 'C111' 'C92' 'E38' 'D21' 'E12' 'E63' 'A14' 'B37' 'C30' 'D20' 'B79'
 'E25' 'D46' 'B73' 'C95' 'B38' 'B39' 'B22' 'C86' 'C70' 'A16' 'C101' 'C68'
 'A10' 'E68' 'B41' 'A20' 'D19' 'D50' 'D9' 'A23' 'B50' 'A26' 'D48' 'E58'
 'C126' 'B71' 'B51 B53 B55' 'D49' 'B5' 'B20' 'F G63' 'C62 C64' 'E24' 'C90'
 'C45' 'E8' 'B101' 'D45' 'C46' 'D30' 'E121' 'D11' 'E77' 'F38' 'B3' 'D6'
 'B82 B84' 'D17' 'A36' 'B102' 'B69' 'E49' 'C47' 'D28' 'E17' 'A24' 'C50'
 'B42' 'C148']
count         204


In [437]:
# Why the Ticket feature was dropped
print("Unique values in Ticket:")
print(titanic_train["Ticket"].unique())
print(titanic_train["Ticket"].describe())

Unique values in Ticket:
['A/5 21171' 'PC 17599' 'STON/O2. 3101282' '113803' '373450' '330877'
 '17463' '349909' '347742' '237736' 'PP 9549' '113783' 'A/5. 2151'
 '347082' '350406' '248706' '382652' '244373' '345763' '2649' '239865'
 '248698' '330923' '113788' '347077' '2631' '19950' '330959' '349216'
 'PC 17601' 'PC 17569' '335677' 'C.A. 24579' 'PC 17604' '113789' '2677'
 'A./5. 2152' '345764' '2651' '7546' '11668' '349253' 'SC/Paris 2123'
 '330958' 'S.C./A.4. 23567' '370371' '14311' '2662' '349237' '3101295'
 'A/4. 39886' 'PC 17572' '2926' '113509' '19947' 'C.A. 31026' '2697'
 'C.A. 34651' 'CA 2144' '2669' '113572' '36973' '347088' 'PC 17605' '2661'
 'C.A. 29395' 'S.P. 3464' '3101281' '315151' 'C.A. 33111' 'S.O.C. 14879'
 '2680' '1601' '348123' '349208' '374746' '248738' '364516' '345767'
 '345779' '330932' '113059' 'SO/C 14885' '3101278' 'W./C. 6608'
 'SOTON/OQ 392086' '343275' '343276' '347466' 'W.E.P. 5734' 'C.A. 2315'
 '364500' '374910' 'PC 17754' 'PC 17759' '231919' '244367' '3

In [438]:
summary=titanic_train.describe()
print(summary)

       PassengerId    Survived      Pclass         Age       SibSp  \
count   891.000000  891.000000  891.000000  714.000000  891.000000   
mean    446.000000    0.383838    2.308642   29.699118    0.523008   
std     257.353842    0.486592    0.836071   14.526497    1.102743   
min       1.000000    0.000000    1.000000    0.420000    0.000000   
25%     223.500000    0.000000    2.000000   20.125000    0.000000   
50%     446.000000    0.000000    3.000000   28.000000    0.000000   
75%     668.500000    1.000000    3.000000   38.000000    1.000000   
max     891.000000    1.000000    3.000000   80.000000    8.000000   

            Parch        Fare  
count  891.000000  891.000000  
mean     0.381594   32.204208  
std      0.806057   49.693429  
min      0.000000    0.000000  
25%      0.000000    7.910400  
50%      0.000000   14.454200  
75%      0.000000   31.000000  
max      6.000000  512.329200  


**Building the dataset after the first feature removal**
> We go from 12 columns to 8 (the target Survived plus 7 features);

In [439]:
titanic_train=titanic_train[["Survived","Pclass","Sex","Age","SibSp","Parch","Fare","Embarked"]]
titanic_train.head()

Unnamed: 0,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,0,3,male,22.0,1,0,7.25,S
1,1,1,female,38.0,1,0,71.2833,C
2,1,3,female,26.0,0,0,7.925,S
3,1,1,female,35.0,1,0,53.1,S
4,0,3,male,35.0,0,0,8.05,S


**Checking for missing values**

In [440]:
# Identify whether any feature has missing data
missing_data_columns = titanic_train.isnull().sum()
any_missing_columns = missing_data_columns[missing_data_columns > 0]

print("Columns with missing data:")
print(any_missing_columns)

Columns with missing data:
Age         177
Embarked      2
dtype: int64


> The Age and Embarked features have missing data. Missing values are not handled by removing rows. Instead, a missing age is replaced by the mean age of the passenger's ticket class (Pclass), and a missing port of embarkation is replaced by the most frequent port within the passenger's class.
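
The strategy described above can also be written without handling each class separately. This is a minimal sketch, assuming a hypothetical toy DataFrame `toy` instead of the real data; it relies on `groupby(...).transform`, which returns a value for every row aligned with the original index.

```python
import pandas as pd

# Hypothetical toy data (illustrative values only)
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 3, 3, 3],
    "Age":      [40.0, 30.0, None, 20.0, None, 30.0],
    "Embarked": ["S", "S", None, "Q", "Q", None],
})

# Replace each missing age with the mean age of the passenger's class
class_mean_age = toy.groupby("Pclass")["Age"].transform("mean")
toy["Age"] = toy["Age"].fillna(class_mean_age)

# Replace each missing port with the most frequent port of the passenger's class
class_mode_port = toy.groupby("Pclass")["Embarked"].transform(lambda s: s.mode().iloc[0])
toy["Embarked"] = toy["Embarked"].fillna(class_mode_port)

print(toy)
```

Because `fillna` only touches missing entries, rows that already have a value are left unchanged.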

In [441]:
# Mean age of each class
meanAge=titanic_train.groupby('Pclass')['Age'].mean()
print(meanAge)

Pclass
1    38.233441
2    29.877630
3    25.140620
Name: Age, dtype: float64


In [442]:
# Get the indices of the rows with a missing age
missing_age_indices = titanic_train[titanic_train['Age'].isnull()].index.tolist()
print(missing_age_indices)


[5, 17, 19, 26, 28, 29, 31, 32, 36, 42, 45, 46, 47, 48, 55, 64, 65, 76, 77, 82, 87, 95, 101, 107, 109, 121, 126, 128, 140, 154, 158, 159, 166, 168, 176, 180, 181, 185, 186, 196, 198, 201, 214, 223, 229, 235, 240, 241, 250, 256, 260, 264, 270, 274, 277, 284, 295, 298, 300, 301, 303, 304, 306, 324, 330, 334, 335, 347, 351, 354, 358, 359, 364, 367, 368, 375, 384, 388, 409, 410, 411, 413, 415, 420, 425, 428, 431, 444, 451, 454, 457, 459, 464, 466, 468, 470, 475, 481, 485, 490, 495, 497, 502, 507, 511, 517, 522, 524, 527, 531, 533, 538, 547, 552, 557, 560, 563, 564, 568, 573, 578, 584, 589, 593, 596, 598, 601, 602, 611, 612, 613, 629, 633, 639, 643, 648, 650, 653, 656, 667, 669, 674, 680, 692, 697, 709, 711, 718, 727, 732, 738, 739, 740, 760, 766, 768, 773, 776, 778, 783, 790, 792, 793, 815, 825, 826, 828, 832, 837, 839, 846, 849, 859, 863, 868, 878, 888]


In [443]:
# Replace the missing ages with the mean age of each class
# First class
indexAgeFirstClass=titanic_train.loc[missing_age_indices][titanic_train.loc[missing_age_indices]['Pclass']==1].index
print(indexAgeFirstClass)
titanic_train.loc[indexAgeFirstClass,'Age']=meanAge[1]
print(titanic_train.loc[indexAgeFirstClass]['Age'])
# Second class
indexAgeSecondClass=titanic_train.loc[missing_age_indices][titanic_train.loc[missing_age_indices]['Pclass']==2].index
print(indexAgeSecondClass)
titanic_train.loc[indexAgeSecondClass,'Age']=meanAge[2]
print(titanic_train.loc[indexAgeSecondClass]['Age'])
# Third class
indexAgeThirdlass=titanic_train.loc[missing_age_indices][titanic_train.loc[missing_age_indices]['Pclass']==3].index
print(indexAgeThirdlass)
titanic_train.loc[indexAgeThirdlass,'Age']=meanAge[3]
print(titanic_train.loc[indexAgeThirdlass]['Age'])


Index([ 31,  55,  64, 166, 168, 185, 256, 270, 284, 295, 298, 306, 334, 351,
       375, 457, 475, 507, 527, 557, 602, 633, 669, 711, 740, 766, 793, 815,
       839, 849],
      dtype='int64')
31     38.233441
55     38.233441
64     38.233441
166    38.233441
168    38.233441
185    38.233441
256    38.233441
270    38.233441
284    38.233441
295    38.233441
298    38.233441
306    38.233441
334    38.233441
351    38.233441
375    38.233441
457    38.233441
475    38.233441
507    38.233441
527    38.233441
557    38.233441
602    38.233441
633    38.233441
669    38.233441
711    38.233441
740    38.233441
766    38.233441
793    38.233441
815    38.233441
839    38.233441
849    38.233441
Name: Age, dtype: float64
Index([17, 181, 277, 303, 413, 466, 481, 547, 596, 674, 732], dtype='int64')
17     29.87763
181    29.87763
277    29.87763
303    29.87763
413    29.87763
466    29.87763
481    29.87763
547    29.87763
596    29.87763
674    29.87763
732    29.87763
Name: Age, dtype: 

In [444]:
# Get the indices of the rows with a missing port of embarkation
missing_Embarked_indices = titanic_train[titanic_train['Embarked'].isnull()].index.tolist()
print(missing_Embarked_indices)

[61, 829]


In [445]:
# Group by class and port of embarkation, counting the number of occurrences
embarques_por_classe_cidade = titanic_train.groupby(['Pclass', 'Embarked']).size().reset_index(name='Contagem')

print(embarques_por_classe_cidade)

# Find the row index with the highest embarkation count for each class
indices_max_embarques = embarques_por_classe_cidade.groupby('Pclass')['Contagem'].idxmax()

# Select those rows from the grouped counts
cidades_mais_embarques = embarques_por_classe_cidade.loc[indices_max_embarques].reset_index(drop=True)
#cidades_mais_embarques=cidades_mais_embarques.to_frame()
print("Port with the most embarkations per class:")
print(cidades_mais_embarques)

   Pclass Embarked  Contagem
0       1        C        85
1       1        Q         2
2       1        S       127
3       2        C        17
4       2        Q         3
5       2        S       164
6       3        C        66
7       3        Q        72
8       3        S       353
Port with the most embarkations per class:
   Pclass Embarked  Contagem
0       1        S       127
1       2        S       164
2       3        S       353


In [446]:
# Replacing the missing ports
# Only first-class passengers have missing data
indexCityFirstClass=titanic_train.loc[missing_Embarked_indices][titanic_train.loc[missing_Embarked_indices]['Pclass']==1].index
print(indexCityFirstClass)
indexCitySecondClass=titanic_train.loc[missing_Embarked_indices][titanic_train.loc[missing_Embarked_indices]['Pclass']==2].index
print(indexCitySecondClass)
indexCityThirdClass=titanic_train.loc[missing_Embarked_indices][titanic_train.loc[missing_Embarked_indices]['Pclass']==3].index
print(indexCityThirdClass)
# Fill with "S", the most frequent port in first class

titanic_train.loc[indexCityFirstClass,'Embarked']="S"
print(titanic_train.loc[indexCityFirstClass]['Embarked'])

Index([61, 829], dtype='int64')
Index([], dtype='int64')
Index([], dtype='int64')
61     S
829    S
Name: Embarked, dtype: object


**Replacing categorical data with numeric values**
> The sex and port-of-embarkation columns need to be encoded for the model


In [450]:
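from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# Encode each categorical column as integers. LabelEncoder sorts the labels,
# so Embarked becomes C=0, Q=1, S=2 and Sex becomes female=0, male=1
titanic_train["Embarked"]=le.fit_transform(titanic_train["Embarked"])
titanic_train["Sex"]=le.fit_transform(titanic_train["Sex"])
# Show the encoded dataset
print(titanic_train)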


     Survived  Pclass  Sex       Age  SibSp  Parch     Fare  Embarked
0           0       3    1  22.00000      1      0   7.2500         2
1           1       1    0  38.00000      1      0  71.2833         0
2           1       3    0  26.00000      0      0   7.9250         2
3           1       1    0  35.00000      1      0  53.1000         2
4           0       3    1  35.00000      0      0   8.0500         2
..        ...     ...  ...       ...    ...    ...      ...       ...
886         0       2    1  27.00000      0      0  13.0000         2
887         1       1    0  19.00000      0      0  30.0000         2
888         0       3    0  25.14062      1      2  23.4500         2
889         1       1    1  26.00000      0      0  30.0000         0
890         0       3    1  32.00000      0      0   7.7500         1

[891 rows x 8 columns]
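
Since the same codes must be used for the training and test sets, one option is to keep the mapping explicit. This is a minimal sketch, assuming a hypothetical helper `encode_categoricals` and toy frames `toy_train` and `toy_test`. The codes are the ones visible in the output above, where LabelEncoder sorted the labels alphabetically.

```python
import pandas as pd

# Codes matching the encoded output above (labels sorted alphabetically)
SEX_CODES = {"female": 0, "male": 1}
EMBARKED_CODES = {"C": 0, "Q": 1, "S": 2}

def encode_categoricals(df):
    """Hypothetical helper: return a copy of df with Sex and Embarked as integer codes."""
    out = df.copy()
    out["Sex"] = out["Sex"].map(SEX_CODES)
    out["Embarked"] = out["Embarked"].map(EMBARKED_CODES)
    return out

# Toy frames standing in for the training and test sets (illustrative values only)
toy_train = pd.DataFrame({"Sex": ["male", "female"], "Embarked": ["S", "C"]})
toy_test = pd.DataFrame({"Sex": ["female", "male"], "Embarked": ["Q", "Q"]})

print(encode_categoricals(toy_train))
print(encode_categoricals(toy_test))
```

In `toy_test` only the port Q appears. Calling `fit_transform` again on that frame would assign Q the code 0, whereas the fixed dictionary keeps Q=1 in both sets.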


**Importing the test data**

In [451]:
test=pd.read_csv('/kaggle/input/titanic/test.csv')
test.head()
#test=test[["Pclass","Sex","Age","SibSp","Parch","Fare","Embarked"]]

Unnamed: 0,PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q
1,893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47.0,1,0,363272,7.0,,S
2,894,2,"Myles, Mr. Thomas Francis",male,62.0,0,0,240276,9.6875,,Q
3,895,3,"Wirz, Mr. Albert",male,27.0,0,0,315154,8.6625,,S
4,896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22.0,1,1,3101298,12.2875,,S


In [419]:
# Identify whether any feature of the test set has missing data
missing_data_columns = test.isnull().sum()
any_missing_columns = missing_data_columns[missing_data_columns > 0]

print("Columns with missing data:")
print(any_missing_columns)

Columns with missing data:
Age       86
Fare       1
Cabin    327
dtype: int64


In [452]:
# Mean age of each class
meanAgeTest=test.groupby('Pclass')['Age'].mean()
print(meanAgeTest)

Pclass
1    40.918367
2    28.777500
3    24.027945
Name: Age, dtype: float64


In [453]:
# Get the indices of the rows with a missing age
missing_age_indices = test[test['Age'].isnull()].index.tolist()
print(missing_age_indices)


[10, 22, 29, 33, 36, 39, 41, 47, 54, 58, 65, 76, 83, 84, 85, 88, 91, 93, 102, 107, 108, 111, 116, 121, 124, 127, 132, 133, 146, 148, 151, 160, 163, 168, 170, 173, 183, 188, 191, 199, 200, 205, 211, 216, 219, 225, 227, 233, 243, 244, 249, 255, 256, 265, 266, 267, 268, 271, 273, 274, 282, 286, 288, 289, 290, 292, 297, 301, 304, 312, 332, 339, 342, 344, 357, 358, 365, 366, 380, 382, 384, 408, 410, 413, 416, 417]


In [454]:
# Replace the missing ages with the mean age of each class
# First class
indexAgeFirstClass=test.loc[missing_age_indices][test.loc[missing_age_indices]['Pclass']==1].index
print(indexAgeFirstClass)
test.loc[indexAgeFirstClass,'Age']=meanAgeTest[1]
print("First class")
print(test.loc[indexAgeFirstClass]['Age'])
# Second class
indexAgeSecondClass=test.loc[missing_age_indices][test.loc[missing_age_indices]['Pclass']==2].index
print(indexAgeSecondClass)
print("Second class")
test.loc[indexAgeSecondClass,'Age']=meanAgeTest[2]
print(test.loc[indexAgeSecondClass]['Age'])
# Third class
indexAgeThirdlass=test.loc[missing_age_indices][test.loc[missing_age_indices]['Pclass']==3].index
print(indexAgeThirdlass)
test.loc[indexAgeThirdlass,'Age']=meanAgeTest[3]
print("Third class")
print(test.loc[indexAgeThirdlass]['Age'])


Index([22, 41, 146, 148, 168, 191, 205, 266, 290], dtype='int64')
First class
22     40.918367
41     40.918367
146    40.918367
148    40.918367
168    40.918367
191    40.918367
205    40.918367
266    40.918367
290    40.918367
Name: Age, dtype: float64
Index([54, 65, 84, 301, 384], dtype='int64')
Second class
54     28.7775
65     28.7775
84     28.7775
301    28.7775
384    28.7775
Name: Age, dtype: float64
Index([ 10,  29,  33,  36,  39,  47,  58,  76,  83,  85,  88,  91,  93, 102,
       107, 108, 111, 116, 121, 124, 127, 132, 133, 151, 160, 163, 170, 173,
       183, 188, 199, 200, 211, 216, 219, 225, 227, 233, 243, 244, 249, 255,
       256, 265, 267, 268, 271, 273, 274, 282, 286, 288, 289, 292, 297, 304,
       312, 332, 339, 342, 344, 357, 358, 365, 366, 380, 382, 408, 410, 413,
       416, 417],
      dtype='int64')
Third class
10     24.027945
29     24.027945
33     24.027945
36     24.027945
39     24.027945
         ...    
408    24.027945
410    24.027945
41

In [455]:
# Get the indices of the rows with a missing ticket price (Fare)
missing_Ticket_indices = test[test['Fare'].isnull()].index.tolist()
print(missing_Ticket_indices)

[152]


In [457]:
# Mean fare of each class
meanTicketTest=test.groupby('Pclass')['Fare'].mean()
print(meanTicketTest)

Pclass
1    94.280297
2    22.202104
3    12.459678
Name: Fare, dtype: float64


In [458]:
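# Replace the missing fare
# The only row with a missing Fare is index 152, a third-class passenger
indexTicketFirstClass=test.loc[missing_Ticket_indices][test.loc[missing_Ticket_indices]['Pclass']==1].index
print(missing_Ticket_indices)
# Note: this assigns meanAgeTest[1] (the first-class mean age, 40.918367), not
# meanTicketTest[3] (the third-class mean fare, 12.459678)
test.loc[missing_Ticket_indices,'Fare']=meanAgeTest[1]
print(test.loc[missing_Ticket_indices])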

[152]
     PassengerId  Pclass                Name   Sex   Age  SibSp  Parch Ticket  \
152         1044       3  Storey, Mr. Thomas  male  60.5      0      0   3701   

          Fare Cabin Embarked  
152  40.918367   NaN        S  
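
A class-aware fill for Fare, mirroring the per-class approach used for Age, can be sketched as follows. This is a minimal sketch on a hypothetical toy frame `toy_test` (illustrative values only). It uses `Series.map` to look up, for every row, the mean fare of that row's own class.

```python
import pandas as pd

# Hypothetical toy data standing in for the test set (illustrative values only)
toy_test = pd.DataFrame({
    "Pclass": [1, 3, 3, 3],
    "Fare":   [90.0, 8.0, 12.0, None],
})

# Mean fare of each class, indexed by Pclass
class_mean_fare = toy_test.groupby("Pclass")["Fare"].mean()

# For every row, the mean fare of that row's class
fare_by_row = toy_test["Pclass"].map(class_mean_fare)

# Only the missing entries are filled
toy_test["Fare"] = toy_test["Fare"].fillna(fare_by_row)
print(toy_test)
```

With this form the class is taken from the row itself, so the fill value cannot come from another class or another column.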


In [459]:
test=test[["PassengerId","Pclass","Sex","Age","SibSp","Parch","Fare","Embarked"]]
# Identify whether any feature of the test set still has missing data
missing_data_columns = test.isnull().sum()
any_missing_columns = missing_data_columns[missing_data_columns > 0]

print("Columns with missing data:")
print(any_missing_columns)


Columns with missing data:
Series([], dtype: int64)


In [460]:
print(test)

     PassengerId  Pclass     Sex        Age  SibSp  Parch      Fare Embarked
0            892       3    male  34.500000      0      0    7.8292        Q
1            893       3  female  47.000000      1      0    7.0000        S
2            894       2    male  62.000000      0      0    9.6875        Q
3            895       3    male  27.000000      0      0    8.6625        S
4            896       3  female  22.000000      1      1   12.2875        S
..           ...     ...     ...        ...    ...    ...       ...      ...
413         1305       3    male  24.027945      0      0    8.0500        S
414         1306       1  female  39.000000      0      0  108.9000        C
415         1307       3    male  38.500000      0      0    7.2500        S
416         1308       3    male  24.027945      0      0    8.0500        S
417         1309       3    male  24.027945      1      1   22.3583        C

[418 rows x 8 columns]


In [461]:
#test["Embarked"]=le.fit_transform(test["Embarked"])
test["Sex"]=le.fit_transform(test["Sex"])


In [462]:
print(test)

     PassengerId  Pclass  Sex        Age  SibSp  Parch      Fare Embarked
0            892       3    1  34.500000      0      0    7.8292        Q
1            893       3    0  47.000000      1      0    7.0000        S
2            894       2    1  62.000000      0      0    9.6875        Q
3            895       3    1  27.000000      0      0    8.6625        S
4            896       3    0  22.000000      1      1   12.2875        S
..           ...     ...  ...        ...    ...    ...       ...      ...
413         1305       3    1  24.027945      0      0    8.0500        S
414         1306       1    0  39.000000      0      0  108.9000        C
415         1307       3    1  38.500000      0      0    7.2500        S
416         1308       3    1  24.027945      0      0    8.0500        S
417         1309       3    1  24.027945      1      1   22.3583        C

[418 rows x 8 columns]


In [473]:
# Encode Embarked by hand with the same codes as in the training set (C=0, Q=1, S=2)
indexS=test[test["Embarked"]=="S"].index
test.loc[indexS,"Embarked"]=2
indexC=test[test["Embarked"]=="C"].index
test.loc[indexC,"Embarked"]=0
indexQ=test[test["Embarked"]=="Q"].index
test.loc[indexQ,"Embarked"]=1

In [474]:
print(test)

     PassengerId  Pclass  Sex        Age  SibSp  Parch      Fare Embarked
0            892       3    1  34.500000      0      0    7.8292        1
1            893       3    0  47.000000      1      0    7.0000        2
2            894       2    1  62.000000      0      0    9.6875        1
3            895       3    1  27.000000      0      0    8.6625        2
4            896       3    0  22.000000      1      1   12.2875        2
..           ...     ...  ...        ...    ...    ...       ...      ...
413         1305       3    1  24.027945      0      0    8.0500        2
414         1306       1    0  39.000000      0      0  108.9000        0
415         1307       3    1  38.500000      0      0    7.2500        2
416         1308       3    1  24.027945      0      0    8.0500        2
417         1309       3    1  24.027945      1      1   22.3583        0

[418 rows x 8 columns]


**Starting with the basics: Perceptron**

In [475]:
from sklearn.linear_model import Perceptron
Xtrain=titanic_train[["Pclass","Sex","Age","SibSp","Parch","Fare","Embarked"]].to_numpy()
print(Xtrain.shape)
Xteste=test[["Pclass","Sex","Age","SibSp","Parch","Fare","Embarked"]].to_numpy()
print(Xteste.shape)

Ytrain=titanic_train[["Survived"]].to_numpy()
Ytrain=Ytrain.ravel()
print(Ytrain.shape)

(891, 7)
(418, 7)
(891,)


In [511]:
# Perceptron model instance
#perceptron=Perceptron(penalty='l1',alpha=0.0001)
#perceptron.fit(Xtrain, Ytrain)

In [512]:
#yresult=perceptron.predict(Xteste)
#print(yresult)


[0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0 1 1 1 0 1 0 1 1 0 1 0 1 1 0 0
 0 0 1 0 1 0 0 1 0 1 0 1 0 1 0 1 1 0 1 0 0 0 1 0 0 0 0 1 1 0 1 1 1 0 0 0 1
 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0 1 0 0 0 1 0
 0 1 0 1 1 0 1 1 1 1 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 1 1 1 1 0 1 0
 1 1 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 1 1 0 1
 0 1 0 1 0 1 1 1 0 1 0 1 0 0 0 1 1 1 1 0 1 0 0 1 0 1 0 1 0 0 1 0 1 1 0 0 0
 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 0 1 1 0 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1
 0 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 0 0 1 0 1 1 0 0 1 0 0 1 0 0 1 0 0
 1 1 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 1 0 1 1 1 0 1 1 0
 1 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 1 1 0 1 1 1 0 1 0 0 0 1 1 1 0 1 1 0 1 1 0
 0 1 0 0 1 1 0 0 1 1 0 0 1 1 0 1 0 0 0 1 1 1 1 0 0 1 0 1 0 0 1 0 1 1 1 1 0
 1 0 1 0 1 0 0 1 0 0 1]


In [539]:
from sklearn.neural_network import MLPClassifier
MLP=MLPClassifier(hidden_layer_sizes=(800,400), max_iter=1000, alpha=0.0001, activation='tanh',solver='sgd')
MLP.fit(Xtrain, Ytrain)

In [540]:
yresult=MLP.predict(Xteste)
print(yresult)

[0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 0 0
 1 0 0 0 0 0 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 1 0 0 1
 1 0 0 1 0 1 0 1 0 0 0 0 0 1 1 1 0 0 1 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
 1 1 1 1 0 0 1 1 1 1 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 1 0 0 0 0 0 1 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 1 1 0 1 1 0 1
 0 1 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 1 1 0 1 0
 1 0 1 1 0 1 0 0 1 1 0 0 1 0 0 0 1 1 1 0 1 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 1
 0 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0
 1 0 1 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0
 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 0 1 0 0 1 1 0
 0 1 0 0 1 1 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 1 0 0 0
 0 1 1 1 1 0 0 1 0 0 0]


In [541]:
yResultado=pd.DataFrame(yresult.reshape(-1, 1),columns=["Survived"])
print(yResultado)
result=  pd.concat([test["PassengerId"], yResultado], axis=1)
print(result)

     Survived
0           0
1           0
2           0
3           0
4           0
..        ...
413         0
414         1
415         0
416         0
417         0

[418 rows x 1 columns]
     PassengerId  Survived
0            892         0
1            893         0
2            894         0
3            895         0
4            896         0
..           ...       ...
413         1305         0
414         1306         1
415         1307         0
416         1308         0
417         1309         0

[418 rows x 2 columns]


In [542]:
# Saving the CSV
result.to_csv('submission.csv',index=False, encoding='utf-8')

In [None]:
# First attempt with the Perceptron: raw algorithm -> Accuracy: 0.71770
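
An accuracy estimate can also be obtained locally from the training data with cross-validation. This is a minimal sketch, assuming synthetic data from `make_classification` in place of `Xtrain` and `Ytrain`. It also adds feature standardization, which the notebook does not apply, so it is a variation and not a reproduction of the run above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Xtrain / Ytrain: 7 features and a binary target
X, y = make_classification(n_samples=200, n_features=7, random_state=0)

# Scaling lives inside the pipeline, so each fold is standardized
# using only its own training portion
model = make_pipeline(StandardScaler(), Perceptron(random_state=0))

# 5-fold cross-validated accuracy
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```

Replacing `X` and `y` with `Xtrain` and `Ytrain` would give a per-fold accuracy for the Titanic features. No value is quoted here because it was not measured in the notebook.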
