# Variable Notes  

## pclass  
**A proxy for socio-economic status (SES):**  
- **1st** = Upper  
- **2nd** = Middle  
- **3rd** = Lower  

## age  
- Age is fractional if less than 1.  
- If the age is estimated, it is in the form of `xx.5`.  
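
The two age conventions above can be checked directly on the column. A minimal sketch, assuming a small hand-made sample instead of the real file; the names `is_infant` and `is_estimated` are hypothetical helpers, not part of the dataset.

```python
import pandas as pd

# Hand-made sample ages (not taken from the dataset); None becomes NaN
ages = pd.Series([0.42, 22.0, 28.5, None, 40.5], name="Age")

# Infants: an age below 1 is stored as a fraction of a year
is_infant = ages < 1

# Estimated ages: 1 or older with a fractional part of exactly .5
is_estimated = (ages >= 1) & (ages % 1 == 0.5)

print(is_infant.tolist())
print(is_estimated.tolist())
```

Comparisons against a missing age evaluate to `False`, so rows without an age are flagged by neither mask.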

## sibsp  
The dataset defines family relations in this way:  
- **Sibling** = brother, sister, stepbrother, stepsister  
- **Spouse** = husband, wife *(mistresses and fiancés were ignored)*  

## parch  
The dataset defines family relations in this way:  
- **Parent** = mother, father  
- **Child** = daughter, son, stepdaughter, stepson  
- Some children traveled only with a nanny, so `parch=0` for them.  
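
Because `sibsp` and `parch` both count relatives aboard, they are often combined into a single family-size feature. A minimal sketch on hand-made rows; `FamilySize` and `IsAlone` are hypothetical column names chosen for illustration.

```python
import pandas as pd

# Hand-made sample rows (not taken from the dataset)
sample = pd.DataFrame({"SibSp": [1, 0, 3], "Parch": [0, 0, 2]})

# Relatives aboard plus the passenger themselves
sample["FamilySize"] = sample["SibSp"] + sample["Parch"] + 1

# True when no sibling, spouse, parent, or child is aboard
sample["IsAlone"] = sample["FamilySize"] == 1

print(sample)
```

Under this definition a child traveling only with a nanny would also be marked as alone, since `parch=0` for them.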


In [51]:
import pandas as pd
df = pd.read_csv('train.csv')
print(df.columns)
print(df.head(5))

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0

In [None]:
row = df.shape[0]  # number of rows
columns = df.shape[1]  # number of columns

# ALTERNATIVE METHOD
# print(len(df))
# print(len(df.columns))

print(f'Rows: {row} | Columns: {columns}')

Rows: 891 | Columns: 12


In [None]:
media_idade = df['Age'].mean()  # mean age; missing values are skipped
formatada = "{:.2f}".format(media_idade)
print(f'The average passenger age was: {formatada}')

The average passenger age was: 29.70
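
`Series.mean()` skips missing values by default, so the average above covers only passengers whose age is known. A minimal sketch on a hand-made series showing how to report the number of missing entries next to the mean; the sample values are assumptions for illustration.

```python
import pandas as pd

# Hand-made sample with one missing age (not taken from the dataset)
ages = pd.Series([20.0, 30.0, None, 40.0], name="Age")

n_missing = int(ages.isna().sum())  # count of missing ages
mean_age = ages.mean()              # skipna=True by default

print(f"Missing: {n_missing} | Mean of known ages: {mean_age:.2f}")
```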


In [None]:
df[['Name', 'Survived']]

Unnamed: 0,Name,Survived
0,"Braund, Mr. Owen Harris",0
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",1
2,"Heikkinen, Miss. Laina",1
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",1
4,"Allen, Mr. William Henry",0
...,...,...
886,"Montvila, Rev. Juozas",0
887,"Graham, Miss. Margaret Edith",1
888,"Johnston, Miss. Catherine Helen ""Carrie""",0
889,"Behr, Mr. Karl Howell",1


In [41]:
total_passageiros = df['Survived'].value_counts()
print(total_passageiros, '\n')

sobreviveram = total_passageiros.get(1, 0)
n_sobreviveram = total_passageiros.get(0, 0)
print(f'Survivors: {sobreviveram} | Non-survivors: {n_sobreviveram}')

Survived
0    549
1    342
Name: count, dtype: int64 

Survivors: 342 | Non-survivors: 549
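
Since `Survived` is coded as 0 and 1, its mean is the survival rate. A minimal sketch using the five `Survived` values visible in the preview at the top of the notebook, not the full column.

```python
import pandas as pd

# The first five Survived values from the preview above
survived = pd.Series([0, 1, 1, 1, 0], name="Survived")

# The mean of a 0/1 column equals the share of ones
rate = survived.mean()

print(f"Survival rate in this sample: {rate:.2%}")
```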


In [50]:
sexo = df['Sex'].value_counts()
print(sexo, '\n')

mulheres = "{:.2%}".format((sexo.get('female', 0) / len(df)))
print(f'Percentage of women: {mulheres}')

homens = "{:.2%}".format((sexo.get('male', 0) / len(df)))
print(f'Percentage of men: {homens}')

Sex
male      577
female    314
Name: count, dtype: int64 

Percentage of women: 35.24%
Percentage of men: 64.76%
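
The same shares can be obtained in one step with `value_counts(normalize=True)`, which returns fractions instead of counts. A minimal sketch on a hand-made series (the values are illustrative, not the full column).

```python
import pandas as pd

# Hand-made sample (not the full Sex column)
sex = pd.Series(["male", "female", "female", "female", "male"], name="Sex")

# normalize=True divides each count by the number of non-missing entries
shares = sex.value_counts(normalize=True)

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
```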


In [54]:
criancas = 0
adolecentes = 0
adultos = 0
idosos = 0

# Half-open ranges so that fractional ages (e.g. 12.5) are not skipped;
# rows with a missing Age match no branch and are not counted.
for index, row in df.iterrows():
    if row['Age'] >= 0.0 and row['Age'] < 13.0:
        criancas += 1

    elif row['Age'] >= 13.0 and row['Age'] < 18.0:
        adolecentes += 1

    elif row['Age'] >= 18.0 and row['Age'] < 60.0:
        adultos += 1

    elif row['Age'] >= 60.0:
        idosos += 1

print(f'Children: {criancas}')
print(f'Teenagers: {adolecentes}')
print(f'Adults: {adultos}')
print(f'Seniors: {idosos}')


Children: 69
Teenagers: 44
Adults: 575
Seniors: 26
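
A vectorized alternative to the row-by-row loop is `pd.cut`, which assigns each age to a bin in a single call. A minimal sketch on hand-made ages; the bin edges mirror the age bands used above, and the English labels are assumptions for illustration. Half-open bins (`right=False`) ensure that fractional ages such as 12.5 still land in a group.

```python
import pandas as pd

# Hand-made sample ages, including boundary cases and one missing value
ages = pd.Series([0.42, 12.0, 12.5, 17.0, 18.0, 59.5, 60.0, None], name="Age")

# right=False closes each bin on the left: [0, 13), [13, 18), [18, 60), [60, inf)
bins = [0, 13, 18, 60, float("inf")]
labels = ["Child", "Teenager", "Adult", "Senior"]
age_group = pd.cut(ages, bins=bins, labels=labels, right=False)

# Count passengers per group; missing ages stay NaN and are not counted
counts = age_group.value_counts(sort=False)
print(counts)
```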


In [None]:
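df['Faixa Etária'] = ''  # age group; stays empty when Age is missing

for index, row in df.iterrows():
    # Assign only the current row: `df['Faixa Etária'] = ...` would overwrite the whole column
    if row['Age'] >= 0.0 and row['Age'] < 13.0:
        df.loc[index, 'Faixa Etária'] = 'Child'

    elif row['Age'] >= 13.0 and row['Age'] < 18.0:
        df.loc[index, 'Faixa Etária'] = 'Teenager'

    elif row['Age'] >= 18.0 and row['Age'] < 60.0:
        df.loc[index, 'Faixa Etária'] = 'Adult'

    elif row['Age'] >= 60.0:
        df.loc[index, 'Faixa Etária'] = 'Senior'

grupos_etarios = df.groupby(['Faixa Etária'])
# Printing a GroupBy object shows only its repr; call e.g. .size() to see the groups
print(grupos_etarios)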


<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FB986087C0>
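
A `DataFrameGroupBy` object is lazy: it only yields results once an aggregation is applied. A minimal sketch on a hand-made frame showing two common aggregations; the column name `AgeGroup` and the sample rows are hypothetical.

```python
import pandas as pd

# Hand-made sample; 'AgeGroup' is a hypothetical column name
sample = pd.DataFrame({
    "AgeGroup": ["Child", "Adult", "Adult", "Senior", "Child"],
    "Survived": [1, 0, 1, 0, 1],
})

grouped = sample.groupby("AgeGroup")

sizes = grouped.size()                 # number of rows per group
survival = grouped["Survived"].mean()  # share of survivors per group

print(sizes)
print(survival)
```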
