## Titanic Data Exploration ##

***

Over the past several weeks, you've learned the code about how to explore and manipulate a dataset. Now it's time to practice what you've learned on a real-world dataset. 

***

### Titanic Dataset

The titanic dataset holds information about the passengers on the titanic. This includes passenger name, characteristics, and if they survived the accident. The dataset has the following columns:

    * pclass = passenger class; 1 = first class, 2 = second class, 3 = third class
    * survived = passenger survival; 1 = survived, 0 = did not survive
    * name = passenger name
    * sex = sex of passenger
    * age = age of passenger
    * sibsp = # of siblings / spouses aboard the Titanic
    * parch = # of parents / children aboard the Titanic
    * ticket = ticket number
    * fare = fare paid by passenger
    * cabin = passenger cabin
    * embarked = port of embarkation; C = Cherbourg, Q = Queenstown, S = Southampton
    * boat = lifeboat assignment 
    * body = recovered body number
    * home dest = anticipated home destination
If you need some additional motivation before starting, please visit: https://www.youtube.com/watch?v=3gK_2XdjOdY

### How to work through the dataset:

Follow the prompts below to explore, manipulate, and visualize aspects of the dataset. Working with data takes time, so take your time as you start with a messy dataset and turn it into something that shows meaningful visualizations. 

***


### Import Libraries and Dataset

* Review the entire notebook to determine what you will be expected to do - then, import the necessary libraries
* Import the titanic.xlsx dataset

In [1]:
import pandas as pd



In [2]:
df = pd.read_excel("titanic.xlsx")


### Determine the Characteristics of the Dataset

   * How many columns are in this dataset?
   *     There are total of 14 columns in the dataset
   * How many rows are in this dataset?
   *     There are total of 5 rows in the dataset
   * What types of data are in each column? Does this make sense with that you know about that column?
   *     Int, Float, and an object types. Yes, it makes sense because int represents all the whole numbers in the data set and float represents          decimals and objects for other dataset that includes numbers and alphabets.
   * Which variables are numeric? Which variables are categorical? What other variables are left outside of these two groups?
   *     pclass, survived, age, sibsp, parch, fare, and body are all numerical variables and name, sex, tickets, cabin, embarked, boat and                 home.dest are categorical.
   * Which variable could be considered a 'dependent' variable?

In [3]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1309 entries, 0 to 1308
Data columns (total 14 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   pclass     1309 non-null   int64  
 1   survived   1309 non-null   int64  
 2   name       1309 non-null   object 
 3   sex        1309 non-null   object 
 4   age        1046 non-null   float64
 5   sibsp      1309 non-null   int64  
 6   parch      1309 non-null   int64  
 7   ticket     1309 non-null   object 
 8   fare       1308 non-null   float64
 9   cabin      295 non-null    object 
 10  embarked   1307 non-null   object 
 11  boat       486 non-null    object 
 12  body       121 non-null    float64
 13  home.dest  745 non-null    object 
dtypes: float64(3), int64(4), object(7)
memory usage: 143.3+ KB


### Identify the Missing Data in the Dataset

   * Is there any missing data?
   *     Yes.
   * Which columns have any missing data?
   *     cabin, age,embarked, boat, body and home.dest all have missing data.
   * Which column has the most missing information? Which column has the least?
   *     Body has the most missing infromation while age has the least amount of missing information

### Handling the Missing Data in the Dataset

   * Remove the columns with excessive missing data (any column missing greater than 500 rows)
   *     The colums with excessive missing data are boat, body, home.dest, and cabin
   * When there is very little missing data, we can make replacements. Replace the missing data for the "embarked" column with the most common embarkation point. 
           replaced the missind data in embarked with the most common embarkation point which is S
   * Replace the missing data in "fare" with the average fare of the entire sample.
       Replaced the missing data with the average fare
   * Remove the rows in the dataset that has missing "age" data. 
       Removed the rows in the dataset that has missing age data.
   * Recheck is there is any data missing in the dataset.
       There are no missing data.

In [4]:
import pandas as pd
import statistics

df = pd.read_excel("titanic.xlsx")


limit = len(df)* (809/1309)

df = df.dropna(thresh = limit, axis = 1)

s = df.embarked.fillna(df.embarked.mode()[0], inplace = True)

mean = round (df.fare.sum()/len(df.fare)*100,1)
df.fare.fillna(mean, inplace = True)
print(mean)

df = df.dropna(subset=['age'])
#print(df.isnull())





df.info()

3327.0
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1046 entries, 0 to 1308
Data columns (total 10 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   pclass    1046 non-null   int64  
 1   survived  1046 non-null   int64  
 2   name      1046 non-null   object 
 3   sex       1046 non-null   object 
 4   age       1046 non-null   float64
 5   sibsp     1046 non-null   int64  
 6   parch     1046 non-null   int64  
 7   ticket    1046 non-null   object 
 8   fare      1046 non-null   float64
 9   embarked  1046 non-null   object 
dtypes: float64(2), int64(4), object(4)
memory usage: 89.9+ KB


### Creating Columns and Replacing Labels

   * Create descriptive labels for the categorical columns: pclass, survived, and embarked. Instead of the coding that shows in the dataset, create labels to describe what each category represents (i.e. in the embarked column S = Southhampton)
   * Create a new column called "Titanic Passenger" and make all values 1
   * Create a new column called "Family Size" - this column should equal the total number of family members each passenger was traveling with.
   * Create a column called "Travel Alone" - this column should be 1 if the passenger was traveling alone, and 0 if the passenger was traveling with family. 
   * Create a column called "Has Caregiver" - this column should have a value of 1 if a passenger is less than 13-years old AND the passenger is traveling with at least one family member, otherwise the value should be 0. 
   * Create a column called "Crew" - this column should be 1 if the passenger paid 0 dollars for their ticket, and 0 otherwise. 
   * Create a column called "Age Group" to group passengers by their age (create five categories: infant, child, teen, adult, senior). You can use bins to complete this (or any other method you like). You define the cutoff points for each group you create. 
   
After create new columns, replace the basic coding "0/1" with meaningful labels. 

In [16]:
for index, x in enumerate(df["pclass"]):
    if x == 1:
        df["pclass"][index] = "first class"
    elif x == 2 :
        df["pclass"][index] = "second class"
    else :
        df["pclass"][index] = "third class"

for index, x in enumerate(df["survived"]):
    if x == 1:
        df["survived"][index] = "Survived"
    else :
        df["survived"][index] = "Not Survived"

for index, x in enumerate(df["embarked"]):
    if x == "C":
        df["embarked"][index] = "Cherbourg"
    elif x == "Q" :
        df["embarked"][index] = "Queenstown"
    elif x == "S" :
        df["embarked"][index] = "Southampton"

df["Titanic Passenger"]=1

for index, x in enumerate(df["Titanic Passenger"]):
    df["Titanic Passenger"][index]= 1

df['FamilySize'] = df['sibsp'] + df['parch']



df['Travel_Alone'] = 0
for index, x in enumerate(df['FamilySize']):
    if x in [0,1] :
        df['Travel_Alone'][index] =1
    else :
        df['Travel_Alone'][index] = 0


df['hascaregiver'] = 0

# for index, x in enumerate(df["age"]):

#     if df['age'][index] < 13 and df['FamilySize'][index] > 1 :
#         df['hascaregiver'][index] = 1
#     else :
#         df['hascaregiver'][index] = 0

df['crew'] = 0


for index, x in enumerate(df["ticket"]):
    if x == 0:
        df["crew"][index] = 1
    else :
        df["crew"][index] = 0
df['age group'] = 0


for index, x in enumerate(df["age"]):
    if x <=1 :
        df["age group"][index] = "infent"
    elif x <=12 :
        df["age group"][index] = "child"
    elif x <=19:
        df["age group"][index] = "teen"
    elif x <=20:
        df["age group"][index] = "Adult"
    else :
        df["age group"][index] = "0"

# df.rating_count
# bin = [0,2,12,18,25,100]
# binned_data - pd.cut(df.rating_count, bins)

df.head()
#embarked = port of embarkation; C = Cherbourg, Q = Queenstown, S = Southampton
#pclass = passenger class; 1 = first class, 2 = second class, 3 = third class

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["pclass"][index] = "first class"
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["survived"][index] = "Survived"
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["embarked"][index] = "Southampton"
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["embarked"][index] = "Cherbourg"
A value is trying to be

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["Titanic Passenger"][index]= 1


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Travel_Alone'][index] =1
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df['Travel_Alone'][index] = 0


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["crew"][index] = 0


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["age group"][index] = "0"
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df["age group"][index] = "infent"


Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest,Titanic Passenger,FamilySize,Travel_Alone,hascaregiver,crew,age group
0,first class,Survived,"Allen, Miss. Elisabeth Walton",female,29.0,0,0,24160,211.3375,B5,Southampton,2.0,,"St Louis, MO",1,0,1,0,0,0
1,first class,Survived,"Allison, Master. Hudson Trevor",male,0.9167,1,2,113781,151.55,C22 C26,Southampton,11.0,,"Montreal, PQ / Chesterville, ON",1,3,0,0,0,infent
2,first class,Not Survived,"Allison, Miss. Helen Loraine",female,2.0,1,2,113781,151.55,C22 C26,Southampton,,,"Montreal, PQ / Chesterville, ON",1,3,0,0,0,child
3,first class,Not Survived,"Allison, Mr. Hudson Joshua Creighton",male,30.0,1,2,113781,151.55,C22 C26,Southampton,,135.0,"Montreal, PQ / Chesterville, ON",1,3,0,0,0,0
4,first class,Not Survived,"Allison, Mrs. Hudson J C (Bessie Waldo Daniels)",female,25.0,1,2,113781,151.55,C22 C26,Southampton,,,"Montreal, PQ / Chesterville, ON",1,3,0,0,0,0


### Determine Frequencies of Groups

* How many passengers fall into each category? Determine how many passengers fall into each group for <b>each</b> categorical vairable (including the ones you just created). 

### Determine the Distribution of Numeric Data

* What are the summary statistics for <b>each</b> numeric variable in the dataset? Summary statistics include:
    * Mean=
        -
    * Median
        -
    * Mode
        -
    * Standard Deviation=
        -
    * Range
        -

In [19]:
st = df.describe()
print(df.describe())

# ma = df['pclass'].max()
# mi = df['pclass'].min()
# r = ma - mi
# print("range of pclass ", r)



# for row in range(len(df)) :
#     for col in range(len(df[row])) :
#         ma = df[row].max()
#         mi = df[row].min()
#         r = ma - mi
#         print("range of pclass ", r)

# print(st.mean)
# rangee = max(st) - min(st)
# print(rangee)


               age        sibsp        parch         fare        body  \
count  1046.000000  1309.000000  1309.000000  1308.000000  121.000000   
mean     29.881135     0.498854     0.385027    33.295479  160.809917   
std      14.413500     1.041658     0.865560    51.758668   97.696922   
min       0.166700     0.000000     0.000000     0.000000    1.000000   
25%      21.000000     0.000000     0.000000     7.895800   72.000000   
50%      28.000000     0.000000     0.000000    14.454200  155.000000   
75%      39.000000     1.000000     0.000000    31.275000  256.000000   
max      80.000000     8.000000     9.000000   512.329200  328.000000   

       Titanic Passenger   FamilySize  Travel_Alone  hascaregiver    crew  
count             1309.0  1309.000000   1309.000000        1309.0  1309.0  
mean                 1.0     0.883881      0.783040           0.0     0.0  
std                  0.0     1.583639      0.412332           0.0     0.0  
min                  1.0     0.000000 

In [7]:
pclassInfo = {}

for index, pclassName in enumerate(df.pclass):
    if pclassName not in pclassInfo.keys():
        pclassInfo[pclassName] = 1
    else:
         pclassInfo[pclassName]+=1
            
print(pclassInfo)

{'first class': 249, 'second class': 244, 'third class': 383, 3: 170}


### Relationships between Variables

* Determine the relationship between each variable and the variable "survived". This is our primary variable of interest -- did this passenger survive the accident? Did the characteristics of the passenger have any relationship with their survival?
    * <b>pclass</b>: how many survivors are in each passenger class? does a pattern emerge? which class has the most survivors? which has the least?
    * <b>sex</b>: how many survivors are in each variable group? does a pattern emerge? which group has the most survivors? which has the least?
    * <b>age</b>: how does the average age of the passenger differ based on survival group? 
    * <b>age group</b>: how many survivors are in each variable group? does a pattern emerge? which group has the most survivors? which has the least?
    * <b>family size</b>: how many survivors are in each variable group? does a pattern emerge? which group has the most survivors? which has the least?
    * <b>travel alone</b>: how many survivors are in each variable group? does a pattern emerge? which group has the most survivors? which has the least?
    * <b>crew</b>: how many survivors are in each variable group? does a pattern emerge? which group has the most survivors? which has the least?
    * <b>has caregiver</b>: how many survivors are in each variable group? does a pattern emerge? which group has the most survivors? which has the least?
    * <b>fare</b>: how does the average fare the passenger paid differ based on survival group? 
    * <b>embarked</b>: how many survivors are in each variable group? does a pattern emerge? which group has the most survivors? which has the least?
    
Based on what you learn working through this section, make (2) statements about what characteristics of passenger most influenced their survival.

In [8]:
df = pd.read_excel("titanic.xlsx")

pclassInfosurvived = {}
pclassInfoNotsurvived = {}
for index, pclassName in enumerate(df.pclass):
    if pclassName in pclassInfosurvived.keys() and pclassName in pclassInfoNotsurvived.keys():
        if df.survived[index] == 1:
            pclassInfosurvived[pclassName] += 1
        if df.survived[index] == 0:
            pclassInfoNotsurvived[pclassName] += 1
    else:
        if df.survived[index] == 1:
            pclassInfosurvived[pclassName] = 1
        if df.survived[index] == 0:
            pclassInfoNotsurvived[pclassName] = 1

print(pclassInfosurvived)
print(pclassInfoNotsurvived)

{1: 199, 2: 119, 3: 181}
{1: 123, 2: 158, 3: 526}


In [9]:
sexInfosurvived = {}
sexInfoNotsurvived = {}
for index, sexName in enumerate(df.sex):
    if sexName in sexInfosurvived.keys() and sexName in sexInfoNotsurvived.keys():
        if df.survived[index] == 1:
            sexInfosurvived[sexName] += 1
        if df.survived[index] == 0:
            sexInfoNotsurvived[sexName] += 1
    else:
        if df.survived[index] == 1:
            sexInfosurvived[sexName] = 1
        if df.survived[index] == 0:
            sexInfoNotsurvived[sexName] = 1

print(sexInfosurvived)
print(sexInfoNotsurvived)

{'female': 339, 'male': 161}
{'female': 127, 'male': 682}


In [10]:
ageInfosurvived = {}
ageInfoNotsurvived = {}
for index, ageName in enumerate(df.age):
    if ageName in ageInfosurvived.keys() and ageName in ageInfoNotsurvived.keys():
        if df.survived[index] == 1:
            ageInfosurvived[ageName] += 1
        if df.survived[index] == 0:
            ageInfoNotsurvived[ageName] += 1
    else:
        if df.survived[index] == 1:
            ageInfosurvived[ageName] = 1
        if df.survived[index] == 0:
            ageInfoNotsurvived[ageName] = 1

print(ageInfosurvived)
print(ageInfoNotsurvived)

{29.0: 12, 0.9167: 1, 48.0: 10, 63.0: 2, 53.0: 1, 18.0: 13, 24.0: 22, 26.0: 9, 80.0: 1, 50.0: 6, 32.0: 9, 37.0: 2, 47.0: 3, 42.0: 6, 25.0: 11, 19.0: 11, 35.0: 2, 28.0: 8, 40.0: 6, 30.0: 15, 58.0: 2, 45.0: 14, 22.0: 15, nan: 1, 44.0: 2, 59.0: 1, 60.0: 1, 41.0: 2, 36.0: 14, 11.0: 1, 14.0: 3, nan: 1, 76.0: 1, 27.0: 13, 33.0: 9, nan: 1, 39.0: 8, 64.0: 1, 55.0: 4, 38.0: 5, 51.0: 1, 31.0: 12, 17.0: 7, 4.0: 5, 54.0: 4, 49.0: 5, 23.0: 7, nan: 1, nan: 1, 43.0: 1, nan: 1, nan: 1, nan: 1, 52.0: 1, 16.0: 6, nan: 1, 21.0: 10, 15.0: 4, nan: 1, nan: 1, nan: 1, 56.0: 1, nan: 1, 13.0: 2, nan: 1, nan: 1, 34.0: 6, 6.0: 2, nan: 1, 62.0: 2, nan: 1, nan: 1, nan: 1, 1.0: 4, 12.0: 1, 20.0: 4, 0.8333: 1, 8.0: 1, 0.6667: 1, 7.0: 2, nan: 1, 3.0: 1, nan: 1, 2.0: 4, nan: 1, 32.5: 1, 5.0: 4, nan: 1, 0.75: 1, 9.0: 4, nan: 1, 36.5: 1, 0.1667: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1, nan: 1

In [11]:
FamilySizeInfosurvived = {}
FamilySizeInfoNotsurvived = {}
for index, FamilySizeName in enumerate(df.FamilySize):
    if FamilySizeName in FamilySizeInfosurvived.keys() and FamilySizeName in FamilySizeInfoNotsurvived.keys():
        if df.survived[index] == 1:
            FamilySizeInfosurvived[FamilySizeName] += 1
        if df.survived[index] == 0:
            FamilySizeInfoNotsurvived[FamilySizeName] += 1
    else:
        if df.survived[index] == 1:
            FamilySizeInfosurvived[FamilySizeName] = 1
        if df.survived[index] == 0:
            FamilySizeInfoNotsurvived[FamilySizeName] = 1

print(FamilySizeInfosurvived)
print(FamilySizeInfoNotsurvived)

AttributeError: 'DataFrame' object has no attribute 'FamilySize'

In [12]:
crewInfosurvived = {}
crewInfoNotsurvived = {}
for index, crewName in enumerate(df.crew):
    if crewName in crewInfosurvived.keys() and crewName in crewInfoNotsurvived.keys():
        if df.survived[index] == 1:
            crewInfosurvived[crewName] += 1
        if df.survived[index] == 0:
            crewInfoNotsurvived[crewName] += 1
    else:
        if df.survived[index] == 1:
            crewInfosurvived[crewName] = 1
        if df.survived[index] == 0:
            crewInfoNotsurvived[crewName] = 1

print(crewInfosurvived)
print(crewInfoNotsurvived)

AttributeError: 'DataFrame' object has no attribute 'crew'

In [13]:
hascaregiverInfosurvived = {}
hascaregiverInfoNotsurvived = {}
for index, hascaregiverName in enumerate(df.caregiver):
    if hascaregiverName in hascaregiverInfosurvived.keys() and hascaregiverName in hascaregiverInfoNotsurvived.keys():
        if df.survived[index] == 1:
            hascaregiverInfosurvived[hascaregiverName] += 1
        if df.survived[index] == 0:
            hascaregiverInfoNotsurvived[hascaregiverName] += 1
    else:
        if df.survived[index] == 1:
            hascaregiverInfosurvived[hascaregiverName] = 1
        if df.survived[index] == 0:
            hascaregiverInfoNotsurvived[hascaregiverName] = 1

print(hascaregiverInfosurvived)
print(hascaregiverInfoNotsurvived)

AttributeError: 'DataFrame' object has no attribute 'hascaregiver'

In [14]:
fareInfosurvived = {}
fareInfoNotsurvived = {}
for index, fareName in enumerate(df.fare):
    if fareName in fareInfosurvived.keys() and fareName in fareInfoNotsurvived.keys():
        if df.survived[index] == 1:
            fareInfosurvived[fareName] += 1
        if df.survived[index] == 0:
            fareInfoNotsurvived[fareName] += 1
    else:
        if df.survived[index] == 1:
            fareInfosurvived[fareName] = 1
        if df.survived[index] == 0:
            fareInfoNotsurvived[fareName] = 1

print(fareInfosurvived)
print(fareInfoNotsurvived)

{211.3375: 1, 151.55: 3, 26.55: 8, 77.9583: 1, 51.4792: 1, 227.525: 3, 69.3: 1, 78.85: 2, 30.0: 3, 247.5208: 2, 76.2917: 1, 52.5542: 1, 221.7792: 1, 91.0792: 1, 135.6333: 3, 31.0: 2, 164.8667: 2, 262.375: 2, 55.0: 1, 27.7208: 4, 134.5: 1, 26.2875: 1, 27.4458: 1, 512.3292: 1, 120.0: 1, 61.175: 1, 53.1: 3, 86.5: 1, 29.7: 2, 136.7792: 1, 83.1583: 5, 25.7: 1, 71.0: 1, 71.2833: 1, 30.5: 4, 52.0: 4, 57.0: 1, 81.8583: 1, 106.425: 2, 39.6: 2, 56.9292: 1, 78.2667: 1, 31.6833: 1, 110.8833: 3, 26.3875: 1, 263.0: 2, 133.65: 1, 49.5: 1, 79.2: 2, 211.5: 2, 59.4: 2, 89.1042: 1, 28.5: 1, 153.4625: 2, 63.3583: 1, 55.4417: 1, 76.7292: 1, 83.475: 1, 93.5: 2, 57.9792: 1, 90.0: 3, 80.0: 1, 0.0: 2, 51.8625: 1, 25.9292: 1, 39.4: 1, 146.5208: 2, 49.5042: 1, 82.1708: 1, 57.75: 1, 113.275: 1, 26.2833: 1, 108.9: 2, 25.7417: 1, 61.9792: 1, 66.6: 1, 26.0: 19, 55.9: 1, 35.5: 3, 60.0: 1, 82.2667: 1, 79.65: 2, 28.5375: 1, 75.25: 1, 61.3792: 1, 24.0: 1, 13.0: 17, 39.0: 2, 29.0: 1, 21.0: 5, 10.5: 12, 26.25: 4, 36.75: 2

In [15]:
embarkedInfosurvived = {}
embarkedInfoNotsurvived = {}
for index, embarkedName in enumerate(df.embarked):
    if embarkedName in embarkedInfosurvived.keys() and embarkedName in embarkedInfoNotsurvived.keys():
        if df.survived[index] == 1:
            embarkedInfosurvived[embarkedName] += 1
        if df.survived[index] == 0:
            embarkedInfoNotsurvived[embarkedName] += 1
    else:
        if df.survived[index] == 1:
            embarkedInfosurvived[embarkedName] = 1
        if df.survived[index] == 0:
            embarkedInfoNotsurvived[embarkedName] = 1

print(embarkedInfosurvived)
print(embarkedInfoNotsurvived)

{'S': 303, 'C': 150, nan: 1, 'Q': 44}
{'S': 610, 'C': 119, 'Q': 79}


### Visualize your Results

* Using the most interesting (from your POV) results from the above section, create (3) visualizations to illustrate the results. 
* Create a barplot to show the variation in average age across passenger class. On average, which passenger class has the oldest passengers?
* Create a violin plot to show the distribution of age across passenger class. 