# Welcome to the Data Manipulation Exercises

The workbook has been broken up into three sections.  Each section corresponds to a reading assignment in the textbook.

In [84]:
import pandas as pd
import numpy as np

data = pd.read_csv("titanic.csv")

## Before You Get Started

We are going to be using the Titanic Dataset. Make sure to run head() on it before you start working with manipulation methods.

In [97]:
# Run the head of your data set here:
print(data.head(10))
print(data.shape)
print(data.dtypes)

   survived  pclass     sex   age  sibsp  parch     fare embarked   class  \
0         0       3    male  22.0      1      0   7.2500        S   Third   
1         1       1  female  38.0      1      0  71.2833        C   First   
2         1       3  female  26.0      0      0   7.9250        S   Third   
3         1       1  female  35.0      1      0  53.1000        S   First   
4         0       3    male  35.0      0      0   8.0500        S   Third   
5         0       3    male   NaN      0      0   8.4583        Q   Third   
6         0       1    male  54.0      0      0  51.8625        S   First   
7         0       3    male   2.0      3      1  21.0750        S   Third   
8         1       3  female  27.0      0      2  11.1333        S   Third   
9         1       2  female  14.0      1      0  30.0708        C  Second   

     who  adult_male deck  embark_town alive  alone  
0    man        True  NaN  Southampton    no  False  
1  woman       False    C    Cherbourg   yes

In [12]:
# check for duplicates
duplicates = data.duplicated()
print(duplicates)

0      False
1      False
2      False
3      False
4      False
       ...  
886     True
887    False
888    False
889    False
890    False
Length: 891, dtype: bool


In [59]:
# if there are, go ahead and drop them:
data.drop_duplicates(inplace=True)
print(data.shape)

(784, 15)
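Printing the full boolean Series makes it hard to see how many duplicates there actually are. Here is a minimal sketch, using a small made-up DataFrame (`toy` is hypothetical, not the Titanic data), of counting duplicate rows before dropping them:

```python
import pandas as pd

# Hypothetical toy data: the third row repeats the first.
toy = pd.DataFrame({"name": ["a", "b", "a"], "fare": [7.25, 8.05, 7.25]})

# duplicated() marks each repeat of an earlier row as True;
# summing the booleans counts how many repeats there are.
n_duplicates = int(toy.duplicated().sum())
print(n_duplicates)

# drop_duplicates() returns a new DataFrame unless inplace=True is passed.
deduped = toy.drop_duplicates()
print(deduped.shape)
```

Comparing the shape before and after the drop is a quick way to confirm that the number of removed rows matches the count.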


### Cleaning Note:

While the columns are not the "prettiest", don't adjust any of them yet. We are going to update some values and add some values as we work through this notebook. Apologies for the extra visual "noise" on your screen. You will be given the option to tidy up the columns at the end of this notebook.

## Running Tables Note:  
If your tables don't appear to have accepted your changes, try the "Run All" option in the "Cell" section of the menu bar.  

<span style="background-color:dodgerblue; color:dodgerblue;">- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -</span> 

# A. Aggregation

1. Work through the section Exercises.  
    - There are 4 sections in part A:
        - Groupby
        - Aggregation Methods
        - Groupby and Basic Math
        - Groupby and Multiple Aggregations


#### Creating Variables.

As we begin to manipulate our data, create new variables to store your work in.  This will keep your original data intact.  Having the original dataset available will save you time with each manipulation.  You can also create variable names that inform you of the purpose of the manipulation.  

### 1: Groupby <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 

#### Groupby "embark_town"

1. Using the Titanic data set, group by "embark_town".
1. Create a variable that will represent the grouping of data. 
1. Initialize it using the groupby() function and pass it the column.


In [41]:
# Code your groupby "embark_town" here:
group_embark_town = data.groupby(["embark_town"])

In [42]:
# To view the grouped data as a table, use the variable_name.first():
print(group_embark_town.first())

             survived  pclass     sex   age  sibsp  parch     fare embarked  \
embark_town                                                                   
Cherbourg           1       1  female  38.0      1      0  71.2833        C   
Queenstown          0       3    male   2.0      0      0   8.4583        Q   
Southampton         0       3    male  22.0      1      0   7.2500        S   

             class    who  adult_male deck alive  alone  
embark_town                                              
Cherbourg    First  woman       False    C   yes  False  
Queenstown   Third    man        True    C    no   True  
Southampton  Third    man        True    C    no  False  
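A groupby object does not display anything by itself, which is why the cell above calls first(). As a rough sketch on hypothetical toy data (the `toy` frame and its values are made up for illustration), two other ways to peek inside a grouping are size() and get_group():

```python
import pandas as pd

# Hypothetical toy data, not the Titanic file.
toy = pd.DataFrame({
    "embark_town": ["Southampton", "Cherbourg", "Southampton", "Queenstown"],
    "fare": [7.25, 71.28, 8.05, 8.46],
})

grouped = toy.groupby("embark_town")

# size() counts the rows in each group.
group_sizes = grouped.size()
print(group_sizes)

# get_group() returns only the rows that belong to one group.
southampton = grouped.get_group("Southampton")
print(southampton)
```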


#### Groupby "survived"

Did you know that you can also chain on some of our exploratory methods to the groupby method?

1. Create & initialize a new variable to hold a table that will groupby "survived" 
1. Use method chaining to tack on the describe method

In [54]:
# Code your groupby "survived" table here:
survived = data.groupby(['survived'])

# run your table below:
print(survived.first())


          pclass     sex   age  sibsp  parch     fare embarked  class    who  \
survived                                                                       
0              3    male  22.0      1      0   7.2500        S  Third    man   
1              1  female  38.0      1      0  71.2833        C  First  woman   

          adult_male deck  embark_town alive  alone  
survived                                             
0               True    E  Southampton    no  False  
1              False    C    Cherbourg   yes  False  


In [55]:
# run your table with describe
survived.describe()

Unnamed: 0_level_0,pclass,pclass,pclass,pclass,pclass,pclass,pclass,pclass,age,age,...,parch,parch,fare,fare,fare,fare,fare,fare,fare,fare
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
survived,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
0,461.0,2.481562,0.770507,1.0,2.0,3.0,3.0,3.0,394.0,30.953046,...,0.0,6.0,461.0,23.944531,33.336385,0.0,7.875,13.0,26.55,263.0
1,323.0,1.904025,0.856152,1.0,1.0,2.0,3.0,3.0,284.0,28.365915,...,1.0,5.0,323.0,50.07918,68.009971,0.0,12.825,26.25,61.37915,512.3292


In [65]:
# How is this table organized?  Why are there 40 columns now?
# The table now has two rows, one for each "survived" value (0 and 1). describe() reports 8 statistics
# (count, mean, std, min, 25%, 50%, 75%, max) for each of the 5 numeric columns, so there are 5 x 8 = 40 columns.
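The column count can be checked on a smaller table. The sketch below uses hypothetical toy data with two numeric columns and one text column; the names and values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data: "age" and "fare" are numeric, "sex" is text.
toy = pd.DataFrame({
    "survived": [0, 1, 0, 1],
    "age": [22.0, 38.0, 35.0, 26.0],
    "fare": [7.25, 71.28, 8.05, 7.92],
    "sex": ["male", "female", "male", "female"],
})

described = toy.groupby("survived").describe()

# One row per group; 2 numeric columns x 8 statistics each.
print(described.shape)

# The columns have two levels, so selecting "age" returns its 8 statistics.
age_stats = described["age"]
print(age_stats)
```

By default the text column is left out, because describe() summarizes only numeric columns when a table mixes types.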


### 2. Aggregation Methods <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 

Note: **agg()** and **aggregate()** are identical [source](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.aggregate.html)

#### Method Chaining

1. Create a variable to method chain **head()** and **agg()** together.
1. Pass one of the following statistical values to **agg()**
   - "mean", "median", "mode", "min", "max", "std", "var", "first", "last", "sum"

In [69]:
# Code your method chain here:
head_agg = data.agg(['mode']).head(1)

In [89]:
# Create a variable to method chain head() with agg("sum") (note: "count" is used here instead)
sum_agg = data.agg(["count"]).head(5)
# run your table:
print(sum_agg)

       survived  pclass  sex  age  sibsp  parch  fare  embarked  class  who  \
count       891     891  891  714    891    891   891       889    891  891   

       adult_male  deck  embark_town  alive  alone  
count         891   203          889    891    891  


In [12]:
# Explain the sum table.  What is going on with the "sex", "class", and "alive" columns?



#### Using a Dictionary <span style="color:darkorange;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 
##### A dictionary is a Python collection type.  

A dictionary stores **key-value pairs**.  A key-value pair is an organization system made up of a single *key* that has one or more *values* paired with it.  
Think of it like your contacts list.  The contacts list is the dictionary object.  
Each contact is organized by a key, usually a name.  Attached to each name is the contact information, or the values.
Some contacts might have an email address, phone number, home or work address, etc.  Other contacts may just be a name and phone number.  This is a very simple example, but understanding this organizational structure will be helpful as you learn to manipulate tables.  

*Here is a dictionary example with 3 keys:*
>**contacts_dictionary = {"name1": ["email", "555-5552", "work info"], 
      "name2": ["email", "555-5554"],
      "name3": "555-5555"}**
                     
*Here is a dictionary example with a single key-value pair*
**study_group_dictionary = {"name1": "555-5557"}**   

It has a single key paired with a single value.  The organization of this structure is called a "Key-Value Pair".
Using the contact list example, the key would be the name of the person and the values would be their contact information.  The key is a single item (the person's name) and the values can be a single item (an email address) or multiple items (email, phone number, address, work info, etc).
Keys and values must use correct data type syntax.  Values can be any data type.  Keys do not have to be strings, but each key must be a single immutable (hashable) value, such as a string or a number.  

For more information, you can read more on dictionary objects [here](https://www.w3schools.com/python/python_dictionaries.asp).
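The contacts example can be written as runnable Python. This is only an illustrative sketch: the names, email addresses, and phone numbers are made up, and the phone numbers are quoted as strings so that Python does not treat the hyphen as subtraction.

```python
# Hypothetical contact data, used only to illustrate key-value pairs.
contacts = {
    "name1": ["name1@example.com", "555-5552", "work info"],
    "name2": ["name2@example.com", "555-5554"],
    "name3": "555-5555",
}

# Look up a value by its key.
print(contacts["name3"])

# Add a new key-value pair.
contacts["name4"] = "555-5556"

# Loop over the keys and their values together.
for name, info in contacts.items():
    print(name, info)
```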


#### Aggregation across multiple columns using dictionary functionality

##### Syntax Example:

**age_dictionary={"age":["sum", "max"]}**

We are creating a new dictionary (**age_dictionary**).  The key is **"age"** and the values we want are **"sum"** and **"max"**.  This dictionary object has now become a template for the aggregations we want to perform.  However, on its own, it does nothing.  Once passed to the **agg()** method, it picks out the specific column we want to examine and applies those functions to it, producing a small summary table.  

The code is contained in the box below.  Run it and see what happens.


For syntax examples, review [this webpage](https://www.geeksforgeeks.org/python-pandas-dataframe-aggregate/).
#### <span style="color:coral;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>

In [90]:
# Predict the table output before you uncomment the code below.

age_dictionary={"age":["sum", "max"]}
dictionary_agg=data.agg(age_dictionary)
dictionary_agg

Unnamed: 0,age
sum,21205.17
max,80.0


1. What if we want to look at more than one column at a time?  We add more key-value pairs to the dictionary we pass to the agg function, one per column.
1. Create a variable to hold at least 3 columns.  Use the syntax from the "Syntax Example" as a guide.
    - Aggregate the following:  survived: "sum" & "count"; age: "std" & "min", and sibsp: "count" & "sum"

In [91]:
# Code your dictionary here:
aggregate_dictionary = {"survived":["sum", "count"], "age":["std", "min"], "sibsp":["count","sum"]}
dict_aggregate = data.agg(aggregate_dictionary)
print(dict_aggregate)


       survived        age  sibsp
sum       342.0        NaN  466.0
count     891.0        NaN  891.0
std         NaN  14.526497    NaN
min         NaN   0.420000    NaN


### 3. Groupby and Basic Math <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 

1. Groupby "pclass".  Make sure you use a variable to hold your grouped data.

In [98]:
# Code your groupby here:
data_num = data.select_dtypes(include=['int64', 'float64'])
pclass_var = data_num.groupby("pclass") 

# Run your table using first() here instead of head():
print(pclass_var.first())

        survived   age  sibsp  parch     fare
pclass                                       
1              1  38.0      1      0  71.2833
2              1  14.0      1      0  30.0708
3              0  22.0      1      0   7.2500
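Once the data is grouped, basic math methods such as mean() and sum() are computed separately for each group. The following is a minimal sketch on hypothetical toy data; `toy`, `by_class`, and the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data with two passenger classes.
toy = pd.DataFrame({
    "pclass": [1, 1, 3, 3],
    "fare": [71.0, 53.0, 7.0, 9.0],
    "survived": [1, 1, 0, 1],
})

by_class = toy.groupby("pclass")

# Average fare within each class.
mean_fare = by_class["fare"].mean()
print(mean_fare)

# Number of survivors within each class (summing 0/1 values counts the 1s).
survivors = by_class["survived"].sum()
print(survivors)
```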


### 4. Groupby and Multiple Aggregations <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 

#### Group with a List<span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>

1. We want to apply multiple aggregation functions to our newly grouped data set.  We create a variable to hold a list of the functions we want to perform.  These functions are ones that the agg method accepts by name.  When we pass our list to the method, it iterates through each item and applies that function to every column of the grouped table.

In [99]:
# our list of functions
agg_func_list = ['sum', 'mean', 'median', 'min', 'max', 'std', 'var', 'first', 'last', 'count']


# Apply the agg method to our passenger_class variable (made in the Groupby and Basic Math section; it is named pclass_var here).
# Pass our list to the function and run your table.

print(pclass_var.agg(agg_func_list))
  

       survived                                                                \
            sum      mean median min max       std       var first last count   
pclass                                                                          
1           136  0.629630    1.0   0   1  0.484026  0.234281     1    1   216   
2            87  0.472826    0.0   0   1  0.500623  0.250624     1    0   184   
3           119  0.242363    0.0   0   1  0.428949  0.183998     0    0   491   

        ...        fare                                                \
        ...         sum       mean   median  min       max        std   
pclass  ...                                                             
1       ...  18177.4125  84.154687  60.2875  0.0  512.3292  78.380373   
2       ...   3801.8417  20.662183  14.2500  0.0   73.5000  13.417399   
3       ...   6714.6951  13.675550   8.0500  0.0   69.5500  11.778142   

                                           
                var    first  

#### Group with a Dictionary<span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span>

Using only a list provides us with the entire table.  What if we only want to look at age vs pclass?  

We can create a dictionary to hold the age column for us.  The *key* would be the name of our column, and the values our list of functions to perform on that column.  The code would look like this:

In [17]:
agg_func_dict = {
    'age':
    ['sum', 'mean', 'median', 'min', 'max', 'std', 'var', 'first', 'last', 'count']
}
# We would run our table like this:
# passenger_class.agg(agg_func_dict)  

Looking at the *agg_func_dict* syntax, create a dictionary variable for the "survived" column and pass it to **passenger_class.agg()** in the box below.

In [100]:
# Code it here:
survived_agg_dict = {
    'survived':
    ['sum', 'mean', 'median', 'min', 'max', 'std', 'var', 'first', 'last', 'count']
}
print(pclass_var.agg(survived_agg_dict))



       survived                                                              
            sum      mean median min max       std       var first last count
pclass                                                                       
1           136  0.629630    1.0   0   1  0.484026  0.234281     1    1   216
2            87  0.472826    0.0   0   1  0.500623  0.250624     1    0   184
3           119  0.242363    0.0   0   1  0.428949  0.183998     0    0   491
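The dictionary form produces two-level column headers. If flat, descriptive column names are preferred, pandas also supports "named aggregation", where each keyword argument is `new_name=(column, function)`. This is offered as an optional alternative, not as part of the exercise; the toy data and the output names `survivors` and `survival_rate` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({
    "pclass": [1, 1, 3, 3],
    "survived": [1, 1, 0, 1],
})

# Each keyword becomes a column name in the result.
summary = toy.groupby("pclass").agg(
    survivors=("survived", "sum"),
    survival_rate=("survived", "mean"),
)
print(summary)
```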


<span style="background-color:dodgerblue; color:dodgerblue;">- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -</span> 

# B. Recoding and Creating New Values and Variables 

1. Work through Part B.  There are 3 sections.

### Create a New Column <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 

In the box below:
1. Create a new column by manipulating the values of a different column.  Specifically, create a new column, "fare_2021", that allows us to compare the cost of fare in pounds back in 1912 to 2021.  [This website](https://www.in2013dollars.com/uk/inflation/1912) can help you find the 2021 fare amount. 

In [102]:
# Code your new "fare_2021" column here:
data["fare_2021"] = data['fare'] * 119.25
# Run the head of your table to see your new column:
print(data.head())

   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     who  adult_male deck  embark_town alive  alone    fare_2021  
0    man        True  NaN  Southampton    no  False   864.562500  
1  woman       False    C    Cherbourg   yes  False  8500.533525  
2  woman       False  NaN  Southampton   yes   True   945.056250  
3  woman       False    C  Southampton   yes  False  6332.175000  
4    man        True  NaN  Southampton    no   True   959.962500  


### Replacing Values <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 
 
Replace the values in the "alive" column from string "yes" or "no" to bools, where "yes" becomes True and "no" becomes False.

In [107]:
# Code your updated values here:
data['alive'] = data['alive'].replace(to_replace={'no' : False, 'yes': True})

print(data.head())

   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     who  adult_male deck  embark_town  alive  alone    fare_2021  
0    man        True  NaN  Southampton  False  False   864.562500  
1  woman       False    C    Cherbourg   True  False  8500.533525  
2  woman       False  NaN  Southampton   True   True   945.056250  
3  woman       False    C  Southampton   True  False  6332.175000  
4    man        True  NaN  Southampton  False   True   959.962500  


We can also use functions to update values.

1. Create a function that will set the alive values as bools. Apply it to your table and run your table here:

In [110]:
# Code your function here:
data['alive'] = data['alive'].astype(bool)
print(data.head())

   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     who  adult_male deck  embark_town  alive  alone    fare_2021  
0    man        True  NaN  Southampton  False  False   864.562500  
1  woman       False    C    Cherbourg   True  False  8500.533525  
2  woman       False  NaN  Southampton   True   True   945.056250  
3  woman       False    C  Southampton   True  False  6332.175000  
4    man        True  NaN  Southampton  False   True   959.962500  
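The cell above relies on the earlier replace step having already turned the strings into True and False. On the raw strings, a plain bool conversion would not distinguish them, because Python treats any non-empty string as true. A minimal sketch of the function-based approach, on hypothetical toy data with a made-up helper named `to_bool`, might look like this:

```python
import pandas as pd

# Hypothetical toy data holding the original string values.
toy = pd.DataFrame({"alive": ["yes", "no", "yes"]})

def to_bool(value):
    """Return True for "yes" and False for anything else."""
    return value == "yes"

# map() calls the function once for every value in the column.
toy["alive"] = toy["alive"].map(to_bool)
print(toy)
```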


### Using a function to create a new column <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 

Sometimes you might want to create a new column based on combining multiple columns together.

1. Create an "age_group" column that breaks years up as 0-19, 20-29, 30-39, etc. until all given ages are covered.  Make sure you check to see where you can stop counting by 10s.

In [117]:
# Write your max age check here:
print(data['age'].max())
print(data['age'].min())
def age_counts(row):
    # Takes one row of the table and returns the label for its age range.
    # Caution: a missing age (NaN), or an age that falls between two ranges
    # (e.g. 19.5), fails every comparison below and is labeled '80+'.
    age = row['age']
    if age <= 19:
        return '0-19'
    if 20 <= age <= 29:
        return '20-29'
    if 30 <= age <= 39:
        return '30-39'
    if 40 <= age <= 49:
        return '40-49'
    if 50 <= age <= 59:
        return '50-59'
    if 60 <= age <= 69:
        return '60-69'
    if 70 <= age <= 79:
        return '70-79'
    else:
        return '80+'

80.0
0.42


In [119]:
# Code the new "age_group" column function here:
data['age_group'] = data.apply(age_counts, axis = 1)
print(data.head())

   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     who  adult_male deck  embark_town  alive  alone    fare_2021 age_group  
0    man        True  NaN  Southampton  False  False   864.562500     20-29  
1  woman       False    C    Cherbourg   True  False  8500.533525     30-39  
2  woman       False  NaN  Southampton   True   True   945.056250     20-29  
3  woman       False    C  Southampton   True  False  6332.175000     30-39  
4    man        True  NaN  Southampton  False   True   959.962500     30-39  
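With a chain of comparisons like the one above, a missing age is not matched by any range and ends up in the final catch-all label. As an alternative sketch (not the method this exercise asks for), pd.cut() can assign the same labels while leaving missing ages as missing. The toy ages below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ages, including one missing value.
toy = pd.DataFrame({"age": [0.42, 19.0, 22.0, 38.0, 80.0, np.nan]})

# With right=False each bin includes its left edge and excludes its right
# edge: [0, 20), [20, 30), ... and finally [80, infinity).
bins = [0, 20, 30, 40, 50, 60, 70, 80, np.inf]
labels = ["0-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80+"]

toy["age_group"] = pd.cut(toy["age"], bins=bins, labels=labels, right=False)
print(toy)
```

Because the bins share edges, fractional ages such as 19.5 also fall into a group ("0-19" here) instead of slipping between two ranges.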


<span style="background-color:dodgerblue; color:dodgerblue;">- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -</span> 

# C. Reshaping Tables

1. Work through Part C.  There are 4 sections.

### Sort_values <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 

Use **sort_values()** to answer the following question:
> What is the age of the person who paid the highest fare?

Hint: We want to see the highest fare value first. What order would we want? ascending or descending?  Check the [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html?highlight=sort_values#pandas.DataFrame.sort_values) for the syntax.

In [126]:
# Code your sort_values here:
data.sort_values(by = "fare", ascending = False)

# Run your table here:
print(data.sort_values(by = "fare", ascending = False).head(10))

     survived  pclass     sex   age  sibsp  parch      fare embarked  class  \
679         1       1    male  36.0      0      1  512.3292        C  First   
258         1       1  female  35.0      0      0  512.3292        C  First   
737         1       1    male  35.0      0      0  512.3292        C  First   
88          1       1  female  23.0      3      2  263.0000        S  First   
438         0       1    male  64.0      1      4  263.0000        S  First   
341         1       1  female  24.0      3      2  263.0000        S  First   
27          0       1    male  19.0      3      2  263.0000        S  First   
742         1       1  female  21.0      2      2  262.3750        C  First   
311         1       1  female  18.0      2      2  262.3750        C  First   
299         1       1  female  50.0      0      1  247.5208        C  First   

       who  adult_male deck  embark_town  alive  alone    fare_2021 age_group  
679    man        True    B    Cherbourg   True  F
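In the sorted output above, several passengers share the highest fare, so reading only the first row would miss the others. One way to keep every tied row, shown here as a sketch on hypothetical toy data, is nlargest() with keep="all":

```python
import pandas as pd

# Hypothetical toy data in which two passengers tie for the highest fare.
toy = pd.DataFrame({
    "age": [36.0, 23.0, 35.0],
    "fare": [512.33, 263.0, 512.33],
})

# Sorting in descending order puts the highest fares first.
sorted_toy = toy.sort_values(by="fare", ascending=False)
print(sorted_toy)

# nlargest(1, ..., keep="all") returns every row tied for the top fare.
top_fares = toy.nlargest(1, "fare", keep="all")
print(top_fares["age"].tolist())
```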

### pivot_table <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 
1. Pivot the table so that the values are the summed "fare", the index is "who" and "age_group", and the columns are "survived".

Hint: set the aggfunc parameter to np.sum (the string "sum" is equivalent).




In [125]:
# Code your pivot_table here:
# Passing the string 'sum' gives the same totals as the built-in sum.
pivot_table = data.pivot_table(values='fare', aggfunc='sum', index=['who', 'age_group'], columns='survived')


# Run your table here:
print(pivot_table)

survived                 0          1
who   age_group                      
child 0-19       1109.5459  1611.6751
man   0-19        978.4249   145.5333
      20-29      2435.2329   646.9542
      30-39      1506.2789  1720.9459
      40-49      1110.6374   466.6793
      50-59       853.3042   226.2000
      60-69       586.4875    89.7000
      70-79       181.1834        NaN
      80+        1997.7913   406.7124
woman 0-19        126.1375  1243.0209
      20-29       404.7500  2514.4291
      30-39       164.9042  3350.8791
      40-49       241.2542  1563.6335
      50-59        39.2125  1182.0833
      60-69            NaN   242.7958
      80+         407.5751  1139.9875
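The NaN cells in the pivot table mark combinations that have no rows at all, such as a group with no survivors. A minimal sketch on hypothetical toy data shows the same effect and how the optional fill_value parameter replaces those gaps:

```python
import pandas as pd

# Hypothetical toy data: there is no woman with survived == 0.
toy = pd.DataFrame({
    "who": ["man", "man", "woman", "woman"],
    "survived": [0, 1, 1, 1],
    "fare": [7.25, 30.0, 71.28, 8.0],
})

# Rows come from "who", columns from "survived", cells hold summed fares.
# fill_value=0 replaces combinations that have no rows.
fare_pivot = toy.pivot_table(
    values="fare", index="who", columns="survived",
    aggfunc="sum", fill_value=0,
)
print(fare_pivot)
```

Whether a 0 or a NaN is more honest depends on the question being asked, since "no passengers" and "passengers who paid nothing" are different things.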




### Wide to Long <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 

1. Create a table where the columns are "who" and the values are "pclass"
1. Answer the question:  How does this table differ from the pivot_table above?  Specifically, how is "who" different?

In [127]:
# Code your table here:
who_table = data.pivot_table(values = 'pclass', columns = 'who')

# Run your table here:
print(who_table)

# Answer the question here:
# "who" runs across the columns in this table, whereas it formed the rows (index) in the table above. It is also not broken down into age groups, and the values are the mean pclass because pivot_table defaults to aggfunc="mean".


who        child       man     woman
pclass  2.626506  2.372439  2.084871


### Melt <span style="color:dodgerblue;"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - </span> 

1.  What does **melt** do to the data? 

In [27]:
# What does melt do?
# Melt reshapes a table from wide format to long format. The melted columns are stacked into two columns, "variable" (the old column name)
# and "value" (its entry), while any id_vars columns are repeated on every row. Some tables are easier to work with in long format and some in wide format.

2. Apply melt to your data.  Be sure to store the output in a new variable.  What is the new shape of your table?

In [128]:
# Create your default melt table here with the following syntax:  new_name = pd.melt(data_set)
new_name = pd.melt(data)
# Run your table here:
print(new_name)
# Check the shape of your new table.
new_name.shape

        variable  value
0       survived      0
1       survived      1
2       survived      1
3       survived      1
4       survived      0
...          ...    ...
15142  age_group  20-29
15143  age_group   0-19
15144  age_group    80+
15145  age_group  20-29
15146  age_group  30-39

[15147 rows x 2 columns]


(15147, 2)

3. Create a melt table where the index variables are "embarked", and the values are "fare" and "deck"

In [134]:
# Create your melt table here:
embarked_melt = pd.melt(data, id_vars = 'embarked', value_vars = ['fare', 'deck'])
# Run your table here:
print(embarked_melt)
# Check the shape
embarked_melt.shape

     embarked variable    value
0           S     fare     7.25
1           C     fare  71.2833
2           S     fare    7.925
3           S     fare     53.1
4           S     fare     8.05
...       ...      ...      ...
1777        S     deck      NaN
1778        S     deck        B
1779        S     deck      NaN
1780        C     deck        C
1781        Q     deck      NaN

[1782 rows x 3 columns]


(1782, 3)
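The shapes above follow a simple pattern: a melted table has one row for every original row and melted column pair. A small sketch on hypothetical toy data (two rows, three columns) makes the arithmetic easy to check:

```python
import pandas as pd

# Hypothetical toy data: 2 rows x 3 columns.
toy = pd.DataFrame({
    "embarked": ["S", "C"],
    "fare": [7.25, 71.28],
    "deck": [None, "C"],
})

# Default melt: all 3 columns are melted, so 2 rows x 3 columns = 6 rows.
melted_all = pd.melt(toy)
print(melted_all.shape)

# With id_vars, "embarked" is kept as its own column and only the
# 2 value_vars are melted, so 2 rows x 2 columns = 4 rows.
melted_some = pd.melt(toy, id_vars="embarked", value_vars=["fare", "deck"])
print(melted_some.shape)
```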

# Optional Challenges:

1. Clean and Explore the table.  
    1. How would you handle any missing data?
    1. Would you keep all of the columns?
    1. Would you want to manipulate any data?