# Learning Python - Day 3

## Outline:

- Errors and how to read them
- Concept of Modules
- Numpy/pandas
    - Really basic/high level coverage


# Error!

## What is an error?

- An error is a problem in your code that stops it from running
- There are many different types of errors
- Errors are your friend! They tell you what is wrong with your code
- You will encounter errors. Many, many errors. So many errors...
    - Everyone does

## How to read an error

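- Read the error from the bottom up
- The last line is the most important: it gives the type of error and a short message describing what went wrong
- The lines above it show where the error occurred (the file or cell, the line number, and the offending line of code)
- In longer error messages, the lines further up show the chain of function calls that led to the error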
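To make this concrete, here is a minimal sketch that triggers an error inside a function and captures the error message as text. The function `divide` is made up for illustration; the `traceback` module is part of the Python standard library.

```python
import traceback

def divide(a, b):
    # Dividing by zero raises a ZeroDivisionError
    return a / b

try:
    divide(1, 0)
except ZeroDivisionError as err:
    tb_text = traceback.format_exc()   # the full error message as a string
    error_type = type(err).__name__    # the error type shown on the last line
    error_message = str(err)           # the short description after the colon

print(tb_text)
print("Type:", error_type)
print("Message:", error_message)
```

Reading the printed text from the bottom up, the last line names the error type and message, and the lines above point at the `return a / b` line inside `divide`.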

## Example errors

In [1]:
prant("hello world")

NameError: name 'prant' is not defined

In [2]:
x = 1
y = 2
z = (x + y

SyntaxError: incomplete input (3710527719.py, line 3)

In [3]:
print(math.pi())

NameError: name 'math' is not defined
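The last example actually hides a second error. As a small sketch: once `math` is imported the `NameError` goes away, but `math.pi` is a number rather than a function, so calling it with parentheses raises a `TypeError` instead.

```python
import math

try:
    print(math.pi())      # pi is a number, not a function, so calling it fails
except TypeError as err:
    message = str(err)
    print("TypeError:", message)

print(math.pi)            # without the parentheses this works
```

Fixing one error often reveals the next one. That is normal: fix them one at a time, always starting from the last line of the message.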

# Python Modules

A module in Python is simply a file containing Python definitions and statements (similar to an R library). Putting code into modules is useful because you can import a module's functionality into your script or IPython session. This generally adds functionality that you need for a specific task.

Note: Python modules are different from Franklin modules.

## What a module provides

Modules provide functions, objects, and classes that you can use in your code. For example, the `math` module provides the `sqrt()` function, which computes the square root of a number, and the `pi` object, which holds the value of pi. The `datetime` module provides the `datetime` class, which you can use to create datetime objects.
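As a quick sketch of all three kinds of things a module can provide (the date used here is arbitrary and only for illustration):

```python
import math
import datetime

root = math.sqrt(16)                            # a function from the math module
circle_area = math.pi * 2 ** 2                  # an object (constant) from the math module
meeting = datetime.datetime(2024, 1, 15, 9, 30) # a class from the datetime module

print(root)
print(circle_area)
print(meeting.year, meeting.month, meeting.day)
```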



## How to Import a Module

To use a module in your code, you first need to import it (just like loading an R library). To import a module, use the `import` statement, usually at the top of your code. You can also use the `from ... import` statement to import specific functions or attributes from a given module.

In [4]:
import math



Now you can use the functions and variables defined in the `math` module:



In [5]:
print(math.pi)  # prints: 3.141592653589793

3.141592653589793
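The other import styles look like this. This is only a sketch; the alias `m` is an arbitrary choice for illustration.

```python
import math as m              # import the whole module under a shorter alias
from math import sqrt, pi     # import only specific names from the module

print(m.sqrt(9))              # access through the alias
print(sqrt(9))                # imported names need no prefix
print(pi == m.pi)             # both names refer to the same value
```

Importing specific names saves typing, but the `module.name` form makes it easier to see where a function came from.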




# Numpy

Numpy is a Python library used for working with arrays. It also has functions for linear algebra, Fourier transforms, and matrices.

## How to Import Numpy



In [6]:
import numpy as np

## Creating a Numpy Array

- A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers.
- The number of dimensions is the rank of the array; the shape of an array is a tuple of integers giving the size of the array along each dimension.
- We can initialize numpy arrays from (nested) Python lists, and access elements using square brackets.
- Numpy also provides many functions to create arrays, such as `np.linspace()`, `np.logspace()`, and `np.geomspace()`.



In [7]:
arr = np.array([1, 2, 3, 4, 5])
arr2d = np.array([range(10), range(10, 20)])
arr3d = np.array(range(3**3)).reshape((3, 3, 3))

print("A 1 dimensional array:\n", arr, "\n")
print("A 2 dimensional array:\n", arr2d, "\n")
print("A 3 dimensional array:\n", arr3d, "\n")
print("A transposed 2D array:\n", arr2d.transpose(), "\n")

A 1 dimensional array:
 [1 2 3 4 5] 

A 2 dimensional array:
 [[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]] 

A 3 dimensional array:
 [[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]] 

A transposed 2D array:
 [[ 0 10]
 [ 1 11]
 [ 2 12]
 [ 3 13]
 [ 4 14]
 [ 5 15]
 [ 6 16]
 [ 7 17]
 [ 8 18]
 [ 9 19]] 
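To check the number of dimensions and the shape of an array yourself, you can look at its `ndim` and `shape` attributes. A small sketch using the same arrays as above:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr2d = np.array([range(10), range(10, 20)])
arr3d = np.array(range(3**3)).reshape((3, 3, 3))

for name, a in [("arr", arr), ("arr2d", arr2d), ("arr3d", arr3d)]:
    # ndim is the number of dimensions, shape is the size along each dimension
    print(name, "ndim:", a.ndim, "shape:", a.shape, "size:", a.size)

# Transposing a 2D array swaps its two dimensions
print(arr2d.transpose().shape)
```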



Multidimensional numpy arrays are indexed with a single tuple of positions (for example `arr2d[1, 3]`) rather than with repeated brackets (`arr2d[1][3]`).

In [8]:
print(arr[2])
print(arr2d[1, 3])
print(arr3d[0, 2, 1])

3
13
7
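The same tuple style also works with slices, which lets you pull out whole rows, columns, or blocks. A minimal sketch:

```python
import numpy as np

arr2d = np.array([range(10), range(10, 20)])

first_row = arr2d[0, :]        # every column of row 0
fourth_col = arr2d[:, 3]       # every row of column 3
block = arr2d[:, 2:5]          # all rows, columns 2, 3 and 4

print(first_row)
print(fourth_col)
print(block)
```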


In [9]:
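linear = np.linspace(0.5, 5, num=10)             # 10 evenly spaced values from 0.5 to 5
logarithmic = np.logspace(0, 5, num=6, base=10)  # 10**0 up to 10**5, evenly spaced exponents
geometric = np.geomspace(1, 100000, num=6)       # same values, given the endpoints directly

print(linear)
print(logarithmic)
print(geometric)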

[0.5 1.  1.5 2.  2.5 3.  3.5 4.  4.5 5. ]
[1.e+00 1.e+01 1.e+02 1.e+03 1.e+04 1.e+05]
[1.e+00 1.e+01 1.e+02 1.e+03 1.e+04 1.e+05]
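A few other array-creating functions come up all the time. This sketch shows four of them; the sizes are arbitrary.

```python
import numpy as np

zeros = np.zeros((2, 3))        # 2 x 3 array filled with 0.0
ones = np.ones(4)               # 1D array of four 1.0 values
steps = np.arange(0, 10, 2)     # like range(): start, stop (excluded), step
identity = np.eye(3)            # 3 x 3 identity matrix

print(zeros)
print(ones)
print(steps)
print(identity)
```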




## Array Operations

Note the difference compared to normal Python lists.

- Operations between arrays are applied element by element, i.e. elements are paired by position. The arrays must have compatible shapes; where the shapes allow it, numpy "broadcasts" the smaller array across the larger one.
- Operations between an array and a scalar are applied to every element of the array.

In [10]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print(arr1 + arr2)
print(arr1 + 10)
print(arr2 ** 0)

[5 7 9]
[11 12 13]
[1 1 1]
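For comparison, here is a small sketch of how the same operators behave on plain Python lists versus numpy arrays:

```python
import numpy as np

list1 = [1, 2, 3]
list2 = [4, 5, 6]
arr1 = np.array(list1)
arr2 = np.array(list2)

print(list1 + list2)   # lists: + joins the two lists together
print(arr1 + arr2)     # arrays: + adds element by element
print(list1 * 2)       # lists: * repeats the list
print(arr1 * 2)        # arrays: * multiplies every element
```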


In [11]:
arr3 = np.array([1,2])

arr1 + arr3

ValueError: operands could not be broadcast together with shapes (3,) (2,) 
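Shapes do not have to be identical to be compatible. As a sketch of a case that does work, a 2D array can be combined with a single row or a single column (the array names here are made up for illustration):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

plus_row = grid + row        # row is added to each row of grid
plus_column = grid + column  # column is added to each column of grid

print(plus_row)
print(plus_column)
```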

In [12]:
arr2d @ arr2d

ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 2 is different from 10)

In [13]:
print(arr2d @ arr2d.transpose())

[[ 285  735]
 [ 735 2185]]


# <b> Important </b>
Arrays can be indexed, and values can be assigned by boolean mask.

In [14]:
print(arr)

mask = arr < 3
print(mask)
print(arr[mask])

arr[mask] = arr[mask] * 5
print(arr)

[1 2 3 4 5]
[ True  True False False False]
[1 2]
[ 5 10  3  4  5]
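Masks can also be combined. A minimal sketch: use `&` for "and", `|` for "or", and `~` for "not", and wrap each condition in parentheses.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

middle = (arr > 1) & (arr < 5)     # AND: both conditions must hold
edges = (arr == 1) | (arr == 5)    # OR: either condition may hold
not_middle = ~middle               # NOT: flips every True/False

print(arr[middle])
print(arr[edges])
print(arr[not_middle])
```

The same operators work on pandas Series, which is why this pattern is worth remembering.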


# Pandas

Pandas is a library used for data manipulation and analysis. It is used to read data and store it as tables (DataFrames) or named lists (Series). It builds on numpy arrays, adding names and labeled indices.

- Series == named 1 dimensional numpy array
- DataFrame == named 2 dimensional numpy array, or a collection of Series that share an index
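Here is a rough sketch of that relationship. The values and the index labels `"a"`, `"b"`, `"c"` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two Series that share the same index
ages = pd.Series([22.0, 38.0, 26.0], index=["a", "b", "c"], name="Age")
fares = pd.Series([7.25, 71.2833, 7.925], index=["a", "b", "c"], name="Fare")

# A DataFrame built from those Series: one column per Series
table = pd.DataFrame({"Age": ages, "Fare": fares})

print(ages["b"])                 # look up a value by its label
print(type(ages.to_numpy()))     # the underlying data is a numpy array
print(table)
```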

It can best be compared to either a spreadsheet (Excel) or to data frames or tibbles in R.

## Functionality that Pandas provides

- Reading and writing data
- Selecting subsets of data
- Calculating across rows and columns
- Finding and filling missing data
- Applying functions to data

## How to Import Pandas

In [16]:
import pandas as pd



## Creating a Pandas DataFrame Manually



In [17]:
data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2]
}

purchases = pd.DataFrame(data)

purchases

Unnamed: 0,apples,oranges
0,3,0
1,2,3
2,0,7
3,1,2




## Reading Data from CSV File



In [18]:
df = pd.read_csv('titanic.csv')

print(type(df))  # prints: <class 'pandas.core.frame.DataFrame'>
print(df.head(), "\n" * 4)  # first 5 rows, followed by 4 extra newlines for spacing


# Jupyter notebooks display the value of the last expression in a cell, nicely formatted.
# This is not part of standard Python!
df.head()

<class 'pandas.core.frame.DataFrame'>
   Survived  Pclass                                               Name  \
0         0       3                             Mr. Owen Harris Braund   
1         1       1  Mrs. John Bradley (Florence Briggs Thayer) Cum...   
2         1       3                              Miss. Laina Heikkinen   
3         1       1        Mrs. Jacques Heath (Lily May Peel) Futrelle   
4         0       3                            Mr. William Henry Allen   

      Sex   Age  Siblings/Spouses Aboard  Parents/Children Aboard     Fare  
0    male  22.0                        1                        0   7.2500  
1  female  38.0                        1                        0  71.2833  
2  female  26.0                        0                        0   7.9250  
3  female  35.0                        1                        0  53.1000  
4    male  35.0                        0                        0   8.0500   






Unnamed: 0,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,0,3,Mr. Owen Harris Braund,male,22.0,1,0,7.25
1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833
2,1,3,Miss. Laina Heikkinen,female,26.0,0,0,7.925
3,1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1
4,0,3,Mr. William Henry Allen,male,35.0,0,0,8.05
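If you want to try `pd.read_csv()` without a file on disk, one option is to read from a string instead. This is only a sketch: the three rows below are invented placeholder data, not taken from `titanic.csv`.

```python
import io
import pandas as pd

# Placeholder CSV text: the first line is the header, each later line is one row
csv_text = """Name,Age,Fare
Passenger A,22.0,7.25
Passenger B,38.0,71.2833
Passenger C,26.0,7.925
"""

# io.StringIO makes the string behave like an open file
small_df = pd.read_csv(io.StringIO(csv_text))

print(small_df.shape)
print(small_df.dtypes)
```

Pandas reads the column names from the first line and guesses a type for each column.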


In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 887 entries, 0 to 886
Data columns (total 8 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Survived                 887 non-null    int64  
 1   Pclass                   887 non-null    int64  
 2   Name                     887 non-null    object 
 3   Sex                      887 non-null    object 
 4   Age                      887 non-null    float64
 5   Siblings/Spouses Aboard  887 non-null    int64  
 6   Parents/Children Aboard  887 non-null    int64  
 7   Fare                     887 non-null    float64
dtypes: float64(2), int64(4), object(2)
memory usage: 55.6+ KB


In [20]:
df.describe()

Unnamed: 0,Survived,Pclass,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
count,887.0,887.0,887.0,887.0,887.0,887.0
mean,0.385569,2.305524,29.471443,0.525366,0.383315,32.30542
std,0.487004,0.836662,14.121908,1.104669,0.807466,49.78204
min,0.0,1.0,0.42,0.0,0.0,0.0
25%,0.0,2.0,20.25,0.0,0.0,7.925
50%,0.0,3.0,28.0,0.0,0.0,14.4542
75%,1.0,3.0,38.0,1.0,0.0,31.1375
max,1.0,3.0,80.0,8.0,6.0,512.3292


## Manipulate pandas dataframes

### Selecting Data

A dataframe can be treated as a collection of Series objects, one per column.
Indexing with a column name in square brackets is the quickest way to grab an entire column.

In [21]:
print(df.columns, "\n" * 4)

print(type(df["Name"]))
df["Name"]

Index(['Survived', 'Pclass', 'Name', 'Sex', 'Age', 'Siblings/Spouses Aboard',
       'Parents/Children Aboard', 'Fare'],
      dtype='object') 




<class 'pandas.core.series.Series'>


0                                 Mr. Owen Harris Braund
1      Mrs. John Bradley (Florence Briggs Thayer) Cum...
2                                  Miss. Laina Heikkinen
3            Mrs. Jacques Heath (Lily May Peel) Futrelle
4                                Mr. William Henry Allen
                             ...                        
882                                 Rev. Juozas Montvila
883                          Miss. Margaret Edith Graham
884                       Miss. Catherine Helen Johnston
885                                 Mr. Karl Howell Behr
886                                   Mr. Patrick Dooley
Name: Name, Length: 887, dtype: object

In [22]:
print(type(df[["Name", "Age"]]))
df[["Name", "Age"]]

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,Name,Age
0,Mr. Owen Harris Braund,22.0
1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,38.0
2,Miss. Laina Heikkinen,26.0
3,Mrs. Jacques Heath (Lily May Peel) Futrelle,35.0
4,Mr. William Henry Allen,35.0
...,...,...
882,Rev. Juozas Montvila,27.0
883,Miss. Margaret Edith Graham,19.0
884,Miss. Catherine Helen Johnston,7.0
885,Mr. Karl Howell Behr,26.0


In [23]:
print(type(df[["Name"]]))
df[["Name"]]

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,Name
0,Mr. Owen Harris Braund
1,Mrs. John Bradley (Florence Briggs Thayer) Cum...
2,Miss. Laina Heikkinen
3,Mrs. Jacques Heath (Lily May Peel) Futrelle
4,Mr. William Henry Allen
...,...
882,Rev. Juozas Montvila
883,Miss. Margaret Edith Graham
884,Miss. Catherine Helen Johnston
885,Mr. Karl Howell Behr


Like numpy arrays, dataframes are indexed by tuple. Unlike numpy arrays, you should use the `.loc[]` indexer. Values are accessed by `{row label}, {column label}`. <br><br>
`.iloc[]` performs the same operation but only accepts integer positions for both the rows and the columns.
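The difference is easiest to see when the row labels are not numbers. A minimal sketch with a made-up dataframe whose rows are labeled `"x"`, `"y"`, `"z"`:

```python
import pandas as pd

toy = pd.DataFrame(
    {"Age": [22.0, 38.0, 26.0], "Fare": [7.25, 71.2833, 7.925]},
    index=["x", "y", "z"],
)

by_label = toy.loc["y", "Fare"]      # row label "y", column label "Fare"
by_position = toy.iloc[1, 1]         # second row, second column
label_slice = toy.loc["x":"y"]       # label slices include the end label
position_slice = toy.iloc[0:1]       # position slices exclude the end position

print(by_label, by_position)
print(label_slice)
print(position_slice)
```

Note that slicing by label includes the last label, while slicing by position follows the usual Python rule and stops before the end.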

In [24]:
df.head()

Unnamed: 0,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,0,3,Mr. Owen Harris Braund,male,22.0,1,0,7.25
1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833
2,1,3,Miss. Laina Heikkinen,female,26.0,0,0,7.925
3,1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1
4,0,3,Mr. William Henry Allen,male,35.0,0,0,8.05


In [25]:
df.loc[1]

Survived                                                                   1
Pclass                                                                     1
Name                       Mrs. John Bradley (Florence Briggs Thayer) Cum...
Sex                                                                   female
Age                                                                     38.0
Siblings/Spouses Aboard                                                    1
Parents/Children Aboard                                                    0
Fare                                                                 71.2833
Name: 1, dtype: object

In [26]:
df.loc[[1]]

Unnamed: 0,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833


In [27]:
df.loc[:, "Name"] == df["Name"]

0      True
1      True
2      True
3      True
4      True
       ... 
882    True
883    True
884    True
885    True
886    True
Name: Name, Length: 887, dtype: bool

In [28]:
df.loc[1, "Name"]

'Mrs. John Bradley (Florence Briggs Thayer) Cumings'

In [29]:
df.loc[[85, 86], ["Name", "Fare"]] #slice the 86th and 87th rows with the columns "Name" and "Fare"

Unnamed: 0,Name,Fare
85,Mr. William Neal Ford,34.375
86,Mr. Selman Francis Slocovski,8.05


In [30]:
seniors = df.loc[df["Age"] >= 65, ["Name", "Age"]] #select the "Name" and "Age" columns for rows where "Age" is 65 or older
seniors

Unnamed: 0,Name,Age
33,Mr. Edward H Wheadon,66.0
53,Mr. Engelhart Cornelius Ostby,65.0
95,Mr. George B Goldschmidt,71.0
115,Mr. Patrick Connors,70.5
278,Mr. Frank Duane,65.0
453,Mr. Francis Davis Millet,65.0
490,Mr. Ramon Artagaveytia,71.0
508,Mr. James Webber,66.0
535,Mr. Samuel Beard Risien,69.0
627,Mr. Algernon Henry Wilson Barkworth,80.0


The row selection must come first and the column selection second. It doesn't work in the wrong order:

In [31]:
df.loc[["Name", "Fare"], [85, 86]]

KeyError: "None of [Index(['Name', 'Fare'], dtype='object')] are in the [index]"


### Note

Most operations are not "inplace" by default. The original value isn't changed, and you need to capture the output by either overwriting the old variable or assigning it to a new one. <br><br>
This allows manipulations to be chained, similar to piping in R.
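A small sketch of what that means in practice, using a made-up three-row dataframe:

```python
import pandas as pd

toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [30, 10, 20],
    "Fare": [5.0, 7.5, 6.0],
})

toy.drop("Fare", axis="columns")            # result is thrown away; toy is unchanged
print(list(toy.columns))

trimmed = toy.drop("Fare", axis="columns")  # capture the result in a new variable
print(list(trimmed.columns))

# Chaining: each method returns a new dataframe that the next method works on
youngest_first = (
    toy.drop("Fare", axis="columns")
       .sort_values("Age")
       .reset_index(drop=True)
)
print(youngest_first)
```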

In [32]:
df.head(10) #Peek at the first 10 rows

df.drop('Fare', axis = "columns") #Drop the column called "Fare"

df.sort_values('Age', ascending=False) #Sort the whole dataframe by the values in the "Age" column, largest first

df.iloc[[0, 1, 2, 3, 4, 5], [0, 3, 5]] #Slice out the first 6 rows with the 1st, 4th and 6th columns

df.sort_values('Age', ascending=False).loc[0:10, df.columns.difference(["Sex", "Fare"])] #Sort by age, slice from row label 0 to row label 10 (labels, not positions!), and keep all columns besides "Sex" and "Fare"

Unnamed: 0,Age,Name,Parents/Children Aboard,Pclass,Siblings/Spouses Aboard,Survived
0,22.0,Mr. Owen Harris Braund,0,3,1,0
285,22.0,Mr. Penko Naidenoff,0,3,0,0
411,22.0,Mr. Alfred Fleming Cunningham,0,2,0,0
536,22.0,Miss. Hedwig Margaritha Frolicher,2,1,0,1
366,22.0,Miss. Annie Jermyn,0,3,0,1
...,...,...,...,...,...,...
232,5.0,Miss. Lillian Gertrud Asplund,2,3,4,1
158,5.0,Master. Thomas Henry Sage,2,3,8,0
773,5.0,Miss. Virginia Ethel Emanuel,0,3,0,1
62,4.0,Master. Harald Skoog,2,3,3,0
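Be careful when slicing rows after a sort: `.loc[]` slices by row label, and the labels travel with their rows when the dataframe is sorted. If you want "the first N rows" of the sorted result, `.iloc[]` or `.head()` is the safer choice. A minimal sketch with a made-up dataframe:

```python
import pandas as pd

toy = pd.DataFrame({"Age": [22, 80, 5, 40]})      # row labels are 0, 1, 2, 3
by_age = toy.sort_values("Age", ascending=False)  # label order is now 1, 3, 0, 2

label_slice = by_age.loc[0:2]      # from label 0 to label 2, in the sorted order
first_two = by_age.iloc[0:2]       # the first two rows, whatever their labels

print(label_slice)
print(first_two)
```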


- `.melt()` is used to convert "wide" data into "long" data
- `.pivot()` is used to convert "long" data into "wide" data

In [33]:
short_list = df.head()
mlt_list = short_list.melt(id_vars= "Name", value_vars= ["Age", "Fare"])
pvt_list = mlt_list.pivot(index= "Name", columns= "variable", values= "value")
short_list

Unnamed: 0,Survived,Pclass,Name,Sex,Age,Siblings/Spouses Aboard,Parents/Children Aboard,Fare
0,0,3,Mr. Owen Harris Braund,male,22.0,1,0,7.25
1,1,1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,female,38.0,1,0,71.2833
2,1,3,Miss. Laina Heikkinen,female,26.0,0,0,7.925
3,1,1,Mrs. Jacques Heath (Lily May Peel) Futrelle,female,35.0,1,0,53.1
4,0,3,Mr. William Henry Allen,male,35.0,0,0,8.05


In [34]:
mlt_list

Unnamed: 0,Name,variable,value
0,Mr. Owen Harris Braund,Age,22.0
1,Mrs. John Bradley (Florence Briggs Thayer) Cum...,Age,38.0
2,Miss. Laina Heikkinen,Age,26.0
3,Mrs. Jacques Heath (Lily May Peel) Futrelle,Age,35.0
4,Mr. William Henry Allen,Age,35.0
5,Mr. Owen Harris Braund,Fare,7.25
6,Mrs. John Bradley (Florence Briggs Thayer) Cum...,Fare,71.2833
7,Miss. Laina Heikkinen,Fare,7.925
8,Mrs. Jacques Heath (Lily May Peel) Futrelle,Fare,53.1
9,Mr. William Henry Allen,Fare,8.05


### Note

After our pivot, the `Name` column has become the index.
- Those values can be used to select rows with `.loc[]`
- Those values are <b> NOT </b> a `column`
- This can be undone with the `.reset_index()` method, which turns the index back into a column

In [35]:
pvt_list

variable,Age,Fare
Name,Unnamed: 1_level_1,Unnamed: 2_level_1
Miss. Laina Heikkinen,26.0,7.925
Mr. Owen Harris Braund,22.0,7.25
Mr. William Henry Allen,35.0,8.05
Mrs. Jacques Heath (Lily May Peel) Futrelle,35.0,53.1
Mrs. John Bradley (Florence Briggs Thayer) Cumings,38.0,71.2833


In [36]:
pvt_list.loc[0, "Age"]

KeyError: 0

In [37]:
pvt_list.loc["Miss. Laina Heikkinen", "Age"] #"Miss. Laina Heikkinen" is the index value, "Age" is the column value

26.0

In [38]:
print(pvt_list.index, "\n")
print(pvt_list.columns)

Index(['Miss. Laina Heikkinen', 'Mr. Owen Harris Braund',
       'Mr. William Henry Allen',
       'Mrs. Jacques Heath (Lily May Peel) Futrelle',
       'Mrs. John Bradley (Florence Briggs Thayer) Cumings'],
      dtype='object', name='Name') 

Index(['Age', 'Fare'], dtype='object', name='variable')


In [39]:
pvt_list["Name"]

KeyError: 'Name'

In [40]:
pvt_list.reset_index()

variable,Name,Age,Fare
0,Miss. Laina Heikkinen,26.0,7.925
1,Mr. Owen Harris Braund,22.0,7.25
2,Mr. William Henry Allen,35.0,8.05
3,Mrs. Jacques Heath (Lily May Peel) Futrelle,35.0,53.1
4,Mrs. John Bradley (Florence Briggs Thayer) Cum...,38.0,71.2833


# How to do things other than math on elements of a DataFrame or Series

In [41]:
df["Siblings/Spouses Aboard"]

0      1
1      1
2      0
3      1
4      0
      ..
882    0
883    0
884    1
885    0
886    0
Name: Siblings/Spouses Aboard, Length: 887, dtype: int64

In [42]:
float(df["Siblings/Spouses Aboard"])

TypeError: cannot convert the series to <class 'float'>

# `.apply()`

Given a function that accepts one argument, `.apply()` calls that function on each element of the Series. The result is a new Series of the returned values. <br><br>
Here is a bad example: it works, but pandas has a dedicated method for this particular task.

In [43]:
?float

Init signature: float(x=0, /)
Docstring:      Convert a string or number to a floating point number, if possible.
Type:           type
Subclasses:     float64

In [44]:
df["Siblings/Spouses Aboard"].apply(float)

0      1.0
1      1.0
2      0.0
3      1.0
4      0.0
      ... 
882    0.0
883    0.0
884    1.0
885    0.0
886    0.0
Name: Siblings/Spouses Aboard, Length: 887, dtype: float64

The correct way to convert the type of a Series is the `.astype()` method:

In [45]:
df["Siblings/Spouses Aboard"].astype("float")

0      1.0
1      1.0
2      0.0
3      1.0
4      0.0
      ... 
882    0.0
883    0.0
884    1.0
885    0.0
886    0.0
Name: Siblings/Spouses Aboard, Length: 887, dtype: float64
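`.apply()` is more at home with your own functions. As a sketch, the helper `get_title` below is made up for illustration and assumes the title is always the first word of the name, as it is in the rows shown earlier.

```python
import pandas as pd

names = pd.Series([
    "Mr. Owen Harris Braund",
    "Miss. Laina Heikkinen",
    "Mrs. Jacques Heath (Lily May Peel) Futrelle",
])

def get_title(full_name):
    # Assumption: the title is the first word of the name, e.g. "Mr."
    return full_name.split()[0]

titles = names.apply(get_title)     # call get_title on every element
name_lengths = names.apply(len)     # built-in functions work too

print(titles)
print(name_lengths)
```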

## `.groupby()`

Splits the dataframe into subsets, where all rows in a subset share the same value for the given group column(s). <br>
The result is used much like a single DataFrame, but each operation is applied to every subset separately and the results are combined, with one entry per group.

In [46]:
df.groupby("Sex")["Age"].describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
Sex,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
female,314.0,27.719745,13.83474,0.75,18.0,27.0,36.0,63.0
male,573.0,30.431361,14.197273,0.42,21.0,28.0,38.0,80.0
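To see the subsets directly, you can loop over a grouped dataframe. This is a minimal sketch on a made-up four-row dataframe, not the Titanic data:

```python
import pandas as pd

toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0],
})

groups = toy.groupby("Sex")

# Each step of the loop gives the group value and the matching subset of rows
for key, subset in groups:
    print(key, subset["Age"].tolist())

# Calling a method on the grouped column gives one result per group
mean_age = groups["Age"].mean()
print(mean_age)
```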


In [48]:
df.groupby("Sex").apply(print)

     Survived  Pclass                                               Name  \
1           1       1  Mrs. John Bradley (Florence Briggs Thayer) Cum...   
2           1       3                              Miss. Laina Heikkinen   
3           1       1        Mrs. Jacques Heath (Lily May Peel) Futrelle   
8           1       3   Mrs. Oscar W (Elisabeth Vilhelmina Berg) Johnson   
9           1       2                 Mrs. Nicholas (Adele Achem) Nasser   
..        ...     ...                                                ...   
876         1       2        Mrs. William (Imanita Parrish Hall) Shelley   
878         0       3                        Miss. Gerda Ulrika Dahlberg   
881         0       3                Mrs. William (Margaret Norton) Rice   
883         1       1                        Miss. Margaret Edith Graham   
884         0       3                     Miss. Catherine Helen Johnston   

        Sex   Age  Siblings/Spouses Aboard  Parents/Children Aboard     Fare  
1    fem

In [49]:
df.groupby(["Sex", "Pclass"]).apply(print)

     Survived  Pclass                                               Name  \
1           1       1  Mrs. John Bradley (Florence Briggs Thayer) Cum...   
3           1       1        Mrs. Jacques Heath (Lily May Peel) Futrelle   
11          1       1                            Miss. Elizabeth Bonnell   
31          1       1      Mrs. William Augustus (Marie Eugenie) Spencer   
51          1       1            Mrs. Henry Sleeper (Myna Haxtun) Harper   
..        ...     ...                                                ...   
852         1       1          Mrs. George Dennick (Mary Hitchcock) Wick   
858         1       1  Mrs. Frederick Joel (Margaret Welles Barron) S...   
867         1       1    Mrs. Richard Leonard (Sallie Monypeny) Beckwith   
875         1       1       Mrs. Thomas Jr (Lily Alexenia Wilson) Potter   
883         1       1                        Miss. Margaret Edith Graham   

        Sex   Age  Siblings/Spouses Aboard  Parents/Children Aboard      Fare  
1    fe

# Activity

- Generate descriptive statistics of the `Fare` that passengers paid in each passenger class (`Pclass`)
- Who paid the most in each class?
- Who paid the least in each class?

In [50]:
#Your code goes here


In [51]:
df.groupby("Pclass")["Fare"].describe()

Unnamed: 0_level_0,count,mean,std,min,25%,50%,75%,max
Pclass,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
1,216.0,84.154687,78.380373,0.0,30.92395,60.2875,93.5,512.3292
2,184.0,20.662183,13.417399,0.0,13.0,14.25,26.0,73.5
3,487.0,13.707707,11.817309,0.0,7.75,8.05,15.5,69.55


In [52]:
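# Named min_fare/max_fare so that Python's built-in min() and max() are not overwritten
min_fare = df.groupby("Pclass")["Fare"].min()
max_fare = df.groupby("Pclass")["Fare"].max()

stats = {
    "min": min_fare,
    "max": max_fare
}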

for st_name, stat in stats.items():
    for pclass in df["Pclass"].unique():
        print(
            f"The {st_name} fare in class {pclass} was {stat[pclass]}, paid by:"
            )

        print(
            #get column "Name" from rows where "Pclass" == pclass AND "Fare" is the appropriate statistic for that class
            df.loc[(df["Pclass"] == pclass) & (df["Fare"] == stat[pclass]), "Name"].to_string(),
            "\n\n"
            )

The min fare in class 3 was 0.0, paid by:
178                Mr. Lionel Leonard
269       Mr. William Henry Tornquist
300    Mr. William Cahoone Jr Johnson
594                Mr. Alfred Johnson 


The min fare in class 1 was 0.0, paid by:
261              Mr. William Harrison
630      Mr. William Henry Marsh Parr
802             Mr. Thomas Jr Andrews
811                   Mr. Richard Fry
818    Jonkheer. John George Reuchlin 


The min fare in class 2 was 0.0, paid by:
275               Mr. Francis Parkes
411    Mr. Alfred Fleming Cunningham
463             Mr. William Campbell
478           Mr. Anthony Wood Frost
671        Mr. Ennis Hastings Watson
728              Mr. Robert J Knight 


The max fare in class 3 was 69.55, paid by:
158      Master. Thomas Henry Sage
179    Miss. Constance Gladys Sage
200             Mr. Frederick Sage
322        Mr. George John Jr Sage
788         Miss. Stella Anna Sage
842        Mr. Douglas Bullen Sage
859       Miss. Dorothy Edith Sage 


The max f
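A shorter alternative, sketched here on made-up data, is `.idxmax()`, which returns the row label of the largest value in each group. One caveat: it returns only the first matching row, so unlike the loop above it will not list everyone when several passengers are tied.

```python
import pandas as pd

toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Pclass": [1, 1, 2, 2],
    "Fare": [80.0, 30.0, 13.0, 26.0],
})

# Row label of the largest Fare within each class
top_rows = toy.groupby("Pclass")["Fare"].idxmax()

# Use those labels to look up the full rows
top_payers = toy.loc[top_rows, ["Pclass", "Name", "Fare"]]
print(top_payers)
```

`.idxmin()` works the same way for the smallest value.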

## A very brief introduction to installing modules:

We have both numpy and pandas pre-installed for you, but if you need to use a module you don't have, you may first need to install it using pip.

You can also install modules using conda.

We will go into more detail about how to install modules on Friday.


In [None]:
# You don't need to actually run this. This is just an example.
# pip install numpy pandas
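If you are not sure whether a module is already installed, you can ask Python before reaching for pip. A minimal sketch using the standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util

def is_installed(module_name):
    # find_spec returns None when Python cannot locate a top-level module
    return importlib.util.find_spec(module_name) is not None

print(is_installed("math"))
print(is_installed("a_module_that_does_not_exist"))
```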