# <br><br><span style="color:purple">Python Bootcamp 4 Part 1</span>

# pandas dataframes (I)

- Pandas is one of the most commonly used Python packages/libraries for data science. It was developed by Wes McKinney in January 2008. <br><br>
- Pandas is Python's answer to two-dimensional tables (like those in Excel).<br><br>
- Pandas calls a table a "DataFrame".<br><br>
- A DataFrame is a data structure that organizes data into a two-dimensional table of rows and columns, much like a spreadsheet.<br><br>
- Pandas DataFrames are also used by many other Python packages for statistical analysis, data manipulation, and data visualization.<br><br>
- Pandas DataFrames can be exported to .csv and other file formats.<br><br>
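
As a minimal sketch of the CSV round trip, the example below writes a small DataFrame to CSV text and reads it back. It uses `io.StringIO` as an in-memory stand-in for a real file (an assumption made so the snippet needs no files on disk); the two rows reuse values from the lake example later in this notebook.

```python
import io

import pandas as pd

# Two rows borrowed from the lake data used later in this notebook
df = pd.DataFrame({'Lake': ['Erie', 'Huron'], 'Max Depth (m)': [64, 229]})

# With no file path, to_csv() returns the CSV text as a string;
# index=False leaves out the row labels
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv() accepts a path or any file-like object, so StringIO works here
df_again = pd.read_csv(io.StringIO(csv_text))
print(df_again)
```

With a real file you would pass a path instead, e.g. `df.to_csv('lakes.csv', index=False)` (the filename is hypothetical).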

## About these two parts

These two parts are designed as an introduction to using the package known as `pandas`.  

By the end of these two parts, you should be able to:

- read a CSV file into a DataFrame
- filter by columns
- run some basic statistics on that DataFrame
- graph the data using a second package called `seaborn`

## Introduction to Pandas

[Pandas](http://pandas.pydata.org/) is the essential data analysis library for Python programmers. It provides fast and flexible data structures built on top of [numpy](http://www.numpy.org/), the fundamental package for scientific computing in Python, which supports mathematical and logical operations, shape manipulation, linear algebra, basic statistical operations, and more.

It is well suited to handle "tabular" data (that might be found in a spreadsheet), time series data, or pretty much anything you care to put in a matrix with rows and named columns.

<font color="red">It contains two primary data structures, the `Series` (1-dimensional) and the `DataFrame` (2-dimensional) as well as a host of convenience methods for loading and working with data.</font>

The key idea behind pandas is that all data is *intrinsically aligned*. Each data structure, whether a `DataFrame` or a `Series`, has an **Index** that links every data value with a label. That link stays in place unless you explicitly break or change it, and it is what allows pandas to quickly and efficiently "do the right thing" when working with data.
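
A minimal sketch of what alignment means in practice (the values are made up): the two `Series` below list their labels in different orders, and one label appears in only one of them. Arithmetic matches values by label, not by position, and a label missing from either side produces `NaN`.

```python
import pandas as pd

# Same labels in a different order; 'c' exists only in s1
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([20, 10], index=['b', 'a'])

# Matched by label: a -> 1 + 10, b -> 2 + 20, c has no partner -> NaN
total = s1 + s2
print(total)
```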

In [1]:
!pip install pandas






### import pandas

Because pandas is one of the most commonly used Python packages, it is usually imported under a shortened version of its actual name. This makes it quicker to type.

In [2]:
import pandas as pd

In [3]:
import numpy as np

## The `Series` Object

A `Series` is a one-dimensional array of labeled data, capable of holding data of any type (integer, string, float, Python objects, etc.)

In [4]:
data = pd.Series([0.1, 0.2, 0.3, 0.4])

In [5]:
data

0    0.1
1    0.2
2    0.3
3    0.4
dtype: float64

$\uparrow$ `dtype: float64` means the values are stored as 64-bit floating-point numbers, the same precision as a Python `float`.

In [6]:
type(data)

pandas.core.series.Series

The `Series` wraps a 1-d `ndarray` from numpy and an `Index` object.

In [7]:
print(data.values)

[0.1 0.2 0.3 0.4]


In [8]:
type(data.values)

numpy.ndarray

In [9]:
data.index

RangeIndex(start=0, stop=4, step=1)

In [10]:
# This particular index type, the `RangeIndex`, lets us use the
# same square-bracket notation as an `ndarray` to access elements:
data[0]

0.1

In [11]:
data.values[0]

0.1

In [12]:
# or even a slice:
data[1:3]

1    0.2
2    0.3
dtype: float64

In [13]:
data.values[1:3]

array([0.2, 0.3])

We don't have to use this auto-generated list of integers as the index though. Index values can be specified manually and don't even have to be integers.

In [14]:
data = pd.Series([0.1, 0.2, 0.3, 0.4], index=['a', 'b', 'c', 'd'])

In [15]:
data

a    0.1
b    0.2
c    0.3
d    0.4
dtype: float64

In [16]:
data.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [17]:
# Item access works just like before, with square brackets, 
# even though the index values are strings
data['a']

0.1

In [18]:
# Once you have string labels, you can also use attribute access
# (only if the label is a valid Python name, e.g. no spaces, and doesn't clash with a Series method)
data.a

0.1

In [19]:
# Slices still work! But note that the last element is included this time:
# slicing by label includes both endpoints.
data['a':'d']

a    0.1
b    0.2
c    0.3
d    0.4
dtype: float64

In [20]:
# We could create a non-sequential integer index:
data = pd.Series([0.1, 0.2, 0.3, 0.4], index=[5, 8, 2, 1])
data

5    0.1
8    0.2
2    0.3
1    0.4
dtype: float64

In [21]:
data.index

Index([5, 8, 2, 1], dtype='int64')

In [22]:
data[1]

0.4

In [23]:
# Why?
data.values[1]

0.2

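Remember that `data.values` (an attribute, not a method) returns the underlying numpy array. Indexing that array follows numpy's rules, which are based on position, not pandas' rules, which are based on the index labels. So `data[1]` looks up the *label* `1` (value `0.4`), while `data.values[1]` returns the element at *position* 1 (value `0.2`).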

`Series` are in fact a cross between a numpy array and a Python dictionary. You can think of them as a dictionary with *typed* keys and *typed* values.
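
To make the dictionary analogy concrete, here is a short sketch of a few dictionary-style operations that also work on a `Series`. Note that `in` checks the index labels, not the values. `'Baikal'` is simply a label that is not in the Series.

```python
import pandas as pd

depths = pd.Series({'Erie': 64, 'Huron': 229})

# Membership tests look at the index labels, just like dict keys
print('Erie' in depths)   # True
print(64 in depths)       # False: 64 is a value, not a label

# .get() returns a default instead of raising an error for a missing label
print(depths.get('Baikal', 'unknown'))

# .items() yields (label, value) pairs, like dict.items()
for lake, depth in depths.items():
    print(lake, depth)
```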

In [24]:
# in fact it is easy to convert a dictionary into a series
max_depths_dict = {
    'Erie': 64,
    'Huron': 229,
    'Michigan': 281,
    'Ontario': 244,
    'Superior': 406,
}

In [25]:
max_depths_dict

{'Erie': 64, 'Huron': 229, 'Michigan': 281, 'Ontario': 244, 'Superior': 406}

In [26]:
type(max_depths_dict)

dict

In [27]:
max_depths = pd.Series(max_depths_dict)

In [28]:
max_depths

Erie         64
Huron       229
Michigan    281
Ontario     244
Superior    406
dtype: int64

In [29]:
type(max_depths)

pandas.core.series.Series

In [30]:
# it looks like a dictionary!
max_depths['Michigan']

281

In [31]:
max_depths_dict['Michigan']

281

In [32]:
# Notice the index in this case was constructed automatically from the dictionary keys.
max_depths.index

Index(['Erie', 'Huron', 'Michigan', 'Ontario', 'Superior'], dtype='object')

## Numpy and `Series`

Because the values in a `Series` are contained in a numpy `ndarray`, `Series` provides all the benefits of numpy! Namely, this means we get ultra-fast vectorized math operations on the elements of a `Series`.

In [33]:
max_depths * 10

Erie         640
Huron       2290
Michigan    2810
Ontario     2440
Superior    4060
dtype: int64

You can use most numpy functions directly on a `Series` object (and later, we'll see `DataFrame` objects as well), but pandas also provides access to these numpy functions through the `Series` object methods.

In [34]:
np.sin(max_depths)

Erie        0.920026
Huron       0.329962
Michigan   -0.985151
Ontario    -0.864536
Superior   -0.670252
dtype: float64

In [35]:
np.mean(max_depths)

244.8

In [36]:
max_depths.mean()

244.8

In [37]:
#and if you are lazy and just want a bunch of standard stats
max_depths.describe()

# How can we keep only two digits after the decimal point?

count      5.000000
mean     244.800000
std      122.713895
min       64.000000
25%      229.000000
50%      244.000000
75%      281.000000
max      406.000000
dtype: float64
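
One way to answer the question in the comment above is to chain `.round(2)` onto the result, since `describe()` on a numeric `Series` returns an ordinary `Series`. The snippet re-creates `max_depths` so it runs on its own.

```python
import pandas as pd

max_depths = pd.Series({'Erie': 64, 'Huron': 229, 'Michigan': 281,
                        'Ontario': 244, 'Superior': 406})

# describe() returns a Series of summary statistics; round(2) keeps two decimals
summary = max_depths.describe().round(2)
print(summary)
```

If you only want to change how numbers are *displayed*, not the stored values, `pd.set_option('display.precision', 2)` is another option.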

`Series` objects also support Boolean mask indexing (also called Boolean indexing, a feature shared with NumPy arrays that filters values by a condition):

In [38]:
max_depths[max_depths > max_depths.mean()]  # Pass a condition inside the indexing brackets []; it can be any comparison.

Michigan    281
Superior    406
dtype: int64
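
Masks can be combined with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses, because these operators bind more tightly than `>` and `<`; Python's `and`/`or` keywords raise a `ValueError` on a `Series`. A sketch that re-creates `max_depths`:

```python
import pandas as pd

max_depths = pd.Series({'Erie': 64, 'Huron': 229, 'Michigan': 281,
                        'Ontario': 244, 'Superior': 406})

# Lakes deeper than 200 m AND shallower than 300 m
mid_depth = max_depths[(max_depths > 200) & (max_depths < 300)]
print(mid_depth)

# ~ inverts a mask: every lake that is NOT deeper than 200 m
shallow = max_depths[~(max_depths > 200)]
print(shallow)
```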

And so-called "fancy indexing", i.e. using a list or array to specify values to access:

In [39]:
max_depths

Erie         64
Huron       229
Michigan    281
Ontario     244
Superior    406
dtype: int64

In [40]:
max_depths[['Erie', 'Ontario']]

Erie        64
Ontario    244
dtype: int64

In [41]:
max_depths['Erie':'Ontario']

Erie         64
Huron       229
Michigan    281
Ontario     244
dtype: int64

In `max_depths[['Erie', 'Ontario']]`, the **inner square brackets** create a **list** of labels, and the **outer square brackets** are the **indexing** operator. The same double-bracket pattern is used to select several columns from a DataFrame.

## The DataFrame Object

Much like the `Series` is a one-dimensional array of indexed data, a `DataFrame` is a two-dimensional array of indexed data.

You can think of a `DataFrame` as a sequence of `Series` objects <font color="red">all sharing the same index.</font>

In [42]:
avg_depths_dict = {
    'Erie': 19,
    'Huron': 59,
    'Michigan': 85,
    'Ontario': 86,
    'Superior': 149,
}

In [43]:
avg_depths = pd.Series(avg_depths_dict)

In [44]:
avg_depths

Erie         19
Huron        59
Michigan     85
Ontario      86
Superior    149
dtype: int64

In [45]:
# We've already created this series:
max_depths

Erie         64
Huron       229
Michigan    281
Ontario     244
Superior    406
dtype: int64

In [46]:
lakes = pd.DataFrame({'Max Depth (m)': max_depths, 'Avg Depth (m)': avg_depths})
# Or pd.DataFrame({'Max Depth (m)': max_depths_dict, 'Avg Depth (m)': avg_depths_dict})

In [47]:
lakes

| | Max Depth (m) | Avg Depth (m) |
|---|---|---|
| Erie | 64 | 19 |
| Huron | 229 | 59 |
| Michigan | 281 | 85 |
| Ontario | 244 | 86 |
| Superior | 406 | 149 |


Just like the `Series`, a `DataFrame` has an `index` property.

In [48]:
lakes.index

Index(['Erie', 'Huron', 'Michigan', 'Ontario', 'Superior'], dtype='object')

And a `values` property that exposes the underlying `ndarray`.

In [49]:
lakes.values

array([[ 64,  19],
       [229,  59],
       [281,  85],
       [244,  86],
       [406, 149]], dtype=int64)

And unlike the Series, the DataFrame has a `columns` property, which is also an index.

In [50]:
lakes.columns

Index(['Max Depth (m)', 'Avg Depth (m)'], dtype='object')

We can get the shape of a dataframe, just like a numpy ndarray:

In [51]:
lakes.shape

(5, 2)

We can do dictionary-style lookups into the dataframe by column name to get a single `Series`:

In [52]:
lakes['Max Depth (m)']

Erie         64
Huron       229
Michigan    281
Ontario     244
Superior    406
Name: Max Depth (m), dtype: int64

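To select more than one column, put a list of column names inside the dictionary-style square brackets:

In [53]:
lakes[['Max Depth (m)', 'Avg Depth (m)']]

| | Max Depth (m) | Avg Depth (m) |
|---|---|---|
| Erie | 64 | 19 |
| Huron | 229 | 59 |
| Michigan | 281 | 85 |
| Ontario | 244 | 86 |
| Superior | 406 | 149 |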


For `df[[colname(s)]]`, the **inner square brackets** create a **list** of column names, and the **outer square brackets** are the **indexing** operator. That is why you must use double brackets to select two or more columns.

With a single column name, a single pair of brackets returns a **Series**, while double brackets return a **DataFrame**.
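
A quick way to check the difference is to look at the type of each result. The sketch below rebuilds a two-row version of `lakes` so it runs on its own.

```python
import pandas as pd

lakes = pd.DataFrame({'Max Depth (m)': [64, 229], 'Avg Depth (m)': [19, 59]},
                     index=['Erie', 'Huron'])

one_col = lakes['Max Depth (m)']                      # single brackets -> Series
one_col_df = lakes[['Max Depth (m)']]                 # list with one name -> DataFrame
two_cols = lakes[['Max Depth (m)', 'Avg Depth (m)']]  # list with two names -> DataFrame

print(type(one_col).__name__)     # Series
print(type(one_col_df).__name__)  # DataFrame
print(two_cols.shape)             # (2, 2)
```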

In [54]:
lakes

| | Max Depth (m) | Avg Depth (m) |
|---|---|---|
| Erie | 64 | 19 |
| Huron | 229 | 59 |
| Michigan | 281 | 85 |
| Ontario | 244 | 86 |
| Superior | 406 | 149 |


### Creating new columns

Once we have a `DataFrame`, creating new columns is done through simple assignment.

In [55]:
surface_area = pd.Series({
    'Superior': 82097,
    'Michigan': 57753,
    'Huron': 59565,
    'Erie': 25655,
    'Ontario': 19009,
})

In [56]:
lakes['Surface Area (sq km)'] = surface_area

In [57]:
lakes

| | Max Depth (m) | Avg Depth (m) | Surface Area (sq km) |
|---|---|---|---|
| Erie | 64 | 19 | 25655 |
| Huron | 229 | 59 | 59565 |
| Michigan | 281 | 85 | 57753 |
| Ontario | 244 | 86 | 19009 |
| Superior | 406 | 149 | 82097 |


Notice how the index values allowed pandas to "align" the new data with the existing data!
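
Alignment also means that a label missing from the new `Series` does not shift the other values; that row simply gets `NaN`. A small sketch with made-up labels and numbers:

```python
import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3]}, index=['x', 'y', 'z'])

# The new Series is in a different order and has no entry for 'y'
extra = pd.Series({'z': 30, 'x': 10})
df['extra'] = extra

# 'x' gets 10, 'z' gets 30, 'y' gets NaN (so the column becomes float64)
print(df)
```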

It's also possible to create new columns from existing columns. Say, for example, we wanted a column to track the difference between the max depth and the avg depth. We'll call this the "depth spread".

In [58]:
lakes['Depth Spread'] = lakes['Max Depth (m)'] - lakes['Avg Depth (m)']

In [59]:
lakes

| | Max Depth (m) | Avg Depth (m) | Surface Area (sq km) | Depth Spread |
|---|---|---|---|---|
| Erie | 64 | 19 | 25655 | 45 |
| Huron | 229 | 59 | 59565 | 170 |
| Michigan | 281 | 85 | 57753 | 196 |
| Ontario | 244 | 86 | 19009 | 158 |
| Superior | 406 | 149 | 82097 | 257 |


DataFrames can be created from many different kinds of data structures (`Series` objects, lists, dictionaries, numpy arrays, etc.)

If you don't specify an index explicitly when creating the DataFrame, and the input data has no index of its own (plain Python lists, for example), pandas creates a `RangeIndex` for you:

In [60]:
call_signs = ['WLUW', 'WNUR', 'WBEZ', 'WXRT', 'WFMT']

In [61]:
type(call_signs)

list

In [62]:
frequencies = [88.7, 89.3333, 91.5, 93.1, 98.7]

In [63]:
formats = ['College', 'College', 'Public Radio', 'Adult Album Alternative', 'Classical']

In [64]:
radio_station_df = pd.DataFrame({'Call Sign': call_signs, 'Frequency': frequencies, 'Format': formats})

In [65]:
radio_station_df

| | Call Sign | Frequency | Format |
|---|---|---|---|
| 0 | WLUW | 88.7 | College |
| 1 | WNUR | 89.3333 | College |
| 2 | WBEZ | 91.5 | Public Radio |
| 3 | WXRT | 93.1 | Adult Album Alternative |
| 4 | WFMT | 98.7 | Classical |


In [66]:
radio_station_df[['Frequency']].round(1)

| | Frequency |
|---|---|
| 0 | 88.7 |
| 1 | 89.3 |
| 2 | 91.5 |
| 3 | 93.1 |
| 4 | 98.7 |


### Setting the index

You may want to "move" one of the columns to be the index. You can do this with the DataFrame's `set_index` method. By default this returns a new DataFrame with the index replaced with the values in the chosen column.

Passing `inplace=True` makes the change to the <font color="red">existing</font> DataFrame rather than returning a new one.
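
The two styles side by side, as a sketch on a tiny two-row stand-in for `radio_station_df`: both end with `'Call Sign'` as the index, but only `inplace=True` changes the original object.

```python
import pandas as pd

# Two rows taken from the radio station data used in this notebook
stations = pd.DataFrame({'Call Sign': ['WLUW', 'WBEZ'], 'Frequency': [88.7, 91.5]})

# Default: set_index returns a NEW DataFrame and leaves `stations` unchanged
indexed = stations.set_index('Call Sign')
print(stations.columns.tolist())  # ['Call Sign', 'Frequency']
print(indexed.index.tolist())     # ['WLUW', 'WBEZ']

# inplace=True changes `stations` itself and returns None
result = stations.set_index('Call Sign', inplace=True)
print(result)                     # None
print(stations.index.tolist())    # ['WLUW', 'WBEZ']
```

Reassigning the result (`stations = stations.set_index('Call Sign')`) reaches the same end state without `inplace`.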

In [67]:
radio_station_df.set_index('Call Sign', inplace=True)
radio_station_df

| Call Sign | Frequency | Format |
|---|---|---|
| WLUW | 88.7 | College |
| WNUR | 89.3333 | College |
| WBEZ | 91.5 | Public Radio |
| WXRT | 93.1 | Adult Album Alternative |
| WFMT | 98.7 | Classical |


In [68]:
# If you want, you can remove the name of the index ('Call Sign')
radio_station_df.index.name = None
radio_station_df

| | Frequency | Format |
|---|---|---|
| WLUW | 88.7 | College |
| WNUR | 89.3333 | College |
| WBEZ | 91.5 | Public Radio |
| WXRT | 93.1 | Adult Album Alternative |
| WFMT | 98.7 | Classical |


It is possible to move the index back into a column with the `reset_index` method. Because the index had no name at this point, the new column is simply called `index`:

In [69]:
radio_station_df.reset_index(inplace=True)
radio_station_df

| | index | Frequency | Format |
|---|---|---|---|
| 0 | WLUW | 88.7 | College |
| 1 | WNUR | 89.3333 | College |
| 2 | WBEZ | 91.5 | Public Radio |
| 3 | WXRT | 93.1 | Adult Album Alternative |
| 4 | WFMT | 98.7 | Classical |


## Data Indexing and Selection

Now that we can load data into pandas objects, we need to be able to access it. Pandas offers a variety of methods for accessing the data we need.

First, both `Series` and `DataFrame` objects support dictionary-style access with square brackets. Think of index label values as dictionary keys:

In [70]:
# We saw this above -- access a series like a dictionary to get a single value.
#avg_depths
avg_depths['Michigan']

85

In [71]:
# DataFrame dictionary-style access returns the Series with that column index label:
lakes['Avg Depth (m)']

Erie         19
Huron        59
Michigan     85
Ontario      86
Superior    149
Name: Avg Depth (m), dtype: int64

Boolean masking and fancy indexing work with DataFrames, just like Series objects:

In [72]:
lakes

| | Max Depth (m) | Avg Depth (m) | Surface Area (sq km) | Depth Spread |
|---|---|---|---|---|
| Erie | 64 | 19 | 25655 | 45 |
| Huron | 229 | 59 | 59565 | 170 |
| Michigan | 281 | 85 | 57753 | 196 |
| Ontario | 244 | 86 | 19009 | 158 |
| Superior | 406 | 149 | 82097 | 257 |


In [73]:
# use a Boolean mask to select just the items we want:
lakes[(avg_depths == 59) | (avg_depths == 86)]

| | Max Depth (m) | Avg Depth (m) | Surface Area (sq km) | Depth Spread |
|---|---|---|---|---|
| Huron | 229 | 59 | 59565 | 170 |
| Ontario | 244 | 86 | 19009 | 158 |


This works because the Boolean mask creates a new `Series` with the same index values!

In [74]:
avg_depths > 60

Erie        False
Huron       False
Michigan     True
Ontario      True
Superior     True
dtype: bool

In [75]:
# There is a potential problem with non-sequential integer indexes:
data_implicit = pd.Series([100, 200, 300, 400])
data_explicit = pd.Series([100, 200, 300, 400], index=[4, 9, 8, 1])
print('data_implicit')
print(data_implicit)
print()
print('data_explicit')
print(data_explicit)

data_implicit
0    100
1    200
2    300
3    400
dtype: int64

data_explicit
4    100
9    200
8    300
1    400
dtype: int64


To handle this potential confusion between label-based and position-based access, and to make data access easier in general, pandas provides two "indexers": `Series` and `DataFrame` attributes that expose different ways to access the data.

- `iloc`: always integer position-based
- `loc`: always label-based

In [76]:
data_implicit.iloc[1]

200

In [77]:
data_explicit.iloc[1]

200

In [78]:
#data_implicit.loc[4]  # Uncommenting this raises a KeyError: 4 is not a label in data_implicit

In [79]:
data_explicit.loc[4]  # No error: 4 is a label in data_explicit

100

In [80]:
# We can use slices to select more than one value as well. Here, get all values after the first one:
data_implicit.iloc[1:]

1    200
2    300
3    400
dtype: int64

In [81]:
# Let's get all rows of the lakes dataframe except the last one:
lakes.iloc[0:-1]

| | Max Depth (m) | Avg Depth (m) | Surface Area (sq km) | Depth Spread |
|---|---|---|---|---|
| Erie | 64 | 19 | 25655 | 45 |
| Huron | 229 | 59 | 59565 | 170 |
| Michigan | 281 | 85 | 57753 | 196 |
| Ontario | 244 | 86 | 19009 | 158 |


These indexers (`.iloc` and `.loc`) accept two selectors separated by a comma: first the rows to include, then the *columns* to include. If you leave out the column selector, all columns of a DataFrame are included, but it is possible to retrieve only a subset:

In [82]:
lakes

| | Max Depth (m) | Avg Depth (m) | Surface Area (sq km) | Depth Spread |
|---|---|---|---|---|
| Erie | 64 | 19 | 25655 | 45 |
| Huron | 229 | 59 | 59565 | 170 |
| Michigan | 281 | 85 | 57753 | 196 |
| Ontario | 244 | 86 | 19009 | 158 |
| Superior | 406 | 149 | 82097 | 257 |


In [83]:
lakes[["Max Depth (m)","Avg Depth (m)"]]["Erie":"Michigan"]

| | Max Depth (m) | Avg Depth (m) |
|---|---|---|
| Erie | 64 | 19 |
| Huron | 229 | 59 |
| Michigan | 281 | 85 |


In [84]:
# The first two rows and first two columns only
lakes.iloc[:2, :2]


#How to print the following without using .iloc or .loc?

| | Max Depth (m) | Avg Depth (m) |
|---|---|---|
| Erie | 64 | 19 |
| Huron | 229 | 59 |
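
One possible answer to the question in the cell above: select the columns with a list, then slice the rows by position with plain square brackets (or use `head`). The sketch rebuilds only the columns and rows it needs from `lakes`.

```python
import pandas as pd

lakes = pd.DataFrame(
    {'Max Depth (m)': [64, 229, 281], 'Avg Depth (m)': [19, 59, 85]},
    index=['Erie', 'Huron', 'Michigan'],
)

# Column selection with a list, then a positional row slice (end excluded)
first_two = lakes[['Max Depth (m)', 'Avg Depth (m)']][:2]

# head(2) returns the same rows
same_rows = lakes[['Max Depth (m)', 'Avg Depth (m)']].head(2)

print(first_two)
```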


In [85]:
#lakes.loc['Michigan']
lakes.loc[1]  # Raises KeyError: 1 is not a row label (lakes.iloc[1] would select the second row)
#lakes

KeyError: 1

`loc` accepts the following types of inputs:

- a single label (as above)
- a list or array of labels, e.g. ['a', 'b', 'c']
- a slice object with labels, e.g. 'a':'c' (note that, contrary to usual Python slices, both the start and the stop are **included**!)
- a boolean array
- a callable function with one argument (the calling Series or DataFrame) that returns valid output for indexing (one of the above)
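
The last two input types are easy to miss. Here is a sketch of both on a one-column rebuild of `lakes`: a Boolean array keeps the rows where it is `True`, and a callable receives the DataFrame and returns one of the other valid inputs.

```python
import pandas as pd

lakes = pd.DataFrame(
    {'Max Depth (m)': [64, 229, 281, 244, 406]},
    index=['Erie', 'Huron', 'Michigan', 'Ontario', 'Superior'],
)

# Boolean array: keep the rows where the condition is True
deep = lakes.loc[lakes['Max Depth (m)'] > 250]

# Callable: pandas calls the lambda with `lakes` and uses the Boolean Series it returns
deep_too = lakes.loc[lambda df: df['Max Depth (m)'] > 250]

print(deep.index.tolist())    # ['Michigan', 'Superior']
print(deep.equals(deep_too))  # True
```

The callable form is handy in method chains, where the intermediate DataFrame has no variable name to refer to.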

In [86]:
lakes.loc[['Michigan', 'Superior'], ['Max Depth (m)']]

| | Max Depth (m) |
|---|---|
| Michigan | 281 |
| Superior | 406 |


It is also possible to assign to the values at the locations you specify with the `iloc` and `loc` indexers! They aren't read-only.

In [87]:
df = pd.DataFrame(np.random.randint(0, 10, (3, 3)), columns = ['A', 'B', 'C']) 
# https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html
df

| | A | B | C |
|---|---|---|---|
| 0 | 4 | 3 | 5 |
| 1 | 2 | 6 | 7 |
| 2 | 5 | 6 | 6 |


In [88]:
# Assign the value 100 to the cells 0,B and 1,B.
# Remember with label-based access, which `loc` uses, the "stop" of the slice is **included**.
df.loc[:1, 'B'] = 100
df

| | A | B | C |
|---|---|---|---|
| 0 | 4 | 100 | 5 |
| 1 | 2 | 100 | 7 |
| 2 | 5 | 6 | 6 |


A few more examples with `.loc[]`:

In [89]:
lakes['Max Depth (m)'].loc[['Erie', 'Michigan']]

Erie         64
Michigan    281
Name: Max Depth (m), dtype: int64

In [90]:
lakes[['Max Depth (m)', 'Avg Depth (m)']].loc[['Erie', 'Michigan']]

| | Max Depth (m) | Avg Depth (m) |
|---|---|---|
| Erie | 64 | 19 |
| Michigan | 281 | 85 |


In [91]:
lakes[['Max Depth (m)', 'Avg Depth (m)']].loc['Erie':'Michigan']

| | Max Depth (m) | Avg Depth (m) |
|---|---|---|
| Erie | 64 | 19 |
| Huron | 229 | 59 |
| Michigan | 281 | 85 |


In [92]:
lakes.loc['Erie':'Michigan', ['Max Depth (m)', 'Avg Depth (m)']]

| | Max Depth (m) | Avg Depth (m) |
|---|---|---|
| Erie | 64 | 19 |
| Huron | 229 | 59 |
| Michigan | 281 | 85 |


### Examining Data

While you can manipulate and operate on your data in any way you can dream up, pandas does provide basic descriptive statistics and sorting functionality for you. I **highly** recommend reading the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/api.html#computations-descriptive-stats) to see what methods are available and save yourself some work!

The `describe` method is very useful with numeric data:

In [93]:
lakes.describe().round(2)

| | Max Depth (m) | Avg Depth (m) | Surface Area (sq km) | Depth Spread |
|---|---|---|---|---|
| count | 5.0 | 5.0 | 5.0 | 5.0 |
| mean | 244.8 | 79.6 | 48815.8 | 165.2 |
| std | 122.71 | 47.39 | 26114.77 | 77.3 |
| min | 64.0 | 19.0 | 19009.0 | 45.0 |
| 25% | 229.0 | 59.0 | 25655.0 | 158.0 |
| 50% | 244.0 | 85.0 | 57753.0 | 170.0 |
| 75% | 281.0 | 86.0 | 59565.0 | 196.0 |
| max | 406.0 | 149.0 | 82097.0 | 257.0 |


We can get the highest value for a given Series with `max`:

In [94]:
lakes['Max Depth (m)'].max()

406

But what if we wanted the top 2? `sort_values` is the answer:

In [95]:
lakes['Max Depth (m)'].sort_values(ascending=False).head(2)

Superior    406
Michigan    281
Name: Max Depth (m), dtype: int64

This is so common that there is actually a shortcut for it:

In [96]:
max_depths.nlargest(2)

Superior    406
Michigan    281
dtype: int64

Which naturally works on DataFrames as well:

In [97]:
lakes.nlargest(2, 'Avg Depth (m)')

| | Max Depth (m) | Avg Depth (m) | Surface Area (sq km) | Depth Spread |
|---|---|---|---|---|
| Superior | 406 | 149 | 82097 | 257 |
| Ontario | 244 | 86 | 19009 | 158 |
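
The same ideas extend to whole DataFrames: `sort_values` takes the column to sort by through its `by` parameter, and `nsmallest` is the counterpart of `nlargest`. A sketch on a two-column rebuild of `lakes`:

```python
import pandas as pd

lakes = pd.DataFrame(
    {'Avg Depth (m)': [19, 59, 85, 86, 149],
     'Surface Area (sq km)': [25655, 59565, 57753, 19009, 82097]},
    index=['Erie', 'Huron', 'Michigan', 'Ontario', 'Superior'],
)

# Reorder all rows by surface area, largest first
by_area = lakes.sort_values(by='Surface Area (sq km)', ascending=False)
print(by_area.index.tolist())

# The two shallowest lakes by average depth
shallowest = lakes.nsmallest(2, 'Avg Depth (m)')
print(shallowest.index.tolist())
```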


## Combining DataFrames

Often you will need to combine data from multiple datasets. There are two main types of combinations in pandas: concatenations and merges (aka joins).

**Concatenating** means taking multiple DataFrame objects and appending their rows together to make a new DataFrame. In general you will do this when your datasets contain the <font color="red">same columns</font> and you are combining observations of the <font color="red">same type</font> together into one dataset that contains all the rows from all the datasets.

**Merging** is joining DataFrames together SQL-style by using common values. This is useful when you have multiple datasets with common keys and you want to combine them into one dataset that contains columns from all the datasets being merged.

In [98]:
# Concatenation example
df1 = pd.DataFrame({'Site': [1, 2, 3],
                    'Observed Value': [8.1, 5.5, 6.9]})

df2 = pd.DataFrame({'Site': [7, 8, 9],
                    'Observed Value': [10.5, 11.5, 12.0]})

print("df1: ")
df1

df1: 


| | Site | Observed Value |
|---|---|---|
| 0 | 1 | 8.1 |
| 1 | 2 | 5.5 |
| 2 | 3 | 6.9 |


In [99]:
print("df2: ")
df2

df2: 


| | Site | Observed Value |
|---|---|---|
| 0 | 7 | 10.5 |
| 1 | 8 | 11.5 |
| 2 | 9 | 12.0 |


In [100]:
print("concatenated along rows: ")
df3 = pd.concat([df1, df2])
df3
# How to set index?

concatenated along rows: 


| | Site | Observed Value |
|---|---|---|
| 0 | 1 | 8.1 |
| 1 | 2 | 5.5 |
| 2 | 3 | 6.9 |
| 0 | 7 | 10.5 |
| 1 | 8 | 11.5 |
| 2 | 9 | 12.0 |
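
To answer the "How to set index?" question in the cell above: `pd.concat` keeps each input's original row labels, which is why 0, 1, 2 appear twice. Passing `ignore_index=True` builds a fresh `RangeIndex` instead; alternatively, `set_index('Site')` on the result uses the site numbers as labels.

```python
import pandas as pd

df1 = pd.DataFrame({'Site': [1, 2, 3], 'Observed Value': [8.1, 5.5, 6.9]})
df2 = pd.DataFrame({'Site': [7, 8, 9], 'Observed Value': [10.5, 11.5, 12.0]})

# ignore_index=True discards the old labels and numbers the rows 0..5
combined = pd.concat([df1, df2], ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2, 3, 4, 5]

# Or use one of the columns as the index
by_site = pd.concat([df1, df2]).set_index('Site')
print(by_site.index.tolist())   # [1, 2, 3, 7, 8, 9]
```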


In [101]:
print("concatenated along columns: ")
pd.concat([df1, df2], axis = 1)

concatenated along columns: 


| | Site | Observed Value | Site | Observed Value |
|---|---|---|---|---|
| 0 | 1 | 8.1 | 7 | 10.5 |
| 1 | 2 | 5.5 | 8 | 11.5 |
| 2 | 3 | 6.9 | 9 | 12.0 |


In [102]:
# Merge example
df1 = pd.DataFrame({'Site': [3, 1, 2],
                    'Observed Value': [8.1, 5.5, 6.9]})

df2 = pd.DataFrame({'Site': [1, 2, 3, 4],
                    'Temperature': [27.1, 18.2, 29.8, 30.4]})

print("df1: ")
df1

df1: 


| | Site | Observed Value |
|---|---|---|
| 0 | 3 | 8.1 |
| 1 | 1 | 5.5 |
| 2 | 2 | 6.9 |


In [103]:
print("df2: ")
df2

df2: 


| | Site | Temperature |
|---|---|---|
| 0 | 1 | 27.1 |
| 1 | 2 | 18.2 |
| 2 | 3 | 29.8 |
| 3 | 4 | 30.4 |


In [104]:
print("merged: ")
pd.merge(df1, df2) # inner/intersection

merged: 


| | Site | Observed Value | Temperature |
|---|---|---|---|
| 0 | 3 | 8.1 | 29.8 |
| 1 | 1 | 5.5 | 27.1 |
| 2 | 2 | 6.9 | 18.2 |


In [105]:
print("merged: ")
print(pd.merge(df1, df2, how = 'outer')) # outer/union

merged: 
   Site  Observed Value  Temperature
0     1             5.5         27.1
1     2             6.9         18.2
2     3             8.1         29.8
3     4             NaN         30.4
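
`pd.merge` has a few more options worth knowing. By default it joins on all columns the two frames share (here just `'Site'`); naming the key with `on` makes that explicit. `how='left'` keeps every row of the left frame, and `indicator=True` adds a `_merge` column recording whether each row matched. A sketch reusing the data from the merge example above, with `df2` as the left frame:

```python
import pandas as pd

df1 = pd.DataFrame({'Site': [3, 1, 2], 'Observed Value': [8.1, 5.5, 6.9]})
df2 = pd.DataFrame({'Site': [1, 2, 3, 4], 'Temperature': [27.1, 18.2, 29.8, 30.4]})

# Keep all rows of df2 (the left frame here); Site 4 has no observed value
left = pd.merge(df2, df1, on='Site', how='left', indicator=True)
print(left)
```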
