**ids-pdl08-tut.ipynb**: This Jupyter notebook is provided by Joachim Vogt for the *Python Data Lab* of the module *CH-700 Introduction to Data Science* offered in Fall 2023 at Constructor University. Jupyter notebooks and other learning resources are available from a dedicated *module platform*.

# Pandas basics

This tutorial provides an introduction to the Python package Pandas. Follow the instructions below to learn to

- [ ] construct Pandas Series objects from dictionaries, lists, and arrays,
- [ ] select elements in a Pandas Series using label-based and integer-based methods,
- [ ] construct Pandas DataFrame objects from dictionaries, lists, and arrays,
- [ ] select elements in a Pandas DataFrame using label-based and integer-based methods,
- [ ] join/merge Pandas DataFrame objects, 
- [ ] operate on columns in a Pandas DataFrame, and add new columns,
- [ ] store tabular data from a file in a Pandas DataFrame,
- [ ] extract and restructure tabular data using Pandas DataFrame objects.

If you wish to keep track of your progress, you may edit this markdown cell, check a box in the list above after having worked through the respective part of this notebook, and save the file.

*Short exercises* are embedded in this notebook. *Sample solutions* can be found at the end of the document.

## Preparation

The following data file is expected to reside in the working directory. Identify the file on the module platform and upload it to the same folder as this Jupyter notebook.

- `life-expectancy-at-birth-total-years.csv`: Life expectancy at birth 1960-2019, published by the [World Bank, 2021-07-30](http://data.worldbank.org/data-catalog/world-development-indicators), available from [Our World in Data](https://ourworldindata.org/grapher/life-expectancy-at-birth-total-years).

Run the following code cell to import standard Python data science libraries. The NumPy module facilitates efficient processing of numerical arrays, and is usually imported as `np`. From the Matplotlib library we import the submodule `pyplot` using the standard abbreviation `plt`. The IPython magic command `%matplotlib inline` enables inline display of graphics.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

### Importing Pandas

It is common practice to import the Pandas library using the prefix `pd`.

In [None]:
import pandas as pd

### Pandas documentation and tutorials

Pandas is hosted at [https://pandas.pydata.org](https://pandas.pydata.org), with extensive [documentation](https://pandas.pydata.org/docs) and [tutorials](https://pandas.pydata.org/docs/getting_started/intro_tutorials/). If you are familiar with other tools for analyses of tabular data (R, SQL, spreadsheets, SAS, Stata), it may be worthwhile comparing terminologies, see the page [Comparisons with other tools](https://pandas.pydata.org/docs/getting_started/comparisons/).

Note the official spelling is *pandas* (lower case). In this tutorial the package name is capitalized (*Pandas*) to avoid confusion with a bunch of bears.

## Pandas Series

A Pandas Series object stores a one-dimensional labeled array, similar to a Python dictionary, but amended and optimized for data science operations. Python dictionaries relate to Pandas Series objects in a similar way as Python lists relate to NumPy ndarray objects, with the former being more flexible regarding data types, and the latter being optimized for efficient numerical processing.
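
The analogy can be made concrete with a short sketch. The names `values_dict` and `values_ser` are illustrative and not used elsewhere in this tutorial. The example contrasts element-wise arithmetic on a dictionary, which needs an explicit loop or comprehension, with the vectorized arithmetic supported by a Series.

```python
import pandas as pd

# Illustrative data (names and values are chosen for this sketch only).
values_dict = {'a': 10, 'b': 20, 'c': 30}
values_ser = pd.Series(values_dict)

# A dictionary needs a comprehension (or loop) for element-wise arithmetic.
doubled_dict = {key: 2 * value for key, value in values_dict.items()}

# A Series supports vectorized arithmetic and keeps the labels.
doubled_ser = 2 * values_ser

print(doubled_dict)
print(doubled_ser)
```

Both results carry the same labels and values; the Series version avoids the explicit iteration.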

### Pandas Series from Python dictionaries

Recall that Python dictionaries are collections of key:value pairs.

In [None]:
dct1 = { 'a':10, 'b':20 , 'c':30, 'd':40}
print('Dictionary : ',dct1)
print('Keys       : ',dct1.keys())
print('Values     : ',dct1.values())

When a Python dictionary is turned into a Pandas Series object, the keys are stored in a Pandas Index object.

In [None]:
ser1 = pd.Series(dct1)
print('Series :')
print(ser1)
print('\nIndex  : ',ser1.index)
print('Values : ',ser1.values)

### Pandas Series from Python lists and NumPy arrays

Series can be constructed not only from dictionaries but also from lists or ndarray objects, with the index array provided through a keyword argument of the `Series()` function. 

In [None]:
ser2 = pd.Series([10,20,30,40],index=['a','b','c','d'])
print(ser2)

If a Series is defined without an explicit index, a default integer index (0, 1, 2, ...) is generated from the positions of the elements.

In [None]:
ser3 = pd.Series([10,20,30,40])
print(ser3)
print('\nIndex  : ',ser3.index)

### Selection of elements in a Pandas Series

The most intuitive way of selecting individual elements of a Series is through the index labels. Label-based slicing produces another Series object, with the end element included in the result. The `values` attribute extracts the array of elements.

In [None]:
ser1 = pd.Series( { 'a':10, 'b':20 , 'c':30, 'd':40} )
print("ser1['c'] : ",ser1['c'])
print("ser1['b':'d'].values : ",ser1['b':'d'].values)
print("\nser1['b':'d'] : ")
print(ser1['b':'d'])

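Series elements may also be referenced through their integer position along the index, with the usual logic of Python lists and NumPy arrays (e.g., end elements are omitted). For a single element, use `.iloc`: an integer in plain square brackets applied to a label-based index (as in `ser1[2]`) is deprecated in recent Pandas versions.

In [None]:
print("ser1.iloc[2] : ",ser1.iloc[2])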
print("ser1[1:3].values : ",ser1[1:3].values)
print("\nser1[1:3] : ")
print(ser1[1:3])

In cases where the index is formed from an array of integers, slicing may lead to ambiguities, and then it is safer to reference objects through `.loc` (label-based selection) and `.iloc` (integer-based selection).

In [None]:
ser4 = pd.Series( { 1:10, 2:20, 3:30, 4:40 } )
print("ser4 : ")
print(ser4)
print("\nser4[1:3].values      : ",ser4[1:3].values)
print("ser4.loc[1:3].values  : ",ser4.loc[1:3].values)
print("ser4.iloc[1:3].values : ",ser4.iloc[1:3].values)

The referencing syntax can be applied to re-assign values, or even to add a new element to the Series.

In [None]:
ser5 = pd.Series( { 'a':10, 'b':20 , 'c':30, 'd':40} )
print(ser5)
ser5['c'] = 33
ser5['e'] = 55
print()
print(ser5)

### Operations on Series objects

Operations involving two or more Series objects are naturally aligned with common elements of their index arrays. In the following example, the two Series objects `ser1` and `ser6` share the indices `a`, `c`, `d` but not `b` and `e`. The object `ser7`, defined through an operation involving both `ser1` and `ser6`, contains all index elements from both Series objects but the associated values are defined only for those indices that are shared.

In [None]:
ser1 = pd.Series( { 'a':10, 'b':20 , 'c':30, 'd':40} )
ser6 = pd.Series( { 'a':1, 'c':3 , 'd':4, 'e':5 } )
ser7 = ser1-ser6
print(ser7)

Undefined values (`NaN`, not-a-number) can be eliminated from a Series by means of the `dropna()` method, with the keyword `inplace` controlling whether the Series itself is altered, or a copy is produced (default behavior).

In [None]:
ser7.dropna(inplace=True)
print(ser7)
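
The difference between the two modes of `dropna()` can be checked with a small sketch. The Series `ser_demo` is an illustrative object that does not appear elsewhere in this tutorial.

```python
import numpy as np
import pandas as pd

# Illustrative Series with one undefined value (not part of the tutorial data).
ser_demo = pd.Series({'a': 1.0, 'b': np.nan, 'c': 3.0})

# Default behavior: dropna() returns a new Series, the original keeps its NaN.
ser_clean = ser_demo.dropna()
print(len(ser_demo), len(ser_clean))

# With inplace=True, the Series itself is altered.
ser_demo.dropna(inplace=True)
print(len(ser_demo))
```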

### Exercise: Pandas Series

According to [Wikipedia (accessed on 2022-07-26)](https://de.wikipedia.org/wiki/Liste_der_Gro%C3%9F-_und_Mittelst%C3%A4dte_in_Deutschland), the resident numbers of Bremen, Dresden, Essen, Stuttgart in the years 1970, 1990, 2010 were as follows.

 City      | 1970   | 1990   | 2010   | 
:----------|:------:|:------:|:------:|
 Bremen    | 592533 | 551219 | 547340 |
 Dresden   | 502432 | 490571 | 523058 |
 Essen     | 696419 | 626973 | 574635 |
 Stuttgart | 634202 | 579988 | 606588 |

In the cell below, store the three sets of resident numbers for the years 1970, 1990, 2010 in Pandas Series `ser1970`, `ser1990`, `ser2010`, respectively, and complete the code according to the instructions given as comments.

In [None]:
### Construct ser1970 from a dictionary.
ser1970 = pd.Series({'Bremen':592533,'Dresden':502432,'Essen':696419,'Stuttgart':634202})
print('Residents in the year 1970:')
print(ser1970)
### Construct ser1990 using a list of resident numbers and a separate index array.
ser1990 = pd.Series([551219,490571,626973,579988],
                    index=['Bremen','Dresden','Essen','Stuttgart'])
print('\nResidents in the year 1990:')
print(ser1990)
### Copy ser1990 to initialize ser2010, and then re-assign the four values.
ser2010 = ser1990.copy()
ser2010['Bremen'] = 547340
ser2010['Dresden'] = 523058
ser2010['Essen'] = 574635
ser2010['Stuttgart'] = 606588
print('\nResidents in the year 2010:')
print(ser2010)
### Compute the percentage change from 1970 to 1990 and store in serdiff1
serdiff1 = 100*(ser1990-ser1970)/ser1970
print('\nPercentage change in resident number from 1970 to 1990:')
print(serdiff1)
### Compute the percentage change from 1990 to 2010 and store in serdiff2
serdiff2 = 100*(ser2010-ser1990)/ser1990
print('\nPercentage change in resident number from 1990 to 2010:')
print(serdiff2)

## Pandas DataFrame

A Pandas DataFrame object stores a two-dimensional labeled array, producing a spreadsheet-like representation of the data.

### Construction of Pandas DataFrame objects

Pandas DataFrame objects can be constructed in many different ways, e.g., from a dictionary of lists or NumPy arrays having the same length. Column titles are automatically generated from the keys of the dictionary.

In [None]:
df1 = pd.DataFrame( {'City':['Bremen','Dresden','Essen','Stuttgart'],
                     'Residents in 1970':[592533,502432,696419,634202],
                     'Residents in 1990':[551219,490571,626973,579988],
                     'Residents in 2010':[547340,523058,574635,606588]} )
display(df1)
print('Index : ',df1.index)

Since an index is not explicitly specified, the integer positions of the list or array elements are used. As for Pandas Series, this default behavior can be changed by means of the keyword `index`.

In [None]:
df2 = pd.DataFrame( {'Residents in 1970':[592533,502432,696419,634202],
                     'Residents in 1990':[551219,490571,626973,579988],
                     'Residents in 2010':[547340,523058,574635,606588]},
                   index=['Bremen','Dresden','Essen','Stuttgart'])
display(df2)
print('Index : ',df2.index)

The (named) columns of a DataFrame can be understood as individual Pandas Series with shared index arrays.

In [None]:
cities = ['Bremen','Dresden','Essen','Stuttgart']
ser1970 = pd.Series([592533,502432,696419,634202],index=cities)
ser1990 = pd.Series([551219,490571,626973,579988],index=cities)
ser2010 = pd.Series([547340,523058,574635,606588],index=cities)
df3 = pd.DataFrame({'Residents in 1970':ser1970,
                    'Residents in 1990':ser1990,
                    'Residents in 2010':ser2010})
display(df3)
print('Index : ',df3.index)

If the data are available as a two-dimensional array, a DataFrame object can be constructed by specifying the titles of both the rows (keyword `index`) and the columns (keyword `columns`).  

In [None]:
res2d = np.array([[592533,551219,547340],
                  [502432,490571,523058],
                  [696419,626973,574635],
                  [634202,579988,606588]])
df4 = pd.DataFrame(res2d,index=['Bremen','Dresden','Essen','Stuttgart'],
                   columns=['Residents in 1970','Residents in 1990','Residents in 2010'])
display(df4)

Results of operations on columns can be easily included in the DataFrame. In the following example, the percentage changes 1970-1990 and 1990-2010 are computed and included in the DataFrame as two additional columns.

In [None]:
df5 = pd.DataFrame(res2d,index=['Bremen','Dresden','Essen','Stuttgart'],
                   columns=['Residents in 1970','Residents in 1990','Residents in 2010'])
df5['Change 1970-1990 [%]'] = 100*(df5['Residents in 1990']-df5['Residents in 1970'])/df5['Residents in 1970']
df5['Change 1990-2010 [%]'] = 100*(df5['Residents in 2010']-df5['Residents in 1990'])/df5['Residents in 1990']
display(df5)

Pandas DataFrame objects can be constructed directly from data files in a variety of formats. See, e.g., the documentation of the Pandas functions `read_csv()`, `read_excel()`, `read_sql()`. 
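
As a minimal sketch of `read_csv()`, the following cell parses comma-separated text from an in-memory string instead of a file, so it runs without any data file. The string `csv_text` and the DataFrame `df_csv` are illustrative names; the numbers are taken from the table of resident numbers above.

```python
import io
import pandas as pd

# Comma-separated text standing in for the content of a file on disk.
csv_text = """City,Year,Residents
Bremen,1970,592533
Bremen,1990,551219
Dresden,1970,502432
"""

# read_csv() accepts file names as well as file-like objects such as StringIO.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
print(df_csv.dtypes)
```

The first line of the text is interpreted as the header, and column data types are inferred from the content.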

### Selection of elements in a Pandas DataFrame

Element selection in Pandas DataFrame objects is demonstrated using the following example.

In [None]:
arr2d = np.arange(15).reshape(3,5)
df6 = pd.DataFrame(arr2d,index=['one','two','three'],columns=['A','B','C','D','E'])
display(df6)

To select a *single column* from a Pandas DataFrame, it suffices to index it with the corresponding label, i.e., to enter the label in square brackets. The result is returned as a Pandas Series.

In [None]:
display(df6['B'])

While entering a single label in square brackets (*indexing*) selects a column, a range of labels (*slicing*) selects rows.

In [None]:
display(df6['one':'two'])

For more general selections based on labels (for both rows and columns), apply the `.loc` indexer, which accepts row (index) and column specifications separated by a comma in a variety of formats. One may also specify lists of columns or rows, or boolean masks. In the following code cell, uncomment individual lines to see the effects of the selection.

In [None]:
display(df6.loc[:,'B'])                 #.. Single column returned as a Series.
#display(df6.loc[:,'B':'D'])             #.. Range of columns returned as a DataFrame.
#display(df6.loc['one',:])               #.. Single row returned as a Series.
#display(df6.loc['one':'two',:])         #.. Range of rows returned as a DataFrame.
#display(df6.loc['one':'two','B':'D'])   #.. Sub-array returned as a DataFrame.
#display(df6.loc['one':'two',['B','D']]) #.. Sub-array returned as a DataFrame.
#display(df6.loc[df6['B']>3,:])          #.. Select rows where df6['B']>3.

Selections based on the integer positions within a Pandas DataFrame are accomplished by the `.iloc` indexer. In the following code cell, uncomment individual lines and observe the results.

In [None]:
#display(df6.iloc[:,1])     #.. Single column returned as a Series.
#display(df6.iloc[:,1:4])   #.. Range of columns returned as a DataFrame.
#display(df6.iloc[0,:])     #.. Single row returned as a Series.
#display(df6.iloc[:-1,:])   #.. Range of rows returned as a DataFrame.
#display(df6.iloc[:-1,1:4]) #.. Sub-array returned as a DataFrame.

Further options for selecting single elements of a DataFrame are the `.at` (label-based) and `.iat` (integer-based) accessors, see the Pandas documentation. The older `.ix` indexer was removed in Pandas 1.0.
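
A brief sketch of the scalar accessors `.at` and `.iat` follows. It rebuilds the example DataFrame under the illustrative name `df_demo` so that the cell is self-contained.

```python
import numpy as np
import pandas as pd

# Same layout as the example DataFrame above, under an illustrative name.
df_demo = pd.DataFrame(np.arange(15).reshape(3, 5),
                       index=['one', 'two', 'three'],
                       columns=['A', 'B', 'C', 'D', 'E'])

# .at selects a single element by row label and column label.
by_label = df_demo.at['two', 'C']
# .iat selects a single element by integer row and column positions.
by_position = df_demo.iat[1, 2]
print(by_label, by_position)

# Both accessors can also be used to assign a single value.
df_demo.at['three', 'E'] = -1
print(df_demo)
```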

### Operations on DataFrame objects

The Pandas package builds on the NumPy module with its efficient array handling and numerical operations. NumPy universal functions can be applied to Pandas objects in accordance with requirements regarding their indices (index preservation and alignment).

In [None]:
arr_xy = np.arange(8).reshape(4,2)
df7 = pd.DataFrame(arr_xy,index=['a','b','c','d'],columns=['x','y'])
df7['sqrt(x)'] = np.sqrt(df7['x'])
df7['sin(pi*x/4)'] = np.sin(np.pi*df7['x']/4)
df7['exp(-y)'] = np.exp(-df7['y'])
df7['y^2-x^2'] = df7['y']**2 - df7['x']**2
display(df7)

The concepts of index preservation and alignment are demonstrated also in the following example where two Pandas Series with non-identical index arrays are combined into a DataFrame. Missing values (`NaN`) are naturally propagated in the operations. 

In [None]:
xs = pd.Series([0,2,4,6],index=['a','b','c','d'])
ys = pd.Series([1,3,5,7],index=['a','b','c','e'])
df8 = pd.DataFrame({'x':xs,'y':ys})
df8['sqrt(x)'] = np.sqrt(df8['x'])
df8['sin(pi*x/4)'] = np.sin(np.pi*df8['x']/4)
df8['exp(-y)'] = np.exp(-df8['y'])
df8['y^2-x^2'] = df8['y']**2 - df8['x']**2
display(df8)

In the same way as for Pandas Series, rows with undefined data can be removed from a DataFrame through `.dropna()`, and an arbitrary row by `.drop()`.

In [None]:
df8.dropna(inplace=True)
display(df8)
df8.drop('b',inplace=True)
display(df8)

In the example above, the two Series `xs` and `ys` are combined based on the union of their indices. Such an operation is called joining or merging (here: an *outer* join). In the cell below, the logic is implemented using DataFrame objects instead of Series.

In [None]:
xdf = pd.DataFrame([0,2,4,6],index=['a','b','c','d'],columns=['x'])
ydf = pd.DataFrame([1,3,5,7],index=['a','b','c','e'],columns=['y'])
df9 = xdf.join(ydf,how='outer')
df9['sqrt(x)'] = np.sqrt(df9['x'])
df9['sin(pi*x/4)'] = np.sin(np.pi*df9['x']/4)
df9['exp(-y)'] = np.exp(-df9['y'])
df9['y^2-x^2'] = df9['y']**2 - df9['x']**2
display(df9)
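
The effect of the keyword `how` can be explored with the following sketch, which compares the outer join used above with an inner join and with the default (left) join. The names `outer`, `inner`, and `left` are illustrative.

```python
import pandas as pd

xdf = pd.DataFrame([0, 2, 4, 6], index=['a', 'b', 'c', 'd'], columns=['x'])
ydf = pd.DataFrame([1, 3, 5, 7], index=['a', 'b', 'c', 'e'], columns=['y'])

# Outer join: union of both indices, missing entries are filled with NaN.
outer = xdf.join(ydf, how='outer')
# Inner join: only the labels present in both indices are kept.
inner = xdf.join(ydf, how='inner')
# Default (how='left'): the index of the calling DataFrame is kept.
left = xdf.join(ydf)

print(outer)
print(inner)
print(left)
```

An inner join avoids undefined values altogether, at the price of discarding rows that appear in only one of the two objects.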

### Exercise: Pandas DataFrame

Complete the code cell below according to the instructions included as comments.

In [None]:
### Define a Pandas DataFrame with columns 'x' and 'y'.
dfxy = pd.DataFrame({'x':np.arange(5,9),'y':np.arange(1,5)},index=list('abcd'))
### Add a new column 'x+y' with the sum of the columns 'x' and 'y'.
dfxy['x+y'] = dfxy['x']+dfxy['y']
### Add a new column 'x-y' with the difference of the first two columns 'x' and 'y'.
dfxy['x-y'] = dfxy['x']-dfxy['y']
### Add a new column 'x*y' with the product of the first two columns 'x' and 'y'.
dfxy['x*y'] = dfxy['x']*dfxy['y']
### Add a new column 'x/y' with the quotient of the first two columns 'x' and 'y'.
dfxy['x/y'] = dfxy['x']/dfxy['y']
### Display the resulting DataFrame.
display(dfxy)
### Using .loc, extract (and display) the column labeled 'x+y'.
display(dfxy.loc[:,'x+y'])
### Using .iloc, extract (and display) the row labeled 'c'.
display(dfxy.iloc[2,:])
### Using .loc, extract the sub-array with columns 'x-y','x/y' and rows 'b','c'.
display(dfxy.loc['b':'c',['x-y','x/y']])
### Using .iloc, extract the sub-array with columns 'x+y','x-y','x*y' and rows 'a','c'.
display(dfxy.iloc[[0,2],2:5])
### Using .drop(), remove the row 'c' and re-display the DataFrame.
dfxy.drop('c',axis=0,inplace=True)
display(dfxy)

## Life expectancy at birth 1960-2019

The file `life-expectancy-at-birth-total-years.csv` provides data on the life expectancy at birth in the period 1960-2019 as published by the [World Bank on 2021-07-30](http://data.worldbank.org/data-catalog/world-development-indicators), and made available through [Our World in Data](https://ourworldindata.org/grapher/life-expectancy-at-birth-total-years). In the working directory listing, the content of this text file is displayed after clicking on the file name. Data columns are separated by commas. The first line names the variables that are listed. Individual countries come with a three-letter abbreviation (`Code`), which is not provided for groups of countries.

Using the Pandas function `read_csv()`, the data are loaded and stored in a DataFrame.

In [None]:
leb_full = pd.read_csv('life-expectancy-at-birth-total-years.csv')
display(leb_full)

The data for Australia (Code: AUS) are identified through a boolean array.

In [None]:
ind_aus = leb_full['Code']=='AUS'
display(leb_full[ind_aus].head())

A new DataFrame with the life expectancy data for Australia is created and displayed.

In [None]:
leb = pd.DataFrame({'Year':leb_full[ind_aus].iloc[:,2].values,
                    'Life exp. (AUS)':leb_full[ind_aus].iloc[:,3].values})
display(leb.head())

The data for Brazil (BRA), China (CHN), France (FRA), Nigeria (NGA), and the United States (USA) are added to the DataFrame using the Pandas function `merge()`, operating on the common `Year` column.

*Note on computational efficiency*: This code example is meant to illustrate DataFrame building using the methods introduced in the context of the current tutorial. More efficient Pandas tools exist, e.g., the `groupby()` method.

In [None]:
Codes = ['BRA','CHN','FRA','NGA','USA']
for code in Codes:
    ind = leb_full['Code']==code
    leb = leb.merge(pd.DataFrame({'Year':leb_full[ind].iloc[:,2].values,
                    'Life exp. ('+code+')':leb_full[ind].iloc[:,3].values}))
Codes.insert(0,'AUS')
display(leb.head())
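
For comparison, the following sketch indicates how such a restructuring might be done without an explicit loop. It uses a small toy table, not the World Bank file: the entity names, codes, column label `Value`, and all numbers are placeholders chosen for illustration. Besides `groupby()`, the sketch uses the DataFrame method `pivot()`, which is not introduced in this tutorial.

```python
import pandas as pd

# Toy long-format table mimicking the layout entity, code, year, value.
# All entries are placeholders, not actual life expectancy data.
toy = pd.DataFrame({
    'Entity': ['Country A', 'Country A', 'Country B', 'Country B'],
    'Code':   ['AAA', 'AAA', 'BBB', 'BBB'],
    'Year':   [2000, 2001, 2000, 2001],
    'Value':  [70.0, 71.0, 60.0, 62.0],
})

# groupby(): summary statistics per country code without a loop.
mean_by_code = toy.groupby('Code')['Value'].mean()
print(mean_by_code)

# pivot(): one row per year, one column per country code.
wide = toy.pivot(index='Year', columns='Code', values='Value')
print(wide)
```

The table `wide` has the same structure as the DataFrame built with `merge()` above, except that `Year` serves as the index instead of a regular column.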

Plot the life expectancy time series for all selected countries.

In [None]:
for code in Codes:
    plt.plot(leb['Year'],leb['Life exp. ('+code+')'],label=code)
plt.legend()
plt.grid()
plt.title('Life expectancy at birth (data from World Bank)')
plt.xlabel('Year')
plt.ylabel('Total life expectancy [years]')

Using the function `pairplot()` from the Seaborn module, univariate and bivariate statistical distributions are visualized in a matrix showing histograms on the main diagonal and scatter plots otherwise.

In [None]:
import seaborn as sns
sns.set()
sns.pairplot(leb.iloc[:,1:])

The function `describe()` is called to obtain basic statistics of the numerical data in the DataFrame.

In [None]:
leb.describe()
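
The result of `describe()` is itself a DataFrame whose rows are labeled by the names of the statistics, so that individual statistics can be extracted with `.loc` or `.iloc`. The following sketch demonstrates this on an illustrative DataFrame `toy_num` with made-up numbers.

```python
import pandas as pd

# Illustrative numerical data (not related to the life expectancy file).
toy_num = pd.DataFrame({'u': [1.0, 2.0, 3.0, 4.0],
                        'v': [10.0, 20.0, 30.0, 40.0]})
toy_stat = toy_num.describe()

# Row labels of the statistics table.
print(list(toy_stat.index))

# The row of means, selected by label and by integer position.
print(toy_stat.loc['mean'])
print(toy_stat.iloc[1])
```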

### Exercise: Life expectancy at birth 1960-2019

Using the life expectancy data from the file `life-expectancy-at-birth-total-years.csv`, construct a Pandas DataFrame with rows containing time series of data from the single countries, and columns containing the data from single years.

From the DataFrame `leb_full`, eliminate rows with undefined data using the Pandas method `.dropna()`. Identify the data for the year 1960 through a boolean array `ind_1960`. Construct a new DataFrame `leby` with data from the year 1960, then display the first five rows.

Successively add new columns of life expectancy data for each year in the range from 1961 to 2019. Display the first five rows of the final DataFrame.

*Note on computational efficiency*: This exercise is meant to illustrate DataFrame building using the methods introduced in the context of the current tutorial. More efficient Pandas tools exist, e.g., the `groupby()` method.

Call the function `describe()` to obtain a DataFrame `leby_stat` with basic statistics of the yearly distributions.

Plot time series of the yearly means, minima, quartiles, and maxima. 

---
---

## Solutions

### Solution: Pandas Series

In [None]:
### Construct ser1970 from a dictionary.
ser1970 = pd.Series({'Bremen':592533,'Dresden':502432,'Essen':696419,'Stuttgart':634202})
print('Residents in the year 1970:')
print(ser1970)
### Construct ser1990 using a list of resident numbers and a separate index array.
ser1990 = pd.Series([551219,490571,626973,579988],
                    index=['Bremen','Dresden','Essen','Stuttgart'])
print('\nResidents in the year 1990:')
print(ser1990)
### Copy ser1990 to initialize ser2010, and then re-assign the four values.
ser2010 = ser1990.copy()
ser2010['Bremen'] = 547340
ser2010['Dresden'] = 523058
ser2010['Essen'] = 574635
ser2010['Stuttgart'] = 606588
print('\nResidents in the year 2010:')
print(ser2010)
### Compute the percentage change from 1970 to 1990 and store in serdiff1
serdiff1 = 100*(ser1990-ser1970)/ser1970
print('\nPercentage change in resident number from 1970 to 1990:')
print(serdiff1)
### Compute the percentage change from 1990 to 2010 and store in serdiff2
serdiff2 = 100*(ser2010-ser1990)/ser1990
print('\nPercentage change in resident number from 1990 to 2010:')
print(serdiff2)

### Solution: Pandas DataFrame

In [None]:
### Define a Pandas DataFrame with columns 'x' and 'y'.
dfxy = pd.DataFrame({'x':np.arange(5,9),'y':np.arange(1,5)},index=list('abcd'))
### Add a new column 'x+y' with the sum of the columns 'x' and 'y'.
dfxy['x+y'] = dfxy['x']+dfxy['y']
### Add a new column 'x-y' with the difference of the first two columns 'x' and 'y'.
dfxy['x-y'] = dfxy['x']-dfxy['y']
### Add a new column 'x*y' with the product of the first two columns 'x' and 'y'.
dfxy['x*y'] = dfxy['x']*dfxy['y']
### Add a new column 'x/y' with the quotient of the first two columns 'x' and 'y'.
dfxy['x/y'] = dfxy['x']/dfxy['y']
### Display the resulting DataFrame.
display(dfxy)
### Using .loc, extract (and display) the column labeled 'x+y'.
display(dfxy.loc[:,'x+y'])
### Using .iloc, extract (and display) the row labeled 'c'.
display(dfxy.iloc[2,:])
### Using .loc, extract the sub-array with columns 'x-y','x/y' and rows 'b','c'.
display(dfxy.loc['b':'c',['x-y','x/y']])
### Using .iloc, extract the sub-array with columns 'x+y','x-y','x*y' and rows 'a','c'.
display(dfxy.iloc[[0,2],2:5])
### Using .drop(), remove the row 'c' and re-display the DataFrame.
dfxy.drop('c',axis=0,inplace=True)
display(dfxy)

### Solution: Life expectancy at birth 1960-2019

In [None]:
### From leb_full eliminate rows with undefined data using .dropna(). 
leb_full.dropna(inplace=True)
### Identify the data for the year 1960 through a boolean array ind_1960. 
ind_1960 = leb_full['Year']==1960
### Construct a new DataFrame `leby` with data from the year 1960.
leby = pd.DataFrame({'Code':leb_full[ind_1960].iloc[:,1].values,
                    1960:leb_full[ind_1960].iloc[:,3].values})
### Display the first five rows of leby.
display(leby.head())

In [None]:
### Successively add new columns of data for each year from 1961 to 2019.
for year in range(1961,2020):
    ind = leb_full['Year']==year
    leby = leby.merge(pd.DataFrame({'Code':leb_full[ind].iloc[:,1].values,
                                    year:leb_full[ind].iloc[:,3].values}))
### Display the first five rows of the final DataFrame.
display(leby.head())

In [None]:
### Call describe() to obtain a DataFrame leby_stat with basic statistics.
leby_stat = leby.describe()
### Display the DataFrame.
display(leby_stat)

In [None]:
### Plot time series of the yearly means, minima, quartiles, and maxima. 
for k in [1,3,4,5,6,7]:
    plt.plot(leby_stat.columns,leby_stat.iloc[k,:],label=leby_stat.index[k])
plt.legend()
plt.title('Life expectancy statistics (data from World Bank)')
plt.xlabel('Year')
plt.ylabel('Total life expectancy [years]')

---
---