Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

Make sure you fill in any place that says `YOUR CODE HERE` or "YOUR ANSWER HERE", as well as your name and collaborators below:

In [None]:
NAME = "Orion"
COLLABORATORS = ""

---

# PA3

Data wrangling with Pandas.

**Bonus:** No hidden tests! <br>
Free opportunity to practice masking.<br>
5 extra credit points<br>

**Note:** The unit tests compare data frames with pandas' `assert_frame_equal`, which assumes the rows are in the same order. Thus, the order of countries in your returned data frame matters.
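
To see why row order matters, here is a minimal sketch on a toy frame. It assumes `assert_frame_equal` is imported from `pandas.testing`; the three values are the life satisfaction scores that appear later in this notebook, and the variable names are made up for the illustration.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Toy frame: three countries with their life satisfaction scores
expected = pd.DataFrame({"Value": [7.3, 7.0, 6.9]}, index=["AUS", "AUT", "BEL"])

# Same rows, different order
reordered = expected.loc[["BEL", "AUS", "AUT"]]

try:
    assert_frame_equal(reordered, expected)
    order_matters = False
except AssertionError:
    # The index labels are compared position by position, so this branch runs
    order_matters = True

print(order_matters)

# Restoring the expected order makes the comparison pass
assert_frame_equal(reordered.sort_index(), expected)
```

So if a test fails even though the numbers look right, check the row order first.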

Due 11/22/2019, F 5pm.


In [5]:
import numpy as np
import pandas as pd
import math

We will process two datasets in this PA. Both give annual stats on countries' performance. We will focus on life satisfaction index and GDP.

Both datasets are real-world data, based on the official values provided by the corresponding agencies. 


## Dataset 1: Better Life Index


Our first dataset is called the Better Life Index (bli). It is calculated by the OECD using the 11 metrics outlined at [Wiki](https://en.wikipedia.org/wiki/OECD_Better_Life_Index#cite_note-3):

- Housing: housing conditions and spending (e.g. real estate pricing)
- Income: household income and financial wealth
- Jobs: earnings, job security and unemployment
- Community: quality of social support network
- Education: education and what one gets out of it
- Environment: quality of environment (e.g. environmental health)
- Governance: involvement in democracy
- Health
- Life Satisfaction: level of happiness
- Safety: murder and assault rates
- Work-life balance

These are calculated using the following indicators:

1. Labour market insecurity
1. Stakeholder engagement for developing regulations
1. Feeling safe walking alone at night
1. Dwellings without basic facilities
1. Housing expenditure
1. Rooms per person
1. Household net adjusted disposable income
1. Household net financial wealth
1. Employment rate
1. Long-term unemployment rate
1. Personal earnings
1. Quality of support network
1. Educational attainment
1. Student skills
1. Years in education
1. Air pollution
1. Water quality
1. Voter turnout
1. Life expectancy
1. Self-reported health
1. Life satisfaction
1. Homicide rate
1. Employees working very long hours
1. Time devoted to leisure and personal care

Each of these values has further sub-categories, such as `men`, `women` and `total` or similar.

Eventually, we are interested in the Life Satisfaction data for each country. 

We will start by analyzing the data a bit. 

Let's read the official data file:

In [6]:
bli = pd.read_csv('/home/memo/public/bli.csv')
#print(bli.columns)
#print(f"\n")
#print(bli.head(4))
#print(f"\n")
#bli.set_index('LOCATION')

## Problem 1 (10 points)

Given bli dataframe (df), return a df of total_life_satisfaction

### Hint
Use masking on the column values to pick the Life Satisfaction rows for the total (all genders) of each country.
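
As a rough illustration of masking, the sketch below combines two boolean conditions with `&` on a toy frame. The column names follow the bli data, but the frame itself is invented for the example: the `MN` and `PS_REPH` values are made up, and only the two `TOT` life satisfaction scores come from the expected output.

```python
import pandas as pd

# Toy frame that mimics four of the bli columns (not the real file)
toy = pd.DataFrame({
    "LOCATION":   ["AUS", "AUS", "AUS", "AUT"],
    "INDICATOR":  ["SW_LIFS", "SW_LIFS", "PS_REPH", "SW_LIFS"],
    "INEQUALITY": ["TOT", "MN", "TOT", "TOT"],
    "Value":      [7.3, 7.2, 1.0, 7.0],
})

# Each comparison yields a boolean Series; & keeps rows where both are True.
# The parentheses are required because & binds tighter than ==.
mask = (toy["INDICATOR"] == "SW_LIFS") & (toy["INEQUALITY"] == "TOT")

selected = toy[mask]
print(selected)
```

Note that the selected rows keep their original index labels, which is why the expected output starts at row 2859 rather than 0.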

#### Expected output
```` python 
tot_bli = total_life_satisfaction(bli)
print(tot_bli.columns)
print(tot_bli.head(4))
```` 

```
Index(['LOCATION', 'Country', 'INDICATOR', 'Indicator', 'MEASURE', 'Measure',
       'INEQUALITY', 'Inequality', 'Unit Code', 'Unit', 'PowerCode Code',
       'PowerCode', 'Reference Period Code', 'Reference Period', 'Value',
       'Flag Codes', 'Flags'],
      dtype='object')
     LOCATION    Country INDICATOR          Indicator MEASURE Measure  \
2859      AUS  Australia   SW_LIFS  Life satisfaction       L   Value   
2860      AUT    Austria   SW_LIFS  Life satisfaction       L   Value   
2861      BEL    Belgium   SW_LIFS  Life satisfaction       L   Value   
2862      CAN     Canada   SW_LIFS  Life satisfaction       L   Value   

     INEQUALITY Inequality Unit Code           Unit  PowerCode Code PowerCode  \
2859        TOT      Total   AVSCORE  Average score               0     Units   
2860        TOT      Total   AVSCORE  Average score               0     Units   
2861        TOT      Total   AVSCORE  Average score               0     Units   
2862        TOT      Total   AVSCORE  Average score               0     Units   

      Reference Period Code  Reference Period  Value Flag Codes Flags  
2859                    NaN               NaN    7.3        NaN   NaN  
2860                    NaN               NaN    7.0        NaN   NaN  
2861                    NaN               NaN    6.9        NaN   NaN  
2862                    NaN               NaN    7.3        NaN   NaN  
```

In [15]:
def total_life_satisfaction(bli):
    """Given bli df, return a df of total_life_satisfaction
    """
    tot_df = bli[bli['INEQUALITY']=='TOT']
    tot_df = tot_df[tot_df['Indicator']=='Life satisfaction']
    
    return tot_df

In [16]:
tot_bli = total_life_satisfaction(bli)
print(tot_bli.columns)
print(tot_bli.head(4))


Index(['LOCATION', 'Country', 'INDICATOR', 'Indicator', 'MEASURE', 'Measure',
       'INEQUALITY', 'Inequality', 'Unit Code', 'Unit', 'PowerCode Code',
       'PowerCode', 'Reference Period Code', 'Reference Period', 'Value',
       'Flag Codes', 'Flags'],
      dtype='object')
     LOCATION    Country INDICATOR          Indicator MEASURE Measure  \
2859      AUS  Australia   SW_LIFS  Life satisfaction       L   Value   
2860      AUT    Austria   SW_LIFS  Life satisfaction       L   Value   
2861      BEL    Belgium   SW_LIFS  Life satisfaction       L   Value   
2862      CAN     Canada   SW_LIFS  Life satisfaction       L   Value   

     INEQUALITY Inequality Unit Code           Unit  PowerCode Code PowerCode  \
2859        TOT      Total   AVSCORE  Average score               0     Units   
2860        TOT      Total   AVSCORE  Average score               0     Units   
2861        TOT      Total   AVSCORE  Average score               0     Units   
2862        TOT      Total   AV

In [17]:

import pandas.testing as pdt  # public location of assert_frame_equal; pandas.util.testing is deprecated
sol = pd.read_pickle("/home/memo/public/pa3p1.pkl")
abli = pd.read_pickle("/home/memo/public/pa3abli.pkl")
ans = total_life_satisfaction(abli)
pdt.assert_frame_equal(ans, sol)


## Problem 2 (10 points)

Given bli dataframe, return average of Life Satisfaction by gender.

Which gender is expected to have a better life?

Note that besides men and women, this will also return the means for high, low and total.

### Hint
use groupby
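
A minimal sketch of the `groupby` pattern on a toy frame follows. The values are made up for the illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame: two rows per sub-category (illustrative values only)
toy = pd.DataFrame({
    "INEQUALITY": ["MN", "WMN", "MN", "WMN"],
    "Value":      [7.0, 7.5, 6.0, 6.5],
})

# Group the rows by sub-category and average the Value column of each group.
# The result is a Series indexed by the group keys, sorted alphabetically.
means = toy.groupby("INEQUALITY")["Value"].mean()

# Convert the Series into a one-column DataFrame
as_frame = means.to_frame()
print(as_frame)
```

The group keys become the index of the result, which matches the shape of the expected output above.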

#### Expected output
```` python 
life_satisfaction_by_gender(bli)
```` 

```
INEQUALITY	Value
HGH	7.059459
LW	6.023529
MN	6.502564
TOT	6.528205
WMN	6.561538
```

In [13]:
def life_satisfaction_by_gender(bli):
    """Given bli df, return average of Life Satisfaction by gender
    """
    
    avg_df = bli[bli['Indicator']=='Life satisfaction']
    ret_df = avg_df.groupby('INEQUALITY')['Value'].mean()
  
    return pd.DataFrame(ret_df)

    #avg_df = bli.loc(bli['INEQUALITY'].mean()
    #avg_df = bli.mean(axis=0,level='Value')
    
    #.dropna(axis='columns').drop(columns='PowerCode Code')

    #tot_df = tot_df.loc['INEQUALITY'] #'INEQUALITY','Values']group
    #tot_df = tot_df['INEQUALITY',(tot_df['Values']>0)] 

In [327]:
#experiment here
life_satisfaction_by_gender(bli)

INEQUALITY,Value
HGH,7.059459
LW,6.023529
MN,6.502564
TOT,6.528205
WMN,6.561538


In [58]:

import pandas.testing as pdt  # public location of assert_frame_equal; pandas.util.testing is deprecated
ans = life_satisfaction_by_gender(abli)
sol = pd.read_pickle("/home/memo/public/pa3p2.pkl")
pdt.assert_frame_equal(ans, sol)


## Problems 3-7 (5 points each)

Given bli dataframe, return 

- Avg Life expectancy per gender
    - Note: leave the order as is, that is alphabetical.
    - which gender is expected to live longer?
- Avg Life expectancy per country
    - ignore genders
    - Note: leave the order as is, that is alphabetical.
    - How does USA rank? What do you think? 
- Avg Life expectancy for top N countries
    - same as above, except: sorted from longest to shortest expectancy
    - assume N is an int (i.e. no input verification is needed)
    - e.g. if N=1: return the expectancy for JPN only.
    - How does USA rank? What do you think? 
    

### Worst countries
Let's look at the other side of the rankings. Given bli dataframe (df), return 

- Worst N countries for avg Homicide rate (PS_REPH)
    - ignore genders
- Worst N countries for employees working very long hours (WL_EWLH)
    - ignore genders
    
Feel free to extend this analysis to other categories, such as Feeling safe walking alone at night (PS_FSAFEN)
    
### Hint
use groupby
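
Once the per-country averages exist, picking the top N is a sort followed by `head`. The sketch below assumes a small frame already indexed by country; the four values are the life expectancy figures shown in this notebook's outputs, and the frame name is made up.

```python
import pandas as pd

# Per-country averages, indexed by country code (alphabetical order)
by_country = pd.DataFrame(
    {"Value": [82.5, 74.7, 83.0, 83.9]},
    index=pd.Index(["AUS", "BRA", "CHE", "JPN"], name="LOCATION"),
)

# Sort from longest to shortest expectancy, then keep the first N rows
N = 2
top_n = by_country.sort_values(by="Value", ascending=False).head(N)
print(top_n)

# DataFrame.nlargest is an equivalent shortcut for this case
alt = by_country.nlargest(N, "Value")
```

For the "worst N" rankings the same pattern applies, since a higher homicide rate or a larger share of long hours is worse.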

In [114]:
def life_expectancy_by_gender(bli):
    """Given bli df, return average of Life exp. by gender
    """
    le_df=bli[bli['Indicator']=='Life expectancy']
    le_df = le_df.groupby('INEQUALITY')['Value'].mean()
    return pd.DataFrame(le_df)
#life_expectancy_by_gender(bli)

In [190]:
def life_expectancy_by_country(bli):
    """Given bli df, return average of Life exp. by country
    """
    le_df=bli[bli['Indicator']=='Life expectancy']
    le_df = le_df[le_df['INEQUALITY']=='TOT']
    le_df=le_df.groupby('LOCATION')['Value'].mean().round(1)
   
    return pd.DataFrame(le_df)

life_expectancy_by_country(bli)

#ret_df= pd.DataFrame(le_df)
#ret_df = ret_df.set_precision(3)

LOCATION,Value
AUS,82.5
AUT,81.3
BEL,81.1
BRA,74.7
CAN,81.5
CHE,83.0
CHL,79.1
CZE,78.7
DEU,80.7
DNK,80.8


In [196]:
def life_expectancy_by_country_topN(bli, N):
    """Given bli df, return average of Life exp. by country for the top N countries
    """
    if (N<=0): N=1
    le_dfn=bli[bli['Indicator']=='Life expectancy']
    le_dfn=le_dfn.groupby('LOCATION')['Value'].mean().round(1)
    ret_df = pd.DataFrame(le_dfn)
    return ret_df.sort_values(by='Value', ascending=False).head(N)
life_expectancy_by_country_topN(bli,10)

LOCATION,Value
JPN,83.9
CHE,83.0
ESP,83.0
ITA,82.6
AUS,82.5
ISL,82.5
FRA,82.4
NOR,82.4
LUX,82.4
SWE,82.3


In [203]:
def homicide_rate_by_country_N(bli, N):
    """Given bli df, return worst N countries for Homicide rate, PS_REPH
        ignore genders
    """
    if(N<=0): N=1
    hr_df = bli[bli['INDICATOR']=='PS_REPH']
    hr_df = hr_df[hr_df['INEQUALITY']=='TOT']
    hr_df = hr_df.groupby('LOCATION')['Value'].mean().round(1)
    
    return pd.DataFrame(hr_df).sort_values(by='Value',ascending=False).head(N)
homicide_rate_by_country_N(bli,4)

LOCATION,Value
BRA,27.6
MEX,17.9
RUS,11.3
ZAF,10.0


In [206]:
def long_hours_by_country_N(bli, N):
    """Given bli df, return worst N countries for Employees working very long hours, WL_EWLH
    ignore genders
    """
    if(N<=0): N=1
    lh_df = bli[bli['INDICATOR']=='WL_EWLH']
    lh_df = lh_df[lh_df['INEQUALITY']=='TOT']
    lh_df = lh_df.groupby('LOCATION')['Value'].mean().round(2)
    return pd.DataFrame(lh_df).sort_values(by='Value',ascending=False).head(N)
long_hours_by_country_N(bli,5)

LOCATION,Value
TUR,33.77
MEX,29.48
JPN,21.81
KOR,20.84
ZAF,18.68


In [None]:
#experiment here

# life_expectancy_by_gender(bli)
# life_expectancy_by_country(bli)
life_expectancy_by_country_topN(bli,10)
# homicide_rate_by_country_N(bli,5)
# long_hours_by_country_N(bli,5)


In [202]:

import pandas.testing as pdt  # public location of assert_frame_equal; pandas.util.testing is deprecated
ans = life_expectancy_by_gender(abli)
sol = pd.read_pickle("/home/memo/public/pa3p3.pkl")
pdt.assert_frame_equal(ans, sol)


In [201]:

ans = life_expectancy_by_country(abli)
sol = pd.read_pickle("/home/memo/public/pa3p4.pkl")
pdt.assert_frame_equal(ans, sol)


In [200]:
n1=5 
n2=10
n3=15

ans = life_expectancy_by_country_topN(abli, n1)
sol1 = pd.read_pickle("/home/memo/public/pa3p5a.pkl")
pdt.assert_frame_equal(ans, sol1)
ans = life_expectancy_by_country_topN(abli, n2)
sol2 = pd.read_pickle("/home/memo/public/pa3p5b.pkl")
pdt.assert_frame_equal(ans, sol2)
ans = life_expectancy_by_country_topN(abli, n3)
sol3 = pd.read_pickle("/home/memo/public/pa3p5c.pkl")
pdt.assert_frame_equal(ans, sol3)



In [205]:

ans = homicide_rate_by_country_N(abli, n1)
sol1 = pd.read_pickle("/home/memo/public/pa3p6a.pkl")
pdt.assert_frame_equal(ans, sol1)
ans = homicide_rate_by_country_N(abli, n2)
sol2 = pd.read_pickle("/home/memo/public/pa3p6b.pkl")
pdt.assert_frame_equal(ans, sol2)
ans = homicide_rate_by_country_N(abli, n3)
sol3 = pd.read_pickle("/home/memo/public/pa3p6c.pkl")
pdt.assert_frame_equal(ans, sol3)


In [224]:

ans = long_hours_by_country_N(abli, n1)
sol1 = pd.read_pickle("/home/memo/public/pa3p7a.pkl")
pdt.assert_frame_equal(ans, sol1)
ans = long_hours_by_country_N(abli, n2)
sol2 = pd.read_pickle("/home/memo/public/pa3p7b.pkl")
pdt.assert_frame_equal(ans, sol2)
ans = long_hours_by_country_N(abli, n3)
sol3 = pd.read_pickle("/home/memo/public/pa3p7c.pkl")
pdt.assert_frame_equal(ans, sol3)

## Problem 8 (10 points)

Factors listed above and others are utilized to estimate the Life satisfaction for each country.

Sort the countries by their life satisfaction index, from highest to lowest.


Return the result as a new data frame with just 1 column: values indexed by the country.

```
<class 'pandas.core.frame.DataFrame'>
Index: 39 entries, USA to ZAF
Data columns (total 1 columns):
Value    39 non-null float64
dtypes: float64(1)
```



In [185]:
def life_sat_by_cntry(bli):
    """Given bli df, return average of Life sat. by country sorted in descending order
    """
    life_sat = bli[bli['Indicator'] == 'Life satisfaction']
    life_sat = life_sat[life_sat['INEQUALITY'] == 'TOT']
    life_sat = life_sat[['LOCATION','Value']]
    life_sat = life_sat.set_index('LOCATION')
    life_sat = life_sat.groupby('LOCATION').mean()
    return life_sat.sort_values(by='Value', ascending=False)

In [208]:
#life_sat_by_cntry(bli)

In [223]:

ans = life_sat_by_cntry(bli)
sol= pd.read_pickle("/home/memo/public/pa3p8.pkl")
pdt.assert_frame_equal(ans, sol)
life_sat = sol

## Dataset 2: $


The second dataset is all about the money and is provided by The International Monetary Fund (IMF).

This data was compiled in 2018; however, we note that for many countries the values after 2015 are estimates. That is why we focus on 2015.

This dataset also includes many more countries than the bli dataset: 190+ vs 30+

We are interested in one feature per country: gross domestic product (gdp).
    

In [177]:
# error_bad_lines=False skips malformed rows; on pandas >= 1.3 use on_bad_lines='skip' instead
imf = pd.read_csv('/home/memo/public/imf_int.csv', error_bad_lines=False)
print(imf.columns)
#imf
#print(imf.head(4))
#imf.set_index('LOCATION')


Index(['WEO Country Code', 'ISO', 'WEO Subject Code', 'Country',
       'Subject Descriptor', 'Subject Notes', 'Units', 'Scale',
       'Country/Series-specific Notes', '1980', '1981', '1982', '1983', '1984',
       '1985', '1986', '1987', '1988', '1989', '1990', '1991', '1992', '1993',
       '1994', '1995', '1996', '1997', '1998', '1999', '2000', '2001', '2002',
       '2003', '2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011',
       '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020',
       '2021', 'Estimates Start After'],
      dtype='object')


In [182]:
#imf

## Problem 9 (10 points)

We are interested in just one feature per country:
- Gross domestic product per capita, 
    - current prices
    - US dollars
    - Year: 2015
    
Extract the relevant info for each country and return a new df with gdp values sorted largest to smallest.

Return the result as a new data frame with 2 columns: country code (`ISO`) and gdp value for 2015 (`2015`).

Expected result:
```
        ISO	2015
4319	LUX	101994.0
7223	CHE	80675.0
6035	QAT	76576.0
5507	NOR	74822.0
...
```
and the object info is:

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 191 entries, 4319 to 7267
Data columns (total 2 columns):
ISO     191 non-null object
2015    189 non-null float64
dtypes: float64(1), object(1)
memory usage: 4.5+ KB
```
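
The object info above lists 191 entries but only 189 non-null gdp values, so some countries have no 2015 figure. The sketch below, on a toy frame with a default index, shows what `sort_values` does with such rows. The three gdp values come from the expected result; the missing value for SYR follows the table in Problem 10.

```python
import numpy as np
import pandas as pd

# Toy frame; note that the year column is named with the string "2015"
toy = pd.DataFrame({
    "ISO":  ["NOR", "SYR", "LUX", "CHE"],
    "2015": [74822.0, np.nan, 101994.0, 80675.0],
})

# Descending sort; missing values go to the end by default (na_position='last')
sorted_gdp = toy.sort_values(by="2015", ascending=False)
print(sorted_gdp)

# count() ignores NaN, which explains "non-null" counts below the row count
print(sorted_gdp["2015"].count())
```

The original row labels are kept after sorting, which is why the expected result starts at index 4319 instead of 0.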


In [180]:
def gdp2015_by_cntry(imf):
    """Given imf df, return Gross domestic product per capita,current prices, US dollars,
    Year: 2015 in descending order
    """
    imf = imf[imf['WEO Subject Code']=='NGDPDPC']
    imf = imf[['ISO','2015']]
    return imf.sort_values(by='2015',ascending=False)

In [188]:

ans = gdp2015_by_cntry(imf)
sol= pd.read_pickle("/home/memo/public/pa3p9.pkl")
pdt.assert_frame_equal(ans, sol)
gdps = sol

## Problem 10 (10 points)

Time to combine the two dataframes, life satisfaction and gdp, in order to have a single frame that looks like:

```
     ISO	2015	Value
0	LUX	101994.0	6.9
1	CHE	80675.0	7.5
2	QAT	76576.0	NaN
3	NOR	74822.0	7.5
4	MAC	69309.0	NaN
5	USA	55805.0	6.9
...
188	SSD	221.0	NaN
189	UVK	NaN	NaN
190	SYR	NaN	NaN
191	NaN	NaN	6.5
```

Return the union of both datasets, with missing values filled in as `NaN`.
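
A union of two frames corresponds to an outer merge. The sketch below uses two toy frames whose key columns have different names. The LUX, CHE and QAT values come from the table above, while `XXX` is a made-up country code standing in for a country that appears only in the life satisfaction data.

```python
import pandas as pd

# GDP side: three countries
gdps = pd.DataFrame({
    "ISO":  ["LUX", "CHE", "QAT"],
    "2015": [101994.0, 80675.0, 76576.0],
})

# Life satisfaction side: "XXX" is a hypothetical code with no GDP entry
life_sat = pd.DataFrame({
    "LOCATION": ["LUX", "CHE", "XXX"],
    "Value":    [6.9, 7.5, 6.5],
})

# how="outer" keeps every key from both sides; unmatched cells become NaN
merged = pd.merge(gdps, life_sat, how="outer",
                  left_on="ISO", right_on="LOCATION")
print(merged)
```

In this toy version both key columns are ordinary columns, so both appear in the result. The row order of an outer merge can differ between pandas versions, so do not rely on it here.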


In [55]:
def gdp2015_bli_by_cntry(gdps, life_sat):
    """Return the union of the gdp and life satisfaction dfs
    """
    return pd.merge(gdps, life_sat, how='outer', left_on=['ISO'], right_on=['LOCATION']).replace(r'^\s*$', np.nan, regex=True)

In [210]:
#gdp2015_bli_by_cntry(gdps, life_sat)

In [211]:

ans = gdp2015_bli_by_cntry(gdps, life_sat)
sol= pd.read_pickle("/home/memo/public/pa3p10.pkl")
pdt.assert_frame_equal(ans, sol)
bli_gdp = sol
# bli_gdp.columns

## Problem 11 (10 points)

Above data looks helpful! 

However, the existing column names, `'ISO', '2015', 'Value'`, are not intuitive.

Now, let's make it more user friendly. 

Update the column names to be more descriptive: `Cntry	Gdp_2015	Bli`
```
	Cntry	Gdp_2015	Bli
0	LUX	101994.0	6.9
1	CHE	80675.0	7.5
2	QAT	76576.0	NaN
3	NOR	74822.0	7.5
```

Sort by gdp.


In [221]:
def gdp2015_bli(bli_gdp):
    """Update the union col names to `Cntry Gdp_2015 Bli`
    """
    return bli_gdp.rename(index=str,columns={'ISO':'Cntry','2015':'Gdp_2015','Value':'Bli'})
    

In [219]:
gdp2015_bli(bli_gdp)

,Cntry,Gdp_2015,Bli
0,LUX,101994.0,6.9
1,CHE,80675.0,7.5
2,QAT,76576.0,
3,NOR,74822.0,7.5
4,MAC,69309.0,
5,USA,55805.0,6.9
6,SGP,52888.0,
7,DNK,52114.0,7.5
8,IRL,51351.0,7.0
9,AUS,50962.0,7.3


In [222]:

ans = gdp2015_bli(bli_gdp)
sol= pd.read_pickle("/home/memo/public/pa3p11.pkl")
pdt.assert_frame_equal(ans, sol)
bli_gdp = sol
bli_gdp.columns

Index(['Cntry', 'Gdp_2015', 'Bli'], dtype='object')

## Problem 12 (10 points)

Soon we will discuss how to fit and train a model to predict and classify using our data.

For now, assume the following relationship between the gdp values and bli: 

`bli_est = -4E-10 gdp^2 + 6E-05 gdp + 5.1325`

where E is the scientific notation and can be programmed directly in Python: ``-4E-10*x`` 

Add a new feature to your df titled `Bli_est` that utilizes the above formula to predict the life satisfaction for each country based on its gdp.

Expected result:
```
    Cntry	Gdp_2015	Bli	Bli_est
0	LUX	101994.0	6.9	7.091030
1	CHE	80675.0	7.5	7.369618
2	QAT	76576.0	NaN	7.381506
3	NOR	74822.0	7.5	7.382487
```
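
As a sanity check on the formula, the sketch below evaluates it for the four gdp values in the expected result. `estimate_bli` and `toy` are hypothetical names used only for this illustration; this is one possible way to apply the formula, not necessarily the intended solution.

```python
import pandas as pd

def estimate_bli(gdp):
    """Hypothetical helper: evaluate the quadratic given above."""
    return -4E-10 * gdp**2 + 6E-05 * gdp + 5.1325

# The four gdp values from the expected result
toy = pd.DataFrame({
    "Cntry":    ["LUX", "CHE", "QAT", "NOR"],
    "Gdp_2015": [101994.0, 80675.0, 76576.0, 74822.0],
})

# Arithmetic on a Series is applied element by element, so no loop is needed.
# assign returns a new frame and leaves toy unchanged.
with_est = toy.assign(Bli_est=estimate_bli(toy["Gdp_2015"]))
print(with_est.round(6))
```

Because the squared term is negative, the estimate flattens for the richest countries: Luxembourg's estimate is lower than Norway's despite its higher gdp.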


In [None]:
def gdp2015_bli_bliest_by_cntry(bli_gdp):
    """Add a column with bli estimate
    """
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# gdp2015_bli_bliest_by_cntry(bli_gdp)

In [None]:

ans = gdp2015_bli_bliest_by_cntry(bli_gdp)
sol= pd.read_pickle("/home/memo/public/pa3p12.pkl")
pdt.assert_frame_equal(ans, sol)
bli_gdp = sol
# bli_gdp.columns

## Problem 13 (10 points)

As you recall, bli dataset did not have as many countries as the gdp dataset.

We have predicted the life satisfaction for all countries using their gdp. One feature does not give us a perfect prediction, but the expected values are close to the bli estimates for many countries. 

We will now add a method that returns the rows whose value in a given column matches a given value. This makes it easier for others to use our dataset.

For instance, if we are interested in details for Qatar, we can easily retrieve the relevant info using this method:

```python
return_row_values(bli_gdp, 'Cntry','QAT')
```
```
	Cntry	Gdp_2015	Bli	Bli_est
2	QAT	76576.0	NaN	7.381506
```

### Hint
Use masking

In [284]:
def return_row_values(df, col_name, col_value):
    """given a dataframe, column name and value, return all associated rows
    """
    # Boolean mask: True for the rows whose col_name entry equals col_value
    return df[df[col_name] == col_value]

In [285]:
return_row_values(bli_gdp, 'Cntry','QAT')

# return_row_values(bli_gdp, 'Bli',7.2)
# return_row_values(bli_gdp, 'Bli',5.1)

In [None]:

ans1 = return_row_values(bli_gdp, 'Cntry','QAT')
ans2 = return_row_values(bli_gdp, 'Cntry','USA')
ans3 = return_row_values(bli_gdp, 'Bli',7.2)
ans4 = return_row_values(bli_gdp, 'Bli',5.1)

sol1= pd.read_pickle("/home/memo/public/pa3p13a.pkl")
sol2= pd.read_pickle("/home/memo/public/pa3p13b.pkl")
sol3= pd.read_pickle("/home/memo/public/pa3p13c.pkl")
sol4= pd.read_pickle("/home/memo/public/pa3p13d.pkl")

pdt.assert_frame_equal(ans1, sol1)
pdt.assert_frame_equal(ans2, sol2)
pdt.assert_frame_equal(ans3, sol3)
pdt.assert_frame_equal(ans4, sol4)



Note that instead of predicting bli, another approach could be to use k-Nearest Neighbors to find the `closest` countries to the country of interest and return their life satisfaction estimates.
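
A very rough sketch of that idea, under strong assumptions: closeness is measured on gdp alone, and the estimate is the plain average of the k nearest countries with a known score. `knn_bli_estimate` and `known` are hypothetical names; the values are the ones shown in the Problem 11 output.

```python
import pandas as pd

# Countries with a known life satisfaction score
known = pd.DataFrame({
    "Cntry":    ["LUX", "CHE", "NOR", "USA", "DNK", "IRL", "AUS"],
    "Gdp_2015": [101994.0, 80675.0, 74822.0, 55805.0, 52114.0, 51351.0, 50962.0],
    "Bli":      [6.9, 7.5, 7.5, 6.9, 7.5, 7.0, 7.3],
})

def knn_bli_estimate(known, gdp, k=2):
    """Hypothetical helper: mean Bli of the k countries closest in gdp."""
    # One-dimensional distance: absolute gdp difference
    distance = (known["Gdp_2015"] - gdp).abs()
    # Row labels of the k smallest distances
    nearest = distance.nsmallest(k).index
    return known.loc[nearest, "Bli"].mean()

# Qatar has no Bli value; its two nearest neighbors by gdp are NOR and CHE
qat_estimate = knn_bli_estimate(known, gdp=76576.0, k=2)
print(qat_estimate)
```

A real k-Nearest Neighbors model would typically use more features than gdp and scale them first; this only illustrates the lookup.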

Congrats! You are done with *Pandas* PA. <br> 
![You are done with Pandas PA.](https://media.tenor.com/images/d56419fa07494d9188339204bb71abfb/tenor.gif)
