# Home task: pandas 

## Question 1

- Load the energy data from the file [Energy Indicators.xls](http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls).
It is a list of indicators of energy supply and renewable electricity production from the United Nations for the year 2013.


- It should be put into a DataFrame with the variable name of "energy"


- Make sure to exclude the footer and header information from the datafile.


- The first two columns are unneccessary, so you should get rid of them, and you should change the column labels so that the columns are:<br>
`['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']`


- Convert `Energy Supply` to gigajoules (there are 1,000,000 gigajoules in a petajoule).


- For all countries which have missing data (e.g. data with `...`) make sure this is reflected as `np.NaN` values.


- Rename the following list of countries (for use in later questions):
    - `Republic of Korea`: `South Korea`,
    - `United States of America`: `United States`,
    - `United Kingdom of Great Britain and Northern Ireland`: `United Kingdom`,
    - `China, Hong Kong Special Administrative Region`: `Hong Kong`


- There are also several countries with numbers and/or parenthesis in their name. Be sure to remove these, e.g.:
    - `Bolivia (Plurinational State of)` should be `Bolivia`,
    - `Switzerland17` should be `Switzerland`.


- Next, load the GDP data from the file ["world_bank.csv"](http://data.worldbank.org/indicator/NY.GDP.MKTP.CD). 
It is a csv containing countries' GDP from 1960 to 2015 from World Bank. Call this DataFrame "GDP"


- Make sure to skip the header, and rename the following list of countries:
    - `Korea, Rep.`: `South Korea`,
    - `Iran, Islamic Rep.`: `Iran`,
    - `Hong Kong SAR, China`: `Hong Kong`


- Finally, load the "Sciamgo Journal and Country Rank data for [Energy Engineering and Power Technology"](http://www.scimagojr.com/countryrank.php?category=2102). It ranks countries based on their journal contributions in the aforementioned area. Call this DataFrame "ScimEn"


- Join the three datasets: Energy, GDP, and ScimEn into a new dataset (using the intersection of country names). Use only the 10 years (2006-2015) of GDP data and only the top 15 countries by Scimagojr 'Rank' (Rank 1 through 15).


- The index of this DataFrame should be the name of the country, and the columns should be<br>
`['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008', '2009', '2010', '2011', 2012', '2013', '2014', '2015']`

Function "answer_one" should return the resulted DataFrame (20 columns and 15 entries)

## Import all necessary packages

In [1]:
import pandas as pd
import numpy as np

## Modifying data

### Drop useless stuff, rename column names

In [2]:
# read Excel with all data and drop some useless data
energy_df = pd.read_excel("Energy Indicators.xls", skiprows=16, skipfooter=38).drop(['Unnamed: 0', 'Unnamed: 1'], axis=1).drop(0)

# rename [Unnamed: 2] -> [Country], [Renewable Electricity Production] -> [% Renewable]
energy_df = energy_df.rename(columns = {'Unnamed: 2': 'Country', 'Renewable Electricity Production': '% Renewable'})

energy_df.head(5)

Unnamed: 0,Country,Energy Supply,Energy Supply per capita,% Renewable
1,Afghanistan,321,10,78.66928
2,Albania,102,35,100.0
3,Algeria,1959,51,0.55101
4,American Samoa,...,...,0.641026
5,Andorra,9,121,88.69565


### Replace missing data (e.g. data with `...`) with `np.NaN` values.

In [3]:
cols = list(energy_df.columns)[1:]

# First method
energy_df[cols] = energy_df[cols].apply(lambda x: [item if isinstance(item, (int, float)) else np.NaN for item in x.values])

# Second method
# mask = df[cols].applymap(lambda x: isinstance(x, (int, float)))
# df[cols] = df[cols].where(mask)

print(energy_df.head(5))

energy_df.dtypes

          Country  Energy Supply  Energy Supply per capita  % Renewable
1     Afghanistan          321.0                      10.0    78.669280
2         Albania          102.0                      35.0   100.000000
3         Algeria         1959.0                      51.0     0.551010
4  American Samoa            NaN                       NaN     0.641026
5         Andorra            9.0                     121.0    88.695650


Country                      object
Energy Supply               float64
Energy Supply per capita    float64
% Renewable                 float64
dtype: object

### Convert `Energy Supply` to gigajoules (there are 1,000,000 gigajoules in a petajoule).

In [4]:
# Converting
energy_df['Energy Supply'] = energy_df['Energy Supply']*1_000_000

energy_df.head(5)

Unnamed: 0,Country,Energy Supply,Energy Supply per capita,% Renewable
1,Afghanistan,321000000.0,10.0,78.66928
2,Albania,102000000.0,35.0,100.0
3,Algeria,1959000000.0,51.0,0.55101
4,American Samoa,,,0.641026
5,Andorra,9000000.0,121.0,88.69565


### Rename the following list of countries (for use in later questions):
    - `Republic of Korea`: `South Korea`,
    - `United States of America`: `United States`,
    - `United Kingdom of Great Britain and Northern Ireland`: `United Kingdom`,
    - `China, Hong Kong Special Administrative Region`: `Hong Kong`

In [5]:
energy_df = energy_df.replace({ 'Country': {'Republic of Korea': 'South Korea',
                         'United States of America\d+': 'United States',
                         'United Kingdom of Great Britain and Northern Ireland\d+': 'United Kingdom',
                         'China, Hong Kong Special Administrative Region\d+': 'Hong Kong'}}, regex=True)

# check if replace was successful
[energy_df.loc[energy_df['Country'] == country] for country in ['Hong Kong', 'South Korea', 'United States', 'United Kingdom']]


    

[      Country  Energy Supply  Energy Supply per capita  % Renewable
 44  Hong Kong    585000000.0                      82.0          0.0,
          Country  Energy Supply  Energy Supply per capita  % Renewable
 165  South Korea   1.100700e+10                     221.0     2.279353,
            Country  Energy Supply  Energy Supply per capita  % Renewable
 217  United States   9.083800e+10                     286.0     11.57098,
             Country  Energy Supply  Energy Supply per capita  % Renewable
 215  United Kingdom   7.920000e+09                     124.0     10.60047]

### There are also several countries with numbers and/or parenthesis in their name. Be sure to remove these, e.g.:
    - `Bolivia (Plurinational State of)` should be `Bolivia`,
    - `Switzerland17` should be `Switzerland`.

In [6]:
# replace:
# Switzerland17 -> Switzerland
# Bolivia (Plurinational State of) -> Bolivia
energy_df = energy_df.replace({ 'Country': { r'([A-z]+)\d+': r'\1', r'(\w+) \(.*\)': r'\1' } }, regex=True)

[energy_df.loc[energy_df['Country'] == country] for country in ['Switzerland', 'Bolivia']]

[         Country  Energy Supply  Energy Supply per capita  % Renewable
 198  Switzerland   1.113000e+09                     136.0     57.74548,
     Country  Energy Supply  Energy Supply per capita  % Renewable
 25  Bolivia    336000000.0                      32.0     31.47712]

### Next, load the GDP data from the file ["world_bank.csv"](http://data.worldbank.org/indicator/NY.GDP.MKTP.CD).

It is a csv containing countries' GDP from 1960 to 2015 from World Bank. Call this DataFrame "GDP"

Make sure to skip the header, and rename the following list of countries:
- `Korea, Rep.`: `South Korea`,
- `Iran, Islamic Rep.`: `Iran`,
- `Hong Kong SAR, China`: `Hong Kong`

In [7]:
GDP_df = pd.read_excel('world_bank.xls', skiprows=3)

GDP_df.head(2)

Unnamed: 0,Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,Aruba,ABW,GDP (current US$),NY.GDP.MKTP.CD,,,,,,,...,2549721000.0,2534637000.0,2727850000.0,2790849000.0,2962905000.0,2983637000.0,3092430000.0,3202189000.0,,
1,Africa Eastern and Southern,AFE,GDP (current US$),NY.GDP.MKTP.CD,19291930000.0,19701860000.0,21470350000.0,25705000000.0,23501650000.0,26781170000.0,...,896256100000.0,913197400000.0,927655500000.0,956318700000.0,893099700000.0,854751900000.0,962269000000.0,984032000000.0,977809200000.0,898474100000.0


In [8]:
GDP_df = GDP_df.rename(columns={ 'Country Name': 'Country' })

GDP_df = GDP_df.replace({ 'Country': { 'Korea, Rep.': 'South Korea', 'Iran, Islamic Rep.': 'Iran', 'Hong Kong SAR, China': 'Hong Kong' } })

[GDP_df.loc[GDP_df['Country'] == country] for country in ['South Korea', 'Iran', 'Hong Kong']]

[         Country Country Code     Indicator Name  Indicator Code  \
 126  South Korea          KOR  GDP (current US$)  NY.GDP.MKTP.CD   
 
              1960          1961          1962          1963          1964  \
 126  3.958191e+09  2.417558e+09  2.814319e+09  3.988785e+09  3.458939e+09   
 
              1965  ...          2011          2012          2013  \
 126  3.120833e+09  ...  1.253223e+12  1.278428e+12  1.370795e+12   
 
              2014          2015          2016          2017          2018  \
 126  1.484318e+12  1.465773e+12  1.500112e+12  1.623901e+12  1.724846e+12   
 
              2019          2020  
 126  1.646739e+12  1.630525e+12  
 
 [1 rows x 65 columns],
     Country Country Code     Indicator Name  Indicator Code          1960  \
 112    Iran          IRN  GDP (current US$)  NY.GDP.MKTP.CD  4.199134e+09   
 
              1961          1962          1963          1964          1965  \
 112  4.426949e+09  4.693566e+09  4.928628e+09  5.379846e+09  6.197320e+

## Read the last dataset
- Finally, load the "Sciamgo Journal and Country Rank data for [Energy Engineering and Power Technology"](http://www.scimagojr.com/countryrank.php?category=2102). It ranks countries based on their journal contributions in the aforementioned area. Call this DataFrame "ScimEn"


- Join the three datasets: Energy, GDP, and ScimEn into a new dataset (using the intersection of country names). Use only the 10 years (2006-2015) of GDP data and only the top 15 countries by Scimagojr 'Rank' (Rank 1 through 15).


- The index of this DataFrame should be the name of the country, and the columns should be<br>
`['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008', '2009', '2010', '2011', 2012', '2013', '2014', '2015']`

In [9]:
ScimEn_df = pd.read_excel('scimagojr country rank 1996-2020.xlsx')

ScimEn_df.head(5)

Unnamed: 0,Rank,Country,Region,Documents,Citable documents,Citations,Self-citations,Citations per document,H index
0,1,China,Asiatic Region,273437,272374,2336764,1615239,8.55,245
1,2,United States,Northern America,175891,172431,2230544,724472,12.68,363
2,3,India,Asiatic Region,55082,53775,463165,162944,8.41,181
3,4,Japan,Asiatic Region,50523,50065,488062,119930,9.66,193
4,5,United Kingdom,Western Europe,43389,42284,615670,111290,14.19,226


#### Merge all datasets

In [10]:

merged_df = GDP_df.merge(energy_df, on='Country').merge(ScimEn_df, on='Country').set_index('Country')

merged_df = merged_df.filter(items=['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations', 'Citations per document', 'H index', 'Energy Supply', 'Energy Supply per capita', '% Renewable', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015'])

merged_df = merged_df.sort_values(by=['Rank'])

merged_df.head(5)

Unnamed: 0_level_0,Rank,Documents,Citable documents,Citations,Self-citations,Citations per document,H index,Energy Supply,Energy Supply per capita,% Renewable,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
China,1,273437,272374,2336764,1615239,8.55,245,127191000000.0,93.0,19.75491,2752132000000.0,3550343000000.0,4594307000000.0,5101703000000.0,6087164000000.0,7551500000000.0,8532230000000.0,9570406000000.0,10475680000000.0,11061550000000.0
United States,2,175891,172431,2230544,724472,12.68,363,90838000000.0,286.0,11.57098,13814610000000.0,14451860000000.0,14712840000000.0,14448930000000.0,14992050000000.0,15542580000000.0,16197010000000.0,16784850000000.0,17527160000000.0,18238300000000.0
India,3,55082,53775,463165,162944,8.41,181,33195000000.0,26.0,14.96908,940259900000.0,1216735000000.0,1198896000000.0,1341887000000.0,1675615000000.0,1823050000000.0,1827638000000.0,1856722000000.0,2039127000000.0,2103588000000.0
Japan,4,50523,50065,488062,119930,9.66,193,18984000000.0,149.0,10.23282,4530377000000.0,4515265000000.0,5037908000000.0,5231383000000.0,5700098000000.0,6157460000000.0,6203213000000.0,5155717000000.0,4850414000000.0,4389476000000.0
United Kingdom,5,43389,42284,615670,111290,14.19,226,7920000000.0,124.0,10.60047,2706015000000.0,3093407000000.0,2921432000000.0,2412390000000.0,2481580000000.0,2659882000000.0,2704017000000.0,2783251000000.0,3065521000000.0,2932785000000.0


### Create function `answer_one` that returns top-15 countries by Rank

In [23]:
def answer_one(df=merged_df):
    '''Returns top-15 countries by rank
    '''
    return df.loc[df['Rank'] < 16]

answer_one()

Unnamed: 0_level_0,Rank,Documents,Citable documents,Citations,Self-citations,Citations per document,H index,Energy Supply,Energy Supply per capita,% Renewable,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1
China,1,273437,272374,2336764,1615239,8.55,245,127191000000.0,93.0,19.75491,2752132000000.0,3550343000000.0,4594307000000.0,5101703000000.0,6087164000000.0,7551500000000.0,8532230000000.0,9570406000000.0,10475680000000.0,11061550000000.0
United States,2,175891,172431,2230544,724472,12.68,363,90838000000.0,286.0,11.57098,13814610000000.0,14451860000000.0,14712840000000.0,14448930000000.0,14992050000000.0,15542580000000.0,16197010000000.0,16784850000000.0,17527160000000.0,18238300000000.0
India,3,55082,53775,463165,162944,8.41,181,33195000000.0,26.0,14.96908,940259900000.0,1216735000000.0,1198896000000.0,1341887000000.0,1675615000000.0,1823050000000.0,1827638000000.0,1856722000000.0,2039127000000.0,2103588000000.0
Japan,4,50523,50065,488062,119930,9.66,193,18984000000.0,149.0,10.23282,4530377000000.0,4515265000000.0,5037908000000.0,5231383000000.0,5700098000000.0,6157460000000.0,6203213000000.0,5155717000000.0,4850414000000.0,4389476000000.0
United Kingdom,5,43389,42284,615670,111290,14.19,226,7920000000.0,124.0,10.60047,2706015000000.0,3093407000000.0,2921432000000.0,2412390000000.0,2481580000000.0,2659882000000.0,2704017000000.0,2783251000000.0,3065521000000.0,2932785000000.0
Germany,6,38739,38013,433148,95145,11.18,196,13261000000.0,165.0,17.90153,2992197000000.0,3421229000000.0,3730028000000.0,3397791000000.0,3396354000000.0,3744409000000.0,3527345000000.0,3732743000000.0,3883920000000.0,3356236000000.0
Russian Federation,7,36735,36560,115938,54993,3.16,90,30709000000.0,214.0,17.28868,989930500000.0,1299706000000.0,1660846000000.0,1222644000000.0,1524917000000.0,2045926000000.0,2208296000000.0,2292473000000.0,2059242000000.0,1363481000000.0
Canada,8,33472,32863,568080,100953,16.97,227,10431000000.0,296.0,61.94543,1319265000000.0,1468820000000.0,1552990000000.0,1374625000000.0,1617343000000.0,1793327000000.0,1828366000000.0,1846597000000.0,1805750000000.0,1556509000000.0
Italy,9,27983,26940,352993,87828,12.61,166,6530000000.0,109.0,33.66723,1947920000000.0,2210293000000.0,2398857000000.0,2191242000000.0,2134018000000.0,2291991000000.0,2087077000000.0,2141315000000.0,2159134000000.0,1835899000000.0
South Korea,10,27655,27445,328488,61531,11.88,155,11007000000.0,221.0,2.279353,1053217000000.0,1172614000000.0,1047339000000.0,943941900000.0,1144067000000.0,1253223000000.0,1278428000000.0,1370795000000.0,1484318000000.0,1465773000000.0


## Answer the following questions in the context of only the top 15 countries by Scimagojr Rank (aka the DataFrame returned by `answer_one()`)

### Question 2
What is the average GDP over the last 10 years for each country? (exclude missing values from this calculation.)

*This function should return a Series named `avgGDP` with 15 countries and their average GDP sorted in descending order.*

In [22]:
def answer_two():
    '''Calculate average GDP for top-15 countries'''
    # Get top-15 values
    Top15 = answer_one()

    # Get only [Year] columns
    avgGDP = Top15.filter(regex='\d+')

    # Calculate the average GDP for each country and sort by descending
    avgGDP = avgGDP.mean(axis=1).sort_values(ascending=False)
    
    return avgGDP

answer_two()

Country
United States         1.567102e+13
China                 6.927702e+12
Japan                 5.177131e+12
Germany               3.518225e+12
United Kingdom        2.776028e+12
France                2.687371e+12
Italy                 2.139775e+12
Brazil                1.988885e+12
Russian Federation    1.666746e+12
Canada                1.616359e+12
India                 1.602352e+12
Spain                 1.398587e+12
South Korea           1.221372e+12
Australia             1.206526e+12
Iran                  4.389286e+11
dtype: float64

### Question 3
By how much had the GDP changed over the 10 year span for the country with the 6th largest average GDP?

*This function should return a single number.*

In [44]:
def answer_three():
    '''Calculate how GDP was changed over the 10 years for 6th largst average GDP
    '''
    # get top-15 values
    Top15 = answer_one()

    # get 6th largest average GDP
    sixth_country_gdp = Top15.filter(items=['2006', '2015'], axis=1).loc['France']

    return sixth_country_gdp[1] - sixth_country_gdp[0]

answer_three()

119614244263.3833

### Question 4

Create a new column that is the ratio of Self-Citations to Total Citations. 
What is the maximum value for this new column, and what country has the highest ratio?

*This function should return a tuple with the name of the country and the ratio.*

In [80]:
def answer_four():
    '''Calculate the ratio of Self-citations to all Citations.
       Find the highest ratio and which country is it.
    '''
    # Get top-15 values
    Top15 = answer_one()

    # Get neeeded columns
    df_citations = Top15.filter(items=['Citations', 'Self-citations'])

    # Calculating [Citations Ratio]
    df_citations['Citations Ratio'] = df_citations['Self-citations']/df_citations['Citations']

    # Find the highest value
    max_cit_ratio = df_citations['Citations Ratio'].max()
    
    return (max_cit_ratio, df_citations.index[df_citations['Citations Ratio'] == max_cit_ratio].tolist()[0])

answer_four()

(0.6912289816173135, 'China')

### Question 5

Create a column that estimates the population using Energy Supply and Energy Supply per capita. 
What is the third most populous country according to this estimate?

*This function should return a single string value.*

In [90]:
\
def answer_five():
    '''Calculate population for top-15 countries using energy
    '''
    # get top-15 values
    Top15 = answer_one()

    # get needed values
    energy = Top15.filter(items=['Energy Supply', 'Energy Supply per capita'])

    # calculate population
    energy['Population'] = energy['Energy Supply']/energy['Energy Supply per capita']

    # sort values in [Population] column by descending
    energy = energy.sort_values(by='Population', ascending=False)
    
    return energy.index[energy['Population'] == energy['Population'][2]].tolist()[0]

answer_five()

'United States'

### Question 6
Create a column that estimates the number of citable documents per person. 
What is the correlation between the number of citable documents per capita and the energy supply per capita? Use the `.corr()` method, (Pearson's correlation).

*This function should return a single number.*


In [102]:
def answer_six():
    '''Calculate correlation between citable doc per capita and energy supply per capita
    '''
    # get top 15 values
    Top15 = answer_one()

    # get needed info
    taskSix_df = Top15.filter(items=['Citable documents', 'Energy Supply', 'Energy Supply per capita'])

    # calculate population
    taskSix_df['Population'] = taskSix_df['Energy Supply']/taskSix_df['Energy Supply per capita']

    # calculate citable documents per capita
    taskSix_df['Citable documents per capita'] = taskSix_df['Citable documents']/taskSix_df['Population']
    
    return taskSix_df['Citable documents per capita'].corr(taskSix_df['Energy Supply per capita'])

answer_six()

0.7434709127726777

### Question 7
Use the following dictionary to group the Countries by Continent, then create a dateframe that displays the sample size (the number of countries in each continent bin), and the sum, mean, and std deviation for the estimated population of each country.

```python
ContinentDict  = {'China':'Asia', 
                  'United States':'North America', 
                  'Japan':'Asia', 
                  'United Kingdom':'Europe', 
                  'Russian Federation':'Europe', 
                  'Canada':'North America', 
                  'Germany':'Europe', 
                  'India':'Asia',
                  'France':'Europe', 
                  'South Korea':'Asia', 
                  'Italy':'Europe', 
                  'Spain':'Europe', 
                  'Iran':'Asia',
                  'Australia':'Australia', 
                  'Brazil':'South America'}
```

*This function should return a DataFrame with index named Continent `['Asia', 'Australia', 'Europe', 'North America', 'South America']` and columns `['size', 'sum', 'mean', 'std']`*

In [170]:
def answer_seven():
    # get top-15 values
    Top15 = answer_one()

    # get needed info
    taskSeven_df = Top15.filter(items=['Energy Supply', 'Energy Supply per capita'])

    # calculate population
    taskSeven_df['Population'] = taskSeven_df['Energy Supply']/taskSeven_df['Energy Supply per capita']

    # create dict {[country]: [continent]}
    ContinentDict  = {'China':'Asia', 
                  'United States':'North America', 
                  'Japan':'Asia', 
                  'United Kingdom':'Europe', 
                  'Russian Federation':'Europe', 
                  'Canada':'North America', 
                  'Germany':'Europe', 
                  'India':'Asia',
                  'France':'Europe', 
                  'South Korea':'Asia', 
                  'Italy':'Europe', 
                  'Spain':'Europe', 
                  'Iran':'Asia',
                  'Australia':'Australia', 
                  'Brazil':'South America'}

    # add [Continent] column
    taskSeven_df['Continent'] = [ContinentDict[x] for x in taskSeven_df.index.tolist()]

    # get continents list
    continent_indexes = taskSeven_df.groupby('Continent').count().index.tolist()

    # find how much countries we have per continent
    countries_per_continent = taskSeven_df['Continent'].value_counts().sort_index().tolist()
    # print(taskSeven_df['Continent'].value_counts().sort_index().tolist())

    # sum population
    sum_population = taskSeven_df.groupby(['Continent']).sum()['Population'].tolist()
    # print(taskSeven_df.groupby(['Continent']).sum()['Population'].tolist())

    # mean population
    mean_population = taskSeven_df.groupby(['Continent']).mean()['Population'].tolist()
    # print(taskSeven_df.groupby(['Continent']).mean()['Population'].tolist())

    # std population
    std_population = taskSeven_df.groupby(['Continent']).std()['Population'].tolist()
    # print(taskSeven_df.groupby(['Continent']).std()['Population'])

    # create result df
    result_df = pd.DataFrame({ 'size': countries_per_continent, 'sum': sum_population, 'mean': mean_population, 'std': std_population }, index=continent_indexes, columns=['size', 'sum', 'mean', 'std'])

    return result_df

answer_seven()


Continent
Asia             6.790979e+08
Australia                 NaN
Europe           3.464767e+07
North America    1.996696e+08
South America             NaN
Name: Population, dtype: float64


NameError: name 'std_population' is not defined