# Assignment 3
This assignment requires more individual learning than the last one did. You are encouraged to check out the [pandas documentation](http://pandas.pydata.org/pandas-docs/stable/) to find functions or methods you might not have used yet, or to ask questions on [Stack Overflow](http://stackoverflow.com/) and tag them as pandas and python related. All questions are worth the same number of points except question 1, which is worth 23% of the assignment grade.

**Note**: Questions 2-12 rely on your question 1c answer.

In [3]:
import re
import pandas as pd
import numpy as np

# Filter all warnings. If you would like to see the warnings, please comment the two lines below.
import warnings
warnings.filterwarnings('ignore')

### Question 1(a)

Complete the function `load_data` below to load three datasets that we will use in subsequent questions. Be sure to follow the instructions below for each dataset *respectively*. 



**Energy**

Load the energy data from the file `assets/Energy Indicators.xls`, which is a list of indicators of [energy supply and renewable electricity production](assets/Energy%20Indicators.xls) from the [United Nations](http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls) for the year 2013, and should be put into a DataFrame with the variable name of `energy`.

Keep in mind that this is an Excel file, not a comma-separated values file. Also, make sure to exclude the footer and header information from the data file. The first two columns are unnecessary, so you should drop them, and you should change the column labels so that the columns are:

`['Country', 'Energy Supply', 'Energy Supply per Capita', '% Renewable']`

Convert `Energy Supply` to gigajoules (**Note: there are 1,000,000 gigajoules in a petajoule**). For all countries which have missing data (e.g. data with "...") make sure this is reflected as `np.NaN` values.

Rename the following list of countries (for use in later questions):

```python
"Republic of Korea": "South Korea",
"United States of America": "United States",
"United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
"China, Hong Kong Special Administrative Region": "Hong Kong"
```

There are also several countries with parentheses in their names. Be sure to remove these, e.g. `'Bolivia (Plurinational State of)'` should be `'Bolivia'`.
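
The clean-up steps above (stripping parenthetical suffixes, marking `...` as missing, converting petajoules to gigajoules) can be sketched on a made-up three-row frame. The sample rows, the footnote digit on `Australia1`, and the helper name `clean_country` are illustrative assumptions, not part of the real spreadsheet or the required solution.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample rows; the real file has many more countries.
sample = pd.DataFrame({
    'Country': ['Bolivia (Plurinational State of)', 'Australia1', 'Japan'],
    'Energy Supply': [336, '...', 18984],  # petajoules; '...' marks missing data
})

def clean_country(name):
    """Strip a parenthetical suffix and any trailing footnote digits."""
    name = re.sub(r'\s*\(.*\)', '', name)
    return re.sub(r'\d+$', '', name).strip()

sample['Country'] = sample['Country'].map(clean_country)
# errors='coerce' turns the '...' placeholder into NaN before the unit conversion.
sample['Energy Supply'] = pd.to_numeric(sample['Energy Supply'], errors='coerce') * 1_000_000

print(sample)
```

Coercing first and multiplying afterwards keeps the missing entries as `NaN`, since any arithmetic on `NaN` stays `NaN`.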



**GDP**

Next, load the GDP data from the file `assets/world_bank.csv`, which is a csv containing countries' GDP from 1960 to 2015 from [World Bank](http://data.worldbank.org/indicator/NY.GDP.MKTP.CD). Call this DataFrame `gdp`. 

Make sure to skip the header, and rename the following list of countries:

```python
"Korea, Rep.": "South Korea",
"Iran, Islamic Rep.": "Iran",
"Hong Kong SAR, China": "Hong Kong"
```



**ScimEn**

Finally, load the [Scimago Journal and Country Rank data for Energy Engineering and Power Technology](http://www.scimagojr.com/countryrank.php?category=2102) from the file `assets/scimagojr-3.xlsx`, which ranks countries based on their journal contributions in the aforementioned area. Call this DataFrame `scim_en`.

**For all three datasets, use country names as the index.**
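
As a small illustration of using country names as the index and renaming them with a dictionary, here is a sketch on a two-row frame. The frame `gdp_sample` and its GDP values are placeholders invented for the example, not real World Bank figures.

```python
import pandas as pd

# Hypothetical two-row extract; the GDP values are placeholders.
gdp_sample = pd.DataFrame({
    'Country Name': ['Korea, Rep.', 'Canada'],
    '2015': [1.0, 2.0],
})

# Move the names into the index, then map old labels to new ones.
gdp_sample = gdp_sample.set_index('Country Name').rename(index={'Korea, Rep.': 'South Korea'})

print(gdp_sample.index.tolist())
```

Labels that are not keys of the dictionary (here `Canada`) are left unchanged by `rename`.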

In [4]:
def check_parenthesis(string):
    # Drop a parenthetical suffix, e.g. 'Bolivia (Plurinational State of)' -> 'Bolivia'.
    string = re.sub(r'\s*\(.*', '', string)
    # Drop trailing footnote digits left over from the spreadsheet, e.g. 'Australia1' -> 'Australia'.
    return re.sub(r'\d+$', '', string).rstrip()

def convert_numbers(x):
    if x != '...':
        return x * 1000000
    else:
        return np.nan
    
    # YOUR CODE HERE
    #raise NotImplementedError()

def load_data():
    # Competency: reading files in Pandas, df manipulation, regex
    
    # The three variables are initialized to None. You will fill them with the correct values. 
    energy, gdp, scim_en = [None] * 3

    
    # YOUR CODE HERE
    #raise NotImplementedError()
    energy = pd.read_excel('assets/Energy Indicators.xls')
    # Keep only the country rows; .copy() avoids editing a slice of the raw frame in place.
    energy = energy[17:244].copy()
    energy.drop(energy.columns[[0, 1]], axis=1, inplace=True)
    energy.rename(columns={'Unnamed: 2': 'Country',
                           'Unnamed: 3': 'Energy Supply',
                           'Unnamed: 4': 'Energy Supply per Capita',
                           'Unnamed: 5': '% Renewable'}, inplace=True)
    energy['Country'] = energy['Country'].apply(check_parenthesis)
    # convert_numbers maps '...' to NaN and converts petajoules to gigajoules.
    energy['Energy Supply'] = energy['Energy Supply'].apply(convert_numbers)
    # Coerce the remaining measures to numeric so that '...' becomes NaN there as well.
    for column in ['Energy Supply per Capita', '% Renewable']:
        energy[column] = pd.to_numeric(energy[column], errors='coerce')
    energy.set_index('Country', inplace=True)
    energy.rename({"Republic of Korea": "South Korea",
                   "United States of America": "United States",
                   "United Kingdom of Great Britain and Northern Ireland": "United Kingdom",
                   "China, Hong Kong Special Administrative Region": "Hong Kong"}, axis='index', inplace=True)


    gdp = pd.read_csv('assets/world_bank.csv')
    new_header = gdp.iloc[3]
    gdp = gdp[4:]
    gdp.columns = new_header
    gdp.reset_index(inplace=True, drop=True)
    gdp.set_index('Country Name', inplace=True, drop=True)
    gdp.rename({"Korea, Rep.": "South Korea", 
                    "Iran, Islamic Rep.": "Iran",
                    "Hong Kong SAR, China": "Hong Kong"}, axis = 'index', inplace=True)
    gdp.columns = gdp.columns.astype(str)
    gdp.rename(columns ={'2006.0':'2006', '2007.0':'2007', '2008.0':'2008',
        '2009.0':'2009', '2010.0':'2010', '2011.0':'2011', '2012.0':'2012', '2013.0':'2013', '2014.0':'2014', '2015.0':'2015'},
    inplace=True)
    


    scim_en = pd.read_excel('assets/scimagojr-3.xlsx')
    scim_en.set_index('Country', inplace=True)    
    return energy, gdp, scim_en

energy, gdp, scim_en = load_data()
print(energy['Energy Supply'])

Country
Afghanistan                  3.210000e+08
Albania                      1.020000e+08
Algeria                      1.959000e+09
American Samoa                        NaN
Andorra                      9.000000e+06
                                 ...     
Viet Nam                     2.554000e+09
Wallis and Futuna Islands    0.000000e+00
Yemen                        3.440000e+08
Zambia                       4.000000e+08
Zimbabwe                     4.800000e+08
Name: Energy Supply, Length: 227, dtype: float64


In [84]:
# Cell for autograder.


### Question 1(b)

Suppose we take the intersection of the three datasets based on the country names. How many *unique* entries will we lose? Complete the function below so that it returns the answer as a single number. The Venn diagram in the next cell is worth a thousand words.

*This function should return a single (whole) number.*
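
One way to think about the count is "size of the union minus size of the intersection". The sketch below shows that idea on three tiny, made-up indexes; the country lists are illustrative assumptions and have nothing to do with the real datasets.

```python
import pandas as pd

# Hypothetical country indexes standing in for the three datasets.
a = pd.Index(['China', 'Japan', 'Aruba'])
b = pd.Index(['China', 'Japan', 'Iran'])
c = pd.Index(['China', 'Japan', 'Iran', 'Chad'])

# Every name that appears in at least one index.
union = a.union(b).union(c)
# Only the names that appear in all three.
common = a.intersection(b).intersection(c)

# Entries lost by keeping just the intersection.
lost = len(union) - len(common)
print(lost)
```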

In [1]:
%%HTML
<svg width="800" height="300">
  <circle cx="150" cy="180" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="blue" />
  <circle cx="200" cy="100" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="red" />
  <circle cx="100" cy="100" r="80" fill-opacity="0.2" stroke="black" stroke-width="2" fill="green" />
  <line x1="150" y1="125" x2="300" y2="150" stroke="black" stroke-width="2" fill="black" stroke-dasharray="5,3"/>
  <text x="300" y="165" font-family="Verdana" font-size="35">Everything but this!</text>
</svg>

In [168]:
energy, gdp, scim_en = load_data()
# Inner merges keep only the countries present in all three datasets.
twoFrames = pd.merge(energy, scim_en, how='inner', left_index=True, right_index=True)
threeFrames = pd.merge(twoFrames, gdp, how='inner', left_index=True, right_index=True)
print(len(threeFrames))


def answer_1b():
    # Competency: joining datasets, sets
    energy, gdp, scim_en = load_data()
    sizes = {}
    # 'outer' gives the union of country names, 'inner' gives the intersection.
    for how in ('outer', 'inner'):
        twoFrames = pd.merge(energy, scim_en, how=how, left_index=True, right_index=True)
        threeFrames = pd.merge(twoFrames, gdp, how=how, left_index=True, right_index=True)
        sizes[how] = len(threeFrames)
    # Entries lost = everything in the union that is not in the intersection.
    return sizes['outer'] - sizes['inner']
    
    

162


In [None]:
# Cell for autograder.


### Question 1(c)

Join the three datasets to form a new dataset, using the intersection of country names. Keep only the last 10 years (2006-2015) of GDP data and only the top 15 countries by Scimagojr 'Rank' (Rank 1 through 15). 

The index of the resultant DataFrame should still be the name of the country, and the columns should be 

```python
['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations',
 'Citations per document', 'H index', 'Energy Supply',
 'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008',
 '2009', '2010', '2011', '2012', '2013', '2014', '2015']
```

*This function should return a DataFrame with 20 columns and 15 entries.*
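
A minimal sketch of an index-based inner join followed by a "top N by rank" selection, using two made-up frames. The labels `A`, `B`, `C` and the values are hypothetical; only `pd.merge`, `sort_values`, and `head` are the real pandas calls being illustrated.

```python
import pandas as pd

# Hypothetical frames indexed by country label.
rank = pd.DataFrame({'Rank': [2, 1, 3]}, index=['B', 'A', 'C'])
supply = pd.DataFrame({'Energy Supply': [10.0, 20.0]}, index=['A', 'B'])

# how='inner' keeps only labels present in both indexes, so 'C' is dropped.
joined = pd.merge(rank, supply, how='inner', left_index=True, right_index=True)

# Sort by rank and keep the best-ranked row.
top = joined.sort_values('Rank').head(1)
print(top)
```

With `how='outer'` the row for `C` would be kept and padded with `NaN`, which is why the join type matters when the task asks for the intersection.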

In [5]:
def answer_1c():
    # Competency: df manipulation, joining datasets
    energy, gdp, scim_en = load_data()
    columns = ['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations',
               'Citations per document', 'H index', 'Energy Supply',
               'Energy Supply per Capita', '% Renewable', '2006', '2007', '2008',
               '2009', '2010', '2011', '2012', '2013', '2014', '2015']
    # Inner merges keep only the countries present in all three datasets (the intersection).
    twoFrames = pd.merge(energy, scim_en, how='inner', left_index=True, right_index=True)
    threeFrames = pd.merge(twoFrames, gdp, how='inner', left_index=True, right_index=True)
    # Sort by rank and keep ranks 1 through 15.
    resultData = threeFrames[columns].sort_values('Rank').head(15)
    resultData.index.name = 'Country'
    return resultData


# Module-level copy reused by the exploratory code in later cells.
resultData = answer_1c()

In [13]:
your_ans = answer_1c()

assert isinstance(your_ans, pd.DataFrame), "Q1c: Your function should return a DataFrame."

assert your_ans.shape == (15, 20), "Q1c: Your resultant DataFrame should have 20 columns and 15 entries."

assert list(your_ans.columns) == ['Rank', 'Documents', 'Citable documents', 'Citations', 'Self-citations',
                                 'Citations per document', 'H index', 'Energy Supply','Energy Supply per Capita', '% Renewable', 
                                 '2006', '2007', '2008','2009', '2010', '2011', '2012', '2013', '2014', '2015'] , "Q1c: The column names should be as specified in the question. "
del your_ans

In [None]:
# Cell for autograder.


**Note: all subsequent questions rely on the DataFrame returned by your function in Question 1(c) above.**

### Question 2
What is the average GDP over the last 10 years for each country?

*This function should return a Series named `avgGDP` with 15 countries and their average GDP sorted in descending order.*

In [71]:

testdf = resultData[['2006', '2007', '2008','2009', '2010', '2011', '2012', '2013', '2014', '2015']]
avgGDP = testdf.mean(axis=1)
avgGDP.sort_values(ascending=False, inplace=True)
print(avgGDP)

def answer_two():
    # Competency: indexing, math fn, sorting
    years = [str(year) for year in range(2006, 2016)]
    # Row-wise mean over the ten GDP columns, largest first.
    # Calling answer_1c() avoids depending on the global resultData, which later cells modify.
    avgGDP = answer_1c()[years].mean(axis=1).sort_values(ascending=False)
    return avgGDP.rename('avgGDP')

Country
United States         1.536434e+13
China                 6.348609e+12
Japan                 5.542208e+12
Germany               3.493025e+12
France                2.681725e+12
United Kingdom        2.487907e+12
Brazil                2.189794e+12
Italy                 2.120175e+12
India                 1.769297e+12
Canada                1.660647e+12
Russian Federation    1.565459e+12
Spain                 1.418078e+12
Australia             1.164043e+12
South Korea           1.106715e+12
Iran                  4.441558e+11
dtype: float64


In [9]:
your_ans = answer_two()

assert isinstance(your_ans, pd.Series), "Q2: You should return a Series. "
assert your_ans.name == "avgGDP", "Q2: Your Series should have the correct name. "

del your_ans

In [None]:
# Cell for autograder.


### Question 3

By how much had the GDP changed over the 10 year span for the country with the 6th largest average GDP?

*This function should return a single number.*

In [102]:
def answer_three():
    # Competency: indexing, broadcasting
    Top15 = answer_1c()
    # answer_two() is sorted by average GDP, so position 5 is the 6th largest.
    country = answer_two().index[5]
    return Top15.loc[country, '2015'] - Top15.loc[country, '2006']

In [None]:
# Cell for autograder.


### Question 4

What is the mean energy supply per capita?

*This function should return a single number.*

In [103]:
def answer_four():
    # Competency: math fn
    # Cast to float in case the column was loaded with object dtype.
    return answer_1c()['Energy Supply per Capita'].astype(float).mean()

In [None]:
# Cell for autograder.


### Question 5

What country has the maximum % Renewable and what is the percentage?

*This function should return a tuple with the name of the country and the percentage.*

In [115]:
resultData = answer_1c()
resultData.sort_values('% Renewable', inplace=True, ascending=False)
#print(resultData.first_valid_index)
#print(resultData['% Renewable'].loc['69.64803'])

def answer_five():
    # Competency: math fn
    renewable = answer_1c()['% Renewable'].astype(float)
    # idxmax returns the index label (the country) of the largest value.
    return (renewable.idxmax(), renewable.max())

<bound method NDFrame.first_valid_index of                     Rank  Documents  Citable documents  Citations  \
Country                                                             
Brazil              15.0     8668.0             8596.0    60702.0   
Canada               6.0    17899.0            17620.0   215003.0   
Spain               12.0     9428.0             9330.0   123336.0   
Italy               11.0    10964.0            10794.0   111850.0   
China                1.0   127050.0           126767.0   597237.0   
Germany              7.0    17027.0            16831.0   140566.0   
Russian Federation   5.0    18534.0            18301.0    34266.0   
France               9.0    13153.0            12973.0   130632.0   
India                8.0    15005.0            14841.0   128763.0   
Australia           14.0     8831.0             8725.0    90765.0   
United States        2.0    96661.0            94747.0   792274.0   
United Kingdom       4.0    20944.0            20357.0   206

In [110]:
your_ans = answer_five()

assert isinstance(your_ans, tuple), "Q5: Your function should return a tuple. "

assert isinstance(your_ans[0], str), "Q5: The first element in your result should be the name of the country. "

del your_ans

In [None]:
# Cell for autograder.


### Question 6

Create a new column that is the ratio of `Self-citations` to total `Citations`. 
What is the maximum value for this new column, and what country has the highest ratio?

*This function should return a tuple with the name of the country and the ratio.*
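
Column arithmetic in pandas is element-wise, so a ratio column is a single division. The sketch below uses a two-row toy frame with invented citation counts; the labels `X` and `Y` and the column name `Ratio` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical citation counts for two made-up countries.
toy = pd.DataFrame(
    {'Citations': [100, 50], 'Self-citations': [25, 20]},
    index=['X', 'Y'],
)

# Element-wise division aligns on the index.
toy['Ratio'] = toy['Self-citations'] / toy['Citations']

# idxmax gives the label of the maximum, max gives the value itself.
result = (toy['Ratio'].idxmax(), toy['Ratio'].max())
print(result)
```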

In [136]:
resultData = answer_1c()
#print(resultData.loc['Self-Citations'])
resultData["NewRatio"] = resultData['Self-citations'].div(resultData['Citations'], axis = 0)

#resultData.apply(lambda x: x['Self-citations']/x['Citations'], axis = 1)

    #energy['Country'] = energy['Country'].apply(lambda x: check_parenthesis(x))

resultData.sort_values('NewRatio', inplace=True, ascending=False)
print(resultData['NewRatio'])

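def answer_six():
    # Competency: math fn, broadcasting
    Top15 = answer_1c()
    ratio = Top15['Self-citations'] / Top15['Citations']
    # Return the country with the highest ratio together with that ratio.
    return (ratio.idxmax(), ratio.max())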

Country
China                 0.689313
Russian Federation    0.362517
United States         0.335031
Iran                  0.332782
India                 0.288973
Japan                 0.275997
Italy                 0.238364
Brazil                0.237159
France                0.218943
South Korea           0.197035
Germany               0.195111
Spain                 0.194299
Canada                0.190369
United Kingdom        0.183773
Australia             0.171939
Name: NewRatio, dtype: float64


In [None]:
your_ans = answer_six()

assert isinstance(your_ans, tuple), "Q6: Your function should return a tuple. "

assert isinstance(your_ans[0], str), "Q6: The first element in your result should be the name of the country. "

del your_ans

In [None]:
# Cell for autograder.


### Question 7

Create a column that estimates the population using `Energy Supply` and `Energy Supply per Capita`. 
What is the third most populous country according to this estimate?

*This function should return the name of the country*

In [None]:

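def answer_seven():
    # Competency: Broadcasting, sorting
    Top15 = answer_1c()
    # Population estimate = total energy supply / energy supply per person.
    Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita'].astype(float)
    # Position 2 of the descending sort is the third most populous country.
    return Top15['PopEst'].sort_values(ascending=False).index[2]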

In [None]:
assert isinstance(answer_seven(), str), "Q7: Your function should return the name of the country. "

In [None]:
# Cell for autograder.


### Question 8

Create a column that estimates the number of citable documents per person. 
What is the correlation between the number of citable documents per capita and the energy supply per capita? Use the `.corr()` method, (Pearson's correlation).

*This function should return a single number.*
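
The per-capita step is easy to miss: the correlation is between two per-person quantities, not between raw document counts and energy per capita. A sketch on three invented rows follows; the column values and the intermediate names `pop_est` and `citable_per_capita` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical values for three made-up countries.
toy = pd.DataFrame({
    'Energy Supply': [100.0, 200.0, 300.0],
    'Energy Supply per Capita': [10.0, 20.0, 15.0],
    'Citable documents': [5.0, 20.0, 40.0],
})

# Estimated population, then documents per person.
pop_est = toy['Energy Supply'] / toy['Energy Supply per Capita']
citable_per_capita = toy['Citable documents'] / pop_est

# Pearson correlation between the two per-capita series.
r = citable_per_capita.corr(toy['Energy Supply per Capita'], method='pearson')
print(r)
```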

In [155]:
def answer_eight():
    # Competency: Broadcasting, math fn, correlation, visualization
    Top15 = answer_1c()
    energy_per_capita = Top15['Energy Supply per Capita'].astype(float)
    # Estimate the population, then express citable documents per person.
    pop_est = Top15['Energy Supply'] / energy_per_capita
    citable_per_capita = Top15['Citable documents'] / pop_est
    return citable_per_capita.corr(energy_per_capita, method='pearson')

In [156]:
assert -1 <= answer_eight() <= 1, "Q8: A valid correlation should be between -1 to 1. "

In [None]:
# Cell for autograder.


### Question 9

Create a new column with a 1 if a country's `% Renewable` value is **at or above** the median, and a 0 otherwise for all countries in the top 15. 

*This function should return a series named `HighRenew` whose index is the country name sorted in ascending order of rank.*
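
A boolean comparison against the median can be cast straight to 0/1 integers. The sketch below uses three invented percentages; the labels `A`, `B`, `C` are hypothetical.

```python
import pandas as pd

# Hypothetical % Renewable values for three made-up countries.
renewable = pd.Series([10.0, 30.0, 20.0], index=['A', 'B', 'C'], name='% Renewable')

# True where the value is at or above the median, then cast True/False to 1/0.
high = (renewable >= renewable.median()).astype(int).rename('HighRenew')
print(high)
```

Computing the median from the data, rather than typing it in, keeps the flag correct if the underlying values change.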

In [145]:
def answer_nine():
    # Competency: df querying, math fn, variable encoding
    Top15 = answer_1c()
    renewable = Top15['% Renewable'].astype(float)
    # Compute the median from the data instead of hard-coding it.
    HighRenew = (renewable >= renewable.median()).astype(int)
    # answer_1c() is already sorted by rank, so the order is preserved.
    return HighRenew.rename('HighRenew')

print(answer_nine())

Country
China                 1
United States         0
Japan                 0
United Kingdom        0
Russian Federation    1
Canada                1
Germany               1
India                 0
France                1
South Korea           0
Italy                 1
Spain                 1
Iran                  0
Australia             0
Brazil                1
Name: HighRenew, dtype: int64


In [146]:
assert isinstance(answer_nine(), pd.Series), "Q9: Your function should return a Series. "

In [None]:
# Cell for autograder.


### Question 10

Use the following dictionary to group the `Countries` by `Continent`, then create a DataFrame that displays the sample size (the number of countries in each continent bin), and the sum, mean, and *population* standard deviation of the estimated population for each country. 

```python
ContinentDict  = {'China':'Asia', 
                  'United States':'North America', 
                  'Japan':'Asia', 
                  'United Kingdom':'Europe', 
                  'Russian Federation':'Europe', 
                  'Canada':'North America', 
                  'Germany':'Europe', 
                  'India':'Asia',
                  'France':'Europe', 
                  'South Korea':'Asia', 
                  'Italy':'Europe', 
                  'Spain':'Europe', 
                  'Iran':'Asia',
                  'Australia':'Australia', 
                  'Brazil':'South America'}
```

*This function should return a DataFrame with index named Continent `['Asia', 'Australia', 'Europe', 'North America', 'South America']` and columns `['size', 'sum', 'mean', 'std']`*

**Special Note: make sure you are indeed calculating the *population* std rather than the sample std. When in doubt, carefully check the documentation of the function you plan to use. Don't take things for granted.**
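
The difference between the two definitions comes down to the `ddof` argument. The sketch below compares them on a toy grouping; the continent labels and `PopEst` values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical population estimates; 'Oceania' has a single member.
toy = pd.DataFrame({
    'Continent': ['Asia', 'Asia', 'Oceania'],
    'PopEst': [2.0, 4.0, 5.0],
})
grouped = toy.groupby('Continent')['PopEst']

sample_std = grouped.std()            # pandas default: ddof=1 (sample std)
population_std = grouped.std(ddof=0)  # ddof=0 (population std)

print(pd.DataFrame({'sample': sample_std, 'population': population_std}))
```

For a group with a single member, the sample standard deviation is `NaN` (division by zero degrees of freedom), while the population standard deviation is `0.0`.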

In [15]:
def getContinent(x):
    ContinentDict  = {'China':'Asia', 
                  'United States':'North America', 
                  'Japan':'Asia', 
                  'United Kingdom':'Europe', 
                  'Russian Federation':'Europe', 
                  'Canada':'North America', 
                  'Germany':'Europe', 
                  'India':'Asia',
                  'France':'Europe', 
                  'South Korea':'Asia', 
                  'Italy':'Europe', 
                  'Spain':'Europe', 
                  'Iran':'Asia',
                  'Australia':'Australia', 
                  'Brazil':'South America'}
    return ContinentDict[x]

resultData = answer_1c()
temp = resultData.index.to_series()
#print(temp)
resultData['Continent'] = temp.apply(lambda x: getContinent(x))
#print(resultData['Continent'])
resultData['Country'] = resultData.index.to_series()
resultData.set_index(['Continent','Country'], inplace=True)
#resultData.groupby(by=getContinent)
#resultData['size'] = 
print(resultData)


#for item in resultData.groupby(by=getContinent):
    #print(i)

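def answer_ten():
    # Competency: mapping, groupby, agg
    Top15 = answer_1c()
    Top15['PopEst'] = Top15['Energy Supply'] / Top15['Energy Supply per Capita'].astype(float)
    Top15['Continent'] = Top15.index.map(getContinent)
    # Note: pandas' 'std' uses ddof=1 (sample std), which gives the NaN that the check
    # below expects for single-country continents; ddof=0 would give 0.0 instead.
    return Top15.groupby('Continent')['PopEst'].agg(['size', 'sum', 'mean', 'std'])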

                                  Rank  Documents  Citable documents  \
Continent     Country                                                  
Asia          China                1.0   127050.0           126767.0   
North America United States        2.0    96661.0            94747.0   
Asia          Japan                3.0    30504.0            30287.0   
Europe        United Kingdom       4.0    20944.0            20357.0   
              Russian Federation   5.0    18534.0            18301.0   
North America Canada               6.0    17899.0            17620.0   
Europe        Germany              7.0    17027.0            16831.0   
Asia          India                8.0    15005.0            14841.0   
Europe        France               9.0    13153.0            12973.0   
Asia          South Korea         10.0    11983.0            11923.0   
Europe        Italy               11.0    10964.0            10794.0   
              Spain               12.0     9428.0             93

In [None]:
your_ans = answer_ten()

assert isinstance(your_ans, pd.DataFrame), "Q10: Your function should return a DataFrame. "

assert your_ans.shape[0] == 5, "Q10: You have an incorrect number of rows. "

assert your_ans.shape[1] == 4, "Q10: You have an incorrect number of columns. "

assert list(your_ans.index) == ['Asia', 'Australia', 'Europe', 'North America', 'South America'], "Q10: You have a wrong index. "

assert list(your_ans.columns) == ['size', 'sum', 'mean', 'std'], "Q10: You have wrong column names. "

assert np.isclose(your_ans.loc["Asia", "sum"], 2898666386.6106005, rtol=0.0, atol=1e-5), "Q10: The sum value for Asia should be around 2898666386.6106005. "

assert np.isclose(your_ans.loc["Europe", "mean"], 76321611.20272864, rtol=0.0, atol=1e-5), "Q10: The mean value for Europe should be around 76321611.20272864. "

assert np.isnan(your_ans.loc["South America", "std"]), "Q10: South America should have a NaN std. "

del your_ans

In [None]:
# Cell for autograder.


### Question 11
Cut `% Renewable` into 5 bins. Group the top 15 countries by `Continent` as well as these new `% Renewable` bins. How many countries are there in each of these groups?

*This function should return a Series with a MultiIndex of `Continent`, then the bins for `% Renewable`. Do not include groups with no countries.*
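
A sketch of binning with `pd.cut` and counting per (continent, bin) pair on a four-row toy frame. The values and the column name `bins` are assumptions for illustration; `observed=True` is used here so that empty bin combinations are left out.

```python
import pandas as pd

# Hypothetical % Renewable values for four made-up countries.
toy = pd.DataFrame({
    'Continent': ['Asia', 'Asia', 'Europe', 'Europe'],
    '% Renewable': [2.0, 4.0, 50.0, 98.0],
})

# Split the observed value range into 5 equal-width intervals.
toy['bins'] = pd.cut(toy['% Renewable'], 5)

# Count rows per (Continent, bin); observed=True skips combinations with no rows.
counts = toy.groupby(['Continent', 'bins'], observed=True).size()
print(counts)
```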

In [21]:
resultData = answer_1c()
resultData['Continent'] = temp.apply(lambda x: getContinent(x))
resultData.set_index(['Continent',resultData.index], inplace=True)
#resultData.groupby('Continent')
print(resultData)

def answer_eleven():
    # Competency: cut, groupby, math fn
    
    # YOUR CODE HERE
    raise NotImplementedError()



In [None]:
your_ans = answer_eleven()

assert isinstance(your_ans, pd.Series), "Q11: Your function should return a Series. "

assert len(your_ans) == 9, "Q11: Your answer should have 9 rows. "

del your_ans

In [None]:
# Cell for autograder.


### Question 12

Write a function to display the estimated population as a string with thousands separator (using commas). Use all significant digits, namely, do not round the results.

e.g. 12345678.90 -> 12,345,678.90

*This function should return a series `PopEst` whose index is the country name and whose values are the population estimate string*

**Special Note: make sure your `Energy Supply` column is of a numerical type rather than "object".**
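
Python's format mini-language has a `,` option that inserts thousands separators. A sketch on two invented numbers follows; the labels `A` and `B` are hypothetical.

```python
import pandas as pd

# Hypothetical population estimates.
pop = pd.Series([12345678.9, 1234.5], index=['A', 'B'], name='PopEst')

# '{:,}' adds commas as thousands separators without rounding the float.
formatted = pop.map('{:,}'.format)
print(formatted)
```

This only works on numeric values, which is one reason the `Energy Supply` column needs a numeric dtype before the population estimate is computed.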

In [None]:
def answer_twelve():
    # Competency: lambdas, format
    
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
your_ans = answer_twelve()

assert isinstance(your_ans, pd.Series), "Q12: Your function should return a Series. "

assert len(your_ans) == 15, "Q12: There should be 15 countries. "

del your_ans

In [None]:
# Cell for autograder.


### Optional

Use the built-in function `plot_optional()` to see an example visualization.

In [None]:
def plot_optional():
    import matplotlib.pyplot as plt
    %matplotlib inline
    Top15 = answer_1c()
    ax = Top15.plot(x='Rank', y='% Renewable', kind='scatter', 
                    c=['#e41a1c','#377eb8','#e41a1c','#4daf4a','#4daf4a','#377eb8','#4daf4a','#e41a1c',
                       '#4daf4a','#e41a1c','#4daf4a','#4daf4a','#e41a1c','#dede00','#ff7f00'], 
                    xticks=range(1,16), s=6*Top15['2014']/10**10, alpha=.75, figsize=[16,6]);

    for i, txt in enumerate(Top15.index):
        # Use .iloc for positional access; the Series is indexed by country name.
        ax.annotate(txt, [Top15['Rank'].iloc[i], Top15['% Renewable'].iloc[i]], ha='center')

    print("This is an example of a visualization that can be created to help understand the data. \
This is a bubble chart showing % Renewable vs. Rank. The size of the bubble corresponds to the countries' \
2014 GDP, and the color corresponds to the continent.")
    
plot_optional()