# CE-157 Problem Set 3

Run the cell below before continuing. It imports the libraries that the rest of this problem set depends on.

In [None]:
import matplotlib
matplotlib.use('Agg')
%matplotlib inline
import matplotlib.pyplot as plt
from datascience import *
import numpy as np
import pandas as pd

#### Shortcuts for column names<a id='section8'></a>

|Column Name   | Description | 
|--------------|---------|
|co2_commulative |Historical Emissions A Cumulative CO2 emission from energy, 1850-2007 (million tonnes) |
|ghg_commulative | Historical Emissions B Cumulative GHG Emissions, 1990-2010 (million tonnes CO2 equivalent) |
|ghg_2010| Current GHG Emissions Total GHG Emissions, 2010 (million tonnes CO2 equivalent)|
|co2_2011|Current CO2 Emissions CO2 emissions from fossil fuel combustion, 2011 (million tonnes) |
|change_1971_2011|Change from 1971–2011 (%) |
|change_1990_2011|Change from 1990–2011 (%) |
|total_footprint|Total carbon footprint Footprint of all goods and services consumed (million tonnes CO2 equivalent) |
|pop_2010|Population 2010|
|gdp_ppp_2010|GDP-PPP 2010 (Million $ (2005))|
|hdi_2011|HDI, 2011 |
|hdi_change_1990_2011|HDI Change from 1990-2011 (%) |
|gender_inequality_2012|Gender Inequality Index Value, 2012 |
|maternal_2010|Maternal Mortality Ratio, 2010 |

#### GENERAL NOTES:
1. **A tip to navigate Jupyter**: Pull up the documentation for any function in Jupyter by typing the function name, then `<Shift>-<Tab>` on your keyboard. This is very useful when you want to know what arguments a function takes, or the order of the arguments in a function. You can press `<Tab>` multiple times to expand the docs.
2. **How to write comments in python?**
    * Type 1: You can create a single-line comment by simply beginning a line with a hash (#). These are usually for yourself. Look at the example below for reference.
    * Type 2: You can use multi-line comments or paragraphs that serve as documentation for others reading your code. Look at the example below for reference.
    * Throughout the notebook, you will find several comments written for you so that you can understand what the code does. We will also ask you to comment out some lines of code to make sure you don't get any assertion errors.
    
source: https://www.pythonforbeginners.com/comments/comments-in-python    

In [None]:
# Type 1:
#This would be a comment in Python

# Type 2:
def comment(x):
    """
    This function prints the comment stored in the variable x.

    Arguments:
    x: string

    Example:
    > x = "Hello world!"
    > comment(x)
    >>> This function prints the comment stored in the variable x. The comment is: Hello world!

    """
    print(f"This function prints the comment stored in the variable x. The comment is: {x}")

#### Alright! Let's begin!

In this problem set, you will dive into some publicly available data on climate change, economic growth, and human development in an attempt to understand a little about the complex relationships between these parameters. With each chart you create, be sure to label your axes, create a chart title, and provide a simple regression line (including the R² value). Note that you don't need a chart legend if you only have one set of data. Remember: presentation is important! Also remember that a robust analysis would use far more in-depth statistics, focusing on each component of your regression model (both the size and the significance of its effect), but for the purposes of this problem set, linear regression slopes and R² values will do.




## Table of Contents 

1 - [Introduction to Plotting](#section1)<br>

2 - [Correlation](#section2)<br>

3 - [Regression](#section3)<br>

4 - [Example: Putting it all together](#section4)<br>

5 - [Recap](#section5)<br>

6 - [Problems](#section6)<br>

7 - [Final Survey](#section7)<br>

The data is in a table named `problem_set` (Run the next cell to see what it looks like).

In [None]:
problem_set=Table.read_table('problem_set.csv')
problem_set

## Introduction to Plotting <a id='section1'></a>
#### A quick tutorial on how to plot with NumPy and Matplotlib

Plotting is one of the most important steps of exploratory data analysis. It can help us uncover things we could not perceive by simply looking at summary statistics or at the first 10 rows of our data set. Python has two very helpful libraries, NumPy and Matplotlib, that contain many handy functions to make plotting easy and intuitive. 

Some of the functions we might find useful in this problem set are: 
   - Plot
   - Scatter

#### Scatter Plots

Let's explore the **scatter** function. Using a scatter plot, you can take any two columns of a table and plot them against each other quite easily! For example, the scatter plot for Total GHG Emissions, 2010 vs. GDP-PPP 2010 (Million $ (2005)) should look like:

<img src="images/plot_no_labels.png" style="height:300px" align="left" float="left"/>




Now, let's create a function that will help us create a scatter plot. 

In [None]:
def scatter(x, y):
    """
    Generate a scatter plot using x and y

    Arguments:
    x -- the vector of values x
    y -- the vector of values y
    
    Example:
    x = problem_set.column("column name")
    y = problem_set.column("column name")
    p1= scatter(x,y)
    
    Tip: You can use plt.scatter(x, y, s=6) to change the size of your points to 6. Try any number!
    """
    plt.figure(figsize=(8, 6))
    plt.scatter(x,y)  
    

# Do not worry too much about the implementation yet, we will discuss the details later.

#### NOTE: Make sure to save your plots to a variable as it is done in the example (e.g. p1=scatter(x,y)). You will need this to run future functions.

#### YOUR TURN
1. Complete the following function by filling in the (...) below with the appropriate arguments.
2. Use [plot, xlab, ylab, plot_title] to fill in the blanks. Note not all of them are going to be used.

Make sure to comment out the line that reads **"raise NotImplementedError()"** by using **#** once you have completed the question. 

In [None]:
def put_label_on_plot(plot, xlab, ylab, plot_title):
    """
    Generate labels for a plot using plot, xlab, ylab, and plot title

    Arguments:
    plot -- Any plot. For instance, a plot created using the scatter function above
    xlab -- Label for x-axis 
    ylab -- Label for y-axis 
    plot_title -- Title for your plot 
    
       
    Example:
    x = problem_set.column("column name")
    y = problem_set.column("column name")
    my_scatter = scatter(x, y)
    put_label_on_plot(my_scatter, "name for x axis", "name for y axis ", "plot title")
    """
    
    # YOUR CODE HERE
    plt.xlabel(...)  
    plt.ylabel(...)  
    plt.title(...)   
    
raise NotImplementedError()

#### Labeling:

If we look back at the scatter plot created above, we will notice that it is very hard to interpret since we do not know what x and y mean. For us to make any inferences or useful observations about the plot, we need more context. 
The following functions help you, and others reading your plots, understand what the plot shows. Some helpful functions that we can use to enhance our plots are:

* plt.xlabel()
* plt.ylabel()
* plt.title()

The `put_label_on_plot` utility function above wraps these calls so that we can label our plots.

####  Running the two functions above on **Total GHG Emissions, 2010** and **GDP-PPP 2010 (Million (2005))** should produce the following plot:

 <img src="images/E157AC_scatter_2.png" style="height:300px" align="left" float="left"/>

#### YOUR TURN <a id='section9'></a>
* Use the functions above (scatter and put_label_on_plot) to recreate the plot above. 
* Make sure to replace the ... with your solution 
* Do not forget to give your plot meaningful labels. 
* Do not forget to comment out the implementation test.

Tips:
* How do I know if my labels are good enough?
    * Can someone who is not taking the class understand your plot without any additional information? If yes, you did it! If not, try to find more meaningful names for your plot and labels. 

In [None]:
# YOUR CODE HERE
x =  problem_set.column("ghg_2010")
y = problem_set.column("gdp_ppp_2010")

test_plot = scatter(..., ...)

put_label_on_plot(test_plot, ..., ..., ...) 


raise NotImplementedError()

## Correlation <a id='section2'></a>



#### The correlation coefficient - *r*

* r is a numerical measure of correlation ranging from -1 to 1. It gives us information about the strength of the relationship between two variables.
* Although there are different types of correlation, we will use "Pearson's correlation" or "Pearson's R".

#### How to interpret *r* ?
* r=1: The points fall exactly on a line with positive slope: for every unit increase in one variable, say X, the other variable, say Y, increases by a fixed amount.
* r=-1: The points fall exactly on a line with negative slope: for every unit increase in X, Y decreases by a fixed amount.
* r=0: There is no linear association: an increase in X is accompanied by neither a systematic increase nor a systematic decrease in Y. This means that the two variables are uncorrelated.
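
As a small illustration of these three cases, the sketch below computes Pearson's r with NumPy's `np.corrcoef` on made-up arrays. The data and the helper name `pearson_r` are invented for this example and do not come from `problem_set`.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical data chosen only to illustrate the three cases
y_up = 2 * x + 1                                  # points on an increasing line
y_down = -3 * x + 10                              # points on a decreasing line
y_flat = np.array([1.0, -1.0, 0.0, -1.0, 1.0])    # no linear trend

def pearson_r(a, b):
    """Pearson's r, read off NumPy's 2x2 correlation matrix."""
    return np.corrcoef(a, b)[0, 1]

print(round(pearson_r(x, y_up), 6))
print(round(pearson_r(x, y_down), 6))
print(round(pearson_r(x, y_flat), 6))
```

The first two values come out at the extremes of the range (1 and -1), while the third is zero up to floating-point rounding.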


#### These are some examples of different correlations *r*

<img src="images/E157AC_correlation-examples.svg" style="height:300px" align="left" float="left"/>

source: mathsisfun.com

Now that you have a visual and conceptual idea of what correlation is, we can use the functions defined below to calculate the correlation for the variables in your data set.

You do not have to worry about understanding the implementation of the two functions below; you just need to know how to use and interpret the results of the *correlation* function we created for you. 

If you would like to learn more about how to obtain the correlation coefficient, please visit https://www.inferentialthinking.com/chapters/15/1/Correlation for more details.

In [None]:
def standard_units(x):
    """
    Convert any array of numbers to standard units.
    """
    
    return (x - np.average(x))/np.std(x)

In [None]:
def correlation(t, x, y):
    """
    Determines the correlation between the x and y variables
    
    Arguments:
    t: Data Frame 
    x: Column in your data frame
    y: Column in your data frame
    """
    x_in_standard_units = standard_units(t.column(x))
    y_in_standard_units = standard_units(t.column(y))
    return np.average(x_in_standard_units * y_in_standard_units)

#### YOUR TURN
Find the correlation for the two variables we explored in the previous example, namely "ghg_2010" and "gdp_ppp_2010". Replace the (...) with your solution. 

Make sure to comment out **raise NotImplementedError()** on the following questions. 

In [None]:
#YOUR CODE HERE 
t= ...   
x="ghg_2010" 
y="gdp_ppp_2010"
correlation(t,x,y)

raise NotImplementedError()

**Note** that cleaning data for analysis can be very tedious, but skipping it can be detrimental to your analysis if you are not careful!

Let's break it down. The correlation for "ghg_2010" and "gdp_ppp_2010" returned `nan`. Did something go wrong in the implementation of the correlation function? The answer is no. It has to do with how `correlation` is computed: the function takes the average of the product of `x_in_standard_units` and `y_in_standard_units`. Let's take a look at `x_in_standard_units` to see if you can figure out what is wrong with it.

In [None]:
x_in_standard_units =standard_units(t.column(x))
x_in_standard_units 

What do you notice?

In [None]:
correlation_exp = " "
print(correlation_exp)


raise NotImplementedError

From the `standard_units` function above we can see that standard units are calculated by performing some mathematical operations on the column (variable) x, particularly by taking the average and the standard deviation of its elements. Now, let's take a look at the variable "ghg_2010" in our data set. 

In [None]:
t.column(x)

By inspecting x, we can see that there are a couple of `nan` (missing) values in the column. These values break our two functions because we **cannot** take the average of an unknown value. Now it's your turn to try it! 

In [None]:
var = [1, 2, 3, np.nan, 4, 5]
avg= np.average(var)
avg

Now remove the `nan` value from the list and save the new list into the variable `new_var` to see what happens. 

In [None]:
new_var = ... 
new_avg= np.average(new_var)
new_avg

raise NotImplementedError()

The key takeaway is that we cannot simply run these functions mindlessly and hope to get a solution. Cleaning the data is one of the most important aspects of conducting effective analysis. From this example we learned that one way to handle this situation is to remove the unknown values. In a later section we will create a function to automate the cleaning of our data. 
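
For arrays that are too long to edit by hand, one possible approach is to filter with a boolean mask. The sketch below uses made-up values and `np.isnan`; it is only an illustration of the idea, not the cleaning function used later in this notebook.

```python
import numpy as np

# Hypothetical values; np.nan marks a missing observation
values = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# A single nan propagates through the arithmetic, so the average is nan
print(np.average(values))

# Keep only the entries that are not nan, then average those
mask = ~np.isnan(values)
clean_values = values[mask]
print(clean_values)
print(np.average(clean_values))   # (1 + 2 + 4 + 5) / 4 = 3.0
```

Filtering a single array this way is fine for an average, but for two paired columns the same rows must be dropped from both so that the pairs stay aligned.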

## Regression <a id='section3'></a>

#### What is a regression line?

* Linear regression allows us to model the relationship between two or more variables with a line.
    * For the purpose of this homework we will only be working with two: X, the independent variable, and Y, the dependent variable.
* Linear regressions allow us to predict one variable from another. In this case we will predict Y from X.
* There are many predictive models that help us find estimates for our data. The goal of a simple linear regression is to create a linear model that minimizes the sum of squared errors, making it the line of "best fit".
    * The error represents how far off our observations (real data) are from our predicted values (data from our model).
    * For instance, if our sample had only one data point, say 5, and our model predicted that it would be 7, our squared error would be (5-7)^2 = 4.
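
To make the "sum of squared errors" idea concrete, here is a minimal sketch on invented data. It compares the least-squares line returned by `np.polyfit` with two arbitrary candidate lines; the helper name `sum_squared_errors` is hypothetical.

```python
import numpy as np

# Hypothetical observations, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 8.0])

def sum_squared_errors(slope, intercept):
    """Sum of squared differences between observed y and the line's predictions."""
    predicted = slope * x + intercept
    return np.sum((y - predicted) ** 2)

# Least-squares line from NumPy (degree-1 polynomial fit)
best_slope, best_intercept = np.polyfit(x, y, 1)

sse_best = sum_squared_errors(best_slope, best_intercept)
sse_candidate_a = sum_squared_errors(2.0, 0.0)   # an arbitrary candidate line
sse_candidate_b = sum_squared_errors(1.0, 2.0)   # another arbitrary candidate

print(sse_best, sse_candidate_a, sse_candidate_b)
```

The fitted line has the smallest sum of the three, which is what "best fit" means here.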

<img src="images/residuals.png" style="height:400px" align="left" float="left"/>


* The slope of the regression line (the "line of averages") can be found using the r we found before!
* The sign of this slope tells us whether our correlation is positive or negative.

No matter the shape of the scatter plot, this unique line minimizes the mean squared error of estimation!
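
The relationship between r and the slope can be checked numerically. The sketch below, on invented data, computes the slope as r times the ratio of standard deviations and the intercept from the two means, then compares both against NumPy's least-squares fit. The variable names are assumptions made for this example.

```python
import numpy as np

# Hypothetical data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = np.corrcoef(x, y)[0, 1]

# slope = r * SD(y) / SD(x); the intercept makes the line pass through the means
slope_from_r = r * np.std(y) / np.std(x)
intercept_from_r = np.mean(y) - slope_from_r * np.mean(x)

# Reference values from NumPy's least-squares fit
fit_slope, fit_intercept = np.polyfit(x, y, 1)

print(slope_from_r, fit_slope)
print(intercept_from_r, fit_intercept)
```

Both routes should agree up to floating-point rounding.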

#### Formula for linear regression?
First some *definitions*:

General format: $Y_i =\beta_0 +\beta_1X_i+\epsilon_i$, but when we are doing simple linear regression we write it as $E(Y_i) =\beta_0 +\beta_1X_i$, which is the mean or expected value of Y for a given X.

* notice that it looks very similar to : $y=mx +b$

$Y_i$: Dependent (response variable)

$\beta_0$: Y Intercept

$\beta_1$: Slope

$\epsilon_i$: Random Error (unexplained variation in Y)

#### What is R-squared (coefficient of determination)?

* R-squared allows us to see how close our data are to the regression line
* It is the percentage of the response variable variation that is explained by a linear model.
    * R-squared = Explained variation / Total variation
    * R-squared is always between 0 and 100%:
        * A 0% indicates that the model explains none of the variability of the response data around its mean.
        * A 100% indicates that the model explains all the variability of the response data around its mean.
* In general, the higher the R-squared, the better the model fits your data.
* However, R-squared cannot tell you whether:
    * the predictions are biased
    * the regression model is adequate
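
As a rough check of "Explained variation / Total variation", the sketch below fits a line to invented data with `np.polyfit` and computes R-squared two ways: from the variation ratio and by squaring Pearson's r. For a simple linear regression with one predictor, the two should match.

```python
import numpy as np

# Hypothetical data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

slope, intercept = np.polyfit(x, y, 1)   # least-squares line
predicted = slope * x + intercept

# Variation of the observations and of the predictions around the mean of y
total_variation = np.sum((y - np.mean(y)) ** 2)
explained_variation = np.sum((predicted - np.mean(y)) ** 2)

r_squared_from_variation = explained_variation / total_variation
r_squared_from_r = np.corrcoef(x, y)[0, 1] ** 2

print(r_squared_from_variation)
print(r_squared_from_r)
```

Here the line explains 60% of the variation in y.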

source: https://www.inferentialthinking.com/chapters/15/2/Regression_Line

source: http://statisticsbyjim.com/regression/interpret-r-squared-regression/

source: http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit



## Example: Putting it all together<a id='section4'></a> 

<img src="images/father_son.png" style="height:500px" align="left" float="left"/>

* Notice that in order to find a line that helps us predict values, all we need to do is find a slope and a y-intercept.
* In this example, the regression line predicts the son's height from the father's height.
* The equation for the regression line would look like:

$SONH_i= \beta_0 + \beta_1 FATHERH_i + \epsilon_i$

source: https://galton.uchicago.edu/~wichura/Stat200/Handouts/C10.pdf

Note: In the Problems section we will go over utility functions that will let us create a regression line just like the one in the father-son example.

## Recap <a id='section5'></a>

Up to now you have learned:

* What a scatter plot is and how to create it
* How to label your plots
* What the correlation coefficient (r) is, how to interpret it, and how to calculate it
* What linear regression is, how to interpret its equation, and how to calculate it
* What the coefficient of determination (r_squared) is, how to interpret it, and how to calculate it


Now, you are ready to answer the rest of the questions.

## Problems <a id='section6'></a>


#### READ BEFORE ANSWERING THE QUESTIONS


When you explain your graphs below, do not just describe them; interpret and explain them. Are there any correlations (or lack thereof)? Is there anything else that we may learn from the graph? You can use what you learned about r_squared and the betas when writing down your interpretations.

Finally, the scatter plots you are asked to create are written below using the standard convention of dependent vs. independent (i.e. plot "Y vs. X").

Make sure to comment out **raise NotImplementedError()** on the questions below.

## Problem 1

Make a chart of Total GHG Emissions, 2010 vs. GDP. 

Make sure to fill the (...) with the appropriate values. If you need help, please refer back to the [example](#section9)<br> we did before.

**Note**: If you forget what each column name stands for, you can access the table at the beginning of the notebook [here](#section8)<br>

In [None]:
# YOUR CODE HERE
x =  problem_set.column(...)
y = problem_set.column(...)

... = scatter(x,y)

put_label_on_plot(test_plot, ..., ..., ...) 


raise NotImplementedError()


What do you notice? Write down a few thoughts about why you think your plot looks this way.

In [None]:
# YOUR CODE HERE
q1_answer = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

raise NotImplementedError()

print(q1_answer)

### Question 1

Calculate the `correlation` for the two variables. You can look at the example you did before to get started.
Note: You will run into an error after running this cell. Scroll down to see why.

In [None]:
# YOUR CODE HERE
correlation_1a = ...

In the `correlation` section we learned that there is missing data in at least one of our columns! (welcome to the "real world"). If, in your calculations, you don't exclude (i.e. delete) these, you will possibly run across some errors, and some meaningless results! (For example, if you were to try to calculate the "**Per-Capita** CO2 Emission from Fossil Fuel Consumption" of Afghanistan, you would get a result of zero because the emission data is missing. 
If you plotted this and used it to determine your linear regression, your regression would obviously be meaningless.)

<img src="images/error_pic.png" height='50' width='850'>

So, in order to avoid these issues, we've created a function to remove all the missing values from the columns! The function is in the cell below. Do not worry about its implementation; just know that it removes every row in which either of the two columns contains a `nan` value.

In [None]:
def clean(t,x,y):
    """
    Drops every row in which column x or column y contains a nan value

    Arguments:
    t: Data Frame
    x: Name of column you want to clean
    y: Name of column you want to clean

    Example:
    > t = problem_set
    > x = "gdp_ppp_2010"
    > y = "ghg_2010"
    > len(t.column(x))
    >>> 186

    > cleaned = clean(t, x, y)
    > cleaned.num_rows
    >>> 173
    """
    unclean_x= pd.Series(t.column(x))
    unclean_y= pd.Series(t.column(y))

    df=  pd.DataFrame({x:unclean_x,y:unclean_y})
    pre_clean_df = df.dropna()


    clean_x = list(pre_clean_df.iloc[:, 0])
    clean_y = list(pre_clean_df.iloc[:, 1])
    

    
    cleaned_df= Table().with_columns([
        x, clean_x,
        y,  clean_y
    ])

    return cleaned_df

With the function "clean" above, you should be able to perform any graphing you may be asked to do. Note that the clean function outputs a data frame with two cleaned columns. The first column is the cleaned x column, and the second column is the cleaned y column.

As you will notice below, we have

1. Assigned the output of the function to a variable
2. Assigned each cleaned column from the data frame in step 1 to two new variables called "q1_cleaned_x_column" and "q1_cleaned_y_column".

You should be able to reuse part of the code in the next problem for the rest of your assignment.

Moving forward, make sure to clean your columns before plotting or finding the betas or r_squared!
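
The reason both columns are cleaned together can be seen in a small pandas sketch. The column names `gdp` and `ghg` and the values are made up for illustration; the point is that `dropna()` removes a whole row whenever either column is missing, so the remaining pairs stay aligned.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column data with missing values in different rows
df = pd.DataFrame({
    "gdp": [10.0, np.nan, 30.0, 40.0],
    "ghg": [1.0, 2.0, np.nan, 4.0],
})

# dropna() removes every row in which at least one column is nan,
# so the two columns stay aligned row by row
cleaned = df.dropna()

print(len(df), len(cleaned))
print(cleaned)
```

Only the rows where both values are present survive, which is why a cleaned table can be shorter than either column's count of non-missing values would suggest.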

### Question 2
Assign a cleaned version of problem_set for the columns ghg_2010 and  gdp_ppp_2010 to a new variable q1_cleaned_df.

In [None]:
# YOUR CODE HERE
q1_cleaned_df = clean(problem_set,... ) 

### Question 3

To convince yourself that the `clean` function works, compare the length of the old data frame `problem_set` with that of the cleaned data frame `q1_cleaned_df` by calling `len()` on any column of each data frame. The second one should contain fewer values, since the rows with missing values were removed.

In [None]:
# YOUR CODE HERE 
original_len = len(...) 
cleaned_len = len(...) 

print(original_len,cleaned_len)

### Question 4

Make a scatter plot using your cleaned data set. Do not forget to label it.

In [None]:
# YOUR CHART HERE 
q1_cleaned_x_column = clean(problem_set, ..., ...).column(...)
q1_cleaned_y_column = clean(problem_set, ..., ...).column(...)

... = scatter(q1_cleaned_x_column, ...)

put_label_on_plot(..., ..., ..., ...)  



raise NotImplementedError()

### Question 5 

Find *beta_0*. Do not forget to use your cleaned data frame.

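To answer this question we will use a new utility function that calculates the y-intercept. Note that `intercept` calls the `slope` function defined under Question 6, so run the cell that defines `slope` before calling `intercept`.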

In [None]:
def intercept(t, x, y):
    """
    Returns the y-intercept needed to find the linear regression
    
    Arguments:
    t: Data Frame 
    x: Column in your data frame
    y: Column in your data frame
    """
    
    x_mean = np.mean(t.column(x))
    y_mean = np.mean(t.column(y)) 
    return  y_mean - slope(t, x, y)*x_mean

In [None]:
# YOUR CODE HERE
q1_beta0 = ...

raise NotImplementedError()

### Question 6 

Find *beta_1*. Do not forget to use your cleaned data frame.

To answer this question we will use a new utility function that calculates the slope.

In [None]:
def slope(t, x, y):
    """
    Returns the slope needed to find the linear regression
    
    Arguments:
    t: Data Frame
    x: Column in your data frame
    y: Column in your data frame
    """
  
    r = correlation(t, x, y)
    y_sd = np.std(t.column(y))
    x_sd = np.std(t.column(x))
    return r * y_sd / x_sd

In [None]:
# YOUR CODE HERE
q1_beta1 = ...

raise NotImplementedError()

### Question 7

Find *r_squared*. Do not forget to use your cleaned data frame.

To answer this question we will use a new utility function that calculates r_squared. 

In [None]:
def r_squared(t, x, y):
    """
    Returns r squared
    
    t: Data Frame 
    x: Column in your data frame
    y: Column in your data frame
    """

    r = correlation(t, x, y)
    return r**2

In [None]:
# YOUR CODE HERE
q1_rquare = ...


raise NotImplementedError()

### Question 8 

Now that you have computed beta_0 and beta_1, you are ready to add the line of best fit to your original plot. 

Hint: Recall that `slope = beta1` and ` intercept = beta0`. To answer this question we will use a new utility function `draw_line` that plots the line of best fit. Don't forget to add the proper labels.

In [None]:
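def draw_line(t, x, y, slope, intercept, color='r'):
    """
    Draws the data points and the linear regression line

    Arguments:
    t: Data Frame
    x: Column in your data frame
    y: Column in your data frame
    slope: Slope of the regression line (beta_1)
    intercept: y-intercept of the regression line (beta_0)
    color: Color of the regression line
    """
    x_values = t.column(x)
    y_values = t.column(y)
    line = slope * x_values + intercept

    plt.plot(x_values, y_values, 'o')       # observed data points
    plt.plot(x_values, line, color=color)   # fitted regression line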

In [None]:
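# YOUR CODE HERE
t = ...
x_label = ...
y_label = ...
# Pass the betas directly: assigning them to variables named `slope` and
# `intercept` would overwrite the utility functions defined above.
draw_line(t, x_label, y_label, q1_beta1, q1_beta0)
put_label_on_plot(..., ..., ..., ...)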

### Question 9 
Explain the results above in 2-3 sentences.


In [None]:
# YOUR CODE HERE

q1_9 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

raise NotImplementedError()

print(q1_9)

In [None]:
assert(store(q1_rquare))== '911ed95f081fe8b8590e56c984045ef4'
assert(store(q1_beta0))== 'a2897207b9c5d045f8f8422d5d5b26ef'
assert(store(q1_beta1))=='3f4537a038f630f6eabcad5ae82233b7'
assert(store(len(q1_cleaned_x_column)))== 'f7e6c85504ce6e82442c770f7c8606f0'
assert(store(len(q1_cleaned_y_column)))== 'f7e6c85504ce6e82442c770f7c8606f0'
assert(original_len != cleaned_len)

## Problem 2


### Question 1 

Make a chart of Per-Capita Total GHG Emissions, 2010 vs. HDI.

Hint:
* One way to do this is by adding a new column with `dataframe['new column name'] = [data]`, e.g. add a column called `per_capital_total_GHG` to the `problem_set` table.
* Do not forget to clean your data before making a chart! 
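
When computing a per-capita quantity, it helps to keep track of units. The sketch below uses invented figures for three made-up countries and plain NumPy arrays; the variable names are hypothetical and the numbers are not taken from `problem_set`.

```python
import numpy as np

# Hypothetical figures for three made-up countries
emissions_million_tonnes = np.array([500.0, 50.0, 5.0])
population = np.array([100_000_000, 5_000_000, 250_000])

# Element-wise division; multiplying by 1e6 converts million tonnes to tonnes,
# giving tonnes of CO2 equivalent per person
per_capita_tonnes = emissions_million_tonnes * 1e6 / population

print(per_capita_tonnes)
```

Note how the smallest total emitter in this toy example has the largest per-capita value, which is the kind of contrast the later problems ask you to interpret.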

In [None]:
# YOUR CHART HERE 

raise NotImplementedError()

### Question 2

Find *r_squared*.

In [None]:
# YOUR CODE HERE
q2_rquare = ...

raise NotImplementedError()

### Question 3
Find beta_0.

In [None]:
# YOUR CODE HERE
q2_beta0 = ...

raise NotImplementedError()

### Question 4
Find beta_1.

In [None]:
# YOUR CODE HERE
q2_beta1 = ...

raise NotImplementedError()

### Question 5
Explain your result in 2-3 sentences.

In [None]:
q2_5 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

# YOUR CODE HERE
raise NotImplementedError()

print(q2_5)

In [None]:
#assert(store(q2_rquare))== 
#assert(store(q2_beta0))==
#assert(store(q2_beta1))==

## Problem 3

### Question 1
Make a chart of Cumulative CO2 Emissions from Energy (1850-2007) Rank vs. HDI Rank.

Hint: 
* Order the countries in descending order according to the "Cumulative CO2 Emissions from Energy" variable. 
* The method `sort` can come in handy for this question: http://data8.org/datascience/_autosummary/datascience.tables.Table.sort.html#datascience.tables.Table.sort
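
If you prefer to compute ranks directly from an array rather than by sorting the table, one possible approach uses `np.argsort`. The sketch below works on invented values and assumes there are no ties; rank 1 goes to the largest value.

```python
import numpy as np

# Hypothetical cumulative emissions for four made-up countries
emissions = np.array([120.0, 900.0, 45.0, 300.0])

# argsort of the negated values lists the positions from largest to smallest
order = np.argsort(-emissions)

# ranks[i] is the rank of country i, where rank 1 is the largest emitter
ranks = np.empty(len(emissions), dtype=int)
ranks[order] = np.arange(1, len(emissions) + 1)

print(ranks)
```

The resulting array lines up with the original order of the countries, so it can be paired with another rank column for plotting.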

In [None]:
# YOUR CHART HERE 

raise NotImplementedError()

### Question 2
Find r_squared

In [None]:
q3_rquare = ...  # YOUR CODE HERE

raise NotImplementedError()

### Question 3
Explain your result in 2-3 sentences.

In [None]:
q3_3 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

# YOUR CODE HERE
raise NotImplementedError()

print(q3_3)

In [None]:
#assert(store(q3_rquare))==

## Problem 4


### Question 1

Make a chart of Cumulative CO2 Emissions from Energy (1850-2007) Per Capita Rank vs. HDI Rank. **Make sure to include labels and a regression line.**

In [None]:
# YOUR CHART HERE 

raise NotImplementedError()

### Question 2

Find r_squared.

In [None]:
q4_rquare = ...  # YOUR CODE HERE

raise NotImplementedError()

### Question 3
Explain your result in 2-3 sentences.

In [None]:
q4_3 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

# YOUR CODE HERE
raise NotImplementedError()

print(q4_3)

In [None]:
#assert(store(q4_rquare))==

## Problem 5

Make sure to write the response for q5_1 and q5_3 without spaces. For example: `"Mexico"` not  `"Mexico "`.

### Question 1

Which country has the highest Total GHG Emissions in 2010?

In [None]:
# YOUR CODE HERE
q5_1= ""

raise NotImplementedError()

### Question 2
Where do they rank on the per-capita scale?

In [None]:
# YOUR CODE HERE
q5_2 = ...

raise NotImplementedError()

### Question 3
Which country has the highest per-capita Total GHG Emissions in 2010?

In [None]:
# YOUR CODE HERE
q5_3= ""

raise NotImplementedError()

### Question 4
Explain the results above in 2-3 sentences.

In [None]:
# YOUR CODE HERE
q5_4 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""


raise NotImplementedError()

print(q5_4)

In [None]:
assert store(q5_1.lower()) == '8a7d7ba288ca0f0ea1ecf975b026e8e1'
assert store(q5_2) == 'a3f390d88e4c41f2747bfa2f1b5f87db'
assert store(q5_3.lower()) == 'b18fa0d42ad9b3706237ef5b02434829'

## Problem 6
Make sure to write the response for Question 1 and Question 2 without spaces. For example: `"Mexico"` not  `"Mexico "`.


### Question 1
Which country has the highest “Footprint of all goods and services consumed”? 

In [None]:
# YOUR CODE HERE
q6_1= ""

raise NotImplementedError()

### Question 2
Which country has the highest per-capita footprint?

In [None]:
# YOUR CODE HERE
q6_2= ""

raise NotImplementedError()

### Question 3
Explain the results above in 2-3 sentences.

In [None]:
q6_3 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

# YOUR CODE HERE
raise NotImplementedError()

print(q6_3)

In [None]:
assert store(q6_1.lower()) in ['ade6b3bd5e720abb20ed8a9a4c6b9ae8','ada53304c5b9e4a839615b6e8f908eb6','152649df347ee2891a9eacc883e07d17']
assert store(q6_2.lower()) == '458e4cbc78201c1aec5fc53a31c59378'

## Problem 7

* Go to www.gapminder.org/tools. Create an animated graph that tells you something interesting about climate change (CO2 Emissions should be on one axis). 
    * Note you can change an axis by clicking on the axis label and selecting a new measure from the various options. 
    
* Use the print tool (or a screenshot) to create an image of **one year** (be careful: sometimes the most recent data doesn't include many countries, so take a screenshot that includes most of the world) and include that with your assignment. 

* **Explain your results**

Now, we are going to add the image you just created to this notebook. First, you'll need to upload the screenshot into your datahub. To do this, click the upload button on the top right-hand corner of the datahub file browser. Then select your file and upload it to datahub. 
![Demo](images/demo.png)
Once the image has been uploaded, you can display it below by replacing the "..." with its filename (for example, a `.png` file). Make sure to give it an appropriate and descriptive title.

###### To display the graph (double click on this cell)

**Graph Title**
![Graph](...)


In [None]:
q7 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

# YOUR CODE HERE
raise NotImplementedError()

print(q7)

## Problem 8

Reflect on your findings.


### Question 1
Do you think per-capita or total national emissions are the more appropriate way to do carbon accounting, and why?

In [None]:
# YOUR CODE HERE
q8a_1 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

raise NotImplementedError()

print(q8a_1)

### Question 2
Do you think accounting should be based on what a country emits within its boundaries, or what a country consumes, including emissions from the production of goods elsewhere?

In [None]:
# YOUR CODE HERE
q8_2 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

raise NotImplementedError()

print(q8_2)

### Question 3

Do you think countries should reduce their emissions in proportion to 

a) their past emissions

b) their level of development & capacity to reduce

c) the degree to which they will be impacted by climate change

d) a combination of these, or something else (explain)

In [None]:
# YOUR CODE HERE
q8_3 = r"""

Put your answer here, replacing this text. Do not take into account the ### YOUR CODE HERE below

"""

raise NotImplementedError()

print(q8_3)

## Final Survey <a id='section7'></a>

Congrats! You've finished the final Jupyter Notebook assignment! The Division of Data Sciences and Information would like to ask you to please fill this survey out as a part of your assignment. We would like to improve the module for future semesters, and would really appreciate it if you took the time to fill this out so we can better serve you!

Please make sure you are logged into your Berkeley (.edu) email address to access the form.
### [Survey Link](https://goo.gl/forms/FqSRIYCzAAOfZ5Bv2)

Alternatively, please copy and paste this link into your URL bar: https://goo.gl/forms/FqSRIYCzAAOfZ5Bv2

## Saving the Notebook as an HTML
As usual, you will be submitting this notebook as an HTML file. To turn in this lab assignment follow the steps below:

1. **Important:** Click the Save icon located at the far left on the top toolbar. Make sure to do this before following the next steps.
2. Run the very next cell to convert this notebook to HTML.
3. Go to the `ProblemSet3` folder, which contains the HTML file named "Problem Set 3.html". (Click the jupyter icon at the top left and navigate to the same folder where this notebook is located)
4. Check the box next to the file and click the Download button to download it to your computer
5. Click to open in a web browser of your choice to make sure that everything looks okay
6. Submit to the Problem Set 3 Assignment on bCourses.

In [None]:
# DO NOT EDIT: This cell converts the current notebook into an HTML file
!python ipy2html.py "Problem Set 3.ipynb"