1. Create a `data` folder in your local project repository.  

2. Download these two CSV files and place them in the data folder:

    a.	Gross Domestic Product (GDP) per capita http://data.un.org/Data.aspx?d=WDI&f=Indicator_Code%3aNY.GDP.PCAP.PP.KD **DO NOT APPLY ANY FILTERS**
     - rename the file to `gdp_percapita.csv`
     - open it with a text editor (**not Excel**) and take a look

    b.	Percentage of Individuals using the Internet http://data.un.org/Data.aspx?d=ITU&f=ind1Code%3aI99H  **DO NOT APPLY ANY FILTERS**
     - rename the file to `internet_use.csv`
     - open it with a text editor (**not Excel**) and take a look

2.	Launch a Jupyter Notebook. 
 - _*IMPORTANT:  You are likely to get errors along the way. When you do, read the errors to try to understand what is happening and how to correct it.*_
  - Use markdown cells to record your answers to any questions asked in this exercise. On the menu bar, you can toggle the cell type from `Code` to `Markdown`.

3. Import the required packages with their customary aliases as follows:

    `import pandas as pd`   
    `import numpy as np`  
    `import matplotlib.pyplot as plt`  
    `import seaborn as sns`
    
4. Use the `%matplotlib inline` magic command so that your plots show in the notebook _without_ having to call `plt.show()` every time.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

5.	Using the pandas `read_csv()` method, read the GDP dataset into your notebook as a DataFrame called `gdp_df`. Take a look at the first 6 rows.

In [None]:
gdp_df = pd.read_csv('../data/gdp_percapita.csv')
gdp_df.head(6)

6. Repeat for the internet use dataset. Call this DataFrame `internet_df`. Take a look at the first six rows.

In [None]:
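# nrows limits the read to the first 4495 data rows; everything after that is skipped
internet_df = pd.read_csv('../data/internet_use.csv', nrows = 4495)
# A cell only shows its last expression, so display() the head explicitly
display(internet_df.head(6))
internet_df.tail(2)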
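The `nrows` argument above cuts the read short, presumably to skip non-data lines at the end of the download (an assumption; the notebook does not say why 4495 was chosen). A minimal sketch of the mechanism on a made-up miniature file, where the file contents and `sample_df` are hypothetical:

```python
import io
import pandas as pd

# Hypothetical miniature file: three data rows followed by footnote lines.
raw = io.StringIO(
    "Country or Area,Year,Value,Value Footnotes\n"
    "Afghanistan,2014,6.39,\n"
    "Afghanistan,2013,5.9,\n"
    "Albania,2014,60.1,\n"
    "footnoteSeqID,Footnote\n"
    "1,Estimate\n"
)

# nrows counts data rows only; the header line is not part of the count.
sample_df = pd.read_csv(raw, nrows=3)
print(sample_df.shape)
print(sample_df.tail(2))
```

Checking `tail()` after the read is a quick way to confirm that no footnote text leaked into the last rows.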

98. Look at the shape of each DataFrame - how many rows? How many columns?

In [None]:
gdp_df.shape

In [None]:
internet_df.shape

6.	Take a look at the datatypes for the columns in each DataFrame.

In [None]:
gdp_df.dtypes

In [None]:
internet_df.dtypes

99. Take a look at the last 10 rows of each DataFrame in turn.

In [None]:
gdp_df.tail(10)

In [None]:
internet_df.tail(10)

7.	Drop the `Value Footnotes` column from both DataFrames. Check that this worked as expected.


In [None]:
gdp_df.columns
gdp_df = gdp_df.drop(columns = ['Value Footnotes'])
gdp_df.head()

In [None]:
internet_df.columns
internet_df = internet_df.drop(columns = ['Value Footnotes'])
internet_df.head()

8.	Change the columns for the GDP Per Capita DataFrame to `Country`, `Year`, and `GDP_Per_Capita`.

In [None]:
gdp_df.columns = ['Country', 'Year', 'GDP_Per_Capita']
gdp_df.head()

9. Change the columns for the Internet Users DataFrame to `Country`, `Year`, and `Internet_Users_Pct`.

In [None]:
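# Note: this solution names the column 'Internet_Per_Capita' instead of the
# 'Internet_Users_Pct' the prompt asks for; every later cell relies on this name.
internet_df.columns = ['Country', 'Year', 'Internet_Per_Capita']
internet_df.head()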

10.	Merge the two DataFrames to one. Merge **all rows** from each of the two DataFrames. Call the new DataFrame `gdp_and_internet_use`.

In [None]:
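# Note: how='inner' keeps only Country/Year pairs present in BOTH DataFrames.
# Keeping all rows from each, as the prompt asks, would need how='outer';
# the hardcoded country lists later in this notebook depend on which merge is used.
gdp_and_internet_use = pd.merge(gdp_df, internet_df, on = ['Country', 'Year'], how = 'inner')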
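To see how the `how` argument changes which rows survive a merge, here is a small sketch on made-up data. `gdp_toy` and `net_toy` are hypothetical stand-ins for the real DataFrames, and `indicator=True` adds a `_merge` column recording where each row came from:

```python
import pandas as pd

# Hypothetical toy tables; only the column names mirror this notebook.
gdp_toy = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Year": [2014, 2009, 2014],
    "GDP_Per_Capita": [1000.0, 900.0, 2000.0],
})
net_toy = pd.DataFrame({
    "Country": ["A", "C"],
    "Year": [2014, 2014],
    "Internet_Per_Capita": [50.0, 70.0],
})

# Inner merge: only Country/Year pairs found in both tables.
inner = pd.merge(gdp_toy, net_toy, on=["Country", "Year"], how="inner")

# Outer merge: every row from either table, with NaN where a side is missing.
outer = pd.merge(gdp_toy, net_toy, on=["Country", "Year"], how="outer", indicator=True)

print(len(inner), len(outer))
print(outer["_merge"].value_counts())
```

Comparing the two row counts, or inspecting `_merge`, shows how many country-year pairs exist in only one of the sources.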

11.	Look at the first five rows of your new DataFrame to confirm it merged correctly.

In [None]:
gdp_and_internet_use.head(5)

12.	Look at the last five rows to make sure the data is clean and as expected.


In [None]:
gdp_and_internet_use.tail(5)

13.	Subset the combined DataFrame to keep only the data for 2004, 2009, and 2014. Check that this happened correctly.

In [None]:
gdp_and_internet_use.dtypes
yearslist = [2004, 2009, 2014]
gdp_internet_2004_2009_2014 = gdp_and_internet_use.query("Year in @yearslist")
gdp_internet_2004_2009_2014.head(5)
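Looking at `head(5)` shows only a few rows, so it is a weak check that the subset worked. A sketch of a stricter check on made-up data (the `toy` DataFrame is hypothetical) filters with `isin` and then inspects the distinct years that remain:

```python
import pandas as pd

# Hypothetical toy data with one year that should be filtered out.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "A"],
    "Year": [2004, 2005, 2009, 2014],
    "GDP_Per_Capita": [1.0, 2.0, 3.0, 4.0],
})

years = [2004, 2009, 2014]

# Boolean-mask equivalent of query("Year in @years").
subset = toy[toy["Year"].isin(years)]

# The remaining distinct years should be exactly the requested ones.
remaining_years = sorted(subset["Year"].unique().tolist())
print(remaining_years)
print(subset["Year"].value_counts())
```

Both forms compare against integers, so they match nothing if `Year` was read in as strings; checking `dtypes` first, as the cell above does, guards against that.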

14.	Create three new DataFrames, one for 2004, one for 2009, and one for 2014. Give them meaningful names that aren't too long.

In [None]:
gdp_internet_2004 = gdp_internet_2004_2009_2014.loc[gdp_internet_2004_2009_2014['Year'] == 2004].reset_index(drop = True)
gdp_internet_2004.head(1)

In [None]:
gdp_internet_2009 = gdp_internet_2004_2009_2014.loc[gdp_internet_2004_2009_2014['Year'] == 2009].reset_index(drop = True)
gdp_internet_2009.head(1)

In [None]:
gdp_internet_2014 = gdp_internet_2004_2009_2014.loc[gdp_internet_2004_2009_2014['Year'] == 2014].reset_index(drop = True)
gdp_internet_2014.head(1)

15.	Which country had the highest percentage of internet users in 2014? What was the percentage? (Try typing the first 3 letters of your DataFrame name and hitting the tab key for auto-complete options).

In [None]:
gdp_internet_2014.loc[gdp_internet_2014['Internet_Per_Capita'] == gdp_internet_2014.Internet_Per_Capita.max()]

16.	Which country had the lowest percentage of internet users in 2014? What was the percentage?

In [None]:
gdp_internet_2014.loc[gdp_internet_2014['Internet_Per_Capita'] == gdp_internet_2014.Internet_Per_Capita.min()]

17.	Repeat for 2004 and 2009.

In [None]:
gdp_internet_2009.loc[gdp_internet_2009['Internet_Per_Capita'] == gdp_internet_2009.Internet_Per_Capita.min()]

In [None]:
gdp_internet_2004.loc[gdp_internet_2004['Internet_Per_Capita'] == gdp_internet_2004.Internet_Per_Capita.min()]
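The two cells above show only the lowest values, while the prompt asks for both the highest and the lowest in each year. A minimal sketch that gets both in one pass, on made-up data (the `toy` DataFrame and the `extremes` dictionary are hypothetical names), uses `idxmax` and `idxmin`:

```python
import pandas as pd

# Hypothetical toy data: three countries, two years.
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "A", "B", "C"],
    "Year": [2004, 2004, 2004, 2009, 2009, 2009],
    "Internet_Per_Capita": [10.0, 40.0, 5.0, 30.0, 60.0, 8.0],
})

extremes = {}
for year, group in toy.groupby("Year"):
    # idxmax / idxmin return the index label of the extreme row.
    highest = group.loc[group["Internet_Per_Capita"].idxmax()]
    lowest = group.loc[group["Internet_Per_Capita"].idxmin()]
    extremes[year] = (highest["Country"], lowest["Country"])
    print(year,
          "highest:", highest["Country"], highest["Internet_Per_Capita"],
          "lowest:", lowest["Country"], lowest["Internet_Per_Capita"])
```

Unlike the `== max()` filter used in this notebook, `idxmax` returns a single row even when several countries tie for the extreme value.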

18.	Which country had the highest GDP per capita in 2014? What was the GDP per capita?

In [None]:
gdp_internet_2014.loc[gdp_internet_2014['GDP_Per_Capita'] == gdp_internet_2014.GDP_Per_Capita.max()]

In [None]:
gdp_internet_2014.GDP_Per_Capita.max()

20.	Which country had the lowest GDP per capita in 2014? What was the GDP per capita?

In [None]:
gdp_internet_2014.loc[gdp_internet_2014['GDP_Per_Capita'] == gdp_internet_2014.GDP_Per_Capita.min()]

In [None]:
gdp_internet_2014.GDP_Per_Capita.min()

21.	Create some scatterplots:  
    a.  2004 Percent Using the Internet vs GDP Per Capita  
    b.	2009 Percent Using the Internet vs GDP Per Capita  
    c.	2014 Percent Using the Internet vs GDP Per Capita 

In [None]:
plt.scatter(x = 'Internet_Per_Capita', y = 'GDP_Per_Capita', data = gdp_internet_2004)

In [None]:
plt.scatter(x = 'Internet_Per_Capita', y = 'GDP_Per_Capita', data = gdp_internet_2009)

In [None]:
plt.scatter(x = 'Internet_Per_Capita', y = 'GDP_Per_Capita', data = gdp_internet_2014)

22.	Are there differences across years? What do the plots tell you about any relationship between these two variables? Enter your observations as a markdown cell.

There does seem to be a positive correlation between these two variables: internet usage tends to rise as GDP per capita grows. The correlation also seems to get stronger over time.
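One way to put a number on the visual impression from the scatterplots is a per-year Pearson correlation. A minimal sketch on made-up values (the `toy` DataFrame and `corr_by_year` are hypothetical; only the column names come from this notebook):

```python
import pandas as pd

# Hypothetical toy data: perfectly linear in 2004, noisier in 2014.
toy = pd.DataFrame({
    "Year": [2004] * 4 + [2014] * 4,
    "Internet_Per_Capita": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "GDP_Per_Capita": [10.0, 20.0, 30.0, 40.0, 10.0, 30.0, 20.0, 40.0],
})

# Series.corr computes the Pearson correlation by default.
corr_by_year = {
    year: group["Internet_Per_Capita"].corr(group["GDP_Per_Capita"])
    for year, group in toy.groupby("Year")
}
print(corr_by_year)
```

Pearson correlation measures only linear association, so a curved relationship can score lower than it looks on the plot.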

23.	Look at the distribution of GDP per capita values for 2014. Is it unimodal?

In [None]:
print('GDP Per Capita is unimodal')
plt.hist(x = 'GDP_Per_Capita', data = gdp_internet_2014)

24.	Look at the distribution of Internet Use for 2014. Is it unimodal?

In [None]:
print('Internet Per Capita is NOT unimodal')
plt.hist(x = 'Internet_Per_Capita', data = gdp_internet_2014)

25.	What are the top 5 countries in terms of internet use in 2014?

In [None]:
gdp_internet_2014.nlargest(5, 'Internet_Per_Capita')

26.	Create a DataFrame called `top_5_internet` **from the combined DataFrame that has all three years _for the 5 countries that had the greatest 2014 internet usage_**. You should have 15 rows. Check that this is true.

In [None]:
%who

In [None]:
# Countries taken from the nlargest() output above
top_5_internet_countries = ['Iceland', 'Bermuda', 'Norway', 'Denmark', 'Luxembourg']
top_5_internet = gdp_internet_2004_2009_2014.loc[gdp_internet_2004_2009_2014.Country.isin(top_5_internet_countries)]

In [None]:
top_5_internet
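The country names can also be taken straight from the `nlargest` result instead of being typed by hand, which makes the 15-row check easy to automate. A sketch under assumptions, where the `combined` DataFrame is made-up data standing in for the three-year subset:

```python
import pandas as pd

# Hypothetical data: seven countries, each with all three years.
countries = ["A", "B", "C", "D", "E", "F", "G"]
rows = [
    {"Country": c, "Year": y, "Internet_Per_Capita": 10.0 * (i + 1) + (y - 2004)}
    for i, c in enumerate(countries)
    for y in (2004, 2009, 2014)
]
combined = pd.DataFrame(rows)

# Names of the five countries with the highest 2014 value.
top_names = combined[combined["Year"] == 2014].nlargest(5, "Internet_Per_Capita")["Country"]

# All years for those five countries.
top_5 = combined[combined["Country"].isin(top_names)]

# 5 countries x 3 years; this holds only if no country is missing a year.
print(top_5.shape)
```

If the shape comes back with fewer than 15 rows on the real data, at least one of the five countries lacks a row for one of the three years.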

27.	Create a seaborn FacetGrid to show the internet usage trend across the years 2004, 2009, and 2014 for these 5 countries (those with the highest reported internet use in 2014). Which country had the greatest growth between 2004 and 2014?

Bermuda

In [None]:
g_top_5_internet = sns.FacetGrid(top_5_internet, col="Country", ylim = (0,100))
g_top_5_internet.map_dataframe(sns.lineplot, x="Year", y="Internet_Per_Capita")
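The growth question can be answered numerically as well as by reading the facets. A minimal sketch on made-up data (the `toy`, `wide`, and `growth` names are hypothetical) pivots the years into columns and subtracts:

```python
import pandas as pd

# Hypothetical toy data for two countries across the three years.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Year": [2004, 2009, 2014, 2004, 2009, 2014],
    "Internet_Per_Capita": [60.0, 80.0, 90.0, 80.0, 90.0, 95.0],
})

# One row per country, one column per year.
wide = toy.pivot(index="Country", columns="Year", values="Internet_Per_Capita")

# Change in percentage points between 2004 and 2014.
growth = wide[2014] - wide[2004]
print(growth.sort_values(ascending=False))
print(growth.idxmax())
```

This measures growth in percentage points; dividing by the 2004 value instead would give relative growth, which can rank the countries differently.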

28.	Repeat the steps above to look at the trend for the 5 countries with the lowest 2014 internet usage. Which country has consistently had the least internet use?

Burundi

In [None]:
gdp_internet_2014.nsmallest(5, 'Internet_Per_Capita')

In [None]:
# Countries taken from the nsmallest() output above
bottom_5_internet_countries = ['Timor-Leste', 'Burundi', 'Somalia', 'Guinea', 'Niger']
bottom_5_internet = gdp_internet_2004_2009_2014.loc[gdp_internet_2004_2009_2014.Country.isin(bottom_5_internet_countries)]

In [None]:
bottom_5_internet

In [None]:
g_bottom_5_internet = sns.FacetGrid(bottom_5_internet, col="Country", ylim = (0,5))
g_bottom_5_internet.map_dataframe(sns.lineplot, x="Year", y="Internet_Per_Capita")

29.	Find the top 5 countries for 2014 in terms of GDP per capita; create a DataFrame to look at 10-year trends (use 2004, 2009, 2014 to look at the 10-year trend) in GDP per capita for the 5 countries with the highest 2014 GDP per capita. Use a seaborn facet grid for this.

In [None]:
gdp_internet_2014.nlargest(5, 'GDP_Per_Capita')
top_5_GDP_countries = ['Luxembourg', 'Qatar', 'Singapore', 'Bermuda', 'Switzerland']
top_5_GDP = gdp_internet_2004_2009_2014.loc[gdp_internet_2004_2009_2014.Country.isin(top_5_GDP_countries)]
g_top_5_GDP = sns.FacetGrid(top_5_GDP, col="Country", ylim = (0,120000))
g_top_5_GDP.map_dataframe(sns.lineplot, x="Year", y="GDP_Per_Capita")

96. Repeat this one more time to look at 10-year trend for the 5 countries for 2014 with the lowest GDP per capita.

In [None]:
gdp_internet_2014.nsmallest(5, 'GDP_Per_Capita')
bottom_5_GDP_countries = ['Burundi', 'Somalia', 'Niger', 'Mozambique', 'Malawi']
bottom_5_GDP = gdp_internet_2004_2009_2014.loc[gdp_internet_2004_2009_2014.Country.isin(bottom_5_GDP_countries)]
g_bottom_5_GDP = sns.FacetGrid(bottom_5_GDP, col="Country",ylim = (0,3000))
g_bottom_5_GDP.map_dataframe(sns.lineplot, x="Year", y="GDP_Per_Capita")

30.	Is there anything surprising or unusual in any of these plots? Searching on the internet, can you find any possible explanations for unusual findings?

### Bonus exercise:
1.    Download another data set from the UN data (http://data.un.org/Explorer.aspx) to merge with your data and explore.