In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
gdp_df = pd.read_csv('../data/gdp_percapita.csv.csv')

In [None]:
gdp_df.head()

In [None]:
gdp_df.tail()

#### How many rows and columns are in `gdp_df`? What are the data types of each column?

In [None]:
gdp_df.shape

In [None]:
gdp_df.dtypes

#### Drop the `Value Footnotes` column and rename the remaining three to 'Country', 'Year', and 'GDP_Per_Capita'.

In [None]:
gdp_df = gdp_df.drop(columns = ['Value Footnotes'])

In [None]:
gdp_df = gdp_df.rename(columns = {'Country or Area' : 'Country' , 'Value' : 'GDP_Per_Capita'})

In [None]:
gdp_df

In [None]:
gdp_df.GDP_Per_Capita = gdp_df.GDP_Per_Capita.astype(float)

In [None]:
gdp_df.dtypes

In [None]:
gdp_df = gdp_df.round({'GDP_Per_Capita':2})

In [None]:
gdp_df

#### How many countries have data for all years? Which countries are missing many years of data? Look at the number of observations per year. What do you notice?

205 countries have data for every year in the dataset. The countries that are missing many years of data are mostly less developed ones.

In [None]:
gdp_df.Country.value_counts()

In [None]:
country_counts = gdp_df.Country.value_counts()

In [None]:
country_counts = country_counts.to_frame()

In [None]:
country_counts = country_counts.reset_index()

In [None]:
# Assign the names by position: the labels produced by value_counts().reset_index() differ between pandas versions
country_counts.columns = ['country', 'years_counted']

In [None]:
country_counts.loc[country_counts['years_counted'] < 30].head(35)

In [None]:
all_years = country_counts.years_counted.value_counts()

In [None]:
all_years.head(1)
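
The question also asks about the number of observations per year, which the cells above do not compute. A minimal sketch of both counts follows, assuming the same `Country` and `Year` columns as `gdp_df`. The `toy_df` frame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for gdp_df (made-up values, for illustration only)
toy_df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B', 'C'],
    'Year': [1990, 1991, 1992, 1991, 1992, 1992],
    'GDP_Per_Capita': [1.0, 1.1, 1.2, 2.0, 2.1, 3.0],
})

# Total number of distinct years in the dataset
n_years = toy_df['Year'].nunique()

# Number of distinct years recorded for each country
years_per_country = toy_df.groupby('Country')['Year'].nunique()

# Countries that have a row for every year in the dataset
full_coverage = years_per_country[years_per_country == n_years].index.tolist()

# Number of countries observed in each year, in chronological order
obs_per_year = toy_df['Year'].value_counts().sort_index()
```

Swapping `toy_df` for `gdp_df` should give the count of fully covered countries via `len(full_coverage)`, and `obs_per_year` shows whether earlier years have fewer observations.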

In [None]:
gdp_df

#### In this question, you're going to create some plots to show the distribution of GDP per capita for the year 2020. Go to the Python Graph Gallery (https://www.python-graph-gallery.com/) and look at the different types of plots under the Distribution section. Create a histogram, a density plot, a boxplot, and a violin plot. What do you notice when you look at these plots? How do the plots compare and what information can you get out of one type that you can't necessarily get out of the others?

Each of these plots highlights something different. The histogram is a good way to see the overall shape of the distribution. The density plot shows where the data peaks, and the red line added to it marks the median. I did not find the boxplot that useful in this scenario, whereas the violin plot represented the data well because it shows the shape of the distribution alongside a boxplot-style summary.

##### GDP 2020

In [None]:
gdp_2020 = gdp_df.loc[gdp_df['Year'] == 2020]
gdp_2020

### Histogram

In [None]:
plt.hist(gdp_2020.GDP_Per_Capita)

plt.show()

### Density Plot

In [None]:
xmedian = np.median(gdp_2020.GDP_Per_Capita)
gdp_2020.GDP_Per_Capita.plot.density()
plt.axvline(xmedian, c = 'red')
plt.title('GDP 2020')
plt.show()

### Boxplot

In [None]:
sns.boxplot(x=gdp_2020['Year'], y=gdp_2020['GDP_Per_Capita'])
plt.show()

### Violin Plot

In [None]:
sns.violinplot(x=gdp_2020['Year'], y=gdp_2020['GDP_Per_Capita'])
plt.show()
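
The numbers behind the boxplot can also be computed directly, which helps when comparing it with the other plots. The sketch below uses the 1.5 × IQR rule that seaborn's boxplot applies by default to flag outliers. The `toy_gdp` series is made up for illustration and stands in for `gdp_2020.GDP_Per_Capita`.

```python
import pandas as pd

# Made-up GDP per capita values standing in for gdp_2020.GDP_Per_Capita
toy_gdp = pd.Series([1000.0, 2000.0, 3000.0, 4000.0, 50000.0], name='GDP_Per_Capita')

# Quartiles: the edges and center line of the box
q1, med, q3 = toy_gdp.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points above this fence are drawn as individual outliers
upper_fence = q3 + 1.5 * iqr
outliers = toy_gdp[toy_gdp > upper_fence]

# A mean well above the median suggests a long right tail
right_skewed = toy_gdp.mean() > med
```

A mean far above the median is the same right skew that the histogram and density plot show as a long tail.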

#### What was the median GDP per capita value in 2020?

In [None]:
# pandas' own median skips missing values, so no extra import is needed
gdp_2020.GDP_Per_Capita.median()

In [None]:
gdp_df

#### For this question, you're going to create some visualizations to compare GDP per capita values for the years 1990, 2000, 2010, and 2020. Start by subsetting your data to just these 4 years into a new DataFrame named gdp_decades. Using this, create the following 4 plots:
	* A boxplot
	* A barplot (check out the Barplot with Seaborn section: https://www.python-graph-gallery.com/barplot/#Seaborn)
	* A scatterplot
	* A scatterplot with a trend line overlaid (see this regplot example: https://www.python-graph-gallery.com/42-custom-linear-regression-fit-seaborn)  
Comment on what you observe has happened to GDP values over time and the relative strengths and weaknesses of each type of plot.

In [None]:
my_list = [2020, 2010, 2000, 1990]
gdp_decades = gdp_df.set_index('Year').loc[my_list].reset_index()
gdp_decades
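
An alternative way to subset the years is a boolean mask with `isin`, sketched below on made-up data (`toy_df` and `toy_decades` are illustrative names, not part of the notebook). Unlike `.loc` with a list of index labels, `isin` does not raise a `KeyError` when one of the requested years is absent, and it keeps the original row order.

```python
import pandas as pd

# Toy stand-in for gdp_df (made-up values, for illustration only)
toy_df = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'B'],
    'Year': [1990, 1995, 2000, 1990, 2010],
    'GDP_Per_Capita': [1.0, 1.5, 2.0, 3.0, 4.0],
})

decade_years = [1990, 2000, 2010, 2020]

# Keep only the rows whose Year is one of the requested years;
# 2020 is missing from the toy data and is simply not matched
toy_decades = toy_df[toy_df['Year'].isin(decade_years)].reset_index(drop=True)
```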

## GDP Decades

### Boxplot

In [None]:
sns.boxplot(x=gdp_decades['Year'], y=gdp_decades['GDP_Per_Capita'])
plt.show()

The boxplot shows where most GDP values cluster in each year, and it marks the outliers for each year as individual points.

### Barplot

In [None]:
sns.barplot(x=gdp_decades['Year'], y=gdp_decades['GDP_Per_Capita'])
plt.ylabel('GDP_Per_Capita')
plt.show()

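The barplot is straightforward: each bar shows the mean GDP per capita for that year, and the error bar shows the uncertainty around that mean. It does not show how individual countries are spread out.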

### Scatterplot

In [None]:
sns.scatterplot(data=gdp_decades, x='Year', y='GDP_Per_Capita')
plt.show()

This scatterplot is similar to the barplot, but its strength is that you can see each country as an individual point.

### Scatterplot with trend line

In [None]:
sns.regplot(x='Year', y='GDP_Per_Capita', data = gdp_decades, line_kws={"color":"r","alpha":0.7,"lw":5})
plt.show()

Adding the trend line shows the gradual increase in GDP, which was hard to see in the plain scatterplot.
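
To put numbers on the trend, the per-year median and mean can be computed with a `groupby`. This is a minimal sketch on made-up data; `toy_decades` stands in for `gdp_decades` and its values are illustrative only.

```python
import pandas as pd

# Toy stand-in for gdp_decades (made-up values, for illustration only)
toy_decades = pd.DataFrame({
    'Year': [1990, 1990, 1990, 2020, 2020, 2020],
    'GDP_Per_Capita': [1000.0, 2000.0, 9000.0, 2000.0, 4000.0, 18000.0],
})

# One row per year: median, mean, and number of countries observed
summary = toy_decades.groupby('Year')['GDP_Per_Capita'].agg(['median', 'mean', 'count'])
```

Comparing the `median` and `mean` columns across years shows whether the increase is broad or driven by a few very high values.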

#### Which country was the first to have a GDP per capita greater than $100,000?

In [None]:
large_gdp = gdp_df.loc[gdp_df.GDP_Per_Capita > 100000]
large_gdp

In [None]:
large_gdp.sort_values(by = 'Year', ascending = True).head(1)
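
Taking `head(1)` after sorting returns only one row, even if several countries crossed the threshold in the same first year. The sketch below, on made-up data, returns every country tied for the earliest year. `toy_df`, `threshold`, and the other names are illustrative.

```python
import pandas as pd

# Toy stand-in for gdp_df (made-up values, for illustration only)
toy_df = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'C'],
    'Year': [1995, 1996, 1995, 1996, 1996],
    'GDP_Per_Capita': [90000.0, 101000.0, 99000.0, 105000.0, 50000.0],
})

threshold = 100_000

# Rows above the threshold
above = toy_df[toy_df['GDP_Per_Capita'] > threshold]

# Earliest year in which any country exceeded the threshold
first_year = above['Year'].min()

# All countries above the threshold in that year (handles ties)
first_countries = above.loc[above['Year'] == first_year, 'Country'].tolist()
```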

#### Which country had the highest GDP per capita in 2020? Create a plot showing how this country's GDP per capita has changed over the timespan of the dataset.

In [None]:
gdp_2020 = gdp_2020.reset_index(drop = True)
# Take the first row by position rather than relying on a hard-coded index label
top_2020_country = gdp_2020.sort_values('GDP_Per_Capita', ascending = False).iloc[0]['Country']

In [None]:
gdp_2020.sort_values('GDP_Per_Capita', ascending = False).reset_index(drop = True)['Country'][0]

In [None]:
gdp_2020 = gdp_2020.reset_index(drop = True)
top2020_country_b= gdp_2020.sort_values('GDP_Per_Capita', ascending = False).head(1)['Country']

In [None]:
lux_mask = gdp_df.loc[gdp_df['Country'].isin(top2020_country_b)]
lux_mask

In [None]:
lux_mask = gdp_df.loc[gdp_df['Country'] == top_2020_country]
lux_mask

##### Luxembourg GDP change over time

In [None]:
plt.plot(lux_mask.Year, lux_mask.GDP_Per_Capita)
plt.title('Luxembourg')
plt.show()

#### Which country had the lowest GDP per capita in 2020? Create a plot showing how this country's GDP per capita has changed over the timespan of the dataset.

In [None]:
gdp_2020.sort_values('GDP_Per_Capita', ascending = True).head(1)

In [None]:
bur_mask = gdp_df.loc[gdp_df['Country'] == 'Burundi']
bur_mask
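
The country name can also be looked up programmatically instead of being typed in by hand. This is a minimal sketch using `idxmin` on made-up data, where `toy_df`, `toy_2020`, and `lowest_history` are illustrative names.

```python
import pandas as pd

# Toy stand-in for gdp_df (made-up values, for illustration only)
toy_df = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B'],
    'Year': [2019, 2020, 2019, 2020],
    'GDP_Per_Capita': [500.0, 450.0, 700.0, 800.0],
})

toy_2020 = toy_df[toy_df['Year'] == 2020]

# idxmin returns the index label of the smallest value; look up its country
lowest_country = toy_2020.loc[toy_2020['GDP_Per_Capita'].idxmin(), 'Country']

# Full history for that country, ordered by year for plotting
lowest_history = toy_df[toy_df['Country'] == lowest_country].sort_values('Year')
```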

##### Burundi GDP change over time

In [None]:
plt.plot(bur_mask.Year, bur_mask.GDP_Per_Capita)
plt.title('Burundi')
plt.show()

#### **Bonus question:** Is it true in general that countries had a higher GDP per capita in 2020 than in 1990? Which countries had lower GDP per capita in 2020 than in 1990?

In [None]:
my_list = [2020, 1990]
gdp_comparison = gdp_df.set_index('Year').loc[my_list].reset_index()
gdp_comparison

In [None]:
gdp_comparison.sort_values('GDP_Per_Capita', ascending = False).head(50)

In [None]:
gdp_1990 = gdp_df.loc[gdp_df['Year'] == 1990]
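
One way to answer the bonus question is to reshape the two years into side-by-side columns and compare them country by country. The sketch below runs on made-up data; `toy_df`, `wide`, and the other names are illustrative, and it keeps only countries with a value in both years.

```python
import pandas as pd

# Toy stand-in for gdp_df (made-up values, for illustration only)
toy_df = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B', 'C'],
    'Year': [1990, 2020, 1990, 2020, 2020],
    'GDP_Per_Capita': [1000.0, 3000.0, 2000.0, 1500.0, 4000.0],
})

# One row per country, one column per year; drop countries missing either year
wide = (
    toy_df[toy_df['Year'].isin([1990, 2020])]
    .pivot(index='Country', columns='Year', values='GDP_Per_Capita')
    .dropna()
)

# Positive change means GDP per capita was higher in 2020 than in 1990
wide['change'] = wide[2020] - wide[1990]

# Countries whose GDP per capita fell between 1990 and 2020
declined = wide[wide['change'] < 0].index.tolist()

# Fraction of countries with a higher value in 2020
share_higher = (wide['change'] > 0).mean()
```

With `gdp_df` in place of `toy_df`, `share_higher` indicates whether the increase holds in general and `declined` lists the exceptions.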