# Capstone Project Exploratory Data Analysis (EDA)

In this step of the capstone project, I explored the data to understand how it is distributed. I started by importing the data, which came with several difficulties and challenges. Once those were resolved, I identified which columns were needed for the exploration and dropped the ones that weren't. Next, I compared some of the oldest recorded values with the newest ones, looked at how the values correlate across years, and finally used pandas profiling to generate a report of the data.


## Importing the Data

### Difficulties and Changes

When starting out with the data, I realized that the JSON format the data was originally saved in would not be easy to work with. The data was thoroughly nested, and once I had worked through the nesting, it turned out not to include the data points I needed. With that, I decided to pivot and pull the data as CSVs instead. From there, it was easier to import it into pandas data frames.

### Data Importation Methods

With the files I currently have from my data source, [the World Bank](https://data.worldbank.org/), I had to import the data two different ways:

In the first way, I imported pandas and pandas profiling and read all of the cells from the CSV into a pandas data frame. The first four rows are skipped because they contain information on when the file was last updated and other non-data artifacts.
```
import pandas as pd
from pandas_profiling import ProfileReport
    
df = pd.read_csv('data.csv', skiprows = 4)
df
```
In the second way, all I needed to do was import pandas and pandas profiling and read the CSV. Using the `skipfooter` parameter, I'm able to remove unnecessary rows from the bottom of the file. `skipfooter` is only supported by the Python parsing engine, hence `engine='python'`.

```
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv('data.csv', engine='python', skipfooter = 49)
df
```
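
To make the effect of `skiprows` and `skipfooter` concrete, here is a minimal sketch that reads a small in-memory file instead of the real download. The file contents and country names are made up for illustration; only the two `read_csv` parameters come from the cells above.

```python
import io
import pandas as pd

# Made-up file contents: two metadata lines on top, two footer lines below.
raw = (
    "Last Updated,2021-01-01\n"
    "Source,Example\n"
    "Country Name,1960,2020\n"
    "Aland,1.0,2.0\n"
    "Borduria,3.0,4.0\n"
    "Data from database: Example\n"
    "Last Updated: 01/01/2021\n"
)

# skiprows drops leading lines before the header row is read.
top_trimmed = pd.read_csv(io.StringIO(raw), skiprows=2)

# skipfooter drops trailing lines; it needs the Python parsing engine.
both_trimmed = pd.read_csv(
    io.StringIO(raw), skiprows=2, skipfooter=2, engine="python"
)

print(top_trimmed.shape)   # footer lines are still present as rows
print(both_trimmed.shape)  # only the data rows remain
```

Comparing the two shapes is a quick way to check that the number passed to `skipfooter` matches the number of trailing non-data lines in a given file.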

Below, I used the second way of importing to load GDP figures ranging from 1960 through 2020.

In [35]:
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv('data.csv', engine='python', skipfooter = 49)
df

Unnamed: 0,Series Name,Series Code,Country Name,Country Code,1960,...,2016,2017,2018,2019,2020
0,GDP (current US$),NY.GDP.MKTP.CD,Afghanistan,AFG,537777811.100000,...,18116562465.000000,18753469630.000000,18053228579.000000,18799450743.000000,20116137326.000000
1,GDP (current US$),NY.GDP.MKTP.CD,Albania,ALB,,...,11861199831.000000,13019689337.000000,15156432310.000000,15400242875.000000,14887629268.000000
2,GDP (current US$),NY.GDP.MKTP.CD,Algeria,DZA,2723593385.000000,...,160034000000.000000,170097000000.000000,174911000000.000000,171767000000.000000,145009000000.000000
3,GDP (current US$),NY.GDP.MKTP.CD,American Samoa,ASM,,...,671000000.000000,612000000.000000,639000000.000000,648000000.000000,709000000.000000
4,GDP (current US$),NY.GDP.MKTP.CD,Andorra,AND,,...,2896679212.000000,3000180750.000000,3218316013.000000,3155065488.000000,
...,...,...,...,...,...,...,...,...,...,...,...
212,GDP (current US$),NY.GDP.MKTP.CD,Virgin Islands (U.S.),VIR,,...,3798000000.000000,3794000000.000000,3900000000.000000,4068000000.000000,
213,GDP (current US$),NY.GDP.MKTP.CD,West Bank and Gaza,PSE,,...,15405400000.000000,16128000000.000000,16276600000.000000,17133500000.000000,15561300000.000000
214,GDP (current US$),NY.GDP.MKTP.CD,"Yemen, Rep.",YEM,,...,31317365269.000000,26840128755.000000,21606140907.000000,,
215,GDP (current US$),NY.GDP.MKTP.CD,Zambia,ZMB,713000000.000000,...,20958412538.000000,25873601261.000000,26311590297.000000,23308667781.000000,18110631358.000000


## Data Cleaning

Afterwards, I realized that some columns aren't needed for the EDA, although they may be useful later on. I removed them from my results using `df.drop()`. The exact column names depend on which of the two file layouts was imported.

```
df = df.drop(columns=['Indicator Name', 'Indicator Code', 'Unnamed: 65'])
df
```
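
Because the two file layouts use different names for their metadata columns, one option is to list all of them and let pandas skip the ones that are absent. The sketch below assumes a tiny hand-built frame (`df_example` is a hypothetical name) and uses the `errors="ignore"` argument of `DataFrame.drop`.

```python
import pandas as pd

# Hypothetical one-row frame using the second layout's metadata column names.
df_example = pd.DataFrame({
    "Series Name": ["GDP (current US$)"],
    "Series Code": ["NY.GDP.MKTP.CD"],
    "Country Name": ["Afghanistan"],
    "1960": [537777811.1],
})

# Metadata column names from both layouts; errors="ignore" skips absent ones.
metadata_cols = [
    "Indicator Name", "Indicator Code", "Unnamed: 65",
    "Series Name", "Series Code",
]
cleaned = df_example.drop(columns=metadata_cols, errors="ignore")

print(list(cleaned.columns))
```

Without `errors="ignore"`, `drop` raises a `KeyError` as soon as one of the listed columns is missing.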

After that, the only other issue was that the numbers were displayed in scientific notation, because they range from millions to trillions of dollars. To fix that, I changed the display option for floats with this code: `pd.options.display.float_format = '{:.2f}'.format`
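
Setting `pd.options.display.float_format` changes the display for the whole session. As a hedged alternative, the sketch below scopes the same format string with `pd.option_context`, so the previous setting is restored afterwards. The two values are taken from the 2020 column of this dataset (Afghanistan and the United States); the variable names are illustrative.

```python
import pandas as pd

# Two 2020 GDP values from this dataset (Afghanistan, United States).
gdp = pd.Series([20116137326.0, 20953000000000.0], name="2020")

# Text rendering with whatever float format is currently active.
default_text = gdp.to_string()

# Inside the context manager, floats use fixed-point with two decimals;
# the previous setting is restored on exit.
with pd.option_context("display.float_format", "{:.2f}".format):
    fixed_text = gdp.to_string()

print(default_text)
print(fixed_text)
```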

In [36]:
df = df.drop(columns=['Series Name', 'Series Code'])
pd.options.display.float_format = '{:.2f}'.format
df

Unnamed: 0,Country Name,Country Code,1960,1961,1962,...,2016,2017,2018,2019,2020
0,Afghanistan,AFG,537777811.10,548888895.60,546666677.80,...,18116562465.00,18753469630.00,18053228579.00,18799450743.00,20116137326.00
1,Albania,ALB,,,,...,11861199831.00,13019689337.00,15156432310.00,15400242875.00,14887629268.00
2,Algeria,DZA,2723593385.00,2434727330.00,2001428328.00,...,160034000000.00,170097000000.00,174911000000.00,171767000000.00,145009000000.00
3,American Samoa,ASM,,,,...,671000000.00,612000000.00,639000000.00,648000000.00,709000000.00
4,Andorra,AND,,,,...,2896679212.00,3000180750.00,3218316013.00,3155065488.00,
...,...,...,...,...,...,...,...,...,...,...,...
212,Virgin Islands (U.S.),VIR,,,,...,3798000000.00,3794000000.00,3900000000.00,4068000000.00,
213,West Bank and Gaza,PSE,,,,...,15405400000.00,16128000000.00,16276600000.00,17133500000.00,15561300000.00
214,"Yemen, Rep.",YEM,,,,...,31317365269.00,26840128755.00,21606140907.00,,
215,Zambia,ZMB,713000000.00,696285714.30,693142857.10,...,20958412538.00,25873601261.00,26311590297.00,23308667781.00,18110631358.00


## Data at a Glance

Overall, I'm left with 217 rows, one per country, and 63 columns: the country name, the country code, and one column per year. I then used `df.describe()` to get an overview of the data and compared the mean values from 1960 and 2020. Over those 60 years, the mean GDP across reporting countries increased dramatically, from about 11.6 billion to about 431 billion current US dollars. Some of the rows are missing GDP data because GDP was not recorded for that country in that year.

In [37]:
df1960mean = df['1960'].mean()
df2020mean = df['2020'].mean()
print('Mean GDP in 1960: {m1960}\nMean GDP in 2020: {m2020}'.format(m1960=df1960mean, m2020=df2020mean))
print('Difference in GDP between 1960 to 2020: {mdif}'.format(mdif=df2020mean-df1960mean))
df.describe()

Mean GDP in 1960: 11614905878.833878
Mean GDP in 2020: 430990113284.3397
Difference in GDP between 1960 to 2020: 419375207405.50586


Unnamed: 0,1960,1961,1962,1963,1964,...,2016,2017,2018,2019,2020
count,98.0,101.0,104.0,104.0,104.0,...,208.0,208.0,208.0,205.0,194.0
mean,11614905878.83,11906336553.98,12630162297.38,13615198721.58,14905901025.68,...,362773134768.17,385953305417.55,410166193965.98,422337620580.79,430990113284.34
std,55924191269.38,57154619021.81,60527339386.19,64025794833.88,68948817766.79,...,1600167269667.4,1688091240186.3,1810180258700.04,1881976352922.97,1914465862733.67
min,12012012.01,11592011.59,9122751.45,10840095.13,12712471.4,...,36547799.58,40619251.99,42588164.97,47271463.33,48855550.2
25%,283286082.35,271066000.0,284161587.15,328767087.58,364379525.75,...,5808218370.0,6299854430.5,6644358830.25,7220395248.0,7780402262.5
50%,1023646137.5,1058975266.0,1114083740.5,1179979554.0,1258118950.0,...,23221376289.0,25426395630.5,26143956879.0,26896660000.0,31192751628.5
75%,4255954032.25,4817580184.0,5114321932.75,5724960131.0,5975611956.75,...,161676000000.0,174377250000.0,190487750000.0,205144000000.0,203106750000.0
max,543300000000.0,563300000000.0,605100000000.0,638600000000.0,685800000000.0,...,18745100000000.0,19543000000000.0,20611900000000.0,21433200000000.0,20953000000000.0
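
The `count` row shows that far fewer countries report GDP for 1960 (98) than for 2020 (194), so the two means are computed over different sets of countries. A minimal sketch of a like-for-like comparison follows, assuming a four-row sample copied from the tables above; `sample_gdp` and `both_years` are illustrative names, and the figures remain in current (not inflation-adjusted) US dollars.

```python
import pandas as pd

# Three countries with a 1960 value, plus Albania, which has none.
sample_gdp = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", "Algeria", "Zambia"],
    "1960": [537777811.10, None, 2723593385.00, 713000000.00],
    "2020": [20116137326.00, 14887629268.00, 145009000000.00, 18110631358.00],
})

# Keep only countries that report GDP in both years so the means are comparable.
both_years = sample_gdp.dropna(subset=["1960", "2020"])

mean_1960 = both_years["1960"].mean()
mean_2020 = both_years["2020"].mean()
growth_factor = mean_2020 / mean_1960

print(f"Countries compared: {len(both_years)}")
print(f"Mean GDP 1960: {mean_1960:.2f}")
print(f"Mean GDP 2020: {mean_2020:.2f}")
print(f"Growth factor: {growth_factor:.1f}x")
```

Applied to the full data frame, the same `dropna(subset=...)` step would restrict the comparison to countries present in both years.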


## Data Insights

From there, I decided to ask a few questions of my data:

1. Overall, how many missing values are there?
2. What was the minimum recorded GDP from 1960?
3. What was the maximum recorded GDP from 1960?
4. What were the five countries with the lowest recorded GDP from 1960?
5. What were the five countries with the highest recorded GDP from 1960?
6. What was the minimum recorded GDP from 2020?
7. What was the maximum recorded GDP from 2020?
8. What were the five countries with the highest recorded GDP from 2020?
9. What were the five countries with the lowest recorded GDP from 2020?
10. Year over year, how does the data correlate?

Below, each of the questions is answered in order.

### 1

In [38]:
pd.set_option('display.max_columns', df.shape[1]+1)
df.isnull().sum().to_frame().T

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
0,0,0,119,116,113,113,113,104,102,99,97,97,91,89,89,89,88,86,85,82,83,82,71,68,67,66,65,63,61,57,54,54,40,45,41,38,34,25,25,25,23,22,18,17,12,12,12,12,11,11,10,10,9,7,8,7,7,8,9,9,9,12,23
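
The cell above counts missing values per column. To get a single overall figure, the per-column counts can be summed once more. This is a small sketch on a three-row sample copied from the tables above (`sample_gdp` is an illustrative name), not the full data frame.

```python
import pandas as pd

# Afghanistan, Albania, and Andorra with three of the year columns.
sample_gdp = pd.DataFrame({
    "Country Name": ["Afghanistan", "Albania", "Andorra"],
    "1960": [537777811.10, None, None],
    "2019": [18799450743.00, 15400242875.00, 3155065488.00],
    "2020": [20116137326.00, 14887629268.00, None],
})

# Missing values per column, as in the cell above.
missing_per_column = sample_gdp.isnull().sum()

# Summing the per-column counts gives the overall number of missing cells.
total_missing = int(missing_per_column.sum())

# The mean of the boolean mask gives the share of missing values per column.
missing_share = sample_gdp.isnull().mean()

print(missing_per_column)
print(f"Total missing values: {total_missing}")
print(missing_share)
```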


### 2

In [39]:
df1960 = df[df['1960'] == df['1960'].min()]
df1960

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
169,Seychelles,SYC,12012012.01,11592011.59,12642026.57,13923029.26,15393032.35,15603032.8,16443034.56,16632032.81,16074027.35,16452027.99,18432031.36,21965951.72,30645121.01,36896278.22,43134498.69,47803145.96,49278979.55,64526398.66,85552369.91,127261099.2,147357222.8,154902869.0,147912069.8,146712850.5,151313242.0,168887539.1,207850623.6,249267039.8,283828769.0,304832867.4,368584758.9,374359556.1,433667193.8,473916819.5,486451204.6,508221508.2,503068472.2,562958836.5,608369282.2,622985493.7,614879764.8,622262057.2,697518248.2,705704816.0,839319927.3,919103254.5,1016418229.0,1033561654.0,967199594.0,847397850.1,969936525.3,1065826670.0,1060226126.0,1328157609.0,1343007845.0,1377495054.0,1426651769.0,1528242026.0,1547690759.0,1582841059.0,1059886364.0


### 3

In [40]:
df1960 = df[df['1960'] == df['1960'].max()]
df1960

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
206,United States,USA,543300000000.0,563300000000.0,605100000000.0,638600000000.0,685800000000.0,743700000000.0,815000000000.0,861700000000.0,942500000000.0,1019900000000.0,1073300000000.0,1164850000000.0,1279110000000.0,1425380000000.0,1545240000000.0,1684900000000.0,1873410000000.0,2081830000000.0,2351600000000.0,2627330000000.0,2857310000000.0,3207040000000.0,3343790000000.0,3634040000000.0,4037610000000.0,4338980000000.0,4579630000000.0,4855220000000.0,5236440000000.0,5641580000000.0,5963140000000.0,6158130000000.0,6520330000000.0,6858560000000.0,7287240000000.0,7639750000000.0,8073120000000.0,8577550000000.0,9062820000000.0,9630660000000.0,10252300000000.0,10581800000000.0,10936400000000.0,11458200000000.0,12213700000000.0,13036600000000.0,13814600000000.0,14451900000000.0,14712800000000.0,14448900000000.0,14992100000000.0,15542600000000.0,16197000000000.0,16784800000000.0,17527200000000.0,18238300000000.0,18745100000000.0,19543000000000.0,20611900000000.0,21433200000000.0,20953000000000.0


### 4

In [41]:
df = df.sort_values('1960', ascending=True)
df.head(5)

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
169,Seychelles,SYC,12012012.01,11592011.59,12642026.57,13923029.26,15393032.35,15603032.8,16443034.56,16632032.81,16074027.35,16452027.99,18432031.36,21965951.72,30645121.01,36896278.22,43134498.69,47803145.96,49278979.55,64526398.66,85552369.91,127261099.2,147357222.8,154902869.0,147912069.8,146712850.5,151313242.0,168887539.1,207850623.6,249267039.8,283828769.0,304832867.4,368584758.9,374359556.1,433667193.8,473916819.5,486451204.6,508221508.2,503068472.2,562958836.5,608369282.2,622985493.7,614879764.8,622262057.2,697518248.2,705704816.0,839319927.3,919103254.5,1016418229.0,1033561654.0,967199594.0,847397850.1,969936525.3,1065826670.0,1060226126.0,1328157609.0,1343007845.0,1377495054.0,1426651769.0,1528242026.0,1547690759.0,1582841059.0,1059886364.0
181,St. Kitts and Nevis,KNA,12366563.61,12483229.31,12541562.15,12833226.39,13416554.86,13593932.32,14469078.18,16742338.25,14600000.0,15850000.0,16300000.0,19624746.45,22944849.12,24196018.38,31514856.31,33364055.3,30095602.29,44496296.3,49433333.33,58840740.74,68459259.26,80888888.89,86022222.22,86874074.07,98603703.7,111007407.4,130685185.2,147748148.1,172692592.6,192518518.5,217259259.3,220540740.7,242137037.0,263755555.6,295159259.3,313485185.2,333944444.4,374641308.0,383257331.4,406595484.4,421695769.8,458643829.0,481077373.7,469869869.9,506900000.0,547203703.7,644411111.1,689285185.2,751233333.3,747862963.0,760170370.4,817759259.3,800414814.8,839770370.4,916566666.7,923155555.6,1008888889.0,1060740741.0,1078518519.0,1164814815.0,980740740.7
184,St. Vincent and the Grenadines,VCT,13066557.78,13999883.33,14524878.96,13708219.1,14758210.35,15108207.43,16099865.83,15835177.93,15350000.0,16650000.0,18450000.0,20051648.18,27585488.99,30165373.62,32924215.86,33237164.72,32792480.97,49353161.85,60844771.48,71096359.63,82340339.63,102086539.3,113759203.3,122255349.6,135024987.8,145641705.2,160846656.7,175580647.4,200726712.6,214745002.2,240366666.7,254829629.6,277955555.6,286307407.4,289437037.0,316007407.4,331488888.9,347770370.4,373618518.5,390718518.5,396262963.0,430040740.7,461885185.2,481807407.4,521974074.1,550729629.6,610929629.6,684444444.4,695429629.6,674922222.2,681225925.9,676129629.6,692933333.3,721207407.4,727714814.8,755400000.0,774429629.6,792177777.8,811300000.0,825040740.7,807474074.1
19,Belize,BLZ,28071888.56,29964370.71,31856922.86,33749405.01,36193826.12,40069930.07,44405594.41,47379310.34,44910179.64,47305389.22,53233532.93,59207317.07,66062500.0,78343558.28,103216374.3,118066298.3,96905829.6,117650000.0,136300000.0,151800000.0,197938222.4,196089854.7,182206327.0,192103186.0,214381949.0,212643742.7,231638320.5,281082558.6,320093360.3,369133890.7,412086445.5,444720750.0,518559700.0,560205300.0,581269550.0,620422500.0,641660000.0,654582900.0,689140000.0,732732350.0,832072450.0,868358750.0,925197950.0,983575600.0,1051386600.0,1102564900.0,1210603750.0,1271598050.0,1351338650.0,1317309450.0,1377177100.0,1460797903.0,1522897506.0,1579411253.0,1667335061.0,1721700991.0,1789304088.0,1858529677.0,1915899787.0,1982518541.0,1636280797.0
25,Botswana,BWA,30412308.99,32902336.64,35643207.63,38091150.57,41613969.05,45790869.75,51464435.15,58646443.51,66248256.62,77356914.08,96245114.46,127456485.1,164466873.7,244129088.0,306033848.4,355172413.8,372010119.6,451603325.4,590376720.6,819877300.6,1060923829.0,1073861599.0,1014907255.0,1172258182.0,1240796365.0,1114764007.0,1392634772.0,1965274882.0,2644536804.0,3083800685.0,3790567052.0,3942792837.0,4146513722.0,4160086253.0,4259330999.0,4730611067.0,4847752843.0,5020214747.0,4790458837.0,5484257417.0,5788329609.0,5489608300.0,5438857107.0,7511582173.0,8957467707.0,9918907108.0,10137883299.0,10939053365.0,10945070442.0,10267133178.0,12786654498.0,15351972361.0,14380004175.0,14901750991.0,15654660710.0,13578754072.0,15082578065.0,16088437675.0,16914245098.0,16593720656.0,15061922802.0


### 5

In [42]:
df = df.sort_values('1960', ascending=False)
df.head(5)

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
206,United States,USA,543300000000.0,563300000000.0,605100000000.0,638600000000.0,685800000000.0,743700000000.0,815000000000.0,861700000000.0,942500000000.0,1019900000000.0,1073300000000.0,1164850000000.0,1279110000000.0,1425380000000.0,1545240000000.0,1684900000000.0,1873410000000.0,2081830000000.0,2351600000000.0,2627330000000.0,2857310000000.0,3207040000000.0,3343790000000.0,3634040000000.0,4037610000000.0,4338980000000.0,4579630000000.0,4855220000000.0,5236440000000.0,5641580000000.0,5963140000000.0,6158130000000.0,6520330000000.0,6858560000000.0,7287240000000.0,7639750000000.0,8073120000000.0,8577550000000.0,9062820000000.0,9630660000000.0,10252300000000.0,10581800000000.0,10936400000000.0,11458200000000.0,12213700000000.0,13036600000000.0,13814600000000.0,14451900000000.0,14712800000000.0,14448900000000.0,14992100000000.0,15542600000000.0,16197000000000.0,16784800000000.0,17527200000000.0,18238300000000.0,18745100000000.0,19543000000000.0,20611900000000.0,21433200000000.0,20953000000000.0
205,United Kingdom,GBR,73233967692.0,77741965703.0,81247564157.0,86561961812.0,94407558351.0,101825000000.0,108573000000.0,113117000000.0,107760000000.0,116465000000.0,130672000000.0,148114000000.0,169965000000.0,192538000000.0,206131000000.0,241757000000.0,232615000000.0,263066000000.0,335883000000.0,438994000000.0,564948000000.0,540766000000.0,515049000000.0,489618000000.0,461487000000.0,489285000000.0,601453000000.0,745163000000.0,910123000000.0,926885000000.0,1093170000000.0,1142800000000.0,1179660000000.0,1061390000000.0,1140490000000.0,1346420000000.0,1421510000000.0,1559570000000.0,1653390000000.0,1685760000000.0,1662130000000.0,1643910000000.0,1784080000000.0,2057090000000.0,2421810000000.0,2544830000000.0,2717060000000.0,3106180000000.0,2938880000000.0,2425800000000.0,2491110000000.0,2674890000000.0,2719160000000.0,2803290000000.0,3087170000000.0,2956570000000.0,2722850000000.0,2699020000000.0,2900790000000.0,2878670000000.0,2759800000000.0
68,France,FRA,62225478001.0,67461644222.0,75607529810.0,84759195106.0,94007851047.0,101537000000.0,110046000000.0,118973000000.0,129785000000.0,141903000000.0,148456000000.0,165967000000.0,203494000000.0,264430000000.0,285552000000.0,360832000000.0,372319000000.0,410279000000.0,506708000000.0,613953000000.0,701288000000.0,615552000000.0,584878000000.0,559869000000.0,530684000000.0,553138000000.0,771471000000.0,934173000000.0,1018850000000.0,1025210000000.0,1269180000000.0,1269280000000.0,1401470000000.0,1322820000000.0,1393980000000.0,1601090000000.0,1605680000000.0,1452880000000.0,1503110000000.0,1492650000000.0,1362250000000.0,1376470000000.0,1494290000000.0,1840480000000.0,2115740000000.0,2196130000000.0,2318590000000.0,2657210000000.0,2918380000000.0,2690220000000.0,2642610000000.0,2861410000000.0,2683830000000.0,2811080000000.0,2852170000000.0,2438210000000.0,2471290000000.0,2588740000000.0,2789590000000.0,2728870000000.0,2630320000000.0
41,China,CHN,59716467625.0,50056868958.0,47209359006.0,50706799903.0,59708343489.0,70436266147.0,76720285970.0,72881631327.0,70846535056.0,79705906247.0,92602973434.0,99800958648.0,113688000000.0,138544000000.0,144182000000.0,163432000000.0,153940000000.0,174938000000.0,149541000000.0,178281000000.0,191149000000.0,195866000000.0,205090000000.0,230687000000.0,259947000000.0,309488000000.0,300758000000.0,272973000000.0,312354000000.0,347768000000.0,360858000000.0,383373000000.0,426916000000.0,444731000000.0,564325000000.0,734548000000.0,863747000000.0,961604000000.0,1029040000000.0,1094000000000.0,1211350000000.0,1339400000000.0,1470550000000.0,1660290000000.0,1955350000000.0,2285970000000.0,2752130000000.0,3550340000000.0,4594310000000.0,5101700000000.0,6087160000000.0,7551500000000.0,8532230000000.0,9570410000000.0,10475700000000.0,11061600000000.0,11233300000000.0,12310400000000.0,13894800000000.0,14279900000000.0,14722700000000.0
98,Japan,JPN,44307342950.0,53508617739.0,60723018684.0,69498131797.0,81749006382.0,90950278258.0,105628000000.0,123782000000.0,146601000000.0,172204000000.0,212609000000.0,240152000000.0,318031000000.0,432083000000.0,479626000000.0,521542000000.0,586162000000.0,721412000000.0,1013610000000.0,1055010000000.0,1105390000000.0,1218990000000.0,1134520000000.0,1243320000000.0,1318380000000.0,1398890000000.0,2078950000000.0,2532810000000.0,3071680000000.0,3054910000000.0,3132820000000.0,3584420000000.0,3908810000000.0,4454140000000.0,4998800000000.0,5545560000000.0,4923390000000.0,4492450000000.0,4098360000000.0,4635980000000.0,4968360000000.0,4374710000000.0,4182850000000.0,4519560000000.0,4893120000000.0,4831470000000.0,4601660000000.0,4579750000000.0,5106680000000.0,5289490000000.0,5759070000000.0,6233150000000.0,6272360000000.0,5212330000000.0,4896990000000.0,4444930000000.0,5003680000000.0,4930840000000.0,5036890000000.0,5148780000000.0,5057760000000.0
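
The two cells above re-sort `df` in place to find the lowest and highest values. As an alternative sketch, `DataFrame.nsmallest` and `DataFrame.nlargest` return the same kind of top-N selection without changing the row order of the original frame. The sample below uses 1960 values copied from the outputs above; Germany is included because it has no 1960 value, and `gdp_1960` is an illustrative name.

```python
import pandas as pd

# 1960 values from the outputs above; Germany has no 1960 value.
gdp_1960 = pd.DataFrame({
    "Country Name": ["Seychelles", "Belize", "Germany", "Japan", "United States"],
    "1960": [12012012.01, 28071888.56, None, 44307342950.0, 543300000000.0],
})

# Rows with the two smallest and two largest 1960 values; missing values
# are not selected by either call.
lowest_two = gdp_1960.nsmallest(2, "1960")
highest_two = gdp_1960.nlargest(2, "1960")

print(lowest_two)
print(highest_two)
```

On the full data frame, `df.nsmallest(5, '1960')` and `df.nlargest(5, '1960')` would be the corresponding calls.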


### 6

In [43]:
df2020 = df[df['2020'] == df['2020'].min()]
df2020

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
201,Tuvalu,TUV,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8824447.74,9365165.91,9742949.47,9630762.95,10886825.56,11025945.14,12334846.23,12700905.45,12757632.87,13687141.11,13742057.05,13196544.95,15450994.24,18231078.54,21534931.61,21839098.89,22902861.45,27030374.03,30290219.76,27101076.28,31823518.62,38711810.21,37671774.69,37509075.11,37290607.54,35492074.22,36547799.58,40619251.99,42588164.97,47271463.33,48855550.2


### 7

In [44]:
df2020 = df[df['2020'] == df['2020'].max()]
df2020

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
206,United States,USA,543300000000.0,563300000000.0,605100000000.0,638600000000.0,685800000000.0,743700000000.0,815000000000.0,861700000000.0,942500000000.0,1019900000000.0,1073300000000.0,1164850000000.0,1279110000000.0,1425380000000.0,1545240000000.0,1684900000000.0,1873410000000.0,2081830000000.0,2351600000000.0,2627330000000.0,2857310000000.0,3207040000000.0,3343790000000.0,3634040000000.0,4037610000000.0,4338980000000.0,4579630000000.0,4855220000000.0,5236440000000.0,5641580000000.0,5963140000000.0,6158130000000.0,6520330000000.0,6858560000000.0,7287240000000.0,7639750000000.0,8073120000000.0,8577550000000.0,9062820000000.0,9630660000000.0,10252300000000.0,10581800000000.0,10936400000000.0,11458200000000.0,12213700000000.0,13036600000000.0,13814600000000.0,14451900000000.0,14712800000000.0,14448900000000.0,14992100000000.0,15542600000000.0,16197000000000.0,16784800000000.0,17527200000000.0,18238300000000.0,18745100000000.0,19543000000000.0,20611900000000.0,21433200000000.0,20953000000000.0


### 8

In [45]:
df = df.sort_values('2020', ascending=False)
df.head(5)

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
206,United States,USA,543300000000.0,563300000000.0,605100000000.0,638600000000.0,685800000000.0,743700000000.0,815000000000.0,861700000000.0,942500000000.0,1019900000000.0,1073300000000.0,1164850000000.0,1279110000000.0,1425380000000.0,1545240000000.0,1684900000000.0,1873410000000.0,2081830000000.0,2351600000000.0,2627330000000.0,2857310000000.0,3207040000000.0,3343790000000.0,3634040000000.0,4037610000000.0,4338980000000.0,4579630000000.0,4855220000000.0,5236440000000.0,5641580000000.0,5963140000000.0,6158130000000.0,6520330000000.0,6858560000000.0,7287240000000.0,7639750000000.0,8073120000000.0,8577550000000.0,9062820000000.0,9630660000000.0,10252300000000.0,10581800000000.0,10936400000000.0,11458200000000.0,12213700000000.0,13036600000000.0,13814600000000.0,14451900000000.0,14712800000000.0,14448900000000.0,14992100000000.0,15542600000000.0,16197000000000.0,16784800000000.0,17527200000000.0,18238300000000.0,18745100000000.0,19543000000000.0,20611900000000.0,21433200000000.0,20953000000000.0
41,China,CHN,59716467625.0,50056868958.0,47209359006.0,50706799903.0,59708343489.0,70436266147.0,76720285970.0,72881631327.0,70846535056.0,79705906247.0,92602973434.0,99800958648.0,113688000000.0,138544000000.0,144182000000.0,163432000000.0,153940000000.0,174938000000.0,149541000000.0,178281000000.0,191149000000.0,195866000000.0,205090000000.0,230687000000.0,259947000000.0,309488000000.0,300758000000.0,272973000000.0,312354000000.0,347768000000.0,360858000000.0,383373000000.0,426916000000.0,444731000000.0,564325000000.0,734548000000.0,863747000000.0,961604000000.0,1029040000000.0,1094000000000.0,1211350000000.0,1339400000000.0,1470550000000.0,1660290000000.0,1955350000000.0,2285970000000.0,2752130000000.0,3550340000000.0,4594310000000.0,5101700000000.0,6087160000000.0,7551500000000.0,8532230000000.0,9570410000000.0,10475700000000.0,11061600000000.0,11233300000000.0,12310400000000.0,13894800000000.0,14279900000000.0,14722700000000.0
98,Japan,JPN,44307342950.0,53508617739.0,60723018684.0,69498131797.0,81749006382.0,90950278258.0,105628000000.0,123782000000.0,146601000000.0,172204000000.0,212609000000.0,240152000000.0,318031000000.0,432083000000.0,479626000000.0,521542000000.0,586162000000.0,721412000000.0,1013610000000.0,1055010000000.0,1105390000000.0,1218990000000.0,1134520000000.0,1243320000000.0,1318380000000.0,1398890000000.0,2078950000000.0,2532810000000.0,3071680000000.0,3054910000000.0,3132820000000.0,3584420000000.0,3908810000000.0,4454140000000.0,4998800000000.0,5545560000000.0,4923390000000.0,4492450000000.0,4098360000000.0,4635980000000.0,4968360000000.0,4374710000000.0,4182850000000.0,4519560000000.0,4893120000000.0,4831470000000.0,4601660000000.0,4579750000000.0,5106680000000.0,5289490000000.0,5759070000000.0,6233150000000.0,6272360000000.0,5212330000000.0,4896990000000.0,4444930000000.0,5003680000000.0,4930840000000.0,5036890000000.0,5148780000000.0,5057760000000.0
73,Germany,DEU,,,,,,,,,,,215838000000.0,249985000000.0,299802000000.0,398374000000.0,445303000000.0,490637000000.0,519754000000.0,600498000000.0,740470000000.0,881345000000.0,950291000000.0,800472000000.0,776576000000.0,770684000000.0,725111000000.0,732535000000.0,1046260000000.0,1298180000000.0,1401230000000.0,1398970000000.0,1771670000000.0,1868950000000.0,2131570000000.0,2071320000000.0,2205070000000.0,2585790000000.0,2497240000000.0,2211990000000.0,2238990000000.0,2194200000000.0,1943150000000.0,1944110000000.0,2068620000000.0,2496130000000.0,2809190000000.0,2845800000000.0,2992200000000.0,3421230000000.0,3730030000000.0,3397790000000.0,3396350000000.0,3744410000000.0,3527340000000.0,3732740000000.0,3883920000000.0,3356240000000.0,3467500000000.0,3681730000000.0,3975350000000.0,3888330000000.0,3846410000000.0
205,United Kingdom,GBR,73233967692.0,77741965703.0,81247564157.0,86561961812.0,94407558351.0,101825000000.0,108573000000.0,113117000000.0,107760000000.0,116465000000.0,130672000000.0,148114000000.0,169965000000.0,192538000000.0,206131000000.0,241757000000.0,232615000000.0,263066000000.0,335883000000.0,438994000000.0,564948000000.0,540766000000.0,515049000000.0,489618000000.0,461487000000.0,489285000000.0,601453000000.0,745163000000.0,910123000000.0,926885000000.0,1093170000000.0,1142800000000.0,1179660000000.0,1061390000000.0,1140490000000.0,1346420000000.0,1421510000000.0,1559570000000.0,1653390000000.0,1685760000000.0,1662130000000.0,1643910000000.0,1784080000000.0,2057090000000.0,2421810000000.0,2544830000000.0,2717060000000.0,3106180000000.0,2938880000000.0,2425800000000.0,2491110000000.0,2674890000000.0,2719160000000.0,2803290000000.0,3087170000000.0,2956570000000.0,2722850000000.0,2699020000000.0,2900790000000.0,2878670000000.0,2759800000000.0


### 9

In [46]:
df = df.sort_values('2020', ascending=True)
df.head(5)

Unnamed: 0,Country Name,Country Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
201,Tuvalu,TUV,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,8824447.74,9365165.91,9742949.47,9630762.95,10886825.56,11025945.14,12334846.23,12700905.45,12757632.87,13687141.11,13742057.05,13196544.95,15450994.24,18231078.54,21534931.61,21839098.89,22902861.45,27030374.03,30290219.76,27101076.28,31823518.62,38711810.21,37671774.69,37509075.11,37290607.54,35492074.22,36547799.58,40619251.99,42588164.97,47271463.33,48855550.2
137,Nauru,NRU,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,47564520.39,66055407.67,96927201.48,98491843.64,104654365.2,86529661.37,99723394.96,109359680.2,124021393.7,118724073.8,114626625.6
102,Kiribati,KIR,,,,,,,,,,,14295279.54,15278632.48,18936526.95,31710657.73,85637174.37,55081816.99,41109617.5,38748059.44,45210026.32,42620165.44,38715554.54,41369800.05,40572066.13,37837837.84,41246160.6,32125148.4,32085561.5,33608738.27,42972107.2,41119721.65,39809538.68,47515189.28,47737955.35,46919624.64,54832577.86,56338028.17,66515376.79,67537479.59,65334841.06,69032258.06,67254174.4,63101272.37,72196457.68,90231856.8,102367039.3,112133944.3,110234939.8,132671743.0,141042610.3,132419902.0,156120439.3,181705153.6,190243432.8,185114059.6,179703165.4,171117816.7,178328984.1,187276124.8,200157020.6,188391770.6,197508774.3
124,Marshall Islands,MHL,,,,,,,,,,,,,,,,,,,,,,31020000.0,34918000.0,41749000.0,45144000.0,43879000.0,55989000.0,62983000.0,70688000.0,72798000.0,78476000.0,82507000.0,91063000.0,99461000.0,108071000.0,120230000.0,110858000.0,110705600.0,112279400.0,114326300.0,115347500.0,122824000.0,131738200.0,131398500.0,132440200.0,136559500.0,141515800.0,148346700.0,151898900.0,149769600.0,160407100.0,172188500.0,180436300.0,184840400.0,182142800.0,183814300.0,201510900.0,213204100.0,221588900.0,239462200.0,244462400.0
150,Palau,PLW,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,146297500.0,156907900.0,163184900.0,153963200.0,165186200.0,190452900.0,192382400.0,198897700.0,198283900.0,187522800.0,185943000.0,196911100.0,212397800.0,221117200.0,241669800.0,280457700.0,298300000.0,285300000.0,284700000.0,274200000.0,257700000.0


### 10

In [49]:
corr = df.corr()
pd.reset_option("all")
pd.options.display.float_format = '{:.6f}'.format
corr

As the xlwt package is no longer maintained, the xlwt engine will be removed in a future version of pandas. This is the only engine in pandas that supports writing in the xls format. Install openpyxl and write to an xlsx file instead.

: boolean
    use_inf_as_null had been deprecated and will be removed in a future
    version. Use `use_inf_as_na` instead.



Unnamed: 0,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020
1960,1.000000,0.999583,0.999130,0.998771,0.998465,0.998523,0.998208,0.997148,0.995912,0.994906,...,0.906976,0.901237,0.901711,0.898131,0.899460,0.899196,0.891055,0.878608,0.881140,0.869865
1961,0.999583,1.000000,0.999872,0.999652,0.999396,0.999333,0.999050,0.998387,0.997421,0.996539,...,0.899806,0.894166,0.893280,0.889369,0.891563,0.891719,0.882716,0.870109,0.872738,0.861310
1962,0.999130,0.999872,1.000000,0.999808,0.999600,0.999472,0.999228,0.998704,0.997949,0.997144,...,0.897902,0.891695,0.890084,0.885722,0.887571,0.887906,0.878691,0.865603,0.868128,0.856396
1963,0.998771,0.999652,0.999808,1.000000,0.999813,0.999692,0.999419,0.999138,0.998436,0.997741,...,0.901664,0.895064,0.892987,0.888409,0.889548,0.890051,0.880806,0.867697,0.870135,0.858293
1964,0.998465,0.999396,0.999600,0.999813,1.000000,0.999953,0.999548,0.999308,0.998613,0.998099,...,0.906111,0.899569,0.897157,0.892475,0.893480,0.894118,0.884921,0.871939,0.874298,0.862625
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2016,0.899196,0.891719,0.887906,0.890051,0.894118,0.897755,0.897863,0.895221,0.892123,0.894375,...,0.983359,0.989298,0.994891,0.997118,0.999605,1.000000,0.999502,0.998235,0.998237,0.996413
2017,0.891055,0.882716,0.878691,0.880806,0.884921,0.888690,0.888507,0.885502,0.882149,0.884314,...,0.980370,0.987023,0.994102,0.996886,0.999419,0.999502,1.000000,0.999361,0.999290,0.997762
2018,0.878608,0.870109,0.865603,0.867697,0.871939,0.875963,0.875775,0.872471,0.868695,0.870822,...,0.974970,0.982732,0.991423,0.995135,0.998483,0.998235,0.999361,1.000000,0.999857,0.999310
2019,0.881140,0.872738,0.868128,0.870135,0.874298,0.878287,0.878024,0.874635,0.870949,0.873000,...,0.973641,0.981746,0.990461,0.994268,0.998378,0.998237,0.999290,0.999857,1.000000,0.999464
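
The data frame still contains the text columns `Country Name` and `Country Code`, and newer pandas versions no longer silently drop such columns when computing correlations. A version-independent sketch is to select the numeric columns first. The three-country sample below is copied from the tables above and `year_corr` is an illustrative name; it is not the full correlation matrix.

```python
import pandas as pd

# United States, Japan, and China with three of the year columns.
sample_gdp = pd.DataFrame({
    "Country Name": ["United States", "Japan", "China"],
    "Country Code": ["USA", "JPN", "CHN"],
    "1960": [543300000000.0, 44307342950.0, 59716467625.0],
    "2019": [21433200000000.0, 5148780000000.0, 14279900000000.0],
    "2020": [20953000000000.0, 5057760000000.0, 14722700000000.0],
})

# Keep only numeric (year) columns, then compute pairwise Pearson correlations.
year_corr = sample_gdp.select_dtypes(include="number").corr()

print(year_corr)
```

As in the full table above, adjacent years in this sample correlate more strongly with each other than 1960 does with 2020.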


## Data Report

Finally, using [pandas profiling](https://github.com/ydataai/pandas-profiling), I'm able to create a report of the data.

In [None]:
df = pd.read_csv('data.csv', engine='python', skipfooter = 49)
pd.options.display.float_format = '{:.2f}'.format
df = df.drop(columns=['Series Name', 'Series Code'])
profile = ProfileReport(df, title="GDP per capita (current US$)", minimal=True, dark_mode=True)
profile.to_file("GDP per capita (current US$).html")
profile
