This is the notebook for the second week. It covers more advanced material than the first one. This week you will learn:
* How to import modules
* How to load a csv file
* How to create a data frame with pandas from csv
* How to filter rows in a data frame
    * by index/name
    * by conditions
* How to filter columns in a data frame
    * by index
    * by name
    * by condition
* Numpy
* Simple stats with numpy:
    * avg, median, mode, variance
    * standard deviation
    * order lists
    * find min/max

# Importing Modules

A module is a piece of functionality that you import from another piece of code (a library). Python provides a huge set of modules for all types of functionality: mathematics, statistics, importing and exporting data, visualization, etc. Because so many modules are available, only a tiny fraction of them is loaded by default when you start a new Python program; this keeps your program small and fast.

Here is how you import a module that gives your program the ability to read a CSV file. By convention, import statements go at the beginning of your file, before any other code.

You import the `csv` module like so:

In [1]:
import csv

# Reading Files

We can now load CSV files. More information: https://docs.python.org/2/library/csv.html.
The following code opens a CSV file and makes it available in the variable `file`. The next line reads the content from `file` using the function `reader` from the module `csv`. The parameters `delimiter` and `quotechar` tell Python how your CSV file is formatted: `delimiter` is the character that separates the columns in a row, while `quotechar` is the character used to wrap string values. Why is this necessary? CSV files usually separate their columns with commas (comma-separated values). However, some values contain commas themselves, e.g. the name Smith, Jon. You don't want Python to treat this as two columns but as a single one. That's why CSV files can wrap such strings in a special character, e.g. double quotes `"`. The `quotechar` parameter tells Python which character that is. Its default value is `"`, which is why the code below does not set it explicitly.
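The effect of `quotechar` is easiest to see on a tiny example. The following is a minimal sketch that reads CSV text from a string instead of a file; the text in `sample_text` is made up for illustration and is not part of the course data.

```python
import csv
import io

# Hypothetical CSV text: the name contains a comma, so it is wrapped in double quotes.
sample_text = 'name,city\n"Smith, Jon",Boston\n'

# io.StringIO lets csv.reader read from a string as if it were an open file.
reader = csv.reader(io.StringIO(sample_text), delimiter=",", quotechar='"')
rows = list(reader)

print(rows)  # [['name', 'city'], ['Smith, Jon', 'Boston']]
```

The quoted name stays together as one cell, so the second row still has two columns.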

Now, import the file `gpd_per_capita_new.csv` and print all its rows:

In [2]:
# read the file row by row
raw_data = []

# 'r' opens the file for reading in text mode ('rt' is equivalent)
# newline="" is what the csv module documentation recommends for files passed to csv.reader
with open("gpd_per_capita_new.csv", "r", newline="") as file:
    csv_content = csv.reader(file, delimiter=",")
    for row in csv_content:
        # each row is a list of strings
        raw_data.append(row)

print(raw_data)

[['countries', '1950', '1955', '1960', '1965', '1970', '1975', '1980', '1985', '1990', '1995', '2000', '2005', '2010', '2015'], ['Afghanistan', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['Brazil', '4297.82385399312', '3739.91938942857', '3693.27582014833', '3279.68639609452', '3584.07401646004', '3245.57315217104', '3154.72055184122', '2363.5112024022', '1613.13692305826', '1581.62542312949', '4716.61412549824', '3976.61916776916', '3696.14677192256'], ['China', '1864.10270222039', '1105.95255695671', '774.884634037432', '475.928869661473', '341.021293511227', '208.170863491313', '149.66006823476', '128.934979893935', '99.0801230616385', '72.3249273149585', '2426.33246633954', '1464.10762700312', '949.178062082992'], ['Germany', '25297.3853934236', '23217.3320270132', '21502.7214441946', '20685.689979311', '17636.8306336143', '15810.553192641', '14545.4282572055', '12711.1567623386', '', '', '25420.2757175313', '23564.3851678751', '22945.7088501507'], ['Iran', '2125.03025238

The file you just loaded contains one row for each country, with the first row containing the headers of the file. The first entry in each row is the country name; the following entries are the country's GDP per capita for the years 1950, 1955, 1960, and so on, in five-year steps up to 2015. The structure is similar to what you might see in a spreadsheet table.

The problem with this file is that Python treats the numbers as strings, not actual numbers. That means that you cannot run statistics and other mathematical functions on them. Before continuing, we need to convert all the strings into numbers. Here is how we do that:
* iterate through all cells with a nested loop (remember last class)
* convert values to numbers with the `float()` function (remember last class); `int()` would raise an error here because the values contain decimals

Now, here is how you iterate through all cells. Remember that `raw_data` is a list of lists. Hence, in order to iterate through all cells, we need two loops, one inside the other:

In [3]:
# convert data type
# An if-else list comprehension would be a compact alternative:
# https://stackoverflow.com/questions/4260280/if-else-in-pythons-list-comprehension
# Note: the variant below leaves decimal strings unconverted, because "4297.82".isdigit() is False
# list_entries = [[int(x) if x.isdigit() else x for x in row] for row in raw_data[1:]]

print(raw_data)

converted_data = [raw_data[0]]  # keep the header row unchanged
for row in raw_data[1:]:
    converted_row = [row[0]]  # keep the country name as a string
    for entry in row[1:]:
        if entry != "":
            converted_row.append(float(entry))  # convert numeric strings to float
        else:
            converted_row.append(entry)  # keep empty cells as empty strings
    converted_data.append(converted_row)

print("-----")

print(converted_data)

[['countries', '1950', '1955', '1960', '1965', '1970', '1975', '1980', '1985', '1990', '1995', '2000', '2005', '2010', '2015'], ['Afghanistan', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['Brazil', '4297.82385399312', '3739.91938942857', '3693.27582014833', '3279.68639609452', '3584.07401646004', '3245.57315217104', '3154.72055184122', '2363.5112024022', '1613.13692305826', '1581.62542312949', '4716.61412549824', '3976.61916776916', '3696.14677192256'], ['China', '1864.10270222039', '1105.95255695671', '774.884634037432', '475.928869661473', '341.021293511227', '208.170863491313', '149.66006823476', '128.934979893935', '99.0801230616385', '72.3249273149585', '2426.33246633954', '1464.10762700312', '949.178062082992'], ['Germany', '25297.3853934236', '23217.3320270132', '21502.7214441946', '20685.689979311', '17636.8306336143', '15810.553192641', '14545.4282572055', '12711.1567623386', '', '', '25420.2757175313', '23564.3851678751', '22945.7088501507'], ['Iran', '2125.03025238

# Cleaning Data

Never trust your data: in most cases, there are glitches, errors, and missing values in your file that need to be fixed before you can convert your data into some form of analyzable object in Python. Such glitches may be misplaced commas or other characters, misspelled entries, or inconsistently formatted dates. Sometimes certain irregularities are there on purpose.

In our case, we need to convert all but the first entry in each row, for all but the first row. There are several options. Here, we use counters that tell us which row and which cell in that row we are in. We reset the cell counter each time we start looping through a new row. The same idea lets you, for example, process only the elements after the 5th element in a list.
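The counter idea can be sketched on a plain list. This is only an illustration: the list `letters` and the choice to upper-case the selected elements are assumptions made for the example.

```python
# Hypothetical list, used only for illustration.
letters = ["a", "b", "c", "d", "e", "f", "g", "h"]

after_fifth = []
counter = 0
for letter in letters:
    if counter >= 5:  # counter values 0 to 4 belong to the first five elements
        after_fifth.append(letter.upper())
    counter += 1  # count every element, whether it was processed or not

print(after_fifth)  # ['F', 'G', 'H']
```

Python's built-in `enumerate()` produces the same counter without the manual `counter += 1` step.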

Can you convert only the actual numeric values in our data file into floats? Remember that you need to "ignore" the first row of the file (the one containing the column headers) as well as the first value in each row (the one containing the country name). To test whether the conversion was successful, print each row after conversion.

In [4]:
print(raw_data)
row_counter = 0
for row in raw_data:
    if row_counter > 0:
        entry_counter = 0
        for entry in row:
            if entry_counter > 0:
                if entry != "":
                    entry = float(entry)  # this does not work: it only rebinds the loop variable `entry`, the list inside raw_data stays unchanged
            entry_counter += 1
    row_counter += 1
print("---")
print(raw_data)

[['countries', '1950', '1955', '1960', '1965', '1970', '1975', '1980', '1985', '1990', '1995', '2000', '2005', '2010', '2015'], ['Afghanistan', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['Brazil', '4297.82385399312', '3739.91938942857', '3693.27582014833', '3279.68639609452', '3584.07401646004', '3245.57315217104', '3154.72055184122', '2363.5112024022', '1613.13692305826', '1581.62542312949', '4716.61412549824', '3976.61916776916', '3696.14677192256'], ['China', '1864.10270222039', '1105.95255695671', '774.884634037432', '475.928869661473', '341.021293511227', '208.170863491313', '149.66006823476', '128.934979893935', '99.0801230616385', '72.3249273149585', '2426.33246633954', '1464.10762700312', '949.178062082992'], ['Germany', '25297.3853934236', '23217.3320270132', '21502.7214441946', '20685.689979311', '17636.8306336143', '15810.553192641', '14545.4282572055', '12711.1567623386', '', '', '25420.2757175313', '23564.3851678751', '22945.7088501507'], ['Iran', '2125.03025238
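The loop above leaves `raw_data` unchanged because assigning to `entry` never touches the list itself. One way to make the conversion stick is to write the converted value back through its position in the row. The following is a minimal sketch on a small hand-typed excerpt (`sample_rows`, with rounded values), not on the full file.

```python
# Hand-typed excerpt in the same shape as raw_data (values rounded for brevity).
sample_rows = [
    ["countries", "1950", "1955"],
    ["Afghanistan", "", ""],
    ["Brazil", "4297.82", "3739.92"],
]

for row_index, row in enumerate(sample_rows):
    if row_index == 0:
        continue  # skip the header row
    for entry_index, entry in enumerate(row):
        # skip the country name (position 0) and empty cells
        if entry_index > 0 and entry != "":
            row[entry_index] = float(entry)  # assign back into the list itself

print(sample_rows[2])  # ['Brazil', 4297.82, 3739.92]
```

Here `enumerate()` plays the role of the two manual counters, and `row[entry_index] = ...` changes the list that `sample_rows` holds.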

# Pandas

Pandas (http://pandas.pydata.org) is a Python module that provides convenient methods to manipulate, filter, and aggregate more complex data in table format. The central structure in pandas is called `DataFrame`. A DataFrame is a table representation of your data like in the previous example, but with a lot of added functionality.

## Importing and Creating Dataframes

First, we need to import the pandas module. 
When importing a module, you can give it an abbreviation, as some of the modules can have long names. Abbreviations are indicated with the `as` keyword after the `import modulename`:
`import mymodulename as myabbreviation`. 

In the following, we want to import the `pandas` module while using the abbreviation `pd`:

In [5]:
import pandas as pd

`pd` is the standard abbreviation for pandas and is used in most online tutorials.

Now, let's load our csv file into a data frame. Pandas already comes with its own csv-import function that returns a dataframe:

`pd.read_csv('somefile.csv')`

Load the data file `gpd_per_capita_new.csv` and put the result in a variable called `df_gdp`. Print `df_gdp` to be sure it's properly loaded.

In [6]:
df_gdp = pd.read_csv('gpd_per_capita_new.csv')
print(df_gdp)

        countries          1950          1955          1960          1965  \
0     Afghanistan           NaN           NaN           NaN           NaN   
1          Brazil   4297.823854   3739.919389   3693.275820   3279.686396   
2           China   1864.102702   1105.952557    774.884634    475.928870   
3         Germany  25297.385393  23217.332027  21502.721444  20685.689979   
4            Iran   2125.030252   1679.664398   1484.666023   1445.852456   
5           Japan  40837.266644  37363.287444  37518.580858  35465.925444   
6          Malawi    157.621368    142.391448    155.373892    129.360236   
7          Russia   2888.847355   1967.518602   1590.696500   2106.223762   
8  United Kingdom  29771.303348  26214.967350  22732.538466  19721.738642   
9   United States  38710.885442  35427.909964  31831.461594  28401.465176   

           1970          1975          1980          1985          1990  \
0           NaN           NaN           NaN           NaN           NaN   
1 

You should see a table like the one above, with one row per country and one column per year.

As already mentioned, DataFrames are sophisticated objects. So far, we have called functions from Python directly (e.g. `print()`, `len()`) or from modules (`csv.reader()`). We can also call methods on objects, such as a `DataFrame` object. For example, the following two methods can be called on any variable that is a `DataFrame` object:

* `DataFrame.head()`: shows the 5 first rows
* `DataFrame.tail()`: shows the 5 last rows

Can you try that for our data frame? Replace the `DataFrame` part in the above examples with the name of our DataFrame variable.

In [7]:
# showing rows in pandas
print(df_gdp.head(5))
df_gdp.tail(5) # head and tail refer to rows

     countries          1950          1955          1960          1965  \
0  Afghanistan           NaN           NaN           NaN           NaN   
1       Brazil   4297.823854   3739.919389   3693.275820   3279.686396   
2        China   1864.102702   1105.952557    774.884634    475.928870   
3      Germany  25297.385393  23217.332027  21502.721444  20685.689979   
4         Iran   2125.030252   1679.664398   1484.666023   1445.852456   

           1970          1975          1980          1985         1990  \
0           NaN           NaN           NaN           NaN          NaN   
1   3584.074016   3245.573152   3154.720552   2363.511202  1613.136923   
2    341.021294    208.170863    149.660068    128.934980    99.080123   
3  17636.830634  15810.553193  14545.428257  12711.156762          NaN   
4   1241.647502   1477.011852   2204.420456   1793.574657  1117.218268   

          1995          2000          2005          2010  2015  
0          NaN           NaN           NaN   

Unnamed: 0,countries,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015
5,Japan,40837.266644,37363.287444,37518.580858,35465.925444,29064.222359,24444.201743,20680.65741,18398.721184,13726.062655,9157.323309,39971.787453,39295.306204,37291.706158,
6,Malawi,157.621368,142.391448,155.373892,129.360236,138.371883,147.791025,153.275312,140.893353,126.00861,101.99653,180.902296,149.48433,155.271544,
7,Russia,2888.847355,1967.518602,1590.6965,2106.223762,,,,,,,2928.005033,2442.962966,1775.141291,
8,United Kingdom,29771.303348,26214.96735,22732.538466,19721.738642,18699.035806,15734.180645,15060.07028,13682.098477,11645.431541,10253.593702,28244.336936,28354.039583,25057.61353,
9,United States,38710.885442,35427.909964,31831.461594,28401.465176,26549.876022,22310.239369,21491.820177,19441.379908,17374.536484,14426.757715,37329.615914,37718.005367,35081.923084,


There are some simple metrics we can read off the data frame using the following attributes:
* `DataFrame.shape`: returns the number of rows and columns in the data frame in the format (rows, columns)
* `list(DataFrame.columns)`: returns all the column names

_How many rows and columns does our DataFrame have?_

In [8]:
# shape is an attribute of the data frame
df_gdp.shape # returns a tuple (rows, columns)
print("{} rows, {} columns".format(df_gdp.shape[0], df_gdp.shape[1]))

10 rows, 15 columns


_Can you output all the column names?_ 

In [9]:
df_gdp.columns # returns a pandas Index object
print(list(df_gdp.columns))

['countries', '1950', '1955', '1960', '1965', '1970', '1975', '1980', '1985', '1990', '1995', '2000', '2005', '2010', '2015']


## Selecting Rows and Columns

Now, we want to filter rows and columns to calculate statistics on the values and create visualizations later on. There are a couple of ways to select values and sets of values in a DataFrame:

* `DataFrame['columnname']`: selects the column named `columnname` (pay attention to the quotes and square brackets).
* `DataFrame[startrow:endrow]`: selects all rows from position `startrow` up to, but not including, position `endrow`.
* `DataFrame.loc['label']`: selects the row with the label `label`.
* `DataFrame.iloc[...]`: selects rows and columns by integer position.

We will practice each of these individually in the following.
More information here: https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing
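As a quick orientation, here is a minimal sketch of the first two forms on a small hand-typed excerpt of the table (rounded values). The name `sample_df` is an assumption made for this example and is not used elsewhere in the notebook.

```python
import pandas as pd

# Hand-typed excerpt of the GDP table (values rounded); NaN marks missing data.
sample_df = pd.DataFrame(
    {
        "countries": ["Afghanistan", "Brazil", "China", "Germany"],
        "1950": [float("nan"), 4297.82, 1864.10, 25297.39],
        "1955": [float("nan"), 3739.92, 1105.95, 23217.33],
    }
)

column_1955 = sample_df["1955"]  # a string in the brackets selects a column (a Series)
first_rows = sample_df[1:3]      # a slice selects rows at positions 1 and 2; position 3 is excluded

print(type(column_1955))
print(list(first_rows["countries"]))  # ['Brazil', 'China']
```

The same square brackets therefore mean "column" when given a name and "rows" when given a slice.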

### Selecting columns

A single column is selected through `DataFrame['columnname']`. In our example, selecting columns means selecting individual years or ranges of years. 

_Can you select the column for the year 1955?_

In [10]:
# positions of rows and columns in a data frame are zero-based

# df[start:end] with integers selects rows by position (end is excluded)
df_gdp[0:3]

# df["name"] with a string selects a column
df_gdp["1955"]
list(df_gdp["1955"])

[nan,
 3739.9193894285704,
 1105.9525569567099,
 23217.332027013199,
 1679.6643980523299,
 37363.287444285001,
 142.39144785848899,
 1967.5186024261,
 26214.967350107403,
 35427.909964491497]

### Selecting rows
To select a range of rows, use `DataFrame[startRowIndex:endRowIndex]`. The row at `endRowIndex` itself is not included.
_Can you select the rows with indices 1 to 5?_

In [11]:
df_gdp[1:6]

Unnamed: 0,countries,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015
1,Brazil,4297.823854,3739.919389,3693.27582,3279.686396,3584.074016,3245.573152,3154.720552,2363.511202,1613.136923,1581.625423,4716.614125,3976.619168,3696.146772,
2,China,1864.102702,1105.952557,774.884634,475.92887,341.021294,208.170863,149.660068,128.93498,99.080123,72.324927,2426.332466,1464.107627,949.178062,
3,Germany,25297.385393,23217.332027,21502.721444,20685.689979,17636.830634,15810.553193,14545.428257,12711.156762,,,25420.275718,23564.385168,22945.70885,
4,Iran,2125.030252,1679.664398,1484.666023,1445.852456,1241.647502,1477.011852,2204.420456,1793.574657,1117.218268,,,1906.590054,1550.090608,
5,Japan,40837.266644,37363.287444,37518.580858,35465.925444,29064.222359,24444.201743,20680.65741,18398.721184,13726.062655,9157.323309,39971.787453,39295.306204,37291.706158,


The above method selects a set of rows but what if we want one row, say `Brazil`? There are two options. 

Option 1 is to select-by-row-position using `DataFrame.iloc[rownumber]`. To select Brazil, we have to find the row number for Brazil and pass it as a parameter to the `iloc` selector. 

_Can you do it?_ 

In [12]:
df_gdp.iloc[1] # iloc uses square brackets, not round brackets
# df_gdp[1] would raise a KeyError, because pandas looks for a column named 1

countries     Brazil
1950         4297.82
1955         3739.92
1960         3693.28
1965         3279.69
1970         3584.07
1975         3245.57
1980         3154.72
1985         2363.51
1990         1613.14
1995         1581.63
2000         4716.61
2005         3976.62
2010         3696.15
2015             NaN
Name: 1, dtype: object

### Selecting Rows and columns with `iloc[]`. 

`iloc` helps you select both rows and columns by their indices, i.e. their integer positions.

`iloc[ROW_SELECTION_GOES_HERE, COLUMN_SELECTION_GOES_HERE]` takes one or two parameters. The first one (`ROW_SELECTION_GOES_HERE`) specifies the rows you want to select; the second one (`COLUMN_SELECTION_GOES_HERE`) specifies the columns you want to select. Each parameter can take one of the following three forms, independently of the other:

* An **individual value (e.g., 1)**: use this when you want to select a single row or column. E.g. `DataFrame.iloc[2,4]` gives you the value in the column with index 4 in the row with index 2 (two individual values). Since indices start at 0, these are the 5th column and the 3rd row.

* An **enumeration/list (e.g., [0,1,3])**: use this when you have specific rows and/or columns to select, and be careful to put square brackets around your list of numbers. For example, the expression `DataFrame.iloc[[0,2], [1,3]]` returns the values of columns 1 and 3 for rows 0 and 2 (4 values in total).

* **Ranges (e.g., 1:3)**: use this when you want to select a range of rows or columns. The end of a range is not included. For example, the expression `DataFrame.iloc[0:3, 1:3]` returns the values of columns 1 and 2 for rows 0 to 2 (6 values in total). When using ranges, you can leave fields blank, meaning that you refer to the first or last row or column. For example, `DataFrame.iloc[:2, 3:]` returns all columns from column 3 onwards for rows 0 and 1.


Of course, you can mix the above forms for the `ROW_SELECTION_GOES_HERE` parameter and the `COLUMN_SELECTION_GOES_HERE` parameter. E.g. the statement `DataFrame.iloc[2,0:3]` returns columns 0 to 2 for row 2.
 
More information here: https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/
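The three forms, and the fact that the end of a range is excluded, can be checked on a small table. This is a sketch on a hand-typed excerpt with rounded values; `sample_df` is a name assumed for the example.

```python
import pandas as pd

# Hand-typed excerpt of the GDP table (values rounded).
sample_df = pd.DataFrame(
    {
        "countries": ["Afghanistan", "Brazil", "China", "Germany"],
        "1950": [float("nan"), 4297.82, 1864.10, 25297.39],
        "1955": [float("nan"), 3739.92, 1105.95, 23217.33],
        "1960": [float("nan"), 3693.28, 774.88, 21502.72],
    }
)

single_value = sample_df.iloc[2, 1]      # row 2 (China), column 1 ("1950")
listed = sample_df.iloc[[0, 2], [1, 3]]  # rows 0 and 2, columns 1 and 3
ranged = sample_df.iloc[0:3, 1:3]        # rows 0 to 2, columns 1 and 2
mixed = sample_df.iloc[2, 0:3]           # row 2, columns 0 to 2 (a Series)

print(single_value)       # 1864.1
print(listed.shape)       # (2, 2)
print(ranged.shape)       # (3, 2)
print(list(mixed.index))  # ['countries', '1950', '1955']
```

Printing `.shape` is a convenient way to confirm how many rows and columns a selection actually returned.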



_Can you select the values for first 3 years for `Germany` and `Malawi`?_

In [13]:
# iloc[ROW_SELECTION_GOES_HERE,COLUMN_SELECTION_GOES_HERE]
# iloc[index enumeration/list, index range]

df_gdp.iloc[[3, 6], :4]

Unnamed: 0,countries,1950,1955,1960
3,Germany,25297.385393,23217.332027,21502.721444
6,Malawi,157.621368,142.391448,155.373892


### Creating labels for rows

That previous example required you to look up the indices for `Germany` and `Malawi` individually and pass them to `iloc`. This is inconvenient, and for a larger data set it quickly becomes impractical.

Pandas has a selector that allows you to ask for names rather than indices. It is called `loc` (`iloc`, in contrast, stands for integer-location). `loc` allows you to **pass column names and row labels** and can be much more convenient to use.

Now, column names are usually specified in the first row of your CSV file (the table header), which in our case holds the years. We have already used them with `DataFrame['somecolumnname']`.

However, in addition to years in the first row, our data has country names in the very first column (which is the column with the index 0). While Pandas assumes that you have column labels, it does not assume that you have row labels. This is because many tables have a running number in the first column, rather than a name as ours does.

Thus, we need to tell Pandas that our first column should be used as **labels** (which is what Pandas calls them). Labels should be **_unique_**, i.e. no two rows should have the same label; otherwise, selecting by label returns several rows at once. In our case, we have only one row per country and no two countries with the same name. Great, let's move on.

In order to tell Pandas which column we want to use as labels, we use the `DataFrame.set_index(COLUMN_NAME, inplace=False)` method. The `inplace=False` parameter prevents Pandas from modifying the `DataFrame` variable; instead, the method returns a new data frame and leaves the old one untouched. Hence, we create a new data frame here, which we call `countryData` and which is returned by calling `set_index(..)` on our first data frame.

**NOTE:** In your own projects, you may well use `inplace=True`, which is more convenient since you do not have to create a new data frame. However, for the exercises in this notebook, you need to create a new data frame; otherwise the following examples will not work properly.
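That `set_index` hands back a new data frame and leaves the original alone can be seen in a few lines. This is a sketch on a two-row hand-typed excerpt; `sample_df` and `labelled_df` are names assumed for the example.

```python
import pandas as pd

# Two rows typed in by hand (values rounded).
sample_df = pd.DataFrame({"countries": ["Brazil", "China"], "1950": [4297.82, 1864.10]})

# inplace defaults to False, so the result has to be stored in a new variable.
labelled_df = sample_df.set_index("countries")

print(list(sample_df.columns))    # ['countries', '1950']
print(list(labelled_df.columns))  # ['1950']
print(list(labelled_df.index))    # ['Brazil', 'China']
```

In the new data frame, the country names have moved out of the columns and into the row labels, while the original still has both columns and its numeric index.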

_Can you set the country column as labels and save the result in a new data frame called `countryData`?_

In [14]:
# index is the label in the dataframe
# set_index sets the labels of rows to an existing column
countryData = df_gdp.set_index("countries", inplace=False) # assigning labels enables the use of loc[]

_Now, print the first few rows of both data frames (`df_gdp` and `countryData`) to see the difference._

In [15]:
df_gdp[:3]

Unnamed: 0,countries,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015
0,Afghanistan,,,,,,,,,,,,,,
1,Brazil,4297.823854,3739.919389,3693.27582,3279.686396,3584.074016,3245.573152,3154.720552,2363.511202,1613.136923,1581.625423,4716.614125,3976.619168,3696.146772,
2,China,1864.102702,1105.952557,774.884634,475.92887,341.021294,208.170863,149.660068,128.93498,99.080123,72.324927,2426.332466,1464.107627,949.178062,


In [16]:
countryData[:3]

countries,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015
Afghanistan,,,,,,,,,,,,,,
Brazil,4297.823854,3739.919389,3693.27582,3279.686396,3584.074016,3245.573152,3154.720552,2363.511202,1613.136923,1581.625423,4716.614125,3976.619168,3696.146772,
China,1864.102702,1105.952557,774.884634,475.92887,341.021294,208.170863,149.660068,128.93498,99.080123,72.324927,2426.332466,1464.107627,949.178062,


In [17]:
df_gdp.iloc[:3]

Unnamed: 0,countries,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015
0,Afghanistan,,,,,,,,,,,,,,
1,Brazil,4297.823854,3739.919389,3693.27582,3279.686396,3584.074016,3245.573152,3154.720552,2363.511202,1613.136923,1581.625423,4716.614125,3976.619168,3696.146772,
2,China,1864.102702,1105.952557,774.884634,475.92887,341.021294,208.170863,149.660068,128.93498,99.080123,72.324927,2426.332466,1464.107627,949.178062,


In the first printout, the first column of your table should contain numbers (= indices). In the second printout (after calling `set_index(..)`), the first column should contain the country names and the numbers are gone. My (Chrome) browser conveniently renders the entries in the first column in bold.

In the following, we will continue with the `countryData` data frame that has our country names as labels.

### Selecting Rows and columns with `loc[]`. 

Now, we can use `loc` with both row and column labels. `loc[]` works pretty much the same way as `iloc[]`, but instead of integer indices (e.g. `DataFrame.iloc[2,3]`), `loc[]` understands our labels and column names.

The three ways to select rows, and optionally columns, are the same as for `iloc`:
1. **Individual value (e.g. 'Brazil')**: `DataFrame.loc['Brazil']`
2. **Enumeration (e.g. ['Brazil','United Kingdom'])**: `DataFrame.loc[['Brazil', 'United Kingdom']]` (note the double square brackets)
3. **Ranges (e.g. 'Brazil':'United Kingdom')**: `DataFrame.loc['Brazil':'United Kingdom']`. Unlike with `iloc`, the end label of a `loc` range is included.
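The three forms can be tried on a small labelled table. The following is a sketch with hand-typed, rounded values; `sample_df` is a name assumed for the example, and the country names are passed directly as the index instead of going through `set_index`.

```python
import pandas as pd

# Hand-typed excerpt with country names as row labels (values rounded).
sample_df = pd.DataFrame(
    {
        "1950": [4297.82, 1864.10, 25297.39, 157.62],
        "1955": [3739.92, 1105.95, 23217.33, 142.39],
        "1960": [3693.28, 774.88, 21502.72, 155.37],
    },
    index=["Brazil", "China", "Germany", "Malawi"],
)

one_row = sample_df.loc["China"]                 # individual label: a Series
some_rows = sample_df.loc[["Brazil", "Malawi"]]  # list of labels: a DataFrame
row_range = sample_df.loc["Brazil":"Germany"]    # label range: the end label is included
cell = sample_df.loc["Germany", "1955"]          # row label and column name

print(list(some_rows.index))  # ['Brazil', 'Malawi']
print(list(row_range.index))  # ['Brazil', 'China', 'Germany']
print(cell)                   # 23217.33
```

Note that the range returns three rows here, whereas a positional `iloc[0:2]` would stop before the row at position 2.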

In [18]:
countryData.loc[:"China"]

countries,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015
Afghanistan,,,,,,,,,,,,,,
Brazil,4297.823854,3739.919389,3693.27582,3279.686396,3584.074016,3245.573152,3154.720552,2363.511202,1613.136923,1581.625423,4716.614125,3976.619168,3696.146772,
China,1864.102702,1105.952557,774.884634,475.92887,341.021294,208.170863,149.660068,128.93498,99.080123,72.324927,2426.332466,1464.107627,949.178062,


_Can you select all values for `Germany`?_

In [19]:
# after setting labels, we can now use loc[row, column]
countryData.loc["Germany"]

1950    25297.385393
1955    23217.332027
1960    21502.721444
1965    20685.689979
1970    17636.830634
1975    15810.553193
1980    14545.428257
1985    12711.156762
1990             NaN
1995             NaN
2000    25420.275718
2005    23564.385168
2010    22945.708850
2015             NaN
Name: Germany, dtype: float64

In [20]:
type(countryData.loc["Germany"]) # is this an iterable? yes

for item in countryData.loc["Germany"]:
    print(type(item))

<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>


_Can you select the values for `Germany` and `Malawi` for the years 1960 to 1980?_

In [21]:
# countryData.loc[["Germany", "Malawi"], ["1960", "1965", "1970", "1975", "1980"]]
countryData.loc[["Germany", "Malawi"], "1960":"1980"] # Range over columns (string), ":" also can be used to range over rows

countries,1960,1965,1970,1975,1980
Germany,21502.721444,20685.689979,17636.830634,15810.553193,14545.428257
Malawi,155.373892,129.360236,138.371883,147.791025,153.275312


**NOTE**: `loc` takes only labels and column names, no indices.

## Conditional Selection with Pandas and Boolean Operations
One of the most useful tools in pandas is **conditional selection**, i.e. selecting rows based on their values in particular columns. For example, you may want to calculate statistics for high-income countries only. Let's see how that works. `loc` gives us almost all we need.

### Conditions
First, we need to know about **conditions**. A condition performs a test and returns `True` or `False`. The following condition tests if a value in the column `1960` is higher than `1000`.

`df['1960'] > 1000`

Used with `loc`, we can filter all rows which have a value higher than `1000` in column `1960`: 

`df.loc[df['1960'] > 1000]`

The condition inside the brackets is evaluated first: it compares every value in the column `1960` with `1000` and yields one `True` or `False` per row. `loc` then goes through these results. Remember, when you pass a label to `loc`, it looks for the row whose label matches the passed name. Now, it matches a condition instead.

If the condition is true for a row, that row is included in the result.

Conditions are a powerful mechanism. Besides the greater-than operator `>`, they include the following numeric relations:

* **less than**: `<`, e.g. `df.loc[df['1960'] < 1000]`
* **equal to**: `==`, e.g. `df.loc[df['1960'] == 1000]`
* **not equal to**: `!=`, e.g. `df.loc[df['1960'] != 1000]`
* **greater than or equal to**: `>=`, e.g. `df.loc[df['1960'] >= 1000]`
* **less than or equal to**: `<=`, e.g. `df.loc[df['1960'] <= 1000]`
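It can help to look at the condition on its own before handing it to `loc`. The following is a sketch on a hand-typed excerpt of the `1960` column (rounded values); `sample_df` and `mask` are names assumed for the example.

```python
import pandas as pd

# Hand-typed excerpt of the 1960 column (values rounded); Afghanistan has no value.
sample_df = pd.DataFrame(
    {"1960": [float("nan"), 3693.28, 774.88, 21502.72, 155.37]},
    index=["Afghanistan", "Brazil", "China", "Germany", "Malawi"],
)

mask = sample_df["1960"] > 1000  # one True/False per row
selected = sample_df.loc[mask]   # keeps only the rows where the mask is True

print(mask.tolist())         # [False, True, False, True, False]
print(list(selected.index))  # ['Brazil', 'Germany']
```

The row with the missing value gets `False`, because a comparison with `NaN` is never true.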

_Can you filter all rows (countries) with values higher than `10000` in 2010?_

In [22]:
# filter rows when specific columns meet condition

countryData['1960'] > 10000

countries
Afghanistan       False
Brazil            False
China             False
Germany            True
Iran              False
Japan              True
Malawi            False
Russia            False
United Kingdom     True
United States      True
Name: 1960, dtype: bool

In [23]:
# "filter" returns a series of boolean
type(countryData['1960'] > 10000)

pandas.core.series.Series

In [24]:
# which then can be used in loc[] method

countryData.loc[countryData['2010'] > 10000]

countries,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015
Germany,25297.385393,23217.332027,21502.721444,20685.689979,17636.830634,15810.553193,14545.428257,12711.156762,,,25420.275718,23564.385168,22945.70885,
Japan,40837.266644,37363.287444,37518.580858,35465.925444,29064.222359,24444.201743,20680.65741,18398.721184,13726.062655,9157.323309,39971.787453,39295.306204,37291.706158,
United Kingdom,29771.303348,26214.96735,22732.538466,19721.738642,18699.035806,15734.180645,15060.07028,13682.098477,11645.431541,10253.593702,28244.336936,28354.039583,25057.61353,
United States,38710.885442,35427.909964,31831.461594,28401.465176,26549.876022,22310.239369,21491.820177,19441.379908,17374.536484,14426.757715,37329.615914,37718.005367,35081.923084,


In [25]:
countryData.loc[countryData['1960'] > 10000].index # returns the labels of the rows that meet the condition

Index(['Germany', 'Japan', 'United Kingdom', 'United States'], dtype='object', name='countries')

In [26]:
list(countryData.loc[countryData['1960'] > 10000].index)

['Germany', 'Japan', 'United Kingdom', 'United States']

In [27]:
countryData.loc[countryData['1960'] > 10000].index[0] # an Index can also be indexed by position

'Germany'

### Boolean operations

Moreover, we can combine conditions through logical **boolean operations**. Boolean operations are logical constructs that work through the following **boolean operators**: 

1. **AND (`&`)**: selects a row if **all** conditions joined by an `&` sign are true:
    *  e.g. `df.loc[(df['1960'] > 1000) & (df['1960'] < 3000)]` returs all countries with values between 1000 **and** 3000. 
    
* **OR ('|')**: selects a row if **at least one** of the conditions joined by a `|` (say 'pipe') sign are true:
    * e.g. `df.loc[(df['1960'] < 1000) | (df['1960'] > 3000])` returs all countries with values smaller than 1000 **or** with values larger than 3000. 
    
* **NOT(~)**: selects a row if a **condition is not met**. the `~` charater (say 'tilde') has to stand _before_ the condition: 
    *  e.g. `df.loc[~(df['1960'] < 1000)]` returs all countries with values not lower than 1000 (i.e. rows with values higher than 1000, including 1000.) 

You can combine these boolean operations in many ways using parentheses, as in the following example, which returns all countries with values between 1000 and 3000, or countries with values of exactly 10000 in 1960.

`df.loc[((df['1960'] > 1000) & (df['1960'] < 3000)) | (df['1960'] == 10000)]`
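
To see the three operators side by side, here is a minimal sketch on a made-up table. The data frame `df`, its row labels `A` to `D` and its values are assumptions for illustration only; only the column name `'1960'` mirrors the examples above.

```python
import pandas as pd

# Hypothetical toy data, not the tutorial's GDP table
df = pd.DataFrame(
    {"1960": [500, 1500, 2500, 10000]},
    index=["A", "B", "C", "D"],
)

# AND: both conditions must hold
between = df.loc[(df["1960"] > 1000) & (df["1960"] < 3000)]

# OR: at least one condition must hold
outside = df.loc[(df["1960"] < 1000) | (df["1960"] > 3000)]

# NOT: invert a condition
not_small = df.loc[~(df["1960"] < 1000)]

# Combined: between 1000 and 3000, or exactly 10000
combined = df.loc[((df["1960"] > 1000) & (df["1960"] < 3000)) | (df["1960"] == 10000)]

print(list(between.index))    # ['B', 'C']
print(list(outside.index))    # ['A', 'D']
print(list(not_small.index))  # ['B', 'C', 'D']
print(list(combined.index))   # ['B', 'C', 'D']
```

Note that every single condition sits in its own pair of parentheses. In Python, `&` and `|` bind more tightly than `<`, `>` and `==`, so leaving the parentheses out changes the meaning of the expression.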

_Can you get only those countries whose values have increased from below 10000 in 1950 to over 300000 in 2010?_

In [28]:
# Boolean conditions can be combined inside df.loc[]
# comparison against a fixed value
countryData.loc[(countryData["1950"] < 10000) & (countryData["2010"] > 300000)] # each condition needs its own ()

Unnamed: 0_level_0,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015
countries,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1


Being able to filter rows and columns by index (`iloc[]`), name (`loc[]`), and conditional values, we can now proceed with calculating statistical values on rows and columns. In the following, we introduce Numpy, a library for exactly this purpose.  

Write a few more scripts to search for some more values, using conditions:

In [29]:
# get only those countries whose values have decreased from 2005 to 2010
# comparison between two columns
countryData.loc[(countryData["2010"] < countryData["2005"])].index

Index(['Brazil', 'China', 'Germany', 'Iran', 'Japan', 'Russia',
       'United Kingdom', 'United States'],
      dtype='object', name='countries')

# Numpy

Numpy is a python module for all sorts of numerical operations and statistical analysis (http://www.numpy.org). We import numpy as follows:

In [30]:
import numpy as np

The directive `as np` assigns an abbreviation to the module with the official name `numpy` (`np` is the standard abbreviation for numpy used in most other tutorials and references). To call a function from an imported module, you can then use your abbreviation like so:

`np.sum([1, 2, 3])` instead of `numpy.sum([1, 2, 3])` (the `sum(..)` function returns the sum of the elements in the passed list or array). 

## Descriptive Statistics with Numpy

Given the data above, let's calculate some simple descriptive statistics for each country: mean, median, standard deviation, min, max, etc. 
The functions we need are
* `np.mean(myarrayhere)` --- returns the arithmetic mean 
* `np.median(myarrayhere)` --- returns the median (the value half-way through an ordered set https://en.wikipedia.org/wiki/Median)
* `np.std(myarrayhere)` --- returns the standard deviation

More useful numpy functions are found here: https://docs.scipy.org/doc/numpy/reference/routines.statistics.html
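
As a quick check of what these functions return, here is a small sketch on a made-up array (the numbers are an assumption, chosen so that the results are easy to verify by hand):

```python
import numpy as np

values = np.array([2, 4, 4, 4, 5, 5, 7, 9])  # made-up sample

print(np.sum(values))     # 40
print(np.mean(values))    # 5.0
print(np.median(values))  # 4.5
print(np.std(values))     # 2.0

# np.std divides by N by default (population standard deviation).
# Passing ddof=1 divides by N - 1 instead (sample standard deviation).
sample_std = np.std(values, ddof=1)
print(sample_std)
```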


## Simple Array Example

Let's start with a simple example. We create an array of length 10 of some random numbers, using numpy's `rand` function in the `random` package (`np.random.rand(DESIRED_ARRAY_LENGTH_HERE)`). Then print that array.

In [31]:
a = np.random.rand(10)

Now, can you calculate all the values in the above box for these numbers? (mean, median, sum, std) 

In [32]:
print(np.mean(a))
print(np.median(a))
print(np.std(a))

0.478229008143
0.438307872265
0.245415774928


## Data Example

Now we turn back to our country example. Note that numpy wants arrays, not tables. Hence, from our data set we calculate statistics for one row or one column at a time, not for multiple rows or multiple columns at once. 

_Can you calculate the mean value in 1990 across all countries?_ Tip: first, get the country data for 1990 using some of pandas selection functions, then pass them as a parameter to the corresponding numpy function.

In [33]:
# mean vertically (down one column)
print(countryData['1990'])
np.mean(countryData['1990']) # np functions can be applied directly to a data frame column

countries
Afghanistan                NaN
Brazil             1613.136923
China                99.080123
Germany                    NaN
Iran               1117.218268
Japan             13726.062655
Malawi              126.008610
Russia                     NaN
United Kingdom    11645.431541
United States     17374.536484
Name: 1990, dtype: float64


6528.782086357598

The value should be `6528.782086357598`
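
You may have noticed that the 1990 column contains missing values (`NaN`), yet the mean is a regular number. That is because the column is a pandas Series, and pandas skips `NaN` by default when averaging. On a plain numpy array the behaviour is different, as this small sketch with made-up numbers shows:

```python
import numpy as np

column = np.array([1.0, np.nan, 3.0])  # made-up values with one gap

plain_mean = np.mean(column)        # the gap propagates, so the result is nan
skipping_mean = np.nanmean(column)  # ignores the gap

print(plain_mean)     # nan
print(skipping_mean)  # 2.0
```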

_Can you calculate the mean for `Brazil` for all years?_

In [34]:
# mean horizontally (along one row)
print(countryData.loc["Brazil"])
np.mean(countryData.loc["Brazil"])

1950    4297.823854
1955    3739.919389
1960    3693.275820
1965    3279.686396
1970    3584.074016
1975    3245.573152
1980    3154.720552
1985    2363.511202
1990    1613.136923
1995    1581.625423
2000    4716.614125
2005    3976.619168
2010    3696.146772
2015            NaN
Name: Brazil, dtype: float64


3303.2866764551354

The value should be `3303.2866764551354`

Can you calculate the mean for Brazil for the years 1950 to 1970?

In [35]:
np.mean(countryData.loc["Brazil", "1950":"1970"])

3718.9558952249158

In [36]:
np.mean(countryData.loc["Brazil", "1950":"1970"]) # in loc[], a label range is inclusive on both sides

3718.9558952249158
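
The difference between label ranges in `loc[]` and position ranges in `iloc[]` is easy to trip over. Here is a minimal sketch on a one-row table; the values are made up, only the labels resemble our data set:

```python
import pandas as pd

# Hypothetical one-row table
df = pd.DataFrame(
    [[1.0, 2.0, 3.0]],
    index=["Brazil"],
    columns=["1950", "1955", "1960"],
)

by_label = df.loc["Brazil", "1950":"1955"]  # label range: includes '1955'
by_position = df.iloc[0, 0:1]               # position range: excludes position 1

print(list(by_label.index))     # ['1950', '1955']
print(list(by_position.index))  # ['1950']
```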

_Can you calculate the two means for Brazil and for Germany for the two years 1950 and 1970?_

In [37]:
print(np.mean(countryData.loc["Brazil", ["1950", "1970"]]))
print(np.mean(countryData.loc["Germany", ["1950", "1970"]]))

3940.94893523
21467.1080135


In [38]:
np.mean(countryData.loc["Brazil", countryData.columns]) # we can use countryData.columns, for rows use countryData.index
np.mean(countryData.loc["Brazil", countryData.columns != '1950']) # condition

print(type(countryData.loc["Brazil"])) # <class 'pandas.core.series.Series'>

# one way to find mode
# https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list
from collections import Counter

data_for_mode = Counter(countryData.loc["Brazil"])
data_for_mode.most_common(1)[0][0]

        

<class 'pandas.core.series.Series'>


3584.0740164600397

Can you calculate the mode for Brazil for all those years?
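
The mode is the value that occurs most often. One possible way to find it, using `Counter` from Python's standard library, is sketched below on a made-up list. Keep in mind that for measurements such as ours, where nearly every value is unique, the mode is not very informative.

```python
from collections import Counter

values = [3, 1, 3, 2, 3, 2]  # made-up list with repeated values

counts = Counter(values)
# most_common(1) returns a list with one (value, count) pair
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 3 3
```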

### Iterating through a DataFrame. 

Now that we can print values for each row and column individually, you may ask for an automation of that procedure: can we just print all the means for all rows in one statement? With the selection tools used so far we cannot, but we can iterate through the rows using a loop:

`for index, row in countryData.iterrows():
   print(index)`
    
In our case, `index` holds the label of the row and `row` holds the values in that row.

_Can you complete the loop and print all the countries' means?_

In [39]:
# iterating through dataframe <-> iterate rows in the dataframe
for label, row in countryData.iterrows():
    print("{}: {}".format(label, np.mean(row)))

Afghanistan: nan
Brazil: 3303.2866764551354
China: 773.8214749084223
Germany: 20303.406129572628
Iran: 1638.706047734752
Japan: 29478.080681966087
Malawi: 144.518602088105
Russia: 2242.7707869061855
United Kingdom: 20397.765254432183
United States: 28161.221247288784
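
As an alternative to the loop, pandas can compute one mean per row in a single call with `DataFrame.mean(axis=1)`. The sketch below compares both approaches on a made-up two-row table (the labels and values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing value
df = pd.DataFrame(
    {"1950": [1.0, 10.0], "1955": [3.0, np.nan]},
    index=["A", "B"],
)

# One mean per row; missing values are skipped by default
row_means = df.mean(axis=1)

# The same result computed with an explicit loop
loop_means = {label: row.mean() for label, row in df.iterrows()}

print(row_means["A"], row_means["B"])  # 2.0 10.0
print(loop_means)
```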


## More Array Functions with Numpy

Numpy can do more than descriptive statistics. Here is a list of useful functions: 

* `np.size(array)`: returns the length of an array, i.e. how many elements it contains
* `np.sort(array)`: returns a sorted copy of the array
* `np.max(array)`: returns the largest value in the array (`np.min(array)` returns the smallest)
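
A short sketch of these array functions on a made-up array. Note that `np.maximum` takes two arguments and compares them element by element, while `np.max` returns the single largest value of one array:

```python
import numpy as np

a = np.array([3, 1, 2])  # made-up array

print(np.size(a))        # 3
print(np.sort(a))        # [1 2 3]
print(np.max(a))         # 3
print(np.min(a))         # 1
print(np.argmax(a))      # 0  (position of the largest value)
print(np.maximum(a, 2))  # [3 2 2]  (element-wise maximum of a and 2)
```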



Can you output only the countries with the highest and lowest means across all years?

In [40]:
# is there another way to do this?
# d = {}
# for index, row in countryData.iterrows():
#     d[index] = np.mean(countryData.loc[index])

# from operator import itemgetter
# min_key, _ = min(d.items(), key=itemgetter(1))
# max_key, _ = max(d.items(), key=itemgetter(1))

# print(min_key, max_key) # Malawi Japan

##################################################
# import pandas as pd
# import numpy as np

# raw_data = pd.read_csv("gpd_per_capita_new.csv")
# countryData = raw_data.set_index('countries', inplace=False)

# means = []
# for label, row in countryData.iterrows():
#     means.append(np.mean(row))
# print(means)

# https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html
countryData["mean"] = countryData.mean(axis=1)

max_mean = np.max(countryData["mean"])
min_mean = np.min(countryData["mean"])

# use a filter to answer the query
print(countryData.loc[countryData["mean"] == max_mean].index[0]) # .loc[...].index returns the row labels
                                                                 # filter: countryData["mean"] == max_mean
print(countryData.loc[countryData["mean"] == min_mean].index[0])

Japan
Malawi
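
To answer the question "is there another way to do this?": pandas Series offer `idxmax()` and `idxmin()`, which return the label of the largest and smallest value directly. A minimal sketch on a made-up Series of means (labels and numbers are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical means per country, one of them missing
means = pd.Series({"A": 5.0, "B": np.nan, "C": 1.0, "D": 9.0})

# Both functions skip missing values by default
print(means.idxmax())  # D
print(means.idxmin())  # C
```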


In [41]:
countryData

Unnamed: 0_level_0,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015,mean
countries,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Afghanistan,,,,,,,,,,,,,,,
Brazil,4297.823854,3739.919389,3693.27582,3279.686396,3584.074016,3245.573152,3154.720552,2363.511202,1613.136923,1581.625423,4716.614125,3976.619168,3696.146772,,3303.286676
China,1864.102702,1105.952557,774.884634,475.92887,341.021294,208.170863,149.660068,128.93498,99.080123,72.324927,2426.332466,1464.107627,949.178062,,773.821475
Germany,25297.385393,23217.332027,21502.721444,20685.689979,17636.830634,15810.553193,14545.428257,12711.156762,,,25420.275718,23564.385168,22945.70885,,20303.40613
Iran,2125.030252,1679.664398,1484.666023,1445.852456,1241.647502,1477.011852,2204.420456,1793.574657,1117.218268,,,1906.590054,1550.090608,,1638.706048
Japan,40837.266644,37363.287444,37518.580858,35465.925444,29064.222359,24444.201743,20680.65741,18398.721184,13726.062655,9157.323309,39971.787453,39295.306204,37291.706158,,29478.080682
Malawi,157.621368,142.391448,155.373892,129.360236,138.371883,147.791025,153.275312,140.893353,126.00861,101.99653,180.902296,149.48433,155.271544,,144.518602
Russia,2888.847355,1967.518602,1590.6965,2106.223762,,,,,,,2928.005033,2442.962966,1775.141291,,2242.770787
United Kingdom,29771.303348,26214.96735,22732.538466,19721.738642,18699.035806,15734.180645,15060.07028,13682.098477,11645.431541,10253.593702,28244.336936,28354.039583,25057.61353,,20397.765254
United States,38710.885442,35427.909964,31831.461594,28401.465176,26549.876022,22310.239369,21491.820177,19441.379908,17374.536484,14426.757715,37329.615914,37718.005367,35081.923084,,28161.221247


Can you output only the countries with the highest and lowest values in 1980? 

In [42]:
# print(countryData.index) # this is row
# print(countryData.columns) # this is column

print(countryData["1980"])

print(countryData.loc[countryData["1980"] == np.max(countryData["1980"])].index[0])
print(countryData.loc[countryData["1980"] == np.min(countryData["1980"])].index[0])

countries
Afghanistan                NaN
Brazil             3154.720552
China               149.660068
Germany           14545.428257
Iran               2204.420456
Japan             20680.657410
Malawi              153.275312
Russia                     NaN
United Kingdom    15060.070280
United States     21491.820177
Name: 1980, dtype: float64
United States
China


Can you output the mean of the 3 highest values in 1990?

In [43]:
### sort dataframe given a specific column

# https://stackoverflow.com/questions/37787698/how-to-sort-pandas-dataframe-from-one-column
# countryData.sort_values("1990")
# countryData.sort_values("1990")[-4:] # rows with NaN are sorted to the end

# print(countryData.notnull("1990"))
# notnull() takes 1 positional argument but 2 were given

### remove nan data in specific column

# countryData.notnull() # return dataframe of boolean
# countryData.notnull()["1990"] # return series of boolean of column "1990"
# we use a filter to remove countries with NaN values in year 1990
# countryData.loc[countryData.notnull()["1990"]]

# dataframe of 3 countries that have the highest values in 1990
# countryData.loc[countryData.notnull()["1990"]].sort_values("1990")[-3:]
np.mean(countryData.loc[countryData.notnull()["1990"]].sort_values("1990")[-3:]["1990"])

14248.6768933365
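
A shorter route to the same kind of result may be `Series.nlargest()`, which returns the n largest values and leaves out missing ones. The following sketch uses a made-up column; the labels and values are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
col = pd.Series(
    {"A": 4.0, "B": np.nan, "C": 10.0, "D": 7.0, "E": 1.0},
    name="1990",
)

top3 = col.nlargest(3)  # three largest values, NaN is ignored

print(list(top3.index))  # ['C', 'D', 'A']
print(top3.mean())       # 7.0
```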

Which country has the highest increase in values from the first to the last year? 

In [44]:
countryData.loc[countryData["2010"] - countryData["1950"] == np.max(countryData["2010"] - countryData["1950"])].index[0]

'Malawi'

Which countries have higher values in 2010 (not 1990) than in 1960?

In [45]:
list(countryData.loc[countryData["2010"] > countryData["1960"]].index)

['Brazil',
 'China',
 'Germany',
 'Iran',
 'Russia',
 'United Kingdom',
 'United States']

# Writing CSV files 

Eventually, after doing some stats or even after cleaning a large table, you may want to write data back into a csv file for later use. Let's assume that the decimal point in the previous data example had been misplaced by one digit, which we now need to correct. 

First, iterate over all the numerical values and shift the decimal point one digit to the left (e.g. 10.4 becomes 1.04). 

In [46]:
print(converted_data)

converted_data_v2 = [raw_data[0]]
for row in converted_data[1:]:
    converted_row = [row[0]]
    for entry in row[1:]:
        if entry != "":
            converted_row.append(entry / 10)
        else:
            converted_row.append(entry)
    converted_data_v2.append(converted_row)
    
            
print("-----")

print(converted_data_v2)

[['countries', '1950', '1955', '1960', '1965', '1970', '1975', '1980', '1985', '1990', '1995', '2000', '2005', '2010', '2015'], ['Afghanistan', '', '', '', '', '', '', '', '', '', '', '', '', ''], ['Brazil', 4297.82385399312, 3739.91938942857, 3693.27582014833, 3279.68639609452, 3584.07401646004, 3245.57315217104, 3154.72055184122, 2363.5112024022, 1613.13692305826, 1581.62542312949, 4716.61412549824, 3976.61916776916, 3696.14677192256], ['China', 1864.10270222039, 1105.95255695671, 774.884634037432, 475.928869661473, 341.021293511227, 208.170863491313, 149.66006823476, 128.934979893935, 99.0801230616385, 72.3249273149585, 2426.33246633954, 1464.10762700312, 949.178062082992], ['Germany', 25297.3853934236, 23217.3320270132, 21502.7214441946, 20685.689979311, 17636.8306336143, 15810.553192641, 14545.4282572055, 12711.1567623386, '', '', 25420.2757175313, 23564.3851678751, 22945.7088501507], ['Iran', 2125.03025238419, 1679.66439805233, 1484.66602266377, 1445.852455639, 1241.64750201368, 

Next, write all the values (including the first row) into a new csv file. Python writes csv files by opening a file, then creating a csv writer with the `csv.writer()` function, and finally writing a single row into the file using the `writerow()` function:

`with open('eggs.csv', 'w', newline='') as csvfile:  # note the 'w', which opens the file in write mode
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])`
    
There are more ways to write a csv file. See https://docs.python.org/2/library/csv.html for more information.    
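
To see writing and reading work together, here is a small round trip, sketched under the assumption of Python 3. The rows and the file name `example.csv` are made up, and the file is written to a temporary folder so that nothing in your working directory is touched:

```python
import csv
import os
import tempfile

# Made-up rows; note the comma inside the name 'Smith, Jon'
rows = [["countries", "1950"], ["Smith, Jon", "1.5"]]
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# Write all rows; QUOTE_MINIMAL quotes only the values that need it
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)

# Read the file back; the quoted name is still a single column
with open(path, newline="") as f:
    read_back = list(csv.reader(f, delimiter=",", quotechar='"'))

print(read_back)  # [['countries', '1950'], ['Smith, Jon', '1.5']]
```

The csv module hands every value back as a string, so numbers have to be converted again (e.g. with `float()`) before you calculate with them.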


Can you export the data with the corrected values?

In [47]:
# windows specific issue: newline=""
# https://stackoverflow.com/questions/3191528/csv-in-python-adding-an-extra-carriage-return
with open("gdp_changed.csv", "w", newline="") as f:
    csv_writer = csv.writer(f, delimiter=',')
    for row in converted_data_v2:
        csv_writer.writerow(row)

# for index, row in countryData.iterrows():
#     # print(type(item)) # return tuple
#     print(type(index)) # str
#     print(type(row))  # return series

In [48]:
# since we want /10, we can also  use pure dataframe to do this
# and then write into csv

import pandas as pd
import numpy as np
import csv

raw_data = pd.read_csv("gpd_per_capita_new.csv")
data = raw_data.set_index("countries", inplace=False)
data = data / 10

with open("gpd_per_capita_new_changed.csv", "w", newline="") as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerow([data.index.name] + list(data.columns)) # header row, including the name of the index column
    for index, row in data.iterrows():
        writer.writerow([index] + list(row)) # write row as list
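
pandas can also write the file itself through `DataFrame.to_csv()`, which may save you the manual loop. A minimal sketch on a made-up table (labels, values and the file name `scaled.csv` are assumptions), again using a temporary folder:

```python
import os
import tempfile

import pandas as pd

# Hypothetical table with a named index, similar in shape to our data
data = pd.DataFrame(
    {"1950": [10.0, 20.0]},
    index=pd.Index(["A", "B"], name="countries"),
)
data = data / 10  # shift the decimal point one digit to the left

path = os.path.join(tempfile.mkdtemp(), "scaled.csv")
data.to_csv(path)  # writes the header and the index column

# Read the file back to check the result
reloaded = pd.read_csv(path, index_col="countries")
print(reloaded)
```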
        

# That's it!

Congratulations. You have made it. 

This tutorial introduced you to `pandas` and `numpy`. We explained how to use pandas to select rows and columns and numpy to calculate some simple statistics. Numpy is very powerful and you will use it a lot in cases where you do not need `pandas`.

However, to speed up the coding, pandas comes with some built-in functions for the above statistics:

* `DataFrame[COLUMN_NAME].mean()` --- returns the mean of all values in the `COLUMN_NAME` column.

More information on descriptive statistics with `pandas` can be found here: 
https://pandas.pydata.org/pandas-docs/stable/basics.html#descriptive-statistics
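
Here is a brief sketch of some of these built-in functions on a made-up table (the labels and values are assumptions). One detail to keep in mind: pandas' `std()` divides by N - 1 by default, while `np.std()` divides by N, so the two can give different results for the same data.

```python
import pandas as pd

# Hypothetical two-by-two table
df = pd.DataFrame(
    {"1950": [1.0, 3.0], "1955": [5.0, 7.0]},
    index=["A", "B"],
)

col_mean = df["1950"].mean()      # mean of one column
row_mean = df.loc["A"].mean()     # mean of one row
col_median = df["1955"].median()  # median of one column
col_std = df["1950"].std()        # sample standard deviation (ddof=1)
summary = df.describe()           # count, mean, std, min, quartiles, max per column

print(col_mean, row_mean, col_median)  # 2.0 3.0 6.0
print(summary)
```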

Happy weekend. 

In [49]:
print("end")

end


In [50]:
countryData.loc[countryData.notnull()['1950']]

Unnamed: 0_level_0,1950,1955,1960,1965,1970,1975,1980,1985,1990,1995,2000,2005,2010,2015,mean
countries,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
Brazil,4297.823854,3739.919389,3693.27582,3279.686396,3584.074016,3245.573152,3154.720552,2363.511202,1613.136923,1581.625423,4716.614125,3976.619168,3696.146772,,3303.286676
China,1864.102702,1105.952557,774.884634,475.92887,341.021294,208.170863,149.660068,128.93498,99.080123,72.324927,2426.332466,1464.107627,949.178062,,773.821475
Germany,25297.385393,23217.332027,21502.721444,20685.689979,17636.830634,15810.553193,14545.428257,12711.156762,,,25420.275718,23564.385168,22945.70885,,20303.40613
Iran,2125.030252,1679.664398,1484.666023,1445.852456,1241.647502,1477.011852,2204.420456,1793.574657,1117.218268,,,1906.590054,1550.090608,,1638.706048
Japan,40837.266644,37363.287444,37518.580858,35465.925444,29064.222359,24444.201743,20680.65741,18398.721184,13726.062655,9157.323309,39971.787453,39295.306204,37291.706158,,29478.080682
Malawi,157.621368,142.391448,155.373892,129.360236,138.371883,147.791025,153.275312,140.893353,126.00861,101.99653,180.902296,149.48433,155.271544,,144.518602
Russia,2888.847355,1967.518602,1590.6965,2106.223762,,,,,,,2928.005033,2442.962966,1775.141291,,2242.770787
United Kingdom,29771.303348,26214.96735,22732.538466,19721.738642,18699.035806,15734.180645,15060.07028,13682.098477,11645.431541,10253.593702,28244.336936,28354.039583,25057.61353,,20397.765254
United States,38710.885442,35427.909964,31831.461594,28401.465176,26549.876022,22310.239369,21491.820177,19441.379908,17374.536484,14426.757715,37329.615914,37718.005367,35081.923084,,28161.221247
