# Review

1. Import `pandas` and load the file called `gapminder_gdp_oceania.csv`.

1. Re-import the data, this time setting the `index_col` argument to the `country` column.

1. Describe the variables and their data types.
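A minimal sketch of these three steps follows. It assumes a small in-memory CSV (fictional countries, invented numbers, not real Gapminder values) in place of `gapminder_gdp_oceania.csv`, so it runs without the data files:

```python
import io

import pandas as pd

# Hypothetical stand-in for a Gapminder file; every value here is invented.
csv_text = """country,gdpPercap_1952,gdpPercap_1957
Atlantis,1000.5,1100.25
Lemuria,2000.0,2150.75
"""

# Without index_col, 'country' is an ordinary column and rows are labelled 0, 1, ...
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col='country', the country names become the row labels.
data = pd.read_csv(io.StringIO(csv_text), index_col='country')

print(plain.columns.tolist())
print(data.index.tolist())
print(data.dtypes)      # data type of each remaining column
data.info()             # column names, non-null counts and dtypes
print(data.describe())  # count, mean, std, min, quartiles, max
```

Passing a file path instead of the `io.StringIO` object should behave the same way.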

- Do the same for the file `gapminder_gdp_americas.csv`.

- Use `.iloc` to select the value in position `[0, 0]`. What does this value correspond to?

- Use `.loc` to select the row labelled `Albania`.

- Use `.loc` to select the `gdpPercap_1952` column for every row.

- What does the following line of code do to the dataframe `data`?

```python 
print(data.loc['Italy':'Poland', 'gdpPercap_1962':'gdpPercap_1972'])
```
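To experiment with `.iloc`, `.loc` and label-based slicing before touching the real data, here is a sketch on a tiny hypothetical DataFrame (fictional country names, invented values). Note that, unlike positional slicing, a `.loc` slice includes both its start and end labels:

```python
import pandas as pd

# Fictional countries and invented numbers, purely for illustration.
data = pd.DataFrame(
    {
        'gdpPercap_1962': [1.0, 2.0, 3.0, 4.0],
        'gdpPercap_1967': [5.0, 6.0, 7.0, 8.0],
        'gdpPercap_1972': [9.0, 10.0, 11.0, 12.0],
    },
    index=pd.Index(['Atlantis', 'Lemuria', 'Mu', 'Thule'], name='country'),
)

first_value = data.iloc[0, 0]               # row 0, column 0, by position
one_row = data.loc['Atlantis']              # one row, returned as a Series
col_1962 = data.loc[:, 'gdpPercap_1962']    # one column, every row

# Label slices include BOTH endpoints on each axis.
block = data.loc['Lemuria':'Mu', 'gdpPercap_1962':'gdpPercap_1967']

# Position slices exclude the stop position, as in ordinary Python slicing.
same_block = data.iloc[1:3, 0:2]

print(first_value)
print(block)
print(block.max())  # column-wise maximum of the slice
```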

- Apply the `max()` method to the slice produced by the code above.

- Select the rows of the dataframe where `gdpPercap_1972` is greater than 13,000. How many countries are there?

- Are there any missing values? If so, where?

- What does the following line of code do?

```python
mask_higher = data.apply(lambda x: x > x.mean())
```

- How about this one?

```python
wealth_score = mask_higher.aggregate('sum', axis=1) / len(data.columns)
```

- And finally:

```python
data.groupby(wealth_score).sum()
```
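One way to check your answers to the last three questions is to run the same lines on a tiny frame whose values you can verify by hand. The sketch below assumes four fictional countries and two invented columns; it is an illustration, not the real data:

```python
import pandas as pd

# Invented values. Column means: 1952 -> 3.0, 2007 -> 5.0.
data = pd.DataFrame(
    {'gdpPercap_1952': [1.0, 2.0, 3.0, 6.0], 'gdpPercap_2007': [2.0, 8.0, 3.0, 7.0]},
    index=pd.Index(['Atlantis', 'Lemuria', 'Mu', 'Thule'], name='country'),
)

# True where a value is above its own column's mean.
mask_higher = data.apply(lambda x: x > x.mean())

# Fraction of columns in which each country is above average.
wealth_score = mask_higher.aggregate('sum', axis=1) / len(data.columns)

# Sum the original values within each wealth_score group.
grouped = data.groupby(wealth_score).sum()

print(mask_higher)
print(wealth_score)
print(grouped)
```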

- Write an expression to find the GDP per capita of Serbia in 2007.
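The single-value lookup above and the earlier filtering and missing-value questions rely on a few indexing patterns. A sketch on a hypothetical frame (fictional countries, invented numbers, one deliberately missing value):

```python
import numpy as np
import pandas as pd

# Invented data with one NaN to show how missing values behave.
data = pd.DataFrame(
    {
        'gdpPercap_1972': [12000.0, 14000.0, np.nan],
        'gdpPercap_2007': [20000.0, 25000.0, 30000.0],
    },
    index=pd.Index(['Atlantis', 'Lemuria', 'Mu'], name='country'),
)

# One value: row label first, then column label.
one_value = data.loc['Lemuria', 'gdpPercap_2007']

# Boolean filtering keeps only the rows where the condition is True.
# Comparisons with NaN are False, so 'Mu' is never selected here.
rich = data[data['gdpPercap_1972'] > 13000]
print(len(rich))

# Count missing values per column, then show rows containing any NaN.
print(data.isnull().sum())
print(data[data.isnull().any(axis=1)])
```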


- What does the following code do?  Explain each line.

```python
import pandas

first = pandas.read_csv('data/gapminder_all.csv', index_col='country')
second = first[first['continent'] == 'Americas']
third = second.drop('Puerto Rico')
fourth = third.drop('continent', axis=1)
fourth.to_csv('result.csv')
```
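To reason about each line, it can help to watch the shape of the data change. The sketch below runs the same four steps on an invented three-row stand-in for `gapminder_all.csv` (values made up, one value column only) and writes the result to a temporary directory rather than the working directory:

```python
import io
import os
import tempfile

import pandas as pd

# Invented stand-in for gapminder_all.csv.
csv_text = """country,continent,gdpPercap_1952
Atlantis,Americas,100.0
Puerto Rico,Americas,200.0
Lemuria,Europe,300.0
"""

first = pd.read_csv(io.StringIO(csv_text), index_col='country')
second = first[first['continent'] == 'Americas']  # keep rows matching a condition
third = second.drop('Puerto Rico')                # drop a row by its index label
fourth = third.drop('continent', axis=1)          # drop a column

for name, frame in [('first', first), ('second', second),
                    ('third', third), ('fourth', fourth)]:
    print(name, frame.shape)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'result.csv')
    fourth.to_csv(out_path)  # the index ('country') is written as the first column
    with open(out_path) as fh:
        written = fh.read()

print(written)
```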

Write an expression to find the following:

1. GDP per capita for all countries in 1982.
2. GDP per capita for Denmark for all years.
3. GDP per capita for all countries for years after 1985.
4. GDP per capita for each country in 2007 as a multiple of GDP per capita for that country in 1952.
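The tools these four expressions need are shown below on a tiny hypothetical frame, using different labels and years so that the exercise itself stays open. The countries are fictional and the numbers invented:

```python
import pandas as pd

data = pd.DataFrame(
    {
        'gdpPercap_1952': [100.0, 400.0],
        'gdpPercap_1957': [150.0, 500.0],
        'gdpPercap_1962': [300.0, 800.0],
    },
    index=pd.Index(['Atlantis', 'Lemuria'], name='country'),
)

one_year = data['gdpPercap_1957']            # one column, all rows
one_country = data.loc['Lemuria']            # one row, all columns
from_1957 = data.loc[:, 'gdpPercap_1957':]   # open-ended label slice of columns

# Alternative: choose columns by the year encoded in their names.
after_1955 = [c for c in data.columns if int(c.split('_')[1]) > 1955]

# Element-wise division of two columns gives one ratio per country.
growth = data['gdpPercap_1962'] / data['gdpPercap_1952']
print(growth)
```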

### Loops and Data Sets

```python
import pandas as pd

for file in ['data/gap_data/gapminder_gdp_africa.csv', 'data/gap_data/gapminder_gdp_asia.csv']:
    data = pd.read_csv(file, index_col='country')
    print(file, data.min())
```

```
data/gap_data/gapminder_gdp_africa.csv gdpPercap_1952    298.846212
gdpPercap_1957    335.997115
gdpPercap_1962    355.203227
gdpPercap_1967    412.977514
gdpPercap_1972    464.099504
gdpPercap_1977    502.319733
gdpPercap_1982    462.211415
gdpPercap_1987    389.876185
gdpPercap_1992    410.896824
gdpPercap_1997    312.188423
gdpPercap_2002    241.165877
gdpPercap_2007    277.551859
dtype: float64
data/gap_data/gapminder_gdp_asia.csv gdpPercap_1952    331.0
gdpPercap_1957    350.0
gdpPercap_1962    388.0
gdpPercap_1967    349.0
gdpPercap_1972    357.0
gdpPercap_1977    371.0
gdpPercap_1982    424.0
gdpPercap_1987    385.0
gdpPercap_1992    347.0
gdpPercap_1997    415.0
gdpPercap_2002    611.0
gdpPercap_2007    944.0
dtype: float64
```
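Printing inside the loop shows each result but does not keep it. A sketch of storing one summary value per file instead, assuming two small invented files written to a temporary directory (the names `gdp_north.csv` and `gdp_south.csv` and their contents are hypothetical):

```python
import os
import tempfile

import pandas as pd

# Hypothetical files standing in for the Gapminder regional files.
contents = {
    'gdp_north.csv': 'country,gdpPercap_1952\nAtlantis,500.0\nLemuria,300.0\n',
    'gdp_south.csv': 'country,gdpPercap_1952\nMu,800.0\nThule,900.0\n',
}

minimums = {}
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in contents.items():
        path = os.path.join(tmp, name)
        with open(path, 'w') as fh:
            fh.write(text)
        paths.append(path)

    # Same loop shape as above: read each file, then summarise it.
    for path in paths:
        data = pd.read_csv(path, index_col='country')
        minimums[os.path.basename(path)] = float(data['gdpPercap_1952'].min())

print(minimums)  # {'gdp_north.csv': 300.0, 'gdp_south.csv': 800.0}
```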


### Using `glob.glob` to load files

In Unix, the term “globbing” means “matching a set of files with a pattern”.
The most common patterns are:
- `*` meaning “match zero or more characters”
- `?` meaning “match exactly one character”

Python contains the `glob` library to provide pattern-matching functionality.
The `glob` library contains a function, also called `glob`, to match file patterns. For example, `glob.glob('*.txt')` matches all files in the current directory whose names end with `.txt`.
The result is a (possibly empty) list of character strings.
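The two wildcards can be tried safely in a throwaway directory. This sketch creates a few empty files with hypothetical names in a temporary directory, escapes the directory path with `glob.escape` so only the file-name part acts as a pattern, and sorts the results:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a few empty files with hypothetical names.
    for name in ['a.txt', 'ab.txt', 'notes.md']:
        open(os.path.join(tmp, name), 'w').close()

    base = glob.escape(tmp)  # treat the directory path literally

    # '*' matches zero or more characters; '?' matches exactly one.
    star = sorted(os.path.basename(p) for p in glob.glob(os.path.join(base, '*.txt')))
    question = sorted(os.path.basename(p) for p in glob.glob(os.path.join(base, '?.txt')))
    nothing = glob.glob(os.path.join(base, '*.csv'))

print(star)      # ['a.txt', 'ab.txt']
print(question)  # ['a.txt']
print(nothing)   # [] : no match gives an empty list, not an error
```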

```python
import glob
print('all csv files in data directory:', glob.glob('data/gap_data/*.csv'), '\n')
```

```
all csv files in data directory: ['data/gap_data/gapminder_all.csv', 'data/gap_data/gapminder_gdp_africa.csv', 'data/gap_data/gapminder_gdp_americas.csv', 'data/gap_data/gapminder_gdp_asia.csv', 'data/gap_data/gapminder_gdp_europe.csv', 'data/gap_data/gapminder_gdp_oceania.csv']
```

```python
for filename in glob.glob('data/gap_data/gapminder_*.csv'):
    data = pd.read_csv(filename)
    print(filename, data['gdpPercap_1952'].min())
```

```
data/gap_data/gapminder_all.csv 298.8462121
data/gap_data/gapminder_gdp_africa.csv 298.8462121
data/gap_data/gapminder_gdp_americas.csv 1397.7171369999999
data/gap_data/gapminder_gdp_asia.csv 331.0
data/gap_data/gapminder_gdp_europe.csv 973.5331947999999
data/gap_data/gapminder_gdp_oceania.csv 10039.595640000001
```
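Two details are worth checking in the loop above: `glob` makes no promise about the order of its results (it depends on the file system), and `gapminder_*.csv` also picks up `gapminder_all.csv`. A sketch using empty placeholder files in a temporary directory, with `sorted()` for a reproducible order and a narrower pattern to exclude the combined file:

```python
import glob
import os
import tempfile

names = ['gapminder_all.csv', 'gapminder_gdp_africa.csv', 'gapminder_gdp_asia.csv']

with tempfile.TemporaryDirectory() as tmp:
    base = glob.escape(tmp)
    for name in names:
        open(os.path.join(tmp, name), 'w').close()  # empty placeholder files

    # sorted() makes the order reproducible; glob itself does not sort.
    broad = sorted(os.path.basename(p)
                   for p in glob.glob(os.path.join(base, 'gapminder_*.csv')))
    narrow = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(base, 'gapminder_gdp_*.csv')))

print(broad)
print(narrow)  # ['gapminder_gdp_africa.csv', 'gapminder_gdp_asia.csv']
```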


**PROBLEM**


Which of these files is not matched by the expression `glob.glob('data/gap_data/*as*.csv')`?

1. `data/gap_data/gapminder_gdp_africa.csv`
2. `data/gap_data/gapminder_gdp_americas.csv`
3. `data/gap_data/gapminder_gdp_asia.csv`
4. 1 and 2 are not matched.

### Writing Functions

```python
def hi_there(name):
    print('Hi there', name)
```

```python
hi_there('Steve')
```

```
Hi there Steve
```
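A function that only prints gives nothing back to its caller: a Python function with no `return` statement returns `None`. A short check:

```python
def hi_there(name):
    print('Hi there', name)

# The greeting is printed, but the call itself evaluates to None.
result = hi_there('Steve')
print(result)  # None
```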


Using `return` in functions:

Use `return ...` to give a value back to the caller. A `return` statement may occur anywhere in the function, but functions are easier to understand if `return` occurs:
- At the start, to handle special cases.
- At the very end, with a final result.

```python
def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)
```
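Because `average` returns `None` for the special case of an empty list, the caller can test for it before using the result. A short usage sketch:

```python
def average(values):
    if len(values) == 0:
        return None  # special case handled at the start
    return sum(values) / len(values)  # final result at the end

for sample in ([1, 2, 6], []):
    result = average(sample)
    if result is None:
        print('no values to average')
    else:
        print('average is', result)  # average is 3.0
```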

**PROBLEM**


Fill in the blanks to create a function that takes a single filename as an argument, loads the data in the file named by the argument, and returns the minimum value in that data.

```python
import pandas

def min_in_data(____):
    data = ____
    return ____
```

**PROBLEM**

Assume the code below has been executed:

```python
import pandas as pd

df = pd.read_csv('data/gap_data/gapminder_gdp_asia.csv', index_col=0)
japan = df.loc['Japan']
```

- Complete the statements below to obtain the average GDP per capita for Japan across the years reported for the 1980s.

```python
year = 1983
gdp_decade = 'gdpPercap_' + str(year // ____)
avg = (japan.loc[gdp_decade + ___] + japan.loc[gdp_decade + ___]) / 2
```


- Abstract the code above into a single function.

```python
def avg_gdp_in_decade(country, continent, year):
    df = pandas.read_csv('data/gapminder_gdp_'+___+'.csv',delimiter=',',index_col=0)
    ____
    ____
    ____
    return avg
```