# Plotting and Programming in Python
## Looping Over Data Sets
Questions
* How can I process many data sets with a single command?

Objectives
* Be able to read and write globbing expressions that match sets of files.
* Use glob to create lists of files.
* Write for loops to perform operations on files given their names in a list.

### Use a `for` loop to process files given a list of their names

In [None]:
import pandas

In [None]:
for filename in ['../data/gapminder_gdp_africa.csv', '../data/gapminder_gdp_asia.csv']:
    data = pandas.read_csv(filename, index_col='country')
    print(filename, data.min())

### Use `glob.glob` to find sets of files whose names match a pattern
* In Unix, the term "globbing" means "matching a set of files with a pattern".
* The most common patterns are:
    * `*` meaning "match zero or more characters"
    * `?` meaning "match exactly one character"
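
To make the difference between the two wildcards concrete, here is a minimal sketch that runs without the lesson's data. It creates a temporary directory with a few empty files and globs them. The file names (`gdp_asia.csv`, `run1.txt`, and so on) are hypothetical and chosen only for illustration.

```python
import glob
import os
import tempfile

# Build a throwaway directory with hypothetical file names (not the lesson's data).
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ['gdp_asia.csv', 'gdp_africa.csv', 'run1.txt', 'run2.txt', 'run10.txt']:
        open(os.path.join(tmpdir, name), 'w').close()

    # '*' matches zero or more characters: every name ending in .csv
    csv_files = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(tmpdir, '*.csv')))

    # '?' matches exactly one character: run1.txt and run2.txt, but not run10.txt
    single_digit_runs = sorted(os.path.basename(p)
                               for p in glob.glob(os.path.join(tmpdir, 'run?.txt')))

print(csv_files)
print(single_digit_runs)
```

The results are wrapped in `sorted` because `glob.glob` makes no promise about the order of the names it returns.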

In [None]:
import glob

In [None]:
print('all csv files in data directory:', glob.glob('../data/*.csv'))

In [None]:
print('all PDB files:', glob.glob('*.pdb'))

### Use `glob` and `for` to process batches of files

In [None]:
for filename in sorted(glob.glob('../data/gapminder_*.csv')):
    data = pandas.read_csv(filename)
    print(filename, data['gdpPercap_1952'].min())

### Exercise - Minimum File Size
Write a program that prints the number of records in the file that has the fewest records. One possible solution is shown below.

In [None]:
fewest = float('Inf')
for filename in glob.glob('../data/*.csv'):
    dataframe = pandas.read_csv(filename)
    fewest = min(fewest, dataframe.shape[0])
print('smallest file has', fewest, 'records')

### Exercise - Comparing Data
Write a program that reads in the regional data sets and plots the average Gapminder GDP per capita for each region over time in a single chart.

In [None]:
import matplotlib.pyplot as plt

In [None]:
fig, ax = plt.subplots(1, 1)
for filename in sorted(glob.glob('../data/gapminder_gdp*.csv')):
    dataframe = pandas.read_csv(filename, index_col='country')
    # Extract the region from the file name, which is expected to have
    # the form '../data/gapminder_gdp_<region>.csv'.
    region = filename.split('_')[2][:-4]
    # Average over countries (rows) to get one value per year column,
    # then draw that line on the shared axes.
    dataframe.mean().plot(ax=ax, label=region)
ax.legend()
plt.show()
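
The expression `filename.split('_')[2][:-4]` in the solution above relies on the path containing exactly two underscores before the region and on a four-character extension. As a hedged alternative, the sketch below uses `pathlib` from the standard library to take whatever follows the last underscore in the file's base name. The helper name `region_from_filename` and the second example path are hypothetical, introduced only for illustration.

```python
from pathlib import Path


def region_from_filename(filename):
    """Return the text after the last underscore in the file's base name."""
    # Path(...).stem drops the directory and the final extension:
    # '../data/gapminder_gdp_africa.csv' -> 'gapminder_gdp_africa'
    return Path(filename).stem.split('_')[-1]


print(region_from_filename('../data/gapminder_gdp_africa.csv'))
# A directory name containing an underscore does not affect the result.
print(region_from_filename('../my_project/data/gapminder_gdp_asia.csv'))
```

With the second path, the original `split('_')[2][:-4]` expression would return an empty string, because the underscore in `my_project` shifts the pieces produced by `split`.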