# Part 4. Analyzing Data from Multiple Files
Adapted from [Programming with Python](http://swcarpentry.github.io/python-novice-inflammation/), [copyright © Software Carpentry](http://swcarpentry.github.io/python-novice-inflammation/license/), under the Creative Commons license [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/).

### Questions
* How can I do the same operations on many different files?

### Objectives
* Use a library function to get a list of filenames that match a wildcard pattern.
* Write a `for` loop to process multiple files.

We now have almost everything we need to process all our data files.
The only thing that's missing is a library with a rather unpleasant name:
```Python
import glob
```

The `glob` library contains a function, also called `glob`,
that finds files whose names match a pattern.
We provide those patterns as strings:
the character `*` matches zero or more characters,
while `?` matches any one character.
We can use this to get the names of all the Jupyter notebook (`.ipynb`) files in the current directory.
Importing the function with `from glob import glob` lets us call it as `glob(...)` rather than `glob.glob(...)`:

```Python
from glob import glob

glob('*.ipynb')
```
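
To see how `*` and `?` differ, here is a minimal sketch that creates a few empty files in a temporary directory and matches them against two patterns. The filenames are hypothetical, made up only for this illustration; `glob.escape` is used so that any wildcard characters in the temporary directory's own path are treated literally.

```Python
import os
import tempfile
from glob import escape, glob

# hypothetical filenames, created empty just so glob has something to match
names = ['inflammation-01.csv', 'inflammation-02.csv', 'inflammation-10.csv', 'small-01.csv']

with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        open(os.path.join(tmp, name), 'w').close()

    # escape() makes any *, ? or [ in the temporary path match literally
    prefix = escape(tmp)

    # '*' matches zero or more characters: every inflammation file
    star = sorted(os.path.basename(p) for p in glob(os.path.join(prefix, 'inflammation*.csv')))

    # '?' matches exactly one character, so 'inflammation-10.csv' is left out
    question = sorted(os.path.basename(p) for p in glob(os.path.join(prefix, 'inflammation-0?.csv')))

print(star)      # ['inflammation-01.csv', 'inflammation-02.csv', 'inflammation-10.csv']
print(question)  # ['inflammation-01.csv', 'inflammation-02.csv']
```

Neither pattern matches `small-01.csv`, because both require the name to start with `inflammation`.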

As this example shows,
`glob.glob`'s result is a list of strings,
which means we can loop over it
to do something with each filename in turn.
In our case,
the "something" we want to do is generate a set of plots for each file in our inflammation dataset.
Let's test this by analyzing the first three files in the list:
```Python
# glob returns names in arbitrary order, so sort them before taking "the first three"
filenames = sorted(glob('../data/inflammation*.csv'))
filenames = filenames[0:3]
```
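
`glob` makes no promise about the order of the names it returns (it depends on the file system), so "the first three files" is only well defined after sorting. Plain `sorted` compares names character by character, which puts `inflammation10.csv` before `inflammation2.csv` when the numbers are not zero-padded. The sketch below uses hypothetical filenames and a hypothetical helper, `natural_key`, written just for this example, to sort embedded numbers by their numeric value instead:

```Python
import re


def natural_key(filename):
    """Hypothetical helper: split a name into text and number chunks so numbers compare by value."""
    return [int(chunk) if chunk.isdigit() else chunk for chunk in re.split(r'(\d+)', filename)]


# hypothetical names whose numbers are NOT zero-padded
names = ['inflammation10.csv', 'inflammation2.csv', 'inflammation1.csv']

plain = sorted(names)
natural = sorted(names, key=natural_key)

print(plain)    # ['inflammation1.csv', 'inflammation10.csv', 'inflammation2.csv']
print(natural)  # ['inflammation1.csv', 'inflammation2.csv', 'inflammation10.csv']
```

With zero-padded names such as `inflammation-01.csv`, plain `sorted` already gives numeric order, so the extra key is only needed when the numbers have different widths.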

```Python
import matplotlib.pyplot as plt
import numpy

for f in filenames:
    print(f)

    # load one file: rows are patients, columns are days
    data = numpy.loadtxt(fname=f, delimiter=',')

    fig = plt.figure(figsize=(10.0, 3.0))

    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)

    # axis=0 collapses the patient rows, giving one value per day
    axes1.set_ylabel('average')
    axes1.plot(data.mean(axis=0))

    axes2.set_ylabel('max')
    axes2.plot(data.max(axis=0))

    axes3.set_ylabel('min')
    axes3.plot(data.min(axis=0))

    fig.tight_layout()
    plt.show()
```
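
If you want the numbers behind the plots rather than the figures themselves, the same loop can collect each file's per-day statistics in a dictionary. This self-contained sketch writes two small CSV files with made-up values (hypothetical names `trial-1.csv` and `trial-2.csv`) to a temporary directory, so it runs without the inflammation data or a display:

```Python
import os
import tempfile
from glob import escape, glob

import numpy

summaries = {}
with tempfile.TemporaryDirectory() as tmp:
    # made-up data: rows are patients, columns are days
    numpy.savetxt(os.path.join(tmp, 'trial-1.csv'), [[0, 1, 2], [2, 3, 4]], delimiter=',')
    numpy.savetxt(os.path.join(tmp, 'trial-2.csv'), [[1, 1, 1], [1, 5, 1]], delimiter=',')

    for f in sorted(glob(os.path.join(escape(tmp), 'trial-*.csv'))):
        data = numpy.loadtxt(fname=f, delimiter=',')
        # axis=0 collapses the patient rows, leaving one value per day
        summaries[os.path.basename(f)] = {
            'average': data.mean(axis=0),
            'max': data.max(axis=0),
            'min': data.min(axis=0),
        }

for name, stats in summaries.items():
    print(name, 'average:', stats['average'], 'max:', stats['max'], 'min:', stats['min'])
```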


Sure enough,
the maxima of the first two datasets show the same ramp we saw earlier,
and their minima show the same staircase structure.
The third dataset reveals a different situation:
its maxima are a bit less regular, but its minima are consistently zero.
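
The observation that a dataset's minima are "consistently zero" can also be checked in code rather than by eye. A minimal sketch, using small made-up arrays in place of the real files and a hypothetical helper, `minima_all_zero`:

```Python
import numpy


def minima_all_zero(data):
    """Hypothetical helper: True if, on every day (column), the smallest value across patients is 0."""
    return bool(numpy.all(data.min(axis=0) == 0))


# made-up arrays: rows are patients, columns are days
staircase = numpy.array([[0, 1, 1, 2], [1, 1, 2, 2]])
flat_zero = numpy.array([[0, 3, 0, 5], [2, 0, 4, 0]])

print(minima_all_zero(staircase))  # False
print(minima_all_zero(flat_zero))  # True
```

Inside the loop over `filenames`, calling such a check on each loaded `data` array would flag files like the third dataset without having to inspect every plot.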

## Exercises

### Plotting Differences

Plot the difference between the average of the first dataset and the average of the second dataset, i.e., the difference between the leftmost plots of the first two figures.

```Python
from glob import glob

import matplotlib.pyplot
import numpy

# use glob to get all the csv files from '../data/' into a list;
# sorted() orders the names so that indexes 0 and 1 pick the first two datasets
filenames = sorted(glob('../data/inflammation*.csv'))

# use numpy.loadtxt to load the data, indexing into the list for each filename;
# the values in each file are separated by commas
data0 = numpy.loadtxt(fname=filenames[0], delimiter=',')
data1 = numpy.loadtxt(fname=filenames[1], delimiter=',')

# make the figure
fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

matplotlib.pyplot.ylabel('Difference in average')
matplotlib.pyplot.plot(data0.mean(axis=0) - data1.mean(axis=0))

fig.tight_layout()
matplotlib.pyplot.show()
```
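
To check what the plotted curve represents, here is the same calculation on two tiny made-up arrays whose column means are easy to work out by hand. `mean(axis=0)` averages each column (each day) over the rows (patients), and subtracting two NumPy arrays of the same shape works element by element:

```Python
import numpy

# made-up data: 2 patients (rows) x 3 days (columns)
data0 = numpy.array([[0.0, 2.0, 4.0],
                     [2.0, 4.0, 6.0]])
data1 = numpy.array([[1.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0]])

# per-day averages are [1, 3, 5] and [1, 1, 1]
difference = data0.mean(axis=0) - data1.mean(axis=0)
print(difference)  # [0. 2. 4.]
```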

### Generate Composite Statistics

Use each of the files once to generate a composite dataset in which each value is the average of the corresponding values in all the files.

Then use pyplot to plot the average, max, and min of that composite dataset across all patients.

```Python
from glob import glob

import matplotlib.pyplot
import numpy

filenames = glob('../data/inflammation*.csv')
composite_data = numpy.zeros((60, 40))  # each file holds 60 patients x 40 days
for f in filenames:
    # load each file's data into a variable named data, as before
    data = numpy.loadtxt(fname=f, delimiter=',')

    # add it to the running total; + on numpy arrays is elementwise
    composite_data = composite_data + data

# and then divide the composite_data by the number of files
composite_data /= len(filenames)

fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

axes1.set_ylabel('average')
axes1.plot(numpy.mean(composite_data, axis=0))

axes2.set_ylabel('max')
axes2.plot(numpy.max(composite_data, axis=0))

axes3.set_ylabel('min')
axes3.plot(numpy.min(composite_data, axis=0))

fig.tight_layout()

matplotlib.pyplot.show()
```
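
The running-sum approach hard-codes the shape `(60, 40)`. A sketch of an alternative, assuming every file has the same shape: stack the loaded arrays along a new first axis with `numpy.stack` and average over that axis, so the shape comes from the data itself. Small made-up arrays stand in for the loaded files here, and the running-sum version is included to confirm that both give the same composite:

```Python
import numpy

# made-up stand-ins for three loaded files, each 2 patients x 3 days
datasets = [
    numpy.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]),
    numpy.array([[2.0, 1.0, 0.0], [1.0, 0.0, 1.0]]),
    numpy.array([[1.0, 1.0, 1.0], [2.0, 2.0, 0.0]]),
]

# approach from the exercise: running elementwise sum, then divide by the count
running = numpy.zeros(datasets[0].shape)
for data in datasets:
    running = running + data
running /= len(datasets)

# alternative: stack into shape (files, patients, days), then average over files
stacked = numpy.stack(datasets)
composite = stacked.mean(axis=0)

print(stacked.shape)                       # (3, 2, 3)
print(composite)                           # [[1. 1. 1.]
                                           #  [2. 2. 2.]]
print(numpy.allclose(running, composite))  # True
```

Stacking keeps every file in memory at once, which is fine for a dozen small CSV files; the running sum only ever holds one file plus the total.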

## Key Points
* Use `glob.glob(pattern)` to create a list of files whose names match a pattern.
* Use `*` in a pattern to match zero or more characters, and `?` to match any single character.