# Looping

## Overview
**Teaching:** 5 min

**Exercises:** 10 min

### Questions
How can I process many data sets with a single command?

### Objectives
- Be able to read and write globbing expressions that match sets of files.

- Use glob to create lists of files.

- Write for loops to perform operations on files given their names in a list.

## Use a `for` loop to process files given a list of their names.

- A filename is a character string.
- Lists can contain character strings, so a list can hold the names of the files we want to process.

In [None]:
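import pandas as pd

# Loop over a list of filenames, reading each file in turn.
for filename in ['../../data/gapminder_gdp_africa.csv', '../../data/gapminder_gdp_asia.csv']:
    data = pd.read_csv(filename, index_col='country')
    # data.min() gives the minimum of every column in the file.
    print(filename, data.min())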

## Use `glob.glob` to find sets of files whose names match a pattern.

- In Unix, the term “globbing” means “matching a set of files with a pattern”.
- The most common patterns are:
  - `*` meaning “match zero or more characters”
  - `?` meaning “match exactly one character”
- Python’s standard library contains the [`glob`](https://docs.python.org/3/library/glob.html) module, which provides pattern matching functionality.
- The `glob` module contains a function, also called `glob`, that matches file patterns.
- E.g., `glob.glob('*.txt')` matches all files in the current directory whose names end with `.txt`.
- The result is a (possibly empty) list of character strings.

In [None]:
import glob
print('all csv files in data directory:', glob.glob('../../data/*.csv'))

In [None]:
print('all PDB files:', glob.glob('*.pdb'))
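
The difference between `*` and `?` is easier to see on a small set of files. The sketch below is only an illustration: the file names (`run1.csv`, `run10.csv`, and so on) are made up, and the files are created empty in a temporary directory so the cell runs anywhere.

```python
import glob
import os
import tempfile

# Hypothetical file names, created empty in a scratch directory.
names = ['run1.csv', 'run2.csv', 'run10.csv', 'notes.txt']

with tempfile.TemporaryDirectory() as tmp:
    for name in names:
        open(os.path.join(tmp, name), 'w').close()

    # '*' matches zero or more characters, so every run file is found.
    star_matches = sorted(os.path.basename(f)
                          for f in glob.glob(os.path.join(tmp, 'run*.csv')))

    # '?' matches exactly one character, so 'run10.csv' is left out.
    question_matches = sorted(os.path.basename(f)
                              for f in glob.glob(os.path.join(tmp, 'run?.csv')))

    # A pattern that matches nothing returns an empty list, not an error.
    no_matches = glob.glob(os.path.join(tmp, '*.pdb'))

print(star_matches)      # ['run1.csv', 'run10.csv', 'run2.csv']
print(question_matches)  # ['run1.csv', 'run2.csv']
print(no_matches)        # []
```

`glob.glob` does not promise any particular order for its results, which is why the matches are passed through `sorted` before printing.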

## Use `glob` and `for` to process batches of files.

- This works best when files are named and stored systematically and consistently, so that simple patterns find the right data.

In [None]:
for filename in glob.glob('../../data/gapminder_*.csv'):
    data = pd.read_csv(filename)
    print(filename, data['gdpPercap_1952'].min())

- This pattern matches the file containing the whole data set as well as the per-region files.
- Use a more specific pattern in the exercises to exclude the whole data set.
- Note that the minimum of the entire data set is also the minimum of one of the regional data sets, which is a useful check on correctness.
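
The correctness check mentioned above can be tried on a toy example. This is a minimal sketch under stated assumptions: the numbers and the `demo_*` file names are invented, the helpers `write_column` and `column_min` are hypothetical, and the standard-library `csv` module stands in for pandas so the cell runs without the gapminder data.

```python
import csv
import glob
import os
import tempfile


def write_column(path, values):
    """Write values as a one-column CSV file (hypothetical helper)."""
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['gdpPercap_1952'])
        writer.writerows([value] for value in values)


def column_min(path):
    """Return the smallest value in the gdpPercap_1952 column."""
    with open(path, newline='') as handle:
        return min(float(row['gdpPercap_1952']) for row in csv.DictReader(handle))


# Made-up numbers standing in for two regions.
africa = [298.8, 4725.3, 1172.7]
asia = [331.0, 779.4, 108382.4]

with tempfile.TemporaryDirectory() as tmp:
    write_column(os.path.join(tmp, 'demo_gdp_africa.csv'), africa)
    write_column(os.path.join(tmp, 'demo_gdp_asia.csv'), asia)
    write_column(os.path.join(tmp, 'demo_all.csv'), africa + asia)

    # The narrower pattern picks up only the per-region files.
    regional_files = sorted(glob.glob(os.path.join(tmp, 'demo_gdp_*.csv')))
    regional_minima = [column_min(f) for f in regional_files]
    overall_minimum = column_min(os.path.join(tmp, 'demo_all.csv'))

# The smallest regional minimum should equal the minimum of the combined file.
print(min(regional_minima) == overall_minimum)
```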

## Exercises

See `../exercises/06-looping_exercies.ipynb`.

## Dealing with File Paths

The [`pathlib` module](https://docs.python.org/3/library/pathlib.html) provides useful abstractions for working with files and paths, such as returning the name of a file without its extension. This is very useful when looping over files and directories. In the example below, we create a `Path` object and inspect its attributes.

In [None]:
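from pathlib import Path

p = Path("data/gapminder_gdp_africa.csv")
print(p.parent)  # the containing directory: data
print(p.stem)    # the file name without its extension: gapminder_gdp_africa
print(p.suffix)  # the file extension: .csv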

**Hint:** You can list all attributes and methods available on a `Path` object with the `dir()` function!
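
As a small illustration of why these attributes help inside a loop, the sketch below derives an output name from each input name. The `_summary.txt` naming scheme is an assumption made for this example, and no files are read or written; only the paths are manipulated.

```python
from pathlib import Path

# File names used for illustration only; nothing is opened.
names = ['data/gapminder_gdp_africa.csv', 'data/gapminder_gdp_asia.csv']

summaries = []
for name in names:
    p = Path(name)
    # with_name keeps the parent directory and swaps the final component.
    out = p.with_name(p.stem + '_summary.txt')
    summaries.append(out)
    # as_posix() prints forward slashes on every operating system.
    print(p.name, '->', out.as_posix())
```

Building the new name from `p.stem` avoids slicing the string by hand, which tends to break when a directory or extension changes length.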

## Key Points
- Use a `for` loop to process files given a list of their names.

- Use `glob.glob` to find sets of files whose names match a pattern.

- Use `glob` and `for` to process batches of files.

Licensed under [CC-BY 4.0](http://swcarpentry.github.io/python-novice-gapminder/14-looping-data-sets/index.html) 2018–2023 by [The Carpentries](https://carpentries.org/)

Licensed under [CC-BY 4.0](http://swcarpentry.github.io/python-novice-gapminder/14-looping-data-sets/index.html) 2016–2018 by [Software Carpentry Foundation](https://software-carpentry.org/)