# glob: Filename Pattern Matching

Even though the glob API is very simple, the module packs a lot of power. It is useful in any situation where your program needs to look for a list of files on the filesystem with names matching a pattern. If you need a list of filenames that all have a certain extension, prefix, or any common string in the middle, use glob instead of writing code to scan the directory contents yourself.

The pattern rules for glob are not regular expressions. Instead, **they follow standard Unix path expansion rules**. There are only a few special characters: two wildcards and character ranges. The pattern rules are applied to segments of the filename (stopping at the path separator, /). Paths in the pattern can be relative or absolute. Shell variable names and tilde (~) are not expanded.
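Because shell variables and the tilde are not expanded, a pattern that contains them has to be expanded before it is handed to glob. The following is a minimal sketch of that idea, using `os.path.expandvars` from the standard library (`os.path.expanduser` plays the same role for `~`). The environment variable `DEMO_DIR` and the file `notes.txt` are hypothetical names created only for this demonstration.

```python
import glob
import os
import tempfile

# Build a throwaway directory so the example does not depend on real files.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "notes.txt"), "w").close()

    # Hypothetical environment variable, set here only for the demonstration.
    os.environ["DEMO_DIR"] = tmp

    # glob treats "$DEMO_DIR" as a literal directory name, so nothing matches
    # (assuming no directory with that literal name exists in the working directory).
    unexpanded = glob.glob("$DEMO_DIR/*.txt")

    # Expanding the variable first gives glob a real path to work with.
    expanded = glob.glob(os.path.expandvars("$DEMO_DIR/*.txt"))

print(unexpanded)
print([os.path.basename(p) for p in expanded])
```

The same approach applies to a pattern such as `~/*.txt`: pass it through `os.path.expanduser` before calling `glob.glob`.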

In [1]:
import glob

The examples that follow are based on the directory and file structure described below.

The current working directory is:

**'/home/sato/Desktop/Github/Tutorials'**. 


Going one level up in the directory hierarchy, the subdirectories of the parent directory contain the following entries.

['../data/titanic.csv',

 '../Tutorials/pathlib.ipynb',
 '../Tutorials/pathlib',
 '../Tutorials/glob.ipynb',
 
 '../20191013/Linear_Regressor_v2.ipynb',
 '../20191013/Linear_Regressor_I.ipynb',
 '../20191013/Linear_Regressor_v1.ipynb',
 '../20191013/Logistic_Regressor_v1.ipynb',
 
 '../20191020/Linear_Regressor_v2.ipynb',
 '../20191020/Linear_Regressor_v1.ipynb',
 '../20191020/Logistic_Regressor_v1.ipynb']

## Wildcards

An asterisk **`(*)`** matches zero or more characters in a segment of a name.

In [2]:
# listing every entry (files and directories) in the current directory
glob.glob('./*')

['./pathlib.ipynb', './pathlib', './glob.ipynb']
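One detail worth knowing about the asterisk is that, as in the shell, it does not match names that begin with a dot, so hidden files are skipped unless the pattern itself starts with a dot. The sketch below illustrates this in a temporary directory; the file names `visible.txt` and `.hidden.txt` are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one regular file and one hidden (dot-prefixed) file.
    for name in ("visible.txt", ".hidden.txt"):
        open(os.path.join(tmp, name), "w").close()

    # "*" skips names that start with a dot.
    star = sorted(os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*")))

    # ".*" asks for dot-prefixed names explicitly.
    dot_star = sorted(os.path.basename(p) for p in glob.glob(os.path.join(tmp, ".*")))

print(star)      # ['visible.txt']
print(dot_star)  # ['.hidden.txt']
```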

In [3]:
# listing all files and directories in the parent directory
glob.glob('../*')

['../data', '../Tutorials', '../NMIST_V1.ipynb', '../20191013', '../20191020']

In [4]:
# listing the contents of every subdirectory of the parent directory (one level deep, not recursive)
glob.glob('../*/*')

['../data/titanic.csv',
 '../Tutorials/pathlib.ipynb',
 '../Tutorials/pathlib',
 '../Tutorials/glob.ipynb',
 '../20191013/Linear_Regressor_v2.ipynb',
 '../20191013/Linear_Regressor_I.ipynb',
 '../20191013/Linear_Regressor_v1.ipynb',
 '../20191013/Logistic_Regressor_v1.ipynb',
 '../20191020/Linear_Regressor_v2.ipynb',
 '../20191020/Linear_Regressor_v1.ipynb',
 '../20191020/Logistic_Regressor_v1.ipynb']
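A pattern such as `../*/*` descends exactly one level, because each `*` stays within a single path segment. For a search at any depth, `glob.glob` accepts a `recursive=True` argument, in which case the segment `**` matches zero or more directories. The following sketch compares the two on a small temporary tree; the directory names `a` and `b` and the notebook names are hypothetical.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build tmp/top.ipynb, tmp/a/mid.ipynb and tmp/a/b/deep.ipynb.
    os.makedirs(os.path.join(tmp, "a", "b"))
    for rel in ("top.ipynb",
                os.path.join("a", "mid.ipynb"),
                os.path.join("a", "b", "deep.ipynb")):
        open(os.path.join(tmp, rel), "w").close()

    # One "*" per segment: only files exactly one directory below tmp.
    one_level = glob.glob(os.path.join(tmp, "*", "*.ipynb"))

    # "**" with recursive=True: files at any depth, including tmp itself.
    any_depth = glob.glob(os.path.join(tmp, "**", "*.ipynb"), recursive=True)

one_level_names = sorted(os.path.basename(p) for p in one_level)
any_depth_names = sorted(os.path.basename(p) for p in any_depth)

print(one_level_names)  # ['mid.ipynb']
print(any_depth_names)  # ['deep.ipynb', 'mid.ipynb', 'top.ipynb']
```

Without `recursive=True`, `**` behaves like a single `*`, so the flag is what enables the deep search.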

In [5]:
# listing all files with a .csv extension one level below the parent directory
glob.glob('../*/*.csv')

['../data/titanic.csv']

## Single Character Wildcard

The other wildcard character supported is the question mark **`(?)`**. It matches any single character in that position in the name.

In [6]:
# files whose name has any single character followed by 'it', then any extension
glob.glob('../*/?it*.*')

['../data/titanic.csv']

In [7]:
# notebooks starting with 'L' that have at least one character after an underscore
glob.glob('../*/L*_?*.ipynb')

['../20191013/Linear_Regressor_v2.ipynb',
 '../20191013/Linear_Regressor_I.ipynb',
 '../20191013/Linear_Regressor_v1.ipynb',
 '../20191013/Logistic_Regressor_v1.ipynb',
 '../20191020/Linear_Regressor_v2.ipynb',
 '../20191020/Linear_Regressor_v1.ipynb',
 '../20191020/Logistic_Regressor_v1.ipynb']
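The difference between the two wildcards is easiest to see side by side: `?` requires exactly one character, while `*` accepts zero or more. The sketch below uses three hypothetical files, `v.txt`, `v1.txt` and `v10.txt`, in a temporary directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ("v.txt", "v1.txt", "v10.txt"):
        open(os.path.join(tmp, name), "w").close()

    # "?" stands for exactly one character between "v" and ".txt".
    single = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(tmp, "v?.txt")))

    # "*" stands for zero or more characters in the same position.
    many = sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(tmp, "v*.txt")))

print(single)  # ['v1.txt']
print(many)    # ['v.txt', 'v1.txt', 'v10.txt']
```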

## Character Ranges

When you need to match one character from a specific set, use a character range such as `[a-z]` or `[0-9]` instead of a question mark. For example, the following pattern finds all of the files that have a digit in the name immediately before the extension.

In [8]:
# files starting with 'Lo' whose name ends in '_v' plus a single digit before the extension
glob.glob('../*/Lo*_v[0-9].*')

['../20191013/Logistic_Regressor_v1.ipynb',
 '../20191020/Logistic_Regressor_v1.ipynb']
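A range can also be negated by putting `!` right after the opening bracket, which matches any single character that is not in the set. The following sketch contrasts `[0-9]` with `[!0-9]`; the `report_v*.txt` file names are invented for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for name in ("report_v1.txt", "report_v2.txt", "report_vA.txt"):
        open(os.path.join(tmp, name), "w").close()

    # [0-9] matches exactly one digit in that position.
    digits = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(tmp, "report_v[0-9].txt")))

    # [!0-9] matches exactly one character that is not a digit.
    non_digits = sorted(os.path.basename(p)
                        for p in glob.glob(os.path.join(tmp, "report_v[!0-9].txt")))

print(digits)      # ['report_v1.txt', 'report_v2.txt']
print(non_digits)  # ['report_vA.txt']
```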