# get_files in plain English with demos

A dataset is usually stored as files within folders: a main folder and many subfolders (set `recurse` to reach the subfolders).

We want to extract all files from a folder while keeping a record of each file's original location (use `_get_files`).

We are only interested in non-hidden files with specific extensions, and sometimes only particular subfolders (set `extensions` and `include`).

The final output is a list of file paths (the `PosixPath` objects shown in the outputs below).

The following code examples give you a feel for how `get_files` behaves.
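To make the description above concrete, here is a minimal sketch of the same idea using only the standard library. It is not fastai's source: `sketch_get_files` and its inner helper `keep` are hypothetical names, the suffix comparison is assumed to be an exact match, and `include` is left out to keep the sketch short.

```python
import os
import tempfile
from pathlib import Path

def sketch_get_files(path, extensions=None, recurse=False):
    """Hypothetical stand-in for get_files; not fastai's source."""
    path = Path(path)
    wanted = None if extensions is None else set(extensions)

    def keep(folder, names):
        # Drop hidden files, filter on suffix, and keep the folder in each path.
        return [Path(folder) / n for n in names
                if not n.startswith('.')
                and (wanted is None or Path(n).suffix in wanted)]

    if not recurse:
        # Only files sitting directly inside `path`.
        names = [e.name for e in os.scandir(path) if e.is_file()]
        return keep(path, names)

    found = []
    for folder, dirs, names in os.walk(path):
        # Editing `dirs` in place stops os.walk from entering hidden folders.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        found += keep(folder, names)
    return found

def rel(paths, root):
    # Helper for the demo: sorted paths relative to the demo root.
    return sorted(p.relative_to(root).as_posix() for p in paths)

# Demo on a throwaway folder tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'train' / '3').mkdir(parents=True)
    for name in ['labels.csv', '.hidden.csv', 'train/3/7263.png']:
        (root / name).touch()
    top_only = rel(sketch_get_files(root), root)
    everything = rel(sketch_get_files(root, recurse=True), root)
    pngs = rel(sketch_get_files(root, recurse=True, extensions=['.png']), root)

print(top_only)    # ['labels.csv']
print(everything)  # ['labels.csv', 'train/3/7263.png']
print(pngs)        # ['train/3/7263.png']
```

Each returned path still contains the folder it was found in, which is what lets later steps recover things like the label from the folder name.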

In [1]:
from fastai.vision import *

In [2]:
path_data = untar_data(URLs.MNIST_TINY)

In [3]:
path_data.ls()

[PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/valid'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/models'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train')]

With the default `recurse=False`, only files directly inside `path_data` are returned; files in subfolders are skipped.

In [4]:
list_FilePath_noRecurse = get_files(path_data)
list_FilePath_noRecurse

[PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv')]

With `recurse=True`, files in all subfolders are returned as well, except hidden files.

In [5]:
list_FilePath_recurse = get_files(path_data, recurse=True)
list_FilePath_recurse[:3]

[PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/valid/7/9294.png')]

In [6]:
list_FilePath_recurse[-2:]

[PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train/3/7263.png'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/train/3/7288.png')]

With `extensions=['.csv']`, only files with the `.csv` suffix are returned.

In [7]:
list_FilePath_recurse_csv = get_files(path_data, recurse=True, extensions=['.csv'])
list_FilePath_recurse_csv

[PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/labels.csv'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/history.csv')]

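With `include=['test']`, only files in `path_data` and its subfolder `test` are returned. Here `extensions` also restricts the result to image files, so the two `.csv` files in `path_data` do not appear.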

In [8]:
list_FilePath_include = get_files(path_data, recurse=True, extensions=['.png','.jpg','.jpeg'],
                                  include=['test'])
list_FilePath_include[:3]

[PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/4605.png'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/617.png'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/205.png')]

In [9]:
list_FilePath_include[-3:]

[PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/1605.png'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/2642.png'),
 PosixPath('/Users/Natsume/.fastai/data/mnist_tiny/test/5071.png')]
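One way to get this kind of behavior is to prune the folder list that `os.walk` yields on its first step, so the walk never enters the other subfolders. The sketch below assumes the filter applies only to first-level folder names; `walk_with_include` is a hypothetical helper, not fastai's code.

```python
import os
import tempfile
from pathlib import Path

def walk_with_include(path, include=None):
    """Hypothetical helper: recurse, entering only top-level folders named in `include`."""
    found = []
    for i, (folder, dirs, names) in enumerate(os.walk(path)):
        if include is not None and i == 0:
            # First step is the root folder: keep only the requested subfolders.
            dirs[:] = [d for d in dirs if d in include]
        else:
            # Everywhere else, just skip hidden folders.
            dirs[:] = [d for d in dirs if not d.startswith('.')]
        found += [Path(folder) / n for n in names if not n.startswith('.')]
    return found

# Demo on a throwaway folder tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'test').mkdir()
    (root / 'train' / '3').mkdir(parents=True)
    for name in ['labels.csv', 'test/4605.png', 'train/3/7263.png']:
        (root / name).touch()
    included = sorted(p.relative_to(root).as_posix()
                      for p in walk_with_include(root, include=['test']))

print(included)  # ['labels.csv', 'test/4605.png']
```

Files in the root folder itself survive the pruning, because only the list of subfolders is filtered; an extension filter is what removes them in the example above.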

See my code example in the official docs:

In [10]:
doc(get_files)