# Understanding Datablocks

## Imports


```python
from fastai.data.all import *
from fastai.vision.all import *
```

### The first step is to download and decompress our data (if it’s not already done) and get its location:



```python
# Returns the local folder where the Oxford-IIIT Pet dataset was extracted
path = untar_data(URLs.PETS)
```

```python
# Display paths relative to the dataset folder instead of as full absolute paths
Path.BASE_PATH = path
```

```python
path.ls()
```

```
(#2) [Path('images'),Path('annotations')]
```
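The short `Path('images')` in that output comes from the `Path.BASE_PATH` line: as far as we can tell, fastai (through fastcore) prints a `Path` relative to `BASE_PATH` when it is set. The standard library has no such global switch; the hypothetical `display_relative` helper below is a minimal sketch of the same idea using `PurePath.relative_to`, and the `/data/oxford-iiit-pet` folder is an invented example location.

```python
from pathlib import PurePosixPath


def display_relative(p, base=None):
    """Hypothetical helper: repr-style string for p, shown relative to base when possible."""
    if base is not None:
        try:
            p = p.relative_to(base)
        except ValueError:
            pass  # p is not inside base: fall back to the full path
    return f"Path('{p.as_posix()}')"


base = PurePosixPath("/data/oxford-iiit-pet")  # hypothetical download location
print(display_relative(base / "images", base))  # Path('images')
print(display_relative(PurePosixPath("/tmp/other"), base))  # Path('/tmp/other')
```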

### The filenames are in the “images” folder. The get_image_files function collects every image file in a folder and, by default, in all of its subfolders.

```python
fnames = get_image_files(path/"images")
```
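To make "collects every image file" concrete, here is a standard-library sketch of a recursive image search. It is not fastai's implementation: the names `find_image_files` and `is_image_file` are hypothetical, and treating any extension that `mimetypes` maps to an `image/...` type as an image is our assumption. The `recurse` flag mirrors the `recurse=True` default visible in the `get_image_files` signature further down.

```python
import mimetypes
import tempfile
from pathlib import Path

# Assumption: any extension that mimetypes maps to an "image/..." MIME type counts as an image.
IMAGE_EXTENSIONS = {ext for ext, mime in mimetypes.types_map.items() if mime.startswith("image/")}


def is_image_file(path):
    """True if the file suffix (case-insensitive) is a known image extension."""
    return Path(path).suffix.lower() in IMAGE_EXTENSIONS


def find_image_files(folder, recurse=True):
    """Collect image files in folder and, if recurse is True, in all of its subfolders."""
    pattern = "**/*" if recurse else "*"
    return sorted(p for p in Path(folder).glob(pattern) if p.is_file() and is_image_file(p))


# Usage on a throwaway folder tree: one image at the top, one in a subfolder, one text file
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "images"
    (root / "sub").mkdir(parents=True)
    for rel in ["a.jpg", "sub/b.png", "notes.txt"]:
        (root / rel).touch()
    found = [p.relative_to(root).as_posix() for p in find_image_files(root)]
    top_only = [p.relative_to(root).as_posix() for p in find_image_files(root, recurse=False)]

print(found)     # ['a.jpg', 'sub/b.png']
print(top_only)  # ['a.jpg']
```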


### Let’s begin with an empty DataBlock.



```python
dblock = DataBlock()
```

```python
fnames[4]
```

```
Path('images/english_setter_6.jpg')
```

> #### _By itself, a DataBlock is just a blueprint for how to assemble your data. It does not do anything until you pass it a source. You can then convert that source into a Datasets or a DataLoaders object with the DataBlock.datasets or DataBlock.dataloaders method. Since we haven’t done anything to get our data ready for batches, the dataloaders method would fail here, but we can look at how the source gets converted into Datasets. This is where we pass the source of our data, here all our filenames._

```python
dsets = dblock.datasets(fnames)
dsets.train[0]
```

```
(Path('images/beagle_182.jpg'), Path('images/beagle_182.jpg'))
```

```python
dsets
```

```
(#7390) [(Path('images/beagle_115.jpg'), Path('images/beagle_115.jpg')),(Path('images/boxer_18.jpg'), Path('images/boxer_18.jpg')),(Path('images/Maine_Coon_157.jpg'), Path('images/Maine_Coon_157.jpg')),(Path('images/scottish_terrier_28.jpg'), Path('images/scottish_terrier_28.jpg')),(Path('images/english_setter_6.jpg'), Path('images/english_setter_6.jpg')),(Path('images/american_pit_bull_terrier_79.jpg'), Path('images/american_pit_bull_terrier_79.jpg')),(Path('images/boxer_128.jpg'), Path('images/boxer_128.jpg')),(Path('images/Persian_265.jpg'), Path('images/Persian_265.jpg')),(Path('images/Maine_Coon_182.jpg'), Path('images/Maine_Coon_182.jpg')),(Path('images/keeshond_89.jpg'), Path('images/keeshond_89.jpg'))...]
```

> _By default, the data block API assumes we have an input and a target, which is why each filename appears twice._
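The two ideas above, a blueprint that does nothing until it gets a source and an (input, target) pair that defaults to the item itself, can be sketched in plain Python. `BlueprintSketch` is a hypothetical, heavily simplified stand-in, not fastai's `DataBlock`: it ignores the train/validation split, type transforms and batching, and only shows where `get_items`, `get_x` and `get_y` would plug in.

```python
def _identity(o):
    return o


class BlueprintSketch:
    """Hypothetical stand-in for a DataBlock: it only stores functions
    and does no work until a source is passed to datasets()."""

    def __init__(self, get_items=None, get_x=None, get_y=None):
        self.get_items = get_items
        self.get_x = get_x or _identity
        self.get_y = get_y or _identity

    def datasets(self, source):
        # Without get_items, the source itself is treated as the list of items.
        items = self.get_items(source) if self.get_items else list(source)
        # One (input, target) pair per item; with no getters both are the item itself.
        return [(self.get_x(o), self.get_y(o)) for o in items]


block = BlueprintSketch()  # nothing happens yet: this is only a recipe
pairs = block.datasets(["beagle_182.jpg", "Persian_265.jpg"])
print(pairs[0])  # ('beagle_182.jpg', 'beagle_182.jpg')
```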

> _The first thing we can do is pass a get_items function, so that the data block assembles the items itself: we can then give it the folder to search instead of a ready-made list of filenames._

```python
dblock = DataBlock(get_items=get_image_files)
```

```python
get_image_files
```

```
<function fastai.data.transforms.get_image_files(path, recurse=True, folders=None)>
```

```python
dsets = dblock.datasets(path/"images")
dsets.valid[0]
```

```
(Path('images/shiba_inu_67.jpg'), Path('images/shiba_inu_67.jpg'))
```
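`dsets.valid` works even though we never asked for a validation set. Our understanding is that when no splitter is given, `DataBlock` falls back to fastai's `RandomSplitter`, which holds out 20% of the items at random. The hypothetical `random_split` below is a standard-library sketch of that kind of split, not fastai's code; the file names are invented.

```python
import random


def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle item indices and hold out the first valid_pct of them for validation."""
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)  # a fixed seed makes the split reproducible
    cut = int(valid_pct * len(items))
    train = [items[i] for i in idxs[cut:]]
    valid = [items[i] for i in idxs[:cut]]
    return train, valid


names = [f"img_{i}.jpg" for i in range(10)]  # hypothetical file names
train, valid = random_split(names, seed=42)
print(len(train), len(valid))  # 8 2
```

Passing a seed is what makes the validation set stable across runs; without one, every call produces a different split.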

```python
def label_func(fname):
    # In this dataset, cat breed file names start with a capital letter, dog breeds do not
    return "cat" if fname.name[0].isupper() else "dog"
```
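`label_func` relies on a naming convention of the Oxford-IIIT Pet dataset: cat breed file names are capitalized (`Persian_265.jpg`, `Maine_Coon_157.jpg`) while dog breed file names are lowercase (`beagle_182.jpg`). Because the function only looks at `fname.name`, it can be checked on bare paths without downloading anything; the file names below are taken from the outputs above.

```python
from pathlib import PurePosixPath


def label_func(fname):
    # Cat breeds are capitalized in the Oxford-IIIT Pet file names, dog breeds are not.
    return "cat" if fname.name[0].isupper() else "dog"


for name in ["images/Persian_265.jpg", "images/beagle_182.jpg", "images/Maine_Coon_157.jpg"]:
    print(name, "->", label_func(PurePosixPath(name)))
# images/Persian_265.jpg -> cat
# images/beagle_182.jpg -> dog
# images/Maine_Coon_157.jpg -> cat
```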

```python
dblock = DataBlock(get_items=get_image_files,
                   get_y=label_func)
```


```python
dsets = dblock.datasets(path/"images")
dsets.train[0]  # the target is now the "cat"/"dog" label instead of the filename
```
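With `get_y` set, the item is still used as the input, but the second element of each pair becomes `label_func(item)` instead of a copy of the filename. The sketch below reproduces only that pairing step on two file names from the outputs above; `make_pairs` is a hypothetical helper, not part of fastai, and the real `Datasets` also applies the random split.

```python
from pathlib import PurePosixPath


def label_func(fname):
    # Cat breeds are capitalized in the Oxford-IIIT Pet file names, dog breeds are not.
    return "cat" if fname.name[0].isupper() else "dog"


def make_pairs(items, get_y):
    """Hypothetical helper: keep each item as the input and use get_y(item) as its target."""
    return [(o, get_y(o)) for o in items]


items = [PurePosixPath("images/beagle_182.jpg"), PurePosixPath("images/Persian_265.jpg")]
for x, y in make_pairs(items, label_func):
    print(x, y)
# images/beagle_182.jpg dog
# images/Persian_265.jpg cat
```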