# Migration Guide from fastai's Python Library to FastAI.jl - Beginner

If you use fastai's Python library and want to migrate to FastAI.jl, this three-part tutorial series will get you started. In this first part we look at the **Datasets**, **DataLoaders** and **Learner** objects of FastAI.jl.

```julia
using FastAI
using FastAI.Datasets
```

# Datasets

The fastai datasets are available in FastAI.jl. With fastai's Python library you first download the image files, collect their paths and then feed them to the dataloaders. FastAI.jl reduces this to a single step: the `Datasets.loadtaskdata()` function takes two inputs, the path of the dataset and the type of task (for example `ImageClassificationTask`), and returns a data container that plays the role of fastai's dataloaders.

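Take the following Python code:
```python
path = untar_data(URLs.PETS)
files = get_image_files(path/"images")
dls = ImageDataLoaders.from_name_func(path, files, label_func, item_tfms=Resize(224))
```

In FastAI.jl it translates to the following (this example uses the Imagenette dataset rather than Oxford-IIIT Pets):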
```julia
# Download the dataset on first use and load it as an image classification data container
data = Datasets.loadtaskdata(Datasets.datasetpath("imagenette2-160"), ImageClassificationTask)
```

```
This program has requested access to the data dependency fastai-imagenette2-160.
which is not currently installed. It can be installed automatically, and you will not see this message again.

"imagenette2-160" from the fastai dataset repository (https://course.fast.ai/datasets)

Download size: ???

Do you want to download the dataset from https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz to "/root/.julia/datadeps/fastai-imagenette2-160"?
[y/n]
stdin> y

mapobs((input = FastAI.Datasets.loadfile, target = FastAI.Datasets.var"#27#32"()), DataSubset(::FileDataset, ::Vector{Int64}, ObsDim.Undefined())
 13394 observations)
```

The `datasetpath()` function downloads the given dataset (only the first time it is requested) and returns its local path. It covers the datasets from fastai's dataset repository, each referred to by a unique keyword. The keywords for common datasets are:

|                      Dataset                      |      Keyword      |
|:-------------------------------------------------:|:-----------------:|
|                       MNIST                       |    "mnist-png"    |
|                      CIFAR10                      |     "cifar10"     |
|                      Food 101                     |     "food-101"    |
|                  Oxford-IIIT Pets                 | "oxford-iiit-pet" |
|                     Imagenette                    |    "imagenette"   |
|                     Imagewang                     |    "imagewang"    |
|                     Imagewoof                     |    "imagewoof"    |
|            IMDB Large Movie review data           |       "imdb"      |
|                    Wikitext-103                   |   "wikitext-103"  |
| CamVid: Motion based Segmentation and Recognition |      "camvid"     |
|           PASCAL: Visual Object Classes           |    "pascal-voc"   |

# DataLoaders

DataLoaders are an important part of fastai's Python library. Just as an assembly line feeds materials to a machine, a dataloader feeds data to the model.

Although FastAI.jl does have a dataloader object, we can skip creating separate training and validation dataloaders by passing a data container to the `methodlearner()` function (described in the Learner section below).
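Conceptually, a dataloader shuffles the observations of a data container and groups them into batches. The stdlib-only sketch below illustrates just that idea; `simple_batches` is a hypothetical helper written for this guide, not part of FastAI.jl.

```julia
using Random

# Hypothetical helper (not FastAI.jl's API): shuffle the indices of a container
# and return one vector of observations per batch; the last batch may be smaller.
function simple_batches(data::AbstractVector, batchsize::Integer; rng = Random.default_rng())
    idxs = shuffle(rng, collect(eachindex(data)))
    return [data[chunk] for chunk in Iterators.partition(idxs, batchsize)]
end

observations = collect(1:10)   # stand-in for (image, label) observations
batches = simple_batches(observations, 4; rng = MersenneTwister(0))
map(length, batches)           # [4, 4, 2]
```

A real dataloader typically also collates each batch into arrays before handing it to the model; that step is left out here.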

## Loading fastai datasets

As discussed above, fastai datasets loaded with the `loadtaskdata()` function are automatically converted into a data container object:


```julia
data = Datasets.loadtaskdata(Datasets.datasetpath("imagenette2-160"), ImageClassificationTask)
```

## Labeling from parent folder names

If your data is already on disk and each image's label is the name of its parent folder (the most common layout for image datasets), you can turn it into a data container as follows:

```julia
using FastAI.Datasets: FileDataset, loadfile, filename

# collect all files under the dataset folder
imagedata = FileDataset("/path/to/Dataset")
```

`imagedata` contains all the image files in the given folder, but the images are not yet associated with their classes. To associate each image with its class, we define a function called `label_func` that loads the image and returns it together with the name of its parent folder:


```julia
# Load an image and label it with the name of the folder that contains it
function label_func(image)
    return (
        Datasets.loadfile(image),   # the image itself
        filename(parent(image)),    # the parent folder's name, used as the class
    )
end
```
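To see the folder-name labeling rule in isolation, here is a stdlib-only sketch that walks a directory tree and labels every file with the name of its parent folder. `label_from_parent` is a hypothetical helper, and the `cat`/`dog` layout is made up for the example; in FastAI.jl you would keep using `FileDataset` and `label_func` as shown above.

```julia
# Hypothetical helper: pair every file under `root` with the name of the folder
# that contains it, e.g. root/dog/3.jpg is labeled "dog".
function label_from_parent(root::AbstractString)
    labeled = Tuple{String,String}[]
    for (dir, _, files) in walkdir(root)
        for file in files
            push!(labeled, (joinpath(dir, file), basename(dir)))
        end
    end
    return labeled
end

# Build a tiny example dataset in a temporary directory.
root = mktempdir()
for (class, file) in [("cat", "1.jpg"), ("cat", "2.jpg"), ("dog", "3.jpg")]
    mkpath(joinpath(root, class))
    touch(joinpath(root, class, file))
end

samples = label_from_parent(root)
sort(last.(samples))   # ["cat", "cat", "dog"]
```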

Finally, we map every file to an (image, class) pair with the `mapobs` function:

```julia
image_data_container = mapobs(label_func, imagedata)
```
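`mapobs` wraps the container rather than loading every image up front, so `label_func` runs only when an observation is accessed. The toy type below imitates that lazy behaviour in plain Julia so the idea is visible; `LazyMapped` and `parity_label` are hypothetical names, and this is not how FastAI.jl implements `mapobs`.

```julia
# Hypothetical lazy wrapper (a toy, not FastAI.jl's implementation): it stores
# a function and a container and calls the function only on indexing.
struct LazyMapped{F,D}
    f::F
    data::D
end

Base.length(m::LazyMapped) = length(m.data)
Base.getindex(m::LazyMapped, i::Integer) = m.f(m.data[i])

ncalls = Ref(0)   # counts how often the mapped function actually runs
function parity_label(x)
    ncalls[] += 1
    return (x, isodd(x) ? "odd" : "even")
end

container = LazyMapped(parity_label, 1:5)
ncalls[]       # 0: creating the wrapper computed nothing
container[2]   # (2, "even")
ncalls[]       # 1
```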

## Labeling from a CSV file 

If the images are associated with their classes through a CSV file (another popular way of distributing datasets), first read the file into a `DataFrame`:

```julia
using CSV
using DataFrames

# example.csv is assumed to have a "path" column (image file paths) and a "class" column (labels)
df = CSV.read("example.csv", DataFrame)
```

Then we modify `label_func` so that it builds the data container directly from the DataFrame's columns:

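```julia
# Build an (images, labels) data container from the "path" and "class" columns
function label_func(dataframe)
    images = mapobs(Datasets.loadfile, dataframe[!, "path"])  # images are loaded lazily from their paths
    labels = dataframe[!, "class"]
    return (images, labels)
end

image_data_container = label_func(df)
```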

This version of `label_func` returns the data container itself, so there is no need to call `mapobs` separately afterwards.
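If you only need the two columns, the pairing that `label_func` relies on can also be sketched without CSV.jl or DataFrames.jl. The version below assumes a file with a `path,class` header and no quoted fields or embedded commas; `read_path_class` is a hypothetical helper, and for real-world files CSV.jl as shown above is the safer choice. The two vectors it returns could then stand in for the DataFrame columns.

```julia
# Hypothetical helper: read a CSV with `path` and `class` columns into two
# parallel vectors. Assumes a header row and no quoted fields or embedded commas.
function read_path_class(io::IO)
    lines = collect(eachline(io))
    header = split(lines[1], ',')
    pathcol = findfirst(==("path"), header)
    classcol = findfirst(==("class"), header)
    (pathcol === nothing || classcol === nothing) &&
        throw(ArgumentError("expected `path` and `class` columns"))
    paths, classes = String[], String[]
    for line in lines[2:end]
        isempty(strip(line)) && continue   # skip blank lines
        fields = split(line, ',')
        push!(paths, fields[pathcol])
        push!(classes, fields[classcol])
    end
    return paths, classes
end

csv = IOBuffer("""
path,class
images/001.jpg,cat
images/002.jpg,dog
""")
paths, classes = read_path_class(csv)
classes   # ["cat", "dog"]
```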

# Learner

The learner is the object that binds the dataloaders to the model and holds the other training components, such as the loss function and the optimizer. FastAI.jl's `Learner()` plays a role similar to fastai's `cnn_learner` function. Take the following Python code:

```python
learn = cnn_learner(dls, resnet34, metrics=error_rate)

# to fine-tune
learn.fine_tune(1)

# to fit one cycle
learn.fit_one_cycle(1)
```

In FastAI.jl it can be written as:

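```julia
using Flux

# `traindl` and `validdl` are assumed to be training and validation dataloaders created beforehand
model = Models.resnet34(pretrained = true)
lossfn = Flux.Losses.logitcrossentropy   # cross-entropy computed on the model's raw outputs
optimizer = ADAM()
metric = Metrics(accuracy)               # fastai's error_rate is 1 - accuracy
learn = Learner(model, (traindl, validdl), optimizer, lossfn, metric)

# to finetune
finetune!(learn, 1)

# to fit one cycle
fitonecycle!(learn, 1)
```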
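fastai's `error_rate` is simply `1 - accuracy`. If you want to compute that exact quantity yourself, here is a plain-Julia sketch. It assumes the model outputs a (classes × batch) score matrix and that targets are integer class indices; `error_rate` here is a hypothetical helper, not a function from Flux or FastAI.jl.

```julia
using Statistics: mean

# Hypothetical helper: fraction of samples whose highest-scoring class
# differs from the target class index.
function error_rate(scores::AbstractMatrix, targets::AbstractVector{<:Integer})
    predictions = [argmax(col) for col in eachcol(scores)]
    return mean(predictions .!= targets)
end

scores = [0.9 0.2 0.3;
          0.1 0.8 0.7]        # 2 classes × 3 samples
targets = [1, 2, 1]
error_rate(scores, targets)  # 1/3: only the third sample is misclassified
```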

However, `Learner` requires us to define every component manually. For example, we must explicitly convert our data container into training and validation dataloaders and pass them in.
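The first part of that manual conversion is splitting the observations into a training set and a validation set. The stdlib-only sketch below shows such a split on indices; `train_valid_split` is a hypothetical helper and the 20% validation fraction is an arbitrary choice, not a FastAI.jl default.

```julia
using Random

# Hypothetical helper (not FastAI.jl's API): shuffle the observation indices and
# hold out a fraction `valfrac` of them for validation.
function train_valid_split(n::Integer, valfrac::Real; rng = Random.default_rng())
    0 < valfrac < 1 || throw(ArgumentError("valfrac must be between 0 and 1"))
    idxs = shuffle(rng, collect(1:n))
    nval = round(Int, n * valfrac)
    return idxs[nval+1:end], idxs[1:nval]   # (training indices, validation indices)
end

trainidxs, valididxs = train_valid_split(100, 0.2; rng = MersenneTwister(42))
(length(trainidxs), length(valididxs))   # (80, 20)
```

Each index set would then be batched into its own dataloader before being passed to `Learner`.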

Instead, we can avoid this complexity with the `methodlearner` function. It takes the data container directly and converts it into dataloaders implicitly, while the batch size and validation split can still be customized if needed. The code above then becomes:

```julia
method = ImageClassification(classnames, image_size)
learn = methodlearner(method, image_data_container, Models.resnet34(pretrained = true))

# to finetune
finetune!(learn, 1)

# to fit one cycle
fitonecycle!(learn, 1)
```

Here `classnames` is a vector containing the names of all classes in the dataset, and `image_size` is a tuple with the image dimensions (height and width).
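If the labels of your data are available as a vector of strings, both arguments can be built in a couple of lines. The `labels` vector below is made up for illustration, and the square 224-pixel size simply mirrors `Resize(224)` from the Python example.

```julia
labels = ["dog", "cat", "dog", "bird", "cat"]   # hypothetical per-image labels

classnames = sort(unique(labels))   # ["bird", "cat", "dog"]
image_size = (224, 224)             # square images, like Resize(224) in the Python example
```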