As the title suggests, we will be using fastai to achieve the best possible score with minimum lines of code.

This is the training notebook; you can find the [inference notebook here](https://www.kaggle.com/ankursingh12/fastai-plant2021-starter-inference).

Let's get started...

First, we will import fastai.

In [None]:
from fastai.vision.all import *

seed = 42
set_seed(seed, reproducible=True)

We are setting the seed for reproducibility. 
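
To see why this matters, here is a tiny sketch in plain Python. As far as I know, `set_seed` seeds Python's `random` module, NumPy and PyTorch in one call; the snippet below only illustrates the idea with the standard library, and the `draw` helper is a made-up name for this example.

```python
import random

def draw(seed, n=3):
    """Draw n pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    return [rng.randint(0, 99) for _ in range(n)]

first = draw(42)
second = draw(42)

# Same seed, same sequence: this is what makes splits and augmentations repeatable.
print(first == second)
```

The same principle applies to the random train/validation split and the random augmentations later in the notebook.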

Next, we will set up the path variables that are used throughout the notebook.

PS: I will be using my own version of the dataset. I have resized all the images so that it's much faster to load them into RAM. You can find the dataset [here](https://www.kaggle.com/ankursingh12/resized-plant2021).

In [None]:
path = Path('../input/plant-pathology-2021-fgvc8')
data_path = Path('../input/resized-plant2021')

### Data

Enough prep work! Let's read our data...

In [None]:
df = pd.read_csv(path/'train.csv')
df.head()

Hmm, just image names and their labels. Looks simple? Not so fast. The labels are space-delimited strings, so it's a multi-label problem.
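
To make the label format concrete, here is a small sketch in plain Python (no fastai) of how a space-delimited label string breaks into individual labels and into a multi-hot vector. The vocabulary and the sample string are illustrative assumptions, not values read from `train.csv`, and `split_labels` and `multi_hot` are hypothetical helper names.

```python
# Illustrative vocabulary (an assumption); in practice, derive it from train.csv.
VOCAB = ["complex", "frog_eye_leaf_spot", "healthy", "powdery_mildew", "rust", "scab"]

def split_labels(label_string):
    """Split a space-delimited label string into a list of labels."""
    return label_string.split()

def multi_hot(labels, vocab=VOCAB):
    """Encode a list of labels as a 0/1 vector aligned with `vocab`."""
    present = set(labels)
    return [1 if name in present else 0 for name in vocab]

sample = "scab frog_eye_leaf_spot complex"  # hypothetical row from the labels column
labels = split_labels(sample)
encoded = multi_hot(labels)

print(labels)   # ['scab', 'frog_eye_leaf_spot', 'complex']
print(encoded)  # [1, 1, 0, 0, 0, 1]
```

One image can therefore carry several labels at once, which is what separates multi-label from ordinary single-label classification.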

We will use fastai's DataBlock API to load our data. The DataBlock API is simply amazing: very flexible and incredibly powerful. To use it, you need to define some functions.

In [None]:
# Build the full path to a resized image from a row of train.csv
def get_x(x): return str(data_path/'img_sz_640') + os.path.sep + x['image']
# Return the raw label string; CategoryBlock treats each distinct string as one class
def get_y(y): return y['labels']

datablock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                      splitter=RandomSplitter(seed=seed),
                      get_x=get_x, get_y=get_y,
                      item_tfms=RandomResizedCrop(512),
                      batch_tfms=[*aug_transforms(mult=2.0, flip_vert=True, size=460),
                                  Normalize.from_stats(*imagenet_stats)])

Don't worry if the above code looks cryptic. You can read the [6th notebook (or chapter)](https://github.com/fastai/fastbook/blob/master/06_multicat.ipynb) in fastbook for details. It explains the topic in the simplest way possible. And once you master the DataBlock API, you will feel like a ninja (trust me on this)!

You are amazing! Now let's create our dataloaders and take a look at some images.

In [None]:
dls = datablock.dataloaders(df)
dls.show_batch(max_n=9)

Looks good to me, what do you think?

We are done with the data; time for some training.

### Model

Fastai has an awesome function that puts everything together, called `cnn_learner`. Here we are using ResNet50.

In [None]:
f1score = F1Score(average='macro')
learn = cnn_learner(dls, resnet50, metrics=[accuracy, f1score]).to_fp16()

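We are using `accuracy` and a macro-averaged `F1Score` as metrics, because the evaluation metric for the competition is *F1Score*. Although the labels are multi-label in nature, the `CategoryBlock` above treats each distinct label string as a single class, so the single-label `accuracy` metric applies here rather than `accuracy_multi`.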
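
If macro averaging (the `average='macro'` setting above) is new to you, here is a minimal sketch of the textbook definition in plain Python: compute F1 per class, then take the unweighted mean. This is not fastai's implementation or the competition's scoring code, and the labels and predictions below are invented for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical targets and predictions
y_true = ["healthy", "scab", "scab", "rust"]
y_pred = ["healthy", "scab", "rust", "rust"]

score = macro_f1(y_true, y_pred)
print(round(score, 4))  # 0.7778
```

Because every class counts equally, rare classes pull the score down as much as common ones, which is why macro F1 can differ a lot from plain accuracy (0.75 in this toy example).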

Finally, let's train (technically, fine-tune 🤯) our model.

In [None]:
learn.fine_tune(5, 3e-3, wd=0.5)

Let's train it some more.

In [None]:
learn.fit_one_cycle(5, slice(3e-3), wd=0.5) 
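
A note on `slice(3e-3)`: passing a slice asks for discriminative learning rates, meaning different rates for different layer groups. The sketch below is my reading of how fastai spreads a slice over the groups, written in plain Python; `lrs_from_slice` is a hypothetical helper, not a fastai function, and the three groups are an assumption about this model.

```python
def lrs_from_slice(lr, n_groups):
    """Approximate the per-group learning rates implied by a slice."""
    if lr.start:
        # slice(start, stop): geometrically spaced from start to stop
        if n_groups == 1:
            return [lr.stop]
        step = (lr.stop / lr.start) ** (1 / (n_groups - 1))
        return [lr.start * step ** i for i in range(n_groups)]
    # slice(stop): the last group gets stop, earlier groups get stop / 10
    return [lr.stop / 10] * (n_groups - 1) + [lr.stop]

n_groups = 3  # assumed number of layer groups
stop_only = lrs_from_slice(slice(3e-3), n_groups)
ranged = lrs_from_slice(slice(1e-5, 1e-3), n_groups)

print(stop_only)  # roughly [3e-4, 3e-4, 3e-3]
print(ranged)     # roughly [1e-5, 1e-4, 1e-3]
```

So in the cell above, the head trains at about 3e-3 while the earlier pretrained layers move about ten times more slowly.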

Okay, done with training! Let's look at some predictions...

In [None]:
learn.show_results()

Amazing! Let's export the model so that we can deploy it to production 😂. Just kidding, we will (only) use it for inference.

In [None]:
learn.export('resnet50.pkl')

Fastai is extremely flexible and powerful at the same time. This is just the baseline notebook. You can easily build on top of it. Here are some things that you can experiment with:

- Preprocessing and Feature Engineering
- Data Augmentation and External Datasets
- Different Model Architectures
- Training Schedule, Optimizer, etc
- Postprocessing

You can find the **[inference notebook here](https://www.kaggle.com/ankursingh12/fastai-plant2021-starter-inference)**.

Hope you had fun reading the notebook. Kindly consider **upvoting**.