At the end of this notebook, we will have submitted to the ["Paddy Doctor: Paddy Disease Classification" competition on Kaggle](https://www.kaggle.com/competitions/paddy-disease-classification/overview)!

Let us begin by downloading the data.

In [13]:
%%bash

exec bash
# rm -rf data
# mkdir data
apt install unzip


Reading package lists...
Building dependency tree...
Reading state information...
unzip is already the newest version (6.0-21ubuntu1.1).
0 upgraded, 0 newly installed, 0 to remove and 43 not upgraded.






In [2]:
pip install -U timm==0.6.2dev

Collecting torchvision
  Downloading torchvision-0.12.0-cp39-cp39-macosx_11_0_arm64.whl (1.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.2/1.2 MB[0m [31m13.0 MB/s[0m eta [36m0:00:00[0m00:01[0m00:01[0m
Installing collected packages: torchvision
Successfully installed torchvision-0.12.0
Note: you may need to restart the kernel to use updated packages.


In [15]:
!cd data && kaggle competitions download -c paddy-disease-classification && unzip -q paddy-disease-classification.zip

paddy-disease-classification.zip: Skipping, found more recently modified local copy (use --force to force download)


In [1]:
import timm

We have now downloaded and extracted the data to the `data` directory.

In [2]:
ls data

paddy-disease-classification.zip  train.csv
sample_submission.csv             [34mtrain_images[m[m/
[34mtest_images[m[m/


In [3]:
ls data/train_images

[34mbacterial_leaf_blight[m[m/    [34mbrown_spot[m[m/               [34mnormal[m[m/
[34mbacterial_leaf_streak[m[m/    [34mdead_heart[m[m/               [34mtungro[m[m/
[34mbacterial_panicle_blight[m[m/ [34mdowny_mildew[m[m/
[34mblast[m[m/                    [34mhispa[m[m/


In [4]:
ls data/test_images | head

200001.jpg
200002.jpg
200003.jpg
200004.jpg
200005.jpg
200006.jpg
200007.jpg
200008.jpg
200009.jpg
200010.jpg


Seems that the train data is organized by directories, with the name of the directory being the label.

Test images just live in `data/test_images`

Let us see what is the format of the sample submission file to have a full picture.

In [5]:
import pandas as pd

sample_sub = pd.read_csv('data/sample_submission.csv')
sample_sub.head()

Unnamed: 0,image_id,label
0,200001.jpg,
1,200002.jpg,
2,200003.jpg,
3,200004.jpg,
4,200005.jpg,


Mhmm. Guessing the labels for the submission are the names of the directories.

Ok, let's start training!

In [6]:
from fastai.vision.all import *
from fastcore.parallel import *

In [7]:
path = Path('data')
trn_path= path/'train_images'
tst_files = get_image_files(path/'test_images').sorted()

In [8]:
tta_res = []

In [9]:
def train(desc, arch, item, batch, accum=False):
    kwargs = {'bs':32} if accum else{}
    dls = ImageDataLoaders.from_folder(trn_path, seed=42, valid_pct=0.2, item_tfms=item, batch_tfms=batch, **kwargs)
    cbs = GradientAccumulation(2) if accum else []
    learn = vision_learner(dls, arch, metrics=error_rate, cbs=cbs).to_fp16()
    learn.fine_tune(3, 0.02)
    tta_res.append(learn.tta(dl=dls.test_dl(tst_files)))
    learn.export(f'{arch}_{desc}')
    

In [10]:
timm

<module 'timm' from '/Users/bsmi067/mambaforge/lib/python3.9/site-packages/timm/__init__.py'>

In [12]:
arch = 'convnext_tiny_in22k'

In [13]:
train('squish', arch, item=Resize(480, method='squish'), batch=aug_transforms(size=128, min_scale=0.75), accum=True)



epoch,train_loss,valid_loss,error_rate,time
0,1.576814,1.186038,0.358001,23:29


epoch,train_loss,valid_loss,error_rate,time
0,0.858342,0.69588,0.235464,33:54
1,0.487654,0.3459,0.105238,34:03
2,0.238839,0.219533,0.064392,34:27


In [14]:
train('squish64', arch, item=Resize(480, method='squish'), batch=aug_transforms(size=64, min_scale=0.75), accum=True)



epoch,train_loss,valid_loss,error_rate,time
0,1.683572,1.132257,0.333974,21:59


epoch,train_loss,valid_loss,error_rate,time
0,0.987379,0.775341,0.256607,25:50
1,0.527337,0.496249,0.158097,31:47
2,0.299733,0.255496,0.079289,15:32


In [15]:
train('squish32', arch, item=Resize(480, method='squish'), batch=aug_transforms(size=32, min_scale=0.75), accum=True)



epoch,train_loss,valid_loss,error_rate,time
0,2.052876,1.377285,0.433926,06:42


epoch,train_loss,valid_loss,error_rate,time
0,1.195391,0.968515,0.319558,47:54
1,0.754575,0.575923,0.181163,48:51
2,0.505646,0.405346,0.133109,31:26


In [16]:
save_pickle('tta_res.pkl', tta_res)
tta_prs = first(zip(*tta_res))

In [23]:
tta_prs

(TensorBase([[1.4467e-04, 3.5593e-06, 1.5443e-05,  ..., 9.8472e-01, 9.2436e-03,
          3.6062e-05],
         [1.0460e-05, 1.0821e-05, 1.2065e-06,  ..., 1.3769e-03, 9.9784e-01,
          9.9138e-05],
         [1.6940e-03, 1.4689e-03, 4.0445e-04,  ..., 2.0533e-01, 2.6199e-02,
          2.4929e-03],
         ...,
         [3.4687e-07, 4.2655e-08, 1.6575e-07,  ..., 3.4927e-06, 9.9998e-01,
          6.3852e-06],
         [1.4066e-03, 9.7516e-01, 3.4856e-04,  ..., 1.2356e-02, 1.7219e-03,
          3.8615e-04],
         [4.5315e-11, 1.0754e-14, 1.6617e-12,  ..., 1.9259e-10, 7.4155e-11,
          9.2955e-11]]),
 TensorBase([[6.9185e-04, 3.9026e-07, 3.4659e-06,  ..., 9.9902e-01, 2.0757e-04,
          5.6475e-06],
         [8.1294e-08, 4.7761e-07, 1.9983e-08,  ..., 1.7230e-05, 9.9994e-01,
          1.8909e-06],
         [6.1377e-03, 3.6428e-03, 2.1927e-03,  ..., 9.5144e-02, 1.5961e-01,
          2.9712e-03],
         ...,
         [1.9596e-07, 4.5837e-09, 6.8226e-09,  ..., 2.9205e-05, 9.9984e

In [24]:
tta_prs += tta_prs[1:2]

In [25]:
tta_prs += tta_prs[1:2]

In [26]:
tta_prs += tta_prs[2:3]

In [27]:
t_tta = torch.stack(tta_prs)

In [29]:
avg_pr = t_tta.mean(0)
idxs = avg_pr.argmax(dim=1)
idxs.shape

torch.Size([3469])

In [36]:
dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(224))
mapping = dict(enumerate(dls.vocab))
ss = pd.read_csv('data/sample_submission.csv')
results = pd.Series(idxs.numpy(), name='idxs').map(mapping)
ss.label = results
ss.to_csv('data/submissions/subm.csv', index=False)

In [32]:
!kaggle competitions submit -c paddy-disease-classification -f data/submissions/subm.csv -m "4th Sub"

Traceback (most recent call last):
  File "/root/.local/bin/kaggle", line 8, in <module>
    sys.exit(main())
  File "/root/.local/lib/python3.7/site-packages/kaggle/cli.py", line 67, in main
    out = args.func(**command_args)
  File "/root/.local/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 562, in competition_submit_cli
    competition, quiet)
  File "/root/.local/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 513, in competition_submit
    content_length=os.path.getsize(file_name),
  File "/opt/conda/lib/python3.7/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: 'data/submissions/brismith_sub1.csv.gz'


In [37]:
!kaggle competitions submit -c paddy-disease-classification -f data/submissions/subm.csv -m "M1 chip entry"

100%|██████████████████████████████████████| 70.1k/70.1k [00:02<00:00, 32.0kB/s]
Successfully submitted to Paddy Doctor: Paddy Disease Classification