At the end of this notebook, we will have submitted to the ["Paddy Doctor: Paddy Disease Classification" competition on Kaggle](https://www.kaggle.com/competitions/paddy-disease-classification/overview)!

Let us begin by downloading the data.

In [14]:
%%bash

exec bash
rm -rf data
mkdir data
apt install unzip


Reading package lists...
Building dependency tree...
Reading state information...
Suggested packages:
  zip
The following NEW packages will be installed:
  unzip
0 upgraded, 1 newly installed, 0 to remove and 43 not upgraded.
Need to get 168 kB of archives.
After this operation, 567 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 unzip amd64 6.0-21ubuntu1.1 [168 kB]
Fetched 168 kB in 0s (3155 kB/s)
Selecting previously unselected package unzip.
(Reading database ... 44825 files and directories currently installed.)
Preparing to unpack .../unzip_6.0-21ubuntu1.1_amd64.deb ...
Unpacking unzip (6.0-21ubuntu1.1) ...
Setting up unzip (6.0-21ubuntu1.1) ...
Processing triggers for mime-support (3.60ubuntu1) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...




debconf: delaying package configuration, since apt-utils is not installed


In [15]:
pip install -U timm==0.6.2dev

Note: you may need to restart the kernel to use updated packages.


In [16]:
!cd data && kaggle competitions download -c paddy-disease-classification && unzip -q paddy-disease-classification.zip

Downloading paddy-disease-classification.zip to /notebooks/git/paddy_doctor/data
100%|█████████████████████████████████████▉| 1.02G/1.02G [00:46<00:00, 23.5MB/s]
100%|██████████████████████████████████████| 1.02G/1.02G [00:47<00:00, 23.2MB/s]


In [1]:
import timm

We have now downloaded and extracted the data to the `data` directory.

In [2]:
ls data

paddy-disease-classification.zip  test_images/  train_images/
sample_submission.csv             train.csv


In [3]:
ls data/train_images

[0m[01;34mbacterial_leaf_blight[0m/     [01;34mbrown_spot[0m/                    [01;34mhispa[0m/
[01;34mbacterial_leaf_streak[0m/     convnext_large_in22k_Convnext  [01;34mnormal[0m/
[01;34mbacterial_panicle_blight[0m/  [01;34mdead_heart[0m/                    [01;34mtungro[0m/
[01;34mblast[0m/                     [01;34mdowny_mildew[0m/


In [4]:
ls data/test_images | head

200001.jpg
200002.jpg
200003.jpg
200004.jpg
200005.jpg
200006.jpg
200007.jpg
200008.jpg
200009.jpg
200010.jpg
ls: write error: Broken pipe


It seems that the training data is organized into directories, with the name of each directory being the label.

The test images all live directly in `data/test_images`.
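Since each class has its own directory, a quick way to check the layout is to count the images per directory. The following is a minimal sketch using only the standard library; the helper name `count_images_per_class` and the list of image suffixes are assumptions made for illustration, not part of the competition kit.

```python
from pathlib import Path

# Assumed set of image extensions; adjust if the dataset uses others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def count_images_per_class(train_dir):
    """Return {label: image count}, treating each subdirectory name as a label."""
    counts = {}
    for class_dir in sorted(Path(train_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files, e.g. an exported model
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts

# Only runs when the data has actually been downloaded.
if Path("data/train_images").exists():
    for label, n in count_images_per_class("data/train_images").items():
        print(f"{label}: {n}")
```

A strongly imbalanced count would be a hint to look at per-class error rates rather than the overall error rate alone.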

Let us look at the format of the sample submission file to get the full picture.

In [5]:
import pandas as pd

sample_sub = pd.read_csv('data/sample_submission.csv')
sample_sub.head()

Unnamed: 0,image_id,label
0,200001.jpg,
1,200002.jpg,
2,200003.jpg,
3,200004.jpg,
4,200005.jpg,


Hmm. The label column is empty, so presumably the labels expected in the submission are the names of the class directories.

Ok, let's start training!

In [6]:
from fastai.vision.all import *
from fastcore.parallel import *

In [7]:
path = Path('data')
trn_path = path/'train_images'
tst_files = get_image_files(path/'test_images').sorted()

In [8]:
trn_path

Path('data/train_images')

In [9]:
tta_res = []

In [10]:
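def train(desc, arch, item, batch, accum=False):
    # With accum=True, halve the batch size (fastai's default is 64) to save GPU memory
    kwargs = {'bs': 32} if accum else {}
    dls = ImageDataLoaders.from_folder(trn_path, seed=42, valid_pct=0.2, item_tfms=item, batch_tfms=batch, **kwargs)
    # GradientAccumulation counts samples, not batches: step once 64 items have been seen
    cbs = GradientAccumulation(64) if accum else []
    learn = vision_learner(dls, arch, metrics=error_rate, cbs=cbs).to_fp16()
    learn.fine_tune(13, 0.01)
    # Collect test-time-augmentation predictions on the test set for the ensemble
    tta_res.append(learn.tta(dl=dls.test_dl(tst_files)))
    learn.export(f'{arch}_{desc}')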
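The `accum` flag in `train` pairs a smaller batch size with gradient accumulation, which is meant to give the same weight update as a larger batch while using less GPU memory. As a rough illustration of why this works, the sketch below uses NumPy and a toy linear model (both assumptions for the example, unrelated to the actual vision models) to show that averaging the gradients of two micro-batches of 32 matches the gradient of one batch of 64.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))   # toy inputs: 64 samples, 5 features
y = rng.normal(size=64)        # toy targets
w = np.zeros(5)                # current weights

# Gradient computed on the full batch of 64 in one go.
full_grad = mse_grad(w, X, y)

# The same batch processed as two micro-batches of 32, accumulating gradients.
micro_batches = [(X[:32], y[:32]), (X[32:], y[32:])]
accumulated = np.zeros_like(w)
for Xb, yb in micro_batches:
    accumulated += mse_grad(w, Xb, yb) / len(micro_batches)

print(np.allclose(full_grad, accumulated))
```

The equivalence is exact here because the loss is a mean over samples and the micro-batches are equally sized; layers such as batch normalization, which depend on batch statistics, can make the two differ slightly in real models.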
    

In [27]:
timm.list_models("convnext_large*")

['convnext_large',
 'convnext_large_384_in22ft1k',
 'convnext_large_in22ft1k',
 'convnext_large_in22k']

In [28]:
arch = 'convnext_large_in22k'

In [29]:
train('Convnext', arch, item=Resize(480, method='squish'), batch=aug_transforms(size=224, min_scale=0.75), accum=False)

Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_large_22k_224.pth" to /root/.cache/torch/hub/checkpoints/convnext_large_22k_224.pth


epoch,train_loss,valid_loss,error_rate,time
0,1.054951,0.649235,0.206151,04:18


epoch,train_loss,valid_loss,error_rate,time
0,0.455112,0.255148,0.083614,05:40
1,0.365565,0.224651,0.075444,05:37
2,0.322103,0.224895,0.075444,05:39
3,0.267427,0.174977,0.049015,05:38
4,0.189853,0.196187,0.052379,05:39
5,0.176,0.166631,0.050937,05:38
6,0.129584,0.138106,0.037001,05:39
7,0.091225,0.129585,0.034118,05:38
8,0.0733,0.114819,0.028352,05:38
9,0.049931,0.084069,0.021144,05:38


In [32]:
save_pickle('tta_res.pkl', tta_res)

In [37]:
tta_res

[(TensorBase([[1.1382e-07, 4.4558e-10, 1.7501e-08,  ..., 1.0000e+00, 2.7908e-07,
           4.7529e-09],
          [1.2972e-06, 2.5286e-07, 5.0332e-08,  ..., 2.3146e-07, 1.0000e+00,
           4.5895e-08],
          [1.9073e-06, 1.4750e-08, 5.6766e-06,  ..., 3.2842e-05, 3.9817e-04,
           2.0156e-07],
          ...,
          [2.4073e-08, 4.0051e-08, 1.6932e-06,  ..., 1.8608e-07, 9.9999e-01,
           2.1930e-08],
          [1.1173e-07, 9.9720e-01, 1.7964e-07,  ..., 2.5631e-03, 5.7499e-06,
           6.2488e-06],
          [1.0303e-14, 3.5787e-16, 3.2132e-11,  ..., 4.2362e-12, 1.4127e-12,
           4.1721e-13]]),
  None)]

In [11]:
tta_res = load_pickle('tta_res.pkl')

In [91]:
timm.list_models('vit*')

['vit_base_patch8_224',
 'vit_base_patch8_224_dino',
 'vit_base_patch8_224_in21k',
 'vit_base_patch16_18x2_224',
 'vit_base_patch16_224',
 'vit_base_patch16_224_dino',
 'vit_base_patch16_224_in21k',
 'vit_base_patch16_224_miil',
 'vit_base_patch16_224_miil_in21k',
 'vit_base_patch16_224_sam',
 'vit_base_patch16_384',
 'vit_base_patch16_plus_240',
 'vit_base_patch16_rpn_224',
 'vit_base_patch32_224',
 'vit_base_patch32_224_in21k',
 'vit_base_patch32_224_sam',
 'vit_base_patch32_384',
 'vit_base_patch32_plus_256',
 'vit_base_r26_s32_224',
 'vit_base_r50_s16_224',
 'vit_base_r50_s16_224_in21k',
 'vit_base_r50_s16_384',
 'vit_base_resnet26d_224',
 'vit_base_resnet50_224_in21k',
 'vit_base_resnet50_384',
 'vit_base_resnet50d_224',
 'vit_giant_patch14_224',
 'vit_gigantic_patch14_224',
 'vit_huge_patch14_224',
 'vit_huge_patch14_224_in21k',
 'vit_large_patch14_224',
 'vit_large_patch16_224',
 'vit_large_patch16_224_in21k',
 'vit_large_patch16_384',
 'vit_large_patch32_224',
 'vit_large_patch

In [12]:
arch = 'vit_large_patch16_224'

In [13]:
train('squishvitlg', arch, item=Resize(480, method='squish'), batch=aug_transforms(size=224, min_scale=0.75), accum=False)

Could not do one pass in your dataloader, there is something wrong in it. Please see the stack trace below:


RuntimeError: cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling `cusolverDnSgetrf( handle, m, n, dA, ldda, static_cast<float*>(dataPtr.get()), ipiv, info)`

In [None]:
save_pickle('tta_res_post_vit.pkl', tta_res)

In [None]:
tta_res = load_pickle('tta_res_post_vit.pkl')

In [None]:
arch = 'swinv2_large_window12_192_22k'

In [None]:
train('squish32', arch, item=Resize(480, method='squish'), batch=aug_transforms(size=192, min_scale=0.75), accum=False)

In [None]:
save_pickle('tta_res.pkl', tta_res)
tta_prs = first(zip(*tta_res))

In [23]:
tta_prs

(TensorBase([[1.4467e-04, 3.5593e-06, 1.5443e-05,  ..., 9.8472e-01, 9.2436e-03,
          3.6062e-05],
         [1.0460e-05, 1.0821e-05, 1.2065e-06,  ..., 1.3769e-03, 9.9784e-01,
          9.9138e-05],
         [1.6940e-03, 1.4689e-03, 4.0445e-04,  ..., 2.0533e-01, 2.6199e-02,
          2.4929e-03],
         ...,
         [3.4687e-07, 4.2655e-08, 1.6575e-07,  ..., 3.4927e-06, 9.9998e-01,
          6.3852e-06],
         [1.4066e-03, 9.7516e-01, 3.4856e-04,  ..., 1.2356e-02, 1.7219e-03,
          3.8615e-04],
         [4.5315e-11, 1.0754e-14, 1.6617e-12,  ..., 1.9259e-10, 7.4155e-11,
          9.2955e-11]]),
 TensorBase([[6.9185e-04, 3.9026e-07, 3.4659e-06,  ..., 9.9902e-01, 2.0757e-04,
          5.6475e-06],
         [8.1294e-08, 4.7761e-07, 1.9983e-08,  ..., 1.7230e-05, 9.9994e-01,
          1.8909e-06],
         [6.1377e-03, 3.6428e-03, 2.1927e-03,  ..., 9.5144e-02, 1.5961e-01,
          2.9712e-03],
         ...,
         [1.9596e-07, 4.5837e-09, 6.8226e-09,  ..., 2.9205e-05, 9.9984e

In [24]:
tta_prs += tta_prs[1:2]

In [25]:
tta_prs += tta_prs[1:2]

In [26]:
tta_prs += tta_prs[2:3]

In [27]:
t_tta = torch.stack(tta_prs)

In [29]:
avg_pr = t_tta.mean(0)
idxs = avg_pr.argmax(dim=1)
idxs.shape

torch.Size([3469])
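Appending slices of `tta_prs` to itself before stacking and averaging is a way of weighting some models more heavily in the ensemble. The sketch below, in NumPy with made-up probabilities, shows that this duplication is the same as a weighted average. It assumes exactly three models, in which case the three `+=` cells above correspond to weights of 1, 3 and 2; the real number of models in `tta_prs` may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up predictions from three hypothetical models: 4 test images, 10 classes.
preds = [rng.dirichlet(np.ones(10), size=4) for _ in range(3)]

# Duplication, mirroring the cells above: model 1 is added twice, model 2 once.
duplicated = preds + preds[1:2] + preds[1:2] + preds[2:3]
avg_dup = np.stack(duplicated).mean(axis=0)

# The same ensemble expressed as explicit per-model weights.
weights = np.array([1, 3, 2])
avg_weighted = np.average(np.stack(preds), axis=0, weights=weights)

print(np.allclose(avg_dup, avg_weighted))

# Predicted class index per image, as with avg_pr.argmax(dim=1).
idxs = avg_weighted.argmax(axis=1)
print(idxs.shape)
```

Explicit weights make the intent easier to read and to tune than repeated `+=` cells, which also change the result if a cell is accidentally run twice.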

In [36]:
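# Rebuild the DataLoaders only to recover the class vocabulary (index -> label name)
dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(224))
mapping = dict(enumerate(dls.vocab))
ss = pd.read_csv('data/sample_submission.csv')
results = pd.Series(idxs.numpy(), name='idxs').map(mapping)
ss.label = results
# Make sure the output directory exists before writing the submission
Path('data/submissions').mkdir(parents=True, exist_ok=True)
ss.to_csv('data/submissions/subm.csv', index=False)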
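The cell above turns predicted class indices back into label names through `dls.vocab`. Here is a minimal sketch of the same idea without fastai, under the assumption that the vocabulary is the alphabetically sorted list of class directory names. The class names are taken from the `ls data/train_images` listing earlier; the predicted indices are hypothetical.

```python
import csv
import io

# Class directory names from the training set, sorted alphabetically.
class_names = sorted([
    "bacterial_leaf_blight", "bacterial_leaf_streak", "bacterial_panicle_blight",
    "blast", "brown_spot", "dead_heart", "downy_mildew", "hispa", "normal", "tungro",
])
mapping = dict(enumerate(class_names))  # index -> label name

image_ids = ["200001.jpg", "200002.jpg", "200003.jpg"]
idxs = [7, 8, 3]  # hypothetical predicted class indices, one per image

rows = [{"image_id": i, "label": mapping[k]} for i, k in zip(image_ids, idxs)]

# Write the rows in the two-column format of sample_submission.csv.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["image_id", "label"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Whatever the source of the mapping, the order of the predictions has to match the order of `image_id` in the sample submission, which is why the test files were sorted when `tst_files` was created.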

In [None]:
ss

In [32]:
!kaggle competitions submit -c paddy-disease-classification -f data/submissions/subm.csv -m "5th Sub 321"

Traceback (most recent call last):
  File "/root/.local/bin/kaggle", line 8, in <module>
    sys.exit(main())
  File "/root/.local/lib/python3.7/site-packages/kaggle/cli.py", line 67, in main
    out = args.func(**command_args)
  File "/root/.local/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 562, in competition_submit_cli
    competition, quiet)
  File "/root/.local/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 513, in competition_submit
    content_length=os.path.getsize(file_name),
  File "/opt/conda/lib/python3.7/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: 'data/submissions/brismith_sub1.csv.gz'


In [37]:
!kaggle competitions submit -c paddy-disease-classification -f data/submissions/subm.csv -m "M1 chip entry"

100%|██████████████████████████████████████| 70.1k/70.1k [00:02<00:00, 32.0kB/s]
Successfully submitted to Paddy Doctor: Paddy Disease Classification