# It's Corn (PogChamps #3) Kaggle Competition - Experiment 4

Following along with fast.ai lecture [Scaling Up: Road to the Top, Part 3](https://www.kaggle.com/code/jhoward/scaling-up-road-to-the-top-part-3) with tweaks along the way.

In [1]:
# install fastkaggle if not available
try: import fastkaggle
except ModuleNotFoundError:
    !pip install -Uq fastkaggle

from fastkaggle import *

## Set up packages and processing competition data

In [2]:
comp = 'kaggle-pog-series-s01e03'
path = setup_comp(comp, install='fastai "timm>=0.6.2.dev0"')
from fastai.vision.all import *
set_seed(8)

comp_path = path/'corn'

trn_path = comp_path/'train'
# trn_path.ls()

tst_path = comp_path/'test'
# tst_path.ls()

trn_files = get_image_files(trn_path).sorted()
tst_files = get_image_files(tst_path).sorted()

## Memory and gradient accumulation

In this analysis our goal is to train an ensemble of larger models with larger inputs.  In order to bypass the potential need for larger GPU memory (especially if running inference on Kaggle's machines) we will use gradient accumulation.

First we will quickly try a few models and image sizes to find out what will run without having memory issues.  To make this quick, we will grab a small subset of the data for running short epochs -- the memory use will be the same, but it will run faster.

In [3]:
df = pd.read_csv(comp_path/'train.csv')
df.label.value_counts()
print(df)

       seed_id    view            image       label
0            0     top  train/00000.png      broken
1            1  bottom  train/00001.png        pure
2            3     top  train/00003.png      broken
3            4     top  train/00004.png        pure
4            5     top  train/00005.png  discolored
...        ...     ...              ...         ...
14317    17795     top  train/17795.png        pure
14318    17796     top  train/17796.png  discolored
14319    17797     top  train/17797.png      broken
14320    17799  bottom  train/17799.png        pure
14321    17800  bottom  train/17800.png  discolored

[14322 rows x 4 columns]


In [4]:
training_set = df.sample(200)

Now we'll set up a `train` function.  A `finetune` argument picks whether the model is trained with the `fine_tune()` or `fit_one_cycle()` method; the latter is faster since it skips the initial frozen epoch that trains only the head.  When we fine-tune, the function also returns Test Time Augmentation (TTA) predictions on the test set, since the plan is to ensemble the TTA results of several models by the end of this notebook.  No seed is hardcoded in the `ImageDataLoaders` call, so the training and validation sets differ on every run of the notebook (not ideal) but also differ for each model we train in the ensemble (ideal).  An `accum` argument is added to implement gradient accumulation.  This parameter is used in two places:

1. Divide the batch size by `accum` (controls the amount of memory needed on the GPU).
2. Add the `GradientAccumulation` callback, which here accumulates gradients over 64 items before each weight update (keeping the effective batch size the same as the original batch size).
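To see why smaller batches plus accumulation can stand in for one large batch, here is a minimal pure-Python sketch. It is not fastai's implementation: it assumes a hypothetical one-parameter model `y = w * x` with a mean-squared-error loss, and the helper names (`grad_sum`, `full_batch_grad`, `accumulated_grad`) are made up for illustration.

```python
# Toy model: prediction = w * x, loss = mean((w*x - y)^2) over the batch.
def grad_sum(w, xs, ys):
    """Sum (not mean) of the per-sample gradients d/dw (w*x - y)^2."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys))

def full_batch_grad(w, xs, ys):
    """Gradient of the mean loss computed on the whole batch at once."""
    return grad_sum(w, xs, ys) / len(xs)

def accumulated_grad(w, xs, ys, accum):
    """Split the batch into `accum` micro-batches and accumulate their gradients."""
    micro_bs = len(xs) // accum              # mirrors bs=64//accum
    total = 0.0
    for start in range(0, len(xs), micro_bs):
        stop = start + micro_bs
        total += grad_sum(w, xs[start:stop], ys[start:stop])
    return total / len(xs)                   # a single weight update would use this

xs = [float(i) for i in range(64)]           # one "original" batch of 64 items
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

g_full = full_batch_grad(w, xs, ys)
g_acc2 = accumulated_grad(w, xs, ys, accum=2)
g_acc4 = accumulated_grad(w, xs, ys, accum=4)
print(g_full, g_acc2, g_acc4)
```

In this toy case the three gradients agree. In a real network, layers whose behaviour depends on the batch itself (batch normalization, for example) can make the accumulated result differ slightly from the full-batch one.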

In [5]:
def train(arch, size, item=Resize(256, method=ResizeMethod.Pad, pad_mode=PadMode.Border), accum=1, finetune=True, epochs=12, model_filename='model'):
    dls = ImageDataLoaders.from_df(training_set, 
        path=comp_path, fn_col=2, label_col=3,
        valid_pct=0.2, item_tfms=item,
        batch_tfms=aug_transforms(size=size, min_scale=0.75),
        bs=64//accum)  # smaller per-step batch when accumulating
    # accumulate gradients over 64 items before each weight update
    cbs = GradientAccumulation(64) if accum else []
    learn = vision_learner(dls, arch, metrics=[accuracy,error_rate], cbs=cbs).to_fp16()
    if finetune:
        learn.fine_tune(epochs, 0.01)
        learn.save(model_filename)  # save before returning, otherwise the model is never saved
        return learn.tta(dl=dls.test_dl(tst_files))
    else:
        learn.unfreeze()
        learn.fit_one_cycle(epochs, 0.01)
        learn.save(model_filename)


Let's test out the impact of gradient accumulation on a small model.

In [6]:
train('convnext_small_in22k', 128, epochs=1, accum=1, finetune=False)

epoch,train_loss,valid_loss,accuracy,error_rate,time
0,2.092889,4.214939,0.25,0.75,00:02


Let's create a function to find out how much GPU memory is being used and also clear it out for the next run:

In [7]:
import gc
def report_gpu():
    print(torch.cuda.list_gpu_processes())
    gc.collect()
    torch.cuda.empty_cache()

In [8]:
report_gpu()

GPU:0
process       2875 uses     4477.000 MB GPU memory


With `accum=1` it uses < 5GB VRAM.  Let's try `accum=2`:

In [9]:
train('convnext_small_in22k', 128, epochs=1, accum=2, finetune=False)
report_gpu()

epoch,train_loss,valid_loss,accuracy,error_rate,time
0,2.726989,2.233786,0.25,0.75,00:00


GPU:0
process       2875 uses     3433.000 MB GPU memory


VRAM usage has now gone to below 3.5GB.  It's not halved as there's other overhead involved (including all the model parameters).

Let's try `accum=4`:

In [10]:
train('convnext_small_in22k', 128, epochs=1, accum=4, finetune=False)
report_gpu()

epoch,train_loss,valid_loss,accuracy,error_rate,time
0,2.261818,2.19887,0.225,0.775,00:01


GPU:0
process       2875 uses     2903.000 MB GPU memory


Now under 3GB VRAM usage!
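The three measurements above fit a simple picture: a fixed cost (model parameters and other overhead) plus a cost per item in the batch. The sketch below assumes memory grows linearly with batch size, which is only a rough approximation, and fits that line to the `accum=1` and `accum=2` readings to estimate the `accum=4` reading. The names `per_item_mb`, `fixed_mb` and `predict_mb` are illustrative.

```python
# Readings reported above: batch size (64 // accum) -> MB shown by report_gpu()
measured_mb = {64: 4477.0, 32: 3433.0, 16: 2903.0}

# Assumption: memory ~= fixed_mb + per_item_mb * batch_size (a crude linear model).
per_item_mb = (measured_mb[64] - measured_mb[32]) / (64 - 32)
fixed_mb = measured_mb[32] - per_item_mb * 32

def predict_mb(batch_size):
    """Estimate memory use for a batch size under the linear assumption."""
    return fixed_mb + per_item_mb * batch_size

predicted_16 = predict_mb(16)
print(f"per item: {per_item_mb:.1f} MB, fixed: {fixed_mb:.0f} MB")
print(f"bs=16 predicted: {predicted_16:.0f} MB, measured: {measured_mb[16]:.0f} MB")
```

The estimate for a batch size of 16 lands within a few MB of the measured value, which supports the explanation that the fixed overhead is why halving the batch does not halve the memory. The reported figure may also include memory the process holds but is not actively using, so treat the fit as indicative only.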

## Checking memory use

Let's check the memory use for the various architectures and sizes we might want to train later and ensure they will fit in our VRAM.  For each of these, we will start with `accum=1` and then double it until the model fits in the VRAM available.
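The "start at 1 and keep doubling" search can be sketched as a small helper. This is an illustration only: `find_min_accum` and `fake_fits` are hypothetical names, and `fake_fits` stands in for a real trial run (in the notebook it would wrap a short `train(...)` call and report whether it ran out of memory). The memory numbers inside `fake_fits` are invented.

```python
def find_min_accum(fits_in_memory, max_accum=64):
    """Return the smallest power-of-two accum for which fits_in_memory(accum) is True.

    fits_in_memory is a hypothetical callable that tries one configuration
    and returns False if it does not fit in the available VRAM.
    """
    accum = 1
    while accum <= max_accum:
        if fits_in_memory(accum):
            return accum
        accum *= 2          # halve the per-step batch size and try again
    return None             # nothing fit, even at the smallest batch size tried

# Stand-in for a real run: pretend memory = 2389 MB fixed + 33 MB per item.
def fake_fits(accum, budget_mb=3000):
    return 2389 + 33 * (64 // accum) <= budget_mb

print(find_min_accum(fake_fits))
```

With the invented 3000 MB budget, the helper rejects `accum=1` and `accum=2` and settles on `accum=4`.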

First, `convnext_large`:

In [11]:
train('convnext_large_in22k', 128, epochs=1, accum=1, finetune=False)
report_gpu()

epoch,train_loss,valid_loss,accuracy,error_rate,time
0,2.525592,3.637675,0.125,0.875,00:01


GPU:0
process       2875 uses     8091.000 MB GPU memory


In [12]:
train('convnext_large_in22k', 256, epochs=1, accum=1, finetune=False)
report_gpu()

epoch,train_loss,valid_loss,accuracy,error_rate,time
0,2.301363,2.486521,0.4,0.6,00:01


GPU:0
process       2875 uses    19487.000 MB GPU memory


Next we will try `vit_large`:

In [13]:
report_gpu()

GPU:0
process       2875 uses     1869.000 MB GPU memory


In [14]:
train('vit_large_patch16_224', 224, epochs=1, accum=1, finetune=False)
report_gpu()

epoch,train_loss,valid_loss,accuracy,error_rate,time
0,2.511879,2.476487,0.15,0.85,00:02


GPU:0
process       2875 uses    22001.000 MB GPU memory


Lastly, let's try `swinv2` and `swin` models:

In [15]:
train('swinv2_large_window12_192_22k', 192, epochs=1, accum=1, finetune=False)
report_gpu()



epoch,train_loss,valid_loss,accuracy,error_rate,time
0,2.464439,1.904587,0.175,0.825,00:01


GPU:0
process       2875 uses    22231.000 MB GPU memory


In [16]:
train('swin_large_patch4_window7_224', 224, epochs=1, accum=1, finetune=False)
report_gpu()

epoch,train_loss,valid_loss,accuracy,error_rate,time
0,2.363899,1.329497,0.275,0.725,00:01


GPU:0
process       2875 uses    17819.000 MB GPU memory


## Running the models

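We will use a `dict` that maps each architecture of interest to the preprocessing variants we'll train.  Each variant is a tuple of item transform, final image size, and filename for the saved model.  The variants are held in a `set`, so the order in which they run within an architecture is not guaranteed: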

In [17]:
models = {
    'convnext_large_in22k': {
        (Resize(256), 128, 'convnext_large_in22k_item256_size128'),
        (Resize(256), 256, 'convnext_large_in22k_item256_size256'),
        (Resize(256, method=ResizeMethod.Pad, pad_mode=PadMode.Border), 128, 'convnext_large_in22k_item256pad_size128'),
        (Resize(256, method=ResizeMethod.Pad, pad_mode=PadMode.Border), 256, 'convnext_large_in22k_item256pad_size256'),
    }, 'vit_large_patch16_224': {
        (Resize(256), 224, 'vit_large_patch16_224_item256'),
        (Resize(256, method=ResizeMethod.Pad, pad_mode=PadMode.Border), 224, 'vit_large_patch16_224_item256pad'),
    }, 'swinv2_large_window12_192_22k': {
        (Resize(256), 192, 'swinv2_large_window12_192_22k_item256'),
        (Resize(256, method=ResizeMethod.Pad, pad_mode=PadMode.Border), 192, 'swinv2_large_window12_192_22k_item256pad'),
    }, 'swin_large_patch4_window7_224': {
        (Resize(256), 224, 'swin_large_patch4_window7_224_item256'),
        (Resize(256, method=ResizeMethod.Pad, pad_mode=PadMode.Border), 224, 'swin_large_patch4_window7_224_item256pad'),
    }
}

Before training, let's switch to the full training set.

In [18]:
training_set = pd.read_csv(comp_path/'train.csv')
# training_set = df.sample(100) # test architecture dict with this

Now we can train all of the models.  Remember each will use a different training and validation set so the results won't be directly comparable.

We'll append each set of TTA predictions on the test set into a list called `tta_res`.

In [19]:
tta_res = []

for arch,details in models.items():
    for item,size,filename in details:
        print('---',arch)
        print(size)
        print(item.name)
        tta_res.append(train(arch, size, item=item, accum=2, model_filename=filename)) #, epochs=1)) ## train on this
#         tta_res.append(train(arch, size, item=item, accum=1, finetune=False, epochs=1, model_filename=filename)) ## test architecture definitions with this
        gc.collect()
        torch.cuda.empty_cache()

--- convnext_large_in22k
128
Resize -- {'size': (256, 256), 'method': 'pad', 'pad_mode': 'border', 'resamples': (2, 0), 'p': 1.0}


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.990871,0.790018,0.705656,0.294344,00:25


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.664697,0.602662,0.759427,0.240573,00:39
1,0.651417,0.561478,0.774092,0.225908,00:39
2,0.53511,0.567558,0.786662,0.213338,00:39
3,0.509198,0.566533,0.775838,0.224162,00:39
4,0.427079,0.561606,0.791899,0.208101,00:39
5,0.314936,0.639763,0.784916,0.215084,00:39
6,0.221185,0.746963,0.779679,0.220321,00:39
7,0.128905,0.904216,0.77933,0.22067,00:39
8,0.077669,0.971725,0.791899,0.208101,00:39
9,0.059545,0.9762,0.796439,0.203561,00:39


--- convnext_large_in22k
128
Resize -- {'size': (256, 256), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0}


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.97914,0.783063,0.710894,0.289106,00:25


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.685512,0.604097,0.747905,0.252095,00:39
1,0.618682,0.587314,0.770251,0.229749,00:40
2,0.567375,0.607418,0.771648,0.228352,00:39
3,0.503421,0.606839,0.751397,0.248603,00:39


IOPub message rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_msg_rate_limit`.

Current values:
NotebookApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
NotebookApp.rate_limit_window=3.0 (secs)



--- convnext_large_in22k
256
Resize -- {'size': (256, 256), 'method': 'pad', 'pad_mode': 'border', 'resamples': (2, 0), 'p': 1.0}


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.94745,0.83971,0.685754,0.314246,01:17


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.635195,0.606222,0.744763,0.255237,01:46
1,0.616333,0.566891,0.773045,0.226955,01:46
2,0.580128,0.645295,0.750698,0.249302,01:46
3,0.507595,0.552698,0.776187,0.223813,01:46
4,0.391674,0.681959,0.767109,0.232891,01:46
5,0.285752,0.709987,0.776885,0.223115,01:46
6,0.187731,0.888752,0.766061,0.233939,01:46
7,0.123601,0.853221,0.792249,0.207751,01:46
8,0.073518,1.003833,0.791899,0.208101,01:46
9,0.042788,1.013038,0.795042,0.204958,01:46


--- vit_large_patch16_224
224
Resize -- {'size': (256, 256), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0}


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.96835,0.805483,0.671439,0.328561,01:33


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.610052,0.540867,0.787011,0.212989,02:07
1,0.579119,0.703638,0.729399,0.270601,02:07
2,0.596386,0.665314,0.736033,0.263966,02:07
3,0.559221,0.591587,0.767109,0.232891,02:07
4,0.477273,0.651102,0.770601,0.229399,02:07
5,0.349836,0.737274,0.771299,0.228701,02:07
6,0.275343,0.690971,0.770251,0.229749,02:07
7,0.164554,0.834886,0.788059,0.211941,02:07
8,0.061171,1.07014,0.777933,0.222067,02:07
9,0.04357,1.062354,0.794693,0.205307,02:07


--- vit_large_patch16_224
224
Resize -- {'size': (256, 256), 'method': 'pad', 'pad_mode': 'border', 'resamples': (2, 0), 'p': 1.0}


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,1.03847,0.875081,0.661662,0.338338,01:33


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.602313,0.535195,0.784916,0.215084,02:07
1,0.577039,0.59629,0.769553,0.230447,02:07
2,0.578772,0.600304,0.770601,0.229399,02:07
3,0.557834,0.612983,0.748603,0.251397,02:07
4,0.466114,0.623054,0.766061,0.233939,02:07
5,0.367942,0.692493,0.760126,0.239874,02:07
6,0.252245,0.797961,0.770601,0.229399,02:07
7,0.170118,0.767248,0.791201,0.208799,02:07
8,0.080795,0.990987,0.77933,0.22067,02:07
9,0.037055,1.108594,0.785964,0.214036,02:07


IOPub message rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_msg_rate_limit`.

Current values:
NotebookApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
NotebookApp.rate_limit_window=3.0 (secs)



epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.655635,0.615824,0.759777,0.240223,01:37
1,0.597687,0.577416,0.771648,0.228352,01:37
2,0.613894,0.581358,0.767458,0.232542,01:37
3,0.563934,0.633852,0.757332,0.242668,01:37
4,0.504411,0.613383,0.76257,0.23743,01:37
5,0.442633,0.538284,0.791899,0.208101,01:37
6,0.363675,0.611616,0.78317,0.21683,01:37
7,0.275925,0.65814,0.784218,0.215782,01:37
8,0.23829,0.700585,0.783869,0.216131,01:37
9,0.169662,0.759009,0.78352,0.21648,01:37


--- swin_large_patch4_window7_224
224
Resize -- {'size': (256, 256), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0}


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,1.095951,0.844235,0.67493,0.32507,01:16


epoch,train_loss,valid_loss,accuracy,error_rate,time
0,0.710242,0.585344,0.762221,0.237779,01:37
1,0.625428,0.566005,0.780028,0.219972,01:37
2,0.614395,0.56061,0.782123,0.217877,01:38
3,0.541873,0.531768,0.795042,0.204958,01:37
4,0.514698,0.544063,0.795391,0.204609,01:37
5,0.459638,0.547013,0.788059,0.211941,01:38
6,0.39098,0.555384,0.805517,0.194483,01:37
7,0.319841,0.598471,0.797486,0.202514,01:37
8,0.241135,0.637378,0.800279,0.199721,01:37
9,0.200656,0.67609,0.803771,0.196229,01:37



## Ensembling

Since this has taken quite a while to run, let's save the results, just in case something goes wrong!

In [20]:
save_pickle('tta_res.pkl', tta_res)

`Learner.tta` returns predictions and targets for each row.  We just want the predictions:

In [21]:
tta_prs = first(zip(*tta_res))

Optionally, we could provide weightings for the various models; here they are all weighted equally.

An ensemble simply refers to a model which is itself the result of combining a number of other models.  The simplest way to do ensembling is to take the average of the predictions of each model:

In [22]:
avg_pr = torch.stack(tta_prs).mean(0)
avg_pr.shape

torch.Size([3479, 4])
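As a plain-Python illustration of the averaging step, and of what optional per-model weights would look like, here is a small sketch. The function name `ensemble_average` and the probabilities are made up; the notebook itself uses `torch.stack(...).mean(0)` with equal weights.

```python
def ensemble_average(model_probs, weights=None):
    """Average per-class probabilities across models.

    model_probs: one entry per model, each a list of rows of class probabilities.
    weights: optional per-model weights; equal weighting when omitted.
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_rows, n_classes = len(model_probs[0]), len(model_probs[0][0])
    return [
        [sum(w * probs[r][c] for w, probs in zip(weights, model_probs)) / total
         for c in range(n_classes)]
        for r in range(n_rows)
    ]

# Two made-up models, two test images, four classes.
model_a = [[0.70, 0.10, 0.10, 0.10], [0.20, 0.20, 0.50, 0.10]]
model_b = [[0.50, 0.30, 0.10, 0.10], [0.10, 0.60, 0.20, 0.10]]

equal = ensemble_average([model_a, model_b])
weighted = ensemble_average([model_a, model_b], weights=[3, 1])
print(equal)
print(weighted)
```

For the second image, the equal-weight average favours class 1, while weighting `model_a` three times as heavily favours class 2, so the choice of weights can change the final prediction.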

## Submission

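Let's rebuild the dataloaders used in the `train` function to get our vocab (`item` and `size` still hold the values from the last iteration of the training loop, which is fine since only the label vocab is needed here):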

In [23]:
dls = ImageDataLoaders.from_df(training_set, 
        path=comp_path, fn_col=2, label_col=3,
        valid_pct=0.2, item_tfms=item,
        batch_tfms=aug_transforms(size=size, min_scale=0.75))

In [25]:
idxs = avg_pr.argmax(dim=1)
idxs

TensorBase([0, 2, 2,  ..., 2, 2, 2])

In [26]:
vocab = np.array(dls.vocab)
vocab

array(['broken', 'discolored', 'pure', 'silkcut'], dtype='<U10')

In [30]:
results = pd.Series(vocab[idxs], name="idxs")
results

0           broken
1             pure
2             pure
3           broken
4           broken
           ...    
3474    discolored
3475        broken
3476          pure
3477          pure
3478          pure
Name: idxs, Length: 3479, dtype: object
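The index-to-label step above amounts to taking the position of the largest probability in each row and looking it up in the vocab. A minimal sketch without NumPy, where `to_labels` is an illustrative name and the probabilities are made up:

```python
vocab = ['broken', 'discolored', 'pure', 'silkcut']   # class order shown above

def to_labels(prob_rows, vocab):
    """Pick the highest-probability class in each row and return its name."""
    return [vocab[max(range(len(row)), key=row.__getitem__)] for row in prob_rows]

probs = [[0.6, 0.2, 0.1, 0.1], [0.15, 0.4, 0.35, 0.1]]  # made-up probabilities
labels = to_labels(probs, vocab)
print(labels)
```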

In [29]:
test_csv_fname = comp_path/'test.csv'
test = pd.read_csv(test_csv_fname)
test.head()

Unnamed: 0,seed_id,view,image
0,2,top,test/00002.png
1,11,bottom,test/00011.png
2,13,top,test/00013.png
3,19,bottom,test/00019.png
4,27,bottom,test/00027.png


In [31]:
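# sort_values returns a new frame, so assign it; reset the index so labels line up by position
test = test.sort_values(by=['seed_id']).reset_index(drop=True)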
test = test.drop(columns=['view', 'image'])
test['label'] = results
test

Unnamed: 0,seed_id,label
0,2,broken
1,11,pure
2,13,pure
3,19,broken
4,27,broken
...,...,...
3474,17775,discolored
3475,17781,broken
3476,17790,pure
3477,17794,pure
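The cell above attaches labels by position, which works because the sorted test files and `test.csv` are in the same order. A more defensive alternative is to join on `seed_id` explicitly. The sketch below assumes the file stem is the zero-padded `seed_id` (as the rows shown above suggest); `seed_id_from_path` and `build_submission` are hypothetical helpers and the labels are made up.

```python
from pathlib import Path

def seed_id_from_path(path):
    """'test/00011.png' -> 11 (assumes the file stem is the zero-padded seed_id)."""
    return int(Path(path).stem)

def build_submission(test_rows, files, labels):
    """Attach labels to test rows by seed_id rather than by position.

    test_rows: list of dicts with a 'seed_id' key (as in test.csv).
    files, labels: prediction inputs and outputs, in the same order as each other.
    """
    by_id = {seed_id_from_path(f): lab for f, lab in zip(files, labels)}
    return [{'seed_id': r['seed_id'], 'label': by_id[r['seed_id']]} for r in test_rows]

# Made-up labels; the file order deliberately differs from the CSV order.
test_rows = [{'seed_id': 2}, {'seed_id': 11}, {'seed_id': 13}]
files = ['test/00013.png', 'test/00002.png', 'test/00011.png']
labels = ['pure', 'broken', 'discolored']

submission = build_submission(test_rows, files, labels)
print(submission)
```

Keying on `seed_id` means a change in file ordering raises a `KeyError` or still produces correct rows, rather than silently mislabelling them.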


In [32]:
test.to_csv('submission_3.csv', index=False)
!head submission_3.csv

seed_id,label
2,broken
11,pure
13,pure
19,broken
27,broken
30,pure
32,pure
41,pure
42,broken


Even though I suspect this won't work that well (validation loss diverges from training loss too quickly for my liking on all these models), let's see what a submission looks like:

In [33]:
if not iskaggle:
    from kaggle import api
    api.competition_submit_cli('submission_3.csv', 'ensemble potential overfitting', comp)

100%|██████████████████████████████████████| 41.6k/41.6k [00:00<00:00, 43.9kB/s]
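The overfitting concern can be checked against the logged numbers. The sketch below uses the unfrozen epochs of the first `convnext_large_in22k` run printed earlier and finds where validation loss and accuracy were each at their best; the variable names are illustrative.

```python
# (epoch, train_loss, valid_loss, accuracy) from the first convnext_large_in22k run above
history = [
    (0, 0.664697, 0.602662, 0.759427),
    (1, 0.651417, 0.561478, 0.774092),
    (2, 0.535110, 0.567558, 0.786662),
    (3, 0.509198, 0.566533, 0.775838),
    (4, 0.427079, 0.561606, 0.791899),
    (5, 0.314936, 0.639763, 0.784916),
    (6, 0.221185, 0.746963, 0.779679),
    (7, 0.128905, 0.904216, 0.779330),
    (8, 0.077669, 0.971725, 0.791899),
    (9, 0.059545, 0.976200, 0.796439),
]

best_loss_epoch = min(history, key=lambda row: row[2])[0]   # lowest validation loss
best_acc_epoch = max(history, key=lambda row: row[3])[0]    # highest accuracy
final_gap = history[-1][2] - history[-1][1]                 # valid_loss - train_loss at the end

print(f"lowest valid_loss at epoch {best_loss_epoch}")
print(f"highest accuracy at epoch {best_acc_epoch}")
print(f"final valid/train loss gap: {final_gap:.3f}")
```

For this run, validation loss bottoms out early while accuracy keeps creeping up to the last epoch, and the gap between validation and training loss ends up large. One option to explore, not used in this notebook, is fastai's `SaveModelCallback` or `EarlyStoppingCallback` to keep the checkpoint from the best epoch.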


## Submission \#3

Wow ... so this gave an accuracy of 0.81321 and a rank of 13th on the leaderboard (currently 69 participants).  Top accuracy on the public leaderboard is currently 0.83189.  Oddly enough, 4 people are currently tied at this accuracy (very weird).