# Comparison

Over the course of the project I have tried my hand at four models (ConvNeXt, ViT, Swin, BEiT(?)). They have all been competitive in terms of performance after fine-tuning, but they all share the same trait: they are big.

Now, "size matters" cuts both ways: larger models tend to perform better, but they are generally slower and require more memory, both CPU RAM and accelerator (GPU/TPU) memory. If you have practically unlimited compute (e.g., 100+ NVIDIA 80GB H100s lying around), no problem. But if you are deploying your model to more modest hardware, or if latency costs you a lot of money, you are in for a big problem. You will want to reduce the size of the model while retaining as much performance as possible.
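To make the memory side of this concrete, here is a rough back-of-the-envelope sketch. It only counts the bytes needed to hold the weights; activations, optimizer state, and framework overhead come on top. The helper name `weights_memory_mib` and the parameter counts are made up for illustration and do not describe any of the models above.

```python
def weights_memory_mib(num_params: int, bytes_per_param: int = 4) -> float:
    """Memory needed just to store the weights, in MiB."""
    return num_params * bytes_per_param / (1024 ** 2)


# Hypothetical parameter counts, chosen only for illustration.
for name, num_params in [("small", 10_000_000), ("large", 200_000_000)]:
    fp32 = weights_memory_mib(num_params)     # 4 bytes per float32 weight
    fp16 = weights_memory_mib(num_params, 2)  # 2 bytes per float16 weight
    print(f"{name}: {fp32:.0f} MiB in float32, {fp16:.0f} MiB in float16")
```

The estimate scales linearly with the parameter count, which is why halving the precision or picking a smaller architecture are the two obvious levers.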

My first deployment on Hugging Face Spaces was a Swin-Large model. It fits just fine on the Space, but each prediction takes ~5.4s. I want to explore alternatives that may perform worse but offer better latency.
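For reference, a per-prediction latency like this can be measured with a small timing helper. The sketch below is not the code I used on the Space; `measure_latency` and `fake_predict` are hypothetical names, and `fake_predict` just stands in for a real model call.

```python
import statistics
import time
from typing import Callable, List


def measure_latency(predict: Callable[[], object], warmup: int = 3, runs: int = 10) -> List[float]:
    """Return wall-clock timings (in seconds) for `runs` calls to `predict`."""
    # Warm-up calls are discarded so one-off setup costs do not skew the result.
    for _ in range(warmup):
        predict()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        timings.append(time.perf_counter() - start)
    return timings


def fake_predict() -> int:
    # Placeholder workload standing in for a model's forward pass.
    return sum(i * i for i in range(10_000))


timings = measure_latency(fake_predict)
print(f"median latency: {statistics.median(timings) * 1000:.2f} ms")
```

When timing a model on a GPU, CUDA kernels are launched asynchronously, so `torch.cuda.synchronize()` should be called before reading the clock; otherwise the numbers come out too optimistic.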

## Looking around

A helpful rule of thumb I follow is: "People have already done that."

I always expect that whatever I can think of, people have already thought of, achieved, or come very close to. There are many reasons behind this, but the interesting corollary is that the first thing I do is look up what people have done.

I found [Jeremy Howard's visualization of `timm`'s benchmark](https://www.kaggle.com/code/jhoward/which-image-models-are-best/) and [Daniel Bourke's result with ViT and EfficientNet](https://www.learnpytorch.io/09_pytorch_model_deployment/) (and while I am at it, yes, I am following Bourke's course).

Jeremy's visualization suggested that I should check out:
- LeViT
- ViT (okay, it was not even there - just my pick)
- Swin
- ConvNeXt
- BEiT
- EfficientNet

from `timm`.

And I plan to do exactly that in this notebook, with the help of PyTorch Lightning.

## Objectives

I want a model with:
- 95%+ accuracy
- As low latency as possible, preferably close to 24 FPS (the standard frame rate for old movies with synchronized sound...)
- As small a memory footprint as possible
- A high F1 score
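To put the frame-rate target in perspective, here is a quick sketch that converts a frame rate into a per-prediction time budget and compares it with the ~5.4s of the Swin-Large deployment. The helper name `frame_budget_ms` is made up for this example, and the 5.4s figure was measured on the Hugging Face Space, not on the hardware listed below, so the ratio is only indicative.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per prediction, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


budget = frame_budget_ms(24)
swin_large_ms = 5.4 * 1000  # ~5.4 s per prediction, as quoted earlier in the text

print(f"budget at 24 FPS: {budget:.1f} ms per prediction")
print(f"Swin-Large on the Space: about {swin_large_ms / budget:.0f}x over budget")
```

In other words, hitting 24 FPS means each prediction has to finish in roughly 42 ms.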

Specifications:
- Dataset: Food101 (100% data)
- Hardware: NVIDIA GeForce RTX 3090 + CUDA 11.6 + PyTorch 2.0.1
- Batch size: 64
- Epochs: 3
- Optimizer: AdamW
- Scheduler: OneCycleLR
- Metrics: Accuracy, F1-Score
- Tracking: TensorBoard
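As a reminder of what the two metrics measure, here is a minimal pure-Python sketch. In the notebook itself the metrics come from `torchmetrics`; the functions below are only an illustration, and the choice of macro averaging for F1 is my assumption here. (For single-label multiclass problems, micro-averaged F1 is equal to accuracy, so macro averaging is the more informative companion.)

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0 by convention here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy labels for three classes, made up for illustration.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(f"accuracy: {accuracy(y_true, y_pred):.3f}")
print(f"macro F1: {macro_f1(y_true, y_pred, num_classes=3):.3f}")
```

Macro F1 weights every class equally, so a model that does well on most classes but badly on a few will score lower on it than on accuracy.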

In [1]:
from typing import Union
from pathlib import Path

import pytorch_lightning as pl
import torch
import timm
import torch.nn as nn
import torchmetrics
import torchvision

In [2]:
class Food101DataModule(pl.LightningDataModule):
    def __init__(self, data_dir: Union[str, Path] = "data", batch_size: int = 64) -> None:
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.transform = 
    
    def 
    
    