## Possible optimizations

**- torch.compile (+)**

**- torch.jit.trace (+)**

Optimizations that NVIDIA used (but that are not applied here 😞):
1. torch.optim

Some widely used optimization algorithms available in torch.optim:
- SGD
- Adam
- RMSprop
- Adagrad

However, all of these are meant for training a model. They do not apply to eval, so I do not consider them here.

2. trt plugin

For some reason it did not work on Swin: it just fails with an error saying it is not overridden for Swin. How did NVIDIA use it? Probably the catch is that they wrote their models by hand.


## Importing and installing the required dependencies

In [50]:
!pip install datasets
!pip install torch==2.0.1
!pip install torchvision==0.15.2
!pip install torcheval
!pip install transformers
!pip install cjm_pytorch_utils
!pip install tqdm



In [51]:
from tqdm.auto import tqdm
from time import perf_counter
from pathlib import Path
import torch
import torch._dynamo as torchdynamo
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from transformers import AutoImageProcessor, AutoModelForImageClassification
from datasets import load_dataset
from cjm_pytorch_utils.core import get_torch_device
import multiprocessing
import sys

## Optimization code

### torch.compile

In [52]:
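def torch_compile(model, device, dtype):
  # Colab-specific: refresh the linker cache so the NVIDIA libraries are found
  ! ldconfig /usr/lib64-nvidia
  torchdynamo.config.guard_nn_modules = True
  # torch.compile returns a new wrapped module, so hand it back to the caller
  model = torch.compile(model)
  return model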

### torch.jit.trace

In [None]:
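def torch_jit_trace(model, device, dtype):
  # Dummy input in the layout the model expects: (batch, channels, height, width)
  example = torch.rand(1, 3, 224, 224, device=device, dtype=dtype)
  # torch.jit.trace returns a new traced module, so hand it back to the caller
  model = torch.jit.trace(model, example_inputs=example, strict=False, check_trace=False)
  return model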
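To illustrate what `torch.jit.trace` hands back, here is a minimal CPU-only sketch. `TinyClassifier` is a hypothetical stand-in for the Swin models and is not part of this notebook; it returns a dict, which is why `strict=False` is passed. The point of the sketch is that tracing produces a separate module and leaves the original one untouched, so the returned object is the one that has to be called afterwards.

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for an image classifier (assumption, not from the notebook)."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(3 * 8 * 8, 4)

    def forward(self, pixel_values):
        # Return a dict so the example exercises strict=False
        return {"logits": self.head(pixel_values.flatten(1))}


torch.manual_seed(0)
eager = TinyClassifier().eval()
example = torch.rand(1, 3, 8, 8)

# strict=False lets the tracer accept a dict as the module output
traced = torch.jit.trace(eager, example_inputs=example, strict=False, check_trace=False)

with torch.no_grad():
    eager_logits = eager(example)["logits"]
    traced_logits = traced(example)["logits"]

# Tracing builds a new module; the eager one is not modified in place
same_object = traced is eager
max_abs_diff = (eager_logits - traced_logits).abs().max().item()
print(type(traced).__name__, same_object, max_abs_diff)
```

In this sketch the traced output is read with `["logits"]`, because the traced module returns a plain dict.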


## Preparing datasets and models


The following dataset/model pairs are considered:
  1. Matthijs/snacks: aspis/swin-base-finetuned-snacks
  2. lewtun/dog_food: sasha/swin-tiny-finetuned-dogfood
  3. food101: aspis/swin-finetuned-food101
  
| dataset         | image size               | classes | images |
|-----------------|--------------------------|---------|--------|
| Matthijs/snacks | 256x256 (min)            | 20      | 952    |
| lewtun/dog_food | varies (e.g. 620x350)    | 3       | 750    |
| food101         | 512x512 (max)            | 101     | 25,250 |


In [53]:
datasets_and_models = {
  'Matthijs/snacks': 'aspis/swin-base-finetuned-snacks',
  'lewtun/dog_food': 'sasha/swin-tiny-finetuned-dogfood',
  'food101': 'aspis/swin-finetuned-food101',
}

device = get_torch_device()
dtype = torch.float32

dataset_dir = Path("/content/datasets/")
dataset_dir.mkdir(parents=True, exist_ok=True)

In [54]:
loaded_models = []
loaded_image_processors = []
for model_name in datasets_and_models.values():
  model = AutoModelForImageClassification.from_pretrained(model_name)
  model = model.to(device=device, dtype=dtype)
  model.eval()
  # Note: the helper's return value is not captured here, so `model` stays the eager module
  torch_jit_trace(model, device, dtype)
  # torch_compile(model, device, dtype)
  image_processor = AutoImageProcessor.from_pretrained(model_name)
  loaded_models.append(model)
  loaded_image_processors.append(image_processor)

Output (these appear to be the source lines flagged by tracer warnings from `torch.jit.trace`; the warning text itself did not survive the export):

  if num_channels != self.num_channels:
  if width % self.patch_size[1] != 0:
  if height % self.patch_size[0] != 0:
  if min(input_resolution) <= self.window_size:
  was_padded = pad_values[3] > 0 or pad_values[5] > 0
  if was_padded:
  should_pad = (height % 2 == 1) or (width % 2 == 1)
  if should_pad:
  self.window_size = min(input_resolution)


In [55]:
num_workers = multiprocessing.cpu_count()

loaded_datasets = []
for dataset_name in datasets_and_models.keys():
  cache_dir = Path(f'{dataset_dir}/{dataset_name}')
  dataset = load_dataset(dataset_name, cache_dir=cache_dir, num_proc=num_workers)
  # food101 has no 'test' split here, so its 'validation' split is used instead
  if dataset_name == 'food101':
    dataset = dataset['validation']
  else:
    dataset = dataset['test']
  loaded_datasets.append(dataset)

## Testing

In [56]:
prepared_data_for_model_testing = zip(loaded_models,
                                      datasets_and_models.values(),
                                      loaded_image_processors,
                                      loaded_datasets,
                                      datasets_and_models.keys())

for model, model_name, processor, dataset, dataset_name in prepared_data_for_model_testing:
  print(f'Testing "{model_name}" on "{dataset_name}" dataset')

  progress_bar = tqdm(total=len(dataset), desc="Test")

  correct_predictions, total_inference_time = 0, 0
  max_inference_time, min_inference_time = -1, 100000

  for data_unit in dataset:
    image, label = data_unit.values()
    inputs = processor(image, return_tensors='pt').to(device)

    with torch.no_grad():
      start = perf_counter()
      outputs = model(**inputs)
      end = perf_counter()

      if outputs.logits.argmax(-1).item() == label:
        correct_predictions += 1

    inference_time = end - start
    max_inference_time = max(max_inference_time, inference_time)
    min_inference_time = min(min_inference_time, inference_time)
    total_inference_time += inference_time

    progress_bar.update()

  accuracy = correct_predictions / len(dataset)
  average_inference_time = total_inference_time / len(dataset)
  images_per_second = 1 / average_inference_time

  print('GPU T4 Google Colab')
  print(f'Achieved accuracy: {accuracy:.6f}')
  print(f'Max inference time: {max_inference_time:.6f}')
  print(f'Min inference time: {min_inference_time:.6f}')
  print(f'Average inference time: {average_inference_time:.6f}')
  print(f'Average images per second: {images_per_second:.6f}')
  progress_bar.close()

Testing "aspis/swin-base-finetuned-snacks" on "Matthijs/snacks" dataset


GPU T4 Google Colab
Achieved accuracy: 0.943277
Max inference time: 0.097498
Min inference time: 0.033045
Average inference time: 0.045951
Average images per second: 21.762208
Testing "sasha/swin-tiny-finetuned-dogfood" on "lewtun/dog_food" dataset


GPU T4 Google Colab
Achieved accuracy: 0.984000
Max inference time: 0.044745
Min inference time: 0.016896
Average inference time: 0.022441
Average images per second: 44.562174
Testing "aspis/swin-finetuned-food101" on "food101" dataset


GPU T4 Google Colab
Achieved accuracy: 0.920198
Max inference time: 0.299051
Min inference time: 0.032315
Average inference time: 0.045985
Average images per second: 21.746271
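
A caveat about the loop above: CUDA kernels are launched asynchronously, and the clock is stopped right after `model(**inputs)` returns, before `.item()` forces a wait for the result. The per-image times may therefore under-report the real GPU time. Below is a minimal sketch of a timing helper that synchronizes around each call and skips warm-up calls. The names `benchmark`, `warmup` and `sync` are illustrative assumptions, not part of the notebook; on a CUDA device one would pass `torch.cuda.synchronize` as `sync`. The workload here is a pure-Python stand-in so that the sketch runs anywhere.

```python
from statistics import mean
from time import perf_counter


def benchmark(fn, inputs, warmup=1, sync=None):
    """Time fn(x) for each x in inputs, ignoring the first `warmup` calls.

    `sync` is an optional zero-argument callable that blocks until pending
    device work is finished (hypothetical hook; e.g. torch.cuda.synchronize).
    """
    latencies = []
    for index, x in enumerate(inputs):
        if sync is not None:
            sync()  # do not bill earlier pending work to this call
        start = perf_counter()
        fn(x)
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        elapsed = perf_counter() - start
        if index >= warmup:
            latencies.append(elapsed)
    average = mean(latencies)
    return {
        "count": len(latencies),
        "min": min(latencies),
        "max": max(latencies),
        "avg": average,
        "per_second": 1.0 / average,
    }


# Stand-in workload: sum of squares instead of a model forward pass
sync_calls = []
stats = benchmark(
    lambda n: sum(i * i for i in range(n)),
    [20_000] * 6,
    warmup=2,
    sync=lambda: sync_calls.append(1),
)
print(stats)
```

Skipping the warm-up calls keeps one-off costs, such as compilation on the first image, out of the min/max/average figures.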


## Results

### food101: aspis/swin-finetuned-food101

| dataset         | image size               | classes | images |
|-----------------|--------------------------|---------|--------|
| food101         | 512x512 (max)            | 101     | 25,250 |

| optimization    | accuracy | max inf. time (s) | min inf. time (s) | avg. inf. time (s) | avg. img/s |
|-----------------|----------|-------------------|-------------------|--------------------|------------|
| none            | 0.92     | 0.14              | 0.029             | 0.039              | 25.8       |
| torch.compile   | 0.92     | 585.45            | 0.026             | 0.053              | 18.75      |
| torch.jit.trace | 0.92     | 0.3               | 0.032             | 0.046              | 21.7       |

*Note: the max inf. time for torch.compile is large because it includes the time spent compiling the model on the first image.*

### Matthijs/snacks: aspis/swin-base-finetuned-snacks


| dataset         | image size               | classes | images |
|-----------------|--------------------------|---------|--------|
| Matthijs/snacks | 256x256 (min)            | 20      | 952    |

| optimization    | accuracy | max inf. time (s) | min inf. time (s) | avg. inf. time (s) | avg. img/s |
|-----------------|----------|-------------------|-------------------|--------------------|------------|
| none            | 0.94     | 0.069             | 0.029             | 0.038              | 26.0       |
| torch.compile   | 0.94     | 554.80            | 0.027             | 0.036              | 28.06      |
| torch.jit.trace | 0.94     | 0.097             | 0.033             | 0.046              | 21.8       |

*Note: the max inf. time for torch.compile is large because it includes the time spent compiling the model on the first image.*

### lewtun/dog_food: sasha/swin-tiny-finetuned-dogfood


| dataset         | image size               | classes | images |
|-----------------|--------------------------|---------|--------|
| lewtun/dog_food | varies (e.g. 620x350)    | 3       | 750    |

| optimization    | accuracy | max inf. time (s) | min inf. time (s) | avg. inf. time (s) | avg. img/s |
|-----------------|----------|-------------------|-------------------|--------------------|------------|
| none            | 0.98     | 0.034             | 0.0156            | 0.0206             | 48.4       |
| torch.compile   | 0.98     | 357.22            | 0.0137            | 0.0188             | 52.96      |
| torch.jit.trace | 0.98     | 0.045             | 0.0169            | 0.0224             | 44.6       |

*Note: the max inf. time for torch.compile is large because it includes the time spent compiling the model on the first image.*
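
Because a single compiling call can dominate both the maximum and the mean, it can help to report the first call separately from the steady-state figures. The sketch below shows one way to do that, assuming the per-image latencies were collected into a list. The helper name `summarize` and the latency values are made up for illustration and are not measurements from this notebook.

```python
from statistics import mean, median


def summarize(latencies, skip_first=1):
    """Split latencies into the first `skip_first` calls and the steady state."""
    first = latencies[:skip_first]
    steady = latencies[skip_first:]
    return {
        "first_call_total": sum(first),  # e.g. compilation plus first inference
        "avg_all": mean(latencies),      # skewed by the slow first call
        "avg_steady": mean(steady),      # average once warmed up
        "median_all": median(latencies), # robust to a single outlier
        "max_steady": max(steady),
    }


# Made-up latencies in seconds: one slow compiling call, then fast calls
latencies = [12.0, 0.03, 0.05, 0.04, 0.03, 0.05]
report = summarize(latencies)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

The median is included because it barely moves when one call is much slower than the rest, unlike the plain average.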