## Installation in Google Colab

The cells below are recommended for installing the dependencies in Google Colab.

1. Make sure you are connected to a runtime with a GPU

In [None]:
!nvidia-smi

Thu Jul  4 07:14:36 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   56C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

2. Install the libraries required to build MinkowskiEngine

In [None]:
!sudo apt-get install libopenblas-dev

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libopenblas-dev is already the newest version (0.3.20+ds-1).
0 upgraded, 0 newly installed, 0 to remove and 45 not upgraded.


In [None]:
!pip install ninja

Collecting ninja
  Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
Installing collected packages: ninja
Successfully installed ninja-1.11.1.1


3. Build and install MinkowskiEngine from the repository so that it works with CUDA 12.2 (this takes a long time)

In [None]:
!pip install -U git+https://github.com/richlukich/MinkowskiEngine -v --no-deps \
                          --config-settings="--install-option=--force_cuda" \
                          --config-settings="--install-option=--blas=openblas"

Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Collecting git+https://github.com/richlukich/MinkowskiEngine
  Cloning https://github.com/richlukich/MinkowskiEngine to /tmp/pip-req-build-vahjyvqh
  Running command git version
  git version 2.34.1
  Running command git clone --filter=blob:none https://github.com/richlukich/MinkowskiEngine /tmp/pip-req-build-vahjyvqh
  Cloning into '/tmp/pip-req-build-vahjyvqh'...

4. Check that everything works

In [None]:
import torch
print(f"Is CUDA available in torch?: {torch.cuda.is_available()}")
import MinkowskiEngine as ME
print(f"Is CUDA available in MinkowskiEngine?: {ME.is_cuda_available()}")
ME.print_diagnostics()

Is CUDA available in torch?: True
Is CUDA available in MinkowskiEngine?: True
Linux-6.1.85+-x86_64-with-glibc2.35
3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
2.3.0+cu121
torch.cuda.is_available(): True
Driver Version 535.104.05
CUDA Version 12.2
VBIOS Version 90.04.A7.00.01
Image Version G183.0200.00.02
GSP Firmware Version N/A
0.5.4
MinkowskiEngine compiled with CUDA Support: True
NVCC version MinkowskiEngine is compiled: 12020
CUDART version MinkowskiEngine is compiled: 12020
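
The value `12020` reported for NVCC and CUDART is CUDA's integer version encoding (1000 × major + 10 × minor), so it corresponds to CUDA 12.2, which matches the `nvidia-smi` output. A minimal sketch of the decoding follows; `decode_cuda_version` is a helper name introduced here for illustration and is not part of MinkowskiEngine.

```python
def decode_cuda_version(encoded: int) -> str:
    """Convert CUDA's integer encoding (1000 * major + 10 * minor) to 'major.minor'."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return f"{major}.{minor}"


print(decode_cuda_version(12020))  # 12.2
```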


5. Final step: install the [opr](https://github.com/alexmelekhin/open_place_recognition) library, whose code is used in the baseline

In [None]:
!git clone -b feat/itlp_outdoor https://github.com/alexmelekhin/open_place_recognition
%cd open_place_recognition
!pip install -e .  # the -e flag installs the library in editable mode, so its code can be changed after installation
%cd ..

Cloning into 'open_place_recognition'...
remote: Enumerating objects: 1467, done.
remote: Counting objects: 100% (696/696), done.
remote: Compressing objects: 100% (304/304), done.
remote: Total 1467 (delta 470), reused 494 (delta 376), pack-reused 771
Receiving objects: 100% (1467/1467), 22.87 MiB | 11.73 MiB/s, done.
Resolving deltas: 100% (842/842), done.
/content/open_place_recognition
Obtaining file:///content/open_place_recognition
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting hydra-core>=1.2 (from opr==0.2.1)
  Downloading hydra_core-1.3.2-py3-none-any.whl (154 kB)
Collecting kaleido (from opr==0.2.1)
  Downloading kaleido-0

/content


In [None]:
!pip install loguru

Collecting loguru
  Downloading loguru-0.7.2-py3-none-any.whl (62 kB)
Installing collected packages: loguru
Successfully installed loguru-0.7.2


In [None]:
!python -m pip install numpy-quaternion

Collecting numpy-quaternion
  Downloading numpy_quaternion-2023.0.4-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (195 kB)
Installing collected packages: numpy-quaternion
Successfully installed numpy-quaternion-2023.0.4


## Loading the dataset in Google Colab

Example code for downloading the dataset.

You can use the gdown utility, which is available in Colab by default. Suppose https://drive.google.com/file/d/1rXsx0GEotwwTHjX1PveKdGzcatssO7wg/view?usp=sharing is the link to the file. To download it, pass the file id to gdown as an argument; for this example it is `1rXsx0GEotwwTHjX1PveKdGzcatssO7wg` (the part of the link between `file/d/` and `/view`).

In [None]:
!gdown 1t43hsvmYdDTwu7aYkZrXSxNF-YynqifB

Downloading...
From (original): https://drive.google.com/uc?id=1t43hsvmYdDTwu7aYkZrXSxNF-YynqifB
From (redirected): https://drive.google.com/uc?id=1t43hsvmYdDTwu7aYkZrXSxNF-YynqifB&confirm=t&uuid=9200ab51-bc14-49a8-a926-5951102166ea
To: /content/public.zip
100% 4.37G/4.37G [01:07<00:00, 65.1MB/s]
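
The file id can also be extracted from a share link programmatically. The sketch below is only an illustration: `extract_drive_file_id` is a hypothetical helper, and the regular expression assumes the `.../file/d/<id>/view` link layout described above.

```python
import re


def extract_drive_file_id(url: str) -> str:
    """Return the part of a Google Drive share link that follows 'file/d/'."""
    match = re.search(r"/file/d/([^/?#]+)", url)
    if match is None:
        raise ValueError(f"No file id found in: {url}")
    return match.group(1)


url = "https://drive.google.com/file/d/1rXsx0GEotwwTHjX1PveKdGzcatssO7wg/view?usp=sharing"
file_id = extract_drive_file_id(url)
print(file_id)  # 1rXsx0GEotwwTHjX1PveKdGzcatssO7wg
```

In a Colab cell the result can then be passed to the shell command as `!gdown {file_id}`.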


In [None]:
#!gdown 1txG6aPiy5XtxLOxFk0CxFk9Ey_VUsB5T

You can verify the checksum of the file:

In [None]:
#!sha256sum public.zip
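
The same check can be done from Python, which makes it easy to compare the digest with a reference value in code. This is a minimal sketch; `sha256_of_file` is a helper name introduced here, and the reference checksum itself is not given in this notebook, so it has to come from the dataset providers.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a multi-gigabyte archive never has to fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("public.zip")` should give the same hex string that `sha256sum public.zip` prints.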

Then unpack the archive, either from a mounted Google Drive or from the file downloaded above:

In [None]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


In [None]:
!unzip -q '/content/drive/MyDrive/Colab Notebooks/public.zip'

In [None]:
!unzip -q public.zip

## Baseline solution

Create the dataloaders

In [None]:
DATA_ROOT = "/content/public"

In [None]:
from torch.utils.data import DataLoader

from opr.datasets.itlp_outdoor import ITLPCampusOutdoor
from opr.samplers import BatchSampler

In [None]:
train_dataset = ITLPCampusOutdoor(
    DATA_ROOT,
    subset="train",
    sensors=("front_cam", "back_cam"),
    load_semantics=True,
)
val_dataset = ITLPCampusOutdoor(
    DATA_ROOT,
    subset="val",
    sensors=("front_cam", "back_cam"),
    load_semantics=True,
)

test_dataset = ITLPCampusOutdoor(
    DATA_ROOT,
    subset="test",
    sensors=("front_cam", "back_cam"),
    load_semantics=True,
)

In [None]:
train_sampler = BatchSampler(
    train_dataset,
    batch_size=16,  # initial batch size
    batch_size_limit=256,  # maximum batch size (see "Dynamic Batch Sizing")
    batch_expansion_rate=1.4,
    drop_last=True,
)
val_sampler = BatchSampler(
    val_dataset,
    batch_size=256,  # initial batch size
    drop_last=True,
)
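
The `batch_size`, `batch_size_limit` and `batch_expansion_rate` arguments describe dynamic batch sizing: training starts with small batches and grows them over time. The sketch below shows one plausible reading of these parameters and is not the actual `opr` implementation. It assumes that the batch size is multiplied by the expansion rate and capped at the limit, and that expansion is triggered when the share of non-zero triplets falls below a threshold (such as the `batch_expansion_threshold=0.7` passed to the trainer later). The function names are hypothetical.

```python
def next_batch_size(current: int, expansion_rate: float, limit: int) -> int:
    """Grow the batch size multiplicatively, never exceeding the limit."""
    return min(int(current * expansion_rate), limit)


def maybe_expand(current: int, non_zero_rate: float, threshold: float,
                 expansion_rate: float = 1.4, limit: int = 256) -> int:
    """Expand only when the share of non-zero (still informative) triplets drops below the threshold."""
    if non_zero_rate < threshold:
        return next_batch_size(current, expansion_rate, limit)
    return current


# Sizes visited if every check triggers an expansion (assumed growth rule)
size = 16
schedule = [size]
while size < 256:
    size = next_batch_size(size, 1.4, 256)
    schedule.append(size)
print(schedule)
```

Under these assumptions, batches stay small while most triplets are still hard and grow only once the model has learned the easy cases.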

In [None]:
train_dl = DataLoader(
    dataset=train_dataset,
    batch_sampler=train_sampler,
    collate_fn=train_dataset.collate_fn,
    num_workers=4,
    pin_memory=True,
)
val_dl = DataLoader(
    dataset=val_dataset,
    batch_sampler=val_sampler,
    collate_fn=val_dataset.collate_fn,
    num_workers=4,
    pin_memory=True,
)

test_dl = DataLoader(
    dataset=test_dataset,
    batch_size=256,
    collate_fn=test_dataset.collate_fn,
    num_workers=4,
    pin_memory=True,
    drop_last=False,
)




## BLIP + BERT

### BLIP

In [None]:
!pip3 install salesforce-lavis

Collecting salesforce-lavis
  Downloading salesforce_lavis-1.0.2-py3-none-any.whl (1.8 MB)
Collecting contexttimer (from salesforce-lavis)
  Downloading contexttimer-0.3.3.tar.gz (4.9 kB)
  Preparing metadata (setup.py) ... done
Collecting decord (from salesforce-lavis)
  Downloading decord-0.6.0-py3-none-manylinux2010_x86_64.whl (13.6 MB)
Collecting einops>=0.4.1 (from salesforce-lavis)
  Downloading einops-0.8.0-py3-none-any.whl (43 kB)
Collecting fairscale==0.4.4 (from salesforce-lavis)
  Downloading fairscale-0.4.4.tar.gz (235 kB)

In [None]:
import torch
from PIL import Image

from lavis.models import load_model_and_preprocess

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### Text data for the BERT model

## The base model uses ResNet18FPN for image feature extraction and Generalized Mean (GeM) pooling; the data is fused by summation

In [None]:
import torch

from opr.models.place_recognition.base import ImageModel
from opr.modules.feature_extractors import ResNet18FPNFeatureExtractor
from opr.modules import Add, GeM, Concat
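
Generalized Mean pooling reduces a feature map to one value per channel as (mean(x^p))^(1/p). The following is a minimal pure-Python sketch of that formula for a single channel, not the `opr` `GeM` module. The function name and the defaults `p=3.0` and `eps=1e-6` are assumptions chosen for illustration.

```python
from typing import Sequence


def gem_pool(values: Sequence[float], p: float = 3.0, eps: float = 1e-6) -> float:
    """Generalized Mean over one feature channel: (mean(x_i ** p)) ** (1 / p)."""
    clamped = [max(v, eps) for v in values]  # keep inputs positive so the power is well defined
    return (sum(v ** p for v in clamped) / len(clamped)) ** (1.0 / p)


channel = [1.0, 2.0, 3.0]
print(gem_pool(channel, p=1.0))   # p = 1 reduces to average pooling
print(gem_pool(channel, p=3.0))   # lies between the average and the maximum
print(gem_pool(channel, p=50.0))  # large p approaches max pooling
```

The exponent `p` interpolates between average pooling and max pooling, and in practice it is often a learnable parameter.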

In [None]:
from typing import Dict, Optional

from loguru import logger
from torch import Tensor, nn
import torch.nn.functional as F

In [None]:
class ImageModel2(nn.Module):
    """Meta-model for image-based Place Recognition. Combines feature extraction backbone and head modules."""

    def __init__(
        self,
        backbone: nn.Module,
        head: nn.Module,
        image2text,
        text_embedding,
        tokenizer,
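        fusion: Optional[nn.Module] = None,
    ) -> None:
        """Meta-model for image-based Place Recognition.

        Args:
            backbone (ImageFeatureExtractor): Image feature extraction backbone.
            head (ImageHead): Image head module.
            image2text: Captioning model (BLIP) that generates a text description of an image.
            text_embedding: Text encoder (BERT) applied to the generated captions.
            tokenizer: Tokenizer that matches the text encoder.
            fusion (FusionModule, optional): Module to fuse descriptors for multiple images in batch.
                Defaults to None.
        """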
        super().__init__()
        self.backbone = backbone
        self.head = head
        self.fusion = fusion
        self.image2text = image2text
        self.text_embedding = text_embedding
        self.tokenizer = tokenizer

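        # Projects the concatenated image (256) + text (768) descriptor to 256 dimensions
        self.linear = nn.Linear(1024, 256)
        self.batch_norm = nn.BatchNorm1d(256)
        # Projects the concatenated image (256) + fused (256) descriptor to 256 dimensions
        self.linear1 = nn.Linear(512, 256)
        self.batch_norm1 = nn.BatchNorm1d(256)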

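    def get_describe(self, image_tensor):
        """Generate a text caption for a single image tensor of shape (C, H, W)."""
        res = image_tensor.unsqueeze(0)  # add a batch dimension
        # Resize to 384x384 before passing the image to the captioning model
        resized_images = F.interpolate(res, size=(384, 384), mode='bilinear', align_corners=False).to(device)
        return self.image2text.generate({"image": resized_images})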


    def forward(self, batch: Dict[str, Tensor]) -> Dict[str, Tensor]:  # noqa: D102
        img_descriptors = {}
        for key, value in batch.items():
            if key.startswith("images_"):
                # Caption every image in the batch, then embed the captions with the text encoder
                text = [self.get_describe(b)[0] for b in value]
                encoded_input = self.tokenizer(text, padding=True, truncation=True, return_tensors='pt').to(device)
                model_output = self.text_embedding(**encoded_input)
                cls_embedding = model_output.last_hidden_state[:, 0, :]  # embedding of the first ([CLS]) token
                images_embedding = self.head(self.backbone(value))
                print(images_embedding.shape, cls_embedding.shape)  # debug output: descriptor shapes
                # Fuse the image and text descriptors: 256 + 768 -> 256
                concatenated = torch.cat((images_embedding, cls_embedding), dim=1)
                compressed = self.linear(concatenated)
                activated = F.relu(compressed)
                normalized = self.batch_norm(activated)
                # Combine the image descriptor with the fused one: 256 + 256 -> 256
                result = torch.cat((images_embedding, normalized), dim=1)
                result = self.batch_norm1(F.relu(self.linear1(result)))
                img_descriptors[key] = result

        if len(img_descriptors) > 1:
            if self.fusion is None:
                raise ValueError("Fusion module is not defined but multiple images are provided")
            descriptor = self.fusion(img_descriptors)
        else:
            if self.fusion is not None:
                raise ValueError("Fusion module is defined but only one image is provided")
            descriptor = list(img_descriptors.values())[0]
        out_dict: Dict[str, Tensor] = {"final_descriptor": descriptor}
        return out_dict
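
The layer sizes in `ImageModel2` follow from the descriptor dimensions, and the arithmetic can be traced without running the model. The sketch below is only such a trace. It assumes a 256-dimensional image descriptor (from `lateral_dim=256`) and a 768-dimensional text embedding (the shapes printed during training), and that the `Concat` fusion module joins per-camera descriptors along the feature axis. `concat_dim` is a helper introduced for illustration.

```python
def concat_dim(*dims: int) -> int:
    """Feature size after concatenating descriptors along the feature axis."""
    return sum(dims)


IMAGE_DIM = 256  # pooled image descriptor size (assumed from lateral_dim=256)
TEXT_DIM = 768   # text embedding size, as printed during training

# Step 1: image + text -> Linear(1024, 256) -> ReLU -> BatchNorm1d(256)
joint_in = concat_dim(IMAGE_DIM, TEXT_DIM)
assert joint_in == 1024  # must match in_features of self.linear
joint_out = 256

# Step 2: image + fused -> Linear(512, 256) -> ReLU -> BatchNorm1d(256)
second_in = concat_dim(IMAGE_DIM, joint_out)
assert second_in == 512  # must match in_features of self.linear1
per_camera_dim = 256

# Step 3 (assumption): a concatenating fusion module over two cameras
final_dim = concat_dim(per_camera_dim, per_camera_dim)
print(joint_in, second_in, final_dim)  # 1024 512 512
```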

In [None]:
from transformers import BertTokenizer, BertModel
import torch

In [None]:
feature_extractor = ResNet18FPNFeatureExtractor(
    in_channels=3,
    lateral_dim=256,
    fh_num_bottom_up=4,
    fh_num_top_down=0,
    pretrained=True,
)
pooling = GeM()
descriptor_fusion_module = Concat()

blip, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="large_coco", is_eval=True, device=device
)

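# Note: 'distilbert-base-uncased' is a DistilBERT checkpoint, while BertTokenizer/BertModel are BERT classes.
# transformers warns about such a mismatch, and the pretrained encoder weights may not be loaded as intended.
tokenizer = BertTokenizer.from_pretrained('distilbert-base-uncased')
bert = BertModel.from_pretrained('distilbert-base-uncased').to(device)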

Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 95.5MB/s]
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

100%|██████████| 1.66G/1.66G [00:42<00:00, 41.7MB/s]


vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

In [None]:
model = ImageModel2(
    backbone=feature_extractor,
    head=pooling,
    image2text=blip,
    text_embedding = bert,
    tokenizer=tokenizer,
    fusion=descriptor_fusion_module,
)

In [None]:
IMAGE_LR = 0.0001
HEAD_LR = 0.0001
FUSION_LR = 0.0001
WEIGHT_DECAY = 0.0001
SCHEDULER_GAMMA = 0.1
SCHEDULER_STEPS = [5]
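
`SCHEDULER_STEPS` and `SCHEDULER_GAMMA` define a step schedule: the learning rate is multiplied by gamma each time a milestone epoch is reached. A minimal sketch of that rule in closed form follows; `multistep_lr` is an illustrative helper and not a PyTorch function.

```python
from bisect import bisect_right
from typing import Sequence


def multistep_lr(base_lr: float, milestones: Sequence[int], gamma: float, epoch: int) -> float:
    """Learning rate at a given epoch: base_lr * gamma ** (number of milestones already reached)."""
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


for epoch in range(7):
    print(epoch, multistep_lr(0.0001, [5], 0.1, epoch))
```

With a single milestone at epoch 5, a run shorter than five epochs never reaches the decay step, so the learning rate stays at its initial value throughout.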

In [None]:
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

In [None]:
params_list = []
if model.backbone is not None and IMAGE_LR is not None:
    params_list.append({"params": model.backbone.parameters(), "lr": IMAGE_LR})
if model.head is not None and HEAD_LR is not None:
    params_list.append({"params": model.head.parameters(), "lr": HEAD_LR})
if model.fusion is not None and FUSION_LR is not None:
    params_list.append({"params": model.fusion.parameters(), "lr": FUSION_LR})
# The extra projection and normalization layers of ImageModel2 reuse IMAGE_LR
params_list.append({"params": model.linear.parameters(), "lr": IMAGE_LR})
params_list.append({"params": model.linear1.parameters(), "lr": IMAGE_LR})
params_list.append({"params": model.batch_norm.parameters(), "lr": IMAGE_LR})
params_list.append({"params": model.batch_norm1.parameters(), "lr": IMAGE_LR})

In [None]:
params_list.append({"params":model.text_embedding.encoder.layer[-1].parameters(),"lr":IMAGE_LR})

In [None]:
from opr.trainers.place_recognition import UnimodalPlaceRecognitionTrainer
from opr.losses import BatchHardTripletMarginLoss

In [None]:
!mkdir -p checkpoints

Initialize the Trainer

In [None]:
loss_fn = BatchHardTripletMarginLoss(margin=0.2)
optimizer = Adam(params_list, weight_decay=WEIGHT_DECAY)
scheduler = MultiStepLR(optimizer, milestones=SCHEDULER_STEPS, gamma=SCHEDULER_GAMMA)

trainer = UnimodalPlaceRecognitionTrainer(
    checkpoints_dir='/content/checkpoints',
    model=model,
    loss_fn=loss_fn,
    optimizer=optimizer,
    scheduler=scheduler,
    batch_expansion_threshold=0.7,
    wandb_log=False,
    device='cuda',
)
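
The batch-hard triplet margin loss picks, for each anchor in a batch, its farthest positive and its closest negative, and penalizes the anchor when the positive is not closer than the negative by at least the margin. The pure-Python sketch below illustrates the idea and is not the `opr` `BatchHardTripletMarginLoss` implementation. As a simplifying assumption it uses integer place labels to decide which samples are positives and negatives, and Euclidean distance between embeddings.

```python
import math
from typing import List, Sequence


def batch_hard_triplet_loss(embeddings: Sequence[Sequence[float]],
                            labels: Sequence[int],
                            margin: float = 0.2) -> float:
    """Mean over anchors of max(0, d(anchor, hardest positive) - d(anchor, hardest negative) + margin)."""
    losses: List[float] = []
    for i, (anchor, label) in enumerate(zip(embeddings, labels)):
        pos = [math.dist(anchor, e) for j, e in enumerate(embeddings) if j != i and labels[j] == label]
        neg = [math.dist(anchor, e) for j, e in enumerate(embeddings) if labels[j] != label]
        if not pos or not neg:
            continue  # an anchor needs at least one positive and one negative
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


points = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
print(batch_hard_triplet_loss(points, [0, 0, 1, 1]))  # well-separated places: no violated triplets
print(batch_hard_triplet_loss(points, [0, 1, 0, 1]))  # mixed-up places: large loss
```

Under this reading, the `non_zero_rate` statistic in the training log would be the share of anchors whose triplet still produces a non-zero loss.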

Train

In [None]:
trainer.train(
    epochs=2,
    train_dataloader=train_dl,
    val_dataloader=val_dl,
    test_dataloader=test_dl,
)

2024-07-04 07:47:34.049 | INFO     | opr.trainers.place_recognition.unimodal:train:113 - =====> Epoch:   1/2:
2024-07-04 07:47:34.050 | INFO     | opr.trainers.place_recognition.unimodal:_loop_epoch:244 - => Train stage:
torch.Size([16, 256]) torch.Size([16, 768])
Train:  99%|█████████▉| 87/88 [27:13<00:17, 17.80s/it]
2024-07-04 08:15:05.331 | INFO     | opr.trainers.place_recognition.unimodal:_loop_epoch:277 - Train time: 27:31
2024-07-04 08:15:05.336 | INFO     | opr.trainers.place_recognition.unimodal:_loop_epoch:278 - Train stats: {'loss': 2.617476766759699, 'avg_embedding_norm': 22.5776829069311, 'num_triplets': 16.0, 'num_non_zero_triplets': 15.409090909090908, 'non_zero_rate': 0.9630681818181818, 'max_pos_pair_dist': 35.17248201370239, 'max_neg_pair_dist': 32.86103168400851, 'mean_pos_pair_dist': 32.99788574738936, 'mean_neg_pair_dist': 30.943805434487082, 'min_pos_pair_dist': 30.629588213833895, 'min_neg_pair_dist': 29.431094017895784}
2024-07-04 08:15:05.338 | INFO     | opr.trainers.place_recognition.unimodal:_loop_epoch:244 - => Val stage:
torch.Size([256, 256]) torch.Size([256, 768])
Val:  50%|█████     | 1/2 [05:15<05:15, 315.79s/it]
2024-07-04 08:25:05.089 | INFO     | opr.trainers.place_recognition.unimodal:_loop_epoch:277 - Val time: 09:59
2024-07-04 08:25:05.092 | INFO     | opr.trainers.place_recognition.unimodal:_loop_epoch:278 - Val stats: {'loss': 6.186217784881592, 'avg_embedding_norm': 14.795476913452148, 'num_triplets': 256.0, 'num_non_zero_triplets': 256.0, 'non_zero_rate': 1.0, 'max_pos_pair_dist': 16.005375385284424, 'max_neg_pair_dist': 9.59437608718872, 'mean_pos_pair_dist': 12.089282035827637, 'mean_neg_pair_dist': 6.105823278427124, 'min_pos_pair_dist': 7.900232791900635, 'min_neg_pair_dist': 0.8727841973304749}
2024-07-04 08:25:05.095 | INFO     | opr.trainers.place_recognition.unimodal:test:172 - => Test stage:
torch.Size([256, 256]) torch.Size([256, 768])
Calculating test set descriptors:  25%|██▌       | 2/8 [12:59<37:45, 377.56s/it]

Note that the test stage run during training uses `distance_threshold = 25`, whereas the metrics in Yandex Contest are computed with `distance_threshold = 5`, so we recompute the test metrics with the contest threshold.

In [None]:
trainer.test(test_dl, distance_threshold=5)
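To illustrate why the threshold matters, here is a simplified sketch of a Recall@1 computation. It is not the trainer's actual code: the function `recall_at_1`, the toy arrays, and the assumption that every frame has 2D coordinates in the same units as the threshold are all hypothetical. A retrieval counts as correct when the retrieved database frame lies within `distance_threshold` of the query position.

```python
import numpy as np
from sklearn.neighbors import KDTree


def recall_at_1(query_embs, db_embs, query_xy, db_xy, distance_threshold):
    """Share of queries whose nearest database descriptor is geographically close.

    Hypothetical helper: the coordinates and the threshold are assumed to be
    expressed in the same units.
    """
    tree = KDTree(db_embs)
    _, nn_idx = tree.query(query_embs, k=1)   # nearest neighbor in descriptor space
    retrieved_xy = db_xy[nn_idx[:, 0]]        # positions of the retrieved frames
    geo_dist = np.linalg.norm(query_xy - retrieved_xy, axis=1)
    return float(np.mean(geo_dist <= distance_threshold))


# Toy data: two queries, three database frames.
db_embs = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
db_xy = np.array([[0.0, 0.0], [10.0, 0.0], [100.0, 100.0]])
query_embs = np.array([[0.1, 0.0], [0.9, 0.1]])
query_xy = np.array([[3.0, 0.0], [0.0, 0.0]])

print(recall_at_1(query_embs, db_embs, query_xy, db_xy, distance_threshold=25))
print(recall_at_1(query_embs, db_embs, query_xy, db_xy, distance_threshold=5))
```

On this toy data the second query retrieves a frame 10 units away, so it counts as a hit with a threshold of 25 but as a miss with a threshold of 5. The same model can therefore score noticeably lower under the stricter contest setting.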

## Preparing a submission for upload to the server

Example code for creating a submission file. You can modify the pipeline, for example by adding candidate re-ranking based on some additional features (as in [Patch-NetVLAD](https://arxiv.org/abs/2103.01486), for instance).

In [None]:
import itertools

import pandas as pd
from sklearn.neighbors import KDTree
import numpy as np
import torch
from tqdm import tqdm


def extract_embeddings(model, descriptor_key, dataloader, device):
    """Run the model over the dataloader and stack the selected descriptors into one array."""
    model = model.to(device)
    model.eval()
    test_embeddings_list = []
    with torch.no_grad():
        for batch in tqdm(dataloader, desc="Calculating test set descriptors"):
            # Move every tensor in the batch to the target device.
            batch = {key: value.to(device) for key, value in batch.items()}
            batch_embeddings = model(batch)
            test_embeddings_list.append(batch_embeddings[descriptor_key].cpu().numpy())
    return np.vstack(test_embeddings_list)


def test_submission(
    test_embeddings: np.ndarray, dataset_df: pd.DataFrame, filename: str = "submission.txt"
) -> None:
    """Create a submission txt file.

    For every ordered pair of tracks (query track, database track), the index of
    the nearest database embedding is written for each query, one index per line.

    Args:
        test_embeddings (np.ndarray): Array of embeddings.
        dataset_df (pd.DataFrame): Test dataset dataframe ('test.csv').
        filename (str): Name of the output txt file. Defaults to "submission.txt".
    """
    # Row indices of each track. They are used to index `test_embeddings`,
    # so the dataframe index must match the row order of the embeddings.
    tracks = [group.index.to_numpy() for _, group in dataset_df.groupby("track")]
    n = 1  # number of nearest neighbors retrieved per query
    ij_permutations = sorted(itertools.permutations(range(len(tracks)), 2))
    # e.g. for 3 tracks: [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]

    submission_lines = []

    for i, j in tqdm(ij_permutations, desc="Calculating metrics"):
        query_indices = tracks[i]
        database_indices = tracks[j]
        query_embs = test_embeddings[query_indices]
        database_embs = test_embeddings[database_indices]

        database_tree = KDTree(database_embs)
        _, indices = database_tree.query(query_embs, k=n)

        # Take the top-1 column explicitly: `squeeze()` would return a 0-d array
        # (and break the conversion to a list) for a track with a single query.
        submission_lines.extend(database_indices[indices[:, 0]].tolist())

    with open(filename, "w") as f:
        for line in submission_lines:
            f.write(f"{line}\n")

In [None]:
embeddings = extract_embeddings(model, descriptor_key="final_descriptor", dataloader=test_dl, device='cuda')
test_submission(embeddings, dataset_df=test_dl.dataset.dataset_df, filename="baseline_submission.txt")
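The re-ranking idea mentioned above can be sketched as follows. This is a minimal illustration, not Patch-NetVLAD itself: it retrieves the top-`k` candidates with the global descriptor and then re-orders them by distance between a second, auxiliary descriptor. The function `rerank_top_k` and the auxiliary descriptors `query_aux` / `db_aux` are hypothetical; they stand for any extra feature you decide to compute per frame.

```python
import numpy as np
from sklearn.neighbors import KDTree


def rerank_top_k(query_embs, db_embs, query_aux, db_aux, k=5):
    """Return one database row per query after re-ranking the top-k candidates.

    Hypothetical helper: `query_aux` and `db_aux` are auxiliary per-frame
    descriptors (one row per frame) used only for the second-stage ranking.
    """
    k = min(k, len(db_embs))
    tree = KDTree(db_embs)
    _, candidates = tree.query(query_embs, k=k)          # shape: (n_queries, k)
    # Distance between each query and each of its candidates in the auxiliary space.
    aux_dist = np.linalg.norm(db_aux[candidates] - query_aux[:, None, :], axis=2)
    best = np.argmin(aux_dist, axis=1)                   # best candidate per query
    return candidates[np.arange(len(candidates)), best]


# Toy data: the global descriptor prefers frame 0, the auxiliary one prefers frame 1.
db_embs = np.array([[0.0], [1.0], [10.0]])
db_aux = np.array([[5.0], [0.0], [0.0]])
query_embs = np.array([[0.4]])
query_aux = np.array([[0.0]])

print(rerank_top_k(query_embs, db_embs, query_aux, db_aux, k=2))
```

The returned values are row positions within the database array, so inside `test_submission` they would still need to be mapped through `database_indices`, just like the KD-tree output.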

## Text information

To solve the task, you can also use the OCR pipeline, an [implementation](https://colab.research.google.com/drive/1TXGPCjOdi7auAIoHOspBqS1Tr-c9hHym?usp=sharing) of which is provided in the OpenPlaceRecognition library. This way, in addition to text embeddings, the text found in the scene can be used to predict the place.

The submission file must be uploaded to Yandex Contest: https://contest.yandex.ru/contest/63631

Good luck!
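One possible way to use recognized scene text is to compare the sets of strings found in the query frame and in each retrieved candidate. The sketch below assumes the OCR output has already been collected as a list of strings per frame. That format, as well as the helper names `text_jaccard` and `pick_by_text`, is an assumption for illustration and not the output format of the OpenPlaceRecognition OCR pipeline.

```python
def text_jaccard(query_texts, candidate_texts):
    """Jaccard similarity between two collections of recognized strings."""
    q = {t.strip().lower() for t in query_texts if t.strip()}
    c = {t.strip().lower() for t in candidate_texts if t.strip()}
    if not q or not c:
        return 0.0  # no text on one of the frames: no evidence either way
    return len(q & c) / len(q | c)


def pick_by_text(query_texts, candidates_texts, candidate_ids):
    """Pick the candidate with the largest text overlap.

    Candidates are expected in embedding-ranked order; on ties (including
    the case of zero overlap everywhere) the first candidate is kept.
    """
    scores = [text_jaccard(query_texts, texts) for texts in candidates_texts]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidate_ids[best]


# Toy example: the second candidate shares the sign "Room 101" with the query.
print(pick_by_text(["Room 101"], [["cafe"], ["room 101", "exit"]], [7, 9]))
```

Exact string matching is fragile with noisy OCR output, so in practice a fuzzy string comparison may work better. The sketch only shows where such a score could enter the retrieval pipeline.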