[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AnyLoc/AnyLoc/blob/main/demo/anyloc_vlad_generate_colab.ipynb)

# AnyLoc VLAD DINOv2 Descriptors

Given a folder of images, this notebook generates one global descriptor per image and stores the results in another folder. The global descriptors are created by VLAD aggregation of DINOv2 features taken from a particular layer and facet of the transformer (the defaults are those used in the paper).
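To make the aggregation step concrete, here is a minimal hard-assignment VLAD sketch in NumPy. It assumes one local descriptor per image patch and a precomputed set of cluster centers (the vocabulary). The function name `vlad_aggregate` is hypothetical, and the `VLAD` class in `utilities.py` may differ in details such as the distance used for assignment or which normalizations it applies.

```python
import numpy as np


def vlad_aggregate(descs: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Hypothetical minimal hard-assignment VLAD (not the utilities.py class).

    descs:   [N, D] local descriptors (e.g. one per image patch)
    centers: [K, D] cluster centers (the vocabulary)
    returns: [K * D] L2-normalized global descriptor
    """
    # Assign every local descriptor to its nearest center (squared L2 distance)
    dists = ((descs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # [N, K]
    assign = dists.argmin(axis=1)  # [N]
    k, d = centers.shape
    vlad = np.zeros((k, d), dtype=np.float64)
    for c in range(k):
        members = descs[assign == c]
        if len(members):
            # Sum of residuals between the members and their assigned center
            vlad[c] = (members - centers[c]).sum(axis=0)
    # Intra-normalization: L2-normalize each cluster's residual vector
    norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    vlad = np.divide(vlad, norms, out=np.zeros_like(vlad), where=norms > 0)
    # Flatten and L2-normalize the whole descriptor
    vlad = vlad.reshape(-1)
    total = np.linalg.norm(vlad)
    return vlad / total if total > 0 else vlad


rng = np.random.default_rng(0)
descs = rng.standard_normal((100, 8))   # 100 toy "patch" descriptors, D = 8
centers = rng.standard_normal((4, 8))   # K = 4 toy cluster centers
gd = vlad_aggregate(descs, centers)
print(gd.shape)  # (32,)
```

The output length is always `K * D`, independent of how many patches the image produced, which is what lets images of different sizes be compared with a single vector each.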

We'll use images from [FAB-MAP: Probabilistic Localization and Mapping in the Space of Appearance](https://www.robots.ox.ac.uk/~mobile/IJRR_2008_Dataset/) (data from [here](https://www.robots.ox.ac.uk/~mobile/IJRR_2008_Dataset/data.html)) as an example. This will be downloaded if it doesn't exist.


## Setup


### Google Colab

- Run this section only if you are running this notebook on Google Colab.
- If you're running this notebook on your local machine, skip to the `Downloading Data` sub-section.

In [1]:
# Ensure that utilities.py module is there
import os
import requests
if os.path.isfile('utilities.py'):
    print('Found utilities.py')
else:
    print("Could not find utilities.py, downloading it")
    url = "https://raw.githubusercontent.com/AnyLoc/AnyLoc/main/demo/utilities.py"
    file_data = requests.get(url, allow_redirects=True)
    file_data.raise_for_status()  # Fail loudly instead of saving an error page
    with open('utilities.py', 'wb') as handler:
        handler.write(file_data.content)

Found utilities.py


In [2]:
print("Verifying NVIDIA GPU is available")
!nvidia-smi -L
print("Please check that the GPU has at least 16 GB of free VRAM")
!nvidia-smi

Verifying NVIDIA GPU is available
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-e356fcd4-1d44-a564-e38c-d5c80c7a1b2e)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-5d97153c-baaa-6e31-90f3-1dd2a2dda19d)
Please see that the GPU has at least 16 GB VRAM free
Thu Aug 17 18:23:28 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 36%   29C    P8    16W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N
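Instead of reading the `nvidia-smi` table by eye, the free memory can be queried in machine-readable form. The sketch below assumes `nvidia-smi` is on the `PATH` and uses its `--query-gpu=memory.free --format=csv,noheader,nounits` mode, which prints one free-memory value in MiB per GPU; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def parse_free_mib(csv_text: str) -> List[int]:
    """Parse `nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits` output."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def gpus_with_free_vram(min_gib: float = 16.0) -> List[int]:
    """Return indices of GPUs with at least `min_gib` GiB free (empty if nvidia-smi is missing)."""
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    free_mib = parse_free_mib(out)
    # 1 GiB = 1024 MiB
    return [i for i, mib in enumerate(free_mib) if mib >= min_gib * 1024]
```

Calling `gpus_with_free_vram()` in a cell returns the indices of the GPUs that meet the 16 GB suggestion; an empty list means none qualify (or no NVIDIA driver is visible).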

Ensure that the required packages are installed.

In [None]:
# Install the utility libraries if any of them is missing
print("Trying to access utility libraries")
try:
    import einops
    import fast_pytorch_kmeans
    import distinctipy
    import onedrivedownloader
    import natsort   # Used in the "Import Everything" cell
    print("Can access utility libraries")
except ImportError:
    print("Installing utility libraries")
    !pip install fast_pytorch_kmeans
    !pip install einops
    !pip install distinctipy
    !pip install onedrivedownloader
    !pip install natsort
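The `try`/`except ImportError` pattern above reinstalls everything as soon as one import fails. A sketch that reports only the missing packages, using `importlib.util.find_spec` (which returns `None` for a top-level module that cannot be found), could look like this; the `REQUIRED` mapping and helper name are hypothetical.

```python
import importlib.util
from typing import Dict, List

# Import name -> pip package name (identical for the libraries used here)
REQUIRED: Dict[str, str] = {
    "einops": "einops",
    "fast_pytorch_kmeans": "fast_pytorch_kmeans",
    "distinctipy": "distinctipy",
    "onedrivedownloader": "onedrivedownloader",
    "natsort": "natsort",
}


def missing_packages(required: Dict[str, str]) -> List[str]:
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for mod_name, pip_name in required.items()
            if importlib.util.find_spec(mod_name) is None]


print(missing_packages(REQUIRED))
```

In a notebook, the result can be passed to pip with IPython's variable expansion, e.g. `!pip install {" ".join(missing_packages(REQUIRED))}`, guarded so that pip is not called with an empty list.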

### Downloading Data

Download the following:

- `cache`: Vocabulary (cluster centers) and test images
- `data`: Images that we'll use for testing


In [3]:
# Download cache data from OneDrive
import os   # Also imported in the Colab section, which local runs skip
from onedrivedownloader import download
from utilities import od_down_links
# Link
ln = od_down_links["cache"]
# Download and unzip
if os.path.isdir("./cache"):
    print("Cache folder already exists!")
else:
    print("Downloading the cache folder")
    download(ln, filename="cache.zip", unzip=True, unzip_path="./")
    print("Cache folder downloaded")

Cache folder already exists!


In [4]:
# Download the data (images) only if they are not already present
use_odrive = True   # If True, use personal OneDrive (not official links)
if os.path.isdir("./data/CityCenter/Images"):
    print("Images already downloaded, skipping")
elif use_odrive:
    print("Downloading images from OneDrive ...")
    imgs_link = od_down_links["test_imgs_od"]
    download(imgs_link, "./data/CityCenter/Images.zip", unzip=True, unzip_path="./data/CityCenter")
    print("Download and extraction of images from OneDrive completed")
else:
    print("Downloading from original source")
    imgs_link = od_down_links["test_imgs"]
    os.makedirs("./data/CityCenter", exist_ok=True)
    !wget $imgs_link -O ./data/CityCenter/Images.zip
    # Extract the archive (the next cell expects ./data/CityCenter/Images/)
    !unzip -q ./data/CityCenter/Images.zip -d ./data/CityCenter
    print("Extraction completed")

Downloading images from OneDrive ...
Unzipping file...


Extracting files: 100%|██████████| 2475/2475 [00:00<00:00, 140176.39it/s]

Download and extraction of images from OneDrive completed





In [5]:
# Ensure that everything went smoothly
import glob
_ex = lambda x: os.path.realpath(os.path.expanduser(x))
cache_dir: str = _ex("./cache")
imgs_dir = _ex("./data/CityCenter/Images/")
assert os.path.isdir(cache_dir), "Cache directory not found"
assert os.path.isdir(imgs_dir), "Invalid unzipping"
num_imgs = len(glob.glob(f"{imgs_dir}/*.jpg"))
print(f"Found {num_imgs} images in {imgs_dir}")

Found 2474 images in /scratch/avneesh.mishra/vl-vpr/apps/global_descriptors/data/CityCenter/Images


### Import Everything

In [6]:
# Import everything
import numpy as np
import cv2 as cv
import torch
from torch import nn
from torch.nn import functional as F
from torchvision import transforms as tvf
from torchvision.transforms import functional as T
from PIL import Image
import matplotlib.pyplot as plt
import distinctipy as dipy
from tqdm.auto import tqdm
from typing import Literal, List
import os
import natsort
import shutil
from copy import deepcopy
# DINOv2 imports
from utilities import DinoV2ExtractFeatures
from utilities import VLAD

## Building Global Descriptors

Save the global descriptors as NumPy arrays (`.npy` files) to a directory that mirrors the directory structure of the dataset.
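CityCenter keeps all images in one folder, so the loop below simply saves `<save_dir>/<image name>.npy`. For datasets with nested folders, one way to mirror the structure is to reuse each image's path relative to the image root. The helper `descriptor_path` below is hypothetical; the sketch writes and reloads a dummy descriptor in a temporary directory.

```python
import os
import tempfile

import numpy as np


def descriptor_path(img_path: str, imgs_root: str, save_root: str) -> str:
    """Map <imgs_root>/<rel>/<name>.jpg to <save_root>/<rel>/<name>.jpg.npy."""
    rel = os.path.relpath(img_path, imgs_root)
    return os.path.join(save_root, rel + ".npy")


with tempfile.TemporaryDirectory() as tmp:
    imgs_root = os.path.join(tmp, "Images")
    save_root = os.path.join(tmp, "GD_Images")
    out = descriptor_path(os.path.join(imgs_root, "seq1", "0001.jpg"),
                          imgs_root, save_root)
    # Create the mirrored sub-folder before saving
    os.makedirs(os.path.dirname(out), exist_ok=True)
    # np.save keeps the name as-is because it already ends in ".npy"
    np.save(out, np.zeros((1, 4), dtype=np.float32))
    loaded = np.load(out)
    print(os.path.relpath(out, tmp), loaded.shape)
```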


In [7]:
# Program parameters
save_dir = _ex("./data/CityCenter/GD_Images/")
device = torch.device("cuda")
# DINOv2 properties: layer, facet, and number of VLAD clusters
desc_layer: int = 31
desc_facet: Literal["query", "key", "value", "token"] = "value"
num_c: int = 32
# Domain for use case (deployment environment)
domain: Literal["aerial", "indoor", "urban"] = "urban"
# Maximum image dimension
max_img_size: int = 1024

In [8]:
# Ensure inputs are fine
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
    print(f"Creating directory: {save_dir}")
else:
    print("Save directory already exists, existing files may be overwritten!")

Creating directory: /scratch/avneesh.mishra/vl-vpr/apps/global_descriptors/data/CityCenter/GD_Images


### DINOv2 Extractor

Create the DINOv2 feature extractor and the base transformation applied to each image.

In [9]:
# DINO extractor
if "extractor" in globals():
    print("Extractor already defined, skipping")
else:
    extractor = DinoV2ExtractFeatures("dinov2_vitg14", desc_layer,
        desc_facet, device=device)
# Base image transformations
base_tf = tvf.Compose([
    tvf.ToTensor(),
    tvf.Normalize(mean=[0.485, 0.456, 0.406], 
                    std=[0.229, 0.224, 0.225])
])

Using cache found in /home2/avneesh.mishra/.cache/torch/hub/facebookresearch_dinov2_main
xFormers not available
xFormers not available
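`base_tf` first applies `ToTensor()`, which scales 8-bit RGB values to [0, 1] and reorders the image from HWC to CHW, and then `Normalize`, which subtracts the ImageNet per-channel mean and divides by the standard deviation. A NumPy sketch of the same arithmetic (the function name is hypothetical) is:

```python
import numpy as np

# ImageNet statistics, as passed to tvf.Normalize in base_tf
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_normalized_chw(img_hwc_uint8: np.ndarray) -> np.ndarray:
    """NumPy equivalent of ToTensor() followed by Normalize(MEAN, STD)."""
    x = img_hwc_uint8.astype(np.float32) / 255.0   # ToTensor: scale to [0, 1]
    x = (x - MEAN) / STD                           # Normalize each channel
    return np.transpose(x, (2, 0, 1))              # HWC -> CHW


img = np.full((2, 3, 3), 255, dtype=np.uint8)  # tiny all-white RGB image
out = to_normalized_chw(img)
print(out.shape, out[:, 0, 0])
```

A white pixel therefore maps to `(1 - mean) / std` in each channel, roughly 2.25, 2.43 and 2.64.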


### VLAD object

The VLAD object forms the global descriptors. This step also loads the cached cluster centers (vocabulary) for VLAD.


In [10]:
# Ensure that data is present
ext_specifier = f"dinov2_vitg14/l{desc_layer}_{desc_facet}_c{num_c}"
c_centers_file = os.path.join(cache_dir, "vocabulary", ext_specifier,
                            domain, "c_centers.pt")
assert os.path.isfile(c_centers_file), "Cluster centers not cached!"
c_centers = torch.load(c_centers_file)
assert c_centers.shape[0] == num_c, "Wrong number of clusters!"
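The vocabulary file location is derived from the model, layer, facet, cluster count and domain. The sketch below reproduces that path construction as a hypothetical helper, which makes it easy to see which file a different `domain` (or layer/facet) would require. Only the default combination is confirmed to exist by this run; whether other combinations are present depends on the contents of the downloaded cache.

```python
import os


def c_centers_path(cache_dir: str, desc_layer: int, desc_facet: str,
                   num_c: int, domain: str, model: str = "dinov2_vitg14") -> str:
    """Build <cache_dir>/vocabulary/<model>/l<layer>_<facet>_c<num_c>/<domain>/c_centers.pt."""
    ext_specifier = f"{model}/l{desc_layer}_{desc_facet}_c{num_c}"
    return os.path.join(cache_dir, "vocabulary", ext_specifier, domain,
                        "c_centers.pt")


print(c_centers_path("cache", 31, "value", 32, "urban"))
```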

In [11]:
# VLAD object
vlad = VLAD(num_c, desc_dim=None, 
        cache_dir=os.path.dirname(c_centers_file))
# Fit (load) the cluster centers (this'll also load the desc_dim)
vlad.fit(None)

Using cached cluster centers
Desc dim set to 1536
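Because VLAD concatenates one residual vector per cluster, the global descriptor has `num_c * desc_dim` entries. The arithmetic below uses the values printed in this notebook (32 clusters, 1536-dimensional features, 2474 images) and assumes the descriptors are stored as float32; the totals ignore the small `.npy` file header.

```python
num_c = 32            # number of VLAD clusters (parameters cell)
desc_dim = 1536       # DINOv2 feature size reported by vlad.fit
num_imgs = 2474       # images found in CityCenter/Images
bytes_per_float = 4   # assumption: descriptors stored as float32

agg_dim = num_c * desc_dim                    # one residual vector per cluster
bytes_per_img = agg_dim * bytes_per_float
total_mib = num_imgs * bytes_per_img / 1024 ** 2

print(agg_dim)        # 49152
print(bytes_per_img)  # 196608
print(total_mib)      # 463.875
```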


### Global Descriptor Generation

This is the main generation stage. For brevity, global descriptors are created only for the first 20 images here.

In [12]:
img_fnames = glob.glob(f"{imgs_dir}/*.jpg")
img_fnames = natsort.natsorted(img_fnames)
for img_fname in tqdm(img_fnames[:20]):
    # DINO features
    with torch.no_grad():
        pil_img = Image.open(img_fname).convert('RGB')
        img_pt = base_tf(pil_img).to(device)
        if max(img_pt.shape[-2:]) > max_img_size:
            c, h, w = img_pt.shape
            # Maintain aspect ratio
            if h == max(img_pt.shape[-2:]):
                w = int(w * max_img_size / h)
                h = max_img_size
            else:
                h = int(h * max_img_size / w)
                w = max_img_size
            print(f"To {(h, w) =}")
            img_pt = T.resize(img_pt, (h, w), 
                    interpolation=T.InterpolationMode.BICUBIC)
            print(f"Resized {img_fname} to {img_pt.shape = }")
        # Make image patchable: crop H and W down to multiples of the 14-pixel patch size
        c, h, w = img_pt.shape
        h_new, w_new = (h // 14) * 14, (w // 14) * 14
        img_pt = tvf.CenterCrop((h_new, w_new))(img_pt)[None, ...]
        # Extract descriptor
        ret = extractor(img_pt) # [1, num_patches, desc_dim]
    # VLAD global descriptor
    gd = vlad.generate(ret.cpu().squeeze()) # VLAD: shape [agg_dim]
    gd_np = gd.numpy()[np.newaxis, ...] # shape: [1, agg_dim]
    np.save(f"{save_dir}/{os.path.basename(img_fname)}.npy", gd_np)

  0%|          | 0/20 [00:00<?, ?it/s]

Done
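The loop above caps the longer image side at `max_img_size` while keeping the aspect ratio, then center-crops both sides down to a multiple of 14 so the image splits evenly into DINOv2 patches. The hypothetical helper below computes only the resulting sizes (no tensors involved), which is handy for estimating how many patches, and hence local descriptors, each image contributes to VLAD.

```python
from typing import Tuple

PATCH = 14  # patch size of the ViT-*/14 DINOv2 models


def target_size(h: int, w: int, max_img_size: int = 1024) -> Tuple[int, int]:
    """Mirror the loop above: cap the longer side at max_img_size (keeping the
    aspect ratio), then crop both sides down to a multiple of the patch size."""
    if max(h, w) > max_img_size:
        if h >= w:
            w = int(w * max_img_size / h)
            h = max_img_size
        else:
            h = int(h * max_img_size / w)
            w = max_img_size
    return (h // PATCH) * PATCH, (w // PATCH) * PATCH


h, w = target_size(1536, 2048)
print((h, w), (h // PATCH) * (w // PATCH))  # (756, 1022) 3942
```

Images already within the size limit are only cropped: a 480 × 640 image becomes 476 × 630.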