> `hover` supports bulk-labeling images through their URLs.
>
> :bulb: Let's do a quickstart for images and note what differs from text.

-   <details open><summary>This page assumes that you know the basics</summary>
    i.e. simple usage of `dataset` and `annotator`. Please visit the [quickstart tutorial](/hover/pages/tutorial/t0-quickstart) if you haven't done so.

</details>

## **Dataset for Images**

`hover` handles images through their URLs. URLs are strings, so they are easy to store, hash, and look up. They are also convenient for rendering tooltips in the annotation interface.

As with `SupervisableTextDataset`, we can build a dataset for images:

In [1]:
from hover.core.dataset import SupervisableImageDataset
import pandas as pd

# this is a 1000-image-url set of ImageNet data
# with custom labels: animal, object, food
example_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.7.0/imagenet_custom.csv"
df = pd.read_csv(example_csv_path).sample(frac=1).reset_index(drop=True)
df["SUBSET"] = "raw"
# .loc slices include both endpoints, so the ranges are kept non-overlapping
df.loc[500:799, "SUBSET"] = "train"
df.loc[800:899, "SUBSET"] = "dev"
df.loc[900:, "SUBSET"] = "test"

dataset = SupervisableImageDataset.from_pandas(df, feature_key="image", label_key="label")

# each subset can be accessed as its own DataFrame
dataset.dfs["raw"].head(5)

| Unnamed: 0 | image | label | SUBSET |
|---|---|---|---|
| 0 | https://raw.githubusercontent.com/phurwicz/ima... | ABSTAIN | raw |
| 1 | https://raw.githubusercontent.com/phurwicz/ima... | ABSTAIN | raw |
| 2 | https://raw.githubusercontent.com/phurwicz/ima... | ABSTAIN | raw |
| 3 | https://raw.githubusercontent.com/phurwicz/ima... | ABSTAIN | raw |
| 4 | https://raw.githubusercontent.com/phurwicz/ima... | ABSTAIN | raw |
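
Assigning subsets by row range depends on how `.loc` slices behave: they are label-based and include both endpoints. The following is a minimal sketch on a toy frame (the ten `img_*.jpg` names are made up for illustration) that shows the resulting subset sizes:

```python
import pandas as pd

# toy frame standing in for the 1000-row image table (hypothetical file names)
df = pd.DataFrame({"image": [f"img_{i}.jpg" for i in range(10)]})
df["SUBSET"] = "raw"

# .loc slices include BOTH endpoints: 5:7 covers rows 5, 6 and 7
df.loc[5:7, "SUBSET"] = "train"
df.loc[8:, "SUBSET"] = "dev"

# count rows per subset as plain Python ints
counts = {k: int(v) for k, v in df["SUBSET"].value_counts().items()}
print(counts)
```

Checking the counts like this before building the dataset is a cheap way to confirm that each subset has the intended size.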


## **Vectorizer for Images**

We can follow a `URL -> content -> image object -> vector` path.

In [2]:
import requests
from functools import lru_cache

@lru_cache(maxsize=10000)
def url_to_content(url):
    """
    Turn a URL to response content.
    """
    response = requests.get(url)
    return response.content

In [3]:
from PIL import Image
from io import BytesIO

@lru_cache(maxsize=10000)
def url_to_image(url):
    """
    Turn a URL to a PIL Image.
    """
    img = Image.open(BytesIO(url_to_content(url))).convert("RGB")
    return img
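
To see what `lru_cache` buys here, the sketch below swaps the network request for a hypothetical `fake_fetch` function (not part of `hover` or `requests`) and inspects the cache statistics after a repeated lookup:

```python
from functools import lru_cache

fetch_log = []  # records every call that actually ran the function body

@lru_cache(maxsize=10000)
def fake_fetch(url):
    """Hypothetical stand-in for a network request."""
    fetch_log.append(url)
    return f"content of {url}".encode()

# the third lookup repeats the first URL and is served from the cache
for url in ["a.jpg", "b.jpg", "a.jpg"]:
    fake_fetch(url)

info = fake_fetch.cache_info()
print(info.hits, info.misses)
```

Note that `lru_cache` keeps results in memory only, so the cache is lost when the Python session ends.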

-   <details open><summary>Caching and reading from disk</summary>
    For the vectorizer, this guide uses [`@wrappy.memoize`](https://erniethornhill.github.io/wrappy/) in place of `@functools.lru_cache` for caching.

    -   The benefit is that `wrappy.memoize` can persist the cache to disk, speeding up code across sessions.

    Cached values for this guide have been pre-computed, making it much faster to run the guide.

</details>
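
The idea of persisting a cache to disk can be illustrated with the standard library alone. The sketch below is not how `wrappy` implements it; `disk_memoize` is a hypothetical helper that pickles a plain dictionary, and it assumes a function of one hashable argument:

```python
import os
import pickle
import tempfile
from functools import wraps

def disk_memoize(persist_path):
    """Hypothetical helper: memoize a one-argument function and pickle the cache."""
    # load earlier results if a cache file already exists
    if os.path.exists(persist_path):
        with open(persist_path, "rb") as f:
            cache = pickle.load(f)
    else:
        cache = {}

    def decorator(func):
        @wraps(func)
        def wrapper(key):
            if key not in cache:
                cache[key] = func(key)
                # rewrite the whole file on every miss: simple, but slow for large caches
                with open(persist_path, "wb") as f:
                    pickle.dump(cache, f)
            return cache[key]
        return wrapper
    return decorator

calls = []  # records every real computation
cache_file = os.path.join(tempfile.mkdtemp(), "demo_cache.pkl")

@disk_memoize(cache_file)
def slow_length(url):
    calls.append(url)
    return len(url)

slow_length("https://example.com/a.jpg")
slow_length("https://example.com/a.jpg")  # served from the cache
```

Because the cache file outlives the process, a later session that decorates a function with the same path starts with the earlier results already loaded.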

In [4]:
import torch
import wrappy
from efficientnet_pytorch import EfficientNet
from torchvision import transforms

# EfficientNet is a series of pre-trained models
# https://github.com/lukemelas/EfficientNet-PyTorch
effnet = EfficientNet.from_pretrained("efficientnet-b0")
effnet.eval()

# standard transformations for ImageNet-trained models
tfms = transforms.Compose(
    [
        transforms.Resize(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]
)

@wrappy.memoize(cache_limit=10000, persist_path='custom_cache/image_url_to_vector.pkl')
def vectorizer(url):
    """
    Using logits on ImageNet-1000 classes.
    """
    img = tfms(url_to_image(url)).unsqueeze(0)

    with torch.no_grad():
        outputs = effnet(img)

    return outputs.detach().numpy().flatten()

Downloading: "https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b0-355c32eb.pth" to /home/runner/.cache/torch/hub/checkpoints/efficientnet-b0-355c32eb.pth



Loaded pretrained weights for efficientnet-b0
ℹ Persisting __main__.vectorizer() output to custom_cache/image_url_to_vector.pkl.


## **Embedding and Plot**

This is exactly the same as in the quickstart, just switching to image data:

In [5]:
# any kwargs will be passed onto the corresponding reduction
# for umap: https://umap-learn.readthedocs.io/en/latest/parameters.html
# for ivis: https://bering-ivis.readthedocs.io/en/latest/api.html
reducer = dataset.compute_nd_embedding(vectorizer, "umap", dimension=2)

Vectorizing: 100%|██████████| 1000/1000 [05:41<00:00,  2.93it/s]


In [6]:
from hover.recipes.stable import simple_annotator

interactive_plot = simple_annotator(dataset)

# ---------- NOTEBOOK MODE: for your actual Jupyter environment ---------
# this code will render the entire plot in Jupyter
# from bokeh.io import show, output_notebook
# output_notebook()
# show(interactive_plot, notebook_url='https://localhost:8888')

-   <details open><summary>What's special for images?</summary>
    **Tooltips**

    For text, the tooltip shows the original value.

    For images, the tooltip embeds the image based on URL.

    -   images on the local file system need to be served, for example through [`python -m http.server`](https://docs.python.org/3/library/http.server.html).
    -   they can then be accessed through `http://localhost:<port>/relative/path/to/file`.

    **Search**

    For text, the search widget is based on regular expressions.

    For images, the search widget is based on vector cosine similarity.

    -   the `dataset` remembers the `vectorizer` under the hood and passes it to the `annotator`.
    -   {== please [**let us know**](https://github.com/phurwicz/hover/issues/new) if you think there's a better way to search images in this case. ==}

</details>
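
To make the idea of similarity-based search concrete, here is a minimal sketch of ranking vectors by cosine similarity with NumPy. It is an illustration only: `cosine_search` is a hypothetical function, and the search widget in `hover` may compute and use similarity scores differently.

```python
import numpy as np

def cosine_search(query_vec, vectors, top_k=3):
    """Return indices and scores of the top_k rows of `vectors` closest to `query_vec`."""
    vectors = np.asarray(vectors, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    # dividing the dot product by both norms gives cosine similarity
    row_norms = np.linalg.norm(vectors, axis=1)
    query_norm = np.linalg.norm(query_vec)
    # the small floor guards against division by zero for all-zero vectors
    scores = vectors @ query_vec / np.maximum(row_norms * query_norm, 1e-12)
    # highest similarity first
    order = np.argsort(-scores)[:top_k]
    return order, scores[order]

# toy 2-D vectors standing in for the vectorizer output of four images
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
indices, scores = cosine_search([1.0, 0.0], vectors, top_k=2)
print(indices, scores)
```

In the image setting, the query vector would come from applying the same `vectorizer` to the URL of a query image, so that all vectors live in the same space.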