> The most common usage of `hover` is through built-in `recipe`s like in the quickstart.
>
> :ferris_wheel: Let's explore another `recipe` -- an active learning example.

-   <details open><summary>Dependencies for {== local environments ==}</summary>
    When you run the code locally, you may need to install additional packages.

    To run the text embedding code on this page, you need:
    ```shell
    pip install spacy
    python -m spacy download en_core_web_md
    ```

    To render `bokeh` plots in Jupyter, you need:
    ```shell
    pip install jupyter_bokeh
    ```

    If you are using JupyterLab older than 3.0, use this instead ([reference](https://pypi.org/project/jupyter-bokeh/)):
    ```shell
    jupyter labextension install @jupyter-widgets/jupyterlab-manager
    jupyter labextension install @bokeh/jupyter_bokeh
    ```

</details>

## **Fundamentals**

Hover `recipe`s are functions that take a `SupervisableDataset` and return an annotation interface.

The `SupervisableDataset` is assumed to have some data and embeddings.

## **Recap: Data & Embeddings**

Let's prepare a dataset with embeddings. This is almost the same as in the [quickstart](../t0-quickstart/):

```python
from hover.core.dataset import SupervisableTextDataset
import pandas as pd

raw_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.5.0/20_newsgroups_raw.csv"
train_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.5.0/20_newsgroups_train.csv"

# for fast, low-memory demonstration purposes, sample the data
df_raw = pd.read_csv(raw_csv_path).sample(400)
df_raw["SUBSET"] = "raw"
df_train = pd.read_csv(train_csv_path).sample(400)
df_train["SUBSET"] = "train"
df_dev = pd.read_csv(train_csv_path).sample(100)
df_dev["SUBSET"] = "dev"
df_test = pd.read_csv(train_csv_path).sample(100)
df_test["SUBSET"] = "test"

# build overall dataframe and ensure feature type
df = pd.concat([df_raw, df_train, df_dev, df_test])
df["text"] = df["text"].astype(str)

# this class stores the dataset throughout the labeling process
dataset = SupervisableTextDataset.from_pandas(df, feature_key="text", label_key="label")
```

<br>
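The `SUBSET` column is what lets one concatenated table carry four partitions; later on, `dataset.dfs["raw"]` retrieves just the raw one. The plain-Python sketch below illustrates that grouping step with made-up rows. It is a conceptual illustration only, not `hover`'s implementation of `from_pandas`.

```python
# Hypothetical toy rows standing in for the concatenated DataFrame `df`;
# the texts are made up, and only the SUBSET column matters here.
rows = [
    {"text": "Is MSG really harmful?", "SUBSET": "raw"},
    {"text": "Orbital mechanics question", "SUBSET": "train"},
    {"text": "Best oil for a V8 engine", "SUBSET": "train"},
    {"text": "Graphics card recommendations", "SUBSET": "dev"},
    {"text": "Hockey playoff predictions", "SUBSET": "test"},
]


def split_by_subset(rows, subsets=("raw", "train", "dev", "test")):
    """Group rows into one list per subset, rejecting unknown subset names."""
    parts = {name: [] for name in subsets}
    for row in rows:
        subset = row["SUBSET"]
        if subset not in parts:
            raise ValueError(f"unexpected SUBSET value: {subset!r}")
        parts[subset].append(row)
    return parts


parts = split_by_subset(rows)
print({name: len(items) for name, items in parts.items()})
# {'raw': 1, 'train': 2, 'dev': 1, 'test': 1}
```

Because the demo samples `df_train`, `df_dev` and `df_test` independently from the same CSV, the same text can be drawn into more than one of these subsets. That is harmless for a quick walkthrough, but worth avoiding for a real evaluation split.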

```python
import spacy
import re
from functools import lru_cache

# use your preferred embedding for the task
nlp = spacy.load("en_core_web_md")

# raw data (str in this case) -> np.array
@lru_cache(maxsize=int(1e+4))
def vectorizer(text):
    clean_text = re.sub(r"[\s]+", r" ", str(text))
    return nlp(clean_text, disable=nlp.pipe_names).vector

text = dataset.dfs["raw"]().loc[0, "text"]
vec = vectorizer(text)
print(f"Text: {text}")
print(f"Vector shape: {vec.shape}")
```

```
Text:  There has been NO hard info provided about MSG making people ill. That's the point, after all.   That's because these "peer-reviewed" studies are not addressing the effects of MSG in people, they're looking at animal models. You can't walk away from this and start ranting about gloom and doom as if there were any documented deleterious health effects demonstrated in humans.  Note that I wouldn't have any argument with a statement like "noting that animal administration has pro- duced the following [blah, blah], we must be careful about its use in humans."  This is precisely NOT what you said.   It most certainly is for neurotoxicology.  You know, studies of glutamate involve more than "food science".   So, point us to the studies in humans, please.  I'm familiar with the literature, and I've never seen any which relate at all to Olney's work in animals and the effects of glutamate on neurons.   Well, actually, they HAVE to tolerate some phenylalanine; it's a essential amino acid
```
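Because of `@lru_cache`, repeated calls with the same text skip the embedding model. The cache is keyed on the argument exactly as passed, before the whitespace cleanup, so two strings that differ only in whitespace are still computed separately. The sketch below shows this with a hypothetical toy vectorizer in place of spaCy.

```python
import re
from functools import lru_cache


# Hypothetical stand-in for the spaCy-backed vectorizer, so this runs without spaCy:
# the "embedding" is (character count, word count, uppercase count) of the cleaned text.
@lru_cache(maxsize=int(1e4))
def toy_vectorizer(text):
    clean_text = re.sub(r"[\s]+", r" ", str(text))
    return (len(clean_text), len(clean_text.split()), sum(c.isupper() for c in clean_text))


toy_vectorizer("hello   world")  # miss: computed and stored
toy_vectorizer("hello   world")  # hit: same argument, cached result returned
toy_vectorizer("hello world")    # miss: different string, even though it cleans to the same text
info = toy_vectorizer.cache_info()
print(info.hits, info.misses)    # 1 2
```

One design consequence: `lru_cache` hands back the same object on every hit. With the NumPy arrays returned by the real `vectorizer`, modifying a returned vector in place would silently change the cached value, so treat the vectors as read-only.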

<br>

```python
# any kwargs will be passed on to the corresponding reduction
# for umap: https://umap-learn.readthedocs.io/en/latest/parameters.html
# for ivis: https://bering-ivis.readthedocs.io/en/latest/api.html
reducer = dataset.compute_nd_embedding(vectorizer, "umap", dimension=2)
```

<br>
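The comment in the `compute_nd_embedding` cell says extra keyword arguments go to the corresponding reduction. That is a forwarding pattern: the method name selects a reducer, and everything else is handed to that reducer's constructor. The sketch below illustrates the pattern with a hypothetical recording class instead of the real `umap.UMAP` / `ivis.Ivis`; the way `dimension` is mapped onto each library's own argument name is an assumption, not `hover`'s code.

```python
class RecordingReducer:
    """Hypothetical stand-in for umap.UMAP / ivis.Ivis that only records its kwargs."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


# Assumed registry: method name -> (reducer class, that library's name for the output dimension).
# umap.UMAP calls it `n_components`; ivis.Ivis calls it `embedding_dims`.
REDUCERS = {
    "umap": (RecordingReducer, "n_components"),
    "ivis": (RecordingReducer, "embedding_dims"),
}


def make_reducer(method, dimension=2, **kwargs):
    """Pick a reducer by name and forward every extra keyword argument to it."""
    if method not in REDUCERS:
        raise ValueError(f"unsupported method {method!r}; choose from {sorted(REDUCERS)}")
    reducer_cls, dimension_key = REDUCERS[method]
    return reducer_cls(**{dimension_key: dimension}, **kwargs)


reducer = make_reducer("umap", dimension=2, n_neighbors=15, min_dist=0.1)
print(reducer.kwargs)  # {'n_components': 2, 'n_neighbors': 15, 'min_dist': 0.1}
```

Forwarding untouched kwargs keeps the dispatcher small: any option documented by the chosen library can be passed without the dispatcher having to know about it.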

## **Recipe-Specific Ingredient**

Each recipe has its own functionality and potentially a different signature.

To utilize active learning, we need to specify how to get a model in the loop.

`hover` treats the `vectorizer` as a "frozen" embedding and follows it with a neural network, which infers its input dimension from the vectorizer and its output dimension from the label classes.
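To make the point about inferring dimensionality concrete, here is a plain-Python sketch of the idea, assuming a linear (logistic-regression-style) head: the input size comes from embedding one probe text, and the output size from the number of classes. `TinyVectorNet`, `toy_vectorizer` and the probe text are hypothetical; this is not `hover`'s `VectorNet`, which uses PyTorch.

```python
import math


def toy_vectorizer(text):
    """Hypothetical frozen embedding: 3 numbers per text instead of spaCy's vectors."""
    return [float(len(text)), float(len(text.split())), 1.0]


class TinyVectorNet:
    """Hypothetical sketch of 'frozen vectorizer + small network'; not hover's VectorNet."""

    def __init__(self, vectorizer, classes, probe_text="probe"):
        self.vectorizer = vectorizer
        self.classes = list(classes)
        # Input dimension: inferred by embedding one probe text.
        self.input_dim = len(vectorizer(probe_text))
        # Output dimension: one row of weights (one logit) per class.
        # Zero initialization makes every prediction uniform until training.
        self.weights = [[0.0] * self.input_dim for _ in self.classes]
        self.biases = [0.0] * len(self.classes)

    def _proba_one(self, text):
        vec = self.vectorizer(text)
        logits = [sum(w * x for w, x in zip(row, vec)) + b
                  for row, b in zip(self.weights, self.biases)]
        if not logits:  # no classes yet
            return []
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict_proba(self, inp):
        """A single string gives one probability vector; a list gives one per string."""
        if isinstance(inp, str):
            return self._proba_one(inp)
        return [self._proba_one(text) for text in inp]


net = TinyVectorNet(toy_vectorizer, ["sci.med", "sci.space", "rec.autos"])
print(net.input_dim)                        # 3
print(len(net.predict_proba("MSG study")))  # 3
print(len(net.predict_proba(["a", "b"])))   # 2
```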

-   This architecture, named [`VectorNet`](../../reference/core-neural/#hover.core.neural.VectorNet), is the default basis of active learning in `hover`.

-   <details open><summary>Custom models</summary>
    It is possible to use a model other than `VectorNet` or its subclasses.

    You will need to implement the following methods with the same signatures as `VectorNet`:

    -   [`train`](../../reference/core-neural/#hover.core.neural.VectorNet.train)
    -   [`save`](../../reference/core-neural/#hover.core.neural.VectorNet.save)
    -   [`predict_proba`](../../reference/core-neural/#hover.core.neural.VectorNet.predict_proba)
    -   [`prepare_loader`](../../reference/core-neural/#hover.core.neural.VectorNet.prepare_loader)
    -   [`manifold_trajectory`](../../reference/core-neural/#hover.core.neural.VectorNet.manifold_trajectory)
</details>

```python
from hover.core.neural import VectorNet
from hover.utils.common_nn import LogisticRegression

# Create a model with vectorizer-NN architecture.
# model.pt will point to a PyTorch state dict (to be created)
# the label classes in the dataset can change, and vecnet can adjust to that
vecnet = VectorNet(vectorizer, LogisticRegression, "model.pt", dataset.classes)

# predict_proba accepts an individual string or a list of strings
# text -> vector -> class probabilities
# if there are no classes yet, you will see an empty list
print(vecnet.predict_proba(text))
print(vecnet.predict_proba([text]))
```

```
[5.6972964e-05 3.9335554e-03 3.4759217e-04 3.7174271e-03 3.1787364e-04
 3.7430620e-04 1.7271064e-03 2.3209106e-03 1.6812733e-04 4.4767815e-04
 6.4497599e-03 2.6048366e-03 5.1584811e-06 3.4962548e-04 6.4243010e-05
 9.6696687e-01 2.6635025e-04 6.0568131e-03 1.1987500e-03 2.6260128e-03]
[[5.6972964e-05 3.9335554e-03 3.4759217e-04 3.7174271e-03 3.1787364e-04
  3.7430620e-04 1.7271064e-03 2.3209106e-03 1.6812733e-04 4.4767815e-04
  6.4497599e-03 2.6048366e-03 5.1584811e-06 3.4962548e-04 6.4243010e-05
  9.6696687e-01 2.6635025e-04 6.0568131e-03 1.1987500e-03 2.6260128e-03]]
```


Note how `vecnet` takes `dataset.classes`, which means the model architecture can adapt when we add classes during annotation.
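One way a model can adapt to a changing class list, assuming a linear output layer with one row per class, is to keep the learned rows of classes that still exist and add fresh rows for new ones. The function below is a hypothetical plain-Python sketch of that idea; it does not describe how `VectorNet` handles new classes internally.

```python
def resize_output_layer(weights, biases, old_classes, new_classes, input_dim):
    """Hypothetical sketch: rebuild a per-class linear layer for a new class list.

    Rows for classes that survive are copied over (in the new order);
    rows for brand-new classes start at zero.
    """
    old_index = {label: i for i, label in enumerate(old_classes)}
    new_weights, new_biases = [], []
    for label in new_classes:
        if label in old_index:
            i = old_index[label]
            new_weights.append(list(weights[i]))
            new_biases.append(biases[i])
        else:
            new_weights.append([0.0] * input_dim)
            new_biases.append(0.0)
    return new_weights, new_biases


weights = [[1.0, 2.0], [3.0, 4.0]]
biases = [0.5, -0.5]
new_w, new_b = resize_output_layer(
    weights, biases,
    old_classes=["sci.med", "sci.space"],
    new_classes=["sci.space", "sci.med", "rec.autos"],
    input_dim=2,
)
print(new_w)  # [[3.0, 4.0], [1.0, 2.0], [0.0, 0.0]]
print(new_b)  # [-0.5, 0.5, 0.0]
```

Matching rows by label rather than by position keeps learned weights attached to the right class even if the class order changes; classes that disappear from the list simply lose their rows.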

## :sparkles: **Apply Labels**

Now we invoke the `active_learning` recipe.

-   <details open><summary>Tips: how recipes work programmatically</summary>
    In general, a `recipe` is a function taking a `SupervisableDataset` and other arguments based on its functionality.

    Here are a few common recipes:

    === "active_learning"

        ::: hover.recipes.experimental.active_learning
            rendering:
              show_root_heading: false
              show_root_toc_entry: false

    === "simple_annotator"

        ::: hover.recipes.stable.simple_annotator
            rendering:
              show_root_heading: false
              show_root_toc_entry: false

    === "linked_annotator"

        ::: hover.recipes.stable.linked_annotator
            rendering:
              show_root_heading: false
              show_root_toc_entry: false

    The recipe returns a `handle` function which `bokeh` can use to visualize an annotation interface in multiple settings.
</details>
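The `handle` returned by a recipe follows Bokeh's function-handler convention: a callable that receives a Bokeh `Document` and attaches the interface to it with `doc.add_root(...)`. Bokeh's `show` accepts such a function in a notebook, and a Bokeh server calls it for each new session. The sketch below shows the shape of that pattern with a hypothetical toy recipe and a mock document, so it runs without Bokeh; a real recipe would add a Bokeh layout rather than a dict.

```python
class MockDocument:
    """Hypothetical stand-in for bokeh.document.Document: just records roots and title."""

    def __init__(self):
        self.title = ""
        self.roots = []

    def add_root(self, model):
        self.roots.append(model)


def toy_recipe(dataset, title="Toy annotator"):
    """Hypothetical recipe: takes a dataset (here a dict of lists), returns handle(doc)."""

    def handle(doc):
        # Build the interface inside handle, so each document gets its own objects;
        # Bokeh models cannot be shared between documents.
        interface = {name: len(rows) for name, rows in dataset.items()}
        doc.title = title
        doc.add_root(interface)

    return handle


handle = toy_recipe({"raw": ["a", "b"], "train": ["c"]})
doc = MockDocument()
handle(doc)
print(doc.title, doc.roots)  # Toy annotator [{'raw': 2, 'train': 1}]
```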

```python
from hover.recipes.experimental import active_learning

interactive_plot = active_learning(dataset, vecnet)

# ---------- NOTEBOOK MODE: for your actual Jupyter environment ---------
# this code will render the entire plot in Jupyter
# from bokeh.io import show, output_notebook
# output_notebook()
# show(interactive_plot, notebook_url='http://localhost:8888')
```

-   <details open><summary>Tips: annotation interface with multiple plots</summary>
    <details open><summary>Video guide: leveraging linked selection</summary>
        <iframe width="560" height="315" src="https://www.youtube.com/embed/TIwBlCH9YHw" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

    </details>

    <details open><summary>Video guide: active learning</summary>
        <iframe width="560" height="315" src="https://www.youtube.com/embed/hRIn3r7ovQ8" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

    </details>

    <details open><summary>Text guide: active learning</summary>
        Inspecting model predictions allows us to

        -   get an idea of how the current set of annotations will likely teach the model.
        -   locate the most valuable samples for further annotation.
    </details>

</details>
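A common way to "locate the most valuable samples" is uncertainty sampling: rank unlabeled samples by the entropy of their predicted class probabilities and annotate the least certain first. The sketch below is a generic version of that heuristic on made-up probabilities; it is not necessarily how `hover` ranks samples. With the model above, the probability rows could come from calling `vecnet.predict_proba` on a list of texts, which returns one row per text.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector; higher = less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(texts, probas, k):
    """Return the k texts whose predicted distributions have the highest entropy."""
    ranked = sorted(zip(texts, probas), key=lambda pair: entropy(pair[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Made-up predictions for three unlabeled texts over three classes.
texts = ["confident", "torn", "clueless"]
probas = [
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],
    [0.34, 0.33, 0.33],
]
print(most_uncertain(texts, probas, k=2))  # ['clueless', 'torn']
```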