# Anchors Notebook

This notebook walks through how anchors work in ICAT ...

Anchors represent the features that the underlying model in ICAT uses as a basis for training and prediction. An anchor defines a function that, when provided a piece of text, returns a value indicating how strongly that text matches the anchor.

## Dictionary anchors

The most basic anchor that comes pre-implemented in ICAT is the dictionary anchor, which represents an exact keyword search, or a Bag of Words (BoW) feature. You provide one or more comma-separated keywords to search for, and the more times any of those words appears in a piece of text, the stronger the output value for that text.
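
The counting idea can be sketched in plain Python. This is only an illustration, not ICAT's implementation: the helper name `keyword_strength`, the whole-word matching, and the case-insensitive comparison are all assumptions made for the example.

```python
import re


def keyword_strength(text, keywords):
    """Count whole-word, case-insensitive occurrences of any keyword in text.

    Hypothetical helper for illustration; ICAT's own matching rules may differ.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    wanted = {k.strip().lower() for k in keywords}
    return sum(1 for tok in tokens if tok in wanted)


samples = [
    "Joey did you eat my sandwich?",
    "I am your sandwich.",
    "I like trains",
]

# Keywords are given as one comma-separated string and split into a list.
keywords = "sandwich, pie".split(",")

# One strength value per sample: more keyword hits means a larger value.
strengths = [keyword_strength(s, keywords) for s in samples]
print(strengths)
```

A text that mentions a keyword several times gets a proportionally larger count, which is the "stronger output" behavior described above.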

To demonstrate the anchors with ICAT itself, we define a tiny asdf-inspired dataframe of text samples for simple testing purposes.

In [None]:
import pandas as pd

rows = [
    {"text": "They said I could never teach a llama to drive!"},
    {"text": "I like trains"},
    {"text": "No llama, no!"},
    {"text": "You are a chair, darling."},
    {"text": "Beep. Beep. I'm a sheep. I said beep beep I'm a sheep."},
    {"text": "Hey kid, you can't skate here!"},
    {"text": "Ow, hey, muffin man do you ever run out of muffins?"},
    {"text": "I'm going to punch your face. IN THE FACE."},
    {"text": "Oh boy a pie, what flavor?"},
    {"text": "PIE FLAVOR."},
    {"text": "Joey did you eat my sandwich?"},
    {"text": "I am your sandwich."},
]
df = pd.DataFrame(rows)
df

In [None]:
from icat import DictionaryAnchor

# Create a dictionary anchor that reads text from the "text" column of a dataframe
sandwich_anchor = DictionaryAnchor(text_col="text")

In [None]:
sandwich_anchor.widget

In [None]:
sandwich_anchor.featurize(df)

In [None]:
# Set the keyword(s) the anchor searches for, then featurize again to compare
sandwich_anchor.keywords = ["sandwich"]

In [None]:
sandwich_anchor.featurize(df)

In [None]:
# Change the anchor's weight, then featurize again to compare the output
sandwich_anchor.weight = 3.0

In [None]:
sandwich_anchor.featurize(df)

## Similarity anchors

A similarity anchor returns an output strength based on a similarity score computed between a given piece of text and some target text, e.g. the cosine similarity between their TF-IDF vectors. While a TF-IDF based similarity comes implemented directly in ICAT, you can also provide a list of one or more custom similarity function definitions to the model constructor, and these will show up as options in the similarity anchor UI. This allows for arbitrary algorithms, such as embedding text with a transformer and calculating distances between the embeddings. See the other notebook [...] for an example of how to do this.
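
To make the TF-IDF cosine similarity idea concrete, here is a minimal standard-library sketch. It is not ICAT's implementation and does not show the signature ICAT expects for custom similarity functions; the helper names (`tokenize`, `tfidf_vectors`, `cosine`), the smoothed IDF formula, and the choice to include the target text in the corpus are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption), so common terms keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


texts = ["Oh boy a pie, what flavor?", "PIE FLAVOR.", "I like trains"]
target = "pie flavor"

# Vectorize the texts and the target together so they share one vocabulary.
*text_vecs, target_vec = tfidf_vectors(texts + [target])

# One similarity score per text; higher means closer to the target.
scores = [cosine(v, target_vec) for v in text_vecs]
print(scores)
```

Unlike a dictionary anchor, the score here depends on the whole vocabulary of the text: a sample that shares words with the target but also contains many unrelated words scores lower than one made up only of the target's words.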

## Custom anchors