In [None]:
import sys, subprocess
if "google.colab" in sys.modules:
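    # Install into the running kernel's interpreter and fail loudly if pip errors out.
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", "pandas", "numpy", "scikit-learn", "requests", "pydantic", "jsonschema"], check=True)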


# Embeddings Client Example

**What:** Generate local TF-IDF-based embeddings, compute similarity, and run a tiny search over sample texts.

**Why:** It offers a dependency-light stand-in for remote embeddings so you can design similarity workflows without API keys.

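**How:** Run the install cell in Colab if needed, then execute the cells in order. Embeddings here are numeric vectors built from text; cosine similarity measures the angle between two vectors, so a score near 1 means two texts use closely aligned vocabulary and a score near 0 means they share little.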

**You will learn:** How to produce and inspect embeddings, read similarity matrices, and retrieve top matches, all offline.

By the end of this notebook, you will have met each of the success criteria below.

### Success criteria
- You generated embeddings for sample texts.
- You computed similarity and inspected scores.
- You executed a simple search over the texts.
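
The notebook does not show how `EmbeddingsClient` builds its vectors, so the following is only a minimal sketch of the general idea, not the client's implementation. It assumes scikit-learn's `TfidfVectorizer` as the embedding step and `cosine_similarity` for scoring; the names `sample_texts`, `tfidf_vectors`, and `pairwise` are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sample_texts = [
    "Synthetic research abstract about reproducibility.",
    "Notes on experimental design and treatment arms.",
    "Overview of responsible AI documentation practices.",
]

# Assumption: cap the vocabulary at 32 terms, mirroring max_features=32 in the notebook.
vectorizer = TfidfVectorizer(max_features=32)

# Each row is one text; each column is the TF-IDF weight of one vocabulary term.
tfidf_vectors = vectorizer.fit_transform(sample_texts).toarray()
print(tfidf_vectors.shape)  # (number of texts, vocabulary size)

# Entry [i, j] is the cosine similarity between text i and text j.
pairwise = cosine_similarity(tfidf_vectors)
print(np.round(pairwise, 2))
```

With plain word-level TF-IDF, two texts that share no vocabulary score 0 against each other, and every text scores 1 against itself, so the diagonal of the matrix is all ones.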

In [None]:
import sys
from pathlib import Path

repo_root = Path.cwd()
for candidate in [repo_root, repo_root.parent, repo_root.parent.parent]:
    if (candidate / "api" / "python" / "client_embeddings.py").exists():
        sys.path.append(str(candidate))
        break

from api.python.client_embeddings import EmbeddingsClient

texts = [
    "Synthetic research abstract about reproducibility.",
    "Notes on experimental design and treatment arms.",
    "Overview of responsible AI documentation practices.",
]

client = EmbeddingsClient(max_features=32)
embeddings = client.embed(texts)
embeddings.shape


## Pairwise similarity

In [None]:
similarity = client.similarity(texts)
similarity
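
To turn similarity scores into a tiny search, one option is to embed a query with the same vocabulary as the corpus and rank the corpus by cosine similarity. The sketch below assumes scikit-learn rather than any search method on `EmbeddingsClient` (none is shown in this notebook); `top_matches`, `corpus`, and the query string are hypothetical names chosen for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def top_matches(query, corpus, k=2):
    """Return the k corpus entries most similar to query as (score, text) pairs."""
    vectorizer = TfidfVectorizer()
    # Fit the vocabulary on the corpus, then project the query into the same space.
    corpus_vectors = vectorizer.fit_transform(corpus)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, corpus_vectors)[0]
    # Sort indices by descending score and keep the first k.
    order = np.argsort(scores)[::-1][:k]
    return [(float(scores[i]), corpus[i]) for i in order]


corpus = [
    "Synthetic research abstract about reproducibility.",
    "Notes on experimental design and treatment arms.",
    "Overview of responsible AI documentation practices.",
]

results = top_matches("experimental design notes", corpus, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

The query shares words only with the second text, so that text ranks first. Query terms that never appear in the corpus are ignored by `transform`, which is why a query with no overlapping vocabulary scores 0 everywhere.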


### If you get stuck / What to try next

**If you get stuck:** Rerun the install cell and confirm that `texts` is defined before calling the client.

**What to try next:** Connect the embeddings to a retrieval workflow, or try the Streamlit Text Workflows page for qualitative checks.