In [None]:
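import sys, subprocess
if "google.colab" in sys.modules:
    # Use the running interpreter's pip and fail loudly if the install breaks.
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", "pandas", "numpy", "scikit-learn", "requests", "pydantic", "jsonschema"], check=True)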


# Embeddings Client Example

**What this notebook does:** Uses the local TF-IDF-based `EmbeddingsClient` to generate vectors, compute similarity matrices, and run simple search over synthetic text.

**Why it matters:** Provides a dependency-light stand-in for remote embeddings so you can design similarity workflows without API keys.

**How to use it:**
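1. Run locally from inside the repository (the import cell looks for `api/python/client_embeddings.py` in the working directory and up to two levels above it) or in Colab.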
2. Execute cells to embed sample texts, view shapes, and inspect pairwise similarity; adjust `max_features` or inputs to match your corpus.

**Expected outcome:** Numeric embeddings, similarity scores, and top-k search results demonstrating baseline text similarity patterns.
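
For orientation, the kind of TF-IDF embedding described above can be sketched directly with scikit-learn. This is an illustration only: it assumes `EmbeddingsClient` behaves roughly like a `TfidfVectorizer` with a capped vocabulary, which the notebook does not confirm, and the real implementation in `api/python/client_embeddings.py` may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "Synthetic research abstract about reproducibility.",
    "Notes on experimental design and treatment arms.",
    "Overview of responsible AI documentation practices.",
]

# Assumption: max_features caps the vocabulary size, as it does in scikit-learn.
vectorizer = TfidfVectorizer(max_features=32)

# fit_transform returns a sparse matrix; toarray() gives a dense (n_texts, n_terms) array.
vectors = vectorizer.fit_transform(texts).toarray()

print(vectors.shape)
```

In scikit-learn, the number of columns is the smaller of `max_features` and the number of distinct terms in the corpus, so a small corpus can yield fewer than 32 dimensions.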

In [None]:
import sys
from pathlib import Path

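# Look for the repository root in the working directory and up to two levels above it,
# so the import below works whether the notebook runs from the root or a subfolder.
repo_root = Path.cwd()
for candidate in [repo_root, repo_root.parent, repo_root.parent.parent]:
    if (candidate / "api" / "python" / "client_embeddings.py").exists():
        sys.path.append(str(candidate))
        break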

from api.python.client_embeddings import EmbeddingsClient

texts = [
    "Synthetic research abstract about reproducibility.",
    "Notes on experimental design and treatment arms.",
    "Overview of responsible AI documentation practices.",
]

client = EmbeddingsClient(max_features=32)
embeddings = client.embed(texts)
embeddings.shape


## Pairwise similarity
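
As a rough picture of what a pairwise similarity matrix over TF-IDF vectors looks like, here is a minimal sketch using scikit-learn's `cosine_similarity`. It assumes cosine similarity is the metric of interest; the notebook does not state which metric `client.similarity` uses. The `docs` list is made up for this example and deliberately shares some words, because with a plain word-level TF-IDF the three sample texts above have no terms in common and would score zero against each other.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical documents with overlapping vocabulary (not from the notebook).
docs = [
    "reproducible research needs shared data",
    "shared data supports reproducible analysis",
    "treatment arms in experimental design",
]

tfidf = TfidfVectorizer().fit_transform(docs)

# Result is an (n_docs, n_docs) symmetric matrix with ones on the diagonal.
sim = cosine_similarity(tfidf)

print(sim.round(2))
```

Entry `sim[i, j]` compares document `i` with document `j`; documents that share no terms score zero.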

In [None]:
similarity = client.similarity(texts)
similarity


### If you get stuck / What to try next

**If you get stuck:** Rerun the install cell, then rerun the cell that defines `texts` and `client` before calling `embed` or `similarity`.

**What to try next:** Connect the embeddings to a retrieval workflow, or use the Streamlit Text Workflows page for qualitative checks.
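
As one possible starting point for a retrieval workflow, the sketch below ranks texts against a query by cosine similarity over TF-IDF vectors. The `top_k` helper is hypothetical and built on scikit-learn rather than on `EmbeddingsClient`, whose search interface is not shown in this notebook.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Synthetic research abstract about reproducibility.",
    "Notes on experimental design and treatment arms.",
    "Overview of responsible AI documentation practices.",
]


def top_k(query, corpus, k=2):
    """Hypothetical helper: return (index, score) pairs for the k best matches."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(corpus)
    # Project the query into the vocabulary learned from the corpus.
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    # Sort scores in descending order and keep the first k indices.
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


results = top_k("experimental design notes", corpus, k=2)
for index, score in results:
    print(index, round(score, 3), corpus[index])
```

Query words that never appear in the corpus are ignored by the vectorizer, so a query with no overlap scores zero against every text; the order among tied scores is not meaningful.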