# Analysis of Query-Target Difference Vectors in Spherical Coordinates

This notebook:
1. Takes a sample of queries and their most similar target vectors.
2. Converts each vector to spherical coordinates, then computes the query-target difference as n - 1 angular differences (for n-dimensional vectors).
3. Applies k-means clustering for a range of k values and plots the inertia and silhouette scores to identify the optimal k.
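
Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code in `citeline.helpers.coordinate_converters`: the function names `to_spherical_angles`, `angular_difference`, and `sweep_k` are hypothetical, the standard hyperspherical convention is assumed (the first n - 2 angles in [0, π], the last from `arctan2`), the difference is taken as target minus query with no wrap-around handling, and scikit-learn is assumed for the clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def to_spherical_angles(x):
    """Return the n - 1 hyperspherical angles of an n-dimensional vector."""
    x = np.asarray(x, dtype=float)
    # tail[k] = sqrt(x[k]^2 + ... + x[n-1]^2)
    tail = np.sqrt(np.cumsum(x[::-1] ** 2)[::-1])
    angles = np.empty(x.size - 1)
    head, denom = x[:-2], tail[:-2]
    # Where the tail norm is zero the angle is undefined; use 0 by convention.
    ratio = np.divide(head, denom, out=np.ones_like(denom), where=denom > 0)
    angles[:-1] = np.arccos(np.clip(ratio, -1.0, 1.0))
    # The last angle carries the sign information of the final two components.
    angles[-1] = np.arctan2(x[-1], x[-2])
    return angles


def angular_difference(query_vec, target_vec):
    """Assumed convention: target angles minus query angles (no wrapping)."""
    return to_spherical_angles(target_vec) - to_spherical_angles(query_vec)


def sweep_k(diffs, k_values, seed=0):
    """Fit k-means for each k; return (k, inertia, silhouette) tuples."""
    results = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(diffs)
        score = silhouette_score(diffs, model.labels_)
        results.append((k, model.inertia_, score))
    return results
```

The tuples returned by `sweep_k` can be unpacked into two series and passed to `plt.plot` to look for an elbow in the inertia and a peak in the silhouette score.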

If there is meaningful clustering, then:
1. For each cluster, we can compute the average difference vector (in spherical coordinates).
1. For new queries, if we can identify which cluster they belong to, we can apply the corresponding difference vector.
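
One way this could look in code, as a sketch under stated assumptions: the helper names below are hypothetical, the per-cluster average is a plain arithmetic mean of the angles (which ignores wrap-around), and the cluster label of a new query is taken as given, since how to assign it is left open here. The shifted angles are mapped back to a unit vector so they can be used in a similarity search.

```python
import numpy as np


def cluster_mean_differences(diffs, labels):
    """Map each cluster label to the mean of its angular difference vectors."""
    diffs = np.asarray(diffs, dtype=float)
    labels = np.asarray(labels)
    return {int(c): diffs[labels == c].mean(axis=0) for c in np.unique(labels)}


def angles_to_unit_vector(angles):
    """Convert n - 1 hyperspherical angles to an n-dimensional unit vector."""
    angles = np.asarray(angles, dtype=float)
    # Running products of sines: [1, s1, s1*s2, ..., s1*...*s_{n-1}]
    sin_prod = np.concatenate(([1.0], np.cumprod(np.sin(angles))))
    # Cosines of each angle, with 1 for the final component
    cos_terms = np.concatenate((np.cos(angles), [1.0]))
    return sin_prod * cos_terms


def shift_query(query_angles, label, mean_diffs):
    """Add the cluster's mean difference to the query angles (label assumed known)."""
    shifted = np.asarray(query_angles, dtype=float) + mean_diffs[label]
    return angles_to_unit_vector(shifted)
```

Shifted angles may fall outside the usual [0, π] range; the conversion back still yields a unit-length vector, but whether that vector is a sensible search query is exactly what the verification step needs to check.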

To verify this:
1. Take another sample of queries.
1. Apply their cluster's difference vector and perform similarity search.
1. Compare the results to the original similarity search results.
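
The comparison metric is not fixed above. As one possible sketch, the two result lists for a query could be compared by the overlap of their returned ids and by the rank of the known target. Both helpers below are hypothetical and operate on plain lists of ids, independent of how the search itself is run.

```python
def topk_overlap(original_ids, shifted_ids):
    """Jaccard overlap between two result lists, treated as sets of ids."""
    a, b = set(original_ids), set(shifted_ids)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_of(target_id, result_ids):
    """1-based rank of the target in a result list, or None if it is absent."""
    for rank, doc_id in enumerate(result_ids, start=1):
        if doc_id == target_id:
            return rank
    return None
```

A lower `rank_of` value for the shifted query than for the original query would suggest the cluster's difference vector moved the query toward its target.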

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from tqdm import tqdm
from citeline.helpers import coordinate_converters as coords
from citeline.database.milvusdb import MilvusDB
from citeline.embedders import Embedder

tqdm.pandas()

sample = pd.read_json("../data/dataset/nontrivial_100.jsonl", lines=True)
db = MilvusDB()
embedder = Embedder.create("Qwen/Qwen3-Embedding-0.6B", device="mps", normalize=True)
print(f"Sample length: {len(sample)}")
print(f"DB: {db}")
print(f"Embedder: {embedder}")

Sample length: 100
DB: <citeline.database.milvusdb.MilvusDB object at 0x128b55910>
Embedder: Qwen/Qwen3-Embedding-0.6B, device=mps, normalize=True, dim=1024


In [2]:
vectors = embedder(['sample one', 'sample two', 'is this the real life? Is this just fantasy?'])
spherical = np.array([coords.euclidean_to_spherical(v) for v in vectors])