# Space Biology Knowledge Engine - Similarity Search

This notebook implements vector similarity search using FAISS for fast retrieval.

## Objectives:
1. Load the saved embeddings, metadata, and FAISS index
2. Implement similarity search algorithms
3. Evaluate search performance
4. Prepare search functionality for the API
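
Objective 3 is not tied to a specific metric in this notebook. One common option is recall@k: the fraction of documents judged relevant for a query that show up in the top k results. The following is a minimal sketch under that assumption; `recall_at_k` and the example ids are hypothetical and not part of the notebook or its dataset.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant ids that appear among the top-k retrieved ids."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    hits = sum(1 for doc_id in relevant_ids if doc_id in top_k)
    return hits / len(relevant_ids)

# Hypothetical example: row ids returned by a search, and ids judged relevant by hand.
retrieved = [12, 7, 301, 45, 88]
relevant = {7, 45, 500}
print(recall_at_k(retrieved, relevant, k=5))
```

Averaging this value over a set of labelled queries gives a single number that can be tracked when the model or index changes.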


In [1]:
# Cell 1: Imports
import pandas as pd
import numpy as np
import faiss
import json

from sentence_transformers import SentenceTransformer


In [2]:
# Cell 2: Load saved embeddings + metadata + FAISS index
embeddings = np.load("datasets/embeddings.npy")

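# metadata.json holds one JSON object per line (JSON Lines), so parse it line by line
with open("datasets/metadata.json", "r", encoding="utf-8") as f:
    metadata = [json.loads(line) for line in f if line.strip()]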

df = pd.DataFrame(metadata)

index = faiss.read_index("datasets/faiss_index.idx")

print("✅ Data loaded")
print("Embeddings shape:", embeddings.shape)
print("Metadata records:", len(df))


✅ Data loaded
Embeddings shape: (624, 384)
Metadata records: 624
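
Because the raw `embeddings` array is loaded alongside the index, FAISS results can be cross-checked against an exact search in NumPy. The sketch below assumes the index ranks by squared L2 distance, as a flat L2 index would; the index type is not shown in this notebook. `brute_force_search` is a hypothetical helper, and the toy vectors stand in for `embeddings`.

```python
import numpy as np

def brute_force_search(vectors, query_vec, k=5):
    """Exact k-nearest-neighbour search by squared L2 distance."""
    diffs = vectors - query_vec                   # broadcast the query against every row
    dists = np.einsum("ij,ij->i", diffs, diffs)   # squared L2 distance per row
    k = min(k, len(vectors))
    order = np.argsort(dists)[:k]                 # ascending: closest first
    return dists[order], order

# Toy data standing in for the real embeddings array.
toy_vectors = np.array(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]], dtype="float32"
)
toy_query = np.array([0.9, 0.1], dtype="float32")
toy_dists, toy_ids = brute_force_search(toy_vectors, toy_query, k=2)
print(toy_ids, toy_dists)
```

On the real data, calling the helper with `embeddings` and an encoded query should return the same row ids as `index.search`, provided the index is exact and was built from this array in the same row order.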


In [3]:
# Cell 3: Load same model used for embeddings
MODEL_NAME = "all-MiniLM-L6-v2"   # must match previous notebook
model = SentenceTransformer(MODEL_NAME)
print("✅ Model loaded:", MODEL_NAME)


✅ Model loaded: all-MiniLM-L6-v2


In [4]:
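# Cell 4: Define search function
def search(query: str, k: int = 5):
    """Return up to k nearest records to `query` as a list of dicts."""
    # FAISS expects a 2D float32 array of shape (n_queries, dim)
    query_vec = model.encode([query]).astype("float32")
    distances, indices = index.search(query_vec, k=k)

    results = []
    for i, idx in enumerate(indices[0]):
        if idx < 0:
            # FAISS pads with -1 when fewer than k neighbours are found;
            # df.iloc[-1] would silently return the last row instead
            continue
        record = df.iloc[idx]
        results.append({
            "rank": i+1,
            "title": record.get("title", ""),
            "authors": record.get("authors", ""),
            "year": record.get("year", ""),
            "abstract": record.get("clean_text", "")[:300] + "...",
            # Raw FAISS distance: scores rise with rank in the outputs, so lower = closer
            "score": float(distances[0][i])
        })
    return results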
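
The `score` field is a distance, so it reads the opposite way from a similarity. If the index is a flat L2 index over unit-length vectors (both are assumptions here, since the index type and normalization are not shown in this notebook), FAISS reports the squared L2 distance d², which relates to cosine similarity by d² = 2 - 2·cos. A minimal sketch of the conversion, with `l2sq_to_cosine` as a hypothetical helper:

```python
def l2sq_to_cosine(squared_l2):
    """Convert a squared L2 distance between unit vectors to cosine similarity."""
    return 1.0 - squared_l2 / 2.0

# Identical, orthogonal, and opposite unit vectors respectively.
for d in (0.0, 2.0, 4.0):
    print(d, l2sq_to_cosine(d))
```

Under those assumptions, a distance of 0.6432 would correspond to a cosine similarity of about 0.678. If the vectors are not normalized, the conversion does not hold and the raw distance should be reported as is.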


In [5]:
# Cell 5: Test search
query = "effects of radiation on human cells in space"
results = search(query, k=5)

print("🔎 Query:", query, "\n")
for r in results:
    print(f"{r['rank']}. {r['title']} ({r['year']}) - score={r['score']:.4f}")
    print(f"   Authors: {r['authors']}")
    print(f"   Abstract: {r['abstract']}")
    print()


🔎 Query: effects of radiation on human cells in space 

1. Interplay of space radiation and microgravity in DNA damage and DNA damage response. () - score=0.6432
   Authors: 
   Abstract: Interplay of space radiation and microgravity in DNA damage and DNA damage response....

2. The individual and combined effects of spaceflight radiation and microgravity on biologic systems and functional outcomes () - score=0.6663
   Authors: 
   Abstract: The individual and combined effects of spaceflight radiation and microgravity on biologic systems and functional outcomes...

3. Dose- and Ion-Dependent Effects in the Oxidative Stress Response to Space-Like Radiation Exposure in the Skeletal System () - score=0.7162
   Authors: 
   Abstract: Dose- and Ion-Dependent Effects in the Oxidative Stress Response to Space-Like Radiation Exposure in the Skeletal System...

4. Simultaneous exposure of cultured human lymphoblastic cells to simulated microgravity and radiation increases chromosome aberrati

In [6]:
# Cell 6: (Optional) Simple interactive search
while True:
    query = input("Enter search query (or 'exit'): ").strip()
    if query.lower() == "exit":
        break
    if not query:
        continue  # ignore empty input instead of searching for ""
    results = search(query, k=5)
    print("\nResults:\n")
    for r in results:
        print(f"{r['rank']}. {r['title']} ({r['year']}) - score={r['score']:.4f}")
        print(f"   Authors: {r['authors']}")
        print(f"   Abstract: {r['abstract']}")
        print()



Results:

1. Exploration of space to achieve scientific breakthroughs () - score=0.9496
   Authors: 
   Abstract: Exploration of space to achieve scientific breakthroughs...

2. To Infinity and Beyond! Human spaceflight and life science. () - score=1.0266
   Authors: 
   Abstract: To Infinity and Beyond! Human spaceflight and life science....

3. Astronaut omics and the impact of space on the human body at scale () - score=1.0925
   Authors: 
   Abstract: Astronaut omics and the impact of space on the human body at scale...

4. Yeast in Space () - score=1.1287
   Authors: 
   Abstract: Yeast in Space...

5. NASA open science data repository: Open science for life in space. () - score=1.1766
   Authors: 
   Abstract: NASA open science data repository: Open science for life in space....


Results:

1. Astronaut omics and the impact of space on the human body at scale () - score=1.4212
   Authors: 
   Abstract: Astronaut omics and the impact of space on the human body at scale...

2. To 