### 1. Define Collection Descriptions
We define a dictionary `COLLECTION_DESCRIPTIONS` that maps each Qdrant collection name to a natural language description of its content (e.g., 'cs_ai_full' -> 'general artificial intelligence research'). These descriptions are the basis for the semantic routing.

In [15]:
COLLECTION_DESCRIPTIONS = {
    "cs_ai_full": "general artificial intelligence research, AI systems, AI safety",
    "ML_collection": "machine learning theory, supervised and unsupervised learning",
    "dl_collection": "deep learning, neural networks, transformers",
    "cv_collection": "computer vision, image processing, object detection",
    "nlp_collection": "natural language processing, text, LLMs",
    "RL_collection": "reinforcement learning, agents, robotics control",
    "other_cs": "miscellaneous computer science research"
}


### 2. Initialize Client & Model
Initializes the Qdrant client and loads the `sentence-transformers/all-MiniLM-L6-v2` model. This model will be used for both query encoding and routing.

In [None]:
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
import numpy as np
import time

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

client = QdrantClient(
    url="Enter your api URL***",
    api_key="Enter your Api key***"
)


### 3. Compute Collection Centroids (Router Initialization)
Here we create the 'Router'. We encode the natural language descriptions of each collection into vector embeddings. These vectors act as the 'centroids' or semantic anchors for each domain.

In [17]:
from sentence_transformers import SentenceTransformer
import numpy as np

router_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

collection_vectors = {
    name: router_model.encode(desc, normalize_embeddings=True)
    for name, desc in COLLECTION_DESCRIPTIONS.items()
}


### 4. Define Basic Routing Function
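The `route_query` function takes a user query, encodes it, and computes the dot product with each collection's description vector. Because both sides are encoded with `normalize_embeddings=True`, this dot product equals cosine similarity. The function returns the top-k most similar collections together with their scores.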

In [18]:
def route_query(query, top_k=2):
    q_vec = router_model.encode(query, normalize_embeddings=True)

    scores = {
        name: float(np.dot(q_vec, vec))
        for name, vec in collection_vectors.items()
    }

    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_k]
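
The ranking step can be tried without loading the model. The sketch below is only an illustration: the 3-dimensional vectors are made-up stand-ins for real embeddings, and `normalize`, `dot`, and `toy_route` are hypothetical helper names that do not exist in the notebook.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical toy "embeddings"; real ones come from the sentence-transformer.
toy_collection_vectors = {
    "RL_collection": normalize([0.9, 0.1, 0.0]),
    "cv_collection": normalize([0.0, 1.0, 0.2]),
    "nlp_collection": normalize([0.1, 0.0, 1.0]),
}

def toy_route(q_vec, top_k=2):
    """Rank collections by dot product with a unit-length query vector."""
    scores = {name: dot(q_vec, vec) for name, vec in toy_collection_vectors.items()}
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:top_k]

toy_query = normalize([1.0, 0.2, 0.1])
toy_ranking = toy_route(toy_query)
print(toy_ranking)
```

With unit-length inputs every score lies in [-1, 1], which keeps the routing score on the same scale as a cosine-based vector score.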


### 5. Define Routed Search (v1)
This is the first version of the search logic:
1.  **Route**: Determine the best collections for the query.
2.  **Search**: Query only those selected collections in Qdrant.
3.  **Merge & Rank**: Combine results from different collections and re-rank them based on a weighted score (70% vector similarity + 30% routing score).

In [None]:
import time
from qdrant_client import QdrantClient

client = QdrantClient(
    url="Enter your api URL***",
    api_key="Enter your Api key***"
)

retrieval_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def routed_search(query, limit=5):
    start = time.time()
    routed_collections = route_query(query)
    query_vector = retrieval_model.encode(query).tolist()

    all_hits = []

    for collection, routing_score in routed_collections:
        result = client.query_points(
            collection_name=collection,
            query=query_vector,
            limit=limit,
            with_payload=True
        )

        for hit in result.points:
            hit.payload["router_score"] = routing_score
            hit.payload["vector_score"] = hit.score
            hit.payload["collection"] = collection
            all_hits.append(hit.payload)

    latency = time.time() - start

    # Final re-ranking
    all_hits.sort(
        key=lambda x: 0.7 * x["vector_score"] + 0.3 * x["router_score"],
        reverse=True
    )

    return all_hits[:limit], latency
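
Because a paper can be indexed in more than one collection, the merged list may contain the same title twice. The sketch below shows one possible way to apply the 70/30 weighting and keep a single entry per title. It is an assumption layered on top of the notebook, not part of it: `merge_and_dedupe` is a hypothetical helper, the scores are made up, and it assumes every payload carries a `title` (a DOI, where available, would be a more robust key).

```python
def merge_and_dedupe(hits, limit=5, w_vector=0.7, w_router=0.3):
    """Re-rank payload dicts by the weighted score and keep one entry per title."""
    def combined(hit):
        return w_vector * hit["vector_score"] + w_router * hit["router_score"]

    best = {}
    for hit in hits:
        key = hit.get("title")
        # Keep the copy with the higher combined score.
        if key not in best or combined(hit) > combined(best[key]):
            best[key] = hit
    return sorted(best.values(), key=combined, reverse=True)[:limit]

# Made-up scores for illustration only.
toy_hits = [
    {"title": "Paper A", "vector_score": 0.71, "router_score": 0.60, "collection": "RL_collection"},
    {"title": "Paper B", "vector_score": 0.62, "router_score": 0.60, "collection": "RL_collection"},
    {"title": "Paper A", "vector_score": 0.71, "router_score": 0.35, "collection": "cs_ai_full"},
]
merged = merge_and_dedupe(toy_hits)
for hit in merged:
    print(hit["title"], hit["collection"])
```

Here the duplicate of "Paper A" from the lower-scoring route is dropped, so the final list has two entries instead of three.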


### 6. Test Basic Routing
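Tests the `routed_search` function with a sample query ("research gaps in reinforcement learning for robotics") and prints the results to verify that it picks relevant collections. In the run below the hits come from `RL_collection` and `cs_ai_full`; the same title appears twice (results 1 and 5), which suggests the paper is indexed in both collections.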

In [20]:
query = "research gaps in reinforcement learning for robotics"

results, latency = routed_search(query)

print(f"Latency: {latency:.3f}s\n")

for i, r in enumerate(results, 1):
    print(f"--- Result {i} ---")
    print("Title:", r.get("title"))
    print("Year:", r.get("publication_year"))
    print("Citations:", r.get("citation_count"))
    print("Collection:", r.get("collection"))
    print()


Latency: 1.686s

--- Result 1 ---
Title: Advancements in Reinforcement Learning Techniques for Robotics
Year: 2025
Citations: 5
Collection: RL_collection

--- Result 2 ---
Title: Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes
Year: 2025
Citations: 32
Collection: RL_collection

--- Result 3 ---
Title: Comprehensive Review of Robotics Operating System-Based Reinforcement Learning in Robotics
Year: 2025
Citations: 7
Collection: RL_collection

--- Result 4 ---
Title: Reinforcement learning approaches in the motion systems of autonomous underwater vehicles
Year: 2025
Citations: 3
Collection: RL_collection

--- Result 5 ---
Title: Advancements in Reinforcement Learning Techniques for Robotics
Year: 2025
Citations: 5
Collection: cs_ai_full



### 7. Imports (Iterative Refinement)
(Re-importing libraries for a fresh block of code - likely a new iteration or section).

In [32]:
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
import numpy as np
import time
import math


### 8. Redefine Collections
(Redefining the collection descriptions as a new `COLLECTIONS` dictionary, this time with shorter, keyword-style descriptions, possibly to improve routing accuracy.)

In [35]:
COLLECTIONS = {
    "cs_ai_full": "general artificial intelligence",
    "ML_collection": "machine learning algorithms models",
    "dl_collection": "deep learning neural networks",
    "cv_collection": "computer vision image video",
    "nlp_collection": "natural language processing text",
    "RL_collection": "reinforcement learning agents robotics",
    "other_cs": "computer science miscellaneous"
}


### 9. Re-initialize Client
(Re-initializing the Qdrant client and model. In a clean notebook, this might be redundant, but useful for standalone execution of this block).

In [None]:
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
import time
import numpy as np

url = "Enter your api URL***"  # no trailing comma: a trailing comma would turn this into a tuple
api_key = "Enter your Api key***"

client = QdrantClient(url=url, api_key=api_key)
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")


### 10. Initialize Graph Weights
Here we introduce the **Graph Optimization** layer.
*   `collection_weights`: A dictionary to store the learned importance of each collection (initialized to 1.0).
*   `ALPHA`: The learning rate (0.15) which determines how fast the system adapts to new patterns.

In [37]:
from collections import defaultdict

# Graph weights
collection_weights = defaultdict(lambda: 1.0)

# Learning rate
ALPHA = 0.15


### 11. Define Weighted Routing
The `route_collections` function is updated to include the graph weights:
`weighted_score = semantic_similarity * collection_weight`
This allows the system to favor collections that have historically performed well, even if the semantic match is slightly lower.

In [38]:
def route_collections(query, top_k=3):
    q_vec = model.encode(query)

    scores = []
    for collection, anchor in COLLECTIONS.items():
        anchor_vec = model.encode(anchor)
        sim = np.dot(q_vec, anchor_vec) / (
            np.linalg.norm(q_vec) * np.linalg.norm(anchor_vec)
        )
        weighted_score = sim * collection_weights[collection]
        scores.append((collection, weighted_score))

    scores.sort(key=lambda x: x[1], reverse=True)
    return scores[:top_k]
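
A small worked example of the multiplication `sim * collection_weight`, using made-up similarities rather than real embeddings (the names `toy_weights` and `toy_sims` are hypothetical), shows how a reinforced collection can overtake one with a better semantic match:

```python
from collections import defaultdict

toy_weights = defaultdict(lambda: 1.0)
toy_weights["ML_collection"] = 1.3  # e.g. 1.0 plus two reinforcements of ALPHA = 0.15

# Hypothetical cosine similarities for a single query.
toy_sims = {"RL_collection": 0.60, "ML_collection": 0.50, "cv_collection": 0.20}

weighted = sorted(
    ((name, sim * toy_weights[name]) for name, sim in toy_sims.items()),
    key=lambda x: x[1],
    reverse=True,
)
print(weighted)
```

`ML_collection` ends up first (0.50 * 1.3 = 0.65) ahead of `RL_collection` (0.60 * 1.0 = 0.60), even though its raw similarity is lower.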


### 12. Define Routed Search with Graph Learning (v2)
This version includes the feedback loop:
*   Performs the search in selected collections.
*   **Graph Update**: Increases the weight of every selected collection by `ALPHA` (`collection_weights[collection] += ALPHA`). The update is unconditional: a collection is reinforced each time it is routed to, whether or not its hits turn out to be useful.

In [39]:
def routed_search(query, limit=5):
    start = time.time()
    routed = route_collections(query)

    results = []
    q_vec = model.encode(query).tolist()

    for collection, score in routed:
        hits = client.query_points(
            collection_name=collection,
            query=q_vec,
            limit=limit,
            with_payload=True
        )

        for h in hits.points:
            results.append({
                "collection": collection,
                "score": h.score,
                "payload": h.payload
            })

        # 🔥 GRAPH LEARNING: reinforce the collection that was just searched
        collection_weights[collection] += ALPHA

    latency = time.time() - start
    return results, latency
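
To get a feel for how this update rule behaves over time, the sketch below replays it offline. It is a simplified simulation under stated assumptions: the similarities are made up, the same query is repeated, and `simulate` is a hypothetical helper that does not touch Qdrant.

```python
from collections import defaultdict

ALPHA = 0.15

def simulate(sims, n_queries, top_k=3):
    """Replay the update rule: every routed collection gains ALPHA per query."""
    weights = defaultdict(lambda: 1.0)
    for _ in range(n_queries):
        ranked = sorted(sims, key=lambda name: sims[name] * weights[name], reverse=True)
        for name in ranked[:top_k]:
            weights[name] += ALPHA
    return dict(weights)

# Hypothetical similarities of one repeated query to four collections.
toy_sims = {
    "RL_collection": 0.6,
    "ML_collection": 0.5,
    "cs_ai_full": 0.4,
    "cv_collection": 0.1,
}
final_weights = simulate(toy_sims, n_queries=10)
print(final_weights)
```

After ten queries the three routed collections each sit near 2.5 while the fourth stays at 1.0, so the gap grows linearly with usage. Capping the weights or rewarding only collections that returned useful hits are possible ways to limit this, depending on the goal.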


### 13. Test Graph Learning
Tests the graph-optimized search. You should observe that after running this, the weights for the relevant collections (e.g., RL, ML) will increase.

In [41]:
query = "research gaps in reinforcement and Machine learning for robotics"

results, latency = routed_search(query)

print(f"Latency: {latency:.3f}s\n")

for r in results[:5]:
    p = r["payload"]
    print("Collection:", r["collection"])
    print("Score:", round(r["score"], 3))
    print("Title:", p.get("title"))
    print("Year:", p.get("publication_year"))
    print("Citations:", p.get("citation_count"))
    print("DOI:", p.get("doi"))
    print("-"*60)


Latency: 0.779s

Collection: RL_collection
Score: 0.713
Title: Advancements in Reinforcement Learning Techniques for Robotics
Year: 2025
Citations: 5
DOI: https://doi.org/10.2139/ssrn.5199660
------------------------------------------------------------
Collection: RL_collection
Score: 0.617
Title: Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes
Year: 2025
Citations: 32
DOI: https://doi.org/10.1609/aaai.v39i27.35095
------------------------------------------------------------
Collection: RL_collection
Score: 0.586
Title: Comprehensive Review of Robotics Operating System-Based Reinforcement Learning in Robotics
Year: 2025
Citations: 7
DOI: https://doi.org/10.3390/app15041840
------------------------------------------------------------
Collection: RL_collection
Score: 0.586
Title: Combining Convolutional Neural Networks with Reinforcement Learning for Autonomous Robotics
Year: 2025
Citations: 1
DOI: https://doi.org/10.1109/icpct64145.2025.10940484
--------------

### 14. Redefine Collections (Again)
(Another iteration of collection definitions, this time using a nested dictionary structure `{'domain': '...'}`).

In [42]:
COLLECTIONS = {
    "cs_ai_full": {"domain": "AI"},
    "cv_collection": {"domain": "Computer Vision"},
    "dl_collection": {"domain": "Deep Learning"},
    "ML_collection": {"domain": "Machine Learning"},
    "nlp_collection": {"domain": "NLP"},
    "RL_collection": {"domain": "Reinforcement Learning"},
    "other_cs": {"domain": "Other CS"}
}


### 15. Initialize Complex Graph State
Defines a more complex state for each node in the graph, tracking:
*   `weight`: Adaptive importance.
*   `frequency`: How often it's queried.
*   `last_used`: Timestamp for decay calculations.

In [43]:
import time

graph_state = {
    name: {
        "weight": 1.0,        # adaptive importance
        "frequency": 0,       # how often queried
        "last_used": 0.0
    }
    for name in COLLECTIONS
}


### 16. Re-initialize Client
(Standard setup for this code block).

In [None]:
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer
import time

url = "Enter your api URL***"  # no trailing comma here
api_key = "Enter your Api key***"

client = QdrantClient(
    url=url,   # ⚠️ MUST be a string, not a tuple
    api_key=api_key
)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")


### 17. Define State-Based Routing
Routing logic that uses the `graph_state` dictionary. In this version it ranks collections solely by `weight`: the query embedding is computed but not used, so the selection does not depend on the query.

In [45]:
def route_collections(query, top_k=3):
    query_vec = model.encode(query)

    scored = []
    for collection, state in graph_state.items():
        score = state["weight"]
        scored.append((collection, score))

    scored.sort(key=lambda x: x[1], reverse=True)
    return [c[0] for c in scored[:top_k]]
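
A small sketch of what weight-only ranking returns while all weights are still equal. The state below is a trimmed, hypothetical copy of `graph_state`, and `route_by_weight` is an illustrative name:

```python
toy_state = {
    name: {"weight": 1.0}
    for name in ["cs_ai_full", "cv_collection", "dl_collection", "ML_collection"]
}

def route_by_weight(state, top_k=3):
    """Rank collections by weight only, ignoring the query."""
    scored = [(name, s["weight"]) for name, s in state.items()]
    # list.sort is stable, also with reverse=True: ties keep dictionary order.
    scored.sort(key=lambda x: x[1], reverse=True)
    return [name for name, _ in scored[:top_k]]

print(route_by_weight(toy_state))
# -> ['cs_ai_full', 'cv_collection', 'dl_collection']
```

With equal weights the result is simply the first `top_k` keys in insertion order, which is why the order in which collections are declared matters in this version.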


### 18. Define Search with Frequency & Decay
Updates the search logic to track usage statistics:
*   Increments `frequency`.
*   Increases `weight`.
*   Updates `last_used` timestamp.

Tracking `last_used` lays the groundwork for more advanced features such as time-based decay (forgetting old trends); no decay is applied in this version.

In [46]:
def routed_search(query, limit=5):
    start = time.time()
    query_vector = model.encode(query).tolist()

    selected_collections = route_collections(query)

    all_results = []

    for collection in selected_collections:
        results = client.query_points(
            collection_name=collection,
            query=query_vector,
            limit=limit,
            with_payload=True
        )

        # Update graph memory
        graph_state[collection]["frequency"] += 1
        graph_state[collection]["weight"] += 0.1
        graph_state[collection]["last_used"] = time.time()

        for p in results.points:
            all_results.append({
                "collection": collection,
                "score": p.score,
                "payload": p.payload
            })

    latency = time.time() - start
    return all_results, latency


### 19. Test State-Based Search
Executes a query to test the frequency/weight update mechanism.

In [48]:
query = "research gaps in reinforcement learning and Machine Learning for robotics"

results, latency = routed_search(query)

print(f"Latency: {latency:.3f}s\n")

for r in results[:5]:
    print("Collection:", r["collection"])
    print("Score:", round(r["score"], 3))
    print("Title:", r["payload"].get("title"))
    print("Year:", r["payload"].get("publication_year"))
    print("Citations:", r["payload"].get("citation_count"))
    print("-" * 60)


Latency: 0.489s

Collection: cs_ai_full
Score: 0.738
Title: Advancements in Reinforcement Learning Techniques for Robotics
Year: 2025
Citations: 5
------------------------------------------------------------
Collection: cs_ai_full
Score: 0.603
Title: Combining Convolutional Neural Networks with Reinforcement Learning for Autonomous Robotics
Year: 2025
Citations: 1
------------------------------------------------------------
Collection: cs_ai_full
Score: 0.591
Title: AI-Driven Intelligent Control Strategies for Industrial Robotics: A Reinforcement Learning Approach
Year: 2025
Citations: 2
------------------------------------------------------------
Collection: cs_ai_full
Score: 0.591
Title: Machine Learning and Robotics in Urban Planning and Management
Year: 2025
Citations: 1
------------------------------------------------------------
Collection: cs_ai_full
Score: 0.588
Title: A Comprehensive Review of Robotics Advancements Through Imitation Learning for Self-Learning Systems
Year: 202

### 20. Final Collection Profiles
Defines the most detailed descriptions yet for the collections, optimizing for semantic matching.

In [49]:
COLLECTION_PROFILES = {
    "RL_collection": "reinforcement learning, robotics, control, policy optimization, agents",
    "ML_collection": "machine learning algorithms, supervised learning, unsupervised learning",
    "dl_collection": "deep learning, neural networks, transformers, representation learning",
    "cv_collection": "computer vision, image processing, object detection",
    "nlp_collection": "natural language processing, language models, text mining",
    "cs_ai_full": "general artificial intelligence research",
    "other_cs": "theoretical computer science, systems, databases"
}


### 21. Master Router (Final Version)
Pre-computes embeddings for the detailed profiles. The `master_router` function computes pure semantic similarity between the query and these profiles.

In [50]:
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Precompute collection embeddings
COLLECTION_EMBEDDINGS = {
    name: model.encode(desc, normalize_embeddings=True)
    for name, desc in COLLECTION_PROFILES.items()
}

def master_router(query, top_k=3):
    q_vec = model.encode(query, normalize_embeddings=True)

    scores = []
    for col, emb in COLLECTION_EMBEDDINGS.items():
        sim = float(np.dot(q_vec, emb))
        scores.append((col, sim))

    scores.sort(key=lambda x: x[1], reverse=True)
    return scores[:top_k]


### 22. Collection Graph Class
Encapsulates the graph logic into a Python class `CollectionGraph`:
*   `score()`: Computes weight with time-based decay (collections not used recently lose importance).
*   `update()`: Adjusts weights based on search success (reward) or failure (penalty).

In [51]:
from collections import defaultdict
import time

class CollectionGraph:
    def __init__(self):
        self.weights = defaultdict(lambda: 1.0)
        self.last_used = defaultdict(lambda: time.time())

    def score(self, collection):
        decay = np.exp(-(time.time() - self.last_used[collection]) / 3600)
        return self.weights[collection] * decay

    def update(self, collection, success=True):
        if success:
            self.weights[collection] += 0.2
        else:
            self.weights[collection] *= 0.9
        self.last_used[collection] = time.time()

graph = CollectionGraph()
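
The decay term `exp(-(elapsed) / 3600)` can be read as an exponential with a time constant of one hour. The short calculation below tabulates it for a few idle times; `TAU` and `decay_factor` are hypothetical names used only for this illustration.

```python
import math

TAU = 3600  # seconds; the time constant used in CollectionGraph.score

def decay_factor(elapsed_seconds, tau=TAU):
    """Multiplier applied to a collection's weight after being idle."""
    return math.exp(-elapsed_seconds / tau)

for minutes in (0, 10, 60, 180):
    print(f"{minutes:>3} min idle -> factor {decay_factor(minutes * 60):.3f}")

# Idle time after which the factor drops to one half.
half_life_minutes = TAU * math.log(2) / 60
print(f"half-life: {half_life_minutes:.1f} min")
```

A collection that has been idle for an hour keeps about 37% of its weight, and the half-life is a little under 42 minutes. Note that `last_used` defaults to the current time on first access, so a collection that has never been searched starts with a factor close to 1.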


### 23. Hybrid Routing Logic
Combines the semantic score from the Master Router with the graph score:
`final_score = 0.7 * semantic_score + 0.3 * graph_score`
This balances content relevance with historical reliability.

In [52]:
def route_collections(query, top_k=3):
    candidates = master_router(query, top_k=top_k)

    ranked = []
    for col, semantic_score in candidates:
        graph_score = graph.score(col)
        final_score = 0.7 * semantic_score + 0.3 * graph_score
        ranked.append((col, final_score))

    ranked.sort(key=lambda x: x[1], reverse=True)
    return ranked
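
The following worked example plugs made-up numbers into the 0.7/0.3 blend to show its effect. `hybrid_score` and the candidate values are assumptions for illustration; the graph score of 2.0 corresponds to five successful updates of +0.2 with no decay.

```python
def hybrid_score(semantic_score, graph_score, w_semantic=0.7, w_graph=0.3):
    """Blend content relevance with the graph's historical score."""
    return w_semantic * semantic_score + w_graph * graph_score

# Hypothetical candidates: (semantic similarity, graph score).
toy_candidates = {
    "RL_collection": (0.62, 1.0),  # default weight, just used
    "cs_ai_full": (0.48, 2.0),     # 1.0 + 5 * 0.2 after five successful searches
}

ranked = sorted(
    ((name, hybrid_score(sem, g)) for name, (sem, g) in toy_candidates.items()),
    key=lambda x: x[1],
    reverse=True,
)
print(ranked)
```

`cs_ai_full` wins (0.936 versus 0.734) despite the weaker semantic match, because the graph score is not bounded the way cosine similarity is. Since `route_collections` only re-ranks the candidates returned by `master_router`, the graph can reorder that shortlist but cannot add a collection to it.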


### 24. Helper Search Function
A simple wrapper to search a specific collection.

In [53]:
def search_collection(collection, query, limit=5):
    vector = model.encode(query).tolist()

    result = client.query_points(
        collection_name=collection,
        query=vector,
        limit=limit,
        with_payload=True
    )

    return result.points


### 25. Final Routed Search System
The complete pipeline:
1.  Get ranked collections from `route_collections` (Hybrid).
2.  Search them.
3.  **Feedback Loop**: If hits are found, call `graph.update(success=True)`. If not, `success=False`.
4.  Return merged results.

In [54]:
def routed_search(query, limit=5):
    start = time.time()  # start timing before routing so latency covers the whole pipeline
    routes = route_collections(query)

    all_results = []

    for collection, score in routes:
        hits = search_collection(collection, query, limit)

        if hits:
            graph.update(collection, success=True)
            for h in hits:
                h.payload["source_collection"] = collection
                all_results.append(h)
        else:
            graph.update(collection, success=False)

    latency = time.time() - start
    return all_results, latency


### 26. Redefine Collections (Final Check)
(Ensuring the collection list is consistent for the final run).

In [57]:
COLLECTIONS = {
    "cs_ai_full": "general artificial intelligence research",
    "ML_collection": "machine learning algorithms and theory",
    "dl_collection": "deep learning neural networks",
    "cv_collection": "computer vision and image processing",
    "nlp_collection": "natural language processing and text models",
    "RL_collection": "reinforcement learning and sequential decision making",
    "other_cs": "miscellaneous computer science research"
}


### 27. Master Router with Sklearn
Uses `sklearn.metrics.pairwise.cosine_similarity`, which normalizes the vectors internally, so the embeddings no longer need `normalize_embeddings=True`. The collection descriptions are encoded once, as a batch.

In [58]:
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Encode collection descriptions once
collection_names = list(COLLECTIONS.keys())
collection_descs = list(COLLECTIONS.values())
collection_vectors = model.encode(collection_descs)

def master_router(query, top_k=3):
    q_vec = model.encode([query])
    sims = cosine_similarity(q_vec, collection_vectors)[0]

    ranked = sorted(
        zip(collection_names, sims),
        key=lambda x: x[1],
        reverse=True
    )

    return ranked[:top_k]


### 28. Graph with Edges (Co-occurrence)
Introduces **Edge Weights**:
*   `node_weight`: Importance of a single collection.
*   `edge_weight`: Strength of connection between two collections (e.g., if AI and ML are often queried together).
*   `update_graph`: Updates both node and edge weights.
*   `apply_decay`: Multiplies every node weight by `DECAY` (0.95) so that node weights cannot grow without bound; edge weights are not decayed.

In [59]:
import math
from collections import defaultdict

# Node weights (search success)
node_weight = defaultdict(lambda: 1.0)

# Edge weights (co-occurrence)
edge_weight = defaultdict(lambda: defaultdict(float))

DECAY = 0.95

def update_graph(primary, secondary):
    node_weight[primary] += 1
    edge_weight[primary][secondary] += 0.5
    edge_weight[secondary][primary] += 0.5

def apply_decay():
    for k in node_weight:
        node_weight[k] *= DECAY
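
A quick calculation shows why the decay bounds the node weights. Assuming a collection is the primary one on every search, each search applies `+1` followed by `* DECAY`, and the weight converges to a fixed point. The helper `step` is a hypothetical name used only here.

```python
DECAY = 0.95

def step(weight):
    """One search where this collection is primary: +1, then global decay."""
    return (weight + 1) * DECAY

w = 1.0
for _ in range(500):
    w = step(w)

# Solving w = (w + 1) * DECAY gives w = DECAY / (1 - DECAY).
fixed_point = DECAY / (1 - DECAY)
print(round(w, 3), round(fixed_point, 3))
```

With `DECAY = 0.95` the ceiling is 19, regardless of the starting weight. Collections that are never primary simply shrink toward zero at 5% per search.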


### 29. Graph Re-ranking
Re-ranks the router's output by adding a boost derived from the node weights: `final_score = score + 0.15 * log(node_weight + 1)`. The logarithm dampens the effect of large weights.

In [60]:
def graph_rerank(router_output):
    scored = []
    for col, score in router_output:
        graph_boost = math.log(node_weight[col] + 1)
        final_score = score + 0.15 * graph_boost
        scored.append((col, final_score))

    return sorted(scored, key=lambda x: x[1], reverse=True)
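
To see how large the boost can get, the sketch below applies the same formula to made-up router scores and node weights. All values and the names `boost`, `toy_router_output`, and `toy_node_weight` are hypothetical.

```python
import math

def boost(weight, scale=0.15):
    """Logarithmic boost added to the router score."""
    return scale * math.log(weight + 1)

# Hypothetical router scores and node weights.
toy_router_output = [("nlp_collection", 0.55), ("cs_ai_full", 0.40)]
toy_node_weight = {"nlp_collection": 1.0, "cs_ai_full": 19.0}

reranked = sorted(
    ((col, score + boost(toy_node_weight[col])) for col, score in toy_router_output),
    key=lambda x: x[1],
    reverse=True,
)
print(reranked)
```

A default weight of 1.0 adds about 0.10, while a weight of 19.0 adds about 0.45, enough to lift `cs_ai_full` above a collection with a noticeably better semantic score.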


### 30. Final Client Setup
(Final initialization).

In [None]:
from qdrant_client import QdrantClient

client = QdrantClient(
    url="Enter your api URL***",
    api_key="Enter your Api key***"
)

def search_collection(collection, query_vector, limit=5):
    return client.query_points(
        collection_name=collection,
        query=query_vector,
        limit=limit,
        with_payload=True
    ).points


### 31. The Ultimate Search Function
Combines everything:
1.  `master_router`: Semantic match.
2.  `graph_rerank`: Adjust based on node weights.
3.  Search top collections.
4.  `update_graph`: Reward the top-ranked collection and strengthen the edge between it and the lowest-ranked selected collection.
5.  `apply_decay`: Decay all node weights by a factor of `DECAY`.

In [62]:
import time

def routed_search(query, per_collection=5):
    start = time.time()

    router_out = master_router(query)
    ranked_cols = graph_rerank(router_out)

    q_vec = model.encode(query).tolist()
    all_results = []

    for col, _ in ranked_cols:
        hits = search_collection(col, q_vec, per_collection)
        for h in hits:
            h.payload["collection"] = col
            all_results.append(h)

    # Update graph: reward the top-ranked collection and link it to the last-ranked one
    if ranked_cols:
        update_graph(ranked_cols[0][0], ranked_cols[-1][0])
        apply_decay()

    latency = time.time() - start
    return all_results, latency


### 32. Final System Test
Runs the complete, graph-optimized, domain-aware retrieval system on a test query.

In [65]:
query = "research gaps in AI"

results, latency = routed_search(query)

print(f"Latency: {latency:.3f}s\n")

for r in results[:5]:
    print("Collection:", r.payload["collection"])
    print("Score:", round(r.score, 3))
    print("Title:", r.payload.get("title"))
    print("Year:", r.payload.get("publication_year"))
    print("Citations:", r.payload.get("citation_count"))
    print("-" * 60)


Latency: 0.933s

Collection: cs_ai_full
Score: 0.622
Title: A Perspective toward Generative-AI Policymaking (GAP)
Year: 2025
Citations: 1
------------------------------------------------------------
Collection: cs_ai_full
Score: 0.62
Title: AI for all: bridging data gaps in machine learning and health
Year: 2025
Citations: 1
------------------------------------------------------------
Collection: cs_ai_full
Score: 0.607
Title: The Rise of the Research Automaton: Science as process or product in the era of generative AI?
Year: 2025
Citations: 1
------------------------------------------------------------
Collection: cs_ai_full
Score: 0.598
Title: The Economics of Artificial Intelligence
Year: 2025
Citations: 1
------------------------------------------------------------
Collection: cs_ai_full
Score: 0.592
Title: How AI can achieve human-level intelligence: researchers call for change in tack
Year: 2025
Citations: 4
------------------------------------------------------------
