<a href="https://colab.research.google.com/github/amanmehra-23/RP_RecommenderSystem/blob/main/Test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# 1) Install dependencies (run once)
!pip install faiss-cpu sentence-transformers



In [None]:
# 2) Imports & MODEL path
import os
import pandas as pd
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

MODELS_DIR = "/content/"   # adjust if you uploaded elsewhere

# 3) Load your data, index, embeddings, and SBERT model
df = pd.read_pickle(os.path.join(MODELS_DIR, "dataset.pkl"))
embeddings = np.load(os.path.join(MODELS_DIR, "embeddings.npy"))
index = faiss.read_index(os.path.join(MODELS_DIR, "faiss_index.bin"))
sbert = SentenceTransformer("all-mpnet-base-v2", device="cpu")

# 4) Helper: build combo, embed, search, and return a DataFrame
def get_recommendations(terms: str, title: str, abstract: str, k: int = 5):
    # build query string
    combo = "  ||  ".join([
        terms.replace(",", " "),
        title,
        abstract
    ]).strip()
    # embed + normalize
    q_emb = sbert.encode([combo], convert_to_tensor=False)
    q_emb = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    # FAISS search
    D, I = index.search(q_emb.astype(np.float32), k)
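    # assemble results
    recs = []
    for score, idx in zip(D[0], I[0]):
        if idx < 0:  # FAISS pads with -1 when fewer than k neighbors are found
            continue
        row = df.iloc[idx]  # assumes FAISS ids are row positions in df
        text = row["abstracts"]
        recs.append({
            "score": float(score),
            "terms": row["terms"],
            "title": row["titles"],
            # truncate long abstracts; add an ellipsis only when text was cut
            "abstract": (text[:200] + "…") if len(text) > 200 else text
        })
    return pd.DataFrame(recs)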

# 5) Example usage
results = get_recommendations(
    terms="cs.CL",
    title="Attention Is All You Need",
    abstract="The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.",
    k=5
)
print(results)



      score                          terms  \
0  0.773035           ['cs.LG', 'stat.ML']   
1  0.762186  ['cs.LG', 'cs.CL', 'stat.ML']   
2  0.761946           ['cs.LG', 'stat.ML']   
3  0.761946           ['cs.LG', 'stat.ML']   
4  0.761946           ['cs.LG', 'stat.ML']   

                                               title  \
0                            Agglomerative Attention   
1   Augmenting Self-attention with Persistent Memory   
2  I-BERT: Inductive Generalization of Transforme...   
3  I-BERT: Inductive Generalization of Transforme...   
4  I-BERT: Inductive Generalization of Transforme...   

                                            abstract  
0  Neural networks using transformer-based archit...  
1  Transformer networks have lead to important pr...  
2  Self-attention has emerged as a vital componen...  
3  Self-attention has emerged as a vital componen...  
4  Self-attention has emerged as a vital componen...  
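
The `score` column can be read as cosine similarity, assuming `faiss_index.bin` is an inner-product index built over unit-normalized embeddings. The notebook does not show how the index was built, so that is an assumption. A minimal NumPy sketch with made-up vectors illustrates why normalizing both sides turns the inner product into cosine similarity:

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy vectors standing in for SBERT embeddings (illustrative values only)
rng = np.random.default_rng(0)
corpus = rng.normal(size=(4, 8)).astype(np.float32)
query = rng.normal(size=(1, 8)).astype(np.float32)

# Inner product of unit vectors ...
ip_scores = l2_normalize(query) @ l2_normalize(corpus).T

# ... equals the cosine similarity of the raw vectors
cos_scores = (query @ corpus.T) / (
    np.linalg.norm(query, axis=1, keepdims=True) * np.linalg.norm(corpus, axis=1)
)

print(np.allclose(ip_scores, cos_scores, atol=1e-5))
```

If the index instead used L2 distance, smaller values would mean closer matches and the scores would not be cosine similarities. The descending scores printed above are consistent with an inner-product index.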


In [None]:
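# Use a separate name: reassigning `df` would overwrite the dataset that get_recommendations reads
results_df = pd.DataFrame(results)

In [None]:
results_df.head(5)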

,score,terms,title,abstract
0,0.773035,"['cs.LG', 'stat.ML']",Agglomerative Attention,Neural networks using transformer-based archit...
1,0.762186,"['cs.LG', 'cs.CL', 'stat.ML']",Augmenting Self-attention with Persistent Memory,Transformer networks have lead to important pr...
2,0.761946,"['cs.LG', 'stat.ML']",I-BERT: Inductive Generalization of Transforme...,Self-attention has emerged as a vital componen...
3,0.761946,"['cs.LG', 'stat.ML']",I-BERT: Inductive Generalization of Transforme...,Self-attention has emerged as a vital componen...
4,0.761946,"['cs.LG', 'stat.ML']",I-BERT: Inductive Generalization of Transforme...,Self-attention has emerged as a vital componen...
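
Rows 2 to 4 are the same paper with an identical score, which suggests the dataset (and therefore the index) contains repeated entries. One possible workaround, sketched below under that assumption, is to request more than `k` neighbors and then drop repeated titles. `dedupe_recommendations` is a hypothetical helper, not part of the notebook, and the sample frame simply copies the values printed above.

```python
import pandas as pd

def dedupe_recommendations(recs: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Keep the highest-scoring row per title, then return the top k."""
    return (
        recs.sort_values("score", ascending=False)
            .drop_duplicates(subset="title", keep="first")
            .head(k)
            .reset_index(drop=True)
    )

# Sample rows copied from the output above (terms and abstract columns omitted)
sample = pd.DataFrame({
    "score": [0.773035, 0.762186, 0.761946, 0.761946, 0.761946],
    "title": [
        "Agglomerative Attention",
        "Augmenting Self-attention with Persistent Memory",
        "I-BERT: Inductive Generalization of Transforme...",
        "I-BERT: Inductive Generalization of Transforme...",
        "I-BERT: Inductive Generalization of Transforme...",
    ],
})

unique_recs = dedupe_recommendations(sample, k=5)
print(unique_recs)
```

In the notebook this would mean calling `get_recommendations` with a larger `k` than needed and passing the result through the helper. Deduplicating the dataset before building the index would avoid the problem at the source.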
