# Task
I'd like to build a LangChain pipeline that, given a relevant user query:
1. Invokes an `arxiv_search` tool that searches arXiv for papers matching keywords from the user query.
2. Downloads the top PDF and puts it into an ephemeral in-memory vector database (e.g. FAISS) using an embedding model of your choice.
3. Provides snippets from that PDF as input to the LLM for final response generation.

## Examples
1. "what's the mmlu value of reka core according to the paper on arxiv?"
  - Returns something like "The score is 83.2", or
  - "The MMLU score is not in the provided context" (if retrieval or the arXiv search did not work, which is perfectly fine for the sake of the exercise)
2. "tell me a joke"
  - No tool is invoked; just return the plain LLM response.
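
The two examples boil down to a routing contract: the model either requests a tool or answers directly. LangChain chat models with tool calling expose that decision as `AIMessage.tool_calls`, a list of dicts carrying a `name` and an `args` dict. The sketch below shows only the dispatch step, with the model output stubbed so it runs offline; `dispatch` and `fake_arxiv_search` are hypothetical names, and the stub stands in for the real arXiv + FAISS + LLM path.

```python
from typing import Any, Callable, Dict, List


def dispatch(
    tool_calls: List[Dict[str, Any]],
    tools: Dict[str, Callable[..., str]],
    plain_answer: str,
) -> str:
    """Run the first requested tool, or fall back to the model's plain answer.

    `tool_calls` mirrors the shape of LangChain's AIMessage.tool_calls:
    each entry is a dict with a "name" and an "args" dict.
    """
    if not tool_calls:
        # Example 2 ("tell me a joke"): no tool requested, return the LLM text as-is
        return plain_answer
    call = tool_calls[0]
    tool = tools.get(call["name"])
    if tool is None:
        raise ValueError(f"Model requested unknown tool: {call['name']}")
    # Example 1: the model picked arxiv_search and supplied keyword arguments
    return tool(**call["args"])


# Hypothetical stand-in for the real arXiv + FAISS + LLM path
def fake_arxiv_search(query: str) -> str:
    return f"searched arXiv for: {query}"


tools = {"arxiv_search": fake_arxiv_search}

print(dispatch([{"name": "arxiv_search", "args": {"query": "reka core mmlu"}}], tools, ""))
print(dispatch([], tools, "(plain LLM answer)"))
```

With a live model, `tool_calls` and `plain_answer` would come from the `tool_calls` and `content` of the message returned by `llm.bind_tools([...]).invoke(query)`, assuming the chat model wrapper you use implements `bind_tools`.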

I kept some code around that starts Ollama as a background process and downloads Gemma 2 9B as well as a generic sentence encoder for your convenience, but feel free to pick whatever you'd like.

Bonus: Set up local Phoenix tracing and share a trace.

## Overall Goal
- I don't care whether the example queries actually work; I mainly want to see the ~20 lines of relevant Python code that illustrate the approach.
- Colab's free T4 GPU instances should work just fine, but let me know if you run into problems there.
- Please don't spend more than 30 minutes on it :)

# Setup

In [1]:
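!pip install langchain langchainhub langchain-community langchain-experimental langchain-huggingface --quiet
!pip install arxiv pymupdf --quiet
!pip install sentence_transformers --quiet
# faiss-gpu may have no PyPI wheel for your Python version; the CPU build is plenty for a single paper
!pip install faiss-cpu --quiet
!pip install arize-phoenix --quiet
# apt needs -y to run non-interactively in a notebook (curl is usually preinstalled on Colab)
!apt-get install -y curl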

import subprocess



In [None]:
!curl -fsSL https://ollama.com/install.sh | sh

In [None]:
import subprocess
import time

# Start the Ollama server in the background; `ollama pull` and every LLM call below need it running
subprocess.Popen(["ollama", "serve"], start_new_session=True)
time.sleep(5)  # give the server a few seconds to start listening

In [None]:
import phoenix as px
session = px.launch_app()

In [None]:
!ollama pull gemma2:9b

# Arxiv RAG chain

In [None]:
import arxiv
import faiss
import fitz  # PyMuPDF
import numpy as np
import requests
from langchain_experimental.llms.ollama_functions import OllamaFunctions
from langchain_huggingface import HuggingFaceEmbeddings

In [None]:
import torch
llm = OllamaFunctions(model="gemma2:9b", format="json", temperature=0)
embeddings_model_name = "sentence-transformers/all-mpnet-base-v2"
device = "cuda" if torch.cuda.is_available() else "cpu"  # GPU on a Colab T4, CPU fallback elsewhere
embedder = HuggingFaceEmbeddings(model_name=embeddings_model_name, model_kwargs={"device": device})

In [None]:
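# Define the function to search arXiv and get the top paper
def search_arxiv(query):
    # Search.results() is deprecated in arxiv>=2.0; iterate through a Client instead
    search = arxiv.Search(query=query, max_results=1, sort_by=arxiv.SortCriterion.Relevance)
    paper = next(arxiv.Client().results(search), None)  # None when nothing matches
    return paper.pdf_url if paper else None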

# Define the function to download the PDF and extract its text, entirely in memory
def download_pdf(pdf_url):
    response = requests.get(pdf_url, timeout=60)
    response.raise_for_status()
    # Open the PDF from the downloaded bytes instead of writing temp.pdf to disk
    with fitz.open(stream=response.content, filetype="pdf") as doc:
        return "".join(page.get_text() for page in doc)

# Define the function to embed the sentences and store them in an in-memory FAISS index
def embed_text_to_faiss(text, embedder):
    # Naive sentence split; drop empty fragments so we don't embed blank strings
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    embeddings = np.array(embedder.embed_documents(sentences), dtype="float32")
    faiss_index = faiss.IndexFlatL2(embeddings.shape[1])  # exact (brute-force) L2 search
    faiss_index.add(embeddings)
    return faiss_index, sentences

# Define the function to query the FAISS index for the k closest sentences
def query_faiss(query, faiss_index, sentences, embedder, k=3):
    query_embedding = np.array(embedder.embed_query(query), dtype="float32").reshape(1, -1)
    _, I = faiss_index.search(query_embedding, k)
    # FAISS pads the result with -1 when the index holds fewer than k vectors
    return "\n".join(sentences[i] for i in I[0] if i != -1)
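
Two details of the retrieval cell are worth a closer look. Splitting on `". "` yields many tiny, context-free fragments, and `IndexFlatL2` ranks by raw Euclidean distance, so vector magnitude influences the ranking unless the embeddings are normalized. The numpy-only sketch below illustrates both, under stated assumptions: `chunk_text` is a hypothetical overlapping character chunker (a simple stand-in for LangChain's text splitters, with made-up default sizes), and `top_k_l2` reproduces the exact brute-force search that `IndexFlatL2` performs. For unit-length vectors the squared L2 distance equals 2 - 2 * cosine similarity, so L2 order and cosine order coincide.

```python
import numpy as np


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into windows of `chunk_size` characters that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Window starts stop short of len(text) - overlap; the final window still reaches the end
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def normalize(vecs: np.ndarray) -> np.ndarray:
    """Scale vectors (along the last axis) to unit length."""
    return vecs / np.linalg.norm(vecs, axis=-1, keepdims=True)


def top_k_l2(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k rows of `doc_vecs` closest to `query_vec` by squared L2 distance.

    This is the exact, brute-force search that faiss.IndexFlatL2 performs.
    """
    dists = ((doc_vecs - query_vec) ** 2).sum(axis=1)
    return np.argsort(dists, kind="stable")[: min(k, len(doc_vecs))]


# Toy 2-D "embeddings": doc 2 points in the same direction as the query but is much longer
docs = np.array([[1.0, 0.0], [0.0, 1.0], [10.0, 1.0]])
q = np.array([1.0, 0.1])
print(top_k_l2(q, docs))                        # raw L2, magnitude dominates → [0 1 2]
print(top_k_l2(normalize(q), normalize(docs)))  # unit vectors, cosine order → [2 0 1]
```

One way to apply this in `embed_text_to_faiss` would be to replace `text.split(". ")` with `chunk_text(text)` and to pass `encode_kwargs={"normalize_embeddings": True}` to `HuggingFaceEmbeddings` (or call `faiss.normalize_L2` on both the document and query float32 arrays before `add` and `search`).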

In [None]:
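# Main function to handle the pipeline
def main_pipeline(query):
    # Simplified routing: a keyword check stands in for letting the LLM decide whether to call the arXiv tool
    if "arxiv" in query.lower():
        pdf_url = search_arxiv(query)
        if pdf_url is None:
            return "No matching paper found on arXiv."
        pdf_text = download_pdf(pdf_url)
        faiss_index, sentences = embed_text_to_faiss(pdf_text, embedder)
        context = query_faiss(query, faiss_index, sentences, embedder)
        # Step 3: ground the LLM's answer in the retrieved snippets instead of returning them raw
        prompt = (
            "Answer the question using only the context below. "
            "If the answer is not in the context, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )
        return llm.invoke(prompt).content
    # No tool needed (e.g. "tell me a joke"): plain LLM response
    return llm.invoke(query).content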
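
Step 3 of the task is to pass the retrieved snippets to the LLM rather than returning them to the user directly. A small prompt builder makes that step explicit. This is a sketch, assuming the snippets arrive as a list ordered from most to least relevant; `build_rag_prompt` and its `max_chars` budget are hypothetical. The instruction to admit when the answer is missing mirrors the fallback answer in example 1.

```python
def build_rag_prompt(question: str, snippets: list[str], max_chars: int = 3000) -> str:
    """Build a grounded prompt: numbered context snippets, then the question.

    Snippets are added in the given (relevance) order until the next one would
    push the context past `max_chars`, so the best matches survive a tight budget.
    """
    lines: list[str] = []
    used = 0
    for i, snippet in enumerate(snippets, start=1):
        line = f"[{i}] {snippet.strip()}"
        if used + len(line) > max_chars:
            break
        lines.append(line)
        used += len(line) + 1  # +1 for the newline that joins the lines
    context = "\n".join(lines) if lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say that it is not in the provided context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


print(build_rag_prompt("What is the MMLU score?", ["Snippet one.", "Snippet two."]))
```

The resulting string can be sent to the model as-is, e.g. `llm.invoke(build_rag_prompt(query, snippets)).content`, where `snippets` is a list of retrieved sentences or chunks.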

In [None]:
# Example
query = "what's the mmlu value of reka core according to the paper on arxiv?"
response = main_pipeline(query)
print(response)
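
As written, the example hands the whole question, filler words included, to `search_arxiv`, whereas step 1 of the task asks for keywords. In the intended design the LLM would supply the keywords as the tool-call argument; a cheap heuristic swapped in here instead is stopword removal. `extract_keywords` and the hand-picked `STOPWORDS` set are hypothetical and would need tuning for other queries.

```python
import re

# Hypothetical, hand-picked stopword list; tune it to the queries you expect
STOPWORDS = {
    "a", "an", "the", "of", "on", "in", "to", "for", "is", "are",
    "what", "what's", "whats", "according", "paper", "papers",
    "arxiv", "value", "tell", "me", "about",
}


def extract_keywords(query: str) -> str:
    """Drop filler words so the arXiv search sees only the content terms."""
    tokens = re.findall(r"[a-z0-9']+", query.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)


print(extract_keywords("what's the mmlu value of reka core according to the paper on arxiv?"))
# → mmlu reka core
```

Used as `search_arxiv(extract_keywords(query))`, the example query would search arXiv for `mmlu reka core` instead of the full sentence.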

In [None]:
# OK, I'll stop here.
