### Based on:
https://docs.llamaindex.ai/en/stable/presentations/materials/2024-02-28-rag-bootcamp-vector-institute/?h=rag

- Use this notebook if you want to use OpenAI's LLM and embedding models.
- The document embeddings are loaded from a pickle file, so the documents are not re-embedded and the remaining OpenAI cost comes from running queries.

In [1]:
import os
import pickle
import nest_asyncio

nest_asyncio.apply()

In [2]:
USE_OPENAI = True

In [3]:
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

if USE_OPENAI:
    Settings.llm = OpenAI(model="gpt-3.5-turbo", api_key=os.getenv('OPENAI_API_KEY'))
    Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
else:
    Settings.llm = Ollama(model="llama3:instruct")
    Settings.embed_model = OllamaEmbedding(
        model_name="llama3:instruct",
        base_url="http://localhost:11434",
        ollama_additional_kwargs={"mirostat": 0},
    )
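The OpenAI branch above reads the key with `os.getenv('OPENAI_API_KEY')`, which returns `None` when the variable is unset, so a missing key may only surface later as a less obvious error. A minimal sketch of an early check follows; `require_env` is a hypothetical helper name, not part of LlamaIndex or the OpenAI client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error.

    Hypothetical helper for illustration; an empty value counts as missing.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

For example, `api_key=require_env("OPENAI_API_KEY")` could stand in for the `os.getenv` call when constructing the LLM.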

In [4]:
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.vector_stores.qdrant import QdrantVectorStore
import qdrant_client

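# Load the precomputed nodes (text plus embeddings) from disk.
# Note: only unpickle files from a source you trust.
with open('models/openAI_idpp_metagpt_state/_nodes.pickle', 'rb') as handle:
    _nodes = pickle.load(handle)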
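Since the vector store is filled directly from the pickled nodes, each node should already carry its embedding. The sketch below is one way to sanity-check that before indexing. It assumes every node exposes an `embedding` attribute (as LlamaIndex `TextNode` objects do), and `embedding_summary` is a hypothetical helper name.

```python
from collections import Counter


def embedding_summary(nodes):
    """Count nodes with and without an embedding and tally embedding sizes.

    Assumes each node has an `embedding` attribute that is either None
    or a sequence of floats.
    """
    dims = Counter()
    missing = 0
    for node in nodes:
        emb = getattr(node, "embedding", None)
        if emb is None:
            missing += 1
        else:
            dims[len(emb)] += 1
    return {"total": len(nodes), "missing": missing, "dims": dict(dims)}
```

Calling `embedding_summary(_nodes)` right after loading should report zero missing embeddings and a single embedding size if the pickle was built with one embedding model.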

In [5]:
client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="test_store")
_ = vector_store.add(_nodes)



In [6]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

In [7]:
retriever = index.as_retriever(similarity_top_k=2)
retrieved_nodes = retriever.retrieve("What did the president say about Justice Breyer?")
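`similarity_top_k=2` asks the retriever for the two nodes whose embeddings are closest to the query embedding. The following pure-Python sketch illustrates that ranking idea, assuming cosine similarity as the measure. It is a conceptual illustration only, not how Qdrant or LlamaIndex implement the search, and `cosine` and `top_k_cosine` are hypothetical names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_cosine(query, vectors, k=2):
    """Return the k (index, score) pairs with the highest cosine similarity to `query`."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A real vector store avoids scoring every vector on large collections by using an approximate index, but the result has the same shape: the top `k` items with their similarity scores.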

In [8]:
# View the text of the two retrieved nodes
print(retrieved_nodes[0].text)
print("================")
print(retrieved_nodes[1].text)

The most fundamental right in America is the right to vote and have it counted. And look, it’s under assault.

In state after state, new laws have been passed not only to suppress the vote — we’ve been there before — but to subvert the entire election. We can’t let this happen.

Tonight, I call on the Senate to pass — pass the Freedom to Vote Act. Pass the John Lewis Act — Voting Rights Act. And while you’re at it, pass the DISCLOSE Act so Americans know who is funding our elections.

Look, tonight, I’d — I’d like to honor someone who has dedicated his life to serve this country: Justice Breyer — an Army veteran, Constitutional scholar, retiring Justice of the United States Supreme Court.

Justice Breyer, thank you for your service. Thank you, thank you, thank you. I mean it. Get up. Stand — let me see you. Thank you.

And we all know — no matter what your ideology, we all know one of the most serious constitutional responsibilities a President has is nominating someone to serve on the

In [9]:
query_engine = index.as_query_engine(similarity_top_k=2)

In [10]:
response = query_engine.query("What were the models tried for predicting ALS progression in the idpp paper?")
print(response)

The models tried for predicting ALS progression in the iDPP paper included a naive model that carried the last observed value forward, various Machine Learning algorithms for regression, and a Long Short-Term Memory (LSTM) neural network to model the temporal dependencies in the sequential sensor data.


In [11]:
response = query_engine.query("What was the validation strategy used by the authors in the idpp paper?")
print(response)

The authors in the idpp paper utilized a nested k-fold cross-validation strategy for their validation process. This strategy involved two loops - an inner loop and an outer loop. In the inner loop, a test set containing 10% of complete patient data was set aside, while the remaining data underwent further k-fold cross-validation. This was adapted to ensure that each patient's complete set of observations was included in both the training and validation sets, preventing information leakage between different times of the patient's data. The outer loop repeated the same procedure on another test set covering another 10% of the patients, with a total of 10 iterations to compute RMSE for model selection. The best hyperparameters were chosen based on these iterations before fitting all the data for the final submission.


In [12]:
response = query_engine.query("How do agents share information with other agents in MetaGPT?")
print(response)

Agents in MetaGPT share information with other agents by utilizing a shared message pool. This shared message pool allows agents to publish structured messages and access messages from other agents directly. By storing information in this global message pool, agents can exchange information transparently without the need to inquire about other agents individually. Additionally, agents can retrieve required information from the shared pool, enhancing communication efficiency.
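To check which chunks an answer was grounded in, the response object's `source_nodes` list can be inspected. The sketch below formats such a list into short, ranked lines. It assumes each item exposes `text` and `score` attributes, as the retrieved nodes printed earlier do, and `format_sources` is a hypothetical helper name.

```python
def format_sources(source_nodes, max_chars=80):
    """Return one summary line per retrieved node: rank, score, and a text snippet.

    Assumes each item has a `text` attribute and an optional `score` attribute.
    """
    lines = []
    for rank, hit in enumerate(source_nodes, start=1):
        score = getattr(hit, "score", None)
        score_str = f"{score:.3f}" if score is not None else "n/a"
        # Collapse whitespace so multi-line chunks fit on a single line.
        snippet = " ".join(hit.text.split())[:max_chars]
        lines.append(f"{rank}. score={score_str} {snippet}")
    return lines
```

For instance, `print("\n".join(format_sources(response.source_nodes)))` would list the chunks behind the last answer, and the same helper works on `retrieved_nodes`.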
