# Retrieval-Augmented Generation

This is just a sneak peek for a possible future workshop.

Using LangChain and LlamaIndex, we will build a vector database from a PDF file.
Paired with an LLM from Hugging Face, this lets us "chat" with the PDF.

LangChain is a library with a very high level of abstraction. This makes it possible to build a "talk with a PDF" LLM application in just a few lines, with the disadvantage that one loses a lot of control over the process.
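To make concrete what the high-level calls hide, here is a minimal sketch of the retrieval step in plain Python. It is not how LangChain or LlamaIndex implement it: the bag-of-words `embed` function stands in for a neural embedding model, and `retrieve`, `build_prompt`, and the toy `chunks` are hypothetical names and data chosen for illustration only.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a neural embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Place the retrieved chunks in front of the question for the LLM."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Toy "document", already split into chunks (illustrative data only)
chunks = [
    "cats sleep most of the day",
    "the vector store keeps one embedding per chunk",
    "python lists are ordered",
]

question = "what does the vector store keep"
best = retrieve(question, chunks, top_k=1)
print(build_prompt(question, best))
```

The libraries used below follow the same overall pattern (embed the chunks, rank them against the question, pass the best ones to the LLM), but with learned embeddings and many more configuration options.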

In [1]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding

# Wrap a Hugging Face sentence-embedding model so that LlamaIndex can use it
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
)



In [2]:
import torch
from llama_index.llms import HuggingFaceLLM

# Prompt pieces that are wrapped around every query
system_prompt = "You are a helpful assistant."
query_wrapper_prompt = "<|USER|>{query_str}<|ASSISTANT|>"

llm = HuggingFaceLLM(
    context_window=2048,  # maximum number of tokens the model sees at once
    max_new_tokens=256,  # upper bound on the length of each answer
    generate_kwargs={"temperature": 0.1, "do_sample": True},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="monology/openinstruct-mistral-7b",
    model_name="monology/openinstruct-mistral-7b",
    device_map="cuda:0",  # load the model onto the first GPU
    tokenizer_kwargs={"max_length": 2048},
    model_kwargs={"torch_dtype": torch.bfloat16},  # 16-bit weights to save GPU memory
)

Loading checkpoint shards: 100%|████████████████████████| 3/3 [00:12<00:00,  4.00s/it]


In [3]:
from llama_index import ServiceContext
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embed_model
)
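The `chunk_size=1024` setting above controls how the document is cut into pieces before embedding. As a rough illustration of the idea, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption-laden stand-in, not LlamaIndex's splitter: it counts list items rather than tokens, and `split_into_chunks` and its `overlap` parameter are hypothetical names.

```python
def split_into_chunks(items, chunk_size, overlap=0):
    """Split a sequence into windows of chunk_size items.

    Consecutive windows share `overlap` items, so text cut at a
    chunk boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(items), step):
        chunks.append(items[start:start + chunk_size])
        # Stop once a window has reached the end of the sequence
        if start + chunk_size >= len(items):
            break
    return chunks


# Ten placeholder "tokens", windows of four, one shared item between windows
tokens = list(range(10))
for chunk in split_into_chunks(tokens, chunk_size=4, overlap=1):
    print(chunk)
```

Larger chunks give the LLM more context per retrieved piece but consume more of the context window; smaller chunks are more precise but may split an explanation across several pieces.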

In [4]:
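from llama_index import VectorStoreIndex, SimpleDirectoryReader

# Load every file in the "data" directory (here: the PDF)
documents = SimpleDirectoryReader("data").load_data()
# Chunk and embed the documents, then store the vectors in an in-memory index
index = VectorStoreIndex.from_documents(documents, service_context=service_context)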

In [5]:
query_engine = index.as_query_engine()
response = query_engine.query("What kind of model is TensorNet?")
print(response)

Token indices sequence length is longer than the specified maximum sequence length for this model (2065 > 2048). Running this sequence through the model will result in indexing errors
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


TensorNet is a message-passing model that uses Cartesian tensors and their irreducible representations.
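The first warning in the output above reports a 2065-token sequence against a 2048-token limit. The budget arithmetic can be sketched as follows, using only numbers that appear in this notebook. `fits_context` and `max_prompt_tokens` are hypothetical helpers, not LlamaIndex functions, and the sketch assumes that the prompt and the generated tokens share one context window.

```python
CONTEXT_WINDOW = 2048  # context_window passed to HuggingFaceLLM
MAX_NEW_TOKENS = 256   # max_new_tokens passed to HuggingFaceLLM


def max_prompt_tokens(context_window, max_new_tokens):
    """Tokens left for the prompt once room for the answer is reserved."""
    return context_window - max_new_tokens


def fits_context(prompt_tokens, max_new_tokens, context_window):
    """True if the prompt plus the longest possible answer fits the window."""
    return prompt_tokens + max_new_tokens <= context_window


print(max_prompt_tokens(CONTEXT_WINDOW, MAX_NEW_TOKENS))   # 1792
print(fits_context(2065, MAX_NEW_TOKENS, CONTEXT_WINDOW))  # False
```

Under that assumption, a smaller `chunk_size` or fewer retrieved chunks would be one way to bring the prompt back under the limit.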


In [6]:
response = query_engine.query("What layers is TensorNet composed of?")
print(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


The TensorNet model is composed of an initial embedding module, the concatenation of several interaction layers, and the output module.


In [7]:
response = query_engine.query("What is the energy accuracy of TensorNet in the QM9 dataset?")
print(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


The energy accuracy of TensorNet in the QM9 dataset is 3.9 meV, which is state-of-the-art.
