# Retrieval Augmented Generation (RAG) application with LLMs

### Overview:
- Import the Required Libraries
- Load the LLM Model
    - Ollama
- Load the Embeddig model

- Extract and Process the document text

- Setup of vector data base 
    - Creating embeddings in vectorstore 
    - Similarity Search 
- Prompt Template
- Create a RAG chain conversation


## Import the Required Libraries

In [1]:
import torch

if torch.cuda.is_available():
    print(f"CUDA disponible. Using  GPU: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not disponible. Using  CPU.")

print(torch.__version__)
print(torch.cuda.is_available())
print(torch.version.cuda)

  from .autonotebook import tqdm as notebook_tqdm


CUDA disponible. Using  GPU: NVIDIA A100-SXM4-80GB
1.13.1+cu116
True
11.6


In [2]:
import os
from glob import glob 
import getpass
import warnings
warnings.filterwarnings('ignore')

In [3]:
from transformers import AutoTokenizer
import transformers

In [4]:
from langchain_ollama.llms import OllamaLLM
from langchain.llms import HuggingFacePipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.docstore.document import Document

from langchain_community.embeddings.ollama import OllamaEmbeddings

from langchain.memory import ConversationBufferMemory
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate

In [5]:
import pdfplumber
from langchain_community.document_loaders import PyPDFLoader

AutoTokenizer. A tokenizer is responsible for preprocessing text into an array of numbers as inputs to a model.

## **Load the Llama Model Using Ollama**

https://ollama.com/library

The temperature parameter adjusts the randomness of the output. Higher values like 0.7 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

temperature value--> how creative we want our model to be

0 ---> temperature it means model is  very safe it is not taking any bets.

1 --> it will take risk it might generate wrong output but it is very creative

In [6]:
llm = OllamaLLM(model="llama3.2:3b", temperature = 0.2, max_new_tokens = 512, max_length=512,)

In [7]:
prompt = ''' 
        role: system,
        content: You are the IA assistent .
        question: What the prompt template structure used by llama3 model to infom the system and user prompt?
        '''
print(llm(prompt,truncation=True))

  print(llm(prompt,truncation=True))


The LLaMA 3 model uses a specific prompt template structure to inform both the system and the user's prompt. Here is an overview of the template:

1. **Context**: The context section provides information about the topic, task, or scenario that the model will be responding to.

2. **Task/Question**: This section specifies the task or question that the model needs to answer or respond to.

3. **Constraints**: Any additional constraints or limitations that need to be considered when generating a response are specified here.

4. **Options (optional)**: Some models may include options for the user to choose from, which can influence the system's response.

5. **User Prompt**: This section is where you provide your input as the user. The model will use this prompt to generate a response.

6. **System Prompt**: This section is not explicitly stated in the LLaMA 3 documentation but based on how the model works, it can be inferred that the system prompt would typically include information about

## Load the Embeddig model

#**Logged in with a Hugging Face account**

https://huggingface.co/docs/huggingface_hub/quick-start

In [8]:
# api token is disponible in hugginface site 
HUGGINGFACEHUB_API_TOKEN = "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"
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN
os.environ['HUGGING_FACE_HUB_API_KEY'] = HUGGINGFACEHUB_API_TOKEN #getpass.getpass('Hugging face api key:')

In [9]:
embeding_model='sentence-transformers/all-MiniLM-L6-v2'
embeddings = HuggingFaceEmbeddings(model_name=embeding_model)

  embeddings = HuggingFaceEmbeddings(model_name=embeding_model)


## Extract and Process the document text

**Read text from PDF**

In [10]:
def read_pdf(file_path):
    """Extracts and returns text from a PDF file as a single string."""
    with pdfplumber.open(file_path) as pdf:
        text = [page.extract_text() for page in pdf.pages if page.extract_text() is not None]
    return "\n".join(text)  # Join text from all pages into a single string

In [11]:
for name in glob('files/*'):
    print(name)

files/A CNN-based multi-target fast classification method for AR-SSVEP.pdf.pdf
files/Haim Azhari(auth) - Basics of Biomedical Ultrasou_241017_135112.pdf.pdf
files/filterbank_cca.pdf.pdf


In [12]:
pdf_text = read_pdf('files/Haim Azhari(auth) - Basics of Biomedical Ultrasou_241017_135112.pdf.pdf')

In [13]:
print(pdf_text[:100])

BASICS OF BIOMEDICAL
ULTRASOUND FOR
ENGINEERS
BASICS OF BIOMEDICAL
ULTRASOUND FOR
ENGINEERS
HAIM AZH


**make chunks from the all PDF text**

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [15]:
def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1500, chunk_overlap=200, length_function=len
        )
    chunks = text_splitter.split_text(text)
    return chunks

In [16]:
chunks = get_text_chunks(pdf_text)

In [17]:
chunks[0]

'BASICS OF BIOMEDICAL\nULTRASOUND FOR\nENGINEERS\nBASICS OF BIOMEDICAL\nULTRASOUND FOR\nENGINEERS\nHAIM AZHARI\nA JOHN WILEY & SONS, INC., PUBLICATION\nCopyright © 2010 John Wiley & Sons, Inc. All rights reserved.\nPublished by John Wiley & Sons, Inc., Hoboken, New Jersey\nPublished simultaneously in Canada\nCopyright for the Hebrew version of the book and distribution rights in Israel are held by\nMichlol, Inc.\nNo part of this publication may be reproduced, stored in a retrieval system, or transmitted in\nany form or by any means, electronic, mechanical, photocopying, recording, scanning, or\notherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright\nAct, without either the prior written permission of the Publisher, or authorization through\npayment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222\nRosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at\nwww.copyright.com. Requests to the Pu

In [18]:
chunk_documents = [Document(page_content=chunk) for chunk in chunks]

In [19]:
def get_text_chunks(text):
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1500,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks

## Setup of vector data base 

**Creating embeddings in vectorstore**

In [20]:
file_name =  'Basics of Biomedical Ultrasound'
data_path = os.path.join('db', file_name)
vectorstore = Chroma.from_documents(documents = chunk_documents, embedding=embeddings, persist_directory = data_path)
#vectorstore = Chroma.from_documents(documents = chunks, embedding=OllamaEmbeddings(model="llama3.2:3b"))

In [21]:
vectorstore.persist()

  vectorstore.persist()


In [22]:
# clear memory
del  chunks, chunk_documents, pdf_text

**Similarity Search**

In [23]:
query = "doppler"
docs = vectorstore.similarity_search(query, k=4)
len(docs)

4

## Prompt Template 

In [24]:
#<s>[INST] <<SYS>>
#{{ system_prompt }}
#<</SYS>>
#
#{{ user_message }} [/INST]

In [25]:
from langchain import HuggingFacePipeline, PromptTemplate


In [26]:
template = """

Use as seguintes partes do Context e o History para responder a pergunta feita por User.
Obrigatoriamente ao longo da sua resposta ou ao final informe como referência a página que contém o contéudo citado, ex: [pag.34].

Context: {context}
History: {history}

User: {question}
Chatbot:

"""

In [27]:
prompt = PromptTemplate(

    # Set input variables 
    input_variables=["history", "context", "question"],

    # Set template to the session state, template 
    template=template,
)

## Create a RAG chain conversation

In [28]:
def query_with_history(chain, question):
    chat_history = chain.memory.load_memory_variables({})    
    formatted_history = "\n".join([
        f"{msg.type.capitalize()}: {msg.content}" for msg in chat_history['history']
    ])
    

    if chat_history['history'] !=[]:
        complete_prompt = f"Contexto do histórico:\n{formatted_history}\n\nPergunta atual: {question}"
    else:
        complete_prompt = question
        
    response = chain({"query": complete_prompt})
    
    return response

In [29]:
def get_conversation_chain(vectorstore, llm, prompt):
    
    memory = ConversationBufferMemory(
        memory_key='chat_history', 
        max_memory=5,
        return_messages=True,
        output_key= "result")
    
    #memory = ConversationBufferWindowMemory(k=5)
    chain_type_kwargs = {"verbose": True, 'prompt': prompt}
    
    conversation_chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(),
        return_source_documents=True,
        chain_type_kwargs = chain_type_kwargs,    
        memory = memory
    )
    
    return conversation_chain

### RAG History

In [30]:
def get_conversation_chain(vectorstore, llm, prompt):
    
        memory1 = ConversationBufferMemory(
            # Set params from input variables list
            memory_key="history",
            return_messages=True,
            input_key="question",
        )
        memory2 = ConversationBufferMemory(
            memory_key="history",
            return_messages=True,
            output_key="result",
        )
        
        conversation_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type='stuff',
            retriever=vectorstore.as_retriever(),
            return_source_documents=True,
            chain_type_kwargs={
                "verbose": True,
                "prompt": prompt,
                "memory": memory1,
            },
            memory = memory2
        )

        return conversation_chain

In [31]:
uploaded_file='Haim Azhari(auth) - Basics of Biomedical Ultrasou_241017_135112.pdf'
if not os.path.isfile("files/"+uploaded_file+".pdf"):
    print('true')

In [32]:
chat_pdf=get_conversation_chain(vectorstore, llm, prompt)
response=chat_pdf('Me fale sobre a formulação do efeito doppler')

  response=chat_pdf('Me fale sobre a formulação do efeito doppler')
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
[32;1m[1;3m

Use as seguintes partes do Context e o History para responder a pergunta feita por User.
Obrigatoriamente ao longo da sua resposta ou ao final informe como referência a página que contém o contéudo citado, ex: [pag.34].

Context: 37. Levy Y and Azhari H , Velocity measurements using a single transmitted linear
frequency modulated chirp , Ultrasound Med Biol 33 ( 5 ): 768 – 773 , 2007 .
CHAPTER 11
DOPPLER IMAGING TECHNIQUES
Synopsis: In this chapter the basic principles used for measuring motion and fl ow
with ultrasonic waves are presented. First, the Doppler effect is introduced.
Then, it is explained how this effect can be utilized for measuring the temporal
fl ow speed profi le. The diffi culties associated with the computation of the spec-
tral Doppler shift are discussed, and numeric methods for rapid estimation of
fl ow velocity and variance are presented. Finally, the principles of color fl ow
mapping and duplex imaging are introduced.
11.

In [33]:
response.keys()

dict_keys(['query', 'history', 'result', 'source_documents'])

In [34]:
print(response['result'])

O efeito Doppler é uma consequência da mudança na frequência de um sinal sonoro quando ele é emitido por um objeto em movimento. A formulação do efeito Doppler pode ser explicada com base no contexto e na história apresentados.

A princípio, o efeito Doppler foi descoberto pelo físico austríaco Christian Johann Doppler em 1842, enquanto observava as estrelas que pareciam se mover em relação à Terra. Ele notou que a frequência da luz das estrelas era alterada quando elas estavam se aproximando ou afastando do observador.

A formulação matemática do efeito Doppler pode ser expressa como:

Δf = (2v / c) \* f

onde:

* Δf é a mudança na frequência
* v é a velocidade do objeto em movimento
* c é a velocidade da onda sonora no meio
* f é a frequência original do sinal sonoro

Essa formula mostra que a mudança na frequência é proporcional à velocidade do objeto e inversamente proporcional à velocidade da onda sonora.

No contexto da ultrassonografia, o efeito Doppler é utilizado para medir a 

In [35]:
print(response['source_documents'])

[Document(page_content='37. Levy Y and Azhari H , Velocity measurements using a single transmitted linear\nfrequency modulated chirp , Ultrasound Med Biol 33 ( 5 ): 768 – 773 , 2007 .\nCHAPTER 11\nDOPPLER IMAGING TECHNIQUES\nSynopsis: In this chapter the basic principles used for measuring motion and fl ow\nwith ultrasonic waves are presented. First, the Doppler effect is introduced.\nThen, it is explained how this effect can be utilized for measuring the temporal\nfl ow speed profi le. The diffi culties associated with the computation of the spec-\ntral Doppler shift are discussed, and numeric methods for rapid estimation of\nfl ow velocity and variance are presented. Finally, the principles of color fl ow\nmapping and duplex imaging are introduced.\n11.1 THE DOPPLER EFFECT\nOne of the prominent advantages of ultrasonic imaging is its ability to combine\nanatomical imaging with fl ow or tissue velocity mapping. This is an essential\ntool in cardiovascular diagnosis. The basis for thes

In [36]:
chat_pdf.memory.load_memory_variables({})

{'history': [HumanMessage(content='Me fale sobre a formulação do efeito doppler'),
  AIMessage(content='O efeito Doppler é uma consequência da mudança na frequência de um sinal sonoro quando ele é emitido por um objeto em movimento. A formulação do efeito Doppler pode ser explicada com base no contexto e na história apresentados.\n\nA princípio, o efeito Doppler foi descoberto pelo físico austríaco Christian Johann Doppler em 1842, enquanto observava as estrelas que pareciam se mover em relação à Terra. Ele notou que a frequência da luz das estrelas era alterada quando elas estavam se aproximando ou afastando do observador.\n\nA formulação matemática do efeito Doppler pode ser expressa como:\n\nΔf = (2v / c) \\* f\n\nonde:\n\n* Δf é a mudança na frequência\n* v é a velocidade do objeto em movimento\n* c é a velocidade da onda sonora no meio\n* f é a frequência original do sinal sonoro\n\nEssa formula mostra que a mudança na frequência é proporcional à velocidade do objeto e inversament

In [37]:
response=query_with_history(chat_pdf,'Explique passo a passo o equacionamento matemático do PW doppler')
print(response['result'])

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
[32;1m[1;3m

Use as seguintes partes do Context e o History para responder a pergunta feita por User.
Obrigatoriamente ao longo da sua resposta ou ao final informe como referência a página que contém o contéudo citado, ex: [pag.34].

Context: Analytic Signal
e−j⋅2πf 0⋅t)
s(t) ∑ Q(t)
{ }
Hilbert
j
Figure 11.4. Block diagram depicting the process applied for obtaining the Q ( t ) signal
using the Hilbert transform [see Eq. (9.25)].
Q( t)=A′⋅e−βt2⋅ej⋅2πt⋅(Δf)
=A′⋅e−βt2⋅ej⋅2πt⋅⎛ ⎝⎜f02 Cv⎞
⎠⎟
(11.10)
where A ′ represents the amplitude of the signal after applying the fi lter.
As can be noted, the velocity of the target v whose value we seek is con-
tained within the second exponential term. In order to estimate the value of
v at a certain range X from the transducer, we need to sample the signal
0
around time point t = 2 X / C , where C is the average speed of sound in the
0 0
medium and the time is measured relative to the pulse transmission time.
Using the PRF a

In [38]:
response.keys()

dict_keys(['query', 'history', 'result', 'source_documents'])

In [39]:
response['history']

[HumanMessage(content='Me fale sobre a formulação do efeito doppler'),
 AIMessage(content='O efeito Doppler é uma consequência da mudança na frequência de um sinal sonoro quando ele é emitido por um objeto em movimento. A formulação do efeito Doppler pode ser explicada com base no contexto e na história apresentados.\n\nA princípio, o efeito Doppler foi descoberto pelo físico austríaco Christian Johann Doppler em 1842, enquanto observava as estrelas que pareciam se mover em relação à Terra. Ele notou que a frequência da luz das estrelas era alterada quando elas estavam se aproximando ou afastando do observador.\n\nA formulação matemática do efeito Doppler pode ser expressa como:\n\nΔf = (2v / c) \\* f\n\nonde:\n\n* Δf é a mudança na frequência\n* v é a velocidade do objeto em movimento\n* c é a velocidade da onda sonora no meio\n* f é a frequência original do sinal sonoro\n\nEssa formula mostra que a mudança na frequência é proporcional à velocidade do objeto e inversamente proporciona

In [40]:
response=query_with_history(chat_pdf,'Agora como seria uma aplicação em python disso')
print(response['result'])

Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")
Error in StdOutCallbackHandler.on_chain_start callback: AttributeError("'NoneType' object has no attribute 'get'")


Prompt after formatting:
[32;1m[1;3m

Use as seguintes partes do Context e o History para responder a pergunta feita por User.
Obrigatoriamente ao longo da sua resposta ou ao final informe como referência a página que contém o contéudo citado, ex: [pag.34].

Context: Analytic Signal
e−j⋅2πf 0⋅t)
s(t) ∑ Q(t)
{ }
Hilbert
j
Figure 11.4. Block diagram depicting the process applied for obtaining the Q ( t ) signal
using the Hilbert transform [see Eq. (9.25)].
Q( t)=A′⋅e−βt2⋅ej⋅2πt⋅(Δf)
=A′⋅e−βt2⋅ej⋅2πt⋅⎛ ⎝⎜f02 Cv⎞
⎠⎟
(11.10)
where A ′ represents the amplitude of the signal after applying the fi lter.
As can be noted, the velocity of the target v whose value we seek is con-
tained within the second exponential term. In order to estimate the value of
v at a certain range X from the transducer, we need to sample the signal
0
around time point t = 2 X / C , where C is the average speed of sound in the
0 0
medium and the time is measured relative to the pulse transmission time.
Using the PRF a