# Simple RAG with Gemini and LlamaIndex

A simple experiment in building a retrieval-augmented generation (RAG) pipeline over a scientific article, using LlamaIndex and Gemini.


**Article reference**

Larsson, D.G.J., Flach, CF. Antibiotic resistance in the environment. Nat Rev Microbiol 20, 257–269 (2022). https://doi.org/10.1038/s41579-021-00649-x

Available at: https://rdcu.be/dUn5n


Documentation for the Gemini generation parameters: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values?hl=pt-br
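
To make the three sampling parameters set later in this notebook (`temperature`, `top_k`, `top_p`) more concrete, the following is a minimal pure-Python sketch of how such filters are commonly applied to a next-token distribution. It is illustrative only: the function `filter_distribution`, the order of the steps, and the toy logits are assumptions made for this example, not Gemini's actual implementation.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=None, top_p=1.0):
    """Return {token: probability} after temperature, top-k, and top-p filtering.

    Hypothetical helper for illustration; the step order is an assumption.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")

    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # Top-k: keep only the k most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize the surviving tokens so they sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


# Toy logits (made up for this example).
toy_logits = {"resistance": 2.0, "bacteria": 1.0, "genes": 0.5, "soil": -1.0}
sharp = filter_distribution(toy_logits, temperature=0.3, top_k=32, top_p=1)
flat = filter_distribution(toy_logits, temperature=1.0, top_k=2, top_p=1)
```

In this sketch a lower temperature concentrates probability on the most likely token, while `top_k` and `top_p` limit how many candidates remain available for sampling.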


In [None]:
# Libraries used

!pip install google-generativeai
!pip install llama-index
!pip install llama-index-llms-gemini
!pip install llama-index-embeddings-huggingface
!pip install python-dotenv

In [1]:
import os
import google.generativeai as genai
from dotenv import load_dotenv

load_dotenv()  

genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
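
Because `os.getenv` returns `None` when a variable is missing, a forgotten `.env` entry may only surface as an error at a later API call. A small guard like the one below fails early with a clearer message. The helper name `require_env` is hypothetical and not part of any of the libraries used here.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it "
            "before calling genai.configure()."
        )
    return value


# Usage in the notebook, after load_dotenv():
# genai.configure(api_key=require_env("GOOGLE_API_KEY"))
```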

In [2]:
for m in genai.list_models():
    if "generateContent" in m.supported_generation_methods:
        print(m.name)

models/gemini-1.0-pro-latest
models/gemini-1.0-pro
models/gemini-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-001
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-pro-exp-0801
models/gemini-1.5-pro-exp-0827
models/gemini-1.5-flash-latest
models/gemini-1.5-flash-001
models/gemini-1.5-flash-001-tuning
models/gemini-1.5-flash
models/gemini-1.5-flash-exp-0827
models/gemini-1.5-flash-8b-exp-0827
models/gemini-1.5-flash-8b-exp-0924
models/gemini-1.5-flash-002


In [3]:
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader(
    input_files=["pubmed_document.pdf"]
).load_data()

In [4]:
print(type(documents), "\n")
print(len(documents), "\n")
print(type(documents[0]), "\n")
print(documents[0])

<class 'list'> 

13 

<class 'llama_index.core.schema.Document'> 

Doc ID: 430a9553-cc29-4a1c-b0b2-f9041d631917
Text: 0123456789();: Many bacterial species evolved the ability to
tolerate  antibiotics long before humans started to mass-produce  them
to prevent and treat infectious diseases1,2. Isolated  caves2,
permafrost cores1, and other environments and  specimens that have
been preserved from anthropo-genic bacterial contamination 3,4 can
provide insights  ...


## Basic RAG Pipeline

In [5]:
from llama_index.core import VectorStoreIndex
from llama_index.llms.gemini import Gemini
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings

from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.node_parser import TokenTextSplitter


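# LLM used to synthesize answers from the retrieved chunks
llm = Gemini(model="models/gemini-1.5-pro", temperature=0.3, top_p=1, top_k=32)
# Local embedding model, downloaded from Hugging Face on first use
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Sentence-aware splitter; this is the one registered in Settings below
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
# Token-based alternative, defined for comparison but not used in this pipeline
splitter = TokenTextSplitter(chunk_size=1024, chunk_overlap=20)


# Global settings used when building the index and the query engine
Settings.llm = llm
Settings.embed_model = embed_model
Settings.text_splitter = text_splitter

# Split the documents into chunks, embed them, and store the vectors in memory
index = VectorStoreIndex.from_documents(documents, show_progress=True)

# Query engine that retrieves relevant chunks and asks the LLM to answer from them
query_engine = index.as_query_engine()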

Parsing nodes:   0%|          | 0/13 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/31 [00:00<?, ?it/s]
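
The `chunk_size` and `chunk_overlap` arguments above control how the 13 pages are cut into the 31 chunks that get embedded. The sketch below illustrates the general idea of overlapping windows in plain Python. It is a simplification under stated assumptions: whitespace-separated words stand in for tokens, sentence boundaries are ignored, and `chunk_words` is a hypothetical helper, not how `SentenceSplitter` is implemented.

```python
def chunk_words(text, chunk_size=8, chunk_overlap=2):
    """Split text into word windows of chunk_size that share chunk_overlap words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Sample sentence taken from the first page of the article loaded above.
sample = (
    "Many bacterial species evolved the ability to tolerate antibiotics "
    "long before humans started to mass-produce them"
)
for chunk in chunk_words(sample, chunk_size=8, chunk_overlap=2):
    print(chunk)
```

The overlap means that text near a chunk boundary appears in both neighboring chunks, which reduces the chance that a relevant passage is split in a way that hides it from retrieval.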

In [6]:
response = query_engine.query(
    "What are the factors that contribute to the development and spread of antibiotic resistance?"
)
print(str(response))

The evolution of antibiotic resistance can stem from mutations within a bacterium's genome or the acquisition of foreign DNA. The extensive genetic diversity of external environments, such as water and soil, provides a vast pool of genes that pathogens could potentially acquire to counteract antibiotics. 

The spread of antibiotic resistance is facilitated by the mobility of antibiotic resistance genes (ARGs). This mobility is often achieved through association with genetic elements like insertion sequences, gene cassettes, plasmids, and integrative conjugative elements. Environments characterized by high metabolic activity, extensive cell-to-cell contact (like biofilms), and the presence of fecal bacteria can accelerate the mobilization and transfer of ARGs. 

Furthermore, the presence of antibiotics, while not essential for the occurrence of these processes, can significantly increase their rates. Understanding the specific environments and conditions that accelerate the evolution an
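
Behind a call like `query_engine.query(...)`, the retrieval step broadly consists of embedding the question, ranking the stored chunk vectors by similarity, and handing the best matches to the LLM. The sketch below shows that ranking step with cosine similarity in plain Python. The three-dimensional vectors and chunk names are invented for the example, and `retrieve_top_k` is a hypothetical helper rather than a LlamaIndex function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vector, chunk_vectors, k=2):
    """Return the k (chunk_id, score) pairs most similar to the query vector."""
    scored = [
        (chunk_id, cosine_similarity(query_vector, vector))
        for chunk_id, vector in chunk_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (made up); real embeddings have hundreds of dimensions.
toy_chunks = {
    "chunk-mobility": [0.9, 0.1, 0.0],
    "chunk-pollution": [0.1, 0.9, 0.1],
    "chunk-history": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.3, 0.0]
top = retrieve_top_k(toy_query, toy_chunks, k=2)
```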

In [7]:
response_2 = query_engine.query(
    "Explain the problem of pollution."
)
print(str(response_2))

Pollution from antibiotic manufacturing is a major problem, especially in low- and middle-income countries, where production costs are lower and waste management is often insufficient. Excessive emissions of antibiotic residues from manufacturing have been reported, posing significant risks to human health and the environment. 

Addressing this issue is crucial, as resistant bacteria can easily spread across borders, impacting everyone in the long run. While some pharmaceutical companies have voluntarily set emission targets, the lack of transparency regarding production sites and emission levels makes it difficult to assess progress. Therefore, active intervention from public institutions is necessary to enforce stricter pollution control measures and ensure the effectiveness of mitigation efforts. 



In [8]:
response_3 = query_engine.query(
    "How does the environment contribute to the problem of antibiotic resistance?"
)
print(str(response_3))

The environment contains a vast and diverse microbiome, which serves as a reservoir for antibiotic resistance genes (ARGs). These ARGs can be transferred to bacteria that infect humans and animals, making infections harder to treat.  While the overuse of antibiotics in healthcare and agriculture is a major driver of resistance, the environment plays a significant role in the evolution and spread of these resistant bacteria. 
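
To check which passages an answer was grounded in, it can help to list the retrieved chunks alongside the response. The sketch below formats a list of retrieved items into readable lines. It assumes each item exposes `.score`, `.node.metadata`, and `.node.get_content()`, as the entries of `response.source_nodes` in LlamaIndex are expected to, and that PDF pages carry a `page_label` metadata key. The helper `format_sources` is hypothetical, and the demo uses stand-in objects rather than real query results.

```python
from types import SimpleNamespace


def format_sources(source_nodes, preview_chars=80):
    """Build one summary line per retrieved chunk: rank, page, score, text preview."""
    lines = []
    for rank, item in enumerate(source_nodes, start=1):
        metadata = getattr(item.node, "metadata", {}) or {}
        page = metadata.get("page_label", "?")  # assumed metadata key for PDF pages
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        # Collapse whitespace so the preview fits on one line.
        preview = " ".join(item.node.get_content().split())[:preview_chars]
        lines.append(f"{rank}. page {page} (score {score}): {preview}")
    return "\n".join(lines)


# Stand-in object mimicking the assumed shape of a retrieved item.
fake_item = SimpleNamespace(
    score=0.71234,
    node=SimpleNamespace(
        metadata={"page_label": "3"},
        get_content=lambda: "Antibiotic  resistance\ngenes",
    ),
)
summary = format_sources([fake_item])

# Usage in the notebook (assumption about the response object):
# print(format_sources(response_3.source_nodes))
```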

