## RAG with Local Models and LangChain

This example demonstrates how to implement retrieval-augmented generation (RAG) with a local [Phi-3-Mini-4K-Instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf) model, a 3.8B-parameter language model with a 4K-token context length. The model has been quantized, which reduces its size to 2.2 GB.
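As a rough sanity check on the figures above, the following sketch works out how many bits per parameter a 2.2 GB file implies for a 3.8B-parameter model. It assumes decimal gigabytes (10^9 bytes) and ignores any metadata stored in the file, so the results are approximate.

```python
# Back-of-the-envelope check using only the figures quoted above.
# Assumptions: 1 GB = 10**9 bytes; file metadata overhead is ignored.
n_params = 3.8e9          # number of parameters
file_size_bytes = 2.2e9   # size of the quantized file

bits_per_param = file_size_bytes * 8 / n_params

# For comparison: size of the same number of weights at 16 bits (2 bytes) each
fp16_size_gb = n_params * 2 / 1e9

print(f"~{bits_per_param:.1f} bits per parameter")
print(f"~{fp16_size_gb:.1f} GB at 16 bits per parameter")
```

The quantized file therefore averages a little under 5 bits per parameter, roughly a third of the size the same weights would occupy at 16 bits each.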

In [15]:
%pip install -q langchain langchain_community llama-cpp-python langchain-huggingface faiss-cpu


Note: you may need to restart the kernel to use updated packages.


### Load Language Model

**Downloading this model requires 2.2 GB!**

In [None]:
!wget -q https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-q4.gguf

In [7]:
from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="Phi-3-mini-4k-instruct-q4.gguf",
    n_gpu_layers=-1,
    max_tokens=500,
    n_ctx=2048,
    seed=42,
    verbose=False
)

llama_new_context_with_model: n_batch is less than GGML_KQ_MASK_PAD - increasing to 32
llama_new_context_with_model: n_ctx_per_seq (2048) < n_ctx_train (4096) -- the full capacity of the model will not be utilized
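
The second log line notes that `n_ctx=2048` is smaller than the model's 4096-token training context. Because the prompt and the generated tokens share the same context window, the settings above also cap how long the prompt (question plus retrieved text) can be. A minimal sketch of that budget, where `fits_in_context` is a hypothetical helper and the token counts passed to it are made up:

```python
# Token budget implied by the LlamaCpp settings above.
n_ctx = 2048        # context window configured for this session
max_tokens = 500    # upper bound on generated tokens

# Tokens left for the prompt if a full-length completion must fit
prompt_budget = n_ctx - max_tokens

def fits_in_context(prompt_tokens: int) -> bool:
    """Return True if the prompt leaves room for max_tokens of output."""
    return prompt_tokens <= prompt_budget

print(prompt_budget)           # 1548
print(fits_in_context(1200))   # True
print(fits_in_context(1800))   # False
```

Raising `n_ctx` toward 4096 would enlarge this budget at the cost of more memory.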


### Load Embedding Model

In [10]:
from langchain_huggingface import HuggingFaceEmbeddings

embedding_model = HuggingFaceEmbeddings(
    model_name='all-MiniLM-L6-v2'
)



### Load Documents

In [None]:
text = """
Legumes offer a powerful natural solution for carbon sequestration while providing multiple environmental benefits. These remarkable plants can store 30% more soil organic carbon (SOC) compared to other species, primarily due to their unique nitrogen-fixing abilities.

Carbon Sequestration Mechanism

Legumes sequester carbon through both above-ground biomass (leaves, stems, branches) and below-ground systems (roots, root exudates). Their root systems work in partnership with beneficial soil microorganisms, producing a sticky substance called glycoprotein that binds soil particles together, creating stable aggregates that can store carbon for thousands of years.

Environmental Benefits

Reduced Emissions

Legume crops emit 5-7 times less greenhouse gases per unit area compared to other crops. They also decrease the need for chemical nitrogen fertilizers, thereby reducing the carbon emissions associated with fertilizer manufacturing.

Soil Enhancement

Beyond carbon storage, legumes improve:
- Soil fertility and structure
- Water retention capacity
- Ecosystem resilience
- Agricultural productivity

Future Potential

According to World Bank predictions, pulse crops alone could sequester up to 33.0 Mg/ha of carbon in Asia by 2030. This significant potential for carbon storage, combined with their ability to enhance soil health and reduce greenhouse gas emissions, makes legumes an essential tool in the fight against climate change."""

# Split into a list of sentences
docs = text.split('.')

# Strip surrounding whitespace and newlines, and drop empty strings
docs = [t.strip(' \n') for t in docs if t.strip()]
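
Splitting on every period is simple, but it also cuts decimals such as `33.0` in two. As one possible alternative, the sketch below splits only where sentence-ending punctuation is followed by whitespace. The helper name `split_sentences` is hypothetical and not part of the original notebook.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' only when whitespace follows."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p.strip() for p in parts if p.strip()]

sample = (
    "Pulse crops could sequester up to 33.0 Mg/ha of carbon. "
    "Legumes improve soil fertility."
)

sentences = split_sentences(sample)
naive = sample.split('.')

print(len(sentences))  # 2
print(len(naive))      # 4
print(sentences[0])
```

Unlike the plain `split('.')`, this keeps the trailing period on each sentence, which may or may not matter for retrieval quality.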

### Local Vector Database

In [16]:
from langchain_community.vectorstores import FAISS

db = FAISS.from_texts(docs, embedding_model)
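
Conceptually, the vector store keeps one embedding per text and, at query time, returns the texts whose embeddings lie closest to the query embedding. The toy sketch below mimics this with word-count vectors and Euclidean distance in plain Python. The `embed` and `search` helpers and the `toy_docs` list are hypothetical stand-ins for illustration, not the FAISS or embedding-model implementation.

```python
import math
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy embedding: count how often each vocabulary word occurs."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def search(query: str, texts: list[str], k: int = 2) -> list[str]:
    """Return the k texts closest to the query by Euclidean distance."""
    vocab = sorted({w for t in texts + [query] for w in t.lower().split()})
    q = embed(query, vocab)
    ranked = sorted(texts, key=lambda t: math.dist(q, embed(t, vocab)))
    return ranked[:k]

toy_docs = [
    "legumes store carbon in soil",
    "legumes reduce fertilizer use",
    "the model has a 4k context",
]

top = search("how do legumes store carbon", toy_docs, k=1)
print(top)  # ['legumes store carbon in soil']
```

A real embedding model places texts with similar meaning close together even when they share no words, which is what makes the approach useful beyond keyword matching.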

### RAG Pipeline

In [17]:
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQA


template = """<|user|>
Relevant information:
{context}

Provide a concise answer to the following question using the relevant information provided above:
{question}<|end|>
<|assistant|>"""
prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)

rag = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=db.as_retriever(),
    chain_type_kwargs={
        "prompt": prompt
    },
    verbose=True
)
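
With `chain_type='stuff'`, the retrieved texts are inserted together into the `{context}` slot of the prompt before the model is called. The sketch below reproduces that step with plain string formatting. The retrieved snippets and the blank-line separator are assumptions for illustration, and `build_prompt` is a hypothetical helper rather than a LangChain function.

```python
template = """<|user|>
Relevant information:
{context}

Provide a concise answer to the following question using the relevant information provided above:
{question}<|end|>
<|assistant|>"""

def build_prompt(retrieved: list[str], question: str) -> str:
    """Join the retrieved texts and fill in the prompt template."""
    context = "\n\n".join(retrieved)  # assumed separator between texts
    return template.format(context=context, question=question)

# Hypothetical retrieval results, for illustration only
retrieved = [
    "Legume crops emit 5-7 times less greenhouse gases per unit area compared to other crops",
    "They also decrease the need for chemical nitrogen fertilizers",
]

final_prompt = build_prompt(
    retrieved, "What are the environmental benefits of growing legumes?"
)
print(final_prompt)
```

Because everything is placed into a single prompt, the combined length of the retrieved texts has to fit within the model's context window.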

In [19]:
rag.invoke('what are the environmental benefits of growing legumes?')



> Entering new RetrievalQA chain...

> Finished chain.


{'query': 'what are the environmental benefits of growing legumes?',
 'result': ' Growing legumes offers significant environmental benefits such as reduced greenhouse gas emissions, carbon sequestration, improved soil health, and decreased need for chemical nitrogen fertilizers. They emit 5-7 times less greenhouse gases per unit area compared to other crops, thus aiding in climate change mitigation efforts. Their natural ability to store carbon while enhancing soil health makes them essential in environmental sustainability strategies.'}

----