## Install Required Packages
First, install the required packages: the OpenAI Python client, LangChain, LanceDB, and the helper libraries used for PDF loading, BM25 ranking, and tokenization.

In [24]:
# ! pip install langchain lancedb openai
# ! pip install langchain-community
# !pip install requests pypdf
# ! pip install PyPDF2
# ! pip install rank_bm25
# ! pip install tiktoken

## Set Up Your API Key
In Google Colab, you can set your API key either by assigning it directly in the notebook or by using environment variables. For security, it is best practice to avoid hardcoding sensitive information in your code.
The cell below assigns the key directly for demonstration; in a shared notebook, prefer an environment variable.


In [7]:
import openai
from langchain.embeddings import OpenAIEmbeddings

# Directly set your API key here
openai_api_key = 'your_key'

## Access the Environment Variable in Your Code:
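The earlier cell assigns the key directly, so here is a minimal sketch of the environment-variable approach instead. The variable name `OPENAI_API_KEY` and the helper `load_api_key` are assumptions made for illustration; they are not part of the original notebook.

```python
import os

def load_api_key(var_name="OPENAI_API_KEY"):
    """Return the API key stored in an environment variable.

    Both `var_name` and this helper are illustrative assumptions.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return key

# Usage, once the variable has been set in the runtime:
# openai_api_key = load_api_key()
```

Failing loudly when the variable is missing tends to be easier to debug than passing an empty key on to the embedding client.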


In [8]:
from langchain.vectorstores import LanceDB
import lancedb
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.schema import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader


# Initialize embeddings for semantic search
embedding = OpenAIEmbeddings(openai_api_key=openai_api_key)


## Download the PDF
Before we start, let's download the required PDF.

In [9]:
import requests
import time

def download_pdf(url, save_path, retries=3):
    attempt = 0
    while attempt < retries:
        try:
            response = requests.get(url, stream=True, timeout=30)  # Time out instead of hanging on a stalled connection
            response.raise_for_status()  # Check if the download was successful
            with open(save_path, 'wb') as file:
                for chunk in response.iter_content(chunk_size=8192):
                    file.write(chunk)
            print(f"Downloaded PDF from {url} to {save_path}")
            return True
        except requests.exceptions.RequestException as e:
            attempt += 1
            print(f"Error downloading PDF (attempt {attempt} of {retries}): {e}")
            if attempt < retries:
                time.sleep(5)  # Wait before retrying
    return False

# Example URL and file path
pdf_url = "https://pdf.usaid.gov/pdf_docs/PA00TBCT.pdf"
pdf_path = "/content/Food_and_Nutrition.pdf"

# Download the PDF
if not download_pdf(pdf_url, pdf_path):
    raise RuntimeError("Failed to download PDF after multiple attempts")


Downloaded PDF from https://pdf.usaid.gov/pdf_docs/PA00TBCT.pdf to /content/Food_and_Nutrition.pdf


## Load and Split the PDF
Use PyPDFLoader to load and split the PDF into pages.

In [10]:
from langchain.document_loaders import PyPDFLoader

# Load documents
loader = PyPDFLoader(pdf_path)  # Same path the PDF was downloaded to
pages = loader.load_and_split()


## Initialize the BM25 Retriever
Set up the BM25 retriever, which ranks pages by keyword overlap with the query, to return the top two results.

In [11]:
from langchain.retrievers import BM25Retriever

# Initialize the BM25 retriever
bm25_retriever = BM25Retriever.from_documents(pages)
bm25_retriever.k = 2  # Retrieve top 2 results using BM25
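
To give a feel for what BM25 ranking does, here is a small pure-Python sketch of Okapi BM25 scoring. It is an illustration under stated assumptions, not the code `BM25Retriever` runs: the helper `bm25_scores`, the whitespace tokenizer, the `k1` and `b` values, and the IDF variant are all choices made for this example, and the `rank_bm25` package may differ in details. The toy corpus is made up.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with Okapi BM25."""
    docs = [doc.lower().split() for doc in corpus]  # naive whitespace tokenizer
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # One common non-negative IDF variant (an assumption for this sketch)
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates with k1 and is normalized by document length via b
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (made up for illustration)
corpus = [
    "iodine shortage decreases IQ",
    "back yard garden with local chickens",
    "cost of treating illnesses",
]
scores = bm25_scores("iodine shortage", corpus)
# Indices of the two best-scoring documents, mirroring k = 2 above
top_k = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)[:2]
```

Documents that share no terms with the query score zero, which is why a purely keyword-based retriever benefits from being paired with semantic search.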


## Create LanceDB Vector Store for Semantic Search
Connect to LanceDB and create a table for storing embeddings.

In [14]:
import lancedb

# Create lancedb vector store for semantic search
db = lancedb.connect('lancedb')
table = db.create_table("pandas_docs", data=[
    {"vector": embedding.embed_query("Hello World"), "text": "Hello World", "id": "1"}
], mode="overwrite")


## Initialize LanceDB Retriever
Set up the LanceDB retriever for semantic search.

In [20]:
from langchain.vectorstores import LanceDB
from lancedb.db import LanceDBConnection
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.document_loaders import PyPDFLoader

# Reuse the `embedding` object initialized earlier for semantic search

# Establish a connection to the LanceDB database
# Replace 'lancedb' with the path to your own LanceDB database
connection = LanceDBConnection('lancedb')

# Assume `pages` is a list of Document objects loaded previously
# Initialize LanceDB retriever
docsearch = LanceDB.from_documents(pages, embedding, connection=connection)

# Create a retriever using the LanceDB vector store
retriever_lancedb = docsearch.as_retriever(search_kwargs={"k": 2})
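
Conceptually, semantic search compares the query's embedding with each stored vector and returns the closest rows. The sketch below illustrates that idea in plain Python using cosine similarity. The helpers `cosine_similarity` and `top_k`, the three-dimensional toy vectors, and the choice of cosine similarity are assumptions for illustration; the metric and index LanceDB actually uses depend on how it is configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, rows, k=2):
    """Return the k rows whose vectors are most similar to query_vec."""
    ranked = sorted(
        rows,
        key=lambda row: cosine_similarity(query_vec, row["vector"]),
        reverse=True,
    )
    return ranked[:k]

# Toy rows using the same vector/text/id layout as the table created earlier;
# the vectors are made up and far smaller than real embeddings
rows = [
    {"id": "1", "text": "iodine and IQ", "vector": [0.9, 0.1, 0.0]},
    {"id": "2", "text": "back yard garden", "vector": [0.0, 1.0, 0.2]},
    {"id": "3", "text": "cost of illness", "vector": [0.7, 0.0, 0.7]},
]
nearest = top_k([1.0, 0.0, 0.0], rows, k=2)
```

A real vector store avoids this brute-force scan on large tables by using an index, but the ranking idea is the same.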


## Initialize the Ensemble Retriever
Combine the BM25 and LanceDB retrievers, weighting keyword results at 0.4 and semantic results at 0.6.

In [21]:
from langchain.retrievers import EnsembleRetriever

# Initialize the ensemble retriever with weights
ensemble_retriever = EnsembleRetriever(retrievers=[bm25_retriever, retriever_lancedb], weights=[0.4, 0.6])
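
To show how weights can shape a combined ranking, here is a simplified sketch of weighted Reciprocal Rank Fusion, the rank-based approach that LangChain's documentation describes for `EnsembleRetriever`. This is not the library's code: the helper `weighted_rrf`, the constant `c=60`, and the page identifiers are assumptions for illustration.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    if len(ranked_lists) != len(weights):
        raise ValueError("Need exactly one weight per ranked list")
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each retriever contributes weight / (rank + c) for every id it returns
            fused[doc_id] += weight / (rank + c)
    # Highest fused score first
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical top-2 results from each retriever
bm25_ranking = ["page_25", "page_72"]
vector_ranking = ["page_3", "page_25"]
fused_order = weighted_rrf([bm25_ranking, vector_ranking], weights=[0.4, 0.6])
```

Because fusion works on ranks rather than raw scores, the two retrievers do not need comparable score scales. A document returned by both retrievers accumulates contributions from each and tends to rise to the top.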


## Retrieve Relevant Documents
Perform a query and retrieve relevant documents using the ensemble retriever.

In [23]:
query = "Lorem ipsum dolor sit amet"

# Retrieve relevant documents
docs = ensemble_retriever.get_relevant_documents(query)

# Print retrieved documents
for doc in docs:
    print(doc.page_content)

MINISTRY OF AGRICULTURE,
ANIMAL INDUSTRY AND FISHERIES
P.O. Box 102 ENTEBBE - UGANDA
www.agriculture.go.ug
Food and Nutrition Handbook for Extension Workers72
Picture 18: A back yard garden and small animals and local chickens
Food and Nutrition Handbook for Extension Workers25• Shortage of iodine decreases IQ and causes a productivity loss.
• Farmers with low literacy levels are less likely to adopt improved 
agricultural practices hence leading to poor agricultural production 
and productivity.
• People with low literacy levels are bound to have poor health seek -
ing behaviours and access to quality health services.
• Mothers with low education level are likely to follow poor feeding 
practices hence affecting the nutritional and health status of family 
members.
• Contributes to poverty.
• Cost of treating illnesses attributable to malnutrition.
• Cost of caring for sick.
• Lost care for other (not sick) household members.
b)	Consequences 	of	overnutrition
Malnutrition can lead to 