In [2]:
from IPython.display import display, Markdown

# Update the main title and intro
display(Markdown("""
# Food Security Recommender RAG System

This notebook implements a Retrieval Augmented Generation (RAG) system that provides recommendations and insights about food security based on expert documents. It uses:
- PDF documents containing food security research and best practices
- LangChain for RAG implementation
- OpenAI's GPT model for generating informed, contextual responses
- Vector storage for semantic search

The system will:
1. Load and process food security documents from PDF
2. Create embeddings for semantic search
3. Take user queries about food security challenges
4. Provide evidence-based recommendations grounded in research
"""))



## 1. Setup and Dependencies

First, we'll import the required libraries:
- `pypdf` (via LangChain's `PyPDFLoader`, imported in Section 2) for reading PDF files
- `langchain` components for RAG
- `dotenv` for environment variables
- Custom prompt templates for detailed responses

In [3]:
from __future__ import annotations

import os
from typing import List

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.vectorstores import InMemoryVectorStore

# Load environment variables
load_dotenv()

# Configure API key
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError(
        "OPENAI_API_KEY is not set; add it to your .env file or environment"
    )

## 2. Load and Process PDF Documents

We'll now:
1. Load the food security documents from PDF files
2. Split them into manageable chunks
3. Create Document objects for the RAG system

The text splitter produces chunks of at most 500 characters with a 100-character overlap, so neighbouring chunks share context while staying small enough for effective retrieval.
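To make the effect of chunk size and overlap concrete, here is a minimal sketch using a plain sliding window over characters. It is a simplification for illustration only: `chunk_text` is a hypothetical helper, and unlike `RecursiveCharacterTextSplitter` it ignores separators and cuts at fixed character offsets.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap (hypothetical helper)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# Toy input: 1200 characters of repeating digits
sample = "".join(str(i % 10) for i in range(1200))
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces])
# The tail of one chunk is repeated at the head of the next
print(pieces[0][-100:] == pieces[1][:100])
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.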

In [5]:
# Load PDF files
import glob
from langchain_community.document_loaders import PyPDFLoader  # replaces the deprecated langchain.document_loaders path

# Find all PDF files in the data/ directory (adjust the pattern for your path)
pdf_files = glob.glob("data/*.pdf")

# Load every PDF; PyPDFLoader returns one Document per page, so the count printed below is a page count
all_documents = []
for pdf_file in pdf_files:
    loader = PyPDFLoader(pdf_file)
    docs = loader.load()
    all_documents.extend(docs)

documents = all_documents
print(f"Loaded {len(documents)} documents from {len(pdf_files)} PDFs")

# Uncomment to inspect the loaded pages (this prints every page, so the output is very long)
# print('Document:', documents)

# Configure the text splitter: 500-character chunks with a 100-character overlap
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
    length_function=len,
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""]
)

# Split documents into chunks
chunks = text_splitter.split_documents(documents)
print(f"Split text into {len(chunks)} chunks")

# Preview first chunk
if chunks:
    print("\nSample chunk content:")
    print("=" * 40)
    print(chunks[0].page_content[:200], "...")

Loaded 602 documents from 2 PDFs
Split text into 5438 chunks

Sample chunk content:
FOOD SECURITY  AND NUTRITION IN THE WORLD
THE STATE OF 
URBANIZATION, AGRIFOOD SYSTEMS 
TRANSFORMATION AND HEALTHY DIETS 
ACROSS THE RURAL–URBAN CONTINUUM
2023 ...


## 3. Create Vector Store

Now we'll:
1. Initialize the embeddings model
2. Create embeddings for all chunks
3. Store them in a vector store for similarity search

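We'll use OpenAI's `text-embedding-3-large` model to embed each chunk. `InMemoryVectorStore` keeps the vectors in memory only, so the embeddings are recomputed each time the notebook runs.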
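The idea behind similarity search can be sketched with toy vectors and the standard library. This is only an illustration: the three-dimensional vectors and texts are made up, `top_k` is a hypothetical helper, and cosine similarity is assumed here as a common scoring choice rather than taken from the vector store's internals.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up embeddings standing in for real chunk vectors
toy_store = [
    ("drought reduces maize yields", [0.9, 0.1, 0.0]),
    ("food price inflation", [0.1, 0.9, 0.1]),
    ("school feeding programmes", [0.0, 0.2, 0.9]),
]
query = [0.8, 0.2, 0.0]  # pretend embedding of a climate-related question
results = top_k(query, toy_store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Real embeddings have thousands of dimensions, but the ranking step works the same way: the chunks whose vectors point in the direction closest to the query vector are returned first.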

In [6]:
# Initialize the embedding model
embeddings_model = OpenAIEmbeddings(
    model="text-embedding-3-large",  # Use a standard embedding model name
    openai_api_key=api_key,
    openai_api_base="https://api.openai.com/v1",
)

# Create embeddings and store them in a vector store
vectorstore = InMemoryVectorStore.from_documents(
    chunks,
    embeddings_model
)

## 4. Create Response Generator

We'll create a chain that:
1. Takes a user's food security query
2. Retrieves relevant research and recommendations
3. Generates an informed, practical response

The prompt is designed to:
- Address the specific food security challenge
- Ground advice in research and best practices
- Offer actionable recommendations
- Provide context-specific solutions
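The retrieve-then-generate flow described above hinges on one step: placing the retrieved passages and the question into the prompt. A minimal sketch with plain string formatting follows. `TEMPLATE` is a shortened, hypothetical stand-in for the notebook's template, `build_prompt` is a hypothetical helper, and the passages are placeholders rather than real retrieved text.

```python
from __future__ import annotations

# Hypothetical, shortened stand-in for the notebook's prompt template
TEMPLATE = (
    "Context from food security documents:\n"
    "{context}\n\n"
    "User's question: {question}\n\n"
    "Response:"
)


def build_prompt(question: str, passages: list[str]) -> str:
    """Join retrieved passages with blank lines and fill the template."""
    context = "\n\n".join(passages)
    return TEMPLATE.format(context=context, question=question)


# Placeholder passages standing in for chunks returned by similarity search
passages = [
    "Passage A: placeholder text standing in for a retrieved chunk.",
    "Passage B: another placeholder chunk.",
]
prompt_text = build_prompt("How can households reduce post-harvest losses?", passages)
print(prompt_text)
```

The model only sees what ends up in this string, so the number of passages retrieved directly controls how much evidence it can draw on.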

In [7]:
# Initialize the LLM
llm = ChatOpenAI(
    model="gpt-5-mini",  # or another suitable model
)

# Create a thoughtful prompt template
template = """You are a food security expert chatbot. Given the user's question and the food security documentation provided below, 
provide evidence-based recommendations and practical solutions. Focus on actionable advice that can help address the specific 
food security challenge while considering local context and resource constraints.

Context from food security documents:
{context}

User's question: {question}

Your response should:
1. Acknowledge the specific food security challenge
2. Share relevant research findings or best practices from the documents
3. Provide practical, implementable solutions
4. Consider resource constraints and local context
5. Suggest next steps or resources for further assistance

Response:"""

prompt = ChatPromptTemplate.from_template(template)

def get_food_security_guidance(question: str, num_chunks: int = 3) -> str:
    """Generate food security recommendations based on provided documents."""
    # Retrieve relevant passages
    context_chunks = vectorstore.similarity_search(question, k=num_chunks)
    context = "\n\n".join(doc.page_content for doc in context_chunks)
    
    # Generate response
    chain = prompt | llm | StrOutputParser()
    response = chain.invoke({"context": context, "question": question})
    return response

## 5. Interactive Food Security Advisor

Now you can use the cell below to ask questions about food security challenges. The system will:
1. Find relevant research and recommendations
2. Generate practical, evidence-based advice
3. Provide actionable solutions

Try asking about topics like:
- Improving crop yields
- Food storage solutions
- Sustainable farming practices
- Community food programs
- Climate-resilient agriculture
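To try several of these topics in one go, a small batch helper may be convenient. The sketch below is an assumption-laden illustration: `ask_many` and `fake_answer` are hypothetical names, and the stub stands in for the real answering function so the example runs without an API key.

```python
from __future__ import annotations

from typing import Callable


def ask_many(questions: list[str], answer_fn: Callable[[str], str]) -> dict[str, str]:
    """Run each non-empty question through answer_fn and collect the answers."""
    answers: dict[str, str] = {}
    for question in questions:
        question = question.strip()
        if not question:
            continue  # skip blank entries
        answers[question] = answer_fn(question)
    return answers


def fake_answer(question: str) -> str:
    """Stub used here in place of a real model call."""
    return f"(stub answer for: {question})"


results = ask_many(
    ["How can smallholders improve crop yields?", "  ", "What are low-cost food storage options?"],
    fake_answer,
)
for question, answer in results.items():
    print(question, "->", answer)
```

In the notebook itself, passing `get_food_security_guidance` as `answer_fn` should produce real answers, with one model call per question.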

In [8]:
# Interactive cell for seeking guidance
question = "List major Food insecurity reason in 2024"  # Replace with your question
print("Your question:", question)
print("\nFood Security Recommendations:")
print("=" * 60)
print(get_food_security_guidance(question))

Your question: List major Food insecurity reason in 2024

Food Security Recommendations:
Acknowledgement
- You’re asking about the main causes of food insecurity in 2024. This is a complex, multi‑dimensional problem affecting households, markets and national economies — and solutions must match the main drivers in each context.

Major reasons for food insecurity in 2024 (evidence‑based)
1. Conflict and insecurity
   - Active conflict disrupts production, markets and humanitarian access and forces displacement, driving acute food insecurity.
2. Climate extremes and variability
   - Droughts, floods, heatwaves and shifting seasons reduce crop and livestock yields and destroy infrastructure.
3. Economic downturns, inflation and loss of purchasing power
   - High food prices and reduced incomes make nutritious foods unaffordable for many households.
4. Lack of access to and unaffordability of nutritious foods (and unhealthy food environments)
   - Limited physical and economic access to di