# 🍛 Dapoer-AI: Agentic Chatbot for Indonesian Food Recipes using LangChain and Gemini API

Dapoer-AI is an agentic chatbot built with LangChain and the Gemini API to help users explore Indonesian food recipes. The chatbot can:
- Search recipes by title.
- Find dishes based on ingredients.
- Suggest recipes based on cooking methods.
- Recommend easy-to-cook dishes.
- Retrieve recipes using FAISS vector search and RAG (Retrieval-Augmented Generation).

## 🔗 Access the Dapoer-AI Application

- Streamlit App: [Dapoer-AI](https://dapoer-ai-audreynazhira.streamlit.app)  
- Dataset: [Indonesian_Food_Recipes.csv](https://raw.githubusercontent.com/audreeynr/dapoer-ai/refs/heads/main/data/Indonesian_Food_Recipes.csv)

## 1. Install Required Libraries

In [1]:
!pip install langchain langchain-google-genai langchain-community faiss-cpu google-generativeai



## 2. Import Libraries
We import essential libraries for:
- Data processing (`pandas`, `re`)
- Vector search (`FAISS`)
- LLM integration (`ChatGoogleGenerativeAI`, `GoogleGenerativeAIEmbeddings`)
- Document splitting, agent tools, and conversation memory for LangChain agents

In [8]:
import pandas as pd
import re
import random
import google.generativeai as gemini
from langchain_community.vectorstores import FAISS  # vector stores live in langchain-community
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema import Document
from langchain.agents import Tool, initialize_agent
from langchain.memory import ConversationBufferMemory
from google.colab import userdata

## 3. Configure Google Gemini API
We retrieve the Gemini API key from Colab's `userdata` and ensure the key is available before proceeding.


In [25]:
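GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
if not GOOGLE_API_KEY:
    # Raise instead of calling exit(), which is meant for interactive shells
    # and can shut down the notebook kernel.
    # Message: "Gemini API key not found."
    raise ValueError("Error: GEMINI API key tidak ditemukan.")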
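
Outside Colab, `google.colab` is not available, so the key has to come from somewhere else. A minimal sketch, assuming the key is exported as an environment variable named `GOOGLE_API_KEY`; the helper `load_api_key` is hypothetical and not part of the notebook:

```python
import os

def load_api_key(name="GOOGLE_API_KEY", env=None):
    """Return the API key stored under `name`, or raise if it is missing or blank."""
    # `env` defaults to the process environment; passing a dict makes testing easy.
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set")
    return key
```

Calling `load_api_key()` at the top of a script then plays the same role as the `userdata.get(...)` check above.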

## 4. Load and Clean the Dataset
We load the Indonesian food recipes dataset from GitHub and clean it by:
- Dropping rows with a missing `Title`, `Ingredients`, or `Steps` value
- Removing duplicate rows


In [10]:
CSV_FILE_PATH = 'https://raw.githubusercontent.com/audreeynr/dapoer-ai/refs/heads/main/data/Indonesian_Food_Recipes.csv'
df = pd.read_csv(CSV_FILE_PATH)
df_cleaned = df.dropna(subset=['Title', 'Ingredients', 'Steps']).drop_duplicates()

## 5. Normalize Text Data
We preprocess the text by:
- Lowercasing all characters
- Removing every character other than `a`-`z`, `0`-`9`, and whitespace
- Collapsing repeated whitespace and trimming leading and trailing spaces

In [11]:
def normalize_text(text):
    if isinstance(text, str):
        text = text.lower()
        text = re.sub(r'[^a-z0-9\s]', '', text)
        text = re.sub(r'\s+', ' ', text).strip()
        return text
    return text

df_cleaned['Title_Normalized'] = df_cleaned['Title'].apply(normalize_text)
df_cleaned['Ingredients_Normalized'] = df_cleaned['Ingredients'].apply(normalize_text)
df_cleaned['Steps_Normalized'] = df_cleaned['Steps'].apply(normalize_text)
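
To make the effect of these steps concrete, here is a small sketch that repeats the same three operations on hand-written sample strings (the inputs are illustrative, not rows from the dataset):

```python
import re

def normalize_text(text):
    # Same steps as the cell above: lowercase, strip non-alphanumerics, collapse whitespace.
    if isinstance(text, str):
        text = text.lower()
        text = re.sub(r'[^a-z0-9\s]', '', text)
        text = re.sub(r'\s+', ' ', text).strip()
    return text

print(normalize_text("  Tahu Bacem, Sederhana!  "))  # tahu bacem sederhana
print(normalize_text("kira2 1,5 liter"))             # kira2 15 liter
print(normalize_text(None))                          # None
```

Note that the decimal comma in `1,5` is removed along with other punctuation, so the normalized columns are suited to keyword matching rather than display.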

## 6. Format Recipe for Display
This function converts the recipe data into a structured text format to be displayed by the chatbot.

In [12]:
def format_recipe(row):
    bahan_raw = re.split(r'\n|--|,', row['Ingredients'])
    bahan_list = [b.strip().capitalize() for b in bahan_raw if b.strip()]
    bahan_md = "\n".join([f"- {b}" for b in bahan_list])
    langkah_md = row['Steps'].strip()
    return f"""🍽 {row['Title']}\n\nBahan-bahan:\n{bahan_md}\n\nLangkah Memasak:\n{langkah_md}"""
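
As a quick check of the layout this produces, the sketch below runs a copy of the function on a hand-written row. The sample values are made up for illustration, and a plain dict stands in for a DataFrame row:

```python
import re

def format_recipe(row):
    # Copy of the function above so the example runs on its own.
    bahan_raw = re.split(r'\n|--|,', row['Ingredients'])
    bahan_list = [b.strip().capitalize() for b in bahan_raw if b.strip()]
    bahan_md = "\n".join(f"- {b}" for b in bahan_list)
    langkah_md = row['Steps'].strip()
    return f"🍽 {row['Title']}\n\nBahan-bahan:\n{bahan_md}\n\nLangkah Memasak:\n{langkah_md}"

# Hypothetical sample row, not taken from the dataset.
sample = {
    "Title": "Tahu bacem sederhana",
    "Ingredients": "10 bh tahu--5 siung bawang merah--air 1,5 liter",
    "Steps": "1) Haluskan bumbu.\n2) Masak hingga air habis.\n",
}
print(format_recipe(sample))
```

Because the ingredient string is also split on commas, a quantity written with a decimal comma such as `air 1,5 liter` ends up as two bullets (`Air 1` and `5 liter`).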

## 7. Create Agent Tools
We define four keyword-based search tools for the LangChain agent:
1. Search by title
2. Search by ingredients
3. Search by cooking method
4. Recommend easy recipes

In [13]:
def search_by_title(query):
    query_normalized = normalize_text(query)
    match_title = df_cleaned[df_cleaned['Title_Normalized'].str.contains(query_normalized)]
    if not match_title.empty:
        return format_recipe(match_title.iloc[0])
    return "Resep tidak ditemukan berdasarkan judul."

def search_by_ingredients(query):
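    # Also skip common conjunctions ("dan" = and, "atau" = or, "dengan" = with) so they are not treated as ingredients.
    stopwords = {"masakan", "apa", "saja", "yang", "bisa", "dibuat", "dari", "menggunakan", "bahan", "resep", "dan", "atau", "dengan"}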
    prompt_lower = normalize_text(query)
    bahan_keywords = [w for w in prompt_lower.split() if w not in stopwords and len(w) > 2]

    if bahan_keywords:
        mask = df_cleaned['Ingredients_Normalized'].apply(lambda x: all(k in x for k in bahan_keywords))
        match_bahan = df_cleaned[mask]
        if not match_bahan.empty:
            hasil = match_bahan.head(5)['Title'].tolist()
            return "Masakan yang menggunakan bahan tersebut:\n- " + "\n- ".join(hasil)
    return "Tidak ditemukan masakan dengan bahan tersebut."

def search_by_method(query):
    prompt_lower = normalize_text(query)
    for metode in ['goreng', 'panggang', 'rebus', 'kukus']:
        if metode in prompt_lower:
            cocok = df_cleaned[df_cleaned['Steps_Normalized'].str.contains(metode)]
            if not cocok.empty:
                hasil = cocok.head(5)['Title'].tolist()
                return f"Masakan yang dimasak dengan cara {metode}:\n- " + "\n- ".join(hasil)
    return "Tidak ditemukan metode memasak yang cocok."

def recommend_easy_recipes(query):
    prompt_lower = normalize_text(query)
    if "mudah" in prompt_lower or "pemula" in prompt_lower:
        hasil = df_cleaned[df_cleaned['Steps'].str.len() < 300].head(5)['Title'].tolist()
        return "Rekomendasi masakan mudah:\n- " + "\n- ".join(hasil)
    return "Tidak ditemukan masakan mudah yang relevan."
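
The ingredient search boils down to two steps: pull keywords out of the question, then keep recipes whose ingredient text contains every keyword. A minimal standalone sketch of that logic, using a stop-word set like the one in `search_by_ingredients` and a hypothetical three-recipe dictionary in place of the DataFrame:

```python
import re

STOPWORDS = {"masakan", "apa", "saja", "yang", "bisa", "dibuat", "dari",
             "menggunakan", "bahan", "resep"}

def extract_keywords(query):
    # Lowercase, drop punctuation, then keep words that are not stop words
    # and are longer than two characters.
    cleaned = re.sub(r'[^a-z0-9\s]', '', query.lower())
    return [w for w in cleaned.split() if w not in STOPWORDS and len(w) > 2]

# Hypothetical normalized ingredient strings, keyed by recipe title.
recipes = {
    "Tahu bacem": "tahu bawang merah kecap gula jawa",
    "Tempe goreng": "tempe bawang putih garam",
    "Sayur asem": "asam jawa kacang panjang jagung",
}

def titles_with_all(keywords):
    # A recipe matches only if every keyword is a substring of its ingredients.
    return [t for t, ing in recipes.items()
            if keywords and all(k in ing for k in keywords)]

print(extract_keywords("Masakan apa saja yang bisa dibuat dari tahu kecap?"))  # ['tahu', 'kecap']
print(titles_with_all(["tahu", "kecap"]))  # ['Tahu bacem']
print(titles_with_all(["jawa"]))           # ['Tahu bacem', 'Sayur asem']
```

The last call shows a side effect of substring matching: `jawa` matches both `gula jawa` and `asam jawa`, which are different ingredients.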

## 8. Build FAISS Vectorstore
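We build a FAISS vector index from the recipe dataset to enable fast vector-based retrieval. Each recipe becomes one document, which is split with `chunk_size=300` and `chunk_overlap=30` and embedded with Gemini's `models/embedding-001` model.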

In [32]:
def build_vectorstore(api_key):
    docs = []
    for _, row in df_cleaned.iterrows():
        content = f"Title: {row['Title']}\nIngredients: {row['Ingredients']}\nSteps: {row['Steps']}"
        docs.append(Document(page_content=content))

    splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=30)
    texts = splitter.split_documents(docs)

    embeddings = GoogleGenerativeAIEmbeddings(
        model="models/embedding-001",
        google_api_key=api_key
    )

    vectorstore = FAISS.from_documents(texts, embeddings)
    return vectorstore
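
For intuition about what the index does at query time, here is a toy sketch of nearest-neighbour retrieval in plain Python. It is not how FAISS is implemented: the three-dimensional vectors are invented, real embeddings have hundreds of dimensions, and cosine similarity is used here for illustration while the FAISS index may use a different distance metric.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Made-up "embeddings" keyed by recipe title.
index = {
    "Tahu bacem": [0.9, 0.1, 0.0],
    "Tempe goreng": [0.7, 0.6, 0.1],
    "Es cendol": [0.0, 0.2, 0.9],
}

def top_k(query_vec, k=2):
    # Rank every stored vector by similarity to the query and keep the best k.
    ranked = sorted(index, key=lambda title: cosine(query_vec, index[title]), reverse=True)
    return ranked[:k]

print(top_k([1.0, 0.0, 0.0]))  # ['Tahu bacem', 'Tempe goreng']
```

A vector index performs the same kind of ranking, but over embeddings produced by the model and with data structures built for large collections.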

## 9. Create RAG Search Function
This function uses FAISS to retrieve relevant recipes. If no result is found, it returns random recipes as a fallback.

In [33]:
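# Cache the index per API key so the dataset is embedded once, not on every query.
_vectorstore_cache = {}

def rag_search(api_key, query):
    if api_key not in _vectorstore_cache:
        _vectorstore_cache[api_key] = build_vectorstore(api_key)
    retriever = _vectorstore_cache[api_key].as_retriever()
    docs = retriever.get_relevant_documents(query)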

    if not docs:
        fallback_samples = df_cleaned.sample(5)
        fallback_response = "\n\n".join([
            f"{row['Title']}:\nBahan: {row['Ingredients']}\nLangkah: {row['Steps']}"
            for _, row in fallback_samples.iterrows()
        ])
        return f"Tidak ditemukan informasi yang relevan. Berikut beberapa rekomendasi masakan acak:\n\n{fallback_response}"

    return "\n\n".join([doc.page_content for doc in docs[:5]])
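
A nearest-neighbour retriever generally returns its closest chunks however distant they are, so an empty result is uncommon and the random fallback may rarely run. One possible way to make the fallback reachable is to filter results by a similarity threshold first. The sketch below illustrates that idea with toy scores; `select_or_fallback`, the `0.5` threshold, and the scores are all hypothetical and not part of the notebook:

```python
import random

def select_or_fallback(scored_docs, all_titles, threshold=0.5, k=5, rng=None):
    """Return (titles, used_fallback) from (title, similarity) pairs."""
    rng = rng or random.Random()
    # Keep only sufficiently similar results, best first.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    hits = [title for title, score in ranked if score >= threshold]
    if hits:
        return hits[:k], False
    # Nothing relevant enough: fall back to a random sample of recipes.
    return rng.sample(all_titles, min(k, len(all_titles))), True

titles = ["Tahu bacem", "Tempe goreng", "Es cendol"]
print(select_or_fallback([("Tahu bacem", 0.82), ("Es cendol", 0.12)], titles))
# (['Tahu bacem'], False)

picked, used_fallback = select_or_fallback([("Es cendol", 0.12)], titles, k=2)
print(used_fallback, len(picked))  # True 2
```

Whether higher or lower scores mean "closer" depends on the metric the vector store reports, so the comparison would need to be adapted accordingly.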

## 10. Create Langchain Agent
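We create an agent using:
- 5 custom tools (the four keyword tools plus the RAG search)
- Gemini LLM (`gemini-1.5-flash` via Google Generative AI)
- Conversation memory (`ConversationBufferMemory`)
- Zero-shot ReAct agent configuration (`zero-shot-react-description`)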

In [27]:
def create_agent(api_key):
    llm = ChatGoogleGenerativeAI(
        model="gemini-1.5-flash",
        google_api_key=api_key,
        temperature=0.7
    )

    def rag_tool_func(query):
        return rag_search(api_key, query)

    tools = [
        Tool(name="SearchByTitle", func=search_by_title, description="Cari resep berdasarkan judul masakan."),
        Tool(name="SearchByIngredients", func=search_by_ingredients, description="Cari masakan berdasarkan bahan."),
        Tool(name="SearchByMethod", func=search_by_method, description="Cari masakan berdasarkan metode memasak."),
        Tool(name="RecommendEasyRecipes", func=recommend_easy_recipes, description="Rekomendasi masakan yang mudah dibuat."),
        Tool(name="RAGSearch", func=rag_tool_func, description="Cari informasi masakan menggunakan FAISS dan RAG dengan fallback rekomendasi acak.")
    ]

    memory = ConversationBufferMemory(memory_key="chat_history")

    agent = initialize_agent(
        tools=tools,
        llm=llm,
        agent="zero-shot-react-description",
        memory=memory,
        verbose=True
    )

    return agent
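
A ReAct-style agent chooses a tool by writing `Action:` and `Action Input:` lines, which the executor then parses and dispatches to the matching function. The sketch below is a simplified illustration of that dispatch step, not LangChain's actual parser; `parse_action` and the stand-in tools are hypothetical:

```python
import re

def parse_action(llm_text):
    """Extract (tool_name, tool_input) from a ReAct-style reply, or None if absent."""
    match = re.search(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", llm_text)
    if not match:
        return None
    return match.group(1).strip(), match.group(2).strip()

# Stand-in tools; the real ones are the functions wrapped in Tool(...) above.
tools = {
    "SearchByTitle": lambda q: f"title lookup for {q!r}",
    "SearchByIngredients": lambda q: f"ingredient lookup for {q!r}",
}

reply = (
    "Thought: It is a specific dish name, so search by title.\n"
    "Action: SearchByTitle\n"
    "Action Input: tahu bacem"
)
name, tool_input = parse_action(reply)
print(tools[name](tool_input))  # title lookup for 'tahu bacem'
```

This is why each `Tool` needs a distinct `name` and a clear `description`: the model sees the descriptions and must emit one of the names exactly.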

In [28]:
agent = create_agent(GOOGLE_API_KEY)

In [31]:
print("Selamat datang di Dapoer-AI 🍳! Kamu bisa tanya seputar resep masakan Indonesia.")
print("Tanyakan resep, bahan hingga ide masakan (atau ketik 'exit' untuk keluar).")

while True:
    user_input = input("\nKamu: ")
    if user_input.lower() in ["exit", "quit", "keluar"]:
        print("Terima kasih sudah menggunakan Dapoer-AI! 👋")
        break

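    response = agent.invoke({"input": user_input})
    # invoke() returns a dict; the agent's reply is stored under the "output" key.
    print("\nDapoer-AI: ", response["output"])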
    print("\n" + "=" * 60 + "\n")

Selamat datang di Dapoer-AI 🍳! Kamu bisa tanya seputar resep masakan Indonesia.
Tanyakan resep, bahan hingga ide masakan (atau ketik 'exit' untuk keluar).

Kamu: tahu bacem


> Entering new AgentExecutor chain...
Question: tahu bacem
Thought: I need to find information about tahu bacem.  Since it's a specific dish name, searching by title is the most appropriate first step.
Action: SearchByTitle
Action Input: tahu bacem
Observation: 🍽 Tahu bacem sederhana

Bahan-bahan:
- 10 bh tahu
- 5 siung bawang merah
- 5 siung bawang putih
- 1 sdm ketumbar
- 1 bh kemiri
- Daun salam
- Daun jeruk
- Kecap
- Gula jawa
- Air

Langkah Memasak:
1) Haluskan bumbu. Bawang merah, bawang putih, ketumbar, kemiri.
2) Siapkan panci yang diisi air kira2 1,5 liter.
3) Masukkan tahu dan bumbu halus serta daun salam dan daun jeruk.
4) Masukkan gula jawa dan kecap. Jgn lupa tambah kan garam.
5) Aduk, koreksi rasa dan biarkan hingga air rebusan habis.
6) Angkat dan simpan di kulk