# 📖 Introduction

This notebook summarizes some of the knowledge gained through the **5-day Gen AI Intensive Course with Google**, in the form of the **Gen AI Intensive Course Capstone 2025Q1**.  

It is a refactored version that uses **LangChain** capabilities where possible.

### TODO: verify list below when finished

It implements an intelligent chef assistant bot whose main capabilities are:
* selecting a suitable cookbook based on the user's suggestion
* suggesting a recipe, e.g., based on the available ingredients
* placing a dummy order for ingredients

The **gen AI capabilities** used in the notebook are:  
✅ Embeddings  
✅ Few shot prompting  
✅ Structured output/JSON mode/controlled generation  
✅ Retrieval augmented generation (RAG)  
✅ Vector search/vector store/vector database   
✅ Agents with LangGraph

# ⚒ Installation and setup

In [1]:
!pip uninstall -qqy kfp jupyterlab libpysal thinc spacy fastai ydata-profiling google-cloud-bigquery google-generativeai

!pip install -qU 'langgraph==0.3.21' 'langchain-google-genai==2.1.2' 'langgraph-prebuilt==0.1.7' 'langchain' 'langchain-community'


In [2]:
!pip install -qU "google-genai==1.7.0" "chromadb==0.6.3"


Verify the installed `google-genai` version.

In [3]:
from google import genai
from google.genai import types

genai.__version__

'1.7.0'

Set up the API key and the environment variable.

In [4]:
import os
from kaggle_secrets import UserSecretsClient
from enum import Enum, auto

class LLM_PROVIDER(Enum):
    GOOGLE = auto()

OUR_LLM_PROVIDER = LLM_PROVIDER.GOOGLE
LLM_KWARGS = dict(model="gemini-1.5-flash")

GOOGLE_API_KEY = UserSecretsClient().get_secret("GOOGLE_API_KEY")
client = genai.Client(api_key=GOOGLE_API_KEY)

# This is crucial, necessary for LangGraph invoke
os.environ['GOOGLE_API_KEY'] = GOOGLE_API_KEY

is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})
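
The `is_retriable` predicate above treats only rate-limit (429) and service-unavailable (503) responses as transient. The following is a minimal, stdlib-only sketch of that idea. `FakeAPIError`, `is_retriable_sketch`, `call_with_retry` and `flaky` are hypothetical names used for illustration; they are not part of `google.genai` or `google.api_core`, and the real retry decorator may behave differently (for example, in its backoff schedule).

```python
import time

RETRIABLE_CODES = {429, 503}  # rate limited / service unavailable


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, code: int):
        super().__init__(f"API error {code}")
        self.code = code


def is_retriable_sketch(exc: Exception) -> bool:
    """Return True only for errors that are worth retrying."""
    return isinstance(exc, FakeAPIError) and exc.code in RETRIABLE_CODES


def call_with_retry(func, max_attempts: int = 3, delay: float = 0.0):
    """Call func, retrying on retriable errors for up to max_attempts calls."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Re-raise non-retriable errors and the final failed attempt
            if not is_retriable_sketch(exc) or attempt == max_attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff


calls = {"n": 0}


def flaky():
    """Fail twice with a 429, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeAPIError(429)
    return "ok"


result = call_with_retry(flaky)
```

Errors with other codes (for example 400) are re-raised immediately, since repeating an invalid request would not help.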

# 📚 Cookbook data corpus preparation

From the attached **Cookbooks** dataset, select a few books and keep the first `CLIP` characters of each. The titles are later retrieved from these headers.

In [5]:
import os
import re
import json
import typing_extensions as typing
from google.api_core import retry
from langchain_community.document_loaders import TextLoader

CLIP = 250
NUM_BOOKS = 5
BOOKS_STEP = 12

book_headers = []
book_file_names = []

for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in sorted(filenames)[::BOOKS_STEP][:NUM_BOOKS]:

        book_loader = TextLoader(os.path.join(dirname, filename))
        book = book_loader.load()
        
        book_headers.append(book[0].page_content[:CLIP])
        book_file_names.append(filename)
        print(filename)

amem.txt
chin.txt
epia.txt
grea.txt
linc.txt
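
The selection in the cell above relies on two slices: `[::BOOKS_STEP]` takes every 12th file in sorted order, and `[:NUM_BOOKS]` keeps the first five of those. A small sketch of the same slicing, using hypothetical file names and a made-up book text in place of the real dataset:

```python
CLIP = 250
NUM_BOOKS = 5
BOOKS_STEP = 12

# Hypothetical file names standing in for the dataset directory listing
filenames = [f"book_{i:02d}.txt" for i in range(60)]

# Take every BOOKS_STEP-th file in sorted order, then keep the first NUM_BOOKS
selected = sorted(filenames)[::BOOKS_STEP][:NUM_BOOKS]

# Clip a (made-up) book text to its first CLIP characters
book_text = "Title page. " * 100
header = book_text[:CLIP]
```

With fewer than `BOOKS_STEP * (NUM_BOOKS - 1) + 1` files in a directory, the slice simply returns fewer books rather than raising an error.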


## 📝 Titles retrieval
Define a Pydantic model for the function output; it helps structure both the few-shot prompt and the format of the LLM output.
Capabilities:
* **few shot prompting**
* **structured output controlled generation**

In [6]:
from pydantic import BaseModel, Field

class BookInfo(BaseModel):
    title: str = Field(description="Title of a book")
    authors: list[str] = Field(description="Book authors list")

In [7]:
from langchain_core.prompts import PromptTemplate, FewShotPromptTemplate
from langchain_core.example_selectors import LengthBasedExampleSelector
from langchain_core.output_parsers import JsonOutputParser

few_shot_prompt_instruction = "Parse the beginning of given book to retrieve title and authors. Note there can be many new-line characters inside the text."
examples = [{"input": "\n \n \n \n The American Woman's Home: or, Principles of Domestic Science; being a Guide to the Formation and Maintenance of Economical, Healthful, Beautiful, and Christian Homes.  Beecher, Catharine Esther  Stowe, Harriet Beecher  Home economics.  Introduction. The Christian Family. A Christian House. A Healthful Home.",
            "output": 
                """
                title: "The American Woman's Home: or, Principles of Domestic Science; being a Guide to the Formation and Maintenance of Economical, Healthful, Beautiful, and Christian Homes.\n"
                authors: ["Catharine Esther Beecher", "Harriet Beecher Stowe"]
                """,
            },           
            {"input": "\n\n Directions for Cookery, in its Various Branches.\n Leslie, Eliza \nCookery, American.\n",
            "output":
                """
                title: "Directions for Cookery, in its Various Branches."
                authors: ["Eliza Leslie"]
                """,
            },
            {"input": "\n\n \n\n \nA bookplate illustration of a illuminated reading lap and an open book.  \nThis book belongs to Beatrice V. Grant.\n\n",
            "output":
                """
                title: "A bookplate illustration of a illuminated reading lap and an open book."
                authors: ["Beatrice V. Grant"]
                """
           }]

example_prompt = PromptTemplate(
    input_variables = ["input", "output"],
    template = "EXAMPLE: {input}\nResponse: {output}",  
)

example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=500,
)

output_parser = JsonOutputParser(pydantic_object=BookInfo)
format_instructions = output_parser.get_format_instructions()

dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix=few_shot_prompt_instruction + "\n {format_instructions}",
    suffix="EXAMPLE: {header}\nResponse:",
    input_variables=["header"],
    partial_variables={"format_instructions": format_instructions},
)

In [8]:
print(dynamic_prompt.format(header="Cookbook for oldies. Mr. Matuzalem"))

Parse the beginning of given book to retrieve title and authors. Note there can be many new-line characters inside the text.
 The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"title": {"description": "Title of a book", "title": "Title", "type": "string"}, "authors": {"description": "Book authors list", "items": {"type": "string"}, "title": "Authors", "type": "array"}}, "required": ["title", "authors"]}
```

EXAMPLE: 
 
 
 
 The American Woman's Home: or, Principles of Domestic Science; being a Guide to the Formation and Maintenance of Economical, Healthful, Beautiful, and Chri

In [9]:
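from langchain_google_genai import GoogleGenerativeAI

def gen_llm_factory(provider: LLM_PROVIDER, args=None, kwargs=None):
    """Return an LLM client for the given provider, built from keyword arguments."""
    if provider == LLM_PROVIDER.GOOGLE:
        # Guard against kwargs=None, which would make ** unpacking fail
        return GoogleGenerativeAI(**(kwargs or {}))

    raise NotImplementedError(f"The provider {provider} is not supported!")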

In [10]:
from langchain_google_genai import GoogleGenerativeAI

llm = gen_llm_factory(OUR_LLM_PROVIDER, kwargs=LLM_KWARGS)

@retry.Retry(predicate=is_retriable, timeout=3.0)
def extract_header_meta(book_header: str) -> dict:

    chain = dynamic_prompt | llm | output_parser
    
    return chain.invoke({"header": book_header})
    

titles_retrieved = [] 
for book_header in book_headers:
    print("=========================")
    print("Original book header:\n")
    print(book_header)

    try:        
        book_info = extract_header_meta(book_header)
        titles_retrieved.append(book_info.get("title"))
        print("=========================")
        print("Retrieved title and authors:")
        print(book_info)
        print("")
    except Exception as e:  # a bare except would also swallow KeyboardInterrupt
        print(f"error: {e}")
        titles_retrieved.append("")
    print("=========================\n")

Original book header:

 
 

 
  

 
 
 


 The American Matron: Or, Practical and Scientific Cookery. 


 By a Housekeeper. 


 Boston: J. Munroe &amp; Co., 1851 

 [Page images for  The American Matron  were produced before MSU began the "Feeding America" digitization pro
Retrieved title and authors:
{'title': 'The American Matron: Or, Practical and Scientific Cookery.', 'authors': ['A Housekeeper']}


Original book header:

 
 
  
 
 Chinese-Japanese Cook Book 
 Bosse, Sara 
 Watanna, Onoto 
 Cookery, Chinese. Cookery, Japanese. Cookery, American. 
 Part 1 Chinese Recipes. Rules for Cooking. Soups. Gravy. Fish. Poultry and Game. Meats. Chop Sueys. Chow Mains. Fried Rice
Retrieved title and authors:
{'title': 'Chinese-Japanese Cook Book', 'authors': ['Sara Bosse', 'Onoto Watanna']}


Original book header:

 
 
 
  The Epicurean...  Ranhofer, Charles.  Cookery, American. Cookery, French. Menus.  Complete title: The Epicurean. A complete treatise of Analytical and Practical Studies on t

# 🧠 RAG utilities
## Text chunking

In [11]:
from langchain.text_splitter import TokenTextSplitter

splitter = TokenTextSplitter(
    encoding_name="cl100k_base", # Example encoding for newer OpenAI models
    chunk_size=500,  # Target chunk size in TOKENS
    chunk_overlap=30 # Overlap in TOKENS
)

# NOTE: `book` is the last document loaded in the loop above (linc.txt)
chunks = splitter.split_documents(book)

# example:
print(chunks[0].page_content)

 
 
 
 
 Mrs. Lincoln's Boston Cook Book. What to Do and What Not to Do in Cooking. 
 Lincoln, Mary Johnson 
 Cookery, American. 
 Introduction. Bread and Bread Making. Receipts for Yeast and Bread. Raised Biscuit, Rolls, etc. Stale Bread, Toast, etc. Soda Biscuit, Muffins, Gems, etc. Waffles and Griddle-Cakes. Fried Muffins, Fritters, Doughnuts, etc. Oatmeal and other Grains. Beverages. Soup and Stock. Soup without Stock. Fish. Shell Fish. Meat and Fish Sauces. Eggs. Meat. Beef. Mutton and Lamb. Veal. Pork. Poultry and Game. Entr&#233;es and Meat R&#233;chauff&#233;. Sundries. Vegetables. Rice and Macaroni. Salads. Pastry and Pies. Pudding Sauces. Hot Puddings. Custards, Jellies, and Creams. Ice-Cream and Sherbet. Cake. Fruit. Cooking for Invalids. Miscellaneous Hints. The Dining-Room. The Care of Kitchen Utensils. An Outline of Study for Teachers. Suggestions to Teachers. A Course of Study for Normal Pupils. Miscellaneous Questions for Examination. Topics and Illustrations for Lectur
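
To illustrate what `chunk_size` and `chunk_overlap` mean, here is a minimal sliding-window sketch over a list of token IDs. It is a simplified illustration, not the actual `TokenTextSplitter` implementation; `chunk_tokens` is a hypothetical helper, and plain integers stand in for real token IDs.

```python
def chunk_tokens(tokens: list, chunk_size: int, chunk_overlap: int) -> list:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


tokens = list(range(10))  # integers standing in for token IDs
chunks = chunk_tokens(tokens, chunk_size=4, chunk_overlap=1)
```

Each chunk starts with the last token of the previous one, so a sentence cut at a chunk boundary still appears with some context in the next chunk.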

## Embedding function for RAG system
This creates embeddings of the text chunks produced in the chunking step above and stores them in a vector database. A user query is later embedded the same way and used to retrieve (hopefully) the most relevant chunks.
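
The retrieval idea can be sketched without any external service. In the example below, a toy bag-of-words vector replaces a real embedding model and a plain list replaces the vector database; `toy_embed`, `cosine_similarity` and `retrieve` are hypothetical helpers for illustration only.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, toy_embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "soak the beans overnight and boil them with salt",
    "beat the eggs with sugar and bake the cake",
    "fry the fish in butter and serve with lemon",
]
top = retrieve("how to bake a cake with eggs", chunks, k=1)
```

A real embedding model captures meaning rather than exact word overlap, so it can also match a query such as "dessert" to the cake chunk, which this toy version cannot.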

Helper method to navigate through book titles.

# 🔩 Tools for LLM to use
Several tools are specified here:
* find_cuisine - Find appropriate cookbook based on how the provided query matches the title of a book - based on semantic similarity (capability: **Embeddings**)
* summarize_cookbook - Summarize cookbook (perform indexing with vector database) - (capability: **RAG**)
* retrieve_recipe - Retrieves relevant information about user requested recipe - (capability: **RAG**)
* order_ingredients - Orders desired ingredients in the nearby store - (capability: just **function calling** like other methods)

Gather tools and prepare chat bot instructions.
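
As a rough sketch of how tools can be gathered and dispatched by name, consider the plain-Python example below. The signatures, the word-overlap matching in `find_cuisine`, and the `TOOLS` / `call_tool` helpers are assumptions made for illustration; the notebook's own tools may be defined differently (for example, with LangChain's tool decorator and embedding-based similarity).

```python
import re


def _words(text: str) -> set:
    """Lowercase alphabetic words in text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def find_cuisine(query: str, titles: list) -> str:
    """Pick the title sharing the most words with the query.

    Word overlap is a placeholder for embedding-based semantic similarity.
    """
    query_words = _words(query)
    return max(titles, key=lambda t: len(query_words & _words(t)))


def order_ingredients(ingredients: list) -> str:
    """Dummy order: nothing is purchased, only a confirmation is returned."""
    if not ingredients:
        return "Nothing to order."
    return "Ordered: " + ", ".join(ingredients)


# Registry mapping tool names to callables, as an agent runtime might keep
TOOLS = {f.__name__: f for f in (find_cuisine, order_ingredients)}


def call_tool(name: str, **kwargs):
    """Look up a tool by name and call it with keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](**kwargs)


titles = ["Chinese-Japanese Cook Book", "Mrs. Lincoln's Boston Cook Book"]
picked = call_tool("find_cuisine", query="something japanese", titles=titles)
receipt = call_tool("order_ingredients", ingredients=["rice", "soy sauce"])
```

Keeping the docstrings descriptive matters in practice, because tool-calling models typically choose a tool based on its name and description.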

# 💬 Configure the chat and start conversation

## Preview the details of conversation

# 🥷 Agentic approach 

## State class
Prepare a class that stores the conversation history and the success state of the conversation, along with the ingredients list.
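
One possible shape for such a state is sketched below with a `TypedDict`. The class name `ChefState`, its field names, and the helper functions are hypothetical; the notebook's actual state class may differ.

```python
from typing import TypedDict


class ChefState(TypedDict):
    """Hypothetical conversation state for the chef assistant."""

    messages: list      # conversation history as {"role", "content"} dicts
    ingredients: list   # ingredients collected so far
    finished: bool      # True once the conversation has reached its goal


def new_state() -> ChefState:
    """Create an empty state for a fresh conversation."""
    return {"messages": [], "ingredients": [], "finished": False}


def add_message(state: ChefState, role: str, content: str) -> ChefState:
    """Return a new state with one more message; the input is not mutated."""
    message = {"role": role, "content": content}
    return {**state, "messages": state["messages"] + [message]}


state = new_state()
state = add_message(state, "user", "I have eggs and flour.")
state = {**state, "ingredients": ["eggs", "flour"]}
```

Returning a new dictionary instead of mutating the old one keeps each graph node's effect on the state explicit.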

## Extended and updated instruction for the Agent

## Tools for the Agent and nodes for the graph

## Conditional edges
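A conditional edge is essentially a function that inspects the current state and returns the name of the next node. The sketch below shows that routing logic in plain Python. The node names (`"tools"`, `"human"`, `"end"`) and the state keys are assumptions for illustration, not the names used by this notebook's graph.

```python
def route_after_chatbot(state: dict) -> str:
    """Decide which node runs after the chatbot node."""
    if state.get("finished"):
        return "end"    # the conversation goal was reached

    last_message = state["messages"][-1]
    if last_message.get("tool_calls"):
        return "tools"  # the model asked for a tool, so run it next

    return "human"      # otherwise wait for the next user message


tool_state = {
    "finished": False,
    "messages": [{"role": "ai", "content": "", "tool_calls": [{"name": "retrieve_recipe"}]}],
}
chat_state = {
    "finished": False,
    "messages": [{"role": "ai", "content": "Which cuisine do you prefer?"}],
}
done_state = {"finished": True, "messages": [{"role": "ai", "content": "Enjoy your meal!"}]}

routes = [route_after_chatbot(s) for s in (tool_state, chat_state, done_state)]
```

In a graph framework, the returned string is mapped to the node that should execute next.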

## Full graph building for the Agent

## 💬 Test the Agentic Chatbot 
In this notebook, a set of prearranged user messages is run in a chat with the bot.