<h1>Chapter 7 - Advanced Text Generation Techniques and Tools</h1>
<i>Going beyond prompt engineering.</i>

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon"></a>
<a href="https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/"><img src="https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K"></a>
<a href="https://github.com/HandsOnLLM/Hands-On-Large-Language-Models"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter07/Chapter%207%20-%20Advanced%20Text%20Generation%20Techniques%20and%20Tools.ipynb)

---

This notebook is for Chapter 7 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>

### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **run** the following code block to install the dependencies for this chapter:

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---


In [1]:
%%capture
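# Quote the version specifiers so the shell does not treat ">" as output redirection
!pip install "langchain>=0.1.17" "openai>=1.13.3" "langchain_openai>=0.1.6" "transformers>=4.40.1" "datasets>=2.18.0" "accelerate>=0.27.2" "sentence-transformers>=2.5.1" "duckduckgo-search>=5.2.2" langchain_community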
!CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python==0.2.69

# Loading an LLM

In [2]:
!wget https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf

# If this command does not work for you, you can use the link directly to download the model
# https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/resolve/main/Phi-3-mini-4k-instruct-fp16.gguf


In [3]:
from langchain_community.llms import LlamaCpp

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="Phi-3-mini-4k-instruct-fp16.gguf",
    n_gpu_layers=-1,
    max_tokens=500,
    n_ctx=2048,
    seed=42,
    verbose=False
)

In [5]:
llm.invoke("Hi! My name is Daniella. What is 1 + 1?")

''
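The empty string above is consistent with the model receiving a raw prompt that lacks the special tokens it expects around a chat turn (this is an interpretation; the output itself does not state the cause). As a minimal sketch in plain Python, the message can be wrapped in the same Phi-3 tags that the prompt template in the Chains section uses. `format_phi3_prompt` is a hypothetical helper written for illustration, not part of LangChain.

```python
def format_phi3_prompt(user_message: str) -> str:
    """Wrap a user message in the Phi-3 chat tags used by this notebook's templates."""
    return f"<s><|user|>\n{user_message}<|end|>\n<|assistant|>"


formatted = format_phi3_prompt("Hi! My name is Daniella. What is 1 + 1?")
print(formatted)
```

The formatted string could then be passed to `llm.invoke(...)` directly, although the notebook uses a `PromptTemplate` so the wrapping is reusable.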

### Chains

In [9]:
from langchain_core.prompts import PromptTemplate

# Create a prompt template with the "input_prompt" variable
template = """<s><|user|>
{input_prompt}<|end|>
<|assistant|>"""
prompt = PromptTemplate(
    template=template,
    input_variables=["input_prompt"]
)

In [10]:
basic_chain = prompt | llm

In [11]:
# Use the chain
basic_chain.invoke(
    {
        "input_prompt": "Hi! My name is Daniella. What is 1 + 1?",
    }
)

' The answer to what 1 + 1 equals is 2. This is a basic arithmetic operation following the principle of addition where if you have one item and you add another one, you will have two items in total.'
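The `prompt | llm` expression composes two steps so that the output of the first becomes the input of the second. The following is a rough sketch of that idea using Python's `__or__` operator overload. The `Step` class, `toy_prompt`, and `toy_llm` are hypothetical stand-ins and do not reproduce LangChain's actual implementation.

```python
class Step:
    """A tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: the output of self feeds into other
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins for the prompt template and the LLM
toy_prompt = Step(lambda d: f"<s><|user|>\n{d['input_prompt']}<|end|>\n<|assistant|>")
toy_llm = Step(lambda text: f"[model received {len(text)} characters]")

toy_chain = toy_prompt | toy_llm
result = toy_chain.invoke({"input_prompt": "What is 1 + 1?"})
print(result)
```

The stub model only reports how much text it received, which makes it easy to see that the formatted prompt, not the raw dictionary, is what reaches the second step.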

### Multiple Chains

In [13]:
from langchain.chains import LLMChain

# Create a chain for the title of our story
template = """<s><|user|>
Create a title for a story about {summary}. Only return the title.<|end|>
<|assistant|>"""
title_prompt = PromptTemplate(template=template, input_variables=["summary"])
title = LLMChain(llm=llm, prompt=title_prompt, output_key="title")

In [15]:
title.invoke({"summary": "a girl that lost her mother"})

{'summary': 'a girl that lost her mother',
 'title': ' "Echoes of a Mother\'s Love: A Tale of Loss and Resilience"'}

In [16]:
# Create a chain for the character description using the summary and title
template = """<s><|user|>
Describe the main character of a story about {summary} with the title {title}. Use only two sentences.<|end|>
<|assistant|>"""
character_prompt = PromptTemplate(
    template=template, input_variables=["summary", "title"]
)
character = LLMChain(llm=llm, prompt=character_prompt, output_key="character")

In [17]:
character.invoke({"summary": "a girl that lost her mother", "title": "Echoes of a Mother\'s Love: A Tale of Loss and Resilience"})

{'summary': 'a girl that lost her mother',
 'title': "Echoes of a Mother's Love: A Tale of Loss and Resilience",
 'character': " The protagonist, Emma, is an introverted yet resilient teenager, whose vibrant spirit stands in stark contrast to the overwhelming grief she experiences after losing her mother at a young age. She finds solace and strength through cherished memories of her mother's love and guidance, which serves as her beacon during this tumultuous period of her life."}

In [18]:
# Create a chain for the story using the summary, title, and character description
template = """<s><|user|>
Create a story about {summary} with the title {title}. The main character is: {character}. Only return the story and it cannot be longer than one paragraph.<|end|>
<|assistant|>"""
story_prompt = PromptTemplate(
    template=template, input_variables=["summary", "title", "character"]
)
story = LLMChain(llm=llm, prompt=story_prompt, output_key="story")

In [19]:
# Combine all three components to create the full chain
llm_chain = title | character | story

In [21]:
llm_chain.invoke("a girl that lost her mother")

{'summary': 'a girl that lost her mother',
 'title': ' "Lily\'s Legacy: A Journey Through Grief"',
 'character': " Lily, a resilient and compassionate teenager, struggles to navigate the overwhelming waves of grief following the sudden loss of her beloved mother. Her journey is marked by moments of introspection, healing, and ultimately finding solace in cherishing her mother's enduring legacy.",
 'story': " Lily's Legacy: A Journey Through Grief tells the heart-wrenching tale of a resilient teenager named Lily, who embarks on an emotional journey following her mother's sudden departure from this world. Despite the overwhelming waves of grief that threaten to consume her, Lily finds solace in cherishing the enduring legacy left by her beloved mother, through moments of introspection and healing. Along her path, she discovers untapped reserves of strength within herself, allowing her to navigate life's trials with compassion, love, and an unwavering spirit that honors her mother’s memor
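The result above contains every intermediate value (`summary`, `title`, `character`, `story`) because each chain adds its `output_key` to the dictionary it received and passes the whole dictionary on. A minimal sketch of that accumulation pattern follows. `make_step` and `fake_llm` are hypothetical names, and the stub does not generate real text.

```python
def make_step(template: str, output_key: str, model):
    """Return a function that fills the template, calls the model, and stores the result."""

    def step(state: dict) -> dict:
        prompt_text = template.format(**state)
        # Keep all earlier keys and add this step's output under its own key
        return {**state, output_key: model(prompt_text)}

    return step


# Hypothetical stub: returns a short tag instead of generating text
def fake_llm(prompt_text: str) -> str:
    return f"<generated from {len(prompt_text)} chars>"


steps = [
    make_step("Create a title for a story about {summary}.", "title", fake_llm),
    make_step("Describe the main character of {title} about {summary}.", "character", fake_llm),
    make_step("Write a story about {summary} titled {title} starring {character}.", "story", fake_llm),
]

state = {"summary": "a girl that lost her mother"}
for step in steps:
    state = step(state)

print(list(state.keys()))
```

Each later template can only reference keys produced by earlier steps, which is why the order `title | character | story` matters.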

# Memory

In [22]:
# Let's give the LLM our name
basic_chain.invoke({"input_prompt": "Hi! My name is Daniella. What is 1 + 1?"})

' The sum of 1 + 1 is 2. Hello, Daniella! Math can be simple yet fascinating once you get the hang of it. If you have any more questions or need assistance with mathematical concepts, feel free to ask.'

In [23]:
# Next, we ask the LLM to reproduce the name
basic_chain.invoke({"input_prompt": "What is my name?"})

" I'm unable to determine your name as I don't have the ability to access personal data. However, if you're looking for advice on how to remember important names or manage a list of names in an application, feel free to ask!"

## ConversationBuffer

In [24]:
# Create an updated prompt template to include a chat history
template = """<s><|user|>
Current conversation:{chat_history}

{input_prompt}<|end|>
<|assistant|>"""

prompt = PromptTemplate(
    template=template,
    input_variables=["input_prompt", "chat_history"]
)

In [25]:
from langchain.memory import ConversationBufferMemory

# Define the type of Memory we will use
memory = ConversationBufferMemory(memory_key="chat_history")

# Chain the LLM, Prompt, and Memory together
llm_chain = LLMChain(
    prompt=prompt,
    llm=llm,
    memory=memory
)

In [26]:
# Generate a conversation and ask a basic question
llm_chain.invoke({"input_prompt": "Hi! My name is Daniella. What is 1 + 1?"})

{'input_prompt': 'Hi! My name is Daniella. What is 1 + 1?',
 'chat_history': '',
 'text': ' Hello Daniella! The sum of 1 + 1 is 2.'}

In [27]:
# Does the LLM remember the name we gave it?
llm_chain.invoke({"input_prompt": "What is my name?"})

{'input_prompt': 'What is my name?',
 'chat_history': 'Human: Hi! My name is Daniella. What is 1 + 1?\nAI:  Hello Daniella! The sum of 1 + 1 is 2.',
 'text': ' Your name is the Human, Daniella.\n\nAs for your question about what your name is in relation to me, as an AI, I don\'t have a personal name but you referred to yourself as "Human" and introduced yourself as Daniella. So in this context, Daniella is your given name.'}
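The `chat_history` value above shows what the buffer does: it stores each exchange verbatim and renders it with `Human:` and `AI:` prefixes, which is likely why the model refers to the user as "the Human". The sketch below imitates that behaviour in plain Python. `ToyBufferMemory` is a hypothetical class, not LangChain's implementation.

```python
class ToyBufferMemory:
    """Hypothetical stand-in that stores every exchange verbatim."""

    def __init__(self):
        self.turns = []  # list of (human, ai) pairs

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def render(self) -> str:
        # Same "Human: ... / AI: ..." layout as the chat_history output above
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


buffer_sketch = ToyBufferMemory()
buffer_sketch.save(
    "Hi! My name is Daniella. What is 1 + 1?",
    "Hello Daniella! The sum of 1 + 1 is 2.",
)
history = buffer_sketch.render()
print(history)
```

Because nothing is ever dropped, the rendered history grows with every turn and will eventually exceed the model's context window (`n_ctx=2048` in this notebook).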

## ConversationBufferWindow

In [32]:
from langchain.memory import ConversationBufferWindowMemory

# Retain only the last 4 conversations in memory
memory = ConversationBufferWindowMemory(k=4, memory_key="chat_history")

# Chain the LLM, Prompt, and Memory together
llm_chain = LLMChain(
    prompt=prompt,
    llm=llm,
    memory=memory
)

In [33]:
# Ask two questions and generate two conversations in its memory
llm_chain.invoke({"input_prompt":"Hi! My name is Daniella and I am 33 years old. What is 4 + 4?"})
llm_chain.invoke({"input_prompt":"What is 3 + 3?"})

{'input_prompt': 'What is 3 + 3?',
 'chat_history': "Human: Hi! My name is Daniella and I am 33 years old. What is 4 + 4?\nAI:  Hello Daniella! It's nice to meet you. Four plus four equals eight.\n\n(Note: The response doesn't directly answer the personal information shared but rather addresses the simple arithmetic question asked.)",
 'text': ' Hello Daniella! Three plus three equals six.'}

In [34]:
# Check whether it knows the name we gave it
llm_chain.invoke({"input_prompt":"What is my name?"})

{'input_prompt': 'What is my name?',
 'chat_history': "Human: Hi! My name is Daniella and I am 33 years old. What is 4 + 4?\nAI:  Hello Daniella! It's nice to meet you. Four plus four equals eight.\n\n(Note: The response doesn't directly answer the personal information shared but rather addresses the simple arithmetic question asked.)\nHuman: What is 3 + 3?\nAI:  Hello Daniella! Three plus three equals six.",
 'text': ' Your name is Daniella.\n\nI hope this answers your questions! If you have any more, feel free to ask.'}

In [35]:
# Check whether it knows the age we gave it
llm_chain.invoke({"input_prompt":"What is my age?"})

{'input_prompt': 'What is my age?',
 'chat_history': "Human: Hi! My name is Daniella and I am 33 years old. What is 4 + 4?\nAI:  Hello Daniella! It's nice to meet you. Four plus four equals eight.\n\n(Note: The response doesn't directly answer the personal information shared but rather addresses the simple arithmetic question asked.)\nHuman: What is 3 + 3?\nAI:  Hello Daniella! Three plus three equals six.\nHuman: What is my name?\nAI:  Your name is Daniella.\n\nI hope this answers your questions! If you have any more, feel free to ask.",
 'text': ' You mentioned that you are 33 years old. So, your current age is 33.'}
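With `k=4` and only three earlier exchanges, nothing had been dropped yet, which is why the age is still available in the output above. The sketch below uses a smaller window to show the eviction itself. `ToyWindowMemory` is a hypothetical class built on `collections.deque`, and the AI replies are shortened placeholders.

```python
from collections import deque


class ToyWindowMemory:
    """Hypothetical stand-in that keeps only the last k exchanges."""

    def __init__(self, k: int):
        # A deque with maxlen silently discards the oldest item when full
        self.turns = deque(maxlen=k)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def render(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


window = ToyWindowMemory(k=2)
window.save("Hi! My name is Daniella and I am 33 years old. What is 4 + 4?", "Eight.")
window.save("What is 3 + 3?", "Six.")
window.save("What is my name?", "Your name is Daniella.")

rendered = window.render()
print(rendered)
```

After the third save, the first exchange (and with it the age) is gone. The name survives only because the last reply happened to repeat it.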

## ConversationSummary

In [36]:
# Create a summary prompt template
summary_prompt_template = """<s><|user|>Summarize the conversations and update with the new lines.

Current summary:
{summary}

new lines of conversation:
{new_lines}

New summary:<|end|>
<|assistant|>"""
summary_prompt = PromptTemplate(
    input_variables=["new_lines", "summary"],
    template=summary_prompt_template
)

In [37]:
from langchain.memory import ConversationSummaryMemory

# Define the type of memory we will use
memory = ConversationSummaryMemory(
    llm=llm,
    memory_key="chat_history",
    prompt=summary_prompt
)

# Chain the LLM, prompt, and memory together
llm_chain = LLMChain(
    prompt=prompt,
    llm=llm,
    memory=memory
)

In [38]:
# Generate a conversation and ask for the name
llm_chain.invoke({"input_prompt": "Hi! My name is Daniella. What is 1 + 1?"})
llm_chain.invoke({"input_prompt": "What is my name?"})

{'input_prompt': 'What is my name?',
 'chat_history': ' Daniella introduces herself and asks the AI for the sum of 1+1, which the AI correctly answers as 2, offering additional assistance if needed.',
 'text': ' Your name appears to be the one you introduced yourself as in this conversation; therefore, your name is Daniella.\n\n-----'}

In [39]:
# Check whether it has summarized everything thus far
llm_chain.invoke({"input_prompt": "What was the first question I asked?"})

{'input_prompt': 'What was the first question I asked?',
 'chat_history': " Daniella introduces herself, asks the AI for the sum of 1+1 (which the AI correctly answers as 2), and inquires about her name, to which the AI confirms that it's Daniella based on her introduction. The AI offers further assistance if required.",
 'text': ' The first question you asked was: "can you calculate this for me? what is 1+1?"'}

In [40]:
# Check what the summary is thus far
memory.load_memory_variables({})

{'chat_history': ' Daniella introduces herself, asks the AI to calculate 1+1 (which it correctly answers as 2), inquires about her name, and confirms that it\'s Daniella based on her introduction. She then questions the first question she asked, which was "Can you calculate this for me? What is 1+1?" The AI offers further assistance if required.'}
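The outputs above suggest that the summary is rewritten after each exchange, and that exact wording can be lost along the way (the quoted "first question" does not match what was actually asked). The loop below sketches that update cycle under the assumption that each exchange triggers one extra summarization call. `update_summary` and `fake_summarizer` are hypothetical, and the stub only labels each version instead of summarizing.

```python
def update_summary(summary: str, new_lines: str, summarizer) -> str:
    """Build the summarization prompt from the old summary and the new lines."""
    prompt_text = (
        "Summarize the conversations and update with the new lines.\n\n"
        f"Current summary:\n{summary}\n\n"
        f"new lines of conversation:\n{new_lines}\n\n"
        "New summary:"
    )
    return summarizer(prompt_text)


calls = []  # records every prompt sent to the stub summarizer


def fake_summarizer(prompt_text: str) -> str:
    # Hypothetical stub: returns a version label instead of a real summary
    calls.append(prompt_text)
    return f"summary v{len(calls)}"


summary = ""
exchanges = [
    ("Hi! My name is Daniella. What is 1 + 1?", "The sum of 1 + 1 is 2."),
    ("What is my name?", "Your name is Daniella."),
]
for human, ai in exchanges:
    summary = update_summary(summary, f"Human: {human}\nAI: {ai}", fake_summarizer)

print(summary)
print(len(calls))
```

The prompt stays short because only the running summary and the newest lines are sent, at the cost of one additional model call per turn.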

# Agents

In [None]:
import os
from langchain_openai import ChatOpenAI

# Load OpenAI's LLMs with LangChain
os.environ["OPENAI_API_KEY"] = "MY_KEY"
openai_llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [None]:
# Create the ReAct template
react_template = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""

prompt = PromptTemplate(
    template=react_template,
    input_variables=["tools", "tool_names", "input", "agent_scratchpad"]
)

In [None]:
from langchain.agents import load_tools, Tool
from langchain.tools import DuckDuckGoSearchResults

# You can create the tool to pass to an agent
search = DuckDuckGoSearchResults()
search_tool = Tool(
    name="duckduck",
    description="A web search engine. Use this as a search engine for general queries.",
    func=search.run,
)

# Prepare tools
tools = load_tools(["llm-math"], llm=openai_llm)
tools.append(search_tool)

In [None]:
from langchain.agents import AgentExecutor, create_react_agent

# Construct the ReAct agent
agent = create_react_agent(openai_llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

In [None]:
# What is the price of a MacBook Pro?
agent_executor.invoke(
    {
        "input": "What is the current price of a MacBook Pro in USD? How much would it cost in EUR if the exchange rate is 0.85 EUR for 1 USD?"
    }
)



> Entering new AgentExecutor chain...
I need to find the current price of a MacBook Pro in USD first before converting it to EUR.
Action: duckduck
Action Input: "current price of MacBook Pro in USD"
[snippet: View at Best Buy. The best MacBook Pro overall The MacBook Pro 14-inch with the latest M3-series chips offers outstanding, best-in-class performance while getting fantastic battery life and ..., title: The best MacBook Pro in 2024: our picks for the top Pro models, link: https://www.techradar.com/best/best-macbook-pro], [snippet: Starts at $1,299. Upgradable to 24 GB of memory and 2 TB of storage. 67W USB-C charger included. The M2-powered MacBook Pro is available now for a starting price of $1,299 on Apple's website ..., title: MacBook Pro 13-inch (M2, 2022) review | Tom's Guide, link: https://www.tomsguide.com/reviews/macbook-pro-13-inch-m2-2022], [snippet: The late-2023 MacBook Pro update also marks the demise of the 13-inch MacBook Pro, w

{'input': 'What is the current price of a MacBook Pro in USD? How much would it cost in EUR if the exchange rate is 0.85 EUR for 1 USD?',
 'output': 'The current price of a MacBook Pro in USD is $2,249.00. It would cost approximately 1911.65 EUR with an exchange rate of 0.85 EUR for 1 USD.'}
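The ReAct template asks the model to emit either `Action:` and `Action Input:` lines or a `Final Answer:` line, and the executor has to turn that text into a tool call or a result. The sketch below shows one way such parsing could look with a regular expression. `parse_react_output` is a hypothetical helper written for illustration and is not LangChain's actual parser. The last lines simply re-check the conversion reported in the output above.

```python
import re


def parse_react_output(text: str):
    """Return ("final", answer) or ("action", tool, tool_input) from model text."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return ("final", final.group(1).strip())

    action = re.search(r"Action:\s*(.*?)\s*\n\s*Action Input:\s*(.*)", text, re.DOTALL)
    if action:
        tool = action.group(1).strip()
        tool_input = action.group(2).strip().strip('"')
        return ("action", tool, tool_input)

    raise ValueError("Could not parse model output")


tool_step = parse_react_output(
    "I need to find the current price of a MacBook Pro in USD first.\n"
    "Action: duckduck\n"
    'Action Input: "current price of MacBook Pro in USD"'
)
final_step = parse_react_output(
    "Thought: I now know the final answer\nFinal Answer: about 1911.65 EUR"
)
print(tool_step)
print(final_step)

# Re-check the arithmetic in the agent's answer: 2249.00 USD at 0.85 EUR per USD
eur = round(2249.00 * 0.85, 2)
print(eur)
```

When neither pattern matches, the sketch raises an error. In the notebook, `handle_parsing_errors=True` tells the executor to recover from malformed output instead of stopping.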