# Unit 2.2 – Agent Execution Loop (Reason → Act → Reflect)

**Course:** AI Bootcamp / Agents  
**Week:** 5  
**Unit:** 2.2 – The Agent Execution Loop  
**Student:** Marcellous

## Objective
Demonstrate an agent execution loop that follows the **Reason → Act → Reflect** pattern using a local language model and persistent memory.

## How This Agent Works
- **Reason:** Checks the user input for memory-related phrases (e.g., “What did I say earlier?” or “remember”) to decide which branch handles the message
- **Act:** Either retrieves stored memory or generates a response using a local model
- **Reflect:** Saves conversation history to a local memory file (`ai_memory.txt`)
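The three steps above can be sketched without any model at all. The following is a minimal illustration, not the notebook's implementation: `run_turn` and `echo_model` are hypothetical names, memory is kept as a list of lines rather than a file, and the generator is a stand-in that returns fixed text.

```python
from typing import Callable, List, Tuple


def run_turn(
    user_message: str,
    memory: List[str],
    generate: Callable[[str], str],
) -> Tuple[str, List[str]]:
    """One Reason -> Act -> Reflect pass. Returns (reply, updated memory)."""
    text = user_message.lower()

    # REASON: decide which action fits the input.
    wants_recall = "what did i say earlier" in text

    # ACT: either read from memory or call the generator.
    if wants_recall:
        past = [m[len("User: "):] for m in memory if m.startswith("User: ")]
        if past:
            reply = f"You previously said: {past[-1]}"
        else:
            reply = "I don't have any memory yet."
    else:
        prompt = "\n".join(memory + [f"User: {user_message}", "AI:"])
        reply = generate(prompt)

    # REFLECT: record the exchange so later turns can use it.
    updated = memory + [f"User: {user_message}", f"AI: {reply}"]
    return reply, updated


def echo_model(prompt: str) -> str:
    # Stand-in for the real model (hypothetical, for illustration only).
    return "(generated text)"


memory: List[str] = []
reply1, memory = run_turn("My favorite color is teal.", memory, echo_model)
reply2, memory = run_turn("What did I say earlier?", memory, echo_model)
print(reply2)
```

Passing the generator in as an argument keeps the loop structure testable on its own, independent of which model is plugged in.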

## Technical Details
- Model: `distilgpt2` (no API key, no cost)
- Interface: Gradio
- Memory: File-based persistence
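- Inference runs locally; an internet connection is needed only to install the packages and download the model weights on first run (and for the Gradio share link when running in Colab)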

## Notes
- The model is not instruction-tuned; repetitive outputs are expected.
- The focus of this assignment is the **agent loop structure**, not model quality.
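If the repetition becomes distracting, generation settings can reduce it somewhat. The sketch below is an assumption-laden starting point rather than part of the assignment: `GENERATION_KWARGS` is a hypothetical name and the values are untuned. The keys themselves are standard `transformers` generation parameters that a text-generation pipeline forwards to `generate`.

```python
# Hypothetical settings; the values are illustrative starting points, not tuned.
GENERATION_KWARGS = {
    "max_new_tokens": 80,
    "do_sample": True,          # sampling must be on for temperature to apply
    "temperature": 0.6,
    "repetition_penalty": 1.3,  # values above 1.0 discourage already-used tokens
    "no_repeat_ngram_size": 3,  # forbid repeating any 3-token sequence
}

# With a text-generation pipeline named `agent`, the settings would be
# passed as keyword arguments:
#     agent(prompt, **GENERATION_KWARGS)
for name, value in GENERATION_KWARGS.items():
    print(f"{name} = {value}")
```

These settings do not make `distilgpt2` follow instructions; they only make verbatim loops less likely.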


In [1]:
!pip install -q gradio transformers torch

In [2]:
import gradio as gr
import os
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# -----------------------------
# 1. Setup model
# -----------------------------
model_name = "distilgpt2"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add pad token if missing
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.eos_token_id

agent = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer
)





GPT2LMHeadModel LOAD REPORT from: distilgpt2
Key                                        | Status     |  | 
-------------------------------------------+------------+--+-
transformer.h.{0, 1, 2, 3, 4, 5}.attn.bias | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.



In [3]:
# -----------------------------
# 2. Memory setup
# -----------------------------
MEMORY_FILE = "ai_memory.txt"

if os.path.exists(MEMORY_FILE):
    with open(MEMORY_FILE, "r", encoding="utf-8") as f:
        memory = f.read()
else:
    memory = ""

def save_memory():
    with open(MEMORY_FILE, "w", encoding="utf-8") as f:
        f.write(memory)
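The same load/save behavior can also be wrapped in a small class so that no global variable is needed. This is a sketch under assumptions: `MemoryStore` and its methods are hypothetical names, and the demo writes to a temporary directory rather than the notebook's working directory.

```python
import os
import tempfile


class MemoryStore:
    """Hypothetical wrapper around a plain-text memory file."""

    def __init__(self, path: str):
        self.path = path
        self.text = ""
        # Load earlier conversation text if the file already exists.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.text = f.read()

    def append_turn(self, user_message: str, reply: str) -> None:
        # Same line format the notebook uses for its memory file.
        self.text += f"\nUser: {user_message}\nAI: {reply}"
        with open(self.path, "w", encoding="utf-8") as f:
            f.write(self.text)

    def clear(self) -> None:
        self.text = ""
        open(self.path, "w", encoding="utf-8").close()


# Demo: memory written by one instance is visible to a fresh instance.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ai_memory.txt")
    store = MemoryStore(path)
    store.append_turn("hello", "hi there")
    reloaded = MemoryStore(path)
    survived = reloaded.text == store.text
    reloaded.clear()
    cleared = MemoryStore(path).text == ""

print(survived, cleared)
```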


In [6]:
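# -----------------------------
# 3. Agent loop
# -----------------------------
def chat_with_ai(user_message, chat_history):
    global memory
    text = user_message.lower()

    # -------- REASON --------
    if "what did i say earlier" in text:
        # Recall the most recent user line, skipping AI lines.
        user_lines = [
            line[len("User: "):]
            for line in memory.splitlines()
            if line.startswith("User: ")
        ]
        if user_lines:
            reply = f"You previously said: {user_lines[-1]}"
        else:
            reply = "I don't have any memory yet."

    elif "remember" in text:
        # The message itself is stored in the REFLECT step below.
        reply = "Okay, I will remember that."

    else:
        # -------- ACT --------
        # Rough guard: send only the most recent lines so the prompt
        # is less likely to exceed distilgpt2's 1024-token context.
        recent = "\n".join(memory.splitlines()[-6:])
        context = f"{recent}\nUser: {user_message}\nAI:"
        # temperature only takes effect when sampling is enabled.
        result = agent(
            context,
            max_new_tokens=80,
            do_sample=True,
            temperature=0.6,
            return_full_text=False,
        )[0]["generated_text"]
        # Keep the AI turn; drop any "User:" turn the model invents.
        reply = result.split("User:")[0].strip()

    # -------- REFLECT --------
    memory += f"\nUser: {user_message}\nAI: {reply}"
    save_memory()

    chat_history.append((user_message, reply))
    return chat_history, ""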
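Because the Reflect step writes alternating `User:` and `AI:` lines, the memory text can be parsed back into (user, reply) pairs, for example to repopulate the chat window after a restart. The helper below is a hypothetical sketch, not part of the notebook. It assumes each user message occupies a single line; extra lines between a `User:` line and the following `AI:` line are ignored.

```python
from typing import List, Optional, Tuple


def parse_turns(memory_text: str) -> List[Tuple[str, str]]:
    """Split memory text into (user_message, ai_reply) pairs."""
    turns: List[Tuple[str, str]] = []
    user: Optional[str] = None
    ai_lines: Optional[List[str]] = None

    for line in memory_text.splitlines():
        if line.startswith("User: "):
            # A new user line closes the previous turn, if any.
            if user is not None:
                turns.append((user, "\n".join(ai_lines or []).strip()))
            user = line[len("User: "):]
            ai_lines = None
        elif line.startswith("AI:") and user is not None and ai_lines is None:
            ai_lines = [line[len("AI:"):].strip()]
        elif ai_lines is not None:
            # Continuation of a multi-line AI reply.
            ai_lines.append(line)

    if user is not None:
        turns.append((user, "\n".join(ai_lines or []).strip()))
    return turns


sample = (
    "\nUser: hello\nAI: hi there"
    "\nUser: remember my name is Sam\nAI: Okay, I will remember that."
)
history = parse_turns(sample)
print(history)
```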


In [7]:
# -----------------------------
# 4. Gradio interface
# -----------------------------
with gr.Blocks() as ui:
    gr.Markdown("## Agent Execution Loop Demo (Reason → Act → Reflect)")
    chatbot = gr.Chatbot()
    msg = gr.Textbox(label="Type your message")
    clear = gr.Button("Clear Memory")

    # Wire the textbox to the agent loop function defined above.
    msg.submit(chat_with_ai, [msg, chatbot], [chatbot, msg])

    def clear_memory():
        global memory
        memory = ""
        open(MEMORY_FILE, "w").close()
        return []

    clear.click(clear_memory, None, chatbot)

ui.launch()




It looks like you are running Gradio on a hosted Jupyter notebook, which requires `share=True`. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://882533363c335f5454.gradio.live



