# Reflection Agent for Tweet Generation with Llama 3.2

## Model Loading & Setup

The notebook starts by loading a Llama 3.2 (1B-instruct) model:

1. **Model and Tokenizer Loading**:
    - Using HuggingFace's transformers library
    - Loading from a local directory: `/kaggle/input/llama-3.2/transformers/1b-instruct/1`
    - Handling device selection (CUDA/CPU)

2. **Tokenizer Configuration**:
    - Setting padding tokens
    - Ensuring proper token handling

3. **Pipeline Setup**:
    - Creating a text-generation pipeline with appropriate parameters:
      - Temperature: 0.3 (low to maintain coherence)
      - Max tokens: 128
      - Top-p sampling: 0.9
    - Wrapping the pipeline in LangChain's HuggingFacePipeline

## The Reflection Agent

The reflection agent uses a two-step process to improve content quality:

1. **Generation Stage**:
    - Creates a tweet based on user prompts
    - Uses a system prompt identifying as a "twitter techie influencer assistant"
    - Handles both initial creation and revisions

2. **Reflection Stage**:
    - Critiques the generated tweet
    - Identifies potential improvements
    - Provides detailed recommendations on style, length, virality, etc.

3. **Custom Message Formatting**:
    - `format_messages_for_llama()` converts LangChain message objects to Llama 3's expected chat format
    - Properly handles system, user, and assistant messages
    - Uses the tokenizer's built-in chat template

## LangGraph Implementation

The reflection process is implemented as a graph with:

1. **Nodes**:
    - `GENERATE`: Creates or revises tweets
    - `REFLECT`: Critiques the current tweet

2. **Edges and Flow Control**:
    - Conditional edge from GENERATE: determines whether to continue or end
    - Direct edge from REFLECT back to GENERATE for revision
    - Termination criteria:
      - Perfect tweet detection ("no changes needed")
      - Maximum iteration limit (8 messages)

3. **State Management**:
    - State is maintained as a list of messages
    - Each node adds new messages to the conversation history

*The reflection loop continues until either the tweet is deemed perfect or the maximum number of iterations is reached, creating a self-improving system that refines content through multiple revisions.*

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/llama-3.2/transformers/1b-instruct/1/config.json
/kaggle/input/llama-3.2/transformers/1b-instruct/1/README.md
/kaggle/input/llama-3.2/transformers/1b-instruct/1/USE_POLICY.md
/kaggle/input/llama-3.2/transformers/1b-instruct/1/tokenizer.json
/kaggle/input/llama-3.2/transformers/1b-instruct/1/tokenizer_config.json
/kaggle/input/llama-3.2/transformers/1b-instruct/1/LICENSE.txt
/kaggle/input/llama-3.2/transformers/1b-instruct/1/model.safetensors
/kaggle/input/llama-3.2/transformers/1b-instruct/1/special_tokens_map.json
/kaggle/input/llama-3.2/transformers/1b-instruct/1/.gitattributes
/kaggle/input/llama-3.2/transformers/1b-instruct/1/generation_config.json


In [2]:
! pip install langchain



In [3]:
! pwd

/kaggle/working


In [4]:
! ls

In [5]:
! ls - /kaggle/input/llama-3.2/transformers/1b-instruct/1

ls: cannot access '-': No such file or directory
/kaggle/input/llama-3.2/transformers/1b-instruct/1:
config.json		model.safetensors	 tokenizer_config.json
generation_config.json	README.md		 tokenizer.json
LICENSE.txt		special_tokens_map.json  USE_POLICY.md


In [6]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

Using device: cuda


In [7]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [8]:
# model loading
#model directory
model_directory = "/kaggle/input/llama-3.2/transformers/1b-instruct/1"

# Load the tokenizer
try:
    tokenizer = AutoTokenizer.from_pretrained(model_directory)
    print("Tokenizer loaded successfully!")
except Exception as e:
    print(f"Error loading tokenizer: {e}")
    print(f"Please ensure the tokenizer files are in: {model_directory}")

# Load the model
try:
    model = AutoModelForCausalLM.from_pretrained(model_directory).to(device)
    print("Model loaded successfully!")
except Exception as e:
    print(f"Error loading model: {e}")
    print(f"Please ensure the model files are in: {model_directory}")



Tokenizer loaded successfully!
Model loaded successfully!


In [None]:
if tokenizer and model:
    prompt = "Write a detail and concise history of British Empire"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    try:
        outputs = model.generate(**inputs, max_new_tokens=512, num_return_sequences=1)
        generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print(f"\nGenerated text: {generated_text}")
    except Exception as e:
        print(f"Error during text generation: {e}")

# AI Agents

In [None]:
! pip install langchain_community langchain_huggingface

In [13]:

import datetime
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain_community.llms import HuggingFacePipeline
from langchain_huggingface import HuggingFacePipeline
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain.agents import initialize_agent, AgentType
from langchain.tools import tool # Import the tool decorator
from dotenv import load_dotenv # For loading API keys

In [None]:
os.environ["TAVILY_API_KEY"] = "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"

In [None]:
tavily_api_key = os.getenv("TAVILY_API_KEY")
tavily_api_key

In [None]:
load_dotenv(verbose=True)

In [None]:
# --- Universal Check for TAVILY_API_KEY ---
tavily_api_key = os.getenv("TAVILY_API_KEY")

if tavily_api_key:
    print("TAVILY_API_KEY is loaded!")
    print(f"Key starts with: {tavily_api_key[:6]}*****") 
else:
    print("Recheck again")


In [None]:
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokens(StoppingCriteria):
    def __init__(self, stop_sequences, tokenizer):
        self.stop_sequences = stop_sequences
        self.tokenizer = tokenizer
        
    def __call__(self, input_ids, scores, **kwargs):
        # Decode the last generated tokens
        last_token_slice = input_ids[0][-10:]  # Check last 10 tokens
        decoded = self.tokenizer.decode(last_token_slice, skip_special_tokens=True)
        
        # Check if any stop sequence appears in the decoded text
        for stop_seq in self.stop_sequences:
            if stop_seq in decoded:
                return True
        return False

# Update your pipeline setup:
if tokenizer and model:
    # Set pad token if not exists
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    
    stop_sequences = ["\nObservation:", "\nFinal Answer:", "Observation:", "Final Answer:"]
    stopping_criteria = StoppingCriteriaList([StopOnTokens(stop_sequences, tokenizer)])
    
    text_generation_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=512,  # Reduced for better control
        temperature=0.1,
        do_sample=True,
        repetition_penalty=1.1,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
        stopping_criteria=stopping_criteria
    )
    llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
    print("\nLangChain LLM (Llama via HuggingFacePipeline) initialized successfully!")

In [None]:
search_tool = TavilySearchResults(search_depth="basic")

tools = [search_tool]
agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
    verbose=True,
    handle_parsing_errors=True,
    max_iterations=4,  # Limit iterations to prevent infinite loops
    early_stopping_method="generate"  # Stop early if no progress
)
agent

In [None]:
@tool
def get_system_time(format: str = "%Y-%m-%d %H:%M:%S"):
    """ Returns the current date and time in the specified format """
    current_time = datetime.datetime.now()
    formatted_time = current_time.strftime(format)
    return formatted_time


In [None]:
result=agent.invoke("Write a Detailed and concise history of British Empire")
print(result)


In [None]:
from langchain.agents.mrkl.output_parser import MRKLOutputParser
from langchain.schema import AgentAction, AgentFinish
import re

class LlamaOutputParser(MRKLOutputParser):
    def parse(self, text: str):
        # Clean up the text
        text = text.strip()
        
        # If it contains "Final Answer:", extract it
        if "Final Answer:" in text:
            try:
                final_answer = text.split("Final Answer:")[-1].strip()
                return AgentFinish(return_values={"output": final_answer}, log=text)
            except:
                pass
        
        # Look for properly formatted actions
        action_match = re.search(r"Action:\s*([^\n]+)", text)
        action_input_match = re.search(r"Action Input:\s*([^\n]+)", text)
        
        if action_match and action_input_match:
            action = action_match.group(1).strip()
            action_input = action_input_match.group(1).strip()
            
            # Clean up common formatting issues
            if "tavily" in action.lower():
                action = "tavily_search_results_json"
            
            # Remove quotes from action input
            action_input = action_input.strip('"').strip("'")
            if action_input.startswith("The search query "):
                action_input = action_input.replace("The search query ", "").strip('"').strip("'")
            
            return AgentAction(tool=action, tool_input=action_input, log=text)
        
        # If nothing works, return as final answer
        return AgentFinish(return_values={"output": text}, log=text)

# Create agent with custom parser
from langchain.agents import create_react_agent
from langchain import hub

try:
    react_prompt = hub.pull("hwchase17/react")
    custom_agent = create_react_agent(
        llm=llm,
        tools=[search_tool],
        prompt=react_prompt,
        output_parser=LlamaOutputParser()
    )
    
    from langchain.agents import AgentExecutor
    agent_executor = AgentExecutor(
        agent=custom_agent,
        tools=[search_tool],
        verbose=True,
        max_iterations=2,
        handle_parsing_errors=True
    )
    
    print("Testing custom parser agent:")
    result = agent_executor.invoke({"input": "Write a detail and concise history of British Empire"})
    print(result)
    
except Exception as e:
    print(f"Custom agent failed: {e}")

# Reflection Agent

In [9]:
! pip install langgraph



In [10]:
from typing import List, Sequence
from dotenv import load_dotenv
from langchain_core.messages import BaseMessage, HumanMessage
from langgraph.graph import END, MessageGraph
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage # To structure messages

# Explanation of LangChain Message Types

LangChain uses different message types to structure conversations with language models.
These message types represent different roles in a conversation:

# 1. SystemMessage:
    - Provides overall instructions or context to the model
    - Not visible to the end user in most UIs
    - Sets the tone, constraints, and behavior of the assistant
    - Example: "You are a helpful AI assistant that writes professional emails."

# 2. HumanMessage:
    - Represents input from the user/human
    - Contains the queries, instructions, or responses from the user
    - Typically shown as user messages in chat interfaces
    - Example: "Write a tweet about AI in healthcare."

# 3. AIMessage:
   - Contains responses generated by the AI/assistant
   - Represents the output from the language model
   - Shown as assistant responses in chat interfaces
   - Example: "AI is revolutionizing healthcare by improving diagnosis accuracy..."

#### These message types help structure conversations in a chat format
#### and make it clear which parts come from the system, user, or AI.
#### They're particularly useful for chat models that expect a specific format
#### of alternating human and assistant messages, with optional system messages.

In [11]:
if tokenizer.pad_token is None: # tokwni\we from model loading
     tokenizer.add_special_tokens({'pad_token': '[PAD]'})
     model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id

The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`


In [14]:
text_generation_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
    temperature=0.3,
    do_sample=True,
    repetition_penalty=1.1,
    top_p=0.9,
    pad_token_id=tokenizer.pad_token_id
)
llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
print("\nLangChain LLM (Llama via HuggingFacePipeline) initialized successfully!")

Device set to use cuda:0



LangChain LLM (Llama via HuggingFacePipeline) initialized successfully!


In [15]:
llm

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x789bcafccca0>, model_id='/kaggle/input/llama-3.2/transformers/1b-instruct/1')

In [None]:
# The apply_chat_template method is a key functionality in HuggingFace's transformers library
# that helps format messages into the specific chat format expected by LLMs like Llama 3.

def explain_apply_chat_template():
    """
    Explanation of the apply_chat_template method used in format_messages_for_llama function:
    
    1. Purpose:
       - Converts a list of message dictionaries into a properly formatted chat prompt string
       - Each LLM has its own expected chat format (e.g., Llama 3, ChatGPT, Mistral, etc.)
       - The tokenizer contains a built-in template specific to the model
    
    2. Parameters:
       - formatted_messages: List of dictionaries with 'role' and 'content' keys
       - tokenize=False: Returns the formatted string rather than token IDs
       - add_generation_prompt=True: Adds the special tokens for the assistant to respond
    
    3. How it works:
       - The tokenizer has a chat_template (JSON or Jinja2 format) that defines:
         * How to format system/user/assistant messages
         * What special tokens to add between messages
         * How to indicate where the model should start generating
       - For Llama 3, it typically adds tokens like:
         <|start_header_id|>assistant<|end_header_id|>\n\n
         at the end to signal the model to generate a response
    
    4. Benefits:
       - Handles model-specific formatting automatically
       - Ensures the chat history is properly structured
       - Reduces errors in prompt engineering
    """
    return "apply_chat_template handles the model-specific chat formatting automatically"

# The function creates a string that might look like:
# <|start_header_id|>system<|end_header_id|>
# You are a helpful assistant
# <|start_header_id|>user<|end_header_id|>
# Write a tweet about AI
# <|start_header_id|>assistant<|end_header_id|>
# [this is where the model will generate text]

# apply_chat_template
Chats should be structured as a list of dictionaries with role and content keys. The role key specifies the speaker (usually between you and the system), and the content key contains your message. For the system, the content is a high-level description of how the model should behave and respond when you’re chatting with it.

### *CODE*

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceH4/zephyr-7b-beta", device_map="auto", torch_dtype=torch.bfloat16)

messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate",},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
 ]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.decode(tokenized_chat[0]))
```

#### copied from Hugging face website

In [18]:

# --- Adaptation for Llama Model (HuggingFacePipeline) ---
# We need to flatten the chat messages into a single prompt string.
# Llama 3 models typically use a specific chat template.

# This function will format messages for Llama 3's chat template
def format_messages_for_llama(messages_list, tokenizer_obj):
    # Convert LangChain message objects to dictionary format expected by apply_chat_template
    formatted_messages = []
    for msg in messages_list:
        if isinstance(msg, SystemMessage):
            formatted_messages.append({"role": "system", "content": msg.content})
        elif isinstance(msg, HumanMessage):
            formatted_messages.append({"role": "user", "content": msg.content})
        elif isinstance(msg, AIMessage):
            formatted_messages.append({"role": "assistant", "content": msg.content})
        else:
            # Handle other message types if necessary, or raise an error
            raise ValueError(f"Unsupported message type: {type(msg)}")

    # Apply the tokenizer's chat template to get the final string
    # Llama 3 tokenizer has a chat template that we can use
    # add_generation_prompt=True tells the tokenizer to add the
        ##<|start_header_id|>assistant<|end_header_id|>\n\n for the LLM to start generating.
    return tokenizer_obj.apply_chat_template(formatted_messages, tokenize=False, add_generation_prompt=True)


# --- Generation Prompt (Adapted for Llama) ---
# Instead of ChatPromptTemplate with MessagesPlaceholder, we use a custom function
# that will format the messages into a string expected by the Llama model.
# The `PromptTemplate` here is just a placeholder, the actual formatting happens
# via the `format_messages_for_llama` function.

def get_generation_chain(llm_model, tokenizer_obj):
    # This chain will take a list of messages, format them, and pass to LLM
    def _generation_chain_run(messages):
        # Add the system message at the beginning of the messages list
        # This acts as the initial system prompt for the generation
        full_messages = [
            SystemMessage(
                content="You are a twitter techie influencer assistant tasked with writing excellent twitter posts."
                        " Generate the best twitter post possible for the user's request."
                        " If the user provides critique, respond with a revised version of your previous attempts."
            )
        ] + messages # User's current messages (including user input and past assistant replies)

        prompt_text = format_messages_for_llama(full_messages, tokenizer_obj)
        # LangChain's LLM callable expects a string.
        # We need to ensure the output is also just the generated text.
        return llm_model.invoke(prompt_text) # Returns a Generation or LLMResult object

    return _generation_chain_run

# --- Reflection Prompt (Adapted for Llama) ---
def get_reflection_chain(llm_model, tokenizer_obj):
    # This chain will take a list of messages, format them, and pass to LLM
    def _reflection_chain_run(messages):
        # Add the system message at the beginning of the messages list
        full_messages = [
            SystemMessage(
                content="You are a viral twitter influencer grading a tweet. Generate critique and recommendations for the user's tweet."
                        " Always provide detailed recommendations, including requests for length, virality, style, etc."
            )
        ] + messages # Messages include the user's original request and the generated tweet

        prompt_text = format_messages_for_llama(full_messages, tokenizer_obj)
        return llm_model.invoke(prompt_text)

    return _reflection_chain_run


### here, we used simple code

In [19]:
if llm and tokenizer: # Ensure both are loaded before proceeding
    generation_chain_llama = get_generation_chain(llm, tokenizer)
    reflection_chain_llama = get_reflection_chain(llm, tokenizer)

    print("\nLlama-compatible Generation and Reflection Chains initialized successfully!")

    # --- Example Usage ---

    # Initial generation
    initial_user_input = "Write a tweet about the future of AI in coding, under 280 characters with relevant hashtags."
    messages_history = [HumanMessage(content=initial_user_input)]

    print(f"\n--- Initial Generation for: '{initial_user_input}' ---")
    generated_tweet_result = generation_chain_llama(messages_history)
    generated_tweet = generated_tweet_result.content if hasattr(generated_tweet_result, 'content') else str(generated_tweet_result)
    print(f"Generated Tweet:\n{generated_tweet}")

    # Add the AI's generated tweet to the messages history for reflection
    messages_history.append(AIMessage(content=generated_tweet))


    # Reflection
    print(f"\n--- Reflection on the Generated Tweet ---")
    reflection_messages = messages_history # Reflection prompt will receive the full conversation
    critique_result = reflection_chain_llama(reflection_messages)
    critique = critique_result.content if hasattr(critique_result, 'content') else str(critique_result)
    print(f"Critique:\n{critique}")


    # Simulate user providing critique based on the AI's critique
    # In a real loop, you'd feed the AI's critique back into the `messages_history`
    # and then the user would add their new request or a "Please revise based on this."
    user_critique_input = "The tweet is a bit generic. Make it more exciting and mention a specific new AI tool for coders. Add an emoji."
    messages_history.append(HumanMessage(content=user_critique_input))

    # Revised Generation
    print(f"\n--- Revised Generation after Critique ---")
    revised_tweet_result = generation_chain_llama(messages_history)
    revised_tweet = revised_tweet_result.content if hasattr(revised_tweet_result, 'content') else str(revised_tweet_result)
    print(f"Revised Tweet:\n{revised_tweet}")


else:
    print("Cannot run example as LLM or tokenizer failed to load.")


Llama-compatible Generation and Reflection Chains initialized successfully!

--- Initial Generation for: 'Write a tweet about the future of AI in coding, under 280 characters with relevant hashtags.' ---
Generated Tweet:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a twitter techie influencer assistant tasked with writing excellent twitter posts. Generate the best twitter post possible for the user's request. If the user provides critique, respond with a revised version of your previous attempts.<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a tweet about the future of AI in coding, under 280 characters with relevant hashtags.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Here is a tweet about the future of AI in coding:

"Get ready to code like a robot! AI-powered tools will revolutionize the way we write, test & debug code. From automated testing to predictive analytics, the future of coding is here #AIinCoding #FutureOfWork #TechForGood"

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Critique:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a viral twitter influencer grading a tweet. Generate critique and recommendations for the user's tweet. Always provide detailed recommendations, including requests for length, virality, style, etc.<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a tweet about the future of AI in coding, under 280 characters with relevant hashtags.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a twitter techie influencer assistant tasked with writing excellent twitter posts. Generate the best twitter post possible for the user's request. If the user provides critique, respond with a revised version of your previous attempts.<|eot_id|><|start_header_id|>user<|end_header_id|>

Write a tweet about the future of AI in coding, under 280 characters with relevant hashtags.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Here is a tweet ab

# LangGraph Reflection Agent

In [16]:
! pip install grandalf



In [None]:
# This function formats messages for Llama 3's chat template
def format_messages_for_llama(messages_list: list[BaseMessage], tokenizer_obj: AutoTokenizer) -> str:
    formatted_messages = []
    for msg in messages_list:
        if isinstance(msg, SystemMessage):
            formatted_messages.append({"role": "system", "content": msg.content})
        elif isinstance(msg, HumanMessage):
            formatted_messages.append({"role": "user", "content": msg.content})
        elif isinstance(msg, AIMessage):
            formatted_messages.append({"role": "assistant", "content": msg.content})
        else:
            raise ValueError(f"Unsupported message type: {type(msg)}")
    return tokenizer_obj.apply_chat_template(formatted_messages, tokenize=False, add_generation_prompt=True)


# --- Define the Generation and Reflection Chains (these are the custom Python functions for Llama) ---
# Ensure llm and tokenizer are available before defining these functions
if llm and tokenizer:
    def generation_chain_llama(messages: list[BaseMessage]):
        full_messages_for_prompt = [
            SystemMessage(
                content="You are a twitter techie influencer assistant tasked with writing excellent twitter posts."
                        " Generate the best twitter post possible for the user's request."
                        " If the user provides critique, respond with a revised version of your previous attempts."
            )
        ] + messages
        prompt_text = format_messages_for_llama(full_messages_for_prompt, tokenizer)
        return llm.invoke(prompt_text)

    def reflection_chain_llama(messages: list[BaseMessage]):
        full_messages_for_prompt = [
            SystemMessage(
                content="You are a viral twitter influencer grading a tweet. Generate critique and recommendations for the user's tweet."
                        " Always provide detailed recommendations, including requests for length, virality, style, etc."
            )
        ] + messages
        prompt_text = format_messages_for_llama(full_messages_for_prompt, tokenizer)
        return llm.invoke(prompt_text)

    print("\nLlama-compatible chains ready for LangGraph.")

    # --- LangGraph Setup ---
    REFLECT = "reflect"
    GENERATE = "generate"

    graph = MessageGraph()

    def generate_node(state: list[BaseMessage]) -> list[BaseMessage]:
        print("\n--- GENERATE NODE: Producing tweet ---")
        # response will be a string
        response_str = generation_chain_llama(state)
        # Wrap the string response in an AIMessage object
        return state + [AIMessage(content=response_str)]

    def reflect_node(state: list[BaseMessage]) -> list[BaseMessage]:
        print("\n--- REFLECT NODE: Critiquing tweet ---")
        # critique_response will be a string
        critique_response_str = reflection_chain_llama(state)
        # Wrap the string response in a HumanMessage object
        return state + [HumanMessage(content=critique_response_str)]

    graph.add_node(GENERATE, generate_node)
    graph.add_node(REFLECT, reflect_node)

    graph.set_entry_point(GENERATE)

    def should_continue(state: list[BaseMessage]) -> str:
        if not state:
            return END

        last_message = state[-1]
        # Check if the last message (the critique) indicates completion
        if "perfect. no changes needed" in last_message.content.lower():
            print("\n--- Should Continue: NO (Critique indicates perfection) ---")
            return END
        elif len(state) > 8: # Limit to prevent infinite loops (e.g., 3 generate-reflect cycles + initial message)
            print("\n--- Should Continue: NO (Max iterations reached) ---")
            return END
        else:
            print("\n--- Should Continue: YES (Needs more reflection) ---")
            return GENERATE # After reflection, if we continue, we go back to GENERATE

    # Edge from GENERATE: After generating, check if we should continue reflecting or end.
    graph.add_conditional_edges(GENERATE, should_continue)

    # Edge from REFLECT: After reflecting, always go back to GENERATE for revision.
    graph.add_edge(REFLECT, GENERATE)

    app = graph.compile()

    # --- Print Graph for Visualization ---
    try:
        print("\n--- LangGraph Mermaid Diagram ---")
        print(app.get_graph().draw_mermaid())
    except Exception as e:
        print(f"Could not draw Mermaid diagram (might be a plotting dependency issue): {e}")

    print("\n--- LangGraph ASCII Representation ---")
    app.get_graph().print_ascii()

    # --- Invoke the Reflection Agent ---
    initial_user_query = HumanMessage(content="Write a tweet about how AI is changing the human lives!")
    print(f"\n--- Invoking Reflection Agent with: '{initial_user_query.content}' ---")
    response_messages = app.invoke([initial_user_query])

    print("\n--- Final LangGraph Response ---")
    for msg in response_messages:
        print(f"Type: {msg.type.capitalize()}\nContent: {msg.content}\n---")

else:
    print("Cannot proceed with LangGraph setup as LLM or Tokenizer failed to load.")


Llama-compatible chains ready for LangGraph.

--- LangGraph Mermaid Diagram ---
---
config:
  flowchart:
    curve: linear
---
graph TD;
	__start__(<p>__start__</p>)
	generate(generate)
	reflect(reflect)
	__end__(<p>__end__</p>)
	__start__ --> generate;
	generate --> __end__;
	classDef default fill:#f2f0ff,line-height:1.2
	classDef first fill-opacity:0
	classDef last fill:#bfb6fc


--- LangGraph ASCII Representation ---
+-----------+  
| __start__ |  
+-----------+  
      *        
      *        
      *        
+----------+   
| generate |   
+----------+   
      *        
      *        
      *        
 +---------+   
 | __end__ |   
 +---------+   

--- Invoking Reflection Agent with: 'Write a tweet about how AI is changing the human lives!' ---

--- GENERATE NODE: Producing tweet ---

--- Should Continue: YES (Needs more reflection) ---

--- GENERATE NODE: Producing tweet ---

--- Should Continue: YES (Needs more reflection) ---

--- GENERATE NODE: Producing tweet ---

--- Sho