##### Note: Run this notebook on Google Colab Pro with GPU support. Colab Pro offers faster GPUs, longer runtimes, and higher memory limits, which makes it well suited to training and inference with large models like LLaMA. A GPU such as the NVIDIA T4 or L4 significantly speeds up model execution and reduces latency when generating responses. Before running the notebook, select a GPU runtime in the Colab settings.
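A quick way to confirm that a GPU runtime is active is to ask PyTorch which device it can see. The sketch below is only an illustration: `describe_accelerator` is a hypothetical helper name, and it assumes PyTorch may not be installed yet when the cell runs.

```python
def describe_accelerator() -> str:
    """Return a short description of the compute device PyTorch can see."""
    try:
        import torch  # Imported lazily so the check also works before installation
    except ImportError:
        return "PyTorch is not installed yet; run the install cell first."

    if torch.cuda.is_available():
        # Device 0 is the first GPU visible to this runtime
        return f"GPU runtime detected: {torch.cuda.get_device_name(0)}"
    return "No GPU detected; the model will load on the CPU."


print(describe_accelerator())
```

If the message reports no GPU, change the runtime type in the Colab settings and rerun the cell.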

#### Refer to the legacy LangChain documentation for the agent APIs used in this code; the updated documentation now recommends LangGraph for implementing AI agents within the LangChain framework.

#### Installing the necessary libraries for agent implementation.

In [2]:
!pip install -q transformers torch langchain huggingface_hub langchain_core langchain_community
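Because the notebook relies on legacy LangChain agent APIs, it can help to record which versions were actually installed. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, and the package list simply mirrors the install command (pip treats hyphens and underscores in package names as equivalent).

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed by the notebook's pip command
PACKAGES = [
    "transformers",
    "torch",
    "langchain",
    "huggingface-hub",
    "langchain-core",
    "langchain-community",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```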


#### Enter your Hugging Face token to download the model and tokenizer from Hugging Face

In [6]:

!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: write).
Your token has been saved to /root/.cache/huggingface/token
Login successful
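As the output above shows, `huggingface-cli login` saves the token to a cache file. It does not set an environment variable in the notebook kernel, whereas the model setup code further down reads the token from `HUGGING_FACE_HUB_TOKEN`. One way to bridge the two, sketched here under that assumption, is to prompt for the token once and store it in the environment. `ensure_token_in_env` and its `prompt_fn` parameter are hypothetical names introduced for illustration.

```python
import os
from getpass import getpass

TOKEN_ENV_VAR = "HUGGING_FACE_HUB_TOKEN"  # Variable name read by the model setup code


def ensure_token_in_env(prompt_fn=getpass) -> str:
    """Return the token from the environment, prompting for it once if absent."""
    token = os.environ.get(TOKEN_ENV_VAR)
    if not token:
        # getpass hides the input, like the CLI login prompt does
        token = prompt_fn("Enter your Hugging Face token: ").strip()
        if not token:
            raise ValueError("No Hugging Face token provided")
        os.environ[TOKEN_ENV_VAR] = token
    return token
```

Calling `ensure_token_in_env()` in a cell before the model is loaded makes the variable available for the rest of the session without printing the token.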


#### Access to Meta’s LLaMA models on Hugging Face requires explicit approval, as Meta restricts access to all versions of LLaMA. You must first request access on Hugging Face, after which the model can be imported and used.

In [6]:
import os
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms.base import LLM
from langchain.agents import Tool, AgentExecutor
from langchain.prompts import PromptTemplate
from langchain.agents import BaseSingleActionAgent
from langchain.schema import AgentAction, AgentFinish
from typing import Optional, List, Any, Union
import re
import logging

# Set up basic logging configuration to track execution and debug issues
logging.basicConfig(level=logging.INFO)

class CustomAgent(BaseSingleActionAgent):
    """
    Custom agent implementation that decides whether to perform actions or provide information.
    Inherits from LangChain's BaseSingleActionAgent for integration with the LangChain framework.
    """
    llm: Any  # The language model instance
    tools: List[Tool]  # List of available tools for the agent to use

    @property
    def input_keys(self):
        """Define the expected input keys for the agent."""
        return ["input"]

    def plan(self, intermediate_steps: List[tuple], **kwargs) -> Union[AgentAction, AgentFinish]:
        """
        Determine the next action based on the input query.

        Args:
            intermediate_steps: List of (action, observation) pairs from earlier tool calls
            kwargs: Must contain 'input' key with the user's query

        Returns:
            Either an AgentAction to perform a task or AgentFinish with the final answer
        """
        query = kwargs["input"]

        # Once a tool has run, finish with its observation; otherwise the agent would
        # re-query the LLM and could repeat the action until max_iterations is reached
        if intermediate_steps:
            observation = str(intermediate_steps[-1][1])
            return AgentFinish(return_values={"output": observation}, log=observation)

        # Get response from LLM based on the input query
        response = self.llm._call(query)

        # Parse response to decide if an action is needed
        if "Category: Actionable Task" in response:
            # If response indicates an actionable task, return AgentAction
            return AgentAction(tool="PerformAction", tool_input=query, log=response)
        else:
            # If response indicates an information request, return AgentFinish
            return AgentFinish(
                return_values={"output": response},
                log=response,
            )

    async def aplan(self, intermediate_steps: List[tuple], **kwargs) -> Union[AgentAction, AgentFinish]:
        """Async planning method - not implemented in this version."""
        raise NotImplementedError("Async planning is not implemented for this agent")

class CustomHuggingFaceLLM(LLM):
    """
    Custom LLM implementation that wraps a Hugging Face model pipeline.
    Formats queries and responses in a structured way for consistent interaction.
    """
    pipeline: Any  # The Hugging Face pipeline instance

    @property
    def _llm_type(self) -> str:
        """Identifier for the custom LLM type."""
        return "custom_huggingface"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        """
        Process the input prompt through the language model.

        Args:
            prompt: User's input query
            stop: Optional stop sequences (unused in this implementation)

        Returns:
            Structured response string containing the answer, category, and action
        """
        # Format the prompt with explicit instructions for the model (You can customize it according to your desired format)
        formatted_prompt = f"""Please analyze the following query and respond in the exact format shown:

Query: {prompt}

Provide your response in this exact format:
Response: [Answer the user query]
Category: [Must be one of Information Request/Actionable Task/Clarification Needed based on user query]
Action Taken: [Action description in case of Actionable Task or None]
"""

        # Generate response using the Hugging Face pipeline
        outputs = self.pipeline(
            formatted_prompt,
            max_new_tokens=512,
            do_sample=True,
            temperature=0.7,  # Controls randomness in generation
            top_p=0.95,      # Nucleus sampling parameter
            repetition_penalty=1.1,  # Reduces repetition in generated text
            return_full_text=False
        )

        generated_text = outputs[0]['generated_text']

        # Ensure the response follows the required format
        if "Response:" not in generated_text:
            generated_text = f"Response: {generated_text}"
        if "Category:" not in generated_text:
            generated_text += "\nCategory: Information Request"
        if "Action Taken:" not in generated_text:
            generated_text += "\nAction Taken: None"

        return generated_text.strip()

def setup_model():
    """
    Initialize and configure the Hugging Face model and tokenizer.
    Returns:
        CustomHuggingFaceLLM instance ready for use
    """
    model_name = "meta-llama/Llama-3.2-1B-Instruct"  # Change to your preferred model

    # Read the token from the environment if it is set; when it is None, transformers
    # falls back to the token saved by `huggingface-cli login`
    hf_token = os.environ.get("HUGGING_FACE_HUB_TOKEN")

    # Initialize tokenizer with authentication
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        token=hf_token
    )

    # Initialize model with appropriate settings for available hardware
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        device_map="auto",  # Automatically choose the best device configuration
        token=hf_token
    )

    # Configure tokenizer padding settings
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token_id = tokenizer.eos_token_id

    # Ensure model config matches tokenizer settings
    model.config.pad_token_id = tokenizer.pad_token_id
    model.config.eos_token_id = tokenizer.eos_token_id

    # Create the text generation pipeline
    text_generation_pipeline = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
    )

    return CustomHuggingFaceLLM(pipeline=text_generation_pipeline)

def get_answer(query: str) -> str:
    """
    Process general questions using the LLM.

    Args:
        query: User's question or request

    Returns:
        Formatted response from the LLM
    """
    response = llm._call(query)
    return response

def perform_action(query: str) -> str:
    """
    Handle actionable tasks based on the query.

    Args:
        query: Description of the action to perform

    Returns:
        Status message about the action performed
    """
    # Placeholder for actual action implementation
    return f"Action performed: {query}"

def main():
    """
    Main application loop that sets up the agent and processes user queries.
    Handles model initialization, user interaction, and cleanup.
    """
    try:
        print("Initializing model...")
        global llm
        llm = setup_model()

        # Define available tools for the agent
        tools = [
            Tool(
                name="GetAnswer",
                func=get_answer,
                description="Use this for general questions and information requests."
            ),
            Tool(
                name="PerformAction",
                func=perform_action,
                description="Use this for actionable tasks that require specific actions."
            )
        ]

        # Initialize the custom agent with tools
        agent = CustomAgent(
            llm=llm,
            tools=tools
        )

        # Create the agent executor with the specified configuration
        agent_executor = AgentExecutor.from_agent_and_tools(
            agent=agent,
            tools=tools,
            verbose=True,
            max_iterations=3  # Limit the number of tool uses per query
        )

        print("\nAgent ready! Enter your queries (type 'quit' to exit)")

        # Main interaction loop
        while True:
            query = input("\nEnter query: ").strip()
            if query.lower() in ['quit', 'exit']:
                break

            try:
                print("Processing query...")
                response = agent_executor.invoke({"input": query})

                print("\nProcessed Response:")
                print(response["output"])

            except Exception as e:
                print(f"\nError: {str(e)}")
                print("Please try again with a different query.")

    except KeyboardInterrupt:
        print("\nExiting...")
    except Exception as e:
        print(f"\nFatal error: {str(e)}")
    finally:
        # Clean up GPU memory if available
        if torch.cuda.is_available():
            torch.cuda.empty_cache()

if __name__ == "__main__":
    main()
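The agent's `plan` method decides between an action and a final answer by checking for the literal substring `Category: Actionable Task`, which is sensitive to small formatting differences in the model output. A more tolerant approach, sketched below, is to parse the three labelled fields from the prompt format with a regular expression. `parse_structured_response` is a hypothetical helper, not part of LangChain, and the fallback values mirror the defaults that `_call` appends.

```python
import re
from typing import Dict

VALID_CATEGORIES = ("Information Request", "Actionable Task", "Clarification Needed")

# Each field runs from its label to the next label at the start of a line, or to the end
_FIELD_PATTERN = re.compile(
    r"^(Response|Category|Action Taken):[ \t]*(.*?)"
    r"(?=^(?:Response|Category|Action Taken):|\Z)",
    re.MULTILINE | re.DOTALL,
)


def parse_structured_response(text: str) -> Dict[str, str]:
    """Split model output into its Response, Category, and Action Taken fields."""
    fields = {"Response": "", "Category": "Information Request", "Action Taken": "None"}
    for label, value in _FIELD_PATTERN.findall(text):
        fields[label] = value.strip()

    # Normalize the category: first line only, ignoring brackets, dots, and case
    raw = fields["Category"].splitlines()[0].strip("[] .") if fields["Category"] else ""
    matched = next((c for c in VALID_CATEGORIES if c.lower() == raw.lower()), None)
    fields["Category"] = matched or "Information Request"
    return fields
```

With such a helper, the check in `plan` could compare `parse_structured_response(response)["Category"]` against `"Actionable Task"` instead of searching for an exact substring.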


Initializing model...

Agent ready! Enter your queries (type 'quit' to exit)
Processing query...


> Entering new AgentExecutor chain...
Response: Result: [Answer or Response to the query]

Note: This is an example for demonstration purposes.

## Step 1: Analyze the Query
The given query is "Make a sandwich". It's a simple request that asks for the action of creating a sandwich.

## Step 2: Provide the Response
A possible response to the query could be "You can make a sandwich by combining various ingredients such as bread, meats, cheeses, vegetables, and condiments."

## Step 3: Identify Category
Since the query is about making something (a sandwich), it falls under the category of "Information Request".

## Step 4: Determine Action Taken
No specific action is required since the task is simply asking for information. However, if we consider a more detailed response like providing recipes or cooking instructions, then an actionable task would be taken.

## Step 5: 