### Description
**Model:** shrinath-suresh/alpaca-lora-all-7B<br />
**Tools:** PyTorch Index, HF Dataset<br />
**With COT:** Yes<br />
**Chat History:** Yes<br />


In [8]:
# !pip install peft InstructorEmbedding

In [1]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [2]:
import os
import json
import uuid
import torch
import requests
from typing import Any, List, Mapping, Optional
from dotenv import load_dotenv
import langchain 
from threading import Thread
from tqdm import tqdm
from peft import PeftModel
from IPython.display import Markdown, display

from pydantic import BaseModel, Field
from transformers import pipeline, TextStreamer, TextIteratorStreamer, LlamaTokenizer, LlamaForCausalLM

from llama_index.indices.composability import ComposableGraph
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
from llama_index.langchain_helpers.memory_wrapper import GPTIndexChatMemory
from llama_index.langchain_helpers.agents import LlamaToolkit, create_llama_chat_agent, IndexToolConfig, LlamaIndexTool
from llama_index import download_loader, SummaryPrompt, LLMPredictor, GPTListIndex, GPTVectorStoreIndex, PromptHelper, load_index_from_storage, StorageContext, ServiceContext, LangchainEmbedding, SimpleDirectoryReader
from llama_index.langchain_helpers.text_splitter import TokenTextSplitter
from llama_index.node_parser import SimpleNodeParser, NodeParser
from llama_index.vector_stores import ChromaVectorStore


from langchain.tools import BaseTool, StructuredTool, tool
from langchain.prompts import MessagesPlaceholder
from langchain.agents import Tool, AgentOutputParser
from langchain import OpenAI, LLMChain
from langchain.llms.base import LLM
from langchain.llms import HuggingFacePipeline
from langchain import PromptTemplate
from langchain.callbacks import tracing_enabled
from langchain.agents import initialize_agent, LLMSingleActionAgent, StructuredChatAgent, ConversationalChatAgent, ConversationalAgent, AgentExecutor, ZeroShotAgent, AgentType
from langchain.embeddings import HuggingFaceEmbeddings, HuggingFaceInstructEmbeddings
from langchain.chains.conversation.memory import ConversationBufferMemory, ConversationStringBufferMemory, ConversationBufferWindowMemory
from langchain.memory.chat_memory import ChatMessageHistory
from langchain.memory import ConversationKGMemory
from langchain.memory.chat_message_histories import RedisChatMessageHistory
from langchain.cache import RedisSemanticCache

load_dotenv()

INFO:numexpr.utils:Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.

Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...


  warn(msg)


True

In [3]:
# define prompt helper
# set maximum input size
max_input_size = 2048
# set number of output tokens
num_output = 512
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
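
The three settings above amount to a token budget. A rough sketch of the arithmetic follows, assuming the helper reserves `num_output` tokens for the answer and leaves the rest of the window for the prompt template plus retrieved context. `prompt_template_tokens` is a hypothetical figure chosen for illustration, and `context_budget` is not a LlamaIndex function.

```python
# Illustrative token-budget arithmetic; not how PromptHelper is implemented internally.
max_input_size = 2048          # model context window, in tokens
num_output = 512               # tokens reserved for the generated answer
prompt_template_tokens = 200   # hypothetical size of the prompt template plus question


def context_budget(window: int, reserved_output: int, template_tokens: int) -> int:
    """Tokens left for retrieved context once output and template space are reserved."""
    return max(window - reserved_output - template_tokens, 0)


available = context_budget(max_input_size, num_output, prompt_template_tokens)
print(available)
```

Under these assumptions, anything retrieved beyond `available` tokens would have to be truncated or split across several LLM calls.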

In [4]:
api = ""

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
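# TextStreamer prints tokens to stdout as they are generated; `timeout` belongs to TextIteratorStreamer, so it is not passed here.
streamer = TextStreamer(tokenizer, skip_prompt=True)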

import huggingface_hub as hf_hub

hf_hub.login(token=api)

## Load the LLaMA base model, then the fine-tuned Alpaca-LoRA checkpoint that the agent uses

base_model_name = 'decapoda-research/llama-7b-hf'

base_model = LlamaForCausalLM.from_pretrained(
            base_model_name,
            torch_dtype=torch.float16,
            device_map="auto",
        )

model = LlamaForCausalLM.from_pretrained("shrinath-suresh/alpaca-lora-all-7B",low_cpu_mem_usage=True, use_auth_token=api,device_map="auto")
# model = LlamaForCausalLM.from_pretrained("shrinath-suresh/alpaca-lora-7b-answer-summary",low_cpu_mem_usage=True, use_auth_token=api,device_map="auto")
# model = LlamaForCausalLM.from_pretrained("shrinath-suresh/alpaca-lora-7b-context-summary",low_cpu_mem_usage=True, use_auth_token=api,device_map="auto")
# model = LlamaForCausalLM.from_pretrained("tloen/alpaca-lora-7b",low_cpu_mem_usage=True, use_auth_token=api,device_map="auto")

class CustomLLM(LLM):
    model_name = 'shrinath-suresh/alpaca-lora-all-7B'
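    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        print(":::::::::::::::::::::::::::", prompt)
        # Move the input tensors to the device that holds the model.
        inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

        # generate() blocks until generation finishes; the streamer echoes tokens to stdout as they arrive.
        # Note: top_p only takes effect when sampling is enabled (do_sample=True).
        output_ids = model.generate(**inputs, streamer=streamer, top_p=0.75, max_new_tokens=num_output)
        # Decode only the newly generated tokens. Echoing the prompt back would make the
        # output parser see the template's own "Final Answer:" line and finish immediately.
        prompt_length = inputs["input_ids"].shape[1]
        response = tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
        # Honor the stop sequences passed in by the agent.
        for stop_sequence in stop or []:
            response = response.split(stop_sequence)[0]
        return response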

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

llm_predictor = LLMPredictor(llm=CustomLLM())

The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.


Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /home/ubuntu/.cache/huggingface/token
Login successful
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.


Loading checkpoint shards:   0%|          | 0/33 [00:00<?, ?it/s]

The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

In [5]:
# inputs = tokenizer(["How can one do model parallel training in PyTorch"], return_tensors="pt")

# # Run the generation in a separate thread, so that we can fetch the generated text in a non-blocking way.
# response = model.generate(**inputs, streamer=streamer, top_p=0.75, max_new_tokens=num_output)
# response = tokenizer.decode(response[0])
# print(response)

In [6]:
HF_embed_model = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
embed_model = LangchainEmbedding(HF_embed_model)

INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: hkunlp/instructor-xl
Load pretrained SentenceTransformer: hkunlp/instructor-xl
load INSTRUCTOR_Transformer
max_seq_length  512
INFO:sentence_transformers.SentenceTransformer:Use pytorch device: cuda
Use pytorch device: cuda


In [8]:
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embed_model)

In [9]:
import llama_index
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore
from llama_index.vector_stores import SimpleVectorStore
PERSIST_DIR = "/home/ubuntu/pytorch_docs_512"
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore.from_persist_dir(persist_dir=PERSIST_DIR),
    vector_store=SimpleVectorStore.from_persist_dir(persist_dir=PERSIST_DIR),
    index_store=SimpleIndexStore.from_persist_dir(persist_dir=PERSIST_DIR),
)
new_index = load_index_from_storage(storage_context, service_context=service_context)


INFO:llama_index.indices.loading:Loading all indices.
Loading all indices.
Loading all indices.


In [10]:
# query_engine = new_index.as_query_engine(service_context=service_context)
# response = query_engine.query("How can one do model parallel training in PyTorch")


In [11]:
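# Read the Hugging Face token from the environment instead of hardcoding it in the notebook.
# HF_API_TOKEN is an assumed variable name; define it in the .env file loaded by load_dotenv().
API_TOKEN = os.getenv("HF_API_TOKEN")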

from transformers.tools import Tool
from huggingface_hub import list_models, ModelFilter
import huggingface_hub as hf_hub
import re

class get_query(BaseModel):
    query: str = Field(default=None, description="The name of the model and task for which you want the datasets.")

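@tool
def get_datasets(query: str) -> str:
    """Fetches dataset for a given model and task"""
    print("Log::::::::::::::::::::::::::::", query, type(query))
    pattern = r'model=(.*?)\s*(?:and\s*)?task=(.*?)$'
    match = re.search(pattern, query)
    if match is None:
        # The agent did not follow the expected input format.
        return "Input must look like: model=<model name> and task=<task>"
    model_name = match.group(1).strip()
    task = match.group(2).strip()
    print("Log::::::::::::::::::::::::::::", model_name, task)

    # Named model_filter to avoid shadowing the built-in filter().
    model_filter = ModelFilter(library="pytorch", model_name=model_name, task=task)

    # list_models may return a lazy iterable, so materialize it before testing for emptiness.
    models = list(list_models(filter=model_filter, token=API_TOKEN))
    # The search can return other models too, so keep only the exact id match.
    exact_matches = [model for model in models if model.id == model_name]
    if not exact_matches:
        return f"{model_name} Not Found!"
    # Dataset tags look like "dataset:<name>".
    datasets = [tag.split(":", 1)[1] for tag in exact_matches[0].tags if tag.startswith("dataset:")]
    if not datasets:
        return f"No Datasets available for {model_name}"
    print("Log::::::::::::::::::::::::::::", ",".join(datasets))
    return ",".join(datasets)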
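
The tool relies on the agent writing its input as `model=<name> and task=<task>`. The sketch below isolates that parsing step so the pattern can be checked without calling the Hugging Face Hub. `parse_query` is a hypothetical helper introduced here for illustration; only the regular expression is taken from the cell above.

```python
import re
from typing import Optional, Tuple

# Same pattern as in get_datasets above.
QUERY_PATTERN = r'model=(.*?)\s*(?:and\s*)?task=(.*?)$'


def parse_query(query: str) -> Optional[Tuple[str, str]]:
    """Return (model_name, task), or None when the query does not follow the format."""
    match = re.search(QUERY_PATTERN, query)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


print(parse_query("model=bert-base-uncased and task=fill-mask"))
print(parse_query("model=bert-base-uncased task=fill-mask"))
print(parse_query("which datasets does bert use?"))
```

The last call shows why a `None` check matters: a free-form question does not match the pattern at all.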



In [39]:
search_tools = [
    StructuredTool.from_function(
        func=get_datasets,
        name="Get Model Datasets",
        args_schema=get_query,
        return_direct=True,
        description=f"A search engine useful for answering questions about fetching datasets. For example the input should be like model=model and task=task. Do not use this tool with the same input/query",
    ),
]
index_tools = [
    LlamaIndexTool.from_tool_config(
        IndexToolConfig(
            name = "PyTorch Index",
            query_engine=new_index.as_query_engine(similarity_top_k=3, response_mode="compact", service_context=service_context),
            description=f"useful for answering questions related to pytorch. Do not use this tool for fetching dataset. Do not use this tool with the same input/query",
            return_direct=True
        )
    ),
]

# ALL_TOOLS = index_tools
ALL_TOOLS = search_tools+index_tools

In [40]:
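# In this LangChain version, StructuredTool.from_function prefixes the description with the function signature ("name(args) - ...").
# Keep only the text after the first " - "; this is a no-op if the cell is re-run or no prefix is present.
ALL_TOOLS[0].description = ALL_TOOLS[0].description.split(' - ', 1)[-1]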

In [41]:
from langchain.vectorstores import FAISS
from langchain.schema import Document

docs = [Document(page_content=t.description, metadata={"index": i}) for i, t in enumerate(ALL_TOOLS)]
print(docs)
vector_store = FAISS.from_documents(docs, HF_embed_model)

retriever = vector_store.as_retriever()

def get_tools(query):
    docs = retriever.get_relevant_documents(query)
    return [ALL_TOOLS[d.metadata["index"]] for d in docs]

get_tools("give me the supported datasets for the model=bert-base-uncased and task=fill-mask")


[Document(page_content='A search engine useful for answering questions about fetching datasets. For example the input should be like model=model and task=task. Do not use this tool with the same input/query', metadata={'index': 0}), Document(page_content='useful for answering questions related to pytorch. Do not use this tool for fetching dataset. Do not use this tool with the same input/query', metadata={'index': 1})]


[StructuredTool(name='Get Model Datasets', description='A search engine useful for answering questions about fetching datasets. For example the input should be like model=model and task=task. Do not use this tool with the same input/query', args_schema=<class '__main__.get_query'>, return_direct=True, verbose=False, callbacks=None, callback_manager=None, func=StructuredTool(name='get_datasets', description='get_datasets(query: str) -> str - Fetches dataset for a given model and task', args_schema=<class 'pydantic.main.get_datasetsSchemaSchema'>, return_direct=False, verbose=False, callbacks=None, callback_manager=None, func=<function get_datasets at 0x7f08602bb760>, coroutine=None), coroutine=None),
 LlamaIndexTool(name='PyTorch Index', description='useful for answering questions related to pytorch. Do not use this tool for fetching dataset. Do not use this tool with the same input/query', args_schema=None, return_direct=False, verbose=False, callbacks=None, callback_manager=None, quer
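
The cell above ranks tools by embedding similarity between the query and each tool description. The toy sketch below shows the same idea with a deliberately simpler stand-in: bag-of-words cosine similarity replaces the `hkunlp/instructor-xl` embeddings and the FAISS store, and the descriptions are shortened. `TOOL_DESCRIPTIONS`, `bag_of_words`, `cosine`, and `rank_tools` are all hypothetical names for this illustration.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Shortened versions of the two tool descriptions used in the notebook.
TOOL_DESCRIPTIONS: Dict[str, str] = {
    "Get Model Datasets": "A search engine useful for answering questions about fetching datasets.",
    "PyTorch Index": "useful for answering questions related to pytorch.",
}


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude substitute for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_tools(query: str) -> List[str]:
    """Tool names ordered from most to least similar to the query."""
    query_vector = bag_of_words(query)
    return sorted(
        TOOL_DESCRIPTIONS,
        key=lambda name: cosine(query_vector, bag_of_words(TOOL_DESCRIPTIONS[name])),
        reverse=True,
    )


print(rank_tools("give me the supported datasets for the model=bert-base-uncased and task=fill-mask"))
print(rank_tools("How do I save a model in pytorch?"))
```

With only two tools registered, the retriever in the notebook returns both, as the printed result shows, so ranking mainly matters once the tool list grows.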

In [42]:
import re
from langchain.prompts import StringPromptTemplate
from typing import Callable
from typing import List, Union
from langchain.schema import AgentAction, AgentFinish

class CustomOutputParser(AgentOutputParser):
    
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)
output_parser = CustomOutputParser()
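
To see what the parser accepts, the sketch below applies the same checks to two hand-written completions without importing LangChain. `parse_llm_output` is a hypothetical stand-in that returns plain tuples instead of `AgentAction` and `AgentFinish`, and the sample strings are made up for illustration.

```python
import re
from typing import Tuple

# Same pattern as in CustomOutputParser above.
ACTION_REGEX = r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)"


def parse_llm_output(llm_output: str) -> Tuple[str, ...]:
    """Return ("finish", answer) or ("action", tool, tool_input)."""
    if "Final Answer:" in llm_output:
        return ("finish", llm_output.split("Final Answer:")[-1].strip())
    match = re.search(ACTION_REGEX, llm_output, re.DOTALL)
    if not match:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    tool = match.group(1).strip()
    tool_input = match.group(2).strip(" ").strip('"')
    return ("action", tool, tool_input)


action_text = 'Thought: I should search the docs\nAction: PyTorch Index\nAction Input: "How do I save a model?"'
finish_text = "Thought: I now know the final answer\nFinal Answer: Call torch.cuda.is_available()"

print(parse_llm_output(action_text))
print(parse_llm_output(finish_text))
```

Because the first check is a plain substring test, any completion that contains `Final Answer:` ends the run, including one that merely repeats the prompt's format instructions.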


In [50]:
# template_with_history = """Act like you are an expert PyTorch Engineer and provide answers to these questions from the developer community. 
# If you don't know the answer say "I am not sure about this, can you post the question on pytorch-discuss channel", don't make up an answer if you don't know. 

# You have access to the following tools:

# {tools}

# Previous conversation history:
# {history}

# New question: {input}
# {agent_scratchpad}"""

# template_with_history = """Act like you are an expert PyTorch Engineer and provide answers to these questions from the developer community. 
# If you don't know the answer say "I am not sure about this, can you post the question on pytorch-discuss channel", don't make up an answer if you don't know. 


# Previous conversation history:
# {history}

# New question: {input}"""


# Set up the base template
template_with_history = """Act like you are an expert PyTorch Engineer and provide answers to these questions from the developer community. 
If you don't know the answer say "I am not sure about this, can you post the question on pytorch-discuss channel", don't make up an answer if you don't know. 

You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat 1 times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin! Remember to speak as a pirate when giving your final answer. Use lots of "Arg"s

Previous conversation history:
{history}

New question: {input}
{agent_scratchpad}"""

class CustomPromptTemplate(StringPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools_getter: Callable
    
    def format(self, **kwargs) -> str:
        # Get the intermediate steps (AgentAction, Observation tuples)
        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "
        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts
        ############## NEW ######################
        tools = self.tools_getter(kwargs["input"])
        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in tools])
        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in tools])
        return self.template.format(**kwargs)

prompt_with_history = CustomPromptTemplate(
    template=template_with_history,
    tools_getter=get_tools,
    # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically
    # This includes the `intermediate_steps` variable because that is needed
    input_variables=["input", "intermediate_steps", "history"]
)
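
The `format` method rebuilds `agent_scratchpad` on every step from the actions taken so far. A minimal sketch of that loop in isolation follows; `Step` is a hypothetical stand-in for LangChain's `AgentAction` that carries only the `log` field, and the sample step is invented for illustration.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    """Stand-in for AgentAction; only the `log` field is used here."""
    log: str


def build_scratchpad(intermediate_steps: List[Tuple[Step, str]]) -> str:
    """Concatenate each action log with its observation, ending on an open 'Thought: '."""
    thoughts = ""
    for action, observation in intermediate_steps:
        thoughts += action.log
        thoughts += f"\nObservation: {observation}\nThought: "
    return thoughts


steps = [
    (
        Step(log="Thought: look it up\nAction: PyTorch Index\nAction Input: save a model"),
        "Use torch.save(model.state_dict(), PATH)",
    )
]
scratchpad = build_scratchpad(steps)
print(scratchpad)
```

The trailing `Thought: ` is what prompts the model to continue reasoning from the latest observation.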


In [51]:
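# The executor needs every tool registered; per-query tool selection happens inside the prompt template.
tools = ALL_TOOLS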
tool_names = [tool.name for tool in tools]

# tools = []
# tool_names = []

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# llm = OpenAI(streaming=True, temperature=0, callbacks=[StreamingStdOutCallbackHandler()], openai_api_key="sk-")
llm = CustomLLM()

llm_chain = LLMChain(llm=llm, prompt=prompt_with_history)

agent = LLMSingleActionAgent(llm_chain=llm_chain, output_parser=output_parser, stop=["\nObservation: "], allowed_tools=tool_names)

In [52]:
session_id = uuid.uuid4()

message_history = RedisChatMessageHistory(str(session_id), 'redis://localhost:6379/0', ttl=600)
memory = ConversationBufferWindowMemory(k=3, memory_key="history", return_messages=True, chat_memory=message_history)

agent_chain = AgentExecutor.from_agent_and_tools(agent=agent, streaming=True, tools=tools, verbose=False, memory=memory)
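
`ConversationBufferWindowMemory(k=3)` exposes only the last three question/answer exchanges to the prompt. The toy class below models just that windowing behaviour with a bounded deque. `WindowMemory` is a hypothetical class, not the LangChain implementation, and it ignores the Redis backing store and its `ttl`.

```python
from collections import deque
from typing import Deque, Tuple


class WindowMemory:
    """Toy model of a k-exchange conversation window."""

    def __init__(self, k: int) -> None:
        # A bounded deque silently drops the oldest exchange once k is exceeded.
        self.exchanges: Deque[Tuple[str, str]] = deque(maxlen=k)

    def save(self, question: str, answer: str) -> None:
        self.exchanges.append((question, answer))

    def history(self) -> str:
        """Render the window the way it could be substituted into {history}."""
        return "\n".join(f"Human: {q}\nAI: {a}" for q, a in self.exchanges)


window = WindowMemory(k=3)
for i in range(5):
    window.save(f"question {i}", f"answer {i}")
print(window.history())
```

After five exchanges only the last three remain visible, which keeps the prompt within the context window at the cost of forgetting earlier turns.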


In [61]:
memory.clear()
message_history.clear()

In [62]:
queries = [
    "How do I check if PyTorch is using the GPU?",
    "How do I save a trained model in PyTorch?",
    "What does .view() do in PyTorch?",
    "Why do we need to call zero_grad() in PyTorch?",
    "How do I print the model summary in PyTorch?",
    "How do I initialize weights in PyTorch?",
    "What does model.eval() do in pytorch?",
    "What's the difference between reshape and view in pytorch?",
    "What does model.train() do in PyTorch?",
    "What does .contiguous() do in PyTorch?",
    'Why do we "pack" the sequences in PyTorch?',
    "Check the total number of parameters in a PyTorch model",
    "RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same",
    "pytorch - connection between loss.backward() and optimizer.step()",
    "PyTorch preferred way to copy a tensor",
    "Pytorch tensor to numpy array",
    "Pytorch, what are the gradient arguments",
    "How to fix RuntimeError 'Expected object of scalar type Float but got scalar type Double for argument'?",
    "What does the gather function do in pytorch in layman terms?",
    'How to avoid "CUDA out of memory" in PyTorch',
]

In [63]:
response = agent_chain.run(input="How to check if pytorch is using gpu")
print("=========================================")
print(response)
print("=========================================")

::::::::::::::::::::::::::: Act like you are an expert PyTorch Engineer and provide answers to these questions from the developer community. 
If you don't know the answer say "I am not sure about this, can you post the question on pytorch-discuss channel", don't make up an answer if you don't know. 

You have access to the following tools:

PyTorch Index: useful for answering questions related to pytorch. Do not use this tool for fetching dataset. Do not use this tool with the same input/query
Get Model Datasets: A search engine useful for answering questions about fetching datasets. For example the input should be like model=model and task=task. Do not use this tool with the same input/query

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [PyTorch Index, Get Model Datasets]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/