# Local Model Notebook loader
## This is for people who want to test LangChain or other agent/AGI-related code in a notebook


## ⚠️Llama-cpp users🦙⚠️
If you are using llama-cpp, you can skip down to the llama-cpp cells.

If your llama runs on the GPU, don't skip.
# Text-generation-webui related code
## Load Required Libraries and Modules
The first step is to load all the required libraries and modules:

In [None]:
!pip install langchain

In [1]:
import sys
sys.argv = [sys.argv[0]]
import os
import re
import time
import json
from pathlib import Path
from transformers import LlamaTokenizer, LlamaForCausalLM, GenerationConfig, pipeline
from langchain.llms import HuggingFacePipeline
from langchain import PromptTemplate, LLMChain
sys.path.append(str(Path().resolve().parent / "modules"))
from modules import api, chat, shared, training, ui
from modules.html_generator import chat_html_wrapper
from modules.LoRA import add_lora_to_model
from modules.models import load_model, load_soft_prompt
from modules.text_generation import generate_reply, stop_everything_event
import torch
torch.cuda.set_device(0)


Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: C:\Users\admin\Documents\oobabooga-windows\installer_files\env\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary C:\Users\admin\Documents\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll...


# Parameters and command-line flags

Enter your command-line arguments as you would when launching `server.py` ([complete list](https://github.com/oobabooga/text-generation-webui#basic-settings)).

Example: --auto-devices --wbits 4 --groupsize 128 --no-stream
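The next cell splits your string with `input_string.split()`, so a quoted value that contains spaces (such as a folder path) would be broken into several tokens. A minimal sketch, assuming you want shell-style quoting: `shlex.split` keeps quoted values together. The toy parser below is a hypothetical stand-in for `modules.shared.parser` that defines only the flags from the example plus `--model-dir`; it is not the webui's real parser. On Windows, note that `shlex.split` treats backslashes as escape characters, so write paths with forward slashes.

```python
import argparse
import shlex

# Hypothetical stand-in for modules.shared.parser (the real one defines many more flags)
parser = argparse.ArgumentParser()
parser.add_argument("--auto-devices", action="store_true")
parser.add_argument("--wbits", type=int)
parser.add_argument("--groupsize", type=int)
parser.add_argument("--no-stream", action="store_true")
parser.add_argument("--model-dir", type=str)

def parse_input_string(input_string):
    # shlex.split honours shell-style quotes, unlike str.split
    return parser.parse_args(shlex.split(input_string))

args = parse_input_string('--auto-devices --wbits 4 --groupsize 128 --no-stream --model-dir "D:/my models"')
print(args.wbits, args.groupsize, args.no_stream, args.model_dir)
```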


In [2]:
from modules.shared import parser

def parse_input_string(input_string):
    input_args = input_string.split()
    return parser.parse_args(input_args)

input_string = input('Enter args string: ')
shared.args = parse_input_string(input_string)
# Load custom settings from a JSON file
settings_file = None
if shared.args.settings is not None and Path(shared.args.settings).exists():
    settings_file = Path(shared.args.settings)
elif Path('settings.json').exists():
    settings_file = Path('settings.json')

if settings_file is not None:
    print(f"Loading settings from {settings_file}...")
    # use a with-block so the file is closed after reading
    with open(settings_file, 'r') as f:
        new_settings = json.load(f)
    # copy every key from the file into the shared settings
    shared.settings.update(new_settings)

shared.settings['seed'] = -1


Enter args string: --auto-devices --wbits 4 --groupsize 128 --no-stream
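The settings cell reads a JSON object and copies each key into `shared.settings`, then forces the seed to -1. A minimal sketch of the same merge as a standalone function, assuming the settings file is a flat JSON object; `load_settings`, the `defaults` dict, and its keys are hypothetical placeholders, not the webui's actual settings.

```python
import json
import tempfile
from pathlib import Path

def load_settings(defaults, settings_file=None):
    # Work on a copy so the defaults dict is not mutated
    settings = dict(defaults)
    if settings_file is not None and Path(settings_file).exists():
        with open(settings_file, "r", encoding="utf-8") as f:
            settings.update(json.load(f))  # values from the file override defaults
    settings["seed"] = -1  # mirror the notebook: always reset the seed
    return settings

# Hypothetical defaults and override file for illustration
defaults = {"max_new_tokens": 200, "seed": 42}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"
    path.write_text(json.dumps({"max_new_tokens": 400}), encoding="utf-8")
    settings = load_settings(defaults, path)

print(settings)  # {'max_new_tokens': 400, 'seed': -1}
```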


# Choose your model

In [3]:
# Function to get available models
def get_available_models():
    if shared.args.flexgen:
        return sorted([re.sub('-np$', '', item.name) for item in list(Path(f'{shared.args.model_dir}/').glob('*')) if item.name.endswith('-np')], key=str.lower)
    else:
        # escape the dot so only a literal ".pth" suffix is removed
        return sorted([re.sub(r'\.pth$', '', item.name) for item in list(Path(f'{shared.args.model_dir}/').glob('*')) if not item.name.endswith(('.txt', '-np', '.pt', '.json'))], key=str.lower)

# Get the list of available models
available_models = get_available_models()

# Set the model name
if shared.args.model is not None:
    shared.model_name = shared.args.model
else:
    if len(available_models) == 0:
        print('No models are available! Please download at least one.')
        sys.exit(0)
    elif len(available_models) == 1:
        i = 0
    else:
        print('The following models are available:\n')
        for i, model in enumerate(available_models):
            print(f'{i+1}. {model}')
        print(f'\nWhich one do you want to load? 1-{len(available_models)}\n')
        i = int(input()) - 1
        print()
    shared.model_name = available_models[i]


The following models are available:

1. anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g
2. chavinlo_alpaca-native
3. gozfarb_oasst-llama13b-4bit-128g
4. llama-13b-ggml-q4_0
5. llama-30b-4bit-128g
6. llama-30b-sft-oa-alpaca-epoch-2
7. llama-7b
8. MetaIX_GPT4-X-Alpaca-30B-Int4
9. vicuna-13b-GPTQ-4bit-128g

Which one do you want to load? 1-9

6
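`get_available_models` strips a trailing `.pth` with `re.sub`. In a regular expression an unescaped `.` matches any character, so `'.pth$'` also eats a name that merely ends in "pth" (for example "model-depth" becomes "model-d"); `r'\.pth$'` only matches the literal suffix. A minimal sketch of the non-FlexGen branch as a pure function over file names, so it can be tried without a `models` folder; `filter_model_names` and the example names are hypothetical.

```python
import re

def filter_model_names(names):
    # Mirror the non-FlexGen branch: drop auxiliary files, strip a literal ".pth"
    skipped = ('.txt', '-np', '.pt', '.json')
    models = [re.sub(r'\.pth$', '', name) for name in names if not name.endswith(skipped)]
    return sorted(models, key=str.lower)

names = ["llama-7b", "config.json", "README.txt", "opt-1.3b.pth", "Alpaca-native", "notes-np"]
print(filter_model_names(names))  # ['Alpaca-native', 'llama-7b', 'opt-1.3b']

# The unescaped pattern removes more than intended
print(re.sub('.pth$', '', 'model-depth'))   # model-d
print(re.sub(r'\.pth$', '', 'model-depth'))  # model-depth
```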



# Load Model and Tokenizer

In [4]:
# Load the model and tokenizer
shared.model, shared.tokenizer = load_model(shared.model_name)

# Add Lora to the model if specified
if shared.args.lora:
    add_lora_to_model(shared.args.lora)

# Set up the tokenizer and model variables
tokenizer = shared.tokenizer
base_model = shared.model

Loading llama-30b-sft-oa-alpaca-epoch-2...
Found the following quantized model: models\llama-30b-sft-oa-alpaca-epoch-2\llama-30b-sft-ao-alpaca-epoch-2-hf-int4-128g.safetensors
Loading model ...




Done.
Loaded the model in 22.44 seconds.


## Create a text-generation pipeline with the specified parameters
Feel free to change these parameters to best fit your model and use case.

In [74]:
# Create a text-generation pipeline with the specified parameters
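pipe = pipeline(
    "text-generation",
    model=base_model,
    tokenizer=tokenizer,
    device=0,
    max_length=1200,  # counts prompt tokens plus generated tokens
    do_sample=True,  # without this, generation is greedy and temperature/top_p are ignored
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.1
)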

llm = HuggingFacePipeline(pipeline=pipe)
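`temperature` and `top_p` only take effect when sampling is enabled. To get a feel for what they do, here is a pure-Python sketch on a toy set of logits: a CTRL-style repetition penalty (positive logits of already-seen tokens are divided by the penalty, negative ones multiplied), temperature scaling, then nucleus (top-p) filtering. This is a simplified model of the idea, not the `transformers` implementation; the tokens and logit values are made up.

```python
import math

def softmax(logits):
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def apply_repetition_penalty(logits, previous_tokens, penalty):
    # Make tokens that were already generated less likely
    out = dict(logits)
    for tok in set(previous_tokens):
        if tok in out:
            out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def top_p_filter(probs, top_p):
    # Keep the smallest set of most likely tokens whose total mass reaches top_p
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}  # renormalise

logits = {"Toilet": 2.0, "Porcelain": 1.5, "Bathroom": 1.0, "Banana": -1.0}
logits = apply_repetition_penalty(logits, ["Toilet"], penalty=1.1)
scaled = {tok: v / 0.6 for tok, v in logits.items()}  # temperature=0.6 sharpens the distribution
probs = top_p_filter(softmax(scaled), top_p=0.95)
print(probs)  # "Banana" falls outside the 95% nucleus and is dropped
```

Lower temperatures make the top token dominate; lower `top_p` values cut off more of the unlikely tail.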

## The model is now loaded
Run the next cell to test it

In [75]:
text = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Provide 3 potential names for a business that sells toilets.

### Response:

"""
print(llm(text))

1. Toilet World
2. The Porcelain Palace
3. Bathroom Boutique


## If that worked, you can skip down to the LangChain section
##

# 🦙🦙🦙🦙🦙🦙🦙🦙
# 🦙Llama-cpp users🦙
# 🦙🦙🦙🦙🦙🦙🦙🦙
## If you are only using llama-cpp, follow these steps

Place a folder containing your llama.cpp `.bin` file inside the `models` folder.

Example: "./models/Alpaca-7B-ggml-4bit-LoRA-merged/ggml-model-q4_0.bin"

## Install and import dependencies

In [None]:
!pip install llama-cpp-python
!pip install langchain

In [None]:
from langchain.llms import LlamaCpp
from langchain import PromptTemplate, LLMChain

# Select Model / Load Model

In [None]:
import os

model_dir = "./models"
# get a list of all folders in the models directory
model_folders = [f for f in os.listdir(model_dir) if os.path.isdir(os.path.join(model_dir, f))]

# print the list of model names with their index starting at 1
for i, model_name in enumerate(model_folders):
    print(f"{i+1}. {model_name}")

# ask the user to select a model by number
selected_index = int(input("Enter the number of the model to select: ")) - 1
selected_model = model_folders[selected_index]

# find the first .bin file in the selected folder and save its path
model_bin = None
for file in os.listdir(os.path.join(model_dir, selected_model)):
    if file.endswith(".bin"):
        model_bin = os.path.join(model_dir, selected_model, file)
        break

# stop here instead of passing model_path=None to LlamaCpp
if model_bin is None:
    raise FileNotFoundError(f"No .bin file found in {os.path.join(model_dir, selected_model)}")
print(f"Selected model binary: {model_bin}")

llm = LlamaCpp(model_path=model_bin)
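If a folder holds more than one `.bin` file, picking the first one can load the wrong model. A minimal alternative sketch using `pathlib`: it lists every `.bin` file exactly one level below `models`, so you choose the file itself rather than the folder. It assumes the layout from the example path above; `find_model_bins` is a hypothetical helper.

```python
from pathlib import Path

def find_model_bins(model_dir="./models"):
    # Match models/<folder>/<file>.bin, sorted so the menu order is stable
    return sorted(Path(model_dir).glob("*/*.bin"))

for i, path in enumerate(find_model_bins(), start=1):
    print(f"{i}. {path}")
```

You could then pass `str(find_model_bins()[n - 1])` as `model_path` for the number `n` you picked.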

## The model is now loaded
Run the next cell to test it

In [None]:
text = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Provide 3 potential names for a business that sells toilets.

### Response:

"""
print(llm(text))

## If that worked, you can proceed
##
# 👇👇👇👇👇👇👇👇👇👇👇👇👇👇👇👇👇
# ⛓️Beginning of LangChain section⛓️
I stole some of the code from [this colab](https://colab.research.google.com/drive/1VOwJpcZqOXag-ZXi-52ibOx6L5Pw-YJi#scrollTo=nu-AmhDLEK0h) that goes with [this video](https://www.youtube.com/watch?v=LbT1yp6quS8) by Patrick Loeber. I recommend subscribing.

## Custom LLM Agent - Google Search
Google Search requires a SerpAPI API key.



In [7]:
!pip install langchain
!pip install google-search-results



In [8]:
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.prompts import BaseChatPromptTemplate
from langchain import SerpAPIWrapper, LLMChain
from langchain.chat_models import ChatOpenAI
from typing import List, Union
from langchain.schema import AgentAction, AgentFinish, HumanMessage
import re
import os

## Add your SerpAPI API key to the cell below
You can get a free one here: https://serpapi.com/users/sign_up?plan=free
It does require a phone number. If you don't want to deal with API keys, skip down to the Wikipedia section.

In [9]:
os.environ["SERPAPI_API_KEY"] = ""
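Pasting the key into the cell stores it in the notebook file, which is easy to share by accident. A minimal sketch, assuming you would rather type it at runtime: `getpass` prompts without echoing, and the key is only requested when `SERPAPI_API_KEY` is missing or empty. `ensure_env_var` is a hypothetical helper; its `prompt` argument exists only so it can be tested without a terminal.

```python
import os
from getpass import getpass

def ensure_env_var(name, prompt=getpass):
    # Only ask when the variable is missing or set to an empty string
    if not os.environ.get(name):
        os.environ[name] = prompt(f"Enter {name}: ")
    return os.environ[name]

# ensure_env_var("SERPAPI_API_KEY")  # prompts once, then reuses the stored value
```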

In [10]:
# Define which tools the agent can use to answer user queries
search = SerpAPIWrapper()
tools = [
    Tool(
        name = "Search",
        func=search.run,
        description="useful for when you need to answer questions about current events"
    )
]

## Set up the base template
This template is based on the instruction format that Alpaca (a LLaMA fine-tune) was trained on. More info [here](https://github.com/tatsu-lab/stanford_alpaca#data-release)

In [11]:
template = """Please read the following instruction and input, and respond with the appropriate actions and final answer.

### Instruction:
Answer the following questions as best you can. When giving the Final Answer speak like a pirate.  You have access to the following tools: {tools}

### Input:
Question: {input}
{agent_scratchpad}


Make sure to include the following elements in your response:
- Thought process
- Action (the name of the tool, one word only, {tool_names})
- Action Input (the input to the action)
- Observation (the result of the action; include the input context here if necessary)
- Final Answer (the final answer to: {input})
"""

In [12]:
# Set up a prompt template
class CustomPromptTemplate(BaseChatPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools: List[Tool]
    
    def format_messages(self, **kwargs) -> List[HumanMessage]:
        # Get the intermediate steps (AgentAction, Observation tuples)
        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "
        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts
        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])
        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        formatted = self.template.format(**kwargs)
        return [HumanMessage(content=formatted)]
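`format_messages` turns each `(AgentAction, observation)` pair into text the model can continue from. A LangChain-free sketch of the same scratchpad building, assuming only that each action carries a `log` string; `FakeAction`, `build_scratchpad`, and the sample values are hypothetical, with the log text modeled on the run shown further down.

```python
from dataclasses import dataclass

@dataclass
class FakeAction:
    # Stand-in for langchain's AgentAction; only `log` is used here
    log: str

def build_scratchpad(intermediate_steps):
    # Same string building as CustomPromptTemplate.format_messages
    thoughts = ""
    for action, observation in intermediate_steps:
        thoughts += action.log
        thoughts += f"\nObservation: {observation}\nThought: "
    return thoughts

step = (FakeAction('Action: Search\nAction Input: "metal gear solid 3"'), "A 2004 stealth game.")
print(build_scratchpad([step]))
```

On the first call there are no steps yet, so `{agent_scratchpad}` is simply an empty string.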

In [13]:
prompt = CustomPromptTemplate(
    template=template,
    tools=tools,
    # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically
    # This includes the `intermediate_steps` variable because that is needed
    input_variables=["input", "intermediate_steps"]
)

In [14]:
class CustomOutputParser(AgentOutputParser):
    
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action: (.*?)[\n]*Action Input:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)
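To see which model outputs the regex accepts, here is the parsing logic as a plain function that returns tuples instead of LangChain objects, handy for testing your model's output format. A minimal sketch; `parse_llm_output` is hypothetical and the sample strings are modeled on the run shown below. One deliberate deviation: it calls `.strip()` instead of `.strip(" ")`, so a trailing newline does not prevent the closing quote from being removed.

```python
import re

ACTION_RE = re.compile(r"Action: (.*?)[\n]*Action Input:[\s]*(.*)", re.DOTALL)

def parse_llm_output(llm_output):
    # Returns ("finish", answer) or ("action", tool, tool_input)
    if "Final Answer:" in llm_output:
        return ("finish", llm_output.split("Final Answer:")[-1].strip())
    match = ACTION_RE.search(llm_output)
    if not match:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    return ("action", match.group(1).strip(), match.group(2).strip().strip('"'))

sample = 'Thought process:\nAction: Search\nAction Input: "metal gear solid 3 snake eater"'
print(parse_llm_output(sample))  # ('action', 'Search', 'metal gear solid 3 snake eater')
```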

In [15]:
output_parser = CustomOutputParser()

In [16]:
# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=prompt)

In [17]:
tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain, 
    output_parser=output_parser,
    stop=["\nObservation:"], 
    allowed_tools=tool_names
)
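`stop=["\nObservation:"]` cuts the model off as soon as it starts writing its own observation, so the agent runs the real tool instead of trusting an invented result. Depending on the backend, the stop string is either passed to the model or applied by trimming the generated text afterwards. A minimal sketch of the trimming idea; `truncate_at_stop` is a hypothetical helper and the generated text (including its made-up "1998" observation) is invented for illustration.

```python
def truncate_at_stop(text, stop):
    # Cut at the earliest occurrence of any stop string
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

generated = 'Action: Search\nAction Input: "mgs3"\nObservation: it came out in 1998'
print(truncate_at_stop(generated, ["\nObservation:"]))
```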

In [18]:
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)
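Conceptually, the executor alternates between asking the agent for its next step and running the chosen tool until the agent returns a final answer. A simplified, LangChain-free sketch of that loop driven by a scripted fake LLM; every name here (`run_agent`, `parse_step`, `demo_tools`, `max_steps`) is hypothetical, and the real `AgentExecutor` does much more (error handling, early stopping, callbacks).

```python
import re

def parse_step(text):
    # Returns (action, None) for a tool call or (None, answer) when finished
    if "Final Answer:" in text:
        return None, text.split("Final Answer:")[-1].strip()
    m = re.search(r"Action: (.*?)[\n]*Action Input:[\s]*(.*)", text, re.DOTALL)
    if not m:
        raise ValueError(f"Could not parse LLM output: `{text}`")
    return (m.group(1).strip(), m.group(2).strip().strip('"')), None

def run_agent(llm, tools, question, max_steps=5):
    steps = []  # (llm_text, observation) pairs, like intermediate_steps
    for _ in range(max_steps):
        scratchpad = "".join(f"{log}\nObservation: {obs}\nThought: " for log, obs in steps)
        text = llm(f"Question: {question}\n{scratchpad}")
        action, answer = parse_step(text)
        if action is None:
            return answer
        tool_name, tool_input = action
        steps.append((text, tools[tool_name](tool_input)))  # run the real tool
    return "Stopped: too many steps"

demo_replies = iter(['Action: Search\nAction Input: "mgs3 release"', "Final Answer: Arrr, late 2004."])
demo_tools = {"Search": lambda q: f"results for {q}"}
print(run_agent(lambda prompt: next(demo_replies), demo_tools, "When did MGS3 come out?"))
```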

## 👁️👁️Observe the results in this cell👇


In [20]:
agent_executor.run("When did metal gear solid 3 snake eater come out?")



> Entering new AgentExecutor chain...

Thought process:
Action: Search
Action Input: "metal gear solid 3 snake eater"

Observation:Metal Gear Solid 3: Snake Eater is a 2004 action-adventure stealth video game developed and published by Konami for the PlayStation 2. It was released in late 2004 in North America and Japan, and in early 2005 in Europe and Australia.
Final Answer: Arrrrrr, it be released on November 17th, 2004.

> Finished chain.


'Arrrrrr, it be released on November 17th, 2004.'

## Custom LLM Agent - Wikipedia Search



In [None]:
!pip install langchain
!pip install wikipedia

In [76]:
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.prompts import BaseChatPromptTemplate
from langchain import SerpAPIWrapper, LLMChain
from langchain.chat_models import ChatOpenAI
from typing import List, Union
from langchain.schema import AgentAction, AgentFinish, HumanMessage
import re
import os

In [77]:
# Define which tools the agent can use to answer user queries
from langchain.utilities import WikipediaAPIWrapper
wikipedia = WikipediaAPIWrapper()
tools = [
    Tool(
        name = "Search",
        func=wikipedia.run,
        description="useful for when you need to look up information about something"
    )
]
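The Wikipedia tool returns plain text made of `Page: ...` / `Summary: ...` blocks, as visible in the run below, and that whole string becomes the observation. A minimal sketch for splitting such text into (title, summary) pairs, for example to check which article was actually retrieved; the exact format may vary between LangChain versions, so the parsing is an assumption, and `split_wikipedia_result` and the sample text are hypothetical.

```python
import re

def split_wikipedia_result(text):
    # Each block is assumed to look like "Page: <title>\nSummary: <text>"
    pattern = re.compile(r"Page: (.*?)\nSummary: (.*?)(?=\n\nPage: |\Z)", re.DOTALL)
    return [(title.strip(), summary.strip()) for title, summary in pattern.findall(text)]

sample = (
    "Page: Metal Gear Solid 3: Snake Eater\nSummary: A 2004 stealth game.\n\n"
    "Page: Metal Gear Solid 2: Sons of Liberty\nSummary: A 2001 stealth game."
)
for title, summary in split_wikipedia_result(sample):
    print(title, "->", summary)
```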

## Set up the base template
This template is based on the instruction format that Alpaca (a LLaMA fine-tune) was trained on. More info [here](https://github.com/tatsu-lab/stanford_alpaca#data-release)

In [80]:
template = """Please read the following instruction and input, and respond with the appropriate actions and final answer.

### Instruction:
Answer the following questions as best you can. When giving the Final Answer speak like a pirate.  You have access to the following tools: {tools}

### Input:
Question: {input}
{agent_scratchpad}


Make sure to include the following elements in your response:
- Thought process
- Action (the name of the tool, one word only, {tool_names})
- Action Input (the input to the action)
- Observation (the result of the action; include the input context here if necessary)
- Final Answer (the final answer to: {input})
"""

In [81]:
# Set up a prompt template
class CustomPromptTemplate(BaseChatPromptTemplate):
    # The template to use
    template: str
    # The list of tools available
    tools: List[Tool]
    
    def format_messages(self, **kwargs) -> List[HumanMessage]:
        # Get the intermediate steps (AgentAction, Observation tuples)
        # Format them in a particular way
        intermediate_steps = kwargs.pop("intermediate_steps")
        thoughts = ""
        for action, observation in intermediate_steps:
            thoughts += action.log
            thoughts += f"\nObservation: {observation}\nThought: "
        # Set the agent_scratchpad variable to that value
        kwargs["agent_scratchpad"] = thoughts
        # Create a tools variable from the list of tools provided
        kwargs["tools"] = "\n".join([f"{tool.name}: {tool.description}" for tool in self.tools])
        # Create a list of tool names for the tools provided
        kwargs["tool_names"] = ", ".join([tool.name for tool in self.tools])
        formatted = self.template.format(**kwargs)
        return [HumanMessage(content=formatted)]

In [82]:
prompt = CustomPromptTemplate(
    template=template,
    tools=tools,
    # This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are generated dynamically
    # This includes the `intermediate_steps` variable because that is needed
    input_variables=["input", "intermediate_steps"]
)

In [83]:
class CustomOutputParser(AgentOutputParser):
    
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # Check if agent should finish
        if "Final Answer:" in llm_output:
            return AgentFinish(
                # Return values is generally always a dictionary with a single `output` key
                # It is not recommended to try anything else at the moment :)
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # Parse out the action and action input
        regex = r"Action: (.*?)[\n]*Action Input:[\s]*(.*)"
        match = re.search(regex, llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        action = match.group(1).strip()
        action_input = match.group(2)
        # Return the action and action input
        return AgentAction(tool=action, tool_input=action_input.strip(" ").strip('"'), log=llm_output)

In [84]:
output_parser = CustomOutputParser()

In [85]:
# LLM chain consisting of the LLM and a prompt
llm_chain = LLMChain(llm=llm, prompt=prompt)

In [86]:
tool_names = [tool.name for tool in tools]
agent = LLMSingleActionAgent(
    llm_chain=llm_chain, 
    output_parser=output_parser,
    stop=["\nObservation:"], 
    allowed_tools=tool_names
)

In [87]:
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)

In [88]:
agent_executor.run("who voiced solid snake in metal gear solid 2?")



> Entering new AgentExecutor chain...

Thought process:
Action: Search
Action Input: "who voiced solid snake in metal gear solid 2?"

Observation:Page: Metal Gear Solid 3: Snake Eater
Summary: Metal Gear Solid 3: Snake Eater is a 2004 action-adventure stealth video game developed and published by Konami for the PlayStation 2. It was released in late 2004 in North America and Japan, and in early 2005 in Europe and Australia. It was the fifth Metal Gear game written and directed by Hideo Kojima and serves as a prequel to the entire Metal Gear series. An expanded edition, titled Metal Gear Solid 3: Subsistence, was released in Japan in late 2005, then in North America, Europe and Australia in 2006. A remastered version of the game was later included in the Metal Gear Solid HD Collection for the PlayStation 3, PlayStation Vita and Xbox 360, while a reworked version, titled Metal Gear Solid: Snake Eater 3D, was released for the Nintendo 3DS in 2012.

'"Who voiced Solid Snake in Metal Gear Solid 2?"\n\nDavid Hayter'