# Custom Agent

I had limited success with the built-in functions, so I'll test a custom flow built from a combination of tweaked LangChain functions.

Part of this could eventually become more specialized schema and table extraction, but for now I'll stick with my single-schema example and expand later.

### Imports

In [18]:
import pandas as pd
import os
from dotenv import load_dotenv

from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain import HuggingFaceHub, SQLDatabase

### Setup query
We'll use this question as the input to the model.

In [3]:
question = "How many heads of the departments are older than 56?"

### Load Target Table from Vectordb

I could just point to the table directly, but I want to make sure the whole flow works.

In [4]:
#setup embeddings using HuggingFace and the directory location
embeddings  = HuggingFaceEmbeddings()
persist_dir = '../data/processed/chromadb/schema-table-split'

# load from disk
vectordb = Chroma(persist_directory=persist_dir, embedding_function=embeddings)

#run similarity search
top_target = vectordb.similarity_search(question, k=1) #just return the top for now

In [5]:
# Create variables
top_schema = top_target[0].metadata['schema']
top_table = top_target[0].metadata['table']
table_cols = top_target[0].metadata['columns']

print(top_schema, top_table, table_cols)

department_management head ["head_id", "name", "born_state", "age"]
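To make the retrieval step concrete, here is a minimal sketch of what a top-k similarity search with metadata does. The toy vectors, the `cosine` and `top_k` helpers, and the choice of cosine similarity are all assumptions for illustration; the real embeddings come from `HuggingFaceEmbeddings`, and the distance metric Chroma uses depends on how the collection was configured.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy embeddings; each entry carries the metadata stored with it.
docs = [
    {"vector": [0.9, 0.1, 0.0],
     "metadata": {"schema": "department_management", "table": "head"}},
    {"vector": [0.1, 0.9, 0.0],
     "metadata": {"schema": "department_management", "table": "department"}},
]

def top_k(query_vector, docs, k=1):
    """Return the k documents most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vector, d["vector"]), reverse=True)
    return ranked[:k]

# Hypothetical query embedding that sits closest to the "head" entry.
best = top_k([0.8, 0.2, 0.1], docs, k=1)[0]
print(best["metadata"]["schema"], best["metadata"]["table"])
```

With `k=1` only the single closest entry comes back, which is why the cell above reads everything from `top_target[0]`.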


In [19]:
# Build the database path from the retrieved schema name instead of hard-coding it
BASE_DIR = os.path.abspath('../data/processed/db')
db_path = os.path.join(BASE_DIR, top_schema + '.sqlite')

db = SQLDatabase.from_uri("sqlite:///" + db_path)
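As a sanity check on what the agent should eventually produce, here is a rough sketch of the target query run with the standard-library `sqlite3` module. It uses an in-memory stand-in for the `head` table; the column types and the three rows are made up for illustration and are not taken from the real database.

```python
import sqlite3

# In-memory stand-in for department_management.sqlite (assumed column types).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE head (head_id INTEGER, name TEXT, born_state TEXT, age REAL)"
)

# Hypothetical rows, only here so the query has something to count.
rows = [
    (1, "A", "X", 67.0),
    (2, "B", "Y", 52.0),
    (3, "C", "Z", 57.0),
]
conn.executemany("INSERT INTO head VALUES (?, ?, ?, ?)", rows)

# The SQL I'd expect for "How many heads of the departments are older than 56?"
count = conn.execute("SELECT COUNT(*) FROM head WHERE age > 56").fetchone()[0]
print(count)
```

Pointing `sqlite3.connect` at `db_path` instead of `":memory:"` would run the same query against the real file, assuming the table is laid out as the retrieved metadata suggests.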

### Setup LLM
I'll continue to use falcon-7b-instruct.

In [6]:
#get api key
load_dotenv()
hf_api_token = os.getenv('hf_token')

#add path to HF repo
repo_id = 'tiiuae/falcon-7b-instruct'

#establish llm model
llm = HuggingFaceHub(repo_id=repo_id, huggingfacehub_api_token=hf_api_token, model_kwargs={"temperature": 0.5, "max_length": 1500})

## Create Agent

https://python.langchain.com/docs/modules/agents/

Two types ->
- Action: at each step, decide on the next action using the outputs of the previous steps
- Plan & Execute: decide on the full sequence up front, then execute them all without updating the plan

Plan & Execute is better for larger problems because it maintains focus. Often it's best to combine the two: a Plan & Execute agent makes the plan and uses action agents to execute each step.

- Tools -> actions an agent can take
- Toolkit -> wrapper around a collection of tools for a specific use case. I'll need this for SQL: a tool to inspect tables and a tool to run queries.
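To pin down the tool/toolkit idea, here is a minimal sketch in plain Python. The `Tool` dataclass and `SqlToolkit` class are hypothetical stand-ins I made up for illustration, not LangChain's classes; they only show the shape: a tool is a named callable, and a toolkit hands back the set of tools for one use case.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Tool:
    """A named action the agent can take (hypothetical stand-in)."""
    name: str
    description: str
    func: Callable[[str], str]

class SqlToolkit:
    """Bundles the tools needed for one use case: working with a SQL database."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def _list_tables(self, _: str) -> str:
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
        return ", ".join(r[0] for r in rows)

    def _run_query(self, sql: str) -> str:
        # Return errors as text so an agent could read them and retry.
        try:
            return str(self.conn.execute(sql).fetchall())
        except sqlite3.Error as exc:
            return f"Error: {exc}"

    def get_tools(self) -> List[Tool]:
        return [
            Tool("list_tables", "List the tables in the database.", self._list_tables),
            Tool("run_query", "Run a SQL query and return the rows.", self._run_query),
        ]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE head (head_id INTEGER, name TEXT, born_state TEXT, age REAL)")
tools = {t.name: t for t in SqlToolkit(conn).get_tools()}
print(tools["list_tables"].func(""))
print(tools["run_query"].func("SELECT COUNT(*) FROM head"))
```

Returning the error message as a string rather than raising is a design choice here: it lets the failure flow back to the agent as an observation.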

#### Action Agents

Action agents are wrapped in agent executors, which call the agent, get back an action and action input, call the referenced tool, and return the tool's output to the agent.

An action agent typically has a prompt template, a language model, and an output parser.
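The executor loop can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: `fake_llm` is a scripted stand-in for a real model, the `Action:` / `Action Input:` / `Final Answer:` text format is just one common convention, and `calculator` is a toy tool.

```python
import re

# Prompt template: filled in again on every step with the growing scratchpad.
PROMPT = "Question: {question}\nTools: {tool_names}\nScratchpad:\n{scratchpad}"

def fake_llm(prompt: str) -> str:
    """Scripted stand-in for a language model (hypothetical)."""
    if "Observation:" in prompt:
        return "Final Answer: 4"
    return "Action: calculator\nAction Input: 2 + 2"

def parse(text: str):
    """Output parser: turn raw model text into (action, action_input)."""
    final = re.search(r"Final Answer:\s*(.*)", text)
    if final:
        return "finish", final.group(1).strip()
    match = re.search(r"Action:\s*(.*?)\nAction Input:\s*(.*)", text)
    if match is None:
        raise ValueError(f"could not parse model output: {text!r}")
    return match.group(1).strip(), match.group(2).strip()

def calculator(expression: str) -> str:
    """Toy tool that only handles 'a + b'."""
    left, right = expression.split("+")
    return str(int(left) + int(right))

def run_agent(question, tools, llm, max_steps=5):
    """Executor: call the agent, run the chosen tool, feed the output back."""
    scratchpad = ""
    for _ in range(max_steps):
        prompt = PROMPT.format(
            question=question, tool_names=", ".join(tools), scratchpad=scratchpad
        )
        action, action_input = parse(llm(prompt))
        if action == "finish":
            return action_input
        observation = tools[action](action_input)
        scratchpad += (
            f"Action: {action}\nAction Input: {action_input}\n"
            f"Observation: {observation}\n"
        )
    raise RuntimeError("agent did not finish within max_steps")

answer = run_agent("What is 2 + 2?", {"calculator": calculator}, fake_llm)
print(answer)
```

The `max_steps` cap is there because a real model can loop forever on a tool it keeps misusing.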

#### Plan-and-Execute Agents

Typically the language model is the planner and an action agent is the executor.
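A minimal sketch of that split, with scripted stand-ins for both parts. The `planner` and `executor` functions and the three plan steps are hypothetical; in a real setup the planner would be an LLM call and the executor would be an action agent with tools.

```python
def planner(question: str) -> list:
    """Stand-in for the planning LLM: returns the full plan up front."""
    return [
        "find the table that stores department heads",
        "write a SQL query that counts heads older than 56",
        "run the query and report the count",
    ]

def executor(step: str, previous_results: list) -> str:
    """Stand-in for an action agent that carries out a single step."""
    return f"done: {step}"

def plan_and_execute(question, planner, executor):
    """Decide the whole sequence first, then execute it without re-planning."""
    plan = planner(question)
    results = []
    for step in plan:
        results.append(executor(step, results))
    return plan, results

plan, results = plan_and_execute(
    "How many heads of the departments are older than 56?", planner, executor
)
for line in results:
    print(line)
```

Note that the plan is fixed once `planner` returns; nothing an executor reports can change the remaining steps, which is the key difference from the action-agent loop.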

#### Built-In Agent Observations

It appears the built-in model uses an action agent, so I'll try to tweak it. One option is to split the steps out into separate agents and try the combined approach, with a larger Plan & Execute agent triggering the action agents.

### Agent Imports

In [23]:
from langchain.agents import initialize_agent, AgentType, load_tools
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.prompts import PromptTemplate

### Setup Toolkit

I'll use the standard LangChain one.

In [20]:
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

### Create Prompt Template - Create SQL Query

### Create Prompt Template - Check the SQL Query

From here: https://canvasapp.com/blog/text-to-sql-in-production

In [25]:
prompt = PromptTemplate(
    input_variables=["question", "sql_dialect", "current_day", "sql"],
    template="""
        You are a helpful AI that verifies that a SQL query runs correctly. If the query does not run successfully, you iteratively update the query based on the error message.
        You follow the following procedure:
            1. Run the input SQL query using the QueryTool
            2. If this runs successfully, return immediately.
            3. If the SQL query fails, debug the query:
                3a. consider the error message
                3b. update the SQL query
                3c. try re-running
            4. Repeat step 3 until you have a valid result. Finish and exit with the updated SQL query.
        You are writing SQL for the {sql_dialect} dialect.
        The current day is {current_day} in case the user references the date
        The user's original question: {question}
        The SQL: {sql}
        Begin!
    """
)
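The procedure in that prompt boils down to a run-and-retry loop. Here is a rough sketch of it using `sqlite3` directly, outside any agent. The `check_sql` helper and the `fix_query` callable are hypothetical: `fix_query` stands in for the LLM's "debug the query" step and here only repairs one known mistake. I also cap the number of attempts, which the prompt itself does not do.

```python
import sqlite3

def check_sql(conn, sql, fix_query, max_attempts=3):
    """Run the query; on failure, ask fix_query for an update and try again."""
    for _ in range(max_attempts):
        try:
            conn.execute(sql).fetchall()   # step 1: run the query
            return sql                     # step 2: success, return immediately
        except sqlite3.Error as exc:       # step 3: debug using the error message
            sql = fix_query(sql, str(exc))
    raise RuntimeError(f"no valid query after {max_attempts} attempts")

# In-memory stand-in for the real database (assumed column types).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE head (head_id INTEGER, name TEXT, born_state TEXT, age REAL)")

def fix_query(sql: str, error: str) -> str:
    """Hypothetical stand-in for the LLM repair step."""
    return sql.replace("heads", "head")

# Deliberately wrong table name to trigger one repair round.
fixed = check_sql(conn, "SELECT COUNT(*) FROM heads WHERE age > 56", fix_query)
print(fixed)
```

Without the attempt cap, a repair step that never changes the query would loop forever, which seems like a real risk with a small model.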

In [27]:
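# NOTE: `sql` has to come from the query-generation step, which isn't written yet,
# so this cell raises a NameError until that step exists.
formatted_prompt = prompt.format(
    question=question,
    sql=sql,
    sql_dialect=db.dialect,  # dialect reported by the SQLDatabase wrapper
    current_day=pd.Timestamp.today().date().isoformat(),  # required by the template
)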

NameError: name 'sql' is not defined

In [8]:
# toolkit.get_tools() returns the SQL tools; the toolkit cell must be run first
agent = initialize_agent(toolkit.get_tools(), llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)

NameError: name 'toolkit' is not defined