# Prompt Templates

Now that my vector database is set up, I want to pull down the most likely schema and table and save the metadata as variables. I think the best way to proceed is to create a prompt template that accepts those variables and provides a consistent entry point to the model.

## Imports

In [1]:
import pandas as pd
import os
from dotenv import load_dotenv

from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings
from langchain import HuggingFaceHub

from langchain import SQLDatabase, SQLDatabaseChain
from langchain.chains import SQLDatabaseSequentialChain
from langchain.prompts.prompt import PromptTemplate

## Load Top Results from VectorDB

In [2]:
#setup embeddings using HuggingFace and the directory location
embeddings  = HuggingFaceEmbeddings()
persist_dir = '../data/processed/chromadb/schema-table-split'

# load from disk
vectordb = Chroma(persist_directory=persist_dir, embedding_function=embeddings)

In [3]:
#run prompt as query and get most likely results
query = "How many heads of the departments are older than 56?" #first prompt from the training data

result = vectordb.similarity_search(query, k=1)

In [4]:
result

[Document(page_content='Schema Table: department management head', metadata={'schema': 'department_management', 'table': 'head', 'columns': '["head_id", "name", "born_state", "age"]'})]

## Save Metadata to Variables

In [5]:
top_schema = result[0].metadata['schema']
top_table = result[0].metadata['table']
table_cols = result[0].metadata['columns']

print(top_schema, top_table, table_cols)

department_management head ["head_id", "name", "born_state", "age"]
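
Note that the `columns` metadata comes back as a JSON-encoded string rather than a Python list (the quotes and brackets in the output above are part of the string). If a real list is needed later, for example to describe the table in a prompt, a minimal sketch is to decode it with `json.loads`. The value below is copied from the output above, and `column_list` is a hypothetical name used only for illustration.

```python
import json

# value copied from the metadata output above; it is a JSON string, not a list
table_cols = '["head_id", "name", "born_state", "age"]'

# decode the string into a Python list of column names
column_list = json.loads(table_cols)

print(column_list)
print(", ".join(column_list))
```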


## Build SQL Agent Through HF Hub

The LangChain docs focus on either calling the OpenAI API or running a local model. I'm hoping I can mix both and access a Hugging Face model through an API without having to store it locally. I think I can do this through the Hugging Face Hub by combining the steps from different parts of the documentation:
- [Langchain - Hugging Face Hub](https://python.langchain.com/docs/modules/model_io/models/llms/integrations/huggingface_hub)
- [Langchain - SQL Agents](https://python.langchain.com/docs/modules/chains/popular/sqlite)

I do have the start of the locally stored method commented out below.

### Load API Token

In [6]:
load_dotenv()
hf_api_token = os.getenv('hf_token')

In [7]:
#add path to HF repo
repo_id = 'tiiuae/falcon-7b-instruct'

### Point to LLM Model

In [8]:
#establish llm model
llm = HuggingFaceHub(repo_id=repo_id, huggingfacehub_api_token=hf_api_token, model_kwargs={"temperature": 0.5, "max_length": 64})

In [9]:
help(HuggingFaceHub)

Help on class HuggingFaceHub in module langchain.llms.huggingface_hub:

class HuggingFaceHub(langchain.llms.base.LLM)
 |  HuggingFaceHub(*, cache: Optional[bool] = None, verbose: bool = None, callbacks: Union[List[langchain.callbacks.base.BaseCallbackHandler], langchain.callbacks.base.BaseCallbackManager, NoneType] = None, callback_manager: Optional[langchain.callbacks.base.BaseCallbackManager] = None, tags: Optional[List[str]] = None, client: Any = None, repo_id: str = 'gpt2', task: Optional[str] = None, model_kwargs: Optional[dict] = None, huggingfacehub_api_token: Optional[str] = None) -> None
 |  
 |  Wrapper around HuggingFaceHub  models.
 |  
 |  To use, you should have the ``huggingface_hub`` python package installed, and the
 |  environment variable ``HUGGINGFACEHUB_API_TOKEN`` set with your API token, or pass
 |  it as a named parameter to the constructor.
 |  
 |  Only supports `text-generation`, `text2text-generation` and `summarization` for now.
 |  
 |  Example:
 |      ..

### Setup Simple SQL Agent
Langchain has some awesome features, but I'll start with a simple chain that points to one table.

First, I need to use the output of my vector db query to point to the sqlite database we want to work with.

#### Point to sqlite db location

In [10]:
BASE_DIR = os.path.dirname(os.path.abspath('../data/processed/db/department_management.sqlite'))
db_path = os.path.join(BASE_DIR, "department_management.sqlite")

db_path

'/Users/brettly/Sboard/projects/text-to-sql/data/processed/db/department_management.sqlite'
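
The path above hardcodes the schema name. Since the goal is to drive this from the vector db result, here is a rough sketch of building the SQLite URI from the schema name instead. It assumes each schema is stored in its own file named `<schema>.sqlite` under the same directory, which holds for this example but is an assumption for the rest of the data. `DB_DIR` and `sqlite_uri_for` are hypothetical names.

```python
from pathlib import Path

# assumption: every schema lives in its own file named <schema>.sqlite here
DB_DIR = Path("../data/processed/db")


def sqlite_uri_for(schema: str, db_dir: Path = DB_DIR) -> str:
    """Return a SQLite URI pointing at the database file for a schema."""
    # resolve() turns the relative path into an absolute one
    db_file = (db_dir / f"{schema}.sqlite").resolve()
    return "sqlite:///" + str(db_file)


# value returned by the vector db lookup earlier in the notebook
top_schema = "department_management"
print(sqlite_uri_for(top_schema))
```

The resulting string has the same shape as the `"sqlite:///" + db_path` value used in the next step.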

#### Establish db

In [11]:
db = SQLDatabase.from_uri("sqlite:///" + db_path)

#### Create Prompt Template
I'll even specify the single table here to really spoon-feed the model.

In [46]:
sql_agent_prompt_template = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return and describe the answer.
Use the following format:

Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result of the SQLQuery
Answer: Final answer here

Only use the following tables:
{table_info}

Question: {input}"""

In [47]:
sql_prompt = PromptTemplate(
    input_variables=["input", "table_info", "dialect"], template=sql_agent_prompt_template)
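
To preview what the model will receive, the template can be rendered with plain `str.format`, which for a simple template like this one should give the same text as the prompt template's own `format` method. The sketch below uses a shortened copy of the template and builds a `table_info` string from the vector db metadata. The "table (columns)" layout is my own assumption about a helpful format, not something LangChain requires.

```python
import json

# shortened copy of the template above, for illustration only
template = """Only use the following tables:
{table_info}

Question: {input}"""

# metadata values copied from the vector db result earlier in the notebook
top_table = "head"
table_cols = '["head_id", "name", "born_state", "age"]'

# hypothetical table description: table name followed by its columns
table_info = f"{top_table} ({', '.join(json.loads(table_cols))})"

rendered = template.format(
    table_info=table_info,
    input="How many heads of the departments are older than 56?",
)
print(rendered)
```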

#### Setup db_chain

In [48]:
db_chain = SQLDatabaseChain.from_llm(llm, db, prompt=sql_prompt, verbose=True)

#### Test with our simple question

In [49]:
db_chain.run(sql_prompt.format(input="How many heads of the departments are older than 56?", table_info="head", dialect="sqlite"))



> Entering new chain...
Given an input question, first create a syntactically correct sqlite query to run, then look at the results of the query and return and describe the answer.
Use the following format:

Question: Question here
SQLQuery: SQL Query to run
SQLResult: Result of the SQLQuery
Answer: Final answer here

Only use the following tables:
head

Question: How many heads of the departments are older than 56?
SQLQuery:SELECT COUNT(*) FROM head WHERE age > 56
SQLResult: [(5,)]
Answer:5 heads of the departments are older than 56.

Question:
> Finished chain.


'5 heads of the departments are older than 56.\n\nQuestion:'
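
The answer is right, but the model keeps going and starts a new `Question:` line, which ends up in the returned string. One simple way to handle this is to trim the output after the fact. This is only a sketch of post-processing on the string; `clean_answer` is a hypothetical helper, not part of LangChain.

```python
def clean_answer(raw: str) -> str:
    """Keep only the text before the first stray 'Question:' marker."""
    # split once on the marker and drop surrounding whitespace
    return raw.split("\nQuestion:", 1)[0].strip()


# value copied from the chain output above
raw_output = '5 heads of the departments are older than 56.\n\nQuestion:'
print(clean_answer(raw_output))
```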