<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
        Building Text-to-Teradata SQL Agents with LangChain and Amazon Bedrock
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style="font-size: 18px; font-family: Arial; color: #00233C;"><b>Introduction</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    In 2023, we witnessed many examples of text-to-SQL generative AI prototypes, showcasing the potential for business teams to interact with their databases using natural language. However, scaling these generative AI applications for large lakehouse and warehouse environments presents privacy and security challenges, operational inefficiencies, and financial constraints. Today, we can combine Amazon Bedrock, a fully managed AWS service that offers a choice of high-performing foundation models (FMs); LangChain’s flexible open-source framework; and Teradata Vantage™ analytic warehouse or lakehouse environments to effectively mitigate these challenges.  </p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    Let’s take a closer look at each of these considerations: 
</p>
<ul style="font-size: 16px; font-family: Arial; color: #00233C; list-style-type: none; padding: 0;">
    <li>
        <b>Privacy and Security Challenges:</b>
        <p style="margin-left: 20px;">
           For LLMs (Large Language Models) to create optimal SQL queries, they need access to specific datasets, knowledge bases, or catalogs. This can be problematic when dealing with highly sensitive data that can't be shared with model providers, potentially discouraging further development. 
        </p>
        <div style="padding: 10px; margin-left: 40px; border-left: 4px solid #00BFA5;">
            <b>Solution: Amazon Bedrock</b> is a managed platform that allows developers to select, manage, and customize large language models (LLMs) through a single API. Amazon Bedrock ensures that all proprietary data fed to the selected LLM is managed and protected by AWS, with no data shared with model providers or used to improve base models. This provides a secure way to develop generative AI applications without compromising sensitive data. For this example, we will build our GenAI text-to-Teradata SQL agent with Anthropic Claude 3 offered via Bedrock.
        </div>
    </li>
    <li>
        <b>Operational Challenges:</b>
<p style="margin-left: 20px;">
The deployment of GenAI apps requires considerable experimentation with a variety of LLMs to identify the most suitable model and configuration. This process can be time-consuming, complex, and costly.
</p>
        <div style="padding: 10px; margin-left: 40px; border-left: 4px solid #00BFA5;">
    <b>Solution: LangChain</b> provides an easy-to-use open-source framework with multiple SQL chains, SQL agents, and tools optimized for querying SQL databases, which speeds up the development process. When coupled with Amazon Bedrock's single API, which interfaces with a variety of chat models, LangChain accelerates the creation of POCs and the deployment of GenAI applications. These two tools together reduce the complexity and time involved in model selection and integration. With Bedrock, we can experiment with models from Anthropic, Cohere, Meta, Mistral AI, and Stability AI via one interface.
    <br><br>
    In addition to these two tools, Teradata Vantage facilitates the analysis of data stored externally in object storage. We can leverage the <code>teradataml</code> library and Vantage's native object storage integration to read data in Google Cloud, Azure Blob, and Amazon S3. This minimizes costly data transfer and allows developers to work within Vantage using foreign tables, without needing to transfer data into Vantage or use additional storage in their Vantage environment. 
</div>
    </li>
    <li>
        <b>Financial Challenges:</b>
        <p style="margin-left: 20px;">
            LLMs can generate inefficient queries that could result in high compute costs. 
        </p>
            <div style="padding: 10px; margin-left: 40px; border-left: 4px solid #00BFA5;">
            <b>Solution: Teradata Vantage.</b> To address these financial risks, we recommend setting thresholds and alerts on your providers. For example, Vantage Lake on Amazon offers financial governance features that make it easy to set automated alerts based on consumption thresholds. Another solution is decoupling the generation of the query from the execution, giving you the opportunity to review the query and ensure it is optimized before it is executed against the Teradata Vantage Database.
        </div>
    </li>
</ul>
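<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    The idea of decoupling query generation from execution can be sketched as a small review step that runs before any generated SQL reaches the database. This is a minimal illustration only: the function name <code>review_generated_sql</code> and the keyword deny-list are hypothetical, the checks are naive keyword matches rather than a real SQL parser, and a production setup would typically pair such a check with database-level permissions.
</p>

```python
import re
from typing import List

# Hypothetical deny-list for illustration; extend it to match your own policy.
FORBIDDEN_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER", "MERGE")


def review_generated_sql(sql: str) -> List[str]:
    """Return a list of problems found in a generated query (empty if none found)."""
    problems = []
    statement = sql.strip().rstrip(";").strip()
    # Allow only a single statement.
    if ";" in statement:
        problems.append("multiple statements")
    # Allow only read-only statements.
    if not re.match(r"(?i)^(SELECT|WITH)\b", statement):
        problems.append("not a SELECT statement")
    for keyword in FORBIDDEN_KEYWORDS:
        if re.search(rf"(?i)\b{keyword}\b", statement):
            problems.append(f"forbidden keyword: {keyword}")
    # LIMIT is not valid Teradata SQL; TOP should be used instead.
    if re.search(r"(?i)\bLIMIT\s+\d+", statement):
        problems.append("uses LIMIT instead of TOP")
    return problems


print(review_generated_sql("SELECT TOP 5 * FROM demo_user.retail_marketing"))
print(review_generated_sql("SELECT * FROM demo_user.retail_marketing LIMIT 5"))
print(review_generated_sql("DROP TABLE demo_user.retail_marketing"))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    A query would be executed only when the returned list is empty; anything else can be routed to a human reviewer or sent back to the LLM for another attempt.
</p>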
<p style="font-size: 20px; font-family: Arial; color: #00233C;"><b>Text-to-Teradata SQL Agent Architecture</b></p>

<div style="text-align: center;">
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    We will build a LangChain implementation of a text-to-SQL agent to generate and execute advanced SQL queries compatible with any LLM available via Amazon Bedrock. Users can ask business questions in natural English and receive answers with data drawn from the relevant databases. This agent will connect to a Vantage database to analyze data stored in Vantage as well as object storage, including Amazon S3, Google Cloud, and Azure Blob.</p>
    <img src="LangChain-and-Amazon-Bedrock_Jupyter-NB_ImgCover.jpg" alt="Architecture for Text-2-TeradataSQL Agents with LangChain and Amazon Bedrock" style="width: 90%; height: auto;">
</div>


<b style = 'font-size:20px;font-family:Arial;color:#00233c'>Prerequisites</b>
- Must have access to AWS Console and Amazon Bedrock
- Ensure you have requested and been granted access to the <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html">foundation model you want to use</a>. In this example, we are using Anthropic Claude 3 Sonnet. 

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>1. Configuring the environment</b>


<p style="font-size: 16px; font-family: Arial; color: #00233C;">
We begin by installing our dependencies: the AWS SDK for Python (<code>boto3</code>), which we use to interact with AWS services and connect to the Bedrock API; the LangChain libraries; and <code>teradataml</code>.
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
The <code>teradataml</code> library automatically installs <code>teradatasqlalchemy</code>, enabling us to connect to our database. We will also use the <code>teradataml</code> virtual DataFrame, which serves as a reference to database objects, allowing operations directly on Vantage without transferring entire datasets to the client, except when needed. For this demo, we will explore a dataset in Amazon S3 via a foreign table on Vantage.
</p>

In [None]:
%%capture
# '%%capture' suppresses the display of installation steps of the following packages

!pip install -r requirements.txt --quiet

In [None]:
%%capture
!pip install pyOpenSSL cryptography --force-reinstall --quiet

<div class="alert alert-block alert-info">
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b><i>The above statements will install the required libraries to run this demo. Be sure to restart the kernel after executing the above lines to bring the installed libraries into memory. The simplest way to restart the Kernel is by typing zero zero: <b> 0 0</b></i></p>
    </div>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.1 Import the required libraries and set up environment</b></p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
We import LangChain’s <code>SQLDatabase</code> class – a wrapper around the SQLAlchemy engine to facilitate interactions with databases using SQLAlchemy’s Python SQL toolkit and ORM capabilities.
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
We use the <code>create_sql_agent</code> function to build a SQL Agent given a language model and the database.
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    And finally, we import LangChain’s <code>ChatBedrock</code> class that serves as a common interface for using Bedrock's LLMs on AWS that support chat functionalities.
</p>


In [6]:
import os 
import getpass
import warnings
import boto3 

from langchain_community.utilities import SQLDatabase 
from langchain_community.agent_toolkits import create_sql_agent
 
from langchain_aws import ChatBedrock
warnings.filterwarnings("ignore")

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>2. Establish a Teradata Vantage Connection</b>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    To establish a connection to any Teradata Vantage edition, we use LangChain’s <code>SQLDatabase.from_uri</code> method to create an engine from the database URI.
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Here we use <a href="https://clearscape.teradata.com/" style="color: #00bfa5;">ClearScape Analytics Experience</a> to provision a free Teradata environment. You will need the <b>password</b> and <b>host URI</b> for the environment, both found in the connection details portion of the dashboard.
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
     Also note that <code>teradatasqlalchemy</code> was automatically installed with <code>teradataml</code> which enables SQLAlchemy to detect the teradatasql dialect when we invoke <code>db.dialect</code>.
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;"> The <code>db.get_usable_table_names()</code> method returns the names of the tables in the connected database, which we then print. For this demonstration, we're using our default <code>demo_user</code> database. If there are no tables in your database, this method returns an empty list.
</p>

In [9]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

Performing setup ...
Setup complete



Enter password:  ·········


... Logon successful
Connected as: teradatasql://demo_user:xxxxx@host.docker.internal/dbc
Engine(teradatasql://demo_user:***@host.docker.internal)


In [14]:
# teradata lib
from teradataml import *
db_drop_table(table_name="retail_marketing", schema_name="demo_user")

True

In [15]:
# Set up Teradata connection
password_db=getpass.getpass(prompt='Enter your ClearScape Analytics Environment password: ')

connection_string = f"teradatasql://demo_user:{password_db}@host.docker.internal/demo_user"
db = SQLDatabase.from_uri(connection_string)

print(db.dialect)
print(db.get_usable_table_names())

Enter your ClearScape Analytics Environment password:  ·········


teradatasql
['ml__td_sqlmr_out__1725880256709157', 'ml__td_sqlmr_out__1725880462844251']
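
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    One caveat with building the connection string through an f-string: a password containing characters such as <code>@</code> or <code>/</code> can break URI parsing. A common SQLAlchemy practice is to percent-encode credentials first. The sketch below assumes this applies to the <code>teradatasql</code> dialect as it does to other SQLAlchemy URLs; the helper name <code>build_teradata_uri</code> and the sample password are placeholders for illustration.
</p>

```python
from urllib.parse import quote_plus


def build_teradata_uri(user: str, password: str, host: str, database: str) -> str:
    """Build a teradatasql:// URI, percent-encoding the credentials."""
    return f"teradatasql://{quote_plus(user)}:{quote_plus(password)}@{host}/{database}"


# Placeholder password for illustration only; never hard-code real credentials.
uri = build_teradata_uri("demo_user", "p@ss/word", "host.docker.internal", "demo_user")
print(uri)
```

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    The resulting string could then be passed to <code>SQLDatabase.from_uri</code> in place of the hand-built <code>connection_string</code>.
</p>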


<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>2.1 Getting Data for This Demo</b></p>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    When working with Teradata Vantage for data analysis, you have two options:
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C; margin-left: 20px;">
    <b>Analyze data stored externally in object storage.</b> This method uses native object storage integration to create <code>foreign tables</code> inside the database; point this virtual table to an external object storage location like Google Cloud, Azure Blob, and Amazon S3; and use SQL to analyze the data. The advantage here is that it minimizes data transfer and allows you to work within Vantage using foreign tables, without needing additional storage in your Vantage environment.
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C; margin-left: 20px;">
    <b>Download data into your local Vantage environment.</b> Alternatively, you can use native object storage integration to ingest data at scale into Vantage using one SQL request. Downloading data can result in faster execution of some steps that perform the initial access to the source data.
</p>
<p style="font-size: 16px; font-family: Arial; color: #00233C; margin-left: 20px;">
Let’s explore the data where it resides in Amazon S3 by creating a foreign table inside our <code> demo_user</code> with the following code:  
</p>

In [16]:
load_data = """ CREATE FOREIGN TABLE demo_user.retail_marketing 
USING ( 
location('/s3/dev-rel-demos.s3.amazonaws.com/bedrock-demo/Retail_Marketing.csv') 
); """ 

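try:
    db.run(load_data)
    print('Table Loaded')
except Exception as e:
    # Report the actual database error (the original referenced an undefined name)
    print(f'Table load failed: {e}')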


Table Loaded
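
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    If you want to point foreign tables at other objects, the statement above can be generated from its three moving parts: schema, table name, and object storage location. The following is a small sketch under that assumption; <code>foreign_table_ddl</code> is a hypothetical helper, and its identifier check is deliberately strict rather than a complete implementation of Teradata naming rules.
</p>

```python
import re

# Conservative identifier pattern used only for this illustration.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def foreign_table_ddl(schema: str, table: str, location: str) -> str:
    """Return a CREATE FOREIGN TABLE statement pointing at an object storage location."""
    for name in (schema, table):
        if not IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    return (
        f"CREATE FOREIGN TABLE {schema}.{table}\n"
        f"USING (\n"
        f"    LOCATION('{location}')\n"
        f");"
    )


ddl = foreign_table_ddl(
    "demo_user",
    "retail_marketing",
    "/s3/dev-rel-demos.s3.amazonaws.com/bedrock-demo/Retail_Marketing.csv",
)
print(ddl)
```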


<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    After loading data, refresh the connection to ensure all changes are recognized and confirm that your data has loaded properly. When you call the following methods, you will see the database dialect as <code>teradatasql</code>, a list of usable table names, and a dictionary with detailed context information about the tables in the database, including the schema and sample data of the <code>retail_marketing</code> table:
</p>


In [28]:
db = SQLDatabase.from_uri(connection_string)

print(db.dialect) 
print(db.get_usable_table_names()) 
print(db.get_context()) 

teradatasql
['ml__td_sqlmr_out__1725880256709157', 'ml__td_sqlmr_out__1725880462844251', 'retail_marketing']
{'table_info': '\nCREATE TABLE ml__td_sqlmr_out__1725880256709157 (\n\ttd_clusterid_kmeans BIGINT NOT NULL, \n\tspam FLOAT NOT NULL, \n\tembeddings_0 FLOAT NOT NULL, \n\tembeddings_1 FLOAT NOT NULL, \n\tembeddings_2 FLOAT NOT NULL, \n\tembeddings_3 FLOAT NOT NULL, \n\tembeddings_4 FLOAT NOT NULL, \n\tembeddings_5 FLOAT NOT NULL, \n\tembeddings_6 FLOAT NOT NULL, \n\tembeddings_7 FLOAT NOT NULL, \n\tembeddings_8 FLOAT NOT NULL, \n\tembeddings_9 FLOAT NOT NULL, \n\tembeddings_10 FLOAT NOT NULL, \n\tembeddings_11 FLOAT NOT NULL, \n\tembeddings_12 FLOAT NOT NULL, \n\tembeddings_13 FLOAT NOT NULL, \n\tembeddings_14 FLOAT NOT NULL, \n\tembeddings_15 FLOAT NOT NULL, \n\tembeddings_16 FLOAT NOT NULL, \n\tembeddings_17 FLOAT NOT NULL, \n\tembeddings_18 FLOAT NOT NULL, \n\tembeddings_19 FLOAT NOT NULL, \n\tembeddings_20 FLOAT NOT NULL, \n\tembeddings_21 FLOAT NOT NULL, \n\tembeddings_22 FL

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>3. Set up the Bedrock client</b>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Next, initialize the <code>bedrock-runtime</code> client. Because we are executing this demo outside of SageMaker, we will pass in our <a href="https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html">AWS credentials</a>.
</p>

In [18]:
# If executing from SageMaker, you do not need to define aws_access_key_id, aws_secret_access_key, or aws_session_token. Simply define the client as 'bedrock-runtime'.
# boto3_bedrock = boto3.client('bedrock-runtime')

boto3_bedrock = boto3.client(
    service_name="bedrock-runtime",
    region_name=getpass.getpass(prompt='Enter your AWS region: '),
    aws_access_key_id=getpass.getpass(prompt='Enter your AWS Access Key ID: '),
    aws_secret_access_key=getpass.getpass(prompt='Enter your AWS Secret Access Key: '),
    aws_session_token=getpass.getpass(prompt='Enter your AWS Session Token: ')
)

Enter your AWS region:  ·········
Enter your AWS Access Key ID:  ····················
Enter your AWS Secret Access Key:  ········································
Enter your AWS Session Token:  ····························································································································································································································································································································································································································································································································································································································································································


<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>3.1 Define LLM</b></p>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
     First ensure you have requested and been granted access to the <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html">foundation model you want to use</a>. In this example, we are using Anthropic Claude 3 Sonnet. 
</p>   
<img src="bedrock-claude-3.png" alt="Bedrock Claude 3" style="width: 90%; height: auto;">
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Define the LLM using the <code>ChatBedrock</code> interface. When defining <code>ChatBedrock</code>, set the <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html#model-ids-arns">Amazon Bedrock base model ID</a>, the client as <code>boto3_bedrock</code>, and the common inference parameters.
</p> 
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    We use the optional parameter <b>temperature</b> to make our Teradata SQL outputs more predictable.
</p>

<div style="margin-left: 16px; font-size: 16px; font-family: Arial; color: #00233C;">
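    <b>- Temperature:</b> controls how much randomness the model applies when choosing the next token, and therefore how creative our results will be. For Anthropic Claude models on Amazon Bedrock it ranges from 0 to 1; other model providers may allow different ranges. Setting it to 0.1 ensures the model favors higher-probability (more predictable) words, resulting in more consistent and less varied outputs.<br>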
</div>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    For a complete list of optional parameters for base models provided by Amazon Bedrock, visit the <a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html"> AWS docs</a>.
</p>
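
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    To build intuition for why a low temperature makes outputs more predictable, the sketch below applies temperature scaling to a softmax over made-up token scores. It illustrates the general technique only and does not claim to reproduce how any specific Bedrock model samples tokens; the scores and the function name are assumptions for illustration.
</p>

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    At the lower temperature almost all probability mass lands on the highest-scoring token, which is the behavior we want when generating SQL.
</p>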

In [19]:

# Define the LLM
llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    client=boto3_bedrock,
    model_kwargs={
        "temperature": 0.1,
    }
)

<hr style='height:2px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>3.2 Create and Invoke the SQL agent</b></p>    


<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    With the connection to Teradata Vantage established and our database (<code>db</code>) and large language model (<code>llm</code>) defined, we are ready to create and invoke our SQL Agent using the <code>create_sql_agent()</code> function. 
    </p>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    We pass in our <code>llm</code> and <code>db</code> as required parameters and set <code>agent_type</code> to "zero-shot-react-description" to instruct the agent to perform a reasoning step before acting.  
    </p>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    We set <code>verbose</code> to <code>True</code> so that the agent prints detailed information about its intermediate steps. Additionally, we set <code>handle_parsing_errors</code> to <code>True</code>, so that parsing errors are sent back to the LLM as observations, giving it a chance to correct them.
    </p>


In [20]:
agent=create_sql_agent(
    llm=llm,
    db=db,
    agent_type="zero-shot-react-description",
    verbose=True,
    handle_parsing_errors=True
)

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>4. Set optional observability with LangSmith</b>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    We add optional additional observability with LangSmith. You can create a <a href="https://docs.smith.langchain.com/#2-create-an-api-key"> LangSmith free account with limited traces here.</a>
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
     To enhance observability in our application, we can configure environment variables for LangSmith. LangSmith comes pre-installed with the LangChain library, so all you need to do is generate a LangSmith API key and set up the following environment variables to start monitoring your applications. Uncomment the LangSmith variables if you want to use the LangSmith tracing functionalities to log and view executions of your LLM application. 
</p>

In [None]:
#os.environ["LANGSMITH_API_KEY"] = getpass.getpass() 
#os.environ["LANGSMITH_TRACING_V2"] = "true"
#os.environ["LANGSMITH_PROJECT"] = "genaibedrock"

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>5. Invoke the SQL Agent to explore the data</b>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Invoke the agent with an exploratory command or question and observe each step. For example, you can ask it to describe the retail marketing table.
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    As soon as we invoke the agent, it begins a sequence of thoughts and actions. Notice how similar this behavior is to the way a data scientist approaches a database.
</p>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
Data scientists and data analysts usually start with a trial query to understand the table schema and look at the first few rows. This exploration helps them construct queries. If errors occur, they edit their queries until they succeed. Similarly, LangChain agents ground the LLM in real data by describing the database, including its table structure, a few sample rows, and sample SELECT queries. This reduces hallucinations and leads to more reliable and accurate SQL query generation.  
</p>
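
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    The grounding step described above can be illustrated without a Vantage connection. The sketch below uses Python's built-in <code>sqlite3</code> module as a stand-in database and assembles a text block containing a table's DDL plus a few sample rows, similar in spirit to the table information an agent receives. The function <code>describe_table</code> and the sample data are invented for illustration and are not LangChain's implementation.
</p>

```python
import sqlite3


def describe_table(conn, table, sample_rows=3):
    """Return the table's DDL plus a few sample rows as one text block."""
    ddl = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()[0]
    # SQLite uses LIMIT here; the equivalent Teradata query would use SELECT TOP n.
    cursor = conn.execute(f'SELECT * FROM "{table}" LIMIT {int(sample_rows)}')
    header = "\t".join(col[0] for col in cursor.description)
    rows = "\n".join("\t".join(str(v) for v in row) for row in cursor.fetchall())
    return f"{ddl}\n\n/*\n{sample_rows} rows from {table} table:\n{header}\n{rows}\n*/"


# In-memory stand-in with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retail_marketing (customer_id INTEGER, city TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO retail_marketing VALUES (?, ?, ?)",
    [(1, "Phoenix", 34), (2, "Denver", 51), (3, "Austin", 28), (4, "Phoenix", 45)],
)
context = describe_table(conn, "retail_marketing")
print(context)
```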

In [21]:
response = agent.invoke("Describe the retail marketing table") 



[1m> Entering new SQL Agent Executor chain...[0m
[32;1m[1;3mAction: sql_db_list_tables
Action Input: 
[0m[38;5;200m[1;3mml__td_sqlmr_out__1725880256709157, ml__td_sqlmr_out__1725880462844251, retail_marketing[0m[32;1m[1;3mThought: The retail_marketing table seems relevant to the question, so I should query its schema to see what columns it contains.
Action: sql_db_schema
Action Input: retail_marketing
[0m[33;1m[1;3m
CREATE TABLE retail_marketing (
	"Location" VARCHAR(2048) CHAR SET UNICODE NOT NULL, 
	customer_id SMALLINT NOT NULL, 
	age BYTEINT NOT NULL, 
	profession VARCHAR(20) CHAR SET LATIN NOT NULL, 
	marital VARCHAR(10) CHAR SET LATIN NOT NULL, 
	education VARCHAR(11) CHAR SET LATIN NOT NULL, 
	city VARCHAR(18) CHAR SET LATIN NOT NULL, 
	monthly_income_in_thousand BYTEINT NOT NULL, 
	family_members BYTEINT NOT NULL, 
	communication_type VARCHAR(11) CHAR SET LATIN NOT NULL, 
	last_contact_day BYTEINT NOT NULL, 
	last_contact_month CHAR(3) CHAR SET LATIN NOT NULL, 
	

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
If we inspect our application using LangSmith we can see all inputs and outputs in every step of the chain. You can view a public run of this example question
 <a href="https://smith.langchain.com/public/cc66a546-acf3-4e51-8044-cc1c194ed386/r">here</a>.  Note that the first call to Anthropic Claude 3 is the third step in our chain. In this third step, the input includes additional instructions for the agent to use supplementary tools such as sql_db_query, sql_db_schema, sql_db_list_tables, and sql_db_query_checker, along with a specified format for thought and action sequences. This SQLDatabase agent has been engineered with prompts to emulate the approach taken by data scientists when exploring and analyzing data. </p>
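
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    To make the thought and action format more concrete, here is a simplified sketch of how a single model reply in that format could be parsed. It is a stand-in for illustration, not LangChain's actual output parser; the function name <code>parse_step</code> and the error message are assumptions. The error branch mirrors the idea behind <code>handle_parsing_errors</code>: the problem is returned as text that can be fed back to the LLM as an observation.
</p>

```python
import re

ACTION_PATTERN = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", re.DOTALL)


def parse_step(text):
    """Parse one reply into ('final', answer), ('action', tool, input), or ('error', message)."""
    if "Final Answer:" in text:
        return ("final", text.split("Final Answer:", 1)[1].strip())
    match = ACTION_PATTERN.search(text)
    if match is None:
        # Returned instead of raised, so it can be shown to the LLM as an observation.
        return ("error", "Invalid format: expected 'Action:' followed by 'Action Input:'")
    return ("action", match.group(1).strip(), match.group(2).strip())


reply = (
    "Thought: I should look at the schema.\n"
    "Action: sql_db_schema\n"
    "Action Input: retail_marketing"
)
print(parse_step(reply))
print(parse_step("Thought: I now know the final answer\nFinal Answer: May"))
print(parse_step("The answer is probably May."))
```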


<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>6. Optimizing our SQL Agent with Prompt Engineering</b>


<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>6.1 Invoking the SQL Agent with a complex query</b></p>    


<p style="font-size: 16px; font-family: Arial; color: #00233C;">
   Now let's ask our agent a slightly more involved question. You may notice that the queries generated by the LLM sometimes return syntax errors. The LLM can often recover from these errors and still provide the correct answer. In some cases, though, the LLM arrives at the answer but the agent is stumped by a parsing error.
 
</p>

In [24]:
response = agent.invoke("What is the month with the highest number of marketing engagements?")



[1m> Entering new SQL Agent Executor chain...[0m
[32;1m[1;3mAction: sql_db_list_tables
Action Input: 
[0m[38;5;200m[1;3mml__td_sqlmr_out__1725880256709157, ml__td_sqlmr_out__1725880462844251, retail_marketing[0m[32;1m[1;3mThought: The retail_marketing table seems most relevant for this question about marketing engagements. I should query the schema for that table.
Action: sql_db_schema
Action Input: retail_marketing
[0m[33;1m[1;3m
CREATE TABLE retail_marketing (
	"Location" VARCHAR(2048) CHAR SET UNICODE NOT NULL, 
	customer_id SMALLINT NOT NULL, 
	age BYTEINT NOT NULL, 
	profession VARCHAR(20) CHAR SET LATIN NOT NULL, 
	marital VARCHAR(10) CHAR SET LATIN NOT NULL, 
	education VARCHAR(11) CHAR SET LATIN NOT NULL, 
	city VARCHAR(18) CHAR SET LATIN NOT NULL, 
	monthly_income_in_thousand BYTEINT NOT NULL, 
	family_members BYTEINT NOT NULL, 
	communication_type VARCHAR(11) CHAR SET LATIN NOT NULL, 
	last_contact_day BYTEINT NOT NULL, 
	last_contact_month CHAR(3) CHAR SET LAT

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
The agent can answer various questions about our data using only table names, schemas, sample rows, and the LLM’s knowledge of Teradata SQL. Although it can recover from mistakes as noted in the previous query, there's significant room for improvement. The last query resulted in our agent using 23,913 tokens and taking 49.56 seconds to provide the correct answer. You can view a public view of this execution via this <a href="https://smith.langchain.com/public/878e5908-abaa-49fc-8157-781a44adc0a5/r" >LangSmith trace </a>. </p>
<hr style='height:2px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>6.2 Adding a custom prompt to the SQL Agent </b></p>    

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
 We can optimize the agent's performance with additional prompt engineering. 
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
We import the <code>ChatPromptTemplate</code> class to build flexible, reusable prompts for our agent. Here we define a prefix, format instructions, and a suffix, and join them to create a custom prompt. The prefix contains rules that are specific to Teradata SQL. The format instructions guide the agent's Question, Thought, Action, and Observation behavior, and the suffix cues it to begin. 
</p>


In [25]:
from langchain_core.prompts import ChatPromptTemplate    

prefix = """You are a helpful and expert TeradataSQL database admin. TeradataSQL shares many similarities to SQL, with a few key differences.
Given an input question, first create a syntactically correct TeradataSQL query to run, then look at the results of the query and return the answer.

IMPORTANT: Unless the user specifies an exact number of rows they wish to obtain, you must always limit your query to at most {top_k} results by using "SELECT TOP {top_k}".

The following keywords do not exist in TeradataSQL: 
1. LIMIT 
2. FETCH
3. FIRST
Instead of LIMIT or FETCH, use the TOP keyword. The TOP keyword should immediately follow a "SELECT" statement.
For example, to select the top 3 results, use "SELECT TOP 3 * FROM <table_name>"
Enclose all value identifiers in quotes to prevent errors from restricted keywords. Append an underscore to all alias keywords (e.g., AS count_).
Always use double quotation marks (" ") for column names in SQL queries to avoid syntax errors.
DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP, etc.) to the database. 
If the question does not seem related to the database, just return "I don't know" as the answer

You have access to the following tools:"""

format_instructions = """You must always use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Don't forget to prefix your final answer with the string, "Final Answer:"!"""

suffix = """Begin!

Question: {input}
Thought:{agent_scratchpad}"""

custom_prompt = ChatPromptTemplate.from_template("\n\n".join([
    prefix,
    "{tools}",
    format_instructions,
    suffix,
]))


agent=create_sql_agent(
    llm=llm,
    db=db,
    agent_type="zero-shot-react-description",
    verbose=True,
    handle_parsing_errors=True,
    prompt=custom_prompt
)
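
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Because the custom prompt is assembled from several strings, it is easy to drop or misspell a placeholder. The sketch below uses only the standard library to list the placeholders in a joined template so they can be checked before the prompt is handed to the agent. The shortened template parts are stand-ins for the real prefix, format instructions, and suffix, and <code>template_variables</code> is a hypothetical helper.
</p>

```python
from string import Formatter


def template_variables(template):
    """Return the sorted placeholder names found in a format-style template."""
    return sorted({name for _, name, _, _ in Formatter().parse(template) if name})


# Shortened stand-ins for the prefix, tools slot, format instructions, and suffix.
parts = [
    'Limit results with "SELECT TOP {top_k}".',
    "{tools}",
    "Action: the action to take, should be one of [{tool_names}]",
    "Question: {input}\nThought:{agent_scratchpad}",
]
template = "\n\n".join(parts)
print(template_variables(template))
```

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Note that any literal curly brace in the prompt text would need to be doubled, otherwise it is interpreted as a placeholder.
</p>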

<hr style='height:2px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>6.3 Test and Compare Results </b></p>    


<p style="font-size: 16px; font-family: Arial; color: #00233C;">
To test and compare our results let's invoke the agent using the same question as the previous step. </p>

In [26]:
response = agent.invoke("What is the month with the highest number of marketing engagements?") 



[1m> Entering new SQL Agent Executor chain...[0m
[32;1m[1;3mThought: To find the month with the highest number of marketing engagements, I will need to query a table that contains marketing engagement data with a date or month column. I should first check what tables are available in the database.

Action: sql_db_list_tables
Action Input: 
[0m[38;5;200m[1;3mml__td_sqlmr_out__1725880256709157, ml__td_sqlmr_out__1725880462844251, retail_marketing[0m[32;1m[1;3mThought: The "retail_marketing" table seems most relevant for this query. I should check the schema of that table to see if it contains columns for marketing engagements and dates/months.

Action: sql_db_schema
Action Input: retail_marketing
[0m[33;1m[1;3m
CREATE TABLE retail_marketing (
	"Location" VARCHAR(2048) CHAR SET UNICODE NOT NULL, 
	customer_id SMALLINT NOT NULL, 
	age BYTEINT NOT NULL, 
	profession VARCHAR(20) CHAR SET LATIN NOT NULL, 
	marital VARCHAR(10) CHAR SET LATIN NOT NULL, 
	education VARCHAR(11) CHA

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
This time our response was returned within 19.30 seconds and required 5,692 tokens! Inspect this public run on <a href="https://smith.langchain.com/public/e5cff28b-9bfe-48ba-a990-b87b0c3e2e06/r">LangSmith</a>. This time the LLM did not generate the query using the restricted <code>LIMIT</code> keyword. This additional prompting enables it to produce the final answer with fewer iterations. 
Final answer: The month with the highest number of marketing engagements is May. </p>
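
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    To put the two runs side by side, the short calculation below compares the token counts and latencies reported in sections 6.1 and 6.3. The figures are the ones quoted in this notebook for single runs; your own numbers will vary between executions.
</p>

```python
baseline = {"tokens": 23_913, "seconds": 49.56}  # default prompt (section 6.1)
custom = {"tokens": 5_692, "seconds": 19.30}     # custom prompt (section 6.3)


def percent_reduction(before, after):
    """Percentage by which 'after' is smaller than 'before'."""
    return (before - after) / before * 100


token_cut = percent_reduction(baseline["tokens"], custom["tokens"])
time_cut = percent_reduction(baseline["seconds"], custom["seconds"])
speedup = baseline["seconds"] / custom["seconds"]

print(f"Tokens:  {token_cut:.1f}% fewer")
print(f"Latency: {time_cut:.1f}% lower ({speedup:.2f}x faster)")
```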

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>6.4 You can try your own question</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here are some sample questions that you can try out:</p>

<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>What is the average income for Phoenix?</li>
    <li>Which city has the highest average income?</li>
    <li>What is the average age of married people?</li>
    <li>Which profession has the most married people in Phoenix?</li>
    <li>What is the month with the lowest sales?</li>
    <li>What is the month with the highest number of marketing engagements?</li>
    <li>What is the payment method distribution?</li>
    <li>What is the average number of days between a customer's last contact and their next purchase?</li>
    <li>What is the relationship between marital status and purchase frequency?</li>
    <li>What is the most effective communication method for reaching customers who have not purchased from our company in the past 6 months?</li>
</ol>

In [27]:
try:
    question = input("\nEnter your natural language query: ")
    response = agent.invoke(question)
    print(f"Query: {question}\nResponse: {response}")
except Exception as e:
    print(f"An error occurred: {e}")



Enter your natural language query:  What is the average income for Phoenix?




[1m> Entering new SQL Agent Executor chain...[0m
[32;1m[1;3mThought: To find the average income for Phoenix, I will need to query a table that contains income data and has a column for city or location. I should first check what tables are available in the database.

Action: sql_db_list_tables
Action Input: 
[0m[38;5;200m[1;3mml__td_sqlmr_out__1725880256709157, ml__td_sqlmr_out__1725880462844251, retail_marketing[0m[32;1m[1;3mThought: The retail_marketing table seems like it may contain income and location data. I should check the schema for that table.

Action: sql_db_schema
Action Input: retail_marketing
[0m[33;1m[1;3m
CREATE TABLE retail_marketing (
	"Location" VARCHAR(2048) CHAR SET UNICODE NOT NULL, 
	customer_id SMALLINT NOT NULL, 
	age BYTEINT NOT NULL, 
	profession VARCHAR(20) CHAR SET LATIN NOT NULL, 
	marital VARCHAR(10) CHAR SET LATIN NOT NULL, 
	education VARCHAR(11) CHAR SET LATIN NOT NULL, 
	city VARCHAR(18) CHAR SET LATIN NOT NULL, 
	monthly_income_in_thou

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>7. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Clean up the work tables to prevent errors the next time the notebook runs.</p>

In [None]:
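try:
    db.run("""DROP TABLE demo_user.retail_marketing;""" )
    print('Table Dropped')
except Exception as e:
    # Report the actual database error (the original referenced an undefined name)
    print(f'Table drop failed: {e}')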

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2024. All Rights Reserved
        </div>
    </div>
</footer>