<header style="background-color: #00233C; color: #F0F0F0; padding: 20pt;">
    <h1 style="font-size: 36px; font-family: Arial; margin: 0;">
        Data Analyst AI Agent with LangChain and Google Gemini
    </h1>
    <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
</header>

<section style="margin-top: 20px;">
    <p style="font-size: 18px; font-family: Arial; color: #00233C;"><b>Introduction</b></p>
    <p style="font-size: 16px; font-family: Arial; color: #00233C;">
        Data scientists, data engineers, and SQL experts are frequently tasked with querying enterprise databases to answer critical business questions, such as “What was our churn this month?” or “Which product led sales in the Americas?” The answers often trigger follow-up questions, so engineers spend significant time responding to data requests.
        <br><br>
        By integrating Google Gemini, LangChain’s open-source framework, and the powerful capabilities of Teradata Vantage™, we can develop text-to-SQL autonomous agents that enable business teams to independently retrieve insights. Leveraging teradataml, teams can analyze data stored in Vantage and external object stores, such as Amazon S3, Google Cloud, Azure Blob, and OTF, without needing to ingest it. This reduces data transfer costs and offers flexible exploration using text-to-SQL generative AI applications.
    </p>
</section>

<section style="margin-top: 20px;">
    <p style="font-size: 18px; font-family: Arial; color: #00233C;"><b>Text-to-Teradata SQL Agent Architecture</b></p>
        <p style="font-size: 16px; font-family: Arial; color: #00233C;">
            We will build a LangChain text-to-SQL agent that generates and executes advanced SQL queries and can work with any LLM. Users can ask business questions in plain English and receive answers drawn from the relevant databases. The agent connects to a Vantage database to analyze data stored in Vantage and Amazon S3. You can add more foreign tables to read data from Google Cloud and Azure Blob.
        </p>
        <img src="SQLAgent.png" alt="Text-2-TeradataSQL Agents with LangChain and Google" style="width: 90%; height: auto; margin-top: 20px;"> Source: <a href="https://python.langchain.com/docs/tutorials/sql_qa/" target="_blank" style="color: #00233C;">LangChain SQL Agent Tutorial</a>

</section>

<hr style='height:2px;border:none;background-color:#00233C;'>

<b style = 'font-size:20px;font-family:Arial;color:#00233c'>1. Configuring the environment</b>

In [None]:
%%capture
# '%%capture' suppresses the display of installation steps of the following packages

!pip install -r requirements.txt --quiet
!pip install -U langsmith

In [None]:
import os 
import getpass
import warnings

from teradataml import create_context, execute_sql, DataFrame, in_schema
from langchain_google_genai import ChatGoogleGenerativeAI

from langchain_community.utilities import SQLDatabase 
from langchain_core.prompts import ChatPromptTemplate    
from langchain_community.agent_toolkits import create_sql_agent


<hr style="height: 1px; border: none; background-color: #00233C;">
<p style="font-size: 20px; font-family: Arial; color: #00233C;"><b>2. Connect to Vantage</b></p>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Here, we use the <a href="https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Package-for-Python-User-Guide-17.20/Context-to-Teradata-Vantage" target="_blank">create_context()</a> function to connect to the Vantage system using the teradatasql and teradatasqlalchemy DBAPI and dialect combination. This connection allows us to use the teradataml <a href="https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Package-for-Python-User-Guide-17.20/DataFrames-for-Tables-and-Views/DataFrames-from-Teradata-Vantage-Data-Sources/DataFrame-Constructor" target="_blank">DataFrame()</a> constructor to read foreign tables backed by object storage, in this example Amazon S3.
</p>


In [58]:
# Set up Teradata connection
password=getpass.getpass(prompt='Enter your ClearScape Analytics Environment password: ')

eng = create_context(host = 'host.docker.internal', username = 'demo_user', password = password)
print(eng)

Enter your ClearScape Analytics Environment password:  ········


Engine(teradatasql://demo_user:***@host.docker.internal)


In [45]:
%run -i ../run_procedure.py "call space_report();"

You have:  #databases=0 #tables=0 #views=0  You have used 0.8 MB of 30,678.3 MB available - 0.0%  ... Space Usage OK
 
   Database Name                  #tables  #views     Avail MB      Used MB
   demo_user                            0       0  30,678.3 MB       0.8 MB 


<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>2.1 Loading Data for This Demo</b></p>

In [46]:
#Load data onto Foreign table

load_data = """ CREATE FOREIGN TABLE demo_user.retail_marketing
USING ( 
location('/s3/dev-rel-demos.s3.amazonaws.com/bedrock-demo/Retail_Marketing.csv') 
); """ 

try:
    execute_sql(load_data)
    print('Table Loaded')
except Exception as e:
    if "already exists" in str(e): 
        print("Table already exists, skipping creation.")
    else:
        print(f"An error occurred: {e}")

Table Loaded


<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>3. Data Exploration</b>

In [47]:
df = DataFrame(in_schema('demo_user', 'retail_marketing'))
df.shape

(11162, 24)

In [48]:
df

Location,customer_id,age,profession,marital,education,city,monthly_income_in_thousand,family_members,communication_type,last_contact_day,last_contact_month,credit_card,num_of_cars,last_contact_duration,campaign,days_from_last_contact,prev_contacts_performed,payment_method,purchase_frequency,prev_campaign_outcome,gender,recency,purchased
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,8585,44,technician,married,secondary,Philadelphia,6,2,unknown,9,jun,1,3,252,campaign_3,-1,0,QRcodes,yearly,unknown,female,56,no
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,11868,37,management,married,tertiary,Phoenix,6,4,cellular,11,aug,1,3,549,campaign_1,-1,0,QRcodes,biweekly,unknown,male,99,no
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,9054,44,technician,married,secondary,San Diego,6,4,cellular,26,aug,0,2,59,campaign_6,-1,0,debit_card,quarterly,unknown,female,13,no
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,6036,41,self-employed,married,secondary,Chicago,5,2,cellular,17,nov,0,1,789,campaign_1,-1,0,QRcodes,yearly,unknown,female,56,no
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,9319,40,entrepreneur,married,secondary,Chicago,7,1,telephone,21,jul,1,1,50,campaign_2,-1,0,credit_card,quarterly,unknown,female,44,no
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,6505,32,management,single,tertiary,San Diego,5,2,cellular,22,aug,1,2,360,campaign_2,-1,0,ewallets,quarterly,unknown,female,96,yes
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,12072,41,technician,single,unknown,San Diego,5,2,unknown,13,may,1,3,93,campaign_2,-1,0,QRcodes,yearly,unknown,male,46,yes
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,3018,39,entrepreneur,married,secondary,San Diego,6,4,cellular,15,may,0,1,576,campaign_4,371,1,cash,monthly,failure,male,92,no
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,11603,32,services,single,secondary,Chicago,5,3,cellular,7,jul,1,1,201,campaign_1,-1,0,QRcodes,weekly,unknown,female,15,yes
/S3/s3.amazonaws.com/dev-rel-demos/bedrock-demo/Retail_Marketing.csv,5832,31,self-employed,married,tertiary,Dallas,5,2,cellular,21,nov,1,2,635,campaign_1,135,2,credit_card,monthly,failure,male,17,yes


<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:20px;font-family:Arial;color:#00233c'><b>4. Create engine using SQLAlchemy </b></p>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
 We pass the engine returned by <code>create_context()</code> to LangChain's SQLAlchemy wrapper <a href="https://api.python.langchain.com/en/latest/utilities/langchain_community.utilities.sql_database.SQLDatabase.html" target="_blank">SQLDatabase</a>. <br><br>The <code>db.get_usable_table_names()</code> method retrieves the names of the tables available in the connected database. For this demonstration, we're using our default <code>demo_user</code> database. If there are no tables in your database, this method returns an empty list.
</p>


In [49]:
# initialize the database connection
db = SQLDatabase(eng, schema='demo_user', view_support=True)
print(db.dialect)
print(db.get_usable_table_names())
# print(db.get_context())

teradatasql
['retail_marketing']


In [50]:
os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google API Key")

Enter your Google API Key ········································


<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.1 Define LLM</b></p>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Define the LLM using the <code>ChatGoogleGenerativeAI</code> class.
</p>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    We use the optional parameter <b>temperature</b> to make our Teradata SQL outputs more predictable.
</p>
<div style="margin-left: 16px; font-size: 16px; font-family: Arial; color: #00233C;">
 <b>Temperature:</b> Controls how creative the results will be. It can range from 0.0 to 2, depending on the model. Setting it to 0.1 makes the model favor higher-probability (more predictable) words, resulting in more consistent and less varied outputs.<br>
</div>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
For a complete list of optional parameters for base models provided by Gemini, visit the <a href=""> Google docs</a>.
</p>
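
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax in plain Python. It is an illustration only: the three scores are made-up values for hypothetical candidate tokens, and Gemini's actual sampling may combine temperature with other controls. The point is that dividing the scores by a small temperature concentrates almost all probability on the top candidate, while a large temperature spreads it out.
</p>

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.5]
low = softmax_with_temperature(logits, 0.1)   # near-deterministic
high = softmax_with_temperature(logits, 2.0)  # more varied

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    For SQL generation, the near-deterministic end of the range is usually preferable, because the same question should produce the same query.
</p>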

In [51]:
llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.1,
    max_tokens=None,
    timeout=None
    # other params...
)

<hr style="height: 2px; border: none; background-color: #00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5. Prepare a Custom Prompt to Guide the SQL Agent</b></p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    We imported the 
    <a href="https://python.langchain.com/docs/concepts/prompt_templates/" style="color: #00233C;">ChatPromptTemplate</a> 
    class from langchain_core to build flexible, reusable prompts for our agent. Here, we define a prefix, format instructions, and a suffix, and join them to create a custom prompt. The prefix contains rules that are specific to Teradata. The format instructions guide the agent's Question/Thought/Action/Observation loop, and the suffix cues it to begin.
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Tools are classes that an Agent uses to interact with the world. Each tool has a description, and the Agent uses the description to choose the right tool for the job:
</p>

<ul style="font-size: 16px; font-family: Arial; color: #00233C;">
    <li><b>sql_db_query</b> - The input to this tool is a detailed and correct SQL query, and the output is a result from the database. If the query is incorrect, an error message will be returned. If an error is returned, rewrite the query, check it, and try again. If you encounter an issue with <i>Unknown column 'xxxx' in 'field list'</i>, use <b>sql_db_schema</b> to query the correct table fields.</li>
    <li><b>sql_db_schema</b> - The input to this tool is a comma-separated list of tables, and the output is the schema and sample rows for those tables. Be sure the tables exist by calling <b>sql_db_list_tables</b> first! Example Input: table1, table2, table3.</li>
    <li><b>sql_db_list_tables</b> - The input is an empty string, and the output is a comma-separated list of tables in the database.</li>
    <li><b>sql_db_query_checker</b> - Use this tool to double-check if your query is correct before executing it. Always use this tool before executing a query with <b>sql_db_query</b>!</li>
</ul>
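
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    The idea of choosing a tool by name can be sketched without LangChain. The example below is a simplified stand-in, not LangChain's implementation: it uses an in-memory SQLite database in place of Vantage, the table rows are made up, and <code>call_tool</code> is a hypothetical helper. The query checker is left out because it requires an LLM call.
</p>

```python
import sqlite3

# In-memory SQLite stands in for Vantage here; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retail_marketing (city TEXT, monthly_income_in_thousand INTEGER)"
)
conn.executemany(
    "INSERT INTO retail_marketing VALUES (?, ?)",
    [("Phoenix", 6), ("Chicago", 5), ("Chicago", 7)],
)


def list_tables(_: str) -> str:
    """Ignore the input and return a comma-separated list of table names."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return ", ".join(name for (name,) in rows)


def table_schema(table_names: str) -> str:
    """Return the CREATE statement for each table in a comma-separated list."""
    wanted = [t.strip() for t in table_names.split(",")]
    placeholders = ", ".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT sql FROM sqlite_master WHERE name IN ({placeholders})", wanted
    ).fetchall()
    return "\n".join(sql for (sql,) in rows)


def run_query(query: str) -> str:
    """Run a query; return the rows, or the error text so the agent can retry."""
    try:
        return str(conn.execute(query).fetchall())
    except sqlite3.Error as exc:
        return f"Error: {exc}"


# Registry keyed by the tool names the agent sees in its prompt
TOOLS = {
    "sql_db_list_tables": list_tables,
    "sql_db_schema": table_schema,
    "sql_db_query": run_query,
}


def call_tool(name: str, tool_input: str) -> str:
    """Dispatch an Action / Action Input pair to the matching tool."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"{name} is not a valid tool, try one of {sorted(TOOLS)}"
    return tool(tool_input)


print(call_tool("sql_db_list_tables", ""))
print(call_tool("sql_db_query", "SELECT nope FROM retail_marketing"))
```

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Returning the error text instead of raising is what lets an agent read the failure as an observation and rewrite its query.
</p>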


In [52]:
prefix = """You are a helpful and expert TeradataSQL database admin. TeradataSQL shares many similarities with SQL, with a few key differences.
Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.

IMPORTANT: Unless the user specifies an exact number of rows they wish to obtain, you must always limit your query to at most {top_k} results by using "SELECT TOP {top_k}".

The following keywords do not exist in TeradataSQL: 
1. LIMIT 
2. FETCH
3. FIRST
Instead of LIMIT or FETCH, use the TOP keyword. The TOP keyword should immediately follow a "SELECT" statement.
For example, to select the top 3 results, use "SELECT TOP 3 * FROM <table_name>"
Enclose all value identifiers in quotes to prevent errors from restricted keywords. Append an underscore to all alias keywords (e.g., AS count_).
Always use double quotation marks (" ") for column names in SQL queries to avoid syntax errors.
DO NOT make any DML or DDL statements (INSERT, UPDATE, DELETE, DROP, etc.) to the database.
If the question does not seem related to the database, just return "I don't know" as the answer.

You have access to the following tools:"""

format_instructions = """You must always use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Don't forget to prefix your final answer with the string, "Final Answer:"!"""

suffix = """Begin!

Question: {input}
Thought:{agent_scratchpad}"""
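
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    The prompt rules above are instructions to the model, not guarantees. As an optional safeguard, a generated query can be checked against the same rules before it is run. The sketch below is an assumption-laden illustration: <code>check_generated_sql</code> and the <code>FORBIDDEN</code> list are hypothetical names that are not part of this notebook or of LangChain, and a keyword scan is not a SQL parser.
</p>

```python
import re

# Statement keywords the prompt forbids (assumed list; extend as needed)
FORBIDDEN = ("INSERT", "UPDATE", "DELETE", "DROP", "CREATE", "ALTER", "MERGE")


def check_generated_sql(sql: str) -> list[str]:
    """Return the prompt rules a generated query violates (empty list if none)."""
    problems = []
    stripped = sql.strip()
    first_word = stripped.split(None, 1)[0].upper() if stripped else ""
    if first_word in FORBIDDEN:
        problems.append(f"{first_word} statements are not allowed")
    if re.search(r"\bLIMIT\s+\d+", sql, flags=re.IGNORECASE):
        problems.append("use SELECT TOP n instead of LIMIT n")
    if re.search(r"\bFETCH\s+FIRST\b", sql, flags=re.IGNORECASE):
        problems.append("use SELECT TOP n instead of FETCH FIRST")
    return problems


print(check_generated_sql('SELECT TOP 3 "city" FROM demo_user.retail_marketing'))
print(check_generated_sql("SELECT * FROM demo_user.retail_marketing LIMIT 5"))
print(check_generated_sql("DROP TABLE demo_user.retail_marketing"))
```

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Because the check is a plain text scan, a string literal that happens to contain <code>LIMIT 5</code> would be flagged as well. Database-level permissions remain the reliable way to keep an agent read-only.
</p>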


In [53]:
custom_prompt = ChatPromptTemplate.from_template("\n\n".join([
    prefix,
    "{tools}",
    format_instructions,
    suffix,
]))
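
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    To see what the joined template looks like once its placeholders are filled, the sketch below renders a shortened version with plain <code>str.format</code>. The three strings are abbreviated stand-ins for the notebook's prefix, format instructions, and suffix, and every value passed to <code>format</code> is made up for illustration. In the real agent, LangChain supplies these values at run time.
</p>

```python
# Shortened stand-ins for the notebook's prefix, format_instructions and suffix
prefix_demo = (
    "You are a {dialect} expert. Return at most {top_k} rows.\n\n"
    "You have access to the following tools:"
)
format_demo = "Action: the action to take, should be one of [{tool_names}]"
suffix_demo = "Begin!\n\nQuestion: {input}\nThought:{agent_scratchpad}"

# Same joining pattern as the cell above, with "{tools}" between prefix and format
template = "\n\n".join([prefix_demo, "{tools}", format_demo, suffix_demo])

# Illustrative values only
rendered = template.format(
    dialect="teradatasql",
    top_k=10,
    tools="sql_db_query - runs a query\nsql_db_schema - shows table schemas",
    tool_names="sql_db_query, sql_db_schema",
    input="Which city has the highest average income?",
    agent_scratchpad="",
)
print(rendered)
```

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    The scratchpad is empty on the first step. On later steps it carries the earlier thoughts, actions, and observations, which is how the model sees its own progress.
</p>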

<hr style='height:2px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>6. Create the SQL agent</b></p>    

In [54]:
agent=create_sql_agent(
    llm=llm,
    db=db,
    agent_type="zero-shot-react-description",
    verbose=True,
    handle_parsing_errors=True,
    prompt=custom_prompt
)
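
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    The agent relies on the model replying in the Thought/Action/Action Input format defined earlier. The sketch below shows, in simplified form, how such a reply can be split into a tool call or a final answer, and what a parsing failure looks like. It is not LangChain's parser; <code>parse_react_step</code> is a hypothetical helper written for this illustration. The <code>handle_parsing_errors=True</code> option above is intended to let the agent recover from malformed replies instead of stopping.
</p>

```python
import re


def parse_react_step(text: str) -> dict:
    """Split one model reply into either a final answer or a tool call."""
    if "Final Answer:" in text:
        answer = text.split("Final Answer:", 1)[1].strip()
        return {"type": "finish", "output": answer}
    match = re.search(
        r"Action:\s*(.+?)\s*\n\s*Action Input:\s*(.*)", text, flags=re.DOTALL
    )
    if match is None:
        raise ValueError(f"Could not parse model output: {text!r}")
    return {
        "type": "action",
        "tool": match.group(1).strip(),
        "tool_input": match.group(2).strip(),
    }


step = parse_react_step(
    "Thought: I should check the schema.\n"
    "Action: sql_db_schema\n"
    "Action Input: retail_marketing"
)
print(step)

done = parse_react_step("Thought: I now know the final answer\nFinal Answer: Chicago")
print(done)
```

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    A reply that contains neither an action nor a final answer raises <code>ValueError</code> here. This is the kind of failure that the prompt's reminder to prefix the final answer with "Final Answer:" is meant to prevent.
</p>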

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>7. Set optional observability with LangSmith</b>
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    LangSmith adds optional observability. You can create a <a href="https://docs.smith.langchain.com/#2-create-an-api-key">free LangSmith account with limited traces here.</a>
</p>

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
     To enable it, generate a LangSmith API key and set the following environment variables; the LangSmith dependency was installed in step 1. Skip the next cell if you do not want LangSmith to log and trace executions of your LLM application.
</p>

In [55]:
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass() 
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "text-to-teradata-sql-gemini"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

 ···················································
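
<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    Since tracing is optional, one way to keep the cell runnable without a key is to turn tracing on only when a key is entered. The helper below is a sketch under that assumption; <code>configure_langsmith</code> is a hypothetical function, not part of LangSmith. It sets the same variables as the cell above and is demonstrated against a plain dictionary so the real environment is left untouched.
</p>

```python
import os


def configure_langsmith(api_key: str, project: str, env=os.environ) -> bool:
    """Enable tracing only when an API key is supplied; return whether it is on."""
    key = api_key.strip()
    if not key:
        # No key entered: leave tracing off rather than sending unauthenticated requests
        env["LANGCHAIN_TRACING_V2"] = "false"
        return False
    env["LANGCHAIN_API_KEY"] = key
    env["LANGCHAIN_TRACING_V2"] = "true"
    env["LANGCHAIN_PROJECT"] = project
    env["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
    return True


# Demo against plain dicts instead of os.environ; "example-key" is a placeholder
without_key = {}
with_key = {}
print(configure_langsmith("", "text-to-teradata-sql-gemini", env=without_key))
print(configure_langsmith("example-key", "text-to-teradata-sql-gemini", env=with_key))
```

<p style="font-size: 16px; font-family: Arial; color: #00233C;">
    In the notebook, the first argument would come from <code>getpass.getpass()</code> and the <code>env</code> argument would be left at its default.
</p>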


<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>8. Invoke the SQL Agent to explore the data</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here are some sample questions that you can try out:</p>

<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>What is the average income for Phoenix?</li>
    <li>Which city has the highest average income?</li>
    <li>What is the average age of married people?</li>
    <li>Which profession has the most married people in Phoenix?</li>
    <li>What is the month with the lowest sales?</li>
    <li>What is the month with the highest number of marketing engagements?</li>
    <li>What is the payment method distribution?</li>
    <li>What is the average number of days between a customer's last contact and their next purchase?</li>
    <li>What is the relationship between marital status and purchase frequency?</li>
    <li>What is the most effective communication method for reaching customers who have not purchased from our company in the past 6 months?</li>
</ol>

In [62]:
try:
    question = input("\nEnter your natural language query: ")
    response = agent.invoke(question)
    if isinstance(response, dict) and 'output' in response:
        answer = response['output']
    else:
        answer = "No valid response received."
    print(f"Query: {question}\nResponse: {answer}")
except Exception as e:
    print(f"An error occurred: {e}")



Enter your natural language query:  Which city has the highest average income?




> Entering new SQL Agent Executor chain...
Thought: I need to find the city with the highest average income. To do this, I should first get a list of all tables to see which tables might have this information.
Action: sql_db_list_tables
Action Input: 
retail_marketing
Question: Which city has the highest average income?
Thought:Thought: I need to find the city with the highest average income. To do this, I should first get a list of all tables to see which tables might have this information.
Action: sql_db_list_tables
Action Input: 
retail_marketing
Thought: The table 'retail_marketing' might contain information about cities and income. I need to check the schema of this table to see if it has the required columns.
Action: sql_db_schema
Action Input: retail_marketing
CREATE TABLE demo_user.retail_marketing (
	"Location" VARCHAR(2048) CHAR SET UNICODE NOT NULL, 
	customer_id 

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>9. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Clean up work tables to prevent errors the next time the notebook runs.</p>

In [None]:
execute_sql("""Drop table retail_marketing""")