<header style="padding:1px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Mortgage Calculator chatbot using Generative AI with Vantage</b>
</header>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Introduction:</b></p>
<p style = 'font-size:16px;font-family:Arial'>In the Mortgage Calculator chatbot using Generative AI demo, the combination of <b>RAG, Langchain, and LLM models</b> allows users to ask queries in layman's terms, retrieve relevant information from the Vantage tables, and generate accurate and concise answers based on the retrieved data. This integration of retrieval-based and generative-based approaches provides a powerful tool for extracting knowledge from structured sources and delivering user-friendly responses.</p>

<p style = 'font-size:16px;font-family:Arial'>In this demo, we build a generative question-answering system using LangChain, a powerful library for working with LLMs such as GPT-3.5, GPT-4, and BLOOM, together with JumpStart in ClearScape notebooks. Users can ask business questions in natural English and receive answers with data drawn from the relevant databases.</p>


<center><img src="images/header.png" alt="mortgage calc"  width=800 height=800/></center>

<br>
<p style = 'font-size:16px;font-family:Arial'>Before going any further, let's get a better understanding of RAG, LangChain, and LLMs.</p>

<ul style = 'font-size:16px;font-family:Arial'><li> <b>Retrieval-Augmented Generation (RAG):</b></li></ul>
<p style = 'font-size:16px;font-family:Arial'> &emsp;  &emsp;RAG is a framework that combines the strengths of retrieval-based and generative approaches in question-answering systems. It uses a retrieval model and a generative model together to produce high-quality answers to user queries. The retrieval model is responsible for retrieving relevant information from a knowledge source, such as a database or documents. The generative model then takes the retrieved information as input and generates concise and accurate answers in natural language.</p>

<ul style = 'font-size:16px;font-family:Arial'><li> <b>LangChain:</b></li></ul>
<p style = 'font-size:16px;font-family:Arial'> &emsp;  &emsp; LangChain is an open-source framework for building applications on top of large language models. It is not a language model itself; it provides building blocks such as prompt templates, chains, agents, tools, and memory, and it connects the model to external data sources such as SQL databases. In this demo, it lets users ask questions in layman's terms, ranging from simple factual questions to complex and nuanced queries, which the LLM answers with the help of data retrieved from Vantage.</p>

<ul style = 'font-size:16px;font-family:Arial'><li> <b>LLM Models (Large Language Models):</b></li></ul>
<p style = 'font-size:16px;font-family:Arial'> &emsp;  &emsp; LLMs are large-scale language models trained on vast amounts of text data.
Models such as GPT-3 (Generative Pre-trained Transformer 3), GPT-3.5, GPT-4, HuggingFace BLOOM, LLaMA, and Google's FLAN-T5 are capable of generating human-like text responses. They are pre-trained on diverse sources of text data, which enables them to learn patterns, grammar, and context across a wide range of topics, and they can be fine-tuned for specific tasks such as question answering, natural language understanding, and text generation.
LLMs have achieved impressive results in various natural language processing tasks and are widely used in AI applications.</p>
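
<p style = 'font-size:16px;font-family:Arial'>The retrieve-then-generate flow described for RAG above can be sketched in a few lines of plain Python. This is only an illustration: the <code>ROWS</code> list stands in for a database table, the sample values are made up, and a string template stands in for the LLM call. None of these names come from the demo itself.</p>

```python
# Toy knowledge source standing in for a database table (hypothetical rows).
ROWS = [
    {"CustomerID": 1, "FirstName": "Ann", "CreditScore": 720},
    {"CustomerID": 2, "FirstName": "Bob", "CreditScore": 640},
]


def retrieve(customer_id):
    """Retrieval step: fetch the row relevant to the question, or None."""
    return next((row for row in ROWS if row["CustomerID"] == customer_id), None)


def generate(question, context):
    """Generation step: a string template stands in for the LLM call."""
    if context is None:
        return "I could not find that customer."
    return f"{context['FirstName']}'s credit score is {context['CreditScore']}."


def answer(question, customer_id):
    # RAG: retrieve first, then generate an answer grounded in the retrieved data.
    return generate(question, retrieve(customer_id))


print(answer("What is my credit score?", 1))
```

<p style = 'font-size:16px;font-family:Arial'>In the notebook, the retrieval step is a SQL query against Vantage and the generation step is an OpenAI model, but the division of labor is the same.</p>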

<p style = 'font-size:16px;font-family:Arial;color:#E37C4D'><b>Steps in the analysis:</b></p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li>Configuring the environment</li>
    <li>Connect to Vantage and OpenAI</li>
    <li>Data Exploration</li>
    <li>LLM</li>
    <li>Cleanup</li>
</ol>

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>1. Configuring the environment</b>

In [None]:
%%capture
# '%%capture' suppresses the display of installation steps of the following packages

!pip install -r requirements.txt --quiet

<p style = 'font-size:16px;font-family:Arial'>
    <i>The above statements will install the required libraries to run this demo. To gain access to installed libraries after running this, restart the kernel.</i></p>

<p style = 'font-size:16px;font-family:Arial'><b>To restart the kernel, press the escape key first, then type 0 0.</b></p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>1.1 Import the required libraries</b></p>

<p style = 'font-size:16px;font-family:Arial'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
import os
import warnings
import pandas as pd

# teradata lib
from teradataml import *

# LLM
from langchain.agents import AgentType
from langchain.prompts import ChatPromptTemplate
from langchain.tools.render import format_tool_to_openai_function
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain.prompts import MessagesPlaceholder
from langchain.schema.runnable import RunnablePassthrough
from langchain.agents.format_scratchpad import format_to_openai_functions
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI
from langchain.agents import AgentExecutor
from langchain.chat_models import ChatOpenAI
from langchain.tools import tool

# Suppress warnings
warnings.filterwarnings("ignore")
display.max_rows = 10

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>2. Connection to Vantage and OpenAI</b>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.1 Get the OpenAI API key</b></p>

<p style = 'font-size:16px;font-family:Arial'>In order to utilize this demo, you will need an OpenAI API key. If you do not have one, please refer to the instructions provided in this guide to obtain your OpenAI API key: </p>

[Openai_setup_api_key_guide](..//Openai_setup_api_key/Openai_setup_api_key.md)

In [None]:
import getpass

# Enter your OpenAI API key; getpass hides the key as you type
api_key = getpass.getpass(prompt="\n Please Enter OpenAI API key: ")

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.2 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
eng.execute('''SET query_band='DEMO= Mortgage_Calculator_Python.ipynb;' UPDATE FOR SESSION;''')

<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>2.3 Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial'>The data for this demo is provided as SQL scripts in the <code>data</code> folder. The first cell below drops the <code>Customer</code> and <code>Interest</code> tables if they already exist, and the second cell executes every SQL script in that folder to load the tables.</p>

In [None]:
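# Drop the demo tables if they already exist so the load below starts clean
for t in ["Customer", "Interest"]:
    try:
        eng.execute(f"DROP TABLE {t}")
    except Exception:
        pass  # the table does not exist yet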

In [None]:
import glob

# Execute every SQL script found in the data folder
for file in glob.glob("data/*.sql"):
    print("execution started: queries in %s are being executed." % file)
    with open(file, "r") as f:
        sql_query = f.read()
    eng.execute(sql_query)

    print("execution success: queries in %s were executed." % file)
    print("--" * 50)

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>3. Data Exploration</b>

<p style = 'font-size:16px;font-family:Arial'>The goal of the Marketing Campaign Effectiveness prediction is to reduce marketing resources by identifying customers who would purchase the product and thereby directing marketing efforts to them.</p>

<p style = 'font-size:16px;font-family:Arial'>The data is from the last marketing campaign, with thousands of rows of customer data such as age, job, marital status, and education.</p>

<p style = 'font-size:16px;font-family:Arial'>Each row is a snapshot of data taken during the last marketing campaign, and each column is a different variable. The input dataset can be divided into four categories, as below:</p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li>customer data, e.g. age, profession, education, monthly income, etc.</li>
    <li>attributes related to the last contact of the current campaign, e.g. contact, month, day, etc.</li>
    <li>other attributes, e.g. campaign, previous outcome, payment methods, etc.</li>
    <li>target attribute: purchased.</li>
</ol>

<p style = 'font-size:16px;font-family:Arial'><b><i>*Please scroll down to the end of the notebook for detailed column descriptions of the dataset.</i></b></p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>3.1 Examine the Customer and Interest tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>Let's look at the sample data in the Customer table.</p>

In [None]:
tdf = DataFrame(in_schema("demo_user", "Customer"))

print("Data information: \n", tdf.shape)
tdf.sort("CustomerID")

<p style = 'font-size:16px;font-family:Arial'>There are 5 records in all, and there are 18 variables.</p>
<p style = 'font-size:16px;font-family:Arial'>Now, let's look at the Interest table.</p>

In [None]:
tdf = DataFrame(in_schema("demo_user", "Interest"))

print("Data information: \n", tdf.shape)
tdf

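<p style = 'font-size:16px;font-family:Arial'>We can see that the interest rate is based on the individual's CreditScore: each row of the Interest table covers a range of credit scores. The query below joins the two tables to find the rate that applies to the customer with CustomerID 1.</p>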

In [None]:
q = """
    select c.FirstName, c.LastName, c.CreditScore, i.InterestRate from Customer c join Interest i on c.CreditScore between i.MinCreditScore and i.MaxCreditScore
    where CustomerID=1
"""

pd.read_sql(q, eng)
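
<p style = 'font-size:16px;font-family:Arial'>The query above matches a customer's credit score to a rate band with a <code>BETWEEN</code> join. Here is a minimal, self-contained sketch of the same pattern using Python's built-in <code>sqlite3</code> module instead of Vantage. The column names follow the query above; the sample customers, score bands, and rates are made-up assumptions, not the demo data.</p>

```python
import sqlite3

# In-memory database with hypothetical sample rows (not the demo data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER, FirstName TEXT, CreditScore INTEGER);
    CREATE TABLE Interest (MinCreditScore INTEGER, MaxCreditScore INTEGER, InterestRate REAL);
    INSERT INTO Customer VALUES (1, 'Ann', 720), (2, 'Bob', 640);
    INSERT INTO Interest VALUES (300, 649, 7.5), (650, 749, 6.0), (750, 850, 5.0);
""")

# Range join: each customer matches the one band that contains their score.
rows = conn.execute("""
    SELECT c.CustomerID, c.FirstName, i.InterestRate
    FROM Customer c
    JOIN Interest i
      ON c.CreditScore BETWEEN i.MinCreditScore AND i.MaxCreditScore
    ORDER BY c.CustomerID
""").fetchall()

for row in rows:
    print(row)

conn.close()
```

<p style = 'font-size:16px;font-family:Arial'>The bands in this sketch do not overlap, so each customer matches exactly one row of the Interest table. Overlapping bands would return more than one rate per customer.</p>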

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>4. LLM </b>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>4.1 Connect to databases using SQLAlchemy and initialize the LLM</b></p>

<p style = 'font-size:16px;font-family:Arial'>Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. The SQLDatabaseChain can therefore be used with any SQL dialect supported by SQLAlchemy, such as Teradata Vantage, MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, and SQLite. Please refer to the <a href="https://docs.sqlalchemy.org/en/20/"> SQLAlchemy documentation</a> for more information about requirements for connecting to your database.</p>

<p style = 'font-size:16px;font-family:Arial'>Important: The code below establishes the database connection for the data sources and initializes the Large Language Model. Please note that the solution will only work if the database connection for your sources is defined in the cell below.</p>

<p style = 'font-size:16px;font-family:Arial'>The cell sets the OpenAI API key, wraps the Vantage connection in a LangChain <code>SQLDatabase</code> object so that the model can work with the tables, and creates the LLM.</p>

In [None]:
# OpenAI API
os.environ["OPENAI_API_KEY"] = api_key

db = SQLDatabase(eng)
llm = OpenAI(temperature=0, verbose=True)

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>4.2 Define Agent and tool</b></p>

<p style = 'font-size:16px;font-family:Arial'>In OpenAI's language models, the <b>temperature</b> parameter controls the randomness of the generated text. It affects the diversity and creativity of the model's responses.</p>

<p style = 'font-size:16px;font-family:Arial'>A higher temperature value, such as 1.0 or above, increases the randomness and diversity of the generated output. This can lead to more varied and surprising responses, but it may also result in less coherence and occasional nonsensical outputs.</p>

<p style = 'font-size:16px;font-family:Arial'>On the other hand, a lower temperature value, such as 0.2 or below, reduces randomness and makes the model's output more focused and deterministic. The generated text is likely to be more conservative, sticking closely to patterns observed in the training data.</p>

<p style = 'font-size:16px;font-family:Arial'>Choosing an appropriate temperature value depends on the desired output. Higher temperatures can be useful for creative tasks or brainstorming, while lower temperatures are preferred when you need more control over the output, such as when generating specific responses or following a particular style.</p>
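
<p style = 'font-size:16px;font-family:Arial'>To make the effect of temperature concrete, the sketch below applies the usual temperature-scaled softmax to a few scores. It illustrates the general idea only and is not OpenAI's implementation; the <code>logits</code> values are hypothetical scores for three candidate tokens.</p>

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

<p style = 'font-size:16px;font-family:Arial'>At a low temperature, almost all of the probability goes to the highest-scoring token; at a high temperature, the distribution flattens and other tokens are sampled more often. A temperature of 0, as used in this notebook, is commonly treated as always picking the highest-scoring token; the sketch rejects it only because dividing by zero is undefined.</p>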

In [None]:
# define the tool
@tool
def generate_sql(query: str) -> str:
    """Given an input question, first create a syntactically correct query to run, then look at the results of the query and return the answer."""
    agent_executor = create_sql_agent(
        llm=llm,
        toolkit=SQLDatabaseToolkit(db=db, llm=llm),
        verbose=True,
        agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    )
    return agent_executor.run(query)

In [None]:
tools = [generate_sql]

## AIBot 

In [None]:
from langchain.memory import ConversationBufferMemory

# Current user
CustomerID = 1


functions = [format_tool_to_openai_function(f) for f in tools]
model = ChatOpenAI(temperature=0).bind(functions=functions)
memory = ConversationBufferMemory(return_messages=True, memory_key="chat_history")
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You will play the role of a mortgage adviser working with a client to create a mortgage quote. I will play the role of the client.""",
        ),
        (
            "system",
            """As a mortgage advisor for the bank you have direct access to the banks data for me
            generate the SQL to get the details  like Balance, Income, CreditScore, etc. if user don't know the answer or Do not enter the answer.""",
        ),
        (
            "system",
            """This information should compliment any answers you gain from the me. 
            Ask a series of questions, no more than eight, sequentially - pausing between each question to wait for a response before proceeding to the next. """,
        ),
        (
            "system",
            """Here's the first question: 'Income: What is your total annual income, including any additional sources of income?' """,
        ),
        (
            "system",
            """And then follow this format for each subsequent question. Again, wait for my response before moving on to the next. """,
        ),
        (
            "system",
            """Start the chat with greeting and ask the questions to user as mentioned above. """,
        ),
        (
            "system",
            """For example:
            First question: What is your total annual income?
            Second question: Do you have any additional income?""",
        ),
        (
            "system",
            """Examples of question and expected SQLQuery
            Question: What will be the InterestRate for customer whose CustomerID=1
            SQLQuery: SELECT i.InterestRate from Customer c join Interest i on c.CreditScore between i.MinCreditScore and i.MaxCreditScore where CustomerID=1""",
        ),
        (
            "system",
            """Calculate the maximum house price and monthly repayment for principal and interest based on a 5% APR. Assume income tax is 25% and 
            that the bank will lend no more than an 80% LTV.""",
        ),
        (
            "system",
            """"Provide a succinct summary paragraph of these details as a response to the client but don't show the method of calculation. Follow with a formatted table of monthly amounts 
            (gross income, tax obligation, mortgage obligation, expenses, remaining disposable income)""",
        ),
        (
            "system",
            f"""If the user fails to provide a response or says 'I don't know' for a question, automatically call the 'generate_sql' function to retrieve the answer from the database. The function should return the relevant answer for the question asked.
            current user: {CustomerID}""",
        ),
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

chain = (
    RunnablePassthrough.assign(
        agent_scratchpad=lambda x: format_to_openai_functions(x["intermediate_steps"])
    )
    | prompt
    | model
    | OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=chain, tools=tools, verbose=False, memory=memory)

memory.clear()
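
<p style = 'font-size:16px;font-family:Arial'>The prompt above asks the model to work out a monthly repayment for principal and interest at a 5% APR with a maximum 80% LTV. As a reference for checking the chatbot's figures, here is a minimal sketch of the standard fixed-rate amortization formula. The 30-year term and the house price are assumptions for illustration; the prompt does not specify them, and <code>monthly_payment</code> is a hypothetical helper, not part of the demo.</p>

```python
def monthly_payment(principal, apr, years):
    """Fixed monthly principal-and-interest payment for an amortizing loan."""
    n = years * 12  # number of monthly payments
    r = apr / 12    # monthly interest rate
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


house_price = 250_000      # hypothetical purchase price
loan = 0.80 * house_price  # 80% LTV cap from the prompt
payment = monthly_payment(loan, apr=0.05, years=30)  # 30-year term is an assumption
print(f"Loan: {loan:,.0f}  Monthly payment: {payment:,.2f}")
```

<p style = 'font-size:16px;font-family:Arial'>The LLM does this arithmetic itself when it answers, so its numbers can drift from the exact formula; a helper like this is one way to sanity-check the summary table it returns.</p>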

## View bot UI

In [None]:
import panel as pn

pn.extension(design="material")


def callback(contents, user, instance):
    return agent_executor.invoke({"input": contents})["output"]


pn.chat.ChatInterface(callback=callback).servable()

<hr>
<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>5. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>Clean up work tables to prevent errors next time.</p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code will clean up tables and databases created above.</p>

In [None]:
for t in ["Customer", "Interest"]:
    eng.execute(f"DROP TABLE {t}")

In [None]:
remove_context()

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Dataset:</b>

- `customer_id`: Unique row customer id
- `age`: customer age (numeric)
- `profession` : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student","blue-collar","self-employed","retired","technician","services")
- `marital` : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
- `education`: customer education (categorical: "unknown","secondary","primary","tertiary")
- `city`: city of customer (categorical: 'New York','Los Angeles','Chicago','Houston','Phoenix','Philadelphia','San Antonio','San Diego','Dallas','San Jose')
- `monthly_income_in_thousand`: customer's monthly income, in dollars (numeric)
- `family_members`: number of family members (numeric)
- `communication_type`: communication type (categorical: "unknown","telephone","cellular")
- `last_contact_day`: last contact day of the month (numeric)
- `last_contact_month`: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
- `credit_card`: does customer have a credit card? (binary: 'yes','no')
- `num_of_cars`: number of cars (numeric)
- `last_contact_duration`: last contact duration, in seconds (numeric)
- `campaign`: number of contacts performed during this campaign and for this client (categorical, includes last contact)
- `days_from_last_contact`: number of days that passed after the client was last contacted in a previous campaign (numeric, -1 means client was not previously contacted)
- `prev_contacts_performed`: number of contacts performed before this campaign and for this client (numeric)
- `prev_campaign_outcome`: outcome of the previous marketing campaign (categorical:"unknown","other","failure","success")
- `payment_method`: payment method used by customer (categorical: 'cash','credit_card','debit_card','ewallets', 'payment_links', 'QRcodes')
- `purchase_frequency`: how frequently customer is purchasing (categorical: 'daily','weekly','biweekly','monthly','quarterly','yearly')
- `gender`: gender of customer? (binary: 'male','female')
- `recency`: number of days since the last purchase (numeric)


Output variable (desired target):
- `purchased`: did the customer make a purchase? This is the target column (binary: 'yes','no')

<p style = 'font-size:16px;font-family:Arial;color:#E37C4D'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>Langchain Python reference: <a href='https://python.langchain.com/docs/get_started/introduction.html'>here</a></li>
</ul>

<footer style="padding:10px;background:#f9f9f9;border-bottom:3px solid #394851">Copyright © Teradata Corporation - 2023. All Rights Reserved.</footer>