<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
      Generative Question Answering using Generative AI with Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;color:#00233c'><b>Introduction:</b></p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>In this Question-Answering demo using Generative AI, we combine <b>RAG, LangChain, and LLMs</b> so that users can ask questions in plain language, retrieve relevant information from <b>Vantage</b> tables, and get accurate, concise answers based on the retrieved data. This integration of retrieval-based and generative approaches provides a powerful tool for extracting knowledge from structured sources and delivering user-friendly responses.</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>In this demo, we will build Generative Question-Answering using LangChain, a powerful library for working with LLMs like GPT-3.5, GPT-4, Google's Gemini, Claude 3.5 Sonnet, etc. and JumpStart in ClearScape notebooks. We are building a system where users can ask business questions in natural English and receive answers with data drawn from the relevant databases.</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>The following diagram illustrates the architecture.</p>

<center><img src="images/vantage_qa_gen.png" alt="Generative_QA_architecture"  width=800 height=800/></center>

<br>
<p style='font-size:16px;font-family:Arial;color:#00233C'>Before going any further, let's get a better understanding of RAG, LangChain, and LLMs.</p>

<ul style='font-size:16px;font-family:Arial;color:#00233C'><li> <b>Retrieval-Augmented Generation (RAG):</b></li></ul>
<p style='font-size:16px;font-family:Arial;color:#00233C'> &emsp;  &emsp;We use RAG, a framework that combines the strengths of retrieval-based and generative-based approaches in question-answering systems. It utilizes both a retrieval model and a generative model to generate high-quality answers to user queries. The retrieval model is responsible for retrieving relevant information from a knowledge source, such as a database or documents. The generative model then takes the retrieved information as input and generates concise and accurate answers in natural language.</p>
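<p style='font-size:16px;font-family:Arial;color:#00233C'>The retrieve-then-generate flow can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the rows, and the names <code>ROWS</code>, <code>tokenize</code>, <code>retrieve</code>, and <code>build_prompt</code>, are hypothetical, and retrieval here is a simple keyword overlap, whereas this notebook retrieves data by running SQL against Vantage.</p>

```python
import re

# Hypothetical mini "knowledge source": rows as they might come back from a table.
ROWS = [
    {"city": "Phoenix", "marital": "married", "purchased": "yes"},
    {"city": "Dallas", "marital": "single", "purchased": "no"},
    {"city": "Phoenix", "marital": "single", "purchased": "yes"},
]


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, rows, k=2):
    """Retrieval step: rank rows by how many question words they share."""
    q_words = tokenize(question)
    scored = []
    for row in rows:
        row_words = tokenize(" ".join(str(v) for v in row.values()))
        scored.append((len(q_words & row_words), row))
    # Stable sort keeps the original order among equally scored rows.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for score, row in scored[:k] if score > 0]


def build_prompt(question, context_rows):
    """Generation step input: the retrieved rows become the LLM's context."""
    context = "\n".join(str(row) for row in context_rows)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


question = "How many married customers in Phoenix purchased?"
context_rows = retrieve(question, ROWS)
prompt = build_prompt(question, context_rows)
print(prompt)  # this text is what would be sent to the generative model
```

<p style='font-size:16px;font-family:Arial;color:#00233C'>The generative model only ever sees the retrieved rows, which is what keeps its answer grounded in the data rather than in whatever it memorized during training.</p>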

<ul style='font-size:16px;font-family:Arial;color:#00233C'><li> <b>LangChain:</b></li></ul>
<p style='font-size:16px;font-family:Arial;color:#00233C'> &emsp;  &emsp;We leverage LangChain, a framework that facilitates the integration and chaining of large language models with other tools and sources to build more sophisticated AI applications. LangChain does not serve its own LLMs; instead, it provides a standard way of communicating with a variety of LLMs, including those from OpenAI and HuggingFace. LangChain accelerates the development of AI applications with building blocks. We learn to leverage the following building blocks in this notebook:</p>

<ol style='font-size:16px;font-family:Arial;color:#00233C'>
    <li> <b> LLMs</b> – LangChain's <code>llm</code> class is designed to provide a standard interface for all the LLMs it supports.   </li>
    <li> <b> PromptTemplate</b> – LangChain’s <code>PromptTemplate</code> class provides predefined structures for generating prompts for LLMs. Templates can be reused across different LLMs.</li>
    <li> <b> Chains</b> – When we build complex AI applications, we may need to combine multiple calls to LLMs and to other components. LangChain’s <code>chain</code> class allows us to link calls to LLMs and components. The most common type of chaining in any LLM application is combining a prompt template with an LLM and optionally an output parser. </li>
</ol>
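<p style='font-size:16px;font-family:Arial;color:#00233C'>To make the three building blocks concrete, here is a dependency-free sketch of the idea. It is deliberately <b>not</b> LangChain's API: <code>SimplePromptTemplate</code>, <code>fake_llm</code>, <code>strip_parser</code>, and <code>make_chain</code> are hypothetical stand-ins that only mimic the roles of a prompt template, an LLM, an output parser, and a chain.</p>

```python
class SimplePromptTemplate:
    """Reusable prompt with named placeholders (plain-Python stand-in)."""

    def __init__(self, template):
        self.template = template

    def format(self, **values):
        return self.template.format(**values)


def fake_llm(prompt):
    """Stub model: reports the prompt length instead of calling a real LLM."""
    return f"  (stub reply to a {len(prompt)}-character prompt)  "


def strip_parser(text):
    """Output parser: tidy up the raw model text."""
    return text.strip()


def make_chain(*steps):
    """Chain: feed the output of each step into the next one."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run


template = SimplePromptTemplate("Write a Teradata SQL query that answers: {question}")
chain = make_chain(lambda q: template.format(question=q), fake_llm, strip_parser)

answer = chain("How many customers are there?")
print(answer)
```

<p style='font-size:16px;font-family:Arial;color:#00233C'>Because each step is just a callable, swapping the stub for a real model or adding another parsing step does not change the rest of the chain, which is the main appeal of this style of composition.</p>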

<ul style='font-size:16px;font-family:Arial;color:#00233C'><li> <b>LLMs (Large Language Models):</b></li></ul>
<p style='font-size:16px;font-family:Arial;color:#00233C'> &emsp;  &emsp;We work with LLMs, large-scale language models trained on vast amounts of text data. Models such as GPT-3.5, GPT-4, HuggingFace BLOOM, Llama 3, and Google's Gemini are capable of generating human-like text responses. LLMs are pre-trained on diverse sources of text data, enabling them to learn patterns, grammar, and context across a wide range of topics. They can be fine-tuned for specific tasks, such as question answering, natural language understanding, and text generation. LLMs have achieved impressive results in various natural language processing tasks and are widely used in AI applications.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233c'><b>Steps in the analysis:</b></p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Configuring the environment</li>
    <li>Connect to Vantage and OpenAI</li>
    <li>Data Exploration</li>
    <li>LLM</li>
    <li>Execute the user queries on SQLAgent</li>
    <li>Cleanup</li>
</ol>

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:28px;font-family:Arial;color:#00233c'>1. Configuring the environment</b>

In [None]:
%%capture
# '%%capture' suppresses the display of installation steps of the following packages

!pip install -r requirements.txt --quiet

<div class="alert alert-block alert-info">
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b><i>The above statements will install the required libraries to run this demo. Be sure to restart the kernel after executing the above lines to bring the installed libraries into memory. The simplest way to restart the Kernel is by typing zero zero: <b> 0 0</b></i></p>
    </div>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.1 Import the required libraries</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
import io
import os
import warnings  # needed for warnings.filterwarnings() below

import numpy as np
import pandas as pd

# teradata lib
from teradataml import *

# LLM
import sqlalchemy
from sqlalchemy import create_engine
# PromptTemplate now lives in langchain_core; SQLDatabase is imported from langchain_community below
from langchain_core.prompts import PromptTemplate
from langchain_community.utilities.sql_database import SQLDatabase
from langchain_community.agent_toolkits import create_sql_agent
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage
from langchain_core.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)

# Suppress warnings
warnings.filterwarnings("ignore")
display.max_rows = 5

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:28px;font-family:Arial;color:#00233c'>2. Connect to Vantage and OpenAI</b>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>2.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO= Generative_Question_Answering_Python.ipynb;' UPDATE FOR SESSION;''')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Begin running steps with Shift + Enter keys. </p>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>2.2 Get the OpenAI API key</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To utilize this demo, we need an OpenAI API key. If we don't have one yet, we can refer to the instructions provided in this guide to obtain our OpenAI API key. </p>



<a href="..//Openai_setup_api_key/Openai_setup_api_key.md" style="text-decoration:none;" target="_blank"><button style="font-size:16px;font-family:Arial;color:#fff;background-color:#00233C;border:none;border-radius:5px;cursor:pointer;height:50px;line-height:50px;display:flex;align-items:center;">OpenAI API Key Guide <span style="margin-left:10px;">&#8658;</span></button>
</a>

In [None]:
import getpass

# enter your openai api key
api_key = getpass.getpass(prompt="\n Please Enter OpenAI API key: ")

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>2.3 Getting Data for This Demo</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. We can either run the demo using foreign tables, which access the data without using any storage in our environment, or download the data to local storage, which may yield faster execution but consumes available storage. Two statements are in the following cell, and one is commented out. We can switch modes by changing which statement is commented out.</p>

In [None]:
%run -i ../run_procedure.py "call get_data('DEMO_MarketingCamp_local');"        # Takes 20 seconds

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Next is an optional step – if we want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:28px;font-family:Arial;color:#00233c'>3. Data Exploration</b>

<p style='font-size:16px;font-family:Arial;color:#00233C'>Our goal in predicting Marketing Campaign Effectiveness is to reduce marketing resources by identifying customers who would purchase the product and thereby directing our marketing efforts to them.</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>We are working with data from the last marketing campaign, which includes thousands of rows of customer data such as age, job, marital status, education, and more.</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>Each row in the dataset represents a snapshot of data taken during the last marketing campaign, and each column represents a different variable. We can group the columns of the input dataset into four main categories:</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>
    <ol style='font-size:16px;font-family:Arial;color:#00233C'>
        <li>Customer data, including age, profession, education, monthly income, and more.</li>
        <li>Attributes related to the last contact of the current campaign, such as contact, month, day, and so on.</li>
        <li>Other attributes, including campaign, previous outcome, payment methods, and more.</li>
        <li>The target attribute - whether the customer purchased the product.</li>
    </ol>
</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>We have loaded the source data from <a href="https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset">Kaggle</a> into Vantage and supplemented it with additional information such as city, monthly income, family members, and more. The data is stored in a Vantage table named <i>Retail_Marketing</i>.</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'><b><i>*Please click <a href="#section62">here</a> or scroll down to the end of the notebook for detailed column descriptions of the dataset.</i></b></p>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>3.1 Examine the Retail Marketing Campaign table</b></p>    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let's look at the sample data in the Retail_Marketing table.</p>

In [None]:
tdf = DataFrame(in_schema("DEMO_MarketingCamp", "Retail_Marketing"))
print("Data information: \n", tdf.shape)
tdf.sort("customer_id")

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>There are about 11K records and 23 variables in all. <code>purchased</code> is the target variable, indicating whether the customer bought the product; the remaining columns describe the customer and the campaign contacts.</p>

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:28px;font-family:Arial;color:#00233c'>4. LLM </b>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.1 Connect to databases using SQLAlchemy</b></p>    

<p style='font-size:16px;font-family:Arial;color:#00233C'>Under the hood, LangChain uses SQLAlchemy to connect to SQL databases. This means that its <code>SQLDatabase</code> wrapper can be used with any SQL dialect supported by SQLAlchemy, such as Teradata Vantage, MS SQL, MySQL, MariaDB, PostgreSQL, Oracle SQL, and SQLite. For more information about the requirements for connecting to our database, we recommend referring to the <a href="https://docs.sqlalchemy.org/en/20/">SQLAlchemy documentation</a>.</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>Important: The code below establishes a database connection for our data sources and Large Language Models. Please note that the solution will only work if we define the database connection for our sources in the cell below.</p>

<p style='font-size:16px;font-family:Arial;color:#00233C'>We build a consolidated view of the Table Data Catalog by combining metadata stored for the database and table.</p>

In [None]:
# Wrap the existing Vantage SQLAlchemy engine (eng) in a LangChain SQLDatabase
database = "DEMO_MarketingCamp_db"
db = SQLDatabase(
    eng,
    schema=database,
    include_tables=["Retail_Marketing"],
)

In [None]:
def get_db_schema():
    table_dicts = []
    database_schema_dict = {
        "database_name": database,
        "table_name": "Retail_Marketing",
        "column_names": tdf.columns,
    }
    table_dicts.append(database_schema_dict)

    database_schema_string = "\n".join(
        [
            f"Database: {table['database_name']}\nTable: {table['table_name']}\nColumns: {', '.join(table['column_names'])}"
            for table in table_dicts
        ]
    )

    return database_schema_string

In [None]:
database_schema = get_db_schema()
print(database_schema)
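
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For readers who want to see the shape of the schema text without a Vantage connection, the following sketch builds the same "Database / Table / Columns" block from plain Python data. The helper name <code>format_schema</code> is hypothetical, and only three of the table's columns are listed to keep the example short.</p>

```python
def format_schema(tables):
    """Render one 'Database / Table / Columns' block per table dictionary."""
    return "\n".join(
        f"Database: {t['database_name']}\n"
        f"Table: {t['table_name']}\n"
        f"Columns: {', '.join(t['column_names'])}"
        for t in tables
    )


tables = [
    {
        "database_name": "DEMO_MarketingCamp_db",
        "table_name": "Retail_Marketing",
        # Shortened column list, for illustration only.
        "column_names": ["customer_id", "age", "purchased"],
    }
]

schema_text = format_schema(tables)
print(schema_text)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>This text is later embedded in the prompt so that the model only references tables and columns that actually exist.</p>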

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b> 4.2 Format the answer and Display</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following helper functions render the agent's answer, or an error message, as formatted Markdown in the notebook.</p>


In [None]:
from IPython.display import display, Markdown


def response_template(response):
    # The agent returns a dict with an "output" key; otherwise show the raw response
    if "output" in response:
        return f"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Answer:  <b>{response['output']}</b></p>"
    else:
        return f"<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Answer:  <b>{response}</b></p>"


def error_template():
    return "<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Sorry, there was an error while generating the SQL query. The GenAI may have made a mistake in the syntax of the query.</p>"

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.3 Define LLM model</b></p>  

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In OpenAI's language models, the <b>temperature</b> parameter controls how deterministic the results are. The lower the temperature, the more deterministic the results: the most probable next token is picked more and more often, and at a temperature of 0 it is effectively always picked. Increasing the temperature adds randomness, which encourages more diverse or creative outputs, because it increases the relative weight of the other possible tokens. In practice, we use a low temperature for tasks like fact-based QA to encourage factual and concise responses. For poem generation or other creative tasks, a higher temperature may be beneficial.</p>
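
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The effect of temperature can be illustrated with the standard temperature-scaled softmax. This is a textbook formulation shown as a sketch, not a description of OpenAI's internal sampling code; the function name <code>softmax_with_temperature</code> and the three token scores are hypothetical.</p>

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.1)  # close to always picking the top token
warm = softmax_with_temperature(logits, 1.0)
hot = softmax_with_temperature(logits, 2.0)   # probability spreads to other tokens

for name, probs in [("T=0.1", cold), ("T=1.0", warm), ("T=2.0", hot)]:
    print(name, [round(p, 3) for p in probs])
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As the temperature drops, nearly all of the probability mass moves to the highest-scoring token, which is why a low value suits text-to-SQL, where we want the same question to yield the same query.</p>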

In [None]:
# OpenAI API
os.environ["OPENAI_API_KEY"] = api_key

# set LLM model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.1)

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>4.4 Setup SQLAgent</b></p> 

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>An agent is a sophisticated component that leverages a suite of tools, including a Large Language Model (LLM), to make informed decisions based on user input. This advanced functionality allows agents to utilize the appropriate tools until they achieve a satisfactory answer. For instance, in the context of text-to-SQL, the LangChain SQLAgent exhibits resilience by recovering from errors in executing generated SQL queries. Instead of giving up, it interprets the error in a subsequent LLM call and rectifies the issue. This robustness theoretically makes SQLAgent more productive and accurate compared to a simple SQLChain.</p>
    
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We can think of agents as enabling tools for LLMs, much like how humans use calculators for math or perform Google searches for information. Agents empower LLMs to perform tasks more efficiently and effectively.</p>
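
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The error-recovery behaviour described above can be sketched as a simple retry loop. This is a toy illustration, not LangChain's implementation: it uses an in-memory SQLite database instead of Vantage, and <code>stub_llm</code> and <code>run_with_retries</code> are hypothetical names. The stub "model" makes a typo on its first attempt and corrects it once it is shown the database error.</p>

```python
import sqlite3


def stub_llm(question, last_error=None):
    """Hypothetical stand-in for an LLM call.

    The first attempt contains a typo; the retry, which receives the
    database error text, returns a corrected query.
    """
    if last_error is None:
        return "SELEC COUNT(*) FROM customers WHERE purchased = 'yes'"
    return "SELECT COUNT(*) FROM customers WHERE purchased = 'yes'"


def run_with_retries(conn, question, max_iterations=3):
    """Generate SQL, execute it, and feed any error back to the model."""
    last_error = None
    attempts = []
    for _ in range(max_iterations):
        sql = stub_llm(question, last_error)
        attempts.append(sql)
        try:
            return conn.execute(sql).fetchall(), attempts
        except sqlite3.Error as exc:
            last_error = str(exc)  # becomes context for the next attempt
    raise RuntimeError(f"No valid query after {max_iterations} attempts: {last_error}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, purchased TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "yes"), (2, "no"), (3, "yes")]
)

rows, attempts = run_with_retries(conn, "How many customers purchased?")
print(rows, "after", len(attempts), "attempts")
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Capping the number of iterations matters in practice: without a limit, a model that keeps producing invalid SQL would loop indefinitely and keep consuming API calls.</p>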

In [None]:
# main prompt
generated_prompt = f"""You are a Teradata Database expert and you are tasked with generating SQL queries for Teradata based on user questions. 
    Your response should ONLY be based on the given context and follow the response guidelines and format instructions.

        Utilize the following tables and columns exclusively when creating SQL queries:\n{database_schema}

        Here are some tips for writing Teradata style queries: 
        * Always use table aliases when your SQL statement involves more than one source
        * Aggregated fields like COUNT(*) must be appropriately named
        * Unless the user specifies a specific number of examples they wish to obtain, always limit your query to at most 3 results by using SELECT TOP 3. Note that the LIMIT clause does not work in Teradata DB.
        * Remove unnecessary ORDER BY clauses unless required. 
        * Remember: Do not use the 'LIMIT' or 'FETCH' keyword in the SQLQuery; use the TOP keyword instead. For example: to select the top 3 results, use TOP 3 instead of LIMIT or FETCH. 
        * Important: If you receive the error "Bad character in format or data", correct the column values used in the query, using only values that exist in the table.
        * Remember: the purchased column has only 2 values: 'yes' or 'no'
        * Critical Instruction: Use default database as 'DEMO_MarketingCamp_db'
        
        Few examples of SQL:
        Example1: SELECT count(*) as total_count FROM DEMO_MarketingCamp_db.Retail_Marketing
        Example2: SELECT TOP 1 city, AVG(monthly_income_in_thousand) AS avg_income FROM DEMO_MarketingCamp_db.Retail_Marketing
                WHERE monthly_income_in_thousand IS NOT NULL GROUP BY city ORDER BY avg_income DESC;

        Response Guidelines: 
        * Whenever possible, give the answer in bulleted points and proper markup.
        * Critical Instruction: Ensure responses are exclusively derived from query results. Refrain from generating or adding synthetic data in any form.
        * Most important: Always create a syntactically correct Teradata-style query that addresses the question, 
        even if it has been asked and answered previously. Ensure the query is generated from scratch and does not rely on any pre-existing data stored in memory.
      

        Given a user's question about this data, write a valid Teradata SQL query that accurately extracts or calculates the requested information from these tables and adheres to SQL best practices for Teradata database, optimizing for readability and performance where applicable.
        Most important: Execute the SQL and return the final answer only in simple english statement. 
        Critical Instruction: Do not return json or SQL."""


messages = [
    HumanMessagePromptTemplate.from_template("{input}"),
    AIMessage(content=generated_prompt),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
]

prompt = ChatPromptTemplate.from_messages(messages)


agent_executor = create_sql_agent(
    llm,
    db=db,
    agent_type="openai-tools",
    verbose=True,
    prompt=prompt,
    max_iterations=10,
    max_execution_time=20,
    handle_parsing_errors=True,
    return_intermediate_steps=True,
    handle_sql_errors=True,
    max_tokens=4000,
)

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:28px;font-family:Arial;color:#00233c'>5. Execute the user queries on SQLAgent</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Each cell below passes a natural-language question to the LangChain SQL agent, which converts the text to SQL and runs the query against the source database.</p>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5.1 Query 1</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For example, for the user query <b>How many married customers have purchased the product?</b>, the answer is as follows:</p>

In [None]:
try:
    # Enter the query
    query = """How many married customers have purchased the product?"""

    # Response from Langchain
    response = agent_executor.invoke(query)

    display(Markdown(response_template(response)))
except Exception:
    display(Markdown(error_template()))

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5.2 Query 2</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As another example, for the user query <b>What is the number of purchases made by customers who are in management professions?</b>, the answer is as follows:</p>

In [None]:
try:
    # Enter the query
    query = """What is the number of purchases made by customers who are in management professions?"""

    # Response from Langchain
    response = agent_executor.invoke(query)

    display(Markdown(response_template(response)))
except Exception:
    display(Markdown(error_template()))

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5.3 Query 3</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For a slightly more complex user query, <b>Which are the most common purchasing behaviors of customers?</b>, the answer is as follows:</p>

In [None]:
try:
    # Enter the query
    query = """Which are the most common purchasing behaviors of customers?"""

    # Response from Langchain
    response = agent_executor.invoke(query)

    display(Markdown(response_template(response)))
except Exception:
    display(Markdown(error_template()))

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>5.4 We encourage you to try formulating your own question.</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here are some sample questions that you can try out:</p>

<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>What is the average income for Phoenix?</li>
    <li>Which city has the highest average income?</li>
    <li>What is the average age of married people?</li>
    <li>Which profession has the most married people in Phoenix?</li>
    <li>What is the month with the lowest sales?</li>
    <li>What is the month with the highest number of marketing engagements?</li>
    <li>What is the payment method distribution?</li>
    <li>What is the average number of days between a customer's last contact and their next purchase?</li>
    <li>What is the relationship between marital status and purchase frequency?</li>
    <li>What is the most effective communication method for reaching customers who have not purchased from our company in the past 6 months?</li>
</ol>

In [None]:
try:
    # Pass the prompt positionally; the built-in input() does not accept keyword arguments
    query = input("\n We invite you to enter your natural language query: ")
    # Response from Langchain
    response = agent_executor.invoke(query)

    display(Markdown(response_template(response)))
except Exception:
    display(Markdown(error_template()))
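
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Since every query cell repeats the same try/except pattern, one option is to wrap it in a small helper. The sketch below is an assumption-laden illustration: <code>ask</code> and <code>StubAgent</code> are hypothetical names, and the stub only imitates the <code>invoke</code> method so that the example runs without an API key or a database.</p>

```python
def ask(agent, question):
    """Send a question to any object with an .invoke() method.

    Returns the answer text, or a fallback message if the call fails.
    """
    try:
        response = agent.invoke(question)
    except Exception as exc:  # keep the notebook running on any agent failure
        return f"Sorry, the question could not be answered ({type(exc).__name__})."
    if isinstance(response, dict) and "output" in response:
        return response["output"]
    return str(response)


class StubAgent:
    """Hypothetical agent used only to exercise ask() offline."""

    def invoke(self, question):
        if not question.strip():
            raise ValueError("empty question")
        return {"input": question, "output": "There are 3 customers."}


print(ask(StubAgent(), "How many customers are there?"))
print(ask(StubAgent(), "   "))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the notebook, the same helper could be called with <code>agent_executor</code> in place of the stub, and its return value passed to <code>Markdown</code> for display.</p>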

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:28px;font-family:Arial;color:#00233c'>6. Cleanup</b>

<hr style='height:1px;border:none;background-color:#00233C;'>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'> <b>6.1 Databases and Tables </b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_MarketingCamp');"        # Takes 5 seconds

In [None]:
remove_context()

<a id="section62"></a>
<b style = 'font-size:28px;font-family:Arial;color:#00233c'>Dataset:</b>

- `customer_id`: unique customer id for each row
- `age`: customer age (numeric)
- `profession`: type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student","blue-collar","self-employed","retired","technician","services")
- `marital`: marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
- `education`: customer education (categorical: "unknown","secondary","primary","tertiary")
- `city`: city of customer (categorical: 'New York','Los Angeles','Chicago','Houston','Phoenix','Philadelphia','San Antonio','San Diego','Dallas','San Jose')
- `monthly_income_in_thousand`: customer's monthly income, in dollars (numeric)
- `family_members`: number of family members (numeric)
- `communication_type`: communication type (categorical: "unknown","telephone","cellular")
- `last_contact_day`: last contact day of the month (numeric)
- `last_contact_month`: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
- `credit_card`: does the customer have a credit card? (binary: 'yes','no')
- `num_of_cars`: number of cars (numeric)
- `last_contact_duration`: last contact duration, in seconds (numeric)
- `campaign`: number of contacts performed during this campaign and for this client (categorical, includes last contact)
- `days_from_last_contact`: number of days that passed after the client was last contacted in a previous campaign (numeric, -1 means client was not previously contacted)
- `prev_contacts_performed`: number of contacts performed before this campaign and for this client (numeric)
- `prev_campaign_outcome`: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")
- `payment_method`: payment method used by the customer (categorical: 'cash','credit_card','debit_card','ewallets', 'payment_links', 'QRcodes')
- `purchase_frequency`: how frequently the customer purchases (categorical: 'daily','weekly','biweekly','monthly','quarterly','yearly')
- `gender`: gender of customer (binary: 'male','female')
- `recency`: number of days since the last purchase (numeric)


Output variable (desired target):
- `purchased`: did the customer make a purchase? Target column (binary: 'yes','no')

<p style = 'font-size:16px;font-family:Arial;color:#00233c'><b>Dataset source:</b> <a href = 'https://www.kaggle.com/datasets/janiobachmann/bank-marketing-dataset'>kaggle</a></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233c'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Package-for-Python-User-Guide-17.20'>here</a></li>
    <li>LangChain Python reference: <a href='https://python.langchain.com/docs/get_started/introduction/'>here</a></li>
</ul>

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2023, 2024. All Rights Reserved
        </div>
    </div>
</footer>