<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Vector analytics and AI functionality per database version - Vantage 3.1
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<hr>

<p style = 'font-size:28px;font-family:Arial;color:#00233C'><b>Overview</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Teradata Vantage provides a suite of in-database analytic capabilities for vector embedding and analytics, with support across multiple database versions.  This notebook series reviews these capabilities per database version, including:</p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Database Version 17.20+ and VantageCloud Enterprise 3.0</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Bring-Your-Own-Model (BYOM)</b> capabilities allow users to generate vector embeddings using open-source models serialized in ONNX format</li>
    <li><b>Vector data</b> stored as FLOAT columns in normal database tables</li>
    <li><b>Similarity analysis</b> using native ClearScape Analytics functions - <b>Vector Distance</b> and <b>KMeans</b></li>
    </ul>
    
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>VantageCloud Enterprise 3.1+</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>AI Analytic Functions</b> that leverage <b>Cloud-based LLMs</b> for text analytics, including Vector Embedding functions and RAG</li>
    <li><b>VECTOR Datatype</b>: a VARBYTE-based array of vector data stored as a single column</li>
    <li><b>Normalization</b> of vector data for efficient similarity analysis</li>
    <li><b>Similarity analysis</b> using VECTOR DATATYPE and additional functions</li>
    </ul>
    
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>VantageCloud Lake</b></p>
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>In-platform GPUs</b> leveraging Analytic Compute Clusters for high-scale vector embedding and other Large Language Model tasks</li>
    <li><b>Enterprise Vector Store APIs</b> for creating and managing vector data using Python and/or REST</li>
    <li><b>Similarity Search and RAG APIs</b> using Python</li>
    <li><b>Vector Store UI</b> for managing vector data</li>
    </ul>

<p style = 'font-size:28px;font-family:Arial;color:#00233C'><b>Demonstration Data</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>All of these demonstrations are based on a small sample data set of Amazon book reviews.</p>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Python Package Prerequisites</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>This only needs to be run once for the user environment - restart the kernel after installing the proper packages.</p>

In [None]:
%pip install -r requirements.txt

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Python Package Imports</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Import the required packages and libraries.  Execute this cell to import packages for Teradata automation as well as machine learning, analytics, utility, and data management.</p> 

In [None]:
from dotenv import load_dotenv
from teradataml import *
from teradatagenai import TextAnalyticsAI, TeradataAI, load_data

# json and warnings are used below, so import them explicitly
import getpass, json, os, warnings
from huggingface_hub import hf_hub_download

from IPython.display import clear_output , display as ipydisplay, Markdown
import matplotlib.pyplot as plt
import pandas as pd

# Set display options for dataframes, plots, and warnings
%matplotlib inline
warnings.filterwarnings('ignore')
display.suppress_vantage_runtime_warnings = True

# load vars json
with open('vars.json', 'r') as f:
    session_vars = json.load(f)

# Database login information
host = session_vars['environment']['host']
username = session_vars['hierarchy']['users']['business_users'][1]['username']
password = session_vars['hierarchy']['users']['business_users'][1]['password']
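
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The lookups above imply a particular layout for <b>vars.json</b>.  The sketch below reconstructs that layout from the keys used in the cell; every value (host name, user names, passwords) is a hypothetical placeholder, and a real file may contain additional keys.</p>

```python
import json

# Hypothetical placeholder values; only the key layout mirrors the notebook cell.
example_vars = {
    "environment": {"host": "example-host.example.com"},
    "hierarchy": {
        "users": {
            "business_users": [
                {"username": "user_one", "password": "placeholder-1"},
                {"username": "user_two", "password": "placeholder-2"},
            ]
        }
    },
}

# Round-trip through JSON text to mimic reading the file from disk.
session_vars_example = json.loads(json.dumps(example_vars))

# Same lookups as the notebook cell: index 1 selects the second business user.
example_host = session_vars_example["environment"]["host"]
example_user = session_vars_example["hierarchy"]["users"]["business_users"][1]["username"]
print(example_host, example_user)
```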

<hr>
<p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Features supported in Vantage version 3.1
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
    

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Teradata Vantage version 3.1 introduces new capabilities for managing vector embeddings, similarity search, and tight integration with cloud-based Large Language Models.</p>

<table style = 'width:100%;table-layout:fixed;font-family:Arial;color:#00233C'>
    <tr><td style = 'vertical-align:top' width = '40%'>    
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>AI Analytic Functions</b> that leverage <b>Cloud-based LLMs</b> for text analytics, including Vector Embedding functions and RAG</li>
    <br>
    <li><b>VECTOR Datatype</b>: a VARBYTE-based array of vector data stored as a single column</li>
    <br>
    <li><b>Normalization</b> of vector data for efficient similarity analysis</li>
    <br>
    <li><b>Similarity analysis</b> using VECTOR DATATYPE and additional functions</li>
    <br>
    <li><b>RAG</b> using the similarity results as secure input to the LLM</li>
    </ol>
        </td><td style = 'text-align:center'><img src = 'images/Pattern_3.png' width = '300'></td></tr>
</table>
<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Required - Connect to the database</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Initiate the connection to the target system.</p>

In [None]:
# connect to database
eng = create_context(host = host, username = username, password = password)

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Inspect source data</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Use simple Python methods to inspect the Amazon Reviews data. This code creates a teradataml DataFrame, which represents data in the database that could extend to millions or billions of rows. Data is not moved to the client, and users can perform common data management and analytics functions that run at scale on the target system.</p>

In [None]:
tdf_reviews = DataFrame('"demo"."amazon_reviews_25"')
tdf_reviews.sample(2)

<hr>
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Demo 1 - AI Analytic functions</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Teradata supports a range of <a href = 'https://docs.teradata.com/r/Lake-Analyze-Your-Data-with-ClearScape-AnalyticsTM/Text-Analytics-AI-Functions'>text analytics functions</a> using large language models available on various cloud platforms, utilizing data stored in-database, or accessible via object storage or Open Table Formats. This expansion enables functions that utilize the <b>massively-parallel processing</b> capabilities of Vantage.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>These functions are integrated into the database and callable using SQL or Python, and include the following capabilities:</p>

<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Sentiment Extraction</li>
    <li>Language Detection</li>
    <li>Key Phrase Extraction</li>
    <li>PII Masking</li>
    <li>Entity Recognition</li>
    <li>PII Identification</li>
    <li>Text Classification</li>
    <li>Text Summarization</li>
    <li>Text Translation</li>
    <li><b>Text Embeddings</b></li>
    <li><b>Content Generation</b></li>
    </ul>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following demonstration will introduce <b>Text Embedding</b>, and later will introduce <b>Content Generation</b> for secure Retrieval Augmented Generation (RAG) use cases.</p>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Vector Embedding</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The <a href = 'https://docs.teradata.com/r/Lake-Analyze-Your-Data-with-ClearScape-AnalyticsTM/Text-Analytics-AI-Functions/AI_TextEmbeddings/AI_TextEmbeddings-Syntax'>AI_TextEmbeddings</a> function can be accessed via SQL or Python.  Some key arguments include:</p>

<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Input Table (view or query) containing the text data</li>
    <li>CSP-specific arguments to authenticate to the LLM service, select the proper model, etc.</li>
    <li>Output format including the new <b>VECTOR</b> datatype</li>
</ul>
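
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a rough illustration of how these arguments map onto the SQL call, the sketch below assembles the query text from Python values.  The helper name <b>build_embeddings_query</b> is hypothetical and not part of any Teradata package; the clause names are copied from the SQL used in this notebook.  Values are interpolated without escaping, so this is only suitable for trusted inputs.</p>

```python
def build_embeddings_query(table, text_column, id_column, model_name,
                           region, authorization, top_n=2):
    """Assemble an AI_TextEmbeddings call as SQL text (hypothetical helper)."""
    return f"""
SELECT {id_column}, {text_column}, Embedding FROM AI_TextEmbeddings(
    ON (SELECT TOP {top_n} {text_column}, {id_column}, TD_BYONE() p FROM {table}) AS InputTable
    PARTITION BY p
USING
    REGION('{region}')
    Authorization({authorization})
    ApiType('aws')
    ModelName('{model_name}')
    TextColumn('{text_column}')
    OutputFormat('vector')
) AS dt;
"""

# Values taken from the SQL cell in this notebook.
example_qry = build_embeddings_query(
    table="demo.amazon_reviews_25",
    text_column="rev_text",
    id_column="rev_id",
    model_name="amazon.titan-embed-text-v2:0",
    region="us-west-2",
    authorization="Repositories.BedrockAuth",
)
print(example_qry)
```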

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>CSP Authorization</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Each CSP requires a different style of authentication to access its LLM services.  The <a href = 'https://docs.teradata.com/r/Lake-Analyze-Your-Data-with-ClearScape-AnalyticsTM/Text-Analytics-AI-Functions/Common-Functionality-or-Rules'>documentation</a> provides a comprehensive overview.  For this demo, an "Authorization Object" has been pre-created with access to AWS Bedrock models.</p>

In [None]:
qry = '''
SELECT rev_id, rev_text, Embedding FROM AI_TextEmbeddings(   
    ON (SELECT TOP 2 rev_text, rev_id, TD_BYONE() p FROM demo.amazon_reviews_25) AS InputTable
    PARTITION BY p
USING   
    REGION('us-west-2')
    Authorization(Repositories.BedrockAuth)
    ApiType('aws')
    ModelName('amazon.titan-embed-text-v2:0')
    TextColumn('rev_text')
    outputformat('vector')
    ) as dt;
'''
pd.read_sql(qry, eng)

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Python version</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For Python developers, the <a href = 'https://docs.teradata.com/r/Lake-Analyze-Your-Data-with-ClearScape-AnalyticsTM/Teradata-Package-for-Generative-AI'>teradatagenai</a> Python library can connect to cloud-based LLM services and can also instantiate private models running <b>at scale</b> on local CPU or GPU compute.  This demonstration illustrates how to use AWS Bedrock to generate embeddings.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>CSP Authorization</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For this example, access keys and secrets are passed instead of using an authorization object. Copy the "env" file to ".env" so that its values can be loaded as environment variables.</p>

In [None]:
!cp env .env
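
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Once the ".env" file has been loaded, a quick check can catch missing credentials before any call is made to the LLM service.  This is a minimal sketch: the helper name <b>missing_env_vars</b> is hypothetical, and the variable names are the three read with os.getenv in this notebook.</p>

```python
import os

# The three variables this notebook reads with os.getenv.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")

def missing_env_vars(required=REQUIRED_VARS, env=None):
    """Return the names in `required` that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Example with an explicit mapping instead of the real environment.
print(missing_env_vars(env={"AWS_ACCESS_KEY_ID": "placeholder"}))
```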

In [None]:
# key and secret in .env file
load_dotenv()

llm_aws = TeradataAI(
    api_type = 'aws',
    access_key = os.getenv('AWS_ACCESS_KEY_ID'),
    secret_key = os.getenv('AWS_SECRET_ACCESS_KEY'),
    region = os.getenv('AWS_DEFAULT_REGION'),
    model_name = 'amazon.titan-embed-text-v2:0')

# Instantiate the TextAnalyticsAI class with the AWS Bedrock embedding model.
obj = TextAnalyticsAI(llm = llm_aws)

tdf_embeddings = obj.embeddings(data = tdf_reviews,
                                column = 'rev_text', 
                                accumulate = 'rev_id',
                                output_format = 'vector')

tdf_embeddings.sample(2)

<hr>
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Demo 2 - VECTOR datatype</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The <a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Teradata-Vector-Store-User-Guide/Understanding-the-Vector-Store-Features/VECTOR-Data-Type'>VECTOR datatype</a> is based on varbyte arrays that represent a packed version of the vector values.  Vector data can be constructed from VARCHAR (or FLOAT columns packed into a varchar) or VARBYTE arrays.  This demonstration illustrates the following steps to construct a vector from other sources including ONNXEmbeddings.</p>

<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Construct a VARCHAR using the PACK function</li>
    <li>CAST the VECTOR datatype from the result of the PACK operation</li>
    </ol>
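
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the first step concrete, the sketch below packs one row of float values into a single delimited string and parses it back.  It runs locally and only mimics the idea; the comma delimiter and the number formatting are assumptions for illustration and may differ from what the database PACK function emits.</p>

```python
def pack_row(values, delimiter=","):
    """Join float values into one delimited string (local illustration only)."""
    return delimiter.join(repr(float(v)) for v in values)

def unpack_row(packed, delimiter=","):
    """Parse a delimited string back into a list of floats."""
    return [float(part) for part in packed.split(delimiter)]

# First row of the small float table used in this demo.
row = [0.123632, -1.786543, 0.001239]
packed = pack_row(row)
print(packed)

# The round trip recovers the original values.
restored = unpack_row(packed)
print(restored == row)
```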
    
<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Create a small table of float values</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Create a volatile/temporary table using python methods.</p>

In [None]:
df = pd.DataFrame(data = {'id':[0,1],
                          'emb_1':[0.123632,0.223632], 
                          'emb_2':[-1.786543,-1.986543], 
                          'emb_3':[0.001239,0.011239]})

copy_to_sql(df, table_name = 'vector_floats', temporary = True, if_exists = 'replace') 

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Use the PACK function</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The PACK function can be called from SQL or Python.</p>

In [None]:
qry = '''
SELECT * FROM PACK (
    ON vector_floats
    USING
        OutputColumn('packed_data')
        TargetColumns('[1:3]')
        IncludeColumnName('False')
        Accumulate('id')
) AS dt;
'''
pd.read_sql(qry, eng)

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>CAST to VECTOR</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Users can call either the NEW or the CAST function.  This example uses CAST.</p>

In [None]:
qry = '''
SELECT CAST(packed_data AS VECTOR) Vector_Data, id

FROM (
    SELECT * FROM PACK (
    ON vector_floats
    USING
        OutputColumn('packed_data')
        TargetColumns('[1:3]')
        IncludeColumnName('False')
        Accumulate('id')
) AS dt) d;
'''
pd.read_sql(qry, eng)

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Python version</b></p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The same steps can be performed with teradataml methods:</p>

In [None]:
# get a reference to the float data:

tdf_floats = DataFrame('vector_floats')
ipydisplay(Markdown('Float Data:'))
ipydisplay(tdf_floats)

# pass it to the Pack function
tdf_packed = Pack(data = tdf_floats, 
                  input_columns = ['emb_1','emb_2','emb_3'], 
                  output_column = 'packed_data', 
                  include_column_name = False,
                  accumulate = 'id').result

ipydisplay(Markdown('Packed Data:'))
ipydisplay(tdf_packed)

# cast the varchar column to VECTOR
from teradatasqlalchemy import VECTOR
tdf_vector = tdf_packed.assign(packed_data = tdf_packed['packed_data'].cast(type_= VECTOR))

ipydisplay(Markdown('Vector Data:'))
ipydisplay(tdf_vector)
ipydisplay(tdf_vector.tdtypes)

<hr>
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Demo 3 - Normalize vector values</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Vector normalization is the process of scaling a vector to have a magnitude (length) of 1 while preserving its direction. The resulting vector is called a unit vector. Normalization divides each component of the vector by the vector's length.  This makes some calculations much more efficient, including some of the search and indexing operations.  The ClearScape Analytics function <b>TD_VectorNormalize</b> performs this operation at scale on the VECTOR datatype.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note</b></p> 
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The TD_VECTORNORMALIZE function requires an "EmbeddingSize" parameter.  The LENGTH() function can return this if it isn't already known.</p>

In [None]:
qry = '''
SELECT rev_id, rev_text, Embedding, Embedding.LENGTH() emb_dims
       FROM AI_TextEmbeddings(   
    ON (SELECT TOP 2 rev_text, rev_id, TD_BYONE() p FROM demo.amazon_reviews_25) AS InputTable
    PARTITION BY p
USING   
    REGION('us-west-2')
    Authorization(Repositories.BedrockAuth)
    ApiType('aws')
    ModelName('amazon.titan-embed-text-v2:0')
    TextColumn('rev_text')
    outputformat('vector')
) as dt;
'''
pd.read_sql(qry, eng)
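
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The unit-vector scaling described in this demo can be illustrated locally on a tiny vector.  This is a plain-Python sketch of the arithmetic only, not the TD_VectorNormalize implementation; the function name <b>unit_vector</b> is hypothetical.</p>

```python
import math

def unit_vector(vec):
    """Scale vec to length 1 by dividing each component by the Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return [x / norm for x in vec]

v = [3.0, 4.0]          # length is 5
u = unit_vector(v)
print(u)                # [0.6, 0.8]

# The normalized vector has length 1 (up to floating-point rounding).
print(math.sqrt(sum(x * x for x in u)))
```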

<hr>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Pass the embeddings function to TD_VECTORNORMALIZE</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Return both the original embedding and the normalized value</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note</b> </p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A human-readable representation of the vector can be seen by casting to VARCHAR.</p>

In [None]:
qry = '''
SELECT * FROM TD_Vectornormalize(
       ON (SELECT id, rev_text, Embedding, CAST(Embedding AS VARCHAR(34000)) Emb_VARCHAR, Embedding Embedding_Normalized
       FROM AI_TextEmbeddings(   
    ON (SELECT TOP 2 rev_text, id, TD_BYONE() p FROM demo.amazon_reviews_25) AS InputTable
    PARTITION BY p
USING   
    REGION('us-west-2')
    Authorization(Repositories.BedrockAuth)
    ApiType('aws')
    ModelName('amazon.titan-embed-text-v2:0')
    TextColumn('rev_text')
    outputformat('vector')
    ) as ve) AS InputTable
USING
    IDColumns('id')
    TargetColumns('Embedding_Normalized')
    Approach('UNITVECTOR')
    Accumulate('rev_text','Embedding', 'Emb_VARCHAR')
    EmbeddingSize(1024)
) AS dt;
'''

pd.read_sql(qry, eng)
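
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>One reason unit vectors are convenient for similarity analysis: for two vectors of length 1, the squared Euclidean distance equals 2 minus twice their dot product (the cosine similarity), so both measures rank neighbors in the same order.  The sketch below checks this identity on two small vectors that already have length 1; it is a local illustration and does not call any database function.</p>

```python
import math

def dot(a, b):
    """Dot product; for unit vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def squared_euclidean(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Two vectors that already have length 1.
a = [0.6, 0.8]
b = [1.0, 0.0]

cosine = dot(a, b)
dist_sq = squared_euclidean(a, b)

# For unit vectors: ||a - b||^2 == 2 - 2 * cos(a, b)
print(math.isclose(dist_sq, 2 - 2 * cosine))
```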

<hr>
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Demo 4 - Similarity analysis using HNSW</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The prior demonstration notebook reviewed how to use TD_VECTORDISTANCE and KMeans for rapid similarity analysis.  With Vantage 3.1, these functions accept the VECTOR datatype.  Additionally, a new highly-scalable analytic function has been introduced.  </p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Database-Engine-20-In-Database-Analytic-Functions/Model-Training-Functions/TD_HNSW-Function'>Hierarchical Navigable Small World (HNSW)</a> is a graph-based algorithm that performs approximate nearest neighbor searches in vector databases.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>It uses a multi-layered graph structure to efficiently search high-dimensional spaces. The top layer has a sparse graph with long-range connections, and the lower layers become denser. The search starts at the top layer, then moves down through the layers to find the nearest neighbors.</p>

<table style = 'width:100%;table-layout:fixed;font-family:Arial;color:#00233C'>
    <tr><td style = 'vertical-align:top' width = '40%'>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Train the model using the Vector Datatype as input</li>
    <br>
    <li>Predict nearest matches using the embedded search term</li>
    <br>
    <li>Join the original data for human-readable results</li>
    </ol>
</td><td style = 'text-align:center'><img src = 'images/HNSW.png' width = '300'></td></tr>
</table>
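
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>For comparison, the sketch below shows the exact, brute-force search that HNSW approximates: it computes the distance from the query to every stored vector and keeps the closest k.  This is not HNSW itself and does not use any Teradata function; the function names and sample vectors are hypothetical.  HNSW avoids scanning every vector by walking the layered graph instead, trading a small amount of accuracy for speed.</p>

```python
import heapq
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_top_k(query, vectors, k):
    """Return the k (distance, id) pairs closest to query, nearest first."""
    scored = ((euclidean(query, vec), vec_id) for vec_id, vec in vectors.items())
    return heapq.nsmallest(k, scored)

# Hypothetical two-dimensional vectors keyed by id.
vectors = {1: [0.0, 0.0], 2: [1.0, 0.0], 3: [0.0, 2.0], 4: [3.0, 3.0]}

nearest = exact_top_k([0.9, 0.1], vectors, k=2)
print([vec_id for _, vec_id in nearest])   # [2, 1]
```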
    
<hr>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Train the model</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>First, create a table of embeddings using the AI_TextEmbeddings function.  Then, train the HNSW model.</p>

In [None]:

qry = '''
CREATE TABLE rev_embeddings AS (
SELECT * FROM TD_Vectornormalize(
       ON (SELECT id, rev_text, Embedding, Embedding Embedding_Normalized
       FROM AI_TextEmbeddings(   
    ON (SELECT rev_text, id, TD_BYONE() p FROM demo.amazon_reviews_25) AS InputTable
    PARTITION BY p
USING   
    REGION('us-west-2')
    Authorization(Repositories.BedrockAuth)
    ApiType('aws')
    ModelName('amazon.titan-embed-text-v2:0')
    TextColumn('rev_text')
    outputformat('vector')
    ) as ve) AS InputTable
USING
    IDColumns('id')
    TargetColumns('Embedding_Normalized')
    Approach('UNITVECTOR')
    Accumulate('rev_text','Embedding')
    EmbeddingSize(1024)
) AS dt) WITH DATA;
'''
try:
    execute_sql('DROP TABLE rev_embeddings;')
except Exception as e:
    # Ignore only the "table does not exist" error; re-raise anything else
    if 'does not exist' not in str(e):
        raise

execute_sql(qry)

In [None]:
qry = '''
SELECT * FROM TD_HNSW (
    ON rev_embeddings AS InputTable
    OUT VOLATILE TABLE OutputTable(hnsw_model)
USING
    IdColumn('id')
    VectorColumn('Embedding')
    EfConstruction(16)
    NumConnPerNode(16)
    MaxNumConnPerNode(20)
    DistanceMeasure('euclidean')
    EmbeddingSize(1024)
    ApplyHeuristics('true')
) as dt;
'''

try:
    execute_sql('DROP TABLE hnsw_model;')
except Exception as e:
    # Ignore only the "table does not exist" error; re-raise anything else
    if 'does not exist' not in str(e):
        raise
execute_sql(qry)

<hr>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Return similar results using an embedded search term as input</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The <a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Database-Engine-20-In-Database-Analytic-Functions/Model-Scoring-Functions/TD_HNSWPredict'>TD_HNSWPredict</a> function returns the top-k closest matches, where k is set by the TopK argument.</p>

In [None]:
# search_term = input('Please enter a search term: ')
search_term = 'Which book are all the reviews talking about?'

qry = f'''
CREATE TABLE term_embedded AS (
SELECT * FROM AI_TextEmbeddings(   
            ON (SELECT '{search_term}' txt, 1 id) AS InputTable
        USING   
            REGION('us-west-2')
            Authorization(Repositories.BedrockAuth)
            ApiType('aws')
            ModelName('amazon.titan-embed-text-v2:0')
        TextColumn('txt')
        outputformat('vector')
        ) as ve) WITH DATA;
'''
try:
    execute_sql('DROP TABLE term_embedded;')
except Exception as e:
    # Ignore only the "table does not exist" error; re-raise anything else
    if 'does not exist' not in str(e):
        raise
execute_sql(qry)

In [None]:
qry = f'''

SELECT r.id review_id, r.rev_text, d.distance, CAST(d.nearest_neighbor_vector AS VARCHAR(34000))
FROM TD_HNSWPREDICT (
    ON hnsw_model AS ModelTable
    ON term_embedded AS InputTable DIMENSION
    USING
    IdColumn('id')
    VectorColumn('Embedding')
    EfSearch(16)
    TopK(10)
    OutputNearestVector('true')
) d

JOIN demo.amazon_reviews_25 r
    ON r.id = d.nearest_neighbor_id

ORDER BY d.distance;
'''
pd.read_sql(qry, eng)

<hr>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Inspect the Model</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Use <a href = 'https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/Database-Engine-20-In-Database-Analytic-Functions/Model-Evaluation-Functions/TD_HNSWSummary'>TD_HNSWSummary</a> to create human-readable model output.</p>

In [None]:
qry = '''
SELECT amp_id, graph_id, node_id, layer_id, input_row_id, cast(node_vector
as varchar(60)) as node_vector, num_neighbors, cast(neighbor_node_id as
varchar(60)) as neighbor_node_id, cast(model_info as varchar(500)) as model_info 

FROM TD_HNSWSummary(
    ON hnsw_model as ModelTable
) as dt
ORDER by 1,9;
'''
pd.read_sql(qry, eng)

<hr>
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Demo 5 - Generate responses based on search results</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As mentioned above in <b>Demo 1</b>, one of the new AI Text Analytic functions is AI_ASKLLM, which allows users to pass custom context and prompts to a CSP-based Large Language Model.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The <a href = 'https://docs.teradata.com/r/Lake-Analyze-Your-Data-with-ClearScape-AnalyticsTM/Text-Analytics-AI-Functions/AI_AskLLM'>AI_ASKLLM</a> function creates a custom prompt based on two user input tables: a set of context rows and a set of questions.  A response is generated for each row in the questions input.  This function can be used with SQL or Python.</p>

<hr>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Python version</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The process is as follows:</p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Create the context table - in this case the similarity search results</li>
    <li>Create the questions table - provide additional queries to pass to the prompt</li>
    <li>Execute the AI_ASKLLM function</li>
    </ol>
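
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The prompt used in this demo contains two placeholders, #QUESTION# and #DATA#.  The sketch below shows one plausible way such a template could be filled for a single question.  How AI_ASKLLM actually combines the context rows is not shown in this notebook, so joining them with newlines is an assumption; the helper name <b>fill_prompt</b> and the sample rows are hypothetical.</p>

```python
def fill_prompt(template, question, context_rows,
                question_position="#QUESTION#", data_position="#DATA#"):
    """Substitute a question and context rows into a prompt template (sketch)."""
    # Assumption: context rows are joined with newlines before substitution.
    data = "\n".join(context_rows)
    return template.replace(question_position, question).replace(data_position, data)

template = ("Provide an answer to the question using data as information "
            "relevant to the question. Question:\n #QUESTION# \nData: #DATA#")

# Hypothetical question and context rows.
prompt = fill_prompt(template,
                     "Summarize the provided data.",
                     ["Review A text", "Review B text"])
print(prompt)
```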

In [None]:
qry = f'''
CREATE TABLE search_context AS (
SELECT r.id review_id, r.rev_text, d.distance
FROM TD_HNSWPREDICT (
    ON hnsw_model AS ModelTable
    ON term_embedded AS InputTable DIMENSION
    USING
    IdColumn('id')
    VectorColumn('Embedding')
    EfSearch(16)
    TopK(10)
    OutputNearestVector('true')
) d

JOIN demo.amazon_reviews_25 r
    ON r.id = d.nearest_neighbor_id) WITH DATA;
'''
try:
    execute_sql('DROP TABLE search_context;')
except Exception as e:
    # Ignore only the "table does not exist" error; re-raise anything else
    if 'does not exist' not in str(e):
        raise
execute_sql(qry)
pd.read_sql('SELECT * FROM search_context;', eng)

In [None]:
df = pd.DataFrame({'id':[0,1],
                   'question':['Summarize the provided data, respond in French.','Did any one feel the book is thin?']})

copy_to_sql(df, table_name = 'questions', temporary = True, if_exists = 'replace')

In [None]:
# key and secret in .env file
load_dotenv()

llm_aws = TeradataAI(
    api_type = 'aws',
    access_key = os.getenv('AWS_ACCESS_KEY_ID'),
    secret_key = os.getenv('AWS_SECRET_ACCESS_KEY'),
    region = os.getenv('AWS_DEFAULT_REGION'),
    model_name = 'anthropic.claude-v2')

# Instantiate the TextAnalyticsAI class with the AWS Bedrock text-generation model.
obj = TextAnalyticsAI(llm = llm_aws)


tdf_response = obj.ask(data = DataFrame('questions'), column = 'question', 
                        context = DataFrame('search_context'), context_column = 'rev_text',
                        data_partition_column='id', context_partition_column='review_id',
                        prompt='Provide an answer to the question using data as information relevant to the question. Question:\n #QUESTION# \nData: #DATA#',
                        data_position='#DATA#',
                        question_position='#QUESTION#')
tdf_response

<hr>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>SQL version</b></p>

In [None]:
qry = '''
SELECT * FROM AI_AskLLM( 
      ON questions AS InputTable partition by id
      ON search_context AS ContextTable partition by review_id
      USING   
      TextColumn('question')
      ContextColumn('rev_text')
      ApiType('aws')
      REGION('us-west-2')
      Authorization(Repositories.BedrockAuth)
      ModelName('anthropic.claude-v2')
      Prompt('Provide an answer to the question using data as information relevant to the question. \nQuestion: #QUESTION# \n Data: #DATA#')
      DATAPOSITION('#DATA#')
      QUESTIONPOSITION('#QUESTION#')
      Accumulate('[0:]')
    ) as dt;
'''
pd.read_sql(qry, eng)

<hr>
<p style = 'font-size:24px;font-family:Arial;color:#00233C'><b>Conclusion - Vector embedding and analytics - Vantage 3.1</b></p>



<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The preceding demos showed how users can leverage the new VECTOR datatype, Generative AI functions, and high-speed, advanced similarity search with HNSW.</p>

<hr>
<p style = 'font-size:24px;font-family:Arial;color:#00233C'><b>Cleanup</b></p>



<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Drop the working tables created in this notebook, then disconnect from the database, which also removes any remaining volatile tables.</p>

In [None]:
db_drop_table('questions')
db_drop_table('search_context')
db_drop_table('term_embedded')

In [None]:
remove_context()