<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Using Generative AI Large Language Models to Enhance Shopping Experience
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Introduction</b></p>

<div style="font-size: 16px; font-family: Arial;color:#00233C">
    <!-- ![business problem](images/bertrec_businessprroblem.png) -->
    <p>
        Traditional product recommendations have relied on static techniques like product affinity or collaborative filtering. While they offer some value, they fail to consider the dynamic context of shopping, resulting in predictable and often repetitive suggestions. This limitation hinders the potential for a truly personalized and satisfying purchase experience.
    </p>
    <p>
        However, with our cutting-edge <strong>context-based recommendations</strong>, we are revolutionizing the way consumers shop. By harnessing the power of large language models (LLMs), we enable customers to enjoy a seamless and delightful shopping journey, where each product recommendation is tailored to their unique preferences and needs.
    </p>
    <p>
        In this demo, we will showcase how Vantage and AzureML synergize to train a state-of-the-art LLM for context-based product recommendations. This sophisticated model will then transform shopping content into relevant and timely product recommendations, allowing users to explore a diverse array of items that genuinely interest them.
    </p>
    <p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Business Values</b></p>
    <ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
        <li>Personalized Customer Experience.</li>
        <li>Improved Customer Engagement and Retention.</li>
        <li>Optimized Inventory Management.</li>
    </ul>
    <p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Why Vantage?</b></p>
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'>The ML and AI industry continues to innovate at an unprecedented rate. Tools, technologies, and algorithms are being developed and improved in both the open source and commercial communities.
<br><br>
Unfortunately, many of these techniques haven’t matured to the point where they are readily deployable to a stable, mature operational environment. Furthermore, many open-source techniques rely on fragile, manual enabling technologies.
<br><br>
ClearScape Analytics <b>Bring Your Own Model</b> capabilities allow organizations to leverage third party and open-source models for scoring inside the Vantage Platform; providing enterprise-class scalability and operational stability for any number of users, applications, or volume of data.</p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A critical strategy for Vantage and ClearScape Analytics is to embrace the value and innovation in the open-source and partner ML and AI community. A cornerstone of that strategy is to allow users to leverage their ML or AI tools and models of choice to deploy those models directly to the Vantage Platform.  This provides enterprises with the most scalable option for deploying custom machine learning pipelines. Users can leverage the innovation and familiarity of a broad range of tools and techniques, with the ability to prepare and score new data in near-real-time and at any scale; allowing the products of machine learning to become pervasive across all applications, reporting tools, and consumers in an organization.</p>
    <p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Dataset:</b></p>
    <p>The data is from Kaggle: <a href = 'https://www.kaggle.com/c/instacart-market-basket-analysis/data'>Instacart Market Basket Analysis</a>. The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, there are between 4 and 100 of their orders, with the sequence of products purchased in each order.</p>
    
<p style="font-size: 16px; font-family: Arial; color:#00233C"><b>No Azure Credentials:</b></p>
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    If you do not have the required Azure credentials or do not wish to create an Azure account, you can still follow the demo. You will be informed when to skip the steps that require Azure credentials, and we will guide you through the alternative process.
    </p>
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    However, if you are interested in using Azure Machine Learning services and want to try the full functionality of the demo, you can follow the instructions in the <a href="../Energy_Consumption_Forecasting_AzureML/Getting Started with Azure.ipynb">Getting Started with Azure</a> guide. This will walk you through setting up an Azure account and acquiring the necessary credentials to fully experience the demo's capabilities.
    </p>
</div>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Downloading and installing additional software needed</b></p>

In [None]:
%%capture
!pip install tdnpathviz

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b><i>The install statement above may be required if you run the notebook on a platform other than ClearScape Analytics Experience that does not have the libraries installed. After installing new libraries, be sure to restart the kernel to bring them into memory. The simplest way to restart the kernel is by typing zero zero: <b> 0 0</b></i></p>
</div>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
import json
import pandas as pd
import seaborn as sns
import plotly.express as px
import tdnpathviz
from teradataml import *

display.max_rows = 5
configure.val_install_location = 'val'
configure.byom_install_location = 'mldb'
# Column width set to 500 characters
pd.set_option('display.max_colwidth', 500)

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>1. Connect to Vantage.</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=Grocery_Recommendation_PY_SQL.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Begin running steps with Shift + Enter keys. </p>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Getting Data for This Demo</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>We have provided data for this demo on cloud storage. You can either run the demo using foreign tables, which access the data without using any storage in your environment, or download the data to local storage, which may yield faster execution but requires available storage space. The following cell contains two statements, one of which is commented out. To switch modes, move the comment marker to the other statement.</p>

In [None]:
# %run -i ../run_procedure.py "call get_data('DEMO_Grocery_Data_cloud');"        # Takes 30 seconds
%run -i ../run_procedure.py "call get_data('DEMO_Grocery_Data_local');"        # Takes 3 minutes

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Next is an optional step – if you want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let's take the first step of our data exploration journey and dive into the dataset of customer orders and the products added to each shopping cart. To accomplish this, we will use teradataml DataFrames.</p>

In [None]:
orders = DataFrame(in_schema("DEMO_Grocery_Data", "order_products_train"))
orders

<div style="font-size: 16px; font-family: Arial;color:#00233C">
    <p>
        The dataset contains information about customers' orders.
    </p>
    <ul>
        <li><strong>order_id:</strong> This column identifies the customer order. It is not unique per row: the same <em>order_id</em> repeats for every product in that order.</li>
        <li><strong>product_id:</strong> This column contains unique identifiers for the products available.</li>
        <li><strong>add_to_cart_order:</strong> This column represents the sequence or order in which products were added to the customer's shopping cart during a specific order. For example, if a customer added three products to their cart in the order A, B, C, the <em>add_to_cart_order</em> values would be 1, 2, and 3, respectively.</li>
        <li><strong>reordered:</strong> This binary column indicates whether the customer has ordered this product before. A value of 1 means the product appeared in one of the customer's earlier orders (a reorder); a value of 0 means this is the first time the customer has ordered it.</li>
    </ul>
</div>

In [None]:
products = DataFrame(in_schema("DEMO_Grocery_Data", "products"))
products

<div style="font-size: 16px; font-family: Arial;color:#00233C">
    <ol>
    <li>
        <strong>product_id:</strong> This column contains unique identifiers for each product in the dataset.
    </li>
    <li>
        <strong>product_name:</strong> This column stores the names or descriptions of the products available in the dataset.
    </li>
    <li>
        <strong>aisle_id:</strong> This column represents the aisle to which the product belongs in a physical store.
    </li>
    <li>
        <strong>department_id:</strong> This column indicates the department to which the product is categorized within the store.
    </li>
    </ol>
</div>

In [None]:
product_id_to_seq_product_id = DataFrame(in_schema("DEMO_Grocery_Data", "product_id_to_seq_product_id"))
product_id_to_seq_product_id

<div style="font-size: 16px; font-family: Arial;color:#00233C">
    <p>
        Let us now try to answer a <strong>business question</strong> - How many items do customers usually buy? Vantage has a rich set of in-database functions, which can help answer this question. Let us use the <strong>in-database Histogram function</strong>.
    </p>
    <p>
        Let me also mention that you can use <strong>the tool of your choice</strong>, such as Python or SQL, to work with Vantage.
    </p>
    <p>
        Let us start with SQL, which is the preferred language for business analysts.
    </p>
</div>
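<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The query in the next cell asks TD_Histogram to choose its bins with <code>MethodType('STURGES')</code>. As a rough illustration of the idea, the sketch below applies the textbook Sturges rule (k = ceil(log2(n)) + 1 equal-width bins) to a handful of made-up rows in plain Python. The toy data and the helper names <code>sturges_bin_count</code> and <code>histogram</code> are hypothetical, and it is an assumption that TD_Histogram rounds and places bin edges in exactly the same way.</p>

```python
import math
from collections import Counter


def sturges_bin_count(n):
    """Textbook Sturges rule: k = ceil(log2(n)) + 1 bins for n observations."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1


def histogram(values):
    """Return (min_value, max_value, count, percent) for each equal-width bin."""
    k = sturges_bin_count(len(values))
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * k
    for v in values:
        idx = min(int((v - lo) / width), k - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    return [
        (lo + i * width, lo + (i + 1) * width, c, 100.0 * c / len(values))
        for i, c in enumerate(counts)
    ]


# Hypothetical (order_id, product_id) rows standing in for order_products_train.
toy_rows = [(1, 10), (1, 11), (1, 12), (2, 10), (3, 13),
            (3, 14), (4, 10), (4, 11), (4, 12), (4, 15)]

# Same idea as "SELECT order_id, count(*) ... GROUP BY order_id" in the query.
items_per_order = list(Counter(order_id for order_id, _ in toy_rows).values())

bins = histogram(items_per_order)
for min_value, max_value, count, percent in bins:
    print(f"{min_value:.1f} to {max_value:.1f}: {count} orders ({percent:.0f}%)")
```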

In [None]:
qry = """
SELECT *
FROM TD_Histogram (
    ON (
        SELECT order_id, count(*) AS cnt
        FROM DEMO_Grocery_Data.order_products_train
        GROUP BY order_id
    ) AS InputTable
    USING
    TargetColumn('cnt')
    MethodType('STURGES')
) AS dt
ORDER BY 1, 2, 3, 4, 5, 6;
"""

dft = pd.read_sql(qry, eng)

fig = px.bar(dft, x="MaxValue", y="Bin_Percent", labels={"MaxValue": "Max Value Range", "Bin_Percent": "Percentage"},
             title="Percentage Distribution in Max Value Ranges",
             template="plotly_white")  # Use a white background template for better readability

# Customize the appearance of the plot
fig.update_traces(marker_color='rgb(78, 119, 189)',  # Change bar color to a custom RGB value
                  marker_line_color='rgb(8, 48, 107)',  # Set bar edge color to a custom RGB value
                  marker_line_width=1.5)  # Set the bar edge width

fig.update_layout(xaxis_title_font=dict(size=14, family="Arial"),  # Set x-axis label font size and family
                  yaxis_title_font=dict(size=14, family="Arial"),  # Set y-axis label font size and family
                  title_font=dict(size=18, family="Arial", color='rgb(8, 48, 107)'),  # Set title font size, family, and color
                  xaxis_tickangle=-45,  # Rotate x-axis labels to improve readability
                  showlegend=False)  # Hide the legend since it's a simple bar plot

fig.show()

<div style="font-size: 16px; font-family: Arial;color:#00233C">
    <p>
        From the histogram, we can make an insightful observation: <strong>most consumers typically purchase around 10 items</strong> during their shopping sessions. This valuable insight helps us better understand the shopping behavior of our customers and lays the groundwork for optimizing inventory, promotions, and customer experiences.
    </p>

<p>Now let us find out which are <strong>the most frequently purchased items</strong>.</p>

</div>

In [None]:
qry = """
SELECT TOP 20 product_name, COUNT(*) AS cnt
FROM DEMO_Grocery_Data.order_products_train AS t, DEMO_Grocery_Data.products AS v
WHERE t.product_id = v.product_id 
GROUP BY product_name
ORDER BY cnt DESC
"""

dft = pd.read_sql(qry, eng)
dft.head(5)

In [None]:
from IPython.display import display, HTML

fig = px.bar(dft, x="product_name", y="cnt", labels={"product_name": "Product Name", "cnt": "Count"}, 
             title="Product Purchase Count",
             template="plotly_white") # Use a white background template for better readability

# Customize the appearance of the plot
fig.update_traces(marker_color='rgb(78, 119, 189)', # Change bar color to a custom RGB value
                  marker_line_color='rgb(8, 48, 107)', # Set bar edge color to a custom RGB value
                  marker_line_width=1.5) # Set the bar edge width

fig.update_layout(xaxis_title_font=dict(size=14, family="Arial"), # Set x-axis label font size and family
                  yaxis_title_font=dict(size=14, family="Arial"), # Set y-axis label font size and family
                  title_font=dict(size=18, family="Arial", color='rgb(8, 48, 107)'), # Set title font size, family, and color
                  xaxis_tickangle=-45, # Rotate x-axis labels to improve readability
                  showlegend=False) # Hide the legend since it's a simple bar plot

fig.show()

display(HTML(f'''
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'>We can observe that the top products purchased are <strong>{dft['product_name'][0]}, {dft['product_name'][1]} and {dft['product_name'][2]}</strong>.</p>
'''))

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>3. Data preparation</b>

<div style="font-size: 16px; font-family: Arial;color:#00233C">
    <p>Let's move on to the data preparation phase to train our powerful large language model (LLM). Now, you might wonder, what does a language model have to do with grocery products?</p>
    <p>Well, language models have versatile applications beyond natural language processing. They can predict the next word in a text, just like the helpful auto-completion feature on your phone. Similarly, we can utilize this capability to predict the next item a shopper might add to their cart during grocery shopping.</p>
    <p>The LLM we are using goes beyond mere word prediction; it excels at understanding the context of the entire shopping sequence. By harnessing this contextual understanding, the model can recommend items that perfectly align with the customer's shopping preferences.</p>
</div>
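<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the auto-completion analogy concrete, here is a deliberately tiny stand-in for next-item prediction. It is <b>not</b> the transformer model used in this demo: it only counts which item most often directly follows another, so it looks at a single previous item rather than the whole basket. The baskets and the helper names <code>fit_next_item_counts</code> and <code>suggest_next</code> are hypothetical.</p>

```python
from collections import Counter, defaultdict


def fit_next_item_counts(baskets):
    """Count how often each item directly follows another across all baskets."""
    following = defaultdict(Counter)
    for basket in baskets:
        for current, nxt in zip(basket, basket[1:]):
            following[current][nxt] += 1
    return following


def suggest_next(following, last_item, k=2):
    """Return up to k items most often added right after last_item."""
    return [item for item, _ in following[last_item].most_common(k)]


# Hypothetical baskets, each listed in add-to-cart order.
toy_baskets = [
    ["Bread", "Butter", "Jam"],
    ["Bread", "Butter", "Milk"],
    ["Bread", "Eggs"],
    ["Milk", "Cereal"],
]

model = fit_next_item_counts(toy_baskets)
print(suggest_next(model, "Bread"))
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>A model that reads the full sequence can distinguish baskets that merely share their last item, which this pairwise count cannot do.</p>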
<!-- ![llm for grocery recommendation](images/bertrec_lllm.png) -->

In [None]:
product_id_to_seq_product_id = DataFrame(in_schema("DEMO_Grocery_Data", "product_id_to_seq_product_id"))
product_id_to_seq_product_id.sort("seq_product_id")

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The next cell joins two DataFrames (orders and product_id_to_seq_product_id) on the "product_id" column, adds a new column "bgn" with a fixed value (101), and then selects a subset of columns from the result. In effect, each product in an order is now represented by its sequential ID (seq_product_id), which prepares the "orders" data for the sequence-building step that follows.</p>

In [None]:
orders_with_seq_ids = orders.join(
    other = product_id_to_seq_product_id, 
    on = "product_id",
    how = "inner", 
    lsuffix = "ordrs", 
    rsuffix = "si")


orders_with_seq_ids = orders_with_seq_ids.assign(
    bgn = 101
).select(["order_id", "add_to_cart_order", "seq_product_id", "bgn"])

orders_with_seq_ids

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>
    The Vantage in-database function <strong>nPath is well suited to preparing data for large language models</strong>. It can efficiently <strong>build the sequence of products purchased in each order</strong>.
</p>
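<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a rough mental model of the nPath query in the next cell, the sketch below partitions rows by order, sorts each partition by descending add-to-cart position, and cuts it into non-overlapping runs of at most five rows (the <code>A{1,5}</code> pattern). The helper name <code>npath_like_paths</code> and the toy rows are hypothetical, and two points are assumptions: that each match greedily takes as many rows as the pattern allows, and that ACCUMULATE renders a path as a bracketed, comma-separated list.</p>

```python
from itertools import groupby


def npath_like_paths(rows, max_len=5):
    """rows are (order_id, add_to_cart_order, product_name) tuples.

    Returns (order_id, path, countrank) tuples, one per run of up to max_len rows.
    """
    out = []
    # PARTITION BY order_id, ORDER BY add_to_cart_order DESC
    ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
    for order_id, group in groupby(ordered, key=lambda r: r[0]):
        names = [r[2] for r in group]
        # NONOVERLAPPING: the next run starts where the previous one ended
        for start in range(0, len(names), max_len):
            chunk = names[start:start + max_len]
            out.append((order_id, "[" + ", ".join(chunk) + "]", len(chunk)))
    return out


# Hypothetical rows; max_len=2 is used only to show the chunking on a small order.
toy_rows = [
    (1, 1, "Bananas"),
    (1, 2, "Milk"),
    (1, 3, "Bread"),
    (2, 1, "Eggs"),
]
paths = npath_like_paths(toy_rows, max_len=2)
for row in paths:
    print(row)
```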

In [None]:
qry = """
SELECT dt.order_id, dt.path, dt.countrank
FROM nPath (
    ON (
        SELECT TOP 10000 order_id, add_to_cart_order, product_name 
        FROM DEMO_Grocery_Data.order_products_train as t, DEMO_Grocery_Data.products as v
        WHERE t.product_id = v.product_id 
        ORDER BY order_id, add_to_cart_order
    )
    PARTITION BY order_id
    ORDER BY add_to_cart_order DESC
    USING
    Mode (NONOVERLAPPING)
    Pattern ('A{1,5}')
    Symbols (TRUE AS A)
    Result (
        FIRST (order_id OF A) AS order_id,
        ACCUMULATE (product_name OF A) AS path,
        COUNT (* OF A) AS countrank
    )
) AS dt
where path like '%Banana%';
"""

dft = DataFrame.from_query(qry)
dft

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Shown below is a Sankey visualization based on the output of the nPath function.</p>

In [None]:
dft_head = dft.sort("order_id").head(15)

In [None]:
from tdnpathviz.visualizations import plot_first_main_paths

plot_first_main_paths(dft_head, path_column='path', id_column='order_id', width = 1000)

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Let us now use the NPath function to <strong>prepare the data and store it in Azure Blob Storage</strong> using the <strong>WRITE_NOS</strong> function.</p>
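<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The NPath call and the WRITE_NOS query that follow turn every order into a fixed-width row of 32 columns (c0 to c31): a begin marker (101), the product IDs, an end marker (102) directly after the last product, and zeros as padding. The sketch below mirrors that COALESCE/CASE logic in plain Python for a single order. The function name <code>encode_order</code> and the sample IDs are hypothetical. The values 101, 102 and 0 coincide with the [CLS], [SEP] and [PAD] token IDs of the standard BERT vocabulary, but treating them as such here is an assumption.</p>

```python
# Marker values taken from the notebook's code; the token names are an assumption.
CLS, SEP, PAD = 101, 102, 0
ROW_WIDTH = 32  # columns c0..c31


def encode_order(seq_product_ids):
    """Mirror the c0..c31 layout for one order (assumed to hold at least one item)."""
    # c0 is the begin marker; c1..c31 take the first 31 items, None where the order is shorter
    cols = [CLS] + list(seq_product_ids[:ROW_WIDTH - 1])
    cols += [None] * (ROW_WIDTH - len(cols))

    row = cols[:2]  # c0 and c1 are written unchanged
    for i in range(2, ROW_WIDTH - 1):
        if cols[i] is not None:
            row.append(cols[i])
        else:
            # End marker right after the last item, padding everywhere after that
            row.append(PAD if cols[i - 1] is None else SEP)
    # c31 never carries an item: it is the end marker when c30 is filled, else padding
    row.append(PAD if cols[ROW_WIDTH - 2] is None else SEP)
    return row


short_row = encode_order([7, 8, 9])
long_row = encode_order(list(range(200, 240)))  # 40 items, longer than the row
print(short_row)
print(long_row)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Note that orders with more than 30 products are truncated: only the first 30 IDs are kept, and the last column holds the end marker.</p>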

In [None]:
prepared_ds = NPath(
    data1=orders_with_seq_ids,
    data1_partition_column="order_id",
    data1_order_column="add_to_cart_order",
    mode="NONOVERLAPPING",
    pattern="A*",
    symbols="TRUE as A",
    result=["FIRST (bgn OF A) AS c0",
            "NTH (seq_product_id, 1 OF A) as c1",
            "NTH (seq_product_id, 2 OF A) as c2",
            "NTH (seq_product_id, 3 OF A) as c3",
            "NTH (seq_product_id, 4 OF A) as c4",
            "NTH (seq_product_id, 5 OF A) as c5",
            "NTH (seq_product_id, 6 OF A) as c6",
            "NTH (seq_product_id, 7 OF A) as c7",
            "NTH (seq_product_id, 8 OF A) as c8",
            "NTH (seq_product_id, 9 OF A) as c9",
            "NTH (seq_product_id, 10 OF A) as c10",
            "NTH (seq_product_id, 11 OF A) as c11",
            "NTH (seq_product_id, 12 OF A) as c12",
            "NTH (seq_product_id, 13 OF A) as c13",
            "NTH (seq_product_id, 14 OF A) as c14",
            "NTH (seq_product_id, 15 OF A) as c15",
            "NTH (seq_product_id, 16 OF A) as c16",
            "NTH (seq_product_id, 17 OF A) as c17",
            "NTH (seq_product_id, 18 OF A) as c18",
            "NTH (seq_product_id, 19 OF A) as c19",
            "NTH (seq_product_id, 20 OF A) as c20",
            "NTH (seq_product_id, 21 OF A) as c21",
            "NTH (seq_product_id, 22 OF A) as c22",
            "NTH (seq_product_id, 23 OF A) as c23",
            "NTH (seq_product_id, 24 OF A) as c24",
            "NTH (seq_product_id, 25 OF A) as c25",
            "NTH (seq_product_id, 26 OF A) as c26",
            "NTH (seq_product_id, 27 OF A) as c27",
            "NTH (seq_product_id, 28 OF A) as c28",
            "NTH (seq_product_id, 29 OF A) as c29",
            "NTH (seq_product_id, 30 OF A) as c30",
            "NTH (seq_product_id, 31 OF A) as c31"
    ]
).result
prepared_ds.to_sql("prepared_ds", if_exists="replace")
prepared_ds

<div class="alert alert-block alert-info">
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note: If you do not have AzureML, please click <a href="#no-azure">here</a> to skip.</b></i></p>
</div>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following video will guide you through the process of creating a workspace, a storage account, a container, and obtaining the access ID and access key in Microsoft Azure.</p>
<video controls width="800" height="500" src="https://storage.googleapis.com/clearscape_analytics_videos/AzureML-Workspace-Creation.mp4"></video>

In [None]:
# Please edit the following to enter appropriate credentials.
location = "your_location"
access_id = "your_access_id"
access_key = "your_access_key"

In [None]:
query = """
SELECT NodeId, AmpId, Sequence, ObjectName, ObjectSize, RecordCount
FROM WRITE_NOS (
    ON (
        SELECT
            c0,
            c1,
            coalesce(c2, CASE WHEN c1 IS NULL THEN 0 ELSE 102 END) c2,
            coalesce(c3, CASE WHEN c2 IS NULL THEN 0 ELSE 102 END) c3,
            coalesce(c4, CASE WHEN c3 IS NULL THEN 0 ELSE 102 END) c4,
            coalesce(c5, CASE WHEN c4 IS NULL THEN 0 ELSE 102 END) c5,
            coalesce(c6, CASE WHEN c5 IS NULL THEN 0 ELSE 102 END) c6,
            coalesce(c7, CASE WHEN c6 IS NULL THEN 0 ELSE 102 END) c7,
            coalesce(c8, CASE WHEN c7 IS NULL THEN 0 ELSE 102 END) c8,
            coalesce(c9, CASE WHEN c8 IS NULL THEN 0 ELSE 102 END) c9,
            coalesce(c10, CASE WHEN c9 IS NULL THEN 0 ELSE 102 END) c10,
            coalesce(c11, CASE WHEN c10 IS NULL THEN 0 ELSE 102 END) c11,
            coalesce(c12, CASE WHEN c11 IS NULL THEN 0 ELSE 102 END) c12,
            coalesce(c13, CASE WHEN c12 IS NULL THEN 0 ELSE 102 END) c13,
            coalesce(c14, CASE WHEN c13 IS NULL THEN 0 ELSE 102 END) c14,
            coalesce(c15, CASE WHEN c14 IS NULL THEN 0 ELSE 102 END) c15,
            coalesce(c16, CASE WHEN c15 IS NULL THEN 0 ELSE 102 END) c16,
            coalesce(c17, CASE WHEN c16 IS NULL THEN 0 ELSE 102 END) c17,
            coalesce(c18, CASE WHEN c17 IS NULL THEN 0 ELSE 102 END) c18,
            coalesce(c19, CASE WHEN c18 IS NULL THEN 0 ELSE 102 END) c19,
            coalesce(c20, CASE WHEN c19 IS NULL THEN 0 ELSE 102 END) c20,
            coalesce(c21, CASE WHEN c20 IS NULL THEN 0 ELSE 102 END) c21,
            coalesce(c22, CASE WHEN c21 IS NULL THEN 0 ELSE 102 END) c22,
            coalesce(c23, CASE WHEN c22 IS NULL THEN 0 ELSE 102 END) c23,
            coalesce(c24, CASE WHEN c23 IS NULL THEN 0 ELSE 102 END) c24,
            coalesce(c25, CASE WHEN c24 IS NULL THEN 0 ELSE 102 END) c25,
            coalesce(c26, CASE WHEN c25 IS NULL THEN 0 ELSE 102 END) c26,
            coalesce(c27, CASE WHEN c26 IS NULL THEN 0 ELSE 102 END) c27,
            coalesce(c28, CASE WHEN c27 IS NULL THEN 0 ELSE 102 END) c28,
            coalesce(c29, CASE WHEN c28 IS NULL THEN 0 ELSE 102 END) c29,
            coalesce(c30, CASE WHEN c29 IS NULL THEN 0 ELSE 102 END) c30,
            CASE WHEN c30 IS NULL THEN 0 ELSE 102 END c31
        FROM prepared_ds
    )
    USING
    LOCATION('%s')
    AUTHORIZATION('{"Access_ID": "%s", "Access_Key": "%s"}')
    STOREDAS('PARQUET')
    COMPRESSION('GZIP')
    NAMING('RANGE')
    INCLUDE_ORDERING('TRUE')
    MAXOBJECTSIZE('4MB')
) AS d 
ORDER BY AmpId;
"""

query = query % (location, access_id, access_key)
execute_sql(query)

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>4. Large Language Model Training</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following video will guide you through the process of creating a data asset, an environment, a job, and running the job to get the model in ONNX format.</p>
<video controls width="800" height="500" src="https://storage.googleapis.com/clearscape_analytics_videos/AzureML-Job.mp4"></video>
<!-- ![business problem](images/bertrec_llm_technical.png) -->

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>5. Deploying Model in Vantage</b>
<div class="alert alert-block alert-info" id="no-azure">
    <p style = 'font-size:16px;font-family:Arial;color:#00233C'><i><b>Note</b>: If you do not have AzureML or did not perform the above steps, the following cell will do the required setup to run the remaining notebook.</i></p>
</div>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The model's ONNX file can now be imported into a table in Vantage.</p>

In [None]:
# Load the ONNX file into Vantage
model_id = 'bert'
model_file = 'plum_flower.onnx'
table_name = 'onnx_models'

if not get_connection().dialect.has_table(get_connection(), table_name):
    try:
        save_byom(model_id = model_id, model_file = model_file, table_name = table_name)
    except Exception as e:
        # if our model exists, delete and rewrite
        if str(e.args).find('TDML_2200') >= 1:
            delete_byom(model_id = model_id, table_name = table_name)
            save_byom(model_id = model_id, model_file = model_file, table_name = table_name)
        else:
            raise ValueError(f"Unable to save the model '{model_id}' in '{table_name}' due to the following error: {e}")

# Show the onnx_models table
list_byom(table_name)

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>6. Generating Recommendation</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Now we are ready to generate recommendations to delight our customers. We will use the Vantage function ONNXPredict.</p>
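<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The recommendation query defined in the next cell post-processes the model output in two steps: REGEXP_SUBSTR pulls the first run of digits and commas out of <code>json_report</code>, and STRTOK_SPLIT_TO_TABLE splits that run into one row per ID, numbered from 1 in <code>tokennum</code>. The sketch below reproduces those two steps in plain Python. The function name <code>ranked_ids_from_report</code> and the sample report string are hypothetical: the real layout of <code>json_report</code> depends on the model and is not shown in this notebook.</p>

```python
import re


def ranked_ids_from_report(json_report):
    """Return (tokennum, seq_product_id) pairs from the first digits-and-commas run."""
    # Same pattern as REGEXP_SUBSTR(json_report, '([0-9,]+)', 1, 1, 'c')
    match = re.search(r"[0-9,]+", json_report)
    if match is None:
        return []
    # Like STRTOK_SPLIT_TO_TABLE: split on commas, skip empty tokens, number from 1
    tokens = [tok for tok in match.group(0).split(",") if tok]
    return [(rank, int(tok)) for rank, tok in enumerate(tokens, start=1)]


# Hypothetical report; the key name and nesting are assumptions for illustration.
report = '{"top_k":[[12,7,33]]}'
ranked = ranked_ids_from_report(report)
print(ranked)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Because only the first matching run is used, a digit appearing earlier in the report (for example in a key name) would be picked up instead of the ID list, so this approach relies on the report having a predictable layout.</p>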

In [None]:
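def collect_seq_ids(product_names):
    # Reverse the basket so that the last product added comes first
    product_names_reversed = product_names[::-1]

    # Build a quoted, comma-separated list for the SQL IN clause
    prod_lst_string = "'" + "','".join(product_names_reversed) + "'"

    # Look up the sequential ID of every product in the basket
    sql_res = DataFrame.from_query("""
    select 
        p.product_name,
        pitspi.seq_product_id
    from
        DEMO_Grocery_Data.products p join
        DEMO_Grocery_Data.product_id_to_seq_product_id pitspi on p.product_id = pitspi.product_id
    where product_name in (%s)
    """ % prod_lst_string).to_pandas()

    # Place each sequential ID at the position of its product in the reversed basket
    result = [None] * len(product_names_reversed)
    for index, row in sql_res.iterrows():
        result[product_names_reversed.index(row["product_name"])] = row["seq_product_id"]

    # Drop products that were not found in the lookup table
    return [x for x in result if x is not None]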

def generate_select_tensor(seq_ids, tensor_length):
    result_list = [
        # "101 as input_0_0",
        "103 as input_0_0"]

    for i in range(0,tensor_length - 1):
        if i<len(seq_ids):
            val = seq_ids[i]
        # elif i == len(seq_ids):
        #     val = 102
        else:
            val = 0

        result_list.append(f"%d as input_0_%d"%(val, i+1))
    return "select\n" + ",\n".join(result_list)

def get_recomendations(prouct_names, user_id, rec_number = 3, overwrite_cache = True):
    seq_ids = collect_seq_ids(prouct_names)

    select_tensor = generate_select_tensor(seq_ids, 32)

    query = f"""
    select
    top %d
        tokennum as num, 
        COALESCE(utp.product_name, p.product_name) as product_name, 
        COALESCE(utp.product_id, p.product_id) as product_id, 
        COALESCE(utp.department_id, p.department_id) as department_id,
        COALESCE(utp.aisle_id, p.aisle_id) as aisle_id
    from
    (
        with tbl as
        (
            select
                REGEXP_SUBSTR(json_report,'([0-9,]+)', 1, 1, 'c') score
            from
                mldb.ONNXPredict(
                on (%s)
                    on (select * from %s where model_id = '%s') dimension
                    using
                        Accumulate('input_0_0')
                        %s
                ) a
            )
        SELECT 
            tokennum, 
            cast(seq_product_id as int) seq_product_id
        FROM TABLE (STRTOK_SPLIT_TO_TABLE(1, tbl.score, ',')
            RETURNS (outkey INTEGER,
                    tokennum INTEGER,
                    seq_product_id VARCHAR(30) CHARACTER SET UNICODE)
                ) AS d
        ) f
        join DEMO_Grocery_Data.product_id_to_seq_product_id pitspi on pitspi.seq_product_id = f.seq_product_id
        join DEMO_Grocery_Data.products p on p.product_id = pitspi.product_id
        left join DEMO_Grocery_Data.user_top_products utp on utp.user_id = %d and utp.department_id = p.department_id 
    where COALESCE(utp.product_name, p.product_name) not in (%s)
    order by tokennum
    """%(
        rec_number,
        select_tensor, 
        table_name, 
        model_id, 
        f"OverwriteCachedModel('%s')"%model_id if overwrite_cache else "",
        user_id,
        "'"+"','".join(prouct_names[::-1])+"'"
        )

    # print(query)
    result = {}

    sql_res = DataFrame.from_query(query).to_pandas()

    for index, row in sql_res.iterrows():
        result[row["num"]] = {"product_name": row["product_name"], "product_id": row["product_id"], "department_id": row["department_id"], "aisle_id": row["aisle_id"]}

    return result

In [None]:
# Breakfast guy

recomendations_dict = get_recomendations(['Egg', 'Center Cut Bacon', 'Limited Edition Pumpkin Spice Cheerios Cereal'], 170516)
pd.DataFrame.from_dict(recomendations_dict, orient='index').sort_index()

In [None]:
# Hard day's night guy

recomendations_dict = get_recomendations(['Ultra Light Beer'], 170464)
pd.DataFrame.from_dict(recomendations_dict, orient='index').sort_index()

In [None]:
# Italian cocktails lover

recomendations_dict = get_recomendations(['Gelato Dessert, Amalfi Lemon', 'Prosecco'], 129098)
pd.DataFrame.from_dict(recomendations_dict, orient='index').sort_index()

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Conclusion</b></p>

<div style="font-size: 16px; font-family: Arial;color:#00233C">
    <p>The powerful combination of Vantage and AzureML opens up possibilities for creating end-to-end generative AI pipelines. Leveraging in-database functions on Vantage facilitates efficient data exploration and preparation.</p>
    <p>The training process for Large Language Models (LLMs) benefits from AzureML GPU instances, enabling faster and more effective model development. Once trained, the LLM can be deployed as an ONNX model within Vantage.</p>
    <p>With the deployed model, you can scale its operationalization and seamlessly integrate it with your operational data, unlocking the potential for enhanced language-based recommendations and other innovative applications.</p>
</div>

<!-- ![pipeline](images/bertrec_pipeline.png) -->

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>7. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Clean up work tables to prevent errors next time.</p>

In [None]:
db_drop_table(table_name = 'onnx_models', schema_name = 'demo_user')

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up tables and databases created above.</p>

In [None]:
%run -i ../run_procedure.py "call remove_data('DEMO_Grocery_Data');"        # Takes 10 seconds

In [None]:
remove_context()

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2023, 2024. All Rights Reserved
        </div>
    </div>
</footer>