<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       IVSM Banking Customer Churn Model Install
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>Import the required libraries</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [7]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Standard libraries
import os
import getpass
import json

# PDF loader and text processing
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.callbacks.manager import CallbackManager

# Language model
from langchain.llms import Ollama

# Data handling
import pandas as pd
import teradataml as tdml

# ONNX runtime and tools
import onnx
import onnxruntime as rt
from onnxruntime.tools.onnx_model_utils import *

# Transformers and sentence similarity
import transformers
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim


In [2]:
tdml.configure.val_install_location = "VAL_USER"

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>1. Initiate a connection to Vantage</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [3]:
# Change host and/or username as needed
engine = tdml.create_context(
    host='host.docker.internal',
    username='demo_user',
    password=getpass.getpass(prompt='Password:'),
    logmech="TD2",
    encryptdata=True,
    database='demo_user'
)

Password: ···········


<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.1 Creation of functions</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Below command will create the database and functions required for text summarization and embedding models using Huggingface PyTorch models in Vantage.</p>

In [8]:
with open("commands.json", "r") as file:
    data = json.load(file)

for item in data["queries"]:
    try:
        execute_sql(item["query"])
    except Exception as e:
        print(
            f"The initialization steps have already been executed for this environment!"
        )
        # print(f"Error: {e}")
        pass

The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already been executed for this environment!
The initialization steps have already 

<b style = 'font-size:18px;font-family:Arial;color:#00233C'>1.2 Drop Tables (if exist)</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Attempts to drop <code>embeddings_models</code> and <code>embeddings_tokenizers</code> tables, ignoring errors if they don't exist.</p>

In [9]:
# Drop embeddings-related tables if they exist
SQL = [
    "DROP TABLE embeddings_models;",
    "DROP TABLE embeddings_tokenizers;"
]

for query in SQL:
    try:
        tdml.execute_sql(query)
    except:
        pass  # Suppress any errors if the tables do not exist

!optimum-cli export onnx --opset 16 --trust-remote-code -m BAAI/bge-small-en-v1.5 bge-small-en-v1.5-onnx

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Model Preparation</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In the below steps we will fix dynamic dims, fix versions for compatibility, etc and prepare the model to load in Vantage.</p>

In [10]:
# Set the operator set version
op = onnx.OperatorSetIdProto()
op.version = 16

# Load the original ONNX model
model = onnx.load('bge-small-en-v1.5-onnx/model.onnx')

# Create a new model with a specified IR version and opset
model_ir8 = onnx.helper.make_model(
    model.graph,
    ir_version=8,
    opset_imports=[op]
)

# Fix variable dimension sizes
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "batch_size", 1)
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "sequence_length", 512)
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "Divsentence_embedding_dim_1", 384)

# Remove the unnecessary "token_embeddings" output
for node in model_ir8.graph.output:
    if node.name == "token_embeddings":
        model_ir8.graph.output.remove(node)

# Save the updated model
onnx.save(model_ir8, 'bge-small-en-v1.5-onnx/model_fixed.onnx')

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>2.1 Model Results validation</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Checking that everything works with ONNX format locally.</p>

In [11]:
sentences_1 = u'How is the weather today?'
sentences_2 = u'What is the current weather like today?'

In [12]:
# Load the tokenizer and ONNX model session
tokenizer = transformers.AutoTokenizer.from_pretrained("./bge-small-en-v1.5-onnx")
predef_sess = rt.InferenceSession("bge-small-en-v1.5-onnx/model_fixed.onnx")

In [13]:
# Tokenize the first sentence
enc = tokenizer(sentences_1, max_length=512, padding='max_length')

# Run inference to get embeddings for the first sentence
result = predef_sess.run(
    None,
    {
        "input_ids": [enc.input_ids],
        "attention_mask": [enc.attention_mask]
    }
)

# Tokenize the second sentence
enc2 = tokenizer(sentences_2, max_length=512, padding='max_length')

# Run inference to get embeddings for the second sentence
result2 = predef_sess.run(
    None,
    {
        "input_ids": [enc2.input_ids],
        "attention_mask": [enc2.attention_mask]
    }
)

In [14]:
print(cos_sim(result[0][0], result2[0][0]))

tensor([[0.9186]])


In [15]:
# Load the SentenceTransformer model
model = SentenceTransformer('BAAI/bge-small-en-v1.5')

# Generate normalized embeddings for both sentences
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)

# Print the cosine similarity between the two embeddings
print(cos_sim(embeddings_1, embeddings_2))

tensor([[0.9186]])


<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>2.2 Save the Model</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In above steps, we have checked that the model is working fine in ONNX format. Now we will save the model file.</p>

In [16]:
tdml.save_byom('bge-small-en-v1.5',
              'bge-small-en-v1.5-onnx/model_fixed.onnx',
              'embeddings_models')

Created the model table 'embeddings_models' as it does not exist.
Model is saved.


In [17]:
tdml.save_byom('bge-small-en-v1.5',
              'bge-small-en-v1.5-onnx/tokenizer.json',
              'embeddings_tokenizers')

Created the model table 'embeddings_tokenizers' as it does not exist.
Model is saved.


<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>3. Cleanup</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will remove the context.</p>

In [18]:
tdml.remove_context()

True

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>