<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       IVSM Banking Customer Churn Model Install
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial'>
Hugging Face is a French-American company based in New York City that develops tools for building machine learning applications. It is best known for its <b>Transformers library</b>, which provides open-source implementations of transformer models for text, image, video, audio, and time-series tasks. These models include well-known architectures such as BERT and GPT. The library is compatible with the PyTorch, TensorFlow, and JAX deep learning libraries. <br>
    Deep learning models on Hugging Face are pre-trained by individual users, open-source groups, and companies on many types of data, including text (NLP), audio, images, and video. The most popular tool is PyTorch, an open-source Python library that can be used to build a deep learning model from scratch, or to take an existing model and retrain or fine-tune it (transfer learning) on a new dataset before publishing it on Hugging Face. Models can run inference on both CPUs and GPUs; for smaller models, the performance gain from a GPU is slight.<br>
</p>
<p style = 'font-size:18px;font-family:Arial'><b>Why Vantage?</b></p>  
<p style = 'font-size:16px;font-family:Arial'>As many Hugging Face models are available in <b>ONNX</b> format, we can load them using the <b>BYOM</b> (Bring Your Own Model) feature of Vantage and run them in Vantage. Because of the <b>graph optimizations</b> in ONNX Runtime, benchmarks show that inference with <b>ONNX Runtime can be about 20% faster than a native PyTorch model on a CPU</b>. </p>
    
<p style = 'font-size:16px;font-family:Arial'><b>Vantage parallelism</b> on top of the faster ONNX Runtime inference can make a Vantage system as effective for inference as GPUs. On a <b>Vantage system with 72 AMPs</b>, assuming the table is evenly distributed, inference can <b>closely match the performance of a dedicated GPU, and the data never moves across the network, which saves time and I/O operations</b>. Because parallelism increases with the number of AMPs, model inference on the same amount of text data completes faster in Teradata Vantage, relative to a GPU, as AMPs are added. We can also quantize the model (convert float32 weights to int8/int4) to make CPU inference even faster, with some tradeoff in accuracy. However, as model size goes up, the GPU advantage widens, for example with an LLM such as Llama 3, and the costs will be disproportionate compared with a GPU. For smaller models, we can get comparable performance.
</p>
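
<p style = 'font-size:16px;font-family:Arial'>To make the quantization tradeoff concrete, the following is a minimal sketch of symmetric int8 quantization in plain Python. The weight values are hypothetical, and real toolchains typically quantize per tensor or per channel with more elaborate schemes; the sketch only illustrates why converting floating-point weights to int8 introduces a small rounding error.</p>

```python
# Minimal sketch of symmetric int8 quantization (illustrative only).

def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


# Hypothetical weights, not taken from any real model.
weights = [0.824, -0.413, 0.05, -1.27, 0.337]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized)
print(f"largest rounding error: {max_error:.4f}")
```

<p style = 'font-size:16px;font-family:Arial'>Each restored weight differs from the original by at most half the scale, which is the accuracy cost paid for the smaller integer representation.</p>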

<p style = 'font-size:18px;font-family:Arial'><b>Overall flow:</b></p>  

<center><img src="./images/pat1.png" alt="Design pattern 1" width=1200 height=900/></center>

<hr style='height:2px;border:none'>
<b style = 'font-size:20px;font-family:Arial'>1. Configuring the environment</b>

<p style = 'font-size:18px;font-family:Arial'><b>1.1 Install the required libraries</b></p>

In [None]:
%%capture

!pip install optimum sentence_transformers==4.0.2

In [None]:
%%capture

!pip install --upgrade torch

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note: </b><i>Please restart the kernel after executing the two install cells above. The simplest way to restart the kernel is to press the zero key twice: <b> 0 0</b>
</i>
    <br>You can remove or comment out <b>%%capture</b> if you want to observe what <i>!pip install</i> is doing. </p>
</div>
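
<p style = 'font-size:16px;font-family:Arial'>After restarting the kernel, it can be useful to confirm which versions were actually installed. The sketch below uses only the standard library; the package list is an assumption based on the libraries this notebook imports, and the helper name <code>installed_versions</code> is hypothetical.</p>

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None when it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names as published on PyPI (assumed list for this notebook).
packages = ["optimum", "sentence-transformers", "torch", "onnx", "onnxruntime"]

for name, version in installed_versions(packages).items():
    print(f"{name}: {version or 'not installed'}")
```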

<hr style="height:2px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>1.2 Import the required libraries</b></p>

<p style = 'font-size:16px;font-family:Arial'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Standard libraries
import os
import getpass
import json

# PDF loader and text processing
# from langchain_community.document_loaders import PyPDFLoader
# from langchain.text_splitter import RecursiveCharacterTextSplitter
# from langchain.callbacks.manager import CallbackManager

# # Language model
# from langchain.llms import Ollama

# Data handling
import pandas as pd
import teradataml as tdml

# ONNX runtime and tools
import onnx
import onnxruntime as rt
from onnxruntime.tools.onnx_model_utils import make_dim_param_fixed

# Transformers and sentence similarity
import transformers
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

from teradataml import (
    create_context,
    delete_byom,
    display,
    execute_sql,
    save_byom,
    remove_context,
)

In [None]:
tdml.configure.val_install_location = "val"

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>2. Connect to Vantage</b></p>

<p style = 'font-size:16px;font-family:Arial'>We will be prompted to provide the password. Enter the password, press the Enter key, and then use the down arrow to move to the next cell. Run each step with Shift + Enter.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>3. Creation of functions</b></p>
<p style = 'font-size:16px;font-family:Arial'>The command below creates the database and functions required to run text summarization and embedding models from Hugging Face PyTorch models in Vantage.</p>

In [None]:
with open("commands.json", "r") as file:
    data = json.load(file)

for item in data["queries"]:
    try:
        execute_sql(item["query"])
    except Exception as e:
        # A failure here is treated as "setup objects already exist"
        print(
            "The initialization steps have already been executed for this environment!"
        )
        # print(f"Error: {e}")  # Uncomment to see the underlying error
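
<p style = 'font-size:16px;font-family:Arial'>The loop above expects <code>commands.json</code> to hold a <code>queries</code> list whose items each have a <code>query</code> string. The sketch below shows that layout and a variant of the loop that records which statements failed and why. The SQL statements and the <code>fake_run_sql</code> stand-in are hypothetical; in the notebook, <code>execute_sql</code> would be passed instead.</p>

```python
import json


def run_setup_queries(config_text, run_sql):
    """Run every query in the JSON text; return (succeeded, failed) lists."""
    succeeded, failed = [], []
    for item in json.loads(config_text)["queries"]:
        try:
            run_sql(item["query"])
            succeeded.append(item["query"])
        except Exception as exc:
            # Keep the reason so real problems are not hidden.
            failed.append((item["query"], str(exc)))
    return succeeded, failed


# Hypothetical content; the real statements live in commands.json.
example_config = """
{
  "queries": [
    {"query": "CREATE TABLE example_a (id INTEGER);"},
    {"query": "CREATE TABLE example_b (id INTEGER);"}
  ]
}
"""


def fake_run_sql(query):
    """Stand-in for execute_sql that fails on one statement."""
    if "example_a" in query:
        raise RuntimeError("table already exists")


succeeded, failed = run_setup_queries(example_config, fake_run_sql)
print(f"{len(succeeded)} succeeded, {len(failed)} failed")
for query, reason in failed:
    print(f"skipped: {query} ({reason})")
```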

<b style = 'font-size:18px;font-family:Arial'>3.1 Drop Tables (if they exist)</b>
<p style = 'font-size:16px;font-family:Arial'>Attempts to drop <code>embeddings_models</code> and <code>embeddings_tokenizers</code> tables, ignoring errors if they don't exist.</p>

In [None]:
# Drop embeddings-related tables if they exist
SQL = [
    "DROP TABLE embeddings_models;",
    "DROP TABLE embeddings_tokenizers;"
]

for query in SQL:
    try:
        tdml.execute_sql(query)
    except Exception:
        pass  # Suppress any errors if the tables do not exist
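
<p style = 'font-size:16px;font-family:Arial'>Suppressing every error can hide real problems such as missing permissions. As a sketch of a stricter alternative, the helper below ignores only the "object does not exist" failure, which Teradata reports as error 3807, and re-raises anything else. It assumes the error code appears in the exception message; <code>drop_if_exists</code> and <code>fake_run_sql</code> are hypothetical names, and in the notebook <code>tdml.execute_sql</code> would be passed as <code>run_sql</code>.</p>

```python
def drop_if_exists(table_names, run_sql):
    """Drop each table; ignore only 'object does not exist' failures."""
    dropped = []
    for table in table_names:
        try:
            run_sql(f"DROP TABLE {table};")
            dropped.append(table)
        except Exception as exc:
            # Assumption: the Teradata error code is part of the message text.
            if "3807" not in str(exc):
                raise
    return dropped


def fake_run_sql(query):
    """Stand-in for tdml.execute_sql: pretends one table is missing."""
    if "embeddings_tokenizers" in query:
        raise RuntimeError(
            "[Error 3807] Object 'embeddings_tokenizers' does not exist."
        )


dropped = drop_if_exists(["embeddings_models", "embeddings_tokenizers"], fake_run_sql)
print(f"dropped: {dropped}")
```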

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>4. HuggingFace Model installation</b></p>
<p style = 'font-size:16px;font-family:Arial'>In the steps below, we will download the Hugging Face model and install it in Vantage.</p> 

<hr style="height:1px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>4.1 Download the Model using the Optimum utility</b></p>

<p style = 'font-size:16px;font-family:Arial'>We will be using <a href = 'https://huggingface.co/BAAI/bge-small-en-v1.5'>BAAI/bge-small-en-v1.5</a>.<br> The bge-small-en model is a small English text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) as part of their FlagEmbedding project.</p>

In [None]:
!optimum-cli export onnx --opset 16 --trust-remote-code -m BAAI/bge-small-en-v1.5 bge-small-en-v1.5-onnx
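
<p style = 'font-size:16px;font-family:Arial'>Before moving on, it may help to confirm that the export produced the files used later in this notebook. The sketch below checks the output folder with the standard library only. The file name <code>model.onnx</code> is the one loaded in the next step; <code>tokenizer.json</code> is an assumption about what the export writes, and <code>missing_export_files</code> is a hypothetical helper.</p>

```python
from pathlib import Path


def missing_export_files(export_dir, required=("model.onnx", "tokenizer.json")):
    """Return the required file names that are absent from export_dir."""
    folder = Path(export_dir)
    return [name for name in required if not (folder / name).is_file()]


missing = missing_export_files("bge-small-en-v1.5-onnx")
print("export looks complete" if not missing else f"missing files: {missing}")
```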

<hr style="height:1px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>4.2 Model Preparation</b></p>
<p style = 'font-size:16px;font-family:Arial'>In the steps below, we will replace the dynamic dimensions with fixed sizes, set the IR and opset versions for compatibility, drop the unused output, and prepare the model for loading into Vantage.</p>

In [None]:
# Set the operator set version
op = onnx.OperatorSetIdProto()
op.version = 16

# Load the original ONNX model
model = onnx.load('bge-small-en-v1.5-onnx/model.onnx')

# Create a new model with a specified IR version and opset
model_ir8 = onnx.helper.make_model(
    model.graph,
    ir_version=8,
    opset_imports=[op]
)

# Fix variable dimension sizes
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "batch_size", 1)
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "sequence_length", 512)
rt.tools.onnx_model_utils.make_dim_param_fixed(model_ir8.graph, "Divsentence_embedding_dim_1", 384)

# Remove the unnecessary "token_embeddings" output
# (iterate over a copy, because removing items while iterating can skip entries)
for output in list(model_ir8.graph.output):
    if output.name == "token_embeddings":
        model_ir8.graph.output.remove(output)

# Save the updated model
onnx.save(model_ir8, 'bge-small-en-v1.5-onnx/model_fixed.onnx')
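
<p style = 'font-size:16px;font-family:Arial'>To illustrate what fixing a dynamic dimension means, the following plain-Python sketch replaces symbolic dimension names with the sizes used above. It uses ordinary lists as stand-ins for tensor shapes and does not call the ONNX API; the tensor names and shapes shown are assumptions made for illustration.</p>

```python
def fix_dims(shape, fixed_values):
    """Replace symbolic dimension names with fixed sizes where one is known."""
    return [fixed_values.get(dim, dim) for dim in shape]


# Sizes taken from the preparation step above.
fixed_values = {
    "batch_size": 1,
    "sequence_length": 512,
    "Divsentence_embedding_dim_1": 384,
}

# Assumed symbolic shapes of the exported model's tensors.
symbolic_shapes = {
    "input_ids": ["batch_size", "sequence_length"],
    "attention_mask": ["batch_size", "sequence_length"],
    "sentence_embedding": ["batch_size", "Divsentence_embedding_dim_1"],
}

fixed_shapes = {
    name: fix_dims(shape, fixed_values) for name, shape in symbolic_shapes.items()
}

for name, shape in fixed_shapes.items():
    print(f"{name}: {symbolic_shapes[name]} -> {shape}")
```

<p style = 'font-size:16px;font-family:Arial'>With a batch size of 1 and a sequence length of 512, every input must be padded to exactly 512 tokens, which is why the validation step tokenizes with <code>padding='max_length'</code>.</p>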

<hr style="height:1px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>4.3 Model Results validation</b></p>
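<p style = 'font-size:16px;font-family:Arial'>Here, we check that the exported ONNX model works locally. The cosine similarity computed from its embeddings should closely match the one computed with the original SentenceTransformer model.</p>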

In [None]:
sentences_1 = 'How is the weather today?'
sentences_2 = 'What is the current weather like today?'

In [None]:
# Load the tokenizer and ONNX model session
tokenizer = transformers.AutoTokenizer.from_pretrained("./bge-small-en-v1.5-onnx")
predef_sess = rt.InferenceSession("bge-small-en-v1.5-onnx/model_fixed.onnx")

In [None]:
# Tokenize the first sentence
enc = tokenizer(sentences_1, max_length=512, padding='max_length')

# Run inference to get embeddings for the first sentence
result = predef_sess.run(
    None,
    {
        "input_ids": [enc.input_ids],
        "attention_mask": [enc.attention_mask]
    }
)

# Tokenize the second sentence
enc2 = tokenizer(sentences_2, max_length=512, padding='max_length')

# Run inference to get embeddings for the second sentence
result2 = predef_sess.run(
    None,
    {
        "input_ids": [enc2.input_ids],
        "attention_mask": [enc2.attention_mask]
    }
)

In [None]:
print(cos_sim(result[0][0], result2[0][0]))

In [None]:
# Load the SentenceTransformer model
model = SentenceTransformer('BAAI/bge-small-en-v1.5')

# Generate normalized embeddings for both sentences
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)

# Print the cosine similarity between the two embeddings
print(cos_sim(embeddings_1, embeddings_2))
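
<p style = 'font-size:16px;font-family:Arial'>The two printed scores are cosine similarities, and the validation passes when they are nearly equal. The sketch below spells out the formula in plain Python and adds a simple tolerance check. The example vectors are hypothetical (real bge-small embeddings have 384 values), and the tolerance of 0.001 is an assumed threshold, not a documented one.</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def scores_agree(onnx_score, reference_score, tolerance=1e-3):
    """True when the two similarity scores differ by at most `tolerance`."""
    return math.isclose(onnx_score, reference_score, rel_tol=0.0, abs_tol=tolerance)


# Hypothetical 3-dimensional vectors pointing in the same direction.
vec_a = [1.0, 2.0, 2.0]
vec_b = [2.0, 4.0, 4.0]

print(cosine_similarity(vec_a, vec_b))
print(scores_agree(0.9000, 0.9004))
```

<p style = 'font-size:16px;font-family:Arial'>Because cosine similarity depends only on direction, it gives the same result whether or not the embeddings are normalized, so the raw ONNX output can be compared with the normalized SentenceTransformer output.</p>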

<hr style="height:1px;border:none">
<p style = 'font-size:18px;font-family:Arial'><b>4.4 Save the Model</b></p>
<p style = 'font-size:16px;font-family:Arial'>In the steps above, we checked that the model works correctly in ONNX format. Now we will save the model file to Vantage.</p>

In [None]:
# save_byom takes the model ID, the model file path, and the target table name
try:
    tdml.save_byom(
        'bge-small-en-v1.5',                        # model_id
        'bge-small-en-v1.5-onnx/model_fixed.onnx',  # model_file
        'embeddings_models',                        # table_name
    )
except Exception as e:
    # Typically raised when this model ID already exists in the table
    print(f"Could not save the model bge-small-en-v1.5 (it may already exist): {e}")
          

<hr style="height:2px;border:none">
<b style = 'font-size:20px;font-family:Arial'>5. Cleanup</b>
<p style = 'font-size:16px;font-family:Arial'>The following code will remove the context.</p>

In [None]:
tdml.remove_context()

<footer style="padding-bottom:35px; border-bottom:3px solid ">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>