<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Language Models<br>
   <span style="font-size: 24px;">An Introduction to Parallel CPU Inferencing of HuggingFace Models in Vantage</span>
       
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial;'>
Hugging Face is a French-American company based in New York City that develops computational tools for building machine learning applications. It is best known for its <b>Transformers library</b>, which provides open-source implementations of transformer models for text, image, video, audio, and time-series tasks. These include well-known architectures such as BERT and GPT. The library is compatible with the PyTorch, TensorFlow, and JAX deep learning libraries. <br>
    Deep learning models on Hugging Face are pre-trained by individual users, open-source groups, and companies on many types of data, including text, audio, images, and video. The most popular framework is PyTorch, an open-source Python library that can be used to build a deep learning model from scratch or to take an existing model and retrain or fine-tune it (transfer learning) on a new dataset before publishing it to Hugging Face. Models can be run for inference on either CPUs or GPUs; for smaller models, the performance advantage of a GPU is slight.<br>
</p>
<p style = 'font-size:18px;font-family:Arial;'><b>Why Vantage?</b></p>  
<p style = 'font-size:16px;font-family:Arial;'>Because many Hugging Face models are available in <b>ONNX</b> format, we can load them with the <b>BYOM</b> (Bring Your Own Model) feature of Vantage and run them in-database. Thanks to the <b>graph optimizations</b> in ONNX Runtime, benchmarks have shown that inference with <b>ONNX Runtime can be around 20% faster than a native PyTorch model on a CPU</b>, although the actual gain depends on the model and the hardware. </p>
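<p style = 'font-size:16px;font-family:Arial;'>Speedup figures like this vary with the model, batch size, and hardware, so it can be worth measuring them on our own system. The cell below is a minimal timing sketch and is not part of the original demo: median_latency and dummy_predict are hypothetical names, and the dummy function is a placeholder that should be replaced with real PyTorch and ONNX Runtime inference calls.</p>

```python
import statistics
import time


def median_latency(predict, batch, repeats=20, warmup=3):
    """Return the median wall-clock time in seconds of predict(batch)."""
    for _ in range(warmup):  # warm-up runs are executed but not timed
        predict(batch)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def dummy_predict(batch):
    """Hypothetical stand-in for a model call; replace with real inference."""
    return [len(text.split()) for text in batch]


batch = ["Vantage runs ONNX models in-database."] * 32
baseline = median_latency(dummy_predict, batch)   # would be the PyTorch model
candidate = median_latency(dummy_predict, batch)  # would be the ONNX Runtime session
print(f"baseline:  {baseline * 1e3:.3f} ms")
print(f"candidate: {candidate * 1e3:.3f} ms")
```

<p style = 'font-size:16px;font-family:Arial;'>Using the median rather than the mean keeps a single slow run from distorting the comparison. Dividing the baseline latency by the candidate latency gives the speedup factor.</p>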
    
<p style = 'font-size:16px;font-family:Arial;'><b>Vantage parallelism</b> on top of accelerated ONNX Runtime inference can make a Vantage system as effective as GPU inference. On a <b>Vantage system with 72 AMPs</b>, assuming the table is evenly distributed, inference can <b>closely match the performance of a dedicated GPU, and the data never moves across the network, saving time and I/O operations</b>. Because parallelism increases with the number of AMPs, inference over the same amount of text completes faster as AMPs are added and can overtake a GPU. We can also quantize the model (convert float32 weights to int8 or int4) to make CPU inference faster still, at some cost in accuracy. However, as model size grows, the GPU advantage widens (for example, with an LLM such as Llama 3), and GPU costs rise disproportionately; for smaller models, we can get comparable performance on CPUs.
</p>
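<p style = 'font-size:16px;font-family:Arial;'>To make the effect of AMP parallelism concrete, the following back-of-the-envelope sketch estimates elapsed time when rows are spread across 72 AMPs. It is a simplified model under stated assumptions: the row count and per-row latency are hypothetical, and an MD5-based hash merely stands in for Vantage's actual row-hash distribution, which it does not reproduce.</p>

```python
import hashlib
from collections import Counter


def rows_per_amp(keys, n_amps):
    """Assign each key to an AMP with a stand-in hash (NOT Vantage's real row hash)."""
    counts = Counter()
    for key in keys:
        digest = hashlib.md5(str(key).encode("utf-8")).digest()
        counts[int.from_bytes(digest[:4], "big") % n_amps] += 1
    # Include AMPs that received no rows so the list always has n_amps entries.
    return [counts.get(amp, 0) for amp in range(n_amps)]


def estimated_wall_time(counts, seconds_per_row):
    """AMPs work in parallel, so the busiest AMP sets the elapsed time."""
    return max(counts) * seconds_per_row


n_amps = 72              # AMP count used in the text above
keys = range(100_000)    # hypothetical unique primary-index values
seconds_per_row = 0.01   # hypothetical single-core latency per text row

counts = rows_per_amp(keys, n_amps)
serial = len(keys) * seconds_per_row
parallel = estimated_wall_time(counts, seconds_per_row)
print(f"rows per AMP: min={min(counts)}, max={max(counts)}")
print(f"serial estimate:   {serial:.1f} s")
print(f"parallel estimate: {parallel:.1f} s (speedup about {serial / parallel:.0f}x)")
```

<p style = 'font-size:16px;font-family:Arial;'>Because all AMPs score their rows concurrently, the AMP holding the most rows determines the elapsed time, which is why an evenly distributed table matters for this pattern.</p>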

<p style = 'font-size:18px;font-family:Arial;'><b>Overall flow:</b></p>  

<center><img src="images/pat1.png" alt="Design pattern 1" width=1200 height=900 style="border: 4px solid #404040; border-radius: 10px;"/></center>

<hr style='height:2px;border:none;'>
<b style = 'font-size:20px;font-family:Arial;'>1. Configuring the environment</b>


<p style = 'font-size:16px;font-family:Arial;'>Here, we import the required libraries, set environment variables and environment paths (if required).</p>

In [None]:
# Standard libraries
import warnings
import json

# Teradata libraries
from teradataml import (
    create_context,
    db_drop_table,
    delete_byom,
    display,
    execute_sql,
    save_byom,
    remove_context,
    DataFrame
)

display.max_rows = 5

# Suppress warnings
warnings.filterwarnings("ignore")
warnings.simplefilter(action="ignore", category=DeprecationWarning)
warnings.simplefilter(action="ignore", category=RuntimeWarning)
warnings.simplefilter(action="ignore", category=FutureWarning)

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>2. Connect to Vantage</b></p>

<p style = 'font-size:16px;font-family:Arial;'>We will be prompted to provide the password. We will enter the password, press the Enter key, and then use the down arrow to go to the next cell. Begin running steps with Shift + Enter keys.</p>

In [None]:
%run -i ../startup.ipynb
eng = create_context(host='host.docker.internal', username='demo_user', password=password)
print(eng)

In [None]:
%%capture
execute_sql("SET query_band='DEMO=Language_Model_Init_Python.ipynb;' UPDATE FOR SESSION;")

<p style = 'font-size:16px;font-family:Arial;'>Optional step – We should execute the below step only if we want to see the status of databases/tables created and space used.</p>

In [None]:
%run -i ../run_procedure.py "call space_report();"

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>3. HuggingFace Model installation</b></p>
<p style = 'font-size:16px;font-family:Arial;'>In the steps below, we will download the Hugging Face model and install it in Vantage.</p> 

<p style = 'font-size:16px;font-family:Arial;'>To generate embeddings, we need an ONNX model capable of transforming text into vector representations. We will use a pretrained model from Teradata's Hugging Face repository: <a href = 'https://huggingface.co/Teradata/bge-small-en-v1.5'>bge-small-en-v1.5</a>. The bge-small-en model is a small-scale English text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) as part of their FlagEmbedding project. The model and its tokenizer are downloaded and then stored in Vantage tables as BLOBs using the save_byom function.</p>
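<p style = 'font-size:16px;font-family:Arial;'>As a brief illustration of what these vector representations are used for, the sketch below compares embeddings with cosine similarity. The four-dimensional vectors are invented toy values (the real model outputs 384 dimensions), and cosine_similarity is a helper written for this example, not a Vantage or Hugging Face function.</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for text embeddings (values are invented).
query = [0.9, 0.1, 0.0, 0.2]
similar_doc = [0.8, 0.2, 0.1, 0.3]
unrelated_doc = [0.0, 0.1, 0.9, 0.0]

print(f"query vs similar:   {cosine_similarity(query, similar_doc):.3f}")
print(f"query vs unrelated: {cosine_similarity(query, unrelated_doc):.3f}")
```

<p style = 'font-size:16px;font-family:Arial;'>Texts with related meaning should produce vectors that point in similar directions, so their cosine similarity is closer to 1.</p>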

In [None]:
import os
os.environ["HF_HUB_DISABLE_PROGRESS_BARS"] = "1"
os.environ["HF_HUB_DISABLE_SYMLINKS_WARNING"] = "1"
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "0"

In [None]:
from huggingface_hub import hf_hub_download

model_name = "bge-small-en-v1.5"
number_dimensions_output = 384
model_file_name = "model.onnx"

In [None]:
# Step 1: Download Model from Teradata HuggingFace Page

hf_hub_download(repo_id=f"Teradata/{model_name}", filename=f"onnx/{model_file_name}", local_dir="./")
hf_hub_download(repo_id=f"Teradata/{model_name}", filename="tokenizer.json", local_dir="./")
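
<p style = 'font-size:16px;font-family:Arial;'>Before loading the files into Vantage, we may want to confirm that both downloads landed where the next step expects them. The following is a small optional check, assuming the files were saved relative to the current directory as in the download cell above; report_files is a hypothetical helper written for this example.</p>

```python
from pathlib import Path


def report_files(paths):
    """Map each path to its size in bytes, or None when the file is missing."""
    report = {}
    for p in paths:
        path = Path(p)
        report[str(p)] = path.stat().st_size if path.is_file() else None
    return report


# Relative paths used by the download step and by save_byom.
expected = ["onnx/model.onnx", "tokenizer.json"]
for name, size in report_files(expected).items():
    status = f"{size / 1e6:.1f} MB" if size is not None else "MISSING"
    print(f"{name}: {status}")
```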


In [None]:
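# Drop the model and tokenizer tables left over from a previous run, if any
for table_name in ("embeddings_models", "embeddings_tokenizers"):
    try:
        db_drop_table(table_name)
    except Exception:
        pass  # the table does not exist yet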

In [None]:
# Step 2: Load Models into Vantage
# a) Embedding model
save_byom(
    model_id=model_name,  # must be unique in the models table
    model_file=f"onnx/{model_file_name}",
    table_name="embeddings_models",
)
# b) Tokenizer
save_byom(
    model_id=model_name,  # must be unique in the tokenizers table
    model_file="tokenizer.json",
    table_name="embeddings_tokenizers",
)

<p style = 'font-size:16px;font-family:Arial;'>Verify that the model and tokenizer were installed.</p>

In [None]:
df_model = DataFrame('embeddings_models')
df_model

In [None]:
df_token = DataFrame('embeddings_tokenizers')
df_token

<hr style="height:2px;border:none;">
<p style = 'font-size:20px;font-family:Arial;'><b>4. Next Steps</b></p>
<p style = 'font-size:16px;font-family:Arial;'>Now that we have initialized the environment and loaded the model into Vantage, we can proceed with the business use case.
</p>

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial;'> <b>Clean up </b></p>
<p style = 'font-size:16px;font-family:Arial;'>The following code will remove the context.</p>

In [None]:
remove_context()

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>