<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Unstructured Text Analysis With BYO-LLM and NVIDIA GPU Acceleration
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>


<p style="font-size:20px;font-family:Arial"><b>Introduction:</b></p>

<p style="font-size:16px;font-family:Arial">
This notebook is designed for developers, data scientists, and AI practitioners who want to bring open-source Language Models (LMs) and Large Language Models (LLMs) closer to their data — quickly, securely, and at scale. As organizations race to deploy AI applications, developers face key challenges: selecting the right model, minimizing data movement, ensuring security, and controlling costs.  Teradata’s Bring Your Own LLM (BYO-LLM) capability addresses these challenges by allowing you to deploy open-source <b>Hugging Face</b> models directly inside VantageCloud — where your data already lives.
</p>

<p style="font-size:18px;font-family:Arial"><b>What is BYO-LLM?</b></p>

<p style="font-size:16px;font-family:Arial">
<b>BYO-LLM</b> (Bring Your Own Large Language Model) is a key capability of the VantageCloud Open Analytics Framework (OAF). You will use it in this notebook to control how AI models are deployed in VantageCloud:
</p>

<ul style="font-size:16px;font-family:Arial">
    <li>Seamlessly integrate open-source models from Hugging Face</li>
    <li>Eliminate the need to move data — reducing cost and compliance risk</li>
    <li>Leverage GPU acceleration for inference speeds up to 200x faster than CPU</li>
    <li>Experiment freely without vendor lock-in, while keeping your data secure and your operations scalable</li>
</ul>

<img src="./images/BYOLLM_Flow.png" alt="Architecture for BYOLLM" style="width: 90%; border: 4px solid #404040; border-radius: 10px;"/>
<p style="font-size:18px;font-family:Arial;"><b>Business Impact of Open-Source Language Models (LMs) for NLP Tasks Such as Unstructured Text Analysis</b></p>


<p style="font-size:16px;font-family:Arial">
Language Models (LMs) are the foundation for solving Natural Language Processing (NLP) tasks such as unstructured text analysis — enabling machines to understand, interpret, and generate human language. Language Models enable businesses to:
</p>

<ul style="font-size:16px;font-family:Arial">
    <li>Extract key information from documents, contracts, and research papers</li>
    <li>Analyze customer feedback from emails, reviews, and social media</li>
    <li>Power chatbots and virtual assistants for real-time support</li>
    <li>Personalize marketing and customer experiences based on user interactions</li>
</ul>

<p style="font-size:16px;font-family:Arial">
Through unstructured text analysis and open source LMs, businesses can:
</p>
<ul style="font-size:16px;font-family:Arial">
    <li>Improve operational efficiency through automation</li>
    <li>Gain real-time insights into customer sentiment and behavior</li>
    <li>Respond proactively to customer concerns</li>
    <li>Stay competitive by adapting to market and customer trends</li>
</ul>

<p style="font-size:18px;font-family:Arial;"><b>How to Get Access to Run This Demo in VantageCloud</b></p>

<p style="font-size:16px;font-family:Arial">
Gain free access to Teradata’s <b>Open Analytics Framework</b>, which includes support for <b>BYO-LLM capabilities</b> and <b>GPU compute clusters</b>. This enables you to run open-source Hugging Face models directly within your VantageCloud environment.</p>

<p style="font-size:16px;font-family:Arial">
To request access and be able to execute this demo, send an email to <a href="mailto:Support.ClearScapeAnalytics@Teradata.com">Support.ClearScapeAnalytics@Teradata.com</a>. Our team will provision your environment with the required permissions for BYO-LLM and GPU-accelerated inference.
</p>

<hr style='height:2px;border:none'>
<b style = 'font-size:20px;font-family:Arial'>1. Configure the environment</b>

In [1]:
%%capture
!pip install -r requirements.txt --quiet

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><b>Please</b><i> restart the kernel after executing the above cell so that the installed or updated libraries are loaded into memory for this kernel. The simplest way to restart the kernel is to press zero twice: <b> 0 0</b></i> and then click <b>Restart</b>.</p>
</div>

In [1]:
#Import all the libraries and modules required for this notebook
import teradataml
import getpass
import sys
import pandas as pd
import os
import time

#import teradatagenai
from teradatagenai import TeradataAI, TextAnalyticsAI

from dotenv import load_dotenv, dotenv_values  # dotenv_values is used in the connection cell
from teradataml import *
from teradatasqlalchemy.types import *
from IPython.display import display as ipydisplay
from os.path import expanduser

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>2. Connect to VantageCloud Lake</b></p>
<p style = 'font-size:16px;font-family:Arial'>Connect to VantageCloud using <code>create_context</code> from the teradataml Python library. The cell below reads the connection details (host, username, and password) from a prepared <code>.env</code> file; the Analytic Compute Group is set later in the notebook.</p>

In [2]:
print("Checking if this environment is ready to connect to VantageCloud Lake...")

if os.path.exists("/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env"):
    print("Your environment parameter file exist.  Please proceed with this use case.")
    # Load all the variables from the .env file into a dictionary
    env_vars = dotenv_values("/home/jovyan/JupyterLabRoot/VantageCloud_Lake/.config/.env")
    # Create the Context
    eng = create_context(host=env_vars.get("host"), username=env_vars.get("username"), password=env_vars.get("my_variable"))
    execute_sql('''SET query_band='DEMO=Entity_Recognition_BYOLLM_VCL.ipynb;' UPDATE FOR SESSION;''')
    print("Connected to VantageCloud Lake with:", eng)
else:
    print("Your environment has not been prepared for connecting to VantageCloud Lake.")
    print("Please contact the support team.")

Checking if this environment is ready to connect to VantageCloud Lake...
Your environment parameter file exist.  Please proceed with this use case.
Connected to VantageCloud Lake with: Engine(teradatasql://dallas54-88vgt0b5i55ikpx7:***@54.156.178.22)
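<p style="font-size:16px;font-family:Arial">
Before calling <code>create_context</code>, it can help to confirm that the environment file actually contains the values the connection cell reads. The following is a minimal sketch, not part of the Teradata API: the helper name <code>missing_connection_keys</code> and the sample values are hypothetical, and the key names are simply the ones used in the cell above.
</p>

```python
# Keys read by the connection cell above; adjust if your .env uses other names.
REQUIRED_KEYS = ("host", "username", "my_variable")


def missing_connection_keys(env_vars, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in env_vars."""
    return [key for key in required if not env_vars.get(key)]


# Example with a hypothetical, incomplete configuration (the password is empty).
sample_env = {"host": "example-host", "username": "demo_user", "my_variable": ""}
missing = missing_connection_keys(sample_env)
if missing:
    print("Missing connection settings:", ", ".join(missing))
```

<p style="font-size:16px;font-family:Arial">
In the notebook, the same check could be run on the <code>env_vars</code> dictionary before the connection is attempted.
</p>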


<hr style="height:2px; border:none">

<p style="font-size:20px; font-family:Arial"><b>3. Getting Data for In-Database NLP Tasks</b></p>
<p style="font-size:16px; font-family:Arial">
We have provided data for this demo in an OFS table, <code>Financial_CallCenter_Summary</code>, inside the <code>DEMO_EntityRecognition</code> database.
</p>
<p style="font-size:16px; font-family:Arial">
<b>💼 Use Case Summary:</b><br>
In the wealth management industry, financial advisors hold many client meetings each week — discussing portfolios, insurance, loans, and retirement planning. Manually summarizing and tagging these interactions for compliance, CRM updates, or follow-up actions is time-consuming and often inconsistent.
</p>
<p style="font-size:16px; font-family:Arial">
With Teradata’s BYO-LLM capability, we can deploy an open-source Hugging Face model directly within VantageCloud, where the client interaction data already resides. In this demo, we’ll perform Named Entity Recognition (NER) with <a href="https://huggingface.co/tner/roberta-large-ontonotes5" target="_blank">tner/roberta-large-ontonotes5</a> to extract entities such as:

<ul style="font-size:16px; font-family:Arial">
  <li>Client names</li>
  <li>Financial institutions</li>
  <li>Product types</li>
  <li>Key dates</li>
</ul> 

</p>
<p style="font-size:16px; font-family:Arial">
This helps financial firms:
<ul style="font-size:16px; font-family:Arial">
  <li>Automate meeting note tagging for faster documentation and regulatory compliance</li>
  <li>Enhance client profiling by identifying frequently discussed financial topics</li>
  <li>Streamline CRM updates by structuring insights from unstructured text</li>
</ul>
</p>
<p style="font-size:16px; font-family:Arial">
All of this is achieved securely — without moving data — and using open-source models, giving teams full control, scalability, and flexibility.
</p>


In [3]:
# Creating a teradataml dataframe using sample data in an OFS table.
call_summary_dataset = DataFrame(in_schema("DEMO_EntityRecognition","Financial_CallCenter_Summary"))
call_summary_dataset.head()



Call_Summary
Amelia Greene joined Mastercard’s AI in Finance webinar hosted in Miami. Discussed implications for her fintech startup’s funding plans.
Attended BlackRock fixed income seminar with Mason Lee in Boston. Evaluated suitability of municipal bond ladder strategy.
Attended BlackRock fixed income seminar with Mason Lee in Boston. Evaluated suitability of municipal bond ladder strategy.
Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.
Benjamin Lin inquired about investing in a Nasdaq IPO opportunity on Sept 18th in New York City. Scheduled follow-up to review suitability.
Benjamin Lin inquired about investing in a Nasdaq IPO opportunity on Sept 18th in New York City. Scheduled follow-up to review suitability.
Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.
Amelia Greene joined Mastercard’s AI in Finance webinar hosted in Miami. Discussed implications for her fintech startup’s funding plans.
Alexander Moore activated a Capital One Venture card in Dallas. Discussed how to optimize miles redemption for business expenses.
Alexander Moore activated a Capital One Venture card in Dallas. Discussed how to optimize miles redemption for business expenses.


<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>4. Authenticate into User Environment Service (UES) for Container Management</b></p>
<p style="font-size:16px; font-family:Arial;">
  The <code>teradataml</code> library offers simple yet powerful methods for creating and managing custom Python runtime environments within VantageCloud. This gives developers full control over model behavior, performance, and analytic accuracy when running on the Analytic Cluster.
</p>

<p style="font-size:16px; font-family:Arial;">
  Custom environments are persistent—created once and reused as needed. They can be saved, updated, or modified at any time, allowing for efficient and flexible environment management.
</p>

<p style="font-size:18px; font-family:Arial; color:#00233C;">
  <b>Container Management Process</b>
</p>

<table style="width:100%; table-layout:fixed;">
  <tr>
    <td style="vertical-align:top;" width="40%">
      <ol style="font-size:16px; font-family:Arial; color:#00233C;">
        <li>Create a unique User Environment based on available base images</li>
        <li>Install libraries</li>
        <li>Install models and additional user artifacts</li>
      </ol>
    </td>
    <td>
      <img src="./images/OAF_Env.png" width="600" alt="Container Management Diagram" style="border: 4px solid #404040; border-radius: 10px;"/>
    </td>
  </tr>
</table>
<p style="font-size:16px;font-family:Arial">
<b>UES authentication</b> is required to create and manage the Python or R environments that we will be creating.  A VantageCloud Lake user can easily create the authentication objects using the Console in a VantageCloud Lake environment.  The step to create these authentication objects has already been performed for you.
</p>
<p style="font-size:16px;font-family:Arial">
   
<ul style="font-size:16px;font-family:Arial; margin-top:4px;">
  <li><a href="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Teradata-Package-for-Python-on-VantageCloud-Lake/Working-with-Open-Analytics/APIs-to-Use-with-Open-Analytics-Framework/API-to-Set-Authentication-Token/set_auth_token">Click here</a> to see more details about using the Teradata APIs to set the authentication objects.</li>

  <li>Check out <a href="https://medium.com/teradata/deploy-hugging-face-llms-on-teradata-vantagecloud-lake-with-nvidia-gpu-acceleration-d94d999edaa5">Step 4</a> of this tutorial to see more details about configuring a VantageCloud Lake environment to use our Open Analytics Framework.</li>
</ul>

In [4]:
# We've already loaded all the values into our environment variables and into a dictionary, env_vars.
# username=env_vars.get("username") isn't required when using base_url, pat and pem.

if set_auth_token(base_url=env_vars.get("ues_uri"),
                  pat_token=env_vars.get("access_token"), 
                  pem_file=env_vars.get("pem_file"),
                  valid_from=int(time.time())
                 ):
    print("UES Authentication successful")
else:
    print("UES Authentication failed. Check credentials.")
    sys.exit(1)

Authentication token is generated, authenticated and set for the session.
UES Authentication successful


<hr style="height:2px; border:none">
<p style="font-size:20px; font-family:Arial"><b>5. Set Up the User Environment in Teradata VantageCloud Lake for Model Deployment</b></p>
<p style="font-size:16px; font-family:Arial">
Now that <b>UES authentication</b> is complete, we can begin managing user environments using Teradata’s API capabilities.
We will start by listing the base environments that are available. Next, we'll check whether you have already created an environment and create one if it doesn't exist. To limit the number of OAF environments, we delete any environments you created that do not use our default environment name. We will be using the <code>list_base_envs</code>, <code>list_user_envs</code>, <code>get_env</code>, <code>create_env</code>, and <code>remove_env</code> API methods.
</p>
<p style="font-size:16px;font-family:Arial">
An easy way to get help and see the details for the different methods is to position your cursor within the text of the method or function and press <code>Shift</code>+<code>Tab</code>. If the kernel is not busy, a help window will open with details about the parameters and how to use the function.</p>

In [5]:
# Check whether any environments already exist for this user.
# If the default environment exists, reuse it; otherwise create it.
# Any other environments are deleted to limit the number of OAF environments.

environment_name = env_vars.get("username")
print("Here is a list of the versions of the libraries available to be used within an OAF environments.\n")
print(list_base_envs())
env_list = list_user_envs()
demo_env = None

if env_list is None:
    print("This user does not have any environments.\nCreating your environment now.")
else:
    print("\nHere is a list of your current environments:")
    ipydisplay(env_list)
    for env_name in env_list['env_name']:
        if env_name == environment_name:
            print("Your default environment already exists. You can continue with this notebook.\n\n")
            # Get a handle to the existing environment so later cells can use demo_env
            demo_env = get_env(env_name)
        else:
            print(f"Your existing environment, {env_name} doesn't match our default environment for this user.")
            print("We're going to delete it.")
            print(f"Please wait: Environment {env_name} is being removed!")
            remove_env(env_name)

# Create the default environment if it was not found above
if demo_env is None:
    demo_env = create_env(env_name=f'{environment_name}', base_env='python_3.10', desc='BYOLLM demo env')
    print(demo_env)

Here is a list of the versions of the libraries available to be used within an OAF environments.

     base_name language  version
0   python_3.9   Python   3.9.20
1  python_3.10   Python  3.10.15
2  python_3.11   Python  3.11.10
3        r_4.3        R    4.3.3
4        r_4.4        R    4.4.2
No user environment(s) found.
This user does not have any environments.
Creating your environment now.
User environment 'dallas54-88vgt0b5i55ikpx7' created.

Environment Name: dallas54-88vgt0b5i55ikpx7
Base Environment: python_3.10
Description: BYOLLM demo env

############ Libraries installed in User Environment ############

         name version
0         pip  25.0.1
1  setuptools  78.1.0




<p style="font-size:16px;font-family:Arial">
Install the required libraries into your user environment on the GPU Compute Cluster. For most use cases involving Hugging Face, these will be the <b>transformers</b> and <b>torch</b> Python libraries. This could take up to 10 minutes to complete. Please wait until you see the <b>Libraries Installed</b> message.
</p>

In [6]:
# install_lib returns a table-like result; its "Claim Id" column is used in the next cell
lib_claim_id = demo_env.install_lib(["transformers", "torch"])
print("Libraries Installed")

Libraries Installed


In [7]:
#Get the status of the libraries installation
demo_env.status(str(lib_claim_id["Claim Id"].iloc[0]))

Unnamed: 0,Claim Id,File/Libs/Model,Method Name,Stage,Timestamp,Additional Details
0,6e9b2549-007b-45e7-9e2b-69280de7fd0e,"transformers, torch",install_lib,Started,2025-07-23T20:08:28Z,
1,6e9b2549-007b-45e7-9e2b-69280de7fd0e,"transformers, torch",install_lib,Finished,2025-07-23T20:18:50Z,


<p style="font-size:16px;font-family:Arial">
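With the libraries installed, the environment is ready to host a model. Sections 6 and 7 deploy and run the model step by step; Section 8 then shows how the TeradataAI and TextAnalyticsAI objects can be configured with the preferred language model to run a variety of text analytics tasks. </p>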

<hr style="height:2px;border:none">
<p style="font-size:20px;font-family:Arial"><b>6. Download the Hugging Face Language Model and upload to your OAF (UES) environment</b></p>
<p style="font-size:16px;font-family:Arial">
You can download Hugging Face LLMs in either <a href="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Build-Scalable-Analytics-with-Open-Analytics-Framework/Bring-Your-Own-LLM-and-DL-Workloads/Using-Hugging-Face-LLMs"> native format or streamlined format</a>. Here we use the streamlined format: we load the tokenizer and model with <code>from_pretrained</code> and save them to a local directory with <code>save_pretrained</code>.
</p>

In [8]:
# Load the tokenizer and model from Hugging Face, then save them in streamlined format under a local path
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("tner/roberta-large-ontonotes5")
model = AutoModelForTokenClassification.from_pretrained("tner/roberta-large-ontonotes5")

tokenizer.save_pretrained("./roberta-large-ontonotes5")
model.save_pretrained("./roberta-large-ontonotes5")


In [9]:
# Compress the model folder into a zip file.
# If the model folder is not in the current directory, set root_dir to its absolute path.
import shutil
shutil.make_archive('roberta-large-ontonotes5', format='zip', root_dir='roberta-large-ontonotes5')


'/home/jovyan/JupyterLabRoot/VantageCloud_Lake/Entity_Recognition_BYOLLM/roberta-large-ontonotes5.zip'
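<p style="font-size:16px;font-family:Arial">
To see what <code>shutil.make_archive</code> puts into the zip file when <code>root_dir</code> points at the model folder, you can inspect an archive with the standard <code>zipfile</code> module. The sketch below uses a temporary, hypothetical folder named <code>demo-model</code> with a single placeholder file rather than the real model, so it runs anywhere.
</p>

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for a saved model folder.
    model_dir = os.path.join(tmp, "demo-model")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "config.json"), "w") as fh:
        fh.write("{}")

    # Same call shape as above: the contents of root_dir are archived.
    archive_path = shutil.make_archive(
        os.path.join(tmp, "demo-model"), format="zip", root_dir=model_dir
    )

    # List the entries stored in the archive.
    with zipfile.ZipFile(archive_path) as zf:
        archive_names = zf.namelist()

print(archive_names)
```

<p style="font-size:16px;font-family:Arial">
Running the same <code>namelist()</code> check on <code>roberta-large-ontonotes5.zip</code> is a quick way to confirm that the tokenizer and model files were captured before uploading.
</p>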

<p style="font-size:16px;font-family:Arial">
After the LLM directories are compressed into zip files, you can use the <a href = "https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Build-Scalable-Analytics-with-Open-Analytics-Framework/Bring-Your-Own-LLM-and-DL-Workloads/Using-User-Environment-APIs-to-Manage-LLMs">UES APIs</a> to install, uninstall, and list LLMs. Here we use the <code>install_model</code> API.</p>



In [10]:
# install_model uploads the zip file and returns the claim ID details used in the status check below
model_claim_id = demo_env.install_model(model_path='roberta-large-ontonotes5.zip')

Request for install_model is completed successfully.


<p style="font-size:16px;font-family:Arial">
Check the status of the model installation. There should be three stages, numbered 0 to 2 in the status output:
<ol start="0">
    <li>Endpoint Generated</li>
    <li>File Uploaded</li>
    <li>File Installed</li>
</ol></p>
<p style="font-size:16px;font-family:Arial">
It's OK to re-execute the <code>status()</code> statement if you don't see all three stages yet.</p>

In [11]:
#Get the status of the model installation
demo_env.status(str(model_claim_id["Claim Id"].iloc[0]))

Unnamed: 0,Claim Id,File/Libs/Model,Method Name,Stage,Timestamp,Additional Details
0,4c7a492c-701e-4ebc-868c-471297218ce5,roberta-large-ontonotes5.zip,install_model,Endpoint Generated,2025-07-23T20:45:26Z,
1,4c7a492c-701e-4ebc-868c-471297218ce5,roberta-large-ontonotes5.zip,install_model,File Uploaded,2025-07-23T20:45:42Z,
2,4c7a492c-701e-4ebc-868c-471297218ce5,roberta-large-ontonotes5.zip,install_model,File Installed,2025-07-23T20:47:53Z,
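<p style="font-size:16px;font-family:Arial">
Instead of re-running the status cell by hand, the check can be wrapped in a polling loop. The sketch below is an illustration only: <code>wait_for_stage</code> is a hypothetical helper, and the status source is simulated with a list of responses so the example runs without a VantageCloud connection.
</p>

```python
import time


def wait_for_stage(get_stages, target_stage, timeout_s=900, poll_s=15, sleep=time.sleep):
    """Poll get_stages() until target_stage appears or timeout_s elapses.

    get_stages: zero-argument callable returning the stage names seen so far.
    Returns True if the target stage was observed, False on timeout.
    """
    waited = 0
    while True:
        if target_stage in get_stages():
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


# Simulated status source standing in for repeated status() calls.
_responses = iter([
    ["Endpoint Generated"],
    ["Endpoint Generated", "File Uploaded"],
    ["Endpoint Generated", "File Uploaded", "File Installed"],
])
installed = wait_for_stage(lambda: next(_responses), "File Installed",
                           poll_s=0, sleep=lambda seconds: None)
print(installed)
```

<p style="font-size:16px;font-family:Arial">
In the notebook, <code>get_stages</code> could be something like <code>lambda: list(demo_env.status(claim_id)["Stage"])</code>, assuming the status result exposes a <code>Stage</code> column as in the output above.
</p>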


<hr style="height:2px;border:none">
<p style="font-size:20px;font-family:Arial"><b>7. Create a Python script and execute it using the Apply class</b></p>

<p style="font-size:16px;font-family:Arial">
Create a Python script that reads text records from VantageCloud using standard input, applies a Hugging Face language model to perform Named Entity Recognition (NER), and outputs the extracted entities in a structured, delimited format using standard output.
</p>

In [12]:
%%writefile entity_recognition.py

#!/usr/bin/env python3
import sys
import warnings

warnings.simplefilter('ignore')
input_str = sys.stdin.read()

DELIMITER = '#'

if len(input_str) > 0:
    # Import and load the model only when there is input to process
    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
    torch_device = 'cuda'  # run inference on the GPU
    model_path = "./models/roberta-large-ontonotes5"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForTokenClassification.from_pretrained(model_path)
    # Token-classification (NER) pipeline; aggregation_strategy='max' groups tokens into entities
    ner_pipeline = pipeline("token-classification", model=model, tokenizer=tokenizer, device=torch_device, aggregation_strategy='max')

    for line in input_str.splitlines():
        results = ner_pipeline(line)
        dict_val = {}

        # Group the recognized words by entity type
        for r in results:
            entity = r['entity_group']
            word = r['word']
            dict_val.setdefault(entity, []).append(word)

        # Emit one delimited field per entity type, in a fixed column order
        combined_str = ""
        for key in ["ORG", "PERSON", "DATE", "PRODUCT", "GPE"]:
            combined_str += f"{DELIMITER}{','.join(dict_val.get(key, []))}"

        print(f"{line}{combined_str}")


Writing entity_recognition.py
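<p style="font-size:16px;font-family:Arial">
The formatting step of the script can be checked locally without a GPU or the model by feeding it hand-written results in the same shape the pipeline returns (dictionaries with <code>entity_group</code> and <code>word</code> keys). The sketch below mirrors that logic; <code>format_row</code> and the sample results are hypothetical and exist only for illustration.
</p>

```python
DELIMITER = "#"
ENTITY_COLUMNS = ["ORG", "PERSON", "DATE", "PRODUCT", "GPE"]


def format_row(line, results, delimiter=DELIMITER):
    """Build one delimited output row: the input text followed by one field per entity type."""
    grouped = {}
    for r in results:
        grouped.setdefault(r["entity_group"], []).append(r["word"])
    fields = [",".join(grouped.get(key, [])) for key in ENTITY_COLUMNS]
    return delimiter.join([line] + fields)


# Hand-written stand-in for the pipeline output on one sentence.
fake_results = [
    {"entity_group": "PERSON", "word": "James Delgado"},
    {"entity_group": "DATE", "word": "April 2nd"},
    {"entity_group": "GPE", "word": "Chicago"},
]
row = format_row("Call with James Delgado on April 2nd in Chicago.", fake_results)
print(row)
```

<p style="font-size:16px;font-family:Arial">
Each row always carries five delimiters, so entity types with no matches appear as empty fields rather than shifting the columns.
</p>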


<p style="font-size:16px;font-family:Arial">
Install the Python script file in the environment using <code>install_file()</code>.
</p>

In [13]:
demo_env.install_file(file_path ="entity_recognition.py", replace=True)


File 'entity_recognition.py' replaced successfully in the remote user environment 'dallas54-88vgt0b5i55ikpx7'.


True

<p style="font-size:16px;font-family:Arial">
Set the session to the GPU Analytic Compute Group you want to use; otherwise the session uses the default compute group. The cluster must be running to execute the Apply class.</p>

In [14]:
gpu_compute_group = env_vars.get("gpu_compute_group")
execute_sql(f"SET SESSION COMPUTE GROUP {gpu_compute_group};")
print(f"Compute group set to {gpu_compute_group}") 

Compute group set to GPUGroup


<p style="font-size:16px;font-family:Arial">
The Apply class executes your Python script directly within the user environment, enabling in-database processing at scale. It reads text from each row of the dataset, performs Named Entity Recognition (NER) using Hugging Face language models accelerated by NVIDIA GPUs, and outputs the extracted entities into a structured dataframe, in under a minute for this dataset (about 45 seconds in the run recorded below). </p>

<p style="font-size:16px;font-family:Arial"> In this example, we use <code>tner/roberta-large-ontonotes5</code>, a general-purpose NER model trained on the OntoNotes 5 dataset. It supports entities like ORG, PERSON, DATE, PRODUCT, and GPE. However, it's important to note that the “PRODUCT” entity in OntoNotes refers primarily to physical products (e.g., iPhone, Windows OS), not financial instruments (e.g., Roth IRA, mutual funds). To improve financial domain accuracy, this model can be further fine-tuned on domain-specific data to better recognize investment products, insurance types, and retirement accounts.
 </p> 
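<p style="font-size:16px;font-family:Arial">
The script's delimited output has to line up with the columns declared for the Apply call. As a local analogy only (the actual parsing happens inside VantageCloud), Python's <code>csv</code> module can show how a <code>#</code> delimiter and a <code>|</code> quote character split one output line into six fields. The sample lines and the assumption that quoting behaves like <code>csv</code> are illustrative, not taken from the Teradata documentation.
</p>

```python
import csv
import io

# Column names matching the six fields the script emits per row.
RETURN_COLUMNS = ["Call_Summary", "ORG", "PERSON", "DATES", "PRODUCT", "GPE"]

# One line in the format produced by entity_recognition.py.
raw_line = ("Reviewed coverage with Sophia Martinez on May 10th in Austin."
            "#State Farm#Sophia Martinez#May 10th##Austin")
reader = csv.reader(io.StringIO(raw_line), delimiter="#", quotechar="|")
parsed = dict(zip(RETURN_COLUMNS, next(reader)))
print(parsed)

# A field that itself contains '#' stays intact when wrapped in the quote character.
quoted_line = "|Note #1 for Ava Johnson.|##Ava Johnson###"
quoted_fields = next(csv.reader(io.StringIO(quoted_line), delimiter="#", quotechar="|"))
print(quoted_fields)
```

<p style="font-size:16px;font-family:Arial">
Note that the script above does not quote its first field, so input text containing <code>#</code> would produce extra fields; choosing a delimiter that cannot occur in the data avoids this.
</p>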

In [15]:
apply_obj = Apply(data = call_summary_dataset,
                  apply_command = 'python entity_recognition.py',
                  returns = {"Call_Summary": VARCHAR(64000), "ORG": VARCHAR(64000), "PERSON": VARCHAR(64000), "DATES": VARCHAR(64000), "PRODUCT": VARCHAR(64000), "GPE": VARCHAR(64000)},
                  env_name = f'{environment_name}',
                  delimiter = '#',
                  quotechar = '|'
                 )

# Time the remote execution (time was imported at the top of the notebook)
start = time.time()

# Execute the Python script inside the remote user environment.
df = apply_obj.execute_script()

print(f'Time: {time.time() - start}')

df


Time: 45.3939368724823




Call_Summary,ORG,PERSON,DATES,PRODUCT,GPE
Quarterly check-in with Noah Patel on April 20th in San Francisco. Portfolio rebalance completed shifted 5% from bonds to growth ETFs.,,Noah Patel,"Quarterly, April 20th",,San
Call with James Delgado on April 2nd in Chicago to discuss employer 401(k) rollover to a Fidelity IRA. Sent transfer paperwork via DocuSign.,Fidelity,James Delgado,April 2nd,,Chicago
Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.,Liberty Mutual.,Ava Johnson,July 12th,,Phoenix.
Mortgage consultation with Isabella Chen on March 1st in Seattle. Compared preapproval options from Wells Fargo and Chase.,"Wells Fargo, Chase.",Isabella Chen,March 1st,,
"Ethan Wright completed a full financial plan update on June 30th in Denver. Updated risk tolerance, added crypto exposure via Coinbase.",Coinbase.,Ethan Wright,June 30th,,
Elijah Brooks executed multiple trades via Robinhood in Las Vegas. Cautioned on concentration risk in tech sector.,Robinhood,Elijah Brooks,,,Las Vegas.
Reviewed insurance coverage with Sophia Martinez on May 10th in Austin. Quoted State Farm term life policy and advised on umbrella coverage.,State Farm,Sophia Martinez,May 10th,,Austin.
Crypto education session with Mia Davis on Feb 1st in San Diego. Walked through setting up a Coinbase Pro account and cold wallet storage.,Coinbase,Mia Davis,Feb 1st,,San
Harper Nguyen approved for SBA loan through JPMorgan in Houston. Reviewed repayment options and cash flow strategy.,"SBA, JPMorgan",Harper Nguyen,,,Houston.
Lucas Taylor has a financial planning session on March 29th in Portland. Primary focus: saving for child’s education via 529 plan.,,Lucas Taylor,March 29th,,
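
<p style="font-size:16px;font-family:Arial">
Several extracted values in the output above keep sentence punctuation (for example <code>Liberty Mutual.</code> and <code>Phoenix.</code>). A light post-processing step can tidy these before the entities are loaded into a CRM or compliance system. The helper below is a hypothetical sketch, not part of teradataml or the model; stripping periods this way would also shorten names that legitimately end in one, such as "Inc.".
</p>

```python
def clean_entities(cell, strip_chars=".,;: "):
    """Strip stray leading/trailing punctuation from each comma-separated entity in a cell."""
    if not cell:
        return ""
    parts = [part.strip(strip_chars) for part in cell.split(",")]
    return ", ".join(part for part in parts if part)


print(clean_entities("Wells Fargo, Chase."))
print(clean_entities("Phoenix."))
```

<p style="font-size:16px;font-family:Arial">
After converting the result to pandas (for example with <code>df.to_pandas()</code>), the function could be applied to each entity column with <code>Series.map</code>.
</p>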


<p style="font-size:16px;font-family:Arial">
  You can explore additional Natural Language Processing (NLP) tasks directly in-database using task-specific models:
</p>

<ul style="font-size:15px;font-family:Arial">
  <li><strong>Text Classification</strong> — <a href="https://huggingface.co/facebook/bart-large-mnli" target="_blank">facebook/bart-large-mnli</a>: Classifies text into predefined categories</li>
  <li><strong>Language Detection</strong> — <a href="https://huggingface.co/papluca/xlm-roberta-base-language-detection" target="_blank">papluca/xlm-roberta-base-language-detection</a>: Detects the language</li>
  <li><strong>Generating Embeddings</strong> — <a href="https://huggingface.co/sentence-transformers/all-mpnet-base-v2" target="_blank">sentence-transformers/all-mpnet-base-v2</a>: Converts text into vector representations for similarity searches</li>
  <li><strong>Named Entity Recognition</strong> — <a href="https://huggingface.co/tner/roberta-large-ontonotes5" target="_blank">tner/roberta-large-ontonotes5</a>: Identifies and categorizes named entities within unstructured text</li>
  <li><strong>Extracting Key Phrases</strong> — <a href="https://huggingface.co/ml6team/keyphrase-extraction-kbir-kpcrowd" target="_blank">ml6team/keyphrase-extraction-kbir-kpcrowd</a>: Extracts important phrases from a document to summarize its content</li>
  <li><strong>Grammar Correction</strong> — <a href="https://huggingface.co/pszemraj/flan-t5-large-grammar-synthesis" target="_blank">pszemraj/flan-t5-large-grammar-synthesis</a>: Automatically corrects grammatical errors in text</li>
  <li><strong>Masking PII Entities</strong> — <a href="https://huggingface.co/ab-ai/pii_model" target="_blank">ab-ai/pii_model</a>: Masks personally identifiable information (PII)</li>
  <li><strong>Sentiment Analysis</strong> — <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">distilbert-base-uncased-finetuned-sst-2-english</a>: Determines the emotional tone (positive or negative)</li>
  <li><strong>Sentence Similarity</strong> — <a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2" target="_blank">sentence-transformers/all-MiniLM-L6-v2</a>: Measures semantic similarity between sentences</li>
  <li><strong>Summarization</strong> — <a href="https://huggingface.co/facebook/bart-large-cnn" target="_blank">facebook/bart-large-cnn</a>: Generates concise summaries of longer documents</li>
  <li><strong>Translation</strong> — <a href="https://huggingface.co/Helsinki-NLP/opus-mt-en-fr" target="_blank">Helsinki-NLP/opus-mt-en-fr</a>: Translates English text to French</li>
</ul>



<hr style='height:2px;border:none;'>
<b style="font-size:20px;font-family:Arial">8. Automate Entity Recognition with <code>teradatagenai</code> Python Package</b>
<p style="font-size:16px;font-family:Arial">
  To simplify and accelerate setup, you can now use the <code>teradatagenai</code> package to automate model deployment and inference. This Python library enables seamless, in-database, AI-driven text analytics within Teradata VantageCloud. It offers a variety of user-friendly functions that wrap API calls for common text analysis tasks, making it easy to apply LLMs to proprietary unstructured data.
    
</p>

<p style="font-size:16px;font-family:Arial">
  Built on Teradata VantageCloud’s open and connected architecture, this solution uses the BYO-LLM capability and enables teams to rapidly develop generative AI use cases that enhance customer experiences, improve employee productivity, and streamline operations, all within a secure, scalable environment.
</p>

<p style="font-size:16px;font-family:Arial"> <code>TextAnalyticsAI</code> gives us access to 11+ generative AI functions, listed below, so that users can apply advanced text analytics directly to data stored in Vantage:
<ul style="font-size:16px; font-family:Arial; margin-left:20px; line-height:1.8;">
  <li><code>classify()</code> – Classify text into predefined categories</li>
  <li><code>analyze_sentiment()</code> – Perform sentiment analysis</li>
  <li><code>detect_language()</code> – Detect the language of a text</li>
  <li><code>embeddings()</code> – Generate embeddings for similarity search</li>
  <li><code>recognize_entities()</code> – Extract named entities</li>
   <li><code>recognize_pii_entities()</code> – Detect and label PII entities</li>
  <li><code>extract_key_phrases()</code> – Identify key phrases in text</li>
  <li><code>mask_pii()</code> – Mask personally identifiable information (PII)</li>
  <li><code>sentence_similarity()</code> – Measure semantic similarity between sentences</li>
  <li><code>summarize()</code> – Generate summaries of longer documents</li>
  <li><code>translate()</code> – Translate text between languages</li>
</ul>

</p>


<p style="font-size:16px;font-family:Arial">  
TeradataAI handles the download and installation of the Hugging Face model (for example, <i>'tner/roberta-large-ontonotes5'</i>) in the user's environment. If no environment is specified, a sample environment named <i>'td_gen_ai_env'</i> is created with the <code>torch</code> and <code>transformers</code> libraries and their dependencies. The TeradataAI class manages the entire setup process. In the background, this process uses Teradata’s Bring Your Own Large Language Model (BYO-LLM) offering.</p>

<p style="font-size:16px;font-family:Arial">For this next example, we'll continue using the default environment we just created and re-install the same <code>tner/roberta-large-ontonotes5</code> model with the <code>teradatagenai</code> package. </p>

In [16]:
# Check if we have any existing environments
# We can continue useing an existing environment
# If any other environments exist along with our default OAF environment, we will delete them

environment_name = env_vars.get("username")
print("Here is a list of the base environments available for use within an OAF environment.\n")
print(list_base_envs())
env_list = list_user_envs()

if env_list is None:
    print("This user does not have any environments.\nCreating your environment now.")
    demo_env = create_env(env_name=f'{environment_name}', base_env='python_3.10', desc='BYOLLM demo env')
    print(demo_env)
else:
    print("\nHere is a list of your current environments:")
    ipydisplay(env_list)
    for env_name in env_list['env_name']:
        if env_name == environment_name:
            print("Your default environment already exists. You can continue with this notebook.\n\n")
        else:
            print(f"Your existing environment, {env_name} doesn't match our default environment for this user.")
            print("We're going to delete it.")
            print(f"Please wait: Environment {env_name} is being removed!")
            remove_env(env_name)

Here is a list of the base environments available for use within an OAF environment.

     base_name language  version
0   python_3.9   Python   3.9.20
1  python_3.10   Python  3.10.15
2  python_3.11   Python  3.11.10
3        r_4.3        R    4.3.3
4        r_4.4        R    4.4.2

Here is a list of your current environments:


Unnamed: 0,env_name,env_description,base_env_name,language,conda
0,dallas54-88vgt0b5i55ikpx7,BYOLLM demo env,python_3.10,Python,False


Your default environment already exists. You can continue with this notebook.




In [17]:
# Define your model and initialize the TeradataAI class
model_name = 'tner/roberta-large-ontonotes5'
model_args = {'transformer_class': 'AutoModelForTokenClassification',
              'task' : 'token-classification'}
ues_args = {'env_name': f'{environment_name}'}

llm = TeradataAI(api_type = "hugging_face",
                 model_name = model_name,
                 model_args = model_args,
                 ues_args = ues_args )

Using env: 'dallas54-88vgt0b5i55ikpx7'.
Model is already available in the user environment.


<p style="font-size:16px;font-family:Arial">
This model should already be installed in the environment. If it is, you can skip to the next cell. If you've skipped around and it has not been installed yet, the Python kernel may show as <b>Idle</b> while the installation is still running. Please wait until the installation status above shows that it has completed before you continue. This typically takes about 2 minutes, depending on your network.</p>
<p style="font-size:16px;font-family:Arial">Because we are reusing the same environment, we do not need to install the libraries again. Executing the environment's <code>status</code> method will validate the installation of the libraries.</p>

In [18]:
# Run the status method against demo_env again to verify the status of the installed libraries
demo_env.status(str(lib_claim_id["Claim Id"].iloc[0]))

Unnamed: 0,Claim Id,File/Libs/Model,Method Name,Stage,Timestamp,Additional Details
0,6e9b2549-007b-45e7-9e2b-69280de7fd0e,"transformers, torch",install_lib,Started,2025-07-23T20:08:28Z,
1,6e9b2549-007b-45e7-9e2b-69280de7fd0e,"transformers, torch",install_lib,Finished,2025-07-23T20:18:50Z,
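
<p style="font-size:16px;font-family:Arial">The status output above lists one row per stage for a claim. As a minimal sketch, assuming those rows are available as plain dictionaries keyed by the column names shown (for a pandas DataFrame, <code>to_dict("records")</code> would produce that shape), the hypothetical helper <code>install_summary</code> below checks whether a claim reached the <b>Finished</b> stage and how long the installation took.</p>

```python
from datetime import datetime

def install_summary(rows, claim_id):
    """Return (finished, elapsed_seconds) for one claim id.

    `rows` is assumed to be a list of dicts keyed by the column names
    shown in the status output ("Claim Id", "Stage", "Timestamp").
    """
    stamps = {}
    for row in rows:
        if row["Claim Id"] == claim_id:
            # Timestamps look like 2025-07-23T20:08:28Z
            stamps[row["Stage"]] = datetime.strptime(row["Timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    finished = "Finished" in stamps
    elapsed = None
    if finished and "Started" in stamps:
        elapsed = (stamps["Finished"] - stamps["Started"]).total_seconds()
    return finished, elapsed

# Rows copied from the status output above
rows = [
    {"Claim Id": "6e9b2549-007b-45e7-9e2b-69280de7fd0e", "Stage": "Started",
     "Timestamp": "2025-07-23T20:08:28Z"},
    {"Claim Id": "6e9b2549-007b-45e7-9e2b-69280de7fd0e", "Stage": "Finished",
     "Timestamp": "2025-07-23T20:18:50Z"},
]
print(install_summary(rows, "6e9b2549-007b-45e7-9e2b-69280de7fd0e"))
```

<p style="font-size:16px;font-family:Arial">For the claim shown, the two timestamps are 622 seconds apart, so the library installation took a little over ten minutes.</p>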


<p style="font-size:16px;font-family:Arial">
Now we can configure the TextAnalyticsAI object with the preferred large language model using the TeradataAI object. This will enable us to execute a variety of text analytics tasks. </p>

In [19]:
obj = TextAnalyticsAI(llm=llm)

File 'td_sample_inference_script.py' replaced successfully in the remote user environment 'dallas54-88vgt0b5i55ikpx7'.
File 'td_sample_embeddings_script.py' replaced successfully in the remote user environment 'dallas54-88vgt0b5i55ikpx7'.


In [20]:
# Run entity recognition with the entity_recognition.py script.
# `returns` declares the output columns and their types.
obj.recognize_entities(column='Call_Summary',
                       data=call_summary_dataset,
                       script="entity_recognition.py",
                       delimiter="#",
                       returns={"txt": VARCHAR(64000),
                                "ORG": VARCHAR(64000),
                                "PERSON": VARCHAR(64000),
                                "DATES": VARCHAR(64000),
                                "PRODUCT": VARCHAR(64000),
                                "GPE": VARCHAR(64000)})



File 'entity_recognition.py' replaced successfully in the remote user environment 'dallas54-88vgt0b5i55ikpx7'.




txt,ORG,PERSON,DATES,PRODUCT,GPE
Mortgage consultation with Isabella Chen on March 1st in Seattle. Compared preapproval options from Wells Fargo and Chase.,"Wells Fargo, Chase.",Isabella Chen,March 1st,,
Quarterly check-in with Noah Patel on April 20th in San Francisco. Portfolio rebalance completed shifted 5% from bonds to growth ETFs.,,Noah Patel,"Quarterly, April 20th",,San
Lucas Taylor has a financial planning session on March 29th in Portland. Primary focus: saving for child’s education via 529 plan.,,Lucas Taylor,March 29th,,
Call with James Delgado on April 2nd in Chicago to discuss employer 401(k) rollover to a Fidelity IRA. Sent transfer paperwork via DocuSign.,Fidelity,James Delgado,April 2nd,,Chicago
Reviewed insurance coverage with Sophia Martinez on May 10th in Austin. Quoted State Farm term life policy and advised on umbrella coverage.,State Farm,Sophia Martinez,May 10th,,Austin.
Crypto education session with Mia Davis on Feb 1st in San Diego. Walked through setting up a Coinbase Pro account and cold wallet storage.,Coinbase,Mia Davis,Feb 1st,,San
Olivia Rodriguez promoted at Morgan Stanley in Chicago; discussed changes to her restricted stock unit (RSU) vesting schedule on Feb 22.,Morgan Stanley,Olivia Rodriguez,Feb 22.,,
Amelia Greene joined Mastercard’s AI in Finance webinar hosted in Miami. Discussed implications for her fintech startup’s funding plans.,,Amelia Greene,,,Miami.
Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.,Liberty Mutual.,Ava Johnson,July 12th,,Phoenix.
Harper Nguyen approved for SBA loan through JPMorgan in Houston. Reviewed repayment options and cash flow strategy.,"SBA, JPMorgan",Harper Nguyen,,,Houston.
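
<p style="font-size:16px;font-family:Arial">Several values in the output above carry trailing punctuation from the source sentence (for example <i>Chase.</i> and <i>Austin.</i>). The following is a minimal post-processing sketch, not part of <code>teradatagenai</code>: the hypothetical helper <code>clean_entities</code> splits a comma-separated cell and strips trailing punctuation. It cannot recover values the model truncated, such as <i>San</i>.</p>

```python
def clean_entities(cell):
    """Split a comma-separated entity cell and strip trailing punctuation."""
    if not cell:
        return []
    cleaned = []
    for part in cell.split(","):
        # Only trailing sentence punctuation is removed, so values such as
        # "401(k)" would keep their closing parenthesis.
        entity = part.strip().rstrip(".,;:").strip()
        if entity:
            cleaned.append(entity)
    return cleaned

print(clean_entities("Wells Fargo, Chase."))
print(clean_entities("Feb 22."))
print(clean_entities(""))
```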


<hr style="height:2px; border:none">

<b style="font-size:20px; font-family:Arial;">
  9. Call Hosted LLMs From In-Database (AWS Bedrock, Google Gemini, Azure AI Models)
</b>

<p style="font-size:16px; font-family:Arial; line-height:1.6;">
  Teradata lets you run text analysis using large language models (LLMs) hosted on cloud platforms such as <strong>AWS</strong>, <strong>Google</strong>, and <strong>Azure</strong>, while working directly with data stored in <strong>Vantage</strong>. 
</p>

<p style="font-size:16px; font-family:Arial; line-height:1.6;">
  These built-in functions support a wide range of NLP tasks on unstructured text:
</p>

<ul style="font-size:16px; font-family:Arial; margin-left:20px; line-height:1.8;">
  <li><code>AI_AnalyzeSentiment</code></li>
  <li><code>AI_AskLLM</code></li>
  <li><code>AI_DetectLanguage</code></li>
  <li><code>AI_MaskPII</code></li>
  <li><code>AI_RecognizeEntities</code></li>
  <li><code>AI_RecognizePIIEntities</code></li>
  <li><code>AI_TextClassifier</code></li>
  <li><code>AI_TextEmbeddings</code></li>
  <li><code>AI_TextSummarize</code></li>
</ul>

<p style="font-size:16px; font-family:Arial; line-height:1.6;">
  In the following example, we’ll demonstrate how to use <code>AI_RecognizeEntities</code> for in-database entity recognition with Amazon Bedrock's Anthropic LLM: "anthropic.claude-v2".
</p>


In [21]:
# Securely prompt for AWS credentials
# Please enter the region using this format: Country-Region-Number. For example, us-east-1
access_key = getpass.getpass("Enter AWS Access Key: ")
secret_key = getpass.getpass("Enter AWS Secret Key: ")
region = getpass.getpass("Region: ")

Enter AWS Access Key:  ····················
Enter AWS Secret Key:  ········································
Region:  ·········
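
<p style="font-size:16px;font-family:Arial">Because the region is typed by hand, a quick format check before building the query can catch typos early. This is a minimal sketch: the pattern and the helper name <code>is_valid_region</code> are illustrative assumptions that only check the shape described above (for example, <code>us-east-1</code>), not whether the region exists or offers the model.</p>

```python
import re

# Loose shape check: two lowercase letters, one or more lowercase words, then a number.
# Matches values such as us-east-1 or ap-southeast-2.
REGION_PATTERN = re.compile(r"[a-z]{2}(-[a-z]+)+-\d+")

def is_valid_region(region):
    """Return True if `region` looks like an AWS region code."""
    return bool(REGION_PATTERN.fullmatch(region.strip()))

for candidate in ["us-east-1", "ap-southeast-2", "US-EAST-1", "us east 1"]:
    print(candidate, is_valid_region(candidate))
```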


In [22]:
# Define your query dynamically
query = f"""
SELECT * FROM AI_RecognizeEntities( 
  ON DEMO_EntityRecognition.Financial_CallCenter_Summary AS InputTable
  USING 
    TextColumn('Call_Summary')
    ApiType('aws')
    REGION('{region}')
    ACCESSKEY('{access_key}')
    SECRETKEY('{secret_key}')
    ModelName('anthropic.claude-v2')
    isDebug('true')
    Accumulate('[0:]')
) AS dt;
"""

<p style="font-size:16px;font-family:Arial">
While open-source Hugging Face models provide a powerful and customizable foundation for entity recognition, they may require additional domain-specific fine-tuning to accurately capture specialized financial terms or product names. In contrast, <b>hosted LLMs</b> such as Anthropic’s Claude or other models available through <b>AWS Bedrock, Google Vertex AI, or Azure OpenAI</b> often benefit from broad training by the provider, resulting in deeper contextual understanding out of the box. In this example, the Anthropic model on AWS Bedrock identifies a richer set of financial products (for example, IRA, SEP IRA, and DocuSign) with no additional training.
</p>
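
<p style="font-size:16px;font-family:Arial">The query text defined above embeds the AWS access key and secret key, so printing or logging it would expose them. The following is a minimal sketch of a redaction step, assuming the argument names used in that query; <code>redact_credentials</code> is a hypothetical helper, and the key values in the sample are placeholders.</p>

```python
import re

# Matches ACCESSKEY('...') and SECRETKEY('...') regardless of letter case
_CREDENTIAL_ARG = re.compile(r"\b(ACCESSKEY|SECRETKEY)\(\s*'[^']*'\s*\)", re.IGNORECASE)

def redact_credentials(sql):
    """Return `sql` with the ACCESSKEY and SECRETKEY values masked."""
    return _CREDENTIAL_ARG.sub(lambda m: f"{m.group(1)}('***')", sql)

# Placeholder values only; these are not real credentials
sample = """
SELECT * FROM AI_RecognizeEntities(
  ON DEMO_EntityRecognition.Financial_CallCenter_Summary AS InputTable
  USING
    TextColumn('Call_Summary')
    ApiType('aws')
    REGION('us-east-1')
    ACCESSKEY('AKIAEXAMPLE')
    SECRETKEY('not-a-real-secret')
    ModelName('anthropic.claude-v2')
) AS dt;
"""
print(redact_credentials(sample))
```

<p style="font-size:16px;font-family:Arial">Only the redacted copy should be printed; the original string is still the one sent to the database.</p>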


In [23]:
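# Run the query and keep the result in its own variable so the SQL text is not overwritten
entities_df = DataFrame.from_query(query)
entities_df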



Call_Summary,Labeled_Entities,Message
Reviewed insurance coverage with Sophia Martinez on May 10th in Austin. Quoted State Farm term life policy and advised on umbrella coverage.,"(Sophia Martinez, people), (May 10th, date/time), (Austin, places), (State Farm, organizations)",
Mortgage consultation with Isabella Chen on March 1st in Seattle. Compared preapproval options from Wells Fargo and Chase.,"(Isabella Chen, people), (March 1st, date/time), (Seattle, places), (Wells Fargo, organizations), (Chase, organizations)",
"Ethan Wright completed a full financial plan update on June 30th in Denver. Updated risk tolerance, added crypto exposure via Coinbase.","(Ethan Wright, people), (June 30th, date/time), (Denver, places), (Coinbase, product)",
Olivia Rodriguez promoted at Morgan Stanley in Chicago; discussed changes to her restricted stock unit (RSU) vesting schedule on Feb 22.,"(Olivia Rodriguez, people), (Morgan Stanley, organizations), (Chicago, places), (Feb 22, date/time)",
Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.,"(Ava Johnson, people), (July 12th, date/time), (Phoenix, places), (Liberty Mutual, organizations)",
Benjamin Lin inquired about investing in a Nasdaq IPO opportunity on Sept 18th in New York City. Scheduled follow-up to review suitability.,"(Benjamin Lin, people), (Nasdaq, organizations), (Sept 18th, date/time), (New York City, places)",
Met with Emily Thompson on March 15th in New York to review her retirement portfolio. Recommended reallocating 10% into international equities through JPMorgan mutual funds.,"(Emily Thompson, people), (March 15th, date/time), (New York, places), (10%, percentages), (JPMorgan, organizations)",
Call with James Delgado on April 2nd in Chicago to discuss employer 401(k) rollover to a Fidelity IRA. Sent transfer paperwork via DocuSign.,"(James Delgado, people), (April 2nd, date/time), (Chicago, places), (Fidelity, organizations), (IRA, products), (DocuSign, products)",
"Liam O'Brien opened a SEP IRA on January 5th in Los Angeles and contributed $6,500. Discussed tax implications with his CPA.","(Liam O'Brien, people), (SEP IRA, product), (January 5th, date/time), (Los Angeles, places), ($6,500, currencies), (CPA, people)",
Reviewed insurance coverage with Sophia Martinez on May 10th in Austin. Quoted State Farm term life policy and advised on umbrella coverage.,"(Sophia Martinez, people), (May 10th, date/time), (Austin, places), (State Farm, organizations)",



<p style="font-size:16px;font-family:Arial">
 This is why <b>Teradata’s support for both open-source and hosted models</b> matters for developers and enterprises: developers need the flexibility to choose the right model for each use case based on <b>business goals, data privacy requirements, cost considerations,</b> and <b>infrastructure preferences</b>. Whether you're performing domain-specific NLP with tightly controlled data using <b>BYO-LLM</b> or leveraging the latest generative AI via <b>fast-path cloud functions</b>, Teradata enables you to do both <b>securely, efficiently, and at scale</b>. The real differentiator is not just the model, but the ability to <b>operationalize AI seamlessly where your data lives</b> and apply it to solve meaningful business problems with measurable impact.
</p>


<hr style='height:2px;border:none'>
<b style = 'font-size:20px;font-family:Arial'>10. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial'>Clean up the OAF user environment storage. If you plan to run other VantageCloud Lake OAF notebooks, you can skip this step.</p>

In [24]:
# Remove the existing user environment
try:
    result = remove_env(environment_name)
    print("Environment removed!")
except Exception as e:
    print("Could not remove the environment!")
    print("Error:", str(e))

User environment 'dallas54-88vgt0b5i55ikpx7' removed.
Environment removed!


<p style = 'font-size:16px;font-family:Arial'>Finally, remove your database connection context.</p>

In [25]:
try:
    result = remove_context()
    print("Context removed!")
except Exception as e:
    print("Could not remove the Context!")
    print("Error:", str(e))

Context removed!


In [None]:
help(TeradataAI)

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>