<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Unstructured Text Analysis With BYO-LLM and NVIDIA GPU Acceleration
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>


<p style="font-size:20px;font-family:Arial;color:#00233C"><b>Introduction:</b></p>

<p style="font-size:16px;font-family:Arial;color:#00233C">
This notebook is designed for developers, data scientists, and AI practitioners who want to bring open-source Language Models (LMs) and Large Language Models (LLMs) closer to their data — quickly, securely, and at scale. As organizations race to deploy AI applications, developers face key challenges: selecting the right model, minimizing data movement, ensuring security, and controlling costs.  Teradata’s Bring Your Own LLM (BYO-LLM) capability addresses these challenges by allowing you to deploy open-source <b>Hugging Face</b> models directly inside VantageCloud — where your data already lives.
</p>

<p style="font-size:18px;font-family:Arial;color:#00233C"><b>What is BYO-LLM?</b></p>

<p style="font-size:16px;font-family:Arial;color:#00233C">
<b>BYO-LLM</b> (Bring Your Own Large Language Model) is a key capability of the VantageCloud Open Analytics Framework (OAF). You’ll use it throughout this notebook; it gives developers complete control over AI deployment in VantageCloud:
</p>

<ul style="font-size:16px;font-family:Arial;color:#00233C">
    <li>Seamlessly integrate open-source models from Hugging Face</li>
    <li>Eliminate the need to move data — reducing cost and compliance risk</li>
    <li>Leverage GPU acceleration for inference speeds up to 200x faster than CPU</li>
    <li>Experiment freely without vendor lock-in, while keeping your data secure and your operations scalable</li>
</ul>

<img src="BYOLLM_Flow.png" alt="Architecture for BYOLLM" style="width: 90%">
<p style="font-size:18px;font-family:Arial;color:#00233C"><b>Business Impact of Open-Source Language Models (LMs) for NLP Tasks Such as Unstructured Text Analysis</b></p>


<p style="font-size:16px;font-family:Arial;color:#00233C">
Language Models (LMs) are the foundation for solving Natural Language Processing (NLP) tasks such as unstructured text analysis — enabling machines to understand, interpret, and generate human language. Language Models enable businesses to:
</p>

<ul style="font-size:16px;font-family:Arial;color:#00233C">
    <li>Extract key information from documents, contracts, and research papers</li>
    <li>Analyze customer feedback from emails, reviews, and social media</li>
    <li>Power chatbots and virtual assistants for real-time support</li>
    <li>Personalize marketing and customer experiences based on user interactions</li>
</ul>

<p style="font-size:16px;font-family:Arial;color:#00233C">
Through unstructured text analysis and open source LMs, businesses can:
</p>
<ul style="font-size:16px;font-family:Arial;color:#00233C">
    <li>Improve operational efficiency through automation</li>
    <li>Gain real-time insights into customer sentiment and behavior</li>
    <li>Respond proactively to customer concerns</li>
    <li>Stay competitive by adapting to market and customer trends</li>
</ul>

<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>1. Configure the environment</b>

In [None]:
%%capture
!pip install -r requirements.txt --quiet

In [2]:
import teradataml
import getpass
import os
import sys
import pandas as pd
from dotenv import load_dotenv  # python-dotenv; assumed to be installed via requirements.txt
from teradataml import *
from teradatasqlalchemy.types import *
from IPython.display import display as ipydisplay
from os.path import expanduser

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

<hr style="height:2px;border:none;background-color:#00233C;">
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>2. Connect to VantageCloud Lake</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Connect to VantageCloud using <code>create_context</code> from the teradataml Python library. Input your connection details, including the host, username, password and Analytic Compute Group name.</p>






In [None]:
# Read the host, username, and password from the .env file.
print("Creating the context...") 

load_dotenv("../.config/.env")
host = os.getenv("host")
username = os.getenv("username")
my_variable = os.getenv("my_variable")

eng = create_context(host=host, username=username, password=my_variable)
print("Connected to Teradata:", eng)

execute_sql('''SET query_band='Entity_Recognition_BYOLLM_VCL.ipynb;' UPDATE FOR SESSION;''')
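
<p style="font-size:16px;font-family:Arial;color:#00233C">
If <code>create_context</code> fails, a missing entry in the <code>.env</code> file is one possible cause. The sketch below checks that the three keys read in the cell above (<code>host</code>, <code>username</code>, <code>my_variable</code>) are present and non-empty. The helper name <code>missing_settings</code> is hypothetical and is not part of teradataml.
</p>

```python
import os

# Key names taken from the connection cell above.
REQUIRED_KEYS = ("host", "username", "my_variable")


def missing_settings(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in a mapping."""
    return [key for key in required if not env.get(key)]


# Usage: run after load_dotenv() so that os.environ holds the .env values.
missing = missing_settings(os.environ)
if missing:
    print("Missing connection settings:", ", ".join(missing))
```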

<hr style="height:2px; border:none; background-color:#00233C;">

<p style="font-size:20px; font-family:Arial; color:#00233C;"><b>3. Create a Sample Dataset for In-Database NLP tasks</b></p>
<p style="font-size:16px; font-family:Arial; color:#00233C;">
Create a sample dataset to test NLP tasks with Hugging Face models. In this step, we create a table named <code>financial_entity_dataset</code> inside the default <code>DEMO_USER</code> database.
</p>
<p style="font-size:16px; font-family:Arial; color:#00233C;">
<b>💼 Use Case Summary:</b><br>
In the wealth management industry, financial advisors hold many client meetings each week — discussing portfolios, insurance, loans, and retirement planning. Manually summarizing and tagging these interactions for compliance, CRM updates, or follow-up actions is time-consuming and often inconsistent.
</p>
<p style="font-size:16px; font-family:Arial; color:#00233C;">
With Teradata’s BYO-LLM capability, we can deploy an open-source Hugging Face model directly within VantageCloud, where the client interaction data already resides. In this demo, we’ll perform Named Entity Recognition (NER) with <a href="https://huggingface.co/tner/roberta-large-ontonotes5" target="_blank">tner/roberta-large-ontonotes5</a> to extract entities such as:

<ul style="font-size:16px; font-family:Arial; color:#00233C;">
  <li>Client names</li>
  <li>Financial institutions</li>
  <li>Product types</li>
  <li>Key dates</li>
</ul> 

</p>
<p style="font-size:16px; font-family:Arial; color:#00233C;">
This helps financial firms:
<ul style="font-size:16px; font-family:Arial; color:#00233C;">
  <li>Automate meeting note tagging for faster documentation and regulatory compliance</li>
  <li>Enhance client profiling by identifying frequently discussed financial topics</li>
  <li>Streamline CRM updates by structuring insights from unstructured text</li>
</ul>
</p>
<p style="font-size:16px; font-family:Arial; color:#00233C;">
All of this is achieved securely — without moving data — and using open-source models, giving teams full control, scalability, and flexibility.
</p>


In [4]:
import pandas as pd

data = {
    'text': [
        "Met with Emily Thompson on March 15th in New York to review her retirement portfolio. Recommended reallocating 10% into international equities through JPMorgan mutual funds.",
        "Call with James Delgado on April 2nd in Chicago to discuss employer 401(k) rollover to a Fidelity IRA. Sent transfer paperwork via DocuSign.",
        "Liam O'Brien opened a SEP IRA on January 5th in Los Angeles and contributed $6,500. Discussed tax implications with his CPA.",
        "Reviewed insurance coverage with Sophia Martinez on May 10th in Austin. Quoted State Farm term life policy and advised on umbrella coverage.",
        "Quarterly check-in with Noah Patel on April 20th in San Francisco. Portfolio rebalance completed shifted 5% from bonds to growth ETFs.",
        "Mortgage consultation with Isabella Chen on March 1st in Seattle. Compared preapproval options from Wells Fargo and Chase.",
        "Ethan Wright completed a full financial plan update on June 30th in Denver. Updated risk tolerance, added crypto exposure via Coinbase.",
        "Olivia Rodriguez promoted at Morgan Stanley in Chicago; discussed changes to her restricted stock unit (RSU) vesting schedule on Feb 22.",
        "Attended BlackRock fixed income seminar with Mason Lee in Boston. Evaluated suitability of municipal bond ladder strategy.",
        "Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.",
        "Benjamin Lin inquired about investing in a Nasdaq IPO opportunity on Sept 18th in New York City. Scheduled follow-up to review suitability.",
        "Crypto education session with Mia Davis on Feb 1st in San Diego. Walked through setting up a Coinbase Pro account and cold wallet storage.",
        "Lucas Taylor has a financial planning session on March 29th in Portland. Primary focus: saving for child’s education via 529 plan.",
        "Harper Nguyen approved for SBA loan through JPMorgan in Houston. Reviewed repayment options and cash flow strategy.",
        "Elijah Brooks executed multiple trades via Robinhood in Las Vegas. Cautioned on concentration risk in tech sector.",
        "Amelia Greene joined Mastercard’s AI in Finance webinar hosted in Miami. Discussed implications for her fintech startup’s funding plans.",
        "Logan Pierce invested in a Vanguard ESG fund on Sept 3rd from San Jose. Reviewed carbon exposure and impact metrics.",
        "Charlotte Bennett updated her AmEx Platinum card in Atlanta for travel benefits. Revisited points strategy for upcoming Europe trip.",
        "Alexander Moore activated a Capital One Venture card in Dallas. Discussed how to optimize miles redemption for business expenses.",
        "Ella Fischer accepted new role at Deutsche Bank. Discussed impact on her deferred compensation plan and relocation to London.",
        "Daniel Reed met at the Chase Manhattan branch in Manhattan. Discussed high-yield savings options and CD laddering.",
        "Scarlett Thompson received $1,200 via PayPal for freelance work in Sacramento. Recommended SEP IRA contribution before tax deadline.",
        "Matthew Hayes enrolled in a Roth IRA through Edward Jones in Cincinnati. Discussed converting his existing traditional IRA over two tax years.",
        "Zoe Alvarez spoke at the Goldman Sachs Women in Finance forum in Washington, D.C. Reviewed her LinkedIn strategy and networking goals.",
        "Levi Bennett transferred funds from HSBC to Barclays on October 10th in London. Set up UK pension contribution plan.",
        "Grace Kim paid April rent using Zelle through Bank of America in Philadelphia. Flagged issue with overdraft protection; escalated to bank.",
        "Henry Clark refinanced with Quicken Loans on June 17th in Minneapolis. Reviewed impact on mortgage interest deduction.",
        "Luna Sinclair began internship at Citadel in Chicago. Discussed opening a Roth IRA with her stipend income.",
        "Wyatt Green reviewed Betterment portfolio in Salt Lake City. Suggested tax-loss harvesting opportunities for Q2.",
        "Violet Harper requested comprehensive financial plan in Orlando. Scheduled data gathering session; using Raymond James planning portal."
    ]
}


# Create DataFrame
df = pd.DataFrame(data)

# Upload to Vantage 
copy_to_sql(df=df, table_name="financial_entity_dataset", if_exists="replace")

# Create a teradataml DataFrame
from teradataml import DataFrame
financial_entity_dataset = DataFrame("financial_entity_dataset")

financial_entity_dataset

  

text
"Liam O'Brien opened a SEP IRA on January 5th in Los Angeles and contributed $6,500. Discussed tax implications with his CPA."
Quarterly check-in with Noah Patel on April 20th in San Francisco. Portfolio rebalance completed shifted 5% from bonds to growth ETFs.
Mortgage consultation with Isabella Chen on March 1st in Seattle. Compared preapproval options from Wells Fargo and Chase.
"Ethan Wright completed a full financial plan update on June 30th in Denver. Updated risk tolerance, added crypto exposure via Coinbase."
Attended BlackRock fixed income seminar with Mason Lee in Boston. Evaluated suitability of municipal bond ladder strategy.
Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.
Olivia Rodriguez promoted at Morgan Stanley in Chicago; discussed changes to her restricted stock unit (RSU) vesting schedule on Feb 22.
Reviewed insurance coverage with Sophia Martinez on May 10th in Austin. Quoted State Farm term life policy and advised on umbrella coverage.
Call with James Delgado on April 2nd in Chicago to discuss employer 401(k) rollover to a Fidelity IRA. Sent transfer paperwork via DocuSign.
Met with Emily Thompson on March 15th in New York to review her retirement portfolio. Recommended reallocating 10% into international equities through JPMorgan mutual funds.


<hr style="height:2px;border:none;background-color:#00233C;">
<p style="font-size:20px;font-family:Arial;color:#00233C"><b>4. Authenticate into User Environment Service (UES)</b></p>
<p style="font-size:16px;font-family:Arial;color:#00233C">
UES authentication is required to manage Python or R environments in Teradata via APIs.
</p>
<p style="font-size:16px;font-family:Arial;color:#00233C">
You can authenticate using either:
</p>
<ul style="font-size:16px;font-family:Arial;color:#00233C; margin-top:4px;">
  <li><b><a href="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Teradata-Package-for-Python-on-VantageCloud-Lake/Working-with-Open-Analytics/APIs-to-Use-with-Open-Analytics-Framework/API-to-Set-Authentication-Token/set_auth_token" target="_blank" style="color:#005B94;text-decoration:none;">OAuth via <code>set_auth_token()</code></a></b>: opens a session in your default browser for authentication (default in this demo)</li>
  <li><b>Personal Access Token (PAT)</b> and <b>PEM key file</b>: for VantageCloud Lake systems, follow
<a href="https://medium.com/teradata/deploy-hugging-face-llms-on-teradata-vantagecloud-lake-with-nvidia-gpu-acceleration-d94d999edaa5" target="_blank" style="color:#005B94;text-decoration:none;">Step 4 of this tutorial</a> to locate your UES URL, PAT, and PEM file.</li>
</ul>
<p style="font-size:16px;font-family:Arial;color:#00233C">
For quick demonstration purposes, we use OAuth and pass the Open Analytics endpoint to our environment.</p>

In [None]:
#set_auth_token(ues_url=getpass.getpass("ues_url:"))

In [None]:
# Alternative to OAuth: read the UES URL, personal access token (PAT), and PEM key path from the .env file
load_dotenv("../.config/.env")
open_analytics_endpoint = os.getenv("ues_uri")
access_token = os.getenv("access_token")
pem_file = os.getenv("pem_file")


In [None]:
# Authenticate with the personal access token and PEM key loaded above
configure.ues_url = open_analytics_endpoint
if set_auth_token(ues_url=open_analytics_endpoint, username=username, pat_token=access_token, pem_file=pem_file):
    print("UES Authentication successful")
else:
    print("UES Authentication failed. Check credentials.")
    sys.exit(1)

<hr style="height:2px; border:none; background-color:#00233C;">
<p style="font-size:20px; font-family:Arial; color:#00233C;"><b>5. Set Up the User Environment in Teradata VantageCloud Lake for Model Deployment</b></p>
<p style="font-size:16px; font-family:Arial; color:#00233C;">
Now that <b>UES authentication</b> is complete, we can begin managing user environments using Teradata’s API capabilities.
Start by listing the available user environments to see if an existing one can be reused. To select and work within an existing environment, use the <code>get_env</code> API method.
</p>

In [8]:
env_list = list_user_envs()
print("Available Environments:")
ipydisplay(env_list)

Available Environments:


Unnamed: 0,env_name,env_description,base_env_name,language,conda
0,HuggingFace_Webinar_CSAE,BYOLLM demo env,python_3.11,Python,False
1,td_gen_ai_env,This env 'td_gen_ai_env' is created with base ...,python_3.11,Python,False
2,teradatagenai_automation,BYOLLM demo env,python_3.11,Python,False


In [6]:
demo_env = get_env(input("Env Name:"))

Env Name: HuggingFace_Webinar_CSAE


<p style="font-size:16px;font-family:Arial;color:#00233C">
To create a new user environment, start by retrieving the available base Python versions.<br>
Use the <a href="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Teradata-Package-for-Python-on-VantageCloud-Lake/Working-with-Open-Analytics/APIs-to-Use-with-Open-Analytics-Framework/APIs-to-Manage-User-Environments/create_env"><code>create_env</code></a> API to assign a name, specify the version, and add a description for your new environment.
</p>


In [23]:
# Listing the available base environments 
print(list_base_envs())

     base_name language  version
0   python_3.9   Python   3.9.20
1  python_3.10   Python  3.10.15
2  python_3.11   Python  3.11.10
3        r_4.3        R    4.3.3
4        r_4.4        R    4.4.2


In [24]:
# Create a new Python user environment from one of the available base environments
demo_env = create_env(env_name = "HuggingFace_Webinar_CSAE",
                      base_env = 'python_3.10',
                      desc = 'BYOLLM demo env')

User environment 'HuggingFace_Webinar_CSAE' created.
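
<p style="font-size:16px;font-family:Arial;color:#00233C">
The two paths described above (reuse an existing environment or create a new one) can be combined into a single step. The sketch below is deliberately generic: it takes the list of existing names and the two actions as arguments, so it does not depend on any particular library signature. The helper name <code>get_or_create_env</code> is hypothetical.
</p>

```python
def get_or_create_env(name, existing_names, get_fn, create_fn):
    """Reuse the environment when its name is already listed; otherwise create it."""
    if name in existing_names:
        return get_fn(name)
    return create_fn(name)
```

<p style="font-size:16px;font-family:Arial;color:#00233C">
In this notebook, <code>existing_names</code> could be taken from the <code>env_name</code> column of the <code>list_user_envs()</code> result shown earlier, <code>get_fn</code> could be <code>get_env</code>, and <code>create_fn</code> could be a small function that calls <code>create_env</code> with the base environment and description used above.
</p>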


<p style="font-size:16px;font-family:Arial;color:#00233C">
Install the required libraries into the user environment. For most Hugging Face models, these are <code>transformers</code> and <code>torch</code>.
</p> 

In [None]:
demo_env.install_lib(["transformers", "torch"])
print("All libs installed") 

In [7]:
demo_env.libs


Unnamed: 0,name,version
0,certifi,2025.4.26
1,charset-normalizer,3.4.2
2,filelock,3.18.0
3,fsspec,2025.3.2
4,huggingface-hub,0.30.2
5,idna,3.10
6,Jinja2,3.1.6
7,MarkupSafe,3.0.2
8,mpmath,1.3.0
9,networkx,3.4.2


<hr style="height:2px;border:none;background-color:#00233C;">
<p style="font-size:20px;font-family:Arial;color:#00233C"><b>6. Download the Hugging Face Language Model and upload to UES environment</b></p>
<p style="font-size:16px;font-family:Arial;color:#00233C">
You can download Hugging Face LLMs in either <a href="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Build-Scalable-Analytics-with-Open-Analytics-Framework/Bring-Your-Own-LLM-and-DL-Workloads/Using-Hugging-Face-LLMs"> native format or streamlined format</a>. Here we use the streamlined format, which saves the model and tokenizer files to a local directory.

</p>



In [8]:
# Load the model and tokenizer from Hugging Face, then save them in streamlined format under a local path
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("tner/roberta-large-ontonotes5")
model = AutoModelForTokenClassification.from_pretrained("tner/roberta-large-ontonotes5")

tokenizer.save_pretrained("./roberta-large-ontonotes5")
model.save_pretrained("./roberta-large-ontonotes5")


In [9]:
# Compress the model folder into a zip file.
# If the model folder is not in the current directory, root_dir should be its absolute path.
import shutil
shutil.make_archive('roberta-large-ontonotes5', format='zip', root_dir='roberta-large-ontonotes5')


'/home/jovyan/JupyterLabRoot/UseCases/Entity_Recognition_BYOLLM/roberta-large-ontonotes5.zip'
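
<p style="font-size:16px;font-family:Arial;color:#00233C">
Before uploading, it can help to confirm what ended up in the archive. The sketch below lists the top-level entries of a zip file using only the standard library. The helper name <code>archive_top_level</code> is hypothetical. Because <code>root_dir</code> points at the model folder, the model files themselves are expected to sit at the top level of the archive.
</p>

```python
import zipfile


def archive_top_level(zip_path):
    """Return the sorted top-level entries (files or folders) of a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})
```

<p style="font-size:16px;font-family:Arial;color:#00233C">
For example, <code>archive_top_level("roberta-large-ontonotes5.zip")</code> returns the names found at the root of the archive created above.
</p>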

<p style="font-size:16px;font-family:Arial;color:#00233C">
After the LLM directories are compressed into zip files, you can use the <a href = "https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Build-Scalable-Analytics-with-Open-Analytics-Framework/Bring-Your-Own-LLM-and-DL-Workloads/Using-User-Environment-APIs-to-Manage-LLMs">UES APIs</a> to install, uninstall, and list LLMs. Here we use the install_model API.</p>



In [10]:
demo_env.install_model(model_path='roberta-large-ontonotes5.zip', asynchronous=True)


Model installation is initiated. Check the status using status() with the claim id 98bb25a9-d562-4e05-8e91-65a1b828cad0.


'98bb25a9-d562-4e05-8e91-65a1b828cad0'
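
<p style="font-size:16px;font-family:Arial;color:#00233C">
Because the installation runs asynchronously, the model may not be available immediately. The message above points to <code>status()</code> with the claim id. The exact shape of the returned status is not shown in this notebook, so the sketch below leaves both the status check and the completion test to the caller. The helper name <code>poll</code> and its parameters are hypothetical.
</p>

```python
import time


def poll(check, is_done, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call check() repeatedly until is_done(result) is true, then return the result.

    Raises TimeoutError if the condition is not met within max_attempts calls.
    """
    for _ in range(max_attempts):
        result = check()
        if is_done(result):
            return result
        sleep(interval_s)
    raise TimeoutError(f"Condition not met after {max_attempts} attempts")
```

<p style="font-size:16px;font-family:Arial;color:#00233C">
Here, <code>check</code> could wrap the environment's <code>status()</code> call with the claim id returned above, and <code>is_done</code> would inspect whatever that call returns on your system.
</p>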

In [11]:
ipydisplay(demo_env.models)


Unnamed: 0,Model,Size,Timestamp
0,roberta-large-ontonotes5,6144,2025-05-02T16:08:07Z


<hr style="height:2px;border:none;background-color:#00233C;">
<p style="font-size:20px;font-family:Arial;color:#00233C"><b>7. Create a Python script and call the Apply Operator</b></p>

<p style="font-size:16px;font-family:Arial;color:#00233C">
Create a Python script that reads text records from VantageCloud on standard input, applies a Hugging Face language model to perform Named Entity Recognition (NER), and writes the extracted entities to standard output in a structured, delimited format.
</p>

In [12]:
%%writefile entity_recognition.py

#!/usr/bin/env python3
import sys
import warnings

warnings.simplefilter('ignore')

# The APPLY operator passes the input rows to this script on standard input.
input_str = sys.stdin.read()

DELIMITER = '#'  # must match the delimiter passed to Apply()

if len(input_str) > 0:
    # Import and load the model only when there is input to process.
    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
    torch_device = 'cuda'
    model_path = "./models/roberta-large-ontonotes5"
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForTokenClassification.from_pretrained(model_path)
    ner_pipeline = pipeline("token-classification", model=model, tokenizer=tokenizer, device=torch_device, aggregation_strategy='max')

    for line in input_str.splitlines():
        results = ner_pipeline(line)

        # Group the recognized words by entity type.
        dict_val = {}
        for r in results:
            entity = r['entity_group']
            word = r['word']
            dict_val.setdefault(entity, []).append(word)

        # Emit one delimited field per entity type, in the column order declared in Apply's returns.
        combined_str = ""
        for key in ["ORG", "PERSON", "DATE", "PRODUCT", "GPE"]:
            combined_str += f"{DELIMITER}{','.join(dict_val.get(key, []))}"

        print(f"{line}{combined_str}")


Overwriting entity_recognition.py
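
<p style="font-size:16px;font-family:Arial;color:#00233C">
The output-formatting step of the script can be exercised locally without a GPU or the model. The sketch below reproduces the grouping and delimiting logic as a standalone function and feeds it a hand-written list shaped like the pipeline results (dictionaries with <code>entity_group</code> and <code>word</code> keys, as used in the script). The helper name <code>format_row</code> and the sample input are illustrative, not model output.
</p>

```python
DELIMITER = "#"
ENTITY_KEYS = ["ORG", "PERSON", "DATE", "PRODUCT", "GPE"]


def format_row(line, results, delimiter=DELIMITER, keys=ENTITY_KEYS):
    """Build one output row: the input text followed by one field per entity type."""
    grouped = {}
    for r in results:
        grouped.setdefault(r["entity_group"], []).append(r["word"])
    fields = [",".join(grouped.get(key, [])) for key in keys]
    return delimiter.join([line] + fields)


# Hand-written stand-in for pipeline results.
sample = [
    {"entity_group": "PERSON", "word": "Emily Thompson"},
    {"entity_group": "ORG", "word": "JPMorgan"},
    {"entity_group": "GPE", "word": "New York"},
]
print(format_row("Met with Emily", sample))
# prints: Met with Emily#JPMorgan#Emily Thompson###New York
```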


<p style="font-size:16px;font-family:Arial;color:#00233C">
Install the Python script file to the environment using <code>install_file()</code>.
</p>

In [13]:
demo_env.install_file(file_path ="entity_recognition.py", replace=True)


File 'entity_recognition.py' replaced successfully in the remote user environment 'HuggingFace_Webinar_CSAE'.


True

<p style="font-size:16px;font-family:Arial;color:#00233C">
Set the session to the GPU Analytic Compute Group you want to use; otherwise, the session uses the default compute group. The cluster must be running to execute the APPLY operator.</p>

In [14]:
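# Name of the GPU Analytic Compute Group; replace with the group available on your system.
compute_group = "CG_BusGrpA_GPU"
execute_sql(f"SET SESSION COMPUTE GROUP {compute_group};")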
print(f"Compute group set to {compute_group}")

Compute group set to CG_BusGrpA_GPU


<p style="font-size:16px;font-family:Arial;color:#00233C">
The APPLY operator executes your Python script directly within the user environment, enabling in-database processing at scale. It reads text from each row of the dataset, performs Named Entity Recognition (NER) using Hugging Face language models accelerated by NVIDIA GPUs, and outputs the extracted entities into a structured dataframe in just seconds. </p>  

<p style="font-size:16px;font-family:Arial;color:#00233C"> In this example, we use <code>tner/roberta-large-ontonotes5</code>, a general-purpose NER model trained on the OntoNotes 5 dataset. It supports entities like ORG, PERSON, DATE, PRODUCT, and GPE. However, it's important to note that the “PRODUCT” entity in OntoNotes refers primarily to physical products (e.g., iPhone, Windows OS), not financial instruments (e.g., Roth IRA, mutual funds). To improve financial domain accuracy, this model can be further fine-tuned on domain-specific data to better recognize investment products, insurance types, and retirement accounts.
 </p> 
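
<p style="font-size:16px;font-family:Arial;color:#00233C">
The <code>Apply</code> call in the next cell declares <code>#</code> as the field delimiter and <code>|</code> as the quote character, and the script joins its output fields with <code>#</code>. If an input text contained either character, the output columns could be misparsed. A minimal sketch of a pre-flight check, assuming the raw texts are still available as a Python list (the helper name <code>rows_with_reserved_chars</code> is hypothetical):
</p>

```python
def rows_with_reserved_chars(texts, reserved=("#", "|")):
    """Return (index, text) pairs whose text contains a delimiter or quote character."""
    return [(i, t) for i, t in enumerate(texts) if any(ch in t for ch in reserved)]
```

<p style="font-size:16px;font-family:Arial;color:#00233C">
For the sample data in this notebook, the check could be run as <code>rows_with_reserved_chars(data["text"])</code>; an empty list means no row needs attention.
</p>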

In [16]:
apply_obj = Apply(data = financial_entity_dataset,
                  apply_command = 'python entity_recognition.py',
                  returns = {"text": VARCHAR(64000), "ORG": VARCHAR(64000), "PERSON": VARCHAR(64000), "DATES": VARCHAR(64000), "PRODUCT": VARCHAR(64000), "GPE": VARCHAR(64000)},
                  env_name = demo_env,
                  delimiter = '#',
                  quotechar = '|'
                 )

import time

start = time.time()

# Execute the Python script inside the remote user environment.
df = apply_obj.execute_script()

print(f'Time: {time.time() - start}')

df


Time: 15.749340295791626


text,ORG,PERSON,DATES,PRODUCT,GPE
"Liam O'Brien opened a SEP IRA on January 5th in Los Angeles and contributed $6,500. Discussed tax implications with his CPA.",,Liam O'Brien,January 5th,,Los Angeles
Quarterly check-in with Noah Patel on April 20th in San Francisco. Portfolio rebalance completed shifted 5% from bonds to growth ETFs.,,Noah Patel,"Quarterly, April 20th",,San
Mortgage consultation with Isabella Chen on March 1st in Seattle. Compared preapproval options from Wells Fargo and Chase.,"Wells Fargo, Chase.",Isabella Chen,March 1st,,
"Ethan Wright completed a full financial plan update on June 30th in Denver. Updated risk tolerance, added crypto exposure via Coinbase.",Coinbase.,Ethan Wright,June 30th,,
Attended BlackRock fixed income seminar with Mason Lee in Boston. Evaluated suitability of municipal bond ladder strategy.,BlackRock,Mason Lee,,,Boston.
Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.,Liberty Mutual.,Ava Johnson,July 12th,,Phoenix.
Olivia Rodriguez promoted at Morgan Stanley in Chicago; discussed changes to her restricted stock unit (RSU) vesting schedule on Feb 22.,Morgan Stanley,Olivia Rodriguez,Feb 22.,,
Reviewed insurance coverage with Sophia Martinez on May 10th in Austin. Quoted State Farm term life policy and advised on umbrella coverage.,State Farm,Sophia Martinez,May 10th,,Austin.
Call with James Delgado on April 2nd in Chicago to discuss employer 401(k) rollover to a Fidelity IRA. Sent transfer paperwork via DocuSign.,Fidelity,James Delgado,April 2nd,,Chicago
Met with Emily Thompson on March 15th in New York to review her retirement portfolio. Recommended reallocating 10% into international equities through JPMorgan mutual funds.,JPMorgan,Emily Thompson,March 15th,,New York
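
<p style="font-size:16px;font-family:Arial;color:#00233C">
Several extracted values in the output above carry trailing punctuation from the source sentence (for example <code>Chase.</code>, <code>Boston.</code>, and <code>Feb 22.</code>). The sketch below shows one possible client-side cleanup of a single entity field. It is not part of the original workflow, and the helper name <code>clean_entities</code> is hypothetical.
</p>

```python
def clean_entities(field, strip_chars=".,;:"):
    """Split a comma-joined entity field and strip trailing punctuation from each item."""
    items = [item.strip().rstrip(strip_chars).strip() for item in field.split(",")]
    return [item for item in items if item]


print(clean_entities("Wells Fargo, Chase."))
# prints: ['Wells Fargo', 'Chase']
```

<p style="font-size:16px;font-family:Arial;color:#00233C">
Because the script joins entities with commas, an entity that itself contains a comma would be split by this helper, so treat the result as a starting point rather than a complete normalization.
</p>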


<p style="font-size:16px;font-family:Arial;color:#00233C">
  You can explore additional Natural Language Processing (NLP) tasks directly in-database using task-specific models:
</p>

<ul style="font-size:15px;font-family:Arial">
  <li><strong>Text Classification</strong> — <a href="https://huggingface.co/facebook/bart-large-mnli" target="_blank">facebook/bart-large-mnli</a>: Classifies text into predefined categories</li>
  <li><strong>Language Detection</strong> — <a href="https://huggingface.co/papluca/xlm-roberta-base-language-detection" target="_blank">papluca/xlm-roberta-base-language-detection</a>: Detects the language</li>
  <li><strong>Generating Embeddings</strong> — <a href="https://huggingface.co/sentence-transformers/all-mpnet-base-v2" target="_blank">sentence-transformers/all-mpnet-base-v2</a>: Converts text into vector representations for similarity searches</li>
  <li><strong>Named Entity Recognition</strong> — <a href="https://huggingface.co/tner/roberta-large-ontonotes5" target="_blank">tner/roberta-large-ontonotes5</a>: Identifies and categorizes named entities within unstructured text</li>
  <li><strong>Extracting Key Phrases</strong> — <a href="https://huggingface.co/ml6team/keyphrase-extraction-kbir-kpcrowd" target="_blank">ml6team/keyphrase-extraction-kbir-kpcrowd</a>: Extracts important phrases from a document to summarize its content</li>
  <li><strong>Grammar Correction</strong> — <a href="https://huggingface.co/pszemraj/flan-t5-large-grammar-synthesis" target="_blank">pszemraj/flan-t5-large-grammar-synthesis</a>: Automatically corrects grammatical errors in text</li>
  <li><strong>Masking PII Entities</strong> — <a href="https://huggingface.co/ab-ai/pii_model" target="_blank">ab-ai/pii_model</a>: Masks personally identifiable information (PII)</li>
  <li><strong>Sentiment Analysis</strong> — <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">distilbert-base-uncased-finetuned-sst-2-english</a>: Determines the emotional tone (positive or negative)</li>
  <li><strong>Sentence Similarity</strong> — <a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2" target="_blank">sentence-transformers/all-MiniLM-L6-v2</a>: Measures semantic similarity between sentences</li>
  <li><strong>Summarization</strong> — <a href="https://huggingface.co/facebook/bart-large-cnn" target="_blank">facebook/bart-large-cnn</a>: Generates concise summaries of longer documents</li>
  <li><strong>Translation</strong> — <a href="https://huggingface.co/Helsinki-NLP/opus-mt-en-fr" target="_blank">Helsinki-NLP/opus-mt-en-fr</a>: Translates English text to French</li>
</ul>



<hr style='height:2px;border:none;background-color:#00233C;'>
<b style="font-size:20px;font-family:Arial;color:#00233C">8. Automate Entity Recognition with <code>teradatagenai</code> Python Package</b>
<p style="font-size:16px;font-family:Arial;color:#00233C">
  To simplify and accelerate setup, you can now use the <code>teradatagenai</code> package to automate model deployment and inference. This Python library enables seamless, in-database, AI-driven text analytics within Teradata VantageCloud. It offers a variety of user-friendly functions that wrap API calls for common text analysis tasks, making it easy to apply LLMs to proprietary unstructured data.
    
</p>

<p style="font-size:16px;font-family:Arial;color:#00233C">
  Built on Teradata VantageCloud’s open and connected architecture, this solution uses the BYO-LLM capability and enables teams to rapidly develop generative AI use cases that enhance customer experiences, improve employee productivity, and streamline operations, all within a secure, scalable environment.
</p>

<p style="font-size:16px;font-family:Arial;color:#00233C"> <code>TextAnalyticsAI</code> provides access to 11+ generative AI functions, including those listed below, so users can run advanced text analytics directly on data stored in Vantage.
<ul style="font-size:16px; font-family:Arial; color:#00233C; margin-left:20px; line-height:1.8;">
  <li><code>classify()</code> – Classify text into predefined categories</li>
  <li><code>analyze_sentiment()</code> – Perform sentiment analysis</li>
  <li><code>detect_language()</code> – Detect the language of a text</li>
  <li><code>embeddings()</code> – Generate embeddings for similarity search</li>
  <li><code>recognize_entities()</code> – Extract named entities</li>
   <li><code>recognize_pii_entities()</code> – Detect and label PII entities</li>
  <li><code>extract_key_phrases()</code> – Identify key phrases in text</li>
  <li><code>mask_pii()</code> – Mask personally identifiable information (PII)</li>
  <li><code>sentence_similarity()</code> – Measure semantic similarity between sentences</li>
  <li><code>summarize()</code> – Generate summaries of longer documents</li>
  <li><code>translate()</code> – Translate text between languages</li>
</ul>

</p>


In [None]:
#install the teradatagenai package
!pip install teradatagenai

In [11]:
#import libraries and modules
import teradatagenai
from teradatagenai import TeradataAI, TextAnalyticsAI

<p style="font-size:16px;font-family:Arial;color:#00233C">  
TeradataAI handles the download and installation of the Hugging Face model (for example, <i>'tner/roberta-large-ontonotes5'</i>) in the user's environment. If no environment is specified, a sample environment named <i>'td_gen_ai_env'</i> is created with the <code>torch</code> and <code>transformers</code> libraries and their dependencies. The TeradataAI class manages the entire setup process. In the background, this process uses Teradata’s Bring Your Own Large Language Model (BYO-LLM) offering.</p>
<p style="font-size:16px;font-family:Arial;color:#00233C">    For this example, we create a new user environment to show how to install the same <code>tner/roberta-large-ontonotes5</code> model with the <code>teradatagenai</code> package. </p>

In [None]:
#Define your environment
create_env(env_name = "teradatagenai_automation",
                      base_env = 'python_3.11',
                      desc = 'BYOLLM demo env')

In [13]:
#Define your model and initialize the TeradataAI Class
model_name = 'tner/roberta-large-ontonotes5'
model_args = {'transformer_class': 'AutoModelForTokenClassification',
              'task' : 'token-classification'}
ues_args = {'env_name': 'teradatagenai_automation'}

llm = TeradataAI(api_type = "hugging_face",
                 model_name = model_name,
                 model_args = model_args,
                 ues_args = ues_args )

Using env: 'teradatagenai_automation'.
Model is already available in the user environment.
File 'td_sample_inference_script.py' replaced successfully in the remote user environment 'teradatagenai_automation'.
File 'td_sample_embeddings_script.py' replaced successfully in the remote user environment 'teradatagenai_automation'.



<p style="font-size:16px;font-family:Arial;color:#00233C">  
We will configure the TextAnalyticsAI object with the preferred large language model using the TeradataAI object. This will enable us to execute a variety of text analytics tasks. </p>


In [None]:
obj = TextAnalyticsAI(llm=llm)

In [9]:
demo_env2 = get_env(input("Env Name:"))

Env Name: teradatagenai_automation


In [20]:
demo_env2.install_lib(["transformers", "torch"])
print("All libs installed") 


All libs installed


In [16]:
# Run entity recognition with the 'entity_recognition.py' script passed via `script`
obj.recognize_entities(column='text',
                       data=financial_entity_dataset,
                       script="entity_recognition.py",
                       returns={"text": VARCHAR(64000),
                                "ORG": VARCHAR(64000),
                                "PERSON": VARCHAR(64000),
                                "DATES": VARCHAR(64000),
                                "PRODUCT": VARCHAR(64000),
                                "GPE": VARCHAR(64000)},
                       delimiter="#")



File 'entity_recognition.py' replaced successfully in the remote user environment 'teradatagenai_automation'.


text,ORG,PERSON,DATES,PRODUCT,GPE
"Liam O'Brien opened a SEP IRA on January 5th in Los Angeles and contributed $6,500. Discussed tax implications with his CPA.",,Liam O'Brien,January 5th,,Los Angeles
Quarterly check-in with Noah Patel on April 20th in San Francisco. Portfolio rebalance completed shifted 5% from bonds to growth ETFs.,,Noah Patel,"Quarterly, April 20th",,San
Mortgage consultation with Isabella Chen on March 1st in Seattle. Compared preapproval options from Wells Fargo and Chase.,"Wells Fargo, Chase.",Isabella Chen,March 1st,,
"Ethan Wright completed a full financial plan update on June 30th in Denver. Updated risk tolerance, added crypto exposure via Coinbase.",Coinbase.,Ethan Wright,June 30th,,
Attended BlackRock fixed income seminar with Mason Lee in Boston. Evaluated suitability of municipal bond ladder strategy.,BlackRock,Mason Lee,,,Boston.
Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.,Liberty Mutual.,Ava Johnson,July 12th,,Phoenix.
Olivia Rodriguez promoted at Morgan Stanley in Chicago; discussed changes to her restricted stock unit (RSU) vesting schedule on Feb 22.,Morgan Stanley,Olivia Rodriguez,Feb 22.,,
Reviewed insurance coverage with Sophia Martinez on May 10th in Austin. Quoted State Farm term life policy and advised on umbrella coverage.,State Farm,Sophia Martinez,May 10th,,Austin.
Call with James Delgado on April 2nd in Chicago to discuss employer 401(k) rollover to a Fidelity IRA. Sent transfer paperwork via DocuSign.,Fidelity,James Delgado,April 2nd,,Chicago
Met with Emily Thompson on March 15th in New York to review her retirement portfolio. Recommended reallocating 10% into international equities through JPMorgan mutual funds.,JPMorgan,Emily Thompson,March 15th,,New York
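
<p style="font-size:16px;font-family:Arial;color:#00233C">
Several values in the output above keep the sentence-ending period (for example <i>Chase.</i> or <i>Boston.</i>). The following is a minimal post-processing sketch, not part of <code>teradatagenai</code>: the helper name <code>split_entities</code> is hypothetical, and it assumes each cell holds entities separated by commas, as displayed above.
</p>

```python
def split_entities(cell):
    """Split a comma-separated entity cell and trim trailing sentence punctuation.

    Assumption: entities inside one cell are separated by commas, as in the
    displayed output. An entity that itself contains a comma would be split.
    """
    if not cell:
        return []
    cleaned = []
    for part in cell.split(","):
        # Drop surrounding whitespace, then punctuation left on the right edge.
        entity = part.strip().rstrip(".;:!?")
        if entity:
            cleaned.append(entity)
    return cleaned


print(split_entities("Wells Fargo, Chase."))
print(split_entities("Boston."))
```

<p style="font-size:16px;font-family:Arial;color:#00233C">
The helper works on plain strings, so it could be applied to each entity column after the result is brought to the client, for instance as a pandas DataFrame.
</p>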


<hr style="height:2px; border:none; background-color:#00233C;">

<b style="font-size:20px; font-family:Arial; color:#00233C;">
  9. Call Hosted LLMs From In-Database (AWS Bedrock, Google Gemini, Azure AI Models)
</b>

<p style="font-size:16px; font-family:Arial; color:#00233C; line-height:1.6;">
  Teradata lets you run text analysis using large language models (LLMs) hosted on cloud platforms such as <strong>AWS</strong>, <strong>Google</strong>, and <strong>Azure</strong>, while working directly with data stored in <strong>Vantage</strong>. This capability takes a database-side approach based on fast path functions, in contrast to a client-side approach.
</p>

<p style="font-size:16px; font-family:Arial; color:#00233C; line-height:1.6;">
  These built-in functions support a wide range of NLP tasks on unstructured text:
</p>

<ul style="font-size:16px; font-family:Arial; color:#00233C; margin-left:20px; line-height:1.8;">
  <li><code>AI_AnalyzeSentiment</code></li>
  <li><code>AI_AskLLM</code></li>
  <li><code>AI_DetectLanguage</code></li>
  <li><code>AI_MaskPII</code></li>
  <li><code>AI_RecognizeEntities</code></li>
  <li><code>AI_RecognizePIIEntities</code></li>
  <li><code>AI_TextClassifier</code></li>
  <li><code>AI_TextEmbeddings</code></li>
  <li><code>AI_TextSummarize</code></li>
</ul>

<p style="font-size:16px; font-family:Arial; color:#00233C; line-height:1.6;">
  In the following example, we’ll demonstrate how to use <code>AI_RecognizeEntities</code> for in-database entity recognition with a hosted LLM.
</p>


In [None]:
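import getpass

# Securely prompt for AWS credentials (input is not echoed to the screen)
access_key = getpass.getpass("Enter AWS Access Key: ")
secret_key = getpass.getpass("Enter AWS Secret Key: ")
session_token = getpass.getpass("Enter AWS Session Token: ")

# Build the query. The credentials are interpolated into the SQL text,
# so avoid printing or logging the resulting string.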
query = f"""
SELECT * FROM AI_RecognizeEntities( 
  ON financial_entity_dataset AS InputTable
  USING 
    TextColumn('text')
    ApiType('aws')
    REGION('us-west-2')
    ACCESSKEY('{access_key}')
    SECRETKEY('{secret_key}')
    SESSIONKEY('{session_token}')
    ModelName('anthropic.claude-v2')
    isDebug('true')
    Accumulate('[0:]')
) AS dt;
"""

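<p style="font-size:16px; font-family:Arial; color:#00233C; line-height:1.6;">
  Because the query string contains the AWS credentials, it can help to mask them before the text is printed or logged. The following is a minimal sketch under stated assumptions: <code>redact_query</code> is a hypothetical helper (not a Teradata API), and it only looks for the <code>ACCESSKEY</code>, <code>SECRETKEY</code>, and <code>SESSIONKEY</code> clauses used in this notebook.
</p>

```python
import re

# Clause names taken from the query in this notebook; the helper is hypothetical.
_SECRET_CLAUSES = ("ACCESSKEY", "SECRETKEY", "SESSIONKEY")

# Matches e.g. ACCESSKEY('value'), allowing doubled single quotes inside the literal.
_SECRET_PATTERN = re.compile(
    r"\b(" + "|".join(_SECRET_CLAUSES) + r")\('(?:[^']|'')*'\)",
    flags=re.IGNORECASE,
)


def redact_query(sql):
    """Return a copy of `sql` with credential literals replaced by '***'."""
    return _SECRET_PATTERN.sub(lambda m: f"{m.group(1)}('***')", sql)


# Placeholder values for illustration only, not real credentials.
sample = (
    "USING ApiType('aws') ACCESSKEY('AKIAEXAMPLE') "
    "SECRETKEY('s3cr3t') SESSIONKEY('tok')"
)
print(redact_query(sample))
```

<p style="font-size:16px; font-family:Arial; color:#00233C; line-height:1.6;">
  The original string is left unchanged, so the unredacted query can still be executed while only the redacted copy is displayed.
</p>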
In [50]:
# Run the query in-database; keep the SQL text and the result under separate names
result = DataFrame.from_query(query)
result

text,Labeled_Entities,Message
"Liam O'Brien opened a SEP IRA on January 5th in Los Angeles and contributed $6,500. Discussed tax implications with his CPA.","(Liam O'Brien, people), (SEP IRA, product), (January 5th, date/time), (Los Angeles, places), ($6,500, currencies)",
Quarterly check-in with Noah Patel on April 20th in San Francisco. Portfolio rebalance completed shifted 5% from bonds to growth ETFs.,"(Noah Patel, people), (April 20th, date/time), (San Francisco, places), (5%, percentages), (bonds, products), (growth ETFs, products)",
Mortgage consultation with Isabella Chen on March 1st in Seattle. Compared preapproval options from Wells Fargo and Chase.,"(Isabella Chen, people), (March 1st, date/time), (Seattle, places), (Wells Fargo, organizations), (Chase, organizations)",
"Ethan Wright completed a full financial plan update on June 30th in Denver. Updated risk tolerance, added crypto exposure via Coinbase.","(Ethan Wright, people), (June 30th, date/time), (Denver, places), (Coinbase, organizations)",
Attended BlackRock fixed income seminar with Mason Lee in Boston. Evaluated suitability of municipal bond ladder strategy.,"(BlackRock, organizations), (Mason Lee, people), (Boston, places)",
Ava Johnson requested auto insurance review on July 12th in Phoenix. Referred her to our partner agent at Liberty Mutual.,"(Ava Johnson, people), (July 12th, date/time), (Phoenix, places), (Liberty Mutual, organizations)",
Olivia Rodriguez promoted at Morgan Stanley in Chicago; discussed changes to her restricted stock unit (RSU) vesting schedule on Feb 22.,"(Olivia Rodriguez, people), (Morgan Stanley, organizations), (Chicago, places), (Feb 22, date/time)",
Reviewed insurance coverage with Sophia Martinez on May 10th in Austin. Quoted State Farm term life policy and advised on umbrella coverage.,"(Sophia Martinez, people), (May 10th, date/time), (Austin, places), (State Farm, organizations)",
Call with James Delgado on April 2nd in Chicago to discuss employer 401(k) rollover to a Fidelity IRA. Sent transfer paperwork via DocuSign.,"(James Delgado, people), (April 2nd, date/time), (Chicago, places), (Fidelity, organizations), (IRA, products)",
Met with Emily Thompson on March 15th in New York to review her retirement portfolio. Recommended reallocating 10% into international equities through JPMorgan mutual funds.,"(Emily Thompson, people), (March 15th, date/time), (New York, places), (10%, percentages), (JPMorgan, organizations)",
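
<p style="font-size:16px; font-family:Arial; color:#00233C; line-height:1.6;">
  The <code>Labeled_Entities</code> column above is a single string of <code>(entity, label)</code> pairs. The following is a minimal sketch for turning one such string into a dictionary grouped by label. It assumes the format shown in the output, and <code>parse_labeled_entities</code> is a hypothetical helper, not a Teradata function. Entities that themselves contain parentheses would not be matched.
</p>

```python
import re
from collections import defaultdict

# Matches "(entity, label)". The entity may contain commas (e.g. "$6,500"),
# so the split happens at the last comma inside each pair of parentheses.
_PAIR = re.compile(r"\(([^()]+),\s*([^(),]+)\)")


def parse_labeled_entities(cell):
    """Group entities by label from a Labeled_Entities string."""
    grouped = defaultdict(list)
    for entity, label in _PAIR.findall(cell or ""):
        grouped[label.strip()].append(entity.strip())
    return dict(grouped)


row = (
    "(Liam O'Brien, people), (SEP IRA, product), (January 5th, date/time), "
    "(Los Angeles, places), ($6,500, currencies)"
)
print(parse_labeled_entities(row))
```

<p style="font-size:16px; font-family:Arial; color:#00233C; line-height:1.6;">
  Grouping by label makes this output easier to compare with the per-label columns (<code>ORG</code>, <code>PERSON</code>, <code>GPE</code>, and so on) returned by the BYO-LLM example earlier in the notebook.
</p>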


<hr style='height:2px;border:none;background-color:#00233C;'>
<b style = 'font-size:20px;font-family:Arial;color:#00233c'>10. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>Work Tables</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Clean up the work tables to prevent errors the next time the notebook is run.</p>

In [None]:
execute_sql("""Drop table retail_marketing""")

In [None]:
remove_context() 

In [None]:
help(TeradataAI)