<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Unstructured Text Analysis With BYO-LLM and NVIDIA GPU Acceleration
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>


<p style="font-size:20px;font-family:Arial"><b>Introduction:</b></p>

<p style="font-size:16px;font-family:Arial">
This notebook is designed for developers, data scientists, and AI practitioners who want to bring open-source Language Models (LMs) and Large Language Models (LLMs) closer to their data — quickly, securely, and at scale. As organizations race to deploy AI applications, developers face key challenges: selecting the right model, minimizing data movement, ensuring security, and controlling costs.  Teradata’s Bring Your Own LLM (BYO-LLM) capability addresses these challenges by allowing you to deploy open-source <b>Hugging Face</b> models directly inside VantageCloud — where your data already lives.
</p>

<p style="font-size:18px;font-family:Arial"><b>What is BYO-LLM?</b></p>

<p style="font-size:16px;font-family:Arial">
<b>BYO-LLM</b> (Bring Your Own Large Language Model) is a key capability of the VantageCloud Open Analytics Framework (OAF) and the one you’ll use in this notebook. It gives developers complete control over AI deployment in VantageCloud:
</p>

<ul style="font-size:16px;font-family:Arial">
    <li>Seamlessly integrate open-source models from Hugging Face</li>
    <li>Eliminate the need to move data — reducing cost and compliance risk</li>
    <li>Leverage GPU acceleration for inference speeds up to 200x faster than CPU</li>
    <li>Experiment freely without vendor lock-in, while keeping your data secure and your operations scalable</li>
</ul>

<img src="BYOLLM_Flow.png" alt="Architecture for BYOLLM" style="width: 90%">
<p style="font-size:18px;font-family:Arial;"><b>Business Impact of Open-Source Language Models (LMs) for NLP Tasks Such as Unstructured Text Analysis</b></p>


<p style="font-size:16px;font-family:Arial">
Language Models (LMs) are the foundation for solving Natural Language Processing (NLP) tasks such as unstructured text analysis — enabling machines to understand, interpret, and generate human language. Language Models enable businesses to:
</p>

<ul style="font-size:16px;font-family:Arial">
    <li>Extract key information from documents, contracts, and research papers</li>
    <li>Analyze customer feedback from emails, reviews, and social media</li>
    <li>Power chatbots and virtual assistants for real-time support</li>
    <li>Personalize marketing and customer experiences based on user interactions</li>
</ul>

<p style="font-size:16px;font-family:Arial">
Through unstructured text analysis and open-source LMs, businesses can:
</p>
<ul style="font-size:16px;font-family:Arial">
    <li>Improve operational efficiency through automation</li>
    <li>Gain real-time insights into customer sentiment and behavior</li>
    <li>Respond proactively to customer concerns</li>
    <li>Stay competitive by adapting to market and customer trends</li>
</ul>

<hr style='height:2px;border:none'>
<b style = 'font-size:20px;font-family:Arial'>1. Configure the environment</b>

In [None]:
%%capture
!pip install -r requirements.txt --quiet

In [None]:
import teradataml
import getpass
import sys
import pandas as pd
import os

from dotenv import load_dotenv
from teradataml import *
from teradatasqlalchemy.types import *
from IPython.display import display as ipydisplay
from os.path import expanduser

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>2. Connect to VantageCloud Lake</b></p>
<p style = 'font-size:16px;font-family:Arial'>Connect to VantageCloud using <code>create_context</code> from the teradataml Python library. The connection details (host, username, and password) are read from the <code>../.config/.env</code> file; the GPU Analytic Compute Group name is read from the same file and used later in this notebook.</p>

In [None]:
# Read the host, username, and password (stored as my_variable) from the .env file.
print("Creating the context...") 

load_dotenv("../.config/.env", override=True)
host = os.getenv("host")
username = os.getenv("username")
my_variable = os.getenv("my_variable")

eng = create_context(host=host, username=username, password=my_variable)
print("Connected to Teradata:", eng)

execute_sql('''SET query_band='DEMO=Entity_Recognition_BYOLLM_VCL.ipynb;' UPDATE FOR SESSION;''')
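
<p style="font-size:16px;font-family:Arial">
The notebook reads seven values from <code>../.config/.env</code> with <code>os.getenv()</code>, and a missing value usually surfaces later as a confusing connection or authentication error. A minimal sketch of a fail-fast check follows; the helper <code>missing_env_keys</code> is hypothetical (not part of teradataml) and assumes <code>load_dotenv</code> has already run.
</p>

```python
import os

# Keys this notebook reads with os.getenv(); "my_variable" holds the password.
REQUIRED_KEYS = ["host", "username", "my_variable",
                 "ues_uri", "access_token", "pem_file", "gpu_compute_group"]


def missing_env_keys(keys=REQUIRED_KEYS, environ=None):
    """Return the keys that are unset or empty (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    return [key for key in keys if not environ.get(key)]


missing = missing_env_keys()
if missing:
    print("Missing values in ../.config/.env:", ", ".join(missing))
else:
    print("All required settings are present.")
```
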

<hr style="height:2px; border:none">

<p style="font-size:20px; font-family:Arial"><b>3. Getting Data for In-Database NLP Tasks</b></p>
<p style="font-size:16px; font-family:Arial">
We have provided data for this demo in an OFS table, <code>Financial_CallCenter_Summary</code>, inside the <code>DEMO_EntityRecognition</code> database.
</p>
<p style="font-size:16px; font-family:Arial">
<b>💼 Use Case Summary:</b><br>
In the wealth management industry, financial advisors hold many client meetings each week — discussing portfolios, insurance, loans, and retirement planning. Manually summarizing and tagging these interactions for compliance, CRM updates, or follow-up actions is time-consuming and often inconsistent.
</p>
<p style="font-size:16px; font-family:Arial">
With Teradata’s BYO-LLM capability, we can deploy an open-source Hugging Face model directly within VantageCloud, where the client interaction data already resides. In this demo, we’ll perform Named Entity Recognition (NER) with <a href="https://huggingface.co/tner/roberta-large-ontonotes5" target="_blank">tner/roberta-large-ontonotes5</a> to extract entities such as:

<ul style="font-size:16px; font-family:Arial">
  <li>Client names</li>
  <li>Financial institutions</li>
  <li>Product types</li>
  <li>Key dates</li>
</ul> 

</p>
<p style="font-size:16px; font-family:Arial">
This helps financial firms:
<ul style="font-size:16px; font-family:Arial">
  <li>Automate meeting note tagging for faster documentation and regulatory compliance</li>
  <li>Enhance client profiling by identifying frequently discussed financial topics</li>
  <li>Streamline CRM updates by structuring insights from unstructured text</li>
</ul>
</p>
<p style="font-size:16px; font-family:Arial">
All of this is achieved securely — without moving data — and using open-source models, giving teams full control, scalability, and flexibility.
</p>


In [None]:
# Creating a teradataml dataframe using sample data in an OFS table.
call_summary_dataset = DataFrame(in_schema("DEMO_EntityRecognition","Financial_CallCenter_Summary"))
call_summary_dataset.head()

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>4. Authenticate into User Environment Service (UES) for Container Management</b></p>
<p style="font-size:16px; font-family:Arial;">
  The <code>teradataml</code> library offers simple yet powerful methods for creating and managing custom Python runtime environments within VantageCloud. This gives developers full control over model behavior, performance, and analytic accuracy when running on the Analytic Cluster.
</p>

<p style="font-size:16px; font-family:Arial;">
  Custom environments are persistent—created once and reused as needed. They can be saved, updated, or modified at any time, allowing for efficient and flexible environment management.
</p>

<p style="font-size:18px; font-family:Arial; color:#00233C;">
  <b>Container Management Process</b>
</p>

<table style="width:100%; table-layout:fixed;">
  <tr>
    <td style="vertical-align:top;" width="40%">
      <ol style="font-size:16px; font-family:Arial; color:#00233C;">
        <li>Create a unique User Environment based on available base images</li>
        <li>Install libraries</li>
        <li>Install models and additional user artifacts</li>
      </ol>
    </td>
    <td>
      <img src="OAF_Env.png" width="600" alt="Container Management Diagram">
    </td>
  </tr>
</table>
<p style="font-size:16px;font-family:Arial">
<b>UES authentication</b> is required to create and manage Python or R user environments. A VantageCloud Lake user can create the authentication objects using the Console in a VantageCloud Lake environment. This step has already been performed for you.
</p>
<ul style="font-size:16px;font-family:Arial; margin-top:4px;">
  <li><a href="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Teradata-Package-for-Python-on-VantageCloud-Lake/Working-with-Open-Analytics/APIs-to-Use-with-Open-Analytics-Framework/API-to-Set-Authentication-Token/set_auth_token">Click here</a> to see more details about using the Teradata APIs to set the authentication objects.</li>

  <li>Check out <a href="https://medium.com/teradata/deploy-hugging-face-llms-on-teradata-vantagecloud-lake-with-nvidia-gpu-acceleration-d94d999edaa5">Step 4</a> of this tutorial to see more details about configuring a VantageCloud Lake environment to use the Open Analytics Framework.</li>
</ul>

In [None]:
# Load the values for the Open Analytics Endpoint, the Access Token and the location of the .pem file.
# We also assign the name of our GPU Compute Group.

open_analytics_endpoint = os.getenv("ues_uri")
access_token = os.getenv("access_token")
pem_file = os.getenv("pem_file")
gpu_compute_group = os.getenv("gpu_compute_group")

In [None]:
# Authenticate to UES with a personal access token (PAT) and the matching PEM key file.
configure.ues_url = open_analytics_endpoint
if set_auth_token(ues_url=open_analytics_endpoint, username=username, pat_token=access_token, pem_file=pem_file):
    print("UES Authentication successful")
else:
    # Stop the notebook run with a clear error instead of calling sys.exit() inside Jupyter.
    raise RuntimeError("UES Authentication failed. Check credentials.")

<hr style="height:2px; border:none">
<p style="font-size:20px; font-family:Arial"><b>5. Set Up the User Environment in Teradata VantageCloud Lake for Model Deployment</b></p>
<p style="font-size:16px; font-family:Arial">
Now that <b>UES authentication</b> is complete, we can begin managing user environments using Teradata’s API capabilities.
Start by listing the existing user environments. In this demo, any existing environments are removed so that we start from a clean slate; to select and work within an existing environment instead, use the <code><b>get_env</b></code> API method.
</p>

In [None]:
# Check if I have any existing environments
env_list = list_user_envs()
if env_list is None:
    print("This user does not have any environments. Please continue...")
else:
    ipydisplay(env_list)
    # Iterate over the env_name column and remove
    for env_name in env_list['env_name']:
        print(f"Env Name: {env_name} is being removed!")
        remove_env(env_name)
    print("Finished removing the environments. Please continue...")

<p style="font-size:16px;font-family:Arial">
To create a new user environment, first list the available base Python environments.
</p>

In [None]:
# Listing the available base environments 
print(list_base_envs())

<p style="font-size:16px;font-family:Arial">
Use the <a href ="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Teradata-Package-for-Python-on-VantageCloud-Lake/Working-with-Open-Analytics/APIs-to-Use-with-Open-Analytics-Framework/APIs-to-Manage-User-Environments/create_env"><code>create_env</code></a> API to assign a name, specify the base Python version, and add a description for your new environment. </p>

In [None]:
## Create a new python user environment by specifying a name, a version of python, and a description.
env_name = username.replace("-","_")
demo_env = create_env(env_name,
                      base_env = 'python_3.10',
                      desc = 'BYOLLM demo env')

<p style="font-size:16px;font-family:Arial">
Install the required libraries into your user environment on the GPU Compute Cluster. For most use cases involving Hugging Face, these will be the <b>transformers</b> and <b>torch</b> Python libraries. This could take up to 5 minutes to complete. Please wait until you see the <b>Libraries installed</b> message.
</p>

In [None]:
lib_claim_id = ''
lib_claim_id = demo_env.install_lib(["transformers", "torch"])
print("Libraries installed") 

In [None]:
demo_env.status(lib_claim_id)


<hr style="height:2px;border:none">
<p style="font-size:20px;font-family:Arial"><b>6. Download the Hugging Face Language Model and Upload It to the User Environment</b></p>
<p style="font-size:16px;font-family:Arial">
You can download Hugging Face LLMs in either <a href="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Build-Scalable-Analytics-with-Open-Analytics-Framework/Bring-Your-Own-LLM-and-DL-Workloads/Using-Hugging-Face-LLMs"> native format or streamlined format</a>. Here we use the streamlined format, which saves the model and tokenizer files to a local directory.
</p>

In [None]:
# Download the model and tokenizer, then save them in streamlined format to a local folder
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("tner/roberta-large-ontonotes5")
model = AutoModelForTokenClassification.from_pretrained("tner/roberta-large-ontonotes5")

tokenizer.save_pretrained("./roberta-large-ontonotes5")
model.save_pretrained("./roberta-large-ontonotes5")


In [None]:
# Compress the saved model folder into a zip file.
# If the model folder is not in the current directory, set root_dir to its absolute path.
import shutil
shutil.make_archive('roberta-large-ontonotes5', format='zip', root_dir='roberta-large-ontonotes5')
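
<p style="font-size:16px;font-family:Arial">
The entity recognition script in section 7 loads the model with <code>from_pretrained("./models/roberta-large-ontonotes5")</code>, which expects <code>config.json</code> and the weight files directly inside that folder. Assuming the UES unzips the archive into that folder, a common mistake is zipping the parent directory so that everything ends up one level too deep. The hypothetical helper below sketches a quick local check of the archive layout before uploading.
</p>

```python
import os
import zipfile

ARCHIVE = "roberta-large-ontonotes5.zip"


def layout_issues(names):
    """List problems that would stop from_pretrained() loading the unzipped folder."""
    issues = []
    if "config.json" not in names:
        nested = [n for n in names if n.endswith("/config.json")]
        if nested:
            issues.append(f"config.json is nested at {nested[0]}; zip the folder contents instead")
        else:
            issues.append("config.json is missing")
    if not any(n.endswith((".safetensors", ".bin")) for n in names):
        issues.append("no model weights (*.safetensors or *.bin) found")
    return issues


# Only inspect the archive if it has been created in the current directory.
if os.path.exists(ARCHIVE):
    with zipfile.ZipFile(ARCHIVE) as zf:
        problems = layout_issues(zf.namelist())
    print(problems or "Archive layout looks good.")
```
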


<p style="font-size:16px;font-family:Arial">
After the LLM directories are compressed into zip files, you can use the <a href = "https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Build-Scalable-Analytics-with-Open-Analytics-Framework/Bring-Your-Own-LLM-and-DL-Workloads/Using-User-Environment-APIs-to-Manage-LLMs">UES APIs</a> to install, uninstall, and list LLMs. Here we use the <code>install_model</code> API.</p>



In [None]:
claim_id = demo_env.install_model(model_path='roberta-large-ontonotes5.zip', asynchronous=True)

demo_env.status(claim_id)
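
<p style="font-size:16px;font-family:Arial">
<code>install_model(..., asynchronous=True)</code> returns immediately with a claim ID, so <code>status()</code> may report an in-progress stage the first time you call it. A small polling loop can wait for a terminal stage. This is a sketch: <code>wait_for_stage</code> is hypothetical, and the stage names <code>"Finished"</code> and <code>"Errored"</code> are assumptions about the status output, so check them against what <code>demo_env.status(claim_id)</code> actually returns.
</p>

```python
import time


def wait_for_stage(get_stage, done=("Finished",), failed=("Errored",),
                   timeout=600, interval=10, sleep=time.sleep):
    """Poll get_stage() until it reports a terminal stage or about `timeout` seconds of waiting pass."""
    waited = 0
    while True:
        stage = get_stage()
        if stage in done:
            return stage
        if stage in failed:
            raise RuntimeError(f"Claim failed at stage {stage!r}")
        if waited >= timeout:
            raise TimeoutError(f"Still at stage {stage!r} after {timeout} s")
        sleep(interval)
        waited += interval
```

<p style="font-size:16px;font-family:Arial">
Usage, assuming the status DataFrame has a <code>Stage</code> column whose last row is the latest stage: <code>wait_for_stage(lambda: demo_env.status(claim_id)["Stage"].iloc[-1])</code>. Passing the stage lookup in as a function keeps the loop independent of the exact status format.
</p>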

<hr style="height:2px;border:none">
<p style="font-size:20px;font-family:Arial"><b>7. Create a Python Script and Execute It Using the Apply Class</b></p>

<p style="font-size:16px;font-family:Arial">
Create a python script that reads text records from VantageCloud using standard input, applies a Hugging Face language model to perform Named Entity Recognition (NER), and outputs the extracted entities in a structured, delimited format using standard output.
</p>

In [None]:
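%%writefile entity_recognition.py
#!/usr/bin/env python3
"""Read text rows from stdin, run NER on each, and print '#'-delimited entity columns."""
import sys
import warnings

warnings.simplefilter('ignore')
input_str = sys.stdin.read()

DELIMITER = '#'
# Entity types kept from the OntoNotes 5 label set, in output-column order.
ENTITY_KEYS = ["ORG", "PERSON", "DATE", "PRODUCT", "GPE"]

if len(input_str) > 0:
    from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
    torch_device = 'cuda'  # run inference on the GPU compute group
    model_path = "./models/roberta-large-ontonotes5"  # folder created by install_model()
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForTokenClassification.from_pretrained(model_path)
    # aggregation_strategy='max' merges sub-word tokens into words and groups adjacent words of the same type
    ner = pipeline("token-classification", model=model, tokenizer=tokenizer, device=torch_device, aggregation_strategy='max')

    for line in input_str.splitlines():
        results = ner(line)
        dict_val = {}

        # Group the recognized words by entity type, trimming surrounding whitespace.
        for r in results:
            dict_val.setdefault(r['entity_group'], []).append(r['word'].strip())

        # Append one comma-joined field per entity type, empty if none were found.
        combined_str = ""
        for key in ENTITY_KEYS:
            combined_str += f"{DELIMITER}{','.join(dict_val.get(key, []))}"

        print(f"{line}{combined_str}")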
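
<p style="font-size:16px;font-family:Arial">
Each output line is the original text followed by one comma-joined field per entity type, in the fixed order ORG, PERSON, DATE, PRODUCT, GPE, which is what lets the <code>returns</code> mapping of the Apply call line up with the columns. The standalone sketch below reproduces that grouping and joining logic so you can test it locally without a GPU. The <code>format_row</code> helper and the hand-written sample results are illustrative only; the sample uses the <code>entity_group</code> and <code>word</code> keys that the token-classification pipeline returns when an aggregation strategy is set.
</p>

```python
DELIMITER = "#"
ENTITY_KEYS = ["ORG", "PERSON", "DATE", "PRODUCT", "GPE"]


def format_row(line, results, keys=ENTITY_KEYS, delimiter=DELIMITER):
    """Group pipeline results by entity type and append one field per type to the line."""
    grouped = {}
    for r in results:
        grouped.setdefault(r["entity_group"], []).append(r["word"].strip())
    fields = [",".join(grouped.get(key, [])) for key in keys]
    return delimiter.join([line] + fields)


# Hand-written results in the pipeline's aggregated shape (scores and offsets omitted).
sample_results = [
    {"entity_group": "PERSON", "word": " Jane Doe"},
    {"entity_group": "ORG", "word": " Fidelity"},
    {"entity_group": "DATE", "word": " March 3"},
]
print(format_row("Jane Doe called Fidelity on March 3", sample_results))
# prints: Jane Doe called Fidelity on March 3#Fidelity#Jane Doe#March 3##
```
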


<p style="font-size:16px;font-family:Arial">
Install the Python script file in the user environment using <code>install_file()</code>.
</p>

In [None]:
demo_env.install_file(file_path ="entity_recognition.py", replace=True)


<p style="font-size:16px;font-family:Arial">
Set the session's compute group to the GPU Analytic Compute Group you want to use; otherwise, the session uses the default compute group. The compute cluster must be running to execute the Apply class.</p>

In [None]:
execute_sql(f"SET SESSION COMPUTE GROUP {gpu_compute_group};")
print(f"Compute group set to {gpu_compute_group}")

<p style="font-size:16px;font-family:Arial">
The Apply class executes your Python script directly within the user environment, enabling in-database processing at scale. It reads text from each row of the dataset, performs Named Entity Recognition (NER) with the Hugging Face model accelerated by NVIDIA GPUs, and returns the extracted entities as a structured teradataml DataFrame. The cell below prints the elapsed time so you can see how long this takes on your GPU compute group. </p>  

<p style="font-size:16px;font-family:Arial"> In this example, we use <code>tner/roberta-large-ontonotes5</code>, a general-purpose NER model trained on the OntoNotes 5 dataset. OntoNotes 5 defines 18 entity types, and the script above keeps five of them: ORG, PERSON, DATE, PRODUCT, and GPE. Note that the “PRODUCT” entity in OntoNotes refers primarily to consumer and commercial products (e.g., iPhone, Windows OS), not financial instruments (e.g., Roth IRA, mutual funds). To improve accuracy in the financial domain, the model can be further fine-tuned on domain-specific data so that it better recognizes investment products, insurance types, and retirement accounts.
 </p> 

In [None]:
apply_obj = Apply(data = call_summary_dataset,
                  apply_command = 'python entity_recognition.py',
                  returns = {"Call_Summary": VARCHAR(64000), "ORG": VARCHAR(64000), "PERSON": VARCHAR(64000), "DATES": VARCHAR(64000), "PRODUCT": VARCHAR(64000), "GPE": VARCHAR(64000)},
                  env_name = demo_env,
                  delimiter = '#',
                  quotechar = '|'
                 )

import time

start = time.time()

# Execute the Python script inside the remote user environment.
df = apply_obj.execute_script()

print(f'Time: {time.time() - start}')

df
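
<p style="font-size:16px;font-family:Arial">
The Apply call above uses <code>delimiter='#'</code> and <code>quotechar='|'</code>. Assuming these follow the usual CSV conventions, a field that itself contains <code>#</code> must be wrapped in <code>|</code> to stay in one column; the script prints each line as-is, so a summary containing <code>#</code> would shift the remaining columns. The sketch below uses Python's <code>csv</code> module to show the difference. <code>write_row</code> and <code>parse_row</code> are illustrative helpers, not teradataml APIs.
</p>

```python
import csv
import io

COLUMNS = ["Call_Summary", "ORG", "PERSON", "DATES", "PRODUCT", "GPE"]


def write_row(values, delimiter="#", quotechar="|"):
    """Serialize one row, quoting any field that contains the delimiter."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, quotechar=quotechar,
               lineterminator="\n").writerow(values)
    return buf.getvalue()


def parse_row(text, delimiter="#", quotechar="|"):
    """Parse one delimited line into a dict keyed by the Apply return columns."""
    row = next(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar))
    return dict(zip(COLUMNS, row))


line = write_row(["Client asked about fund #2", "Vanguard", "", "", "", ""])
print(line.strip())                 # |Client asked about fund #2|#Vanguard####
print(len(line.strip().split("#"))) # 7: a naive split breaks the quoted first field
print(parse_row(line)["ORG"])       # Vanguard
```
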


<p style="font-size:16px;font-family:Arial">
  You can explore additional Natural Language Processing (NLP) tasks directly in-database using task-specific models:
</p>

<ul style="font-size:15px;font-family:Arial">
  <li><strong>Text Classification</strong> — <a href="https://huggingface.co/facebook/bart-large-mnli" target="_blank">facebook/bart-large-mnli</a>: Classifies text into predefined categories</li>
  <li><strong>Language Detection</strong> — <a href="https://huggingface.co/papluca/xlm-roberta-base-language-detection" target="_blank">papluca/xlm-roberta-base-language-detection</a>: Detects the language</li>
  <li><strong>Generating Embeddings</strong> — <a href="https://huggingface.co/sentence-transformers/all-mpnet-base-v2" target="_blank">sentence-transformers/all-mpnet-base-v2</a>: Converts text into vector representations for similarity searches</li>
  <li><strong>Named Entity Recognition</strong> — <a href="https://huggingface.co/tner/roberta-large-ontonotes5" target="_blank">tner/roberta-large-ontonotes5</a>: Identifies and categorizes named entities within unstructured text</li>
  <li><strong>Extracting Key Phrases</strong> — <a href="https://huggingface.co/ml6team/keyphrase-extraction-kbir-kpcrowd" target="_blank">ml6team/keyphrase-extraction-kbir-kpcrowd</a>: Extracts important phrases from a document to summarize its content</li>
  <li><strong>Grammar Correction</strong> — <a href="https://huggingface.co/pszemraj/flan-t5-large-grammar-synthesis" target="_blank">pszemraj/flan-t5-large-grammar-synthesis</a>: Automatically corrects grammatical errors in text</li>
  <li><strong>Masking PII Entities</strong> — <a href="https://huggingface.co/ab-ai/pii_model" target="_blank">ab-ai/pii_model</a>: Masks personally identifiable information (PII)</li>
  <li><strong>Sentiment Analysis</strong> — <a href="https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english" target="_blank">distilbert-base-uncased-finetuned-sst-2-english</a>: Determines the emotional tone (positive or negative)</li>
  <li><strong>Sentence Similarity</strong> — <a href="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2" target="_blank">sentence-transformers/all-MiniLM-L6-v2</a>: Measures semantic similarity between sentences</li>
  <li><strong>Summarization</strong> — <a href="https://huggingface.co/facebook/bart-large-cnn" target="_blank">facebook/bart-large-cnn</a>: Generates concise summaries of longer documents</li>
  <li><strong>Translation</strong> — <a href="https://huggingface.co/Helsinki-NLP/opus-mt-en-fr" target="_blank">Helsinki-NLP/opus-mt-en-fr</a>: Translates English text to French</li>
</ul>



<hr style='height:2px;border:none;'>
<b style="font-size:20px;font-family:Arial">8. Automate Entity Recognition with <code>teradatagenai</code> Python Package</b>
<p style="font-size:16px;font-family:Arial">
  To simplify and accelerate setup, you can now use the <code>teradatagenai</code> package to automate model deployment and inference. This Python library enables seamless, in-database, AI-driven text analytics within Teradata VantageCloud. It offers a variety of user-friendly functions that wrap API calls for common text analysis tasks, making it easy to apply LLMs to proprietary unstructured data.
    
</p>

<p style="font-size:16px;font-family:Arial">
  Built on Teradata VantageCloud’s open and connected architecture, this solution uses the BYO-LLM capability and enables teams to rapidly develop generative AI use cases that enhance customer experiences, improve employee productivity, and streamline operations, all within a secure, scalable environment.
</p>

<p style="font-size:16px;font-family:Arial"> <code>TextAnalyticsAI</code> gives access to the following generative AI functions, which run advanced text analytics directly on data stored in Vantage:
<ul style="font-size:16px; font-family:Arial; margin-left:20px; line-height:1.8;">
  <li><code>classify()</code> – Classify text into predefined categories</li>
  <li><code>analyze_sentiment()</code> – Perform sentiment analysis</li>
  <li><code>detect_language()</code> – Detect the language of a text</li>
  <li><code>embeddings()</code> – Generate embeddings for similarity search</li>
  <li><code>recognize_entities()</code> – Extract named entities</li>
   <li><code>recognize_pii_entities()</code> – Detect and label PII entities</li>
  <li><code>extract_key_phrases()</code> – Identify key phrases in text</li>
  <li><code>mask_pii()</code> – Mask personally identifiable information (PII)</li>
  <li><code>sentence_similarity()</code> – Measure semantic similarity between sentences</li>
  <li><code>summarize()</code> – Generate summaries of longer documents</li>
  <li><code>translate()</code> – Translate text between languages</li>
</ul>

</p>


In [None]:
#install the teradatagenai package
!pip install teradatagenai --quiet

In [None]:
#import libraries and modules
import teradatagenai
from teradatagenai import TeradataAI, TextAnalyticsAI

<p style="font-size:16px;font-family:Arial">  
TeradataAI handles downloading the Hugging Face model (for example, <i>tner/roberta-large-ontonotes5</i>) and installing it in the user's environment. If no environment is specified, a sample environment named <i>td_gen_ai_env</i> is created with the <code>torch</code> and <code>transformers</code> libraries and their dependencies. The TeradataAI class manages the entire setup process, using Teradata’s Bring Your Own LLM (BYO-LLM) capability in the background.</p>
<p style="font-size:16px;font-family:Arial">    For the next example, we delete the environment we just created and create a new user environment to show how to install the same <code>tner/roberta-large-ontonotes5</code> model with the <code>teradatagenai</code> package. </p>

In [None]:
# Check if I have any existing environments
env_list = list_user_envs()
claim_id = ''
if env_list is None:
    print("This user does not have any environments. Please continue...")
else:
    ipydisplay(env_list)
    # Iterate over the env_name column and remove
    for env_name in env_list['env_name']:
        print(f"Env Name: {env_name} is being removed!")
        claim_id = remove_env(env_name, asynchronous=True)
    print("Finished removing the environment.  Please continue...")

In [None]:
#Verify that the environment has been deleted
if claim_id != '':
    async_run_status(claim_id)
else:
    print("No status to display")

In [None]:
## Create a new user environment from an available base environment
demo_env = create_env(env_name, base_env = 'python_3.10', desc = 'BYOLLM demo env')

In [None]:
# Validating the environments
env_list = list_user_envs()
print("Available Environments:")
ipydisplay(env_list)

In [None]:
#Define your model and initialize the TeradataAI Class
model_name = 'tner/roberta-large-ontonotes5'
model_args = {'transformer_class': 'AutoModelForTokenClassification',
              'task' : 'token-classification'}
ues_args = {'env_name': f'{env_name}'}

llm = TeradataAI(api_type = "hugging_face",
                 model_name = model_name,
                 model_args = model_args,
                 ues_args = ues_args )


<p style="font-size:16px;font-family:Arial">
    <br>
    Even though the Python kernel shows as <b>Idle</b>, wait until the installation status above shows that it has completed before you continue. This typically takes about 2 minutes, depending on your network.<br>
    <br> Because we recreated the user environment, we need to install the Python dependencies again.</p>



In [None]:
lib_claim_id = ''
lib_claim_id = demo_env.install_lib(["transformers", "torch"])
print("Installing Libraries") 

In [None]:
demo_env.status(lib_claim_id)

<p style="font-size:16px;font-family:Arial"> Next, we configure the <code>TextAnalyticsAI</code> object with our preferred large language model using the <code>TeradataAI</code> instance. This setup allows us to run a variety of text analytics tasks inside VantageCloud.
When the object is created, you’ll notice that two default scripts—<code>td_sample_inference_script.py</code> and <code>td_sample_embeddings_script.py</code>—are automatically uploaded to the remote user environment. These are used by default for inference and embeddings tasks. 

If you want to use your own script, you can upload it and reference it using the <code>script</code> parameter. To use the default script, simply omit the <code>script</code> argument.
</p>


In [None]:
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Use the custom entity_recognition.py script instead of the default inference script
obj.recognize_entities(column='Call_Summary', data=call_summary_dataset, script="entity_recognition.py", returns = {"txt": VARCHAR(64000),
                                                  "ORG": VARCHAR(64000),
                                                  "PERSON": VARCHAR(64000),
                                                  "DATES": VARCHAR(64000),
                                                  "PRODUCT": VARCHAR(64000),
                                                  "GPE": VARCHAR(64000)}, delimiter="#")



<hr style="height:2px; border:none">

<b style="font-size:20px; font-family:Arial;">
  9. Call Hosted LLMs From In-Database (AWS Bedrock, Google Gemini, Azure AI Models)
</b>

<p style="font-size:16px; font-family:Arial; line-height:1.6;">
  Teradata lets you run text analysis using large language models (LLMs) hosted on cloud platforms such as <strong>AWS</strong>, <strong>Google</strong>, and <strong>Azure</strong>, while working directly with data stored in <strong>Vantage</strong>. 
</p>

<p style="font-size:16px; font-family:Arial; line-height:1.6;">
  These built-in functions support a wide range of NLP tasks on unstructured text:
</p>

<ul style="font-size:16px; font-family:Arial; margin-left:20px; line-height:1.8;">
  <li><code>AI_AnalyzeSentiment</code></li>
  <li><code>AI_AskLLM</code></li>
  <li><code>AI_DetectLanguage</code></li>
  <li><code>AI_MaskPII</code></li>
  <li><code>AI_RecognizeEntities</code></li>
  <li><code>AI_RecognizePIIEntities</code></li>
  <li><code>AI_TextClassifier</code></li>
  <li><code>AI_TextEmbeddings</code></li>
  <li><code>AI_TextSummarize</code></li>
</ul>

<p style="font-size:16px; font-family:Arial; line-height:1.6;">
  In the following example, we demonstrate how to use <code>AI_RecognizeEntities</code> for in-database entity recognition with Anthropic’s Claude v2 model (<code>anthropic.claude-v2</code>) on Amazon Bedrock.
</p>


In [None]:
# Securely prompt for AWS credentials
region = getpass.getpass("Region: ")
access_key = getpass.getpass("Enter AWS Access Key: ")
secret_key = getpass.getpass("Enter AWS Secret Key: ")
session_token = getpass.getpass("Enter AWS Session Token: ")

In [None]:
# Build the AI_RecognizeEntities query, passing in the AWS credentials entered above
query = f"""
SELECT * FROM AI_RecognizeEntities( 
  ON DEMO_EntityRecognition.Financial_CallCenter_Summary AS InputTable
  USING 
    TextColumn('Call_Summary')
    ApiType('aws')
    REGION('{region}')
    ACCESSKEY('{access_key}')
    SECRETKEY('{secret_key}')
    SESSIONKEY('{session_token}')
    ModelName('anthropic.claude-v2')
    isDebug('true')
    Accumulate('[0:]')
) AS dt;
"""
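
<p style="font-size:16px;font-family:Arial">
The query above interpolates the credentials directly into SQL string literals, so a value containing a single quote would break the statement. A minimal sketch of a safer builder follows; it doubles embedded single quotes, which is standard SQL escaping for string literals. The function names are hypothetical, the table name is assumed to come from trusted code rather than user input, and the clauses simply mirror the query above.
</p>

```python
def sql_literal(value):
    """Quote a value as a SQL string literal, doubling any embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_recognize_entities_query(table, text_column, model_name,
                                   region, access_key, secret_key, session_token):
    """Build an AI_RecognizeEntities call for AWS Bedrock (table name must be trusted)."""
    return f"""
SELECT * FROM AI_RecognizeEntities(
  ON {table} AS InputTable
  USING
    TextColumn({sql_literal(text_column)})
    ApiType('aws')
    REGION({sql_literal(region)})
    ACCESSKEY({sql_literal(access_key)})
    SECRETKEY({sql_literal(secret_key)})
    SESSIONKEY({sql_literal(session_token)})
    ModelName({sql_literal(model_name)})
    isDebug('true')
    Accumulate('[0:]')
) AS dt;
"""
```

<p style="font-size:16px;font-family:Arial">
Usage: <code>query = build_recognize_entities_query("DEMO_EntityRecognition.Financial_CallCenter_Summary", "Call_Summary", "anthropic.claude-v2", region, access_key, secret_key, session_token)</code>.
</p>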

<p style="font-size:16px;font-family:Arial">
While open-source Hugging Face models provide a powerful, customizable foundation for entity recognition, they may require additional domain-specific fine-tuning to capture specialized financial terms or product names accurately. In contrast, <b>hosted LLMs</b> such as Anthropic’s Claude or other models available through <b>AWS Bedrock, Google Vertex AI, or Azure OpenAI</b> are trained broadly by their providers, which often gives them deeper contextual understanding out of the box. In this example, you should see that the Claude model on AWS Bedrock identifies a richer set of financial products without any additional training.
</p>


In [None]:
query = DataFrame.from_query(query)
query
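
<p style="font-size:16px;font-family:Arial">
To check how the two approaches differ on a given call summary, you can compare the entity lists they return. The sketch below assumes each model's entities for a row have been collected into a comma-separated string, as the Hugging Face script above produces; how you extract that string from the <code>AI_RecognizeEntities</code> output depends on its result columns, which are not shown here. The helper names are hypothetical, and splitting on commas will break names that themselves contain commas.
</p>

```python
def split_entities(field):
    """Turn a comma-separated entity field into a set of lower-cased names."""
    if not field:
        return set()
    return {part.strip().lower() for part in field.split(",") if part.strip()}


def compare_entities(field_a, field_b):
    """Report entities found only by model A, only by model B, and by both."""
    a, b = split_entities(field_a), split_entities(field_b)
    return {"only_a": sorted(a - b), "only_b": sorted(b - a), "both": sorted(a & b)}


print(compare_entities("Vanguard", "vanguard, Roth IRA"))
# prints: {'only_a': [], 'only_b': ['roth ira'], 'both': ['vanguard']}
```
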


<p style="font-size:16px;font-family:Arial">
 This is why <b>Teradata’s support for both open-source and hosted models</b> is great for developers and enterprises: developers need the flexibility to choose the right model for each use case based on the <b>business goals, data privacy requirements, cost considerations,</b> and <b>infrastructure preferences</b>. Whether you're performing domain-specific NLP with tightly controlled data using <b>BYO-LLM</b> or leveraging the latest generative AI via <b>fast-path cloud functions</b>, Teradata enables you to do both—<b>securely, efficiently, and at scale</b>. The real differentiator is not just the model, but the ability to <b>operationalize AI seamlessly where your data lives</b> and apply it to solve meaningful business problems with measurable impact.
</p>


<hr style='height:2px;border:none'>
<b style = 'font-size:20px;font-family:Arial'>10. Cleanup</b>
<p style = 'font-size:18px;font-family:Arial'><b>User Environment and Connection</b></p>
<p style = 'font-size:16px;font-family:Arial'>Remove the user environment and close the connection so that the next run of this notebook starts from a clean state.</p>

In [None]:
#Remove the existing user environment 
remove_env(env_name)

In [None]:
remove_context() 

In [None]:
help(TeradataAI)

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>