<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
      Simplify Text Analytics with Teradata Python package for Generative AI
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style="font-size:20px;font-family:Arial"><b>Introduction:</b></p>


<p style="font-size:16px; font-family:Arial">
  The <code>teradatagenai</code> Python library enables data scientists, analysts, and developers to run analytics on their unstructured data directly within Teradata VantageCloud. It has built-in support for open-source Hugging Face models through Teradata's Bring Your Own Large Language Model (BYOLLM) capability, and for models hosted by cloud service providers (AWS, Azure, and GCP) through the In-DB TextAnalytics AI functions.
</p>

<div style="text-align:center">
  <img src="./images/teradatagenai.png" width="1000" alt="teradatagenai Diagram" style="border:4px solid #404040; border-radius: 10px;">
</div>

<p style="font-size:20px;font-family:Arial;margin-top:10px"><b>Business Value:</b></p>

<p style="font-size:16px; font-family:Arial">
 Organizations handle massive volumes of unstructured text, including emails, voice call transcripts, customer reviews, contracts, and more. Traditional approaches to analyzing this data often involve costly data transfers, custom ML pipelines, and long turnaround times. <code>teradatagenai</code> addresses these challenges by bringing domain-specific language models and hosted LLMs closer to your data.
</p>
<p style="font-size:16px; font-family:Arial">
  With built-in support for GPU acceleration and seamless integration with VantageCloud, the library offers simple function calls that abstract complex APIs, enabling secure, scalable, and performant text processing. Whether you're deploying open-source models in-database or calling hosted LLMs through services like Amazon Bedrock, <code>teradatagenai</code> provides the flexibility to align with your organization's security, cost, and performance needs.
</p>

<p style="font-size:16px; font-family:Arial">
  The <code>TextAnalyticsAI</code> module within the library provides over 11 built-in generative AI functions for powerful in-database NLP capabilities:
</p>

<ul style="font-size:16px; font-family:Arial; margin-left:20px; line-height:1.8;">
  <li><code>classify()</code> – Classify text into predefined categories</li>
  <li><code>analyze_sentiment()</code> – Perform sentiment analysis</li>
  <li><code>detect_language()</code> – Detect the language of a text</li>
  <li><code>embeddings()</code> – Generate embeddings for similarity search</li>
  <li><code>recognize_entities()</code> – Extract named entities</li>
   <li><code>recognize_pii_entities()</code> – Detect and label PII entities</li>
  <li><code>extract_key_phrases()</code> – Identify key phrases in text</li>
  <li><code>mask_pii()</code> – Mask personally identifiable information (PII)</li>
  <li><code>sentence_similarity()</code> – Measure semantic similarity between sentences</li>
  <li><code>summarize()</code> – Generate summaries of longer documents</li>
  <li><code>translate()</code> – Translate text between languages</li>
</ul>
</p>

In [None]:
%%capture
!pip install -r requirements.txt --quiet

<hr style='height:2px;border:none'>
<p style="font-size:20px;font-family:Arial"><b>1. Configure the environment</b></p>
<p style="font-size:16px; font-family:Arial">
Before we start working with our data, we need to set up our environment. This involves importing the necessary packages and establishing a connection to Vantage.
<br>
Here's how we can do this: </p>

In [None]:
import os
import sys
from getpass import getpass
from dotenv import load_dotenv

from teradataml import *
from teradatagenai import TextAnalyticsAI, TeradataAI, load_data
from IPython.display import display as ipydisplay

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>2. Connect to VantageCloud Lake</b></p>
<p style = 'font-size:16px;font-family:Arial'>Connect to VantageCloud using <code>create_context</code> from the teradataml Python library. Input your connection details, including the host, username, password, and Analytic Compute Group name.</p>

In [None]:
print("Checking if this environment is ready to connect to VantageCloud Lake...")

if os.path.exists("../.config/.env"):
    print("Your environment parameter file exists. Please proceed with this use case.")
    load_dotenv("../.config/.env", override=True)
    host = os.getenv("host")
    username = os.getenv("username")
    my_variable = os.getenv("my_variable")
    eng = create_context(host=host, username=username, password=my_variable)
    execute_sql('''SET query_band='DEMO=text_analytics_teradatagenai_aws_huggingface.ipynb;' UPDATE FOR SESSION;''')
    print("Connected to VantageCloud Lake with:", eng)
else:
    print("Your environment has not been prepared for connecting to VantageCloud Lake.")
    print("Please contact the support team.")


<hr style="height:2px; border:none">

<p style="font-size:20px; font-family:Arial"><b>3. Load the data</b></p>


<p style="font-size:16px; font-family:Arial">
We will load sample unstructured data that represents real-world advisor-client conversations. We will use this sample data to explore various NLP tasks, including PII masking, entity recognition, summarization, sentiment analysis, language detection, and more.
</p>
<p style="font-size:16px; font-family:Arial">
<b>💼 Use Case Summary:</b><br>
In the wealth management industry, financial advisors hold many client meetings each week — discussing portfolios, insurance, loans, and retirement planning. Manually summarizing and tagging these interactions for compliance, CRM updates, or follow-up actions is time-consuming and often inconsistent.
</p>
<p style="font-size:16px; font-family:Arial">
With Teradata’s teradatagenai library, you can apply generative AI functions directly inside VantageCloud using either hosted LLMs (e.g., via Amazon Bedrock) or Open Source Hugging Face models—keeping the data in place for secure, scalable inference.
<br>
This enables financial firms to:
<br>   
<ul>
    <li>📌 Automate meeting note tagging for faster documentation and regulatory compliance </li>
    <li> 📈 Enhance client profiling by identifying frequently discussed financial topics </li>
    <li> 🔄 Streamline CRM updates by structuring insights from unstructured text </li>
</ul>
</p>



In [None]:
import pandas as pd
from teradataml import DataFrame, copy_to_sql

# Define the dataset with embedded fake PII
customer_profile = {
    "customer_id": [101, 102, 103, 104, 105, 106],
    "first_name": ["John", "Alex", "Maria", "Sarah", "Michael", "Claire"],
    "last_name": ["Doe", "Kim", "Lopez", "Lee", "Choi", "Dupont"],
    "ssn": ["123-45-6789", "321-54-9876", "231-67-9843", "456-89-1230", "876-32-1901", "654-78-3201"],
    "address": [
        "101 Main St, Springfield, IL",
        "102 Oak Ave, Denver, CO",
        "103 Pine Rd, Miami, FL",
        "104 Elm St, Seattle, WA",
        "105 Maple Ln, Austin, TX",
        "106 Birch Blvd, Boston, MA"
    ],
    "email": [
        "john.doe@email.com",
        "alex.kim@email.com",
        "maria.lopez@email.com",
        "sarah.lee@email.com",
        "michael.choi@email.com",
        "claire.dupont@email.com"
    ]
}

# Convert to DataFrame
df = pd.DataFrame(customer_profile)

# Insert into Teradata (replace if exists)
copy_to_sql(df=df, table_name="customer_profile", if_exists="replace")

# Create a teradataml DataFrame to validate
customer_profile = DataFrame("customer_profile")
customer_profile


In [None]:
advisor_customer_interactions = {
    "customer_id": [101, 102, 103, 104, 105, 106],
    "advisor_notes": [
    "John Doe tried to execute a $5,000 ETF purchase of the Fidelity Total Market Index (FTMX) on April 10, but the trade failed due to a pending funding transfer. I explained the delay was caused by a hold on ACH funds. He was confused and I promised to follow up with trade ops. A reattempt was placed for April 11.",
    "Alex Kim reported a $350 overdraft on his Fidelity Cash Management Account. The issue occurred due to a delayed deposit from his payroll on March 1. I submitted a one-time refund and showed him how to enable low-balance alerts. We also discussed switching his linked credit card (Acct #: 4321-8765-0987-6543) to a no-fee card. He appreciated the quick resolution.",
    "I had a long session with Maria Lopez focused on life insurance and estate planning. She's worried about her term policy expiring in 3 years. We reviewed conversion options, whole life, and trust establishment. Maria also updated beneficiaries for her Roth IRA and brokerage. I’ll send FAQs and a summary for review.",
    "Sarah Lee applied for a Fidelity credit card but her application was declined for the second time due to a flagged credit report item. She shared frustration about not being informed sooner. I explained the credit criteria and recommended she review her TransUnion report. I also offered to help reapply after 60 days and explore secured credit card options.",
    "Michael Choi emailed about errors in his student loan refinance . The APR in his dashboard didn’t match his signed agreement. After investigation, I found a backend error had applied a variable rate instead of the fixed 4.25% APR he agreed to. Michael expressed frustration and referenced SoFi as a potential alternative. I apologized and escalated the issue to Loan Ops for immediate correction and promised to track progress. We also discussed auto loan refinancing—Michael currently has a 6.75% APR. I shared that Fidelity’s partner rates start at 4.99% for 36-month terms and 5.49% for 60-month terms, depending on credit. He asked for a link to compare offers and I agreed to follow up with resources.",
    "Claire Dupont reported her Fidelity Rewards card wasn't providing 2% cashback on travel. I reviewed and saw transactions miscategorized. I submitted a correction and a cashback request. Claire was frustrated and asked about alternative cards. We discussed switching to a travel partner card."
],
    "customer_question": [
    "Can you clarify why the ACH hold occurred on account #123456789 and how to prevent this in the future?",
    "Why did my account (Acct ID: CMA-7789) show a $350 overdraft on March 1, and how can I prevent this going forward?",
    "Which no-fee credit card did you recommend again for SSN 321-54-9876?",
    "When should I consider converting to whole life for policyholder Sarah Lee (SSN: 456-89-1230)?",
    "What were the auto refinance rates you mentioned for Loan ID SL-99881?",
    "Given my conservative strategy and current market volatility, is now a good time to convert my traditional IRA into a Roth?"
],
    "customer_feedback": [
        "I appreciate the follow-up, but I'm still confused about the ETF delay.",
        "The advisor was very helpful and walked me through projections clearly.",
        "Unhappy with the high fees—need a better explanation and options.",
        "Disappointed with how long it took to get an answer about my card.",
        "Grateful for how fast the refund was processed. Thank you",
        "Good discussion on IRA conversion—still thinking it over."
    ]
}

# Convert to DataFrame
df = pd.DataFrame(advisor_customer_interactions)

# Insert into Teradata (replace if exists)
copy_to_sql(df=df, table_name="advisor_customer_interactions", if_exists="replace")

# Create a teradataml DataFrame to validate
advisor_customer_interactions = DataFrame("advisor_customer_interactions")
advisor_customer_interactions 
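
<p style="font-size:16px; font-family:Arial">
Both tables are keyed on <code>customer_id</code>, and later steps pair questions with advisor notes by that column. As a minimal sketch, a check like the one below could confirm that every interaction refers to a known customer before the data is uploaded. The helper name <code>unmatched_ids</code> is hypothetical and not part of teradataml; the ID lists are copied from the two dictionaries above.
</p>

```python
# IDs copied from the two sample dictionaries defined above.
profile_ids = [101, 102, 103, 104, 105, 106]
interaction_ids = [101, 102, 103, 104, 105, 106]


def unmatched_ids(child_ids, parent_ids):
    """Return the IDs in child_ids that have no match in parent_ids, sorted."""
    return sorted(set(child_ids) - set(parent_ids))


# An empty list means every interaction row has a matching customer profile.
print(unmatched_ids(interaction_ids, profile_ids))
```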


<hr style="height:2px; border:none">
<b style='font-size:20px;font-family:Arial'>4. Setting up TeradataAI to access an Amazon Bedrock Model</b>
<p style="font-size:16px; font-family:Arial">
This section describes how to instantiate the <code>TeradataAI</code> class to set up the environment and initialize the LLM endpoint.
<br>
Users can provide the required authorization information in four different ways:
<ol>
    <li>Explicitly pass the authorization information to each argument of the function</li>
    <li>Set the environment variables related to the authorization arguments</li>
    <li>Supply the authorization information via a configuration file</li>
    <li>Pass an existing database authorization object containing the credentials using the authorization parameter</li>
</ol>
</p>

In [None]:
# Set the environment variables related to the authorization arguments.
os.environ['AWS_ACCESS_KEY_ID'] = getpass(prompt='Enter AWS_ACCESS_KEY_ID: ')
os.environ['AWS_SECRET_ACCESS_KEY'] = getpass(prompt='Enter AWS_SECRET_ACCESS_KEY: ')
os.environ['AWS_DEFAULT_REGION'] = getpass(prompt='Enter AWS_DEFAULT_REGION: ')
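
<p style="font-size:16px; font-family:Arial">
Because the values are entered through hidden prompts, an empty or skipped entry is easy to miss. The sketch below shows one way to confirm that the three variables are populated without printing any secret. The helper name <code>describe_aws_env</code> is hypothetical and not part of <code>teradatagenai</code>; in the notebook you would pass <code>os.environ</code> instead of the example dictionary.
</p>

```python
def describe_aws_env(env):
    """Map each expected AWS variable to 'set' or 'missing' without exposing values."""
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")
    return {name: ("set" if env.get(name) else "missing") for name in names}


# Illustrative values only; in the notebook you would pass os.environ.
example_env = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_DEFAULT_REGION": "us-west-2"}
for name, status in describe_aws_env(example_env).items():
    print(f"{name}: {status}")
```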

In [None]:
# Instantiate the TeradataAI class with the Amazon Bedrock model.
llm_aws = TeradataAI(api_type="aws",
                     model_name="anthropic.claude-v2"
                     )

<hr style='height:2px;border:none'>
<p style='font-size:20px;font-family:Arial'><b>5. Setting up TextAnalyticsAI to Perform Various Text Analytics Tasks</b></p>
<p style='font-size:16px;font-family:Arial'>This section describes how to instantiate the <code>TextAnalyticsAI</code> class to access a variety of text analytics methods.
</p>
### **Key Notes:**

- **General Method Arguments:**
    - **`column`**:  
        Specifies the name of the column to be used.  
        - Type: `str`  

    - **`data`**:  
        Specifies the `teradataml.DataFrame` that includes the column specified by the `column` argument.  
        - Type: `teradataml.DataFrame`  

- **Optional Parameters:**
    - **`persist`**:  
        Specifies whether to persist the output in permanent tables.  
        - Type: `bool`  
        - Default: `False`  

    - **`accumulate`**:  
        Specifies the name(s) of input `teradataml.DataFrame` column(s) to copy to the output. By default, all input columns are copied to the output.  
        - Type: `str` or `list of str`  

    - **`volatile`**:  
        Specifies whether to store the results in a volatile table.  
        - Type: `bool`  
        - Default: `False`  

- **Additional Arguments (`**kwargs`)**:  
    Methods accept additional arguments that can be passed as part of `**kwargs`. For more details, refer to the user guide.
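
The argument types listed above can be restated as a small check. This is only a sketch of the documented types: `check_common_args` is a hypothetical helper, not part of `teradatagenai`, and the library performs its own validation. The `data` argument is left out because checking it would require a live `teradataml.DataFrame`.

```python
def check_common_args(column, accumulate=None, persist=False, volatile=False):
    """Raise TypeError if an argument does not match the types listed above."""
    if not isinstance(column, str):
        raise TypeError("column must be a str")
    if accumulate is not None:
        # accumulate may be a single column name or a list of column names.
        names = [accumulate] if isinstance(accumulate, str) else accumulate
        if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
            raise TypeError("accumulate must be a str or a list of str")
    for flag_name, flag in (("persist", persist), ("volatile", volatile)):
        if not isinstance(flag, bool):
            raise TypeError(f"{flag_name} must be a bool")
    return True


print(check_common_args("advisor_notes", accumulate=["customer_id"], volatile=True))
```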


In [None]:
# Instantiate the TextAnalyticsAI class with the Amazon Bedrock model.
obj = TextAnalyticsAI(llm=llm_aws)

<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>6. Sentiment Analysis</b></p>
<p style = 'font-size:16px;font-family:Arial'>
In this section, we'll explore the <code>analyze_sentiment()</code> function provided by TextAnalyticsAI, which classifies text as positive, neutral, or negative. Sentiment analysis is:
<ul>
    <li>A powerful way to understand the voice of the customer</li>
    <li>A complex task because of nuanced interpretation, sarcasm, and irony. However, teradatagenai and LLMs provide a quick and powerful way to decode customer sentiment</li>
</ul></p>
<p style = 'font-size:16px;font-family:Arial'>
Using teradatagenai with a hosted LLM service like Amazon Bedrock is faster than traditional machine learning because there is no need to train models or perform related activities such as data labeling, feature engineering, and model tuning. This makes it easier for teams to get immediate insights.
</p>

In [None]:
# Analyze the sentiment of the customer feedback.
obj.analyze_sentiment(column="customer_feedback", data=advisor_customer_interactions, accumulate="customer_feedback")

<div style="overflow: auto;">
  <img src="./images/sentiment_analysis.png" alt="Sentiment Analysis Chart" style="float: right; margin-left: 20px; width: 550px; border:4px solid #404040; border-radius: 10px; margin-top: 20px;">

  <p style="font-size:16px; font-family:Arial;padding-top:20px;">
    Beyond basic classification, sentiment analysis can be extended to support deeper business use cases—for example, tracking product complaints over time by monitoring the frequency and trend of negative sentiment. This helps organizations identify areas for improvement, prioritize feature updates, and enhance customer satisfaction.
  </p>
</div>
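
<p style="font-size:16px; font-family:Arial">
The trend tracking described above can be sketched as a small aggregation over labelled feedback. The input here is hypothetical: the sample table has no date column, so the <code>(month, sentiment)</code> pairs stand in for sentiment labels joined with a date from another source. The function name <code>negative_share_by_month</code> is also an assumption made for illustration.
</p>

```python
from collections import Counter, defaultdict

# Hypothetical (month, sentiment) pairs standing in for sentiment labels
# joined with a date column; the sample table itself has no dates.
labelled_feedback = [
    ("2025-01", "negative"), ("2025-01", "positive"),
    ("2025-01", "positive"), ("2025-01", "neutral"),
    ("2025-02", "negative"), ("2025-02", "negative"),
    ("2025-02", "neutral"), ("2025-02", "positive"),
]


def negative_share_by_month(rows):
    """Return {month: fraction of feedback labelled negative}, ordered by month."""
    counts = defaultdict(Counter)
    for month, sentiment in rows:
        counts[month][sentiment.lower()] += 1
    return {
        month: tally["negative"] / sum(tally.values())
        for month, tally in sorted(counts.items())
    }


print(negative_share_by_month(labelled_feedback))
```

<p style="font-size:16px; font-family:Arial">
A rising share from one month to the next is the kind of signal that could prompt a closer look at the underlying complaints.
</p>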

<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>7. Masking Personally Identifiable Information (PII) Entities</b></p>
<p style = 'font-size:16px;font-family:Arial'>
The <code>mask_pii()</code> function masks Personally Identifiable Information (PII) entities within a given text. This is particularly useful when you want to protect sensitive data in your text.
</p>

In [None]:
# Mask PII (Personally Identifiable Information) in the customer questions.
obj.mask_pii(column="customer_question", data=advisor_customer_interactions, accumulate='advisor_notes', volatile=True)
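
<p style="font-size:16px; font-family:Arial">
For comparison, the sketch below masks two fixed formats from the sample data with regular expressions. This is a rule-based stand-in, not how <code>mask_pii()</code> works: patterns like these catch only identifiers with a predictable shape and miss names, addresses, and free-form account references, which is where an LLM-based approach helps. The pattern names, the function name, and the <code>***</code> mask are all assumptions made for illustration.
</p>

```python
import re

# Patterns for two formats that appear in the sample data: US Social Security
# numbers (123-45-6789) and 16-digit card numbers (4321-8765-0987-6543).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")


def mask_with_regex(text, mask="***"):
    """Replace SSN-like and card-like substrings with a fixed mask."""
    text = CARD_PATTERN.sub(mask, text)
    return SSN_PATTERN.sub(mask, text)


question = "Which no-fee credit card did you recommend again for SSN 321-54-9876?"
print(mask_with_regex(question))
```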

<hr style="height:2px; border:none">
<p style='font-size:20px; font-family:Arial;'><b>8. Generating Embeddings</b></p>
<p style='font-size:16px; font-family:Arial;'>
<div style="display: flex; align-items: center; gap: 30px; margin-top: 10px;">
  <div style="flex: 1; font-size:16px; font-family:Arial;">
    <p>
      The <code>embeddings()</code> function generates vector representations of text from a specified column, capturing the semantic meaning of each entry.
    </p>
    <p>
      These embeddings can then be used for tasks such as semantic similarity, clustering, retrieval, or as input features for downstream machine learning models.
    </p>
  </div>

  <img src="./images/clustering.png" alt="Text Embedding Clustering" style="width: 400px; max-width: 100%;border: 4px solid #404040; border-radius: 10px;">
</div>


In [None]:
# Instantiate the TeradataAI class with an Amazon Bedrock embedding model.
llm_embedding = TeradataAI(api_type="aws",
                           model_name="amazon.titan-embed-text-v2:0",
                           region="us-west-2")

In [None]:
# Instantiate the TextAnalyticsAI class with the embedding model.
obj_embeddings = TextAnalyticsAI(llm=llm_embedding)

In [None]:
# Generate embeddings
obj_embeddings.embeddings(column="advisor_notes", data=advisor_customer_interactions, accumulate="customer_id", output_format='VARCHAR')
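
<p style="font-size:16px; font-family:Arial">
Semantic similarity between two embedded texts is commonly measured with cosine similarity. The sketch below computes it in plain Python on toy vectors. It assumes the embeddings have already been parsed into lists of floats; parsing the <code>VARCHAR</code> output is not shown because its exact layout is not described here.
</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors; real embedding vectors are much longer.
note_a = [1.0, 0.0, 1.0]
note_b = [1.0, 0.0, 0.0]
print(round(cosine_similarity(note_a, note_b), 4))
```

<p style="font-size:16px; font-family:Arial">
Values close to 1 indicate vectors pointing in nearly the same direction, which for text embeddings suggests similar meaning.
</p>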


<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>9. Entity Recognition </b></p>
<p style='font-size:16px; font-family:Arial;'>
The <code>recognize_entities()</code> function is designed to identify a wide range of entities within text data. Examples of these entities can include:</p>
<table>
    <tr>
        <td style='font-size:16px; font-family:Arial;'>people</td>
        <td style='font-size:16px; font-family:Arial;'>places</td>
        <td style='font-size:16px; font-family:Arial;'>products</td>
        <td style='font-size:16px; font-family:Arial;'>organizations</td>
        <td style='font-size:16px; font-family:Arial;'>date/time</td>
        <td style='font-size:16px; font-family:Arial;'>quantities</td>
        <td style='font-size:16px; font-family:Arial;'>percentages</td>
        <td style='font-size:16px; font-family:Arial;'>currencies</td>
        <td style='font-size:16px; font-family:Arial;'>names</td>
    </tr>
</table>
</p>

In [None]:
# Recognize entities in the advisor notes.
obj.recognize_entities(column="advisor_notes", data=advisor_customer_interactions)

<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>10. Recognizing Personally Identifiable Information (PII) Entities</b></p>
<p style = 'font-size:16px;font-family:Arial;'>
The <code>recognize_pii_entities()</code> function provided by TextAnalyticsAI identifies Personally Identifiable Information (PII) entities within text data. PII entities can include sensitive data like:</p>
<table>
    <tr>
        <td style='font-size:16px; font-family:Arial;'>names</td>
        <td style='font-size:16px; font-family:Arial;'>addresses</td>
        <td style='font-size:16px; font-family:Arial;'>social security numbers</td>
        <td style='font-size:16px; font-family:Arial;'>email addresses</td>
        <td style='font-size:16px; font-family:Arial;'>phone numbers</td>        
    </tr>
</table>
</p>

In [None]:
# Recognize PII entities in the advisor notes.
obj.recognize_pii_entities(column="advisor_notes", data=advisor_customer_interactions, accumulate='advisor_notes', volatile=True)

<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>11. Text Classification </b></p>
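<p style = 'font-size:16px;font-family:Arial'>In this section, we'll explore the <code>classify()</code> function provided by TextAnalyticsAI. It assigns each text to one or more of the categories passed in <code>labels</code>; the example below sets <code>multi_label=True</code> so that a single note can receive several labels.</p>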

In [None]:
# Classify the advisor notes into multiple relevant financial and emotional categories.
obj.classify(
    column="advisor_notes",
    data=advisor_customer_interactions,
    accumulate="advisor_notes",
    labels=[
        "Life Insurance",
        "Estate Planning",
        "Beneficiary Updates",
        "Roth IRA",
        "Student Loan Refinance",
        "APR Discrepancy",
        "Auto Loan Refinance",
        "Credit Card Rewards",
        "Cashback Dispute",
        "Card Recommendations",
        "Retirement Fund Performance",
        "Investment Strategy",
        "Portfolio Reallocation",
        "401(k) Strategy",
        "Managed Account Fees",
        "Overdraft Fee",
        "Account Management",
        "Fee Waiver",
        "Customer Complaint",
        "Customer Escalation"
    ],
    multi_label=True
)
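
<p style="font-size:16px; font-family:Arial">
When an LLM assigns labels, it can be worth checking that what comes back belongs to the list you supplied. The sketch below separates known labels from unexpected ones. It assumes the predicted labels for one note are already available as a Python list of strings, which is an assumption about post-processing rather than the documented output format of <code>classify()</code>. The helper name and the shortened label set are hypothetical.
</p>

```python
# A shortened copy of the label list used above, for illustration.
ALLOWED_LABELS = {"Life Insurance", "Estate Planning", "Overdraft Fee", "Customer Complaint"}


def split_known_labels(predicted, allowed=ALLOWED_LABELS):
    """Split predicted labels into (known, unknown), ignoring case and outer spaces."""
    lookup = {label.lower(): label for label in allowed}
    known, unknown = [], []
    for label in predicted:
        key = label.strip().lower()
        if key in lookup:
            known.append(lookup[key])  # normalise to the canonical spelling
        else:
            unknown.append(label.strip())
    return known, unknown


# Hypothetical model output for one advisor note.
known, unknown = split_known_labels(["estate planning", " Life Insurance", "Tax Advice"])
print(known, unknown)
```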


<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>12. Text Summarization</b></p>
<p style = 'font-size:16px;font-family:Arial'>The <code>summarize()</code> function generates a concise summary of a given text.</p>

In [None]:
# Summarize the advisor notes.
obj.summarize(column="advisor_notes", data=advisor_customer_interactions, accumulate='customer_id', volatile=True)

<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>13. Key Phrase Extraction</b></p>
<p style = 'font-size:16px;font-family:Arial'>The <code>extract_key_phrases()</code> function provided by TextAnalyticsAI is used to extract key phrases from a given text. These key phrases can provide a quick understanding of the main concepts in the text.</p>

In [None]:
# Extract key phrases from advisor notes
obj.extract_key_phrases(column="advisor_notes", data=advisor_customer_interactions, volatile=True)

<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>14. Language Translation </b></p>
<p style = 'font-size:16px;font-family:Arial'>
The <code>translate()</code> function provided by TextAnalyticsAI translates a given text into a target language.</p>

In [None]:
# Translate the advisor notes into English, the default target language.
obj.translate(column="advisor_notes", data=advisor_customer_interactions, accumulate='advisor_notes', volatile=True)


<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>15. Language Detection </b></p>
<p style = 'font-size:16px;font-family:Arial'>The <code>detect_language()</code> function is used to identify the language of a given text.</p>

In [None]:
# Detect the language of the advisor notes.
obj.detect_language(column="advisor_notes", data=advisor_customer_interactions, volatile=True)

<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>16. Asking the LLM </b></p>
<p style = 'font-size:16px;font-family:Arial'>The <code>ask()</code> function is used to ask questions to the LLM based on the given context.</p>

In [None]:
# Ask the LLM questions, using the advisor notes as context.
# data_partition_column and context_partition_column are both 'customer_id';
# they are used to partition the data and context tables.
# prompt provides a template for the question and data.
obj.ask(column="customer_question", data=advisor_customer_interactions,
        context=advisor_customer_interactions, context_column='advisor_notes',
        data_partition_column='customer_id', context_partition_column='customer_id',
        prompt='''Provide an answer to the customer question using advisor notes as
        information relevant to the question.
        \nQuestion: #QUESTION# \n Data: #DATA#''',
        data_position='#DATA#',
        question_position='#QUESTION#')
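
<p style="font-size:16px; font-family:Arial">
To make the role of the two markers concrete, the sketch below fills the same kind of template with plain string replacement. It illustrates the template idea only and is not a description of how <code>ask()</code> builds its prompt internally. The function name <code>fill_prompt</code> is hypothetical, and the question and data strings are shortened from the sample table.
</p>

```python
PROMPT_TEMPLATE = (
    "Provide an answer to the customer question using advisor notes as "
    "information relevant to the question.\n"
    "Question: #QUESTION#\nData: #DATA#"
)


def fill_prompt(template, question, data,
                question_position="#QUESTION#", data_position="#DATA#"):
    """Substitute the question and context text for their placeholder markers."""
    for marker in (question_position, data_position):
        if marker not in template:
            raise ValueError(f"placeholder {marker} not found in template")
    return template.replace(question_position, question).replace(data_position, data)


print(fill_prompt(PROMPT_TEMPLATE,
                  question="What were the auto refinance rates you mentioned?",
                  data="Partner rates start at 4.99% for 36-month terms."))
```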


<hr style="height:2px; border:none">
<p style = 'font-size:20px;font-family:Arial'><b>17. Text Analytics with Open-Source Hugging Face Language Models</b></p>
<p style="font-size:16px; font-family:Arial"> As you’ve seen, hosted LLMs like Amazon Bedrock’s Anthropic Claude require no fine-tuning to perform a wide range of NLP tasks. This is because these hosted LLMs are trained on massive, diverse datasets, enabling them to handle tasks such as entity recognition, sentiment analysis, PII masking and more right out of the box.
</p>

<p style="font-size:16px; font-family:Arial">
While Amazon Bedrock offers the convenience, scalability, and simplicity of hosted models, many organizations also value the greater domain-specific accuracy, data privacy, control, and customization that open-source models provide.
</p>

<p style="font-size:16px; font-family:Arial">
A <a href="https://arxiv.org/pdf/2203.15556"> study from DeepMind </a> found that, for a fixed training compute budget, a smaller model trained on more data can outperform much larger models, so parameter count alone does not determine quality. In the same spirit, smaller task-specific models fine-tuned on domain-specific data can be competitive with larger general-purpose models in targeted applications. This makes open-source models an attractive option for organizations that want greater accuracy, enhanced privacy, and cost-efficiency in specific use cases. However, this approach requires additional expertise and effort to fine-tune and manage the models effectively.
</p>

<p style="font-size:16px; font-family:Arial">
In the next section, we’ll explore how <code>teradatagenai</code> enables seamless integration with Hugging Face models using BYO-LLM capabilities in VantageCloud—allowing you to deploy compact, specialized models directly where your data lives with GPU acceleration.
</p>

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>18. Authenticate into User Environment Service (UES) for Container Management</b></p>
<p style="font-size:16px; font-family:Arial;">
The <code>teradataml</code> library offers simple yet powerful methods for creating and managing custom Python runtime environments within VantageCloud. This gives developers full control over model behavior, performance, and analytic accuracy when running on the Analytic Cluster.
</p>

<p style="font-size:16px; font-family:Arial;">
Custom environments are persistent—created once and reused as needed. They can be saved, updated, or modified at any time, allowing for efficient and flexible environment management.
</p>

<p style="font-size:18px; font-family:Arial;"><b>18.1 Container Management Process</b></p>
<p style="font-size:16px; font-family:Arial;">

<table style="width:100%; table-layout:fixed;">
  <tr>
    <td style="vertical-align:top;" width="40%">
      <ol style="font-size:16px; font-family:Arial;">
        <li>Create a unique User Environment based on available base images</li>
        <li>Install libraries</li>
        <li>Install models and additional user artifacts</li>
      </ol>
    </td>
    <td>
      <img src="./images/OAF_Env.png" width="600" alt="Container Management Diagram" style="border:4px solid #404040; border-radius: 10px;">
    </td>
  </tr>
</table>
<p style="font-size:18px;font-family:Arial"><b>18.2 UES Authentication</b></p>
<p style="font-size:16px;font-family:Arial">This security mechanism is required to create and manage the Python or R environments that we will be creating. A VantageCloud Lake user can easily create the authentication objects using the Console in a VantageCloud Lake environment. For this use case, the authentication objects have already been created and copied into this JupyterLab environment for you.
</p>
<p style="font-size:16px;font-family:Arial">
   
<ul style="font-size:16px;font-family:Arial; margin-top:4px;">
  <li><a href="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Teradata-Package-for-Python-on-VantageCloud-Lake/Working-with-Open-Analytics/APIs-to-Use-with-Open-Analytics-Framework/API-to-Set-Authentication-Token/set_auth_token">Click here</a> to see more details about using the Teradata APIs to set the authentication objects.</li>

  <li>Check out <a href="https://medium.com/teradata/deploy-hugging-face-llms-on-teradata-vantagecloud-lake-with-nvidia-gpu-acceleration-d94d999edaa5">Step 4</a> of this tutorial to see more details about configuring a VantageCloud Lake Environment to use our Open Analytics Framework.</li>
</ul>
</p>

In [None]:
# Load the values for the Open Analytics Endpoint, the Access Token and the location of the .pem file.
# We also assign the name of our GPU Compute Group.

open_analytics_endpoint = os.getenv("ues_uri")
access_token = os.getenv("access_token")
pem_file = os.getenv("pem_file")
gpu_compute_group = os.getenv("gpu_compute_group")


In [None]:
configure.ues_url = open_analytics_endpoint
if set_auth_token(ues_url=open_analytics_endpoint, username=username, pat_token=access_token, pem_file=pem_file):
    print("UES Authentication successful")
else:
    print("UES Authentication failed. Check credentials.")
    sys.exit(1)

In [None]:
# Check if I have any existing environments
env_list = list_user_envs()
print("Available Environments:")
ipydisplay(env_list)

<p style="font-size:16px;font-family:Arial">  
TeradataAI handles the download and installation of the Hugging Face model (example: <i>'tner/roberta-large-ontonotes5'</i>) in the user's environment. If an environment name is not specified, the default name <i>'td_gen_ai_env'</i> is used. The TeradataAI class manages the entire creation and setup process. In the background, this process uses Teradata’s <code>Bring Your Own Large Language Model (BYO LLM)</code> offering.</p>
<p style="font-size:16px;font-family:Arial">For this use case, we're going to check whether the OAF Container already exists. If it doesn't, we will create it to show the process of using the <code>tner/roberta-large-ontonotes5</code> model with the <code>teradatagenai</code> package. </p>
<p style="font-size:16px;font-family:Arial">  
Define the Hugging Face model and initialize the TeradataAI Class </p>


In [None]:
# Define the Hugging Face model and initialize TeradataAI.
model_name = 'tner/roberta-large-ontonotes5'
model_args = {'transformer_class': 'AutoModelForTokenClassification',
              'task' : 'token-classification'}
llm = TeradataAI(api_type = "hugging_face",
                 model_name = model_name,
                 model_args = model_args)


<p style="font-size:16px;font-family:Arial">This step creates a new OAF Container and then installs 40+ libraries. After those are installed, it begins downloading and installing the model. If your Kernel status is showing <b>Idle</b>, please wait until you see the "Completed" message directly above this cell. This could take several minutes.</p>

<p style="font-size:18px;font-family:Arial"><b>18.3 Create the TextAnalyticsAI object</b></p>
<p style="font-size:16px;font-family:Arial">Now we can execute the portion of the process that will run in our GPU Analytics Cluster.  We configure the TextAnalyticsAI object with the preferred large language model. This will enable us to execute a variety of text analytics tasks.</p>

In [None]:
obj = TextAnalyticsAI(llm=llm)

In [None]:
execute_sql(f"SET SESSION COMPUTE GROUP {gpu_compute_group};")
print(f"Compute group set to {gpu_compute_group}") 

In [None]:
obj.recognize_entities(column='advisor_notes', data=advisor_customer_interactions, delimiter="#")
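
<p style="font-size:16px;font-family:Arial">
Token-classification models label each token individually, commonly with a BIO scheme in which <code>B-</code> marks the beginning of an entity, <code>I-</code> a continuation, and <code>O</code> a token outside any entity. The sketch below shows how such tags can be merged into entity spans. The tokens and tags are hand-written for illustration and are not actual output from <code>tner/roberta-large-ontonotes5</code> or from <code>recognize_entities()</code>; the function name <code>group_bio_tags</code> is hypothetical.
</p>

```python
def group_bio_tags(tokens, tags):
    """Merge token-level BIO tags into (entity_text, entity_type) spans."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and current_tokens and entity_type == current_type:
            current_tokens.append(token)  # continue the open entity
            continue
        if current_tokens:  # any other tag closes the open entity
            spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
        if prefix in ("B", "I"):  # start a new entity (tolerates a stray I- tag)
            current_tokens, current_type = [token], entity_type
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hand-written example tags, not real model output.
tokens = ["Sarah", "Lee", "should", "review", "her", "TransUnion", "report"]
tags = ["B-PERSON", "I-PERSON", "O", "O", "O", "B-ORG", "O"]
print(group_bio_tags(tokens, tags))
```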

<hr style='height:2px;border:none'>
<p style = 'font-size:20px;font-family:Arial'><b>19. Cleanup</b></p>
<p style = 'font-size:18px;font-family:Arial'><b>19.1 Delete your OAF Container</b></p>


In [None]:
# Remove the existing user environment
try:
    result = remove_env("td_gen_ai_env")
    print("Environment removed!")
except Exception as e:
    print("Could not remove the environment!")
    print("Error:", str(e))

<p style = 'font-size:18px;font-family:Arial'><b>19.2 Remove your database Context</b></p>

In [None]:
try:
    result = remove_context()
    print("Context removed!")
except Exception as e:
    print("Could not remove the Context!")
    print("Error:", str(e))

<hr style='height:2px;border:none'>
<p style = 'font-size:20px;font-family:Arial'><b>View the full TeradataAI Help</b></p>

In [None]:
help(TeradataAI)

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>