<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
      Employee Feedback and Insights Platform
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style="font-size:20px;font-family:Arial"><b>Introduction:</b></p>


<p style="font-size:16px; font-family:Arial">
   In this notebook, we will demonstrate how HR teams can analyze employee feedback at scale using advanced text analytics with teradatagenai.
<p style="font-size:16px; font-family:Arial">
The goal is to build an end-to-end pipeline that:
    
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Understands employee sentiment and emotions</li>
    <li>Extracts key themes and topics from feedback</li>
    <li>Identifies sensitive (PII) information for compliance</li>
    <li>Supports global employees by detecting and translating languages</li>
    <li>Summarizes insights for leadership</li>
    <li>Enables semantic search using embeddings</li>
</ul>
    
    
<p style="font-size:16px; font-family:Arial">    
  The <code>teradatagenai</code> Python library enables data scientists, analysts, and developers to run analytics on their unstructured data directly within Teradata VantageCloud. It offers built-in support for open-source Hugging Face models through Teradata's Bring Your Own Large Language Model (BYOLLM) capability, and In-DB TextAnalytics AI functions that access models hosted by cloud service providers such as AWS, Azure, and GCP.

<div style="text-align:center">
  <img src="./images/teradatagenai.png" width="1000" alt="teradatagenai Diagram" style="border:4px solid #404040; border-radius: 10px;">
</div>

<p style="font-size:20px;font-family:Arial;margin-top:10px"><b>Business Value:</b></p>

<p style="font-size:16px; font-family:Arial">
 Organizations handle massive volumes of unstructured text, including emails, voice call transcripts, customer reviews, contracts, and more. Traditional approaches to analyzing this data often involve costly data transfers, custom ML pipelines, and extended turnaround times. <code>teradatagenai</code> addresses these challenges by bringing domain-specific language models and hosted LLMs closer to your data.
</p>
<p style="font-size:16px; font-family:Arial">
  With built-in support for GPU acceleration and seamless integration with VantageCloud, the library offers simple function calls that abstract complex APIs, enabling secure, scalable, and performant text processing. Whether you're deploying open source models in-database or calling hosted LLMs like Amazon Bedrock, <code>teradatagenai</code> provides the flexibility to align with your organization's security, cost, and performance needs.
</p>

<p style="font-size:16px; font-family:Arial">
  The <code>TextAnalyticsAI</code> module within the library provides 11 built-in generative AI functions for powerful in-database NLP capabilities:
</p>

<ul style="font-size:16px; font-family:Arial; margin-left:20px; line-height:1.8;">
  <li><code>classify()</code> – Classify text into predefined categories</li>
  <li><code>analyze_sentiment()</code> – Perform sentiment analysis</li>
  <li><code>detect_language()</code> – Detect the language of a text</li>
  <li><code>embeddings()</code> – Generate embeddings for similarity search</li>
  <li><code>recognize_entities()</code> – Extract named entities</li>
   <li><code>recognize_pii_entities()</code> – Detect and label PII entities</li>
  <li><code>extract_key_phrases()</code> – Identify key phrases in text</li>
  <li><code>mask_pii()</code> – Mask personally identifiable information (PII)</li>
  <li><code>sentence_similarity()</code> – Measure semantic similarity between sentences</li>
  <li><code>summarize()</code> – Generate summaries of longer documents</li>
  <li><code>translate()</code> – Translate text between languages</li>
</ul>
</p>

<p style="font-size:18px;font-family:Arial;"><b>How to Get Access to Run This Demo in VantageCloud</b></p>

<p style="font-size:16px;font-family:Arial">
Gain free access to Teradata’s <b>Open Analytics Framework</b>, which includes support for <b>BYO-LLM capabilities</b> and <b>GPU compute clusters</b>. This enables you to run open-source Hugging Face models directly within your VantageCloud environment.</p>
<p style="font-size:16px;font-family:Arial">
To request the access required for this demo, send an email to <a href="mailto:Support.ClearScapeAnalytics@Teradata.com?subject=Requesting%20OAF%20Access">Support.ClearScapeAnalytics@Teradata.com</a> and include the host name of the environment you are requesting access from. You can find it on the ClearScape Analytics Dashboard in the section <b>Connection Details for Vantage Database</b>. Our team will provision your connection with the required permissions for BYO-LLM and GPU-accelerated demos.
</p>


In [None]:
%%capture
!pip install -r requirements.txt --quiet

<div class="alert alert-block alert-info">
<p style = 'font-size:16px;font-family:Arial'><b>Please</b><i> restart the kernel after executing the cell above so that the installed or updated libraries are loaded into memory. The simplest way to restart the kernel is to type zero zero: <b> 0 0</b></i> and then click <b>Restart</b>.</p>
</div>

<hr style='height:2px;border:none'>
<p style="font-size:20px;font-family:Arial"><b>1. Configure the environment</b></p>
<p style="font-size:16px; font-family:Arial">
Before we start working with our data, we need to set up our environment. This involves importing the necessary packages and establishing a connection to Vantage.
<br>
Here's how we can do this: </p>

In [None]:
# Import required packages
import sys
import os
import time
import getpass
from collections import OrderedDict  # used to build the embeddings return columns
from dotenv import dotenv_values  # used to read the environment parameter file
from teradataml import *
import teradatagenai
from teradatagenai import TeradataAI, TextAnalyticsAI, load_data
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline
from sentence_transformers import SentenceTransformer
from IPython.display import display as ipydisplay

<hr style="height:2px;border:none">
<p style = 'font-size:20px;font-family:Arial'><b>2. Connect to VantageCloud Lake</b></p>
<p style = 'font-size:16px;font-family:Arial'>Connect to VantageCloud using <code>create_context</code> from the teradataml Python library. If this environment has been prepared for connecting to a VantageCloud Lake OAF Container, all the details required will be loaded and you will see an acknowledgement after executing this cell.</p>

In [None]:
print("Checking if this environment is ready to connect to VantageCloud Lake...")

if os.path.exists("/home/jovyan/JupyterLabRoot/VantageCloud_Lake/config/1.env"):
    print("Your environment parameter file exists. Please proceed with this use case.")
    # Load all the variables from the .env file into a dictionary
    env_vars = dotenv_values("/home/jovyan/JupyterLabRoot/VantageCloud_Lake/config/1.env")
    # Create the Context
    eng = create_context(host=env_vars.get("host"), username=env_vars.get("username"), password=env_vars.get("my_variable"))
    #execute_sql('''SET query_band='DEMO=Entity_Recognition_BYOLLM_VCL.ipynb;' UPDATE FOR SESSION;''')
    print("Connected to VantageCloud Lake with:", eng)
else:
    print("Your environment has not been prepared for connecting to VantageCloud Lake.")
    print("Please contact the support team.")
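
<p style="font-size:16px;font-family:Arial">The cells in this notebook read several keys from <code>env_vars</code>. As a minimal sketch, a pre-flight check along the following lines can report which keys are missing before any connection is attempted. The helper name <code>missing_keys</code> and the sample values are hypothetical; the key names are the ones this notebook reads, and <code>env_vars</code> is assumed to behave like a plain dictionary.</p>

```python
# Keys that the cells in this notebook read from env_vars.
REQUIRED_KEYS = ["host", "username", "my_variable", "ues_uri",
                 "access_token", "pem_file", "gpu_compute_group"]


def missing_keys(env_vars, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in env_vars."""
    return [key for key in required if not env_vars.get(key)]


# Hypothetical, deliberately incomplete configuration used only for illustration.
sample = {"host": "example-host", "username": "demo_user", "my_variable": "secret"}
print("Missing keys:", missing_keys(sample))
```

<p style="font-size:16px;font-family:Arial">An empty list means every key used later in the notebook is present.</p>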

<hr style="height:2px; border:none">
<p style="font-size:20px; font-family:Arial"><b>3. Load the data</b></p>


<p style = 'font-size:16px;font-family:Arial'>
We will be loading the sample employee data using the <code>'load_data()'</code> helper function. To utilize the TextAnalyticsAI functions effectively, we first need to organize our data appropriately. We are particularly interested in the 'articles', 'reviews', 'quotes', and 'employee_data' columns for each 'employee_id' and 'employee_name' in our dataframe.

<p style = 'font-size:16px;font-family:Arial'>
To streamline this process, we will generate individual dataframes for each of these columns:

In [None]:
load_data('employee', 'employee_data')

In [None]:
df=DataFrame('employee_data')

In [None]:
# Create separate DataFrames for articles, reviews, quotes, and employee data.
df_articles = df.select(["employee_id", "employee_name", "articles"])
df_reviews = df.select(["employee_id", "employee_name", "reviews"])
df_quotes = df.select(["employee_id", "employee_name", "quotes"])
df_employeeData = df.select(["employee_id", "employee_name", "employee_data"])
df_classify_articles = df.select(["employee_id", "articles"])

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>4. Authenticate and Prepare the OAF Environment</b></p>
<p style="font-size:16px; font-family:Arial;">
The <code>teradataml</code> library offers simple yet powerful methods for creating and managing custom Python runtime environments within VantageCloud. This gives developers full control over model behavior, performance, and analytic accuracy when running on the Analytic Cluster.
</p>

<p style="font-size:16px; font-family:Arial;">
Custom environments are persistent—created once and reused as needed. They can be saved, updated, or modified at any time, allowing for efficient and flexible environment management.
</p>

<p style="font-size:16px; font-family:Arial;">
<table style="width:100%; table-layout:fixed;">
  <tr>
    <td style="vertical-align:top;" width="40%">
      <ol style="font-size:16px; font-family:Arial;">
        <li>Create a unique User Environment based on available base images</li>
        <li>Install libraries</li>
        <li>Install models and additional user artifacts</li>
      </ol>
    </td>
    <td>
      <img src="./images/OAF_Env.png" width="600" alt="Container Management Diagram" style="border:4px solid #404040; border-radius: 10px;">
    </td>
  </tr>
</table>
<p style="font-size:18px;font-family:Arial"><b>4.1 UES Authentication</b></p>
<p style="font-size:16px;font-family:Arial">This security mechanism is required to create and manage the Python or R environments that we will be creating. A VantageCloud Lake user can easily create the authentication objects using the Console in a VantageCloud Lake environment. For this use case, the authentication objects have already been created and copied into this JupyterLab environment for you.
</p>
<p style="font-size:16px;font-family:Arial">
   
<ul style="font-size:16px;font-family:Arial; margin-top:4px;">
  <li><a href="https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Teradata-Package-for-Python-on-VantageCloud-Lake/Working-with-Open-Analytics/APIs-to-Use-with-Open-Analytics-Framework/API-to-Set-Authentication-Token/set_auth_token">Click here</a> to see more details about using the Teradata APIs to set the authentication objects.</li>

  <li>Check out <a href="https://medium.com/teradata/deploy-hugging-face-llms-on-teradata-vantagecloud-lake-with-nvidia-gpu-acceleration-d94d999edaa5">Step 4</a> of this tutorial on Medium.com to see more details about configuring a VantageCloud Lake environment to use our Open Analytics Framework.</li>
</ul>
</p>

In [None]:
# We've already loaded all the values into our environment variables and into a dictionary, env_vars.
# username=env_vars.get("username") isn't required when using base_url, pat and pem.

if set_auth_token(base_url=env_vars.get("ues_uri"),
                  pat_token=env_vars.get("access_token"), 
                  pem_file=env_vars.get("pem_file"),
                  valid_from=int(time.time())
                 ):
    print("UES Authentication successful")
else:
    print("UES Authentication failed. Check credentials.")
    sys.exit(1)

<p style="font-size:18px;font-family:Arial"><b>4.2 Check for an existing OAF environment or Create a new one</b></p>
<p style="font-size:16px;font-family:Arial">It's ok to reuse the same OAF environment. Our VantageCloud Lake OAF Use cases and demos will use a default naming convention for the environment names. If you haven't already created one, we'll create it now.</p>

In [None]:
environment_name = env_vars.get("username")
print("\nHere is a list of your current environments:")
env_list = list_user_envs()
ipydisplay(env_list)

if environment_name in env_list['env_name'].values:  
    demo_env = get_env(environment_name)
    print("Your default environment already exists. You can continue with this notebook.\n\n")
else:
    demo_env = create_env(env_name=f'{environment_name}', base_env='python_3.10')
    print(demo_env)


In [None]:
lib_claim_id = demo_env.install_lib(["transformers", "torch","sentencepiece","sentence-transformers"])
print("Libraries Installed") 
# Get the status of the library installation
demo_env.status(str(lib_claim_id["Claim Id"].iloc[0]))

In [None]:
gpu_compute_group = env_vars.get("gpu_compute_group")
execute_sql(f"SET SESSION COMPUTE GROUP {gpu_compute_group};")
print(f"Compute group set to {gpu_compute_group}") 

In [None]:
def clean_env(llm):
    ##Get LLM
    llm_instance = llm.get_llm()
    print("LLM instance:", llm_instance)
    ##Remove LLM
    llm.remove()
    print("LLM removed successfully.")

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>5. Sentiment Analysis</b></p>

<p style="font-size:16px;font-family:Arial">First, we want to gauge employee morale by analyzing the emotional tone of employee reviews and quotes.
We use the Hugging Face model <code>bhadresh-savani/distilbert-base-uncased-emotion</code>, which detects emotions such as joy, anger, sadness, fear, love, and surprise.</p>

In [None]:
# Access the LLM endpoint
model_name = 'bhadresh-savani/distilbert-base-uncased-emotion'
model_args = {'transformer_class': 'AutoModelForSequenceClassification',
              'task' : 'text-classification'}
llm = TeradataAI(api_type = "hugging_face",
         model_name = model_name,
         model_args = model_args)

<p style="font-size:18px;font-family:Arial"><b>5.1 Create the TextAnalyticsAI object</b></p>
<p style="font-size:16px;font-family:Arial">Now we can execute the portion of this demo that will run in our GPU Analytics Cluster.  We'll provide the TextAnalyticsAI object with the preferred large language model. This will enable us to execute a variety of text analytics tasks.</p>

In [None]:
# Create a TextAnalyticsAI object.
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Using the default script
obj.analyze_sentiment(column='reviews', data=df_reviews, delimiter="#")

In [None]:
# Using the default script with output_labels.
obj.analyze_sentiment(column='reviews', data=df_reviews,
                      output_labels={'label': str, 'score': float}, delimiter="#")

In [None]:
clean_env(llm)
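
<p style="font-size:16px;font-family:Arial">To gauge morale across many reviews, the per-row labels can be rolled up into a distribution. The following is a minimal sketch, assuming the results have been brought to the client as a list of dictionaries with a <code>label</code> field, as in the <code>output_labels</code> call above. The helper name <code>emotion_distribution</code> and the sample rows are hypothetical, not output from this demo.</p>

```python
from collections import Counter


def emotion_distribution(rows):
    """Return (label, share) pairs, most frequent label first."""
    counts = Counter(row["label"] for row in rows)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]


# Hypothetical rows shaped like {'label': str, 'score': float}; not real results.
rows = [
    {"label": "joy", "score": 0.98},
    {"label": "sadness", "score": 0.91},
    {"label": "joy", "score": 0.87},
    {"label": "anger", "score": 0.76},
]
for label, share in emotion_distribution(rows):
    print(f"{label}: {share:.0%}")
```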

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>6. Key Phrase Extraction</b></p>
<p style="font-size:16px;font-family:Arial">Next, we extract key phrases to identify recurring themes in employee responses, such as “work-life balance,” “salary growth,” or “team support.”
This helps HR quickly spot the main concerns and motivators.</p>

In [None]:
# Accessing the LLM endpoint and initializing the TeradataAI and TextAnalyticsAI
model_name = 'ml6team/keyphrase-extraction-kbir-kpcrowd'
# The task must match the token-classification transformer class.
model_args = {'transformer_class': 'AutoModelForTokenClassification',
              'task' : 'token-classification'}
llm = TeradataAI(api_type = "hugging_face",
         model_name = model_name,
         model_args = model_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Default script is used
obj.extract_key_phrases(column="articles", data=df_articles, delimiter="#")

In [None]:
# Using a user defined script.
base_dir = os.path.dirname(teradatagenai.__file__)
extract_key_phrases_script = os.path.join(base_dir, 'example-data', 'extract_key_phrases.py')
obj.extract_key_phrases(column="articles", data=df_articles, script=extract_key_phrases_script, delimiter="#")

In [None]:
clean_env(llm)

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>7. Recognize Entities</b></p>
<p style="font-size:16px;font-family:Arial">Employees often mention departments, managers, projects, and organizations in their feedback.
By running entity recognition, we can structure unstructured text and identify these references for deeper analysis.</p>

In [None]:
# Accessing the LLM endpoint and initializing TeradataAI and TextAnalyticsAI
model_name = 'tner/roberta-large-ontonotes5'
model_args = {'transformer_class': 'AutoModelForTokenClassification',
              'task' : 'token-classification'}
llm = TeradataAI(api_type = "hugging_face",
                 model_name = model_name,
                 model_args = model_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Default script is used
obj.recognize_entities(column='articles', data=df_articles, delimiter="#")

In [None]:
# Using a user-defined script for inferencing along with the returns argument
base_dir = os.path.dirname(teradatagenai.__file__)
entity_recognition_script = os.path.join(base_dir, 'example-data', 'entity_recognition.py')
obj.recognize_entities(column='articles',
                                       returns = {"text": VARCHAR(64000),
                                                  "ORG": VARCHAR(64000),
                                                  "PERSON": VARCHAR(64000),
                                                  "DATE1": VARCHAR(64000),
                                                  "PRODUCT": VARCHAR(64000),
                                                  "GPE": VARCHAR(64000),
                                                  "EVENT": VARCHAR(64000),
                                                  "LOC": VARCHAR(64000),
                                                  "WORK_OF_ART": VARCHAR(64000)},
                                       data=df_articles,
                                       script = entity_recognition_script, delimiter="#")

In [None]:
clean_env(llm)
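
<p style="font-size:16px;font-family:Arial">The <code>returns</code> argument above uses one column per entity label. As a rough illustration of that reshaping, the sketch below groups a flat list of entities by label. It assumes entities in the shape produced by a Hugging Face token-classification pipeline with aggregation enabled (<code>entity_group</code> and <code>word</code> keys); the helper name <code>group_entities</code> and the sample entities are hypothetical, and the bundled <code>entity_recognition.py</code> script may work differently.</p>

```python
from collections import defaultdict


def group_entities(entities):
    """Group entity words by label, keeping first-seen order and dropping repeats."""
    grouped = defaultdict(list)
    for entity in entities:
        label, word = entity["entity_group"], entity["word"]
        if word not in grouped[label]:
            grouped[label].append(word)
    return dict(grouped)


# Hypothetical entities for illustration only; not output from this demo.
entities = [
    {"entity_group": "PERSON", "word": "Alice"},
    {"entity_group": "ORG", "word": "Teradata"},
    {"entity_group": "PERSON", "word": "Alice"},
    {"entity_group": "GPE", "word": "San Diego"},
]
print(group_entities(entities))
```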

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>8. Language detection</b></p>
<p style="font-size:16px;font-family:Arial">Since employees may respond in multiple languages, we first detect the language of the feedback.
This ensures proper routing and translation where needed.</p>

In [None]:
# Accessing the LLM endpoint and initializing TeradataAI and TextAnalyticsAI
model_name = 'papluca/xlm-roberta-base-language-detection'
model_args = {'transformer_class': 'AutoModelForSequenceClassification', 'task' : 'text-classification'}
ues_args = {'env_name': f'{environment_name}'}
llm = TeradataAI(api_type = "hugging_face",
     model_name = model_name,
     model_args = model_args,
     ues_args = ues_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Default script is used
obj.detect_language(column="quotes", data=df_quotes, delimiter="#")

In [None]:
# output_labels argument is specified along with the default script
obj.detect_language(column='quotes', data=df_quotes, output_labels={'label': str, 'score': float}, delimiter="#")

In [None]:
clean_env(llm)
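
<p style="font-size:16px;font-family:Arial">The routing decision described above can be sketched as a small rule over the detected label and its score. This is an illustration only: the function name <code>route_feedback</code>, the 0.8 threshold, and the three route names are assumptions, and the labels are assumed to be language codes such as <code>en</code> or <code>fr</code>.</p>

```python
def route_feedback(label, score, target="en", min_score=0.8):
    """Decide how to handle one feedback entry from its detected language."""
    if score < min_score:
        return "review"      # detection too uncertain to act on automatically
    if label == target:
        return "keep"        # already in the target language
    return "translate"       # confident detection of another language


# Hypothetical detections for illustration only.
for label, score in [("en", 0.99), ("fr", 0.97), ("de", 0.42)]:
    print(label, score, "->", route_feedback(label, score))
```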

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>9. Text Summarization</b></p>
<p style="font-size:16px;font-family:Arial">Some employee feedback may be lengthy. Using summarization, we create concise reports that highlight the main point without losing meaning.</p>

In [None]:
# Accessing the LLM endpoint and initializing TeradataAI and TextAnalyticsAI
model_name = 'facebook/bart-large-cnn'
model_args = {'transformer_class': 'AutoModelForSeq2SeqLM', 'task' : 'summarization'}
llm = TeradataAI(api_type = "hugging_face",
         model_name = model_name,
         model_args = model_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Using default script
obj.summarize(column='articles', data=df_articles, quotechar="|", delimiter="#")

In [None]:
# Using a user defined script.
base_dir = os.path.dirname(teradatagenai.__file__)
summarization_script = os.path.join(base_dir, 'example-data', 'summarize_text.py')
obj.summarize(column='articles',
       returns = {"text": VARCHAR(10000),
       "summarized_text": VARCHAR(10000)},
       data=df_articles,
       script = summarization_script, quotechar="|", delimiter="#")

In [None]:
clean_env(llm)

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>10. Text Classification</b></p>
<p style="font-size:16px;font-family:Arial">To make HR analysis easier, we classify feedback into categories such as:
</p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Management</li>
    <li>Compensation & Benefits</li>
    <li>Work Culture</li>
    <li>Facilities</li>
    <li>Technology/Tools</li>
</ul>
<p style="font-size:16px;font-family:Arial">This makes it easy to route feedback to the right HR sub-team.</p>



In [None]:
# Accessing the LLM endpoint and initializing TeradataAI and TextAnalyticsAI
model_name = 'facebook/bart-large-mnli'
model_args = {'transformer_class': 'AutoModelForSequenceClassification', 'task' : 'zero-shot-classification'}
llm = TeradataAI(api_type = "hugging_face",
         model_name = model_name,
         model_args = model_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Using default script
label = ["Medical", "hospital", "healthcare", "historicalNews",
         "Environment", "technology", "Games"]
obj.classify("articles", df_classify_articles, labels=label, delimiter="#")

In [None]:
# Using a user defined script.
base_dir = os.path.dirname(teradatagenai.__file__)
classify_script = os.path.join(base_dir, 'example-data', 'classify_text.py')

obj.classify("articles",
             df_classify_articles,
             labels=["Medical", "Hospitality", "Healthcare",
                     "historical-news", "Games",
                     "Environment", "Technology"],
             script=classify_script, delimiter="#")

In [None]:
clean_env(llm)

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>11. Language Translation</b></p>
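<p style="font-size:16px;font-family:Arial">Once the language is detected, feedback can be translated into a single language so that the HR team can review all responses together. This demo uses <code>Helsinki-NLP/opus-mt-en-fr</code>, which translates English text into French; to translate non-English feedback into English, substitute a model trained for that language pair.</p>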

In [None]:
# Accessing the LLM endpoint and initializing TeradataAI and TextAnalyticsAI
model_name = 'Helsinki-NLP/opus-mt-en-fr'
model_args = {'transformer_class': 'AutoModelForSeq2SeqLM', 'task' : 'translation'}
ues_args = {'env_name': f'{environment_name}'}

llm = TeradataAI(api_type = "hugging_face",
         model_name = model_name,
         model_args = model_args,
         ues_args = ues_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Default script is used
obj.translate(column="quotes", data=df_quotes, target_lang="French", delimiter="#")

In [None]:
# output_labels argument is specified along with the default script
obj.translate(column="quotes", data=df_quotes, target_lang="French", output_labels={'translation_text': str}, delimiter="#")

In [None]:
clean_env(llm)

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>12. Recognize PII</b></p>
<p style="font-size:16px;font-family:Arial">In this section, we'll use the <code>recognize_pii_entities()</code> function provided by TextAnalyticsAI. This function identifies Personally Identifiable Information (PII) entities within text data. PII entities can include sensitive data such as names, addresses, social security numbers, email addresses, and phone numbers.</p>

In [None]:
# Accessing the LLM endpoint and initializing TeradataAI and TextAnalyticsAI
model_name = 'lakshyakh93/deberta_finetuned_pii'
model_args = {'transformer_class': 'AutoModelForTokenClassification', 'task' : 'token-classification'}
llm = TeradataAI(api_type = "hugging_face",
         model_name = model_name,
         model_args = model_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Default script is used
obj.recognize_pii_entities(column="employee_data", data=df_employeeData, delimiter="#")

In [None]:
# Using a user defined script.
base_dir = os.path.dirname(teradatagenai.__file__)
recognize_script = os.path.join(base_dir, 'example-data', 'recognize_pii.py')
obj.recognize_pii_entities(column="employee_data", data=df_employeeData, script=recognize_script, delimiter="#")

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>13. Mask PII</b></p>
<p style="font-size:16px;font-family:Arial">In this section, we'll use the <code>mask_pii()</code> function provided by TextAnalyticsAI. This function masks the Personally Identifiable Information (PII) detected in text data, such as names, addresses, social security numbers, email addresses, and phone numbers, so that feedback can be analyzed and shared without exposing sensitive details.</p>

In [None]:
# Accessing the LLM endpoint and initializing TeradataAI and TextAnalyticsAI
model_name = 'lakshyakh93/deberta_finetuned_pii'
model_args = {'transformer_class': 'AutoModelForTokenClassification', 'task' : 'token-classification'}
llm = TeradataAI(api_type = "hugging_face",
         model_name = model_name,
         model_args = model_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Using a user defined script.
base_dir = os.path.dirname(teradatagenai.__file__)
mask_pii_script = os.path.join(base_dir, 'example-data', 'mask_pii.py')
obj.mask_pii(column="employee_data", data=df_employeeData, script=mask_pii_script, delimiter="#")

In [None]:
clean_env(llm)
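
<p style="font-size:16px;font-family:Arial">Conceptually, masking replaces each detected span with a placeholder. The sketch below shows that step in isolation, assuming entities carry <code>start</code> and <code>end</code> character offsets and an <code>entity_group</code> label, as a Hugging Face token-classification pipeline with aggregation returns them, and that spans do not overlap. The helper name <code>mask_spans</code>, the label names, and the sample text are hypothetical; the bundled <code>mask_pii.py</code> script may mask differently.</p>

```python
def mask_spans(text, entities, placeholder="[{label}]"):
    """Replace each detected span with a placeholder naming its label."""
    masked = text
    # Work from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["start"], reverse=True):
        tag = placeholder.format(label=entity["entity_group"])
        masked = masked[:entity["start"]] + tag + masked[entity["end"]:]
    return masked


# Hypothetical text and spans for illustration only.
text = "Contact Jane Doe at jane@example.com."
entities = [
    {"entity_group": "NAME", "start": 8, "end": 16},
    {"entity_group": "EMAIL", "start": 20, "end": 36},
]
print(mask_spans(text, entities))
```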

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>14. Sentence Similarity</b></p>
<p style="font-size:16px;font-family:Arial">We can check similarity between employee responses to group together feedback that talks about the same issue.
This helps HR avoid duplicate analysis and focus on unique concerns.</p>

In [None]:
# Accessing the LLM endpoint and initializing TeradataAI and TextAnalyticsAI
model_name = 'sentence-transformers/all-MiniLM-L6-v2'
model_args = {'transformer_class': 'AutoModelForTokenClassification', 'task' : 'token-classification'}
ues_args = {'env_name': f'{environment_name}'}

llm = TeradataAI(api_type = "hugging_face",
         model_name = model_name,
         model_args = model_args,
         ues_args = ues_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Using a user-defined script
base_dir = os.path.dirname(teradatagenai.__file__)
sentence_similarity_script = os.path.join(base_dir, 'example-data', 'sentence_similarity.py')
obj.sentence_similarity(column1="employee_data", column2="articles", data=df, script=sentence_similarity_script, delimiter="#")
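
<p style="font-size:16px;font-family:Arial">Sentence-embedding models are commonly compared using cosine similarity between their vectors. The sketch below shows that measure on plain Python lists; whether the bundled <code>sentence_similarity.py</code> script uses exactly this measure is an assumption, and the helper name and sample vectors are hypothetical.</p>

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical low-dimensional vectors; real sentence embeddings are much longer.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction, close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # orthogonal, 0
```

<p style="font-size:16px;font-family:Arial">Scores near 1 indicate responses that point in the same semantic direction, which is the signal used to group feedback about the same issue.</p>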

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>15. Embeddings</b></p>
<p style="font-size:16px;font-family:Arial">Finally, we generate vector embeddings for each feedback entry.
This enables:
</p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Semantic search (finding similar responses)</li>
    <li>Clustering of feedback by themes</li>
    <li>Feeding into downstream analytics systems</li>
</ul>


In [None]:
# Acessing the LLM endpoint and initializing TeradataAI and TextAnalyticsAI
model_name = 'sentence-transformers/all-MiniLM-L6-v2'
model_args = {'transformer_class': 'AutoModelForTokenClassification', 'task' : 'token-classification'}
llm = TeradataAI(api_type = "hugging_face",
         model_name = model_name,
         model_args = model_args)
obj = TextAnalyticsAI(llm=llm)

In [None]:
# Using a user-defined script and the returns argument
from collections import OrderedDict
base_dir = os.path.dirname(teradatagenai.__file__)
embeddings_script = os.path.join(base_dir, 'example-data', 'embeddings.py')
# Construct the return columns: the input text plus one column per embedding dimension.
# all-MiniLM-L6-v2 produces 384-dimensional vectors.
returns_ = OrderedDict([('text', VARCHAR(512))])
for i in range(384):
    returns_["v{}".format(i + 1)] = VARCHAR(1000)
obj.embeddings(column="articles", data=df, script=embeddings_script, returns=returns_, libs='sentence_transformers', delimiter='#', persist=True)

In [None]:
clean_env(llm)
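
<p style="font-size:16px;font-family:Arial">Because the <code>returns</code> argument spreads each embedding across the columns <code>v1</code> to <code>v384</code> as <code>VARCHAR</code>, downstream code usually needs to reassemble them into one numeric vector. The sketch below assumes a result row is available as a dictionary whose values are strings that parse as floats; the helper name <code>row_to_vector</code> and the three-dimensional sample row are hypothetical.</p>

```python
def row_to_vector(row, dim=384):
    """Collect columns v1..v{dim} from a result row, in order, as floats."""
    return [float(row["v{}".format(i + 1)]) for i in range(dim)]


# Hypothetical 3-dimensional row for illustration; real rows have v1..v384.
row = {"text": "Great team support", "v1": "0.12", "v2": "-0.5", "v3": "0.0"}
print(row_to_vector(row, dim=3))
```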

<hr style="height:2px;border:none;">
<p style="font-size:20px;font-family:Arial"><b>16. Insights & Conclusion</b></p>
<p style="font-size:16px;font-family:Arial">By combining these steps, the HR team can:
</p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Measure employee sentiment trends</li>
    <li>Identify common themes and issues</li>
    <li>Summarize feedback for management reports</li>
    <li>Ensure compliance with PII regulations</li>
    <li>Support global workforce with language translation</li>
    <li>Enable semantic search & advanced analytics</li>
</ul>
<p style="font-size:16px;font-family:Arial">This end-to-end pipeline transforms unstructured employee feedback into actionable insights for HR decision-making.
</p>

<hr style='height:2px;border:none'>
<p style = 'font-size:20px;font-family:Arial'><b>17. Cleanup</b></p>
<p style = 'font-size:18px;font-family:Arial'><b>17.1 Delete your OAF Container</b></p>
<p style="font-size:16px;font-family:Arial">Executing this cell is optional. If you plan to run more OAF use cases, you can leave your OAF environment in place.</p>

In [None]:
#Remove your default user environment

try:
    result = remove_env(environment_name)
    print("Environment removed!")
except Exception as e:
    print("Could not remove the environment!")
    print("Error:", str(e))

<p style = 'font-size:18px;font-family:Arial'><b>17.2 Remove your database Context</b></p>
<p style="font-size:16px;font-family:Arial">Please remove your context after you've completed this notebook.</p>

In [None]:
try:
    result = remove_context()
    print("Context removed!")
except Exception as e:
    print("Could not remove the Context!")
    print("Error:", str(e))

<hr style='height:2px;border:none'>
<p style = 'font-size:20px;font-family:Arial'><b>View the full TeradataAI Help</b></p>

In [None]:
help(TeradataAI)

In [None]:
help(TextAnalyticsAI)

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>