<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       WordEmbeddings Function in Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial'>Word embedding is the representation of a word/token in multi-dimensional space such that words/tokens with similar meanings have similar embeddings. Each word/token is mapped to a vector of real numbers that represents it. The Analytics Database function WordEmbeddings produces vectors for pieces of text and can compute the similarity between texts. It supports four operations: token-embedding, doc-embedding, token2token-similarity, and doc2doc-similarity.<br>In this notebook, we show how to use the WordEmbeddings function available in Vantage.</p>
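<p style = 'font-size:16px;font-family:Arial'>To make "similar embeddings" concrete, the sketch below compares made-up 4-dimensional vectors (the same width as the v1 to v4 model columns used later) with cosine similarity, a common closeness measure for embeddings. The words and values are illustrative only. We assume, without confirming it here, that this is the kind of measure WordEmbeddings applies for its similarity operations.</p>

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 means same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

# Toy 4-dimensional embeddings: made-up values, NOT taken from word_embed_model.
embeddings = {
    "king":  [0.8, 0.6, 0.1, 0.0],
    "queen": [0.7, 0.7, 0.2, 0.0],
    "apple": [0.0, 0.1, 0.9, 0.4],
}

print(round(cosine_similarity(embeddings["king"], embeddings["queen"]), 3))  # 0.985
print(round(cosine_similarity(embeddings["king"], embeddings["apple"]), 3))  # 0.151
```

Vectors that point in nearly the same direction score close to 1, so "king" and "queen" come out far more similar than "king" and "apple".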

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>

<p style = 'font-size:16px;font-family:Arial'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>

In [None]:
from teradataml import *

# Modify the following to match the specific client environment settings
display.max_rows = 5

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>1.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=PP_WordEmbeddings_Python.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial'>Begin running steps with Shift + Enter keys. </p>

<hr style='height:1px;border:none;'>

<p style = 'font-size:18px;font-family:Arial'><b>1.2 Getting Data for This Demo</b></p>

<p style = 'font-size:16px;font-family:Arial'>Here, we load the example data that ships with the teradataml library and use it to demonstrate the function.</p>

In [None]:
load_example_data("teradataml", ["word_embed_model","word_embed_input_table1","word_embed_input_table2"])

<p style = 'font-size:16px;font-family:Arial'>The next step is optional: run it if you want to see the status of the databases/tables created and the space used.</p>

In [None]:
%run -i ../../UseCases/run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>2. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial'>Create "Virtual DataFrames" that point to the data sets in Vantage. Check the shape of each DataFrame as well as the data types of its columns.</p>

In [None]:
model_input = DataFrame.from_table("word_embed_model")
data_input1 = DataFrame.from_table("word_embed_input_table1")
data_input2 = DataFrame.from_table("word_embed_input_table2")

In [None]:
model_input

<p style = 'font-size:16px;font-family:Arial'>Next, we create embedding vectors using the word embedding model.<br>Detailed help is available by passing the function name to Python's built-in help function.</p>

In [None]:
help(WordEmbeddings)

In [None]:
# Example  : Generate a vector for each word in column 'doc1'
#             using the word embedding model and the 'token-embedding' operation.
#             Each word is assigned a vector; the closer two vectors are,
#             the more similar the two words.
WordEmbeddings_out1 = WordEmbeddings(data = data_input1,
                                     model = model_input,
                                     id_column = "doc_id",
                                     model_text_column = "token",
                                     model_vector_columns = ["v1", "v2", "v3", "v4"],
                                     primary_column = "doc1",
                                     operation = "token-embedding"
                                     )
# Print the result DataFrame.
WordEmbeddings_out1.result
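<p style = 'font-size:16px;font-family:Arial'>Conceptually, the token-embedding operation looks up each token of the primary column in the model table and returns that token's vector columns. The following pure-Python sketch mimics this with a hypothetical in-memory model (MODEL) and a hypothetical helper (token_embedding). The whitespace tokenization, lower-casing, and skipping of tokens missing from the model are assumptions, not documented WordEmbeddings behavior.</p>

```python
# Hypothetical model table: token -> [v1, v2, v3, v4] (made-up values).
MODEL = {
    "cat": [0.2, 0.8, 0.1, 0.3],
    "sat": [0.5, 0.1, 0.7, 0.2],
    "mat": [0.3, 0.7, 0.2, 0.4],
}

def token_embedding(doc_id, text, model):
    """Return (doc_id, token, v1, v2, v3, v4) rows for each known token.

    Assumptions (not confirmed for WordEmbeddings): tokens are split on
    whitespace, lower-cased, and tokens missing from the model are skipped.
    """
    rows = []
    for token in text.lower().split():
        vector = model.get(token)
        if vector is None:
            continue  # out-of-vocabulary token: no vector to emit
        rows.append((doc_id, token, *vector))
    return rows

for row in token_embedding(1, "The cat sat on the mat", MODEL):
    print(row)
```

Under these assumptions, "the" and "on" produce no rows because the toy model has no vectors for them.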

In [None]:
# Example  : Find the similarity between the tokens in columns 'token1' and 'token2'
#             of 'data_input2' using the word embedding model and the
#             'token2token-similarity' operation.
WordEmbeddings_out2 = WordEmbeddings(data = data_input2,
                                     model = model_input,
                                     id_column = "token_id",
                                     model_text_column = "token",
                                     model_vector_columns = ["v1", "v2", "v3", "v4"],
                                     primary_column = "token1",
                                     secondary_column = "token2",
                                     operation = "token2token-similarity",
                                     accumulate = ["token1", "token2"]
                                     )

# Print the result DataFrame.
WordEmbeddings_out2.result
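<p style = 'font-size:16px;font-family:Arial'>This notebook runs only the token-embedding and token2token-similarity operations. For the doc-embedding and doc2doc-similarity operations listed in the introduction, a common approach is to average the token vectors of each document and compare the averages with cosine similarity. The sketch below illustrates that idea with a hypothetical model (MODEL) and hypothetical helpers (doc_embedding, cosine_similarity). Averaging is an assumption here, not a confirmed description of how WordEmbeddings builds document vectors.</p>

```python
import math

# Hypothetical model table: token -> [v1, v2, v3, v4] (made-up values).
MODEL = {
    "cat":   [0.2, 0.8, 0.1, 0.3],
    "dog":   [0.3, 0.7, 0.2, 0.3],
    "sat":   [0.5, 0.1, 0.7, 0.2],
    "ran":   [0.6, 0.2, 0.6, 0.1],
    "stock": [0.9, 0.0, 0.1, 0.8],
    "fell":  [0.7, 0.1, 0.3, 0.9],
}

def doc_embedding(text, model):
    """Average the vectors of known tokens (assumed approach, not confirmed)."""
    vectors = [model[t] for t in text.lower().split() if t in model]
    if not vectors:
        return None  # no known tokens, so no document vector
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

doc_a = doc_embedding("the cat sat", MODEL)
doc_b = doc_embedding("the dog ran", MODEL)
doc_c = doc_embedding("the stock fell", MODEL)
print(round(cosine_similarity(doc_a, doc_b), 3))
print(round(cosine_similarity(doc_a, doc_c), 3))
```

With these toy vectors, the two sentences about animals score higher against each other than either does against the sentence about stocks.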

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>3. Cleanup</b>

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code drops the tables created above and closes the connection to Vantage.</p>

In [None]:
db_drop_table("word_embed_model")

In [None]:
db_drop_table("word_embed_input_table1")

In [None]:
db_drop_table("word_embed_input_table2")

In [None]:
remove_context()

<hr style="height:1px;border:none;">
<p style = 'font-size:16px;font-family:Arial'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>WordEmbeddings function reference: <a href = 'https://docs.teradata.com/search/all?query=WordEmbeddings&content-lang=en-US'>here</a></li>
</ul>

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>