<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       KMeans and KMeansPredict Functions in Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial'>The KMeans() function groups a set of observations into k clusters, where each observation belongs to the cluster with the nearest mean (the cluster center, or centroid). KMeansPredict() uses the model produced by KMeans() to assign new or unseen data points to the nearest cluster. This notebook shows how to use the KMeans and KMeansPredict functions available in Vantage.</p>
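<p style = 'font-size:16px;font-family:Arial'>To make the idea of "nearest mean" concrete, the following is a minimal pure-Python sketch of the classic k-means loop (Lloyd's algorithm). It is an illustration only and is not the Vantage implementation: the function name <b>kmeans_sketch</b>, the toy data, the use of Euclidean distance, and the stopping rule (stop when the centroids no longer change) are all assumptions made for this example.</p>

```python
import math

def kmeans_sketch(points, initial_centroids, max_iter=10):
    """Illustrative k-means: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of the points assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignments are stable
        centroids = new_centroids
    return centroids, assignments

# Made-up two-dimensional rows forming two well-separated groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, assignments = kmeans_sketch(points, initial_centroids=[points[0], points[3]])
print(centroids)
print(assignments)
```

<p style = 'font-size:16px;font-family:Arial'>In this sketch the starting centroids are passed in explicitly, so the result is deterministic for the toy data. Other ways of choosing starting centroids can lead to different clusterings.</p>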

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>1. Initiate a connection to Vantage</b>

<p style = 'font-size:16px;font-family:Arial'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>

In [None]:
from teradataml import *

# Modify the following to match the specific client environment settings
display.max_rows = 5

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>1.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=PP_KMeans_KMeansPredict_Python.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial'>Run each of the following steps by pressing Shift + Enter.</p>

<hr style='height:1px;border:none;'>

<p style = 'font-size:18px;font-family:Arial'><b>1.2 Getting Data for This Demo</b></p>

<p style = 'font-size:16px;font-family:Arial'>Here, we load an example data set that ships with the teradataml library and use it to demonstrate the functions.</p>

In [None]:
load_example_data("kmeans", "computers_train1")

<p style = 'font-size:16px;font-family:Arial'>The next step is optional: run it if you want to see the status of the databases and tables created and the space used.</p>

In [None]:
%run -i ../../UseCases/run_procedure.py "call space_report();"        # Takes 10 seconds

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>2. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial'>Create a "Virtual DataFrame" that points to the data set in Vantage. Check the shape of the DataFrame and the data types of its columns.</p>

In [None]:
# Create teradataml DataFrame objects.
computers_train = DataFrame.from_table("computers_train1")

In [None]:
computers_train

In [None]:
computers_train.shape

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>2.1 KMeans Function</b></p>

<p style = 'font-size:16px;font-family:Arial'>We want to divide our data into two clusters, so we use the KMeans function with num_clusters set to 2.<br>
Detailed help can be found by passing the function name to the built-in help function.</p>

In [None]:
help(KMeans)

In [None]:
KMeans_out = KMeans(id_column="id",
                     target_columns=['price', 'speed'],
                     data=computers_train,
                     num_clusters=2)
# Print the result DataFrame
KMeans_out.result

<p style = 'font-size:16px;font-family:Arial'>We can also specify initial centroid information instead of number of clusters.</p>

In [None]:
kmeans_initial_centroids_table = computers_train.loc[[19, 97]]

In [None]:
kmeans_initial_centroids_table

In [None]:
KMeans_out_1 = KMeans(id_column="id",
                       target_columns=['price', 'speed'],
                       data=computers_train,
                       centroids_data=kmeans_initial_centroids_table)
 
# Print the result DataFrame.
KMeans_out_1.result

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'><b>2.2 KMeansPredict Function</b></p>

<p style = 'font-size:16px;font-family:Arial'>The KMeansPredict() function uses the model generated by the KMeans() function to assign each input data point to its nearest cluster centroid.<br> Detailed help can be found by passing the function name to the built-in help function.</p>


In [None]:
help(KMeansPredict)

In [None]:
KMeansPredict_out = KMeansPredict(data=computers_train,
                                  object=KMeans_out.result)

# Print the result DataFrame.
KMeansPredict_out.result

In [None]:
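# accumulate copies the listed input columns to the output;
# output_distance adds the distance from each point to its assigned centroid.
KMeansPredict_out_1 = KMeansPredict(data=computers_train,
                                    object=KMeans_out_1,
                                    accumulate=["ram", "price", "speed"],
                                    output_distance=True)

# Print the result DataFrame.
KMeansPredict_out_1.result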
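<p style = 'font-size:16px;font-family:Arial'>The idea behind the prediction step can be sketched in a few lines of plain Python: each row is compared with every centroid and labeled with the nearest one, optionally reporting the distance as well. This is a hedged illustration rather than the Vantage implementation. The helper name <b>predict_cluster</b>, the centroid values, the sample rows, and the use of Euclidean distance are assumptions made for this example.</p>

```python
import math

def predict_cluster(row, centroids):
    """Return (cluster_id, distance) for the centroid nearest to `row`."""
    # Euclidean distance from the row to every centroid (an assumption of this sketch).
    distances = [math.dist(row, c) for c in centroids]
    # The predicted cluster is the index of the smallest distance.
    cluster_id = min(range(len(centroids)), key=distances.__getitem__)
    return cluster_id, distances[cluster_id]

# Hypothetical centroids, standing in for a fitted model, and made-up rows to score.
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_rows = [(2.0, 1.0), (9.0, 6.0), (4.0, 5.0)]

predictions = [predict_cluster(row, centroids) for row in new_rows]
for row, (cluster_id, distance) in zip(new_rows, predictions):
    print(row, cluster_id, round(distance, 3))
```

<p style = 'font-size:16px;font-family:Arial'>Returning the distance alongside the cluster id is useful for spotting rows that sit far from every centroid, which is the kind of information a distance column in the prediction output provides.</p>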

<hr style="height:2px;border:none;">
<b style = 'font-size:20px;font-family:Arial'>3. Cleanup</b>

<hr style="height:1px;border:none;">
<p style = 'font-size:18px;font-family:Arial'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial'>The following code drops the table created above and then closes the connection to Vantage.</p>

In [None]:
db_drop_table("computers_train1")

In [None]:
remove_context()

<hr style="height:1px;border:none;">

<p style = 'font-size:16px;font-family:Arial'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>KMeans function reference: <a href = 'https://docs.teradata.com/search/all?query=kmeans&content-lang=en-US'>here</a></li>
    <li>KMeansPredict function reference: <a href = 'https://docs.teradata.com/search/all?query=kmeanspredict&content-lang=en-US'>here</a></li>
</ul>

<footer style="padding-bottom:35px; border-bottom:3px solid #91A0Ab">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>