<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       KNN (K Nearest Neighbor) function in Vantage
  <br>
       <img id="teradata-logo" src="https://storage.googleapis.com/clearscape_analytics_demo_data/DEMO_Logo/teradata.svg" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>

<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Introduction</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>K-nearest Neighbors (k-NN) is a supervised learning technique that predicts an outcome for each test row by finding its nearest neighbors in the training data under a similarity (distance) metric. The algorithm does not construct a model from the training set; instead, it predicts each test row directly from its similarity to the training rows.<br>In this notebook, we will see how to use the KNN function available in Vantage.</p>
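<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the idea concrete, here is a minimal pure-Python sketch of k-NN classification by majority vote. It is an illustration only and does not reflect how Vantage implements the function; the helper name <code>knn_predict</code> and the toy data are hypothetical.</p>

```python
import math
from collections import Counter

def knn_predict(train_rows, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row.
    distances = [(math.dist(row, query), label)
                 for row, label in zip(train_rows, train_labels)]
    # Keep the k closest rows.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two features per row, two classes.
train_rows = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]

prediction_near_a = knn_predict(train_rows, train_labels, (1.1, 1.0), k=3)
prediction_near_b = knn_predict(train_rows, train_labels, (5.1, 5.0), k=3)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Note that no model is fitted: every prediction scans the training rows directly, which is the behaviour described above.</p>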

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>1. Initiate a connection to Vantage</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>In this section, we import the required libraries and set environment variables and environment paths (if required).</p>

In [None]:
from teradataml import *

# Modify the following to match the specific client environment settings
display.max_rows = 5

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.1 Connect to Vantage</b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>You will be prompted to provide the password. Enter your password, press the Enter key, and then use the down arrow to go to the next cell.</p>

In [None]:
%run -i ../../UseCases/startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)

In [None]:
%%capture
execute_sql('''SET query_band='DEMO=PP_KNN_Python.ipynb;' UPDATE FOR SESSION; ''')

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Begin running steps with Shift + Enter keys. </p>

<hr style='height:1px;border:none;background-color:#00233C;'>

<p style = 'font-size:18px;font-family:Arial;color:#00233c'><b>1.2 Getting Data for This Demo</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Here, we load the example computers data set that is available in the teradataml library and use it to demonstrate the function.</p>

In [None]:
load_example_data("knn", ["computers_train1_clustered", "computers_test1"])

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>2. Data Exploration</b>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Create a "Virtual DataFrame" that points to the data set in Vantage. Check the shape of the DataFrame and count the rows in each computer category.</p>

In [None]:
# Create teradataml DataFrame objects.
computers_test = DataFrame.from_table("computers_test1")
computers_train = DataFrame.from_table("computers_train1_clustered")

In [None]:
computers_train

In [None]:
computers_train.shape

In [None]:
computers_train.select(['id','computer_category']).groupby('computer_category').count()

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Detailed help can be found by passing the function name to the built-in help function.</p>

In [None]:
help(KNN)

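<p style = 'font-size:16px;font-family:Arial;color:#00233C'>From our data, let us predict whether each test computer belongs to the "special" category. As a first step, we one-hot encode computer_category so that "ultra" and "special" each get an indicator column and all remaining values fall into an "other" column.</p>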

In [None]:
# Generate fit object for column "computer_category".
fit_obj = OneHotEncodingFit(data=computers_train,
                            is_input_dense=True,
                            target_column="computer_category",
                            categorical_values=["ultra", "special"],
                            other_column="other")

# Encode "ultra" and "special" values of column "computer_category".
computers_train_encoded = OneHotEncodingTransform(data=computers_train,
                                                  object=fit_obj.result,
                                                  is_input_dense=True)
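<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a rough picture of what this encoding produces, the sketch below one-hot encodes a few values in plain Python. The column name <code>computer_category_special</code> is the one used later in this notebook; the other column names, the helper <code>one_hot_encode</code>, and the sample value "classic" are assumptions made for illustration.</p>

```python
def one_hot_encode(values, categories, prefix, other="other"):
    """Return one dict of 0/1 indicator columns per input value."""
    encoded = []
    for value in values:
        # One indicator column per listed category.
        row = {f"{prefix}_{category}": int(value == category) for category in categories}
        # Any value outside `categories` is flagged in the catch-all column.
        row[f"{prefix}_{other}"] = int(value not in categories)
        encoded.append(row)
    return encoded

# "classic" is a hypothetical value that is not in the category list.
rows = one_hot_encode(["special", "ultra", "classic"],
                      categories=["ultra", "special"],
                      prefix="computer_category")
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Each row has exactly one indicator set to 1, so a single column such as <code>computer_category_special</code> can serve as a binary response.</p>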

In [None]:
KNN_out = KNN(train_data=computers_train_encoded.result,
              test_data=computers_test,
              k=50,
              response_column="computer_category_special",
              id_column="id",
              output_prob=False,
              input_columns=["price", "speed", "hd", "ram", "screen"],
              voting_weight=1.0,
              emit_distances=False)

# Print the result DataFrame.
KNN_out.result
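<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The call above sets <code>voting_weight</code> to 1.0. A common way to weight votes by distance is to give each neighbor a weight of 1 / distance<sup>voting_weight</sup>; the sketch below assumes that formula, so please confirm the exact definition in the output of <code>help(KNN)</code>. The helper name and toy data are hypothetical.</p>

```python
import math
from collections import defaultdict

def weighted_knn_predict(train_rows, train_labels, query, k=3, voting_weight=1.0):
    """Distance-weighted k-NN vote; assumes weight = 1 / distance ** voting_weight."""
    nearest = sorted(
        ((math.dist(row, query), label) for row, label in zip(train_rows, train_labels)),
        key=lambda pair: pair[0],
    )[:k]
    totals = defaultdict(float)
    for distance, label in nearest:
        # Clamp the distance to avoid division by zero for exact matches.
        totals[label] += 1.0 / max(distance, 1e-12) ** voting_weight
    return max(totals, key=totals.get)

# Toy data (hypothetical): one close "x" row and two distant "y" rows.
train_rows = [(0.0,), (4.0,), (5.0,)]
train_labels = ["x", "y", "y"]

# With weighting, the single close neighbour outweighs the two distant ones.
weighted = weighted_knn_predict(train_rows, train_labels, (1.0,), k=3, voting_weight=1.0)
# An exponent of 0 gives every neighbour weight 1, i.e. a plain majority vote.
unweighted = weighted_knn_predict(train_rows, train_labels, (1.0,), k=3, voting_weight=0.0)
```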

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To find the distances to the nearest neighbours, we can use the code below.</p>

In [None]:
# Get the distance of 10 nearest neighbours based on "price", "speed" and "hd".
KNN_out_2 = KNN(train_data=computers_train_encoded.result,
                test_data=computers_test,
                k=10,
                model_type="neighbors",
                id_column="id",
                input_columns=["price", "speed", "hd"],
                emit_distances=True,
                emit_neighbors=True)

# Print the result DataFrame.
KNN_out_2.result
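<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Conceptually, a neighbors query returns the ids of the closest training rows together with their distances, without predicting anything. The plain-Python sketch below shows that idea with Euclidean distance; the helper <code>nearest_neighbors</code> and the toy data are hypothetical, and the layout of the Vantage output may differ.</p>

```python
import heapq
import math

def nearest_neighbors(train, query, k=2):
    """Return [(row_id, distance), ...] for the k training rows closest to `query`.

    `train` maps a row id to its feature tuple; results are ordered nearest first.
    """
    return heapq.nsmallest(
        k,
        ((row_id, math.dist(features, query)) for row_id, features in train.items()),
        key=lambda pair: pair[1],
    )

# Toy data (hypothetical): row id -> two numeric features.
train = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (6.0, 8.0)}
neighbors = nearest_neighbors(train, (0.0, 0.0), k=2)
```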

<hr style="height:2px;border:none;background-color:#00233C;">
<b style = 'font-size:20px;font-family:Arial;color:#00233C'>3. Cleanup</b>

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:18px;font-family:Arial;color:#00233C'> <b>Databases and Tables </b></p>
<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following code will clean up tables created above.</p>

In [None]:
db_drop_table("computers_test1")

In [None]:
db_drop_table("computers_train1_clustered")

In [None]:
remove_context()

<hr style="height:1px;border:none;background-color:#00233C;">
<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Links:</b></p>
<ul style = 'font-size:16px;font-family:Arial'>
    <li>Teradataml Python reference: <a href = 'https://docs.teradata.com/search/all?query=Python+Package+User+Guide&content-lang=en-US'>here</a></li>
    <li>KNN function reference: <a href = 'https://docs.teradata.com/search/all?query=KNN&content-lang=en-US'>here</a></li>
</ul>

<footer style="padding-bottom:35px; background:#f9f9f9; border-bottom:3px solid #00233C">
    <div style="float:left;margin-top:14px">ClearScape Analytics™</div>
    <div style="float:right;">
        <div style="float:left; margin-top:14px">
            Copyright © Teradata Corporation - 2025. All Rights Reserved
        </div>
    </div>
</footer>