### SCRIPT Table Operator: Sandbox And Interaction

Example use case:
* Clustering analysis of a set of observations by using a K-Means algorithm.
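
For readers new to K-Means, here is a minimal sketch of the basic algorithm (Lloyd's iteration) in plain Python. It is an illustrative assumption, not the contents of "ex2p.py", whose implementation and libraries are not shown in this notebook. The function name `kmeans` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Hypothetical helper: Lloyd's algorithm, alternating assignment and update steps."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return centroids, labels
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2)` separates the two pairs of points into different clusters.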

Example goals:
* Using the STO Sandbox environment to test Python scripts locally on the client before installing them on Vantage.
* Executing scripts with the SCRIPT Table Operator (STO) through the teradataml STO Script() API.

Files needed:
* "ex2p.py" : Python script that performs the analysis.
* "ex2dataTbl.csv" : Data file to create the Database table for the example.
* "ex2data.csv" : Data file without header row for script execution in the STO Sandbox.
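
The STO passes input rows to the script on standard input without column names, which is why the sandbox data file has no header row. If the two data files hold the same rows and columns (an assumption; compare your copies), you can produce a headerless file from the table file with a minimal sketch like this one. `strip_header` is a hypothetical helper, not part of teradataml.

```python
import csv
from pathlib import Path

def strip_header(src: Path, dst: Path) -> int:
    """Hypothetical helper: copy a CSV file without its first (header) row.

    Returns the number of data rows written.
    """
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        next(reader, None)  # skip the header row, if the file is not empty
        writer = csv.writer(fout, lineterminator="\n")
        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
    return count
```

For example, `strip_header(Path(path_to_files + "ex2dataTbl.csv"), Path(path_to_files + "ex2data_check.csv"))` writes a hypothetical comparison file next to the originals.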

Notes:
* The SCRIPT Table Operator (STO) **must be enabled** in your target Advanced SQL Engine, and the Teradata Packages for In-nodes Analytics must be installed on its nodes. Specifically, the required packages for Python are **teradata-python** and **teradata-python-addons**.
* This notebook utilizes several Python packages in addition to **teradataml** which you may need to install on your client.


Notebook workflow:
1. Set up the environment by loading the STO Sandbox Docker image from a local path on the client.
2. Test-run the user script inside the Docker container, reading input data from a file.
3. Install the script on Vantage.
4. Use teradataml's Script wrapper function to execute the script on the target Advanced SQL Engine.

#### Import Statements

In [None]:
import getpass
import pandas as pd
from teradataml import create_context, remove_context, DataFrame, copy_to_sql
from teradataml.options.display import display
from teradataml.options.configure import configure
from teradataml.table_operators.Script import Script
from teradataml.table_operators.sandbox_container_util import setup_sandbox_env, cleanup_sandbox_env
from teradatasqlalchemy import INTEGER, FLOAT

#### Create context

In [None]:
# Specify the Vantage system to connect to.
host = input("Host: ")
username = input("Username: ")
password = getpass.getpass()
# Specify the default database for this session, or delete the "database"
# argument to use the user's default database.
database = "xxxxx"
con = create_context(host = host, username = username, password = password, database = database)

In [None]:
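# Specify the folder on the client where the files needed for this example are kept.
# File names are appended to this path by string concatenation, so include the trailing path separator.
#
path_to_files = "xxxxx"
# Print the SQL that teradataml submits to the Advanced SQL Engine.
#
display.print_sqlmr_query = True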

#### Prepare the example: create the database table

In [None]:
# Create the database table for this example from the data file provided.
# Ensure that the path to the data file is correct before running this cell.
#
dataForTable = pd.read_csv(path_to_files + "ex2dataTbl.csv")
copy_to_sql(dataForTable, table_name="ex2tbl", if_exists="replace")

#### Necessary database set-up to run the SCRIPT Table Operator

In [None]:
# Set the session's SEARCHUIFDBPATH to the database that holds the input table
# and in which the script files will be installed. Replace XXXXXX with that database name.
#
con.execute("SET SESSION SEARCHUIFDBPATH = XXXXXX;")

#### Create teradataml DataFrame from Database table

In [None]:
ex2tbl = DataFrame.from_table("ex2tbl")
ex2tbl.to_pandas().head(n=5)

#### Create Script object

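* Ensure that "files_local_path" specifies the correct path to the folder on your client that contains the script.
* In the "script_command" argument, replace XXXXXX with the database name you set as SEARCHUIFDBPATH; installed files are referenced as ./<database>/<file name>.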
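
The script_command string follows a fixed pattern: interpreter, then the installed file as ./<database>/<file name>, then any script arguments. A small sketch that builds it, so that the database name is typed once; `build_script_command` and the example database name `mydb` are hypothetical.

```python
def build_script_command(database: str, script_name: str, *args: str,
                         interpreter: str = "python3") -> str:
    """Hypothetical helper: build an STO script_command for a file installed in `database`.

    Assumes SEARCHUIFDBPATH is set to `database`, so the installed file is
    reachable as ./<database>/<script_name> from the script's working directory.
    """
    if not database or any(ch in database for ch in "/ "):
        raise ValueError("database must be a plain database name")
    return " ".join([interpreter, f"./{database}/{script_name}", *args])
```

For example, `build_script_command("mydb", "ex2p.py", "7")` yields the same form as the command in the cell that follows, with `mydb` in place of XXXXXX.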

In [None]:
sto = Script(data = ex2tbl,
             script_name = "ex2p.py",
             files_local_path = path_to_files, 
             script_command = "python3 ./XXXXXX/ex2p.py 7",
             data_partition_column = "ObsGroup",
             data_order_column = "ObsID",
             delimiter = ',',
             returns = { "oc1": INTEGER(), "oc2": INTEGER(), "oc3": INTEGER(), "oc4": FLOAT(),
                         "oc5": FLOAT(), "oc6": FLOAT(), "oc7": FLOAT(), "oc8": FLOAT() }
            )
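
With `delimiter = ','` and eight `returns` columns, each line the script writes should hold eight comma-separated values that convert to the declared types, in the order of the `returns` dictionary. A minimal sketch for checking output lines captured locally (for example, from the sandbox) is shown here. It assumes the script writes output with the same delimiter, and it maps `INTEGER` to Python `int` and `FLOAT` to `float` purely for local checking, since Vantage does its own type conversion. `check_output` and `SCHEMA` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical local stand-ins for the SQL types declared in `returns`.
SCHEMA: Dict[str, Callable[[str], object]] = {
    "oc1": int, "oc2": int, "oc3": int,
    "oc4": float, "oc5": float, "oc6": float, "oc7": float, "oc8": float,
}

def check_output(lines: Sequence[str],
                 schema: Dict[str, Callable[[str], object]] = SCHEMA,
                 delimiter: str = ",") -> List[Dict[str, object]]:
    """Parse script output lines; raise ValueError on a wrong field count or type."""
    rows = []
    for n, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        if len(fields) != len(schema):
            raise ValueError(f"line {n}: expected {len(schema)} fields, got {len(fields)}")
        row = {}
        for (name, convert), value in zip(schema.items(), fields):
            try:
                row[name] = convert(value)
            except ValueError:
                raise ValueError(f"line {n}: column {name} cannot hold {value!r}") from None
        rows.append(row)
    return rows
```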

#### Setup STO Sandbox environment by loading image from specified location

In [None]:
# When the sandbox is specified by image location, "sandbox_image_name" must also be
# specified, and it must currently be "stosandbox:1.0". Loading the image can take a few minutes.
# Current images are available at downloads.teradata.com.
#
setup_sandbox_env(sandbox_image_location = path_to_files + "sto_sandbox_Python3.7.7_sles12sp3.0.5.4_docker_image.1.0.0.tar.gz",
                  sandbox_image_name = "stosandbox:1.0")

In [None]:
# Show the sandbox container ID currently recorded in teradataml's configuration.
configure.sandbox_container_id

#### Run user script in the Sandbox

The script reads its input data from a file. When you call test_script(), teradataml does the following:
1. It starts a container from the Docker image loaded in the previous step.
2. It copies the input_data_file (specified in the test_script() call) and the user script (specified by files_local_path and script_name when creating the Script object) to the container.
3. It executes the user script inside the container, passing the script_args value of test_script() as the script's command-line arguments.
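
The same stdin/stdout contract can be exercised without Docker by piping the headerless data file into the script with a local Python interpreter. This is a rough stand-in for the sandbox, not what test_script() does internally: it uses the client's interpreter and installed packages rather than those in the image. `run_like_sto` is a hypothetical helper.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

def run_like_sto(script: Path, input_file: Path, args: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: run `script` with `input_file` on stdin; return its stdout lines.

    Mimics only the STO stdin/stdout contract; no Docker and no Vantage are involved.
    Raises subprocess.CalledProcessError (with captured stderr) if the script fails.
    """
    with open(input_file, "rb") as stdin:
        result = subprocess.run(
            [sys.executable, str(script), *args],
            stdin=stdin,
            capture_output=True,
            check=True,
        )
    return result.stdout.decode().splitlines()
```

For example, `run_like_sto(Path(path_to_files + "ex2p.py"), Path(path_to_files + "ex2data.csv"), ["7"])` runs the example script locally, provided its package dependencies are installed on the client.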

In [None]:
# Ensure that the path to input_data_file is correct before running this cell.
# The data file for the STO Sandbox has no header row, which is what the Python script expects.
#
testOut = sto.test_script(input_data_file = path_to_files + "ex2data.csv", 
                          script_args='7'
                         )

In [None]:
testOut.head(n = 5)

#### Clean-up Sandbox environment

In [None]:
cleanup_sandbox_env()

#### Install the script file on Vantage

In [None]:
# If an older version of the file was installed previously, remove it first so that it can be replaced.
# If the file does not exist in the database, the following statement raises an error,
# which is expected on a first run.
#
sto.remove_file(file_identifier='ex2p', force_remove=True)

In [None]:
sto.install_file(file_identifier='ex2p', file_name='ex2p.py', is_binary=False)

#### Run user script via Script Table Operator

In [None]:
sto.execute_script()

#### Cleanup

In [None]:
sto.remove_file(file_identifier='ex2p', force_remove=True)

In [None]:
remove_context()