<header>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Accelerate the development and deployment of real-world AI-powered analytics with Teradata VantageCloud and open-source language models
  <br>
       <img id="teradata-logo" src="../../../images/TeradataLogo.png" alt="Teradata" style="width: 125px; height: auto; margin-top: 20pt;">
    </p>
</header>
<hr>

<b style = 'font-size:28px;font-family:Arial;color:#00233C'>Demonstrations Overview</b>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following demonstrations illustrate the end-to-end process of how organizations can utilize VantageCloud Lake <b>GPU-enabled Analytic Cluster</b> architecture to run open-source large language models at massive parallelism and scale, and then leverage these next-generation capabilities in ad-hoc analytics, development, and operational processing.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>This notebook contains several high-level demonstrations that can be run together or individually. They are designed to illustrate:</p>
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Container Management</b>.  Administrators can create and manage <b>secure, custom</b> runtime containers that host any number of models and model artifacts to unlock GPU-augmented analytics.</li>
    <li><b>Use-case Development</b>. Developers and data scientists can use familiar tools and techniques to develop and test analytic processing that combines traditional analytics and data processing with open-source Hugging Face models <b>at scale</b>.  The use case illustrated here is a simple <b>Vector Embedding</b> of retail customer comments.</li>
    <li><b>Operationalization</b>. Deploy the AI-augmented pipeline <b>operationally</b>, making these advanced techniques available to the broadest set of tools, applications, and consumers.</li>
    </ol>

<b style = 'font-size:28px;font-family:Arial;color:#00233C'>End-to-End workflow</b>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The high level process is as follows:</p>

<table style = 'width:100%;table-layout:fixed;font-family:Arial;color:#00233C'>
    <tr><td style = 'vertical-align:top' width = '40%'>
            <ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
                <li>The Data Scientist conducts analytics activities using his or her own Python tools and packages of choice, then connects to VantageCloud Lake through the teradataml client library and the teradatasql Python driver.</li>
                <br>      
                <li>teradataml provides APIs to create and manage custom runtime environments, including custom libraries, dependencies, model artifacts, and scoring scripts.  The user can leverage these APIs to create one or many custom, dedicated environments to host their code.</li>
                <br>
                <li>The Data Scientist then executes a pipeline that:
                    <ul style = 'width:100%;table-layout:fixed;font-family:Arial;color:#00233C'><li>Calls ClearScape Analytics functions on Compute Clusters (data prep, transformation, etc.)</li>
                        <li>Passes the prepared data to the Python container, which runs in parallel on the cluster nodes</li>
                        <li>Returns results (inferences/predictions) as "virtual" DataFrames, where the data remains <b>in Vantage</b></li>
                    </ul></li>
                <br>
                <li>The workflow can be operationalized using SQL, and results can be returned to common BI tools, persisted as part of an ETL process, or embedded in application code.</li>
            </ol>
        </td><td><img src = 'images/BYOLLM_Overview.png' width = '600'></td></tr>
</table>
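<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The parallel step in item 3 can be pictured with a small, purely local analogy: rows are split into partitions by hashing a key column, one worker (standing in for a cluster node) applies the same user function to each partition, and the partial results are gathered. This sketch uses a standard-library thread pool; the names <code>partition_rows</code> and <code>parallel_apply</code> are hypothetical and do not describe how teradataml or Vantage actually distribute data.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def partition_rows(rows: List[Dict], n_workers: int, key: str) -> List[List[Dict]]:
    """Assign each row to a partition by hashing its key column (hypothetical helper)."""
    partitions: List[List[Dict]] = [[] for _ in range(n_workers)]
    for row in rows:
        partitions[hash(row[key]) % n_workers].append(row)
    return partitions


def parallel_apply(rows: List[Dict], func: Callable[[Dict], Dict],
                   n_workers: int = 4, key: str = "comment_id") -> List[Dict]:
    """Apply func to every row, one partition per worker, then gather the results."""
    partitions = partition_rows(rows, n_workers, key)

    def run_partition(part: List[Dict]) -> List[Dict]:
        return [func(row) for row in part]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Flatten per-partition results; row order across partitions is not preserved
        return [row for part in pool.map(run_partition, partitions) for row in part]


if __name__ == "__main__":
    data = [{"comment_id": i, "comment_text": f"comment {i}"} for i in range(10)]
    out = parallel_apply(data, lambda r: {**r, "length": len(r["comment_text"])})
    print(sorted(r["comment_id"] for r in out))
```

As on a real cluster, the caller should not rely on output order; sort or join on a key column afterwards.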
<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Python Package Installation</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>If necessary, install the client packages required for the demonstrations.  Users may need to restart the Jupyter kernel after installation.</p> 

In [None]:
%pip install -r requirements.txt

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Python Package Imports</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Execute this cell to import the packages required for Teradata automation, along with machine learning, analytics, utility, and data management packages.</p> 

In [2]:
from teradataml import *
from oaf_utils import *
from teradatasqlalchemy.types import *
from time import sleep
import pandas as pd
import csv, json, sys, os, warnings
from os.path import expanduser
from collections import OrderedDict

from IPython.display import clear_output , display as ipydisplay
import matplotlib.pyplot as plt
from itables import init_notebook_mode
import itables.options as opt

# Set display options for dataframes, plots, and warnings

opt.style="table-layout:auto;width:auto;float:left"
opt.columnDefs = [{"className": "dt-left", "targets": "_all"}]
init_notebook_mode(all_interactive=True)
%matplotlib inline
warnings.filterwarnings('ignore')
display.suppress_vantage_runtime_warnings = True

# load vars json
with open('vars_gpu.json', 'r') as f:
    session_vars = json.load(f)

# Database login information
host = session_vars['environment']['host']
username = session_vars['hierarchy']['users']['business_users'][1]['username']
password = session_vars['hierarchy']['users']['business_users'][1]['password']

# UES Authentication information
ues_url = session_vars['environment']['UES_URI']
configure.ues_url = ues_url
pat_token = session_vars['hierarchy']['users']['business_users'][1]['pat_token']
pem_file = session_vars['hierarchy']['users']['business_users'][1]['key_file']


compute_group = session_vars['hierarchy']['users']['business_users'][1]['compute_group']


# get the current Python version so the custom container can be created to match it
python_version = str(sys.version_info[0]) + '.' + str(sys.version_info[1])
print(f'Using Python version {python_version} for user environment')


# Hugging Face model for the demo
model_name = 'sentence-transformers/all-MiniLM-L6-v2'

# a list of required packages to install in the custom OAF container
# modify this if using different models or design patterns
pkgs = ['transformers',
        'torch',
        'sentencepiece',
        'pandas',
        'sentence-transformers',
        'dill']

# container name - set here for easier notebook navigation
### The user will be prompted to confirm or change it below ###
oaf_name = 'oaf_demo_3'
###########################

Using Python version 3.8 for user environment
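<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The cell above reads its settings from <code>vars_gpu.json</code> by position (<code>business_users[1]</code>), so a missing key surfaces as a bare <code>KeyError</code>. The file itself is not shown in this notebook; the sketch below assumes a layout inferred only from the keys the cell reads, and the JSON content and the helper name <code>load_user_settings</code> are hypothetical. It reports missing entries with a clearer message.</p>

```python
import json
from typing import Any, Dict

# Hypothetical vars_gpu.json content, inferred only from the keys the notebook reads
EXAMPLE_VARS = """
{
  "environment": {"host": "example-host", "UES_URI": "https://example.invalid/ues"},
  "hierarchy": {"users": {"business_users": [
    {"username": "admin_user"},
    {"username": "data_scientist", "password": "***", "pat_token": "***",
     "key_file": "key.pem", "compute_group": "gpu_group"}
  ]}}
}
"""

REQUIRED_USER_KEYS = ("username", "password", "pat_token", "key_file", "compute_group")


def load_user_settings(raw: str, user_index: int = 1) -> Dict[str, Any]:
    """Extract host, UES URL, and one business user's settings from the JSON text."""
    session_vars = json.loads(raw)
    try:
        user = session_vars["hierarchy"]["users"]["business_users"][user_index]
        settings = {
            "host": session_vars["environment"]["host"],
            "ues_url": session_vars["environment"]["UES_URI"],
        }
    except (KeyError, IndexError) as exc:
        raise ValueError(f"vars file is missing an expected entry: {exc!r}") from exc
    missing = [k for k in REQUIRED_USER_KEYS if k not in user]
    if missing:
        raise ValueError(f"business user {user_index} is missing keys: {missing}")
    settings.update({k: user[k] for k in REQUIRED_USER_KEYS})
    return settings
```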


<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Connect to the database</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>After connecting, check the cluster status and start the cluster if necessary. Note that the cluster only needs to be running to execute the APPLY sections of the demo.</p> 

In [3]:
# check for existing connection
eng = check_and_connect(host=host, username=username, password=password, compute_group = compute_group)
print(eng)

# check cluster status
res = check_cluster_start(compute_group = compute_group)

Engine(teradatasql://data_scientist:***@50.112.44.50)


ComputeProfileName,InstanceName,ComputeGroupName,ComputeMapName,ComputeInstanceType,CurrentState,LastReqState,LastStartTime,LastEndTime


GPU Cluster Available


<hr>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Demo 1 - Container Management
  <br>
    </p>



<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The Teradata Vantage Python Client Library provides simple, powerful methods for creating and maintaining custom Python runtime environments <b>in the VantageCloud environment</b>.  This gives practitioners complete control over the behavior and quality of their models and the accuracy of the analytics running on the Analytic Cluster.</p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Custom environments are persistent.</b> Users only need to create them once; after that, they can be saved, updated, or modified as needed.</p>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Container Management Process</b></p>
<table style = 'width:100%;table-layout:fixed;'>
    <tr>
        <td style = 'vertical-align:top' width = '40%'>
            <ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
                <li>Create a unique User Environment based on available base images</li>
                <br>
                <li>Install libraries</li>
                <br>
                <li>Install models and additional user artifacts</li>
            </ol>
        </td>
        <td><img src = 'images/OAF_Env.png' width = '600'></td>
    </tr>
</table>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>1.  Connect to the Environment Service</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To better support integration with Cloud Services and common automation tools, the <b>User Environment Service</b> is accessed via RESTful APIs.  These APIs can be called directly or, as in the examples below, through methods in the Teradata Package for Python (teradataml).</p> 

In [4]:
# check to see if there is a valid UES auth
# if not, authenticate
try:
    demo_env = get_env(oaf_name)
    print('Existing valid UES token')

except Exception as e:
    if '''NoneType' object has no attribute 'value''' in str(e):
        if set_auth_token(ues_url = ues_url, username = username, pat_token = pat_token, pem_file = pem_file):
            print('UES Authentication successful')
        else:
            print('UES Authentication failed, check URL and account info')
        pass
    else:
        raise
    

UES Authentication successful
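<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The connection cells detect a missing or expired User Environment Service (UES) session by matching a fragment of the exception text, then authenticate and retry. The same control flow can be factored into a small wrapper, sketched below with stand-in callables. <code>with_reauth</code>, <code>fetch</code>, and <code>authenticate</code> are hypothetical names, and the matched fragment is simply the one this notebook checks for, not a documented teradataml error contract.</p>

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Message fragment this notebook treats as "UES authentication missing or expired"
AUTH_ERROR_FRAGMENT = "object has no attribute 'value'"


def with_reauth(fetch: Callable[[], T], authenticate: Callable[[], bool]) -> T:
    """Call fetch(); on the auth-related error, authenticate and retry exactly once."""
    try:
        return fetch()
    except Exception as exc:  # broad catch, then inspect the text, as the notebook does
        if AUTH_ERROR_FRAGMENT not in str(exc):
            raise
        if not authenticate():
            raise RuntimeError("UES authentication failed, check URL and account info") from exc
        return fetch()  # a second failure propagates to the caller
```

In the notebook's terms, <code>fetch</code> would be something like <code>lambda: get_env(oaf_name)</code> and <code>authenticate</code> a call to <code>set_auth_token(...)</code>.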


<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.  Create a Custom Container in Vantage</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>If desired, the user can create a <b>new</b> custom environment by starting with a "base" image and customizing it.  The steps are:</p> 
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>List the available "base" images the system supports</li>
    <li>List any existing "custom" environments the user has created</li>
    <li>If there are no custom environments, then create a new one from a base image</li>
    </ul>

In [5]:
# Create a new environment, or connect to an existing one


try:
    ipydisplay(list_user_envs())
except Exception as e:
    if 'No user environments found' in str(e):
        print('No user environments found')
        pass
    else:
        raise

print('Use an existing environment, or create a new one:')
print(f'OAF Environment is set to {oaf_name}.')
print('Enter to accept, or input a new value.')
print('If the environment is not in the list, a new one will be created')
i = input()
if len(i) != 0:
    oaf_name = i
    print(f'OAF Environment is now {oaf_name}')

try:
    demo_env = create_env(env_name = oaf_name,
                      base_env = f'python_{python_version}',
                      desc = 'OAF Demo env for LLM')
except Exception as e:
    if 'same name already exists' in str(e):
        print('Environment already exists, obtaining a reference to it')
        demo_env = get_env(oaf_name)
        pass
    elif 'Invalid value for base environment name' in str(e):
        print('Unsupported base environment version, using defaults')
        demo_env = create_env(env_name = oaf_name,
                      desc = 'OAF Demo env for LLM')
    else:
        raise

# Note create_env seems to be asynchronous - sleep a bit for it to register
sleep(5)

try:
    ipydisplay(list_user_envs())
except Exception as e:
    if 'No user environments found' in str(e):
        print('No user environments found')
        pass
    else:
        raise

env_name,env_description,base_env_name,language,conda


Use an existing environment, or create a new one:
OAF Environment is set to oaf_demo_3.
Enter to accept, or input a new value.
If the environment is not in the list, a new one will be created


 


Environment already exists, obtaining a reference to it


env_name,env_description,base_env_name,language,conda
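<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The cell above follows a get-or-create pattern: call <code>create_env()</code>, and if the service reports that an environment with the same name already exists, fetch that environment instead. The sketch below shows the same control flow against a hypothetical in-memory registry; <code>InMemoryRegistry</code>, <code>EnvExistsError</code>, and <code>get_or_create</code> are stand-ins, not teradataml APIs. Here a dedicated exception type replaces message matching; with the real service, matching on the message text may remain necessary if no specific exception class is exposed.</p>

```python
from typing import Dict


class EnvExistsError(Exception):
    """Raised when an environment with the same name already exists."""


class InMemoryRegistry:
    """Hypothetical stand-in for a remote environment service."""

    def __init__(self) -> None:
        self._envs: Dict[str, dict] = {}

    def create(self, name: str, base: str) -> dict:
        if name in self._envs:
            raise EnvExistsError(f"An environment with the same name already exists: {name}")
        self._envs[name] = {"name": name, "base": base}
        return self._envs[name]

    def get(self, name: str) -> dict:
        return self._envs[name]


def get_or_create(registry: InMemoryRegistry, name: str, base: str) -> dict:
    """Create the environment, or return the existing one if the name is taken."""
    try:
        return registry.create(name, base)
    except EnvExistsError:
        return registry.get(name)
```

Note that the existing environment keeps its original base image; a second call with a different base does not rebuild it.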


<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>3.  Install Dependencies</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The second step in the customization process is to install Python package dependencies. This demonstration uses the Hugging Face <a href = 'https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2'>all-MiniLM-L6-v2</a> Sentence Transformer.  Because VantageCloud Lake Analytic Clusters are secured by default and do not have open access to the external network, the user loads the required libraries and model using teradataml methods:
</p> 

<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>List the currently installed models and python libraries</li>
    <li><b>If necessary</b>, install any required packages</li>
    <li><b>If necessary</b>, install the pre-trained model.  This process takes several steps:
        <ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
            <li>Import and download the model</li>
            <li>Create a zip archive of the model artifacts</li>
            <li>Call the install_model() method to load the model to the container</li>
        </ol></li>
    </ul>

In [6]:
ipydisplay(demo_env.models)

# just showing a sample here - remove .head(5) to see them all
ipydisplay(demo_env.libs.head(5))

Model,Size,Timestamp


name,version


<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>A note on package versions</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The next demonstration uses the DataFrame apply() method, which automatically passes the Python code to the Analytic Cluster.  As such, the Python package versions on the client and in the container need to match.  dill and pandas are required, as are any additional libraries for the use case.
</p> 

<p style = 'font-size:16px;font-family:Arial;color:#00233C'><b>Note:</b> while this is not required for many OAF use cases, in this demo the packages required for model execution must first be installed in the local environment.</p>

In [7]:
# Install any Python add-ons needed by the script in the user environment.
# The option asynchronous=True runs the installation asynchronously.
# Note: avoid asynchronous installation when batch-executing all notebook statements,
#       as execution will continue even if the installation is not complete.
#
# Packages can be installed by passing a list of packages/versions
#   or
# by using a requirements.txt file.

# For this demo,
# this code block collects the current user's package versions
# for installation into the container.
# When using DataFrame.apply(), pandas and dill are required;
# to reduce issues, match the versions between the client and the container.

import importlib

# import each package inside the function namespace and record its version
def get_versions(pkgs):
    local_v_pkgs = []
    for p in pkgs:
        # module names use underscores where package names use hyphens
        module = importlib.import_module(p.replace('-', '_'))
        local_v_pkgs.append(f'{p}=={module.__version__}')
    return local_v_pkgs

v_pkgs = get_versions(pkgs)



# check to see if these packages need to be installed
# by comparing the len of the intersection of the list of required packages with the installed ones
if not len(set([x.split('==')[0] for x in pkgs]).intersection(demo_env.libs['name'].to_list())) == len(pkgs):
    
    # pass the list of packages - split off any extra info from the version property e.g., plus sign
    claim_id = demo_env.install_lib([x.split('+')[0] for x in v_pkgs], asynchronous=True)
else:
    print(f'All required packages are installed in the {oaf_name} environment')

All required packages are installed in the oaf_demo_3 environment
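<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The cell above compares package names with <code>demo_env.libs</code> as-is and strips local version labels such as <code>+cu118</code> (common in PyTorch builds) before pinning. A slightly more defensive variant normalizes names as described in PEP 503 (lowercase, with runs of <code>-</code>, <code>_</code>, and <code>.</code> collapsed to <code>-</code>), so that <code>sentence_transformers</code> and <code>sentence-transformers</code> count as the same package. The function names below are hypothetical, and the installed list is passed in as plain strings rather than read from a teradataml object.</p>

```python
import re
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """PEP 503 project-name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def strip_local_version(version: str) -> str:
    """Drop a PEP 440 local version label, e.g. '2.0.1+cu118' -> '2.0.1'."""
    return version.split("+", 1)[0]


def missing_pins(local_versions: Dict[str, str], installed: Iterable[str]) -> List[str]:
    """Return 'name==version' pins for required packages absent from the container."""
    installed_norm = {normalize(n) for n in installed}
    return [
        f"{name}=={strip_local_version(ver)}"
        for name, ver in local_versions.items()
        if normalize(name) not in installed_norm
    ]
```

The version strings in a call such as <code>missing_pins({'torch': '2.0.1+cu118'}, [])</code> are illustrative only; in the notebook they would come from the client environment.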


<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Monitor library installation status</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Optionally, users can monitor the library installation status using the cell below:
</p> 

In [None]:
# Check the status of installation using status() API.
# Create a loop here for demo purposes
try: 
    claim_id
    ipydisplay(demo_env.status(claim_id))
    stage = demo_env.status(claim_id)['Stage'].iloc[-1]
    while stage == 'Started':
        stage = demo_env.status(claim_id)['Stage'].iloc[-1]
        clear_output()
        ipydisplay(demo_env.status(claim_id))
        sleep(5)
except NameError:
    print('No installations to monitor')

    
# Verify the Python libraries have been installed correctly.
ipydisplay(demo_env.libs)
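<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The monitoring loops poll <code>status()</code> until the stage changes, but they have no upper bound, so a failed or stalled installation could keep the cell running indefinitely. A bounded polling helper is sketched below. <code>poll_until</code> is a hypothetical name, <code>get_stage</code> stands in for something like <code>lambda: demo_env.status(claim_id)['Stage'].iloc[-1]</code>, and the set of terminal stage names is an assumption to adjust to what the service actually reports.</p>

```python
import time
from typing import Callable, Iterable


def poll_until(get_stage: Callable[[], str], done: Iterable[str],
               timeout_s: float = 600.0, interval_s: float = 5.0) -> str:
    """Poll get_stage() until it returns a stage in `done`; raise TimeoutError otherwise."""
    terminal = set(done)
    deadline = time.monotonic() + timeout_s
    while True:
        stage = get_stage()
        if stage in terminal:
            return stage
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in stage {stage!r} after {timeout_s} s")
        time.sleep(interval_s)
```

This notebook waits for <code>'File Installed'</code> when installing models and for the stage to leave <code>'Started'</code> when installing libraries; any failure stage the service reports should also be added to <code>done</code>.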

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Download and install model</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Open Analytics Framework containers do not have open access to the external network, which contributes to a very secure runtime environment.  As such, users load pre-trained models using the APIs below.  For illustration purposes, the following code checks whether the model archive exists locally; if it does not, the code downloads the model by creating a model object and then creates the archive.  If the model is not already present in the remote environment, the archive is then installed there.
</p> 

In [8]:
# check to see if the model needs to be downloaded/archived

# construct the file name for the model:
model_fname = 'models--' + model_name.replace('/', '--')

if not os.path.isfile(f'{model_fname}.zip'):

    from sentence_transformers import SentenceTransformer
    import shutil
    print('Creating Model Archive...')

    model = SentenceTransformer(model_name)
    shutil.make_archive(model_fname, 
                        format='zip', 
                        root_dir=f'{expanduser("~")}/.cache/huggingface/hub/{model_fname}/')
else:
    print('Local model archive exists.')

# check to see if the model is already installed
try:
    if demo_env.models.empty: # no models installed at all
        print('Installing Model...')
        claim_id = demo_env.install_model(model_path = f'{model_fname}.zip', asynchronous = True)
    elif not any(model_fname in x for x in demo_env.models['Model']): #see if model is there
        print('Installing Model...')
        claim_id = demo_env.install_model(model_path = f'{model_fname}.zip', asynchronous = True)
    else:
        print('Model already installed')
except Exception as e:
    if '''NoneType' object has no attribute 'empty''' in str(e):
        print('Installing Model...')
        claim_id = demo_env.install_model(model_path = f'{model_fname}.zip', asynchronous = True)
        pass
    else:
        raise

Local model archive exists.
Model already installed
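<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The archive step relies on the Hugging Face hub cache layout, where a repository <code>org/name</code> is cached in a directory named <code>models--org--name</code>. The sketch below reproduces the naming rule from the cell above, zips a directory with <code>shutil.make_archive</code>, and lists the archive members with <code>zipfile</code> as a quick sanity check before calling <code>install_model()</code>. It works on a temporary directory with a placeholder file, not a real model download; <code>cache_dir_name</code>, <code>archive_model_dir</code>, and <code>archive_members</code> are hypothetical helpers.</p>

```python
import os
import shutil
import tempfile
import zipfile
from typing import List


def cache_dir_name(model_name: str) -> str:
    """Mirror the notebook's rule: 'org/name' -> 'models--org--name'."""
    return "models--" + model_name.replace("/", "--")


def archive_model_dir(src_dir: str, out_dir: str, model_name: str) -> str:
    """Zip the contents of src_dir into out_dir/<cache_dir_name>.zip and return its path."""
    base = os.path.join(out_dir, cache_dir_name(model_name))
    return shutil.make_archive(base, format="zip", root_dir=src_dir)


def archive_members(zip_path: str) -> List[str]:
    """List the files stored in the archive, excluding directory entries."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted(n for n in zf.namelist() if not n.endswith("/"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "cache")
        os.makedirs(os.path.join(src, "snapshots"))
        # placeholder file standing in for real model artifacts
        with open(os.path.join(src, "snapshots", "config.json"), "w") as f:
            f.write("{}")
        path = archive_model_dir(src, tmp, "sentence-transformers/all-MiniLM-L6-v2")
        print(os.path.basename(path), archive_members(path))
```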


<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Monitor model installation status</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Optionally, users can monitor the model installation status using the cell below:
</p> 

In [None]:
# Check the status of installation using status() API.
# Create a loop here for demo purposes
try: 
    claim_id
    ipydisplay(demo_env.status(claim_id))
    stage = demo_env.status(claim_id)['Stage'].iloc[-1]
    while stage != 'File Installed':
        stage = demo_env.status(claim_id)['Stage'].iloc[-1]
        clear_output()
        ipydisplay(demo_env.status(claim_id))
        sleep(5)
except NameError:
    print('No installations to monitor')

    
# Verify the model has been installed correctly.
demo_env.refresh()
ipydisplay(demo_env.models)

<hr>
<p style = 'font-size:24px;font-family:Arial;color:#00233C'><b>Conclusion - Environment Management</b></p>



<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The preceding demo showed how users can perform a <b>one-time</b> configuration task to prepare a custom environment for analytic processing at scale.  Once this configuration is complete, these containers can be re-used in ad-hoc development tasks, or used for operationalizing analytics in production.</p>

<hr>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Demo 2 - Developing a massively-scalable AI-powered analytic pipeline
  <br>
    </p>


<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following demonstration will illustrate how simple it is to utilize common Python design patterns to create a vector embedding function.  This function can then be applied <b>directly</b> to the data in Vantage to run at scale on a GPU-enabled analytic cluster.</p>


<table style = 'width:100%;table-layout:fixed;'>
    <tr>
        <td style = 'vertical-align:top' width = '30%'>
            <ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
                <li>Write a <b>local</b> function that leverages a pre-trained Language Model to create vector embeddings</li>
                <br>
                <li>Push this processing to the <b>GPU Analytic Cluster</b> for processing at scale</li>
                <br>
            </ol>
        </td>
        <td style = 'vertical-align:top'><img src = 'images/local_remote_functions.png'></td>
    </tr>
</table>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Check connection</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Check database and UES connection</p> 

In [9]:
# check for existing connection
eng = check_and_connect(host=host, username=username, password=password, compute_group = compute_group)
print(eng)
    

# check to see if there is a valid UES auth
# if not, authenticate
try:
    demo_env = get_env(oaf_name)

except Exception as e:
    if '''NoneType' object has no attribute 'value''' in str(e): #UES auth expired/required
        if set_auth_token(ues_url = ues_url, username = username, pat_token = pat_token, pem_file = pem_file):
            print('UES Authentication successful')
            try:
                demo_env = get_env(oaf_name)
                pass
            except Exception as l:
                if f'''User environment '{oaf_name}' not found''' in str(l):
                    print('User environment not found')
                    pass
                else:
                    raise
        else:
            print('UES Authentication failed, check URL and account info')
        pass
    elif f'''User environment '{oaf_name}' not found''' in str(e):
        print('User environment not found')
        pass
    else:
        raise


# check cluster status
check_cluster_start(compute_group = compute_group)

Engine(teradatasql://data_scientist:***@50.112.44.50)
UES Authentication successful


ComputeProfileName,InstanceName,ComputeGroupName,ComputeMapName,ComputeInstanceType,CurrentState,LastReqState,LastStartTime,LastEndTime


GPU Cluster Available


True

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Step 1.  Create a client-side embedding function</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The goal of this exercise is to create a client-side function which can be "pushed" to the analytic cluster for processing at scale.  There are a few minor enhancements here to improve performance and usability in this remote environment:</p> 
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Imports</b>.  By default, the teradataml Python client packages the code, objects, and dependencies and serializes them to the analytic cluster.  Users can make this more efficient by staging these objects beforehand (done in the first demo of this notebook).  <b>Important:</b> import the larger libraries (sentence_transformers and torch) inside the function so they are not registered as new dependencies that need to be installed.</li>
    <li><b>GPU Drivers</b>.  Since this function runs both on the client (CPU) and on the cluster (GPU), it includes a conditional that checks whether CUDA is available and falls back to the CPU otherwise.</li>
    </ul>


In [10]:

def create_embeddings(row):
    from sentence_transformers import SentenceTransformer
    import torch
    
    model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
    
    # check for NVIDIA drivers
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    # create series from the embeddings list that is returned
    s = pd.Series(model.encode(row['comment_text'], device = device))
    
    #concat them together
    row = pd.concat([row, s], axis = 0)
    
    return row
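<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As written, <code>create_embeddings</code> constructs a new <code>SentenceTransformer</code> for every row, so the model is reloaded once per row. A common alternative is to load the model once per process and reuse it. The sketch below shows that pattern with <code>functools.lru_cache</code> and a hypothetical <code>FakeEncoder</code> in place of the real model; in the actual function, <code>load_model</code> would wrap <code>SentenceTransformer(name)</code>. Whether such a module-level cache survives when teradataml serializes the function to the cluster is an assumption that should be verified.</p>

```python
from functools import lru_cache

LOAD_COUNT = {"n": 0}  # counts how many times the stand-in model is built


class FakeEncoder:
    """Hypothetical stand-in for SentenceTransformer; returns a fixed-size vector."""

    def __init__(self, dim: int = 384) -> None:
        self.dim = dim

    def encode(self, text: str) -> list:
        # Toy "embedding": deterministic and the right length, but not meaningful
        return [float(len(text) % (i + 1)) for i in range(self.dim)]


@lru_cache(maxsize=1)
def load_model(name: str) -> FakeEncoder:
    """Build the model on first use; later calls with the same name reuse it."""
    LOAD_COUNT["n"] += 1
    return FakeEncoder()


def embed_row(row: dict) -> dict:
    model = load_model("sentence-transformers/all-MiniLM-L6-v2")
    vector = model.encode(row["comment_text"])
    # Append one numbered column per dimension, mirroring the pd.concat in the notebook
    return {**row, **{str(i): v for i, v in enumerate(vector)}}
```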

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Call the function on local data</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The pandas apply() method executes the function across all rows of the DataFrame (axis = 1 passes one row at a time).</p>

In [11]:
df = pd.DataFrame({'comment_id':[1,2], 'comment_text':['hello world', 'hello back']})

df.apply(create_embeddings, axis = 1)


comment_id,comment_text,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383


<hr>
<p style = 'font-size:20px;font-family:Arial;color:#00233C'><b>Step 2. Push this function to the analytic cluster</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The teradataml package provides an apply() method that is called much like its pandas counterpart, except that it runs <b>in parallel on</b> the cluster, leveraging the GPU infrastructure and the MPP processing capabilities of Vantage.</p>

    

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.1 - Inspect the Data</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Use simple DataFrame methods to inspect the data.  A teradataml DataFrame behaves much like a pandas DataFrame, with one significant difference: it is a reference to data in the analytic database.  This allows developers to perform familiar data management operations on extremely large data sets as if the data were local.</p>

In [12]:
tdf_comments = DataFrame('"demo_ofs"."web_comment"')

ipydisplay(tdf_comments.sample(2))

comment_id,customer_id,comment_text,comment_summary,sampleid
21200,854,This really is a beautiful blouse. it's much prettier in person than in the photo. the embroidery is delicate and the blouse is feminine. is has a nice drape that isn't boxy or flowy. my only complaint is that it gaps widely right at the bust. and i am not busty at all. there is no pulling it together and smoothing it out. it is just too tight and it has to be returned. boooo,,1
21530,529,"These shorts are super soft and comfortable. i am normally a size 27 or a 2 in shorts. i got a size 26 in these shorts and they are still large. i exchanged them for a 25 and they fit perfectly. so these run 1-2 sizes large. other than sizing, these shorts are perfectly comfortable and functional.",Perfect summer comfy short!,1
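<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The idea of a DataFrame that is only a reference to data in the database can be illustrated with a toy example: each operation returns a new object that extends a SQL query, and nothing is fetched until results are requested. The <code>LazyTable</code> class below is hypothetical and uses an in-memory SQLite database from the Python standard library; it is not how teradataml is implemented, only an illustration of deferred execution.</p>

```python
import sqlite3
from typing import List, Optional, Tuple


class LazyTable:
    """Toy deferred query: methods return new objects; SQL runs only in collect()."""

    def __init__(self, conn: sqlite3.Connection, table: str, columns: str = "*",
                 where: Optional[str] = None, limit: Optional[int] = None) -> None:
        self.conn, self.table = conn, table
        self.columns, self.where, self.limit_n = columns, where, limit

    def select(self, *cols: str) -> "LazyTable":
        return LazyTable(self.conn, self.table, ", ".join(cols), self.where, self.limit_n)

    def filter(self, condition: str) -> "LazyTable":
        # Conditions are raw SQL strings: acceptable for a toy, never for untrusted input
        where = f"({self.where}) AND ({condition})" if self.where else condition
        return LazyTable(self.conn, self.table, self.columns, where, self.limit_n)

    def head(self, n: int) -> "LazyTable":
        return LazyTable(self.conn, self.table, self.columns, self.where, n)

    def sql(self) -> str:
        query = f"SELECT {self.columns} FROM {self.table}"
        if self.where:
            query += f" WHERE {self.where}"
        if self.limit_n is not None:
            query += f" LIMIT {self.limit_n}"
        return query

    def collect(self) -> List[Tuple]:
        # Only here does any data leave the database
        return self.conn.execute(self.sql()).fetchall()
```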


<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.2 - Prepare to execute the apply method</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>When using the teradataml apply method, there are some differences in how data is passed in and out of the multiple runtime containers on the distributed nodes.  This offers a great deal of processing power, but also requires some additional considerations when calling the method:</p>

<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Optional Return Types</b>.  If the data returned by the function has different columns or data types than the input DataFrame, the user passes them as an OrderedDict of column names and types to the returns keyword argument</li>
    <li><b>Data Preparation</b>.  Data preparation and cleansing can be performed inside the function. However, a key advantage of the <b>Teradata Vantage</b> analytic database architecture is that users can run powerful native data preparation functions that operate at extreme scale and performance</li>
    </ol>
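<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Since <code>returns</code> declares the output columns, it can help to check locally that the function's output lines up with the declared schema before pushing it to the cluster. The sketch below builds the same column layout as <code>types_dict</code> (with type names as plain strings in place of the teradatasqlalchemy type objects) and compares it with the columns of one output row. <code>build_return_schema</code> and <code>matches_schema</code> are hypothetical helpers, and ignoring the double quotes around the numeric column names reflects how the next cell quotes them, not a documented teradataml rule.</p>

```python
from collections import OrderedDict
from typing import Dict, Iterable


def build_return_schema(n_dims: int = 384) -> "OrderedDict[str, str]":
    """Column name -> SQL type name, in the same order as types_dict."""
    schema: "OrderedDict[str, str]" = OrderedDict()
    schema["comment_id"] = "BIGINT"
    schema["comment_text"] = "VARCHAR(1000)"
    for i in range(n_dims):
        schema[f'"{i}"'] = "FLOAT"  # numeric names are double-quoted, as in the notebook
    return schema


def matches_schema(output_columns: Iterable, schema: Dict[str, str]) -> bool:
    """True if the output column names, in order, equal the schema names (quotes ignored)."""
    expected = [name.strip('"') for name in schema]
    return [str(c) for c in output_columns] == expected
```

With the local pandas result from Step 1, a check might look like <code>matches_schema(df.apply(create_embeddings, axis = 1).columns, build_return_schema())</code>.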

In [13]:

types_dict = OrderedDict({})
types_dict['comment_id'] = BIGINT()
types_dict['comment_text'] = VARCHAR(1000)

for i in range(384):
    key = '"' + str(i) + '"'
    types_dict[key] = FLOAT()

tdf = DataFrame.from_query('''SELECT TOP 100 comment_id,
    CASE 
        WHEN comment_text IS NULL THEN ' '
        ELSE OREPLACE(OREPLACE(OREPLACE(OREPLACE(OREPLACE(comment_text , X'0d' , ' ') , X'0a' , ' ') , X'09', ' '), ',', ' '), '"', ' ')
    END comment_text 
    FROM demo_ofs.web_comment WHERE comment_text <> '';''')

tdf.sample(2)

comment_id,comment_text,sampleid
8247,Very cozy lounge tee but runs large. i ordered a small rather than my usual medium and could of probably went down two sizes.,1
9715,So this shirt is adorable. soft fun lays nicely... but you should have your arms amputated and replaced with a barbie's before you put it on. i don't even have large arms the medium fit me perfectly but then like i hulked out the arms and there are now little tears at the seams. this was all very confusing and vexing as this shirt is otherwise perfection. i mean i'm not a large? but i guess if i want this shirt i will need to take it back and get a large otherwise it will be sleev,1
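<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The nested OREPLACE calls in the query replace carriage returns (X'0d'), line feeds (X'0a'), tabs (X'09'), commas, and double quotes with spaces, and the CASE expression maps NULL text to a single space. Presumably this keeps each comment a single, unambiguous delimited record as data moves between the database and the Python containers. A Python equivalent, useful for checking the cleaning rule locally on sample text, might look like the sketch below; <code>clean_comment</code> is a hypothetical name.</p>

```python
from typing import Optional

# Characters the SQL query replaces with a space: CR, LF, TAB, comma, double quote
_REPLACED = "\r\n\t,\""
_TRANSLATION = str.maketrans({ch: " " for ch in _REPLACED})


def clean_comment(text: Optional[str]) -> str:
    """Mirror the CASE/OREPLACE expression: NULL -> ' ', listed characters -> ' '."""
    if text is None:
        return " "
    return text.translate(_TRANSLATION)
```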


<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>2.3 - Execute the function on the nodes</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Call the apply method on the teradataml DataFrame.  This will push the function to the analytic nodes for processing in parallel.  Data is returned as another teradataml DataFrame, which represents the entire result set of the operation.  For this example, additional method arguments include:</p>

<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Function</b>.  The function to apply; here, the original client-side create_embeddings function written above</li>
    <li><b>Environment</b>.  The custom runtime environment with the models and dependencies loaded</li>
    <li><b>Return types</b>. OrderedDict that represents the column names and data types</li>
    </ol>

In [14]:
# create a new dataframe representing the embeddings
tdf_embedded = tdf.apply(lambda row: create_embeddings(row),
                         env_name = demo_env,
                         returns = types_dict)

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Check the data</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Users can now create a table with these embeddings, use them in additional analytics, or operationalize this processing as part of a pipeline.</p>
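As one example of additional analytics, embeddings are commonly compared with cosine similarity. A minimal client-side sketch, assuming a small sample has been brought into pandas (for example with the teradataml DataFrame's `to_pandas()` method) and the embedding columns extracted as a NumPy array. The toy 3-dimensional vectors stand in for the 384 real columns:

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity for an (n_rows, n_dims) array of embeddings."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against zero vectors so the division is always defined
    unit = embeddings / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


# Toy vectors standing in for rows of embedding columns "0".."383"
vectors = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
])
sims = cosine_similarity_matrix(vectors)
print(np.round(sims, 3))
```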

In [15]:
ipydisplay(tdf_embedded.sample(2))

comment_id,comment_text,0,1,2,...,383,sampleid
21606,Too bad the blue one is sold out in my size (petite) i ordered the 00p based on feedback at m store that it runs big that was a good choice (i am 5'1.5 115 lbs#. the material teds to wrinkle though so will have to think about keeping it. cut is flattering i like it with the belt like the picture on the model but was nice too with the drawstring and a nice chunky necklace #bought black dress so gold necklace is nice). length was mid calf but because it has a rounded hem it wa,-0.0183671172708272,0.0733094736933708,0.1273084282875061,...,0.0310221835970878,1
17789,I really liked the look of this top online but when it came i had a couple of issues with it. first when i tried it on i found the underarms to be a little snug (i normally wear a 10/m and that's what i ordered). it makes raising my arms tough and as a teacher i need to be able to lift my arms above my waist. secondly my husband's first reaction was that it looks like pajamas. the abstract floral/stripe gets a little lost. on the other hand the color palette is very pretty and the qual,-0.0662292391061782,0.0609903968870639,0.0565071962773799,...,0.1038893386721611,1


<hr>
<p style = 'font-size:24px;font-family:Arial;color:#00233C'><b>Conclusion - executing custom code against GPU-enabled analytic clusters</b></p>



<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The preceding demo showed how users can apply simple, familiar patterns to execute powerful AI models <b>at scale</b> for development and operational processing.</p>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Stop the Cluster</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Hibernate the cluster if desired.</p>

In [None]:
check_cluster_stop(compute_group)

<hr>
   <p  style='font-size:36px;font-family:Arial; color:#F0F0F0; background-color: #00233c; padding-left: 20pt; padding-top: 20pt;padding-bottom: 10pt; padding-right: 20pt;'>
       Demo 3 - Operationalizing AI-powered analytics
  <br>
    </p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The following demonstration illustrates how developers can take the next step and <b>operationalize</b> this processing, enabling the entire organization to leverage AI across the data lifecycle, including:</p>

<table style = 'width:100%;table-layout:fixed;'>
    <tr>
        <td style = 'vertical-align:top' width = '30%'>
           <ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
               <li><b>Prepare the environment</b>.  Package the scoring function into a more robust program, and stage it on the remote environment</li>
            <br>
            <br>
               <li><b>Python Pipeline</b>.  Execute the function using Python methods</li>
            <br>
            <br>
               <li><b>SQL Pipeline</b>.  Execute the function using SQL - allowing for broad adoption and use in ETL and operational needs</li>
        </ol>
        </td>
        <td width = '20%'></td>
        <td style = 'vertical-align:top'><img src = 'images/OAF_Ops.png' width = 350 ></td>
    </tr>
</table>


<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Check connection</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Reconnect to the database and the User Environment Service (UES), and start the cluster if necessary.</p> 

In [None]:
# check for an existing connection
eng = check_and_connect(host=host, username=username, password=password, compute_group=compute_group)
print(eng)

# check to see if there is a valid UES auth
# if not, authenticate
try:
    demo_env = get_env(oaf_name)

except Exception as e:
    if '''NoneType' object has no attribute 'value''' in str(e):  # UES auth expired/required
        if set_auth_token(ues_url=ues_url, username=username, pat_token=pat_token, pem_file=pem_file):
            print('UES Authentication successful')
            try:
                demo_env = get_env(oaf_name)
            except Exception as env_err:
                if f'''User environment '{oaf_name}' not found''' in str(env_err):
                    print('User environment not found')
                else:
                    raise
        else:
            print('UES Authentication failed, check URL and account info')
    elif f'''User environment '{oaf_name}' not found''' in str(e):
        print('User environment not found')
    else:
        raise


# check cluster status
check_cluster_start(compute_group=compute_group)
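The nested try/except above follows a "try, re-authenticate on a recognized error, retry once" pattern. A minimal generic sketch of that control flow, where `fetch`, `authenticate`, and `is_auth_error` are hypothetical callables standing in for get_env(), set_auth_token(), and the error-message check:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_reauth(
    fetch: Callable[[], T],
    authenticate: Callable[[], bool],
    is_auth_error: Callable[[Exception], bool],
) -> Optional[T]:
    """Call fetch(); on a recognized auth error, authenticate once and retry."""
    try:
        return fetch()
    except Exception as err:
        if not is_auth_error(err):
            raise  # unrelated failures propagate unchanged
        if not authenticate():
            print("Authentication failed, check URL and account info")
            return None
        return fetch()  # a second failure now propagates to the caller


# Usage with stand-ins: the first call fails as if the token had expired
state = {"authenticated": False}


def fake_fetch() -> str:
    if not state["authenticated"]:
        raise AttributeError("'NoneType' object has no attribute 'value'")
    return "oaf_demo_3"


def fake_auth() -> bool:
    state["authenticated"] = True
    return True


env = fetch_with_reauth(
    fake_fetch,
    fake_auth,
    lambda e: "object has no attribute 'value'" in str(e),
)
print(env)  # oaf_demo_3
```

Putting the retry in one helper keeps the logic in a single place. The caller would still handle the "User environment not found" case from the notebook cell.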

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Step 2.  Create a server-side embedding function</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The goal of this exercise is to create a <b>server-side</b> function that can be staged on the analytic cluster.  This offers several improvements over the method used above:</p> 
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Performance</b>.  Staging the code and dependencies in the container environment reduces the amount of I/O, since the function does not need to be serialized and sent to the cluster each time it is called</li>
    <li><b>Operationalization</b>.  The execution pipeline can be encapsulated into a SQL statement, which allows for seamless use in ETL pipelines, dashboards, or applications that need access</li>
    <li><b>Flexibility</b>. Developers have much greater control over how the code works, and can optimize it for performance, stability, data cleanliness, or flow logic</li>
</ul>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>These benefits come with some additional work.  Developers need to account for how data is passed into and out of the code runtime, and how results are passed back to the SQL engine to assemble and return the final result set.  Code is executed when the user invokes the <a href = 'https://docs.teradata.com/r/Teradata-VantageCloud-Lake/SQL-Reference/SQL-Operators-and-User-Defined-Functions/Table-Operators/APPLY'>APPLY SQL function</a>:</p> 
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b>Input Query</b>.  The APPLY function takes a SQL query as input.  This query can be as complex as needed and include data preparation, cleansing, and/or any other set-based logic necessary to create the desired input data set.  This complexity can also be abstracted into a database view.  When using the Teradata client connectors for Python or R, this query is represented as a DataFrame or tibble.</li>
    <li><b>Pre-processing</b>.  Based on the query plan, data is retrieved from storage (cache, block storage, or object storage) and the input query is executed.</li>
    <li><b>Distribution</b>.  Input data can be partitioned and/or ordered to be processed on a specific container or collection of them.  For example, the user may want to process all data for a single post code in one partition, and run thousands of these in parallel.  Data can also be distributed evenly across all units of parallelism in the system</li>
    <li><b>Input</b>.  The data for each container is passed to the runtime using standard input (stdin)</li>
    <li><b>Processing</b>.  The user's code executes, parsing stdin for the input data</li>
    <li><b>Output</b>.  Data is sent out of the code block using standard output (stdout)</li>
    <li><b>Result set</b>.  The result set is assembled by the analytic database, and the SQL query returns</li>
    </ol>
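The steps above can be simulated locally to reason about the stdin/stdout contract. A minimal sketch, assuming comma-delimited rows with no quoting (presumably why the input query strips commas, quotes, and line breaks from comment_text) and a hypothetical upper-casing `script()` in place of the real code. The database's actual distribution and assembly logic is far more involved:

```python
import csv
import io
from collections import defaultdict


def script(stdin: io.TextIOBase, stdout: io.TextIOBase) -> None:
    """Hypothetical per-container script: read CSV rows, write transformed rows."""
    rows = list(csv.reader(stdin))
    if not rows:
        return  # some partitions may receive no data
    for comment_id, text in rows:
        stdout.write(f"{comment_id},{text.upper()}\n")


def simulate_apply(rows, partition_key, n_containers):
    # Distribution: route each row to a container by hashing its partition key
    # (int keys are used here because Python hashes them deterministically)
    partitions = defaultdict(list)
    for row in rows:
        partitions[hash(partition_key(row)) % n_containers].append(row)

    result = []
    for container in range(n_containers):
        # Input: serialize the partition as comma-delimited lines on "stdin"
        stdin = io.StringIO("".join(f"{a},{b}\n" for a, b in partitions[container]))
        stdout = io.StringIO()
        script(stdin, stdout)  # Processing
        # Output / result set: parse "stdout" lines back into columns
        result.extend(line.split(",") for line in stdout.getvalue().splitlines())
    return result


data = [("1", "runs large"), ("2", "soft fabric"), ("3", "lovely color")]
print(simulate_apply(data, partition_key=lambda r: int(r[0]), n_containers=4))
```

With four containers and three rows, at least one container receives an empty partition. This is the case the server-side script handles by exiting early.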


<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Example server-side code block</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>This is the Python script used in the demonstration, saved to the filesystem as "embedding.py".  Note that the original client-side processing function is reused; the additional logic handles input, output, and errors.</p> 


```python
#!/usr/bin/env python3
import sys, csv
import warnings
import torch
from sentence_transformers import SentenceTransformer
import pandas as pd

warnings.simplefilter('ignore')

# Read data from stdin and construct a pandas DataFrame.
# Data can also be read/processed directly from stdin if desired.

# 1. Use the csv reader to parse the comma-separated input
# 2. Construct the DataFrame from the resulting dictionaries

colNames = ['comment_id', 'comment_text']
d = csv.DictReader(sys.stdin.readlines(), fieldnames = colNames)
df = pd.DataFrame(d, columns = colNames)

# Use try...except to report an error if something goes wrong in the try block
try:
    # Exit gracefully if the DataFrame is empty;
    # it is possible that some partitions won't receive any data
    if df.empty:
        sys.exit()

    model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

    # Execute the same logic as in the client-side demo function
    def create_embeddings(row):
        # Check for NVIDIA drivers; None lets the model use its default device
        device = 'cuda' if torch.cuda.is_available() else None

        # Create a Series from the embedding vector that is returned
        s = pd.Series(model.encode(row['comment_text'], device = device))

        # Concatenate the input columns and the embedding values
        return pd.concat([row, s], axis = 0)

    # Call the embedding function using pandas apply()
    df = df.apply(create_embeddings, axis = 1)

    # Egress results to the database through standard output.
    # iterrows() yields one Series per row; join its values into a
    # comma-separated output line
    for _, row in df.iterrows():
        print(','.join(str(v) for v in row))

# Raise any errors back to the SQL engine
except SystemExit:
    # Skip the exception if a system exit was requested in the try block
    pass
except Exception:
    # Report any other error on standard error, then re-raise it
    print("Script Failure :", sys.exc_info()[0], file=sys.stderr)
    raise
```
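The script calls model.encode() once per row through pandas apply(). SentenceTransformer.encode() also accepts a list of sentences and returns a 2-D array with one row per sentence. An alternative is therefore a single batched call per partition, which often uses a GPU more efficiently. A minimal sketch of that variant, assuming a hypothetical `StubModel` with the same encode() call shape so it runs without downloading a model:

```python
import io

import numpy as np
import pandas as pd


class StubModel:
    """Hypothetical stand-in for SentenceTransformer; encode() returns (n, dim)."""

    def __init__(self, dim: int = 384):
        self.dim = dim

    def encode(self, sentences, batch_size: int = 32, device=None):
        # Deterministic fake vectors: each sentence's length repeated dim times
        return np.array([[float(len(s))] * self.dim for s in sentences])


def embed_partition(df: pd.DataFrame, model) -> pd.DataFrame:
    # One batched encode() call for the whole partition instead of one per row
    vectors = model.encode(df["comment_text"].tolist(), batch_size=32)
    emb = pd.DataFrame(vectors, index=df.index)
    return pd.concat([df, emb], axis=1)


def write_rows(df: pd.DataFrame, out) -> None:
    # Same output format as the script: one comma-separated line per row
    for row in df.itertuples(index=False):
        print(",".join(str(v) for v in row), file=out)


df = pd.DataFrame({"comment_id": ["8247", "9715"],
                   "comment_text": ["runs large", "so cute"]})
out = io.StringIO()
write_rows(embed_partition(df, StubModel(dim=3)), out)
print(out.getvalue())
```

In embedding.py, `embed_partition(df, model)` would replace the `df.apply(create_embeddings, axis = 1)` call. Whether batching helps in practice depends on partition sizes and hardware, so it is worth measuring.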

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Step 3.  Install the file and any additional artifacts</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Use the install_file() method to install this Python file in the container.  As a reminder, this container is persistent, so these steps need to be done only infrequently.</p> 

In [16]:
demo_env.install_file('embedding.py', replace = True)

File 'embedding.py' replaced successfully in the remote user environment 'oaf_demo_3'.


True

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Step 4.  Call the APPLY function </b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>This function can be executed in two ways:</p> 
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li><b><a href = 'https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Analyzing-Your-Data/Teradata-Package-for-Python-on-VantageCloud-Lake/Working-with-Open-Analytics/teradataml-Apply-Class-for-APPLY-Table-Operator'>Python</a></b> by calling the Apply() module function</li>
    <li><b><a href = 'https://docs.teradata.com/r/Teradata-VantageCloud-Lake/SQL-Reference/SQL-Operators-and-User-Defined-Functions/Table-Operators/APPLY'>SQL</a></b> which allows for broad adoption across the enterprise</li>
    </ul>
    

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>APPLY using Python</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The process is as follows:</p> 
<ol style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>Construct a dictionary that defines the return columns and data types</li>
    <li>Construct a teradataml DataFrame representing the data to be processed; note this is a "virtual" object representing data and logic <b>in-database</b></li>
    <li>Call the Apply() module function.  This builds the APPLY call but does not execute anything yet.  Note that Apply takes several arguments: the input data, the return types, the environment name, and the command to run</li>
    <li>To execute the function, call the execute_script() method.  It returns a teradataml DataFrame representing the complete operation, which can be used in further processing, stored as a table, etc.</li>
    </ol>
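The cell below wraps each embedding column name in double quotes. Names that begin with a digit are not valid unquoted SQL identifiers, which is presumably the reason. A minimal sketch that builds the same mapping and renders it as a column-definition list, for example to compare against a hand-written SQL version of the call. The type names are plain strings here, an assumption in place of the teradatasqlalchemy BIGINT(), VARCHAR(1000), and FLOAT() objects:

```python
from collections import OrderedDict

EMBEDDING_DIM = 384

# Same construction as the notebook cell, with type names as strings
types_dict = OrderedDict()
types_dict["comment_id"] = "BIGINT"
types_dict["comment_text"] = "VARCHAR(1000)"
for i in range(EMBEDDING_DIM):
    # Names starting with a digit need double quotes to be valid SQL identifiers
    types_dict[f'"{i}"'] = "FLOAT"

# Render the mapping as a comma-separated column-definition list
column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in types_dict.items())

print(len(types_dict))  # 386
print(column_defs)
```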
    

In [17]:
types_dict = OrderedDict({})
types_dict['comment_id'] = BIGINT()
types_dict['comment_text'] = VARCHAR(1000)

for i in range(384):
    key = '"' + str(i) + '"'
    types_dict[key] = FLOAT()

tdf = DataFrame.from_query('''SELECT TOP 1000 comment_id,
    CASE 
        WHEN comment_text IS NULL THEN ' '
        ELSE OREPLACE(OREPLACE(OREPLACE(OREPLACE(OREPLACE(comment_text , X'0d' , ' ') , X'0a' , ' ') , X'09', ' '), ',', ' '), '"', ' ')
    END comment_text 
    FROM demo_ofs.web_comment WHERE comment_text <> '';''')


In [18]:
apply_obj = Apply(data = tdf,
                  apply_command = 'python embedding.py',
                  returns = types_dict,
                  env_name = oaf_name
                 )

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Execute the function</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Call execute_script() and return a single sampled record to the client to check the data.</p> 

In [19]:
ipydisplay(apply_obj.execute_script().sample(1))

comment_id,comment_text,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,sampleid
18382,Gorgeous shirt! the detail on this shirt is impeccable! even with its boxy shape it was very flattering on my petite curvy frame. i'd say this shirt runs just a tad big. i'm 5' 118lbs and the xs fit perfectly.,0.0290744230151176,0.11325304210186,0.0166481994092464,…,-0.0309485159814357,-0.0378026701509952,1


<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Step 5. Persist the resultset</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The results can be saved to a table for reuse with a single call to copy_to_sql.  For this demo, a temporary (volatile) table is created for illustration purposes.</p> 

In [20]:
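# run the APPLY script and save its output to a volatile (session-scoped) table, replacing any existing copy
copy_to_sql(apply_obj.execute_script(), table_name='demo_embeddings', temporary=True, if_exists='replace')

# spot-check one row: the comment text and the first two embedding dimensions
execute_sql('SELECT TOP 1 comment_text, "0", "1" FROM demo_embeddings;').fetchall()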

[["This is such a unique and fun dress. i'm so glad i found it and it fits perfectly!",
  -0.0641724169254303,
  0.07574359327554703]]

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>APPLY using SQL</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The process is much the same; the benefit is that this SQL can be used by a wide range of tools, applications, and dashboards, as well as by automated processes.  Construct the statement using the following values:</p> 
<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
    <li>
        <b>ON</b> Clause<ul style = 'font-size:16px;font-family:Arial;color:#00233C'>
        <li>Input table or query</li>
        <li>Hash or Partition column(s)</li>
        <li>Order by and/or local order by directives</li>
        <li>Return columns and data types</li></ul></li>
    <li>Any partition column(s)</li>
    <li>The shell command to run</li>
    <li>Additional functional arguments</li>
    </ul>
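<p style = 'font-size:16px;font-family:Arial;color:#00233C'>As a minimal sketch, assuming the clause layout used by the APPLY query in this demo, the values listed above can be assembled by a small Python function. The helper names build_apply_query and quote_name, their parameters, and the environment name my_gpu_env are hypothetical and illustrative only; they are not part of teradataml or the APPLY syntax.</p>

```python
from typing import List, Tuple


def quote_name(name: str) -> str:
    # names such as "0" are not valid unquoted identifiers, so wrap them in double quotes
    return name if name.isidentifier() else f'"{name}"'


def build_apply_query(
    source: str,
    returns: List[Tuple[str, str]],
    command: str,
    environment: str,
    partition_clause: str = "PARTITION BY ANY",
    style: str = "csv",
    delimiter: str = ",",
) -> str:
    """Assemble an APPLY statement from its parts (hypothetical helper)."""
    # RETURNS clause: output column names and their SQL data types
    returns_sql = ", ".join(f"{quote_name(n)} {t}" for n, t in returns)
    return (
        "SELECT * FROM Apply(\n"
        f"    ON {source}\n"                 # input table, view, or parenthesized query
        f"    {partition_clause}\n"          # HASH BY / PARTITION BY and any ORDER BY directives
        f"    RETURNS({returns_sql})\n"
        "    USING\n"
        f"    APPLY_COMMAND('{command}')\n"  # shell command to run
        f"    ENVIRONMENT('{environment}')\n"
        f"    STYLE('{style}')\n"            # additional functional arguments
        f"    DELIMITER('{delimiter}')\n"
        ") AS d;"
    )


# usage: rebuild a statement shaped like the one in this demo (environment name is a placeholder)
columns = [("comment_id", "BIGINT"), ("comment_text", "VARCHAR(1000)")]
columns += [(str(i), "FLOAT") for i in range(384)]
sql = build_apply_query("prepared_data_V", columns, "python embedding.py", "my_gpu_env")
print(sql.splitlines()[0])  # SELECT * FROM Apply(
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Generating the 384 embedding columns in a loop avoids typing the RETURNS list by hand, which is where mismatches between the script output and the declared columns tend to creep in.</p>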

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Simplify the SQL using Views</b></p>

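<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Create a view that encapsulates the data preparation steps.  Here, the view filters out NULL and empty comments, replaces special characters (carriage returns, line feeds, tabs, commas, and double quotes) with spaces so that each comment can be passed to the script as a single comma-delimited field, and limits the input to 1,000 rows.  Note that this is the same SQL used earlier to construct the teradataml DataFrame.</p> 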

In [21]:
qry = '''
    REPLACE VIEW prepared_data_V AS
    
    SELECT TOP 1000 
    comment_id,
    CASE 
        WHEN comment_text IS NULL THEN ' '
        ELSE OREPLACE(OREPLACE(OREPLACE(OREPLACE(OREPLACE(comment_text , X'0d' , ' ') , X'0a' , ' ') , X'09', ' '), ',', ' '), '"', ' ')
    END comment_text 
    FROM demo_ofs.web_comment WHERE comment_text <> '';
    
    '''

# execute the sql against the database
execute_sql(qry)

# check the data
execute_sql('SELECT TOP 1 * FROM prepared_data_V;').fetchall()

[[21848,
  "Loved the design and print of this blouse.  however  every time i put it on or took it off  i could hear the seams ripping.  definitely wasn't because it was too small either."]]
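<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The cleaning performed by prepared_data_V can be mirrored in plain Python, which may help when testing a script locally against sample comments. This is a sketch of the view's logic, not code that runs in the database; the function names are hypothetical, and the handling of all-blank comments assumes Teradata's blank-padded string comparison.</p>

```python
from typing import Iterable, List, Optional, Tuple

# characters the view replaces with a space: carriage return (X'0d'), line feed (X'0a'),
# tab (X'09'), comma, and double quote
SPECIAL_CHARS = ("\r", "\n", "\t", ",", '"')


def clean_comment(text: Optional[str]) -> str:
    """Local mirror of the CASE / OREPLACE expression in prepared_data_V."""
    if text is None:
        return " "
    for ch in SPECIAL_CHARS:
        text = text.replace(ch, " ")
    return text


def prepare(rows: Iterable[Tuple[int, Optional[str]]], limit: int = 1000) -> List[Tuple[int, str]]:
    """Apply the view's WHERE filter, cleaning, and TOP limit to (comment_id, comment_text) rows."""
    kept = []
    for comment_id, text in rows:
        # comment_text <> '' is not true for NULLs, and (assumed) blank-padded comparison
        # means all-blank comments are filtered out as well
        if text is None or text.rstrip(" ") == "":
            continue
        kept.append((comment_id, clean_comment(text)))
        if len(kept) == limit:  # TOP n without ORDER BY: which rows SQL keeps is arbitrary
            break
    return kept


rows = [(1, 'Great fit,\nlove it "a lot"'), (2, None), (3, ""), (4, "ok\tthanks")]
print(prepare(rows))
# [(1, 'Great fit  love it  a lot '), (4, 'ok thanks')]
```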

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Construct the APPLY Query</b></p>

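<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Use the view as the input data, and provide the RETURNS column list (comment_id, comment_text, and one FLOAT column per embedding dimension, "0" through "383") along with the required USING parameters.</p> 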

In [22]:
# RETURNS column list: comment_id, comment_text, then one FLOAT column per embedding dimension ("0" to "383")
embedding_cols = ', '.join(f'"{i}" FLOAT' for i in range(384))

qry = f'''
SELECT * FROM Apply(
    ON prepared_data_V
    PARTITION BY ANY
    
    returns(comment_id BIGINT, comment_text VARCHAR(1000), {embedding_cols})
    USING
    
    APPLY_COMMAND('python embedding.py')
    ENVIRONMENT('{oaf_name}')
    STYLE('csv')
    delimiter(',') 
) as d;'''


# execute the query on the server, and return the first four columns of the first record
execute_sql(qry).fetchone()[:4]

[9429,
 'I bought this dress because i saw it in stores when it first came to retailer. it was stunning. the colors were vivid and the beading was intricate. i regretted not purchasing it immediately.       fast forward to today... the dress i received was not the one that was originally in stores. the fabric was different  there was hardly any beading  and the colors were washed out and dull.     i read previous reviews saying that the dress was more like the studio pictures than the original promo. but t',
 -0.056448906660079956,
 0.04960934817790985]
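<p style = 'font-size:16px;font-family:Arial;color:#00233C'>To make the STYLE('csv') and delimiter(',') arguments concrete, here is a hypothetical sketch of the script side of the contract. It assumes, as with Teradata's script table operators, that each input row reaches the script on standard input as one delimited line and that each line the script writes to standard output becomes one result row matching RETURNS. It is not the demo's embedding.py: the function names are illustrative, and toy_encode stands in for the Hugging Face model call, emitting 2 values instead of 384. It also shows why the view strips commas from the comment text.</p>

```python
from typing import Callable, Iterable, Iterator, List

DELIMITER = ","  # must match delimiter(',') in the APPLY statement


def toy_encode(text: str) -> List[float]:
    """Placeholder encoder; a real script would call the embedding model here."""
    return [float(len(text)), float(text.count(" "))]


def process(lines: Iterable[str], encode: Callable[[str], List[float]]) -> Iterator[str]:
    """Map 'comment_id,comment_text' lines to 'comment_id,comment_text,v0,...,vN' lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # the view already replaced commas inside comment_text, so a single split is safe
        comment_id, comment_text = line.split(DELIMITER, 1)
        vector = encode(comment_text)
        # column order must match RETURNS: comment_id, comment_text, "0", "1", ...
        yield DELIMITER.join([comment_id, comment_text, *map(str, vector)])


# inside the container the loop would read sys.stdin and print each line;
# here a list of lines stands in for standard input
for out in process(["9429,nice dress\n", "18382,runs big\n"], toy_encode):
    print(out)
# 9429,nice dress,10.0,1.0
# 18382,runs big,8.0,1.0
```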

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Simplify the APPLY using a view</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Wrap the query above in a database view.  Consumers can then run a simple SELECT against the view, so the embedding step can be included in operational processing, ETL jobs, and similar workloads.</p> 

In [23]:
view_qry = f'''REPLACE VIEW simplified_apply_V AS {qry}
'''

execute_sql(view_qry);

In [24]:
df = pd.read_sql('SELECT TOP 10 * FROM simplified_apply_V', eng)

In [25]:
df

DataFrame output (10 rows): columns comment_id, comment_text, 0, 1, ..., 383
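<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Because each embedding dimension comes back as its own FLOAT column, downstream Python code will usually want the values for a comment as a single vector. A minimal pandas/NumPy sketch, assuming the DataFrame column labels are the strings "0" through "383"; embedding_matrix is a hypothetical helper, and the toy frame uses 3 dimensions instead of 384.</p>

```python
import numpy as np
import pandas as pd


def embedding_matrix(df: pd.DataFrame, dims: int = 384) -> np.ndarray:
    """Stack the per-dimension columns "0" .. str(dims - 1) into an (n_rows, dims) array."""
    cols = [str(i) for i in range(dims)]
    return df[cols].to_numpy(dtype=np.float64)


# toy frame with 3 dimensions standing in for the 384 returned by the view
toy = pd.DataFrame({
    "comment_id": [1, 2],
    "comment_text": ["a", "b"],
    "0": [0.1, 0.2],
    "1": [0.3, 0.4],
    "2": [0.5, 0.6],
})
vectors = embedding_matrix(toy, dims=3)
print(vectors.shape)  # (2, 3)
```

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Selecting the columns by name, rather than by position, keeps comment_id and comment_text out of the matrix regardless of where they appear in the frame.</p>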


<hr>
<p style = 'font-size:24px;font-family:Arial;color:#00233C'><b>Conclusion - Operationalizing AI-powered analytics</b></p>



<p style = 'font-size:16px;font-family:Arial;color:#00233C'>The preceding demo showed two methods for operationalizing the model execution: calling it from Python, or embedding it in a simplified SQL view.  The former lets developers and data scientists embed this processing in new or existing applications and workflows.  The latter enables broad, democratized adoption across the enterprise and the data lifecycle, bringing this analytic processing to ETL for data preparation and transformation tasks, and to production, where it can power dashboards and BI tools.</p>

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#00233C'><b>Cleanup</b></p>

<p style = 'font-size:16px;font-family:Arial;color:#00233C'>Stop the cluster, remove the environment (if desired), and drop views created in the demonstrations.  Finally, disconnect from the database.</p>

In [None]:
res = check_cluster_stop(compute_group)

In [None]:
# uninstall the libraries from the environment before removing it
# demo_env.uninstall_lib(libs = demo_env.libs['name'].to_list())
# remove_env(oaf_name)

In [None]:
execute_sql('DROP VIEW simplified_apply_V;');
execute_sql('DROP VIEW prepared_data_V;');

In [26]:
remove_context()

True