<header style="padding:10px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>VantageCloud Lake Systems Scaling and Monitoring</b></p>
<hr>

<br>

<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Demo 1 - Generate SQL Workload</b>

<p style = 'font-size:16px;font-family:Arial'>This notebook allows users to create a concurrent workload against the target system. The workload serves as a baseline of activity for system and performance monitoring, and can also be used to exercise Compute Cluster scaling rules.</p>

<p style = 'font-size:16px;font-family:Arial'><b>Note:</b> This demonstration assumes the compute group referenced in vars.json has a profile with scaling capabilities. If using the default environment template, the "CG_BusGrpA_STD" group will have a profile "CP_BusGrpA_STD_2_XSM" that has scaling capabilities set up.</p>

<b style = 'font-size:28px;font-family:Arial;color:#E37C4D'>Demonstration Overview</b>

<p style = 'font-size:16px;font-family:Arial'>This notebook consists of three primary demonstrations:</p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li><b style = 'color:#00b2b1'>Workload Profile Setup</b> - Define the queries, concurrency, and duration of run</li>
    <li><b style = 'color:#00b2b1'>Workload Execution</b> - Submit the workload job for parallel execution</li>
    <li><b style = 'color:#00b2b1'>Thread monitoring and control</b> - Monitor the status of the connections, stop them if desired</li>
    </ol>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Python Package Imports</p>

<p style = 'font-size:16px;font-family:Arial'>
Import the required packages, including the Teradata client packages (teradataml and teradatasql) and the Python multithreading utilities (concurrent.futures and threading). Note the import of the local Python file Concurrency_Utils.py, which contains some of the custom functions created to drive the lab.</p>

In [1]:
import warnings
warnings.filterwarnings('ignore')

import teradatasql, logging, time, math, json
from teradataml import *
import getpass
import datetime
import pandas as pd
import numpy as np
import concurrent.futures

from time import sleep
from random import random
from threading import current_thread, get_ident, get_native_id, Event
from IPython.display import display

import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import clear_output
%matplotlib inline

from Concurrency_Utils import *

# set up logging for the threads
logging.basicConfig(format='%(asctime)s - %(message)s', 
                    filename = 'thread_status.log', 
                    filemode = 'w', 
                    level=logging.INFO)



<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Connect to Vantage</p>

<p style = 'font-size:16px;font-family:Arial'>Before performing any operations in Vantage, we need to read system-specific information about the users, hostnames, etc.  The below code will read in a variables file (vars.json - this has been used in prior environment setup and data engineering examples) and will connect to Vantage with this information.</p> 


In [2]:
# create a local dictionary of environment-specific variables

# load vars json
with open('../../vars.json', 'r') as f:
    session_vars = json.load(f)

# Use the first business user and its compute group from the base setup
host = session_vars['environment']['host']
username = session_vars['hierarchy']['users']['business_users'][0]['username']
password = session_vars['hierarchy']['users']['business_users'][0]['password']
compute_group = session_vars['hierarchy']['users']['business_users'][0]['compute_group']

conn_info = {}
conn_info['host'] = host
conn_info['username'] = username
conn_info['password'] = password
conn_info['compute_group'] = compute_group
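
<p style = 'font-size:16px;font-family:Arial'>The lookups above imply a particular shape for vars.json. The following is a minimal sketch of that shape, inferred only from the keys the cell reads. Every value is a placeholder, the helper name build_conn_info is hypothetical, and the real file may well contain additional keys.</p>

```python
import json

# Placeholder document mirroring only the keys read in the cell above.
# All values are illustrative; they are not real hosts or credentials.
example_vars = json.loads('''
{
  "environment": {"host": "example-host.invalid"},
  "hierarchy": {
    "users": {
      "business_users": [
        {"username": "example_user",
         "password": "example_password",
         "compute_group": "CG_BusGrpA_STD"}
      ]
    }
  }
}
''')

def build_conn_info(session_vars):
    """Hypothetical helper: collect the settings for the first business user."""
    user = session_vars['hierarchy']['users']['business_users'][0]
    return {
        'host': session_vars['environment']['host'],
        'username': user['username'],
        'password': user['password'],
        'compute_group': user['compute_group'],
    }

example_conn_info = build_conn_info(example_vars)
print(sorted(example_conn_info))
```

<p style = 'font-size:16px;font-family:Arial'>Wrapping the lookups in one function makes a missing key fail in a single place with a KeyError, which can be easier to diagnose than a failure inside a worker thread.</p>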


<hr>
<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>Demo 1 - Design the Query Workload</b></p>



<p style = 'font-size:16px;font-family:Arial'>
The following code will allow the user to customize a "workload profile" which will consist of groups of parallel threads executing a defined SQL query - each of those queries will run for a specific number of iterations or duration.
<br><br>
The default values here have been designed to generate a workload that will run for approximately an hour, and will be used to show activity against the system so one can run monitoring and system performance queries.
<br><br>
To customize this workload profile, copy additional code blocks and edit the queries, number of threads, and iteration delay, then choose either an iteration count or a run time (in seconds).</p>

In [3]:
profile = []



#######################################################
######## instance definition ##########################

# This query will run for approximately 2 minutes
# lower the SAMPLE row count to shorten the runtime
qry = '''
SELECT * FROM TD_UnivariateStatistics (
  ON (SELECT * FROM retail_sample_data.sales_transaction_line_parquet_ft SAMPLE 10000000) AS InputTable
  USING
  TargetColumns ('UnitSellingPriceAmt', 'UnitCostAmt')
  Stats ('ALL')
) AS dt;
'''

# This function call adds the above query to the workload definition - provide the number of threads,
# an iteration delay, and the total run length as an iteration count OR a time in seconds
profile.extend(add_workload(qry = qry, threads = 60, delay = 2, iterations = 60))

########################################################



#######################################################
######## instance definition ##########################

# Sample query to illustrate a different mix
# short query, use duration in seconds

qry = '''SELECT COUNT(*) FROM retail_sample_data.sales_transaction_line_parquet_ft;'''
profile.extend(add_workload(qry = qry, threads = 3, delay = 2, duration = 7200))

########################################################




#######################################################
######## instance definition ##########################

###.......####

########################################################
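
<p style = 'font-size:16px;font-family:Arial'>The add_workload function lives in Concurrency_Utils.py and is not shown in this notebook. As a rough sketch of what such a function might do, the hypothetical add_workload_sketch below returns one profile entry per thread. The tuple order (query, delay, event, iterations, duration) is an assumption based on how the profile is unpacked later in the notebook, as is the rule that exactly one of iterations or duration must be given.</p>

```python
from threading import Event

def add_workload_sketch(qry, threads, delay, iterations=None, duration=None):
    """Hypothetical stand-in for add_workload: one profile entry per thread."""
    # Assumption: a workload is bounded by an iteration count OR a duration.
    if (iterations is None) == (duration is None):
        raise ValueError('provide exactly one of iterations or duration')
    # Assumption: each thread gets its own Event so it can be stopped later.
    return [(qry, delay, Event(), iterations, duration) for _ in range(threads)]

sketch_profile = []
sketch_profile.extend(add_workload_sketch('SELECT 1;', threads=60, delay=2, iterations=60))
sketch_profile.extend(add_workload_sketch('SELECT 2;', threads=3, delay=2, duration=7200))
print(len(sketch_profile))  # one entry per thread across both workloads
```

<p style = 'font-size:16px;font-family:Arial'>Because each call returns a list, profile.extend keeps the profile flat, so its length equals the total number of threads to launch.</p>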



<hr>
<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>Demo 2 - Initiate the Workload</b></p>



<p style = 'font-size:16px;font-family:Arial'>
The Python concurrent.futures module allows users to execute a function in a separate thread.  In this case, we will broadcast the function across all instances of our workload profile, and for each thread provide the query, the duration, delay, and connection information.</p>

In [4]:
# get the total number of threads in the workload profile
conn_info['num_cons'] = len(profile)
    
# create a thread pool object using concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(max_workers = conn_info['num_cons'])

# call the user function for each instance in my profile to execute them in parallel
f = [executor.submit(run_sql, q, d, e, i, dur, conn_info) for q, d, e, i, dur in profile]

# "f" is a list of Future objects, one per submitted thread

Started Thread 1212 at 2023-04-04 14:01:22.562023, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Thread 1213 at 2023-04-04 14:01:22.573624, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Thread 1214 at 2023-04-04 14:01:22.576063, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Thread 1215 at 2023-04-04 14:01:22.579141, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Thread 1216 at 2023-04-04 14:01:22.582218, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Thread 1217 at 2023-04-04 14:01:22.584708, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Thread 1218 at 2023-04-04 14:01:22.586853, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Thread 1219 at 2023-04-04 14:01:22.589339, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Thread 1220 at 2023-04-04 14:01:22.592644, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Thread 1221 at 2023-04-04 14:01:22.595126, 60 iterations, Query: "
SELECT * FROM TD_Un..."
Started Th
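
<p style = 'font-size:16px;font-family:Arial'>For readers new to concurrent.futures, the submit pattern used above can be tried without a database. In this minimal sketch, fake_run_sql is a hypothetical stand-in that sleeps instead of executing SQL, and toy_profile is an illustrative three-field profile rather than the notebook's real one.</p>

```python
import concurrent.futures
import time

def fake_run_sql(query, delay, iterations):
    """Hypothetical stand-in worker: sleeps instead of executing SQL."""
    for _ in range(iterations):
        time.sleep(delay)
    return f'completed {iterations} iterations of {query!r}'

# Illustrative profile: (query, delay in seconds, iteration count)
toy_profile = [('SELECT 1;', 0.01, 3), ('SELECT 2;', 0.01, 2)]

# One worker per profile entry, so every entry starts immediately
with concurrent.futures.ThreadPoolExecutor(max_workers=len(toy_profile)) as pool:
    futures = [pool.submit(fake_run_sql, q, d, n) for q, d, n in toy_profile]
    # leaving the with-block waits for all submitted work to finish

toy_results = [fut.result() for fut in futures]
print(toy_results)
```

<p style = 'font-size:16px;font-family:Arial'>The sketch uses a with-block for tidy cleanup, which blocks until every worker returns. The notebook cell instead keeps the executor open so the cell returns immediately and the workload continues in the background.</p>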

<hr>
<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>Demo 3 - Monitor and Control Threads</b></p>



<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>3.1 - Check Status</p>
<p style = 'font-size:16px;font-family:Arial'>
Sample code is provided below to check the status of the threads. For brevity, this notebook reports only the high-level thread state (running, finished, exception, killed). Detailed information on the threads, query execution status, etc. is written to a log file named "thread_status.log".
<br><br>
To view the log file, either double-click on it in the file browser, or open a Terminal and cat/tail the log file.</p>

In [12]:
# iterate over the futures in f and check each one's state
# once a future is done, report the function return - failed, killed, completed gracefully

for i, r in enumerate(f, start=1):
    print('--------------------------')
    status = 'Running' if r.running() else 'Finished'
    print(f'Thread {i}: {status}')
    # only call result() on completed futures so this cell never blocks
    if r.done(): print(f'---Result: {r.result()}')

--------------------------
Thread 1: Running
--------------------------
Thread 2: Running
--------------------------
Thread 3: Running
--------------------------
Thread 4: Running
--------------------------
Thread 5: Running
--------------------------
Thread 6: Running
--------------------------
Thread 7: Running
--------------------------
Thread 8: Running
--------------------------
Thread 9: Running
--------------------------
Thread 10: Running
--------------------------
Thread 11: Running
--------------------------
Thread 12: Running
--------------------------
Thread 13: Running
--------------------------
Thread 14: Running
--------------------------
Thread 15: Running
--------------------------
Thread 16: Running
--------------------------
Thread 17: Running
--------------------------
Thread 18: Running
--------------------------
Thread 19: Running
--------------------------
Thread 20: Running
--------------------------
Thread 21: Running
--------------------------
Thread 22: Runni
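
<p style = 'font-size:16px;font-family:Arial'>With many threads, a per-state count can be easier to read than one line per thread. The following is a minimal sketch using only standard Future methods. The describe helper and the two toy workers are hypothetical, and the state labels are illustrative rather than the ones run_sql reports.</p>

```python
import concurrent.futures
from collections import Counter

def describe(future):
    """Map a Future to a coarse state label without blocking."""
    if future.running():
        return 'running'
    if not future.done():
        return 'pending'
    if future.cancelled():
        return 'cancelled'
    if future.exception() is not None:
        return 'exception'
    return 'finished'

def ok():
    return 'done'

def boom():
    raise RuntimeError('simulated query failure')

# Run one successful and one failing toy worker to completion
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    demo_futures = [pool.submit(ok), pool.submit(boom)]

state_counts = Counter(describe(fut) for fut in demo_futures)
print(dict(state_counts))
```

<p style = 'font-size:16px;font-family:Arial'>Checking exception() before result() avoids re-raising a worker's error in the notebook cell, so one failed thread does not interrupt the summary.</p>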

<hr>
<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>3.2 - Manually Stop the Threads</p>
<p style = 'font-size:16px;font-family:Arial'>
A special "event" object was also passed to each thread when it was initiated. Calling the "set()" method on this event signals the thread; inside the thread, the function checks the event's status and exits if it has been set. <b>Only run this code if one wishes to stop the workload prior to the scheduled exit duration/iterations.</b>
<br><br>
To see this logic, open the Concurrency_Utils.py file and note the 'run_sql' function definition.</p>

In [10]:
# each profile entry carries the event object that was passed to its thread;
# setting the event asks that thread to exit

for q, d, e, i, dur in profile:
    # call the set() method
    e.set()
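
<p style = 'font-size:16px;font-family:Arial'>The cooperative stop described above can be sketched with threading.Event alone. The stoppable_worker function below is a hypothetical illustration of the pattern, not the actual run_sql implementation, and its return strings are made up for the example.</p>

```python
import concurrent.futures
import threading

def stoppable_worker(stop_event, delay, iterations):
    """Loop until the iteration budget is spent or stop_event is set."""
    completed = 0
    for _ in range(iterations):
        if stop_event.is_set():
            return f'killed after {completed} iterations'
        completed += 1  # a real worker would execute its query here
        # Event.wait acts as a sleep that ends early once the event is set
        stop_event.wait(delay)
    return f'completed {completed} iterations'

stop_event = threading.Event()
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(stoppable_worker, stop_event, 0.05, 1000)
    stop_event.set()  # request an early exit

stop_result = future.result()
print(stop_result)
```

<p style = 'font-size:16px;font-family:Arial'>Waiting on the event instead of calling time.sleep means a stop request takes effect within the current delay rather than after it, although a query that is already executing would still need to finish first.</p>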

<hr>
<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>The workload is now running against the system; please see Demos 2 and 3 for real-time monitoring and system performance queries</b></p>

Copyright 2023, Teradata Corporation