<header style="padding:10px;background:#f9f9f9;border-top:3px solid #00b2b1"><img id="Teradata-logo" src="https://www.teradata.com/Teradata/Images/Rebrand/Teradata_logo-two_color.png" alt="Teradata" width="220" align="right" />

<p style = 'font-size:28px;font-family:Arial;color:#E37C4D'><b>VantageCloud Lake Systems Scaling and Monitoring</b></p>
<hr>

<br>

<b style = 'font-size:24px;font-family:Arial;color:#E37C4D'>Demo 3 - VantageCloud Lake Systems and User Monitoring Deep-Dive</b>

<p style = 'font-size:16px;font-family:Arial'>This notebook illustrates example queries, written in Python and SQL, for monitoring system health, user sessions, and historical events.</p>

<p style = 'font-size:16px;font-family:Arial'>The demonstration consists of short vignettes that illustrate some basic query patterns.</p>
<ol style = 'font-size:16px;font-family:Arial'>
    <li><b style = 'color:#00b2b1'>Connect to the VantageCloud Lake System</b> - Connect as a user with access to the metrics service and performance monitoring functions</li>
    <li><b style = 'color:#00b2b1'>Current Resource Utilization</b> - The current system utilization</li>
    <li><b style = 'color:#00b2b1'>Historic Resource Utilization</b> - Queries that retrieve historical resource usage data</li>
    <li><b style = 'color:#00b2b1'>Cluster Events</b> - Queries to analyze which compute resources were available at a given time</li>
    <li><b style = 'color:#00b2b1'>Active User and Session Monitoring</b> - Queries that monitor users, SQL steps, and SQL text for active sessions. This requires the running workload provided in Demo 1</li>
    <li><b style = 'color:#00b2b1'>Query Logging</b> - Database Query Logs</li>
    </ol>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Python Package Imports</p>

<p style = 'font-size:16px;font-family:Arial'>
Import the required packages, including the Teradata client package (teradataml) and packages for data management and display control.</p>

In [None]:
import warnings
warnings.filterwarnings('ignore')

import json
from teradataml import *
import pandas as pd
import numpy as np

from IPython.display import display


<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Connect to Vantage</p>

<p style = 'font-size:16px;font-family:Arial'>Before performing any operations in Vantage, we need to connect to the system. The code below reads a variables file (vars.json, which was also used in the prior environment setup and data engineering examples) and connects to Vantage with this information. The Vantage connection is referred to as a "Context", a common Python-RDBMS connection architecture.</p>

In [None]:
# create a local dictionary of environment-specific variables

# load vars json
with open('../../vars.json', 'r') as f:
    session_vars = json.load(f)

# Use the first business user defined in the base setup
host = session_vars['environment']['host']
username = session_vars['hierarchy']['users']['business_users'][0]['username']
password = session_vars['hierarchy']['users']['business_users'][0]['password']

eng = create_context(host = host, username = username, password = password)

# confirm connection
print(eng)
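
<p style = 'font-size:16px;font-family:Arial'>If vars.json is missing an entry, the lookups above fail with a bare KeyError or IndexError. The following is a minimal sketch, not part of the original setup, of a hypothetical helper (<code>get_connection_params</code>) that reads the same keys and reports a clearer error. The sample dictionary contains made-up values and only mirrors the key layout used above.</p>

```python
def get_connection_params(session_vars, user_index=0):
    """Return (host, username, password) from a vars.json-style dictionary."""
    try:
        host = session_vars['environment']['host']
        user = session_vars['hierarchy']['users']['business_users'][user_index]
        return host, user['username'], user['password']
    except (KeyError, IndexError) as err:
        # Re-raise with a message that points at the configuration file
        raise ValueError(f'vars.json is missing an expected entry: {err!r}') from err


# Illustrative values only; a real vars.json holds environment-specific settings
sample_vars = {
    'environment': {'host': 'example-host'},
    'hierarchy': {'users': {'business_users': [
        {'username': 'example_user', 'password': 'example_password'}]}},
}

host, username, password = get_connection_params(sample_vars)
print(host, username)
```

<p style = 'font-size:16px;font-family:Arial'>The returned values can be passed to create_context in the same way as in the cell above.</p>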

<hr>
<p style = 'font-size:24px;font-family:Arial;color:#E37C4D'><b>Compute Cluster Information</b></p>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>All Compute Profiles</p>

<p style = 'font-size:16px;font-family:Arial'>Get all configured Compute Profiles in all Compute Groups, and list their status and internal identifiers.  Note that Compute Profiles that have been configured with more than one instance will have those instances listed here regardless of whether they are active or not.</p> 

In [None]:
qry = '''
/* gets compute cluster name, internal identifiers, and status */
select
    g.ComputeGroupName,
    p.ComputeProfileName,
    g.ComputeGroupUniqName,
    p.ComputeProfileUniqName,
    s.InstanceName, 
    s.CurrentState,
    s.LastReqState
from DBC.ComputeGroupsV g, DBC.ComputeProfilesV p, DBC.ComputeStatusV s
where
    g.ComputeGroupName=p.ComputeGroupName and
    p.ComputeProfileName=s.ComputeProfileName
ORDER BY 1,6
    ;'''

pd.read_sql(qry, eng)
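
<p style = 'font-size:16px;font-family:Arial'>A result shaped like the query above can be summarized in pandas to show how many instances of each Compute Profile are currently active. This is a sketch on hypothetical sample rows: the group, profile, and instance names and the 'Hibernated' state value are invented for illustration, and the column-name casing returned by your driver may differ.</p>

```python
import pandas as pd

# Hypothetical rows shaped like the result of the query above
status = pd.DataFrame({
    'ComputeGroupName': ['GroupA', 'GroupA', 'GroupA', 'GroupB'],
    'ComputeProfileName': ['ProfileA', 'ProfileA', 'ProfileA', 'ProfileB'],
    'InstanceName': ['inst-1', 'inst-2', 'inst-3', 'inst-4'],
    'CurrentState': ['Active', 'Active', 'Hibernated', 'Active'],
})


def count_active_instances(df):
    """Count instances in the 'Active' state per Compute Group and Profile."""
    active = df[df['CurrentState'] == 'Active']
    return (active.groupby(['ComputeGroupName', 'ComputeProfileName'])
                  .size()
                  .reset_index(name='ActiveInstances'))


active_counts = count_active_instances(status)
print(active_counts)
```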

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>All Active Profiles</p>

<p style = 'font-size:16px;font-family:Arial'>This query is helpful for monitoring only the active Compute Profile instances. During scaling, or when a Profile has been configured with multiple active instances, each active instance has a unique InstanceName.</p>

In [None]:
qry = '''
/* gets compute cluster name, internal identifiers */
select
    g.ComputeGroupName,
    p.ComputeProfileName,
    s.ComputeInstanceType,
    g.ComputeGroupUniqName,
    p.ComputeProfileUniqName,
    s.InstanceName 
from DBC.ComputeGroupsV g, DBC.ComputeProfilesV p, DBC.ComputeStatusV s
where
    g.ComputeGroupName=p.ComputeGroupName and
    p.ComputeProfileName=s.ComputeProfileName
    and s.CurrentState='Active'
    ;'''
pd.read_sql(qry, eng)

<hr>
<p style = 'font-size:24px;font-family:Arial;color:#E37C4D'><b>Current System Performance</b></p>

<p style = 'font-size:16px;font-family:Arial'><a href = 'https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Monitoring-Performance-and-Resource-Usage/Performance-and-Query-Monitoring/Single-Operational-View-APIs'>Single Operational View APIs</a> provide the ability to view current system and query performance. See the documentation for the full set of APIs and parameters. Some examples follow.</p>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>CPU Utilization for Compute Groups</p>

<p style = 'font-size:16px;font-family:Arial'>Use the MonitorPhysicalResourceSV API and the configuration information to view CPU utilization for each Compute Profile instance.</p>

In [None]:
qry = '''
/* overall CPU for a compute group, associated profile, and instance*/

SELECT s.InstanceName, p.ComputeProfileName, g.ComputeGroupName,  AVERAGE(dt.CPUUse) as CPUUse FROM
     syslib.MonitorPhysicalResourceSV(USING details('1')) dt, DBC.ComputeGroupsV g, DBC.ComputeProfilesV p, DBC.ComputeStatusV s

WHERE dt."Group" IS NOT NULL AND
         s.CurrentState='Active' AND
         dt."Group" = g.ComputeGroupUniqName AND
         g.ComputeGroupName=p.ComputeGroupName AND
         p.ComputeProfileName=s.ComputeProfileName
GROUP BY 1,2,3'''

pd.read_sql(qry, eng)
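
<p style = 'font-size:16px;font-family:Arial'>Once the averages are in a DataFrame, busy instances can be flagged client-side. The sketch below uses a hypothetical helper (<code>flag_busy_instances</code>) and made-up sample rows; it assumes CPUUse is reported on a 0 to 100 scale, and the 80 threshold is an arbitrary example rather than a recommended value.</p>

```python
import pandas as pd


def flag_busy_instances(cpu_df, threshold=80.0):
    """Return rows whose average CPUUse is at or above the threshold, busiest first."""
    busy = cpu_df[cpu_df['CPUUse'] >= threshold]
    return busy.sort_values('CPUUse', ascending=False).reset_index(drop=True)


# Hypothetical rows shaped like the result of the CPU query above
sample_cpu = pd.DataFrame({
    'InstanceName': ['inst-1', 'inst-2', 'inst-3'],
    'ComputeProfileName': ['ProfileA', 'ProfileA', 'ProfileB'],
    'ComputeGroupName': ['GroupA', 'GroupA', 'GroupB'],
    'CPUUse': [35.0, 92.5, 81.0],
})

busy = flag_busy_instances(sample_cpu)
print(busy)
```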

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>All Physical Resource Utilization for Primary Groups</p>

<p style = 'font-size:16px;font-family:Arial'>Use the MonitorPhysicalResourceSV API alone to view "primary cluster" resources.</p> 


In [None]:
qry = '''
SELECT  * FROM
     syslib.MonitorPhysicalResourceSV(USING details('1')) dt
WHERE "Type" = 'primary cluster'
ORDER BY 1'''

pd.read_sql(qry, eng)

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Select Physical Resource Utilization for all system components</p>

<p style = 'font-size:16px;font-family:Arial'>Use the MonitorPhysicalResourceSV API alone to view selected resource utilization columns for all components.</p>

In [None]:
qry = '''
/* Per-node, per-instance, high level information */

SELECT dt."ProcID" as "ProcID",
    dt."Type" as "Type",
    dt."Status" as "Status",
    dt."AmpCount" as "AmpCount",
    dt."Group" as "Group", 
    dt."Name" as "Name",
    dt.CPUUse as CPUUse 

    FROM
     syslib.MonitorPhysicalResourceSV(USING details('1')) AS dt;
     '''

pd.read_sql(qry, eng)

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Virtual Resource Utilization</p>

<p style = 'font-size:16px;font-family:Arial'>Use the MonitorVirtualResource API alone to view Virtual Resource utilization on the Primary Cluster.</p> 

In [None]:
qry = '''
/* Virtual Resources - doesn't show Compute Clusters */
select * from syslib.MonitorVirtualResource(USING details('1')) as x order by 1; 
'''

pd.read_sql(qry, eng)

<hr>
<p style = 'font-size:24px;font-family:Arial;color:#E37C4D'><b>Historic System Utilization</b></p>

<p style = 'font-size:16px;font-family:Arial'><a href = 'https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Monitoring-Performance-and-Resource-Usage/Performance-and-Query-Monitoring/Performance-Trends-and-History/Monitoring-Resource-Usage'>ResUsage</a>
<br>
Each resource usage foreign table represents system usage data from a specific perspective, collected during the logging period. The logging period is a system-defined interval of time during which usage data is accumulated (specified in the NominalSecs column in each table). At the end of the logging interval, usage data is written to each of the resource usage foreign tables. Views have been created that are named similarly to the foreign tables in your unmanaged object storage.</p>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>CPU Utilization</p>

<p style = 'font-size:16px;font-family:Arial'>If you need detailed statistics for each CPU within the system, use this view to evaluate parallelism of CPUs within a node.  Join to DBC tables to get the Instance, Profile, and Group for each component, and calculate percentages.</p> 

In [None]:
qry = '''
SELECT TOP 5 *
    FROM
    TD_METRIC_SVC.resusagescpuV;
     '''
display(pd.read_sql(qry, eng))

qry = '''
SELECT TOP 10 
    TO_TIMESTAMP(rs.TheTimestamp) TheTimestamp, 
    rn.ComputeGroupName, 
    rn.ComputeProfileName, 
    rn.InstanceName,
    rs.cpuid,
    rs.CentiSecs, 
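    /* multiply by 100 so the pct columns hold percentages rather than fractions */
    100 * rs.CPUIdle/rs.CentiSecs pctIdle, 
    100 * rs.CPUIoWait/rs.CentiSecs pctIoWait, 
    100 * rs.CPUUServ/rs.CentiSecs pctUserv, 
    100 * rs.CPUUexec/rs.CentiSecs pctUexec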

FROM TD_METRIC_SVC.resusagescpuV rs

JOIN (SELECT pr."Id", pr."Group", pr."ProcID", g.ComputeGroupName, p.ComputeProfileName, s.InstanceName
            FROM syslib.MonitorPhysicalResourceSV(USING details('1')) pr, DBC.ComputeStatusV s, DBC.ComputeProfilesV p, DBC.ComputeGroupsV g
            WHERE pr."Id" LIKE '%' || s.InstanceName || '%' AND
                 g.ComputeGroupName=p.ComputeGroupName and
                 p.ComputeProfileName=s.ComputeProfileName) rn
ON rn.ID = rs.path_component_id
'''

display(pd.read_sql(qry, eng))
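
<p style = 'font-size:16px;font-family:Arial'>The centisecond counters can also be converted to percentages client-side, which is convenient when the raw view output is already in a DataFrame. This is a sketch on one hypothetical row: the counter names are taken from the query above, the values are invented, and it assumes CentiSecs is never zero.</p>

```python
import pandas as pd


def add_cpu_percentages(df, counters=('CPUIdle', 'CPUIoWait', 'CPUUServ', 'CPUUexec')):
    """Add a pct<Name> column for each counter, as a percentage of CentiSecs."""
    out = df.copy()
    for col in counters:
        # 'CPUIdle' becomes 'pctIdle', 'CPUIoWait' becomes 'pctIoWait', and so on
        out['pct' + col[3:]] = 100.0 * out[col] / out['CentiSecs']
    return out


# Hypothetical counters for a single CPU over one 60-second logging period
sample_rs = pd.DataFrame({
    'cpuid': [0],
    'CentiSecs': [6000],
    'CPUIdle': [3000],
    'CPUIoWait': [600],
    'CPUUServ': [900],
    'CPUUexec': [1500],
})

cpu_pct = add_cpu_percentages(sample_rs)
print(cpu_pct[['cpuid', 'pctIdle', 'pctIoWait', 'pctUServ', 'pctUexec']])
```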

<hr>
<p style = 'font-size:24px;font-family:Arial;color:#E37C4D'><b>Historic Cluster Events</b></p>

<p style = 'font-size:16px;font-family:Arial'><a href = 'https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Monitoring-Performance-and-Resource-Usage/Performance-and-Query-Monitoring/Performance-Trends-and-History/Monitoring-Compute-Cluster-Events'>Monitoring Compute Cluster Events</a>
<br>
Compute clusters are elastic compute resources that can be created, started, stopped, hibernated and released dynamically to provide the necessary compute resources when needed, and removed when no longer needed. To better understand the behavior of these compute resources, you can leverage the views and underlying tables to gain insight into the state of your compute clusters and the events that changed a compute cluster state.</p>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Compute Profile History</p>

<p style = 'font-size:16px;font-family:Arial'>The ComputeProfileHistoryV view can be used to:</p>
<ul style = 'font-size:16px;font-family:Arial'>
<li>Determine when a compute cluster autoscaled up or down.
<li>Identify events that bring a compute cluster online or offline.
<li>Identify and validate overlap between different compute profiles in a compute group.
</ul>

In [None]:
qry = '''
SELECT TOP 20 *
    FROM
    td_metric_svc.ComputeProfileHistoryV
    ORDER BY EventTime DESC;
     '''
display(pd.read_sql(qry, eng))

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Compute Profile History Events</p>

<p style = 'font-size:16px;font-family:Arial'>The ComputeProfileHistoryEventsV view can be used to:</p>
<ul style = 'font-size:16px;font-family:Arial'>
<li>Compute the amount of time during a month when compute clusters were available.
<li>Identify time periods when compute clusters were hibernated.
</ul>

In [None]:
qry = '''
SELECT TOP 10 *
    FROM
    td_metric_svc.ComputeProfileHistoryEventsV
    ORDER BY StartTime DESC;
     '''
display(pd.read_sql(qry, eng))
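
<p style = 'font-size:16px;font-family:Arial'>To illustrate the first use case, the sketch below totals the hours a compute cluster was available within a month. Only StartTime appears in the query above; the EndTime and State column names, the state values, and the sample rows are assumptions made for illustration, so check the view's actual columns before adapting this.</p>

```python
import pandas as pd


def available_hours(events, month_start, month_end,
                    state_col='State', active_state='Active'):
    """Sum the hours of 'active' intervals that fall inside the given window."""
    window_start = pd.Timestamp(month_start)
    window_end = pd.Timestamp(month_end)
    active = events[events[state_col] == active_state]
    # Trim each interval to the window so time outside the month is not counted
    start = active['StartTime'].where(active['StartTime'] > window_start, window_start)
    end = active['EndTime'].where(active['EndTime'] < window_end, window_end)
    duration = end - start
    duration = duration[duration > pd.Timedelta(0)]
    return duration.dt.total_seconds().sum() / 3600.0


# Hypothetical events; column names other than StartTime are assumed
events = pd.DataFrame({
    'State': ['Active', 'Hibernated', 'Active'],
    'StartTime': pd.to_datetime(['2023-12-31 22:00', '2024-01-01 04:00', '2024-01-02 08:00']),
    'EndTime': pd.to_datetime(['2024-01-01 04:00', '2024-01-02 08:00', '2024-01-02 20:00']),
})

hours = available_hours(events, '2024-01-01', '2024-02-01')
print(hours)
```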

<hr>
<p style = 'font-size:24px;font-family:Arial;color:#E37C4D'><b>Active User And Query Monitoring</b></p>

<p style = 'font-size:16px;font-family:Arial'><a href = 'https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Monitoring-Performance-and-Resource-Usage/Performance-and-Query-Monitoring/Single-Operational-View-APIs'>Single Operational View APIs</a> provide the ability to view current system and query performance. See the documentation for the full set of APIs and parameters. Some examples follow.</p>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Monitoring Sessions</p>

<p style = 'font-size:16px;font-family:Arial'>Use this single operational view API for session or request resource usage statistics across VantageCloud Lake nodes. The API returns session information across the VantageCloud Lake topology (primary clusters and compute clusters) and can also be used to trace a query session to a compute node.</p>

<ol style = 'font-size:16px;font-family:Arial'>
    <li>First, show all sessions</li>
    <li>Next, show all sessions with active queries, connected to the primary cluster</li>
    </ol>


In [None]:
qry = '''
SELECT TOP 5 *
  FROM MONITORSESSIONSV(   
 USING HOSTID('-1')
       USERNAME('*')
       SESSIONID('0')
       ) AS DT;
       '''
display(pd.read_sql(qry, eng))

qry = '''
SELECT UserName, SessionNo, LogonPENo
FROM MONITORSESSIONSV(   
USING HOSTID('-1')
       USERNAME('*')
       SESSIONID('0')
       ) AS DT
WHERE AmpState = 'ACTIVE'
    AND HostID = 1
ORDER BY 1,2;
       '''
display(pd.read_sql(qry, eng))

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Parent/Child Sessions, SQL Steps, SQL Text</p>

<p style = 'font-size:16px;font-family:Arial'>MonitorSessionSV will return information about logged-in sessions for both the Parent/Primary and Child/Compute "users". This session information can be used to monitor active session SQL Steps and SQL Text using additional APIs.
<br>
<b>Note: this section requires active queries on the system. Ensure the Demo 1 workload is still running. If you see the message "No Active Sessions to retrieve SQL", rerun the workload generation.</b></p>

<ol style = 'font-size:16px;font-family:Arial'>
    <li>Get the current active session HostID, SessionNo, and VProc</li>
    <li>Join Parent to Child Sessions</li>
    <li>Take the first session, pass Session Number and VProc to MonitorSQLStepsSV</li>
    <li>Take the same session, pass Session Number and VProc to MonitorSQLTextSV</li>
    </ol>

In [None]:

qry = '''
/* First - get the current active session HostID, SessionNo, and VProc */

SELECT HostId, SessionNo, RunVprocNo, Username from MonitorSessionSV(    
  USING
    hostId('-1')
    userName('*')
    sessionId('0')
    details('1')
) 
AS dt
WHERE AmpState='ACTIVE';'''

print('Active Sessions:')
display(pd.read_sql(qry, eng))

qry = '''

/* Second, join Parent to Child Sessions */
SELECT dt.HostId PrimaryHost, 
    dt.SessionNo PrimarySession, 
    dt.RunVProcNo PrimaryVProc, 
    dc.SessionNo ComputeSession, 
    dc.HostId ComputeHost, 
    dc.RunVProcNo ComputeVproc, 
    dt.UserName PrimaryUser, 
    dc.UserName ComputeUser
FROM MonitorSessionSV(    
  USING
    hostId('-1')
    userName('*')
    sessionId('0')
    details('1')
) 
AS dt,
MonitorSessionSV(    
  USING
    hostId('-1')
    userName('*')
    sessionId('0')
    details('1')
) AS dc
WHERE dt.AmpState='ACTIVE' AND
    dt.ParentSessionNo = 0 AND
    dc.ParentSessionNo > 0 AND
    dc.ParentSessionNo = dt.SessionNo;'''

df = pd.read_sql(qry, eng)
print('Parent/Child Sessions:')
display(df)

if len(df) > 0:
    sess = df['PrimarySession'].iloc[0]
    vproc = df['PrimaryVProc'].iloc[0]

    qry = f'''
    /* Pass these values into the function */
    SELECT StepNum, EstRowCount, ActRowCount, SQLStep
    FROM MonitorSQLStepsSV(
      USING
        HostIdIn('1')
        SessionNoIn('{sess}')
        RunVProcNo('{vproc}')
        details('1')
    ) AS dt;
    '''
    print('SQL Steps:')
    df = pd.read_sql(qry, eng)
    display(df)

    qry = f'''
    SELECT SessionNo, SQLTxt from MonitorSQLTextSV(
      USING
        HostIdIn('1')
        SessionNoIn('{sess}')
        RunVProcNo('{vproc}')
        details('1')
    ) AS dt;
    '''
    print('SQL Text:')
    display(pd.read_sql(qry, eng))

else:
    print('No Active Sessions to retrieve SQL')
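
<p style = 'font-size:16px;font-family:Arial'>The cell above interpolates the session number and vproc directly into the SQL text. As a possible hardening step, the sketch below shows a hypothetical helper (<code>build_sql_steps_query</code>) that coerces each value to an integer before building the same MonitorSQLStepsSV call, so a non-numeric value raises an error instead of reaching the database. The session and vproc numbers in the example are made up.</p>

```python
def build_sql_steps_query(host_id, session_no, vproc_no):
    """Build the MonitorSQLStepsSV query text from integer-validated inputs."""
    # int() raises ValueError for anything that is not a plain number
    params = {
        'HostIdIn': int(host_id),
        'SessionNoIn': int(session_no),
        'RunVProcNo': int(vproc_no),
    }
    using = '\n    '.join(f"{name}('{value}')" for name, value in params.items())
    return (
        'SELECT StepNum, EstRowCount, ActRowCount, SQLStep\n'
        'FROM MonitorSQLStepsSV(\n'
        '  USING\n'
        f'    {using}\n'
        "    details('1')\n"
        ') AS dt;'
    )


# Made-up session and vproc numbers for illustration
steps_qry = build_sql_steps_query(1, 1234, 30719)
print(steps_qry)
```

<p style = 'font-size:16px;font-family:Arial'>The resulting string can be passed to pd.read_sql in the same way as the queries above.</p>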

<hr>
<p style = 'font-size:24px;font-family:Arial;color:#E37C4D'><b>Query History and Logging</b></p>

<p style = 'font-size:16px;font-family:Arial'><a href = 'https://docs.teradata.com/r/Teradata-VantageCloud-Lake/Monitoring-Performance-and-Resource-Usage/Performance-and-Query-Monitoring/Performance-Trends-and-History/Monitoring-Queries'>Database Query Logging (DBQL)</a> can be used to monitor prior queries and query performance.</p>

<hr>

<p style = 'font-size:18px;font-family:Arial;color:#E37C4D'>Query Logs - Compute Queries</p>

<p style = 'font-size:16px;font-family:Arial'>Use the td_metric_svc.dbqlogv view to retrieve query logs for queries that ran on a Compute Cluster. This query assumes a parent-child query ID relationship, similar to the parent-child relationship between Sessions shown above.
<br><br>
Next, we can join this to some of the system queries to assemble a more complete picture of Compute instances used.</p>

 

In [None]:
qry = '''
-- This query returns info on the session and instance used

select TOP 10
    a.StartTime, 
    a.QueryID, 
    a.SessionID, 
    a.path_component_id PrimaryClusterID, 
    b.path_component_id ComputeClusterID, 
    b.QueryID ComputeQueryID, 
    a.querytext, 
    b.AcctString  
from td_metric_svc.dbqlogv a
inner join td_metric_svc.dbqlogv b on a.QueryID = b.ParentQueryID
order by a.StartTime DESC;'''

display(pd.read_sql(qry, eng))

qry = '''
--find User queries executed on Compute Group

SELECT TOP 10
    a.queryid, 
    a.UserName, 
    a.StartTime, 
    a.sessionid, 
    a.ProcID VProcID, 
    rn.ProcID, 
    a.querytext, 
    rn.ComputeGroupName, 
    rn.ComputeProfileName, 
    rn.InstanceName

/* First - get logged queries that have run on the Compute Group */
FROM td_metric_svc.dbqlogv a
inner join td_metric_svc.dbqlogv b 
    on a.QueryID = b.ParentQueryID

/* next, join the COGID from query log to the compute group friendly name */
JOIN (SELECT pr."Id", pr."Group", pr."ProcID", g.ComputeGroupName, p.ComputeProfileName, s.InstanceName
            FROM syslib.MonitorPhysicalResourceSV(USING details('1')) pr, DBC.ComputeStatusV s, DBC.ComputeProfilesV p, DBC.ComputeGroupsV g
            WHERE pr."Id" LIKE '%' || s.InstanceName || '%' AND
                 g.ComputeGroupName=p.ComputeGroupName and
                 p.ComputeProfileName=s.ComputeProfileName) rn
            ON rn.ID = b.path_component_id

GROUP BY 1,2,3,4,5,6,7,8,9,10
order by a.StartTime DESC;'''

display(pd.read_sql(qry, eng))
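
<p style = 'font-size:16px;font-family:Arial'>The self-join on QueryID = ParentQueryID used in both queries above can be reproduced in pandas to see how primary and compute rows pair up. This sketch uses a small hypothetical log; the IDs, the path_component_id values, and the convention that primary queries carry a ParentQueryID of 0 are assumptions for illustration only.</p>

```python
import pandas as pd

# Hypothetical query log: 101 and 102 ran on the primary cluster,
# 201 and 202 are compute-cluster children of query 101
log = pd.DataFrame({
    'QueryID': [101, 102, 201, 202],
    'ParentQueryID': [0, 0, 101, 101],
    'path_component_id': ['primary-1', 'primary-1', 'compute-a', 'compute-b'],
})

# Same shape as: dbqlogv a INNER JOIN dbqlogv b ON a.QueryID = b.ParentQueryID
pairs = log.merge(log, left_on='QueryID', right_on='ParentQueryID',
                  suffixes=('_primary', '_compute'))

print(pairs[['QueryID_primary', 'QueryID_compute', 'path_component_id_compute']])
```

<p style = 'font-size:16px;font-family:Arial'>Query 102 has no child rows, so the inner join drops it; only queries that ran on a Compute Cluster appear in the result.</p>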

Copyright 2023, Teradata Corporation
VantageCloud Lake Systems Monitoring Lab Notebook v1.0