# Db2 Database Health Check
This notebook gathers the information that can easily be collected over a database connection to perform a health check on a Db2 database. It must be run separately for each Db2 database, even though it also includes some instance-level checks. While some checks are automated, most of them require human judgement; this notebook mostly just provides the data.

## Instructions for running
1. Install Jupyter Notebook using Anaconda (https://www.anaconda.com/distribution/)
    - Anaconda can be installed on your laptop or a VM on your laptop: anywhere you can connect to the databases in question. This works well if you are working alone, or have to connect to different VPNs to reach different databases.
    - Anaconda can be installed on a central VM or server that can connect to the databases you wish to work with. This works well if you are working with a team of DBAs and only need to work with databases on that one network. Anaconda works just fine on Ubuntu if you are looking for a free option.
    - Anaconda can be installed directly on the database server. This is generally my last choice, as I would rather not run an HTTP server on my database server. I also rarely care about only one database server.
1. Copy this notebook to the computer you've installed Jupyter Notebook on. I'll refer to this as your Jupyter Notebook server. 
1. Create a separate file to store environment variables. I've called mine ember_variables.py, and I run it in a cell below. This allows you to easily share the notebook without also sharing your IDs, passwords, and other sensitive information. It also makes using git or other source control easy, so you can keep notebooks updated across multiple locations. The format for this file is laid out below.
1. I strongly recommend using the nbextensions module to configure a table of contents for navigating this document. The options for this should appear at the bottom of the Edit menu after you have installed the libraries in the first code cell.
1. Run the cells up through the database connection cell one at a time so that errors can be detected and dealt with immediately. After the database connection cell, all further cells can be run using "Run All Below" from the Cell menu.

### Format for variables file
The ember_variables.py file has a format like this:
```python
NA1_User='yourid'
NA1_PW='yourpw'

NA1_Host='server1.example.com'
NA1_insts = ('db2inst1', 'db2inst2', 'db2inst3', 'db2inst4')
NA1_ports = {'db2inst1': 50001, 'db2inst2': 50002, 'db2inst3':50003, 'db2inst4':50004}
NA1_dbs = {'db2inst1': ['SAMPLE1'], 'db2inst2': ['SAMPLE2'], 'db2inst3':['SAMPLE3'], 'db2inst4':['SAMPLE4','SAMPLE5']}
```
Feel free to structure things differently and if you have any good ideas in this area, please share them.
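
As a rough illustration of how this layout can be consumed, the sketch below picks one instance and assembles a connection URL from it. The helper name `build_db2_url` and the sample values are hypothetical, and percent-encoding the credentials with `quote_plus` is an assumption that only matters if your ID or password contains characters such as `@` or `/`.

```python
from urllib.parse import quote_plus

def build_db2_url(user, password, host, port, db):
    """Return a SQLAlchemy-style URL for the db2+ibm_db dialect (hypothetical helper)."""
    # quote_plus protects characters such as '@' or '/' inside credentials
    return "db2+ibm_db://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, db)

# Hypothetical values laid out the same way as the variables file
NA1_Host = 'server1.example.com'
NA1_ports = {'db2inst1': 50001, 'db2inst4': 50004}
NA1_dbs = {'db2inst1': ['SAMPLE1'], 'db2inst4': ['SAMPLE4', 'SAMPLE5']}

inst = 'db2inst4'
# Second database on the chosen instance
url = build_db2_url('yourid', 'p@ss/word', NA1_Host, NA1_ports[inst], NA1_dbs[inst][1])
print(url)
```

Keeping ports and database lists keyed by instance name means that only the `inst` value has to change when pointing the notebook at a different instance.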

## Set up the environment
### Install Libraries
Run the following cell the first time you use this notebook on a specific Jupyter Notebook server. If anything is installed, restart the kernel using the 'Kernel' menu at the top of this notebook.

In [None]:
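import sys,os,os.path
# Raw string so the backslashes in the Windows path are not treated as escape sequences
os.environ['IBM_DB_HOME']=r'C:\Program Files\IBM\SQLLIB'

# Check to see if the libraries already have been installed
# importlib.util must be imported explicitly; "import importlib" alone does not guarantee it is loaded
import importlib.util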

# Check for ibm_db_sa.  If it exists, it's safe to assume that the other requirements
# are already installed.
spec = importlib.util.find_spec("ibm_db_sa")
if spec is None:
    print("Installing prerequisites.")
    !pip install ipython-sql
    !pip install "ibm-db==2.0.8a"
    !pip install ibm_db_sa
else:
    print("sql magic, ibm_db and ibm_db_sa already installed.")
spec = importlib.util.find_spec("jupyter_contrib_nbextensions")
if spec is None:
    print("Installing prerequisites.")
    !pip install jupyter_contrib_nbextensions
    !pip install jupyter_nbextensions_configurator
else:
    print("jupyter_contrib_nbextensions is already installed.")


Restart the Kernel if this is your first time installing the above. The next steps will fail unless you do this.
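
If you would rather confirm each prerequisite individually instead of inferring the rest from `ibm_db_sa`, something along these lines may help. It is only a sketch: the helper `missing_modules` is hypothetical, and the module names in `required` are assumed to be the import names of the packages installed above (`sql` being the import name used by ipython-sql).

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed
    return [n for n in names if importlib.util.find_spec(n) is None]

# Assumed import names for the packages installed in the cell above
required = ["ibm_db", "ibm_db_sa", "sql", "jupyter_contrib_nbextensions"]
missing = missing_modules(required)
if missing:
    print("Not yet installed: " + ", ".join(missing))
else:
    print("All prerequisites found.")
```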

### Import the modules and load the SQL magic
Required each time the kernel for this notebook is started or restarted

In [None]:
import ibm_db
import ibm_db_sa
import sqlalchemy
%load_ext sql
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
import matplotlib.dates as mdates
from datetime import datetime
import pandas as pd
from IPython.display import display, HTML, Markdown
import nbextensions

%matplotlib inline

### Set Basic Variables and Connect to Database
Connect to the database. Change the values in your variables file to match the environment you're connecting to. The format for this file is provided above.

In [None]:
# Define filename for passwords
filename = 'ember_variables.py'
# source the file
%run $filename

In [None]:
# This is the database connection Cell
user=NA1_User
host=NA1_Host
inst='b2cnapr'

password=NA1_PW
db=NA1_dbs[inst][0]
port=NA1_ports[inst]

%sql db2+ibm_db://$user:$password@$host:$port/$db

After you have a successful connection, all further cells in this notebook can be executed using the "Run All Below" option on the Cell Menu

In [None]:
#Configure SQL Magic in a few nice ways
%config SqlMagic.style = 'MSWORD_FRIENDLY'
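# Use the full option names; the short forms can be ambiguous in newer pandas releases
pd.set_option('display.max_rows', 4096)
pd.set_option('display.max_columns', 4096)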

In [None]:
# Functions that may be used in multiple other cells
def highlight_equals(s,threshold,column):
    # Highlight the whole row when any of the listed columns equals the threshold
    is_match = pd.Series(data=False, index=s.index)
    is_match[column] = s.loc[column] == threshold
    return ['background-color: yellow' if is_match.any() else '' for v in is_match]

## Overall System Information<a class="anchor" id="system-info"></a>

In [None]:
%%sql system_info << SELECT OS_NAME 
    , HOST_NAME 
    , OS_FULL_VERSION 
    , OS_KERNEL_VERSION 
    , OS_ARCH_TYPE 
    , CPU_TOTAL 
    , CPU_ONLINE 
    , CPU_CONFIGURED 
    , CPU_SPEED 
    , CPU_HMT_DEGREE 
    , CPU_CORES_PER_SOCKET 
    , MEMORY_TOTAL 
    , MEMORY_FREE 
    , VIRTUAL_MEM_TOTAL 
    , VIRTUAL_MEM_RESERVED 
    , VIRTUAL_MEM_FREE 
    , CPU_LOAD_SHORT 
    , CPU_LOAD_MEDIUM 
    , CPU_LOAD_LONG 
    , CPU_USAGE_TOTAL 
    , CPU_USER 
    , CPU_IDLE 
    , CPU_IOWAIT 
    , CPU_SYSTEM 
    , SWAP_PAGE_SIZE 
    , SWAP_PAGES_IN 
    , SWAP_PAGES_OUT 
FROM TABLE(SYSPROC.ENV_GET_SYSTEM_RESOURCES()) AS T 
with ur

In [None]:
# Print out System information
sys_info_df=system_info.DataFrame()
sys_info_df

In [None]:
# Automatic Checks for Server Info
display(Markdown("### CHECK: Paging space is either 50% of physical memory or a maximum of 16 GB"))
mem_total=int(sys_info_df.loc[0]['memory_total'])
# Paging space is taken as virtual memory minus physical memory
virt_mem=int(sys_info_df.loc[0]['virtual_mem_total']) - mem_total
display(Markdown("Physical Memory: "+str(mem_total)))
display(Markdown("Paging Space (virtual minus physical): "+str(virt_mem)))
# Threshold is half of physical memory, capped at 16384 (16 GB)
thresh=min(int(mem_total/2), 16384)
if virt_mem < thresh :
    display(Markdown("**Fail**: Paging space of **"+str(virt_mem)+"** is less than **"+str(thresh)+"**. It should be greater than or equal."))
else:
    display(Markdown("Paging space of "+str(virt_mem)+" is acceptable given the real memory size of "+str(mem_total)+"."))
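
The paging space rule checked above can be isolated into a small function, which makes it easy to try against known numbers before trusting it on a real server. This is a minimal sketch; the function names are hypothetical and all values are assumed to be in MB.

```python
def paging_threshold_mb(memory_total_mb, cap_mb=16384):
    """Minimum paging space: half of physical memory, capped at cap_mb."""
    return min(memory_total_mb // 2, cap_mb)

def paging_space_ok(memory_total_mb, virtual_mem_total_mb):
    """True when (virtual - physical) memory meets the threshold."""
    paging_mb = virtual_mem_total_mb - memory_total_mb
    return paging_mb >= paging_threshold_mb(memory_total_mb)

# 8 GB server with 4 GB of paging space: meets the 50% rule
print(paging_space_ok(8192, 12288))
# 64 GB server with 8 GB of paging space: below the 16 GB cap
print(paging_space_ok(65536, 73728))
```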


## Db2 Version and Fix Pack

In [None]:
%%sql level_info << SELECT INST_NAME 
    , IS_INST_PARTITIONABLE 
    , NUM_DBPARTITIONS 
    , INST_PTR_SIZE 
    , RELEASE_NUM 
    , SERVICE_LEVEL 
    , BLD_LEVEL 
    , PTF 
    , FIXPACK_NUM 
    , NUM_MEMBERS 
FROM SYSIBMADM.ENV_INST_INFO 
with ur

In [None]:
# Print out Db2 Level information
level_info_df=level_info.DataFrame()
level_info_df

## Db2 Licensed Product(s)

In [None]:
%%sql license_info << SELECT INSTALLED_PROD 
    , INSTALLED_PROD_FULLNAME 
    , LICENSE_INSTALLED 
    , PROD_RELEASE 
    , LICENSE_TYPE 
from SYSIBMADM.ENV_PROD_INFO 
with ur

In [None]:
# Print out License information, and highlight items that are actually licensed
lic_info_df=license_info.DataFrame()
lic_info_df.style.apply(highlight_equals,threshold='Y',column=['license_installed'], axis=1)

## Db2 Configuration
### Db2 Registry

In [None]:
%%sql registry_settings << SELECT REG_VAR_NAME 
    , REG_VAR_VALUE 
    , IS_AGGREGATE 
    , AGGREGATE_NAME 
    , LEVEL 
from SYSIBMADM.REG_VARIABLES 
with ur

In [None]:
#Print out Db2 Registry Settings
reg_df=registry_settings.DataFrame()
reg_df=reg_df.set_index('reg_var_name')
reg_df

### DBM CFG

In [None]:
%%sql dbm_settings << SELECT NAME 
    , VALUE 
    , VALUE_FLAGS 
    , DEFERRED_VALUE 
    , DEFERRED_VALUE_FLAGS 
from SYSIBMADM.DBMCFG 
with ur

In [None]:
# Print out all DBM cfg settings
dbm_df=dbm_settings.DataFrame()
dbm_df=dbm_df.set_index('name')
pd.set_option('display.max_rows', 4096)
display(dbm_df)

#### Automatic Checks for DBM CFG

In [None]:
# Automatic Checks for DBM CFG
# Check for deferred settings not in effect
for index, row in dbm_df.iterrows():
    if row['VALUE'] != row['deferred_value']:
        display(Markdown("**WARNING** Deferred value not in effect! "+index.upper()+" value of **"+str(row['VALUE'])+"** and deferred value of **"+str(row['deferred_value'])+"** are different!"))
# Verify diag level is 3
if int(dbm_df.loc['diaglevel']['VALUE']) == 3:
    print("DIAGLEVEL is 3")
else:
    display(Markdown("**WARNING** DIAGLEVEL is "+str(dbm_df.loc['diaglevel']['VALUE'])+", not 3"))
# Verify notify level is 3
if int(dbm_df.loc['notifylevel']['VALUE']) == 3:
    print("NOTIFYLEVEL is 3")
else:
    display(Markdown("**WARNING** NOTIFYLEVEL is "+str(dbm_df.loc['notifylevel']['VALUE'])+", not 3"))
# Verify AUTHENTICATION is not CLIENT
if dbm_df.loc['authentication']['VALUE'] == 'CLIENT':
    display(Markdown("**WARNING** AUTHENTICATION is "+str(dbm_df.loc['authentication']['VALUE'])))
else:
    print("AUTHENTICATION is "+dbm_df.loc['authentication']['VALUE'])
# Check to see if SVCENAME is a number or a string, and if it's set to a common default
try:
    svcename=int(dbm_df.loc['svcename']['VALUE'])
except (TypeError, ValueError):
    print ("SVCENAME is "+str(dbm_df.loc['svcename']['VALUE'])+".")
else:
    display(Markdown("**WARNING** SVCENAME is "+str(dbm_df.loc['svcename']['VALUE'])+". This is a number, and it is better to set SVCENAME to a service name that is defined in /etc/services."))   
    if (svcename >= 50000 and svcename < 50011) or (svcename >= 60000 and svcename < 60011)  :
        display(Markdown("**WARNING** SVCENAME is "+str(dbm_df.loc['svcename']['VALUE'])+". This number is an often-used default."))
# Verify SYSMON_GROUP is set to something
if dbm_df.loc['sysmon_group']['VALUE'] is None:
    display(Markdown("**WARNING** SYSMON_GROUP is not set. Recommend setting it to something"))
else:
    print ("SYSMON_GROUP is "+str(dbm_df.loc['sysmon_group']['VALUE'])+".")
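
The SVCENAME logic above can also be expressed as a small classifier, which is easier to exercise with sample values. This is a sketch only; `classify_svcename` and its return labels are hypothetical, while the port ranges are the ones used in the check above.

```python
def classify_svcename(value):
    """Classify an SVCENAME value as 'unset', 'name', 'default-port' or 'port'."""
    if value is None or str(value).strip() == "":
        return "unset"
    try:
        port = int(value)
    except (TypeError, ValueError):
        # Not numeric, so it is presumably a service name from /etc/services
        return "name"
    if 50000 <= port < 50011 or 60000 <= port < 60011:
        return "default-port"
    return "port"

for sample in ("db2c_db2inst1", "50000", "50500", None):
    print(sample, "->", classify_svcename(sample))
```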

#### DBM CFG Values to Pay Special Attention to in Manual Verification

In [None]:
# Print DBM Cfg settings to pay special attention to, which are not included in the automatic warnings
dft_mon_count = 0
sys_group_count = 0
# Report values of INTRA_PARALLEL and INSTANCE_MEMORY
display(Markdown("INTRA_PARALLEL value of "+str(dbm_df.loc['intra_parallel']['VALUE'])))
display(Markdown("INSTANCE_MEMORY value of "+str(dbm_df.loc['instance_memory']['VALUE'])+" 4K pages, automatic setting of "+str(dbm_df.loc['instance_memory']['value_flags'])))
# Report values of DFT_MON parameters
for index, row in dbm_df.iterrows():
    # Report values of DFT_MON parameters
    if index.startswith('dft_mon') :
        if dft_mon_count == 0:
            display(Markdown("DFT_MON settings:"))
            dft_mon_count += 1
        if len(index.upper()) < 16: 
            print(index.upper()+"		="+str(row['VALUE']))
        else: 
            print(index.upper()+"	="+str(row['VALUE']))
    # Report values of system groups
    elif index.endswith('group') :
        if sys_group_count == 0 :
            display(Markdown("System groups:"))
            sys_group_count += 1
        print(index.upper()+"		="+str(row['VALUE']))

## DB CFG

In [None]:
%%sql db_settings << SELECT NAME 
        , VALUE 
        , VALUE_FLAGS 
        , DEFERRED_VALUE 
        , DEFERRED_VALUE_FLAGS 
    from SYSIBMADM.DBCFG 
    with ur

In [None]:
display(Markdown('### Database Configuration Information for '+db+' on '+inst))
db_df=db_settings.DataFrame()
db_df=db_df.set_index('name')
pd.set_option('display.max_rows', 4096)


#### Automatic Checks for this Database

In [None]:
## Automatic Checks for this Database
# LOCKTIMEOUT is set to something other than -1 and is no more than 180
if int(db_df.loc['locktimeout']['VALUE']) == -1 :
    display(Markdown("**WARNING** LOCKTIMEOUT is -1."))
elif int(db_df.loc['locktimeout']['VALUE']) > 180  :
    display(Markdown("**WARNING** LOCKTIMEOUT is "+str(db_df.loc['locktimeout']['VALUE'])+"."))
else :
    display(Markdown("LOCKTIMEOUT is "+str(db_df.loc['locktimeout']['VALUE'])+"."))

In [None]:
# IF HADR, then BLOCKNONLOGGED
## NOT WORKING when HADR is OFF
if (db_df.loc['hadr_local_host']['VALUE'] is not None and db_df.loc['hadr_remote_host']['VALUE'] is not None and db_df.loc['hadr_local_svc']['VALUE'] is not None and db_df.loc['hadr_remote_svc']['VALUE'] is not None) or (db_df.loc['hadr_target_list']['VALUE'] is not None) :
    if db_df.loc['blocknonlogged']['VALUE'].upper() == 'NO':
        display(Markdown("**WARNING** HADR appears to be configured, but BLOCKNONLOGGED is set to NO."))

In [None]:
# TRACKMOD ON
if db_df.loc['trackmod']['VALUE'].upper() in ('OFF', 'NO', 'FALSE') :
    display(Markdown("**WARNING** TRACKMOD is set to "+db_df.loc['trackmod']['VALUE'].upper()+"."))
else:
    display(Markdown("TRACKMOD is set to "+db_df.loc['trackmod']['VALUE'].upper()+"."))

In [None]:
# DFT QUERYOPT is 5
if int(db_df.loc['dft_queryopt']['VALUE']) != 5 :
    display(Markdown("**WARNING** dft_queryopt has been changed from the default of 5. It is set to "+db_df.loc['dft_queryopt']['VALUE']))
else :
    display(Markdown("DFT_QUERYOPT is 5"))

In [None]:
# DFT_DEGREE (is 1)
if int(db_df.loc['dft_degree']['VALUE']) != 1 :
    display(Markdown("**WARNING** DFT_DEGREE has been changed from the default of 1. It is set to "+db_df.loc['dft_degree']['VALUE']))
else :
    display(Markdown("DFT_DEGREE is 1"))

In [None]:
# Verify infinite logging is not used
if int(db_df.loc['logsecond']['VALUE']) == -1 :
    display(Markdown("**WARNING** infinite logging is in use. LOGSECOND is set to "+db_df.loc['logsecond']['VALUE']))
else :
    display(Markdown("LOGSECOND is "+db_df.loc['logsecond']['VALUE']))
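
The automatic checks above all follow the same pattern: read a parameter, compare it to an expected value, and warn on a mismatch. One possible way to keep that manageable as more checks are added is a small rule table. The sketch below is not how the notebook is written; `check_db_cfg` and the plain dictionary standing in for `db_df` are assumptions for illustration.

```python
def check_db_cfg(cfg):
    """Return warning strings for a dict of DB CFG parameter name -> value."""
    # Each rule: (parameter, test returning True when the value is acceptable, note)
    rules = [
        ("locktimeout", lambda v: 0 <= int(v) <= 180, "should be set and at most 180"),
        ("dft_queryopt", lambda v: int(v) == 5, "changed from the default of 5"),
        ("dft_degree", lambda v: int(v) == 1, "changed from the default of 1"),
        ("logsecond", lambda v: int(v) != -1, "infinite logging is in use"),
    ]
    warnings = []
    for name, is_ok, note in rules:
        value = cfg[name]
        if not is_ok(value):
            warnings.append("{} is {}: {}".format(name.upper(), value, note))
    return warnings

# Hypothetical configuration values, stored as strings as they are in SYSIBMADM.DBCFG
sample_cfg = {"locktimeout": "-1", "dft_queryopt": "5", "dft_degree": "1", "logsecond": "20"}
for line in check_db_cfg(sample_cfg):
    print(line)
```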

#### Manual Checks for this database

In [None]:
## Manual Checks for this Database
# STMT_CONC
display(Markdown("STMT_CONC is set to "+db_df.loc['stmt_conc']['VALUE']))
# CUR_COMMIT
display(Markdown("CUR_COMMIT is set to "+db_df.loc['cur_commit']['VALUE']))
# Encrypted?
display(Markdown("ENCRYPTED_DATABASE is set to "+db_df.loc['encrypted_database']['VALUE']))
# SELF_TUNING_MEM and all STMM areas
display(Markdown("#### Settings related to STMM :"))
display(Markdown("SELF_TUNING_MEM is set to "+db_df.loc['self_tuning_mem']['VALUE']))
print("	INSTANCE_MEMORY is set to "+dbm_df.loc['instance_memory']['VALUE']+" - "+dbm_df.loc['instance_memory']['value_flags'])
print("	DATABASE_MEMORY is set to "+db_df.loc['database_memory']['VALUE']+" - "+db_df.loc['database_memory']['value_flags'])
print("	SHEAPTHRES_SHR is set to "+db_df.loc['sheapthres_shr']['VALUE']+" - "+db_df.loc['sheapthres_shr']['value_flags'])
print("	SHEAPTHRES is set to "+dbm_df.loc['sheapthres']['VALUE']+" - "+dbm_df.loc['sheapthres']['value_flags'])
print("	SORTHEAP is set to "+db_df.loc['sortheap']['VALUE']+" - "+db_df.loc['sortheap']['value_flags'])
print("	PCKCACHESZ is set to "+db_df.loc['pckcachesz']['VALUE']+" - "+db_df.loc['pckcachesz']['value_flags'])
bp_sizes=%sql SELECT BP_NAME \
    , pagesize \
    , BP_CUR_BUFFSZ * pagesize /1024 /1024 as cur_bp_size_mb \
    , case when AUTOMATIC = 1 then 'AUTOMATIC' else 'STATIC' end as AUTOMATIC \
from table(mon_get_bufferpool(null,-2)) mgbp \
    join syscat.bufferpools bp on bp.bpname=mgbp.bp_name \
where BP_NAME not like 'IBMSYSTEM%' \
with ur 
bp_df=bp_sizes.DataFrame()
bp_df=bp_df.set_index('bp_name')
print(bp_df)

# Automatic maintenance 
display(Markdown("#### Automatic Maintenance:"))
display(Markdown("AUTO_MAINT is set to "+str(db_df.loc['auto_maint']['VALUE'])))
display(Markdown("	AUTO_DB_BACKUP is set to "+str(db_df.loc['auto_db_backup']['VALUE'])))
display(Markdown("	AUTO_TBL_MAINT is set to "+str(db_df.loc['auto_tbl_maint']['VALUE'])))
display(Markdown("		AUTO_RUNSTATS is set to "+str(db_df.loc['auto_runstats']['VALUE'])))
display(Markdown("			AUTO_STMT_STATS is set to "+str(db_df.loc['auto_stmt_stats']['VALUE'])))
display(Markdown("			AUTO_STATS_VIEWS is set to "+str(db_df.loc['auto_stats_views']['VALUE'])))
display(Markdown("			AUTO_SAMPLING is set to "+str(db_df.loc['auto_sampling']['VALUE'])))
display(Markdown("		AUTO_REORG is set to "+str(db_df.loc['auto_reorg']['VALUE'])))


In [None]:
# Print out the entire db cfg
display(db_df)

## Memory

### Current Memory Layout

In [None]:
%%sql inst_memory << select memory_set_type || ' ' 
    || db_name as memory_set 
    , sum(memory_set_used)/1024 as used_mb 
from table(mon_get_memory_set(NULL,NULL,-2)) 
group by memory_set_type, db_name
with ur

In [None]:
# Show balance of memory at the instance level
display(Markdown("### Current Memory Layout for "+inst))
display(inst_memory)
df=inst_memory.DataFrame()
df[['used_mb']]=df[['used_mb']].astype(float)

In [None]:
%%sql db_memory << select memory_set_type|| ' ' 
    || memory_pool_type as memory_pool 
    , sum(memory_pool_used)/1024 as used_mb 
from table(mon_get_memory_pool(NULL,:db,-2)) 
where db_name=upper(:db) 
group by memory_set_type, memory_pool_type

In [None]:
# Show balance of memory by database
display(Markdown("### Current Memory Layout for "+db+" on "+inst))
display(db_memory)
db_mem_df=db_memory.DataFrame()
db_mem_df[['used_mb']]=db_mem_df[['used_mb']].astype(float)

### Layout with Maximum Sort
Sort memory is only allocated as needed. This diagram uses the maximum possible sort allocation to understand the balance when sorts are occurring.

In [None]:
# Add memory layout with maximum sort
db_mem_sort_df=db_mem_df.copy()
db_mem_sort_df=db_mem_sort_df.set_index('memory_pool')
display(db_mem_sort_df.loc['DATABASE SHARED_SORT', 'used_mb'])
# SHEAPTHRES_SHR is in 4 KB pages; convert it to MB.
# A single .loc[row, column] assignment updates the DataFrame itself rather than a temporary copy.
db_mem_sort_df.loc['DATABASE SHARED_SORT', 'used_mb']=float(db_df.loc['sheapthres_shr']['VALUE']) * 4 / 1024
display(db_mem_sort_df)
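
The conversion used above is simple arithmetic: SHEAPTHRES_SHR is expressed in 4 KB pages, so multiplying by 4 gives KB and dividing by 1024 gives MB. A minimal sketch, with a hypothetical helper name and sample value:

```python
def pages_to_mb(pages, page_size_kb=4):
    """Convert a count of pages to megabytes."""
    return float(pages) * page_size_kb / 1024

# A SHEAPTHRES_SHR of 262144 pages allows up to 1024 MB of shared sort memory
print(pages_to_mb(262144))
```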

### Memory Breakdown by Buffer Pool

In [None]:
%%sql bp_sizes << SELECT BP_NAME 
    , pagesize 
    , BP_CUR_BUFFSZ * pagesize /1024 /1024 as cur_bp_size_mb 
    , case when AUTOMATIC = 1 then 'AUTOMATIC' else 'STATIC' end as AUTOMATIC 
from table(mon_get_bufferpool(null,-2)) mgbp 
    join syscat.bufferpools bp on bp.bpname=mgbp.bp_name 
where BP_NAME not like 'IBMSYSTEM%' 
with ur 

In [None]:
%%sql bp_grp_sizes << SELECT BP_NAME 
    , BP_CUR_BUFFSZ * pagesize /1024 /1024 as cur_bp_size_mb 
from table(mon_get_bufferpool(null,-2)) mgbp 
    join syscat.bufferpools bp on bp.bpname=mgbp.bp_name 
where BP_NAME not like 'IBMSYSTEM%' 
and BP_CUR_BUFFSZ * pagesize /1024 /1024 > 1000 
union 
SELECT case pagesize when 4096 then '4K_MISC' 
        when 8192 then '8K_MISC' 
        when 16384 then '16K_MISC' 
        when 32768 then '32K_MISC' 
        else 'OTHER' end as BP_NAME 
    , (sum(BP_CUR_BUFFSZ) * pagesize) /1024 /1024 as cur_bp_size_mb 
from table(mon_get_bufferpool(null,-2)) mgbp 
    join syscat.bufferpools bp on bp.bpname=mgbp.bp_name 
where BP_NAME not like 'IBMSYSTEM%' 
and BP_CUR_BUFFSZ * pagesize /1024 /1024 <= 1000 
group by pagesize 
with ur 

In [None]:
# Memory Breakdown by Buffer Pool

display(Markdown("#### Buffer Pool Layout for "+db+" on "+inst))
%sql db2+ibm_db://$user:$password@$host:$port/$db
bp_df=bp_sizes.DataFrame()
#bp_df=bp_df.set_index('bp_name')

bp_df[['cur_bp_size_mb']]=bp_df[['cur_bp_size_mb']].astype(float)

display(Markdown("#### Buffer Pools by size"))

display(bp_df)

In [None]:
# Group buffer pools for better display
display(Markdown("#### Grouping Small Buffer Pools for Better Display"))

bp_grp_size_df=bp_grp_sizes.DataFrame()
bp_grp_size_df=bp_grp_size_df.set_index('bp_name')

display(bp_grp_size_df)
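
The grouping done in SQL above could also be done on the Python side after fetching the ungrouped buffer pool list. The sketch below illustrates the idea under stated assumptions: `group_small_pools` is a hypothetical helper, the input tuples are made-up values, and unlike the SQL it labels every page size as `<n>K_MISC` rather than falling back to `OTHER`.

```python
from collections import defaultdict

def group_small_pools(pools, min_mb=1000):
    """Keep pools larger than min_mb; sum the rest into one bucket per page size.

    pools: iterable of (bp_name, pagesize_bytes, size_mb)
    """
    grouped = {}
    misc = defaultdict(float)
    for name, pagesize, size_mb in pools:
        if size_mb > min_mb:
            grouped[name] = size_mb
        else:
            misc["{}K_MISC".format(pagesize // 1024)] += size_mb
    grouped.update(misc)
    return grouped

# Hypothetical buffer pools
pools = [("BP_BIG", 4096, 2000.0), ("BP_A", 4096, 100.0),
         ("BP_B", 4096, 50.0), ("BP_C", 32768, 10.0)]
print(group_small_pools(pools))
```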

## Security

In [None]:
print('Security Information for ',db,' on ',inst)

### IDs with SECADM

In [None]:
%%sql secadm_ids << select grantee 
from syscat.dbauth 
where securityadmauth='Y' 
with ur

In [None]:
# Print IDs with SECADM
display(secadm_ids)

### IDs with DBADM

In [None]:
%%sql dbadm_ids << select grantee 
from syscat.dbauth 
where dbadmauth='Y' 
with ur

In [None]:
#Print ids with DBADM
display(dbadm_ids)

### IDs with DATAACCESS

In [None]:
%%sql dataaccess_ids << select grantee 
from syscat.dbauth 
where dataaccessauth='Y' 
with ur

In [None]:
#Print IDs with DATAACCESS
display(dataaccess_ids)

### Privileges held by PUBLIC

In [None]:
%%sql public_perms << select privilege 
            , OBJECTTYPE 
            , OBJECTSCHEMA 
            , OBJECTNAME 
        from sysibmadm.privileges 
        where authid='PUBLIC' 
            and objectschema not like 'SYS%'
            and objectschema not like 'NULLID%'
            and objectschema not like 'DB2CAEM%'
            and objectschema != 'SQLJ'
            and objectname != 'DB2CLI'
        order by objecttype, objectschema, objectname 
        with ur

In [None]:
%%sql public_connect << select COALESCE(connectauth,'N') as connectauth
        , grantee 
        from syscat.dbauth 
        where grantee='PUBLIC' 
        union 
        select 'N', 'ZZZ' as grantee from sysibm.sysdummy1 
        order by grantee 
        with ur

In [None]:
#Check to see if PUBLIC has connect, and generate a warning
#Also print all PUBLIC privileges
pub_df=public_connect.DataFrame()
if 'Y' in public_connect[0][0] :
    display(Markdown("**WARNING** PUBLIC has CONNECT authority on the database"))

The following lists the privileges held by PUBLIC, excluding those on the system catalog and those on packages granted through the "grant public" keywords when binding db2cli and db2ubind.

In [None]:
display(public_perms)

## Database Objects

In [None]:
%%sql list_evms << select evmonname 
    ,case when event_mon_state(evmonname) = 0 then 'INACTIVE' 
        when event_mon_state(evmonname) = 1 then 'ACTIVE' end as status 
    , target_type 
    , target 
    , autostart 
    , versionnumber 
    from syscat.eventmonitors 
    with ur

### Event Monitors

In [None]:
# Database Objects
## Event Monitors
display(list_evms)

### Explain Tables

In [None]:
%%sql expln_schemas << SELECT tabschema 
    ,count(*) count_tables
    FROM syscat.tables 
    where tabname like 'EXPLAIN%' 
        or tabname like 'ADVISE%' 
        GROUP BY tabschema 
    with ur

In [None]:
# Database Objects
## Explain Tables
expl_schema_last = {}
expl_schema_valid = {}
expln_schema_df=expln_schemas.DataFrame()
expln_schema_df.set_index('tabschema', inplace=True)
display(expln_schema_df)
for index, row in expln_schema_df.iterrows() :
    display(Markdown("#### Verifying Explain Schema for "+index))
    verify_expln=%sql call SYSPROC.SYSINSTALLOBJECTS('EXPLAIN','V', NULL, '{index}')
    if verify_expln is None:
        display(Markdown("**Error found**"))
        display(verify_expln)
        these_expln_tables = %sql SELECT tabname, card \
            FROM syscat.tables \
            WHERE tabschema=:index \
                and (tabname like 'EXPLAIN%' or tabname like 'ADVISE%') with ur
        display(these_expln_tables)
        expln_schema_df.at[index, 'Valid'] = "No"
    else :
        expln_schema_df.at[index, 'Valid'] = "Yes"
        last_explain=%sql select max(explain_time) max_expln FROM {index}.EXPLAIN_INSTANCE with ur
        expln_schema_df.at[index, 'last_explain_time'] = last_explain[0]['max_expln']
display(expln_schema_df)

In [None]:
%%sql list_bps << select 
        substr(bphr.bp_name,1,18) as bp_name 
        , bp_cur_buffsz 
        , pagesize 
        , ((pagesize*bp_cur_buffsz)/1024)/1024 as sz_mb 
        , total_logical_reads 
        , total_physical_reads 
        , data_hit_ratio_percent 
        , (select listagg(tbspace,chr(10)) within group (order by create_time) from syscat.tablespaces ts, syscat.bufferpools b where ts.bufferpoolid = b.bufferpoolid and b.bpname=mgbp.bp_name) as tablespaces 
from sysibmadm.bp_hitratio bphr join table(mon_get_bufferpool(NULL,-2)) mgbp 
        on mgbp.bp_name=bphr.bp_name 
    join syscat.bufferpools sbp on sbp.bpname=mgbp.bp_name 
with ur

### Buffer Pools
#### List of Buffer Pools

In [None]:
# Database Objects
## Buffer Pools
display(list_bps)


#### Buffer Pool Hit Ratios by Tablespace

In [None]:
%%sql ts_bp_hitratios << select tbsp_name 
    , decimal(((float(pool_data_lbp_pages_found) + float(pool_index_lbp_pages_found) + float(pool_xda_lbp_pages_found) + float(pool_col_lbp_pages_found) - float(pool_async_data_lbp_pages_found) - float(pool_async_index_lbp_pages_found) - float(pool_async_xda_lbp_pages_found) - float(pool_async_col_lbp_pages_found)) / nullif(float(pool_data_l_reads) + float(pool_index_l_reads) + float(pool_xda_l_reads) + float(pool_col_l_reads) + float(pool_temp_data_l_reads) + float(pool_temp_xda_l_reads) + float(pool_temp_index_l_reads) + float(pool_temp_col_l_reads), 0)) * 100,5,2) as hit_ratio_pct 
    from table(mon_get_tablespace('',-2)) as t 
    order by tbsp_cur_pool_id 
    with ur

In [None]:
display(ts_bp_hitratios)
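
The hit ratio computed in the query above is the number of pages found in the buffer pool by agents (total found minus those found by prefetchers) divided by the total logical reads. A small Python sketch of the same arithmetic follows; the function name and sample counters are hypothetical, and returning `None` when there have been no logical reads is a design choice for this example.

```python
def hit_ratio_pct(lbp_pages_found, async_lbp_pages_found, logical_reads):
    """Buffer pool hit ratio as a percentage, or None when there were no reads."""
    if logical_reads == 0:
        # No activity yet, so a ratio is not meaningful
        return None
    return round((lbp_pages_found - async_lbp_pages_found) / logical_reads * 100, 2)

# 950 pages found in the pool, 50 of them by prefetchers, out of 1000 logical reads
print(hit_ratio_pct(950, 50, 1000))
# A table space that has not been read since activation
print(hit_ratio_pct(0, 0, 0))
```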

#### Tables/Table Spaces by Buffer Pool

In [None]:
%%sql bp_tables << select b.bpname 
    , b.bufferpoolid 
    , (select count(*) from syscat.tablespaces ts where b.bufferpoolid=ts.bufferpoolid)as ts_count 
    , (select count(*) 
       from syscat.tables t join syscat.tablespaces ts 
           on t.tbspace=ts.tbspace or t.index_tbspace=ts.tbspace or t.long_tbspace=ts.tbspace 
       where ts.bufferpoolid=b.bufferpoolid) as tab_count 
    from syscat.bufferpools b 
with ur

#### All Buffer Pools

In [None]:
bp_tab_df=bp_tables.DataFrame()
display(bp_tab_df)

#### Unused Buffer Pools

In [None]:
filter_bp_no_tab=(bp_tab_df['ts_count'] == 0) | (bp_tab_df['tab_count'] == 0)
display(bp_tab_df[filter_bp_no_tab])

#### Comparing Buffer Pools

In [None]:
%%sql bp_compare << with bp_data as( select 
    mgbp.bp_name 
    , ((pagesize*bp_cur_buffsz)/1024)/1024 as sz_mb 
    , total_logical_reads 
    , data_hit_ratio_percent 
    , (select sum((tbsp_used_pages*pagesize)/1024/1024) as tabledata_mb from table(mon_get_tablespace(null,-2)) as mgt join syscat.bufferpools b on b.bufferpoolid=mgt.tbsp_cur_pool_id where mgbp.bp_name=b.bpname ) as tabledata_mb 
from sysibmadm.bp_hitratio bphr join table(mon_get_bufferpool(NULL,-2)) mgbp 
    on mgbp.bp_name=bphr.bp_name 
join syscat.bufferpools sbp on sbp.bpname=mgbp.bp_name) 
, bp_sum as (select 
    sum(sz_mb) as sum_size_mb 
    , sum(total_logical_reads) as sum_reads 
    , sum(tabledata_mb) as sum_data_mb 
from bp_data) 
select d.bp_name 
    , case when sum_size_mb > 0 then decimal(float(sz_mb)/float(sum_size_mb)*100,5,2) else 0.0 end as bp_size 
    , case when sum_reads >0 then decimal(float(total_logical_reads)/float(sum_reads)*100,5,2) else 0.0 end as bp_reads 
    , case when sum_data_mb >0 then decimal(float( tabledata_mb)/float(sum_data_mb)*100,5,2) else 0.0 end as data_served 
    , data_hit_ratio_percent 
from bp_data as d, bp_sum as s 
with ur

### Table Spaces

#### Database Size by Pages in Table Spaces

In [None]:
# Calculate and display database data size based only on allocated pages in table spaces
db_size_query=%sql select sum(tbsp_total_pages*tbsp_page_size/1024/1024/1024) as data_size_gb from table(mon_get_tablespace('',-2))
display(db_size_query)   

#### Count of Table Spaces

In [None]:
# Database Objects
## Table Spaces
# Tablespace Count
tbsp_count=%sql select count(*) as num_tbsps from syscat.tablespaces
display(tbsp_count)

#### Table Space Details

In [None]:
%%sql list_tbsps << select  tbsp_name, 
    tbsp_type, 
    tbsp_content_type as type, 
    (select count(*) from syscat.tables st where st.tbspace=t.tbsp_name) as tabcount, 
    tbsp_using_auto_storage as auto_sto, 
    tbsp_auto_resize_enabled as auto_resize, 
    tbsp_page_size as page_size, 
    tbsp_used_pages as used_pages, 
    tbsp_total_pages as total_pages, 
    tbsp_total_pages*tbsp_page_size/1024/1024/1024 as ts_gb, 
    case 
            when tbsp_type = 'SMS' then 'EXCLUDE' 
            when tbsp_using_auto_storage = 1 then 'EXCLUDE' 
            when tbsp_auto_resize_enabled = 1 then 'EXCLUDE' 
            else 'INCLUDE' 
    end as Space_check, 
    case 
            when 
                tbsp_max_size > 0 
                and tbsp_max_size < 65536 
            then to_char(tbsp_max_size) 
            when 
                tbsp_type = 'DMS' 
                and tbsp_content_type = 'ANY' 
                and tbsp_page_size = 4096 
            then '64' 
            when 
                tbsp_type = 'DMS' 
                and tbsp_content_type = 'ANY' 
                and tbsp_page_size = 8192 
            then '128' 
            when 
                tbsp_type = 'DMS' 
                and tbsp_content_type = 'ANY' 
                and tbsp_page_size = 16384 
            then '256' 
            when 
                tbsp_type = 'DMS' 
                and tbsp_content_type = 'ANY' 
                and tbsp_page_size = 32768 
            then '512' 
            when 
                tbsp_type = 'DMS' 
                and tbsp_content_type in ('SYSTEMP','USRTEMP','LARGE') 
                and tbsp_page_size = 4096 
            then '8192' 
            when 
                tbsp_type = 'DMS' 
                and tbsp_content_type in ('SYSTEMP','USRTEMP','LARGE') 
                and tbsp_page_size = 8192 
            then '16384' 
            when
                tbsp_type = 'DMS' 
                and tbsp_content_type in ('SYSTEMP','USRTEMP','LARGE') 
                and tbsp_page_size = 16384 
            then '32768' 
            when 
                tbsp_type = 'DMS' 
                and tbsp_content_type in ('SYSTEMP','USRTEMP','LARGE') 
                and tbsp_page_size = 32768 
            then '65536' 
            else 'EXCLUDE' 
    end as maxsize_thresh 
from table(mon_get_tablespace('',-2)) as t 
order by tbsp_name 
with ur

In [None]:
# clean the tablespace data a bit
ts_df=list_tbsps.DataFrame()
ts_df=ts_df.set_index('tbsp_name')
ts_df[['ts_gb']]=ts_df[['ts_gb']].astype(float)
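
The long CASE expression in the query above amounts to a lookup of the maximum table space size in GB by content type and page size. The same thresholds can be written as a pair of dictionaries, which may be easier to read and verify. This is only a sketch: the names are hypothetical, the limits are copied from the query, and the query's first branch based on `tbsp_max_size` is deliberately left out.

```python
# Size limits in GB, keyed by page size in bytes (values copied from the query above)
REGULAR_LIMIT_GB = {4096: 64, 8192: 128, 16384: 256, 32768: 512}
LARGE_LIMIT_GB = {4096: 8192, 8192: 16384, 16384: 32768, 32768: 65536}

def max_size_gb(tbsp_type, content_type, page_size):
    """Return the size limit in GB for a DMS table space, or None to exclude it."""
    if tbsp_type != "DMS":
        return None
    if content_type == "ANY":
        return REGULAR_LIMIT_GB.get(page_size)
    if content_type in ("SYSTEMP", "USRTEMP", "LARGE"):
        return LARGE_LIMIT_GB.get(page_size)
    return None

def pct_of_limit(ts_gb, limit_gb):
    """Fraction of the limit currently allocated, or 0 when there is no limit."""
    return ts_gb / limit_gb if limit_gb else 0

limit = max_size_gb("DMS", "ANY", 4096)
print(limit, pct_of_limit(56.0, limit))
```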

#### Potentially Wasted Space in Table Spaces

In [None]:
%%sql tbsps_space_waste << select  tbsp_name, 
    tbsp_auto_resize_enabled as auto_resize, 
    tbsp_page_size as page_size, 
    tbsp_total_pages*tbsp_page_size/1024/1024 as ts_mb, 
    100*decimal(float(tbsp_used_pages)/float(tbsp_total_pages),5,2) as pct_used, 
    tbsp_extent_size, 
    tbsp_total_pages - case when tbsp_used_pages > (tbsp_extent_size * 5) then tbsp_used_pages else tbsp_extent_size * 5 end as tbsp_freeable_pgs, 
    (tbsp_total_pages - case when tbsp_used_pages > (tbsp_extent_size * 5) then tbsp_used_pages else tbsp_extent_size * 5 end) *tbsp_page_size/1024/1024 as freeable_mb, 
    case when tbsp_used_pages < tbsp_extent_size * 5 then ((tbsp_extent_size * 5) - tbsp_used_pages) * tbsp_page_size/1024/1024 else 0 end as free_mb_sm_ext , 
    RECLAIMABLE_SPACE_ENABLED as reclaimable 
from table(mon_get_tablespace('',-2)) as t 
where tbsp_type = 'DMS' 
order by tbsp_name 
with ur
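
The freeable space calculation in the query above assumes a table space cannot usefully shrink below five extents, so the space that could be released is the total page count minus the larger of the used pages and five extents. A minimal sketch of that arithmetic, with hypothetical function names and sample numbers:

```python
def freeable_pages(total_pages, used_pages, extent_size, min_extents=5):
    """Pages that could be released, keeping at least min_extents extents."""
    floor_pages = max(used_pages, extent_size * min_extents)
    return total_pages - floor_pages

def pages_to_mb(pages, page_size_bytes):
    """Convert a page count to megabytes for a given page size in bytes."""
    return pages * page_size_bytes / 1024 / 1024

# 10000 allocated pages, 2000 used, extent size 32, 4 KB pages
pages = freeable_pages(10000, 2000, 32)
print(pages, pages_to_mb(pages, 4096))
```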

In [None]:
# Use data collected on tablespaces to do a number of checks and chart tablespace size
## Need to split this cell out into multiple cells
tssw_df=tbsps_space_waste.DataFrame()
tssw_df=tssw_df.set_index('tbsp_name')

# Examine space data for problems
display(Markdown("##### Easily Reclaimable Space in MB"))
display(tssw_df['freeable_mb'].sum())
reducible_ts=tssw_df['freeable_mb'] > 100
display(Markdown("###### Table Spaces With More Than 100 MB to Reclaim"))
with pd.option_context('display.max_rows', 999):
    display(tssw_df[reducible_ts])
display(Markdown("##### Reclaimable Space with Smaller Extent in MB"))
display(tssw_df['free_mb_sm_ext'].sum())   
# Non-reclaimable table spaces (boolean filter; DataFrame.append was removed in pandas 2.0)
norec_tbsps=tssw_df[tssw_df['reclaimable'] != 1]
display(Markdown("##### Table Spaces that are Not Reclaimable"))
display(norec_tbsps)

# Examine tablespaces for problems
# Collect index labels per check, then select the rows once at the end
# (DataFrame.append was removed in pandas 2.0)
full_idx=[]
noast_idx=[]
static_idx=[]
for index, row in ts_df.iterrows() :
    # Nearly full tablespaces
    max_pct=0
    space_pct=0
    if row['maxsize_thresh'] != 'EXCLUDE' :
        max_thresh=float(row['maxsize_thresh'])
        max_pct=row['ts_gb'] / max_thresh
    if row['space_check'] != 'EXCLUDE' :
        space_thresh=float(row['space_check'])
        space_pct=row['ts_gb'] / space_thresh
    if max_pct > 0.8 or space_pct > 0.8 :
        full_idx.append(index)
    # Tablespaces not using AST
    if row['auto_sto'] != 1 :
        noast_idx.append(index)
    # Tablespaces not using auto resize
    if row['auto_resize'] != 1 and row['tbsp_type'] != 'SMS' :
        static_idx.append(index)
full_tbsps=ts_df.loc[full_idx]
noast_tbsps=ts_df.loc[noast_idx]
static_tbsps=ts_df.loc[static_idx]
display(Markdown("##### Table Spaces that are Nearly Full"))
display(full_tbsps)
display(Markdown("##### Table Spaces that are Not Using AST"))
display(noast_tbsps)
display(Markdown("##### Table Spaces that are Not Using Automatic Resize"))
display(static_tbsps)
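
The "nearly full" test above compares the table space size against two optional thresholds and flags anything over 80% of either one. Here is a small stand-alone sketch of that rule, assuming the thresholds arrive as strings with `'EXCLUDE'` meaning "skip this check", as they do in the loop. The function name and the sample rows are hypothetical.

```python
def nearly_full(ts_gb, maxsize_thresh, space_check, limit=0.8):
    """Return True when a table space exceeds `limit` of either threshold.

    Thresholds are strings; 'EXCLUDE' means the check does not apply.
    """
    for thresh in (maxsize_thresh, space_check):
        if thresh != 'EXCLUDE' and ts_gb / float(thresh) > limit:
            return True
    return False

# Hypothetical rows: (name, size in GB, maxsize_thresh, space_check)
rows = [
    ('USERSPACE1', 30000.0, '32768', 'EXCLUDE'),
    ('TEMPSPACE1', 12.5, 'EXCLUDE', 'EXCLUDE'),
    ('TS_SMALL', 40.0, 'EXCLUDE', '64'),
]
flagged = [name for name, gb, mx, sp in rows if nearly_full(gb, mx, sp)]
print(flagged)   # ['USERSPACE1']
```

Keeping the rule in one function makes it easy to try a different limit, for example `nearly_full(gb, mx, sp, limit=0.6)`, without touching the loop.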

### Grouping tablespaces for better display

In [None]:
%%sql large_tbsps << select  tbsp_name, 
    tbsp_total_pages*tbsp_page_size/1024/1024/1024 as ts_gb 
    from table(mon_get_tablespace('',-2)) as t 
    where (tbsp_total_pages*tbsp_page_size/1024/1024/1024) > 40 
    union 
    select substr(tbsp_name,1,4) || '_' as tbsp_name, 
    sum (tbsp_total_pages*tbsp_page_size/1024/1024/1024) as ts_gb 
    from table(mon_get_tablespace('',-2)) as t 
    where (tbsp_total_pages*tbsp_page_size/1024/1024/1024) <= 40 
    group by substr(tbsp_name,1,4) 
    order by tbsp_name 
    with ur

In [None]:
large_ts_df=large_tbsps.DataFrame()
large_ts_df=large_ts_df.set_index('tbsp_name')

display(ts_df)
display(tssw_df)
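
The grouping query keeps table spaces larger than 40 GB as they are and rolls the smaller ones up by the first four characters of their name. The sketch below mirrors that idea in plain Python on a hypothetical dictionary of sizes. The function name and sample names are illustrative only, and because the SQL uses integer arithmetic, its totals can differ slightly from these floats.

```python
from collections import defaultdict

def group_small_tablespaces(sizes_gb, cutoff_gb=40, prefix_len=4):
    """Keep large table spaces as-is; sum small ones under '<prefix>_'."""
    grouped = defaultdict(float)
    for name, gb in sizes_gb.items():
        key = name if gb > cutoff_gb else name[:prefix_len] + '_'
        grouped[key] += gb
    # Sort by name, like the ORDER BY in the query
    return dict(sorted(grouped.items()))

# Hypothetical sizes in GB
sizes = {'DATA_TS1': 120.0, 'IDX_A': 5.0, 'IDX_B': 7.5, 'TEMPSPACE1': 2.0}
print(group_small_tablespaces(sizes))
# {'DATA_TS1': 120.0, 'IDX__': 12.5, 'TEMP_': 2.0}
```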

## Database Size

In [None]:
# Total Database size
# This cell needs work - often does not work
#conn=ibm_db.connect("DATABASE="+db+";HOSTNAME="+host+";PORT="+port+";PROTOCOL=TCPIP;UID="+user+";PWD="+password+";", "", "")
size_ts='TIMESTAMP(\'2019-02-19-00.00.00\')'
db_size=-1
db_capacity=-1
size_refresh=-1
#db_size_query=ibm_db.callproc(conn,'get_dbsize_info', (size_ts,db_size,db_capacity,-1))
db_size_query=%sql call get_dbsize_info(:size_ts, :db_size, :db_capacity, -1)
display(db_size_query)

### Indexes

In [None]:
# Database Objects
## Indexes
display(Markdown("### Indexes in "+db+" on "+inst))

#### Indexes with a cardinality of one

In [None]:
%%sql card1_indexes << with indcols as ( select indschema 
            , indname 
            , listagg(case when colorder = 'A' then '+' when colorder = 'D' then '-' else '>' end || colname,chr(10)) within group (order by colseq) as colnames 
        from syscat.indexcoluse group by indschema, indname)
select  i.lastused, 
    t.tabschema, 
    t.tabname, 
    i.indname, 
    ic.colnames, 
    fullkeycard, 
    card, 
    volatile 
from    syscat.indexes i join syscat.tables t 
    on i.tabname=t.tabname and i.tabschema=t.tabschema 
    join indcols ic on i.indname=ic.indname and i.indschema=ic.indschema
where   fullkeycard=1 
    and indextype not in ('BLOK', 'DIM') 
    and t.tabschema not like 'SYS%' 
    and uniquerule='D' 
    and not exists (select 1 
            from syscat.references r join syscat.keycoluse k 
                    on r.tabschema=k.tabschema and r.tabname=k.tabname 
            where t.tabschema=r.tabschema 
                    and r.tabname = t.tabname 
                    and k.colname in (      select colname 
                                    from syscat.indexcoluse as ic 
                                    where ic.indschema=i.indschema 
                                    and ic.indname=i.indname)) 
with ur

In [None]:
# Format data nicely, and then display
card1_ind_df=card1_indexes.DataFrame()
# Add thousands comma to numbers
card1_ind_df['card'] = card1_ind_df.apply(lambda x: "{:,}".format(x['card']), axis=1)
# Display each column of an index on a separate line (the line breaks from SQL don't translate right to the data frame)
display(HTML(card1_ind_df.to_html(index=False).replace("\\n","<br>")))

#### Indexes not used in the last 30 days

In [None]:
%%sql unused_indexes << with indcols as ( select indschema 
            , indname 
            , listagg(case when colorder = 'A' then '+' when colorder = 'D' then '-' else '>' end || colname,chr(10)) within group (order by colseq) as colnames 
        from syscat.indexcoluse group by indschema, indname) select  i.lastused, 
    t.tabschema, 
    t.tabname, 
    i.indname, 
    ic.colnames, 
    bigint(fullkeycard)as fullkeycard, 
    bigint(card) as table_card, 
    mi.index_scans, 
    mi.index_only_scans, 
    volatile 
from    syscat.indexes i join syscat.tables t 
    on i.tabname=t.tabname and i.tabschema=t.tabschema 
    join table(mon_get_index('','',-2)) as mi on i.iid=mi.iid and i.tabschema=mi.tabschema and i.tabname = mi.tabname 
    join indcols ic on i.indschema=ic.indschema and i.indname=ic.indname 
where 
    indextype not in ('BLOK', 'DIM') 
    and t.tabschema not like 'SYS%' 
    and uniquerule='D' 
    and i.lastused < current date - 30 days 
    and card > 0 
    and not exists (select 1 
            from syscat.references r join syscat.keycoluse k 
                    on r.tabschema=k.tabschema and r.tabname=k.tabname 
            where t.tabschema=r.tabschema 
                    and r.tabname = t.tabname 
                    and k.colname in (      select colname 
                                    from syscat.indexcoluse as ic 
                                    where ic.indschema=i.indschema 
                                    and ic.indname=i.indname)) 
with ur

In [None]:
# Format data nicely, and display
unused_ind_df=unused_indexes.DataFrame()
# Add Thousands comma to numbers
unused_ind_df['fullkeycard'] = unused_ind_df.apply(lambda x: "{:,}".format(x['fullkeycard']), axis=1)
unused_ind_df['table_card'] = unused_ind_df.apply(lambda x: "{:,}".format(x['table_card']), axis=1)
# Display each column of an index on a separate line (the line breaks from SQL don't translate right to the data frame)
display(HTML(unused_ind_df.to_html(index=False).replace("\\n","<br>")))

#### Tables With Largest Number of Indexes

In [None]:
%%sql many_indexes << select 
    substr(t.tabschema,1,15) as tabschema, 
    substr(t.tabname,1,30) as tabname, 
    lastused, 
    date(stats_time) as stats_date, 
    card ,
    (select count(*) from syscat.indexes i where t.tabschema=i.tabschema and t.tabname=i.tabname) as ind_count
from    syscat.tables t 
where 
    t.tabschema not like 'SYS%' 
    and t.tabname not like 'ADVISE%' 
    and t.tabname not like 'EXPLAIN%' 
    and type = 'T'  
order by ind_count desc, tabschema, tabname 
fetch first 10 rows only
with ur 

In [None]:
# Format data nicely and display
many_ind_df=many_indexes.DataFrame()
# Add thousands comma to numbers
many_ind_df['card'] = many_ind_df.apply(lambda x: "{:,}".format(x['card']), axis=1)
display(HTML(many_ind_df.to_html(index=False)))

#### Tables Without any Indexes

In [None]:
%%sql no_indexes << select 
    substr(t.tabschema,1,15) as tabschema, 
    substr(t.tabname,1,30) as tabname, 
    lastused, 
    date(stats_time) as stats_date, 
    card 
from    syscat.tables t 
where 
    t.tabschema not like 'SYS%' 
    and t.tabname not like 'ADVISE%' 
    and t.tabname not like 'EXPLAIN%' 
    and type = 'T' 
    and t.tabname not like 'TI_%' 
    and not exists (select 1 from syscat.indexes i where t.tabschema=i.tabschema and t.tabname=i.tabname) 
order by card desc, tabschema, tabname 
with ur 

In [None]:
# Format data nicely and display
no_ind_df=no_indexes.DataFrame()
# Add thousands comma to numbers
no_ind_df['card'] = no_ind_df.apply(lambda x: "{:,}".format(x['card']), axis=1)
display(HTML(no_ind_df.to_html(index=False)))

#### Most Used Indexes in the Database

In [None]:
%%sql mostused_indexes << with indcols as ( select indschema 
            , indname
            , listagg(case when colorder = 'A' then '+' when colorder = 'D' then '-' else '>' end || colname,chr(10)) within group (order by colseq) as colnames 
        from syscat.indexcoluse group by indschema, indname) select  i.lastused, 
    t.tabschema as tabschema, 
    t.tabname as tabname, 
    i.indname as indname, 
    ic.colnames, 
    bigint(fullkeycard)as fullkeycard, 
    bigint(card) as table_card, 
    mi.index_scans, 
    mi.index_only_scans, 
    mi.page_allocations, 
    volatile 
from    syscat.indexes i join syscat.tables t 
    on i.tabname=t.tabname and i.tabschema=t.tabschema 
    join table(mon_get_index('','',-2)) as mi on i.iid=mi.iid and i.tabschema=mi.tabschema and i.tabname = mi.tabname 
    join indcols ic on i.indschema=ic.indschema and i.indname=ic.indname 
where 
    indextype not in ('BLOK', 'DIM') 
    and t.tabschema not like 'SYS%' 
    and uniquerule='D' 
    and not exists (select 1 
            from syscat.references r join syscat.keycoluse k 
                    on r.tabschema=k.tabschema and r.tabname=k.tabname 
            where t.tabschema=r.tabschema 
                    and r.tabname = t.tabname 
                    and k.colname in (      select colname 
                                    from syscat.indexcoluse as ic 
                                    where ic.indschema=i.indschema 
                                    and ic.indname=i.indname)) 
order by mi.index_scans desc 
fetch first 20 rows only 
with ur

In [None]:
#Format data nicely and display
busy_ind_df=mostused_indexes.DataFrame()
# Add thousands comma to numbers
busy_ind_df['fullkeycard'] = busy_ind_df.apply(lambda x: "{:,}".format(x['fullkeycard']), axis=1)
busy_ind_df['table_card'] = busy_ind_df.apply(lambda x: "{:,}".format(x['table_card']), axis=1)
busy_ind_df['index_scans'] = busy_ind_df.apply(lambda x: "{:,}".format(x['index_scans']), axis=1)
busy_ind_df['index_only_scans'] = busy_ind_df.apply(lambda x: "{:,}".format(x['index_only_scans']), axis=1)
busy_ind_df['page_allocations'] = busy_ind_df.apply(lambda x: "{:,}".format(x['page_allocations']), axis=1)
# Display each column of an index on a separate line (the line breaks from SQL don't translate right to the data frame)
display(HTML(busy_ind_df.to_html(index=False).replace("\\n","<br>")))

### Tables

In [None]:
# Database Objects
## Tables
#All databases on this instance
display(Markdown("### Tables in "+db+" on "+inst))

#### Tables Not Used in 30 days

In [None]:
%%sql unused_tables << select  t.lastused, 
    date(stats_time) as stats_time, 
    date(create_time) as create_time, 
    substr(t.tabschema,1,10) as tabschema, 
    substr(t.tabname,1,25) as tabname, 
    bigint(card) as table_card, 
    mt.table_scans, 
    mt.rows_read, 
    mt.rows_inserted + mt.rows_updated + mt.rows_deleted as rows_altered, 
    t.volatile 
from    syscat.tables t 
    join table(mon_get_table('','',-2)) as mt on t.tabschema=mt.tabschema and t.tabname = mt.tabname 
where 
    t.tabschema not like 'SYS%' 
    and t.tabname not like '%EXPLAIN%' 
    and t.tabname not like '%ADVISE%' 
    and t.lastused < current date - 30 days 
    and type = 'T' 
order by t.lastused, t.card desc, t.tabschema, t.tabname 
with ur

In [None]:
# Print the unused tables
display(unused_tables)

#### Most used tables

In [None]:
%%sql mostused_tables << select  t.lastused, 
    substr(t.tabschema,1,10) as tabschema, 
    substr(t.tabname,1,25) as tabname, 
    bigint(card) as table_card, 
    mt.table_scans, 
    mt.rows_read, 
    case when card >0 then mt.rows_read/card else 0 end as avg_reads_per_row, 
    mt.rows_inserted + mt.rows_updated + mt.rows_deleted as rows_altered, 
    t.volatile 
from    syscat.tables t 
    join table(mon_get_table('','',-2)) as mt on t.tabschema=mt.tabschema and t.tabname = mt.tabname
where 
    t.tabschema not like 'SYS%' 
    and t.tabname not like '%EXPLAIN%' 
    and t.tabname not like '%ADVISE%' 
order by 7 desc, 8 desc, t.tabschema, t.tabname 
fetch first 20 rows only 
with ur 

In [None]:
# Format the data nicely and display
busy_tab_df=mostused_tables.DataFrame()
# Add thousands comma to numbers
busy_tab_df['table_card'] = busy_tab_df.apply(lambda x: "{:,}".format(x['table_card']), axis=1)
busy_tab_df['table_scans'] = busy_tab_df.apply(lambda x: "{:,}".format(x['table_scans']), axis=1)
busy_tab_df['rows_read'] = busy_tab_df.apply(lambda x: "{:,}".format(x['rows_read']), axis=1)
busy_tab_df['avg_reads_per_row'] = busy_tab_df.apply(lambda x: "{:,}".format(x['avg_reads_per_row']), axis=1)
busy_tab_df['rows_altered'] = busy_tab_df.apply(lambda x: "{:,}".format(x['rows_altered']), axis=1)
display(HTML(busy_tab_df.to_html(index=False)))

#### Tables with the highest number of rows

In [None]:
%%sql highcard_tables << select  t.lastused, 
    substr(t.tabschema,1,10) as tabschema, 
    substr(t.tabname,1,25) as tabname, 
    bigint(card) as table_card, 
    mt.table_scans, 
    mt.rows_read, 
    case when card >0 then mt.rows_read/card else 0 end as avg_reads_per_row, 
    mt.rows_inserted + mt.rows_updated + mt.rows_deleted as rows_altered, 
    t.volatile 
from    syscat.tables t 
    join table(mon_get_table('','',-2)) as mt on t.tabschema=mt.tabschema and t.tabname = mt.tabname 
where 
    t.tabschema not like 'SYS%' 
    and t.tabname not like '%EXPLAIN%' 
    and t.tabname not like '%ADVISE%' 
order by table_card desc, t.tabschema, t.tabname 
fetch first 20 rows only 
with ur

In [None]:
# Format data nicely and display
highcard_tab_df=highcard_tables.DataFrame()
# Add thousands comma to numbers
highcard_tab_df['table_card'] = highcard_tab_df.apply(lambda x: "{:,}".format(x['table_card']), axis=1)
highcard_tab_df['table_scans'] = highcard_tab_df.apply(lambda x: "{:,}".format(x['table_scans']), axis=1)
highcard_tab_df['rows_read'] = highcard_tab_df.apply(lambda x: "{:,}".format(x['rows_read']), axis=1)
highcard_tab_df['avg_reads_per_row'] = highcard_tab_df.apply(lambda x: "{:,}".format(x['avg_reads_per_row']), axis=1)
highcard_tab_df['rows_altered'] = highcard_tab_df.apply(lambda x: "{:,}".format(x['rows_altered']), axis=1)
display(HTML(highcard_tab_df.to_html(index=False)))

#### Tables using the extended row size feature

In [None]:
%%sql ext_row_tables << select 
    mgt.tabschema, 
    mgt.tabname 
from 
    table(mon_get_table(NULL,NULL,NULL)) mgt 
where 
    mgt.lob_object_l_pages > 0 
    and not exists (select 1 
                  from syscat.columns c 
                  where c.tabschema = mgt.tabschema 
          and c.tabname = mgt.tabname 
          and c.typename in ('CLOB','BLOB')) 
with ur

In [None]:
display(ext_row_tables)

#### Tables with LOBs

In [None]:
%%sql lob_tables << select 
    c.tabschema, 
    c.tabname, 
    c.colname,
    c.typename, 
    c.length,
    t.tbspace,
    case ts.fs_caching
        when 0 then 'No'
        when 1 then 'Yes'
        when 2 then 'Depends on OS and FS Type'
        end as fs_caching,
    t.card as table_card,
    mgt.rows_read,
    c.logged,
    pctinlined
from 
    syscat.columns c 
    join syscat.tables t on c.tabschema = t.tabschema and c.tabname = t.tabname
    left outer join table(mon_get_tablespace(NULL,-2)) ts on coalesce(t.long_tbspace, t.tbspace) = ts.tbsp_name
    left outer join table(mon_get_table(NULL,NULL,-2)) as mgt on c.tabschema=mgt.tabschema and c.tabname=mgt.tabname
where 
    c.typename in ('CLOB','BLOB') 
    and c.tabschema not like 'SYS%'
    and t.type = 'T'
order by mgt.rows_read desc, table_card asc
with ur

In [None]:
# Format data nicely and display
lob_tab_df=lob_tables.DataFrame()
# Only do this if there is data. If there is no data, the formatting will throw an error
if int(lob_tab_df.shape[0]) != 0 :
    # Add thousands comma to numbers
    lob_tab_df['table_card'] = lob_tab_df.apply(lambda x: "{:,}".format(x['table_card']), axis=1)
    lob_tab_df['length'] = lob_tab_df.apply(lambda x: "{:,}".format(x['length']), axis=1)
    lob_tab_df['rows_read'] = lob_tab_df.apply(lambda x: "{:,}".format(x['rows_read']), axis=1)
    display(HTML(lob_tab_df.to_html(index=False)))
else :
    print("No tables with LOBs found.")
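
The thousands-separator formatting used throughout these cells raises an error when a value is missing, which can happen here because `rows_read` comes from a left outer join. A small helper along the following lines is one way to make the formatting tolerant of that. The name `fmt_thousands` is my own, and returning an empty string for missing values is just one reasonable choice.

```python
def fmt_thousands(value):
    """Format a number with thousands separators; return '' for missing values."""
    # None and NaN (the only value not equal to itself) mean "no data"
    if value is None or value != value:
        return ''
    return "{:,}".format(value)

print(fmt_thousands(1234567))   # 1,234,567
print(repr(fmt_thousands(None)))   # ''
```

With pandas it could be applied per column, for example `lob_tab_df['rows_read'] = lob_tab_df['rows_read'].map(fmt_thousands)`, in place of the `apply` with a lambda.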

#### Tables with XML

In [None]:
%%sql xml_tables << select 
    c.tabschema, 
    c.tabname, 
    c.colname,
    c.typename, 
    c.length,
    t.tbspace,
    case ts.fs_caching
        when 0 then 'No'
        when 1 then 'Yes'
        when 2 then 'Depends on OS and FS Type'
        end as fs_caching,
    t.card as table_card,
    mgt.rows_read,
    c.logged,
    pctinlined
from 
    syscat.columns c 
    join syscat.tables t on c.tabschema = t.tabschema and c.tabname = t.tabname
    left outer join table(mon_get_tablespace(NULL,-2)) ts on coalesce(t.long_tbspace, t.tbspace) = ts.tbsp_name
    left outer join table(mon_get_table(NULL,NULL,-2)) as mgt on c.tabschema=mgt.tabschema and c.tabname=mgt.tabname
where 
    c.typename in ('XML') 
    and c.tabschema not like 'SYS%'
    and t.type = 'T'
order by mgt.rows_read desc, table_card asc
with ur

In [None]:
# Format data nicely and display
xml_tab_df=xml_tables.DataFrame()
# Only do the formatting if data exists
if int(xml_tab_df.shape[0]) != 0 :
    # Add thousands comma to numbers
    xml_tab_df['table_card'] = xml_tab_df.apply(lambda x: "{:,}".format(x['table_card']), axis=1)
    xml_tab_df['length'] = xml_tab_df.apply(lambda x: "{:,}".format(x['length']), axis=1)
    xml_tab_df['rows_read'] = xml_tab_df.apply(lambda x: "{:,}".format(x['rows_read']), axis=1)
    display(HTML(xml_tab_df.to_html(index=False)))
else :
    print("No XML tables found.")

#### Tables with constraint issues

In [None]:
%%sql const_tables << select 
    t.tabschema, 
    t.tabname, 
    t.const_checked,
    t.card as table_card,
    mgt.rows_read, 
    (select count(*) from syscat.tabconst tc where t.tabschema=tc.tabschema and t.tabname=tc.tabname and enforced='N') as unenforced_const
from 
    syscat.tables t 
    left outer join table(mon_get_table(NULL,NULL,-2)) as mgt on t.tabschema=mgt.tabschema and t.tabname=mgt.tabname
where 
    t.const_checked like '%N%' or 
    t.const_checked like '%F%' or 
    t.const_checked like '%U%' or 
    t.const_checked like '%W%' 
order by mgt.rows_read desc, t.card desc
with ur

In [None]:
# Format data nicely and display
const_tab_df=const_tables.DataFrame()
# Only do the formatting if data exists
if int(const_tab_df.shape[0]) != 0 :
    # Add thousands comma to numbers
    const_tab_df['table_card'] =const_tab_df.apply(lambda x: "{:,}".format(x['table_card']), axis=1)
    const_tab_df['rows_read'] = const_tab_df.apply(lambda x: "{:,}".format(x['rows_read']), axis=1)
    display(HTML(const_tab_df.to_html(index=False)))
else :
    print("No tables with constraint issues found.")

#### Statistical views

In [None]:
%%sql stats_views << select 
    t.tabschema
    , t.tabname
    , date(t.stats_time) as stats_date
    , t.card as table_card
    , coalesce(mgt.rows_read,0) as rows_read
from 
    syscat.tables t 
    left outer join table(mon_get_table(NULL,NULL,-2)) as mgt on t.tabschema=mgt.tabschema and t.tabname=mgt.tabname
where 
    substr(t.property,13,1) = 'Y' 
order by 5 desc, 4 desc
with ur


In [None]:
# Format data nicely and display
stats_vw_df=stats_views.DataFrame()
# Only format if data exists
if int(stats_vw_df.shape[0]) != 0 :
    # add thousands comma to numbers
    stats_vw_df['table_card'] =stats_vw_df.apply(lambda x: "{:,}".format(x['table_card']), axis=1)
    stats_vw_df['rows_read'] = stats_vw_df.apply(lambda x: "{:,}".format(x['rows_read']), axis=1)
    display(HTML(stats_vw_df.to_html(index=False)))
else :
    print("No statistical views found.")

#### Column-Organized Tables

In [None]:
%%sql col_tables << select 
    t.tabschema, 
    t.tabname 
from 
    syscat.tables t 
where 
    substr(t.property,20,1) = 'Y' 
with ur

In [None]:
display(col_tables)
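
Both the statistical view and column-organized checks read a single character of the `PROPERTY` string in `syscat.tables` (position 13 and position 20 respectively, per the queries above). If the string is pulled into Python instead, the same test can be sketched as follows. The helper name and the sample string are hypothetical.

```python
def property_flag(property_string, position):
    """Return True when the 1-based `position` of a PROPERTY string is 'Y'."""
    # Slicing (rather than indexing) returns '' for a too-short string
    return property_string[position - 1:position] == 'Y'

# Hypothetical PROPERTY value with only position 13 set
sample = 'N' * 12 + 'Y' + 'N' * 19
print(property_flag(sample, 13))   # True  (statistical view)
print(property_flag(sample, 20))   # False (not column-organized)
```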

#### Tables Using Compression

In [None]:
%%sql comp_tables << select 
    case compression when 'B' then 'ROW and VALUE' when 'N' then 'No Compression' when 'R' then 'ROW' when '' then 'NA' else compression end as compression, 
    case rowcompmode when 'A' then 'ADAPTIVE' when 'S' then 'STATIC' else rowcompmode end as comp_mode, 
    count(*) as tables 
from 
    syscat.tables t 
where type='T'
group by compression, rowcompmode 
with ur

In [None]:
display(comp_tables)
# Future check to add: how is compression doing - problem tables or columns for compression

#### Clustered Tables

In [None]:
%%sql clus_tables << select 
    case clustered when 'T' then 'INSERT TIME' when 'Y' then 'DIMENSION' else clustered end as clustered, 
    count(*) as tables 
from 
    syscat.tables t 
group by clustered 
with ur

In [None]:
display(clus_tables)
# Future area to add - space wasted due to MDC or other clustering problems

### Transaction Logs
#### Histogram of Transaction Log Archives

In [None]:
%%sql archlog_histogram << WITH gen_ts (ts) AS ( 
        VALUES current timestamp - 7 days 
        UNION ALL 
        SELECT ts + 1 hour 
        FROM gen_ts 
        WHERE ts <= current timestamp), 
    format_ts (yyyymmddhh) AS ( 
        SELECT bigint(ts)/10000 
        FROM gen_ts), 
    log_archives (yyyymmddhh, archive_count) AS ( 
        SELECT substr(start_time, 1, 10) as YYYYMMDDhh, count(*) 
        FROM sysibmadm.db_history 
        WHERE operation = 'X' 
        GROUP BY substr(start_time, 1, 10) ) 
    SELECT 
        translate('ABCD-EF-GH IJh', cast(f.yyyymmddhh as char(12)), 'ABCDEFGHIJ') as hour 
        ,coalesce(a.archive_count,0) AS logs_archived 
    FROM 
       format_ts f 
            LEFT OUTER JOIN log_archives a 
            ON f.yyyymmddhh = a.yyyymmddhh 
    ORDER BY hour 
    with ur

In [None]:
display(archlog_histogram)
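
The query returns one row per hour with a count of archived logs, which is easier to scan as bars than as a column of numbers. The sketch below renders (label, count) pairs as text bars using only the standard library. The function name and the sample rows are made up, and it assumes each result row can be unpacked into an hour label and a count.

```python
def text_histogram(rows, width=40):
    """Render (label, count) pairs as text bars scaled to the largest count."""
    peak = max((count for _, count in rows), default=0)
    lines = []
    for label, count in rows:
        bar_len = round(count * width / peak) if peak else 0
        lines.append(f"{label} {'#' * bar_len} {count}")
    return lines

# Hypothetical hourly archive counts
sample = [('2019-02-19 00h', 0), ('2019-02-19 01h', 4), ('2019-02-19 02h', 2)]
for line in text_histogram(sample, width=8):
    print(line)
```

In the notebook this could be tried as `text_histogram([tuple(r) for r in archlog_histogram])`, though a proper bar chart is probably the better long-term answer.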

#### Log Buffer Sizing

In [None]:
%%sql logbuf_sizing << select log_reads 
    , log_writes 
    , decimal(float(log_reads)/float(log_writes),10,5) log_read_write_ratio 
from table (mon_get_transaction_log(-2)) 
with ur

In [None]:
# log pages read vs. log pages written
display(logbuf_sizing)

## Database Maintenance

In [None]:
# Database Maintenance
## Backups
display(Markdown("### Database Maintenance for "+db+" on "+inst))

### List of all Backups in History in the Last 14 days

In [None]:
%%sql backup_list << select date(timestamp(start_time)) as start_date 
    , time(timestamp(start_time)) as start_time 
    , start_time as start_timestamp 
    , dayname(start_time) as day
    , timestampdiff ( 4, varchar(timestamp(end_time) - timestamp(start_time)) ) as duration 
    , case operationtype 
        when 'D' then 'Delta Offline' 
        when 'E' then 'Delta Online' 
        when 'F' then 'Offline' 
        when 'I' then 'Incremental Offline' 
        when 'N' then 'Online' 
        when 'O' then 'Incremental Online' 
     else operationtype 
     end || ' ' || case 
            when objecttype = 'D' then 'DB' 
            when objecttype = 'P' then 'TS'
            else objecttype 
        end as Type 
    , devicetype 
    , sqlcode 
from sysibmadm.db_history 
where operation='B' 
    and start_time > current timestamp - 14 days
order by start_date, start_time 
with ur 

In [None]:
# Display the backup history (a timeline would make the strategy easier to see)
backup_df=backup_list.DataFrame()
display(HTML(backup_df.to_html(index=False)))
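
A day-by-day view makes gaps in the backup strategy stand out more than a flat list does. As a rough sketch, assuming the backup rows have been reduced to (date, type) pairs, the following builds one line per day for the window and marks days without a backup. The function name and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def backup_timeline(backups, end_day, days=14):
    """Return one text line per day, listing that day's backups or '-'."""
    by_day = defaultdict(list)
    for day, kind in backups:
        by_day[day].append(kind)
    lines = []
    for offset in range(days - 1, -1, -1):
        day = end_day - timedelta(days=offset)
        lines.append(f"{day.isoformat()}: {', '.join(by_day[day]) or '-'}")
    return lines

# Hypothetical backup history
sample = [(date(2019, 2, 17), 'Online DB'),
          (date(2019, 2, 19), 'Incremental Online DB')]
for line in backup_timeline(sample, end_day=date(2019, 2, 19), days=3):
    print(line)
```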

### Runstats
#### Last runstats date for non-system tables

In [None]:
%%sql last_stats << select date(stats_time) as stats_date
    , volatile
    , count(*) as num_tables 
from syscat.tables 
where type='T' 
    and tabschema not like 'SYS%' 
group by date(stats_time), volatile
with ur

In [None]:
display(last_stats)

#### Last runstats date for system tables

In [None]:
%%sql last_stats << select month(stats_time) as stats_month 
    , year(stats_time) as stats_year 
    , count(*) as num_tables 
from syscat.tables 
where type='T' 
    and tabschema like 'SYS%' 
group by month(stats_time), year(stats_time) 
order by stats_year desc, stats_month desc 
with ur

In [None]:
display(last_stats)

### Reorgs
#### Recent Reorgs

In [None]:
%%sql reorg_list << select date(timestamp(start_time)) as start_date 
    , time(timestamp(start_time)) as start_time 
    , tabschema 
    , tabname 
    , case operationtype 
        when 'F' then 'Offline' 
        when 'N' then 'Online' 
     else operationtype 
     end as Type 
    , sqlcode 
from sysibmadm.db_history 
where operation='G' 
order by start_date, start_time 
with ur

In [None]:
display(reorg_list)

#### Tables Needing Reorg

In [None]:
# Generate reorgchk information
%sql call reorgchk_tb_stats('T', 'ALL')
%sql call reorgchk_ix_stats('T', 'ALL')
#ignore errors that say "ibm_db_dbi::ProgrammingError: The last call to execute did not produce any result set."

In [None]:
%%sql tab_reorg_list << select table_schema
    , table_name
    , npages
    , card
    , reorg
FROM session.tb_stats
WHERE reorg like '%*%'
    and card > 10
    and fpages > 10
with ur

In [None]:
# Print data on tables needing reorg
tab_reorg_needed_df=tab_reorg_list.DataFrame()
if int(tab_reorg_needed_df.shape[0]) != 0 :
    # Add thousands comma for numbers
    tab_reorg_needed_df['card'] =tab_reorg_needed_df.apply(lambda x: "{:,}".format(x['card']), axis=1)
    tab_reorg_needed_df['npages'] =tab_reorg_needed_df.apply(lambda x: "{:,}".format(x['npages']), axis=1)
    display(HTML(tab_reorg_needed_df.to_html(index=False)))
else :
    print("No tables needing reorg found.")

In [None]:
%%sql ind_reorg_list << select table_schema
    , table_name
    , index_schema
    , index_name
    , nleaf
    , indcard
    , reorg
FROM session.ix_stats
WHERE (reorg like '%*%' and reorg not like '*----')
    and indcard > 10
    and nleaf > 10
with ur

In [None]:
#Display tables needing index reorg
#display(ind_reorg_list)
ind_reorg_needed_df=ind_reorg_list.DataFrame()
if int(ind_reorg_needed_df.shape[0]) != 0 :
    # Add thousands comma for numbers
    ind_reorg_needed_df['indcard'] =ind_reorg_needed_df.apply(lambda x: "{:,}".format(x['indcard']), axis=1)
    ind_reorg_needed_df['nleaf'] =ind_reorg_needed_df.apply(lambda x: "{:,}".format(x['nleaf']), axis=1)
    display(HTML(ind_reorg_needed_df.to_html(index=False)))
else :
    print("No indexes needing reorg found.")
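
The `reorg` column is a short string of flags in which each `*` marks a formula that suggests a reorg. The index query above, for instance, skips rows where only the first flag is set. Assuming the table flags map to REORGCHK formulas F1 to F3 and the index flags to F4 to F8, in order, a small decoder makes the output easier to read. The function name is my own.

```python
def flagged_formulas(reorg_flags, first_formula):
    """Map a REORGCHK flag string such as '-*-' to the formula names it marks."""
    return [f"F{first_formula + pos}"
            for pos, flag in enumerate(reorg_flags) if flag == '*']

print(flagged_formulas('-*-', 1))     # table formulas, assumed to start at F1
print(flagged_formulas('*--*-', 4))   # index formulas, assumed to start at F4
```

The first call prints `['F2']` and the second `['F4', 'F7']`.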

## Data Pruning

In [None]:
# Data Pruning
# In separate Notebook

## SQL Analysis
Only indicators that SQL analysis is needed are calculated here. Separate notebooks are used for identifying and analyzing problem SQL.
### Index Read Efficiency

In [None]:
%%sql ixref << select rows_read/rows_returned 
from table(mon_get_database(-2)) 
with ur

In [None]:
if ixref[0][0] <= 10 :
    print("Index read efficiency is ideal for an OLTP database at "+ str(ixref[0][0]))
elif ixref[0][0] <= 100 :
    print("Index read efficiency is not great but not horrible for an OLTP database at "+ str(ixref[0][0]))
elif ixref[0][0] <= 1000 :
    print("Index read efficiency is bad for an OLTP database at "+ str(ixref[0][0]))
elif ixref[0][0] > 1000 :
    print("Index read efficiency is horrendous for an OLTP database at "+ str(ixref[0][0]))
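
The bands above can also be wrapped in a function that computes the ratio itself, which allows a guard for the case where no rows have been returned yet (the SQL division would likely fail with a divide-by-zero error there). This is a sketch only. The function name is hypothetical, and the bands are simply the ones used in the cell above.

```python
def classify_read_efficiency(rows_read, rows_returned):
    """Return (ratio, verdict) using the OLTP bands from the cell above."""
    if rows_returned == 0:
        # Nothing returned yet, so there is no ratio to judge
        return None, "no rows returned yet"
    ratio = rows_read / rows_returned
    if ratio <= 10:
        verdict = "ideal"
    elif ratio <= 100:
        verdict = "not great but not horrible"
    elif ratio <= 1000:
        verdict = "bad"
    else:
        verdict = "horrendous"
    return ratio, verdict

print(classify_read_efficiency(5_000, 1_000))     # (5.0, 'ideal')
print(classify_read_efficiency(250_000, 1_000))   # (250.0, 'bad')
```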

## Memory Picture for Whole Server
This section is designed to be used with a multi-instance server and does not apply to all environments. At least on Linux, Db2 often over-allocates memory for multiple instances.

### Total Memory on server

In [None]:
server_memory=%sql select decimal(value/1024,10,2) as mem_tot_gb \
    from sysibmadm.env_sys_resources \
    where name='MEMORY_TOTAL'
server_memory

### Portion of Server Memory this Instance Represents

In [None]:
instance_memory=%sql select decimal((value*4)/1024/1024,20,2) as instance_memory_gb \
  from sysibmadm.dbmcfg \
  where name='instance_memory' \
  with ur
instance_memory
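
The two queries above report server memory and instance memory separately. To get the portion this instance represents, the two numbers still need to be divided. A minimal sketch, assuming `instance_memory` is in 4 KB pages and `MEMORY_TOTAL` is in MB, which is what the conversions in the queries imply (the function name and sample values are hypothetical):

```python
def instance_share(instance_memory_pages, memory_total_mb):
    """Return (instance GB, server GB, instance fraction of server memory)."""
    instance_gb = instance_memory_pages * 4 / 1024 / 1024   # 4 KB pages -> GB
    server_gb = memory_total_mb / 1024                       # MB -> GB
    return instance_gb, server_gb, instance_gb / server_gb

# Hypothetical values: 2,097,152 pages on a server with 32,768 MB
print(instance_share(2_097_152, 32_768))   # (8.0, 32.0, 0.25)
```

Summing the first value across every instance on the server and comparing it with the second gives a quick check for over-allocation.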