# desc-wfmon/mondump.ipynb
Display the schemas of the system-level and process-level monitoring tables used in DESC gen3_workflow.

We assume [desc-wfmon](https://github.com/LSSTDESC/desc-wfmon) has been installed using the install notebook.

In [1]:
%run install/setup.py
import os
import sys
import pandas
import desc.wfmon
import desc.sysmon

print(f"Python version is {sys.version}")
for pkg in [desc.wfmon, desc.sysmon]:
    print(f"{pkg} version is {pkg.__version__}")

Python version is 3.9.7 (default, Sep 16 2021, 13:09:58) 
[GCC 7.5.0]
<module 'desc.wfmon' from '/global/u2/d/dladams/desc/dev/desc-wfmon/ipynb/./install/noconda/desc/wfmon/__init__.py'> version is 0.0.15.dev1
<module 'desc.sysmon' from '/global/u2/d/dladams/desc/dev/desc-wfmon/ipynb/./install/noconda/desc/sysmon/__init__.py'> version is 0.0.15.dev1


## Configuration
List the system and process monitoring files whose schemas we want to display.

In [2]:
# List of files to display
sysfils = ['/global/homes/d/dladams/desc/test8/sysmon.csv',
           '/global/homes/d/dladams/desc/test9/sysmon.csv']
prcfils = ['/global/homes/d/dladams/desc/test8/monitoring.db',
           '/global/homes/d/dladams/desc/test9/runinfo/monitoring.db']

# Set the level for process tables.
lev = 2

# Set units for the memory.
bunit, sbunit = 1, 'byte'
#bunit, sbunit = 2**20, 'MB'
#bunit, sbunit = 2**30, 'GB'
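
The `bunit`/`sbunit` pair is a divisor plus a label for memory values. A minimal sketch of how such a pair might be applied follows; `format_memory` is a hypothetical helper for illustration and is not part of desc-wfmon.

```python
def format_memory(nbytes, bunit=2**30, sbunit='GB'):
    """Scale a byte count by bunit and label the result with sbunit."""
    return f"{nbytes / bunit:.1f} {sbunit}"

# The unit labels follow the configuration cell (2**20 -> 'MB', 2**30 -> 'GB').
print(format_memory(3 * 2**30))
print(format_memory(512 * 2**20, 2**20, 'MB'))
```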

## Fetch the system-level monitoring schema

In [3]:
line = '----------------------------------------------------------------------------'
print(line)
for sysfil in sysfils:
    if os.path.exists(sysfil):
        print(f"System monitor file: {sysfil}")
        sym = pandas.read_csv(sysfil)
        print(f"System monitor sample count: {len(sym)}")
        print("System monitor columns:")
        for cnam in sym.columns:
            print(f"  {cnam}")
        assert(len(sym.cpu_count.unique()) == 1)
        ncpu = sym.cpu_count[0]
        print(f"CPU count is {ncpu:.0f}")
        assert(len(sym.mem_total.unique()) == 1)
        maxmem = sym.mem_total[0]
        print(f"Total memory is {maxmem:.1f} GB")
    else:
        print(f"File not found: {sysfil}")
    print(line)

----------------------------------------------------------------------------
System monitor file: /global/homes/d/dladams/desc/test8/sysmon.csv
System monitor sample count: 619
System monitor columns:
  time
  cpu_count
  cpu_percent
  cpu_user
  cpu_system
  cpu_idle
  cpu_iowait
  cpu_time
  mem_total
  mem_available
  mem_swapfree
  dio_readsize
  dio_writesize
  nio_readsize
  nio_writesize
CPU count is 64
Total memory is 125.8 GB
----------------------------------------------------------------------------
System monitor file: /global/homes/d/dladams/desc/test9/sysmon.csv
System monitor sample count: 706
System monitor columns:
  time
  cpu_count
  cpu_percent
  cpu_user
  cpu_system
  cpu_idle
  cpu_iowait
  cpu_time
  mem_total
  mem_available
  mem_swapfree
  dio_readsize
  dio_writesize
  nio_readsize
  nio_writesize
CPU count is 64
Total memory is 125.8 GB
----------------------------------------------------------------------------
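
The cell above uses `assert` to check that `cpu_count` and `mem_total` hold a single value across all samples. Assertions are skipped when Python runs with `-O`, so an explicit error may be preferable. The sketch below shows one way to do this with only the standard library. `constant_column` is a hypothetical helper, and the three-row sample is made up for illustration (only the column names and the two constant values come from the output above).

```python
import csv
import io

def constant_column(rows, name):
    """Return the single value found in column `name`, or raise ValueError."""
    values = {row[name] for row in rows}
    if len(values) != 1:
        raise ValueError(f"Column {name} has {len(values)} distinct values")
    return values.pop()

# Hypothetical three-sample file standing in for sysmon.csv.
sample = io.StringIO(
    "time,cpu_count,mem_total\n"
    "0,64,125.8\n"
    "5,64,125.8\n"
    "10,64,125.8\n"
)
rows = list(csv.DictReader(sample))
ncpu = int(constant_column(rows, "cpu_count"))
maxmem = float(constant_column(rows, "mem_total"))
print(f"CPU count is {ncpu}, total memory is {maxmem:.1f} GB")
```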


## Fetch the process-level monitoring schema

The process monitoring data is read from the SQLite database produced by parsl. Of particular interest is the resource table, where metrics are sampled at regular intervals separately for each job.
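
Independently of `desc.wfmon`, the schema of such a database can be inspected with Python's built-in `sqlite3` module. The following is a minimal sketch, assuming the `monitoring.db` file is an SQLite database (parsl's default monitoring backend). `list_schema` is a hypothetical helper, and the toy in-memory tables only stand in for the real file.

```python
import sqlite3

def list_schema(con):
    """Return {table name: [(column name, declared type), ...]} for an SQLite connection."""
    names = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    schema = {}
    for name in names:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        cols = con.execute(f'PRAGMA table_info("{name}")').fetchall()
        schema[name] = [(c[1], c[2]) for c in cols]
    return schema

# Toy in-memory database standing in for a monitoring.db file.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE task (task_id INTEGER, run_id TEXT, task_func_name TEXT)")
con.execute("CREATE TABLE workflow (run_id TEXT, workflow_name TEXT)")
for table, cols in list_schema(con).items():
    print(f"Table {table} has {len(cols)} columns")
    for cname, ctype in cols:
        print(f"  {ctype:>8}   {cname}")
con.close()
```

For a real file, `sqlite3.connect(prcfil)` would replace the in-memory connection. Note that the types reported here are the declared SQL types, not the pandas dtypes shown by `MonDbReader`.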

In [9]:
print(line)
for prcfil in prcfils:
    if os.path.exists(prcfil):
        dbr = desc.wfmon.MonDbReader(prcfil, fix=False)
        dbr.tables(lev)
        print(dbr.table('resource').query('task_id==555'))
    else:
        print(f"File not found: {prcfil}")
    print(line)

----------------------------------------------------------------------------
DB /global/homes/d/dladams/desc/test8/monitoring.db has 7 tables
*******************************************************
Table workflow has 1 rows and 10 columns
Column names:
    object   run_id
    object   workflow_name
    object   workflow_version
    object   time_began
    object   time_completed
    object   host
    object   user
    object   rundir
     int64   tasks_failed_count
     int64   tasks_completed_count
*******************************************************
Table task has 2158 rows and 15 columns
Column names:
     int64   task_id
    object   run_id
    object   task_depends
    object   task_func_name
    object   task_memoize
    object   task_hashsum
    object   task_inputs
    object   task_outputs
    object   task_stdin
    object   task_stdout
    object   task_stderr
    object   task_time_invoked
    object   task_time_returned
     int64   task_fail_count
   float64   task_fai