[dstat Home](http://dag.wiee.rs/home-made/dstat)<br>
[dstat Manual](http://dag.wiee.rs/home-made/dstat/dstat.1.html)<br>
[dstat on GitHub](https://github.com/dagwieers/dstat)<br>
[dstat issue 154 on GitHub](https://github.com/dagwieers/dstat/issues/154)<br>
***
# This notebook ingests and plots dstat logs collected with the command:
`dstat -Tcdilmnprsy --proc-count --output /tmp/deleteme_dstat_Tcdilmnprsy_proc_${HOSTNAME}_$(date -u +'%Y%m%d_%H%M%S').csv`
<br><br>With the `-Tcdilmnprsy --proc-count` options and a collection interval of one second, the CSV log grows by about 1 MB per hour.
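As a rough planning aid, the ~1 MB/hour figure can be scaled to other collection lengths and intervals. This is a minimal sketch that assumes file size is proportional to the number of rows written (one row per sample), using the 1 MB/hour at 1 second observation as the calibration point; real row widths vary with the values recorded. The function name `estimate_dstat_csv_mb` is hypothetical.

```python
def estimate_dstat_csv_mb(duration_s: float, interval_s: float = 1.0,
                          mb_per_hour_at_1s: float = 1.0) -> float:
    """Rough dstat CSV size estimate in MB (hypothetical helper).

    Assumes size scales linearly with the number of rows, calibrated on
    ~1 MB per hour at one sample per second.
    """
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration_s must be >= 0 and interval_s > 0")
    rows = duration_s / interval_s      # one CSV row per sample
    rows_per_hour_at_1s = 3600          # calibration point: 1 row/s for an hour
    return rows / rows_per_hour_at_1s * mb_per_hour_at_1s


print(estimate_dstat_csv_mb(86400))       # one day at 1 s   -> 24.0
print(estimate_dstat_csv_mb(86400, 10))   # one day at 10 s  -> 2.4
```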
***
### To run a collection for one day (86,400 seconds) at one-second intervals within a self-terminating GNU screen session, issue the command:
`screen -fn -dmS dstat_collect_$(date -u +'%Y%m%d_%H%M%S') dstat -Tcdilmnprsy --proc-count --output /tmp/deleteme_dstat_Tcdilmnprsy_proc_${HOSTNAME}_$(date -u +'%Y%m%d_%H%M%S').csv 1 86400`
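The trailing `1 86400` in the command above are dstat's positional delay (seconds between samples) and count (number of samples), so the sample count is simply duration divided by interval; when the count is reached dstat exits and the detached screen session closes with it. The sketch below builds the same argument list from Python for other durations. `dstat_collect_cmd` is a hypothetical helper, not part of dstat; it computes the UTC timestamp once, so the session name and file name always match (the shell version evaluates `date` twice).

```python
import shlex
from datetime import datetime, timezone
from typing import List, Optional


def dstat_collect_cmd(duration_s: int, interval_s: int, hostname: str,
                      now: Optional[datetime] = None) -> List[str]:
    """Build the screen + dstat argument list (hypothetical helper)."""
    if interval_s <= 0 or duration_s < interval_s:
        raise ValueError("need interval_s > 0 and duration_s >= interval_s")
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d_%H%M%S")   # same format as date -u +'%Y%m%d_%H%M%S'
    count = duration_s // interval_s        # number of samples dstat will take
    out = f"/tmp/deleteme_dstat_Tcdilmnprsy_proc_{hostname}_{stamp}.csv"
    return ["screen", "-fn", "-dmS", f"dstat_collect_{stamp}",
            "dstat", "-Tcdilmnprsy", "--proc-count", "--output", out,
            str(interval_s), str(count)]


cmd = dstat_collect_cmd(86400, 1, "server.local",
                        now=datetime(2018, 3, 17, 21, 46, 11, tzinfo=timezone.utc))
print(shlex.join(cmd))   # shell-quoted form of the command
```

The list form could be passed to `subprocess.run(cmd)` on a host that has both `screen` and `dstat` installed.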

***
### Specify the file you wish to parse and plot via the `dstat_csv_file` variable below, then select `Cell -> Run All` in Jupyter.

In [106]:
dstat_csv_file = 'data/dstat_Tcdilmnprsy_proc_server.local_20180317_214611.csv'  # a .csv, .csv.gz, .csv.bz2 or .csv.xz file (compression is inferred from the extension)
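Compressed logs work without any code changes because pandas `read_csv` defaults to `compression='infer'`, which picks the decompressor from the file extension. The sketch below round-trips a tiny made-up frame (not real dstat output) through each of the listed extensions in a temporary directory to show that the same call reads all of them.

```python
import os
import tempfile

import pandas as pd

# Tiny made-up frame standing in for dstat data (not real dstat output).
sample = pd.DataFrame({"epoch": [1521323171.0, 1521323172.0],
                       "usr": [16.422, 32.247]})

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".csv", ".csv.gz", ".csv.bz2", ".csv.xz"):
        path = os.path.join(tmp, "sample" + ext)
        sample.to_csv(path, index=False)   # to_csv also infers compression from the extension
        back = pd.read_csv(path)           # compression='infer' is the default
        results[ext] = back.equals(sample)

print(results)
```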

In [107]:
import pandas as pd

from IPython.display import display, HTML
display(HTML("<style>.container { width:95% !important; }</style>"))

import cufflinks as cf
#cf.go_offline()

In [108]:
column_names = ['timestamp',
                'cpu_usr', 'cpu_sys', 'cpu_idl', 'cpu_wai', 'cpu_hiq', 'cpu_siq',
                'dsk_read', 'dsk_writ',
                'intp_a', 'intp_b', 'intp_c', 
                'la_1m', 'la_5m', 'la_15m', 
                'mem_used', 'mem_buff', 'mem_cach', 'mem_free', 
                'net_recv', 'net_send', 
                'proc_run', 'proc_blk', 'proc_new', 
                'io_read', 'io_writ', 
                'swap_used', 'swap_free', 
                'sys_int', 'sys_csw', 
                'proc_total']

df = pd.read_csv(dstat_csv_file, skiprows=6, usecols=range(0,31)) # usecols used to slice a consistent data set and avoid csv variable column width issue.
df.columns = column_names  # Setting column names here rather than with the 'read_csv' 'names' option as read_csv errors when variable column widths seen.
                           # See https://github.com/dagwieers/dstat/issues/154
    
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')      # epoch seconds to datetime (UTC)
df['timestamp'] = df['timestamp'].values.astype('datetime64[s]') # truncate to whole seconds (drop sub-second precision)
df.set_index('timestamp', inplace=True)

df.iloc[[0, -1]] # Show first and last row

| timestamp | cpu_usr | cpu_sys | cpu_idl | cpu_wai | cpu_hiq | cpu_siq | dsk_read | dsk_writ | intp_a | intp_b | ... | proc_run | proc_blk | proc_new | io_read | io_writ | swap_used | swap_free | sys_int | sys_csw | proc_total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-03-17 21:46:11 | 16.422 | 15.453 | 61.571 | 4.33 | 0.004 | 2.22 | 47853520.0 | 53431838.22 | 296.725 | 168.577 | ... | 0 | 0 | 137.357 | 332.495 | 1005.91 | 0 | 34357637120 | 13832.33 | 212149.363 | 2555 |
| 2018-03-18 00:32:51 | 32.247 | 11.835 | 54.589 | 0.158 | 0.0 | 1.171 | 552960.0 | 18142720.0 | 2102.0 | 1971.0 | ... | 15 | 0 | 48.0 | 12.0 | 526.0 | 0 | 34357637120 | 92064.0 | 113278.0 | 1480 |
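To make the parsing steps above concrete, here is a sketch that runs the same sequence (skip the preamble, keep a fixed number of leading columns, rename them, convert epoch seconds) on a tiny made-up file. The six preamble lines only approximate the layout of a real dstat CSV, and only four columns are kept to keep it short. It also swaps in `.dt.floor("s")` for the `.values.astype('datetime64[s]')` step; for post-1970 timestamps both give whole seconds.

```python
import io

import pandas as pd

# Made-up file laid out like a dstat CSV: six preamble lines, then the
# per-column header row, then one row per sample (epoch seconds first).
fake_dstat_csv = io.StringIO(
    '"Dstat CSV output"\n'
    '"Author:","someone"\n'
    '"Host:","server.local"\n'
    '"Cmdline:","dstat -Tc --output x.csv"\n'
    '"Date:","17 Mar 2018"\n'
    '"epoch","total cpu usage",,\n'
    '"epoch","usr","sys","idl"\n'
    '1521323171.402,16.422,15.453,61.571\n'
    '1521323172.401,32.247,11.835,54.589\n'
)

# Skip the preamble; usecols pins the column count, as in the notebook.
demo = pd.read_csv(fake_dstat_csv, skiprows=6, usecols=range(0, 4))
demo.columns = ["timestamp", "cpu_usr", "cpu_sys", "cpu_idl"]

# Epoch seconds -> tz-naive UTC datetimes, then drop sub-second precision.
demo["timestamp"] = pd.to_datetime(demo["timestamp"], unit="s").dt.floor("s")
demo = demo.set_index("timestamp")
print(demo)
```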


***
### Plotting a full day of data at 1-second intervals produces 86,400 rows.<br>Rendering that many points may overwhelm your browser when the charts are drawn.
Uncomment one of the slicing methods below to reduce the number of rows, and therefore the plot size (the "last X hours" slice, set to 0.1 hours, is active by default).<br>
One strategy is to plot the full load average data first, then zoom or slice in on a problem period.
***
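Slicing is one way to cut the row count; another option, not used in the original notebook, is to downsample with pandas `resample`, for example aggregating 1-second samples into 1-minute buckets so a day of data shrinks from 86,400 rows to 1,440. Averaging smooths out short spikes, so `max()` may be the better aggregate when hunting for peaks. The sketch uses a synthetic load-average series rather than real dstat data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a day of 1-second dstat samples.
idx = pd.date_range("2018-03-17 00:00:00", periods=86400, freq="s")
day = pd.DataFrame({"la_1m": np.linspace(0.0, 4.0, len(idx))}, index=idx)

per_minute_mean = day.resample("1min").mean()   # smooths short spikes
per_minute_max = day.resample("1min").max()     # keeps the peak of each minute

print(len(day), "->", len(per_minute_mean))     # 86400 -> 1440
```

In the notebook the equivalent would be something like `df.resample('1min').mean()` before the plotting cells.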

In [109]:
#df = df[(df.index >= '2018-03-17 21:30:22')] # Greater than datetime
#df = df[(df.index <= '2018-03-17 21:30:22')] # Less than datetime
#df = df[(df.index >= '2018-03-17 21:30:22' ) & (df.index <= '2018-03-17 21:34:22')] # Between datetime ranges

#df = df[df.index <= df.index.min() + pd.Timedelta(hours=3)] # First X hours of data
df = df[df.index >= df.index.max() - pd.Timedelta(hours=0.1)] # Last X hours of data

#df.shape # Show total dataframe (rows, columns)
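The boolean masks above can also be written as label-based slices on the `DatetimeIndex` with `.loc`. String slices include both endpoints, matching the `>=`/`<=` masks, and assume the index is sorted (which it is for a dstat log read in order). This sketch uses a synthetic index covering the same time span as the sample log.

```python
import pandas as pd

# Synthetic 1-second index standing in for the dstat DataFrame.
idx = pd.date_range("2018-03-17 21:46:11", "2018-03-18 00:32:51", freq="s")
frame = pd.DataFrame({"proc_total": range(len(idx))}, index=idx)

# Between two datetimes, inclusive at both ends (same as the >= & <= mask).
window = frame.loc["2018-03-17 22:00:00":"2018-03-17 22:04:00"]

# Last 0.1 hours (6 minutes), same as the pd.Timedelta(hours=0.1) mask.
tail = frame.loc[frame.index.max() - pd.Timedelta(hours=0.1):]

print(len(window), len(tail))
```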

In [110]:
df[['la_1m', 'la_5m', 'la_15m']].iplot(kind='line', title='dstats - Load average stats (1 min, 5 mins, 15 mins)')

In [111]:
df[['cpu_usr', 'cpu_sys', 'cpu_idl', 'cpu_wai', 'cpu_hiq', 'cpu_siq']].iplot(kind='line', title='dstats - CPU (user, system, idle, wait, hardware interrupt, software interrupt)')
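The six CPU columns are percentages of CPU time, so each row should add up to roughly 100. A quick row-sum check like the sketch below can flag parsing problems, such as columns shifted by the variable-width issue mentioned earlier. It uses the first and last rows from the output table above; the 1-point tolerance is an arbitrary choice.

```python
import pandas as pd

cpu_cols = ["cpu_usr", "cpu_sys", "cpu_idl", "cpu_wai", "cpu_hiq", "cpu_siq"]

# First and last rows from the notebook output above.
cpu = pd.DataFrame(
    [[16.422, 15.453, 61.571, 4.33, 0.004, 2.22],
     [32.247, 11.835, 54.589, 0.158, 0.0, 1.171]],
    columns=cpu_cols,
)

row_totals = cpu.sum(axis=1)
suspect = row_totals[(row_totals - 100).abs() > 1.0]   # tolerance is arbitrary
print(row_totals.round(3).tolist(), "suspect rows:", len(suspect))
```

Against the full log this would be `df[cpu_cols].sum(axis=1)`.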

In [112]:
df[['dsk_read', 'dsk_writ']].iplot(kind='line', title='dstats - Disk stats (read, write)')
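The disk columns are large raw numbers. Assuming they are bytes per second (the magnitudes in the output table suggest so, but treat this as an assumption), dividing by 2**20 gives MiB/s and a more readable y-axis. The values below are taken from the last output row above.

```python
import pandas as pd

# Last output row above; values assumed to be bytes per second.
disk = pd.DataFrame({"dsk_read": [552960.0], "dsk_writ": [18142720.0]})

disk_mib_s = disk / 2**20   # bytes/s -> MiB/s
print(disk_mib_s.round(3))
```

In the notebook this would be `df[['dsk_read', 'dsk_writ']].div(2**20).iplot(kind='line', ...)`.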

In [113]:
df[['intp_a', 'intp_b', 'intp_c']].iplot(kind='line', title='dstats - Interrupt stats')

In [114]:
df[['mem_used', 'mem_buff', 'mem_cach', 'mem_free']].iplot(kind='line', title='dstats - Memory stats (used, buffers, cache, free)')
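Assuming the four memory columns (used, buffers, cache, free) together account for all physical RAM, their row sum approximates total memory and `mem_used` over that total gives a utilisation percentage, which is often easier to read than four byte series. Both the partition and the byte units are assumptions here, and the values are made up.

```python
import pandas as pd

mem_cols = ["mem_used", "mem_buff", "mem_cach", "mem_free"]

# Made-up byte values for two samples (not from the original log).
mem = pd.DataFrame(
    [[8 * 2**30, 1 * 2**30, 4 * 2**30, 3 * 2**30],
     [10 * 2**30, 1 * 2**30, 4 * 2**30, 1 * 2**30]],
    columns=mem_cols,
)

mem_total = mem[mem_cols].sum(axis=1)          # assumed to approximate installed RAM
mem_total_gib = mem_total / 2**30
mem_used_pct = 100 * mem["mem_used"] / mem_total
print(mem_total_gib.tolist(), mem_used_pct.tolist())   # [16.0, 16.0] [50.0, 62.5]
```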

In [115]:
df[['net_recv', 'net_send']].iplot(kind='line', title='dstats - Network stats (receive, send)')
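Network throughput is usually quoted in bits per second. Assuming `net_recv` and `net_send` are bytes per second, multiplying by 8 and dividing by 10**6 gives Mbit/s (decimal megabits, the convention for link speeds). The sample values are made up.

```python
import pandas as pd

# Made-up samples, assumed to be bytes per second.
net = pd.DataFrame({"net_recv": [125_000.0, 12_500_000.0],
                    "net_send": [62_500.0, 1_250_000.0]})

net_mbit_s = net * 8 / 1e6   # bytes/s -> megabits/s
print(net_mbit_s)
```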

In [116]:
df[['proc_run', 'proc_blk', 'proc_new']].iplot(kind='line', title='dstats - Process stats (runnable, uninterruptible, new)')

In [117]:
df[['io_read', 'io_writ']].iplot(kind='line', title='dstats - I/O request stats (read, write requests)')

In [118]:
df[['swap_used', 'swap_free']].iplot(kind='line', title='dstats - Swap stats (used, free)')

In [119]:
df[['sys_int', 'sys_csw']].iplot(kind='line', title='dstats - System stats (interrupts, context switches)')

In [120]:
df[['proc_total']].iplot(kind='line', title='dstats - Total number of processes')