<a href="https://colab.research.google.com/github/daviddhale/jupyter_parse_vmstat/blob/main/parse_vmstat.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Running the code
Click the Run icon to the left of the first code cell, then click the "Upload" button that appears just below it. In the upload dialog, select the vmstat output file you want to analyze.

>**Do not click on any other run button until you have successfully uploaded a file!**
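
The upload cell reads vmstat's whitespace-separated output into the 17 default columns (`r b swpd free buff cache si so bi bo in cs us sy id wa st`) and drops vmstat's header lines. As a minimal sketch of that parsing step, assuming the default layout (no wide mode, no timestamp column), the code below runs the same logic on a short in-memory sample instead of an uploaded file. The sample values are made up, and the helper name `parse_vmstat_text` is hypothetical.

```python
import io

import pandas as pd

# Column names used by this notebook, in vmstat's default column order
COLUMNS = ['Runnable Processes', 'Blocked Processes', 'Swap',
           'Free Mem', 'Buffered', 'Cache', 'Swap_IN', 'Swap_OUT',
           'Blocks_IN', 'Blocks_OUT', 'Interrupts', 'Context Switches',
           'User', 'Sys', 'Idle', 'Wait', 'Time_Stolen']

# Hypothetical sample in the layout of `vmstat 1` output (values are made up)
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 801924  81244 912332    0    0     5    12  110  220  3  1 95  1  0
 1  1     16 799120  81244 913004    4    8   120   340  450  900 20  5 60 15  0
"""


def parse_vmstat_text(text: str) -> pd.DataFrame:
    """Parse vmstat output into an all-integer DataFrame (hypothetical helper)."""
    df = pd.read_csv(io.StringIO(text), sep=r'\s+', header=None, names=COLUMNS)
    # Header lines: the group header starts with 'procs', and the
    # column header has 'swpd' in the third field
    is_header = (df['Runnable Processes'] == 'procs') | (df['Swap'] == 'swpd')
    df = df[~is_header].reset_index(drop=True)
    # Remaining cells are numeric strings, so they convert cleanly to int
    return df.astype(int)


df = parse_vmstat_text(SAMPLE)
print(df[['Runnable Processes', 'Blocked Processes', 'Swap', 'Wait']])
```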

In [None]:
# @title

import io
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, FileLink
import plotly.express as px
from google.colab import files

# Column names for vmstat's default output, in order:
# r b swpd free buff cache si so bi bo in cs us sy id wa st
VMSTAT_COLUMNS = ['Runnable Processes', 'Blocked Processes', 'Swap',
                  'Free Mem', 'Buffered', 'Cache', 'Swap_IN', 'Swap_OUT',
                  'Blocks_IN', 'Blocks_OUT', 'Interrupts', 'Context Switches',
                  'User', 'Sys', 'Idle', 'Wait', 'Time_Stolen']

uploader = widgets.FileUpload(
    multiple=False  # Allow only one file upload
)

# Display the upload button
display(uploader)

# Function to handle the uploaded file
def handle_upload(change):
    global df  # Make df available to the cells below
    value = change['new']
    if not value:  # Ignore the event fired when the upload is cleared
        return

    # ipywidgets 7.x stores {filename: {'content': bytes, ...}};
    # ipywidgets 8.x stores a tuple of dicts whose 'content' is a memoryview
    if isinstance(value, dict):
        content = next(iter(value.values()))['content']
    else:
        content = value[0]['content']

    df = pd.read_csv(io.BytesIO(bytes(content)), sep=r'\s+',
                     names=VMSTAT_COLUMNS)

    # Eliminate the header lines, which vmstat may repeat periodically
    df.drop(df[(df['Runnable Processes'] == 'procs')].index, inplace=True)
    df.drop(df[(df['Swap'] == 'swpd')].index, inplace=True)

    # Renumber the index to account for dropped rows
    df.reset_index(drop=True, inplace=True)

    print("File uploaded and processed successfully!")


# Observe changes in the uploader widget
uploader.observe(handle_upload, names='value')

## Run calculations
Click the Run icon below to augment the vmstat data with extra calculated columns.
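
Each calculated column is an element-wise combination of existing columns. The sketch below applies the same formulas as the next cell to a two-row DataFrame with made-up values, so the arithmetic can be checked by hand: with 3 runnable processes, 1 blocked process, and 20% wait, Thread Potential is (3 + 1) × 0.20 = 0.8. The local variable names are illustrative only.

```python
import pandas as pd

# Made-up sample values for two vmstat intervals
df = pd.DataFrame({
    'Runnable Processes': [3, 1],
    'Blocked Processes': [1, 0],
    'Swap_IN': [4, 0],
    'Swap_OUT': [6, 0],
    'Blocks_IN': [30, 0],
    'Blocks_OUT': [70, 0],
    'User': [50, 10],
    'Sys': [10, 5],
    'Wait': [20, 0],
})

# Runnable plus blocked processes in each sample
total_processes = df['Runnable Processes'] + df['Blocked Processes']

# Swap activity relative to block IO; pandas returns NaN for 0/0 rather than raising
swaps = df['Swap_IN'] + df['Swap_OUT']
total_io = df['Blocks_IN'] + df['Blocks_OUT']
swap_ratio = swaps / total_io

# CPU busy percentage, then the notebook's two composite metrics
total_cpu = df['User'] + df['Sys']
thread_potential = total_processes * (df['Wait'] / 100.0)
cpu_steady_state = total_cpu / 100.0 + thread_potential

print(thread_potential.tolist(), cpu_steady_state.tolist(), swap_ratio.tolist())
```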

In [None]:
# @title
# Calculations:

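# Convert the whole dataframe to ints
df = df.astype(int)

# Run this cell only once per upload: df.insert() raises a ValueError
# if a column with the same name already exists.

# Total Processes (runnable + blocked)
result = df['Runnable Processes'] + df['Blocked Processes']
df.insert(df.columns.get_loc('Blocked Processes') + 1, 'Total Processes', result)

# Total Swap
result = df['Swap_IN'] + df['Swap_OUT']
df.insert(df.columns.get_loc('Swap_OUT') + 1, 'Swaps per Sample', result)

# Total IO
result = df['Blocks_IN'] + df['Blocks_OUT']
df.insert(df.columns.get_loc('Blocks_OUT') + 1, 'Total IO', result)

# Swap Percent: ratio of swap activity to total block IO (not multiplied by 100).
# Samples with no block IO become NaN instead of inf.
result = df['Swaps per Sample'] / df['Total IO'].replace(0, float('nan'))
df.insert(df.columns.get_loc('Total IO') + 1, 'Swap Percent', result)

# Total CPU in Use
result = df['User'] + df['Sys']
df.insert(df.columns.get_loc('Wait') + 1, 'Total CPU in Use', result)

# Thread Potential
result = df['Total Processes'] * (df['Wait'] / 100.0)
df.insert(df.columns.get_loc('Time_Stolen') + 1, 'Thread Potential', result)

# CPU Steady State
result = (df['Total CPU in Use'] / 100.0) + df['Thread Potential']
df.insert(df.columns.get_loc('Thread Potential') + 1, 'CPU Steady State', result)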

## Create graphs
Click on the Run icon below to create (but not display) the various graphs.

After it completes, run the remaining cells in order to display each graph or to export the results to an Excel file.

In [None]:
# @title
# Create Plotly line charts; the sample number (DataFrame index) is the x-axis
fig_swap = px.line(df, x=df.index, y=['Swap_IN', 'Swap_OUT'],
                   title='Swap_IN/Swap_OUT')
fig_blocked = px.line(df, x=df.index,
                      y=['Runnable Processes', 'Blocked Processes'],
                      title='Run and Blocked Queues')
fig_cpu = px.line(df, x=df.index, y=['User', 'Sys', 'Wait'], title='CPU')
fig_cpu.update_yaxes(range=[0, 100])  # CPU values are percentages
fig_potential = px.line(df, x=df.index, y=['Thread Potential'],
                        title='Thread Potential')
fig_blocks = px.line(df, x=df.index, y=['Blocks_IN', 'Blocks_OUT', 'Total IO'],
                     title='Blocks IN/OUT/Total')

# The figures are displayed by the cells below

## CPU Usage
Look for periods where the CPU is maxed out (User + Sys near 100%) or where the Wait percentage is excessive.

> Click an entry in the legend to toggle whether that series is shown or hidden.
>
> Double-click a legend entry to hide all other series.
>
> Click and drag across the plot to zoom in on a particular range; double-click the plot to reset the zoom.
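
To locate those periods numerically rather than by eye, one option is to flag samples that cross a threshold. This sketch uses made-up data, and the thresholds (95% busy, 20% wait) are illustrative assumptions, not values defined by vmstat or this notebook. The flagged indexes correspond to the x-axis of the graph, since the figures plot against the DataFrame index.

```python
import pandas as pd

# Hypothetical thresholds; tune these for your workload
BUSY_THRESHOLD = 95  # User + Sys percent treated as "maxed out"
WAIT_THRESHOLD = 20  # Wait percent treated as "excessive"

# Made-up CPU percentages for five samples
df = pd.DataFrame({
    'User': [40, 90, 85, 10, 30],
    'Sys':  [5, 8, 12, 2, 5],
    'Wait': [1, 0, 2, 35, 25],
})

# Boolean masks marking the samples that cross each threshold
busy = (df['User'] + df['Sys']) >= BUSY_THRESHOLD
waiting = df['Wait'] >= WAIT_THRESHOLD

print('CPU maxed out at samples:', df.index[busy].tolist())
print('High wait at samples:', df.index[waiting].tolist())
```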


In [None]:
# @title
fig_cpu.show()


## Swap Statistics
Look for persistent swap activity (sustained Swap_IN/Swap_OUT) or extended periods where swapping is high.
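
To find extended periods of swapping without scanning the whole graph, consecutive samples with nonzero Swap_IN or Swap_OUT can be grouped into runs and only the long runs kept. This is a sketch on made-up data; `MIN_RUN`, the number of consecutive samples that counts as "extended", is a hypothetical parameter.

```python
import pandas as pd

# Hypothetical minimum run length (in samples) to count as "extended"
MIN_RUN = 3

# Made-up swap-in/swap-out values for ten samples
df = pd.DataFrame({
    'Swap_IN':  [0, 5, 8, 4, 0, 0, 2, 0, 0, 0],
    'Swap_OUT': [0, 9, 3, 7, 1, 0, 0, 0, 0, 0],
})

swapping = (df['Swap_IN'] + df['Swap_OUT']) > 0

# A new run id starts wherever the swapping flag changes value
run_id = swapping.ne(swapping.shift()).cumsum()

# First sample, last sample, and length of every run
runs = df.index.to_series().groupby(run_id).agg(['first', 'last', 'size'])

# Keep only runs where swapping was active, then only the long ones
is_swap_run = swapping.groupby(run_id).first()
runs = runs.loc[is_swap_run]
extended = runs[runs['size'] >= MIN_RUN]
print(extended)
```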

In [None]:
# @title
fig_swap.show()

## IO Statistics
Look for spikes in block IO (IN, OUT, and Total) that coincide with spikes in CPU wait time. If they persist for extended periods, there may be an IO bottleneck.
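
A correlation between Total IO (Blocks_IN + Blocks_OUT) and Wait is one simple numeric complement to comparing the graphs by eye. This sketch uses made-up data and pandas' default Pearson correlation; the 3-sample rolling window is an arbitrary choice that shows where in the run the two series move together.

```python
import pandas as pd

# Made-up block IO and CPU wait values for six samples
df = pd.DataFrame({
    'Blocks_IN':  [10, 15, 400, 380, 12, 8],
    'Blocks_OUT': [20, 25, 600, 650, 18, 30],
    'Wait':       [1, 2, 30, 28, 1, 2],
})

total_io = df['Blocks_IN'] + df['Blocks_OUT']

# Overall Pearson correlation: values near 1 mean IO and wait rise and fall together
overall = total_io.corr(df['Wait'])
print(f"IO/wait correlation: {overall:.2f}")

# Correlation over a sliding window of 3 samples (first two entries are NaN)
print(total_io.rolling(3).corr(df['Wait']))
```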

In [None]:
# @title
fig_blocks.show()

## Run and Blocked Queues
This graph shows the number of runnable processes (running or waiting for CPU time) and the number of processes blocked waiting for I/O to complete.
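
The same queue data can also be summarized numerically. This sketch, on made-up values, reports each queue's peak, the first sample where that peak occurs, and the share of samples with at least one blocked process.

```python
import pandas as pd

# Made-up run-queue and blocked-process counts for six samples
df = pd.DataFrame({
    'Runnable Processes': [1, 2, 6, 7, 2, 1],
    'Blocked Processes':  [0, 0, 3, 4, 1, 0],
})

# Peak value of each column and the first sample index where it occurs
peaks = df.max()
peak_at = df.idxmax()

# Share of samples in which at least one process was blocked
blocked_share = (df['Blocked Processes'] > 0).mean()

print(peaks.to_dict(), peak_at.to_dict(), f"{blocked_share:.0%}")
```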

In [None]:
# @title
fig_blocked.show()

## Thread Potential
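Thread Potential is a derived metric computed in the calculations cell: the total number of processes (runnable plus blocked) multiplied by the fraction of CPU time spent waiting on I/O (Wait / 100). The CPU Steady State column adds this value to the CPU busy fraction ((User + Sys) / 100).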
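
Written in terms of vmstat's raw fields ($r$ runnable, $b$ blocked, and the $us$, $sy$, $wa$ CPU percentages), the calculations cell computes:

$$
\text{Thread Potential} = (r + b)\,\frac{wa}{100}
\qquad
\text{CPU Steady State} = \frac{us + sy}{100} + \text{Thread Potential}
$$

With illustrative numbers $r = 5$, $b = 2$, $wa = 10$, and $us + sy = 70$, Thread Potential is $7 \times 0.10 = 0.7$ and CPU Steady State is $0.70 + 0.7 = 1.4$.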

In [None]:
# @title
fig_potential.show()

## Export to Excel
Running this cell saves the augmented data to `vmstat_augmented.xlsx` and downloads it to your web browser's default download location.

>You do not need to run this if you are just exploring the graphs above.


In [None]:
# @title
# Create Excel file with results
df.to_excel("vmstat_augmented.xlsx", index=False)

# Download the file
# display(FileLink('vmstat_augmented.xlsx'))  # On a local Jupyter server, create a download link instead
files.download('vmstat_augmented.xlsx')  # Trigger a browser download in Google Colab