# Tutorial 4 - Visualization


Welcome to the *IBM SMF Explorer* Visualization Tutorial.
This tutorial provides examples and inspiration for creating helpful visualizations from SMF data.

> The examples are based on engine utilization data taken from SMF72 subtype 3.



## Getting started

Initialize a Context for the dataset that you want to work with. Note that *Plotly Express* (`plotly.express`) is imported for plotting the data.

In [None]:
# imports
import smfexplorer
from smfexplorer.fields import SMF72S3
from smfexplorer import names
from plotly import express as px
import pandas as pd

DATASET = "YOUR.SMF.DATA"

# data fetching
ctx = smfexplorer.new_context(DATASET)
df = ctx.samples.smf_72_03_sample().run()

## Prepare your data

Before plotting, make sure that the data is meaningful and clean. If your analysis does not require Report Class data, filter it out:

In [None]:
df = df[~df["is_report_class"]].drop("is_report_class", axis=1)

# create a df subset:
df = df[
    [
        "timestamp",
        "sid",
        "utilization_cp",
        "utilization_ziip",
        "utilization_zaap",
        "utilization_ziip_on_cp",
        "utilization_total",
    ]
]
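
To see what these two steps do, here is a small sketch on a hand-made DataFrame. Only the column names follow this tutorial. The values and the `class_name` column are made up for illustration and are not real SMF72 subtype 3 output. Report classes typically regroup work that is already counted in service classes. If you kept both kinds of rows, the same work would be counted twice when the data is summed later.

```python
import pandas as pd

# Made-up rows that mimic the columns used in this tutorial;
# "class_name" is a hypothetical extra column that the subset step drops
raw = pd.DataFrame(
    {
        "timestamp": ["10:00", "10:00", "10:00"],
        "sid": ["SYSA", "SYSA", "SYSA"],
        "class_name": ["BATCH", "ONLINE", "RPTALL"],
        "is_report_class": [False, False, True],
        "utilization_cp": [10.0, 5.0, 15.0],
        "utilization_ziip": [2.0, 1.0, 3.0],
    }
)

# ~ negates the boolean column, so only service class rows are kept
service_classes = raw[~raw["is_report_class"]].drop(columns="is_report_class")

# Selecting a list of columns returns a DataFrame with only those columns
subset = service_classes[["timestamp", "sid", "utilization_cp", "utilization_ziip"]]
print(subset)
```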

Consider aggregating the data to make the visualization meaningful. In the following example, values are grouped by the *timestamp* and *sid* fields and summed:

In [None]:
df = df.groupby(["timestamp", "sid"], as_index=False).sum()
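
On toy data, the grouping collapses all service class rows of one system and interval into a single row. Each utilization column in that row holds the sum. The values below are made up. In the tutorial, `.sum()` can be applied to every remaining column because the subset step leaves only numeric columns besides the two group keys.

```python
import pandas as pd

# Made-up service class rows: several classes per system and interval
rows = pd.DataFrame(
    {
        "timestamp": ["10:00", "10:00", "10:00", "10:15"],
        "sid": ["SYSA", "SYSA", "SYSB", "SYSA"],
        "utilization_cp": [10.0, 5.5, 7.0, 12.0],
        "utilization_ziip": [2.0, 1.0, 0.0, 3.0],
    }
)

# One row per (timestamp, sid); numeric columns are summed within each group
per_system = rows.groupby(["timestamp", "sid"], as_index=False).sum()
print(per_system)
```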

Rounding long decimals makes your data easier to read and visualize:

In [None]:
df = df.round(1)
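
`DataFrame.round` also accepts a dict when different columns need different precision. Here is a small sketch with made-up values. Rounding after the aggregation step, as this tutorial does, avoids summing numbers that were already rounded.

```python
import pandas as pd

# Made-up utilization values with long decimals
frame = pd.DataFrame({"utilization_cp": [12.345, 7.891], "utilization_ziip": [0.456, 3.21]})

# A single integer rounds every numeric column to the same number of decimals
uniform = frame.round(1)

# A dict rounds each listed column to its own number of decimals
per_column = frame.round({"utilization_cp": 0, "utilization_ziip": 2})

print(uniform)
print(per_column)
```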

The next cell checks which processor types actually ran work. Processor types with zero total utilization are excluded from the plots.

In [None]:
utilization_fields = [
    SMF72S3.utilization_cp,
    SMF72S3.utilization_ziip,
    SMF72S3.utilization_zaap,
    SMF72S3.utilization_ziip_on_cp,
    SMF72S3.utilization_total,
]
display_fields = []
for field in utilization_fields:
    if df[names(field)].sum() > 0:
        display_fields.append(field)
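
The loop can also be written as a vectorized pandas expression. It sums every candidate column once and keeps the names whose total is positive. This sketch uses plain column names instead of the `SMF72S3` field objects and the `names()` helper, and the values are made up.

```python
import pandas as pd

# Made-up utilization data: zAAP never ran any work
frame = pd.DataFrame(
    {
        "utilization_cp": [10.0, 12.0],
        "utilization_ziip": [2.0, 0.0],
        "utilization_zaap": [0.0, 0.0],
        "utilization_total": [12.0, 12.0],
    }
)

candidate_columns = ["utilization_cp", "utilization_ziip", "utilization_zaap", "utilization_total"]

# Sum each column, then keep the names of columns whose total is positive
totals = frame[candidate_columns].sum()
active_columns = totals[totals > 0].index.tolist()
print(active_columns)
```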

## Plot

> We recommend starting small and then extending your plots.

For this analysis, we create a line plot that shows the system utilization percentage for each processor type over time. To create the visualization, we use Plotly Express; see the [Plotly Express documentation](https://plotly.com/python/plotly-express/).

In [None]:
plot = px.line(
    df, x=names(SMF72S3.timestamp), y=names(display_fields), title="System Utilisation"
)

display(plot)

### Tooltip and Labels

A *data plot tooltip* appears when you hover over a data point. You can adjust the information it shows with the `hover_name` and `hover_data` arguments. For more inspiration, see the [hover text and formatting documentation](https://plotly.com/python/hover-text-and-formatting/#hovermode-x-or-y).


Plotly Express labels the axes and the legend automatically. However, as the example above shows, the default labels (such as `value` and `variable`) are not always meaningful. We recommend overriding them with the `labels` keyword argument.


In [None]:
# adjust your tooltip
plot_hover = px.line(
    df,
    x=names(SMF72S3.timestamp),
    y=names(display_fields),
    title="System Utilisation",
    hover_name=names(SMF72S3.sid),
    hover_data={
        names(SMF72S3.timestamp): False,  # remove timestamp from hover data
        "value": ":.0f",  # format utilization value
    },
    labels={  # labels for axes
        "value": "Utilization %",
        "variable": "Type",
        names(SMF72S3.timestamp): "Time",
    },
)

display(plot_hover)

### Color Schemes and Legend

When working with large amounts of data, consider using distinct color schemes. Plotly provides a variety of built-in [color sequences](https://plotly.com/python/discrete-color/). You can choose a qualitative color sequence from the `px.colors.qualitative` module or define your own.

In [None]:
plot = px.line(
    df,
    x=names(SMF72S3.timestamp),
    y=names(display_fields),
    color_discrete_sequence=px.colors.qualitative.Prism,  # built-in qualitative color sequence
    title="System Utilisation",
    labels={
        "value": "Utilization %",
        "variable": "Type",
        names(SMF72S3.timestamp): "Time",
    },
)

display(plot)

You can also define your own colors, as in the example below:

In [None]:
MY_COLORS = [
    "#520408",
    "#878d96",
    "#31135e",
    "#fa4d56",
    "#ee5396",
    "#a56eff",
    "#0f62fe",
    "#0072c3",
    "#007d79",
    "#044317",
]
plot = px.line(
    df,
    x=names(SMF72S3.timestamp),
    y=names(display_fields),
    color_discrete_sequence=MY_COLORS,
    title="System Utilisation",
    labels={
        "value": "Utilization %",
        "variable": "Type",
        names(SMF72S3.timestamp): "Time",
    },
)

display(plot)

You can hide or show the legend. For example, when two plots share the same legend, you may want to hide one of them with the `layout.showlegend` attribute. You can also position the legend inside or outside the plot area.

In [None]:
plot = px.line(
    df,
    x=names(SMF72S3.timestamp),
    y=names(SMF72S3.utilization_cp),
    title="CP Utilisation",
    labels={
        "utilization_cp": "Utilization %",
        "variable": "Type",
        names(SMF72S3.timestamp): "Time",
    },
)
plot.update_layout(showlegend=False)  # hide legend

display(plot)

plot_2 = px.line(
    df,
    x=names(SMF72S3.timestamp),
    y=names(display_fields),
    title="System Utilisation",
    labels={
        "value": "Utilization %",
        "variable": "Type",
        names(SMF72S3.timestamp): "Time",
    },
)
plot_2.update_layout(
    legend=dict(
        orientation="h",  # horizontal positioning
        yanchor="bottom",
        y=-0.8,  # place the legend below the plot area (paper coordinates)
        x=0.3,
    )
)
display(plot_2)

#### Interacting with Legend and Axes

- Click a legend item to hide or show its trace
- Double-click a legend item to show only that trace; double-click it again to show all traces
- Drag the mouse diagonally to zoom into the selected box
- Drag the mouse vertically to zoom into that part of the y-axis
- Drag the mouse horizontally to zoom into that part of the x-axis
- Double-click inside the chart to reset the zoom


See the [legend documentation](https://plotly.com/python/legend/) for more examples.

### Create a Bar Chart

To get an overview of your data or to analyze the ratio between processor types, consider using bar charts or histograms.

In [None]:
hist = px.histogram(
    df,
    x="timestamp",
    y=["utilization_cp", "utilization_ziip"],
    title="Ratio of CP to zIIP Utilization over time",
    barmode="group",
    labels={
        "value": "Utilization %",
        "variable": "Type",
        names(SMF72S3.timestamp): "Time",
    },
)
display(hist)

# to get an overview, aggregate your data

df_aggr = df.agg("mean", numeric_only=True)  # mean of each numeric column (a Series)
df_aggr = pd.DataFrame(df_aggr).T.round(1)  # turn the Series into a one-row DataFrame

hist_aggr = px.bar(
    df_aggr,
    x="utilization_total",
    y=["utilization_cp", "utilization_ziip"],
    title="Ratio of CP to zIIP Utilization",
    labels={
        "value": "Utilization %",
        "variable": "Type",
    },
)
hist_aggr.update_layout(xaxis={"visible": False})  # hide x-axis
display(hist_aggr)