# MNIST Classification
> Performance analysis for MNIST Classification on all hardware platforms

- toc: true 
- badges: true
- comments: true
- categories: [MNIST,Rooflines,Performance Prediction]
- image: images/mnist.png

In [1]:
#hide
import pandas as pd
import numpy as np
import altair as alt

W = 600
H = 480

pd.options.display.float_format = '{:20,.7f}'.format
pd.options.display.max_rows =100000 # raise the row display limit to 100000
pd.options.display.max_columns =100000 # raise the column display limit to 100000
#master_df.loc[master_df['NN_Topology'] =='MLP']

csv_path = "./data/cleaned_csv/backup.csv"

%run scripts/altair_plots.py
%run scripts/overlapped_pareto.py
%run scripts/script_tables.py 

# Theoretical Analysis of MNIST

### Rooflines for All Hardware Platforms and CNNs

Combining application requirements with hardware platform characteristics enables performance prediction using UC Berkeley’s roofline model. Assumptions about where the weights, activation tensors, and state of a neural network are stored, combined with the size of the datatypes used, allow us to derive the arithmetic intensity of the network during inference. Together with the roofline for a given hardware platform, this indicates whether the network will be memory bound or compute bound and gives guidance on the throughput that is theoretically achievable.
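
As a minimal sketch of the roofline reasoning above: attainable performance is the smaller of the platform's peak compute and the product of arithmetic intensity and memory bandwidth. The function names and all numeric values below are illustrative assumptions; they are not taken from the notebook's scripts or data.

```python
def arithmetic_intensity(ops_per_input, bytes_per_input):
    """Operations performed per byte moved to or from external memory."""
    return ops_per_input / bytes_per_input


def roofline(peak_ops_per_s, bandwidth_bytes_per_s, intensity):
    """Return (attainable ops/s, limiting resource) under the roofline model."""
    memory_roof = bandwidth_bytes_per_s * intensity
    if memory_roof < peak_ops_per_s:
        return memory_roof, "memory"
    return peak_ops_per_s, "compute"


# Hypothetical platform: 1 TOP/s peak compute, 10 GB/s memory bandwidth.
PEAK = 1e12
BANDWIDTH = 10e9

# Hypothetical network: 20 MOP and 10 MB of memory traffic per input.
OPS_PER_INPUT = 20e6
BYTES_PER_INPUT = 10e6

ai = arithmetic_intensity(OPS_PER_INPUT, BYTES_PER_INPUT)
attainable, bound = roofline(PEAK, BANDWIDTH, ai)
inputs_per_second = attainable / OPS_PER_INPUT

print(f"intensity = {ai} ops/byte, {bound} bound, {inputs_per_second} inputs/s")
```

With these made-up numbers the memory roof lies below the compute peak, so the network would be classified as memory bound.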

In [2]:
#hide_input
# Run the Rooflines script which processes the data and creates the chart
%run scripts/altair_plots.py
rooflines('mnist')

*Applies to the following pruning factors: 100%, 50%, 25% and 12.5%

### Performance Prediction

The following heatmap shows the theoretical performance of the listed hardware platforms for MNIST classification, measured in inputs/second.
The plot shows that combining pruning and quantization yields some of the best predicted performance.

In [3]:
#hide_input
%run scripts/altair_plots.py
heatmap('data/performance_predictions_imagenet_mnist_cifar.csv', 'mnist', 'Performance Prediction for MNIST')

# Experimental Data Analysis

### Overview of All Measurements for MNIST 

In this table, the rows list the type of hardware platform used for this task (for example, FPGA or GPU), followed by the exact name of each platform. For each hardware platform, a separate column lists the sweep of deployment parameters (batch sizes, operating modes, etc.) used in the experiments. The remaining columns show the CNN topologies. When a topology was implemented on a given hardware platform, the corresponding cell shows the precisions (quantization information) and the channel pruning scales. Otherwise, “na” indicates that the topology wasn’t executed on this specific hardware platform. Many combinations of topology and hardware platform are not supported by the vendors' dedicated software environments. INTx denotes a fixed-point integer representation with x bits. FPy denotes a floating-point representation with y bits; for example, FP32 is single-precision floating point. The table follows below.
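
The cell notation `[INT2, INT4] * [100%,50%,25%,12.5%]` reads as a cross product of precisions and pruning factors. A small sketch of how such a cell expands into individual experiments, assuming every combination is run at every batch size; `expand_configs` is a hypothetical helper and not part of `scripts/script_tables.py`.

```python
from itertools import product


def expand_configs(precisions, pruning_factors, batch_sizes):
    """Enumerate every (precision, pruning factor, batch size) combination."""
    return list(product(precisions, pruning_factors, batch_sizes))


# Values copied from the ZCU104-BISMO row of the overview table.
configs = expand_configs(
    precisions=["INT2", "INT4"],
    pruning_factors=["100%", "50%", "25%", "12.5%"],
    batch_sizes=[2, 4, 8, 16, 32, 64, 128],
)

print(len(configs))  # 2 precisions * 4 pruning factors * 7 batch sizes
print(configs[0])
```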

In [4]:
#hide
%run scripts/script_tables.py  #get table with the experiments overview
print(pd.read_csv('data/overview_experiments_mnist.csv').to_markdown())


In [5]:
#hide_input
%run scripts/script_tables.py 
dataframes = csv_to_dataframe_multiindex(['data/overview_experiments_mnist_.csv'])
for dataframe in dataframes:   
       display(HTML(dataframe.to_html(index=False)))

| Hardware | Platform     | MNIST Classification: MLP           | Batch/Stream/Thread                  |
|:---------|:-------------|:------------------------------------|:-------------------------------------|
| FPGA     | ZCU102-DPU   | na                                  | [1,2,3,4,5,6,7,8]                    |
| FPGA     | ZCU104-DPU   | na                                  | [1,2,3,4,5,6,7,8]                    |
| FPGA     | Ultra96-DPU  | na                                  | [1,2,3,4,5,6,7,8]                    |
| FPGA     | ZCU104-FINN  | [INT2, INT4] * [100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128,256,512,10000] |
| FPGA     | ZCU104-BISMO | [INT2, INT4] * [100%,50%,25%,12.5%] | [2,4,8,16,32,64,128]                 |
| GPU      | TX2-maxn     | [FP16, FP32] * [100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128]               |
| GPU      | TX2-maxp     | [FP16, FP32] * [100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128]               |
| GPU      | TX2-maxq     | [FP16, FP32] * [100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128]               |
| TPU      | TPU-fast clk | na                                  | [1]                                  |
| TPU      | TPU-slow clk | na                                  | [1]                                  |


In [54]:
#hide
master_df = pd.read_csv(csv_path)
maxp_df = master_df.copy()

maxp_df.loc[:,'HWType'] = maxp_df['HWType'] + ('-' + maxp_df['Op mode']).fillna('')
maxp_df.loc[:,"hw_datatype_prun_net"] = maxp_df.apply(lambda r: "_".join([r.HWType, r.Datatype, r.PruningFactor, r.NN_Topology]), axis=1)
# Take an explicit copy so the assignments below modify mnist_df itself
# rather than a view of maxp_df (avoids pandas' SettingWithCopyWarning).
mnist_df = maxp_df[(maxp_df["NN_Topology"] == "MLP") & maxp_df['lat-comp'].notna() & maxp_df["top1 [%]"].notna()].copy()
# hw_datatype_prun_net is inherited from maxp_df, so it is not recomputed here.
mnist_df.loc[:,"PruningFactor"] = mnist_df["PruningFactor"].str.strip("%").astype(float)
norm_by_group(mnist_df, "lat-comp", "NN_Topology")
mnist_df.loc[:,"datatype_model"] = mnist_df.Datatype + '_' + mnist_df.HWType

mnist_df.rename(columns={"top1 [%]": "top1"}, inplace=True)
mnist_df.loc[:,"tag"] = mnist_df.apply(lambda r: "_".join([r.HWType, r.Datatype, r.NN_Topology, str(r.PruningFactor)]), axis=1)

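# Fill in GOPS: operations per inference of the MLP at each pruning factor
# (100%, 50%, 25%; any other value, i.e. 12.5%, falls back to 0.000669)
gops_per_inference = {100: 0.020029, 50: 0.00582, 25: 0.001862}
mnist_df.loc[:,'GOPS'] = mnist_df['PruningFactor'].map(gops_per_inference).fillna(0.000669)
# Fill in tp-system and tp-comp: throughput = operations per inference * inferences per second
mnist_df.loc[:,'tp-system'] = mnist_df['GOPS'] * mnist_df['fps-system']
mnist_df.loc[:,'tp-comp'] = mnist_df['GOPS'] * mnist_df['fps-comp']
# Only after the throughput is computed, scale GOPS to operations per batch
mnist_df.loc[:,'GOPS'] = mnist_df['GOPS'] * mnist_df['batch/thread/stream']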

#save it all up
mnist_df.to_csv('data/cleaned_csv/experimental_data_mnist.csv', index = False)

#mnist_df

### Line Plot

In [55]:
#hide_input
figa_df = mnist_df[(mnist_df["HWType"].isin(["NCS", "ZCU104-Bismo", "U96-Quadcore A53"]))]
figb_df = mnist_df[(mnist_df["HWType"].isin(["TX2-maxp","TX2-maxn","TX2-maxq", "ZCU104-FINN", "U96-Quadcore A53"]))]

fig25s = []
fig25_dfs = [figa_df, figb_df]
for df in fig25_dfs:
    sel = alt.selection_multi(fields=["hw_datatype_prun_net"], bind="legend")
    fig25_dot = alt.Chart(df).mark_point().encode(
        x=alt.X('lat-comp', title="lat-comp [msec]"),
        y=alt.Y('fps-comp', scale=alt.Scale(type="log"),  title="fps-comp [fps]"),
        color=select_color(sel, 'hw_datatype_prun_net:N'),
        tooltip=['fps-comp', 'lat-comp', 'HWType', 'batch/thread/stream'],
    )
    fig25_line = alt.Chart(df).mark_line().encode(
        x=alt.X('lat-comp', title="lat-comp [msec]"),
        y=alt.Y('fps-comp', title="fps-comp [fps]"),
        color=select_color(sel, 'hw_datatype_prun_net:N'),
        tooltip=['fps-comp', 'lat-comp', 'HWType', 'batch/thread/stream'],
    )

    fig = (fig25_dot+fig25_line).properties(
        title="Latency Versus Performance for Pruned and Quantized MLP Variants",
        width=W/len(fig25_dfs),
        height=H,
    ).add_selection(sel).interactive()
    
    fig25s.append(fig)
    
alt.hconcat(*fig25s)

### Boxplots

In [56]:
#these values will be needed for the three boxplots
xaxis='PruningFactor'
x_title='Pruning Factor [%]'
color_col= 'PruningFactor'
color_title='Pruning Factor [%]'
facet_column='datatype_model' 
facet_title = "Datatypes with Hardware Platforms"

In [57]:
#hide_input
%run scripts/altair_plots.py  #run the plot script if it wasn't previously run
boxplot(df = mnist_df, 
        xaxis = xaxis,
        x_title = x_title,
        yaxis="lat-comp", 
        y_title="lat-comp [msec]", 
        color_col = color_col, 
        color_title = color_title , 
        facet_column = facet_column , 
        facet_title = facet_title,
        title="Latency by Hardware/Framework and Pruning for MNIST Classification")

In [58]:
#hide_input
#%run scripts/altair_plots.py  #run the plot script if it wasn't previously run
boxplot(df = mnist_df, 
        xaxis = xaxis,
        x_title = x_title,
        yaxis = "fps-comp", 
        y_title = "fps-comp [fps]", 
        color_col = color_col, 
        color_title = color_title , 
        facet_column = facet_column , 
        facet_title = facet_title,
        title = "Throughput by Hardware/Framework and Pruning for MNIST Classification")

In [59]:
#hide_input
#%run scripts/altair_plots.py  #run the plot script if it wasn't previously run
boxplot(df = mnist_df, 
        xaxis = xaxis,
        x_title = x_title,
        yaxis = "Full_Pwr_W", 
        y_title = "Full_Pwr [Watts]", 
        color_col = color_col, 
        color_title = color_title, 
        facet_column = facet_column , 
        facet_title = facet_title,
        title = "Power Consumption by Hardware/Framework and Pruning for MNIST Classification")

### Pareto Graphs

The following Pareto graph plots accuracy against performance (in fps) for all hardware platforms across the different pruning and quantization configurations, which allows the platforms to be compared at a given accuracy.
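
As a minimal sketch of how a Pareto frontier for accuracy versus throughput can be extracted: a point is dropped when another point is at least as fast and at least as accurate. `pareto_front` is a hypothetical helper, not the `pareto_graph` function from `scripts/altair_plots.py`, and the sample points are made up.

```python
def pareto_front(points):
    """Return the points not dominated in (fps, top1); higher is better for both.

    `points` is a list of (label, fps, top1) tuples.
    """
    # Sort by fps descending, breaking ties by accuracy descending.
    ordered = sorted(points, key=lambda p: (-p[1], -p[2]))
    front = []
    best_top1 = float("-inf")
    for label, fps, top1 in ordered:
        # Every point seen so far is at least as fast, so a new point
        # survives only if it improves on the best accuracy so far.
        if top1 > best_top1:
            front.append((label, fps, top1))
            best_top1 = top1
    return front


# Made-up (label, fps, top1 [%]) datapoints.
samples = [
    ("A", 500.0, 98.5),
    ("B", 2000.0, 97.9),
    ("C", 1500.0, 97.0),  # slower and less accurate than B
    ("D", 8000.0, 95.0),
]
front = pareto_front(samples)
print([label for label, _, _ in front])
```

Sorting once and sweeping keeps the sketch at O(n log n); exact duplicates are reported only once.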

In [67]:
#hide_input
%run scripts/altair_plots.py  #run the plot script if it wasn't previously run
pareto_graph(df= mnist_df, 
             groupcol= 'hw_datatype_prun_net',
             xcol = 'fps-comp', 
             x_title = 'fps-comp [fps]',
             ycol= 'top1',
             y_title = "top1 [%]",
             W= W, 
             H= H, 
             title= "MNIST Classification Design Space: Accuracy Versus Performance")

# Theoretical Pareto and Measured Pareto Overlapped

To make it easy to see how accurate the predictions were, the theoretical Pareto plot and the measured Pareto plot are overlaid. The plot below shows both the theoretical (orange) and measured (blue) Pareto lines. All measured datapoints are drawn as crosses and all theoretical datapoints as circles. Some theoretical datapoints have no matching measured datapoint, and vice versa. Surprisingly, the theoretical Pareto curve lies to the left of the measured one, meaning it predicted lower performance than was measured. The reason is that after quantization the MLP model actually fit in on-chip memory, contrary to the model assumptions. As a result, the arithmetic intensity was much higher, the memory bottleneck was avoided, and the performance was well beyond the predictions of the theoretical model.
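
The effect of on-chip weight storage on arithmetic intensity can be illustrated with a small calculation. This is a sketch under stated assumptions: the theoretical model is taken to charge every inference with reading all weights from external memory, and the layer size, I/O traffic, and on-chip capacity are hypothetical values, not figures from the measured platforms.

```python
def bytes_moved_per_input(n_weights, weight_bits, io_bytes, on_chip_capacity_bytes):
    """External-memory traffic per inference.

    Weights are re-read from external memory for every input unless the whole
    weight set fits in on-chip memory, in which case only inputs/outputs move.
    """
    weight_bytes = n_weights * weight_bits / 8
    if weight_bytes <= on_chip_capacity_bytes:
        return io_bytes
    return io_bytes + weight_bytes


# Hypothetical MLP: 10 M weights, 2 ops (multiply + add) per weight,
# 1 KiB of input/output traffic per inference, 4 MiB of on-chip memory.
N_WEIGHTS = 10_000_000
OPS = 2 * N_WEIGHTS
IO_BYTES = 1024
ON_CHIP = 4 * 1024 * 1024

for bits in (32, 2):
    traffic = bytes_moved_per_input(N_WEIGHTS, bits, IO_BYTES, ON_CHIP)
    print(f"{bits}-bit weights: {OPS / traffic:.1f} ops/byte")
```

In this toy setup the 32-bit weights must be streamed from external memory, while the 2-bit weights fit on chip, so the arithmetic intensity rises by several orders of magnitude and the memory roof no longer limits performance.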

In [70]:
#hide_input
%run scripts/overlapped_pareto.py
get_overlapped_pareto(machine_learning_task='mnist',
                      title = 'Overlapped Pareto Plots Theoretical + Measured for MNIST') 

# Efficiency Plot

To understand the gap between the theoretical predictions and the measurements, an efficiency bar chart was created. The size of each bar reflects the absolute performance: theoretical predictions are shown in red, theoretical peak performance in blue, and measured datapoints in orange. The orange bars are annotated with the efficiency achieved, as a percentage of the predicted performance. Please note the logarithmic y-axis scale. Because the theoretical predictions take memory bottlenecks into account, measured performance can actually exceed the predicted result, in which case the percentage is above 100%.
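
The efficiency annotation described above amounts to a simple ratio. A minimal sketch, assuming efficiency is measured performance divided by predicted performance; `efficiency_percent` is a hypothetical name (the actual computation lives in `scripts/overlapped_pareto.py`, which is not shown here) and the numbers are made up.

```python
def efficiency_percent(measured, predicted):
    """Measured performance as a percentage of the predicted performance."""
    if predicted <= 0:
        raise ValueError("predicted performance must be positive")
    return 100.0 * measured / predicted


# Hypothetical (predicted, measured) pairs in inputs/second.
cases = {
    "below prediction": (2000.0, 1500.0),
    "above prediction": (2000.0, 5000.0),
}
for name, (predicted, measured) in cases.items():
    print(f"{name}: {efficiency_percent(measured, predicted):.0f}%")
```

The second case shows how a measurement that beats a memory-bound prediction produces an efficiency above 100%.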

In [72]:
#hide_input
%run scripts/overlapped_pareto.py
efficiency_plot(machine_learning_task= 'mnist', title='Efficiency Plots for MNIST')

In [63]:
#hide
mnist_df.head()

Unnamed: 0,NN_Topology,HWType,Datatype,Op mode,batch/thread/stream,lat-sys,lat-comp,fps-system,fps-comp,tp-system,tp-comp,top1,top5 [%],Base_Pwr_W,Idle_Pwr_W,Full_Pwr_W,GOPS,PruningFactor,level,hw_peak_perf,hw_bandwidth,nn_total_operations,hw_datatype_prun_net,norm-lat-comp,datatype_model,tag
1058,MLP,NCS2,FP16,,1,3.16,1.6157,316.199,618.928,0.2115371,0.4140628,97.95,,0.53,1.2,1.548,0.000669,12.5,l3,0.5,12.8,0.0007,NCS2_FP16_12.50%_MLP,0.003712,FP16_NCS2,NCS2_FP16_MLP_12.5
1059,MLP,NCS2,FP16,,2,5.08,1.9723,393.474,1014.05,0.2632341,0.6783995,97.95,,0.53,1.2,1.628,0.001338,12.5,l3,0.5,12.8,0.0007,NCS2_FP16_12.50%_MLP,0.0045313,FP16_NCS2,NCS2_FP16_MLP_12.5
1060,MLP,NCS2,FP16,,4,8.87,2.67093,450.727,1497.6,0.3015364,1.0018944,97.95,,0.53,1.2,1.67,0.002676,12.5,l3,0.5,12.8,0.0007,NCS2_FP16_12.50%_MLP,0.0061363,FP16_NCS2,NCS2_FP16_MLP_12.5
1061,MLP,NCS2,FP16,,8,16.44,4.18684,486.767,1910.75,0.3256471,1.2782917,97.95,,0.53,1.2,1.699,0.005352,12.5,l3,0.5,12.8,0.0007,NCS2_FP16_12.50%_MLP,0.0096191,FP16_NCS2,NCS2_FP16_MLP_12.5
1062,MLP,NCS2,FP16,,16,31.79,7.11847,503.27,2247.67,0.3366876,1.5036912,97.95,,0.53,1.2,1.711,0.010704,12.5,l3,0.5,12.8,0.0007,NCS2_FP16_12.50%_MLP,0.0163544,FP16_NCS2,NCS2_FP16_MLP_12.5


# MNIST Power Measurements

In [64]:
#hide_input
def faceted_bar_chart(df,xcol,ycol,colorcol,textcol,title,columncol):
    bars = alt.Chart().mark_bar().encode(
        x=alt.X(xcol +':N', title=''),
        y=alt.Y(ycol + ':Q',  title='Power [W]'),
        color=alt.Color(colorcol +':N', title='Hardware'),
    )
    text = bars.mark_text(
        angle=270,
        align='left',
        baseline='middle',
        dx=10  # Nudges text to right so it doesn't appear on top of the bar
    ).encode(
        text= alt.Text(ycol + ':Q', format='.1f')
    )
    return alt.layer(bars, text, data=df).facet(
        column=alt.Column(columncol+':N', header=alt.Header(labelAngle=-85, labelAlign='right'), title=title)
    ).interactive()

df = mnist_df.copy()
df = df.loc[:,['NN_Topology','HWType','Datatype','Full_Pwr_W','PruningFactor']]
df['NN_Topology'] = df.NN_Topology + '_' + df.PruningFactor.astype(str)
df['hardw_opmode_datatype'] =df.HWType  +'_'+ df.Datatype
df = df.groupby(['NN_Topology','HWType','Datatype','hardw_opmode_datatype'])['Full_Pwr_W'].mean().reset_index()

faceted_bar_chart(df=df, xcol='NN_Topology', ycol='Full_Pwr_W', colorcol='hardw_opmode_datatype', textcol='Full_Pwr_W',columncol = 'hardw_opmode_datatype', title='Power Plots for MNIST Classification')


In [65]:
#hide_input
def faceted_bar_chart(df,xcol,ycol,colorcol,textcol,title,columncol):
    bars = alt.Chart().mark_bar().encode(
        x=alt.X(xcol + ':N', title=''),
        y=alt.Y(ycol + ':Q',  title='fps/Power'),
        color=alt.Color(colorcol +':N'),
    )
    
    text = bars.mark_text(
        angle=270,
        align='left',
        baseline='middle',
        dx=10  # Nudges text to right so it doesn't appear on top of the bar
    ).encode(
        text= alt.Text(ycol + ':Q', format='.1f')
    )
    return alt.layer(bars, text, data=df).interactive().facet(
        column=alt.Column(columncol+':N',  header=alt.Header(labelAngle=-85, labelAlign='right'), title=title))

df = mnist_df.copy()
df = df.loc[:,['NN_Topology','HWType','Datatype','Full_Pwr_W','PruningFactor','fps-comp']]
df['NN_Topology'] = df.NN_Topology + '_' + df.PruningFactor.astype(str)
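# Note: despite the column name, this is power divided by fps (scaled by 1000), not fps divided by power
df['fps_power'] = df.Full_Pwr_W / df['fps-comp'] *1000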
df['hardw_opmode_datatype'] =df.HWType  +'_'+ df.Datatype
df = df.groupby(['NN_Topology','HWType','Datatype','hardw_opmode_datatype'])['fps_power'].mean().reset_index()

faceted_bar_chart(df=df, xcol='NN_Topology', ycol='fps_power', colorcol='hardw_opmode_datatype', textcol='Full_Pwr_W',columncol = 'hardw_opmode_datatype', title='FPS normalized by Power Plots for MNIST Classification')

