# CIFAR-10 Classification
> Performance analysis for CIFAR-10 Classification on all hardware platforms

- toc: true 
- badges: true
- comments: true
- categories: [CIFAR-10,Rooflines,Performance Prediction]
- image: images/cifar_logo.png

In [1]:
#hide
#from altair.vegalite.v4.api import Selection
import pandas as pd
import numpy as np
import altair as alt

W = 600
H = 480

pd.options.display.max_rows = 500 # this will set limit of rows to 500
pd.options.display.max_columns = 500 # this will set limit of columns to 500

pd.options.display.float_format = '{:20,.7f}'.format

csv_path = "./data/cleaned_csv/backup.csv"

# Theoretical Analysis of CIFAR-10

### Rooflines for All Hardware Platforms and CNNs

Application requirements can be combined with hardware platform characteristics to predict performance using UC Berkeley’s roofline model. Assumptions about where the weights, activation tensors, and state of a neural network are stored, together with the size of the datatypes used, allow us to derive the arithmetic intensity of a neural network during inference. Combined with the roofline of a given hardware platform, this indicates whether a neural network will be memory bound or compute bound, and what throughput is theoretically achievable.
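
As a minimal sketch of the reasoning above (not the notebook's own `rooflines` helper), the roofline bound can be written as the minimum of the compute roof and the memory roof. The function names and the platform numbers below are hypothetical and chosen only for illustration; arithmetic intensity is taken as operations per byte of memory traffic.

```python
def attainable_performance(peak_ops_per_s: float,
                           bandwidth_bytes_per_s: float,
                           arithmetic_intensity: float) -> float:
    """Roofline bound in ops/s: the lower of the compute roof and the memory roof."""
    memory_roof = bandwidth_bytes_per_s * arithmetic_intensity
    return min(peak_ops_per_s, memory_roof)


def is_memory_bound(peak_ops_per_s: float,
                    bandwidth_bytes_per_s: float,
                    arithmetic_intensity: float) -> bool:
    """A workload is memory bound when it sits left of the ridge point."""
    ridge_point = peak_ops_per_s / bandwidth_bytes_per_s  # ops per byte
    return arithmetic_intensity < ridge_point


# Hypothetical platform: 1 TOP/s peak compute, 10 GB/s memory bandwidth
PEAK = 1e12
BANDWIDTH = 10e9

for ai in (10.0, 1000.0):  # hypothetical arithmetic intensities (ops/byte)
    bound = "memory" if is_memory_bound(PEAK, BANDWIDTH, ai) else "compute"
    perf = attainable_performance(PEAK, BANDWIDTH, ai)
    print(f"AI={ai:7.1f} ops/byte -> {perf:.3e} ops/s ({bound} bound)")
```

Raising the arithmetic intensity, for instance by using smaller datatypes so fewer bytes are moved per operation, shifts a network to the right on the roofline until the compute roof becomes the limit.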

In [2]:
#hide_input

#first process the following csv's to get clean ready-to-plot csv's
%run scripts/script_load_save_data.py
clean_csv_rooflines(path_topologies='c:/Users/alinav/Documents/GitHub/Qutibench_Web/_notebooks/data/topology_details.csv',
                    path_hardware='c:/Users/alinav/Documents/GitHub/Qutibench_Web/_notebooks/data/peakPerfBandHardPlatf.csv')

#Now get the cleaned csv, and plot it as a Roofline
%run scripts/altair_plots.py
rooflines(dataframe = pd.read_csv("data/cleaned_csv/rooflines_hardware_neural_networks.csv"), 
          neural_network = 'cifar')

### Performance Prediction

The following heatmap shows the theoretical performance of the listed hardware platforms for CIFAR-10 classification, measured in inputs per second.
We observe that pruning combined with quantization yields some of the best performance results.
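
To illustrate how an inputs-per-second figure can follow from an attainable compute rate, here is a small sketch. The attainable rate is a hypothetical value; the operations-per-input numbers are the CNV GOPS values used later in this notebook for the 100% and 25% pruning factors. The helper name is an assumption, not part of the notebook's scripts.

```python
def predicted_inputs_per_second(attainable_ops_per_s: float,
                                ops_per_input: float) -> float:
    """Theoretical throughput: how many inputs fit into the attainable ops/s."""
    return attainable_ops_per_s / ops_per_input


ATTAINABLE = 1e11  # hypothetical attainable performance in ops/s

# Operations per input for CNV, taken from the GOPS values in this notebook
OPS_UNPRUNED = 0.469450e9      # pruning factor 100%
OPS_PRUNED_25 = 0.030511732e9  # pruning factor 25%

unpruned = predicted_inputs_per_second(ATTAINABLE, OPS_UNPRUNED)
pruned = predicted_inputs_per_second(ATTAINABLE, OPS_PRUNED_25)
print(f"unpruned: {unpruned:.1f} inputs/s, pruned to 25%: {pruned:.1f} inputs/s")
```

With the attainable rate held fixed, a pruned network needs fewer operations per input and therefore reaches a higher predicted throughput, which is consistent with the observation above.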

In [3]:
#hide_input
path_csv = 'data/performance_predictions_imagenet_mnist_cifar_2.csv'
# Read the csv file and convert the data to long format: (Platform, Neural network, Value)
df = pd.read_csv(path_csv)

# melt every column except 'hardw' into (hardw, variable, value) rows
df1 = pd.melt(df, id_vars=['hardw'])
df1.columns = ['y', 'x', 'values'] #setting new column names
#replace 0s with NaN values because with 0s the grid doesn't show up
df1['values'] = df1['values'].replace({0.0: np.nan})
df_cifar10 = dataframe_contains(input_df=df1, column='x', value='CNV')
df_cifar10.to_csv('c:/Users/alinav/Documents/GitHub/QutibenchWeb/_notebooks/data/cleaned_csv/performance_prediction_cifar10.csv', index = False)

heatmap(df_cifar10, 'pink', 'Performance prediction for CIFAR-10')

# Experimental Data Analysis

### Overview of All Measurements for CIFAR-10

In this table, the rows show the type of hardware platform used for this task (for example FPGA or GPU), followed by the exact name of each hardware platform. For each hardware platform, a separate column lists the sweep of deployment parameters (batch sizes, operating modes, etc.) used in the experiments. The remaining columns show the CNN topologies. When a CNN topology was implemented on a given hardware platform, the corresponding cell shows the precisions (quantization information) and the channel pruning scale. Otherwise, “na” indicates that the topology was not executed on that hardware platform. Many combinations of topology and hardware platform are not supported by the vendors’ dedicated software environments. INTx denotes a fixed-point integer representation with x bits. FPy denotes a floating-point representation with y bits; for example, FP32 is single-precision floating point. The table follows below.

In [5]:
#hide
#get table with the experiments overview
%run scripts/altair_plots.py
print(pd.read_csv('data/overview_experiments_cifar10.csv').to_markdown())

|    | Hardware   | Platform         | CNV                              | Batch/Stream/Thread                  |
|---:|:-----------|:-----------------|:---------------------------------|:-------------------------------------|
|  0 | FPGA       | ZCU102-DPU       | na                               | [1,2,3,4,5,6,7,8]                    |
|  1 | FPGA       | ZCU104-DPU       | na                               | [1,2,3,4,5,6,7,8]                    |
|  2 | FPGA       | Ultra96-DPU      | na                               | [1,2,3,4,5,6,7,8]                    |
|  3 | FPGA       | ZCU104-FINN      | [INT2,INT4]*[100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128,256,512,10000] |
|  4 | FPGA       | ZCU104-BISMO     | [INT2,INT4]*[100%,50%,25%,12.5%] | [2,4,8,16,32,64,128]                 |
|  5 | GPU        | TX2-maxn         | [FP16,FP32]*[100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128]               |
|  6 | GPU        | TX2-maxp         | [FP16,FP32]*[100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128]               |

In [6]:
#hide_input
%run scripts/script_tables.py 
#get table with the experiments overview
dataframes = csv_to_dataframe_multiindex(['data/overview_experiments_cifar10_.csv'])
for dataframe in dataframes:
    display(HTML(dataframe.to_html(index=False)))

| Hardware | Platform     | CIFAR-10 Classification: CNV     | Batch/Stream/Thread                  |
|:---------|:-------------|:---------------------------------|:-------------------------------------|
| FPGA     | ZCU102-DPU   | na                               | [1,2,3,4,5,6,7,8]                    |
| FPGA     | ZCU104-DPU   | na                               | [1,2,3,4,5,6,7,8]                    |
| FPGA     | Ultra96-DPU  | na                               | [1,2,3,4,5,6,7,8]                    |
| FPGA     | ZCU104-FINN  | [INT2,INT4]*[100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128,256,512,10000] |
| FPGA     | ZCU104-BISMO | [INT2,INT4]*[100%,50%,25%,12.5%] | [2,4,8,16,32,64,128]                 |
| GPU      | TX2-maxn     | [FP16,FP32]*[100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128]               |
| GPU      | TX2-maxp     | [FP16,FP32]*[100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128]               |
| GPU      | TX2-maxq     | [FP16,FP32]*[100%,50%,25%,12.5%] | [1,2,4,8,16,32,64,128]               |
| TPU      | TPU-fast clk | na                               | [1]                                  |
| TPU      | TPU-slow clk | na                               | [1]                                  |
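
A cell such as `[INT2,INT4]*[100%,50%,25%,12.5%]` can be read as every combination of a precision with a pruning scale. Interpreting `*` as a Cartesian product is an assumption based on the description of the table; the helper name below is hypothetical and the sketch only illustrates how such a cell expands into individual configurations.

```python
from itertools import product


def expand_configs(precisions, pruning_scales):
    """Return one label per (precision, pruning scale) combination."""
    return [f"{precision}_{scale}"
            for precision, scale in product(precisions, pruning_scales)]


# The FINN and BISMO cells of the overview table
configs = expand_configs(["INT2", "INT4"], ["100%", "50%", "25%", "12.5%"])
print(len(configs), "configurations:", configs)
```

Under this reading, each supported platform in the table covers eight CNV variants, each of which is then swept over the listed batch, stream, or thread settings.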


In [7]:
#hide
master_df = pd.read_csv(csv_path)
# keep every non-GPU row; for the GPU keep only the "maxp" operating mode
is_maxp = lambda row: row.HWType != "GPU" or row["Op mode"].split(",")[0] == "maxp"
# .copy() makes the filtered frames independent, so the column assignments below
# do not trigger pandas' SettingWithCopyWarning
maxp_df = master_df[master_df.apply(is_maxp, axis=1)].copy()
maxp_df["hw_quant_prun"] = maxp_df.apply(lambda r: "_".join([r.HWType, r.Precision, r.PruningFactor]), axis=1)
cnv_df = maxp_df[(maxp_df["NN_Topology"] == "CNV") & maxp_df['lat-comp'].notna() & maxp_df["top1 [%]"].notna()].copy()
# hw_quant_prun is inherited from maxp_df; PruningFactor becomes numeric only afterwards
cnv_df["PruningFactor"] = cnv_df["PruningFactor"].str.strip("%").astype(float)
norm_by_group(cnv_df, "lat-comp", "NN_Topology");
cnv_df["quant_model"] = cnv_df.Precision + '_' + cnv_df.HWType
cnv_df.rename(columns={"top1 [%]": "top1"}, inplace=True)

#fill GOPS gap: one value per pruning factor, the last value is the fallback for all other factors
gops_by_pruning = {100.0: 0.469450, 50.0: 0.118923636, 25.0: 0.030511732}
cnv_df['GOPS'] = cnv_df['PruningFactor'].map(gops_by_pruning).fillna(0.008018676)

#fill in tp-system and tp-comp
cnv_df['tp-system'] = cnv_df['fps-system'] * cnv_df['GOPS']
cnv_df['tp-comp'] = cnv_df['fps-comp'] * cnv_df['GOPS']
# scale GOPS by the batch size only after the throughput columns have been computed
cnv_df['GOPS'] = cnv_df['GOPS'] * cnv_df['batch/thread/stream']
cnv_df.head(300)

Unnamed: 0,NN_Topology,HWType,Precision,Op mode,batch/thread/stream,lat-comp,fps-system,fps-comp,tp-system,tp-comp,top1,top5 [%],Base_Pwr_W,Idle_Pwr_W,Full_Pwr_W,GOPS,PruningFactor,level,hw_quant_prun,norm-lat-comp,quant_model
4,CNV,NCS,FP16,na,1,4.97914,151.555,200.838,71.1474948,94.2833991,87.02,,0.53,1.2,1.728796,0.46945,100.0,l3,NCS_FP16_100%,0.0003419,FP16_NCS
5,CNV,NCS,FP16,na,2,8.93829,162.738,223.757,76.3973541,105.0427236,87.02,,0.53,1.2,1.7803,0.9389,100.0,l3,NCS_FP16_100%,0.0006138,FP16_NCS
6,CNV,NCS,FP16,na,4,16.8473,171.284,237.426,80.4092738,111.4596357,87.02,,0.53,1.2,1.8230243,1.8778,100.0,l3,NCS_FP16_100%,0.001157,FP16_NCS
7,CNV,NCS,FP16,na,8,31.8658,177.259,251.053,83.2142376,117.8568308,87.02,,0.53,1.2,1.8609213,3.7556,100.0,l3,NCS_FP16_100%,0.0021884,FP16_NCS
8,CNV,NCS,FP16,na,16,61.7732,181.621,259.012,85.2619784,121.5931834,87.02,,0.53,1.2,1.8794595,7.5112,100.0,l3,NCS_FP16_100%,0.0042423,FP16_NCS
9,CNV,NCS,FP16,na,32,121.825,183.067,262.673,85.9408031,123.3118398,87.02,,0.53,1.2,1.9037246,15.0224,100.0,l3,NCS_FP16_100%,0.0083665,FP16_NCS
10,CNV,NCS,FP16,na,64,242.098,183.706,264.356,86.2407817,124.1019242,87.02,,0.53,1.2,1.9036711,30.0448,100.0,l3,NCS_FP16_100%,0.0166263,FP16_NCS
11,CNV,NCS,FP16,na,128,481.675,184.24,265.739,86.491468,124.7511736,87.02,,0.53,1.2,1.91102,60.0896,100.0,l3,NCS_FP16_100%,0.0330795,FP16_NCS
30,CNV,GPU,FP16,maxp,1,1.3044,567.942,785.639,266.6203719,368.8182285,87.06,,1.8,4.7,10.3,0.46945,100.0,l3,GPU_FP16_100%,8.96e-05,FP16_GPU
31,CNV,GPU,FP16,maxp,2,2.0865,695.18,976.935,326.352251,458.6221357,87.06,,1.8,4.7,10.4,0.9389,100.0,l3,GPU_FP16_100%,0.0001433,FP16_GPU


In [8]:
#hide
figa_df = cnv_df[(cnv_df["HWType"].isin(["NCS", "ZCU104-Bismo", "U96-Quadcore A53"]))]
figb_df = cnv_df[(cnv_df["HWType"].isin(["GPU", "ZCU104-FINN", "U96-Quadcore A53"]))]

### Line Plot

In [9]:
#hide_input
fig25s = []
fig25_dfs = [figa_df, figb_df]
for df in fig25_dfs:
    sel = alt.selection_multi(fields=["hw_quant_prun"], bind="legend")
    fig25_dot = alt.Chart(df).mark_point().encode(
        x='lat-comp',
        y=alt.Y('fps-comp', scale=alt.Scale(type="log")),
        color=select_color(sel, 'hw_quant_prun:N'),
        tooltip=['fps-comp', 'lat-comp', 'HWType', 'batch/thread/stream'],
    )
    fig25_line = alt.Chart(df).mark_line().encode(
        x='lat-comp',
        y='fps-comp',
        color=select_color(sel, 'hw_quant_prun:N'),
        tooltip=['fps-comp', 'lat-comp', 'HWType', 'batch/thread/stream'],
    )

    fig = (fig25_dot+fig25_line).properties(
        title="Latency versus Performance for Pruned and Quantized CNV Variants",
        width=W/len(fig25_dfs),
        height=H,
    ).add_selection(sel).interactive()
    
    fig25s.append(fig)
    
alt.hconcat(*fig25s)

### Boxplots

In [10]:
#hide_input
#%run scripts/altair_plots.py  #run the plot script if it wasn't previously run
boxplot(df=cnv_df, 
        yaxis="lat-comp", 
        title="Latency by Hardware/Framework and Pruning for CNV")

In [11]:
#hide_input
#%run scripts/altair_plots.py  #run the plot script if it wasn't previously run
boxplot(df= cnv_df, 
        yaxis= "fps-comp", 
        title= "Throughput by Hardware/Framework and Pruning for CNV")

In [12]:
#hide_input
#%run scripts/altair_plots.py  #run the plot script if it wasn't previously run
# work on a copy so that the helper does not modify cnv_df itself
cnv_df_tmp = delete_unique_values(df=cnv_df.copy(),
                                  col_a='HWType',
                                  col_b='Full_Pwr_W')
boxplot(df= cnv_df_tmp, 
        yaxis= "Full_Pwr_W", 
        title= "Power Consumption by Hardware/Framework and Pruning for CNV")


### Pareto Graphs

The following Pareto graph plots accuracy against performance (in fps) for all hardware platforms across the different pruning and quantization configurations. This makes it possible to compare the configurations at a given accuracy level.
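
For readers unfamiliar with Pareto fronts, the sketch below shows one simple way to pick the non-dominated points when both throughput and accuracy should be as high as possible. It is not the notebook's `pareto_graph` implementation, and the data points are hypothetical.

```python
def pareto_front(points):
    """Return the (fps, top1) points that no other point beats or matches on both axes."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Hypothetical (fps, top-1 accuracy in %) measurements
measurements = [(100, 80), (200, 85), (150, 90), (300, 70), (120, 75)]
front = pareto_front(measurements)
print(front)
```

Points on the front represent the best available trade-offs: moving from one to another gains throughput only by giving up accuracy, or the other way around.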

In [13]:
#hide_input
#%run scripts/altair_plots.py  #run the plot script if it wasn't previously run
pareto_graph(df= cnv_df, 
             groupcol= 'hw_quant_prun',
             xcol = 'fps-comp', 
             ycol= 'top1',
             W= W, 
             H= H, 
             title= "CNV Classification Design Space: Accuracy versus Performance")

In [14]:
#hide
cnv_df.to_csv('data/cleaned_csv/experimental_data_cifar.csv', index = False)
cnv_df.head(10)

