# Intel® AI Reference Models Jupyter Notebook

This Jupyter notebook helps you choose and run a comparison between two models from the [Intel® AI Reference Models repo](https://github.com/IntelAI/models) using Intel® Optimizations for TensorFlow*. When you run the notebook, it installs required package dependencies, displays information about your platform, lets you choose the two models to compare, runs those models, and finally displays a performance comparison chart. 

<a id='section_1'></a>
## Step 1: Display Platform Information

Install TensorFlow and the required dependencies for the Jupyter notebook.

In [None]:
!pip install matplotlib ipykernel psutil pandas cxxfilt gitpython 
!pip install intel-tensorflow
!pip install gcg
!python3 -m pip install gitpython
!pip install prettytable
!pip install --upgrade matplotlib


Set the path for the Intel® AI Reference Models root, assumed to be the parent of the current working directory, and the path for the utility functions directory inside the cloned Intel® AI Reference Models directory.

In [None]:
import os

# If default path does not work, change AIReferenceRoot path according to your environment
current_path = os.getcwd()
os.environ['AIReferenceRoot'] = os.path.dirname(current_path) + "/"
os.environ['ProfileUtilsRoot'] = os.environ['AIReferenceRoot'] + "notebooks/profiling/"
print("Path for the AI Reference Models root is: ", os.environ['AIReferenceRoot'])
print("Path for utility functions directory is: ", os.environ['ProfileUtilsRoot'])

# Check for mandatory Python scripts after AIReferenceRoot and ProfileUtilsRoot are assigned
benchmark_path = os.environ['AIReferenceRoot'] + "benchmarks/launch_benchmark.py"
if os.path.exists(benchmark_path):
    print(benchmark_path)
else:
    print("ERROR! Can't find benchmarks/launch_benchmark.py script!")

profile_utils_path = os.environ['ProfileUtilsRoot'] + "profile_utils.py"
if os.path.exists(profile_utils_path):
    print(profile_utils_path)
else:
    print("ERROR! Can't find profile_utils.py script!")

Print the platform specifications.

In [None]:
from profiling.profile_utils import PlatformUtils
plat_utils = PlatformUtils()
plat_utils.dump_platform_info()


# Import the AI Reference Models CPU info utility
import sys
sys.path.append(os.path.join(os.environ['AIReferenceRoot'], 'benchmarks', 'common'))
from platform_util import PlatformUtil
cpu_info = PlatformUtil("")

# Print the TensorFlow version
import tensorflow as tf
print("We are using TensorFlow version", tf.__version__)

# Display the CPU info
numa_nodes = cpu_info.numa_nodes
print("CPU count per socket:", cpu_info.cores_per_socket, "\nSocket count:", cpu_info.sockets, "\nNUMA nodes:", numa_nodes)
if numa_nodes > 0:
    socket_number = 1
    cpu_count = cpu_info.cores_per_socket
    inter_thread = 1
else:
    # On a non-NUMA machine, use all the cores and do not use numactl
    socket_number = -1
    cpu_count = cpu_info.cores_per_socket * cpu_info.sockets
    inter_thread = cpu_info.sockets
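
The NUMA branch above can be restated as a small pure function, which makes the rule easy to check without access to the target machine. This is only a sketch: `thread_settings` and its dictionary keys are hypothetical names chosen for illustration and are not part of the AI Reference Models utilities.

```python
def thread_settings(numa_nodes, cores_per_socket, sockets):
    """Mirror the NUMA branch above (hypothetical helper, not a repo API)."""
    if numa_nodes > 0:
        # NUMA machine: restrict the run to one socket.
        return {"socket_number": 1, "cpu_count": cores_per_socket, "inter_thread": 1}
    # Non-NUMA machine: use every core; -1 means "do not use numactl".
    return {
        "socket_number": -1,
        "cpu_count": cores_per_socket * sockets,
        "inter_thread": sockets,
    }


# Example values are made up for illustration.
print(thread_settings(numa_nodes=2, cores_per_socket=28, sockets=2))
print(thread_settings(numa_nodes=0, cores_per_socket=8, sockets=1))
```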

## Step 2: Run the first model

This notebook helps you compare the performance of two models listed in the supported models. Select the first model to run and compare.  (If the environment variable MODEL_1_INDEX is set, we'll use that instead of prompting for input.)

### Step 2.1: Display all the supported models and select the first model to run

After the list of models is displayed, select one by entering the model index number.

In [None]:
from profiling.profile_utils import AIReferenceConfigFile
from prettytable import PrettyTable

config = AIReferenceConfigFile()
sections = config.read_supported_section()

# Create a table with headers
models_table = PrettyTable(["Index", "Model Name", "Framework", "Mode", "Script Type", "Precision"])

# Iterate through the sections and add rows to the models table:
for index, section in enumerate(sections):
    split_section = section.split()

    if len(split_section) >= 5:
        modelname = split_section[0]
        framework = split_section[1]
        mode = split_section[2]
        script_type = split_section[3]
        precision = split_section[4]
    
        models_table.add_row([index, modelname, framework, mode, script_type, precision])

print("Supported Models: ")
print(models_table)

# Use the "MODEL_1_INDEX" environment variable value if it exists.
env_model_1_index = os.environ.get('MODEL_1_INDEX', '')
if env_model_1_index != '':
    model_1_index = int(env_model_1_index)
else:
    ## USER INPUT
    model_1_index = int(input('Input an index number of a model: '))

# Look up the selected model name; stop here if the index is out of range
if not 0 <= model_1_index < len(sections):
    raise ValueError("ERROR! Input a model_index value between 0 and " + str(len(sections) - 1))
model_1_name = sections[model_1_index]

# Print the model name
print("First model is: ", model_1_name)

# Get the parameters from config and set the precision environment variable
# used by the quickstart scripts
configvals = config.read_config(model_1_name)
os.environ['PRECISION'] = config.precision
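
The "environment variable first, prompt second" selection can also be wrapped in one helper so that both models share the same validation. The sketch below is an assumption-laden illustration: `choose_index` is a hypothetical name, and the `prompt` parameter exists only so the function can be exercised without interactive input.

```python
import os


def choose_index(env_name, count, prompt=input):
    """Return a valid index from the environment, or from a prompt if unset.

    Hypothetical helper for illustration; not part of profile_utils.
    """
    raw = os.environ.get(env_name, "")
    if raw == "":
        raw = prompt(f"Input an index number between 0 and {count - 1}: ")
    index = int(raw)  # raises ValueError for non-numeric input
    if not 0 <= index < count:
        raise ValueError(f"index {index} is outside 0..{count - 1}")
    return index
```

In the notebook this could be called as `choose_index("MODEL_1_INDEX", len(sections))`; a bad value then stops the cell with a clear error instead of leaving the model name undefined.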

### Step 2.2: Get the required dataset for the selected model

The following cell checks the dataset path. If data-location is already specified, the notebook uses the dataset path given in the ai_reference_models.ini file. If data-location is not specified in ai_reference_models.ini, the cell displays the dataset download instructions. You must download the dataset manually by following these instructions before you proceed to the next cell.

In [None]:
import os
# Get the parameters from config
config = AIReferenceConfigFile()
config.read_config(model_1_name)
data_download_path=''
model_source_dir=''
if config.data_download != '': #and config.data_location == '':
    print("\nFollow these instructions to get the data : ")
    val = config.read_value_from_section(model_1_name, 'data-download')
    print(val)
    # use the "DATA_DOWNLOAD_PATH" environment variable value if it exists.
    env_data_download_path=os.environ.get('DATA_DOWNLOAD_PATH', '')
    if env_data_download_path != '':
        data_download_path= env_data_download_path
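
To make the decision described above concrete, here is a minimal sketch that uses Python's standard `configparser` on an in-memory INI string. The key names data-location and data-download come from the text above; the section name, the sample values, and the `dataset_hint` helper are invented for illustration, and the real ai_reference_models.ini layout may differ.

```python
import configparser

# Made-up sample; the real ai_reference_models.ini is not reproduced here.
SAMPLE_INI = """
[demo-model tensorflow inference]
data-location =
data-download = Download the dataset and unpack it locally.
"""


def dataset_hint(ini_text, section):
    """Return ("use", path) if data-location is set, else ("download", instructions)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    location = parser.get(section, "data-location", fallback="").strip()
    if location:
        return ("use", location)
    return ("download", parser.get(section, "data-download", fallback="").strip())


print(dataset_hint(SAMPLE_INI, "demo-model tensorflow inference"))
```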

Set the path for dataset directory for the first model:

'DATASET_DIR': the path where the dataset exists or has been downloaded.

**ACTION: Enter the path on your system where the dataset for the first model exists or where you downloaded it.**

In [None]:
dataset_path = input('Input the path where the dataset exists for the first model:')

os.environ['DATASET_DIR'] = dataset_path
print("Data location path:", os.environ['DATASET_DIR'])
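
Because a mistyped path only surfaces later, when the quickstart script runs, it can help to validate the directory as soon as it is entered. The helper below is a sketch; `require_directory` is a hypothetical name and not something the notebook or repository provides.

```python
import os


def require_directory(path):
    """Return the absolute form of `path`, or raise if it is not a directory."""
    expanded = os.path.abspath(os.path.expanduser(path.strip()))
    if not os.path.isdir(expanded):
        raise FileNotFoundError(f"Dataset directory not found: {expanded}")
    return expanded
```

For example, `os.environ['DATASET_DIR'] = require_directory(dataset_path)` would stop the cell immediately when the path does not exist.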

### Step 2.3: Prepare pre-trained model for the selected model

This step checks if the pre-trained model for the selected model exists in the pre-trained directory path. If the pre-trained directory does not exist, then it downloads the pre-trained model for the selected precision.

In [None]:
config = AIReferenceConfigFile()
configvals = []

# Get the parameters from config
configvals=config.read_config(model_1_name)

# Get the pre-trained model file (the path stays empty if the config has no download URL)
pretrain_model_path = ''
if config.wget != '':
    pretrain_model_path = config.download_pretrained_model(current_path=current_path)
    pretrain_model_path = config.uncompress_file(pretrain_model_path, current_path=current_path)

# Add custom arguments
if config.custom_args != '':
    configvals.append("--")
    custom_config = config.parsing_custom_args(model_1_name, config.custom_args)
    configvals = configvals + custom_config
    
# Combine common parameters and config parameters
params = configvals    
    
sys.argv=[benchmark_path]+params
print(sys.argv)

Set the environment variable for pre-trained model for the first model:

'PRETRAINED_MODEL': the path where the pre-trained model exists or has been downloaded.

NOTE: You can change the value of 'PRETRAINED_MODEL' by changing its assignment in the cell below.

In [None]:
os.environ["PRETRAINED_MODEL"]=pretrain_model_path
print("Pretrain_model_path:", os.environ["PRETRAINED_MODEL"])

### Step 2.4: Set the log directory or output directory

'OUTPUT_DIR': the output directory path where model output logs are collected.

The default directory for the output logs is "logs" in the current working directory. You can change it by replacing the value assigned to log_directory in the cell below.

In [None]:
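# Set the log directory/output directory to store the logs
log_directory = os.getcwd() + os.sep + "logs"
os.makedirs(log_directory, exist_ok=True)  # make sure the directory exists before logs are written
print(log_directory)

# Set the output-dir argument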
if log_directory !='':
    configvals.append("--output-dir")
    configvals.append(log_directory)

os.environ['OUTPUT_DIR']=log_directory
print("Output directory path is:", os.environ['OUTPUT_DIR'])

### Step 2.5: Run the first model

**OPTIONAL: You can change the batch size from the model's default. For online_inference, set batch_size to 1.**

In [None]:
batch_size = int(input('Set the value for batch size that you want to run: '))
# Environment variable values must be strings, and the name cannot contain a space
os.environ["BATCH_SIZE"] = str(batch_size)

Run the first model using the quickstart script configured in the ai_reference_models.ini file, and save output logs to the selected log directory.

In [None]:
config = AIReferenceConfigFile()

# Get the parameters from config
config.read_config(model_1_name)

# Split the model_name into individual parts
parts = model_1_name.split()

# Join the parts using hyphens as the separator
log_name = '-'.join(parts)
log_name = log_name + ".log"

ai_reference_root = os.environ.get('AIReferenceRoot')

%cd $ai_reference_root
run_workload = ("quickstart/" + config.ai_type + "/" + config.framework + "/"+ config.model_name + "/"
                + config.mode + "/" + config.device +"/" + config.script ) 

!./$run_workload | tee $log_directory/{log_name}

%cd -
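
The log file name used above is simply the model's section name with each run of whitespace replaced by a hyphen, plus a ".log" suffix. A minimal sketch of that rule follows; `log_path_for` is a hypothetical helper and the section name in the example is made up.

```python
import os


def log_path_for(section_name, log_directory):
    """Build the log path: whitespace in the section name becomes hyphens."""
    log_name = "-".join(section_name.split()) + ".log"
    return os.path.join(log_directory, log_name)


# Illustrative section name, not taken from ai_reference_models.ini.
print(log_path_for("demo_model tensorflow inference fp32", "logs"))
```

Deriving the path in one place keeps the run cell and the later throughput lookup pointing at the same file.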

### Step 2.6: Get the throughput or accuracy of the first model

In [None]:
# Throughput of first workload:
# Split the model_name into individual parts
parts_1 = model_1_name.split()

# Join the parts using hyphens as the separator
log_1_name = '-'.join(parts_1)
log_1_name = log_1_name + ".log"

!grep -A 1 "Throughput summary:" $log_directory/{log_1_name} | tail -n 1

import subprocess
# Run the grep command and capture its output
try:
    grep_output = subprocess.check_output(f'grep -A 1 "Throughput summary:" {log_directory}/{log_1_name} | tail -n 1; grep -A 1 "Summary total images/sec:" {log_directory}/{log_1_name} | tail -n 1', shell=True, universal_newlines=True)
except subprocess.CalledProcessError:
    grep_output = "Pattern not found in the file."

# Print or use the captured output as needed
print("Throughput of ", model_1_name, ": ", grep_output.strip())  # Remove leading/trailing whitespace

# Store the captured output in a variable for further use
Throughput_of_workload_1 = grep_output.strip()
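
If a shell is not available, roughly the same lookup can be done in plain Python: find the last line containing the marker and return the line that follows it. This is a sketch under assumptions: `line_after` is a hypothetical helper, and the sample log text is invented, so the real log format may differ.

```python
def line_after(marker, log_text):
    """Return the line following the last line that contains `marker`, or None."""
    lines = log_text.splitlines()
    result = None
    for i, line in enumerate(lines[:-1]):
        if marker in line:
            result = lines[i + 1].strip()
    return result


# Invented sample log for illustration only.
SAMPLE_LOG = """step 10 done
Throughput summary:
123.4 images/sec
"""

print(line_after("Throughput summary:", SAMPLE_LOG))
```

Returning `None` when the marker is missing makes the "not found" case explicit, so a later cell can tell a missing result from a real one.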

## Step 3: Select and run the second model

Let's now run a second model and compare its performance with the first.

### Step 3.1: Select another model for comparison

After the list of models is displayed, select one by entering the model index number. (If the environment variable MODEL_2_INDEX is set, we'll use that instead of prompting for input.)

In [None]:
print("Supported Models: ")
print(models_table)

# Use the "MODEL_2_INDEX" environment variable value if it exists.
env_model_2_index = os.environ.get('MODEL_2_INDEX', '')
if env_model_2_index != '':
    model_2_index = int(env_model_2_index)
else:
    ## USER INPUT
    model_2_index = int(input('Input an index number for the second model: '))

# Look up the selected model name; stop here if the index is out of range
if not 0 <= model_2_index < len(sections):
    raise ValueError("ERROR! Input a model_index value between 0 and " + str(len(sections) - 1))
model_2_name = sections[model_2_index]

# Print the model name
print("Second model is: ", model_2_name)

# Get the parameters from config and set the precision environment variable
# used by the quickstart scripts
configvals = config.read_config(model_2_name)
os.environ['PRECISION'] = config.precision

### Step 3.2: Get the required dataset for the selected model

The following cell checks the dataset path for the second model. If data-location is already specified, the notebook uses the dataset path given in the ai_reference_models.ini file. If data-location is not specified in ai_reference_models.ini, the cell displays the dataset download instructions. You must download the dataset manually by following these instructions before you proceed to the next cell.

In [None]:
# Get the parameters from config
config = AIReferenceConfigFile()
config.read_config(model_2_name)
data_download_path=''
model_source_dir=''
if config.data_download != '' and config.data_location == '':
    print("\nFollow these instructions to get the data : ")
    val = config.read_value_from_section(model_2_name, 'data-download')
    print(val)
    # use the "DATA_DOWNLOAD_PATH" environment variable value if it exists.
    env_data_download_path=os.environ.get('DATA_DOWNLOAD_PATH', '')
    if env_data_download_path != '':
        data_download_path= env_data_download_path

Set the path for dataset directory for the second model:

'DATASET_DIR': the path where the dataset exists or has been downloaded.

**ACTION: Enter the path on your system where the dataset for the second model exists or where you downloaded it.**

In [None]:
dataset_path = input('Input the path where the dataset exists for the second model:')

os.environ['DATASET_DIR'] = dataset_path
print("Data location path:", os.environ['DATASET_DIR'])

### Step 3.3: Prepare pre-trained model for the selected model

This step checks if the pre-trained model for the selected model exists in the pre-trained directory path. If the pre-trained directory does not exist, then it downloads the pre-trained model for the selected precision.

In [None]:
config = AIReferenceConfigFile()
configvals = []

# Get the parameters from config
configvals=config.read_config(model_2_name)

# Set the log directory/output directory to store the logs
log_directory=os.getcwd() + os.sep + "logs"
print(log_directory)

# Get the pre-trained model file (reset first so the first model's path is not reused)
pretrain_model_path = ''
if config.wget != '':
    pretrain_model_path = config.download_pretrained_model(current_path=current_path)
    pretrain_model_path = config.uncompress_file(pretrain_model_path, current_path=current_path)
    
#Set output-dir directory
if log_directory !='':
    configvals.append("--output-dir")
    configvals.append(log_directory)

# Add custom arguments
if config.custom_args != '':
    configvals.append("--")
    custom_config = config.parsing_custom_args(model_2_name, config.custom_args)
    configvals = configvals + custom_config
    
# Combine common parameters and config parameters
params = configvals    
    
sys.argv=[benchmark_path]+params
print(sys.argv)

Set the environment variable for pre-trained model for the second model:

'PRETRAINED_MODEL': the path where the pre-trained model exists or has been downloaded.

NOTE: You can change the value of 'PRETRAINED_MODEL' by changing its assignment in the cell below.

In [None]:
os.environ["PRETRAINED_MODEL"]=pretrain_model_path
print("Pretrain_model_path:", os.environ["PRETRAINED_MODEL"])

### Step 3.4: Run the second model

**OPTIONAL: You can change the batch size from the model's default. For online_inference, set batch_size to 1.**

In [None]:
batch_size = int(input('Set the value for batch size that you want to run: '))
# Environment variable values must be strings, and the name cannot contain a space
os.environ["BATCH_SIZE"] = str(batch_size)

Run the second model using the quickstart script configured in the ai_reference_models.ini file, and save output logs to the selected log directory.

In [None]:
config = AIReferenceConfigFile()

# Get the parameters from config
config.read_config(model_2_name)

# Split the model_name into individual parts
parts = model_2_name.split()

# Join the parts using hyphens as the separator
log_name = '-'.join(parts)
log_name = log_name + ".log"

ai_reference_root = os.environ.get('AIReferenceRoot')

%cd $ai_reference_root
run_workload = ("quickstart/" + config.ai_type + "/" + config.framework + "/"+ config.model_name + "/"
                + config.mode + "/" + config.device +"/" + config.script ) 

!./$run_workload | tee $log_directory/{log_name}

%cd -

### Step 3.5: Get the throughput or accuracy of the second model

In [None]:

# Throughput of second workload:
# Split the model_name into individual parts
parts_2 = model_2_name.split()

# Join the parts using hyphens as the separator
log_2_name = '-'.join(parts_2)
log_2_name = log_2_name + ".log"

!grep -E -A 1 "Throughput summary:|Summary total images/sec:" $log_directory/{log_2_name} | tail -n 1

# Run the grep command and capture its output
try:
    grep_output = subprocess.check_output(f'grep -A 1 "Throughput summary:" {log_directory}/{log_2_name} | tail -n 1; grep -A 1 "Summary total images/sec:" {log_directory}/{log_2_name} | tail -n 1', shell=True, universal_newlines=True)
except subprocess.CalledProcessError:
    grep_output = "Pattern not found in the file."

# Print or use the captured output as needed
print("Throughput of ", model_2_name, ": ", grep_output.strip())  # Remove leading/trailing whitespace

# Store the captured output in a variable for further use
Throughput_of_workload_2 = grep_output.strip()

## Step 4: Compare performance results and plot a comparison chart for the two models

Get the results (throughput/accuracy) of the two models for comparison.

In [None]:
print("Throughput of", model_1_name, ": ", Throughput_of_workload_1)
print("Throughput of", model_2_name, ": ", Throughput_of_workload_2)
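
The captured results are strings, so a numeric comparison needs the value extracted first. The sketch below assumes the first number in each captured line is the throughput and that both results use the same unit; `first_number` and `relative_throughput` are hypothetical helpers, and the example strings are invented.

```python
import re


def first_number(text):
    """Return the first number found in `text` as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def relative_throughput(result_1, result_2):
    """Return result_2 / result_1, or None if either value is missing or zero."""
    a, b = first_number(result_1), first_number(result_2)
    if a is None or b is None or a == 0:
        return None
    return b / a


# Invented example strings; real log lines may look different.
print(relative_throughput("100.0 images/sec", "250 images/sec"))
```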

Plot a comparison chart.

In [None]:
import re
import matplotlib.pyplot as plt

def to_number(text):
    # The captured log line may contain text around the value;
    # this assumes the first number in the line is the throughput.
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else float("nan")

# Define your data
categories = [model_1_name, model_2_name]
values = [to_number(Throughput_of_workload_1), to_number(Throughput_of_workload_2)]

# Use one color per bar
colors = ['blue', 'orange']

bars = plt.bar(range(len(categories)), values, color=colors)

# Add labels and a title
plt.xlabel('Workloads')
plt.ylabel('Throughput')
plt.title('Bar Chart of Two Workloads')

# Hide the x-axis ticks; the legend identifies each workload
plt.xticks([])
plt.legend(bars, categories)

# Display the chart
plt.show()