In [4]:
import numpy as np

# Titan V100 Model
We'll define some system parameters and then define the model.

For simplicity's sake, this model can be broken down into three main inputs:

1. GPU parameters
2. Convolutional Parameters
3. "CUDA" parameters (i.e. how the convolution calculations are divided among CUDA cores)

Given the first two inputs, we should be able to define a roofline model. Given the third input, we can then find the point at which our "CUDA" implementation lies on that model.
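
As a rough sketch of what "defining a roofline" means here: attainable performance is the smaller of the compute peak and the memory bandwidth multiplied by arithmetic intensity (operations per byte moved). The function name and the hardware numbers below are placeholders chosen for illustration; they are not measured figures for this GPU.

```python
def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline: performance is capped by compute or by memory traffic."""
    return min(peak_flops, peak_bandwidth * intensity)


# Placeholder hardware numbers for illustration only (assumed, not measured).
example_peak_flops = 10e12      # FLOP/s
example_peak_bandwidth = 500e9  # bytes/s

# Ridge point: the intensity at which a kernel stops being memory-bound.
ridge_intensity = example_peak_flops / example_peak_bandwidth

for intensity in (1.0, ridge_intensity, 100.0):
    perf = attainable_flops(intensity, example_peak_flops, example_peak_bandwidth)
    print("{:6.1f} FLOP/byte -> {:.2f} TFLOP/s".format(intensity, perf / 1e12))
```

Below the ridge point the bandwidth term is the limit; above it the compute peak is. The third input (the CUDA mapping) then decides where a particular implementation sits along the intensity axis.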

Here, we'll define the parameters for our model.

In [5]:
# convolution model parameters

# the first layer is the input: an RGB image, so its depth is three channels
input_layers = {
    'dimx': 256,
    'dimy': 256,
    'depth': 3,
}

kernel_parameters = {
    "x": 5,
    "y": 5,
    "depth": input_layers["depth"],
    "padding": 2,  # (5 - 1) / 2: keeps a 256x256 output for a 5x5 kernel at stride 1
    "stridex": 1,
    "stridey": 1,
}
output_layers = {
    'dimx': 256,
    'dimy': 256,
    'depth': 64,
}

system_latency_parameters = {
    "multiply": 1,  # Ops per cycle
    "scratchpad_mem_access": 1,  # Ops per cycle
}
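
The relationship between input size, kernel size, padding, stride, and output size can be checked with the standard convolution output formula. The helper below is a hypothetical addition (it is not part of the original notebook) and assumes symmetric padding and no dilation.

```python
def conv_output_dim(in_dim, kernel_dim, padding, stride):
    """Output length along one axis for a standard (undilated) convolution."""
    return (in_dim + 2 * padding - kernel_dim) // stride + 1


# A 5-wide kernel at stride 1 preserves a 256-wide input when padding is 2.
print(conv_output_dim(256, 5, 2, 1))  # 256
# A padding of 3 would instead grow the output by two pixels.
print(conv_output_dim(256, 5, 3, 1))  # 258
```

The padding value does not enter the byte and operation totals computed next, since those are derived directly from the declared output dimensions.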


Next, we'll make some definitive calculations about our model: the total number of bytes and the total number of operations. These should be constant regardless of how we allocate our problem space in CUDA.

In [7]:
# the definitive total number of bytes for this convolution, starting with the output
total_bytes = output_layers['dimx'] * output_layers['dimy'] * output_layers['depth'] * 4 # 4 bytes per float

# kernel weights (one x * y * depth filter per output channel), then the input image
total_bytes += kernel_parameters['x'] * kernel_parameters['y'] * kernel_parameters['depth'] * output_layers['depth'] * 4 # 4 bytes per float
total_bytes += input_layers['dimx'] * input_layers['dimy'] * input_layers['depth'] * 4 # 4 bytes per float

# total_ops = input_layers['dimx'] * input_layers['dimy'] * input_layers['depth'] * kernel_parameters['x'] * kernel_parameters['y'] * kernel_parameters['depth']
# one multiply-accumulate per kernel element per output value, counted here as a single op
total_ops = output_layers['dimx'] * output_layers['dimy'] * output_layers['depth'] * kernel_parameters['x'] * kernel_parameters['y'] * kernel_parameters['depth']

# print the total bytes in MB
print("Total bytes: {} MB".format(total_bytes / 1024 / 1024))

# print the total ops in GFLOP (a count, not a rate)
print("Total ops: {} GFLOP".format(total_ops / 1000 / 1000 / 1000))

Total bytes: 16.768310546875 MB
Total ops: 0.3145728 GFLOP
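
These two totals combine into the arithmetic intensity that places this convolution on the roofline's x-axis. The sketch below recomputes them from the same dimensions so that it runs on its own; the variable names are illustrative. It assumes, as the byte count above does, that each tensor crosses the memory bus exactly once, and it shows the figure both per multiply-accumulate and under the common convention of two FLOPs per multiply-accumulate.

```python
BYTES_PER_FLOAT = 4

# Same dimensions as the parameter dictionaries above.
out_x, out_y, out_depth = 256, 256, 64
in_x, in_y, in_depth = 256, 256, 3
k_x, k_y = 5, 5

total_bytes = BYTES_PER_FLOAT * (
    out_x * out_y * out_depth           # output activations
    + k_x * k_y * in_depth * out_depth  # kernel weights
    + in_x * in_y * in_depth            # input image
)
total_macs = out_x * out_y * out_depth * k_x * k_y * in_depth

macs_per_byte = total_macs / total_bytes
flops_per_byte = 2 * macs_per_byte  # assumes 1 multiply-accumulate = 2 FLOPs

print("Intensity: {:.2f} MAC/byte".format(macs_per_byte))
print("Intensity: {:.2f} FLOP/byte".format(flops_per_byte))
```

This is a best-case intensity: an implementation that re-reads inputs or weights from global memory moves more bytes and lands further to the left on the roofline.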
