# Benchmarking Julia’s Machine Learning Packages

(Results sneak peek)
Trial 1: Higher Dimensional Data:

**Function**|**Flux runtime**|**Pytorch Runtime**|**Tensorflow Runtime**|**Flux Memory**|**Pytorch Memory**|**Tensorflow Memory**
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
Conv|294.153 ms|17.902 ms|1.057 ms|73.83 MiB|400 B|2.20 KiB
DepthwiseConv|59.748 ms|90.951 ms**|1.260 ms|73.83 MiB|688 B|2.20 KiB
ConvTranspose|338.793 ms|59.849 ms|1.068 ms|114.28 MiB|416 B|2.20 KiB
Dense|808.073 μs|527.187 μs|2.300 ms|24.25 KiB|400 B|192 B
LSTM|18.129 ms|40.340 ms|146.397 ms|1.38 MiB|1.59 KiB|192 B
RNN|4.770 ms|234.747 μs|64.971 ms|193.02 KiB|192 B|192 B
BatchNorm|16.412 ms|5.932 ms|28.043 ms|25.00 MiB|192 B|192 B
GroupNorm|16.322 ms|4.570 ms|19.457 ms|15.00 MiB|192 B|1.70 KiB
LayerNorm|1.002 μs|6.099 ms|14.670 ms|1.50 KiB|192 B|720 B
CrossCor|275.394 ms|18.884 ms|1.125 ms|73.84 MiB|400 B|1.52 KiB

Trial 0: Lower Dimensional Data:

**Function**|**Flux runtime**|**Pytorch Runtime**|**Tensorflow Runtime**|**Flux Memory**|**Pytorch Memory**|**Tensorflow Memory**
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
Conv|7.011 μs|31.213 μs|962.888 μs|3.80 KiB|400 B|2.20 KiB
DepthwiseConv|11.024 μs|120.869 μs**|899.810 μs|11.30 KiB|688 B|2.20 KiB
ConvTranspose|18.949 μs|25.888 μs|914.133 μs|19.83 KiB|416 B|2.20 KiB
Dense|432.454 ns|12.770 μs|2.172 ms|592 B|400 B|192 B
LSTM|19.842 ms|73.434 ms|217.620 ms|1.38 MiB|1.59 KiB|192 B
RNN|4.324 ms|272.183 μs|72.779 ms|193.02 KiB|192 B|192 B
BatchNorm|1.430 μs|132.111 μs|26.376 ms|2.28 KiB|192 B|192 B
GroupNorm|4.974 μs|68.747 μs|19.314 ms|2.81 KiB|192 B|1.70 KiB
LayerNorm|704.978 ns|20.479 μs|15.888 ms|880 B|192 B|720 B
CrossCor|7.688 μs|39.505 μs|1.772 ms|4.00 KiB|400 B|1.52 KiB

# 1. Dependencies & Environments

In [1]:
using Pkg
Pkg.add("BenchmarkTools")
Pkg.add("PyCall")
Pkg.add("Flux")

[32m[1m  Updating[22m[39m registry at `C:\Users\CCL\.julia\registries\General`
[32m[1m  Updating[22m[39m git-repo `https://github.com/JuliaRegistries/General.git`
[32m[1m Installed[22m[39m ProgressMeter ─ v1.3.1
[32m[1m  Updating[22m[39m `C:\Users\CCL\Desktop\Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `C:\Users\CCL\Desktop\Manifest.toml`
 [90m [92933f4c][39m[93m ↑ ProgressMeter v1.3.0 ⇒ v1.3.1[39m
[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `C:\Users\CCL\Desktop\Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `C:\Users\CCL\Desktop\Manifest.toml`
[90m [no changes][39m
[32m[1m Resolving[22m[39m package versions...
[32m[1m  Updating[22m[39m `C:\Users\CCL\Desktop\Project.toml`
[90m [no changes][39m
[32m[1m  Updating[22m[39m `C:\Users\CCL\Desktop\Manifest.toml`
[90m [no changes][39m


In [2]:
using Pkg
using BenchmarkTools
using Flux
using PyCall

In [20]:
## Readying Python environment

ENV["PYTHON"] = "C:/Users/CCL/.julia/conda/3/python.exe"
Pkg.build("PyCall")

torch = pyimport("torch")
F = torch.nn.functional

tf = pyimport("tensorflow")
# rnn = tf.rnn
layers = tf.keras.layers  # used by the Keras benchmarks below (Dense, LSTM, SimpleRNN, BatchNormalization)

[32m[1m  Building[22m[39m Conda ─→ `C:\Users\CCL\.julia\packages\Conda\3rPhK\deps\build.log`
[32m[1m  Building[22m[39m PyCall → `C:\Users\CCL\.julia\packages\PyCall\zqDXB\deps\build.log`


PyObject <module 'tensorflow' (namespace)>

# 2. Benchmarking Flux, Pytorch & Tensorflow

I benchmark the time and memory needed to run the following functions: Conv, DepthwiseConv, ConvTranspose, Dense, LSTM, RNN, Normalization layers, and CrossCor.
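
When reading the `@benchmark` calls in this section, keep in mind that BenchmarkTools recommends interpolating non-constant global variables with `$`, so that the cost of looking up an untyped global is not part of the measurement. The sketch below shows the difference on a plain array instead of a Flux layer. The names `x`, `t_global` and `t_interp` and the `seconds=1` budget are illustrative assumptions, not part of the original runs. The effect should be negligible for calls that take milliseconds, but it can matter for results in the nanosecond range.

```julia
using BenchmarkTools

x = rand(1000)  # non-constant global, like conv_ or dense_ in the cells below

# Without interpolation, `x` is looked up as an untyped global on every evaluation.
t_global = @benchmark sum(x) seconds=1

# With `$x`, the value is spliced into the benchmark expression, so only `sum` is timed.
t_interp = @benchmark sum($x) seconds=1

println("global:       ", minimum(t_global))
println("interpolated: ", minimum(t_interp))
```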

### 2.1 Conv

In [8]:
## Flux

conv = Conv((2,2), 10=>30)
conv_ = rand(128,128,10,10)
@benchmark conv(conv_)
# To get only the minimum time in ms (times are stored in ns): c = @benchmark conv(conv_); println(minimum(c.times)/1e6)

BenchmarkTools.Trial: 
  memory estimate:  73.83 MiB
  allocs estimate:  42
  --------------
  minimum time:     266.504 ms (0.00% GC)
  median time:      287.614 ms (2.36% GC)
  mean time:        300.606 ms (5.89% GC)
  maximum time:     384.220 ms (26.85% GC)
  --------------
  samples:          17
  evals/sample:     1
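
The `times` field of a `BenchmarkTools.Trial` holds the raw sample times in nanoseconds, so dividing by 1e6 gives milliseconds and dividing by 1e9 gives seconds. The following is a minimal sketch of pulling individual numbers out of a trial. The matrix product is a hypothetical stand-in workload and the variable names are assumptions; in this notebook the benchmarked expression would be `conv(conv_)`.

```julia
using BenchmarkTools
using Statistics  # mean

# Hypothetical stand-in workload; in the notebook this would be conv(conv_).
a = rand(200, 200)
trial = @benchmark $a * $a seconds=1

# Raw sample times are stored in nanoseconds.
min_ms  = minimum(trial.times) / 1e6   # ns -> ms
mean_ms = mean(trial.times) / 1e6      # ns -> ms
min_s   = minimum(trial.times) / 1e9   # ns -> s

# Memory estimate (bytes) and allocation count for one evaluation.
bytes  = memory(trial)
nalloc = allocs(trial)

println("min: $(min_ms) ms, mean: $(mean_ms) ms, memory: $(bytes) bytes, allocs: $(nalloc)")
```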

In [9]:
## Pytorch

conv_t = torch.randn(10,10,128,128)
conv_t1 = torch.randn(30,10,2,2)

@benchmark F.conv2d(conv_t, conv_t1)

BenchmarkTools.Trial: 
  memory estimate:  400 bytes
  allocs estimate:  10
  --------------
  minimum time:     13.514 ms (0.00% GC)
  median time:      14.684 ms (0.00% GC)
  mean time:        15.697 ms (0.00% GC)
  maximum time:     29.406 ms (0.00% GC)
  --------------
  samples:          319
  evals/sample:     1
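
The 400-byte estimate above is probably not the memory Pytorch itself used. BenchmarkTools reports allocations made through Julia's own allocator, while Pytorch and Tensorflow tensors are allocated by their native runtimes, so the Python columns likely reflect only the PyCall call overhead. The sketch below illustrates the effect without Python, using `Libc.malloc` as a stand-in for a foreign allocator; the function names and the `sink` reference are assumptions made for the example.

```julia
# Keep a reference to the array so the allocation cannot be optimised away.
const sink = Ref{Vector{Float64}}()

# Allocate about 8 MB on Julia's garbage-collected heap.
julia_alloc() = (sink[] = zeros(Float64, 1_000_000); nothing)

# Allocate about 8 MB through C's malloc, which Julia's allocation counter does not see.
function foreign_alloc()
    p = Libc.malloc(8_000_000)
    Libc.free(p)
    return nothing
end

# Warm up so that compilation is not counted.
julia_alloc()
foreign_alloc()

julia_bytes   = @allocated julia_alloc()
foreign_bytes = @allocated foreign_alloc()

println("tracked for the Julia array: ", julia_bytes, " bytes")
println("tracked for the malloc call: ", foreign_bytes, " bytes")
```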

In [22]:
## Tensorflow

conv_tf = tf.random.uniform((10,128,128,10))
conv_tf1 = tf.random.uniform((2,2,10,30))
@benchmark tf.nn.conv2d(conv_tf, conv_tf1, strides=(1,1,1,1), padding="VALID")

KeyError: KeyError: key "random" not found
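
The three Conv cells use different array layouts for the same logical input: Flux expects width, height, channels, batch (WHCN), Pytorch expects batch, channels, height, width (NCHW), and Tensorflow defaults to batch, height, width, channels (NHWC). The sketch below shows how one layout maps onto the others with `permutedims`. It uses small, distinct dimension sizes (an assumption for readability) so that the permutation is visible in the shapes. It also uses `Float32`, the default element type of `torch.randn` and `tf.random.uniform`, whereas `rand(128,128,10,10)` in Julia produces `Float64`.

```julia
# Flux layout: width, height, channels, batch (WHCN)
x_whcn = rand(Float32, 6, 5, 3, 2)

# Pytorch layout: batch, channels, height, width (NCHW)
x_nchw = permutedims(x_whcn, (4, 3, 2, 1))

# Tensorflow default layout: batch, height, width, channels (NHWC)
x_nhwc = permutedims(x_whcn, (4, 2, 1, 3))

println(size(x_whcn))  # (6, 5, 3, 2)
println(size(x_nchw))  # (2, 3, 5, 6)
println(size(x_nhwc))  # (2, 5, 6, 3)

# The same logical element (w=4, h=2, c=3, n=1) in each layout.
println(x_whcn[4, 2, 3, 1] == x_nchw[1, 3, 2, 4] == x_nhwc[1, 2, 4, 3])  # true
```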

### 2.2 DepthwiseConv

In [8]:
## Flux

depthconv = DepthwiseConv((2,2), 10=>30)
depthconv_ = rand(128,128,10,10)
@benchmark depthconv(depthconv_)

BenchmarkTools.Trial: 
  memory estimate:  73.83 MiB
  allocs estimate:  46
  --------------
  minimum time:     47.266 ms (0.00% GC)
  median time:      56.215 ms (13.02% GC)
  mean time:        59.748 ms (16.10% GC)
  maximum time:     162.613 ms (69.70% GC)
  --------------
  samples:          84
  evals/sample:     1

In [9]:
## Pytorch

# DepthwiseConv: "when groups == in_channels and out_channels == K * in_channels, where K is a positive integer"

py"""
import torch
import torch.nn as nn
class DepthwiseConv(torch.nn.Module):
    def __init__(self, in_c, K, out_c):
        super().__init__()
        self.depthwise = nn.Conv2d(in_c, in_c*K, kernel_size=3, padding=1, groups=in_c)
        self.pointwise = nn.Conv2d(in_c*K, out_c, kernel_size=2)
    def forward(self, x):
        out = self.depthwise(x) 
        out = self.pointwise(out)
        return out
"""
depthconv_t = py"DepthwiseConv(10, 2, 30)"
@benchmark depthconv_t(torch.rand(10, 10, 128, 128))


BenchmarkTools.Trial: 
  memory estimate:  688 bytes
  allocs estimate:  20
  --------------
  minimum time:     57.272 ms (0.00% GC)
  median time:      69.433 ms (0.00% GC)
  mean time:        90.951 ms (0.00% GC)
  maximum time:     521.443 ms (0.00% GC)
  --------------
  samples:          58
  evals/sample:     1

In [10]:
## Tensorflow

depthconv_tf = tf.random.uniform((10,128,128,10))
depthconv_tf1 = tf.random.uniform((2,2,10,30))

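# Note: tf.nn.conv2d is a standard convolution; the depthwise op in Tensorflow is tf.nn.depthwise_conv2d.
@benchmark tf.nn.conv2d(depthconv_tf, depthconv_tf1, strides=(1,1,1,1), padding="SAME")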

BenchmarkTools.Trial: 
  memory estimate:  2.20 KiB
  allocs estimate:  46
  --------------
  minimum time:     956.001 μs (0.00% GC)
  median time:      1.036 ms (0.00% GC)
  mean time:        1.260 ms (0.00% GC)
  maximum time:     593.603 ms (0.00% GC)
  --------------
  samples:          3949
  evals/sample:     1
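
The three DepthwiseConv cells do not use identical kernel sizes and padding, which is worth keeping in mind when comparing their timings. As a rough check, the sketch below computes the spatial output size of each configuration with the usual convolution arithmetic. The helper names `conv_out` and `same_out` are hypothetical and written only for this example.

```julia
# Output length along one spatial axis for an explicitly padded convolution.
conv_out(n, k; pad=0, stride=1) = (n + 2 * pad - k) ÷ stride + 1

# With padding="SAME", Tensorflow pads so that the output length is ceil(n / stride).
same_out(n; stride=1) = cld(n, stride)

flux_out  = conv_out(128, 2)          # Flux cell: 2x2 kernel, no padding
torch_mid = conv_out(128, 3; pad=1)   # Pytorch cell, first stage: 3x3 kernel, padding=1
torch_out = conv_out(torch_mid, 2)    # Pytorch cell, second stage: 2x2 kernel, no padding
tf_out    = same_out(128)             # Tensorflow cell: padding="SAME", stride 1

println("Flux: ", flux_out, ", Pytorch: ", torch_out, ", Tensorflow: ", tf_out)
# prints "Flux: 127, Pytorch: 127, Tensorflow: 128"
```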

### 2.3 ConvTranspose

In [11]:
## Flux

convtranspose = ConvTranspose((2,2), 10=>30)
convtranspose_ = rand(128, 128, 10, 10)
@benchmark convtranspose(convtranspose_)

BenchmarkTools.Trial: 
  memory estimate:  114.28 MiB
  allocs estimate:  108
  --------------
  minimum time:     317.023 ms (3.42% GC)
  median time:      331.560 ms (3.31% GC)
  mean time:        338.793 ms (3.16% GC)
  maximum time:     384.906 ms (2.79% GC)
  --------------
  samples:          15
  evals/sample:     1

In [12]:
## Pytorch

convtranspose_t = torch.randn(10,10,128,128)
convtranspose_t1 = torch.randn(10,30,2,2)

@benchmark F.conv_transpose2d(convtranspose_t, convtranspose_t1)

BenchmarkTools.Trial: 
  memory estimate:  416 bytes
  allocs estimate:  10
  --------------
  minimum time:     50.390 ms (0.00% GC)
  median time:      58.173 ms (0.00% GC)
  mean time:        59.849 ms (0.00% GC)
  maximum time:     85.969 ms (0.00% GC)
  --------------
  samples:          84
  evals/sample:     1

In [13]:
## Tensorflow

convtranspose_tf = tf.random.uniform((10,128,128,10))
convtranspose_tf1 = tf.random.uniform((2,2,10,30))
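# Note: tf.nn.conv2d is a standard convolution; the transposed op in Tensorflow is tf.nn.conv2d_transpose.
@benchmark tf.nn.conv2d(convtranspose_tf, convtranspose_tf1, strides=(1,1,1,1), padding="SAME")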

BenchmarkTools.Trial: 
  memory estimate:  2.20 KiB
  allocs estimate:  46
  --------------
  minimum time:     941.600 μs (0.00% GC)
  median time:      1.005 ms (0.00% GC)
  mean time:        1.068 ms (0.00% GC)
  maximum time:     104.443 ms (0.00% GC)
  --------------
  samples:          4659
  evals/sample:     1

### 2.4 Dense

In [14]:
## Flux

dense = Dense(4096, 1000)
dense_ = rand(4096)
@benchmark dense(dense_)

BenchmarkTools.Trial: 
  memory estimate:  24.25 KiB
  allocs estimate:  3
  --------------
  minimum time:     521.500 μs (0.00% GC)
  median time:      730.200 μs (0.00% GC)
  mean time:        808.073 μs (0.43% GC)
  maximum time:     11.835 ms (91.00% GC)
  --------------
  samples:          6109
  evals/sample:     1

In [15]:
## Pytorch

dense_t = torch.randn(1, 4096)
dense_t1 = torch.randn(1000, 4096)

@benchmark F.linear(dense_t, dense_t1)

BenchmarkTools.Trial: 
  memory estimate:  400 bytes
  allocs estimate:  10
  --------------
  minimum time:     355.999 μs (0.00% GC)
  median time:      481.501 μs (0.00% GC)
  mean time:        527.187 μs (0.00% GC)
  maximum time:     3.968 ms (0.00% GC)
  --------------
  samples:          9389
  evals/sample:     1

In [16]:
## Tensorflow: Keras

dense_tf = layers.Dense(1000)
dense_tf1 = tf.random.uniform((1, 4096))

@benchmark dense_tf(dense_tf1)

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  6
  --------------
  minimum time:     2.001 ms (0.00% GC)
  median time:      2.181 ms (0.00% GC)
  mean time:        2.300 ms (0.00% GC)
  maximum time:     8.206 ms (0.00% GC)
  --------------
  samples:          2164
  evals/sample:     1
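
The Dense cells use two conventions for the same affine map. Flux stores the weight as an (out, in) matrix and multiplies a column vector, while `F.linear` takes a row-major input of shape (batch, in) and computes `x * W' + b`. Note that the Pytorch cell passes no bias, whereas a Flux `Dense` layer includes one. The sketch below checks that both conventions give the same numbers; the small sizes and variable names are assumptions standing in for 4096 and 1000.

```julia
in_dim, out_dim = 8, 3               # small stand-ins for 4096 and 1000

W = rand(Float32, out_dim, in_dim)   # weight stored as (out, in)
b = rand(Float32, out_dim)           # bias
x = rand(Float32, in_dim)            # column-vector input, as in dense(dense_)

# Column convention (Flux-style): y = W * x + b
y_col = W * x .+ b

# Row convention (as in F.linear): y = x_row * W' + b'
x_row = reshape(x, 1, in_dim)        # shape (1, in), like torch.randn(1, 4096)
y_row = x_row * W' .+ b'

println(size(y_col))                 # (3,)
println(size(y_row))                 # (1, 3)
println(vec(y_row) ≈ y_col)          # true
```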

### 2.5 LSTM

In [17]:
## Flux

lstm = LSTM(512, 128)
lstm_ = rand(512, 64)
@benchmark lstm(lstm_)

BenchmarkTools.Trial: 
  memory estimate:  1.38 MiB
  allocs estimate:  40
  --------------
  minimum time:     16.665 ms (0.00% GC)
  median time:      17.566 ms (0.00% GC)
  mean time:        18.129 ms (0.73% GC)
  maximum time:     31.251 ms (24.65% GC)
  --------------
  samples:          276
  evals/sample:     1

In [18]:
## Pytorch

lstm_t = torch.nn.LSTM(512, 128, 1)
lstm_t1 = torch.rand(50, 64, 512) #seq length, batch, input size 
# h0 = torch.rand(1, 64, 128) not passed in for fairness in benchmarking
# c0 = torch.rand(1, 64, 128)
@benchmark lstm_t(lstm_t1)

BenchmarkTools.Trial: 
  memory estimate:  1.59 KiB
  allocs estimate:  42
  --------------
  minimum time:     28.388 ms (0.00% GC)
  median time:      36.463 ms (0.00% GC)
  mean time:        40.340 ms (0.00% GC)
  maximum time:     66.987 ms (0.00% GC)
  --------------
  samples:          124
  evals/sample:     1

In [19]:
## Tensorflow: Keras

lstm_tf = layers.LSTM(128)
lstm_tf1 = tf.random.uniform((50, 64, 512))

@benchmark lstm_tf(lstm_tf1)

BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  6
  --------------
  minimum time:     133.507 ms (0.00% GC)
  median time:      141.474 ms (0.00% GC)
  mean time:        146.397 ms (0.00% GC)
  maximum time:     175.156 ms (0.00% GC)
  --------------
  samples:          35
  evals/sample:     1

### 2.6 RNN

In [35]:
## Flux

rnn = RNN(512, 128)
rnn_ = rand(512, 64)
@benchmark rnn(rnn_)

BenchmarkTools.Trial: 
  memory estimate:  193.02 KiB
  allocs estimate:  23
  --------------
  minimum time:     4.257 ms (0.00% GC)
  median time:      4.694 ms (0.00% GC)
  mean time:        4.770 ms (0.56% GC)
  maximum time:     22.202 ms (75.12% GC)
  --------------
  samples:          1046
  evals/sample:     1

In [21]:
## Pytorch

rnn_t = torch.nn.RNNCell(512, 128)
rnn_t1 = torch.rand(64, 512)
@benchmark rnn_t(rnn_t1)

BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  6
  --------------
  minimum time:     135.200 μs (0.00% GC)
  median time:      217.500 μs (0.00% GC)
  mean time:        234.747 μs (0.00% GC)
  maximum time:     864.299 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

In [22]:
## Tensorflow: Keras

rnn_tf = layers.SimpleRNN(128)
rnn_tf1 = tf.random.uniform((64, 1, 512))
@benchmark rnn_tf(rnn_tf1)

BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  6
  --------------
  minimum time:     57.682 ms (0.00% GC)
  median time:      60.417 ms (0.00% GC)
  mean time:        64.971 ms (0.00% GC)
  maximum time:     291.815 ms (0.00% GC)
  --------------
  samples:          77
  evals/sample:     1
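
To make explicit how much work each recurrent call does, here is a minimal sketch of a vanilla RNN recurrence in plain Julia. As far as I can tell, all three RNN cells above process a single time step for a batch of 64. In the LSTM cells of section 2.5, however, the Flux input is one (512, 64) step, the Pytorch input holds 50 steps, and Keras by default reads a (50, 64, 512) tensor as batch 50 with 64 time steps. The weights, names and scaling below are assumptions for illustration and are not taken from any of the libraries.

```julia
in_dim, hidden, batch, steps = 512, 128, 64, 50

Wi = randn(Float32, hidden, in_dim) .* 0.01f0   # input-to-hidden weights
Wh = randn(Float32, hidden, hidden) .* 0.01f0   # hidden-to-hidden weights
b  = zeros(Float32, hidden)

# One time step of a vanilla RNN: h' = tanh(Wi*x + Wh*h + b)
rnn_step(h, x) = tanh.(Wi * x .+ Wh * h .+ b)

h0 = zeros(Float32, hidden, batch)
x1 = rand(Float32, in_dim, batch)               # one step, features x batch
h1 = rnn_step(h0, x1)

# A full sequence applies the same step once per time index.
xs = [rand(Float32, in_dim, batch) for _ in 1:steps]
hT = foldl(rnn_step, xs; init=h0)

println(size(h1))  # (128, 64)
println(size(hT))  # (128, 64)
```

A benchmark that calls the step once does roughly 1/50 of the work of one that runs all 50 steps, so per-call timings are only comparable when the number of steps matches.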

### 2.7 Normalisation Layers: BatchNorm, GroupNorm, LayerNorm

In [23]:
## Flux: BatchNorm

bnorm = BatchNorm(10)
bnorm_ = rand(128,128,10,10)
@benchmark bnorm(bnorm_)

BenchmarkTools.Trial: 
  memory estimate:  25.00 MiB
  allocs estimate:  18
  --------------
  minimum time:     12.471 ms (0.00% GC)
  median time:      13.440 ms (0.00% GC)
  mean time:        16.412 ms (14.97% GC)
  maximum time:     38.760 ms (46.59% GC)
  --------------
  samples:          305
  evals/sample:     1

In [24]:
## Pytorch: BatchNorm

bnorm_t = torch.nn.BatchNorm2d(10)
bnorm_t1 = torch.rand(10,10,128,128)
@benchmark bnorm_t(bnorm_t1)

BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  6
  --------------
  minimum time:     4.254 ms (0.00% GC)
  median time:      5.100 ms (0.00% GC)
  mean time:        5.932 ms (0.00% GC)
  maximum time:     13.514 ms (0.00% GC)
  --------------
  samples:          842
  evals/sample:     1

In [25]:
## Tensorflow: Keras: BatchNorm

bnorm_tf = layers.BatchNormalization()
bnorm_tf1 = tf.random.uniform((10,128,128,10))
@benchmark bnorm_tf(bnorm_tf1)

BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  6
  --------------
  minimum time:     24.724 ms (0.00% GC)
  median time:      25.608 ms (0.00% GC)
  mean time:        28.043 ms (0.00% GC)
  maximum time:     342.109 ms (0.00% GC)
  --------------
  samples:          179
  evals/sample:     1

In [26]:
## Flux: GroupNorm

gnorm = GroupNorm(6, 3)
gnorm_ = rand(128, 128, 6, 10)
@benchmark gnorm(gnorm_)

BenchmarkTools.Trial: 
  memory estimate:  15.00 MiB
  allocs estimate:  43
  --------------
  minimum time:     13.730 ms (0.00% GC)
  median time:      14.583 ms (0.00% GC)
  mean time:        16.322 ms (9.08% GC)
  maximum time:     30.292 ms (41.70% GC)
  --------------
  samples:          307
  evals/sample:     1

In [27]:
## Pytorch: GroupNorm

gnorm_t = torch.nn.GroupNorm(3,6)
gnorm_t1 = torch.rand(10,6,128,128)
@benchmark gnorm_t(gnorm_t1)

BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  6
  --------------
  minimum time:     2.821 ms (0.00% GC)
  median time:      4.287 ms (0.00% GC)
  mean time:        4.570 ms (0.00% GC)
  maximum time:     26.590 ms (0.00% GC)
  --------------
  samples:          1094
  evals/sample:     1

In [28]:
## Tensorflow: GroupNorm

gnorm_tf1 = tf.random.uniform((10,128,128,6))
@benchmark tf.contrib.layers.group_norm(gnorm_tf1, groups=6)

BenchmarkTools.Trial: 
  memory estimate:  1.70 KiB
  allocs estimate:  34
  --------------
  minimum time:     17.839 ms (0.00% GC)
  median time:      18.885 ms (0.00% GC)
  mean time:        19.457 ms (0.00% GC)
  maximum time:     37.315 ms (0.00% GC)
  --------------
  samples:          257
  evals/sample:     1

In [29]:
## Flux: LayerNorm

lnorm = LayerNorm(10)
lnorm_ = rand(1,10)
@benchmark lnorm(lnorm_)

BenchmarkTools.Trial: 
  memory estimate:  1.50 KiB
  allocs estimate:  5
  --------------
  minimum time:     529.468 ns (0.00% GC)
  median time:      627.105 ns (0.00% GC)
  mean time:        1.002 μs (13.28% GC)
  maximum time:     54.320 μs (97.26% GC)
  --------------
  samples:          10000
  evals/sample:     190

In [30]:
## Pytorch: LayerNorm

lnorm_t1 = torch.rand(10,10,128,128)
lnorm_t = torch.nn.LayerNorm(lnorm_t1.size()[1:end])
@benchmark lnorm_t(lnorm_t1)

BenchmarkTools.Trial: 
  memory estimate:  192 bytes
  allocs estimate:  6
  --------------
  minimum time:     5.367 ms (0.00% GC)
  median time:      5.913 ms (0.00% GC)
  mean time:        6.099 ms (0.00% GC)
  maximum time:     14.502 ms (0.00% GC)
  --------------
  samples:          819
  evals/sample:     1

In [31]:
## Tensorflow: LayerNorm

lnorm_tf1 = tf.random.uniform((10,10,128,128))
@benchmark tf.contrib.layers.layer_norm(lnorm_tf1)

BenchmarkTools.Trial: 
  memory estimate:  720 bytes
  allocs estimate:  18
  --------------
  minimum time:     12.861 ms (0.00% GC)
  median time:      13.924 ms (0.00% GC)
  mean time:        14.670 ms (0.00% GC)
  maximum time:     30.908 ms (0.00% GC)
  --------------
  samples:          341
  evals/sample:     1
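
For reference, layer normalisation subtracts the mean and divides by the standard deviation computed over the features of each sample. The sketch below is a plain-Julia version that assumes features lie along the first dimension and omits the learnable scale and shift; the function name and `eps` value are assumptions. Note also that the inputs above differ a lot in size: `rand(1,10)` has 10 elements, while the (10,10,128,128) tensors have 1,638,400.

```julia
using Statistics

# Normalise each column (one sample) to zero mean and unit variance over its features.
function layer_norm(x::AbstractMatrix; eps=1f-5)
    mu = mean(x; dims=1)
    sigma2 = var(x; dims=1, corrected=false)
    return (x .- mu) ./ sqrt.(sigma2 .+ eps)
end

x = rand(Float32, 10, 4)   # 10 features, 4 samples
y = layer_norm(x)

println(size(y))                                    # (10, 4)
println(maximum(abs.(mean(y; dims=1))))             # close to 0
println(extrema(var(y; dims=1, corrected=false)))   # both close to 1
```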

### 2.8 CrossCor

In [32]:
## Flux

cc = CrossCor((2,2), 10=>30)
cc_ = rand(128,128,10,10)
@benchmark cc(cc_)

BenchmarkTools.Trial: 
  memory estimate:  73.84 MiB
  allocs estimate:  52
  --------------
  minimum time:     257.073 ms (0.00% GC)
  median time:      275.567 ms (1.40% GC)
  mean time:        275.394 ms (2.40% GC)
  maximum time:     310.232 ms (4.88% GC)
  --------------
  samples:          19
  evals/sample:     1

In [33]:
## Pytorch: Conv2d is already cross correlation

cc_t = torch.randn(10,10,128,128)
cc_t1 = torch.randn(30,10,2,2)

@benchmark F.conv2d(cc_t, cc_t1)

BenchmarkTools.Trial: 
  memory estimate:  400 bytes
  allocs estimate:  10
  --------------
  minimum time:     16.443 ms (0.00% GC)
  median time:      18.500 ms (0.00% GC)
  mean time:        18.884 ms (0.00% GC)
  maximum time:     25.857 ms (0.00% GC)
  --------------
  samples:          265
  evals/sample:     1

In [34]:
## Tensorflow

cc_tf1 = tf.random.uniform((10, 128, 128, 10))
cc_tf2 = tf.random.uniform((2,2,10, 30))
@benchmark tf.nn.convolution(cc_tf1, cc_tf2, padding="VALID")

BenchmarkTools.Trial: 
  memory estimate:  1.52 KiB
  allocs estimate:  28
  --------------
  minimum time:     973.900 μs (0.00% GC)
  median time:      1.049 ms (0.00% GC)
  mean time:        1.125 ms (0.00% GC)
  maximum time:     3.419 ms (0.00% GC)
  --------------
  samples:          4416
  evals/sample:     1

# Conclusion

Trial 1: Higher Dimensional Data:

**Function**|**Flux runtime**|**Pytorch Runtime**|**Tensorflow Runtime**|**Flux Memory**|**Pytorch Memory**|**Tensorflow Memory**
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
Conv|294.153 ms|17.902 ms|1.057 ms|73.83 MiB|400 B|2.20 KiB
DepthwiseConv|59.748 ms|90.951 ms**|1.260 ms|73.83 MiB|688 B|2.20 KiB
ConvTranspose|338.793 ms|59.849 ms|1.068 ms|114.28 MiB|416 B|2.20 KiB
Dense|808.073 μs|527.187 μs|2.300 ms|24.25 KiB|400 B|192 B
LSTM|18.129 ms|40.340 ms|146.397 ms|1.38 MiB|1.59 KiB|192 B
RNN|4.770 ms|234.747 μs|64.971 ms|193.02 KiB|192 B|192 B
BatchNorm|16.412 ms|5.932 ms|28.043 ms|25.00 MiB|192 B|192 B
GroupNorm|16.322 ms|4.570 ms|19.457 ms|15.00 MiB|192 B|1.70 KiB
LayerNorm|1.002 μs|6.099 ms|14.670 ms|1.50 KiB|192 B|720 B
CrossCor|275.394 ms|18.884 ms|1.125 ms|73.84 MiB|400 B|1.52 KiB

Trial 0: Lower Dimensional Data:

**Function**|**Flux runtime**|**Pytorch Runtime**|**Tensorflow Runtime**|**Flux Memory**|**Pytorch Memory**|**Tensorflow Memory**
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
Conv|7.011 μs|31.213 μs|962.888 μs|3.80 KiB|400 B|2.20 KiB
DepthwiseConv|11.024 μs|120.869 μs**|899.810 μs|11.30 KiB|688 B|2.20 KiB
ConvTranspose|18.949 μs|25.888 μs|914.133 μs|19.83 KiB|416 B|2.20 KiB
Dense|432.454 ns|12.770 μs|2.172 ms|592 B|400 B|192 B
LSTM|19.842 ms|73.434 ms|217.620 ms|1.38 MiB|1.59 KiB|192 B
RNN|4.324 ms|272.183 μs|72.779 ms|193.02 KiB|192 B|192 B
BatchNorm|1.430 μs|132.111 μs|26.376 ms|2.28 KiB|192 B|192 B
GroupNorm|4.974 μs|68.747 μs|19.314 ms|2.81 KiB|192 B|1.70 KiB
LayerNorm|704.978 ns|20.479 μs|15.888 ms|880 B|192 B|720 B
CrossCor|7.688 μs|39.505 μs|1.772 ms|4.00 KiB|400 B|1.52 KiB

\*\* _This function was not implemented in Pytorch, so I used a custom implementation written by Trevor Standley [https://discuss.pytorch.org/t/using-optimised-depthwise-convolutions/11819/15]._


In these benchmarks, Flux outperforms both Pytorch and Tensorflow in runtime on the lower dimensional data for every function except RNN, in several cases by more than an order of magnitude. However, this advantage is not maintained on the higher dimensional data, which is more representative of real-world use cases: there, Flux is the slowest of the three for Conv, ConvTranspose and CrossCor, and is the fastest only for LSTM and LayerNorm. This might indicate a scaling problem for data of higher dimensions. In terms of reported memory, Pytorch shows the lowest figures for most functions; Flux and Tensorflow are comparable on most of the lower dimensional runs, but Flux allocates up to several orders of magnitude more on the higher dimensional data.
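
To put numbers on the runtime comparison, the sketch below computes speed ratios relative to Flux for three rows of the Trial 0 table. The runtimes are copied from that table and converted to seconds; the variable names and the choice of rows are assumptions made for the example. A ratio above 1 means Flux was faster, and `log10` of the ratio gives the gap in orders of magnitude.

```julia
# Mean runtimes in seconds, copied from the Trial 0 (lower dimensional) table.
trial0 = [
    "Conv"  => (flux = 7.011e-6,   pytorch = 31.213e-6,  tensorflow = 962.888e-6),
    "Dense" => (flux = 432.454e-9, pytorch = 12.770e-6,  tensorflow = 2.172e-3),
    "RNN"   => (flux = 4.324e-3,   pytorch = 272.183e-6, tensorflow = 72.779e-3),
]

# Ratio > 1 means Flux was faster than the other framework.
ratios = Dict(name => (pytorch = t.pytorch / t.flux, tensorflow = t.tensorflow / t.flux)
              for (name, t) in trial0)

for (name, _) in trial0
    r = ratios[name]
    println(name,
            ": Pytorch/Flux = ", round(r.pytorch; digits=2),
            " (10^", round(log10(r.pytorch); digits=1), ")",
            ", Tensorflow/Flux = ", round(r.tensorflow; digits=2),
            " (10^", round(log10(r.tensorflow); digits=1), ")")
end
```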