## CPU with oneDNN
- Install Spark NLP in your Databricks cluster
    - In your cluster's Libraries tab, follow these steps:
    - Install New -> PyPI -> spark-nlp==4.1.0 -> Install
    - Install New -> Maven -> Coordinates -> com.johnsnowlabs.nlp:spark-nlp_2.12:4.1.0 -> Install
- Add `TF_ENABLE_ONEDNN_OPTS=1` to `Cluster->Advanced Options->Spark->Environment variables` to enable oneDNN
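A quick way to confirm that the cluster-level environment variable actually reached the notebook is to read it back from Python. The sketch below is a minimal check under one assumption worth stating plainly: it only inspects the environment of the Python process it runs in, so it does not prove that the TensorFlow runtime inside the JVM picked the variable up. The helper name `onednn_requested` is hypothetical.

```python
import os


def onednn_requested(env=None):
    """Return True if TF_ENABLE_ONEDNN_OPTS is set to "1".

    `env` defaults to os.environ; pass a dict to check another environment.
    """
    env = os.environ if env is None else env
    return env.get("TF_ENABLE_ONEDNN_OPTS") == "1"


# Reflects only the environment of this Python process (the driver notebook).
print("oneDNN requested:", onednn_requested())
```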

Databricks:
* Runtime: `11.1 ML (includes Apache Spark 3.3.0, Scala 2.12)`
* Cluster mode: `Single Node`
* Specs: `m5n.8xlarge 128 GB Memory, 32 Cores` (32 vCPUs: 16 physical cores with 2 threads each)

In [None]:
imageNetDatasetSample = spark.read\
      .format("image")\
      .option("dropInvalid", value = True)\
      .load("dbfs:/maziyar/datasets/imagenet-mini-sample/")

In [None]:
from pyspark.ml import Pipeline
from sparknlp.annotator import *
from sparknlp.base import *

image_assembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

imageClassifier = ViTForImageClassification \
    .pretrained("image_classifier_vit_base_patch16_224") \
    .setInputCols("image_assembler") \
    .setOutputCol("class")

In [None]:
# print the first 10 classes/labels of this ViT model (it has 1000 classes in total)
print(imageClassifier.getClasses()[:10])

['turnstile', 'damselfly', 'mixing bowl', 'sea snake', 'cockroach, roach', 'buckle', 'beer glass', 'bulbul', 'lumbermill, sawmill', 'whippet']
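The cell above prints the first ten entries of the list returned by `getClasses()`, not a random draw. If a random subset of labels is what you want, a minimal sketch is to sample from that list with Python's `random` module. The helper `sample_labels` and the short stand-in list are illustrative assumptions; in the notebook you would pass `imageClassifier.getClasses()` instead.

```python
import random


def sample_labels(classes, k=10, seed=None):
    """Return k labels drawn at random (without replacement) from `classes`."""
    rng = random.Random(seed)  # seeded generator for reproducible samples
    return rng.sample(list(classes), k)


# Stand-in for imageClassifier.getClasses() so the sketch runs on its own.
labels = ["turnstile", "damselfly", "mixing bowl", "sea snake", "buckle"]
picked = sample_labels(labels, k=3, seed=42)
print(picked)
```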


In [None]:
from timeit import default_timer as timer

# Benchmark the same pipeline with increasing batch sizes
for b in [1, 2, 4, 8, 16, 32, 64, 128]:
    imageClassifier.setBatchSize(b)

    pipeline = Pipeline(stages=[
      image_assembler,
      imageClassifier,
    ])

    # Both stages are already pretrained transformers, so fit() only wraps them
    model = pipeline.fit(imageNetDatasetSample)
    # transform() is lazy; the count() action below is what triggers inference
    pipelineDF = model.transform(imageNetDatasetSample)

    start = timer()
    total_count = pipelineDF.select("class.result").count()
    elapsed = timer() - start
    print(f'took {elapsed} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 143.796645 seconds to finish computing 3544 images with batch size 1
took 130.303482 seconds to finish computing 3544 images with batch size 2
took 112.507706 seconds to finish computing 3544 images with batch size 4
took 114.606424 seconds to finish computing 3544 images with batch size 8
took 110.915875 seconds to finish computing 3544 images with batch size 16
took 117.404970 seconds to finish computing 3544 images with batch size 32
took 116.658057 seconds to finish computing 3544 images with batch size 64
took 116.817913 seconds to finish computing 3544 images with batch size 128
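Converting the logged runtimes into throughput makes the batch-size effect easier to read. The sketch below simply copies the CPU timings printed above into a dict and computes images per second for each batch size; nothing is re-measured.

```python
# CPU (oneDNN) timings copied from the output above: batch size -> seconds for 3,544 images.
cpu_timings = {
    1: 143.796645, 2: 130.303482, 4: 112.507706, 8: 114.606424,
    16: 110.915875, 32: 117.404970, 64: 116.658057, 128: 116.817913,
}
NUM_IMAGES = 3544

for batch_size, seconds in sorted(cpu_timings.items()):
    print(f"batch {batch_size:>3}: {NUM_IMAGES / seconds:6.2f} images/s")

# Batch size with the lowest runtime
best = min(cpu_timings, key=cpu_timings.get)
print(f"fastest batch size: {best} ({NUM_IMAGES / cpu_timings[best]:.2f} images/s)")
```

By this arithmetic, batch size 16 is fastest at roughly 32 images/s, batch size 1 is about 30% slower, and every batch size from 4 to 128 lands within about 6% of the best run.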


### Larger Dataset on CPU with oneDNN Enabled

In [None]:
imageNetDataset = spark.read\
      .format("image")\
      .option("dropInvalid", value = True)\
      .load("dbfs:/maziyar/datasets/imagenet-mini/")

In [None]:
from timeit import default_timer as timer

imageClassifier.setBatchSize(16)

pipeline = Pipeline(stages=[
  image_assembler,
  imageClassifier,
])

model = pipeline.fit(imageNetDataset)
pipelineDF = model.transform(imageNetDataset)

start = timer()
total_count = pipelineDF.select("class.result").count()
end = timer() - start
print(f'took {end} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 1071.917992 seconds to finish computing 34742 images with batch size 16
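One sanity check on this result is to ask whether runtime scaled linearly from the sample to the full dataset at the same batch size. The sketch below uses only the two timings logged in this notebook (batch size 16 in both runs).

```python
# Sample run vs. full run on CPU, both at batch size 16 (values from the logs above).
sample_images, sample_seconds = 3544, 110.915875
full_images, full_seconds = 34742, 1071.917992

# With constant throughput, runtime would grow linearly with the number of images.
predicted_seconds = sample_seconds * full_images / sample_images
print(f"predicted: {predicted_seconds:.1f} s, measured: {full_seconds:.1f} s")
print(f"throughput: sample {sample_images / sample_seconds:.2f} img/s, "
      f"full {full_images / full_seconds:.2f} img/s")
```

The linear estimate is about 1087 s against 1072 s measured, so throughput on the full dataset (about 32.4 images/s) is essentially the same as on the sample.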


In [None]:
%sh
lscpu
free -h
nvidia-smi

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          32
On-line CPU(s) list:             0-31
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping:                        7
CPU MHz:                         3110.696
BogoMIPS:                        4999.98
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        16 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):           
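The `lscpu` fields explain the "32 Cores" in the cluster spec: 1 socket x 16 cores per socket x 2 threads per core gives 16 physical cores and 32 logical CPUs (vCPUs). A small sketch that parses a few of the lines shown above and does that arithmetic; the parser `parse_lscpu` is a hypothetical helper that assumes the plain `Key: value` layout printed here.

```python
def parse_lscpu(text):
    """Parse `lscpu` output ("Key:   value" lines) into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


# A few lines copied from the lscpu output above.
sample = """CPU(s):                          32
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       1"""

info = parse_lscpu(sample)
physical_cores = int(info["Core(s) per socket"]) * int(info["Socket(s)"])
logical_cpus = physical_cores * int(info["Thread(s) per core"])
print(physical_cores, logical_cpus)
```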

## GPU 
- Install Spark NLP in your Databricks cluster
    - In your cluster's Libraries tab, follow these steps:
    - Install New -> PyPI -> spark-nlp==4.1.0 -> Install
    - Install New -> Maven -> Coordinates -> com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.1.0 -> Install
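Compared with the CPU setup, the PyPI package stays the same (`spark-nlp==4.1.0`); only the Maven artifact changes from `spark-nlp` to `spark-nlp-gpu`. The hypothetical helper below just reproduces the two coordinates used in this notebook, to make that naming pattern explicit; the Scala suffix has to match the runtime's Scala version (2.12 here).

```python
def spark_nlp_maven_coordinate(version="4.1.0", scala="2.12", gpu=False):
    """Build the Maven coordinate entered in the Libraries tab (hypothetical helper)."""
    artifact = "spark-nlp-gpu" if gpu else "spark-nlp"
    return f"com.johnsnowlabs.nlp:{artifact}_{scala}:{version}"


print(spark_nlp_maven_coordinate(gpu=False))
print(spark_nlp_maven_coordinate(gpu=True))
```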

Databricks:
* Runtime: `11.1 ML (includes Apache Spark 3.3.0, GPU, Scala 2.12)`
* Cluster mode: `Single Node`
* Specs: `g4dn.8xlarge 128 GB Memory, 1 GPU` (NVIDIA T4)

In [None]:
from pyspark.ml import Pipeline
from sparknlp.annotator import *
from sparknlp.base import *
from timeit import default_timer as timer

imageNetDatasetSample = spark.read\
      .format("image")\
      .option("dropInvalid", value = True)\
      .load("dbfs:/maziyar/datasets/imagenet-mini-sample/")

image_assembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

imageClassifier = ViTForImageClassification \
    .pretrained("image_classifier_vit_base_patch16_224") \
    .setInputCols("image_assembler") \
    .setOutputCol("class")

for b in [4, 8, 16, 32, 64, 128, 256, 512, 1024]:
    imageClassifier.setBatchSize(b)

    pipeline = Pipeline(stages=[
      image_assembler,
      imageClassifier,
    ])

    model = pipeline.fit(imageNetDatasetSample)
    pipelineDF = model.transform(imageNetDatasetSample)

    start = timer()
    total_count = pipelineDF.select("class.result").count()
    end = timer() - start
    print(f'took {end:2f} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 65.660726 seconds to finish computing 3544 images with batch size 4
took 47.099872 seconds to finish computing 3544 images with batch size 8
took 54.175194 seconds to finish computing 3544 images with batch size 16
took 55.870537 seconds to finish computing 3544 images with batch size 32
took 68.032822 seconds to finish computing 3544 images with batch size 64
took 53.325992 seconds to finish computing 3544 images with batch size 128
took 53.692435 seconds to finish computing 3544 images with batch size 256
took 57.822982 seconds to finish computing 3544 images with batch size 512
took 54.753315 seconds to finish computing 3544 images with batch size 1024
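On the GPU the runtime does not fall steadily with batch size. The sketch below copies the GPU timings printed above and reports the fastest and slowest settings and the ratio between them.

```python
# GPU timings copied from the output above: batch size -> seconds for 3,544 images.
gpu_timings = {
    4: 65.660726, 8: 47.099872, 16: 54.175194, 32: 55.870537, 64: 68.032822,
    128: 53.325992, 256: 53.692435, 512: 57.822982, 1024: 54.753315,
}
NUM_IMAGES = 3544

best = min(gpu_timings, key=gpu_timings.get)
worst = max(gpu_timings, key=gpu_timings.get)
spread = gpu_timings[worst] / gpu_timings[best]
print(f"fastest: batch {best} ({NUM_IMAGES / gpu_timings[best]:.1f} img/s)")
print(f"slowest: batch {worst} ({NUM_IMAGES / gpu_timings[worst]:.1f} img/s)")
print(f"slowest/fastest runtime ratio: {spread:.2f}")
```

Batch size 8 is fastest (about 75 images/s) and batch size 64 is slowest (about 52 images/s), yet 128 and above recover to around 54 s. With a single run per setting, part of this non-monotonic pattern may be run-to-run noise; repeating each batch size a few times would show how much.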


### Larger Dataset on GPU

In [None]:
imageNetDataset = spark.read\
      .format("image")\
      .option("dropInvalid", value = True)\
      .load("dbfs:/maziyar/datasets/imagenet-mini/")

In [None]:
imageClassifier.setBatchSize(8)

pipeline = Pipeline(stages=[
  image_assembler,
  imageClassifier,
])

model = pipeline.fit(imageNetDataset)
pipelineDF = model.transform(imageNetDataset)

start = timer()
total_count = pipelineDF.select("class.result").count()
end = timer() - start
print(f'took {end:2f} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 435.000940 seconds to finish computing 34742 images with batch size 8
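Putting the two full-dataset runs side by side gives the overall GPU speedup for this workload. Note that each run uses the batch size that was fastest on the sample for its hardware (16 on CPU, 8 on GPU), so this compares best-observed settings rather than identical ones. The numbers below are copied from the logs in this notebook.

```python
# Full imagenet-mini runs (34,742 images) logged in this notebook.
NUM_IMAGES = 34742
cpu_seconds = 1071.917992  # CPU with oneDNN, batch size 16
gpu_seconds = 435.000940   # GPU, batch size 8

speedup = cpu_seconds / gpu_seconds
print(f"CPU: {NUM_IMAGES / cpu_seconds:.1f} img/s, GPU: {NUM_IMAGES / gpu_seconds:.1f} img/s")
print(f"GPU speedup over CPU: {speedup:.2f}x")
```

That works out to roughly 32 vs. 80 images/s, a speedup of about 2.5x on the GPU cluster.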


In [None]:
%sh
lscpu
free -h
nvidia-smi

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          32
On-line CPU(s) list:             0-31
Thread(s) per core:              2
Core(s) per socket:              16
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
Stepping:                        7
CPU MHz:                         3100.771
BogoMIPS:                        4999.99
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       512 KiB
L1i cache:                       512 KiB
L2 cache:                        16 MiB
L3 cache:                        35.8 MiB
NUMA node0 CPU(s):           