# CPU 

- Install Spark NLP in your Databricks cluster
    - In the Libraries tab of your cluster, follow these steps:
    - Install New -> PyPI -> spark-nlp==4.1.0 -> Install
    - Install New -> Maven -> Coordinates -> com.johnsnowlabs.nlp:spark-nlp_2.12:4.1.0 -> Install
- Add `TF_ENABLE_ONEDNN_OPTS=1` to `Cluster->Advanced Options->Spark->Environment variables` to enable TensorFlow's oneDNN optimizations

Databricks:
* Runtime: `11.1 ML (includes Apache Spark 3.3.0, Scala 2.12)`
* Cluster mode: `Multi Node`
* Executors: `m5n.8xlarge 128 GB Memory, 32 Cores`

In [None]:
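from pyspark.ml import Pipeline  # explicit import instead of relying on the sparknlp wildcard imports
from sparknlp.annotator import *
from sparknlp.base import *
from timeit import default_timer as timer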

imageNetDataset = spark.read\
      .format("image")\
      .option("dropInvalid", value = True)\
      .load("dbfs:/maziyar/datasets/imagenet-mini/")

image_assembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

imageClassifier = ViTForImageClassification \
    .pretrained("image_classifier_vit_base_patch16_224") \
    .setInputCols("image_assembler") \
    .setOutputCol("class") \
    .setBatchSize(16)

pipeline = Pipeline(stages=[
  image_assembler,
  imageClassifier,
])

model = pipeline.fit(imageNetDataset)
pipelineDF = model.transform(imageNetDataset)

### 2 Workers: 256 GB Memory 64 Cores

In [None]:
start = timer()
total_count = pipelineDF.select("class.result").count()
end = timer() - start
print(f'took {end} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 549.6623425070002 seconds to finish computing 34742 images with batch size 16
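Spark builds the pipeline lazily; the timed `count()` action is what triggers reading, decoding and classifying the images, so the wall-clock time around it approximates end-to-end inference time. A minimal sketch of the same timing pattern as a reusable helper (`time_call` is a hypothetical name, not a Spark NLP API). In the notebook it would wrap `lambda: pipelineDF.select("class.result").count()`; the example uses a stand-in workload so it runs without Spark:

```python
from timeit import default_timer as timer


def time_call(fn):
    """Call fn() once and return (its result, elapsed wall-clock seconds)."""
    start = timer()
    result = fn()
    elapsed = timer() - start
    return result, elapsed


# Stand-in workload; in the notebook this would be the Spark count() action
total_count, elapsed = time_call(lambda: sum(range(1_000_000)))
print(f'took {elapsed} seconds to finish computing {total_count}')
```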


### 4 Workers: 512 GB Memory 128 Cores

In [None]:
start = timer()
total_count = pipelineDF.select("class.result").count()
end = timer() - start
print(f'took {end} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 288.5531326099999 seconds to finish computing 34744 images with batch size 16


### 8 Workers: 1024 GB Memory 256 Cores

In [None]:
start = timer()
total_count = pipelineDF.select("class.result").count()
end = timer() - start
print(f'took {end} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 161.2116333849999 seconds to finish computing 34742 images with batch size 16


### 10 Workers: 1280 GB Memory 320 Cores

In [None]:
start = timer()
total_count = pipelineDF.select("class.result").count()
end = timer() - start
print(f'took {end} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 111.69446445700032 seconds to finish computing 34742 images with batch size 16
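Going from 2 to 10 workers multiplies the cores by 5. A sketch that turns the recorded CPU timings into throughput, speedup relative to the 2-worker run, and scaling efficiency (speedup divided by the ideal linear speedup). The numbers are copied from the outputs above; `scaling` is an illustrative helper, not part of any library:

```python
def scaling(runs, baseline_workers):
    """runs maps worker count -> (seconds, images); returns per-run metrics vs. the baseline."""
    base_seconds, _ = runs[baseline_workers]
    table = {}
    for workers, (seconds, images) in sorted(runs.items()):
        speedup = base_seconds / seconds
        ideal = workers / baseline_workers
        table[workers] = {
            "throughput": images / seconds,  # images per second
            "speedup": speedup,
            "efficiency": speedup / ideal,  # 1.0 means perfectly linear scaling
        }
    return table


# CPU results recorded above (batch size 16): workers -> (seconds, images counted)
cpu_runs = {
    2: (549.6623425070002, 34742),
    4: (288.5531326099999, 34744),
    8: (161.2116333849999, 34742),
    10: (111.69446445700032, 34742),
}

for workers, m in scaling(cpu_runs, baseline_workers=2).items():
    print(f'{workers:>2} workers: {m["throughput"]:6.1f} images/s, '
          f'speedup {m["speedup"]:.2f}x, efficiency {m["efficiency"]:.0%}')
```

By this measure the 10-worker cluster is about 4.9x faster than the 2-worker one, close to the ideal 5x.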


# GPU
- Install Spark NLP in your Databricks cluster
    - In the Libraries tab of your cluster, follow these steps:
    - Install New -> PyPI -> spark-nlp==4.1.0 -> Install
    - Install New -> Maven -> Coordinates -> com.johnsnowlabs.nlp:spark-nlp-gpu_2.12:4.1.0 -> Install
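The CPU and GPU installs use the same PyPI package and differ only in the Maven artifact: `spark-nlp_2.12` versus `spark-nlp-gpu_2.12`, both at version 4.1.0. A tiny sketch that builds either coordinate, which can help when scripting cluster library installs (`spark_nlp_coordinate` is a hypothetical helper name):

```python
def spark_nlp_coordinate(version, scala="2.12", gpu=False):
    """Build the Maven coordinate for the Spark NLP jar entered in the Libraries tab."""
    artifact = "spark-nlp-gpu" if gpu else "spark-nlp"
    return f"com.johnsnowlabs.nlp:{artifact}_{scala}:{version}"


print(spark_nlp_coordinate("4.1.0"))            # CPU cluster
print(spark_nlp_coordinate("4.1.0", gpu=True))  # GPU cluster
```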

Databricks:
* Runtime: `11.1 ML (includes Apache Spark 3.3.0, GPU, Scala 2.12)`
* Cluster mode: `Multi Node`
* Executors: `g4dn.8xlarge 128 GB Memory, 1 GPU`

The only Spark config I set is `spark.task.resource.gpu.amount`, which controls how many tasks run concurrently on each machine (each node has 1 GPU and 32 cores). For details, see:
https://docs.databricks.com/clusters/gpu.html
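With `spark.task.resource.gpu.amount` set to 0.04 (the value printed in the cells below), each task requests 1/25 of a GPU, so up to 25 tasks share one GPU, while 32 cores would allow 32 single-core tasks; the GPU setting is therefore the binding limit. A sketch of that arithmetic, assuming `spark.task.cpus` is left at its default of 1 and that fractional GPU amounts are floored into whole task slots, as the Spark configuration docs describe (`task_slots_per_node` is an illustrative name):

```python
import math


def task_slots_per_node(gpu_amount_per_task, gpus_per_node, cores_per_node, cpus_per_task=1):
    """Concurrent tasks one node can run: the most restrictive of the GPU and CPU limits."""
    # Assumption: fractional amounts are floored, e.g. 0.04 -> 25 tasks per GPU
    tasks_per_gpu = math.floor(1.0 / gpu_amount_per_task)
    gpu_slots = tasks_per_gpu * gpus_per_node
    cpu_slots = cores_per_node // cpus_per_task
    return min(gpu_slots, cpu_slots)


# g4dn.8xlarge as configured here: 1 GPU, 32 cores, gpu.amount = 0.04
per_node = task_slots_per_node(0.04, gpus_per_node=1, cores_per_node=32)
print(per_node)  # 25
for workers in (2, 4, 8, 10):
    print(workers, "workers ->", per_node * workers, "concurrent tasks")
```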

In [None]:
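from pyspark.ml import Pipeline  # explicit import instead of relying on the sparknlp wildcard imports
from sparknlp.annotator import *
from sparknlp.base import *
from timeit import default_timer as timer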

imageNetDataset = spark.read\
      .format("image")\
      .option("dropInvalid", value = True)\
      .load("dbfs:/maziyar/datasets/imagenet-mini/")

image_assembler = ImageAssembler() \
    .setInputCol("image") \
    .setOutputCol("image_assembler")

imageClassifier = ViTForImageClassification \
    .pretrained("image_classifier_vit_base_patch16_224") \
    .setInputCols("image_assembler") \
    .setOutputCol("class")

In [None]:
imageNetDataset.printSchema()

root
 |-- image: struct (nullable = true)
 |    |-- origin: string (nullable = true)
 |    |-- height: integer (nullable = true)
 |    |-- width: integer (nullable = true)
 |    |-- nChannels: integer (nullable = true)
 |    |-- mode: integer (nullable = true)
 |    |-- data: binary (nullable = true)
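Spark's image data source stores each decoded image as raw bytes in `data`, in OpenCV-compatible row-major order (typically BGR), with `mode` holding the OpenCV type code. For 8-bit images the byte count should therefore equal `height * width * nChannels`. A plain-Python sketch of that check, using a hypothetical `ImageRow` tuple that mirrors the struct above rather than a real Spark row:

```python
from typing import NamedTuple


class ImageRow(NamedTuple):
    """Mirrors the fields of the `image` struct printed above (illustrative only)."""
    origin: str
    height: int
    width: int
    nChannels: int
    mode: int
    data: bytes


def has_consistent_size(img: ImageRow) -> bool:
    # Assumes 8-bit channels, i.e. one byte per channel per pixel
    return len(img.data) == img.height * img.width * img.nChannels


# A 2x3 image with 3 channels; 16 is OpenCV's CV_8UC3 type code, origin is a hypothetical path
sample = ImageRow("dbfs:/example.jpg", 2, 3, 3, 16, bytes(2 * 3 * 3))
print(has_consistent_size(sample))  # True
```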



### 2 Workers: 2x NVIDIA T4 GPU 16GB

In [None]:
task_per_gpu = float(spark.conf.get("spark.task.resource.gpu.amount"))  # fraction of a GPU each task requests
executors = int(spark.conf.get("spark.databricks.clusterUsageTags.clusterTargetWorkers"))  # worker nodes (1 executor each)
gpu_instance = spark.conf.get("spark.databricks.clusterUsageTags.clusterNodeType")

partitions = imageNetDataset.rdd.getNumPartitions()
# Each worker has a single GPU, so it can run 1 / task_per_gpu tasks at the same time
total_tasks = (1/task_per_gpu) * executors

print(f'GPU instance type of {gpu_instance}')
print(f'{task_per_gpu} task gpu amount')
print(f'{executors} executors')
print(f'total partitions {partitions}')
print(f'total running tasks {total_tasks} - tasks per machine {total_tasks/executors}')
print(f'Partitions per running task (rounded): {partitions/total_tasks:0.0f}')

GPU instance type of g4dn.8xlarge
0.04 task gpu amount
2 executors
total partitions 1099
total running tasks 50.0 - tasks per machine 25.0
Partitions per running task (rounded): 22


In [None]:
from timeit import default_timer as timer

for b in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    imageClassifier.setBatchSize(b)

    pipeline = Pipeline(stages=[
      image_assembler,
      imageClassifier,
    ])

    model = pipeline.fit(imageNetDataset)
    pipelineDF = model.transform(imageNetDataset)

    start = timer()
    total_count = pipelineDF.select("class.result").count()
    end = timer() - start
    print(f'took {end} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 319.87963630299964 seconds to finish computing 34742 images with batch size 1
took 259.5521752349996 seconds to finish computing 34742 images with batch size 2
took 235.64105476499935 seconds to finish computing 34742 images with batch size 4
took 231.89560089599945 seconds to finish computing 34742 images with batch size 8
took 231.2127806669996 seconds to finish computing 34742 images with batch size 16
took 237.9331510810007 seconds to finish computing 34742 images with batch size 32
took 238.78765448600097 seconds to finish computing 34742 images with batch size 64
took 234.17468454200025 seconds to finish computing 34742 images with batch size 128
took 232.502572753001 seconds to finish computing 34742 images with batch size 256
took 240.10180371500064 seconds to finish computing 34742 images with batch size 512
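From batch size 4 upward, the 2-worker timings stay within about 4% of each other. One factor that caps the benefit of large batches, assuming the annotator only groups rows that live in the same partition: 34742 images over 1099 partitions is about 32 images per partition on average, so batch sizes of 32 and above all collapse to roughly one batch per partition. A sketch of that arithmetic (it assumes evenly sized partitions, which Spark does not guarantee; the helper name is illustrative):

```python
import math


def batches_per_partition(images, partitions, batch_size):
    """Batches per partition, assuming evenly sized partitions and batching within a partition."""
    avg_rows = images / partitions  # 34742 / 1099 is about 31.6 images per partition
    return math.ceil(avg_rows / batch_size)


for b in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    print(f'batch size {b}: ~{batches_per_partition(34742, 1099, b)} batch(es) per partition')
```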


### 4 Workers: 4x NVIDIA T4 GPU 16GB

In [None]:
task_per_gpu = float(spark.conf.get("spark.task.resource.gpu.amount"))  # fraction of a GPU each task requests
executors = int(spark.conf.get("spark.databricks.clusterUsageTags.clusterTargetWorkers"))  # worker nodes (1 executor each)
gpu_instance = spark.conf.get("spark.databricks.clusterUsageTags.clusterNodeType")

partitions = imageNetDataset.rdd.getNumPartitions()
# Each worker has a single GPU, so it can run 1 / task_per_gpu tasks at the same time
total_tasks = (1/task_per_gpu) * executors

print(f'GPU instance type of {gpu_instance}')
print(f'{task_per_gpu} task gpu amount')
print(f'{executors} executors')
print(f'total partitions {partitions}')
print(f'total tasks {total_tasks} - tasks per machine {total_tasks/executors}')
print(f'Partitions per running task (rounded): {partitions/total_tasks:0.0f}')

GPU instance type of g4dn.8xlarge
0.04 task gpu amount
4 executors
total partitions 1099
total tasks 100.0 - tasks per machine 25.0
Partitions per running task (rounded): 11


In [None]:
from timeit import default_timer as timer

for b in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    imageClassifier.setBatchSize(b)

    pipeline = Pipeline(stages=[
      image_assembler,
      imageClassifier,
    ])

    model = pipeline.fit(imageNetDataset)
    pipelineDF = model.transform(imageNetDataset)

    start = timer()
    total_count = pipelineDF.select("class.result").count()
    end = timer() - start
    print(f'took {end} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 173.41131163000045 seconds to finish computing 34742 images with batch size 1
took 129.19329973999993 seconds to finish computing 34742 images with batch size 2
took 117.66454362900004 seconds to finish computing 34742 images with batch size 4
took 119.80921282099916 seconds to finish computing 34742 images with batch size 8
took 122.0702281770009 seconds to finish computing 34742 images with batch size 16
took 126.85776031200112 seconds to finish computing 34742 images with batch size 32
took 125.29146863700043 seconds to finish computing 34742 images with batch size 64
took 125.21238900500066 seconds to finish computing 34742 images with batch size 128
took 123.46789567799897 seconds to finish computing 34742 images with batch size 256
took 127.55372481199993 seconds to finish computing 34742 images with batch size 512


### 8 Workers: 8x NVIDIA T4 GPU 16GB

In [None]:
task_per_gpu = float(spark.conf.get("spark.task.resource.gpu.amount"))  # fraction of a GPU each task requests
executors = int(spark.conf.get("spark.databricks.clusterUsageTags.clusterTargetWorkers"))  # worker nodes (1 executor each)
gpu_instance = spark.conf.get("spark.databricks.clusterUsageTags.clusterNodeType")

partitions = imageNetDataset.rdd.getNumPartitions()
# Each worker has a single GPU, so it can run 1 / task_per_gpu tasks at the same time
total_tasks = (1/task_per_gpu) * executors

print(f'GPU instance type of {gpu_instance}')
print(f'{task_per_gpu} task gpu amount')
print(f'{executors} executors')
print(f'total partitions {partitions}')
print(f'total tasks {total_tasks} - tasks per machine {total_tasks/executors}')
print(f'Partitions per running task (rounded): {partitions/total_tasks:0.0f}')

GPU instance type of g4dn.8xlarge
0.04 task gpu amount
8 executors
total partitions 1099
total tasks 200.0 - tasks per machine 25.0
Partitions per running task (rounded): 5
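The final value printed above is the partition count divided by the concurrent task slots, rounded to the nearest integer by `:0.0f`. If each partition is treated as one task of similar length (a simplification, since Spark starts a new task as soon as any slot frees up), the number of rounds needed is the ceiling of that ratio instead: 1099 / 200 = 5.495 requires 6 rounds on 8 workers, and 1099 / 250 = 4.396 requires 5 on 10 workers. A sketch comparing both, using the partition and slot counts printed in this notebook (`waves` is an illustrative name):

```python
import math


def waves(partitions, task_slots):
    """Rounds of tasks needed when at most task_slots tasks run at once."""
    return math.ceil(partitions / task_slots)


partitions = 1099
# Concurrent task slots printed in this notebook: 25 per worker
for workers, slots in {2: 50, 4: 100, 8: 200, 10: 250}.items():
    ratio = partitions / slots
    print(f'{workers} workers: ratio {ratio:.3f}, rounded {ratio:0.0f}, waves {waves(partitions, slots)}')
```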


In [None]:
from timeit import default_timer as timer

for b in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    imageClassifier.setBatchSize(b)

    pipeline = Pipeline(stages=[
      image_assembler,
      imageClassifier,
    ])

    model = pipeline.fit(imageNetDataset)
    pipelineDF = model.transform(imageNetDataset)

    start = timer()
    total_count = pipelineDF.select("class.result").count()
    end = timer() - start
    print(f'took {end} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 101.19505050400039 seconds to finish computing 34742 images with batch size 1
took 66.77397438299977 seconds to finish computing 34742 images with batch size 2
took 61.54780982599914 seconds to finish computing 34742 images with batch size 4
took 61.45625800400012 seconds to finish computing 34742 images with batch size 8
took 62.51468649199978 seconds to finish computing 34742 images with batch size 16
took 65.90733181500036 seconds to finish computing 34742 images with batch size 32
took 72.0166545129996 seconds to finish computing 34742 images with batch size 64
took 64.86596081300013 seconds to finish computing 34742 images with batch size 128
took 66.45860373800042 seconds to finish computing 34742 images with batch size 256
took 65.32813091099979 seconds to finish computing 34742 images with batch size 512


### 10 Workers: 10x NVIDIA T4 GPU 16GB

In [None]:
task_per_gpu = float(spark.conf.get("spark.task.resource.gpu.amount"))  # fraction of a GPU each task requests
executors = int(spark.conf.get("spark.databricks.clusterUsageTags.clusterTargetWorkers"))  # worker nodes (1 executor each)
gpu_instance = spark.conf.get("spark.databricks.clusterUsageTags.clusterNodeType")

partitions = imageNetDataset.rdd.getNumPartitions()
# Each worker has a single GPU, so it can run 1 / task_per_gpu tasks at the same time
total_tasks = (1/task_per_gpu) * executors

print(f'GPU instance type of {gpu_instance}')
print(f'{task_per_gpu} task gpu amount')
print(f'{executors} executors')
print(f'total partitions {partitions}')
print(f'total tasks {total_tasks} - tasks per machine {total_tasks/executors}')
print(f'Partitions per running task (rounded): {partitions/total_tasks:0.0f}')

GPU instance type of g4dn.8xlarge
0.04 task gpu amount
10 executors
total partitions 1099
total tasks 250.0 - tasks per machine 25.0
Partitions per running task (rounded): 4


In [None]:
from timeit import default_timer as timer

for b in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]:
    imageClassifier.setBatchSize(b)

    pipeline = Pipeline(stages=[
      image_assembler,
      imageClassifier,
    ])

    model = pipeline.fit(imageNetDataset)
    pipelineDF = model.transform(imageNetDataset)

    start = timer()
    total_count = pipelineDF.select("class.result").count()
    end = timer() - start
    print(f'took {end} seconds to finish computing {total_count} images with batch size {imageClassifier.getBatchSize()}')

took 88.49770859 seconds to finish computing 34743 images with batch size 1
took 55.147554744000445 seconds to finish computing 34742 images with batch size 2
took 50.35473146100048 seconds to finish computing 34742 images with batch size 4
took 50.822814176000065 seconds to finish computing 34744 images with batch size 8
took 53.320964721000564 seconds to finish computing 34742 images with batch size 16
took 55.740571029999955 seconds to finish computing 34742 images with batch size 32
took 53.99739717600005 seconds to finish computing 34743 images with batch size 64
took 54.02455361400007 seconds to finish computing 34742 images with batch size 128
took 53.79816353100068 seconds to finish computing 34742 images with batch size 256
took 53.49764558999959 seconds to finish computing 34743 images with batch size 512
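To compare the GPU runs across cluster sizes, one option is to take the fastest batch size for each worker count and compute its speedup against the 2-worker cluster. A sketch using only the timings recorded above, rounded to two decimals (`best_run` is an illustrative helper):

```python
# GPU timings recorded above in seconds, rounded to two decimals
batch_sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
gpu_times = {
    2: [319.88, 259.55, 235.64, 231.90, 231.21, 237.93, 238.79, 234.17, 232.50, 240.10],
    4: [173.41, 129.19, 117.66, 119.81, 122.07, 126.86, 125.29, 125.21, 123.47, 127.55],
    8: [101.20, 66.77, 61.55, 61.46, 62.51, 65.91, 72.02, 64.87, 66.46, 65.33],
    10: [88.50, 55.15, 50.35, 50.82, 53.32, 55.74, 54.00, 54.02, 53.80, 53.50],
}


def best_run(times):
    """Return (batch_size, seconds) for the fastest run in one row of timings."""
    seconds, batch = min(zip(times, batch_sizes))
    return batch, seconds


_, baseline_seconds = best_run(gpu_times[2])
for workers, times in gpu_times.items():
    batch, seconds = best_run(times)
    print(f'{workers:>2} GPUs: fastest {seconds:.2f}s at batch size {batch}, '
          f'speedup vs 2 GPUs {baseline_seconds / seconds:.2f}x')
```

With five times as many GPUs, the best-case time falls from about 231 s to about 50 s, roughly a 4.6x speedup. The fastest batch size varies between 4 and 16, while batch size 1 is the slowest at every cluster size.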
