<h1>Distributed vs Non-Distributed Benchmark</h1>

This notebook benchmarks the processing time of the same task on a single-user instance and on a Dask worker cluster, so that the two can be compared directly.
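
One way to keep the two measurements comparable is to wrap each workload in the same timing helper, so that every timer starts immediately before its own workload. The sketch below is a minimal illustration that uses only the standard library; `time_call` and `doubling_workload` are hypothetical names introduced here and are not part of Dask, Dask Gateway, or PyTorch.

```python
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Run fn(*args, **kwargs) `repeats` times.

    Returns (last result, best wall-clock time in seconds).
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # timer starts right before the workload
        result = fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


def doubling_workload(n_iters):
    """Hypothetical stand-in workload: double a number n_iters times."""
    x = 1
    for _ in range(n_iters):
        x += x
    return x


result, seconds = time_call(doubling_workload, 10)
print(f"result={result}, best time in seconds={seconds:.6f}")
```

For a Dask workload, the function passed to such a helper would need to include the call that triggers execution (for example `.compute()`), because building a lazy task graph alone does no numerical work.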

<h2>Dask Gateway</h2>
Dask Gateway provides a secure, multi-tenant server for managing Dask clusters. It allows users to launch and use Dask clusters in a shared, centrally managed cluster environment, without requiring users to have direct access to the underlying cluster backend (e.g. Kubernetes, Hadoop/YARN, or HPC job queues).

Dask Gateway is one of many options for deploying Dask clusters; see "Deploying Dask" in the Dask documentation for an overview of additional options.

<h3>Highlights</h3>

* Centrally Managed: Administrators do the heavy lifting of configuring the Gateway, users simply connect to the Gateway to get a new cluster. Eases deployment, and allows enforcing consistent configuration across all users.

* Secure by Default: Cluster communication is automatically encrypted with TLS. All operations are authenticated with a configurable protocol, allowing you to use what makes sense for your organization.

* Flexible: The gateway is designed to support multiple backends, and runs equally well in the cloud as on-premise. Natively supports Kubernetes, Hadoop/YARN, and HPC Job Queueing systems.

* Robust to Failure: The gateway can be restarted or experience failover without losing existing clusters. Allows for seamless upgrades and restarts without disrupting users.

<h3>Architecture Overview</h3>
Dask Gateway is divided into three separate components:

* Multiple active Dask Clusters (potentially more than one per user)

* A Proxy for proxying both the connection between the user’s client and their respective scheduler, and the Dask Web UI for each cluster

* A central Gateway that manages authentication and cluster startup/shutdown



In [None]:
# import the Dask Gateway client, time, and torch
from dask_gateway import Gateway
import time
import torch
# connect to the gateway, authenticating through JupyterHub
gateway = Gateway("http://10.107.108.18", auth="jupyterhub")
options = gateway.cluster_options()

In [None]:
# create new cluster
cluster = gateway.new_cluster(options)

In [None]:
# let the cluster scale adaptively between 2 and 10 workers
cluster.adapt(minimum=2, maximum=10)
# showcase the gateway cluster information
cluster

In [None]:
import dask.array as da

In [None]:
start_time = time.time()

In [None]:
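# connect a client so that compute() runs on the gateway cluster
# rather than on the local scheduler
client = cluster.get_client()

# cluster array: Dask is lazy, so the loop only builds a task graph;
# compute() triggers the actual work and must sit inside the timed region
start_time = time.time()
arr = da.random.random((1000, 1000), chunks=(1000, 1000))
for _ in range(10000):
    arr += arr
arr.compute()
elapsed_time = time.time() - start_time
print('Cluster time in seconds = ', elapsed_time)

# CPU array: reset the timer so the cluster time is not counted again
start_time = time.time()
cpu_a = torch.ones(1000, 1000)
for _ in range(10000):
    cpu_a += cpu_a
elapsed_time = time.time() - start_time
print('CPU time in seconds = ', elapsed_time)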

In [None]:
# make sure to shut down the cluster
cluster.shutdown()
print('Cluster is shut down')