# dask-cloudprovider 

The [dask-cloudprovider](https://cloudprovider.dask.org/en/latest/index.html) package can be used to launch Dask clusters on a variety of cloud providers, including [Azure](https://cloudprovider.dask.org/en/latest/azure.html).

First, install the required packages:

In [None]:
!pip install --upgrade "distributed dask-cloudprovider[azure]"

## Required setup

A few Azure resources must exist before the cluster can be created. The [setup script](setup.sh) creates them:

- an Azure resource group named "dask-cloudprovider"
- an Azure virtual network named "dask-vnet"
- an Azure network security group named "dask-nsg"
- an Azure network security group rule named "daskRule" to allow traffic to 8786-8787 from the Internet

There is [an open issue](https://github.com/dask/dask-cloudprovider/issues/190) to abstract this setup away in `dask-cloudprovider`. Until then, write the setup script to disk and run it:

In [None]:
%%writefile setup.sh
# Uncomment and set your Azure subscription ID; $ID is used by `az group create` below
#export ID=<your-subscription-id>
export RG="dask-cloudprovider"
export LOC="eastus"

az group create --location $LOC --name $RG --subscription $ID
az network vnet create -g $RG -n "dask-vnet" --subnet-name "default"
az network nsg create -g $RG -n "dask-nsg"
az network nsg rule create -g $RG --nsg-name "dask-nsg" -n "daskRule" --priority 500 --source-address-prefixes Internet --destination-port-ranges 8786 8787 --destination-address-prefixes "*" --access Allow --protocol Tcp --description "allow Internet to 8786-8787 for Dask"

In [None]:
!bash ./setup.sh
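If you would rather drive the setup from Python, the same four `az` invocations can be built as argument lists and run with `subprocess`. This is a minimal sketch that mirrors `setup.sh`; the helper names `build_setup_commands` and `run_setup` are hypothetical, and it assumes the `az` CLI is installed and logged in. Unlike the shell script, it refuses to run when no subscription ID is given.

```python
import subprocess


def build_setup_commands(subscription_id, resource_group="dask-cloudprovider", location="eastus"):
    """Return the az CLI calls from setup.sh as argument lists (nothing is executed)."""
    if not subscription_id:
        # setup.sh would pass an empty --subscription value in this case
        raise ValueError("subscription_id must be set (the ID variable in setup.sh)")
    return [
        ["az", "group", "create", "--location", location,
         "--name", resource_group, "--subscription", subscription_id],
        ["az", "network", "vnet", "create", "-g", resource_group,
         "-n", "dask-vnet", "--subnet-name", "default"],
        ["az", "network", "nsg", "create", "-g", resource_group, "-n", "dask-nsg"],
        ["az", "network", "nsg", "rule", "create", "-g", resource_group,
         "--nsg-name", "dask-nsg", "-n", "daskRule", "--priority", "500",
         "--source-address-prefixes", "Internet",
         "--destination-port-ranges", "8786", "8787",
         "--destination-address-prefixes", "*",
         "--access", "Allow", "--protocol", "Tcp",
         "--description", "allow Internet to 8786-8787 for Dask"],
    ]


def run_setup(subscription_id, **kwargs):
    """Run each command in order, stopping at the first failure."""
    for cmd in build_setup_commands(subscription_id, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling `run_setup("<your-subscription-id>")` would create the same resources as the script. Building the commands separately from running them makes it easy to print and review them first.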

## Create the Dask cluster

Adjust the cell below to match what you used when setting up your Azure resources.

In [None]:
location = "eastus"
resource_group = "dask-cloudprovider"
vnet = "dask-vnet"
security_group = "dask-nsg"

vm_size = "Standard_DS5_v2"

from distributed import Client
from dask_cloudprovider.azure import AzureVMCluster

cluster = AzureVMCluster(
    location=location,
    resource_group=resource_group,
    vnet=vnet,
    security_group=security_group,
    vm_size=vm_size,
)
c = Client(cluster)
c

## Scaling

There are some known issues with the `AzureVMCluster`:

- [#187](https://github.com/dask/dask-cloudprovider/issues/187): VM creation is serial, resulting in slow (and potentially costly) scaling
- an entire VM is dedicated to the scheduler instance, which needs to be optimized

You can either manually scale the cluster, or enable auto-scaling. TODO: add details.

In [None]:
%%time
cluster.scale(4)
c.wait_for_workers(4)
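For the auto-scaling option, Dask clusters generally expose an `adapt(minimum=..., maximum=...)` method that lets the scheduler add and remove workers based on load. The sketch below wraps that call; `enable_autoscaling` is a hypothetical helper name and the default bounds are placeholders, not recommendations. Because VM creation is serial (see #187 above), scale-up under adaptive mode is likely to be slow, so a modest `maximum` seems sensible.

```python
def enable_autoscaling(cluster, minimum=1, maximum=4):
    """Switch `cluster` from a fixed size to adaptive scaling.

    Assumes `cluster` provides the standard Dask `adapt(minimum=..., maximum=...)`
    method; the bounds are illustrative placeholders.
    """
    if minimum < 0 or maximum < minimum:
        raise ValueError("need 0 <= minimum <= maximum")
    return cluster.adapt(minimum=minimum, maximum=maximum)
```

With the cluster from the cells above, this would be called as `enable_autoscaling(cluster, minimum=1, maximum=4)`. A later call to `cluster.scale(n)` returns the cluster to a fixed size.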

## Close the cluster

When you're done, close the cluster to clean up all VMs and related resources.

In [None]:
cluster.close()
c.close()
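Since forgotten VMs keep accruing cost, it can help to make the cleanup automatic even when a cell raises. One way to do that is a small context manager, sketched here under the assumption that both the cluster and the client expose `close()`, as they do above; `managed_cluster` is a hypothetical helper, not part of `dask-cloudprovider`.

```python
from contextlib import contextmanager


@contextmanager
def managed_cluster(cluster_factory, client_factory):
    """Create a cluster and client, and always close both on exit."""
    cluster = cluster_factory()
    client = None
    try:
        client = client_factory(cluster)
        yield cluster, client
    finally:
        # Disconnect the client first, then tear down the cluster's VMs.
        if client is not None:
            client.close()
        cluster.close()


# Usage (not executed here, since it would create Azure VMs):
# with managed_cluster(lambda: AzureVMCluster(...), Client) as (cluster, c):
#     cluster.scale(4)
#     c.wait_for_workers(4)
```

The factories are passed as callables so that a failure while connecting the client still closes the cluster that was already created.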

## Use GPUs for PyData and ML

The [RAPIDS](https://github.com/rapidsai) ecosystem mirrors PyData APIs from pandas, NumPy, scikit-learn, etc., and uses Dask to accelerate them across multiple GPU nodes.

In [None]:
location = "eastus"
resource_group = "dask-cloudprovider"
vnet = "dask-vnet"
security_group = "dask-nsg"

vm_size = "Standard_NC12s_v3"
docker_image = "rapidsai/rapidsai:cuda11.0-runtime-ubuntu18.04-py3.8"
worker_class = "dask_cuda.CUDAWorker"

from distributed import Client
from dask_cloudprovider.azure import AzureVMCluster

cluster = AzureVMCluster(
    location=location,
    resource_group=resource_group,
    vnet=vnet,
    security_group=security_group,
    vm_size=vm_size,
    docker_image=docker_image,
    worker_class=worker_class,
)
c = Client(cluster)
c

In [None]:
%%time
cluster.scale(2)
c.wait_for_workers(2)
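Once the workers have joined, it may be useful to see what the scheduler actually registered. A minimal sketch, assuming `c` is the `distributed.Client` from above and that `Client.scheduler_info()` returns a dictionary with a `"workers"` entry keyed by worker address; `worker_summary` is a hypothetical helper name.

```python
def worker_summary(client):
    """Return sorted (address, nthreads) pairs for workers known to the scheduler."""
    workers = client.scheduler_info()["workers"]
    return sorted((addr, info.get("nthreads")) for addr, info in workers.items())
```

Calling `worker_summary(c)` lists one entry per registered worker process, which can differ from the number of VMs requested with `cluster.scale`.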

In [None]:
cluster.close()
c.close()

## (Optional) Delete resource group

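To remove everything created by the setup script, delete the resource group; this also deletes the virtual network and network security group it contains. Uncomment the command below to run it.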

In [None]:
#!az group delete -n "dask-cloudprovider" -y --no-wait