<img src="images/dask_horizontal.svg" width=250 />

# Flexible Cloud Computing with `client.run()`

Dask gives you a lot of flexibility. 

After pointing Dask to your remote cluster, any Dask code will automatically run on that cluster.

But you're not restricted to running *only* Dask code on your cluster. **You can also run custom Python code on your cluster.**

Let's take a look at the flexibility you can achieve with `client.run()`, which runs a function on every worker in the cluster, outside the Dask task scheduling system.

## Launch Cloud Computing Resources

In [1]:
import coiled

In [2]:
cluster = coiled.Cluster(
    name="client-run",
    n_workers=5,
    package_sync=True,
)

## Connect Dask to Cluster

In [4]:
from distributed import Client
client = Client(cluster)

## 1. Do some Dask things

In [5]:
import dask.dataframe as dd

In [8]:
ddf = dd.read_parquet("s3://coiled-datasets/github-archive/github-archive-2015.parq/")
ddf.head()

| | user | repo | created_at | message | author |
|---|---|---|---|---|---|
| 0 | soumith | soumith/fbcunn | 2015-01-01T01:00:00Z | back to old structure, except lua files moved out | Soumith Chintala |
| 1 | soumith | soumith/fbcunn | 2015-01-01T01:00:00Z | ... | Soumith Chintala |
| 2 | soumith | soumith/fbcunn | 2015-01-01T01:00:00Z | ... | Soumith Chintala |
| 3 | soumith | soumith/fbcunn | 2015-01-01T01:00:00Z | ... | Soumith Chintala |
| 4 | radix | radix/effect | 2015-01-01T01:00:00Z | put the auto-generated API docs in the reposit... | Christopher Armstrong |


In [7]:
ddf.groupby('user').count().head()

| user | repo | created_at | message | author |
|---|---|---|---|---|
| 1995parham | 873 | 873 | 873 | 873 |
| 247321453 | 78 | 78 | 78 | 78 |
| 3DJakob | 36 | 36 | 36 | 36 |
| 3ft9 | 13 | 13 | 13 | 13 |
| 501st-alpha1 | 451 | 451 | 451 | 451 |


## 2. Do some generic Python things

In [None]:
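def create_txt_file(content):
    # use a context manager so the file is flushed and closed on the worker
    with open('myfile.txt', 'w') as file:
        file.write(content)
    # return a picklable value; an open file handle cannot be sent back to the client
    return 'myfile.txt'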

In [None]:
client.run(create_txt_file, "Add some content to our file.")

In [None]:
def read_file(filename):
    # close the file as soon as its contents have been read
    with open(filename, "r") as file:
        return file.read()

In [None]:
client.run(read_file, "myfile.txt")
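Each worker has its own filesystem, so the file above is written once per worker, and `client.run()` hands back a dict keyed by worker address. The following is a minimal sketch of making that visible; the helper names `write_and_describe` and `summarize` are hypothetical, and writing into the system temp directory is an assumption made for illustration.

```python
import os
import socket
import tempfile


def write_and_describe(content, filename="myfile.txt"):
    """Write `content` to a file in the temp directory and report where it landed."""
    path = os.path.join(tempfile.gettempdir(), filename)
    with open(path, "w") as f:
        f.write(content)
    # Return only plain, picklable values so the result can travel back to the client.
    return {
        "host": socket.gethostname(),
        "path": path,
        "bytes": os.path.getsize(path),
    }


def summarize(results):
    """Format the {worker_address: value} dict returned by client.run()."""
    return [
        f"{addr}: {results[addr]['path']} ({results[addr]['bytes']} bytes)"
        for addr in sorted(results)
    ]


# Assuming `client` is the connected Client from above:
# results = client.run(write_and_describe, "Add some content to our file.")
# print("\n".join(summarize(results)))
```

Returning a small dict rather than the file object keeps the result serializable and makes it easy to see which machine each file lives on.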

## 3. Do some system-level things

In [None]:
import os

In [None]:
# returns a dict mapping each worker's address to that worker's process ID
client.run(os.getpid)

In [None]:
# `workers` takes a list of worker addresses to restrict where the function runs (left empty here)
client.run(os.getpid, workers=[])
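To fill in the `workers` list, you need the addresses the scheduler knows about. Here is a minimal sketch, assuming `client` is the connected `Client` from above and using `client.scheduler_info()["workers"]`, which maps worker addresses to metadata. The helper name `run_on_first_workers` is hypothetical.

```python
import os


def run_on_first_workers(client, func, n=2):
    """Run `func` on the first `n` worker addresses reported by the scheduler."""
    # scheduler_info()["workers"] is a dict keyed by worker address.
    addresses = sorted(client.scheduler_info()["workers"])
    # Restrict client.run() to the chosen subset of workers.
    return client.run(func, workers=addresses[:n])


# Assuming `client` is the connected Client from above:
# print(run_on_first_workers(client, os.getpid, n=2))
```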

## 4. Do some Dask debugging

In [None]:
# get status of each worker in your cluster
def get_status(dask_worker):
    return dask_worker.status

In [None]:
client.run(get_status)

In [None]:
# find where each worker is spilling data to disk
client.run(lambda dask_worker: dask_worker.local_directory)
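When a function passed to `client.run()` accepts a parameter named `dask_worker`, Dask fills it in with the worker object, as in the two cells above. The sketch below gathers several debugging fields in one round trip instead of one call per field. The helper name `worker_snapshot` is hypothetical, and `len(dask_worker.data)` is used here as a rough count of keys held by the worker.

```python
def worker_snapshot(dask_worker):
    """Collect a few debugging fields from one worker in a single call."""
    return {
        # str() keeps the value picklable and readable whatever type status has
        "status": str(dask_worker.status),
        "local_directory": dask_worker.local_directory,
        "keys_held": len(dask_worker.data),
    }


# Assuming `client` is the connected Client from above:
# for address, snapshot in client.run(worker_snapshot).items():
#     print(address, snapshot)
```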

## Other `client.` functions you might find useful
The flexibility doesn't end with `client.run()`.

Consider taking a look at:

`client.submit()`: to submit a function to the Dask scheduler to be run asynchronously

`client.map()`: to map a function onto multiple objects

`client.scatter()`: to scatter data from the local client into distributed memory

`client.upload_file()`: to upload a single file or package (.py, .egg, or .zip) to all workers
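As a rough illustration of how `client.map()` and `client.submit()` combine, the sketch below computes a total over several inputs. It assumes `client` is the connected `Client` from above; the helper name `total_length` is hypothetical.

```python
def total_length(client, chunks):
    """Sum the lengths of `chunks` on the cluster using map and submit."""
    # One future per chunk; each call to len runs as a Dask task.
    lengths = client.map(len, chunks)
    # Futures passed as arguments are resolved to their values before sum runs.
    total = client.submit(sum, lengths)
    # Block until the final value is available and bring it back locally.
    return total.result()


# Assuming `client` is the connected Client from above:
# print(total_length(client, ["ab", "cde", ""]))
```

Unlike `client.run()`, these calls go through the scheduler, so the work is tracked as tasks and placed on workers by Dask.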

In [None]:
Documentation.