# Distributed Computation on Many Machines

## This notebook will not run in Binder!

Binder and Dask do not play well together (because of `conda`).  You can read this notebook on Binder, but to run it you will need to `$ pip install -r requirements-coiled.txt` and run it locally.

## Outcomes

- overview of options for distributed compute in Python in 2022,
- demonstration of an AWS/Dask/Coiled/Prefect stack to distribute compute over a cluster on EC2.


## Why distribute compute over many machines?

A single machine has a size limit (for example, the largest instance available on EC2).

Many small machines can be cheaper & larger in total than the largest single machine.

Modern distributed compute platforms/environments tolerate failures of individual workers - a single EC2 instance does not.

## Ecosystems

Spark:

- accessing Scala code with Python bindings,
- Databricks is a modern way to run Spark.

[Ray](https://docs.ray.io/en/latest/index.html) & [Dask](https://docs.dask.org/en/stable/):

- distributed compute frameworks,
- DAGs for computation.

TensorFlow & PyTorch:

- multi-GPU training,
- accessing C++ code with Python bindings.

Plus more - Celery, lots of AWS Lambda...


## Our focus

A stack of Dask / Coiled / Prefect / EC2.

Requires two accounts - an AWS account and a Coiled account. A Prefect account is optional.


## Dask

Dask is an execution framework - one scheduler is responsible for assigning many tasks to many workers.

<center><img src="../assets/dask.png" alt="Drawing" style="width: 600px;"/></center>

While Dask is a core part of this stack (it gives us concurrent computation - both parallelism and async), we will not write any low-level Dask (or Dask DataFrame) code.
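To make the scheduler / worker idea concrete, here is a minimal single-machine sketch. It is not Dask code and not how Dask is implemented - it uses only the standard library (`graphlib` needs Python 3.9+, and a `ThreadPoolExecutor` stands in for the workers). The names `GRAPH` and `run_graph` are made up for this illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical task graph: key -> (function, keys of the tasks it depends on).
GRAPH = {
    "data": (lambda: list(range(10)), []),
    "total": (sum, ["data"]),
    "count": (len, ["data"]),
    "mean": (lambda total, count: total / count, ["total", "count"]),
}


def run_graph(graph, max_workers=2):
    """Toy scheduler: submit each task once all of its dependencies are done."""
    sorter = TopologicalSorter({key: deps for key, (_, deps) in graph.items()})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    pending = {}  # future -> task key
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Hand every task whose inputs are available to the worker pool.
            for key in sorter.get_ready():
                func, deps = graph[key]
                args = [results[dep] for dep in deps]
                pending[pool.submit(func, *args)] = key
            # Block until at least one worker finishes, then record its result.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                key = pending.pop(future)
                results[key] = future.result()
                sorter.done(key)  # unlocks tasks that depend on this one
    return results
```

Calling `run_graph(GRAPH)["mean"]` gives `4.5`. The `total` and `count` tasks depend only on `data`, so they can run on different workers at the same time. A real Dask scheduler does the same kind of bookkeeping, but across processes and machines rather than threads.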


## Coiled

<center><img src="../assets/many-machine/f1.png" alt="Drawing" style="width: 600px;"/></center>

Manages AWS infrastructure for running Dask clusters on EC2:

- turns a `requirements.txt` into a *software environment* - a Docker image built with `pip install`.


## Prefect

Acts as a wrapper around Dask.  Prefect offers more functionality than just Dask execution:

- scheduling,
- monitoring,
- intelligent re-execution of pipelines (aka back-filling).

Prefect 2.0 is currently in beta (not yet production ready) - we are using Prefect 2.0.


# Prefect & Dask on a Single Machine

Let's start by running the program from the last exercise of the previous notebook:

In [None]:
%%timeit -n 1 -r 1
!python ../src/naive.py

Now try with naive Prefect:

In [None]:
%%timeit -n 1 -r 1
!python ../src/naive_dask_prefect.py

Now let's use Prefect with `asyncio`:

In [None]:
%%timeit -n 1 -r 1
!python ../src/async_prefect.py
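The script above is not reproduced here. As a rough illustration of why `asyncio` helps with I/O-bound tasks, the sketch below uses only the standard library (no Prefect or Dask). `fetch`, `fetch_all` and `timed_run` are hypothetical names, and `asyncio.sleep` stands in for a network call.

```python
import asyncio
import time


async def fetch(item, delay=0.05):
    # asyncio.sleep stands in for a slow network request.
    await asyncio.sleep(delay)
    return item * 2


async def fetch_all(items):
    # gather runs all coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fetch(item) for item in items))


def timed_run(items):
    # Run the coroutines from synchronous code and measure the wall-clock time.
    start = time.perf_counter()
    results = asyncio.run(fetch_all(items))
    return results, time.perf_counter() - start
```

With ten items, running the waits one after another would take about 0.5 seconds. Because they overlap, `timed_run(range(10))` should finish in roughly the time of a single wait. Inside a notebook cell an event loop is already running, so use `await fetch_all(range(10))` directly instead of `asyncio.run`.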

# Prefect & Dask Running on a Coiled Cluster (Many Machines)

<center><img src="../assets/many-machine/f2.png" alt="Drawing" style="width: 600px;"/></center>

Requires a few accounts to get set up:

- AWS account - cluster will run on EC2,
- Coiled account - adds & manages AWS infrastructure needed for a Dask cluster.

Stack:

- EC2,
- Dask,
- Prefect,
- Coiled.

Example of running on a Coiled cluster:

In [None]:
%%timeit -n 1 -r 1
!python ../src/dask_coiled_prefect.py

# Setting up the AWS/Dask/Coiled/Prefect stack

## AWS Setup

The prerequisite is an AWS account.

First set up a new IAM user (below I call this user `coiled`) with programmatic access (key + secret key) - remember to download / copy your credentials to CSV!

We will use this user to manage & run the Coiled cluster on EC2.

Create IAM policies & AWS infrastructure so you can run Dask clusters in your AWS account.

[Coiled AWS setup](https://docs.coiled.io/user_guide/aws-cli.html). 

[Coiled IAM policies](https://docs.coiled.io/user_guide/aws_reference.html) - one is for setting up the IAM user (not needed if you are using credentials with admin access):

- create 2 IAM policies `coiled-setup` & `coiled-ongoing` from JSON,
- attach the policies to your IAM user.


## Coiled account setup

Create Coiled account - https://cloud.coiled.io/signup - add your credentials in *Cloud Provider*.

Or do the same thing via the shell - create a Coiled API token at https://cloud.coiled.io/profile:

<center><img src="../assets/many-machine/f3.png" alt="Drawing" style="width: 600px;"/></center>

```shell
$ pip install coiled
#  use token here
$ coiled login
$ coiled setup aws
```

I wasn't sure how to configure `region` using the browser *Cloud Provider* page.

Now you can run the Dask example:

In [None]:
!python ../src/dask_coiled.py

## Optional - Adding Prefect Cloud

<center><img src="../assets/many-machine/f4.png" alt="Drawing" style="width: 600px;"/></center>


```shell
#  log in first, then select the workspace
$ prefect cloud login -k $YOUR_PREFECT_API_KEY
$ prefect cloud workspace set --workspace "adamgreenadgefficiencycom/kiwipycon-tutorial"
```


## Exercise

1. Set up this Dask/Coiled stack on an EC2 cluster,
2. Add Prefect Cloud.