# PyWren RISECamp, 2018

Welcome to the hands-on tutorial for PyWren.

This tutorial consists of a set of exercises that will have you working directly with PyWren:
- basic exercises that introduce you to PyWren APIs (covered in this notebook)
- data analysis on a Wikipedia dataset (see [analyze-wikipedia.ipynb](../analyze-wikipedia.ipynb))
- matrix multiplication with PyWren (see [matrix-computations-advanced.ipynb](../matrix-computations-advanced.ipynb))
- hyperparameter optimization (see [hyperparameter-optimization.ipynb](../hyperparameter-optimization.ipynb))


A couple of notes before you dive into the actual tutorials:
- To run a code cell, select it and click Cell -> Run Cells, or press Ctrl + Enter.
- ***Execute*** indicates that the following code cell works as given. Make sure to run it.
- ***Exercise*** indicates an incomplete or broken code cell. Modify the code to make it work.
- You can find solutions for the exercises [here](./).





## Introduction to PyWren


For this tutorial, we have already installed PyWren in the Docker container where this Jupyter notebook is running.
PyWren ships with a command-line tool that handles basic setup tasks such as creating AWS IAM roles, configuring the PyWren environment, and deploying or updating Lambda functions. We have already done this setup for you.  
Before we go into the exercises, let's use the command-line tool to check that PyWren works properly.

***Execute*** the cell below.  
If PyWren is correctly installed, you should see *`function returned: Hello world`* after a few seconds.

In [None]:
!pywren test-function

The above command essentially invokes a PyWren task that executes on AWS Lambda. The task simply returns `Hello world` back to our PyWren host. We'll show you how to do exactly that in a minute.

First and foremost, let's create a PyWren **Executor** that we will use throughout this notebook.

***Execute*** the following code to create a PyWren Executor.

In [None]:
import pywren
pwex = pywren.default_executor()

## 1. call_async() -- PyWren's single invocation API

A PyWren Executor exposes two main APIs for remote execution, the first one being ***call_async()***, which runs a single PyWren task on AWS Lambda. `call_async()` takes two parameters: a user-provided function and a parameter for the function, i.e., `call_async(func, param)`.  
Once called, it returns a ***future*** object that allows you to query the task status, get ***future.result()***, etc. 

***Exercise:*** Complete line **7** (fill in the correct parameters) and line **9** of the code below to get the correct result.

In [None]:
# this is the user-defined function that we will pass to call_async()
def hello_world(param):
    if param == 42:
        return "Hello world!"

# call_async(func, parameter)
future = pwex.call_async(hello_world, 42)
# call future.result() to fetch output of execution
result = future.result()

# check if result is correct
assert result == 'Hello world!'
print("Success!")
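
To get a feel for the future pattern without touching AWS, here is a minimal local sketch using Python's standard `concurrent.futures` module. It is only an analogy: `ThreadPoolExecutor.submit()` stands in for `call_async()`, the standard-library `Future` is not the class PyWren returns, and the names `hello_world_local`, `local_future`, and `local_result` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def hello_world_local(param):
    # same logic as the user-defined function in the exercise above
    if param == 42:
        return "Hello world!"

with ThreadPoolExecutor(max_workers=1) as pool:
    # submit() returns a future right away; the function runs in a worker thread
    local_future = pool.submit(hello_world_local, 42)
    # result() blocks until the function has finished, then returns its value
    local_result = local_future.result()

print(local_result)
```

The shape of the code is the same as in the PyWren cell: hand over a function and a parameter, get a future back, and ask the future for the result when you need it.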

## 2. map() -- parallel execution in the cloud
The above example executes a single function once in the cloud. This is pretty neat, but PyWren really shines when we want to run a function many times in parallel.
To do this, we can use the PyWren Executor's second API: ***map()***. `map()` lets users call a function over multiple parameters, just like Python's built-in `map()`.  
In Python, the built-in `map()` applies the same function to each item in an iterable. The returned object can then be passed to `list()` or `set()` to obtain the results.

***Execute*** the following code to see how Python's native `map()` works.

In [None]:
param_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
def square(param):
    return param * param

map_object = map(square, param_list)
results_with_python_map = list(map_object)
print(results_with_python_map)
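
One detail worth knowing: in Python 3, the built-in `map()` returns a lazy iterator, so values are computed only when something like `list()` consumes them, and the iterator can be consumed only once. A small sketch of that behavior (the names `lazy_squares`, `first_pass`, and `second_pass` are illustrative, and `square` is redefined so the snippet runs on its own):

```python
def square(param):
    return param * param

# map() does no work yet; it only builds an iterator
lazy_squares = map(square, [0, 1, 2, 3])

first_pass = list(lazy_squares)   # consumes the iterator: [0, 1, 4, 9]
second_pass = list(lazy_squares)  # the iterator is exhausted: []

print(first_pass)
print(second_pass)
```

This is why the cell above stores `list(map_object)` in a variable instead of converting the map object again later.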

PyWren Executor's `map()` API is not much different, except now the passed function runs on a cloud service.  
***Exercise:*** Complete the call to `pwex.map()` by filling in the appropriate parameters.

In [None]:
futures = pwex.map(square, param_list)
results_with_pywren_map = [f.result() for f in futures]

assert results_with_pywren_map == results_with_python_map
print("success")

One caveat above is that the calls to `result()` run serially, because each `result()` call blocks until its task finishes. This can be inefficient with a large number of parallel tasks. PyWren provides a convenient API, ***pywren.wait()***, to wait for all tasks to finish.  
***Execute*** the code below to see how `wait()` works.

In [None]:
pywren.wait(futures)
results_with_pywren_map = [f.result() for f in futures]

assert results_with_pywren_map == results_with_python_map
print("success")

Because the tasks behind these futures have already finished, the above code should complete immediately.  
We also have ***pywren.get_all_results()***, which is a convenient way to `wait()` and fetch all the results in one call.  
***Execute*** the code below to see how `get_all_results()` works.

In [None]:
results_with_pywren_map = pywren.get_all_results(futures)
assert results_with_pywren_map == results_with_python_map
print("success")
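
The wait-then-collect pattern can also be tried locally with the standard library. In the sketch below, `concurrent.futures.wait()` plays the role of `pywren.wait()` and a thread pool plays the role of AWS Lambda. This is an analogy rather than PyWren's implementation, and `slow_square` is a made-up function for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def slow_square(param):
    time.sleep(0.1)  # pretend the task takes a while
    return param * param

with ThreadPoolExecutor(max_workers=10) as pool:
    # submit every task first; none of these calls block
    local_futures = [pool.submit(slow_square, p) for p in range(10)]
    # block once until all tasks have finished
    done, not_done = wait(local_futures)
    # every result() call now returns without waiting
    local_results = [f.result() for f in local_futures]

print(local_results)
```

Collecting results in the order of `local_futures`, not in the order of the `done` set, keeps them aligned with the input parameters.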

## 3. Multiple jobs

Putting things together, we can use `map()` to execute a function over an iterable of parameters in parallel in the cloud.
Then we can call `pywren.get_all_results()` to fetch all results.
Because `map()` returns as soon as all tasks have been invoked, we can switch to other work before calling and blocking on `pywren.get_all_results()`. That other work can include invoking another PyWren job.

In the code below, we want to verify the distributive law of matrix-vector multiplication, i.e., A(x+y) = Ax + Ay. To do that, we invoke two PyWren jobs, one computing 50 instances of A(x+y) and the other computing 50 instances of Ax + Ay. Because we pass the same random seeds to both jobs, the results returned by the two jobs should be the same (up to floating-point rounding) if the distributive law holds.

***Exercise:*** Update the return statement for `multiply_1` and `multiply_2` to complete the verification program.

In [None]:
import numpy as np

def multiply_1(seed):
    np.random.seed(seed)
    A = np.random.normal(0, 1, (1024, 131072))
    x = np.random.normal(0, 1, 131072)
    y = np.random.normal(0, 1, 131072)
    # compute A * (x+y)
    return np.dot(A, x+y)

def multiply_2(seed):
    np.random.seed(seed)
    A = np.random.normal(0, 1, (1024, 131072))
    x = np.random.normal(0, 1, 131072)
    y = np.random.normal(0, 1, 131072)
    # compute Ax + Ay
    return np.dot(A, x) + np.dot(A, y)

futures_1 = pwex.map(multiply_1, range(50))
futures_2 = pwex.map(multiply_2, range(50))

results_1 = pywren.get_all_results(futures_1)
results_2 = pywren.get_all_results(futures_2)

assert np.all(np.isclose(np.stack(results_1), np.stack(results_2)))
print("success")
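
Before launching 100 Lambda tasks, it can help to sanity-check the same logic locally on a much smaller problem. The sketch below is not part of the original exercise: the helper names, the `rows` and `cols` parameters, and the 4 x 8 shape are chosen only for illustration. It compares with a tolerance, as the cell above does, because the two expressions can round differently in floating point.

```python
import numpy as np

def small_multiply_1(seed, rows=4, cols=8):
    np.random.seed(seed)
    A = np.random.normal(0, 1, (rows, cols))
    x = np.random.normal(0, 1, cols)
    y = np.random.normal(0, 1, cols)
    # compute A * (x+y)
    return np.dot(A, x + y)

def small_multiply_2(seed, rows=4, cols=8):
    np.random.seed(seed)
    A = np.random.normal(0, 1, (rows, cols))
    x = np.random.normal(0, 1, cols)
    y = np.random.normal(0, 1, cols)
    # compute Ax + Ay
    return np.dot(A, x) + np.dot(A, y)

# same seeds for both helpers, so A, x, and y match pairwise
local_1 = [small_multiply_1(seed) for seed in range(5)]
local_2 = [small_multiply_2(seed) for seed in range(5)]

distributive_ok = bool(np.allclose(np.stack(local_1), np.stack(local_2)))
print(distributive_ok)
```

For scale, the full-size `A` in the exercise holds 1024 x 131072 float64 values, which is 1 GiB per task, so shrinking the shape is what makes a quick local check practical.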

## 4. Visualization and Debugging

### Plotting
You have probably been wondering where time is spent during a PyWren job. Here we provide a method that plots the execution graph of a PyWren job for you. We'll reuse the futures from the matrix multiplication exercise as an example.

***Execute*** the plotting code below.

In [None]:
# load the plotting method
from training import plot_pywren_execution

plot_pywren_execution(futures_1 + futures_2)

You can see that the tasks are submitted in two batches. Each batch belongs to one PyWren job. The two jobs are indeed running in parallel! 

### Logging

For the advanced exercises, you might find yourself waiting for a large PyWren job to finish and want to know its progress. To do that, run `pywren.wrenlogging.default_config("INFO")` **before** running a job to get detailed logs during execution. To disable detailed logs, switch the logging level back to `ERROR` by running `pywren.wrenlogging.default_config("ERROR")`.


***Execute*** the code below to observe logs.

In [None]:
# switch logging level to INFO
pywren.wrenlogging.default_config("INFO")

futures = pwex.map(square, param_list)
results_with_pywren_map = [f.result() for f in futures]

assert results_with_pywren_map == results_with_python_map
print("success")

# switch logging level back to ERROR
pywren.wrenlogging.default_config("ERROR")
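
If you have not used log levels before, the effect of switching between `INFO` and `ERROR` can be seen with Python's standard `logging` module. This sketch does not call PyWren and makes no claim about how `wrenlogging` is implemented; the logger name `demo`, the `ListHandler` class, and the messages are invented for the example.

```python
import logging

class ListHandler(logging.Handler):
    """Collects log messages in a list so they are easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())

logger = logging.getLogger("demo")
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.INFO)
logger.info("task 3 finished")   # recorded: INFO meets the INFO threshold

logger.setLevel(logging.ERROR)
logger.info("task 4 finished")   # dropped: INFO is below the ERROR threshold
logger.error("task 5 failed")    # recorded

print(handler.messages)
```

Progress-style messages typically live at the `INFO` level, which is why they disappear once the level is set back to `ERROR`.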

Advanced users might also find it useful to check the logs of remote execution. You can use `pywren print-latest-logs` to print the latest logs of Lambda execution.

***Execute*** the command below to observe remote logs.

In [None]:
!pywren print-latest-logs

Congratulations! This concludes our introduction exercises.

Now it is time to try out more challenging exercises with PyWren! Click on any one of them to continue:

- data analysis on a Wikipedia dataset (see [analyze-wikipedia.ipynb](../analyze-wikipedia.ipynb))
- matrix multiplication with PyWren (see [matrix-computations-advanced.ipynb](../matrix-computations-advanced.ipynb))
- hyperparameter optimization (see [hyperparameter-optimization.ipynb](../hyperparameter-optimization.ipynb))

