# Rainwater Flow Simulation

In this notebook, we're going to use MPI to evaluate a range of parameters for
a linear rainfall-runoff model in parallel on a Banyan computing session. Banyan
automatically sets up cloud computing resources in your own Virtual Private Cloud,
sets up MPI and Python, manages your software environment, lets you run with any
number of workers, and scales down when you're finished. All your estimated costs,
running clusters, sessions, and logs can be viewed on
the [Banyan dashboard](https://www.banyancomputing.com/dashboard).

To run this notebook, you will need to have set up an account with Banyan. See
the [Banyan getting-started guide](https://www.banyancomputing.com/getting-started/)
for step-by-step instructions.

## Configuring

To run this example, please ensure that you have set up your Banyan account.

Run the first cell below to import `banyan`.

To configure your AWS credentials, run the second cell below and enter your AWS
access key ID, secret access key, and default region when prompted. Banyan does
not save your AWS credentials, but they are needed so that you can run your
computation in your AWS account. Finally, run the third cell below to set your
Banyan credentials and configure Banyan.

You must pass your User ID and API Key to the `configure` function in order
to authenticate. You can find this information on the Account page of the
Banyan Dashboard. After running this cell, your credentials will be saved
in `$HOME/.banyan/banyanconfig.toml` and will be read from that file in the
future. This means that you only need to run this cell once.
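
If you are unsure whether you have already run the configure cell, you can check
for the saved file first. The sketch below is a minimal, hypothetical helper
(`banyan_config_path` and `banyan_is_configured` are our own names, not part of
the `banyan` package). It only tests that the file exists; it does not validate
the credentials stored inside it.

```python
from pathlib import Path


def banyan_config_path(home=None):
    """Return the path where Banyan saves credentials after `configure`."""
    # Hypothetical helper: defaults to the current user's home directory
    base = Path(home) if home is not None else Path.home()
    return base / ".banyan" / "banyanconfig.toml"


def banyan_is_configured(home=None):
    """Return True if a saved Banyan config file is already present."""
    return banyan_config_path(home).is_file()


if banyan_is_configured():
    print(f"Found {banyan_config_path()}; you can skip the configure cell.")
else:
    print("No saved Banyan config found; run the configure cell below.")
```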

```python
# Import packages
import banyan
from banyan import configure
from banyan import create_cluster, destroy_cluster, get_cluster, get_clusters
from banyan import run_session, end_session

import getpass
import urllib.request
```

```python
# Run this cell to provide your AWS credentials. When prompted, enter the
# credentials for the AWS account that you connected with Banyan. The values are
# stored only as environment variables of this notebook's Python process. If you
# have already configured the AWS CLI with the credentials for the account you
# connected with Banyan, you can skip this step.

import getpass
import os

os.environ["AWS_ACCESS_KEY_ID"] = getpass.getpass(prompt="Enter AWS_ACCESS_KEY_ID\n")
os.environ["AWS_SECRET_ACCESS_KEY"] = getpass.getpass(prompt="Enter AWS_SECRET_ACCESS_KEY\n")
# The region is not secret, so a plain input() prompt is sufficient
os.environ["AWS_DEFAULT_REGION"] = input("Enter AWS_DEFAULT_REGION (e.g. us-west-2)\n")

print("AWS is now configured.")
```
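
Before creating a cluster, it can help to confirm that the three variables set
above are actually present. The helper `missing_aws_vars` below is a
hypothetical sketch, not part of Banyan. It only inspects this Python process's
environment, so if you configured credentials through the AWS CLI instead
(stored under `~/.aws/`), it will report them as missing even though they may
still work.

```python
import os

# The variables the credentials cell above sets
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_vars(environ=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print("Missing AWS settings:", ", ".join(missing))
else:
    print("All AWS environment variables are set.")
```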

```python
# Run this cell to configure Banyan. When prompted, provide your user ID and API
# key. You can find these on the Account page of your Banyan dashboard.
# If you have already configured Banyan, you can skip this step.

user_id = getpass.getpass(prompt="Please enter your User ID\n")
api_key = getpass.getpass(prompt="Please enter your API Key\n")

# Configures Banyan client library with your Banyan credentials
configure(user_id=user_id, api_key=api_key)
print("Banyan is now configured.")
```

## Creating a cluster

For this example, you will need a Banyan cluster. You can either use an existing
cluster or create a new one. Run the following code block and enter either the
name of an existing cluster or the name you would like to use for a new cluster.

If you already have a running cluster, enter its name when prompted.

To create a new cluster instead, enter a new name and then, when prompted, the name
of the [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair) that you created during [Banyan setup](https://www.banyancomputing.com/creating-clusters).

In the cell below, you can change `instance_type` to create a cluster with a
different EC2 instance type, for example one with more memory or more vCPUs.
See the [`create_cluster` documentation](https://www.banyancomputing.com/banyan-py-docs/create-cluster/) for the other parameters for creating a cluster.

```python
# Reuse the named cluster if it exists and is running; otherwise create it
cluster_name = input("Name of an existing cluster or a new cluster: ")
print(f"Checking if cluster {cluster_name} exists...")

clusters = get_clusters()
print(f"You have {len(clusters)} clusters")
if not ((cluster_name in clusters) and (clusters[cluster_name].status == "running")):
    print(f"Creating new cluster {cluster_name}")
    # The key pair name is not secret, so a plain input() prompt is sufficient
    ec2_key_pair_name = input("Name of your EC2 key pair: ")
    print(f"Using EC2 key pair {ec2_key_pair_name}")
    create_cluster(
        name=cluster_name,
        instance_type="t3.2xlarge",  # change for more memory or vCPUs
        initial_num_workers=2,
        ec2_key_pair_name=ec2_key_pair_name,
    )
else:
    print(f"Using existing cluster {cluster_name}")

# Fetch the cluster's information (shown as this cell's output)
get_cluster(cluster_name)
```

## Running the simulation

The script `run_simulation.py` uses [mpi4py](https://mpi4py.readthedocs.io/) (the widely used Python package that provides MPI bindings) to run multiple linear rainfall-runoff models with different parameters in parallel on different workers. To run this script on a session of multiple workers, we call `run_session`, passing the names of one or more `.py` files to run in `code_files` and the supporting files the script needs (`stockflow.py` and the `leaf-river-data.txt` data file) in `files`.
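
The contents of `run_simulation.py` and `stockflow.py` are not shown here. As an
illustration only, the sketch below assumes the model is a single linear
reservoir, where outflow is proportional to storage (`Q_t = k * S_t`, so
`S_{t+1} = S_t + P_t - k * S_t` for rainfall `P_t` and recession coefficient
`k`), and splits a list of candidate `k` values across MPI ranks round-robin.
The function names and the rainfall series are made up. The mpi4py calls
(`MPI.COMM_WORLD`, `Get_rank`, `Get_size`, `gather`) are standard, and the
script falls back to a single serial process when mpi4py is not installed.

```python
"""Hypothetical sketch of an MPI parameter sweep; not the actual run_simulation.py."""


def linear_reservoir(precip, k, s0=0.0):
    """Linear reservoir: Q_t = k * S_t and S_{t+1} = S_t + P_t - Q_t."""
    storage = s0
    flows = []
    for p in precip:
        q = k * storage          # outflow proportional to current storage
        flows.append(q)
        storage = storage + p - q
    return flows


def my_share(params, rank, size):
    """Round-robin assignment: rank r evaluates params[r], params[r + size], ..."""
    return params[rank::size]


def sweep(precip, params, rank, size):
    """Evaluate this rank's share of parameters as (k, total outflow) pairs."""
    return [(k, sum(linear_reservoir(precip, k))) for k in my_share(params, rank, size)]


if __name__ == "__main__":
    try:
        from mpi4py import MPI  # real MPI run, e.g. launched via mpiexec
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
    except ImportError:
        comm, rank, size = None, 0, 1  # fall back to one serial process

    precip = [0.0, 5.0, 2.0, 0.0, 0.0, 1.0]   # made-up rainfall series
    params = [0.1 * i for i in range(1, 10)]  # candidate recession coefficients k
    local = sweep(precip, params, rank, size)

    # gather returns the list of all ranks' results on root and None elsewhere
    gathered = comm.gather(local, root=0) if comm is not None else [local]
    if rank == 0:
        results = sorted(r for chunk in gathered for r in chunk)
        for k, total in results:
            print(f"k={k:.1f} total outflow={total:.3f}")
```

Launched with, for example, `mpiexec -n 2 python sweep_sketch.py` (a hypothetical
file name), each of the two ranks evaluates every second parameter and rank 0
prints the combined results. Round-robin slicing keeps the work balanced when
every parameter set costs about the same to evaluate.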

```python
# Download the Leaf River data file required by the simulation
data_url = "https://gist.githubusercontent.com/jdherman/252be34ced79dc42dc2300f227c2af29/raw/5d58fff0c47a6c1f78f0523488e4b946af507338/leaf-river-data.txt"
urllib.request.urlretrieve(data_url, "leaf-river-data.txt")
```
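
Re-running the download cell fetches the file again each time. A minimal sketch
of a skip-if-present download is shown below. `download_once` is a hypothetical
helper, not part of Banyan, and it treats any non-empty local file as valid
without checking its contents.

```python
import os
import urllib.request


def download_once(url, path):
    """Download url to path unless a non-empty file already exists there.

    Returns True if a download happened, False if it was skipped.
    """
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return False  # already present; skip the network call
    urllib.request.urlretrieve(url, path)
    return True
```

For example, calling `download_once` with the gist URL from the cell above and
`"leaf-river-data.txt"` downloads the file on the first run and returns `False`
on later runs.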

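```python
run_session(
    cluster_name=cluster_name,
    session_name="rainfall_model",
    nworkers=2,
    # Supporting files needed by the simulation script
    files=[
        "file://stockflow.py",
        "leaf-river-data.txt",
    ],
    # Script(s) to run in parallel across the session's workers
    code_files=["file://run_simulation.py"],
    print_logs=True,
)
```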

## Clean up resources

To destroy the cluster entirely, run the following cell. Note that recreating a cluster takes 10 to 30 minutes.

```python
# Destroy the cluster
destroy_cluster(cluster_name)
```