# ML Inference on Satellite Imagery with PyTorch+Julia (unreleased)

In this notebook, we will walk through an example of using BanyanArrays to
run a machine learning model, created in PyTorch (Python) and exported to ONNX,
on a collection of satellite images retrieved through a public API. See
`model.ipynb` for the model definition in PyTorch (Python).

We acknowledge the use of imagery provided by services from NASA's Global Imagery Browse Services (GIBS), part of NASA's Earth Observing System Data and Information System (EOSDIS).



## Configuring

We will use Banyan to run an image compression model on satellite imagery. To run
this example, please ensure that you have set up your Banyan account.

Run the first cell below to import `Banyan`, `BanyanArrays`, `BanyanImages`,
`BanyanONNXRunTime`, and the other libraries used in this example.
To configure your AWS credentials, run the second cell below and provide your
AWS credentials when prompted. Banyan does not save your AWS credentials, but
they are needed so that you can run your computation in your AWS account.
Finally, run the third cell below to set your Banyan credentials and configure
Banyan.
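After running the second cell, it can help to confirm that all three AWS variables it sets are actually present before going further. The following is a minimal sketch that uses only Julia's built-in `ENV`; the helper `missing_aws_vars` is hypothetical and not part of Banyan.

```julia
# Names of the AWS variables that the second cell sets
const REQUIRED_AWS_VARS = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION"]

# Hypothetical helper (not part of Banyan): return the required variables that
# are unset or empty in `env` (defaults to the process environment)
function missing_aws_vars(env = ENV)
    return [name for name in REQUIRED_AWS_VARS if isempty(get(env, name, ""))]
end

unset_vars = missing_aws_vars()
if isempty(unset_vars)
    println("All AWS variables are set.")
else
    println("Missing AWS variables: ", join(unset_vars, ", "))
end
```

Passing a `Dict` instead of `ENV` makes the check easy to test without touching your real environment.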

You must pass your User ID and API Key to the `configure` function in order
to authenticate. You can find this information on the Account page of the
Banyan Dashboard. After running this cell, your credentials will be saved
in `$HOME/.banyan/banyanconfig.toml` and will be read from that file in the
future. This means that you only need to run this cell once.
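Because the credentials are cached in `$HOME/.banyan/banyanconfig.toml`, one option is to check for that file before deciding whether to run the configuration cell. This is a minimal sketch that relies only on the file location stated above; it does not read or validate the file's contents, and `banyan_config_path` and `needs_configure` are hypothetical names.

```julia
# Hypothetical helper: location of the Banyan config file described above
banyan_config_path(home = homedir()) = joinpath(home, ".banyan", "banyanconfig.toml")

# True if the config file does not exist yet, i.e. `configure` should be run
needs_configure(home = homedir()) = !isfile(banyan_config_path(home))

if needs_configure()
    println("No Banyan config found at ", banyan_config_path(), "; run the configuration cell.")
else
    println("Found an existing Banyan config at ", banyan_config_path())
end
```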

In [None]:
# Import libraries
using AWSS3
using Banyan
using BanyanArrays
using BanyanImages
using BanyanONNXRunTime
using ImageCore
using IterTools

In [None]:
# Run this cell to set your AWS credentials as environment variables for this
# Julia session. When prompted, specify the credentials for the AWS account that
# you connected with Banyan. If you have already configured the AWS CLI with the
# credentials for the account you have configured with your Banyan account, you
# can skip this step.

print("Enter AWS_ACCESS_KEY_ID: \n"); sleep(1)
ENV["AWS_ACCESS_KEY_ID"] = readline()
print("Enter AWS_SECRET_ACCESS_KEY: \n"); sleep(1)
ENV["AWS_SECRET_ACCESS_KEY"] = readline()
print("Enter AWS_DEFAULT_REGION: \n"); sleep(1)
ENV["AWS_DEFAULT_REGION"] = readline()

print("AWS is now configured.")

In [None]:
# Run this cell to configure Banyan. When prompted, provide your user ID and API
# key. You can find these on the Account page of your Banyan dashboard.
# If you have already configured Banyan, you can skip this step.

print("Please enter your User ID: \n"); sleep(1)
user_id = readline()
print("Please enter your API Key: \n"); sleep(1)
api_key = readline()

# Configures Banyan client library with your Banyan credentials
configure(user_id=user_id, api_key=api_key)
print("Banyan is now configured.")

## Creating a cluster

For this example, you will need a Banyan cluster. You can either use an existing
cluster or create a new one. Run the following cell and enter either the name of
an existing cluster or the name you would like to use for a new cluster.

If you already have a cluster, specify its name when prompted.

If you would instead like to create a new cluster, provide a name for it and, when
prompted, the name of the [Amazon EC2 key pair](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#having-ec2-create-your-key-pair) that you created during [Banyan setup](https://www.banyancomputing.com/creating-clusters).

In the cell below, you can change `instance_type` to create a cluster with a
different EC2 instance type, for example one with more memory or more vCPUs.
See the [cluster creation documentation](https://www.banyancomputing.com/banyan-jl-docs/create-cluster/) for the other parameters of `create_cluster`.
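The cell below creates a cluster unless one with the given name already exists and is running. That decision can be isolated in a pure function and checked without contacting AWS. This is a minimal sketch: `MockCluster` is a hypothetical stand-in for the values returned by `get_clusters()`, and it only assumes they have a `status` field, as the cell below does.

```julia
# Hypothetical stand-in for the values returned by `get_clusters()`; only the
# `status` field that the notebook checks is modeled here.
struct MockCluster
    status::Symbol
end

# Same condition as the notebook cell: create a cluster unless one with this
# name already exists *and* is running.
should_create_cluster(clusters::AbstractDict, name::AbstractString) =
    !(haskey(clusters, name) && clusters[name].status == :running)

# `:terminated` is just an example of any status other than `:running`
mock_clusters = Dict("demo" => MockCluster(:running), "old" => MockCluster(:terminated))
for name in ("demo", "old", "new")
    action = should_create_cluster(mock_clusters, name) ? "create" : "reuse"
    println(name, " => ", action)
end
```

Note that a cluster that exists but is not running (here `old`) also leads to `create_cluster` being called, which matches the condition in the cell below.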

In [None]:
print("Cluster name for existing cluster or new cluster: \n"); sleep(1)
cluster_name = readline()
println(cluster_name)

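# Look up your existing clusters
clusters = get_clusters()
println("You have $(length(clusters)) clusters")
# Create a new cluster unless one with this name already exists and is running
if !(haskey(clusters, cluster_name) && clusters[cluster_name].status == :running)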
    println("Creating new cluster $(cluster_name)")
    print("Name of SSH EC2 Key Pair: \n"); sleep(1)
    ec2_key_pair_name = readline()
    println(ec2_key_pair_name)
    create_cluster(
        name=cluster_name,
        instance_type="t3.2xlarge",
        initial_num_workers=2,
        ec2_key_pair_name=ec2_key_pair_name
    )
else
    println("Using existing cluster $(cluster_name)")
end
get_cluster(cluster_name)

## Upload model to S3

To run inference with a model, the model must either be stored in Amazon S3 or
be accessible over the Internet. In this example, the model is saved locally,
and we will upload it to the cluster's S3 bucket.
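The rule above (S3 or the Internet, but not a local file) can be expressed as a small classifier over model paths. This is a minimal sketch: `model_location` and `s3_uri` are hypothetical helpers, not Banyan APIs, the bucket name `my-bucket` is made up, and it is an assumption that an `s3://` URI is the form Banyan expects for models stored in S3.

```julia
# Hypothetical helper (not a Banyan API): classify where a model path points
function model_location(path::AbstractString)
    if startswith(path, "s3://")
        return :s3
    elseif startswith(path, "http://") || startswith(path, "https://")
        return :internet
    else
        return :local  # has to be uploaded before the cluster can load it
    end
end

# Compose an S3 URI the same way the upload cell below does
s3_uri(bucket::AbstractString, key::AbstractString) = "s3://$bucket/$key"

println(model_location("image_compression_model.onnx"))
println(model_location(s3_uri("my-bucket", "image_compression_model.onnx")))
```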

In [None]:
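using AWSS3
using FilePathsBase  # provides `Path` for local files

# Upload the model to S3
model_save_path = "image_compression_model.onnx"  # local path of the exported ONNX model
s3_bucket_name = get_cluster_s3_bucket_name(cluster_name)

# Copy the local file into the cluster's S3 bucket, using Banyan's AWS
# configuration for the S3 credentials
cp(
    Path(model_save_path),
    S3Path("s3://$s3_bucket_name/$model_save_path", config=Banyan.get_aws_config())
)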

## Start a session

In order to perform any computation, we need to allocate resources on the
cluster. To do this, start a session with a specified number of workers. The
number of workers determines how far the computation can be parallelized: with
more workers, the work can be split into more pieces and may finish faster. The
cell below requests 20 workers, although a smaller number such as 2 should also
be sufficient for this example.
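As a rough illustration of how the worker count relates to per-worker load: if the 100 image tiles loaded later in this notebook were split as evenly as possible, each worker would handle at most `cld(100, nworkers)` images. The even split is an assumption for illustration, not a documented guarantee about how Banyan partitions data; `max_images_per_worker` is a hypothetical name.

```julia
# Upper bound on images per worker, assuming the images are split as evenly as possible
max_images_per_worker(nimages::Integer, nworkers::Integer) = cld(nimages, nworkers)

for nworkers in (2, 20)
    println(nworkers, " workers: at most ", max_images_per_worker(100, nworkers), " images each")
end
```

With 2 workers each one handles up to 50 images; with 20 workers, up to 5.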

In [None]:
# Start a session
session_id = start_session(
    cluster_name = cluster_name,
    nworkers = 20,
    print_logs = false,  # Set to true to view session output printed here in the notebook
    store_logs_in_s3 = true,  # Set to false if you do not want session logs saved to S3
)

## Running inference on data loaded from the Internet

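Run the model on each of the 100 satellite image tiles to compress it into a
vector encoding of length 10. Here the model is loaded from a public URL on
GitHub; since models can also be loaded from Amazon S3, you could instead point
`model_path` at the copy uploaded to S3 above.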
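The `files` tuple in the cell below pairs an iterator of `(i, j)` tile indices with a function that turns each pair into a NASA GIBS tile URL. In the GIBS WMTS REST URL pattern, the trailing components are the tile matrix (zoom level `6` here), the tile row, and the tile column. To inspect the URLs before running anything on the cluster, the same 100 URLs can be built eagerly in plain Julia. This sketch uses Julia's built-in `Iterators.product` over the same ranges; `GIBS_BASE` and `tile_url` are hypothetical names.

```julia
# Hypothetical constant: the fixed prefix of the URL template used in the notebook
const GIBS_BASE = "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/MODIS_Terra_CorrectedReflectance_TrueColor/default/2012-07-09/250m/6"

# Hypothetical helper: URL of the tile at row `i`, column `j`
tile_url(i, j) = "$GIBS_BASE/$i/$j.jpg"

# 10 × 10 grid of (i, j) indices, flattened into a vector of 100 URLs
urls = vec([tile_url(i, j) for (i, j) in Iterators.product(1:10, 1:10)])

println(length(urls))
println(first(urls))
```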

In [None]:
# Get model path
model_path = "https://github.com/banyan-team/banyan-julia/raw/v22.02.13/BanyanONNXRunTime/test/res/image_compression_model.onnx"

# Load model
model = BanyanONNXRunTime.load_inference(model_path, dynamic_axis=true)

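# Load data: a 10 × 10 grid of tiles (100 images) from NASA GIBS. `files` pairs
# an iterator of (i, j) tile indices with a function mapping each pair to a URL.
files = (
    IterTools.product(1:10, 1:10),
    (i, j) -> "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/MODIS_Terra_CorrectedReflectance_TrueColor/default/2012-07-09/250m/6/$i/$j.jpg"
)
data = BanyanImages.read_jpg(files; add_channelview=true)  # Specify `add_channelview` to add a dimension for the RGB channels
# Convert pixel values to floating point before passing them to the model
data = BanyanArrays.map(x -> float(x), data)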

In [None]:
# Display the name of the cluster's S3 bucket
get_cluster_s3_bucket_name(cluster_name)

In [None]:
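# Call model on data
res = model(Dict("input" => data))["output"]

# Create a directory in S3 to store the encodings. `offloaded` runs this block
# on the cluster, where the S3 bucket is available under the local `s3/` directory.
offloaded() do
    bucket = readdir("s3")[1]
    mkpath("s3/$bucket/encodings/")
end

# Split the output into one encoding vector per image (one row of `res` each)
res_vecs = mapslices(img -> [img], res, dims=2)[:]
bc = BanyanArrays.collect(1:length(res_vecs))  # image indices, used in the file names

# Write each image encoding to a different file in Amazon S3
written = map(res_vecs, bc) do img_vec, i
    # Only write when `s3/` exists, i.e. when running on the cluster
    if isdir("s3")
        bucket = readdir("s3")[1]
        write("s3/$bucket/encodings/part$i.txt", string(img_vec))
    end
    0  # dummy return value; writing the file is the side effect we want
end

# Trigger the computation on the cluster
compute_inplace(written)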
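To see what the splitting step in the cell above produces, the same `mapslices` call can be run locally on a small in-memory matrix. This sketch assumes, as that cell does, that the model output has one row per image and one column per encoding dimension (10 here); the 4 × 10 matrix below is made-up test data, not real model output, and the `demo_` names are hypothetical. Wrapping each row in `[img]` is what yields a vector of vectors; returning `img` directly would reassemble a 4 × 10 matrix. The loop also shows the text that would be written to each `part$i.txt` file: for a `Vector{Float32}`, `string` includes the element type prefix.

```julia
# Made-up stand-in for the model output: 4 images × 10-dimensional encodings
demo_res = reshape(collect(Float32, 1:40), 4, 10)

# Same call as in the notebook: wrap each row (dims=2 slice) in a 1-element
# vector, giving a 4×1 matrix of vectors, then flatten it with [:]
demo_vecs = mapslices(img -> [img], demo_res, dims=2)[:]

# An equivalent formulation using `eachrow`
demo_vecs_alt = [collect(row) for row in eachrow(demo_res)]

# File name and contents the notebook would write for each image
for (i, img_vec) in enumerate(demo_vecs)
    println("part$i.txt: ", string(img_vec))
end
```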

In [None]:
# End the session and release its cluster resources immediately
end_session(release_resources_now=true)