In [None]:
#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the "License")

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# 🌦️ Weather forecasting

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GoogleCloudPlatform/python-docs-samples/blob/main/people-and-planet-ai/weather-forecasting/README.ipynb)

This notebook leverages geospatial satellite and precipitation data from [Google Earth Engine](https://earthengine.google.com/). Using satellite imagery, you'll build and train a model for rain "nowcasting", that is, predicting the amount of rainfall for a given geospatial region and time in the immediate future.

* ⏲️ **Time estimate**: 1 hour
* 💰 **Cost estimate**: less than $1.00 USD

💚 This is one of many **machine learning how-to samples** inspired by **real climate solutions** aired on the [People and Planet AI 🎥 series](https://www.youtube.com/playlist?list=PLIivdWyY5sqI-llB35Dcb187ZG155Rs_7).

## 📒 Using this interactive notebook

Click the **run** icons ▶️ of each section within this notebook.

![Run cell](images/run-cell.png)

> 💡 Alternatively, you can run the currently selected cell with `Ctrl + Enter` (or `⌘ + Enter` on a Mac).

This **notebook code lets you train and deploy an ML model** end to end. When you run a code cell, the code runs in the notebook's runtime, so you're not making any changes to your personal computer.

> ⚠️ **To avoid any errors**, run the sections in order and wait for each one to finish before clicking the next “run” icon.

This sample must be connected to a **Google Cloud project**; nothing else is needed.

You can use an _existing project_. Alternatively, you can create a new Cloud project [with cloud credits for free.](https://cloud.google.com/free/docs/gcp-free-tier)

## 🛸 Steps summary

This notebook is friendly for _beginner_, _intermediate_, and _advanced_ users of geospatial analysis, data analytics, and machine learning.

Here's a quick summary of what you’ll go through:

1. **📚 Understand the data**:
  Go through what we want to achieve and explore the data we want to use as _inputs and outputs_ for our model.

1. **🗄 Create the datasets** _(~30 minutes, costs [a few cents](https://cloud.google.com/dataflow/pricing))_:
  Learn to use
  [Apache Beam](https://beam.apache.org/)
  to fetch data from
  [Earth Engine](https://earthengine.google.com/)
  in parallel, and create a dataset for our model in
  [Dataflow](https://cloud.google.com/dataflow).

1. **🧠 Train the model** _(~10 minutes, costs [a few cents](https://cloud.google.com/vertex-ai/pricing#custom-trained_models))_:
  Build a simple _Fully Convolutional Network_ in
  [PyTorch](https://pytorch.org/)
  and train it in
  [Vertex AI](https://cloud.google.com/vertex-ai/docs/training/custom-training)
  with the dataset we created.

1. **🔮 Model predictions**:
  Get predictions of data the model has never seen before.

  * 💻 **Local predictions** _(~5 minutes)_: Get predictions directly in this notebook without hosting the model.

  * ☁️ **Cloud Run predictions** _(~10 minutes, [cost covered by free tier](https://cloud.google.com/run/pricing))_: Host the model in [Cloud Run](https://cloud.google.com/run), an easy-to-use and scalable serverless web service.

1. (Optional) **🛠 Delete the project** to avoid ongoing costs.

# 🎬 Before you begin

First, we need to install the requirements for the sample, which is hosted on GitHub.

The sample contains a
[`requirements.txt`](requirements.txt)
file with all the dependencies we need to install, so let's use that.

In [1]:
repo_url = "https://raw.githubusercontent.com/davidcavazos/python-docs-samples/ppai-weather-forecasting/people-and-planet-ai/weather-forecasting"
!wget --quiet {repo_url}/requirements.txt

> 💡 For more information about the `requirements.txt` file, see the [`pip` user guide](https://pip.pypa.io/en/stable/user_guide/).

In [None]:
# Update pip to use the latest version.
!pip install --quiet --upgrade pip

In [None]:
# Install the dependencies.
!pip install --quiet -r requirements.txt shapely==1.8.5

> **🛑 Restart the runtime 🛑**

Colab already comes with many dependencies pre-loaded.
In order to ensure everything runs as expected, we **_must_ restart the runtime**. This allows Colab to load the latest versions of the libraries.

!["Runtime" > "Restart runtime"](images/restart-runtime.png)

In [4]:
# Restart the runtime by ending the process.
exit()

# ☁️ My Google Cloud resources

First, choose the Google Cloud [_location_](https://cloud.google.com/compute/docs/regions-zones) where you want to run this sample.

> ⚠️ Make sure you choose a location
> available for all products: [Cloud Storage](https://cloud.google.com/storage/docs/locations),
> [Vertex AI](https://cloud.google.com/vertex-ai/docs/general/locations),
> [Dataflow](https://cloud.google.com/dataflow/docs/resources/locations), and
> [Cloud Run](https://cloud.google.com/run/docs/locations).

> 💡 Prefer locations that are geographically closer to you with
> [low carbon emissions](https://cloud.google.com/sustainability/region-carbon), highlighted with the
> ![Leaf](https://cloud.google.com/sustainability/region-carbon/gleaf.svg) icon.

Make sure you have followed these steps to configure your Google Cloud project:

1. Enable the APIs: _Dataflow, Earth Engine, Vertex AI, and Cloud Run_

  <button>

  [Click here to enable the APIs](https://console.cloud.google.com/flows/enableapi?apiid=dataflow.googleapis.com,earthengine.googleapis.com,aiplatform.googleapis.com,run.googleapis.com)
  </button>

1. Create a Cloud Storage bucket in your desired _location_.

  <button>

  [Click here to create a new Cloud Storage bucket](https://console.cloud.google.com/storage/create-bucket)
  </button>

1. Register your
  [Compute Engine default service account](https://console.cloud.google.com/iam-admin/iam)
  on Earth Engine.

  <button>

  [Click here to register your service account on Earth Engine](https://signup.earthengine.google.com/#!/service_accounts)
  </button>

Once you have everything ready, you can go ahead and fill in your Google Cloud resources in the following code cell.
Make sure you run it!

In [None]:
from __future__ import annotations

import os
from google.colab import auth

# Please fill in these values.
project = "" #@param {type:"string"}
bucket = "" #@param {type:"string"}
location = "us-central1" #@param {type:"string"}

# Quick input validations.
assert project, "⚠️ Please provide a Google Cloud project ID"
assert bucket, "⚠️ Please provide a Cloud Storage bucket name"
assert not bucket.startswith('gs://'), f"⚠️ Please remove the gs:// prefix from the bucket name: {bucket}"
assert location, "⚠️ Please provide a Google Cloud location"

# Authenticate to Colab.
auth.authenticate_user()

# Set GOOGLE_CLOUD_PROJECT for google.auth.default().
os.environ['GOOGLE_CLOUD_PROJECT'] = project

# Set the gcloud project for other gcloud commands.
!gcloud config set project {project}

In [None]:
# Now let's get the code from GitHub and navigate to the sample.
!git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git
%cd python-docs-samples/people-and-planet-ai/weather-forecasting

Next, we have to authenticate Earth Engine and initialize it.
Since we've already authenticated to this [Colab](https://www.youtube.com/watch?v=rNgswRZ2C1Y) and saved our credentials as the [Google default credentials](https://google-auth.readthedocs.io/en/master/reference/google.auth.html#google.auth.default),
we can reuse those credentials for Earth Engine.

> 💡 Since we're making **large numbers of automated requests to Earth Engine**, we want to use the
> [high-volume endpoint](https://developers.google.com/earth-engine/cloud/highvolume).

In [7]:
import ee
import google.auth

credentials, _ = google.auth.default()
ee.Initialize(
    credentials.with_quota_project(None),
    project=project,
    opt_url="https://earthengine-highvolume.googleapis.com",
)

# 📚 Understand the data

The goal of our model is to use satellite images for _weather forecasting_.
Specifically, we want to predict the amount of rainfall, measured in millimeters per hour, two and six hours into the future.
This kind of short-term forecasting is called [weather _nowcasting_](https://en.wikipedia.org/wiki/Nowcasting_(meteorology)).

When working with satellite data, each image has the shape `(width, height, bands)`.
**Bands** contain _numeric values_ for each pixel in the image, like the measurements from specific satellite instruments for different ranges of the electromagnetic spectrum, or the probabilities of different classifications.
If you're familiar with image classification problems, you can think of the bands as similar to an image's RGB channels.

## ☔️ Precipitation

We use [NASA's Global Precipitation Measurement (GPM)](https://developers.google.com/earth-engine/datasets/catalog/NASA_GPM_L3_IMERG_V06) to get the amount of _precipitation_ (rain and snow), measured in millimeters per hour.
We're interested in the `precipitationCal` band, which gives us the _calibrated_ precipitation amount.

This is what we want to predict, so we use it for our _labels_.
But it's also useful for the model to look at past precipitation, so we also use it as an _input_.

In the [`serving/data.py`](serving/data.py) file, we defined a function called `get_gpm_sequence`, which returns an `ee.Image` with the precipitation values for the time sequence we give it.
Each time step is stored in a different band with the index as a prefix.
For example, the band corresponding to the first time step in the sequence would be `0_precipitationCal`, and the second time step would be `1_precipitationCal`.

In [None]:
from datetime import datetime
import folium
import ee
from serving.data import get_gpm_sequence

def gpm_layer(image: ee.Image, label: str, i: int) -> folium.TileLayer:
  vis_params = {
      "bands": [f"{i}_precipitationCal"],
      "min": 0.0,
      "max": 20.0,
      "palette": [
          '000096', '0064ff', '00b4ff', '33db80', '9beb4a',
          'ffeb00', 'ffb300', 'ff6400', 'eb1e00', 'af0000',
      ],
  }
  # Mask (hide) pixels with no precipitation to see the map below.
  image = image.mask(image.gt(0.1))
  return folium.TileLayer(
      name=f"[{label}] Precipitation",
      tiles=image.getMapId(vis_params)["tile_fetcher"].url_format,
      attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
      overlay = True,
  )

# Get the Earth Engine images.
dates = [datetime(2019, 9, 2, 18)]
image = get_gpm_sequence(dates)

# Show map.
map = folium.Map([25, -90], zoom_start=5)
for i, date in enumerate(dates):
  gpm_layer(image, str(date), i).add_to(map)
folium.LayerControl().add_to(map)
map

![Global Precipitation Measurement (GPM)](images/gpm.png)

> 💡 This is [Hurricane Dorian](https://en.wikipedia.org/wiki/Hurricane_Dorian), a Category 5 hurricane and the most intense tropical cyclone on record to strike the Bahamas.

## 🌨 Cloud and moisture

To predict precipitation, it also helps to look at _clouds_ and _moisture_.
We use data from [GOES-16 Cloud and Moisture Imagery](https://developers.google.com/earth-engine/datasets/catalog/NOAA_GOES_16_MCMIPF), captured by the first satellite of the GOES-R series of the [Geostationary Operational Environmental Satellites (GOES)](https://en.wikipedia.org/wiki/Geostationary_Operational_Environmental_Satellite) program, operated by [NASA](https://en.wikipedia.org/wiki/NASA) and [NOAA](https://en.wikipedia.org/wiki/National_Oceanic_and_Atmospheric_Administration).
It includes measurements from the _visible_, _near-infrared_, and _infrared_ spectrum.
It is a [geostationary](https://en.wikipedia.org/wiki/Geostationary_orbit) satellite, so its orbit is synchronized with the Earth's rotation, and it provides a view centered on the Americas.

In the [`serving/data.py`](serving/data.py) file, we defined a function called `get_goes16_sequence`, which returns an `ee.Image` with the cloud and moisture data for the time sequence we give it.

In [None]:
from datetime import datetime
import folium
import ee
from serving.data import get_goes16_sequence

def goes16_layer(image: ee.Image, label: str, i: int) -> folium.TileLayer:
  vis_params = {
      "bands": [f"{i}_CMI_C02", f"{i}_CMI_C03", f"{i}_CMI_C01"],
      "min": 0.0,
      "max": 3000.0,
  }
  return folium.TileLayer(
      name=f"[{label}] Cloud and moisture",
      tiles=image.getMapId(vis_params)["tile_fetcher"].url_format,
      attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
      overlay = True,
  )

# Get the Earth Engine image.
dates = [datetime(2019, 9, 2, 18)]
image = get_goes16_sequence(dates)

# Show map.
map = folium.Map([25, -90], zoom_start=5)
for i, date in enumerate(dates):
  goes16_layer(image, str(date), i).add_to(map)
folium.LayerControl().add_to(map)
map

![GOES 16](images/goes16.png)

## 🏔 Elevation

Elevation could also give the model useful information.
We use the [MERIT Terrain DEM](https://developers.google.com/earth-engine/datasets/catalog/MERIT_DEM_v1_0_3) dataset to get the elevation.

In the [`serving/data.py`](serving/data.py) file, we defined a function called `get_elevation`, which returns an `ee.Image` with the elevation measured in meters.

In [None]:
import folium
from serving.data import get_elevation

def elevation_layer() -> folium.TileLayer:
  image = get_elevation()
  vis_params = {
      "bands": ["elevation"],
      "min": 0.0,
      "max": 3000.0,
      "palette": ['000000', '478FCD', '86C58E', 'AFC35E', '8F7131', 'B78D4F', 'E2B8A6', 'FFFFFF']
  }
  return folium.TileLayer(
      name="Elevation",
      tiles=image.getMapId(vis_params)["tile_fetcher"].url_format,
      attr='Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>',
      overlay = True,
  )

# Show map.
map = folium.Map([25, -90], zoom_start=5)
elevation_layer().add_to(map)
folium.LayerControl().add_to(map)
map

![Elevation](images/elevation.png)

## 🛰 Inputs

In this example, we also consider multiple images across time, since weather forecasting is more accurate when we look at how the cloud cover changes over a period of time.
Specifically, we consider three data points: 4 hours prior, 2 hours prior, and the current time.
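As a quick illustration, the input time steps can be derived from a reference date and a list of hour offsets. The offsets `[-4, -2, 0]` below are an assumption that matches the three data points just described; the values the notebook actually uses live in `INPUT_HOUR_DELTAS` in `serving/data.py`.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Assumed hour offsets for the three input data points:
# 4 hours prior, 2 hours prior, and the current time.
input_hour_deltas = [-4, -2, 0]


def input_times(date: datetime, hour_deltas: list[int]) -> list[datetime]:
    """Returns the timestamp of each input time step relative to `date`."""
    return [date + timedelta(hours=h) for h in hour_deltas]


for i, t in enumerate(input_times(datetime(2019, 9, 2, 18), input_hour_deltas)):
    print(f"time step {i}: {t}")
```

The index `i` here is the same time-step index used as the band name prefix, such as `0_precipitationCal`.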

> 💡 To give the model a better picture, we chose to feed it with _at least three_ data points from the past.
> With only a single point, the model wouldn't know if the rain is increasing or decreasing.
> Two points would give it an idea of the trend.
> Three or more points would show whether that trend is speeding up or slowing down.
> The more points, the more it can see.
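To make that intuition concrete: with NumPy, the first difference of a sequence captures its trend, and the second difference captures whether that trend is accelerating. The precipitation values below are made up for illustration. The model never computes these differences explicitly; giving it the raw points lets it learn such relationships.

```python
import numpy as np

# Made-up precipitation values (mm/hr) at 4 hours prior, 2 hours prior, and now.
precipitation = np.array([1.0, 3.0, 7.0])

# Two points: the first difference says whether rain is increasing or decreasing.
trend = np.diff(precipitation)
# Three points: the second difference says whether that change is speeding up.
acceleration = np.diff(precipitation, n=2)

print(f"trend:        {trend}")
print(f"acceleration: {acceleration}")
```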

In the [`serving/data.py`](serving/data.py) file, we defined a function called `get_inputs_image`, which returns an `ee.Image` with bands for every time step of cloud and moisture and of precipitation, along with the elevation.
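The inputs end up with 52 bands, the last dimension of the `(64, 64, 52)` inputs shape shown later in this notebook. One breakdown that accounts for that number, assuming all 16 GOES-16 bands `CMI_C01` to `CMI_C16` plus one precipitation band are used at each of the three time steps, and a single elevation band, is sketched below. The band ordering here is illustrative; check `get_inputs_image` for the exact band list.

```python
from __future__ import annotations

# Assumptions: three input time steps, all 16 GOES-16 CMI bands per step,
# one precipitation band per step, and a single time-independent elevation band.
NUM_TIME_STEPS = 3
GOES16_BANDS = [f"CMI_C{c:02d}" for c in range(1, 17)]
GPM_BANDS = ["precipitationCal"]


def input_band_names(num_time_steps: int) -> list[str]:
    """Builds band names using the `{time_step}_{band}` prefix convention."""
    names = []
    for i in range(num_time_steps):
        names += [f"{i}_{band}" for band in GOES16_BANDS + GPM_BANDS]
    return names + ["elevation"]


names = input_band_names(NUM_TIME_STEPS)
print(len(names))  # prints 52, that is 3 * (16 + 1) + 1
print(names[:3], names[-2:])
```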

In [None]:
from datetime import datetime, timedelta
import folium
from serving.data import get_inputs_image, INPUT_HOUR_DELTAS

# Get the Earth Engine image.
date = datetime(2019, 9, 2, 18)
image = get_inputs_image(date)

# Show map.
map = folium.Map([25, -90], zoom_start=5)
elevation_layer().add_to(map)
for i, h in enumerate(INPUT_HOUR_DELTAS):
  label = str(date + timedelta(hours=h))
  goes16_layer(image, label, i).add_to(map)
  gpm_layer(image, label, i).add_to(map)
folium.LayerControl().add_to(map)
map

![Inputs](images/inputs.png)

> 💡 You can hide and show layers from the top-right corner widget to see all the inputs for the model.

## ✅ Labels

We chose to predict precipitation for 2 and 6 hours in the future, but it could be anything as long as we have the right _labels_.

In the [`serving/data.py`](serving/data.py) file, we defined a function called `get_labels_image`, which returns an `ee.Image` with bands for each time step of precipitation.

In [None]:
from datetime import datetime, timedelta
import folium
from serving.data import get_labels_image, OUTPUT_HOUR_DELTAS

# Get the Earth Engine image.
date = datetime(2019, 9, 3, 18)
image = get_labels_image(date)

# Show map.
map = folium.Map([25, -90], zoom_start=5)
for i, h in enumerate(OUTPUT_HOUR_DELTAS):
  label = str(date + timedelta(hours=h))
  gpm_layer(image, label, i).add_to(map)
folium.LayerControl().add_to(map)
map

![Labels](images/labels.png)

# 🗄 Create the dataset

Now that we know the _inputs_ and _outputs_ for the model, we can create a _dataset_ to train the model.
A dataset consists of _training examples_, which are `(inputs, labels)` pairs, so for each input we must provide the correct output values.

We want a _balanced_ dataset consisting of a representative, diverse, and unbiased selection of data points.
This way the model can learn from many different examples covering different seasons, times of day, regions, ecosystems, etc.

Let's take a closer look at how we select our training examples to create the dataset.

## 📌 Sample points

First, we want to get balanced points for a given time.
We use [`ee.Image.stratifiedSample`](https://developers.google.com/earth-engine/apidocs/ee-image-stratifiedsample) to select roughly the same number of points for each precipitation amount.
Also, most of the region we sample from lies at very low elevations, near sea level.
So it's important to select data points across different elevations in a balanced way.

Since the precipitation is a continuous value, we first need to convert it to a classification.
By looking at different images, we noticed that most values fall between 0 and 30.
So we clamped the values to that range, divided them by the maximum value, multiplied them by the number of bins, and converted them to integers.

We do a similar thing for the elevation, where we found empirically that most values fall between 0 and 6000.

Once we have bins for both precipitation and elevation, we combine them into a single "unique" bin value to make sure we get all the possible precipitation values for each elevation.
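Here is a minimal NumPy sketch of that binning recipe on plain arrays. The real code in [`create_dataset.py`](create_dataset.py) operates on `ee.Image` objects, and the way the two bins are combined below (`precipitation_bin * (num_bins + 1) + elevation_bin`) is an assumption, not necessarily what `sample_points` does.

```python
import numpy as np


def to_bins(values: np.ndarray, max_value: float, num_bins: int) -> np.ndarray:
    """Clamps to [0, max_value], scales to [0, num_bins], and truncates to ints.

    A value equal to max_value lands in bin `num_bins`, so this recipe
    produces num_bins + 1 distinct bin indices (0 through num_bins).
    """
    clamped = np.clip(values, 0.0, max_value)
    return (clamped / max_value * num_bins).astype(np.int64)


def unique_bins(precipitation, elevation, num_bins: int = 10) -> np.ndarray:
    """Combines precipitation and elevation bins into one integer per pixel."""
    precipitation_bins = to_bins(np.asarray(precipitation), 30.0, num_bins)
    elevation_bins = to_bins(np.asarray(elevation), 6000.0, num_bins)
    # Hypothetical combination: a stride of num_bins + 1 gives every
    # (precipitation bin, elevation bin) pair its own integer.
    return precipitation_bins * (num_bins + 1) + elevation_bins


print(unique_bins([0.0, 5.0, 15.0, 29.9, 45.0], [0.0, 100.0, 3000.0, 5999.0, 8000.0]))
```

The stride keeps, for example, "light rain at high elevation" and "heavy rain at sea level" in separate strata, so the stratified sampling can balance across both.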

In [`create_dataset.py`](create_dataset.py) we defined a function called `sample_points`, which yields a balanced selection of `(date, [longitude, latitude])` pairs for a given date.

In [203]:
from datetime import datetime
from create_dataset import sample_points

date = datetime(2019, 9, 2, 18)
for sample_date, point in sample_points(date):
  print(f"{sample_date} -- {point}")

2019-09-02 18:00:00 -- [-69.5525524841715, -39.82132539507417]
2019-09-02 18:00:00 -- [-71.4390145808225, 1.9503353164835744]
2019-09-02 18:00:00 -- [-52.12523597225278, -20.956704428564223]
2019-09-02 18:00:00 -- [-75.66109641618425, 34.11002248796244]
2019-09-02 18:00:00 -- [-37.662359897928496, 51.2678444146453]
2019-09-02 18:00:00 -- [-87.15953205291412, 5.902922566609462]
2019-09-02 18:00:00 -- [-70.27120471146712, 3.4774712994867656]
2019-09-02 18:00:00 -- [-45.208208284532475, -25.358449320749884]
2019-09-02 18:00:00 -- [-121.5650074346918, 8.058879248496325]
2019-09-02 18:00:00 -- [-127.7633828951165, 12.909781782741732]
2019-09-02 18:00:00 -- [-110.96488708208145, 53.96279026700387]
2019-09-02 18:00:00 -- [-50.957426102897415, -25.358449320749884]
2019-09-02 18:00:00 -- [-63.80333466580656, 2.399492958543334]
2019-09-02 18:00:00 -- [-50.957426102897415, -25.98727001963354]
2019-09-02 18:00:00 -- [-47.723491080067134, -22.034682769507654]
2019-09-02 18:00:00 -- [-71.61867763764

> 💡 We only bucketize the precipitation to select a balanced dataset, but we use the original continuous value for the labels.

## 📑 Get training examples

The next step is to get the data for our training examples.
Requests can fail with transient errors, such as when we send too many requests, so we use [`Retry`](https://googleapis.dev/python/google-api-core/latest/retry.html) to handle those cases.

We chose training examples that are 5 pixels wide by 5 pixels high, but we could choose any size as long as the model accepts it.
We also want all the training examples to be the same size so we can batch them.
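Why the same size matters: batching stacks examples along a new first axis, which only works when every example has the same shape. Here is a quick NumPy check using random arrays shaped like the 5x5, 52-band inputs.

```python
import numpy as np

# Hypothetical 5x5 patches with 52 input bands each (random values for illustration).
rng = np.random.default_rng(seed=0)
patches = [rng.random((5, 5, 52), dtype=np.float32) for _ in range(4)]

# Same-sized patches stack into a single batch array.
batch = np.stack(patches)
print(batch.shape)  # (4, 5, 5, 52)

# A patch with a different size cannot be stacked with the others.
try:
    np.stack(patches + [np.zeros((6, 6, 52), np.float32)])
except ValueError as error:
    print(f"ValueError: {error}")
```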

In [`create_dataset.py`](create_dataset.py) we defined `get_training_example`, which fetches an `(inputs, labels)` pair for the given date and `(longitude, latitude)` coordinate.
Let's see what a 64x64 patch looks like, since a 5x5 patch would only look like a bunch of random pixels to us.

In [20]:
from datetime import datetime
from create_dataset import get_training_example

date = datetime(2019, 9, 2, 18)
point = [-77.93, 25.23]  # [longitude, latitude]
(inputs, labels) = get_training_example(date, point, patch_size=64)

print(f"inputs : {inputs.dtype} {inputs.shape}")
print(f"labels : {labels.dtype} {labels.shape}")

inputs : float32 (64, 64, 52)
labels : float32 (64, 64, 2)


Let's see what the example inputs look like.

In [21]:
from visualize import show_inputs

show_inputs(inputs)

And these are the labels for that example, corresponding to 2 and 6 hours in the future from the example's time.

In [22]:
from visualize import show_outputs

show_outputs(labels)

> 💡 We chose _5x5 patches_ because our Fully Convolutional Network uses a _3x3 kernel_.
> We want a _balanced_ representation of precipitation, and we did the stratified sampling on the _center_ pixel only.
> By choosing 5x5 patches with a 3x3 kernel, we make sure the center pixel appears in all 9 positions of the kernel.
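We can check that claim by enumerating every placement of a 3x3 kernel over a 5x5 patch without padding (the `torch.nn.Conv2d` default) and recording where the center pixel falls inside the kernel window.

```python
PATCH_SIZE = 5
KERNEL_SIZE = 3
center = PATCH_SIZE // 2  # the center pixel is (2, 2)

positions = set()
num_windows = PATCH_SIZE - KERNEL_SIZE + 1  # 3 window offsets per axis
for top in range(num_windows):
    for left in range(num_windows):
        # Position of the center pixel inside this kernel window, if it's covered.
        row, col = center - top, center - left
        if 0 <= row < KERNEL_SIZE and 0 <= col < KERNEL_SIZE:
            positions.add((row, col))

print(len(positions))  # 9
```

With a larger patch, the center pixel would still be seen, but some windows would not contain it at all, so those outputs would not depend on the sampled pixel.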

## 📚 Write NumPy files

Finally, we need to write the training examples into files.
We chose [compressed NumPy files](https://numpy.org/doc/stable/reference/generated/numpy.savez_compressed.html) for simplicity.
We used Apache Beam [`FileSystems`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.filesystems.html) so we can write to any file system that Beam supports, including Cloud Storage.

Before writing the examples, we batch them to create files containing multiple examples, rather than a single file per example.
This reduces I/O operations when reading the dataset during training.
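As a sketch of the file format, assuming each file stores a batch as two arrays named `inputs` and `labels` (the key names `write_npz` actually uses may differ), a round trip through `np.savez_compressed` looks like this. An in-memory buffer stands in for a real file.

```python
import io

import numpy as np

# A batch of two hypothetical examples shaped like our 5x5 inputs and labels.
inputs_batch = np.zeros((2, 5, 5, 52), dtype=np.float32)
labels_batch = np.ones((2, 5, 5, 2), dtype=np.float32)

# Write both arrays into one compressed .npz "file".
buffer = io.BytesIO()
np.savez_compressed(buffer, inputs=inputs_batch, labels=labels_batch)

# Read them back by name.
buffer.seek(0)
with np.load(buffer) as npz:
    loaded_inputs = npz["inputs"]
    loaded_labels = npz["labels"]

print(loaded_inputs.shape, loaded_labels.shape)  # (2, 5, 5, 52) (2, 5, 5, 2)
```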

Here we create a batch from a single example, but our data creation pipeline creates larger batches.

In [25]:
from create_dataset import write_npz

batch = [(inputs, labels)]
write_npz(batch, 'data-small/')

!ls -lh data-small

total 412K
-rw-r--r-- 1 root root 412K Dec 18 21:25 87ce0eb3-e2e4-4c92-9858-72639c434b9e.npz


## 🗃 Create the dataset

Finally, we create an
[Apache Beam](https://beam.apache.org/) pipeline, which lets us process the data in parallel.

Let's see how to create a small dataset from a single date!

In [None]:
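from datetime import datetime

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

from create_dataset import get_training_example, sample_points, write_npz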

data_path = "data/"
dates = [datetime(2019, 9, 2, 18)]
beam_options = PipelineOptions([], direct_num_workers=20)
with beam.Pipeline(options=beam_options) as pipeline:
    (
        pipeline
        | "📆 Create dates" >> beam.Create(dates)
        | "📌 Sample points" >> beam.FlatMap(sample_points)
        | "🃏 Reshuffle" >> beam.Reshuffle()
        | "📑 Get example" >> beam.MapTuple(get_training_example)
        | "🗂️ Batch examples" >> beam.BatchElements()
        | "📚 Write NPZ files" >> beam.Map(write_npz, data_path)
    )

Now we can take a look at our data files.

In [None]:
!ls -lh {data_path}

# ☁️ Create the dataset in Dataflow

Local testing works great for creating small datasets and making sure everything works, but to run on a large dataset at scale it's best to use a distributed runner like
[Dataflow](https://cloud.google.com/dataflow).

We can run [`create_dataset.py`](create_dataset.py) as a script in [Dataflow](https://cloud.google.com/dataflow).
You can control the number of dates to sample with `--num-dates` _(default=50)_, the number of bins to use for the stratified sampling with `--num-bins` _(default=10)_, and the number of points per bin to sample with `--num-points` _(default=1)_.

In [None]:
!python create_dataset.py \
  --data-path="gs://{bucket}/weather/data" \
  --runner="DataflowRunner" \
  --project="{project}" \
  --region="{location}" \
  --temp_location="gs://{bucket}/weather/temp"

> 💡 Look at your Dataflow jobs: https://console.cloud.google.com/dataflow/jobs

# 🧠 Train the model

Now that we have our dataset, we're ready to train the model.

## 📖 Read the datasets

To read a dataset in PyTorch, we could write our own subclass of `torch.utils.data.Dataset`, but instead we use [Hugging Face 🤗 Datasets](https://huggingface.co/docs/datasets/main/en/index), which provides a high-level interface that makes datasets easier to work with.

Our data files are compressed NumPy files, which we can easily load with NumPy.
To load them into a 🤗 Dataset, we can use [`Dataset.from_dict`](https://huggingface.co/docs/datasets/main/en/loading#python-dictionary) and pass it a dictionary containing all the file names of our data files.
Then, we use [`Dataset.map`](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.map) to read the data files in parallel.
To split our dataset into training and testing/validation subsets, we use [`Dataset.train_test_split`](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset.train_test_split).

In [`trainer/task.py`](trainer/task.py) we defined the `read_dataset` function, which loads our data files and returns a 🤗 Dataset with train/test splits.

In [None]:
from trainer.task import read_dataset

data_path = 'data'
train_test_ratio = 0.9  # 90% train, 10% test

# Read the dataset with train/test splits.
dataset = read_dataset(data_path, train_test_ratio)

In [None]:
print(dataset)

> 💡 For more information on loading data into a 🤗 Dataset, refer to the [Loading data](https://huggingface.co/docs/datasets/main/en/loading) guide.

🤗 Datasets allow for random access just like PyTorch Datasets.

Let's see the shapes of the first training example from the `train` split.
When we access an example, we get an `{'inputs': list, 'labels': list}` dictionary, where each value is a [Python list](https://docs.python.org/3/library/stdtypes.html#list).
We can then convert them into [PyTorch tensors](https://pytorch.org/docs/stable/tensors.html) for further use.

In [15]:
import torch

train_dataset = dataset['train']
example = train_dataset[0]  # random access the first element

print(f"inputs: {torch.as_tensor(example['inputs']).shape}")
print(f"labels: {torch.as_tensor(example['labels']).shape}")

inputs: torch.Size([5, 5, 52])
labels: torch.Size([5, 5, 2])


The _inputs_ have the shape `(width, height, num_inputs)`, where each input is the value of an Earth Engine band.

The _outputs_ have the shape `(width, height, num_outputs)`, where each output is a prediction.
We're predicting for 2 and 6 hours into the future, so we get 2 outputs.

## 📓 Define the model

First we define our model, which is a very simple _Fully Convolutional Network_.
The input values can be very large, but machine learning models generally work best with small values, roughly between -1 and 1.
So in [`trainer/task.py`](trainer/task.py) we defined a `Normalization` layer that applies [z-score normalization](https://developers.google.com/machine-learning/data-prep/transform/normalization#z-score) to all the model's inputs as a first step.
To do that, it needs the [_mean_](https://en.wikipedia.org/wiki/Mean) and [_standard deviation_](https://en.wikipedia.org/wiki/Standard_deviation) of the training dataset.

A model always processes _batches_ of inputs, so we always get an extra _first_ dimension.
This means that for all the layers in the model, our inputs have the shape `(batch, width, height, num_inputs)`, and our outputs have the shape `(batch, width, height, num_outputs)`.

We need to calculate the mean and standard deviation for each input band, so each band is normalized within its own range.
Both the mean and standard deviation have the shape `(1, 1, 1, num_inputs)`, which allows them to _broadcast_ to any batch size, width, and height, as long as `num_inputs` matches.

In [16]:
import numpy as np

# Let's get the mean and standard deviation.
data = np.array(dataset['train']['inputs'], np.float32)
mean = data.mean(axis=(0, 1, 2))[None, None, None, :]
std = data.std(axis=(0, 1, 2))[None, None, None, :]

print(f"mean: {mean.shape}")
print(f"std:  {std.shape}")

mean: (1, 1, 1, 52)
std:  (1, 1, 1, 52)


Let's see how the normalization works for a sample of an example's inputs.

In [17]:
import torch
from trainer.task import Normalization

normalization = Normalization(mean, std)

sample = lambda x: x[0, 0, 0, 10:15].detach().numpy()

print(f"mean: {sample(normalization.mean)}")
print(f"std:  {sample(normalization.std)}")
print('-' * 40)

example = dataset['train'][0]
example_inputs = torch.as_tensor([example['inputs']])
normalized_inputs = normalization(example_inputs)
print(f"inputs:     {sample(example_inputs)}")
print(f"normalized: {sample(normalized_inputs)}")

mean: [2102.7695 2243.9602 2222.8613 2377.36   2597.1636]
std:  [230.2718  300.6387  314.30673 495.46133 337.46265]
----------------------------------------
inputs:     [2199. 2411. 2446. 2808. 2826.]
normalized: [0.4178995  0.5556164  0.7099392  0.86916953 0.67810893]


After applying the `Normalization` layer, we get small numbers much closer to the -1 to 1 range. They don't have to fall _exactly_ within that range; close enough is fine.
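As a sanity check, the printed values follow the z-score formula, where $x$ is an input value and $\mu$ and $\sigma$ are the mean and standard deviation of its band. For the first sampled value (band index 10):

$$
z = \frac{x - \mu}{\sigma},
\qquad
\frac{2199 - 2102.7695}{230.2718} = \frac{96.2305}{230.2718} \approx 0.4179
$$

This matches the first normalized value printed above.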

Another thing to note is that our data is in a channels-last format, like `(width, height, channels)`.
But PyTorch expects channels-first format in the convolutional layers, like `(channels, width, height)`.
For convenience, we still want to pass our inputs and get the predictions back in channels-last format, but we must convert them to channels-first for the PyTorch convolutional layers to work.

In [`trainer/task.py`](trainer/task.py) we define the `MoveDim` layer, which works similarly to [`torch.movedim`](https://pytorch.org/docs/stable/generated/torch.movedim.html), so the model can move the channels dimension as needed.


In [18]:
from trainer.task import MoveDim

# We move the channels/last dimension (-1) to the second index (1),
# since the first (0) is for the batch dimension.
to_channels_first = MoveDim(-1, 1)
channels_first = to_channels_first(normalized_inputs)

print(f"normalized:     {normalized_inputs.shape}")
print(f"channels-first: {channels_first.shape}")

normalized:     torch.Size([1, 5, 5, 52])
channels-first: torch.Size([1, 52, 5, 5])
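`MoveDim` is defined in `trainer/task.py` and isn't reproduced here. As a sketch of the same axis shuffle, NumPy's `np.moveaxis(a, source, destination)` behaves analogously to `torch.movedim`, and moving the axis back restores the original array exactly. The random array is only a placeholder.

```python
import numpy as np

# Placeholder channels-last batch: (batch, width, height, channels).
rng = np.random.default_rng(0)
channels_last = rng.normal(size=(1, 5, 5, 52)).astype(np.float32)

# Move the channels axis (-1) to index 1; index 0 stays the batch axis.
channels_first = np.moveaxis(channels_last, -1, 1)
print(channels_first.shape)  # (1, 52, 5, 5)

# Moving axis 1 back to the end restores the original layout.
restored = np.moveaxis(channels_first, 1, -1)
print(restored.shape)  # (1, 5, 5, 52)
```

Only the axis order changes: the value at pixel `(i, j)` in band `c` is the same number in both layouts.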


The model then passes the data through a
[2D Convolutional layer](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html) for downsampling, and then through a
[2D DeConvolutional (transposed convolution) layer](https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html) for upsampling, so we end up with images the same size as the input image.
We use a [`ReLU`](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html) activation function in between all hidden layers, since it's typically a good general-purpose activation function.

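The Conv2D and DeConv2D layers form a very simple Fully Convolutional Network architecture. Since both use the same _kernel size_ with a stride of 1 and no padding, the Conv2D layer shrinks each spatial dimension by `kernel_size - 1` (5 to 3) and the DeConv2D layer grows it back by the same amount (3 to 5), so the outputs have the same `(width, height)` as the inputs.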
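The shape arithmetic can be checked by hand. The sketch below uses the output-size formulas from the PyTorch `Conv2d` and `ConvTranspose2d` documentation for a single spatial dimension; the helper names are our own.

```python
def conv2d_out(size: int, kernel_size: int, stride: int = 1, padding: int = 0,
               dilation: int = 1) -> int:
    # Output size of torch.nn.Conv2d along one spatial dimension.
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose2d_out(size: int, kernel_size: int, stride: int = 1, padding: int = 0,
                         dilation: int = 1, output_padding: int = 0) -> int:
    # Output size of torch.nn.ConvTranspose2d along one spatial dimension.
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1


hidden = conv2d_out(5, kernel_size=3)                   # 5 -> 3
restored = conv_transpose2d_out(hidden, kernel_size=3)  # 3 -> 5
print(hidden, restored)  # 3 5

# The same holds for larger patches, e.g. 128 -> 126 -> 128.
print(conv_transpose2d_out(conv2d_out(128, 3), 3))  # 128
```

Because the two layers undo each other's size change for any input size, the same model works on the 5x5 training patches and on larger patches at prediction time.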

In [19]:
num_inputs = 52
num_hidden1 = 64
num_hidden2 = 128
kernel_size = (3, 3)

fully_convolutional_layers = torch.nn.Sequential(
    torch.nn.Conv2d(num_inputs, num_hidden1, kernel_size),
    torch.nn.ReLU(),
    torch.nn.ConvTranspose2d(num_hidden1, num_hidden2, kernel_size),
    torch.nn.ReLU(),
)

fcn_outputs = fully_convolutional_layers(channels_first)
print(f"FCN outputs: {fcn_outputs.shape}")

FCN outputs: torch.Size([1, 128, 5, 5])


Now, let's convert the results back into channels-last format with `MoveDim`.

In [20]:
to_channels_last = MoveDim(1, -1)
channels_last = to_channels_last(fcn_outputs)

print(f"channels-last: {channels_last.shape}")

channels-last: torch.Size([1, 5, 5, 128])


For the last layer, we use a [`Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) layer with the number of outputs we want. `Linear` operates on the last dimension, which is why we moved the channels back to the end.
Since we can't have negative precipitation, we pass the model's outputs through a final `ReLU` activation function.

In [21]:
num_outputs = 2

linear = torch.nn.Linear(num_hidden2, num_outputs)
relu = torch.nn.ReLU()

with torch.no_grad():
  raw_predictions = linear(channels_last)
  predictions = relu(raw_predictions)

print(f"predictions: {predictions.shape}")
print(predictions[0, 0, 0])

predictions: torch.Size([1, 5, 5, 2])
tensor([0.0105, 0.0494])
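Because the channels are back in the last position, `Linear` acts on each pixel independently. Here is a NumPy sketch of what the cell above computes, with made-up random weights shaped like those of `torch.nn.Linear(128, 2)` (weight `(out_features, in_features)`, bias `(out_features,)`).

```python
import numpy as np

rng = np.random.default_rng(0)
num_hidden2, num_outputs = 128, 2

# Made-up weights with the same shapes as torch.nn.Linear(128, 2).
weight = rng.normal(size=(num_outputs, num_hidden2))
bias = rng.normal(size=num_outputs)

features = rng.normal(size=(1, 5, 5, num_hidden2))  # channels-last FCN output

# A Linear layer computes x @ W^T + b over the last axis only,
# so every pixel gets its own 128 -> 2 projection.
raw = features @ weight.T + bias
predictions = np.maximum(raw, 0.0)  # ReLU clamps negative values to 0

print(predictions.shape)  # (1, 5, 5, 2)
```

This per-pixel projection is equivalent to a 1x1 convolution, so the whole network stays fully convolutional.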


In [`trainer/task.py`](trainer/task.py) we defined the `WeatherModel` and `WeatherConfig` classes.

The `WeatherModel` class inherits from [`PreTrainedModel`](https://huggingface.co/docs/transformers/main/en/main_classes/model) to make it compatible with [🤗 Transformers](https://huggingface.co/docs/transformers/main/en/index).

The model definition includes the loss function, so the model can measure how good or bad its predictions are.
We could use any regression loss function, like [Mean Absolute Error (L1)](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html) or [Mean Squared Error (L2)](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html).
PyTorch provides a [Smooth L1 Loss](https://pytorch.org/docs/stable/generated/torch.nn.SmoothL1Loss.html), which uses a squared (L2) term when the absolute error is below a threshold (`beta`, 1.0 by default) and an L1 term otherwise.
It's less sensitive to outliers than Mean Squared Error, so we'll use that.
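For reference, here is a NumPy sketch of the per-element Smooth L1 formula from the PyTorch docs, with the default `beta=1.0` and mean reduction, next to Mean Squared Error. It shows why a single large error pulls Smooth L1 up much less.

```python
import numpy as np

def smooth_l1(pred, target, beta: float = 1.0) -> float:
    diff = np.abs(np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64))
    # Quadratic (L2-like) for small errors, linear (L1-like) for large ones.
    per_element = np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta)
    return float(per_element.mean())

def mse(pred, target) -> float:
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float((diff**2).mean())

print(smooth_l1([0.5], [0.0]))  # 0.125 (quadratic region)
print(smooth_l1([3.0], [0.0]))  # 2.5 (linear region)

# One outlier among otherwise perfect predictions.
pred, target = [0.0, 0.0, 0.0, 10.0], [0.0, 0.0, 0.0, 0.0]
print(smooth_l1(pred, target), mse(pred, target))  # 2.375 25.0
```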

To create a `WeatherModel`, we have to pass it a `WeatherConfig`.
The `WeatherConfig` contains all the model's hyperparameters, and we must also pass the _mean_ and _standard deviation_ from the training dataset for the normalization layer.
We defined `WeatherModel.create`, which takes the training dataset inputs and returns a `WeatherModel` with the right `WeatherConfig`.

In [63]:
from trainer.task import WeatherModel

model = WeatherModel.create(dataset['train']['inputs'])
print(model)

WeatherModel(
  (loss): SmoothL1Loss()
  (model): Sequential(
    (0): Normalization()
    (1): MoveDim()
    (2): Conv2d(52, 64, kernel_size=(3, 3), stride=(1, 1))
    (3): ReLU()
    (4): ConvTranspose2d(64, 128, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU()
    (6): MoveDim()
    (7): Linear(in_features=128, out_features=2, bias=True)
    (8): ReLU()
  )
)


The model outputs a `{'loss': torch.Tensor, 'logits': torch.Tensor}` dictionary during training, and a `{'logits': torch.Tensor}` dictionary during predictions.
This is what 🤗 Transformers expects for [model outputs](https://huggingface.co/docs/transformers/main/en/main_classes/output).

Remember that we _must_ pass a _batch_ of inputs to the model, not a single input.
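Here is a toy stand-in (hypothetical, not the real `WeatherModel`) that follows the same conventions: it requires a batch axis, always returns `logits`, and adds `loss` only when labels are passed. Its predictions and loss are placeholders. It also shows `x[None, ...]` as one way to turn a single example into a batch of one, similar to wrapping it in a list with `torch.as_tensor([...])` as done earlier.

```python
import numpy as np

def toy_forward(inputs, labels=None) -> dict:
    """Hypothetical stand-in that mimics the output convention only."""
    inputs = np.asarray(inputs, dtype=np.float32)
    if inputs.ndim != 4:
        raise ValueError(f"expected (batch, width, height, channels), got {inputs.shape}")
    # Placeholder predictions: two outputs per pixel, all zeros.
    logits = np.zeros(inputs.shape[:3] + (2,), dtype=np.float32)
    if labels is None:
        return {"logits": logits}
    # Placeholder loss: mean absolute error against the labels.
    loss = float(np.abs(logits - np.asarray(labels, dtype=np.float32)).mean())
    return {"loss": loss, "logits": logits}

single_input = np.ones((5, 5, 52), dtype=np.float32)
batch = single_input[None, ...]  # (1, 5, 5, 52)

print(sorted(toy_forward(batch)))                         # ['logits']
print(sorted(toy_forward(batch, np.ones((1, 5, 5, 2)))))  # ['logits', 'loss']
```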

In [23]:
test_dataset = dataset['test']
inputs_batch = torch.as_tensor(test_dataset['inputs'][:1])
labels_batch = torch.as_tensor(test_dataset['labels'][:1])

# We pass the labels as well to get the loss, but it's optional.
# If we don't pass the labels, we simply won't get the loss.
# The predictions are under the 'logits' key.
with torch.no_grad():
  predictions = model(inputs_batch, labels_batch)

print(f"inputs:      {inputs_batch.shape}")
print(f"labels:      {labels_batch.shape}")
print(f"loss:        {predictions['loss']}")
print(f"predictions: {predictions['logits'].shape}")
print("-" * 40)
print(f"sample labels:      {labels_batch[0, 0, 0]}")
print(f"sample predictions: {predictions['logits'][0, 0, 0]}")

inputs:      torch.Size([1, 5, 5, 52])
labels:      torch.Size([1, 5, 5, 2])
loss:        6.588315486907959
predictions: torch.Size([1, 5, 5, 2])
----------------------------------------
sample labels:      tensor([6.9336, 5.3004])
sample predictions: tensor([0.0000, 0.0372])


These predictions don't look great because we haven't trained the model yet.
Fortunately, since we've made our model compatible with 🤗 Transformers, we can simply use [`Trainer`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer), which takes care of the whole training loop (batching, optimization, and evaluation) and uses accelerators like GPUs if available.

## 👟 Train the model

We have to define the number of times we want the model to go through the training dataset; this is called the number of _epochs_.
We also have to define the _batch size_ to use during training and testing. This can have a big impact on how fast the model trains; as a rule of thumb, the larger the better, as long as it fits into memory.
We define all these parameters with [`TrainingArguments`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments).

Then we pass the model, the `TrainingArguments`, and the training and testing datasets into the `Trainer`.
Finally, we can train the model with [`Trainer.train`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.train).
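To get a feel for how `epochs` and `batch_size` translate into training work, here is a quick sketch of the optimizer-step count. It assumes a single device, no gradient accumulation, and that the last partial batch is kept (the `Trainer` default, `dataloader_drop_last=False`); the 10,000-example dataset size is made up.

```python
import math

def training_steps(num_examples: int, batch_size: int, epochs: int, num_devices: int = 1) -> int:
    # Each step consumes batch_size examples per device;
    # a final partial batch still counts as a step.
    steps_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    return steps_per_epoch * epochs

print(training_steps(10_000, batch_size=512, epochs=5))  # 100
print(training_steps(10_000, batch_size=64, epochs=5))   # 785
```

Fewer, larger batches mean fewer steps per epoch, which is one reason larger batch sizes tend to train faster when they fit in memory.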

In [64]:
from trainer.task import WeatherModel

model = WeatherModel.create(dataset['train']['inputs'])
model

WeatherModel(
  (loss): SmoothL1Loss()
  (model): Sequential(
    (0): Normalization()
    (1): MoveDim()
    (2): Conv2d(52, 64, kernel_size=(3, 3), stride=(1, 1))
    (3): ReLU()
    (4): ConvTranspose2d(64, 128, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU()
    (6): MoveDim()
    (7): Linear(in_features=128, out_features=2, bias=True)
    (8): ReLU()
  )
)

In [None]:
import os
from transformers import TrainingArguments, Trainer

epochs = 5
batch_size = 512

# Define our training job.
training_args = TrainingArguments(
    output_dir=os.path.join(model_path, "outputs"),
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=epochs,
    logging_strategy="epoch",
    evaluation_strategy="epoch",
)
trainer = Trainer(
    model,
    training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)

# Run the training job.
trainer.train()

> 💡 Both the training and evaluation losses should go down every epoch, and they should be roughly similar.
> If the training loss goes down, but the testing loss stays flat or goes up, it might be a sign that the model is [overfitting](https://developers.google.com/machine-learning/crash-course/generalization/peril-of-overfitting), meaning that it's memorizing the training dataset instead of learning to generalize.
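One rough way to automate that check, assuming you've collected per-epoch losses (for example from the `loss` and `eval_loss` entries in `trainer.state.log_history`), is the heuristic below. The function name and the `window` rule are our own, not part of 🤗 Transformers, and the loss values are made up.

```python
def looks_like_overfitting(train_losses: list[float], eval_losses: list[float],
                           window: int = 3) -> bool:
    """Flag when training loss keeps dropping but evaluation loss has not
    improved over the last `window` epochs. A rough heuristic, not a formal test."""
    if len(train_losses) < window + 1 or len(eval_losses) < window + 1:
        return False
    train_improving = train_losses[-1] < train_losses[-window - 1]
    eval_stalled = min(eval_losses[-window:]) >= min(eval_losses[:-window])
    return train_improving and eval_stalled

train = [1.0, 0.8, 0.6, 0.5, 0.4]
print(looks_like_overfitting(train, [1.0, 0.9, 0.95, 0.97, 1.0]))  # True
print(looks_like_overfitting(train, [1.0, 0.85, 0.7, 0.6, 0.55]))  # False
```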

## 💾 Save and load the model

After the model has finished training, we can save it with [`Trainer.save_model`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.Trainer.save_model).



In [None]:
trainer.save_model("trained-model")

!ls -lh trained-model

Saving model checkpoint to trained-model
Configuration saved in trained-model/config.json
Model weights saved in trained-model/pytorch_model.bin


total 420K
-rw-r--r-- 1 root root 3.3K Dec 15 00:01 config.json
-rw-r--r-- 1 root root 410K Dec 15 00:01 pytorch_model.bin
-rw-r--r-- 1 root root 3.4K Dec 15 00:01 training_args.bin


Now that the trained model is saved, we can load it anywhere else.
We can load a 🤗 Transformers model with [`PreTrainedModel.from_pretrained`](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained), in our case with `WeatherModel.from_pretrained`.
This loads all the model's hyperparameters as well as the _mean_ and _standard deviation_ for the normalization layer.
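Since 🤗 Transformers stores a model's configuration as JSON in `config.json`, array values like the normalization statistics have to be stored as plain lists. How `WeatherConfig` does this is defined in `trainer/task.py`; this NumPy sketch, with made-up values, just shows that a `(1, 1, 1, 52)` array survives a round trip through nested lists and JSON unchanged.

```python
import json
import numpy as np

rng = np.random.default_rng(0)
mean = rng.normal(2000.0, 300.0, size=(1, 1, 1, 52)).astype(np.float32)  # made-up statistics

# .tolist() turns the array into nested Python lists of floats that JSON can encode.
config_text = json.dumps({"mean": mean.tolist()})

# Loading reverses the process; the nesting preserves the original shape.
restored = np.array(json.loads(config_text)["mean"], dtype=np.float32)
print(restored.shape)  # (1, 1, 1, 52)
```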

In [None]:
from trainer.task import WeatherModel

model = WeatherModel.from_pretrained("trained-model")
print(model)

WeatherModel(
  (loss): SmoothL1Loss()
  (model): Sequential(
    (0): Normalization()
    (1): MoveDim()
    (2): Conv2d(52, 64, kernel_size=(3, 3), stride=(1, 1))
    (3): ReLU()
    (4): ConvTranspose2d(64, 128, kernel_size=(3, 3), stride=(1, 1))
    (5): ReLU()
    (6): MoveDim()
    (7): Linear(in_features=128, out_features=2, bias=True)
    (8): ReLU()
  )
)


# ☁️ Train the model in Vertex AI

For this example, we're training on a very small dataset for a very small number of epochs.
This means we don't have a representative number of examples and the model hasn't seen the data enough times, so it won't perform very well.

Training on larger datasets for many epochs can take a lot of time, so it might be a good idea to do the training in the cloud.
[Vertex AI](https://cloud.google.com/vertex-ai) is a great option, and it even allows us to use hardware accelerators like GPUs.
There are [PyTorch pre-built containers](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers#pytorch) that include PyTorch and many common libraries, so we don't need to build a custom container.

In [None]:
from google.cloud import aiplatform

epochs = 5000

aiplatform.init(project=project, location=location, staging_bucket=bucket)

with open('trainer/requirements.txt') as f:
  requirements = [line.strip() for line in f.readlines()]

job = aiplatform.CustomTrainingJob(
    display_name="weather",
    script_path="trainer/task.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/pytorch-gpu.1-11:latest",
    requirements=requirements,
)

job.run(
    machine_type='n1-highmem-8',
    accelerator_type='NVIDIA_TESLA_T4',
    accelerator_count=1,
    args=[
        f"--data-path=/gcs/{bucket}/weather/data",
        f"--model-path=/gcs/{bucket}/weather/model",
        f"--epochs={epochs}",
    ],
)
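The cell above passes every line of `trainer/requirements.txt` straight to `requirements`, including any blank lines. A slightly more defensive reader, sketched below, skips blank lines and full-line comments; the file contents in the example are hypothetical.

```python
import io
from typing import Iterable

def read_requirements(lines: Iterable[str]) -> list[str]:
    requirements = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and full-line comments.
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements

# Hypothetical requirements file contents.
example_file = io.StringIO("transformers\n\n# plotting\nplotly\n")
print(read_requirements(example_file))  # ['transformers', 'plotly']
```

In the notebook it would be used as `with open('trainer/requirements.txt') as f: requirements = read_requirements(f)`.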

> 💡 Look at your Vertex AI training jobs: https://console.cloud.google.com/vertex-ai/training/custom-jobs

# 🔮 Make predictions

Now we are ready to run the model on some new data and get some predictions.

First, we get the input data for the model.
We get the labels as well, just to compare our model's predictions with what the real precipitation actually was.

In [81]:
from datetime import datetime
from serving.data import get_inputs_patch, get_labels_patch

date = datetime(2019, 9, 2, 18)
point = [-78.322, 25.507]  # [longitude, latitude]
patch_size = 128

inputs = get_inputs_patch(date, point, patch_size)
labels = get_labels_patch(date, point, patch_size)

print(f"inputs : {inputs.dtype} {inputs.shape}")
print(f"labels : {labels.dtype} {labels.shape}")

inputs : float32 (128, 128, 52)
labels : float32 (128, 128, 2)


Here's what the input data looks like.

In [95]:
from visualize import show_inputs

show_inputs(inputs)

Next, we load our model.
If we want to use the model we trained in Vertex AI, we first have to copy it locally.
Otherwise, we've provided a pre-trained model in the [`model/`](model/) directory.
We trained it for a large number of epochs on a really large dataset, so you don't have to 🙂.

> 💡 Uncomment and run the following cell to use the model you trained in Vertex AI.

In [None]:
# Uncomment to copy the Vertex AI model
# model_path = "model-vertex"
# !mkdir -p {model_path}
# !gsutil cp gs://{bucket}/weather/model/* {model_path}/

In [30]:
!rm -rf model-vertex
!mkdir model-vertex
!gsutil cp gs://{bucket}/weather/model/data-300k-10/* model-vertex/
# !gsutil cp gs://{bucket}/weather/model/* model/

Copying gs://dcavazos-lyra/weather/model/data-300k-10/config.json...
Copying gs://dcavazos-lyra/weather/model/data-300k-10/pytorch_model.bin...
Copying gs://dcavazos-lyra/weather/model/data-300k-10/training_args.bin...
Omitting prefix "gs://dcavazos-lyra/weather/model/data-300k-10/outputs/". (Did you mean to do cp -r?)

Operation completed over 3 objects/416.1 KiB.                                    


In [101]:
from trainer.task import WeatherModel
from visualize import show_outputs

model = WeatherModel.from_pretrained("model-vertex")
predictions = model.predict(inputs.tolist())

show_outputs(predictions)

In [99]:
show_outputs(labels)

In [None]:
from datetime import datetime
import multiprocessing

# Each entry is [timestamp, longitude, latitude, description].
test_data = [
    ['2021-12-23T17:00:00', -118.0419, 40.7886, 'light rain'],
    ['2021-12-15T17:30:00',  -76.2375, 43.1643, 'medium rain'],
    ['2021-12-23T17:00:00', -119.6679, 36.8820, 'heavy rain'],
    ['2021-12-23T17:00:00', -112.0310, 36.2788, 'just cloudy'],
    ['2021-09-23T17:00:00', -119.9851, 39.6894, 'clear'],
    ['2021-11-15T17:30:00', -123.7218, 45.6445, 'medium rain'],
    ['2021-12-23T17:00:00', -115.7004, 35.3243, 'light rain'],
]

def get_predictions(to_predict: list) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
  timestamp, longitude, latitude, title = to_predict
  date = datetime.fromisoformat(timestamp)
  point = [longitude, latitude]
  patch_size = 64

  inputs = get_inputs_patch(date, point, patch_size)
  predictions = model.predict(inputs.tolist())
  labels = get_labels_patch(date, point, patch_size)

  return (inputs, predictions, labels)

# Fetch the data and run the predictions for all locations in parallel.
# The context manager makes sure the worker processes are cleaned up.
with multiprocessing.Pool(processes=len(test_data)) as pool:
  results = pool.map(get_predictions, test_data)

In [153]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from visualize import render_gpm, render_goes16, render_elevation

def show_predictions(results: list[tuple[np.ndarray, np.ndarray, np.ndarray]]) -> None:
    fig = make_subplots(rows=5, cols=len(results), vertical_spacing=0.025)
    for i, (inputs, predictions, labels) in enumerate(results, start=1):
        # Row 1: GOES-16 satellite imagery from the inputs.
        fig.add_trace(go.Image(z=render_goes16(inputs[:, :, 35:51])), row=1, col=i)
        # Row 2: GPM precipitation from the inputs.
        fig.add_trace(go.Image(z=render_gpm(inputs[:, :, 2:3])), row=2, col=i)
        # Row 3: elevation.
        fig.add_trace(go.Image(z=render_elevation(inputs[:, :, 51:52])), row=3, col=i)
        # Row 4: the model's predicted precipitation.
        fig.add_trace(go.Image(z=render_gpm(predictions[:, :, 0:1])), row=4, col=i)
        # Row 5: the actual precipitation (labels).
        fig.add_trace(go.Image(z=render_gpm(labels[:, :, 0:1])), row=5, col=i)
    fig.update_layout(
        height=5 * int(1000 / len(results)),
        margin=dict(l=0, r=0, b=0, t=0),
    )
    fig.show()

show_predictions(results)

# 5. ⛵ Further exploration

This notebook demonstrated a simple model to start exploring the problem of weather forecasting using deep neural networks. The model has only about 100k parameters and just two convolutional layers (a `Conv2d` and a `ConvTranspose2d`) to keep training time short. Even so, the model is able to distinguish cloud patterns well enough for broad rain vs. no-rain detection.
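The parameter count follows directly from the layer shapes printed earlier. This pure-Python sketch counts weights and biases per layer (the helper names are our own). On the real model, `sum(p.numel() for p in model.parameters())` gives the authoritative number, which could differ slightly depending on whether `Normalization` registers its mean and standard deviation as parameters.

```python
def conv2d_params(in_ch: int, out_ch: int, k: int) -> int:
    # weight: (out_ch, in_ch, k, k) plus one bias per output channel
    return out_ch * in_ch * k * k + out_ch

def conv_transpose2d_params(in_ch: int, out_ch: int, k: int) -> int:
    # weight: (in_ch, out_ch, k, k) plus one bias per output channel
    return in_ch * out_ch * k * k + out_ch

def linear_params(in_features: int, out_features: int) -> int:
    # weight: (out_features, in_features) plus one bias per output
    return out_features * in_features + out_features

counts = {
    "Conv2d(52, 64, 3)": conv2d_params(52, 64, 3),
    "ConvTranspose2d(64, 128, 3)": conv_transpose2d_params(64, 128, 3),
    "Linear(128, 2)": linear_params(128, 2),
}
for name, n in counts.items():
    print(f"{name:30} {n:>7,}")
print(f"{'total':30} {sum(counts.values()):>7,}")  # 104,130
```

At 4 bytes per `float32` value, that's about 407 KiB, in line with the roughly 410K `pytorch_model.bin` listed after saving the model.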

There has been a lot of interesting research work on weather nowcasting recently, especially with [U-Net](https://en.wikipedia.org/wiki/U-Net) style model architectures. If you are interested in diving deeper, here are some articles from Google Research:

*  [Google Research blog on nowcasting](https://ai.googleblog.com/2021/11/metnet-2-deep-learning-for-12-hour.html)
*  [MetNet paper](https://arxiv.org/abs/2003.12140)

# 6. 🧹 Clean Up

To **avoid incurring charges** to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

## Deleting the project

The **easiest** way to **eliminate billing** is to delete the project that you created for the tutorial.

To delete the project:

> ⚠️ Deleting a project has the following effects:
>
> * **Everything in the project is deleted.** If you used an existing project for this tutorial, when you delete it, you also delete any other work you've done in the project.
>
> * **Custom project IDs are lost.** When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.
>
> If you plan to explore multiple tutorials and quickstarts, **reusing** projects can help you avoid exceeding project **quota limits**.

1. In the Cloud Console, go to the **Manage resources** page.

  <button>

  [Go to Manage resources](https://console.cloud.google.com/iam-admin/projects)

  </button>

1. In the project list, select the project that you want to delete, and then click **Delete**.

1. In the dialog, type the project ID, and then click **Shut down** to delete the project.