In [None]:
#@title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the "License")

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Wildfire spread with TensorFlow and Vertex AI

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GoogleCloudPlatform/python-docs-samples/blob/main/people-and-planet-ai/land-cover-classification/README.ipynb)

In 2021, wildfires destroyed [7 million acres of wildland](https://www.ncei.noaa.gov/access/monitoring/monthly-report/fire/202113)--roughly the same area as the state of Massachusetts. These wildfires destroyed homes, towns, and people's lives.

<img src="https://media.cnn.com/api/v1/images/stellar/prod/200908110238-07-wildfires-0907-malden-wa.jpg?q=x_17,y_443,h_876,w_1556,c_crop/h_720,w_1280"/>
<caption><i>Figure. The 2020 Babb Road wildfire destroying a home in Malden, WA</i></caption>

For a wildfire to catch hold and spread, a set of conditions must exist in an environment. These conditions have been measured and recorded in multiple sources--sources that are available in Earth Engine. Imagine if you could build an ML model that predicts the likelihood and spread of wildfires!

This notebook

+ ⏲️ Time estimate: TT hours
+ 💰 Cost estimate: Around \$DD USD (free if you use \$300 Cloud credits)

💚 This is one of many machine learning how-to samples inspired by real climate solutions aired on the People and Planet AI 🎥 series.

In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 📒 Using this interactive notebook

Click the **run** icons ▶️ of each section within this notebook.

![Run cell](data/images/run-cell.png)

> 💡 Alternatively, you can run the currently selected cell with `Ctrl + Enter` (or `⌘ + Enter` on a Mac).

This **notebook code lets you train and deploy an ML model** from end-to-end. When you run a code cell, the code runs in the notebook's runtime, so you're not making any changes to your personal computer.

> ⚠️ **To avoid any errors**, run the sections in order and wait for each one to finish before clicking the next “run” icon.

This sample must be connected to a **Google Cloud project**; that is the only prerequisite.

You can use an existing project or you can create a new Cloud project [with cloud credits for free.](https://cloud.google.com/free/docs/gcp-free-tier)

## 🚴‍♀️ Steps summary

This notebook is friendly for _beginner_, _intermediate_, and _advanced_ users of geospatial, data analytics and machine learning.
**No prior experience is needed** to dive in.

Here's a quick summary of what you’ll go through:

1. **📚 Understand the data**:
  Go through what we want to achieve and explore the data we want to use as _inputs and outputs_ for our model.

1. **🗂 Create the datasets**:
  Discover the shape of the datasets and how they will be transformed for use in training.

1. **🧠 Train the model**:
  Use [Keras and TensorFlow](https://keras.io/about/) to train a model on [Vertex AI](https://cloud.google.com/vertex-ai/docs/).

1. **🔮 Get inferences from the model**:
  Use your new ML model to predict the spread of wildfires.

1. (Optional) **🛠 Delete the project** to avoid ongoing costs.


## 🎬 Before you begin

We first need to install all the requirements for this notebook.

In [7]:
%%writefile requirements.txt
apache-beam[gcp]==2.41.0
earthengine-api==0.1.324
folium==0.12.1.post1
plotly==5.10.0

Writing requirements.txt


In [8]:
%%writefile constraints.txt
cachetools==4.2.4 # apache-beam requires cachetools<5
fastavro==1.5.4
fasteners==0.17.3
google-api-python-client==1.12.11 # earthengine-api requires google-api-python-client<2
google-apitools==0.5.31 # apache-beam requires google-apitools<0.5.32
google-auth-httplib2==0.1.0
google-auth==1.35.0 # earthengine-api requires google-api-python-client<2
google-cloud-bigquery-storage==2.13.2 # apache-beam requires google-cloud-bigquery-storage<2.14
google-cloud-bigquery==2.34.4 # apache-beam requires google-cloud-bigquery<3
google-cloud-bigtable==1.7.2 # apache-beam requires google-cloud-bigtable<2
google-cloud-core==2.3.2
google-cloud-datastore==1.15.5 # apache-beam requires google-cloud-datastore<2
google-cloud-dlp==3.8.0
google-cloud-language==1.3.2 # apache-beam requires google-cloud-language<2
google-cloud-pubsub==2.13.5
google-cloud-pubsublite==1.4.2
google-cloud-storage==2.5.0
google-cloud-spanner==1.19.3 # apache-beam requires google-cloud-spanner<2
google-cloud-resource-manager==1.6.1
google-cloud-recommendations-ai==0.7.1

Writing constraints.txt


In [9]:
!pip --quiet install --upgrade pip
!pip --quiet install -r requirements.txt -c constraints.txt google-cloud-aiplatform

# Restart the runtime by ending the runtime's process
exit()


## ⚠️ Restart the runtime

Colab already comes with many dependencies pre-loaded.
In order to ensure everything runs as expected, we have to **restart the runtime**. This allows Colab to load the latest versions of the libraries.

!["Runtime" > "Restart runtime"](data/images/restart-runtime.png)

# ☁️ My Google Cloud resources

First, choose the Google Cloud _location_ where you want to run this sample.
See the list of [Google Cloud locations](https://cloud.google.com/compute/docs/regions-zones) for the available regions.

> ⚠️ Make sure you choose a location
> available for all products: [Cloud Storage](https://cloud.google.com/storage/docs/locations),
> [Vertex AI](https://cloud.google.com/vertex-ai/docs/general/locations),
> [Dataflow](https://cloud.google.com/dataflow/docs/resources/locations), and
> [Cloud Run](https://cloud.google.com/run/docs/locations).

> 💡 Prefer locations that are geographically closer to you with
> [low carbon emissions](https://cloud.google.com/sustainability/region-carbon), highlighted with the
> ![Leaf](https://cloud.google.com/sustainability/region-carbon/gleaf.svg) icon.

Make sure you have followed these steps to configure your Google Cloud project:

1. Enable the APIs: _Dataflow, Earth Engine, Vertex AI, and Cloud Run_

  <button>

  [Click here to enable the APIs](https://console.cloud.google.com/flows/enableapi?apiid=dataflow.googleapis.com,earthengine.googleapis.com,aiplatform.googleapis.com,run.googleapis.com)
  </button>

1. Create a Cloud Storage bucket in your desired _location_.

  <button>

  [Click here to create a new Cloud Storage bucket](https://console.cloud.google.com/storage/create-bucket)
  </button>

1. Register your
  [Compute Engine default service account](https://console.cloud.google.com/iam-admin/iam)
  on Earth Engine.

  <button>

  [Click here to register your service account on Earth Engine](https://signup.earthengine.google.com/#!/service_accounts)
  </button>

Once you have everything ready, you can go ahead and fill in your Google Cloud resources in the following code cell.
Make sure you run it!

In [1]:
import os
from google.colab import auth

auth.authenticate_user()

# Please fill in these values.
project = "video-erschmid" #@param {type:"string"}
bucket = "erschmid-wildfires" #@param {type:"string"}
location = "us-west1" #@param {type:"string"}

# Load values from environment variables if available.
project = os.environ.get("GOOGLE_CLOUD_PROJECT", project)
bucket = os.environ.get("CLOUD_STORAGE_BUCKET", bucket)
location = os.environ.get("CLOUD_LOCATION", location)

# Quick input validations.
assert project, "⚠️ Please provide a Google Cloud project ID"
assert bucket, "⚠️ Please provide a Cloud Storage bucket name"
assert not bucket.startswith('gs://'), f"⚠️ Please remove the gs:// prefix from the bucket name: {bucket}"
assert location, "⚠️ Please provide a Google Cloud location"

# Configure gcloud.
!gcloud config set project {project}

Updated property [core/project].


Next, we have to authenticate Earth Engine and initialize it.
Since we've already authenticated to this [Colab](https://www.youtube.com/watch?v=rNgswRZ2C1Y) and saved our credentials as the [Google default credentials](https://google-auth.readthedocs.io/en/master/reference/google.auth.html#google.auth.default),
we can reuse those credentials for Earth Engine.

> 💡 Since we're making **large amounts of automated requests to Earth Engine**, we want to use the
[high-volume endpoint](https://developers.google.com/earth-engine/cloud/highvolume).

In [2]:
import ee
import google.auth

def ee_init() -> None:
    """Authenticate and initialize Earth Engine with the default credentials."""
    # Use the Earth Engine High Volume endpoint.
    #   https://developers.google.com/earth-engine/cloud/highvolume
    credentials, _ = google.auth.default(
        scopes=[
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/earthengine",
        ]
    )
    ee.Initialize(
        credentials,
        project=project,
        opt_url="https://earthengine-highvolume.googleapis.com",
    )

In [3]:
ee_init()

# 📚 Understand the data

Before we begin, let's consider what we want to achieve and the datasets we chose for that purpose.

## 🎯 **Goal**: Time series forecasting and image segmentation

The goal of our model is to use satellite images to analyze the likelihood and potential spread of wildfires for a given geographical region. The output layer will combine a time series forecast (likelihood of fire to spread) and a classification (on fire or not on fire).

## 🛰 Inputs: Satellite images

To achieve our goal, we must combine multiple geographical datasets into a single dataset (or map in this case). Each input--also known as "features" or "independent variables"--will be stored as a single band within the resulting map. The following list shows the datasets used for this example:

* **USGS/SRTMGL1_003**: NASA SRTM Digital Elevation 30m
* **GRIDMET/DROUGHT**: CONUS Drought Indices
* **ECMWF/ERA5/DAILY**: Daily Aggregates - Latest Climate Reanalysis Produced by ECMWF / Copernicus Climate Change Service
* **IDAHO_EPSCOR/GRIDMET**: University of Idaho Gridded Surface Meteorological Dataset
* **CIESIN/GPWv411/GPW_Population_Density**: Population Density (Gridded Population of the World Version 4.11)

The following table shows the model input variables, their source datasets, and the variable name used for each one in our model.

| Feature | Original Source | Variable name |
| --------|:----------------|:--------------|
| Elevation | `USGS/SRTMGL1_003` | `elevation` |
| Palmer Drought Severity Index | `GRIDMET/DROUGHT` | `pdsi` |
| Avg air temperature at 2m height | `ECMWF/ERA5/DAILY` | `mean_2m_air_temperature` |
| Total precipitation | `ECMWF/ERA5/DAILY` | `total_precipitation` |
| 10m u-component of wind (daily avg) | `ECMWF/ERA5/DAILY` | `u_component_of_wind_10m` |
| 10m v-component of wind (daily avg) | `ECMWF/ERA5/DAILY` | `v_component_of_wind_10m` |
| Precipitation amount | `IDAHO_EPSCOR/GRIDMET` | `pr` |
| Specific humidity | `IDAHO_EPSCOR/GRIDMET` | `sph` |
| Wind direction | `IDAHO_EPSCOR/GRIDMET` | `th` |
| Minimum temperature | `IDAHO_EPSCOR/GRIDMET` | `tmmn` |
| Maximum temperature | `IDAHO_EPSCOR/GRIDMET` | `tmmx` |
| Wind velocity at 10m | `IDAHO_EPSCOR/GRIDMET` | `vs` |
| Energy release component | `IDAHO_EPSCOR/GRIDMET` | `erc` |
| Population density (per square km) | `CIESIN/GPWv411/GPW_Population_Density` | `population_density` |




In [7]:
INPUTS = {
    'USGS/SRTMGL1_003': ["elevation"],
    'GRIDMET/DROUGHT': ["pdsi"],  # Palmer Drought Severity Index
    'ECMWF/ERA5/DAILY': [
         'mean_2m_air_temperature',
         'total_precipitation',
         'u_component_of_wind_10m',
         'v_component_of_wind_10m'],
    'IDAHO_EPSCOR/GRIDMET': [
         'pr',
         'sph',
         'th',
         'tmmn',
         'tmmx',
         'vs',
         'erc'],
    'CIESIN/GPWv411/GPW_Population_Density': ['population_density'],
    'MODIS/006/MOD14A1': ['FireMask']
}

## 🗺 **Outputs**: Fire mask

Finally, we need to give the model a set of labels to apply to each section of the map. These labels tell the training program (TensorFlow) what we want to infer from the previous data. In other words, this dataset represents the "dependent variable" that our model attempts to predict. For our model, we will use the "Terra Thermal Anomalies & Fire Daily Global 1km (MODIS/006/MOD14A1)" map from Earth Engine. We'll use the `FireMask` band provided by the map.

In [14]:
LABELS = {
    'MODIS/006/MOD14A1': ['FireMask'],
}
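
The goal section describes the classification as "on fire or not on fire", while `FireMask` stores a class code per pixel. The following is a minimal sketch of how such codes could be collapsed into a binary label. It assumes the class codes documented for MOD14A1, where values 7, 8, and 9 indicate fire at low, nominal, and high confidence; the threshold constant and the `fire_mask_to_binary` helper are hypothetical names and are not part of this notebook's pipeline.

```python
import numpy as np

# Assumption: in MOD14A1, FireMask classes 7, 8, and 9 mean fire detected
# with low, nominal, and high confidence. Lower codes are non-fire classes.
FIRE_MIN_CLASS = 7


def fire_mask_to_binary(fire_mask: np.ndarray) -> np.ndarray:
    """Return 1 where a pixel is classified as fire and 0 everywhere else."""
    return (fire_mask >= FIRE_MIN_CLASS).astype(np.uint8)


# A made-up 3x3 patch of FireMask class codes, for illustration only.
patch = np.array([[5, 5, 7], [3, 8, 9], [4, 5, 5]])
binary = fire_mask_to_binary(patch)
print(binary)
```

Whether low-confidence detections should count as fire is a modeling choice; raising `FIRE_MIN_CLASS` to 8 would exclude them.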


# 🗂 Explore the datasets

We need at least two datasets, a _training_ and a _validation_ dataset, to train our model.
Both contain _examples_ of _inputs_ (features) with their respective _outputs_ (labels), but they are used for two very different purposes.

The _training dataset_ is what the model uses to learn and adjust itself.
It goes through this dataset multiple times, similar to a student studying for an exam.

The _validation dataset_ is used like an _exam_.
It's important that the validation dataset does **not** include examples found in the training dataset, or the validation will be [biased](https://developers.google.com/machine-learning/crash-course/fairness/types-of-bias).
After going through the training dataset, the model will test itself against the validation dataset, which should include data it has not seen (learned from) before.
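One way to keep the two datasets disjoint is to shuffle the example indices once and slice them. This is only a sketch: the `split_indices` helper, the 20% validation ratio, and the fixed seed are illustrative assumptions, not the values this sample uses.

```python
import numpy as np


def split_indices(num_examples: int, validation_ratio: float = 0.2, seed: int = 0):
    """Shuffle the example indices and split them into two disjoint sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(num_examples)
    num_validation = int(num_examples * validation_ratio)
    # The first slice becomes validation, the rest becomes training.
    return shuffled[num_validation:], shuffled[:num_validation]


train_idx, val_idx = split_indices(10)
print("training:", sorted(train_idx.tolist()))
print("validation:", sorted(val_idx.tolist()))
```

Because both sets come from one permutation, no example index can appear in both.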

For this sample, we'll fetch data from Earth Engine and use that to create our training and validation datasets.

---

**Note**

The magnitude of the data in these datasets is huge: 16 bands of information for a significant sampling of rectangular areas. Because of the size, it is impractical to run an Apache Beam pipeline locally to build the datasets.

(Luckily, we can use Cloud Dataflow for this very reason.)

In the following cells, we'll examine some of the code used to compose the dataset. However, when it comes to actually building the datasets, we will dispatch our code as a job in Dataflow using a single, discrete Python file named `create_dataset.py`.

---



## 📌 Sample training points

As mentioned previously, the size of the data we'll use is already really large. Inspecting all of the data to validate it could take years!

Instead of looking at all of the data, we'll grab just a handful of geographical map points that are meaningful for detecting wildfires. (Don't worry, we've pre-selected some points already.) This process, sometimes called _exploratory data analysis (EDA)_, allows us to understand the shape and contents of our dataset.

The points we select for this analysis must satisfy the following criteria:

1. The points cannot be in the ocean.
1. Some of the points are on fire and the fire spreads.
1. Some of the points are on fire and the fire does not spread.
1. Some of the points are not on fire.

For each point we select, we need to define a rectangular region around it.
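As a rough illustration, a rectangle around a point can be derived from a patch size in pixels and a scale in meters per pixel. The sketch below assumes a flat approximation of about 111,320 meters per degree of latitude, which is adequate for small regions only. The `bounds_around_point` helper is hypothetical, and `Bounds` is redefined here, with the same field order as the `Bounds` tuple used in this notebook, only to keep the snippet self-contained.

```python
import math
from typing import NamedTuple


class Bounds(NamedTuple):
    west: float
    south: float
    east: float
    north: float


# Approximate length of one degree of latitude in meters (an assumption
# that ignores the Earth's ellipsoidal shape).
METERS_PER_DEGREE = 111_320


def bounds_around_point(lat: float, lon: float, patch_size: int, scale: float) -> Bounds:
    """Return a rectangle of patch_size x patch_size pixels centered on (lat, lon)."""
    half_side_m = patch_size * scale / 2
    half_lat = half_side_m / METERS_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    half_lon = half_side_m / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return Bounds(lon - half_lon, lat - half_lat, lon + half_lon, lat + half_lat)


region = bounds_around_point(lat=46.83, lon=-119.86, patch_size=64, scale=5000)
print(region)
```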


### Get the independent variables

In [8]:
from google.api_core import retry, exceptions
from typing import Dict, Iterable, List, Optional, NamedTuple, Tuple

from datetime import datetime, timedelta
import ee
import io
import numpy as np
import requests

@retry.Retry(deadline=60 * 20)  # seconds
def ee_fetch(url: str) -> bytes:
    # If we get "429: Too Many Requests" errors, it's safe to retry the request.
    # The Retry library only works with `google.api_core` exceptions.
    response = requests.get(url)
    if response.status_code == 429:
        raise exceptions.TooManyRequests(response.text)

    # Still raise any other exceptions to make sure we got valid data.
    response.raise_for_status()
    return response.content


def get_image(
    date: datetime, bands_schema: Dict[str, List[str]], window: timedelta
) -> ee.Image:
    # Work on a copy so the caller's schema (e.g. the global INPUTS) is not mutated.
    schema = dict(bands_schema)
    start, end = date.isoformat(), (date + window).isoformat()

    # The elevation dataset is a single static image, so handle it separately.
    elevation = None
    elevation_bands = schema.pop('USGS/SRTMGL1_003', None)
    if elevation_bands is not None:
        elevation = ee.Image('USGS/SRTMGL1_003').select(elevation_bands)

    # The population dataset is aggregated with a median instead of a mosaic.
    population = None
    population_bands = schema.pop('CIESIN/GPWv411/GPW_Population_Density', None)
    if population_bands is not None:
        population = (
            ee.ImageCollection('CIESIN/GPWv411/GPW_Population_Density')
            .filterDate(start, end)
            .select(population_bands)
            .median()
        )

    # Every remaining collection is filtered to the time window and mosaicked.
    images = [
        ee.ImageCollection(collection)
        .filterDate(start, end)
        .select(bands)
        .mosaic()
        for collection, bands in schema.items()
    ]
    if elevation is not None:
        images.append(elevation)
    if population is not None:
        images.append(population)
    # Merge all images into a single multiband image.
    return ee.Image(images)

def get_input_image(date: datetime) -> ee.Image:
    return get_image(date, INPUTS, WINDOW)


class Bounds(NamedTuple):
    west: float
    south: float
    east: float
    north: float

def sample_points(
    image: ee.Image, num_points: int, bounds: Bounds, scale: int
) -> np.ndarray:
    def get_coordinates(point: ee.Feature) -> ee.Feature:
        coords = point.geometry().coordinates()
        return ee.Feature(None, {"lat": coords.get(1), "lon": coords.get(0)})

    points = image.int().stratifiedSample(
        num_points,
        region=ee.Geometry.Rectangle(bounds),
        scale=scale,
        geometries=True,
    )
    url = points.map(get_coordinates).getDownloadURL("CSV", ["lat", "lon"])
    raw_data = ee_fetch(url)
    return np.genfromtxt(io.BytesIO(raw_data), delimiter=",", skip_header=1)


In [10]:
SCALE = 5000
WINDOW = timedelta(days=1)

START_DATE = datetime(2019, 1, 1)
END_DATE = datetime(2020, 1, 1)

# Find a patch that corresponds to the 243 Command Fire in Washington State, 2019
patch_1 = {
    "datetime": START_DATE,
    "timedelta": WINDOW,
    "bounds": Bounds(
        west=-119.92358556551419,
        east=-119.79964605135403,
        north=46.86159776509677,
        south=46.81533182118907),
    "on_fire": True,
}

img = get_image(patch_1["datetime"], INPUTS, patch_1["timedelta"])
samples = sample_points(img, num_points=4, bounds=patch_1["bounds"], scale=12)
print(samples)

[[  46.82463029 -119.91005263]
 [  46.83465548 -119.88450455]
 [  46.83260733 -119.91522693]
 [  46.8340087  -119.81023184]]


### Get the labels

In [12]:
from typing import Iterable, NamedTuple, Tuple
from datetime import datetime

class Point(NamedTuple):
    lat: float
    lon: float

def get_label_image(date: datetime) -> ee.Image:
    return get_image(date, LABELS, WINDOW)

def sample_labels(
    date: datetime, num_points: int, bounds: Bounds
) -> Iterable[Tuple[datetime, Point]]:
    image = get_label_image(date)
    for lat, lon in sample_points(image, num_points, bounds, SCALE):
        yield (date, Point(lat, lon))

In [17]:
label_img = get_label_image(START_DATE)
labels = sample_labels(START_DATE, 4, patch_1["bounds"])

for label in labels:
  print(label)

(datetime.datetime(2019, 1, 1, 0, 0), Point(lat=46.82468418473005, lon=-119.90263254785313))
(datetime.datetime(2019, 1, 1, 0, 0), Point(lat=46.82468418473005, lon=-119.81280101944118))
(datetime.datetime(2019, 1, 1, 0, 0), Point(lat=46.82468418473005, lon=-119.85771678364715))


##  ✍🏽 Write NumPy files

Finally, we need to write the training examples into files. We chose [compressed NumPy files](https://numpy.org/doc/stable/reference/generated/numpy.savez_compressed.html) for simplicity. We used Apache Beam FileSystems to be able to write into any file system that Beam supports, including Cloud Storage.

Before writing the examples, we batch them to create files containing multiple examples, rather than a single file per example. This reduces I/O operations when reading the dataset during training.

Here, let's create a batch from a single example, but our data creation pipeline will create larger batches.

In [2]:
from typing import NamedTuple
import uuid  # used to generate unique file names

import numpy as np

class Example(NamedTuple):
    inputs: np.ndarray
    labels: np.ndarray

def write_npz_file(example: Example, file_prefix: str) -> str:
    from apache_beam.io.filesystems import FileSystems

    filename = FileSystems.join(file_prefix, f"{uuid.uuid4()}.npz")
    with FileSystems.create(filename) as f:
        np.savez_compressed(f, inputs=example.inputs, labels=example.labels)
    return filename
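
To illustrate the batching idea, the sketch below stacks several examples into one compressed NumPy archive and reads it back. It writes to an in-memory buffer rather than through Beam's `FileSystems`, and the `batch_to_npz_bytes` helper, the batch size of 4, and the array shapes (64x64 patches with 15 input bands and 1 label band) are assumptions chosen for illustration only.

```python
import io
from typing import List, NamedTuple

import numpy as np


class Example(NamedTuple):
    inputs: np.ndarray
    labels: np.ndarray


def batch_to_npz_bytes(examples: List[Example]) -> bytes:
    """Stack the examples along a new first axis and compress them into one archive."""
    inputs = np.stack([example.inputs for example in examples])
    labels = np.stack([example.labels for example in examples])
    buffer = io.BytesIO()
    np.savez_compressed(buffer, inputs=inputs, labels=labels)
    return buffer.getvalue()


# Hypothetical shapes: four 64x64 patches, 15 input bands and 1 label band.
examples = [
    Example(
        inputs=np.zeros((64, 64, 15), dtype=np.float32),
        labels=np.zeros((64, 64, 1), dtype=np.uint8),
    )
    for _ in range(4)
]
data = batch_to_npz_bytes(examples)

# Read the archive back to confirm the batch dimension was added.
with np.load(io.BytesIO(data)) as npz:
    loaded_inputs = npz["inputs"]
    loaded_labels = npz["labels"]
print(loaded_inputs.shape, loaded_labels.shape)
```

Reading one archive yields a whole batch, which is what reduces the number of I/O operations during training.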

## PIPELINE STACK

```
random_dates
   num_dates = 250

sample_labels()
   num_points = 100
   bounds
   get_label_image()
   sample_points()

try_get_training_example()
    patch_size = 64
      get_training_example()
        ee_init()
        get_input_sequence()
          get_input_image()
          get_patch_sequence()
        get_label_sequence()
          get_label_image()
          get_patch_sequence()
        RETURNS Example()


write_npz_file()
    output_path
```
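
The first stage in the outline, `random_dates`, is not shown in this notebook. As a hedged sketch of what such a step might do, the function below draws dates uniformly at random between a start and an end date. Its signature, the seed parameter, and the day-level granularity are assumptions; the version in `create_dataset.py` may differ.

```python
import random
from datetime import datetime, timedelta
from typing import Iterator


def random_dates(
    start: datetime, end: datetime, num_dates: int, seed: int = 0
) -> Iterator[datetime]:
    """Yield num_dates dates drawn uniformly from [start, end), one day apart at minimum."""
    rng = random.Random(seed)
    total_days = (end - start).days
    for _ in range(num_dates):
        yield start + timedelta(days=rng.randrange(total_days))


dates = list(random_dates(datetime(2019, 1, 1), datetime(2020, 1, 1), num_dates=250))
print(len(dates), min(dates), max(dates))
```

Dates may repeat with this approach, which is acceptable when each date is paired with different sample points.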

## 🚈 Create the datasets in Dataflow

We have all the pieces now, so let's put it all together into an
[Apache Beam](https://beam.apache.org/) pipeline.
Apache Beam allows us to create parallel processing pipelines.

For this pipeline, we will create a new file, `create_dataset.py`, that we will bundle with our job to run on Cloud Dataflow.

> 💡 For more information on how to use Apache Beam, refer to the
> [Tour of Beam](https://beam.apache.org/get-started/tour-of-beam/)

# TESTING


In [5]:
!gcloud config set project {project}
import os
os.environ["GOOGLE_CLOUD_PROJECT"] = project

Updated property [core/project].


In [None]:
# testing
from datetime import datetime
bounds = [-124, 24, -73, 49]
date = datetime(2020, 1, 1)
# Use f-strings so the bucket, location, and project values are substituted.
output_path = f"gs://{bucket}/fire/large_data"
num_dates = "250"
num_points = "100"
runner = "DataflowRunner"
region = location
temp_location = f"gs://{bucket}/fire/temp"
prebuild_sdk_container_engine = "cloud_build"
docker_registry_push_url = f"gcr.io/{project}/fire"

!python create_dataset.py \
  --output-path "gs://{bucket}/fire/large_data" \
  --num-dates "250" \
  --num-points "100" \
  --runner "DataflowRunner" \
  --project "{project}" \
  --region "{location}" \
  --temp_location "gs://{bucket}/fire/temp" \
  --prebuild_sdk_container_engine "cloud_build" \
  --docker_registry_push_url "gcr.io/{project}/fire"

INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/bin/python3', '-m', 'pip', 'download', '--dest', '/tmp/dataflow-requirements-cache', '-r', '/tmp/tmpjupp5lz1/tmp_requirements.txt', '--exists-action', 'i', '--no-deps', '--implementation', 'cp', '--abi', 'cp39', '--platform', 'manylinux2014_x86_64']
INFO:apache_beam.runners.portability.stager:Downloading source distribution of the SDK from PyPi
INFO:apache_beam.runners.portability.stager:Executing command: ['/usr/bin/python3', '-m', 'pip', 'download', '--dest', '/tmp/tmpsk52ziyc', 'apache-beam==2.41.0', '--no-deps', '--no-binary', ':all:']
INFO:apache_beam.runners.portability.stager:Staging SDK sources from PyPI: dataflow_python_sdk.tar
INFO:apache_beam.runners.portability.stager:Downloading binary distribution of the SDK from PyPi
INFO:apache_beam.runners.portabi

In [7]:
run(output_path=output_path, num_dates=num_dates, num_points=num_points, bounds=Bounds(-124, 24, -73, 49), patch_size=64, max_request=20, beam_args=beam_args)

ERROR:root:File `'(output_path=output_path,.py'` not found.


# 🧠 Train the model

We're now entering the realm of TensorFlow and linear algebra.
**Do not worry, we'll try to keep it simple**.

The overall process of training the model is _pretty straightforward_, but there are a lot of details to get right along the way.

First, some basic definitions:

- 🔮 **Model**: You can think of the model as a function. You give it some inputs and it returns you some outputs. But rather than writing the function yourself, it learns from examples.
- 🧮 **Tensor**: A single number is a _scalar_, a 1-dimensional array of numbers is a _vector_, a 2-dimensional array is a _matrix_, and arrays with more dimensions are called _tensors_. Machine learning models take tensors as inputs and return tensors as outputs.
- 🏗️ **Shape**: The number of dimensions of a tensor and the size of each one. For example, a 3x4 matrix has shape `(3, 4)`, and an 800x600 RGB image has shape `(800, 600, 3)` since we have three numbers (red, green, blue) for every pixel.
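
These definitions can be checked quickly with NumPy, which this notebook already uses for its dataset files. The arrays below are placeholders filled with zeros; only their number of dimensions and shapes matter.

```python
import numpy as np

scalar = np.array(3.0)            # no dimensions
vector = np.array([1.0, 2.0, 3.0])  # 1 dimension
matrix = np.zeros((3, 4))         # 2 dimensions: a 3x4 matrix
image = np.zeros((800, 600, 3))   # 3 dimensions: an 800x600 RGB image

for name, tensor in [
    ("scalar", scalar),
    ("vector", vector),
    ("matrix", matrix),
    ("image", image),
]:
    print(f"{name}: ndim={tensor.ndim}, shape={tensor.shape}")
```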