## Flex Dataflow Template Runner

This notebook covers:
- How to build a Dataflow Flex Template
- How to run a Dataflow Flex Template with the gcloud CLI

#### Create a Cloud Storage bucket

In [None]:
BUCKET = "your-bucket"
! gsutil mb gs://$BUCKET

#### Create an Artifact Registry Repository

Create an Artifact Registry repository where you will push the Docker container image for the template.

Use the `gcloud artifacts repositories create` command to create a new Artifact Registry repository.

Replace the following:
- `REPOSITORY`: a name for your repository. Repository names must be unique for each repository location in a project.
- `REGION`: the regional or multi-regional location for the repository, passed to the `--location` flag.


In [None]:
REPOSITORY = 'your-repository'
REGION = 'us-central1'
PROJECT = 'your_gcp_project'

In [None]:
# Uses the REPOSITORY and REGION variables defined in the previous cell.
! gcloud artifacts repositories create $REPOSITORY \
    --repository-format=docker \
    --location=$REGION

#### Configure Docker 

Use the `gcloud auth configure-docker` command to configure Docker to authenticate requests for Artifact Registry.

This command updates your Docker configuration so that you can connect to Artifact Registry and push images.

**Flex Templates** can also use images stored in private registries. For more information, see [Use an image from a private registry](https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_an_image_from_a_private_registry).

In [None]:
! gcloud auth configure-docker $REGION-docker.pkg.dev

#### Build the Flex Template
In this step, you use the `gcloud dataflow flex-template build` command to build the Flex Template.

**A Flex Template consists of the following components:**

- A Docker container image that packages your pipeline code. For Java and Python Flex Templates, the Docker image is built and pushed to your Artifact Registry repository when you run the `gcloud dataflow flex-template build` command.

-  A template specification file. This file is a JSON document that contains the location of the container image plus metadata about the template, such as pipeline parameters.
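
The metadata that ends up in the template specification comes from the file passed to `--metadata-file` in the build step. As a rough sketch, the cell below writes a minimal `metadata.json` declaring a single `output` parameter, which is the parameter this notebook passes when running the job. The template name, description, label, help text, and regex are illustrative assumptions, not values taken from the actual pipeline.

```python
import json

# Hypothetical metadata for a template with one pipeline parameter.
# Only the parameter name "output" comes from this notebook; every other
# value is a placeholder to adapt to your own pipeline.
metadata = {
    "name": "Flex Python example",
    "description": "Example Flex Template for a Python pipeline.",
    "parameters": [
        {
            "name": "output",
            "label": "Output path prefix",
            "helpText": "Cloud Storage path prefix for output files.",
            "isOptional": False,
            "regexes": ["^gs://.+$"],
        }
    ],
}

# Write the file that the build command reads via --metadata-file.
with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

Each entry under `parameters` should correspond to an option that the pipeline code itself accepts.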

In [None]:
! gcloud dataflow flex-template build gs://BUCKET_NAME/flex-template-py.json \
--image-gcr-path "{REGION}-docker.pkg.dev/{PROJECT}/{REPOSITORY}/flex-python:latest" \
--sdk-language "PYTHON" \
--flex-template-base-image "PYTHON3" \
--metadata-file "metadata.json" \
--py-path "." \
--env "FLEX_TEMPLATE_PYTHON_PY_FILE=flex_python.py" \
--env "FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE=requirements.txt" \
--service-account-email gcp_service_account_name
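
The value passed to `--image-gcr-path` follows the Artifact Registry pattern `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. The sketch below assembles that path so it can be checked before building. `image_path` is a hypothetical helper, not part of any Google library, and the arguments are the placeholder values used earlier in this notebook.

```python
def image_path(region, project, repository, image="flex-python", tag="latest"):
    """Compose an Artifact Registry Docker image path from its parts."""
    return f"{region}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"

# Placeholder values; replace them with your own region, project, and repository.
print(image_path("us-central1", "your_gcp_project", "your-repository"))
```

Printing the path first makes it easier to spot a mistyped project or repository name before the image build starts.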

#### Run the Flex Template
In this step, you use the `gcloud dataflow flex-template run` command to run a Dataflow job that uses the Flex Template.

Replace the following:

- `BUCKET_NAME`: the name of the Cloud Storage bucket that you created earlier
- `REGION`: the region in which to run the Dataflow job, for example `us-central1`

To view the status of the Dataflow job in the Google Cloud console, go to the Dataflow Jobs page.

In [None]:
! gcloud dataflow flex-template run "flex-job-`date +%Y%m%d-%H%M%S`" \
--template-file-gcs-location "gs://BUCKET_NAME/flex-template-py.json" \
--parameters output="gs://BUCKET_NAME/output-" \
--region "REGION" \
--service-account-email gcp_service_account_name \
--staging-location gs://your_staging_bucket_location/ \
--subnetwork your_full_subnetwork_uri \
--num-workers 4 \
--max-workers 8 \
--disable-public-ips \
--worker-region us-central1 \
--worker-machine-type c2-standard-8
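
The run command can also be assembled in Python, which avoids relying on shell `date` substitution for the job name. This is a minimal sketch under stated assumptions: `build_run_command` is a hypothetical helper, it includes only the core flags from the command above, and the bucket and region values are placeholders.

```python
from datetime import datetime
import shlex

def build_run_command(bucket, region, now=None):
    """Assemble the `gcloud dataflow flex-template run` argument list."""
    now = now or datetime.now()
    # Same timestamp format as the shell `date +%Y%m%d-%H%M%S` call.
    job_name = f"flex-job-{now:%Y%m%d-%H%M%S}"
    return [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        "--template-file-gcs-location", f"gs://{bucket}/flex-template-py.json",
        "--parameters", f"output=gs://{bucket}/output-",
        "--region", region,
    ]

# Placeholder values; replace them with your own bucket and region.
cmd = build_run_command("your-bucket", "us-central1")
print(shlex.join(cmd))
# To launch the job from Python instead of printing the command:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list means no shell quoting is needed if the command is later handed to `subprocess.run`.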
