Add step-by-step and pipeline tutorials for reinforcement learning with Vertex AI. #19

Merged (5 commits) on Aug 6, 2021
@@ -0,0 +1,26 @@
# Building TF-Agents Bandits Based Movie Recommendation Systems using KFP and Vertex AI SDK

This directory demonstrates how to use TF-Agents, Kubeflow Pipelines (KFP) and
Vertex AI to build reinforcement learning (RL) applications. The applications
we build are movie recommendation systems based on the MovieLens 100K dataset,
using the contextual bandits formulation.

There are two demos:

1. Step-by-step demo: showcases how to use custom training, custom
   hyperparameter tuning, custom prediction and endpoint deployment in Vertex
   AI to build an RL movie recommendation system (see the Vertex AI SDK sketch
   after this list)

2. End-to-end pipeline demo: showcases how to build an RL-specific MLOps
   pipeline using KFP and Vertex Pipelines, as well as additional Vertex AI and
   GCP services such as BigQuery, Cloud Functions, Cloud Scheduler and Pub/Sub.
   Highlights of this end-to-end pipeline are:

- RL-specific implementation for training and prediction
- Simulation for initial training data, prediction requests and
re-training
- Closing of the feedback loop from prediction results back to training
- Customizable and reproducible KFP components
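
As a rough illustration of the Vertex AI SDK calls the step-by-step demo
revolves around, here is a minimal sketch; the project ID, container images,
machine type and instance format are placeholders, and the demo notebook
differs in detail:

```python
from google.cloud import aiplatform

aiplatform.init(project="<your-project>", location="us-central1")

# Custom training: run the TF-Agents training code as a Vertex AI custom job
# packaged in a (placeholder) training container image.
job = aiplatform.CustomContainerTrainingJob(
    display_name="movielens-bandit-training",
    container_uri="gcr.io/<your-project>/movielens-trainer:latest",
    model_serving_container_image_uri=(
        "gcr.io/<your-project>/movielens-predictor:latest"),
)
model = job.run(model_display_name="movielens-bandit-policy", replica_count=1)

# Endpoint deployment and custom prediction: serve the trained policy online.
endpoint = model.deploy(machine_type="n1-standard-4")
prediction = endpoint.predict(instances=[{"observation": [0.0] * 20}])
```

Custom hyperparameter tuning can be launched in a similar way with
`aiplatform.HyperparameterTuningJob`.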

Each demo contains a notebook that walks through the full workflow with user
instructions, and a `src/` directory for Python modules and unit tests.
@@ -0,0 +1,62 @@
# End-to-End MLOps Pipeline Demo

This demo showcases how to build an RL-specific MLOps pipeline using
[Kubeflow Pipelines (KFP)](https://www.kubeflow.org/docs/components/pipelines/overview/pipelines-overview/)
and [Vertex Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines), as
well as additional [Vertex AI](https://cloud.google.com/vertex-ai) and GCP
services such as [BigQuery](https://cloud.google.com/bigquery),
[Cloud Functions](https://cloud.google.com/functions),
[Cloud Scheduler](https://cloud.google.com/scheduler) and
[Pub/Sub](https://cloud.google.com/pubsub). We implement the RL training and
prediction logic using the [TF-Agents](https://www.tensorflow.org/agents)
library.

Highlights of this end-to-end pipeline are:

- RL-specific implementation for training and prediction
- Simulation for initial training data, prediction requests and re-training
- Closing of the feedback loop from prediction results back to training
- Customizable, reusable and shareable KFP components

We use the
[MovieLens 100K dataset](https://www.kaggle.com/prajitdatta/movielens-100k-dataset)
to build a simulation environment that frames the recommendation problem:

1. User vectors are the environment observations;
2. Movie items to recommend are the agent actions applied on the environment;
3. Approximate user ratings are the environment rewards generated as feedback
to the observations and actions.
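
As a rough sketch of this framing (module and parameter names here follow the
TF-Agents bandits environments and are assumptions that may differ across
versions):

```python
from tf_agents.bandits.environments import movielens_py_environment
from tf_agents.environments import tf_py_environment

# Observations are low-rank user vectors, actions are movie indices, and
# rewards are approximate ratings reconstructed from a rank-k factorization
# of the MovieLens 100K ratings matrix.
env = movielens_py_environment.MovieLensPyEnvironment(
    "u.data",      # path to the raw MovieLens 100K ratings file (placeholder)
    rank_k=20,     # rank of the factorization, i.e. the user vector dimension
    batch_size=8,  # number of user observations returned per environment step
)
tf_env = tf_py_environment.TFPyEnvironment(env)
```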

For custom training, we implement off-policy training using a static set of
pre-collected data records. "Off-policy" means that, for a given data record
and its observation, the policy currently being trained might not choose the
same action as the one recorded in that data record.
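
A minimal sketch of this off-policy training loop, assuming the environment
from the sketch above, a linear UCB agent and illustrative hyperparameters
(the real Trainer component in `src/` differs):

```python
from tf_agents.bandits.agents import lin_ucb_agent
from tf_agents.bandits.environments import movielens_py_environment
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import tf_py_environment
from tf_agents.replay_buffers import tf_uniform_replay_buffer

# MovieLens simulation environment, as in the previous sketch.
env = movielens_py_environment.MovieLensPyEnvironment(
    "u.data", rank_k=20, batch_size=8)
tf_env = tf_py_environment.TFPyEnvironment(env)

# A contextual bandit agent (linear UCB here; the demo may use another agent).
agent = lin_ucb_agent.LinearUCBAgent(
    time_step_spec=tf_env.time_step_spec(),
    action_spec=tf_env.action_spec())
agent.initialize()

# Collect a static set of (observation, action, reward) records into a buffer.
replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.policy.trajectory_spec,
    batch_size=tf_env.batch_size,
    max_length=1000)
driver = dynamic_step_driver.DynamicStepDriver(
    env=tf_env,
    policy=agent.collect_policy,
    observers=[replay_buffer.add_batch],
    num_steps=500)
driver.run()

# Off-policy update: the agent trains on recorded trajectories whose actions
# need not match what the updated policy would choose for the same observation.
experience = replay_buffer.gather_all()
agent.train(experience)
```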

The startup pipeline (executed once) includes the following KFP custom
components:

- Generator to generate MovieLens simulation data
- Ingester to ingest data
- Trainer to train the RL policy
- Deployer to deploy the trained policy to a Vertex AI endpoint

The re-training pipeline (executed recurrently) includes the Ingester, Trainer
and Deployer, as it does not need initial training data from the Generator.
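
As an illustration of how such components compose with the KFP v2 SDK, here is
a minimal sketch of the re-training pipeline; the component bodies, names and
parameters are placeholders, not the ones implemented in `src/`:

```python
from kfp.v2 import compiler, dsl

@dsl.component
def ingester_op(raw_data_uri: str) -> str:
    # Placeholder: the real Ingester converts logged records into training data.
    return raw_data_uri

@dsl.component
def trainer_op(training_data_uri: str) -> str:
    # Placeholder: the real Trainer trains the TF-Agents policy.
    return training_data_uri

@dsl.component
def deployer_op(model_uri: str) -> str:
    # Placeholder: the real Deployer deploys the policy to a Vertex AI endpoint.
    return model_uri

@dsl.pipeline(name="movielens-retraining-pipeline")
def retraining_pipeline(raw_data_uri: str):
    # Recurrent re-training skips the Generator: Ingester -> Trainer -> Deployer.
    dataset = ingester_op(raw_data_uri=raw_data_uri)
    trained = trainer_op(training_data_uri=dataset.output)
    deployer_op(model_uri=trained.output)

compiler.Compiler().compile(
    pipeline_func=retraining_pipeline,
    package_path="retraining_pipeline.json")
```

The compiled pipeline spec can then be submitted to Vertex Pipelines, for
example with `google.cloud.aiplatform.PipelineJob`, on a recurring schedule.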

After pipeline construction, we also create:

- Simulator (which utilizes Cloud Functions, Cloud Scheduler and Pub/Sub) to
  send simulated MovieLens prediction requests (a sketch follows this list)
- Logger (which utilizes Cloud Functions, Pub/Sub and a hook in the prediction
  code) to asynchronously log prediction inputs and results
- Trigger to trigger recurrent re-training
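
For instance, the Simulator could be a Pub/Sub-triggered Cloud Function along
these lines, invoked on a schedule by Cloud Scheduler; the project, endpoint
resource name and instance format are placeholders, and the actual
implementation lives in `src/`:

```python
# main.py for a Pub/Sub-triggered Cloud Function (Python runtime).
from google.cloud import aiplatform

def simulate_prediction_requests(event, context):
    """Sends a batch of simulated MovieLens prediction requests to the endpoint."""
    aiplatform.init(project="<your-project>", location="us-central1")
    endpoint = aiplatform.Endpoint("<your-endpoint-resource-name>")

    # Simulated user observation vectors (placeholder values).
    instances = [{"observation": [0.0] * 20} for _ in range(8)]
    response = endpoint.predict(instances=instances)
    print(response.predictions)
```

The Logger and the Trigger can follow a similar Cloud Functions plus Pub/Sub
pattern, consuming prediction results and re-training requests respectively.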

Here is an illustration of the design:

<img src="mlops-pipeline-design.png" alt="RL MLOps Pipeline Design" width="800"/>

The demo contains a notebook that walks through the full workflow with user
instructions, and a `src/` directory for Python modules and unit tests.

Read more in the notebook about problem framing, the simulations, and adapting
this demo for production and for other use cases.