
Add step-by-step and pipeline tutorials for reinforcement learning with Vertex AI. #19

Merged · 5 commits · Aug 6, 2021

Conversation

@KathyFeiyang (Contributor) commented Aug 3, 2021


Add two prototypes for reinforcement learning (RL) on Vertex AI. The prototypes use TF-Agents, Kubeflow Pipelines (KFP), and Vertex AI to build an RL application: a movie recommendation system based on the MovieLens 100K dataset.

  • Step-by-step demo: showcases how to use Vertex AI custom training, custom hyperparameter tuning, custom prediction, and endpoint deployment to build an RL movie recommendation system (see the sketch after this list).

  • End-to-end pipeline demo: showcases how to build an RL-specific MLOps pipeline using KFP and Vertex Pipelines, together with additional Vertex AI and GCP services such as BigQuery, Cloud Functions, Cloud Scheduler, and Pub/Sub.

Each demo contains a notebook that carries out the full workflow with user instructions, and a src/ directory for Python modules and unit tests.
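
For orientation, here is a minimal sketch of the step-by-step flow using the Vertex AI SDK. All names (project, bucket, script path, display names) are placeholders and the container images are only illustrative; this is not the notebook's actual code:

```python
from google.cloud import aiplatform

# Placeholder project/bucket. The staging bucket must be a regional bucket
# in the same region as the Vertex AI services (see the discussion below).
aiplatform.init(
    project="my-project",
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",
)

# Custom training: the TF-Agents RL training logic lives in the script.
job = aiplatform.CustomTrainingJob(
    display_name="movielens-rl-train",
    script_path="src/trainer/task.py",  # hypothetical path
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-6:latest",
    model_serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-6:latest"
    ),
)
model = job.run(replica_count=1, model_display_name="movielens-rl-policy")

# Endpoint deployment for serving online movie recommendations.
endpoint = model.deploy(machine_type="n1-standard-4")
```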


Before submitting a Jupyter notebook, follow this mandatory checklist:

  • Use the notebook template as a starting point.
  • Follow the style and grammar rules outlined in the above notebook template.
  • Verify that the notebook runs successfully in Colab, since passing the automated tests does not guarantee this.
  • [N/A] Passes all the required automated checks
  • [N/A] You have consulted with a tech writer to see if tech writer review is necessary. If so, the notebook has been reviewed by a tech writer, and they have approved it.
  • This notebook has been added to the CODEOWNERS file, pointing to the author or the author's team. If the CODEOWNERS file doesn't exist, create one in the nearest folder that makes sense.
  • The Jupyter notebook cleans up any artifacts it has created (datasets, ML models, endpoints, etc.) so as not to consume unnecessary resources.


@yinghsienwu (Collaborator) commented

LGTM! Thanks for the PR.

@ivanmkc (Contributor) left a comment

LGTM because community tutorials don't need review.

Just add the CODEOWNERS file.

@Ark-kun (Contributor) commented Aug 4, 2021

Got this error when running at the ModelUpload stage: google.api_core.exceptions.FailedPrecondition: 400 The Cloud Storage bucket of gs://avolkov/tmp/artifacts is in location us. It must be in the same regional location as the service location us-central1.

/cc @sasha-gitg
/cc @chensun

This is one of the reasons why, when I proposed adding the InputUri placeholders, I proposed giving them a supportedSchemas property and room for expansion. Not all URIs are equal: component authors need to be able to tell the system which URIs the component supports.

@KathyFeiyang (Contributor, Author) commented

> Got this error when running at the ModelUpload stage: google.api_core.exceptions.FailedPrecondition: 400 The Cloud Storage bucket of gs://avolkov/tmp/artifacts is in location us. It must be in the same regional location as the service location us-central1.
>
> /cc @sasha-gitg

Thanks for pointing this out. I encountered the same issue with CustomContainerTrainingJob, and also logged this in a friction log (which I will send you offline).

The two notebooks include instructions in their GCS bucket creation sections about avoiding multi-regional buckets.
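
For reference, a minimal sketch of creating such a single-region bucket with the google-cloud-storage client (placeholder names; the notebooks give their own instructions):

```python
from google.cloud import storage

# Placeholder project and bucket names. The key point is the explicit
# single-region location matching the Vertex AI region, e.g. us-central1,
# rather than a multi-region such as "US".
client = storage.Client(project="my-project")
bucket = client.create_bucket("my-vertex-staging-bucket", location="us-central1")
print(bucket.location)  # "US-CENTRAL1"
```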

@morgandu (Contributor) commented Aug 5, 2021

@KathyFeiyang, please add the prototype to the CODEOWNERS file.

@sasha-gitg (Member) commented

In Vertex, buckets must be in a single region and must match the region of the Vertex service. We have an open ticket to add verification before we call the API (b/183494969) in the Vertex SDK. In many scenarios the service exception is informative enough (like the example above) and is generally raised immediately.

For components, I don't think this is feasible, because the storage URI could be passed in as a PipelineParam, and we will not know the identity of the URI until the task is executed.

Are you thinking of a different solution?
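
For illustration only, a hypothetical sketch of the kind of SDK-side pre-call check discussed above; this is not part of the Vertex SDK, and the helper name is made up:

```python
from google.cloud import storage

def check_bucket_matches_region(bucket_uri: str, vertex_region: str) -> None:
    """Fail fast if a staging bucket is not in the Vertex service region.

    Hypothetical helper sketching the verification discussed above
    (b/183494969); requires permission to read the bucket's metadata.
    """
    bucket_name = bucket_uri[len("gs://"):].split("/")[0]
    location = storage.Client().get_bucket(bucket_name).location
    # location is e.g. "US" (multi-region) or "US-CENTRAL1" (single region).
    if location.lower() != vertex_region.lower():
        raise ValueError(
            f"Bucket {bucket_name!r} is in location {location}, but the "
            f"Vertex service location is {vertex_region}; they must match."
        )

# Example: this would raise for the failure reported above.
# check_bucket_matches_region("gs://avolkov/tmp/artifacts", "us-central1")
```

As noted, such a check only works when the URI is known client-side; for KFP components the URI may be a PipelineParam that is resolved only at task execution time.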

@KathyFeiyang requested review from morgandu and a team as code owners on August 5, 2021 at 19:49
@morgandu (Contributor) left a comment

LGTM, thank you @KathyFeiyang for your contribution to our community content!

@KathyFeiyang (Contributor, Author) commented

@Ark-kun @sasha-gitg Thank you for the discussion above on potential improvements to using GCS buckets with Vertex. For the scope of this PR, which adds the two RL prototypes, the platform change is not immediately necessary: the notebooks instruct users to match the bucket region to the Vertex region, and platform changes can't be implemented within this PR anyway. I'll therefore prepare to wrap up the PR. Meanwhile, I do think the discussion is valuable to continue.
