AWS distribution of Kubeflow v1.4 #27

Closed
27 of 31 tasks
surajkota opened this issue Nov 23, 2021 · 2 comments
Labels: work in progress (Has been assigned and is in progress)

Comments

@surajkota
Contributor

surajkota commented Nov 23, 2021

Feature list:

surajkota changed the title from "AWS distribution for Kubeflow v1.4" to "AWS distribution of Kubeflow v1.4" Nov 23, 2021
surajkota added a commit that referenced this issue Dec 2, 2021
### Description of your changes:

With this PR, all AWS-related changes are confined to the `distributions/aws` directory.

- RDS and S3 integration for pipelines: 
  - ported the `aws` directory under `apps/pipeline/upstream/env` from v1.3-branch to `distributions/aws/apps/pipeline`
  - updated the base resources location in `distributions/aws/apps/pipeline/kustomization.yaml`
- RDS integration for Katib: 
  - ported the `katib-external-db-with-kubeflow` directory under `apps/katib/upstream/installs` from v1.3-branch to `distributions/aws/apps`
  - updated the base resources location in `distributions/aws/apps/katib-external-db-with-kubeflow/kustomization.yaml` and created a patches directory for the db-manager patch. The `db-manager.yaml` patch is a copy of the one in `katib-external-db`, because patches need to be under the kustomization directory
- DLC-based Jupyter notebooks
  - created a `jupyter-web-app` directory under `distributions/aws/apps` with an overlay whose spawner_ui_config uses AWS DLC-based images for TensorFlow and PyTorch. This spawner_ui_config is a copy of `apps/jupyter/jupyter-web-app/upstream/base/configs/spawner_ui_config.yaml`, except for the notebook server image updates
- Changes from #20 carried over to v1.4-branch, with no modifications
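The Katib bullet above relies on two kustomize mechanisms: a `resources` entry that reaches back to the upstream base, and a patch file kept locally. A minimal sketch of what `distributions/aws/apps/katib-external-db-with-kubeflow/kustomization.yaml` could look like, assuming the upstream base is `apps/katib/upstream/installs/katib-with-kubeflow` and the patch is applied with `patchesStrategicMerge` (both are assumptions, not taken from the PR):

```yaml
# Hypothetical sketch; the relative path and patch mechanism are assumptions.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kubeflow
resources:
  # The base now lives outside distributions/aws, so the path climbs back
  # to the repository root and into the upstream Katib install (assumed).
  - ../../../../apps/katib/upstream/installs/katib-with-kubeflow
patchesStrategicMerge:
  # Local copy of the db-manager patch from katib-external-db.
  - patches/db-manager.yaml
```

Keeping `db-manager.yaml` inside the kustomization directory matters because kustomize's default load restrictor only loads files in or under the kustomization root; directories listed under `resources` are loaded as separate kustomizations and are not blocked in the same way. The result can be inspected with `kustomize build distributions/aws/apps/katib-external-db-with-kubeflow`.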
 
### Testing
Manual, WIP
Deleted the generic components from an existing 1.4 deployment and installed the AWS-modified components
- [x] Pipelines: ran the sample data passing pipeline; verified the databases and run details in RDS and the artifacts pushed to S3
- [x] Notebook: created a notebook server with the tf-cpu image and ran some commands
- [x] Katib: the `observation_logs` table was created by db-manager, so the connection is assumed to be good. A test run will be addressed as part of #30
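The DLC image exercised in the notebook test comes from the `jupyter-web-app` overlay in the change list, which swaps in a modified copy of `spawner_ui_config.yaml`. One way such an overlay can be written is sketched below; the upstream overlay path and the generated ConfigMap name are assumptions and would have to match the actual upstream base.

```yaml
# Hypothetical sketch of distributions/aws/apps/jupyter-web-app/kustomization.yaml.
# Assumptions: the upstream overlay path and the ConfigMap generator name.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # Reuse the upstream Jupyter web app manifests unchanged (assumed path).
  - ../../../../apps/jupyter/jupyter-web-app/upstream/overlays/istio
configMapGenerator:
  # Replace the upstream spawner config with the AWS copy, which differs
  # only in the notebook server images (DLC-based TensorFlow and PyTorch).
  - name: jupyter-web-app-config  # assumed; must match the upstream generator
    behavior: replace
    files:
      - configs/spawner_ui_config.yaml
```

`behavior: replace` fits because the AWS file is a full copy of the upstream one; `behavior: merge` would instead overlay individual keys onto the generated ConfigMap.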

#27
surajkota added a commit that referenced this issue Dec 9, 2021
### Description of your changes:
- Port existing tutorials from v1.3-branch and update paths to match the new directory structure
- Move the examples directory into dist/aws, per discussion
- Fixed a few broken links in the RDS readme
- [x] update jupyter-web-app location
- [x] update Katib location
- [x] update pipelines location
- [x] update ingress, alb-controller and envoy-filter location
- [x] update the branch checkout to v1.4-branch
- [x] sync common component install locations from generic 1.4 (mainly, 1.4 has the training operator instead of individual controllers such as XGBoost, TFJob, etc.)

### Testing
Manual, WIP
- [x] All links are working including cross readme links
- [x] Images are loading
- [x] Cognito readme
- [x] RDS-S3 readme
- [x] Cognito-RDS-S3 readme

Checked login, ran a sample pipeline, verified the RDS and S3 connections, and created a notebook server with one of the DLC images

#27
rrrkharse pushed a commit that referenced this issue Dec 10, 2021
surajkota added the work in progress label (Has been assigned and is in progress) Feb 18, 2022
@CodeBooster97

looking forward to the release!

@surajkota
Contributor Author
