[Doc] Support TensorBoard in Kubeflow Pipelines #118

Open
akartsky opened this issue Mar 2, 2022 · 3 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@akartsky
Contributor

akartsky commented Mar 2, 2022

The "Support TensorBoard in Kubeflow Pipelines" section of the documentation is outdated:
https://www.kubeflow.org/docs/distributions/aws/pipeline/#support-tensorboard-in-kubeflow-pipelines

Outdated doc:

TensorBoard needs some extra settings on AWS like below:

  1. Create a Kubernetes secret aws-secret in the kubeflow namespace. Follow instructions here.

  2. Create a ConfigMap to store the configuration of TensorBoard on your cluster. Replace <your_region> with your S3 region.

apiVersion: v1
kind: ConfigMap
metadata:
  name: ml-pipeline-ui-viewer-template
data:
  viewer-tensorboard-template.json: |
    {
        "spec": {
            "containers": [
                {
                    "env": [
                        {
                            "name": "AWS_ACCESS_KEY_ID",
                            "valueFrom": {
                                "secretKeyRef": {
                                    "name": "aws-secret",
                                    "key": "AWS_ACCESS_KEY_ID"
                                }
                            }
                        },
                        {
                            "name": "AWS_SECRET_ACCESS_KEY",
                            "valueFrom": {
                                "secretKeyRef": {
                                    "name": "aws-secret",
                                    "key": "AWS_SECRET_ACCESS_KEY"
                                }
                            }
                        },
                        {
                            "name": "AWS_REGION",
                            "value": "<your_region>"
                        }
                    ]
                }
            ]
        }
    }
  3. Update the ml-pipeline-ui deployment to use the ConfigMap by running kubectl edit deployment ml-pipeline-ui -n kubeflow.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: ml-pipeline-ui
  namespace: kubeflow
  ...
spec:
  template:
    spec:
      containers:
      - env:
        - name: VIEWER_TENSORBOARD_POD_TEMPLATE_SPEC_PATH
          value: /etc/config/viewer-tensorboard-template.json
        ....
        volumeMounts:
        - mountPath: /etc/config
          name: config-volume
      .....
      volumes:
      - configMap:
          defaultMode: 420
          name: ml-pipeline-ui-viewer-template
        name: config-volume
akartsky added the documentation label Mar 2, 2022
@akartsky
Contributor Author

akartsky commented Mar 2, 2022

kubeflow/kubeflow#6328

@akartsky
Contributor Author

akartsky commented Mar 18, 2022

Errors:

These are the errors you might see on the TensorBoard pod when you try to use S3.

1] This error occurs because AWS_REGION must be specified as an environment variable for the pod:

2022-03-17 19:17:23.774900: W tensorflow/core/platform/s3/aws_logging.cc:57] Encountered Unknown AWSError 'PermanentRedirect': The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
2022-03-17 19:17:23.774947: E tensorflow/core/platform/s3/aws_logging.cc:60] HTTP response code: 301

2] This error occurs because the pod does not have permission to access the S3 bucket
(if no secrets are provided, the pod defaults to the node IAM role, which will not have S3 access):

2022-03-17 18:51:05.696810: W tensorflow/core/platform/s3/aws_logging.cc:57] Encountered AWSError 'AccessDenied': Access Denied
2022-03-17 18:51:05.696857: E tensorflow/core/platform/s3/aws_logging.cc:60] HTTP response code: 403
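
To tell the two cases above apart when scanning pod logs, a small helper along these lines may be useful. It is only a sketch: the `diagnose` function and the `LIKELY_CAUSES` table are hypothetical names, and the mapping simply restates the two causes described in this comment.

```python
from typing import Optional

# Error code seen in the TensorBoard log -> likely cause, as described above.
LIKELY_CAUSES = {
    "PermanentRedirect": "AWS_REGION is not set as an environment variable for the pod",
    "AccessDenied": "the pod does not have permission to access the S3 bucket",
}


def diagnose(log_line: str) -> Optional[str]:
    """Return the likely cause for a known S3 error in a log line, else None."""
    for code, cause in LIKELY_CAUSES.items():
        # The AWS SDK logger prints the error code in single quotes.
        if f"'{code}'" in log_line:
            return cause
    return None


if __name__ == "__main__":
    line = "W tensorflow/core/platform/s3/aws_logging.cc:57] Encountered AWSError 'AccessDenied': Access Denied"
    print(diagnose(line))
```

Lines that carry only the HTTP response code (301 or 403) return None here, since the code alone is less specific than the named error.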

Issue:

The current implementation of the TensorBoard controller does not mount AWS secrets and has no ConfigMap for passing environment variables to the TensorBoard pod.

https://github.com/kubeflow/kubeflow/blob/d224549f11b671c2ee9e97380e4525bb698c0a68/components/tensorboard-controller/controllers/tensorboard_controller.go#L252

Workaround:

This is not a good workaround, because you have to repeat it for every TensorBoard pod that you launch.

1] Create AWS secrets in the Kubeflow user namespace
(the IAM user must have S3 access to the bucket).
E.g.:

apiVersion: v1
kind: Secret
metadata:
  name: aws-secret
  namespace: <your_kubeflow_user_namespace>
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <base_64_key>
  AWS_SECRET_ACCESS_KEY: <base_64_secret>
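
The values under `data` in the Secret above must be base64-encoded. A minimal sketch of producing them in Python follows; `encode_secret_value` is a hypothetical helper name and the credentials shown are placeholders, not real keys.

```python
import base64


def encode_secret_value(raw: str) -> str:
    """Return the base64 form that a Secret's `data` field expects."""
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials for illustration only; substitute your own.
    print("AWS_ACCESS_KEY_ID:", encode_secret_value("EXAMPLE_KEY_ID"))
    print("AWS_SECRET_ACCESS_KEY:", encode_secret_value("EXAMPLE_SECRET"))
```

If you encode with the shell instead, use `echo -n` so that a trailing newline does not end up inside the encoded value.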

2] Launch a TensorBoard from the UI with an S3 object storage link
E.g.:
Name: <name_for_your_tensorboard>
Object Storage Link: s3://<your_bucket_name>
(The current KF deployment uses TensorBoard version 2.1.0.)

3] Edit the deployment for the TensorBoard pod that was just created:

kubectl edit deployment <name_for_your_tensorboard> -n <your_kubeflow_user_namespace>

Then add the following environment variables to the container (at the same level as args, command and image):

env:
- name: AWS_REGION
  value: <your_s3_bucket_region>
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: aws-secret
      key: AWS_ACCESS_KEY_ID
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: aws-secret
      key: AWS_SECRET_ACCESS_KEY
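
Since this edit has to be repeated for every TensorBoard, it could be scripted instead of done by hand. The sketch below builds a JSON Patch carrying the same three variables; `build_env_patch` is a hypothetical helper, and it assumes the TensorBoard container is the first container in the deployment and has no existing `env` list (a JSON Patch `add` on an existing `env` would replace it).

```python
import json


def build_env_patch(region: str, secret_name: str = "aws-secret") -> list:
    """Build a JSON Patch that sets `env` on the first container of a Deployment."""

    def from_secret(key: str) -> dict:
        # Reference a key of the Secret created in step 1.
        return {
            "name": key,
            "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
        }

    env = [
        {"name": "AWS_REGION", "value": region},
        from_secret("AWS_ACCESS_KEY_ID"),
        from_secret("AWS_SECRET_ACCESS_KEY"),
    ]
    # Assumption: the TensorBoard container is at index 0 of the pod template.
    return [{"op": "add", "path": "/spec/template/spec/containers/0/env", "value": env}]


if __name__ == "__main__":
    # "us-west-2" is an example region; use your bucket's region.
    print(json.dumps(build_env_patch("us-west-2")))
```

The printed JSON can then be passed to `kubectl patch deployment <name_for_your_tensorboard> -n <your_kubeflow_user_namespace> --type=json -p '<patch>'`. Check the deployment first to confirm the container index assumption holds.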

Now, if you open the UI of the TensorBoard you created, it should be working.

Actual solution:

Make code changes in the TensorBoard controller:

1] Modify the TensorBoard controller to accept a ConfigMap input so that users can specify environment variables
2] Mount AWS credentials, as is currently done for GCS

Need to work on this PR

surajkota added the enhancement label Dec 8, 2022
@surajkota
Contributor

Upstream issue: kubeflow/kubeflow#6493
