Skip to content
This repository has been archived by the owner on Jul 14, 2023. It is now read-only.

Improve mnist pipeline example #33

Closed
Jeffwan opened this issue Mar 29, 2020 · 5 comments
Closed

Improve mnist pipeline example #33

Jeffwan opened this issue Mar 29, 2020 · 5 comments

Comments

@Jeffwan
Copy link
Contributor

Jeffwan commented Mar 29, 2020

https://github.com/aws-samples/eks-kubeflow-workshop/blob/master/notebooks/05_Kubeflow_Pipeline/05_03_Pipeline_mnist.ipynb

  1. Consider to use random names for mnist-ui, mnist-model. Otherwise user has to delete resources to rerun the pipeline. This brings some challenges to get exact name for inference. There's a tradeoff and we should keep simplification in mind and make changes to improve experiences.

Reference: commands to delete resources

k delete deployment mnist-model  mnist-ui
k delete svc mnist-service  mnist-ui
k delete vs mnist-ui

Note: k is alias of kubectl -n kubeflow

  1. Currently, user doesn't have permission to check kubeflow namespace. Check why this doesn't work?
kubectl create --namespace=anonymous rolebinding --clusterrole=kubeflow-view --serviceaccount=anonymous:default-editor anonymous-kubeflow-view 

Otherwise, we just give user cluster-admin permission which is not recommended but works.

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: anonymous-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default-editor
  namespace: anonymous

  1. Provides Dockerfile and update python to write event logs, write an artifact to enable tensorboard

  2. Use high level KFP Op to construct TFjob instead of ResourceOp

  3. Consider to pass S3 MNIST training data to pipeline instead of download dataset in the code. Add instructions to replace S3 export bucket to user's

@Jeffwan
Copy link
Contributor Author

Jeffwan commented Mar 29, 2020

/cc @PatrickXYS @cfregly

@PatrickXYS
Copy link
Contributor

PatrickXYS commented Apr 1, 2020

Hi @Jeffwan, I was wondering do we really need have permission to check kubeflow namespace? Since this is required because the namespace we set in the notebook is kubeflow, what if we set the namespace as the namespace we created in kubeflow dashboard, then we don't need rolebinding.

For example, if I use below logic to find the namespace I'm using right now not simply kubeflow:

with open('/var/run/secrets/kubernetes.io/serviceaccount/namespace', 'r') as f:
    namespace = f.readline()

Then the SA user should have permission to check the namespace. We then set the namespace we got above in the pipeline:

def mnist_pipeline(
        name="mnist-{{workflow.uid}}",
        namespace=namespace,
        step="1000",
        s3bucketexportpath=""):

And it should be good to go, I've already tested it.

@Jeffwan
Copy link
Contributor Author

Jeffwan commented Apr 1, 2020

@PatrickXYS

Once multi-user pipeline is supported, this will be easier.

Currently, there're two different groups of containers

  1. Pipeline runners - by default they are running inside Kubeflow
  2. ResourceOps -> katib, tfjob, mnist deployment/svc. If pipeline-runner SA has permission, they can create resources in separate namespaces

I think there're few principles

  1. I think it's ok to have pipeline runners and applications in separate namespace.
  2. It would be better to have all resources running in user's namespace which means katib job, tfjob and all rest resources in this pipeline should be in user's own namespace.
with open('/var/run/secrets/kubernetes.io/serviceaccount/namespace', 'r') as f:
    namespace = f.readline()

The namespace here is still the pod namespace, the runner's pod is in kubeflow namespace, then it's kubeflow here. How do you use other namespace here?

@PatrickXYS
Copy link
Contributor

@Jeffwan The latest PR is able to address No.1, 3, 4. Will put more efforts to solve all the issues.

@Jeffwan
Copy link
Contributor Author

Jeffwan commented Apr 6, 2020

Let's close this issue now. If we have extra feedbacks, we can revisit the issue

@Jeffwan Jeffwan closed this as completed Apr 6, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants