IsADirectoryError: [Errno 21] Is a directory: '/mnt/models' #2876

Closed
jax79sg opened this issue Jan 25, 2021 · 1 comment
Labels
bug triage Needs to be triaged and prioritised accordingly


jax79sg commented Jan 25, 2021

Describe the bug

Deploying a simple sklearn model (SKLEARN_SERVER) fails: the classifier container crash-loops with IsADirectoryError: [Errno 21] Is a directory: '/mnt/models'.

To reproduce

NAMESPACE   NAME                                            READY   STATUS             RESTARTS   AGE
default     sklearn-default-0-classifier-7bf86c744f-gdms2   0/2     CrashLoopBackOff   1          8s

> cat sklearn.yaml

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: sklearn
spec:
  name: digit-predict
  protocol: kfserving
  predictors:
  - graph:
      children: []
      implementation: SKLEARN_SERVER
      modelUri: s3://models/digits.joblib
      envSecretRefName: seldon-init-container-secret
      name: classifier
      parameters:
        - name: method
          type: STRING
          value: predict
    name: default
    replicas: 1
> kubectl apply -f sklearn.yaml 
seldondeployment.machinelearning.seldon.io/sklearn created
> kubectl describe pod sklearn-default-0-classifier-7bf86c744f-gdms2 -n default

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  102s               default-scheduler  Successfully assigned default/sklearn-default-0-classifier-7bf86c744f-gdms2 to kubenode2
  Normal   Pulled     95s                kubelet            Container image "gcr.io/kfserving/storage-initializer:v0.4.0" already present on machine
  Normal   Created    95s                kubelet            Created container classifier-model-initializer
  Normal   Started    95s                kubelet            Started container classifier-model-initializer
  Normal   Pulled     93s                kubelet            Container image "docker.io/seldonio/seldon-core-executor:1.5.1" already present on machine
  Normal   Created    92s                kubelet            Created container seldon-container-engine
  Normal   Started    92s                kubelet            Started container seldon-container-engine
  Normal   Pulled     76s (x3 over 93s)  kubelet            Container image "seldonio/mlserver:0.1.1" already present on machine
  Normal   Created    76s (x3 over 93s)  kubelet            Created container classifier
  Normal   Started    76s (x3 over 93s)  kubelet            Started container classifier
  Warning  BackOff    58s (x6 over 90s)  kubelet            Back-off restarting failed container
  Warning  Unhealthy  56s (x4 over 71s)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503
> kubectl logs sklearn-default-0-classifier-7bf86c744f-gdms2 -n default classifier

Traceback (most recent call last):
  File "/usr/local/bin/mlserver", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/mlserver/cli/main.py", line 61, in main
    root()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/mlserver/cli/main.py", line 18, in wrapper
    return asyncio.run(f(*args, **kwargs))
  File "/usr/local/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.7/site-packages/mlserver/cli/main.py", line 57, in start
    await server.start(models)
  File "/usr/local/lib/python3.7/site-packages/mlserver/server.py", line 28, in start
    await asyncio.gather(*load_tasks)
  File "/usr/local/lib/python3.7/site-packages/mlserver/repository.py", line 23, in load
    await model.load()
  File "/usr/local/lib/python3.7/site-packages/mlserver/models/sklearn.py", line 37, in load
    self._model = joblib.load(model_uri)
  File "/usr/local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 577, in load
    with open(filename, 'rb') as f:
IsADirectoryError: [Errno 21] Is a directory: '/mnt/models'
> kubectl logs sklearn-default-0-classifier-7bf86c744f-gdms2 -n default seldon-container-engine

{"level":"error","ts":1611537677.303898,"logger":"SeldonRestApi","msg":"Ready check failed","error":"dial tcp 127.0.0.1:9000: connect: connection refused","stacktrace":"github.com/seldonio/seldon-core/executor/api/rest.(*SeldonRestApi).checkReady\n\t/workspace/api/rest/server.go:177\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2036\ngithub.com/seldonio/seldon-core/executor/api/rest.handleCORSRequests.func1\n\t/workspace/api/rest/middlewares.go:64\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2036\ngithub.com/gorilla/mux.CORSMethodMiddleware.func1.1\n\t/go/pkg/mod/github.com/gorilla/mux@v1.8.0/middleware.go:51\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2036\ngithub.com/seldonio/seldon-core/executor/api/rest.xssMiddleware.func1\n\t/workspace/api/rest/middlewares.go:87\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2036\ngithub.com/seldonio/seldon-core/executor/api/rest.(*CloudeventHeaderMiddleware).Middleware.func1\n\t/workspace/api/rest/middlewares.go:47\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2036\ngithub.com/seldonio/seldon-core/executor/api/rest.puidHeader.func1\n\t/workspace/api/rest/middlewares.go:79\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2036\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\t/go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2831\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1919"}
> kubectl logs sklearn-default-0-classifier-7bf86c744f-gdms2 -n default classifier-model-initializer
[I 210125 01:17:34 initializer-entrypoint:13] Initializing, args: src_uri [s3://models/digits.joblib] dest_path[ [/mnt/models]
[I 210125 01:17:34 storage:35] Copying contents of s3://models/digits.joblib to local
[I 210125 01:17:34 storage:60] Successfully copied s3://models/digits.joblib to /mnt/models
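The initializer log and the traceback fit together: the storage initializer copies the single object s3://models/digits.joblib *into* the folder /mnt/models, and the server then passes the folder path itself to joblib.load, whose first step (per the traceback) is open(filename, 'rb'). The stdlib-only sketch below reproduces that sequence locally on Linux. `simulate_initializer` and `load_like_joblib` are hypothetical stand-ins, not the actual storage-initializer or joblib code, and the file content is a placeholder rather than a real pickle.

```python
import errno
import os
import shutil
import tempfile


def simulate_initializer(src_file: str, dest_dir: str) -> str:
    """Hypothetical stand-in for the storage initializer: copy one file INTO dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    # shutil.copy into a directory keeps the basename, so this returns dest_dir/digits.joblib
    return shutil.copy(src_file, dest_dir)


def load_like_joblib(path: str) -> bytes:
    """Hypothetical stand-in for joblib.load: per the traceback, it starts with open(path, 'rb')."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "digits.joblib")
        with open(src, "wb") as f:
            f.write(b"placeholder bytes, not a real pickle")

        mount = os.path.join(tmp, "mnt", "models")
        print("copied to:", simulate_initializer(src, mount))

        try:
            # Pass the folder, as the server did with /mnt/models
            load_like_joblib(mount)
        except IsADirectoryError as exc:
            # On Linux this is errno 21 (EISDIR), matching the crash in the issue
            print("load failed:", exc, "| EISDIR:", exc.errno == errno.EISDIR)
```

On Windows, opening a directory raises PermissionError instead, so the errno check in the sketch is Linux-specific, like the container in the issue.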

Environment

  • Cloud Provider: Bare Metal
  • Kubernetes Cluster Version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.6", GitCommit:"fbf646b339dc52336b55d8ec85c181981b86331a", GitTreeState:"clean", BuildDate:"2020-12-18T12:09:30Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.6", GitCommit:"fbf646b339dc52336b55d8ec85c181981b86331a", GitTreeState:"clean", BuildDate:"2020-12-18T12:01:36Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
  • Deployed Seldon System Images:
          value: docker.io/seldonio/engine:1.5.1
          value: docker.io/seldonio/seldon-core-executor:1.5.1
        image: docker.io/seldonio/seldon-core-operator:1.5.1

Model Details

A simple joblib file from an sklearn digits task, stored at s3://models/digits.joblib.

jax79sg added the bug and triage labels Jan 25, 2021
@ukclivecox
Contributor

You need to point to the folder containing the pickled model, so not s3://models/digits.joblib but s3://models.
Also, the file needs to be called model.joblib; see https://docs.seldon.io/projects/seldon-core/en/latest/servers/sklearn.html#prerequisites
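Put together, the reply means the corrected deployment uses `modelUri: s3://models`, with the object stored as `s3://models/model.joblib`. As a hedged, stdlib-only sketch of why the name matters, the hypothetical `resolve_model_file` helper below checks a local copy of the model folder against the layout the reply describes: a folder containing `model.joblib`. It is not MLServer's actual lookup code; only the expected filename comes from the reply and the linked prerequisites page.

```python
import os
import tempfile

# Taken from the maintainer's reply: the server expects this exact filename.
EXPECTED_FILENAME = "model.joblib"


def resolve_model_file(model_dir: str) -> str:
    """Hypothetical pre-flight check: return model_dir/model.joblib or raise a descriptive error."""
    if not os.path.isdir(model_dir):
        raise NotADirectoryError(f"{model_dir!r} is not a folder; modelUri should point to the folder")
    candidate = os.path.join(model_dir, EXPECTED_FILENAME)
    if not os.path.isfile(candidate):
        found = sorted(os.listdir(model_dir))
        raise FileNotFoundError(f"{EXPECTED_FILENAME} not found in {model_dir!r} (found: {found})")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as models:
        # Same layout as in the issue: the folder holds digits.joblib
        open(os.path.join(models, "digits.joblib"), "wb").close()
        try:
            resolve_model_file(models)
        except FileNotFoundError as exc:
            print("before rename:", exc)

        # The fix from the reply: rename the file to model.joblib
        os.rename(os.path.join(models, "digits.joblib"), os.path.join(models, EXPECTED_FILENAME))
        print("after rename:", resolve_model_file(models))
```

Run against a folder that only contains `digits.joblib`, the helper reports the missing `model.joblib` by name; after the rename it returns the file path. Passing the file path itself (the original `modelUri` mistake) raises NotADirectoryError.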

jax79sg closed this as completed Jan 26, 2021