Deployment issue on AWS #3077
Can you clarify the YAML and the actual issue you are seeing in more detail? Are you sure this is not an Ambassador setup issue on AWS?
The YAML file is a very simple Seldon deployment which I copied from the Seldon docs. I just added the secret and my Docker image, and changed the type to GRPC.
@dimasheva1 are you sending the requests through the load balancer? We test Ambassador in integration tests and have production environments on EKS with no issues there, so this seems like a cluster configuration issue or confusion.
It may be easier to ask these questions on Slack, as you'll get quicker answers to user questions there. Will close; we can reopen if we confirm the issue on Slack.
Describe the bug
I get a Whitelabel Error Page when I open the Ambassador endpoint of my model on AWS (the Istio endpoint doesn't work because Seldon Core cannot be deployed with it). The endpoint http://localhost:9000/api/v1.0/predictions works fine for predictions locally via a POST request.
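For reference, the local request that works can be reproduced with a short script. The payload shape below follows Seldon's `ndarray` convention, but the actual feature vector for the Titanic model is an assumption:

```python
import json
import urllib.request

# Hypothetical payload in Seldon's ndarray format; the feature values
# are placeholders, not the model's real input schema.
payload = {"data": {"ndarray": [[3, 1, 22.0, 7.25]]}}

# Build (but do not send) the POST request against the local microservice.
req = urllib.request.Request(
    "http://localhost:9000/api/v1.0/predictions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

Sending this with `urllib.request.urlopen(req)` against the locally running container is what succeeds; the failure only appears when going through the cluster ingress.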
To reproduce
```
--set image.repository=quay.io/datawire/ambassador \
--set enableAES=false \
--set crds.keep=false \
--namespace ambassador
```
Expected behaviour
View Standardized User Interface
Environment
```
value: 403495124976.dkr.ecr.us-east-1.amazonaws.com/cc92bd08-3aee-4006-983a-b74fbf1cbfa8/cg-2585947346/seldonio/engine:0.5.0-latest
image: 403495124976.dkr.ecr.us-east-1.amazonaws.com/cc92bd08-3aee-4006-983a-b74fbf1cbfa8/cg-2585947346/seldonio/seldon-core-operator:0.5.0-latest
```
Model Details
```dockerfile
FROM seldonio/seldon-core-s2i-python3:1.7.0-dev
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 5000
EXPOSE 6000
EXPOSE 9000
ENV MODEL_NAME Titanic
ENV SERVICE_TYPE MODEL
ENV PERSISTENCE 0
CMD exec seldon-core-microservice $MODEL_NAME --service-type $SERVICE_TYPE --persistence $PERSISTENCE
```
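Since `MODEL_NAME` is `Titanic`, `seldon-core-microservice` will import a `Titanic.py` module and look for a class of the same name. A minimal sketch of such a wrapper follows; the file contents, feature handling, and return values here are illustrative, not the user's actual code:

```python
# Titanic.py -- minimal Seldon Python wrapper sketch (hypothetical).
class Titanic:
    def __init__(self):
        # A real wrapper would load model artifacts here,
        # e.g. a pickled sklearn pipeline or a Keras model.
        self.ready = True

    def predict(self, X, features_names=None):
        # X is a list/array of feature rows; return one prediction per row.
        # A constant survival probability stands in for real inference.
        return [[0.5] for _ in X]
```

The wrapper itself has no Seldon dependency; the s2i base image supplies the gRPC and REST servers around it.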
```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  name: test-deployment
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: dimasheva1/seldon:latest
          name: classifier
        imagePullSecrets:
        - name: regcred
    graph:
      children: []
      endpoint:
        type: GRPC
      name: classifier
      type: MODEL
    name: example
    replicas: 1
```
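One thing worth checking: through Ambassador, Seldon Core exposes the model under a prefixed path rather than the bare `/api/v1.0/predictions` route, and hitting the engine on the wrong path is a common way to get a Whitelabel Error Page (it comes from the engine's Spring Boot server). A sketch of the expected URL shape, where the load balancer hostname and namespace are assumptions for this cluster:

```python
# Build the Seldon-through-Ambassador prediction URL.
ambassador_host = "a1b2c3.elb.amazonaws.com"  # hypothetical ELB hostname
namespace = "default"                         # assumed namespace
deployment = "seldon-model"                   # metadata.name of the deployment

url = (
    f"http://{ambassador_host}/seldon/{namespace}/{deployment}"
    f"/api/v1.0/predictions"
)
```

Note also that with `endpoint.type: GRPC` the graph node is served over gRPC, so a plain REST POST against it may not behave the same as the local test did.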
2021-03-19 09:34:56.922094: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2021-03-19 09:34:56.922143: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-03-19 09:34:59,344 - seldon_core.microservice:main:203 - INFO: Starting microservice.py:main
2021-03-19 09:34:59,344 - seldon_core.microservice:main:204 - INFO: Seldon Core version: 1.7.0-dev
2021-03-19 09:34:59,347 - seldon_core.microservice:main:332 - INFO: Parse JAEGER_EXTRA_TAGS []
2021-03-19 09:34:59,347 - seldon_core.microservice:load_annotations:155 - INFO: Found annotation kubernetes.io/config.seen:2021-03-19T09:34:08.084412441Z
2021-03-19 09:34:59,347 - seldon_core.microservice:load_annotations:155 - INFO: Found annotation kubernetes.io/config.source:api
2021-03-19 09:34:59,347 - seldon_core.microservice:load_annotations:155 - INFO: Found annotation kubernetes.io/psp:eks.privileged
2021-03-19 09:34:59,347 - seldon_core.microservice:load_annotations:155 - INFO: Found annotation prometheus.io/path:prometheus
2021-03-19 09:34:59,347 - seldon_core.microservice:load_annotations:155 - INFO: Found annotation prometheus.io/port:8000
2021-03-19 09:34:59,347 - seldon_core.microservice:load_annotations:155 - INFO: Found annotation prometheus.io/scrape:true
2021-03-19 09:34:59,347 - seldon_core.microservice:main:335 - INFO: Annotations: {'kubernetes.io/config.seen': '2021-03-19T09:34:08.084412441Z', 'kubernetes.io/config.source': 'api', 'kubernetes.io/psp': 'eks.privileged', 'prometheus.io/path': 'prometheus', 'prometheus.io/port': '8000', 'prometheus.io/scrape': 'true'}
2021-03-19 09:34:59,347 - seldon_core.microservice:main:339 - INFO: Importing Titanic
2021-03-19 09:34:59,374 - seldon_core.microservice:main:422 - INFO: REST gunicorn microservice running on port 9000
2021-03-19 09:34:59,376 - seldon_core.microservice:main:476 - INFO: REST metrics microservice running on port 6000
2021-03-19 09:34:59,376 - seldon_core.microservice:main:486 - INFO: Starting servers
2021-03-19 09:34:59,392 - seldon_core.wrapper:_set_flask_app_configs:213 - INFO: App Config: <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
2021-03-19 09:34:59,411 - seldon_core.wrapper:_set_flask_app_configs:213 - INFO: App Config: <Config {'ENV': 'production', 'DEBUG': False, 'TESTING': False, 'PROPAGATE_EXCEPTIONS': None, 'PRESERVE_CONTEXT_ON_EXCEPTION': None, 'SECRET_KEY': None, 'PERMANENT_SESSION_LIFETIME': datetime.timedelta(days=31), 'USE_X_SENDFILE': False, 'SERVER_NAME': None, 'APPLICATION_ROOT': '/', 'SESSION_COOKIE_NAME': 'session', 'SESSION_COOKIE_DOMAIN': None, 'SESSION_COOKIE_PATH': None, 'SESSION_COOKIE_HTTPONLY': True, 'SESSION_COOKIE_SECURE': False, 'SESSION_COOKIE_SAMESITE': None, 'SESSION_REFRESH_EACH_REQUEST': True, 'MAX_CONTENT_LENGTH': None, 'SEND_FILE_MAX_AGE_DEFAULT': datetime.timedelta(seconds=43200), 'TRAP_BAD_REQUEST_ERRORS': None, 'TRAP_HTTP_EXCEPTIONS': False, 'EXPLAIN_TEMPLATE_LOADING': False, 'PREFERRED_URL_SCHEME': 'http', 'JSON_AS_ASCII': True, 'JSON_SORT_KEYS': True, 'JSONIFY_PRETTYPRINT_REGULAR': False, 'JSONIFY_MIMETYPE': 'application/json', 'TEMPLATES_AUTO_RELOAD': None, 'MAX_COOKIE_SIZE': 4093}>
[2021-03-19 09:34:59 +0000] [1] [INFO] Starting gunicorn 20.0.4
[2021-03-19 09:34:59 +0000] [1] [INFO] Listening at: http://0.0.0.0:9000 (1)
[2021-03-19 09:34:59 +0000] [1] [INFO] Using worker: threads
[2021-03-19 09:34:59 +0000] [28] [INFO] Booting worker with pid: 28
2021-03-19 09:34:59,451 - seldon_core.gunicorn_utils:load:88 - INFO: Tracing branch is active
[2021-03-19 09:34:59 +0000] [22] [INFO] Starting gunicorn 20.0.4
[2021-03-19 09:34:59 +0000] [22] [INFO] Listening at: http://0.0.0.0:6000 (22)
[2021-03-19 09:34:59 +0000] [22] [INFO] Using worker: sync
2021-03-19 09:34:59,458 - seldon_core.utils:setup_tracing:724 - INFO: Initializing tracing
[2021-03-19 09:34:59 +0000] [30] [INFO] Booting worker with pid: 30
2021-03-19 09:34:59.505769: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-03-19 09:34:59.505886: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2021-03-19 09:34:59.505961: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (test-deployment-example-50a2328-db496c484-fw2mz): /proc/driver/nvidia/version does not exist
2021-03-19 09:34:59.509202: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-19 09:34:59.522083: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499995000 Hz
2021-03-19 09:34:59.522499: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557139c42470 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-19 09:34:59.522608: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-03-19 09:34:59,542 - seldon_core.utils:setup_tracing:731 - INFO: Using default tracing config
2021-03-19 09:34:59,542 - jaeger_tracing:_create_local_agent_channel:446 - INFO: Initializing Jaeger Tracer with UDP reporter
2021-03-19 09:34:59,545 - jaeger_tracing:new_tracer:384 - INFO: Using sampler ConstSampler(True)
2021-03-19 09:34:59,550 - jaeger_tracing:_initialize_global_tracer:436 - INFO: opentracing.tracer initialized to <jaeger_client.tracer.Tracer object at 0x7fa808cc2a10>[app_name=Titanic]
2021-03-19 09:34:59,550 - seldon_core.gunicorn_utils:load:93 - INFO: Set JAEGER_EXTRA_TAGS []
2021-03-19 09:34:59.624040: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2021-03-19 09:34:59.624092: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2021-03-19 09:34:59.624132: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (test-deployment-example-50a2328-db496c484-fw2mz): /proc/driver/nvidia/version does not exist
2021-03-19 09:34:59.624483: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-03-19 09:34:59.651544: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2499995000 Hz
2021-03-19 09:34:59.652034: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557139c42470 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-03-19 09:34:59.652060: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-03-19 09:34:59,852 - seldon_core.microservice:grpc_prediction_server:452 - INFO: GRPC microservice Running on port 5000
2021-03-19 09:35:04.387 INFO 6 --- [ main] i.s.e.App : Starting App v0.5.0 on test-deployment-example-50a2328-db496c484-fw2mz with PID 6 (/app.jar started by ? in /)
2021-03-19 09:35:04.400 INFO 6 --- [ main] i.s.e.App : No active profile set, falling back to default profiles: default
2021-03-19 09:35:06.396 INFO 6 --- [ main] i.s.e.c.CustomizationBean : Customizing EmbeddedServlet
2021-03-19 09:35:06.398 INFO 6 --- [ main] i.s.e.c.CustomizationBean : FOUND env var [ENGINE_SERVER_PORT], will use for engine server port
2021-03-19 09:35:06.398 INFO 6 --- [ main] i.s.e.c.CustomizationBean : setting serverPort[8000]
2021-03-19 09:35:07.333 WARN 6 --- [ main] o.s.h.c.j.Jackson2ObjectMapperBuilder : For Jackson Kotlin classes support please add "com.fasterxml.jackson.module:jackson-module-kotlin" to the classpath
2021-03-19 09:35:07.513 INFO 6 --- [ main] i.s.e.p.EnginePredictor : init
2021-03-19 09:35:07.514 INFO 6 --- [ main] i.s.e.p.EnginePredictor : FOUND env var [ENGINE_PREDICTOR], will use for engine predictor
2021-03-19 09:35:08.040 INFO 6 --- [ main] i.s.e.p.EnginePredictor : Setting deployment name to test-deployment
2021-03-19 09:35:08.066 INFO 6 --- [ main] i.s.e.p.EnginePredictor : Installed engine predictor: {"name":"example","graph":{"name":"classifier","children":[],"type":"MODEL","implementation":"UNKNOWN_IMPLEMENTATION","methods":[],"endpoint":{"service_host":"localhost","service_port":9000,"type":"GRPC"},"parameters":[],"modelUri":"","serviceAccountName":"","envSecretRefName":""},"componentSpecs":[{"metadata":{"name":"","generateName":"","namespace":"","selfLink":"","uid":"","resourceVersion":"","generation":0,"deletionGracePeriodSeconds":0,"labels":{},"annotations":{},"ownerReferences":[],"finalizers":[],"clusterName":""},"spec":{"volumes":[],"containers":[{"name":"classifier","image":"dimasheva1/seldon:latest","command":[],"args":[],"workingDir":"","ports":[{"name":"grpc","hostPort":0,"containerPort":9000,"protocol":"TCP","hostIP":""}],"env":[{"name":"PREDICTIVE_UNIT_SERVICE_PORT","value":"9000"},{"name":"PREDICTIVE_UNIT_ID","value":"classifier"},{"name":"PREDICTOR_ID","value":"example"},{"name":"SELDON_DEPLOYMENT_ID","value":"seldon-model"}],"resources":{"limits":{},"requests":{}},"volumeMounts":[{"name":"podinfo","readOnly":false,"mountPath":"/etc/podinfo","subPath":"","mountPropagation":""}],"livenessProbe":{"initialDelaySeconds":60,"timeoutSeconds":1,"periodSeconds":5,"successThreshold":1,"failureThreshold":3},"readinessProbe":{"initialDelaySeconds":20,"timeoutSeconds":1,"periodSeconds":5,"successThreshold":1,"failureThreshold":3},"lifecycle":{"preStop":{"exec":{"command":["/bin/sh","-c","/bin/sleep 
10"]}}},"terminationMessagePath":"/dev/termination-log","imagePullPolicy":"IfNotPresent","stdin":false,"stdinOnce":false,"tty":false,"envFrom":[],"terminationMessagePolicy":"File","volumeDevices":[]}],"restartPolicy":"","terminationGracePeriodSeconds":0,"activeDeadlineSeconds":0,"dnsPolicy":"","nodeSelector":{},"serviceAccountName":"","serviceAccount":"","nodeName":"","hostNetwork":false,"hostPID":false,"hostIPC":false,"imagePullSecrets":[{"name":"regcred"}],"hostname":"","subdomain":"","schedulerName":"","initContainers":[],"automountServiceAccountToken":false,"tolerations":[],"hostAliases":[],"priorityClassName":"","priority":0,"shareProcessNamespace":false,"readinessGates":[],"runtimeClassName":"","enableServiceLinks":false}}],"replicas":1,"annotations":{},"engineResources":{"limits":{},"requests":{}},"labels":{"version":"example"},"svcOrchSpec":{"env":[]},"traffic":0,"explainer":{"type":"","modelUri":"","serviceAccountName":"","envSecretRefName":"","containerSpec":{"name":"","image":"","command":[],"args":[],"workingDir":"","ports":[],"env":[],"resources":{"limits":{},"requests":{}},"volumeMounts":[],"terminationMessagePath":"","imagePullPolicy":"","stdin":false,"stdinOnce":false,"tty":false,"envFrom":[],"terminationMessagePolicy":"","volumeDevices":[]}}}
2021-03-19 09:35:08.075 INFO 6 --- [ main] i.s.e.c.AnnotationsConfig : Annotations {kubernetes.io/config.source=api, kubernetes.io/psp=eks.privileged, kubernetes.io/config.seen=2021-03-19T09:34:08.084412441Z, prometheus.io/path=prometheus, prometheus.io/port=8000, prometheus.io/scrape=true}
2021-03-19 09:35:08.079 INFO 6 --- [ main] i.s.e.t.TracingProvider : Not activating tracing
2021-03-19 09:35:08.080 INFO 6 --- [ main] i.s.e.s.InternalPredictionService : REST Connection timeout set to 200
2021-03-19 09:35:08.081 INFO 6 --- [ main] i.s.e.s.InternalPredictionService : REST read timeout set to 5000
2021-03-19 09:35:08.390 INFO 6 --- [ main] i.s.e.s.InternalPredictionService : gRPC max message size set to 4194304
2021-03-19 09:35:08.391 INFO 6 --- [ main] i.s.e.s.InternalPredictionService : gRPC read timeout set to 5000
2021-03-19 09:35:08.391 INFO 6 --- [ main] i.s.e.s.InternalPredictionService : REST retries set to 3
2021-03-19 09:35:08.524 INFO 6 --- [ main] i.s.e.g.SeldonGrpcServer : FOUND env var [ENGINE_SERVER_GRPC_PORT], will use engine server port 5001
2021-03-19 09:35:08.843 INFO 6 --- [ task-1] i.s.e.g.SeldonGrpcServer : Starting grpc server
2021-03-19 09:35:09.111 INFO 6 --- [ task-1] i.s.e.g.SeldonGrpcServer : Server started, listening on 5001
2021-03-19 09:35:09.780 INFO 6 --- [ main] i.s.e.App : Started App in 6.037 seconds (JVM running for 7.275)