Tracing config (#225)
* add tracing configuration

* add helm chart updates

* add initial scheduler tracing

* review fixes

* review comments - interface and config semantics

* review fixes
ukclivecox committed Jun 2, 2022
1 parent 071cdce commit e601cae
Showing 49 changed files with 728 additions and 252 deletions.
6 changes: 3 additions & 3 deletions Makefile
@@ -20,9 +20,9 @@ deploy-k8s:

.PHONY: undeploy-k8s
undeploy-k8s:
kubectl delete --ignore-not-found=true -f k8s/seldon-v2-servers.yaml
kubectl delete --ignore-not-found=true -f k8s/seldon-v2-components.yaml
kubectl delete --ignore-not-found=true -f k8s/seldon-v2-crds.yaml
kubectl delete --ignore-not-found=true -f k8s/yaml/seldon-v2-servers.yaml
kubectl delete --ignore-not-found=true -f k8s/yaml/seldon-v2-components.yaml
kubectl delete --ignore-not-found=true -f k8s/yaml/seldon-v2-crds.yaml

#
# Dev
28 changes: 27 additions & 1 deletion docs/source/contents/getting-started/configuration/index.md
@@ -20,8 +20,34 @@ The top level keys are:

### Kubernetes

For Kubernetes this is controlled via a ConfigMap call `seldon-kafka` whose default value is shown below:
For Kubernetes this is controlled via a ConfigMap called `seldon-kafka` whose default value is shown below:

```{literalinclude} ../../../../../scheduler/k8s/config/kafka.yaml
:language: yaml
```

## Tracing Configuration

We allow configuration of tracing. This file looks like:

```{literalinclude} ../../../../../scheduler/config/tracing-internal.json
:language: json
```

The top level keys are:

* `enable` : whether to enable tracing
* `otelExporterEndpoint` : The host and port for the OTEL exporter
* `ratio` : The ratio of requests to trace. Takes values between 0 and 1 inclusive.



### Kubernetes

For Kubernetes this is controlled via a ConfigMap called `seldon-tracing` whose default value is shown below:

```{literalinclude} ../../../../../scheduler/k8s/config/tracing.yaml
:language: yaml
```

At present Java instrumentation (for the dataflow engine) is duplicated via separate keys.
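
The three tracing keys documented in this hunk (`enable`, `otelExporterEndpoint`, `ratio`) could be loaded and validated roughly as follows. This is a minimal Python sketch, not the project's own loader: the function names `load_tracing_config` and `should_trace`, the sample values, the defaults used for missing keys, and the trace-ID-based sampling rule are all assumptions made for illustration. The sample endpoint reuses the collector address that appears later in this diff.

```python
import json

# Hypothetical sample; the real tracing-internal.json is not shown in this diff.
SAMPLE = (
    '{"enable": true, '
    '"otelExporterEndpoint": "seldon-collector.seldon-mesh:4317", '
    '"ratio": 0.5}'
)


def load_tracing_config(text: str) -> dict:
    """Parse a tracing config and check the three documented top-level keys."""
    cfg = json.loads(text)
    enable = cfg.get("enable", False)               # assumed default
    endpoint = cfg.get("otelExporterEndpoint", "")  # assumed default
    ratio = float(cfg.get("ratio", 0.0))            # accepts a number or a numeric string
    if not isinstance(enable, bool):
        raise ValueError("enable must be a boolean")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1 inclusive")
    if enable and not endpoint:
        raise ValueError("otelExporterEndpoint is required when tracing is enabled")
    return {"enable": enable, "otelExporterEndpoint": endpoint, "ratio": ratio}


def should_trace(cfg: dict, trace_id: int) -> bool:
    """Deterministically keep roughly `ratio` of 64-bit trace IDs (assumed rule)."""
    if not cfg["enable"]:
        return False
    bound = int(cfg["ratio"] * (1 << 64))
    return (trace_id & ((1 << 64) - 1)) < bound
```

With `ratio` set to 1 every request is traced, and with 0 none are, which matches the "between 0 and 1 inclusive" wording in the documentation.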
11 changes: 11 additions & 0 deletions k8s/helm-charts/seldon-core-v2-crds/templates/seldon-v2-crds.yaml
@@ -494,6 +494,8 @@ spec:
type: string
type: array
stepsJoin:
description: One of inner (default), outer, or any (see above
for details)
type: string
tensorMap:
additionalProperties:
@@ -507,6 +509,8 @@ spec:
items:
properties:
batch:
description: Batch size of request required before data will
be sent to this step
properties:
rolling:
type: boolean
@@ -523,6 +527,11 @@ spec:
type: string
type: array
inputsJoinType:
description: 'One of inner (default), outer, or any inner -
do an inner join: data must be available from all inputs outer
- do an outer join: data will include any data from any inputs
at end of window any - first data input that arrives will
be forwarded'
type: string
joinWindowMs:
description: msecs to wait for messages from multiple inputs
@@ -544,6 +553,8 @@ spec:
type: string
type: array
triggersJoinType:
description: One of inner (default), outer, or any (see above
for details)
type: string
required:
- name
9 changes: 9 additions & 0 deletions k8s/helm-charts/seldon-core-v2-setup/templates/agent.yaml
@@ -0,0 +1,9 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: seldon-agent
namespace: seldon-mesh
data:
agent.yaml: |-
rclone:
config_secrets: ["seldon-rclone-gs-public"]
20 changes: 20 additions & 0 deletions k8s/helm-charts/seldon-core-v2-setup/templates/kafka.yaml
@@ -0,0 +1,20 @@
apiVersion: v1
kind: ConfigMap
metadata:
name: seldon-kafka
namespace: seldon-mesh
data:
kafka.json: |-
{
"bootstrap.servers": "{{ .Values.kafka.bootstrap }}",
"consumer":{
"session.timeout.ms": {{ .Values.kafka.consumer.sessionTimeoutMs }},
"auto.offset.reset": "{{ .Values.kafka.consumer.autoOffsetReset }}"
},
"producer":{
"linger.ms": {{ .Values.kafka.producer.lingerMs }},
"message.max.bytes": {{ int .Values.kafka.producer.messageMaxBytes }}
},
"streams":{
}
}
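
To make the templated `kafka.json` above concrete, the following Python sketch builds the same JSON shape from a values dictionary. It does not run Helm. The `VALUES` defaults are an assumption copied from the static `seldon-kafka` ConfigMap that this commit removes, since the chart's `values.yaml` is not part of this excerpt, and `render_kafka_json` is a hypothetical helper name. The `int(...)` call mirrors the template's `int` cast on `messageMaxBytes`, which is presumably there to keep a large value from being rendered in floating-point notation.

```python
import json

# Assumed chart values, mirroring the static ConfigMap this commit removes.
VALUES = {
    "bootstrap": "seldon-kafka-plain-bootstrap.kafka:9092",
    "consumer": {"sessionTimeoutMs": 6000, "autoOffsetReset": "earliest"},
    "producer": {"lingerMs": 0, "messageMaxBytes": 1000000000},
}


def render_kafka_json(values: dict) -> str:
    """Build the JSON shape the Helm template emits (illustration only)."""
    doc = {
        "bootstrap.servers": values["bootstrap"],
        "consumer": {
            "session.timeout.ms": int(values["consumer"]["sessionTimeoutMs"]),
            "auto.offset.reset": values["consumer"]["autoOffsetReset"],
        },
        "producer": {
            "linger.ms": int(values["producer"]["lingerMs"]),
            # Mirrors the template's `int` cast: 1e9 as a float becomes 1000000000.
            "message.max.bytes": int(values["producer"]["messageMaxBytes"]),
        },
        "streams": {},
    }
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(render_kafka_json(VALUES))
```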
@@ -0,0 +1,12 @@
apiVersion: v1
kind: Secret
metadata:
name: seldon-rclone-gs-public
namespace: seldon-mesh
type: Opaque
stringData:
gs: |
type: "google cloud storage"
name: gs
parameters:
anonymous: true
@@ -405,35 +405,6 @@ subjects:
namespace: seldon-mesh
---
apiVersion: v1
data:
agent.yaml: "rclone: \n config_secrets: [\"seldon-rclone-gs-public\"]"
kind: ConfigMap
metadata:
name: seldon-agent
namespace: seldon-mesh
---
apiVersion: v1
data:
kafka.json: |-
{
"bootstrap.servers": "seldon-kafka-plain-bootstrap.kafka:9092",
"consumer":{
"session.timeout.ms":6000,
"auto.offset.reset":"earliest"
},
"producer":{
"linger.ms":0,
"message.max.bytes":1000000000
},
"streams":{
}
}
kind: ConfigMap
metadata:
name: seldon-kafka
namespace: seldon-mesh
---
apiVersion: v1
data:
controller_manager_config.yaml: |
apiVersion: controller-runtime.sigs.k8s.io/v1alpha1
@@ -453,19 +424,6 @@ metadata:
namespace: seldon-mesh
---
apiVersion: v1
kind: Secret
metadata:
name: seldon-rclone-gs-public
namespace: seldon-mesh
stringData:
gs: |
type: "google cloud storage"
name: gs
parameters:
anonymous: true
type: Opaque
---
apiVersion: v1
kind: Service
metadata:
labels:
@@ -650,14 +608,22 @@ spec:
- env:
- name: SELDON_CORES_COUNT
value: '{{ .Values.dataflow.cores }}'
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: '{{ .Values.opentelemetry.endpoint }}'
- name: SELDON_KAFKA_BOOTSTRAP_SERVERS
value: seldon-kafka-plain-bootstrap.kafka:9092
- name: SELDON_UPSTREAM_HOST
value: seldon-scheduler
- name: SELDON_UPSTREAM_PORT
value: "9008"
- name: OTEL_JAVAAGENT_ENABLED
valueFrom:
configMapKeyRef:
key: OTEL_JAVAAGENT_ENABLED
name: seldon-tracing
- name: OTEL_EXPORTER_OTLP_ENDPOINT
valueFrom:
configMapKeyRef:
key: OTEL_EXPORTER_OTLP_ENDPOINT
name: seldon-tracing
image: '{{ .Values.dataflow.image.registry }}/{{ .Values.dataflow.image.repository
}}:{{ .Values.dataflow.image.tag }}'
imagePullPolicy: '{{ .Values.dataflow.image.pullPolicy }}'
@@ -724,7 +690,8 @@ spec:
- --scheduler-port=9004
- --envoy-host=seldon-mesh
- --envoy-port=80
- --config-path=/mnt/config/kafka.json
- --kafka-config-path=/mnt/kafka/kafka.json
- --tracing-config-path=/mnt/tracing/tracing.json
command:
- /bin/modelgateway
env:
@@ -739,16 +706,21 @@ spec:
cpu: '{{ .Values.modelgateway.resources.requests.cpu }}'
memory: '{{ .Values.modelgateway.resources.requests.memory }}'
volumeMounts:
- mountPath: /mnt/config
name: config-volume
- mountPath: /mnt/kafka
name: kafka-config-volume
- mountPath: /mnt/tracing
name: tracing-config-volume
securityContext:
runAsUser: 8888
serviceAccountName: seldon-scheduler
terminationGracePeriodSeconds: 5
volumes:
- configMap:
name: seldon-kafka
name: config-volume
name: kafka-config-volume
- configMap:
name: seldon-tracing
name: tracing-config-volume
---
apiVersion: apps/v1
kind: Deployment
@@ -772,12 +744,10 @@ spec:
- --http-port=9010
- --grpc-port=9011
- --metrics-port=9006
- --config-path=/mnt/config/kafka.json
- --kafka-config-path=/mnt/kafka/kafka.json
- --tracing-config-path=/mnt/tracing/tracing.json
command:
- /bin/pipelinegateway
env:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: '{{ .Values.opentelemetry.endpoint }}'
image: '{{ .Values.pipelinegateway.image.registry }}/{{ .Values.pipelinegateway.image.repository
}}:{{ .Values.pipelinegateway.image.tag }}'
imagePullPolicy: '{{ .Values.pipelinegateway.image.pullPolicy }}'
@@ -797,16 +767,21 @@ spec:
cpu: '{{ .Values.pipelinegateway.resources.requests.cpu }}'
memory: '{{ .Values.pipelinegateway.resources.requests.memory }}'
volumeMounts:
- mountPath: /mnt/config
name: config-volume
- mountPath: /mnt/kafka
name: kafka-config-volume
- mountPath: /mnt/tracing
name: tracing-config-volume
securityContext:
runAsUser: 8888
serviceAccountName: seldon-scheduler
terminationGracePeriodSeconds: 5
volumes:
- configMap:
name: seldon-kafka
name: config-volume
name: kafka-config-volume
- configMap:
name: seldon-tracing
name: tracing-config-volume
---
apiVersion: apps/v1
kind: StatefulSet
@@ -829,6 +804,7 @@ spec:
containers:
- args:
- --pipeline-gateway-host=seldon-pipelinegateway
- --tracing-config-path=/mnt/tracing/tracing.json
- --pipeline-db-path=/mnt/scheduler/pipelinedb
command:
- /bin/scheduler
@@ -849,12 +825,18 @@ spec:
cpu: 100m
memory: 200Mi
volumeMounts:
- mountPath: /mnt/tracing
name: tracing-config-volume
- mountPath: /mnt/scheduler
name: scheduler-state
securityContext:
runAsUser: 8888
serviceAccountName: seldon-scheduler
terminationGracePeriodSeconds: 5
volumes:
- configMap:
name: seldon-tracing
name: tracing-config-volume
volumeClaimTemplates:
- metadata:
name: scheduler-state
@@ -892,13 +874,15 @@ spec:
volumeMounts:
- mountPath: /mnt/agent
name: mlserver-models
- env:
- args:
- --tracing-config-path=/mnt/tracing/tracing.json
command:
- /bin/agent
env:
- name: SELDON_SERVER_CAPABILITIES
value: '{{ .Values.serverConfig.mlserver.serverCapabilities }}'
- name: SELDON_OVERCOMMIT_PERCENTAGE
value: '{{ .Values.serverConfig.mlserver.overcommitPercentage }}'
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: '{{ .Values.opentelemetry.endpoint }}'
- name: SELDON_SERVER_HTTP_PORT
value: "9000"
- name: SELDON_SERVER_GRPC_PORT
@@ -943,6 +927,8 @@ spec:
name: mlserver-models
- mountPath: /mnt/config
name: config-volume
- mountPath: /mnt/tracing
name: tracing-config-volume
- env:
- name: MLSERVER_HTTP_PORT
value: "9000"
@@ -1006,6 +992,9 @@ spec:
- configMap:
name: seldon-agent
name: config-volume
- configMap:
name: seldon-tracing
name: tracing-config-volume
volumeClaimTemplates:
- name: mlserver-models
spec:
@@ -1042,13 +1031,15 @@ spec:
volumeMounts:
- mountPath: /mnt/agent
name: triton-models
- env:
- args:
- --tracing-config-path=/mnt/tracing/tracing.json
command:
- /bin/agent
env:
- name: SELDON_SERVER_CAPABILITIES
value: '{{ .Values.serverConfig.triton.serverCapabilities }}'
- name: SELDON_OVERCOMMIT_PERCENTAGE
value: '{{ .Values.serverConfig.triton.overcommitPercentage }}'
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: '{{ .Values.opentelemetry.endpoint }}'
- name: SELDON_SERVER_HTTP_PORT
value: "9000"
- name: SELDON_SERVER_GRPC_PORT
@@ -1065,6 +1056,8 @@ spec:
value: "9006"
- name: SELDON_SERVER_TYPE
value: triton
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: seldon-collector.seldon-mesh:4317
- name: POD_NAME
valueFrom:
fieldRef:
@@ -1093,6 +1086,8 @@ spec:
name: triton-models
- mountPath: /mnt/config
name: config-volume
- mountPath: /mnt/tracing
name: tracing-config-volume
- args:
- --model-repository=$(SERVER_MODELS_DIR)
- --http-port=$(SERVER_HTTP_PORT)
@@ -1155,6 +1150,9 @@ spec:
- configMap:
name: seldon-agent
name: config-volume
- configMap:
name: seldon-tracing
name: tracing-config-volume
volumeClaimTemplates:
- name: triton-models
spec:
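
This commit renames `config-volume` to `kafka-config-volume` and adds `tracing-config-volume` across several workloads. Each `volumeMounts` entry must name a volume declared in the same pod spec, or Kubernetes rejects the pod. A small Python check along the following lines could catch a missed rename. It is a sketch under stated assumptions: `unmatched_mounts` is a hypothetical helper, the pod spec is passed as an already-parsed dictionary, and volumes supplied through `volumeClaimTemplates` are not considered.

```python
def unmatched_mounts(pod_spec: dict) -> list:
    """Return volumeMount names that no entry in `volumes` declares."""
    declared = {volume["name"] for volume in pod_spec.get("volumes", [])}
    missing = []
    for container in pod_spec.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in declared:
                missing.append(mount["name"])
    return missing


# Pod spec fragment modeled on the modelgateway Deployment after this commit.
MODELGATEWAY_POD = {
    "containers": [
        {
            "volumeMounts": [
                {"mountPath": "/mnt/kafka", "name": "kafka-config-volume"},
                {"mountPath": "/mnt/tracing", "name": "tracing-config-volume"},
            ]
        }
    ],
    "volumes": [
        {"configMap": {"name": "seldon-kafka"}, "name": "kafka-config-volume"},
        {"configMap": {"name": "seldon-tracing"}, "name": "tracing-config-volume"},
    ],
}

if __name__ == "__main__":
    print(unmatched_mounts(MODELGATEWAY_POD))
```

For StatefulSets such as the scheduler and the servers, the names under `volumeClaimTemplates` would need to be added to the declared set before running the same comparison.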