Skip to content

Conversation

@deadlycoconuts
Copy link
Collaborator

@deadlycoconuts deadlycoconuts commented Apr 1, 2022

Context

In PR #176, a change was introduced to merge the headers of the original request sent to Turing routers with that of the response returned from an enricher. The change in this PR complements the change previously made by introducing logging of the response headers of each response received by a Turing Router.

Modifications

  • engines/router/missionctl/handlers/logutils.go - Addition of a step to turn response headers into strings to be logged
  • engines/router/missionctl/log/resultlog/proto/turing/TuringResultLog.pb.go - Changes to the generated proto classes
  • engines/router/missionctl/log/resultlog/proto/turing/TuringResultLog.proto - Changes to the proto to include response headers
  • engines/router/missionctl/log/resultlog/resultlog.go - Changes to the AddResponse method to include response headers as an argument

@deadlycoconuts deadlycoconuts self-assigned this Apr 1, 2022
@deadlycoconuts deadlycoconuts requested a review from a team April 4, 2022 03:46
@deadlycoconuts deadlycoconuts marked this pull request as ready for review April 4, 2022 03:54
@deadlycoconuts deadlycoconuts marked this pull request as draft April 4, 2022 07:10
string response = 1;
// The JSON response body from a Turing component (Enricher / Experiment Engine / Router / Ensembler),
// UTF-8-encoded.
string body = 1;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed response as body because the addition of the new field header would create some ambiguity as to what response would really mean.

Copy link
Collaborator

@krithika369 krithika369 Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid point. But unfortunately, this will be a breaking change, not for the proto decoding itself, but rather, the downstream workflows that depend on it. For example, in most internal use cases, the Kafka messages are forwarded to BQ and the field name change would result in a change in the table as well.

Similarly, our own BQ logger uses the schema generated from this proto. This means, the next time users update their routers, they have to carry out a schema migration on the existing table as the only thing we handle automatically is addition of new columns (ref).

We can look into a schema migration if required (which we have done once in the past) but it would need the users to take actions to migrate their existing data / scripts as well. So we should avoid it unless critical.

Considering all of the above, I think header, response is not too outlandish and we can live with it.

Copy link
Collaborator

@krithika369 krithika369 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR. It is marked as Draft but I took a quick look anyway and left some comments.

The changes themselves look good. The biggest factor to consider is, how they would play with the existing logging set up - both BQ and Kafka. For example, if someone has a router deployed that's logging to BQ and they update the router, is the new field added to each of the components' records and are they written correctly? These are unit-tested but would be nice to carry out some e2e test as well before merging this PR, locally.

string response = 1;
// The JSON response body from a Turing component (Enricher / Experiment Engine / Router / Ensembler),
// UTF-8-encoded.
string body = 1;
Copy link
Collaborator

@krithika369 krithika369 Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a valid point. But unfortunately, this will be a breaking change, not for the proto decoding itself, but rather, the downstream workflows that depend on it. For example, in most internal use cases, the Kafka messages are forwarded to BQ and the field name change would result in a change in the table as well.

Similarly, our own BQ logger uses the schema generated from this proto. This means, the next time users update their routers, they have to carry out a schema migration on the existing table as the only thing we handle automatically is addition of new columns (ref).

We can look into a schema migration if required (which we have done once in the past) but it would need the users to take actions to migrate their existing data / scripts as well. So we should avoid it unless critical.

Considering all of the above, I think header, response is not too outlandish and we can live with it.

_, turingLogEntry := makeTestTuringResultLogEntry(t)
// Overwrite the turing request id value
turingLogEntry.TuringReqId = "testID"
// Manually remove the key-value pair corresponding to "Content-Type" in the router and enricher header to prevent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate more on this non-deterministic behavior? What changes? And why specifically Content-Encoding ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's essentially when the Header field of turingLogEntry, which is an (unordered) map of strings, get encoded, the order in which the items in Header get serialised is not guaranteed to always be the same, leading to some unit tests failing and not others. Since the default test log entry Header has 2 keys, one for Content-Encoding and the other for Content-Type, I simply removed removed one of them to ensure that the output's always the same.

Another option is to create another test log entry for this kafka test without using makeTestTuringResultLogEntry, or to make makeTestTuringResultLogEntry use only 1 item for the Header field, which would also simplify the other tests using it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.. In this case, what do you think about decoding the message using the proto and then using assert.JSONEq (as we do in TestNewJSONKafkaLogEntry) to compare?

Copy link
Collaborator Author

@deadlycoconuts deadlycoconuts Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that's possible, but I'm not sure if the intention of the TestNewProtobufKafkaLogEntry test was to test the byte encoding of the kafka log entry, as opposed to TestNewJSONKafkaLogEntry which seems to be testing for the JSON output. Would using assert.JSONEq defeat the purpose of the other test?

Edit: Okay maybe not, it's do-able I guess.

@deadlycoconuts deadlycoconuts marked this pull request as ready for review April 5, 2022 02:55
kvPairs["experiment"] = formatBQLogEntryResponse(e.TuringResultLogEntry.Experiment)
kvPairs["enricher"] = formatBQLogEntryResponse(e.TuringResultLogEntry.Enricher)
kvPairs["router"] = formatBQLogEntryResponse(e.TuringResultLogEntry.Router)
kvPairs["ensembler"] = formatBQLogEntryResponse(e.TuringResultLogEntry.Ensembler)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added these additional lines after realising that the headers in the response need to be specially handled before being pushed to BigQuery tables. There's a similar step done for the request headers above.

Copy link
Collaborator

@krithika369 krithika369 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🎉

@deadlycoconuts
Copy link
Collaborator Author

Thanks for the comments! Merging it now! 🚀

@deadlycoconuts deadlycoconuts merged commit c5df96b into caraml-dev:main Apr 6, 2022
krithika369 added a commit that referenced this pull request Jun 6, 2022
* Update workflows to use Python 3.7 for ensembler engines and sdk (#190)

* Update workflows to use Python 3.7 for ensembler engines and sdk

* Add none return option for config in Router SDK class

* Update text display settings to display entire image name

* Set container image name to overflow instead of being truncated

* Include response headers in logs (#191)

* Update proto and classes

* Add response headers and refactor naming of response bodies

* Refactor logging methods

* Fix tests

* Fix kafka tests

* Fix kafka tests

* Fix line breaks

* Fix line breaks

* Fix line breaks

* Fix typo in error message

* Refactor response fields

* Refactor response headers as map of strings

* Refactor tests to use map of strings to represent headers

* Fix non deterministic serialisation of hashmap in tests

* Refactor log handler

* Remove debug statement

* Refactor HTTP header formatting into a helper function

* Fix kafka protobuf test to use JSON as means of comparison

* Rename body in proto to response to avoid breaking changes

* Refactor all tests to use response instead of body to refer to request body

* Fix lint import suggestion

* Remove debug statements

* Rename variables in http header formatting helper function

* Fix BQ marshalling issues

* Fix lint import suggestion

* Fix lint comments

* Minor fixes for experiment engine configs in the helm chart (#193)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Nop Ensembler Config (#192)

* UI changes for nop ensembler config

* Make handling of router / version status consistent

* SDK changes for default route

* Correct the default route id in unit tests

* Add tests for the nop ensembler config

* Update sample code and doc

* Add PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* [BUG] Fix undeploy router error (#194)

* Fix bugs in sample SDK script

* Refactor UndeployRouterVersion to take in cleanup flag

* Update DeploymentService mock class

* Fix bug in deployment controller test

* Refactor if else blocks

* Add tests for IsKnativeServiceInNamespace and IsSecretinNamespace

* Rename test method to make it consistent with the method tested

* Refactor k8s deletion methods into separate methods

* Fix bug in deleting deployments and services

* Fix typo in router name

* Rename default route from nothing to control

* Make undeploying a pending router status a cleanup job

* Refactor code to use ignoreNotFound flag

* Fix go mod file

* Bugfix: Turing API should process experiment engine passkey only if client selection enabled (#196)

* Bugfix: Passkey should not be processed if client selection disabled

* Update hardcoded sample plugin to use experiment variables, consistent with the runner

* Update RPC plugin example and docs

* Correct numbering in doc and plugin name change

* Add debug message

* Update Deployment controller to consider if client selection enabled

* Add another unit test case for TestIsClientSelectionEnabled

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add Fallback Response Route Config for Standard Ensemblers (#197)

* UI changes for standard ensembler Fallback response

* Add validation for fallback resonse route id

* Update docs for the Standard Ensembler config

* Routing stragey changes for default route handling

* SDK changes for fallback response route id

* Amend user docs

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Remove the default route configuration (#198)

* Remove unused default route property

* Router: Make the default_route_id only required for DefaultTuringRoutingStrategy

* Make Default Route ID optional for the Turing Router creater / update API

* Update e2e tests

* UI: Update router view / edit to stop handling default route explicitly

* UI: Exclude routes with traffic rules in the final/fallback response options

* SDK: deprecate the default_route_id config

* SDK: Remove default route id from samples

* Update user docs

* Update OpenAPI bundle

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Adding SDK support for Python 3.8 and 3.9 (#199)

* Update SDK / engines to support multiple Python versions

* Pin cloudpickle at 2.0.0

* Introduce Python Version on the Pyfunc ensembler config

* Update SDK unit tests

* Update Github workflows

* Update chart values

* Update docs, unit tests

* Update SDK CI workflows to test on all Python versions

* Pin protobuf version at 3.20.1

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Bugfix: Clear default route Id from custom ensembler configs (#200)

* Miscellaneous bug fixes

* Add unit tests for the SDK changes, address PR comments

* Bugfix: Regression in display of container configs for Pyfunc

* Add type annotation to class methods

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Update chart version for app release (#201)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add dynamic loading of Experiment Engine config (#202)

* Add dynamic loading of exp engine config

* Address PR comments

* Add useEffect rerender

* Address PR comments

* Simplify conditional logic

* Attempt to fix yarn install error

Co-authored-by: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>
Co-authored-by: Terence Lim <terencelimxp@gmail.com>
krithika369 added a commit that referenced this pull request Jun 9, 2022
* Upgrade Go version/Knative/Spark Operator/Kubernetes Client/Docker Compose for dev environment. (#183)

* Bump versions for all k8s based libs

* Use proper context for each scenario.

* Upgrade virtual service version.

* Update k3d version to kubernetes 1.22.

* Upgrade Go to 1.16.

* Fix linting errors.

* Update k3d flags.

* Upgrade knative/istio/spark-operator versions in cluster init.

* Update default versions in cluster-init, change e2e test to new api

* Fail fast if default environment is wrong. Extra logging.

* Fix unit tests.

* Addressed PR comments.

* Pull requests to be run on any target branch.

* Upgrade to go 1.18, upgrade linter.

* Upgrade experiment and router to go 1.18.

* Update PR comments.

* Parameterise Go and Go Linter Versions.

* Update documentation with new versions and ports.

* Merge from main -> knative-upgrade branch (#203)

* Update workflows to use Python 3.7 for ensembler engines and sdk (#190)

* Update workflows to use Python 3.7 for ensembler engines and sdk

* Add none return option for config in Router SDK class

* Update text display settings to display entire image name

* Set container image name to overflow instead of being truncated

* Include response headers in logs (#191)

* Update proto and classes

* Add response headers and refactor naming of response bodies

* Refactor logging methods

* Fix tests

* Fix kafka tests

* Fix kafka tests

* Fix line breaks

* Fix line breaks

* Fix line breaks

* Fix typo in error message

* Refactor response fields

* Refactor response headers as map of strings

* Refactor tests to use map of strings to represent headers

* Fix non deterministic serialisation of hashmap in tests

* Refactor log handler

* Remove debug statement

* Refactor HTTP header formatting into a helper function

* Fix kafka protobuf test to use JSON as means of comparison

* Rename body in proto to response to avoid breaking changes

* Refactor all tests to use response instead of body to refer to request body

* Fix lint import suggestion

* Remove debug statements

* Rename variables in http header formatting helper function

* Fix BQ marshalling issues

* Fix lint import suggestion

* Fix lint comments

* Minor fixes for experiment engine configs in the helm chart (#193)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Nop Ensembler Config (#192)

* UI changes for nop ensembler config

* Make handling of router / version status consistent

* SDK changes for default route

* Correct the default route id in unit tests

* Add tests for the nop ensembler config

* Update sample code and doc

* Add PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* [BUG] Fix undeploy router error (#194)

* Fix bugs in sample SDK script

* Refactor UndeployRouterVersion to take in cleanup flag

* Update DeploymentService mock class

* Fix bug in deployment controller test

* Refactor if else blocks

* Add tests for IsKnativeServiceInNamespace and IsSecretinNamespace

* Rename test method to make it consistent with the method tested

* Refactor k8s deletion methods into separate methods

* Fix bug in deleting deployments and services

* Fix typo in router name

* Rename default route from nothing to control

* Make undeploying a pending router status a cleanup job

* Refactor code to use ignoreNotFound flag

* Fix go mod file

* Bugfix: Turing API should process experiment engine passkey only if client selection enabled (#196)

* Bugfix: Passkey should not be processed if client selection disabled

* Update hardcoded sample plugin to use experiment variables, consistent with the runner

* Update RPC plugin example and docs

* Correct numbering in doc and plugin name change

* Add debug message

* Update Deployment controller to consider if client selection enabled

* Add another unit test case for TestIsClientSelectionEnabled

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add Fallback Response Route Config for Standard Ensemblers (#197)

* UI changes for standard ensembler Fallback response

* Add validation for fallback resonse route id

* Update docs for the Standard Ensembler config

* Routing stragey changes for default route handling

* SDK changes for fallback response route id

* Amend user docs

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Remove the default route configuration (#198)

* Remove unused default route property

* Router: Make the default_route_id only required for DefaultTuringRoutingStrategy

* Make Default Route ID optional for the Turing Router creater / update API

* Update e2e tests

* UI: Update router view / edit to stop handling default route explicitly

* UI: Exclude routes with traffic rules in the final/fallback response options

* SDK: deprecate the default_route_id config

* SDK: Remove default route id from samples

* Update user docs

* Update OpenAPI bundle

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Adding SDK support for Python 3.8 and 3.9 (#199)

* Update SDK / engines to support multiple Python versions

* Pin cloudpickle at 2.0.0

* Introduce Python Version on the Pyfunc ensembler config

* Update SDK unit tests

* Update Github workflows

* Update chart values

* Update docs, unit tests

* Update SDK CI workflows to test on all Python versions

* Pin protobuf version at 3.20.1

* Address PR comments

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Bugfix: Clear default route Id from custom ensembler configs (#200)

* Miscellaneous bug fixes

* Add unit tests for the SDK changes, address PR comments

* Bugfix: Regression in display of container configs for Pyfunc

* Add type annotation to class methods

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Update chart version for app release (#201)

Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>

* Add dynamic loading of Experiment Engine config (#202)

* Add dynamic loading of exp engine config

* Address PR comments

* Add useEffect rerender

* Address PR comments

* Simplify conditional logic

* Attempt to fix yarn install error

Co-authored-by: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>
Co-authored-by: Terence Lim <terencelimxp@gmail.com>

* Update CI specs

* Revert UI changes during merge

* Update CI specs

* Update e2e deployment timeout

* Remove WIP inline comment

Co-authored-by: Ashwin <ashwinath@hotmail.com>
Co-authored-by: Ewe Zi Yi <36802364+deadlycoconuts@users.noreply.github.com>
Co-authored-by: Krithika Sundararajan <krithika.sundararajan@go-jek.com>
Co-authored-by: Terence Lim <terencelimxp@gmail.com>
@deadlycoconuts deadlycoconuts deleted the include_response_headers_in_logs branch August 5, 2022 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants