
ci: Merge changes from v2 to release 2.8 branch #5379

Merged
merged 14 commits into release-2.8 on Feb 27, 2024

Conversation

sakoush
Member

@sakoush sakoush commented Feb 27, 2024

Changes for release 2.8 rc2

lc525 and others added 14 commits February 20, 2024 13:30
* bump(librdkafka): from v1.9.2 to v2.3.0

- update needed for automatic re-authentication of kafka connections via
SASL (used when connecting to Confluent Cloud via OAUTHBEARER)

* fix(librdkafka): Copy certs bundle to path expected by librdkafka v2.3.0

librdkafka uses libcurl for fetching OIDC tokens. However, the location where
libcurl searches for the CA certs bundle used to validate the TLS connection
between librdkafka and the auth provider differs depending on the OS. We now
copy the CA cert bundle to multiple paths to make sure it is found.

While this is a bit hacky, there are limited alternatives we can take because:

- until now (with librdkafka 1.9.2) those certs were searched under
/etc/ssl/certs/ca-certificates.crt but after the update they are searched
under /etc/pki/tls/certs/ca-bundle.crt

- libcurl is not passed, and does not respect, the CA cert location
  or .pem file set via librdkafka/kafka configs
  (see confluentinc/librdkafka#375).

- there is no clear way to influence the libcurl search path via env vars
  (setting the CURL_CA_BUNDLE env var has no effect), as search paths are
  fully decided at compile time.

The likely chain of reasons this popped up when updating librdkafka to
v2.3.0:
  - we're building {model, pipeline}gateway docker images based on debian
  bullseye image
  - we're copying the build results into redhat ubi9 container
  - libcurl was statically built in previous librdkafka, so was
  using the debian search paths on ubi9
  - libcurl ends up being dynamically linked in the current librdkafka
  (to ubi9 version) or has changed its default ca bundle search path
* send server statuses on controller connect

* add error message on server notify err

* add logs on server notify memory

* review comments
Bumps grafana/grafana from 10.3.1 to 10.3.3.

---
updated-dependencies:
- dependency-name: grafana/grafana
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.61.0 to 1.61.1.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](grpc/grpc-go@v1.61.0...v1.61.1)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…up (#5353)

* set default OTEL_EXPORTER_OTLP_PROTOCOL in compose setup

* go mod tidy operator
Bumps [github.com/go-playground/validator/v10](https://github.com/go-playground/validator) from 10.17.0 to 10.18.0.
- [Release notes](https://github.com/go-playground/validator/releases)
- [Commits](go-playground/validator@v10.17.0...v10.18.0)

---
updated-dependencies:
- dependency-name: github.com/go-playground/validator/v10
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…5358)

We need better error handling and reporting across dataflow, together with
appropriate cleanup of the Kafka Streams state once a pipeline reaches an
error state.

Because we're now continuously retrying the subscription to the scheduler 
when the subscription terminates, not cleaning the kafka streams state of
failed pipelines may lead to the inability to re-create the same pipeline, even
after deletion.

This commit also introduces a new way of handling errors and the changes
in pipeline status. It moves us towards being able to inspect the status of the
underlying kafka streams from outside the Pipeline object. 

The end goal (not achieved here) is to react to pipelines reaching error states
during their operation, rather than only when they are first created (which is
what is currently implemented).

However, the changes here do get us to a point where the pipeline creation status,
including possible errors/exceptions, is reported to the scheduler and can show
up either in the Seldon CLI or in k8s.

Instead of propagating exceptions, the idea is to return and handle common errors
as values, in a style similar to Go's.

**Fixed issues**:

- Fixes #INFRA-755 (internal): Exceptions in pipeline creation/deletion are 
  unrecoverable because of uncleaned Kafka Streams state
- Progress on #INFRA-617 (internal): Investigate high priority unhandled errors
- Progress on #INFRA-648 (internal): Handle dataflow [...] exceptions and present 
  meaningful error messages
Significant updates in this batch:
- kafka-streams from 3.4.0 to Confluent's 7.6.0-ccs
* fix(dataflow) wait for kafka topic creation

Kafka topic creation happens asynchronously. This means that even when
the return value from `createTopics(...)` indicates that the topic has
been created successfully, the topic cannot be immediately subscribed
to.

Instead of verifying the status of the topic from the `createTopics`
return value, here we're repeatedly calling `describeTopics` until
all of the topics for the pipeline can be described successfully.
This indicates that the topic has been fully created on _at least_
one broker, and can now be subscribed to.

**Fixed issues**:
- Fixes dataflow component for #INFRA-663 (internal): Pipeline creation goes into ERROR state
* fix(gateway) wait for kafka topic creation

Kafka topic creation happens asynchronously. This means that even when the
return value from createTopics(...) indicates that the topic has been created
successfully, the topic cannot be immediately subscribed to.

We retry DescribeTopics until all of the topics for the pipeline can be
described successfully. This indicates that the topic has been fully created
on at least one broker, and can now be subscribed to.

Minor changes:
- timeout config now in gateway/constants.go
- tidying-up of error reporting

Which issue(s) this PR fixes:
Fixes gateway component for #INFRA-663 (internal): Pipeline creation goes into ERROR state
@sakoush sakoush changed the title Merge changes from v2 to release 2.8 branch ci: Merge changes from v2 to release 2.8 branch Feb 27, 2024
Member

@lc525 lc525 left a comment

lgtm

@sakoush sakoush merged commit e643a80 into release-2.8 Feb 27, 2024
17 of 18 checks passed