add DEGRADED state as a goal
aljoscha committed Aug 2, 2021
1 parent 198c611 commit e096cab
Showing 1 changed file with 4 additions and 10 deletions.
@@ -124,6 +124,9 @@ The concrete steps to achieve this, in the order we should address them:
happens when restoring the catalog and looks to the sink like it's just
being created. Normally, when creating a sink, failures should cancel the
statement and not add the sink to the catalog but we cannot do this here.
+- Report a `DEGRADED` state in system tables when a connector is in a
+temporarily degraded state. For example when `rdkafka` can't reach a broker
+but just sits and waits for them to come back online.

#### Dealing with transitive failures

@@ -145,8 +148,6 @@ has failed.
- Keep entries of dropped/failed/whatever connector in a
`mz_connector_history` view. Or keep dropped connector in `mz_connector` for
a while.
-- Report a `DEGRADED` state, for example when `rdkafka` is in a state where it
-can't reach a broker but just sits and waits for them to come back online.

## Alternatives

@@ -160,14 +161,7 @@ specific.

## Open Questions

-Some SDKs, for example `rdkafka`, will happily sit and not report any errors
-when brokers can't be reached from the consumer. Once the broker is available
-again they will start consuming again. At least that's the case in our setup,
-with split consumer keys. And the consumer will log errors still.
-
-We can be fine with that, or also try and cover this with a timeout and try to
-surface errors and change the sink state. I think it's fine to just let the
-consumer continue do its thing because the failures are likely ephemeral.
+There were some, which lead to the addition of a `DEGRADED` state above.

## References

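As a note on this commit: the `DEGRADED` goal in the first hunk, combined with the timeout idea from the removed Open Questions text, could be modeled roughly as below. This is only a sketch under assumptions. `ConnectorState`, `classify`, the `RUNNING` label, and the 30-second grace period are hypothetical names and values chosen for illustration; the design doc itself only names `DEGRADED` and does not say how the state would be detected or stored.

```rust
use std::time::Duration;

/// Hypothetical connector states. Only `DEGRADED` is named in the design doc;
/// `Running` is an assumed counterpart for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConnectorState {
    Running,
    Degraded,
}

impl ConnectorState {
    /// Label as it might be reported in a system table column (assumed format).
    fn as_str(self) -> &'static str {
        match self {
            ConnectorState::Running => "RUNNING",
            ConnectorState::Degraded => "DEGRADED",
        }
    }
}

/// Classify a connector by how long ago it last reached a broker.
/// `grace` is an assumed threshold; the doc does not specify one.
fn classify(since_last_contact: Duration, grace: Duration) -> ConnectorState {
    if since_last_contact > grace {
        // The client is still waiting for the broker, so this is not a hard failure.
        ConnectorState::Degraded
    } else {
        ConnectorState::Running
    }
}

fn main() {
    let grace = Duration::from_secs(30);
    for secs in [5u64, 120] {
        let state = classify(Duration::from_secs(secs), grace);
        println!("{}s since last broker contact: {}", secs, state.as_str());
    }
}
```

Keeping `Degraded` separate from any failed state reflects the intent stated in the diff: the connector is expected to recover on its own once the broker comes back online.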
