Kafka auth test fails with dropped instead of stalled #23469

Closed
def- opened this issue Nov 27, 2023 · 9 comments
Labels: C-bug (Category: something is broken), ci-flake

Comments

@def-
Contributor

def- commented Nov 27, 2023

What version of Materialize are you using?

0716439

What is the issue?

Seen in https://buildkite.com/materialize/tests/builds/69712#018c11e1-c004-4451-bbaf-dc16d421a8be

> ALTER CONNECTION kafka_backup SET (BROKER 'kafka:9093') WITH (VALIDATE = false);
rows match; continuing at ts 1701108752.3339665
Ingesting data into Kafka topic testdrive-avro-data-3510298630 with start_iteration = 0, repeat = 1
> SELECT * FROM avro_data
rows didn't match; sleeping to see if dataflow catches up 50ms 75ms 113ms 169ms 253ms 380ms
rows match; continuing at ts 1701108753.6897357
> SELECT status FROM mz_internal.mz_sink_statuses WHERE name = 'snk_backup';
rows didn't match; sleeping to see if dataflow catches up 50ms 75ms 113ms 169ms 253ms 380ms 570ms 854ms 1s 2s 3s 4s 6s 10s 15s 22s 33s 22s
^^^ +++
test-schema-registry-ssl-basic.td:129:1: error: non-matching rows: expected:
[["stalled"]]
got:
[["dropped"]]
Poor diff:
+ dropped
- stalled

@sploiselle Could this be related to your alter connection changes?
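The retry intervals in the testdrive output above (50ms, 75ms, 113ms, ... 33s, 22s) look like an exponential backoff whose last sleep is cut short by an overall timeout. The sketch below reproduces that pattern. The 1.5x growth factor and the 120 s timeout are assumptions inferred from the logged numbers, not values taken from testdrive's source, and `backoff_delays` is a hypothetical helper name.

```python
def backoff_delays(initial_ms=50.0, factor=1.5, timeout_ms=120_000.0):
    """Return the sleep durations (in ms) of a clamped exponential backoff.

    Assumed parameters: the factor and timeout are guesses that happen to
    match the intervals printed in the log; they are not confirmed values.
    """
    delays = []
    elapsed = 0.0
    delay = initial_ms
    while True:
        remaining = timeout_ms - elapsed
        if remaining <= 0:
            break
        # Never sleep past the overall timeout: the final step is shortened.
        step = min(delay, remaining)
        delays.append(step)
        elapsed += step
        delay *= factor
    return delays


if __name__ == "__main__":
    for ms in backoff_delays():
        print(f"{ms / 1000:.3f}s")
```

Under these assumptions the schedule has 18 sleeps and the last one is shorter than the one before it, which would explain why the log ends with 33s followed by 22s.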

@def- def- added the C-bug Category: something is broken label Nov 27, 2023
@sploiselle
Contributor

I suspect this is more likely related to #23222. cc'ing @guswynn, but I'll take a look.

@guswynn
Contributor

guswynn commented Nov 27, 2023

@sploiselle this was found by @def- before that PR merged :(

Even so, I don't see the code path that would call healthcheck::drop_sinks when altering a sink connection...

@sploiselle
Contributor

@guswynn PTAL at my understanding:

@def-
Contributor Author

def- commented Nov 27, 2023

Sorry for the incomplete information when opening this issue. I saw this previously on main before @guswynn's change: https://buildkite.com/materialize/tests/builds/69697#018c11a7-fd87-4ad8-972e-6ca1c2134d20

@sploiselle
Contributor

I just ran the test-schema-registry-basic.td test in a loop for 6 hours without hitting this issue. It seems a broader set of tests may need to run to trigger it.

@def-
Contributor Author

def- commented Nov 28, 2023

I reproduced it locally in a few minutes with:

while true; do bin/mzcompose --find kafka-auth down && bin/mzcompose --find kafka-auth run default || break; done
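For anyone who wants a bounded version of that loop that also reports which iteration failed, here is a minimal Python sketch. It reuses the two `bin/mzcompose` invocations from the one-liner above verbatim. The `run_until_failure` helper, its `max_iterations` cap, and the injectable `run` parameter are hypothetical additions for illustration and are not part of mzcompose.

```python
import subprocess

# The same two commands as in the shell one-liner above.
COMMANDS = [
    ["bin/mzcompose", "--find", "kafka-auth", "down"],
    ["bin/mzcompose", "--find", "kafka-auth", "run", "default"],
]


def run_until_failure(commands, max_iterations=100, run=subprocess.run):
    """Run the command sequence repeatedly until any command fails.

    Returns the 1-based iteration in which a command exited non-zero,
    or None if every iteration up to max_iterations succeeded.
    `run` is injectable so the loop can be exercised without Docker.
    """
    for iteration in range(1, max_iterations + 1):
        for cmd in commands:
            # Like `a && b || break`: stop at the first failing command.
            if run(cmd).returncode != 0:
                return iteration
    return None


if __name__ == "__main__":
    failed_at = run_until_failure(COMMANDS)
    if failed_at is None:
        print("no failure observed")
    else:
        print(f"failed in iteration {failed_at}")
```

Unlike `while true`, this stops after a fixed number of attempts, which is convenient when checking whether a candidate fix makes the flake go away.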

@def- def- added the ci-flake label Nov 28, 2023
getong pushed a commit to getong/materialize that referenced this issue Nov 28, 2023
@sploiselle
Contributor

This might also be #23450

@sploiselle
Contributor

This is probably #23507

@sploiselle
Contributor

sploiselle commented Nov 30, 2023

This should be fixed as of #23528


3 participants