Flaky python unit test test_publish_compact_and_consume #2321

Closed
merlimat opened this issue Aug 6, 2018 · 3 comments · Fixed by #5448
Labels: area/test, help wanted, type/bug

Comments


merlimat commented Aug 6, 2018

Expected behavior

Seen failure at: https://builds.apache.org/job/pulsar_precommit_cpp/2951/console

======================================================================
FAIL: test_publish_compact_and_consume (__main__.PulsarTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "pulsar_test.py", line 557, in test_publish_compact_and_consume
    self.assertEqual(msg0.data(), b'hello-1')
AssertionError: 'hello-0' != 'hello-1'

----------------------------------------------------------------------
2018-08-06 22:34:41.901 INFO  HandlerBase:53 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-producer-name] Getting connection from pool
2018-08-06 22:34:42.058 INFO  ProducerImpl:154 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-producer-name] Created producer on broker [127.0.0.1:55834 -> 127.0.0.1:8885] 
2018-08-06 22:34:42.060 INFO  HandlerBase:53 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-sub1, 0] Getting connection from pool
2018-08-06 22:34:42.133 INFO  ConsumerImpl:168 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-sub1, 0] Created consumer on broker [127.0.0.1:55834 -> 127.0.0.1:8885] 
2018-08-06 22:34:42.134 INFO  ConsumerImpl:761 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-sub1, 0] Closed consumer 0
2018-08-06 22:34:42.135 INFO  HandlerBase:53 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-sub2, 1] Getting connection from pool
2018-08-06 22:34:42.207 INFO  ConsumerImpl:168 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-sub2, 1] Created consumer on broker [127.0.0.1:55834 -> 127.0.0.1:8885] 
2018-08-06 22:34:42.265 INFO  ProducerImpl:466 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-producer-name] Closed producer
Compact still running
{"status":"RUNNING","lastError":""}
Compact still running
{"status":"RUNNING","lastError":""}
Compact Complete now
{"status":"SUCCESS","lastError":""}
2018-08-06 22:34:42.817 INFO  HandlerBase:53 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-sub1, 2] Getting connection from pool
2018-08-06 22:34:42.818 INFO  ConsumerImpl:168 | [persistent://sample/standalone/ns1/my-python-test_publish_compact_and_consume, my-sub1, 2] Created consumer on broker [127.0.0.1:55834 -> 127.0.0.1:8885] 
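For context, this is roughly what the test does, pieced together from the traceback and log output above. It is only a sketch, not the exact pulsar_test.py code; the service URL, the key name, and arguments such as partition_key and is_read_compacted are assumptions based on the Python client API.

    import pulsar

    # Rough sketch of the test flow implied by the traceback/logs above.
    client = pulsar.Client('pulsar://localhost:6650')  # assumed service URL
    topic = ('persistent://sample/standalone/ns1/'
             'my-python-test_publish_compact_and_consume')

    # Publish two messages with the same key; compaction should keep only the last one.
    producer = client.create_producer(topic, producer_name='my-producer-name')
    producer.send(b'hello-0', partition_key='key0')
    producer.send(b'hello-1', partition_key='key0')
    producer.close()

    def wait_for_compaction(admin_compaction_url):
        # Trigger compaction via the admin REST API and poll until {"status":"SUCCESS"},
        # i.e. the "Compact still running" / "Compact Complete now" output above.
        # A fuller version of this helper is sketched after the commit message below.
        ...

    wait_for_compaction('<admin-url>/compaction')  # placeholder URL

    # Re-read the topic in its compacted form; the assertion at pulsar_test.py:557
    # expects the latest value for the key, but the flaky run still received b'hello-0'.
    consumer = client.subscribe(topic, 'my-sub1', is_read_compacted=True)
    msg0 = consumer.receive()
    assert msg0.data() == b'hello-1'
    client.close()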

merlimat commented Feb 7, 2019


merlimat commented Feb 8, 2019

Tried to run it hundreds of times in a loop within a Docker container on a Mac laptop and was not able to reproduce it so far.

@merlimat merlimat modified the milestones: 2.3.0, 2.4.0 Feb 14, 2019
@sijie sijie removed this from the 2.4.0 milestone Jun 9, 2019
ivankelly added a commit to ivankelly/pulsar that referenced this issue Oct 23, 2019
After compaction completes, the compacted ledger is recorded as a
property of a cursor. Persisting the cursor is async and we don't wait
for that persist to be acknowledged, so there may be a race if we try
to read the compacted ledger immediately.

To avoid this race, I've added a 1 second sleep to allow the compacted
ledger to be updated on the broker.
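Concretely, the workaround lands on the test side, between the compaction-status poll reporting SUCCESS and the compacted read. A hedged sketch of that step, assuming a urllib-based poll against the topic's admin compaction endpoint (the real test may build the URL and perform the HTTP calls differently):

    import json
    import time
    import urllib.request

    def wait_for_compaction(admin_compaction_url):
        # Trigger compaction (PUT), then poll its status (GET) until the broker reports
        # a terminal state (the {"status":"RUNNING"} / {"status":"SUCCESS"} lines above).
        urllib.request.urlopen(urllib.request.Request(admin_compaction_url, method='PUT'))
        while True:
            status = json.loads(urllib.request.urlopen(admin_compaction_url).read())
            if status['status'] != 'RUNNING':
                assert status['status'] == 'SUCCESS', status
                break
            time.sleep(0.2)
        # The fix described in the commit above: SUCCESS only means compaction ran;
        # the cursor recording the compacted ledger is persisted asynchronously, so
        # give the broker a moment before reading with is_read_compacted=True.
        time.sleep(1.0)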
ivankelly commented

For the record, I only managed one natural repro of this (with stress-ng --aggressive --class scheduler --all 5). It appears to be a very narrow race. It can be widened by changing the thenAccept to thenAcceptAsync at

    future.thenAccept((v) -> compactedTopic.newCompactedLedger(position, compactedLedgerId));

and then putting a sleep in the callback.
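To make that window visible without a broker, here is a small Python-only analogy (not Pulsar code): once the registration callback runs off the completing thread (the analogue of thenAcceptAsync), completing the future no longer guarantees the compacted ledger has been recorded, and a sleep in the callback stretches the gap.

    import threading
    import time
    from concurrent.futures import Future

    # Stand-in for the cursor property that records the compacted ledger id.
    compacted_ledger_id = None

    def register_compacted_ledger(ledger_id):
        global compacted_ledger_id
        time.sleep(0.5)  # the injected sleep that widens the race window
        compacted_ledger_id = ledger_id

    future = Future()
    # Rough analogue of thenAccept -> thenAcceptAsync: the registration callback now
    # runs on another thread instead of inline with the completion.
    future.add_done_callback(
        lambda f: threading.Thread(target=register_compacted_ledger,
                                   args=(f.result(),)).start())

    future.set_result(42)        # "compaction finished"
    print(compacted_ledger_id)   # very likely still None: a reader arriving at this
                                 # point would not see the compacted ledger yet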

merlimat pushed a commit that referenced this issue Oct 23, 2019