Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][schema] Fix creating lots of versions for the same schema #18293

Closed
wants to merge 1 commit into from

Conversation

aloyszhang
Copy link
Contributor

@aloyszhang aloyszhang commented Nov 2, 2022

Fixes #18292

Motivation

fix creating lots of ledgers for the same schema

Modifications

when putting schema to zk failed, do not retry directly, but read the latest persistent schema from zk first

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: aloyszhang#3

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Nov 2, 2022
@aloyszhang aloyszhang changed the title fix creating lots of ledgers for the same schema [fix][broker][schema]fix creating lots of ledgers for the same schema Nov 2, 2022
@codecov-commenter
Copy link

codecov-commenter commented Nov 3, 2022

Codecov Report

Merging #18293 (fe78ff2) into master (cee697b) will decrease coverage by 0.15%.
The diff coverage is 28.94%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #18293      +/-   ##
============================================
- Coverage     47.06%   46.91%   -0.16%     
- Complexity     8956    10428    +1472     
============================================
  Files           592      698     +106     
  Lines         56313    68339   +12026     
  Branches       5844     7342    +1498     
============================================
+ Hits          26503    32058    +5555     
- Misses        26946    32658    +5712     
- Partials       2864     3623     +759     
Flag Coverage Δ
unittests 46.91% <28.94%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...apache/bookkeeper/mledger/ManagedLedgerConfig.java 71.60% <0.00%> (ø)
.../org/apache/bookkeeper/mledger/impl/MetaStore.java 100.00% <ø> (ø)
...okkeeper/mledger/impl/ShadowManagedLedgerImpl.java 0.00% <0.00%> (ø)
...ava/org/apache/pulsar/broker/service/Consumer.java 69.21% <0.00%> (+7.11%) ⬆️
...service/schema/BookkeeperSchemaStorageFactory.java 100.00% <ø> (ø)
...ker/stats/prometheus/NamespaceStatsAggregator.java 0.00% <0.00%> (ø)
...va/org/apache/pulsar/client/impl/ProducerImpl.java 17.00% <0.00%> (-0.02%) ⬇️
...rg/apache/pulsar/broker/service/BrokerService.java 57.34% <20.00%> (+0.15%) ⬆️
...sar/broker/service/persistent/PersistentTopic.java 61.46% <20.00%> (-0.82%) ⬇️
...org/apache/bookkeeper/mledger/impl/OpAddEntry.java 64.21% <26.31%> (ø)
... and 186 more

@aloyszhang
Copy link
Contributor Author

@gaoran10 @codelipenghui PTAL

Copy link
Contributor

@congbobo184 congbobo184 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pr only guarantees the consistency of the zk storage schema, but it still creates useless ledger, right?

@aloyszhang
Copy link
Contributor Author

aloyszhang commented Nov 7, 2022

This pr only guarantees the consistency of the zk storage schema, but it still creates useless ledger, right?

@congbobo184 The schema will create a useless ledger at two points.

  • At the point of client connected the first time concurrently, all producers or consumers see no schema info and create a new schema ledger, and only onde ledger will be used at last
  • At the point of update zookeeper storage failed, the broker will create a new schema ledger repeatedly if the update zookeeper failed.

This PR tries to resolve the second problem.

I still working on the first problem and will send a new PR for it.

Copy link
Contributor

@gaoran10 gaoran10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Left some comments.

@labuladong
Copy link
Contributor

labuladong commented Nov 7, 2022

To my understanding, this PR canceled the retry action in putSchema, and add the retry action in putSchemaIfAbsent.

But I notice the putSchema method will be called by the deleteSchema method, so this PR removes the retry functionality of deleteSchema method too. Will that be OK?

@aloyszhang
Copy link
Contributor Author

aloyszhang commented Nov 7, 2022

To my understanding, this PR canceled the retry action in putSchema, and add the retry action in putSchemaIfAbsent.
But I notice the putSchema method will be called by the deleteSchema method, so this PR removes the retry functionality of deleteSchema method too. Will that be OK?

@labuladong As described in issue #18292, the retry function in putSchema is the root cause of creating lots of useless schema ledgers when the update zookeeper failed with AlreadyExistsException or BadVersionException. So, it should be removed.

BTW, the retry function is only useful when there is a race condition updating the zookeeper, and the command of delete schema is invoked from the CLI or PulsarAdmin which is rare to have a high concurrency. So, little chance for delete the schema will fail. However, putSchemaIfAbsent will be called when the producer or subscription is created. There is a high chance for a race condition to happen, so keep the retry function to avoid the producer or consumer from init failed.

Also, if there is a scene where keeping the retry function in putSchema is necessary, we can bring up a new issue and pull request.

@aloyszhang aloyszhang changed the title [fix][broker][schema]fix creating lots of ledgers for the same schema [fix][broker][schema]fix creating lots of versions for the same schema Nov 8, 2022
@AnonHxy AnonHxy changed the title [fix][broker][schema]fix creating lots of versions for the same schema [fix][schema] Fix creating lots of versions for the same schema Nov 8, 2022
@AnonHxy
Copy link
Contributor

AnonHxy commented Nov 8, 2022

LGTM

@aloyszhang aloyszhang force-pushed the schema-fix branch 2 times, most recently from 4576c48 to 69c0ce5 Compare November 10, 2022 04:34
Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried a test to verify we will not get duplicated schemas with this fix.
But looks like the issue hasn't been fixed completely

~/GitHub/apache/incubator-pulsar/distribution/server/target/apache-pulsar-2.11.0-SNAPSHOT (schema-fix*) » bin/pulsar-admin schemas get test --version 2                                            lipenghui@lipenghuideMacBook-Pro-2
{
  "name": "test",
  "schema": {
    "type": "record",
    "name": "User",
    "namespace": "io.streamnative.test.AvroConsumer",
    "fields": [
      {
        "name": "age",
        "type": "int"
      },
      {
        "name": "description",
        "type": [
          "null",
          "string"
        ]
      },
      {
        "name": "name",
        "type": [
          "null",
          "string"
        ]
      }
    ]
  },
  "type": "AVRO",
  "timestamp": 1668066392133,
  "properties": {
    "__alwaysAllowNull": "true",
    "__jsr310ConversionEnabled": "false"
  }
}
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/GitHub/apache/incubator-pulsar/distribution/server/target/apache-pulsar-2.11.0-SNAPSHOT (schema-fix*) » bin/pulsar-admin schemas get test --version 1                                            lipenghui@lipenghuideMacBook-Pro-2
{
  "name": "test",
  "schema": {
    "type": "record",
    "name": "User",
    "namespace": "io.streamnative.test.AvroConsumer",
    "fields": [
      {
        "name": "age",
        "type": "int"
      },
      {
        "name": "description",
        "type": [
          "null",
          "string"
        ]
      },
      {
        "name": "name",
        "type": [
          "null",
          "string"
        ]
      }
    ]
  },
  "type": "AVRO",
  "timestamp": 1668066392023,
  "properties": {
    "__alwaysAllowNull": "true",
    "__jsr310ConversionEnabled": "false"
  }
}
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
~/GitHub/apache/incubator-pulsar/distribution/server/target/apache-pulsar-2.11.0-SNAPSHOT (schema-fix*) » bin/pulsar-admin schemas get test --version 0                                            lipenghui@lipenghuideMacBook-Pro-2
{
  "name": "test",
  "schema": {
    "type": "record",
    "name": "User",
    "namespace": "io.streamnative.test.AvroConsumer",
    "fields": [
      {
        "name": "age",
        "type": "int"
      },
      {
        "name": "name",
        "type": [
          "null",
          "string"
        ]
      }
    ]
  },
  "type": "AVRO",
  "timestamp": 1668066223880,
  "properties": {
    "__alwaysAllowNull": "true",
    "__jsr310ConversionEnabled": "false"
  }
}

The schema version 1 and 2 is the same schema which was reproduced by creating 64 producers with the same schema.

The steps to reproduce

  • Create 64 consumers with a schema (Only one schema created)
  • Create 64 producers with a new schema (Two same schemas are created)

@aloyszhang
Copy link
Contributor Author

aloyszhang commented Nov 11, 2022

image

@codelipenghui This is another problem. The root cause is the concurrent access to zookeeper.

Before updating the schemaLocator on zk, it reads zk again. The purpose is to obtain the content of the existing locator to build the index information of the new locator, and the version will be +1.

If other threads updated the locator just before this point, it directly adds the position information of a new schema entry to the locator without exist check

@congbobo184
Copy link
Contributor

@aloyszhang hi, I move this PR to release/2.9.5, if you have any questions, please ping me. thanks.

@aloyszhang
Copy link
Contributor Author

@codelipenghui PTAL, thanks

@aloyszhang
Copy link
Contributor Author

close this pull request due to #18701

@aloyszhang aloyszhang closed this Dec 1, 2022
@aloyszhang aloyszhang deleted the schema-fix branch August 17, 2023 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/schema doc-not-needed Your PR changes do not impact docs ready-to-test release/2.9.5 release/2.10.3 release/2.11.1 type/bug The PR fixed a bug or issue reported a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Too much schemas ledgers are created when multi producer start concurrently