Config for raw index writer version #5503

siddharthteotia · 2020-06-05T21:10:22Z

Description

In PR #5285, the raw index writer format was changed to use 8 byte offset for each chunk in the file header. The writer version was bumped to 3. This was done to support > 2GB indexes. The change was backward compatible to continue the support for reading existing/old segments using 4-byte offsets

While there is no problem with PR 5285, it prevents rollback. So if there is any orthogonal issue while rolling out a release, we can't rollback to older Pinot release since segments already
generated with 8-byte offsets can't be read by old code.

This PR introduces a config option to set the writer version (2 for using old 4-byte chunk offset and 3 which is latest for 8-byte chunk offset). This config option is temporary. To deploy a new version of our internal offline segment creation/push job, we would like this format to be disabled temporarily to have the flexibility of dealing with issues by rolling back the release at will. Once the roll-out is complete, this config option will be removed.

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
No
Does this PR fix a zero-downtime upgrade introduced earlier?
No
Does this PR otherwise need attention when creating release notes? Things to consider:

New configuration options
Deprecation of configurations
Signature changes to public methods/interfaces
New plugins added or old plugins removed

Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Documentation

the raw index writer format was changed to use 8 byte offset for each chunk in the file header. The writer version was bumped to 3. This was done to support > 2GB indexes. The change was backward compatible to continue the support for reading existing/old segments using 4-byte offsets While there is no problem with the change, it prevents rollback. So if there is any orthogonal issue while rolling out a release, we can't rollback to older Pinot release since segments already generated with 8-byte offsets can't be read by old code. This config option is temporary to help with internal roll-out by keeping the 8-byte format disabled by default thus allowing rollback due to any issues. In the next couple of weeks after internal rollout, we plan to remove this option.

Jackie-Jiang

If I understand the intention correctly, this PR want to let production environment to still use the DEFAULT_VERSION, but check in the classes of the CURRENT_VERSION and the tests for them. If that is the case, can we just add a separate constructor with the version info so that the tests can use that? The default constructor will create the writer with DEFAULT_VERSION. We don't really need to make it configurable because in production environment you never want to use CURRENT_VERSION before the version is verified. In order to move to the CURRENT_VERSION, you can simply change the default constructor to the CURRENT_VERSION

Jackie-Jiang · 2020-06-05T21:57:32Z

pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java

@@ -33,15 +33,16 @@
  private final IndexType _indexType;
  private final Map<String, String> _properties;

-  public static String BLOOM_FILTER_COLUMN_KEY = "bloom.filter";


Why are we changing this? This is backward-incompatible change right?

I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. So this change doesn't affect anyone. Just wanted to clean up names and remove the dotted notation. Going to migrate to FieldConfig soon

mcvsubbu

minimal comments

mcvsubbu · 2020-06-05T21:13:11Z

...t-core/src/main/java/org/apache/pinot/core/io/writer/impl/v1/BaseChunkSingleValueWriter.java

@@ -36,6 +37,9 @@
 * Base class for fixed and variable byte writer implementations.
 */
 public abstract class BaseChunkSingleValueWriter implements SingleColumnSingleValueWriter {
+  public static final int DEFAULT_VERSION = 2;


Suggested change

public static final int DEFAULT_VERSION = 2;

public static final int DEFAULT_WRITER_VERSION = 2;

Maybe add a TODO or create an issue to remove this before 0.5.0 and put a pointer here to the issue?

mcvsubbu · 2020-06-05T22:32:28Z

...t-core/src/main/java/org/apache/pinot/core/io/writer/impl/v1/BaseChunkSingleValueWriter.java

@@ -60,12 +64,13 @@
   * @param numDocsPerChunk Number of docs per data chunk
   * @param chunkSize Size of chunk
   * @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation.
-   * @param version Version of file
+   * @param version format version used to determine whether to use 8 or 4 byte chunk offsets


this comment may not hold in the long term, so just leave it as is

mcvsubbu · 2020-06-05T22:32:39Z

...e/src/main/java/org/apache/pinot/core/io/writer/impl/v1/FixedByteChunkSingleValueWriter.java

@@ -64,15 +62,15 @@
   * @param compressionType Type of compression to use.
   * @param totalDocs Total number of docs to write.
   * @param numDocsPerChunk Number of documents per chunk.
-   * @param sizeOfEntry Size of entry (in bytes).
+   * @param sizeOfEntry Size of entry (in bytes)
+   * @param writerVersion writer version used to determine whether to use 8 or 4 byte chunk offsets


siddharthteotia · 2020-06-05T22:48:23Z

If I understand the intention correctly, this PR want to let production environment to still use the DEFAULT_VERSION, but check in the classes of the CURRENT_VERSION and the tests for them. If that is the case, can we just add a separate constructor with the version info so that the tests can use that? The default constructor will create the writer with DEFAULT_VERSION. We don't really need to make it configurable because in production environment you never want to use CURRENT_VERSION before the version is verified. In order to move to the CURRENT_VERSION, you can simply change the default constructor to the CURRENT_VERSION

We want to use CURRENT_VERSION in production environment (use cases for text index are going to need the 8-byte chunk offset because of huge raw data).
CURRENT_VERSION is verified in production. We already have realtime segments that got completed and written with CURRENT_VERSION as of the currently deployed release in production. Has also been verified on our internal perf cluster.
However, for pbnj roll-out we need the flexibility temporarily to rollback the release if something orthogonal to this goes wrong. So even though we know, 8-byte format (CURRENT_VERSION) is itself not a problem, something else might go wrong while pbnj is being rolled out internally. In such situations, we should be able to rollback the release. The plan is to use implicit DEFAULT_VERSION while rolling out pbnj and enable CURRENT_VERSION specifically for text index tables. Once the pbnj roll-out is complete, we can remove this.

Coming to changing the dotted notation, I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. Going to migrate soon.

Jackie-Jiang reviewed Jun 5, 2020

View reviewed changes

mcvsubbu approved these changes Jun 5, 2020

View reviewed changes

address review comments

45402fd

siddharthteotia merged commit 04e12bd into apache:master Jun 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Config for raw index writer version #5503

Config for raw index writer version #5503

siddharthteotia commented Jun 5, 2020 •

edited

Jackie-Jiang left a comment

Jackie-Jiang Jun 5, 2020

siddharthteotia Jun 5, 2020 •

edited

mcvsubbu left a comment

mcvsubbu Jun 5, 2020

mcvsubbu Jun 5, 2020

siddharthteotia Jun 5, 2020

mcvsubbu Jun 5, 2020

siddharthteotia Jun 5, 2020

mcvsubbu Jun 5, 2020

siddharthteotia Jun 5, 2020

siddharthteotia commented Jun 5, 2020 •

edited

	public static final int DEFAULT_VERSION = 2;
	public static final int DEFAULT_WRITER_VERSION = 2;

Config for raw index writer version #5503

Config for raw index writer version #5503

Conversation

siddharthteotia commented Jun 5, 2020 • edited

Description

Upgrade Notes

Release Notes

Documentation

Jackie-Jiang left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

siddharthteotia Jun 5, 2020 • edited

Choose a reason for hiding this comment

mcvsubbu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

siddharthteotia commented Jun 5, 2020 • edited

siddharthteotia commented Jun 5, 2020 •

edited

siddharthteotia Jun 5, 2020 •

edited

siddharthteotia commented Jun 5, 2020 •

edited