-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Config for raw index writer version #5503
Conversation
the raw index writer format was changed to use 8 byte offset for each chunk in the file header. The writer version was bumped to 3. This was done to support > 2GB indexes. The change was backward compatible to continue the support for reading existing/old segments using 4-byte offsets While there is no problem with the change, it prevents rollback. So if there is any orthogonal issue while rolling out a release, we can't rollback to older Pinot release since segments already generated with 8-byte offsets can't be read by old code. This config option is temporary to help with internal roll-out by keeping the 8-byte format disabled by default thus allowing rollback due to any issues. In the next couple of weeks after internal rollout, we plan to remove this option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand the intention correctly, this PR want to let production environment to still use the DEFAULT_VERSION
, but check in the classes of the CURRENT_VERSION
and the tests for them. If that is the case, can we just add a separate constructor with the version info so that the tests can use that? The default constructor will create the writer with DEFAULT_VERSION
. We don't really need to make it configurable because in production environment you never want to use CURRENT_VERSION
before the version is verified. In order to move to the CURRENT_VERSION
, you can simply change the default constructor to the CURRENT_VERSION
@@ -33,15 +33,16 @@ | |||
private final IndexType _indexType; | |||
private final Map<String, String> _properties; | |||
|
|||
public static String BLOOM_FILTER_COLUMN_KEY = "bloom.filter"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we changing this? This is backward-incompatible change right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. So this change doesn't affect anyone. Just wanted to clean up names and remove the dotted notation. Going to migrate to FieldConfig soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minimal comments
@@ -36,6 +37,9 @@ | |||
* Base class for fixed and variable byte writer implementations. | |||
*/ | |||
public abstract class BaseChunkSingleValueWriter implements SingleColumnSingleValueWriter { | |||
public static final int DEFAULT_VERSION = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public static final int DEFAULT_VERSION = 2; | |
public static final int DEFAULT_WRITER_VERSION = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a TODO or create an issue to remove this before 0.5.0 and put a pointer here to the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -60,12 +64,13 @@ | |||
* @param numDocsPerChunk Number of docs per data chunk | |||
* @param chunkSize Size of chunk | |||
* @param sizeOfEntry Size of entry (in bytes), max size for variable byte implementation. | |||
* @param version Version of file | |||
* @param version format version used to determine whether to use 8 or 4 byte chunk offsets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment may not hold in the long term, so just leave it as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -64,15 +62,15 @@ | |||
* @param compressionType Type of compression to use. | |||
* @param totalDocs Total number of docs to write. | |||
* @param numDocsPerChunk Number of documents per chunk. | |||
* @param sizeOfEntry Size of entry (in bytes). | |||
* @param sizeOfEntry Size of entry (in bytes) | |||
* @param writerVersion writer version used to determine whether to use 8 or 4 byte chunk offsets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Coming to changing the dotted notation, I haven't yet done the migration from IndexingConfig to FieldConfig. It is only being used for text. Going to migrate soon. |
Description
In PR #5285, the raw index writer format was changed to use 8 byte offset for each chunk in the file header. The writer version was bumped to 3. This was done to support > 2GB indexes. The change was backward compatible to continue the support for reading existing/old segments using 4-byte offsets
While there is no problem with PR 5285, it prevents rollback. So if there is any orthogonal issue while rolling out a release, we can't rollback to older Pinot release since segments already
generated with 8-byte offsets can't be read by old code.
This PR introduces a config option to set the writer version (2 for using old 4-byte chunk offset and 3 which is latest for 8-byte chunk offset). This config option is temporary. To deploy a new version of our internal offline segment creation/push job, we would like this format to be disabled temporarily to have the flexibility of dealing with issues by rolling back the release at will. Once the roll-out is complete, this config option will be removed.
Upgrade Notes
Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
No
Does this PR fix a zero-downtime upgrade introduced earlier?
No
Does this PR otherwise need attention when creating release notes? Things to consider:
release-notes
and complete the section on Release Notes)Release Notes
Documentation