Skip to content

Conversation

@EdColeman
Copy link
Contributor

Exports created with versions before 3.1 will not contain range fenced information. This change allows imports of exported with Accumulo versions prior to 3.1 to be imported into 3.1+.

  • updated export version to 2 as an optimization so files exported with 3.1+ can be assumed to include fenced ranges.
  • moved the need conversion check to StoredTabletFile for reuse. (also moves tests)

Also verified using uno with exports created with versions 2.1.3 and 3.1 and imported into 3.1 (with these changes)

Closes #3849

@EdColeman EdColeman self-assigned this Dec 29, 2023
Comment on lines 137 to 138
if (tableInfo.exportedVersion == null || (tableInfo.exportedVersion < VERSION
&& StoredTabletFile.fileNeedsConversion(dataFileCQ))) {
Copy link
Contributor

@keith-turner keith-turner Jan 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is very tied to a specific version. I would make that more explicit as follows with a new constant VERSION_2. If in the future if VERSION changes this code will still behave the same w.r.t. the specific version where the change in behavior happened. It also makes it easier to follow the code over time and know what specific version we care about in this if stmt.

Suggested change
if (tableInfo.exportedVersion == null || (tableInfo.exportedVersion < VERSION
&& StoredTabletFile.fileNeedsConversion(dataFileCQ))) {
if (tableInfo.exportedVersion == null || tableInfo.exportedVersion < VERSION_2) {
checkState(!StoredTabletFile.fileNeedsConversion(dataFileCQ));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added CURR_VERSION and explicit VERSION_2 in 4d136c4

Copy link
Contributor

@cshannon cshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM but I think it would be good to add a test for import that the conversion works properly. It looks like there is an existing ImportTableTest that could maybe be updated.

@EdColeman
Copy link
Contributor Author

The testing difficulty is that the files need to be written in a prior version, I suppose that the files could be created using 2.1 and added as a resource and preserved in Github. I'm not aware of other places where we have done that. Or maybe there is another way?

The changes were testing using uno, but I do agree that is less than satisfying.

@cshannon
Copy link
Contributor

Yeah, in this case I think it might be ok to just create small files with 2.1 and commit it as part of the test resources. I have done that before in other projects and I don't think it's a problem here. The only other option is to write code to manually write out the test files each time in the old format which doesn't sound like fun. Since we are just checking if the legacy conversion works seems like just uploading an old file format would be fine. @ctubbsii - Do you have any issue if we uploaded test files created by an older version to check the conversion?

@keith-turner
Copy link
Contributor

Yeah, in this case I think it might be ok to just create small files with 2.1 and commit it as part of the test resources.

We already have some data like that for testing old rfile formats. So there is precedent for that pattern and I think it would be a good thing to do. Could place a comment in the test showing how the data was produced.

if (key.getColumnFamily().equals(DataFileColumnFamily.NAME)) {
StoredTabletFile exportedRef;
var dataFileCQ = key.getColumnQualifier().toString();
if (tableInfo.exportedVersion == null || (tableInfo.exportedVersion < VERSION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When tableInfo.exportedVersion < VERSION_2 is true under what circumstance would we expect StoredTabletFile.fileNeedsConversion(dataFileCQ) to return false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed extra check in 8191010

@EdColeman
Copy link
Contributor Author

EdColeman commented Jan 21, 2024

Import test case added in 47f634d. Includes README in the data dir to create the files with shell commands.

@EdColeman
Copy link
Contributor Author

One quirk of adding the test files where they currently are, the license check shows a warning because it does not know how to check a *.rf file. (The build completes though) Unsure if a general exclusion for all rfiles, or something more targeted to the file / directory location would be appropiate. @ctubbsii do you have a suggestion or preference?

Another curiosity is that the licence check inserted the ASL into the export distcp.txt file, but it does not seem to matter to the import code.

The maven warnings:

[INFO] --- license:4.3:format (license-headers) @ accumulo-test ---
[INFO] Updating license headers...
[WARNING] Unknown file extension: .../test/src/main/resources/v2_import_test/data/A0000008.rf
[WARNING] Unknown file extension: .../test/src/main/resources/v2_import_test/data/A000000a.rf
[WARNING] Unknown file extension: .../test/src/main/resources/v2_import_test/data/A000000b.rf
[WARNING] Unknown file extension: .../test/src/main/resources/v2_import_test/data/A0000009.rf
[WARNING] Unable to find a comment style definition for some files. You may want to add a custom mapping for the relevant file extensions.

@ctubbsii
Copy link
Member

Unsure if a general exclusion for all rfiles, or something more targeted to the file / directory location would be appropiate.

Either are fine. I can't imagine we'd ever have a file ending in .rf where we'd want the plugin to add a license header to it, so can probably just exclude them all **/*.rf in the parent POM (and remove any more specific .rf exclusions from any child POMs... I know at least one exists in core/pom.xml).

Another curiosity is that the licence check inserted the ASL into the export distcp.txt file, but it does not seem to matter to the import code.

That one should also be excluded. The format of that file should be one file per line, and the comments it added aren't clearly identifiable as a comment, so we should just avoid it. It also doesn't matter, because there's no copyrightable content in that file anyway. It's just a list of files.

@EdColeman EdColeman merged commit 1db892b into apache:main Jan 31, 2024
@EdColeman EdColeman deleted the no_chop_imports_version_change branch January 31, 2024 22:58
@ctubbsii ctubbsii added this to the 3.1.0 milestone Jul 12, 2024
@ctubbsii ctubbsii modified the milestones: 3.1.0, 4.0.0 Mar 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Investigate what happens when importing a table from 2.1 to 3.1

4 participants