Skip to content

Remove broken support for map files #3378

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 10, 2023
Merged

Conversation

ctubbsii
Copy link
Member

@ctubbsii ctubbsii commented May 5, 2023

  • Remove support for Hadoop map files for tablet data, as this hasn't worked since 1.5.0 anyway. Now, an error should be thrown, as the file type is no longer recognized. Replaced special handling for map file extension or directory with a logged warning, error, or allow an exception to be thrown, depending on the situation.

  • Rename 'mapFile' variables and related methods to the more generic 'dataFile' or equivalent, to reduce confusion, since we no longer use MyMapFile or Hadoop MapFile, and only use RFile for storing tablet data files. This was done once before when RFile was initially added to the code base, which is why many of our variables and types already used the term 'dataFile', but evidently, not all of it had been renamed, or some of that terminology leaked back in. Some of it was added as part of recent Compaction changes, but there may have been earlier times this language was added back in. This change here resets us to the generic language for tablet data files.

  • Rename and relocate tabletingest-specific Thrift struct MapFileInfo to DataFileInfo under the tabletingest.thrift namespace. This only affects the bulk load Manager FATE operation and tablet servers. End user clients did not use this API, so it will not affect them.

  • Rename loadMapFile variable to loadMappingFile to make it clear that it's the file that stores the mapping of files for bulk loads, rather than a file in the MapFile format.

  • Add new validator to table.file.type value to prevent it from being anything other than the currently supported 'rf' extension for RFiles, which is also the current default (and the only value that would have worked anyway).

  • Fix issues with DefaultCompactionPlanner reported as part of DefaultCompactionPlanner SPI class has several static methods/inner classes exposed as public API #3377, but those should be applied to 2.1 branch first, and this change rebased onto the updated main after that is merged forward. TODO remove this from the commit message once that's done

* Remove support for Hadoop map files for tablet data, as this hasn't
  worked since 1.5.0 anyway. Now, an error should be thrown, as the file
  type is no longer recognized. Replaced special handling for map file
  extension or directory with a logged warning, error, or allow an
  exception to be thrown, depending on the situation.
* Rename 'mapFile' variables and related methods to the more generic
  'dataFile' or equivalent, to reduce confusion, since we no longer use
  MyMapFile or Hadoop MapFile, and only use RFile for storing tablet
  data files. This was done once before when RFile was initially added
  to the code base, which is why many of our variables and types already
  used the term 'dataFile', but evidently, not all of it had been
  renamed, or some of that terminology leaked back in. Some of it was
  added as part of recent Compaction changes, but there may have been
  earlier times this language was added back in. This change here resets
  us to the generic language for tablet data files.
* Rename and relocate tabletingest-specific Thrift struct MapFileInfo to
  DataFileInfo under the tabletingest.thrift namespace. This only
  affects the bulk load Manager FATE operation and tablet servers. End
  user clients did not use this API, so it will not affect them.
* Rename loadMapFile variable to loadMappingFile to make it clear that
  it's the file that stores the mapping of files for bulk loads, rather
  than a file in the MapFile format.
* Add new validator to `table.file.type` value to prevent it from being
  anything other than the currently supported 'rf' extension for RFiles,
  which is also the current default (and the only value that would have
  worked anyway).

* Fix issues with DefaultCompactionPlanner reported as part of apache#3377,
  but those should be applied to 2.1 branch first, and this change
  rebased onto the updated main after that is merged forward.
  **TODO** remove this from the commit message once that's done
@ctubbsii ctubbsii self-assigned this May 5, 2023
@ctubbsii ctubbsii marked this pull request as ready for review May 5, 2023 08:08
@ctubbsii ctubbsii closed this May 5, 2023
@ctubbsii ctubbsii reopened this May 5, 2023
@ctubbsii ctubbsii requested review from keith-turner and EdColeman May 5, 2023 14:52
@ctubbsii
Copy link
Member Author

ctubbsii commented May 5, 2023

All unit tests and ITs passed.

Copy link
Contributor

@EdColeman EdColeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@cshannon cshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well, I like the new name of dataFile

@ctubbsii ctubbsii linked an issue May 10, 2023 that may be closed by this pull request
ctubbsii added a commit to ctubbsii/accumulo that referenced this pull request May 10, 2023
Backport some changes from PR apache#3378 for 3.0 into 2.1 to fix issue apache#3377
Specifically, backport the changes to DefaultCompactionPlanner (its unit
test) to make private some of its internal implementation methods that
were unintentionally left public. This also includes the name changes to
those methods that were part of PR apache#3378, to reduce merge conflicts.

This fixes apache#3377
@ctubbsii ctubbsii merged commit fdabdc5 into apache:main May 10, 2023
@ctubbsii ctubbsii deleted the mapfile-removal branch May 10, 2023 14:07
ctubbsii added a commit to ctubbsii/accumulo that referenced this pull request May 10, 2023
Drop MapFile support (apache#3359) by partially backporting PR apache#3378, and
making minimally disruptive changes to avoid use of MapFiles for 1.10.4
and 2.1.1.

As noted in apache#3378, MapFile support is already broken and has been for a
long time. This change will cause an explicit and detectable failure,
rather than a silent one, if a MapFile is attempted to be used.
ctubbsii added a commit that referenced this pull request May 12, 2023
Drop MapFile support (#3359) by partially backporting PR #3378, and
making minimally disruptive changes to avoid use of MapFiles for 1.10.4
and 2.1.1.

As noted in #3378, MapFile support is already broken and has been for a
long time. This change will cause an explicit and detectable failure,
rather than a silent one, if a MapFile is attempted to be used.
@ctubbsii ctubbsii added this to the 3.0.0 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drop Mapfile support
4 participants