fixes #1559 - Prevent Bulk loading files that exceed TSERV_BULK_MAX_TABLET_OVERLAP tablets #1560

FineAndDandy · 2020-03-11T17:50:34Z

No description provided.

…_MAX_TABLET_OVERLAP tablets

milleruntime · 2020-03-11T17:59:26Z

core/src/main/java/org/apache/accumulo/core/conf/Property.java

      "If a thread blocks more than this period of time waiting to get file permits,"
          + " debugging information will be written."),
-
+  TSERV_BULK_MAX_TABLET_OVERLAP("tserver.bulk.max.overlap", "0", PropertyType.COUNT,


I think just the name "tserver.bulk.max.tablets" is more appropriate since it is just the max number of tablets a single bulk file can produce.

Thinking ahead, for the new bulk import code in the 2.x branch, I think this check would actually execute in the master, so tserver is not a good prefix. This is not a problem with this PR; it is more a problem with the current property naming conventions. I searched for bulk in this file and found master.bulk. and tserver.bulk. prefixes. The user does not care where the check is done. We could possibly use the prefix general instead of tserver, but that creates a third prefix for bulk properties, which could make this property harder to find.

Not sure there is a good solution to this problem.

I agree, I don't think users care whether the prefix is tserver or master. I don't think it's worth going through a deprecation cycle to migrate all of the tserver properties to master or general. That just adds unnecessary work for users. I think the name master.bulk.max.tablets is OK.

I think the name master.bulk.max.tablets is OK.

I like that name and would be in favor of using it in 1.9 and 2.x. The name will make more sense in 2.x than in 1.9 but that is better for the long term.
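For readers following the naming discussion, here is a minimal sketch of the kind of check the property would control: count how many tablets a single bulk file overlaps and compare that to the configured maximum. The class and method names are hypothetical and not taken from the Accumulo code base. It assumes each split point is the inclusive end row of a tablet, and it assumes that a value of 0 (the default shown in the diff) disables the check; the thread does not confirm that meaning.

```java
import java.util.SortedSet;

/** Hypothetical helper for illustration only; not part of Accumulo. */
final class BulkOverlapCheck {

  /**
   * Counts the tablets overlapped by a file whose rows span [firstRow, lastRow].
   * Assumes every split is the inclusive end row of a tablet and that the
   * final tablet has no end row.
   */
  static int countOverlappingTablets(SortedSet<String> splits, String firstRow,
      String lastRow) {
    // Every split in [firstRow, lastRow) ends a tablet the file touches;
    // add one for the tablet that contains lastRow.
    return splits.subSet(firstRow, lastRow).size() + 1;
  }

  /** Assumption: a maximum of 0 means the check is disabled. */
  static boolean exceedsLimit(int overlappingTablets, int maxTablets) {
    return maxTablets > 0 && overlappingTablets > maxTablets;
  }
}
```

Wherever the check ends up running (tserver or master), the comparison itself would look the same, which is part of why the prefix matters so little to users.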

milleruntime · 2020-03-11T18:08:56Z

test/src/main/java/org/apache/accumulo/test/functional/MasterBulkFailureIT.java

+    iterator.next();
+    Assert.assertTrue(iterator.hasNext());
+    iterator.next();
+    Assert.assertFalse(iterator.hasNext());


This should be done in a loop, like you did above to create the file.

Better yet, just use Guava's Iterables.size(s)
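To illustrate the suggestion, here is a sketch of counting entries in one place instead of chaining next()/hasNext() assertions. Guava's Iterables.size(Iterable) does this in a single call; the hypothetical helper below does the same with plain Java so the example stays self-contained.

```java
/** Hypothetical test helper; mirrors what Guava's Iterables.size provides. */
final class EntryCount {

  /** Walks the iterable once and returns how many elements it produced. */
  static int size(Iterable<?> entries) {
    int count = 0;
    for (Object ignored : entries) {
      count++;
    }
    return count;
  }
}
```

The test could then make a single equality assertion on the expected entry count, which also gives a clearer failure message than a failed hasNext() check.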

milleruntime · 2020-03-11T18:10:02Z

test/src/main/java/org/apache/accumulo/test/functional/MasterBulkFailureIT.java

+    // create a bulk load file
+    final FileSystem fs = getCluster().getFileSystem();
+    final Path basePath = getCluster().getTemporaryPath();
+    CachedConfiguration.setInstance(fs.getConf());


This class was dropped in 2.x, and I don't think it is needed in the test.

Suggested change

-    CachedConfiguration.setInstance(fs.getConf());

milleruntime · 2020-03-11T18:22:03Z

test/src/main/java/org/apache/accumulo/test/functional/MasterBulkFailureIT.java

+    RemoteIterator<LocatedFileStatus> status = fs.listFiles(bulkFailures, false);
+    Assert.assertTrue(status.hasNext());
+    Path newPath = status.next().getPath();
+    Assert.assertFalse(status.hasNext());


The bulkImport may not have run yet, so you need to verify that the files were moved before checking the failure directory. The same applies to the second try below.
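One way to address this kind of race in a test is to poll for the precondition with a timeout before asserting on the failure directory. The sketch below is a generic, hypothetical helper (PollUntil and waitFor are not Accumulo or Hadoop APIs); the condition passed in would be whatever indicates that the files have been moved.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper for tests; not part of Accumulo. */
final class PollUntil {

  /**
   * Re-evaluates the condition every pollMillis until it holds or the
   * timeout elapses. Returns true if the condition held in time.
   */
  static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt status and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }
}
```

In the test, the assertions on the failure directory would run only after waitFor returns true, and the test would fail outright if it returns false.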

ctubbsii

I haven't had a chance to look at the PR in detail. However, I wanted to point out that this modifies the behavior of an API that is deprecated in the latest code. I'm not sure it makes sense to add new behavior to deprecated APIs when we haven't yet discussed or come to consensus on how it should be implemented in the replacement API. Addressing it in the latest API first will ensure that any backported change leads in the same direction and doesn't do something wildly different (assuming we backport to a deprecated API at all).

milleruntime · 2020-05-19T15:47:13Z

I am in the process of adding this check to the new bulk import for 2.1. I think this would be useful to have in the old bulk import process as well, but the implementation will be different for 2.x since some of the old code was refactored for reuse with the new code. Adding a new check to the deprecated API in 2.x is debatable, and backporting this check to 1.9/1.10 is also debatable. I think the maintenance cost of something like this is minimal, and it would benefit users who don't want to risk switching to the newer API. The new bulk API should have significant improvements, yet it may come with some bugs initially since it has yet to be tested in production at a large scale.

ctubbsii · 2020-08-17T01:19:50Z

PR auto-closed when 1.9 branch was removed. Re-opened and set base branch to 1.10 instead.

milleruntime · 2020-10-19T17:30:29Z

Closing since this didn't make it into 1.10. It was added to 2.1 in #1614. The only thing left that could be done would be to add the property to the deprecated bulk import code and I wouldn't recommend doing that.

FineAndDandy added 2 commits March 11, 2020 17:47

fixes apache#1559 - Prevent Bulk loading files that exceed TSERV_BULK…

db60789

…_MAX_TABLET_OVERLAP tablets

Merge branch '1.9' into 1559

cd54e12

milleruntime reviewed Mar 11, 2020

ctubbsii linked an issue Mar 11, 2020 that may be closed by this pull request

Consider implementing failsafe to prevent large number of tablet assignments in bulk import #1559

Closed

ctubbsii reviewed Mar 11, 2020

milleruntime mentioned this pull request May 20, 2020

Create max tablets property in new bulk import #1614

Merged

ctubbsii closed this Aug 17, 2020

ctubbsii reopened this Aug 17, 2020

ctubbsii changed the base branch from 1.9 to 1.10 August 17, 2020 01:14

Fix checkstyle and formatter for PR

b7a9380

milleruntime closed this Oct 19, 2020
