Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDAP-8745 Fix search total #8158

Merged
merged 1 commit into from
Feb 24, 2017
Merged

Conversation

Denton-L
Copy link
Contributor

@Denton-L Denton-L commented Feb 24, 2017

This commit fixes the bug where the total returned is incorrect when
doing a sorted metadata search. Prior to this, the total returned was
the actual total. Now, the total returned may be less than the actual
total but the search is more efficient.

JIRA: https://issues.cask.co/browse/CDAP-8745
Bamboo: http://builds.cask.co/browse/CDAP-RUT701

@@ -608,7 +608,7 @@ private SearchResults searchByCustomIndex(String namespaceId, Set<EntityTypeSimp
String indexColumn = getIndexColumn(sortInfo.getSortBy(), sortInfo.getSortOrder());
// we want to return the first chunk of 'limit' elements after offset
// in addition, we want to pre-fetch 'numCursors' chunks of size 'limit'
int fetchSize = offset + ((numCursors + 1) * limit);
int fetchSize = (int) Math.min(offset + ((numCursors + 1) * (long) limit), Integer.MAX_VALUE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need all this casting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for when offset is really large (larger than Integer.MAX_VALUE?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems wrong. just int fetchSize = offset + ((numCursors + 1) * limit); should work?

Copy link
Contributor Author

@Denton-L Denton-L Feb 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible for limit to be the Integer.MAX_VALUE. If that's the case, then this line won't run properly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. Is this easier to read then?

if (Integer.MAX_VALUE == limit) {
  fetchSize = limit;
} else {
  fetchSize = offset + ((numCursors + 1) * limit);
}

It's ok to keep it the way it is, but please add a comment explaining why this is done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in the off chance that limit is very large, it would still not work for the code above. I'll add a comment documenting it, though.

@@ -1137,42 +1137,34 @@ public void apply() throws Exception {
searchResults = dataset.search(namespaceId, "*", targets, nameAsc, 0, 2, 0, null, false,
EnumSet.allOf(EntityScope.class));
Assert.assertEquals(ImmutableList.of(flowEntry, dsEntry), searchResults.getResults());
Assert.assertEquals(ImmutableList.of(flowEntry, dsEntry, appEntry), searchResults.getAllResults());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is searchResults.getAllResults() and why is it needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getAllResults() should get all search results, including the ones in the offset. This way, we'll be able to count the total.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean if offset = 2, limit = 2, numCursors = 2
results will have (limit + 1) * numCursors = 6 elements, but all results will have offset + results = 8 elements?

To count the total, you don't actually need to return the results, right? You can just add the offset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, then we reintroduce the bug that was fixed in the last PR: https://issues.cask.co/browse/CDAP-7930.

Copy link
Contributor

@bdmogal bdmogal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Denton-L, a couple of comments. However, is there a test that covers this scenario somewhere? Would be good to ensure that there's a test which makes sure that a full scan is not done when not required.

@bdmogal
Copy link
Contributor

bdmogal commented Feb 24, 2017

@Denton-L I meant a test that does the following:

  1. Has 5 possible search results
  2. Accepts limit = 1 and numCursors = 3
  3. Ensures that only 4 search results, 3 cursors and total as 4 are returned.

@Denton-L
Copy link
Contributor Author

@bdmogal, in the unit tests, I introduced a check for getAllResults().size(). This is essentially the final total which is returned. Any line where I'm asserting a number < 3 should be testing the behaviour you've described in your comment. Could you give it a quick lookover?

@bdmogal
Copy link
Contributor

bdmogal commented Feb 24, 2017

@Denton-L ok. final comment: getResults and getAllResults is a little ambiguous. Any better names? getResultsFromOffset and getResultsFromBeginning, perhaps? Its not getAllResults, because we may still have more than what's returned. Other than that, LGTM so long as this scenario has been tested. If you have this pushed to a cluster, perhaps @tonybach can verify the behavior too.

@Denton-L
Copy link
Contributor Author

Yeah, good idea, I'll change the naming. And unfortunately, I've only tested this locally using StandaloneMain.

@bdmogal
Copy link
Contributor

bdmogal commented Feb 24, 2017

Ok. I'd recommend pushing this to a cluster, and having @tonybach verify the fix as well, since it is so late in the release cycle :-)

This commit fixes the bug where the total returned is incorrect when
doing a sorted metadata search. Prior to this, the total returned was
the actual total. Now, the total returned may be less than the actual
total but the search is more efficient.
@Denton-L Denton-L merged commit 4e57fad into release/4.1 Feb 24, 2017
@Denton-L Denton-L deleted the bugfix/cdap-8745-fix-total branch February 24, 2017 06:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants