Conversation

@vamshikolanu vamshikolanu commented Mar 10, 2023

What changes were proposed in this pull request?

This PR rounds any LIMIT value greater than Integer.MAX_VALUE down to Integer.MAX_VALUE.

Why are the changes needed?

Does this PR introduce any user-facing change?

No, but we should update the documentation to reflect this behavior.

How was this patch tested?

mvn test -Dtest=TestMiniLlapLocalCliDriver -Dtest.output.overwrite=true -Dqfile=limit_max_int.q

@vamshikolanu vamshikolanu changed the title [WIP]HIVE-27133: Round off limit value greater than int_max to int_max HIVE-27133: Round off limit value greater than int_max to int_max Mar 10, 2023
Contributor

@jfsii jfsii left a comment

I left a few comments, though I am curious about the utility of this change.
I'm usually not a fan of not doing what the user has explicitly asked for. For example, if they ask for a limit of 4 billion rows, squashing it to 2 billion rather than throwing an error seems like it could cause confusion. I think it is preferable to throw an exception when you can't do what the user explicitly asked for.


/**
* Returns integer value of a string. If the string value exceeds max int, returns Integer.MAX_VALUE
* else if the string value is less than min int, returns Integer.MAX_VALUE
Contributor

Likely meant MIN_VALUE rather than MAX_VALUE
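
For reference, here is a minimal sketch of the clamping behavior the Javadoc appears to intend, with the lower bound mapped to Integer.MIN_VALUE. The method name convertStringToBoundedInt comes from this PR; the enclosing class BoundedIntSketch and the choice to let NumberFormatException propagate are assumptions for illustration, since the full method body is not shown in this thread.

```java
import java.math.BigInteger;

// Hypothetical holder class; only the method name comes from the PR.
final class BoundedIntSketch {
    private static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);
    private static final BigInteger INT_MIN = BigInteger.valueOf(Integer.MIN_VALUE);

    // Parses a decimal string and clamps the result into the int range.
    // Assumption: malformed input surfaces as NumberFormatException.
    static int convertStringToBoundedInt(String value) {
        BigInteger big = new BigInteger(value.trim());
        if (big.compareTo(INT_MAX) > 0) {
            return Integer.MAX_VALUE; // overflow clamps to the upper bound
        }
        if (big.compareTo(INT_MIN) < 0) {
            return Integer.MIN_VALUE; // underflow clamps to the lower bound
        }
        return big.intValue(); // in range, so the conversion is exact
    }
}
```

With this sketch, the limit 214748364700 used in the qfile would be clamped to Integer.MAX_VALUE, and a value below the int range would be clamped to Integer.MIN_VALUE.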

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.util.ReflectionUtils;


Contributor

Please remove the stray blank line added here.

select key from src where key = '238' limit 214748364700;
select * from src where key = '238' limit 214748364700;
select src.key, count(src.value) from src group by src.key limit 214748364700;
select * from ( select key from src limit 3) sq1 limit 214748364700;
\ No newline at end of file
Contributor

Is it possible to test underflow?

Author

LIMIT queries with negative numbers fail at the parser level, so I haven't added a test for a negative value. convertStringToBoundedInt() is written generically to handle both the positive and negative cases.

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

No coverage information
No duplication information

try {
BigInteger bigIntValue = new BigInteger(value);
if (bigIntValue.compareTo(BigInteger.valueOf(Integer.MAX_VALUE)) > 0) {
return Integer.MAX_VALUE;
Contributor

@vamshikolanu
I agree with @jfsii. Silently converting a large number to Integer.MAX_VALUE misleads the user.
Consider the following query:
INSERT INTO TABLE destinationTable SELECT * FROM sourceTable LIMIT <some_large_number>;
The insert writes records based on the output of the SELECT operator. Because the limit has been converted to Integer.MAX_VALUE, at most Integer.MAX_VALUE records will be written, which might not be what the user wants.

Throwing a meaningful exception is probably better. In the long term, adding support for large integers in LIMIT clauses would be better still.
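
To make the suggestion concrete, here is a rough sketch of the fail-fast alternative: reject a limit that does not fit in an int instead of clamping it. The class StrictLimitSketch, the method parseLimitStrict, and the use of IllegalArgumentException are all hypothetical and chosen only for illustration; the actual patch would use whatever exception type the surrounding analyzer code expects.

```java
import java.math.BigInteger;

// Hypothetical names throughout; not part of the PR.
final class StrictLimitSketch {
    private static final BigInteger INT_MAX = BigInteger.valueOf(Integer.MAX_VALUE);
    private static final BigInteger INT_MIN = BigInteger.valueOf(Integer.MIN_VALUE);

    // Parses a decimal string and fails loudly when it does not fit in an int,
    // so the user is never given fewer rows than requested without being told.
    static int parseLimitStrict(String value) {
        BigInteger big = new BigInteger(value.trim());
        if (big.compareTo(INT_MAX) > 0 || big.compareTo(INT_MIN) < 0) {
            throw new IllegalArgumentException(
                "LIMIT value " + value + " is outside the supported range ["
                    + Integer.MIN_VALUE + ", " + Integer.MAX_VALUE + "]");
        }
        return big.intValue(); // in range, so the conversion is exact
    }
}
```

With this approach the INSERT ... LIMIT example above would fail up front with a clear message rather than writing a truncated result.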

select key from src where key = '238' limit 214748364700;
select * from src where key = '238' limit 214748364700;
select src.key, count(src.value) from src group by src.key limit 214748364700;
select * from ( select key from src limit 3) sq1 limit 214748364700;
\ No newline at end of file
Contributor

nit: Please add a newline at the end of the qfile.

@vamshikolanu
Author

I agree with your comments, @jfsii and @SourabhBadhya. I'm going to add support for bigint in the LIMIT clause instead and close this PR.
