HIVE-27995: Fix inconsistent behavior of LOAD DATA command for partitioned and non-partitioned tables #5000
base: master
You need to add a test; there are NegativeCLIDrivers which can be used.
There are a bunch of return statements between the current line and the place where this call sits right now. If one of those returns was taken earlier, applyConstraintsAndGetFiles(fromURI, ts.tableHandle); would not have been called, but now you would be calling it, right? So that is some additional cost just for an error message?
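To make the concern above concrete, here is a minimal Java sketch of how hoisting a check above early returns changes how often it runs. It is not Hive code: the class, the counter, and both method names are hypothetical, and expensiveValidation() merely stands in for a call such as applyConstraintsAndGetFiles.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names exist in Hive.
final class EarlyReturnCostSketch {

    // Counts how often the (stand-in) validation runs.
    static final AtomicInteger validationCalls = new AtomicInteger();

    // Stand-in for a costly check such as applyConstraintsAndGetFiles.
    static void expensiveValidation() {
        validationCalls.incrementAndGet();
    }

    // Old ordering: the early return is reached first, so the check may be skipped.
    static String analyzeValidateLate(boolean earlyExit) {
        if (earlyExit) {
            return "early";
        }
        expensiveValidation();
        return "full";
    }

    // New ordering: the check always runs, even when the method returns early.
    static String analyzeValidateEarly(boolean earlyExit) {
        expensiveValidation();
        if (earlyExit) {
            return "early";
        }
        return "full";
    }

    public static void main(String[] args) {
        analyzeValidateLate(true);
        System.out.println("late ordering, calls so far: " + validationCalls.get());
        analyzeValidateEarly(true);
        System.out.println("early ordering, calls so far: " + validationCalls.get());
    }
}
```

With the late ordering an early exit costs nothing extra; with the early ordering every invocation pays for the check, which is the trade-off being asked about.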
@ayushtkn Thanks for the review. I have added the test case as mentioned above. Please review once.
CREATE TABLE validate_load_data(key int, value string) partitioned by (hr int) STORED AS TEXTFILE;
LOAD DATA INPATH 'hdfs:///validateload/filedoesnt.txt' INTO TABLE validate_load_data partition (hr);
SELECT * FROM validate_load_data;
DROP TABLE validate_load_data;
Can you check this? Without the fix it also throws an error.
@chinnaraolalam The following scenario is now handled. Can you review this once?
What changes were proposed in this pull request?
Earlier, the code paths for partitioned and non-partitioned tables were different: partitioned tables skipped the constraint checks before the job was submitted for execution. This pull request ensures that both partitioned and non-partitioned tables go through the constraint validations in the applyConstraintsAndGetFiles function.
Why are the changes needed?
For partitioned tables, when executing LOAD DATA / LOAD DATA LOCAL commands, the file-existence check is not executed on HiveServer2, which in turn throws java.io.FileNotFoundException at runtime once the job is launched. This PR catches such cases at compile time.
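The idea described above can be sketched roughly as follows. This is a simplified illustration, not the actual Hive change: the class and method names are hypothetical, the local file system via java.nio.file stands in for HDFS, and IllegalArgumentException stands in for whatever compile-time error Hive reports. The point is only that a single existence check runs before planning, regardless of whether the target table is partitioned.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; this is not Hive's analyzer code.
final class LoadValidationSketch {

    // Stand-in for the compile-time check: reject a source path that does not exist.
    static void validateSourceExists(Path source) {
        if (!Files.exists(source)) {
            throw new IllegalArgumentException("Invalid path: no files matching " + source);
        }
    }

    // Both table kinds pass through the same validation before any job is planned.
    static String planLoad(Path source, boolean partitioned) {
        validateSourceExists(source);
        return partitioned ? "PARTITIONED_LOAD" : "SIMPLE_LOAD";
    }

    public static void main(String[] args) {
        // A relative path assumed not to exist on the local file system.
        Path missing = Paths.get("validateload", "filedoesnt.txt");
        for (boolean partitioned : new boolean[] {false, true}) {
            try {
                System.out.println("planned: " + planLoad(missing, partitioned));
            } catch (IllegalArgumentException e) {
                System.out.println("rejected before execution: " + e.getMessage());
            }
        }
    }
}
```

Because the check sits in front of the branch on partitioning, a missing file produces the same early error for both table kinds instead of a runtime failure for one of them.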
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
The test cases already exist. The error messages returned to the user are now consistent when the file is not found at HiveServer2.
Load Data Error (Non Partitioned Tables)
![Load Data Error](https://private-user-images.githubusercontent.com/149884343/295987656-5133da3d-21d4-43f5-bec0-2b76f63ef979.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg0NTk5ODgsIm5iZiI6MTcxODQ1OTY4OCwicGF0aCI6Ii8xNDk4ODQzNDMvMjk1OTg3NjU2LTUxMzNkYTNkLTIxZDQtNDNmNS1iZWMwLTJiNzZmNjNlZjk3OS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjE1JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYxNVQxMzU0NDhaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02ZGZkNzM4NDkzYzFiOTVjNDZlMzJjYjA1OWYzZjk3MzlkYmY1ZDVmYzM3OGM3ZDM5ZmNjMzc4NDBlNTkzYjAxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.CS87w4ogepL9utX2XYYNTsPTgfTNJFC53Y3yZyD-jd8)
File Not Found Exception (Partitioned Tables)
![Load](https://private-user-images.githubusercontent.com/149884343/295987916-3490422a-de97-4750-8cab-fb81d5319a2b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg0NTk5ODgsIm5iZiI6MTcxODQ1OTY4OCwicGF0aCI6Ii8xNDk4ODQzNDMvMjk1OTg3OTE2LTM0OTA0MjJhLWRlOTctNDc1MC04Y2FiLWZiODFkNTMxOWEyYi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjE1JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYxNVQxMzU0NDhaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1mYTg1ZmY3OGRlNWFmYzRlMWFiMGZkZjc4ZjBhMWFjYzZjNzQyYTUwODY1MmM0NjZhMjZiYjZmOGVhM2Y5ZjhjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.Y7wCBPsILY_uZcJTjhctraPWADUSMaPSqAh_FM4zQgg)
Fixed: For partitioned tables
![Load (1)](https://private-user-images.githubusercontent.com/149884343/295988086-95ad648f-3233-4032-a438-db032a82401b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTg0NTk5ODgsIm5iZiI6MTcxODQ1OTY4OCwicGF0aCI6Ii8xNDk4ODQzNDMvMjk1OTg4MDg2LTk1YWQ2NDhmLTMyMzMtNDAzMi1hNDM4LWRiMDMyYTgyNDAxYi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNjE1JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDYxNVQxMzU0NDhaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jNmYwNmM3ZmUxYzk2MjA2YjJiZGNjYjk2YjgwMjJiNzZkNTUxNTUyZDhhYzJmNzYzOWE2YjEwYmQ1YzQyODVlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.SM6wR8bXCLRmuJkhFqYkOIi5rEQflyO2kQOGCEslZ-k)