Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-27995: Fix inconsistent behavior of LOAD DATA command for partitoned and non-partitioned tables #5000

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

shivjha30
Copy link
Contributor

@shivjha30 shivjha30 commented Jan 11, 2024

What changes were proposed in this pull request?

Earlier, the code flows for partitioned and non-partitioned tables were different. The partitioned tables skipped constraints checks before submitting the job for execution. This Pull request ensures that both, the partitioned and non-partitioned tables go through the constraints validations in applyConstraintsAndGetFiles function.

Why are the changes needed?

For partitioned tables, while executing LOAD DATA/ LOAD DATA LOCAL commands, the check for file existence is not executed on HiveServer2, and this in turn throws java.io.FileNotFoundException during Runtime once the job is launched.
This PR prevents such cases at compile time.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

The test cases already exist. The error messages prompted back to the user are now consistent if the file is not found at HiveServer2

Load Data Error (Non Partitioned Tables)
Load Data Error

File Not Found Exception (Partitioned Tables)
Load

Fixed: For partitioned tables
Load (1)

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to add a test, there are NegativeCLIDrivers which can be used.

There are bunch of return statements below the current line & the place where it is there right now, So, if that returned was called earlier, this applyConstraintsAndGetFiles(fromURI, ts.tableHandle); must not have been called, but now you would be calling that, right?. So, that is some additional cost just for error message?

@shivjha30
Copy link
Contributor Author

@ayushtkn Thanks for the review, i have added the test case as mentioned above.please review once

@shivjha30
Copy link
Contributor Author

There is a test failure in Jenkins, i ran the test in in local and that test is passing which can be viewed in the below screenshot:
image

CREATE TABLE validate_load_data(key int, value string) partitioned by (hr int) STORED AS TEXTFILE;
LOAD DATA INPATH 'hdfs:///validateload/filedoesnt.txt' INTO TABLE validate_load_data partition (hr);
SELECT * FROM validate_load_data;
DROP TABLE validate_load_data;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check this, without fix also it is throwing error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chinnaraolalam the following scenario is now handled, can you review this once

Copy link

sonarcloud bot commented May 3, 2024

Quality Gate Passed Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants