@ashishkumar50
Contributor

What changes were proposed in this pull request?

The Ozone client now supports readVectored, which allows reading multiple file ranges in parallel.
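For readers unfamiliar with vectored reads, here is a minimal sketch of the idea using only the JDK. It is not the Ozone implementation: the `Range` class, the `readVectored` helper, and `readRanges` are hypothetical names introduced for illustration, loosely following the pattern of one future per requested range. Each range is read with a positional `FileChannel.read`, so the channel's own position is never moved.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class VectoredReadSketch {

  /** Hypothetical range type: an offset, a length, and a future holding the data. */
  static final class Range {
    final long offset;
    final int length;
    final CompletableFuture<ByteBuffer> data = new CompletableFuture<>();

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Starts one asynchronous positional read per range (assumed design, not Ozone's). */
  static void readVectored(FileChannel channel, List<Range> ranges) {
    for (Range range : ranges) {
      CompletableFuture.runAsync(() -> {
        try {
          ByteBuffer buffer = ByteBuffer.allocate(range.length);
          long position = range.offset;
          // Positional reads leave the channel's own position untouched.
          while (buffer.hasRemaining()) {
            int n = channel.read(buffer, position);
            if (n < 0) {
              throw new EOFException("Range extends past end of file");
            }
            position += n;
          }
          buffer.flip();
          range.data.complete(buffer);
        } catch (IOException | RuntimeException e) {
          range.data.completeExceptionally(e);
        }
      });
    }
  }

  /** Reads the given {offset, length} pairs from a file and decodes them as ASCII. */
  static List<String> readRanges(Path file, long[][] spec) throws IOException {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      List<Range> ranges = new ArrayList<>();
      for (long[] s : spec) {
        ranges.add(new Range(s[0], (int) s[1]));
      }
      readVectored(channel, ranges);
      List<String> result = new ArrayList<>();
      for (Range range : ranges) {
        // join() waits for the asynchronous read of this range to finish.
        result.add(StandardCharsets.US_ASCII.decode(range.data.join()).toString());
      }
      return result;
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("vectored", ".txt");
    try {
      Files.write(file, "0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII));
      System.out.println(readRanges(file, new long[][] {{0, 4}, {10, 5}})); // prints [0123, abcde]
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

The caller waits on each range's future independently, so a slow range does not block access to ranges that have already completed.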

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13660

How was this patch tested?

Tests were added.

@jojochuang
Contributor

@yandrey321

@jojochuang jojochuang self-requested a review December 1, 2025 17:19
@adoroszlai
Contributor

adoroszlai commented Dec 3, 2025

Thanks @ashishkumar50 for the patch. I have ported the corresponding contract test from Hadoop, which shows that some input validation is missing (negative length and offset, overlapping ranges, same ranges).

Please feel free to pick it from adoroszlai@ee11cf3 and include it in this PR.

https://github.com/adoroszlai/ozone/actions/runs/19905480757/job/57061761264#step:13:5599
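The missing validation cases mentioned in this comment (negative offset, negative length, overlapping ranges, identical ranges) could be checked along the following lines. This is only a sketch under assumptions: `Range`, `validateAndSort`, and `isValid` are hypothetical names, and the choice of `IllegalArgumentException` for every failure is illustrative and may differ from what the contract test expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class RangeValidationSketch {

  /** Hypothetical range type used only for this illustration. */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /**
   * Returns the ranges sorted by offset, or throws if any range is invalid.
   * Sorting first means an overlap can only occur between neighbours.
   */
  static List<Range> validateAndSort(List<Range> ranges) {
    List<Range> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong((Range r) -> r.offset));
    Range previous = null;
    for (Range current : sorted) {
      if (current.offset < 0) {
        throw new IllegalArgumentException("Negative offset: " + current.offset);
      }
      if (current.length < 0) {
        throw new IllegalArgumentException("Negative length: " + current.length);
      }
      // Identical non-empty ranges are caught here too: they overlap completely.
      if (previous != null && current.offset < previous.offset + previous.length) {
        throw new IllegalArgumentException("Overlapping ranges at offset " + current.offset);
      }
      previous = current;
    }
    return sorted;
  }

  /** Convenience wrapper that reports validity as a boolean. */
  static boolean isValid(List<Range> ranges) {
    try {
      validateAndSort(ranges);
      return true;
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isValid(Arrays.asList(new Range(10, 5), new Range(0, 4)))); // prints true
    System.out.println(isValid(Arrays.asList(new Range(0, 8), new Range(4, 8))));  // prints false
    System.out.println(isValid(Arrays.asList(new Range(-1, 4))));                  // prints false
  }
}
```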

adoroszlai and others added 2 commits December 4, 2025 18:07
@ashishkumar50
Contributor Author

@adoroszlai Thanks for the review. I have addressed the comments.

(offset, buffer) -> readRangeData(offset, buffer, initialPosition));
} finally {
// Restore position
synchronized (this) {

What would happen in case of concurrent readVectored() calls?

Contributor Author

This is now synchronized to avoid a race condition between concurrent reads on the same stream.
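To illustrate the concern, here is a simplified sketch of a stream whose positional read saves the position, seeks, reads, and restores the position inside a single critical section. It is not the code from this PR: `PositionRestoreSketch`, `readAt`, and `positionAfterConcurrentReads` are hypothetical names, and an in-memory byte array stands in for the real stream. The point is that if save, seek, read, and restore are not all covered by the same lock, a concurrent caller could observe or restore a stale position.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class PositionRestoreSketch {
  private final byte[] data;
  private int position;

  PositionRestoreSketch(byte[] data) {
    this.data = data;
  }

  synchronized void seek(int newPosition) {
    if (newPosition < 0 || newPosition > data.length) {
      throw new IllegalArgumentException("Invalid position: " + newPosition);
    }
    position = newPosition;
  }

  synchronized int getPos() {
    return position;
  }

  /** Sequential read: copies up to buffer.length bytes and advances the position. */
  synchronized int read(byte[] buffer) {
    int n = Math.min(buffer.length, data.length - position);
    System.arraycopy(data, position, buffer, 0, n);
    position += n;
    return n;
  }

  /**
   * Positional read. Saving, seeking, reading, and restoring all happen while
   * holding the same monitor, so concurrent callers cannot interleave.
   */
  synchronized int readAt(int offset, byte[] buffer) {
    int saved = position;
    try {
      seek(offset);
      return read(buffer);
    } finally {
      position = saved; // restore even if the read fails
    }
  }

  /** Runs several concurrent readAt calls and returns the position afterwards. */
  static int positionAfterConcurrentReads(PositionRestoreSketch stream, int threads)
      throws InterruptedException {
    List<Thread> workers = new ArrayList<>();
    for (int i = 0; i < threads; i++) {
      final int offset = i;
      Thread worker = new Thread(() -> stream.readAt(offset, new byte[4]));
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      worker.join();
    }
    return stream.getPos();
  }

  public static void main(String[] args) throws InterruptedException {
    PositionRestoreSketch stream =
        new PositionRestoreSketch("0123456789abcdefghij".getBytes(StandardCharsets.US_ASCII));
    stream.seek(3);
    System.out.println(positionAfterConcurrentReads(stream, 8)); // prints 3
  }
}
```

The trade-off of this approach is that reads on the same stream are serialized, which limits how much parallelism a vectored read can achieve.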

});
} finally {
// Restore position
synchronized (this) {

What would happen in case of concurrent readVectored() calls?

Contributor Author

This is now synchronized to avoid a race condition between concurrent reads on the same stream.

@chungen0126 chungen0126 self-requested a review December 5, 2025 07:26
@adoroszlai
Contributor

@chungen0126 @jojochuang @yandrey321 please take a look

Contributor

@chungen0126 chungen0126 left a comment

Thanks @ashishkumar50 for the patch. Are there plans to optimize readVectored further? The current implementation doesn't seem to support fully parallel reading of different ranges yet. I also added some comments.

);

// Restore position
seek(initialPosition);

Contributor

I'm wondering whether the seek to initialPosition is necessary here. Since the offset changes asynchronously, restoring it at this point might not work correctly. It seems that readRangeData already handles the position restoration correctly.

);

// Restore position before returning from method
seek(initialPosition);

Contributor

Same comment as above.
