Skip to content

[SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream#44563

Closed
beliefer wants to merge 1 commit intoapache:masterfrom
beliefer:SPARK-46567
Closed

[SPARK-46567][CORE] Remove ThreadLocal for ReadAheadInputStream#44563
beliefer wants to merge 1 commit intoapache:masterfrom
beliefer:SPARK-46567

Conversation

@beliefer
Copy link
Copy Markdown
Contributor

@beliefer beliefer commented Jan 2, 2024

What changes were proposed in this pull request?

This PR propose to remove ThreadLocal for ReadAheadInputStream.

Why are the changes needed?

ReadAheadInputStream has a field oneByte declared as TheadLocal.
In fact, oneByte only used in read.
We can remove it by the way that the closure of local variables in instance methods can provide thread safety guarantees.

On the other hand, the TheadLocal occupies a certain amount of space in the heap and there are allocation and GC costs.

Does this PR introduce any user-facing change?

'No'.

How was this patch tested?

Exists test cases.

Was this patch authored or co-authored using generative AI tooling?

'No'.

@github-actions github-actions bot added the CORE label Jan 2, 2024
// short path - just get one byte.
return activeBuffer.get() & 0xFF;
} else {
byte[] oneByteArray = oneByte.get();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original writing seems to be to avoid creating byte[1] repeatedly. Have you encountered any bad cases? The new writing seems to create more objects of type byte[1]

Copy link
Copy Markdown
Contributor Author

@beliefer beliefer Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

byte[1] is very cheap and change it to local variable could avoid GC.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can decorate it with private final if we really need to avoid creating byte[1] repeatedly.

@beliefer
Copy link
Copy Markdown
Contributor Author

beliefer commented Jan 3, 2024

ping @dongjoon-hyun @srowen cc @sitalkedia

@beliefer
Copy link
Copy Markdown
Contributor Author

beliefer commented Jan 3, 2024

The GA failure is unrelated.

@srowen
Copy link
Copy Markdown
Member

srowen commented Jan 3, 2024

I agree with this change, I would be surprised if it's even an optimization

Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@dongjoon-hyun
Copy link
Copy Markdown
Member

Merged to master.

@dongjoon-hyun
Copy link
Copy Markdown
Member

Thank you, @beliefer and all.

@beliefer
Copy link
Copy Markdown
Contributor Author

beliefer commented Jan 3, 2024

@srowen @dongjoon-hyun @LuciferYang Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants