Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add string builder to build JsString #3915

Open
wants to merge 34 commits into
base: main
Choose a base branch
from

Conversation

CrazyboyQCD
Copy link
Contributor

@CrazyboyQCD CrazyboyQCD commented Jul 12, 2024

There are many places where Vec is used as a builder to create JsString.
This adds JsStringBuilder as a more natural choice.

Copy link

codecov bot commented Jul 12, 2024

Codecov Report

Attention: Patch coverage is 59.62441% with 86 lines in your changes missing coverage. Please review.

Project coverage is 52.88%. Comparing base (6ddc2b4) to head (581f14f).
Report is 273 commits behind head on main.

Files with missing lines Patch % Lines
core/string/src/builder.rs 59.62% 86 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3915      +/-   ##
==========================================
+ Coverage   47.24%   52.88%   +5.64%     
==========================================
  Files         476      483       +7     
  Lines       46892    46934      +42     
==========================================
+ Hits        22154    24821    +2667     
+ Misses      24738    22113    -2625     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@CrazyboyQCD CrazyboyQCD marked this pull request as draft July 13, 2024 16:58
@CrazyboyQCD CrazyboyQCD marked this pull request as ready for review July 17, 2024 15:33
Copy link
Member

@HalidOdat HalidOdat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Check my comment if they make sense :)

core/string/src/lib.rs Outdated Show resolved Hide resolved
core/string/src/lib.rs Outdated Show resolved Hide resolved
core/string/src/lib.rs Outdated Show resolved Hide resolved
core/string/src/lib.rs Outdated Show resolved Hide resolved
core/string/src/lib.rs Outdated Show resolved Hide resolved
core/string/src/lib.rs Outdated Show resolved Hide resolved
core/string/src/lib.rs Outdated Show resolved Hide resolved
@CrazyboyQCD
Copy link
Contributor Author

CrazyboyQCD commented Jul 24, 2024

@HalidOdat, one more thing is that I used realloc instead of alloc before and found alloc has a better performance, since I can't test on other machine, you could test realloc to see if I'm wrong.
If you want to test it, just change the reserve:

    fn reserve(&mut self, new_layout: Layout) {
        let new_ptr = if self.is_dangling() {
            let new_ptr = unsafe { alloc(new_layout) };
            let Some(new_ptr) = NonNull::new(new_ptr.cast::<RawJsString>()) else {
                std::alloc::handle_alloc_error(new_layout)
            };
            new_ptr
        } else {
            use std::alloc::realloc;
            let old_ptr = self.inner.as_ptr();
            let ptr =  unsafe { realloc(ptr, new_layout, new_layout.size()) };
            let Some(new_ptr) = NonNull::new(new_ptr.cast::<RawJsString>()) else {
                std::alloc::handle_alloc_error(new_layout)
            };
            new_ptr
        };
        self.inner = new_ptr;
        let new_arr_size = new_layout.size() - DATA_OFFSET;
        self.cap = new_arr_size / Self::DATA_SIZE;
    }

core/string/src/lib.rs Outdated Show resolved Hide resolved
core/string/src/lib.rs Outdated Show resolved Hide resolved
@HalidOdat
Copy link
Member

            let ptr =  unsafe { realloc(ptr, new_layout, new_layout.size()) };

This may be unsafe because in the realloc's safety section it has the requirement that the layout passed to it is layout must be the same layout that was used to allocate that block of memory, so it should be the current layout.

See: https://doc.rust-lang.org/stable/core/alloc/trait.GlobalAlloc.html#method.realloc

@HalidOdat
Copy link
Member

HalidOdat commented Jul 25, 2024

I benchmarked with the current implementation (alloc) and with realloc and found that it's considerably faster (14x), here is my benchmark code if you want to reproduce it.

fn main() {
    let mut sb = JsStringBuilder::<u8>::new();

    for _ in 0..10000 {
        sb.push(b'a');
        sb.extend_from_slice(b"bcde".as_slice());
        sb.extend([b'f', b'g'].into_iter());
    }

    let s = sb.build();
    drop(s);
}
hyperfine -N --warmup=10 ./sb-alloc ./sb-realloc
Benchmark 1: ./sb-alloc
  Time (mean ± σ):      10.8 ms ±   0.2 ms    [User: 5.9 ms, System: 4.6 ms]
  Range (min … max):    10.2 ms …  11.9 ms    284 runs

Benchmark 2: ./sb-realloc
  Time (mean ± σ):     728.6 µs ±  78.9 µs    [User: 565.7 µs, System: 96.5 µs]
  Range (min … max):   550.7 µs … 1324.8 µs    3970 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ./sb-realloc ran
   14.77 ± 1.63 times faster than ./sb-alloc

NOTE: Changed extend's reserve to be self.reserve(Self::new_layout(self.len() + lower_bound)); to avoid the above mentioned bug.

@HalidOdat HalidOdat added enhancement New feature or request performance Performance related changes and issues labels Jul 25, 2024
@HalidOdat HalidOdat requested a review from a team July 25, 2024 05:10
@CrazyboyQCD
Copy link
Contributor Author

CrazyboyQCD commented Jul 25, 2024

@HalidOdat
My OS is Windows11, and seems realloc doesn't work much better for me, what is your OS?
And do you think I should change into realloc?

Benchmark 1: ./jsstringbuilder-alloc.exe
  Time (mean ± σ):       3.2 ms ±   0.2 ms    [User: 0.7 ms, System: 1.9 ms]
  Range (min … max):     3.0 ms …   6.0 ms    952 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: ./jsstringbuilder-realloc.exe
  Time (mean ± σ):       3.0 ms ±   0.2 ms    [User: 1.0 ms, System: 1.6 ms]
  Range (min … max):     2.8 ms …   6.1 ms    1028 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary
  ./jsstringbuilder-realloc.exe ran
    1.07 ± 0.08 times faster than ./jsstringbuilder-alloc.exe

@HalidOdat
Copy link
Member

My OS is Windows11, and seems realloc doesn't work much better for me, what is your OS?

I use Linux (EndeavourOS).

And do you think I should change into realloc?

Yes, there is no drawback to using realloc and even on windows (from the results you posted) there is still a small ~7% increase in performance.

@CrazyboyQCD
Copy link
Contributor Author

Yes, there is no drawback to using realloc and even on windows (from the results you posted) there is still a small ~7% increase in performance.

Ok

Copy link
Member

@HalidOdat HalidOdat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the suggested adjustments, this looks great! :)

core/string/src/builder.rs Outdated Show resolved Hide resolved
core/string/src/builder.rs Outdated Show resolved Hide resolved
core/string/src/builder.rs Show resolved Hide resolved
@HalidOdat HalidOdat requested a review from a team August 17, 2024 15:27
@HalidOdat HalidOdat added this to the next-release milestone Aug 17, 2024
@CrazyboyQCD CrazyboyQCD changed the title Add JsStringBuilder to build JsString Add string builder to build JsString Sep 30, 2024
Comment on lines +16 to +28
#[doc(hidden)]
mod private {
pub trait Sealed {}

impl Sealed for u8 {}
impl Sealed for u16 {}
}

/// Inner elements represented for `JsStringBuilder`.
pub trait JsStringData: private::Sealed {}

impl JsStringData for u8 {}
impl JsStringData for u16 {}
Copy link
Member

@jedel1043 jedel1043 Nov 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a second pass I realized that maybe this abstraction is not needed? More so because pushing u16 is always valid, but pushing u8 needs to do latin1 checks, and having separate builders for each type would allow us to reflect that in the API.

Since we now use an enum for the generic StringBuilder that doesn't need the JsStringData trait, maybe we can just have different types for each builder, then just extract the common paths into utility functions.

Copy link
Contributor Author

@CrazyboyQCD CrazyboyQCD Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a second pass I realized that maybe this abstraction is not needed?

JsStringData could be removed since I only export specific builders.

maybe we can just have different types for each builder, then just extract the common paths into utility functions.

Have we already done this by using generic?

More so because pushing u16 is always valid, but pushing u8 needs to do latin1 checks, and having separate builders for each type would allow us to reflect that in the API.

This could be done with implementations for generics seperately like this:

impl<T> JsStringBuilder<T> {
     fn _build(mut self, latin1_check: Option<bool>) -> JsString { ... }
}
impl JsStringBuilder<u8> {
    // ---check inside---
    pub fn build(self) -> JsString {
        self._build(None)
    }
    // ---check outside---
    pub unsafe fn build_as_utf16(self) -> JsString {
        self._build(Some(false))
    }
    pub unsafe fn build_as_latin1(self) -> JsString {
        self._build(Some(true))
    }
}

impl JsStringBuilder<u16> {
    // no need to check
    pub fn build(self) -> JsString {
        self._build(Some(false))
    }
}

Copy link
Member

@jedel1043 jedel1043 Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that could potentially work, maybe calling it something like JsStringTypedBuilder to distinguish it from the plain JsStringBuilder that can accept str, utf16 and latin1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Performance related changes and issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants