Issue #8099: Add binary:join/2 to stdlib #8100

Open
wants to merge 5 commits into
base: master

Conversation

onno-vos-dev
Contributor

See linked issue for details.

Contributor

github-actions bot commented Feb 8, 2024

CT Test Results

    2 files     93 suites   34m 41s ⏱️
2 017 tests 1 968 ✅ 48 💤 1 ❌
2 327 runs  2 276 ✅ 50 💤 1 ❌

For more details on these failures, see this check.

Results for commit d7bb143.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

<<"a, b, c">>
```
""".
-doc(#{since => <<"OTP 27.0">>}).
Contributor Author

Not sure if it'll make it into OTP 27 or not but I'm assuming that RC is still open 👍

-spec join([binary()], binary()) -> binary().
join([H], _Separator) -> H;
join([H | T], Separator) ->
    join(T, Separator, H).
Contributor

as an option

Suggested change
join(T, Separator, H).
lists:foldl(fun(Element, Acc) -> <<Acc/binary, Separator/binary, Element/binary>> end, H, T).

Contributor

Would that not lose some performance though? Performance being the main reason for this implementation.

Contributor Author

There's very little change performance-wise with this suggestion, nor do I see a big reduction in complexity. Care to elaborate why you'd prefer this option? 😄

Contributor

Just a wish to replace generic code patterns with standard library functions (that is, to use higher-order functions instead of manual recursion).
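
For readers comparing the two variants discussed in this thread, here is a minimal side-by-side sketch. The module name `join_compare` and the function names `join_rec` and `join_fold` are hypothetical and used only for illustration; the `join_rec/3` helper is an assumption about what the accumulator function might look like, since its body is not shown in the diff excerpt. Both variants assume a non-empty list of binaries.

```erlang
-module(join_compare).
-export([join_rec/2, join_fold/2]).

%% Manual recursion, following the shape of the snippet under review.
-spec join_rec([binary(), ...], binary()) -> binary().
join_rec([H], _Separator) -> H;
join_rec([H | T], Separator) -> join_rec(T, Separator, H).

%% Hypothetical accumulator helper (its real body is not shown in the thread).
join_rec([], _Separator, Acc) -> Acc;
join_rec([H | T], Separator, Acc) ->
    join_rec(T, Separator, <<Acc/binary, Separator/binary, H/binary>>).

%% Fold-based variant from the suggested change.
-spec join_fold([binary(), ...], binary()) -> binary().
join_fold([H | T], Separator) ->
    lists:foldl(fun(Element, Acc) ->
                        <<Acc/binary, Separator/binary, Element/binary>>
                end, H, T).
```

Both functions should produce the same result for the same input; any performance difference between them would need to be measured rather than assumed.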

lib/stdlib/src/binary.erl (outdated review thread, resolved)
onno-vos-dev force-pushed the issue-8099-add-binary-join-to-stdlib branch 2 times, most recently from 335d9fc to 26e06f7 (February 8, 2024 10:10)
lib/stdlib/src/binary.erl (outdated review thread, resolved)
onno-vos-dev force-pushed the issue-8099-add-binary-join-to-stdlib branch 2 times, most recently from 4b2771c to e0df3f0 (February 8, 2024 11:53)
@onno-vos-dev
Contributor Author

Squashed to get rid of some of the commit noise

onno-vos-dev force-pushed the issue-8099-add-binary-join-to-stdlib branch from e0df3f0 to 036b654 (February 8, 2024 12:39)
lib/stdlib/src/binary.erl (outdated review thread, resolved)
lib/stdlib/src/binary.erl (outdated review thread, resolved)
onno-vos-dev force-pushed the issue-8099-add-binary-join-to-stdlib branch from 3fc013a to fa6d3b7 (February 8, 2024 14:49)
@paulo-ferraz-oliveira
Contributor

Fwiw, there was an attempt at this a few years back, that ended up not making it due to a core team decision. It's possible, though, that stuff's changed since then.

Comment on lines +955 to +957
%% Starting with an empty binary convinces the compiler to use the new "private append" optimisation
Acc = <<>>,
join(T, Separator, <<Acc/binary, H/binary>>);
Contributor

@bjorng @jhogberg @michalmuskala

I understand why starting with the empty string makes the compiler use the new "private append" optimisation. However I wonder if this could be generalized?

While H cannot be privately appended because it comes from an external function, the next iteration of join/3 can be privately appended, because no one uses the intermediate binary. Therefore, for those "closed loops", would it make sense to have a bit in the binary that tells when to private append or not? Generally speaking, everything after the first iteration would be privately appended. This would allow private append to happen in more situations, although I am not aware of the costs of reserving one extra bit for binaries.

PS: I may be completely off mark here.

Contributor

> I understand why starting with the empty string makes the compiler use the new "private append" optimisation. However I wonder if this could be generalized?
>
> While H cannot be privately appended because it comes from an external function, the next iteration of join/3 can be privately appended, because no one uses the intermediate binary.

The private append operation is faster because it does fewer tests and less work than the general append operation. Therefore, the private append must only be used with a binary that has been specially prepared.

However, it should be possible for the compiler to generalize the optimization. If the call looks like:

join(T, Separator, H)

the compiler could rewrite it to:

    Acc = <<>>,
    join(T, Separator, <<Acc/binary, H/binary>>);

Not sure that this code pattern is common enough to make the optimization worthwhile to implement, though. It would not be needed if the clause is rewritten to:

join([_ | _]=List, Separator) when is_binary(Separator) ->
    join(List, Separator, <<>>);
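
To make the rewritten clause concrete, here is a minimal sketch of how a `join/3` helper could work when the accumulator starts as `<<>>`. The module name `join_sketch`, the body of `join/3`, and the empty-list clause are assumptions for illustration, not the code in this pull request. Whether the compiler actually applies the "private append" optimisation to this shape depends on the OTP version and is not verified here.

```erlang
-module(join_sketch).
-export([join/2]).

-spec join([binary()], binary()) -> binary().
%% Assumption: an empty list yields an empty binary.
join([], Separator) when is_binary(Separator) ->
    <<>>;
%% The clause suggested in the review: start from an empty accumulator
%% that is created locally, so every append targets a binary this
%% function built itself.
join([_ | _] = List, Separator) when is_binary(Separator) ->
    join(List, Separator, <<>>).

%% Hypothetical helper: append each element followed by the separator,
%% except for the last element, which is appended on its own.
join([H], _Separator, Acc) ->
    <<Acc/binary, H/binary>>;
join([H | T], Separator, Acc) ->
    join(T, Separator, <<Acc/binary, H/binary, Separator/binary>>).
```

With this shape the first element needs no special handling in `join/2`, because the helper never has to reuse a binary that came from the caller as its accumulator.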

lib/stdlib/src/binary.erl (outdated review thread, resolved)
rickard-green added the "team:VM" (Assigned to OTP team VM) label on Feb 12, 2024
jhogberg added the "stalled" (waiting for input by the Erlang/OTP team) label on Feb 19, 2024
bjorng added this to the OTP-28.0 milestone on Feb 20, 2024
@bjorng
Contributor

bjorng commented Feb 20, 2024

After the first release candidate, we generally focus on bug fixes and polishing of features already included or planned for the release. To ensure that Erlang/OTP 27 will be as good as it possibly can be, we need to minimize the time we spend on things not to be included in the release. Therefore, we will not review this pull request until after OTP 27 has been released. If we have not come back to it before September, feel free to remind us.

Labels
stalled (waiting for input by the Erlang/OTP team), team:VM (Assigned to OTP team VM)