Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Slice#+(Slice) and Slice.join #12081

Merged
merged 3 commits into from
Feb 14, 2023

Conversation

HertzDevil
Copy link
Contributor

@HertzDevil HertzDevil commented Jun 1, 2022

Resolves #10157.

# ```
#
# See also: `#+(other : Slice)`.
def self.join(slices : Indexable(Slice)) : Slice
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 I think we should also have an overload that accepts a splat. That way you can do Slice.join(slice1, slice2, slice3) which I think is going to be the most common scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a splat overload then Slice(Slice) becomes ambiguous:

def Slice.join(slices : Indexable(Slice)); end
def Slice.join(*slices : Slice); end

# first overload
Slice.join(Slice[Slice[1], Slice[2]]) # => Slice[1, 2]
Slice.join(Slice[Slice[1]])           # => Slice[1]

# second overload
Slice.join(Slice[Slice[1]], Slice[Slice[2]]) # => Slice[Slice[1], Slice[2]]
Slice.join(Slice[Slice[1]])                  # => Slice[Slice[1]]

Neither overload is more specific than the other, so Slice.join(Slice[Slice[1]])'s return value depends on the definition order.

Maybe we can require the splat overload to take 2+ arguments, i.e. (slice : Slice, *slices : Slices)? Note that if someone has a Tuple of Slices already then they don't need to explode the Tuple.

(Actually this reasoning applies to some other methods too, such as Object#in?.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither overload is more specific than the other

Is that the case?

I just tried this:

def foo(x)
  puts "no splat"
end

def foo(*x)
  puts "yes splat"
end


foo(1)
foo(1, 2)

It prints "no splat" and "yes splat". Then I tried this:

def foo(*x)
  puts "yes splat"
end

def foo(x)
  puts "no splat"
end

foo(1)
foo(1, 2)

Same output.

So a no-splat overload is more specific than a splat overload. I don't think there's any ambiguity, because if you do Slice.join(...) you would only want the splat version when there's more than one element in that list. Which... I guess means we could also make it explicit with (slice, *slices).

Copy link
Contributor Author

@HertzDevil HertzDevil Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If neither x parameter has a restriction, then yes, the splat-less overload comes first. In fact, the restrictions given above also make the splat-less overload come first, for now:

def foo(slices : Indexable(Slice)); "indexable"; end
def foo(*slices : Slice); "splat"; end

def bar(*slices : Slice); "splat"; end
def bar(slices : Indexable(Slice)); "indexable"; end

slice = Slice[Slice.new(1) { 1 }]
foo(slice) # => "indexable"
bar(slice) # => "indexable"

But if #10711 is merged and -Dpreview_overload_order is defined, then:

  • The two slices parameters are corresponding parameters, because they both match the first positional argument in any valid call;
  • Indexable(Slice) < Slice is false;
  • Slice < Indexable(Slice) is also false;
  • Therefore, neither overload is stricter than the other, because at this point the splat has no effect.

Thus bar(slice) will output "splat" in the future. This is why I believe the splat overload should take 2+ parameters so that the two overloads become disjoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2+ parameter solution would lead to surprising results when the number of arguments comes from indirection (e.g. a macro or splat). The behaviour would change significantly when there is only a single argument.

Copy link
Contributor Author

@HertzDevil HertzDevil Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a splat is unnecessary because the Indexable overload takes care of Tuple arguments already. Macros are probably better off wrapping all arguments to Slice.join and similar methods within a Tuple.new call to make the intent explicit:

macro bad(*x)
  ::Slice.join({{ x.splat }})
end

macro good(*x)
  ::Slice.join(::Tuple.new({{ x.splat }}))
end

The 2+ requirement signals the fact that the splat overload is user-facing, rather than generated code-facing. And then even in user-written code, switching from the splat overload to the Indexable overload is a matter of adding a pair of {} (and maybe () because the syntax requires it) around the arguments; the performance is the same whether the Tuple formation is done by the language or the user. I think we should eventually clean up splat overloads like those in the standard library.

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling for read_only is missing. I think as soon as one of the slices is read_only, the resulting slice should be read_only as well.

@HertzDevil
Copy link
Contributor Author

The new Slice never reuses existing memory, so it doesn't need to respect read_only.

@straight-shoota
Copy link
Member

Hm, if read_only is only used to protect memory, yes then we can ignore it.
It might have other use cases though? Not sure...

@yxhuvud
Copy link
Contributor

yxhuvud commented Jun 1, 2022

I could see an argument that read_only status would be up to the join method to decide, ie, it should be an extra named parameter with a default value (false).

Comment on lines +1559 to +1561
{% elsif T < Slice && @type < Indexable %}
# optimize for slice
Slice.join(self)
Copy link
Contributor

@yxhuvud yxhuvud Jun 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is really surprising behavior for me, but given that it is done for arrays I guess it make sense. Where does support for Array come from? The optimization comes from c2c4d5d but I do wonder if it was intentional that it worked as a synonym of flatten(1) in the first place. (it gives a typeerror in ruby, for reference)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default argument of Ruby's Enumerable#sum is 0; Ruby does not have the notion of .additive_identity or try to infer it. This works:

[[1, 2], [3, 4]].sum([]) # => [1, 2, 3, 4]

only that there is probably no optimization here.

@jgaskins
Copy link
Contributor

jgaskins commented Jun 2, 2022

Hm, if read_only is only used to protect memory, yes then we can ignore it.

@straight-shoota That seems to be true. The only use case I recall seeing for this is to support the contract of strings being immutable, even if you end up trying to do things like my_string.to_slice[0] = '?' and IO::Memory.new(my_string.to_slice) << "lol". The only other occurrence I can find by actively searching in the Crystal codebase is to keep a constant from being mutated.

I do like @yxhuvud's suggestion of letting the caller of Slice.join decide whether the resulting slice is read-only.

@HertzDevil
Copy link
Contributor Author

Just realized we can apply the same treatment to other non-resizable Indexables too. BitArray should just work. StaticArray would require the number of objects to be known (i.e. only the splat overload makes sense, not the Indexable one). Tuple too, but in this case one can already write Tuple.new(*a, *b, ...) or {*a, *b, ...}.

@straight-shoota
Copy link
Member

straight-shoota commented Jan 21, 2023

I'm not entirely sure what's left to do here.

Adding read_only parameter to .join would be good, but I suppose we could also defer it.

Similar implementations for other types should better be handled on their own.

@HertzDevil
Copy link
Contributor Author

HertzDevil commented Feb 13, 2023

I did not add a read_only parameter because it would be redundant if we decide to go with #12133. (Same can be said about #map and #map_with_index.) And for the class method, again Indexable takes care of Tuples already and it doesn't take much effort to group all the arguments into a tuple.

In both cases, adding the feature later wouldn't constitute a breaking change.

@straight-shoota straight-shoota added this to the 1.8.0 milestone Feb 13, 2023
@straight-shoota straight-shoota merged commit 5634583 into crystal-lang:master Feb 14, 2023
@HertzDevil HertzDevil deleted the feature/slice-join branch February 15, 2023 11:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add concatenation operations to Slice
5 participants