assigning [ ] for array replacement #46903

Closed

udohjeremiah
Contributor

I came from a language like Python, where there was little talk of "performance", so it's very common, and I mean SO COMMON, for AI and DS experts to write code like:

>>> data = [1, 2, 3, 4]
>>> data
[1, 2, 3, 4]
>>> data[2] = []
>>> data
[1, 2, [], 4]
>>> data[2].append(10); data[2].append(20)
>>> data
[1, 2, [10, 20], 4]

As seen above, assigning [] to an index of a list "replaces" the element at that index.

The docstring for = pointed out several things I usually do in Python, so I learned right away that they don't work in Julia. Some examples are:

help?> =

Assignment at out-of-bounds indices does not grow a collection. If the
collection is a Vector it can instead be grown with push! or append!.

Assigning [] does not eliminate elements from a collection; instead use
filter!.

Julia's user survey shows that a lot of users come from Python (me included), so it's best not to assume that no one would want to do something like this: assigning [] to replace an array element. So it should be added to the docstring as well, with a note that it shouldn't be used for performance reasons. That way Python users learn this first-hand and avoid it, rather than having to guess.

udohjeremiah added the domain:docs (This change adds or pertains to documentation) label Sep 25, 2022
@bramtayl
Contributor

bramtayl commented Sep 25, 2022

It seems like in python assigning [] doesn't eliminate elements in a collection either, but replaces them too, like your example shows:

>>> data[2] = []
>>> data
[1, 2, [], 4]

I'm surmising from the original wording that there is some language where

a = [1, 2, 3, 4]
a[3] = []
a == [1, 2, 4]

And I don't see any reason why you shouldn't do this in Julia either:

a = Union{Int, Vector{Int}}[1, 2, 3, 4]
a[3] = Int[]

I also don't understand why you should use replace() or filter! in Julia instead. deleteat! might be a better alternative for removing an element.
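To make the replace-versus-eliminate distinction concrete on the Python side, here is a minimal sketch (the variable names are illustrative only). Index assignment with [] keeps the slot and stores an empty list in it, while slice assignment with [] or del actually removes the slot.

```python
# Index assignment: the slot is kept and now holds an empty list.
replaced = [1, 2, 3, 4]
replaced[2] = []

# Slice assignment with an empty list: the slot itself is removed.
sliced = [1, 2, 3, 4]
sliced[2:3] = []

# del removes the element as well.
deleted = [1, 2, 3, 4]
del deleted[2]

print(replaced)  # [1, 2, [], 4]
print(sliced)    # [1, 2, 4]
print(deleted)   # [1, 2, 4]
```

So even in Python, "assigning []" only eliminates elements when the left-hand side is a slice.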

@udohjeremiah
Contributor Author

Yeah, that's what I'm thinking. The "eliminate" wording is for some language that behaves as you've described above, and I feel it would be good to add the "replace" part so that Python users are aware as well.

@udohjeremiah
Contributor Author

udohjeremiah commented Sep 29, 2022

This PR has been downvoted by many, and I'm confused as to why. It would be nice if those who downvoted left a review explaining what is wrong, so that I can possibly learn something new about Julia, rather than just downvoting without giving a reason.

Moreover, this PR seeks to address the fact that the behaviour Python users are most familiar with should not be hidden as if it doesn't exist. Rather, it should be shown and then explicitly called out as wrong. Many people coming from Python write Julia code this way (I've seen it, and that's why I opened the PR) and then say Julia's performance is bad. Why not address the issue explicitly, so that such behaviour is avoided?

@KristofferC
Sponsor Member

The original motivation and the current addition to the docstring don't make sense to me.

so its very common and I mean SO COMMON, a lot of AI and DS experts write codes like:

It seems to just be a vector containing vectors. I don't really see what is Python-specific about it.

As seen above by assigning [] to an array, one can "replace" an element.

You can do exactly the same "replace"ment in Julia (if the array type is wide enough).

And the two parts in the docstring:

Assigning [] does not replace or eliminate elements from a collection

and

replacing an element with [] should be avoided for performance reasons.

also come off as strange because they more or less contradict each other.

I think the addition here is likely to cause more confusion than it resolves.

@udohjeremiah
Contributor Author

udohjeremiah commented Sep 29, 2022

@KristofferC I don't think you understood the PR. The docstring of = addresses some behaviours that are common in other languages but won't work in Julia.

I pointed out this replacement behaviour because it's SO common in Python, so it would be best for the = docstring to address it as well. That is:

>>> data = [1, 2, 3]
>>> data[2] = []
>>> data
[1, 2, []]

fails in Julia:

julia> data = [1, 2, 3];

julia> data[2] = []
ERROR: MethodError: Cannot `convert` an object of type Vector{Any} to an object of type Int64

I might not have gotten the wording right, as you've pointed out (a suggestion would help), but I don't see any meaningful reason to leave out this replacement case when others like "assignment at out-of-bounds indices does not grow a collection" are addressed:

>>> data = [1, 2, 3]
>>> data[10:10] = [4]
>>> data
[1, 2, 3, 4]

julia> data = [1, 2, 3];

julia> data[10:10] = [4]
ERROR: BoundsError: attempt to access 3-element Vector{Int64} at index [10:10]
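One detail worth noting about the Python side of this comparison: only slice assignment grows the list. A minimal sketch, with the flag name `index_error_raised` introduced here purely for illustration:

```python
data = [1, 2, 3]

# Slice assignment clamps out-of-range bounds to the end of the list,
# so this behaves like an append.
data[10:10] = [4]

# Plain index assignment does not clamp; it raises IndexError.
try:
    data[10] = 5
    index_error_raised = False
except IndexError:
    index_error_raised = True

print(data)                # [1, 2, 3, 4]
print(index_error_raised)  # True
```

In other words, Python also refuses to grow a list through an out-of-bounds scalar index, which is the closer analogue of the Julia BoundsError shown above.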

@KristofferC
Sponsor Member

KristofferC commented Sep 30, 2022

>>> data = [1, 2, 3]
>>> data[2] = []
>>> data
[1, 2, []]

Why would you do this in Python instead of remove or del? Do you have some examples where this is used in Python since you say it is so common?

@udohjeremiah
Contributor Author

The main idea behind that is not "removing" or "deleting". It's used a lot when you want a list of lists, i.e. a nested list, but don't want to go through the hassle of building it with a function.

The example below is a contrived but realistic, naive situation, where you want a nested list in which each inner list holds the multiples of its first element. Notice where [] is used for replacement:

>>> multiples = [2, [3, 6, 9], 4, [5, 10]]
>>> for i in range(len(multiples)):
...     if type(multiples[i]) != list:
...         x, multiples[i] = multiples[i], []
...     if len(multiples[i]) > 0:
...         x = multiples[i][0]
...     for j in range(len(multiples[i]), len(multiples)):
...         multiples[i].append(x * (j+1))
...
>>> multiples
[[2, 4, 6, 8], [3, 6, 9, 12], [4, 8, 12, 16], [5, 10, 15, 20]]

So as you can see, it's used a lot with nested lists. I don't want to go into more detail or bigger examples, since that would make the PR too verbose or misleading.
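For reference, the same result can be produced without mutating a slot to [] first. The following is only a sketch: `fill_multiples` is a hypothetical helper, and it assumes that every existing inner list is non-empty and already holds the leading multiples of its first element, as in the example above.

```python
def fill_multiples(rows):
    """Hypothetical helper: expand each entry into the first len(rows)
    multiples of its base value (the entry itself, or its first element)."""
    n = len(rows)
    result = []
    for entry in rows:
        # A bare number is its own base; a list uses its first element.
        base = entry[0] if isinstance(entry, list) else entry
        result.append([base * k for k in range(1, n + 1)])
    return result


multiples = fill_multiples([2, [3, 6, 9], 4, [5, 10]])
print(multiples)
# [[2, 4, 6, 8], [3, 6, 9, 12], [4, 8, 12, 16], [5, 10, 15, 20]]
```

Because it builds a new list of lists, every row has the same type from the start, which is the shape that maps most directly onto a typed container.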

@KristofferC
Sponsor Member

You would pretty much do the same thing for lists of lists in Julia. You could also empty! the list but maybe it is aliased to something else. It isn't clear to me how using replace for this would be natural.

Maybe if you write that code in Julia as well, it would be more clear what the difference is.

@udohjeremiah
Contributor Author

julia> multiples = [2, [3, 6, 9], 4, [5, 10]];

julia> for i in 1:length(multiples)
           if !(typeof(multiples[i]) <: AbstractVector)
               x, multiples = multiples[i], replace(multiples, multiples[i]=>[])
           end
           if length(multiples[i]) > 0
               x = multiples[i][1]
           end
           for j in length(multiples[i]):length(multiples)-1
               append!(multiples[i], x * (j+1))
           end
       end

julia> multiples
4-element Vector{Any}:
 Any[2, 4, 6, 8]
 [3, 6, 9, 12]
 Any[4, 8, 12, 16]
 [5, 10, 15, 20]

See! The only major difference comes from:

Python: x, multiples[i] = multiples[i], []
Julia: x, multiples = multiples[i], replace(multiples, multiples[i]=>[])

I wrote the code as close to the original as I could; the other differences come from 0-based versus 1-based indexing. The major difference is the = [] that Python allows versus the = replace(multiples, multiples[i]=>[]) that Julia uses.

@KristofferC
Sponsor Member

KristofferC commented Sep 30, 2022

x, multiples[i] = multiples[i], []

This works fine in Julia as well though. Why would replace be any better here (which means you have to loop over the array and look for the element)?
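The cost difference raised here can be sketched in Python. `replace_value` below is a hypothetical stand-in for a value-based replacement, not any library's API: it has to scan the whole list and it changes every matching element, whereas index assignment touches exactly one slot.

```python
def replace_value(items, old, make_new):
    """Hypothetical helper: return a copy of `items` in which every element
    equal to `old` is replaced by a fresh value from `make_new()`."""
    return [make_new() if item == old else item for item in items]


data = [2, 3, 2, 5]

# Index-based: one slot changes, no search is needed.
by_index = list(data)
by_index[0] = []

# Value-based: the whole list is scanned and every 2 is replaced.
by_value = replace_value(data, 2, list)

print(by_index)  # [[], 3, 2, 5]
print(by_value)  # [[], 3, [], 5]
```

When the position is already known, as it is inside the loop over i, the index form is both cheaper and more precise.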

@udohjeremiah
Contributor Author

udohjeremiah commented Sep 30, 2022

It doesn't work. GENERALLY it doesn't work, the example I gave you is a special case because the type of the vector is Any. See below what I am trying to address with this PR as you keep missing it.

>>> data = [1, 2, 3, 4]
>>> data
[1, 2, 3, 4]
>>> data[2] = []
>>> data
[1, 2, [], 4]

julia> data = [1, 2, 3, 4];

julia> data
4-element Vector{Int64}:
 1
 2
 3
 4

julia> data[2] = []
ERROR: MethodError: Cannot `convert` an object of type Vector{Any} to an object of type Int64

julia> data = replace(data, data[2]=>[]);

julia> data
4-element Vector{Any}:
 1
 Any[]
 3
 4

@KristofferC
Sponsor Member

That's just because Python arrays are always of type Any. And as you said, you would only do something like this if you have a list of lists where you want to reinitialize one of the inner lists, not if you have a list of numbers.
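The same contrast can be reproduced inside Python itself, which may help Python readers see that this is about the element type rather than about the language. A minimal sketch using the standard-library `array` module as a rough analogue of a typed vector (the analogy is an assumption made for illustration):

```python
from array import array

# A list accepts any object, so a slot can hold an int or a list.
untyped = [1, 2, 3, 4]
untyped[1] = []

# A typed array of C ints rejects a list, much like a typed vector would.
typed = array("i", [1, 2, 3, 4])
try:
    typed[1] = []
    rejected = False
except TypeError:
    rejected = True

print(untyped)   # [1, [], 3, 4]
print(rejected)  # True
```

The typed container stays unchanged after the failed assignment.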

@udohjeremiah
Contributor Author

It's not always a list of lists. Most of the time, the list starts out as a list of numbers and grows into a list of lists.

@KristofferC
Sponsor Member

I still think a reader is likely to be more confused with this bit included than without it (and there seems to be general agreement on this).
