select!(df, Not(tuple)) does not work #1997

Closed
tlienart opened this issue Oct 30, 2019 · 8 comments · Fixed by #2011

Comments

@tlienart

The issue is pretty simple:

unwanted = (:col1, :col2, :col3)
select!(df, Not(unwanted))

will fail (getindex error)

unwanted = [:col1, :col2, :col3]
select!(df, Not(unwanted))

will work fine.

Maybe the behaviour of the second example (the vector) could be generalised to any iterable?
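
As a possible workaround for the tuple case above, the tuple can be converted to a vector before it is wrapped in Not. This is a minimal sketch, not an official recommendation from the thread; the data frame contents and the `keep` column are hypothetical and only there to make the example runnable.

```julia
using DataFrames

# Hypothetical example data; column names follow the post above.
df = DataFrame(col1 = 1:2, col2 = 3:4, col3 = 5:6, keep = 7:8)

unwanted = (:col1, :col2, :col3)

# collect turns the tuple into a Vector{Symbol},
# which is the form reported above to work inside Not.
select!(df, Not(collect(unwanted)))

remaining = string.(names(df))
```

After this call only the `keep` column should be left in `df`.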

@bkamins
Member

bkamins commented Oct 30, 2019

This is intentional. Note that for e.g. arrays you have the same:

julia> using InvertedIndices

julia> x = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> x[Not(1)]
2-element Array{Int64,1}:
 2
 3

julia> x[Not((1,2))]
ERROR: ArgumentError: invalid index: (1, 2) of type Tuple{Int64,Int64}
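
The REPL session above can be complemented by a small sketch showing that the tuple is rejected by ordinary array indexing as well, so the error does not originate in Not itself. The variable names `tuple_rejected` and `kept` are illustrative only.

```julia
using InvertedIndices

x = [1, 2, 3]

# Plain indexing with a tuple already fails, independent of Not.
tuple_rejected = try
    x[(1, 2)]
    false
catch err
    err isa ArgumentError
end

# A vector of indices is a valid index, so it is also valid inside Not.
kept = x[Not([1, 2])]
```

Here `tuple_rejected` ends up `true`, and `kept` holds the single remaining element.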

@tlienart
Author

OK, then it's fine, and thanks for the explanation. It was the first time I was trying to drop columns; I was using delete!, which prompted me to use this Not syntax that I didn't know.

@Sinansi

Sinansi commented Nov 11, 2019

Thank you so much! The documentation does not mention that tuples are not allowed.

I spent two hours trying to figure out how to remove a column from a data frame.
I am not sure why such a simple task has to be so complicated.

I urge the developers to use column labels more often than column indices.
That is the purpose of dataframes. Indices are meant more for Arrays.

Removing a column by index is very dangerous.
If a table schema has changed, you will remove the wrong columns.
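
The concern about positional indices can be illustrated with a small sketch. The table, its column names, and the added `flag` column are hypothetical; the point is only that a positional selector and a name selector can diverge once the schema changes.

```julia
using DataFrames

# Hypothetical table after a schema change: a new column :flag
# was added in front, shifting every other column one position.
df_new = DataFrame(flag = [true, false], id = 1:2, tmp = 3:4, value = 5:6)

# Written when :tmp was the second column; it now drops :id instead.
by_index = select(df_new, Not(2))

# Selecting by name still drops the intended column.
by_name = select(df_new, Not(:tmp))

index_cols = string.(names(by_index))
name_cols = string.(names(by_name))
```

With this data, `by_index` silently loses `id`, while `by_name` loses `tmp` as intended.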

Additionally, using select instead of delete is also confusing.
Please, consider bringing back delete (with column names).

Thank you!

@bkamins
Member

bkamins commented Nov 11, 2019

> The documentation does not mention that tuples are not allowed.

Tuples are not allowed as indices in Julia in general. If you feel this should be mentioned somewhere in the DataFrames.jl documentation, please feel free to make a PR.

> Removing a column by index is very dangerous.

Agreed. We simply support both options, as sometimes it is more convenient to use a number.

> Additionally, using select instead of delete is also confusing.

That is why we specifically give an example of it in the Getting Started part of the manual (supposedly the first thing one reads when learning DataFrames.jl). You can find it here.

I will add these examples also to docstrings of select and select! to make sure it is clear.
Note that this pattern of dropping columns is standard in e.g. dplyr.

@nalimilan - do you think restoring deletecols! and deletecols is justified? (I think having select/select! and getindex do this job is enough.)
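
For reference, the column-dropping pattern discussed in this comment can be sketched as follows. The data frame is hypothetical, and this is not the exact text that was added to the docstrings; it only shows the three forms mentioned in the thread (select, select!, and indexing with Not).

```julia
using DataFrames

# Hypothetical data for illustration.
df = DataFrame(a = 1:3, b = 4:6, c = 7:9)

# Non-mutating: returns a new data frame without column :b.
df2 = select(df, Not(:b))

# Indexing form of the same operation; also returns a new data frame.
df3 = df[:, Not(:b)]

# Mutating: drops :b and :c from df itself, selecting by name.
select!(df, Not([:b, :c]))

cols_df2 = string.(names(df2))
cols_df3 = string.(names(df3))
cols_df = string.(names(df))
```

The first two forms leave `df` untouched; only `select!` changes it in place.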

@Sinansi

Sinansi commented Nov 11, 2019

Thanks bkamins for the timely and caring reply.
I find your reply highly informative.

I am new to Julia (switching from Python).
I will go through the link you provided.

Thanks again!

@bkamins
Member

bkamins commented Nov 11, 2019

The link only has the df[:, Not(:col)] example, so I will also add an example using select, as noted above.

@Sinansi

Sinansi commented Nov 11, 2019

Yes, please!

@bkamins
Member

bkamins commented Nov 11, 2019

See #2011 (please leave a comment if something is not clear)


3 participants