Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

require AbstractVector from subset selectors #2744

Merged
merged 6 commits into from
May 4, 2021
Merged

require AbstractVector from subset selectors #2744

merged 6 commits into from
May 4, 2021

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented May 2, 2021

Fixes #2740

It was easier than I thought in the end. I have not implemented the ByGroup idea (as one can use filter for this), but we can discuss it here.

Some other minor things I have changed:

  • add information what was changed in patch releases to NEWS.md
  • separate out test/subset.jl as it was in test/grouping.jl which is super big anyway
  • bump version to 1.0.3

@bkamins bkamins added the bug label May 2, 2021
@bkamins
Copy link
Member Author

bkamins commented May 2, 2021

I classify this PR as a bug, since I understand we agree that the current behavior is unintended and confusing.

@bkamins bkamins added this to the patch milestone May 2, 2021
NEWS.md Outdated

* patch v1.0.3: make sure `subset` checks if the passed condition function
returns a vector of values
([#XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not this qualifies for a patch release as this may break existing workflows, right?

Suggested change
([#XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX))
([#2744](https://github.com/JuliaData/DataFrames.jl/pull/2744))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question is how we qualify this change. Since we added this feature in 1.0 release only I would prefer to say this is a bug (i.e. not intended functionality) so we patch it. Otherwise we should deprecate it and it could be changed only in 2.0 release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah in theory no breaking changes should be introduced until 2.0. In practice there are lots of grey areas like this one. Just like Julia allows "minor changes" across minor versions, I think we can do this kind of thing given that the risk of breakage is limited. But this PR doesn't fix any obviously incorrect behavior nor allows doing something that currently fails -- it's just an improvement to limit confusion. Better keep disruption in patch releases to the strict minimum IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in summary do we go for 1.1.0 or 1.0.3 release with this? I understand you feel 1.1.0 would be better. Right? I am OK with this (I guess this should not cause package update issues).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think 1.1.0 would be better. Not sure whether it's simpler to merge the PR and create a release-1.1 branch, or whether we should keep it open for a while.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am OK to switch to 1.1.0 but I would make the release as soon as possible. I do not want people using the functionality we want to remove.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I admit it's a bit weird to tag 1.1 so quickly, but in practice that's not a problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - it is weird, but it is not a problem so I would go for it (and explain it in my blog on Friday 😄).

src/abstractdataframe/subset.jl Outdated Show resolved Hide resolved
src/abstractdataframe/subset.jl Outdated Show resolved Hide resolved
NEWS.md Outdated Show resolved Hide resolved
src/abstractdataframe/subset.jl Outdated Show resolved Hide resolved
src/abstractdataframe/subset.jl Outdated Show resolved Hide resolved
src/abstractdataframe/subset.jl Outdated Show resolved Hide resolved
bkamins and others added 2 commits May 2, 2021 16:11
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
src/abstractdataframe/subset.jl Outdated Show resolved Hide resolved
@@ -77,7 +92,8 @@ Each argument passed in `args` can be either a single column selector or a
described for [`select`](@ref).

Note that as opposed to [`filter`](@ref) the `subset` function works on whole
columns (or all rows in groups for `GroupedDataFrame`).
columns and always makes a subset of rows (or all rows in groups for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "makes a subset of rows" mean? Also it's not mentioned for subset!.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is intended to mean that selection happens row-wise always (not whole groups) - this was the original reason why this PR is made. In particular this is a crucial difference between subset and filter for GroupedDataFrame (as the latter works per group). Could you please propose a better wording to convey this message?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Then maybe this?

works on whole columns (and selects rows within groups for `GroupedDataFrame`
rather than whole groups).

And I'd mention that the result should be a vector of Boolean in the first paragraph as that's really an essential information.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - added

src/abstractdataframe/subset.jl Outdated Show resolved Hide resolved
Comment on lines 200 to 203
If you want to filter rows of an `AbstractDataFrame` or groups of
`GroupedDataFrame` by a function that returns `Bool` value use [`filter!`](@ref)
or [`filter`](@ref) (note that [`filter!`](@ref) is not supported for
`GroupedDataFrame`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sync this with subset docstring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will sync once we finalize subset docstring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

bkamins and others added 2 commits May 2, 2021 20:23
NEWS.md Outdated
Comment on lines 3 to 4
* patch v1.0.3: make sure `subset` checks if the passed condition function
returns a vector of values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update this if bumping to 1.1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated and added more explanations. Fortunately the change we make will now throw an error so it will be easy to catch.

NEWS.md Outdated

* patch v1.0.3: make sure `subset` checks if the passed condition function
returns a vector of values
([#XXXX](https://github.com/JuliaData/DataFrames.jl/pull/XXXX))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I admit it's a bit weird to tag 1.1 so quickly, but in practice that's not a problem.

@bkamins bkamins merged commit c3083fc into main May 4, 2021
@bkamins bkamins deleted the restrict_subset branch May 4, 2021 20:12
@bkamins
Copy link
Member Author

bkamins commented May 4, 2021

Thank you! I will make a 1.1. release now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Potential confusion with subset
2 participants