Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix unique() behaviour, add unique!() #358

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

alyst
Copy link
Contributor

@alyst alyst commented Jun 23, 2021

A draft of a solution for #357, this PR renames unique(::CategoricalArray) to uniquelevels(::CategoricalArray) as it better reflects what this method does
This PR restores the Base.unique() behavior -- makes unique(x::CategoricalArray) return the categorical array of the same type as x and containing the unique values of x in the order of their first appearance.
The internal implementation of unique() is cleaned up and optimized a bit (avoid intermediate array sorting, generalize to AbstractCategoricalArray).

The PR also adds Base.unique!(x::CategoricalArray).

The new unique() is not yet covered by unit tests. I'll do that if the overall approach would be considered ok.

If you need the old unique(x::CategoricalArray) behavior, use unwrap.(unique(x)).
As this is backward-incompatible change, the new minor version would be required.

Needs to be rebased once #359 lands.

Fixes #357, #129.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this then. Unfortunately this is breaking and we released 0.10 recently, so I guess the PR will have to remain unmerged for a while. But better finish it anyway.

src/array.jl Outdated Show resolved Hide resolved
src/array.jl Outdated Show resolved Hide resolved
src/array.jl Outdated Show resolved Hide resolved
src/array.jl Show resolved Hide resolved
src/array.jl Outdated Show resolved Hide resolved
src/array.jl Outdated Show resolved Hide resolved
@ablaom
Copy link

ablaom commented Jul 13, 2021

This does sound like a good solution.

@alyst alyst marked this pull request as ready for review July 27, 2021 13:24
@alyst alyst changed the title Add uniquelevels(), fix unique() behaviour fix unique() behaviour, add unique!() Jul 27, 2021
@alyst
Copy link
Contributor Author

alyst commented Aug 13, 2021

gentle bump

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just waiting for #359 (and then for the next breaking release).

src/CategoricalArrays.jl Outdated Show resolved Hide resolved
src/array.jl Outdated
@@ -854,7 +841,15 @@ end
Drop levels which do not appear in categorical array `A` (so that they will no longer be
returned by [`levels`](@ref DataAPI.levels)).
"""
droplevels!(A::CategoricalArray) = levels!(A, intersect(levels(A), uniquelevels(A)))
function droplevels!(A::CategoricalArray)
# FIXME temporary code until PR#359 is merged
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note so that we don't forget this before merging (x-ref #359).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just rebased the PR against the updated droplevels!() and squashed the commits, so it should be ready for merging (when it's time for a new minor release).

so it conforms to the semantics of the Base.unique()

This is a breaking change that requires a new minor release.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unique() returns levels, not the CategoricalArray
3 participants