Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove dependency on CategoricalArrays.jl in legacy show #2427

Merged
merged 8 commits into from
Oct 13, 2020

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Sep 12, 2020

Using reflection is the best what could come up with unless there is a solution to https://stackoverflow.com/questions/63860020/how-to-get-a-list-of-modules-imported-by-a-given-module (which seems unlikely - I have checked the source code, and it seems that this part of the functionality is not exposed to Julia code and is only available in C).

The approach is assuming that if something is pretending to be CategoricalArrays.jl we can just treat it in display as if it were CategoricalArrays.jl (and in general such pretending is unlikely).

@bkamins bkamins added ecosystem Issues in DataFrames.jl ecosystem non-breaking The proposed change is not breaking labels Sep 12, 2020
# the T is an instance of CategoricalValue from CategoricalArrays.jl
if hasproperty(T, :name) &&
nameof(T.name.module) === :CategoricalArrays &&
T.name.name === :CategoricalValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not care if the T.parameters[1] <: AbstractString here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bacause we are replacing the behaviour for CategoricalValue{<:AbstractString} defined earlier in line https://github.com/JuliaData/DataFrames.jl/pull/2427/files#diff-38a6c4f0cb7743ccc6d4e8470ebdbd11L69. I just want to ensure I am matching functionality 1-2-1 exactly.

Copy link
Member

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; hacky, but whatever, it does the job. The alternative route is we put something in DataAPI.jl like iscategorical(x) or something that CategoricalArrays could overload.

@bkamins
Copy link
Member Author

bkamins commented Sep 13, 2020

The alternative route is we put something in DataAPI.jl like iscategorical(x) or something that CategoricalArrays could overload.

I was also thinking about it, but I think it is best for @nalimilan to decide what is best. In particular if we have iscategorical in DataAPI.jl we probably should handle both: CategoricalValue (for use cases like here) and CategoricalArray (to allow e.g. using it in /src/dataframerow/utils.jl to fire a fast path for CategoricalVector without having to change the code much - AFAICT @nalimilan is working on it now).

@nalimilan
Copy link
Member

Mmm, this is definitely very hacky... I see two other possibly clearner solutions:

  • use an IOContext property that CategoricalValue would dected
  • use print everywhere

I wonder what the drawbacks of using print would be. Besides omitted quotes when printing string and : when printing symbols, it also represents Date("2020-09-13") as 2020-09-13 and Day(1) as 1 day. Maybe that's what we're actually looking for? It's not clear why we do something special for strings but not for other types. What do you think?

@bkamins
Copy link
Member Author

bkamins commented Sep 13, 2020

If we are at it I would actually prefer to switch to print if everyone is OK with this (I thought there was some reason to use show but maybe there was none). PrettyTables.jl uses print.

If we use print we do not need to be hacky here.

@bkamins
Copy link
Member Author

bkamins commented Sep 13, 2020

OK - I have switched show to print. I think we are OK to do this as we show eltype in the header (so e.g. not escaping Char is OK as you see in the changed tests).

@rofinn - could you please have a look at it as you use print in PrettyTables.jl

@bkamins
Copy link
Member Author

bkamins commented Sep 13, 2020

CC @ronisbr not @rofinn - sorry for a typo.

@bkamins
Copy link
Member Author

bkamins commented Sep 14, 2020

TODO. Check what is printed with this PR if you call:

julia> DataFrame(x=[true,false], y=[0x1, 0x2])
2×2 DataFrame
│ Row │ x    │ y     │
│     │ Bool │ UInt8 │
├─────┼──────┼───────┤
│ 1   │ 1    │ 0x01  │
│ 2   │ 0    │ 0x02  │

(this is the output from legacy code)

@matthieugomez
Copy link
Contributor

matthieugomez commented Oct 5, 2020

Stata prints strings in a dataset without quotes but, for clarity, it also uses a different color (red). Maybe that would be useful to do the same.

@bkamins
Copy link
Member Author

bkamins commented Oct 5, 2020

CC @ronisbr, as this PR will not get merged. #2429 is the way to go, and when that is merged I will close this. I keep it open only to make sure #2429 correctly handles CategoricalArrays.jl without introducing a dependency.

@bkamins
Copy link
Member Author

bkamins commented Oct 11, 2020

@nalimilan: given the discussion with @ronisbr we should merge this PR anyway (we need this to get rid of CategoricalArrays.jl dependency for HTML and LaTeX even if we start using PrettyTables.jl for text/plain)

if T <: CategoricalValue
# This is only type display shortening so we
# are OK with any T whose name starts with Categorical here
if startswith(sT, "Categorical")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need to handle CategoricalValue, right. so maybe better strictly check for "CategoricalValue"? Abbreviating other types could be weird.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I have written it when CategoricalString was present.

`truncstring` indicates the approximate number of text characters width to truncate
the output (if it is a non-positive value then no truncation is applied).
"""
function ourshow(io::IO, x::Any, truncstring::Int; styled::Bool=false)
function ourshow(io::IO, x::T, truncstring::Int; styled::Bool=false) where T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

T isn't used right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah - yes; it was earlier, but we ended up not using it. Fixing.

bkamins and others added 2 commits October 13, 2020 13:09
@bkamins
Copy link
Member Author

bkamins commented Oct 13, 2020

Now I remember why I used "Categorical" not "CategoricalValue" 😄. It is because CategoricalArrays.jl might not be in scope in which case CategoricalArray.CategoricalValue is eltype name. I have added || now to handle this case.

@bkamins bkamins merged commit f507944 into JuliaData:master Oct 13, 2020
@bkamins bkamins deleted the categorical_show branch October 13, 2020 15:41
@bkamins
Copy link
Member Author

bkamins commented Oct 13, 2020

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ecosystem Issues in DataFrames.jl ecosystem non-breaking The proposed change is not breaking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants