Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate DataArrays in favor of NullableArrays #994

Closed
quinnj opened this issue Jun 11, 2016 · 9 comments
Closed

Deprecate DataArrays in favor of NullableArrays #994

quinnj opened this issue Jun 11, 2016 · 9 comments

Comments

@quinnj
Copy link
Member

quinnj commented Jun 11, 2016

I believe NullableArrays has reached a level of maturity where we should seriously consider defaulting to them instead of DataArrays.

Unfortunately, I'm not familiar with the far-reaching ramifications of the switch, so my naive suggestion is to swap it out and see what breaks and start fixing. I'd appreciate some more mature insight on what the potential breakage/things to deal with are. AFAIK, DataFrames should be fairly robust since we've already abstracted DataArrays out on its own.

My long-term plans are to deprecate Data.Table in the DataStreams.jl package in favor of supporting returning a DataFrame as the defactor in-memory Julia table structure in the DataStreams ecosystem (CSV, ODBC, SQLite, Feather).

I'm happy to push forward on this if there aren't any red flags/major roadblocks.

@nalimilan
Copy link
Member

I have done some work already, I'll push a branch soon. I think it's indeed not that complicated, as you said. I think one of the things we absolutely need to do in time for 0.5 is JuliaStats/NullableArrays.jl#111. I'd certainly appreciate some help.

Cf. JuliaStats/DataArrays.jl#73.

@nalimilan
Copy link
Member

I've pushed the current state of my work to the nl/nullable branch. So far I've got more than half of the tests passing, with quite a bunch of FIXMEs and of temporary definitions of methods to work around lacking features in Base, NullableArrays and CategoricalArrays. This work is very interesting to reveal areas which need improvement for NullableArray to be really an acceptable replacement to DataArrays.

That's really a rough port at this point, but I think it shows we're not that far.

@nalimilan
Copy link
Member

Good news: tests all pass now, except those which are already broken on master. The code now needs a lot of cleanup, in particular we need to implement the missing features in NullableArrays. I'll make a detailed list soon, but you can start having a look at the branch to see how it feels.

@davidagold
Copy link

@nalimilan it seems as though there are some methods that ought to go into Base, too --- e.g. the == and != operators you define in DataFrames.jl --- right?

@nalimilan
Copy link
Member

Yes. Most of these should go into Base, and I'll move the most hacky to other/utils.jl. I wonder about isnull(x::Any) = false and get(x::Any) = x: looks like they can be useful in many cases, but defining these methods in Base may not be a a good idea as it can easily hide bugs. It would be nice to find a way to explicitly require this behavior, but I'm not sure how.

@davidagold
Copy link

I agree that they should be useful in many cases. They also seem sensible. What sort of bugs do you think they would hide? Just unexpected behavior?

@nalimilan
Copy link
Member

Not sure, maybe I'm paranoid.
@johnmyleswhite I guess you had a reason for not implementing them in the first place?

@johnmyleswhite
Copy link
Contributor

My original goal in not defining them was to avoid encouraging people to think about isnull as the universal predicate they should apply when they probably ought to think about types first. In particular, isnull(x::Any) is literally identical to isa(x, Nullable) && isnull(x::Nullable), so I figured I'd skip it. I'm not strongly opposed to adding them, but I feel like most people will end up using them in bad ways.

@nalimilan
Copy link
Member

The branch has been merged now (#1008).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants