Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove all RDA functionality from the package #1031

Merged
merged 4 commits into from
Aug 31, 2016
Merged

Conversation

ararslan
Copy link
Member

@ararslan ararslan commented Aug 10, 2016

NOTE: DO NOT MERGE UNTIL RDATA.JL IS REGISTERED! RData has been registered.

Farewell RDA, you served us well, but you'll be happier in your new home.

Fixes #955.

@nalimilan
Copy link
Member

Would there be a way to reexport RData.jl functions and print a deprecation warning when they are called via DataFrames.jl, but not after the user has called using RData? I'm afraid people will have a hard time discovering where the feature have gone.

@ararslan
Copy link
Member Author

ararslan commented Aug 16, 2016

I figured we'd do it in a similar manner to how prime functionality was removed from Base. That's what I did here (not really obvious in the large diff).

Edit: Sorry, wrong link: https://github.com/JuliaStats/DataFrames.jl/blob/702e3bd0556b30416d503ffae8bbd8c62048a7af/src/deprecated.jl#L20

@ararslan
Copy link
Member Author

If we continue to export read_rda here, we won't be able to have it exported in RData until after the deprecation period here. Maybe that's okay though, dunno.

@nalimilan
Copy link
Member

We could also @reexport that function, since the deprecation warning printed by RData.jl will give useful instructions.

@ararslan
Copy link
Member Author

That's a good idea.

@ararslan ararslan changed the title [WIP] Remove all RDA functionality from the package Remove all RDA functionality from the package Aug 21, 2016
@@ -1,18 +1,23 @@
using Base.depwarn
import Base: @deprecate, depwarn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import is not needed here as we don't extend these methods. Also, better have this in a separate commit.

@nalimilan
Copy link
Member

The failures on all platforms are worrying...

@ararslan
Copy link
Member Author

Indeed, though it appears to be caused by JuliaData/RData.jl#8. Once that's fixed, things should (hopefully) work as expected.


function read_rda(args...)
depwarn("read_rda is deprecated. Use the load function from the RData package.")
RData.load(args...)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should call FileIO.load(), because the file format-specific load() functions are not meant to be called directly from the user code (and they have special signatures), rather by FileIO.load() that delegates loading to specific package based on the current registry of formats. So the more accurate deprecation message would be something like:

`read_rda()` is deprecated. R data format support is moved to `RData` package. Use `load()` function.

It is also possible to remove circular dependency and replace RData with FileIO in REQUIRE. If read_rda() is called before using RData (i.e. ".rda" not registered), FileIO.load() will give an error about unsupported data type (or read_rda() could check whether ".rda" format is registered in FileIO and suggest the user to run using RData).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, but the message should be "use FileIO.load()".

As regards registering, you should make a PR against FileIO to associate RData with .rda files: https://github.com/JuliaIO/FileIO.jl/blob/master/src/registry.jl See also https://github.com/JuliaIO/FileIO.jl#adding-new-formats

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nalimilan add_format() is already called during RData initialization, but it's a good idea to register it in FileIO directly. In that regard, what extensions should we register? ".rda" and ".rdata"?

@dmbates
Copy link
Contributor

dmbates commented Aug 24, 2016

@alyst It is more common to use the .RData extension than .rdata.

@nalimilan
Copy link
Member

Better register .rda, .RData and .rdata. (The Freedesktop MIME database has the first and the last one, which also matches the second one as a fallback.)

@alyst
Copy link
Contributor

alyst commented Aug 26, 2016

FileIO v0.1.2 knows about .rda&co out of the box (JuliaIO/FileIO.jl#75), so it should be added to the DataFrame requirements instead of RData and read_rda() should call FileIO.load().
RData v0.0.4 supports the updated FileIO (JuliaData/RData.jl#15).

@ararslan
Copy link
Member Author

The nightly OS X failure seems to be Travis' fault but I restarted that build just in case. That aside, is this good to go?

@ararslan
Copy link
Member Author

ararslan commented Aug 30, 2016

Everything is nice and green now; the failure was indeed unrelated. If no one objects, I'll go ahead and merge this a little later. Just realized there are a couple of comments I missed. Will get to those later today. Sorry, confused myself. I did everything I had intended to do.

@ararslan ararslan merged commit d2987d1 into master Aug 31, 2016
@ararslan ararslan deleted the aa/goodbye_rda branch August 31, 2016 16:58
@nalimilan
Copy link
Member

Good to see tests green again!

@ararslan
Copy link
Member Author

ararslan commented Sep 5, 2016

It looks like the deprecation may not have been done properly, or else RDatasets wouldn't be broken as in #1054. (Also ref the fix in JuliaStats/RDatasets.jl#43) @nalimilan, any idea?

@andreasnoack
Copy link
Member

I just hit this and I open a PR soon but adding a test was a little harder than expected.

@tlnagy
Copy link

tlnagy commented Sep 5, 2016

I don't know where would be the best place to bring this up, but are there any plans to do more rigorous cross-package testing?

My beef with this split everything up into small contained packages is that the unit tests pass in each one, but changes get merged all the time that end up breaking downstream packages. It really doesn't really incite too much confidence in the package ecosystem as a whole if the rug can be pulled out from under your code at any point. When more code was included in the small package then unit tests encapsulated more of the whole and were a more reliable indicator of breakage.

@ararslan
Copy link
Member Author

ararslan commented Sep 5, 2016

I'd love to see PkgEvaluator become a CI service, wherein a Travis-like check on a PR will tell you what packages will be broken by a change, but I don't think the resources currently exist to make that happen.

Pipe dreams aside, I agree that things like this can be frustrating for all involved. But I also think that smaller, more modularized packages help both with pinpointing the source of breakages and making packages less monolithic and easier to maintain. As long as packages have unit tests that exercise the code in their respective packages how it's intended to be used, I don't think it's too much of a problem for an issue with package X to be found by package Y; the developers of package X can't come up with an exhaustive list of every use case for their package, so it just isn't feasible to test everything at once. As long as everyone remains in communication and submits issues and PRs, I think what we have now is sustainable.

Others may feel differently, but that's my $0.02.

@nalimilan
Copy link
Member

@ararslan The error from http://pkg.julialang.org/logs/Gadfly_0.5.log indicates that depwarn is called incorrectly: it needs a second argument giving the name of the function as a symbol.

@andreasnoack
Copy link
Member

@nalimilan See #1056

@tlnagy
Copy link

tlnagy commented Sep 5, 2016

As long as packages have unit tests that exercise the code in their respective packages how it's intended to be used, I don't think it's too much of a problem for an issue with package X to be found by package Y; the developers of package X can't come up with an exhaustive list of every use case for their package, so it just isn't feasible to test everything at once.

I think this is okay as long as release versions of packages don't break, if possible. That's what was a bit frustrating about this change was that it broke the release version of RDatasets before a patch was available.

@ararslan
Copy link
Member Author

ararslan commented Sep 5, 2016

Yeah, in this case the breakage was totally my fault, the result of a stupid mistake. I sincerely apologize for the frustration.

Something we may want to consider before tagging new releases is to check which packages depend on the package we want to tag (perhaps using JuliaPkgDb!), check out master on our package, and run Pkg.test on the ones that depend on it.

@@ -16,6 +16,8 @@ using Reexport
using GZip
using SortingAlgorithms

using FileIO # remove after read_rda deprecation period
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ararslan Why is the dependency here FileIO and not RData?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileIO recognizes the R data format by itself, so FileIO.load should work without RData. As for why the dependency is FileIO instead of RData, I had initially set it to RData but changed it to FileIO and I forget why. 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...but only if RData is loaded, right? So the dependency in REQUIRE should actually be RData? I was also a bit puzzled by the warning since it first points to RData and then mentions FileIO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I think the dependency should be on RData and the deprecation should use RData.load (which I also had before and changed). The deprecation should probably also just be simplified to, as Milan suggested, @deprecate read_rda(args...) RData.load(args...).

Copy link
Contributor

@alyst alyst Sep 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a discussion in #1031 why it should be FileIO.load() (because it accepts string names, whereas RData.load() method has different signature and is not to be called directly, it gets called by FileIO.load()). FileIO knows about .rda format and knows that its support is provided by RData package. So if no RData package is installed, the user would be asked to install it. By requiring FileIO we avoid circular references DataFrames -> RData -> DataFrames.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! I knew there was a reason I changed it. Thanks, @alyst. Perhaps you might want to update your PR, @andreasnoack?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alyst Thanks for the explanation. I've updated my PR.

maximerischard pushed a commit to maximerischard/DataFrames.jl that referenced this pull request Sep 28, 2016
* Remove all RDA functionality from the package
* Depend on RData for the deprecation period
* Use FileIO instead of RData
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants