Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move statistics.jl and related methods from Base to StatsBase #379

Merged

Conversation

fredrikekre
Copy link
Contributor

See JuliaLang/julia#27152

This passes on Julia v0.6.2, master, and on fe/stats.

fredrikekre added a commit to JuliaLang/julia that referenced this pull request May 20, 2018
this commit removes cor, cov, median, median!,
middle, quantile, quantile!, std, stdm, var,
varm and linreg and moves them to StatsBase

fix #25571 (comment)
    (included in JuliaStats/StatsBase.jl#379)
fix #23769 (included in JuliaStats/StatsBase.jl#379)
fix #27140
@codecov
Copy link

codecov bot commented May 20, 2018

Codecov Report

Merging #379 into master will decrease coverage by 9.79%.
The diff coverage is 7.14%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #379     +/-   ##
=========================================
- Coverage   90.58%   80.78%   -9.8%     
=========================================
  Files          19       20      +1     
  Lines        2028     2274    +246     
=========================================
  Hits         1837     1837             
- Misses        191      437    +246
Impacted Files Coverage Δ
src/StatsBase.jl 100% <ø> (ø) ⬆️
src/deprecates.jl 0% <ø> (ø) ⬆️
src/base.jl 0% <0%> (ø)
src/cov.jl 97.05% <100%> (ø) ⬆️
src/moments.jl 99.4% <88.88%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb5138e...72896d2. Read the comment docs.

fredrikekre added a commit to JuliaLang/julia that referenced this pull request May 25, 2018
fredrikekre added a commit to JuliaLang/julia that referenced this pull request May 25, 2018
fredrikekre added a commit to JuliaLang/julia that referenced this pull request May 25, 2018
@fredrikekre fredrikekre force-pushed the fe/stats-from-base-to-statsbase branch from 8fe4df1 to 2c7f670 Compare May 25, 2018 13:31
test/misc.jl Outdated
@@ -24,17 +24,17 @@ b = [true, false, false, true, false, true, true, false]

# indicatormat

I = [false true false false false;
II = [false true false false false;
true false false false true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjust alignment.

test/runtests.jl Outdated

@static if StatsBase.BASESTATS_IN_STATSBASE
using Random
const Compatvar = var
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we just shadow the standard var, etc. functions instead of defining separate ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since var and Compat.var are different functions on 0.6 I couldn't come up with a better solution. How did you mean?
I don't think it matters much, presumably StatsBase will drop 0.6 when 0.7 is released anyway, and then these will all just be var without Compat being involved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking of something like const var = StatsBase.var in one branch, and const var = Compat.var in the other one. The point is precisely to limit as much as possible the changes that will have to be applied manually when dropping 0.6 support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But

const var = Compat.var

would mean that StatsBase extend the unexported Compat.var and people will have to use a fully qualified name for this. In addition, that would mean StatsBase would export the unexported Compat.var and break things...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a test file, it doesn't affect what users see at all AFAICT.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, was thinking about the same case we have in the source file. But again, I don't think we can do that since they are two different functions, and we use them both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's keep it that way then.

test/runtests.jl Outdated
@@ -22,6 +34,11 @@ tests = ["ambiguous",
"statmodels"]#,
#"statquiz"]

@static if StatsBase.BASESTATS_IN_STATSBASE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this be run even on 0.6? If needed, StatsBase can reexport Base functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the test in this file have been changed multiple times while it was in base I don't think so.


using Test, Random, LinearAlgebra, SparseArrays

if true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to use StatsBase.BASESTATS_IN_STATSBASE?

@nalimilan
Copy link
Member

I've added comments I had prepared based on the previous state of the PR. Some still appear to be relevant.

BTW, we should find a better file name than statistics.jl.

@fredrikekre
Copy link
Contributor Author

BTW, we should find a better file name than statistics.jl.

Does it really matter what this is called? I assume that once 0.6 support is dropped we will just split this file and distribute the functions to the right place.

@nalimilan
Copy link
Member

Maybe call it base.jl then?

@fredrikekre fredrikekre force-pushed the fe/stats-from-base-to-statsbase branch from 2c7f670 to 53d5024 Compare May 25, 2018 14:19
fredrikekre added a commit to JuliaLang/julia that referenced this pull request May 28, 2018
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge and tag soon since the functions have been removed from Base. Just needs to adjust the check based on the Julia version.

src/StatsBase.jl Outdated
@@ -193,11 +193,24 @@ module StatsBase
export midpoints
end

const BASESTATS_IN_STATSBASE = false # VERSION >= v"1.2.3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update this now that PR has been merged in Julia?

@ViralBShah
Copy link
Contributor

Note the coalesce to something rename when this is merged: JuliaLang/julia#27258

@fredrikekre fredrikekre force-pushed the fe/stats-from-base-to-statsbase branch from 53d5024 to 72896d2 Compare May 28, 2018 13:01
@andreasnoack
Copy link
Member

@fredrikekre Is the version check updated and this ready to go?

@fredrikekre
Copy link
Contributor Author

Yes.

@andreasnoack andreasnoack merged commit b7212f9 into JuliaStats:master May 28, 2018
@nalimilan
Copy link
Member

Docstrings for new functions do not appear in the manual, but I guess that's just because we currently deploy it from Julia 0.6?

@fredrikekre fredrikekre deleted the fe/stats-from-base-to-statsbase branch May 28, 2018 16:32
@fredrikekre
Copy link
Contributor Author

Docstrings for new functions do not appear in the manual, but I guess that's just because we currently deploy it from Julia 0.6?

I actually didn't think about the docs, but yea, they don't appear in StatsBase on julia 0.6 at all so thats also why.

IMO the simplest thing for StatsBase would be to drop 0.6 at this point.

@nalimilan
Copy link
Member

At least we could switch to deploying docs from 0.7, assuming it becomes a supported choice on Travis once the alpha is released.

@martinholters
Copy link
Contributor

martinholters commented May 29, 2018

Um, this interacts badly with Compat. Compat provides its own var, std, ... with keyword support not present in 0.6, passing the call through to the respective function in Base. Which now doesn't work anymore on 0.7. But Compat cannot depend on StatsBase because StatsBase depends on Compat. So the solution would be to remove those functions from Compat, and let people use StatsBase on 0.6, too. But IIUC correctly, StatsBase only provides the old interface on 0.6, so that doesn't work. Furthermore, StatsBase explicitly uses (at least) Compat.cov and Compat.varm. I wonder how that ever passes tests on a Julia build with those functions removed. Am I missing something? EDIT: It only uses these Compat function on older Julia, where needed.

@fredrikekre
Copy link
Contributor Author

Furthermore, StatsBase explicitly uses (at least) Compat.cov and Compat.varm.

I side stepped that, see #379 (comment)

@martinholters
Copy link
Contributor

So, what I'd like to see is to have e.g. StatsBase.var have the same interface across Julia versions; then we can safely drop those from Compat.

@fredrikekre
Copy link
Contributor Author

So, what I'd like to see is to have e.g. StatsBase.var have the same interface across Julia versions

Wouldn't that require StatsBase to have one exported var and one unexported var seems... impossible?

@martinholters
Copy link
Contributor

My idea would be to export var after 0.7.0-DEV.5238 and only provide an unexported, new-interface version before that. Could that work?

@martinholters
Copy link
Contributor

If that doesn't work, maybe a StatsBase.Compat submodule?

The two options I see is duplicating the code from here to Compat or moving the code from Compat here. Obviously, I'd favor moving code over duplicating it.

Well, a third option would be to create a new package StatsCompat or whatever just for those and move the Compat code there. But note that StatsBase could not depend on it, making adjustments here necessary anyway.

@nalimilan
Copy link
Member

I agree with @martinholters. We should:

  • have StatsBase provide unexported var with the same interface on 0.6 and 0.7, but export it only after 0.7.0-DEV.5238
  • remove var from Compat after 0.7.0-DEV.5238, and have it just print a deprecation warning pointing to StatsBase (just like Base does)

@fredrikekre
Copy link
Contributor Author

This would be very much simpler if we could have separate branches for pre and post 0.7.0-DEV.5238

@martinholters
Copy link
Contributor

Ah, some of those functions had already been extended in StatsBase, right? Then having them suddenly be distinct, unexported functions seems like a no-go. I can try to prepare a PR, putting the code from Compat into a StatsBase.Compat submodule, if that is considered an option.

@nalimilan
Copy link
Member

I guess it's fine to make different branches if that helps.

@martinholters
Copy link
Contributor

I don't see how that would help much. Wouldn't we still need to provide e.g. two different varm functions on 0.6, the Base one with extensions (if I got that right) and an unexported one with the 0.7-interface?

@fredrikekre
Copy link
Contributor Author

I can try to prepare a PR, putting the code from Compat into a StatsBase.Compat submodule, if that is considered an option.

That seems like the simplest approach, and direct Compat.var to StatsBase.Compat.var (although we need another name for that internal Compat module I think since StatsBase uses Compat for other things...)

I guess it's fine to make different branches if that helps.

I am trying this now, should be doable.

I don't see how that would help much.

Then we can do e.g. Base.var(args...) instead of import Base: var; function var .. end for adding methods, which lets us have a module local var and also extend the Base function.

@martinholters
Copy link
Contributor

Of course! If that pans out, it would be better than the submodule approach, I think. Only if people had been doing StatsBase.var, they would be in for a surprise. But why would they? But are separate branches better than a lot of if VERSION blocks?

@fredrikekre
Copy link
Contributor Author

After trying some I think that the submodule is much simpler, albeit uglier. I think most users have not upgraded yet anyway, and when they do they will likely drop 0.6 right away, meaning that this ugly middle ground might not be visible that much. Some other things that complicates this is that StatsBase have its own deprecations for some of these functions, and also, the methods that StatsBase extend does not use dims as a kwarg.

@martinholters
Copy link
Contributor

After trying some I think that the submodule is much simpler, albeit uglier.

Are you working on it or should I give it a shot?

@fredrikekre
Copy link
Contributor Author

#382

andreasnoack pushed a commit to JuliaLinearAlgebra/SparseLinearAlgebra.jl that referenced this pull request Jun 21, 2018
nalimilan pushed a commit to JuliaStats/Statistics.jl that referenced this pull request Sep 18, 2019
nalimilan pushed a commit to JuliaStats/Statistics.jl that referenced this pull request Sep 18, 2019
KristofferC pushed a commit to JuliaSparse/SparseArrays.jl that referenced this pull request Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants