Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let var, std, cor take a CovarianceEstimator #734

Closed
tpgillam opened this issue Nov 17, 2021 · 7 comments · Fixed by #815
Closed

Let var, std, cor take a CovarianceEstimator #734

tpgillam opened this issue Nov 17, 2021 · 7 comments · Fixed by #815

Comments

@tpgillam
Copy link
Contributor

Currently StatsBase.jl defines CovarianceEstimator here, and defines methods of cov that take one as the first argument.

I think it would be convenient to allow similar dispatches for related methods. These could have fairly simple default implementations, something like the following (untested) sketch:

var(ce::CovarianceEstimator, x::AbstractVector; kwargs...) = cov(ce, x; kwargs...)

std(ce::CovarianceEstimator, args...; kwargs...) = sqrt(var(ce, args...; kwargs...))

cov2cor!(x::AbstractMatrix) = cov2cor!(x, sqrt.(diag(x)))

cor(ce::CovarianceEstimator, x::AbstractVector, y::AbstractVector) = cov2cor!(cov(ce, x, y))
cor(ce::CovarianceEstimator, X::AbstractMatrix; kwargs...) = cov2cor!(cov(ce, X; kwargs...))
cor(ce::CovarianceEstimator, X::AbstractMatrix, w::AbstractWeights; kwargs...) = cov2cor!(cov(ce, X, w; kwargs...))

Any thoughts or objections to this? I'm happy to make a PR for more concrete discussion if this seems reasonable :)

@nalimilan
Copy link
Member

Why not, but I guess you'd also expect CovarianceEstimation.jl (or another package) to define the corresponding functions? Otherwise it wouldn't be super useful.

@mateuszbaran Any opinion?

@tpgillam
Copy link
Contributor Author

I’d propose including something like the default implementations above in StatsBase.

At that point, so long as the first dispatch of cov is available (since that is effectively var), it might not generally be necessary to override for specific estimators, I would hope?

@nalimilan
Copy link
Member

Ah, got it. Yes, makes sense. Let's hear what @mateuszbaran says though.

@mateuszbaran
Copy link
Contributor

I think this definitely makes sense for cor but so far I haven't seen the need for fancy methods of estimating variance and standard deviation.

@tpgillam
Copy link
Contributor Author

the need for fancy methods of estimating variance and standard deviation.
For example, my current case would be mateuszbaran/CovarianceEstimation.jl#80 - specifically, I would like to provide equivalent functions for biweight_midvariance and biweight_scale](https://docs.astropy.org/en/stable/api/astropy.stats.biweight.biweight_scale.html).

To me, it seems nice to just implement a Biweight (or similar name) covariance estimator, and then be able to get these things with var(Biweight(), ...) and std(Biweight(), ...).

[I would be surprised if there aren't other regularisation techniques that would also fit this pattern - winsorising might be one off the top of my head]

@mateuszbaran
Copy link
Contributor

Sure, if you have a use for that, don't mind my comment 🙂

@nalimilan
Copy link
Member

OK feel free to make a PR @tpgillam then.

rofinn pushed a commit that referenced this issue Jul 15, 2022
* Closes #734

* Undo drive-by tweak

* Apparently needed to avoid duplicating docstrings

* Bump version

* Use private _cov2cor! for now
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants