Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A Ones type #135

Closed
matthieugomez opened this issue Sep 30, 2015 · 10 comments · Fixed by #515
Closed

A Ones type #135

matthieugomez opened this issue Sep 30, 2015 · 10 comments · Fixed by #515

Comments

@matthieugomez
Copy link
Contributor

A Ones type would be something like

immutable Ones{N} <: AbstractArray{Int,N}
    dims::NTuple{N, Int}
end
Base.size(O::Ones) = O.dims
Base.getindex(O::Ones, I::Int...) = (checkbounds(O, I...); 1)

(see this stackoverflow answer for the original proposal by @mbauman).

It's helpful because it allows to write efficient, generic code:

  • Its default behavior matches a vector of ones, without the memory allocation
  • Frequently, a non weighted operation can be written more efficiently than the weighted case with a vector of ones. Defining a neutral weight type such as Ones allows to branch elegantly between the weighted and the non weighted versions.

It's a good application of multiple dispatch. It would be great to define something like this in StatsBase.

@matthieugomez
Copy link
Contributor Author

Why should such a type be defined in StatsBase?

  • Suppose I need to compute the median. For efficiency, I'd like to call median(x) in the no weight case and median(x, w) in the weight case. If median(x, w::Ones) = median(x) is defined in StatsBase, I could generically write median(x, w).
  • Moreover, suppose I actually did not know that the algorithm for unweighted median was more efficient than the weighted case with a vector of ones. I would still write median(x, Ones(length(x))) for the syntax simplicity (compared to median(x) vs median(x, w) ). But now, thanks to the definition median(x, w::Ones) = median(x) in StatsBase, the command is automatically redirected to the efficient unweighted method.

@johnmyleswhite
Copy link
Member

I think this is pretty reasonable and a potentially good change, but it adds more complexity.

I would like us to improve the design of our weighting functionality at some point to handle weights with different interpretations (right now everything is a frequency weight); maybe we can address this concern then?

We probably need a different name than Ones, which is too vague. Something like UniformWeights would be better.

@nalimilan
Copy link
Member

UnitWeights might be even better. In the literature, uniform weights sometimes mean 1/N, N being the number of cases. Unit weights doesn't have that ambiguity.

@matthieugomez
Copy link
Contributor Author

Great! I'm also a supporter of different kinds of weights.

@sbromberger
Copy link

Hi,

I may have implemented something like this in LightGraphs: see https://github.com/JuliaGraphs/LightGraphs.jl/blob/v4/src/distance.jl#L5-L14). Let me know if it's useful.

@nalimilan
Copy link
Member

I've just realized it might be possible to implement this using repeated() from Julia Base. See JuliaLang/julia#15988.

@Nosferican
Copy link
Contributor

Any updates in implementing UnitWeights?

@nalimilan
Copy link
Member

Do you want to give it a try? It shouldn't be hard to do (most of the work should be adapting tests to cover it).

@Nosferican
Copy link
Contributor

Sure. I will see if I can give it a try in the next few days and get a PR to review.

@Nosferican
Copy link
Contributor

FillArrays.Ones

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants