Adopting Transducers.jl as a Dependency #890

Open
ParadaCarleton opened this issue Sep 7, 2023 · 2 comments
Comments

@ParadaCarleton
Contributor

Many functions (e.g. mean, variance) could be made parallelizable, faster, shorter, and more general by adopting Transducers.jl as a dependency, which would also substantially simplify the implementation of some features. I often find myself reaching for it, only to fall back on clumsier iterator or broadcasting methods.

Luckily, Transducers.jl is now being maintained by Mason Protter and the rest of the people working on the JuliaFolds ecosystem.

The package and its dependencies have been pared down substantially over time and should not be a major contributor to StatsBase.jl's loading time. Transducers is now lightweight, with only about 80 ms of load time for all dependencies (including indirect dependencies) on v1.10:

julia> @time_imports using Transducers
      0.2 ms  Adapt
      6.1 ms  MacroTools
      0.5 ms  StaticArraysCore
      0.3 ms  ConstructionBase
      6.4 ms  Setfield
      0.3 ms  ArgCheck
      0.1 ms  Compat
      0.1 ms  Compat → CompatLinearAlgebraExt
      6.4 ms  InitialValues
               ┌ 0.0 ms Requires.__init__() 
     32.5 ms  Requires 98.74% compilation time
               ┌ 0.0 ms BangBang.__init__() 
      4.7 ms  BangBang
      9.5 ms  Baselet
      0.2 ms  CompositionsBase
      0.2 ms  DefineSingletons
      2.9 ms  MicroCollections
     30.2 ms  Test
      4.5 ms  SplittablesBase
     14.2 ms  Transducers

The primary advantages would be simplifying the implementation of many features, enabling in-place algorithms that can be substantially faster and more memory-efficient, and using a more generic interface than the iterator interface (transducers can operate on collections that are not themselves iterators).
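To make the idea concrete, here is a minimal sketch of what a fold-based reduction could look like. It assumes only the `Map`, `foldxl`, and `foldxt` exports of Transducers.jl; `mean_sketch` is a hypothetical helper for illustration, not an existing or proposed StatsBase.jl function.

```julia
using Transducers

xs = [1.0, 2.0, 3.0, 4.0]

# Sequential fold: sum of squares, without allocating an intermediate array.
ss_seq = foldxl(+, Map(abs2), xs; init = 0.0)

# Thread-parallel version of the same reduction; only the fold function changes.
# This relies on `+` being associative and `init` being its identity element.
ss_par = foldxt(+, Map(abs2), xs; init = 0.0)

# Hypothetical helper (not part of StatsBase.jl): a mean written as a fold.
# Note that an empty input gives 0.0 / 0 == NaN here.
mean_sketch(v) = foldxl(+, v; init = 0.0) / length(v)
```

The same transducer (`Map(abs2)`) is reused unchanged between the sequential and threaded folds, which is the kind of generality meant above.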

@ParadaCarleton
Contributor Author

BTW, @devmotion, the reason I'm interested in Transducers.jl is that I'm working on a PR that fixes all of the loops and uses of @inbounds in StatsBase.jl; Transducers.jl can replace most of these loops with faster and less bug-prone constructions. I think finally killing off @inbounds with no performance penalty (and in most cases a speedup) would be worth it.
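As a rough illustration of the kind of rewrite I mean (both functions are hypothetical examples, not code taken from StatsBase.jl), compare an index-based loop that relies on @inbounds with a fold over zipped inputs:

```julia
using Transducers

# Hypothetical index-based weighted sum. With @inbounds, a `ws` shorter than
# `xs` leads to an unchecked out-of-bounds read instead of a BoundsError.
function wsum_loop(xs, ws)
    s = zero(eltype(xs)) * zero(eltype(ws))
    @inbounds for i in eachindex(xs)
        s += xs[i] * ws[i]
    end
    return s
end

# Hypothetical fold-based version: no manual indexing, so no @inbounds needed.
# Note that `zip` stops at the shorter input, so a length check would still be
# needed if mismatched lengths should be an error.
function wsum_fold(xs, ws)
    init = zero(eltype(xs)) * zero(eltype(ws))
    return foldxl(+, Map(((x, w),) -> x * w), zip(xs, ws); init = init)
end
```

This sketch makes no claim about relative speed; that would need benchmarking on the actual StatsBase.jl kernels.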

@devmotion
Member

Any remaining @inbounds issues could be fixed without switching to Transducers, so for me that's not a compelling argument for adopting such a large dependency (and I guess it's completely impossible for code that will be moved to the Statistics stdlib?). Even if StatsBase were to use Transducers at some point, I think it would be good to keep bugfixes separate from a transition to Transducers.
