Future plan for Stats.jl and statistical computing in general #4168
Comments
I like this plan. cc: @dmbates, @johnmyleswhite, @HarlanH.
It would be unfortunate to have to use DataFrames to do statistics (i.e., the R approach), as implied in (2). It would be preferable if all statistical functions worked directly on Arrays and Numbers and then had wrappers to use DataFrames as well. Otherwise, those of us who use large multidimensional Arrays have the extra overhead of pushing Arrays (or Array slices) into DataFrames (probably in a loop) just to call stats functions that ultimately just act on Arrays. This approach turned me off to R. A DataFrame is just an abstraction for tabular data, and in my experience (climate modeling) most data sets and model output do not fit nicely into this framework.
I'm generally on board: (1) For the moment, moving Stats back into Base seems unwise to me, since removing Stats from Base has given us much more freedom to work. In addition, Stats depends upon both NumericExtensions and Distances, so those would also have to be brought into Base. In the long run, I'd like all of that functionality brought into Base -- but that's a few months away, I would think. (2) I'm happy to make Stats into a meta-package, but we then have to create a temporary package that stores all of the material currently in Stats. (3) Unified documentation would be great. (4) Function references would be great as well. @BobPortmann, all of the functions in Stats already work on vectors by default. DataFrames then extends them to work on DataArrays, which are the relevant data structure. Operations on DataFrames are defined in terms of actions on DataArrays. If you dislike DataFrames, you can avoid them. You can also avoid DataArrays.
I'm fine with moving to a "sumo" Stats.jl that requires the other packages. @johnmyleswhite Commits on the current
@johnmyleswhite Yes, I realize that is true now, but I thought that item (2) above was proposing to move away from that model. I am glad to hear that it is not. I'm not sure what you mean by DataArrays. Are these normal Arrays or an extension of DataFrames to higher dimensions?
@BobPortmann DataArrays are arrays that include a missing data specification. A DataFrame can contain one or more DataArrays or other types of vectors or arrays. See DataFrames/src/dataarray.jl
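To make the layering discussed above concrete, here is a minimal sketch of the "plain arrays first, wrappers second" design. It is not code from Stats.jl or DataFrames: the names zscores and zscores_skipmissing are hypothetical, and it assumes a present-day Julia where missing and skipmissing live in Base, used here as a stand-in for the missing-data marker that DataArrays provided when this thread was written.

```julia
using Statistics  # standard library; provides mean and std

# Core routine (hypothetical name): defined on plain numeric vectors,
# so no table type is needed to call it.
zscores(x::AbstractVector{<:Real}) = (x .- mean(x)) ./ std(x)

# Thin wrapper (hypothetical name) for a column that may hold missing
# values: drop the missing entries, then delegate to the core routine.
zscores_skipmissing(x::AbstractVector) = zscores(collect(skipmissing(x)))

# The core routine works directly on a slice of a multidimensional array.
A = [1.0 2.0; 3.0 4.0; 5.0 6.0]
z = zscores(A[:, 1])             # first column is 1.0, 3.0, 5.0

# The wrapper handles a vector with a missing entry.
col = [1.0, missing, 3.0, 5.0]
zm = zscores_skipmissing(col)
```

Both calls end up in the same plain-vector method, so users with large multidimensional arrays never need to build a table, while tabular or missing-aware types only need a small adapter.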
The general consensus seems to be that we still need Stats.jl to be independent. I do wish there was an easy way to get command line help for package functions.
I would love to do the following to make statistical computing much more accessible in Julia. This would make things a lot simpler for folks coming from the R world.
help does not have the ability to look into packages.