What is "within scope" for the data.table package? #5722

Closed
TysonStanley opened this issue Nov 3, 2023 · 3 comments · Fixed by #5772
Labels
governance Project governance

Comments

@TysonStanley
Member

TysonStanley commented Nov 3, 2023

As we are working towards a new governance document for data.table (#5676), it seems important to consider what features are “within scope” of the data.table package. For instance, it is clear that essentially anything tied to data wrangling, cleaning, reformatting, structuring, and analysis is within the scope of the package, but plotting (e.g., something like ggplot2) currently is not. Laying this out clearly would help communicate and guide feature requests and contributions. Notably, the README states “data.table provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.” Obviously, the package also offers features beyond the high-performance version of data.frame (e.g., fread()).
 
To start, here are the features that are already within scope (as described in the README).

  • Data manipulation and analysis
    • reshaping/pivoting
    • aggregation/summarizing
    • subsetting rows
    • all sorts of joining (left/right/full/inner, rolling, etc.)
    • adding/updating/deleting columns
  • Reading/writing of data from/to many file formats

Topics we currently believe are out of scope:

  • plotting/graphics (like ggplot2)
  • manipulating data stored on disk (or remote SQL DB) rather than in memory (like sqldf / dbplyr)
  • machine learning / modeling (like mlr3)
  • regular expression builders (like rex and nc packages)

Please add any others that make sense to include or should be discussed. Note that these topics may be relevant for the “Seal of Approval” (#5723).

@jangorecki
Member

I think we will not be able to cover all options up front, and the best approach may be to keep handling it the way we have until now.

If there is a request/idea, and we are convinced it is out of scope, we just close the issue with a comment saying so. We can of course re-open it later on, based on more comments or upvotes.

@TysonStanley
Member Author

Yeah, I didn't think it would be possible to be comprehensive and have the foresight for all future developments. I think some of this is more to help new people understand the overall plans for where development will go. Do you think it's useful at all to delineate this information? Even with something like a 1 (or 5) year plan as we are putting together the governance doc?

Also, I'm curious whether the decision is mostly made on the spot by the active developers, with a quick "tally" of agreement on the issue? I think that makes a lot of sense (especially with the opportunity to reactivate with enough comments/upvotes).

tdhock added the governance label Nov 3, 2023
@jangorecki
Member

jangorecki commented Nov 4, 2023

> I think some of this is more to help new people understand the overall plans for where development will go.

That would be nice, but without funding for a development team this kind of roadmap will be "where we would like development to go" and not really "where development will go". We cannot really make many commitments without a business model or some other kind of funding.
I can say what the two big points on the roadmap were in the recent past: an on-disk, file-mapped data.table and long vector support. I cannot easily imagine having those anytime soon (or later) when Matt is not actively developing anymore. A PoC, yes, but having it ready to merge to master will be a completely different story. As long as volunteers are contributing, I don't think it makes sense to make any commitments.
