Help draft the data.table community survey (open until September 26) #5686

sluga · 2023-09-12T15:07:20Z

Hi all,

following the discussion in #5676, here's the initial draft of a data.table community survey.
The plan is to collect suggestions/feedback in this issue until September 26 and then run the survey for a month.

Please give your suggestions below. I'll make a final draft after September 26 & then give @tdhock (or another senior data.table contributor) the chance to have a final say on survey content.

Draft

About you

How long have you been using R?
- 0 - 3 months
- 3 - 12 months
- 12 - 24 months
- 2 - 4 years
- 4 - 7 years
- 7+ years
How long have you been using data.table?
- 0 - 3 months
- 3 - 12 months
- 12 - 24 months
- 2 - 4 years
- 4 - 7 years
- 7+ years
Approximately how often do you use data.table these days?
- Every day
- Every week
- Every month
- Occasionally
- I don't use data.table anymore
In what context are you using data.table? (Select all that apply.)
- Professionally
- For side projects
- Academic research
- Teaching
- As a student
- Other ( __________ )
What are you currently using data.table for? (Select all that apply.)
- Statistical analysis
- Machine learning
- As a dependency in my R package(s)
- In Shiny apps
- In production
- Other ( __________ )

Contributing to data.table

Are you interested in contributing to data.table in some way?
- Yes
- Maybe
- No
In which areas are you interested in contributing to data.table? (Select all that apply.)
- Spreading the word about data.table
- Reporting bugs
- Submitting feature requests
- Giving talks/tutorials on data.table
- Financial donation
- GitHub issue triage (adding labels to issues, finding new issues to prioritize and old issues to close, etc.)
- Writing/editing documentation
- Adding translations
- Reviewing R code changes
- Reviewing C code changes
- Submitting R code
- Submitting C code
- Other ( __________ )
In which areas have you already contributed to data.table? (Select all that apply.)
- I haven't contributed to data.table yet.
- Spreading the word about data.table
- Reporting bugs
- Submitting feature requests
- Giving talks/tutorials on data.table
- Financial donation
- GitHub issue triage (adding labels to issues, finding new issues to prioritize and old issues to close, etc.)
- Writing/editing documentation
- Adding translations
- Reviewing R code changes
- Reviewing C code changes
- Submitting R code
- Submitting C code
- Other ( __________ )
Is there anything that would make it easier or more appealing for you to contribute to the project?
- (Open-ended)

Evaluation & Priorities

What do you appreciate the most about data.table?
- (Open-ended)
What are your biggest challenges in using data.table?
- (Open-ended)
How satisfied are you with data.table in the following areas? (1 - 5 scale)
- Speed
- Memory efficiency
- Concise syntax
- Expressive syntax
- Intuitive syntax
- Consistency in design
- Backwards compatibility
- Number of dependencies
- Scope/Breadth of functionality
- Quality of documentation
- Scope of documentation
- Clarity of error messages
- Ease of use in a package
- Programming (as opposed to interactive) usage
- Import/export functionality
- Reshaping functionality
- Filtering functionality
- Data manipulation & aggregation functionality
How important are the following areas to you?
- (Same set as in the preceding question)

Feedback on candidate/upcoming features

Haven't drafted the exact questions for this section yet, but I was thinking of covering:

- the convenience function for using data.table with the pipe (see Upcoming versions of base R eliminate the need for DT() functionality - consider eliminating? #5621)
- the new interface for programming on data.table via the env argument
- the let alias for :=
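
For readers unfamiliar with these three candidates, here is a rough sketch of what each looks like in use. It assumes data.table 1.15.0 or later (for `let` and `env`) and R 4.3 or later (for the `_[` pipe placeholder). The table `DT` and the helper `summarise_by` are made up for illustration and are not part of the survey or the package.

```r
library(data.table)

# Hypothetical toy data.
DT <- data.table(g = c("a", "a", "b"), x = c(1, 2, 3))

# let() as an alias for the functional form of `:=`:
# adds column x2 by reference.
DT[, let(x2 = x * 2)]

# env: placeholder names in the call (FUN, COL, GRP) are replaced by the
# names supplied in the list before the query is evaluated. Character
# values are turned into symbols.
summarise_by <- function(dt, fun, col, grp) {
  dt[, .(out = FUN(COL)), by = GRP,
     env = list(FUN = fun, COL = col, GRP = grp)]
}
res <- summarise_by(DT, "sum", "x2", "g")

# Base R pipe with the `_` placeholder as the head of an extraction,
# which is what makes a dedicated DT() helper less necessary.
piped <- DT |> _[x > 1]
```
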

Project governance

The data.table team is currently preparing a governance document that would clarify the roles and processes for future development of data.table. If you'd like to contribute to this discussion, please answer the following questions.
Which values, principles, and objectives should guide the data.table project going forward?
- (Open-ended)
Which specific agreements, roles, workflows & practices would best serve the project?
- (Open-ended)
If you're familiar with governance documents of other open source projects, which ones should data.table imitate? How/Why?
- (Open-ended)

jangorecki · 2023-09-12T16:15:48Z

I had a brief look... I would completely drop 5)

tdhock · 2023-09-12T16:37:50Z

Hi, this is great, thanks so much @sluga!
- For the questions "How satisfied are you with data.table in the following areas?" and "How important are the following areas to you?", I wonder if the response could be changed from a 1-5 scale to a simple yes/no for each, or to checking the top N which you are most satisfied with / the top N which need improvement or are most important for you. (I think that would be easier to analyze later than the 1-5 scale?)
- In what context are you using data.table? -> for me at least, it would make sense to add responses "academic research" and "teaching"
- In which areas are you interested in contributing to data.table? -> Please add response "Issue triage (add labels to issues, find new issues to prioritize, and old issues to close, etc)"

sluga · 2023-09-13T16:03:00Z

> I had a brief look... I would completely drop 5)

Thanks @jangorecki, could you elaborate why?

My thinking was that the input from the broader community could inform some decisions. As an example, take the helper function for using data.table with the pipe. To me it would seem useful to know if, say, 95/100 users say "We need this" vs. if only 5/100 do.

Note that I'm not suggesting that any survey result should be taken as binding in any way. This isn't a popularity contest, so the core team should follow their best judgment. But couldn't that judgment potentially be informed by survey results?

sluga · 2023-09-13T16:05:18Z

> Hi this is great thanks so much @sluga
> for the question "How satisfied are you with data.table in the following areas?" and "How important are the following areas to you? " I wonder if it could be changed from 1-5 scale response to a simple yes/no for each, or check the top N which you are most satisfied / top N which need improvement or are most important for you. (I think that would be easier to analyze later than the 1-5 scale?)
> In what context are you using data.table? -> for me at least, it would make sense to add responses "academic research" and "teaching"
> In which areas are you interested in contributing to data.table? -> Please add response "Issue triage (add labels to issues, find new issues to prioritize, and old issues to close, etc)"

Thanks @tdhock, I've added the extra options to the context & contribution questions.

But I don't know about the Likert->Yes/No suggestion. Likert scales are very common in survey research & not particularly difficult to analyze, and we'd lose a lot of information by dichotomizing.
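
To illustrate the point about lost information, here is a minimal sketch with made-up numbers (the `responses` table is hypothetical, not survey data). Two areas get the same yes/no share once the scale is cut at 4, even though one is lukewarm and the other is polarized. The full 1-5 scale keeps that difference visible.

```r
library(data.table)

# Hypothetical responses: one row per respondent and area, scored 1-5.
responses <- data.table(
  area  = rep(c("Speed", "Documentation"), each = 6),
  score = c(4, 4, 4, 3, 3, 3,   # Speed: everyone is near the middle
            5, 5, 5, 1, 1, 1)   # Documentation: polarized
)

# Summary that uses the full scale.
likert_summary <- responses[, .(
  n    = .N,
  mean = mean(score),
  sd   = sd(score)
), by = area]

# Dichotomized version: "yes" means a score of 4 or 5.
yes_no <- responses[, .(share_yes = mean(score >= 4)), by = area]

print(likert_summary)
print(yes_no)
```

In this toy data both areas have the same `share_yes`, while `mean` and `sd` differ.
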

jangorecki · 2023-09-13T17:52:28Z

> > I had a brief look... I would completely drop 5)
>
> Thanks @jangorecki, could you elaborate why?
>
> My thinking was that the input from the broader community could inform some decisions. As an example, take the helper function for using data.table with the pipe. To me it would seem useful to know if, say, 95/100 users say "We need this" vs. if only 5/100 do.
>
> Note that I'm not suggesting that any survey result should be taken as binding in any way. This isn't a popularity contest, so the core team should follow their best judgment. But couldn't that judgment potentially be informed by survey results?

I believe all governance processes we are trying to establish should be completely independent of which issues users/devs are liking/disliking/prioritizing/etc.
Once new governance is in place, then I think it is best to ask for feedback and comments on other comments, and whoever wishes to comment will do so.
Discussing those issues, why they are (un?)important, what they address or miss, and all that stuff is not really relevant for setting up governance.

We have, and promote, upvoting, so we have a way to measure users' interest.

...

OK, as for a DT community survey (one that is not meant to be related to setting up governance), that makes some sense. But assuming we want to focus on governance, and the survey is meant to help with that, I would skip that part. Or at least run that kind of DT community survey after we set up governance. We could promote the survey in the package startup message. But really, for now I would focus completely on governance and use the survey towards that, and then, when that is set, run another survey along the lines of "what would you like?", etc.

Maybe we simply don't need a survey now to move forward with governance. Then it's fine, and point 5) could of course stay. My bad, I had the idea that the survey would help push the governance setup, which doesn't have to be true.

tdhock · 2023-09-13T20:54:43Z

> But I don't know about the Likert->Yes/No suggestion. Likert scales are very common in survey research & not particularly difficult to analyze, and we'd lose a lot of information by dichotomizing.

haha that is fine with me, I guess I just have the tendency to always respond either 1 or 5 on those kind of questions.

sluga · 2023-09-18T17:11:37Z

> > > I had a brief look... I would completely drop 5)
> >
> > Thanks @jangorecki, could you elaborate why?
> > My thinking was that the input from the broader community could inform some decisions. As an example, take the helper function for using data.table with the pipe. To me it would seem useful to know if, say, 95/100 users say "We need this" vs. if only 5/100 do.
> > Note that I'm not suggesting that any survey result should be taken as binding in any way. This isn't a popularity contest, so the core team should follow their best judgment. But couldn't that judgment potentially be informed by survey results?
>
> I believe all governance processes we are trying to establish should be completely independent of which issues users/devs are liking/disliking/prioritizing/etc. Once new governance is in place, then I think it is best to ask for feedback and comments on other comments, and whoever wishes to comment will do so. Discussing those issues, why they are (un?)important, what they address or miss, and all that stuff is not really relevant for setting up governance.
>
> We have, and promote, upvoting, so we have a way to measure users' interest.
>
> ...
>
> OK, as for a DT community survey (one that is not meant to be related to setting up governance), that makes some sense. But assuming we want to focus on governance, and the survey is meant to help with that, I would skip that part. Or at least run that kind of DT community survey after we set up governance. We could promote the survey in the package startup message. But really, for now I would focus completely on governance and use the survey towards that, and then, when that is set, run another survey along the lines of "what would you like?", etc.
>
> Maybe we simply don't need a survey now to move forward with governance. Then it's fine, and point 5) could of course stay. My bad, I had the idea that the survey would help push the governance setup, which doesn't have to be true.

Thanks for the elaboration @jangorecki.

Yes, I wasn't thinking of the survey as purely about governance - should've made that clearer. Having two separate surveys - one now, one later - makes sense as well, but that way we'd probably be asking people to complete another survey in a couple of months, and it would likely repeat a number of questions (as it would be interesting to correlate some of the responses from sections 1/2 and 4/5). So I'm leaning toward the one-bigger-survey option.

markseeto · 2023-09-22T22:57:30Z

@sluga Sorry if this seems pedantic, but in the first two questions the options seem a bit untidy. For example, there is no option that includes 4.5 years. Maybe "2 - 4 years" is meant to include 4.5 years, 4.9 years, etc., but if that's the intended interpretation then it doesn't seem right that "13 - 24 months" and "2 - 4 years" have no gap between them.

Also, the second question doesn't have the "Less than 3 months" option, but I'm guessing it was just left out by mistake.

From a statistical point of view, it would be preferable to collect these as continuous variables to avoid loss of information by categorising. But I understand that collecting continuous variables might have disadvantages in other ways.
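
As a rough sketch of how the two approaches could be combined, one could collect a continuous value and bin it afterwards with half-open intervals, so that every value (including 4.5 years) falls into exactly one category. The `years` vector, the break points, and the labels below are illustrative assumptions, not the survey's actual options.

```r
# Hypothetical answers, in years of experience.
years <- c(0.1, 0.25, 1, 2, 4, 4.5, 7, 12)

breaks <- c(0, 0.25, 1, 2, 4, 7, Inf)
labels <- c("< 3 months", "3 - 12 months", "1 - 2 years",
            "2 - 4 years", "4 - 7 years", "7+ years")

# right = FALSE makes each bin closed on the left and open on the right:
# [0, 0.25), [0.25, 1), ..., [7, Inf). There are no gaps and no overlaps.
experience <- cut(years, breaks = breaks, labels = labels, right = FALSE)

print(data.frame(years, experience))
```

With this convention a boundary value such as exactly 4 years goes to the upper bin ("4 - 7 years"), which is the kind of rule the option labels would need to state.
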

franknarf1 · 2023-09-25T00:25:06Z

For (1), could include a question like

What is your package development experience? Check all that apply.
- [] Released an open-source R package
- [] Worked with 'R Internals'
- [] Contributed code to data.table
- [] Wrote 'production' code

This would give context for (2) re possible gaps between desire to contribute and (current) skills among respondents.

(4) looks like a mix of quality-of-life and functionality points (the latter in the final bullets starting from "Programming") -- maybe these should be treated separately in some way? Eg,

How important are the following areas of functionality to you?
...
How satisfied are you with data.table functionality in the following areas? (1 - 5 scale)
...
Are there any further areas of functionality that you would like to see in data.table's scope? (open-ended)

I think this provides value above the 1-5 "Scope/Breadth of functionality" question by generating asks to consider when defining the package scope.

(I doubt folks would bring up scope gaps re the "biggest challenges" question, which I suspect invites more QOL answers.)

On the items in (4):

Filtering functionality

Maybe "Filtering and join functionality" since joins aren't covered by another bullet, and filtering doesn't seem like a major feature area on its own.

Number of dependencies

Maybe "Minimal dependencies" which also refers to the package working for R versions "as old as possible for as long as possible" (from the readme).

Concise syntax
Expressive syntax
Intuitive syntax
Consistency in design

I think these sound like the same thing to most folks (like thumbs up or down for the syntax in general); and I'm not sure what expressive means in this context (?). Personally, I value the generality of the syntax (eg, so I can write DT1[DT2, on=.(x,y,z), {any code}] or DT[, {any code}, by=.(x,y,z)] while focusing on the code) which is not covered by any of these bullets (?). Maybe simply ask about "Syntax" instead?
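
For anyone reading along, the two forms mentioned above might look like this on toy data. It is only a sketch: the tables `DT1` and `DT2` and their columns are made up, and the braces in `j` stand in for "any code".

```r
library(data.table)

# Hypothetical tables sharing the key columns x and y.
DT1 <- data.table(x = c(1, 1, 2), y = c("a", "b", "a"), v = 1:3)
DT2 <- data.table(x = c(1, 2),    y = c("a", "a"),     w = c(10, 20))

# Join form: look up the rows of DT2 in DT1 on x and y, then evaluate an
# arbitrary expression in j over the joined columns.
joined <- DT1[DT2, on = .(x, y), {
  scaled <- v * w
  .(x, y, scaled)
}]

# Grouped form: an arbitrary expression evaluated once per group of x.
grouped <- DT1[, {
  total <- sum(v)
  .(total = total, n = .N)
}, by = .(x)]

print(joined)
print(grouped)
```
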

Quality of documentation
Scope of documentation
Clarity of error messages

I'm wondering what a low score on "scope" would tell us -- everything is documented (as far as I have seen), so it could either mean the person would like to see more vignette coverage; could not find what they needed in the docs; or (unlikely) is trolling. Maybe switch to

Clarity of documentation, error messages and verbose output
Vignette coverage

Personally, I find the verbose output very useful and use it as a debugging tool even after learning do's and don'ts from errors and docs (eg, checking GForce to diagnose slower-than-expected operations; or # rows modified when I expect a join to be 1:1 instead of 1:n etc).
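
As a small illustration of that workflow (the table `DT` is made up), verbose output can be switched on for a single query with the `verbose` argument, or globally through the `datatable.verbose` option. The exact messages depend on the data.table version, so none are reproduced here.

```r
library(data.table)

# Hypothetical toy data.
DT <- data.table(g = c("a", "a", "b"), v = c(1, 2, 3))

# Per-call: prints diagnostics for this query only, such as how the
# grouping was done and whether j was optimized.
res <- DT[, .(total = sum(v)), by = g, verbose = TRUE]

# Session-wide alternative: turn it on, run queries, then turn it off.
old <- options(datatable.verbose = TRUE)
DT[g == "a"]
options(old)
```
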

MichaelChirico · 2023-09-26T05:33:35Z

Thanks for putting this together!

On 1)

How would I answer the last two questions ("In what context are you using data.table?" and "What are you currently using data.table for?") if my answer to "How often do you use data.table these days?" was "I don't use data.table anymore"?

Similar consideration applies to later questions, where an earlier question precludes any answer to a later question.

On 2)

- Clarity/wording under "In which areas are you interested in contributing to data.table?". E.g. "Reviewing Pull Requests involving R code" is pretty verbose, and also assumes (probably fairly) the reader knows what a "pull request" is. Maybe "Reviewing R code changes" will be simpler.
- Other avenues of contribution to consider including (1) adding translations (2) giving talks/tutorials about using/learning data.table (3) financially 🤑 (4) supporting data.table within my organization
- "What would make it easier or more appealing for you to contribute to the project?" sort-of assumes the respondent hasn't contributed before? Would we benefit from some gradation in the question along the lines of "Have you contributed before? If so, do you plan to continue to do so in the future? If not, how could we facilitate your contributing?"

On 5) I see Jan's point. It does seem a bit out of place, and with so many open-ended questions earlier in the survey, it feels like it's getting long. Maybe best is to remind users in the survey introduction to check out & interact with the issue tracker to express support/feedback for pending bugs/FRs.

ben519 · 2023-09-27T14:24:16Z

I know I'm late to the party, but this survey looks annoyingly long. Personally, I would see this and immediately close it.

My two cents: boil it down to two or three questions including an open-ended What are your suggestion(s) for data.table moving forward? <--That's the question I want to answer :)

sluga · 2023-10-01T17:46:08Z

Thanks everyone, I've begun incorporating your suggestions & will finalize the draft & address your comments sometime in the next few days.

phisanti · 2023-10-12T08:43:14Z

Hi @sluga, I guess the survey format is either finished or about to be finished. So I was wondering how it is going to be distributed to data.table users?

sluga · 2023-10-12T11:11:20Z

Hi @phisanti,

yes, it's taken me longer than anticipated because of other obligations, but I'll post the link to the final draft today or tomorrow, asking @tdhock for a final check.

To answer your question, I'm hoping to get some help on this front. Including an invitation to take part in the survey on the data.table website and in this repo seems like the obvious move, and I think @jangorecki mentioned the possibility of including an invitation in the start-up message. I'm not much of a social media user, so I'm hoping to get some help there.

Do you have any suggestions?

tdhock · 2023-10-12T16:16:28Z

I will be traveling to Montevideo to talk about data.table at the LatinR conference on Weds Oct 18, so it would be great if you could create the survey before then, so I could mention the survey during the talk. BTW slides are here if anyone wants to comment / leave constructive criticism: https://docs.google.com/presentation/d/1ypW1LUMmcrUTMF6B9h9s8qbvW5BSbN1IW6CEgqX01Co/edit?usp=sharing

sluga · 2023-10-13T19:21:22Z

That'd be great @tdhock!

Alright, the draft of the survey is here: https://forms.office.com/e/d7gLkySP3n?origin=lprLink

A couple of notes:

- Thanks everyone for your suggestions, I've incorporated most of them.
- @ben519 raised the issue of length and indeed, this isn't a short survey. But it does seem comparable to a few other surveys I've seen in this space (open source projects). More importantly, I've organized it into a couple of sections that aren't particularly long on their own, and respondents are free to skip individual questions or even whole sections. I'm hoping this is a good solution, but I could very well be wrong and we end up with n = 5. :)
- In any case, as I said I'm giving @tdhock the final say, so if you'd prefer to cut or modify some questions or sections, let me know.
- When should the survey close? I've set this to December 1st for now.

markseeto · 2023-10-13T19:33:37Z

In Q12, should the options say "important/unimportant" instead of "satisfied/dissatisfied"?

jangorecki · 2023-10-14T06:17:23Z

Nice

Q2 without seeing remaining questions is confusing. Better to put it at the end?

Q7 could have options like data preparation/transformation; not everyone does statistical analysis. Sometimes it is just one piece in a big ETL pipeline.

Good to mention at start how much time it can take, like 2 minutes.

Good to create some short URL to be easily distributed on slides or pkg startup message.

sluga · 2023-10-14T18:12:11Z

Thanks @markseeto, @jangorecki. I fixed Q12, moved the email & response-sharing questions to the end, and added data preparation to Q7. Not sure how to estimate time: the survey has 20 questions, including several open-ended ones, but respondents can skip questions/sections.

URL: https://forms.office.com/e/d7gLkySP3n?origin=lprLink
Short URL: https://tinyurl.com/datatable-survey
QR code:

tdhock · 2023-10-14T23:41:29Z

great thanks for the qr and short link, I added them to slides

sluga · 2023-10-15T11:58:49Z

If the survey looks OK now, I suggest the following:

@tdhock could you add something like the following to the README, perhaps immediately after the first sentence? (Feel free to rephrase.)

---

**NEW:** Take part in the [data.table 2023 community survey](https://forms.office.com/e/d7gLkySP3n?origin=lprLink) and help shape the future of the project! The survey closes on **December 1st**.

---

I'll close this issue tomorrow & open a new one, inviting everyone to participate & share the survey. @tdhock perhaps you could pin the new issue?

phisanti · 2023-10-17T09:41:05Z

@sluga I would also open a new issue and pin it.

tdhock · 2023-10-17T15:26:54Z

I definitely can't change the README on the master branch (only Matt can do that, you may ask him), but I may be able to pin an issue.

sluga · 2023-10-17T17:38:27Z

I've opened a new issue (#5704) with the invitation and a PR (#5705) with the README update, hopefully @mattdowle sees it in time.

sluga closed this as completed Oct 17, 2023

jamesmbaazam mentioned this issue Oct 25, 2023

Quick start in README vs. workflow vignette epiforecasts/EpiNow2#482

Closed
