
Help draft the data.table community survey (open until September 26) #5686

Closed
sluga opened this issue Sep 12, 2023 · 24 comments

@sluga
Contributor

sluga commented Sep 12, 2023

Hi all,

following the discussion in #5676, here's the initial draft of a data.table community survey.
The plan is to collect suggestions/feedback in this issue until September 26 and then run the survey for a month.

Please give your suggestions below. I'll make a final draft after September 26 & then give @tdhock (or another senior data.table contributor) the chance to have a final say on survey content.

Draft

About you

  • How long have you been using R?
    • 0 - 3 months
    • 3 - 12 months
    • 12 - 24 months
    • 2 - 4 years
    • 4 - 7 years
    • 7+ years
  • How long have you been using data.table?
    • 0 - 3 months
    • 3 - 12 months
    • 12 - 24 months
    • 2 - 4 years
    • 4 - 7 years
    • 7+ years
  • Approximately how often do you use data.table these days?
    • Every day
    • Every week
    • Every month
    • Occasionally
    • I don't use data.table anymore
  • In what context are you using data.table? (Select all that apply.)
    • Professionally
    • For side projects
    • Academic research
    • Teaching
    • As a student
    • Other ( __________ )
  • What are you currently using data.table for? (Select all that apply.)
    • Statistical analysis
    • Machine learning
    • As a dependency in my R package(s)
    • In Shiny apps
    • In production
    • Other ( __________ )

Contributing to data.table

  • Are you interested in contributing to data.table in some way?
    • Yes
    • Maybe
    • No
  • In which areas are you interested in contributing to data.table? (Select all that apply.)
    • Spreading the word about data.table
    • Reporting bugs
    • Submitting feature requests
    • Giving talks/tutorials on data.table
    • Financial donation
    • GitHub issue triage (adding labels to issues, finding new issues to prioritize and old issues to close, etc.)
    • Writing/editing documentation
    • Adding translations
    • Reviewing R code changes
    • Reviewing C code changes
    • Submitting R code
    • Submitting C code
    • Other ( __________ )
  • In which areas have you already contributed to data.table? (Select all that apply.)
    • I haven't contributed to data.table yet.
    • Spreading the word about data.table
    • Reporting bugs
    • Submitting feature requests
    • Giving talks/tutorials on data.table
    • Financial donation
    • GitHub issue triage (adding labels to issues, finding new issues to prioritize and old issues to close, etc.)
    • Writing/editing documentation
    • Adding translations
    • Reviewing R code changes
    • Reviewing C code changes
    • Submitting R code
    • Submitting C code
    • Other ( __________ )
  • Is there anything that would make it easier or more appealing for you to contribute to the project?
    • (Open-ended)

Evaluation & Priorities

  • What do you appreciate the most about data.table?
    • (Open-ended)
  • What are your biggest challenges in using data.table?
    • (Open-ended)
  • How satisfied are you with data.table in the following areas? (1 - 5 scale)
    • Speed
    • Memory efficiency
    • Concise syntax
    • Expressive syntax
    • Intuitive syntax
    • Consistency in design
    • Backwards compatibility
    • Number of dependencies
    • Scope/Breadth of functionality
    • Quality of documentation
    • Scope of documentation
    • Clarity of error messages
    • Ease of use in a package
    • Programming (as opposed to interactive) usage
    • Import/export functionality
    • Reshaping functionality
    • Filtering functionality
    • Data manipulation & aggregation functionality
  • How important are the following areas to you?
    • (Same set as in the preceding question)

Feedback on candidate/upcoming features

Haven't drafted the exact questions for this section yet, but I was thinking of covering:

Project governance

  • The data.table team is currently preparing a governance document that would clarify the roles and processes for future development of data.table. If you'd like to contribute to this discussion, please answer the following questions.
  • Which values, principles, and objectives should guide the data.table project going forward?
    • (Open-ended)
  • Which specific agreements, roles, workflows & practices would best serve the project?
    • (Open-ended)
  • If you're familiar with governance documents of other open source projects, which ones should data.table imitate? How/Why?
    • (Open-ended)
@jangorecki
Member

I had a brief look... I would completely drop 5)

@tdhock
Member

tdhock commented Sep 12, 2023

Hi this is great thanks so much @sluga
for the question "How satisfied are you with data.table in the following areas?" and "How important are the following areas to you? " I wonder if it could be changed from 1-5 scale response to a simple yes/no for each, or check the top N which you are most satisfied / top N which need improvement or are most important for you. (I think that would be easier to analyze later than the 1-5 scale?)
In what context are you using data.table? -> for me at least, it would make sense to add responses "academic research" and "teaching"
In which areas are you interested in contributing to data.table? -> Please add response "Issue triage (add labels to issues, find new issues to prioritize, and old issues to close, etc)"

@sluga
Contributor Author

sluga commented Sep 13, 2023

> I had a brief look... I would completely drop 5)

Thanks @jangorecki, could you elaborate why?

My thinking was that the input from the broader community could inform some decisions. As an example, take the helper function for using data.table with the pipe. To me it would seem useful to know if, say, 95/100 users say "We need this" vs. if only 5/100 do.

Note that I'm not suggesting that any survey result should be taken as binding in any way. This isn't a popularity contest, so the core team should follow their best judgment. But couldn't that judgment potentially be informed by survey results?

@sluga
Contributor Author

sluga commented Sep 13, 2023

> Hi this is great thanks so much @sluga
> for the question "How satisfied are you with data.table in the following areas?" and "How important are the following areas to you? " I wonder if it could be changed from 1-5 scale response to a simple yes/no for each, or check the top N which you are most satisfied / top N which need improvement or are most important for you. (I think that would be easier to analyze later than the 1-5 scale?)
> In what context are you using data.table? -> for me at least, it would make sense to add responses "academic research" and "teaching"
> In which areas are you interested in contributing to data.table? -> Please add response "Issue triage (add labels to issues, find new issues to prioritize, and old issues to close, etc)"

Thanks @tdhock, I've added the extra options to the context & contribution questions.

But I don't know about the Likert->Yes/No suggestion. Likert scales are very common in survey research & not particularly difficult to analyze, and we'd lose a lot of information by dichotomizing.

@jangorecki
Member

jangorecki commented Sep 13, 2023

>> I had a brief look... I would completely drop 5)
>
> Thanks @jangorecki, could you elaborate why?
>
> My thinking was that the input from the broader community could inform some decisions. As an example, take the helper function for using data.table with the pipe. To me it would seem useful to know if, say, 95/100 users say "We need this" vs. if only 5/100 do.
>
> Note that I'm not suggesting that any survey result should be taken as binding in any way. This isn't a popularity contest, so the core team should follow their best judgment. But couldn't that judgment potentially be informed by survey results?

I believe all governance processes we are trying to establish should be completely independent of which issues users/devs are liking/disliking/prioritizing/etc.
Once the new governance is in place, I think it is best to ask for feedback and comments on other comments, and whoever wishes to comment will do so.
Discussing those issues, why they are (un?)important, what they address or miss, and all that stuff is not really relevant for setting up governance.

We have, and promote, upvoting, so we have a way to measure users' interest.

...

OK, as for a DT community survey (one not meant to be related to setting up governance), that makes some sense. But assuming we want to focus on governance, and the survey is meant to help with that, then I would skip that part, or at least run that kind of DT community survey after we set up governance. We could promote the survey in the package startup message. But really, right now I would focus completely on governance and use the survey towards that, and then, once that is set, make another survey ("what would you like?", etc.).

Maybe we simply don't need a survey now to move forward with governance; then it's fine, and point 5) could of course stay. My bad, I kind of got the idea that the survey would help push the governance setup, which doesn't have to be true.

@tdhock
Member

tdhock commented Sep 13, 2023

> But I don't know about the Likert->Yes/No suggestion. Likert scales are very common in survey research & not particularly difficult to analyze, and we'd lose a lot of information by dichotomizing.

haha that is fine with me, I guess I just have the tendency to always respond either 1 or 5 on those kinds of questions.

@sluga
Contributor Author

sluga commented Sep 18, 2023

Thanks for the elaboration @jangorecki.

Yes, I wasn't thinking of the survey as purely about governance - should've made that clearer. Having two separate surveys - one now, one later - makes sense as well, but that way we'd probably be asking people to complete another survey in a couple of months, and it would likely repeat a number of questions (as it would be interesting to correlate some of the responses from sections 1/2 and 4/5). So I'm leaning toward the one-bigger-survey option.

@markseeto
Contributor

markseeto commented Sep 22, 2023

@sluga Sorry if this seems pedantic, but in the first two questions the options seem a bit untidy. For example, there is no option that includes 4.5 years. Maybe "2 - 4 years" is meant to include 4.5 years, 4.9 years, etc., but if that's the intended interpretation then it doesn't seem right that "13 - 24 months" and "2 - 4 years" have no gap between them.

Also, the second question doesn't have the "Less than 3 months" option, but I'm guessing it was just left out by mistake.

From a statistical point of view, it would be preferable to collect these as continuous variables to avoid loss of information by categorising. But I understand that collecting continuous variables might have disadvantages in other ways.

@franknarf1
Contributor

franknarf1 commented Sep 25, 2023

For (1), could include a question like

What is your package development experience? Check all that apply.
    • Released an open-source R package
    • Worked with 'R Internals'
    • Contributed code to data.table
    • Wrote 'production' code

This would give context for (2) re possible gaps between desire to contribute and (current) skills among respondents.


(4) looks like a mix of quality-of-life and functionality points (the latter in the final bullets starting from "Programming") -- maybe these should be treated separately in some way? Eg,

How important are the following areas of functionality to you?
...
How satisfied are you with data.table functionality in the following areas? (1 - 5 scale)
...
Are there any further areas of functionality that you would like to see in data.table's scope? (open-ended)

I think this provides value above the 1-5 "Scope/Breadth of functionality" question by generating asks to consider when defining the package scope.

(I doubt folks would bring up scope gaps re the "biggest challenges" question, which I suspect invites more QOL answers.)


On the items in (4):

Filtering functionality

Maybe "Filtering and join functionality" since joins aren't covered by another bullet, and filtering doesn't seem like a major feature area on its own.

Number of dependencies

Maybe "Minimal dependencies" which also refers to the package working for R versions "as old as possible for as long as possible" (from the readme).

Concise syntax
Expressive syntax
Intuitive syntax
Consistency in design

I think these sound like the same thing to most folks (like thumbs up or down for the syntax in general); and I'm not sure what expressive means in this context (?). Personally, I value the generality of the syntax (eg, so I can write DT1[DT2, on=.(x,y,z), {any code}] or DT[, {any code}, by=.(x,y,z)] while focusing on the code) which is not covered by any of these bullets (?). Maybe simply ask about "Syntax" instead?

Quality of documentation
Scope of documentation
Clarity of error messages

I'm wondering what a low score on "scope" would tell us -- everything is documented (as far as I have seen), so it could either mean the person would like to see more vignette coverage; could not find what they needed in the docs; or (unlikely) is trolling. Maybe switch to

Clarity of documentation, error messages and verbose output
Vignette coverage

Personally, I find the verbose output very useful and use it as a debugging tool even after learning do's and don'ts from errors and docs (eg, checking GForce to diagnose slower-than-expected operations; or # rows modified when I expect a join to be 1:1 instead of 1:n etc).

@MichaelChirico
Member

Thanks for putting this together!

On 1)

How would I answer the last two questions ("In what context are you using data.table?" and "What are you currently using data.table for?") if my answer to "How often do you use data.table these days?" was "I don't use data.table anymore"?

Similar consideration applies to later questions, where an earlier question precludes any answer to a later question.


On 2)

  • Clarity/wording under "In which areas are you interested in contributing to data.table?". E.g. "Reviewing Pull Requests involving R code" is pretty verbose, and also assumes (probably fairly) the reader knows what a "pull request" is. Maybe "Reviewing R code changes" would be simpler.
  • Other avenues of contribution to consider including (1) adding translations (2) giving talks/tutorials about using/learning data.table (3) financially 🤑 (4) supporting data.table within my organization
  • "What would make it easier or more appealing for you to contribute to the project?" sort-of assumes the respondent hasn't contributed before? Would we benefit from some gradation in the question along the lines of "Have you contributed before? If so, do you plan to continue to do so in the future? If not, how could we facilitate your contributing?"

On 5) I see Jan's point. It does seem a bit out of place, and with so many open-ended questions earlier in the survey, it feels like it's getting long. Maybe best is to remind users in the survey introduction to check out & interact with the issue tracker to express support/feedback for pending bugs/FRs.

@ben519
Copy link

ben519 commented Sep 27, 2023

I know I'm late to the party, but this survey looks annoyingly long. Personally, I would see this and immediately close it.

My two cents: boil it down to two or three questions including an open-ended What are your suggestion(s) for data.table moving forward? <--That's the question I want to answer :)

@sluga
Contributor Author

sluga commented Oct 1, 2023

Thanks everyone, I've begun incorporating your suggestions & will finalize the draft & address your comments sometime in the next few days.

@phisanti

Hi @sluga, I guess the survey format is either finished or about to be finished, so I was wondering how it is going to be distributed to data.table users?

@sluga
Contributor Author

sluga commented Oct 12, 2023

Hi @phisanti,

yes, it's taken me longer than anticipated because of other obligations, but I'll post the link to the final draft today or tomorrow, asking @tdhock for a final check.

To answer your question, I'm hoping to get some help on this front. Including an invitation to take part in the survey on the data.table website and in this repo seems like the obvious move, and I think @jangorecki mentioned the possibility of including an invitation in the start-up message. I'm not much of a social media user, so I'm hoping to get some help there.

Do you have any suggestions?

@tdhock
Member

tdhock commented Oct 12, 2023

I will be traveling to Montevideo to talk about data.table at the LatinR conference on Weds Oct 18, so it would be great if you could create the survey before then, so I could mention the survey during the talk. BTW slides are here if anyone wants to comment / leave constructive criticism: https://docs.google.com/presentation/d/1ypW1LUMmcrUTMF6B9h9s8qbvW5BSbN1IW6CEgqX01Co/edit?usp=sharing

@sluga
Contributor Author

sluga commented Oct 13, 2023

That'd be great @tdhock!

Alright, the draft of the survey is here: https://forms.office.com/e/d7gLkySP3n?origin=lprLink

A couple of notes:

  • Thanks everyone for your suggestions, I've incorporated most of them.
  • @ben519 raised the issue of length and indeed, this isn't a short survey. But it does seem comparable to a few other surveys I've seen in this space (open source projects). More importantly, I've organized it into a couple of sections that aren't particularly long on their own, and respondents are free to skip individual questions or even whole sections. I'm hoping this is a good solution, but I could very well be wrong and we end up with n = 5. :)
  • In any case, as I said I'm giving @tdhock the final say, so if you'd prefer to cut or modify some questions or sections, let me know.
  • When should the survey close? I've set this to December 1st for now.

@markseeto
Contributor

In Q12, should the options say "important/unimportant" instead of "satisfied/dissatisfied"?

@jangorecki
Member

Nice

Q2 is confusing without seeing the remaining questions. Better to put it at the end?

Q7 could have options like data preparation/transformation; not everyone does statistical analysis. Sometimes it is just a piece in a big ETL pipeline.

Good to mention at the start how much time it can take, like 2 minutes.

Good to create a short URL that can be easily distributed on slides or in the pkg startup message.

@sluga
Contributor Author

sluga commented Oct 14, 2023

Thanks @markseeto, @jangorecki. I fixed Q12, moved the email & response-sharing questions to the end, and added data preparation to Q7. Not sure how to estimate time: the survey has 20 questions, including several open-ended ones, but respondents can skip questions/sections.

URL: https://forms.office.com/e/d7gLkySP3n?origin=lprLink
Short URL: https://tinyurl.com/datatable-survey
QR code: [image: dt-survey-2023]

@tdhock
Member

tdhock commented Oct 14, 2023

great, thanks for the QR code and short link, I added them to the slides

@sluga
Contributor Author

sluga commented Oct 15, 2023

If the survey looks OK now, I suggest the following:

  • @tdhock could you add something like the following to the README, perhaps immediately after the first sentence? (Feel free to rephrase.)
---

**NEW:** Take part in the [data.table 2023 community survey](https://forms.office.com/e/d7gLkySP3n?origin=lprLink) and help shape the future of the project! The survey closes on **December 1st**.

---
  • I'll close this issue tomorrow & open a new one, inviting everyone to participate & share the survey. @tdhock perhaps you could pin the new issue?

@phisanti

@sluga I would also open a new issue and pin it.

@tdhock
Member

tdhock commented Oct 17, 2023

I definitely can't change the README on the master branch (only Matt can do that, you may ask him), but I may be able to pin an issue.

@sluga
Contributor Author

sluga commented Oct 17, 2023

I've opened a new issue (#5704) with the invitation and a PR (#5705) with the README update, hopefully @mattdowle sees it in time.

Labels: None yet
Projects: None yet
Development: No branches or pull requests
8 participants