User Survey #4748

Closed
mrocklin opened this issue Apr 28, 2019 · 22 comments

@mrocklin
Member

mrocklin commented Apr 28, 2019

We might consider developing a survey to better understand our users.

Larger questions we should consider

  1. What are larger themes of questions that might actually change our behavior? There are many questions that we might ask in a survey, but focusing on questions that would meaningfully affect the project will probably keep things more concise and make it more likely that users will complete and share the survey.
  2. To what extent do we share the results? There is a transparency/privacy tradeoff here. If we don't publish all data, who gets to see this data?
  3. Technologically, how do we construct and distribute the survey? There are a variety of choices out there that offer some pleasant interactions, but also cost some amount of money.

Goals

Here are some goals that I have for such a survey. I'll also try to include larger goals from comments below (should any arrive).

  • How aggressively can we drop older versions of things, like Python 2, or older versions of NumPy/Pandas?
  • In which domains should we focus to broaden the user base?
  • Can we find advanced users who are willing to share their story more broadly?
  • What other technologies are people using alongside Dask, so that we can ensure that we work well with them? This takes many forms, including cluster resource managers, code editors, other PyData libraries, and so on.
  • Should we focus effort more on documentation, features, performance, ... ?
@mrocklin
Member Author

I've added some high level goals in the original comment

@mrocklin
Member Author

Some output from the 2018 Python user survey. This isn't necessarily a good or bad example, but provides food for thought.

https://www.jetbrains.com/research/python-developers-survey-2018/

@TomAugspurger
Member

IIRC, Bokeh ran a user survey. @bryevdv, are there any write-ups on how that went?

@bryevdv
Contributor

bryevdv commented Apr 29, 2019

No, I never had time to prioritize any writing about it. The results are in the project team drive and a few of us studied them informally.

@mrocklin
Member Author

@bryevdv any wisdom you have from doing this would be welcome (though of course, no pressure).

Also, is the Bokeh user survey still up somewhere? It would be interesting to see what you all chose to ask about.

@bryevdv
Contributor

bryevdv commented Apr 29, 2019

messaging you offline

@mrocklin
Member Author

Questions from the Bokeh user survey (thanks @bryevdv !)

  • How long have you been using Bokeh?
  • What country do you reside in?
  • What platforms do you use Bokeh on?
  • How did you install Bokeh?
  • How do you use Bokeh?
  • In what roles do you use Bokeh?
  • What resources have you used for support in the last six months?
  • Describe major or primary uses you have for Bokeh (please include links to public work if possible)
  • If you can share screen shots of apps or plots you have deployed, please upload them (these may be used publicly in documentation, presentations, or grant applications)
  • Do you use any other visualization tools regularly?
  • What is your number one suggestion for improvement? (Please be detailed and specific)

@mrocklin
Member Author

mrocklin commented May 2, 2019

In looking around at different survey tooling, I slowly converged on https://www.sogosurvey.com/ . I don't have strong confidence in this though.

@mrocklin
Member Author

mrocklin commented May 4, 2019

A strawman

Who are you?

  • Academia, industry, student
  • Domain
  • Optional, include subdomain as short answer like "astronomy" or "quantitative trading"
  • Role
  • Experience with PyData in years

How do you use Dask?

  • Extent (just curious, from time to time, daily, infrastructurally)
  • Collections you use (array, dataframe, delayed, futures, xarray, ml, other)
  • Installation/deployment (pip, conda, git, docker, IT provided)
  • Do you primarily use Dask interactively (like with Jupyter), in batch scripts, or in automated systems?

How could Dask improve?

  • What resources do you use?
    • Tutorial
    • Dask questions on SO
    • Issue tracker
    • Documentation
  • What would help you the most right now?
    • More documentation
    • More examples
    • Bug fixes
    • Performance improvements
    • New features
  • Is Dask stable enough for you? Yes/No

What else do you use?

  • Python 2/3
  • Other libraries (numpy, pandas, sklearn, TF, PyTorch, Spark, Kafka...)
  • Is it difficult for you to use newer versions of NumPy/Pandas/Python?

Scheduling

  • Do you use the dask.distributed scheduler? (yes, no, don't know)

  • Deployed cluster size

    • laptop, heavy workstation, 2-10, 10-100, 100-1000
  • Which kind of cluster architecture do you use the most?

    • HPC (SLURM/PBS/SGE/LSF/...)
    • Yarn (default for on-premises Hadoop/Spark clusters)
    • Kubernetes (on-premises)
    • Kubernetes (Cloud)
  • Do you use the standard dask package to deploy, or do you roll your own?
    If you roll your own, why?

  • Which of the following distributed features are most important for you?

    • Security/Auth
    • Ease of Deployment
    • Managing Deployments
    • ...
    • I'm fine. General documentation, features, performance are more important

Bugs and Features

Help direct our time. Which of these common bugs and features are very
important to you?

Diagnostics, query optimization, task graph overhead, GPUs, IT administration,
TODO

Willing to tell us more? (Optional)

  • Can we follow up with you for more information? Y/N

  • Would you be open to having your work highlighted in a Dask blog or examples page? Y/N

  • Name

  • Institution

  • Contact address

  • Links that you think we would enjoy

  • Can we share these links publicly? Y/N

@mrocklin
Member Author

mrocklin commented Jun 2, 2019

@bryevdv
Contributor

bryevdv commented Jun 2, 2019

I'm guessing "Option 6" is not a real support channel?

@TomAugspurger
Member

TomAugspurger commented Jun 11, 2019

Just went through the survey. A few comments.

  1. Took me 5 minutes 45 seconds, while I was writing comments.
  2. The "role"
  3. It may be worth copying some questions directly from https://www.jetbrains.com/research/python-developers-survey-2018/, so that we can benchmark Dask users vs. the general population of Python devs.
  4. Typo in "What Dask resources...". Github "ssues" should be "issues".
  5. In the "What Dask resources..." question, "Examples" is a bit vague. May want to call out the "dask-examples binder" or `examples.dask.org` by name.
  6. Very user focused. Do we want to hear from non-users, people who have heard of dask, but haven't used it yet? (I don't think this is as valuable)
  7. Can you add "Integration with deep learning frameworks" to the "What common feature requests do you care about most?" grid?

@quasiben
Member

@TomAugspurger , is number 2 done -- did you want to say more about "role" ?

@quasiben
Member

Is the "Python 2 or 3?" question still appropriate if we are already moving forward with #4919?

@TomAugspurger
Member

@TomAugspurger , is number 2 done -- did you want to say more about "role" ?

Sorry. I think I was going to say "should the role question be a select one, rather than select multiple". But I think select multiple is fine.

@mrocklin
Member Author

Is the "Python 2 or 3?" question still appropriate if we are already moving forward with #4919?

I'm still curious. I also think that answering it will make people feel satisfied and generally not add to survey fatigue.

@mrocklin
Member Author

Current status: https://docs.google.com/forms/d/e/1FAIpQLSeG12dOna6dIR85Qza6Cr_CZ9rxqH08pilmeQR7ehaJDu25Fw/viewform

I think I'd like to publish this when we release Dask 2.0 and use that release blogpost to drive traffic to the survey. I think that ideally this happens in a day or two?

@mrocklin
Member Author

I plan to publish this tomorrow if there are no further comments

@mrocklin
Member Author

OK, I've cleared out the current responses. I'll push this out on Twitter soonish.

@mrocklin
Member Author

@jrbourbeau
Member

Thanks @mrocklin

@mrocklin
Member Author

If any Dask owners want access to the data please let me know. I'm going to opt to not share by default due to user e-mails, but I'm happy to share with people if they ask.
