
Studying game design for increasing code review participation #123

Closed
nilbus opened this issue Mar 9, 2017 · 44 comments

nilbus commented Mar 9, 2017

As @kytrinyx pointed out in #88, Exercism currently struggles to get adequate participation in peer code review. The suggestion to gamify Exercism in various ways (e.g. reputation like StackOverflow) has been raised many times, but @kytrinyx has rightly objected, fearing that review quality will decrease and that users' intrinsic love of discussing code and doing reviews will be damaged, citing Alfie Kohn’s book Punished by Rewards (1999). More recent research by Deterding (2015) and others has suggested, however, that gamification done right can actually amplify intrinsic motivation.

This issue outlines an exploratory study I'll be performing with a fork of Exercism and a smaller group, with the aim of gathering evidence on whether gamification can be used to motivate code reviews in Exercism without harming intrinsic motivation. Although the limitations of this study will make it hard to extrapolate, it should give us an idea of whether the theory and its application are on the right track.

During this project, I will submit my proposed changes as pull requests individually for discussion, including the experiment itself. @kytrinyx I would be interested to know if the Exercism maintainers would be potentially interested in running this same experiment for a random subset of its users. Doing so would greatly increase the validity of the results, because of its larger number of participants, large body of submitted work, and users who are inherently intrinsically motivated to use Exercism. This can be done on your own timetable, independent of my graduate work, for the benefit of your own understanding. These results can help inform the next generation of Exercism's design, described in #113. Would you be open to experimenting on public exercism.io?

Outline

Background Research

Motivating Effective Learning Practices

As teachers, managers, and parents, we often want to help motivate learning practices that we believe to be effective. For instance, as a software engineer, I know from experience and from research that practice and peer review (Wang et al., 2012) are two practices that have substantial positive impact on learning software development. Despite that, learners often cannot see past these practices' challenges (e.g. putting your work out in the open for judgement) and miss the real value that outweighs them. The Exercism coding challenge platform struggles with this very motivation problem in its code review system. Gamification promises to provide additional motivation that can help learners overcome hesitation and discover the value of these activities (Muntean, 2011).

Gamification Can Be Quite Effective

A growing number of empirical studies are demonstrating that gamification can be effective in shaping behavior. Seaborn & Fels (2015) performed a review of 37 empirical studies of gamification. Of these studies, 63% reported that gamification positively influenced learning, enjoyability, participation, engagement, etc. Hamari et al. (2014), authors of another literature review of empirical studies on gamification, suggest that given the right context, implementation, and users, gamification provides the positive effects that gamified system designers hope for. People see these benefits and have suggested many times that Exercism implement points, leaderboards, badges, upvotes, etc.

Extrinsic Rewards Permanently Diminish Intrinsic Motivation

Earlier research on motivational psychology by Deci et al. (1999) found that “extrinsic rewards… significantly undermined free-choice intrinsic motivation.” Alfie Kohn (1999) in his book Punished by Rewards expands on Deci's and others' work to suggest that points, grades, rankings, and all other extrinsic motivators commonly used in education replace children's innate desire to learn with a focus on rewards such as grades, and that this negative effect tends to endure. For example, children who were rewarded for playing certain math games avoided those same games when the rewards stopped coming, while children who had never been rewarded continued to play with and enjoy them.

Avoiding Harming Intrinsic Motivation

Many researchers do recognize the importance of intrinsic motivation and suggest that gamification can be used correctly to enhance rather than reduce intrinsic motivation. According to Deterding et al. (2012), in gamifying a system, we must design to “amplify the intrinsic motivations of their employees, fans, and customers” (p. 17). “The pleasures of games arise not from such system feedback [such as points, badges, and leaderboards], but from ‘meaningful choices’ in the pursuance of ‘interestingly hard goals’” (p. 14).

Deci & Ryan (1980), who described how extrinsic motivators harm motivation, also put forth self-determination theory, which describes how to bolster internal motivation. Self-determination theory has become “arguably the empirically most well-researched psychological theory of intrinsic motivation” (Deterding, 2011) and has served as a basis for several gamification frameworks and studies (Seaborn, 2015). This theory posits that in order to facilitate intrinsic motivation, people must satisfy their needs for competence, relatedness, and autonomy.

Autonomy is the need most commonly violated in gamification implementations. “Most deployments of gamification represent ‘exploitationware,’ in that they extract real value from users and employees in return for mere virtual tokens” (Deterding et al., 2011). Instead, a gamified system should be designed to help the user satisfy her intrinsic goals. When it does not, users' sense of autonomy diminishes, replaced by a feeling of being coerced or controlled.

Mekler et al. (2013) suggested that when a gamified system gives feedback, the user can perceive the feedback as either informational or controlling. When perceived as informational, feedback supports the user's competence need by helping communicate the progress made toward their learning goals. When perceived as controlling for the benefit of the system creator, the system thwarts the user's feeling of autonomy and their intrinsic motivation to use the system. Furthermore, they suggested that studies finding gamification harmful may have reflected pressure to engage in social networking in the service of an employer, a form of control. The empirical study they performed suggested that user-centered gamification did not harm intrinsic motivation, at least in the short term. Their findings were backed by psychology research by Kruglanski, suggesting that “rewards may either undermine or enhance intrinsic motivation depending on whether they are endogenous or exogenous to a given task” (as cited in Mekler et al., 2013). This research suggests that a system can be gamified without harming intrinsic motivation if goals align and feedback is more informational than controlling.

Research by Deterding (2011) also supported these ideas and added two additional factors important to a sense of autonomy: voluntariness and lack of consequence. Citing research by Caillois and Huizinga, Deterding states, “The overwhelming majority of theoretical discussions enlist voluntary engagement and lack of serious consequence as attributes defining play”. Being voluntary and lacking consequence is a significant source of the autonomy that is such an important contributor to intrinsic motivation. As an example, leaderboards can be intrinsically motivating in voluntary games without real consequence, because they are primarily informational, showing a person where they stand. On the other hand, when leaderboards are used in a business sales context to promote competition, participation is neither voluntary nor free of consequence, being tied to cash incentives. The leaderboard in this context is a controlling, extrinsic motivator (Deterding, 2011).

In summary, a gamified system can preserve and enhance intrinsic motivation by aligning goals, and designing for autonomy through informational messaging, voluntary participation, and avoiding serious consequence. This approach has been demonstrated to work well in case studies such as Deterding's (2015), and seems applicable in the context of Exercism.

Amplifying the Joy of Code Review

Deterding (2015) put forth a method of gameful design, a.k.a. gamification, that aims to amplify intrinsic value. As part of this method, he applies the concept of skill atoms, which are feedback loops organized around a challenge or skill that consists of smaller recurring components that make it game-like. Let's take a look at how the skill atom components apply to the Exercism platform as it exists now, both for completing exercises and doing code review.

Exercise Completion
  • Goals: Complete coding exercises
  • Actions: Run the tests, submit the code
  • Objects: Written code, provided test suite, development environment
  • Rules: All tests must pass to complete an exercise; you must complete an exercise before viewing others' solutions
  • Feedback: Individual automated tests pass, error, or fail; qualitative peer feedback
  • Challenge: Determine how to complete each exercise; write well enough to impress peer reviewers; implement suggestions effectively
  • Motivation: Learn the language being studied; improve code writing skill (satisfy competence need)

For exercise completion, every component necessary for a gameful skill atom is present. Notably, the learners' motivations are (assuming knowledge of the benefits of practice) obviously in line with the system's goals, the challenges are sufficiently difficult, and informational feedback on one's progress is present. Participation is voluntary, and there are no serious consequences. The needs for autonomy, competence, and relatedness are largely satisfied. Therefore, we can expect this activity to be intrinsically motivating. Based on the quantity of exercise participation compared with peer review participation, it seems that we do indeed observe this to be true.

Code Review
  • Goals: After completing an exercise, a call to action suggests, “see related solutions and get involved here: [url]”
  • Actions: Read others' code; leave comments
  • Objects: Others' code submissions
  • Rules: None
  • Feedback: Comment replies, if any
  • Challenge: Write effective feedback; communicate ideas; overcome fear of judgement
  • Motivation: Expand grasp of the language; improve code reading and critiquing skill; improve ability to communicate ideas (satisfy competence need). Contribute knowledge to the community, and help others learn (satisfy relatedness need).

The code review system, viewed through a skill atom lens, is poorly designed for amplifying intrinsic value. The goals that the system communicates are weakly linked with learners' most likely motivations. Therefore, there is little motivation to overcome the challenges involved with providing good feedback and asking good questions. There are no expressed rules or guidelines to help focus reviews. Review feedback appears to be uncommon, and there are no suggestions for how to gain the most benefit from the code review process. There are plenty of intrinsic benefits in code review that could be amplified by Exercism, but these are not communicated effectively such that users will learn about or be reminded of them.

References

Deci, E. L., & Ryan, R. M. (1980). Self-determination theory: When mind mediates behavior. The Journal of Mind and Behavior, 33-43.

Deci, E. L., Koestner, R., & Ryan, R. M. (1999). A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125(6), 627-668.

Deterding, S., Sicart, M., Nacke, L., O'Hara, K., & Dixon, D. (2011, May). Gamification: Using game-design elements in non-gaming contexts. In CHI'11 Extended Abstracts on Human Factors in Computing Systems (pp. 2425-2428). ACM.

Deterding, S. (2011). Situated motivational affordances of game elements: A conceptual model. Gamification: Using Game Design Elements in Non-Gaming Contexts. In A Workshop at CHI.

Deterding, S., Antin, J., Lawley, E., & Paharia, R. (2012). Gamification: Designing for Motivation. Interactions, 19(4), 14-17.

Deterding, S. (2015). The lens of intrinsic skill atoms: A method for gameful design. Human–Computer Interaction, 30(3-4), 294-335.

Hamari, J., Koivisto, J., & Sarsa, H. (2014, January). Does Gamification Work? — A Literature Review of Empirical Studies on Gamification. In System Sciences (HICSS), 2014 47th Hawaii International Conference on (pp. 3025-3034). IEEE.

Kohn, A. (1999). Punished by rewards: The trouble with gold stars, incentive plans, A's, praise, and other bribes. Houghton Mifflin Harcourt.

Mekler, E. D., Brühlmann, F., Opwis, K., & Tuch, A. N. (2013, October). Do points, levels and leaderboards harm intrinsic motivation?: an empirical analysis of common gamification elements. In Proceedings of the First International Conference on gameful design, research, and applications (pp. 66-73). ACM.

Muntean, C. I. (2011, October). Raising Engagement in E-learning Through Gamification. In Proc. 6th International Conference on Virtual Learning ICVL (pp. 323-329).

Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American psychologist, 55(1), 68.

Seaborn, K., & Fels, D. I. (2015). Gamification in Theory and Action: A survey. International Journal of Human-Computer Studies, 74, 14-31.

Wang, Y., Li, H., Feng, Y., Jiang, Y., & Liu, Y. (2012). Assessment of Programming Language Learning Based on Peer Code Review Model: Implementation and Experience Report. Computers & Education, 59(2), 412-422.

Zichermann, G., & Cunningham, C. (2011). Gamification by Design: Implementing Game Mechanics in Web and Mobile Apps. O'Reilly Media.

Study Overview

The aim of this project will be to use gamification techniques to increase participation in code review on Exercism, and to measure how intrinsic motivation, measured by participation, is affected after removing gamification. “Gameful design should focus on challenges inherent in the user’s goal pursuit” (Deterding, 2015). With this in mind, I will design a skill atom around code review on Exercism that motivates participation in a way intrinsic to the act of code review itself, so that the motivation continues beyond the scope of Exercism.

Problem

Can gamification enhance participation in a voluntary activity without harming the intrinsic motivation for that activity?

Hypothesis

Removing game elements and mechanics that encourage peer review on Exercism, after they have been present, will not decrease participation below the levels observed before they were introduced.

Control

Because participation may change over time due to extraneous variables, such as naturally growing or waning interest, a randomly selected 50% of participants will form a control group used to measure participation trends that are extraneous to the experiment. This control group will be presented with a version of Exercism that contains no gamification modifications.
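The 50/50 split above could be implemented many ways; here is a minimal sketch, assuming assignment by hashing the user id (the salt, the constant name, and `experiment_group` are all hypothetical, not part of the Exercism codebase). Hashing keeps each user's group stable across sessions, unlike sampling randomly on every request.

```ruby
require 'digest'

# Hypothetical salt; changing it reshuffles all assignments.
EXPERIMENT_SALT = 'code-review-gamification'

# Deterministically assign a user to :control or :gamified based on
# the parity of a hash of their id.
def experiment_group(user_id)
  digest = Digest::SHA256.hexdigest("#{EXPERIMENT_SALT}:#{user_id}")
  digest.to_i(16).even? ? :control : :gamified
end
```

A user's group never changes mid-experiment, and the assignment can be recomputed later during analysis without storing it.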

Method

The duration of the study will be split into three one-week periods: baseline, gamification, and withdrawal. The gamification period will introduce game elements and mechanics, and the withdrawal period will restore things to how they were at the baseline.

Edit: See comment: use existing data as the baseline, try gamification for 2 weeks, and withdraw for 1 week before analysis.

Measurement

Only people who participate during both the baseline and gamification periods can qualify as participants in the study. Decreased code review participation is expected during the withdrawal period, but to confirm the hypothesis, participation levels should not fall below the control group's participation levels.

Participation will be measured by the quantity and size of participants' comments. Size serves as a rough, quantifiable proxy for quality.
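As a sketch of that measurement, assuming comments are available as simple records (the `Comment` struct and field names here are illustrative, not the real schema), per-user participation per period might be summarized as:

```ruby
# Illustrative record: a comment by a user during a study period.
Comment = Struct.new(:user_id, :body, :period)

# Summarize participation for one period as, per user,
# the number of comments and total characters written.
def participation(comments, period)
  comments.select { |c| c.period == period }
          .group_by(&:user_id)
          .transform_values do |cs|
            { count: cs.size, chars: cs.sum { |c| c.body.length } }
          end
end
```

Comparing these per-user summaries between the baseline and withdrawal periods, against the control group's trend, is what the hypothesis test would rest on.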

Following the experiment, a survey will be distributed to participants to gain further understanding of participants' self-reported motivations, goals, and experience.

Internal Validity

There are a few factors that will limit internal validity, or the degree to which the results are attributable to gamification and not some other rival explanation.

  • Participant count: the relatively small probable size of the participant pool will make it difficult to demonstrate statistical significance
  • Participants knowing that this project has a short-term life or is a toy project may decrease their motivation to participate in peer review. I will attempt to overcome this shortcoming by advertising that solutions and review comments may be transferable to the official exercism.io site after the project, but this issue is nonetheless present.
  • The participants in this study are not of the same demographic that typically uses Exercism and may have different motivations, as described in the following section.

Participant Motivation

To test my hypothesis, I will need participants to use my modified Exercism platform. The best source of participants in this study would be the existing users of Exercism—the people who are already intrinsically motivated to use the system, as it’s entirely voluntary. These users generally have no extrinsic reward motivating them to use or continue using it.

Problematically, with a project term of only 9 weeks, the inevitable delays of discussing with a large group how to approach gamification, plus peer review, acceptance, release, distribution, and user updates, would make the project infeasible. To avoid these delays, I plan to distribute my own copy of Exercism, and solicit volunteers to use it from various sources.

Our class has been given “participation tokens”, which translate to points in our grade, to help encourage classmates to participate in each other’s projects. That's great, except research suggests these participation rewards will decrease intrinsic motivation, which I’m trying to measure. Would this undermine my entire experiment? Potentially not. Although providing participation tokens is indeed extrinsic motivation, if what I reward people for doing (e.g. participating on Exercism, which usually takes the form of completing exercises) isn’t the same as what I measure (i.e. comments or code reviews), then I should still be able to measure changes in intrinsic motivation toward doing code review.

External Validity

If participation stays at least as high, we can deduce that intrinsic motivation to perform code review was not harmed. Can this be generalized? This study is largely a test of Deterding’s (2015) method of game design that is proposed to enhance intrinsic motivation. The results of this study cannot be generalized to gamification that does not follow this approach. Furthermore, this study looks at a voluntary activity and cannot be generalized to compulsory activities such as school or work performance, unless the activity is strictly voluntary within that context.

Schedule

  • Week 1 (ending Mar 12): “Plan”

    • Design an experiment that will explore whether or not engagement is damaged by my gamification implementation
    • Post ideas for feedback on the Exercism discussion board
  • Weeks 2–3 (ending Mar 26): “Modify”

    • Implement the proposed designs
    • Deploy: Host the website and a build of the command-line tool
    • Craft a pitch for requesting participation
    • Create installation and uninstallation instructions for the forked projects
  • Weeks 4–6 (ending Apr 16): “Experiment”

    • Collect experiment data
    • Create a survey and send to participants
  • Week 7 (ending Apr 23): “Analyze”

    • Analyze and report on data findings
    • Report findings on the Exercism discussion board

Proposed Improvements

These are ordered by my original estimate of usefulness and ease of implementation. Even if these change, numbering will stay consistent.

1. Prompts ✔️

Implemented in exercism/exercism#3427.

Usefulness: High | Complexity: Low

Summary: Provide varied prompts to elicit thought about some aspect of the submission being read.

Goal: Turn more of the same into varying challenge. Varying challenges give the user experiences of mastery and help the task not become boring. This increases users' sense of competence and builds intrinsic motivation.

Examples:

  • Is this easy for you to read and understand? What might make it easier?
  • Is there anything in this submission that you don't understand or have questions about?
  • Do you understand why the author made the design choices they made? In what situations might these decisions be good?
  • What trade-offs can you identify being made in this submission? When would this be a good choice, and when would you want to try something different?
  • Will this solution be performant at scale? Is efficient performance likely to matter for this problem?
  • Where does this solution's formatting stray from community style guides? If any, do the variances matter?
  • Would you want to work in a codebase composed of code similar to this? What do you like and dislike about it?
  • What principles could the author learn and apply that would improve this solution?
  • Did this solution use an appropriate amount of abstraction? What changes might have made the code easier to understand?
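A minimal sketch of how prompt rotation could work, assuming the prompts live in an in-memory list (the `PROMPTS` constant and `prompt_for` helper are hypothetical; the merged implementation may differ). Seeding the generator on the submission id keeps the prompt stable across page reloads while still varying between submissions:

```ruby
# Hypothetical subset of the prompt list above.
PROMPTS = [
  "Is this easy for you to read and understand? What might make it easier?",
  "What trade-offs can you identify being made in this submission?",
  "What principles could the author learn that would improve this solution?"
].freeze

# Pick a prompt deterministically per submission, so a reviewer sees
# the same prompt on reload but different prompts across submissions.
def prompt_for(submission_id)
  PROMPTS[Random.new(submission_id).rand(PROMPTS.size)]
end
```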

2. Email notifications & digests

This is probably not suitable for this experiment. Although I think restoring informational email notifications would be very helpful for improving participation, on second thought, I am wary of using this as a measurement of change in intrinsic motivation. After beginning to receive emails, people will certainly expect that notifications will continue in the future, and assume that no email equates to no activity. Discussion #91 has some good ideas for the future.

Summary: Summary emails of others' comments on your submissions, and when there are reviews needed for stages you've completed.

Goal: Good games use A's action to call B to action, and vice versa. These interactions build users' sense of relatedness with others and therefore their intrinsic motivation.

3. Have another!

Usefulness: High | Complexity: High

Summary: Reveal the next entry after leaving a review/comment.

Goal: Provide the next best action to maintain flow.

I envision that after clicking the submit button, the next entry is revealed above the footer. This is similar in concept to infinite scrolling, except the trigger is commenting rather than scrolling to the bottom. Below the comment button, a message appears, worded something like:

username could use a review of this recent revision. Care to look before moving to the next exercise?

4. Onboarding

Usefulness: Medium | Complexity: Low

Summary: Once per user, introduce people to the benefits of reading code, asking questions, and offering feedback.

Goal: Share a clear, common goal connected with the user's motivation. Help make this connection for the user, which should help draw out intrinsic motivation.

Anyone want to help write this copy?

5. Sentence starters ✔️

Implemented in exercism/exercism#3435

Usefulness: Medium | Complexity: Low

Summary: Fill the comment box with a random example lead-off sentence.

Goal: Templates help overcome the barrier of starting from a blank page, making the path to success easier. Helping users find success builds their sense of competence and therefore their intrinsic motivation.

6. Reuse comments ✔️

Implemented in exercism/exercism#3437

Usefulness: Medium | Complexity: Medium

Summary: Allow copying prior comments from history, so providing common feedback is less tedious.

Goal: Automate away what has already been mastered, so the user can focus on the real challenges. Focusing on the real challenges and not annoyances builds their sense of competence and therefore their intrinsic motivation.

7. Visualize impact of comments

Usefulness: Medium | Complexity: High

Summary: Graph over time (or otherwise display) the percentage of submissions that were revised as a new iteration after you commented.

Goal: Provide informational feedback to build the contributor's sense of confidence.

The causal connection between a comment and the presence of a new iteration is weak, but there can often be a correlation.

This could be gamed by leaving a short comment on everything, or on entries from users who already clearly do several iterations. But if this graph were just for yourself to see, would people really be motivated to do that?

8. Call to Action ✔️

Implemented in:

Usefulness: High | Complexity: Medium

Summary: After submitting an exercise in the CLI, give a call to action to review other submissions. Talk briefly about the benefits, and suggest a goal.

Goal: Provide the next best action to maintain flow. Help make the connection between code review and the user's own motivation for using exercism. Set a goal that will help accomplish this purpose.


kytrinyx commented Mar 9, 2017

@nilbus this is phenomenal. I can't wait to dig into the details!

@exercism/reboot Please take some time to read through this—it's directly related to the questions we're digging into this week.


kytrinyx commented Mar 9, 2017

I will go through this in depth tomorrow, but I wanted to answer the following question:

Would you be open to experimenting on public exercism.io?

Yes, absolutely. Everything on Exercism is an experiment, and if you would be willing to help decide the parameters for how to experiment on the live site to get the best data, I would be delighted to help make it so.


tleen commented Mar 9, 2017

Participation is voluntary, and there are no serious consequences.

I would suggest that for some users there are consequences that are perhaps not serious, but do mean something:

  • Reputation: poorly reviewing some work could affect reputation. Although Exercism is a kind and welcoming environment compared to other code review venues, there may still be some sense of ego. Missing something obvious in a review, or just not knowing what to review (users on tracks may be brand new to the language), could be an ego hit. Ego has always been tied up with programmers and programming; it is unavoidable, and reviews tie directly into it. Then you think about adding your own reviews, and that becomes another place where your perceptions can be judged. Since they are optional, it's an easy skip if you are nervous about putting yourself out there.

  • Codeschools/Bootcamps: I have been under the impression since first using Exercism that the bootcamp/codeschool movement does use the system as a way to give their students practice. In that sense the work may not be completely voluntary and code judgements may have more of an effect? I don't know if this is something to consider.

Prompts are an idea that has been batted around, and I think they will at least handle some of the initial uncertainty of a reviewer new to a language wondering what to say, or whether to say anything at all! Prompts to start a review, and prompts in the review process.

@nilbus great work on this!


nilbus commented Mar 9, 2017

👍 I agree that reviews should remain voluntary, at least as far as what the system enforces (we have no control over third-party programs). They should be encouraged explicitly for the learning benefit, and not mandatory.


kytrinyx commented Mar 9, 2017

In summary, a gamified system can preserve and enhance intrinsic motivation by aligning goals, and designing for autonomy through informational messaging, voluntary participation, and avoiding serious consequence. This approach has been demonstrated to work well in case studies such as Deterding's (2015), and seems applicable in the context of Exercism.

Thank you so much for going into such detail on the actual research into intrinsic/extrinsic motivation and rewards. I've long suspected that gamification could work, I'm just so scared of messing with something that I don't understand, and which clearly can go badly wrong (destroy intrinsic motivation).

This reassures me that the principles are known, and the levers can be used effectively.

For exercise completion, every component necessary for a gameful skill atom is present.
[...]
The code review system, viewed through a skill atom lens, is poorly designed for amplifying intrinsic value.

This is a very useful analysis, and explains very coherently the discrepancies we're seeing in the participation between exercise completion and code review.

Problematically, with a project term of only 9 weeks, the project would be infeasible with the inevitable delays associated with discussing with a large group how to approach gamification, peer review, acceptance, release, distribution, and user updates. To avoid these delays, I plan to distribute my own copy of Exercism, and solicit volunteers to use it from various sources.

I would be perfectly happy to run your experiment as described, without the large-group discussion, with a 50/50 division into experimental and control groups.

I don't know whether it would be useful to use the entire population, including those who already participate in conversations, or just users who are new during the baseline week, or something in between.

Following the experiment, a survey will be distributed to participants to gain further understanding of participants' self-reported motivations, goals, and experience.

This is the only piece that I'm unsure of if we use the full population: we don't have email addresses for everyone on the platform, only for those who have a public email address on their GitHub profile.


nilbus commented Mar 9, 2017

Following the experiment, a survey will be distributed to participants to gain further understanding of participants' self-reported motivations, goals, and experience.

I wouldn't expect to do a survey with exercism.io users; only participants who knew they were participating in a study, like my classmates and coworkers that I might recruit from.

With your cooperation, I think performing the experiment with exercism.io users would be far more effective, and I could spend more time implementing actual features, and less on fork distribution. I'm really happy to hear that you're interested in this.

I don't know whether it would be useful to use the entire population, including those who already participate in conversations, or just users who are new during the baseline week, or something in between.

I am actually more interested in existing users than new users. Remember, we're trying to answer the question of whether or not taking away gamification will cause existing users to participate less.


nilbus commented Mar 9, 2017

Next steps:

  • I'll start work tonight on describing my proposed changes in more detail, so people can offer feedback, and we can agree on priority order.
  • I'll open pull requests to both exercism/exercism.io and exercism/cli to introduce feature-flag-like code to turn on and off features at set times and manually for testing or rollback.
  • I'll implement the experimental improvements and deploy them one at a time for testing.
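To make the feature-flag idea concrete, here is a minimal sketch of a time-windowed flag that can be forced on or off for testing or rollback. The class name, constructor, and schedule format are all hypothetical; the real implementation in exercism/exercism.io may look quite different.

```ruby
# Minimal sketch of a time-windowed feature flag (hypothetical API).
# A flag is active when manually forced on, or when the current time
# falls inside one of its scheduled activation windows.
class FeatureFlag
  def initialize(name, windows: [], forced: nil)
    @name = name
    @windows = windows # array of [start_time, end_time] pairs
    @forced = forced   # true/false overrides the schedule (testing/rollback)
  end

  def active?(now = Time.now)
    return @forced unless @forced.nil?
    @windows.any? { |start_t, end_t| now >= start_t && now < end_t }
  end
end

flag = FeatureFlag.new(
  :cli_call_to_action,
  windows: [[Time.utc(2017, 3, 27), Time.utc(2017, 4, 10)]]
)
flag.active?(Time.utc(2017, 4, 1))  # => true  (inside the experiment window)
flag.active?(Time.utc(2017, 4, 20)) # => false (after withdrawal begins)
```

Pre-computing the windows means the activate/deactivate/reactivate schedule can be released once and left to run, which matters for clients like the CLI that users rarely upgrade.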

@nilbus
Copy link
Author

nilbus commented Mar 10, 2017

@exercism/reboot For generating ideas that may work better in the context of the reboot, consider also reviewing Deterding's framework that I used as a guideline when generating my own ideas for exercism: The Lens of Intrinsic Skill Atoms: A Method for Gameful Design [pdf], especially but not limited to section 4. I found his method useful, and the case studies illustrate how it has been used on some real projects.

@kytrinyx
Copy link
Member

I wouldn't expect to do a survey with exercism.io users; only participants who knew they were participating in a study, like my classmates and coworkers that I might recruit from.

Oh, excellent. In that case I see no problem with this.

I am actually more interested in existing users than new users. Remember, we're trying to answer the question of whether or not taking away gamification will cause existing users to participate less.

Excellent—that actually simplifies things. We can run it against a 50/50 split of everyone, and then the analysis can include whoever you want, as long as we capture the data you need to do the segmentation.
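A 50/50 split like this can be made deterministic by hashing a stable user attribute, so every user always lands in the same group. The sketch below is only an illustration of the idea (the commit messages later in this thread mention a crc32-based approach on the Postgres side; the function name here is made up):

```ruby
require 'zlib'

# Sketch: deterministic experiment-group assignment via CRC32.
# Hashing a stable key (the user id) means a user gets the same
# answer on every request, with roughly half of users in each bucket.
def experiment_group?(user_id)
  Zlib.crc32(user_id.to_s).even?
end

experiment_group?(42) # same answer every call for the same id
```

Because assignment is a pure function of the id, no per-user state needs to be stored, and the same bucketing can be reproduced later during analysis to segment whichever subset of users turns out to be interesting.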

I'll open pull requests to [...] exercism/cli to introduce feature-flag-like code

The CLI might actually pose a problem, since most people don't upgrade regularly. If we can do it by putting feature flags in the API and the site only, that would be much easier.

consider also reviewing Deterding's framework

That is great, thank you! I'm adding it to our reboot resources.

@nilbus
Copy link
Author

nilbus commented Mar 10, 2017

The CLI might actually pose a problem, since most people don't upgrade regularly.

This is true. The only change I'd like to make in the CLI is to the call-to-action wording. Even if most users don't get that change, some may, and it will still contribute. The change can be released once and automated to activate/deactivate/reactivate on pre-determined dates.

@nilbus
Copy link
Author

nilbus commented Mar 10, 2017

Proposed improvements updated above.

@kytrinyx
Copy link
Member

kytrinyx commented Mar 11, 2017

The suggested improvements all look excellent.

Anyone want to help write this copy?

I will send out a Behind the Scenes newsletter today or tomorrow; I'll talk about this experiment and ask for people's input on this.

It would be helpful to have a specific issue open just for the "write copy" (perhaps with a short intro about the context of the problem, and then linking to this for more depth). That will let us link directly to the task at hand for people who don't necessarily want to dig into all the details of everything else.

@nilbus Would you mind opening that issue here in this repository?

@kytrinyx
Copy link
Member

kytrinyx commented Mar 11, 2017

[Email notifications & digests] is probably not suitable for this experiment. Although I think restoring informational email notifications would be very helpful for improving participation, on second thought, I am wary of using this as a measurement of change in intrinsic motivation. After beginning to receive emails, people will certainly expect that notifications will continue in the future, and assume that no email equates to no activity.

This is a really good point. Emails are going to be an important feature, and the experiments we might want to do with them are probably more about which emails, or multivariate tests on the content.

@iHiD
Copy link
Member

iHiD commented Mar 11, 2017

I would rather we don't send emails at this stage. Emails have diminishing value (i.e. each successive email from a site gets a lower open and response rate), and I'd rather save this for a big push when we've finished the reboot.

@kytrinyx
Copy link
Member

@iHiD Yeah, sorry I was unclear. My comment was more to the effect that if we do experiments (when we do the reboot) those are the types of experiments we might end up doing.

@iHiD
Copy link
Member

iHiD commented Mar 11, 2017

@kytrinyx Ah. That makes total sense. Cool :)

@nilbus
Copy link
Author

nilbus commented Mar 11, 2017

Would you mind opening that issue here in this repository?

@kytrinyx Will do, by end of day today.

@nilbus
Copy link
Author

nilbus commented Mar 13, 2017

The experiment will do a one-week baseline test against the existing feature set, then test a set of experimental improvements for one week, and then finally we will remove the experimental features, which should provide a dataset that can speak to whether or not removing the gameful elements has a negative effect on people's intrinsic motivation to provide feedback.

I'm realizing now that 1 week of baseline measurements would have been necessary if I were starting from a fresh group, but on the live site we already have years of baseline that can be measured after the fact using data already in our database. Let's instead aim for 2 weeks of experiment and 1 week of withdrawal, after which I'll start analyzing the effect.

@StudentOfJS
Copy link

How about code review challenges as part of each track? This would be very appropriate for gaining real-world skills, as developers are always faced with managing code bases that aren't their own. If you're rolling with a gamification upgrade (which could be really cool, btw :-) ), then maybe you could offer bonus scores for code reviews. Or maybe add a real-time social chat platform and the option of requesting help when you are stuck, giving helpers extra credit / points?

@nilbus
Copy link
Author

nilbus commented Mar 14, 2017

@StudentOfJS It's common to think about gamification and assume points, but I'm trying to avoid any extrinsic rewards that would detract from the intrinsic reward of the code review itself. See the section Extrinsic Rewards Permanently Diminish Intrinsic Motivation under Background Research. The good news is that there is enough intrinsic reward in the activity itself: improving your own learning and helping people are rewards in their own right, and they apply outside Exercism too.

That aside, challenges that build real-world skills are exactly what we're looking for. Some of the suggested improvements are built with challenge in mind, but these are just a few ideas I came up with. What other challenges might we introduce that relate to the real-world code review skills? If they can provide useful feedback on user progress, even better.

@kytrinyx
Copy link
Member

Let's instead aim for 2 weeks of experiment, and 1 week of withdrawal, after which I'll start analyzing the effect.

Yeah, that sounds good!

@nilbus
Copy link
Author

nilbus commented Mar 18, 2017

If anyone is interested in helping implement any of the proposed improvements, the foundation is in place with the feature flag code now in master. We have just over 1 week before the experiment begins (Mon, Mar 27), and I'm confident that I won't be able to implement all these ideas in that period of time. The more we put in place, the stronger the results of the study are likely to be.

If you would like to participate in building these or have ideas of your own, please reply here, so we can coordinate.

@IBwWG
Copy link

IBwWG commented Mar 23, 2017

I'm new to exercism but would love to help test either way. I also wanted to share a link for those interested, where Alfie Kohn addresses gamification a bit more directly: https://youtu.be/p_98XcxJqkw?t=16m

I'm quite interested in the results of your experiment. I'm glad you're doing a survey, as I think that's the only place any truth will be found about intrinsic motivation. Participation itself to me says nothing about it, but only about whether gamification increases participation. (Of course it does.)

But this brings up an interesting point from that video: Kohn suggests that if the goal of the system is something other than helping the individual learn, then we'll have quite a hard time making the system not reflect that. This is where I feel a bit conflicted about what I'm reading above. You've clearly put a lot of thought and research into the topic, but the stated goal is still to increase participation in code review.

This brings me to the proposed improvements (ignoring 2, since it's off the table at the moment). Maybe something is lacking in my definition of gamification, but to me, items 1-6 do not seem like gamification at all (but do seem like fantastic ideas!) and item 7 does seem like gamification (and seems a bit of a red flag after reading or listening to Kohn). Item 7 being the only item to mention the possibility of being gamed seems to fit this.

Is anyone else seeing this distinction? If so, maybe it would be worth separating groups a bit further into a control group, a group with items 1-6, a group with item 7, and perhaps a group with all of them. If not, could someone help me understand how items 1-6 are actually gamification?

@kytrinyx
Copy link
Member

I'm glad you're doing a survey, as I think that's the only place any truth will be found about intrinsic motivation.

All of the research that I've read about the attenuation of intrinsic motivation bases its results on numbers, not on survey responses. The observed behavior is that where rewards reduce intrinsic motivation, if you take the rewards away, people stop performing the target action, even if they performed that action willingly before any rewards were introduced.

I think the survey will be interesting, but the numbers would presumably speak to this result.

the stated goal is still to increase participation in code review.

In my experience, people learn more from reading and reviewing other people's code than they do from doing the exercises themselves. However, people don't know that until they've tried, and I'm interested in seeing what happens with this.

The experimental changes will be in effect for two weeks. I'm fine with that even if we find that the changes do affect intrinsic motivation.

to me, items 1-6 do not seem like gamification at all

From my reading of all of this (@nilbus, please correct me if I'm wrong), the experiment is designed around the concept of skill atoms, which has been shown to increase motivation. It is not necessarily gamification in the most common understanding of the word (points and badges), but does rely on principles of gameful design.

item 7 does seem like gamification (and seems a bit of a red flag after reading or listening to Kohn.)

Item 7 does not violate the principles of autonomy, voluntariness, and lack of consequence, since the behavior of the system does not change based on the graphs. Nobody else sees the graphs, so there is no effect on reputation.

@nilbus
Copy link
Author

nilbus commented Mar 27, 2017

Experimental features are live for 50% of users. If you don't see anything, you're probably in the control group. 😄 If you'd like to opt out of the experiment and see the new stuff, let me know.

@codingthat
Copy link

@nilbus You might as well add me in--given that I'm on this thread (I'm IBwWG, just merged accounts) I'm not really a great data point anyway :)

@nilbus
Copy link
Author

nilbus commented Mar 29, 2017

@codingthat Done!

@codingthat
Copy link

Thanks @nilbus, I like the new way better, for sure!

nilbus added a commit to nilbus/exercism.io that referenced this issue Apr 3, 2017
This replaces the /stats redirect to the first track.

See exercism/discussions#123

Feature flag: participation_stats

This introduces a new plotting library, Plotly.js.
See discussion in exercism#3445.

The migration introduces a crc32 hashing function to Postgres, so it
can determine which users are in the experiment group and which are not.
@nilbus
Copy link
Author

nilbus commented May 3, 2017

The results are in! This exploratory, short-term trial indicates that it may be possible to apply gamification to motivate peer review without harming intrinsic motivation. The increase in participation during the experiment period fell just short of statistical significance (p=0.0757, above the generally accepted threshold of p=0.05), and reverting the modifications showed no significant detriment. Because a short-term trial like this cannot establish a significant effect on its own, a longer study spanning more than two weeks per period would be needed to lend further support to Deterding's ideas on gameful design and intrinsic motivation that were tested here.
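For anyone curious how a p-value like 0.0757 might be computed for this kind of comparison, one plausible shape is a two-proportion z-test on participation rates between the experiment and control groups. This is only a sketch with made-up counts; the actual analysis may well have used a different test and of course different data.

```ruby
# Sketch: two-proportion z-test comparing participation rates between
# two groups. The counts below are illustrative, not the real data.
def two_proportion_p_value(x1, n1, x2, n2)
  p1 = x1.to_f / n1
  p2 = x2.to_f / n2
  p_pool = (x1 + x2).to_f / (n1 + n2)                          # pooled rate
  se = Math.sqrt(p_pool * (1 - p_pool) * (1.0 / n1 + 1.0 / n2)) # standard error
  z = (p1 - p2) / se
  2 * (1 - 0.5 * (1 + Math.erf(z.abs / Math.sqrt(2))))          # two-tailed p
end

two_proportion_p_value(60, 500, 40, 500) # about 0.035 for 12% vs 8%
```

A result like p=0.0757 means the observed difference would arise by chance a bit too often to clear the conventional 0.05 bar, which is why a longer run or a larger sample could tip the result either way.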

PR coming soon that shows the results more clearly than existing stats.

@kytrinyx
Copy link
Member

This has been a fantastic experience, and @nilbus, I suspect that we'll pull you in for some feedback on questions and choices in the prototype.

Thanks so much for taking the time to run this ✨ 💛 ✨

@nilbus
Copy link
Author

nilbus commented Jul 13, 2017

Experiment results are published in a more readable format at http://exercism.io/stats. 🎉
These have been up for a little while, but I hadn't announced here yet.

We've decided to test each feature individually, so we can test for the statistical significance of each one's impact on participation. We'll keep the ones that help and ditch the ones that don't.

@kytrinyx
Copy link
Member

That's exciting, @nilbus, thanks!

@nilbus
Copy link
Author

nilbus commented Jul 15, 2017

@kytrinyx I'll enable the first feature tomorrow morning—the CLI call to action. Stats can be measured after-the-fact, so it's fine that I won't have updated the stats page to show both the new and prior experiment results separately yet. I'll do that next.

@kytrinyx
Copy link
Member

@nilbus that sounds great. I've emailed you separately about some thoughts related to the redesign process.

@nilbus
Copy link
Author

nilbus commented Jul 16, 2017

In light of the imminent launch of the "nextercism" rewrite of the site, @kytrinyx and I decided to focus experimentation on the new site rather than here. As of this morning, we have enabled most of the features implemented as part of this experiment for all users. Feedback welcome!

7 participants