cEP-0020: Newcomer metrics and Gamification system #131

Closed
wants to merge 1 commit into from
355 changes: 355 additions & 0 deletions cEP-0020.md
@@ -0,0 +1,355 @@
# Newcomer metrics and Gamification

| Metadata | |
| -------- | ------------------------------------------------------ |
| cEP | 20 |
| Version | 1.0 |
| Title | Newcomer metrics and Gamification system |
| Authors | Shrikrishna Singh <mailto:krishnasingh.ss30@gmail.com> |
| Status | Proposed |
| Type | Process |

## Abstract

This cEP describes how a gamification system will automate the process by
which a `Newcomer` becomes a `Developer`, as part of this
[GSoC project](https://summerofcode.withgoogle.com/projects/#5892040252981248).

To implement this process, we will incorporate a gamification
system that tracks each newcomer's progress and assigns points to each
activity they perform. We will also have different levels
and give them badges on the basis of points earned and performed activities
Review comment (Member):

> badges on the basis of points earned and performed activities

I'm not sure about this.

badges are about skills, and points are not.

It should at least be

badges on the basis of performed activities and points given for those activities.

Review comment (Member):

Or

badges on the basis of how many times they have performed an activity.

by the newcomers.

So, every newcomer at coala will have to complete the processes in the
gamification system in order to become a developer at coala, that is, to earn
the **coala Developer** badge.

## Background

### Newcomer Experience

**The problems we have:**

- How to attract more newcomers
- How to cope with the increasing number of newcomers
- How to teach and help newcomers learn faster so
  we can keep working on coala at a high pace
- How to improve the newcomer -> developer/maintainer ratio
- How to increase the amount of feedback we get
- How to use feedback to improve coala and its processes

## Why Gamification?

As coala tries to be a welcoming organization for newcomers, it gives them
clear pathways and as much direct assistance as possible. However, it is
well known that newcomers face many barriers while attempting to contribute
to the open source community for the first time. Some barriers they face include
Review comment (Member):

> to the open source community for the first time.

to open source for the first time, and additional barriers while attempting to contribute to projects run by the coala organisation.

The next sentence (Some barriers) is better as a new paragraph, as it moves from broad and vague to specific information.

orientation issues that can discourage newcomers from making
their first contribution. On the other hand, gamification is widely used to
engage and motivate people to accomplish tasks and improve their performance.
Therefore, the goal of this project is to use gamification to orient and
motivate newcomers to overcome onboarding barriers, and to contribute to and
engage with coala and its community.

## Gamification Elements

There will be three game design elements to help newcomers: Points, Levels
and Badges. For each of these elements, we will have a set of rules
describing how the gamified environment operates.

### Points and Rules

Let’s start by defining some rules for earning points. Points are a
simple tool to reward behavior, and they provide an excellent base upon which to
build a gamified design. The first step is to define the activities that we want
newcomers to perform:

- Issues created by a Newcomer
Review comment (Member):

In the paragraph above you talk about rules and points, but don't define any rules, or how points are calculated.

The first real specific information here is ... these activities.

We will be recording how many issues the person creates?
If so, that is the most basic element in the design.

Create a separate section for Activities, put it before all of this points, levels, badges stuff.

- Pull Requests created by a Newcomer
Review comment (Member):

Misusing by making excessive PRs (knowingly or unknowingly)?

Review comment (Member):

Ya we need to track closed-not-merged PRs

Review comment (Member):

" by a Newcomer" - eh ... this cEP is all about newcomers, so why mention it here and in other activities?

non-Newcomer stuff is out of scope for your project.

Review comment (Member):

@prnvdixit @sks444, yes, we need to think about abuse. I've mentioned it frequently, but it isn't dealt with in detail in the cEP yet, probably as the bottom half hasn't been overhauled yet.

The problem is where we record abusive behaviour so that this system can use it. Obtaining a feed of feedback from corobo is a possible approach, but corobo is currently not storing state, and I doubt we would do that.

Meta-review provides a lot of opportunity for storing state, but it doesn't cover everything. It especially does not cover gitter/IM behaviour.
We may want to look at replies to a newcomer's IM messages as a way to create a feedback loop there.

One concern with the schema design below is that the points don't seem to link to specific actions, so it won't be possible to use the data to see which actions counted and which ones didn't. That makes it harder to diagnose why a score isn't as high as it should be, which becomes especially relevant if there are rules aimed at preventing scores for potentially abusive behaviour.

Review comment (Member):

> One concern with the schema design below is that the points don't seem to link to specific actions, so it won't be possible to use the data to see which actions counted and which ones didn't, which makes it harder to diagnose why a score isn't as high as it should be,

This gets me thinking: shouldn't logs be maintained for all increments/decrements of points? If something goes wrong, we could then use them (maybe manually) for deferred or immediate recovery.

- Number of Gitmate errors on the PR
- CI status of the PR
- Number of reviews done by a Newcomer
Review comment (Member):

Spammy reviews: things that count as a review but aren't actually reviews, to be specific. E.g. just adding the review "Check coala.io/commit." would be suitable for most newcomer PRs. Valid but insufficient review.
Adding this is not necessarily part of the goal ^^ 😅

Review comment (Member Author):

Yeah, spammy reviews will be covered in the meta-review project #131 (comment)

Review comment (Member):

I'll look at that cEP as well then in a day or two, travelling rn 😅

- Number of comments in one review by a Newcomer
Review comment (Member):

Some people may review just for the sake of points, as @prnvdixit said in his review. Maybe add a system that allows the PR owner to give points to the reviewer?

Review comment (Member Author):

Yeah, I also had an idea for that: check the reactions on the review comments. E.g.: if it's a thumbs up, we assign the reviewer the points, and vice versa. But this is already being considered, in a very vague way, in the meta-review project, and as John said here

> it will be providing data to the community data model

Review comment (Member):

Makes sense then 😄

- Comments on issues by a Newcomer
- Activities on Gitter channels
- Introducing a `.coafile` in other projects

Then we will assign points to each activity a newcomer performs.
Review comment (Member):

who is "we". Remove that word everywhere. Be specific about who will do this, even if it is only 'the organisation' vs 'this project'.

i.e. does this project define these values, or will the organisation be able to configure it? Will the organisation be able to change these values any time they like?

E.g.: if a user creates a difficulty/newcomer issue, they will be awarded
`10` points; the number of points awarded increases with the difficulty
level of the issue.

### Levels

To appeal to players' competitive instincts, we can implement a series of
levels that confer rank as newcomers become more active. A good and
straightforward way to award levels in a new gamified design is to base them
on point thresholds. As players earn points, they move up an incremental
series of levels. To give the levels meaning, i.e. a ranking within the
system, they should be named in a way that indicates status.

- Level-I:
  - Name: Beginner
  - Points_required = 50
- Level-II:
  - Name: Intermediate
  - Points_required = 150
- Level-III:
  - Name: High
  - Points_required = 300
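Because levels depend only on point thresholds, the rule fits in a few lines. The sketch below is plain Python and does not use django-gamification. The thresholds come from the list above. The names `LEVELS`, `level_for` and `points_to_next_level` are hypothetical, and a level is assumed to be reached as soon as a newcomer's total points meet its threshold.

```python
# Point thresholds from the level list, lowest first.
LEVELS = [
    (50, "Beginner"),
    (150, "Intermediate"),
    (300, "High"),
]


def level_for(points):
    """Return the name of the highest level unlocked by `points`, or None."""
    current = None
    for threshold, name in LEVELS:
        if points >= threshold:
            current = name
        else:
            break
    return current


def points_to_next_level(points):
    """Return the points still needed for the next level (0 at the top level)."""
    for threshold, _ in LEVELS:
        if points < threshold:
            return threshold - points
    return 0
```

For example, a newcomer with 120 points is a Beginner and needs 30 more points to become Intermediate.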

More complex designs, such as awarding levels based on consecutive activities
Review comment (Member):

If levels can be achieved by doing a set of activities, how does that differ from a badge? When should we use a level vs a badge?

If level is only points based, people will expect to be able to compare the levels sanely. Two different newcomers doing different types of activities might have similar points - ensuring they are on a similar level depends on getting the points allocation correct. ouch.

If level is only points based, the proposed model is quite similar to stackoverflow with

- points - reputation,
- badges (customised sets of pre-reqs),
- and privileges (point count thresholds)

I think most people earn badges and privileges on stackoverflow at roughly the same rate, but I guess some people might get lots of points without getting new badges.

If we are doing points-only levels, we probably want to do the same as stackoverflow, allocating privileges to point thresholds, otherwise these levels are not very valuable.
This means people don't try to compare the effort by two people with similar levels -- when a person gets new privileges they usually start using them, and become more active. As the levels in stackoverflow have exponentially increasing thresholds, two people on the same level are busy trying to use those privileges to get to the next level.

If we don't allocate some value to these point thresholds, people will only aim for the badges.

Review comment (Member Author, @sks444, May 13, 2018):

What privileges could we give to them in each of the levels?

- The one I could think of is to give them permission to use corobo after the completion of level one?

Also, it would be good to have some privileges after a newcomer has earned the coala Developer badge.

Review comment (Member Author):

Another one suggested by @ishanSrt:

- Give permission to ping maintainers on gitter after some points threshold.

are also possible. E.g.:

- Level-I:
  - Joining our Gitter channel and the community
  - Running CI tools on your fork
  - Running coala-ci on a popular GitHub repo
- Level-II:
  - Getting assigned to a difficulty/newcomer issue
  - Creating and merging a PR for that issue
  - Reviewing at least one difficulty/newcomer issue
- Level-III:
  - Getting assigned to a difficulty/low issue
  - Creating and merging a PR for that issue
  - Reviewing at least one difficulty/low issue

But that would be hard to implement and is out of the scope of this project,
as we can't automatically prevent newcomers from performing the activities
Review comment (Member):

> as we can't automatically prevent newcomers ...

This is not relevant to gamification. It is not expected that participants do not do all activities in order; it is expected that the gamification changes their level when they have met the pre-reqs, irrespective of what additional activities they have done which are not related to the next level they are seeking.

which belong to upcoming levels.

### Badges

Another way of explicitly nudging a newcomer to action is to award badges for
completing tasks. Badges let newcomers follow their own performance and
compare it with that of other newcomers.
Badges will be awarded when newcomers perform certain activities:

- Badge-I:
  - Name: The bug finder
  - Details: The one who finds bugs in the existing codebase
  - Activities: Created at least 2 issues labeled bug
- Badge-II:
  - Name: The reviewer
  - Details: The one who reviews others' pull requests
  - Activities: Reviewed at least 4 PRs
Review comment (Member):

bad example at the moment, as reviews will rely on meta-review.

IMO replace this badge with something else, or just remove it.

- Badge-III:
  - Name: The coder
  - Details: The one who codes
  - Activity: Merged at least 3 PRs
Review comment (Member):

This Activity / Activities item is the important one.

To begin with, if it can include multiple, then it should always be Activities: and have sub-items underneath it.

It should use the same activity name as mentioned in the activities listed earlier, with a note that it needs to have been done three times.

- Badge-IV:
  - Name: The coala Developer
  - Details: The ones who are in the coala developers team
  - Activities:
    - Introduce a `.coafile` in other projects
    - Merge a difficulty/newcomer Pull Request
    - Review at least one difficulty/newcomer Pull Request
    - Merge a difficulty/low Pull Request
    - Review at least one difficulty/low or higher Pull Request
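Each badge rule above can be read as a minimum count per activity. The sketch below assumes that every time a newcomer performs an activity, one entry is recorded for it. The activity identifiers (`created_bug_issue`, `merged_pr` and so on) are hypothetical names for the listed activities. They are not existing fields in the community repo.

```python
from collections import Counter

# Hypothetical activity identifiers; each badge maps an activity to the
# minimum number of times it must have been performed.
BADGES = {
    "The bug finder": {"created_bug_issue": 2},
    "The reviewer": {"reviewed_pr": 4},
    "The coder": {"merged_pr": 3},
    "The coala Developer": {
        "introduced_coafile": 1,
        "merged_newcomer_pr": 1,
        "reviewed_newcomer_pr": 1,
        "merged_low_pr": 1,
        "reviewed_low_or_higher_pr": 1,
    },
}


def earned_badges(activities):
    """Return the sorted names of badges whose requirements are all met.

    `activities` holds one activity identifier per time the newcomer
    performed that activity.
    """
    counts = Counter(activities)
    return sorted(
        name
        for name, required in BADGES.items()
        if all(counts[activity] >= times
               for activity, times in required.items())
    )
```

Under this model, the coala Developer badge is earned once every listed activity has been done at least once. That check is exactly what a maintainer would otherwise perform by hand.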

So, to become a developer at coala, a newcomer must complete all the
activities listed for the `coala Developer` badge. But in case
Review comment (Member):

Did you accidentally omit something here? The sentence is incomplete. 😄


This project focuses only on the coala newcomers team, but in the future this
system could also be extended to developers.

## Implementation

### Getting the data

#### GitHub/GitLab

To get data about newcomers' activity on GitHub/GitLab into the
community repo, we will use [IGitt](https://gitlab.com/gitmate/open-source/IGitt).
Since our community repo is based on Django, we will use
[igitt-django](https://gitlab.com/gitmate/open-source/igitt-django) for the
initial setup of IGitt with Django. We will first set up `igitt-django` in
the [webservices](https://gitlab.com/coala/landing/) and then import the relevant
data into the community repo through our APIs.

#### Gitter

To get statistics on newcomers' messages, we will use the
[Gitter API](https://developer.gitter.im/docs/messages-resource) to import
all messages sent to or by newcomers into our webservices, applying some
textual analysis to filter out spam. We will then import the final analysis
Review comment (Member):

What is the benefit of avoiding spam? We don't get much of it, and it will be associated with users who don't contribute code (and are never going to be developers), so it will just be a tiny bit of unfindable data in the DB that is never surfaced on the website.

IMO not worth the effort of removing it.

It is easier for us to remove real spam from gitter by deleting the messages.

of the messages into our community repo for tracking and gamification.
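The import step described above could start from something like the following sketch. It fetches one page of messages from a room and counts the ones sent by newcomers. It assumes the `GET /v1/rooms/:roomId/chatMessages` endpoint, the `limit` and `beforeId` parameters and the `fromUser.username` field, all as described in the Gitter messages resource documentation linked above, plus a personal access token. The helper names are hypothetical, and no spam analysis is done here.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

GITTER_API = "https://api.gitter.im/v1"


def fetch_messages(room_id, token, before_id=None, limit=100):
    """Fetch one page of chat messages from a Gitter room as a list of dicts."""
    params = {"limit": limit}
    if before_id is not None:
        # Page backwards through history from this message id.
        params["beforeId"] = before_id
    url = "{}/rooms/{}/chatMessages?{}".format(
        GITTER_API, room_id, urllib.parse.urlencode(params))
    request = urllib.request.Request(
        url, headers={"Authorization": "Bearer " + token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def messages_per_user(messages, newcomers):
    """Count the messages sent by each newcomer in a list of Gitter messages."""
    return Counter(
        message["fromUser"]["username"]
        for message in messages
        if message["fromUser"]["username"] in newcomers
    )
```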
Review comment (Member):

the analysis, and algorithms, belong in the community repo, as that is where you have the django gamification tools.


We will have different Django models for each category: for GitHub/GitLab
data, models named `PullRequest`, `Issue` and `Comment`; for Gitter data,
models named `GoodMessage` and `BadMessage`. Each of these models will be
connected to a `User` model, which lets us track the activity of each
user.
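The sketch below shows the shape of these models. It uses plain-Python dataclasses in place of Django models so that it runs without a Django project. The field names (`author`, `difficulty`, `merged`, `kind`, `text`) are assumptions. The point is only that every record links back to a `User`, which makes per-user activity counts straightforward.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class User:
    username: str


@dataclass
class PullRequest:
    author: User
    difficulty: str
    merged: bool = False


@dataclass
class Issue:
    author: User
    difficulty: str


@dataclass
class Comment:
    author: User
    kind: str  # e.g. "review", "issue" or "pr"


@dataclass
class GoodMessage:
    author: User
    text: str


@dataclass
class BadMessage:
    author: User
    text: str


def activity_summary(records):
    """Map each username to a Counter of the record types they authored."""
    summary = {}
    for record in records:
        per_user = summary.setdefault(record.author.username, Counter())
        per_user[type(record).__name__] += 1
    return summary
```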

### Tracking Activities

Once we have the data, we need to track the important activities listed in
the sections above.

We will have a dashboard showing the tracked metrics of each user's progress,
which should motivate newcomers to work faster and compete with each other.
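The dashboard could rank newcomers by their points. Below is a minimal sketch of such a leaderboard. It assumes points have already been totalled per username. The `leaderboard` helper is hypothetical.

```python
def leaderboard(points_by_user, top=10):
    """Return up to `top` (username, points) pairs, highest points first.

    Ties are broken alphabetically by username so the order is stable.
    """
    ranked = sorted(points_by_user.items(),
                    key=lambda item: (-item[1], item[0]))
    return ranked[:top]
```

Breaking ties by username keeps the ordering deterministic between page loads, so two newcomers with equal points do not swap places at random.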

#### Code Implementation

Assuming we use
[django-trackstats](https://github.com/pennersr/django-trackstats) to track
newcomers' statistics, we will group similar metrics into domains; for
example, metrics for activities related to pull requests will go into a
domain named `PullRequest`:

```python
from trackstats.models import Domain

# A domain groups related metrics; this one collects pull request metrics.
Domain.objects.PullRequest = Domain.objects.register(
    ref='pullrequest',
    name='pullrequest')
```

Then we define the metrics to track in each domain:

```python
from trackstats.models import Domain, Metric

Metric.objects.PULL_REQUEST_COUNT = Metric.objects.register(
    domain=Domain.objects.PullRequest,
    ref='pull_request_count',
    name='Number of pull requests opened by the user')
```

Now we can store these metrics for the time period we want:

```python
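from trackstats.models import Metric, Period, StatisticByDate

from .models import PullRequest  # the PullRequest model described above

# Record the current number of pull requests as a lifetime statistic.
n = PullRequest.objects.all().count()
StatisticByDate.objects.record(
    metric=Metric.objects.PULL_REQUEST_COUNT,
    value=n,
    period=Period.LIFETIME)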
```

### Gamification

Having already discussed the game elements involved in the gamification
system, we will assign points to each activity we track for a newcomer,
and then unlock levels and award badges accordingly.

#### Assign points to GitHub/GitLab activities

When assigning points to GitHub/GitLab activities, the main deciding factor
will be the difficulty level of the issue.

E.g.: if a newcomer creates a difficulty/newcomer issue, they will get
`10` points, and each higher difficulty level adds another `10` points.
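The difficulty rule can be written as a lookup over the difficulty labels. This sketch assumes the labels `difficulty/newcomer`, `difficulty/low`, `difficulty/medium` and `difficulty/high`, in that order. If an issue carries several difficulty labels, the highest one is used. The constants and the function name are hypothetical.

```python
# Difficulty labels in increasing order (assumed label names).
DIFFICULTY_ORDER = [
    "difficulty/newcomer",
    "difficulty/low",
    "difficulty/medium",
    "difficulty/high",
]
BASE_POINTS = 10  # points for a difficulty/newcomer issue
STEP = 10  # extra points for each higher difficulty level


def issue_points(labels):
    """Return the points for creating an issue with the given labels.

    Issues without a known difficulty label earn no points; if several
    difficulty labels are present, the highest one counts.
    """
    levels = [DIFFICULTY_ORDER.index(label)
              for label in labels if label in DIFFICULTY_ORDER]
    if not levels:
        return 0
    return BASE_POINTS + STEP * max(levels)
```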

The same approach will apply to pull requests, but we will also consider
the number of Gitmate errors on the PR, which indicates whether the user
ran coala locally before pushing their changes.
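One hypothetical way to factor Gitmate errors into pull request points is to subtract a fixed penalty per error from the difficulty-based points. The penalty of 2 points per error and the floor at zero are assumptions chosen for illustration. They are not values from this proposal.

```python
# Difficulty-based base points, following the issue rule (illustrative).
DIFFICULTY_POINTS = {
    "difficulty/newcomer": 10,
    "difficulty/low": 20,
    "difficulty/medium": 30,
    "difficulty/high": 40,
}
GITMATE_PENALTY = 2  # hypothetical: points lost per Gitmate error


def pull_request_points(difficulty, gitmate_errors):
    """Return points for a merged PR, reduced for each Gitmate error on it.

    Fewer Gitmate errors suggest the newcomer ran coala locally before
    pushing, so they keep more of the base points. Never below zero.
    """
    base = DIFFICULTY_POINTS.get(difficulty, 0)
    return max(0, base - GITMATE_PENALTY * gitmate_errors)
```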

For comment-related activities, we will also check whether the newcomer's
comment is a review comment, an issue comment or a plain comment on the
PR.

However, this can fail if newcomers start commenting just to earn points.
To avoid this type of spam, we will use some of the ideas from the
**Meta-review system**: we will also check the reactions on each comment,
and if a comment gets a thumbs down, the newcomer gets no points for it.
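The reaction check can be sketched as follows. It assumes reactions are available as the content strings used by GitHub's reactions API, where `+1` is a thumbs up and `-1` is a thumbs down. The per-kind point values and the function name are hypothetical placeholders.

```python
# Hypothetical points per comment kind.
COMMENT_POINTS = {"review": 3, "issue": 2, "pr": 1}


def comment_points(kind, reactions):
    """Return points for a comment, or 0 if it received a thumbs down.

    `reactions` is a list of reaction content strings, e.g. ["+1", "-1"].
    """
    if "-1" in reactions:
        return 0
    return COMMENT_POINTS.get(kind, 0)
```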

Review comment:

Still, the whole thread will be spammed and become cumbersome to follow.
Maybe negative points could be considered, which in turn could lead to certain kinds of badges, followed by demotion or removal from the community. 😆

Review comment (Member Author):

We have discussed negative points, and we will implement them if they are not too hard to implement.

Review comment (Member):

> we will use some of the ideas from Meta-review system

Err, you don't need to do that. That is a separate GSoC project this year, and it will be providing data to the community data model, which your gamification system will hopefully use and/or provide the data model for. You probably want to go look at that cEP.


#### Assign points to Gitter activities

Tracking the “number of Gitter messages” and giving points for it would
obviously encourage unnecessary spamming in the channel. To avoid that, we
can apply some extra rules to Gitter messages and divide our message model
into two parts:

- Good behavior message model
- Bad behavior message model

So, before importing Gitter messages into our models, we can check which
category each message falls into. We can do that by applying
[basic sentiment analysis](http://fjavieralba.com/basic-sentiment-analysis-with-python.html)
to the messages.

Basically, we will define two dictionaries: one of good and one of bad messages.
In the bad one, we will keep patterns like “please review [issue link]”,
“updated [issue link]”, “have a look [issue link]”, “[mention][issue link]”
Review comment (Member):

do you have a sentiment analysis toolkit in mind? I suggest you try building a basic dictionary, and running some sample messages through the toolkit.
Then we try it, and see how robustly it handles variations on these example messages.

etc., so that a newcomer doesn’t beg for reviews all the time.
In the good one, we will keep question patterns, so we will generally
match a “?” at the end. But what happens if a newcomer starts asking irrelevant
questions? For that, we can also define a rule that a newcomer can only ask a
Review comment (Member):

'rules' needs to be a new paragraph, as these are not going to be achievable with sentiment analysis.

Probably even want sub-sections for sentiment analysis vs other rules.

certain number of questions per day. Then we will track both message models
Review comment (Member):

Umm... "certain" seems a bit "harsh" - No? I mean everyone does ask a lot of questions during onboarding phase. 😅

of the newcomer, assign points to each separately, see which model's metric
grows faster or slower over the whole newcomer process, and not promote
newcomers until they have learned to find things out by themselves.
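The rules above can be sketched with simple regular expressions. This is plain pattern matching rather than sentiment analysis. The patterns mirror the examples in the text. The link regex, the daily limit of 5 questions and the function names are hypothetical.

```python
import re
from collections import Counter

LINK = r"https?://\S+"
# "Bad" patterns from the text: asking for reviews with a link.
BAD_PATTERNS = [
    re.compile(r"please\s+review\s+" + LINK, re.IGNORECASE),
    re.compile(r"updated\s+" + LINK, re.IGNORECASE),
    re.compile(r"have\s+a\s+look\s+" + LINK, re.IGNORECASE),
    re.compile(r"@\w+\s+" + LINK),  # [mention][issue link]
]
MAX_QUESTIONS_PER_DAY = 5  # hypothetical daily limit


def classify(text):
    """Return "bad", "good" or "neutral" for one Gitter message."""
    if any(pattern.search(text) for pattern in BAD_PATTERNS):
        return "bad"
    if text.rstrip().endswith("?"):
        return "good"
    return "neutral"


def classify_day(messages):
    """Classify one newcomer's messages for a single day.

    Questions beyond MAX_QUESTIONS_PER_DAY are counted as neutral
    instead of good. Returns a Counter of categories.
    """
    counts = Counter()
    for text in messages:
        category = classify(text)
        if category == "good" and counts["good"] >= MAX_QUESTIONS_PER_DAY:
            category = "neutral"
        counts[category] += 1
    return counts
```

The bad patterns are checked before the question rule, so a review request that also ends in a question mark is still counted as bad.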
Review comment:

This whole sentiment analysis approach needs to be revised IMO and should be clearly stated. What you have mentioned is a little vague, no?
How about “Can you review this?”? Begging for a review + includes a question mark.

Review comment (Member Author):

Yeah, that was just an idea from my side. It would be great if you could suggest some patterns for both categories. :)

Review comment:

will follow along during the implementation phase of this part. ;)


#### Implementation

Now, assuming this process is implemented with the
[django-gamification](https://github.com/mattjegan/django-gamification)
package, we first need to link our `user` model to the gamification interface:

```python
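from django.db import models
from django_gamification.models import GamificationInterface


class UserModel(models.Model):
    # Each user is linked to their own gamification interface, which
    # keeps track of that user's badges and unlockables.
    interface = models.ForeignKey(GamificationInterface,
                                  on_delete=models.CASCADE)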
```

Then we can create our first level:

```python
from django_gamification.models import UnlockableDefinition

# The first level, unlocked once a user has earned 50 points.
UnlockableDefinition.objects.create(
    name='Beginner',
    description='You’re a beginner at coala',
    points_required=50
)
```

Similarly, we can create badges and award the user accordingly:

```python
from django_gamification.models import BadgeDefinition, Category

BadgeDefinition.objects.create(
    name='Developer',
    description='You’re now a developer at coala',
    points=300,
    progression_target=100,
    category=Category.objects.create(name='Developer Badges',
                                     description='This is the best badge'),
)
```

```python
from django_gamification.models import Badge

badge = Badge.objects.first()
badge.award()
```
Review comment (Contributor):

Not clear: where will newcomers be able to see these points and badges? Setting them beside their names in Gitter, like actual game XP, sounds cool.

Review comment (Member Author):

We will make a dashboard in the community repo.

Review comment (Member Author):

I will consider adding this functionality to corobo in the future, but for now, I think it's good to have the dashboard on a page.

Review comment (Member):

Would it be a good idea to add stuff like top committers and top reviewers of the month/week in the community repo?

Review comment (Member Author):

Sounds cool. :)

Review comment (Member):

Note that for this project, those stats pages would need to be limited to newcomers only.

Reviewer data will come from the meta-review project -- reviews need to be weighted (meta-review) to be useful, otherwise they encourage stupid reviews.

I think top commits can be obtained via the OpenHub data, however that can be a bit old, making it out of sync with reality, and thus not very good for weekly or even monthly stats. Still useful for leaderboard for newcomers, who generally progress quite slowly anyway.

Needs to include the affiliated and unaffiliated committers:

https://www.openhub.net/orgs/coala/?view=affiliated_committers
https://www.openhub.net/orgs/coala/outside_committers

For weekly & monthly, better to use 'merged PRs' as a rough substitute for commits. Possibly include lines added/removed/modified to weight PRs.

Review comment (Contributor):

The process of becoming a developer seems oversimplified, and developers seem overrated by that kind of statement. You seem to be doing a lot of work for such a small process, plus I don't see people becoming (being promoted to) developers just by completing that; it doesn't seem enough to get the idea of the project. Before GSoC, I haven't seen people become developers in less than 2 months after joining the org. Maybe create an extra newbie developer team, and require 2 more low issues to join the developers team.
Just some suggestions.


## Conclusion

This project covers most of the problems that newcomers and maintainers at
coala face during the newcomer->developer process:

- A maintainer won't have to check manually whether the newcomer
has done all the activities involved in the newcomer->developer process;
he/she can just look at the points and badges earned by the newcomer
through the gamification system.

- This project provides a fun way to work in the community: newcomers
will be motivated by earning points and badges for
even the tiniest of their contributions.

- Having a gamification system will be a good resource for attracting more
newcomers, and the automated process could easily deal with an increase
in the number of new contributors.

- If a newcomer is not able to complete the newcomer process,
he/she can still earn other available badges based on their activities in
the community, which serve as a symbol of their talent and
work experience.