# Growth Maturity Decline for Manuscripts

This JupyterLab notebook defines and demonstrates functions for the CHAOSS-GMD metrics. These functions will be incorporated into the Manuscripts project. We will also test visualizations in this notebook.

We'll use a local Elasticsearch instance and an index that has already been populated. The index was created using the `p2o.py` script from the grimoirelab-toolset.

We will start by importing the necessary libraries and initializing the necessary variables:

In [1]:
import os
import elasticsearch

from elasticsearch_dsl import Search
from pprint import pprint

import altair as alt
import pandas as pd

from datetime import date, timezone
from dateutil import parser, relativedelta

from manuscripts.manuscripts import esquery
from manuscripts.manuscripts import metrics

In [2]:
# address of the local elasticsearch instance
es_url = "http://localhost:9200"

# names of the git and github indices to be used
git_index = "perceval_git"
github_index = "perceval_github"

# time interval over which the analysis is done
end_date = parser.parse(date.today().strftime('%Y-%m-%d')).replace(tzinfo=timezone.utc)
# start_date = end_date - relativedelta.relativedelta(months=18)
# use a timezone-aware datetime so that start_date has the same type as end_date
start_date = parser.parse("2014-01-01").replace(tzinfo=timezone.utc)

The idea is to split the reporting system into two parts. The metrics that are currently generated stay unchanged. In addition, CHAOSS-specific metrics can be generated by passing the `--chaoss` flag to the manuscripts command.

We can start by adding a class method named `get_chaoss_metrics` to each of the data source classes (git, github, gerrit, its). For example, in the github.py file, we can add the following section for CHAOSS metrics:
```
class GitHubIssues(its.ITS):
    name = "github_issues"
    
    @classmethod
    def get_chaoss_metrics(cls):
        return {
            "issue_resolution": {
                "open": [Open],
                "closed": [Closed],
                "issue_resolution_efficiency": [],
                "open_issue_age": [],
                "first_response_to_issue_duration": [],
                "closed_issue_resolution_duration": [],
            }
        }
```
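To make the idea concrete, here is a minimal sketch of how a report could walk the dictionary returned by `get_chaoss_metrics` when `--chaoss` is passed. `FakeMetric` and `collect_chaoss` are hypothetical names, and the stand-in metric classes return fixed numbers instead of querying Elasticsearch, so this is not the actual Manuscripts code:

```python
import argparse


class FakeMetric:
    """Hypothetical stand-in for a Manuscripts metric class (no Elasticsearch needed)."""
    value = 0

    def get_agg(self):
        return self.value


class Open(FakeMetric):
    value = 22


class Closed(FakeMetric):
    value = 113


class GitHubIssues:
    name = "github_issues"

    @classmethod
    def get_chaoss_metrics(cls):
        return {"issue_resolution": {"open": [Open], "closed": [Closed]}}


def collect_chaoss(data_source):
    """Instantiate every metric class of every section and collect its aggregate."""
    results = {}
    for section, metrics_by_name in data_source.get_chaoss_metrics().items():
        results[section] = {
            name: [metric_cls().get_agg() for metric_cls in metric_classes]
            for name, metric_classes in metrics_by_name.items()
        }
    return results


# Assumed flag handling: only build the CHAOSS report when --chaoss is given.
arg_parser = argparse.ArgumentParser()
arg_parser.add_argument("--chaoss", action="store_true")
args = arg_parser.parse_args(["--chaoss"])

report = collect_chaoss(GitHubIssues) if args.chaoss else {}
print(report)
```

Keeping the sections as nested dictionaries means the report code does not need to know which metrics a data source implements; metrics with an empty list are simply reported as empty.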

We will go with the structure already defined in Manuscripts:

## Issue Resolution

Goal: Identify how effective the community is at addressing issues identified by community participants.

Name | Question
--- | ---
[Open Issues](activity-metrics/open-issues.md) | What is the number of open issues?   DONE
[Closed Issues](activity-metrics/closed-issues.md) | What is the number of closed issues?   DONE
[Issue Resolution Efficiency](activity-metrics/issue-resolution-efficiency.md) | What is the number of closed issues/number of abandoned issues? 
[Open Issue Age](activity-metrics/open-issue-age.md) | What is the age of open issues?
[First Response to Issue Duration](activity-metrics/first-response-to-issue-duration.md) | What is the duration of time for a first response to an issue?
[Closed Issue Resolution Duration](activity-metrics/closed-issue-resolution-duration.md) | What is the duration of time for issues to be resolved?

In [3]:
from manuscripts.manuscripts.metrics import github_issues, its, metrics

### Open issues

(Only create the classes that are not present and reuse code where possible)

In [4]:
# this class goes into the its.py file in manuscripts/metrics folder
# names of the classes will be changed according to the pattern used in that file

class ITSOpen(its.ITSMetrics):
    """ Tickets Open metric class for issue tracking systems """
    id = "open"
    name = "Open tickets"
    desc = "Number of tickets currently open"
    FIELD_COUNT = "id"
    FIELD_NAME = "url"
    FIELD_DATE = "created_at"

In [5]:
# this class goes into github_issues.py file in manuscripts/metrics folder

class Open(ITSOpen):
    ds = github_issues.GitHubIssues
    filters = {"pull_request": "false", "state": "open"}

The `Open` class can then be instantiated inside the report.py file:

In [6]:
open_issues = Open(es_url, github_index, start=start_date, end=end_date)

In [7]:
open_issues.get_agg()

22

### Closed issues

For this, we can use the `Closed` class already defined in the github_issues.py file.

In [8]:
closed_issues = github_issues.Closed(es_url, github_index, start=start_date, end=end_date)

In [9]:
closed_issues.get_agg()

113

### Issue Resolution Efficiency (What is the number of closed issues/number of abandoned issues?)

How do we say that an issue has been abandoned?
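One possible answer, sketched here purely as an assumption, is to call an issue abandoned when it is still open and has not been updated for a fixed number of days. The 180-day threshold, the `updated_at` field name and the sample issues below are all made up for illustration; the CHAOSS definition may differ:

```python
from datetime import datetime, timedelta, timezone


def is_abandoned(issue, now, max_idle_days=180):
    """Treat an open issue as abandoned if it saw no update for max_idle_days."""
    idle = now - issue["updated_at"]
    return issue["state"] == "open" and idle > timedelta(days=max_idle_days)


def resolution_efficiency(issues, now, max_idle_days=180):
    """Closed issues divided by abandoned issues; None when nothing is abandoned."""
    closed = sum(1 for issue in issues if issue["state"] == "closed")
    abandoned = sum(1 for issue in issues if is_abandoned(issue, now, max_idle_days))
    return closed / abandoned if abandoned else None


now = datetime(2018, 6, 1, tzinfo=timezone.utc)
# Made-up issues, only to exercise the functions.
issues = [
    {"state": "closed", "updated_at": datetime(2018, 5, 1, tzinfo=timezone.utc)},
    {"state": "closed", "updated_at": datetime(2017, 1, 1, tzinfo=timezone.utc)},
    {"state": "open", "updated_at": datetime(2017, 1, 1, tzinfo=timezone.utc)},
    {"state": "open", "updated_at": datetime(2018, 5, 20, tzinfo=timezone.utc)},
]
efficiency = resolution_efficiency(issues, now)
print(efficiency)
```

Returning `None` when there are no abandoned issues avoids a division by zero and leaves it to the report to decide how to display that case.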

### Open issue age

*This is not a mere metric dear mortal.*


The documentation is unclear about whether this metric is only the average number of days since the open issues were created, or whether each issue's age has to be looked at individually.
I thought it would be a good idea to look at all the issues that have been created and where they stand now. We can then visualise them and explore the data further.

For this metric, we need data from two fields: `time_open_days` and `id_in_repo`. The two values have to stay paired per issue, so simple aggregations won't work, as the dicts are not ordered.

In [10]:
from manuscripts.manuscripts import esquery
from manuscripts.manuscripts.metrics import github_issues, metrics

class AgeOpenIssue(metrics.Metrics):
    ds = github_issues.GitHubIssues
    
    id = "age_open_issues"
    name = "Age of open issues"
    desc = "Number of days since the open issues were created"
    FIELD_COUNT = "id_in_repo"
    FIELD_NAME = "time_open_days"
    filters = {"pull_request": "false", "state": "open"}
    FIELD_LIST = ["time_open_days", "id_in_repo"]

    def get_agg(self):
        # super(type(self), self) would recurse forever in a subclass, so use super()
        agg = super().get_agg()
        if agg is None:
            agg = 0  # ES returns NaN (seen here as None) when nothing matches; convert to 0
        return agg

In [11]:
age_open_issues = AgeOpenIssue(es_url, github_index, start=start_date, end=end_date)

Here the function `get_query_source` has been added to the `Metrics` class in `metrics.py`. It builds a query that returns only the fields listed in the `FIELD_LIST` variable, and takes a size parameter that tells Elasticsearch how many entries to return.
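As a rough illustration of what such a query could look like, the sketch below builds a plain Elasticsearch request body with source filtering and a size limit. `build_source_query` is a hypothetical helper, not the actual `get_query_source` implementation, and it leaves out the date range that the metric classes also apply:

```python
def build_source_query(fields, filters, size):
    """Build a request body that returns only `fields` for up to `size` matches."""
    return {
        "_source": list(fields),
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {name: value}} for name, value in filters.items()]
            }
        },
    }


body = build_source_query(
    fields=["time_open_days", "id_in_repo"],
    filters={"pull_request": "false", "state": "open"},
    size=22,
)
print(body)
```

Passing the count of open issues as `size` makes sure that every matching document is returned, since Elasticsearch otherwise returns only the first ten hits.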

In [12]:
age_open_issues.get_query()
q = age_open_issues.get_query_source(age_open_issues.get_agg())

In [13]:
age_open_issues.get_metrics_data(q)

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5},
 'hits': {'hits': [{'_id': 'fbe754f0a3cb220a57b1788852d8329873caa79d',
    '_index': 'perceval_github',
    '_score': 4.9321957,
    '_source': {'id_in_repo': '58', 'time_open_days': 606.51},
    '_type': 'items'},
   {'_id': 'f82ce08dde38d8924514df62aa2f73e2efb6d17c',
    '_index': 'perceval_github',
    '_score': 4.9321957,
    '_source': {'id_in_repo': '104', 'time_open_days': 496.61},
    '_type': 'items'},
   {'_id': 'a910a4e60fb632492c59414d71d3350cf0257c51',
    '_index': 'perceval_github',
    '_score': 4.9321957,
    '_source': {'id_in_repo': '319', 'time_open_days': 88.53},
    '_type': 'items'},
   {'_id': '98b606ab48ab43a97c3508ea06082dfc8e0d6bd6',
    '_index': 'perceval_github',
    '_score': 4.705124,
    '_source': {'id_in_repo': '91', 'time_open_days': 546.59},
    '_type': 'items'},
   {'_id': 'c312200c337184e9bcfe14721ae9c9b3c81377d9',
    '_index': 'perceval_github',
    '_score': 4.705124,
    '_

In [14]:
import json
age_open_issues.get_metrics_data(age_open_issues.get_query())

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5},
 'aggregations': {'1': {'value': 22}},
 'hits': {'hits': [], 'max_score': 0.0, 'total': 22},
 'timed_out': False,
 'took': 5}
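Once the hits are available, each issue number can be kept together with its age by reading both fields from the same `_source` entry. The sketch below assumes the response shape shown in the `get_query_source` output earlier and uses a trimmed copy of its first four hits; `extract_issue_ages` is a hypothetical helper name:

```python
def extract_issue_ages(response):
    """Return (id_in_repo, time_open_days) pairs, oldest issue first."""
    pairs = [
        (hit["_source"]["id_in_repo"], hit["_source"]["time_open_days"])
        for hit in response["hits"]["hits"]
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Trimmed copy of the first hits from the response shown earlier.
response = {
    "hits": {
        "hits": [
            {"_source": {"id_in_repo": "58", "time_open_days": 606.51}},
            {"_source": {"id_in_repo": "104", "time_open_days": 496.61}},
            {"_source": {"id_in_repo": "319", "time_open_days": 88.53}},
            {"_source": {"id_in_repo": "91", "time_open_days": 546.59}},
        ]
    }
}

ages = extract_issue_ages(response)
average_age = sum(age for _, age in ages) / len(ages)
print(ages)
print(round(average_age, 2))
```

The same list of pairs covers both readings of the metric: the average gives a single number, while the individual pairs can be plotted per issue.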

### First Response to Issue Duration


### Closed Issue Resolution Duration

For these metrics, the definitions are unclear and we might have to add code into Perceval.

## Code Development

Goal: Identify how effective the community is at merging new code into the codebase.

Name | Question
--- | ---
[Code Commits](activity-metrics/code-commits.md) | What is the number of code commits? 
[Lines of Code Changed](activity-metrics/lines-of-code-changed.md) | What is the number of lines of code changed?
[Code Reviews](activity-metrics/code-reviews.md) | What is the number of code reviews?
[Code Merge Duration](activity-metrics/code-merge-duration.md) | What is the duration of time between code merge request and code commit?
[Code Review Efficiency](activity-metrics/code-review-efficiency.md) | What is the number of merged code changes/number of abandoned code change requests?
[Maintainer Response to Merge Request Duration](activity-metrics/maintainer-response-to-merge-request-duration.md) | What is the duration of time for a maintainer to make a first response to a code merge request?
[Code Review Iteration](activity-metrics/code-review-iteration.md) | What is the number of iterations that occur before a merge request is accepted or declined? 

### Code Commits

In [15]:
from manuscripts.manuscripts.metrics import git
num_commits = git.Commits(es_url, git_index, start=start_date, end=end_date)

In [16]:
num_commits.get_agg()

1176

### Lines of code changed

This metric requires a new `sum` aggregation to be added to the `ElasticQuery` class. The `sum` aggregation is very similar to the `avg` aggregation, so we can reuse the same code and rename the function.
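To show how close the two aggregations are, here is a small sketch of the request bodies involved. `build_metric_agg` is a hypothetical helper written for this notebook, not the `ElasticQuery` code; the aggregation id `"1"` mirrors the one seen in the responses above:

```python
def build_metric_agg(agg_type, field, agg_id="1"):
    """Return the `aggs` part of a request for a single-value metric aggregation."""
    if agg_type not in ("avg", "sum"):
        raise ValueError("unsupported aggregation: " + agg_type)
    return {"aggs": {agg_id: {agg_type: {"field": field}}}}


avg_body = build_metric_agg("avg", "lines_changed")
sum_body = build_metric_agg("sum", "lines_changed")
print(sum_body)
```

The two bodies differ only in the aggregation keyword, which is why the existing `avg` code path can be reused almost unchanged.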

In [17]:
class LinesChanged(git.GitMetrics):
    """ Lines changed metric class for source code management systems """

    id = "lineschanged"
    name = "LinesChanged"
    desc = "Number of lines changed"
    AGG_TYPE = "sum"
    FIELD_COUNT = "lines_changed"

In [18]:
lines_changed = LinesChanged(es_url, git_index, start=start_date, end=end_date)

In [19]:
lines_changed.get_agg()

177142.0

Apart from `LinesChanged`, we can add `LinesAdded` and `LinesRemoved` to create a bar graph showing the distribution.

In [20]:
class LinesAdded(git.GitMetrics):
    """ Lines added metric class for source code management systems """

    id = "linesadded"
    name = "LinesAdded"
    desc = "Number of lines added"
    AGG_TYPE = "sum"
    FIELD_COUNT = "lines_added"
    
lines_added = LinesAdded(es_url, git_index, start=start_date, end=end_date)

lines_added.get_agg()

132512.0

In [21]:
class LinesRemoved(git.GitMetrics):
    """ Lines removed metric class for source code management systems """

    id = "linesremoved"
    name = "LinesRemoved"
    desc = "Number of lines removed"
    AGG_TYPE = "sum"
    FIELD_COUNT = "lines_removed"
    
lines_removed = LinesRemoved(es_url, git_index, start=start_date, end=end_date)

lines_removed.get_agg()

44630.0
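As a first step towards that bar graph, the two values can be turned into one row per metric together with its share of all changed lines. This is only a sketch: `to_chart_rows` is a hypothetical helper, and it assumes that lines changed equals lines added plus lines removed, which holds for the numbers in this notebook (132512 + 44630 = 177142):

```python
def to_chart_rows(lines_added, lines_removed):
    """Build one row per metric, with its share of all changed lines."""
    total = lines_added + lines_removed
    return [
        {"metric": "Lines added", "lines": lines_added, "share": lines_added / total},
        {"metric": "Lines removed", "lines": lines_removed, "share": lines_removed / total},
    ]


# Values returned by the get_agg() calls above.
rows = to_chart_rows(132512.0, 44630.0)
for row in rows:
    print("{metric}: {lines:.0f} ({share:.1%})".format(**row))
```

The rows can then be loaded with `pd.DataFrame(rows)` and plotted with `alt.Chart(df).mark_bar().encode(x="metric", y="lines")`.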

## Community Growth

Goal: Identify the size of the project community and whether it's growing, shrinking, or staying the same.

Name | Question
--- | ---
[Contributors](activity-metrics/contributors.md) | What is the number of contributors?
[New Contributors](activity-metrics/new-contributors.md) | What is the number of new contributors?
[Contributing Organizations](activity-metrics/contributing-organizations.md) | What is the number of contributing organizations? 
[New Contributing Organizations](activity-metrics/new-contributing-organizations.md) | What is the number of new contributing organizations?
[Sub-Projects](activity-metrics/sub-projects.md) | What is the number of sub-projects?