## CHAOSS METRICS

This notebook is an attempt to map all the metrics that manuscripts currently produces. I am not implementing this directly in manuscripts, for the following reasons:
- I need to test further how the new functions behave when calculating the current metrics.
- The functions and classes currently used in manuscripts differ considerably from the new functions used here to calculate both the old and the new metrics, so the new functions cannot be plugged in directly.

In [1]:
from new_functions import Metric, calculate_bmi
from elasticsearch import Elasticsearch
# utility and support modules
from pprint import pprint
from datetime import datetime, timezone, timedelta

import pandas as pd
# declare the necessary variables
es = Elasticsearch("http://localhost:9200/")

github_index = "aima_github"
git_index = "aima_git"

start_date = datetime(2015, 1, 1)
end_date = datetime.now()
end_date = end_date.replace(hour=0, minute=0, second=0, microsecond=0)

Here is the PDF that is generated when we create a report using manuscripts as it is.

### OVERVIEW

- Activity metrics: we have to get the trend for these:
	- Closed PRs
	- Open PRs
	- Issues Open
	- Issues Closed
	- Commits created 
   

- Authors per selected interval: the average number of developers per month, computed by quarter (that is, the monthly average over the three months of each quarter). If the analysis is done at month level, this is simply the number of developers per month.


- BMI metrics: here, BMI (Backlog Management Index) measures how efficiently issues and PRs are closed relative to how many are created.
	- BMI of PRs: closed PRs / submitted PRs in total, plus a trend showing the same ratio per interval (week, month, year) over the given time range.
	- BMI of issues: the same as for PRs, but for issues.


- Time to close metrics:
	- Median days to close a PR.
	- Median days to close an issue.
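The BMI ratio described in the list above can be sketched in a few lines. This is only an illustration, not the `calculate_bmi` function imported from `new_functions`: the helper name `bmi_by_period`, the input shape (dicts with `date` and `value` lists) and the choice to report 0.0 when nothing was opened in a period are all assumptions, and the counts are made up.

```python
def bmi_by_period(closed_ts, opened_ts):
    """Return the closed/opened ratio for each period (hypothetical helper)."""
    periods, ratios = [], []
    for date, closed, opened in zip(opened_ts["date"], closed_ts["value"], opened_ts["value"]):
        periods.append(date[:7])  # "2016-03-01T00:00:00.000Z" -> "2016-03"
        # assumption: a period with nothing opened is reported as 0.0 instead of failing
        ratios.append(closed / opened if opened else 0.0)
    return {"Period": periods, "Closed/Submitted": ratios}


# made-up monthly counts, for illustration only
dates = ["2016-02-01T00:00:00.000Z", "2016-03-01T00:00:00.000Z", "2016-04-01T00:00:00.000Z"]
closed = {"date": dates, "value": [0, 12, 3]}
opened = {"date": dates, "value": [0, 13, 4]}

bmi = bmi_by_period(closed, opened)
print(bmi)
```

A ratio of 1.0 means that as many items were closed as were opened in that period; values below 1.0 mean the backlog grew.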

In [2]:
closed_pr = Metric("http://localhost:9200/", github_index, start_date, end_date)
closed_pr.add_query({"pull_request":"true"})
closed_pr.is_closed()
# get trend by month:
closed_pr.get_cardinality("id_in_repo").by_period()
print("Trend for month: ", closed_pr.get_trend())

# get trend by quarter:
closed_pr.get_cardinality("id_in_repo").by_period("quarter")
print("Trend for quarter: ", closed_pr.get_trend())

Trend for month:  (0, -100)
Trend for quarter:  (15, -893)
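The tuples printed above appear to be a pair of (value in the latest period, percentage change from the previous period). That reading is inferred from the printed numbers, not confirmed against the source of `Metric.get_trend`, so the sketch below is an assumption; the function name `trend` and the input series are hypothetical.

```python
def trend(values):
    """Return (last value, percentage change from the previous period).

    Assumption: the change is expressed relative to the last value.
    """
    last, prev = values[-1], values[-2]
    if last == 0:
        # avoid dividing by zero: a drop to zero is reported as -100,
        # and no activity in either period as 0
        return (last, -100 if prev > 0 else 0)
    return (last, int((last - prev) / last * 100))


print(trend([7, 2]))  # (2, -250)
print(trend([5, 0]))  # (0, -100)
```

If this reading is right, large negative percentages simply mean that the latest (often still incomplete) period has far fewer items than the one before it.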


In [3]:
open_pr = Metric("http://localhost:9200/", github_index, start_date, end_date)
open_pr.add_query({"pull_request":"true"})
# get trend by month:
open_pr.get_cardinality("id_in_repo").by_period()
print("Trend for month: ", open_pr.get_trend())
# get trend by quarter:
open_pr.get_cardinality("id_in_repo").by_period("quarter")
print("Trend for quarter: ", open_pr.get_trend())

Trend for month:  (2, -250)
Trend for quarter:  (18, -738)


In [4]:
closed_issues = Metric("http://localhost:9200/", github_index, start_date, end_date)
closed_issues.add_query({"pull_request":"false"})
closed_issues.is_closed()
# get trend by month:
closed_issues.get_cardinality("id_in_repo").by_period(field="closed_at")
print("Trend for month: ", closed_issues.get_trend())
# get trend by quarter:
closed_issues.get_cardinality("id_in_repo").by_period("quarter")
print("Trend for quarter: ", closed_issues.get_trend())

Trend for month:  (0, -100)
Trend for quarter:  (6, -716)


In [5]:
open_issues = Metric("http://localhost:9200/", github_index, start_date, end_date)
open_issues.add_query({"pull_request":"false"})
open_issues.get_cardinality("id_in_repo").by_period()
print(open_issues.get_trend())

(0, -100)


In [6]:
commits = Metric("http://localhost:9200/", git_index, start_date, end_date)
commits.get_cardinality("hash").by_period()
print(commits.get_trend())

(0, -100)


In [7]:
# Issues closed in the last month:
issues = Metric("http://localhost:9200/", github_index, start_date, end_date)
issues.add_query({"pull_request":"false"})
issues.is_closed()
issues.get_cardinality("id")
# approximate a month as 30 days
previous_month_date = end_date - timedelta(days=30)
issues.set_range(date_field="closed_at", start=previous_month_date, end=end_date)
issues.get_aggs()

3

In [8]:
# Issues opened in the last month:
issues = Metric("http://localhost:9200/", github_index, start_date, end_date)
issues.add_query({"pull_request":"false"})
issues.get_cardinality("id")
# May has 31 days
previous_month_date = end_date - timedelta(days=31)
issues.set_range(start=previous_month_date, end=end_date)
issues.get_aggs()

1

There is still a small problem with how the dates are calculated, hence these values (3 and 1) differ from the original values, 5 and 8 respectively.
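One possible source of the mismatch, offered as a guess and not as a diagnosis, is that a sliding window of 30 or 31 days is not the same interval as the previous calendar month. A minimal sketch of computing calendar-month boundaries follows; the helper name `previous_calendar_month` and the example date are hypothetical.

```python
from datetime import datetime, timedelta


def previous_calendar_month(today):
    """Return (first day of the previous month, first day of the current month)."""
    first_of_this_month = today.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # stepping back one day from the 1st always lands in the previous month
    last_day_of_prev = first_of_this_month - timedelta(days=1)
    first_of_prev = last_day_of_prev.replace(day=1)
    return first_of_prev, first_of_this_month


start, end = previous_calendar_month(datetime(2018, 6, 14, 9, 30))
print(start, end)
```

The returned pair could then be passed as `start` and `end` to `set_range`, treating the end as exclusive if the range query allows it.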

In [9]:
# PRs closed in the last month:
pr = Metric("http://localhost:9200/", github_index, start_date, end_date)
pr.add_query({"pull_request":"true"})
pr.is_closed()
pr.get_cardinality("id")
# May has 31 days
previous_month_date = end_date - timedelta(days=31)
pr.set_range(date_field="closed_at", start=previous_month_date, end=end_date)
pr.get_aggs()

9

In [10]:
# PRs opened in the last month:
pr = Metric("http://localhost:9200/", github_index, start_date, end_date)
pr.add_query({"pull_request":"true"})
pr.get_cardinality("id")
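# May has 31 days
previous_month_date = end_date - timedelta(days=31)
# NOTE: this range filters on closed_at, exactly as in the closed-PR cell above,
# which would explain the identical result; for opened PRs the default date field
# (as used in the opened-issues cell) is probably what is intended
pr.set_range(date_field="closed_at", start=previous_month_date, end=end_date)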
pr.get_aggs()

9

In [11]:
# Percentile of time to close (in days) for closed PRs
PR = Metric("http://localhost:9200/", github_index, start_date, end_date)
PR.add_query({"pull_request":"true"})
PR.is_closed()
PR.get_percentile("time_to_close_days")
# May has 31 days
previous_month_date = end_date - timedelta(days=31)
PR.set_range(start=previous_month_date, end=end_date)
PR.get_aggs()

2.059999942779541

In [12]:
# Percentile of time to close (in days) for closed issues
issues = Metric("http://localhost:9200/", github_index, start_date, end_date)
issues.add_query({"pull_request":"false"})
issues.is_closed()
issues.get_percentile("time_to_close_days")
# May has 31 days
previous_month_date = end_date - timedelta(days=31)
issues.set_range(start=previous_month_date, end=end_date)
issues.get_aggs()

There is no output for the cell above because the result is None.
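To investigate a `None` result like this one, it can help to look at the kind of request that might sit behind the cell. The sketch below only builds a request body in the Elasticsearch query DSL and sends nothing; it is not what `Metric` actually generates. The `state` field, the `grimoire_creation_date` default and the helper name `percentile_query` are assumptions.

```python
from datetime import datetime, timedelta


def percentile_query(start, end, date_field="grimoire_creation_date", percent=50):
    """Build a percentiles request body for closed issues (hypothetical helper)."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "query": {"bool": {"filter": [
            {"term": {"pull_request": "false"}},
            {"term": {"state": "closed"}},  # assumed field name for the closed filter
            {"range": {date_field: {"gte": start.isoformat(), "lte": end.isoformat()}}},
        ]}},
        "aggs": {"0": {"percentiles": {"field": "time_to_close_days", "percents": [percent]}}},
    }


end = datetime(2018, 6, 1)
body = percentile_query(end - timedelta(days=31), end)
print(body["aggs"])
```

If no closed issue falls inside the date range, the aggregation has nothing to compute a percentile from, which would be consistent with the empty result above.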

### Communication Channels

Nothing here, because all communication for git and GitHub happens via issues and PRs.

### Project Activities

In [13]:
# number of commits, then number of distinct authors, by month
commits = Metric("http://localhost:9200/", git_index, start_date, end_date)
commits.set_range(start=start_date, end=end_date)
commits.get_cardinality("hash").by_period()
print(pd.DataFrame(commits.get_ts()))
commits.get_cardinality("author_uuid").by_period()
print(pd.DataFrame(commits.get_ts()))

                        date  value      unixtime
0   2015-01-01T00:00:00.000Z      0  1.420070e+09
1   2015-02-01T00:00:00.000Z      0  1.422749e+09
2   2015-03-01T00:00:00.000Z      0  1.425168e+09
3   2015-04-01T00:00:00.000Z      0  1.427846e+09
4   2015-05-01T00:00:00.000Z      0  1.430438e+09
5   2015-06-01T00:00:00.000Z      0  1.433117e+09
6   2015-07-01T00:00:00.000Z      0  1.435709e+09
7   2015-08-01T00:00:00.000Z      0  1.438387e+09
8   2015-09-01T00:00:00.000Z      0  1.441066e+09
9   2015-10-01T00:00:00.000Z      0  1.443658e+09
10  2015-11-01T00:00:00.000Z      0  1.446336e+09
11  2015-12-01T00:00:00.000Z      0  1.448928e+09
12  2016-01-01T00:00:00.000Z      0  1.451606e+09
13  2016-02-01T00:00:00.000Z      3  1.454285e+09
14  2016-03-01T00:00:00.000Z    311  1.456790e+09
15  2016-04-01T00:00:00.000Z     74  1.459469e+09
16  2016-05-01T00:00:00.000Z     14  1.462061e+09
17  2016-06-01T00:00:00.000Z     40  1.464739e+09
18  2016-07-01T00:00:00.000Z     24  1.467331e+09


In [14]:
type(commits.get_ts()['date'][0])

str
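Because the dates come back as strings, they need to be parsed before any time-based resampling or plotting. A minimal sketch with pandas, assuming the time series is a dict of equal-length lists like the one printed in the previous cell (the two rows are copied from that output):

```python
import pandas as pd

# two rows in the same shape as the time series printed earlier
ts = {
    "date": ["2015-01-01T00:00:00.000Z", "2015-02-01T00:00:00.000Z"],
    "value": [0, 0],
}

df = pd.DataFrame(ts)
# parse the ISO 8601 strings into timezone-aware timestamps (UTC)
df["date"] = pd.to_datetime(df["date"], utc=True)
df = df.set_index("date")
print(df.index.dtype)
```

With a datetime index in place, operations such as `resample` become available on the series.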

### Community

In [15]:
# number of distinct commit authors by month
commits = Metric("http://localhost:9200/", git_index, start_date, end_date)
commits.set_range(start=start_date, end=end_date)
commits.get_cardinality("author_uuid").by_period()
print(pd.DataFrame(commits.get_ts()))

                        date  value      unixtime
0   2015-01-01T00:00:00.000Z      0  1.420070e+09
1   2015-02-01T00:00:00.000Z      0  1.422749e+09
2   2015-03-01T00:00:00.000Z      0  1.425168e+09
3   2015-04-01T00:00:00.000Z      0  1.427846e+09
4   2015-05-01T00:00:00.000Z      0  1.430438e+09
5   2015-06-01T00:00:00.000Z      0  1.433117e+09
6   2015-07-01T00:00:00.000Z      0  1.435709e+09
7   2015-08-01T00:00:00.000Z      0  1.438387e+09
8   2015-09-01T00:00:00.000Z      0  1.441066e+09
9   2015-10-01T00:00:00.000Z      0  1.443658e+09
10  2015-11-01T00:00:00.000Z      0  1.446336e+09
11  2015-12-01T00:00:00.000Z      0  1.448928e+09
12  2016-01-01T00:00:00.000Z      0  1.451606e+09
13  2016-02-01T00:00:00.000Z      1  1.454285e+09
14  2016-03-01T00:00:00.000Z     27  1.456790e+09
15  2016-04-01T00:00:00.000Z      9  1.459469e+09
16  2016-05-01T00:00:00.000Z      8  1.462061e+09
17  2016-06-01T00:00:00.000Z      3  1.464739e+09
18  2016-07-01T00:00:00.000Z      3  1.467331e+09
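The overview asks for the average number of developers per month, by quarter. Given a monthly series like the one above, one way to get there is to resample the monthly values to quarters and take the mean. This is a sketch using pandas, not part of `new_functions`; the six values are the 2016-01 to 2016-06 rows from the table above.

```python
import pandas as pd

monthly = pd.DataFrame({
    "date": [
        "2016-01-01T00:00:00.000Z", "2016-02-01T00:00:00.000Z", "2016-03-01T00:00:00.000Z",
        "2016-04-01T00:00:00.000Z", "2016-05-01T00:00:00.000Z", "2016-06-01T00:00:00.000Z",
    ],
    "value": [0, 1, 27, 9, 8, 3],
})
monthly["date"] = pd.to_datetime(monthly["date"], utc=True)

# "QS" groups the months into quarters labelled by the quarter's first day
quarterly = monthly.set_index("date")["value"].resample("QS").mean()
print(quarterly)
```

Note that this averages the monthly distinct-author counts; it does not count distinct authors over the whole quarter, which would need a separate query.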


In [16]:
# Top committers in the previous month:
authors = Metric("http://localhost:9200/", git_index, start_date, end_date)
previous_month_date = end_date - timedelta(days=31)
authors.set_range(start=previous_month_date, end=end_date)
authors.get_terms(field="author_name")
authors.get_results()

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5},
 'aggregations': {'0': {'buckets': [{'doc_count': 5, 'key': 'Aman Deep Singh'},
    {'doc_count': 1, 'key': 'AdityaDaflapurkar'},
    {'doc_count': 1, 'key': 'DKE'},
    {'doc_count': 1, 'key': 'tbcdebug'}],
   'doc_count_error_upper_bound': 0,
   'sum_other_doc_count': 0}},
 'hits': {'hits': [], 'max_score': 0.0, 'total': 8},
 'timed_out': False,
 'took': 1}
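The raw response is verbose; the part of interest is the list of buckets under `aggregations`. A small sketch of flattening it into a table with pandas, using the buckets from the output above (the column names `author` and `commits` are chosen here for readability and are not part of the response):

```python
import pandas as pd

# trimmed copy of the response shown above
result = {"aggregations": {"0": {"buckets": [
    {"doc_count": 5, "key": "Aman Deep Singh"},
    {"doc_count": 1, "key": "AdityaDaflapurkar"},
    {"doc_count": 1, "key": "DKE"},
    {"doc_count": 1, "key": "tbcdebug"},
]}}}

buckets = result["aggregations"]["0"]["buckets"]
top = pd.DataFrame(buckets).rename(columns={"key": "author", "doc_count": "commits"})
top = top[["author", "commits"]]
print(top)
```

The same approach should work for any terms aggregation that returns `key` and `doc_count` pairs, such as the organizations query.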

In [17]:
# Top committing orgs in the previous month:
orgs = Metric("http://localhost:9200/", git_index, start_date, end_date)
previous_month_date = end_date - timedelta(days=31)
orgs.set_range(start=previous_month_date, end=end_date)
orgs.get_terms(field="author_org_name")
orgs.get_results()

{'_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5},
 'aggregations': {'0': {'buckets': [{'doc_count': 8, 'key': 'Unknown'}],
   'doc_count_error_upper_bound': 0,
   'sum_other_doc_count': 0}},
 'hits': {'hits': [], 'max_score': 0.0, 'total': 8},
 'timed_out': False,
 'took': 1}

### Process

In [26]:
# Issues closed/ issues created

closed_issues = Metric("http://localhost:9200/", github_index, start_date, end_date)
closed_issues.add_query({"pull_request":"false"})
closed_issues.set_range(start=start_date, end=end_date)
closed_issues.is_closed()
closed_issues.get_cardinality("id").by_period()
closed_ts = closed_issues.get_ts()

opened_issues = Metric("http://localhost:9200/", github_index, start_date, end_date)
opened_issues.add_query({"pull_request":"false"})
opened_issues.set_range(start=start_date, end=end_date)
opened_issues.get_cardinality("id").by_period()
opened_ts = opened_issues.get_ts()

print(pd.DataFrame(calculate_bmi(closed_ts, opened_ts)))

     Period  Closed/Submitted
0   2015-01          0.000000
1   2015-02          0.000000
2   2015-03          0.000000
3   2015-04          0.000000
4   2015-05          0.000000
5   2015-06          0.000000
6   2015-07          0.000000
7   2015-08          0.000000
8   2015-09          0.000000
9   2015-10          0.000000
10  2015-11          0.000000
11  2015-12          0.000000
12  2016-01          0.000000
13  2016-02          1.000000
14  2016-03          0.923077
15  2016-04          0.750000
16  2016-05          1.000000
17  2016-06          1.000000
18  2016-07          0.000000
19  2016-08          1.000000
20  2016-09          1.000000
21  2016-10          1.000000
22  2016-11          0.500000
23  2016-12          0.000000
24  2017-01          0.666667
25  2017-02          0.666667
26  2017-03          0.833333
27  2017-04          0.933333
28  2017-05          1.000000
29  2017-06          0.714286
30  2017-07          1.000000
31  2017-08          0.727273
32  2017-0

In [27]:
# PRs closed/ PRs submitted

closed_pr = Metric("http://localhost:9200/", github_index, start_date, end_date)
closed_pr.add_query({"pull_request":"true"})
closed_pr.set_range(start=start_date, end=end_date)
closed_pr.is_closed()
closed_pr.get_cardinality("id").by_period()
closed_ts = closed_pr.get_ts()

opened_pr = Metric("http://localhost:9200/", github_index, start_date, end_date)
opened_pr.add_query({"pull_request":"true"})
opened_pr.set_range(start=start_date, end=end_date)
opened_pr.get_cardinality("id").by_period()
opened_ts = opened_pr.get_ts()

print(pd.DataFrame(calculate_bmi(closed_ts, opened_ts)))

     Period  Closed/Submitted
0   2015-01          0.000000
1   2015-02          0.000000
2   2015-03          0.000000
3   2015-04          0.000000
4   2015-05          0.000000
5   2015-06          0.000000
6   2015-07          0.000000
7   2015-08          0.000000
8   2015-09          0.000000
9   2015-10          0.000000
10  2015-11          0.000000
11  2015-12          0.000000
12  2016-01          0.000000
13  2016-02          0.000000
14  2016-03          1.000000
15  2016-04          1.000000
16  2016-05          1.000000
17  2016-06          1.000000
18  2016-07          1.000000
19  2016-08          1.000000
20  2016-09          1.000000
21  2016-10          1.000000
22  2016-11          1.000000
23  2016-12          1.000000
24  2017-01          1.000000
25  2017-02          1.000000
26  2017-03          1.000000
27  2017-04          1.000000
28  2017-05          1.000000
29  2017-06          1.000000
30  2017-07          1.000000
31  2017-08          1.000000
32  2017-0

In [30]:
# average days to close a review (PR), by month
closed_pr = Metric("http://localhost:9200/", github_index, start_date, end_date)
closed_pr.add_query({"pull_request":"true"})
closed_pr.set_range(start=start_date, end=end_date)
closed_pr.is_closed()
closed_pr.get_average("time_to_close_days").by_period()
print(pd.DataFrame(closed_pr.get_ts()))

                        date       value      unixtime
0   2015-01-01T00:00:00.000Z         NaN  1.420070e+09
1   2015-02-01T00:00:00.000Z         NaN  1.422749e+09
2   2015-03-01T00:00:00.000Z         NaN  1.425168e+09
3   2015-04-01T00:00:00.000Z         NaN  1.427846e+09
4   2015-05-01T00:00:00.000Z         NaN  1.430438e+09
5   2015-06-01T00:00:00.000Z         NaN  1.433117e+09
6   2015-07-01T00:00:00.000Z         NaN  1.435709e+09
7   2015-08-01T00:00:00.000Z         NaN  1.438387e+09
8   2015-09-01T00:00:00.000Z         NaN  1.441066e+09
9   2015-10-01T00:00:00.000Z         NaN  1.443658e+09
10  2015-11-01T00:00:00.000Z         NaN  1.446336e+09
11  2015-12-01T00:00:00.000Z         NaN  1.448928e+09
12  2016-01-01T00:00:00.000Z         NaN  1.451606e+09
13  2016-02-01T00:00:00.000Z         NaN  1.454285e+09
14  2016-03-01T00:00:00.000Z    1.375500  1.456790e+09
15  2016-04-01T00:00:00.000Z    0.282105  1.459469e+09
16  2016-05-01T00:00:00.000Z    1.386000  1.462061e+09
17  2016-0

In [31]:
# percentile of days to close a review (PR), by month
closed_pr = Metric("http://localhost:9200/", github_index, start_date, end_date)
closed_pr.add_query({"pull_request":"true"})
closed_pr.set_range(start=start_date, end=end_date)
closed_pr.is_closed()
closed_pr.get_percentile("time_to_close_days").by_period()
print(pd.DataFrame(closed_pr.get_ts()))

                        date       value      unixtime
0   2015-01-01T00:00:00.000Z         NaN  1.420070e+09
1   2015-02-01T00:00:00.000Z         NaN  1.422749e+09
2   2015-03-01T00:00:00.000Z         NaN  1.425168e+09
3   2015-04-01T00:00:00.000Z         NaN  1.427846e+09
4   2015-05-01T00:00:00.000Z         NaN  1.430438e+09
5   2015-06-01T00:00:00.000Z         NaN  1.433117e+09
6   2015-07-01T00:00:00.000Z         NaN  1.435709e+09
7   2015-08-01T00:00:00.000Z         NaN  1.438387e+09
8   2015-09-01T00:00:00.000Z         NaN  1.441066e+09
9   2015-10-01T00:00:00.000Z         NaN  1.443658e+09
10  2015-11-01T00:00:00.000Z         NaN  1.446336e+09
11  2015-12-01T00:00:00.000Z         NaN  1.448928e+09
12  2016-01-01T00:00:00.000Z         NaN  1.451606e+09
13  2016-02-01T00:00:00.000Z         NaN  1.454285e+09
14  2016-03-01T00:00:00.000Z    0.465000  1.456790e+09
15  2016-04-01T00:00:00.000Z    0.200000  1.459469e+09
16  2016-05-01T00:00:00.000Z    0.600000  1.462061e+09
17  2016-0