## How to compute Code Commits

Clone this repository and change directory to where this notebook resides:

```
$ git clone https://github.com/chaoss/wg-gmd
$ cd wg-gmd/examples
```

Install Perceval and Jupyter in a Python 3 environment:

```
$ pip install perceval
$ pip install jupyter
```

Then launch Jupyter from the command line:

```
$ jupyter notebook
```

In the browser, load this notebook and you are ready to go.

## Retrieving data from the data source

First, run some Perceval code on a repository to produce a file, `git-commits.json`, with a JSON document for each of its commits, one per line. This example uses the `elastic/elasticsearch-docker` repository: change `repo_url` to get data from your preferred repo.

In [98]:
import json

from perceval.backends.core.git import Git

# url for the git repo to analyze (uncomment the line you want to analyze)
#repo_url = 'http://github.com/chaoss/grimoirelab-perceval'
repo_url = 'https://github.com/elastic/elasticsearch-docker'
#repo_url = 'https://github.com/git/git.git'
# directory for letting Perceval clone the git repo (make sure it is empty)
repo_dir = '/tmp/git_repo'

# create a Git object, pointing to repo_url, using repo_dir for cloning
repo = Git(uri=repo_url, gitpath=repo_dir)

number = 0
with open('git-commits.json', 'w') as commits_file:
    # fetch all commits as an iterator, and dump them to a file, one per line
    for commit in repo.fetch():
        json.dump(commit, commits_file)
        commits_file.write('\n')
        number += 1
print("Commits read:", number)

Commits read: 335


Now, let's prepare a dictionary, `commits`, with all the commits retrieved,
by reading the file we just produced. Its keys are commit hashes.
This is the dictionary we will use later to compute the metrics.

We do it this way, instead of building the dictionary directly from
Perceval's output, because it lets you start from the dumped file
if you don't want to run Perceval again.

In [99]:
commits = {}
with open('git-commits.json') as commits_file:
    for line in commits_file:
        commit = json.loads(line)
        commits[commit['data']['commit']] = commit
print("Total number of commits:", len(commits))

Total number of commits: 335
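As a minimal sketch of this dump-and-reload round trip, the following writes two hand-made items to a temporary file and reads them back. The items are made up for illustration and carry only the `data.commit` field used above; real Perceval items have many more fields. The helper name `load_commits` and the file name `toy-commits.json` are assumptions for this example, not part of Perceval.

```python
import json
import os
import tempfile


def load_commits(path):
    """Read one JSON document per line and index the items by commit hash."""
    commits = {}
    with open(path) as commits_file:
        for line in commits_file:
            item = json.loads(line)
            commits[item['data']['commit']] = item
    return commits


# Hand-made items (NOT real Perceval output): only the field we need.
items = [{'data': {'commit': 'aaa111'}}, {'data': {'commit': 'bbb222'}}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy-commits.json')
    # Dump the items, one JSON document per line
    with open(path, 'w') as commits_file:
        for item in items:
            json.dump(item, commits_file)
            commits_file.write('\n')
    # Read them back into a dictionary keyed by commit hash
    toy_commits = load_commits(path)

print("Total number of commits:", len(toy_commits))
```

Because the dictionary is keyed by hash, an item that appears twice in the file is counted only once.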


## Computing

### Naive version

Now, let's compute the metric the easiest way: just count the commits in the `commits` dictionary we built from the file:

In [100]:
code_commits = len(commits)
print("Code Commits (naive):", code_commits)

Code Commits (naive): 335


### Ignoring empty commits

Empty commits are those that touch no file (for example, most merge commits). We can find them by looking at the list of files involved in the commit: a commit is empty when none of its files has an 'action' field ('action' identifies the action performed on the file, such as modification or creation). The code below counts the commits that have at least one file with an 'action' field:

In [101]:
code_commits = 0
for commit in commits.values():
    for file in commit['data']['files']:
        if 'action' in file:
            code_commits += 1
            break
                
print("Code Commits (non-empty):", code_commits)

Code Commits (non-empty): 326
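The same check can be written as a small predicate, which makes it easy to reuse. This is only a sketch: the function name `is_non_empty` is an assumption, and the items are hand-made with just the fields the check needs.

```python
def is_non_empty(commit):
    """True if at least one file entry of the commit has an 'action' field."""
    return any('action' in file for file in commit['data']['files'])


# Hand-made items (NOT real Perceval output).
toy_commits = {
    'aaa111': {'data': {'files': [{'file': 'README.md', 'action': 'M'}]}},
    'bbb222': {'data': {'files': []}},                      # touches no file
    'ccc333': {'data': {'files': [{'file': 'setup.py'}]}},  # no 'action' field
}

# Booleans add up as 0 and 1, so sum() counts the commits passing the check
non_empty = sum(is_non_empty(commit) for commit in toy_commits.values())
print("Code Commits (non-empty):", non_empty)
```

Note that `any()` over an empty list is `False`, so a commit with no files at all is treated as empty, as in the loop above.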


### Only non-merge commits

Now, instead of filtering out empty commits, let's filter out merge commits. These involve no real coding, only the merging of commits from different branches (for example, after a pull request). The code below identifies them by the presence of a 'Merge' field in the commit data:

In [102]:
code_commits = 0
for commit in commits.values():
    if 'Merge' not in commit['data']:
        code_commits += 1
                
print("Code Commits (non-merge):", code_commits)

Code Commits (non-merge): 324
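A merge commit in git has more than one parent, so a second way to detect it is to look at the 'parents' list. The sketch below compares both criteria on hand-made items. The function names are assumptions for this example, and the value given to 'Merge' is made up, since only its presence matters here.

```python
def has_merge_field(commit):
    """Criterion used above: the commit data has a 'Merge' field."""
    return 'Merge' in commit['data']


def has_several_parents(commit):
    """Alternative criterion: the commit has more than one parent."""
    return len(commit['data']['parents']) > 1


# Hand-made items (NOT real Perceval output).
toy_commits = {
    'aaa111': {'data': {'parents': []}},
    'bbb222': {'data': {'parents': ['aaa111']}},
    'ccc333': {'data': {'parents': ['aaa111']}},
    'ddd444': {'data': {'parents': ['bbb222', 'ccc333'],
                        'Merge': 'bbb222 ccc333'}},
}

by_field = sum(not has_merge_field(c) for c in toy_commits.values())
by_parents = sum(not has_several_parents(c) for c in toy_commits.values())
print("Code Commits (non-merge, 'Merge' field):", by_field)
print("Code Commits (non-merge, parents):", by_parents)
```

On this toy data both criteria agree. Checking them against each other on a real repository is a cheap sanity test.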


### Only commits in master

In this case, we will consider only commits in the master branch, that is, those reachable from the head of master by following parent links:

In [103]:
# Find commits in master branch.
# Start by adding the head of master to an empty todo set. Then loop until
# the todo set is empty: take a commit from the todo set, add it to the
# master set, and go backwards (finding parents), adding to the todo set
# those parents not yet in the master set.

todo = set()
for commit_id, commit in commits.items():
    if 'HEAD -> refs/heads/master' in commit['data']['refs']:
        todo.add(commit_id)

master = set()
while len(todo) > 0:
    current = todo.pop()
    master.add(current)
    for parent in commits[current]['data']['parents']:
        if parent not in master:
            todo.add(parent)
    
code_commits = len(master)
    
print("Code Commits (master branch):", code_commits)

Code Commits (master branch): 245
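The traversal above can be packaged as a function that works for any set of starting commits. The following is a sketch on a hand-made history: the name `reachable_from` and the `refs/heads/feature` branch are assumptions for illustration, and the items carry only the 'parents' and 'refs' fields.

```python
def reachable_from(heads, commits):
    """Return the ids of all commits reachable from heads via parent links."""
    seen = set()
    todo = list(heads)
    while todo:
        current = todo.pop()
        if current in seen:
            continue  # already visited through another path
        seen.add(current)
        todo.extend(commits[current]['data']['parents'])
    return seen


# Hand-made history (NOT real Perceval output):
#   aaa111 <- bbb222 <- ddd444 (head of master)
#   aaa111 <- ccc333           (head of a feature branch)
toy_commits = {
    'aaa111': {'data': {'parents': [], 'refs': []}},
    'bbb222': {'data': {'parents': ['aaa111'], 'refs': []}},
    'ccc333': {'data': {'parents': ['aaa111'],
                        'refs': ['refs/heads/feature']}},
    'ddd444': {'data': {'parents': ['bbb222'],
                        'refs': ['HEAD -> refs/heads/master']}},
}

heads = [commit_id for commit_id, commit in toy_commits.items()
         if 'HEAD -> refs/heads/master' in commit['data']['refs']]
toy_master = reachable_from(heads, toy_commits)
print("Code Commits (master branch):", len(toy_master))
```

Here `ccc333` is left out because no parent link leads to it from the head of master. The `seen` check also keeps the loop from revisiting commits that are reachable through several paths, as happens after a merge.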


### Only non-empty commits in master

Now, let's consider only the non-empty commits found in the master branch. Run the next snippet after the previous one, so that `master` holds the right collection of commits.

In [104]:
code_commits = 0
for commit_id in master:
    commit = commits[commit_id]
    for file in commit['data']['files']:
        if 'action' in file:
            code_commits += 1
            break

print("Code Commits (non-empty in master branch):", code_commits)

Code Commits (non-empty in master branch): 237
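Since each variant of the metric is a filter over the same dictionary, the filters can be combined freely. The sketch below shows one possible way to do it, with a hypothetical `count_commits` helper that takes an optional set of commit ids and a list of predicates. The data is hand-made, and `toy_master` stands in for the `master` set computed earlier.

```python
def is_non_empty(commit):
    """True if at least one file entry of the commit has an 'action' field."""
    return any('action' in file for file in commit['data']['files'])


def count_commits(commits, ids=None, predicates=()):
    """Count commits, restricted to ids if given, that satisfy all predicates."""
    selected = commits if ids is None else ids
    return sum(all(check(commits[commit_id]) for check in predicates)
               for commit_id in selected)


# Hand-made items (NOT real Perceval output).
toy_commits = {
    'aaa111': {'data': {'files': [{'file': 'README.md', 'action': 'A'}]}},
    'bbb222': {'data': {'files': []}},
    'ccc333': {'data': {'files': [{'file': 'setup.py', 'action': 'M'}]}},
}
toy_master = {'aaa111', 'bbb222'}  # stand-in for the master set

naive = count_commits(toy_commits)
non_empty = count_commits(toy_commits, predicates=[is_non_empty])
non_empty_master = count_commits(toy_commits, ids=toy_master,
                                 predicates=[is_non_empty])
print("Code Commits (naive):", naive)
print("Code Commits (non-empty):", non_empty)
print("Code Commits (non-empty in master branch):", non_empty_master)
```

With no predicates, `all()` over an empty sequence is `True`, so the helper falls back to the naive count.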
