In [1]:
import pygal as pg

%load_ext cypher
%config CypherMagic.uri='http://neo4j:neo@localhost:7474/db/data'

# Analysing organizational structures with Software Analytics

## Question

<center>What are unknown spots in the source code, i.e. source code that was never touched by any of the current developers?</center>

## Data Sources

* Java structures of the system scanned by jQAssistant and available in Neo4j
* Git history of the system scanned by jQAssistant and available in Neo4j
* List of current developers (provided manually)

## Heuristics

* The committer can be correctly identified by the author's mail address
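
The duplicate handled in the implementation below differs from the canonical address only in letter case, so one way to find candidates for this heuristic is to compare addresses case-insensitively. The following is a minimal sketch in plain Python, assuming the committers are available as `(name, email)` pairs; the function names `group_by_email` and `find_duplicates` are hypothetical and not part of this notebook. Strictly speaking, the local part of a mail address may be case-sensitive, so the result should be treated as a list of candidates for manual review.

```python
from collections import defaultdict


def group_by_email(committers):
    """Group (name, email) pairs by the stripped, lower-cased email address."""
    groups = defaultdict(list)
    for name, email in committers:
        groups[email.strip().lower()].append((name, email))
    return dict(groups)


def find_duplicates(committers):
    """Return only the normalized addresses that occur with more than one spelling."""
    return {
        key: entries
        for key, entries in group_by_email(committers).items()
        if len({email for _, email in entries}) > 1
    }
```

Each key of the result is a normalized address, and its value lists the spellings that could be merged into one `Author` node.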

## Validation

* Tabular overview of all committers
* Tabular overview of all classes that no one on the current team has worked on yet

## Implementation

In [2]:
%%cypher
// Duplicate removal (manual post processing)
WITH [
  ["Stephan Pirnbaum", "stephan.pirnbaum@googlemail.com", "Stephan.Pirnbaum@googlemail.com"]
] AS authors
UNWIND authors AS duplicateAuthor
MATCH (author:Author{email: duplicateAuthor[1]}),
      (duplicate:Author{email: duplicateAuthor[2]})
SET author.name = duplicateAuthor[0]      
WITH author, duplicate
MATCH (duplicate)-[:COMMITTED]->(c:Commit)
MERGE (author)-[:COMMITTED]->(c)
DETACH DELETE duplicate
RETURN author.name AS AuthorName, author.email AS AuthorMail, count(DISTINCT duplicate) AS Duplicates

1 nodes deleted.
1 properties set.
2 relationships created.
3 relationship deleted.


AuthorName,AuthorMail,Duplicates
Stephan Pirnbaum,stephan.pirnbaum@googlemail.com,1


In [3]:
%%cypher
//Every :Git:Commit with more than one parent commit is labeled as Merge.
MATCH  (c:Commit)-[:HAS_PARENT]->(p:Commit)
WITH   c, count(p) as parents
WHERE  parents > 1
SET    c:Merge
RETURN count(c) as MergeCommits

1 rows affected.


MergeCommits
0


In [4]:
%%cypher
//Copies the relativePath property of :Git:File nodes to the property fileName, which is indexed and allows faster lookups.
MATCH  (f:Git:File)
SET    f.fileName = f.relativePath
RETURN count(f) as Files

70 properties set.


Files
70


In [5]:
%%cypher
//A HAS_SOURCE relationship is created between a :Java:Type and a :Git:File if their source file names match.
MATCH  (p:Java:Package)-[:CONTAINS]->(t:Java:Type)
WITH   t, p.fileName + "/" + t.sourceFileName as sourceFileName // e.g. "/org/junit/Test.java"
MATCH  (f:Git:File)
WHERE  f.fileName ends with sourceFileName
MERGE  (t)-[h:HAS_SOURCE]->(f)
RETURN count(h) as Matches

22 relationships created.


Matches
22


In [6]:
# Get all committers
committers = %cypher \
    MATCH  (author:Author) \
    RETURN author.name AS Name, author.email AS EMail

1 rows affected.


In [7]:
# Get all types that were never changed by any author of the current team
unknownTypes = %cypher \
    WITH     ["stephan.pirnbaum@googlemail.com"] AS currentAuthors \
    MATCH    (c:Commit)-[:CONTAINS_CHANGE]->(:Change)-[]->(f:Git:File), \
             (f)<-[:HAS_SOURCE]-(t:Type:Java), \
             (a:Author)-[:COMMITTED]->(c) \
    WHERE    NOT c:Merge \
    WITH     t, collect(DISTINCT a.email) AS authors, currentAuthors \
    WHERE    none(a IN currentAuthors WHERE a in authors) \
    RETURN   t.fqn AS Type

0 rows affected.


## Results

The following is a list of all committers.

In [8]:
committers

Name,EMail
Stephan Pirnbaum,stephan.pirnbaum@googlemail.com


The following is a list of classes that were never changed by any of the current developers.

In [9]:
unknownTypes

Type


## Next Steps

* It was found that there are no locations that are unknown to the current developers
  * This needs to be monitored whenever someone is about to leave so that proper handover sessions can be planned
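
The monitoring mentioned above could be approximated by re-running the analysis with the departing developer removed from the list of current authors. The following is a rough sketch in plain Python, assuming the authors per type have already been exported into a dictionary that maps a type's fully qualified name to the set of author mail addresses; that dictionary layout and the function names `unknown_types` and `newly_unknown_after_departure` are assumptions for illustration, not part of this notebook.

```python
def unknown_types(type_authors, current_authors):
    """Return the types whose author set shares no member with current_authors."""
    current = set(current_authors)
    return sorted(
        type_name
        for type_name, authors in type_authors.items()
        if current.isdisjoint(authors)
    )


def newly_unknown_after_departure(type_authors, current_authors, leaving):
    """Return the types that become unknown once `leaving` is no longer on the team."""
    remaining = set(current_authors) - {leaving}
    before = set(unknown_types(type_authors, current_authors))
    after = set(unknown_types(type_authors, remaining))
    return sorted(after - before)
```

The types returned by `newly_unknown_after_departure` are the ones a handover session would need to cover.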