Source code and communication analysis
HTML JavaScript Python CSS PHP
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.




pygments -
Version 2.0.1
sudo pip install pygments

Git -
installed and available from the commandline

Neo4j verion 2.0.X (tested with 2.04)

Make sure this is uncommented:
in conf/

Django (Requires 1.6 or greater)

sudo pip install django


sudo pip install python-magic


Install this specific version.

sudo python install

sudo pip install django_extensions

sudo pip install werkzeug

Python Clang bindings

Add to /etc/apt/sources.list file:

#for version 3.5 of LLVM/CLANG
deb llvm-toolchain-utopic main
deb-src llvm-toolchain-utopic main

sudo apt-key adv --keyserver --recv-keys 15CF4D18AF4F7421

sudo apt-get update

sudo apt-get install clang-3.5 python-clang-3.5 libclang-3.5-dev

Setup a MySQL user and database (see file)

This sets up the database:
python syncdb

Plain server:
python runserver

Server with werkzeug debugger:
python runserver_plus

In, make sure the following variables are set.

VERSION_CONTROL_REPOS - sourcecode repos, make sure read permissions are set
EXPONENTIAL_DECAY - numerical decimal value between 0 and 1
STATICFILES_DIRS - location where CSS, javascript, and images reside
RECURSIVE_LIMIT - integer value defining how far back in history we go before backing off
NEO4J_DATABASE - Neo4j database location
MBOX_ARCHIVES - this is where the processing step looks for mbox files
SECRET_KEY - make sure to change this value as it's important for security reasons

There are additional settings in this file that are optional that you probably
will want to change as well.

Deployment (Apache2):

Debian/Ubuntu package: libapache2-mod-wsgi

/tmp/ directory setup and write permission.


Make sure you run the scripts (see processing directory) in the following order.

Neo4j should be stopped.  Database is empty.

*Make sure the database graph.db folder does not exist, sometimes this causes errors.*

-s is the hash you want to start processing from.  This script runs clang
and saves all subsequent data about the technical structure to files in /tmp/devknowledge/
These are later processed by the next script.  The reason these are split out is because
there was some sort of weird crashing that happened when trying to use both the python
clang bindings and Neo4j at the same time.  This runs on a single thread.

python -s 0a8c370a6dcd470e1c84191e1a7ac1fb1fc7a78e

-s is the hash you want to start processing from.  Make sure it's the same as the one you
picked above.  This runs on a single thread.

python -s 0a8c370a6dcd470e1c84191e1a7ac1fb1fc7a78e

This saves files and developer names/line knowledge to the database.  This is
multi-threaded and should be run on a fast machine with many cores.


Parses through #includes to find dependencies.  This should probably be switched
over to using Clang.


This parses a mbox file with the same name as the project and saves the resulting
communication information.


Now start the database and it will upgrade from 1.9 to 2.0.X

Lastly, run the processing to calculate congruence.  This may take awhile.


The final results will be here:



django-extensions has a profiling tool built-in
python runprofileserver --kcachegrind --prof-path=/home/username/Desktop/profile-data/
Then use KCacheGrind to open the file.

Or to debug the processing file, use cProfile built-in to Python.

python -m cProfile -o

Then you can use something like runsnakerun to visualize.

sudo aptitude install runsnakerun OR sudo pip install runsnakerun


Python embedded bindings (currently dead link as this is no longer maintained)

sudo aptitude install python-jpype

The Python embedded bindings need to be updated for Neo4j 2.0

Make sure to install this specific version.

sudo python install

Make sure to set the JAVA_HOME variable, I set it via the .bashrc file
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
To check the value: echo $JAVA_HOME

Testing and counts

Count of all nodes
start a=node(*) return count(a);

Count of all relationships
start a=relationship(*) return count(a);

Number of developers
start a=node(*) where has( return count(a);

Number of individual files
start a=node(*) where has(a.filename) return count(a);

Number of individual functions
start a=node(*) where has(a.function) return count(a);

Number of mailing list threads
start a=node(*) where has(a.subject) return count(a);

Number of commit hash objects:
start a=node(*) where has(a.hash) return count(a);

Number of include relationships
start a=node(*) match a-[r:include]->() return count(r);

Number of depends relationships
start a=node(*) match a-[r:depends]->() return count(r);

Number of calls relationships
start a=node(*) match a-[r:calls]->() return count(r);

Number of has relationships
start a=node(*) match a-[r:has]->() return count(r);

Number of knowledge lines used for calculating expertise
start a=node(*) match a-[r:knowledge]->() return count(r);

Number of expertise calculations
start a=node(*) match a-[r:expertise]->() return count(r);

Number of file_score edges
start a=node(*) match a-[r:file_score]->() return count(r);

Number of interpersonal_score edges
start a=node(*) match a-[r:interpersonal_score]->() return count(r);

Number of communication edges
start a=node(*) match a-[r:communication]->() return count(r);

Number of impact edges
start a=node(*) match a-[r:impact]->() return count(r);