From [JarAnalyzer](http://www.kirkk.com/main/Main/JarAnalyzer)

**Number of Classes**  
The number of concrete and abstract classes (and interfaces) in the jar is an indicator of the extensibility of the jar.

**Number of Packages**  
The number of packages in the jar.

**Level**  
The Level represents where in the dependency hierarchy a jar file lives. Level 1 jars are at the bottom. Level 2 jars depend on at least one Level 1 jar. Level 3 jars depend on at least one Level 2 jar. The Level of the jar, used in conjunction with Instability, gives an indication of the jar's resilience to change.
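
The Level definition above can be illustrated with a small sketch. It assumes that a jar's level is one more than the highest level among the jars it depends on, and that the dependency graph has no cycles. The `deps` dictionary, the jar names and the `jar_levels` helper are hypothetical and are not taken from JarAnalyzer itself.

```python
from functools import lru_cache


def jar_levels(deps):
    """Return {jar: level}, where level = 1 + highest level among its dependencies.

    deps maps a jar name to the list of jar names it depends on.
    Assumes the dependency graph is acyclic.
    """
    @lru_cache(maxsize=None)
    def level(jar):
        # A jar with no dependencies sits at the bottom (Level 1).
        return 1 + max((level(d) for d in deps.get(jar, ())), default=0)

    return {jar: level(jar) for jar in deps}


# Hypothetical jars, for illustration only.
deps = {
    "util.jar": [],
    "model.jar": ["util.jar"],
    "ui.jar": ["model.jar", "util.jar"],
}
levels = jar_levels(deps)
print(levels)  # {'util.jar': 1, 'model.jar': 2, 'ui.jar': 3}
```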

**Afferent Couplings**  
The number of other jars that depend upon classes within the jar is an indicator of the jar's responsibility.

**Efferent Couplings**  
The number of other jars that the classes in the jar depend upon is an indicator of the jar's independence.

**Abstractness**  
The ratio of the number of abstract classes (and interfaces) in the analyzed jar to the total number of classes in the analyzed jar.

The range for this metric is 0 to 1, with A=0 indicating a completely concrete jar and A=1 indicating a completely abstract jar.
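
As a minimal sketch of the ratio described above, the function below computes A from the two class counts. The name `compute_abstractness` is hypothetical, and returning 0.0 for a jar without classes is an assumption rather than documented JarAnalyzer behavior.

```python
def compute_abstractness(abstract_class_count, class_count):
    """A = abstract classes (and interfaces) / total classes."""
    if class_count == 0:
        # Assumed convention for an empty jar; not confirmed by JarAnalyzer.
        return 0.0
    return abstract_class_count / class_count


a_example = compute_abstractness(5, 20)
print(a_example)  # 0.25
```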

**Instability**  
The ratio of efferent coupling (Ce) to total coupling (Ce + Ca), such that I = Ce / (Ce + Ca). This metric is an indicator of the jar's resilience to change.

The range for this metric is 0 to 1, with I=0 indicating a completely stable jar and I=1 indicating a completely unstable jar.
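
The formula I = Ce / (Ce + Ca) can be sketched as follows. The name `compute_instability` is hypothetical, and treating a jar with no couplings at all as I=0 is an assumed convention, not something the JarAnalyzer description states.

```python
def compute_instability(efferent, afferent):
    """I = Ce / (Ce + Ca), with Ce = efferent and Ca = afferent couplings."""
    total = efferent + afferent
    if total == 0:
        # Assumed convention for an isolated jar.
        return 0.0
    return efferent / total


i_example = compute_instability(3, 1)
print(i_example)  # 0.75
```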

**Distance**  
The perpendicular distance of a jar from the idealized line A + I = 1, known as the main sequence. This metric is an indicator of the jar's balance between abstractness and stability.

A jar squarely on the main sequence is optimally balanced with respect to its abstractness and stability. Ideal jars are either completely abstract and stable (I=0, A=1) or completely concrete and unstable (I=1, A=0).

The range for this metric is 0 to 1, with D=0 indicating a jar that is coincident with the main sequence and D=1 indicating a jar that is as far from the main sequence as possible.
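
A minimal sketch of the distance calculation, assuming the normalized form D = |A + I - 1|, which is the form that matches the stated range of 0 to 1 (the strict geometric perpendicular distance would additionally divide by the square root of 2). Whether JarAnalyzer uses exactly this form is an assumption, and `compute_distance` is a hypothetical name.

```python
def compute_distance(abstractness, instability):
    """Normalized distance from the main sequence A + I = 1 (assumed form)."""
    return abs(abstractness + instability - 1)


print(compute_distance(1.0, 0.0))  # 0.0: abstract and stable, on the main sequence
print(compute_distance(0.0, 1.0))  # 0.0: concrete and unstable, on the main sequence
print(compute_distance(0.0, 0.0))  # 1.0: concrete and stable, as far away as possible
```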

**Unresolved Packages**  
Packages not found in any of the jars analyzed. These can be filtered from the output by specifying the packages to exclude in the Filter.properties file. Alternatively, you can include the jars containing these packages in the directory being analyzed.

These packages are excluded from all calculations, so adding the jars that contain them will change the metrics.

In [1]:
import sys

# add project root directory to python path to enable import of saapy
if ".." not in sys.path:
    sys.path.append('..')

In [2]:
import keyring
from neo4j.v1 import GraphDatabase, basic_auth
import xml.etree.ElementTree as ET

In [40]:
def jar_properties(jar):
    jar_name = jar.get("name")
    stats = jar.find("./Summary/Statistics")
    metrics = jar.find("./Summary/Metrics")
    class_count = int(stats.find("./ClassCount").text)
    abstract_class_count = int(stats.find("./AbstractClassCount").text)
    package_count = int(stats.find("./PackageCount").text)
    abstractness = float(metrics.find("./Abstractness").text)
    efferent = int(metrics.find("./Efferent").text)
    afferent = int(metrics.find("./Afferent").text)
    instability = float(metrics.find("./Instability").text)
    distance = float(metrics.find("./Distance").text)
    node_props = {
        "name": jar_name,
        "class_count": class_count,
        "abstract_class_count": abstract_class_count,
        "package_count": package_count,
        "abstractness": abstractness,
        "efferent": efferent,
        "afferent": afferent,
        "instability": instability,
        "distance": distance
    }
    return node_props

def merge_jar_node(tx, jar):
    # Merge on the jar name only and set the metrics afterwards, so that
    # re-running the import with changed metrics updates the existing node
    # instead of creating a duplicate.
    node_props = jar_properties(jar)
    query = """
    MERGE (jar:JarFile {name: {name}})
    SET jar.class_count = {class_count},
        jar.abstract_class_count = {abstract_class_count},
        jar.package_count = {package_count},
        jar.abstractness = {abstractness},
        jar.efferent = {efferent},
        jar.afferent = {afferent},
        jar.instability = {instability},
        jar.distance = {distance}
    """
    result = tx.run(query, node_props)
    return result

In [44]:
def add_package(tx, jar_name, package_name):
    query = """
    MATCH (jar:JarFile {name: {jar_name}})
    MERGE (package:JavaPackage {
        name: {package_name}
    })
    MERGE (jar)-[r:CONTAINS]->(package)
    """
    result = tx.run(query, {"jar_name": jar_name, "package_name": package_name})
    return result

def add_unresolved_dependency(tx, jar_name, package_name):
    query = """
    MATCH (jar:JarFile {name: {jar_name}})
    MERGE (package:JavaPackage {
        name: {package_name}
    })
    MERGE
    (jar)-[r:DEPENDS_UNRESOLVED]->(package)
    """
    result = tx.run(query, {"jar_name": jar_name, "package_name": package_name})
    return result

def add_packages(tx, jar):
    jar_name = jar.get("name")
    for package in jar.findall("./Summary/Packages/Package"):
        package_name = package.text
        add_package(tx, jar_name, package_name)
    for package in jar.findall("./Summary/UnresolvedDependencies/Package"):
        package_name = package.text
        add_unresolved_dependency(tx, jar_name, package_name)

In [46]:
def merge_jar_dependencies(tx, jar):
    jar_name = jar.get("name")
    out_deps = jar.findall("./Summary/OutgoingDependencies/Jar")
    for out_jar in out_deps:
        out_jar_name = out_jar.text
        query = """
        MATCH (jar:JarFile {name: {jar_name}})
        MATCH (out_jar:JarFile {name: {out_jar_name}})
        MERGE
        (jar)-[r:DEPENDS]->(out_jar)
        """
        tx.run(query, {"jar_name": jar_name, "out_jar_name": out_jar_name})
    in_deps = jar.findall("./Summary/IncomingDependencies/Jar")
    for in_jar in in_deps:
        in_jar_name = in_jar.text
        query = """
        MATCH (jar:JarFile {name: {jar_name}})
        MATCH (in_jar:JarFile {name: {in_jar_name}})
        MERGE
        (jar)-[r:DEPENDED_BY]->(in_jar)
        """
        tx.run(query, {"jar_name": jar_name, "in_jar_name": in_jar_name})

In [47]:
def merge_jar_cycles(tx, jar):
    jar_name = jar.get("name")
    cycles = jar.findall("./Summary/Cycles/Cycle")
    for cycle_jar in cycles:
        cycle_jar_name = cycle_jar.text
        query = """
        MATCH (jar:JarFile {name: {jar_name}})
        MATCH (cycle_jar:JarFile {name: {cycle_jar_name}})
        MERGE
        (jar)-[r:CYCLE]->(cycle_jar)
        """
        tx.run(query, {"jar_name": jar_name, "cycle_jar_name": cycle_jar_name})

In [44]:
neo4j_service = "test_neo4j"
neo4j_user = "neo4j"
neo4j_password = keyring.get_password(neo4j_service, neo4j_user)
neo4j_url = "bolt://localhost"
driver = GraphDatabase.driver(neo4j_url, auth=basic_auth(neo4j_user, neo4j_password))

In [38]:
# produced with JarAnalyzer 1.2
idempiere_jar_deps_xml = "../data/idempiere/idempiere-jar-deps-jaranalyzer.xml"
deps_tree = ET.parse(idempiere_jar_deps_xml)
root = deps_tree.getroot()
jars = root.findall("./Jars/Jar")

In [48]:
neo4j_session = driver.session()
tx = neo4j_session.begin_transaction()
try:
    for jar in jars:
        merge_jar_node(tx, jar)
        add_packages(tx, jar)
        merge_jar_dependencies(tx, jar)
        merge_jar_cycles(tx, jar)
except Exception:
    # roll back, then re-raise so that the failure is not silently swallowed
    tx.rollback()
    raise
else:
    tx.commit()
finally:
    neo4j_session.close()

In [4]:
from saapy import scitools

In [5]:
udb = scitools.safe_open_understand("../data/idempiere/idempiere-fork1-development.udb")

In [24]:
def store_couples(udb, run_query):
    """Store Java files, packages, classes and class couplings from an Understand database.

    run_query is a callable taking (query, args), for example run_query
    partially applied to a transaction.
    """
    java_files = udb.ents("Java File")
    for fent in java_files:
        pfent = fent.refs("Define", "Package")[0].ent()
        file_query = """
        MERGE (file:JavaFile {name: {file_name}})
        MERGE (package:JavaPackage {name: {package_name}})
        MERGE (file)-[r:DEFINES]->(package)
        """
        run_query(file_query, {"file_name": fent.relname(), "package_name": pfent.longname()})
        fmetrics = fent.metric(["CountLineCode", "SumCyclomaticStrict"])
        set_file_metrics_query = """
        MATCH (file: JavaFile {name: {file_name}})
        SET file.count_line_code = {count_line_code},
            file.sum_cyclomatic_strict = {sum_cyclomatic_strict}
        """
        run_query(set_file_metrics_query, {
                "file_name": fent.relname(), 
                "count_line_code": fmetrics["CountLineCode"],
                "sum_cyclomatic_strict": fmetrics["SumCyclomaticStrict"]})
        for crel in fent.refs("Define", "Class, Interface"):
            cent = crel.ent()
            class_query = """
            MATCH (file:JavaFile {name: {file_name}})
            MATCH (package:JavaPackage {name: {package_name}})
            MERGE (class:JavaClass {name: {class_name}})
            MERGE (file)-[r:DEFINES]->(class)
            MERGE (package)-[r1:CONTAINS]->(class)
            """
            run_query(class_query, {
                    "file_name": fent.relname(),
                    "package_name": pfent.longname(),
                    "class_name": cent.longname()})
            for cprel in cent.refs("Couple"):
                cpent = cprel.ent()
                pent = cpent.parent()
                while pent.parent() is not None and pent.kindname() != "File":
                    pent = pent.parent()
                package_ent = pent.refs("Define", "Package")[0].ent()
                if package_ent.longname() == "java.lang":
                    continue
                run_query(file_query, {"file_name": pent.relname(), "package_name": package_ent.longname()})
                run_query(class_query, {
                        "file_name": pent.relname(),
                        "package_name": package_ent.longname(),
                        "class_name": cpent.longname()})
                couple_query = """
                MATCH (from_class:JavaClass {name: {from_class_name}})
                MATCH (to_class:JavaClass {name: {to_class_name}})
                MERGE (from_class)-[r:COUPLES]->(to_class)
                """
                run_query(couple_query, {"from_class_name": cent.longname(), "to_class_name": cpent.longname()})

In [25]:
from functools import partial

def run_query(tx, query, args):
    return tx.run(query, args)

class DryRunTx:
    def __init__(self):
        self.runs = []
    def run(self, query, args):
        self.runs.append((query, args))

In [26]:
def run_in_transaction(batch_job, dry=False):
    if dry:
        tx = DryRunTx()
        dry_run = partial(run_query, tx)
        batch_job(dry_run)
        return tx.runs
    else:
        neo4j_session = driver.session()
        tx = neo4j_session.begin_transaction()
        neo_run = partial(run_query, tx)
        try:
            batch_job(neo_run)
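        except Exception:
            # roll back and report the failure; the caller receives an empty list
            tx.rollback()
            from traceback import print_exc
            print_exc(file=sys.stdout)
        else:
            tx.commit()
        finally:
            neo4j_session.close()
        # returning outside of finally avoids masking exceptions such as KeyboardInterrupt
        return []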

In [27]:
batch_job = partial(store_couples, udb)
result = run_in_transaction(batch_job, dry=True)
print(len(result))

190207


In [50]:
import time

def recorded_batch_job(runs, run_query):
    start_time = time.perf_counter()
    for i, run in enumerate(runs):
        run_query(*run)
        if i > 0 and i % 1000 == 0:
            next_time = time.perf_counter()
            print("{0} runs processed in {1} sec".format(i, round(next_time - start_time, 1)))
    end_time = time.perf_counter()
    print("{0} runs completed in {1} sec".format(len(runs), round(end_time - start_time, 1)))

In [53]:
# Transactions must be run in batches of fewer than 140000 queries, otherwise the database breaks.
# With dry=True the recorded queries are only replayed against DryRunTx; nothing is written to Neo4j.
chunk_size = 50000
for chunk_start in range(0, len(result), chunk_size):
    chunk = result[chunk_start:chunk_start + chunk_size]
    batch_job = partial(recorded_batch_job, chunk)
    run_in_transaction(batch_job, dry=True)

1000 runs processed in 0.0 sec
2000 runs processed in 0.0 sec
3000 runs processed in 0.0 sec
4000 runs processed in 0.0 sec
5000 runs processed in 0.0 sec
6000 runs processed in 0.0 sec
7000 runs processed in 0.0 sec
8000 runs processed in 0.0 sec
9000 runs processed in 0.0 sec
10000 runs processed in 0.0 sec
11000 runs processed in 0.0 sec
12000 runs processed in 0.0 sec
13000 runs processed in 0.0 sec
14000 runs processed in 0.0 sec
15000 runs processed in 0.0 sec
16000 runs processed in 0.0 sec
17000 runs processed in 0.0 sec
18000 runs processed in 0.0 sec
19000 runs processed in 0.0 sec
20000 runs processed in 0.0 sec
21000 runs processed in 0.0 sec
22000 runs processed in 0.0 sec
23000 runs processed in 0.0 sec
24000 runs processed in 0.0 sec
25000 runs processed in 0.0 sec
26000 runs processed in 0.0 sec
27000 runs processed in 0.0 sec
28000 runs processed in 0.0 sec
29000 runs processed in 0.0 sec
30000 runs processed in 0.0 sec
31000 runs processed in 0.0 sec
32000 runs proces