# Structural and logical change coupling

Instructions: 

Depending on whether you run the notebook locally or on a cloud drive:
* Replace proj_name and proj_data_folder in the Configuration section
* Replace the cloud drive folder in the Configuration section

## Configuration

In [17]:
proj_name = 'glucosio-android' # 'PX4-Autopilot' #'PROJ_NAME'
proj_data_folder = '../project_results/' + proj_name + '/'

### If you run this notebook in Google Colaboratory, configure this block.
You will have to copy the generated database and the folders "notebooks" and "analytics" to the cloud drive.

In [2]:
from google.colab import drive
import os

GDRIVE_FOLDER = 'callgraphCA/glucosioExample'  # cloud drive folder
 
drive.mount('/gdrive')
# the project's folder
drive_folder = '/gdrive/My Drive/' + GDRIVE_FOLDER
os.chdir(drive_folder)

Mounted at /gdrive


## Imports

In [3]:
!pip install apyori
# https://github.com/ymoch/apyori
# https://medium.com/linkit-intecs/apriori-algorithm-in-data-mining-part-2-590d58e0998b

!pip install python-stopwatch

Collecting apyori
  Using cached apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py): started
  Building wheel for apyori (setup.py): finished with status 'done'
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=f3feac3d1af95f2c3b3e8c7401f3555d012a09827af64f74336367afd4261a8e
  Stored in directory: c:\users\lopm\appdata\local\pip\cache\wheels\32\2a\54\10c595515f385f3726642b10c60bf788029e8f3a1323e3913a
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2
Collecting python-stopwatch
  Using cached python_stopwatch-1.0.4-py3-none-any.whl (6.5 kB)
Collecting termcolor
  Using cached termcolor-1.1.0.tar.gz (3.9 kB)
Building wheels for collected packages: termcolor
  Building wheel for termcolor (setup.py): started
  Building wheel for termcolor (setup.py): finished with status 'done'
  Created wheel for termcolor: filename=termcolor-1.1.0-py3-none-a

In [11]:
import sys
import os
import pandas as pd
import sqlite3
from pathlib import Path

#from apyori import apriori
import apyori
#from stopwatch import Stopwatch, profile
# apyori works with lists, not pandas objects; transactions must not contain NaN values,
# and each value of a transaction is wrapped in apostrophes

# min_support -- The minimum support of relations (float).
# min_confidence -- The minimum confidence of relations (float).
# min_lift -- The minimum lift of relations (float).
# max_length -- The maximum length of the relation (integer).
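
The three thresholds listed above are easier to choose with the underlying metrics in view. The following is a minimal plain-Python sketch of support, confidence and lift on toy transactions; the file names and helper functions are illustrative assumptions, and this is not how apyori computes them internally.

```python
# Toy transactions: each inner list is a set of files changed together.
transactions = [
    ["A.java", "B.java", "C.java"],
    ["A.java", "B.java"],
    ["A.java", "C.java"],
    ["B.java"],
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(base, add, transactions):
    """support(base + add) / support(base): how often `add` appears given `base`."""
    return support(set(base) | set(add), transactions) / support(base, transactions)


def lift(base, add, transactions):
    """confidence(base -> add) / support(add); 1.0 means no association."""
    return confidence(base, add, transactions) / support(add, transactions)


sup_ab = support(["A.java", "B.java"], transactions)        # 2 of 4 transactions
conf_ab = confidence(["A.java"], ["B.java"], transactions)  # 2 of the 3 with A.java
lift_ab = lift(["A.java"], ["B.java"], transactions)
print(sup_ab, round(conf_ab, 3), round(lift_ab, 3))
```

A rule is reported only if it meets all minimum values, and max_length caps the number of items in a relation.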


In [9]:
# CCSD libraries
analytics_folder_path = str(Path.cwd().parents[0] / "analytics")
sys.path.append(analytics_folder_path)

from coupling_association_rules_utils import *

In [34]:
# Reloads
import importlib
import coupling_association_rules_utils

importlib.reload(coupling_association_rules_utils)
from coupling_association_rules_utils import *

## Database connections

In [18]:
ANALYTICS_DB_PATH =  proj_data_folder + proj_name + '_analytics.db'
print(ANALYTICS_DB_PATH)
print(os.path.isfile(ANALYTICS_DB_PATH))
con_analytics_db = sqlite3.connect(ANALYTICS_DB_PATH)

../project_results/glucosio-android/glucosio-android_analytics.db
True


# Change coupling and structural dependency rates

In [19]:
## On commit and file level - distinct
sql_statement = """select 
--commit_hash,
GROUP_CONCAT(distinct("'" || file_name|| "'") )  as files_in_hash
from file_commit
group by commit_hash;"""

records, pruned_records, df = get_records(con_analytics_db, 'files_in_hash', sql_statement, 2)

df len:  7
records len:  7
pruned_records len:  3


In [20]:
# for applying the apyori.apriori algorithm the records list must have a specific format
records[0:2]

[["'GlucosioApplication.java'",
  "'AddA1CActivity.java'",
  "'AddCholesterolActivity.java'",
  "'AddKetoneActivity.java'",
  "'AddPressureActivity.java'",
  "'AddWeightActivity.java'",
  "'OverviewFragment.java'",
  "'OverviewPresenter.java'",
  "'HelloActivityTest.java'"],
 ["'MainActivity.java'"]]

We can observe the difference between applying association rule mining to all records and applying it only to the pruned records, that is, those with a minimum number of items (default > 2).
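
The record format and the pruning step can be reproduced on a toy table. This sketch assumes an in-memory `file_commit` table with only the two columns used in the query, splits each GROUP_CONCAT string on commas, and keeps transactions with more than `min_items` files. The splitting and pruning logic is a guess at what `get_records` does, not its actual implementation.

```python
import sqlite3

# In-memory stand-in for the analytics database (names follow the query above).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE file_commit (commit_hash TEXT, file_name TEXT)")
con.executemany(
    "INSERT INTO file_commit VALUES (?, ?)",
    [
        ("c1", "A.java"), ("c1", "B.java"), ("c1", "C.java"),
        ("c2", "A.java"),
        ("c3", "B.java"), ("c3", "C.java"), ("c3", "B.java"),  # duplicate row
    ],
)

sql = """select GROUP_CONCAT(distinct("'" || file_name || "'")) as files_in_hash
from file_commit
group by commit_hash;"""

# One comma-separated string per commit, split into a list of quoted file names.
toy_records = [row[0].split(",") for row in con.execute(sql)]

# Assumed pruning rule: keep transactions with more than `min_items` files.
min_items = 2
toy_pruned = [r for r in toy_records if len(r) > min_items]

print(toy_records)
print(toy_pruned)
con.close()
```

Splitting on commas only works while no file name contains a comma; a different separator would be needed otherwise.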

In [21]:
rules_list, itemsets_list = calculate_structural_coupling_rates(con_analytics_db, records, min_confidence=0.1, min_support=0.1)

Nr rules 669, with structural coupling 615, 0.92


In [22]:
rules_list, itemsets_list = calculate_structural_coupling_rates(con_analytics_db, pruned_records, min_confidence=0.1, min_support=0.1)

Nr rules 668, with structural coupling 615, 0.92


The function *calculate_structural_coupling_rates* returns two lists: itemsets_list contains the itemsets with at least two items, and rules_list contains all association rules found (including those for a single item, because they show the support of that item across the whole transaction set).
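
The printed summary ("Nr rules ..., with structural coupling ..., rate") suggests a ratio of structurally coupled itemsets to all mined rules. The sketch below shows one way such a rate could be computed. The dependency set, the helper names, and the "any pair of files is dependent" criterion are all assumptions made for illustration; the actual logic lives in `coupling_association_rules_utils` and is not shown here.

```python
from itertools import combinations

# Hypothetical mined itemsets, quoted the same way as the transactions.
mined_itemsets = [
    frozenset({"'A.java'"}),  # single item: carries support only
    frozenset({"'A.java'", "'B.java'"}),
    frozenset({"'A.java'", "'C.java'"}),
    frozenset({"'A.java'", "'B.java'", "'C.java'"}),
]

# Hypothetical structural dependencies, stored as unordered file pairs.
structural_pairs = {frozenset({"A.java", "B.java"})}


def unquote(item):
    """Drop the apostrophes that wrap each file name."""
    return item.strip("'")


def has_structural_coupling(itemset, structural_pairs):
    """True if any two files of the itemset are structurally dependent (assumed criterion)."""
    names = sorted(unquote(i) for i in itemset)
    return any(frozenset(p) in structural_pairs for p in combinations(names, 2))


multi_item = [s for s in mined_itemsets if len(s) >= 2]
coupled = [s for s in multi_item if has_structural_coupling(s, structural_pairs)]
rate = len(coupled) / len(mined_itemsets)
print(f"Nr rules {len(mined_itemsets)}, with structural coupling {len(coupled)}, {rate:.2f}")
```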

In [23]:
for r in itemsets_list:
  print(r)

['A1cCalculatorActivity.java', 'AddGlucoseActivity.java']
['GlucosioConverter.java', 'A1cCalculatorActivity.java']
['HistoryAdapter.java', 'A1cCalculatorActivity.java']
['A1cCalculatorActivity.java', 'OverviewPresenter.java']
['ReadingTools.java', 'A1cCalculatorActivity.java']
['AddCholesterolActivity.java', 'AddA1CActivity.java']
['AddA1CActivity.java', 'AddGlucoseActivity.java']
['AddKetoneActivity.java', 'AddA1CActivity.java']
['AddA1CActivity.java', 'AddPressureActivity.java']
['AddReadingActivity.java', 'AddA1CActivity.java']
['AddWeightActivity.java', 'AddA1CActivity.java']
['AddA1CActivity.java', 'GlucosioApplication.java']
['HelloActivityTest.java', 'AddA1CActivity.java']
['AddA1CActivity.java', 'OverviewFragment.java']
['AddA1CActivity.java', 'OverviewPresenter.java']
['AddCholesterolActivity.java', 'AddGlucoseActivity.java']
['AddCholesterolActivity.java', 'AddKetoneActivity.java']
['AddCholesterolActivity.java', 'AddPressureActivity.java']
['AddCholesterolActivity.java', 'Ad

In [24]:
for r in rules_list:
  print(r)


RelationRecord(items=frozenset({"'AddCholesterolActivity.java'", "'AddKetoneActivity.java'", "'GlucosioApplication.java'", "'OverviewPresenter.java'", "'OverviewFragment.java'", "'AddWeightActivity.java'", "'HelloActivityTest.java'", "'AddPressureActivity.java'", "'AddA1CActivity.java'"}), support=0.3333333333333333, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({"'AddCholesterolActivity.java'", "'AddKetoneActivity.java'", "'GlucosioApplication.java'", "'OverviewPresenter.java'", "'OverviewFragment.java'", "'AddWeightActivity.java'", "'HelloActivityTest.java'", "'AddPressureActivity.java'", "'AddA1CActivity.java'"}), confidence=0.3333333333333333, lift=1.0), OrderedStatistic(items_base=frozenset({"'AddA1CActivity.java'"}), items_add=frozenset({"'AddCholesterolActivity.java'", "'AddKetoneActivity.java'", "'GlucosioApplication.java'", "'OverviewPresenter.java'", "'OverviewFragment.java'", "'AddWeightActivity.java'", "'HelloActivityTest.java'", "'AddPre

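We can observe the influence of setting different values of confidence and support. With only 3 pruned records, every itemset that occurs at all has a support of at least 1/3, and every rule a confidence of at least 1/3, so thresholds between 0.05 and 0.15 return the same rules; the count changes only once a threshold exceeds 1/3, as with 0.6.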

In [25]:
rules_list, itemsets_list = calculate_structural_coupling_rates(con_analytics_db, pruned_records, min_confidence=0.1, min_support=0.1)

Nr rules 668, with structural coupling 615, 0.92


In [26]:
rules_list, itemsets_list = calculate_structural_coupling_rates(con_analytics_db, pruned_records, min_confidence=0.15, min_support=0.15)

Nr rules 668, with structural coupling 615, 0.92


In [27]:
rules_list, itemsets_list = calculate_structural_coupling_rates(con_analytics_db, pruned_records, min_confidence=0.05, min_support=0.1)

Nr rules 668, with structural coupling 615, 0.92


In [28]:
rules_list, itemsets_list = calculate_structural_coupling_rates(con_analytics_db, pruned_records, min_confidence=0.05, min_support=0.05)

Nr rules 668, with structural coupling 615, 0.92


In [40]:
rules_list, itemsets_list = calculate_structural_coupling_rates(con_analytics_db, pruned_records, min_confidence=0.6, min_support=0.6)

Nr rules 33, with structural coupling 26, 0.79
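
To experiment with thresholds outside the database, the following sketch counts frequent itemsets on three toy transactions by brute force. It is a stand-in for illustration only (exponential in the number of items, and not apyori's algorithm); the function name and data are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of itemsets whose support is >= min_support."""
    sets = [set(t) for t in transactions]
    items = sorted(set().union(*sets))
    found = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            hits = sum(1 for s in sets if set(candidate) <= s)
            support = hits / len(sets)
            if hits and support >= min_support:
                found[candidate] = support
    return found


toy = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C", "D"],
]

# Number of frequent itemsets for each minimum support value.
counts = {t: len(frequent_itemsets(toy, t)) for t in (0.05, 0.15, 0.3, 0.6)}
print(counts)
```

Every threshold up to 1/3 yields the same 11 itemsets, while 0.6 keeps only the 5 that occur in at least two of the three transactions.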


# Display transactions with given itemsets

In [41]:
l_elem = ['OverviewFragment.java', 'DatabaseHandler.java']
show_transactions_containing_items(df, 'files_in_hash',l_elem, print_elems=False)

Element count. Df len 7. 1ind: 1, 2dep: 0, 2ind: 0


In [42]:
l_elem = ['AddCholesterolActivity.java', 'AddA1CActivity.java']
show_transactions_containing_items(df,'files_in_hash', l_elem, print_elems=False)

Element count. Df len 7. 1ind: 2, 2dep: 2, 2ind: 2


In [43]:
l_elems = ["'AddKetoneActivity.java'", "'AddWeightActivity.java'", "'AddGlucoseActivity.java'", "'AddPressureActivity.java'"]
show_transactions_containing_items(df, 'files_in_hash', l_elems, print_elems=False)

Element count. Df len 7. 1ind: 2, 2dep: 2, 3dep: 1, 4dep: 1,
    2ind: 2, 3ind: 2, 4ind: 2


## On week

In [44]:
# apyori.apriori needs apostrophes around each of the values of the transaction
sql_statement = """select
strftime('%Y', date(commit_commiter_datetime)) as iso_yr,
(strftime('%j', date(commit_commiter_datetime, '-3 days', 'weekday 4')) - 1) / 7 + 1 as iso_week,
GROUP_CONCAT("'" || file_name|| "'") as files_in_week
--GROUP_CONCAT(distinct("'" || file_name|| "'") ) as files_in_week
from file_commit
group by strftime('%Y', date(commit_commiter_datetime)),
(strftime('%j', date(commit_commiter_datetime, '-3 days', 'weekday 4')) - 1) / 7 + 1;"""


records, pruned_records, df = get_records(con_analytics_db, 'files_in_week', sql_statement, 2)

df len:  5
records len:  5
pruned_records len:  2
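
The week expression in the query can be checked against Python's ISO calendar. This sketch runs the same expression in an in-memory SQLite database for a few hand-picked dates; the helper name and the dates are assumptions chosen to include a year boundary.

```python
import sqlite3
from datetime import date

# Same week and year expressions as in the query above, applied to a bound parameter.
ISO_WEEK_SQL = (
    "select (strftime('%j', date(?, '-3 days', 'weekday 4')) - 1) / 7 + 1, "
    "strftime('%Y', date(?))"
)

con = sqlite3.connect(":memory:")


def sqlite_week_and_year(day):
    """Return (week, calendar year) as computed by the SQL expressions."""
    text = day.isoformat()
    week, year = con.execute(ISO_WEEK_SQL, (text, text)).fetchone()
    return int(week), int(year)


sample_days = (date(2020, 6, 15), date(2020, 12, 31), date(2021, 1, 1), date(2021, 1, 4))
for day in sample_days:
    week, year = sqlite_week_and_year(day)
    iso = day.isocalendar()
    print(day, "sqlite:", (year, week), "python ISO:", (iso[0], iso[1]))
```

The week numbers agree with `isocalendar()` for these dates, but the year column is the calendar year rather than the ISO year: 2021-01-01 falls in ISO week 53 of 2020, yet the query would group it under (2021, 53). Whether this matters depends on how many commits sit near a year boundary.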


In [45]:
rules_list, itemsets_list = calculate_structural_coupling_rates(con_analytics_db, records, min_confidence=0.1, min_support=0.1)

Nr rules 4155, with structural coupling 4091, 0.98


In [50]:
rules_list, itemsets_list = calculate_structural_coupling_rates(con_analytics_db, pruned_records, min_confidence=0.6, min_support=0.5)

Nr rules 4141, with structural coupling 4091, 0.99
