CloneDetectionRoutines: Relative diff for matching
This increases the precision of the code clone detection. (This was tested
and does have an effect.) Since we want to optimize the overall cost, which
also depends on max_sum, we need to take maxabs into account when building
the cost matrix.
sils committed Jun 24, 2015
1 parent 62f83b2 commit 2cfa809
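
As a rough illustration of the commit message's argument, the sketch below reuses the `relative_difference` helper from this commit and feeds it made-up numbers (the inputs are illustrative assumptions, not data from the repository). It shows that the same absolute difference yields a high cost for small count vectors and a low cost for large ones, which is the effect the cost matrix is meant to capture.

```python
def relative_difference(difference, maxabs):
    # Helper as introduced by this commit: scale a difference by maxabs.
    # When maxabs is 0 the commit returns 1.
    if maxabs == 0:
        return 1
    return difference / maxabs


# Illustrative inputs only (assumed, not taken from the commit).
# Both pairs differ by 2 in absolute terms.
small_vectors = relative_difference(2, 2)    # difference is as large as maxabs
large_vectors = relative_difference(2, 10)   # difference is small next to maxabs

print(small_vectors, large_vectors)
```

With absolute differences both pairs would cost the same; with relative differences the pair of larger vectors is the cheaper match.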
Showing 1 changed file with 14 additions and 5 deletions.
19 changes: 14 additions & 5 deletions bears/codeclone_detection/CloneDetectionRoutines.py
@@ -83,6 +83,12 @@ def pad_count_vectors(cm1, cm2):
     return cm1, cm2


+def relative_difference(difference, maxabs):
+    if maxabs == 0:
+        return 1
+    return difference/maxabs
+
+
 def compare_functions(cm1, cm2):
     """
     Compares the functions represented by the given count matrices.
@@ -105,21 +111,24 @@ def compare_functions(cm1, cm2):
     # side (columns). The fields in the matrix are the weighted nodes
     # connecting each element from one side to the other.
     diff_table = [(cv1,
-                   [(cv2, cv1.difference(cv2)) for cv2 in cm2.values()])
+                   [(cv2, cv1.difference(cv2), cv1.maxabs(cv2))
+                    for cv2 in cm2.values()])
                   for cv1 in cm1.values()]
-    cost_matrix = [[difference
-                    for cv2, difference in lst]
+
+    cost_matrix = [[relative_difference(difference, maxabs)
+                    for cv2, difference, maxabs in lst]
                    for cv1, lst in diff_table]

     # The munkres algorithm will calculate a matching such that the sum of
     # the taken fields is minimal. It thus will associate each variable
     # from one function to one on the other function.
     matching = munkres.compute(cost_matrix)

-    diff_sum = sum(cost_matrix[x][y] for x, y in matching)
+    diff_sum = sum(diff_table[x][1][y][1]
+                   for x, y in matching)
     # For each match we get the maximum of the absolute value of the count
     # vectors. Summed up with this we can normalize the whole thing.
-    max_sum = sum(diff_table[x][0].maxabs(diff_table[x][1][y][0])
+    max_sum = sum(diff_table[x][1][y][2]
                   for x, y in matching)

     if diff_sum == 0:
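
The changed hunk can be tried out in isolation with the minimal sketch below. Several pieces are assumptions made only for illustration: count vectors are plain lists of integers, `difference` and `maxabs` are hypothetical stand-ins for the `CountVector` methods (whose real definitions are not shown in this diff), and a brute-force search over permutations replaces `munkres.compute` so the example needs no third-party package. The sketch stops at `diff_sum` and `max_sum`, because the rest of `compare_functions` is not part of the diff.

```python
from itertools import permutations


def relative_difference(difference, maxabs):
    # Same helper as in the commit.
    if maxabs == 0:
        return 1
    return difference / maxabs


def difference(cv1, cv2):
    # ASSUMPTION: stand-in for CountVector.difference (sum of absolute gaps).
    return sum(abs(a - b) for a, b in zip(cv1, cv2))


def maxabs(cv1, cv2):
    # ASSUMPTION: stand-in for CountVector.maxabs (sum of element-wise maxima).
    return sum(max(abs(a), abs(b)) for a, b in zip(cv1, cv2))


def min_cost_matching(cost_matrix):
    # ASSUMPTION: brute-force replacement for munkres.compute on a square
    # matrix; returns (row, column) pairs with minimal total cost.
    n = len(cost_matrix)
    best = min(permutations(range(n)),
               key=lambda cols: sum(cost_matrix[x][y]
                                    for x, y in enumerate(cols)))
    return list(enumerate(best))


def match_count_vectors(cvs1, cvs2):
    # Each cell keeps (cv2, absolute difference, maxabs), as in the new code.
    diff_table = [(cv1,
                   [(cv2, difference(cv1, cv2), maxabs(cv1, cv2))
                    for cv2 in cvs2])
                  for cv1 in cvs1]

    # The matching is optimized on relative differences ...
    cost_matrix = [[relative_difference(diff, mabs)
                    for _, diff, mabs in lst]
                   for _, lst in diff_table]
    matching = min_cost_matching(cost_matrix)

    # ... while the sums are taken over the stored absolute values.
    diff_sum = sum(diff_table[x][1][y][1] for x, y in matching)
    max_sum = sum(diff_table[x][1][y][2] for x, y in matching)
    return matching, diff_sum, max_sum


# Illustrative count vectors (made up for this sketch).
matching, diff_sum, max_sum = match_count_vectors([[10, 0], [1, 1]],
                                                  [[1, 2], [9, 0]])
print(matching, diff_sum, max_sum)
```

Storing `maxabs` in `diff_table` alongside the difference is what lets the new code read both sums straight from the table after the matching is known, instead of recomputing `maxabs` per matched pair.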

1 comment on commit 2cfa809

@fneu (Contributor) commented on 2cfa809, Jun 24, 2015

ack
