# Analysing the link between commits, the number of issues they introduce, and duplicated issues
In the version 1 data, each issue records the ID of the commit that introduced it and, if the issue was resolved, the commit that fixed it. In version 2, each issue instead carries an analysis key that links it to the commit via another table. This notebook investigates whether both data versions link issues to the introducing commit, by checking how the analysis metrics join to the issues. Each analysis is expected to be linked to one or more issues. <br>
A second investigation checks whether known code smells are reported again for every commit and analysis, or whether an issue is listed only once after it has been identified. The expectation is that already identified issues are not listed again.
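
The one-to-many expectation can also be checked mechanically while joining. The following is a minimal sketch on toy data, not the notebook's actual tables: the frames `measures` and `issues` and their values are hypothetical stand-ins, and only the column names `ANALYSIS_KEY`, `CREATION_ANALYSIS_KEY` and `ISSUE_KEY` are taken from the version 2 data. It uses the `validate` and `indicator` parameters of `pd.merge`, and a left join (an assumption that differs from the inner join used further down) so that analyses without issues stay visible.

```python
import pandas as pd

# Toy stand-ins for the measures and issues tables (hypothetical values).
measures = pd.DataFrame({
    "ANALYSIS_KEY": ["a1", "a2", "a3"],
    "COMPLEXITY": [153.0, 164.0, 328.0],
})
issues = pd.DataFrame({
    "CREATION_ANALYSIS_KEY": ["a1", "a1", "a3"],
    "ISSUE_KEY": ["i1", "i2", "i3"],
})

# validate="one_to_many" raises a MergeError if an analysis key occurs more
# than once in the measures table; indicator=True adds a "_merge" column that
# marks rows found only on the left side.
linked = pd.merge(
    measures,
    issues,
    left_on="ANALYSIS_KEY",
    right_on="CREATION_ANALYSIS_KEY",
    how="left",
    validate="one_to_many",
    indicator=True,
)

# Number of issues introduced per analysis (0 for analyses without issues).
issues_per_analysis = linked.groupby("ANALYSIS_KEY")["ISSUE_KEY"].count()

# Analyses that did not introduce any issue.
n_without_issues = (linked["_merge"] == "left_only").sum()

print(issues_per_analysis)
print("analyses without issues:", n_without_issues)
```

With an inner join, analyses that introduce no issue would silently disappear from the result, which is worth keeping in mind when reading the counts below.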

In [1]:
import pandas as pd
import numpy as np
import os

In [2]:
# data import
current_dir = os.getcwd()

# construct path to the project data folder
data_dir = os.path.join(current_dir, '..', '..', 'Data','Sonar_Issues')

# load SonarQube issue data
df1 = pd.read_csv(os.path.join(data_dir, 'SONAR_ISSUES_v1.csv'))
df2 = pd.read_csv(os.path.join(data_dir, 'SONAR_ISSUES_v2.csv'))

# load SonarQube measure data
data_dir2 = os.path.join(current_dir, '..', '..', 'Data','Sonar_Measures')
v1_metr = pd.read_csv(os.path.join(data_dir2, 'SONAR_MEASURES_V1.csv'), low_memory = False)
v2_metr = pd.read_csv(os.path.join(data_dir2, 'SONAR_MEASURES_V2.csv'), low_memory = False)

## Connection between a commit and the number of issues
To better understand the connection between a commit and its linked issues, a small project is chosen and the issues per commit are counted in both dataset versions.

In [3]:
df1.groupby('projectID').count().sort_values(by = "creationDate")

Unnamed: 0_level_0,creationDate,closeDate,creationCommitHash,closeCommitHash,type,squid,component,severity,project,startLine,endLine,resolution,status,message,effort,debt,author
projectID,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
commons-daemon,393,91,393,91,393,393,393,393,393,375,375,91,393,393,393,393,393
commons-dbutils,644,303,644,303,644,644,644,644,644,570,570,303,644,644,644,626,644
commons-exec,762,448,762,448,762,762,762,762,762,702,702,448,762,762,762,738,758
commons-fileupload,769,482,769,482,769,769,769,769,769,695,695,482,769,769,769,717,769
commons-codec,2041,903,2041,903,2041,2041,2041,2041,2041,1972,1972,903,2041,2041,2041,1985,2041
commons-validator,2050,1165,2050,1165,2050,2050,2050,2050,2050,1897,1897,1165,2050,2050,2050,1979,2050
commons-dbcp,3696,2686,3696,2681,3696,3696,3696,3696,3696,3567,3567,2686,3696,3696,3696,3657,3695
commons-vfs,3719,2387,3719,2387,3719,3719,3719,3719,3719,3514,3514,2387,3719,3719,3719,3539,3719
commons-ognl,4945,3885,4945,3885,4945,4945,4945,4945,4945,4746,4746,3885,4945,4945,4945,4913,4945
commons-digester,4947,4947,4947,3756,4947,4947,4947,4947,4947,4529,4529,4947,4947,4947,4947,4922,4947


Based on the overview, commons-daemon is chosen as the project to investigate the connection between issues and metrics.

In [4]:
df1[df1["projectID"] == "commons-daemon"]

Unnamed: 0,projectID,creationDate,closeDate,creationCommitHash,closeCommitHash,type,squid,component,severity,project,startLine,endLine,resolution,status,message,effort,debt,author
0,commons-daemon,2003-09-04T23:28:19Z,,d3416d3a25b16da3d18b3849522fa96183918e5b,,CODE_SMELL,squid:S00112,org.apache:daemon:src/main/java/org/apache/com...,MAJOR,org.apache:daemon,71.0,71.0,,OPEN,Define and throw a dedicated exception instead...,20,20min,yoavs@apache.org
1,commons-daemon,2003-09-04T23:28:19Z,2010-03-15T08:09:26Z,d3416d3a25b16da3d18b3849522fa96183918e5b,6cbc872eb202dfc27f2eb59b02d953c3deca32c8,CODE_SMELL,squid:S00122,org.apache:deamon:src/java/org/apache/commons/...,MINOR,org.apache:daemon,265.0,265.0,FIXED,CLOSED,"At most one statement is allowed per line, but...",1,1min,yoavs@apache.org
2,commons-daemon,2003-09-04T23:28:19Z,2010-03-15T08:09:26Z,d3416d3a25b16da3d18b3849522fa96183918e5b,6cbc872eb202dfc27f2eb59b02d953c3deca32c8,CODE_SMELL,squid:S00122,org.apache:deamon:src/java/org/apache/commons/...,MINOR,org.apache:daemon,259.0,259.0,FIXED,CLOSED,"At most one statement is allowed per line, but...",1,1min,yoavs@apache.org
3,commons-daemon,2003-09-04T23:28:19Z,2010-03-15T08:09:26Z,d3416d3a25b16da3d18b3849522fa96183918e5b,6cbc872eb202dfc27f2eb59b02d953c3deca32c8,CODE_SMELL,squid:S00122,org.apache:deamon:src/java/org/apache/commons/...,MINOR,org.apache:daemon,261.0,261.0,FIXED,CLOSED,"At most one statement is allowed per line, but...",1,1min,yoavs@apache.org
4,commons-daemon,2003-09-04T23:28:19Z,2010-03-15T08:09:26Z,d3416d3a25b16da3d18b3849522fa96183918e5b,6cbc872eb202dfc27f2eb59b02d953c3deca32c8,CODE_SMELL,squid:S00122,org.apache:deamon:src/java/org/apache/commons/...,MINOR,org.apache:daemon,278.0,278.0,FIXED,CLOSED,"At most one statement is allowed per line, but...",1,1min,yoavs@apache.org
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
388,commons-daemon,2011-06-02T16:29:44Z,,d352d2cc3a2da86da4573d200d471452a327b3bb,,CODE_SMELL,squid:S106,org.apache:daemon:src/main/java/org/apache/com...,MAJOR,org.apache:daemon,110.0,110.0,,OPEN,Replace this use of System.out or System.err b...,10,10min,mturk@apache.org
389,commons-daemon,2011-06-02T16:29:44Z,,d352d2cc3a2da86da4573d200d471452a327b3bb,,CODE_SMELL,squid:S106,org.apache:daemon:src/test/java/org/apache/com...,MAJOR,org.apache:daemon,162.0,162.0,,OPEN,Replace this use of System.out or System.err b...,10,10min,mturk@apache.org
390,commons-daemon,2011-06-02T16:29:44Z,,d352d2cc3a2da86da4573d200d471452a327b3bb,,CODE_SMELL,squid:S106,org.apache:daemon:src/test/java/org/apache/com...,MAJOR,org.apache:daemon,164.0,164.0,,OPEN,Replace this use of System.out or System.err b...,10,10min,mturk@apache.org
391,commons-daemon,2011-08-01T20:31:34Z,,5a5aed907aae4ac8aa544a4ee010ccfd92747105,,CODE_SMELL,squid:S1166,org.apache:daemon:src/main/java/org/apache/com...,CRITICAL,org.apache:daemon,104.0,104.0,,OPEN,Either log or rethrow this exception.,10,10min,mturk@apache.org


In [5]:
v1_metr[v1_metr["projectID"] == "commons-daemon"]

Unnamed: 0,commitHash,projectID,SQAnalysisDate,classes,files,functions,commentLines,commentLinesDensity,complexity,fileComplexity,...,qualityGateDetails,qualityProfiles,newSqaleDebtRatio,vulnerabilities,reliabilityRemediationEffort,reliabilityRating,securityRemediationEffort,securityRating,wontFixIssues,packageDependencyCycles
48573,49cbb142a2b5d7d89aab077dc63f7646828c9408,commons-daemon,2003-09-04T23:28:20Z,0,0,0,0,0.0,0,0.0,...,"{""level"":""OK"",""conditions"":[{""metric"":""blocker...",0,0.000000,0,0,1,0,1,0,0
48574,5e90dbea078fca205d913efc8e61ba278c5f39d8,commons-daemon,2003-09-04T23:42:57Z,15,11,87,274,23.0,153,13.9,...,"{""level"":""ERROR"",""conditions"":[{""metric"":""bloc...","[{""key"":""java-sonar-way-04122"",""language"":""jav...",0.000000,0,85,4,0,1,0,0
48575,6c0eafee28fc5c8ab69215df31dc4f07c5579a34,commons-daemon,2003-09-05T08:50:36Z,15,11,87,274,23.0,153,13.9,...,"{""level"":""ERROR"",""conditions"":[{""metric"":""bloc...","[{""key"":""java-sonar-way-04122"",""language"":""jav...",0.000000,0,85,4,0,1,0,0
48576,7b73ce30f32318b99056fee53397c08063d6f661,commons-daemon,2003-09-12T09:05:57Z,15,11,87,274,23.0,153,13.9,...,"{""level"":""ERROR"",""conditions"":[{""metric"":""bloc...","[{""key"":""java-sonar-way-04122"",""language"":""jav...",0.000000,0,85,4,0,1,0,0
48577,7c9d9cde24a00cde7e584136355ce5e048e11e5e,commons-daemon,2003-09-12T09:08:51Z,15,11,87,274,23.0,153,13.9,...,"{""level"":""ERROR"",""conditions"":[{""metric"":""bloc...","[{""key"":""java-sonar-way-04122"",""language"":""jav...",0.000000,0,85,4,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49548,4962bb2786a7ef6ff9d1b4913b14c1a0c24b800f,commons-daemon,2017-12-08T20:55:20Z,24,39,169,993,17.7,328,15.6,...,"{""level"":""ERROR"",""conditions"":[{""metric"":""bloc...","[{""key"":""css-sonar-way-41536"",""language"":""css""...",6.128205,0,125,4,0,1,0,0
49549,75111d90afe15b1222449b401ca94dbe13d4e667,commons-daemon,2018-01-07T20:41:45Z,24,39,169,993,17.7,328,15.6,...,"{""level"":""ERROR"",""conditions"":[{""metric"":""bloc...","[{""key"":""css-sonar-way-41536"",""language"":""css""...",6.081425,0,125,4,0,1,0,0
49550,a3ff183a3d78f2264baebf716b46f01c71e0d6b8,commons-daemon,2018-04-08T17:04:51Z,24,39,169,993,17.7,328,15.6,...,"{""level"":""ERROR"",""conditions"":[{""metric"":""bloc...","[{""key"":""css-sonar-way-41536"",""language"":""css""...",6.081425,0,125,4,0,1,0,0
49551,e87cfbabd73f8c6a0895731799732294c26511bb,commons-daemon,2018-04-08T17:15:45Z,24,39,169,993,17.7,328,15.6,...,"{""level"":""ERROR"",""conditions"":[{""metric"":""bloc...","[{""key"":""css-sonar-way-41536"",""language"":""css""...",5.945274,0,125,4,0,1,0,0


In [6]:
df2[df2["PROJECT_ID"] == "org.apache:daemon"]

Unnamed: 0,PROJECT_ID,CREATION_ANALYSIS_KEY,ISSUE_KEY,TYPE,RULE,SEVERITY,STATUS,RESOLUTION,EFFORT,DEBT,...,MESSAGE,COMPONENT,START_LINE,END_LINE,START_OFFSET,END_OFFSET,HASH,FROM_HOTSPOT,NOT_FOUND,CLOSE_ANALYSIS_KEY
317524,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYIM2JIufLPH4zmf5,CODE_SMELL,squid:S00112,MAJOR,OPEN,,20.0,20.0,...,Define and throw a dedicated exception instead...,org.apache:daemon:src/main/java/org/apache/com...,71.0,71.0,36.0,45.0,,,0,
317525,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYIM2JIufLPH4zmf8,CODE_SMELL,squid:RedundantThrowsDeclarationCheck,MINOR,CLOSED,FIXED,5.0,5.0,...,Remove the declaration of thrown exception 'ja...,org.apache:deamon:src/java/org/apache/commons/...,32.0,32.0,11.0,32.0,,,0,AV3d8DdiJIufLPH4znah
317526,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYIM2JIufLPH4zmf9,CODE_SMELL,squid:RedundantThrowsDeclarationCheck,MINOR,CLOSED,FIXED,5.0,5.0,...,Remove the declaration of thrown exception 'ja...,org.apache:deamon:src/java/org/apache/commons/...,38.0,38.0,11.0,32.0,,,0,AV3d8DdiJIufLPH4znah
317527,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYIM2JIufLPH4zmf-,CODE_SMELL,squid:RedundantThrowsDeclarationCheck,MINOR,CLOSED,FIXED,5.0,5.0,...,Remove the declaration of thrown exception 'ja...,org.apache:deamon:src/java/org/apache/commons/...,44.0,44.0,11.0,32.0,,,0,AV3d8DdiJIufLPH4znah
317528,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYINFJIufLPH4zmf_,CODE_SMELL,squid:RedundantThrowsDeclarationCheck,MINOR,CLOSED,FIXED,5.0,5.0,...,Remove the declaration of thrown exception 'ja...,org.apache:deamon:src/java/org/apache/commons/...,50.0,50.0,11.0,32.0,,,0,AV3d8DdiJIufLPH4znah
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
318315,org.apache:daemon,AWMBlR-dB1UEtThguyc1,AWMBlSKQB1UEtThguydo,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,...,Missing newline after last element,org.apache:daemon:src/site/xdoc/procrun.xml,315.0,315.0,0.0,76.0,,,0,
318316,org.apache:daemon,AWMBmmI_B1UEtThguyfc,AWMBmmYpB1UEtThguyfd,CODE_SMELL,xml:IndentCheck,MINOR,OPEN,,1.0,1.0,...,Make this line start at column 9.,org.apache:daemon:src/native/unix/man/jsvc.1.xml,50.0,50.0,0.0,84.0,,,0,
318317,org.apache:daemon,AWMBmmI_B1UEtThguyfc,AWMBmmYsB1UEtThguyfe,CODE_SMELL,xml:IndentCheck,MINOR,OPEN,,1.0,1.0,...,Make this line start at column 11.,org.apache:daemon:src/native/unix/man/jsvc.1.xml,151.0,151.0,0.0,65.0,,,0,
318318,org.apache:daemon,AWMBmmI_B1UEtThguyfc,AWMBmmYsB1UEtThguyff,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,...,Missing newline after last element,org.apache:daemon:src/native/unix/man/jsvc.1.xml,50.0,50.0,0.0,84.0,,,0,


In [7]:
v2_metr[v2_metr["PROJECT_ID"] == "org.apache:daemon"]

Unnamed: 0,PROJECT_ID,ANALYSIS_KEY,COMPLEXITY,FILE_COMPLEXITY,COMPLEXITY_IN_CLASSES,CLASS_COMPLEXITY,COMPLEXITY_IN_FUNCTIONS,FUNCTION_COMPLEXITY,CLASS_COMPLEXITY_DISTRIBUTION,FUNCTION_COMPLEXITY_DISTRIBUTION,...,sg_i.JAVA_CYCLIC_PACKAGES_PERCENT,sg_i.MAX_MODULE_NCCD,sg_i.ARCHITECTURE_FEATURE_AVAILABLE,sg_i.NUMBER_OF_ISSUES,sg_i.NUMBER_OF_CRITICAL_ISSUES_WITHOUT_RESOLUTION,sg_i.VIOLATING_COMPONENTS_PERCENT,sg_i.UNASSIGNED_COMPONENTS_PERCENT,sg_i.NUMBER_OF_THRESHOLD_VIOLATIONS,sg_i.NUMBER_OF_WORKSPACE_WARNINGS,sg_i.NUMBER_OF_IGNORED_CRITICAL_ISSUES
35963,org.apache:daemon,AWZ-LXaXaNRaZ0AgbEtg,328.0,15.6,299.0,12.5,326.0,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,,,,,,,,,,
35964,org.apache:daemon,AWZ-LCWiaNRaZ0AgbEs4,328.0,15.6,,12.5,,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,,,,,,,,,,
35965,org.apache:daemon,AWRHygpN5esBcMsFz0-R,328.0,15.6,,12.5,,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,,,,,,,,,,
35966,org.apache:daemon,AWMBs0KTB1UEtThguylM,328.0,15.6,,12.5,,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,,,,,,,,,,
35967,org.apache:daemon,AWMBszm1B1UEtThguylH,328.0,15.6,,12.5,,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
36939,org.apache:daemon,AV3dYalIJIufLPH4zmjc,153.0,13.9,,10.2,,1.8,,"1=50,2=13,4=6,6=0,8=3,10=2,12=0",...,,,,,,,,,,
36940,org.apache:daemon,AV3dYV7wJIufLPH4zmjR,153.0,13.9,,10.2,,1.8,,"1=50,2=13,4=6,6=0,8=3,10=2,12=0",...,,,,,,,,,,
36941,org.apache:daemon,AV3dYQ3SJIufLPH4zmjM,153.0,13.9,,10.2,,1.8,,"1=50,2=13,4=6,6=0,8=3,10=2,12=0",...,,,,,,,,,,
36942,org.apache:daemon,AV3dYMyEJIufLPH4zmjH,153.0,13.9,,10.2,,1.8,,"1=50,2=13,4=6,6=0,8=3,10=2,12=0",...,,,,,,,,,,


In [8]:
merged_df = pd.merge(v2_metr[v2_metr["PROJECT_ID"] == "org.apache:daemon"], df2[df2["PROJECT_ID"] == "org.apache:daemon"], left_on='ANALYSIS_KEY', right_on='CREATION_ANALYSIS_KEY', how='inner')
merged_df

Unnamed: 0,PROJECT_ID_x,ANALYSIS_KEY,COMPLEXITY,FILE_COMPLEXITY,COMPLEXITY_IN_CLASSES,CLASS_COMPLEXITY,COMPLEXITY_IN_FUNCTIONS,FUNCTION_COMPLEXITY,CLASS_COMPLEXITY_DISTRIBUTION,FUNCTION_COMPLEXITY_DISTRIBUTION,...,MESSAGE,COMPONENT,START_LINE,END_LINE,START_OFFSET,END_OFFSET,HASH,FROM_HOTSPOT,NOT_FOUND,CLOSE_ANALYSIS_KEY
0,org.apache:daemon,AWMBmmI_B1UEtThguyfc,328.0,15.6,,12.5,,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,Make this line start at column 9.,org.apache:daemon:src/native/unix/man/jsvc.1.xml,50.0,50.0,0.0,84.0,,,0,
1,org.apache:daemon,AWMBmmI_B1UEtThguyfc,328.0,15.6,,12.5,,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,Make this line start at column 11.,org.apache:daemon:src/native/unix/man/jsvc.1.xml,151.0,151.0,0.0,65.0,,,0,
2,org.apache:daemon,AWMBmmI_B1UEtThguyfc,328.0,15.6,,12.5,,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,Missing newline after last element,org.apache:daemon:src/native/unix/man/jsvc.1.xml,50.0,50.0,0.0,84.0,,,0,
3,org.apache:daemon,AWMBmmI_B1UEtThguyfc,328.0,15.6,,12.5,,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,Start every element on a separate line.,org.apache:daemon:src/native/unix/man/jsvc.1.xml,151.0,151.0,0.0,65.0,,,0,
4,org.apache:daemon,AWMBlR-dB1UEtThguyc1,328.0,15.6,,12.5,,1.9,,"1=80,2=35,4=10,6=1,8=4,10=0,12=4",...,Make this line start at column 3.,org.apache:daemon:src/site/xdoc/procrun.xml,20.0,20.0,0.0,13.0,,,0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
578,org.apache:daemon,AV3dZ4SdJIufLPH4zmlm,164.0,13.7,,10.3,,1.8,,"1=52,2=16,4=6,6=0,8=2,10=3,12=0",...,Remove this unused import 'java.util.Date'.,org.apache:daemon:src/samples/AloneService.java,23.0,23.0,0.0,22.0,,,0,AV3eA5a4JIufLPH4znj7
579,org.apache:daemon,AV3dZ4SdJIufLPH4zmlm,164.0,13.7,,10.3,,1.8,,"1=52,2=16,4=6,6=0,8=2,10=3,12=0",...,Remove this unused import 'java.text.SimpleDat...,org.apache:daemon:src/samples/AloneService.java,22.0,22.0,0.0,34.0,,,0,AV3eA5a4JIufLPH4znj7
580,org.apache:daemon,AV3dZ4SdJIufLPH4zmlm,164.0,13.7,,10.3,,1.8,,"1=52,2=16,4=6,6=0,8=2,10=3,12=0",...,Do not override the Object.finalize() method.,org.apache:daemon:src/samples/AloneService.java,35.0,35.0,19.0,27.0,,,0,
581,org.apache:daemon,AV3dZ4SdJIufLPH4zmlm,164.0,13.7,,10.3,,1.8,,"1=52,2=16,4=6,6=0,8=2,10=3,12=0",...,Refactor this method to throw at most one chec...,org.apache:daemon:src/samples/AloneService.java,101.0,101.0,16.0,20.0,,,0,AV3eBGaLJIufLPH4znkP


In [9]:
merged_df['ANALYSIS_KEY'].value_counts().sort_values(ascending=False)

ANALYSIS_KEY
AWMBlOSiB1UEtThguyWS    348
AWMBlR-dB1UEtThguyc1     51
AV3eHnTFJIufLPH4zn2k     45
AV3dZ4SdJIufLPH4zmlm     26
AV3eGQfIJIufLPH4znz7     24
AV3eMJC9JIufLPH4zn-N      9
AV3eGKwbJIufLPH4znzn      9
AV3eILrPJIufLPH4zn4R      8
AV3d--o5JIufLPH4znga      6
AV3eIIBWJIufLPH4zn4F      6
AV3eGZ6pJIufLPH4zn0l      6
AV3djCRBJIufLPH4zm2y      4
AV3eIcCHJIufLPH4zn43      4
AWMBmmI_B1UEtThguyfc      4
AV3dewd6JIufLPH4zmxs      4
AV3daDS3JIufLPH4zmmW      4
AV3eHzPAJIufLPH4zn3i      3
AV3eECXGJIufLPH4znpU      3
AV3eGSqxJIufLPH4zn0Y      3
AV3eIP0XJIufLPH4zn4e      2
AV3d_r6IJIufLPH4znhz      2
AV3d8DdiJIufLPH4znah      2
AV3dh0QDJIufLPH4zm1W      2
AV3eJNT9JIufLPH4zn6B      2
AV3eNoGQJIufLPH4zoAZ      2
AV3eA5a4JIufLPH4znj7      1
AV3d_O_xJIufLPH4znhA      1
AV3eIYYeJIufLPH4zn4w      1
AV3dmwTSJIufLPH4zm7B      1
Name: count, dtype: int64

In [10]:
merged_df2 = pd.merge(v1_metr[v1_metr["projectID"] == "commons-daemon"], df1[df1["projectID"] == "commons-daemon"], left_on='commitHash', right_on='creationCommitHash', how='inner')
merged_df2

Unnamed: 0,commitHash,projectID_x,SQAnalysisDate,classes,files,functions,commentLines,commentLinesDensity,complexity,fileComplexity,...,severity,project,startLine,endLine,resolution,status,message,effort,debt,author
0,139615aebe97ec81dc22fca5cd2bdd46a1b8cc95,commons-daemon,2003-09-27T15:45:02Z,16,12,92,320,24.2,164,13.7,...,MINOR,org.apache:daemon,,,FIXED,CLOSED,Replace all tab characters in this file by seq...,2,2min,jfclere@apache.org
1,139615aebe97ec81dc22fca5cd2bdd46a1b8cc95,commons-daemon,2003-09-27T15:45:02Z,16,12,92,320,24.2,164,13.7,...,MAJOR,org.apache:daemon,44.0,44.0,,OPEN,Define and throw a dedicated exception instead...,20,20min,jfclere@apache.org
2,139615aebe97ec81dc22fca5cd2bdd46a1b8cc95,commons-daemon,2003-09-27T15:45:02Z,16,12,92,320,24.2,164,13.7,...,MAJOR,org.apache:daemon,43.0,43.0,,OPEN,"Remove this unused method parameter ""arguments"".",5,5min,jfclere@apache.org
3,139615aebe97ec81dc22fca5cd2bdd46a1b8cc95,commons-daemon,2003-09-27T15:45:02Z,16,12,92,320,24.2,164,13.7,...,MINOR,org.apache:daemon,24.0,24.0,FIXED,CLOSED,Remove this unused import 'java.util.Enumerati...,2,2min,jfclere@apache.org
4,139615aebe97ec81dc22fca5cd2bdd46a1b8cc95,commons-daemon,2003-09-27T15:45:02Z,16,12,92,320,24.2,164,13.7,...,MINOR,org.apache:daemon,23.0,23.0,FIXED,CLOSED,Remove this unused import 'java.util.Date'.,2,2min,jfclere@apache.org
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
175,d352d2cc3a2da86da4573d200d471452a327b3bb,commons-daemon,2011-06-02T16:29:44Z,24,17,151,591,24.3,297,17.5,...,MAJOR,org.apache:daemon,110.0,110.0,,OPEN,Replace this use of System.out or System.err b...,10,10min,mturk@apache.org
176,d352d2cc3a2da86da4573d200d471452a327b3bb,commons-daemon,2011-06-02T16:29:44Z,24,17,151,591,24.3,297,17.5,...,MAJOR,org.apache:daemon,162.0,162.0,,OPEN,Replace this use of System.out or System.err b...,10,10min,mturk@apache.org
177,d352d2cc3a2da86da4573d200d471452a327b3bb,commons-daemon,2011-06-02T16:29:44Z,24,17,151,591,24.3,297,17.5,...,MAJOR,org.apache:daemon,164.0,164.0,,OPEN,Replace this use of System.out or System.err b...,10,10min,mturk@apache.org
178,5a5aed907aae4ac8aa544a4ee010ccfd92747105,commons-daemon,2011-08-01T20:31:34Z,24,17,151,591,24.2,298,17.5,...,CRITICAL,org.apache:daemon,104.0,104.0,,OPEN,Either log or rethrow this exception.,10,10min,mturk@apache.org


In [11]:
merged_df2['commitHash'].value_counts().sort_values(ascending=False)

commitHash
394454a2e3e6acae0d24c8005af06372d5ffc37d    45
139615aebe97ec81dc22fca5cd2bdd46a1b8cc95    26
84cf9c0e47fc56ae243f9948ea98297907c13375    24
d352d2cc3a2da86da4573d200d471452a327b3bb     9
787182423164028573b9b6c3579e7e79e984856e     9
5b0057535a564980c436c2345ad8939afaec1d99     8
d0abf31b5dee5c8cb7f90fecdad57650bf1e373e     6
9e8cf8ed0dce1015f00393df82e5da5bc6a73859     6
5e8f67f1e44da42fcf8962375ab423030c2376f4     6
6d915afa8cd6eadae812e6cc89d3a422326a8fff     4
874c8f99a6e8002d6d780fa5214836ccc0cb9ce5     4
d030f54c7257fb2d28fbbf0a60ce4d1426e04384     4
3f107c5fdc37610508e1218070101f405df02016     4
553415310feb587986ee05420b093856ea6c9572     3
1e09c60df58ce1c45c273deb35979ced7710cdad     3
fe18fada3ad485c41a0e3dcaa01bc12a67a6d7ec     3
78f04f71d81372356ea39c701007109a779d0c1d     2
dafda28bb17a44faaa235cf2eb7a8af323aec9ac     2
ba9e8bef1f2f0c5166c0927836e40e62a0f9f38e     2
7bdbb56aca295e03797ca969c304d1d1602ca5e4     2
6cbc872eb202dfc27f2eb59b02d953c3deca32c8     2
5a

Both data versions follow the same logic: an introduced code smell can be linked to the commit that introduced it. <br>
However, some commits introduce far more issues than others. This imbalance could interfere with modelling the issue tags and needs to be taken into account when choosing a modelling approach.
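
One way to put a number on this imbalance is to compare the median with the maximum and to compute the share of issues contributed by the largest commits. The sketch below uses a hypothetical `counts` series shaped like a `value_counts()` result; the commit labels and values are illustrative, not taken from the data.

```python
import pandas as pd

# Hypothetical issue counts per commit, shaped like a value_counts() result.
counts = pd.Series(
    {"c1": 45, "c2": 26, "c3": 9, "c4": 4, "c5": 2, "c6": 1},
    name="count",
)

# Median versus maximum shows how far the largest commit is from a typical one.
median_issues = counts.median()
max_issues = counts.max()

# Share of all issues that the two largest commits account for.
top_share = counts.nlargest(2).sum() / counts.sum()

print(f"median: {median_issues}, max: {max_issues}")
print(f"share of top 2 commits: {top_share:.2f}")
```

If a handful of commits carries most of the issues, per-commit targets are heavily skewed, which may favour approaches that are robust to such imbalance.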

## Investigating the repetition of issues across analyses
To better understand how known issues are handled for each new commit and analysis, this section checks whether issues are duplicated across different commits in each dataset version.

#### Version 1

In [21]:
check = df1[df1["projectID"] == "commons-daemon"]
check

Unnamed: 0,projectID,creationDate,closeDate,creationCommitHash,closeCommitHash,type,squid,component,severity,project,startLine,endLine,resolution,status,message,effort,debt,author
0,commons-daemon,2003-09-04T23:28:19Z,,d3416d3a25b16da3d18b3849522fa96183918e5b,,CODE_SMELL,squid:S00112,org.apache:daemon:src/main/java/org/apache/com...,MAJOR,org.apache:daemon,71.0,71.0,,OPEN,Define and throw a dedicated exception instead...,20,20min,yoavs@apache.org
1,commons-daemon,2003-09-04T23:28:19Z,2010-03-15T08:09:26Z,d3416d3a25b16da3d18b3849522fa96183918e5b,6cbc872eb202dfc27f2eb59b02d953c3deca32c8,CODE_SMELL,squid:S00122,org.apache:deamon:src/java/org/apache/commons/...,MINOR,org.apache:daemon,265.0,265.0,FIXED,CLOSED,"At most one statement is allowed per line, but...",1,1min,yoavs@apache.org
2,commons-daemon,2003-09-04T23:28:19Z,2010-03-15T08:09:26Z,d3416d3a25b16da3d18b3849522fa96183918e5b,6cbc872eb202dfc27f2eb59b02d953c3deca32c8,CODE_SMELL,squid:S00122,org.apache:deamon:src/java/org/apache/commons/...,MINOR,org.apache:daemon,259.0,259.0,FIXED,CLOSED,"At most one statement is allowed per line, but...",1,1min,yoavs@apache.org
3,commons-daemon,2003-09-04T23:28:19Z,2010-03-15T08:09:26Z,d3416d3a25b16da3d18b3849522fa96183918e5b,6cbc872eb202dfc27f2eb59b02d953c3deca32c8,CODE_SMELL,squid:S00122,org.apache:deamon:src/java/org/apache/commons/...,MINOR,org.apache:daemon,261.0,261.0,FIXED,CLOSED,"At most one statement is allowed per line, but...",1,1min,yoavs@apache.org
4,commons-daemon,2003-09-04T23:28:19Z,2010-03-15T08:09:26Z,d3416d3a25b16da3d18b3849522fa96183918e5b,6cbc872eb202dfc27f2eb59b02d953c3deca32c8,CODE_SMELL,squid:S00122,org.apache:deamon:src/java/org/apache/commons/...,MINOR,org.apache:daemon,278.0,278.0,FIXED,CLOSED,"At most one statement is allowed per line, but...",1,1min,yoavs@apache.org
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
388,commons-daemon,2011-06-02T16:29:44Z,,d352d2cc3a2da86da4573d200d471452a327b3bb,,CODE_SMELL,squid:S106,org.apache:daemon:src/main/java/org/apache/com...,MAJOR,org.apache:daemon,110.0,110.0,,OPEN,Replace this use of System.out or System.err b...,10,10min,mturk@apache.org
389,commons-daemon,2011-06-02T16:29:44Z,,d352d2cc3a2da86da4573d200d471452a327b3bb,,CODE_SMELL,squid:S106,org.apache:daemon:src/test/java/org/apache/com...,MAJOR,org.apache:daemon,162.0,162.0,,OPEN,Replace this use of System.out or System.err b...,10,10min,mturk@apache.org
390,commons-daemon,2011-06-02T16:29:44Z,,d352d2cc3a2da86da4573d200d471452a327b3bb,,CODE_SMELL,squid:S106,org.apache:daemon:src/test/java/org/apache/com...,MAJOR,org.apache:daemon,164.0,164.0,,OPEN,Replace this use of System.out or System.err b...,10,10min,mturk@apache.org
391,commons-daemon,2011-08-01T20:31:34Z,,5a5aed907aae4ac8aa544a4ee010ccfd92747105,,CODE_SMELL,squid:S1166,org.apache:daemon:src/main/java/org/apache/com...,CRITICAL,org.apache:daemon,104.0,104.0,,OPEN,Either log or rethrow this exception.,10,10min,mturk@apache.org


In [25]:
# compare issues on their content only, ignoring dates, commit hashes and author
id_columns = ["creationDate", "closeDate", "creationCommitHash", "closeCommitHash", "author"]
check_content = check.drop(columns = id_columns)
check_content[check_content.duplicated()]

Unnamed: 0,projectID,type,squid,component,severity,project,startLine,endLine,resolution,status,message,effort,debt
243,commons-daemon,CODE_SMELL,code_smells:long_method,org.apache:deamon:src/java/org/apache/commons/...,MAJOR,org.apache:daemon,1.0,1.0,FIXED,CLOSED,Long method,90,1h30min
244,commons-daemon,CODE_SMELL,code_smells:long_method,org.apache:deamon:src/java/org/apache/commons/...,MAJOR,org.apache:daemon,1.0,1.0,FIXED,CLOSED,Long method,90,1h30min
245,commons-daemon,CODE_SMELL,code_smells:long_method,org.apache:deamon:src/java/org/apache/commons/...,MAJOR,org.apache:daemon,1.0,1.0,FIXED,CLOSED,Long method,90,1h30min
246,commons-daemon,CODE_SMELL,code_smells:lazy_class,org.apache:daemon:src/samples/ServiceDaemonRea...,MAJOR,org.apache:daemon,1.0,1.0,FIXED,CLOSED,Lazy class,90,1h30min
264,commons-daemon,CODE_SMELL,code_smells:long_method,org.apache:daemon:src/samples/ProcrunService.java,MAJOR,org.apache:daemon,1.0,1.0,FIXED,CLOSED,Long method,90,1h30min


There are only 5 duplicates for version 1, all of them fixed and closed issues. Version 1 therefore contains no duplicated open issues.
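
Note that `duplicated()` with its default `keep="first"` flags only the later copies, so the first occurrence of each duplicated issue is not shown. A small helper can list every member of a duplicate group instead. This is a sketch on toy data: the function name `find_duplicate_issues` and the `toy` frame are hypothetical, and only the column names mirror the version 1 data.

```python
import pandas as pd

def find_duplicate_issues(issues, id_columns):
    """Return all rows that are identical once id_columns are ignored."""
    content_columns = [c for c in issues.columns if c not in id_columns]
    # keep=False marks every member of a duplicate group, not only the repeats.
    mask = issues.duplicated(subset=content_columns, keep=False)
    return issues[mask].sort_values(content_columns)

# Hypothetical issues: the first two differ only in the commit hash.
toy = pd.DataFrame({
    "creationCommitHash": ["h1", "h2", "h3", "h4"],
    "squid": ["code_smells:long_method", "code_smells:long_method",
              "squid:S106", "squid:S106"],
    "component": ["A.java", "A.java", "B.java", "C.java"],
    "status": ["CLOSED", "CLOSED", "OPEN", "OPEN"],
})

dupes = find_duplicate_issues(toy, id_columns=["creationCommitHash"])
print(dupes)
```

Because the identifying columns are kept in the result, the commits behind each duplicated issue remain visible side by side.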

#### Version 2

In [14]:
check = df2[df2["PROJECT_ID"] == "org.apache:daemon"]
check

Unnamed: 0,PROJECT_ID,CREATION_ANALYSIS_KEY,ISSUE_KEY,TYPE,RULE,SEVERITY,STATUS,RESOLUTION,EFFORT,DEBT,...,MESSAGE,COMPONENT,START_LINE,END_LINE,START_OFFSET,END_OFFSET,HASH,FROM_HOTSPOT,NOT_FOUND,CLOSE_ANALYSIS_KEY
317524,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYIM2JIufLPH4zmf5,CODE_SMELL,squid:S00112,MAJOR,OPEN,,20.0,20.0,...,Define and throw a dedicated exception instead...,org.apache:daemon:src/main/java/org/apache/com...,71.0,71.0,36.0,45.0,,,0,
317525,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYIM2JIufLPH4zmf8,CODE_SMELL,squid:RedundantThrowsDeclarationCheck,MINOR,CLOSED,FIXED,5.0,5.0,...,Remove the declaration of thrown exception 'ja...,org.apache:deamon:src/java/org/apache/commons/...,32.0,32.0,11.0,32.0,,,0,AV3d8DdiJIufLPH4znah
317526,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYIM2JIufLPH4zmf9,CODE_SMELL,squid:RedundantThrowsDeclarationCheck,MINOR,CLOSED,FIXED,5.0,5.0,...,Remove the declaration of thrown exception 'ja...,org.apache:deamon:src/java/org/apache/commons/...,38.0,38.0,11.0,32.0,,,0,AV3d8DdiJIufLPH4znah
317527,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYIM2JIufLPH4zmf-,CODE_SMELL,squid:RedundantThrowsDeclarationCheck,MINOR,CLOSED,FIXED,5.0,5.0,...,Remove the declaration of thrown exception 'ja...,org.apache:deamon:src/java/org/apache/commons/...,44.0,44.0,11.0,32.0,,,0,AV3d8DdiJIufLPH4znah
317528,org.apache:daemon,AV3dX_FWJIufLPH4zmfP,AV3dYINFJIufLPH4zmf_,CODE_SMELL,squid:RedundantThrowsDeclarationCheck,MINOR,CLOSED,FIXED,5.0,5.0,...,Remove the declaration of thrown exception 'ja...,org.apache:deamon:src/java/org/apache/commons/...,50.0,50.0,11.0,32.0,,,0,AV3d8DdiJIufLPH4znah
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
318315,org.apache:daemon,AWMBlR-dB1UEtThguyc1,AWMBlSKQB1UEtThguydo,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,...,Missing newline after last element,org.apache:daemon:src/site/xdoc/procrun.xml,315.0,315.0,0.0,76.0,,,0,
318316,org.apache:daemon,AWMBmmI_B1UEtThguyfc,AWMBmmYpB1UEtThguyfd,CODE_SMELL,xml:IndentCheck,MINOR,OPEN,,1.0,1.0,...,Make this line start at column 9.,org.apache:daemon:src/native/unix/man/jsvc.1.xml,50.0,50.0,0.0,84.0,,,0,
318317,org.apache:daemon,AWMBmmI_B1UEtThguyfc,AWMBmmYsB1UEtThguyfe,CODE_SMELL,xml:IndentCheck,MINOR,OPEN,,1.0,1.0,...,Make this line start at column 11.,org.apache:daemon:src/native/unix/man/jsvc.1.xml,151.0,151.0,0.0,65.0,,,0,
318318,org.apache:daemon,AWMBmmI_B1UEtThguyfc,AWMBmmYsB1UEtThguyff,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,...,Missing newline after last element,org.apache:daemon:src/native/unix/man/jsvc.1.xml,50.0,50.0,0.0,84.0,,,0,


In [20]:
# compare issues on their content only, ignoring the analysis and issue keys
key_columns = ["CREATION_ANALYSIS_KEY", "ISSUE_KEY"]
check_content = check.drop(columns = key_columns)
check_content[check_content.duplicated()]

Unnamed: 0,PROJECT_ID,TYPE,RULE,SEVERITY,STATUS,RESOLUTION,EFFORT,DEBT,TAGS,CREATION_DATE,...,MESSAGE,COMPONENT,START_LINE,END_LINE,START_OFFSET,END_OFFSET,HASH,FROM_HOTSPOT,NOT_FOUND,CLOSE_ANALYSIS_KEY
318086,org.apache:daemon,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,,2017-10-20 07:48:15,...,Start every element on a separate line.,org.apache:daemon:src/site/xdoc/mail-lists.xml,106.0,106.0,0.0,97.0,,,0,
318099,org.apache:daemon,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,,2017-10-20 07:48:15,...,Start every element on a separate line.,org.apache:daemon:src/site/xdoc/mail-lists.xml,124.0,124.0,0.0,95.0,,,0,
318112,org.apache:daemon,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,,2017-10-20 07:48:15,...,Start every element on a separate line.,org.apache:daemon:src/site/xdoc/mail-lists.xml,142.0,142.0,0.0,98.0,,,0,
318124,org.apache:daemon,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,,2017-10-20 07:48:15,...,Start every element on a separate line.,org.apache:daemon:src/site/xdoc/mail-lists.xml,159.0,159.0,0.0,99.0,,,0,
318136,org.apache:daemon,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,,2017-10-20 07:48:15,...,Start every element on a separate line.,org.apache:daemon:src/site/xdoc/mail-lists.xml,192.0,192.0,0.0,113.0,,,0,
318307,org.apache:daemon,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,,2017-10-20 18:52:50,...,Start every element on a separate line.,org.apache:daemon:src/site/xdoc/procrun.xml,268.0,268.0,0.0,87.0,,,0,
318308,org.apache:daemon,CODE_SMELL,xml:NewlineCheck,MINOR,OPEN,,2.0,2.0,,2017-10-20 18:52:50,...,Missing newline after last element,org.apache:daemon:src/site/xdoc/procrun.xml,268.0,268.0,0.0,87.0,,,0,


For version 2, there are duplicated issues; each duplicate nevertheless has its own CREATION_ANALYSIS_KEY and ISSUE_KEY. <br>
However, only 7 of the 796 issues are duplicates, which shows that issues are not routinely listed again; the duplicates could instead be errors reintroduced by developers. In contrast to version 1, the duplicated issues are marked as open.
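
To tell a reintroduced issue apart from one reported twice within a single analysis, the duplicate groups can be summarised by how many distinct analyses they span. The following sketch runs on a hypothetical `toy` frame; the choice of `content_columns` is an assumption made for illustration and is narrower than the full column set compared above.

```python
import pandas as pd

# Hypothetical issues: two copies from the same analysis (a1) and
# two copies from different analyses (a2, a3).
toy = pd.DataFrame({
    "CREATION_ANALYSIS_KEY": ["a1", "a1", "a2", "a3"],
    "ISSUE_KEY": ["i1", "i2", "i3", "i4"],
    "RULE": ["xml:NewlineCheck"] * 4,
    "COMPONENT": ["x.xml", "x.xml", "y.xml", "y.xml"],
    "START_LINE": [106.0, 106.0, 50.0, 50.0],
})

# Columns that describe the issue itself (assumed subset for this sketch).
content_columns = ["RULE", "COMPONENT", "START_LINE"]

# dropna=False keeps groups whose content columns contain missing values.
groups = (
    toy.groupby(content_columns, dropna=False)
    .agg(
        n_issues=("ISSUE_KEY", "size"),
        n_analyses=("CREATION_ANALYSIS_KEY", "nunique"),
    )
    .reset_index()
)

# Only groups with more than one issue are duplicates.
repeated = groups[groups["n_issues"] > 1]
print(repeated)
```

A group with `n_analyses == 1` would point to the same issue being reported more than once in one analysis, whereas `n_analyses > 1` would be consistent with an issue that reappeared in a later analysis.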