need papers with thresholds #7
Hahaha, the images in issue #5 were from those papers! They have some awesome references which I'll use for our paper.
what we need is for you to run some quick select queries over our test data. do you see this chart and "SEESAW"? that was an old tool of mine. but see how it does better than N other standard things? what we need is this chart for the defect data sets, with "SEESAW" replaced with "RANK" (the current name of our method in this paper)
Good. Will need bibtex entries for all papers you use
Also... when these ranges are applied to the data, what effect do they have on the defect distribution?
Working on that, I'll have the results this evening.
Note that the home run would be: if, for a data set d that RANK found good treatments for, after division into good/bad (where bad = rows selected by threshold and good = all - bad), the defect density is about the same in good and bad (as witnessed by, say, box plots)
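The good/bad split described above can be sketched as follows; the column names (loc, cbo, bug) and the cutoffs are illustrative placeholders, not our actual metrics or thresholds:

```python
# Split a defect data set into bad (rows flagged by any threshold) and good
# (everything else), then compare defect density in the two groups.
def split_by_thresholds(rows, thresholds):
    """thresholds: dict metric -> cutoff; a row is 'bad' if ANY metric exceeds its cutoff."""
    good, bad = [], []
    for r in rows:
        (bad if any(r[m] > t for m, t in thresholds.items()) else good).append(r)
    return good, bad

def defect_density(rows, defect_col="bug"):
    # fraction of rows with at least one recorded defect
    return sum(1 for r in rows if r[defect_col] > 0) / len(rows) if rows else 0.0

# Toy data, not from our actual files:
rows = [
    {"loc": 100,  "cbo": 3,  "bug": 0},
    {"loc": 900,  "cbo": 14, "bug": 1},
    {"loc": 50,   "cbo": 2,  "bug": 0},
    {"loc": 1200, "cbo": 20, "bug": 1},
]
good, bad = split_by_thresholds(rows, {"loc": 500, "cbo": 10})
# If the two densities come out about the same, the thresholds carry little
# signal for this data set (check with box plots as well).
print(defect_density(good), defect_density(bad))
```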
what are the thresholds in the tool that harman used to assess his refactorings?
Harman's refactoring tool thresholds. I'm looking into this, will comment as soon as I find it.
when can i get results from applying those thresholds?
In about an hour... fixing some bugs.
Results (Updating...):
ant
ivy
poi
jedit
😏
so is the deal that the 2010 TSE paper defines a procedure for finding thresholds? and you applied that procedure and got the above? what is that procedure? please answer in enough detail so i can succinctly but authoritatively write this down in the paper.
Yup, that's right.
In our work, we have coded fault-free classes as zero and faulty classes as one. We leverage this binary nature to apply a Univariate Binary Logistic Regression (UBR) to identify metrics that have a significant association with the occurrence of defects. To set a cut-off for this association, we use a confidence interval of 95%. To identify thresholds for the metrics that were significant, we use a method called Value of an Acceptable Risk Level (VARL), first proposed by Bender~\cite{bender99} for identifying thresholds in epidemiology studies. In his TSE 2010 article, Shatnawi~\cite{shatnawi10} endorsed the use of this method for identifying thresholds on object-oriented metrics for open source software systems. The VARL method measures cut-off values for metrics such that, below that threshold, the probability of occurrence of a defect is less than an acceptable probability $p_0$:
\begin{equation}
VARL = p^{-1}(p_0) = \frac{1}{\beta_1}\left(\ln\left(\frac{p_0}{1-p_0}\right) - \beta_0\right)
\end{equation}
where $\beta_0$ and $\beta_1$ are the intercept and slope of the univariate logistic regression.
It is highly unlikely that the metrics have a similar impact on all data sets. Therefore, we must run the model on each data set to identify the metrics and corresponding thresholds that matter.
v.good
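A minimal sketch of the VARL computation discussed above, assuming a univariate logistic model P(defect) = 1 / (1 + exp(-(b0 + b1*x))) has already been fit for one metric (e.g. with a statistics package); the coefficients below are illustrative, not fitted to our data:

```python
import math

def varl(b0, b1, p0=0.05):
    """Value of an Acceptable Risk Level (Bender): the metric value below
    which the predicted probability of a defect stays under p0.
    Derived by inverting the logistic: VARL = (ln(p0/(1-p0)) - b0) / b1."""
    return (math.log(p0 / (1.0 - p0)) - b0) / b1

# Illustrative coefficients, not from our data sets:
threshold = varl(b0=-4.0, b1=0.01, p0=0.05)
print(threshold)  # metric values below this keep P(defect) under 5%
```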
Ant{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 57.83 & 29.52 & \quart{46}{33}{66}{1} \\
\hline 2 & Reduce cbo & 16.27 & 4.21 & \quart{15}{5}{18}{1} \\
2 & Reduce loc & 15.66 & 2.41 & \quart{16}{3}{17}{1} \\
2 & Reduce cam & 15.06 & 3.01 & \quart{16}{3}{17}{1} \\
2 & Reduce avg_cc & 15.66 & 3.01 & \quart{16}{3}{17}{1} \\
2 & Reduce ic & 15.66 & 3.61 & \quart{15}{4}{17}{1} \\
2 & Reduce lcom & 15.66 & 4.82 & \quart{14}{5}{17}{1} \\
2 & Reduce wmc & 15.66 & 3.01 & \quart{15}{4}{17}{1} \\
2 & Reduce max_cc & 15.06 & 2.41 & \quart{15}{3}{17}{1} \\
\hline \end{tabular}} Ivy{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 52.5 & 17.5 & \quart{57}{22}{67}{1} \\
\hline 2 & Reduce avg_cc & 22.5 & 7.5 & \quart{25}{10}{28}{1} \\
2 & Reduce loc & 22.5 & 10.0 & \quart{22}{13}{28}{1} \\
2 & Reduce cbo & 22.5 & 10.0 & \quart{22}{13}{28}{1} \\
2 & Reduce wmc & 22.5 & 7.5 & \quart{22}{9}{28}{1} \\
2 & Reduce max_cc & 20.0 & 7.5 & \quart{22}{9}{25}{1} \\
2 & Reduce cam & 20.0 & 10.0 & \quart{22}{13}{25}{1} \\
\hline \end{tabular}} Poi{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 19.93 & 12.11 & \quart{46}{33}{54}{2} \\
\hline 2 & Reduce lcom & 9.25 & 1.43 & \quart{23}{4}{25}{2} \\
2 & Reduce ic & 9.25 & 1.43 & \quart{23}{4}{25}{2} \\
2 & Reduce lcom3 & 8.9 & 1.77 & \quart{22}{5}{24}{2} \\
\hline 3 & Reduce loc & 8.9 & 2.14 & \quart{20}{6}{24}{2} \\
3 & Reduce cam & 8.53 & 1.78 & \quart{21}{5}{23}{2} \\
\hline \end{tabular}} Jedit{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce dam & 36.36 & 27.28 & \quart{34}{34}{45}{1} \\
1 & Reduce moa & 36.36 & 18.19 & \quart{45}{23}{45}{1} \\
1 & Reduce rfc & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
1 & Reduce ca & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
1 & Reduce ce & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
1 & Reduce npm & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
1 & Reduce loc & 45.45 & 9.09 & \quart{45}{12}{57}{1} \\
1 & Reduce amc & 45.45 & 27.28 & \quart{45}{34}{57}{1} \\
1 & Reduce avg_cc & 45.45 & 18.19 & \quart{45}{23}{57}{1} \\
\hline 2 & Reduce dit & 36.36 & 36.37 & \quart{22}{46}{45}{1} \\
2 & Reduce lcom3 & 36.36 & 18.19 & \quart{45}{23}{45}{1} \\
2 & Reduce cbm & 36.36 & 18.19 & \quart{45}{23}{45}{1} \\
2 & RANK & 36.36 & 0.0 & \quart{45}{0}{45}{1} \\
\hline \end{tabular}} Lucene{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 14.78 & 4.92 & \quart{57}{22}{66}{4} \\
1 & Reduce lcom3 & 15.76 & 1.96 & \quart{68}{9}{71}{4} \\
1 & Reduce moa & 15.76 & 2.45 & \quart{66}{11}{71}{4} \\
1 & Reduce cbo & 16.26 & 1.97 & \quart{71}{8}{73}{4} \\
1 & Reduce npm & 16.26 & 2.46 & \quart{68}{11}{73}{4} \\
1 & Reduce loc & 16.75 & 2.46 & \quart{68}{11}{75}{4} \\
\hline \end{tabular}}
re harman's threshold technique
There are 2 references. @article{hermans15,
title={Detecting and refactoring code smells in spreadsheet formulas},
author={Hermans, Felienne and Pinzger, Martin and van Deursen, Arie},
journal={Empirical Software Engineering},
volume={20},
number={2},
pages={549--575},
year={2015},
publisher={Springer}
} @inproceedings{Alves2010,
author = {Alves, Tiago L. and Ypma, Christiaan and Visser, Joost},
booktitle = {2010 IEEE Int. Conf. Softw. Maint.},
doi = {10.1109/ICSM.2010.5609747},
isbn = {978-1-4244-8630-4},
issn = {10636773},
month = {sep},
pages = {1--10},
publisher = {IEEE},
title = {{Deriving metric thresholds from benchmark data}},
url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5609747},
year = {2010}
}
They seem to use a benchmark data set to derive a set of common thresholds. Since we don't have that, we derive thresholds separately for every data set. The technique is straightforward.
Hermans thresholds
Summary: In addition to using VARL to identify thresholds as proposed by Shatnawi, we use an alternative method proposed by Alves et al.~\cite{alves10}. This method is unique in that it respects the underlying statistical distribution and scale of the metrics. It works as follows. Every metric value is weighted according to the source lines of code (LOC) of the class. All the weighted metrics are then normalized, i.e., divided by the sum of all weights of the same system. Following this, the normalized metric values are ordered in ascending fashion. This is equivalent to computing a density function in which the x-axis represents the weight ratio (0-100%) and the y-axis the metric scale. Thresholds are then derived by choosing the percentage of the overall code that needs to be represented. For instance, Alves et al. suggest using the 90% quantile of the overall code to derive the threshold for a specific metric. This threshold is meaningful since it can be used to identify the 10% of the worst code with respect to a specific metric, and thresholds beyond the 90% quantile represent very high risk. Ant{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & RANK & 63.25 & 24.1 & \quart{53}{26}{70}{1} \\
\hline 2 & Reduce wmc & 22.29 & 6.63 & \quart{19}{7}{24}{1} \\
2 & Reduce max_cc & 21.69 & 7.23 & \quart{18}{8}{24}{1} \\
2 & Reduce loc & 21.69 & 4.82 & \quart{20}{6}{24}{1} \\
2 & Reduce lcom & 21.69 & 4.82 & \quart{22}{6}{24}{1} \\
2 & Reduce cbo & 21.69 & 4.82 & \quart{21}{5}{24}{1} \\
2 & Reduce ic & 21.69 & 5.43 & \quart{20}{6}{24}{1} \\
2 & Reduce cbm & 21.08 & 5.43 & \quart{20}{6}{23}{1} \\
2 & Reduce dam & 21.08 & 6.02 & \quart{21}{7}{23}{1} \\
2 & Reduce npm & 21.08 & 5.43 & \quart{20}{6}{23}{1} \\
2 & Reduce rfc & 21.08 & 3.61 & \quart{21}{4}{23}{1} \\
2 & Reduce cam & 21.08 & 4.22 & \quart{20}{5}{23}{1} \\
2 & Reduce moa & 19.88 & 5.42 & \quart{20}{6}{22}{1} \\
2 & Reduce ce & 20.48 & 4.21 & \quart{21}{5}{22}{1} \\
2 & Reduce avg_cc & 19.88 & 7.23 & \quart{19}{8}{22}{1} \\
\hline \end{tabular}} Ivy{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce noc & 30.0 & 15.0 & \quart{31}{20}{38}{1} \\
1 & Reduce amc & 30.0 & 12.5 & \quart{31}{16}{38}{1} \\
1 & Reduce ce & 30.0 & 12.5 & \quart{35}{16}{38}{1} \\
1 & Reduce lcom & 32.5 & 10.0 & \quart{35}{12}{41}{1} \\
1 & Reduce loc & 32.5 & 12.5 & \quart{35}{16}{41}{1} \\
1 & Reduce wmc & 32.5 & 17.5 & \quart{31}{23}{41}{1} \\
1 & Reduce cbo & 35.0 & 12.5 & \quart{35}{16}{44}{1} \\
1 & Reduce rfc & 35.0 & 12.5 & \quart{35}{16}{44}{1} \\
1 & Reduce npm & 35.0 & 7.5 & \quart{38}{9}{44}{1} \\
1 & Reduce cam & 35.0 & 15.0 & \quart{38}{19}{44}{1} \\
1 & Reduce max_cc & 35.0 & 12.5 & \quart{35}{16}{44}{1} \\
1 & Reduce avg_cc & 35.0 & 15.0 & \quart{35}{19}{44}{1} \\
1 & Reduce cbm & 40.0 & 17.5 & \quart{38}{22}{51}{1} \\
\hline 2 & RANK & 52.5 & 20.0 & \quart{54}{25}{67}{1} \\
\hline \end{tabular}} Jedit{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce wmc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce dit & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce cbo & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce rfc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce lcom & 36.36 & 36.36 & \quart{0}{79}{79}{2} \\
1 & Reduce ca & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce ce & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce npm & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce lcom3 & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce loc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce dam & 36.36 & 36.36 & \quart{0}{79}{79}{2} \\
1 & Reduce moa & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce cam & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce ic & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce cbm & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce amc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce max_cc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & Reduce avg_cc & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
1 & RANK & 36.36 & 0.0 & \quart{79}{0}{79}{2} \\
\hline \end{tabular}} Lucene{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce lcom & 14.78 & 2.46 & \quart{51}{9}{57}{3} \\
1 & Reduce dam & 14.78 & 1.97 & \quart{55}{7}{57}{3} \\
1 & Reduce npm & 15.27 & 2.96 & \quart{53}{11}{59}{3} \\
1 & Reduce cam & 15.27 & 2.46 & \quart{55}{9}{59}{3} \\
1 & Reduce rfc & 15.76 & 1.48 & \quart{57}{5}{60}{3} \\
1 & Reduce lcom3 & 15.76 & 1.97 & \quart{57}{7}{60}{3} \\
1 & Reduce loc & 15.76 & 2.96 & \quart{53}{11}{60}{3} \\
\hline 2 & Reduce cbo & 15.76 & 2.94 & \quart{55}{11}{60}{3} \\
2 & Reduce cbm & 15.76 & 2.45 & \quart{57}{9}{60}{3} \\
2 & Reduce wmc & 16.26 & 2.94 & \quart{55}{11}{62}{3} \\
2 & Reduce ce & 16.26 & 2.45 & \quart{57}{9}{62}{3} \\
2 & Reduce amc & 16.26 & 2.46 & \quart{55}{9}{62}{3} \\
2 & Reduce moa & 16.26 & 1.96 & \quart{59}{7}{62}{3} \\
2 & RANK & 16.75 & 7.88 & \quart{49}{30}{64}{3} \\
\hline \end{tabular}} Poi{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
1 & Reduce lcom & 9.61 & 2.86 & \quart{25}{9}{28}{2} \\
1 & Reduce npm & 9.61 & 4.27 & \quart{23}{13}{28}{2} \\
1 & Reduce lcom3 & 9.61 & 2.15 & \quart{25}{7}{28}{2} \\
1 & Reduce ic & 9.96 & 3.2 & \quart{26}{10}{29}{2} \\
1 & Reduce amc & 9.96 & 1.78 & \quart{26}{6}{29}{2} \\
1 & Reduce ce & 9.96 & 2.86 & \quart{25}{9}{29}{2} \\
1 & Reduce rfc & 10.32 & 2.84 & \quart{26}{9}{30}{2} \\
1 & Reduce moa & 10.32 & 2.13 & \quart{26}{7}{30}{2} \\
1 & Reduce mfa & 10.32 & 3.21 & \quart{25}{10}{30}{2} \\
1 & Reduce wmc & 10.32 & 2.13 & \quart{28}{7}{30}{2} \\
\hline 2 & Reduce dit & 10.68 & 2.13 & \quart{28}{7}{32}{2} \\
2 & Reduce cam & 10.68 & 3.21 & \quart{25}{10}{32}{2} \\
2 & Reduce max_cc & 10.32 & 3.2 & \quart{26}{10}{30}{2} \\
2 & Reduce loc & 11.03 & 3.2 & \quart{28}{10}{33}{2} \\
2 & Reduce cbm & 11.39 & 2.14 & \quart{29}{7}{34}{2} \\
\hline 3 & RANK & 20.64 & 8.9 & \quart{53}{26}{61}{2} \\
\hline \end{tabular}}
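The Alves et al. derivation described above (LOC-weight, normalize, sort ascending, read off the chosen quantile) can be sketched as follows; the metric and LOC values are toy numbers, not from our data sets:

```python
def alves_threshold(metric_values, loc_values, quantile=0.90):
    """Threshold = the metric value at which the cumulative LOC-weight,
    taken over metric values sorted ascending, first reaches `quantile`."""
    pairs = sorted(zip(metric_values, loc_values))  # ascending by metric value
    total = sum(loc_values)
    cumulative = 0
    for value, loc in pairs:
        cumulative += loc
        if cumulative >= quantile * total:
            return value
    return pairs[-1][0]

# Toy example: cbo values for five classes and their sizes in LOC
cbo = [2, 5, 9, 14, 30]
loc = [100, 300, 250, 300, 50]
print(alves_threshold(cbo, loc, 0.90))  # classes above this are the worst ~10% of code
```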
Summary Shatnawi10: UBR over the binary fault data to select significant metrics (95% confidence), then VARL~\cite{bender99} thresholds as endorsed by Shatnawi~\cite{shatnawi10} (full description above).
Summary Alves10: LOC-weighted quantile thresholds per Alves et al.~\cite{alves10} (full description above).
Deprecated Method: One of the first methods of finding thresholds was proposed by Erni and Lewerentz~\cite{erni96}. Their technique identifies thresholds from the data distribution, specifically the mean and the standard deviation of the metric values. They propose using values that lie beyond one standard deviation from the mean as thresholds, i.e., a maximum threshold of the mean plus one standard deviation and a minimum of the mean minus one standard deviation. Several researchers~\cite{shatnawi10}~\cite{alves10} have pointed out that this method is subject to a few problems. Firstly, it doesn't consider the fault-proneness of classes when the thresholds are computed. Secondly, there is a lack of empirical validation of this methodology, which impedes reasonable comparisons.
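A sketch of the deprecated mean-plus/minus-one-standard-deviation method mentioned above; the population standard deviation is assumed here (the sample-vs-population choice is an implementation detail, not specified in this thread), and the values are toy numbers:

```python
import statistics

def erni_thresholds(values):
    """Erni & Lewerentz-style thresholds: values more than one standard
    deviation from the mean are flagged. Returns (T_min, T_max)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population std dev (assumption)
    return mean - sd, mean + sd

low, high = erni_thresholds([4, 6, 8, 10, 12])
print(low, high)  # metric values outside [low, high] are flagged
```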
Need to know what happens when thresholds from N sources are applied to our data sets. And I need that written up and into the paper.
as you say, these authors agree: However, there are few studies that were conducted to formulate the guidelines, represented as threshold values, to interpret the complexity of the software design using metrics.
but from this paper you CAN get thresholds.
BTW, here's a paper that does what we hate: mentions metrics but not thresholds