
need papers with thresholds #7

Open
4 tasks done
timm opened this issue Mar 1, 2016 · 22 comments


timm commented Mar 1, 2016

Need to know what happens when thresholds from N sources are applied to our data sets. And I need that written up and into the paper.

BTW, here's a paper that does what we hate: mentions metrics but not thresholds.


rahlk commented Mar 1, 2016

Hahaha, the images in issue #5 were from those papers! They have some awesome references which I'll use for our paper.


timm commented Mar 1, 2016

what we need is for you to run some quick select queries over our test data. do you see this chart and "SEESAW"? that was an old tool of mine. but see how it does better than N other standard things?

what we need is this chart for the defect data sets, with "SEESAW" replaced by "RANK" (the current name of our method in this paper)

[image: chart comparing SEESAW against other standard methods]


rahlk commented Mar 2, 2016

  • Implement the techniques from the above 2 papers.
    DONE: Here are the thresholds for ANT
+---------+---------------------+-------------+
| Metrics | VARL (Shatnawi '10) | Filó et al. |
|         +-----------+---------+-------------+
|         | Threshold | P-Value | Threshold   |
+---------+-----------+---------+-------------+
| CBO     | 1.78      | 0.000   | -           |
+---------+-----------+---------+-------------+
| MAX_CC  | 2.07      | 0.000   | -           |
+---------+-----------+---------+-------------+
| AVG_CC  | 0.86      | 0.003   | -           |
+---------+-----------+---------+-------------+
| LCOM    | 51        | 0.000   | 725         |
+---------+-----------+---------+-------------+
| LOC     | 171.59    | 0.000   | 30          |
+---------+-----------+---------+-------------+
| NOC     | -         | -       | 28          |
+---------+-----------+---------+-------------+
| CA      | -         | -       | 39          |
+---------+-----------+---------+-------------+
| CE      | -         | -       | 16          |
+---------+-----------+---------+-------------+
| DIT     | -         | -       | 4           |
+---------+-----------+---------+-------------+
| WMC     | -         | -       | 34          |
+---------+-----------+---------+-------------+
  • Compare against RANK


timm commented Mar 2, 2016

Good. Will need bibtex entries for all papers you use.


timm commented Mar 2, 2016

Also... when these ranges are applied to the data, what effect do they have on the defect distribution?


rahlk commented Mar 2, 2016

Working on that, I'll have the results this evening.


timm commented Mar 2, 2016

Note that the home run would be this: for a data set d that RANK found good treatments for, after division into good and bad (where bad = rows selected by threshold and good = all - bad), the defect density is about the same in good and bad (as witnessed by, say, box plots).


timm commented Mar 3, 2016

what are the thresholds in the tool that harman used to assess his refactorings?


rahlk commented Mar 3, 2016

Harman's refactoring tool thresholds: I'm looking into this, will comment as soon as I find it.


timm commented Mar 4, 2016

when can i get results from applying those thresholds?


rahlk commented Mar 4, 2016

In about an hour.. fixing some bugs.


rahlk commented Mar 4, 2016

Results (Updating...) :

ant

+--------+-----------+---------+
| Metric | Threshold | P-Value |
+========+===========+=========+
| wmc    | 14.67     | 0.000   |
+--------+-----------+---------+
| cbo    | 30.13     | 0.000   |
+--------+-----------+---------+
| lcom   | 849.16    | 0.000   |
+--------+-----------+---------+
| loc    | 2951.64   | 0.000   |
+--------+-----------+---------+
| cam    | 0.84      | 0.000   |
+--------+-----------+---------+
| ic     | 5.29      | 0.000   |
+--------+-----------+---------+
| max_cc | 34.47     | 0.000   |
+--------+-----------+---------+
| avg_cc | 14.63     | 0.003   |
+--------+-----------+---------+ 


rank ,         name ,    med   ,  iqr 
----------------------------------------------------
   1 ,   Reduce cam ,    12.35  ,  13.25 (  --*          |              ), 7.83,  13.86,  21.08
   1 ,   Reduce wmc ,    12.65  ,  11.45 (  -*           |              ), 9.64,  12.65,  21.08
   1 , Reduce avg_cc ,    14.46  ,  5.42 (   -*          |              ), 12.05,  15.06,  17.47
   1 ,   Reduce loc ,    15.06  ,  13.25 (   -*          |              ), 10.24,  15.66,  23.49
   1 ,    Reduce ic ,    15.36  ,  7.23  (   --*         |              ), 11.45,  16.87,  18.67
   1 ,   Reduce cbo ,    16.57  ,  9.04  (   --*         |              ), 12.05,  18.07,  21.08
   1 ,  Reduce lcom ,    17.77  ,  12.05 (  ---*         |              ), 9.64,  18.07,  21.69
   1 , Reduce max_cc ,    19.58  ,  7.23 (     *         |              ), 16.87,  19.88,  24.10
   2 ,         RANK ,    47.89  ,  30.72 (           ---*|              ), 37.95,  48.19,  68.67

ivy

+--------+-----------+---------+
| Metric | Threshold | P-Value |
+========+===========+=========+
| wmc    | 84.99     | 0.000   |
+--------+-----------+---------+
| cbo    | 22.17     | 0.002   |
+--------+-----------+---------+
| lcom   | 16048.61  | 0.027   |
+--------+-----------+---------+
| loc    | 1668.51   | 0.000   |
+--------+-----------+---------+
| cam    | 2.29      | 0.000   |
+--------+-----------+---------+
| max_cc | 31.06     | 0.034   |
+--------+-----------+---------+
| avg_cc | 30.91     | 0.026   |
+--------+-----------+---------+ 


rank ,         name ,    med   ,  iqr 
----------------------------------------------------
   1 ,   Reduce cam ,    20.00  ,  15.00 (    -*         |              ), 15.00,  20.00,  30.00
   1 ,   Reduce loc ,    20.00  ,  10.00 (    --*        |              ), 15.00,  22.50,  25.00
   1 ,   Reduce cbo ,    21.25  ,  10.00 (     -*        |              ), 17.50,  22.50,  27.50
   1 , Reduce max_cc ,    21.25  ,  7.50 (     -*        |              ), 17.50,  22.50,  25.00
   1 ,  Reduce lcom ,     22.50  ,  2.50 (      *        |              ), 22.50,  22.50,  25.00
   1 ,   Reduce wmc ,    23.75  ,  10.00 (     --*       |              ), 17.50,  25.00,  27.50
   1 , Reduce avg_cc ,   23.75  ,  15.00 (     ---*      |              ), 17.50,  30.00,  32.50
   2 ,         RANK ,    57.50  ,  12.50 (              -|-*            ), 47.50,  57.50,  60.00

poi

+--------+-----------+---------+
| Metric | Threshold | P-Value |
+========+===========+=========+
| lcom   | 4092.69   | 0.000   |
+--------+-----------+---------+
| lcom3  | 4.78      | 0.000   |
+--------+-----------+---------+
| loc    | 71055.23  | 0.000   |
+--------+-----------+---------+
| cam    | 3.34      | 0.000   |
+--------+-----------+---------+
| ic     | 26.97     | 0.000   |
+--------+-----------+---------+ 


rank ,         name ,    med   ,  iqr 
----------------------------------------------------
   1 ,   Reduce cam ,    8.54  ,  1.07 (  *            |              ), 8.19,  8.90,  9.25
   1 , Reduce lcom3 ,    8.72  ,  3.56 (  *            |              ), 7.12,  8.90,  10.68
   1 ,  Reduce lcom ,    8.90  ,  2.49 (  *            |              ), 7.47,  8.90,  9.96
   1 ,   Reduce loc ,    9.07  ,  2.85 (  *            |              ), 7.47,  9.25,  10.32
   1 ,    Reduce ic ,    9.96  ,  2.14 (  *            |              ), 8.90,  9.96,  11.03
   2 ,        RANK ,    23.13  ,  6.41 (     --*       |              ), 19.22,  23.84,  25.62

jedit

+--------+-----------+---------+
| Metric | Threshold | P-Value |
+========+===========+=========+
| dit    | 14.47     | 0.000   |
+--------+-----------+---------+
| rfc    | 20.73     | 0.000   |
+--------+-----------+---------+
| ca     | 2.37      | 0.000   |
+--------+-----------+---------+
| ce     | 2.69      | 0.000   |
+--------+-----------+---------+
| npm    | 11.55     | 0.000   |
+--------+-----------+---------+
| lcom3  | 4.16      | 0.000   |
+--------+-----------+---------+
| loc    | 61269.41  | 0.000   |
+--------+-----------+---------+
| dam    | 0.53      | 0.000   |
+--------+-----------+---------+
| moa    | 8.88      | 0.000   |
+--------+-----------+---------+
| cbm    | 6.76      | 0.000   |
+--------+-----------+---------+
| amc    | 510.48    | 0.001   |
+--------+-----------+---------+
| avg_cc | 2.02      | 0.000   |
+--------+-----------+---------+ 


rank ,         name ,    med   ,  iqr 
----------------------------------------------------
   1 ,   Reduce dit ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,   Reduce rfc ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,    Reduce ca ,    36.36  ,  18.18 (        --*    |              ), 27.27,  36.36,  45.45
   1 ,    Reduce ce ,    36.36  ,  18.18 (        --*    |              ), 27.27,  36.36,  45.45
   1 ,   Reduce npm ,    36.36  ,  18.18 (        --*    |              ), 27.27,  36.36,  45.45
   1 , Reduce lcom3 ,    36.36  ,  9.09 (        --*    |              ), 27.27,  36.36,  36.36
   1 ,   Reduce loc ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,   Reduce dam ,    36.36  ,  27.27 (     -----*    |              ), 18.18,  36.36,  45.45
   1 ,   Reduce moa ,    36.36  ,  36.36 (  --------*    |              ), 9.09,  36.36,  45.45
   1 ,   Reduce cbm ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,   Reduce amc ,    36.36  ,  9.09 (          *    |              ), 36.36,  36.36,  45.45
   1 ,         RANK ,    36.36  ,  0.00 (          *    |              ), 36.36,  36.36,  36.36
   1 , Reduce avg_cc ,    40.91  ,  9.09 (          ---* |              ), 36.36,  45.45,  45.45


timm commented Mar 4, 2016

😏

  • why are these thresholds different in different data sets?
  • what is the source reference for each threshold?
  • how many more should i expect?
  • can i get these in latex too? not today but early next week?


rahlk commented Mar 4, 2016

  • why are these thresholds different in different data sets?
    Nature of the data sets/projects. I only retain metrics with valid thresholds at P < 0.05. Only LOC and LCOM are common to all data sets. This trend has also been noted in the 2010 TSE article.
  • what is the source reference for each threshold?
    The method (VARL) proposed by the 2010 TSE article (is this what you mean?)
  • how many more should i expect?
    One more. I'm running these with 40 repeats (an arbitrary number > 30).
  • can i get these in latex too? not today but early next week?
    10-4.


timm commented Mar 4, 2016

I only retain metrics with valid thresholds at P < 0.05.

so is the deal that the 2010 TSE paper defines a procedure for finding thresholds? and you applied that procedure and got the above? what is that procedure? please answer in enough detail so i can succinctly but authoritatively write this down in the paper.


rahlk commented Mar 5, 2016

so is the deal that the 2010 TSE paper defines a procedure for finding thresholds? and you applied that procedure and got the above?

Yup, that's right.

what is that procedure? please answer in enough detail so I can succinctly but authoritatively write this down in the paper.

In our work, we have coded fault-free classes as zero and faulty classes as one. We leverage this binary nature to apply a Univariate Binary Logistic Regression (UBR) to identify metrics that have a significant association with the occurrence of defects. To set a cut-off for this association, we use a confidence interval of 95%.

To identify thresholds for the metrics found significant, we use a method called Value of Acceptable Risk Level (VARL), first proposed by Bender~\cite{bender99} for identifying thresholds in epidemiology studies. In his 2010 TSE article, Shatnawi~\cite{shatnawi10} endorsed the use of this method for identifying thresholds of object-oriented metrics in open source software systems.

The VARL method computes a cut-off value for each metric such that, below that threshold, the probability of a defect occurring is less than a chosen probability $p_0$. To do this, we fit a Univariate Binary Logistic Regression (UBR) to each metric. For every significant metric, this yields a logistic regression model with a constant intercept ($\alpha$) and a coefficient ($\beta$) obtained by maximizing the log-likelihood function. With these, the VARL is computed as follows:

\begin{equation}
VARL = \frac{1}{\beta}\left(\log\left(\frac{p_0}{1-p_0}\right) - \alpha\right)
\end{equation}
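
A minimal sketch of this procedure in Python (assuming a pandas DataFrame df with one column per metric plus a binary bug column; statsmodels provides the logistic regression, and all other names here are illustrative rather than our actual code):

import numpy as np
import statsmodels.api as sm

def varl_thresholds(df, metrics, p0=0.05, alpha_level=0.05):
    """Fit a univariate binary logistic regression (UBR) per metric and,
    for metrics significant at alpha_level, return the VARL threshold."""
    thresholds = {}
    y = df["bug"].values                      # 0 = fault-free, 1 = faulty
    for m in metrics:
        X = sm.add_constant(df[m].values)     # intercept column + the metric
        fit = sm.Logit(y, X).fit(disp=0)      # one metric at a time (UBR)
        intercept, beta = fit.params          # alpha, beta from the fit
        if fit.pvalues[1] < alpha_level and beta != 0:
            # VARL = (1/beta) * (log(p0 / (1 - p0)) - alpha)
            thresholds[m] = (np.log(p0 / (1 - p0)) - intercept) / beta
    return thresholds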

why are these thresholds different in different data sets?

It is highly unlikely that the metrics have a similar impact on all data sets. Therefore, we must run the model on each data set to identify the metrics and corresponding thresholds that matter.


timm commented Mar 5, 2016

v.good


rahlk commented Mar 7, 2016

Ant

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    57.83  &  29.52 & \quart{46}{33}{66}{1} \\
\hline  2 &   Reduce cbo &    16.27  &  4.21 & \quart{15}{5}{18}{1} \\
  2 &   Reduce loc &    15.66  &  2.41 & \quart{16}{3}{17}{1} \\
  2 &   Reduce cam &    15.06  &  3.01 & \quart{16}{3}{17}{1} \\
  2 & Reduce avg_cc &    15.66  &  3.01 & \quart{16}{3}{17}{1} \\
  2 &    Reduce ic &    15.66  &  3.61 & \quart{15}{4}{17}{1} \\
  2 &  Reduce lcom &    15.66  &  4.82 & \quart{14}{5}{17}{1} \\
  2 &   Reduce wmc &    15.66  &  3.01 & \quart{15}{4}{17}{1} \\
  2 & Reduce max_cc &    15.06  &  2.41 & \quart{15}{3}{17}{1} \\
\hline \end{tabular}}

Ivy

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    52.5  &  17.5 & \quart{57}{22}{67}{1} \\
\hline  2 & Reduce avg_cc &    22.5  &  7.5 & \quart{25}{10}{28}{1} \\
  2 &   Reduce loc &    22.5  &  10.0 & \quart{22}{13}{28}{1} \\
  2 &   Reduce cbo &    22.5  &  10.0 & \quart{22}{13}{28}{1} \\
  2 &   Reduce wmc &    22.5  &  7.5 & \quart{22}{9}{28}{1} \\
  2 & Reduce max_cc &    20.0  &  7.5 & \quart{22}{9}{25}{1} \\
  2 &   Reduce cam &    20.0  &  10.0 & \quart{22}{13}{25}{1} \\
\hline \end{tabular}}

Poi

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    19.93  &  12.11 & \quart{46}{33}{54}{2} \\
\hline  2 &  Reduce lcom &    9.25  &  1.43 & \quart{23}{4}{25}{2} \\
  2 &    Reduce ic &    9.25  &  1.43 & \quart{23}{4}{25}{2} \\
  2 & Reduce lcom3 &    8.9  &  1.77 & \quart{22}{5}{24}{2} \\
\hline  3 &   Reduce loc &    8.9  &  2.14 & \quart{20}{6}{24}{2} \\
  3 &   Reduce cam &    8.53  &  1.78 & \quart{21}{5}{23}{2} \\
\hline \end{tabular}}

Jedit

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &   Reduce dam &    36.36  &  27.28 & \quart{34}{34}{45}{1} \\
  1 &   Reduce moa &    36.36  &  18.19 & \quart{45}{23}{45}{1} \\
  1 &   Reduce rfc &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
  1 &    Reduce ca &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
  1 &    Reduce ce &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
  1 &   Reduce npm &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
  1 &   Reduce loc &    45.45  &  9.09 & \quart{45}{12}{57}{1} \\
  1 &   Reduce amc &    45.45  &  27.28 & \quart{45}{34}{57}{1} \\
  1 & Reduce avg_cc &    45.45  &  18.19 & \quart{45}{23}{57}{1} \\
\hline  2 &   Reduce dit &    36.36  &  36.37 & \quart{22}{46}{45}{1} \\
  2 & Reduce lcom3 &    36.36  &  18.19 & \quart{45}{23}{45}{1} \\
  2 &   Reduce cbm &    36.36  &  18.19 & \quart{45}{23}{45}{1} \\
  2 &         RANK &    36.36  &  0.0 & \quart{45}{0}{45}{1} \\
\hline \end{tabular}}

Lucene

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    14.78  &  4.92 & \quart{57}{22}{66}{4} \\
  1 & Reduce lcom3 &    15.76  &  1.96 & \quart{68}{9}{71}{4} \\
  1 &   Reduce moa &    15.76  &  2.45 & \quart{66}{11}{71}{4} \\
  1 &   Reduce cbo &    16.26  &  1.97 & \quart{71}{8}{73}{4} \\
  1 &   Reduce npm &    16.26  &  2.46 & \quart{68}{11}{73}{4} \\
  1 &   Reduce loc &    16.75  &  2.46 & \quart{68}{11}{75}{4} \\
\hline \end{tabular}}


timm commented Mar 7, 2016

re harman's threshold technique

  • will need a bibtex reference
  • this is just using some study from 2002, right? so no tuning to local conditions?


rahlk commented Mar 7, 2016

re harman's threshold technique

  • will need a bibtex reference

There are 2 references.

@article{hermans15,
  title={Detecting and refactoring code smells in spreadsheet formulas},
  author={Hermans, Felienne and Pinzger, Martin and van Deursen, Arie},
  journal={Empirical Software Engineering},
  volume={20},
  number={2},
  pages={549--575},
  year={2015},
  publisher={Springer}
}
@inproceedings{alves10,
  author = {Alves, Tiago L. and Ypma, Christiaan and Visser, Joost},
  booktitle = {2010 IEEE International Conference on Software Maintenance},
  doi = {10.1109/ICSM.2010.5609747},
  isbn = {978-1-4244-8630-4},
  issn = {10636773},
  month = {sep},
  pages = {1--10},
  publisher = {IEEE},
  title = {Deriving metric thresholds from benchmark data},
  url = {http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5609747},
  year = {2010}
}
  • this is just using some study from 2002, right? so no tuning to local conditions?

They seem to use a benchmark data set to derive a set of common thresholds. Since we don't have such a benchmark, we derive thresholds separately for every data set. The technique is straightforward.


rahlk commented Mar 7, 2016

Hermans thresholds

Summary

In addition to using VARL to identify thresholds as proposed by Shatnawi, we use an alternative method proposed by Alves et al.~\cite{alves10}. This method is unique in that it respects the underlying statistical distribution and scale of the metrics. It works as follows.

Every metric value is weighted according to the source lines of code (LOC) of its class. All the weighted metrics are then normalized, i.e., divided by the sum of all weights in the same system. Following this, the normalized metric values are ordered in ascending fashion. This is equivalent to computing a density function in which the x-axis represents the weight ratio (0-100%) and the y-axis the metric scale.

Thresholds are then derived by choosing the percentage of the overall code to be represented. For instance, Alves et al. suggest using the 90th percentile of the overall code to derive the threshold for a specific metric. This threshold is meaningful since it identifies the 10% of the code that is worst with respect to that metric; values beyond the 90th percentile represent very high risk.
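
A minimal sketch of this derivation (assuming NumPy arrays metric and loc with one entry per class; the names and the 90% default are illustrative):

import numpy as np

def alves_threshold(metric, loc, quantile=0.90):
    """Return the metric value at which the cumulative LOC-weighted
    ratio of classes first reaches quantile (e.g. 90% of the code)."""
    order = np.argsort(metric)                   # ascending metric values
    weights = loc[order] / loc.sum()             # LOC weights, normalized
    cumulative = np.cumsum(weights)              # cumulative weight ratio (0-1)
    idx = np.searchsorted(cumulative, quantile)  # first point covering quantile
    return metric[order][min(idx, len(metric) - 1)]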

Ant

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &         RANK &    63.25  &  24.1 & \quart{53}{26}{70}{1} \\
\hline  2 &   Reduce wmc &    22.29  &  6.63 & \quart{19}{7}{24}{1} \\
  2 & Reduce max_cc &    21.69  &  7.23 & \quart{18}{8}{24}{1} \\
  2 &   Reduce loc &    21.69  &  4.82 & \quart{20}{6}{24}{1} \\
  2 &  Reduce lcom &    21.69  &  4.82 & \quart{22}{6}{24}{1} \\
  2 &   Reduce cbo &    21.69  &  4.82 & \quart{21}{5}{24}{1} \\
  2 &    Reduce ic &    21.69  &  5.43 & \quart{20}{6}{24}{1} \\
  2 &   Reduce cbm &    21.08  &  5.43 & \quart{20}{6}{23}{1} \\
  2 &   Reduce dam &    21.08  &  6.02 & \quart{21}{7}{23}{1} \\
  2 &   Reduce npm &    21.08  &  5.43 & \quart{20}{6}{23}{1} \\
  2 &   Reduce rfc &    21.08  &  3.61 & \quart{21}{4}{23}{1} \\
  2 &   Reduce cam &    21.08  &  4.22 & \quart{20}{5}{23}{1} \\
  2 &   Reduce moa &    19.88  &  5.42 & \quart{20}{6}{22}{1} \\
  2 &    Reduce ce &    20.48  &  4.21 & \quart{21}{5}{22}{1} \\
  2 & Reduce avg_cc &    19.88  &  7.23 & \quart{19}{8}{22}{1} \\
\hline \end{tabular}}

Ivy

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &   Reduce noc &    30.0  &  15.0 & \quart{31}{20}{38}{1} \\
  1 &   Reduce amc &    30.0  &  12.5 & \quart{31}{16}{38}{1} \\
  1 &    Reduce ce &    30.0  &  12.5 & \quart{35}{16}{38}{1} \\
  1 &  Reduce lcom &    32.5  &  10.0 & \quart{35}{12}{41}{1} \\
  1 &   Reduce loc &    32.5  &  12.5 & \quart{35}{16}{41}{1} \\
  1 &   Reduce wmc &    32.5  &  17.5 & \quart{31}{23}{41}{1} \\
  1 &   Reduce cbo &    35.0  &  12.5 & \quart{35}{16}{44}{1} \\
  1 &   Reduce rfc &    35.0  &  12.5 & \quart{35}{16}{44}{1} \\
  1 &   Reduce npm &    35.0  &  7.5 & \quart{38}{9}{44}{1} \\
  1 &   Reduce cam &    35.0  &  15.0 & \quart{38}{19}{44}{1} \\
  1 & Reduce max_cc &    35.0  &  12.5 & \quart{35}{16}{44}{1} \\
  1 & Reduce avg_cc &    35.0  &  15.0 & \quart{35}{19}{44}{1} \\
  1 &   Reduce cbm &    40.0  &  17.5 & \quart{38}{22}{51}{1} \\
\hline  2 &         RANK &    52.5  &  20.0 & \quart{54}{25}{67}{1} \\
\hline \end{tabular}}

Jedit

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &   Reduce wmc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce dit &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce cbo &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce rfc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &  Reduce lcom &    36.36  &  36.36 & \quart{0}{79}{79}{2} \\
  1 &    Reduce ca &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &    Reduce ce &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce npm &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 & Reduce lcom3 &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce loc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce dam &    36.36  &  36.36 & \quart{0}{79}{79}{2} \\
  1 &   Reduce moa &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce cam &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &    Reduce ic &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce cbm &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &   Reduce amc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 & Reduce max_cc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 & Reduce avg_cc &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
  1 &         RANK &    36.36  &  0.0 & \quart{79}{0}{79}{2} \\
\hline \end{tabular}}

Lucene

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &  Reduce lcom &    14.78  &  2.46 & \quart{51}{9}{57}{3} \\
  1 &   Reduce dam &    14.78  &  1.97 & \quart{55}{7}{57}{3} \\
  1 &   Reduce npm &    15.27  &  2.96 & \quart{53}{11}{59}{3} \\
  1 &   Reduce cam &    15.27  &  2.46 & \quart{55}{9}{59}{3} \\
  1 &   Reduce rfc &    15.76  &  1.48 & \quart{57}{5}{60}{3} \\
  1 & Reduce lcom3 &    15.76  &  1.97 & \quart{57}{7}{60}{3} \\
  1 &   Reduce loc &    15.76  &  2.96 & \quart{53}{11}{60}{3} \\
\hline  2 &   Reduce cbo &    15.76  &  2.94 & \quart{55}{11}{60}{3} \\
  2 &   Reduce cbm &    15.76  &  2.45 & \quart{57}{9}{60}{3} \\
  2 &   Reduce wmc &    16.26  &  2.94 & \quart{55}{11}{62}{3} \\
  2 &    Reduce ce &    16.26  &  2.45 & \quart{57}{9}{62}{3} \\
  2 &   Reduce amc &    16.26  &  2.46 & \quart{55}{9}{62}{3} \\
  2 &   Reduce moa &    16.26  &  1.96 & \quart{59}{7}{62}{3} \\
  2 &         RANK &    16.75  &  7.88 & \quart{49}{30}{64}{3} \\
\hline \end{tabular}}

Poi

{\scriptsize \begin{tabular}{l@{~~~}l@{~~~}r@{~~~}r@{~~~}c}
\arrayrulecolor{lightgray}
\textbf{Rank} & \textbf{Treatment} & \textbf{Median} & \textbf{IQR} & \\\hline
  1 &  Reduce lcom &    9.61  &  2.86 & \quart{25}{9}{28}{2} \\
  1 &   Reduce npm &    9.61  &  4.27 & \quart{23}{13}{28}{2} \\
  1 & Reduce lcom3 &    9.61  &  2.15 & \quart{25}{7}{28}{2} \\
  1 &    Reduce ic &    9.96  &  3.2 & \quart{26}{10}{29}{2} \\
  1 &   Reduce amc &    9.96  &  1.78 & \quart{26}{6}{29}{2} \\
  1 &    Reduce ce &    9.96  &  2.86 & \quart{25}{9}{29}{2} \\
  1 &   Reduce rfc &    10.32  &  2.84 & \quart{26}{9}{30}{2} \\
  1 &   Reduce moa &    10.32  &  2.13 & \quart{26}{7}{30}{2} \\
  1 &   Reduce mfa &    10.32  &  3.21 & \quart{25}{10}{30}{2} \\
  1 &   Reduce wmc &    10.32  &  2.13 & \quart{28}{7}{30}{2} \\
\hline  2 &   Reduce dit &    10.68  &  2.13 & \quart{28}{7}{32}{2} \\
  2 &   Reduce cam &    10.68  &  3.21 & \quart{25}{10}{32}{2} \\
  2 & Reduce max_cc &    10.32  &  3.2 & \quart{26}{10}{30}{2} \\
  2 &   Reduce loc &    11.03  &  3.2 & \quart{28}{10}{33}{2} \\
  2 &   Reduce cbm &    11.39  &  2.14 & \quart{29}{7}{34}{2} \\
\hline  3 &         RANK &    20.64  &  8.9 & \quart{53}{26}{61}{2} \\
\hline \end{tabular}}


rahlk commented Mar 8, 2016

Summary Shatnawi10

In our work, we have coded fault-free classes as zero and faulty classes as one. We leverage this binary nature to apply a Univariate Binary Logistic Regression (UBR) to identify metrics that have a significant association with the occurrence of defects. To set a cut-off for this association, we use a confidence interval of 95%.

To identify thresholds for the metrics found significant, we use a method called Value of Acceptable Risk Level (VARL), first proposed by Bender~\cite{bender99} for identifying thresholds in epidemiology studies. In his 2010 TSE article, Shatnawi~\cite{shatnawi10} endorsed the use of this method for identifying thresholds of object-oriented metrics in open source software systems.

The VARL method computes a cut-off value for each metric such that, below that threshold, the probability of a defect occurring is less than a chosen probability $p_0$. To do this, we fit a Univariate Binary Logistic Regression (UBR) to each metric. For every significant metric, this yields a logistic regression model with a constant intercept ($\alpha$) and a coefficient ($\beta$) obtained by maximizing the log-likelihood function. With these, the VARL is computed as follows:

\begin{equation}
VARL = \frac{1}{\beta}\left(\log\left(\frac{p_0}{1-p_0}\right) - \alpha\right)
\end{equation}

Summary Alves10

In addition to using VARL to identify thresholds as proposed by Shatnawi, we use an alternative method proposed by Alves et al.~\cite{alves10}. This method is unique in that it respects the underlying statistical distribution and scale of the metrics. It works as follows.

Every metric value is weighted according to the source lines of code (LOC) of its class. All the weighted metrics are then normalized, i.e., divided by the sum of all weights in the same system. Following this, the normalized metric values are ordered in ascending fashion. This is equivalent to computing a density function in which the x-axis represents the weight ratio (0-100%) and the y-axis the metric scale.

Thresholds are then derived by choosing the percentage of the overall code to be represented. For instance, Alves et al. suggest using the 90th percentile of the overall code to derive the threshold for a specific metric. This threshold is meaningful since it identifies the 10% of the code that is worst with respect to that metric; values beyond the 90th percentile represent very high risk.

Deprecated Method

One of the first methods for finding thresholds was proposed by Erni and Lewerentz~\cite{erni96}. Their technique identifies thresholds from the data distribution, specifically the mean and the standard deviation of the metric values. They propose using values that lie beyond one standard deviation from the mean as thresholds. The minimum value $T_{min}=\mu-\sigma$ is used when the metric definition considers very small values an indicator of problems; otherwise, $T_{max}=\mu+\sigma$ is used, when large metric values are considered problematic.
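
For completeness, a tiny sketch of this scheme (illustrative only):

import numpy as np

def erni_thresholds(values):
    """Return (T_min, T_max) = (mean - std, mean + std) for one metric."""
    mu, sigma = np.mean(values), np.std(values)
    return mu - sigma, mu + sigma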

Several researchers~\cite{shatnawi10,alves10} have pointed out that this method is subject to a few problems. First, it doesn't consider the fault-proneness of classes when the thresholds are computed. Second, there is a lack of empirical validation of this methodology, which impedes reasonable comparisons.
