Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Questions and analyze #53

Open
sergey-karazhenets opened this issue Nov 5, 2017 · 5 comments
Open

Questions and analyze #53

sergey-karazhenets opened this issue Nov 5, 2017 · 5 comments

Comments

@sergey-karazhenets
Copy link
Contributor

I have analyzed some implemented metrics and also think about the following questions during implementation metrics where we counting methods of a class and types of method parameters or return types. Next questions relate to all metrics and I think it should be clarified for all who want to implement some new metric. To be easy it to understand what I want to say let's look to CAMC metric implementation. We can take any metric, but CAMC has easy description and formula.

See this one paper Class Cohesion Metrics for Software Engineering: A Critical Review, PDF, Table 2. Interface-based metrics. CAMC metric is described as:
camc

1. Letter l in formula is a number of distinct parameter types across all methods in a class. For example, if class has the next methods

public int length(String str) {
	return s.length();
}
public int bytes(String str) {
	return str.getBytes().length;
}

then l should be equals to 1 because two methods have the same parameter type String. But for committed to repository implementation l will be equals to 2. It happens because our implementation counts not only parameter types of method but also it counts return types of method. For code above it counts int and String. I think our implementation contradicts to definition in the paper, we should count only parameter types of method.

One more evidence for that: if we should count both parameter types of method and return types then it clearly is wrote in the paper. See description for MMAC metric. Paper clearly tells us that i is can be a parameter type of method or return type. In the description of CAMC metric l is defined only as count of parameter types.

So, am I right that we should count only parameter types of method for l?


2. For example class has the next method:

public int plus(int a, Integer b) {
	return a + b;
}

Let see the point 1 above, for CAMC metric we should calculate value l as a number of distinct parameter types for method. Our implmentation counts l as 2 because method has two distinct types: int and java.lang.Integer. Also we can see it in a bytecode of method:

public plus(ILjava/lang/Integer;)I
 L0
  LINENUMBER 13 L0
  // ... and other instructions  

For compiler int and java.lang.Integer it is two different types, but for human it is one type for integer values.

So, should we consider simple types and theirs analogous boxed types as the same types or consider them as different types (like compiler does)?


3. Letter k in CAMC formula is a number of all methods in a class.

So, is it correct that we consider constructors as methods?

Should we consider only public methods or consider all methods (public, private, protected, package-private)? I think we should consider all methods, but what do you think?

If class inherits from another class then should we consider methods from parent class? I think we shouldn't, but what do you think?


4. Also interesting bug in counting of all methods in a class when class overrides one or more generic methods from parent. Let's see the next code:

public final class OverrideGenericMethodFromInterface 
	implements GenericIncrementor1<Integer> {

  private final int num;

  public OverrideGenericMethodFromInterface(int num) {
    this.num = num;
  }

  @Override
  public Integer inc() {
    return num + 1;
  }
}

interface GenericIncrementor1<T> {
  T inc();
}

We have generic interface GenericIncrementor1 and its implementation presented by class OverrideGenericMethodFromInterface. Visually we see only one method in OverrideGenericMethodFromInterface class, it is public Integer inc() { }. And when we count methods for some metric we expect that Javassist or ASM returns us only one method. But it doesn't. It returns two methods. Why does it happen? Let's see to the bytecode:

// class version 52.0 (52)
// access flags 0x31
// signature Ljava/lang/Object;LGenericIncrementor1<Ljava/lang/Integer;>;
// declaration: OverrideGenericMethodFromInterface implements GenericIncrementor1<java.lang.Integer>
public final class OverrideGenericMethodFromInterface implements GenericIncrementor1  {

  // compiled from: OverrideGenericMethodFromInterface.java

  // access flags 0x12
  private final I num

  // access flags 0x1
  public <init>(I)V
   L0
    LINENUMBER 5 L0
    ALOAD 0
    INVOKESPECIAL java/lang/Object.<init> ()V
   L1
    LINENUMBER 6 L1
    ALOAD 0
    ILOAD 1
    PUTFIELD OverrideGenericMethodFromInterface.num : I
   L2
    LINENUMBER 7 L2
    RETURN
    MAXSTACK = 2
    MAXLOCALS = 2

  // access flags 0x1
  public inc()Ljava/lang/Integer;
   L0
    LINENUMBER 11 L0
    ALOAD 0
    GETFIELD OverrideGenericMethodFromInterface.num : I
    ICONST_1
    IADD
    INVOKESTATIC java/lang/Integer.valueOf (I)Ljava/lang/Integer;
    ARETURN
    MAXSTACK = 2
    MAXLOCALS = 1

  // access flags 0x1041
  public synthetic bridge inc()Ljava/lang/Object;
   L0
    LINENUMBER 1 L0
    ALOAD 0
    INVOKEVIRTUAL OverrideGenericMethodFromInterface.inc ()Ljava/lang/Integer;
    ARETURN
    MAXSTACK = 1
    MAXLOCALS = 1
}

In the bytecode two methods is presented: public inc()Ljava/lang/Integer; and public synthetic bridge inc()Ljava/lang/Object;. And according that Javassist or ASM works properly when returning two methods, because bytecode contains two methods. Compiler always generates one more method on each implementation of generic method in a class. And it impacts all our metrics. To avoid incorrect counting of all methods in a classs we should exclude synthetic methods.

Let's discuss all points/questions and I will fix them if it will be needed.

@0crat
Copy link
Collaborator

0crat commented Nov 5, 2017

@yegor256 please, pay attention to this issue

sergey-karazhenets pushed a commit to sergey-karazhenets/jpeek that referenced this issue Nov 5, 2017
@sergey-karazhenets
Copy link
Contributor Author

I've created pull request #54 for show a bug with generic methods. Pull request relates only for CAMC metric, but if we are going to fix it I will add test for all metrics. Let's see what was happened in tests for class OverrideGenericMethodFromInterface. Accoding to our current implemented logic we expect in test that CAMC will be calculated as:

l = 2 (two distinct types: int, java.lang.Integer)
k = 2 (methods: construnctor, inc())
a = 1 (type: int) + 1 (type: java.lang.Integer) = 2

CAMC = a / (l * k) = 2 / (2 * 2) = 2 / 4 = 0.5000

But result was 0.3333. It was happend because we counted synthetic methods. See point 4 with a bytecode that copiler produces. According to that bytecode we calculate:

l = 3 (three distinct types: int, java.lang.Integer, java.lang.Object)
k = 3 (methods: construnctor, inc(), synthetic inc())
a = 1 (type: int) + 1 (type: java.lang.Integer) + 1 (type: java.lang.Object) = 3

CAMC = a / (l * k) = 3 / (3 * 3) = 3 / 9 = 0.3333

I will fix that if we will decide that is wrong behaviour or you can just close pull request without merging and without any fixes.

@yegor256
Copy link
Member

yegor256 commented Nov 5, 2017

@sergey-karazhenets for CAMC:

  1. So, am I right that we should count only parameter types of method for l?

Yes, totally agree.

  1. So, should we consider simple types and theirs analogous boxed types as the same types or consider them as different types (like compiler does)?

I think we should treat Integer and int as one type. I think that we should introduce a new class which will do exactly that -- fetch the list of types from a method signature. And it has to explicitly say in its Javadoc how do we understand the differences between types. This should not be a decision for CAMC metric only, but for the entire product. Make sense?

  1. So, is it correct that we consider constructors as methods?

I think we should not consider constructors as methods. Let's try to find some papers that confirm that. Again, let's create a class that will fetch all methods from a class. Let's use it everywhere in all metrics, to make this decision universal for the entire product.

  1. Should we consider only public methods or consider all methods (public, private, protected, package-private)? I think we should consider all methods, but what do you think?

I would say we pay attention only to public methods.

I will fix that if we will decide that is wrong behaviour or you can just close pull request without merging and without any fixes.

I agree with the fix introduced.

@yegor256
Copy link
Member

yegor256 commented Nov 5, 2017

@sergey-karazhenets once again, let's not forget that there are many metrics. Let's try to make all changes apply to all metrics. Try to introduce new classes, instead of making changes to specific metrics.

@yegor256
Copy link
Member

@sergey-karazhenets can you please split this ticket to a few separate tickets? Now it's difficult to track the progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants