add support for minimization of amplified tests #54

Open
monperrus opened this Issue Jan 31, 2017 · 18 comments

monperrus commented Jan 31, 2017

Motivation: During amplification, some neutral test evolution happens. This results in very long and unreadable tests, yet many changes in the amplified test are not required. The goal of minimization is to reduce the size and increase the readability of amplified test cases.

What: Implement a minimization algorithm (such as delta-debugging) to remove useless statements in amplified test cases.

Hints: For instance, a useless statement is a local variable that is set and never modified, such as Object myObject = null; the local variable should be inlined in this case. For tests that expect an exception, every statement after the one that throws it can be removed.
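A minimization pass along these lines could be sketched as follows. This is hypothetical code, not DSpot's implementation; the `stillValid` oracle stands for "the reduced test still compiles, passes, and preserves the amplification criterion":

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Greedy one-by-one statement removal in the spirit of delta debugging.
// `stillValid` is an assumed oracle: it re-runs the reduced test and
// checks that the criterion (e.g. mutation score) is preserved.
public class TestMinimizer {
    public static List<String> minimize(List<String> statements,
                                        Predicate<List<String>> stillValid) {
        List<String> current = new ArrayList<>(statements);
        boolean changed = true;
        while (changed) {                    // iterate until a fixed point
            changed = false;
            for (int i = 0; i < current.size(); i++) {
                List<String> candidate = new ArrayList<>(current);
                candidate.remove(i);         // try dropping one statement
                if (stillValid.test(candidate)) {
                    current = candidate;     // removal is safe, keep it
                    changed = true;
                    break;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> test = List.of(
            "Object myObject = null;",       // set and never used: removable
            "int x = service.compute();",
            "assertEquals(42, x);");
        // Toy oracle: the test stays valid as long as the computation
        // and its assertion survive.
        List<String> reduced = minimize(test, c ->
            c.contains("int x = service.compute();")
                && c.contains("assertEquals(42, x);"));
        System.out.println(reduced.size()); // 2
    }
}
```

The oracle is the expensive part in practice: each candidate requires re-running the test (or the whole mutation analysis), which is why the time cost comes up later in this thread.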

monperrus commented Dec 8, 2017

initial attempt in #154

sbihel commented Feb 23, 2018

I think it would make sense to remove all amplifications that have no impact on the increase of mutation score.

Simple instrumentation could be used to detect useless generated assertions.

As for input amplification, I think we have to define a limit:

  • an added input can be removed;
  • a modified input can be reverted to its original state (i.e. minimising the amplification/noise).

Because if we apply general unit-test or even source-code minimisation, it might be harder for the developer to identify the original test. And they can apply general-purpose minimisation on their own anyway.

monperrus commented Feb 23, 2018

danglotb commented Feb 23, 2018

I think it would make sense to remove all amplifications that have no impact on the increase of mutation score.

Yes, that's the idea.

See also the idea of "delta debugging" to minimize.

The major drawback of this approach is time consumption: it requires "a lot" of PIT executions, and therefore a lot of time.

Simple instrumentation could be used to detect useless generated assertions.

What do you suggest?

In addition to this, we introduce comments in amplified tests, and I think they create a lot of noise. Maybe we could remove them first when we aim at presenting amplified tests to developers.

Do you think this minimization should be done automatically and enabled by default, or should we provide it as an "external service tool" of DSpot?

monperrus commented Feb 24, 2018

sbihel commented Feb 25, 2018

Simple instrumentation could be used to detect useless generated assertions.

What do you suggest?

I was thinking of adding a call to a counter after each added assertion. The test would be executed on the new detected mutants and if an assertion never lowers the counter then that means that it never fails, thus is useless.
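A minimal sketch of this instrumentation idea (hypothetical, not DSpot code): each generated assertion is wrapped so that a failure on a mutant is credited to a per-assertion kill counter. Assertions whose counter stays at zero across all mutants never fail, and are candidates for removal:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical instrumentation: every generated assertion runs through
// `check`, which records a "kill" when the assertion throws on a mutant.
public class AssertionCounter {
    private static final Map<String, Integer> kills = new HashMap<>();

    static void check(String assertionId, Runnable assertion) {
        kills.putIfAbsent(assertionId, 0);
        try {
            assertion.run();
        } catch (AssertionError e) {
            kills.merge(assertionId, 1, Integer::sum); // this assertion killed a mutant
        }
    }

    public static void main(String[] args) {
        // Simulate running the amplified test against two mutants.
        for (int mutant = 0; mutant < 2; mutant++) {
            int buggyResult = (mutant == 0) ? 41 : 42; // mutant 0 changes behaviour
            check("assert1", () -> {
                if (buggyResult != 42) throw new AssertionError();
            });
            check("assert2", () -> { /* trivially true, never fails */ });
        }
        System.out.println("assert1 kills=" + kills.get("assert1")); // 1
        System.out.println("assert2 kills=" + kills.get("assert2")); // 0 -> removable
    }
}
```

The serialization problem discussed below is precisely about getting these counters back out of the forked JVM that runs the mutation analysis.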


In addition to this, we introduce comments in amplified tests, and I think they create a lot of noise. Maybe we could remove them first when we aim at presenting amplified tests to developers.

If comments were removed, we (DSpot or the developer) would have to rely on a diff to identify the amplifications, right? Would that be a good solution? By that I mean, does the pretty printer of Spoon generate a source code with the same style as the test given as an input?


Do you think this minimization should be done automatically and enabled by default,

Yes, I think so, in order to maximize the prettiness of the generated tests, so that people like them, also by their look'n'feel. (In DSpot, we generate tests for humans, not for machines.)

It would also be easier to interact with the main amplification process, and to have a more powerful interface.

danglotb commented Feb 26, 2018

I was thinking of adding a call to a counter after each added assertion. The test would be executed on the new detected mutants and if an assertion never lowers the counter then that means that it never fails, thus is useless.

The problem is that we execute the mutation analysis through Maven goals, so it runs in a new JVM. We would need serialization to obtain information about the runs, and that is kind of tricky, right?

By that I mean, does the pretty printer of Spoon generate a source code with the same style as the test given as an input?

I think you can rely on Spoon's pretty-printer.

It would also be easier to interact with the main amplification process, and to have a more powerful interface.

We need to minimize only the tests that have been selected.

On the one hand, if there is a selection, it means that the minimization is tied to the selection, right?

On the other hand, some minimization can be done regardless of any test criterion, such as the inlining of local variables.

I set up some classes and a test about that: #338. I'm going to implement at least this general minimization, using static analysis of the program.

WDYT?

sbihel commented Feb 27, 2018

The problem is that we execute the mutation analysis through Maven goals, so it runs in a new JVM. We would need serialization to obtain information about the runs, and that is kind of tricky, right?

What if each test wrote a report in a file?


On the other hand, some minimization can be done regardless of any test criterion, such as the inlining of local variables.

Yes, but what I don't really understand is that it will modify the original test. What if the author of the test thought it was clearer to use a variable?

danglotb commented Feb 27, 2018

What if each test wrote a report in a file?

It would be the same as serialization/deserialization. I have some concerns here.

During the mutation analysis:
If an assertion never fails (I am not sure it happens, but w/e), we can remove it.
If an assertion fails, there are two cases:

  1. it detects a mutant that is already detected by the original test suite;
  2. it detects a new mutant.

In addition to this, we have another dimension: what do we do with the amplified test?

  1. The amplified test is an improved version of an existing test; in this case, assertions of kind 1. should be kept, since the amplified test is meant to replace the original test.
  2. The amplified test has new semantics, derived from an existing test; in this case, assertions of kind 1. should be removed, since we will keep both the original test and the new test.

I'll think about it.

Yes, but what I don't really understand is that it will modify the original test. What if the author of the test thought it was clearer to use a variable?

You have a point here. Maybe we should only minimize what DSpot added. We could rely on the naming convention of local variables: DSpot names them something like __DSPOT_XX. We could also only inline local variables initialized with literals.

In any case, we won't be able to satisfy everybody, and we will need to make choices.
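A toy, string-level sketch of that convention-based inlining (the real implementation would operate on Spoon's AST; the class name and regex here are illustrative only): only locals matching the __DSPOT_ prefix and initialized with a literal are inlined, so developer-written variables are left alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Inline DSpot-generated locals (__DSPOT_ prefix) that are initialised
// with an int or String literal; everything else stays untouched.
public class DspotInliner {
    private static final Pattern DECL = Pattern.compile(
        "(?:int|long|String)\\s+(__DSPOT_\\w+)\\s*=\\s*(\"[^\"]*\"|\\d+);");

    public static List<String> inline(List<String> statements) {
        List<String> out = new ArrayList<>(statements);
        for (int i = 0; i < out.size(); i++) {
            Matcher m = DECL.matcher(out.get(i));
            if (m.matches()) {
                String name = m.group(1), literal = m.group(2);
                out.remove(i);                         // drop the declaration
                for (int j = 0; j < out.size(); j++) { // substitute its uses
                    out.set(j, out.get(j).replace(name, literal));
                }
                i--; // re-check the statement that shifted into this slot
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> test = List.of(
            "int __DSPOT_1 = 42;",
            "int result = obj.compute(__DSPOT_1);",
            "assertEquals(42, result);");
        System.out.println(inline(test));
        // [int result = obj.compute(42);, assertEquals(42, result);]
    }
}
```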

sbihel commented Feb 27, 2018

In addition to this, we have another dimension: what do we do with the amplified test?

  1. The amplified test is an improved version of an existing test; in this case, assertions of kind 1. should be kept, since the amplified test is meant to replace the original test.
  2. The amplified test has new semantics, derived from an existing test; in this case, assertions of kind 1. should be removed, since we will keep both the original test and the new test.

I agree. In the second case, would we still want new mutants to be located in the same method?

sbihel commented Feb 28, 2018

It would be the same as serialization/deserialization. I have some concerns here.

Would INRIA/spoon#1874 be useful?

danglotb commented Mar 7, 2018

Hi @sbihel

Would you mind having a look at #354?

I propose a minimizer for the ChangeDetectorSelector.

The ChangeDetectorSelector runs amplified tests against a "modified" version of the same program and keeps only the amplified tests that fail.

The goal is to have amplified tests that encode a change, e.g. a new feature or a regression bug.

My idea is to perform a delta-diff on the assertions, i.e. remove assertions one by one and see whether the amplified test still fails.

WDYT?
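The proposed delta-diff could look like this sketch (hypothetical code; `stillFailsOnChangedVersion` stands for re-running the reduced test against the changed program version):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Drop amplified assertions one at a time and keep a removal only if
// the test still fails on the changed version, i.e. the assertion was
// not needed to expose the behavioral change.
public class AssertionDeltaDiff {
    public static List<String> minimize(
            List<String> assertions,
            Predicate<List<String>> stillFailsOnChangedVersion) {
        List<String> kept = new ArrayList<>(assertions);
        for (int i = kept.size() - 1; i >= 0; i--) {
            List<String> candidate = new ArrayList<>(kept);
            candidate.remove(i);
            if (stillFailsOnChangedVersion.test(candidate)) {
                kept = candidate; // assertion is redundant for this change
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> assertions = List.of("assertA", "assertB", "assertC");
        // Toy oracle: only assertB actually fails on the changed version.
        List<String> minimal = minimize(assertions, c -> c.contains("assertB"));
        System.out.println(minimal); // [assertB]
    }
}
```

Note the criterion is inverted with respect to mutation-score minimization: here a removal is accepted when the test *still fails*, whereas there it is accepted when the score is preserved.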

sbihel commented Mar 8, 2018

Hi @danglotb,

Wouldn't we need a list of input programs to have all mutants detected by the test case?

Thanks for your efforts 👍

danglotb commented Mar 9, 2018

As I said, some minimizations are tied to the test criterion used.

For instance, if I use the mutation score as a test criterion, the minimization must preserve the mutation score obtained after amplification.

Here, I am talking about another test criterion: encoding a behavioral change.

The point of this selector is that we obtain amplified tests that pass on a given version and fail on the other. Such amplified tests encode desired or undesired behavioral changes.

On the one hand, desired means that the developer wants the behavior of the program to change, e.g. to add a new feature or fix something.
On the other hand, undesired means it might be a regression bug: something that was working before but does not work anymore on the changed version. It means that amplification was able to capture something that was not captured before.

In both cases, we win, because we can enhance the test suite.

Back to minimization for such a test criterion: do you think we should only keep assertions that make the amplified test fail? If yes, should the failure be the same?

sbihel commented Mar 12, 2018

If a behavioural change is detected, that means we keep both versions in the test suite. And thus we can apply general minimisation on the amplified version, using the improved criterion for the combined tests.

I was thinking that a generated assertion could be a duplicate of an existing one. In that case the new assertion would falsely appear useful. But if we focus on amplified assertions, the delta-diff would detect them.

And I think we should only keep amplified assertions that make the test fail because it enforces clarity on the generated test. If we wanted to keep the exact same failures as before, would it not greatly reduce the range of acceptable amplifications?

monperrus commented Jun 25, 2018

There are two kinds of minimization:

  • input minimization (removing method calls, etc)
  • assertion minimization (removing useless assertions or assertions redundant with others)

monperrus commented Jul 10, 2018

See also: "Fine-grained Test Minimization", Arash Vahabzadeh, Andrea Stocco, Ali Mesbah, ICSE 2018.
URL: https://dblp.org/rec/conf/icse/VahabzadehS018

monperrus commented Nov 16, 2018

Related work:

  • An empirical study of the effects of minimization on the fault detection capabilities of test suites
  • Test set size minimization and fault detection effectiveness: A case study in a space application
  • On the effect of test-suite reduction on automatically generated model-based tests
  • Regression testing minimization, selection and prioritization: a survey