Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPD finds duplications in test fixtures despite source limitation #50

Open
C-Otto opened this issue Jul 10, 2020 · 15 comments
Open

CPD finds duplications in test fixtures despite source limitation #50

C-Otto opened this issue Jul 10, 2020 · 15 comments

Comments

@C-Otto
Copy link
Contributor

C-Otto commented Jul 10, 2020

I configured CPD for my multi-project Gradle 6.5.1 project. Some of these sub-projects make use of the java-test-fixtures plugin, which allows me to put Java files into src/testFixtures/java. Despite limiting the source as shown below, I get a CPD issue for code inside src/testFixtures/java (both files which are part of the issue are inside the same directory).

With --debug I see the tokenize debug message for the files in source/main/java, followed by those in source/testFixtures/java:

2020-07-10T10:29:58.078+0100 [DEBUG] [de.aaschmid.gradle.plugins.cpd.internal.worker.CpdExecutor] Tokenize /home/cotto/IdeaProjects/xxx/yyy/src/main/java/com/aaa/Logs.java
2020-07-10T10:29:58.078+0100 [DEBUG] [de.aaschmid.gradle.plugins.cpd.internal.worker.CpdExecutor] Tokenize /home/cotto/IdeaProjects/xxx/yyy/src/testFixtures/java/com/bbb/DatabaseLockingTestJob.java

However, when I debug the source property in my configuration, everything is fine (i.e. I only see src/main/java files):

files(source).each { File file ->
    println(file)
}

This happens with:

  • source = files('src/main/java')
  • source = sourceSets.main.java
  • source = sourceSets.main.allJava
plugins {
    id 'de.aaschmid.cpd' version '3.1' apply false
}

configure(subprojects.findAll { it.name != 'xxx-platform' }) {
    apply plugin: 'java'
    apply plugin: 'de.aaschmid.cpd'

    check.dependsOn(cpdCheck)

    cpdCheck {
        source = files('src/main/java') // do not run for test code
    }
}
@aaschmid
Copy link
Owner

Hm, that sounds indeed very strange @C-Otto. Do you have a small reproducer? Is it urgent (just for my prioritization of things)?

@C-Otto
Copy link
Contributor Author

C-Otto commented Jul 12, 2020

We won't be able to use CPD once we switched to Gradle, but that won't happen for a few weeks or so. Besides, we can live without CPD, although it's nice to have. So... not that urgent. I'll have a look at a reproducible case tomorrow.

@C-Otto
Copy link
Contributor Author

C-Otto commented Jul 14, 2020

https://github.com/C-Otto/cpdtestfixtures

If CPD is configured before the integration tests are introduced, for reasons I don't understand CPD also scans the integration test sources:
C-Otto/cpdtestfixtures@fbf644b

If you enable the java-test-fixtures plugin in aaa/build.gradle CPD starts complaining about duplications in the test fixtures code:
C-Otto/cpdtestfixtures@52db52e

This might be related to gradle/gradle#13781 and the linked issues.

@aaschmid
Copy link
Owner

Sounds like a lazy initialization stuff. Thanks for reproducer, I will have a look at it.

aaschmid added a commit that referenced this issue Jul 18, 2020
Signed-off-by: Andreas Schmid <service@aaschmid.de>
@aaschmid
Copy link
Owner

aaschmid commented Jul 18, 2020

Very, very strange ... but finally could find the problem: It appears if and only if the cpdChecks source is set before further sourceSets are lazily configured by other applied plugins...

Here is the full story:

  1. Could reproduce it using your reproducer
  2. Wrote an acceptance test, see https://github.com/aaschmid/gradle-cpd-plugin/blob/%2350.text.fixture/src/integTest/groovy/de/aaschmid/gradle/plugins/cpd/test/CpdAcceptanceTest.groovy#L662-L705. Unfortunately, it was working :-( and playing around hadn't helped me understand it
  3. Added some printlns to your reproducer for the subproject:
    println(files(sourceSets.main.allJava).files)
    println(files(sourceSets.test.allJava).files)
    println(files(sourceSets.testFixtures.allJava).files)
    println("----------------")
    println(cpdCheck.source.files)
    
    They produced the "incorrect" result:
    []
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/test/java/BadTest.java]
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
    ----------------
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
    
  4. I restructured the project by moving the application of java-test-fixture to rootProjects build.gradle => suprisingly, now it was working:
    []
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/test/java/BadTest.java]
    [/home/schmida/tmp/cpdtestfixtures/aaa/src/testFixtures/java/BadTestFixture.java]
    ----------------
    []
    
  5. Another possibility to fix this is to apply and configure the cpd plugin in the aaas (= sub project) build.gradle => the problem arises if the java-test-fixture plugin is not applied and configured together with the cpd plugin
  6. So this is indeed a problem with the lazy source initialization of a cpdCheck task :-( (see Code)
  7. => The described problem appears if and only if the cpdChecks source is set before further sourceSets configuring plugins are applied because Cpd tasks lazy source initialization still listens to further applied plugins and their sourceSets.

Example: if cpdCheck is configured on in rootProjects build.gradle for subprojects and later another subprojects closure applies java-test-fixture or java plugin, the cpdChecks source is enhanced by their sourceSets.allJava files. subprojects build.gradle files are also applied separately and later...

Good, that I have found the root cause of this issue. Bad, that I currently have only a quite ugly (gut feeling) hack to solve this. Concrete I check if the source was explicitly set and prevent further changes.

@C-Otto does that information help you already such that you can fix the issue by restructuring your build?
To be honest, I at least think about documenting this behavior and leave it as is.

Another question to you @C-Otto: Also I prefer to apply the cpd' plugin on the rootProject` and configure inclusions / exclusions there. Would this be an alternative or would you like to have duplicates checked only project by project? SonarQube e.g. does check for duplicates also over the whole code base if I remember correctly.

@aaschmid aaschmid self-assigned this Jul 18, 2020
@aaschmid aaschmid added the bug label Jul 18, 2020
@C-Otto
Copy link
Contributor Author

C-Otto commented Jul 19, 2020

It looks like your IT adds a file with duplications to src/main/java? That doesn't look right to me.

Anyway, for me the hotfix/hack using evaluationDependsOnChildren() also helped here (see linked Gradle issue).
I'm now able to configure CPD inside something like a subprojects block in the root project, and configure the sources correctly as each subproject's plugins were added before this root configuration happens (the subprojects' configuration tasks run before, so the "new" sourcesets are already known when I configure CPD in the root project's configuration task).
Of course I'd appreciate a long-term (non-hacked) solution :) Thanks for the effort!

Running CPD on the whole project would be possible. This doesn't match the existing Maven configuration I'm using as a base, but I think that might be a valid step. However, the root project does not have any source, and I don't know how to configure CPD. If you have any ideas, I'd be happy to experiment further

How to run CPD as part of ./gradlew check? How to add CPD to a subject's check? How to define the sources? I just got this far:

WARNING: Due to the absence of 'LifecycleBasePlugin' on root project 'xxx-parent' the task ':cpdCheck' could not be added to task graph. Therefore CPD will not be executed. To prevent this, manually add a task dependency of ':cpdCheck' to a 'check' task of a subproject.
1) Directly to project ':xxx-subproject':
    check.dependsOn(':cpdCheck')
2) Indirectly, e.g. via root project 'xxx-parent':
    project(':xxx-subproject') {
        plugins.withType(LifecycleBasePlugin) { // <- just required if 'java' plugin is applied within subproject
            check.dependsOn(cpdCheck)
        }
    }

@aaschmid
Copy link
Owner

It looks like your IT adds a file with duplications to src/main/java? That doesn't look right to me.
Yes this was for my test that neither test nor testFixtures are included into source and therefore check for duplicates.

Anyway, for me the hotfix/hack using evaluationDependsOnChildren() also helped here (see linked Gradle issue).
evaluationDependsOnChildren seems like a fix, right ;-)

How to run CPD as part of ./gradlew check? How to add CPD to a subject's check? How to define the sources?
That is a know limitation if the rootProject has no lifecycle plugin applied (normally implicit via other plugin). The solution is document on https://github.com/aaschmid/gradle-cpd-plugin#multi-module-project.

Depending on your needs, now java sourcesets of all projects will be automatically checked for duplicates (see Code. If you want to override this behaviour, you have a lot of options but note that the lazy evaluation of plugins applied to later configured subprojects adds further sourceSets again unless you use evaluationDependsOnChildren:

cpdCheck {
    source = subprojects*.sourceSets*.main*.java*.srcDirs
}

or

cpdCheck {
    source = subprojects*.sourceSets*.main*.java*.srcDirs + subprojects*.sourceSets*.test*.java*.srcDirs
}

or

cpdCheck {
    source = []
    subprojects.forEach { project -> it.source(project.sourceSets.main.java) } // only if all subprojects have corresponding `sourceSet`
}

or

cpdCheck {
    source = []
    allprojects.forEach { project ->
        project.plugins.withType(JavaBasePlugin) { plugin ->
            project.sourceSets.main.java.forEach { s -> rootProject.cpdCheck.source(s) }
        }
    }
}

Maybe I should document these as well ;-)

Currently best idea for a proper solutions to this: I could provide an additional property with an allowlist or denylist for sourceSet names, e.g. allowedSourceSets = [ "main", "test" ] or deniedSourceSets = [ "testFixture"] which should (not) automatically be added - defaults to all as currently.

@aaschmid aaschmid added this to the next version milestone Jul 19, 2020
@C-Otto
Copy link
Contributor Author

C-Otto commented Jul 19, 2020

Thanks for the snippets, I'll try them out tomorrow. I agree, a bit more documentation wouldn't hurt, especially related to the warning I posted earlier (I've seen it several times when I tried to get the plugin to work with subprojects).

The additional options also seem useful!

@C-Otto
Copy link
Contributor Author

C-Otto commented Jul 20, 2020

I'm unable to create a configuration that combines the scan for all subprojects, as described in your response above. I'm using this snippet for all variants described below:

evaluationDependsOnChildren()
apply plugin: 'java' // required so that the check task exists?
apply from: 'etc/cpd.gradle'

With this I get a single cpdCheck task, but it finds duplicates in subproject/src/integrationTest:

source = subprojects*.sourceSets*.main*.java*.srcDirs

With this I get the same (duplication in integrationTest):

source = []
subprojects.forEach { project -> it.source(project.sourceSets.main.java) }

Finally, I get

* What went wrong:
A problem occurred evaluating root project 'xxx-parent'.
> Failed to apply plugin class 'org.gradle.api.plugins.JavaBasePlugin'.
   > Could not get unknown property 'main' for SourceSet container of type org.gradle.api.internal.tasks.DefaultSourceSetContainer.

for

source = []
allprojects.forEach { project ->
    project.plugins.withType(JavaBasePlugin) { plugin ->
        project.sourceSets.main.java.forEach { s -> rootProject.cpdCheck.source(s) }
    }
}

@aaschmid
Copy link
Owner

aaschmid commented Jul 27, 2020

  1. the java plugin is not required for the CPD plugin to work but it prints a warning if you haven't manually add it to one of the childrens check task if you have no JavaBasePlugin on the root project, see e.g. https://github.com/TNG/junit-dataprovider/blob/master/build.gradle.kts#L357

  2. Hm ... I think the too many sourcesets are still caused by the lazy evalutation logic even if you are using the evalationDependsOnChildren(). Would you mind to adjust your reproducer above? I have not that much time at the moment but would try to fix it if it is urgent to you...

  3. Your exception is also odd as the plugins.withType (see here) should not cause errors if no plugin matches. JavaBasePlugin (see here) should have sourceSets already but maybe they changed that java plugin creates the main sourcesets, see here. And you apply java plugin ...

@C-Otto
Copy link
Contributor Author

C-Otto commented Jul 28, 2020

Sorry, I'm confused about the current state. Let's ignore the "global scan", i.e. using CPD to find duplicates between subprojects.

My understanding:
This is a bug in your code, where my explicit configuration is overwritten due to lazy initalisation. The configuration order plays a crucial role here, which is why evaluationDependsOnChildren() works around this issue. Despite my assumptions, the java-test-fixtures plugin does not do anything that contributes to this bug (especially this bug: gradle/gradle#13781).

Is this correct?

@aaschmid
Copy link
Owner

@C-Otto: sorry for the inactivity on this issue but my spare is quite limited at the moment and I honestly higher prioritize other things.

I deeply need to think about the lazy initialization here but still not sure if this is really a bug...

@C-Otto
Copy link
Contributor Author

C-Otto commented Sep 21, 2020

As long as CPD does not stick to the configuration, I'd like to think it's a bug. This could be a bug in the code that reads and applies the configuration, or something in the documentation. Either way, it does not work, and I have no way of making it work.

@aaschmid
Copy link
Owner

I will try to figure out a solution as soon as possible using your reproducer from above. However, you mentioned integrationTest source sets above. Is this use case already part of the reproducer or could you kindly add it?

@C-Otto
Copy link
Contributor Author

C-Otto commented Sep 21, 2020

I believe the issue is related to the existence of several source sets, and the name/purpose of these is not relevant. The reproducer contains two commits, where the initial commit exposes the bug with integration tests.

@aaschmid aaschmid modified the milestones: v3.2, next version Mar 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants