Random PR test comparison failures in DD4Hep workflows #33552

Closed
makortel opened this issue Apr 27, 2021 · 17 comments

Comments

@makortel
Contributor

We are seeing infrequent comparison failures in DD4Hep workflows in tests of unrelated PRs. The purpose of this issue is to track how often they occur and the progress toward finding the cause.

@makortel
Contributor Author

assign geometry

@cmsbuild
Contributor

New categories assigned: geometry

@Dr15Jones,@cvuosalo,@civanch,@ianna,@mdhildreth,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Contributor

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor Author

One occurrence is here #33526 (comment)

@makortel
Contributor Author

Another one #33534 (comment)

@makortel
Contributor Author

Here is another #33549 (comment)

@slava77
Contributor

slava77 commented Apr 28, 2021

Is there a way to track which machine was used in each case?
This is in case there is some architecture dependence (I am reminded of the existing differences in reco/miniAOD caused by, e.g., TensorFlow's MKL dependency).
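
One way to check for such a dependence, sketched here under assumptions: if each PR test result were recorded together with the host that ran it, the failure rate could be tabulated per machine. The record format, the host names, and the function name `failure_rate_by_host` are all hypothetical; nothing in this thread says the test infrastructure exposes the data in this form.

```python
from collections import Counter

def failure_rate_by_host(records):
    """Return {host: fraction of failed comparisons} for (host, passed) records.

    `records` is a hypothetical list of (hostname, passed) tuples; it is an
    assumption of this sketch, not a format provided by the CMS test system.
    """
    totals = Counter()
    failures = Counter()
    for host, passed in records:
        totals[host] += 1
        if not passed:
            failures[host] += 1
    return {host: failures[host] / totals[host] for host in totals}

# Made-up example data, for illustration only.
example_records = [
    ("node-a", True), ("node-a", True), ("node-a", False), ("node-a", True),
    ("node-b", True), ("node-b", True),
]
rates = failure_rate_by_host(example_records)
```

A rate that is clearly higher on one class of machines would point toward an architecture dependence. Similar rates everywhere would suggest the cause lies elsewhere.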

@VinInn
Contributor

VinInn commented Apr 28, 2021

Did anybody run valgrind on those DD4HEP wf?

@mrodozov
Contributor

I asked the same thing last week on two PRs for externals:
cms-sw/cmsdist#6822 (comment)
cms-sw/cmsdist#6787 (comment)

@makortel
Contributor Author

Here is another #33550 (comment)

@cvuosalo
Contributor

This workflow was stable for the last few weeks, but now it is unstable again. I think some recent change either re-activated an old bug or introduced a new stability problem. I will submit a PR to remove it from the PR tests for now.

@cvuosalo
Contributor

Looking through the recent history of merged PRs, I think the recent ROOT update might be a suspect for triggering the instability:

cms-sw/cmsdist#6835 ROOT change (April 23)
cms-sw/cmsdist#6826 DD4hep change (April 21)

That said, the instability seems to be around the 10% level. Many PR tests pass successfully with this workflow.
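
To put the roughly 10% figure in perspective, here is a minimal sketch of how many repeated runs it would take to reproduce such a failure. It assumes each run fails independently with the same probability, which is an idealization and not something established in this thread. The function names are illustrative.

```python
import math

def prob_at_least_one_failure(p_fail: float, n_runs: int) -> float:
    """Probability of seeing at least one failure in n independent runs."""
    return 1.0 - (1.0 - p_fail) ** n_runs

def runs_needed(p_fail: float, confidence: float) -> int:
    """Smallest n such that at least one failure shows up with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))

# Assumed per-run failure probability, taken from the ~10% estimate above.
p = 0.10
n_for_95 = runs_needed(p, 0.95)                 # 29 runs
p_in_10 = prob_at_least_one_failure(p, 10)      # about 0.65
```

Under these assumptions, about 29 clean runs in a row would be needed before concluding with 95% confidence that a candidate fix worked. Ten runs would still miss the failure roughly a third of the time.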

@cvuosalo
Contributor

@VinInn
> Did anybody run valgrind on those DD4HEP wf?

Yes, I ran it back on March 5: #32963 (comment)
There have been changes to DD4hep and ROOT since then.

@cvuosalo
Contributor

PR #33568 submitted to remove the DD4hep workflow from the PR tests.

@cvuosalo
Contributor

cvuosalo commented Jun 7, 2021

+1

@cmsbuild
Contributor

cmsbuild commented Jun 7, 2021

This issue is fully signed and ready to be closed.

@cvuosalo
Contributor

cvuosalo commented Jun 7, 2021

The 11634.911 DD4hep workflow is now stable. See #34003. This issue can be closed.

@makortel closed this as completed Jun 7, 2021