Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better annotate interprocess_exception #3

Merged

Conversation

Dr15Jones
Copy link

The IBs with the external generator process are periodically failing with the unhelpful error message

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
	26077_0 process: caught exception 
	boost::interprocess_exception::library_error 15-Dec-2022 03:59:39 CET
	  while serialize lumi
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

That message happens when boost::interprocess_exception constructor is passed an error_info that is set to 0 AND the constructor is not passed a const char*. I went and modified all the places our code might reach a function that calls boost::interprocess_exception and forces it to also pass a const char* saying why the exception is being thrown.

Hopefully this will help track down the problems we see ever so often.

@cmsbuild
Copy link

A new Pull Request was created by @Dr15Jones (Chris Jones) for branch cms/v1.80.0.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@Dr15Jones
Copy link
Author

@smuzaffar I compiled all CMSSW code that includes boost/interprocess and had the headers redirected to this changed copy of boost.

@@ -310,7 +310,7 @@ class managed_open_or_create_impl
//File existing when trying to create, but non-existing when
//trying to open, and tried MaxCreateOrOpenTries times. Something fishy
//is happening here and we can't solve it
throw interprocess_exception(error_info(corrupted_error));
throw interprocess_exception(error_info(corrupted_error), "file exists when create but not when open");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw interprocess_exception(error_info(corrupted_error), "file exists when create but not when open");
throw interprocess_exception(error_info(corrupted_error), "do_create_else_open: file exists when trying to create, but not when trying to open");

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with the other cases where you mention the function name explicitly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -1,4 +1,4 @@
//////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this could be avoided ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -100,13 +100,13 @@ inline bool semaphore_open
default:
{
error_info err(other_error);
throw interprocess_exception(err);
throw interprocess_exception(err, "semaphore_open unknown type");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw interprocess_exception(err, "semaphore_open unknown type");
throw interprocess_exception(err, "semaphore_open: unknown type");

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
}

//Check for error
if(handle == BOOST_INTERPROCESS_POSIX_SEM_FAILED){
throw interprocess_exception(error_info(errno));
throw interprocess_exception(error_info(errno),"semaphore_open sem_open failed");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw interprocess_exception(error_info(errno),"semaphore_open sem_open failed");
throw interprocess_exception(error_info(errno),"semaphore_open: sem_open failed");

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -148,7 +148,7 @@ inline void semaphore_init(sem_t *handle, unsigned int initialCount)
//In the future, a successful call might be required to return 0.
if(ret == -1){
error_info err = system_error_code();
throw interprocess_exception(err);
throw interprocess_exception(err, "semaphore_unlink sem_init failed");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw interprocess_exception(err, "semaphore_unlink sem_init failed");
throw interprocess_exception(err, "semaphore_init: sem_init failed");

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Dr15Jones Dr15Jones force-pushed the annotate_interprocess_exception branch from 0dcaab1 to 777347c Compare December 15, 2022 20:55
@cmsbuild
Copy link

Pull request #3 was updated.

Avoid case where exception message is just
  "boost::interprocess_exception::library_error"
@Dr15Jones Dr15Jones force-pushed the annotate_interprocess_exception branch from 777347c to fc6a197 Compare December 15, 2022 21:02
@cmsbuild
Copy link

Pull request #3 was updated.

@Dr15Jones
Copy link
Author

One thing of note is from looking at the code, it looks like the message boost::interprocess_exception::library_error was never supposed to be issued and instead the error_info was supposed to give something useful. So this change is just trying to figure out where error_info is not working like it was intended and then to use the extra info to figure out our problem.

@smuzaffar
Copy link

test parameters:

  • full_cmssw = true

@smuzaffar
Copy link

please test

@cmsbuild
Copy link

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3a0f0/29648/summary.html
COMMIT: fc6a197
CMSSW: CMSSW_13_0_X_2022-12-15-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-externals/boost/3/29648/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3a0f0/29648/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c3a0f0/29648/git-merge-result

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 157 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3557521
  • DQMHistoTests: Total failures: 77
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3557422
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Copy link

+externals

@cmsbuild
Copy link

This pull request is fully signed and it will be integrated in one of the next cms/v1.80.0 IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants