Skip to content

Conversation

@soleti
Copy link
Contributor

@soleti soleti commented Apr 13, 2020

Now the code is able to parse the fifth field of a filename in the Mu2e convention and use it as a subrun number.

@FNALbuild
Copy link
Collaborator

Hi @soleti,
You have proposed changes to files in these packages:

  • Sources

which require these tests: build.

FNALbuild is explained here.

unsigned CorsikaBinaryDetail::getSubRunNumber(const std::string& filename) const {
const std::string::size_type islash = filename.find("DAT") ;
const std::string basename = (islash == std::string::npos) ? filename : filename.substr(islash + 3);
const std::string::size_type corsikaConvention = filename.find("DAT");
Copy link
Contributor

@gaponenko gaponenko Apr 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Roberto, I think the current patch will break on an input filename like
/mu2e/data/users/gandr/MYDATA/DAT1001
I suggest to use regular expressions to require a more precise match.
Something like:

#include "regex" // github eats angular brackets!

then in the getSubRun(filename) method:

  std::regex re_corsika("^(.*/)?DAT([0-9]+)$");
  std::regex re_mu2e("^(.*/)?sim\\.\\w+\\.[\\w-]+\\.[\\w-]+\\.([0-9]+)\\.csk$");

  std::smatch match;
  if(std::regex_search(filename, match, re_corsika)) {
    // [0]: the whole string                                                                                                                                                                      
    // [1]: dirname or emtpy                                                                                                                                                                      
    // [2]: the run number string                                                                                                                                                                 
    sr = std::stoi(match.str(2));
  }   
  else if(std::regex_search(filename, match, re_mu2e)) {
    // [0]: the whole string                                                                                                                                                                      
    // [1]: dirname or emtpy                                                                                                                                                                      
    // [2]: the run number string                                                                                                                                                                 
    sr = std::stoi(match.str(2));
  }
  else {
     throw cet::exception("BADINPUT", " FromCorsikaBinary: ")
      << " Can not parse filename to extract subrun number:  "<<filename<<"\n";

  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, one solution however would be to change filename.find("DAT") into filename.find_last_of("DAT"). Personally I prefer to use regexs only when they are absolutely necessary, because I find them unreadable (you basically have to take a leap of faith). I think in this case we could achieve the same result without using them, what do you think?

@gaponenko
Copy link
Contributor

gaponenko commented Apr 13, 2020 via email

@kutschke
Copy link
Contributor

kutschke commented Apr 13, 2020 via email

@gaponenko
Copy link
Contributor

One more thing: can you please remove unused configuration parameters? There is at least one: fhicl::Atom firstSubRunNumber. May be it is the only one.

@soleti
Copy link
Contributor Author

soleti commented Apr 14, 2020

I removed the unused FCL parameters. I think the filename parsing is pretty safe right now, it should raise an exception if the filename does not contain a number in the right place. If the filename does not have neither "DAT" (CORSIKA convention) nor ".csk" (Mu2e convention), then I assign a default value to the subrun of 1.

@brownd1978
Copy link
Collaborator

@FNALbuild run build test

@FNALbuild
Copy link
Collaborator

⌛ The following tests have been triggered for ref 00bc5c0: build

@FNALbuild
Copy link
Collaborator

☀️ The tests passed at ref 00bc5c0. Total build time: 11 min 37 sec.

Test Result
scons build (prof) ✔️
ceSimReco (-n 10) ✔️

For more information, please check here.

@gaponenko
Copy link
Contributor

Roberto, thank you for removing the unused parameters. The filename parsing code will still fail on valid filenames, like /mu2e/data/users/gandr/MYDATA/sim.mu2e.test.v1.1001.csk I did not realize that when this PR was brought up in the meeting today.

@soleti
Copy link
Contributor Author

soleti commented Apr 16, 2020

If you call your folder MYDATA you kind of deserve the job to fail :) but I switched to the regex parsing anyway.

@gaponenko
Copy link
Contributor

If you call your folder MYDATA you kind of deserve the job to fail :) but I switched to the regex parsing anyway.

Matching a pattern in a general string seems easy. Until one thinks of all the corner cases. Regular expressions have almost 70 years of refinements (according to wikipedia they originated in 1951). I am sure it would be possible to handle all cases correctly with simple string searches, but I am not so sure it would be easy to do.

@gaponenko
Copy link
Contributor

@FNALbuild run build test

@gaponenko gaponenko self-requested a review April 16, 2020 06:24
@kutschke
Copy link
Contributor

@FNALbuild run build test

@FNALbuild
Copy link
Collaborator

⌛ The following tests have been triggered for ref fa520ae: build

@FNALbuild
Copy link
Collaborator

☀️ The tests passed at ref fa520ae. Total build time: 11 min 38 sec.

Test Result
scons build (prof) ✔️
ceSimReco (-n 10) ✔️

For more information, please check here.

@kutschke kutschke merged commit 09fc225 into Mu2e:master Apr 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants