Intercepting also filenames with Mu2e convention #174
Hi @soleti,
which require these tests: build. |
unsigned CorsikaBinaryDetail::getSubRunNumber(const std::string& filename) const {
- const std::string::size_type islash = filename.find("DAT") ;
- const std::string basename = (islash == std::string::npos) ? filename : filename.substr(islash + 3);
+ const std::string::size_type corsikaConvention = filename.find("DAT");
Hi Roberto, I think the current patch will break on an input filename like
/mu2e/data/users/gandr/MYDATA/DAT1001
I suggest to use regular expressions to require a more precise match.
Something like:
#include <regex>
then in the getSubRunNumber(filename) method:
std::regex re_corsika("^(.*/)?DAT([0-9]+)$");
std::regex re_mu2e("^(.*/)?sim\\.\\w+\\.[\\w-]+\\.[\\w-]+\\.([0-9]+)\\.csk$");
std::smatch match;
if(std::regex_search(filename, match, re_corsika)) {
// [0]: the whole string
// [1]: dirname or empty
// [2]: the run number string
sr = std::stoi(match.str(2));
}
else if(std::regex_search(filename, match, re_mu2e)) {
// [0]: the whole string
// [1]: dirname or empty
// [2]: the run number string
sr = std::stoi(match.str(2));
}
else {
throw cet::exception("BADINPUT", " FromCorsikaBinary: ")
<< " Cannot parse filename to extract subrun number: " << filename << "\n";
}
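For reference, the suggestion above can be turned into a self-contained sketch. This is only an illustration under stated assumptions: `parseSubRunNumber` is a hypothetical free function standing in for `CorsikaBinaryDetail::getSubRunNumber`, and `std::runtime_error` stands in for `cet::exception` so that the snippet compiles without the art framework. It uses `std::regex_match` rather than `std::regex_search`; with the patterns anchored by `^` and `$` the two should accept the same strings.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for CorsikaBinaryDetail::getSubRunNumber.
// Throws std::runtime_error (assumption; the PR uses cet::exception).
unsigned parseSubRunNumber(const std::string& filename) {
  // CORSIKA convention: optional directory, then DAT followed by digits only.
  static const std::regex re_corsika("^(.*/)?DAT([0-9]+)$");
  // Mu2e convention as written in the review comment:
  // sim.<field>.<field>.<field>.<digits>.csk
  static const std::regex re_mu2e(
      "^(.*/)?sim\\.\\w+\\.[\\w-]+\\.[\\w-]+\\.([0-9]+)\\.csk$");

  std::smatch match;
  if (std::regex_match(filename, match, re_corsika) ||
      std::regex_match(filename, match, re_mu2e)) {
    // In both patterns group 1 is the directory part (possibly empty)
    // and group 2 is the run number string.
    return static_cast<unsigned>(std::stoul(match.str(2)));
  }
  // Neither convention matched: refuse to guess.
  throw std::runtime_error(
      "Cannot parse filename to extract subrun number: " + filename);
}
```

Because the directory part is matched greedily up to the last slash, a directory named MYDATA does not interfere with either pattern.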
You're right. One solution, however, would be to change filename.find("DAT") into filename.find_last_of("DAT"). Personally I prefer to use regexes only when they are absolutely necessary, because I find them unreadable (you basically have to take a leap of faith). I think in this case we could achieve the same result without using them. What do you think?
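A note on the string-search alternative: `std::string::find_last_of("DAT")` looks for the last occurrence of any one of the characters `D`, `A`, `T`, not of the substring; the last occurrence of the substring is what `std::string::rfind("DAT")` returns. The sketch below compares the three searches on the filename from the review comment. The helper names are hypothetical and exist only for this comparison.

```cpp
#include <string>

// Hypothetical helpers: each returns the text that would be handed to the
// number parser if "DAT" were located with the given search and skipped
// with the "+ 3" offset from the patch.
std::string tailAfterFind(const std::string& s) {
  const auto pos = s.find("DAT");          // first occurrence of the substring
  return (pos == std::string::npos) ? std::string() : s.substr(pos + 3);
}

std::string tailAfterRfind(const std::string& s) {
  const auto pos = s.rfind("DAT");         // last occurrence of the substring
  return (pos == std::string::npos) ? std::string() : s.substr(pos + 3);
}

std::string tailAfterFindLastOf(const std::string& s) {
  const auto pos = s.find_last_of("DAT");  // last 'D', 'A' or 'T' character
  if (pos == std::string::npos || pos + 3 > s.size()) return std::string();
  return s.substr(pos + 3);
}
```

For `/mu2e/data/users/gandr/MYDATA/DAT1001`, `find` stops inside `MYDATA` and leaves `A/DAT1001`, `rfind` leaves `1001`, and `find_last_of` lands on the final `T`, so the `+ 3` offset skips two digits and leaves `01`, which would parse silently as the wrong subrun.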
|
Hi Roberto,
On Mon, 13 Apr 2020, Stefano Roberto Soleti wrote:
@soleti commented on this pull request.
> @@ -137,13 +137,32 @@ namespace mu2e {
//----------------------------------------------------------------
unsigned CorsikaBinaryDetail::getSubRunNumber(const std::string& filename) const {
- const std::string::size_type islash = filename.find("DAT") ;
- const std::string basename = (islash == std::string::npos) ? filename : filename.substr(islash + 3);
+ const std::string::size_type corsikaConvention = filename.find("DAT");
> You're right, one solution however would be to change
> `filename.find("DAT")` into `filename.find_last_of("DAT")`. Personally I
> prefer to use regexes only when they are absolutely necessary, because I
> find them unreadable
Maybe I am too used to them :) I like doing the full string match (with
the caret and dollar), like in the example I posted. It pretty much
eliminates surprises.
> (you basically have to take a leap of faith). I think in this case we
> could achieve the same result without using them, what do you think?
Sure, there are multiple ways of doing things. It's your code, so do it
your way.
Best regards,
Andrei
|
|
Hi Roberto,
However you choose to do it, my advice is to be rigorous and expect that people will
not follow advice. If you want people to follow a convention, enforce it. If you encounter
an ambiguous situation, throw an exception rather than guessing what the user intended.
Rob
… On Apr 13, 2020, at 5:22 PM, Stefano Roberto Soleti ***@***.***> wrote:
@soleti commented on this pull request.
In Sources/src/FromCorsikaBinary_source.cc:
> @@ -137,13 +137,32 @@ namespace mu2e {
//----------------------------------------------------------------
unsigned CorsikaBinaryDetail::getSubRunNumber(const std::string& filename) const {
- const std::string::size_type islash = filename.find("DAT") ;
- const std::string basename = (islash == std::string::npos) ? filename : filename.substr(islash + 3);
+ const std::string::size_type corsikaConvention = filename.find("DAT");
|
|
One more thing: can you please remove unused configuration parameters? There is at least one: fhicl::Atom firstSubRunNumber. Maybe it is the only one.
|
I removed the unused FCL parameters. I think the filename parsing is pretty safe right now: it should raise an exception if the filename does not contain a number in the right place. If the filename has neither "DAT" (CORSIKA convention) nor ".csk" (Mu2e convention), then I assign a default value of 1 to the subrun.
|
@FNALbuild run build test |
|
⌛ The following tests have been triggered for ref 00bc5c0: build |
|
Roberto, thank you for removing the unused parameters. The filename parsing code will still fail on valid filenames, like
/mu2e/data/users/gandr/MYDATA/sim.mu2e.test.v1.1001.csk
I did not realize that when this PR was brought up in the meeting today.
|
If you call your folder MYDATA you kind of deserve the job to fail :) but I switched to the regex parsing anyway. |
Matching a pattern in a general string seems easy, until one thinks of all the corner cases. Regular expressions have had almost 70 years of refinement (according to Wikipedia they originated in 1951). I am sure it would be possible to handle all cases correctly with simple string searches, but I am not so sure it would be easy to do.
|
@FNALbuild run build test |
|
@FNALbuild run build test |
|
⌛ The following tests have been triggered for ref fa520ae: build |
Now the code is able to parse the fifth field of a filename in the Mu2e convention and use it as a subrun number.
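To illustrate what "the fifth field" means here, the following is a minimal regex-free sketch, not the code merged in this PR. It assumes the six dot-separated fields implied by the regular expression in the review (`sim`, three further fields, the number, `csk`); `fifthFieldAsSubRun` is a hypothetical name and `std::runtime_error` again stands in for `cet::exception`. Unlike the regex, it does not restrict which characters fields two to four may contain.

```cpp
#include <algorithm>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: extract the fifth dot-separated field of a
// Mu2e-style basename and return it as the subrun number.
unsigned fifthFieldAsSubRun(const std::string& path) {
  // Work on the basename only, so directory names cannot interfere.
  const auto slash = path.rfind('/');
  const std::string base =
      (slash == std::string::npos) ? path : path.substr(slash + 1);

  // Split the basename on '.'.
  std::vector<std::string> fields;
  std::string::size_type start = 0;
  while (true) {
    const auto dot = base.find('.', start);
    if (dot == std::string::npos) {
      fields.push_back(base.substr(start));
      break;
    }
    fields.push_back(base.substr(start, dot - start));
    start = dot + 1;
  }

  // Require exactly six fields, "sim" first, "csk" last, digits in the fifth.
  const bool shapeOk = fields.size() == 6 &&
                       fields.front() == "sim" && fields.back() == "csk";
  const bool digitsOk =
      shapeOk && !fields[4].empty() &&
      std::all_of(fields[4].begin(), fields[4].end(),
                  [](unsigned char c) { return std::isdigit(c) != 0; });
  if (!digitsOk) {
    // Follow the advice in the thread: throw rather than guess a default.
    throw std::runtime_error("Not a Mu2e-style filename: " + path);
  }
  return static_cast<unsigned>(std::stoul(fields[4]));
}
```

The length of this sketch compared with the two-line regex is one way to see the trade-off discussed in the thread.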