-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #2844 VAPOR fails to reliably identify file formats of files supplied on the command line #2970
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On OSX, my macbook hangs when loading the CF file 24Maycontrol.04001.000000.nc via the command line.
This file can be found on glade at /glade/p/cisl/vast/vapor/data/Source/CM1/Orf/24Maycontrol/24Maycontrol.04000.000000.nc
It appears that dataset is causing the BOV reader to hang. |
@sgpearse , @StasJ , are we going to address this bug in this PR, or open a separate issue? |
@clyne I can write a ticket so we can move forward, but I believe that the BOV reader is the last entry to be tested. Shouldn't the CF reader prevent the BOV from being triggered? |
No, it should not. Per the conversation above, all of the readers are tested and if multiple readers are able to open the same file than auto-determination of the file format is not possible. Ideally, the BOV reader should not be hanging when confronted with an invalid file. |
In that case the current solution makes loading any dataset from the command line hang. Accepting this as-is would send us backwards so I think we should fix the problem before accepting. The BOV reader scans its input file line-by-line for tokens until it reaches the end of the file. I'm not sure what a good strategy is for it detecting invalid files without reading the input file. One option is to remove the BOV reader check, but I often use this feature for BOV and don't want to lose the option. |
BOV files have to be ASCII, correct? One way is to read the first n (some reasonable number) bytes and see if they are all ASCII. @StasJ can probably give guidance on testing a byte for ASCII-ness |
I haven't looked deeply into the problem yet, and I'm all ears for suggestions. Rregarding ASCII, the characters could be UTF-8/16 or something else that ifstream handles. |
You might also just put a limit on how much data your parser reads, and terminate it after X number of bytes. Since are no magic numbers for these files, voodoo guesswork is all you can do. |
@clyne and @StasJ, I added a rule to the BOV reader to reject headers < 1MB in size. They shouldn't ever be larger than a couple KB. However I'm unable to get wrf, ugrid, bov, or ROMS files loaded through the command line. Leigh's MayControl files seem to work. I feel like this PR would move us backwards by disabling a frequently used tool (at least among developers). Thoughts? |
@StasJ what are your thoughts on simply adding a '-ftype' option so the user can specify the format without ambiguity and we can be done with this? |
That makes sense. I believe at our meeting we decided such functionality would be added in a future PR which is why it wasn’t included here. I can add it to this PR. |
lib/vdc/BOVCollection.cpp
Outdated
if (getFileSize(paths[i]) > 1000) { | ||
SetErrMsg(("BOV header file larger than 1MB. This text file shouldn't need to be larger than a few KB." + paths[0]).c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant 1KB than 1MB here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed - I'm checking for > 1MB. Thanks sam.
Fix #2844