Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #2844 VAPOR fails to reliably identify file formats of files supplied on the command line #2970

Merged
merged 9 commits into from
Jan 19, 2022

Conversation

StasJ
Copy link
Collaborator

@StasJ StasJ commented Jan 12, 2022

Fix #2844

@clyne clyne requested review from clyne and sgpearse January 12, 2022 18:54
Copy link
Collaborator

@sgpearse sgpearse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On OSX, my macbook hangs when loading the CF file 24Maycontrol.04001.000000.nc via the command line.

This file can be found on glade at /glade/p/cisl/vast/vapor/data/Source/CM1/Orf/24Maycontrol/24Maycontrol.04000.000000.nc

@StasJ
Copy link
Collaborator Author

StasJ commented Jan 13, 2022

On OSX, my macbook hangs when loading the CF file 24Maycontrol.04001.000000.nc via the command line.

This file can be found on glade at /glade/p/cisl/vast/vapor/data/Source/CM1/Orf/24Maycontrol/24Maycontrol.04000.000000.nc

It appears that dataset is causing the BOV reader to hang.

@clyne
Copy link
Collaborator

clyne commented Jan 13, 2022

On OSX, my macbook hangs when loading the CF file 24Maycontrol.04001.000000.nc via the command line.
This file can be found on glade at /glade/p/cisl/vast/vapor/data/Source/CM1/Orf/24Maycontrol/24Maycontrol.04000.000000.nc

It appears that dataset is causing the BOV reader to hang.

@sgpearse , @StasJ , are we going to address this bug in this PR, or open a separate issue?

@sgpearse
Copy link
Collaborator

@clyne I can write a ticket so we can move forward, but I believe that the BOV reader is the last entry to be tested. Shouldn't the CF reader prevent the BOV from being triggered?

@clyne
Copy link
Collaborator

clyne commented Jan 13, 2022

@clyne I can write a ticket so we can move forward, but I believe that the BOV reader is the last entry to be tested. Shouldn't the CF reader prevent the BOV from being triggered?

No, it should not. Per the conversation above, all of the readers are tested and if multiple readers are able to open the same file than auto-determination of the file format is not possible. Ideally, the BOV reader should not be hanging when confronted with an invalid file.

@sgpearse sgpearse self-requested a review January 13, 2022 18:14
@sgpearse
Copy link
Collaborator

In that case the current solution makes loading any dataset from the command line hang. Accepting this as-is would send us backwards so I think we should fix the problem before accepting.

The BOV reader scans its input file line-by-line for tokens until it reaches the end of the file. I'm not sure what a good strategy is for it detecting invalid files without reading the input file. One option is to remove the BOV reader check, but I often use this feature for BOV and don't want to lose the option.

@clyne
Copy link
Collaborator

clyne commented Jan 13, 2022

In that case the current solution makes loading any dataset from the command line hang. Accepting this as-is would send us backwards so I think we should fix the problem before accepting.

The BOV reader scans its input file line-by-line for tokens until it reaches the end of the file. I'm not sure what a good strategy is for it detecting invalid files without reading the input file. One option is to remove the BOV reader check, but I often use this feature for BOV and don't want to lose the option.

BOV files have to be ASCII, correct? One way is to read the first n (some reasonable number) bytes and see if they are all ASCII. @StasJ can probably give guidance on testing a byte for ASCII-ness

@sgpearse
Copy link
Collaborator

sgpearse commented Jan 13, 2022

I haven't looked deeply into the problem yet, and I'm all ears for suggestions. Rregarding ASCII, the characters could be UTF-8/16 or something else that ifstream handles.

@clyne
Copy link
Collaborator

clyne commented Jan 14, 2022

You might also just put a limit on how much data your parser reads, and terminate it after X number of bytes. Since are no magic numbers for these files, voodoo guesswork is all you can do.

@sgpearse
Copy link
Collaborator

@clyne and @StasJ, I added a rule to the BOV reader to reject headers < 1MB in size. They shouldn't ever be larger than a couple KB.

However I'm unable to get wrf, ugrid, bov, or ROMS files loaded through the command line. Leigh's MayControl files seem to work. I feel like this PR would move us backwards by disabling a frequently used tool (at least among developers). Thoughts?

@clyne
Copy link
Collaborator

clyne commented Jan 14, 2022

@clyne and @StasJ, I added a rule to the BOV reader to reject headers < 1MB in size. They shouldn't ever be larger than a couple KB.

However I'm unable to get wrf, ugrid, bov, or ROMS files loaded through the command line. Leigh's MayControl files seem to work. I feel like this PR would move us backwards by disabling a frequently used tool (at least among developers). Thoughts?

@StasJ what are your thoughts on simply adding a '-ftype' option so the user can specify the format without ambiguity and we can be done with this?

@StasJ
Copy link
Collaborator Author

StasJ commented Jan 14, 2022

@clyne and @StasJ, I added a rule to the BOV reader to reject headers < 1MB in size. They shouldn't ever be larger than a couple KB.
However I'm unable to get wrf, ugrid, bov, or ROMS files loaded through the command line. Leigh's MayControl files seem to work. I feel like this PR would move us backwards by disabling a frequently used tool (at least among developers). Thoughts?

@StasJ what are your thoughts on simply adding a '-ftype' option so the user can specify the format without ambiguity and we can be done with this?

That makes sense. I believe at our meeting we decided such functionality would be added in a future PR which is why it wasn’t included here. I can add it to this PR.

Comment on lines 118 to 119
if (getFileSize(paths[i]) > 1000) {
SetErrMsg(("BOV header file larger than 1MB. This text file shouldn't need to be larger than a few KB." + paths[0]).c_str());
Copy link
Collaborator

@shaomeng shaomeng Jan 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant 1KB than 1MB here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @shaomeng. Can you confirm the size here, @sgpearse, and then I'll accept.

Copy link
Collaborator

@sgpearse sgpearse Jan 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed - I'm checking for > 1MB. Thanks sam.

@clyne clyne merged commit dd7ac4e into main Jan 19, 2022
@clyne clyne deleted the issue2844 branch January 19, 2022 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VAPOR fails to reliably identify file formats of files supplied on the command line
4 participants