VAPOR fails to reliably identify file formats of files supplied on the command line #2844

Closed
clyne opened this issue Aug 4, 2021 · 2 comments · Fixed by #2970

@clyne
Collaborator

clyne commented Aug 4, 2021

VAPOR can't always automatically determine a data file's type (format) when the file is supplied on the command line. Moreover, the current implementation can lead to misleading error messages (see #2842).

The current file format detection algorithm works as follows: for each supported file type, attempt to initialize the DC class for that format using the derived DC class's Initialize() method. If DC::Initialize() succeeds, assume that the correct DC class (and thus the format) has been identified.
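The first-match behavior described above can be sketched roughly as follows. This is a minimal illustration, not VAPOR's actual code: `InitFn`, `ReaderList`, and `DetectFirstMatch` are hypothetical names, and each `InitFn` merely stands in for a derived DC class's Initialize() call.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a derived DC class's Initialize() call:
// returns true if that reader accepts the file.
using InitFn = std::function<bool(const std::string &path)>;

// Ordered list of (format name, initializer) pairs. Order matters because
// the first reader whose initializer succeeds is taken as the answer.
using ReaderList = std::vector<std::pair<std::string, InitFn>>;

// First-match detection: try each reader in turn and stop at the first success.
// Note that any failure reason reported by an initializer is lost here.
std::optional<std::string> DetectFirstMatch(const ReaderList &readers, const std::string &path)
{
    for (const auto &[format, init] : readers) {
        if (init(path)) return format;
    }
    return std::nullopt;    // no reader accepted the file
}
```

With a permissive reader listed first, a file is attributed to that reader even when a more specific one would also accept it, which is why the result depends on the order of the list.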

A couple of problems result from this approach:

  1. Some DC classes will successfully open (initialize) files that are of a different format. For example, the DCCF class will successfully initialize itself with some WRF files. The workaround for this is to test WRF files before CF files. But this problem will only grow as more formats are added.
  2. If a file is corrupt or invalid in some way, the DC::Initialize() method will fail and thus the file will not be correctly identified. Moreover, the user will only be informed that the file type could not be identified, not why the initialization failed.

Fixing this problem is tricky. Some of the file formats don't have 'magic numbers' that would enable a reliable file detection method. Thus it might never be possible to have a foolproof method. However, adding a DC::FileDetect() method that looks for specific items expected to be found in a file of a given type could improve the situation.
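As a rough sketch of what such a content-based check might look like: the code below assumes the file's global attributes have already been read into a map, and it treats a `TITLE` attribute containing "OUTPUT FROM WRF" and a `Conventions` attribute containing "CF-" as the distinguishing items. The `Attributes` type, the function names, and the choice of markers are all illustrative assumptions, not the proposed DC::FileDetect() interface.

```cpp
#include <map>
#include <string>

// Hypothetical summary of a file's metadata, e.g. global attributes that
// were already read from the file header by some other code.
using Attributes = std::map<std::string, std::string>;

// Assumed marker: WRF output carries a TITLE attribute mentioning WRF.
bool LooksLikeWRF(const Attributes &attrs)
{
    auto it = attrs.find("TITLE");
    return it != attrs.end() && it->second.find("OUTPUT FROM WRF") != std::string::npos;
}

// Assumed marker: CF files declare a CF version in the Conventions attribute.
bool LooksLikeCF(const Attributes &attrs)
{
    auto it = attrs.find("Conventions");
    return it != attrs.end() && it->second.find("CF-") != std::string::npos;
}
```

Checks like these inspect only what a format is expected to contain, so they can stay cheap and independent of whether the full DC::Initialize() would succeed on the file.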

@sgpearse
Collaborator

Possible solution: remove auto file detection and add a consistent command line flag for the file type, along the lines of what's already in VAPOR (some tools use -ftype, others use -filetype, etc.).

Another option: rather than removing auto-detect, find when there's an ambiguity in data files and throw an error. @shaomeng and @clyne may have more feedback.
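A minimal sketch of how the two options could combine, assuming each format exposes some detection predicate: an explicitly supplied type bypasses detection, and otherwise every detector is tried so that zero or multiple matches become distinct errors. `DetectFn`, `DetectorList`, and `ResolveFormat` are hypothetical names, and the error messages are illustrative only.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-format detection predicate.
using DetectFn = std::function<bool(const std::string &path)>;
using DetectorList = std::vector<std::pair<std::string, DetectFn>>;

// Resolve the format of `path`. A non-empty explicitType (e.g. taken from a
// command line flag) wins outright; otherwise all detectors are consulted.
std::string ResolveFormat(const DetectorList &detectors, const std::string &path,
                          const std::string &explicitType = "")
{
    if (!explicitType.empty()) return explicitType;

    // Collect every format that claims the file instead of stopping at the first.
    std::vector<std::string> matches;
    for (const auto &[format, detect] : detectors) {
        if (detect(path)) matches.push_back(format);
    }

    if (matches.empty()) throw std::runtime_error("Unable to identify file type of " + path);

    if (matches.size() > 1) {
        std::string msg = "Ambiguous file type for " + path + "; candidates:";
        for (const auto &m : matches) msg += " " + m;
        throw std::runtime_error(msg);
    }
    return matches.front();
}
```

Because all detectors run, the outcome no longer depends on the order in which formats are tested, and the ambiguity error can tell the user which types to choose between.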

@sgpearse sgpearse added this to the 3_6_0 Release milestone Nov 9, 2021
@clyne
Collaborator Author

clyne commented Nov 9, 2021

We discussed this and think option (2) above (...when there's an ambiguity in data...) is a reasonable path forward.

StasJ added a commit that referenced this issue Jan 12, 2022
clyne pushed a commit that referenced this issue Jan 19, 2022
…lied on the command line (#2970)

* Fix #2844

* clang-format pre-push hook

* Add BOV logic to reject header files > 1MB in size

* clang-format

* Add -ftype option

* clang-format pre-push hook

* fix typo

Co-authored-by: Scott Pearse <pearse@ucar.edu>