No Profile Mode - issues and questions. #375

Open
nishihatapalmer opened this issue Feb 2, 2020 · 3 comments

@nishihatapalmer nishihatapalmer commented Feb 2, 2020

The No Profile mode in the command line project suffers from multiple issues.

  • It doesn't match file extensions in the same way as profile mode.
  • It only outputs the filename and PUID.
  • Its archive file processing support is limited (it usually can't handle nested archives).

Origin
The original idea of no profile mode was to support outputting droid metadata without the overhead of a database. If the results were just going to go to the standard out, or a file, then there was no need to record them in a database as well.

Implementation
A decision was made to re-implement all the core identification and archive handling in the command line project for the no profile mode. This was a short term decision made due to some annoying incompatibilities (filters being one). At the time there was only zip, gz and tar archive handling.

Outcomes
The result of this is that there is a considerable maintenance burden imposed by the no profile mode, as the archive handling must be re-implemented every time a new type is added. In fact, the implementations provided by no profile mode are much less capable than the ones already available in the main droid projects (they can't handle nested archives for the most part).

Identification also suffers, as the logic doesn't match what is available in the rest of DROID, and the results are less useful because very little metadata is output in this mode (just filename and PUID).

Original design
In the original design, the idea of no profile mode was that it should use all the same profile machinery, but just plug in a csv output ResultsHandler (whether to standard out or a file), instead of the database results handler.
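To make the original design concrete, here is a minimal sketch of the idea. All of the names here (`Result`, `ResultSink`, `CsvSink`, `InMemorySink`, `IdentificationRun`) are hypothetical stand-ins and not DROID's actual classes; the point is only that the identification code talks to one interface, and the choice between a database and a CSV stream is made by whoever plugs in the handler.

```java
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical result type; DROID's real metadata has many more fields.
final class Result {
    final String path;
    final String puid;

    Result(String path, String puid) {
        this.path = path;
        this.puid = puid;
    }
}

// Hypothetical handler interface: the only thing the shared machinery sees.
interface ResultSink {
    void handle(Result result);
}

// Stand-in for a database-backed handler: keeps every result.
final class InMemorySink implements ResultSink {
    final List<Result> rows = new ArrayList<>();

    @Override
    public void handle(Result result) {
        rows.add(result);
    }
}

// Streams each result out as a CSV row and stores nothing.
final class CsvSink implements ResultSink {
    private final PrintWriter out;

    CsvSink(PrintWriter out) {
        this.out = out;
    }

    @Override
    public void handle(Result r) {
        out.print(quote(r.path) + "," + quote(r.puid) + "\n");
        out.flush();
    }

    // Always quotes the field and doubles any embedded quote characters.
    private static String quote(String s) {
        return "\"" + s.replace("\"", "\"\"") + "\"";
    }
}

// Stand-in for the shared profile machinery: identical for both sinks.
final class IdentificationRun {
    static void run(List<Result> identified, ResultSink sink) {
        for (Result r : identified) {
            sink.handle(r);
        }
    }
}
```

With this shape, no profile mode would amount to calling `IdentificationRun.run(results, new CsvSink(new PrintWriter(System.out)))`, with no separate identification or archive code in the command line project.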

Questions

  1. Would implementing no profile mode to the original design be of interest? Doing so would massively simplify the command line project, as it would just use the standard droid modules used by the rest of DROID.
  2. Do people depend on the current output (just filename and puid) of no profile mode? This could be preserved, although I suspect most people will want the full metadata most of the time.

Note: I see there is an issue with no-profile-mode filenames containing commas here: #34. This would be fixed by using a CSV output format for droid metadata. The idea would be that no profile mode could output data essentially the same as if we had profiled with a database, and then performed an export. The output could still be given in the current format for backwards compatibility reasons.
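As an illustration of why proper CSV output resolves the comma problem, here is a small sketch of RFC 4180 style field quoting. `CsvField` is a hypothetical helper written for this example, not code from DROID, and the filename and PUID are only sample values.

```java
final class CsvField {

    // Quotes a field only if it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled, as RFC 4180 describes.
    static String escape(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins already-escaped fields into one CSV row (without a line terminator).
    static String row(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields[i]));
        }
        return sb.toString();
    }
}
```

For example, `CsvField.row("report, final.pdf", "fmt/18")` yields `"report, final.pdf",fmt/18`, which a CSV parser reads back as two fields, whereas a plain comma join would produce three.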

@nishihatapalmer nishihatapalmer commented Feb 2, 2020

The only approaches I can see for handling errors that occur after the initial identification are:

  1. Defer recording the identification metadata of the parent archival format until we've successfully processed it. This would lead to archival parents always being recorded after their children in the output.
  2. Allow error reporting entries in the output. These are corrections which should be applied to the status of previous entries.
  3. Ignore errors processing an archival format - the file was still identified as that format, even if there was a problem processing it. The log will still contain evidence of problems processing it.

Note: this issue surfaces when we identify an archival format and record the result, then later encounter an error while processing that archive. The database handler updates the original entry to indicate an error. Obviously, you can't go back and correct a row previously written to standard out or a file.
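A rough sketch of approach 1 (deferring the parent's row) might look like the following. Everything here is an assumption made for illustration: `Node`, `expand`, the `corrupt` flag, and the `OK`/`ERROR` status values are hypothetical, and the rows are joined naively because the sample names contain no commas.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class DeferredParentDemo {

    // Hypothetical in-memory stand-in for a file that may be an archive.
    static final class Node {
        final String name;
        final String puid;
        final boolean corrupt;      // simulates a failure while opening the archive
        final List<Node> children;

        Node(String name, String puid, boolean corrupt, Node... children) {
            this.name = name;
            this.puid = puid;
            this.corrupt = corrupt;
            this.children = Arrays.asList(children);
        }
    }

    // Simulates archive expansion, which can fail even though identification succeeded.
    static List<Node> expand(Node archive) throws IOException {
        if (archive.corrupt) {
            throw new IOException("cannot read " + archive.name);
        }
        return archive.children;
    }

    // Writes all children first, then the parent's row with its final status,
    // so no previously written row ever needs correcting.
    static void write(Node node, List<String> out) {
        String status = "OK";
        try {
            for (Node child : expand(node)) {
                write(child, out);
            }
        } catch (IOException e) {
            status = "ERROR";
        }
        out.add(node.name + "," + node.puid + "," + status);
    }
}
```

The trade-off is visible in the output order: an archive's row always appears after the rows of its contents, so consumers that expect parents first would need to sort or re-link rows afterwards.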

@marhop marhop commented Feb 3, 2020

> Would implementing no profile mode to the original design be of interest?

Yes! It would be great to have consistent results with CLI (no profile) and GUI DROID. See also #224 which I think is related.

@thorsted thorsted commented Feb 3, 2020

I would also be interested in seeing consistency between the two versions. I would first want to know how this would affect our enterprise installation of Rosetta and how it implements DROID. I believe it uses the CLI version, but it has also added a way to set the Max Byte Scan size setting. The change would also allow us to scan more archival formats within the Rosetta preservation system.
