Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can the file manager API also return a hash? #849

Open
iandouglas opened this issue May 1, 2024 · 7 comments
Open

Can the file manager API also return a hash? #849

iandouglas opened this issue May 1, 2024 · 7 comments
Labels
enhancement New feature or request

Comments

@iandouglas
Copy link

Is your feature request related to a problem? Please describe

In discussion with folks involved in the "FDM Monster" print farm software, they mentioned that when syncing a file over the network, OctoPrint will return a filename, file size, and a hash of the file, which can be used as a comparison for whether to re-copy a file over the network or not. They mentioned that Moonraker's API only returns the filename and file size, without a hash.

Describe the solution you'd like

Extend the file manager API to include a hash for any files returned in the response.

Describe alternatives you've considered

None at the moment.

Additional information

No response

@Arksine
Copy link
Owner

Arksine commented May 2, 2024

The upload endpoint accepts an optional SHA256 checksum in the form. The frontend should pre-calculate the hash and add it to the form. If the checksum doesn't match Moonraker will return a 422 error. This flow is generally superior over the requested flow (return a checksum after processsing), as prevents a file with a mismatched checksum from being written to the destination.

@davidzwa
Copy link

davidzwa commented May 2, 2024

Hi,

The original question above was not intended to implement upload error prevention, but instead as a mechanism to prevent having to upload a file which already exists on Moonraker.

The idea is to optimize having to send 400MB files. The OP originally asked in DM about whether an approach like rsync was possible, and I had to fall back to API based approach. In that case knowing a checksum can prevent the client (the FDM Monster server in this case) to upload a big file (if its not new by only 1 bit) and then proceed immediately to print it.

In the proposed situation the file has to be uploaded (with checksum multipart field) after which Moonraker will (most likely) never throw an error as the new file is probably fine. Therefore the (large) upload is not prevented and no optimization is achieved.

Our feature request might have been misinterpreted a bit ;)

@davidzwa
Copy link

davidzwa commented May 4, 2024

@iandouglas based on our latest conversation and change in requirements on my part, shall we close this issue?

@Arksine
Copy link
Owner

Arksine commented May 4, 2024

It looks like I did misinterpret the original request. While it sounds like you have reached an alternative solution, I thought it would be worth providing feedback on why I don't believe it would be a good idea to modify the return values for existing endpoints:

  1. Moonraker attempts to avoid processing complete gcode files. There are exceptions (outlined below), generally speaking Moonraker attempts to read out a minimal part of the file in an effort to extract metadata as fast as possible.
  2. Presuming Moonraker calculated a checksum for every gcode file, caching the checksum isn't necessarily reliable. A file could be modified after upload.
  3. As mentioned above, Moonraker does process complete uploads in two scenarios: UFP uploads and when "object processing" is enabled and detected as necessary. Checksums aren't useful in either of these scenarios, as the the result is guaranteed to be different from the original file. That said, popular slicers now support native Klipper objects, so the latter scenario is less likely to occur.

FWIW, I would be willing to consider a new endpoint that calculates a checksum for a file on request. This way we avoid the major pitfalls outlined above.

@davidzwa
Copy link

davidzwa commented May 4, 2024

  1. Moonraker attempts to avoid processing complete gcode files. There are exceptions (outlined below), generally speaking Moonraker attempts to read out a minimal part of the file in an effort to extract metadata as fast as possible.

And I completely agree with this from your point of view. I would do the same.

  1. Presuming Moonraker calculated a checksum for every gcode file, caching the checksum isn't necessarily reliable. A file could be modified after upload.

Agreed. If a file is changed on the root directory attached to Moonraker, that's up to the user to figure out (its simply out of my control). I hope to document this so the user is aware from the perspective of FDM Monster.

  1. As mentioned above, Moonraker does process complete uploads in two scenarios: UFP uploads and when "object processing" is enabled and detected as necessary. Checksums aren't useful in either of these scenarios, as the the result is guaranteed to be different from the original file. That said, popular slicers now support native Klipper objects, so the latter scenario is less likely to occur.

I see. Interesting to note.

FWIW, I would be willing to consider a new endpoint that calculates a checksum for a file on request. This way we avoid the major pitfalls outlined above.

It would be a nice thing. But let's get this question answered first:

One more question: is the checksum calculation done by POST /server/files/upload a heavy operation CPU wise (for the Moonraker server/host)? Or is this done in a streaming fashion while the upload is being processed (chunk by chunk) and therefore not of big impact?
I want to be able to advise the users of FDM Monster appropriately whether checksums are wise for large files (400MB+)

@Arksine
Copy link
Owner

Arksine commented May 5, 2024

One more question: is the checksum calculation done by POST /server/files/upload a heavy operation CPU wise (for the Moonraker server/host)? Or is this done in a streaming fashion while the upload is being processed (chunk by chunk) and therefore not of big impact?

The checksum is calculated in a streaming fashion as the file is received. Its always calculated, regardless of whether or not the form includes the checksum argument, so there is no additional overhead. The checksum is calculated before the file is passed off to the metadata processor, so there it should always match the checksum calculated by the frontend before submitting the upload request.

@davidzwa
Copy link

davidzwa commented May 5, 2024

Perfect! I suggest that we keep this issue open with a new title "Add API endpoint to recalculate file hash" and that it gets low priority. For my project at this point the feature is an edge case, but it might become important after the next half year when moonraker is fully supported and out of beta in FDM Monster.

If things change on my end I will let you know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants