Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coders/pnm: Handle write failures #2081

Closed

Conversation

MerlijnWajer
Copy link

@MerlijnWajer MerlijnWajer commented May 31, 2020

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practices as demonstrated in the repository.

Description

When using convert, coders/pnm fails to detect is the disk is full, and instead return exit code 0. The Write* calls throughout the file do not check the actual amount of bytes that are written, leading to silent truncation errors. Additionally, the actual return line at the end of the function also unconditionally returns true, even if it has just detected that the processing went wrong.

Further description, including how to reproduce, can be found in the commit messages, but here's a quick summary:

    Evident by this sequence:

        $ ls -lsh /tmp/0009-1-small.png
        20M -rw-r--r-- 1 merlijn merlijn 20M May 25 23:59 /tmp/0009-1-small.png
        $ sudo mkdir /mnt/test
        $ sudo mount -t tmpfs none -o size=1G /mnt/test/
        $ df -h /mnt/test/
        Filesystem      Size  Used Avail Use% Mounted on
        none            5.0M     0  5.0M   0% /mnt/test
        $ convert /tmp/0009-1-small.png /mnt/test/0009.pnm
        $ echo $?
        0
        $ ls -lsh /mnt/test/0009.pnm
        5.0M -rw-r--r-- 1 merlijn merlijn 5.0M May 31 11:42 /mnt/test/0009.pnm

    Failure is evident with strace:

        write(5, "\373\372\367\373\371\367\372\370\365\372\370\364\372\370\364\372\371\366\372\367\363\363\355\346\351\344\327\370\363\356\372\370"..., 8192) = -1 ENOSPC (No space left on device)

Please let me know if you are OK with this assessment and approach and I will fix up the rest of coders/pnm.c in this pull request, and briefly scan some of the others coders as well.

@MerlijnWajer
Copy link
Author

From a quick search, this seems to be a fairly common error:

$ grep -r WriteBlob coders/* | grep '\(void\)' | sed 's/:.*//' | sort | uniq
coders/aai.c
coders/art.c
coders/avs.c
coders/bmp.c
coders/braille.c
coders/cals.c
coders/cip.c
coders/dds.c
coders/debug.c
coders/dib.c
coders/ept.c
coders/fits.c
coders/gif.c
coders/hdr.c
coders/heic.c
coders/html.c
coders/icon.c
coders/info.c
coders/inline.c
coders/ipl.c
coders/jbig.c
coders/json.c
coders/magick.c
coders/map.c
coders/mat.c
coders/meta.c
coders/miff.c
coders/mono.c
coders/mpc.c
coders/mtv.c
coders/mvg.c
coders/otb.c
coders/palm.c
coders/pcd.c
coders/pcl.c
coders/pcx.c
coders/pdb.c
coders/pdf.c
coders/pgx.c
coders/pict.c
coders/png.c
coders/pnm.c
coders/ps2.c
coders/ps3.c
coders/ps.c
coders/psd.c
coders/rgf.c
coders/sgi.c
coders/sun.c
coders/svg.c
coders/tga.c
coders/txt.c
coders/uil.c
coders/uyvy.c
coders/vicar.c
coders/viff.c
coders/vips.c
coders/wbmp.c
coders/webp.c
coders/xbm.c
coders/xpm.c
coders/xwd.c
coders/yuv.c

@MerlijnWajer
Copy link
Author

Confirmed the bmp writer has the same problem.

I can try to create some resuable macros, since most coders/*.c follow the same pattern, and then we can probably fix up the ~1700 bad write calls mostly automatically.

What do you think?

Something like this, but better, of course:

+#define MyWriteBlob(img, len, dat) \
+    {   \
+        ssize_t __ret_size = WriteBlob(img, len, dat); \
+        if (__ret_size != (ssize_t)len) { \
+            perror("WriteBlob failed"); \
+            fprintf(stderr, "WriteBlob* failed in %s at %d\n", __FILE__, __LINE__); \
+            status = MagickFalse; \
+            break; \
+        } \
+    }

@MerlijnWajer
Copy link
Author

Example output would then be:

$ ./utilities/magick convert /tmp/0009-1-small.png /mnt/test/test.bmp
WriteBlob failed: No space left on device
WriteBlob* failed in coders/bmp.c at 2469

@dlemstra
Copy link
Member

It seems that we ignore the return value in a lot of places. But we never use perror or print to stderr directly from our coders. We "throw" exceptions instead (see exception-private.h) and print those at a later point. But I don't think that we should raise an exception for all these cases. My first thought would be to have a macro that does this:

#define NewWriteStringMacro(buffer) 
{
  size_t
    write_length;

  write_length=strlen(string);
  if (WriteBlobStream(image,write_length,(const unsigned char *) buffer) != (ssize_t) write_length)
    status=MagickFalse;
}

And then check status and act upon it depending of the situation. But I am not sure yet if that would be the correct approach.

@MerlijnWajer
Copy link
Author

Thank you for your response. I'm toying with some macros right now and will convert pnm and bmp coders with those, and perhaps we can evaluate how that looks/works.

Understood, regarding perror.

@dlemstra
Copy link
Member

It might be wise to limit your actions to the pnm coder first and fix other coders in other pull requests. Also not yet sure how we would like to resolve this. Will need some time to think about this but feel free to continue with the code in pnm and do a purposal.

p.s. We have strict code style rules. Could you please take a look at the style of the current code and take the style in account in you purposal?

@dlemstra
Copy link
Member

dlemstra commented May 31, 2020

We would prefer to not use a macro in this situation and solve it inline instead. Similar to what you have done now.

p.s. It looks like you could re-use count instead of creating a new variable.

@MerlijnWajer
Copy link
Author

Without a macro, code like this might get quite convoluted:

https://github.com/ImageMagick/ImageMagick/blob/master/coders/bmp.c#L2289

https://github.com/ImageMagick/ImageMagick/blob/master/coders/bmp.c#L2345

I'll push an updated version for pnm and bmp to this PR (no need to merge it as is, seems like there's useful discussion going on), and then please let me know what you think makes most sense.

@dlemstra
Copy link
Member

Your comment about the bmp coder gave me another idea. Would it not be sufficient enough to only check the last write? If there is no more diskspace all of them will fail.

TODO:

- Get rid of perror()
- Pick proper function name for each macro (not literal WriteBlob*)
@MerlijnWajer
Copy link
Author

Updated the current branch (not ready for merging at all)

Couple notes:

  • Still using perror, not because I want it there, but because I haven't figured out the exception handling you mentioned yet.
  • the macro doesn't work for if ... else statements without { and }, I'm trying to figure it out, per
    https://stackoverflow.com/questions/55933541/else-without-previous-if-error-when-defining-macro-with-arguments - but can't use do/while because of the break; in there.
  • In general the approach is relatively fragile and assumes that all coders have the same setup (with a variable named status defined, in a loop, etc)
  • I've tried to stick to the right whitespace, but I think your remark about code standards was about more than that, so I'll have to delve into that.

For code that does actually check the return values, the macro cannot be used, obviously.

@MerlijnWajer
Copy link
Author

Your comment about the bmp coder gave me another idea. Would it not be sufficient enough to only check the last write? If there is no more diskspace all of them will fail.

How would you handle:

  • Disk space becoming available while you're writing?
  • if statements that bail out earlier (e.g. how do you determine what is really the "last write call")?

@urban-warrior
Copy link
Member

Here is the recommended practice for addressing detecting I/O problems:

          count=WriteBlob(image,extent,pixels);
          if (count != (ssize_t) extent)
            {
              ThrowFileException(exception,FileOpenError,"UnableToWriteFile",
                image->filename);
              break;
            }

@MerlijnWajer
Copy link
Author

Here is the recommended practice for addressing detecting I/O problems:

          count=WriteBlob(image,extent,pixels);
          if (count != (ssize_t) extent)
            {
              ThrowFileException(exception,FileOpenError,"UnableToWriteFile",
                image->filename);
              break;
            }

Ok, that works:

convert: UnableToWriteFile '/mnt/test/test.bmp': No space left on device @ error/bmp.c/WriteBMPImage/2461.

Will add that to the macro instead of perror and fprintf.

@dlemstra
Copy link
Member

We would prefer you to not use a macro to cover the whole situation. There are just too many differences between the coders. And t is possible that you will need to clean up memory before you do a break or return.

And to answer your question. You could do this:

status=CheckWriteBlobLSBLong(image,bmp_info.image_size);
status&=CheckWriteBlobLSBLong(image,bmp_info.x_pixels);
status&=CheckWriteBlobLSBLong(image,bmp_info.y_pixels);
status&=CheckWriteBlobLSBLong(image,bmp_info.number_colors);
status&=CheckWriteBlobLSBLong(image,bmp_info.colors_important);
if (status == MagickFalse)
  {
    ThrowFileException(exception,FileOpenError,"UnableToWriteFile",
      image->filename);
    // Do things
  }

And CheckWriteBlobLSBLong could be an "inline" method in the private header that checks the result. And yes this is another idea on how this could be solved.

@MerlijnWajer
Copy link
Author

MerlijnWajer commented May 31, 2020

We would prefer you to not use a macro to cover the whole situation. There are just too many differences between the coders. And t is possible that you will need to clean up memory before you do a break or return.

And to answer your question. You could do this:

status=CheckWriteBlobLSBLong(image,bmp_info.image_size);
status&=CheckWriteBlobLSBLong(image,bmp_info.x_pixels);
status&=CheckWriteBlobLSBLong(image,bmp_info.y_pixels);
status&=CheckWriteBlobLSBLong(image,bmp_info.number_colors);
status&=CheckWriteBlobLSBLong(image,bmp_info.colors_important);
if (status == MagickFalse)
  {
    ThrowFileException(exception,FileOpenError,"UnableToWriteFile",
      image->filename);
    // Do things
  }

And CheckWriteBlobLSBLong could be an "inline" method in the private header that checks the result. And yes this is another idea on how this could be solved.

This looks like a good approach me.

EDIT: Actually, no, this has problems, since errno will get overriden by subsequent calls.

@urban-warrior
Copy link
Member

Its also not necessary to check every write. Recommended practice is to have the single check in the loop when pixels are written. That will capture any I/O errors that occur.

MerlijnWajer and others added 6 commits May 31, 2020 16:03
Previously the code would always just return MagickTrue, even if it had
just inferred that the processing failed. This is required for more
elegant error handling throughout the function, such as checks on the
WriteBlob() calls, where the return value is currently being ignored.

Upon WriteBlob errors, we can then set the status to MagickFalse, and
break.
In WritePNMImage, when converting to a PNM image, the result of the
WriteBlob calls is not checked, and instead ignored explicitly by
'(void)' in front of it. This means that ImageMagick return success on
images that get truncated when write(2) fails with ENOSPC.

Evident by this sequence:

    $ ls -lsh /tmp/0009-1-small.png
    20M -rw-r--r-- 1 merlijn merlijn 20M May 25 23:59 /tmp/0009-1-small.png
    $ sudo mkdir /mnt/test
    $ sudo mount -t tmpfs none -o size=1G /mnt/test/
    $ df -h /mnt/test/
    Filesystem      Size  Used Avail Use% Mounted on
    none            5.0M     0  5.0M   0% /mnt/test
    $ convert /tmp/0009-1-small.png /mnt/test/0009.pnm
    $ echo $?
    0
    $ ls -lsh /mnt/test/0009.pnm
    5.0M -rw-r--r-- 1 merlijn merlijn 5.0M May 31 11:42 /mnt/test/0009.pnm

Failure is evident with strace:

    write(5, "\373\372\367\373\371\367\372\370\365\372\370\364\372\370\364\372\371\366\372\367\363\363\355\346\351\344\327\370\363\356\372\370"..., 8192) = -1 ENOSPC (No space left on device)

MagickCore/blob.c handles these errors nicely, but the callers of
WriteBlob should still check the actual amount of written bytes.
Since MagickCore/blob.c already handles EINTR for them, the checks are
simple. This commit only fixes up the PNM image writing, but similar
issues remain throughout the file.
@MerlijnWajer
Copy link
Author

Its also not necessary to check every write. Recommended practice is to have the single check in the loop when pixels are written. That will capture any I/O errors that occur.

How would this work in the following scenario:

  1. No space is available while header is being written, imagemagick writes fail
  2. scheduler lets another program run for a bit, disk space becomes available
  3. imagemagick starts writing pixels

Now we'll end up with an image without a proper header, and imagemagick will still return success.

That seems undesirable to me.

@urban-warrior
Copy link
Member

urban-warrior commented May 31, 2020

Add minimal checks as we recommended in the coders to reduce the exception footprint in the coders. It will capture all/most of IO exceptions. For your proposed scenario, we already capture that in CloseBlob() with ferror(). All we need to do is to check the status of CloseBlob() and throw an exception if its not MagickFalse. Something like

If (CloseBlob(image) != MagickFalse)
  ThrowFileException(...)

@MerlijnWajer
Copy link
Author

Hm... If ferror was supposed to work, then why does the current code not print any errors?

... I suppose that for that to work, the return value of CloseBlob also needs to be taken into account when determining the status for each coder (this also currently gets ignored, at least for the coders I looked at).

Unless you want to check for ferror again in WriteImage?

@urban-warrior
Copy link
Member

urban-warrior commented May 31, 2020

As our use case, check CloseBlob() in coders/pnm.c. That should capture all I/O errors:

  if (CloseBlob(image) != MagickFalse)
    {
      (void) ThrowMagickException(exception,GetMagickModule(),CorruptImageError,
         "AnErrorHasOccurredWritingToFile","`%s'",image->filename);
      return(MagickFalse);
    }

@MerlijnWajer
Copy link
Author

MerlijnWajer commented May 31, 2020

This feels fragile. It only works if everything uses fwrite, this is not the case.

It looks like the blob code also uses gzwrite, from zlib, and this function uses write(2) internally, not fwrite(3), so there is no way you would be able to use glibc ferror(3) to catch the errors in gzwrite, right?

@MerlijnWajer
Copy link
Author

More specifically, if WriteBlob relies on anything that uses anything other than fwrite(3), the ferror approach breaks down, since ferror only works on things that go entirely through glibc. zlib doesn't do that, for example.

@urban-warrior
Copy link
Member

urban-warrior commented May 31, 2020

CloseBlob() check the blob status. If it returns MagickFalse, it has an error from a file, pipe, stdin, etc. If any of these fail, an exception will be throw per the recommended practice:

  if (CloseBlob(image) != MagickFalse)
    {
      (void) ThrowMagickException(exception,GetMagickModule(),CorruptImageError,
        "AnErrorHasOccurredWritingToFile","`%s'",image->filename);
      return(MagickFalse);
    }

We can do the same check for the readers with the AnErrorHasOccurredReadingFromFile exception severity.

@MerlijnWajer
Copy link
Author

So, as far as I can see, for anything that is not directly under your control, you'd have to explicitly check if ferror(3) can be used, and if the libraries use system calls directly - as zlib does - then there is no way to tell, other than actually checking every single write call. Otherwise the code would still fail silently.

@urban-warrior
Copy link
Member

Rather than speculate, we would need a use case so we can reproduce the problem. If a problem exists, we might be able to fix the problem within blob.c. Of course, you can check each write call as @dlemstra suggests: status&=Write..., to be complete, but its likely unnecessary. We have a catchall with checking the status of CloseBlob() until proven otherwise and then we fix any issues that we missed on a case by case basis.

@MerlijnWajer
Copy link
Author

The ferror(3) glibc call checks a glibc internal struct, so at least on glibc, anything that calls write(2) even once on the same fd will result in errors being missed. But you're right, I don't think speculating makes sense either. I've laid out where this can occur, I think, but I don't want to write up a proof-of-concept to convince you of that scenario.

So, assuming that:

  1. We'll want to fix this throughout the coders
  2. You don't want a macro to be used in this process
  3. You want to return the correct error.

I can rework my PR to add the inline functions to blob-private.h, rework the bmp and pnm handlers with this code, where it checks every write call, and stops on the first error.
I will have to ponder a little bit on how to preserve errno this way, if you'd insist on repeating write calls on the same fd even if an earlier one fails, to keep the code more compact.

@urban-warrior
Copy link
Member

Recall, you can choose to instead check every write. Consider what @dlemstra and @urban-warrior offered and converge on a final pull request. We'll squash and merge it and if it proves to be an optimal soluton, we can then clone it to the other coders.

@MerlijnWajer
Copy link
Author

I'll try to update this coming weekend at the latest and check the status of each call, thanks for the input. If I find the time I will also make an example with the gz write(2)/fwrite(3)/ferror(3) mixing.

@MerlijnWajer
Copy link
Author

MerlijnWajer commented Jun 9, 2020

I came up with one more idea, which I actually quite like.

Define/add a struct that can be passed to an (inline) wrapper around the various WriteBlob() calls, say SafeWriteBlob() for WriteBlob(). This would keep track of potential errors it encountered, store the first one, if any, and hang onto it. Could maybe look something like this: (again, please ignore casing of names and such, just writing out my thoughts);

struct IM_blob_status blob_status = // empty cleared struct;
SafeWriteBlobString(image,"255\n",&blob_status);
SafeWriteBlobString(image,"Hello\n",&blob_status);
SafeWriteBlobString(image,"Magick\n",&blob_status);
int ret = CheckBlobStatus(&blob_status, &potential_errno);
ClearBlobStatus(&blob_status);
if (ret != 0) { int olderrno = errno; errno = potential_errno; perror(...); olderrno = errno; }

SafeWriteBlob would be something like:

size_t SafeWriteBlob(Image *im, const size_t s, const void * d, struct IM_blob_status* bs) {
    ssize_t written = WriteBlob(im, s, d);
    if (written != s) {
        bs.errno = errno;
        bs.error_status = 1;
    }
}

int CheckBlobStatus(struct IM_blob_status* bs, int* potential_errno) {
    int ret = bs.error_status != 1;
    if (potential_errno != NULL)
        potential_errno = bs.errno;
}

//ClearBlobStatus left as an exercise for the reader

This has the added advantage that we can store errno on the first error, and preserve it, until CheckBlobStatus is called. In the extreme, you could call it only once you're done with a file (just before CloseBlob). Then it would still be "safer" than ferror(3) since it explicitly checks, but your code would not be much more complicated.

And of course, if you really like the idea API, you could integrate it into the blob API somehow.

When I find some time I'll try to update my PR to this, if you like the idea.

@urban-warrior
Copy link
Member

ImageMagick already tracks each exception that is thrown. That is, ImageMagick does not replace an exception when a new one is thrown. Instead it keeps a list. We can replicate your idea simply by throwing additional exceptions inside of the Blob methods. Right now we return a status. Instead, we can issue warning/error exception and when the exceptions are revealed to the user, a list of all I/O exceptions are included. Any exception can be caught by checking the blob method status and/or checking exception->severity > warning.

Any blob method that include the Image* parameter can directly throw an exception. For the low-level methods that do not include an Image*, we would need to add an ExceptionInfo structure to track exceptions and push them to the Image* parameter of the GetBlobError() method.

CheckBlobStatus() is not required. You can do that now with the existing GetBlobError().

We like your ideas and we do need to robustly address possible I/O exceptions, but we certainly want to leverage existing functionality and apply the KISS principle in addressing the issue, whenever possible.

If you agree with adding exceptions to the blob methods, let us know. We can code up that solution and you can then leverage the functionality in your PNM patches.

@MerlijnWajer
Copy link
Author

Yeah, using existing mechanisms seems fine. KISS is fine. So then you would do what my suggested SafeWriteBlob does, but with exceptions rather than an explicit struct with the state?

And then the recommended way to use this functions would to:

  1. Check after a batch of writes (for small separate header writes, just do all the writes and then check)
  2. In the loop with one main write, check after that write.

I can imagine you don't want to have a list with >100000 exceptions because every write on a large image fails, so that's probably good to keep in mind. (The single errno + status bit doesn't have this problem, fwiw)
Maybe it makes sense to limit the exceptions in the blob to one, or a few at most?

If you can create an example with the exceptions in the blob methods, I'd be happy to leverage them to the PNM coder and see how it goes.

@urban-warrior
Copy link
Member

urban-warrior commented Jun 9, 2020

Ok, here's the plan. We'll will add exception handling insider of the Blob handler by the end of this weekend. We'll add one check only in coders/pnm.c to show the promise of this solution. We will then want feedback from you and @dlemstra to finalize our approach. Once we're all in agreement, you can finalize your patches to the PNM coder. Once reviewed and approved, we export your model to the other coders. Thanks for the discussion. Its important to get this right before we unleash our changes to over 130 coders modules.

Note, the exception handler limits the # of exceptions to 1000 to help prevent excessive reporting.

@MerlijnWajer
Copy link
Author

Sounds good, thanks for being patient with me & replying quickly!

@urban-warrior
Copy link
Member

urban-warrior commented Jun 9, 2020

The proper place for catching I/O exceptions is in the blob methods. Our patch resolves most if not all potential I/O exceptions within MagickCore/blob.c such that little or no patching is required in any of the coders. Although we could check the return value of each read/write, it is not necessary as any I/O exception is now noted via errno when GetBlobError() is called. We will push the patch, likely sometime tomorrow. With the patch, we get expected results:

magick logo.png /magick/logo.pnm
magick: unable to write file `/magick/logo.pnm' @ error/constitute.c/WriteImage/1283.

urban-warrior pushed a commit to ImageMagick/ImageMagick6 that referenced this pull request Jun 10, 2020
@MerlijnWajer
Copy link
Author

Ok, I see the commits made it in the master branch. I'll update this PR this weekend - thanks!

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jun 19, 2020
2020-06-12  7.0.10-19 Cristy  <quetzlzacatenango@image...>
  * Release ImageMagick version 7.0.10-19, GIT revision 17343:e552d22:20200612

2020-06-09  7.0.10-19 Cristy  <quetzlzacatenango@image...>
  * Improve checking for write failures (reference
    ImageMagick/ImageMagick#2081).

2020-06-08  7.0.10-18 Cristy  <quetzlzacatenango@image...>
  * Release ImageMagick version 7.0.10-18, GIT revision 17333:d071c2032:20200608

2020-06-08  7.0.10-18 Cristy  <quetzlzacatenango@image...>
  * Colorspace change will remove ICC profile (reference
    ImageMagick/ImageMagick6#82).

2020-06-07  7.0.10-17 Cristy  <quetzlzacatenango@image...>
  * Release ImageMagick version 7.0.10-17, GIT revision 17311:8b5350f:20200607

2020-06-03  7.0.10-17 Cristy  <quetzlzacatenango@image...>
  * Free up memory after a ICC profile is removed.

2020-05-31  7.0.10-16 Cristy  <quetzlzacatenango@image...>
  * Release ImageMagick version 7.0.10-16, GIT revision 17294:5be1abe:20200531

2020-05-30  7.0.10-16 Cristy  <quetzlzacatenango@image...>
  * Fix PDF XREF directory for image sequences with and without ICC profiles.

2020-05-29  7.0.10-15 Cristy  <quetzlzacatenango@image...>
  * Release ImageMagick version 7.0.10-15, GIT revision 17282:9294896:20200529

2020-05-24  7.0.10-15 Cristy  <quetzlzacatenango@image...>
  * Clipping was not returning expected results (reference
    ImageMagick/ImageMagick#2061).
  * Don't write a ICC profile to PDF if the image is gray (reference
    ImageMagick/ImageMagick#2070).

2020-05-22  7.0.10-14 Cristy  <quetzlzacatenango@image...>
  * Release ImageMagick version 7.0.10-14, GIT revision 17268:e9c804c93:20200522

2020-05-22  7.0.10-14 Cristy  <quetzlzacatenango@image...>
  * Errant warning when reading a profile file (reference
    ImageMagick/ImageMagick#2030).
   * Fix one off error on PDF object for images with ICC profile.
@MerlijnWajer
Copy link
Author

Sorry, I was quite busy. I checked out the current work, and it seems to resolve the problems:

$ ./utilities/magick convert /tmp/test.pnm /mnt/test/test.bmp
convert: UnableToWriteFile '/mnt/test/test.bmp': No space left on device @ error/bmp.c/WriteBMPImage/2461.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants