Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Images being skewed by Frame Converters? #1115

Open
DBrown2019 opened this issue Dec 28, 2018 · 13 comments
Open

Images being skewed by Frame Converters? #1115

DBrown2019 opened this issue Dec 28, 2018 · 13 comments

Comments

@DBrown2019
Copy link

I've been working on a project that needs to convert between several different types of images (BufferedImage, Mat, PIX), and I've noticed a strange issue when using the JavaCV frame converters, wherein the output image will be heavily skewed. For example, The original BufferedImage:
buffimgtest
vs. The Mat after conversion:
mattest

Does anyone know what could be causing this?

@saudet
Copy link
Member

saudet commented Dec 28, 2018 via email

@saudet
Copy link
Member

saudet commented Dec 28, 2018 via email

@saudet
Copy link
Member

saudet commented Dec 29, 2018

We could enhance the frame converters to copy the buffers in those cases though.

@DBrown2019
Copy link
Author

Can you explain to me what you mean by custom strides? I'm generating the images by using ghost4j to convert from PDF to Java Image, then processing them as a Mat to deskew them, then returning them as an Image. As far as I can tell, the Image makes it through this intact, but has issues when I convert them again later.

I got these examples by using ImageIO.write() on the Image, then using Imgcodecs.write() for the Mat.
Maybe the conversion from BufferedImage to Mat causes the issue, but the conversion from Mat to BufferedImage undoes it? (That would definitely explain why I was having so much trouble finding this problem.)

@saudet
Copy link
Member

saudet commented Jan 3, 2019 via email

@DBrown2019
Copy link
Author

I see. You said earlier that I could resize the images to use that API. How would I go about doing that? I figure you don't mean just decreasing the resolution. Would I have to do something to change the strides somehow?

@saudet
Copy link
Member

saudet commented Jan 8, 2019 via email

@DBrown2019
Copy link
Author

Thank you, That worked! I made it so that it adds a little padding to the right when scanning them in to make the columns a multiple of four, and that fixed it. If you don't mind explaining, why is that the case? Why is this affected by the number of columns?

@saudet
Copy link
Member

saudet commented Jan 9, 2019

The Wikipedia article is a good introduction I find: https://en.wikipedia.org/wiki/Stride_of_an_array

@msmuenchen
Copy link
Contributor

msmuenchen commented Jan 27, 2024

I rewrote the converter function, it now supports 8-bit grayscale and 8-bit RGB without corrupting the stride of the PIX data or the color data in the case of non-aligned strides.

Unfortunately I had to drop big-endian support entirely as I lack a machine to test that. And I didn't touch the reverse function at all, maybe I'll do it when I have a bit of spare time but no promises...

Also, I removed BytePointer frameData, pixData; ByteBuffer frameBuffer, pixBuffer; out of the equation entirely. I assume these were originally used for the byte-order conversion but as that's gone there is no reason to keep the image data around in any buffers where it might leak.

import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.javacpp.Pointer;
import org.bytedeco.javacv.Frame;
import org.bytedeco.javacv.LeptonicaFrameConverter;
import org.bytedeco.leptonica.PIX;

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class FixedLeptonicaFrameConverter extends LeptonicaFrameConverter {
    PIX pix;

    static boolean isEqual(Frame frame, PIX pix) {
        return pix != null && frame != null && frame.image != null && frame.image.length > 0
                && frame.imageWidth == pix.w() && frame.imageHeight == pix.h()
                && frame.imageChannels == pix.d() / 8 && frame.imageDepth == Frame.DEPTH_UBYTE
                && (ByteOrder.nativeOrder().equals(ByteOrder.LITTLE_ENDIAN)
                || new Pointer(frame.image[0]).address() == pix.data().address())
                && frame.imageStride * Math.abs(frame.imageDepth) / 8 == pix.wpl() * 4;
    }

    public PIX convert(Frame frame) {
        if (frame == null || frame.image == null) {
            return null;
        } else if (frame.opaque instanceof PIX) {
            return (PIX) frame.opaque;
        } else if (!isEqual(frame, pix)) {
            //I simply lack a machine to test this.
            if (ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN)) {
                System.err.println("This converter does not support running on big-endian machines");
                return null;
            }
            //PIX data should be packed as tightly as possible, see https://github.com/DanBloomberg/leptonica/blob/0d4477653691a8cb4f63fa751d43574c757ccc9f/src/pix.h#L133
            //For anything not greyscale or RGB @ 8 bit per pixel, this involves more bit-shift logic than I'm willing to write (and I lack test cases)
            if (frame.imageChannels != 3 && frame.imageChannels != 1) {
                System.out.println(String.format("Image has %d channels, converter only supports 3 (RGB) or 1 (grayscale) for input", frame.imageChannels));
                return null;
            }
            if (frame.imageDepth != 8 || !(frame.image[0] instanceof ByteBuffer)) {
                System.out.println(String.format("Image has bit depth %d, converter only supports 8 (1 byte/px) for input", frame.imageDepth));
                return null;
            }
            // Leptonica frame scan lines must be padded to 32 bit / 4 bytes of stride (line) length, otherwise one gets nasty scan effects
            // See http://www.leptonica.org/library-notes.html#PIX
            int srcChannelDepthBytes = frame.imageDepth / 8;
            int srcBytesPerPixel = srcChannelDepthBytes * frame.imageChannels;

            // Leptonica counts RGB images as 24 bits per pixel, while the data actually is 32 bit per pixel
            int destBytesPerPixel = srcBytesPerPixel;
            if (frame.imageChannels == 3) {
                // RGB pixels are stored as RGBA, so they take up 4 bytes!
                // See https://github.com/DanBloomberg/leptonica/blob/master/src/pix.h#L157
                destBytesPerPixel = 4;
            }
            int currentStrideLength = frame.imageWidth * destBytesPerPixel;
            int targetStridePad = 4 - (currentStrideLength % 4);
            if (targetStridePad == 4)
                targetStridePad = 0;
            int targetStrideLength = (currentStrideLength) + targetStridePad;
            ByteBuffer src = ((ByteBuffer) frame.image[0]).order(ByteOrder.LITTLE_ENDIAN);
            int newSize = targetStrideLength * frame.imageHeight;
            ByteBuffer dst = ByteBuffer.allocate(newSize).order(ByteOrder.LITTLE_ENDIAN);
            /*
            System.out.println(String.format(
                    "src: %d bytes total, %d channels @ %d bytes per pixel, stride length %d",
                    frame.image[0].capacity(),
                    frame.imageChannels,
                    srcBytesPerPixel,
                    currentStrideLength
            ));
            System.out.println(String.format(
                    "dst: %d bytes total, stride length %d, stride pad %d",
                    newSize,
                    targetStrideLength,
                    targetStridePad
            ));
             */
            //The source bytes will be RGB, which means it will have to be copied byte-by-byte to match Leptonica RGBA
            //todo: use qword copy ops?
            byte[] rowData = new byte[targetStrideLength];
            for (int row = 0; row < frame.imageHeight; row++) {
                for (int col = 0; col < frame.imageWidth; col++) {
                    int srcIndex = (frame.imageStride * row) + (col * frame.imageChannels);
                    if (frame.imageChannels == 1) {
                        byte v = src.get(srcIndex);
                        rowData[col] = v;
                        //System.out.println(String.format("row %03d col %03d idx src %03d val %02x", row, col, srcIndex,v));
                    } else if (frame.imageChannels == 3) {
                        int dstIndex = col * destBytesPerPixel;
                        byte[] pixelData = new byte[3];
                        src.get(srcIndex, pixelData);
                        // Convert BGR (OpenCV's standard ordering) to RGB (Leptonica)
                        // See https://learnopencv.com/why-does-opencv-use-bgr-color-format/ and https://github.com/DanBloomberg/leptonica/blob/master/src/pix.h#L157
                        rowData[dstIndex] = pixelData[2]; //dst: r
                        rowData[dstIndex + 1] = pixelData[1]; //dst: g
                        rowData[dstIndex + 2] = pixelData[0]; // dst: b
                        rowData[dstIndex + 3] = 0;
                        //System.out.println(String.format("row %03d col %03d idx src %03d dst %03d val r %02x g %02x b %02x", row, col, srcIndex,dstIndex, pixelData[2], pixelData[1], pixelData[1]));
                    }
                }
                //And since pixel data in source is little-endian, but Leptonica is big-endian on 32-bit level, now invert accordingly...
                ByteBuffer rowBuffer = ByteBuffer.wrap(rowData);
                IntBuffer inverted = rowBuffer.order(ByteOrder.BIG_ENDIAN).asIntBuffer();
                //System.out.println(Arrays.toString(rowBuffer.array()));
                dst.position(row * targetStrideLength).asIntBuffer().put(inverted);
            }
            pix = PIX.create(frame.imageWidth, frame.imageHeight, destBytesPerPixel * 8, new BytePointer(dst.position(0)));
        }

        return pix;
    }
}

Test harness:

import org.bytedeco.javacpp.BytePointer;
import org.bytedeco.javacv.Frame;
import org.bytedeco.javacv.LeptonicaFrameConverter;
import org.bytedeco.javacv.OpenCVFrameConverter;
import org.bytedeco.leptonica.PIX;
import org.bytedeco.leptonica.global.leptonica;
import org.bytedeco.opencv.global.opencv_imgcodecs;
import org.bytedeco.opencv.opencv_core.Mat;
import org.bytedeco.opencv.opencv_core.Point;
import org.bytedeco.opencv.opencv_core.Scalar;
import org.opencv.core.CvType;

import java.nio.ByteOrder;

import static org.bytedeco.opencv.global.opencv_core.CV_8UC1;
import static org.bytedeco.opencv.global.opencv_core.CV_8UC3;
import static org.bytedeco.opencv.global.opencv_imgproc.line;

public class FrameConverterTest {
    public static void main(String[] args) {
        testBw(10,8);
        testBw(10,9);
        testBw(10,10);
        testBw(10,11);


        testRgb(10,8);
        testRgb(10,9);
        testRgb(10,10);
        testRgb(10,11);
    }
    public static void testBw(int rows, int cols) {
        LeptonicaFrameConverter lfcFixed = new FixedLeptonicaFrameConverter();
        OpenCVFrameConverter.ToMat matConverter = new OpenCVFrameConverter.ToMat();

        Mat originalImage=new Mat(rows,cols,CV_8UC1);
        int stepX=255/cols;
        int stepY=255/rows;
        int stepTotal=Math.min(stepX,stepY);
        for(int i=0;i<originalImage.rows();i++) {
            for(int j=0;j<originalImage.cols();j++) {
                line(originalImage, new Point(j, i), new Point(j, i), new Scalar(stepTotal*j));
            }
        }

        System.out.println(String.format("orig\n  capacity %d\n  w %d\n  h %d\n  ch %d\n  dpt %d\n  str %d",originalImage.asByteBuffer().capacity(),originalImage.cols(),originalImage.rows(),originalImage.channels(),originalImage.depth(),originalImage.step()));
        opencv_imgcodecs.imwrite(System.getProperty("user.dir")+"/mat-bw-"+rows+"x"+cols+".bmp", originalImage);

        Frame ocrFrame = matConverter.convert(originalImage);
        System.out.println(String.format("frame\n  capacity %d\n  w %d\n  h %d\n  ch %d\n  dpt %d\n  str %d",ocrFrame.image[0].capacity(),ocrFrame.imageWidth,ocrFrame.imageHeight,ocrFrame.imageChannels,ocrFrame.imageDepth,ocrFrame.imageStride));

        PIX converted = lfcFixed.convert(ocrFrame);
        System.out.println(String.format("fixconverted pix\n  capacity %d\n  w %d\n  h %d\n  ch %d\n  dpt %d\n  wpl %d",converted.createBuffer().capacity(),converted.w(),converted.h(),-1,converted.d(),converted.wpl()));
        leptonica.pixWrite(System.getProperty("user.dir")+"/pix-bw-"+rows+"x"+cols+".bmp", converted,leptonica.IFF_BMP);
    }
    public static void testRgb(int rows, int cols) {
        LeptonicaFrameConverter lfcFixed = new FixedLeptonicaFrameConverter();
        OpenCVFrameConverter.ToMat matConverter = new OpenCVFrameConverter.ToMat();

        Mat originalImage=new Mat(rows,cols, CV_8UC3);
        int stepX=255/cols;
        int stepY=255/rows;
        for(int i=0;i<originalImage.rows();i++) {
            for(int j=0;j<originalImage.cols();j++) {
                // Warning: OpenCV uses BGR ordering under the hood!
                // See https://learnopencv.com/why-does-opencv-use-bgr-color-format/
                line(originalImage, new Point(j, i), new Point(j, i), new Scalar(i*stepY,j*stepX,0,0));
            }
        }

        System.out.println(String.format("orig\n  capacity %d\n  w %d\n  h %d\n  ch %d\n  dpt %d\n  str %d",originalImage.asByteBuffer().capacity(),originalImage.cols(),originalImage.rows(),originalImage.channels(),originalImage.depth(),originalImage.step()));
        opencv_imgcodecs.imwrite(System.getProperty("user.dir")+"/mat-rgb-"+rows+"x"+cols+".bmp", originalImage);

        Frame ocrFrame = matConverter.convert(originalImage);
        System.out.println(String.format("frame\n  capacity %d\n  w %d\n  h %d\n  ch %d\n  dpt %d\n  str %d",ocrFrame.image[0].capacity(),ocrFrame.imageWidth,ocrFrame.imageHeight,ocrFrame.imageChannels,ocrFrame.imageDepth,ocrFrame.imageStride));

        PIX converted = lfcFixed.convert(ocrFrame);
        System.out.println(String.format("fixconverted pix\n  capacity %d\n  w %d\n  h %d\n  ch %d\n  dpt %d\n  wpl %d",converted.createBuffer().capacity(),converted.w(),converted.h(),-1,converted.d(),converted.wpl()));
        leptonica.pixWrite(System.getProperty("user.dir")+"/pix-rgb-"+rows+"x"+cols+".bmp", converted,leptonica.IFF_BMP);
    }
}

@msmuenchen
Copy link
Contributor

I also seriously question the usability of LeptonicaFrameConverter::isEqual - it doesn't compare the actual image data just the metadata, and on LITTLE_ENDIAN platforms it doesn't even compare the pointer address.

@saudet
Copy link
Member

saudet commented Jan 27, 2024

Great! Please open a pull request with that

@msmuenchen
Copy link
Contributor

@saudet sure, can do - I'd take the liberty and introduce unit tests for the test harness. Are you fine with junit5?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants