
GPUJPEG with Vulkan #75

Open

wolfviking0 opened this issue Mar 1, 2022 · 9 comments
@wolfviking0

Hi @MartinPulec,

I have a question: I have the crazy idea of running GPUJPEG on Vulkan, and for that I was looking at a project called Vuda: https://github.com/jgbit/vuda

The approach of this project is to wrap the cuda_runtime API with Vulkan; all kernels then need to be transformed into compute shaders.

Do you think your kernels could be converted to OpenCL or compute shaders?

@MartinPulec
Collaborator

Hi @wolfviking0,

thanks, I didn't know about Vuda, but as you mentioned, it would also require converting the kernels to something compilable to SPIR-V (assuming that OpenCL would work).

Although it would be nice to do, I'm afraid I cannot promise anything now; it is mostly a matter of priorities, since this requires a non-trivial amount of work.

Anyway, what exactly is the point of running it with Vulkan? Support for cards other than NVIDIA? (Or platforms, namely Mac?)

@wolfviking0
Author

Yes, for Mac and iOS, but not only those; I am also thinking of Android.

@MartinPulec
Collaborator

MartinPulec commented Mar 2, 2022

Thanks for the info. But as I already noted, there will probably not be enough development capacity for that in the near future.

I've also searched a bit, and there are projects (also here) that strive to compile CUDA kernels to something more generic, but I am not sure how much effort converting to SPIR-V would require; it looks like more than one step would be needed. Perhaps OpenCL could also be considered as a conservative intermediate?

I have to admit that, from my point of view, there is currently an explosion of (compute) APIs that I don't know, and it is a pity that the CUDA toolkit doesn't compile/transpile to anything else (though that is somewhat understandable).

@RamKromberg

I am not sure how much effort would it require to convert to SPIR-V

hipcl was integrated into chip-spv, which converts HIP to SPIR-V backends like OpenCL and Level Zero. So, in theory, if you'd already gone with HIP, it would be just a compile away. Well, in theory... :D

@anthonyliot

Hi @MartinPulec

Just wanted to give you a small update on the work I am trying to do. Before going down the Vulkan road, I want to try OpenCL first, mainly because OpenCL is already supported on many devices, including Android.

To do that I initiated a fork, with a first step of getting the code fully buildable on OSX.
Step one will be to get GPUJPEG fully working on OSX using the CPU fallback. So far I am able to get everything built, but the runtime is not yet working.

It's really just the beginning, but I will try to keep working on it as much as I can:
anthonyliot@cbd8a3b

My idea for now is to let the user choose, at compile time (during CMake configuration), between CUDA, OpenCL, and CPU.
Let me know what you think about it :)

$ ./gpujpegtool -I ../../imageloader/images/test.jpg 

GPUJPEG rev cbd8a3b
width: 300
height: 225
component count: 3
color space: YCbCr BT.601 256 Levels (YCbCr JPEG)
internal representation: 444-u8-p012 (4:4:4)
segment count: 1 (DRI = 0)

$ ./gpujpegtool -d ../../imageloader/images/test.jpg test.rgb

GPUJPEG rev cbd8a3b
Failed to create decoder!

@anthonyliot

I started adding some functions to test decoding first; I am testing with this image:
[image: test]

Right now the approach is really naive: replace cudaMalloc/cudaFree/cudaMemset/cudaMemcpy and all the other functions with malloc/free/memset/memcpy.
Some functions are still missing ("NOT YET IMPLEMENTED"), but I think I almost have the pipeline. Where I am not sure is this missing one: [WARNING] gpujpeg_decoder_decode(): NOT YET IMPLEMENTED

rc = gpujpeg_preprocessor_decode(&decoder->coder, decoder->stream);

I am not sure what I should do here, any thoughts?

 ./gpujpegtool -vvv -d ~/Desktop/github/imageloader/images/test.jpg test.rgb
GPUJPEG rev 89878b2
[WARNING] gpujpeg_init_device(): NOT YET IMPLEMENTED
[WARNING] gpujpeg_coder_init(): NOT YET IMPLEMENTED

Decoding Image [/Users/anthony.liot/Desktop/github/imageloader/images/test.jpg] to 444-u8-p012 RGB
Load Image:                0.13 ms
[GPUJPEG] [Warning] JPEG data contains not supported APP1 marker
[GPUJPEG] [Debug] coder image reconfiguration

Allocation Info:
    Segment Count:            1
    Allocated Data Size:      304x232
    Raw Buffer Size:          0.2 MiB
    Preprocessor Buffer Size: 0.2 MiB
    DCT Buffer Size:          0.4 MiB
    Compressed Buffer Size:   1.6 MiB
    Huffman Temp buffer Size: 1.6 MiB
    Structures Size:          0.4 KiB
    Total GPU Memory Size:    4.0 MiB

[WARNING] gpujpeg_decoder_decode(): NOT YET IMPLEMENTED
 -Stream Reader:         0.0000 ms
 -Copy To Device:        0.0000 ms
 -Huffman Decoder:       0.0000 ms
 -DCT & Quantization:    0.0000 ms
 -Postprocessing:        0.0000 ms
 -Copy From Device:      0.0000 ms
Decode Image GPU:        0.0000 ms (only in-GPU processing)
Decode Image Bare:       7.6489 ms (without copy to/from GPU memory)
Decode Image:            7.6489 ms
[GPUJPEG] [Warning] JPEG data contains not supported APP1 marker
Save Image:                0.59 ms
Decompressed Size:       202500 bytes [test.rgb]
[WARNING] gpujpeg_image_destroy(): NOT YET IMPLEMENTED
[WARNING] gpujpeg_coder_deinit(): NOT YET IMPLEMENTED
[WARNING] gpujpeg_decoder_destroy(): NOT YET IMPLEMENTED

@MartinPulec
Collaborator

Hi @anthonyliot,

My idea for now will be to get at compile time (during the config of cmake) the user to choose CUDA vs OPENCL vs CPU
Let me know what you think about it :)

I think that having the OpenCL/CPU implementation is a very good idea. Anyway, instead of ifdef-ing everything, what about having a structure of platform-dependent functions? I mean something like:

#include <stddef.h>
#include <string.h>
#include <cuda_runtime.h>

/// table of platform-dependent primitives
struct platform_fns {
        void *(*memcpy_h2d)(void *dst, const void *src, size_t size);
};

/// CPU backend: host-to-device copy is a plain memcpy
struct platform_fns fn_cpu = {
        memcpy
};

/// CUDA backend: wrap cudaMemcpy to match the memcpy-like signature
void *cuda_memcpy_h2d(void *dst, const void *src, size_t size) {
        cudaMemcpy(dst, src, size, cudaMemcpyHostToDevice);
        return dst;
}
struct platform_fns fn_cuda = {
        cuda_memcpy_h2d
};

/// instead of calling cudaMemcpy, there could be something like coder->fns->memcpy_h2d

But it can also be done retroactively. The advantage is that the table doesn't need to contain only primitive functions like memcpy and free, but also actual computations, like the platform-specific preprocessor call.

@MartinPulec
Collaborator

[...] where I am not sure its for this missing one : [WARNING] gpujpeg_decoder_decode(): NOT YET IMPLEMENTED

rc = gpujpeg_preprocessor_decode(&decoder->coder, decoder->stream);

I am not sure what should I do here, any thought ??

If I understood correctly, you have the output after iDCT and de-quantization, but the postprocessor is missing? I'm afraid you'll need to write one yourself, but it is not too difficult. I've opened discussion #89 so as not to clutter this issue too much.

@anthonyliot

Hi @MartinPulec
Regarding your first comment, you're right: the correct approach will not be (only) an #IFDEF approach (still, to be able to build for OSX/Android we need to be able to entirely disable CUDA-related includes), but instead something similar to what you describe.
I was also planning to add something like a Device, so the user can choose which acceleration to use when initializing GPUJPEG.
Here is how I wanted to go, step by step:

1 - Get a fully running version on OSX (maybe also Android) using the CPU at compile time.
2 - Implement the factorization using a struct for the platform-dependent functions, for CUDA and CPU only.
3 - Implement the OpenCL version of it.
4 - .... why not also add WebGPU ;) but let's see where that goes after the first step :)
