Releases: bitbank2/JPEGDEC
Fixed for Arduino targets
Recent changes for Intel + Arm64 broke the MCU reference code. I stabilized the code again, but had to temporarily disable the M4/M7 SIMD optimizations. I'll work on a new release to re-enable them. In the meantime, the optimizations for x86/x64 and Arm64 still make the code quite a bit faster on those targets.
Fix 16-byte alignment of buffers for ESP32-S3 SIMD
I thought I had this working properly in build 1.4.0, but a user alerted me that the 16-byte alignment of the pixel buffer and MCU buffers wasn't always guaranteed, so this release adds specific code to ensure that both of those buffers are always 16-byte aligned.
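One common way to guarantee this kind of alignment is to over-allocate the buffer by 15 bytes and round the starting pointer up to the next 16-byte boundary. The following is a minimal sketch of that idea, not the library's actual code; `alignUp16`, `DecoderBuffers`, and `kPixelBytes` are hypothetical names and the buffer size is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed buffer size, for illustration only.
constexpr size_t kPixelBytes = 2048;

// Hypothetical helper: round a pointer up to the next 16-byte boundary.
// The caller must have over-allocated by at least 15 bytes.
inline uint8_t *alignUp16(uint8_t *p) {
    uintptr_t addr = reinterpret_cast<uintptr_t>(p);
    addr = (addr + 15u) & ~static_cast<uintptr_t>(15u);
    return reinterpret_cast<uint8_t *>(addr);
}

// Hypothetical decoder state: the raw array carries 15 bytes of slack so
// that an aligned window of kPixelBytes always fits inside it.
struct DecoderBuffers {
    uint8_t rawPixels[kPixelBytes + 15];

    uint8_t *pixels() {
        uint8_t *p = alignUp16(rawPixels);
        // The aligned window must still lie entirely inside rawPixels.
        assert(p + kPixelBytes <= rawPixels + sizeof(rawPixels));
        return p;
    }
};
```

Where the toolchain honors it for the storage in question, declaring the array with `alignas(16)` is an alternative; the pointer-rounding approach also works when the enclosing structure itself may sit at an arbitrary address.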
Added ESP32-S3 SIMD
This release adds initial support for accelerating the decode by using the ESP32-S3's SIMD instructions. My measurements show a 20-40% speedup depending on the options. I wrote a short blog post about how I figured out how to use these instructions here:
https://bitbanksoftware.blogspot.com/2024/01/surprise-esp32-s3-has-few-simd.html
Fixed SIMD conflict on Cortex-M
In the last release I added NEON optimizations for the output stage and unfortunately they were enabled for Cortex-M targets too. This caused a compiler error. This release fixes the issue.
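A conflict of this kind is usually avoided by selecting the SIMD path with preprocessor checks that exclude M-profile cores before testing for NEON. The sketch below shows one way to do that using predefined compiler macros (`__ARM_ARCH_PROFILE`, `__aarch64__`, `__ARM_NEON`, `__SSE2__`); the `SimdPath` enum and both function names are hypothetical, and the macros the library actually tests may differ.

```cpp
// Which optimized output path was compiled in (illustrative only).
enum class SimdPath { Scalar, Neon, Sse2 };

inline SimdPath selectedSimdPath() {
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
    // Cortex-M has no NEON unit, so never take the NEON path here.
    return SimdPath::Scalar;
#elif defined(__aarch64__) && defined(__ARM_NEON)
    return SimdPath::Neon;
#elif defined(__x86_64__) || defined(_M_X64) || defined(__SSE2__)
    return SimdPath::Sse2;
#else
    return SimdPath::Scalar;
#endif
}

inline const char *simdPathName(SimdPath p) {
    switch (p) {
        case SimdPath::Neon: return "neon";
        case SimdPath::Sse2: return "sse2";
        default:             return "scalar";
    }
}
```

Checking the M profile first means a NEON-related macro that happens to be defined on a microcontroller build cannot pull in the desktop code path.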
RGBA 32-bit and initial aarch64 SIMD
I corrected some errors in the 16 different permutations of subsampling and scaling options. I also added an experimental set of code to optimize the color conversion for aarch64 (Arm NEON) for 4:2:0 subsampling with full-size output. On my MacBook Air M1, it nearly doubles the decode speed: a 126K 938x698 file decodes in just 8 milliseconds (previously 15 milliseconds). I can optimize this code for x86 and Arm desktop usage, but need to evaluate the cost/benefit of investing the time. I believe my code can beat libjpeg-turbo in certain situations (if I fully deploy SIMD optimizations). Please let me know if you need this code optimized for your desktop application.
Fixed compiler warnings for new RGB8888 code
Warnings about mismatched pointer types (uint32_t * vs uint16_t *) can turn into errors depending on the compiler settings. This change casts all of the pointers to the correct type.
Added RGB8888 pixel type
This release adds support for outputting RGB8888 (RGBA 32-bit) pixels. All options are supported the same way as for 16-bit output. Because each pixel takes twice as many bytes, the output pixel buffer holds only half as many 32-bit pixels as 16-bit pixels. I also updated the Linux example to allow file conversion of JPEG to Windows BMP files (32-bit pixels).
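The halving of capacity follows directly from the pixel size. As a small illustration, assuming a fixed-size output buffer (the 4096-byte figure used in the usage note is made up for the example, and `pixelCapacity` is a hypothetical helper, not part of the library):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of whole pixels that fit in a buffer of bufferBytes.
inline size_t pixelCapacity(size_t bufferBytes, size_t bytesPerPixel) {
    assert(bytesPerPixel > 0);
    return bufferBytes / bytesPerPixel;
}
```

With an assumed 4096-byte buffer, `pixelCapacity(4096, sizeof(uint16_t))` is 2048 and `pixelCapacity(4096, sizeof(uint32_t))` is 1024, so code that sizes its work by pixel count needs to take the output pixel type into account.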
Added HTTP Stream API and indicator of clipped MCUs to draw callback
This release adds the open() method to work with streams and provides a new variable to the JPEGDRAW callback (iWidthUsed). For images whose width is not an exact multiple of the MCU width, this variable tells you how many columns to clip off the right edge; it doesn't change the bytes-per-line behavior of the output pixels. For example, if a non-subsampled color image is 33 pixels wide, the JPEGDraw callback will provide strips of pixels 40 wide (5 MCUs x 8 pixels per MCU). For the call containing the right edge, the iWidthUsed variable will be different from the iWidth variable to alert you to the pixels that should be clipped.
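To make the arithmetic concrete, here is a rough sketch of how a draw callback might use the two widths. Only the names iWidth and iWidthUsed come from the release note; the `DrawBlock` struct is a hypothetical stand-in for the real callback structure, and `paddedWidth` and `copyClipped` are illustrative helpers.

```cpp
#include <cassert>
#include <cstdint>

// Round an image width up to a whole number of MCUs.
inline int paddedWidth(int imageWidth, int mcuWidth) {
    assert(mcuWidth > 0);
    return ((imageWidth + mcuWidth - 1) / mcuWidth) * mcuWidth;
}

// Hypothetical stand-in for the data handed to the draw callback.
struct DrawBlock {
    int x, y;               // top-left corner of this block in the image
    int iWidth;             // pixels per row in the buffer (row stride)
    int iWidthUsed;         // valid columns; less than iWidth at the right edge
    int iHeight;            // rows in this block
    const uint16_t *pPixels;
};

// Copy only the valid columns into a destination framebuffer.
// Rows are still stepped by iWidth, since the stride is unchanged.
// Returns the number of pixels written.
inline int copyClipped(const DrawBlock &b, uint16_t *dst, int dstStride) {
    for (int row = 0; row < b.iHeight; ++row) {
        const uint16_t *src = b.pPixels + row * b.iWidth;
        uint16_t *out = dst + (b.y + row) * dstStride + b.x;
        for (int col = 0; col < b.iWidthUsed; ++col) {
            out[col] = src[col];
        }
    }
    return b.iWidthUsed * b.iHeight;
}
```

For the 33-pixel example, `paddedWidth(33, 8)` gives 40, and a block covering the right edge would carry 7 columns of padding that the inner loop never copies.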
Added context/user pointer to draw callback
This release adds the ability to have a user-supplied context pointer passed to the JPEGDRAW callback function.
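A context pointer lets the callback reach per-decode state without global variables. The sketch below shows the general pattern only; `DrawEvent`, `FrameStats`, `countingDraw`, and `fakeDecode` are all hypothetical stand-ins rather than the library's types or API, and the "return 1 to continue" convention is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for the structure passed to the draw callback.
struct DrawEvent {
    int x, y, iWidth, iHeight;
    void *pUser;            // user-supplied context, passed through untouched
};

typedef int (*DrawCallback)(DrawEvent *);

// Example of per-decode state the application might want to reach.
struct FrameStats {
    int calls;
    long pixels;
};

// The callback recovers its state by casting the context pointer back.
inline int countingDraw(DrawEvent *e) {
    assert(e != nullptr && e->pUser != nullptr);
    FrameStats *stats = static_cast<FrameStats *>(e->pUser);
    stats->calls++;
    stats->pixels += static_cast<long>(e->iWidth) * e->iHeight;
    return 1;               // assumed convention: nonzero means keep decoding
}

// Toy driver standing in for the decoder: it hands the same user pointer
// to every callback invocation.
inline int fakeDecode(DrawCallback cb, void *user, int blocks) {
    for (int i = 0; i < blocks; ++i) {
        DrawEvent e = {i * 16, 0, 16, 16, user};
        if (!cb(&e)) return 0;
    }
    return 1;
}
```

Because the state travels with the call, two decodes can run with different `FrameStats` objects (or different display handles) and share the same callback function.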
esp32_demo fix
This release fixes a problem with the decoder comparing the output Y value with the current Y when there is a starting offset. It also updates the esp32_demo sketch to reflect the changes to both the bb_spi_lcd library and JPEGDEC.