[WIP] Add Color Filter Array (Bayer) #100
Conversation
Link to document for YDgCoCg-R, which shows some theory and formulae.
Good that you ask. First, before the specification or draft can be changed, there needs to be research and testing of compression algorithms to compress Bayer. These results should be publicly discussed on CELLAR, and then we can decide what to do; possibly more algorithms would be suggested, implemented and tested for speed, complexity and compression rate. Once all that is done, the best option could then be added to the specification. If I didn't miss some communication on this, then I don't think we are at the stage where we can make any changes in the specification.
The idea here is actually to not change the compression algorithm (on purpose): the suggested transformation is similar to what already exists with RGB (the "only" addition is a Dg, Difference Green, with preliminary compression tests already done showing similar compression compared to merged green), and it reuses all other parts of the current specification (so no new algorithm; changes in the encoder and decoder are limited to the count of components to handle and the transformation layer). This is an easy step, both in bitstream specification and in encoders/decoders, for supporting a different "color space". My analysis: I think it is better to support Bayer with additional complexity near 0 today than to wait for a potentially better algorithm. In summary, I think it is better to have basic support now at very low (near 0) cost than nothing, due to the high cost of R&D nobody is ready to pay. I already put a placeholder on CELLAR: https://www.ietf.org/mail-archive/web/cellar/current/msg01398.html ; I can put a copy of this text on CELLAR if you prefer.
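The "Dg in addition to the existing RGB transform" idea can be sketched as follows. This is a hedged illustration only: the function and variable names are mine, not from the spec or any patch, and it assumes arithmetic right shift for negative values (as common compilers provide). It applies the JPEG2000-RCT already used for RGB to (R, G1, B) and adds a fourth "Difference Green" component Dg = G2 − G1:

```c
#include <assert.h>

/* Hypothetical sketch: JPEG2000-RCT on (R, G1, B), plus Dg = G2 - G1
 * for the second Bayer green. Names are illustrative only. */
static void rggb_to_ydgcbcr(int r, int g1, int g2, int b,
                            int *y, int *dg, int *cb, int *cr)
{
    *cb = b - g1;
    *cr = r - g1;
    *y  = g1 + ((*cb + *cr) >> 2);  /* equals (r + 2*g1 + b) >> 2 */
    *dg = g2 - g1;                  /* the only new component */
}

static void ydgcbcr_to_rggb(int y, int dg, int cb, int cr,
                            int *r, int *g1, int *g2, int *b)
{
    *g1 = y - ((cb + cr) >> 2);
    *r  = cr + *g1;
    *b  = cb + *g1;
    *g2 = dg + *g1;
}
```

The transform is losslessly invertible, exactly like the existing RGB case, which is why only the component count and the transformation layer change.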
Note: the priority for me is to be able to send the current spec to IETF standardization, so this patch and this debate have a much lower priority than all other PRs related to clarification of what already exists.
You have a point that this is a low-complexity solution, but I disagree that doing better has a high R&D cost. Also, this is the wrong place to discuss it; it clearly belongs on CELLAR, so I'll try to reply there. About priority, I am very interested in extending FFV1 to Bayer and other features; I must admit that the v3 standardization work is a bit boring compared to that :)
It is not fun for me either, but it is a blocker for working on new features.
Also requires a sync with master since it's unmergable.
ffv1.md (outdated):

```
log2_h_chroma_subsample | ur
log2_v_chroma_subsample | ur
if (colorspace_type == 2) {
    rggb_order | ur
```
This is undefined.
ffv1.md (outdated):

```
    rggb_order | ur
} else {
    chroma_planes | br
    log2( h_chroma_subsample ) | ur
```
This requires a global change of `log2_h_chroma_subsample` to `log2( h_chroma_subsample )`, same for vertical. Also, this approach would require a version bump, so version 4; else an existing version 3 decoder would interpret these Parameters incorrectly. So I'd suggest an `if (version >= 4)` outside of the `if (colorspace_type == 2)`.
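In the draft's own pseudocode style, the gating suggested here would look roughly like this (the `version >= 4` condition is the suggestion under discussion, not current normative text):

```
if (version >= 4 && colorspace_type == 2) {
    rggb_order                   | ur
} else {
    chroma_planes                | br
    log2( h_chroma_subsample )   | ur
    log2( v_chroma_subsample )   | ur
}
```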
A decoder rejects the stream if the coder or colorspace is unknown, so no need for a version bump here IMO. I guess we need to decide what "reserved" means, but IIRC e.g. H.264 does the same for reserved stuff (the stream is just not decoded). I am in favor of not increasing the version number for reserved values we decide to use, but it should be a global debate for all reserved values (then I'll adapt this PR).
@JeromeMartinez ok with me. Still some hanging comments from #100 (comment) and a rebase is needed for review. |
Added "Inferred to be 0 if not present." for chroma_subsample.
What is the status of this?
On my side, lack of sample files for any demonstration.
Files should be available on the websites of the sensor and camera manufacturers. (I can send you some as soon as I am back at the lab.)
Finally got some files, directly supported by FFmpeg. I would also like to prioritize effective support with small changes in the short term over lots of changes (based on studies of algorithm performance etc.) in the long term, so the new proposal keeps most of the internal structure of current FFV1 v0-v3:
This permits minimal changes in FFmpeg code for v0-v3, in order to have an immediately working implementation, and we could work on better support with common optimizations for YUV/RGB/RGGB in v4 (see below). Not reinventing the wheel: metadata for RGGB support are based on how TIFF/EP & EXIF (direct view of CFAPattern) add support for RGGB:
Notes:
Besides the spec update, I implemented support of RGGB in FFmpeg: FFmpeg patch for FFmpeg FFV1 RGGB support. That said, I noticed during my tests that not using JPEG2000-RCT provides better compression (~10% better) with the samples I have (which are not representative). In my opinion, using or not using JPEG2000-RCT is a more global issue not related to RGGB support, and should be an item to discuss during v4 about what to do (transform, no transform, for the whole stream or by slice; I actually already see some code about that in the v4 draft), so we should here just use the current code (so JPEG2000-RCT) without complicated additional stuff (because it is the same bitstream versions and we just add a new
| Other | reserved for future use | reserved for future use | reserved for future use |

Restrictions:
If `colorspace_type` is 1, then `chroma_planes` MUST be 1, `log2_h_chroma_subsample` MUST be 0, and `log2_v_chroma_subsample` MUST be 0.
If `colorspace_type` is 2, then `chroma_planes` MUST be 1, `log2_h_chroma_subsample` MUST be 0, `log2_v_chroma_subsample` MUST be 0, and transparency MUST be 1.
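The two restrictions above can be read as a small consistency check. A minimal sketch (the function name and plain `int` parameters are my assumptions, not spec syntax):

```c
#include <assert.h>

/* Returns 1 if the Parameters satisfy the restrictions quoted above,
 * 0 otherwise. Names are illustrative, not normative field syntax. */
static int params_valid(int colorspace_type, int chroma_planes,
                        int log2_h_chroma_subsample,
                        int log2_v_chroma_subsample, int transparency)
{
    if (colorspace_type == 1 || colorspace_type == 2) {
        if (chroma_planes != 1 ||
            log2_h_chroma_subsample != 0 ||
            log2_v_chroma_subsample != 0)
            return 0;
        /* colorspace_type 2 additionally requires transparency == 1 */
        if (colorspace_type == 2 && transparency != 1)
            return 0;
    }
    return 1;
}
```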
Instead of `transparency MUST be 1`, you should use the defined name of this value, which is `alpha_plane`. But this is a hack, since `alpha_plane=1` implies that a transparency plane is present. You'd have to adjust the definition of `alpha_plane` (maybe rename it), so that you're not implying that there is a transparency plane when there isn't.
Also should consider #47 (comment), which comments on the semantics of infrared scans stored in an alpha plane.
Here I "copied" the way RGB support was added, i.e. I keep the previous spec as much as possible. I will prepare another PR focused on this renaming, which should be merged before this one.
@@ -391,6 +391,10 @@ In JPEG2000-RCT, the coding order would be left to right and then top to bottom,

Y[1,1] Y[2,1] Cb[1,1] Cb[2,1] Cr[1,1] Cr[2,1] Y[1,2] Y[2,2] Cb[1,2] Cb[2,2] Cr[1,2] Cr[2,2]

### Color Filter Array

(TODO)
What is coming here?
Some general introduction to CFA, order of planes... as for the two other color spaces.
Rebased + updated based on comments from Dave.
Also I changed the title, as the previous title was no longer relevant with the proposed changes.
10% compression gain by not doing a step sounds like a good idea.
About the two green planes. IIUC, the proposal stores "even" and "odd" green samples in 2 different planes. That would separate the most correlated samples of the green plane, or am I misunderstanding?
I fully agree.
Yes and no: we can also consider that we reused most of the current standard. It is compression ratio vs additional complexity.
Issue is that I have a poor set of sample files.
I can propose a solution "merging" both greens, but it makes the bitstream changes and the code bigger. I am OK with trying to implement that, but please in that case do not reject the changes due to too many changes and/or not enough samples.
Just to be clear: I do not split a plane into 2 smaller ones; the CFA input comes with 2 green "planes". So only one "way" to split is used, the one from the CFA hardware.
I understand that you propose to merge the first and second greens in some way. With:
I tested with my sample files:
One advantage of this scenario (merged G1 & G2) is that we keep the 4th plane for transparency, but compression is worse (and if we want to keep transparency support, we can just decide to accept up to 5 planes). I think I tested all the ideas now. My personal preference is to use the implementation which will be accepted in the v3 (and v0/1?) spec before the standard is released, so please vote for your preference based on my tests (which are not exhaustive regarding sample files, but this is all I have right now) and I'll implement it. I see decisions to take on the following parts ("permit both" = a flag in the header): If I have to choose, I would choose 1a (trying to avoid code complexity), 2c (not complex to have both), 3c (not complex to have both); mixed feelings about 4 (no RGGBA pix_fmt in FFmpeg or other tools, vs. it does no harm to specify an alpha plane without implementation in case of future need, so in practice implementations keep 4 planes until transparency support is needed).
Yes.
If the spectral response for the samples is the same, then there is one plane. If the spectral response is not the same, then one is not green. So there is only one green plane. What any API or hardware gives us is irrelevant; for all we know it could give us the samples in 8 planes of 2 bits per sample. That wouldn't result in us wanting to store 8 planes.
There are many ways to subtract the planes; if this was attempted, then it would be a kind of storing the difference for each pixel based on its prediction from the other plane. For this, all pixels of the first plane, and also the red and blue planes, might be usable to predict the 2nd green one. The same can be done with the other planes too. This alone is probably weeks or months of full-time work if done and tested properly.
Try storing the green in diagonal scans. If that makes no sense, take a paper with an RGGB raster on it and
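The diagonal-scan idea can be sketched as follows. This helper is hypothetical (not from the spec or FFmpeg): it gathers the quincunx green samples of an RGGB mosaic (greens sit where x + y is odd) along anti-diagonals, so that spatially adjacent greens become neighbors in scan order:

```c
#include <assert.h>

/* Gather the green samples of a w x h RGGB mosaic into anti-diagonal
 * scan lines. Returns the number of samples written. Illustrative only. */
static int gather_green_diagonals(int w, int h, const int *bayer, int *out)
{
    int n = 0;
    for (int d = 1; d < w + h - 1; d += 2) {   /* odd anti-diagonals hold green */
        int x0 = d >= h ? d - h + 1 : 0;       /* clamp so y = d - x stays < h */
        for (int x = x0; x <= d && x < w; x++)
            out[n++] = bayer[(d - x) * w + x];
    }
    return n;
}
```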
This is something I cannot afford for the moment. I suggest that we find something intermediate, doing some tests in order to find a good ratio between time spent and compression. Wanting the best compression now will just produce an empty result; I think it is better to have something "good enough" than nothing. IMO it is better to demonstrate that we already achieve decent compression, rather than nothing, when looking for funding to improve compression. With my sample files, the stream size is divided by 2 to 3, which is already very useful compared to just nothing, as it is now.
Got it. Besides more code (you said it is fine, so OK; in that case the width changes every line, not too difficult to handle but more complex code), and as we read memory in a "random" (non-sequential, e.g. 0,3 then 1,2 then 2,1 then 3,0) order, I am afraid of some memory speed impact (maybe low compared to a potential compression gain). Can we have this deal: I test (crappy but working FFmpeg code for encoder and decoder, same framemd5 for input and encoder-decoder output) the compression ratio with a 45° rotation for green, on the sample files I have, and then based on the results we choose between the different options listed in this discussion?
If I find enough time for this, I am planning to pass our implementation to Carl Eugen next week in London. We already discussed it briefly at the Video Dev Days one month ago in Paris. (I am afraid, my health is too weak for the toxic and unwelcoming environment inside FFmpeg.) |
This can be reduced by applying techniques derived from so-called artificial intelligence (to me it's one of the rare cases where it makes sense to use it). I guess I mentioned that en passant at No Time to Wait last year in Vienna. It's exactly what we did when we implemented Bayer on our side, but we did only what was actually useful in production. I do not have the money to pay for fundamental research… To do it in a systematic way could possibly be an interesting project for the next G**gle Summer of Code.
I think you overestimate the abilities and interest of students.
I don't think the plane prediction is tied to Bayer. This is a valuable and time-consuming thing to work on, but I don't think it's blocking for this here. We can continue without plane-to-plane prediction and maybe add a placeholder field in the header to select other predictions.
It should be possible to limit the impact by reordering the samples into blocks to maximize cache use, before encoding / after decoding the lines. But that's something for the future; it's too early to optimize this, I think.
I think design choices for "IETF standards/drafts" are not made by such deals ;)
Possibly. (I was just thinking that AI is currently hyped.)
Indeed. This is the same as what we do when digitising Dufaycolor (in this case by rotating the sensor by ca. 23°, in order to obtain horizontally and vertically aligned «pixels»). Of course, it also reduces aliasing.
Work in progress; I would like to be sure there is agreement about the path to take before adapting more things (quant_table_set_index_count...)
Since this is a new `colorspace_type` value, we can add it to FFV1 v0-v3 without breaking anything (old decoders would just reject the file with a "colorspace not supported" message). It depends on #89; only the last commit is part of this PR.