support X-Trans sensor raw files #588

Merged
merged 48 commits into from Jul 14, 2014

Conversation

Projects
None yet
7 participants
@dtorop
Contributor

dtorop commented Jun 9, 2014

X-Trans specific changes:

  • use RawSpeed to read RAF files
  • downsampling for fast display
  • interpolation: bilinear, VNG, and modified AHD (aka Markesteijn), all adapted from dcraw; these operate on floats and use OpenMP
  • highlights, hotpixels, invert, rawdenoise, temperature handle X-Trans
  • disable cacorrect for X-Trans
  • white balance presets and basecurve for X100S
  • cameras.xml and adobe_coeff.c entries X-Series cameras
  • documentation updates
  • dngmeta.sh handles X-Trans

None of these changes should alter performance for users of Bayer sensors. The changes have been tested on Bayer and X-Trans matrix raws as well LDR files.

The code in this pull request doesn't:

  • handle "dynamic range" feature on X-Series cameras
  • implement SSE/OpenCL optimizations
  • implement cacorrect for X-Trans sensors
  • include white balance data and basecurves for X-Series cameras besides X100S

There is generally one commit for each major change. For the VNG/Markesteijn demosaic code, one commit imports/wraps code from dcraw, then another modifies this code to use floats/OpenMP and otherwise make it more specific to darktable.

This pull request closes #9951 and fixes #9388.

If it becomes necessary to update to the latest LibRaw, see separate branch libraw-0.16 (https://github.com/dtorop/darktable/tree/libraw-0.16). This pull request for the "xtrans2" branch keeps the extant LibRaw 0.14.7 (with the addition of a small patch to not break the temperature iop for the X-Pro1) so as not to add another LibRaw dependency.

src/common/image.h
@@ -204,6 +207,9 @@ dt_image_flipped_filter(const dt_image_t *img)
const int orient = dt_image_orientation(img);
uint32_t filters = img->filters;
+ if (filters == 9)
+ // x-trans 6x6 filter is rotated on raw image load

This comment has been minimized.

@LebedevRI

LebedevRI Jun 9, 2014

Member

Minor issue: comment looks ugly.
Either add braces for if, or move comment to the end of return, or something else.

@LebedevRI

LebedevRI Jun 9, 2014

Member

Minor issue: comment looks ugly.
Either add braces for if, or move comment to the end of return, or something else.

src/common/imageio_rawspeed.cc
@@ -163,6 +163,26 @@ dt_imageio_open_rawspeed(
img->flags &= ~DT_IMAGE_LDR;
img->flags |= DT_IMAGE_RAW;
if(r->getDataType() == TYPE_FLOAT32) img->flags |= DT_IMAGE_HDR;
+ // special handling for x-trans sensors
+ if ((r->cfa.size.x == 6) && (r->cfa.size.y == 6)) {

This comment has been minimized.

@LebedevRI

LebedevRI Jun 9, 2014

Member

Braces should be on new line

@LebedevRI

LebedevRI Jun 9, 2014

Member

Braces should be on new line

src/common/mipmap_cache.c
- out, (const uint16_t *)buf.buf,
- &roi_out, &roi_in, roi_out.width, roi_in.width,
- dt_image_flipped_filter(image));
+ if(image->filters!=9) // Bayer

This comment has been minimized.

@LebedevRI

LebedevRI Jun 9, 2014

Member

While i agree that original code had no braces, but new code with all it's if's would probably look better with braces (at least outermost - if(image->filters!=9) {} else {})

@LebedevRI

LebedevRI Jun 9, 2014

Member

While i agree that original code had no braces, but new code with all it's if's would probably look better with braces (at least outermost - if(image->filters!=9) {} else {})

src/develop/imageop.c
+ const int samples = round(px_footprint/3);
+
+#ifdef _OPENMP
+ #pragma omp parallel for default(none) shared(xtrans, out) schedule(static)

This comment has been minimized.

@LebedevRI

LebedevRI Jun 9, 2014

Member

Idea: make xtrans really const and remove it from shared()

@LebedevRI

LebedevRI Jun 9, 2014

Member

Idea: make xtrans really const and remove it from shared()

src/develop/imageop.c
+ float num[3] = {0.0f};
+
+ float fx = (x + roi_out->x)*px_footprint;
+ int px = (int)fx;

This comment has been minimized.

@LebedevRI

LebedevRI Jun 9, 2014

Member

Hm, maybe ceil/floor ?

@LebedevRI

LebedevRI Jun 9, 2014

Member

Hm, maybe ceil/floor ?

src/iop/demosaic.c
+/*
+ Frank Markesteijn's algorithm for Fuji X-Trans sensors
+ */
+void

This comment has been minimized.

@LebedevRI

LebedevRI Jun 9, 2014

Member

This function(and probably others, didn't check yet) is not supposed to be used outside of this file, so it should be marked as static.

@LebedevRI

LebedevRI Jun 9, 2014

Member

This function(and probably others, didn't check yet) is not supposed to be used outside of this file, so it should be marked as static.

src/iop/demosaic.c
+ // algorithm avoid discontinuities at image edges.
+#define TRANSLATE(n,size) ((n<6)?(6-n):((n>=size-6)?(2*size-n-20):(n-6)))
+#ifdef _OPENMP
+ #pragma omp parallel for default(none) shared(in, image, xtrans, roi_in) schedule(static)

This comment has been minimized.

@LebedevRI

LebedevRI Jun 9, 2014

Member

I believe in,xtrans,roi_in can be made really const and be removed from shared()

@LebedevRI

LebedevRI Jun 9, 2014

Member

I believe in,xtrans,roi_in can be made really const and be removed from shared()

src/iop/demosaic.c
+ roi_out->x = 0;
+ roi_out->y = 0;
+
+ for (int row=0; row < roi_out->height; row++)

This comment has been minimized.

@LebedevRI

LebedevRI Jun 9, 2014

Member

Seems like good place for OpenMP

@LebedevRI

LebedevRI Jun 9, 2014

Member

Seems like good place for OpenMP

src/iop/demosaic.c
@@ -764,9 +1404,11 @@ process (struct dt_iop_module_t *self, dt_dev_pixelpipe_iop_t *piece, void *i, v
}
else
{
- // sample half-size raw
+ // sample half (or 1/3)-size raw

This comment has been minimized.

@LebedevRI

LebedevRI Jun 9, 2014

Member

State that 1/3 is for X-Trans, e.g:
sample half-size raw (Bayer), or 1/3 (X-Trans)

@LebedevRI

LebedevRI Jun 9, 2014

Member

State that 1/3 is for X-Trans, e.g:
sample half-size raw (Bayer), or 1/3 (X-Trans)

src/iop/demosaic.c
@@ -82,10 +84,18 @@ typedef struct dt_iop_demosaic_data_t
}
dt_iop_demosaic_data_t;
+#define DEMOSAIC_XTRANS 1024 // maks for non-Bayer demosaic ops

This comment has been minimized.

@LebedevRI

LebedevRI Jun 11, 2014

Member

s/maks/mask/

@LebedevRI

LebedevRI Jun 11, 2014

Member

s/maks/mask/

src/develop/imageop.c
+ */
+void
+dt_iop_clip_and_zoom_demosaic_third_size_xtrans(
+ float *const out,

This comment has been minimized.

@LebedevRI

LebedevRI Jun 11, 2014

Member

Much better, but i should have stated that clearer in the first place: openmp shared() should contain only buffers that you write. Buffers that are only being read should be const.

In this case, you write to out in the end of the function:

float *outc = out + 4*(out_stride*y);
...
outc[0] = sum[0] / 65535.0f / num[0];
@LebedevRI

LebedevRI Jun 11, 2014

Member

Much better, but i should have stated that clearer in the first place: openmp shared() should contain only buffers that you write. Buffers that are only being read should be const.

In this case, you write to out in the end of the function:

float *outc = out + 4*(out_stride*y);
...
outc[0] = sum[0] / 65535.0f / num[0];

This comment has been minimized.

@dtorop

dtorop Jun 11, 2014

Contributor

Yes, I'm just sorting through confusion on this! When I declare out shared, the compiler complains:

imageop.c:2316:48: error: ‘out’ is predetermined ‘shared’ for ‘shared’
   #pragma omp parallel for default(none) shared(out) schedule(static)
                                                ^

Here out is declared float *const out. Does OpenMP only require the pointer itself to be declared shared() (if it is non-const), and not enforce anything about sharing of the pointed-to data?

I was surprised that, when I made a bunch of pointers float *const, the compiler mandated dropping them from the omp parallel clauses. It doesn't seem sensible to declare them float * and put them back as shared() if the pointer itself doesn't change? Any experience on this is most welcome!

@dtorop

dtorop Jun 11, 2014

Contributor

Yes, I'm just sorting through confusion on this! When I declare out shared, the compiler complains:

imageop.c:2316:48: error: ‘out’ is predetermined ‘shared’ for ‘shared’
   #pragma omp parallel for default(none) shared(out) schedule(static)
                                                ^

Here out is declared float *const out. Does OpenMP only require the pointer itself to be declared shared() (if it is non-const), and not enforce anything about sharing of the pointed-to data?

I was surprised that, when I made a bunch of pointers float *const, the compiler mandated dropping them from the omp parallel clauses. It doesn't seem sensible to declare them float * and put them back as shared() if the pointer itself doesn't change? Any experience on this is most welcome!

This comment has been minimized.

@dtorop

dtorop Jun 14, 2014

Contributor

Am still skeptical about writing shared (out) which seems to share the pointer but not the underlying data. Not to mention that as out is declared float *const out, OpenMP doesn't allow sharing of it anyhow without dropping the const.

That said, declaring out as float *out and shared(out) is in keeping with the rest of the dt code. If this is questionable, better that I take it up overall sometime later. There seems to be no performance difference with shared() or without. As I'm new to OpenMP, I'm happy to defer here, rather than being (except in this note) pedantic.

(Does sharing out create a critical section around writes to the memory to which out points? Not that that's necessary in this case, as each thread writes to its own section of memory.)

@dtorop

dtorop Jun 14, 2014

Contributor

Am still skeptical about writing shared (out) which seems to share the pointer but not the underlying data. Not to mention that as out is declared float *const out, OpenMP doesn't allow sharing of it anyhow without dropping the const.

That said, declaring out as float *out and shared(out) is in keeping with the rest of the dt code. If this is questionable, better that I take it up overall sometime later. There seems to be no performance difference with shared() or without. As I'm new to OpenMP, I'm happy to defer here, rather than being (except in this note) pedantic.

(Does sharing out create a critical section around writes to the memory to which out points? Not that that's necessary in this case, as each thread writes to its own section of memory.)

src/develop/imageop.c
+ for (int ii=0; ii < 3; ++ii)
+ for (int jj=0; jj < 3; ++jj)
+ {
+ const uint8_t c = xtrans[(j+jj+roi_in->y)%6][(i+ii+roi_in->x)%6];

This comment has been minimized.

@LebedevRI

LebedevRI Jun 11, 2014

Member

I would recommend defining function something like this:

inline uint8_t
FCxtrans(const uint8_t (*const xtrans)[6]), size_t x, size_t y)
{
  return xtrans[x % 6][y % 6];
}

And use it everywhere (like we use FC already):

const uint8_t c = FCxtrans(xtrans, j+jj+roi_in->y, i+ii+roi_in->x);
@LebedevRI

LebedevRI Jun 11, 2014

Member

I would recommend defining function something like this:

inline uint8_t
FCxtrans(const uint8_t (*const xtrans)[6]), size_t x, size_t y)
{
  return xtrans[x % 6][y % 6];
}

And use it everywhere (like we use FC already):

const uint8_t c = FCxtrans(xtrans, j+jj+roi_in->y, i+ii+roi_in->x);
src/iop/demosaic.c
+ for (int pass=0; pass < passes; pass++)
+ {
+ if (pass == 1)
+ memcpy (rgb+=4, buffer, 4*sizeof *rgb);

This comment has been minimized.

@LebedevRI

LebedevRI Jun 11, 2014

Member

memcpy (rgb+=4, buffer, 4*sizeof *rgb);
/me didn't parse that

@LebedevRI

LebedevRI Jun 11, 2014

Member

memcpy (rgb+=4, buffer, 4*sizeof *rgb);
/me didn't parse that

This comment has been minimized.

@dtorop

dtorop Jun 11, 2014

Contributor

This is taken straight from dcraw and the fiendishly complicated mind of Dave Coffin (or the mysterious Frank Markesteijn?). Is it good to elucidate the succinct mysteries of dcraw via comments, or even to make this into a multi-line statement that reads more easily? It could become:

// bounce rgb data to second set of buffers on second pass
memcpy(rgb+4, rgb, 4*sizeof *rgb);
// work in this second set of buffers
rgb += 4;

But there's something amazing about the original dcraw code.

@dtorop

dtorop Jun 11, 2014

Contributor

This is taken straight from dcraw and the fiendishly complicated mind of Dave Coffin (or the mysterious Frank Markesteijn?). Is it good to elucidate the succinct mysteries of dcraw via comments, or even to make this into a multi-line statement that reads more easily? It could become:

// bounce rgb data to second set of buffers on second pass
memcpy(rgb+4, rgb, 4*sizeof *rgb);
// work in this second set of buffers
rgb += 4;

But there's something amazing about the original dcraw code.

This comment has been minimized.

@xsdg

xsdg Jun 11, 2014

Member

A comment from the peanut gallery: I would definitely avoid refactoring
code like this. For one, it's an easy way for someone to inadvertently
add bugs to the code. And beyond that, by essentially making a fork of
the original, it becomes unnecessarily hard to import newer versions of
the code when the upstream version changes down the line.

Keep in mind that future maintainers may not have as good of an idea of
what this code does, and where in the dcraw codebase it came from, as
you do.

--xsdg

@xsdg

xsdg Jun 11, 2014

Member

A comment from the peanut gallery: I would definitely avoid refactoring
code like this. For one, it's an easy way for someone to inadvertently
add bugs to the code. And beyond that, by essentially making a fork of
the original, it becomes unnecessarily hard to import newer versions of
the code when the upstream version changes down the line.

Keep in mind that future maintainers may not have as good of an idea of
what this code does, and where in the dcraw codebase it came from, as
you do.

--xsdg

This comment has been minimized.

@dtorop

dtorop Jun 11, 2014

Contributor

This is a good point. A caveat is that the dcraw code is already somewhat refactored. The original dcraw only works on 16-bit ints, so there are some changes to make it work with floats and OpenMP (see SHA: 0a87cec and SHA: 11c3ba9). There are other changes to make it in accord with dt coding style, and a big patch to change colorspace of homogeneity operation (SHA: ddee52a). Also a fix for cleaner edges (SHA: c9f46fc). Hence it's already becoming a fork.

I certainly agree that trying to clarify complicated code is a great way to introduce bugs.

One reason for importing that dcraw code just wrapped, then making commits to alter it, was in hopes that if dcraw changed, it would become clear how to keep this code in sync. As another case, dt's wavelet denoise code is also borrowed from dcraw, and has gone through quite a few permutations by now.

@dtorop

dtorop Jun 11, 2014

Contributor

This is a good point. A caveat is that the dcraw code is already somewhat refactored. The original dcraw only works on 16-bit ints, so there are some changes to make it work with floats and OpenMP (see SHA: 0a87cec and SHA: 11c3ba9). There are other changes to make it in accord with dt coding style, and a big patch to change colorspace of homogeneity operation (SHA: ddee52a). Also a fix for cleaner edges (SHA: c9f46fc). Hence it's already becoming a fork.

I certainly agree that trying to clarify complicated code is a great way to introduce bugs.

One reason for importing that dcraw code just wrapped, then making commits to alter it, was in hopes that if dcraw changed, it would become clear how to keep this code in sync. As another case, dt's wavelet denoise code is also borrowed from dcraw, and has gone through quite a few permutations by now.

src/iop/rawdenoise.c
+static void wavelet_denoise_xtrans(const float *const in, float *const out, const dt_iop_roi_t *const roi, float threshold, uint8_t (*const xtrans)[6])
+{
+ static const float noise[] =
+ { 0.8002,0.2735,0.1202,0.0585,0.0291,0.0152,0.0080,0.0044 };

This comment has been minimized.

@LebedevRI

LebedevRI Jun 11, 2014

Member

(Might be wrong) shouldn't this coeffs be different for xtrans?

@LebedevRI

LebedevRI Jun 11, 2014

Member

(Might be wrong) shouldn't this coeffs be different for xtrans?

This comment has been minimized.

@dtorop

dtorop Jun 11, 2014

Contributor

Good question. The wavelet denoise algorithm is originally from dcraw, which keeps the same coeffs for Bayer or X-Trans. The coeffs are used per channel in wavelet/hat-transformed space, as factored by threshold. The worst I can see here is that threshold parameter will have slightly different meaning in the x-trans case. Hard to see this from observation so far. Will look into this more.

@dtorop

dtorop Jun 11, 2014

Contributor

Good question. The wavelet denoise algorithm is originally from dcraw, which keeps the same coeffs for Bayer or X-Trans. The coeffs are used per channel in wavelet/hat-transformed space, as factored by threshold. The worst I can see here is that threshold parameter will have slightly different meaning in the x-trans case. Hard to see this from observation so far. Will look into this more.

@dtorop dtorop referenced this pull request in klauspost/rawspeed Jun 15, 2014

Merged

Fujifilm X-Series updates #15

src/iop/demosaic.c
+ }
+ float (*pix)[4] = image + row*width + col;
+ short *hex = allhex[row%3][col%3][0];
+ if (max==0)

This comment has been minimized.

@LebedevRI

LebedevRI Jun 17, 2014

Member

typeof max = float
Generally, floats can not be compared using ==
(And also not 0, but 0.0f)

@LebedevRI

LebedevRI Jun 17, 2014

Member

typeof max = float
Generally, floats can not be compared using ==
(And also not 0, but 0.0f)

This comment has been minimized.

@dtorop

dtorop Jun 17, 2014

Contributor

Thank you for the sharp eye on this! Checking if max==0.0f could be OK as it's really checking if 0.0f has been assigned to max somewhere else in the loop (as max is overloaded, serving both as a max vlaue and as a flag that it needs to be calculated)? But this whole section is awkward, due to my less-than-ideal conversion of dcraw's code to handle floats rather than ints. I'll work up an overall clean-up/clarification instead...

@dtorop

dtorop Jun 17, 2014

Contributor

Thank you for the sharp eye on this! Checking if max==0.0f could be OK as it's really checking if 0.0f has been assigned to max somewhere else in the loop (as max is overloaded, serving both as a max vlaue and as a flag that it needs to be calculated)? But this whole section is awkward, due to my less-than-ideal conversion of dcraw's code to handle floats rather than ints. I'll work up an overall clean-up/clarification instead...

src/iop/rawdenoise.c
+ const int width = roi->width;
+ const int height = roi->height;
+ const size_t size = (size_t)width*height;
+ float *const fimg = malloc(size*4*sizeof *fimg);

This comment has been minimized.

@LebedevRI

LebedevRI Jun 18, 2014

Member

One more sizeof *fimg

@LebedevRI

LebedevRI Jun 18, 2014

Member

One more sizeof *fimg

@dtorop

This comment has been minimized.

Show comment
Hide comment
@dtorop

dtorop Jun 25, 2014

Contributor

Things should be pretty sound with this. Though a rawtherapee developer reports (https://code.google.com/p/rawtherapee/issues/detail?id=2415#c51) export of X-Trans images in darktable causing a segfault. Not something which I have observed, though. If anyone can document a segfault (and backtrace!) it would be much appreciated. Regardless, it looks like the rawtherapee people are doing some great work on X-Trans sensors, and it would be interesting to cross-pollinate more.

Note also that https://github.com/dtorop/darktable/tree/xtrans2-dev has some more changes. This is to keep the xtrans2 branch frozen for review.

Contributor

dtorop commented Jun 25, 2014

Things should be pretty sound with this. Though a rawtherapee developer reports (https://code.google.com/p/rawtherapee/issues/detail?id=2415#c51) export of X-Trans images in darktable causing a segfault. Not something which I have observed, though. If anyone can document a segfault (and backtrace!) it would be much appreciated. Regardless, it looks like the rawtherapee people are doing some great work on X-Trans sensors, and it would be interesting to cross-pollinate more.

Note also that https://github.com/dtorop/darktable/tree/xtrans2-dev has some more changes. This is to keep the xtrans2 branch frozen for review.

@dtorop

This comment has been minimized.

Show comment
Hide comment
@dtorop

dtorop Jun 30, 2014

Contributor

The tiling commit (a075735) fixes a potential segfault for small values of host_memory_limit. With luck that is the segfault reported by the RawTherapee user.

Note that changes on xtrans2-dev not currently in this PR are:

  • VNG4 for Bayer sensors
  • remove "linear" demosaic for X-Trans in GUI as it is so low quality
  • some amazing optimizations to Markesteijn demosaic thanks to Ingo Weyrich of RawTherapee
  • noise profile for Fujifilm X100S

Am keeping these out of this PR in the name of a feature freeze.

Contributor

dtorop commented Jun 30, 2014

The tiling commit (a075735) fixes a potential segfault for small values of host_memory_limit. With luck that is the segfault reported by the RawTherapee user.

Note that changes on xtrans2-dev not currently in this PR are:

  • VNG4 for Bayer sensors
  • remove "linear" demosaic for X-Trans in GUI as it is so low quality
  • some amazing optimizations to Markesteijn demosaic thanks to Ingo Weyrich of RawTherapee
  • noise profile for Fujifilm X100S

Am keeping these out of this PR in the name of a feature freeze.

src/iop/demosaic.c
+FCxtrans(size_t y, size_t x,
+ const uint8_t (*const xtrans)[6])
+{
+ return xtrans[(y+6) % 6][(x+6) % 6];

This comment has been minimized.

@houz

houz Jul 1, 2014

Member

Why the +6? That's a NOP when doing %6 afterwards.

@houz

houz Jul 1, 2014

Member

Why the +6? That's a NOP when doing %6 afterwards.

src/iop/highlights.c
+ const dt_iop_roi_t *const roi,
+ const uint8_t (*const xtrans)[6])
+{
+ return xtrans[(row+roi->y+6) % 6][(col+roi->x+6) % 6];

This comment has been minimized.

@houz

houz Jul 1, 2014

Member

Unneeded +6 again.

@houz

houz Jul 1, 2014

Member

Unneeded +6 again.

src/iop/hotpixels.c
+ const dt_iop_roi_t *const roi,
+ const uint8_t (*const xtrans)[6])
+{
+ return xtrans[(row+roi->y+6) % 6][(col+roi->x+6) % 6];

This comment has been minimized.

@houz

houz Jul 1, 2014

Member

Guess what :)

@houz

houz Jul 1, 2014

Member

Guess what :)

@LebedevRI

This comment has been minimized.

Show comment
Hide comment
@LebedevRI

LebedevRI Jul 4, 2014

Member

@upegelow are you fine with the fact that usermanual changes are bundled into this PR?

Member

LebedevRI commented Jul 4, 2014

@upegelow are you fine with the fact that usermanual changes are bundled into this PR?

@@ -204,6 +207,11 @@ dt_image_flipped_filter(const dt_image_t *img)

This comment has been minimized.

@hanatos

hanatos Jul 6, 2014

Member

most of this function will be gone after integration of #584 but the logic should only get simpler, so probably no big deal to merge either way

@hanatos

hanatos Jul 6, 2014

Member

most of this function will be gone after integration of #584 but the logic should only get simpler, so probably no big deal to merge either way

src/common/imageio_rawspeed.cc
+ for (int i=0; i < 6; ++i)
+ for (int j=0; j < 6; ++j)
+ xorig[j][i] = r->cfa.getColorAt(i,j);
+ dt_imageio_flip_buffers((char *)xtemp, (char *)xorig, sizeof(uint8_t), 6, 6, 6, 6, 6, img->orientation);

This comment has been minimized.

@hanatos

hanatos Jul 6, 2014

Member

see above, the logic here will simplify if the only place we flip things is the orientation module.

@hanatos

hanatos Jul 6, 2014

Member

see above, the logic here will simplify if the only place we flip things is the orientation module.

This comment has been minimized.

@dtorop

dtorop Jul 6, 2014

Contributor

@hanatos: Indeed, #584 looks like a great change! The X-Trans CFA gets offset here according to the distance between raw and usable data, which can vary depending on which corner becomes top-left upon rotation. Once that is not a worry, it would also be possible to modify (as with Bayer sensors) the CFA data in cameras.xml to pre-offset it depending on the cropped usable raw area.

@dtorop

dtorop Jul 6, 2014

Contributor

@hanatos: Indeed, #584 looks like a great change! The X-Trans CFA gets offset here according to the distance between raw and usable data, which can vary depending on which corner becomes top-left upon rotation. Once that is not a worry, it would also be possible to modify (as with Bayer sensors) the CFA data in cameras.xml to pre-offset it depending on the cropped usable raw area.

+ const dt_iop_roi_t *const roi,
+ const uint8_t (*const xtrans)[6])
+{
+ return xtrans[(row+roi->y) % 6][(col+roi->x) % 6];

This comment has been minimized.

@hanatos

hanatos Jul 6, 2014

Member

modulo operations used to be dead slow. not sure it's still the case on newer processors. any evidence that this becomes a bottleneck? might be faster to just use a padded 8-wide array and do & instead of %?

@hanatos

hanatos Jul 6, 2014

Member

modulo operations used to be dead slow. not sure it's still the case on newer processors. any evidence that this becomes a bottleneck? might be faster to just use a padded 8-wide array and do & instead of %?

This comment has been minimized.

@dtorop

dtorop Jul 7, 2014

Contributor

From some quick profiling on a Core i7, speed appears limited by memory access, and the modulo isn't a bottleneck. I can't figure out how to make a power of 2 sized array work with the modulo 6 CFA (either padded or by repeating each rows/cols). Is it a mathematical impossibility or am I being dense? Any advice most welcome.

There are also some fast modulo 6 divisions (based on a fast modulo 3 http://homepage.cs.uiowa.edu/~jones/bcd/mod.shtml, https://stackoverflow.com/questions/1697358/fast-modulo-3-or-division-algorithm, http://compilers.iecc.com/comparch/article/99-10-056). Knowing that row & column will not be that large also helps, and this would also make things nicer with the negative row/col values which happen in demosaicing. But is this worth the obscure code if it isn't really a bottleneck -- and the compiler and CPU architecture are already doing a good job?

Regardless, as the CFA data is loaded into dt_image_t every time an image is read, and not stored in sidecar files or such, it should be possible to implement a changed format for it later if that becomes necessary.

@dtorop

dtorop Jul 7, 2014

Contributor

From some quick profiling on a Core i7, speed appears limited by memory access, and the modulo isn't a bottleneck. I can't figure out how to make a power of 2 sized array work with the modulo 6 CFA (either padded or by repeating each rows/cols). Is it a mathematical impossibility or am I being dense? Any advice most welcome.

There are also some fast modulo 6 divisions (based on a fast modulo 3 http://homepage.cs.uiowa.edu/~jones/bcd/mod.shtml, https://stackoverflow.com/questions/1697358/fast-modulo-3-or-division-algorithm, http://compilers.iecc.com/comparch/article/99-10-056). Knowing that row & column will not be that large also helps, and this would also make things nicer with the negative row/col values which happen in demosaicing. But is this worth the obscure code if it isn't really a bottleneck -- and the compiler and CPU architecture are already doing a good job?

Regardless, as the CFA data is loaded into dt_image_t every time an image is read, and not stored in sidecar files or such, it should be possible to implement a changed format for it later if that becomes necessary.

This comment has been minimized.

@dtorop

dtorop Jul 8, 2014

Contributor

Modulos don't look as slow as they used to be. The dtorop/darktable@1a67360 commit (to experimental xtrans2-dev branch) gets rid of modulos (though doesn't cleverly pack data) and gives an approx. 2% speed-up. Hence this doesn't seem worth dealing with in this PR.

@dtorop

dtorop Jul 8, 2014

Contributor

Modulos don't look as slow as they used to be. The dtorop/darktable@1a67360 commit (to experimental xtrans2-dev branch) gets rid of modulos (though doesn't cleverly pack data) and gives an approx. 2% speed-up. Hence this doesn't seem worth dealing with in this PR.

@@ -131,6 +131,8 @@ basecurve_preset_t;
static const basecurve_preset_t basecurve_presets[] =
{
// copy paste your measured basecurve line at the top here, like so (note the exif data and the last 1):
+ // contributed by Dan Torop

This comment has been minimized.

@hanatos

hanatos Jul 6, 2014

Member

nice :)

@hanatos

hanatos Jul 6, 2014

Member

nice :)

@dtorop

This comment has been minimized.

Show comment
Hide comment
@dtorop

dtorop Jul 7, 2014

Contributor

Just made some (final?) changes to the PR. It seems important to note that X-Trans CFA alignment currently behaves slightly differently from Bayer CFAs. Once #584 happens, it would be easier to make X-Trans and Bayer CFAs not diverge in shift handling.

Also, it seems important to hide linear demosaic in the GUI now, as it really is so inferior to other options and it would be a pity to have anyone trying this code have their history depend upon its presence.

Contributor

dtorop commented Jul 7, 2014

Just made some (final?) changes to the PR. It seems important to note that X-Trans CFA alignment currently behaves slightly differently from Bayer CFAs. Once #584 happens, it would be easier to make X-Trans and Bayer CFAs not diverge in shift handling.

Also, it seems important to hide linear demosaic in the GUI now, as it really is so inferior to other options and it would be a pity to have anyone trying this code have their history depend upon its presence.

@upegelow

This comment has been minimized.

Show comment
Hide comment
@upegelow

upegelow Jul 8, 2014

Member

After short discussion on IRC I had a look at the usermanual changes: all fine!!!

Member

upegelow commented Jul 8, 2014

After short discussion on IRC I had a look at the usermanual changes: all fine!!!

dtorop added some commits Jun 7, 2014

dt_image_t & filter flipping deal w/x-trans raws
- storage for the x-trans 6x6 pattern in dt_image_t as well the Bayer
  pattern
- if dt_image_t.filters (the Bayer demosaic pattern) is 9, should use
  dt_image_t.xtrans (the X-Trans demosaic pattern) instead
- dt_image_flipped_filter() is a nop for X-Trans raw files, as the
  x-trans filter will be flipped on image load
raw denoise for x-trans
Use per-channel nearest-neighbor interpolation rather than downsizing
the channel.
invert module for x-trans
Also flip roi_in x/y for some bayer cases and don't divide by
0xffff in bayer float case (typos?).
x-trans camera data
- white balance presets for x100s
- measured fujifilm x100s basecurve
more demosaic tuning for x-trans
- for x-trans don't downsample if scale > 33.3%
- don't use Bayer-only edge-aware median for < 1/3 scaled exports
- hide Bayer-only options in GUI
- use OpenMP for bilinear demosaic
- disable green-matching in demosaic for x-trans
- s/xtrans_lin_interpolate_f/xtrans_lin_interpolate/
- use a bitmask to show difference between Bayer and xtrans
  dt_iop_demosaic_method_t's
- default to X-Trans or Bayer demosaicer depending on image type

Regarding the first change: for Bayer don't downsample if scale > 50%.
For x-trans, the fast down-sample works off of 3x3 sensor matrix (which
is 1/4 of whole sensor pattern), rather than 2x2 as with Bayer.

dtorop added some commits Jun 9, 2014

don't block X-Pro1 RAFs
temperature iop needs to read their camera white balance preset
don't disable OpenCL demosaic for x-trans twice
it is already disabled in commit_params()
const and/or *const for function params & vars
and then remove them from OpenMP shared(). Don't 0 roi_out's origin in
xtrans_markesteijn_interpolate() and xtrans_lin_interpolate() as
process() already does that. Which allows for making it const as well.
avoid potential OpenMP race
Markesteijn works on overlapping tiles, so should should process
preceding tile before moving onto the next. Haven't observed any
problems in practice w/out this "ordered" directive, but better to be
safe?
clarify buffer handling and sizeof()'s
dcraw is very succinct -- some comments and spreading out of
statements may make it easier to see what is happening. Also use
parentheses around sizeof() args in keeping with dt style.
implement FCxtrans() as xtrans parallel to FC()
Replaces sundry macros or direct uses of xtrans data. Note that
FCxtrans() is defined slightly differently in various iops, with the
simplest possible definiton per iop. For those iops where FCxtrans()
is always relative to the topleft of roi, use that as a parameter.
Similarly the necessity to +6 the row/col in cases where these may be
>= -6. Also bypass const-correctness hoops in type decl of xtrans
param in a couple cases.

Also, add one more (size_t).
share output buffers
Am suspicious of this commit. It seems to me that as the pointers do
not change, and the different threads writing to their underlying data
do not have overlapping writes, that there is no need to enforce
sharing (or synchronization). Still, have changed this in accord with
overall dt usage, dt dev request, and my inexperience with OpenMP.
Review/explanation of this from those with OpenMP knowledge is most
appreciated!

(Note that had to redeclare these pointers without *const so that they
are allowable in shared() clauses.)
identical Bayer/xtrans wavelet denoise constants ok
Note why per-level wavelet denoise thresholds are same for Bayer and
X-Trans. Am not a wavelet expert, but this seems to be the right
answer. This is why dcraw (the source of this code) uses the same
constants for Bayer and X-Trans sensors.
don't need to set dcraw magic for x-trans filter
In dcraw, filters == 9 means an X-Trans image, which carries over to
code for dt handling X-Trans images. The latest RawSpeed sets filters
to 9 for X-Trans, so no need to do it again in imageio_rawspeed.cc.
Markesteijn: clarify/clean-up min/max green calc
- Use 0.0f instead of 0 (which was an artifact from ushort-based dcraw
  code)
- use consts when possible
- add comments to clarify implementation

A hope of the comments is to give a rationale for using == to compare
two floats. The code works as written, but if it continues to look
worrisome, a workaround could be to create a separate "new_pair"
boolean rather than assigning 0.0f to max to signify a new pair.
fix awkward sizeof, add more size_t casts
- dt-ify a dcraw-style sizeof in wavelet_denoise_xtrans()
- add (size_t) a few more places

Note that wavelet_denoise() has a similar dcraw-style alloc, but
haven't touched that so as to keep these patches x-trans specific.
Note also that wavelet_denoise() uses calloc instead of malloc to
quiet a clang scan-build warning. Perhaps the (size_t) cast will also
keep clang happy? No need to calloc() the buffer as the necessary
portion is memset to 0.
properly handle tiling for X-Trans images
- account for extra image buffer allocated in Markesteijn demosaic
- align/overlap properly for Markesteijn demosaic
- X-Trans image scale factor
update "supported file formats" in manual
RAW was listed twice as a format, make first instance into RAF.
FCxtrans parameter fixes
- highlights: don't need to +6 row & col params, they will never be
  negative

- demosaic and hotpixels: row & col params can be -1 or -2; note this
  in comments as the rationale for +6 then modulo 6

- row/col params for FCxtrans() should be const int, not size_t:
  allows for negative numbers (to not break demosaic/hotpixels), is
  still large enough, matches type used by callers, and enforces const

- demosaic: use row & col instead of y & x to match FC() and other
  FCxtrans()
eliminate X-Trans linear demosaic from UI
Linear is fast, but not that good quality. The actual
xtrans_lin_interpolate() code needs to remain, as
xtrans_vng_interpolate() uses it.

Conflicts:
	doc/usermanual/darkroom/modules/basic/demosaic.xml
	src/iop/demosaic.c
note X-Trans CFA alignment is different than Bayer
It may well be better for X-Trans to conform to the Bayer example, so
note this difference. This is especially true as, once the image
rotation happens in the flip IOP, it will no longer be necessary to
determine the shift of the X-Trans CFA depending on image rotation.
PR 584: no pre-flipping needed for X-Trans CFA
Now that flip happens in IOP flip rather than loading, don't need to
flip the X-Trans matrix on load. Also, as per that PR, make constants
unsigned for special "9" filter signifying an X-Trans matrix.
@dtorop

This comment has been minimized.

Show comment
Hide comment
@dtorop

dtorop Jul 8, 2014

Contributor

Rebased to handle #584 and made one more commit as no longer need to pre-flip the X-Trans CFA.

Contributor

dtorop commented Jul 8, 2014

Rebased to handle #584 and made one more commit as no longer need to pre-flip the X-Trans CFA.

@hanatos

This comment has been minimized.

Show comment
Hide comment
@hanatos

hanatos Jul 14, 2014

Member

good stuff, i think this is good to merge, congrats and thanks for all the hard work!

i'm sure more issues will pop up after a lot of users test it on their images :)

Member

hanatos commented Jul 14, 2014

good stuff, i think this is good to merge, congrats and thanks for all the hard work!

i'm sure more issues will pop up after a lot of users test it on their images :)

hanatos added a commit that referenced this pull request Jul 14, 2014

Merge pull request #588 from dtorop/xtrans2
support X-Trans sensor raw files

@hanatos hanatos merged commit 9d68914 into darktable-org:master Jul 14, 2014

@xsdg

This comment has been minimized.

Show comment
Hide comment
@xsdg

xsdg Jul 14, 2014

Member

On 07/14/2014 09:44 AM, hanatos wrote:

Merged #588 #588.

Woohoo! :o)

--xsdg

Member

xsdg commented Jul 14, 2014

On 07/14/2014 09:44 AM, hanatos wrote:

Merged #588 #588.

Woohoo! :o)

--xsdg

@dtorop

This comment has been minimized.

Show comment
Hide comment
@dtorop

dtorop Jul 15, 2014

Contributor

Yes, totally exciting! Many thanks... And will look forward to hearing about issues. Also more fixes/improvements are lurking in xtrans2-dev, I'll work on filtering those through.

Contributor

dtorop commented Jul 15, 2014

Yes, totally exciting! Many thanks... And will look forward to hearing about issues. Also more fixes/improvements are lurking in xtrans2-dev, I'll work on filtering those through.

@kanru

This comment has been minimized.

Show comment
Hide comment

kanru commented Jul 15, 2014

woot!

@dtorop

This comment has been minimized.

Show comment
Hide comment
@dtorop

dtorop Jul 15, 2014

Contributor

Note that this also fixes #9951 though perhaps that will require manual closing in redmine?

Contributor

dtorop commented Jul 15, 2014

Note that this also fixes #9951 though perhaps that will require manual closing in redmine?

@LebedevRI

This comment has been minimized.

Show comment
Hide comment
@LebedevRI

LebedevRI Jul 15, 2014

Member

@dtorop done.
I'll try to look into merging bayer and x-trans codepaths in IOPs, and in using SSE and OpenCL there.
(except probably demosaic algos)

Member

LebedevRI commented Jul 15, 2014

@dtorop done.
I'll try to look into merging bayer and x-trans codepaths in IOPs, and in using SSE and OpenCL there.
(except probably demosaic algos)

@dtorop

This comment has been minimized.

Show comment
Hide comment
@dtorop

dtorop Jul 15, 2014

Contributor

@LebedevRI Thank you and that would be great about merging bayer/x-trans codepaths. I was conservative about keeping them separate for the sake of a cleaner patch, but there's definitely overlap & it could happen in a way not to slow the Bayer path.

There are a bunch of speed-ups & memory reductions for Markesteijn demosaic which I'll package into a separate PR. Also there is a modification to VNG demosaic which lets it act as VNG4 for Bayer images, which could likewise be a separate PR. Only other pending code on my part is profiled denoise for Fujifilm X100S, again for a separate PR.

Contributor

dtorop commented Jul 15, 2014

@LebedevRI Thank you and that would be great about merging bayer/x-trans codepaths. I was conservative about keeping them separate for the sake of a cleaner patch, but there's definitely overlap & it could happen in a way not to slow the Bayer path.

There are a bunch of speed-ups & memory reductions for Markesteijn demosaic which I'll package into a separate PR. Also there is a modification to VNG demosaic which lets it act as VNG4 for Bayer images, which could likewise be a separate PR. Only other pending code on my part is profiled denoise for Fujifilm X100S, again for a separate PR.

@dtorop dtorop deleted the dtorop:xtrans2 branch Jul 16, 2014

@LebedevRI LebedevRI modified the milestone: 1.5 Oct 16, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment