Possible RAVU improvement ideas #10

Closed
haasn opened this issue Aug 3, 2017 · 15 comments

Comments

@haasn
Collaborator

haasn commented Aug 3, 2017

To make sure they don't get lost to time:

  • Add naive anti-ringing. You could do something like bilinear or bicubic sampling along the “edge direction” in a line (or a small bundle of offset lines), gathering a few samples, and clamping your output pixel to this value range. Alternatively, you could try adapting the “in-place” anti-ringing filter from my antiring.hook and adding it as a separate post-RAVU pass, slightly adjusted to account for the fact that you introduce an offset. (A rough sketch of the clamping step follows after this list.)

  • Train kernels differently? Right now, you said you use a linear regression to combine multiple scaler kernels - but how do you actually choose the kernels to combine to begin with? Are they jinc functions? Or am I misunderstanding how the algorithm works?
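
For illustration, a minimal numpy sketch of the clamping step (ignoring RAVU's half-pixel offset and the directional line sampling described above; the 3x3 source neighborhood and the name antiring_clamp are just my stand-ins):

import numpy as np

def antiring_clamp(upscaled, source, scale=2):
    # Clamp each upscaled pixel into the value range of the nearby source
    # pixels (here a 3x3 neighborhood around the co-located source pixel),
    # so the prescaler cannot overshoot the local range.
    out = np.empty_like(upscaled)
    H, W = source.shape
    for y in range(upscaled.shape[0]):
        for x in range(upscaled.shape[1]):
            sy, sx = min(y // scale, H - 1), min(x // scale, W - 1)
            block = source[max(sy - 1, 0):sy + 2, max(sx - 1, 0):sx + 2]
            out[y, x] = np.clip(upscaled[y, x], block.min(), block.max())
    return out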

@bjin
Owner

bjin commented Aug 3, 2017

Add naive anti-ringing.

If naive anti-ringing is not done in-place, it can always be applied independently of the prescaler: save the texture before prescaling, then filter after prescaling with the offset taken into account. So I'm not in favor of integrating naive anti-ringing unless it can be proven to have a reasonably small quality loss (in PSNR/SSIM, for example). But I will at least try some in-place anti-ringing method, probably with the edge direction taken into account. I'm currently working on smoothtest2, which uses an even more aggressive smoothing function, and there are a few other ideas I would also like to try to reduce ringing more fundamentally.

Right now, you said you use a linear regression to combine multiple scaler kernels - but how do you actually choose the kernels to combine to begin with? Are they jinc functions? Or am I misunderstanding how the algorithm works?

That's not how RAVU works. No mathematical functions like jinc are used during training, just plain linear regression. Once the key is obtained from the local gradient, the sample is added to the corresponding bucket. For each bucket (associated with a particular key, i.e. an angle+strength+coherence combination), the least-squares method is used to solve the linear regression problem, which minimizes the mean squared error (and thus maximizes PSNR).
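
In pseudo-Python, that step looks roughly like this (a sketch only: the gradient analysis and quantization thresholds below are illustrative, not the exact ones used by the trainer):

import numpy as np

QUANT_ANGLE, QUANT_STRENGTH, QUANT_COHERENCE = 24, 9, 3

def key_from_patch(patch):
    # Derive an (angle, strength, coherence) key from the local gradient of a
    # 2D patch via eigenanalysis of the 2x2 gradient covariance.
    gy, gx = np.gradient(patch)
    a, b, d = np.sum(gx * gx), np.sum(gx * gy), np.sum(gy * gy)
    eigvals, eigvecs = np.linalg.eigh(np.array([[a, b], [b, d]]))
    l1, l2 = max(eigvals[1], 0.0), max(eigvals[0], 0.0)   # l1 >= l2 >= 0
    angle = np.arctan2(eigvecs[1, 1], eigvecs[0, 1]) % np.pi
    strength = np.sqrt(l1)
    coherence = (np.sqrt(l1) - np.sqrt(l2)) / (np.sqrt(l1) + np.sqrt(l2) + 1e-9)
    # illustrative quantization only
    return (int(angle / np.pi * QUANT_ANGLE) % QUANT_ANGLE,
            min(int(strength * QUANT_STRENGTH), QUANT_STRENGTH - 1),
            min(int(coherence * QUANT_COHERENCE), QUANT_COHERENCE - 1))

# Each (patch, target pixel) sample is then appended to
# buckets[key_from_patch(patch)], and every bucket gets its own
# least-squares solve.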

I mentioned combining different existing kernel functions only to demonstrate an idea for making RAVU work with arbitrary scaling factors. But I later realized that the model would only be trained to "reverse" a certain downscaling kernel, so please ignore it.

@haasn
Collaborator Author

haasn commented Aug 3, 2017

It doesn't make sense to use a linear regression for a convolution, surely? Why not fit a cubic function or something instead?

@bjin
Owner

bjin commented Aug 3, 2017

It doesn't make sense to use a linear regression for a convolution, surely? Why not fit a cubic function or something instead?

Do you mean a cubic function as the model (a weighted sum of x^3 over all samples x), or as the cost function for training?

I assume you mean that we need to use an alternative cost function, like a cubic function. Then we would have to use an algorithm like gradient descent to solve it. Doable, but it requires more effort and a much, much longer training time.

@haasn
Collaborator Author

haasn commented Aug 3, 2017

I think I'm completely lost. What exactly is linear? What do the weights look like? My point is that I'd expect the trained weights themselves to look like (appropriately shaped) jinc functions, but your description makes it seem like they're linear functions or something.

@bjin
Owner

bjin commented Aug 3, 2017

Okay, I will try to explain. Suppose we are interpolating from four (known) points p1,p2,p3,p4 with a convolution kernel w1,w2,w3,w4; the final result is p_predict = p1*w1 + p2*w2 + p3*w3 + p4*w4. There is also a p_real, which is the actual value from the training image. Now we collect a huge number of (p1,p2,p3,p4) and p_real pairs (samples). We want to solve a linear regression problem: find a solution (w1_min,w2_min,w3_min,w4_min) such that, over all samples, the sum of (p_real - p_predict)^2 is minimized.

The final answer (w1_min,w2_min,w3_min,w4_min) that minimizes the cost function also maximizes the overall PSNR. But those four values (the weight function) are not linear; it's the model (the convolution kernel, i.e. the way p_predict is calculated from the samples) that is linear.
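
For example, with made-up data (the names and numbers here are just for illustration):

import numpy as np

rng = np.random.default_rng(0)
P = rng.random((10000, 4))                   # rows of (p1, p2, p3, p4) samples
true_w = np.array([-0.1, 0.6, 0.6, -0.1])    # pretend "real" kernel
p_real = P @ true_w + 0.01 * rng.standard_normal(len(P))

# least squares: find (w1_min, ..., w4_min) minimizing sum (p_real - p_predict)^2
w_min, *_ = np.linalg.lstsq(P, p_real, rcond=None)
print(w_min)                                 # close to true_w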

@haasn
Collaborator Author

haasn commented Aug 3, 2017

Oh, I see now. Hmm, I still wonder if you could find a way to visualize the kernels for different keys. It would almost surely help with optimizing RAVU if we could get a visual understanding of what's going on.

@bjin
Owner

bjin commented Aug 23, 2017

I still wonder if you could find a way to visualize the kernels for different keys.

Here is the visualization of all ravu-r3 and ravu-r4 convolution kernels grouped by (angle, strength, coherence) keys.

Red indicates negative weights, blue indicates positive weights. Both are normalized and sigmoidized.

There are 24 columns, from angle=0 to angle=23, indicating the direction of the main gradient contributor.

There are 27 rows, divided into three row groups, from coherence=0 to coherence=2, indicating whether the strength of the second gradient contributor is similar to that of the main gradient contributor.

Each row group is divided into 9 rows, from strength=0 to strength=8, indicating the strength of the main gradient contributor.

//!DESC RAVU r3 visualizer
//!HOOK MAIN
//!BIND HOOKED
//!BIND ravu_lut3
//!WIDTH 144
//!HEIGHT 162

const int radius = 3;
const int quant_angle = 24;
const int quant_strength = 9;
const int quant_coherence = 3;

const int n = radius * 2;
const int width = quant_angle * n; // 144
const int height = quant_strength * quant_coherence * n; // 162

const vec4 red = vec4(1.0, 0.0, 0.0, 0.0);
const vec4 blue = vec4(0.0, 0.0, 1.0, 0.0);
const vec4 white = vec4(1.0, 1.0, 1.0, 0.0);

vec4 hook() {
    ivec2 pos = ivec2(floor(HOOKED_pos * vec2(float(width), float(height))));
    // decode which kernel this output pixel belongs to:
    // column selects angle, row selects (coherence, strength)
    int angle = pos.x / n;
    int coherence = pos.y / n / quant_strength;
    int strength = pos.y / n % quant_strength;
    // position inside the n*n kernel, flattened to index the packed LUT
    int id = (pos.x % n) * n + (pos.y % n);
    float w = texelFetch(ravu_lut3, ivec2(id / 4, (angle * quant_strength + strength) * quant_coherence + coherence), 0)[id % 4];
    w *= n * n;           // normalize: scale by the number of taps
    w = w / (1 + abs(w)); // sigmoidize into (-1, 1)
    if (w < 0) {
        w = -w;
        return mix(white, red, vec4(w));
    }
    return mix(white, blue, vec4(w));
}

[image: ravu-r3-vis]

//!DESC RAVU r4 visualizer
//!HOOK MAIN
//!BIND HOOKED
//!BIND ravu_lut4
//!WIDTH 192
//!HEIGHT 216

const int radius = 4;
const int quant_angle = 24;
const int quant_strength = 9;
const int quant_coherence = 3;

const int n = radius * 2;
const int width = quant_angle * n; // 192
const int height = quant_strength * quant_coherence * n; // 216

const vec4 red = vec4(1.0, 0.0, 0.0, 0.0);
const vec4 blue = vec4(0.0, 0.0, 1.0, 0.0);
const vec4 white = vec4(1.0, 1.0, 1.0, 0.0);

vec4 hook() {
    ivec2 pos = ivec2(floor(HOOKED_pos * vec2(float(width), float(height))));
    int angle = pos.x / n;
    int coherence = pos.y / n / quant_strength;
    int strength = pos.y / n % quant_strength;
    int id = (pos.x % n) * n + (pos.y % n);
    float w = texelFetch(ravu_lut4, ivec2(id / 4, (angle * quant_strength + strength) * quant_coherence + coherence), 0)[id % 4];
    w *= n * n;
    w = w / (1 + abs(w));
    if (w < 0) {
        w = -w;
        return mix(white, red, vec4(w));
    }
    return mix(white, blue, vec4(w));
}

[image: ravu-r4-vis]

EDIT: fix ravu-r3-vis.png

@haasn
Collaborator Author

haasn commented Aug 24, 2017

As I thought, we could almost certainly make use of mirror symmetry in this file, and probably also rotational symmetry within each “section”. So we could reduce the width to one fourth of what it is currently.

@haasn
Collaborator Author

haasn commented Aug 24, 2017

Also, what does it change if you apply a linear factor to the negative weights? (e.g. 0.5 * w)

@bjin
Owner

bjin commented Aug 24, 2017

So we could reduce the width to one fourth of what it is currently.

Yes, the training time and LUT size could be reduced by 75%. However, it won't make rendering significantly faster, since the number of LUT calls is not reduced. Actually, it would become slower, since we would need to rotate/flip the weight matrix after it is fetched from the LUT.

Also, what does it change if you apply a linear factor to the negative weights? (e.g. 0.5 * w)

It will make the result noticeably blurrier (assuming all weights are re-normalized after that). If we want to clamp the negative weights, the proper way would be to regularize them through the cost function, for example using MSE(predict_i - actual_i) + alpha*MSE(negative_weights) instead of MSE(predict_i - actual_i) alone. But then we would have to use a general optimization method like gradient descent.
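
Roughly like this (my own sketch, plain gradient descent on the penalized objective, not the actual trainer):

import numpy as np

def train_bucket(X, y, alpha=0.1, lr=1e-2, steps=20000):
    # X: (num_samples, n*n) flattened patches, y: (num_samples,) target pixels.
    # Minimizes MSE(X @ w - y) + alpha * MSE(negative part of w).
    w = np.linalg.lstsq(X, y, rcond=None)[0]           # start from the plain LS solution
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)          # gradient of the data term
        grad += 2 * alpha * np.minimum(w, 0) / w.size  # gradient of the penalty term
        w -= lr * grad
    return w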

@haasn
Collaborator Author

haasn commented Aug 24, 2017

Yes, the training time and LUT size could be reduced by 75%.

Well, you could still train on the “reduced” image set and then rotate it when generating the weight texture. That way you get more samples and fewer equations to train, which would in theory lead to a better result with fewer images, since the “identical” kernels will all share weights.

@bjin
Owner

bjin commented Aug 24, 2017

Well, you could still train on the “reduced” image set and then rotate it when generating the weight texture.

I did that already. With rotation and flipping we have 7 times more samples.
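
For reference, the augmentation is just the 8 dihedral orientations of each sample (a quick sketch; the target/offset geometry has to be transformed consistently, which is omitted here):

import numpy as np

def dihedral_variants(patch):
    # All 8 orientations of a square patch: 4 rotations, each optionally flipped.
    variants = []
    for k in range(4):
        r = np.rot90(patch, k)
        variants.append(r)
        variants.append(np.fliplr(r))
    return variants  # the original plus 7 extra samples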

@bjin
Owner

bjin commented Sep 25, 2017

Closing, since anti-ringing is solved to some degree by smoothtest.

@bjin bjin closed this as completed Sep 25, 2017
@haasn
Collaborator Author

haasn commented Sep 5, 2019

However, it won't make rendering significantly faster since the number of LUT calls is not reduced.

Rereading this and related issues to understand RAVU again, this comment stood out to me. Even if the number of LUT calls doesn't change, making the texture smaller improves cache locality, which can make those LUT calls significantly faster.

Also, we don't need to extend this with complicated flipping logic: if we can arrange for the texture to be laid out with perfect horizontal/vertical symmetry, we can simply use the "mirrored repeat" sampling mode, which reflects/mirrors any out-of-bounds texture read.

@bjin
Owner

bjin commented Sep 6, 2019

Currently, the weight texture isn't arranged exactly the way the visualization pictures above show. Instead, one dimension is the kernel ID (with size 24*9*3), and the other dimension is the location within the kernel (with size 4*radius*radius). Simple "mirrored repeat" probably won't work.

However, the weight texture was indeed reduced to half of its size some time ago, utilizing the symmetry. (Yes, the visualization shaders above won't work with the current weight texture.) At the time, though, this change was meant to reduce the shader file size.
