Automatic differentiation for complex numbers #29
So this is actually working entirely by accident; from Zygote's perspective a complex number is just a pair of reals. I'd be happy to help you start to add real support for this, though. The first step would be to add the relevant gradient definitions. The other thing, apart from making this work, is developing a real API for it; see here for some good thinking on that.
I just added those definitions in f2a8560, so a couple of simple things work now.

```julia
julia> gradient(x -> abs2(log(x)), 1+2im)
(0.21478062759933578 - 0.29549955871733735im,)
```

Let me know if I can help with any more support for this stuff.
Zygote is like magic! I sincerely hope it can be stabilized and documented as soon as possible, so that we can try cool stuff on it. Since I am not familiar with this brand-new framework, could you please explain a bit why it is able to treat a complex number as a tuple by accident? Is it decoding the underlying structure of an instance?
Glad you like it :) That's the gist of it. As a source-to-source AD, Zygote sees (and has to differentiate through) everything in your code, including things like the creation of structs. So if you write something like

```julia
struct Foo; x; end
foo.x
```

this effectively has a gradient definition like

```julia
@grad getfield(f::Foo, :x) = f.x, dx -> ((x = dx,),)
```

where we've by default decided that the "gradient" of the foo struct should be a named tuple. (You can of course change this, which is what I did in the complex numbers commit.)
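To make the pair-of-reals / named-tuple idea concrete, here is a toy sketch in Python (chosen so it runs without Zygote; this is not Zygote's actual machinery, and the helper name is made up): field access on a struct returns the value plus a pullback that routes the incoming adjoint into a named-tuple-like container for the whole struct.

```python
# Toy sketch of a struct-field pullback (not Zygote's real code).
class Foo:
    def __init__(self, x):
        self.x = x

def getfield_pullback(obj, name):
    # Forward value, plus a pullback mapping the field adjoint dx
    # into a "named tuple" adjoint for the whole struct.
    return getattr(obj, name), (lambda dx: {name: dx})

val, back = getfield_pullback(Foo(3.0), "x")
grad_foo = back(1.0)
# val == 3.0, grad_foo == {"x": 1.0}
```

Seen this way, a complex number is just a two-field struct, so reverse mode "works" on it even before any complex-specific rules exist.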
Even with the fix on master I think this is incorrect?

```julia
using Zygote
myeps = 1e-8
mydiff(f, z) = ((f(z + myeps) - f(z)) / myeps, (f(z + im * myeps) - f(z)) / myeps)
f(z) = abs2(log(z))
@show gradient(f, 1.0 + 2im)
@show mydiff(f, 1.0 + 2im)
```

I'm not 100% sure how those should match up, though.
To clarify what's going on, a simple example: f(z) = imag(log(z)) at z = 1 + im. The gradient computation is (with h an infinitesimal) f(z + h) - f(z) ≈ imag(h/z) = real(h) * imag(1/z) + imag(h) * real(1/z), hence the gradient is (imag(1/z), real(1/z)) = (-0.5, 0.5). I'm guessing the AD code currently does something like assuming imag is holomorphic, hence the error?
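A quick finite-difference check of this, taking the example function to be f(z) = imag(log(z)) at z = 1 + 1j (consistent with the stated gradient (imag(1/z), real(1/z))); written in Python since it runs without Zygote:

```python
import cmath

def grad_fd(f, z, h=1e-6):
    # Partials of a real-valued f with respect to Re(z) and Im(z),
    # by central differences.
    dx = (f(z + h) - f(z - h)) / (2 * h)
    dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return dx, dy

f = lambda z: cmath.log(z).imag
dx, dy = grad_fd(f, 1 + 1j)
# dx ≈ imag(1/(1+1j)) = -0.5 and dy ≈ real(1/(1+1j)) = 0.5
```

The numeric partials match the hand derivation, so any AD result disagreeing with (-0.5, 0.5) here is wrong.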
@antoine-levitt Thanks for the explanation, but I think Zygote already treats a complex number as a tuple of real numbers. The problem here is probably in log? One thing we should notice is that gradients are different from derivatives for complex numbers (see the first equation in the reference above).
An easy fix should be wrapping a `conj` around every holomorphic function's gradient in the backward pass.
Indeed, that was the problem; the correct definition is the conjugated one. I got confused by the mentions of holomorphy; this has nothing to do with holomorphy (it's just that for holomorphic functions the gradient operation happens to be a complex multiplication). There's the possibility of nasty bugs here, with gradients defined with real numbers in mind getting used with complex numbers. As long as the real functions are typed appropriately this should be fine.
You are right, and it seems the function in f2a8560 should be fixed. |
Yeah this is probably just a bug in the gradient definition. I've pushed a fix but I haven't been that careful about implementing this (we really need gradient checking tests and so on); I'm happy to help anyone who wants to hack on that or improve things generally. |
Would this be expected to work in the latest release?

```julia
using Zygote
f(z) = z^2
gradient(f, 1+5im)
```

Currently it gives an error "Function output is not scalar" because of this check. Now, I know the complex derivative isn't defined for non-holomorphic functions, so I'm not sure whether this should work in general.

edit: I no longer think it's a good idea to have this.
Seems fishy to use that here.
I think it'd be fine to just change that restriction to allow complex outputs.
Bump. I really would like this to happen. The following errors on master:

If some changes to master are needed and they are not extraordinarily complicated, I'm more than happy to do it myself if you guide me.
Fixed that example on 2c30968; I just forgot the gradient for that operation.

```julia
julia> gradient(x -> x^2, 1+5im)
(2 - 10im,)
```

I'm closing this issue for now; I think the core is there (broadcast works too, thanks to #100), so we can open new issues for any specific things that break.
I think this is a really bad idea. It will silently fail if the function is not holomorphic:
or even if the function is holomorphic but has non-holomorphic parts inside:
The only target this could plausibly aim at is the optimized computation of holomorphic C^n → C functions (https://en.wikipedia.org/wiki/Several_complex_variables). This seems to me a marginal use case compared to computing derivatives of holomorphic C → C functions (better tackled by ForwardDiff) or gradients of C^n → R functions (which already work), and it can be done efficiently by computing the gradient of the real part and using the Cauchy-Riemann equations.
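A sketch of that last point (in Python so it runs standalone; finite differences stand in for AD of the real part): for a holomorphic f, the full derivative is recoverable from u = Re f alone via the Cauchy-Riemann equations, f'(z) = ∂u/∂x − i ∂u/∂y.

```python
import cmath

def holo_derivative(f, z, h=1e-6):
    # Differentiate only the real part of f, then reassemble f'(z)
    # via Cauchy-Riemann: f' = du/dx - i * du/dy.
    u = lambda w: f(w).real
    du_dx = (u(z + h) - u(z - h)) / (2 * h)
    du_dy = (u(z + 1j * h) - u(z - 1j * h)) / (2 * h)
    return complex(du_dx, -du_dy)

z0 = 0.5 + 0.5j
d = holo_derivative(cmath.exp, z0)
# d ≈ exp(z0), the usual holomorphic derivative
```

So a "gradient of the real part" primitive is enough to serve the holomorphic use case, without a separate complex-derivative mode.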
The second example is a bug that would have come up anyway; I pushed a fix.

```julia
julia> gradient(x -> (real(x)+imag(x)*im)^2, 1+5im)
(2 - 10im,)
```

In general, any issues that come up with this are going to come up if you take the gradient of real(f(x)) as well. For the first example:
I just don't think gradients of complex-valued functions should be defined at all. Assuming you still want to do it, there are three cases for scalar-valued functions:

Given a C^n → C function, how do you choose what to return between the last two? This change chose version 2. Given this, there are two options: either blacklist non-holomorphic operations (real, imag, conj, abs) on complex numbers, or accept that a wrong result will be silently returned for functions that are non-holomorphic in any part of their computational graph, even if they are themselves holomorphic (as in the example above). This is why I think the gradient of complex-valued functions should just be left undefined. Users can choose whether they want to compute Jacobians or holomorphic derivatives themselves.
Why? The gradient of a C^n to R function is well-defined, and follows by just seeing C as R^2. |
Hmm, I just tried, and:

This is wrong according to the only "reasonable" definition of gradient for a C^n → R function: ∇f[i] = ∂f/∂Re(z_i) + ∂f/∂Im(z_i) * im. This is the definition that follows from seeing ComplexF64 as a struct of two reals, and the definition that is useful for e.g. gradient-descent optimization.
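A small sketch of why that convention is the useful one for optimization (Python, hypothetical objective): with g = ∂f/∂x + i ∂f/∂y, the plain update z -= lr * g descends f. Here f(z) = |z − (3+4j)|², minimized at z = 3+4j.

```python
target = 3 + 4j
f = lambda z: abs(z - target) ** 2

def grad(z, h=1e-6):
    # Gradient in the "pair of reals" convention: d/dx + i * d/dy.
    gx = (f(z + h) - f(z - h)) / (2 * h)
    gy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return complex(gx, gy)

z = 0j
for _ in range(100):
    z -= 0.1 * grad(z)   # plain gradient descent on the complex variable
# z ≈ 3 + 4j
```

Any other convention (e.g. dropping the conjugation) would point the update in the wrong direction for some objectives.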
Ok, so we need to be careful about any non-holomorphic functions in DiffRules. Fixed in e1b00f7. I agree that neither of those options is good, but I don't yet see that they are inevitable. Let's zero in on:

I'd like to see an example, work through any issues with our current gradient definitions, and see whether there's a fundamental conflict. I don't currently see how this can be the case: there's only one gradient for a holomorphic function, and if Zygote doesn't get that right then it's a bug.
Related to the just-submitted fix in DiffRules: is it really possible to approach this with the suggested blacklist? Isn't a whitelist a better solution, as most functions will not be holomorphic?

On the more general question: I would be eager to provide examples, but as @antoine-levitt alluded to, Zygote currently does not seem to have an official stance on the definition of the derivative of a C^n → C function. One definition is "a 2n-by-2 Jacobian of real values" (the only general one I can think of). Other definitions have to deal with whether the function is holomorphic or not. In particular, here are some confusing results on current master. All of these should be 2×2 real matrices in my opinion:

```julia
julia> gradient(z -> real(z), 10+20im)
(1 + 0im,)

julia> gradient(z -> imag(z), 10+20im) # I do not understand this result
(0 + 1im,)

julia> gradient(z -> real(z)+imag(z), 10+20im) # this seems definitely wrong
(1 + 1im,)
```

Compare these two, which are supposedly the same function, but one of them is explicitly holomorphic:
I think I disagree with this statement. There are derivative-like operations that make sense in mathematical sets that have nice structure (e.g. Lie derivatives or directional derivatives in differential geometry; complex derivatives of holomorphic functions in complex analysis). These derivative-like operations share the important property that they are basis-independent. When implemented on a computer (unless we are using symbolic algebra systems), we usually choose a basis (e.g. x+iy or r*exp(iφ) for complex analysis). The moment we have chosen a basis, all the nice basis-independent entities can be expressed numerically, but we also lose all promises that the derivative-like operations continue to make sense. The real numbers are an exception, because they are 1D, where these issues simply do not arise. For everything else we have to explicitly choose to interpret expressions in basis-independent form (e.g. holomorphic functions act like that) or in basis-dependent form (e.g. using real, imag, and conj). There is no general way to know whether a particular basis-dependent expression is actually equivalent to a basis-independent one: it is in general undecidable whether a mixture of imag/real/conj/abs/angle ends up being a holomorphic function. Edit: The use of the word "basis" above was mathematically imprecise / abusive. The correct term would be a "coordinate patch" or something similar.
The cases you've listed are actually the easy ones: if the function output is real, there's no ambiguity. Zygote's current position on complex gradients is that

```julia
y, pullback = forward(f, x)
dx = pullback(1)
```

naturally generalises to complex numbers; it's actually disabling them that involves a special case. You can also think of this as being the gradient of real(f(x)). The only other option we have, AFAICT, is to return a Jacobian. But I think this has to be an explicit, separate function. Reasoning: writing …
Thanks, that clarified some things. The gradient as defined for a C^n → R function makes sense to me. The derivative of an explicitly holomorphic function makes sense too. The intermediate regime between the two (functions that are holomorphic, but written in a way that does not make that explicit) is very confusing for me... Could you please help me reason through this example (I think it is a bug)?

```julia
# this is the z -> z function written in polar coordinates
julia> gradient(z -> sqrt(abs2(z))*exp(1im*angle(z)), 10+20im)
(0.19999999999999996 + 0.4000000000000001im,)
```

EDIT: there were a couple more examples that were not actually as problematic. I erased them in order not to distract.
To rephrase: for f : C → C, the current gradient either returns the gradient of real(f), which is useful e.g. for optimization, or it returns the holomorphic derivative, which is useful for holomorphic functions. The latter also happens to be the first Wirtinger derivative, and the former happens to be a sum of the (conjugated) Wirtinger derivatives, conj(∂f/∂z) + ∂f/∂z̄. In the language of this post, Zygote currently switches between the "realistic" and the "native" view of the problem, and it is not clear how it makes that choice: http://www.ekinakyurek.com/complex-derivatives-wirtinger/
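The Wirtinger derivatives mentioned here can be computed numerically from the standard definitions ∂f/∂z = ½(∂f/∂x − i ∂f/∂y) and ∂f/∂z̄ = ½(∂f/∂x + i ∂f/∂y); a Python sketch (finite differences, not Zygote):

```python
import cmath

def wirtinger(f, z, h=1e-6):
    # Both Wirtinger derivatives from the x- and y-direction partials.
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx - 1j * df_dy), 0.5 * (df_dx + 1j * df_dy)

dz, dzbar = wirtinger(cmath.exp, 0.3 + 0.4j)              # dz ≈ exp(z), dzbar ≈ 0
dz2, dzbar2 = wirtinger(lambda z: z.conjugate(), 1 + 2j)  # dz2 ≈ 0, dzbar2 ≈ 1
```

For holomorphic functions ∂f/∂z̄ vanishes (Cauchy-Riemann), which is why the two views coincide there and diverge everywhere else.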
Maybe …
Hm, thinking a bit more, I see I was straw-manning against something that was proposed for ForwardDiff, and so I was confused: if done correctly, Zygote will actually do the right thing: (implicitly) propagate the 2×2 Jacobian inside the computational graph, and only assume holomorphy for the last step. This is actually quite clever. OK then, this thing should actually work: it's just that for non-holomorphic functions it will not return the full derivative information (that would be the 2×2 Jacobian). This has the potential to be very confusing and should be documented appropriately at some point, but at least it's consistent. Now we just have to make sure it works correctly.
Ok, great, I'm glad we're all on roughly the same page; this has certainly been an enlightening discussion for me. @Krastanov yes, that was indeed a bug, which is fixed now. I'll also cross-reference that ForwardDiff issue: JuliaDiff/ForwardDiff.jl#157. To summarise: we have two derivatives we might be interested in, the Wirtinger derivatives and the "Jacobi" ones (i.e. columns of the Jacobian of the equivalent R^2 → R^2 function).
Hopefully it's clear that Zygote isn't switching behaviour at any point here; it just happens that the current Jacobi derivative lines up with more standard things where they exist. Ideally, where it doesn't (complex non-holomorphic), we'd just throw an error, but of course we can't tell automatically.
One thing that might be confusing is that the gradient (as defined here, i.e. the gradient of the real part of f) of a holomorphic function is the conjugate of the usual derivative. As long as this is well documented somewhere, it's fine.
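A numeric check of this conjugation (Python, finite differences): for f(z) = z², f'(z) = 2z, so at 1+5j the usual derivative is 2+10j, while the gradient of u = Re f (written ∂u/∂x + i ∂u/∂y) comes out as 2−10j, matching the gradient(x -> x^2, 1+5im) output earlier in the thread.

```python
def grad_of_real_part(f, z, h=1e-6):
    # Gradient of Re f in the "pair of reals" convention.
    u = lambda w: f(w).real
    return complex((u(z + h) - u(z - h)) / (2 * h),
                   (u(z + 1j * h) - u(z - 1j * h)) / (2 * h))

g = grad_of_real_part(lambda z: z * z, 1 + 5j)
# g ≈ 2 - 10j, the conjugate of f'(1+5j) = 2 + 10j
```

So "gradient = conj(derivative)" for holomorphic functions is a direct consequence of the convention, not a bug.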
Yes, @ssfrr just raised that point on Slack as well. It's an extra complication. I think this is probably in some sense consistent with the fact that Zygote produces adjoints, but we should think it through carefully. Hope he won't mind me reposting his thoughts:

I think it makes some sense for …
Normally, what we care about is the back-propagation of … Consider a function … BTW: is … Can't agree more on separating …
Yes, to be clear: the core technical problem is solved, and so gradients of …
From @MikeInnes:

I'd think …
Yes, I've disabled that in d82581f (that statement wasn't meant as an argument for or against that choice). Although I've specifically disabled …
Great, I quickly checked several cases (master branch), and the gradients turn out to be correct.
@MikeInnes, thank you, this is pretty amazing. And yes, your last post made it very clear that "it's clear that Zygote isn't switching behaviour at any point here". |
Ok, we now have docs up that clarify these things: http://fluxml.ai/Zygote.jl/dev/complex/ Thanks again for the discussion, and please do report any bugs! |
I find Zygote supports complex number AD; I like it more now! But I have several questions.

Question 1: Why does it return a NamedTuple rather than a complex number?

Question 2: Why does the following expression complain with a not-implemented error? log is holomorphic and the complex gradient for abs is already defined in Zygote, so there is no reason to break here. I suggest creating some traits for functions (is that possible?!) to help the program know that log is holomorphic.

Question 3: About convention: is the way Zygote defines gradients for complex numbers proper? We have two conventions here; I think they are conjugate to each other. I have an implementation of the latter one here: https://github.com/GiggleLiu/poorman_nn/blob/44b54d4a86f689f528a301e3a4db6b05210bb16a/poornn/functions.py#L1034 I chose the latter convention because we want gradient(log, x) = log'(x) = 1/x rather than gradient(log, x) = log'(x)' = 1/x^*; it is more convenient for holomorphic functions, right?