Question about the theory in the paper #14

Frankie123421 · 2022-09-06T12:23:25Z

Hi Xu,

First of all, thanks for your nice work. I've read your paper and have some questions about the proof of equivariance of the transition kernel. Specifically, suppose $\mathcal{C}^t$ is roto-translation invariant, and (thus) $\mu_{\theta}(\mathcal{C}^t, \mathcal{G}, t)$ is roto-translation equivariant with the designed GNN; we need to prove that $p(\mathcal{C}^{t-1} | \mathcal{C}^{t}, \mathcal{G}, t)$ is equivariant. I wonder if it follows from the following derivation:

$$\begin{aligned}
p(R \mathcal{C}^{t-1} + g | R \mathcal{C}^{t} + g, \mathcal{G}, t) &= \mathcal{N}(R\mathcal{C}^{t-1} + g; \boldsymbol{\mu}_{\theta}(R\mathcal{C}^t + g, \mathcal{G}, t), \sigma_t^2 \mathbf{I}) \\
&= \frac{1}{(2 \pi)^{\frac{p}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}} e^{-\frac{1}{2}(R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta))^T \boldsymbol{\Sigma}^{-1}(R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta))} \\
&= \frac{1}{(2 \pi)^{\frac{p}{2}}|\boldsymbol{\Sigma}|^{\frac{1}{2}}} e^{-\frac{1}{2}(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)^T \boldsymbol{\Sigma}^{-1}(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)}
\end{aligned}$$

where $\boldsymbol{\Sigma} = \sigma_t^2 \mathbf{I}$. I am not sure whether this is correct and would appreciate your clarification. Thanks.
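As a hedged sketch of the step between the last two lines of the derivation, write $\mathbf{v} = \mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta$ (a shorthand introduced here, not taken from the paper) and assume $R$ is orthogonal and applied atom-wise, so $R^T R = \mathbf{I}$. With $\boldsymbol{\Sigma} = \sigma_t^2 \mathbf{I}$ the rotation cancels in the quadratic form:

$$\begin{aligned}
(R\mathbf{v})^T \boldsymbol{\Sigma}^{-1} (R\mathbf{v}) &= \sigma_t^{-2}\, \mathbf{v}^T R^T R\, \mathbf{v} \\
&= \sigma_t^{-2}\, \mathbf{v}^T \mathbf{v} \\
&= \mathbf{v}^T \boldsymbol{\Sigma}^{-1} \mathbf{v}
\end{aligned}$$

The normalizing constant depends only on $\boldsymbol{\Sigma}$, so it is unchanged as well.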

MinkaiXu · 2022-09-06T19:12:08Z

The derivation you showed seems correct, but I think you missed a few minor points:

Since we only consider CoM-free systems, $C^t$ and $\mu_{\theta}(\mathcal{C}^t, \mathcal{G}, t)$ are both translation-invariant.
And they should be both rotationally equivariant.

I feel like your derivation is actually based on the above statements. Ping me if you still have any questions.

Frankie123421 · 2022-09-07T03:35:16Z

Thanks for your kind and prompt reply. I have some new questions based on your response.

1. My current understanding is that if we ensure that the initial density $p(x_T)$ is rotation and translation invariant (by moving to CoM-free systems and keeping an isotropic Gaussian), and $p(x_{t-1}|x_{t})$ is rotation and translation invariant, then $p(x_{T-1}), p(x_{T-2}), \dots, p(x_0)$ will naturally all be rotation and translation invariant, step by step. (Is that correct?)
2. What do you mean by "And they should be both equivariant"? Rotationally?
3. But if 1 is correct, then

$$
\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)=\frac{1}{\sqrt{\alpha_t}}\left(\mathcal{C}^t-\frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta(\mathcal{G}, \mathcal{C}^t, t)\right)
$$

will be rotation and translation invariant no matter how $\epsilon_{\theta}$ is designed, since $R\mathcal{C}^t + g = \mathcal{C}^t$, so I suspect that 1 is not correct. And my original derivation is actually based on $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t) + g$.

I am quite confused now )-: . Really looking forward to your answer. Thanks in advance.

MinkaiXu · 2022-09-07T04:11:57Z

1. I think the statement is correct.
2. Yes, I have updated my reply :)
3. Sorry, but I'm also kind of confused about your statement...

Firstly, $R\mathcal{C}^t + g \neq \mathcal{C}^t$. In a CoM-free system, we only have $R\mathcal{C}^t + g = R\mathcal{C}^t$ (since we always move all conformations to zero CoM).
Second, similarly, $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)$.
Besides, I didn't fully get the paradox in point 1. As I said, $C^t$ is rotationally equivariant, and $\epsilon$ should also be equivariant to make $\mu$ equivariant.

Frankie123421 · 2022-09-07T04:22:44Z

Thanks for your reply. I think the main point is that my statement in 1 says $\mathcal{C}^t$ is rotationally invariant, but as you said above, $\mathcal{C}^t$ is actually rotationally equivariant. I wonder why, and doesn't that imply that 1 is wrong? My understanding is that because $\mathcal{C}^t$ is always isotropic Gaussian, it is rotationally invariant rather than equivariant. I must be misunderstanding something. Sorry for confusing you; I look forward to your answer.

MinkaiXu · 2022-09-07T06:59:47Z

$C^t$ is data, not a distribution; e.g., it is an $N \times 3$ tensor. Under a rotation, the tensor also rotates. Invariance is a property of the Gaussian distribution, not of the data (tensor) itself.
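The distinction can be checked numerically. The following is a minimal sketch, not code from the repository: `rotation_z`, `rotate`, and `log_density` are hypothetical helpers, the conformation is random, and the density is assumed to be a zero-mean isotropic Gaussian over all $3N$ coordinates.

```python
import math
import random

def rotation_z(angle):
    """3x3 rotation matrix about the z-axis (one concrete choice of R)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rotate(coords, R):
    """Apply R to every row (atom position) of an N x 3 list of coordinates."""
    return [[sum(R[i][k] * p[k] for k in range(3)) for i in range(3)] for p in coords]

def log_density(coords, sigma=1.0):
    """Log-density of N(0, sigma^2 I) evaluated on all 3N coordinates."""
    sq_norm = sum(v * v for p in coords for v in p)
    dim = 3 * len(coords)
    return -0.5 * sq_norm / sigma**2 - 0.5 * dim * math.log(2 * math.pi * sigma**2)

random.seed(0)
C = [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(5)]  # hypothetical N = 5
RC = rotate(C, rotation_z(0.7))

# The data (tensor) itself moves under the rotation ...
data_changed = any(abs(a - b) > 1e-6 for p, q in zip(C, RC) for a, b in zip(p, q))
# ... while the value of the isotropic Gaussian density stays the same.
density_gap = abs(log_density(C) - log_density(RC))
print(data_changed, density_gap)
```

Here `data_changed` reports whether the coordinates differ after rotation, and `density_gap` is the absolute difference between the two log-densities, which should be zero up to floating-point error.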

Frankie123421 · 2022-09-07T08:43:19Z

Thanks for your reply. I kind of get it now, but I still have some questions. Before raising them, I would like to ask first:

How do we ensure that $\mathcal{C}^{t-1}$ is also in the CoM-free system? Is it induced by the previous $\mathcal{C}^{t}$ (in the CoM-free system) and the equivariant Markov transition kernel? Since $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)$, if $R\mathcal{C}^{t-1} + g \neq R\mathcal{C}^{t-1}$ we can't get $R \mathcal{C}^{t-1} + g - \boldsymbol{\mu}_{\theta}(R\mathcal{C}^t + g, \mathcal{G}, t) = R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)$ as in the previous derivation, and thus the kernel is not even equivariant, unless $R\mathcal{C}^{t-1} + g = R\mathcal{C}^{t-1}$ is known in advance.

I am quite confused about the logic here.

MinkaiXu · 2022-09-07T18:26:13Z

To ensure that $C^t$ is a CoM-free system, one can simply always move the CoM of any $C$ to zero, making translation invariance an intrinsic property of $C$.

Then yes, as I have explained before, we have:

Firstly, $R\mathcal{C}^t + g \neq \mathcal{C}^t$. In a CoM-free system, we only have $R\mathcal{C}^t + g = R\mathcal{C}^t$ (since we always move all conformations to zero CoM).
Second, similarly, $\mu_\theta(R\mathcal{C}^t + g, \mathcal{G}, t) = R\mu_\theta(\mathcal{C}^t, \mathcal{G}, t)$.

So the derivation is simply $(R \mathcal{C}^{t-1} + g) - \boldsymbol{\mu}_{\theta}(R\mathcal{C}^t + g, \mathcal{G}, t) = R(\mathcal{C}^{t-1}-\boldsymbol{\mu}_\theta)$.
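A small numerical check of this identity, as a sketch under stated assumptions: translated conformations are re-centered before use (the "always move all conformations to zero CoM" convention above), masses are equal, and `toy_mu` is a hypothetical stand-in for $\boldsymbol{\mu}_\theta$ chosen only because it is rotation-equivariant and translation-invariant by construction. None of these helpers come from the repository.

```python
import math
import random

def center(coords):
    """Move the centre of mass (plain mean, equal masses assumed) to the origin."""
    n = len(coords)
    mean = [sum(p[i] for p in coords) / n for i in range(3)]
    return [[p[i] - mean[i] for i in range(3)] for p in coords]

def rotate(coords, R):
    """Apply the 3x3 matrix R to every atom position."""
    return [[sum(R[i][k] * p[k] for k in range(3)) for i in range(3)] for p in coords]

def translate(coords, g):
    """Shift every atom position by the same vector g."""
    return [[p[i] + g[i] for i in range(3)] for p in coords]

def toy_mu(coords):
    """Hypothetical stand-in for mu_theta: centre the input, then shrink it."""
    return [[0.9 * v for v in p] for p in center(coords)]

def sub(a, b):
    """Element-wise difference of two N x 3 lists."""
    return [[x - y for x, y in zip(p, q)] for p, q in zip(a, b)]

random.seed(1)
c, s = math.cos(1.1), math.sin(1.1)
R = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]  # rotation about z
g = [2.0, -3.0, 0.5]                               # arbitrary translation

C_t = center([[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)])
C_prev = center([[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)])

# Left side: (R C^{t-1} + g), re-centred, minus mu(R C^t + g).
lhs = sub(center(translate(rotate(C_prev, R), g)), toy_mu(translate(rotate(C_t, R), g)))
# Right side: R (C^{t-1} - mu(C^t)).
rhs = rotate(sub(C_prev, toy_mu(C_t)), R)

gap = max(abs(x - y) for p, q in zip(lhs, rhs) for x, y in zip(p, q))
print(gap)
```

The value of `gap` is the largest entry-wise difference between the two sides and should be zero up to rounding. Without the re-centering on the left side, the two sides would differ by exactly `g` in every row.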

Frankie123421 · 2022-09-08T01:33:13Z

Thanks for your response.

1. Overall I see that $\mathcal{C}^{t-1}$ is indeed in the CoM-free system. What I was actually concerned about above is this: considering, for example, $\mathcal{C}^T$ and $\mathcal{C}^{T-1}$, we sample $\mathcal{C}^T$ from an isotropic Gaussian and move it to the CoM-free system; is the next step $\mathcal{C}^{T-1}$ then naturally guaranteed to be in the CoM-free system by the Markov transition kernel, without any other operation? My understanding now (with the help of your answer) is that it is not, and that this is actually achieved by sampling $\mathcal{C}^{T-1}$ from $p(\mathcal{C}^{T-1}|\mathcal{C}^T)$ and then moving it to the CoM-free system. (?)
2. How could I mathematically prove that a CoM-free system is translation-invariant?
3. Could the proof of $R \mathbf{x}^{l+1}, \mathbf{h}^{l+1}=\operatorname{GFN}\left(R \mathbf{x}^l, R \mathcal{C}+g, \mathbf{h}^l\right)$ be simplified by dropping the "+g" term, if we consider $\mathcal{C}$ to be in the CoM-free system as in the paper? (The proof is correct, though.)
4. For any $y \in U$, I don't understand how to obtain $p(y) = \hat{p}(y)$ from $||y||_2^2=||Q y||_2^2$ (1), even though I've seen your answer to the same question on OpenReview. (You said that $p(y) = \hat{p}(Qy) = \hat{p}(y)$ (2); I suspect it should perhaps be $p(y) = {p}(Qy)$, but overall I still don't see the connection between (1) and (2).)

MinkaiXu · 2022-09-08T03:09:51Z

1. Yes, you can also view this as first moving the output of $\epsilon$ to zero mean, which can be regarded as part of the parameterization. Then $C^{t-1}$ will naturally be CoM-free.
2. It should be the proof in Appendix A.5, for the CoM-free Gaussian?
3. Yes, definitely. But the network is indeed also translation-invariant, which is why I also mention "+g".
4. I remember $p(y) = \hat{p}(Qy)$ comes from the calculation of the corresponding Gaussians.
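The first point can be illustrated with a minimal sketch of one reverse step, using the $\mu_\theta$ formula quoted earlier in this thread. Everything here is an assumption for illustration: `reverse_step` and `center` are hypothetical helpers, the "network output" is random numbers, the schedule values are arbitrary, and the sampling noise is also centered (the reply only mentions centering the output of $\epsilon$).

```python
import math
import random

def center(coords):
    """Subtract the centre of mass (plain mean, equal masses assumed)."""
    n = len(coords)
    mean = [sum(p[i] for p in coords) / n for i in range(3)]
    return [[p[i] - mean[i] for i in range(3)] for p in coords]

def reverse_step(C_t, eps, alpha_t, alpha_bar_t, sigma_t, rng):
    """One DDPM-style reverse step with the network output and noise centred."""
    beta_t = 1.0 - alpha_t
    eps = center(eps)  # move the output of epsilon to zero mean
    # Assumption: the sampling noise is centred as well.
    z = center([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in C_t])
    coef = beta_t / math.sqrt(1.0 - alpha_bar_t)
    return [
        [(C_t[n][i] - coef * eps[n][i]) / math.sqrt(alpha_t) + sigma_t * z[n][i]
         for i in range(3)]
        for n in range(len(C_t))
    ]

rng = random.Random(0)
N = 6  # hypothetical number of atoms
C_t = center([[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(N)])
eps_raw = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(N)]  # stand-in for the network output

C_prev = reverse_step(C_t, eps_raw, alpha_t=0.98, alpha_bar_t=0.5,
                      sigma_t=math.sqrt(0.02), rng=rng)

# Centre of mass of the new sample; every component should vanish up to rounding.
com = [sum(p[i] for p in C_prev) / N for i in range(3)]
max_com = max(abs(v) for v in com)
print(max_com)
```

Because every term in the update has zero mean over atoms, $C^{t-1}$ stays CoM-free without an explicit re-centering afterwards, which is one way to read "part of the parameterization".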

Frankie123421 · 2022-09-08T04:44:16Z

Thanks, but as for 2, I don't see any proof showing that if $x \in U$, then $x + g = x$. Besides, maybe it holds for any CoM-free system, not just for the CoM-free Gaussian?

MinkaiXu · 2022-09-08T06:07:52Z

I think maybe I should say that the CoM-free $x$ and $x+g$ should actually be $Qx$ and $Q(x+g)$. I.e., no matter how $x$ is moved, we always first move it back to zero CoM. In this sense, we have $Qx = Q(x+g)$.
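One way to make this precise, as a sketch under assumptions not stated in the thread: take $Q$ here to be the plain centering map on $x \in \mathbb{R}^{N \times 3}$ with equal atom masses, let $\mathbf{1} \in \mathbb{R}^N$ be the all-ones vector, and write a translation by $g \in \mathbb{R}^3$ as $x + \mathbf{1} g^T$. This $Q$ may differ from the one used in Appendix A.5 of the paper.

$$\begin{aligned}
Qx &:= x - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T x \\
Q(x + \mathbf{1}g^T) &= x + \mathbf{1}g^T - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T x - \tfrac{1}{N}\mathbf{1}(\mathbf{1}^T\mathbf{1})g^T \\
&= x - \tfrac{1}{N}\mathbf{1}\mathbf{1}^T x \qquad (\text{since } \mathbf{1}^T\mathbf{1} = N) \\
&= Qx
\end{aligned}$$

Nothing in this calculation uses a Gaussian, so it applies to any conformation once it is re-centered.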

Frankie123421 · 2022-09-08T07:36:26Z

Got it. Thanks for taking the time to answer my questions. Really appreciate that!

Frankie123421 closed this as completed Sep 8, 2022
