About the logvar prediction #12

Open
daxintan-cuhk opened this issue Aug 4, 2021 · 10 comments
daxintan-cuhk commented Aug 4, 2021

Thank you for your excellent code! I have encountered a problem when using the mutual information constraint in a speech processing task. During training, I found that the logvar prediction network, whose last layer is 'Tanh', always outputs -1, no matter what the input is. The overall mutual information estimation network also seems to lose its effect, as the log-likelihoods of the positive samples in the training batch are all very small values, something like -1,000,000. Has any other user met this problem before? Or do you have any advice? Thank you a lot!

Yours,
Daxin
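
For reference, a minimal sketch of the kind of setup being described: a mean network plus a Tanh-bounded logvar network, and the positive-sample log-likelihood of the conditional Gaussian. The layer sizes and structure here are assumptions for illustration, not the exact repository code.

```python
import torch
import torch.nn as nn

# Assumed mu/logvar networks for illustration; the logvar head ends in Tanh,
# so its output is bounded in (-1, 1) and the variance in (e^-1, e).
x_dim, y_dim, hidden = 16, 16, 64
p_mu = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))
p_logvar = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, y_dim), nn.Tanh())

x, y = torch.randn(32, x_dim), torch.randn(32, y_dim)
mu, logvar = p_mu(x), p_logvar(x)

# Positive-sample log-likelihood of q(y|x) (constants dropped). If mu(x) is far
# from y, this becomes a large negative number -- the symptom reported above.
positive = (-(mu - y) ** 2 / (2 * logvar.exp()) - logvar / 2).sum(dim=-1)
print(positive.mean())
```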

Linear95 (Owner) commented:

Hi Daxin,

For the output of logvar, you can try any other activation function you want. Here I guess the main problem in your case is that your q(y|x) network is not well learned before doing the MI minimization. A possible solution might be to enlarge the learning rate for q(y|x)'s parameters, or to increase the number of CLUB training steps within each MI minimization iteration.

Thanks and good luck!
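
To make the suggestion concrete, here is a runnable toy sketch of that schedule: a separate optimizer with a larger learning rate for the q(y|x) estimator, and several estimator updates per MI-minimization step. The encoders, the minimal q(y|x) = N(mu(x), I) network, and all sizes and coefficients are assumptions for illustration, not the repository's exact API.

```python
import torch
import torch.nn as nn

# Toy stand-ins (all names and sizes are assumptions): enc_x / enc_y play the role
# of the task model's representation networks, and q_mu is a minimal variational
# network with q(y|x) = N(mu(x), I).
dim = 4
enc_x, enc_y = nn.Linear(8, dim), nn.Linear(8, dim)
q_mu = nn.Linear(dim, dim)

def club_learning_loss(fx, fy):
    # negative log-likelihood of q(y|x) (constants dropped); trains q(y|x) only
    return ((q_mu(fx) - fy) ** 2).sum(-1).mean()

def club_upper_bound(fx, fy):
    # CLUB: E[log q(y|x)] over positive pairs minus over all (x_i, y_j) pairs
    mu = q_mu(fx)
    positive = -((mu - fy) ** 2).sum(-1)
    negative = -((mu.unsqueeze(1) - fy.unsqueeze(0)) ** 2).sum(-1).mean(1)
    return (positive - negative).mean() / 2.0

main_opt = torch.optim.Adam(list(enc_x.parameters()) + list(enc_y.parameters()), lr=1e-4)
club_opt = torch.optim.Adam(q_mu.parameters(), lr=1e-3)   # larger lr for q(y|x)
club_steps = 5                                            # several q(y|x) updates per iteration

for step in range(200):
    x, y = torch.randn(64, 8), torch.randn(64, 8)
    # 1) keep q(y|x) well-learned before the MI-minimization update
    for _ in range(club_steps):
        club_opt.zero_grad()
        club_learning_loss(enc_x(x).detach(), enc_y(y).detach()).backward()
        club_opt.step()
    # 2) main update: in practice the task loss plus the CLUB upper bound as a penalty
    main_opt.zero_grad()
    (0.1 * club_upper_bound(enc_x(x), enc_y(y))).backward()
    main_opt.step()
```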

guanyadong commented:

I also encountered this problem; have you fixed it?

gaoxinrui commented:

Hi, thanks for the good work. I have a general question: according to your code, it is actually a Gaussian distribution that is estimated using a NN. However, for a Gaussian, the first two moments can be calculated directly from samples. So what is the advantage of using a NN to estimate it? Further, do you have an idea of how to estimate a general distribution using a NN?

Thanks a lot.

Linear95 (Owner) commented Mar 3, 2022

In this work, we do not try to directly estimate a general Gaussian distribution from samples. Instead, we aim to estimate the conditional distribution p(Y|X) with a variational neural network. In our setup, given each value X=X0, the conditional distribution p(Y|X=X0) is a Gaussian distribution. What we want the neural network to learn is not one Gaussian distribution p(Y|X=X0) (which, as you said, you can estimate with moments), but the relation between X and Y, so that given any value X=X0 (even if X0 is unseen in the samples) we can approximate the conditional distribution p(Y|X=X0). We do not place any constraint on the marginal distribution of Y, P(Y).

To estimate a general Gaussian distribution from samples, there are plenty of prior works. Moment estimation is one such method (similar to maximum likelihood estimation (MLE)). However, estimating the covariance matrix is quite complicated, and it is also difficult to calculate the density function when the sample dimension is high.

Estimating a general distribution is also an interesting topic. To my knowledge, Generative Adversarial Networks (GANs) can directly draw nice samples from the distribution of given sample data. If you want to obtain the density function of a general distribution from samples, you can check methods such as kernel density estimation (KDE).
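
For the last point, a minimal sketch of kernel density estimation with SciPy's gaussian_kde; the toy two-component data here is just for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy 2-D samples from an "unknown" distribution (a two-component mixture here)
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, size=(500, 2)),
                          rng.normal(+2.0, 1.0, size=(500, 2))])

# gaussian_kde expects the data as (n_dims, n_samples)
kde = gaussian_kde(samples.T)

# Evaluate the estimated density at the points (-2,-2), (0,0), (2,2)
query = np.array([[-2.0, 0.0, 2.0],
                  [-2.0, 0.0, 2.0]])
print(kde(query))
```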

gaoxinrui commented:

Thanks. I understand that you were calculating the conditional distribution, which is assumed to be Gaussian. For a Gaussian conditional distribution p(Y|X), the optimal prediction of Y, i.e., the conditional expectation E(Y|X), is essentially a linear function of X. This linear function is related to Pearson's correlation ρ, as in your recently uploaded Mutual Information Minimization Demo. Pearson's correlation can easily be obtained; there is no need to use a NN to estimate it. If the relationship between X and Y is strongly nonlinear, which will lead to a non-Gaussian conditional distribution, I wonder if the method still works well.

Linear95 (Owner) commented Mar 3, 2022

Good question. That is also the reason why we introduce a variational NN to approximate p(Y|X) as q_\theta(Y|X) in our paper. To handle the non-linearity between X and Y, we parameterize p(Y|X) as N(mu(x), sigma^2(x)), so that we can non-linearly predict the mean mu(x) and variance sigma^2(x) of the conditional Gaussian as the NN's outputs, with X as the input.
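
As an illustration of that point (a toy sketch, not from the paper or repository): with y = sin(3x) + noise, a purely linear (Pearson-correlation-based) predictor captures only a crude trend, while a small network predicting mu(x) and logvar(x) by maximum likelihood recovers the non-linear conditional mean.

```python
import torch
import torch.nn as nn

# Toy data with a strongly non-linear relation: y = sin(3x) + noise.
torch.manual_seed(0)
x = torch.rand(2000, 1) * 4 - 2          # x uniform in [-2, 2]
y = torch.sin(3 * x) + 0.1 * torch.randn_like(x)

# Variational network q(y|x) = N(mu(x), sigma^2(x)) with assumed layer sizes
mu_net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))
logvar_net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())
opt = torch.optim.Adam(list(mu_net.parameters()) + list(logvar_net.parameters()), lr=1e-3)

for step in range(3000):
    mu, logvar = mu_net(x), logvar_net(x)
    # maximize log q(y|x): MLE of the conditional Gaussian (constants dropped)
    nll = ((mu - y) ** 2 / (2 * logvar.exp()) + logvar / 2).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()

with torch.no_grad():
    x_test = torch.tensor([[-1.5], [0.0], [1.5]])
    print(mu_net(x_test).squeeze())        # should be close to sin(3 * x_test)
    print(torch.sin(3 * x_test).squeeze())
```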


bonehan commented Apr 28, 2022

Hi, thanks for the good work. I have a general question: according to your code, the positive term in the PyTorch version subtracts a logvar term, but the TensorFlow version doesn't. Is there any particular reason behind these two versions? I also encounter a problem in MI minimization: the MI estimate in the earlier training epochs is always < 0. Is this reasonable, and are there any tips to solve it?

Linear95 (Owner) commented:

Calculating CLUB without logvar is equivalent to setting the variance of the conditional Gaussian p(y|x) to 1, which is still within our theoretical framework. By fixing the variance of p(y|x), we obtain a more stable but less flexible MI estimation. For negative MI estimation, you can check my suggestion here.
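
A small sketch of the two positive terms being contrasted (the tensor values are placeholders; this is not the repository code):

```python
import torch

# Placeholder tensors: mu(x), logvar(x), and y, all of shape (batch, dim)
torch.manual_seed(0)
mu, logvar, y = torch.randn(4, 3), torch.randn(4, 3).tanh(), torch.randn(4, 3)

# Positive term with a learned variance:
#   log q(y|x) = -(y - mu)^2 / (2 * exp(logvar)) - logvar / 2   (constants dropped)
positive_learned_var = (-(y - mu) ** 2 / (2 * logvar.exp()) - logvar / 2).sum(-1)

# Positive term with the variance fixed to 1 (logvar = 0), i.e. q(y|x) = N(mu(x), I):
#   log q(y|x) = -(y - mu)^2 / 2                                 (constants dropped)
positive_fixed_var = (-(y - mu) ** 2 / 2).sum(-1)

print(positive_learned_var, positive_fixed_var)
```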


LindgeW commented Apr 9, 2024

When var is set to 1, will it lead to any performance degradation?


LindgeW commented Apr 11, 2024

Is the tanh activation function of the logvar required? Can you remove it or just replace it with something else?
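
The thread does not answer this directly. Purely for illustration, here are two commonly used alternatives one could try in place of a Tanh-bounded logvar head (assumptions, not the author's recommendation): an unbounded head clamped to a numeric range, and a softplus-parameterized variance.

```python
import torch
import torch.nn as nn

# Two illustrative alternatives to a Tanh-bounded logvar head (assumptions only):
dim, hidden = 16, 64

# (a) unbounded linear head, clamped to a reasonable range for numerical stability
logvar_head = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
def variance_a(x):
    return logvar_head(x).clamp(min=-6.0, max=6.0).exp()

# (b) predict the variance directly through softplus, which keeps it positive
var_head = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
def variance_b(x):
    return nn.functional.softplus(var_head(x)) + 1e-6   # small floor avoids div-by-zero

x = torch.randn(8, dim)
print(variance_a(x).shape, variance_b(x).shape)
```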
