
How to use learned latent direction from .npy files #1

Closed
tlack opened this issue Oct 9, 2021 · 3 comments
tlack commented Oct 9, 2021

Hey there,

I've been eagerly setting up WarpedGAN on a Google Colab today and ran into a problem.

I was able to successfully run traverse_attribute_space and I see gender.npy, etc.

But these are (128,33) and ProGAN's z is (1,512).

I think I have to apply the loaded latent to the Support Set, but the exact mechanism is unclear to me.

Is there somewhere in the source that I can see how this works? How did you generate those nifty GIFs on the dev-eval branch?

tlack changed the title from "How to use learned latent direction in .npy files" to "How to use learned latent direction from .npy files" on Oct 9, 2021
chi0tzp (Owner) commented Oct 9, 2021

Hi @tlack ,

Thanks for your interest in our work. Have you trained a model yourself, or have you used a pre-trained one? The .npy files produced by traverse_attribute_space.py contain the attribute values along the paths for a given latent code. In the example you mention, 128 denotes the number of paths (i.e., the number of warping functions you've learned), while 33 denotes the number of images generated along each path.
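
For reference, here's a minimal sketch of loading and inspecting such a file (just an illustration with numpy, assuming the file is a plain 2-D array as described above):

import numpy as np

A = np.load("gender.npy")         # attribute values along each path, shape (128, 33)
num_paths, num_steps = A.shape    # 128 warping functions (paths), 33 images per path
print(num_paths, num_steps)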

You may want to use the pre-trained model for ProgGAN (which you can download using download_models.py -- not in the master branch yet).

The GIFs in the dev-eval branch are produced by create_gif.py for a given path-id; for instance:

python create_gif.py --gif-size=196 --num-imgs=7 --dir=experiments/complete/ProgGAN-ResNet-K200-D512-LearnGammas-eps0.1_0.2/results/ProgGAN_4_F/56_0.15_8.4/435c92ab04f994fd192526b9107396747caf283a/ --path-id=96

The magic number (--path-id=96) is given by rank_interpretable_paths.py, which I'm currently refactoring and will push to master very soon. As explained in the paper (Sect. 4), this script ranks the discovered paths based on the correlation of each path with the attribute vector. In this example, path 96 gives the greatest correlation for the attribute Lip Corner Puller, aka AU_12, aka Smiling.

I'll merge dev-eval into master soon, and I'll also add rank_interpretable_paths.py asap, but you may already start looking at the GIFs of the discovered paths (e.g., experiments/complete/ProgGAN-ResNet-K200-D512-LearnGammas-eps0.1_0.2/results/ProgGAN_4_F/56_0.15_8.4/paths_gifs/).

tlack (Author) commented Oct 9, 2021

You can see my awful, fledgling attempts at getting your stuff going here:

https://colab.research.google.com/drive/188bKhg_tNwjUVo4BXsiwywKnCSaT3e0x?usp=sharing

My goal for this experiment is to enter a bunch of English descriptors (skinny / fat, young / old, scared / excited), create attribute directions by adding a CLIP step into traverse_attribute.. (in the same fashion you have used those other classifiers), and then allow the end user to navigate through those latents using the learned descriptor directions, along with other manipulations.
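
The CLIP step I have in mind would look roughly like this (just a sketch with the openai/clip package; the descriptor prompts and the image path are placeholders):

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder descriptor pair and a single generated image from a path
text = clip.tokenize(["a photo of a young person", "a photo of an old person"]).to(device)
image = preprocess(Image.open("path_image.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    sims = (img_feat @ txt_feat.T).squeeze(0)   # cosine similarity to each descriptor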

I'm starting from ProGAN because I've had good luck with that family in other experiments.

I think I understand your guidance here: determine the best paths for each attribute using rank... and then use that path ID to retrieve workable Z's for the GAN space.

I guess create-gif kinda works from the output of traverse.. (in that it reads finished images) so I will try to understand the linkage there.

Thanks for your detailed and very rapid response. And for providing code that actually works out of the box! This may be a first in machine learning paper history. :)

chi0tzp (Owner) commented Oct 9, 2021

You can see my awful, fledgling attempts at getting your stuff going here:

https://colab.research.google.com/drive/188bKhg_tNwjUVo4BXsiwywKnCSaT3e0x?usp=sharing

My goal for this experiment is to enter a bunch of English descriptors (skinny / fat, young / old, scared / excited), create attribute directions by adding a CLIP step into traverse_attribute.. (in the same fashion you have used those other classifiers), and then allow the end user to navigate through those latents using the learned descriptors directions, along with other manipulations.

Hey @tlack, first of all, thanks for taking the time to extend our method! We've also been thinking in this direction and may try something in the future. Before anything else, please have a look at another very relevant ICCV'21 paper. It's very close to what we do (they also try to optimize a vector field), but they do it in a supervised way, and they have an NLP module for editing based on verbal instructions.

I'm starting from ProGAN because I've had good luck with that family in other experiments.

I think I understand your guidance here: determine the best paths for each attribute using rank... and then use that path ID to retrieve workable Z's for the GAN space.

I'm not trying to be cryptic or anything, I just need some time to refactor the script and provide an easy-to-follow piece of code. Regardless, what we actually do is briefly described in the paper as follows:

In order to obtain a measure on how well the paths generated by a warping function are correlated with a certain attribute, we estimate the average Pearson’s correlation between the index of the step along the path and the corresponding values in the attribute vector.

Thus, we compute Pearson's correlation between the step along the path (i.e., the step index: 1, 2, ..., num_of_generated_images_in_path) and the values of the respective attribute. So, suppose that A is an MxN numpy array where the t-th row A_t = A[t, :] contains the values of the t-th attribute across all images of a path. Then, the correlation (which technically is not exactly Pearson's, but an "un-normalized" version of it) is given as:

import numpy as np

A_t_idx = np.arange(A_t.shape[0])  # step indices along the path
corr = np.cov(A_t, A_t_idx)[0, 1] / np.sqrt(np.cov(A_t_idx))  # covariance with the step index, divided by the step index's std
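
Until I push rank_interpretable_paths.py, here is a rough sketch of that ranking idea applied to a whole attribute array (my illustration here, not the actual script; it assumes an array like the (128, 33) gender.npy above, with one row per path):

import numpy as np

def rank_paths(A):
    # A: (num_paths, num_steps) attribute values along each path
    idx = np.arange(A.shape[1])
    corrs = np.array([np.cov(A[t], idx)[0, 1] / np.sqrt(np.cov(idx)) for t in range(A.shape[0])])
    # Path ids sorted from strongest to weakest (absolute) correlation
    return np.argsort(-np.abs(corrs)), corrs

order, corrs = rank_paths(np.load("gender.npy"))
print("most correlated path id:", order[0])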

I guess create-gif kinda works from the output of traverse.. (in that it reads finished images) so I will try to understand the linkage there.

create_gif.py is really trivial. It just takes a directory with the images generated by traverse_latent_space.py across a given path for a given latent code -- for instance, with --dir=experiments/complete/ProgGAN-ResNet-K200-D512-LearnGammas-eps0.1_0.2/results/ProgGAN_4_F/56_0.15_8.4/435c92ab04f994fd192526b9107396747caf283a/ --path-id=96 it reads the directory experiments/complete/ProgGAN-ResNet-K200-D512-LearnGammas-eps0.1_0.2/results/ProgGAN_4_F/56_0.15_8.4/435c92ab04f994fd192526b9107396747caf283a/paths_images/path_096/ -- and just creates the GIF. That's only been created for the README.md :)
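
Roughly, it boils down to something like this (a sketch of the idea, not the actual script; the frames directory, image extension, and GIF settings are assumptions):

from pathlib import Path
from PIL import Image

frames_dir = Path("paths_images/path_096")      # hypothetical: one of the path_XXX folders
frame_files = sorted(frames_dir.glob("*.jpg"))  # assuming JPEG frames, named in step order
frames = [Image.open(f).resize((196, 196)) for f in frame_files]
frames[0].save("path_096.gif", save_all=True, append_images=frames[1:], duration=150, loop=0)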

Thanks for your detailed and very rapid response. And for providing code that actually works out of the box! This may be a first in machine learning paper history. :)

Thank you! Please consider closing the issue if the above answers your questions. I'll push the remaining script asap, stay tuned :)

tlack closed this as completed Oct 9, 2021