## Controllable Generation

Related Paper:

- [Interpreting the Latent Space of GANs for Semantic Face Editing](https://arxiv.org/pdf/1907.10786)

Helpful Resource:
- [GANs specialization on Coursera](https://www.coursera.org/specializations/generative-adversarial-networks-gans)  $\; \rightarrow \;$ most of the content of this notebook has been borrowed from this course
- [The GAN Zoo](https://github.com/hindupuravinash/the-gan-zoo)

Controllable generation refers to modifying specific features of a GAN's outputs after training. It controls features such as age, hair color, or sunglasses by tweaking the input noise vector $z$ and feeding the modified vector into the generator. For example, if a noise vector produces an image of a woman with red hair and we want the same woman with blue hair, we can tweak the noise vector to achieve this.

| **Conditional Generation** | **Controllable Generation** |
|:--------------------------:|:---------------------------:|
| Requires labeled data during training. | Does not require labeled data during training. |
| Outputs examples from the classes we want (e.g., dog, cat). | Outputs examples with the features we specify (e.g., old dog, black cat). |
| Appends a class vector to the noise vector. | Manipulates the noise vector $z$ after the model has been trained. |
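
The difference in how the generator input is built can be sketched as follows. This is a minimal illustration rather than code from the course: the sizes `z_dim` and `n_classes` are hypothetical, and `direction` is a random stand-in for a feature direction that would have to be found separately.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim, n_classes = 8, 3            # hypothetical sizes, chosen for illustration

z = rng.standard_normal(z_dim)     # noise vector

# Conditional generation: a one-hot class vector is appended to the noise
# vector, so the generator has to be trained on labeled data.
class_index = 1
one_hot = np.eye(n_classes)[class_index]
conditional_input = np.concatenate([z, one_hot])

# Controllable generation: the input keeps its shape. An already trained
# generator is simply fed a shifted noise vector.
direction = rng.standard_normal(z_dim)   # stand-in for a feature direction
controllable_input = z + 0.5 * direction

print(conditional_input.shape, controllable_input.shape)
```
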

#### **Vector Algebra in the Z-space**

- The space in which the noise vector $z$ resides is called the Z-space.

**Similarity of Noise Vectors to Word Embeddings**

- We can think of the noise vector as a word embedding vector. 
- Just as each dimension of a word embedding vector can be thought of as representing a feature of the word, each dimension of the noise vector can be thought of as representing a feature of the image to be generated.
- For example, the dimensions of a word embedding vector might represent features like masculine/feminine or singular/plural, while the dimensions of a noise vector might represent features like hair color or age.

**Direction of features in Z-space**

- Suppose we have two noise vectors $z_1$ and $z_2$ that produce two different outputs.
- $z_1$ produces a woman with red hair and $z_2$ produces the same woman but with blue hair.
- The difference between the two noise vectors, $d = z_2 - z_1$, gives the direction in which we have to move in the Z-space to change the hair color of the images we generate. We call $d$ the direction vector.
- Moving along $d$ gradually changes the hair color of the output image from red to blue: the noise vectors $z_1 + \alpha d$ with $0 < \alpha < 1$ lie between $z_1$ and $z_2$ and give the shades in between.
- Adding $d$ to $z_1$ gives $z_2$. In other words, adding the direction vector for a desired feature to the current noise vector yields a new noise vector that produces that feature.
- Controllable generation therefore comes down to finding direction vectors for the features we want to control.
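
The vector arithmetic above can be sketched in a few lines. This is only an illustration: `z1` and `z2` are random stand-ins for noise vectors that would, with a real trained generator, produce the red-haired and blue-haired images, and `move_along` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim = 16                        # hypothetical noise dimension

z1 = rng.standard_normal(z_dim)   # stand-in: produces red hair
z2 = rng.standard_normal(z_dim)   # stand-in: same woman, blue hair

d = z2 - z1                       # direction vector for the hair color change

def move_along(z, direction, alpha):
    """Shift a noise vector by alpha times the direction vector."""
    return z + alpha * direction

# Five noise vectors from z1 (alpha = 0) to z2 (alpha = 1); each one would be
# passed through the generator to get an intermediate image.
alphas = np.linspace(0.0, 1.0, 5)
steps = [move_along(z1, d, a) for a in alphas]
```

With `alpha = 1` the result is exactly `z2`, and values between 0 and 1 give the intermediate noise vectors.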

#### **Challenges with Controllable Generation**

**Challenges**
- Feature correlation.
- Z-space entanglement.

**Feature correlation**

When different features are strongly correlated in the training dataset, it becomes difficult to control one of them without also modifying the features correlated with it.

For example, suppose we want to add a beard to a woman's face without altering anything else. If the beard feature is uncorrelated with other features, adding a beard does just that, with no side effects. However, if beards are strongly correlated with another feature in the training data, say masculinity, then adding a beard will unintentionally make the face appear more masculine as well.

!["feature_corr"](./imgs/control_gen_feature_corr.png "feature_corr")
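
If the dataset happens to come with attribute labels, one rough way to spot this problem before training is to measure the correlation between them. The sketch below uses synthetic labels only: the beard rates of 60% and 1% are made-up assumptions for illustration, not measurements from any real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic binary labels (assumed rates, for illustration only).
masculine = rng.random(n) < 0.5
beard = np.where(masculine,
                 rng.random(n) < 0.60,   # assumed beard rate, masculine faces
                 rng.random(n) < 0.01)   # assumed beard rate, other faces

# Pearson correlation between the two binary attributes.
corr = np.corrcoef(masculine.astype(float), beard.astype(float))[0, 1]
print(f"correlation(beard, masculine) = {corr:.2f}")
```

A value far from zero suggests that a direction learned for one attribute is likely to carry the other attribute along with it.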

**Z-space entanglement**

- Features are intertwined in the Z-space.
- Adjusting one feature (e.g., glasses) may also change unrelated features (e.g., hair or beard).
- This tends to happen when the Z-space has too few dimensions for the number of features, or when the model is poorly trained.
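
A toy illustration of entanglement, under strong simplifying assumptions: the "generator" here is a hypothetical linear map from three noise dimensions to three feature strengths (glasses, hair, beard), which is far simpler than a real GAN. The matrix values are arbitrary.

```python
import numpy as np

# Rows: features (glasses, hair, beard). Columns: noise dimensions.
# Hypothetical weights, chosen only to show the effect.
W_entangled = np.array([
    [1.0, 0.6, 0.4],
    [0.5, 1.0, 0.3],
    [0.7, 0.2, 1.0],
])
# For contrast: each noise dimension drives exactly one feature.
W_independent = np.eye(3)

def feature_change(W, dim, step=1.0):
    """Change in feature strengths when one noise dimension is shifted."""
    z = np.zeros(W.shape[1])
    z_shifted = z.copy()
    z_shifted[dim] += step
    return W @ z_shifted - W @ z

print(feature_change(W_entangled, dim=0))    # all three features move
print(feature_change(W_independent, dim=0))  # only the first feature moves
```
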

#### Causes of Challenges
- **Dataset-related:** High correlation between features in the dataset.
- **Model-related:** Insufficient dimensions in the Z-space or suboptimal training.

#### Summary
- Controlling features in GANs can be difficult due to correlations in the data and entanglement in the Z-space.
- Overcoming these issues requires better data preparation and model design.

#### **Classifier Gradients**

#### Purpose
- Use pre-trained classifiers to find directions in the Z-space for controlling features post-training.

#### How it Works
1. Pass a batch of noise vectors $z$ through the generator to create images.
2. Use a pre-trained classifier to score whether a feature (e.g., sunglasses) is present in each image.
3. Adjust the noise vectors, keeping the generator weights fixed:
   - Move each $z$ in the gradient direction that increases the classifier's likelihood of the feature.
   - Example: penalize images the classifier scores as having no sunglasses, and update the noise vectors until the generated images have sunglasses.
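
The steps above can be sketched with a toy setup. Everything here is an assumption made for illustration: the "generator" is a fixed random map `tanh(A z)`, the "classifier" is a fixed logistic model, and the gradient is written out by hand. With a real GAN, an autograd framework would compute the gradient with respect to $z$ instead.

```python
import numpy as np

rng = np.random.default_rng(0)
n, z_dim, x_dim = 4, 8, 32        # hypothetical batch and layer sizes

# Frozen toy models: neither A nor (w, b) is updated below.
A = rng.standard_normal((x_dim, z_dim)) / np.sqrt(z_dim)   # "generator"
w = rng.standard_normal(x_dim) / np.sqrt(x_dim)            # "classifier"
b = 0.0

def generator(Z):
    return np.tanh(Z @ A.T)                       # shape (n, x_dim)

def classifier(X):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))     # P(feature present)

def grad_log_prob_wrt_z(Z):
    """Gradient of log P(feature) with respect to each noise vector."""
    X = generator(Z)
    p = classifier(X)
    dlogp_dx = (1.0 - p)[:, None] * w             # through the sigmoid
    return (dlogp_dx * (1.0 - X**2)) @ A          # through tanh and A

Z = rng.standard_normal((n, z_dim))
p_before = classifier(generator(Z))

# Gradient ascent on the noise vectors only.
for _ in range(100):
    Z = Z + 0.05 * grad_log_prob_wrt_z(Z)

p_after = classifier(generator(Z))
print(p_before.mean(), p_after.mean())
```

Only `Z` changes inside the loop, which mirrors the point that the trained generator itself is left untouched.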

#### Advantages
- Simple and efficient way to control features.
- No need to modify the trained generator.

#### Requirements
- Access to a reliable, pre-trained classifier for the desired feature.
- If no such classifier is available, train your own.

#### Summary
- Classifier gradients allow post-training feature control by leveraging pre-trained classifiers to adjust the noise vectors $z$.
- The method is straightforward but requires a suitable classifier.

#### **Disentanglement**

#### Definition
- **Entangled Z-space:** Changes in one dimension of the noise vector $z$ affect multiple features.
- **Disentangled Z-space:** Each dimension corresponds to a single feature, enabling precise control.

#### Importance of Disentanglement
- Allows targeted changes:
  - Example: Modify glasses without affecting hair or beard.
- Simplifies feature control for continuous attributes.

#### Encouraging Disentanglement
1. **Supervised Methods:**
   - Label data and associate specific dimensions of $z$ with features.
   - Example: Label images by hair color or length.
   - Limitation: Time-consuming and impractical for continuous features.

2. **Unsupervised Methods:**
   - Add regularization to the loss function during training.
   - Encourage each dimension of $z$ to represent a feature independently.
   - No labeled data is required.
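
One way to picture the supervised variant is to reserve the first few dimensions of the noise vector for labeled features and leave the rest as free noise. The sketch below is a hypothetical scheme for illustration, not the course's implementation: `feature_names` and `build_noise` are made-up names, and a generator would have to be trained on inputs built this way for the reserved dimensions to mean anything.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim = 16                                   # hypothetical noise dimension
feature_names = ["hair_length", "glasses"]   # hypothetical labeled features

def build_noise(labels, rng):
    """Put label values in the first dimensions, random noise in the rest."""
    reserved = np.array([labels[name] for name in feature_names], dtype=float)
    free = rng.standard_normal(z_dim - len(feature_names))
    return np.concatenate([reserved, free])

z = build_noise({"hair_length": 0.8, "glasses": 1.0}, rng)

# Editing one feature means changing only its reserved dimension.
z_edit = z.copy()
z_edit[feature_names.index("glasses")] = 0.0
```

The limitation noted above shows up directly: every training image needs a value for each reserved dimension, which is tedious to label, especially for continuous features.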

#### Summary
- Disentangled Z-spaces ensure individual dimensions control specific features.
- Supervised and unsupervised methods can help achieve disentanglement, improving controllable generation.