# **HW2-P2: StyleGAN2-Ada**

In this assignment, you will work with a state-of-the-art generative model called **StyleGAN2-Ada** ([Karras et al., 2020a](https://arxiv.org/abs/2006.06676)).


## **Background**

### **StyleGAN**

To understand StyleGAN2-Ada, we first revisit the **StyleGAN** architecture ([Karras et al., 2019](https://arxiv.org/abs/1812.04948)). Unlike a standard GAN, StyleGAN’s generator is designed to provide **control over different levels of visual detail**, such as:

<br/>

| Detail Level | Examples |
|-------------|----------|
| Fine        | Hair strands, freckles |
| Mid         | Eye openness, hairstyle |
| Coarse      | Face shape, head pose, glasses |

<br/>

This control comes from the generator architecture shown in Figure 1 of the paper.

StyleGAN first maps a latent code $z \in \mathbb{R}^{512}$ into an **intermediate latent** $w \in \mathbb{R}^{512}$ using an 8-layer MLP called the **mapping network**. The **synthesis network** then gradually converts $w$ into an image by progressively increasing resolution (from $4 \times 4$ to $1024 \times 1024$). Because $w$ is fed into every layer separately, different layers can control coarse, medium, and fine features.
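To make the shapes concrete, here is a toy NumPy stand-in for the mapping network. It is a sketch, not the trained network: the weights are random, and it only mirrors the structure (normalize $z$, then 8 fully connected layers with leaky ReLU). It returns $w$ repeated once per synthesis layer, which is the `[N, num_ws, w_dim]` layout that `G_ema.mapping` returns in the PyTorch implementation. The count `num_ws = 2 * log2(resolution) - 2` (18 for 1024×1024) is our reading of that implementation's layer bookkeeping and should be treated as an assumption.

```python
import numpy as np

def normalize_2nd_moment(z, eps=1e-8):
    # Rescale each latent so that its mean squared entry is 1
    return z / np.sqrt(np.mean(z ** 2, axis=1, keepdims=True) + eps)

def leaky_relu(x, slope=0.2):
    return np.where(x >= 0, x, slope * x)

def toy_mapping(z, num_layers=8, w_dim=512, resolution=1024, seed=0):
    """Toy z -> w mapping with RANDOM weights (shapes only, not the trained network).

    z: [N, z_dim] latent codes.
    Returns w repeated once per synthesis layer: [N, num_ws, w_dim].
    """
    rng = np.random.RandomState(seed)
    x = normalize_2nd_moment(z)
    for _ in range(num_layers):
        # One fully connected layer with a random (hypothetical) weight matrix
        weight = rng.randn(x.shape[1], w_dim) / np.sqrt(x.shape[1])
        x = leaky_relu(x @ weight)
    num_ws = 2 * int(np.log2(resolution)) - 2  # 18 for 1024x1024, 16 for 512x512
    return np.repeat(x[:, None, :], num_ws, axis=1)
```

For example, `toy_mapping(np.random.RandomState(1).randn(3, 512))` has shape `(3, 18, 512)`, and all 18 rows per sample are identical until style mixing replaces some of them.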

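**Additional techniques used in StyleGAN include:**

- WGAN-GP objective (Gulrajani et al., 2017)
- Adaptive Instance Normalization (AdaIN) (Huang & Belongie, 2017), which applies the style computed from $w$ to each layer's feature maps
- Truncation trick in latent space (Brock et al., 2018), applied at sampling time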
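AdaIN normalizes each feature map of each sample to zero mean and unit standard deviation, then applies a per-channel scale $y_s$ and bias $y_b$: $\mathrm{AdaIN}(x_i, y) = y_{s,i} \frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}$. In StyleGAN, $y = (y_s, y_b)$ comes from a learned affine transform of $w$. The NumPy sketch below takes $y_s$ and $y_b$ as given inputs (the affine transform is omitted), so each output channel ends up with the requested mean and standard deviation.

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-8):
    """Adaptive instance normalization (sketch).

    x:           feature maps, shape [N, C, H, W]
    style_scale: per-channel scale y_s, shape [N, C]
    style_bias:  per-channel bias y_b, shape [N, C]
    """
    mu = x.mean(axis=(2, 3), keepdims=True)       # per-sample, per-channel mean
    sigma = x.std(axis=(2, 3), keepdims=True)     # per-sample, per-channel std
    x_norm = (x - mu) / (sigma + eps)             # each feature map -> zero mean, unit std
    # Apply the style: broadcast [N, C] to [N, C, 1, 1]
    return style_scale[:, :, None, None] * x_norm + style_bias[:, :, None, None]
```

Because the statistics of every feature map are overwritten, information carried by relative feature magnitudes is lost; this is the behavior StyleGAN2 later identifies as the source of the droplet artifacts.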

### **StyleGAN2**

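StyleGAN sometimes produces characteristic visual artifacts. Blob-like **“water droplet”** artifacts are caused by AdaIN, which normalizes the mean and variance of each feature map separately. Progressive growing, in turn, makes details such as teeth and eyes stick to fixed image positions instead of moving with the face. StyleGAN2 resolves these by:

- Replacing AdaIN with **weight modulation and demodulation** (removing the droplet artifacts)
- Replacing progressive growing with **skip connections** in the generator and residual connections in the discriminator (removing the position-sticking artifacts)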
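Weight demodulation moves the normalization from the feature maps into the convolution weights. With per-sample style scales $s_i$ for input channel $i$, StyleGAN2 modulates $w'_{ijk} = s_i \cdot w_{ijk}$ and then demodulates $w''_{ijk} = w'_{ijk} / \sqrt{\sum_{i,k} {w'_{ijk}}^2 + \epsilon}$, where $j$ indexes output channels and $k$ kernel positions. If the inputs have unit variance, each output channel then also has approximately unit variance. A NumPy sketch of the weight computation only (the grouped convolution that applies these per-sample weights is omitted):

```python
import numpy as np

def modulate_demodulate(weight, style, demodulate=True, eps=1e-8):
    """StyleGAN2-style weight modulation/demodulation (sketch).

    weight: conv weights, shape [out_ch, in_ch, kh, kw]
    style:  per-sample input-channel scales s_i, shape [N, in_ch]
    Returns per-sample weights, shape [N, out_ch, in_ch, kh, kw].
    """
    # Modulation: w'_ijk = s_i * w_ijk (scale each input channel i)
    w = weight[None, :, :, :, :] * style[:, None, :, None, None]
    if demodulate:
        # Demodulation: divide each output channel j by sqrt(sum over i, k of w'^2 + eps)
        norm = np.sqrt(np.sum(w ** 2, axis=(2, 3, 4), keepdims=True) + eps)
        w = w / norm
    return w
```

After demodulation, the squared weights of every output channel sum to (almost exactly) 1 for each sample, which is what keeps the output scale fixed without normalizing the feature maps themselves.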

### **StyleGAN2-Ada**

Training StyleGAN2 from scratch requires large datasets (FFHQ, for example, contains 70,000 face images) and weeks of GPU time. Training on small datasets typically causes the discriminator to overfit to the training images, after which training diverges. StyleGAN2-Ada introduces **Adaptive Discriminator Augmentation (ADA)**, which augments every image shown to the discriminator (real and generated) with probability $p$ and automatically adjusts $p$ during training. This enables **high-quality training using only a few thousand images**, while maintaining StyleGAN2’s output quality.
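ADA adjusts $p$ with an overfitting heuristic from the paper, $r_t = \mathbb{E}[\mathrm{sign}(D_{\text{train}})]$, the average sign of the discriminator's outputs on real training images ($r_t = 0$: no overfitting, $r_t = 1$: complete overfitting). If $r_t$ exceeds a target (0.6 in the paper), $p$ is increased by a fixed step; otherwise it is decreased. The sketch below assumes the step size lets $p$ move from 0 to 1 over 500k images, with one adjustment every 4 minibatches; these defaults follow our reading of the official training loop and are assumptions. `d_real_logits` stands for the real-image discriminator outputs collected since the last adjustment, and clamping $p$ to $[0, 1]$ is our choice.

```python
import numpy as np

def ada_overfit_heuristic(d_real_logits):
    """r_t = E[sign(D(x_real))]: 0 means no overfitting, 1 means complete overfitting."""
    return float(np.mean(np.sign(d_real_logits)))

def ada_update_p(p, d_real_logits, target=0.6, batch_size=64, interval=4, ada_kimg=500):
    """One ADA adjustment step (sketch).

    Moves the augmentation probability p up if r_t > target and down otherwise,
    by a fixed step sized so that p could go from 0 to 1 over ada_kimg thousand images.
    """
    r_t = ada_overfit_heuristic(d_real_logits)
    step = batch_size * interval / (ada_kimg * 1000)
    p = p + np.sign(r_t - target) * step
    return float(np.clip(p, 0.0, 1.0)), r_t
```

If the discriminator is confidently positive on every real image (`r_t = 1.0`), one call raises `p` from 0 to 0.000512 with these defaults; if it is negative on all of them, `p` stays clamped at 0.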

In this assignment, we will use a **pre-trained StyleGAN2-Ada generator**.

<br/>

---

<br/>

## **Experiments**

<br/>

### 1.  **Sampling and Identifying Fake Images**

Your goal is to generate a small row of 3–5 images using a pre-trained model.

**Instructions:**

1.  **Generator:** Choose one of the pre-trained generators.
2.  **Complete Functions:** Implement the `generate_latent_code` and `generate_images` functions.
3.  **Follow Documentation:** To complete these functions, follow the instructions in this notebook and refer to the [Official PyTorch implementation](https://github.com/NVlabs/stylegan2-ada-pytorch.git).
4.  **(optional) Review Images:** Once your images are generated, you can use [Which Face Is Real?](https://www.whichfaceisreal.com/learn.html) as a guideline to help you spot any imperfections.

<br/>

### 2. **Latent Space Interpolation**

Your goal is to complete the `interpolate_images` function. This will generate a sequence of images that smoothly transitions between two random faces.

**Instructions:**

1.  **Get Latent Vectors:** Use your `generate_latent_code` function (from Part 1) to create two separate latent vectors, $z_1$ and $z_2$.
2.  **Implement Interpolation:** Linearly interpolate between $z_1$ and $z_2$ using the formula below. You'll need to create several steps for the value $r$ (e.g., `0, 0.1, 0.2, ... 1.0`) to create a smooth transition.

    $$z = r z_{1} + (1 - r) z_{2}, \quad r \in [0, 1]$$

3.  **Generate Images:** Feed each resulting interpolated latent code ($z$) into the StyleGAN2-Ada generator. This will produce the final sequence of images.
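The interpolation formula can be sketched in NumPy, independent of the generator. Note the direction: with $z = r z_1 + (1 - r) z_2$, $r = 0$ gives $z_2$ and $r = 1$ gives $z_1$. One subtlety: for two independent Gaussian latents in 512 dimensions, the linear midpoint has a norm of about $1/\sqrt{2}$ times that of a typical sample, so it lies in a low-density region of the prior. For this reason, spherical interpolation (slerp) is a common alternative in $Z$ (the StyleGAN paper uses slerp in $Z$ for its perceptual path length metric). The `slerp` helper below is that swapped-in technique, not part of the assignment, which asks for the linear version.

```python
import numpy as np

def lerp(z1, z2, r):
    """Linear interpolation matching the formula above: z = r*z1 + (1-r)*z2."""
    return r * z1 + (1 - r) * z2

def slerp(z1, z2, r, eps=1e-7):
    """Spherical interpolation with the same convention (r=1 -> z1, r=0 -> z2)."""
    cos_omega = np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))  # angle between z1 and z2
    if np.sin(omega) < eps:  # (nearly) parallel vectors: fall back to lerp
        return lerp(z1, z2, r)
    return (np.sin(r * omega) * z1 + np.sin((1 - r) * omega) * z2) / np.sin(omega)

# Interpolation schedule r = 0, 0.1, ..., 1.0 (11 frames including both endpoints)
r_values = np.linspace(0.0, 1.0, 11)
```

Either helper can be applied to the two seed latents for every value in `r_values`; comparing `np.linalg.norm(lerp(z1, z2, 0.5))` with `np.linalg.norm(z1)` shows the norm shrinkage directly.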

<br/>

### 3. **Style Mixing and Fine Detail Control**

In this final part, your goal is to reproduce the famous style mixing example from the original StyleGAN paper (Figure 3). This involves generating images using specific latent codes at different levels of the synthesis network.

<br/>

#### Step 1: **Generate W-space latents from Z-space seeds**

Your first task is to convert random Z-space vectors (from seeds) into W-space latents using the generator’s **mapping network**. These W-space latents (`w`) are used in later steps for style mixing.

**Instructions:**

1. Complete the `latent_to_w` function to map latent codes from Z-space to W-space.  
2. Apply the **truncation trick**:  
   - Compute the average W latent (`w_avg`) from the mapping network.  
   - Truncate each latent using:  

     $$
     w' = w_{avg} + (w - w_{avg}) \times \psi
     $$

     where $\psi$ is your truncation constant (recommended 0.7).  

3. The resulting W-space latents will be used as input for the synthesis network (`G_ema.synthesis`) to generate images.
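The truncation step is a one-line broadcast. This NumPy sketch (a hypothetical `truncate` helper, not part of the official API) shows the shapes involved: `w` is `[N, num_ws, w_dim]` as returned by `G_ema.mapping`, while `G_ema.mapping.w_avg` is a single `[w_dim]` vector that broadcasts over the batch and layer dimensions. $\psi = 1$ leaves `w` unchanged, $\psi = 0$ collapses every sample to `w_avg`, and values in between shrink the distance to `w_avg` by the factor $\psi$, trading diversity for more typical images.

```python
import numpy as np

def truncate(w, w_avg, psi=0.7):
    """Truncation trick: w' = w_avg + (w - w_avg) * psi.

    w:     W-space latents, shape [N, num_ws, w_dim]
    w_avg: average W latent, shape [w_dim] (broadcasts over N and num_ws)
    """
    return w_avg + (w - w_avg) * psi
```

In the PyTorch implementation, passing `truncation_psi` to `G_ema.mapping` (as `generate_images` does) applies the same formula inside the mapping network.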

<br/>

#### Step 2: **Generate images from subsets of the generator (coarse, middle, fine styles)**

Next, use your W-space latents to generate images from different **subsets of layers** in the generator. This allows you to see how different layers influence specific aspects of the image.

- **Coarse layers:** control high-level structure (pose, face shape).  
- **Middle layers:** control medium-level features (facial features, hair style).  
- **Fine layers:** control textures and colors.  

**Task:** Implement the `w_to_image` function.
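Style mixing swaps selected entries of `w`, and each entry feeds specific synthesis layers. The sketch below is a plain-Python reconstruction of how we understand the official `SynthesisNetwork` (in `training/networks.py`) to split `w` among its blocks: the 4×4 block has one conv layer, larger blocks have two, every block has a toRGB layer, and each block's toRGB entry is shared with the next block's first conv. Treat these layer counts as assumptions to verify against the source. `mix_styles` is a NumPy version of the swap that the `style_mixing` function later in this notebook performs with `w[:, col_styles] = ...`.

```python
import math
import numpy as np

def w_index_to_blocks(resolution=1024):
    """Map each index of w to the synthesis block resolutions that read it (sketch)."""
    block_resolutions = [2 ** k for k in range(2, int(math.log2(resolution)) + 1)]
    readers = {}
    w_idx = 0
    for res in block_resolutions:
        num_conv = 1 if res == 4 else 2  # assumed: 4x4 block has a single conv layer
        # Each block reads num_conv + 1 consecutive entries (conv layers + toRGB)
        for i in range(w_idx, w_idx + num_conv + 1):
            readers.setdefault(i, []).append(res)
        w_idx += num_conv  # toRGB entry is shared with the next block's first conv
    return readers

def mix_styles(w_row, w_col, col_styles):
    """Copy the entries listed in col_styles from w_col into a copy of w_row.

    w_row, w_col: arrays of shape [1, num_ws, w_dim]
    """
    w = w_row.copy()
    w[:, col_styles] = w_col[:, col_styles]
    return w
```

Under these assumptions, for a 1024×1024 model indices 0–3 are read by the 4×4 and 8×8 blocks, 4–7 by 16×16 and 32×32, and 8–17 by 64×64 and above, which roughly matches the coarse / middle / fine grouping in the StyleGAN paper.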

<br/>

#### Step 3: **Generate and Analyze the Style-Mixed Grid**

Now you will use your new function to create the mixing grid and analyze the results.

**Part 3a: Generate the Grid**

1.  Copy the code from `style_mixing` into the indicated location in the final cell.
2.  Initialize the `col_seeds`, `row_seeds`, and `col_styles` variables as instructed.
3.  Run the final cell to generate the grid of images.

**Part 3b: Your Analysis (Deliverable)**

After generating the grid, experiment by changing the `col_styles` variable. Then, write your analysis.

1.  In a few sentences, **explain what the `col_styles` variable does.** Describe what the different numbers in the list correspond to (Hint: think about coarse vs. fine details).
2.  Create a **simple experiment** to back up your explanation.
3.  Include **one or two sets of images** (as screenshots) that clearly illustrate the effect of changing `col_styles` and support your argument.

<br/>

---

**Note: To run this notebook efficiently in Google Colab, select a T4 GPU runtime.**

In [None]:
# Install StyleGAN2-ADA PyTorch
!git clone https://github.com/NVlabs/stylegan2-ada-pytorch.git
%cd stylegan2-ada-pytorch

# Install dependencies
!pip install torch torchvision ninja imageio-ffmpeg tqdm

In [39]:
import torch
import pickle
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Import StyleGAN2-ADA PyTorch modules
from torch_utils import misc
from training import networks

Next, we will load a pre-trained StyleGAN2-ADA network.

Each of the following pre-trained networks is specialized to generate one type of image.

In [None]:
# The pre-trained networks are stored as standard pickle files
# Use one of the following URLs to begin

# Human faces: https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl
# CIFAR-10 tiny images (class-conditional; the functions below pass c=None, so they need changes for this model): https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/cifar10.pkl
# European portrait paintings: https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl
# Cats: https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/afhqcat.pkl
# Dogs: https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/afhqdog.pkl

!wget https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/ffhq.pkl

In [None]:
pkl_file = '/content/stylegan2-ada-pytorch/ffhq.pkl'  # or any other pretrained model
with open(pkl_file, 'rb') as f:
    data = pickle.load(f)

# In PyTorch StyleGAN2-ADA, usually you get:
# 'G_ema' : EMA (average) generator
# 'G'     : instantaneous generator (for training)
# 'D'     : discriminator
G_ema = data['G_ema'].cuda().eval()   # long-term average generator
G     = data['G'].cuda().eval()       # instantaneous generator
D     = data['D'].cuda().eval()       # discriminator

print(G_ema, G, D)

In [43]:
import warnings

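# Suppress warnings from the custom CUDA ops (upfirdn2d, bias_act). If these optional
# performance kernels fail to compile, the code falls back to slower reference implementations.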
warnings.filterwarnings("ignore", category=UserWarning, module="torch_utils.ops.upfirdn2d")
warnings.filterwarnings("ignore", category=UserWarning, module="torch_utils.ops.bias_act")

### 1.  **Sampling and Identifying Fake Images**

Your goal is to generate a small row of 3–5 images using a pre-trained model.

**Instructions:**

1.  **Generator:** Choose one of the pre-trained generators.
2.  **Complete Functions:** Implement the `generate_latent_code` and `generate_images` functions.
3.  **Follow Documentation:** To complete these functions, follow the instructions in this notebook and refer to the [Official PyTorch implementation](https://github.com/NVlabs/stylegan2-ada-pytorch.git).
4.  **Review Images:** Once your images are generated, you can use [Which Face Is Real?](https://www.whichfaceisreal.com/learn.html) as a guideline to help you spot any imperfections.

<br/>

In [44]:
# Sample a batch of latent codes {z_1, ..., z_B}, where B is the batch size.
def generate_latent_code(SEED, BATCH, LATENT_DIMENSION=512):
    """
    Generate a batch of random latent codes.

    Args:
        SEED (int): random seed
        BATCH (int): number of latent codes to generate
        LATENT_DIMENSION (int): dimensionality of latent vector (default 512)

    Returns:
        torch.Tensor: latent codes of shape [BATCH, LATENT_DIMENSION]
    """

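    # Use a local RandomState seeded with SEED for reproducibility. This leaves NumPy's
    # global random state untouched and, for BATCH=1, matches the official generate.py
    # (np.random.RandomState(seed).randn(1, z_dim)).
    rng = np.random.RandomState(SEED)

    # Draw latent codes from a standard normal distribution
    latent_codes = rng.randn(BATCH, LATENT_DIMENSION).astype(np.float32)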

    return torch.from_numpy(latent_codes).cuda()  # move to GPU

In [45]:


def generate_images(SEED, BATCH=3, TRUNCATION=0.7):
    """
    Generate a batch of images from latent codes using the PyTorch StyleGAN2-ADA generator.

    Args:
        SEED (int): random seed for latent codes
        BATCH (int): number of images to generate
        TRUNCATION (float): truncation psi (recommended 0.7)

    Returns:
        PIL.Image: concatenated row of generated images
    """
    latent_codes = generate_latent_code(
        SEED, BATCH)  # Use the function from previous cell

    # Generate images using the generator (inference only, so no gradient tracking)
    with torch.no_grad():
        # First get the W-space latents using the mapping network with truncation
        w_latents = G_ema.mapping(latent_codes, None, truncation_psi=TRUNCATION)
        # Then generate images using the synthesis network
        images = G_ema.synthesis(w_latents, noise_mode='const')

    # Convert from tensor to numpy and adjust range from [-1, 1] to [0, 255]
    images = (images.permute(0, 2, 3, 1) * 127.5 +
              128).clamp(0, 255).to(torch.uint8).cpu().numpy()

    # Concatenate images horizontally
    concatenated = np.concatenate(images, axis=1)
    return Image.fromarray(concatenated)



In [None]:
# Generate and display your images
generate_images(SEED=14, BATCH=5, TRUNCATION=0.7)

The output is in Part1_Output.png
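The tensor-to-image conversion used in `generate_images` (and again in `interpolate_images` and `style_mixing`) can be mirrored in NumPy to check the value ranges. This is a sketch with a hypothetical helper name, `to_uint8_row`. The `+ 128` (instead of `+ 127.5`) makes the truncating cast to `uint8` behave like rounding, since `floor(y + 0.5)` rounds `y` for non-negative values.

```python
import numpy as np

def to_uint8_row(images):
    """Convert generator output to a single uint8 image row.

    images:  float array of shape [B, 3, H, W] with values roughly in [-1, 1]
    returns: uint8 array of shape [H, B * W, 3]
    """
    x = np.transpose(images, (0, 2, 3, 1))                   # NCHW -> NHWC
    x = np.clip(x * 127.5 + 128, 0, 255).astype(np.uint8)    # [-1, 1] -> [0, 255]
    return np.concatenate(list(x), axis=1)                   # place images side by side
```

A value of -1 maps to 0, 0 maps to 128, and 1 maps to 255 (after clipping 255.5).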

### 2. **Latent Space Interpolation**

Your goal is to complete the `interpolate_images` function. This will generate a sequence of images that smoothly transitions between two random faces.

**Instructions:**

1.  **Get Latent Vectors:** Use your `generate_latent_code` function (from Part 1) to create two separate latent vectors, $z_1$ and $z_2$.
2.  **Implement Interpolation:** Linearly interpolate between $z_1$ and $z_2$ using the formula below. You'll need to create several steps for the value $r$ (e.g., `0, 0.1, 0.2, ... 1.0`) to create a smooth transition.

    $$z = r z_{1} + (1 - r) z_{2}, \quad r \in [0, 1]$$

3.  **Generate Images:** Feed each resulting interpolated latent code ($z$) into the StyleGAN2-Ada generator. This will produce the final sequence of images.

<br/>

**Submission Notes:**

* Include a **small row of your interpolation images** in your final notebook for submission.
* If submission file size becomes too large, you may also screenshot the results, erase the cell output, and simply add the screenshot to the final ZIP for uploading.

In [47]:


def interpolate_images(SEED1, SEED2, INTERPOLATION=6, TRUNCATION=0.7):
    """
    Linearly interpolate between two latent codes and generate images.

    Args:
        SEED1 (int): random seed for first latent code
        SEED2 (int): random seed for second latent code
        INTERPOLATION (int): number of frames, including both endpoints (recommended 6-10)
        TRUNCATION (float): truncation psi (recommended 0.7)

    Returns:
        PIL.Image: concatenated row of interpolated images
    """
    latent_code_1 = generate_latent_code(
        SEED1, 1)  # Generate first latent code
    latent_code_2 = generate_latent_code(
        SEED2, 1)  # Generate second latent code

    # Create interpolation steps
    r_values = np.linspace(0, 1, INTERPOLATION)

    # List to store interpolated latent codes
    interpolated_latents = []

    for r in r_values:
        # z = r * z1 + (1 - r) * z2: r = 0 gives z2, r = 1 gives z1
        interpolated_z = float(r) * latent_code_1 + (1 - float(r)) * latent_code_2
        interpolated_latents.append(interpolated_z)

    # Concatenate all interpolated latent codes into one batch
    all_interpolated = torch.cat(interpolated_latents, dim=0)

    # Generate images using the generator (inference only, so no gradient tracking)
    with torch.no_grad():
        w_latents = G_ema.mapping(all_interpolated, None,
                                  truncation_psi=TRUNCATION)
        # Then generate images using the synthesis network
        generated_images = G_ema.synthesis(w_latents, noise_mode='const')

    images = (generated_images.permute(0, 2, 3, 1) * 127.5 +
              128).clamp(0, 255).to(torch.uint8).cpu().numpy()

    concatenated = np.concatenate(images, axis=1)
    return Image.fromarray(concatenated)



In [None]:
# Create an interpolation of generated images
interpolate_images(SEED1=14, SEED2=17, INTERPOLATION=6, TRUNCATION=0.7)

The output is in Part2_Output.png

**Note:** After you have generated interpolated images, an interesting task would be to see how you can create a GIF. Feel free to explore a little bit more.
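One way to build the GIF is with Pillow's multi-frame `save` (`save_all`, `append_images`, `duration`, `loop`). The sketch below assumes the row returned by `interpolate_images` consists of `n` equal-width tiles; the helper names `row_to_frames` and `save_gif` are hypothetical. Playing the frames forward and then backward ("boomerang") avoids a visible jump when the GIF loops.

```python
from PIL import Image

def row_to_frames(row_img, n_frames):
    """Split a horizontal row of n_frames equal-width tiles into separate PIL images."""
    tile_w = row_img.width // n_frames
    return [row_img.crop((i * tile_w, 0, (i + 1) * tile_w, row_img.height))
            for i in range(n_frames)]

def save_gif(frames, path, duration_ms=150, boomerang=True):
    """Save frames as an endlessly looping GIF; returns the number of frames written."""
    seq = list(frames)
    if boomerang and len(seq) > 2:
        seq = seq + seq[-2:0:-1]  # e.g. a, b, c, d -> a, b, c, d, c, b
    seq[0].save(path, save_all=True, append_images=seq[1:],
                duration=duration_ms, loop=0)
    return len(seq)
```

Usage: `row = interpolate_images(SEED1=14, SEED2=17, INTERPOLATION=6, TRUNCATION=0.7)`, then `save_gif(row_to_frames(row, 6), 'interpolation.gif')`. For 1024×1024 tiles, resizing each frame (e.g. `frame.resize((256, 256))`) keeps the file small.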

## Part 3: **Style Mixing and Fine Control**

In this final part, your goal is to reproduce the famous style mixing example from the original StyleGAN paper.

#### Step 1: **Generate W-space latents from Z-space seeds**

In this step, you will convert random Z-space vectors (generated from seeds) into W-space using the StyleGAN mapping network. These W-space latents will later allow you to control and mix styles across different layers of the generator.

In [49]:


def latent_to_w(latents, truncation=0.7):
    """Map Z-space latents to W-space using G_ema's mapping network."""

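    # Get the average W latent tracked by the mapping network, shape [w_dim]
    w_avg = G_ema.mapping.w_avg

    # Map the input latents to W-space (c=None: unconditional models only).
    # w has shape [N, num_ws, w_dim]: one copy of w per synthesis layer, e.g. [N, 18, 512] for FFHQ.
    w = G_ema.mapping(latents, None)

    # Apply truncation trick: w' = w_avg + (w - w_avg) * truncation
    # (w_avg broadcasts over the batch and layer dimensions)
    w = w_avg + (w - w_avg) * truncation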

    return w



#### Step 2: **Generate images from subsets of the generator (coarse, middle, fine styles)**

In this step, you will use the W-space latents produced by your mapping function to generate images from different subsets of the StyleGAN synthesis network. This will allow you to visualize how coarse, middle, and fine style layers individually influence the final image.

In [50]:


def w_to_image(w_latents):
    """Generate images (values roughly in [-1, 1]) from W-space latents of shape [N, num_ws, w_dim]."""

    with torch.no_grad():  # inference only
        images = G_ema.synthesis(w_latents, noise_mode='const')
    return images



#### Step 3: **Run Experiments and Analyze**

This step has two parts: running your style-mixing code to generate a grid of images, and then analyzing what happens when different style layers are mixed.

<br/>

**1. Generate the Grid**

In the code cell below, you will:

1. **Initialize** the `col_seeds`, `row_seeds`, and `col_styles` values used for mixing.  
2. **Use** your completed PyTorch functions (`generate_latent_code`, `latent_to_w`, `w_to_image`, and `style_mixing`).  
3. **Run the cell** to produce a grid showing how selected layers from the column seeds modify the base styles of the row seeds.

A recommended set of experiments is:

- **Experiment 1 (Coarse Styles):**
  - `col_seeds = [1, 2, 3, 4, 5]`
  - `row_seeds = [6]`
  - `col_styles = [0, 1, 2, 3, 4]`

- **Experiment 2 (Fine Styles):**
  - `col_seeds = [1, 2, 3, 4, 5]`
  - `row_seeds = [6]`
  - `col_styles = [8, 9, 10, 11, 12]`

These suggested layer ranges roughly align with how early, middle, and late layers influence spatial detail in StyleGAN2 (see the StyleGAN and StyleGAN2 papers).

<br/>

**2. Your Analysis (Submission Requirement)**

After generating the grids, experiment by changing `col_styles` and include the following in your submission:

- **Explanation:** Describe what `col_styles` controls and how these indices map to coarse, middle, or fine style layers.  
- **Evidence:** Provide a small experiment (similar to the examples above) that demonstrates the effect.  
- **Images:** Include *at most two* sets of style-mixing results as supporting evidence (screenshots recommended to reduce file size).  
- **References:**  
  - StyleGAN: https://arxiv.org/pdf/1812.04948.pdf  
  - StyleGAN2: https://arxiv.org/pdf/1912.04958.pdf  

In [None]:
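@torch.no_grad()  # inference only: avoid building autograd graphs
def style_mixing(row_seeds, col_seeds, col_styles):
    """
    Generate a style-mixed image grid (without header cells).

    Each cell (row_seed, col_seed) uses the row seed's W latent, with the layers
    listed in col_styles replaced by the column seed's W latent.
    """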
    # Generate W-space latents
    all_seeds = list(set(row_seeds + col_seeds))
    w_dict = {}
    for seed in all_seeds:
        z = generate_latent_code(seed, BATCH=1)
        w_dict[seed] = latent_to_w(z)

    # Generate style-mixed images
    image_dict = {}
    for row_seed in row_seeds:
        for col_seed in col_seeds:
            # Start with the row seed's W vector
            w = w_dict[row_seed].clone()
            # Replace specific style layers with column seed's W vector
            w[:, col_styles] = w_dict[col_seed][:, col_styles]

            # Generate image using the modified W vector
            img = w_to_image(w)[0]
            # Convert from tensor to numpy and adjust range from [-1, 1] to [0, 255]
            img = (img.permute(1, 2, 0) * 127.5 + 128).clamp(0,
                                                             255).to(torch.uint8).cpu().numpy()
            image_dict[(row_seed, col_seed)] = img

    # Create grid
    img_h, img_w = img.shape[:2]
    canvas = Image.new('RGB', (len(col_seeds)*img_w,
                       len(row_seeds)*img_h), 'black')
    for row_idx, row_seed in enumerate(row_seeds):
        for col_idx, col_seed in enumerate(col_seeds):
            canvas.paste(Image.fromarray(image_dict[(row_seed, col_seed)], 'RGB'),
                         (col_idx*img_w, row_idx*img_h))
    return canvas


# Run the two recommended experiments, with error handling
try:
    # Experiment 1 (Coarse Styles):
    col_seeds = [1, 2, 3, 4, 5]
    row_seeds = [6]
    # Coarse styles (early layers control pose, face shape)
    col_styles = [0, 1, 2, 3, 4]
    print("Generating coarse style mixing grid...")
    image_grid_coarse = style_mixing(row_seeds, col_seeds, col_styles)
    image_grid_coarse.save('style_mixing_coarse.png')
    print("Coarse style mixing grid (affects pose, face shape):")
    display(image_grid_coarse)  # Use display() for better compatibility

except Exception as e:
    print(f"Error generating coarse style mixing grid: {e}")
    print("This may be due to CUDA extension compilation failures.")

try:
    # Experiment 2 (Fine Styles):
    col_seeds = [1, 2, 3, 4, 5]
    row_seeds = [6]
    # Fine styles (late layers control textures, details)
    col_styles = [8, 9, 10, 11, 12]
    print("Generating fine style mixing grid...")
    image_grid_fine = style_mixing(row_seeds, col_seeds, col_styles)
    image_grid_fine.save('style_mixing_fine.png')
    print("Fine style mixing grid (affects textures, fine details):")
    display(image_grid_fine)  # Use display() for better compatibility

except Exception as e:
    print(f"Error generating fine style mixing grid: {e}")
    print("This may be due to CUDA extension compilation failures.")

The outputs are in Part3_Output_1.png and Part3_Output_2.png

In [None]:
# Additional Analysis: Experiment with Mixed Layer Ranges

print("Additional Analysis: Effect of Different col_styles Ranges\n")

# Experiment 1: Very narrow range (only two adjacent layers)
col_seeds = [10, 20, 30, 40, 50]
row_seeds = [7]
# Two specific layers that might control eye/hair details
col_styles = [10, 11]
image_grid_specific = style_mixing(row_seeds, col_seeds, col_styles)
image_grid_specific.save('style_mixing_specific.png')
print("Specific style mixing grid (affects specific facial features, possibly eyes/hair):")
display(image_grid_specific)

# Experiment 2: Broader range covering multiple levels
col_seeds = [10, 20, 30, 40, 50]
row_seeds = [7]
# Broader range covering coarse and medium features
col_styles = [0, 1, 2, 3, 4, 5, 6, 7, 8]
image_grid_broad = style_mixing(row_seeds, col_seeds, col_styles)
image_grid_broad.save('style_mixing_broad.png')
print("Broad style mixing grid (affects both coarse and medium features):")
display(image_grid_broad)

# Experiment 3: Fine detail range
col_seeds = [10, 20, 30, 40, 50]
row_seeds = [7]
# Late layers affecting fine textures and details
col_styles = [12, 13, 14, 15, 16]
image_grid_fine = style_mixing(row_seeds, col_seeds, col_styles)
image_grid_fine.save('style_mixing_fine_12_16.png')  # separate name: do not overwrite the earlier grid
print("Fine style mixing grid (affects textures and fine details):")
display(image_grid_fine)

# Experiment 4: Medium range focusing on facial features
col_seeds = [10, 20, 30, 40, 50]
row_seeds = [7]
# Middle layers affecting facial features and expressions
col_styles = [6, 7, 8, 9]
image_grid_medium = style_mixing(row_seeds, col_seeds, col_styles)
image_grid_medium.save('style_mixing_medium.png')
print("Medium style mixing grid (affects facial features and expressions):")
display(image_grid_medium)

# Experiment 5: Coarse range affecting overall structure
col_seeds = [10, 20, 30, 40, 50]
row_seeds = [7]
col_styles = [0, 1, 2, 3, 4, 5]  # Early layers affecting pose and face shape
image_grid_coarse = style_mixing(row_seeds, col_seeds, col_styles)
image_grid_coarse.save('style_mixing_coarse_0_5.png')  # separate name: do not overwrite the earlier grid
print("Coarse style mixing grid (affects pose and face shape):")
display(image_grid_coarse)

print("\nComparison Summary:")
print("- Coarse [0-5]: Affects overall structure (pose, face shape)")
print("- Medium [6-9]: Affects facial features and expressions")
print("- Specific [10-11]: Affects targeted features (likely eyes/hair)")
print("- Fine [12-16]: Affects textures and fine details")
print("- Broad [0-8]: Affects multiple levels simultaneously")

## Comparison Summary

- **Coarse [0–5]:** Affects overall structure (pose, face shape)  
- **Medium [6–9]:** Affects facial features and expressions  
- **Specific [10–11]:** Affects targeted features (likely eyes/hair)  
- **Fine [12–16]:** Affects textures and fine details  
- **Broad [0–8]:** Affects multiple levels simultaneously  


The outputs are in Part3_Output_3.png, Part3_Output_4.png, Part3_Output_5.png, Part3_Output_6.png and Part3_Output_7.png

# Analysis of col_styles in Style Mixing

## Explanation
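The `col_styles` parameter lists the indices of the per-layer W latents that are copied from the column seed; every other layer keeps the row seed's W latent. Because each index feeds different layers of the StyleGAN2 synthesis network, the choice of indices determines which level of image detail the column seed controls. For the 1024×1024 FFHQ generator, `w` has 18 entries (indices 0–17); following the StyleGAN paper's grouping by resolution, they split roughly as:
- **[0–3] (4×4 to 8×8 layers):** Coarse features (overall pose, face shape, general structure)
- **[4–7] (16×16 to 32×32 layers):** Medium features (facial features, hairstyle, expression)
- **[8–17] (64×64 to 1024×1024 layers):** Fine features (textures, color scheme, skin details, hair strands)

The ranges used in the experiments above only approximately follow these boundaries.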

## Evidence: Additional Experiments

### Specific Range Experiment
Testing with a very narrow range [10, 11] to demonstrate targeted control over specific facial features (possibly eyes/hair).

### Broad Range Comparison
Using a broader range [0, 1, 2, 3, 4, 5, 6, 7, 8] to cover both coarse and medium features for comparison.

## Analysis
Comparing the specific range [10, 11] with the broad range [0–8]:
- The specific range shows more subtle changes focused on particular features
- The broad range shows more dramatic changes affecting overall structure and features
- This demonstrates how `col_styles` directly controls which visual aspects are influenced
- Different layer indices correspond to different levels of spatial detail in the image
