In [1]:
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 3)  # Define a linear layer

        # Apply Xavier initialization
        nn.init.xavier_uniform_(self.linear.weight)

    def forward(self, x):
        return self.linear(x)

# Create an instance of the model
model = MyModel()

# Print the initialized weights
print(model.linear.weight)

Parameter containing:
tensor([[-0.3898,  0.1902],
        [ 0.9608, -1.0769],
        [-0.4479, -1.0935]], requires_grad=True)


In PyTorch, the choice of weight initialization technique can significantly impact the training and performance of neural networks. Here’s a brief guide on when to use each method:

### Xavier (Glorot) Initialization

- **When to Use**: 
  - Best suited for layers with **sigmoid** or **tanh** activation functions.
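  - Scales the weights by both fan-in and fan-out so that the variance of activations (forward pass) and gradients (backward pass) stays roughly constant across layers, which reduces the risk of vanishing or exploding gradients.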

- **How to Implement**:
  - Use `torch.nn.init.xavier_uniform_()` for uniform distribution or `torch.nn.init.xavier_normal_()` for normal distribution.
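
As a rough illustration of what `xavier_uniform_` does, the sketch below recomputes the sampling bound by hand. It assumes the default `gain=1.0`, for which weights are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The layer size matches the `nn.Linear(2, 3)` used in this notebook; the variable names are only illustrative.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # only for reproducibility of this sketch

layer = nn.Linear(2, 3)
nn.init.xavier_uniform_(layer.weight)  # default gain=1.0

# nn.Linear stores its weight as (out_features, in_features)
fan_out, fan_in = layer.weight.shape
bound = math.sqrt(6.0 / (fan_in + fan_out))

print(f"bound = {bound:.4f}")  # bound = 1.0954
# Every sampled weight should lie inside [-bound, bound]
print(bool(layer.weight.abs().max() <= bound + 1e-6))
```

This is consistent with the output of the first cell, where the largest weight magnitudes (about 1.09) sit just below that bound.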

### Kaiming (He) Initialization

- **When to Use**:
  - Recommended for layers with **ReLU** or **Leaky ReLU** activation functions.
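  - Designed for rectifier non-linearities: ReLU zeroes out roughly half of its inputs, so the weights are drawn with a larger variance (`2 / fan_in` in the default fan-in mode) to keep the signal from shrinking layer by layer. Poorly scaled weights can also leave many ReLU units stuck at zero ("dead" neurons).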

- **How to Implement**:
  - Use `torch.nn.init.kaiming_uniform_()` or `torch.nn.init.kaiming_normal_()`.
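
A minimal sketch of the Kaiming scaling, assuming `mode="fan_in"` and `nonlinearity="relu"`: the target standard deviation is then `gain / sqrt(fan_in)` with `gain = sqrt(2)`. A wider layer than the notebook's `nn.Linear(2, 3)` is used here (the sizes are arbitrary) so that the empirical standard deviation is a meaningful estimate.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # only for reproducibility of this sketch

fan_in, fan_out = 512, 256  # arbitrary sizes chosen for illustration
layer = nn.Linear(fan_in, fan_out)
nn.init.kaiming_normal_(layer.weight, mode="fan_in", nonlinearity="relu")

gain = nn.init.calculate_gain("relu")  # sqrt(2) for ReLU
expected_std = gain / math.sqrt(fan_in)
empirical_std = layer.weight.std().item()

print(f"expected std  = {expected_std:.4f}")
print(f"empirical std = {empirical_std:.4f}")
```

The two values should agree closely; switching to `mode="fan_out"` would instead preserve the variance of gradients in the backward pass.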

### Summary

- Use **Xavier initialization** for sigmoid/tanh activations to keep activation and gradient variance roughly constant across layers.
- Use **Kaiming initialization** for ReLU-family activations to preserve signal variance even though ReLU zeroes out negative inputs.

Choosing the appropriate initialization method can lead to faster convergence and better overall performance in training neural networks.
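
The rule of thumb above can be sketched as a small helper that walks a model and initializes every `nn.Linear` according to the activation that follows it. The function name `init_linear_layers` and its `activation` argument are hypothetical, not part of PyTorch, and zeroing the biases is just one common choice.

```python
import torch
import torch.nn as nn


def init_linear_layers(model: nn.Module, activation: str = "relu") -> None:
    """Hypothetical helper: pick an init scheme based on the activation name."""
    for module in model.modules():
        if not isinstance(module, nn.Linear):
            continue
        if activation in ("relu", "leaky_relu"):
            # Kaiming (He) initialization for ReLU-family activations
            nn.init.kaiming_uniform_(module.weight, nonlinearity=activation)
        else:
            # Xavier (Glorot) initialization, e.g. for "tanh" or "sigmoid"
            gain = nn.init.calculate_gain(activation)
            nn.init.xavier_uniform_(module.weight, gain=gain)
        if module.bias is not None:
            nn.init.zeros_(module.bias)  # assumption: start biases at zero


relu_net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
init_linear_layers(relu_net, activation="relu")

tanh_net = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 2))
init_linear_layers(tanh_net, activation="tanh")
```

Applying one scheme to every layer is a simplification; a network that mixes activations would need the choice made per layer.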

In [3]:
class MyModel2(nn.Module):
    def __init__(self):
        super().__init__()
        # No explicit init call: nn.Linear keeps PyTorch's default initialization
        self.linear = nn.Linear(2, 3)  # Define a linear layer

    def forward(self, x):
        return self.linear(x)

# Create an instance of the model
model = MyModel2()

# Print the initialized weights
print(model.linear.weight)

Parameter containing:
tensor([[ 0.0571,  0.6348],
        [-0.4832, -0.2329],
        [ 0.1129, -0.2058]], requires_grad=True)
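
For comparison with the explicitly initialized models, the sketch below checks the range of the default weights. It relies on the documented behavior of `nn.Linear`, which draws its weights from U(-1/sqrt(in_features), 1/sqrt(in_features)); the variable names are illustrative.

```python
import math

import torch
import torch.nn as nn

layer = nn.Linear(2, 3)  # no explicit init call, as in MyModel2

# Documented default range for nn.Linear weights
bound = 1.0 / math.sqrt(layer.in_features)

print(f"default bound = {bound:.4f}")  # default bound = 0.7071
# Every default weight should lie inside [-bound, bound]
print(bool(layer.weight.abs().max() <= bound + 1e-6))
```

The weights printed above (largest magnitude 0.6348) fall inside this narrower range, whereas the Xavier-uniform weights from the first cell extend to roughly 1.09.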


In [5]:
class MyModel3(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 3)  # Define a linear layer

        # Apply Xavier initialization (normal-distribution variant)
        nn.init.xavier_normal_(self.linear.weight)

    def forward(self, x):
        return self.linear(x)

# Create an instance of the model
model = MyModel3()

# Print the initialized weights
print(model.linear.weight)

Parameter containing:
tensor([[-0.0337, -0.2174],
        [ 0.2005, -0.4811],
        [ 0.0487, -0.2465]], requires_grad=True)


In [2]:
def max_alternating_sequence(arr):
    """Return the length of the longest contiguous run in which adjacent elements differ."""
    if not arr:  # if the list is empty
        return 0
    
    max_count = 1  # Start with 1 as a minimum count
    current_count = 1  # Current sequence count
    
    for i in range(1, len(arr)):
        if arr[i] != arr[i - 1]:  # If current element is different from the previous
            current_count += 1
        else:  # Reset the sequence when a repeat is found
            max_count = max(max_count, current_count)
            current_count = 1  # Restart the count with the new sequence
            
    # Final comparison to check the last sequence
    max_count = max(max_count, current_count)
    
    return max_count

In [4]:
arr1 = [0]
arr2 = [0, 1, 0, 1, 0]
arr3 = [0, 1, 0, 1, 0, 1, 1, 1, 0]  # Longest run is the prefix 0, 1, 0, 1, 0, 1
print(max_alternating_sequence(arr1))  # Output: 1
print(max_alternating_sequence(arr2))  # Output: 5
print(max_alternating_sequence(arr3))  # Output: 6

1
5
6
