




**AI Lab: Deep Learning for Computer Vision**


**Error Caused By Incorrect Input Layer Size**






Let's start with imports, then define and run a PyTorch model.

In [None]:
import torch
import torch.nn as nn

The code crashes because of a RuntimeError, which means something went wrong when the program tried to run the model.

To understand this kind of error, it's best to read the stack trace from the bottom up. The important line in your code is:

    output = model(input_tensor)

This line sends input data through your model to get predictions. But something goes wrong during this process.

The error message tells us exactly why:

    mat1 and mat2 shapes cannot be multiplied (32x100 and 3200x128)

This means you're trying to multiply two matrices, but their shapes don't line up properly:

- The input has shape (32, 100), that is, 32 samples with 100 features each.
- The model expects an input of shape (?, 3200) because its first layer is defined as nn.Linear(3200, 128).

But in matrix multiplication, the number of columns in the first matrix must match the number of rows in the second matrix. In this case:

- 100 columns (from your input)
- 3200 rows (from your model layer)

Those don't match, so the multiplication fails.
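The mismatch can be reproduced and caught in a few lines. This is a minimal sketch, assuming a single `nn.Linear(3200, 128)` layer stands in for the first layer of the failing model; the names `layer`, `fixed_layer`, and `error_message` are only illustrative.

```python
import torch
import torch.nn as nn

# A batch of 32 samples with 100 features each, as in the error message
input_tensor = torch.randn(size=(32, 100))

# Stand-in for the first layer of the failing model (assumed from the message)
layer = nn.Linear(in_features=3200, out_features=128)

error_message = None
try:
    layer(input_tensor)
except RuntimeError as err:
    # PyTorch reports the two shapes it could not multiply
    error_message = str(err)
print("Caught:", error_message)

# Matching in_features to the last dimension of the input resolves the mismatch
fixed_layer = nn.Linear(in_features=input_tensor.shape[1], out_features=128)
output = fixed_layer(input_tensor)
print("Output shape:", output.shape)
```

Reading `in_features` from `input_tensor.shape[1]` is one way to keep the layer and the data in sync; the other is to reshape the data so it really has 3200 features.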

In [None]:
# Define a model
model = torch.nn.Sequential()
linear1 = nn.Linear(in_features=100, out_features=128)
model.append(linear1)
model.append(torch.nn.ReLU())
model.append(torch.nn.Dropout())
linear2 = nn.Linear(in_features=128, out_features=64)
model.append(linear2)
model.append(torch.nn.ReLU())
model.append(torch.nn.Dropout())
linear3 = nn.Linear(in_features=64, out_features=10)
model.append(linear3)
model.append(torch.nn.ReLU())
model.append(torch.nn.Dropout())
print(model)

# Create a random input tensor
input_tensor = torch.randn(size=(32, 100))
print("Input shape:", input_tensor.shape)

# Run the model
output = model(input_tensor)
print("Output shape:", output.shape)

Sequential(
  (0): Linear(in_features=100, out_features=128, bias=True)
  (1): ReLU()
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=128, out_features=64, bias=True)
  (4): ReLU()
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=64, out_features=10, bias=True)
  (7): ReLU()
  (8): Dropout(p=0.5, inplace=False)
)
Input shape: torch.Size([32, 100])
Output shape: torch.Size([32, 10])


**Task 2.1.1: Modify the model to match the input tensor size.**

In [None]:
# Define a revised model
model = torch.nn.Sequential()
linear1 = nn.Linear(in_features=10, out_features=64)
model.append(linear1)
model.append(torch.nn.ReLU())
model.append(torch.nn.Dropout())
linear2 = nn.Linear(in_features=64, out_features=32)
model.append(linear2)
model.append(torch.nn.ReLU())
model.append(torch.nn.Dropout())
linear3 = nn.Linear(in_features=32, out_features=1)
model.append(linear3)
model.append(torch.nn.ReLU())
model.append(torch.nn.Dropout())
print(model)

# Create a random input tensor
input_tensor = torch.randn(size=(32, 10))
print("Input shape:", input_tensor.shape)

# Run the model
output = model(input_tensor)
print("Output shape:", output.shape)

Sequential(
  (0): Linear(in_features=10, out_features=64, bias=True)
  (1): ReLU()
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=64, out_features=32, bias=True)
  (4): ReLU()
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=32, out_features=1, bias=True)
  (7): ReLU()
  (8): Dropout(p=0.5, inplace=False)
)
Input shape: torch.Size([32, 10])
Output shape: torch.Size([32, 1])


**Error Caused By Adding The Same Layer Twice**

Let's build another PyTorch model, this time a Convolutional Neural Network (CNN).

In [None]:
# Define a model
model = torch.nn.Sequential()
conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
model.append(conv1)
model.append(torch.nn.ReLU())
max_pool1 = nn.MaxPool2d(2, 2)
model.append(max_pool1)
conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
model.append(conv2)  # Add the new layer
model.append(torch.nn.ReLU())
model.append(max_pool1)
print(model)

# Create a random input tensor
input_tensor = torch.randn(size=(1, 3, 224, 224))
print("Input shape:", input_tensor.shape)

# Run the model
output = model(input_tensor)
print("Output shape:", output.shape)

Sequential(
  (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): ReLU()
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
Input shape: torch.Size([1, 3, 224, 224])
Output shape: torch.Size([1, 32, 56, 56])


When the cell is executed, it also generates a RuntimeError. The message "Given groups=1, weight of size [16, 3, 3, 3], expected input[1, 16, 224, 224] to have 3 channels, but got 16 channels instead" is a clue that the dimensions do not line up correctly. The weight of size [16, 3, 3, 3] belongs to conv1, which expects 3 input channels, but it received a tensor with 16 channels, which is the output of conv1 itself. This happens when the same layer is accidentally added to a model twice, typically through a copy-and-paste mistake.

To resolve this issue, you'll need to carefully review your code and identify where this dimensional inconsistency occurs. Pay particular attention to the layer where you might have accidentally duplicated a component, leading to unexpected channel dimensions.

The code can be fixed by adding a different layer that has the appropriate dimensions.
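To make the channel bookkeeping concrete, here is a small sketch in plain Python. The helper `find_channel_mismatch` is hypothetical (it is not part of PyTorch); it walks a list of (in_channels, out_channels) pairs and reports the first layer that receives the wrong number of channels.

```python
def find_channel_mismatch(layers, input_channels):
    """Return the index of the first layer whose in_channels does not match
    the channels it receives, or None if the chain is consistent.

    `layers` is a list of (in_channels, out_channels) pairs, one per Conv2d.
    """
    current = input_channels
    for index, (in_channels, out_channels) in enumerate(layers):
        if in_channels != current:
            return index
        current = out_channels
    return None


conv1 = (3, 16)   # Conv2d(in_channels=3, out_channels=16)
conv2 = (16, 32)  # Conv2d(in_channels=16, out_channels=32)

# conv1 appended twice: the second copy receives 16 channels but expects 3
print(find_channel_mismatch([conv1, conv1], input_channels=3))  # 1
# conv1 followed by conv2: every layer receives the channels it expects
print(find_channel_mismatch([conv1, conv2], input_channels=3))  # None
```

ReLU and pooling layers are left out of the list because they do not change the number of channels.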

**Task 2.1.2: Fix a RuntimeError by not adding the same layer twice.**

In [None]:
# Define a model
model = torch.nn.Sequential()
conv1 = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=2, padding=1)
model.append(conv1)
model.append(torch.nn.ReLU())
max_pool1 = nn.MaxPool2d(2, 2)
model.append(max_pool1)
conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1)
model.append(conv2)
model.append(torch.nn.ReLU())
max_pool2 = nn.MaxPool2d(2, 2)
model.append(max_pool2)
conv3 = nn.Conv2d(in_channels=16, out_channels=8, kernel_size=3, padding=1)
model.append(conv3)
model.append(torch.nn.ReLU())
max_pool3 = nn.MaxPool2d(2, 2)
model.append(max_pool3)
conv4 = nn.Conv2d(in_channels=8, out_channels=1, kernel_size=2, padding=1)

model.append(conv4)
model.append(torch.nn.ReLU())
max_pool4 = nn.MaxPool2d(2, 2)
model.append(max_pool4)
print(model)

# Create a random input tensor
input_tensor = torch.randn(size=(1, 1, 224, 224))
print("Input shape:", input_tensor.shape)

# Run the model
output = model(input_tensor)
print("Output shape:", output.shape)

Sequential(
  (0): Conv2d(1, 8, kernel_size=(2, 2), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): ReLU()
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Conv2d(16, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): ReLU()
  (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (9): Conv2d(8, 1, kernel_size=(2, 2), stride=(1, 1), padding=(1, 1))
  (10): ReLU()
  (11): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
Input shape: torch.Size([1, 1, 224, 224])
Output shape: torch.Size([1, 1, 14, 14])


**Task 2.1.3: Fix a RuntimeError caused by forgetting to flatten.**

In [None]:
# Define a model
model = torch.nn.Sequential()
model.append(nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1))
model.append(nn.ReLU())
model.append(nn.MaxPool2d(kernel_size=2, stride=2))
model.append(nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1))
model.append(nn.ReLU())
model.append(nn.MaxPool2d(kernel_size=2, stride=2))
model.append(nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1))
model.append(nn.ReLU())
model.append(nn.MaxPool2d(kernel_size=2, stride=2))
model.append(nn.Flatten())  # Flatten to (batch, features) before the first linear layer
model.append(nn.Linear(in_features=64 * 28 * 28, out_features=256))
model.append(nn.ReLU())
model.append(nn.Linear(in_features=256, out_features=64))
model.append(nn.ReLU())
model.append(nn.Linear(in_features=64, out_features=10))
print(model)

# Create a random input tensor
input_tensor = torch.randn(size=(1, 3, 224, 224))
print("Input shape:", input_tensor.shape)

# Run the model
output = model(input_tensor)
print("Output shape:", output.shape)

Sequential(
  (0): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): ReLU()
  (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (3): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (4): ReLU()
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (7): ReLU()
  (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (9): Flatten(start_dim=1, end_dim=-1)
  (10): Linear(in_features=50176, out_features=256, bias=True)
  (11): ReLU()
  (12): Linear(in_features=256, out_features=64, bias=True)
  (13): ReLU()
  (14): Linear(in_features=64, out_features=10, bias=True)
)
Input shape: torch.Size([1, 3, 224, 224])
Output shape: torch.Size([1, 10])


**Task 2.1.4: Fix a RuntimeError caused by incorrect dimensions after flattening.**

In [None]:
import torch
import torch.nn as nn

model = torch.nn.Sequential()
model.append(nn.Conv2d(in_channels=3, out_channels=64, kernel_size=7, stride=2, padding=3))
model.append(nn.ReLU())
model.append(nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
model.append(nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1))
model.append(nn.ReLU())
model.append(nn.MaxPool2d(kernel_size=2, stride=2))
model.append(nn.Conv2d(in_channels=128, out_channels=256, kernel_size=3, padding=1))
model.append(nn.ReLU())
model.append(nn.MaxPool2d(kernel_size=2, stride=2))
model.append(nn.Flatten())

# in_features = channels * height * width after the last pooling layer
model.append(nn.Linear(in_features=256 * 14 * 14, out_features=1000))
model.append(nn.ReLU())
model.append(nn.Linear(in_features=1000, out_features=10))

print(model)

# Input
input_tensor = torch.randn(size=(1, 3, 224, 224))
print("Input shape:", input_tensor.shape)

# Run model
output = model(input_tensor)
print("Output shape:", output.shape)
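The value of `in_features` can be worked out by hand instead of by trial and error. The sketch below assumes dilation 1 and the default floor rounding, and uses a hypothetical helper `out_size` to follow the 224 x 224 input through the convolution and pooling layers of the model above.

```python
def out_size(size, kernel_size, stride=1, padding=0):
    """Spatial size after a Conv2d or MaxPool2d layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 224
size = out_size(size, kernel_size=7, stride=2, padding=3)  # Conv2d -> 112
size = out_size(size, kernel_size=3, stride=2, padding=1)  # MaxPool2d -> 56
size = out_size(size, kernel_size=3, stride=1, padding=1)  # Conv2d -> 56
size = out_size(size, kernel_size=2, stride=2)             # MaxPool2d -> 28
size = out_size(size, kernel_size=3, stride=1, padding=1)  # Conv2d -> 28
size = out_size(size, kernel_size=2, stride=2)             # MaxPool2d -> 14

channels = 256  # out_channels of the last Conv2d
in_features = channels * size * size
print(size, in_features)  # 14 50176
```

The result, 256 * 14 * 14 = 50176, is the number of values per image that Flatten hands to the first linear layer.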


**Task 2.2.1: Create a variable for the train directory.**

In [None]:
import os

data_dir = "data_p2"
train_dir = os.path.join(data_dir, "train")

print("Training directory:", train_dir)

In the training directory, each class gets its own directory. There are five categories:

- healthy
- green mottle virus (CGM)
- bacterial blight (CBB)
- brown streak disease (CBSD)
- mosaic disease (CMD)

We'll use the directory names as our classes.

In [None]:
classes = os.listdir(train_dir)
classes

Let's look at a few examples of each. The function below opens four randomly selected images with the PIL library we used in the previous project, and displays them in a line.

In [None]:
import os
import random

import matplotlib.pyplot as plt
import PIL.Image


def sample_images(data_path, classname):
    # Gets the files in the directory
    class_dir = os.path.join(data_path, classname)
    if not os.path.exists(class_dir):
        return "Invalid directory"
    image_list = os.listdir(class_dir)
    if len(image_list) < 4:
        return "Not enough images in folder"

    # Pick four random images
    images_sample = random.sample(image_list, 4)

    # Plot them
    plt.figure(figsize=(20, 20))
    for i in range(4):
        img_loc = os.path.join(class_dir, images_sample[i])
        img = PIL.Image.open(img_loc)
        plt.subplot(1, 4, i + 1)
        plt.imshow(img)
        plt.axis("off")

In [None]:
sample_images(train_dir, "cassava-healthy")

**Task 2.2.2: Use the sample_images function to look at examples of the first disease class (green mottle virus).**

In [None]:
class_name =  classes[0]
print(class_name)

sample_images(train_dir, class_name)

**Task 2.2.3: Use the sample_images function to look at examples of the second disease class (bacterial blight).**

In [None]:
class_name = classes[1]
print(class_name)

sample_images(train_dir, class_name)

If you're using the classes variable, be careful about order. We want to skip over the healthy plants, since we've already seen them.
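One way to take the guesswork out of the order is to sort the names and print each index next to its class. `os.listdir` returns entries in arbitrary order, while ImageFolder sorts the folder names when it assigns class indices, so a sorted list also lines up with the dataset labels. The sketch below uses a temporary directory with made-up folder names (`class_a`, `class_b`, `class_c`) purely for illustration.

```python
import os
import tempfile

# Hypothetical class folders, created in a temporary directory for illustration
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["class_c", "class_a", "class_b"]:
        os.makedirs(os.path.join(demo_dir, name))

    # os.listdir returns entries in arbitrary order, so sort for a stable index
    demo_classes = sorted(os.listdir(demo_dir))

# Print each index next to its class name before indexing into the list
for index, name in enumerate(demo_classes):
    print(index, name)
```

Running the same loop over `sorted(os.listdir(train_dir))` shows which index belongs to which disease before you pick one.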

**Task 2.2.4: Use the sample_images function to look at examples of the third disease class (brown streak disease).**

In [None]:
class_name = classes[3]
print(class_name)

sample_images(train_dir, class_name)

**Task 2.2.5: Use the sample_images function to look at examples of the fourth disease class (mosaic disease).**

In [None]:
class_name = classes[4]
print(class_name)

sample_images(train_dir, class_name)

**Prepare our Dataset**
Now that we've seen the images we're working with, we need to get them ready for PyTorch. We'll use the same tools as the last project to load in the data.

We'll create transformations that:

- Convert any grayscale images to RGB format with a custom class
- Resize the images so that they're all the same size (we chose 224 x 224, but other sizes would work as well)
- Convert the image to a Tensor of pixel values

In [2]:
class ConvertToRGB:
    def __call__(self, img):
        # Convert grayscale (or any other mode) to three-channel RGB
        if img.mode != "RGB":
            img = img.convert("RGB")
        return img


In [None]:
from torchvision import transforms

transform_basic = transforms.Compose(
    [
        ConvertToRGB(),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]
)

**Task 2.2.6: Use ImageFolder to create a dataset applying the transform_basic, and DataLoader to create a data loader. Use a batch size of 32.**


In [None]:
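from torch.utils.data import DataLoader
from torchvision import datasets

batch_size = 32
dataset = datasets.ImageFolder(root=train_dir, transform=transform_basic)
dataset_loader = DataLoader(dataset, batch_size=batch_size)

# Shape of the image tensor in the first batch: [batch, channels, height, width]
batch_shape = next(iter(dataset_loader))[0].shape
print("Getting batches of shape:", batch_shape)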

**Normalize Data**


In the last project, we saw that normalized data made our network perform better. That is, data with a mean of 0 and a standard deviation of 1. We'll want to normalize this data as well.

This is the function we used in the last project to compute the mean and standard deviation of our images.

In [None]:
from tqdm import tqdm


def get_mean_std(loader):
    """Computes the mean and standard deviation of image data.

    Input: a `DataLoader` producing tensors of shape [batch_size, channels, pixels_x, pixels_y]
    Output: the mean of each channel as a tensor, the standard deviation of each channel as a tensor
            formatted as a tuple (means[channels], std[channels])"""

    channels_sum, channels_squared_sum, num_batches = 0, 0, 0
    for data, _ in tqdm(loader, desc="Computing mean and std", leave=False):
        channels_sum += torch.mean(data, dim=[0, 2, 3])
        channels_squared_sum += torch.mean(data**2, dim=[0, 2, 3])
        num_batches += 1
    mean = channels_sum / num_batches
    std = (channels_squared_sum / num_batches - mean**2) ** 0.5

    return mean, std
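The last line of the function relies on the identity std = sqrt(E[x^2] - E[x]^2). Here is a small check of that identity in plain Python, using made-up pixel values for a single channel; the variable names are only illustrative.

```python
import math
import statistics

pixels = [0.1, 0.4, 0.4, 0.7, 0.9]  # made-up pixel values for one channel

channel_mean = sum(pixels) / len(pixels)
mean_of_squares = sum(p**2 for p in pixels) / len(pixels)

# Same formula as get_mean_std: sqrt(E[x^2] - E[x]^2)
channel_std = math.sqrt(mean_of_squares - channel_mean**2)

# statistics.pstdev computes the population standard deviation directly
print(channel_std, statistics.pstdev(pixels))
```

Note that get_mean_std averages the per-batch means, which matches the overall mean exactly only when every batch has the same size. A smaller final batch makes the result a close approximation.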

**Task 2.2.7: Run the get_mean_std function on the dataset_loader, and save the means and standard deviations to variables mean and std. There should be a value for each color channel, giving us vectors of length 3.**

In [None]:
mean, std = get_mean_std(dataset_loader)

print(f"Mean: {mean}")
print(f"Standard deviation: {std}")


As expected, the means aren't 0 and the standard deviations aren't 1. We'll again change our transformations to include a Normalize step. As in the last project, we need to give it a mean and standard deviation for each color channel. This is what we computed with our function.

In [None]:
transform_norm = transforms.Compose(
    [
        ConvertToRGB(),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=mean, std=std),
    ]
)
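To see what the Normalize step does to the numbers, here is a minimal sketch in plain Python. It applies (x - mean) / std to made-up values for a single channel, which is the per-channel operation Normalize performs; the names are illustrative.

```python
import statistics

values = [0.2, 0.5, 0.6, 0.9]  # made-up pixel values for one channel

channel_mean = statistics.fmean(values)
channel_std = statistics.pstdev(values)

# Normalize applies (x - mean) / std to every value in the channel
normalized = [(v - channel_mean) / channel_std for v in values]

print(statistics.fmean(normalized))   # close to 0
print(statistics.pstdev(normalized))  # close to 1
```

Floating-point rounding means the results are very close to 0 and 1 rather than exact.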

**Task 2.2.8: Make a new normalized dataset using ImageFolder and a new DataLoader. Be sure to use the same batch size of 32.**

In [None]:
norm_dataset = datasets.ImageFolder(root=train_dir,transform=transform_norm)
norm_loader = DataLoader(norm_dataset,batch_size=batch_size)

batch_shape = next(iter(norm_loader))[0].shape
print("Getting batches of shape:", batch_shape)

After the new normalize transformation, the data should have mean 0 and standard deviation 1 in each color channel. Let's check that.

**Task 2.2.9: Use the get_mean_std function to verify the mean and standard deviation are correct in the norm_loader data.**

In [None]:
norm_mean, norm_std = get_mean_std(norm_loader)

print(f"Mean: {norm_mean}")
print(f"Standard deviation: {norm_std}")

**Train-Validation Split**

The next thing we'll need to do is create a training dataset and a validation dataset. We can use the random_split tool as we did in the last project.


**Task 2.2.10: Use random_split to create a training dataset with 80% of the data, and a validation dataset with 20% of the data. Be sure to use norm_dataset.**

In [None]:
import numpy as np
from torch.utils.data import random_split

train_dataset, val_dataset = random_split(norm_dataset, [0.8, 0.2])

length_train = len(train_dataset)
length_val = len(val_dataset)
length_dataset = len(norm_dataset)
percent_train = np.round(100 * length_train / length_dataset, 2)
percent_val = np.round(100 * length_val / length_dataset, 2)

print(f"Train data is {percent_train}% of full data")
print(f"Validation data is {percent_val}% of full data")

As in the previous project, we want to make sure the training data and validation data are similar. We'll want to double check that the two sets have similar proportions of each class.

In the previous project, we used a class_counts function to count how many observations were in each class. This function is now in the training.py file, so we can import it from there. The function takes a dataset (not a data loader) and returns a pandas Series of the counts of each class.

In [None]:
from training import class_counts

train_counts = class_counts(train_dataset)
train_counts

In [None]:
train_counts.plot(kind="bar");
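To compare the two splits by numbers rather than by eye, we can look at the fraction of each class in each split. The sketch below uses made-up label lists and a hypothetical helper `class_proportions`; it is not the class_counts function from training.py.

```python
from collections import Counter


def class_proportions(labels):
    """Return the fraction of observations in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Made-up labels standing in for the training and validation splits
train_labels = ["healthy"] * 40 + ["cbb"] * 20 + ["cmd"] * 60
val_labels = ["healthy"] * 10 + ["cbb"] * 5 + ["cmd"] * 15

train_props = class_proportions(train_labels)
val_props = class_proportions(val_labels)

# Print each class with its share of the training and validation data
for label in sorted(train_props):
    print(label, round(train_props[label], 2), round(val_props[label], 2))
```

With a real random split the two columns will differ slightly, but large gaps would suggest an unlucky split.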

**Unbalanced Classes**






Unless you got really unlucky, the two graphs should be very similar.

But the different classes have different numbers of images. There are fewer bacterial blight images, and more brown streak disease and mosaic disease images. This can cause our model to be biased — it will be less likely to call an image bacterial blight. But that doesn't reflect the world, just our training set!

This is an example of unbalanced classes. We can correct this by adjusting our dataset. We can either collect more images for the smaller classes until the counts are about the same, or remove images from the larger classes.

Since we can't get more images, we'll remove some from the larger classes. This is called undersampling. The function below will do this for us.

In [None]:
from shutil import copy2


def undersample_dataset(dataset_dir, output_dir, target_count=None):
    """
    Undersample the dataset to have a uniform distribution across classes.

    Parameters:
    - dataset_dir: Path to the directory containing the class folders.
    - output_dir: Path to the directory where the undersampled dataset will be stored.
    - target_count: Number of instances to keep in each class. If None, the class with the least instances will set the target.
    """
    # Mapping each class to its files
    classes_files = {}
    for class_name in os.listdir(dataset_dir):
        class_dir = os.path.join(dataset_dir, class_name)
        if os.path.isdir(class_dir):
            files = os.listdir(class_dir)
            classes_files[class_name] = files

    # Determine the minimum class size if target_count is not set
    if target_count is None:
        target_count = min(len(files) for files in classes_files.values())

    # Creating the output directory if it doesn't exist
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    # Perform undersampling
    for class_name, files in classes_files.items():
        print("Copying images for class", class_name)
        class_output_dir = os.path.join(output_dir, class_name)
        if not os.path.exists(class_output_dir):
            os.makedirs(class_output_dir)

        # Randomly select target_count images
        selected_files = random.sample(files, min(len(files), target_count))

        # Copy selected files to the output directory
        for file_name in tqdm(selected_files):
            src_path = os.path.join(dataset_dir, class_name, file_name)
            dst_path = os.path.join(class_output_dir, file_name)
            copy2(src_path, dst_path)

    print(f"Undersampling completed. Each class has up to {target_count} instances.")
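The selection step at the heart of the function can be tried without touching the disk. This sketch uses made-up file lists in a dictionary (the class names and file names are hypothetical) and applies the same idea: the smallest class sets the target, and random.sample picks that many files from every class.

```python
import random

# Made-up file lists standing in for the class folders
files_by_class = {
    "class_a": [f"a_{i}.jpg" for i in range(50)],
    "class_b": [f"b_{i}.jpg" for i in range(20)],
    "class_c": [f"c_{i}.jpg" for i in range(35)],
}

# The smallest class sets the target, as when target_count is None
target_count = min(len(files) for files in files_by_class.values())

random.seed(0)  # fixed seed so the selection is repeatable
undersampled = {
    name: random.sample(files, target_count)
    for name, files in files_by_class.items()
}

# Every class now has the same number of files
for name, files in undersampled.items():
    print(name, len(files))
```

Because random.sample draws without replacement, no file is picked twice within a class.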

**Task 2.2.12: Create a variable output_dir for the new directory data_p2/data_undersampled/train**

In [None]:
output_dir = os.path.join("data_p2", "data_undersampled", "train")
print("Output directory:", output_dir)

We'll erase that directory if it already exists, and run our function to create a new, balanced dataset.

In [None]:
! rm -rf {output_dir}

In [None]:
undersample_dataset(train_dir, output_dir)

The claim is that all classes should have the same number of images now. Let's check that by running our count. We'll need to create a new dataset first, using the new data.


**Task 2.2.13: Create a dataset with ImageFolder using the data in the output_dir. Transform the data with the transform_norm.**


In [None]:
undersampled_dataset = datasets.ImageFolder(root=output_dir, transform=transform_norm)

In [None]:
undersampled_dataset.classes


All the classes should now have the same number of images. If we recreate the bar plot, every bar should be the same height. We'll also print out the counts to make sure they're exactly the same.

**Task 2.2.14: Use class_counts to make a pandas Series from the undersampled data, and create the bar chart from it.**

In [None]:
# Important, don't change this
fig, ax = plt.subplots(figsize=(10, 6))

under_counts = class_counts(undersampled_dataset)

# Create a bar chart from under_counts
# important, you must leave `ax=ax`
under_counts.plot(kind="bar", ax=ax)

from training import class_counts