<div class="alert alert-block alert-success">
  <h3><center>MSDS-462: Computer Vision</center></h3>
  <h2><center>Assignment 1A: Lab 1 - An Image is Just Numbers</center></h2>
  <b>Author</b>: Aishwarya Mathuria
</div>

<div class="alert alert-block alert-info">
    <h2>Setup</h2>
    <p>Import torch, torchvision, and cv2 (OpenCV).</p>
</div>

In [4]:
import torch
import torchvision
from torchvision import transforms
import cv2
import numpy as np
import requests

<div class="alert alert-block alert-info">
    <h2>Load Image</h2>
    <p>Use cv2.imread() to load a sample image. cv2.imread() reads only from a local file path, so an image hosted at a URL must first be downloaded to disk.</p>
</div>

In [5]:
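image_url = "https://raw.githubusercontent.com/aishwaryamathuria/NWU_MSDS462/refs/heads/main/Assignment1A/test_image.jpg"
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # fail early on a 404 or network error instead of saving an error page as the "image"

image_path = "temp_image.jpg"
with open(image_path, "wb") as f:
    f.write(response.content)

# cv2.imread returns None (it does not raise) when the file cannot be read or decoded
image_bgr = cv2.imread(image_path)
if image_bgr is None:
    raise FileNotFoundError(f"Could not read an image from {image_path}")

# OpenCV loads colour images in BGR order; convert to RGB for torchvision
image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)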
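For intuition about what the BGR-to-RGB conversion does, here is a minimal NumPy sketch on a tiny hypothetical 2x2 image (the pixel values are invented for illustration). A colour image from cv2.imread() is an (H, W, 3) uint8 array in BGR order, so for a 3-channel image the conversion amounts to reversing the channel axis. The copy via np.ascontiguousarray is there because torch.from_numpy, which ToTensor() relies on for NumPy input, can reject arrays with negative strides such as a reversed slice.

```python
import numpy as np

# Hypothetical 2x2 colour image in OpenCV's BGR order: shape (H, W, C), dtype uint8,
# the same layout cv2.imread() returns for a colour JPEG.
bgr = np.array(
    [[[255, 0, 0], [0, 255, 0]],     # pure blue, pure green
     [[0, 0, 255], [10, 20, 30]]],   # pure red, an arbitrary colour
    dtype=np.uint8,
)

# Reversing the last (channel) axis turns (B, G, R) into (R, G, B).
# np.ascontiguousarray copies the reversed view into normal memory order,
# since a negative-stride view may be rejected by torch.from_numpy.
rgb = np.ascontiguousarray(bgr[..., ::-1])

print(bgr.shape, bgr.dtype)                           # (2, 2, 3) uint8
print(bgr[0, 0].tolist(), "->", rgb[0, 0].tolist())   # [255, 0, 0] -> [0, 0, 255]
```

In the lab itself, cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB) remains the clearer choice; the slicing form only makes the channel swap visible.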

<div class="alert alert-block alert-info">
    <h2>Convert to Tensor</h2>
    <p>Use torchvision.transforms.ToTensor() to convert the image into a PyTorch tensor.</p>
</div>

In [6]:
to_tensor = transforms.ToTensor()
image_tensor = to_tensor(image_rgb)
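For a uint8 NumPy image such as image_rgb, ToTensor() reorders the axes from (H, W, C) to (C, H, W), converts to float32, and divides by 255. The sketch below mimics those steps in NumPy under stated assumptions: to_tensor_like is a hypothetical helper that returns a NumPy array rather than a torch.Tensor, and the random input is made up, sized only to match the lab's test image.

```python
import numpy as np

def to_tensor_like(img: np.ndarray) -> np.ndarray:
    """Hypothetical helper mimicking ToTensor for a uint8 (H, W, C) image.

    Illustrative only: the real transform returns a torch.Tensor and also
    handles PIL images and other input types.
    """
    if img.dtype != np.uint8 or img.ndim != 3:
        raise ValueError("expected a uint8 array of shape (H, W, C)")
    chw = np.transpose(img, (2, 0, 1))                # (H, W, C) -> (C, H, W)
    return chw.astype(np.float32) / np.float32(255)   # [0, 255] -> [0.0, 1.0]

# Made-up random image with the same size as the lab's test image.
rng = np.random.default_rng(0)
img = rng.integers(0, 255, size=(183, 275, 3), dtype=np.uint8, endpoint=True)
out = to_tensor_like(img)

print(img.shape, "->", out.shape, out.dtype)   # (183, 275, 3) -> (3, 183, 275) float32
```

Under this scaling a byte value of 175 maps to 175 / 255 ≈ 0.6863, and 255 maps to exactly 1.0.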

<div class="alert alert-block alert-info">
    <h2>Inspect the Tensor</h2>
    <ul>
        <li>Print the tensor to observe that it's just a collection of numbers.</li>
        <li>Print the tensor's .shape, .dtype, and .device.</li>
        <li>In a markdown cell, explain what each dimension of the shape represents (e.g., for an RGB image, what do the three dimensions stand for?).</li>
    </ul>
</div>

In [7]:
print("Tensor values:")
print(image_tensor)

Tensor values:
tensor([[[0.6863, 0.6824, 0.6667,  ..., 0.5020, 0.5020, 0.5020],
         [0.6902, 0.6824, 0.6706,  ..., 0.5059, 0.5059, 0.5098],
         [0.6941, 0.6863, 0.6667,  ..., 0.5137, 0.5176, 0.5176],
         ...,
         [0.0314, 0.0941, 0.0471,  ..., 0.2706, 0.1765, 0.1216],
         [0.0196, 0.1098, 0.0235,  ..., 0.3765, 0.2784, 0.2784],
         [0.0078, 0.0784, 0.0196,  ..., 0.3098, 0.2667, 0.3333]],

        [[0.4824, 0.4784, 0.4588,  ..., 0.3333, 0.3333, 0.3333],
         [0.4863, 0.4784, 0.4627,  ..., 0.3373, 0.3373, 0.3412],
         [0.4902, 0.4824, 0.4706,  ..., 0.3412, 0.3490, 0.3490],
         ...,
         [0.0275, 0.0980, 0.0510,  ..., 0.2824, 0.1843, 0.1294],
         [0.0235, 0.1137, 0.0275,  ..., 0.3882, 0.2863, 0.2863],
         [0.0157, 0.0863, 0.0275,  ..., 0.3333, 0.2902, 0.3569]],

        [[0.5725, 0.5686, 0.5608,  ..., 0.4510, 0.4431, 0.4431],
         [0.5765, 0.5686, 0.5647,  ..., 0.4549, 0.4471, 0.4510],
         [0.5804, 0.5725, 0.5686,  ..., 0.4

In [8]:
print("\nTensor shape:", image_tensor.shape)
print("Tensor dtype:", image_tensor.dtype)
print("Tensor device:", image_tensor.device)


Tensor shape: torch.Size([3, 183, 275])
Tensor dtype: torch.float32
Tensor device: cpu


<div style="background: #e6e6fa; padding: 15px 30px; border: 1px solid #a3a3eb">
    <h2>Explanation</h2>
    <p>The tensor shape is [3, 183, 275], which follows the (C, H, W) layout (channels, height, width) that PyTorch and torchvision use for image tensors.</p>
    <p>The first dimension, C = 3, is the number of color channels. Because the image was converted from OpenCV's BGR order to RGB before calling ToTensor(), channel 0 holds <strong>Red</strong>, channel 1 holds <strong>Green</strong>, and channel 2 holds <strong>Blue</strong>.</p>
    <p>The second dimension, H = 183, is the image height: 183 pixels (rows) along the vertical axis.</p>
    <p>The third dimension, W = 275, is the image width: 275 pixels (columns) along the horizontal axis.</p>
    <p>The tensor's data type is float32, and every value lies in the range 0.0 to 1.0. ToTensor() did two things here: it reordered the NumPy array returned by OpenCV from (H, W, C) to (C, H, W), and it rescaled the original uint8 pixel values (0 to 255) by dividing them by 255, giving floating-point values suitable for neural networks. Strictly speaking this is rescaling, not the mean/standard-deviation normalization performed by transforms.Normalize(). Each number in the tensor is the intensity of one color channel at one pixel location.</p>
</div>
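With the (C, H, W) layout, tensor[c, y, x] is the intensity of channel c at row y and column x, so tensor[:, y, x] gives one pixel's three color values and tensor[0] gives the whole red plane. The NumPy sketch below uses a hypothetical 2x3 image with the same axis order and scaling as ToTensor(); the same slicing syntax works on image_tensor itself.

```python
import numpy as np

# Hypothetical 2x3 RGB image in OpenCV's (H, W, C) uint8 layout.
hwc = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# Same axis order and scaling as ToTensor: (C, H, W) float32 in [0, 1].
chw = np.transpose(hwc, (2, 0, 1)).astype(np.float32) / np.float32(255)

y, x = 1, 2                  # row 1 (vertical axis), column 2 (horizontal axis)
pixel = chw[:, y, x]         # R, G, B intensities of a single pixel, shape (3,)
red_plane = chw[0]           # the entire red channel, shape (H, W)

print(chw.shape, red_plane.shape)             # (3, 2, 3) (2, 3)
print(np.allclose(pixel * 255, hwc[y, x]))    # True
```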


<div class="alert alert-block alert-info">
    <h2>Simple Manipulation with OpenCV</h2>
    <p>Use a simple OpenCV function like cv2.cvtColor() to convert the image to grayscale and repeat the inspection of the tensor's shape.</p>
</div>

In [9]:
gray_image = cv2.cvtColor(image_rgb, cv2.COLOR_RGB2GRAY)
gray_tensor = to_tensor(gray_image)
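cv2.cvtColor() with COLOR_RGB2GRAY returns a 2-D (H, W) array with no channel axis, and ToTensor() inserts a leading channel dimension of size 1 for 2-D input. A NumPy sketch of that reshaping, on a hypothetical all-black image of the lab's size:

```python
import numpy as np

# Hypothetical grayscale image: a 2-D (H, W) uint8 array with no channel axis.
gray = np.zeros((183, 275), dtype=np.uint8)

# Add a leading channel axis and rescale, mirroring what ToTensor does for 2-D input.
chw = np.expand_dims(gray, axis=0).astype(np.float32) / np.float32(255)
print(gray.shape, "->", chw.shape)   # (183, 275) -> (1, 183, 275)

# Dropping the size-1 channel axis recovers the 2-D layout, e.g. for plotting.
print(chw.squeeze(0).shape)          # (183, 275)
```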

In [10]:
print("Tensor values:")
print(gray_tensor)

Tensor values:
tensor([[[0.5529, 0.5490, 0.5333,  ..., 0.3961, 0.3961, 0.3961],
         [0.5569, 0.5490, 0.5373,  ..., 0.4000, 0.4000, 0.4039],
         [0.5608, 0.5529, 0.5412,  ..., 0.4078, 0.4118, 0.4118],
         ...,
         [0.0275, 0.0941, 0.0471,  ..., 0.2549, 0.1608, 0.1137],
         [0.0196, 0.1098, 0.0235,  ..., 0.3608, 0.2627, 0.2627],
         [0.0118, 0.0824, 0.0235,  ..., 0.2941, 0.2510, 0.3176]]])


In [11]:
print("\nTensor shape:", gray_tensor.shape)
print("Tensor dtype:", gray_tensor.dtype)
print("Tensor device:", gray_tensor.device)


Tensor shape: torch.Size([1, 183, 275])
Tensor dtype: torch.float32
Tensor device: cpu


<div style="background: #e6e6fa; padding: 15px 30px; border: 1px solid #a3a3eb">
    <h2>Explanation</h2>
    <p>After converting the image to grayscale, the tensor shape (C, H, W) changed from [3, 183, 275] to [1, 183, 275]. cv2.cvtColor() returns a 2-D array of shape (183, 275), and ToTensor() adds the leading channel dimension of size 1. There is only one channel because each pixel is now described by a single intensity value instead of separate red, green, and blue components; that value represents the pixel's brightness.</p>
    <p>The height (183 px) and width (275 px) are unchanged from the RGB image, because converting to grayscale changes only how each pixel's color is encoded, not the image resolution.</p>
    <p>The data type is still float32, and the values are again scaled to the range 0.0 to 1.0, just as in the RGB case. Lower values correspond to darker pixels and higher values to brighter pixels.</p>
</div>
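The brightness value is a weighted sum of the RGB components: OpenCV documents its RGB-to-gray conversion as Y = 0.299 R + 0.587 G + 0.114 B. The sketch below applies that formula to the first three pixels of the top row, with byte values recovered by multiplying the printed RGB tensor entries by 255 (for example 0.6863 × 255 ≈ 175). rgb_to_gray is a hypothetical helper, and since OpenCV uses fixed-point arithmetic for uint8 images, this float version may occasionally differ from cv2.cvtColor() by one intensity level on other pixels.

```python
import numpy as np

# RGB -> gray weights from OpenCV's color-conversion documentation.
WEIGHTS = np.array([0.299, 0.587, 0.114])

def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Hypothetical helper: weighted sum over the channel axis of an (H, W, 3) uint8 RGB image."""
    y = rgb.astype(np.float64) @ WEIGHTS           # (H, W, 3) @ (3,) -> (H, W)
    return np.clip(np.rint(y), 0, 255).astype(np.uint8)

# First three pixels of the top row as (R, G, B) bytes, recovered from the
# printed RGB tensor by multiplying each value by 255.
row = np.array([[[175, 123, 146], [174, 122, 145], [170, 117, 143]]], dtype=np.uint8)

gray = rgb_to_gray(row)
print(gray.shape)       # (1, 3): the channel axis has been collapsed
print(gray.tolist())    # [[141, 140, 136]]
```

Dividing by 255 gives 0.5529, 0.5490 and 0.5333, which match the first three values of the printed grayscale tensor.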


<div class="alert alert-block alert-info">
    <h2>Reflect</h2>
    <p>In a final markdown cell, write 2-3 sentences answering the prompt: "After seeing an image represented as a tensor, what do you think is the biggest challenge for a computer trying to identify the main object in that image?"</p>
</div>

<div style="background: #e6e6fa; padding: 15px 30px; border: 1px solid #a3a3eb">
    <h2>Reflection</h2>
    <p>In my opinion, the biggest challenge for a computer trying to identify the main object in an image is that it sees only a grid of numbers, not objects the way a human does. It has to learn which of those numbers belong to the main object and which are just background. Because small changes in the object's appearance, the background, shadows, or lighting can change many of those numbers at once, the model can easily be confused even when the main object seems obvious to a human.</p>
</div>
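To make the lighting and position point concrete, here is a small NumPy sketch on a hypothetical 8x8 grayscale scene (a bright square on a dark background, with invented values). Brightening the scene or shifting the square by one pixel leaves the object unchanged to a human, yet changes many of the numbers a model actually receives.

```python
import numpy as np

# Hypothetical 8x8 grayscale "scene": a bright 4x4 square on a dark background.
scene = np.full((8, 8), 30, dtype=np.uint8)
scene[2:6, 2:6] = 200

# Same object, two harmless-looking changes.
brighter = np.clip(scene.astype(np.int16) + 50, 0, 255).astype(np.uint8)  # lighting change
shifted = np.roll(scene, shift=1, axis=1)  # square moved 1 px right (wrap-around column is background)

def changed_fraction(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of pixel positions whose value differs between a and b."""
    return float(np.mean(a != b))

print(changed_fraction(scene, brighter))  # 1.0   -> every number changed
print(changed_fraction(scene, shifted))   # 0.125 -> 8 of 64 numbers changed
```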
