In [1]:
import torch
from typing import List, Tuple

Question 1 - Tensor Mechanics

What's wrong with each of the following pieces of code? Fix them and explain your reasoning to the interviewer.

1A

Perform a basic conv, followed by a ReLU and another conv. What's wrong with the following code? Can you fix it?

In [None]:
x = torch.rand(3,1,3,3) # B,C,H,W
conv1 = torch.nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3)
conv2 = torch.nn.Conv2d(in_channels=16, out_channels=64, kernel_size=3)
out = conv1(x)
out = torch.relu(out)
out = conv2(out)

Solution:

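conv1: in_channels must be 1, because x has a single channel (C = 1).

conv2: in_channels must be 6 to match conv1's out_channels. conv1's 3x3 kernel without padding also shrinks the 3x3 input to 1x1, so conv2 needs kernel_size=1 (alternatively, give conv1 padding=1 so the feature map stays 3x3).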
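The corrected 1A cell can be sketched as follows, taking the kernel_size=1 option for conv2; the shape assertions document what each layer does to the B, C, H, W dimensions.

```python
import torch

x = torch.rand(3, 1, 3, 3)  # B, C, H, W: a single input channel
conv1 = torch.nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)
conv2 = torch.nn.Conv2d(in_channels=6, out_channels=64, kernel_size=1)

out = conv1(x)        # 3x3 kernel, no padding: spatial 3x3 -> 1x1
assert out.shape == (3, 6, 1, 1)
out = torch.relu(out)
out = conv2(out)      # 1x1 kernel fits the 1x1 feature map
assert out.shape == (3, 64, 1, 1)
print(out.shape)      # torch.Size([3, 64, 1, 1])
```

If you prefer to keep conv2's 3x3 kernel, giving conv1 padding=1 preserves the 3x3 feature map instead, and conv2 then produces a 1x1 output.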

1B

Swap dimensions and reshape. Why doesn't this work?

In [None]:
x = torch.rand(3,4,5)
x.transpose(1,2).view(15,-1)

Solution:

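x.transpose(1,2).reshape(15, -1), or equivalently x.transpose(1,2).contiguous().view(15, -1)

view() never copies data or changes the underlying storage; it only reinterprets the existing memory with new sizes and strides. It therefore requires a memory layout compatible with the requested shape, which in practice means a contiguous tensor.

transpose() only swaps the strides of two dimensions, so its result is generally non-contiguous, and the first two dimensions cannot be merged into 15 rows without copying.

reshape() behaves like contiguous().view(): it returns a view when the layout allows it and otherwise copies the data into contiguous memory first.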
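A small demonstration, using a deterministic torch.arange tensor in place of torch.rand so the element order is easy to inspect: the transposed tensor shares storage with x but has swapped strides, view() raises a RuntimeError, and reshape() matches contiguous().view().

```python
import torch

x = torch.arange(60.0).reshape(3, 4, 5)  # contiguous, strides (20, 5, 1)
y = x.transpose(1, 2)                    # shape (3, 5, 4), same storage
print(y.stride(), y.is_contiguous())     # (20, 1, 5) False

try:
    y.view(15, -1)  # dims 0 and 1 cannot be merged with these strides
except RuntimeError as err:
    print("view failed:", err)

a = y.reshape(15, -1)            # copies, because a view is impossible here
b = y.contiguous().view(15, -1)  # the explicit equivalent
print(a.shape, torch.equal(a, b))  # torch.Size([15, 4]) True

# x.reshape(15, 4) has the same shape but skips the transpose,
# so the elements come out in a different order.
print(torch.equal(a, x.reshape(15, 4)))  # False
```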

1C

We have a large set of points in 3D (x,y,z) and want to project them onto a 2D screen in image space (u,v).

The linear matrix equation is U = K * P * X

where U holds the coordinates in image space: an Nx3 matrix in homogeneous coordinates (u,v,1).

Simply ignore the last dimension if you are not familiar with homogeneous coordinates.

K is a 3x3 camera matrix.

P is a 3x4 projection matrix (a horizontal concatenation of a 3x3 rotation matrix R and a 3x1 translation vector T).

X is an Nx4 matrix of 3D points, again in homogeneous coordinates: (x,y,z,1).

Can you correct all the errors in the following project_camera function? Hint: you don't need to understand the theory behind camera projections. You can solve this question entirely based on fixing the tensor dimensions.


In [None]:
def project_camera(X,R,T,K):
    P = torch.stack(R, T)
    U = K * P * X
    norm_U = torch.div(U, U[:,2])
    return norm_U


X = torch.rand(1000,4)
R = torch.rand(3,3)
T = torch.rand(3,1)
K = torch.rand(3,3)
result = project_camera(X,R,T,K)

Solution:

In [None]:
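def project_camera(X,R,T,K):
    # Concatenate [R | T] along columns; torch.stack would add a new dimension
    P = torch.cat((R, T), dim=1)
    print(P.shape)
    # Matrix products (@), not element-wise (*): (3,3) @ (3,4) @ (4,N) -> (3,N)
    U = K @ P @ X.T
    U = U.T  # back to (N, 3)
    print(U.shape)
    # U[:,[2]] keeps shape (N,1) so the division broadcasts across each row
    norm_U = torch.div(U, U[:,[2]])
    return norm_U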


X = torch.rand(1000,4)
R = torch.rand(3,3)
T = torch.rand(3,1)
K = torch.rand(3,3)
result = project_camera(X,R,T,K)

torch.Size([3, 4])
torch.Size([1000, 3])
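A shape walkthrough of the three fixes, as a sketch with fresh random inputs: torch.cat (not torch.stack) joins R and T along the column dimension, @ performs matrix multiplication where * would broadcast element-wise, and U[:, [2]] keeps a trailing singleton dimension so the division broadcasts per row. It also checks the equivalent row-vector form U = X P^T K^T, which avoids the two transposes.

```python
import torch

X = torch.rand(1000, 4)
R = torch.rand(3, 3)
T = torch.rand(3, 1)
K = torch.rand(3, 3)

# torch.stack expects a sequence of same-shaped tensors and adds a new dim;
# torch.cat joins along an existing dim.
P = torch.cat((R, T), dim=1)
assert P.shape == (3, 4)

# (3x3) @ (3x4) @ (4x1000) -> (3x1000), transposed to (1000x3)
U = (K @ P @ X.T).T
# Equivalent form without transposing the data: (1000x4) @ (4x3) @ (3x3)
U_alt = X @ P.T @ K.T
assert torch.allclose(U, U_alt, atol=1e-5)

# U[:, 2] has shape (1000,), which cannot broadcast against (1000, 3);
# U[:, [2]] (or U[:, 2:3]) has shape (1000, 1).
print(U[:, 2].shape, U[:, [2]].shape)  # torch.Size([1000]) torch.Size([1000, 1])
norm_U = U / U[:, [2]]
assert torch.allclose(norm_U[:, 2], torch.ones(1000))
```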



Question 2 - Debug a neural network

2A. Debug the network

Our intern, Bob, decided to build a neural network that finds the location of a circle in an image and outputs its color. However, the network neither trains properly nor outputs correct results. Please help the intern debug it! You are given a dataset that generates images, each containing a single circle whose color is one of (red, green, blue).

Dataset

The CircleDataset generates blank canvases and draws a single circle with random radius, position and color. It also contains a visualization utility function to help visualize the dataset and predictions. For the purpose of this interview, you can safely assume this code is correct.


In [None]:

                            color: List[int],
                            fill: bool = True) -> torch.Tensor:
    blank_image = np.zeros((img_h,img_w,3), np.uint8)
    coords = (cx-r,cy-r,cx+r,cy+r)
    cv2.circle(blank_image, (int(cx), int(cy)), max(int(r),0), color, cv2.FILLED if fill else 3)
    img_arr = torch.from_numpy(blank_image.transpose((2, 0, 1))).type(torch.float)
    return img_arr

  def _generate_data(self) -> Tuple[List,List,List]:
    images, color_labels, pos_labels = [],[],[]

Model

The following code defines a simple model that can output a position, size, and color for a circle. You should NOT assume this code is correct.

In [None]:
import torch.nn as nn

class CircleRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.loc_loss = nn.MSELoss()
        self.color_loss = nn.CrossEntropyLoss()
        self.backbone = nn.Sequential()

    def forward(self, x):
        x = self.backbone(x)

    def loss(self, loc, loc_label, color, color_label):
        loc_loss = self.loc_loss(loc_label, loc_label)
        color_loss = self.color_loss(color, color_label)
        return loc_loss, color_loss

cr = CircleRegressor()
loc, color = cr.forward(torch.rand(1,3,100,100))
print(loc, color)
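One possible repair of CircleRegressor, as a hedged sketch rather than the intended reference solution. It assumes the location label is (cx, cy, r) and the color label is a class index in {0, 1, 2}; the layer sizes are arbitrary choices. It fills the empty backbone, adds two output heads, makes forward() return (loc, color), renames loss() to compute_loss() to match the training cell, and compares the prediction against loc_label instead of comparing loc_label with itself (which always gives zero loss and no location gradient).

```python
import torch
import torch.nn as nn

class CircleRegressor(nn.Module):
    # Hypothetical repair. Assumes loc_label is (cx, cy, r) per image and
    # color_label is a class index in {0, 1, 2} for (red, green, blue).
    def __init__(self, num_colors: int = 3):
        super().__init__()
        self.loc_loss = nn.MSELoss()
        self.color_loss = nn.CrossEntropyLoss()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),   # 100 -> 50
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # 50 -> 25
            nn.Conv2d(32, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),  # 25 -> 13
            nn.Flatten(),
            nn.Linear(32 * 13 * 13, 128), nn.ReLU(),  # assumes a 13x13 map (100x100 input)
        )
        self.loc_head = nn.Linear(128, 3)             # (cx, cy, r)
        self.color_head = nn.Linear(128, num_colors)  # raw logits; CrossEntropyLoss applies log-softmax

    def forward(self, x):
        feats = self.backbone(x)
        return self.loc_head(feats), self.color_head(feats)

    def compute_loss(self, loc, loc_label, color, color_label):
        loc_loss = self.loc_loss(loc, loc_label)  # prediction vs label
        color_loss = self.color_loss(color, color_label)
        return loc_loss, color_loss

cr = CircleRegressor()
loc, color = cr(torch.rand(1, 3, 100, 100))
print(loc.shape, color.shape)  # torch.Size([1, 3]) torch.Size([1, 3])
```

The dataset converts uint8 images to float without rescaling, so inputs range over 0 to 255; dividing by 255 before the backbone is another change worth trying.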


Training code

The following code is used to train and evaluate the provided model on the dataset. After training for 10 epochs, Bob visualizes the eval outputs: the predicted circles are almost always a single color and poorly localized. In the visualization, solid circles are the ground truths and circle outlines are the predictions. Can you help Bob fix his model?

In [None]:

    loss_color = []
    for epoch in range(0, num_epochs):
        for i, (inputs, loc_label, color_label) in enumerate(train_loader):
            optimizer.zero_grad()
            location, color = model.forward(inputs)
            loc_loss, color_loss = model.compute_loss(location, loc_label, color, color_label)
            total_loss = loc_loss*0.01 + color_loss
            loss_loc.append(loc_loss.item())
            loss_color.append(color_loss.item())
            total_loss.backward()
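The training cell as shown stops at total_loss.backward() and never calls optimizer.step(); if that is the full loop, the weights never update. A minimal loop in the same shape, as a sketch: it assumes model(inputs) returns (location, color) and model.compute_loss(...) returns (loc_loss, color_loss), as the cell does, and wraps everything in a hypothetical train() function.

```python
import torch
from typing import List, Tuple

def train(model, train_loader, optimizer, num_epochs: int = 10,
          loc_weight: float = 0.01) -> Tuple[List[float], List[float]]:
    # Hypothetical helper mirroring the training cell above.
    model.train()
    loss_loc, loss_color = [], []
    for _ in range(num_epochs):
        for inputs, loc_label, color_label in train_loader:
            optimizer.zero_grad()
            location, color = model(inputs)  # model(...) runs hooks; prefer it over .forward()
            loc_loss, color_loss = model.compute_loss(location, loc_label, color, color_label)
            total_loss = loc_weight * loc_loss + color_loss
            total_loss.backward()
            optimizer.step()  # apply the gradients; without this nothing is learned
            loss_loc.append(loc_loss.item())
            loss_color.append(color_loss.item())
    return loss_loc, loss_color
```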



2B. Follow-up discussions (no coding required)

CircleRegressor depends statically on the input image size. How can we make it handle arbitrary sizes? (i.e., if you change the input image size from 100 to 99, the network breaks.)
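One common answer is to keep the convolutional layers, which accept any H and W, and insert nn.AdaptiveAvgPool2d before the fully connected layers so they always see a fixed-size grid. A sketch with hypothetical layer names (features, head) and arbitrary sizes:

```python
import torch
import torch.nn as nn

# Hypothetical size-agnostic trunk: convs work for any H, W, and
# AdaptiveAvgPool2d always emits a 4x4 grid for the Linear layer.
features = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),
    nn.Flatten(),  # (B, 8 * 4 * 4)
)
head = nn.Linear(8 * 4 * 4, 3)

for size in (100, 99, 64):
    out = head(features(torch.rand(1, 3, size, size)))
    print(size, tuple(out.shape))  # (1, 3) for every size
```

Pooling to a small grid rather than 1x1 keeps coarse spatial layout, which matters for a localization head; predicting coordinates normalized by the image width and height is one way to keep targets comparable across sizes.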

What if we want to train the network on a batch of non-uniformly sized images? For example, we want to compose a batch of 2 images where image 1 has size 100 and image 2 has size 99.
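One option is to pad every image to the largest height and width in the batch and carry a mask of valid pixels; alternatives include resizing to a common size or grouping equally sized images into the same batch. A sketch of the padding step, with pad_collate as a hypothetical helper that a custom DataLoader collate_fn could call on the image part of each sample:

```python
import torch
import torch.nn.functional as F
from typing import List, Tuple

def pad_collate(images: List[torch.Tensor]) -> Tuple[torch.Tensor, torch.Tensor]:
    # Zero-pad each (C, H, W) image on the bottom/right to the batch's max H and W.
    # The mask is 1 on real pixels and 0 on padding.
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    batch, masks = [], []
    for img in images:
        pad_h, pad_w = max_h - img.shape[1], max_w - img.shape[2]
        # F.pad lists the last dim first: (left, right, top, bottom)
        batch.append(F.pad(img, (0, pad_w, 0, pad_h)))
        mask = torch.zeros(max_h, max_w)
        mask[: img.shape[1], : img.shape[2]] = 1
        masks.append(mask)
    return torch.stack(batch), torch.stack(masks)

imgs = [torch.rand(3, 100, 100), torch.rand(3, 99, 99)]
batch, mask = pad_collate(imgs)
print(batch.shape, mask.shape)  # torch.Size([2, 3, 100, 100]) torch.Size([2, 100, 100])
```

Padding only on the bottom and right leaves pixel coordinates measured from the top-left corner unchanged, so the circle location labels stay valid.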

How do we train CircleRegressor in parallel on multiple GPUs or nodes?