**Objects In 3D**

**Problem Statement**

APOCALYPSE, the nefarious organization bent on global domination, is on the cusp of unleashing its most devastating weapon yet. But there's hope. Deep within these images lies the key to stopping them.

At the heart of the image is a cube, seemingly simple yet hiding a complex secret. Its position and orientation hold the power to unlock the secrets of APOCALYPSE's deadliest weapon. But time is running out, and we need your help to solve the puzzle.

Attached Files
[OutofPerspective.tar.xz](https://api.t.cyberthon24.ctf.sg/file?id=clu5ovpwl0aws0806bhyjkhii&name=OutofPerspective.tar.xz)

**Solution**

We fine-tune an ImageNet-pretrained ResNet152 to regress the cube's position and rotation (posx, posy, rotx, roty) directly from each render

The code below is meant to be run in Google Colab

Upload OutofPerspective.tar.xz to a folder in Google Drive (the extraction cell expects MyDrive/Cyberthon) <br>
Mount Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
!tar -xf /content/drive/MyDrive/Cyberthon/OutofPerspective.tar.xz
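If you would rather stay in Python than shell out with !tar, the standard-library tarfile module can unpack the .tar.xz archive. This is a minimal sketch; extract_archive is a hypothetical helper name, and the Drive path in the comment is the one used above.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Unpack a .tar.xz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:xz" opens the archive with LZMA decompression.
    with tarfile.open(archive_path, "r:xz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ (and recent security backports): reject unsafe member paths.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names


# In Colab this would be (hypothetical usage):
# extract_archive("/content/drive/MyDrive/Cyberthon/OutofPerspective.tar.xz")
```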

Import necessary libraries

In [None]:
import pandas as pd
import numpy as np
import torch
import torchvision.transforms as transforms
import torchvision.transforms.functional as TF
import torch.nn as nn
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import models
from PIL import Image

Create a custom Dataset that returns each training render together with its 4 labels

In [None]:
class CustomDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.df = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        img_name = self.df.iloc[idx, 0]
        img_path = f"{self.root_dir}/{img_name}"
        image = Image.open(img_path)
        if self.transform:
            image = self.transform(image)
        posx = self.df.iloc[idx, 1]
        posy = self.df.iloc[idx, 2]
        rotx = self.df.iloc[idx, 3]
        roty = self.df.iloc[idx, 4]

        labels = torch.tensor([posx, posy, rotx, roty], dtype=torch.float32)

        return image, labels
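__getitem__ picks the labels by column position (iloc[idx, 1] to iloc[idx, 4]), so it silently depends on the column order of train.csv. Below is a sketch of the same per-row logic with the standard-library csv module, reading the columns by name instead. It assumes train.csv has a header row named file, posx, posy, rotx, roty (the names this notebook writes to test.csv); that header, and the sample row, are assumptions for illustration.

```python
import csv
import io

# Assumed header names; check them against the real train.csv.
LABEL_COLUMNS = ("posx", "posy", "rotx", "roty")


def parse_rows(csv_text):
    """Return (file name, [posx, posy, rotx, roty]) pairs from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        # Look each label up by name, so column order no longer matters.
        labels = [float(record[col]) for col in LABEL_COLUMNS]
        rows.append((record["file"], labels))
    return rows


# Hypothetical sample row, not taken from the challenge data.
sample = "file,posx,posy,rotx,roty\ncube_0.png,0.5,-1.25,30,45\n"
print(parse_rows(sample))
```

With pandas, the equivalent inside __getitem__ would be selecting self.df.loc[idx, ["posx", "posy", "rotx", "roty"]] rather than positional iloc indices.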

Initialise the dataset with the training data <br>
The renders have 4 channels (RGBA), so the transform keeps only the first 3 (RGB), which is the input ResNet152 expects

In [None]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x[:3, :, :]),
])

dataset = CustomDataset(csv_file='package/train.csv', root_dir='package/renders', transform=transform)
cube_loader = DataLoader(dataset, batch_size=32, shuffle=True)
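To see what the transform does to one render, here is a NumPy sketch that mimics ToTensor() followed by x[:3, :, :]: the HWC uint8 image becomes a CHW float array scaled to [0, 1], and the alpha channel is dropped. The tiny 2x2 image is made up for illustration.

```python
import numpy as np


def to_chw_rgb(rgba_hwc):
    """Mimic ToTensor() then x[:3]: HWC uint8 -> CHW float32 in [0, 1], alpha dropped."""
    chw = np.transpose(rgba_hwc, (2, 0, 1)).astype(np.float32) / 255.0
    return chw[:3]  # keep R, G, B; discard A


rgba = np.zeros((2, 2, 4), dtype=np.uint8)
rgba[..., 0] = 255  # red channel fully on
rgba[..., 3] = 128  # semi-transparent alpha, which is simply discarded
rgb = to_chw_rgb(rgba)
print(rgb.shape)  # (3, 2, 2)
```

The slice discards alpha without compositing onto a background. PIL's convert("RGB") on an RGBA image also drops the alpha band, so the inference cell (which calls convert('RGB') before the same transform) should see the same pixels as training.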

Display the first image in the dataset

In [None]:
image = TF.to_pil_image(dataset[0][0].cpu())
image

Load an ImageNet-pretrained ResNet152 and fine-tune it for our task <br>
We predict 4 values (posx, posy, rotx, roty), so we replace the final fully connected layer with one that has 4 outputs

In [None]:
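# pretrained=True is deprecated since torchvision 0.13; IMAGENET1K_V1 is the same checkpoint it loaded
model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)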
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 4)
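Replacing model.fc swaps ResNet152's 1000-class ImageNet classifier for a layer that maps the pooled feature vector (num_ftrs, which is 2048 for ResNet152) to 4 numbers. A NumPy sketch of what that new head computes, with random weights standing in for the ones training would learn:

```python
import numpy as np

NUM_FTRS = 2048  # ResNet152's fc.in_features
NUM_OUTPUTS = 4  # posx, posy, rotx, roty

rng = np.random.default_rng(0)
# Random stand-ins for the weights nn.Linear(NUM_FTRS, 4) would learn.
W = rng.normal(scale=0.01, size=(NUM_OUTPUTS, NUM_FTRS))
b = np.zeros(NUM_OUTPUTS)


def regression_head(features):
    """Apply y = features @ W.T + b, as nn.Linear does, to a (batch, 2048) array."""
    return features @ W.T + b


batch_features = rng.normal(size=(32, NUM_FTRS))  # e.g. one batch of pooled features
preds = regression_head(batch_features)
print(preds.shape)  # (32, 4)
```

There is no activation after the layer, so the outputs are unbounded real numbers, which suits regression targets like positions and angles.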

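We use HuberLoss as the loss function: it is quadratic for small errors and linear for large ones, so outliers pull on the model less than with MSE <br>
A StepLR scheduler divides the learning rate by 10 every 10 epochs, so training takes large steps early and smaller, finer steps later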

In [None]:
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
loss_func = nn.HuberLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0001)
scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
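The two choices above can be written out explicitly. This sketch reimplements nn.HuberLoss with its defaults (delta=1.0, mean reduction) in NumPy, and the closed-form learning rate that StepLR(step_size=10, gamma=0.1) gives for a 0-based epoch. huber_loss and step_lr are illustrative helper names, not PyTorch functions.

```python
import numpy as np


def huber_loss(pred, target, delta=1.0):
    """Mean Huber loss, matching nn.HuberLoss(delta=1.0, reduction="mean")."""
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    quadratic = 0.5 * err ** 2            # used where |err| < delta
    linear = delta * (err - 0.5 * delta)  # used where |err| >= delta
    return float(np.mean(np.where(err < delta, quadratic, linear)))


def step_lr(base_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate StepLR uses during a given 0-based epoch."""
    return base_lr * gamma ** (epoch // step_size)


print(huber_loss([0.5, 3.0], [0.0, 0.0]))  # (0.125 + 2.5) / 2 = 1.3125
print([step_lr(0.001, e) for e in (0, 9, 10, 20, 29)])
```

Because scheduler.step() is called once per epoch, the 30-epoch run trains at about 1e-3 for epochs 1 to 10, 1e-4 for epochs 11 to 20 and 1e-5 for epochs 21 to 30.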

Time to train the model with our dataset

In [None]:
EPOCHS = 30
device = "cuda" if torch.cuda.is_available() else "cpu"
for epoch in range(EPOCHS):
    model.train()
    epoch_losses = []
    for x, y in cube_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        pred = model(x)
        loss = loss_func(pred, y)
        loss.backward()
        optimizer.step()
        epoch_losses.append(loss.item())
    epoch_loss = np.mean(epoch_losses)
    print(f"Epoch {epoch + 1}/{EPOCHS} - training loss: {epoch_loss:.4f}")
    # StepLR lowers the learning rate after every 10 epochs
    scheduler.step()
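np.mean(epoch_losses) averages per-batch mean losses, so if the dataset size is not a multiple of 32, the smaller final batch counts as much as a full one. If you want a true per-sample average instead, weight each batch loss by its batch size. A small sketch with a hypothetical helper and made-up numbers:

```python
def weighted_epoch_loss(batch_losses, batch_sizes):
    """Per-sample mean loss from per-batch mean losses and their batch sizes."""
    if len(batch_losses) != len(batch_sizes):
        raise ValueError("need one batch size per batch loss")
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# Two full batches of 32 and a final batch of 8 samples (illustrative values).
losses, sizes = [0.40, 0.20, 1.00], [32, 32, 8]
print(weighted_epoch_loss(losses, sizes))  # about 0.3778
print(sum(losses) / len(losses))           # about 0.5333, the unweighted mean
```

In the training loop this would mean also recording x.size(0) next to loss.item() for each batch.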

Now load test.csv and run every render it lists through the model <br>
Lastly, write the predicted posx, posy, rotx, roty back to test.csv

In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
model.eval()

df = pd.read_csv("package/test.csv")

pred_list = []
for row in df["file"]:
    img = Image.open(f"package/renders/{row}").convert('RGB')
    img_tensor = transform(img).unsqueeze(0).to(device)  # add a batch dimension

    with torch.no_grad():
        # Output has shape (1, 4): posx, posy, rotx, roty
        predict = model(img_tensor)[0]
    values = [round(v, 3) for v in predict.tolist()]
    print([row, *values])
    pred_list.append([row, *values])

pred_df = pd.DataFrame(pred_list, columns=["file", "posx", "posy", "rotx", "roty"])
pred_df.to_csv("package/test.csv", index=False)
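The output step rounds each of the 4 predictions to 3 decimals and writes one row per file, in the same order as test.csv. Here is the same formatting as a standalone sketch with the standard-library csv module. It assumes the predictions arrive as an (N, 4) array in posx, posy, rotx, roty order; predictions_to_csv is a hypothetical helper, and the file name and values in the example are made up.

```python
import csv
import io

import numpy as np

COLUMNS = ["file", "posx", "posy", "rotx", "roty"]


def predictions_to_csv(files, preds, decimals=3):
    """Return CSV text with one rounded prediction row per file, in input order."""
    preds = np.asarray(preds, dtype=float)
    if preds.shape != (len(files), 4):
        raise ValueError(f"expected shape ({len(files)}, 4), got {preds.shape}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    for name, row in zip(files, preds):
        writer.writerow([name, *(round(float(v), decimals) for v in row)])
    return buf.getvalue()


print(predictions_to_csv(["a.png"], [[0.12345, -1.0, 30.0004, 45.5]]))
```

The shape check catches the common mistake of passing predictions for a different number of files than test.csv lists.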


This yields an accuracy of **89.3600%**, which was the highest score at the time of writing