# COMP52715 Coursework: Task 1 - 3D PacMan
Task 1 is to complete the skeleton code below to play a Pac-Man-esque game in 3D. You are allowed to use the deep learning methods discussed in the course. The task is designed to test multiple components of the module syllabus, including 3D geometry, object detection and image manipulation.

The aim of the coursework is to step through a 3D pointcloud of the mysterious PhD lab. Several large spheres have been placed within the space. Your job is to move through the pointcloud in an automated fashion, detecting the location of each sphere and moving to its predicted 3D location. If you land close enough to a sphere, it will be captured and removed from the pointcloud.

You will need to design deep neural networks to detect the spheres within an image. You can then use the helper function provided in the PacMan_Helper.py module to obtain the XYZ coordinates of the pixel you predict to be a sphere.
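As a rough illustration of that lookup, the sketch below assumes the helper provides three per-pixel coordinate maps (named `mapx`, `mapy` and `mapz` here, as in the cells later in this notebook) with the same height and width as the image. The arrays are synthetic stand-ins and `pixel_to_xyz` is a hypothetical name, not part of the helper module.

```python
import numpy as np

def pixel_to_xyz(mapx, mapy, mapz, row, col):
    """Return the XYZ coordinate stored at pixel (row, col) of the maps."""
    return np.array([mapx[row, col], mapy[row, col], mapz[row, col]])

# Synthetic 4x6 maps standing in for the helper's output (assumption)
h, w = 4, 6
cols, rows = np.meshgrid(np.arange(w), np.arange(h))
mapx = cols.astype(float)      # X grows with the column index
mapy = rows.astype(float)      # Y grows with the row index
mapz = np.full((h, w), 2.5)    # constant-depth plane

xyz = pixel_to_xyz(mapx, mapy, mapz, row=1, col=3)
```

Note that the maps are indexed as `[row, col]`, i.e. `[y, x]`, which is the opposite order to the `(x, y)` pixel coordinates most detectors report.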


## Imports
Here we will do our usual imports. I would recommend numpy, scipy, skimage, sklearn, pytorch, and matplotlib. If you wish to utilise pointcloud visualisation then you can do that as described in the handout via Open3D. We will want to import our PacMan_Helper module as well.

In [1]:
from lib import PacMan_Helper_Accelerated as PacMan
# Import any additional packages required
import os
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, Subset
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
import copy
from ultralytics import YOLO
import open3d

Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.


In [2]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type == 'cuda':
    gpu_name = torch.cuda.get_device_name(0)
    print(f'Using GPU: {gpu_name}')
else:
    print('CUDA is not available, using CPU.')

Using GPU: NVIDIA GeForce RTX 3070


## Game setup.
This cell will initialise the game world and add all of our spheres to the world. Do not edit the code here.

In [3]:
# Call startup_scene() to load the initial game scene
global_cloud, spheres_collected = PacMan.startup_scene()

In [None]:
# View our pointcloud if we want
pcd = open3d.geometry.PointCloud(open3d.utility.Vector3dVector(global_cloud['Positions']))
pcd.colors = open3d.utility.Vector3dVector(global_cloud['Colors']/255)

open3d.visualization.draw_geometries([pcd])

## Load the training data: Positives and Negatives
In the handout zip file there is a directory which contains numerous patches extracted from sample images. These patches are labelled as either containing a sphere or not. You may wish to use these to train a classifier for sphere detection. This classifier can then be used later to detect spheres and move our camera towards them.
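One possible way to gather the labelled patches is sketched below. The directory names `positives` and `negatives`, the file extensions and the function name `list_labelled_patches` are all assumptions made for illustration; adjust them to match the layout of the handout.

```python
from pathlib import Path

def list_labelled_patches(root, pos_dir="positives", neg_dir="negatives",
                          exts=(".png", ".jpg", ".jpeg")):
    """Collect (path, label) pairs: label 1 = sphere, 0 = no sphere."""
    root = Path(root)
    samples = []
    for sub, label in ((pos_dir, 1), (neg_dir, 0)):
        folder = root / sub
        if not folder.is_dir():
            continue  # skip missing folders instead of raising
        for path in sorted(folder.iterdir()):
            if path.suffix.lower() in exts:
                samples.append((path, label))
    return samples
```

The resulting list can be split with `train_test_split` and wrapped in a `Dataset` that opens each path with PIL when it is requested.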

In [None]:
# Get the training samples for both positive and negative patches


## You are required to attempt both the Basic Solution and the Advanced Solution

## Basic Solution : Train a DNN based binary classifier
Depending on your desired approach, you may want to build a simple binary classifier using the training patches you have been given. It can then be used to decide whether an image patch contains the target or not. Several backbone networks are discussed in both the lectures and the labs.
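To turn a patch classifier into a detector, one common option is to slide a window over the full image and keep the highest-scoring patch. The sketch below shows that idea with a toy scoring function standing in for the trained network; the patch size, stride and function names are assumptions for illustration only.

```python
import numpy as np

def sliding_windows(image, patch=32, stride=16):
    """Yield (row, col, window) for each patch-sized window in the image."""
    h, w = image.shape[:2]
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            yield r, c, image[r:r + patch, c:c + patch]

def best_window(image, score_fn, patch=32, stride=16):
    """Return (score, centre_row, centre_col) of the highest-scoring window."""
    best = None
    for r, c, win in sliding_windows(image, patch, stride):
        s = float(score_fn(win))
        if best is None or s > best[0]:
            best = (s, r + patch // 2, c + patch // 2)
    return best

def toy_score(win):
    """Stand-in for a trained classifier: mean brightness as the sphere score."""
    return win.mean()

img = np.zeros((64, 96))
img[8:24, 40:56] = 1.0  # bright square playing the role of a sphere
score, cy, cx = best_window(img, toy_score, patch=16, stride=8)
```

In practice `score_fn` would resize the window to the classifier's input size and return the predicted sphere probability.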

In [None]:
# Train a classification model to perform binary classification of the patch into whether it contains a sphere or not.

## Advanced Solution : Train a DNN based object detector
Depending on your desired approach, you may want to create a suitable dataset using both the training patches you have been given and the 3D point cloud dataset. This can then be used to train an object detector to detect the target in the image. Several object detection frameworks are discussed in both the lectures and the labs.
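If the detector is a YOLO model, as in the cells further down, each training image needs a text label with one line per object in the form `class x_center y_center width height`, with all four box values normalised by the image size. The sketch below converts a pixel-space box into such a line; the image size of 640x480 and the single class id 0 are assumptions chosen for the example.

```python
def to_yolo_label(box_xyxy, image_w, image_h, class_id=0):
    """Convert a pixel box (x1, y1, x2, y2) to a YOLO label line."""
    x1, y1, x2, y2 = box_xyxy
    cx = (x1 + x2) / 2 / image_w   # normalised box centre, x
    cy = (y1 + y2) / 2 / image_h   # normalised box centre, y
    w = (x2 - x1) / image_w        # normalised box width
    h = (y2 - y1) / image_h        # normalised box height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

line = to_yolo_label((100, 50, 200, 150), image_w=640, image_h=480)
```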

In [None]:
# Train a detector to perform target object detection.

## Initialise our starting point in the game and get our first view into the scene
We should start the game at a position of XYZ = [0, 0, 0] and a camera angle of [0, 0, 0]. We can then get our first image projected to the camera plane to start our game. 

In [4]:
model = YOLO('models/YOLO.pt')

In [15]:
# Initialise position and angle variable 
position = np.zeros([3])
angle = np.zeros([3])

In [6]:
def navigate(position, angle, target_position):
    # Calculate the direction vector towards the target
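    direction = target_position - position
    distance = np.linalg.norm(direction)
    if distance == 0:
        # Already at the target: avoid dividing by zero
        return position, angle
    direction = direction / distance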
    
    # Calculate the yaw angle towards the target
    yaw = np.arctan2(direction[1], direction[0])
    
    # Calculate the pitch angle towards the target
    pitch = np.arctan2(direction[2], np.linalg.norm(direction[:2]))
    
    # Adjust the camera angle based on the yaw and pitch
    angle[0] = yaw
    angle[1] = pitch
    
    # Move the camera position towards the target
    position += direction * 0.5  # Adjust the step size as needed
    
    return position, angle
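A fixed step size can overshoot a target that is closer than one step. As a minimal sketch of one way around this, the hypothetical helper below caps the step at the remaining distance and returns a new array instead of modifying its inputs; the default `max_step` of 0.5 mirrors the value used above.

```python
import numpy as np

def step_towards(position, target, max_step=0.5):
    """Move from position towards target by at most max_step."""
    position = np.asarray(position, dtype=float)
    target = np.asarray(target, dtype=float)
    offset = target - position
    distance = np.linalg.norm(offset)
    if distance <= max_step:
        # Close enough to reach the target in a single step
        return target.copy()
    return position + offset / distance * max_step

new_position = step_towards([0.0, 0.0, 0.0], [3.0, 0.0, 4.0])
```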

## Perform a detection-navigation loop to collect all objects.
This will be the bulk of your implementation, using the trained models from the cells above. In each iteration of the loop we want to:
- Get the current view into the scene, and use the trained model to detect a sphere.
- Collect the sphere by moving towards it. If you land close enough to the object, it will be automatically captured and removed from the scene.
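When the detector returns several boxes, the loop needs a rule for choosing which one to move towards. The sketch below picks the most confident box above a threshold and returns its centre as `(row, col)` indices; the function name `pick_detection` and the threshold of 0.5 are assumptions, and the inputs are plain arrays rather than any particular library's result objects.

```python
import numpy as np

def pick_detection(boxes_xyxy, confidences, min_conf=0.5):
    """Return the integer (row, col) centre of the most confident box, or None."""
    boxes_xyxy = np.asarray(boxes_xyxy, dtype=float).reshape(-1, 4)
    confidences = np.asarray(confidences, dtype=float)
    if len(boxes_xyxy) == 0 or confidences.max() < min_conf:
        return None  # nothing detected, or nothing confident enough
    x1, y1, x2, y2 = boxes_xyxy[int(np.argmax(confidences))]
    return int((y1 + y2) // 2), int((x1 + x2) // 2)

centre = pick_detection([[10, 20, 30, 40], [100, 100, 140, 180]], [0.6, 0.9])
```

Returning `None` gives the loop a clear signal to rotate or move the camera and look again.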


In [7]:
x_range, y_range, z_range = PacMan.calculate_pointcloud_bounds()

In [3]:
def predict_target_position(model, image, mapx, mapy, mapz, depth):
    # Run inference on the image using your YOLO model
    results = model(image)

    # Process the results
    for result in results:
        boxes = result.boxes

        # Check if any target spheres are detected
        if len(boxes) > 0:
            # Get the bounding box coordinates of the first detected sphere
            target_bbox = boxes[0].xyxy.cpu().numpy().squeeze()

            # Get the center coordinates of the target bounding box
            center_x = (target_bbox[0] + target_bbox[2]) // 2
            center_y = (target_bbox[1] + target_bbox[3]) // 2

            # Get the corresponding 3D coordinates and depth value
            target_x = mapx[int(center_y), int(center_x)]
            target_y = mapy[int(center_y), int(center_x)]
            target_z = mapz[int(center_y), int(center_x)]
            target_depth = depth[int(center_y), int(center_x)]

            # Create the target position array
            target_position = np.array([target_x, target_y, target_z])

            return target_position

    # Return None if no target spheres are detected
    return None
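Reading a single centre pixel can fail if that pixel happens to have no point projected onto it. A possible alternative, sketched below, takes the median coordinate over all valid pixels inside the bounding box. It assumes, without confirmation from the helper, that empty pixels have a non-finite or non-positive depth; `bbox_median_xyz` is a hypothetical name.

```python
import numpy as np

def bbox_median_xyz(mapx, mapy, mapz, depth, box_xyxy):
    """Median XYZ over valid pixels inside the box (x1, y1, x2, y2), or None."""
    x1, y1, x2, y2 = (int(round(v)) for v in box_xyxy)
    window = (slice(max(y1, 0), y2 + 1), slice(max(x1, 0), x2 + 1))
    # Assumption: pixels with no projected point have NaN/inf or <= 0 depth
    valid = np.isfinite(depth[window]) & (depth[window] > 0)
    if not valid.any():
        return None
    return np.array([np.median(m[window][valid]) for m in (mapx, mapy, mapz)])

# Synthetic 5x5 maps with one empty pixel at the box centre
mapx = np.full((5, 5), 1.0)
mapy = np.full((5, 5), 2.0)
mapz = np.full((5, 5), 3.0)
depth = np.full((5, 5), 3.0)
depth[2, 2] = np.nan
mapz[2, 2] = np.nan

target = bbox_median_xyz(mapx, mapy, mapz, depth, (1, 1, 3, 3))
```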

In [9]:
def capture_spheres():
    # Load your pre-trained YOLO model
    model = YOLO('models/YOLO.pt')

    # Call startup_scene() to load the initial game scene
    global_cloud, spheres_collected = PacMan.startup_scene()

    # Initialize position and angle variables
    position = np.zeros(3)
    angle = np.zeros(3)

    # Set the maximum number of iterations
    max_iterations = 100

    # Set the angle step size for steering
    angle_step = np.pi / 8  # 22.5 degrees

    # Set the position step size for movement
    position_step = 0.5

    # Set the maximum number of consecutive no detections before moving
    max_no_detections = 25

    # Initialize the consecutive no detections counter
    consecutive_no_detections = 0

    # Iterate until all spheres are collected or the maximum number of iterations is reached
    for i in range(max_iterations):
        # Capture an image and its corresponding maps
        image, mapx, mapy, mapz, depth = PacMan.project_pointcloud_image(global_cloud, angle, position)

        # Convert the image to PIL Image format
        pil_image = Image.fromarray(np.uint8(image * 255))

        # Predict the target position
        predicted_position = predict_target_position(model, pil_image, mapx, mapy, mapz, depth)

        # Print the current position and angle
        print(f"Current Position: {position}")
        print(f"Current Angle: {angle}")

        if predicted_position is not None:
            # Reset the consecutive no detections counter
            consecutive_no_detections = 0

            # Update the current position to the predicted target position
            position = predicted_position

            # Update the scene and check if the sphere is captured
            prev_collected_count = sum(spheres_collected)
            global_cloud, spheres_collected = PacMan.update_scene(position, spheres_collected)

            # Check if a new sphere is captured
            if sum(spheres_collected) > prev_collected_count:
                # Print the number of spheres collected
                print(f"Spheres Collected: {sum(spheres_collected)}")
                print("---")

            # Check if all spheres are collected
            if all(spheres_collected):
                print("All spheres collected!")
                break
        else:
            # Increment the consecutive no detections counter
            consecutive_no_detections += 1

            # If the consecutive no detections exceed the maximum limit
            if consecutive_no_detections >= max_no_detections:
                # Reset the consecutive no detections counter
                consecutive_no_detections = 0

                # Move the camera position based on the current angle
                position[0] += position_step * np.cos(angle[1])  # Move in the x-direction
                position[2] += position_step * np.sin(angle[1])  # Move in the z-direction

            # Steer the angle
            angle[1] += angle_step  # Increment the yaw angle (rotation around the vertical axis)

            # Normalize the angle to keep it within the range [-pi, pi]
            angle[1] = np.arctan2(np.sin(angle[1]), np.cos(angle[1]))

            # Ensure the position stays within the point cloud boundaries
            x_range, y_range, z_range = PacMan.calculate_pointcloud_bounds()
            position = np.clip(position, [0, 0, 0], [x_range, y_range, z_range])

    print("Capturing spheres completed.")
    
    # Return the final point cloud so it can be visualised with Open3D
    return global_cloud

In [10]:
final_world = capture_spheres()


0: 448x640 3 targets, 7.6ms
Speed: 0.6ms preprocess, 7.6ms inference, 0.4ms postprocess per image at shape (1, 3, 448, 640)
Current Position: [          0           0           0]
Current Angle: [          0           0           0]
Spheres Collected: 1
---

0: 448x640 3 targets, 8.3ms
Speed: 0.7ms preprocess, 8.3ms inference, 0.6ms postprocess per image at shape (1, 3, 448, 640)
Current Position: [   -0.18764    0.057548      2.3094]
Current Angle: [          0           0           0]
Spheres Collected: 2
---

0: 448x640 2 targets, 8.2ms
Speed: 0.7ms preprocess, 8.2ms inference, 0.5ms postprocess per image at shape (1, 3, 448, 640)
Current Position: [   -0.31516   -0.034299      4.6735]
Current Angle: [          0           0           0]
Spheres Collected: 3
---

0: 448x640 2 targets, 7.5ms
Speed: 0.8ms preprocess, 7.5ms inference, 0.5ms postprocess per image at shape (1, 3, 448, 640)
Current Position: [   -0.92114     0.19851      7.0118]
Current Angle: [          0           0   

In [None]:
np.save('final_world_positions.npy', final_world['Positions'])
np.save('final_world_colors.npy', final_world['Colors'])

In [11]:
pcd = open3d.geometry.PointCloud(open3d.utility.Vector3dVector(final_world['Positions']))
pcd.colors = open3d.utility.Vector3dVector(final_world['Colors']/255)

open3d.visualization.draw_geometries([pcd])

In [None]:
while not np.all(spheres_collected): # While there are spheres to find
# Get current image from viewpoint
    
# Use the trained model to detect the sphere
    
# You may use prediction confidences ("probabilities") to find the sphere coordinates in 3D

# Update camera appropriately
    
# Update scene if needed