**Computer Vision II: Final Project**

*Improving on 3D Reconstruction Using a Single Image*

Approach:

1) Integration of ResNet and VAE for Enhanced Feature Extraction and Model Flexibility:
- Approach: Use deep residual networks (ResNets) as the image encoder. Their residual (skip) connections mitigate the vanishing-gradient problem, which lets very deep architectures learn effective features.
- Residual Networks (ResNets): Each residual block learns a residual function F(x) and adds its input back through a skip connection, producing F(x) + x. Because gradients can flow directly through these identity shortcuts, deep ResNets train reliably on complex image data and extract the rich feature sets that accurate 3D modeling depends on.
(More about ResNets can be found in He et al., 2016: https://arxiv.org/abs/1512.03385)
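To make the skip connection concrete, here is a framework-free NumPy sketch of one residual block, y = ReLU(F(x) + x). It is a simplification: dense matrices stand in for the 3x3 convolutions and batch normalization that a real ResNet uses, and the function and weight names are illustrative only. Switching off the residual branch shows that the block falls back to passing its input through.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Simplified residual block: y = relu(F(x) + x) with F(x) = relu(x @ w1) @ w2.

    Dense weights stand in for the convolutions and batch norm of a real ResNet.
    """
    residual = relu(x @ w1) @ w2   # the learned residual F(x)
    return relu(residual + x)      # the skip connection adds the input back

rng = np.random.default_rng(0)
x = rng.random((4, 8))             # 4 samples, 8 non-negative features
w1 = rng.standard_normal((8, 8))
w2 = np.zeros((8, 8))              # residual branch switched off
y = residual_block(x, w1, w2)
print(np.allclose(y, x))           # True: with F(x) = 0 the block is the identity
```

This is why adding residual blocks cannot make the identity mapping harder to learn: the layers only need to learn a correction on top of their input.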
-------------------------------------------------------------------------------------------------
- Approach: Combine the ResNet encoder with a Variational Auto-Encoder (VAE), which handles encoding and decoding within a probabilistic framework. This yields a continuous, dense latent space that is well suited to representing complex 3D shapes.
- Variational Auto-Encoders (VAEs): A VAE encodes each input as a distribution (a mean and a variance) over a compressed latent space and learns to reconstruct the input from samples of that distribution. In 3D reconstruction, the decoder can generate and refine 3D models from these latent codes, and the probabilistic formulation models the uncertainty inherent in inferring a 3D shape from a single 2D image, where one view may be consistent with several plausible shapes.
(Introductory details on VAEs can be accessed in Kingma and Welling, 2013: https://arxiv.org/abs/1312.6114).
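Two ingredients make a VAE trainable: the reparameterization trick, which writes a latent sample as a differentiable function of the encoder outputs, and the closed-form KL divergence to a standard-normal prior. The NumPy sketch below shows both; the function names are illustrative, and the KL expression is algebraically the same as the KLD term in the notebook's loss_function further down.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z ~ N(mu, exp(logvar)) as z = mu + sigma * eps with eps ~ N(0, I).

    z stays a differentiable function of mu and logvar, so gradients reach the encoder.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over all entries."""
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar)

rng = np.random.default_rng(0)
mu = np.zeros((2, 4))
logvar = np.zeros((2, 4))
print(kl_to_standard_normal(mu, logvar))        # 0.0: the posterior equals the prior
print(kl_to_standard_normal(mu + 1.0, logvar))  # 4.0: each of the 8 entries adds 0.5
z = reparameterize(mu, logvar, rng)             # one latent sample, shape (2, 4)
```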
-------------------------------------------------------------------------------------------------
- Innovation: Integrating ResNets and VAEs pairs a robust feature extractor that captures the intricate details of each object (the ResNet) with a generative latent model that handles the uncertainty and variability of 3D object shapes more naturally (the VAE).
- Combination for 3D Reconstruction:
The ResNet maps the 2D image to deep features, the VAE encodes those features into a latent distribution, and a decoder maps samples from that distribution to a 3D representation. This leverages the feature extraction of ResNets and the generative properties of VAEs while accommodating the variability and complexity of real-world objects.
As a result, the model can capture the general shape of the object and also refine the fine geometric details that are critical for a realistic 3D output.
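The data flow described above (image features, then a latent distribution, then a voxel grid) can be traced with a toy NumPy sketch. Everything here is a hypothetical stand-in: random matrices replace trained layers, single linear maps replace the VAE encoder and a 3D decoder, and the sizes are shrunk so it runs instantly. In the notebook the input would be the 2048 x 7 x 7 = 100,352-dimensional ResNet50 feature vector.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def image_features_to_voxels(features, params, rng, grid=8):
    """Hypothetical flow: backbone features -> VAE latent -> voxel occupancy.

    features: (B, D) flattened image features.
    params:   dict of weights standing in for trained layers.
    Returns (occupancy, mu, logvar); occupancy has shape (B, grid, grid, grid).
    """
    mu = features @ params["w_mu"]          # (B, Nz) posterior mean
    logvar = features @ params["w_logvar"]  # (B, Nz) posterior log-variance
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterized sample
    logits = z @ params["w_dec"]            # (B, grid**3)
    occupancy = sigmoid(logits).reshape(-1, grid, grid, grid)      # probabilities in (0, 1)
    return occupancy, mu, logvar

rng = np.random.default_rng(0)
B, D, Nz, grid = 2, 64, 16, 8               # toy sizes
params = {
    "w_mu": rng.standard_normal((D, Nz)) * 0.01,
    "w_logvar": rng.standard_normal((D, Nz)) * 0.01,
    "w_dec": rng.standard_normal((Nz, grid**3)) * 0.1,
}
occ, mu, logvar = image_features_to_voxels(rng.standard_normal((B, D)), params, rng, grid)
print(occ.shape)  # (2, 8, 8, 8)
```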

Imports

In [None]:
!pip install torch torchvision
!pip install numpy matplotlib pillow
!pip install tqdm scikit-learn

In [1]:
import torch

print("PyTorch version:", torch.__version__)
print("Is CUDA available:", torch.cuda.is_available())
print("GPUs available:", torch.cuda.device_count())

In [None]:
import os

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from tqdm import tqdm
from sklearn.manifold import TSNE

import torch
import torchvision.models as models
from torch import nn, optim
from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import Compose, Resize, ToTensor

from vae_net import VAE  # local module (vae_net.py) that defines the VAE used below

Pre-processing

In [None]:
class ShapeNetVoxelDataset(Dataset):
    def __init__(self, image_dir, voxel_dir, transform=None):
        """
        Args:
            image_dir (string): Directory with all the images.
            voxel_dir (string): Directory with all the voxel data, aligned with the images by filename.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        # Sorted so that the sample order is deterministic across runs and machines
        self.image_files = sorted(os.path.join(image_dir, f) for f in os.listdir(image_dir) if f.endswith(('.jpg', '.png')))
        self.voxel_dir = voxel_dir
        self.transform = transform

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, index):
        img_path = self.image_files[index]
        base_filename = os.path.basename(img_path)
        voxel_filename = os.path.splitext(base_filename)[0] + '.npy'  # strip only the final extension
        voxel_path = os.path.join(self.voxel_dir, voxel_filename)

        image = Image.open(img_path).convert('RGB')
        voxel = np.load(voxel_path)

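        # `transform` should end with ToTensor(); without one, the image is only converted to a tensor
        image = self.transform(image) if self.transform else ToTensor()(image)
        voxel = torch.tensor(voxel, dtype=torch.float32)

        return image, voxel

# Example usage: 224x224 inputs yield the 7x7 ResNet50 feature map assumed by feature_dim below
transform = Compose([Resize((224, 224)), ToTensor()])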
train_dataset = ShapeNetVoxelDataset(image_dir='3DRecon/train_imgs',
                                     voxel_dir='3DRecon/train_voxels',
                                     transform=transform)
train_loader = DataLoader(train_dataset, batch_size=10, shuffle=True)

val_dataset = ShapeNetVoxelDataset(image_dir='3DRecon/val_imgs',
                                   voxel_dir='3DRecon/val_voxels',
                                   transform=transform)
val_loader = DataLoader(val_dataset, batch_size=10, shuffle=False)
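The dataset pairs each image with a voxel file that shares its base name. A small pre-flight check, sketched below with hypothetical helper names (voxel_path_for, find_unpaired_images), can report images whose .npy file is missing before a training run fails partway through an epoch. It uses os.path.splitext, which strips only the final extension, so a name like 'a.png.jpg' maps to 'a.png.npy'.

```python
import os

IMAGE_EXTS = (".jpg", ".png")

def voxel_path_for(image_path, voxel_dir):
    """Map '<image_dir>/chair_001.png' to '<voxel_dir>/chair_001.npy'."""
    stem, _ = os.path.splitext(os.path.basename(image_path))
    return os.path.join(voxel_dir, stem + ".npy")

def find_unpaired_images(image_dir, voxel_dir):
    """Return sorted image filenames whose matching .npy voxel file is missing."""
    missing = []
    for name in sorted(os.listdir(image_dir)):
        if name.endswith(IMAGE_EXTS) and not os.path.exists(voxel_path_for(name, voxel_dir)):
            missing.append(name)
    return missing
```

For example, `find_unpaired_images('3DRecon/train_imgs', '3DRecon/train_voxels')` should return an empty list when every training image has its voxel grid.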

Paper 1: Deep Residual Learning for Image Recognition (ResNets)

In [None]:
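# Pick the compute device once so every later cell can use it
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load an ImageNet-pretrained ResNet50 (`weights` replaces the deprecated `pretrained=True`)
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in resnet.parameters():
    param.requires_grad = False  # Freeze all layers so the backbone is not updated during training

# Remove the final average-pooling and fully connected layers to use the network as a feature extractor
modules = list(resnet.children())[:-2]
resnet = nn.Sequential(*modules).to(device).eval()  # eval() keeps BatchNorm statistics fixed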
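feature_dim in the VAE cell is set to 2048 * 7 * 7, which only holds for one input size. The sketch below traces the spatial size through torchvision's ResNet50 body (conv1, the max-pool, and the stride-2 stages layer2 to layer4) using the standard output-size formula floor((n + 2p - k) / s) + 1. The helper names are hypothetical.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def resnet50_feature_shape(height, width):
    """(C, H, W) of torchvision's ResNet50 with avgpool and fc removed."""
    def side(n):
        n = conv_out(n, 7, 2, 3)  # conv1: 7x7, stride 2, padding 3
        n = conv_out(n, 3, 2, 1)  # maxpool: 3x3, stride 2, padding 1
        for _ in range(3):        # layer2, layer3, layer4 each downsample by 2 (3x3, stride 2)
            n = conv_out(n, 3, 2, 1)
        return n
    return 2048, side(height), side(width)

for size in (224, 256):
    c, h, w = resnet50_feature_shape(size, size)
    print(size, (c, h, w), c * h * w)
# 224 (2048, 7, 7) 100352
# 256 (2048, 8, 8) 131072
```

Only 224x224 inputs produce the 7x7 map, so the resize step in the dataset transform and feature_dim have to agree.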

Paper 2: Auto-Encoding Variational Bayes (VAEs)

In [None]:
# Setup args
class Args:
    def __init__(self, Nz, decoder_type):
        self.Nz = Nz
        self.decoder_type = decoder_type

# Initialize VAE with appropriate dimensions and arguments
vae_args = Args(Nz=100, decoder_type='gaussian')  # Example arguments
feature_dim = 2048 * 7 * 7  # ResNet50 output is 2048 x 7 x 7 for 224x224 inputs
vae = VAE(args=vae_args, d=feature_dim, h_num=500, scaled=True)

# Adapt the forward function in the main training loop
def forward(data):
    data = resnet(data)  # Get features from ResNet
    data = data.view(data.size(0), -1)  # Flatten the features
    output, _, _, _ = vae(data)  # Assumes vae(x) returns a 4-tuple whose first element is the reconstruction
    return output
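vae_net is a local module that is not included here, so its exact interface is unknown. If it is unavailable, a minimal stand-in like the hypothetical MinimalVAE below can be used to run the notebook end to end. Its interface is an assumption modeled on how the cells in this notebook call the VAE: encode(x) returns (mu, logvar), decode(z) returns a reconstruction, and calling the model returns a 4-tuple whose first element is the reconstruction.

```python
import torch
from torch import nn

class MinimalVAE(nn.Module):
    """Hypothetical stand-in for vae_net.VAE: MLP encoder/decoder with a Gaussian latent."""

    def __init__(self, d, h_num=500, nz=100):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d, h_num), nn.ReLU())
        self.fc_mu = nn.Linear(h_num, nz)      # posterior mean
        self.fc_logvar = nn.Linear(h_num, nz)  # posterior log-variance
        self.dec = nn.Sequential(nn.Linear(nz, h_num), nn.ReLU(), nn.Linear(h_num, d))

    def encode(self, x):
        h = self.enc(x)
        return self.fc_mu(h), self.fc_logvar(h)

    def decode(self, z):
        return self.dec(z)  # unbounded output, matching an MSE (Gaussian) reconstruction loss

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return self.decode(z), mu, logvar, z
```

For example, `vae = MinimalVAE(d=feature_dim, h_num=500, nz=vae_args.Nz)`. With d = 100,352 and h_num = 500, the first encoder layer and the last decoder layer each hold about 50 million weights.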

Data-loader Batch Processing

In [None]:
# VAE loss with Gaussian outputs: sum-of-squares reconstruction error plus KL divergence
def loss_function(recon_x, x, mu, logvar):
    recon_loss = nn.functional.mse_loss(recon_x, x, reduction='sum')  # MSE because the targets are continuous
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())     # KL(q(z|x) || N(0, I))
    return recon_loss + kld

# Example training loop: the VAE learns to reconstruct the frozen ResNet features
vae.to(device)
optimizer = torch.optim.Adam(vae.parameters(), lr=0.001)
for epoch in range(10):
    vae.train()
    epoch_loss = 0.0
    for images, _ in train_loader:  # voxels are not used by this model
        images = images.to(device)
        features = resnet(images).view(images.size(0), -1)  # flattened ResNet features
        optimizer.zero_grad()
        outputs = vae(features)[0]          # reconstruction (first element, as assumed above)
        mu, logvar = vae.encode(features)   # posterior parameters of the same inputs
        loss = loss_function(outputs, features, mu, logvar)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    print(f"Epoch {epoch+1}, Loss: {epoch_loss / len(train_loader):.3f}")

In [None]:
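vae.eval()
with torch.no_grad():
    sample = torch.randn(64, vae_args.Nz).to(device)  # latent codes from the N(0, I) prior; size must match Nz
    sample = vae.decode(sample).cpu()  # 64 decoded ResNet feature vectors, shape (64, feature_dim)
    # Visualize or further process the sample output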

Visualizing Results

In [None]:
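# The VAE reconstructs ResNet features rather than pixels, so compare channel-averaged feature maps
images, _ = next(iter(val_loader))
with torch.no_grad():
    feats = resnet(images.to(device))                             # (B, 2048, 7, 7) for 224x224 inputs
    recon = vae(feats.view(feats.size(0), -1))[0].view_as(feats)  # reconstructed features
fig, axes = plt.subplots(1, 3, figsize=(10, 3))
axes[0].imshow(images[0].permute(1, 2, 0).numpy())               # input image, (H, W, C)
axes[1].imshow(feats[0].mean(0).cpu().numpy(), cmap='gray')      # original ResNet features
axes[2].imshow(recon[0].mean(0).cpu().numpy(), cmap='gray')      # VAE reconstruction
plt.show()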

Alternative Approaches (For Experimental Purposes) - Simple CNN

In [None]:
class CNNto3D(nn.Module):
    def __init__(self):
        super(CNNto3D, self).__init__()
        # Feature Extractor
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # More layers as needed
            nn.AdaptiveAvgPool2d((7, 7)),  # fixed 7x7 output, so the Linear size below does not depend on input resolution
        )
        # 3D Generator (simplified version, assuming output as voxels)
        self.generator = nn.Sequential(
            nn.Linear(192 * 7 * 7, 4096),
            nn.ReLU(),
            nn.Linear(4096, 64 * 64 * 64),  # 64x64x64 voxel grid; ~1.07 billion weights, so shrink sizes if memory is limited
            nn.Sigmoid()  # voxel occupancy probabilities in (0, 1)
        )

    def forward(self, x):
        features = self.feature_extractor(x)
        features = features.view(features.size(0), -1)
        output_3d = self.generator(features)
        output_3d = output_3d.view(-1, 64, 64, 64)  # Reshape to voxel grid
        return output_3d
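The flattened size that feeds the generator depends on the input resolution. Tracing the feature_extractor layers with the standard output-size formula (MaxPool2d uses floor rounding by default) gives the side length of the final feature map; the helper names below are hypothetical.

```python
def conv_out(n, kernel, stride=1, padding=0):
    """Output length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def cnnto3d_feature_side(n):
    """Side length after CNNto3D's convolution and pooling layers."""
    n = conv_out(n, 7, 2, 3)  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
    n = conv_out(n, 3, 2, 1)  # MaxPool2d(kernel_size=3, stride=2, padding=1)
    n = conv_out(n, 5, 1, 2)  # Conv2d(64, 192, kernel_size=5, padding=2)
    n = conv_out(n, 3, 2, 0)  # MaxPool2d(kernel_size=3, stride=2)
    return n

for size in (224, 256):
    print(size, cnnto3d_feature_side(size))
# 224 27
# 256 31
```

Neither resolution produces a 28x28 map, so a fixed-size Linear layer must either be sized to the actual map or be preceded by adaptive pooling to a fixed size.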

In [None]:
# Training

model = CNNto3D()  # Initializing the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

criterion = nn.MSELoss()  # Using Mean Squared Error loss for voxel reconstruction
optimizer = optim.Adam(model.parameters(), lr=0.001)

def train_model(train_loader, val_loader, model, criterion, optimizer, num_epochs=10):
    for epoch in range(num_epochs):
        model.train()  # Set model to training mode
        running_loss = 0.0

        # Training phase
        for inputs, true_voxels in train_loader:
            inputs, true_voxels = inputs.to(device), true_voxels.to(device)

            optimizer.zero_grad()
            predicted_voxels = model(inputs)
            loss = criterion(predicted_voxels, true_voxels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()

        # Print training loss every epoch
        print(f'Epoch {epoch+1}, Train Loss: {running_loss / len(train_loader):.3f}')

        # Validation phase
        model.eval()  # Set model to evaluate mode
        val_loss = 0.0
        with torch.no_grad():
            for inputs, true_voxels in val_loader:
                inputs, true_voxels = inputs.to(device), true_voxels.to(device)
                predicted_voxels = model(inputs)
                loss = criterion(predicted_voxels, true_voxels)
                val_loss += loss.item()

        # Print validation loss every epoch
        print(f'Epoch {epoch+1}, Val Loss: {val_loss / len(val_loader):.3f}')

train_model(train_loader, val_loader, model, criterion, optimizer)

In [None]:
def visualize_slices(voxel_grid, slice_direction='z', num_slices=10):
    """Show evenly spaced 2D slices of a 3D voxel grid along the x, y, or z axis."""
    axis = {'x': 0, 'y': 1, 'z': 2}[slice_direction]
    slices = np.linspace(0, voxel_grid.shape[axis] - 1, num_slices, dtype=int)
    fig, axs = plt.subplots(1, len(slices), figsize=(15, 3))
    for i, slice_idx in enumerate(slices):
        axs[i].imshow(np.take(voxel_grid, slice_idx, axis=axis), cmap='gray')
        axs[i].set_title(f'Slice {slice_idx}')
        axs[i].axis('off')
    plt.show()

# Example of how to use this function with a sample batch from the validation set
model.eval()
with torch.no_grad():
    images, voxels = next(iter(val_loader))
    images, voxels = images.to(device), voxels.to(device)
    preds = model(images)
    visualize_slices(preds[0].cpu().numpy())  # Visualizing the first voxel grid
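Slices show one plane at a time. For a quick 3D view of a prediction, matplotlib's Axes3D.voxels can draw every cell above an occupancy threshold. The helper below is a sketch: plot_voxels and the 0.5 threshold are illustrative choices, not part of the original notebook.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection on older matplotlib)

def plot_voxels(voxel_grid, threshold=0.5):
    """Draw cells whose predicted occupancy exceeds `threshold` as cubes.

    Returns the figure and the fraction of occupied cells. ax.voxels draws one
    cube per filled cell, so dense 64^3 grids (262,144 cells) can render slowly.
    """
    filled = np.asarray(voxel_grid) > threshold
    fig = plt.figure(figsize=(5, 5))
    ax = fig.add_subplot(projection="3d")
    ax.voxels(filled, edgecolor="k")
    ax.set_title(f"Occupancy > {threshold}")
    return fig, float(filled.mean())
```

For example, `fig, frac = plot_voxels(preds[0].cpu().numpy())` followed by `plt.show()` renders the first predicted grid; `frac` is a quick check for predictions that are almost entirely empty or full.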