### Project

1. **Encoder**:
   - A simple CNN (for example, a reduced version of ResNet or VGG) that extracts features from the 2D images.

2. **Feature Aggregation**:
   - A pooling operation (for example, max pooling or average pooling) that combines the features extracted from the different views of the object.
   - Alternatively, concatenate the features or use a simple fully connected network to combine them.

3. **Decoder**:
   - A 3D decoder that takes the aggregated features and generates a voxelized 3D representation. This can be done with a stack of 3D convolution layers.
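
As a minimal sketch of the aggregation step, assuming every view has already been encoded into a feature vector, the views can be pooled along the view axis. The helper name `aggregate_views` and the array shapes are illustrative assumptions and do not appear in the notebook.

```python
import numpy as np

def aggregate_views(features, mode="mean"):
    """Pool per-view features of shape (batch, views, dim).

    Hypothetical helper, for illustration only.
    """
    if mode == "mean":
        return features.mean(axis=1)
    if mode == "max":
        return features.max(axis=1)
    if mode == "concat":
        # Concatenation keeps every view, so the output size grows with the view count.
        return features.reshape(features.shape[0], -1)
    raise ValueError(f"unknown mode: {mode}")

# Two objects, three views each, four features per view.
feats = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
pooled_mean = aggregate_views(feats, "mean")
pooled_max = aggregate_views(feats, "max")
pooled_cat = aggregate_views(feats, "concat")
print(pooled_mean.shape, pooled_max.shape, pooled_cat.shape)
```

Mean and max pooling give a result that does not depend on the order or the number of views, whereas concatenation fixes both.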


In [1]:
import torch
import torch.nn as nn
import torch.optim.lr_scheduler as lr_scheduler
# import the pretrained models
import torchvision.models as models
from torchvision.models import VGG16_Weights

class PrintSize(nn.Module):
  def __init__(self):
    super(PrintSize, self).__init__()
    
  def forward(self, x):
    print(x.shape)
    return x


class Simple3DReconstructionNet(nn.Module):
    """
    A simple 3D reconstruction network.

    This network takes in a batch of 2D images and reconstructs a 3D representation of the scene.

    Args:
        use_lstm (bool): If True, an LSTM is used as the recurrent layer; otherwise a GRU is used.

    Attributes:
        encoder (nn.Sequential): The encoder module that extracts features from the input images.
        fc (nn.Linear): The fully connected layer that maps the encoded features to a feature vector.
        gru (nn.GRU): The recurrent layer applied to the pooled multi-view features (created when use_lstm is False).
        lstm (nn.LSTM): The alternative recurrent layer (created when use_lstm is True).
        fc3d (nn.Linear): The fully connected layer that maps the combined features to a 3D representation.
        decoder (nn.Sequential): The decoder module that reconstructs the 3D representation.

    """

    def __init__(self, use_lstm):
        super(Simple3DReconstructionNet, self).__init__()
        self.use_lstm = use_lstm
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=1, padding=3),#137
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, padding=1),#69
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),#69
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),#69
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, padding=1),#35
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),#35
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),#35
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, padding=1),#18
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),#18
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),#18
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, padding=0),#9
            nn.Flatten()
        ) # given 137x137 images, the flattened encoder output has 256 * 9 * 9 features
        
        #conv_output_size = 256 * (image_size // 8 + 1) * (image_size // 8 + 1)
        
        conv_output_size = 256 * 9 * 9
        self.fc = nn.Linear(conv_output_size, 512) # 512 is the size of the feature vector
        
        
        if not use_lstm:
            # a recurrent layer is used to combine the features from different views 
            self.gru = nn.GRU(512, 512, num_layers=4, batch_first=True) # hold a long-term memory of the features
        else:
            # alternative to the GRU layer, we can use a lstm layer
            self.lstm = nn.LSTM(512, 512, num_layers=4, batch_first=True)
        
        self.fc3d = nn.Linear(512, 128 * 4 * 4 * 4)
        self.decoder = nn.Sequential(
            nn.Conv3d(128, 128, kernel_size=3, stride=1, padding=1),#4
            nn.ReLU(),
            nn.Conv3d(128, 64, kernel_size=3, stride=1, padding=1),#4
            nn.ReLU(),
            nn.ConvTranspose3d(64, 64, kernel_size=4, stride=2, padding=1),#8
            nn.Conv3d(64, 64, kernel_size=3, stride=1, padding=1),#8
            nn.ReLU(),
            nn.Conv3d(64, 32, kernel_size=3, stride=1, padding=1),#8
            nn.ReLU(),
            nn.ConvTranspose3d(32, 32, kernel_size=4, stride=2, padding=1),#16
            nn.Conv3d(32, 32, kernel_size=3, stride=1, padding=1),#16
            nn.ReLU(),
            nn.Conv3d(32, 16, kernel_size=3, stride=1, padding=1),#16
            nn.ReLU(),
            nn.ConvTranspose3d(16, 16, kernel_size=4, stride=2, padding=1),#32
            nn.Conv3d(16, 16, kernel_size=3, stride=1, padding=1),#32
            nn.ReLU(),
            nn.Conv3d(16, 1, kernel_size=3, stride=1, padding=1),#32
            nn.Sigmoid()
        )
    
        
    def forward(self, x):
        """
        Forward pass of the network.

        Args:
            x (torch.Tensor): The input batch of 2D images.

        Returns:
            torch.Tensor: The reconstructed 3D representation.

        """
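        batch_size = x.size(0)  # x has shape (batch, views, 3, H, W)
        # Iterating over x yields one sample at a time; the encoder treats its views as a batch.
        encoded_views = [self.encoder(views) for views in x]
        # Stack to (batch, views, features) and average over the view dimension.
        combined_features = torch.mean(torch.stack(encoded_views), dim=1)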
        fc_out = self.fc(combined_features)
        if not self.use_lstm:
            fc_out, _ = self.gru(fc_out.unsqueeze(1))
        else:
            # use LSTM to combine the features
            fc_out, _ = self.lstm(fc_out.unsqueeze(1))
        
        fc_out = self.fc3d(fc_out)
        fc_out = fc_out.view(batch_size, 128, 4, 4, 4)  # Ensure the batch size is preserved
        reconstructed_3d = self.decoder(fc_out)
        return reconstructed_3d.squeeze(1)  # Remove the channel dimension
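
The size annotations next to the layers can be checked with the standard output-size formulas for convolution, pooling and transposed convolution. The sketch below is plain Python; the helper names `conv_out` and `conv_transpose_out` are made up for this example, and it assumes 137x137 input images, as stated in the encoder comment.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride=1, padding=0):
    """Output length of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# Encoder: the 3x3 convolutions preserve the size, so only the pools change it.
size = conv_out(137, kernel=7, padding=3)          # first convolution
encoder_sizes = [size]
for pool_padding in (1, 1, 1, 0):                  # the four MaxPool2d layers
    size = conv_out(size, kernel=2, stride=2, padding=pool_padding)
    encoder_sizes.append(size)
flattened = 256 * size * size                      # channels * height * width

# Decoder: three ConvTranspose3d layers, each doubling a 4x4x4 grid.
grid = 4
decoder_sizes = [grid]
for _ in range(3):
    grid = conv_transpose_out(grid, kernel=4, stride=2, padding=1)
    decoder_sizes.append(grid)

print(encoder_sizes, flattened, decoder_sizes)
```

The last encoder size is 9, which is why the first fully connected layer expects `256 * 9 * 9` inputs.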


### Training and Evaluation Procedure

1. **Training**:
   - Use the ShapeNet dataset to generate synthetic multi-view images of the objects.
   - Train the network with a loss function such as Binary Cross Entropy (BCE) between the predicted voxels and the ground truth.

2. **Evaluation**:
   - Evaluate the trained model on the test data using IoU and voxel accuracy.
   - Compare the results with the voxelized 3D ground truth.
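
To make the BCE objective concrete, here is a small sketch in plain Python that averages the per-voxel loss over a flattened grid. The function name and the `eps` clamp are assumptions made for this illustration; a framework loss may handle the logarithm of zero differently.

```python
import math

def binary_cross_entropy(pred, target, eps=1e-12):
    """Mean BCE between predicted occupancy probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0); eps is an assumption
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)

target = [1, 0, 1, 0]
confident = binary_cross_entropy([0.9, 0.1, 0.9, 0.1], target)
uncertain = binary_cross_entropy([0.5, 0.5, 0.5, 0.5], target)
print(confident, uncertain)
```

A prediction of 0.5 everywhere costs ln 2 per voxel, so a useful model should drive the loss well below that value.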

### Writing the Loss Function and Training the Model

In [2]:
import numpy as np

def read_binvox(filepath):
    with open(filepath, 'rb') as f:
        line = f.readline().strip()
        if not line.startswith(b'#binvox'):
            raise ValueError("Not a binvox file")
        
        dims = []
        translate = []
        scale = 0.0
        
        line = f.readline().strip()
        while line:
            if line.startswith(b'dim'):
                dims = list(map(int, line.split(b' ')[1:]))
            elif line.startswith(b'translate'):
                translate = list(map(float, line.split(b' ')[1:]))
            elif line.startswith(b'scale'):
                scale = float(line.split(b' ')[1])
            elif line.startswith(b'data'):
                break
            line = f.readline().strip()
        
        raw_data = np.frombuffer(f.read(), dtype=np.uint8)
        values, counts = raw_data[::2], raw_data[1::2]
        voxels = np.repeat(values, counts).astype(bool).reshape(dims)
        return voxels
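
The data section of the file is run-length encoded as (value, count) byte pairs, which is what the slicing and `np.repeat` call above undo. A minimal sketch on a hand-made payload (the bytes below are invented for the example):

```python
import numpy as np

# (value, count) byte pairs as they would appear after the "data" header line.
payload = bytes([0, 3, 1, 2, 0, 1])

raw = np.frombuffer(payload, dtype=np.uint8)
values, counts = raw[::2], raw[1::2]          # even bytes are values, odd bytes are counts
decoded = np.repeat(values, counts).astype(bool)
print(decoded.astype(int).tolist())
```

The counts must add up to the product of the grid dimensions, otherwise the final `reshape(dims)` fails.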

In [3]:
def write_binvox(voxels, filepath, dims=(32, 32, 32), translate=(0.0, 0.0, 0.0), scale=1.0):
    """
    Writes a numpy array as a binvox file.
    
    Parameters:
    - voxels: numpy array of shape (dims)
    - filepath: path to save the binvox file
    - dims: dimensions of the voxel grid
    - translate: translation vector
    - scale: scale factor
    """
    with open(filepath, 'wb') as f:
        f.write(b'#binvox 1\n')
        f.write(f'dim {dims[0]} {dims[1]} {dims[2]}\n'.encode('ascii'))
        f.write(f'translate {translate[0]} {translate[1]} {translate[2]}\n'.encode('ascii'))
        f.write(f'scale {scale}\n'.encode('ascii'))
        f.write(b'data\n')
        
        voxels_flat = voxels.flatten()
        current_value = voxels_flat[0]
        count = 0
        
        # Run-length encode as (value, count) byte pairs; a count must fit in one byte.
        for v in voxels_flat:
            if v == current_value:
                count += 1
                if count == 255:
                    f.write(bytes([int(current_value), int(count)]))  # cast both values to int
                    count = 0
            else:
                if count > 0:  # skip the empty run left after a flush at 255
                    f.write(bytes([int(current_value), int(count)]))
                current_value = v
                count = 1
        
        if count > 0:
            f.write(bytes([int(current_value), int(count)]))  # flush the final run
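
The encoding loop can be illustrated independently of file I/O. The sketch below uses hypothetical helpers `rle_encode` and `rle_decode`; it splits runs longer than 255 so that every count fits in one byte, then checks the round trip.

```python
def rle_encode(flat, max_run=255):
    """Return a list of (value, count) runs, splitting runs longer than max_run."""
    runs = []
    for v in flat:
        v = int(v)
        if runs and runs[-1][0] == v and runs[-1][1] < max_run:
            runs[-1] = (v, runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((v, 1))               # start a new run
    return runs

def rle_decode(runs):
    """Inverse of rle_encode."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

flat = [0] * 300 + [1] * 5 + [0] * 2
runs = rle_encode(flat)
print(runs)
```

No run has a count of zero, and a run of 300 equal voxels is stored as two consecutive pairs.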


In [4]:
def voxel2mesh(voxels, surface_view):
    cube_verts = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0],
                  [1, 1, 1]]  # 8 points

    cube_faces = [[0, 1, 2], [1, 3, 2], [2, 3, 6], [3, 7, 6], [0, 2, 6], [0, 6, 4], [0, 5, 1],
                  [0, 4, 5], [6, 7, 5], [6, 5, 4], [1, 7, 3], [1, 5, 7]]  # 12 face

    cube_verts = np.array(cube_verts)
    cube_faces = np.array(cube_faces) + 1

    scale = 0.01
    cube_dist_scale = 1.1
    verts = []
    faces = []
    curr_vert = 0

    positions = np.where(voxels > 0.3)
    voxels[positions] = 1 
    for i, j, k in zip(*positions):
        # identifies if current voxel has an exposed face 
        if not surface_view or np.sum(voxels[i-1:i+2, j-1:j+2, k-1:k+2]) < 27:
            verts.extend(scale * (cube_verts + cube_dist_scale * np.array([[i, j, k]])))
            faces.extend(cube_faces + curr_vert)
            curr_vert += len(cube_verts)  
              
    return np.array(verts), np.array(faces)


def write_obj(filename, verts, faces):
    """ write the verts and faces on file."""
    with open(filename, 'w') as f:
        # write vertices
        f.write('g\n# %d vertex\n' % len(verts))
        for vert in verts:
            f.write('v %f %f %f\n' % tuple(vert))

        # write faces
        f.write('# %d faces\n' % len(faces))
        for face in faces:
            f.write('f %d %d %d\n' % tuple(face))


def voxel2obj(filename, pred, surface_view = True):
    verts, faces = voxel2mesh(pred, surface_view)
    write_obj(filename, verts, faces)
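
The surface test in `voxel2mesh` keeps a voxel when its 3x3x3 neighbourhood is not completely filled. A vectorised sketch of the same idea, assuming a boolean occupancy grid and treating everything outside the grid as empty (the helper name `surface_mask` is made up for this example):

```python
import numpy as np

def surface_mask(occupancy):
    """Mark occupied voxels that have at least one empty or out-of-grid neighbour."""
    occ = occupancy.astype(bool)
    padded = np.pad(occ, 1, constant_values=False)
    n = occ.shape
    interior = np.ones_like(occ)
    # A voxel is interior only if all 27 cells of its 3x3x3 neighbourhood are occupied.
    for di in range(3):
        for dj in range(3):
            for dk in range(3):
                interior &= padded[di:di + n[0], dj:dj + n[1], dk:dk + n[2]]
    return occ & ~interior

grid = np.zeros((5, 5, 5), dtype=bool)
grid[1:4, 1:4, 1:4] = True           # solid 3x3x3 cube
surface = surface_mask(grid)
print(int(grid.sum()), int(surface.sum()))
```

Only the centre of the cube is hidden, so exporting the surface voxels alone would skip a single cube here; on dense 32x32x32 shapes the saving is much larger.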

In [5]:
def calculate_iou(pred, target):
    intersection = torch.sum((pred > 0.4) & target )
    union = torch.sum((pred > 0.4) | target )
    iou = intersection / union
    return iou

def calculate_accuracy(pred, target):
    correct = torch.sum((pred > 0.4) == (target > 0.4)).float()
    total = pred.numel()
    accuracy = correct / total
    return accuracy
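
A small worked example of the IoU metric, written with NumPy so that it runs without a GPU. The guard for an empty union is an assumption added for this sketch; the function above divides directly and would yield NaN in that case.

```python
import numpy as np

def iou(pred, target, threshold=0.4):
    """IoU between thresholded predictions and a boolean target grid."""
    occupied = pred > threshold
    union = np.logical_or(occupied, target).sum()
    if union == 0:
        return 1.0  # assumption: two empty grids count as a perfect match
    return float(np.logical_and(occupied, target).sum() / union)

pred = np.array([0.9, 0.5, 0.2, 0.1])
target = np.array([True, False, True, False])
score = iou(pred, target)
empty_score = iou(np.zeros(4), np.zeros(4, dtype=bool))
print(score, empty_score)
```

Here two voxels are predicted as occupied, one of them correctly, and three voxels are occupied in either grid, so the IoU is one third.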


In [None]:
pip install plotly

In [None]:
import plotly.graph_objs as go
def plot_voxel(voxel_data):
    x, y, z = np.where(voxel_data)
    
    fig = go.Figure(data=[go.Scatter3d(
        x=x, y=y, z=z,
        mode='markers',
        marker=dict(
            size=2,
            color='red',                # single color for all markers
            opacity=0.8
        )
    )])

    fig.update_layout(scene=dict(
        xaxis_title='X',
        yaxis_title='Y',
        zaxis_title='Z'
    ))

    fig.show()


In [6]:
import os
import torch
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from PIL import Image
import numpy as np
from timeit import default_timer as timer

import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import torch.optim.lr_scheduler as lr_scheduler

class ShapeNetDataset(Dataset):
    def __init__(self, data_dir, data_dir_vox):
        self.data_dir = data_dir
        self.data_dir_vox = data_dir_vox
        self.model_ids = [d for d in os.listdir(data_dir_vox) if os.path.isdir(os.path.join(data_dir_vox, d))]
        self.n_images = 5
        #print("number of images for this batch:", self.n_images)

    def __len__(self):
        return len(self.model_ids)

    def __getitem__(self, idx):
        model_id = self.model_ids[idx]
        images = []
        # Pick n_images distinct views out of the 24 renderings assumed per model.
        chosen_idxs = np.random.choice(24, size=self.n_images, replace=False)
        
        for ind in chosen_idxs:
            image_path = os.path.join(self.data_dir, model_id, f"rendering/{int(ind):02d}.png")
            image = Image.open(image_path).convert('RGB')
            image = transforms.ToTensor()(image)
            images.append(image)
        images = torch.stack(images)

        voxel_path = os.path.join(self.data_dir_vox, model_id, "model.binvox")
        voxels = read_binvox(voxel_path)
        voxels = torch.from_numpy(voxels).bool()
        return images, voxels

def voxel_loss(pred, target):
    return nn.BCELoss()(pred, target.float())

def early_stopping(current_loss, best_loss, previous_loss, patience, counter):
    
    if current_loss < previous_loss:
        counter = 0
        if current_loss < best_loss:
            best_loss = current_loss
    else:
        counter +=1
    
    '''
    if current_loss < best_loss:
        best_loss = current_loss
        counter = 0
    else:
        counter += 1
    '''
    
    if counter >= patience:
        return True, best_loss, current_loss, counter
    return False, best_loss, current_loss, counter

def train(model, device, train_dataloader, val_dataloader, optimizer, scheduler, path, epochs=10, checkpoint_path='../working/checkpoint.pth', patience=5):
    train_losses = []
    train_accuracies = []
    train_ious = []
    val_losses = []
    val_accuracies = []
    val_ious = []
    #all_val_losses = []
    #all_val_accuracies = []
    #all_val_ious = []

    # Check if a checkpoint exists
    if checkpoint_path is not None and os.path.exists(checkpoint_path):
        checkpoint = torch.load(checkpoint_path)
        model.load_state_dict(checkpoint['model_state_dict'])
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        start_epoch = checkpoint['epoch'] + 1 
        print(f"Resuming training from epoch {start_epoch + 1}")
    else:
        start_epoch = 0

    best_loss = float('inf')
    counter = 0

    for epoch in range(start_epoch, epochs):
        model.train()
        total_loss = 0
        total_accuracy = 0
        total_iou = 0
        #j = 0
        for i, (images, voxels) in enumerate(train_dataloader):
            images, voxels = images.to(device), voxels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            outputs = outputs.squeeze(1)  # Ensure the output dimensions match the target dimensions
            loss = voxel_loss(outputs, voxels)
            loss.backward()
            optimizer.step()
            accuracy = calculate_accuracy(outputs, voxels)
            iou = calculate_iou(outputs, voxels)
            total_loss += loss.item()
            total_accuracy += accuracy
            total_iou += iou
            
            if i % 10 == 0:
                print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item()}, Accuracy: {accuracy}, IoU: {iou}")
                '''
                output_voxels = outputs[0].detach().cpu().numpy()
                output_path = os.path.join(output_dir, f'epoch{epoch+1}_model{j+1}.obj')
                img = images.squeeze(1)
                plt.imshow(img[0].permute(1, 2, 0).cpu())
                plt.show()
                plot_voxel(voxels[0].cpu())
                voxel2obj(output_path, output_voxels)
                j+=1'''

        avg_train_loss = total_loss / len(train_dataloader)
        avg_train_accuracy = total_accuracy / len(train_dataloader)
        avg_train_iou = total_iou / len(train_dataloader)
        print(f"Training Results Epoch {epoch+1} - Average Loss: {avg_train_loss}, Average Accuracy: {avg_train_accuracy}, Average IoU: {avg_train_iou}")

        train_losses.append(avg_train_loss)
        train_accuracies.append(avg_train_accuracy)
        train_ious.append(avg_train_iou)

        #avg_val_loss, avg_val_accuracy, avg_val_iou, vet_losses, vet_accuracies, vet_ious= validate(model, val_dataloader)
        avg_val_loss, avg_val_accuracy, avg_val_iou = validate(model, val_dataloader)
        
        scheduler.step()
        
        val_losses.append(avg_val_loss)
        val_accuracies.append(avg_val_accuracy)
        val_ious.append(avg_val_iou)
        
        #all_val_losses.extend(vet_losses)
        #all_val_accuracies.extend(vet_accuracies)
        #all_val_ious.extend(vet_ious)

        # Save results to a file csv
        with open(path+'results.csv', 'a') as f:
            if epoch == 0:
                f.write("Training Results\n")
                f.write("Epoch, Train Loss, Train Accuracy, Train IoU, Val Loss, Val Accuracy, Val IoU\n")
                f.write("---------------------------\n")
            f.write(f"{epoch+1},{avg_train_loss},{avg_train_accuracy},{avg_train_iou},{avg_val_loss},{avg_val_accuracy},{avg_val_iou}\n")

        # Save checkpoint
        if checkpoint_path is not None:
            checkpoint = {
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
            }
            torch.save(checkpoint, checkpoint_path)

        # Early stopping
        if epoch==start_epoch:
            stop, best_loss, previous_loss, counter = early_stopping(avg_val_loss, best_loss, avg_val_loss, patience, counter)
        else:
            stop, best_loss, previous_loss, counter = early_stopping(avg_val_loss, best_loss, previous_loss, patience, counter)

        if stop:
            print("Early stopping triggered.")
            print("Best loss:",best_loss)
            break

    return train_losses, train_accuracies, train_ious, val_losses, val_accuracies, val_ious
    #return train_losses, train_accuracies, train_ious, all_val_losses, all_val_accuracies, all_val_ious


def validate(model, dataloader):
    model.eval()
    total_accuracy = 0
    total_loss = 0
    total_iou = 0
    #val_losses = []
    #val_accuracies = []
    #val_ious = []
    with torch.no_grad():
        for i, (images, voxels) in enumerate(dataloader):
            images, voxels = images.to(device), voxels.to(device)
            outputs = model(images)
            outputs = outputs.squeeze(1)
            loss = voxel_loss(outputs, voxels)
            accuracy = calculate_accuracy(outputs, voxels)
            iou = calculate_iou(outputs, voxels)
            total_loss += loss.item()
            total_accuracy += accuracy
            total_iou += iou
            #val_losses.append(loss.item())
            #val_accuracies.append(accuracy)
            #val_ious.append(iou)
            print(f"Validation Batch {i+1}, Loss: {loss.item()}, Accuracy: {accuracy}, IoU: {iou}")

    avg_loss = total_loss / len(dataloader)
    avg_accuracy = total_accuracy / len(dataloader)
    avg_iou = total_iou / len(dataloader)
    print(f"Validation Results - Average Loss: {avg_loss}, Average Accuracy: {avg_accuracy}, Average IoU: {avg_iou}")
    #return avg_loss, avg_accuracy, avg_iou, val_losses, val_accuracies, val_ious
    return avg_loss, avg_accuracy, avg_iou

def test(model, dataloader):
    model.eval()
    total_accuracy = 0
    total_loss = 0
    total_iou = 0
    print(len(dataloader))
    with torch.no_grad():
        for i, (images, voxels) in enumerate(dataloader):
            images, voxels = images.to(device), voxels.to(device)
            outputs = model(images)
            outputs = outputs.squeeze(1)
            loss = voxel_loss(outputs, voxels)
            accuracy = calculate_accuracy(outputs, voxels)
            iou = calculate_iou(outputs, voxels)
            total_loss += loss.item()
            total_accuracy += accuracy
            total_iou += iou
            print(f"Test Batch {i+1}, Loss: {loss.item()}, Accuracy: {accuracy}, IoU: {iou}")

    avg_loss = total_loss / len(dataloader)
    avg_accuracy = total_accuracy / len(dataloader)
    avg_iou = total_iou / len(dataloader)
    print(f"Testing Results - Average Loss: {avg_loss}, Average Accuracy: {avg_accuracy}, Average IoU: {avg_iou}")
    return avg_loss, avg_accuracy, avg_iou

def plot_graph(train_loss, train_accuracy, val_loss, val_accuracy):
    # Move tensors to CPU and convert to numpy
    train_loss = [t.cpu().numpy() if isinstance(t, torch.Tensor) else t for t in train_loss]
    train_accuracy = [t.cpu().numpy() if isinstance(t, torch.Tensor) else t for t in train_accuracy]
    val_loss = [t.cpu().numpy() if isinstance(t, torch.Tensor) else t for t in val_loss]
    val_accuracy = [t.cpu().numpy() if isinstance(t, torch.Tensor) else t for t in val_accuracy]

    # Plot the metrics
    plt.plot(train_loss, label='Train Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Train Loss')
    plt.show()

    plt.plot(train_accuracy, label='Train Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Train Accuracy')
    plt.show()
    
    plt.plot(val_loss, label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Validation Loss')
    plt.show()
    
    plt.plot(val_accuracy, label='Validation Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Validation Accuracy')
    plt.show()
    
    #plt.legend()
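
The early-stopping rule used in `train` compares each validation loss with the loss of the previous epoch, not with the best loss so far. The sketch below replays that rule on a list of losses; the function name `epochs_until_stop` and the loss values are invented for the example.

```python
def epochs_until_stop(losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never does.

    The counter resets whenever the loss drops below the previous epoch's loss.
    """
    counter = 0
    previous = losses[0]   # the first epoch is compared with itself
    for epoch, loss in enumerate(losses, start=1):
        if loss < previous:
            counter = 0
        else:
            counter += 1
        previous = loss
        if counter >= patience:
            return epoch
    return None

stop_epoch = epochs_until_stop([1.0, 0.9, 0.95, 0.97, 0.96, 0.99], patience=2)
never = epochs_until_stop([3.0, 2.0, 1.0], patience=2)
print(stop_epoch, never)
```

Because the first epoch is compared with itself, it already counts as a non-improving epoch. With this rule a slow upward drift interrupted by small dips can keep resetting the counter, which is the trade-off against the commented-out variant that tracks the best loss.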

**Creation of the model and the datasets**

In [None]:
# Dataset and Dataloader
rendering_path="/kaggle/input/mlprojectrenderings-full/ShapeNetRendering"
voxels_path="/kaggle/input/mlprojectvoxels-full/ShapeNetVox32"
os.chdir(rendering_path)
#print(os.getcwd())
folders=os.listdir()
os.chdir("/kaggle/working")
datasets = []

train_size = 0.70
val_size = 0.15
test_size = 0.15

objects = len(folders)

train_datasets = [0]*objects
val_datasets = [0]*objects
test_datasets = [0]*objects

for i in range(objects):
    #print(folders[i])
    data_dir = rendering_path+"/"+folders[i]#+"/"+folders[i]
    data_dir_vox = voxels_path+"/"+folders[i]#+"/"+folders[i]
    dataset = ShapeNetDataset(data_dir, data_dir_vox)
    train_datasets[i], val_datasets[i], test_datasets[i] = torch.utils.data.random_split(dataset, [train_size, val_size, test_size])
    print("split sizes for object:",folders[i]," train:",len(train_datasets[i])," validation:", len(val_datasets[i])," test:",len(test_datasets[i]))

train_dataset=torch.utils.data.ConcatDataset(train_datasets)
val_dataset=torch.utils.data.ConcatDataset(val_datasets)
test_dataset=torch.utils.data.ConcatDataset(test_datasets)

#print("Total sizes")
#print("Train:",len(train_dataset)," Validation:",len(val_dataset)," Test:",len(test_dataset))
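
When fractions are passed to `random_split`, the subset sizes have to be turned into integers. The sketch below assumes the rule described in the PyTorch documentation (floor each share, then distribute the remainder one sample at a time in round-robin order); `split_lengths` is a hypothetical helper and the category size of 1003 is invented.

```python
import math

def split_lengths(n, fractions):
    """Turn fractions that sum to 1 into integer subset sizes that sum to n."""
    lengths = [math.floor(n * f) for f in fractions]
    remainder = n - sum(lengths)
    # Hand out the leftover samples one at a time, starting from the first subset.
    for i in range(remainder):
        lengths[i % len(lengths)] += 1
    return lengths

sizes = split_lengths(1003, [0.70, 0.15, 0.15])
print(sizes, sum(sizes))
```

Splitting each category separately, as the loop above does, keeps the 70/15/15 proportions inside every object class instead of only on average.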

In [None]:
if not os.path.isdir('../working/renderings'):
    os.mkdir('../working/renderings')

output_dir = '../working/renderings'
    
batch_size = 64
lr = 0.0001
epochs = 50
use_lstm = False

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
#print("Train dataloader: ", len(train_dataloader))
device = torch.device('cuda')  # Use GPU
model = Simple3DReconstructionNet(use_lstm)
model = model.to(device)
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = lr_scheduler.StepLR(optimizer, step_size=1, gamma=1)
#scheduler = lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)
checkpoint_path = '../working/renderings/checkpoint.pth'
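
With `gamma=1` the step scheduler leaves the learning rate unchanged, so the active configuration trains at a constant rate. A plain-Python sketch of the step decay formula makes the difference from the commented-out `gamma=0.1` variant visible; `step_lr` is a made-up helper, not the library class.

```python
def step_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate after `epoch` scheduler steps under a step decay."""
    return initial_lr * gamma ** (epoch // step_size)

constant = [step_lr(1e-4, 1.0, 1, e) for e in range(4)]
decayed = [step_lr(1e-4, 0.1, 1, e) for e in range(4)]
print(constant)
print(decayed)
```

A decay factor of 0.1 per epoch shrinks the rate by three orders of magnitude within three epochs, which is likely too aggressive for a 50-epoch run.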

**Training, validation and testing**

In [None]:
start = timer()

# Train and validate the model
train_losses, train_accuracies, train_ious, val_losses, val_accuracies, val_ious = train(model, device, train_dataloader, val_dataloader, optimizer, scheduler, path="renderings/", epochs=epochs, checkpoint_path=checkpoint_path)
torch.cuda.empty_cache()
print("training duration:", timer() - start)

# Plot the metrics
plot_graph(train_losses, train_accuracies, val_losses, val_accuracies)

# Save the model
torch.save(model, '../working/renderings/model.pth')

# Test the model
start = timer()
test(model, test_dataloader)
print("test duration:", timer() - start)

**Separate test results for each object**

In [None]:
for i in range(objects):
    print(folders[i])
    test_dataloader = DataLoader(test_datasets[i], batch_size=batch_size, shuffle=True, drop_last=True)
    start = timer()
    test(model, test_dataloader)
    print("test duration:", timer() - start)

In [None]:
images = []
model = torch.load('../input/model_test50/pytorch/default/4/model_test_67.pth')
img = Image.open('/kaggle/input/mlprojectrenderings/car/car/1005ca47e516495512da0dbf3c68e847/rendering/00.png').convert('RGB')
img = transforms.ToTensor()(img)
images.append(img)
images = torch.stack(images)
images = images.cuda()
images = torch.unsqueeze(images, 0)
output = model(images)
output = torch.squeeze(output, 0)
print(output.size())
output_path = '/kaggle/working/output.obj'
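# Threshold the sigmoid outputs (0.4, as in calculate_iou) so that only occupied voxels are plotted.
plot_voxel(output.detach().cpu().numpy() > 0.4)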
voxel2obj(output_path, output.detach().cpu().numpy())
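
Since the network ends in a sigmoid, its outputs are probabilities that are typically nonzero everywhere, so `np.where` needs a boolean mask to select the occupied voxels. A minimal sketch on an invented 2x2x2 grid, assuming the same 0.4 threshold as the metrics:

```python
import numpy as np

probs = np.array([[[0.05, 0.45], [0.80, 0.30]],
                  [[0.41, 0.39], [0.10, 0.95]]])
occupied = probs > 0.4               # the threshold is an assumption, matching the metrics
x, y, z = np.where(occupied)
coords = list(zip(x.tolist(), y.tolist(), z.tolist()))
unthresholded = len(np.where(probs)[0])
print(coords, unthresholded)
```

Without the threshold every cell of the grid is selected, which would render the prediction as a solid block.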

In [None]:
#os.chdir("../input/model_test50/pytorch/default/1")
#print(os.listdir())
#os.chdir("../../../../../working")
#print(os.getcwd())
#os.chdir("../input")
print(os.listdir("../input/model_test50/pytorch/default/2"))

In [None]:
print(model)

In [7]:
class SingleImage3DReconstructionNet(nn.Module):
    """
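    A 3D reconstruction network built on top of an existing Simple3DReconstructionNet.

    The encoder, the fully connected layers and the recurrent layer are reused from the given model.
    The last two decoder layers are replaced by a head that upsamples the output from 32x32x32 to 64x64x64 voxels.

    Args:
        use_lstm (bool): Selects the recurrent layer used in the forward pass (the GRU when False).
        model (Simple3DReconstructionNet): The model whose layers are reused.
        dropout_prob (float): Dropout probability; currently unused because the Dropout layers are commented out.

    Attributes:
        encoder (nn.Sequential): The reused encoder module that extracts features from the input images.
        fc (nn.Sequential): Wraps the reused fully connected layer that maps the encoded features to a feature vector.
        gru (nn.GRU): The reused recurrent layer.
        fc3d (nn.Sequential): Wraps the reused fully connected layer that maps the combined features to a 3D representation.
        decoder (nn.Sequential): The truncated decoder followed by the new upsampling head.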

    """

    def __init__(self, use_lstm, model, dropout_prob=0.5):
        super(SingleImage3DReconstructionNet, self).__init__()
        self.use_lstm = use_lstm
        self.encoder = model.encoder
        self.fc = nn.Sequential(
            model.fc,
            #nn.Dropout(p=dropout_prob),  # add Dropout
            #nn.BatchNorm1d(model.fc.out_features)  # add Batch Normalization
        )
        # Reuse whichever recurrent layer the given model was built with.
        if not use_lstm:
            self.gru = model.gru
        else:
            self.lstm = model.lstm
        self.fc3d = nn.Sequential(
            model.fc3d,
            #nn.Dropout(p=dropout_prob),  # add Dropout
        )
        
        model = nn.Sequential(*list(model.decoder.children())[:-2])
        self.decoder = nn.Sequential(
            model,
            nn.Conv3d(16, 8, kernel_size=3, stride=1, padding=1),
            nn.ConvTranspose3d(8, 8, kernel_size=4, stride=2, padding=1),
            nn.Conv3d(8, 1, kernel_size=3, stride=1, padding=1),
            nn.Sigmoid()
        )
         
        
    def forward(self, x):
        """
        Forward pass of the network.

        Args:
            x (torch.Tensor): The input batch of 2D images.

        Returns:
            torch.Tensor: The reconstructed 3D representation.

        """
        batch_size = x.size(0)  # Get the batch size from the input images
        encoded_views = [self.encoder(view) for view in x]
        combined_features = torch.mean(torch.stack(encoded_views), dim=1)
        fc_out = self.fc(combined_features)
        if not self.use_lstm:
            fc_out, _ = self.gru(fc_out.unsqueeze(1))
        else:
            # use LSTM to combine the features
            fc_out, _ = self.lstm(fc_out.unsqueeze(1))
        
        fc_out = self.fc3d(fc_out)
        fc_out = fc_out.view(batch_size, 128, 4, 4, 4)  # Ensure the batch size is preserved
        reconstructed_3d = self.decoder(fc_out)
        return reconstructed_3d.squeeze(1)  # Remove the channel dimension


In [7]:
class SingleImage3DReconstructionNet(nn.Module):
    """
    A 3D reconstruction network with a 64x64x64 voxel output.

    This network takes in a batch of 2D images and reconstructs a 3D representation of the scene.

    Args:
        use_lstm (bool): If True, an LSTM is used as the recurrent layer; otherwise a GRU is used.

    Attributes:
        encoder (nn.Sequential): The encoder module that extracts features from the input images.
        fc (nn.Linear): The fully connected layer that maps the encoded features to a feature vector.
        gru (nn.GRU): The recurrent layer that combines the features from different views.
        fc3d (nn.Linear): The fully connected layer that maps the combined features to a 3D representation.
        decoder (nn.Sequential): The decoder module that reconstructs the 3D representation.

    """

    def __init__(self, use_lstm):
        super(SingleImage3DReconstructionNet, self).__init__()
        self.use_lstm = use_lstm
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=1, padding=3),#137
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, padding=1),#69
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),#69
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),#69
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, padding=1),#35
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),#35
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),#35
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, padding=1),#18
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),#18
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),#18
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, padding=0),#9
            nn.Flatten()
        ) # given 137x137 images, the flattened encoder output has 256 * 9 * 9 features
        
        #conv_output_size = 256 * (image_size // 8 + 1) * (image_size // 8 + 1)
        
        conv_output_size = 256 * 9 * 9
        self.fc = nn.Linear(conv_output_size, 512) # 512 is the size of the feature vector
        
        
        if not use_lstm:
            # a recurrent layer is used to combine the features from different views 
            self.gru = nn.GRU(512, 512, num_layers=4, batch_first=True) # hold a long-term memory of the features
        else:
            # alternative to the GRU layer, we can use a lstm layer
            self.lstm = nn.LSTM(512, 512, num_layers=4, batch_first=True)
        
        self.fc3d = nn.Linear(512, 128 * 4 * 4 * 4)
        self.decoder = nn.Sequential(
            nn.Conv3d(128, 128, kernel_size=3, stride=1, padding=1),#4
            nn.ReLU(),
            nn.Conv3d(128, 64, kernel_size=3, stride=1, padding=1),#4
            nn.ReLU(),
            nn.ConvTranspose3d(64, 64, kernel_size=4, stride=2, padding=1),#8
            nn.Conv3d(64, 64, kernel_size=3, stride=1, padding=1),#8
            nn.ReLU(),
            nn.Conv3d(64, 32, kernel_size=3, stride=1, padding=1),#8
            nn.ReLU(),
            nn.ConvTranspose3d(32, 32, kernel_size=4, stride=2, padding=1),#16
            nn.Conv3d(32, 32, kernel_size=3, stride=1, padding=1),#16
            nn.ReLU(),
            nn.Conv3d(32, 16, kernel_size=3, stride=1, padding=1),#16
            nn.ReLU(),
            nn.ConvTranspose3d(16, 16, kernel_size=4, stride=2, padding=1),#32
            nn.Conv3d(16, 16, kernel_size=3, stride=1, padding=1),#32
            nn.ReLU(),
            nn.Conv3d(16, 8, kernel_size=3, stride=1, padding=1),
            nn.ConvTranspose3d(8, 8, kernel_size=4, stride=2, padding=1),
            nn.Conv3d(8, 1, kernel_size=3, stride=1, padding=1),
            nn.Sigmoid()
        )
    
        
    def forward(self, x):
        """
        Forward pass of the network.

        Args:
            x (torch.Tensor): The input batch of 2D images.

        Returns:
            torch.Tensor: The reconstructed 3D representation.

        """
        batch_size = x.size(0)  # Get the batch size from the input images
        encoded_views = [self.encoder(view) for view in x]
        combined_features = torch.mean(torch.stack(encoded_views), dim=1)
        fc_out = self.fc(combined_features)
        if not self.use_lstm:
            fc_out, _ = self.gru(fc_out.unsqueeze(1))
        else:
            # use LSTM to combine the features
            fc_out, _ = self.lstm(fc_out.unsqueeze(1))
        
        fc_out = self.fc3d(fc_out)
        fc_out = fc_out.view(batch_size, 128, 4, 4, 4)  # Ensure the batch size is preserved
        reconstructed_3d = self.decoder(fc_out)
        return reconstructed_3d.squeeze(1)  # Remove the channel dimension
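
The `#4`, `#8`, `#16`, `#32` comments in the decoder track the edge length of the voxel grid. As a sanity check, the sketch below recomputes those sizes with the standard output-size formulas for `nn.Conv3d` and `nn.ConvTranspose3d` (dilation 1, no output padding). The helper names and the `decoder_layers` list are illustrative and not part of the notebook; under these assumptions the decoder maps a 4×4×4 feature volume to a 64×64×64 grid.

```python
def conv3d_out(size, kernel, stride, padding):
    """Output size of a 3D convolution along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose3d_out(size, kernel, stride, padding):
    """Output size of a 3D transposed convolution along one axis
    (dilation 1, output_padding 0)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kind, kernel, stride, padding) for the decoder layers that affect the
# spatial size, in the order they appear in nn.Sequential.
decoder_layers = [
    ("conv", 3, 1, 1), ("conv", 3, 1, 1), ("deconv", 4, 2, 1),
    ("conv", 3, 1, 1), ("conv", 3, 1, 1), ("deconv", 4, 2, 1),
    ("conv", 3, 1, 1), ("conv", 3, 1, 1), ("deconv", 4, 2, 1),
    ("conv", 3, 1, 1), ("conv", 3, 1, 1), ("deconv", 4, 2, 1),
    ("conv", 3, 1, 1),
]


def trace_sizes(start, layers):
    """Return the spatial size before the first layer and after every layer."""
    sizes = [start]
    for kind, kernel, stride, padding in layers:
        fn = conv3d_out if kind == "conv" else conv_transpose3d_out
        sizes.append(fn(sizes[-1], kernel, stride, padding))
    return sizes


sizes = trace_sizes(4, decoder_layers)
print(sizes)
```

Each stride-2 transposed convolution doubles the edge length, while the 3×3×3 convolutions with padding 1 leave it unchanged.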


**Real world dataset**

In [8]:
class RealWorldDataset(Dataset):
    def __init__(self, data_dir, transform):
        self.data_dir = data_dir
        #self.data_dir_vox = data_dir_vox
        self.model_ids = [d for d in os.listdir(data_dir) if os.path.isdir(os.path.join(data_dir, d))]
        self.transform = transform
        
    def __len__(self):
        return len(self.model_ids)

    def __getitem__(self, idx):
        model_id = self.model_ids[idx]
        # Each model folder holds one image named after the model, so the
        # returned image tensor has a leading "views" dimension of size 1.
        image_path = os.path.join(self.data_dir, model_id, model_id+".png")
        image = Image.open(image_path).convert('RGB')
        image = self.transform(image)
        images = torch.stack([image])
                                  
        voxel_path = os.path.join(self.data_dir, model_id, "model.binvox") 
        voxels = read_binvox(voxel_path)
        voxels = torch.from_numpy(voxels).bool()
                                  
        return images,voxels
            
        '''
        for i in range(self.n_images): # Assuming 24 views per model
            while True:
                ind = np.random.randint(24)
                if ind not in choosen_indxs:
                    choosen_indxs[i]=ind
                    break
            image_path = os.path.join(self.data_dir, model_id, f"rendering/{ind:02d}.png")
            image = Image.open(image_path).convert('RGB')
            image = transforms.ToTensor()(image)
            images.append(image)
        images = torch.stack(images)

        voxel_path = os.path.join(self.data_dir_vox, model_id, "model.binvox")
        voxels = read_binvox(voxel_path)
        voxels = torch.from_numpy(voxels).bool()
        return images, voxels
        '''
    


In [9]:
# Dataset and Dataloader
if not os.path.isdir('../working/realWorld'):
    os.mkdir('../working/realWorld')
    
output_dir = '../working/realWorld'

dir_path="/kaggle/input/mlprojectrealworlddatasetaugmented"
voxels_path="/kaggle/input/mlprojectvoxels"
os.chdir(dir_path)
#print(os.getcwd())
folders=os.listdir()
os.chdir("/kaggle/working")
datasets = []

train_size = 0.80
val_size = 0.10
test_size = 0.10

objects = len(folders)

train_datasets = []
val_datasets = []
test_datasets = []

transform = transforms.Compose([
    transforms.ToTensor(),  # images are scaled to [0, 1]
    transforms.Normalize([0.5,0.5,0.5], [0.5,0.5,0.5])  # then mapped to [-1, 1]
])

for i in range(objects):
    #print(folders[i])
    if i != 2 and i != 3 and i != 6:
        data_dir = dir_path+"/"+folders[i]
        data_dir_vox = voxels_path+"/"+folders[i]+"/"+folders[i]
        dataset = RealWorldDataset(data_dir, transform)
        train_s, val, test_s = torch.utils.data.random_split(dataset, [train_size, val_size, test_size])
        train_datasets.append(train_s)
        val_datasets.append(val)
        test_datasets.append(test_s)
        print("dimensioni oggetto:",folders[i]," train:",len(train_s)," validation:", len(val)," test:",len(test_s))

train_dataset=torch.utils.data.ConcatDataset(train_datasets)
val_dataset=torch.utils.data.ConcatDataset(val_datasets)
test_dataset=torch.utils.data.ConcatDataset(test_datasets)

#print("Dimensioni totali")
#print("Train:",len(train_dataset)," Validation:",len(val_dataset)," Test:",len(test_dataset))


dimensioni oggetto: sofa  train: 13  validation: 1  test: 1
dimensioni oggetto: bookcase  train: 2  validation: 0  test: 0
dimensioni oggetto: bed  train: 8  validation: 1  test: 1
dimensioni oggetto: chair  train: 16  validation: 2  test: 2
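
The split sizes printed above follow from how `random_split` handles fractional lengths: per the PyTorch documentation, each length is `floor(frac * len(dataset))` and any leftover items are handed out one at a time in round-robin order starting from the first split. The sketch below reproduces only that arithmetic (it does not shuffle or split anything); `fractional_split_lengths` is a hypothetical helper, and the dataset sizes 15, 2, 10 and 20 are inferred from the printed totals.

```python
import math


def fractional_split_lengths(n_items, fractions):
    """Lengths produced by a fractional split: floor each share, then
    distribute the remainder round-robin starting from the first split."""
    lengths = [math.floor(n_items * f) for f in fractions]
    remainder = n_items - sum(lengths)
    for i in range(remainder):
        lengths[i % len(lengths)] += 1
    return lengths


fractions = [0.80, 0.10, 0.10]
for name, n_items in [("sofa", 15), ("bookcase", 2), ("bed", 10), ("chair", 20)]:
    print(name, fractional_split_lengths(n_items, fractions))
```

For very small categories such as `bookcase`, every item ends up in the training split, so the validation and test subsets for that category are empty.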




In [9]:
# Dataset and Dataloader
if not os.path.isdir('../working/realWorld'):
    os.mkdir('../working/realWorld')
    
output_dir = '../working/realWorld'

dir_path="/kaggle/input/mlprojectrealworlddatasetreducted"
os.chdir(dir_path)
#print(os.getcwd())
folders=os.listdir()
os.chdir("/kaggle/working")
datasets = []

train_size = 0.80
val_size = 0.10
test_size = 0.10

objects = len(folders)

train_datasets = []
val_datasets = []
test_datasets = []

transform = transforms.Compose([
    transforms.ToTensor(),  # images are scaled to [0, 1]
    transforms.Normalize([0.5,0.5,0.5], [0.5,0.5,0.5])  # then mapped to [-1, 1]
])

for i in range(objects):
    #print(folders[i])
    new_path=dir_path+"/"+folders[i]
    os.chdir(new_path)
    models=os.listdir()
    m=len(models)
    tot_tr=0
    tot_v=0
    tot_te=0
    for j in range(m):
        data_dir = new_path+"/"+models[j]
        dataset = RealWorldDataset(data_dir, transform)
        train_s, val, test_s = torch.utils.data.random_split(dataset, [train_size, val_size, test_size])
        train_datasets.append(train_s)
        val_datasets.append(val)
        test_datasets.append(test_s)
        tot_tr=tot_tr+len(train_s)
        tot_v=tot_v+len(val)
        tot_te=tot_te+len(test_s)
        #print("dimensioni modello:",models[j]," train:",len(train_s)," validation:", len(val)," test:",len(test_s))

    print("dimensioni oggetto:",folders[i]," train:",tot_tr," validation:", tot_v," test:",tot_te)

train_dataset=torch.utils.data.ConcatDataset(train_datasets)
val_dataset=torch.utils.data.ConcatDataset(val_datasets)
test_dataset=torch.utils.data.ConcatDataset(test_datasets)

os.chdir("/kaggle/working")

#print("Dimensioni totali")
#print("Train:",len(train_dataset)," Validation:",len(val_dataset)," Test:",len(test_dataset))


dimensioni oggetto: sofa  train: 5779  validation: 720  test: 717
dimensioni oggetto: bookcase  train: 359  validation: 45  test: 44
dimensioni oggetto: wardrobe  train: 621  validation: 78  test: 77
dimensioni oggetto: desk  train: 1435  validation: 179  test: 178
dimensioni oggetto: bed  train: 2643  validation: 329  test: 328
dimensioni oggetto: chair  train: 6437  validation: 804  test: 795
dimensioni oggetto: table  train: 2599  validation: 324  test: 321
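
The `transform` used in the cells above first scales pixel values to `[0, 1]` (`ToTensor`) and then applies `Normalize` with mean 0.5 and standard deviation 0.5 per channel, which computes `(x - mean) / std` and therefore maps the range to `[-1, 1]`. A minimal per-value sketch of that arithmetic follows; `normalize` and `denormalize` are illustrative helpers, not notebook functions, and `denormalize` is the inverse one would need before displaying an image.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalization applied to a value already in [0, 1]."""
    return (value - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, back to the [0, 1] range."""
    return value * std + mean


for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel), "->", denormalize(normalize(pixel)))
```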


**Train and testing**

In [None]:
if not os.path.isdir('../working/renderings'):
    os.mkdir('../working/renderings')
    
batch_size = 32
lr = 0.00005
epochs = 50
use_lstm = False

model=torch.load('../input/model_test50/pytorch/default/7/model_test_67.pth')
#print(model.decoder)

train_dataloader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
val_dataloader = DataLoader(val_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
test_dataloader = DataLoader(test_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
#print("Train dataloader: ", len(train_dataloader))
device = torch.device('cuda')  # Use GPU
model = SingleImage3DReconstructionNet(use_lstm, model)
#model = SingleImage3DReconstructionNet(use_lstm)
#model.load_state_dict(torch.load('../input/model_test50/pytorch/default/2/model_test63.pth'))
model = model.to(device)
optimizer = optim.Adam(model.parameters(), lr=lr)
scheduler = lr_scheduler.StepLR(optimizer, step_size=1, gamma=1)  # gamma=1 keeps the learning rate constant
checkpoint_path = '../working/realWorld/train_checkpoint.pth'

start = timer()

# Train and validate the model
train_losses, train_accuracies, train_ious, val_losses, val_accuracies, val_ious = train(model, device, train_dataloader, val_dataloader, optimizer, scheduler, path="realWorld/", epochs=epochs, checkpoint_path=checkpoint_path, patience=10)
torch.cuda.empty_cache()

print("training duration:", timer() - start)

# Plot the metrics
plot_graph(train_losses, train_accuracies, val_losses, val_accuracies)

# Save the model
torch.save(model, '../working/realWorld/model.pth')

# Test the model
start = timer()
test(model, test_dataloader)
print("test duration:", timer() - start)

Epoch 1/50, Loss: 0.7728444337844849, Accuracy: 0.23239171504974365, IoU: 0.08427158743143082
Epoch 1/50, Loss: 0.654724657535553, Accuracy: 0.08288311958312988, IoU: 0.0803380087018013
Epoch 1/50, Loss: 0.6478564739227295, Accuracy: 0.054351091384887695, IoU: 0.052820559591054916
Epoch 1/50, Loss: 0.6244069337844849, Accuracy: 0.19421446323394775, IoU: 0.09623806178569794
Epoch 1/50, Loss: 0.48165085911750793, Accuracy: 0.6355829238891602, IoU: 0.061135560274124146
Epoch 1/50, Loss: 0.34334778785705566, Accuracy: 0.864176869392395, IoU: 0.059661321341991425
Epoch 1/50, Loss: 0.2623664140701294, Accuracy: 0.8817324638366699, IoU: 0.07502368092536926
Epoch 1/50, Loss: 0.24201655387878418, Accuracy: 0.9083254337310791, IoU: 0.04712684080004692
Epoch 1/50, Loss: 0.2718891501426697, Accuracy: 0.8994936943054199, IoU: 0.038160622119903564
Epoch 1/50, Loss: 0.23382975161075592, Accuracy: 0.9147593975067139, IoU: 0.023175755515694618
Epoch 1/50, Loss: 0.2064545452594757, Accuracy: 0.930245876
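
With `step_size=1` and `gamma=1`, `StepLR` multiplies the learning rate by 1 at every step, so it stays at `lr` for the whole run. Below is a minimal sketch of the closed-form schedule, assuming the scheduler is stepped once per epoch (the `train` function is not shown here). `step_lr` is a hypothetical helper, and `0.95` is only an illustrative decay value, not a setting from the notebook.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate after `epoch` scheduler steps under a step-decay schedule:
    the rate is multiplied by `gamma` once every `step_size` steps."""
    return base_lr * gamma ** (epoch // step_size)


base_lr = 0.00005

# Settings used in the training cell: the rate never changes.
constant = [step_lr(base_lr, 1, 1, epoch) for epoch in range(50)]
print(min(constant), max(constant))

# Illustrative alternative with an actual decay.
decayed = [step_lr(base_lr, 0.95, 1, epoch) for epoch in range(50)]
print(decayed[0], decayed[-1])
```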

In [None]:
# Note: test_datasets[i] matches folders[i] only if exactly one dataset was appended per folder.
for i in range(objects):
    print(folders[i])
    test_dataloader = DataLoader(test_datasets[i], batch_size=2, shuffle=True, drop_last=True)
    start = timer()
    test(model, test_dataloader)
    print("test duration:", timer() - start)

### Evaluation Method

To evaluate the network, you can use the following metrics, based on those in the "3D-R2N2" paper:

1. **Intersection-over-Union (IoU)**:
   - IoU measures the overlap between the predicted 3D model and the ground truth. It is a good indicator of reconstruction quality.
   - Formula: $$\text{IoU} = \frac{\text{Intersection}}{\text{Union}}$$
   - You can compute the IoU voxel-wise between the prediction and the ground-truth model: the intersection counts voxels occupied in both, the union counts voxels occupied in at least one of them.

2. **Voxel Accuracy**:
   - Voxel accuracy measures the fraction of voxels predicted correctly (both occupied and empty).
   - Formula: $$\text{Accuracy} = \frac{\text{Correct voxels}}{\text{Total voxels}}$$
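
The two metrics can behave very differently on sparse voxel grids, which is consistent with the training log above, where accuracy climbs above 0.9 while IoU stays below 0.1. The sketch below is a plain-Python illustration on flattened 0/1 occupancy lists, not the notebook's `test` function (which is not shown here). `voxel_metrics` is a hypothetical helper, and returning an IoU of 1.0 when both grids are empty is an assumed convention.

```python
def voxel_metrics(pred, target):
    """Return (IoU, accuracy) for two flat sequences of 0/1 occupancy values."""
    pairs = list(zip(pred, target))
    intersection = sum(1 for p, t in pairs if p and t)   # occupied in both
    union = sum(1 for p, t in pairs if p or t)           # occupied in either
    correct = sum(1 for p, t in pairs if bool(p) == bool(t))
    iou = intersection / union if union else 1.0         # assumed convention
    accuracy = correct / len(pairs)
    return iou, accuracy


# Ground truth: 5 occupied voxels out of 100.
target = [1] * 5 + [0] * 95

# Predicting "empty" everywhere scores high accuracy but zero IoU.
all_empty = [0] * 100
iou_empty, acc_empty = voxel_metrics(all_empty, target)
print(iou_empty, acc_empty)

# A prediction that hits 3 of the 5 occupied voxels and adds 2 false positives.
partial = [1] * 3 + [0] * 2 + [1] * 2 + [0] * 93
iou_partial, acc_partial = voxel_metrics(partial, target)
print(iou_partial, acc_partial)
```

Because empty voxels dominate the grid, accuracy barely separates the two predictions, while IoU does. For sigmoid outputs such as the decoder's, the prediction would first need to be binarized with a threshold (0.5 is a common choice, assumed here).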

In [None]:
from IPython.display import FileLink 

FileLink(r'model.pth')

In [None]:
rm -r ../working/*

In [None]:
rm -r ../working/renderings/*

In [14]:
rm -r ../working/realWorld/*