# ANÁLISIS DE REDES NEURONALES PREENTRENADAS PARA LA ESTIMACIÓN DE PROFUNDIDAD

Se pretende realizar el análisis de una serie de redes neuronales preentrenadas para la estimación de profundidad que son: 
- MonoDepth2
- MiDAS
- SC-DepthV2
- DenseDepth

Para poder comprobar su precisión se han sacado unas imágenes de test con una cámara Interl RealSense tanto en modo monocular como en RGBD que se han guardado en sus respectivas carpetas ./monocular_photos y ./rgbd_photos. Las imágenes monoculares servirán como entrada para las redes y las rgbd se emplearán como ground truth y así obtener la precisión de los mapas de profundidad de la salida de las redes.

A mayores también se pueden hacer pruebas con los datasets de repositorios públicos como: 

- https://theairlab.org/tartanair-dataset/
- https://cvg.cit.tum.de/data/datasets/rgbd-dataset/download
- https://projects.asl.ethz.ch/datasets/doku.php?id=kmavvisualinertialdatasets
- https://www.cvlibs.net/datasets/kitti/

Sin embargo, el problema es que en este caso puede que alguna de las redes haya sido entrenada con las imágenes de alguno o varios de esos repositorios, y por lo tanto el test no será fiable. Deberíamos de estar seguros que las imágenes nunca hayan sido vistas por las redes. 


## MonoDepth2

https://github.com/nianticlabs/monodepth2

En concreto el modelo mono_640x192 está entrenado solo con imágenes monoculares, sin información estéreo. <br>
Por otro lado, el modelo mono+stereo_640x192 está entrenado usando tanto imágenes monoculares como pares estéreo durante el entrenamiento. <br>
Aunque solo utiliza monoculares para la predicción, el entrenamiento con datos estéreo puede ayudar a mejorar la precisión. <br><br>

Por lo tanto, el mejor modelo y el escogido para mi caso es el mono+stereo_640x192.

1. Descarga del modelo preentrenado.

In [3]:
!python3 ./monodepth2-master/test_simple.py --image_path monocular_photos --model_name mono+stereo_640x192 --ext jpg

-> Downloading pretrained model to models/mono+stereo_640x192.zip
   Unzipping model...
   Model unzipped to models/mono+stereo_640x192
-> Loading model from  models/mono+stereo_640x192
   Loading pretrained encoder
   Loading pretrained decoder
-> Predicting on 41 test images
  return F.conv2d(input, weight, bias, self.stride,
   Processed 1 of 41 images - saved predictions to:
   - monocular_photos/color-20240503-111615_disp.jpeg
   - monocular_photos/color-20240503-111615_disp.npy
   Processed 2 of 41 images - saved predictions to:
   - monocular_photos/color-20240503-111633_disp.jpeg
   - monocular_photos/color-20240503-111633_disp.npy
   Processed 3 of 41 images - saved predictions to:
   - monocular_photos/color-20240503-111916_disp.jpeg
   - monocular_photos/color-20240503-111916_disp.npy
   Processed 4 of 41 images - saved predictions to:
   - monocular_photos/color-20240503-111651_disp.jpeg
   - monocular_photos/color-20240503-111651_disp.npy
   Processed 5 of 41 images - save

In [66]:
import os
import torch
from torchvision import transforms, datasets
from PIL import Image

In [67]:
# Configuración de la carpeta de imágenes y resultados
input_folder = './monocular_photos'
output_folder = './network_depth_maps'
model_name = 'mono+stereo_640x192'
if not os.path.exists(output_folder):
    os.makedirs(output_folder)


In [68]:
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

print(device)

cuda


In [69]:
def download_model_if_doesnt_exist(model_name):
    """If pretrained kitti model doesn't exist, download and unzip it
    """
    # values are tuples of (<google cloud URL>, <md5 checksum>)
    download_paths = {
        "mono_640x192":
            ("https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_640x192.zip",
             "a964b8356e08a02d009609d9e3928f7c"),
        "stereo_640x192":
            ("https://storage.googleapis.com/niantic-lon-static/research/monodepth2/stereo_640x192.zip",
             "3dfb76bcff0786e4ec07ac00f658dd07"),
        "mono+stereo_640x192":
            ("https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono%2Bstereo_640x192.zip",
             "c024d69012485ed05d7eaa9617a96b81"),
        "mono_no_pt_640x192":
            ("https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_no_pt_640x192.zip",
             "9c2f071e35027c895a4728358ffc913a"),
        "stereo_no_pt_640x192":
            ("https://storage.googleapis.com/niantic-lon-static/research/monodepth2/stereo_no_pt_640x192.zip",
             "41ec2de112905f85541ac33a854742d1"),
        "mono+stereo_no_pt_640x192":
            ("https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono%2Bstereo_no_pt_640x192.zip",
             "46c3b824f541d143a45c37df65fbab0a"),
        "mono_1024x320":
            ("https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono_1024x320.zip",
             "0ab0766efdfeea89a0d9ea8ba90e1e63"),
        "stereo_1024x320":
            ("https://storage.googleapis.com/niantic-lon-static/research/monodepth2/stereo_1024x320.zip",
             "afc2f2126d70cf3fdf26b550898b501a"),
        "mono+stereo_1024x320":
            ("https://storage.googleapis.com/niantic-lon-static/research/monodepth2/mono%2Bstereo_1024x320.zip",
             "cdc5fc9b23513c07d5b19235d9ef08f7"),
        }

    if not os.path.exists("./monodept2/models"):
        os.makedirs("models")

    model_path = os.path.join("./monodept2/models", model_name)

In [70]:
from monodepth2 import networks

# Cargar el modelo
# download_model_if_doesnt_exist(model_name)
model_path = os.path.join('./monodepth2/models',model_name)
encoder_path = os.path.join(model_path,'encoder.pth')
depth_decoder_path = os.path.join(model_path, 'depth.pth')

encoder = networks.ResnetEncoder(18, False)
depth_decoder = networks.DepthDecoder(num_ch_enc=encoder.num_ch_enc, scales=range(4))

loaded_dict_enc = torch.load(encoder_path, map_location=device)    
feed_height = loaded_dict_enc['height']
feed_width = loaded_dict_enc['width']
print("Altura:", feed_height, "Anchura:", feed_width) #extraccion de las dimensiones de las imagenes con las que fue entrenado el modelo
filtered_dict_enc = {k: v for k, v in loaded_dict_enc.items() if k in encoder.state_dict()}
encoder.load_state_dict(filtered_dict_enc)
encoder.to(device)
encoder.eval()

depth_decoder.load_state_dict(torch.load(depth_decoder_path, map_location=device))
depth_decoder.to(device)
depth_decoder.eval()

RuntimeError: CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


In [54]:
import glob

if os.path.isdir(input_folder):
    # Searching folder for images
    paths = glob.glob(os.path.join(input_folder, '*.{}'.format('jpg')))
    output_directory = input_folder
else:
    raise Exception("Can not find args.image_path: {}".format(input_folder))

print("-> Predicting on {:d} test images".format(len(paths)))

-> Predicting on 41 test images


In [59]:
from monodepth2.layers import disp_to_depth
from monodepth2.evaluate_depth import STEREO_SCALE_FACTOR
# PREDICTING ON EACH IMAGE IN TURN
with torch.no_grad():
    for idx, image_path in enumerate(paths):

        if image_path.endswith("_disp.jpg"):
            # don't try to predict disparity for a disparity image!
            continue

        # Load image and preprocess
        input_image = Image.open(image_path).convert('RGB')
        original_width, original_height = input_image.size
        input_image = input_image.resize((feed_width, feed_height), Image.LANCZOS)
        input_image = transforms.ToTensor()(input_image).unsqueeze(0)

        # PREDICTION
        input_image = input_image.to(device)
        features = encoder(input_image)
        outputs = depth_decoder(features)

        disp = outputs[("disp", 0)]
        disp_resized = torch.nn.functional.interpolate(
            disp, (original_height, original_width), mode="bilinear", align_corners=False)

        # Saving numpy file
        output_name = os.path.splitext(os.path.basename(image_path))[0]
        scaled_disp, depth = disp_to_depth(disp, 0.1, 100)

        name_dest_npy = os.path.join(output_folder, "{}_depth.npy".format(output_name))
        metric_depth = STEREO_SCALE_FACTOR * depth.device().numpy()
        np.save(name_dest_npy, metric_depth)

        # Saving colormapped depth image
        disp_resized_np = disp_resized.squeeze().cpu().numpy()
        vmax = np.percentile(disp_resized_np, 95)
        normalizer = mpl.colors.Normalize(vmin=disp_resized_np.min(), vmax=vmax)
        mapper = cm.ScalarMappable(norm=normalizer, cmap='magma')
        colormapped_im = (mapper.to_rgba(disp_resized_np)[:, :, :3] * 255).astype(np.uint8)
        im = Image.fromarray(colormapped_im)

        name_dest_im = os.path.join(output_directory, "{}_disp.jpeg".format(output_name))
        im.save(name_dest_im)

        print("   Processed {:d} of {:d} images - saved predictions to:".format(
            idx + 1, len(paths)))
        print("   - {}".format(name_dest_im))
        print("   - {}".format(name_dest_npy))

print('-> Done!')

ModuleNotFoundError: No module named 'layers'

In [None]:
import os
import torch
from torchvision import transforms
from PIL import Image
import networks

# Configuración de la carpeta de imágenes y resultados
input_folder = './monocular_photos'
output_folder = './depth_maps'
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Cargar el modelo
model_path = 'models/mono+stereo_640x192'
encoder_path = os.path.join(model_path, 'encoder.pth')
depth_decoder_path = os.path.join(model_path, 'depth.pth')

encoder = networks.ResnetEncoder(18, False)
depth_decoder = networks.DepthDecoder(num_ch_enc=encoder.num_ch_enc, scales=range(4))

loaded_dict_enc = torch.load(encoder_path, map_location='cpu')
filtered_dict_enc = {k: v for k, v in loaded_dict_enc.items() if k in encoder.state_dict()}
encoder.load_state_dict(filtered_dict_enc)

depth_decoder.load_state_dict(torch.load(depth_decoder_path, map_location='cpu'))

encoder.eval()
depth_decoder.eval()

# Procesar imágenes
with torch.no_grad():
    for image_file in os.listdir(input_folder):
        if image_file.endswith((".png", ".jpg", ".jpeg")):
            input_image = Image.open(os.path.join(input_folder, image_file)).convert('RGB')
            original_width, original_height = input_image.size
            input_image = input_image.resize((640, 192), Image.LANCZOS)
            input_image = transforms.ToTensor()(input_image).unsqueeze(0)

            # Predicción de profundidad
            features = encoder(input_image)
            outputs = depth_decoder(features)

            disp = outputs[("disp", 0)]
            disp_resized = torch.nn.functional.interpolate(
                disp, (original_height, original_width), mode="bilinear", align_corners=False)
            disp_resized_np = disp_resized.squeeze().cpu().numpy()

            # Guardar el resultado
            depth_filename = os.path.join(output_folder, os.path.splitext(image_file)[0] + '_depth.png')
            Image.fromarray((disp_resized_np * 255).astype(np.uint8)).save(depth_filename)

print("Procesamiento completado.")
