
Fine-Tuning VGG16 for Depth Estimation

Introduction

This repository contains a set of Python scripts for fine-tuning a VGG16 model to perform real-time depth estimation.

Network Architecture

I've added a 1×1 convolution to reduce the number of channels in the last conv layer from 512 to 128. This shrinks the model so that it fits on my modest 2 GB GPU.

[Figure: modified VGG16 architecture with the added 1×1 channel-reduction layer]
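
As a rough sketch of this channel-reduction step, the snippet below bolts a 1×1 convolution onto a pretrained VGG16 base. It assumes a TensorFlow/Keras setup; the input size and layer name are illustrative choices of mine, not code taken from this repository.

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Conv2D
    from tensorflow.keras.models import Model

    # VGG16 convolutional base (no FC head), pretrained on ImageNet.
    base = VGG16(include_top=False, weights='imagenet', input_shape=(224, 224, 3))

    # A 1x1 convolution squeezes the last feature map from 512 to 128 channels,
    # shrinking whatever head is attached on top of it.
    x = Conv2D(128, kernel_size=1, activation='relu', name='channel_reduce')(base.output)
    model = Model(inputs=base.input, outputs=x)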

Note: I think fully connected (FC) layers are an improper approach for this task, so I am currently training a model that uses a simple up-convolution (Up-Conv) technique instead. I came to that conclusion after several failed training sessions; FC layers are discriminative by nature and throw away the spatial structure that a dense prediction needs.

I've added a scale-invariant loss because I think learning relative depth is much easier.

[Figure: scale-invariant loss]
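
The snippet below is a minimal sketch of that scale-invariant log loss (Eigen et al., NIPS 2014), written against TensorFlow as an assumption; the lambda value and tensor shapes are illustrative, not the exact settings used in this repository.

    import tensorflow as tf

    def scale_invariant_loss(y_true, y_pred, lam=0.5, eps=1e-6):
        """Scale-invariant log loss over a batch of depth maps."""
        # Work in log-depth so the loss measures relative, scale-free error.
        d = tf.math.log(y_pred + eps) - tf.math.log(y_true + eps)
        # Flatten each depth map so the statistics run over all of its pixels.
        d = tf.reshape(d, [tf.shape(d)[0], -1])
        mse_term = tf.reduce_mean(tf.square(d), axis=-1)
        scale_term = tf.square(tf.reduce_mean(d, axis=-1))
        # lam = 0 gives a plain log MSE; lam = 1 is fully scale-invariant.
        return tf.reduce_mean(mse_term - lam * scale_term)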

Dataset

I've used the NYU Depth V2 dataset.

Training

For the FC implementation: I am still working on it, but I am currently stuck at 0.15 RMSE on the training data and 0.45 RMSE on the validation data.

For the Up-Conv implementation: I've reached 0.109 RMSE on the training data and 0.165 RMSE on the validation data.

Output

[Figure: sample depth-map predictions]

Note: the output samples above were predicted by the Up-Conv implementation.

Conclusion

I've used only 1449 image/depth-map pairs during this fine-tuning process; I think getting more data would significantly improve my results.

Tricks such as the 1×1 convolution can make life easier and training faster while preserving most of the model's capacity.

Plain upsampling of the last feature map leaves it sparse (around 75% of its values are zeros); I think using an up-projection block to upsample the last feature map will help.

Note: this block was introduced in "Deeper Depth Prediction with Fully Convolutional Residual Networks" (Laina et al., 2016).

Note: I've already implemented this paper in DeeperDepthEstimation.

[Figure: Up-Projection block]
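
For illustration, one way to write such an up-projection block in Keras is sketched below, following the branch-and-add structure described by Laina et al.; the filter count and framework are my assumptions rather than this repository's code.

    from tensorflow.keras.layers import UpSampling2D, Conv2D, Add, ReLU

    def up_projection(x, filters):
        """2x upsampling through two conv branches instead of plain upsampling."""
        up = UpSampling2D(size=2)(x)  # simple unpooling of the feature map
        a = Conv2D(filters, 5, padding='same', activation='relu')(up)
        a = Conv2D(filters, 3, padding='same')(a)    # main branch
        b = Conv2D(filters, 5, padding='same')(up)   # projection (skip) branch
        return ReLU()(Add()([a, b]))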

Future Work

  1. I will collect more NYU Depth V2 data by extracting additional frames from the provided raw data.
  2. I will train my model on AWS to get access to more powerful resources.
  3. I will try to predict larger depth maps.
  4. I will add depth gradients along the x and y directions to the loss in order to penalize sudden changes in depth (see the sketch below). This idea was introduced in "Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture," ICCV 2015.
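
Below is a sketch of the gradient term mentioned in item 4, assuming TensorFlow tensors of shape (batch, height, width, 1); it only illustrates the idea and is not the paper's or this repository's exact implementation.

    import tensorflow as tf

    def gradient_loss(y_true, y_pred, eps=1e-6):
        """Penalize mismatched depth gradients along the x and y directions."""
        d = tf.math.log(y_pred + eps) - tf.math.log(y_true + eps)
        # Finite differences of the log-depth error along width (x) and height (y).
        dx = d[:, :, 1:, :] - d[:, :, :-1, :]
        dy = d[:, 1:, :, :] - d[:, :-1, :, :]
        # Predicted local depth changes should match those of the ground truth.
        return tf.reduce_mean(tf.square(dx)) + tf.reduce_mean(tf.square(dy))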
