
GPU memory and speed #4

Closed

mowangmodi opened this issue Aug 13, 2023 · 6 comments

@mowangmodi

How much GPU memory is required for training, and how long does training take?

@chenhsuanlin
Contributor

With the default setup, it takes ~24GB GPU memory and ~16 hours to train 500k iterations on A100 GPUs.

There are many hyperparameters that can affect performance (both GPU memory and time). The most significant one is probably the hash table size (model.object.sdf.encoding.hashgrid.dict_size), whose default value is 22 (corresponding to a dictionary size of 2^22). As a reference,

  • if we set model.object.sdf.encoding.hashgrid.dict_size=20 instead (i.e. 1/4 the dictionary size), then it would consume ~16GB GPU memory and take ~10.5 hours to train (on A100);
  • if we further set data.train.batch_size=1, then the training time would drop to ~8 hours.

These changes would likely come at the cost of slightly lower-quality results, though.
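
For reference, the quoted wall-clock figures are easier to compare as per-iteration times. A minimal plain-Python sketch of that conversion, assuming the quoted hours cover the full 500k training iterations:

# Converting the quoted wall-clock times into per-iteration time
# (assumption: the quoted hours cover all 500k training iterations).
SECONDS_PER_HOUR = 3600
ITERATIONS = 500_000

quoted_hours = {
    "default (dict_size=22)": 16.0,
    "dict_size=20": 10.5,
    "dict_size=20 + batch_size=1": 8.0,
}

for setting, hours in quoted_hours.items():
    ms_per_iter = hours * SECONDS_PER_HOUR / ITERATIONS * 1000
    print(f"{setting}: ~{ms_per_iter:.1f} ms/iteration")

# default (dict_size=22): ~115.2 ms/iteration
# dict_size=20: ~75.6 ms/iteration
# dict_size=20 + batch_size=1: ~57.6 ms/iteration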

@mli0603
Collaborator

mli0603 commented Aug 13, 2023

Hi @mowangmodi

Thank you for your interest in the project.

As @chenhsuanlin mentioned, both dictionary size and batch size will affect the training requirements.

Another knob for reducing memory usage is model.object.sdf.encoding.hashgrid.dim, which defaults to 8. Lowering it will also improve speed and reduce memory. I hope this helps :)
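
To get a rough sense of why dict_size and dim dominate the encoding's memory footprint, the sketch below computes a back-of-envelope upper bound on the hash-table parameter count. The per-level cap of 2^dict_size entries and the fp32 assumption are assumptions rather than details from this thread; coarse levels hold far fewer entries, and activations, optimizer state, and the MLPs add more memory on top.

# Back-of-envelope upper bound on hash-encoding parameters.
# Assumptions (not from this thread): every level stores up to 2**dict_size
# entries of `dim` fp32 features; real tables at coarse levels are smaller.
def hashgrid_params(levels, dict_size, dim):
    return levels * (2 ** dict_size) * dim

for dict_size, dim in [(22, 8), (20, 8), (20, 2)]:
    n = hashgrid_params(levels=16, dict_size=dict_size, dim=dim)
    gb = n * 4 / 2 ** 30  # 4 bytes per fp32 parameter
    print(f"dict_size={dict_size}, dim={dim}: {n / 1e6:.0f}M params, ~{gb:.2f} GB")

# dict_size=22, dim=8: 537M params, ~2.00 GB
# dict_size=20, dim=8: 134M params, ~0.50 GB
# dict_size=20, dim=2: 34M params, ~0.12 GB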

@chenhsuanlin
Contributor

Closing due to inactivity; please feel free to reopen if there are further issues!

@chenhsuanlin added and then removed the "question (Further information is requested)" label on Aug 18, 2023
@mli0603
Collaborator

mli0603 commented Aug 18, 2023

Please see the FAQ section on how to adjust the hyperparameters.

@mowangmodi
Author

> With the default setup, it takes ~24GB GPU memory and ~16 hours to train 500k iterations on A100 GPUs. [...]

For the ~24GB of GPU memory and ~16 hours to train 500k iterations on A100 GPUs, how many A100s did you use? I used two RTX 3090s; 50k iterations took 14 hours and used a total of 36GB of GPU memory.
Here are my parameters:
checkpoint:
  save_epoch: 9999999999
  save_iter: 20000
  save_latest_iter: 9999999999
  save_period: 9999999999
  strict_resume: true
cudnn:
  benchmark: true
  deterministic: false
data:
  name: dummy
  num_images: null
  num_workers: 4
  preload: true
  readjust:
    center:
    - 0.0
    - 0.0
    - 0.0
    scale: 1.0
  root: data_DTU/dtu_scan24
  train:
    batch_size: 1
    image_size:
    - 1200
    - 1600
    subset: null
  type: projects.neuralangelo.data
  use_multi_epoch_loader: true
  val:
    batch_size: 1
    image_size:
    - 300
    - 400
    max_viz_samples: 16
    subset: 1
image_save_iter: 9999999999
inference_args: {}
local_rank: 0
logdir: logs/example_group_dtu24/dtu_24
logging_iter: 9999999999999
max_epoch: 9999999999
max_iter: 500000
metrics_epoch: null
metrics_iter: null
model:
  appear_embed:
    dim: 8
    enabled: false
  background:
    enabled: true
    encoding:
      levels: 10
      type: fourier
    encoding_view:
      levels: 3
      type: spherical
    mlp:
      activ: relu
      activ_density: softplus
      activ_density_params: {}
      activ_params: {}
      hidden_dim: 256
      hidden_dim_rgb: 128
      num_layers: 8
      num_layers_rgb: 2
      skip:
      - 4
      skip_rgb: []
    view_dep: true
    white: false
  object:
    rgb:
      encoding_view:
        levels: 3
        type: spherical
      mlp:
        activ: relu_
        activ_params: {}
        hidden_dim: 256
        num_layers: 4
        skip: []
        weight_norm: true
      mode: idr
    s_var:
      anneal_end: 0.1
      init_val: 1.4
    sdf:
      encoding:
        coarse2fine:
          enabled: true
          init_active_level: 4
          step: 5000
        hashgrid:
          dict_size: 20
          dim: 2
          max_logres: 11
          min_logres: 5
          range:
          - -2
          - 2
        levels: 16
        type: hashgrid
      gradient:
        mode: numerical
        taps: 4
      mlp:
        activ: softplus
        activ_params:
          beta: 100
        geometric_init: true
        hidden_dim: 256
        inside_out: false
        num_layers: 2
        out_bias: 0.5
        skip: []
        weight_norm: true
  render:
    num_sample_hierarchy: 4
    num_samples:
      background: 32
      coarse: 64
      fine: 16
    rand_rays: 512
    stratified: true
  type: projects.neuralangelo.model
nvtx_profile: false
optim:
  fused_opt: false
  params:
    lr: 0.001
    weight_decay: 0.001
  sched:
    gamma: 10.0
    iteration_mode: true
    step_size: 9999999999
    two_steps:
    - 300000
    - 400000
    type: two_steps_with_warmup
    warm_up_end: 5000
  type: AdamW
pretrained_weight: null
source_filename: projects/neuralangelo/configs/dtu.yaml
speed_benchmark: false
test_data:
  name: dummy
  num_workers: 0
  test:
    batch_size: 1
    is_lmdb: false
    roots: null
  type: imaginaire.datasets.images
timeout_period: 9999999
trainer:
  amp_config:
    backoff_factor: 0.5
    enabled: false
    growth_factor: 2.0
    growth_interval: 2000
    init_scale: 65536.0
  ddp_config:
    find_unused_parameters: false
    static_graph: true
  depth_vis_scale: 0.5
  ema_config:
    beta: 0.9999
    enabled: false
    load_ema_checkpoint: false
    start_iteration: 0
  grad_accum_iter: 1
  image_to_tensorboard: false
  init:
    gain: null
    type: none
  loss_weight:
    curvature: 0.0005
    eikonal: 0.1
    render: 1.0
  type: projects.neuralangelo.trainer
validation_iter: 5000
wandb_image_iter: 10000
wandb_scalar_iter: 100
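
For a rough comparison with the A100 numbers quoted above, the run reported here works out to about 1 second per iteration versus roughly 0.12 seconds per iteration. A minimal sketch of that arithmetic (it ignores hardware differences and the other config changes such as image_size, rand_rays, dict_size, and dim):

# Per-iteration time for the two reported runs (rough comparison only).
def sec_per_iter(hours, iterations):
    return hours * 3600 / iterations

a100_default = sec_per_iter(16.0, 500_000)   # quoted: ~16 h for 500k iterations
two_3090s = sec_per_iter(14.0, 50_000)       # reported: 14 h for 50k iterations

print(f"A100 default: {a100_default:.3f} s/iter")   # 0.115 s/iter
print(f"2x RTX 3090:  {two_3090s:.3f} s/iter")      # 1.008 s/iter
print(f"ratio:        ~{two_3090s / a100_default:.1f}x slower")  # ~8.8x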

@prettybot

prettybot commented Aug 25, 2023

> With the default setup, it takes ~24GB GPU memory and ~16 hours to train 500k iterations on A100 GPUs.

@chenhsuanlin First of all, thanks for your effort on this great project. I have some questions.
How many A100 GPUs did you use for training?
I just tried a single A100 40GB GPU, training on the COLMAP data from lego.mp4. Each epoch takes up to 5 seconds. (I changed the dict_size parameter from 22 to 21.)

According to your conclusion, each epoch should take:
16 × 3600 / 500,000 = 0.1152 seconds

As you can see, it's a really big difference.

Can you help explain this? I think training speed is the key factor in whether this paper can be used in production.


#57 (comment)
My mistake, 500k refers to iterations, not epochs. Sorry about that; please ignore this comment.
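
For anyone else tripped up by the same thing: the quoted ~16 hours is for 500k iterations, and with batch_size=1 an epoch is one iteration per training image, so per-epoch time depends on the dataset size. A tiny sketch (the 100-image count is hypothetical, not from this thread):

# Iterations vs. epochs at batch_size=1: one epoch = num_images iterations.
sec_per_iteration = 16 * 3600 / 500_000   # ~0.115 s, from the quoted A100 numbers
num_images = 100                          # hypothetical dataset size
print(f"~{sec_per_iteration * num_images:.1f} s per epoch at this rate")  # ~11.5 s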
