
GPU memory and speed #4

Closed

mowangmodi opened this issue Aug 13, 2023 · 6 comments

@mowangmodi

How much GPU memory is required for training, and how long does training take?

@chenhsuanlin
Contributor

With the default setup, it takes ~24GB GPU memory and ~16 hours to train 500k iterations on A100 GPUs.

There are many hyperparameters that can affect performance (both GPU memory and time). The most significant one is probably the hash table size (model.object.sdf.encoding.hashgrid.dict_size), whose default value is 22 (corresponding to a dictionary size of 2^22). As a reference,

  • if we set model.object.sdf.encoding.hashgrid.dict_size=20 instead (i.e. 1/4 the dictionary size), then it would consume ~16GB GPU memory and take ~10.5 hours to train (on A100);
  • if we further set data.train.batch_size=1, then the training time would drop to ~8 hours.

These changes would likely come at the cost of slightly lower-quality results, though.
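
For reference, the quoted wall-clock figures are easier to compare as per-iteration times. A minimal plain-Python sketch of that conversion, assuming the quoted hours cover the full 500k training iterations:

# Converting the quoted wall-clock times into per-iteration time
# (assumption: the quoted hours cover all 500k training iterations).
SECONDS_PER_HOUR = 3600
ITERATIONS = 500_000

quoted_hours = {
    "default (dict_size=22)": 16.0,
    "dict_size=20": 10.5,
    "dict_size=20 + batch_size=1": 8.0,
}

for setting, hours in quoted_hours.items():
    ms_per_iter = hours * SECONDS_PER_HOUR / ITERATIONS * 1000
    print(f"{setting}: ~{ms_per_iter:.1f} ms/iteration")

# default (dict_size=22): ~115.2 ms/iteration
# dict_size=20: ~75.6 ms/iteration
# dict_size=20 + batch_size=1: ~57.6 ms/iteration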

@mli0603
Collaborator

mli0603 commented Aug 13, 2023

Hi @mowangmodi

Thank you for your interest in the project.

As @chenhsuanlin mentioned, both dictionary size and batch size will affect the training requirements.

Another knob for reducing memory usage is model.object.sdf.encoding.hashgrid.dim, which defaults to 8. Lowering it will also improve speed and reduce memory. I hope this helps :)
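
To get a rough sense of why dict_size and dim dominate the encoding's memory footprint, the sketch below computes a back-of-envelope upper bound on the hash-table parameter count. The per-level cap of 2^dict_size entries and the fp32 assumption are assumptions rather than details from this thread; coarse levels hold far fewer entries, and activations, optimizer state, and the MLPs add more memory on top.

# Back-of-envelope upper bound on hash-encoding parameters.
# Assumptions (not from this thread): every level stores up to 2**dict_size
# entries of `dim` fp32 features; real tables at coarse levels are smaller.
def hashgrid_params(levels, dict_size, dim):
    return levels * (2 ** dict_size) * dim

for dict_size, dim in [(22, 8), (20, 8), (20, 2)]:
    n = hashgrid_params(levels=16, dict_size=dict_size, dim=dim)
    gb = n * 4 / 2 ** 30  # 4 bytes per fp32 parameter
    print(f"dict_size={dict_size}, dim={dim}: {n / 1e6:.0f}M params, ~{gb:.2f} GB")

# dict_size=22, dim=8: 537M params, ~2.00 GB
# dict_size=20, dim=8: 134M params, ~0.50 GB
# dict_size=20, dim=2: 34M params, ~0.12 GB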

@chenhsuanlin
Contributor

Closing due to inactivity; please feel free to reopen if there are further issues!

@chenhsuanlin added and then removed the "question (Further information is requested)" label on Aug 18, 2023
@mli0603
Collaborator

mli0603 commented Aug 18, 2023

Please see the FAQ section on how to adjust the hyperparameters.

@mowangmodi
Author

> With the default setup, it takes ~24GB GPU memory and ~16 hours to train 500k iterations on A100 GPUs. [...]

For the ~24GB of GPU memory and ~16 hours to train 500k iterations on A100 GPUs, how many A100s did you use? I used two RTX 3090s; 50k iterations took 14 hours and used a total of 36GB of GPU memory.
Here are my parameters:
checkpoint:
  save_epoch: 9999999999
  save_iter: 20000
  save_latest_iter: 9999999999
  save_period: 9999999999
  strict_resume: true
cudnn:
  benchmark: true
  deterministic: false
data:
  name: dummy
  num_images: null
  num_workers: 4
  preload: true
  readjust:
    center:
    - 0.0
    - 0.0
    - 0.0
    scale: 1.0
  root: data_DTU/dtu_scan24
  train:
    batch_size: 1
    image_size:
    - 1200
    - 1600
    subset: null
  type: projects.neuralangelo.data
  use_multi_epoch_loader: true
  val:
    batch_size: 1
    image_size:
    - 300
    - 400
    max_viz_samples: 16
    subset: 1
image_save_iter: 9999999999
inference_args: {}
local_rank: 0
logdir: logs/example_group_dtu24/dtu_24
logging_iter: 9999999999999
max_epoch: 9999999999
max_iter: 500000
metrics_epoch: null
metrics_iter: null
model:
  appear_embed:
    dim: 8
    enabled: false
  background:
    enabled: true
    encoding:
      levels: 10
      type: fourier
    encoding_view:
      levels: 3
      type: spherical
    mlp:
      activ: relu
      activ_density: softplus
      activ_density_params: {}
      activ_params: {}
      hidden_dim: 256
      hidden_dim_rgb: 128
      num_layers: 8
      num_layers_rgb: 2
      skip:
      - 4
      skip_rgb: []
    view_dep: true
    white: false
  object:
    rgb:
      encoding_view:
        levels: 3
        type: spherical
      mlp:
        activ: relu_
        activ_params: {}
        hidden_dim: 256
        num_layers: 4
        skip: []
        weight_norm: true
      mode: idr
    s_var:
      anneal_end: 0.1
      init_val: 1.4
    sdf:
      encoding:
        coarse2fine:
          enabled: true
          init_active_level: 4
          step: 5000
        hashgrid:
          dict_size: 20
          dim: 2
          max_logres: 11
          min_logres: 5
          range:
          - -2
          - 2
        levels: 16
        type: hashgrid
      gradient:
        mode: numerical
        taps: 4
      mlp:
        activ: softplus
        activ_params:
          beta: 100
        geometric_init: true
        hidden_dim: 256
        inside_out: false
        num_layers: 2
        out_bias: 0.5
        skip: []
        weight_norm: true
  render:
    num_sample_hierarchy: 4
    num_samples:
      background: 32
      coarse: 64
      fine: 16
    rand_rays: 512
    stratified: true
  type: projects.neuralangelo.model
nvtx_profile: false
optim:
  fused_opt: false
  params:
    lr: 0.001
    weight_decay: 0.001
  sched:
    gamma: 10.0
    iteration_mode: true
    step_size: 9999999999
    two_steps:
    - 300000
    - 400000
    type: two_steps_with_warmup
    warm_up_end: 5000
  type: AdamW
pretrained_weight: null
source_filename: projects/neuralangelo/configs/dtu.yaml
speed_benchmark: false
test_data:
  name: dummy
  num_workers: 0
  test:
    batch_size: 1
    is_lmdb: false
    roots: null
  type: imaginaire.datasets.images
timeout_period: 9999999
trainer:
  amp_config:
    backoff_factor: 0.5
    enabled: false
    growth_factor: 2.0
    growth_interval: 2000
    init_scale: 65536.0
  ddp_config:
    find_unused_parameters: false
    static_graph: true
  depth_vis_scale: 0.5
  ema_config:
    beta: 0.9999
    enabled: false
    load_ema_checkpoint: false
    start_iteration: 0
  grad_accum_iter: 1
  image_to_tensorboard: false
  init:
    gain: null
    type: none
  loss_weight:
    curvature: 0.0005
    eikonal: 0.1
    render: 1.0
  type: projects.neuralangelo.trainer
validation_iter: 5000
wandb_image_iter: 10000
wandb_scalar_iter: 100
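
For a rough comparison with the A100 numbers quoted above, the run reported here works out to about 1 second per iteration versus roughly 0.12 seconds per iteration. A minimal sketch of that arithmetic (it ignores hardware differences and the other config changes such as image_size, rand_rays, dict_size, and dim):

# Per-iteration time for the two reported runs (rough comparison only).
def sec_per_iter(hours, iterations):
    return hours * 3600 / iterations

a100_default = sec_per_iter(16.0, 500_000)   # quoted: ~16 h for 500k iterations
two_3090s = sec_per_iter(14.0, 50_000)       # reported: 14 h for 50k iterations

print(f"A100 default: {a100_default:.3f} s/iter")   # 0.115 s/iter
print(f"2x RTX 3090:  {two_3090s:.3f} s/iter")      # 1.008 s/iter
print(f"ratio:        ~{two_3090s / a100_default:.1f}x slower")  # ~8.8x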

@prettybot

prettybot commented Aug 25, 2023

> With the default setup, it takes ~24GB GPU memory and ~16 hours to train 500k iterations on A100 GPUs.

@chenhsuanlin First of all, thanks for your effort on this great project. I have some questions.
How many A100 GPUs did you use for training?
I just tried a single A100 40GB GPU, training on the COLMAP data from lego.mp4. Each epoch takes up to 5 seconds. (I changed the dict_size parameter from 22 to 21.)

According to your conclusion, each epoch should take:
16 × 3600 / 500,000 = 0.1152 seconds

As you can see, it's a really big difference.

Can you help explain this? I think training speed is the key factor in whether this paper can be used in production.


#57 (comment)
My mistake, 500k refers to iterations, not epochs. Sorry about that; please ignore this comment.
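
For anyone else tripped up by the same thing: the quoted ~16 hours is for 500k iterations, and with batch_size=1 an epoch is one iteration per training image, so per-epoch time depends on the dataset size. A tiny sketch (the 100-image count is hypothetical, not from this thread):

# Iterations vs. epochs at batch_size=1: one epoch = num_images iterations.
sec_per_iteration = 16 * 3600 / 500_000   # ~0.115 s, from the quoted A100 numbers
num_images = 100                          # hypothetical dataset size
print(f"~{sec_per_iteration * num_images:.1f} s per epoch at this rate")  # ~11.5 s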
