add support for nerfstudio #98

Merged
merged 2 commits into from
Feb 2, 2024

Jing1Ling
Contributor

  1. Following the template provided by nerfstudio, I added several related files:
    • 'zipnerf_config.py': parameter configuration.
    • 'zipnerf_model.py': a model wrapper that reuses the Model class in 'internal/models.py'.

  2. Replaced 'cam_dirs' with 'directions', as discussed in this issue.
    With this patch you can use tools provided by nerfstudio (e.g. the viewer). Apart from the modification to cast_ray(), the original code is not affected. The change is needed because nerfstudio's input data structure, RayBundle, does not provide camera directions.

@Jing1Ling
Contributor Author

I ran a simple test of the difference between 'cam_dirs' and 'directions' on the garden scene. For each variant I trained two models with nerfstudio, and the PSNR difference on the validation set was within 0.1.

@SuLvXiangXin
Owner

  1. I'm not familiar with nerfstudio, but it seems that an additional installation step is needed. Can you give detailed instructions in the README?
  2. With the latest version of nerfstudio, running ns-train zipnerf --data /SSD_DISK/datasets/360_v2/bicycle/ fails with AssertionError: Colmap path /SSD_DISK/datasets/360_v2/bicycle/colmap/sparse/0 does not exist. Maybe an additional argument needs to be added for that?

@Jing1Ling
Contributor Author

  1. Done! Sorry for missing that.
  2. This is because nerfstudio's default colmap path is 'colmap/sparse/0'. I changed it to 'sparse/0' in the config file. The same can be achieved with 'ns-train zipnerf --data xxx colmap --colmap-path sparse/0'.

@SuLvXiangXin
Owner

Hi, when I start training, I get an index-out-of-bounds error, which comes from here. I find that ray_indices[:,1].max()==image_height, which is wrong: it should be image_height-1, and the same goes for the width. I'm not sure how to fix this.
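
The condition described above can be checked before any image is indexed. The following is a minimal sketch, not code from this repository: the helper name check_pixel_indices is hypothetical, and the (image, row, column) layout is an assumption inferred from the use of ray_indices[:,1] for the height. The sizes are made-up values.

```python
from typing import Iterable, Sequence


def check_pixel_indices(
    ray_indices: Iterable[Sequence[int]], height: int, width: int
) -> None:
    """Raise IndexError if any (image, row, col) triple lies outside the image.

    Assumed layout: column 0 is the image index, column 1 the row,
    column 2 the column. Valid rows are 0..height-1, valid cols 0..width-1.
    """
    for image_idx, row, col in ray_indices:
        if not (0 <= row < height and 0 <= col < width):
            raise IndexError(
                f"pixel ({row}, {col}) of image {image_idx} "
                f"is outside a {height}x{width} image"
            )


# Hypothetical values: the last valid pixel of an 821x1236 image passes.
check_pixel_indices([(0, 0, 0), (0, 820, 1235)], height=821, width=1236)
```

With a tensor, the same check could be run as check_pixel_indices(ray_indices.tolist(), height, width). A row equal to height, as reported above, raises IndexError on the CPU with a readable message.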

@Jing1Ling
Contributor Author

Jing1Ling commented Feb 1, 2024

I mentioned this situation in README.md:

*Nerfstudio's ColmapDataParser rounds down the image size when downscaling, which differs from the 360_v2 dataset. You can use nerfstudio to reprocess the data, or modify the downscaling logic in the library as discussed in nerfstudio-project/nerfstudio#1438.

Fastest Solution
Change the two lines here to:

self.height = torch.floor(0.5 + (self.height * scaling_factor)).to(torch.int64)
self.width = torch.floor(0.5 + (self.width * scaling_factor)).to(torch.int64)
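
To see why the rounding mode matters, here is a small pure-Python sketch of the three strategies mentioned in this thread. The function name downscaled_size and the side lengths are hypothetical, chosen only so that the division leaves a fractional part; the "round" branch mirrors the floor(0.5 + x) form used in the two lines above.

```python
import math


def downscaled_size(size: int, downscale_factor: int, mode: str = "floor") -> int:
    """Return an image side length after downscaling under a rounding mode."""
    scaled = size / downscale_factor
    if mode == "floor":
        return math.floor(scaled)
    if mode == "ceil":
        return math.ceil(scaled)
    if mode == "round":
        # Round half up, the same form as torch.floor(0.5 + x).
        return math.floor(scaled + 0.5)
    raise ValueError(f"unknown rounding mode: {mode}")


# Hypothetical side lengths: 3286 / 4 = 821.5 and 3285 / 4 = 821.25.
for size in (3286, 3285):
    for mode in ("floor", "round", "ceil"):
        print(size, mode, downscaled_size(size, 4, mode))
```

If the parser computes a size one pixel smaller or larger than the images stored on disk, pixel indices sampled from one size can fall outside the other, which matches the out-of-bounds symptom reported above.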

@SuLvXiangXin SuLvXiangXin merged commit 1768cdb into SuLvXiangXin:main Feb 2, 2024
@Pioneer6gun9

Excuse me, I followed the method above, but I ran into a problem I can't solve when executing ns-train zipnerf --data bicycle colmap --colmap-path sparse/0:
RuntimeError: CUDA error: device-side assert triggered
Is there a good way to solve it?

@Jing1Ling
Contributor Author

Hi @Pioneer6gun9! Someone reminded me that the rounding strategy of mipnerf360 is not ceil but round. I've updated the code above. Btw, I've submitted a pull request to nerfstudio for this issue.
I'm not sure whether this is the cause; feel free to contact me if you still have any questions.

@Pioneer6gun9

This comment was marked as resolved.

@unanan

unanan commented Apr 12, 2024

I mentioned this situation in README.md:

*Nerfstudio's ColmapDataParser rounds down the image size when downscaling, which differs from the 360_v2 dataset. You can use nerfstudio to reprocess the data, or modify the downscaling logic in the library as discussed in nerfstudio-project/nerfstudio#1438.

Fastest Solution: change the two lines here to:

self.height = torch.floor(0.5 + (self.height * scaling_factor)).to(torch.int64)
self.width = torch.floor(0.5 + (self.width * scaling_factor)).to(torch.int64)

I modified the code here, which resolved the problem:

dataparser=ColmapDataParserConfig(downscale_factor=4,orientation_method="up",center_method="poses", colmap_path="sparse/0"),

to

dataparser=ColmapDataParserConfig(downscale_factor=4,orientation_method="up",center_method="poses", colmap_path="sparse/0", downscale_rounding_mode="round"),

@Jing1Ling
Contributor Author

Jing1Ling commented Apr 15, 2024

Hi @unanan, you're right. Now that the PR submitted to nerfstudio about the rounding mode has been merged, I will submit a PR to update the README of this repo later.
You can also specify the rounding mode in the training command:
ns-train zipnerf --data path/to/data colmap --downscale_rounding_mode round

@s1eeveW

s1eeveW commented May 27, 2024

@Jing1Ling Hello mate, do you have any solution for this issue?

NameError: name 'segment_coo' is not defined

The entire error info is:

(nerfstudio) E:\zipnerf-pytorch>ns-train zipnerf --data ./data/flowers colmap --colmap-path sparse/0
E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\tyro\_fields.py:343: UserWarning: The field colmap_path is annotated with type <class 'pathlib.Path'>, but the default value sparse/0 has type <class 'str'>. We'll try to handle this gracefully, but it may cause unexpected behavior.
warnings.warn(
[03:08:49] Using --data alias for --data.pipeline.datamanager.data train.py:230
──────────────────────────────────────────────────────── Config ────────────────────────────────────────────────────────
TrainerConfig(
_target=<class 'nerfstudio.engine.trainer.Trainer'>,
output_dir=WindowsPath('outputs'),
method_name='zipnerf',
experiment_name=None,
project_name='nerfstudio-project',
timestamp='2024-05-28_030849',
machine=MachineConfig(seed=42, num_devices=1, num_machines=1, machine_rank=0, dist_url='auto', device_type='cuda'),
logging=LoggingConfig(
relative_log_dir=WindowsPath('.'),
steps_per_log=10,
max_buffer_size=20,
local_writer=LocalWriterConfig(
_target=<class 'nerfstudio.utils.writer.LocalWriter'>,
enable=True,
stats_to_track=(
<EventName.ITER_TRAIN_TIME: 'Train Iter (time)'>,
<EventName.TRAIN_RAYS_PER_SEC: 'Train Rays / Sec'>,
<EventName.CURR_TEST_PSNR: 'Test PSNR'>,
<EventName.VIS_RAYS_PER_SEC: 'Vis Rays / Sec'>,
<EventName.TEST_RAYS_PER_SEC: 'Test Rays / Sec'>,
<EventName.ETA: 'ETA (time)'>
),
max_log_size=10
),
profiler='basic'
),
viewer=ViewerConfig(
relative_log_filename='viewer_log_filename.txt',
websocket_port=None,
websocket_port_default=7007,
websocket_host='0.0.0.0',
num_rays_per_chunk=32768,
max_num_display_images=512,
quit_on_train_completion=False,
image_format='jpeg',
jpeg_quality=75,
make_share_url=False,
camera_frustum_scale=0.1,
default_composite_depth=True
),
pipeline=ZipNerfPipelineConfig(
_target=<class 'zipnerf_ns.zipnerf_pipeline.ZipNerfPipeline'>,
datamanager=ZipNerfDataManagerConfig(
_target=<class 'zipnerf_ns.zipnerf_datamanager.ZipNerfDataManager'>,
data=WindowsPath('data/flowers'),
masks_on_gpu=False,
images_on_gpu=False,
dataparser=ColmapDataParserConfig(
_target=<class 'nerfstudio.data.dataparsers.colmap_dataparser.ColmapDataParser'>,
data=WindowsPath('.'),
scale_factor=1.0,
downscale_factor=4,
downscale_rounding_mode='round',
scene_scale=1.0,
orientation_method='up',
center_method='poses',
auto_scale_poses=True,
assume_colmap_world_coordinate_convention=True,
eval_mode='interval',
train_split_fraction=0.9,
eval_interval=8,
depth_unit_scale_factor=0.001,
images_path=WindowsPath('images'),
masks_path=None,
depths_path=None,
colmap_path=WindowsPath('sparse/0'),
load_3D_points=True,
max_2D_matches_per_3D_point=0
),
train_num_rays_per_batch=8192,
train_num_images_to_sample_from=-1,
train_num_times_to_repeat_images=-1,
eval_num_rays_per_batch=8192,
eval_num_images_to_sample_from=-1,
eval_num_times_to_repeat_images=-1,
eval_image_indices=(0,),
collate_fn=<function nerfstudio_collate at 0x000001E2B32A3C10>,
camera_res_scale_factor=1.0,
patch_size=1,
camera_optimizer=None,
pixel_sampler=PixelSamplerConfig(
_target=<class 'nerfstudio.data.pixel_samplers.PixelSampler'>,
num_rays_per_batch=4096,
keep_full_image=False,
is_equirectangular=False,
ignore_mask=False,
fisheye_crop_radius=None,
rejection_sample_mask=True,
max_num_iterations=100
)
),
model=ZipNerfModelConfig(
_target=<class 'zipnerf_ns.zipnerf_model.ZipNerfModel'>,
enable_collider=True,
collider_params={'near_plane': 2.0, 'far_plane': 6.0},
loss_coefficients={'rgb_loss_coarse': 1.0, 'rgb_loss_fine': 1.0},
eval_num_rays_per_chunk=32768,
prompt=None,
gin_file=['configs/360.gin'],
compute_extras=True,
proposal_weights_anneal_max_num_iters=1000,
rand=True,
zero_glo=False
)
),
optimizers={
'model': {
'optimizer': AdamOptimizerConfig(
_target=<class 'torch.optim.adam.Adam'>,
lr=0.008,
eps=1e-15,
max_norm=None,
weight_decay=0
),
'scheduler': ExponentialDecaySchedulerConfig(
_target=<class 'nerfstudio.engine.schedulers.ExponentialDecayScheduler'>,
lr_pre_warmup=1e-08,
lr_final=0.001,
warmup_steps=1000,
max_steps=25000,
ramp='cosine'
)
}
},
vis='viewer',
data=WindowsPath('data/flowers'),
prompt=None,
relative_model_dir=WindowsPath('nerfstudio_models'),
load_scheduler=True,
steps_per_save=5000,
steps_per_eval_batch=1000,
steps_per_eval_image=5000,
steps_per_eval_all_images=25000,
max_num_iterations=25000,
mixed_precision=True,
use_grad_scaler=False,
save_only_latest_checkpoint=True,
load_dir=None,
load_step=None,
load_config=None,
load_checkpoint=None,
log_gradients=False,
gradient_accumulation_steps={}
)
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Saving config to: outputs\flowers\zipnerf\2024-05-28_030849\config.yml experiment_config.py:136
Saving checkpoints to: outputs\flowers\zipnerf\2024-05-28_030849\nerfstudio_models trainer.py:137
Setting up training dataset...
Caching all 151 images.
Setting up evaluation dataset...
Caching all 22 images.
E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torchmetrics\utilities\prints.py:62: FutureWarning: Importing PeakSignalNoiseRatio from torchmetrics was deprecated and will be removed in 2.0. Import PeakSignalNoiseRatio from torchmetrics.image instead.
_future_warning(
╭─────────────── viser ───────────────╮
│             ╷                       │
│   HTTP      │ http://0.0.0.0:7007   │
│   Websocket │ ws://0.0.0.0:7007     │
│             ╵                       │
╰─────────────────────────────────────╯
[NOTE] Not running eval iterations since only viewer is enabled.
Use --vis {wandb, tensorboard, viewer+wandb, viewer+tensorboard} to run with eval.
No Nerfstudio checkpoint to load, so training from scratch.
Disabled comet/tensorboard/wandb event writers
Printing profiling stats, from longest to shortest duration in seconds
VanillaPipeline.get_train_loss_dict: 0.2297
Trainer.train_iteration: 0.2297
Traceback (most recent call last):
File "E:\Programming\Anaconda\envs\nerfstudio\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "E:\Programming\Anaconda\envs\nerfstudio\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "E:\Programming\Anaconda\envs\nerfstudio\Scripts\ns-train.exe\__main__.py", line 7, in <module>
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 262, in entrypoint
main(
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 247, in main
launch(
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 189, in launch
main_func(local_rank=0, world_size=world_size, config=config)
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\scripts\train.py", line 100, in train_loop
trainer.train()
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 261, in train
loss, loss_dict, metrics_dict = self.train_iteration(step)
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\utils\profiler.py", line 112, in inner
out = func(*args, **kwargs)
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\engine\trainer.py", line 496, in train_iteration
_, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\utils\profiler.py", line 112, in inner
out = func(*args, **kwargs)
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\pipelines\base_pipeline.py", line 301, in get_train_loss_dict
model_outputs = self._model(ray_bundle) # train distributed data parallel model if world_size > 1
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\nerfstudio\models\base_model.py", line 143, in forward
return self.get_outputs(ray_bundle)
File "E:\zipnerf-pytorch\zipnerf_ns\zipnerf_model.py", line 94, in get_outputs
renderings, ray_history = self.zipnerf(
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "E:\Programming\Anaconda\envs\nerfstudio\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "E:\zipnerf-pytorch\internal\models.py", line 307, in forward
loss_hash_decay = segment_coo(param ** 2,
NameError: name 'segment_coo' is not defined

@Jing1Ling
Contributor Author

Hi @s1eeveW! 'segment_coo' is a function in the pytorch_scatter package. You can install pytorch_scatter in your Python environment. Alternatively, you can simply comment out these lines and use this line to calculate 'loss_hash_decay'. The two differ only slightly, and I don't think the replacement will affect the results much.
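
For readers unfamiliar with segment_coo: with reduce='mean' it averages the values within each group given by a sorted index. The pure-Python sketch below illustrates that grouping and compares the mean of the group means with a plain mean over all values. It is only an illustration, not the code in 'internal/models.py': the helper segment_mean, the sample values, and the group ids are hypothetical, and it assumes the suggested replacement is a plain mean over the squared parameters, which the comment above does not spell out.

```python
from collections import defaultdict
from typing import List, Sequence


def segment_mean(values: Sequence[float], index: Sequence[int]) -> List[float]:
    """Average `values` within each group id in `index`, ordered by group id."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, index):
        sums[group] += value
        counts[group] += 1
    return [sums[group] / counts[group] for group in sorted(sums)]


# Hypothetical parameters and group ids (two groups of unequal size).
params = [1.0, 3.0, 2.0, 2.0, 2.0, 2.0]
group_ids = [0, 0, 1, 1, 1, 1]
squared = [p ** 2 for p in params]

per_group = segment_mean(squared, group_ids)
mean_of_group_means = sum(per_group) / len(per_group)  # weights groups equally
flat_mean = sum(squared) / len(squared)                # weights values equally

print(per_group, mean_of_group_means, flat_mean)
```

The two results coincide when all groups have the same size and differ otherwise, because the grouped form gives each group equal weight regardless of how many values it holds.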

5 participants