What camera intrinsics are used for fine-tuning the gray-bg Zero123Plus? #5

Closed
cwchenwang opened this issue Apr 15, 2024 · 4 comments


@cwchenwang

Thanks for the amazing work. I noticed that after fine-tuning to a white background, the object in the output image appears at a much larger scale:
[image: output-whitebg]
[image: output-ori]

Do you use different intrinsics when fine-tuning? Does a larger-scale output work better for the reconstruction stage?

@bluestyle97
Member

@cwchenwang Hi, we aimed to strictly follow the camera setting of Zero123++ v1.2 (fov=30) during fine-tuning. We asked the authors of Zero123++ about the object normalization and camera distance in this issue. The original answer was that the object should be normalized into a unit cube (it has since been corrected to a unit sphere), which was an unintentional mistake that results in larger objects in the rendered images.

This will not influence the reconstruction results in most cases. However, if the shape of the object is close to a cube, it will occupy a very large region in the generated image, and the reconstruction result will be cropped since it exceeds the [-1, 1] representation range of the triplane. To alleviate this issue temporarily, you can pass a smaller --scale argument to run.py to decrease the size of the reconstructed object. We plan to fix the object normalization issue and provide a new model in the future.
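
For reference, here is a minimal sketch of the difference between the two normalization schemes discussed above, assuming the mesh vertices are an (N, 3) NumPy array; the function name and the box_scale parameter are hypothetical illustrations, not the InstantMesh code:

import numpy as np

def normalization_scale(verts, box_scale=1.0, mode="sphere"):
    # Returns the factor that scales `verts` to the target extent.
    bbox_min, bbox_max = verts.min(axis=0), verts.max(axis=0)
    if mode == "cube":
        # Unintended behavior: the longest bounding-box edge fits the
        # box, so the object fills a unit cube and renders larger.
        return box_scale / max(bbox_max - bbox_min)
    # Corrected behavior: the bounding-box diagonal fits the box, so
    # the object fits inside a sphere and renders smaller.
    return box_scale / np.linalg.norm(bbox_max - bbox_min)

For a cube-shaped object the two factors differ by sqrt(3), which is consistent with the oversized renders reported above.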

@msingh27

msingh27 commented May 24, 2024

@bluestyle97 Thanks a lot for open-sourcing the codebase for fine-tuning the zero123++ models.
I am also facing some issues while rendering the 6 views for a 3D model in Blender.
It would be great if you could tell us more about the creation of the training dataset for the zero123++ models, specifically the camera_distance and the normalization of the 3D model for Blender rendering.
I am modifying the Blender script from here, but for some objects the views don't look like zero123++ outputs (scaling and camera-distance issues).

I am using this for validation views in Blender:

import numpy as np  # runs inside Blender, so mathutils Vectors are available

def set_camera_location_validation(camera, view_i):
    # cam_distance = 0.5 / np.tan(np.radians(30 / 2))  # not sure if this is correct
    cam_distance = 2.0
    # Fixed Zero123++ 6-view layout (degrees).
    azimuths = np.deg2rad([30, 90, 150, 210, 270, 330])
    elevations = np.deg2rad([20, -10, 20, -10, 20, -10])

    # Spherical -> Cartesian camera position.
    x = cam_distance * np.cos(elevations) * np.cos(azimuths)
    y = cam_distance * np.cos(elevations) * np.sin(azimuths)
    z = cam_distance * np.sin(elevations)

    camera.location = x[view_i], y[view_i], z[view_i]

    # Point the camera at the origin (-Z forward, Y up).
    direction = -camera.location
    rot_quat = direction.to_track_quat('-Z', 'Y')
    camera.rotation_euler = rot_quat.to_euler()
    return camera

# For normalization, using the normalize_scene function:
normalize_scene(box_scale=2)  # unit-cube normalization; maybe sphere normalization is required

# Camera setup
cam.data.lens = 30  # 24 is the default for OpenLRM?

Any suggestions would be super helpful.
cc: @cwchenwang
Thanks :D

@mengxuyiGit

> @msingh27's comment above, quoted in full.

Hi, have you found a proper scale to reproduce the results shown in InstantMesh? Thanks!

@msingh27

msingh27 commented Jun 6, 2024

@mengxuyiGit
I think updating these parameters in the Blender script from OpenLRM can generate images that are consistent with the zero123++ 6-view images:

# Camera setup
fov = 49.13
cam.data.lens = 49.13
cam_distance = 0.5 / np.tan(np.radians(fov / 2))

# Cube normalization -> sphere normalization:
# scale = box_scale / max(bbox_max - bbox_min)             # before (cube)
scale = box_scale / np.linalg.norm(bbox_max - bbox_min)    # after (sphere)

# Random normalization
normalize_scene(box_scale=0.8)

Not sure if these params were used by the InstantMesh authors for training. cc: @bluestyle97
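
For anyone checking their renders outside Blender, here is a minimal NumPy sketch of the 6-view camera placement implied by the snippets in this thread; the fov value and the distance formula come from the comments above and are assumptions, not settings confirmed by the InstantMesh authors:

import numpy as np

def zero123pp_camera_positions(fov_deg=49.13):
    # Distance at which an object of radius 0.5 roughly fills the view.
    cam_distance = 0.5 / np.tan(np.radians(fov_deg / 2))
    # Fixed Zero123++ 6-view azimuth/elevation layout (degrees).
    az = np.deg2rad([30, 90, 150, 210, 270, 330])
    el = np.deg2rad([20, -10, 20, -10, 20, -10])
    x = cam_distance * np.cos(el) * np.cos(az)
    y = cam_distance * np.cos(el) * np.sin(az)
    z = cam_distance * np.sin(el)
    return np.stack([x, y, z], axis=-1)  # (6, 3) xyz camera locations

# Example: all six cameras share the same radius (= cam_distance).
positions = zero123pp_camera_positions()
print(np.linalg.norm(positions, axis=-1))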
