What camera intrinsics are used for fine-tuning the gray-bg Zero123Plus? #5

Closed
cwchenwang opened this issue Apr 15, 2024 · 4 comments


@cwchenwang

Thanks for the amazing work. I noticed that after fine-tuning to a white background, the object in the output image appears at a much larger scale:
[image: output-whitebg]
[image: output-ori]

Do you use different intrinsics when fine-tuning? Does a larger-scale output work better for the reconstruction stage?

@bluestyle97
Member

@cwchenwang Hi, we aimed to strictly follow the camera setting of Zero123++ v1.2 (fov=30) during fine-tuning. We asked the authors of Zero123++ about the object normalization and camera distance in this issue. The original answer was that the object should be normalized into a unit cube (it has since been corrected to a unit sphere), which was an unintentional mistake that results in larger objects in the rendered images.

This will not influence the reconstruction results in most cases. However, if the shape of the object is close to a cube, it will occupy a very large region in the generated image, and the reconstruction result will be cropped since it exceeds the [-1, 1] representation range of the triplane. To alleviate this issue temporarily, you can pass a smaller --scale argument to run.py to decrease the size of the reconstructed object. We plan to fix the object normalization issue and provide a new model in the future.
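
For reference, here is a minimal sketch of the difference between the two normalization schemes discussed above, assuming the mesh vertices are an (N, 3) NumPy array; the function name and the box_scale parameter are hypothetical illustrations, not the InstantMesh code:

import numpy as np

def normalization_scale(verts, box_scale=1.0, mode="sphere"):
    # Returns the factor that scales `verts` to the target extent.
    bbox_min, bbox_max = verts.min(axis=0), verts.max(axis=0)
    if mode == "cube":
        # Unintended behavior: the longest bounding-box edge fits the
        # box, so the object fills a unit cube and renders larger.
        return box_scale / max(bbox_max - bbox_min)
    # Corrected behavior: the bounding-box diagonal fits the box, so
    # the object fits inside a sphere and renders smaller.
    return box_scale / np.linalg.norm(bbox_max - bbox_min)

For a cube-shaped object the two factors differ by sqrt(3), which is consistent with the oversized renders reported above.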

@msingh27

msingh27 commented May 24, 2024

@bluestyle97 Thanks a lot for open-sourcing the codebase for fine-tuning the zero123++ models.
I am also facing some issues while rendering the 6 views for a 3D model in Blender.
It would be great if you could tell us more about the creation of the training dataset for the zero123++ models, specifically the camera_distance and the normalization of the 3D model for Blender rendering.
I am modifying the Blender script from here, but for some objects the views don't look like zero123++ outputs (scaling and camera-distance issues).

I am using this for validation views in Blender:

import numpy as np  # runs inside Blender, so mathutils Vectors are available

def set_camera_location_validation(camera, view_i):
    # cam_distance = 0.5 / np.tan(np.radians(30 / 2))  # not sure if this is correct
    cam_distance = 2.0
    # Fixed Zero123++ 6-view layout (degrees).
    azimuths = np.deg2rad([30, 90, 150, 210, 270, 330])
    elevations = np.deg2rad([20, -10, 20, -10, 20, -10])

    # Spherical -> Cartesian camera position.
    x = cam_distance * np.cos(elevations) * np.cos(azimuths)
    y = cam_distance * np.cos(elevations) * np.sin(azimuths)
    z = cam_distance * np.sin(elevations)

    camera.location = x[view_i], y[view_i], z[view_i]

    # Point the camera at the origin (-Z forward, Y up).
    direction = -camera.location
    rot_quat = direction.to_track_quat('-Z', 'Y')
    camera.rotation_euler = rot_quat.to_euler()
    return camera

# For normalization, using the normalize_scene function:
normalize_scene(box_scale=2)  # unit-cube normalization; maybe sphere normalization is required

# Camera setup
cam.data.lens = 30  # 24 is the default for OpenLRM?

Any suggestions would be super helpful.
cc: @cwchenwang
Thanks :D

@mengxuyiGit

> @msingh27's comment above, quoted in full.

Hi, have you found a proper scale to reproduce the results shown in InstantMesh? Thanks!

@msingh27

msingh27 commented Jun 6, 2024

@mengxuyiGit
I think updating these parameters in the Blender script from OpenLRM can generate images that are consistent with the zero123++ 6-view images:

# Camera setup
fov = 49.13
cam.data.lens = 49.13
cam_distance = 0.5 / np.tan(np.radians(fov / 2))

# Cube normalization -> sphere normalization:
# scale = box_scale / max(bbox_max - bbox_min)             # before (cube)
scale = box_scale / np.linalg.norm(bbox_max - bbox_min)    # after (sphere)

# Random normalization
normalize_scene(box_scale=0.8)

Not sure if these params were used by the InstantMesh authors for training. cc: @bluestyle97
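
For anyone checking their renders outside Blender, here is a minimal NumPy sketch of the 6-view camera placement implied by the snippets in this thread; the fov value and the distance formula come from the comments above and are assumptions, not settings confirmed by the InstantMesh authors:

import numpy as np

def zero123pp_camera_positions(fov_deg=49.13):
    # Distance at which an object of radius 0.5 roughly fills the view.
    cam_distance = 0.5 / np.tan(np.radians(fov_deg / 2))
    # Fixed Zero123++ 6-view azimuth/elevation layout (degrees).
    az = np.deg2rad([30, 90, 150, 210, 270, 330])
    el = np.deg2rad([20, -10, 20, -10, 20, -10])
    x = cam_distance * np.cos(el) * np.cos(az)
    y = cam_distance * np.cos(el) * np.sin(az)
    z = cam_distance * np.sin(el)
    return np.stack([x, y, z], axis=-1)  # (6, 3) xyz camera locations

# Example: all six cameras share the same radius (= cam_distance).
positions = zero123pp_camera_positions()
print(np.linalg.norm(positions, axis=-1))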
