fix versionning issues with torch and scipy, fix some doc (#666)
* fix versionning issues with torch and scipy, fix some doc

Signed-off-by: Clement Fuji Tsang <cfujitsang@nvidia.com>

* polish doc

Signed-off-by: Clement Fuji Tsang <cfujitsang@nvidia.com>

Signed-off-by: Clement Fuji Tsang <cfujitsang@nvidia.com>
Caenorst committed Dec 12, 2022
1 parent 609671b commit a111177
Showing 7 changed files with 258 additions and 9 deletions.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -18,6 +18,7 @@ Kaolin library is part of a larger suite of tools for 3D deep learning research.
notes/checkpoints
notes/diff_render
notes/spc_summary
notes/camera_summary

.. toctree::
:titlesonly:
237 changes: 237 additions & 0 deletions docs/notes/camera_summary.rst
@@ -0,0 +1,237 @@
Camera summary
**************

.. _camera_summary:

Camera class
============

.. _camera_class:

:class:`kaolin.render.camera.Camera` is a one-stop class for all camera related differentiable / non-differentiable transformations.
Camera objects are represented by *batched* instances of 2 submodules:

- :ref:`CameraExtrinsics <camera_extrinsics_class>`: The extrinsics properties of the camera (position, orientation).
These are usually embedded in the view matrix, used to transform vertices from world space to camera space.
- :ref:`CameraIntrinsics <camera_intrinsics_class>`: The intrinsic properties of the lens
(such as field of view / focal length in the case of pinhole cameras).
Intrinsic parameters vary between different lens types,
and therefore multiple CameraIntrinsics subclasses exist,
to support different types of cameras: pinhole / perspective, orthographic, fisheye, and so forth.
For pinhole and orthographic lenses, the intrinsics are embedded in a projection matrix.
The intrinsics module can be used to transform vertices from camera space to Normalized Device Coordinates.

.. note::
To avoid tedious invocation of camera functions through
``camera.extrinsics.someop()`` and ``camera.intrinsics.someop()``, kaolin overrides the ``__getattr__``
function to forward any function calls of ``camera.someop()`` to
the appropriate extrinsics / intrinsics submodule.

The entire pipeline of transformations can be summarized as (ignoring homogeneous coordinates)::

    World Space                                        Camera View Space
         V         ---CameraExtrinsics.transform()--->         V'          ---CameraIntrinsics.transform()---
    Shape~(B, 3)              (view matrix)               Shape~(B, 3)                                      |
                                                                                                            |
                                                                          (linear lens: projection matrix)  |
                                                                                + homogeneous -> 3D         |
                                                                                                            V
                                                                           Normalized Device Coordinates (NDC)
                                                                                      Shape~(B, 3)

When using view / projection matrices, conversion to homogeneous coordinates is required.
Alternatively, the `transform()` function takes care of such projections under the hood when needed.
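
As a rough illustration of the pipeline above, the sketch below pushes a single point through a view matrix and a projection matrix in plain Python, including the conversion to and from homogeneous coordinates. It does not use kaolin's API: the helper names (``matvec``, ``perspective``, ``world_to_ndc``) are hypothetical, and the OpenGL-style projection convention is an assumption made for the example, so kaolin's own matrices may differ in layout or sign.

```python
import math


def matvec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


def perspective(fov_y, aspect, near, far):
    """Perspective projection matrix, OpenGL-style convention (an assumption)."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def world_to_ndc(point, view, proj):
    """World space -> camera space -> clip space -> NDC for one 3D point."""
    homogeneous = [point[0], point[1], point[2], 1.0]  # 3D -> homogeneous
    clip = matvec(proj, matvec(view, homogeneous))     # view, then projection
    return [c / clip[3] for c in clip[:3]]             # homogeneous -> 3D


# Hypothetical setup: camera at the origin looking down -Z (identity view matrix).
view = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
proj = perspective(math.radians(90.0), aspect=1.0, near=0.1, far=10.0)
print(world_to_ndc([0.5, 0.25, -2.0], view, proj))
```

With this convention, points on the near plane land at depth -1 and points on the far plane at depth +1, which matches the default [-1, 1] range mentioned later in this page.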

How to apply transformations with kaolin's Camera:
1. Linear camera types, such as the commonly used pinhole camera,
support the :func:`view_projection_matrix()` method.
The returned matrix can be used to transform vertices through pytorch's matrix multiplication, or even be
passed to shaders as a uniform.
2. All Cameras are guaranteed to support a general :func:`transform()` function
which maps coordinates from world space to Normalized Device Coordinates space.
For some lens types which perform non-linear transformations,
the :func:`view_projection_matrix()` is undefined.
Therefore the camera transformation must be applied through
a dedicated function. For linear cameras,
:func:`transform()` may use matrices under the hood.
3. Camera parameters may also be queried directly.
This is useful when implementing camera-parameter-aware code such as ray tracers.

How to control kaolin's Camera:
- :class:`CameraExtrinsics` is packed with useful methods for controlling the camera position and orientation:
:func:`translate() <CameraExtrinsics.translate()>`,
:func:`rotate() <CameraExtrinsics.rotate()>`,
:func:`move_forward() <CameraExtrinsics.move_forward()>`,
:func:`move_up() <CameraExtrinsics.move_up()>`,
:func:`move_right() <CameraExtrinsics.move_right()>`,
:func:`cam_pos() <CameraExtrinsics.cam_pos()>`,
:func:`cam_up() <CameraExtrinsics.cam_up()>`,
:func:`cam_forward() <CameraExtrinsics.cam_forward()>`,
:func:`cam_right() <CameraExtrinsics.cam_right()>`.
- :class:`CameraIntrinsics`: exposes a lens :func:`zoom() <CameraIntrinsics.zoom()>`
operation. The exact functionality depends on the camera type.

How to optimize the Camera parameters:
- Both :class:`CameraExtrinsics` and :class:`CameraIntrinsics` maintain
:class:`torch.Tensor` buffers of parameters which support pytorch differentiable operations.
- Setting ``camera.requires_grad_(True)`` will turn on the optimization mode.
- The :func:`gradient_mask` function can be used to mask out gradients of specific Camera parameters.

.. note::
:class:`CameraExtrinsics` supports multiple representations of camera parameters
(see: :func:`switch_backend <CameraExtrinsics.switch_backend()>`).
Specific representations are better fit for optimization
(e.g.: they maintain an orthogonal view matrix).
Kaolin will automatically switch to using those representations when gradient flow is enabled.
For non-differentiable uses, the default representation may provide better
speed and numerical accuracy.

Other useful camera properties:
- Cameras follow pytorch in part, and support arbitrary ``dtype`` and ``device`` types through the
:func:`to()`, :func:`cpu()`, :func:`cuda()`, :func:`half()`, :func:`float()`, :func:`double()`
methods and :func:`dtype`, :func:`device` properties.
- :class:`CameraExtrinsics` and :class:`CameraIntrinsics` individually support the :func:`requires_grad`
property.
- Cameras implement :func:`torch.allclose` for comparing camera parameters under controlled numerical accuracy.
The operator ``==`` is reserved for comparison by reference.
- Cameras support batching, either through construction, or through the :func:`cat()` method.

.. note::
Since kaolin's cameras are batched, the view/projection matrices are of shapes :math:`(\text{num_cameras}, 4, 4)`,
and some operations, such as :func:`transform()`, may return values of shape :math:`(\text{num_cameras}, \text{num_vectors}, 3)`.

Concluding remarks on coordinate systems and other confusing conventions:
- kaolin's Cameras assume column major matrices, for example, the inverse view matrix (cam2world) is defined as:

.. math::
\begin{bmatrix}
r1 & u1 & f1 & px \\
r2 & u2 & f2 & py \\
r3 & u3 & f3 & pz \\
0 & 0 & 0 & 1
\end{bmatrix}
This sometimes causes confusion as the view matrix (world2cam) uses a transposed 3x3 submatrix component,
which despite this transposition is still column major (observed through the last `t` column):

.. math::
\begin{bmatrix}
r1 & r2 & r3 & tx \\
u1 & u2 & u3 & ty \\
f1 & f2 & f3 & tz \\
0 & 0 & 0 & 1
\end{bmatrix}
- kaolin's cameras do not assume any specific coordinate system for the camera axes. By default, the
right handed cartesian coordinate system is used. Other coordinate systems are supported through
:func:`change_coordinate_system() <CameraExtrinsics.change_coordinate_system()>`
and the ``coordinates.py`` module::

        Y
        ^
        |
        |---------> X
       /
      Z

- kaolin's NDC space is assumed to be left handed (depth goes inwards to the screen).
  The default range of values is [-1, 1].
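
The relation between the two matrices shown above (cam2world and world2cam) can be sketched in plain Python. For a rigid transform, the inverse has the transposed 3x3 block, and its translation column is ``t = -R^T p``. The function name ``invert_rigid`` and the example camera axes are hypothetical and chosen only for illustration; this is not kaolin's implementation.

```python
def invert_rigid(cam2world):
    """Invert a 4x4 rigid transform [R | p; 0 | 1], given as a list of rows."""
    r = [row[:3] for row in cam2world[:3]]                  # 3x3 block, columns are r, u, f
    p = [row[3] for row in cam2world[:3]]                   # camera position
    r_t = [[r[j][i] for j in range(3)] for i in range(3)]   # transpose: rows are r, u, f
    t = [-sum(r_t[i][j] * p[j] for j in range(3)) for i in range(3)]  # t = -R^T p
    return [
        r_t[0] + [t[0]],
        r_t[1] + [t[1]],
        r_t[2] + [t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


# Hypothetical camera: right = (0, 0, -1), up = (0, 1, 0), forward = (1, 0, 0),
# positioned at (2, 3, 4). The columns of cam2world are r, u, f, p.
cam2world = [
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [-1.0, 0.0, 0.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
world2cam = invert_rigid(cam2world)
for row in world2cam:
    print(row)
```

In the result, the camera axes appear as rows rather than columns, while the translation still occupies the last column, which is the source of the confusion described above.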

CameraExtrinsics class
======================

.. _camera_extrinsics_class:

:class:`kaolin.render.camera.CameraExtrinsics` holds the extrinsics parameters of a camera: position and orientation in space.

This class maintains the view matrix of the camera, used to transform points from world coordinates
to camera / eye / view space coordinates.

The view matrix maintained by this class is column-major, and can be described by the 4x4 block matrix:

.. math::
\begin{bmatrix}
R & t \\
0 & 1
\end{bmatrix}
where **R** is a 3x3 rotation matrix and **t** is a 3x1 translation vector for the orientation and position
respectively.

This class is batched and may hold information from multiple cameras.

:class:`CameraExtrinsics` relies on a dynamic representation backend to manage the tradeoff between various choices
such as speed, or support for differentiable rigid transformations.
Parameters are stored as a single tensor of shape :math:`(\text{num_cameras}, K)`,
where K is a representation specific number of parameters.
Transformations and matrices returned by this class support differentiable torch operations,
which in turn may update the extrinsic parameters of the camera::

convert_to_mat
Backend ---- > Extrinsics
Representation R View Matrix M
Shape (num_cameras, K), Shape (num_cameras, 4, 4)
< ----
convert_from_mat

.. note::

Unless specified manually with :func:`switch_backend`,
kaolin will choose the optimal representation backend depending on the status of ``requires_grad``.
.. note::

Users should be aware, but not concerned about the conversion from internal representations to view matrices.
kaolin performs these conversions where and if needed.

Supported backends:

- **"matrix_se3"**\: A flattened view matrix representation, containing the full information of
special euclidean transformations (translations and rotations).
This representation is quickly converted to a view matrix, but differentiable ops may cause
the view matrix to learn an incorrect, non-orthogonal transformation.
- **"matrix_6dof_rotation"**\: A compact representation with 6 degrees of freedom, ensuring the view matrix
remains orthogonal under optimizations. The conversion to matrix requires a single Gram-Schmidt step.

.. seealso::

`On the Continuity of Rotation Representations in Neural Networks, Zhou et al. 2019
<https://arxiv.org/abs/1812.07035>`_
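
To make the single Gram-Schmidt step mentioned for the 6-DoF representation concrete, here is a minimal pure-Python sketch in the spirit of Zhou et al.: two unconstrained 3-vectors are turned into an orthonormal, right-handed basis. The function names are hypothetical, and how kaolin lays these vectors out inside the view matrix (rows vs. columns) is not shown here; treat it as an assumption-laden illustration rather than the library's code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def rotation_from_6d(a1, a2):
    """Map two arbitrary (non-parallel) 3-vectors to an orthonormal basis."""
    b1 = normalize(a1)
    # Gram-Schmidt step: remove the component of a2 along b1, then normalize.
    along = dot(b1, a2)
    b2 = normalize([y - along * x for x, y in zip(b1, a2)])
    # The third axis is fully determined by the first two.
    b3 = cross(b1, b2)
    return b1, b2, b3


b1, b2, b3 = rotation_from_6d([1.0, 2.0, 3.0], [-1.0, 0.5, 2.0])
print(b1, b2, b3)
```

Because any six numbers (with non-parallel halves) map to a valid rotation, gradient updates on the six parameters cannot drift away from orthogonality, unlike updates applied directly to the nine matrix entries.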

Unless stated explicitly, the definition of the camera coordinate system used by this class is up to the
choice of the user.
Practitioners should be mindful of conventions when pairing the view matrix managed by this class with a projection
matrix.

CameraIntrinsics class
======================

.. _camera_intrinsics_class:

:class:`kaolin.render.camera.CameraIntrinsics` holds the intrinsics parameters of a camera:
how it should project from camera space to normalized screen / clip space.

The intrinsics are determined by the camera type, meaning parameters may differ according to the lens structure.
Typical computer graphics systems commonly assume the intrinsics of a pinhole camera (see: :class:`PinholeIntrinsics` class).
One implication is that some camera types do not use a linear projection (e.g. fisheye lenses).

There are therefore numerous ways to use CameraIntrinsics subclasses:

1. Access intrinsics parameters directly.
This may typically benefit use cases such as ray generators.
2. The :func:`transform()` method is supported by all CameraIntrinsics subclasses,
both linear and non-linear transformations, to project vectors from camera space to normalized screen space.
This method is implemented using differentiable PyTorch operations.
3. Certain CameraIntrinsics subclasses which perform linear projections may expose the transformation matrix
via dedicated methods.
For example, :class:`PinholeIntrinsics` exposes a :func:`projection_matrix()` method.
This may typically be useful for rasterization-based rendering pipelines (e.g. OpenGL vertex shaders).

This class is batched and may hold information from multiple cameras.
Parameters are stored as a single tensor of shape :math:`(\text{num_cameras}, K)` where K is the number of
intrinsic parameters.

Currently there are two subclasses of intrinsics: :class:`kaolin.render.camera.OrthographicIntrinsics` and
:class:`kaolin.render.camera.PinholeIntrinsics`.

API Documentation:
------------------

* Check all the camera classes and functions at the :ref:`API documentation<kaolin.render.camera>`.

2 changes: 1 addition & 1 deletion docs/notes/tutorial_index.rst
@@ -82,7 +82,7 @@ Simple Recipes
* `spc_trilinear_interp.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/spc/spc_trilinear_interp.py>`_: computing trilinear interpolation of a point cloud on an SPC
* Visualization:
* `visualize_main.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/tutorial/visualize_main.py>`_: using Timelapse API to write mock 3D checkpoints
- * `fast_mesh_sampling.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/preprocess/fast_mesh_sampling.py>_`: Using CachedDataset to preprocess a ShapeNet dataset we can sample point clouds efficiently at runtime
+ * `fast_mesh_sampling.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/preprocess/fast_mesh_sampling.py>`_: Using CachedDataset to preprocess a ShapeNet dataset we can sample point clouds efficiently at runtime
* Camera:
* `cameras_differentiable.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/camera/cameras_differentiable.py>`_: optimize a camera position
* `camera_transforms.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/camera/camera_transforms.py>`_: using :func:`Camera.transform()` function
12 changes: 9 additions & 3 deletions kaolin/metrics/trianglemesh.py
@@ -18,10 +18,16 @@
from ..ops.mesh import uniform_laplacian

def point_to_mesh_distance(pointclouds, face_vertices):
- r"""Computes the distances from pointclouds to meshes (represented by vertices and faces.)
+ r"""Computes the distances from pointclouds to meshes (represented by vertices and faces).
For each point in the pointcloud, it finds the nearest triangle
in the mesh, and calculated its distance to that triangle.
.. note::
The calculated distance is the squared euclidean distance.
Type 0 indicates the distance is from a point on the surface of the triangle.
Type 1 to 3 indicates the distance is from a point to a vertices.
@@ -33,7 +39,7 @@ def point_to_mesh_distance(pointclouds, face_vertices):
pointclouds, of shape :math:`(\text{batch_size}, \text{num_points}, 3)`.
face_vertices (torch.Tensor):
vertices of each face of meshes,
- of shape :math:`(\text{batch_size}, \text{num_faces}, 3, 3})`.
+ of shape :math:`(\text{batch_size}, \text{num_faces}, 3, 3)`.
Returns:
(torch.Tensor, torch.LongTensor, torch.IntTensor):
@@ -147,7 +153,7 @@ def _unbatched_naive_point_to_mesh_distance(points, face_vertices):
Args:
points (torch.Tensor): of shape (num_points, 3).
- faces_vertices (torch.LongTensor): of shape (num_faces, 3, 3).
+ face_vertices (torch.LongTensor): of shape (num_faces, 3, 3).
Returns:
(torch.Tensor, torch.LongTensor, torch.IntTensor):
4 changes: 2 additions & 2 deletions kaolin/ops/conversions/tetmesh.py
@@ -121,8 +121,8 @@ def marching_tetrahedra(vertices, tets, sdf, return_tet_idx=False):
Args:
vertices (torch.tensor): batched vertices of tetrahedral meshes, of shape
:math:`(\text{batch_size}, \text{num_vertices}, 3)`.
- faces (torch.tensor): unbatched tetrahedral mesh topology, of shape
- :math:`(\text{num_tetrahedrons}, 4)`.
+ tets (torch.tensor): unbatched tetrahedral mesh topology, of shape
+ :math:`(\text{num_tetrahedrons}, 4)`.
sdf (torch.tensor): batched SDFs which specify the SDF value of each vertex, of shape
:math:`(\text{batch_size}, \text{num_vertices})`.
return_tet_idx (optional, bool): if True, return index of tetrahedron
4 changes: 3 additions & 1 deletion kaolin/ops/spc/spc.py
@@ -268,7 +268,9 @@ def unbatched_query(octree, exsum, query_coords, level, with_parents=False):
to only a single level (default: False).
Returns:
- pidx (torch.LongTensor): The indices into the point hierarchy of shape :math:`(\text{num_query})`.
+ pidx (torch.LongTensor):
+ The indices into the point hierarchy of shape :math:`(\text{num_query})`.
+ If with_parents is True, then the shape will be :math:`(\text{num_query, level+1})`.
Examples:
7 changes: 5 additions & 2 deletions setup.py
@@ -29,7 +29,7 @@
)
else:
import torch
- torch_ver = parse_version(torch.__version__)
+ torch_ver = parse_version(parse_version(torch.__version__).base_version)
if (torch_ver < parse_version(TORCH_MIN_VER) or
torch_ver > parse_version(TORCH_MAX_VER)):
if IGNORE_TORCH_VER:
@@ -178,9 +178,12 @@ def write_version_file():

def get_requirements():
requirements = []
- requirements.append('scipy>=1.2.0,<=1.7.2')
requirements.append('Pillow>=8.0.0')
requirements.append('tqdm>=4.51.0')
+ if sys.version_info < (3, 8):
+     requirements.append('scipy>=1.2.0,<=1.7.3')
+ else:
+     requirements.append('scipy>=1.2.0')
if sys.version_info >= (3, 10):
warnings.warn("usd-core is not compatible with python_version >= 3.10 "
"and won't be installed, please use supported python_version "
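
The two ``setup.py`` changes above can be illustrated with a small standalone sketch. Under PEP 440, a version with a local suffix such as ``1.13.0+cu117`` sorts after plain ``1.13.0``, so comparing the raw torch version against a maximum supported version can reject a build that is actually supported; comparing only the base version avoids that. The sketch uses a regular expression from the standard library as a stand-in for ``parse_version(...).base_version``, and the helper names and the ``TORCH_MAX_VER`` value are hypothetical.

```python
import re
import sys


def base_version(version):
    """Return the leading numeric release segment, e.g. '1.13.0+cu117' -> '1.13.0'.

    Stdlib approximation of the `base_version` attribute used in setup.py.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match.group(0)


def version_tuple(version):
    """Turn a version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in base_version(version).split("."))


def scipy_requirement(version_info=sys.version_info):
    """Pick the scipy constraint by Python version, mirroring the diff above."""
    if tuple(version_info[:2]) < (3, 8):
        return "scipy>=1.2.0,<=1.7.3"
    return "scipy>=1.2.0"


TORCH_MAX_VER = "1.13.0"  # hypothetical value, for illustration only
installed = "1.13.0+cu117"
print(version_tuple(installed) <= version_tuple(TORCH_MAX_VER))
print(scipy_requirement())
```

Note that plain tuple comparison treats ``(1, 13)`` and ``(1, 13, 0)`` as different, so a real implementation is better served by a proper version parser, as the commit does.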
