fix versionning issues with torch and scipy, fix some doc (#666)

* fix versionning issues with torch and scipy, fix some doc

  Signed-off-by: Clement Fuji Tsang <cfujitsang@nvidia.com>

* polish doc

  Signed-off-by: Clement Fuji Tsang <cfujitsang@nvidia.com>

Signed-off-by: Clement Fuji Tsang <cfujitsang@nvidia.com>
NVIDIAGameWorks · Dec 12, 2022 · a111177
1 parent 609671b
commit a111177

Showing 7 changed files with 258 additions and 9 deletions.
diff --git a/docs/index.rst b/docs/index.rst
@@ -18,6 +18,7 @@ Kaolin library is part of a larger suite of tools for 3D deep learning research.
    notes/checkpoints
    notes/diff_render
    notes/spc_summary
+   notes/camera_summary
 
 .. toctree::
    :titlesonly:

diff --git a/docs/notes/camera_summary.rst b/docs/notes/camera_summary.rst
@@ -0,0 +1,237 @@
+Camera summary
+**************
+
+.. _camera_summary:
+
+Camera class
+============
+
+.. _camera_class:
+
+:class:`kaolin.render.camera.Camera` is a one-stop class for all camera related differentiable / non-differentiable transformations.
+Camera objects are represented by *batched* instances of 2 submodules:
+
+    - :ref:`CameraExtrinsics <camera_extrinsics_class>`: The extrinsic properties of the camera (position, orientation).
+      These are usually embedded in the view matrix, used to transform vertices from world space to camera space.
+    - :ref:`CameraIntrinsics <camera_intrinsics_class>`: The intrinsic properties of the lens
+      (such as field of view / focal length in the case of pinhole cameras).
+      Intrinsic parameters vary between different lens types,
+      and therefore multiple CameraIntrinsics subclasses exist,
+      to support different types of cameras: pinhole / perspective, orthographic, fisheye, and so forth.
+      For pinhole and orthographic lenses, the intrinsics are embedded in a projection matrix.
+      The intrinsics module can be used to transform vertices from camera space to Normalized Device Coordinates.
+
+.. note::
+    To avoid tedious invocation of camera functions through
+    ``camera.extrinsics.someop()`` and ``camera.intrinsics.someop()``, kaolin overrides the ``__get_attributes__``
+    function to forward any function calls of ``camera.someop()`` to
+    the appropriate extrinsics / intrinsics submodule.
+
+The entire pipeline of transformations can be summarized as (ignoring homogeneous coordinates)::
+
+    World Space                                         Camera View Space
+         V         ---CameraExtrinsics.transform()--->         V'          ---CameraIntrinsics.transform()---
+    Shape~(B, 3)            (view matrix)                  Shape~(B, 3)                                     |
+                                                                                                            |
+                                                                           (linear lens: projection matrix) |
+                                                                                  + homogeneous -> 3D       |
+                                                                                                            V
+                                                                                 Normalized Device Coordinates (NDC)
+                                                                                            Shape~(B, 3)
+    When using view / projection matrices, conversion to homogeneous coordinates is required.
+    Alternatively, the `transform()` function takes care of such projections under the hood when needed.
+
+How to apply transformations with kaolin's Camera:
+    1. Linear camera types, such as the commonly used pinhole camera,
+       support the :func:`view_projection_matrix()` method.
+       The returned matrix can be used to transform vertices through pytorch's matrix multiplication, or even be
+       passed to shaders as a uniform.
+    2. All Cameras are guaranteed to support a general :func:`transform()` function
+       which maps coordinates from world space to Normalized Device Coordinates space.
+       For some lens types which perform non-linear transformations,
+       the :func:`view_projection_matrix()` is undefined.
+       Therefore the camera transformation must be applied through
+       a dedicated function. For linear cameras,
+       :func:`transform()` may use matrices under the hood.
+    3. Camera parameters may also be queried directly.
+       This is useful when implementing camera params aware code such as ray tracers.
+How to control kaolin's Camera:
+    - :class:`CameraExtrinsics`: is packed with useful methods for controlling the camera position and orientation:
+      :func:`translate() <CameraExtrinsics.translate()>`,
+      :func:`rotate() <CameraExtrinsics.rotate()>`,
+      :func:`move_forward() <CameraExtrinsics.move_forward()>`,
+      :func:`move_up() <CameraExtrinsics.move_up()>`,
+      :func:`move_right() <CameraExtrinsics.move_right()>`,
+      :func:`cam_pos() <CameraExtrinsics.cam_pos()>`,
+      :func:`cam_up() <CameraExtrinsics.cam_up()>`,
+      :func:`cam_forward() <CameraExtrinsics.cam_forward()>`,
+      :func:`cam_right() <CameraExtrinsics.cam_right()>`.
+    - :class:`CameraIntrinsics`: exposes a lens :func:`zoom() <CameraIntrinsics.zoom()>`
+      operation. The exact functionality depends on the camera type.
+How to optimize the Camera parameters:
+    - Both :class:`CameraExtrinsics` and :class:`CameraIntrinsics` maintain
+      :class:`torch.Tensor` buffers of parameters which support pytorch differentiable operations.
+    - Setting ``camera.requires_grad_(True)`` will turn on the optimization mode.
+    - The :func:`gradient_mask` function can be used to mask out gradients of specific Camera parameters.
+
+    .. note::
+        :class:`CameraExtrinsics` supports multiple representations of camera parameters
+        (see: :func:`switch_backend <CameraExtrinsics.switch_backend()>`).
+        Specific representations are better fit for optimization
+        (e.g.: they maintain an orthogonal view matrix).
+        Kaolin will automatically switch to using those representations when gradient flow is enabled.
+        For non-differentiable uses, the default representation may provide better
+        speed and numerical accuracy.
+
+Other useful camera properties:
+    - Cameras follow pytorch in part, and support arbitrary ``dtype`` and ``device`` types through the
+      :func:`to()`, :func:`cpu()`, :func:`cuda()`, :func:`half()`, :func:`float()`, :func:`double()`
+      methods and :func:`dtype`, :func:`device` properties.
+    - :class:`CameraExtrinsics` and :class:`CameraIntrinsics` individually support the :func:`requires_grad`
+      property.
+    - Cameras implement :func:`torch.allclose` for comparing camera parameters under controlled numerical accuracy.
+      The operator ``==`` is reserved for comparison by reference.
+    - Cameras support batching, either through construction, or through the :func:`cat()` method.
+
+    .. note::
+        Since kaolin's cameras are batched, the view/projection matrices are of shapes :math:`(\text{num_cameras}, 4, 4)`,
+        and some operations, such as :func:`transform()`, may return values of shape :math:`(\text{num_cameras}, \text{num_vectors}, 3)`.
+
+Concluding remarks on coordinate systems and other confusing conventions:
+    - kaolin's Cameras assume column major matrices, for example, the inverse view matrix (cam2world) is defined as:
+
+      .. math::
+          \begin{bmatrix}
+              r1 & u1 & f1 & px \\
+              r2 & u2 & f2 & py \\
+              r3 & u3 & f3 & pz \\
+              0 & 0 & 0 & 1
+          \end{bmatrix}
+
+      This sometimes causes confusion as the view matrix (world2cam) uses a transposed 3x3 submatrix component,
+      which despite this transposition is still column major (observed through the last `t` column):
+
+      .. math::
+          \begin{bmatrix}
+              r1 & r2 & r3 & tx \\
+              u1 & u2 & u3 & ty \\
+              f1 & f2 & f3 & tz \\
+              0 & 0 & 0 & 1
+          \end{bmatrix}
+
+    - kaolin's cameras do not assume any specific coordinate system for the camera axes. By default, the
+      right handed cartesian coordinate system is used. Other coordinate systems are supported through
+      :func:`change_coordinate_system() <CameraExtrinsics.change_coordinate_system()>`
+      and the ``coordinates.py`` module::
+
+            Y
+            ^
+            |
+            |---------> X
+           /
+         Z
+
+    - kaolin's NDC space is assumed to be left handed (depth goes inwards to the screen).
+      The default range of values is [-1, 1].
+
+CameraExtrinsics class
+======================
+
+.. _camera_extrinsics_class:
+
+    :class:`kaolin.render.camera.CameraExtrinsics` holds the extrinsics parameters of a camera: position and orientation in space.
+
+    This class maintains the view matrix of the camera, used to transform points from world coordinates
+    to camera / eye / view space coordinates.
+
+    The view matrix maintained by this class is column-major, and can be described by the 4x4 block matrix:
+
+    .. math::
+
+        \begin{bmatrix}
+            R & t \\
+            0 & 1
+        \end{bmatrix}
+
+    where **R** is a 3x3 rotation matrix and **t** is a 3x1 translation vector for the orientation and position
+    respectively.
+
+    This class is batched and may hold information from multiple cameras.
+
+    :class:`CameraExtrinsics` relies on a dynamic representation backend to manage the tradeoff between various choices
+    such as speed, or support for differentiable rigid transformations.
+    Parameters are stored as a single tensor of shape :math:`(\text{num_cameras}, K)`,
+    where K is a representation specific number of parameters.
+    Transformations and matrices returned by this class support differentiable torch operations,
+    which in turn may update the extrinsic parameters of the camera::
+
+                                 convert_to_mat
+            Backend                 ---- >            Extrinsics
+        Representation R                             View Matrix M
+        Shape (num_cameras, K),                    Shape (num_cameras, 4, 4)
+                                    < ----
+                                 convert_from_mat
+
+    .. note::
+
+        Unless specified manually with :func:`switch_backend`,
+        kaolin will choose the optimal representation backend depending on the status of ``requires_grad``.
+    .. note::
+
+        Users should be aware, but not concerned about the conversion from internal representations to view matrices.
+        kaolin performs these conversions where and if needed.
+
+    Supported backends:
+
+        - **"matrix_se3"**\: A flattened view matrix representation, containing the full information of
+          special euclidean transformations (translations and rotations).
+          This representation is quickly converted to a view matrix, but differentiable ops may cause
+          the view matrix to learn an incorrect, non-orthogonal transformation.
+        - **"matrix_6dof_rotation"**\: A compact representation with 6 degrees of freedom, ensuring the view matrix
+          remains orthogonal under optimizations. The conversion to matrix requires a single Gram-Schmidt step.
+
+        .. seealso::
+
+            `On the Continuity of Rotation Representations in Neural Networks, Zhou et al. 2019
+            <https://arxiv.org/abs/1812.07035>`_
+
+    Unless stated explicitly, the definition of the camera coordinate system used by this class is up to the
+    choice of the user.
+    Practitioners should be mindful of conventions when pairing the view matrix managed by this class with a projection
+    matrix.
+
+CameraIntrinsics class
+======================
+
+.. _camera_intrinsics_class:
+
+    :class:`kaolin.render.camera.CameraIntrinsics` holds the intrinsics parameters of a camera:
+    how it should project from camera space to normalized screen / clip space.
+
+    The intrinsics are determined by the camera type, meaning parameters may differ according to the lens structure.
+    Typical computer graphics systems commonly assume the intrinsics of a pinhole camera (see: :class:`PinholeIntrinsics` class).
+    Note that some camera types do not use a linear projection (e.g. fisheye lenses).
+
+    There are therefore numerous ways to use CameraIntrinsics subclasses:
+
+        1. Access intrinsics parameters directly.
+        This may typically benefit use cases such as ray generators.
+        2. The :func:`transform()` method is supported by all CameraIntrinsics subclasses,
+        both linear and non-linear transformations, to project vectors from camera space to normalized screen space.
+        This method is implemented using differentiable pytorch operations.
+        3. Certain CameraIntrinsics subclasses which perform linear projections, may expose the transformation matrix
+        via dedicated methods.
+        For example, :class:`PinholeIntrinsics` exposes a :func:`projection_matrix()` method.
+        This may typically be useful for rasterization based rendering pipelines (e.g. OpenGL vertex shaders).
+
+    This class is batched and may hold information from multiple cameras.
+    Parameters are stored as a single tensor of shape :math:`(\text{num_cameras}, K)` where K is the number of
+    intrinsic parameters.
+
+    Currently there are two subclasses of intrinsics: :class:`kaolin.render.camera.OrthographicIntrinsics` and
+    :class:`kaolin.render.camera.PinholeIntrinsics`.
+
+API Documentation:
+------------------
+
+* Check all the camera classes and functions at the :ref:`API documentation<kaolin.render.camera>`.
+
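Note on docs/notes/camera_summary.rst: the relation between the inverse view matrix (cam2world) and the view matrix (world2cam) described in the new page can be illustrated with a small stdlib-only sketch. This is not kaolin code; the helper names `invert_rigid` and `transform_point` are hypothetical, and the example assumes a rigid transform with an orthonormal rotation block, so that the inverse is the transposed rotation with translation t = -R^T p.

```python
def invert_rigid(cam2world):
    """Invert a 4x4 rigid transform given as nested lists (column-vector convention)."""
    # Columns of the 3x3 block hold the camera right / up / forward axes.
    rot = [row[:3] for row in cam2world[:3]]
    pos = [row[3] for row in cam2world[:3]]
    # Transposing the rotation block moves the axes into the rows.
    rot_t = [[rot[j][i] for j in range(3)] for i in range(3)]
    # The translation column of the inverse is t = -R^T p.
    t = [-sum(rot_t[i][j] * pos[j] for j in range(3)) for i in range(3)]
    view = [rot_t[i] + [t[i]] for i in range(3)]
    view.append([0.0, 0.0, 0.0, 1.0])
    return view


def transform_point(matrix, point):
    """Apply a 4x4 matrix to a 3D point through homogeneous coordinates."""
    hom = list(point) + [1.0]
    out = [sum(matrix[i][j] * hom[j] for j in range(4)) for i in range(4)]
    return [out[i] / out[3] for i in range(3)]


# Illustrative camera at (1, 2, 3) with right = (0, 0, -1), up = (0, 1, 0), forward = (1, 0, 0).
cam2world = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [-1.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
world2cam = invert_rigid(cam2world)
cam_pos_in_view = transform_point(world2cam, [1.0, 2.0, 3.0])
```

The camera position maps to the origin of camera space, and the rows of `world2cam` now hold the right / up / forward axes, matching the transposed layout shown in the page.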
diff --git a/docs/notes/tutorial_index.rst b/docs/notes/tutorial_index.rst
@@ -82,7 +82,7 @@ Simple Recipes
     * `spc_trilinear_interp.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/spc/spc_trilinear_interp.py>`_: computing trilinear interpolation of a point cloud on an SPC
 * Visualization:
     * `visualize_main.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/tutorial/visualize_main.py>`_: using Timelapse API to write mock 3D checkpoints
-    * `fast_mesh_sampling.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/preprocess/fast_mesh_sampling.py>_`: Using CachedDataset to preprocess a ShapeNet dataset we can sample point clouds efficiently at runtime
+    * `fast_mesh_sampling.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/preprocess/fast_mesh_sampling.py>`_: Using CachedDataset to preprocess a ShapeNet dataset we can sample point clouds efficiently at runtime
 * Camera:
     * `cameras_differentiable.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/camera/cameras_differentiable.py>`_: optimize a camera position
     * `camera_transforms.py <https://github.com/NVIDIAGameWorks/kaolin/blob/master/examples/recipes/camera/camera_transforms.py>`_: using :func:`Camera.transform()` function

diff --git a/kaolin/metrics/trianglemesh.py b/kaolin/metrics/trianglemesh.py
@@ -18,10 +18,16 @@
 from ..ops.mesh import uniform_laplacian
 
 def point_to_mesh_distance(pointclouds, face_vertices):
-    r"""Computes the distances from pointclouds to meshes (represented by vertices and faces.)
+    r"""Computes the distances from pointclouds to meshes (represented by vertices and faces).
+
     For each point in the pointcloud, it finds the nearest triangle
     in the mesh, and calculated its distance to that triangle.
 
+    .. note::
+
+        The calculated distance is the squared euclidean distance.
+        
+
     Type 0 indicates the distance is from a point on the surface of the triangle.
 
     Type 1 to 3 indicates the distance is from a point to a vertices.
@@ -33,7 +39,7 @@ def point_to_mesh_distance(pointclouds, face_vertices):
             pointclouds, of shape :math:`(\text{batch_size}, \text{num_points}, 3)`.
         face_vertices (torch.Tensor):
             vertices of each face of meshes,
-            of shape :math:`(\text{batch_size}, \text{num_faces}, 3, 3})`.
+            of shape :math:`(\text{batch_size}, \text{num_faces}, 3, 3)`.
 
     Returns:
         (torch.Tensor, torch.LongTensor, torch.IntTensor):
@@ -147,7 +153,7 @@ def _unbatched_naive_point_to_mesh_distance(points, face_vertices):
 
     Args:
         points (torch.Tensor): of shape (num_points, 3).
-        faces_vertices (torch.LongTensor): of shape (num_faces, 3, 3).
+        face_vertices (torch.LongTensor): of shape (num_faces, 3, 3).
 
     Returns:
         (torch.Tensor, torch.LongTensor, torch.IntTensor):

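Note on the new docstring remark in kaolin/metrics/trianglemesh.py: since the documented value is a squared euclidean distance, callers who need a metric distance would take the square root themselves. A minimal stdlib sketch of that relationship follows; it does not call kaolin, and `squared_euclidean` and the sample points are hypothetical, with the closest point on the triangle assumed to be already known.

```python
import math


def squared_euclidean(point_a, point_b):
    """Sum of squared coordinate differences between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(point_a, point_b))


# Hypothetical query point and its (assumed known) closest point on a triangle.
query_point = (0.25, 0.25, 2.0)
closest_on_triangle = (0.25, 0.25, 0.0)

squared = squared_euclidean(query_point, closest_on_triangle)
# The metric distance is the square root of the squared value.
distance = math.sqrt(squared)
```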
diff --git a/kaolin/ops/conversions/tetmesh.py b/kaolin/ops/conversions/tetmesh.py
@@ -121,8 +121,8 @@ def marching_tetrahedra(vertices, tets, sdf, return_tet_idx=False):
     Args:
         vertices (torch.tensor): batched vertices of tetrahedral meshes, of shape
                                  :math:`(\text{batch_size}, \text{num_vertices}, 3)`.
-        faces (torch.tensor): unbatched tetrahedral mesh topology, of shape
-                              :math:`(\text{num_tetrahedrons}, 4)`.
+        tets (torch.tensor): unbatched tetrahedral mesh topology, of shape
+                             :math:`(\text{num_tetrahedrons}, 4)`.
         sdf (torch.tensor): batched SDFs which specify the SDF value of each vertex, of shape
                             :math:`(\text{batch_size}, \text{num_vertices})`.
         return_tet_idx (optional, bool): if True, return index of tetrahedron

diff --git a/kaolin/ops/spc/spc.py b/kaolin/ops/spc/spc.py
@@ -268,7 +268,9 @@ def unbatched_query(octree, exsum, query_coords, level, with_parents=False):
                              to only a single level (default: False).
 
     Returns:
-        pidx (torch.LongTensor): The indices into the point hierarchy of shape :math:`(\text{num_query})`.
+        pidx (torch.LongTensor):
+
+            The indices into the point hierarchy of shape :math:`(\text{num_query})`.
             If with_parents is True, then the shape will be :math:`(\text{num_query, level+1})`.
 
     Examples:

diff --git a/setup.py b/setup.py
@@ -29,7 +29,7 @@
     )
 else:
     import torch
-    torch_ver = parse_version(torch.__version__)
+    torch_ver = parse_version(parse_version(torch.__version__).base_version)
     if (torch_ver < parse_version(TORCH_MIN_VER) or
         torch_ver > parse_version(TORCH_MAX_VER)):
         if IGNORE_TORCH_VER:
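Note on the hunk above: under PEP 440 ordering, a local version such as `1.13.0+cu117` sorts after `1.13.0`, so comparing the raw version against an upper bound can reject a build whose release number equals the bound. Reducing the version to its base form before comparing avoids this, which is presumably the motivation for the change. The sketch below approximates that reduction with the standard library only; it is not the `packaging` or `pkg_resources` implementation (it ignores epochs, for instance), and `base_version`, `release_tuple` and the example bounds are hypothetical.

```python
import re


def base_version(version):
    """Approximate the release-only part of a PEP 440 style version string."""
    # Keep only the leading dotted-integer release segment, dropping any
    # pre-release (a0, rc1), dev, post or local (+cu117) suffix.
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match.group(0)


def release_tuple(version):
    """Turn '1.13.0+cu117' into (1, 13, 0) for simple tuple comparison."""
    return tuple(int(part) for part in base_version(version).split("."))


# Hypothetical bounds, for illustration only (not kaolin's actual values).
EXAMPLE_MIN = (1, 8, 0)
EXAMPLE_MAX = (1, 13, 0)

local_build = release_tuple("1.13.0+cu117")
in_range = EXAMPLE_MIN <= local_build <= EXAMPLE_MAX
```

With the suffix stripped, the CUDA-tagged build compares equal to the upper bound and is accepted.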
@@ -178,9 +178,12 @@ def write_version_file():
 
 def get_requirements():
     requirements = []
-    requirements.append('scipy>=1.2.0,<=1.7.2')
     requirements.append('Pillow>=8.0.0')
     requirements.append('tqdm>=4.51.0')
+    if sys.version_info < (3, 8):
+        requirements.append('scipy>=1.2.0,<=1.7.3')
+    else:
+        requirements.append('scipy>=1.2.0')
     if sys.version_info >= (3, 10):
         warnings.warn("usd-core is not compatible with python_version >= 3.10 "
                       "and won't be installed, please use supported python_version "