# 3D geometric feature analysis

This module provides an overview of the most common 3D geometric features: how they are built and what they look like on canonical point clouds.

## Module imports

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import numpy as np
from scipy import linalg
import pandas as pd
from sklearn.decomposition import PCA

In [3]:
import plotly.graph_objects as go
import plotly.express as px

In [4]:
from geo3dfeatures.fixtures import line, plane, sphere, wall

In [5]:
pd.options.display.float_format = "{:.3f}".format

In [6]:
SEED = 1337
np.random.seed(SEED)

## Plotting utilities

Many plots are drawn in the following cells. To keep them consistent and readable, we first define some viewing parameters and helper functions.

First, the height and width of the Plotly graphs:

In [7]:
HEIGHT = 400
WIDTH = 425

Then the colors:

In [8]:
COLORS = px.colors.qualitative.Plotly
VECTOR_COLORS = [px.colors.qualitative.Bold[0], px.colors.qualitative.Bold[1], px.colors.qualitative.Bold[4]]

And third, a set of functions that build the traces and control the plot layouts:

In [9]:
def default_layout():
    return dict(scene=dict(xaxis=dict(title="x"), yaxis=dict(title="y"), zaxis=dict(title="z")))

In [10]:
def scatter_line(data):
    return go.Scatter3d(
        x=data[:, 0], y=data[:, 1], z=data[:, 2], mode="markers", showlegend=False,
        marker=dict(size=1, color=COLORS[0], line=dict(width=0.75))
    )

In [11]:
def scatter_plane(data):
    return go.Scatter3d(
        x=data[:, 0], y=data[:, 1], z=data[:, 2], mode="markers", showlegend=False,
        marker=dict(size=1, color=COLORS[1], line=dict(width=0.75))
    )

In [12]:
def scatter_sphere(data):
    return go.Scatter3d(
        x=data[:, 0], y=data[:, 1], z=data[:, 2], mode="markers", showlegend=False,
        marker=dict(size=1, color=COLORS[2], opacity=0.8, line=dict(width=0.75))
    )

In [13]:
def update_line_layout(layout=None):
    if layout is None:
        layout = default_layout()
    layout["scene"]["xaxis"]["range"] = [0.9, 2.1]
    layout["scene"]["yaxis"]["range"] = [-0.3, 0.3]
    layout["scene"]["zaxis"]["range"] = [-0.3, 0.3]
    return layout

In [14]:
def update_plane_layout(layout=None):
    if layout is None:
        layout = default_layout()
    layout["scene"]["zaxis"]["range"] = [4.7, 5.3]
    return layout

In [15]:
def update_sphere_layout(layout=None):
    if layout is None:
        layout = default_layout()
    layout["scene"]["xaxis"]["range"] = [-1.2, 1.2]
    layout["scene"]["yaxis"]["range"] = [-1.2, 1.2]
    layout["scene"]["zaxis"]["range"] = [-1.2, 1.2]
    return layout

In [16]:
def update_wall_layout(layout=None):
    if layout is None:
        layout = default_layout()
    return layout

In [17]:
def simple_scatter(data, name):
    return go.Scatter3d(
        x=data[:, 0], y=data[:, 1], z=data[:, 2], name=name, mode="markers",
        marker=dict(size=1, line=dict(width=0.75))
    )

In [18]:
def pca_components(pca, epsilon_z=0.05, normalized=True):
    names = ["1st", "2nd", "3rd"]
    if normalized:
        origin = pca.mean_
        # dest = origin + pca.components_ * 3 * np.sqrt(pca.explained_variance_)
        dest = origin + pca.components_
    else:
        origin = np.zeros(3)
        dest = origin + pca.components_
    return [go.Scatter3d(
        x=[origin[0], to[0]],
        y=[origin[1], to[1]],
        z=[origin[2] + epsilon_z, to[2] + epsilon_z],
        name=names[idx],
        marker=dict(size=1, color=VECTOR_COLORS[idx]),
        line=dict(width=6, color=VECTOR_COLORS[idx])
    ) for idx, to in enumerate(dest)]

## Line dataset (~ 1D)

The first canonical 3D point cloud is a 1D line.

In [19]:
scene_line = line(1000)
fig_line = go.Figure(data=[scatter_line(scene_line)])

In [20]:
layout = update_line_layout()
layout["width"] = WIDTH
layout["height"] = HEIGHT
fig_line.update_layout(layout)

## Plane dataset (~ 2D)

Then we build a 2D plane, with a high variability along the `x` and `y` coordinates and a roughly constant `z`.

In [21]:
scene_plan = plane(1000)

In [22]:
fig_plan = go.Figure(data=[scatter_plane(scene_plan)])

In [23]:
layout = update_plane_layout()
layout["width"] = WIDTH
layout["height"] = HEIGHT
fig_plan.update_layout(layout)

## Sphere dataset (~ 3D)

The third example is the sphere, whose description requires all three coordinates.

In [24]:
scene_sphere = sphere(1000)

In [25]:
fig_sphere = go.Figure(data=[scatter_sphere(scene_sphere)])

In [26]:
layout = update_sphere_layout()
layout["width"] = WIDTH
layout["height"] = HEIGHT
fig_sphere.update_layout(layout)

## Wall dataset (~ 2D)

Another common structure in 3D point cloud analysis is the wall, *i.e.* a vertical 2D structure.

In [27]:
scene_wall = wall(1000)

In [28]:
fig_wall = go.Figure(data=[scatter_plane(scene_wall)])

In [29]:
layout = update_wall_layout()
layout["width"] = WIDTH
layout["height"] = HEIGHT
fig_wall.update_layout(layout)

## Principal Component Analysis (PCA)

PCA is commonly used to reduce the dimension of a dataset that is too large for analysis. By projecting the dataset onto a new orthogonal basis, it mitigates collinearity between features.

However, we can also apply it to the whole 3D dataset in order to study its geometric structure: the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ provide a decisive insight into how the `(x, y, z)` point cloud is spatially organized.

When computing geometric features, one usually focuses on the local neighborhood of each point. For the sake of demonstration, we run the PCA on each of the previously defined datasets as a whole.
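To illustrate the local-neighborhood idea, here is a minimal sketch that computes the normalized eigenvalues around a single point. It relies on NumPy only; the helper `local_normalized_eigenvalues`, the brute-force neighbor search, the value `k=50` and the synthetic plane are illustrative assumptions, not part of `geo3dfeatures`.

```python
import numpy as np


def local_normalized_eigenvalues(points, index, k=50):
    """Normalized covariance eigenvalues of the k nearest neighbors of one point."""
    # Brute-force neighbor search: fine for a demo, too slow for large clouds
    distances = np.linalg.norm(points - points[index], axis=1)
    neighborhood = points[np.argsort(distances)[:k]]
    # Eigenvalues of the 3x3 covariance matrix, sorted in decreasing order
    eigenvalues = np.linalg.eigvalsh(np.cov(neighborhood, rowvar=False))[::-1]
    eigenvalues = np.clip(eigenvalues, 0.0, None)  # guard against round-off
    return eigenvalues / eigenvalues.sum()


rng = np.random.default_rng(0)
# Hypothetical noisy horizontal plane (not the geo3dfeatures fixture)
demo_plane = np.column_stack([
    rng.uniform(0, 10, 1000),
    rng.uniform(0, 10, 1000),
    5 + rng.normal(0, 0.01, 1000),
])
local_lambdas = local_normalized_eigenvalues(demo_plane, index=0, k=50)
print(local_lambdas)
```

On a real point cloud, a spatial index such as a k-d tree would usually replace the brute-force search.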

On the line, the first singular value is high, contrary to the two other ones:

In [30]:
line_pca = PCA().fit(scene_line)
line_pca.singular_values_

array([9.17223376, 0.06282132, 0.06063553])

We can easily compute the eigenvalues as the squares of the singular values (up to a constant factor that vanishes after normalization); their normalized version, which sums to 1, is an important concept as well.

In [31]:
line_eigenvalues = line_pca.singular_values_ ** 2
line_eigenvalues /= line_eigenvalues.sum()
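The link between singular values and eigenvalues can be checked with NumPy alone. The sketch below uses a hypothetical random cloud (not one of the fixtures) and compares the squared singular values of the centered data with the eigenvalues of its covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical anisotropic cloud, used only for this check
X = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.1])
Xc = X - X.mean(axis=0)  # center each coordinate

singular_values = np.linalg.svd(Xc, compute_uv=False)
cov_eigenvalues = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

# Squared singular values equal (n - 1) times the covariance eigenvalues
assert np.allclose(singular_values ** 2 / (len(X) - 1), cov_eigenvalues)

# The constant factor disappears once the values are normalized
normalized = singular_values ** 2 / np.sum(singular_values ** 2)
print(normalized)
```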

For the plane dataset, the first two singular values are high, and the third one is close to 0.

In [32]:
plan_pca = PCA().fit(scene_plan)
print(plan_pca.singular_values_)
plan_eigenvalues = plan_pca.singular_values_ ** 2
plan_eigenvalues /= plan_eigenvalues.sum()

[18.63671806 18.21997216  0.06289239]


In the sphere case, the three singular values are of the same order of magnitude, which means that the dataset spreads over three axes.

In [33]:
sphere_pca = PCA().fit(scene_sphere)
print(sphere_pca.singular_values_)
sphere_eigenvalues = sphere_pca.singular_values_ ** 2
sphere_eigenvalues /= sphere_eigenvalues.sum()

[14.50594556 14.41261552 13.81446698]


The wall dataset looks like the plane dataset, in the sense that it is a purely 2D dataset: its third singular value is numerically zero.

In [34]:
wall_pca = PCA().fit(scene_wall)
print(wall_pca.singular_values_)
wall_eigenvalues = wall_pca.singular_values_ ** 2
wall_eigenvalues /= wall_eigenvalues.sum()

[9.13075833e+01 1.03305728e+01 3.83916873e-14]


To summarize, we can display the normalized eigenvalues to confirm the previous insights.

In [35]:
eigenvalues = pd.DataFrame({"line": line_eigenvalues,
                            "plane": plan_eigenvalues,
                            "sphere": sphere_eigenvalues,
                            "wall": wall_eigenvalues},
                           # Raw strings avoid invalid "\l" escape sequences
                           index=[r"$\lambda_1$", r"$\lambda_2$", r"$\lambda_3$"])
eigenvalues

| | line | plane | sphere | wall |
|---|---|---|---|---|
| $\lambda_1$ | 1.0 | 0.511 | 0.346 | 0.987 |
| $\lambda_2$ | 0.0 | 0.489 | 0.341 | 0.013 |
| $\lambda_3$ | 0.0 | 0.0 | 0.313 | 0.0 |


In every case, the eigenvalues are sorted in decreasing order:

$$
\lambda_1 \geq \lambda_2 \geq \lambda_3
$$

Their relative orders of magnitude provide key information on the geometric structure of the neighborhood.

* In the `line` case, we get $\lambda_1 \gg (\lambda_2, \lambda_3)$: the dataset is defined over one single axis => **1D**
* For the `plane`, we get $\lambda_1 \sim \lambda_2 \gg \lambda_3$: two axes are enough to describe the structure => **2D**
* Regarding the `sphere`, we get $\lambda_1 \sim \lambda_2 \sim \lambda_3$: the three axes are needed => **3D**
* Finally in the `wall` case, we get $\lambda_1 \gt \lambda_2 \gt \lambda_3$, with $\lambda_3 = 0$: the third axis is useless => **2D** (the second dimension may be questioned; however, the wall spans a much larger range of `z` values than of `x` and `y` values, which introduces the eigenvalue imbalance)
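The reading above can be turned into a rough rule of thumb. The sketch below simply counts the normalized eigenvalues that exceed a tolerance; the helper name `apparent_dimension` and the threshold `tol=0.01` are arbitrary illustrative choices, and the inputs are the rounded values of the summary table.

```python
import numpy as np


def apparent_dimension(normalized_eigenvalues, tol=0.01):
    """Count the normalized eigenvalues that exceed an arbitrary tolerance."""
    return int(np.sum(np.asarray(normalized_eigenvalues) > tol))


# Rounded values copied from the summary table
table = {
    "line": [1.0, 0.0, 0.0],
    "plane": [0.511, 0.489, 0.0],
    "sphere": [0.346, 0.341, 0.313],
    "wall": [0.987, 0.013, 0.0],
}
dimensions = {name: apparent_dimension(values) for name, values in table.items()}
print(dimensions)
```

With this tolerance the wall only narrowly counts as 2D, which echoes the remark on its eigenvalue imbalance: a slightly stricter threshold would classify it as 1D.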

### Eigenvector plotting

#### Line / 1D

In [36]:
fig_line = go.Figure(
    data=[scatter_line(scene_line)] + pca_components(line_pca))

In [37]:
layout = update_line_layout()
layout["width"] = WIDTH
layout["height"] = HEIGHT
fig_line.update_layout(layout)

On this plot, we can see that the first eigenvector is oriented like the point cloud and fully summarizes it. The two other eigenvectors are orthogonal to the dataset, and their eigenvalues are negligible.

#### Plane / 2D

In [38]:
fig_plan = go.Figure(
    data=[scatter_plane(scene_plan)] + pca_components(plan_pca))

In [39]:
layout = update_plane_layout()
layout["width"] = WIDTH
layout["height"] = HEIGHT
fig_plan.update_layout(layout)

In the `plane` case, the first two eigenvalues are high, and the corresponding eigenvectors span the 2D plane. The third eigenvector is orthogonal to the plane, and its eigenvalue is too small to have any impact.

#### Sphere / 3D

In [40]:
fig_sphere = go.Figure(
    data=[scatter_sphere(scene_sphere)] + pca_components(sphere_pca))

In [41]:
layout = update_sphere_layout()
layout["width"] = WIDTH
layout["height"] = HEIGHT
fig_sphere.update_layout(layout)

This third plot shows that the three eigenvectors describe the sphere in 3D space by setting three orthogonal directions (as a reminder, PCA is a reprojection of the dataset that removes collinearity).

#### Wall / 2D

In [42]:
fig_wall = go.Figure(
    data=[scatter_plane(scene_wall)] + pca_components(wall_pca))

In [43]:
layout = update_wall_layout()
layout["width"] = WIDTH
layout["height"] = HEIGHT
fig_wall.update_layout(layout)

The plot confirms the conclusion drawn from the eigenvalues: the wall plane is defined by two eigenvectors, however their eigenvalues are largely imbalanced due to the order of magnitude of `z`. The third vector is orthogonal to the wall plane, hence it does not contribute to the dataset description.

### Verticality coefficient

This geometric feature is defined from the eigenvector matrix $V$:

$$
c_z = 1 - \lvert V_{3, z} \rvert
$$

where $V_{3, z}$ is the `z` component of the third eigenvector:

In [44]:
print(line_pca.components_[2])
print(plan_pca.components_[2])
print(sphere_pca.components_[2])
print(wall_pca.components_[2])

[-2.72909777e-04 -2.48690889e-01  9.68582865e-01]
[-8.36426556e-05  2.34332129e-04 -9.99999969e-01]
[0.77320614 0.50505461 0.38349983]
[-4.47213595e-01  8.94427191e-01 -2.08166817e-17]


We consider the last value of the eigenvector, *i.e.* its `z` component: the closer to 0 it is, the closer to 1 the verticality coefficient is, and the more "vertical" the point cloud is.

In [45]:
print("Line verticality coefficient:", 1 - abs(line_pca.components_[2, 2]))
print("Plane verticality coefficient:", 1 - abs(plan_pca.components_[2, 2]))
print("Sphere verticality coefficient:", 1 - abs(sphere_pca.components_[2, 2]))
print("Wall verticality coefficient:", 1 - abs(wall_pca.components_[2, 2]))

Line verticality coefficient: 0.03141713455663353
Plane verticality coefficient: 3.095382061779617e-08
Sphere verticality coefficient: 0.6165001749905942
Wall verticality coefficient: 1.0


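In a nutshell, the line and the plane are not vertical (coefficients close to 0), contrary to the wall (coefficient of 1). The sphere gets a fairly high coefficient (0.617), however it is not really a "vertical" point cloud: its three eigenvalues are similar, so the direction of its third eigenvector is mostly arbitrary.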
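As a self-contained check of the definition, the sketch below wraps the coefficient in a hypothetical helper `verticality` and applies it to two synthetic clouds built with NumPy (stand-ins for the fixtures, not the fixtures themselves). It takes the third right singular vector of the centered data as the third eigenvector.

```python
import numpy as np


def verticality(points):
    """1 - |z component| of the direction of least variance."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return 1.0 - abs(vt[2, 2])


rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0, 10, n)
noise = rng.normal(0, 0.01, n)
# Horizontal plane: z is almost constant
flat = np.column_stack([x, rng.uniform(0, 10, n), 5 + noise])
# Vertical wall: y is tied to x, z varies freely
upright = np.column_stack([x, 2 * x + noise, rng.uniform(0, 100, n)])

print("flat:", verticality(flat))
print("upright:", verticality(upright))
```

The horizontal plane yields a coefficient close to 0 and the vertical wall a coefficient close to 1, in line with the values obtained above.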

Let's have a look at the PCA components (*i.e.* the eigenvectors of the transformation). In the plane case, the first two eigenvectors have a zero `z` component, whilst the third eigenvector is purely vertical. In other words, the first two components are enough to describe the point cloud on a horizontal plane, and the third component is orthogonal to it (hence vertical).

In [46]:
pd.DataFrame(plan_pca.components_.T, columns=["P1", "P2", "P3"])

| | P1 | P2 | P3 |
|---|---|---|---|
| 0 | -0.85 | -0.527 | -0.0 |
| 1 | 0.527 | -0.85 | 0.0 |
| 2 | 0.0 | -0.0 | -1.0 |


In [47]:
fig_p = go.Figure(
    data=[simple_scatter(scene_plan, "plane")] + pca_components(plan_pca, 0.01, True))
fig_p

Conversely, in the wall dataset the first component is purely vertical, whilst the last two components are horizontal. In particular, the third component, which is orthogonal to the wall plane, has a zero `z` value; hence the verticality coefficient reaches 1 for this dataset.

In [48]:
pd.DataFrame(wall_pca.components_.T, columns=["P1", "P2", "P3"])

| | P1 | P2 | P3 |
|---|---|---|---|
| 0 | -0.0 | 0.894 | -0.447 |
| 1 | -0.0 | 0.447 | 0.894 |
| 2 | -1.0 | -0.0 | -0.0 |


In [49]:
fig_wall = go.Figure(
    data=[simple_scatter(scene_wall, "wall")] + pca_components(wall_pca, 0.01, True))
fig_wall

## Singular value decomposition (SVD)

**Reminder:** the Singular Value Decomposition (SVD) factorizes any matrix into three terms:

$$
X = U \Sigma V^T
$$

where $\Sigma$ is the diagonal matrix of singular values, and the columns of $V$ are the PCA components (provided that $X$ is centered).
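The factorization can be verified numerically. The sketch below uses NumPy on hypothetical random data; note that `np.linalg.svd` (like `scipy.linalg.svd`) returns $V^T$ as its third output, not $V$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical data, for illustration only
Xc = X - X.mean(axis=0)        # center each coordinate

# The third output is V^T (often named Vt or Vh), not V
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Multiplying the three terms back gives the centered data
reconstruction = U @ np.diag(s) @ Vt
print(np.allclose(reconstruction, Xc))
```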

In [50]:
# Center each coordinate (axis=0); linalg.svd returns V^T as its third output
U_plan, Sigma_plan, V_plan = linalg.svd(scene_plan - scene_plan.mean(axis=0))

In [51]:
scene_plan.shape, U_plan.shape, Sigma_plan.shape, V_plan.shape

((1000, 3), (1000, 1000), (3,), (3, 3))

The SVD relies on three main data transformations:
* $U$ generates a rotation;
* $\Sigma$ applies a scaling along each axis;
* $V$ applies a second rotation.
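Strictly speaking, the outer factors are orthogonal matrices, which covers rotations and possibly a reflection. A minimal NumPy sketch on hypothetical random data shows the properties behind this reading:

```python
import numpy as np

rng = np.random.default_rng(1)
Xc = rng.normal(size=(100, 3))  # hypothetical data
Xc -= Xc.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The columns of U and the rows of V^T are orthonormal
print(np.allclose(U.T @ U, np.eye(3)))
print(np.allclose(Vt @ Vt.T, np.eye(3)))

# A determinant of +1 is a pure rotation, -1 includes a reflection
det_v = np.linalg.det(Vt)
print(det_v)

# The scaling factors are non-negative and sorted in decreasing order
print(s)
```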

If we focus on the first two steps, we can follow the intermediate transformation of the data:

In [52]:
plan_geom_1 = U_plan @ linalg.diagsvd(Sigma_plan, scene_plan.shape[0], 3)
plan_geom_1.shape

(1000, 3)
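This intermediate result has a direct interpretation: when the data is centered, $U \Sigma$ holds the coordinates of the points in the basis of the principal axes. The sketch below checks this with NumPy on a hypothetical cloud; the names `scores` and `projected` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical anisotropic cloud
X = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = U * s         # same result as U @ np.diag(s)
projected = Xc @ Vt.T  # coordinates of the points along the principal axes
print(np.allclose(scores, projected))

# Each column of U * Sigma has a norm equal to the matching singular value
column_norms = np.linalg.norm(scores, axis=0)
print(column_norms, s)
```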

The final move is brought by the last rotation:

In [53]:
# linalg.svd returns V^T as its third output, so no extra transpose is needed
plan_geom_2 = plan_geom_1 @ V_plan
plan_geom_2.shape

(1000, 3)

Plotting these transformations may help to understand them.

In [54]:
plan_scatter = [simple_scatter(scene_plan, "raw"),
                simple_scatter(plan_geom_1, "first"),
                simple_scatter(plan_geom_2, "second")]

In [55]:
go.Figure(data=plan_scatter)

On the previous plot, the raw point cloud is drawn in blue, the first transformation (rotation + scaling) in red, and the last transformation (a simple rotation) gives the green point cloud.

The same process can be done on each dataset. For the sake of illustration, let us consider the wall dataset:

In [56]:
# Center each coordinate; linalg.svd returns V^T as its third output
U_wall, Sigma_wall, V_wall = linalg.svd(scene_wall - scene_wall.mean(axis=0))
wall_geom_1 = U_wall @ linalg.diagsvd(Sigma_wall, scene_wall.shape[0], 3)
wall_geom_2 = wall_geom_1 @ V_wall

In [57]:
wall_scatter = [simple_scatter(scene_wall, "raw"),
                simple_scatter(wall_geom_1, "first"),
                simple_scatter(wall_geom_2, "second")]

In [58]:
go.Figure(data=wall_scatter)

The same scheme as before is visible: the blue point cloud is rotated and scaled to give the red one, then it is rotated again to give the green point cloud.