# Projection Theory
*Arthur Ryman, last updated 2025-09-12*

## Introduction

The goal of this notebook is to present the theory of 
perspective and orthographic projections.

My initial attempt at a write-up is available as a Markdown document at:
[projection.md](https://github.com/agryman/instant-insanity/blob/main/src/instant_insanity/core/projection.md).
However, GitHub no longer renders math in Markdown previews.
Furthermore, I want to execute SymPy code to verify the math.
Hence the switch to this Jupyter notebook.

### Background

Our goal is to draw 3d objects, such as cubes, in math animations using Manim.
At first I attempted to use the ThreeDScene class and OpenGL renderer.
I assumed that the OpenGL renderer would draw 3d objects correctly through the use of
a z-buffer.

Unfortunately, I ran into other problems with OpenGL which I reported to the 
Manim Community Edition project.
Work is in progress there to produce a high-quality OpenGL renderer but it does not seem ready
yet.

I therefore reverted to using the Scene class and Cairo renderer.
As a workaround, I developed code that projects 3d objects onto 2d space and depth-sorts them.

Most of my planned content is 2d, so using the Scene class and Cairo
in conjunction with projections and depth-sorting is an acceptable short-term workaround.
Furthermore, the Cairo renderer is very stable and the vast majority of the Manim examples use it.
Cairo is therefore a safer bet until the OpenGL renderer is ready.

### Depth-Sorting

As stated above, our goal is to draw 3d objects as 2d objects in a way that conveys
the appearance of depth.
For example, a 3d cube consists of six square faces. 
We project each square face from 3d to 2d.
In general, each projected face will no longer be a square.
Instead, the projection will transform each face into
some quadrilateral whose interior angles are no longer right angles and 
whose sides are no longer equal. 
The projection distorts the square in a way that is consistent with our depth perception.

In addition to projecting the faces of the cube, we need to draw them on the camera plane $C$ 
in the correct order so as to achieve the desired visual appearance.
If face A is behind face B in model space $M$ then we need to draw the 2d projection
of face A on $C$ before we draw that of face B. 
Part of the projection of face B may cover part of the projection of face A, 
reproducing the actual visual appearance.
This drawing procedure is known as the 
[Painter's Algorithm](https://en.wikipedia.org/wiki/Painter%27s_algorithm).

For simplicity, we will assume that our 3d objects can be modelled as 
collections of opaque, convex, planar polygons and 
that we can always sort them into some drawing order that will produce the correct visual appearance.

The ability to sort the polygons is a strong assumption.
It is possible to arrange as few as three nonintersecting, convex, 
planar polygons in a way that has no corresponding correct drawing order.

Given the known requirements for our current animation project, 
all collections of 3d objects will be simple enough that a correct
drawing order always exists.

If we actually needed to draw some collection of polygons that had no correct drawing order,
then we would have to split some of the polygons.
If we split the polygons enough then a correct drawing order always exists.
In the extreme case, we could split each polygon into individual pixels.
We'll defer dealing with this situation until project requirements force us to do so.

## 3D Geometry

This section reviews some standard definitions and notation.

### Model Space
We are going to draw geometrical objects, such as cubes,
in model space which is a real 3-dimensional vector space.
Let $M$ denote model space:

$$
M \cong \mathbb{R}^3
$$

### The Origin
Let $O$ denote the zero vector of $M$, aka the origin.

$$
O = (0, 0, 0)
$$

### The Standard Unit Basis Vectors

Let $\hat{\imath}$, $\hat{\jmath}$, and $\hat{k}$ denote the standard unit basis vectors
for $M$.

$$
\begin{gather}
\hat{\imath} = (1, 0, 0) \\
\hat{\jmath} = (0, 1, 0) \\
\hat{k} = (0, 0, 1)
\end{gather}
$$

### Nonzero Vectors
Let $M^\times$ denote $M$ with the origin removed, aka the set of nonzero vectors.

$$
M^\times = M \setminus \{O\}
$$

### Lines
The projection will be defined in terms of lines and planes in $M$.

A line in $M$ is the 1-dimensional
set of points defined by some base point $b$ and some nonzero direction vector
$d$ as follows.

Let $\mbox{line}(b,d)$ denote the line defined by $b$ and $d$.

$$
\begin{gather}
b \in M \\
d \in M^\times \\
\mbox{line}(b,d) = \{~ t: \mathbb{R} \bullet b + t d ~\}
\end{gather}
$$

### Planes
A plane in $M$ is a set of points defined by some base point $b$ and a pair of linearly
independent direction vectors $(d_1, d_2)$ as follows.

Recall that a pair of vectors in 3d space is linearly independent if and only if 
their cross product is nonzero.

Let $\mbox{plane}(b, d_1, d_2)$ denote the plane defined by $b$, $d_1$, and $d_2$.

$$
\begin{gather}
b, d_1, d_2 \in M \\
d_1 \times d_2 \in M^\times \\
\mbox{plane}(b, d_1, d_2) = \{~ u, v : \mathbb{R} \bullet b + u d_1 + v d_2 ~\}
\end{gather}
$$
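
The line and plane definitions above translate directly into SymPy matrix arithmetic. The following is a minimal sketch, not part of the project code: the helper names `line_point` and `plane_point` are hypothetical, and the parameter symbols `s` and `t` are chosen only for this illustration. It checks the side conditions (nonzero direction, nonzero cross product) before evaluating a point.

```python
from sympy import Matrix, symbols, zeros

def line_point(b, d, t):
    """Return the point b + t*d on line(b, d); d must be a nonzero vector."""
    if d == zeros(3, 1):
        raise ValueError("direction vector must be nonzero")
    return b + t * d

def plane_point(b, d1, d2, u, v):
    """Return the point b + u*d1 + v*d2 on plane(b, d1, d2).

    The directions must be linearly independent, i.e. d1 x d2 is nonzero.
    """
    if d1.cross(d2) == zeros(3, 1):
        raise ValueError("direction vectors must be linearly independent")
    return b + u * d1 + v * d2

# Standard unit basis vectors of model space.
i_hat = Matrix([1, 0, 0])
j_hat = Matrix([0, 1, 0])
k_hat = Matrix([0, 0, 1])

# A generic point of the plane through c*k_hat spanned by i_hat and j_hat.
c, s, t = symbols('c s t', real=True)
camera_point = plane_point(c * k_hat, i_hat, j_hat, s, t)

# A point on the z-axis, two units from the origin.
axis_point = line_point(zeros(3, 1), k_hat, 2)
```

Every point of that plane has the form $(s, t, c)$, which is the property used when the plane serves as the camera plane.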

## Perspective Projections

Consider the following diagram of points, lines, and planes in $M$.
The points are black dots, the lines are red, and the planes are blue.
Here the $z$-axis is vertical, the $x$-axis is horizontal, and we have suppressed the $y$-axis.

It shows the geometric setup for a perspective projection $f$ which we'll define below.

<div style="text-align: center;">
    <img src="images/diagrams/perspective-projection-diagram.png" alt="Perspective Projection Diagram" style="width: 100%;"/>
</div>

### The Camera Plane

We are going to project $M$ onto a plane,
which we refer to as the camera plane $C$, that represents the display screen.


$$
\begin{gather}
C \subset M \\
C \cong \mathbb{R}^2
\end{gather}
$$

Without loss of generality, we only consider camera planes that are perpendicular to the $z$-axis.
Let $c$ be a real number that defines the point where $C$ intersects the $z$-axis.

$$
\begin{gather}
c \in \mathbb{R} \\
C = \mbox{plane}(c\hat{k}, \hat{\imath}, \hat{\jmath})
\end{gather}
$$

### The Viewpoint

The projection $f$ is defined by specifying both a camera plane $C$, and
a viewpoint $v$ not contained in it.

$$
v \in M \setminus C
$$

### The Projection

Let $m$ be a point in $M$. 

$$
m \in M
$$

Let $L$ be the line from $m$ to $v$, that is, the line with base point $m$ and direction vector $v - m$.

$$
L = \mbox{line}(m, v - m)
$$

The line $L$ is well-defined whenever $m$ and $v$ are distinct
points.

$$
m \neq v
$$

The projection $p = f(m)$ of $m$ onto $C$ is defined to be the point
where $L$ intersects $C$.

$$
\{~p~\} = L \cap C
$$

Define $V$ to be the plane through $v$ parallel to $C$.

$$
V = \mbox{plane}(v, \hat{\imath}, \hat{\jmath})
$$

The plane $C$, and any plane parallel to it,
appears as a horizontal line in the diagram.
For example, the horizontal lines $D$ and $E$ represent planes parallel to $C$.

The point $p$ is well-defined only if the intersection of $L$ and $C$ contains exactly one point.
This condition is violated when $L$ 
lies in $V$ because any such line never intersects $C$.

Clearly, if $m$ lies in $V$ then $L$ lies in $V$.

$$
m \in V \implies L \subset V
$$

Therefore, $p$ is well-defined when $m$ does not lie in $V$.

$$
m \notin V \implies \exists_1 p \in C \bullet L \cap C = \{~p~\}
$$

In summary, we must exclude $V$ from the domain of definition of $f$.

$$
f: M \setminus V \rightarrow C
$$
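
As a worked sketch of this definition, the intersection point can be written in closed form. Parameterize the points of $L$ as $m + t(v - m)$, where $t$ is a real parameter introduced only for this derivation, and require the $z$-component to equal $c$:

$$
\begin{gather}
m_3 + t (v_3 - m_3) = c \\
t = \frac{c - m_3}{v_3 - m_3} \\
p = f(m) = m + \frac{c - m_3}{v_3 - m_3} (v - m)
\end{gather}
$$

The denominator $v_3 - m_3$ is nonzero exactly when $m \notin V$, which agrees with the domain of $f$ given above.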

## SymPy

We'll use SymPy to represent the math.

SymPy provides two modules that are relevant to this task:
* The [Matrix](https://docs.sympy.org/latest/reference/public/matrices/index.html)
module performs vector space calculations such as vector addition, scalar multiplication,
and matrix multiplication.
* The [Geometry](https://docs.sympy.org/latest/modules/geometry/index.html)
module performs euclidean geometry calculations involving points, lines, and planes.

We'll use both.

### Scalars and Vectors

Model space $M$ is a 3-dimensional real vector space, namely a copy of $\mathbb{R}^3$.

Define the types Scalar to be a real number and Vector to be a triple of scalars.
SymPy represents vectors as matrices that have a single column, i.e. as column vectors.

Define some convenience functions to create SymPy objects that represent
scalars, positive scalars, and vectors.
We'll use the suffixes 1, 2, and 3 to label the components of vectors since the SymPy
`symbols` function supports that directly through its range syntax.

In [1]:
from sympy import *
from instant_insanity.core.symbolic_projection import Scalar, Vector

def scalar(name: str) -> Scalar:
    return symbols(name, real=True)

def positive_scalar(name: str) -> Scalar:
    return symbols(name, real=True, positive=True)

def vector(name: str) -> Vector:
    return Matrix(symbols(name + '1:4', real=True))

###  Orientation of Model Space Coordinate Axes

Let $x, y, z$ be the usual Cartesian coordinates on $M$.

We will use the default Manim orientation of $M$ relative to the display screen, 
namely:
* $x$ increases from left (LEFT) to right (RIGHT),
* $y$ increases from bottom (DOWN) to top (UP), and 
* $z$ increases from in back (IN) to front (OUT).

### Representing Arbitrary Points in Model Space

Let $m$ represent an arbitrary point in $M$.

In [2]:
m = vector('m')

m

Matrix([
[m1],
[m2],
[m3]])

The point $m$ is a column vector.
To save vertical space on output, take the transpose.

In [3]:
m.T

Matrix([[m1, m2, m3]])

In SymPy, `Matrix` is an alias for the concrete class `MutableDenseMatrix`, as the following check shows.

In [4]:
isinstance(m, Matrix), type(m)

(True, sympy.matrices.dense.MutableDenseMatrix)

Unpack $m$ into variables that reference its components.

In [5]:
m1, m2, m3 = m

m3

m3

In [6]:
type(m3)

sympy.core.symbol.Symbol

Confirm that $m_3$ is real.

In [7]:
ask(Q.real(m3))

True

### The Viewpoint

The viewpoint $v$ is an arbitrary point in $M$.

In [8]:
v = vector('v')

v.T

Matrix([[v1, v2, v3]])

### The Camera Plane

The camera plane $C$ is perpendicular to the $z$-axis. Let $c$ be its $z$-intercept.

In [9]:
c = scalar('c')

c

c

In [10]:
C = Plane(Point3D(0,0,c), normal_vector=(0,0,1))

C

Plane(Point3D(0, 0, c), (0, 0, 1))

### The Line from the Model Point to the Viewpoint

Let $L$ be the line that passes through $m$ and $v$.

The most appropriate way to represent $L$ is as an instance of Line3D.
The initializer for Line3D takes two Point3D instances.
We have already defined $m$ and $v$ as column vectors.
Can we create a Point3D from a column vector?

In [11]:
Point3D(m)

Point3D(m1, m2, m3)

Success! The SymPy Geometry module appears to play nicely with the Matrix module.

The problem we are trying to solve is to compute $p$ from $m$. We therefore need to 
define $L$ as the line from $m$ to $v$.
Try using the column vectors as arguments to Line3D and see if they get accepted.

In [12]:
L = Line3D(m, v)

L

Line3D(Point3D(m1, m2, m3), Point3D(v1, v2, v3))

This works. Next, compute the projection of $m$.

### The Projection

Let $f(m) = p$ be the projection of $m$.

In [13]:
p = vector('p')

p.T

Matrix([[p1, p2, p3]])

The point $p$ is the intersection of $L$ with $C$.

In [14]:
L.intersection(C)

[Point3D(m1 - (-c + m3)*(m1 - v1)/(m3 - v3), m2 - (-c + m3)*(m2 - v2)/(m3 - v3), c)]

In general, the intersection of linear entities is a list.
Grab the first element of the list.
This will be the RHS of an equation defining $p$.

In [15]:
rhs = L.intersection(C)[0]

rhs

Point3D(m1 - (-c + m3)*(m1 - v1)/(m3 - v3), m2 - (-c + m3)*(m2 - v2)/(m3 - v3), c)

Create an equation that equates the point $p$ to the RHS.

In [16]:
def_p = [Eq(p_i, rhs_i) for p_i, rhs_i in zip(p, rhs)]
def_p

[Eq(p1, m1 - (-c + m3)*(m1 - v1)/(m3 - v3)),
 Eq(p2, m2 - (-c + m3)*(m2 - v2)/(m3 - v3)),
 Eq(p3, c)]

Now solve the equation to associate the expressions with $p$.

In [17]:
solve(def_p, *p)

{p1: (-c*m1 + c*v1 + m1*v3 - m3*v1)/(-m3 + v3),
 p2: (-c*m2 + c*v2 + m2*v3 - m3*v2)/(-m3 + v3),
 p3: c}
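
As a sanity check on this result, the sketch below evaluates the projection at one concrete point and compares it with the Geometry module. The helper `perspective_point` and the sample values are assumptions made for this illustration. The helper uses the parametric form $p = m + t(v - m)$ with $t = (c - m_3)/(v_3 - m_3)$, which is algebraically the same as the solved expressions above.

```python
from sympy import Line3D, Matrix, Plane, Point3D, simplify, symbols

def perspective_point(m, v, c):
    """Project m from viewpoint v onto the plane z = c (requires m[2] != v[2])."""
    t = (c - m[2]) / (v[2] - m[2])
    return m + t * (v - m)

# Sample values chosen only for this illustration.
m_num = Matrix([2, 4, -1])
v_num = Matrix([0, 0, 3])
c_num = 1
p_num = perspective_point(m_num, v_num, c_num)

# Cross-check against the Geometry module.
plane = Plane(Point3D(0, 0, c_num), normal_vector=(0, 0, 1))
line = Line3D(Point3D(*m_num), Point3D(*v_num))
p_geo = plane.intersection(line)[0]

# Symbolic check of the first component against the solved expression for p1.
m1, m2, m3, v1, v2, v3, c = symbols('m1 m2 m3 v1 v2 v3 c', real=True)
p_sym = perspective_point(Matrix([m1, m2, m3]), Matrix([v1, v2, v3]), c)
expected_p1 = (-c*m1 + c*v1 + m1*v3 - m3*v1) / (-m3 + v3)
residual = simplify(p_sym[0] - expected_p1)
```

Here `p_num` is the column vector $(1, 2, 1)$, `p_geo` is the same point as a `Point3D`, and `residual` simplifies to zero.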

## Depth-Sorting

Let $m$ and $n$ be distinct points in $M$ that project to the same
point $p$ in $C$:

$$
f(m) = f(n)
$$

We say that $m$ and $n$ are *collinear* with respect to $f$.

Suppose that $n$ is behind $m$ in the sense that $m$ blocks our view of $n$.
Denote this as:

$$ 
n \prec m
$$

In terms of coordinates, assuming the viewpoint is in front of both points ($v_3 > m_3$ and $v_3 > n_3$), this condition is:

$$
n_3 < m_3
$$

Now consider two 3d polygons, $A$ and $B$.
We assume that $A$ and $B$ are nonintersecting, convex, planar polygons in $M$.

$$
A \cap B = \emptyset
$$

Our task is to decide on a drawing order.
Project each polygon onto $C$.
Let the projected 2d polygons be $A'$ and $B'$.
Compute their 2D intersection $I$.

$$
I = A' \cap B'
$$

If $I$ is empty then it doesn't matter in which order we draw $A$ and $B$.
However, if $I$ is not empty then the order does matter.

Assume $I$ is nonempty.
Let $p$ be any point in the intersection.

$$
p \in I
$$

Define $L$ to be the line from $p$ to $v$, with base point $p$ and direction vector $v - p$.

$$
L = \mbox{line}(p, v - p)
$$

Find the points $a$ in $A$ and $b$ in $B$ that project to $p$.
These are the intersections of $L$ with $A$ and $B$.

$$
\begin{gather}
\{~a~\} = A \cap L \\
\{~b~\} = B \cap L
\end{gather}
$$

Now compare the $z$-components of $a$ and $b$.
The $z$-components cannot be equal to each other because $A$ and $B$ do not intersect.

$$
\begin{gather}
a_3 < b_3 \implies A \prec B \\
b_3 < a_3 \implies B \prec A
\end{gather}
$$

This establishes the relative drawing order of $A$ and $B$.

Perform this calculation for all pairs of polygons in the scene
and then compute a topological sort on the $\prec$ relation.
This gives us a global drawing order for the polygons.
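
The procedure above can be sketched with SymPy and the standard library's `graphlib`. This is an illustration under simplifying assumptions, not the project's implementation: the helper names are hypothetical, each polygon is represented only by its supporting plane (a point and a normal), and a single point $p$ is assumed to lie in the projection of every polygon, whereas in general a different $p$ is chosen for each overlapping pair.

```python
from graphlib import TopologicalSorter
from sympy import Matrix

def hit_depth(p, v, q, n):
    """z-component of the point where the line through p and v meets a plane.

    The plane passes through q with normal n. Assumes the line is not
    parallel to the plane, i.e. n . (v - p) != 0.
    """
    d = v - p
    t = n.dot(q - p) / n.dot(d)
    return (p + t * d)[2]

def behind_pairs(p, v, planes):
    """Return (back, front) pairs obtained by comparing depths along the line."""
    depth = {name: hit_depth(p, v, q, n) for name, (q, n) in planes.items()}
    names = sorted(depth)
    return [(a, b) if depth[a] < depth[b] else (b, a)
            for i, a in enumerate(names) for b in names[i + 1:]]

def drawing_order(names, pairs):
    """Topologically sort so that each back polygon precedes its front polygon.

    graphlib raises CycleError when no consistent drawing order exists.
    """
    sorter = TopologicalSorter({name: set() for name in names})
    for back, front in pairs:
        sorter.add(front, back)  # front is drawn after back
    return list(sorter.static_order())

# Three polygons lying in the planes z = -1, z = -2, and z = -3.
k_hat = Matrix([0, 0, 1])
planes = {
    'A': (Matrix([0, 0, -1]), k_hat),
    'B': (Matrix([0, 0, -2]), k_hat),
    'E': (Matrix([0, 0, -3]), k_hat),
}
p = Matrix([1, 0, 0])   # a point of the camera plane z = 0
v = Matrix([0, 0, 5])   # the viewpoint
pairs = behind_pairs(p, v, planes)
order = drawing_order(planes, pairs)
```

For this data the farthest polygon is drawn first, so `order` is `['E', 'B', 'A']`.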

## Orthographic Projection

An orthographic projection is a limiting case of a perspective projection as
the viewpoint $v$ moves off to infinity in a fixed direction.
Let $\hat{u}$ be a unit vector that defines the direction that the viewpoint moves in.
Let $\mu$ be a real parameter, let $v_0$ be an initial viewpoint,
and define the viewpoint $v(\hat{u};\mu)$ as follows:
$$
v(\hat{u};\mu) = v_0 + \mu \hat{u}
$$

Define $\hat{u}(m; \mu)$ to be the unit vector that points from $m$ to $v(\hat{u};\mu)$.
$$
\hat{u}(m; \mu) = \frac{v(\hat{u};\mu) - m}{\lVert v(\hat{u};\mu) - m \rVert}
$$

Clearly, as $\mu$ becomes very large, 
$v(\hat{u};\mu) - m$ is dominated by the term $\mu \hat{u}$.
$$
\begin{align}
\lim_{\mu \to \infty} \hat{u}(m; \mu) 
&= \lim_{\mu \to \infty} \frac{v(\hat{u};\mu) - m}{\lVert v(\hat{u};\mu) - m \rVert} \\
&= \lim_{\mu \to \infty} \frac{v_0 + \mu \hat{u} - m}{\lVert v_0 + \mu \hat{u} - m \rVert} \\
&= \lim_{\mu \to \infty} \frac{\mu \hat{u}}{\lVert \mu \hat{u} \rVert} \\
&= \lim_{\mu \to \infty} \frac{\mu \hat{u}}{\mu \lVert \hat{u} \rVert} \\
&= \hat{u}
\end{align}
$$

Therefore, an orthographic projection is like a perspective projection except that
rather than compute the unit vector $\hat{u}(m;\mu)$ we use the given constant unit vector $\hat{u}$.
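
The limit can also be checked on the projected point itself. The sketch below is an illustration under stated assumptions: the names `a1, a2, a3` for the components of $v_0$ and the auxiliary symbol `w` are hypothetical, and $u_3$ is assumed nonzero so that the projection direction is not parallel to $C$. Because the projection is a rational function of $\mu$, the limit is taken by substituting $\mu = 1/w$, cancelling, and setting $w = 0$.

```python
from sympy import Matrix, cancel, simplify, symbols, zeros

m1, m2, m3, c = symbols('m1 m2 m3 c', real=True)
a1, a2, a3 = symbols('a1 a2 a3', real=True)   # components of v_0 (hypothetical names)
u1, u2 = symbols('u1 u2', real=True)
u3 = symbols('u3', nonzero=True)              # assumption: direction not parallel to C
mu, w = symbols('mu w', positive=True)

def perspective(m, v, c):
    """Project m from viewpoint v onto the plane z = c."""
    t = (c - m[2]) / (v[2] - m[2])
    return m + t * (v - m)

m = Matrix([m1, m2, m3])
v0 = Matrix([a1, a2, a3])
u = Matrix([u1, u2, u3])

# Perspective projection with the viewpoint v_0 + mu*u.
p_mu = perspective(m, v0 + mu * u, c)

# Limit mu -> infinity: substitute mu = 1/w, cancel, then set w = 0.
p_inf = p_mu.subs(mu, 1 / w).applyfunc(cancel).subs(w, 0)

# Orthographic projection: move m along u until the z-component equals c.
p_ortho = m + ((c - m3) / u3) * u

residual = (p_inf - p_ortho).applyfunc(simplify)
```

The residual is the zero vector, so in this setup the limiting perspective projection agrees with sliding $m$ along the fixed direction $\hat{u}$ until it reaches $C$. The initial viewpoint $v_0$ drops out of the result.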

## Unifying Perspective and Orthographic Projections

In perspective projections we defined $L$ to be the line from $m$ to $v$.
However, there is no $v$ for orthographic projections, just a unit vector $\hat{u}$.
To unify the treatment, redefine $L$ to be the line from $m$ in the direction
$\hat{u}$.

For orthographic projections, $\hat{u}$ is a given parameter.

For perspective projections, compute $\hat{u}$ from $v$ as follows:

$$
\hat{u} = \frac{v - m}{\lVert v - m \rVert}
$$

The remaining step, the mapping from the camera plane to scene space, is discussed in the next section.

## Translations and Scalings

Above we have discussed the projection of $M$ onto $C$.
However, it is useful to introduce another layer of mapping from $C$ to scene space $S$.
Scene space is the space in which Manim draws objects.

Translations allow us to shift the origins of model space and scene space relative to each other.

Scaling allows us to change their relative units of measure.
For example, our standard puzzle cube has side length 2.0 in model space, but might fit better with side length 1.0 in scene space.
In this case we would use a scale factor of 0.5.

Our framework should support this additional layer of mapping, defined by a translation vector followed
by a uniform scaling, to transform $C$ to $S$.
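
A minimal sketch of such a mapping follows. The function name `camera_to_scene` and its signature are assumptions made for illustration, not the project's API. Points of $C$ are represented here by their $(x, y)$ components only, since the $z$-component is the constant $c$.

```python
from sympy import Matrix, Rational

def camera_to_scene(p, translation, scale):
    """Translate a camera-plane point, then apply a uniform scaling."""
    return scale * (p + translation)

# Opposite corners of a face of the side-length-2 puzzle cube, centred on the origin.
corner_hi = Matrix([1, 1])
corner_lo = Matrix([-1, -1])

no_shift = Matrix([0, 0])
half = Rational(1, 2)

scene_hi = camera_to_scene(corner_hi, no_shift, half)
scene_lo = camera_to_scene(corner_lo, no_shift, half)

# Side length of the face in scene space.
scene_side = scene_hi[0] - scene_lo[0]
```

With a scale factor of 0.5 and no translation, the face of side length 2 maps to a face of side length 1, matching the example above.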

### TODO: Refactor the current projection classes to eliminate translation and scaling.

## Appendix: Spherical Polar Coordinates

We define an orthographic projection by giving the unit vector $u$ that specifies the direction of the projection.
Since we'll be doing exact symbolic mathematics, we'll define the unit vector in terms of spherical polar angles.
Any unit vector $u$ is parameterized by the polar angle $\theta$ and the azimuthal angle $\phi$.

Define an arbitrary unit vector $u$.

In [18]:
theta = scalar('theta')
phi = scalar('phi')

Matrix([theta, phi]).T

Matrix([[theta, phi]])

In [19]:
u_x = sin(theta) * cos(phi)
u_y = sin(theta) * sin(phi)
u_z = cos(theta)
u = Matrix([u_x, u_y, u_z])

u.T

Matrix([[sin(theta)*cos(phi), sin(phi)*sin(theta), cos(theta)]])

In [20]:
u.norm()

sqrt(sin(phi)**2*sin(theta)**2 + sin(theta)**2*cos(phi)**2 + cos(theta)**2)

In [21]:
simplify(u.norm())

1
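
As a usage sketch, specific angles can be substituted into the parameterization. The two directions below are chosen only for illustration: $\theta = 0$ gives the $z$-axis direction, and $\theta = \arccos(1/\sqrt{3})$, $\phi = \pi/4$ gives the diagonal direction $(1, 1, 1)/\sqrt{3}$.

```python
from sympy import Matrix, acos, cos, pi, simplify, sin, sqrt, symbols

theta, phi = symbols('theta phi', real=True)
u = Matrix([sin(theta) * cos(phi), sin(theta) * sin(phi), cos(theta)])

# Polar angle 0: the direction along the z-axis.
u_up = u.subs({theta: 0})

# Diagonal direction, equally inclined to all three coordinate axes.
u_diag = u.subs({theta: acos(1 / sqrt(3)), phi: pi / 4})
target = Matrix([1, 1, 1]) / sqrt(3)
residual = (u_diag - target).applyfunc(simplify)
```

The residual is the zero vector, confirming that these angles reproduce the diagonal unit vector exactly.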