# CSE 252B: Computer Vision II, Winter 2019 – Assignment 2

### Instructor: Ben Ochoa

### Due: Wednesday, February 6, 2019, 11:59 PM

## Instructions
* Review the academic integrity and collaboration policies on the course website.
* This assignment must be completed individually.
* This assignment contains both math and programming problems.
* All solutions must be written in this notebook.
* Math problems must be done in Markdown/LaTeX. Remember to show work and describe your solution.
* Programming aspects of this assignment must be completed using Python in this notebook.
* Your code should be well written, with sufficient comments to be understood, but there is no need to write extra Markdown to describe your solution if it is not explicitly asked for.
* This notebook contains skeleton code, which should not be modified (this is important for standardization to facilitate efficient grading).
* You may use Python packages for basic linear algebra, but you may not use packages that directly solve the problem. Ask the instructor if in doubt.
* You must submit this notebook exported as a PDF. You must also submit this notebook as an .ipynb file.
* Your code and results should remain inline in the PDF (do not move your code to an appendix).
* You must submit both files (.pdf and .ipynb) on Gradescope. You must mark each problem on Gradescope in the PDF.
* It is highly recommended that you begin working on this assignment early.

## Problem 1 (Math): Line-plane intersection (5 points)
  The line in 3D defined by the join of the points $\boldsymbol{X}_1 = (X_1,
  Y_1, Z_1, T_1)^\top$ and $\boldsymbol{X}_2 = (X_2, Y_2, Z_2, T_2)^\top$
  can be represented as a Plücker matrix $\boldsymbol{L} = \boldsymbol{X}_1
  \boldsymbol{X}_2^\top - \boldsymbol{X}_2 \boldsymbol{X}_1^\top$ or pencil of points
  $\boldsymbol{X}(\lambda) = \lambda \boldsymbol{X}_1 + (1 - \lambda) \boldsymbol{X}_2$
  (i.e., $\boldsymbol{X}$ is a function of $\lambda$).  The line intersects
  the plane $\boldsymbol{\pi} = (a, b, c, d)^\top$ at the point
  $\boldsymbol{X}_{\boldsymbol{L}} = \boldsymbol{L} \boldsymbol{\pi}$ or
  $\boldsymbol{X}(\lambda_{\boldsymbol{\pi}})$, where $\lambda_{\boldsymbol{\pi}}$ is
  determined such that $\boldsymbol{X}(\lambda_{\boldsymbol{\pi}})^\top \boldsymbol{\pi} =
  0$ (i.e., $\boldsymbol{X}(\lambda_{\boldsymbol{\pi}})$ is the point on
  $\boldsymbol{\pi}$).  Show that $\boldsymbol{X}_{\boldsymbol{L}}$ is equal to
  $\boldsymbol{X}(\lambda_{\boldsymbol{\pi}})$ up to scale.
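
Before writing the derivation, it can help to check the claim numerically. The following is a minimal sketch, not a proof and not a substitute for the requested derivation: it draws random homogeneous points and a random plane (the variable names and the random seed are arbitrary choices for illustration), forms both representations of the intersection point, and compares their directions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two homogeneous 3D points and one plane, drawn at random for illustration
X1, X2, plane = rng.standard_normal((3, 4))

# Pluecker matrix form of the intersection point
L = np.outer(X1, X2) - np.outer(X2, X1)
X_L = L @ plane

# Pencil-of-points form: X(lam)^T plane = 0 is linear in lam
lam_pi = (X2 @ plane) / (X2 @ plane - X1 @ plane)
X_lam = lam_pi * X1 + (1 - lam_pi) * X2

# Homogeneous points are equal up to scale when they are parallel as 4-vectors
cos_angle = abs(X_L @ X_lam) / (np.linalg.norm(X_L) * np.linalg.norm(X_lam))
print(np.isclose(cos_angle, 1.0))
```

The comparison uses the absolute cosine of the angle between the two 4-vectors, so a difference in sign or magnitude between the two representations does not affect the check.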

Your solution here




## Problem 2 (Math): Line-quadric intersection (5 points)
  In general, a line in 3D intersects a quadric $\boldsymbol{Q}$ at zero, one
  (if the line is tangent to the quadric), or two points.  If the
  pencil of points $\boldsymbol{X}(\lambda) = \lambda \boldsymbol{X}_1 + (1 -
  \lambda) \boldsymbol{X}_2$ represents a line in 3D, the (up to two) real
  roots of the quadratic polynomial $c_2 \lambda_{\boldsymbol{Q}}^2 + c_1
  \lambda_{\boldsymbol{Q}} + c_0 = 0$ are used to solve for the intersection
  point(s) $\boldsymbol{X}(\lambda_{\boldsymbol{Q}})$.  Show that $c_2 =
  \boldsymbol{X}_1^\top \boldsymbol{Q} \boldsymbol{X}_1 - 2 \boldsymbol{X}_1^\top \boldsymbol{Q}
  \boldsymbol{X}_2 + \boldsymbol{X}_2^\top \boldsymbol{Q} \boldsymbol{X}_2$, $c_1 = 2 (
  \boldsymbol{X}_1^\top \boldsymbol{Q} \boldsymbol{X}_2 - \boldsymbol{X}_2^\top \boldsymbol{Q}
  \boldsymbol{X}_2 )$, and $c_0 = \boldsymbol{X}_2^\top \boldsymbol{Q} \boldsymbol{X}_2$.
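
The stated coefficients can be sanity-checked numerically before deriving them. This is a small sketch under assumptions: `quadric_coeffs` is a hypothetical helper name, and the quadric and points are random test data. It evaluates $\boldsymbol{X}(\lambda)^\top \boldsymbol{Q} \boldsymbol{X}(\lambda)$ directly and compares it with the quadratic polynomial in $\lambda$.

```python
import numpy as np

def quadric_coeffs(Q, X1, X2):
    """Coefficients (c2, c1, c0) as stated in the problem (hypothetical helper)."""
    c2 = X1 @ Q @ X1 - 2 * X1 @ Q @ X2 + X2 @ Q @ X2
    c1 = 2 * (X1 @ Q @ X2 - X2 @ Q @ X2)
    c0 = X2 @ Q @ X2
    return c2, c1, c0

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Q = A + A.T                                   # a quadric is a symmetric 4x4 matrix
X1, X2 = rng.standard_normal((2, 4))

c2, c1, c0 = quadric_coeffs(Q, X1, X2)
lams = np.array([-1.5, 0.0, 0.3, 2.0])
# Direct evaluation of X(lam)^T Q X(lam) for each test value of lam
direct = np.array([(l * X1 + (1 - l) * X2) @ Q @ (l * X1 + (1 - l) * X2) for l in lams])
poly = c2 * lams**2 + c1 * lams + c0
print(np.allclose(direct, poly))

# Real roots (zero, one, or two of them) give the intersection points X(lam_Q)
roots = np.roots([c2, c1, c0])
real_roots = roots[np.isreal(roots)].real
```

Agreement at a handful of $\lambda$ values is enough to confirm a quadratic identity numerically, since two quadratics that agree at three distinct points are identical.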

Your solution here




## Problem 3 (Programming):  Linear Estimation of the Camera Projection Matrix (15 points)
  Download input data from the course website.  The file
  hw2_points3D.txt contains the coordinates of 50 scene points
  in 3D (each line of the file gives the $\tilde{X}_i$, $\tilde{Y}_i$,
  and $\tilde{Z}_i$ inhomogeneous coordinates of a point).  The file
  hw2_points2D.txt contains the coordinates of the 50
  corresponding image points in 2D (each line of the file gives the
  $\tilde{x}_i$ and $\tilde{y}_i$ inhomogeneous coordinates of a
  point).  The scene points have been randomly generated and projected
  to image points under a camera projection matrix (i.e., $\boldsymbol{x}_i
  = \boldsymbol{P} \boldsymbol{X}_i$), then noise has been added to the image point
  coordinates.

  Estimate the camera projection matrix $\boldsymbol{P}_\text{DLT}$ using the
  direct linear transformation (DLT) algorithm (with data
  normalization).  You must express $\boldsymbol{x}_i = \boldsymbol{P} \boldsymbol{X}_i$
  as $[\boldsymbol{x}_i]^\perp \boldsymbol{P} \boldsymbol{X}_i = \boldsymbol{0}$ (not
  $\boldsymbol{x}_i \times \boldsymbol{P} \boldsymbol{X}_i = \boldsymbol{0}$), where
  $[\boldsymbol{x}_i]^\perp \boldsymbol{x}_i = \boldsymbol{0}$, when forming the
  solution. Return
  $\boldsymbol{P}_\text{DLT}$, scaled such that
  $||\boldsymbol{P}_\text{DLT}||_\text{Fro} = 1$.
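
One way to obtain $[\boldsymbol{x}_i]^\perp$ is from a Householder matrix that maps $\boldsymbol{x}_i$ onto the first coordinate axis; its remaining rows are then orthogonal to $\boldsymbol{x}_i$. The sketch below illustrates this for a single correspondence. The helper name `left_null_space` and the test data are assumptions made for illustration, not part of the provided skeleton.

```python
import numpy as np

def left_null_space(x):
    """Rows 2..n of the Householder matrix H with H x parallel to e1 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    e1 = np.zeros_like(x)
    e1[0] = 1.0
    sign = 1.0 if x[0] >= 0 else -1.0         # never 0, so v cannot cancel to x itself
    v = x + sign * np.linalg.norm(x) * e1
    H = np.eye(x.size) - 2.0 * np.outer(v, v) / (v @ v)
    return H[1:, :]                           # (n-1) x n, each row orthogonal to x

rng = np.random.default_rng(2)
P_true = rng.standard_normal((3, 4))          # illustrative camera matrix
X_h = np.append(rng.standard_normal(3), 1.0)  # one homogeneous scene point
x_h = P_true @ X_h                            # its noise-free homogeneous image point

N = left_null_space(x_h)                      # 2 x 3
A_i = np.kron(N, X_h[None, :])                # 2 x 12 block of the DLT design matrix
p = P_true.reshape(-1)                        # vec(P^T): rows of P stacked
print(np.allclose(N @ x_h, 0), np.allclose(A_i @ p, 0))
```

Each correspondence contributes two rows, so stacking the blocks for all points gives a $2n \times 12$ matrix whose right singular vector for the smallest singular value is the DLT estimate of $\boldsymbol{p}$.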
  
  The following helper functions may be useful in your DLT function implementation.
  You are welcome to add any additional helper functions.

In [1]:
import numpy as np
import time

def Homogenize(x):
    # converts points from inhomogeneous to homogeneous coordinates
    return np.vstack((x,np.ones((1,x.shape[1]))))


def Dehomogenize(x):
    # converts points from homogeneous to inhomogeneous coordinates
    return x[:-1]/x[-1]


def Normalize(pts):
    # data normalization of n dimensional pts
    #
    # Input:
    #    pts - is in inhomogeneous coordinates
    # Outputs:
    #    pts - data normalized points
    #    T - corresponding transformation matrix
    """your code here"""
    num = pts.shape[0] # 3 or 2
    var_sum = np.sum(np.var(pts, axis=1)) # sum(varx, vary, varz) or sum(varx, vary)
    mean = np.mean(pts, axis=1) # ux, uy, uz or ux, uy
    s = np.sqrt(num/var_sum)
    
    
    T = np.eye(pts.shape[0]+1)
    
    diag_s = np.eye(pts.shape[0]) * s
    T[0:num, 0:num] = diag_s
    T[0:-1, -1] = -mean * s
    
    pts_homog = Homogenize(pts) # make pts from inhomog to homog
    pts = T.dot(pts_homog) # normalize homog pts
    return pts, T

def ComputeCost(P, x, X):
    # Inputs:
    #    x - 2D inhomogeneous image points
    #    X - 3D inhomogeneous scene points
    #
    # Output:
    #    cost - Total reprojection error
    n = x.shape[1]
    covarx = np.eye(2*n)
    
    """your code here"""
    X_homog = Homogenize(X)
    
    x_est_homog = P.dot(X_homog)
    x_est_inhomog = Dehomogenize(x_est_homog)
    
    dist_sqr = np.sum((x - x_est_inhomog) ** 2, axis = 0)

#     dist = np.sqrt(dist_sqr)
    cost = np.sum(dist_sqr)
    
 
    return cost


In [2]:
def DLT(x, X, normalize=True):
    # Inputs:
    #    x - 2D inhomogeneous image points
    #    X - 3D inhomogeneous scene points
    #    normalize - if True, apply data normalization to x and X
    #
    # Output:
    #    P - the (3x4) DLT estimate of the camera projection matrix
    P = np.eye(3,4)+np.random.randn(3,4)/10
        
    # data normalization
    if normalize:
        x, T = Normalize(x)
        X, U = Normalize(X)
    else:
        x = Homogenize(x)
        X = Homogenize(X)
    
    """your code here"""
    
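    # Householder vectors v_i = x_i + sign(x_i[0]) * ||x_i|| * e1, one column per point
    x_norm = np.linalg.norm(x, axis=0) # (n,)
    sign = np.where(x[0,:] >= 0, 1.0, -1.0) # (n,); np.sign would return 0 when x[0] == 0
    e1 = np.array([1, 0, 0])[:, None]
    
    v = x + sign * x_norm * e1 # 3 x n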
    
    I = np.eye(x.shape[0])
    
    
    A = np.zeros((2 * x.shape[1], x.shape[0] * X.shape[0])) # 100 * 12
    for i in range(v.shape[1]):
        v_i = v[:, i] [:, None] # 3*1
        v_i_T = v_i.T
        Hv_i = I - 2 * v_i.dot(v_i_T)/(v_i_T.dot(v_i))
        
        x_i_LeftNullSpace = Hv_i[1:, :] * 1 # 2* 3
        
        A[2*i : 2*i + 2 , :] = np.kron(x_i_LeftNullSpace, X[:, i].reshape(-1))
        
        
    _,S,VH = np.linalg.svd(A)
        
    P_vector = VH[-1, :]
        
    P = P_vector.reshape((3, 4)) # row wise
        
 
    # data denormalize
    if normalize:
        P = np.linalg.inv(T) @ P @ U
        
    Frob_norm = np.sqrt(np.sum(P * P))
    P = P / Frob_norm
        
    return P

def displayResults(P, x, X, title):
    print(title+' =')
    print (P/np.linalg.norm(P)*np.sign(P[-1,-1]))

# load the data
x=np.loadtxt('hw2_points2D.txt').T
X=np.loadtxt('hw2_points3D.txt').T


# compute the linear estimate without data normalization
print ('Running DLT without data normalization')
time_start=time.time()
P_DLT = DLT(x, X, normalize=False)
cost = ComputeCost(P_DLT, x, X)
time_total=time.time()-time_start
# display the results
print('took %f secs'%time_total)
print('Cost=%.9f'%cost)


# compute the linear estimate with data normalization
print ('Running DLT with data normalization')
time_start=time.time()
P_DLT = DLT(x, X, normalize=True)
cost = ComputeCost(P_DLT, x, X)
time_total=time.time()-time_start
# display the results
print('took %f secs'%time_total)
print('Cost=%.9f'%cost)

Running DLT without data normalization
took 0.010076 secs
Cost=97.053718839
Running DLT with data normalization
took 0.004611 secs
Cost=84.104680130


In [3]:
# Report your P_DLT value here!
displayResults(P_DLT, x, X, 'P_DLT')

P_DLT =
[[ 6.04350846e-03 -4.84282446e-03  8.82395315e-03  8.40441373e-01]
 [ 9.09666810e-03 -2.30374203e-03 -6.18060233e-03  5.41657305e-01]
 [ 5.00625470e-06  4.47558354e-06  2.55223773e-06  1.25160752e-03]]


## Problem 4 (Programming):  Nonlinear Estimation of the Camera Projection Matrix (30 points)
  Use $\boldsymbol{P}_\text{DLT}$ as an initial estimate to an iterative
  estimation method, specifically the Levenberg-Marquardt algorithm,
  to determine the Maximum Likelihood estimate of the camera
  projection matrix that minimizes the projection error.  You must
  parameterize the camera projection matrix as a parameterization of
  the homogeneous vector $\boldsymbol{p} = \operatorname{vec}(\boldsymbol{P}^\top)$.  It is
  highly recommended to implement a parameterization of homogeneous
  vector method where the homogeneous vector is of arbitrary length,
  as this will be used in following assignments.
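
A length-agnostic version of this parameterization might look like the sketch below. It is written under assumptions: the function names are hypothetical, sinc is taken as $\sin(x)/x$ (not NumPy's normalized `np.sinc`), and the formulas follow the same mapping used by the helper functions in this notebook, namely $\boldsymbol{v} = \frac{2}{\text{sinc}(\cos^{-1} a)} \boldsymbol{b}$ for a unit vector $(a, \boldsymbol{b}^\top)^\top$, with $\|\boldsymbol{v}\|$ wrapped to at most $\pi$.

```python
import numpy as np

def sinc(x):
    """Unnormalized sinc, sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else np.sin(x) / x

def wrap(v):
    """Map v to the equivalent parameter vector with norm at most pi."""
    n = np.linalg.norm(v)
    if n > np.pi:
        v = (1 - 2 * np.pi / n * np.ceil((n - np.pi) / (2 * np.pi))) * v
    return v

def parameterize_homog(V):
    """Homogeneous vector of any length n -> (n-1)-vector (hypothetical helper)."""
    V = np.asarray(V, dtype=float).ravel()
    V = V / np.linalg.norm(V)
    a, b = V[0], V[1:]
    theta = np.arccos(np.clip(a, -1.0, 1.0))  # clip guards against round-off
    return wrap(2.0 / sinc(theta) * b)

def deparameterize_homog(v):
    """(n-1)-vector -> unit-norm homogeneous vector of length n."""
    v = wrap(np.asarray(v, dtype=float).ravel())
    n = np.linalg.norm(v)
    return np.concatenate(([np.cos(n / 2)], sinc(n / 2) / 2 * v))

V = np.array([-0.3, 0.5, -1.2, 2.0, 0.1])     # arbitrary test vector, first entry negative
v = parameterize_homog(V)
V_back = deparameterize_homog(v)
```

Because homogeneous vectors are defined only up to scale, the round trip should be compared up to sign: here the first entry of `V` is negative, so the wrap is exercised and `V_back` comes back as the unit vector in the direction of `-V`.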
  
  Report the initial cost (i.e., cost at iteration 0) and the cost at the end
  of each successive iteration. Show the numerical values for the final 
  estimate of the camera projection matrix $\boldsymbol{P}_\text{LM}$, scaled
  such that $||\boldsymbol{P}_\text{LM}||_\text{Fro} = 1$.
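
The accept/reject logic of Levenberg-Marquardt and the per-iteration cost report can be seen on a much smaller problem. The sketch below is a toy example under stated assumptions, not the camera estimation itself: it fits a two-parameter exponential to synthetic noise-free data, uses an identity covariance, and all names (`lm_minimize`, `residual`, `jacobian`) are hypothetical.

```python
import numpy as np

def lm_minimize(residual, jacobian, p0, lam=1e-3, max_iters=10, max_tries=20):
    """Toy Levenberg-Marquardt loop; returns the estimate and the cost history."""
    p = np.asarray(p0, dtype=float)
    r = residual(p)
    costs = [float(r @ r)]                    # cost at iteration 0
    for _ in range(max_iters):
        J = jacobian(p)
        JTJ, JTr = J.T @ J, J.T @ r
        for _ in range(max_tries):
            # Damped normal equations; a larger lam gives a shorter, safer step
            delta = np.linalg.solve(JTJ + lam * np.eye(p.size), -JTr)
            r_c = residual(p + delta)
            if r_c @ r_c < r @ r:             # candidate lowers the cost: accept
                p, r, lam = p + delta, r_c, lam / 10
                break
            lam *= 10                         # otherwise raise damping and retry
        costs.append(float(r @ r))            # cost at the end of this iteration
    return p, costs

t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)                    # synthetic data for illustration

def residual(p):
    return p[0] * np.exp(p[1] * t) - y        # model minus measurement

def jacobian(p):
    e = np.exp(p[1] * t)
    return np.column_stack((e, p[0] * t * e)) # d residual / d (p[0], p[1])

p_hat, costs = lm_minimize(residual, jacobian, p0=[1.0, -1.0])
for i, c in enumerate(costs):
    print('iter %03d Cost %.9f' % (i, c))
```

A rejected candidate only changes `lam`, so the recorded cost never increases from one iteration to the next. In the assignment the same loop structure applies, with the parameter vector replaced by the parameterized $\boldsymbol{p}$ and the residual by the reprojection error.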
  
  The following helper functions may be useful in your LM function implementation.
  You are welcome to add any additional helper functions.
  
  Hint: LM typically achieves its largest cost reduction in the first iteration.
  Seeing this behavior is a good sign that your implementation is correct.

In [4]:
# Note that np.sinc is the normalized sinc, sin(pi*x)/(pi*x), which differs from the sin(x)/x defined in class
# numpy reshapes (vectorizes) arrays in row-major order
def Sinc(x):
    # Returns a scalar valued sinc value
    """your code here"""
    if x == 0:
        y = 1
    else:
        y = np.sin(x) / x
    return y

def d_Sinc(x):
    if x == 0:
        return 0
    else:
        return np.cos(x)/x-np.sin(x)/(x**2)

    
def Jacobian(P,p,X):
    # compute the jacobian matrix
    #
    # Input:
    #    P - 3x4 projection matrix
    #    p - 11x1 homogeneous parameterization of P
    #    X - 4xn 3D scene points, homogeneous
    # Output:
    #    J - 2nx11 jacobian matrix
    J = np.zeros((2*X.shape[1],11))
    
    
    """your code here"""
    # P' / p' 12 * 11    
    P_vect = P.reshape(-1, 1)
#     P_vect_norm = np.linalg.norm(P_vect)
#     P_vect = P_vect / P_vect_norm
    
    n = P_vect.shape[0]
    a = P_vect[0, 0]
    b = P_vect[1:, 0]
    
    p_norm = np.linalg.norm(p)
    if p_norm == 0:
        da_dp = np.zeros((1, n-1))
        db_dp = 0.5 * np.eye(n-1)
    else:
        da_dp = - 0.5 * b[None, :]
        db_dp = 0.5 * Sinc(0.5 * p_norm) * np.eye(n-1) + 0.25/p_norm * d_Sinc(0.5 * p_norm) * p.dot(p.T)
        
        
    dP_dp = np.vstack((da_dp, db_dp))
    
    
    
    #dxest_dP
    P_r3 = P[-1,:][None, :] # 1 * 4
    dxEstInhomog_dP = np.zeros((2 * X.shape[1], 3 * X.shape[0])) # 100 * (3*4)
    X_homog = X #Homogenize(X) # 4 * 50

    for i, Xi_homog in enumerate(X_homog.T):
        wi = P_r3.dot(Xi_homog[:, None]) 
        
        xi_est_homog = P.dot(Xi_homog[:, None])
        xi_est_inhomog = Dehomogenize(xi_est_homog)

        xi = xi_est_inhomog.ravel()
        Xi = Xi_homog.ravel()
        
        dxEstInhomog_dP[2*i : 2*i + 2 , :] =1.0/wi * np.array([[Xi[0],Xi[1],Xi[2],Xi[3],0,0,0,0,
                                                                -xi[0]*Xi[0],-xi[0]*Xi[1],-xi[0]*Xi[2],-xi[0]*Xi[3]],
                                                               
                                                              [0,0,0,0,Xi[0],Xi[1],Xi[2],Xi[3],
                                                               -xi[1]*Xi[0],-xi[1]*Xi[1],-xi[1]*Xi[2],-xi[1]*Xi[3]]])

    J = dxEstInhomog_dP.dot(dP_dp) # (2*50) * 11
    return J





def Parameterize(P):
    # wrapper function to interface with LM
    # takes all optimization variables and parameterizes all of them
    # in this case it is just P, but in future assignments it will
    # be more useful   
    return ParameterizeHomog(P.reshape(-1,1))


def Deparameterize(p):
    # Deparameterize all optimization variables
    return DeParameterizeHomog(p).reshape(3,4)



def ParameterizeHomog(V):
    # Given a homogeneous vector V return its minimal parameterization
    """your code here"""
    
    V_normed = V / np.linalg.norm(V)
    a = V_normed[0, 0]
    b = V_normed[1:, 0]
    
    v = 2.0/Sinc(np.arccos(np.clip(a, -1.0, 1.0))) * b # (n-1,); clip guards against round-off
    
    v_norm = np.linalg.norm(v)
    if v_norm > np.pi: 
        # wrap v so that its norm is at most pi (same homogeneous vector up to sign)
        v = (1 - 2*np.pi/v_norm * np.ceil((v_norm - np.pi)/(2*np.pi))) * v 
    return v[:, None]


def DeParameterizeHomog(v):
    # Given a parameterized homogeneous vector return its deparameterization
    """your code here"""
    v_norm = np.linalg.norm(v)
    if v_norm > np.pi: 
        # wrap v so that its norm is at most pi (same homogeneous vector up to sign)
        v = (1 - 2.0*np.pi/v_norm * np.ceil((v_norm - np.pi)/(2.0*np.pi))) * v 
        v_norm = np.linalg.norm(v) # recompute: a and b below must use the wrapped norm
        
    a = np.cos(v_norm / 2)
    b = Sinc(v_norm/2) / 2 * v 
    
    a = np.array([a])[:, None]
    V = np.vstack((a, b)) # (12, 1)
    
    V = V / np.linalg.norm(V)
    return V

In [9]:
def LM(P, x, X, max_iters, lam):
    # Input:
    #    P - initial estimate of P
    #    x - 2D inhomogeneous image points
    #    X - 3D inhomogeneous scene points
    #    max_iters - maximum number of iterations
    #    lam - lambda parameter
    # Output:
    #    P - Final P (3x4) obtained after convergence
    
    # data normalization
    x_ = x
    X_ = X
    x, T = Normalize(x) # homog
    X, U = Normalize(X) # homog
    P = P * np.sign(P[-1,-1])
    P_est = T @ P @ np.linalg.inv(U) # P expressed in data-normalized coordinates
    P_est = P_est/ np.linalg.norm(P_est)
  


    """your code here"""
    s = T[0,0]
    cov_x_DN = np.eye(2 * X.shape[1]) * (s*s)

    
    # you may modify this so long as the cost is computed
    # at each iteration
    #step 1: 
    x_est = P_est.dot(X)
    err_vect = (Dehomogenize(x) - Dehomogenize(x_est)).reshape((-1, 1), order = 'F') # 100 * 1

 
    for i in range(max_iters): 
        #step 2:
        p_est = Parameterize(P_est)
  
        J = Jacobian(P_est, p_est, X) # 100 * 11
        
        #step 3:
        JT_cov_x_DN_J = J.T @ np.linalg.inv(cov_x_DN) @ J # 11 * 11
      
        JT_cov_x_DN_err = J.T @ np.linalg.inv(cov_x_DN) @ err_vect # 11* 1
        
        
        
        iter_num = 0
        while True:
            if iter_num>20:
                break
            #step 4:
            my_U = JT_cov_x_DN_J + lam * np.eye(J.shape[1])
            delta_vect = np.linalg.inv(my_U) @ JT_cov_x_DN_err # 11 * 1
            #step 5:
            p_est_c = p_est + delta_vect # candidate
            #step 6:
            P_est_c = Deparameterize(p_est_c)
            x_est_c = P_est_c.dot(X)
            err_vect_c = (Dehomogenize(x) - Dehomogenize(x_est_c)).reshape((-1, 1), order = 'F') # 100 * 1
            #step 7:
            cost_now = err_vect_c.T @ np.linalg.inv(cov_x_DN) @ err_vect_c
            cost_pre = err_vect.T @ np.linalg.inv(cov_x_DN) @ err_vect
#             if abs(cost_now - cost_pre) < 0.000001:
#                 pass
            if cost_now < cost_pre: #or abs(cost_now - cost_pre) < 0.000001:
#                 p_est = p_est_c
                err_vect = err_vect_c
                P_est = P_est_c
                
                lam = 0.1 * lam
                # jump to step 2
                break
            else:
                # jump to step 4, and this is not an iteration
                lam = 10 * lam
                iter_num = iter_num + 1
                
        # data denormalization
        P = np.linalg.inv(T) @ P_est @ U
        Frob_norm = np.sqrt(np.sum(P * P))
        P = P / Frob_norm
        cost = ComputeCost(P, x_, X_)
        print ('iter %03d Cost %.9f'%(i+1, cost))
    return P



# LM hyperparameters
lam = .001
max_iters = 19

# Run LM initialized by DLT estimate with data normalization

print ('Running LM with data normalization')
print ('iter %03d Cost %.9f'%(0, cost))
x=np.loadtxt('hw2_points2D.txt').T
X=np.loadtxt('hw2_points3D.txt').T
time_start=time.time()
P_LM = LM(P_DLT, x, X, max_iters, lam)
time_total=time.time()-time_start
print('took %f secs'%time_total)


Running LM with data normalization
iter 000 Cost 84.104680130
iter 001 Cost 82.791336044
iter 002 Cost 82.790238006
iter 003 Cost 82.790238005
iter 004 Cost 82.790238005
iter 005 Cost 82.790238005
iter 006 Cost 82.790238005
iter 007 Cost 82.790238005
iter 008 Cost 82.790238005
iter 009 Cost 82.790238005
iter 010 Cost 82.790238005
iter 011 Cost 82.790238005
iter 012 Cost 82.790238005
iter 013 Cost 82.790238005
iter 014 Cost 82.790238005
iter 015 Cost 82.790238005
iter 016 Cost 82.790238005
iter 017 Cost 82.790238005
iter 018 Cost 82.790238005
iter 019 Cost 82.790238005
took 0.146304 secs


In [6]:
# Report your P_LM final value here!
displayResults(P_LM, x, X, 'P_LM')

P_LM =
[[ 6.09434291e-03 -4.72647758e-03  8.79023503e-03  8.43642842e-01]
 [ 9.02017241e-03 -2.29290824e-03 -6.13330068e-03  5.36660248e-01]
 [ 4.99088611e-06  4.45205073e-06  2.53705045e-06  1.24348254e-03]]
