![fun](images/fun7dof.png)

![nonmaxsup](images/nonmaxsuppress.png)

![pose](images/staticsceneposeestimation.png) \
Scale Ambiguity in Translation:

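The essential matrix gives only the direction of the translation, not its magnitude. \
The rotation R is recovered from the essential matrix without any scale ambiguity (assuming no noise or degeneracies), because rotations are orthonormal matrices and therefore inherently normalized. The decomposition yields four candidate (R, t) pairs; the cheirality check (triangulated points must lie in front of both cameras) selects the valid one.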
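
The scale ambiguity can be checked numerically. The following is a minimal NumPy sketch, assuming the standard SVD-based decomposition of E; the helper names (`skew`, `decompose_essential`) and the test pose are illustrative assumptions. Scaling the true translation only multiplies E by a scalar, so the decomposition returns the same candidate rotations and the same unit translation direction.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the two candidate rotations and a unit translation direction."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations (determinant +1); E is only defined up to sign.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t_dir = U[:, 2]  # unit length; its sign is ambiguous as well
    return R1, R2, t_dir

# Hypothetical ground-truth pose, used only to build test matrices.
angle = 0.3
R_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([1.0, 0.2, -0.1])

results = {}
for scale in (1.0, 5.0):
    E = skew(scale * t_true) @ R_true
    results[scale] = decompose_essential(E)

for scale, (R1, R2, t_dir) in results.items():
    rotation_found = np.allclose(R1, R_true) or np.allclose(R2, R_true)
    alignment = abs(t_dir @ t_true) / np.linalg.norm(t_true)
    print(f"|t| scaled by {scale}: rotation recovered = {rotation_found}, "
          f"|cos(angle to true t)| = {alignment:.6f}")
```

Both scales give the same answer: the length of t leaves no trace in the decomposition, while R is among the candidates in either case.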

| Option | Statement | Why It's Incorrect |
|---|---|---|
| a | Cannot find the pose. | Pose can be recovered up to scale. |
| b | Rotation up to arbitrary rotation and translation up to scale. | No arbitrary rotation ambiguity exists. |
| d | Translation without ambiguity, but not rotation. | Translation has scale ambiguity; rotation is fully recoverable. |
| e | Pose without any ambiguity. | Scale ambiguity in translation prevents this. |
| f | Rotation but not translation. | Translation is recoverable (up to scale). |
| g | Translation up to scale but not rotation. | Rotation is fully recoverable. |
| h | Rotation up to arbitrary rotation, but not translation. | No arbitrary rotation ambiguity; translation is recoverable (up to scale). |

![workflowsift](images/sift_descriptors_workflow.png)

* Scale-Space Extrema Detection:
  * Application of a Gaussian filter at various scales to identify potential keypoints (Option 6).
  * This builds a "scale space" in which features that remain stable across different scales can be found.
* Keypoint Localization:
  * Candidates are refined to eliminate low-contrast points and edge responses.
* Orientation Assignment:
  * Assignment of one or more dominant orientations based on local gradients (Option 7).
  * This ensures rotation invariance by aligning the descriptor to the dominant orientation.
* Descriptor Generation:
  * Calculation of gradient magnitude and orientation around the keypoint (Option 3).
  * Division of the region into subregions (typically 4x4) to capture spatial information (Option 5).
  * Creation of a histogram of gradient orientations (8 bins per subregion) (Option 4).
  * Aggregation of histograms from all subregions to form the final 128-dimensional descriptor (Option 8).
  * Normalization of the descriptor to enhance invariance to illumination changes (Option 2).
  * Rotation of the descriptor according to the dominant orientation (Option 9). In practice this alignment is applied when the gradients are sampled, before the histograms are built.
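
The descriptor-generation steps can be sketched in a few lines of NumPy. This is a simplified illustration, not a full SIFT implementation: it assumes a ready-made 16x16 patch around the keypoint, skips the Gaussian weighting and the interpolation between bins, and only offsets the gradient angles by the keypoint orientation instead of resampling the patch. The function name `sift_like_descriptor` and the 0.2 clamp before renormalizing are assumptions of this sketch.

```python
import numpy as np

def sift_like_descriptor(patch, keypoint_angle=0.0):
    """Build a 128-D SIFT-style descriptor from a 16x16 grayscale patch."""
    if patch.shape != (16, 16):
        raise ValueError("expected a 16x16 patch")
    patch = patch.astype(np.float64)

    # Gradient magnitude and orientation (Option 3).
    gy, gx = np.gradient(patch)  # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    # Express orientations relative to the dominant orientation (Option 9).
    angle = (np.arctan2(gy, gx) - keypoint_angle) % (2.0 * np.pi)
    bins = np.floor(angle / (2.0 * np.pi) * 8).astype(int) % 8

    # 4x4 subregions (Option 5), one 8-bin histogram each (Option 4).
    histograms = []
    for r in range(0, 16, 4):
        for c in range(0, 16, 4):
            hist = np.bincount(bins[r:r + 4, c:c + 4].ravel(),
                               weights=magnitude[r:r + 4, c:c + 4].ravel(),
                               minlength=8)
            histograms.append(hist)

    # Aggregate into one vector: 16 subregions * 8 bins = 128 (Option 8).
    descriptor = np.concatenate(histograms)

    # Normalize, clamp large entries, renormalize (Option 2).
    descriptor /= np.linalg.norm(descriptor) + 1e-12
    descriptor = np.minimum(descriptor, 0.2)
    descriptor /= np.linalg.norm(descriptor) + 1e-12
    return descriptor

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
d_original = sift_like_descriptor(patch)
d_brighter = sift_like_descriptor(2.0 * patch + 10.0)  # gain and offset change
print(d_original.shape, np.allclose(d_original, d_brighter))
```

The comparison at the end illustrates the point of the normalization step: a constant offset vanishes in the gradients and a gain factor cancels when the vector is normalized.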

| Option | Step | Relevance to SIFT |
|---|---|---|
| 2 | Normalization of the descriptor | Ensures illumination invariance. |
| 3 | Gradient magnitude/orientation | Fundamental to descriptor creation. |
| 4 | Histogram of gradient orientations | Core step for capturing local patterns. |
| 5 | Division into subregions | Encodes spatial information. |
| 6 | Gaussian filtering at various scales | Detects scale-invariant keypoints. |
| 7 | Dominant orientation assignment | Achieves rotation invariance. |
| 8 | Aggregation of histograms | Forms the final descriptor. |
| 9 | Descriptor rotation | Aligns to dominant orientation. |

![zhang](images/zhang_algo1.png)\
Why Other Options are Incorrect:

a) Incorrect. Skew is not always assumed to be zero. Zhang's algorithm can estimate skew if it exists (e.g., for non-rectangular pixels).

b) Incorrect. SVD is used to solve the homogeneous system of equations derived from the homographies, and the solution is the right singular vector associated with the smallest singular value, not the largest.

c) Incorrect. A single image provides only two constraints on the intrinsics, so at least three views are needed to estimate all five intrinsic parameters; in practice 5-10 images are typical for robust calibration.

d) Incorrect. The world points lie on a plane (e.g., $Z = 0$), but this plane does not need to be parallel to the image plane. The algorithm works with arbitrary plane orientations.

e) Incorrect. The ratio of world points to image points is irrelevant. The algorithm relies on geometric constraints, not point counts.

f) Incorrect. The principal point is estimated, not assumed to be at the image center.

g) Incorrect. While a single image can provide constraints, multiple images are required for full calibration. Also, skew and principal point are estimated, not assumed.
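
The step behind options a), b), c), f) and g) can be sketched as follows: every homography contributes two linear constraints on $B = K^{-\top} K^{-1}$, the stacked system is solved with SVD, and K (including skew and principal point) is read off from B. This is a minimal noise-free illustration on synthetic homographies; `K_true`, the view angles and the function names are made up for the example, and K is extracted through a Cholesky factorization of B rather than through closed-form expressions.

```python
import numpy as np

def v_ij(H, i, j):
    """Row vector such that h_i^T B h_j == v_ij @ b, b = (B11,B12,B22,B13,B23,B33)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def intrinsics_from_homographies(homographies):
    """Estimate K from three or more plane-to-image homographies."""
    rows = []
    for H in homographies:
        rows.append(v_ij(H, 0, 1))                  # r1 and r2 are orthogonal
        rows.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))  # r1 and r2 have equal norm
    _, _, Vt = np.linalg.svd(np.array(rows))
    b = Vt[-1]  # right singular vector of the SMALLEST singular value
    B = np.array([[b[0], b[1], b[3]],
                  [b[1], b[2], b[4]],
                  [b[3], b[4], b[5]]])
    if B[0, 0] < 0:  # b is only defined up to sign
        B = -B
    L = np.linalg.cholesky(B)  # B = L L^T, with L proportional to K^{-T}
    K = np.linalg.inv(L.T)
    return K / K[2, 2]

def rotation_xyz(ax, ay, az):
    """Rotation Rz(az) @ Ry(ay) @ Rx(ax); used only to synthesize views."""
    cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# Hypothetical intrinsics with non-zero skew and off-center principal point.
K_true = np.array([[800.0, 2.0, 320.0],
                   [0.0, 820.0, 240.0],
                   [0.0, 0.0, 1.0]])
views = [(0.3, 0.1, 0.0), (-0.4, 0.5, 0.2), (0.2, -0.6, -0.3), (0.5, 0.4, 0.7)]

homographies = []
for k, angles in enumerate(views):
    R = rotation_xyz(*angles)
    t = np.array([0.1 * k, -0.05 * k, 2.0 + 0.3 * k])
    H = K_true @ np.column_stack([R[:, 0], R[:, 1], t])  # plane Z = 0
    homographies.append(0.5 * (k + 1) * H)  # arbitrary scale per homography

K_est = intrinsics_from_homographies(homographies)
print(np.round(K_est, 3))
```

With real, noisy corner detections this linear solution would normally serve only as the starting point for a non-linear refinement.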

![zhang](images/zhang2.png)

✔️ Zhang’s algorithm only works for flat calibration objects (e.g., checkerboards).

Why?

Zhang’s method assumes the calibration object lies on a plane ($Z = 0$ in its local coordinates).

It estimates homographies between the planar object and the camera, then solves for intrinsics/extrinsics.

Key Insight: Non-flat objects (e.g., 3D grids) violate the planar assumption, breaking the algorithm.
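
A small NumPy check of why the planar assumption matters, using made-up values for `K`, `R` and `t`: for points with $Z = 0$ the third column of the rotation drops out, so the full 3x4 projection and the 3x3 homography $H = K\,[r_1\; r_2\; t]$ give identical pixels, while a point off the plane does not follow the homography.

```python
import numpy as np

def dehomogenize(x):
    """Convert rows of homogeneous image points to pixel coordinates."""
    return x[:, :2] / x[:, 2:3]

# Hypothetical camera and board pose.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = 0.4
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
t = np.array([-0.05, -0.03, 0.5])

P = K @ np.column_stack([R, t])                 # full 3x4 projection
H = K @ np.column_stack([R[:, 0], R[:, 1], t])  # homography for plane Z = 0

# Checkerboard corners on the plane, square width 0.03 (board units).
xs, ys = np.meshgrid(np.arange(4) * 0.03, np.arange(3) * 0.03)
board = np.column_stack([xs.ravel(), ys.ravel()])
n = len(board)

pixels_P = dehomogenize(np.column_stack([board, np.zeros(n), np.ones(n)]) @ P.T)
pixels_H = dehomogenize(np.column_stack([board, np.ones(n)]) @ H.T)
print("max difference on the plane:", np.abs(pixels_P - pixels_H).max())

# A point lifted off the plane no longer agrees with the homography.
off_P = dehomogenize((P @ np.array([0.03, 0.03, 0.02, 1.0]))[None, :])
off_H = dehomogenize((H @ np.array([0.03, 0.03, 1.0]))[None, :])
off_plane_error = np.linalg.norm(off_P - off_H)
print("pixel error for a point at Z = 0.02:", off_plane_error)
```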

Why Other Options Are Wrong:

○ "Small field of view": False—Zhang’s works for wide FoV if corners are detectable.

○ "Adjusts focus": Irrelevant; calibration doesn’t involve lens adjustments.

○ "Decomposing $P$": Incorrect; intrinsics come from homography constraints, not from $P$.

○ "Corners directly minimize error": Partly true, but corners are used to estimate homographies first.

○ "Homographies constrain fundamental matrix": False; homographies constrain the intrinsics, not $F$.

○ "Requires distance/size": False—only the checkerboard’s pattern size (e.g., square width) is needed, not its distance.

![image](images/EFdiff.png)

| Option | Statement | Why It’s Incorrect |
|---|---|---|
| 2 | Essential matrix = two images; fundamental matrix = single image. | Both relate two images. Neither operates within a single image. |
| 3 | Fundamental matrix = extrinsic; essential matrix = intrinsic. | F encodes both intrinsics and extrinsics implicitly; E encodes only the relative pose and requires the intrinsics to be known. |
| 4 | Fundamental matrix = pinhole; essential matrix = complex lenses. | Both assume pinhole models. Lens systems are irrelevant here. |
| 5 | Fundamental = perspective; essential = orthographic. | Both assume perspective projection. Orthographic models use different math. |
| 6 | Essential matrix = subset of F with zero translation. | E is derived from F with known intrinsics, but translation need not be zero (with zero translation E would vanish). |
| 7 | Essential = linear; fundamental = non-linear. | Both can be estimated linearly (e.g., 8-point algorithm). Non-linear refinement is optional for both. |
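
The relationship between the two matrices can be verified numerically. The sketch below uses hypothetical intrinsics `K1`, `K2` and a made-up relative pose: E is built from the pose alone and satisfies the epipolar constraint on normalized coordinates, F is obtained by folding the intrinsics in and satisfies it on pixel coordinates, and $E = K_2^\top F K_1$ recovers one from the other.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical intrinsics of the two cameras.
K1 = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K2 = np.array([[750.0, 0.0, 300.0], [0.0, 760.0, 250.0], [0.0, 0.0, 1.0]])

# Hypothetical relative pose: X2 = R @ X1 + t.
a = 0.2
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.5, 0.1, 0.05])

E = skew(t) @ R                                  # extrinsics only
F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)  # intrinsics folded in

# One 3D point seen by both cameras.
X1 = np.array([0.3, -0.2, 4.0])
X2 = R @ X1 + t
x1_norm, x2_norm = X1 / X1[2], X2 / X2[2]    # normalized image coordinates
x1_pix, x2_pix = K1 @ x1_norm, K2 @ x2_norm  # homogeneous pixel coordinates

residual_E = x2_norm @ E @ x1_norm
residual_F = x2_pix @ F @ x1_pix
E_from_F = K2.T @ F @ K1

print("x2^T E x1 (normalized):", residual_E)
print("x2^T F x1 (pixels):    ", residual_F)
print("E recovered from F:    ", np.allclose(E_from_F, E))
```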

![e_pose](images/e_pose.png)

![images](images/levenberg.png)

![33](images/33.png)

![scalar](images/scalar.png)

![sift2](images/sift2.png)

![coidim](images/codim_ransac.png)

![stitch](images/stiching.png)

![image](images/whattoransac.png)

| Option | Why It’s Incorrect |
|---|---|
| Homography (H) | Assumes the scene is planar (invalid for general 3D odometry). |
| Fundamental Matrix (F) | Requires more points (7–8) and doesn’t directly give R, t. Also intended for uncalibrated cameras. |
| Essential Matrix (DoF argument) | Wrong reasoning. E has 5 DoF (not "more"), but this isn’t why it’s better. |
| Direct solvePnP | Requires known 3D points (not available in monocular odometry’s first frame). |
| Fundamental Matrix (DoF argument) | F has 7 DoF, but this isn’t relevant for pose estimation. |
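
The practical cost of "requires more points" shows up in the number of RANSAC iterations. The sketch below uses the standard estimate $N = \log(1 - p) / \log(1 - w^s)$ for sample size $s$, inlier ratio $w$ and desired confidence $p$; the 50% inlier ratio and 99% confidence are assumptions chosen for illustration.

```python
import math

def ransac_iterations(sample_size, inlier_ratio, confidence=0.99):
    """Iterations needed to draw at least one all-inlier sample with the given confidence."""
    if not 0.0 < inlier_ratio < 1.0:
        raise ValueError("inlier_ratio must be strictly between 0 and 1")
    p_all_inliers = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_all_inliers))

models = [("homography, 4 points", 4),
          ("essential matrix, 5 points", 5),
          ("fundamental matrix, 7 points", 7),
          ("fundamental matrix, 8 points", 8)]

for name, sample_size in models:
    n = ransac_iterations(sample_size, inlier_ratio=0.5)
    print(f"{name}: {n} iterations")
```

The count grows quickly with the sample size, which is one reason a minimal 5-point essential-matrix solver is attractive for calibrated monocular odometry. The homography needs even fewer samples, but it is ruled out by its planar-scene assumption.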

![image](images/K.png)

![image](images/sine.png)