# Drone Trajectory Planner

In this project, we will develop a drone trajectory planner. This notebook serves as the main file for the project, where we will refer to the instructions and demonstrate our code.

Please follow the week-by-week instructions, which include writing code in the `src/` folder.

In [1]:
# Import all the files and libraries required for the project
%load_ext autoreload
%autoreload 2
import copy
    
import numpy as np

from src.camera_utils import compute_image_footprint_on_surface, compute_ground_sampling_distance, project_world_point_to_image
from src.data_model import Camera, DatasetSpec
from src.plan_computation import compute_distance_between_images, compute_speed_during_photo_capture, generate_photo_plan_on_grid
from src.visualization import plot_photo_plan

# Week 1: Introduction

No code contribution expected this week

# Week 2: Camera System Modeling and Operations

We plan to:
- Model the simple pinhole camera system
- Write utility functions to
    - Project a 3D world point to an image
    - Compute the image footprint on a surface
    - Compute the Ground Sampling Distance

## Model the camera parameters

We want to model the following camera parameters in Python:
- focal length along x axis (in pixels)
- focal length along y axis (in pixels)
- optical center of the image along the x axis (in pixels)
- optical center of the image along the y axis (in pixels)
- Size of the sensor along the x axis (in mm)
- Size of the sensor along the y axis (in mm)
- Number of pixels in the image along the x axis
- Number of pixels in the image along the y axis

I recommend using `dataclasses` ([Python documentation](https://docs.python.org/3/library/dataclasses.html), [Blog](https://www.dataquest.io/blog/how-to-use-python-data-classes/)) to model these parameters.

$\color{red}{\text{TODO: }}$ Implement `Camera` in `src/data_model.py`
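As a starting point, here is a minimal sketch of what such a dataclass could look like. The class name `CameraSketch` is hypothetical (chosen so it does not clash with the `Camera` you implement in `src/data_model.py`), and the field names are an assumption taken from the printed camera model later in this notebook.

```python
from dataclasses import dataclass


@dataclass
class CameraSketch:
    """Hypothetical pinhole camera model (illustration only)."""

    fx: float  # focal length along the x axis (pixels)
    fy: float  # focal length along the y axis (pixels)
    cx: float  # optical center along the x axis (pixels)
    cy: float  # optical center along the y axis (pixels)
    sensor_size_x_mm: float  # sensor size along the x axis (mm)
    sensor_size_y_mm: float  # sensor size along the y axis (mm)
    image_size_x_px: int  # number of pixels along the x axis
    image_size_y_px: int  # number of pixels along the y axis


# Values of the Skydio X10 wide camera used in this notebook.
camera_sketch = CameraSketch(4938.56, 4936.49, 4095.5, 3071.5, 13.107, 9.830, 8192, 6144)
print(camera_sketch)
```

A dataclass generates `__init__` and `__repr__` automatically, which is why the object prints all of its fields without extra code.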

In [2]:
# Define the parameters for Skydio VT300L - Wide camera
# Ref: https://support.skydio.com/hc/en-us/articles/20866347470491-Skydio-X10-camera-and-metadata-overview
fx = 4938.56
fy = 4936.49
cx = 4095.5
cy = 3071.5
sensor_size_x_mm = 13.107 # single pixel size * number of pixels in X dimension
sensor_size_y_mm = 9.830 # single pixel size * number of pixels in Y dimension
image_size_x = 8192
image_size_y = 6144

camera_x10 = Camera(fx, fy, cx, cy, sensor_size_x_mm, sensor_size_y_mm, image_size_x, image_size_y)

In [3]:
print(f"X10 camera model: {camera_x10}")

X10 camera model: Camera(fx=4938.56, fy=4936.49, cx=4095.5, cy=3071.5, sensor_size_x_mm=13.107, sensor_size_y_mm=9.83, image_size_x_px=8192, image_size_y_px=6144)


## Project 3D world points into the image


![Camera Projection](assets/image_projection.png)
Reference: [Robert Collins CSE486](https://www.cse.psu.edu/~rtc12/CSE486/lecture12.pdf)


Equations to implement:
$$ x = f_x \frac{X}{Z} $$
$$ y = f_y \frac{Y}{Z} $$
$$ u = x + c_x $$
$$ v = y + c_y $$

$\color{red}{\text{TODO: }}$ Implement function `project_world_point_to_image` in `src/camera_utils.py`
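The four equations above can be sketched directly in code. This is only an illustration: `project_point_sketch` is a hypothetical helper that takes the intrinsics as plain numbers and returns a tuple, whereas the function you implement takes a `Camera` object. The check for `Z > 0` is an added assumption, since a point behind the camera has no valid projection.

```python
def project_point_sketch(fx, fy, cx, cy, point_3d):
    """Project a 3D point (X, Y, Z) in the camera frame to pixel coordinates (u, v)."""
    X, Y, Z = point_3d
    if Z <= 0:
        # Assumption for this sketch: only points in front of the camera are valid.
        raise ValueError("The point must be in front of the camera (Z > 0).")
    x = fx * X / Z  # x = f_x * X / Z
    y = fy * Y / Z  # y = f_y * Y / Z
    return (x + cx, y + cy)  # shift by the optical center


uv_sketch = project_point_sketch(4938.56, 4936.49, 4095.5, 3071.5, (25.0, -30.0, 50.0))
print(uv_sketch)
```

With the X10 intrinsics, the result should agree with the `expected_uv` value used in the test cell that follows.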

In [4]:
point_3d = np.array([25, -30, 50], dtype=np.float32)
expected_uv = np.array([6564.80, 109.60], dtype=np.float32)
uv = project_world_point_to_image(camera_x10, point_3d)

print(f"{point_3d} projected to {uv}")

assert np.allclose(uv, expected_uv, atol=1e-2)

[ 25. -30.  50.] projected to [6564.7803   109.60596]


## Compute Image Footprint on the surface

We have written code to *project* a 3D point into the image. The reverse operation is reprojection, where we take $(x, y)$ and compute the $(X, Y)$ for a given value of $Z$. Note that when going from 3D to 2D, the depth becomes ambiguous, so we need to specify $Z$.

An image's footprint is the area on the surface which is captured by the image. We can take two opposite corners of the image and reproject them at a given distance to obtain the width and length of the footprint.

$\color{red}{\text{TODO: }}$ Implement function `compute_image_footprint_on_surface` in `src/camera_utils.py`
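To make the corner argument concrete: reprojecting the pixel columns $u = 0$ and $u = W$ at depth $Z$ gives $X = (u - c_x) Z / f_x$, so the difference is $W Z / f_x$ and the optical center cancels. The sketch below assumes exactly that simplification; `footprint_sketch` is a hypothetical helper working on plain numbers rather than a `Camera` object.

```python
def footprint_sketch(fx, fy, image_size_x_px, image_size_y_px, distance_m):
    """Footprint (width, length) in meters of an image taken at `distance_m` from the surface."""
    # Difference between the reprojected opposite corners along each axis.
    footprint_x = image_size_x_px * distance_m / fx
    footprint_y = image_size_y_px * distance_m / fy
    return footprint_x, footprint_y


footprint_100_sketch = footprint_sketch(4938.56, 4936.49, 8192, 6144, 100.0)
print(footprint_100_sketch)
```

Because the footprint is linear in the distance, doubling the distance doubles both dimensions, which is what the second test cell below checks.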

In [5]:
footprint_at_100m = compute_image_footprint_on_surface(camera_x10, 100)
expected_footprint_at_100m = np.array([165.88, 124.46], dtype=np.float32)

print(f"Footprint at 100m = {footprint_at_100m}")

assert np.allclose(footprint_at_100m, expected_footprint_at_100m, atol=1e-2)


Footprint at 100m = [165.87831271 124.46090238]


In [6]:
footprint_at_200m = compute_image_footprint_on_surface(camera_x10, 200)
expected_footprint_at_200m = expected_footprint_at_100m * 2

print(f"Footprint at 200m = {footprint_at_200m}")

assert np.allclose(footprint_at_200m, expected_footprint_at_200m, atol=1e-2)

Footprint at 200m = [331.75662541 248.92180476]


## Ground Sampling Distance

Ground sampling distance (GSD) is the length of the ground (in m) captured by a single pixel. We have the image footprint (the dimensions of the ground captured by the whole sensor) and the number of pixels along the horizontal and vertical dimensions. Can we get the GSD from these two quantities?

Note: Please return just one value of the GSD. Take the minimum of the values along the two axes.
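One way to read the hint above is that the GSD along an axis is the footprint divided by the number of pixels along that axis. The sketch below follows that reading; `gsd_sketch` is a hypothetical helper on plain numbers, not the function you are asked to implement.

```python
def gsd_sketch(fx, fy, image_size_x_px, image_size_y_px, distance_m):
    """Ground sampling distance in meters per pixel (minimum over the two axes)."""
    footprint_x = image_size_x_px * distance_m / fx
    footprint_y = image_size_y_px * distance_m / fy
    gsd_x = footprint_x / image_size_x_px  # meters covered by one pixel along x
    gsd_y = footprint_y / image_size_y_px  # meters covered by one pixel along y
    return min(gsd_x, gsd_y)


gsd_100_sketch = gsd_sketch(4938.56, 4936.49, 8192, 6144, 100.0)
print(gsd_100_sketch)
```

Note that the pixel count cancels, so under this model the GSD per axis reduces to the distance divided by the focal length in pixels.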

In [7]:
gsd_at_100m = compute_ground_sampling_distance(camera_x10, 100)
expected_gsd_at_100m = 0.0202

print(f"GSD at 100m: {gsd_at_100m}")

assert np.allclose(gsd_at_100m, expected_gsd_at_100m, atol=1e-4)

GSD at 100m: 0.020248817469059807


## Bonus: Reprojection from 2D to 3D

If we have the 2D pixel location of a point along with the camera model, can we go back to 3D?
Do we need any additional information?


$\color{red}{\text{TODO: }}$ Implement function `reproject_image_point_to_world` in `src/camera_utils.py` and demonstrate it by running it in the notebook. Confirm that your reprojection + projection function are consistent.
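Inverting the projection equations gives $X = (u - c_x) Z / f_x$ and $Y = (v - c_y) Z / f_y$, so the extra information needed is the depth $Z$. The sketch below illustrates a round trip under that assumption; both helper names are hypothetical and operate on plain numbers.

```python
def project_sketch(fx, fy, cx, cy, point_3d):
    """Project (X, Y, Z) to pixel coordinates (u, v)."""
    X, Y, Z = point_3d
    return (fx * X / Z + cx, fy * Y / Z + cy)


def reproject_sketch(fx, fy, cx, cy, uv, depth_m):
    """Recover (X, Y, Z) from pixel coordinates, given the depth Z of the point."""
    u, v = uv
    return ((u - cx) * depth_m / fx, (v - cy) * depth_m / fy, depth_m)


intrinsics = (4938.56, 4936.49, 4095.5, 3071.5)
uv_roundtrip = project_sketch(*intrinsics, (25.0, -30.0, 50.0))
recovered_sketch = reproject_sketch(*intrinsics, uv_roundtrip, 50.0)
print(recovered_sketch)
```

If a different depth is passed to the reprojection, a different 3D point on the same viewing ray is returned, which is the depth ambiguity mentioned earlier.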

In [8]:
# added by CH
# Variables added by CH have suffix *_CH
# Modifying and re-using test code (from above) for project_world_point_to_image().

from src.camera_utils import reproject_image_point_to_world

distance_to_surface_CH = 50 # distance to world point (in m)
point_3d_CH = np.array([25, -30, distance_to_surface_CH], dtype=np.float32)
expected_uv_CH = np.array([6564.80, 109.60], dtype=np.float32)
uv_CH = project_world_point_to_image(camera_x10, point_3d_CH)

print(f"{point_3d_CH} projected to {uv_CH}")
assert np.allclose(uv_CH, expected_uv_CH, atol=1e-2)

# recovering 3d point from image using the same distance
recovered_point_3d_CH = reproject_image_point_to_world(camera_x10, distance_to_surface_CH, np.array([uv_CH[0], uv_CH[1]]))

print(f"{uv_CH} reprojected back to {recovered_point_3d_CH}")
print(f"original world point: {point_3d_CH}")
print(f"recovered world point: {recovered_point_3d_CH}")
assert np.allclose(recovered_point_3d_CH, point_3d_CH, atol=1e-2)

[ 25. -30.  50.] projected to [6564.7803   109.60596]
[6564.7803   109.60596] reprojected back to [ 25.00000381 -29.99999809  50.        ]
original world point: [ 25. -30.  50.]
recovered world point: [ 25.00000381 -29.99999809  50.        ]


# Week 3: Model the user requirements

For this week, we will model the dataset specifications.

- `overlap`: the ratio (between 0 and 1) of the scene shared between two consecutive images.
- `sidelap`: the ratio (between 0 and 1) of the scene shared between two images in adjacent rows.
- `height`: the height of the scan above the ground (in meters).
- `scan_dimension_x`: the horizontal size of the rectangle to be scanned (in meters).
- `scan_dimension_y`: the vertical size of the rectangle to be scanned (in meters).
- `exposure_time_ms`: the exposure time for each image (in milliseconds).


$\color{red}{\text{TODO: }}$ Implement `DatasetSpec` in `src/data_model.py`
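A possible shape for this data model is sketched below. The class name `DatasetSpecSketch` is hypothetical, and the validation in `__post_init__` is an assumption added for illustration (in particular, it rejects an overlap of exactly 1, since that would mean the camera never moves).

```python
from dataclasses import dataclass


@dataclass
class DatasetSpecSketch:
    """Hypothetical dataset specification (illustration only)."""

    overlap: float  # ratio shared between consecutive images
    sidelap: float  # ratio shared between images in adjacent rows
    height: float  # scan height above the ground
    scan_dimension_x: float  # horizontal size of the scanned rectangle
    scan_dimension_y: float  # vertical size of the scanned rectangle
    exposure_time_ms: float  # exposure time per image (milliseconds)

    def __post_init__(self):
        # Assumed sanity checks; adapt or drop them in your own implementation.
        for name in ("overlap", "sidelap"):
            value = getattr(self, name)
            if not 0.0 <= value < 1.0:
                raise ValueError(f"{name} must be in [0, 1), got {value}")
        if self.height <= 0 or self.exposure_time_ms <= 0:
            raise ValueError("height and exposure_time_ms must be positive")


spec_sketch = DatasetSpecSketch(0.7, 0.7, 30.48, 150, 150, 2)
print(spec_sketch)
```

Validating at construction time means a typo such as an overlap of 70 instead of 0.7 fails immediately rather than producing a strange flight plan later.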


In [9]:
# Model the nominal dataset spec

overlap = 0.7
sidelap = 0.7
height = 30.48 # 100 ft
scan_dimension_x = 150
scan_dimension_y = 150
exposure_time_ms = 2 # 1/500 exposure time

dataset_spec = DatasetSpec(overlap, sidelap, height, scan_dimension_x, scan_dimension_y, exposure_time_ms)

print(f"Nominal specs: {dataset_spec}")

Nominal specs: DatasetSpec(overlap=0.7, sidelap=0.7, height=30.48, scan_dimension_x=150, scan_dimension_y=150, exposure_time_ms=2)


# Week 4: Compute Distance Between Photos

The overlap and sidelap are the ratio of the dimensions shared between two photos. We already know the footprint of a single image at a given distance. Can we convert the ratio into actual distances? And how does the distance on the surface relate to distance travelled by the camera?

$\color{red}{\text{TODO: }}$ Implement `compute_distance_between_images` in `src/plan_computation.py`
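One way to answer the questions above: if a fraction `overlap` of the footprint is shared between two consecutive images, the remaining fraction `1 - overlap` is new ground, so the spacing on the surface is `footprint * (1 - overlap)`. For a nadir camera translating parallel to the ground, the camera travels the same distance. The sketch below assumes this reasoning; `distance_between_images_sketch` is a hypothetical helper on plain numbers.

```python
def distance_between_images_sketch(fx, fy, image_size_x_px, image_size_y_px,
                                   height_m, overlap, sidelap):
    """Spacing (horizontal, vertical) in meters between neighbouring nadir images."""
    footprint_x = image_size_x_px * height_m / fx
    footprint_y = image_size_y_px * height_m / fy
    # Only the non-shared fraction of the footprint is new ground.
    return footprint_x * (1.0 - overlap), footprint_y * (1.0 - sidelap)


distances_sketch = distance_between_images_sketch(
    4938.56, 4936.49, 8192, 6144, height_m=30.48, overlap=0.7, sidelap=0.7
)
print(distances_sketch)
```

With the nominal spec, the result should agree with the `expected_distances` value used in the next cell.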



In [10]:
computed_distances = compute_distance_between_images(camera_x10, dataset_spec)
expected_distances = np.array([15.17, 11.38], dtype=np.float32)

print(f"Computed distance for X10 camera with nominal dataset specs: {computed_distances}")

assert np.allclose(computed_distances, expected_distances, atol=1e-2)

Computed distance for X10 camera with nominal dataset specs: [15.16791291 11.38070491]


$\color{red}{\text{TODO: }}$ Define more specifications/camera parameters and check the computed distances. Do they align with your expectations?


In [11]:
# Additional checks 1: Double the height, expect distance to double.
camera_ = copy.copy(camera_x10)
dataset_spec_ = copy.copy(dataset_spec)
expected_distances_ = copy.copy(expected_distances)

# Double the height. We expect the computed distance to double.
print(f"We double the height. We expected the computed distance to double.")
dataset_spec_.height = 2 * dataset_spec.height

computed_distances_ = compute_distance_between_images(camera_, dataset_spec_)
expected_distances_ = 2 * expected_distances_

print(f"Computed distance at height {dataset_spec.height}: {computed_distances}")
print(f"Expected distance at height {dataset_spec.height}: {expected_distances}")
print(f"Computed distance at height {dataset_spec_.height}: {computed_distances_}")
print(f"Expected distance at height {dataset_spec_.height}: {expected_distances_}")
assert np.allclose(computed_distances_, expected_distances_, atol=1e-2)

We double the height. We expected the computed distance to double.
Computed distance at height 30.48: [15.16791291 11.38070491]
Expected distance at height 30.48: [15.17 11.38]
Computed distance at height 60.96: [30.33582583 22.76140983]
Expected distance at height 60.96: [30.34 22.76]


In [12]:
# Additional checks 2: Reduce the overlap, expect the distance to increase.
camera_ = copy.copy(camera_x10)
dataset_spec_ = copy.copy(dataset_spec)
expected_distances_ = copy.copy(expected_distances)

print(f"Reduce overlap from 0.7 to 0.4, which is the same as changing the non-overlap ratio from 0.3 to 0.6.")
print(f"We expect the distance traveled in the horizontal direction to double.")
dataset_spec_.overlap = 0.4

computed_distances_ = compute_distance_between_images(camera_, dataset_spec_)

# We expect the horizontal distance to double, but vertical distance to be unchanged.
expected_distances_[0] = 2 * expected_distances_[0]

print(f"Computed distance with overlap {dataset_spec.overlap}: {computed_distances}")
print(f"Expected distance with overlap {dataset_spec.overlap}: {expected_distances}")
print(f"Computed distance at overlap {dataset_spec_.overlap}: {computed_distances_}")
print(f"Expected distance at overlap {dataset_spec_.overlap}: {expected_distances_}")
assert np.allclose(computed_distances_, expected_distances_, atol=1e-2)

Reduce overlap from 0.7 to 0.4, which is the same as changing the non-overlap ratio from 0.3 to 0.6.
We expect the distance traveled in the horizontal direction to double.
Computed distance with overlap 0.7: [15.16791291 11.38070491]
Expected distance with overlap 0.7: [15.17 11.38]
Computed distance at overlap 0.4: [30.33582583 11.38070491]
Expected distance at overlap 0.4: [30.34 11.38]


In [13]:
# Additional checks 3: Reduce the sidelap, expect the distance to increase.
camera_ = copy.copy(camera_x10)
dataset_spec_ = copy.copy(dataset_spec)
expected_distances_ = copy.copy(expected_distances)

print(f"Reduce sidelap from 0.7 to 0.4, which is the same as changing the non-sidelap ratio from 0.3 to 0.6.")
print(f"We expect the distance traveled in the vertical direction to double.")
dataset_spec_.sidelap = 0.4

computed_distances_ = compute_distance_between_images(camera_, dataset_spec_)

# We expect the vertical distance to double, but horizontal distance to be unchanged.
expected_distances_[1] = 2 * expected_distances_[1]

print(f"Computed distance with sidelap {dataset_spec.sidelap}: {computed_distances}")
print(f"Expected distance with sidelap {dataset_spec.sidelap}: {expected_distances}")
print(f"Computed distance with sidelap {dataset_spec_.sidelap}: {computed_distances_}")
print(f"Expected distance with sidelap {dataset_spec_.sidelap}: {expected_distances_}")
assert np.allclose(computed_distances_, expected_distances_, atol=1e-2)

Reduce sidelap from 0.7 to 0.4, which is the same as changing the non-sidelap ratio from 0.3 to 0.6.
We expect the distance traveled in the vertical direction to double.
Computed distance with sidelap 0.7: [15.16791291 11.38070491]
Expected distance with sidelap 0.7: [15.17 11.38]
Computed distance with sidelap 0.4: [15.16791291 22.76140983]
Expected distance with sidelap 0.4: [15.17 22.76]


In [14]:
# Additional checks 4: Halve the focal lengths fx and fy. Expect distance traveled to double.
# Rearranging the Perspective Projection Equation x = fx*X/Z to X = x*Z/fx,
# so we expect that if we cut fx to half of its original value, 
# then X, and therefore the horizontal distance traveled, would double.
# The effect of halving the focal length fy is the same, except it is for the vertical direction.

camera_ = copy.copy(camera_x10)
dataset_spec_ = copy.copy(dataset_spec)
expected_distances_ = copy.copy(expected_distances)

print(f"Divide the focal length fx and fy by 2.")
print(f"We expect the distance traveled in the horizontal direction and vertical direction to double.")
camera_.fx = camera_.fx / 2
camera_.fy = camera_.fy / 2

computed_distances_ = compute_distance_between_images(camera_, dataset_spec_)

# We expect both the horizontal and vertical distances to double.
expected_distances_ = 2 * expected_distances_

print(f"Computed distance with focal lengths {camera_x10.fx} {camera_x10.fy}: {computed_distances}")
print(f"Expected distance with focal lengths {camera_x10.fx} {camera_x10.fy}: {expected_distances}")
print(f"Computed distance with focal lengths {camera_.fx} {camera_.fy}: {computed_distances_}")
print(f"Expected distance with focal lengths {camera_.fx} {camera_.fy}: {expected_distances_}")
assert np.allclose(computed_distances_, expected_distances_, atol=1e-2)

Divide the focal length fx and fy by 2.
We expect the distance traveled in the horizontal direction and vertical direction to double.
Computed distance with focal lengths 4938.56 4936.49: [15.16791291 11.38070491]
Expected distance with focal lengths 4938.56 4936.49: [15.17 11.38]
Computed distance with focal lengths 2469.28 2468.245: [30.33582583 22.76140983]
Expected distance with focal lengths 2469.28 2468.245: [30.34 22.76]


In [15]:
# unused template
camera_ = copy.copy(camera_x10)
dataset_spec_ = copy.copy(dataset_spec)

computed_distances_ = compute_distance_between_images(camera_, dataset_spec_)
print(f"Computed distance: {computed_distances_}")

Computed distance: [15.16791291 11.38070491]


## Bonus: Non-Nadir photos

We have solved for the distance assuming that the camera is facing straight down to the ground. This is called [Nadir scanning](https://support.esri.com/en-us/gis-dictionary/nadir). However, in practice we might want a custom gimbal angle.

Your bonus task is to make the distance computation general. Introduce a double `camera_angle` parameter (which is the angle from the X-axis) in the dataset specification, and work out how to adapt your computation. Feel free to reach out to Ayush to discuss ideas and assumptions!

![Non Nadir Footprint](assets/non_nadir_gimbal_angle.png)

## Bonus work by CH
The new function written is `compute_distance_between_images_with_angle(camera: Camera, dataset_spec: DatasetSpec, angle_deg_x: float, angle_deg_y: float)`.
                                               
In the function `compute_distance_between_images_with_angle()`, we added two additional parameters for the camera angles from the x and y axes.
The test code below only varies the angle in x, so we set the y angle to 0.

The following is the calculation for the footprint in the x direction. 

Assumptions:
* For the angle in each direction, FOV/2 + abs(camera_angle) < 90 degrees, otherwise an edge of the FOV will be at or above the horizon. (FOV = Field of View)
* The only adjustment for the distance calculation is the footprint. We do not need to make adjustment for the overlap or sidelap.

If `camera_angle` >= 0, we have two cases:
* If camera_angle >= FOV/2, then `footprint = (tan(camera_angle + FOV/2) - tan(camera_angle - FOV/2)) * height`
* If 0 <= camera_angle < FOV/2, then `footprint = (tan(camera_angle + FOV/2) + tan(FOV/2 - camera_angle)) * height`
![camera angle 1](assets/CH_assets/CH_camera_angle_1.png) ![camera angle 2](assets/CH_assets/CH_camera_angle_2.png)

If `camera_angle` < 0, we still get the same equations, since tan() is an odd function.

We then used https://www.wolframalpha.com/ to simplify these expressions and found that
both of them reduce to the same form, regardless of whether
`camera_angle` is greater than FOV/2:

`footprint = 2 * sin(FOV) / (cos(FOV) + cos(2*camera_angle)) * height`

The above formula was used to calculate the modified footprint for x and for y.
Lastly, we multiply the footprint by (1-overlap) or by (1-sidelap) to get the distance required.
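The simplification can also be checked numerically. The sketch below compares the two-tangent expression with the closed form, both scaled by the height; the function names are hypothetical and the angles are taken in degrees as an assumption for this illustration.

```python
import math


def footprint_two_tangents(camera_angle_deg, fov_deg, height_m):
    """Footprint as the difference of the two edge rays of the field of view."""
    angle = math.radians(camera_angle_deg)
    half_fov = math.radians(fov_deg) / 2.0
    return (math.tan(angle + half_fov) - math.tan(angle - half_fov)) * height_m


def footprint_closed_form(camera_angle_deg, fov_deg, height_m):
    """Same footprint using the simplified trigonometric form."""
    angle = math.radians(camera_angle_deg)
    fov = math.radians(fov_deg)
    return 2.0 * math.sin(fov) / (math.cos(fov) + math.cos(2.0 * angle)) * height_m


for angle_deg in range(-40, 41, 20):
    a = footprint_two_tangents(angle_deg, 79.344, 30.48)
    b = footprint_closed_form(angle_deg, 79.344, 30.48)
    print(f"{angle_deg:>4} deg: {a:.6f} vs {b:.6f}")
```

Both expressions blow up as `FOV/2 + abs(camera_angle)` approaches 90 degrees, which matches the first assumption listed above.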



In [16]:
from src.plan_computation import compute_distance_between_images_with_angle, fov
[fov_deg_x, fov_deg_y] = fov(camera_x10)
print(f"Field of View FOV(degrees) {fov_deg_x}, {fov_deg_y}\n")

print(f"Camera_angle_x(deg)\tSum of fov_deg_x/2 + abs(camera_angle_x)\t Computed distance for X10 camera")
for deg in range(-50,51, 2):
    computed_distances = compute_distance_between_images_with_angle(camera_x10, dataset_spec, deg, 0)
    expected_distances = np.array([15.17, 11.38], dtype=np.float32) # for camera angle = 0 degrees
    print(f"{deg:>5}\t\t\t{fov_deg_x/2 + abs(deg)}\t\t\t\t[{computed_distances[0]}\t{computed_distances[1]}]")
    

Field of View FOV(degrees) 79.34405165142869, 63.78838119894017

Camera_angle_x(deg)	Sum of fov_deg_x/2 + abs(camera_angle_x)	 Computed distance for X10 camera
  -50			89.67202582571434				[1595.736505901222	11.380704913815286]
  -48			87.67202582571434				[223.58848617448598	11.380704913815286]
  -46			85.67202582571434				[119.80826703492599	11.380704913815286]
  -44			83.67202582571434				[81.7641634341542	11.380704913815286]
  -42			81.67202582571434				[62.09458376234165	11.380704913815286]
  -40			79.67202582571434				[50.12456827416916	11.380704913815286]
  -38			77.67202582571434				[42.10693418089635	11.380704913815286]
  -36			75.67202582571434				[36.38713601325202	11.380704913815286]
  -34			73.67202582571434				[32.12164492175725	11.380704913815286]
  -32			71.67202582571434				[28.83545730656419	11.380704913815286]
  -30			69.67202582571434				[26.240819533127922	11.380704913815286]
  -28			67.67202582571434				[24.153382390884456	11.380704913815286]
  -26			65.6720258257

# Week 5: Compute Maximum Speed For Blur Free Photos

To restrict motion blur due to camera movement to tolerable limits, we need to restrict the speed such that the image contents move by less than 1px during the exposure.

How much does 1px of movement translate to movement of the scene on the ground? It is the ground sampling distance!
From Week 2, we know how to compute it, and it is the maximum distance the camera may move during a single exposure.
We have the distance now. To get the speed, we need to divide it by time. Do we have time already in our data models?

$\color{red}{\text{TODO: }}$ Implement `compute_speed_during_photo_capture` in `src/plan_computation.py`.
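Following the reasoning above, the speed limit is the allowed ground movement (GSD times the allowed pixel movement) divided by the exposure time in seconds. The sketch below assumes that relationship; `speed_sketch` is a hypothetical helper on plain numbers.

```python
def speed_sketch(fx, fy, height_m, exposure_time_ms, allowed_movement_px=1.0):
    """Maximum speed (m/s) so that the scene moves at most `allowed_movement_px` pixels."""
    # GSD per axis reduces to height / focal length; take the minimum of the two axes.
    gsd_m = min(height_m / fx, height_m / fy)
    exposure_time_s = exposure_time_ms / 1000.0  # milliseconds to seconds
    return gsd_m * allowed_movement_px / exposure_time_s


max_speed_sketch = speed_sketch(4938.56, 4936.49, height_m=30.48, exposure_time_ms=2)
print(f"{max_speed_sketch:.2f} m/s")
```

Halving the exposure time doubles the permitted speed, which is one reason short exposures are attractive for mapping flights.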

In [None]:
computed_speed = compute_speed_during_photo_capture(camera_x10, dataset_spec, allowed_movement_px=1)
expected_speed = 3.09

print(f"Computed speed during photo captures: {computed_speed:.2f}")

assert np.allclose(computed_speed, expected_speed, atol=1e-2)

$\color{red}{\text{TODO: }}$ Define more specifications/camera parameters and check the computed speeds. Do they align with your expectations?


In [None]:
camera_ = copy.copy(camera_x10)
dataset_spec_ = copy.copy(dataset_spec)

computed_speed_ = compute_speed_during_photo_capture(camera_, dataset_spec_)
print(f"Computed speed: {computed_speed_:.2f}")

# Week 6: Generate Full Flight Plans  

We now have all the tools to generate the full flight plan.

Steps for this week:
1. Define the `Waypoint` data model. What attributes should the data model have?
   1. For Nadir scans, just the position of the camera is enough, as we will always look down at the ground.
   2. For the general case (bonus), we also need to define where the drone will look.
2. Implement the function `generate_photo_plan_on_grid` to generate the full plan.
   1. Compute the maximum distance between two images, horizontally and vertically.
   2. Layer the images such that we cover the whole scan area. Take care when the scan dimension is not a multiple of the distance between images. Example: to cover a 45m length with 10m between images, we would need 4.5 images, which is not possible. 4 images would not satisfy the overlap, so we should go with 5. How should we arrange 5 images in the given 45m?
   3. Assign the speed to each waypoint.

$\color{red}{\text{TODO: }}$ Implement:
- `Waypoint` in `src/data_model.py`
- `generate_photo_plan_on_grid` in `src/plan_computation.py`.
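For the 45m example in the steps above, one possible arrangement (an assumption, not the only valid answer) is to round the image count up and then shrink the spacing so that the images are spread evenly, each centered in its own slice of the scan length. `positions_along_axis_sketch` is a hypothetical helper for a single axis.

```python
import math


def positions_along_axis_sketch(scan_length_m, max_spacing_m):
    """Evenly spaced image centers along one axis, never farther apart than `max_spacing_m`."""
    # Round up so that the required overlap is still satisfied.
    num_images = max(1, math.ceil(scan_length_m / max_spacing_m))
    # The actual spacing is at most the maximum allowed spacing.
    spacing = scan_length_m / num_images
    # Center each image in its slice of the scan length.
    return [(i + 0.5) * spacing for i in range(num_images)]


positions_sketch = positions_along_axis_sketch(45.0, 10.0)
print(positions_sketch)
```

Here the 5 images end up 9m apart, slightly closer than the 10m maximum, so the overlap is a little higher than requested rather than lower. A full grid plan would combine the positions of both axes, typically alternating the direction of travel on each row.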

In [None]:
computed_plan = generate_photo_plan_on_grid(camera_x10, dataset_spec) 

print(f"Computed plan with {len(computed_plan)} waypoints")

In [None]:
MAX_NUM_WAYPOINTS_TO_PRINT = 20

for idx, waypoint in enumerate(computed_plan[:MAX_NUM_WAYPOINTS_TO_PRINT]):
    print(f"Idx {idx}: {waypoint}")
if len(computed_plan) > MAX_NUM_WAYPOINTS_TO_PRINT:
    print("...")

## Bonus: Time computation 

If you have some time, you can implement a time computation function. We can make the drone fly as fast as possible between photos, but it must decelerate back to the required speed at each photo. Please use the following data:
- Max drone speed: 16 m/s.
- Max acceleration: 3.5 m/s^2.

Hint: you might need to use a trapezoidal speed profile.
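A minimal sketch of such a profile for one leg between two photos is shown below. It assumes the drone starts and ends the leg at the photo speed and decelerates with the same magnitude as it accelerates; `travel_time_sketch` is a hypothetical helper. When the leg is too short to reach the maximum speed, the trapezoid degenerates into a triangle.

```python
import math


def travel_time_sketch(distance_m, photo_speed, max_speed=16.0, max_accel=3.5):
    """Time (s) to cover `distance_m`, starting and ending at `photo_speed`."""
    v0 = min(photo_speed, max_speed)
    # Distance needed to accelerate from v0 to max_speed (same for decelerating).
    accel_distance = (max_speed**2 - v0**2) / (2.0 * max_accel)
    if 2.0 * accel_distance <= distance_m:
        # Trapezoid: accelerate, cruise at max_speed, decelerate.
        accel_time = (max_speed - v0) / max_accel
        cruise_time = (distance_m - 2.0 * accel_distance) / max_speed
        return 2.0 * accel_time + cruise_time
    # Triangle: accelerate over half the distance, then decelerate.
    peak_speed = math.sqrt(v0**2 + max_accel * distance_m)
    return 2.0 * (peak_speed - v0) / max_accel


leg_time_sketch = travel_time_sketch(15.17, 3.09)
print(f"{leg_time_sketch:.2f} s")
```

Summing this time over all legs of the plan gives an estimate of the total flight time under these assumptions.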

# Week 7: Visualize Flight Plans

This week, we will use a third-party plotting framework called [Plotly](https://plotly.com/python/) to visualize our plans. Please follow this [tutorial](https://www.kaggle.com/code/kanncaa1/plotly-tutorial-for-beginners) to gain some basic experience with Plotly, and then come up with your own visualization function. You are free to design your own visualization and to use something other than Plotly.

$\color{red}{\text{TODO: }}$ Implement `plot_photo_plan` in `src/visualization.py`

In [None]:
fig = plot_photo_plan(computed_plan)
fig.show()

$\color{red}{\text{TODO: }}$ Compute the following ablations (and any other you can think of). 
You need to describe the input params you are changing, what impact you can observe, explanation behind the change in output, and practical implication of the correlation.

1. Change overlap and confirm it affects the consecutive images
2. Change sidelap and confirm it does not affect the consecutive images
3. Change the height of the scan and document the effect on scan plans
4. Change exposure time

In [None]:
camera_ = copy.deepcopy(camera_x10)
dataset_spec_ = copy.deepcopy(dataset_spec)

dataset_spec_.exposure_time_ms = 1000

print(camera_, dataset_spec_)

fig = plot_photo_plan(generate_photo_plan_on_grid(camera_, dataset_spec_))
fig.show()