# Data Extraction

Using OpenCV, we extracted the pixel coordinates of the pendulum from the video and exported them to a CSV file. The code is in `appendix.ipynb`.

# Curve Fitting


## Load data

In [None]:
import pandas as pd

df = pd.read_csv("data/pendulum_output2.csv")

# The video frame number and the corresponding x and y
# positions.
frame = df["frame"]
x = df["x"]
y = df["y"]

# Swap the x and y values because the video is rotated.
x, y = y, x

# Reverse the directions of x and y to follow the convention
# that y increases upwards and x increases rightwards.
x = max(x) - x
y = max(y) - y

## Plot Data

In [None]:
import matplotlib.pyplot as plt

# Plot a scatter plot with x against frame number with marker size 1
plt.scatter(frame, x, s=1)
plt.xlabel("Video frame number")
plt.ylabel("x position (pixels)")
plt.title("x position against frame number")
plt.show()

# Plot a scatter plot with y against frame number with marker size 1
plt.scatter(frame, y, s=1)
plt.xlabel("Video frame number")
plt.ylabel("y position (pixels)")
plt.title("y position against frame number")
plt.show()

As expected, both the x and y positions oscillate over time.
The x position can be approximated by a sine function
because its equilibrium (zero) point stays fixed across oscillations.

However, the equilibrium point of the y position drifts between oscillations.
This can be attributed to camera shake while the pendulum
was being filmed. Thus, the y position cannot be approximated well
by a sine function. We discuss the y position in `appendix.ipynb`.

# Try fitting

\begin{align}
\large x(t) = A \sin(2\pi f t + \alpha) + \text{offset}
\end{align}

We use the `curve_fit` function from SciPy to find the values of $A$, $f$, $\alpha$, and $\text{offset}$ that best fit the sine curve to our data.

In [None]:
FPS = 30  # Video frames per second
# Convert each frame index to elapsed time in seconds by dividing
# it by FPS
t = frame / FPS

import numpy as np


def func(t, A, f, alpha, offset):
    """Returns x (horizontal position) as a function of time. Refer to pendulum equation above."""
    return A * np.sin(2 * np.pi * f * t + alpha) + offset


from scipy.optimize import curve_fit

# Obtain optimal values for amplitude, frequency, oscillation phase
# and position offset using curve_fit.

# p0 is the initial guess for the parameters.
# A ~= 450 - 250 = 200.
# f ~= 9/15, because there are around 9 full cycles within the first 15s.
# alpha looks to be a little smaller than pi/2, within the first quadrant.
# offset ~= 250, obtained from the vertical offset of the graph from zero.
popt, pcov = curve_fit(func, t, x, p0=[200, 9 / 15, np.pi / 2, 250])

# popt contains the optimal values of the unknown parameters.
print("A = {}, f = {}, alpha = {}, offset = {}".format(*popt))
# pcov is the estimated covariance matrix of the parameters.
# Its diagonal holds the parameter variances, and its off-diagonal
# entries tell us whether the parameters are correlated.

# Plotting a scatter plot with x against time with marker size 1
plt.scatter(t, x, s=1)

# Plot the best-fit curve using the optimal parameter values
plt.plot(t, func(t, *popt), color="r", linestyle="--", label="Best Fit")

plt.xlabel("time (s)")
plt.ylabel("x position (pixels)")
plt.title("x position against time")
plt.legend()  # Without this call the "Best Fit" label is never displayed
plt.show()
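
The covariance matrix returned by `curve_fit` can also be turned into an uncertainty for each fitted parameter. The following is a minimal sketch on synthetic data, not on our measurements: the names `t_demo`, `x_demo`, `perr_demo` and `corr_demo` and the noise level are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(t, A, f, alpha, offset):
    """Sine model, same form as the pendulum equation above."""
    return A * np.sin(2 * np.pi * f * t + alpha) + offset


# Synthetic stand-in for the pendulum data (assumed values, not the real measurements).
rng = np.random.default_rng(0)
t_demo = np.arange(0, 15, 1 / 30)  # 15 s sampled at 30 frames per second
x_demo = func(t_demo, 200, 0.6, 1.2, 250) + rng.normal(0, 5, t_demo.size)

popt_demo, pcov_demo = curve_fit(
    func, t_demo, x_demo, p0=[200, 9 / 15, np.pi / 2, 250]
)

# Square roots of the diagonal give one-standard-deviation uncertainties.
perr_demo = np.sqrt(np.diag(pcov_demo))

# Normalising the covariance gives correlation coefficients between -1 and 1.
corr_demo = pcov_demo / np.outer(perr_demo, perr_demo)

for name, value, err in zip(["A", "f", "alpha", "offset"], popt_demo, perr_demo):
    print("{} = {:.4f} +/- {:.4f}".format(name, value, err))
```

Off-diagonal entries of `corr_demo` close to 1 or -1 would indicate that two parameters are hard to determine independently of each other.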

In this case, we had to supply approximate parameter values through `p0`. Without it,
`curve_fit` starts every parameter at 1 and can settle in a suboptimal local minimum
(the exact behaviour differs between versions of SciPy).

However, a much more scalable method is to calculate the period, and
thus the frequency, by finding peaks instead. This removes one unknown parameter
from those estimated by `curve_fit`, which leaves the fit better constrained. We then no longer
need to set `p0` by hand for each new dataset. (Imagine having thousands of datasets to analyse
and having to set a `p0` for each one of them.) The code for this second method can be
found in `appendix.ipynb`.
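
As a rough illustration of the peak-based idea (this is not the code from `appendix.ipynb`), the sketch below estimates the frequency with `scipy.signal.find_peaks` and then fits only the remaining three parameters. The synthetic data, the prominence threshold, and the names `f_est`, `func_fixed_f` and `p0_auto` are assumptions chosen for this example.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.signal import find_peaks

FPS = 30  # Assumed frame rate, matching the value used above

# Synthetic stand-in for the measured x positions (assumed values).
rng = np.random.default_rng(0)
t_demo = np.arange(0, 15, 1 / FPS)
x_demo = 200 * np.sin(2 * np.pi * 0.6 * t_demo + 1.2) + 250
x_demo = x_demo + rng.normal(0, 2, t_demo.size)

# Keep only maxima that stand out by at least half the peak-to-peak range,
# so that small noise bumps are ignored.
x_range = x_demo.max() - x_demo.min()
peaks, _ = find_peaks(x_demo, prominence=0.5 * x_range)

# The mean spacing between successive peaks estimates the period.
period_est = np.mean(np.diff(t_demo[peaks]))
f_est = 1 / period_est


def func_fixed_f(t, A, alpha, offset):
    """Sine model with the frequency fixed at the peak-based estimate."""
    return A * np.sin(2 * np.pi * f_est * t + alpha) + offset


# Starting values computed from the data, so none are set by hand.
p0_auto = [x_range / 2, 0, np.mean(x_demo)]
popt_fixed, _ = curve_fit(func_fixed_f, t_demo, x_demo, p0=p0_auto)

print("f_est = {:.3f} Hz, A = {:.1f}".format(f_est, popt_fixed[0]))
```

A peak that sits very close to the start or end of the recording may fail the prominence test and be skipped. This does not affect the period estimate as long as at least two peaks remain.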

In [None]:
from sklearn.metrics import r2_score

# Coefficient of determination, R squared
r2_score(x, func(t, *popt))

Our fitted curve explains 99.8% of the variance in the x position.
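
For reference, the coefficient of determination is $R^2 = 1 - SS_{res} / SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations from the mean. The sketch below computes it by hand on a small made-up dataset; the helper `r_squared` and the numbers are illustrative only.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)  # Unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # Total variation
    return 1 - ss_res / ss_tot


# Made-up observations and predictions, for illustration only.
observed = np.array([1.0, 2.0, 3.0, 4.0])
predicted = np.array([1.1, 1.9, 3.2, 3.8])

# SS_res = 0.10 and SS_tot = 5.0, so R^2 = 1 - 0.10 / 5.0 = 0.98
print(round(r_squared(observed, predicted), 4))
```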

# Air Drag

However, our fitted curve underpredicts the amplitude in the first half
of the position-time graph and overpredicts it in the second half. This suggests
the presence of resistive forces, in this case air drag. Thus, we modify
the given equation to account for amplitude decay.

An underdamped pendulum decays in amplitude exponentially<sup>1</sup>:
\begin{align} 
\large \theta_{max}(t) = \theta_{0} e^{-\frac{\eta}{2m} t}  
\end{align}

$\eta$ : air drag constant ($\mathrm{kg \cdot s^{-1}}$)  
$m$ : mass of the pendulum bob ($\mathrm{kg}$)

The equation assumes linear drag.

We can combine the unknowns $\eta$ and $m$ into a single parameter $\beta = -\frac{\eta}{2m}$,
which is negative because $\eta$ and $m$ are both positive.
So our improved model takes the following form:

\begin{align} 
\large x(t) = x_{0} e^{\beta t} \sin(2\pi f t + \alpha) + \text{offset}
\end{align}

<sup>1</sup>: [Derivation of amplitude decay for an underdamped pendulum](https://www.ippp.dur.ac.uk/~krauss/Lectures/NumericalMethods/Oscillator/Lecture/os4.html#:~:text=The%20underdamped%20regime%20In%20this,%2C%20Ω²%3Dg%2Fl.)

In [None]:
def func_damped(t, A, f, alpha, offset, beta):
    """Returns x (horizontal position) as a function of time. Refer to damped pendulum equation above."""
    return A * np.exp(beta * t) * np.sin(2 * np.pi * f * t + alpha) + offset


# Obtain optimal values for amplitude, frequency, oscillation phase, position offset and decay constant using curve_fit
popt_damped, pcov_damped = curve_fit(
    func_damped, t, x, p0=[200, 9 / 15, np.pi / 2, 250, 0]
)

print("A = {}, f = {}, alpha = {}, offset = {}, beta = {}".format(*popt_damped))

# Plotting a scatter plot with x against time with marker size 1
plt.scatter(t, x, s=1)

# Plot the best-fit curve using the optimal parameter values
plt.plot(t, func_damped(t, *popt_damped), color="r", linestyle="--", label="Best Fit")

plt.xlabel("time (s)")
plt.ylabel("x position (pixels)")
plt.title("x position against time")
plt.legend()  # Without this call the "Best Fit" label is never displayed
plt.show()

In [None]:
# Coefficient of determination, R squared
r2_score(x, func_damped(t, *popt_damped))

Our fitted curve now explains 99.9% of the variance in the x position.

We have obtained a better fit by considering amplitude decay.
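
The decay constant $\beta$ also has a direct physical reading: the envelope $e^{\beta t}$ falls to $1/e$ of its starting value after $-1/\beta$ seconds and to one half after $\ln 2 / (-\beta)$ seconds. The sketch below works this out for a hypothetical value `beta_demo`, which is an assumption for illustration and not our fitted result.

```python
import numpy as np

beta_demo = -0.02  # Hypothetical decay constant in s^-1, not the fitted value

# Time for the amplitude envelope exp(beta * t) to fall to 1/e of its start.
time_constant = -1 / beta_demo

# Time for the amplitude envelope to fall to one half of its start.
half_life = np.log(2) / -beta_demo

# From beta = -eta / (2 m), the drag constant per unit mass is eta / m = -2 * beta.
eta_over_m = -2 * beta_demo

print("time constant = {:.1f} s".format(time_constant))
print("half-life = {:.1f} s".format(half_life))
print("eta / m = {:.3f} s^-1".format(eta_over_m))
```

Substituting the fitted `beta` from `popt_damped` for `beta_demo` would give the corresponding values for our pendulum.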