# Analysis of Physical Oceanographic Data - SIO 221A
### Python version of [Sarah Gille's](http://pordlabs.ucsd.edu/sgille/sioc221a/index.html) notes by:
#### Bia Villas Bôas (avillasboas@ucsd.edu) & Gui Castelão (castelao@ucsd.edu)

## Lecture 5

#### Recap

Last time we looked at least-squares fitting.  We derived the formula for
a least-squares fit, showed that it can recover a linear trend and
a sinusoidal variation, and finished up by asking what would happen if we fit
multiple sinusoidal signals at once.  That's where we'll start today.

#### Least-squares fits and misfit

You'll recall that last time we considered a least-squares fit of the form

$$\begin{equation}
{\bf Ax} + {\bf n}= {\bf y}.\hspace{3cm} (1)
\end{equation}$$

The misfit is defined as a squared quantity, so it should follow
the $\chi^2$ statistic.  (Yet another use of
$\chi^2$.)  If I believe the *a priori* uncertainties in my data are
${\bf \sigma}$, then I expect that my misfit should roughly match my uncertainty,
so I can define a weighted summed misfit:

$$\begin{equation}
\chi^2 = \sum_{i=1}^N \left(\frac{y_i - \sum_{j=1}^{M} a_{i,j}x_j}{\sigma_i}\right)^2.\hspace{3cm} (2)
\end{equation}$$

Here we're summing the squared misfit of each row in our matrix equation, weighted by
our uncertainty.  If our error bars make sense, then this should yield about $N$,
reduced by the number of functions we're fitting.  So we expect that $\chi^2$
will be about $N-M$, which is the number of degrees of freedom.  Formally we can
decide if our fit is too good to be true by evaluating $\chi^2$ using the incomplete
gamma function, to find where the observed $\chi^2$ falls within the expected
pdf of a $\chi^2$ distribution:

``p = scipy.special.gammaincc(nu/2, chi_squared/2)``

(Here we divide both $\chi^2$ and $\nu$ by 2 because of the way that `gammaincc` is defined.) Because `gammaincc` is the regularized upper incomplete gamma function, $p$ is the probability of drawing a $\chi^2$ value at least as large as the one we observed. If $p$ is smaller than 0.05 or greater than 0.95, then our observed value is
outside the range expected for 90% of observed $\chi^2$ values.
If $p$ is near 1, it can tell us that our fit is too good to be true.  Likewise,
if $p$ is too
small, it can tell us that our fit isn't properly representing the data: either
the model is wrong, or the a priori error bars are too small.

Alternatively, we can solve for upper and lower threshold $\chi^2$ values:
if $\chi^2>\chi_{upper}^2$, then we reject the hypothesis at the $\alpha$
level (here $\alpha = 0.05$ in each tail).  And if $\chi^2<\chi_{lower}^2$ we also know we're out of range.  So

``chi2_upper = scipy.special.gammainccinv(nu/2, 0.05)*2.``

``chi2_lower = scipy.special.gammaincinv(nu/2, 0.05)*2.``
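
To make these tests concrete, here is a minimal sketch that fits a mean and a linear trend to synthetic data with known noise. The data, the noise level `sigma`, the random seed, and all variable names are assumptions made up for this illustration; `scipy.stats.chi2` appears only as a cross-check on the incomplete gamma expressions.

```python
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable

N = 200                          # number of data points (assumed)
sigma = 0.5                      # a priori uncertainty of each datum (assumed)
t = np.linspace(0.0, 10.0, N)
y = 1.0 + 0.3 * t + sigma * rng.standard_normal(N)   # synthetic data

# Fit a mean and a linear trend: M = 2 columns in A
A = np.column_stack([np.ones(N), t])
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Weighted summed misfit, equation (2)
chi_squared = np.sum(((y - A @ x_hat) / sigma) ** 2)
nu = N - A.shape[1]              # degrees of freedom, N - M

# Probability of a chi-squared value at least this large
p = special.gammaincc(nu / 2, chi_squared / 2)
p_check = stats.chi2.sf(chi_squared, nu)   # same quantity via the survival function

# Thresholds that bracket the central 90% of the distribution
chi2_upper = special.gammainccinv(nu / 2, 0.05) * 2.0
chi2_lower = special.gammaincinv(nu / 2, 0.05) * 2.0

print(f"chi^2 = {chi_squared:.1f} for nu = {nu}, p = {p:.3f}")
print(f"expected range: {chi2_lower:.1f} to {chi2_upper:.1f}")
```

If you deliberately set `sigma` too small when computing `chi_squared`, the misfit grows and $p$ drops toward 0; if you set it too large, $p$ climbs toward 1.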

What happens in the limit when we fit $N$ data points with $N$ columns in matrix ${ \bf A}$?
The matrix ${\bf A}$ is an $N\times N$ square matrix, and we are solving for as many unknowns
in ${\bf x}$ as we had data in ${\bf y}$.  In this
case, if $\chi^2$ is zero, $p$ will be 1, warning us that we're over-fitting our
data.
What happens to our noise ${\bf n}$?  By
using $N$ orthogonal functions, we obtain a perfect fit and the noise is
zero.  That's convenient, but it loses any information that we might have
had about uncertainties in our data.  If we made noisy measurements, we might
not have any reason to expect a perfect fit, but we'll have one anyway.
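
A small sketch of this limiting case, assuming an even number of points `N` and a record of pure random noise (both choices are mine, for illustration only). The $N$ columns are a constant, cosine and sine pairs, and one final cosine at the highest frequency.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 16                       # number of data points (even, assumed)
t = np.arange(N)
omega = 2 * np.pi / N        # fundamental frequency for a record of length N
y = rng.standard_normal(N)   # pure noise: nothing here deserves a perfect fit

# Build N columns: mean, cosine/sine pairs, and the highest-frequency cosine
columns = [np.ones(N)]
for k in range(1, N // 2):
    columns.append(np.cos(k * omega * t))
    columns.append(np.sin(k * omega * t))
columns.append(np.cos((N // 2) * omega * t))
A = np.column_stack(columns)          # shape (N, N)

x_hat = np.linalg.solve(A, y)         # square system, solved exactly
residual = y - A @ x_hat

print(A.shape)
print(np.max(np.abs(residual)))       # zero to within round-off
```

The residual vanishes even though the data are nothing but noise, which is exactly the loss of uncertainty information described above.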

#### Multiple oscillatory signals

You might wonder if you'll bias your results by fitting for all of the
sinusoidal variability all at once.  Usually, the answer is no.  Assuming
that your time series is long enough, sinusoidal frequencies are
orthogonal to each other:   there is no correlation between $\sin(2\pi t/T)$
and $\sin(2\pi 2t/T)$, just as there is no correlation between
the sine and cosine components.

So we can take this to the maximum limit.  Suppose we just want to fit
sines and cosines to our data.  How many frequencies can we fit?
If I'm going to do this, then
I'll need to make sure that each row of my matrix equation is linearly
independent, which means that I'll want to make sure that each column
of ${\bf A}$ is orthogonal, so I can't choose frequencies that are too
closely spaced.

#### Monthly and fortnightly tides: Beats

If you look at pressure records from the pier, you’ll see that the amplitude of the pressure varies on monthly and fortnightly timescales. At first glance, you might wonder if this is an extra tidal forcing that you need to take into account. In reality, it’s just the interference pattern between the M2 lunar semi-diurnal tide and the S2 solar semi-diurnal tide. To see where this comes from, think about the trigonometric identity:



$$\begin{equation}
\cos(\omega_1 t)\cos(\omega_2 t) =  \frac{\cos[(\omega_1 + \omega_2)t] + \cos[(\omega_1 - \omega_2)t]}{2}.\hspace{3cm} (3)
\end{equation}$$


In words, this means that the sum of two sinusoidal signals at adjacent frequencies $(\bar\omega \pm \delta)$ is twice the product of two cosines (set $\omega_1 = \bar\omega$ and $\omega_2 = \delta$ in the identity): a rapid sinusoidal wave $\cos(\bar\omega t)$ multiplied by a slow envelope $\cos(\delta t)$. We see this for the tidal peaks, but we also expect it for other signals. For example, think about how an annual cycle might modulate an M2 tide if the strength of the tide changed seasonally.
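
As a quick numerical check of the beat pattern, the sketch below adds two cosines at the M2 and S2 frequencies and compares the sum with the carrier-times-envelope form. Equal amplitudes, zero phases, the rounded periods, and the 15-minute sampling are simplifying assumptions for this illustration, not properties of the real pier record.

```python
import numpy as np

# Tidal periods in hours (standard values, rounded)
T_M2 = 12.4206
T_S2 = 12.0

omega_M2 = 2 * np.pi / T_M2
omega_S2 = 2 * np.pi / T_S2

omega_bar = 0.5 * (omega_S2 + omega_M2)   # mean (fast) frequency
delta = 0.5 * (omega_S2 - omega_M2)       # half the frequency difference

t = np.arange(0.0, 30 * 24.0, 0.25)       # 30 days, sampled every 15 minutes

total = np.cos(omega_M2 * t) + np.cos(omega_S2 * t)       # sum of two tides
product = 2 * np.cos(omega_bar * t) * np.cos(delta * t)   # carrier times envelope

# The envelope |cos(delta t)| repeats every pi/delta
beat_period_days = (np.pi / delta) / 24.0
print(f"max difference: {np.max(np.abs(total - product)):.2e}")
print(f"beat period: {beat_period_days:.2f} days")
```

The beat period comes out close to a fortnight, which is the spring-neap cycle, without any forcing at that period.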

#### Least-Squares Fitting Sines and Cosines

Least-squares fitting is particularly tidy when the functions that we use
for our fit, the columns of our matrix ${\bf A}$, are completely orthogonal,
because then
the fit to one function has no impact on the fit to the other functions.

Consider the special case where the columns of ${\bf A}$ are made up of sines and
cosines, so

$$\begin{equation}
A = \left[\begin{array}{cccccc} 1 & \cos(\omega t) & \sin(\omega t) &
\cos(2\omega t) & \sin(2\omega t) & \cdots \\
 \vdots & \vdots & \vdots & \vdots & \vdots & \end{array}\right], \hspace{3cm} (4)
\end{equation}$$

where $\omega = 2\pi/T$ and $T$ is the total duration of the data record.
The dot product of any two columns $i$ and $j$ of ${\bf A}$ is zero if $i\ne j$.
If I have data at $N$ evenly spaced time increments, $t_1, t_2, ... t_N$,
then this orthogonality property holds for all frequencies from $\omega$
through $N\omega/2$.  Since I have a sine and cosine at each frequency
(up to frequency $N\omega/2$ where sine might be zero at all points in
time), this means that I can define a total of $N$ independent orthogonal
columns in ${\bf A}$. 
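
One way to check this claim numerically is to build all $N$ columns and look at ${\bf A}^T{\bf A}$, whose off-diagonal entries are the dot products between different columns. This is a sketch assuming unit sample spacing (so $T = N$) and an even $N$; the variable names are illustrative.

```python
import numpy as np

N = 16
t = np.arange(N)
omega = 2 * np.pi / N

# Same layout as equation (4): constant, then cosine/sine pairs
columns = [np.ones(N)]
for k in range(1, N // 2):
    columns += [np.cos(k * omega * t), np.sin(k * omega * t)]
columns.append(np.cos((N // 2) * omega * t))   # sin(N omega t / 2) is zero at every sample
A = np.column_stack(columns)

gram = A.T @ A                                  # dot products of all column pairs
off_diagonal = gram - np.diag(np.diag(gram))

print(A.shape)
print(np.max(np.abs(off_diagonal)))             # round-off only
print(np.round(np.diag(gram), 6))               # N for first and last columns, N/2 otherwise
```

Because ${\bf A}^T{\bf A}$ is diagonal, inverting it is trivial and each coefficient depends only on its own column.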

On the other hand, if I define a column of ${\bf A}$ to have a frequency $\omega/2$,
it won't be orthogonal to my other functions over the range of this data.
For example, between 0 and $T$, $\sin(\omega t/2)$ varies from 0 to 1 to 0
and is always positive, meaning that it will be positively correlated with
a constant.   In fact, sines and cosines with frequencies that are $\omega$
multiplied by integers ranging from 0 to $N/2$ make a complete set that
spans all space, and there are no additional $N$-element vectors that I can add
to ${\bf A}$ that would also be orthogonal to all other columns of ${\bf A}$.

The orthogonality of the columns of ${\bf A}$ is really important.  It means
that my solution for $x_1$ is completely independent of my solution for
$x_2$.  Here are some results for a set of 128 random numbers, $b$.

$$\begin{eqnarray}
\hat b& = & -0.0629 -0.0620\cos(\omega t) -0.1339\sin(\omega t)\hspace{3cm} (5)\\
\hat b& =& -0.0629 -0.0960\cos(2\omega t) +0.1117\sin(2\omega t)\hspace{3cm} (6)\\
\hat b& = & -0.0629 -0.0620\cos(\omega t) -0.1339\sin(\omega t) -0.0960\cos(2\omega t) +0.1117\sin(2\omega t),\hspace{3cm} (7)
\end{eqnarray}$$

where $\hat b$ is our fitted approximation to $b$.
You can see that the amplitudes of $\cos(\omega t)$ and $\cos(2\omega t)$ are
the same regardless of whether ${\bf A}$ contains 3 columns or 5 columns.

Let's look at the code to compute the fit:

In [56]:
import numpy as np 
from numpy import pi, cos, sin
from scipy.linalg import inv

T = 128 # fake period
t = np.arange(T) # fake time
x = np.random.randn(T, 1) # fake data

print(t.shape)
print(x.shape)

(128,)
(128, 1)


In [61]:
A = np.array([ np.ones(T), cos(2*pi*t/T), sin(2*pi*t/T),  cos(4*pi*t/T), sin(4*pi*t/T)]).T 
print(A.shape)

(128, 5)


In [62]:
fit = np.dot(inv(np.dot(A.T, A)), np.dot(A.T, x))
print(fit)

[[ 0.08414784]
 [ 0.19507653]
 [ 0.04211141]
 [-0.01763531]
 [-0.00529046]]


Now, if we want only the low frequency sine and cosine:

In [63]:
A_low = A[:, :3]
print(A_low.shape)

(128, 3)


In [64]:
fit_low = np.dot(inv(np.dot(A_low.T, A_low)), np.dot(A_low.T, x))
print(fit_low)

[[0.08414784]
 [0.19507653]
 [0.04211141]]


Now for the high frequency:

In [68]:
A_high = A[:, [0,3,4]]
print(A_high.shape)

(128, 3)


In [69]:
fit_high = np.dot(inv(np.dot(A_high.T, A_high)), np.dot(A_high.T, x))
print(fit_high)

[[ 0.08414784]
 [-0.01763531]
 [-0.00529046]]


What happens if we want to fit the frequency $\omega/2$? In this case, it won’t be orthogonal to my other functions over the range of this data. For example, between 0 and $T$, $\sin(\omega t/2)$ varies from 0 to 1 to 0 and is always positive, meaning that it will be positively correlated with a constant. In fact, sines and cosines with frequencies that are $\omega$ multiplied by integers ranging from 0 to $N/2$ make a complete set that spans all space, and there are no additional $N$-element vectors that I can add to ${\bf A}$ that would also be orthogonal to all other columns of ${\bf A}$.
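
The half-cycle case can be tested directly. The sketch below, with illustrative names and random data of my own choosing, compares the dot product of a constant with a full-cycle sine and with a half-cycle sine, then shows that adding the half-cycle column changes the fitted mean.

```python
import numpy as np

N = 128
t = np.arange(N)
omega = 2 * np.pi / N

ones = np.ones(N)
full_cycle = np.sin(omega * t)        # one full cycle over the record
half_cycle = np.sin(0.5 * omega * t)  # half a cycle: never negative

print(ones @ full_cycle)              # ~0: orthogonal to the constant
print(ones @ half_cycle)              # large and positive: not orthogonal

# Consequence: adding the half-cycle column changes the fitted mean
rng = np.random.default_rng(2)
data = rng.standard_normal(N)
A_mean = ones[:, None]
A_both = np.column_stack([ones, half_cycle])
mean_only, *_ = np.linalg.lstsq(A_mean, data, rcond=None)
mean_with_half, *_ = np.linalg.lstsq(A_both, data, rcond=None)
print(mean_only[0], mean_with_half[0])
```

Contrast this with the fits above, where the constant term was identical whether we fit three columns or five.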

If we take a time series of $N$ elements, then the lowest frequency that
we can resolve
is 1 cycle per $N$ elements, so $\cos(2\pi i/N)$, where our counter $i$ runs
from $1$ to $N$ (or from 0 to $N-1$).   We can find two coefficients for this:
one for the $\cos$ component and one for the $\sin$ component.

Actually, maybe a better way to think about this is that the lowest frequency we
can resolve is $\cos(0 i/N) = 1$, which is a constant and represents the mean.
Since $\sin(0)=0$, there is only a cosine component for frequency 0.

At any rate, after considering 1 cycle per $N$ points, the next frequency we can
resolve that will actually be fully orthogonal is 2 cycles per $N$ points.  We
can keep counting upward:  3 cycles per $N$ points, 4 cycles per $N$ points,
and so forth.  All of these are guaranteed to be orthogonal over our domain
of $N$ points.

What is the maximum number of cycles that we can resolve in $N$ points?
One possibility would be that the maximum is $N$ cycles per $N$ points.
That would require a full sinusoidal oscillation squeezed between data
element 1 and data element 2.  But if you think about it, we wouldn't expect
to have enough information to determine the amplitude of a sine wave that had
to squeeze itself between consecutive observations. Moreover
if $N$ cycles per $N$ points were the maximum, this would mean that we'd be
solving for $2N$
coefficients with only $N$ data points.  Clearly that would require more
information than we have available.

Last time we noted that with $N$ data points, we can fit a maximum of $N$ functions. If we fit sine and cosine pairs, this translates into $N/2$ cosines and $N/2$ sines. The highest frequency we can resolve is 1 cycle every 2 data points, so $N/2$ cycles in $N$ points, and this is the *Nyquist frequency*.
And the strategy of least-squares fitting all possible frequencies that can be resolved represents the *discrete Fourier transform*. It’s a slow and inefficient Fourier transform, but it is the essence of this class and it will be the building block for everything we do in the remainder of the quarter.
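
To see the connection, here is a sketch comparing least-squares coefficients with the output of `np.fft.fft`. It assumes NumPy's convention, in which the forward transform uses $\exp(-2\pi i q n/N)$ with no $1/N$ factor; the random data and variable names are illustrative. The column at $N/2$ cycles is left out of the fit, which doesn't change the other coefficients because the columns are orthogonal.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 32
t = np.arange(N)
data = rng.standard_normal(N)

# Least-squares fit of cosine/sine pairs at 1, 2, ..., N/2 - 1 cycles per record
q = np.arange(1, N // 2)
C = np.cos(2 * np.pi * np.outer(t, q) / N)    # cosine columns
S = np.sin(2 * np.pi * np.outer(t, q) / N)    # sine columns
A = np.column_stack([np.ones(N), C, S])
coeffs, *_ = np.linalg.lstsq(A, data, rcond=None)
mean_fit = coeffs[0]
a_fit = coeffs[1:1 + q.size]
b_fit = coeffs[1 + q.size:]

# The same numbers from the fast Fourier transform
X = np.fft.fft(data)
a_fft = 2 * X[q].real / N
b_fft = -2 * X[q].imag / N

print(np.allclose(mean_fit, X[0].real / N))
print(np.allclose(a_fit, a_fft))
print(np.allclose(b_fit, b_fft))
```

The minus sign on `b_fft` comes from the negative exponent in the forward transform.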

#### Orthogonality and Sines and Cosines

Last time we talked about the importance of having independent columns in our matrix ${\bf A}$ and noted
that sines and cosines are particularly useful since they are orthogonal.  Let's work through this
a little more carefully.

Consider a record of duration $T$ with $N$ data points.  I can imagine squeezing into the
period $T$, one sine wave, or two, or three, or four.  How do I tell if my records are
orthogonal?

$$\begin{eqnarray*}
\int_0^T \sin\left(\frac{2\pi n t}{T}\right) \sin\left(\frac{2\pi m t}{T}\right)\, dt & = & \frac{1}{2} \int_0^T \cos\left(\frac{2\pi (n-m)t}{T}\right) -
\cos\left(\frac{2\pi (n+m)t}{T}\right) \, dt\\
 & = & \frac{1}{2} \frac{T}{2\pi} \left.\left[ \frac{\sin\left(\frac{2\pi (n-m)t}{T }\right)}{n-m}- \frac{\sin\left(\frac{2\pi  (n+m)t}{T}\right)}{(n+m)}\right]\  \right|_0^T \\
& = & \begin{cases}
0, & \mbox{if $n\ne m$}\\
\frac{T}{2}, & \mbox{if $n=m$}
\end{cases}
\end{eqnarray*}$$

(What matters is that this is only non-zero in the special case when $n=m$.  For the
moment, the fact that the integral yields $T/2$ when $n=m$ is a minor detail.)
By the same approach, two cosines are orthogonal unless $n=m$, and a sine multiplied by a cosine integrates to zero for every $n$ and $m$, including $n=m$.
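
These integrals are easy to check numerically. The sketch below approximates them with a Riemann sum on a fine, evenly spaced grid; the record length `T`, the grid size, and the helper names `inner`, `s`, and `c` are all assumptions for this illustration.

```python
import numpy as np

T = 10.0                         # record length (arbitrary units, assumed)
n_samples = 4096                 # fine grid so the sum approximates the integral
t = np.arange(n_samples) * T / n_samples
dt = T / n_samples

def inner(f, g):
    """Approximate the integral of f(t) g(t) over 0..T with a Riemann sum."""
    return np.sum(f * g) * dt

def s(n):
    """Sine with n cycles over the record."""
    return np.sin(2 * np.pi * n * t / T)

def c(n):
    """Cosine with n cycles over the record."""
    return np.cos(2 * np.pi * n * t / T)

print(inner(s(3), s(5)))   # n != m: ~0
print(inner(s(3), s(3)))   # n == m: T/2
print(inner(c(3), c(5)))   # ~0
print(inner(c(3), c(3)))   # T/2
print(inner(s(3), c(3)))   # sine times cosine: ~0 even when n == m
```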

#### The Fourier Transform

So our least-squares fit of $N$ data to $N$ sinusoids was clearly too good to be
true, but we're not doing fitting here, so we're going to proceed along this
line of reasoning anyway.  Our goal is to re-represent all of the information
in our data by projecting our data onto a different basis set. In this case
we'll take the projection, warts and all, and we want to make sure we don't lose
any information.

So we want to represent our data via sines and cosines:
$$\begin{equation}
x(t) = \frac{a_0}{2} + \sum_{q=1}^{\infty}\left(a_q \cos(2\pi q f_1 t) +
b_q \sin(2\pi q f_1 t)\right), \hspace{3cm} (8)
\end{equation}$$

where $f_1 = 1/T_p$, and $T_p$ is the duration of the  record
(following Bendat and Piersol).  Formally we should assume that the
data are periodic over the period $T_p$.
We find the coefficients $a$ and $b$ by projecting our data onto the
appropriate sines and cosines:

$$\begin{equation}
a_q = \frac{2}{T_p} \int_{0}^{T_p} x(t) \cos(2\pi q f_1 t)\, dt \hspace{3cm} (9)
\end{equation}$$

and

$$\begin{equation}
b_q = \frac{2}{T_p} \int_0^{T_p} x(t) \sin(2\pi q f_1 t)\, dt \hspace{3cm} (10)
\end{equation}$$

solved for $q=0,1,2,....$

It's not much fun to drag around these cosines and sines, so it's useful
to recall that

$$\begin{eqnarray}
\cos\theta & = & \frac{\exp(i\theta)+\exp(-i\theta)}{2} \hspace{3cm} (11)\\
\sin\theta & = & \frac{\exp(i\theta)-\exp(-i\theta)}{2i},  \hspace{3cm} (12)
\end{eqnarray}$$

which means that we could
redo this in terms of $e^{i\theta}$ and $e^{-i\theta}$.
In other words, we can represent our data as:

$$\begin{equation}
x(t) = \sum_{q=-\infty}^{\infty}\left[\hat{a}_q \exp(i2\pi q f_1 t)\right] =
\sum_{q=-\infty}^{\infty}\left[\hat{a}_q \exp(i\sigma_q t)\right] \hspace{3cm} (13)
\end{equation}$$

where $\sigma_q = 2\pi q f_1 = 2\pi q/T_p$, and $\hat{a}_q$ represents a complex
Fourier coefficient.
If we solved for our coefficients for cosine and sine, then we can easily convert
them to find the complex  coefficients $\hat{a}_q$
for $\exp(i\sigma_q t)$ and $\exp(-i\sigma_q t).$
Consider:

$$\begin{eqnarray}
a\cos\theta + b\sin\theta & = &\frac{a}{2}(e^{i\theta}+e^{-i\theta})
+ \frac{b}{2i}(e^{i\theta}-e^{-i\theta}) \hspace{3cm} (14)\\
&= & \frac{a-ib}{2}e^{i\theta}+ \frac{a+ib}{2}e^{-i\theta} \hspace{3cm} (15).
\end{eqnarray}$$

This tells us some important things.  The coefficients for $e^{i\theta}$ and
$e^{-i\theta}$ are complex conjugates (for real $a$ and $b$).  And there's a simple relationship between
the sine and cosine coefficients and the $e^{\pm i\theta}$ coefficients.
Instead of computing $\sum_{j=1}^N a_j \cos(\omega_j t)$ and
$\sum_{j=1}^N b_j \sin(\omega_j t)$, we can instead find
$\sum_{j=1}^N \hat{a}_j \exp(i\omega_j t)$ and then use the real and imaginary
parts to represent the cosine and sine components.  This gives us
a quick shorthand for representing our results as sines and cosines.
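
Equations (14) and (15) can be verified in a few lines. This is a sketch with arbitrary real amplitudes `a` and `b`; the names `a_hat_plus` and `a_hat_minus` are mine, standing for the coefficients of $e^{i\theta}$ and $e^{-i\theta}$.

```python
import numpy as np

a, b = 0.7, -1.2                     # arbitrary real cosine and sine amplitudes
theta = np.linspace(0, 2 * np.pi, 50)

a_hat_plus = (a - 1j * b) / 2        # coefficient of exp(+i theta)
a_hat_minus = (a + 1j * b) / 2       # coefficient of exp(-i theta)

real_form = a * np.cos(theta) + b * np.sin(theta)
complex_form = a_hat_plus * np.exp(1j * theta) + a_hat_minus * np.exp(-1j * theta)

print(np.allclose(complex_form.imag, 0))             # imaginary parts cancel
print(np.allclose(complex_form.real, real_form))     # same signal
print(np.isclose(a_hat_minus, np.conj(a_hat_plus)))  # conjugate pair

# Recover the real amplitudes from the single complex coefficient
a_back = 2 * a_hat_plus.real
b_back = -2 * a_hat_plus.imag
print(a_back, b_back)
```

So one complex coefficient per frequency carries the same information as the cosine and sine amplitudes together.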

#### Fourier transform in continuous form

Bracewell’s nice book on the Fourier transform refers to the data as $f(x)$ and its Fourier transform as $F(s)$, where $x$ could be interpreted as time, for example, and $s$ as frequency. Here I’ve rewritten to roughly use Bendat and Piersol’s notation. In continuous form, the Fourier transform of $x(t)$ is $X(\omega)$ (where $\omega = qf_1$), and the process can be inverted to recover $x(t)$.


$$\begin{eqnarray}
X(\omega) & = & \int_{-\infty}^{\infty} x(t) e^{-i2\pi t\omega}\, dt \hspace{3cm} (16) \\
x(t) & = & \int_{-\infty}^{\infty} X(\omega)  e^{i2\pi t\omega}\, d\omega \hspace{3cm} (17)
\end{eqnarray}$$

(following Bracewell).
But there are lots of alternate definitions in the literature:

$$\begin{eqnarray}
X(\sigma) & = & \int_{-\infty}^{\infty} x(t) e^{-it\sigma}\, dt \hspace{3cm} (18)\\
x(t) & = & \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\sigma)  e^{it\sigma}\, d\sigma \hspace{3cm} (19)
\end{eqnarray}$$

or

$$\begin{eqnarray}
X(\sigma) & = & \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x(t) e^{-it\sigma}\, dt \hspace{3cm} (20)\\
x(t) & = & \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} X(\sigma)  e^{it\sigma}\, d\sigma \hspace{3cm} (21)
\end{eqnarray}$$

So we always have to be careful about our syntax. Given the vast array of notation, we’re going to try very hard to stick to Bendat and Piersol’s
forms:

$$\begin{eqnarray}
X(f) & = & \int_{-\infty}^{\infty} x(t) e^{-i2\pi f t}\, dt \hspace{3cm} (22)\\
x(t) & = & \int_{-\infty}^{\infty} X(f)  e^{i2\pi ft}\, df \hspace{3cm} (23)
\end{eqnarray}$$

The same questions about choices of notation apply in the discrete form that we consider when we analyze data. And we can get ourselves really confused. So we have to keep in mind one rule: we don’t get to create energy. That means that we need to have the same total variance in our data set in the time domain as we have in the frequency domain. This is Parseval’s theorem, and we’ll return to it.
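
As a small preview of the "we don't get to create energy" rule, here is a sketch using NumPy's FFT on random data. The $1/N$ factor below is specific to NumPy's convention (an unnormalized forward transform); other conventions move that factor elsewhere, which is exactly why the notation matters.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 256
x = rng.standard_normal(N)

X = np.fft.fft(x)                 # NumPy's forward transform has no 1/N factor

energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / N   # 1/N compensates for the unnormalized transform

print(energy_time, energy_freq)   # the two totals agree
```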