# Empirical Orthogonal Function Analysis (EOFs)
Also called Principal Component Analysis (PCA)

#### Framing of the problem:

In climate, we often have lots of data that varies (and co-varies) in space and time.  For example, we have our monthly precipitation data as time series of maps with dimensions `[time, lat, lon]`. 

We want to understand the variability of the precipitation and answer questions like: 

* Why does it rain more or less at times in this location or that location? 

* What large-scale patterns are there that are associated with more or less rainfall in certain regions?  

* Is there any regularity in time about when it rains more or less?

It is impossible to look at thousands or tens of thousands of maps or even movies of our data and understand this.

__Climate data is complicated because it varies in space and time__

### We use EOFs to simplify our data

One way to manage this is to simplify the data by identifying the patterns associated with the largest amount of variability. We also want each of these patterns to be unrelated to the others (this is called orthogonality).

__What do I mean by this?__

We want to identify some simpler set of spatial patterns (i.e. maps) that explain the most variability and a corresponding timeseries that tells us how that spatial pattern varies.  We want each spatial pattern to tell us something different than the other spatial patterns.

### Overview Summary 

EOFs will:

* Find the spatial patterns of variability
* Find their time variation
* Give a measure of importance of each pattern

You can think of EOFs as:

* a method for simplifying our data (data reduction method)
* a way of identifying spatial and temporal patterns of importance (in terms of variance) in climate data 

### What is it?  How does it work?

_This is a high-level explanation designed to not require extensive math.  The mathematical explanation is left for statistics class._

Given data $X$ with mean and trend removed and with dimensions `[time,space]`, the data can be re-written as:

$X[\text{time}, \text{space}] = \mathrm{PC}[\text{time}, \text{mode}] \times \mathrm{EOF}^{T}[\text{mode}, \text{space}]$
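As a minimal sketch of this decomposition, the example below uses synthetic random numbers in place of real precipitation data and NumPy's singular value decomposition as one common way to obtain the two factors. The array sizes and variable names (`data`, `PC`, `EOF_T`) are illustrative assumptions, and detrending is skipped because the synthetic data has no trend.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for monthly precipitation: [time, lat, lon]
ntime, nlat, nlon = 120, 10, 20
data = rng.standard_normal((ntime, nlat, nlon))

# Collapse lat and lon into one space dimension: [time, space]
X = data.reshape(ntime, nlat * nlon)

# Remove the time mean at every grid point
X = X - X.mean(axis=0)

# Singular value decomposition: X = U * diag(s) * Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

PC = U * s     # time variation of each mode:  [time, mode]
EOF_T = Vt     # spatial pattern of each mode: [mode, space]

# Multiplying the two factors recovers the original anomalies
X_rebuilt = PC @ EOF_T
print(np.allclose(X, X_rebuilt))
```

Each row of `EOF_T` can be reshaped back to `[lat, lon]` with `EOF_T[k].reshape(nlat, nlon)` to view mode `k` as a map.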

EOFs are calculated by identifying the eigenvalues and eigenvectors of the covariance matrix subject to the constraint of orthogonality.

The `covariance matrix` holds the information about how the data at each point in space co-varies in time with the data at every other point in space.
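To make the covariance matrix concrete, here is a small sketch that builds one from synthetic anomalies with dimensions `[time, space]`. The data and sizes are made up for illustration, and the `ntime - 1` normalization is an assumption chosen to match the default of `np.cov`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical anomalies: 200 time steps at 6 spatial points
ntime, nspace = 200, 6
X = rng.standard_normal((ntime, nspace))
X = X - X.mean(axis=0)          # remove the time mean at each point

# Covariance between every pair of spatial points, computed over time
C = X.T @ X / (ntime - 1)       # [space, space]

print(C.shape)
# The matrix is symmetric: cov(i, j) equals cov(j, i)
print(np.allclose(C, C.T))
# The diagonal holds the variance at each spatial point
print(np.allclose(np.diag(C), X.var(axis=0, ddof=1)))
# Same result as NumPy's built-in covariance (columns as variables)
print(np.allclose(C, np.cov(X, rowvar=False)))
```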

Each `eigenvector` identifies a `direction` in our data along which the variance is large, based on our `covariance matrix`. For us, a `direction` corresponds to a pattern across the spatial dimension of our data, so the `eigenvectors` tell us which spatial patterns are most important from a variance perspective.  The `orthogonality` constraint ensures the directions/spatial patterns we identify are unrelated to (uncorrelated with) each other.

The `eigenvalues` measure the importance of each `eigenvector`, so they let us rank the spatial patterns identified by the `eigenvectors`. Dividing an eigenvalue by the sum of all the eigenvalues gives the fraction of the total variance explained by that pattern.
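The ideas above can be sketched with NumPy on a small synthetic example. The data here is made up: one prescribed spatial `pattern` with a random time `amplitude`, plus noise, so we know in advance which pattern should dominate. Using `np.linalg.eigh` is one reasonable choice for a symmetric covariance matrix, not necessarily what a dedicated EOF package does internally.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: one dominant spatial pattern plus random noise
ntime, nspace = 500, 5
pattern = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
amplitude = rng.standard_normal(ntime)
X = 3.0 * np.outer(amplitude, pattern) + rng.standard_normal((ntime, nspace))
X = X - X.mean(axis=0)                  # [time, space] anomalies

# Covariance matrix: [space, space]
C = X.T @ X / (ntime - 1)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(C)

# Re-order so the most important pattern comes first
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eofs = eigvecs[:, order]                # each column is one spatial pattern

# Fraction of total variance explained by each pattern
var_frac = eigvals / eigvals.sum()

# Time series of each pattern: project the data onto the patterns
pcs = X @ eofs                          # [time, mode]

print(var_frac)
# Orthogonality: the patterns are mutually orthogonal unit vectors
print(np.allclose(eofs.T @ eofs, np.eye(nspace)))
```

The sign of an eigenvector is arbitrary, so the leading pattern may come out as `pattern` or as `-pattern` (after scaling to unit length); the corresponding time series flips sign with it.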