# Gaussian Processes & Environmental Data

<hr>

**Why Environmental Data?**<br>

By using data such as air/water quality, temperature and other measurements, we can do the following:

- Understand underlying processes and changes (e.g. climate change, *'statistics of weather over time'*)
- Assess impacts on the environment, health, economics, and society
- Shape policies
- Forecast events and issue warnings (*e.g. seismic networks, storms, ...*)
- Manage resources and energy, *e.g. water, renewable energies*
- Support planning, routing, backtracking, and control

The underlying questions and techniques transfer readily to other domains. Some of the questions asked concern:

- Relationships (correlations, associations) across temporal and spatial dimensions
- Trends and forecasting
- Planning (*e.g. where to place more sensors for better measurements*)
- Quantifying uncertainty and adaptive sensing

For example, in environmental data, we'll be interested in some of these questions:

- Predicting animal populations: Animal populations usually serve as indicators of healthy ecosystems.

- Predicting disease: Environmental data can provide contextual information about how a disease spreads. For example, temperature changes and rainy seasons affect the spread of malaria in the tropics.

- Renewable energy predictions: A clear understanding of wind speeds, water levels in dams, or underground thermal activity informs renewable energy decisions.

- Predicting extreme events: Storms, floods, and other extreme events are modeled and forecast so that action plans can be prepared.

- Policy making: For example, measurements of air pollutant concentrations can inform guidelines on industrial contaminants.

Examples of modeling flow data:

- Forward prediction: simulate (propagate) distributions, including variation in time
- Hindcasting (backtracking): where did this object initially come from?

<img alt="Flow Example" src="assets/flow_example.png" width="300">

*Example: Malaysia Airlines Flight 370*

<img alt="MH370" src="assets/MH370.png" width="300">

<hr>

**Spatial Correlation**

Measurements are typically highly correlated when the distance between a pair of locations is small. For example, temperature patterns at two relatively close locations are likely to be similar.

$\therefore$ Intuition: Correlation is a function of distance. How do we then estimate metrics in other locations?
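To make the intuition concrete, here is a minimal sketch of a covariance that depends only on distance. The squared-exponential form, the function name `sq_exp_cov`, the sensor positions, and the `variance` and `length_scale` values are all illustrative assumptions and do not come from the notes above; other distance-based covariances would serve equally well.

```python
import math


def sq_exp_cov(distance, variance=1.0, length_scale=1.0):
    """Assumed squared-exponential covariance: decays smoothly as distance grows."""
    return variance * math.exp(-0.5 * (distance / length_scale) ** 2)


# Hypothetical 1-D sensor positions (e.g. kilometres along a road).
locations = [0.0, 1.0, 5.0]

# Covariance matrix built purely from pairwise distances.
cov_matrix = [
    [sq_exp_cov(abs(a - b), variance=4.0, length_scale=2.0) for b in locations]
    for a in locations
]

for row in cov_matrix:
    print([round(v, 3) for v in row])
```

Nearby sensors (0 and 1) end up with a much larger covariance than distant ones (0 and 5), which is the behaviour the intuition describes.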

Given temperature (*or other measurements*) of a particular location and the covariance matrix between a pair of locations, how do we derive the conditional distribution of the temperature at another location?

<img alt="Conditional Distribution of a Multivariate Normal" src="assets/conditional_multivariate_normal.jpg" width="300">

Model each location as a Gaussian random variable and stack the locations into a multivariate Gaussian, $X \sim N(\mu, \Sigma)$

The PDF of a multivariate Gaussian RV, $p(x | \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} \lvert \Sigma \rvert^{1/2}} \exp (-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu))$

where $n$ is the dimension of $x$, $\mu = \mathbb {E}[X]$, and $\lvert \Sigma \rvert$ denotes the determinant of the covariance matrix $\Sigma$

Properties of the covariance operator:

- $\textsf{Cov}(X,X) = \sigma _ X^2$
- $\textsf{Cov}(aX + bY,cW +dV) = ac \cdot \textsf{Cov}(X,W)+ad \cdot \textsf{Cov}(X,V)+bc \cdot \textsf{Cov}(Y,W)+bd \cdot \textsf{Cov}(Y,V)$
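Both properties can be checked numerically. The sketch below uses a hand-written sample covariance (the helper `sample_cov`, the random data, and the coefficients are illustrative assumptions); because the sample covariance is itself bilinear, the two sides agree up to floating-point error.

```python
import math
import random
import statistics


def sample_cov(u, v):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (n - 1)


random.seed(0)
n = 1000
# Four arbitrary samples standing in for X, Y, W, V.
X, Y, W, V = ([random.gauss(0, 1) for _ in range(n)] for _ in range(4))
a, b, c, d = 2.0, -1.0, 0.5, 3.0

# Left side: covariance of the two linear combinations.
lhs = sample_cov(
    [a * x + b * y for x, y in zip(X, Y)],
    [c * w + d * v for w, v in zip(W, V)],
)
# Right side: the expansion into four pairwise covariances.
rhs = (
    a * c * sample_cov(X, W)
    + a * d * sample_cov(X, V)
    + b * c * sample_cov(Y, W)
    + b * d * sample_cov(Y, V)
)

print(math.isclose(lhs, rhs, abs_tol=1e-9))
print(math.isclose(sample_cov(X, X), statistics.variance(X)))
```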

A 2-dimensional Multivariate Gaussian PDF as an example:

$
\displaystyle  p\left( \begin{bmatrix}  {x}_1 \\ {x}_2 \end{bmatrix} \mid \begin{bmatrix}  \mu _1\\ \mu _2 \end{bmatrix} , \begin{bmatrix}  {\sigma }_{11} &  {\sigma }_{12} \\ {\sigma }_{21} &  {\sigma }_{22} \end{bmatrix} \right) =\frac{1}{2\pi ( {\sigma }_{11}{\sigma }_{22} -{\sigma }_{12}{\sigma }_{21} )^{1/2}} \exp \left(-\frac{1}{2}\left( \begin{bmatrix}  {x}_1 \\ {x}_2 \end{bmatrix} - \begin{bmatrix}  \mu _1\\ \mu _2 \end{bmatrix} \right) ^\intercal \begin{bmatrix}  {\sigma }_{11} &  {\sigma }_{12} \\ {\sigma }_{21} &  {\sigma }_{22} \end{bmatrix} ^{-1}\left( \begin{bmatrix}  {x}_1 \\ {x}_2 \end{bmatrix} - \begin{bmatrix}  \mu _1\\ \mu _2 \end{bmatrix} \right)\right).
$
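As a rough sketch, the 2-dimensional density can be evaluated directly from this formula using the closed-form inverse of a 2x2 matrix. The function name `bivariate_normal_pdf` and the test values are illustrative assumptions; the arguments `s11` and `s22` are variances and `s12` is the covariance (with $\sigma_{21} = \sigma_{12}$).

```python
import math
from statistics import NormalDist


def bivariate_normal_pdf(x1, x2, mu1, mu2, s11, s12, s22):
    """Density of a 2-D Gaussian; s11, s22 are variances, s12 the covariance."""
    det = s11 * s22 - s12 * s12
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = x1 - mu1, x2 - mu2
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) via the 2x2 inverse.
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))


# Sanity check: with zero covariance the joint density factorises
# into the product of the two univariate densities.
joint = bivariate_normal_pdf(1.0, -0.5, 0.0, 0.0, 1.0, 0.0, 4.0)
product = NormalDist(0.0, 1.0).pdf(1.0) * NormalDist(0.0, 2.0).pdf(-0.5)
print(math.isclose(joint, product))
```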


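The conditional expectation of $X_A$ given the observation $X_B = y_B$ is $\mu_{A|B} = \mu_A + \frac{\sigma_{AB}}{\sigma_B^2} (y_B - \mu_B)$. $\therefore$ the prior mean $\mu_A$ shifts in proportion to how far the observation $y_B$ lies from its own mean $\mu_B$, scaled by the covariance between the pair relative to the variance of $X_B$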

The conditional variance never exceeds the prior variance, because observing a related variable gives us more information about the estimate: $\sigma_{A|B}^2 = \sigma_A^2 - \sigma_{AB} \sigma_{B}^{-2} \sigma_{AB} = \sigma_A^2 - \frac{\sigma_{AB}^2}{\sigma_B^2}$
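The two conditioning formulas fit in a few lines of code. This is a minimal sketch for the bivariate case only; the function name `condition_gaussian` and the example numbers are illustrative assumptions.

```python
import math


def condition_gaussian(mu_a, mu_b, var_a, var_b, cov_ab, y_b):
    """Mean and variance of X_A given X_B = y_B for a bivariate Gaussian."""
    gain = cov_ab / var_b                    # sigma_AB / sigma_B^2
    cond_mean = mu_a + gain * (y_b - mu_b)   # shifted mean
    cond_var = var_a - gain * cov_ab         # reduced variance
    return cond_mean, cond_var


# Correlated pair: the observation pulls the mean and shrinks the variance.
mean_corr, var_corr = condition_gaussian(0.0, 0.0, 1.0, 1.0, 0.8, 2.0)

# Uncorrelated pair: observing X_B tells us nothing about X_A.
mean_ind, var_ind = condition_gaussian(0.0, 0.0, 1.0, 1.0, 0.0, 2.0)

print(mean_corr, var_corr)
print(mean_ind, var_ind)
```

With zero covariance the conditional mean and variance equal the prior ones, which is a quick way to sanity-check an implementation.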

<hr>

# Basic code
A `minimal, reproducible example`

_Question 1_

Assume the temperature in City 1, $X_1$, is a Gaussian random variable with mean $\mu_1 = 60$ and standard deviation $\sigma_1 = 10$, and that the temperature in City 2, $X_2$, is a Gaussian random variable with mean $\mu_2 = 90$ and standard deviation $\sigma_2 = 20$. Moreover, we know that the covariance between $X_1$ and $X_2$ is 100. Today, we have observed that the temperature in City 2 is 75. What is the probability that the temperature in City 1 is greater than 56.25?

```python
# Compute the conditional expectation of X_1 given the observation X_2 = 75
mu_1, sigma_1 = 60, 10
mu_2, sigma_2 = 90, 20
cov = 100
observed_2 = 75

conditional_mu = mu_1 + cov/(sigma_2**2) * (observed_2 - mu_2)
print(conditional_mu)
```

Output:

```
56.25
```
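The conditional mean alone does not yet answer the question; the conditional variance is needed as well. The sketch below finishes the calculation under the assumption that $X_1$ and $X_2$ are jointly Gaussian and that $\sigma_1$, $\sigma_2$ are standard deviations; the names `conditional_var`, `threshold`, and `prob` are introduced here for illustration.

```python
from statistics import NormalDist

mu_1, sigma_1 = 60, 10
mu_2, sigma_2 = 90, 20
cov = 100
observed_2 = 75
threshold = 56.25

# Conditional mean and variance of X_1 given X_2 = observed_2.
conditional_mu = mu_1 + cov / sigma_2**2 * (observed_2 - mu_2)
conditional_var = sigma_1**2 - cov**2 / sigma_2**2

# P(X_1 > threshold | X_2 = observed_2) from the conditional Gaussian.
conditional_dist = NormalDist(conditional_mu, conditional_var ** 0.5)
prob = 1 - conditional_dist.cdf(threshold)

print(conditional_mu, conditional_var, prob)  # 56.25 75.0 0.5
```

Because the threshold coincides with the conditional mean, the probability is exactly one half regardless of the conditional variance.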
