# Deep Learning for Geo/Environmental sciences

<center><img src="../logo_2.png" alt="logo" width="600"/></center>
*Created with ChatGPT*

## What is machine learning?

<center><img src="_images/machine_learning_2x.png" alt="xkcd machine learning" width="600"/></center>
*XKCD 1838*

## What is machine learning?

<center><img src="_images/ML_paradigm.png" width="600"/></center>

This is true for science as well as programming. Rather than positing a functional form based on a scientific hypothesis, we provide 'answers/labels' along with our data in order to discern rules.
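
As a minimal sketch of this paradigm, the snippet below recovers a rule from data plus answers rather than from a hand-written formula. The Celsius/Fahrenheit pairs and the `fit_line` helper are illustrative assumptions, not part of the course material.

```python
# Illustrative only: learn the Celsius -> Fahrenheit rule from labelled examples.

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Data (inputs) and answers (labels); the rule itself is never written down.
celsius = [-40.0, -10.0, 0.0, 8.0, 15.0, 22.0, 38.0]
fahrenheit = [-40.0, 14.0, 32.0, 46.4, 59.0, 71.6, 100.4]

slope, intercept = fit_line(celsius, fahrenheit)
print(f"learned rule: F = {slope:.2f} * C + {intercept:.2f}")
```

Here the functional form (a straight line) is still chosen by hand; the more flexible models discussed later relax that choice as well.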

## What is machine learning?

<center><img src="_images/scatterplot_matrix.png" width="800"/></center>

Machine learning can also be thought of as high-dimensional statistics. We are looking to distill key features and information directly from the data. "Machine Learning: A Probabilistic Perspective" by Kevin Murphy is a great source for this point of view.

## What is machine learning?

<center><img src="_images/AI_subsets.png" width="800"/></center>


Machine learning is a subset of the broader field of Artificial Intelligence first imagined by Alan Turing. It encompasses a range of different tasks, and at present represents the majority of AI research.

## Different classes of ML

<center><img src="_images/ML_overview.png" alt="xkcd machine learning"/></center>


In addition, supervised learning can include object detection and image segmentation. Unsupervised learning also includes self-supervised learning in which we use inductive biases to encourage certain desirable behaviours.

### Supervised techniques

Regression and classification share a common toolset:
 - Linear models (including LASSO, ridge, SVM etc)
 - Decision tree and ensemble based
 - Gaussian process
 - Neural network (NN)

Image detection / segmentation can be done using 'traditional' image processing techniques such as edge-finding and watershedding, but is increasingly done with deep NNs.


### Unsupervised techniques

Clustering is a classical problem with many possible approaches depending on the dataset size and dimensionality:
 - k-means
 - DBSCAN
 - Gaussian mixture models
 - Agglomerative clustering
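
To make the first item in this list concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. The two synthetic blobs, the farthest-first initialisation and all function names are assumptions chosen for illustration; in practice a library implementation would normally be used.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first_init(points, k):
    """Deterministic initialisation: start at points[0], then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, n_iter=20):
    centroids = farthest_first_init(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels

# Synthetic data: two well-separated 2D blobs (illustrative, not real observations).
rng = random.Random(0)
blob_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(50)]
blob_b = [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(50)]
points = blob_a + blob_b

centroids, labels = kmeans(points, k=2)
print("centroids:", [tuple(round(c, 2) for c in centroid) for centroid in centroids])
```

No labels are used at any point: the grouping comes purely from the geometry of the data.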

### Unsupervised techniques

There are also many dimensionality reduction techniques:
 - Principal component analysis (PCA) / empirical orthogonal functions (EOF)
 - Kernel PCA
 - Density estimation
 - Gaussian mixture models
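
As a rough sketch of the first item in this list, the snippet below finds the leading principal component of some 2D data using power iteration on the covariance matrix, then reduces the data to one dimension. The synthetic data and the `leading_component` helper are assumptions for illustration; a library routine would usually be preferred for real work.

```python
import math
import random

def leading_component(data, n_iter=100):
    """First principal component of 2D data via power iteration."""
    n = len(data)
    mean = [sum(row[j] for row in data) / n for j in range(2)]
    centred = [[row[j] - mean[j] for j in range(2)] for row in data]
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(2)]
        for i in range(2)
    ]
    # Assumes the start vector is not orthogonal to the leading eigenvector.
    v = [1.0, 0.0]
    for _ in range(n_iter):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return mean, v

# Synthetic data: two noisy measurements of the same underlying signal t.
rng = random.Random(0)
data = []
for _ in range(200):
    t = rng.gauss(0.0, 2.0)
    data.append([t + rng.gauss(0.0, 0.1), t + rng.gauss(0.0, 0.1)])

mean, component = leading_component(data)

# Project onto the component: each 2D sample becomes a single score.
scores = [sum((row[j] - mean[j]) * component[j] for j in range(2)) for row in data]
var_scores = sum(s ** 2 for s in scores) / (len(scores) - 1)
total_var = sum((row[j] - mean[j]) ** 2 for row in data for j in range(2)) / (len(data) - 1)
explained = var_scores / total_var

print("component:", [round(c, 3) for c in component])
print(f"fraction of variance explained: {explained:.3f}")
```

Because both columns share the same signal, a single component captures almost all of the variance, which is the sense in which the dimensionality has been reduced.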

(Deep) NNs provide many other ways of learning underlying symmetries in data, including self-supervised and semi-supervised approaches. If such symmetries are learnt probabilistically then we call sampling from those models 'generative'. More on this later...!

### Reinforcement learning

Reinforcement learning is somewhat different - it aims to teach an agent how to behave in a dynamic environment to maximise some abstract, or delayed, reward.

From Wikipedia:

> It differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge) with the goal of maximizing the long term reward, whose feedback might be incomplete or delayed.

These approaches are what underpin the success in playing Chess, Go and other games, and are seen as promising approaches for real world agents and robots, but training can be very fiddly.
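
The exploration/exploitation balance can be sketched with an epsilon-greedy agent on a multi-armed bandit, a deliberately simplified setting with no state. The reward probabilities, the value of `epsilon` and the `run_bandit` function are all illustrative assumptions.

```python
import random

def run_bandit(true_probs, n_steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-looking arm, sometimes explore."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running estimate of each arm's mean reward
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        # The environment only returns a reward, never the "correct" action.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        total_reward += reward
    return values, counts, total_reward

# Hypothetical environment: three actions with unknown success probabilities.
values, counts, total_reward = run_bandit([0.2, 0.5, 0.8])
print("estimated values:", [round(v, 2) for v in values])
print("pulls per arm:", counts)
print(f"average reward: {total_reward / sum(counts):.2f}")
```

With `epsilon=0` the agent can lock on to the first arm that ever pays out; with `epsilon=1` it never uses what it has learnt. Full reinforcement learning adds states and delayed rewards on top of this trade-off.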

## Machine Learning in Environmental and Geo-sciences

The environmental and geo sciences are well poised to leverage these advances.

<center><img src="_images/Reichstein_et_al.png" alt="Overview of data challenges in Geo" width=800/></center>

### Climate - Ocean - Atmosphere Program (COAP)

This is the section I know best, and it also seems to be at the forefront currently, at least in applying large deep learning models.

Examples include:
 - Data-driven weather models that beat the state-of-the-art physical models
 - New ML parameterizations that allow representation of more detailed physical processes
 - Large and broad efforts to better leverage large volumes of remote sensing data
 - Promising new approaches for assimilating these observations into hybrid-physical models

Some relevant literature:
 - https://royalsocietypublishing.org/toc/rsta/2021/379/2194
 - https://iopscience.iop.org/article/10.1088/1748-9326/ab4e55
 - https://www.nature.com/articles/s41586-019-0912-1#Sec17
 - +One of the other big review papers and some specific examples

### Climate - Ocean - Atmosphere Program (COAP)

Later in the course we will reproduce and explore ClimateBench - a climate model emulation benchmark.

We will also look at techniques for unsupervised learning from satellite imagery.

<center><img src="_images/satellite_clusters.png" alt="satellite clusters" width=600/></center>

### Geosciences of the Earth, Oceans and Planets (GEO)

Probably the most statistically literate field, Geo has been leveraging ML approaches for some time (and invented Gaussian process regression!) and there have been plenty of applications of these approaches to date:
 - High resolution subsurface structure using active seismic sources in exploration geophysics
   - Pioneered the development of Physics Informed Neural Networks (PINNs)
 - Microearthquake detection, and earthquake early warning?
 - Large spatial scale remote sensing imagery classification

What did I miss?!

 - https://geo-smart.github.io/usecases
 - https://www.science.org/doi/10.1126/science.abm4470
 - https://www.science.org/doi/10.1126/sciadv.1700578
 - https://arxiv.org/abs/2006.11894
 - https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2021RG000742


### Ocean Biosciences Program (OBP)

Again, lots of potential applications in this data-rich field:
 - Detection and classification of ocean life at all scales in remote and sub-surface imagery
 - Improved modelling and network analysis
 - New opportunities to harness large volumes of acoustic data

We will explore this last example in detail later in the course - applying CNNs to the detection of whale song just off the coast of California:

<center><img src="_images/whale_calls.png" alt="whale calls" width=800/></center>

And to the detection of plankton in underwater imagery:

<center><img src="_images/plankton_examples.gif" alt="whale calls" width=600/></center>

 - https://spo.nmfs.noaa.gov/sites/default/files/TMSPO199_0.pdf
 - https://ieeexplore.ieee.org/abstract/document/7404607
 - Michaela's paper?


Ask for examples from the students

## What is deep learning?

The phrase 'deep learning' refers to the use of deep neural networks (10-100 layers) for a given machine learning task. It has become synonymous with many aspects of machine learning because of these models' flexibility and performance across a wide range of tasks.

In this course we will mostly focus on these models because of their flexibility, and some of the unique challenges in training and using them in scientific applications.

### Why use deep networks?

The *universal approximation theorem* tells us that any continuous 1D function (on a bounded interval) can be approximated to arbitrary accuracy by a sufficiently wide shallow neural network. So why go deeper?

<center><img src="_images/shallow_network.png" alt="shallow network" width=600/></center>

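The idea behind the universal approximation theorem can be illustrated by building a shallow ReLU network by hand. This is a sketch under stated assumptions: the weights are constructed to linearly interpolate `sin` rather than trained, and `build_shallow_net` and `max_error` are hypothetical helper names.

```python
import math

def relu(z):
    return max(0.0, z)

def build_shallow_net(f, lo, hi, n_hidden):
    """Construct (not train) a one-hidden-layer ReLU network that
    linearly interpolates f at n_hidden + 1 evenly spaced knots."""
    knots = [lo + (hi - lo) * i / n_hidden for i in range(n_hidden + 1)]
    slopes = [
        (f(knots[i + 1]) - f(knots[i])) / (knots[i + 1] - knots[i])
        for i in range(n_hidden)
    ]
    # Each hidden unit switches on at one knot; its output weight is the
    # change in slope needed at that knot.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_hidden)]
    hidden_biases = [-k for k in knots[:-1]]
    out_bias = f(lo)

    def net(x):
        hidden = [relu(x + b) for b in hidden_biases]
        return out_bias + sum(w * h for w, h in zip(out_weights, hidden))

    return net

def max_error(f, net, lo, hi, n_test=1000):
    xs = [lo + (hi - lo) * i / n_test for i in range(n_test + 1)]
    return max(abs(f(x) - net(x)) for x in xs)

errors = {}
for width in (4, 16, 64):
    net = build_shallow_net(math.sin, 0.0, 2 * math.pi, width)
    errors[width] = max_error(math.sin, net, 0.0, 2 * math.pi)
    print(f"{width:3d} hidden units -> max error {errors[width]:.4f}")
```

The error shrinks as the hidden layer widens, but only because more and more units are being added, which motivates the question of whether depth can do better.
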
### Why use deep networks?

While shallow NNs can be very effective regressors, it turns out that they can be impractical for some functions - requiring enormous numbers of hidden units.

In fact, some functions can be modelled much more efficiently by increasing the number of layers, rather than the width of the layers - particularly those with large numbers of inputs.

<center><img src="_images/deep_network.png" alt="deep network" width=600/></center>

Note, we won't actually draw out deep networks like this because drawing all the links becomes very tedious! Rather we schematically draw out the structure with the main focus being representing the shape of the matrices in each layer.
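
One commonly used illustration of this depth efficiency is a stack of identical 'tent' layers, each with just two ReLU units. The sketch below is an assumption-laden toy (the weights are fixed by hand and `count_linear_pieces` is a hypothetical helper), not a result taken from these slides.

```python
def relu(z):
    return max(0.0, z)

def tent_layer(x):
    """One layer with two ReLU units: rises from 0 to 1 on [0, 0.5],
    then falls back to 0 on [0.5, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

def deep_net(x, n_layers):
    """Apply the same two-unit layer n_layers times."""
    for _ in range(n_layers):
        x = tent_layer(x)
    return x

def count_linear_pieces(f, n_grid=4096):
    """Count linear pieces of f on [0, 1] by counting slope changes on a grid.
    The grid spacing is a power of two so every kink lands on a grid point."""
    xs = [i / n_grid for i in range(n_grid + 1)]
    ys = [f(x) for x in xs]
    slopes = [ys[i + 1] - ys[i] for i in range(n_grid)]
    changes = sum(1 for a, b in zip(slopes, slopes[1:]) if a != b)
    return changes + 1

for n_layers in (1, 2, 3, 4, 5):
    pieces = count_linear_pieces(lambda x: deep_net(x, n_layers))
    print(f"depth {n_layers} ({2 * n_layers} hidden units) -> {pieces} linear pieces")
```

Each extra layer doubles the number of linear pieces, whereas a shallow ReLU network with `n` hidden units and a single input can produce at most `n + 1` pieces, so matching the deepest network here would need at least 31 hidden units instead of 10.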

### Why use deep networks?

Historically it was challenging to train deep networks because of the relatively large number of parameters required to get good results. GPUs made this much less of a problem.

They became especially prevalent in modelling imagery because of the high dimensional input space, and the advent of convolutional layers which encode powerful inductive biases (even without training). We'll discuss these more later in the course.

In practice it has been found that deep networks both train more easily and generalize better than shallow ones. This is likely due to over-parameterization, but is not well understood, and we'll return to the topic at the end of the course.

### Why use deep networks?

More prosaically, they work! Deep NNs are the core of almost all of the most recent successes in AI and machine learning.

While the theoretical understanding is not as mature for these structures, they have repeatedly demonstrated their utility in a wide variety of tasks.

Hence, constructing and working with these models is often more akin to engineering than science, which is why this course will focus on hands-on experience over theoretical underpinning.

### Next week - data!