Clustering

Joaquin Bedia edited this page Feb 14, 2020 · 7 revisions

Cluster Analysis

Clustering is a family of unsupervised learning algorithms. The aim of cluster analysis is to partition a set of observations into subgroups whose members are as similar as possible to each other. Clustering techniques are routinely used in climate science for circulation weather typing. Circulation types (CTs) and weather types (WTs) provide a classification of atmospheric circulation patterns in synoptic climatology. In a nutshell, CTs refer to clusters of variables at atmospheric levels (including surface atmospheric pressure), and WTs refer to clusters of surface variables (e.g., precipitation or near-surface temperature).

Function clusterGrid

The function clusterGrid is the workhorse for applying the different clustering techniques. Its primary input is a climate4R grid, possibly containing multiple variables (a multigrid) and/or members. The basic preprocessing operations are performed internally to ease its use; for example, the input variables are scaled via the scaleGrid function. In addition to model training (i.e., cluster analysis of a given dataset), new data can be assigned to the trained clusters via the newdata argument, which makes the function suitable for applications such as seasonal forecasting or climate change studies. The output is a climate4R grid containing either the training or the prediction data, with the results of the cluster analysis stored as attributes; of these, the wt.index attribute is likely to be of most interest. It is also possible to assign the computed CTs, on a daily basis, to an additional input grid of an arbitrary variable in order to obtain WTs (argument y).
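The train-then-predict workflow described above can be illustrated with a minimal, language-neutral sketch. It is written in plain Python and is not the clusterGrid implementation: the helper names (standardize, nearest, kmeans), the toy values and the choice of k-means with a deterministic initialization are all assumptions made for illustration. It only shows the general idea of scaling the training data, fitting clusters, and then assigning new days to the nearest trained centroid while reusing the training scaling.

```python
import math

def standardize(rows, mean=None, sd=None):
    """Scale columns to zero mean and unit variance.
    If mean/sd are given (e.g. from training data), reuse them instead."""
    cols = list(zip(*rows))
    if mean is None:
        mean = [sum(c) / len(c) for c in cols]
        sd = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
              for c, m in zip(cols, mean)]
    scaled = [[(v - m) / s for v, m, s in zip(r, mean, sd)] for r in rows]
    return scaled, mean, sd

def nearest(point, centers):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, ctr)) for ctr in centers]
    return dists.index(min(dists))

def kmeans(rows, k, iters=50):
    """Plain k-means; initial centroids are the first k rows (an assumption
    made here for reproducibility, not a recommended initialization)."""
    centers = [list(r) for r in rows[:k]]
    for _ in range(iters):
        labels = [nearest(r, centers) for r in rows]
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centroid
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(r, centers) for r in rows]

# Hypothetical toy data: one row per day, columns = two arbitrary variables.
train = [[1000.0, 5.0], [1001.0, 6.0], [1020.0, 15.0],
         [1021.0, 16.0], [1002.0, 5.5], [1019.0, 14.0]]
new = [[1003.0, 6.5], [1018.0, 13.0]]

# Training: scale, then cluster. The labels play the role of a per-day cluster index.
train_scaled, mu, sigma = standardize(train)
centers, train_labels = kmeans(train_scaled, k=2)

# Prediction: scale new days with the TRAINING statistics, then assign each
# day to the nearest trained centroid.
new_scaled, _, _ = standardize(new, mu, sigma)
new_labels = [nearest(r, centers) for r in new_scaled]

print(train_labels, new_labels)
```

Reusing the training mean and standard deviation for the new data is the key design point of the sketch: rescaling the new data with its own statistics would place it in a different space from the trained centroids.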

The clustering techniques currently available through clusterGrid are listed below. Follow the links for further details and worked examples.