### Example: removing land areas from the data array and reshaping it back to the regular data grid

To demonstrate, we first create a simple example data grid (map) with latsize = 2, lonsize = 5. We also create two mask arrays, one for ocean and one for land areas in the mini-map.

In [256]:
import numpy as np

# Create a test dataset (mini-map, a single time step) and the related ocean and land masks

# Create data grid: (latsize by lonsize)
dataGrid = np.array([[1,2,3,4,5],[6,7,8,9,10]])   

# Create land/ocean mask for data grid: (latsize by lonsize)
mask = np.array([[1,0,1,1,1],[1,0,0,0,1]]) # mask for ocean area = 1, land = 0
mask_Ocean = mask == 1                     # mask for ocean area = True, land = False
mask_Land = ~mask_Ocean

Now we flatten the data grid into a column vector so that it can be stored as a column of the data matrix X, which has n rows (grid cells) and m columns (time steps). In our case n = 2 × 5 = 10 and m = 1, because there is only one time step. Later, this example is extended to multiple time steps.

In [257]:
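# Flatten the data grid into a column vector: (latsize*lonsize by 1)
X = dataGrid.copy()
X = X.reshape(-1, 1)              # -1 lets NumPy infer latsize*lonsize (here 10)
# Flatten the ocean mask in the same row-major order as the data
X_MaskOcn = mask_Ocean.flatten()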

In [213]:
X

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])

In [216]:
X_MaskOcn

array([ True, False,  True,  True,  True,  True, False, False, False,
        True])
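Because `reshape` and `flatten` both use NumPy's default row-major (C) order, row k of `X` and entry k of `X_MaskOcn` refer to the same grid cell, namely latitude index `k // lonsize` and longitude index `k % lonsize`. This correspondence is what makes the later reconstruction work. A small self-contained check that re-creates the example arrays:

```python
import numpy as np

# Same mini-map and ocean mask as above
dataGrid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
mask_Ocean = np.array([[1, 0, 1, 1, 1], [1, 0, 0, 0, 1]]) == 1
latsize, lonsize = dataGrid.shape

X = dataGrid.reshape(-1, 1)       # (latsize*lonsize, 1), row-major order
X_MaskOcn = mask_Ocean.flatten()  # same row-major order

for k in range(X.shape[0]):
    # np.unravel_index maps flat index k back to (lat, lon) for a C-ordered array
    i, j = np.unravel_index(k, dataGrid.shape)
    assert (i, j) == (k // lonsize, k % lonsize)
    assert X[k, 0] == dataGrid[i, j]
    assert X_MaskOcn[k] == mask_Ocean[i, j]

print("flat index k <-> grid cell (k // lonsize, k % lonsize)")
```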

In [218]:
X_Ocean = X[X_MaskOcn]
X_Ocean

array([[ 1],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [10]])

Now we can perform PCA/SVD with the data matrix X_Ocean.
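The PCA itself is not part of this example. The following is only a rough sketch, and it assumes PCA over time: each cell's time mean is removed, and then a thin SVD is computed with `np.linalg.svd`. The point it illustrates is that a truncated reconstruction keeps the shape of the ocean-only matrix. `X_demo` is hypothetical random data (6 ocean cells × 4 time steps), named so that it does not overwrite `X_Ocean`:

```python
import numpy as np

# Hypothetical ocean-only data matrix: 6 ocean cells (rows) x 4 time steps (columns)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 4))

# Assumption: PCA over time, so remove each cell's time mean first
X_mean = X_demo.mean(axis=1, keepdims=True)
X_anom = X_demo - X_mean

# Thin SVD: columns of U are spatial patterns defined on ocean cells only
U, s, Vt = np.linalg.svd(X_anom, full_matrices=False)

# Reconstruct from the leading k modes and add the mean back
k = 2
X_demo_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :] + X_mean

print(U.shape, X_demo_svd.shape)  # (6, 4) (6, 4)
```

If all modes are kept, the reconstruction matches `X_demo` up to rounding. With fewer modes it is an approximation, but the shape is the same, so the mapping back onto the grid does not change.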

After PCA, the results are stored in a data matrix with the same shape as X_Ocean, and we want to map them back onto the grid. Here we skip the PCA. Instead, we copy the original data vector and treat it as our SVD output.


In [219]:
X_Ocean_svd = X_Ocean.copy()
X_Ocean_svd

array([[ 1],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [10]])

Next we create an output grid `dataGrid_out` with the same shape as the original map, initialized with zeros. This grid will be filled with the SVD output for the ocean cells:

In [220]:
dataGrid_out = np.zeros(dataGrid.shape)    # output grid of original shape (lat by lon), filled with zeros
dataGrid_out

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Then we fill this grid in two steps. First, we use the land mask to set all land cells to NaN. Second, we use the ocean mask to write the SVD output stored in `X_Ocean_svd` into the ocean cells only. Because `dataGrid_out[mask_Ocean]` visits the ocean cells in the same row-major order in which `X_MaskOcn` selected them, each value returns to its original cell:

In [262]:
dataGrid_out[mask_Land] = np.nan
dataGrid_out[mask_Ocean] = X_Ocean_svd.flatten()
dataGrid_out

array([[ 1., nan,  3.,  4.,  5.],
       [ 6., nan, nan, nan, 10.]])
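You can skip the separate land assignment by starting from a grid filled with NaN, because every cell that does not receive an ocean value stays NaN. Here is a minimal variant of the two cells above. It re-creates the arrays so that it runs on its own:

```python
import numpy as np

dataGrid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
mask_Ocean = np.array([[1, 0, 1, 1, 1], [1, 0, 0, 0, 1]]) == 1

# Flatten and keep only the ocean cells (stand-in for the SVD output)
X_Ocean_svd = dataGrid.reshape(-1, 1)[mask_Ocean.flatten()]

# Start from all-NaN, then write the ocean values back in row-major order
dataGrid_out = np.full(dataGrid.shape, np.nan)
dataGrid_out[mask_Ocean] = X_Ocean_svd[:, 0]

print(dataGrid_out)
```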

Since `X_Ocean_svd` still holds the original values (later it will contain the actual SVD output), we can confirm that the output grid again contains the correct value in every ocean cell, with NaN on land.

## Same process, but for multiple columns in X (m>1)

The next step is to extend this to a data matrix with m > 1, i.e. more than one time step. The concept is the same (using the flattened ocean mask), but we have to index and mask the additional dimension of the data arrays correctly.

To begin, we build a data grid with two time steps (so m = 2) by stacking a second map onto the first. The values of the second map are those of the first map multiplied by 10.

In [265]:
dataGrid_2D = np.stack((dataGrid, dataGrid*10))   # shape (time, lat, lon) = (2, 2, 5)
dataGrid_2D

array([[[  1,   2,   3,   4,   5],
        [  6,   7,   8,   9,  10]],

       [[ 10,  20,  30,  40,  50],
        [ 60,  70,  80,  90, 100]]])

In [266]:
timelen, latlen, lonlen = dataGrid_2D.shape
timelen, latlen, lonlen 

(2, 2, 5)

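Now we flatten each map. Reshaping to (timelen, latlen*lonlen) turns every time step into one row. Transposing then puts the grid cells in rows and the time steps in columns, which gives the n × m data matrix `X_2D` (n = 10, m = 2).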

In [267]:
X_2D = dataGrid_2D.copy()
X_2D = X_2D.reshape((timelen,latlen*lonlen))
X_2D = X_2D.transpose() 
X_2D

array([[  1,  10],
       [  2,  20],
       [  3,  30],
       [  4,  40],
       [  5,  50],
       [  6,  60],
       [  7,  70],
       [  8,  80],
       [  9,  90],
       [ 10, 100]])
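Each row of `dataGrid_2D.reshape(timelen, latlen*lonlen)` is one flattened map. After the transpose, each column of `X_2D` is therefore one time step, flattened in the same row-major order as `X_MaskOcn`. A small check that rebuilds the arrays, here using `np.stack`, which stacks along a new first axis:

```python
import numpy as np

dataGrid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
dataGrid_2D = np.stack((dataGrid, dataGrid * 10))  # (time, lat, lon) = (2, 2, 5)
timelen, latlen, lonlen = dataGrid_2D.shape

# (time, lat*lon) -> transpose -> (lat*lon, time): n rows, m columns
X_2D = dataGrid_2D.reshape(timelen, latlen * lonlen).T

for t in range(timelen):
    # column t is the map of time step t, flattened row-major
    assert np.array_equal(X_2D[:, t], dataGrid_2D[t].ravel())

print(X_2D.shape)  # (10, 2)
```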

Now the flattened ocean mask is applied to the rows (grid cells) of the data matrix, which selects the ocean cells in every column (time step) at once.

In [268]:
X_MaskOcn

array([ True, False,  True,  True,  True,  True, False, False, False,
        True])

In [269]:
X_2D_Ocean = X_2D[X_MaskOcn,:]
X_2D_Ocean

array([[  1,  10],
       [  3,  30],
       [  4,  40],
       [  5,  50],
       [  6,  60],
       [ 10, 100]])
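Indexing a 3D array with a 2D boolean mask over its last two axes flattens those axes in row-major order. The ocean data matrix can therefore also be taken directly from the 3D grid without the explicit reshape. The result has shape (time, ocean cells), so it only needs a transpose to put the cells in rows. A sketch comparing both routes:

```python
import numpy as np

dataGrid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
dataGrid_2D = np.stack((dataGrid, dataGrid * 10))          # (2, 2, 5)
mask_Ocean = np.array([[1, 0, 1, 1, 1], [1, 0, 0, 0, 1]]) == 1

# Route 1: reshape to (cells, time), then mask the rows
X_2D = dataGrid_2D.reshape(dataGrid_2D.shape[0], -1).T
X_2D_Ocean = X_2D[mask_Ocean.flatten(), :]

# Route 2: mask the (lat, lon) axes directly -> (time, ocean cells), then transpose
X_2D_Ocean_direct = dataGrid_2D[:, mask_Ocean].T

print(np.array_equal(X_2D_Ocean, X_2D_Ocean_direct))  # True
```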

In [270]:
dataGrid_out_2D = np.zeros(dataGrid_2D.shape)    # output grid of original shape (time by lat by lon)
dataGrid_out_2D

array([[[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]]])

In [271]:
dataGrid_out_2D[:,mask_Land] = np.nan
dataGrid_out_2D[:,mask_Ocean] = X_2D_Ocean.transpose()
dataGrid_out_2D

array([[[  1.,  nan,   3.,   4.,   5.],
        [  6.,  nan,  nan,  nan,  10.]],

       [[ 10.,  nan,  30.,  40.,  50.],
        [ 60.,  nan,  nan,  nan, 100.]]])
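To reuse these steps for any number of time steps, you can wrap them in a pair of helper functions. The names `grid_to_ocean_matrix` and `ocean_matrix_to_grid` are hypothetical and only illustrate the idea. The sketch assumes the grid is ordered (time, lat, lon), as in the example above:

```python
import numpy as np

def grid_to_ocean_matrix(grid, mask_ocean):
    """(time, lat, lon) grid -> (n_ocean_cells, time) data matrix."""
    timelen = grid.shape[0]
    X = grid.reshape(timelen, -1).T           # (lat*lon, time), cells in row-major order
    return X[mask_ocean.flatten(), :]         # keep ocean rows only

def ocean_matrix_to_grid(X_ocean, mask_ocean):
    """(n_ocean_cells, time) matrix -> (time, lat, lon) grid, NaN on land."""
    timelen = X_ocean.shape[1]
    out = np.full((timelen,) + mask_ocean.shape, np.nan)
    out[:, mask_ocean] = X_ocean.T            # fill ocean cells for every time step
    return out

dataGrid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
dataGrid_2D = np.stack((dataGrid, dataGrid * 10))
mask_Ocean = np.array([[1, 0, 1, 1, 1], [1, 0, 0, 0, 1]]) == 1

X_ocean = grid_to_ocean_matrix(dataGrid_2D, mask_Ocean)   # (6, 2)
grid_back = ocean_matrix_to_grid(X_ocean, mask_Ocean)     # (2, 2, 5)
print(grid_back)
```

In a real workflow, the SVD step would sit between the two calls and operate on `X_ocean`.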