# Manipulating Variables

In this notebook we show some tricks for working with variables.

Most of the `cdms2` variable manipulation tools live in the `MV2` package, which replicates much of `numpy`'s functionality.

In [1]:
import MV2

## Creating a variable from scratch

In some cases you will need to create a variable from scratch, usually because the data comes from a different source or package.

### Step 1: Create the Transient Variable

First let's create 120 months of random temperature data on a 4x5 grid (4 degrees in latitude, 5 in longitude), over 17 levels.

Values will range between -20C and 40C (253.15K and 313.15K).
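As a quick sanity check on that range, here is a minimal sketch of the scaling arithmetic using plain `numpy` only. The small array shape, the seed and the variable names are assumptions chosen for illustration; a uniform sample in [0, 1) is stretched to a 60-degree span and shifted to start at -20C expressed in Kelvin.

```python
import numpy

# Illustrative shape and seed (assumptions); the notebook uses (120, 17, 45, 72)
rng = numpy.random.default_rng(seed=0)
sample = rng.random((12, 3, 4, 5))  # uniform values in [0, 1)

t_min_k = -20.0 + 273.15  # -20C in Kelvin
t_max_k = 40.0 + 273.15   # 40C in Kelvin

# Stretch to the desired span, then shift to the lower limit
ta_sketch = sample * (t_max_k - t_min_k) + t_min_k

print(ta_sketch.min() >= t_min_k, ta_sketch.max() < t_max_k)
```

Both checks should print `True`, since the sample never reaches 1.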

In [2]:
import numpy

# Uniform values in [253.15, 313.15) K, i.e. -20C to 40C
ta_raw = numpy.random.random((120, 17, 45, 72)) * 60. + 253.15

ta = MV2.array(ta_raw)

Unfortunately this *variable* is pretty much useless as is.

In [3]:
ta.info()

*** Description of Slab variable_2 ***
id: variable_2
shape: (120, 17, 45, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: N/A
grid_type: N/A
time_statistic: 
long_name: 
units: 
tileIndex: None
No grid present.
** Dimension 1 **
   id: axis_0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x2aaacf907898
** Dimension 2 **
   id: axis_1
   Length: 17
   First:  0.0
   Last:   16.0
   Python id:  0x2aaacf907e10
** Dimension 3 **
   id: axis_2
   Length: 45
   First:  0.0
   Last:   44.0
   Python id:  0x2aaacf907e80
** Dimension 4 **
   id: axis_3
   Length: 72
   First:  0.0
   Last:   71.0
   Python id:  0x2aaacf907ef0
*** End of description for variable_2 ***


### Step 2: Variable Attributes

But at least it has all the properties of a cdms transient variable. Now let's start decorating our variable to give it some meaning.

In [6]:
ta.id = 'ta'  # let's give it a name
ta.units = 'K'  # S.I. units
ta.long_name = 'Air Temperature'
ta.comment = 'Randomly generated data'
ta.history = 'random function from numpy'

ta.info()

*** Description of Slab ta ***
id: ta
shape: (120, 17, 45, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: N/A
grid_type: N/A
time_statistic: 
long_name: Air Temperature
units: K
tileIndex: None
comment: Randomly generated data
history: random function from numpy
No grid present.
** Dimension 1 **
   id: axis_0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x2aaacf907898
** Dimension 2 **
   id: axis_1
   Length: 17
   First:  0.0
   Last:   16.0
   Python id:  0x2aaacf907e10
** Dimension 3 **
   id: axis_2
   Length: 45
   First:  0.0
   Last:   44.0
   Python id:  0x2aaacf907e80
** Dimension 4 **
   id: axis_3
   Length: 72
   First:  0.0
   Last:   71.0
   Python id:  0x2aaacf907ef0
*** End of description for ta ***


### Step 3: Dimensions

OK, we can now make some sense of what this data is, but we still have no idea which spatio-temporal blob of temperature data we're talking about. Let's add some dimensions.

Our data represents 10 years of monthly temperature averages.

In [7]:
import cdms2
tim = cdms2.createAxis(list(range(120)))  # 120 months
tim.id = 'time'  # time axis
tim.units = "months since 2000"  # units
tim.designateTime()  # not necessary since cdms can determine this from id and units, but still worth doing

It is also important to set the bounds properly; for this we will use the `cdutil` module (more on this later).

In [8]:
import cdutil
cdutil.setTimeBoundsMonthly(tim)

We should probably also set a calendar; let's use `cdtime` (more on this later).

In [9]:
import cdtime
tim.setCalendar(cdtime.GregorianCalendar)

Finally, `months since`, while convenient when creating data, is not a very good unit (the month length varies from month to month, year to year, and even calendar to calendar). Let's switch to `days since` with the `toRelativeTime()` function; notice that the bounds are properly converted.

In [10]:
print(tim.getBounds()[:3])
tim.toRelativeTime("days since 2000")
print(tim.getBounds()[:3])

[[0. 1.]
 [1. 2.]
 [2. 3.]]
[[ 0. 31.]
 [31. 60.]
 [60. 91.]]
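To see where these day values come from, here is a minimal sketch that rebuilds the monthly bounds with Python's standard `datetime` module instead of `cdms2`. The helper name `monthly_bounds_in_days` is hypothetical, and `datetime` uses the proleptic Gregorian calendar, which is assumed here to match the calendar set on the axis.

```python
from datetime import date

def monthly_bounds_in_days(n_months, start_year=2000):
    """Hypothetical helper: (lower, upper) bounds of each month in days since Jan 1 of start_year."""
    origin = date(start_year, 1, 1)
    bounds = []
    for month in range(n_months):
        lower = date(start_year + month // 12, month % 12 + 1, 1)
        upper = date(start_year + (month + 1) // 12, (month + 1) % 12 + 1, 1)
        bounds.append(((lower - origin).days, (upper - origin).days))
    return bounds

bounds = monthly_bounds_in_days(120)
print(bounds[:3])   # [(0, 31), (31, 60), (60, 91)]
print(bounds[-1])   # (3622, 3653)
```

February 2000 has 29 days, which is why the second bound ends at 60 rather than 59.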


Our data spans the 17 standard pressure levels, so let's define the corresponding axis.

In [16]:
levels = [1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 70, 50, 30, 20, 10]
lev = cdms2.createAxis(levels)
lev.id = "level"
lev.units = "hPa"
lev.designateLevel()  # Not necessary since cdms can determine this from units and name already

Now let's create our spatial grid. We could create both latitude and longitude the same way as above, e.g.:

In [17]:
import numpy
# numpy.arange excludes the stop value, so this gives 44 points (-88 to 84)
lat = cdms2.createAxis(numpy.arange(-88, 88, 4))
lat.id = "latitude"
lat.designateLatitude()
lat.units = "degrees_north"
# Now let's set the bounds
print("Original bounds:", lat.getBounds())
lat.setBounds(None)  # Force cdms creation of bounds
print("Auto generated bounds:", lat.getBounds())

Original bounds: [[-90. -86.]
 [-86. -82.]
 [-82. -78.]
 [-78. -74.]
 [-74. -70.]
 [-70. -66.]
 [-66. -62.]
 [-62. -58.]
 [-58. -54.]
 [-54. -50.]
 [-50. -46.]
 [-46. -42.]
 [-42. -38.]
 [-38. -34.]
 [-34. -30.]
 [-30. -26.]
 [-26. -22.]
 [-22. -18.]
 [-18. -14.]
 [-14. -10.]
 [-10.  -6.]
 [ -6.  -2.]
 [ -2.   2.]
 [  2.   6.]
 [  6.  10.]
 [ 10.  14.]
 [ 14.  18.]
 [ 18.  22.]
 [ 22.  26.]
 [ 26.  30.]
 [ 30.  34.]
 [ 34.  38.]
 [ 38.  42.]
 [ 42.  46.]
 [ 46.  50.]
 [ 50.  54.]
 [ 54.  58.]
 [ 58.  62.]
 [ 62.  66.]
 [ 66.  70.]
 [ 70.  74.]
 [ 74.  78.]
 [ 78.  82.]
 [ 82.  86.]]
Auto generated bounds: [[-90. -86.]
 [-86. -82.]
 [-82. -78.]
 [-78. -74.]
 [-74. -70.]
 [-70. -66.]
 [-66. -62.]
 [-62. -58.]
 [-58. -54.]
 [-54. -50.]
 [-50. -46.]
 [-46. -42.]
 [-42. -38.]
 [-38. -34.]
 [-34. -30.]
 [-30. -26.]
 [-26. -22.]
 [-22. -18.]
 [-18. -14.]
 [-14. -10.]
 [-10.  -6.]
 [ -6.  -2.]
 [ -2.   2.]
 [  2.   6.]
 [  6.  10.]
 [ 10.  14.]
 [ 14.  18.]
 [ 18.  22.]
 [ 22.  26.]
 [ 26.  30
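The bounds printed above sit halfway between neighbouring points and stop at the poles. Here is a minimal pure-Python sketch of that idea; the helper name `midpoint_bounds` and its clamping limits are assumptions for illustration, not the `cdms2` implementation.

```python
def midpoint_bounds(points, lower_limit=-90.0, upper_limit=90.0):
    """Hypothetical helper: cell bounds halfway between points, clamped to the given limits."""
    # Outer edges extend half a step beyond the first and last points
    edges = [points[0] - (points[1] - points[0]) / 2]
    for left, right in zip(points, points[1:]):
        edges.append((left + right) / 2)
    edges.append(points[-1] + (points[-1] - points[-2]) / 2)
    # Latitudes cannot go past the poles
    edges = [min(max(edge, lower_limit), upper_limit) for edge in edges]
    return list(zip(edges[:-1], edges[1:]))

lat_points = [float(value) for value in range(-88, 88, 4)]
lat_bounds = midpoint_bounds(lat_points)
print(len(lat_points))                # 44
print(lat_bounds[0], lat_bounds[-1])  # (-90.0, -86.0) (82.0, 86.0)
```

Like `numpy.arange`, `range` excludes the stop value, so the last point is 84 and the last cell ends at 86.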

The same could be done for longitude, but another way is to use `cdms2`'s pre-built grid generation functions.

In [18]:
cdms2.createUniformGrid?

Signature:
cdms2.createUniformGrid(
    startLat,
    nlat,
    deltaLat,
    startLon,
    nlon,
    deltaLon,
    order='yx',
    mask=None,
)
Docstring: Not documented
File:      /opt/conda/lib/python3.7/site-packages/cdms2/grid.py
Type:      Proxy


In [19]:
# 45 latitudes every 4 degrees from -88, 72 longitudes every 5 degrees from 2.5
grid = cdms2.createUniformGrid(-88., 45, 4., 2.5, 72, 5.)
lat = grid.getLatitude()
lon = grid.getLongitude()
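A uniform axis is fully described by its start, its number of points and its spacing. Here is a minimal pure-Python sketch of the values such a grid covers, assuming the 4 degree latitude and 5 degree longitude spacing stated at the start of this notebook; the helper name `uniform_axis` is hypothetical.

```python
def uniform_axis(start, count, delta):
    """Hypothetical helper: evenly spaced axis values, start + i * delta."""
    return [start + index * delta for index in range(count)]

lat_values = uniform_axis(-88.0, 45, 4.0)
lon_values = uniform_axis(2.5, 72, 5.0)

print(lat_values[0], lat_values[-1])  # -88.0 88.0
print(lon_values[0], lon_values[-1])  # 2.5 357.5
```

With 72 points, a 5 degree step is what makes the longitudes cover the full 360 degrees.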

Gaussian grids can also be generated based on their number of latitudes:

In [20]:
cdms2.createGaussianGrid?

Signature: cdms2.createGaussianGrid(nlats, xorigin=0.0, order='yx')
Docstring:
Create a Gaussian grid, with shape (nlats, 2*nlats).

Parameters
----------
nlats : is the number of latitudes.

xorigin : is the origin of the longitude axis

order : is either "yx" or "xy"
File:      /opt/conda/lib/python3.7/site-packages/cdms2/grid.py
Type:      Proxy


In [21]:
T42_gaussian_grid = cdms2.createGaussianGrid(64)
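For the curious, Gaussian latitudes are the arcsines of the roots of the Legendre polynomial of degree `nlats`, which is why the number of latitudes is enough to define them. The sketch below finds those roots with Newton's method in pure Python. The function name `gaussian_latitudes` is hypothetical, and this is an illustration of the definition, not necessarily how `cdms2` computes its grid.

```python
import math

def gaussian_latitudes(nlats):
    """Hypothetical helper: Gaussian latitudes in degrees, ordered north to south."""
    latitudes = []
    for k in range(1, nlats + 1):
        # Standard first guess for the k-th root of the Legendre polynomial
        x = math.cos(math.pi * (k - 0.25) / (nlats + 0.5))
        for _ in range(100):
            # Recurrence: p1 ends up as P_nlats(x), p0 as P_(nlats-1)(x)
            p0, p1 = 1.0, x
            for n in range(2, nlats + 1):
                p0, p1 = p1, ((2 * n - 1) * x * p1 - (n - 1) * p0) / n
            # Derivative of P_nlats at x, then one Newton step
            derivative = nlats * (x * p1 - p0) / (x * x - 1.0)
            step = p1 / derivative
            x -= step
            if abs(step) < 1e-15:
                break
        latitudes.append(math.degrees(math.asin(x)))
    return latitudes

t42_lats = gaussian_latitudes(64)
print(len(t42_lats), round(t42_lats[0], 2), round(t42_lats[-1], 2))
```

Note that the latitudes are symmetric about the equator but not evenly spaced, and none of them sits exactly on a pole.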

For non-rectilinear grids (bi-polar, curvilinear, cubed-sphere, etc.), please see [this tutorial](https://cdat.llnl.gov/Jupyter-notebooks/cdms/Creating_Non_Rectiilinear_Grids_From_Scratch/Creating_Non_Rectiilinear_Grids_From_Scratch.html).

At this point we have all of our axes, so let's *decorate* our variable.

We can do this one axis at a time:

In [22]:
ta.setAxis(0, tim)
ta.setAxis(1, lev)
# Negative indexing is also supported:
ta.setAxis(-1, lon)
ta.setAxis(-2, lat)

Or we can set all the axes at once:

In [23]:
ta.setAxisList([tim, lev, lat, lon])
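An axis can only describe a dimension of the same length, so it is worth checking lengths before attaching axes. Here is a minimal pure-Python sketch of such a check; the helper `check_axes_match_shape` is hypothetical and works on any sequences, it is not a `cdms2` function.

```python
def check_axes_match_shape(shape, axes):
    """Hypothetical helper: raise if any axis length differs from its dimension."""
    if len(axes) != len(shape):
        raise ValueError(f"expected {len(shape)} axes, got {len(axes)}")
    for index, (size, axis) in enumerate(zip(shape, axes)):
        if len(axis) != size:
            raise ValueError(f"axis {index} has length {len(axis)}, dimension has {size}")

shape = (120, 17, 45, 72)

# Stand-ins for time, level, latitude and longitude axes
good_axes = [range(120), range(17), range(45), range(72)]
check_axes_match_shape(shape, good_axes)  # passes silently

# A 44-point latitude axis does not fit the 45-point dimension
bad_axes = [range(120), range(17), range(44), range(72)]
try:
    check_axes_match_shape(shape, bad_axes)
    mismatch_message = None
except ValueError as error:
    mismatch_message = str(error)
print(mismatch_message)  # axis 2 has length 44, dimension has 45
```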

Let's take a look at `ta` now:

In [24]:
ta.info()

*** Description of Slab ta ***
id: ta
shape: (120, 17, 45, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: Air Temperature
units: K
tileIndex: None
comment: Randomly generated data
history: random function from numpy
Grid has Python id 0x2aaacf966b38.
Gridtype: generic
Grid shape: (45, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  days since 2000
   Length: 120
   First:  0
   Last:   3622
   Other axis attributes:
      axis: T
      calendar: proleptic_gregorian
   Python id:  0x2aaacf966da0
** Dimension 2 **
   id: level
   Designated a level axis.
   units:  hPa
   Length: 17
   First:  1000
   Last:   10
   Other axis attributes:
      axis: Z
   Python id:  0x2aaad2e0cc88
** Dimension 3 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 45
   First:  -88.0
   Last:   88.0
   Other axis attributes:
      axis: Y
   Python id:  0x2aaad2e739b0
** Dim

Much better, but while we now understand its spatio-temporal representation, we still do not understand what this variable actually is. Let's further decorate it.

In [25]:
# Name:
ta.id = "ta"
ta.long_name = "Air Temperature"
# Units
ta.units = "K"

We could add as many attributes as we wish, but that's enough for our purpose here.

At this point we have a fully decorated variable that we can use.

`MV2` operations will preserve dimensions as much as possible, e.g.:

In [26]:
ta_C = ta - 273.15
ta_C.info()  # Axes and grid are kept; note that the id is reset to a default

*** Description of Slab variable_2 ***
id: variable_2
shape: (120, 17, 45, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: Air Temperature
units: K
tileIndex: None
comment: Randomly generated data
history: random function from numpy
Grid has Python id 0x2aaad2fa4358.
Gridtype: generic
Grid shape: (45, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  days since 2000
   Length: 120
   First:  0
   Last:   3622
   Other axis attributes:
      axis: T
      calendar: proleptic_gregorian
   Python id:  0x2aaacf966748
** Dimension 2 **
   id: level
   Designated a level axis.
   units:  hPa
   Length: 17
   First:  1000
   Last:   10
   Other axis attributes:
      axis: Z
   Python id:  0x2aaad2ddacf8
** Dimension 3 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 45
   First:  -88.0
   Last:   88.0
   Other axis attributes:
      axis: Y
   Python id:  0x2aa

We can quickly put the attributes back if we wish. For more, see [here](https://cdat.llnl.gov/Jupyter-notebooks/cdms/Redecorate+Transient+Variable/Redecorate+Transient+Variable.html).

In [27]:
for attribute in ta.attributes:
    print("ATTR:", attribute)
    setattr(ta_C, attribute, getattr(ta, attribute))
# id is a 'special' attribute, needs to be added manually
ta_C.id = "ta"
ta_C.info()

ATTR: name
ATTR: tileIndex
ATTR: units
ATTR: long_name
ATTR: comment
ATTR: history
*** Description of Slab ta ***
id: ta
shape: (120, 17, 45, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: Air Temperature
units: K
tileIndex: None
comment: Randomly generated data
history: random function from numpy
Grid has Python id 0x2aaad2fa4358.
Gridtype: generic
Grid shape: (45, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  days since 2000
   Length: 120
   First:  0
   Last:   3622
   Other axis attributes:
      axis: T
      calendar: proleptic_gregorian
   Python id:  0x2aaacf966748
** Dimension 2 **
   id: level
   Designated a level axis.
   units:  hPa
   Length: 17
   First:  1000
   Last:   10
   Other axis attributes:
      axis: Z
   Python id:  0x2aaad2ddacf8
** Dimension 3 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 45
   First:  -88.0
   Last: 
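The same copy pattern can be sketched on plain Python objects, which makes it easy to pick which attributes to carry over. Everything here is a stand-in: `copy_attributes` is a hypothetical helper and the `SimpleNamespace` objects only mimic a variable's attributes. One thing worth considering after a Kelvin to Celsius conversion is setting `units` by hand rather than copying it.

```python
from types import SimpleNamespace

def copy_attributes(source, target, names):
    """Hypothetical helper: copy the named attributes from source to target."""
    for name in names:
        setattr(target, name, getattr(source, name))
    return target

# Stand-ins for the Kelvin variable and its Celsius counterpart
ta_like = SimpleNamespace(units="K", long_name="Air Temperature",
                          comment="Randomly generated data")
ta_c_like = SimpleNamespace()

# Copy the descriptive attributes, but not the units
copy_attributes(ta_like, ta_c_like, ["long_name", "comment"])
ta_c_like.units = "degC"  # values are now in Celsius

print(ta_c_like.long_name, ta_c_like.units)  # Air Temperature degC
```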

`MV2` gives you access to most `numpy.ma` functions:

In [28]:
dir(MV2)

['AbstractRectGrid',
 'AbstractVariable',
 'CDMSError',
 'TransientAxis',
 'TransientVariable',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'absolute',
 'add',
 'all',
 'allclose',
 'allequal',
 'alltrue',
 'any',
 'arange',
 'arccos',
 'arcsin',
 'arctan',
 'arctan2',
 'argsort',
 'around',
 'array',
 'arrayrange',
 'asVariable',
 'as_masked',
 'asarray',
 'average',
 'axisAllclose',
 'axisConcatenate',
 'axisTake',
 'bitwise_and',
 'bitwise_or',
 'bitwise_xor',
 'byte',
 'ceil',
 'character',
 'choose',
 'commonAxes',
 'commonDomain',
 'commonGrid',
 'commonGrid1',
 'common_fill_value',
 'compress',
 'concatenate',
 'conjugate',
 'cos',
 'cosh',
 'count',
 'counter',
 'create_mask',
 'diagonal',
 'divide',
 'dot',
 'dump',
 'e',
 'equal',
 'exp',
 'fabs',
 'fill_value',
 'filled',
 'float',
 'float32',
 'float64',
 'floor',
 'floor_divide',
 'fmod',
 'fromfunction',
 'getNumericCompatibility',
 'get_print_limit',


Functions that are not re-implemented in `MV2` will lose the dimensions, but will usually still return MV2 variables.

In [29]:
numpy.arccosh(ta).info()

*** Description of Slab ta ***
id: ta
shape: (120, 17, 45, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: Air Temperature
units: K
id: ta
tileIndex: None
comment: Randomly generated data
history: random function from numpy
Grid has Python id 0x2aaacf966b38.
Gridtype: generic
Grid shape: (45, 72)
Order: yx
** Dimension 1 **
   id: axis_0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x2aaad2fa4b38
** Dimension 2 **
   id: axis_1
   Length: 17
   First:  0.0
   Last:   16.0
   Python id:  0x2aaad2fa4cc0
** Dimension 3 **
   id: axis_2
   Length: 45
   First:  0.0
   Last:   44.0
   Python id:  0x2aaad2fa4cf8
** Dimension 4 **
   id: axis_3
   Length: 72
   First:  0.0
   Last:   71.0
   Python id:  0x2aaad2fa4d68
*** End of description for ta ***


At this point, let's proceed to our [writing tutorial](03_Writing_Data_To_File.ipynb).