PMP Json Formatted I/O
======================

# Reading in PMP's json files

This section shows how to read in the JSON files generated by PMP and select pieces of them.

We assume you are running this notebook from its own directory in the [pcmdi_metrics](https://github.com/pcmdi/pcmdi_metrics) repo.


In [1]:
# Set up the notebook
from __future__ import print_function
import pcmdi_metrics

# JSON results written by the PMP installation test (paths relative to this notebook)
json1 = "../../tests/pcmdi_install_test_results/metrics_results/installationTest/tas_2.5x2.5_regrid2_linear_metrics.json"
json2 = "../../tests/pcmdi_install_test_results/metrics_results/installationTest/tos_2.5x2.5_esmf_linear_metrics_2.json"




## Reader object

Let's create our JSON reader object by pointing it at the desired files:

In [2]:
J1 = pcmdi_metrics.io.base.JSONs([json1, json2])

## Querying the reader object

Let's query the object. First, which axes are available?
In other words, what is the overall JSON structure of the files we read in?

In [3]:
J1.getAxisIds()

['variable', 'model', 'reference', 'rip', 'region', 'statistic', 'season']

Next, we get a little more information by retrieving the actual cdms2 axes that would be generated by reading everything in.

Note that the axis lengths now reflect the **total** number of possible values across all the files read. For example, each file here contains **ONE** variable, but reading both produces **TWO** variables.

In [4]:
J1.getAxisList()

[   id: variable
    Length: 2
    First:  tas
    Last:   tos
    Python id:  0x7fe15e1d9ba8,
    id: model
    Length: 2
    First:  GFDL-ESM2G
    Last:   GFDL-ESM2Gb
    Python id:  0x7fe15e1d9a58,
    id: reference
    Length: 2
    First:  SimulationDescription
    Last:   defaultReference
    Python id:  0x7fe15e1d9940,
    id: rip
    Length: 2
    First:  r1i1p1
    Last:   r2i1p1
    Python id:  0x7fe15e1d9e10,
    id: region
    Length: 6
    First:  NHEX
    Last:   terre
    Python id:  0x7fe15e1d9908,
    id: statistic
    Length: 16
    First:  bias_xy
    Last:   std_xyt
    Python id:  0x7fe15e1d9c18,
    id: season
    Length: 5
    First:  ann
    Last:   son
    Python id:  0x7fe15e1d9da0]
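To see why `variable` has length 2, here is a minimal plain-Python sketch that gathers each axis's values as the union of the keys found at that nesting level across several files. It assumes a simplified nested-dict layout with only four levels and made-up leaf values; `union_axis_values` is a hypothetical helper, not the `JSONs` implementation.

```python
def union_axis_values(trees, axis_ids):
    """Hypothetical helper: {axis_id: sorted keys seen at that depth in any tree}."""
    values = {axis: set() for axis in axis_ids}

    def walk(node, depth):
        if depth == len(axis_ids) or not isinstance(node, dict):
            return  # reached the leaf numbers (the statistics themselves)
        values[axis_ids[depth]].update(node.keys())
        for child in node.values():
            walk(child, depth + 1)

    for tree in trees:
        walk(tree, 0)
    return {axis: sorted(keys) for axis, keys in values.items()}


# Two toy "files", each holding ONE variable (simplified layout, made-up values).
file1 = {"tas": {"GFDL-ESM2G": {"r1i1p1": {"global": 1.2}}}}
file2 = {"tos": {"GFDL-ESM2Gb": {"r2i1p1": {"global": 0.8}}}}

axes = union_axis_values([file1, file2], ["variable", "model", "rip", "region"])
print(axes["variable"])  # ['tas', 'tos']: length 2 although each file has one
```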

We can also retrieve a single, specific axis:

In [5]:
J1.getAxis("statistic")

   id: statistic
   Length: 16
   First:  bias_xy
   Last:   std_xyt
   Python id:  0x7fe15e1d99b0

Let's print all the values in that axis:

In [6]:
J1.getAxis("statistic")[:]

array(['bias_xy', 'cor_xy', 'mae_xy', 'mean-obs_xy', 'mean_xy',
       'rms_devzm', 'rms_xy', 'rms_xyt', 'rms_y', 'rmsc_xy', 'std-obs_xy',
       'std-obs_xy_devzm', 'std-obs_xyt', 'std_xy', 'std_xy_devzm',
       'std_xyt'], dtype='<U16')

## Reading in data

### All of it

Now let's read **everything** in:

In [7]:
data = J1()
data.shape

(2, 2, 2, 2, 6, 16, 5)
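The shape above is simply the length of every axis reported by `getAxisList()`, one array cell per combination of axis values. A minimal NumPy sketch of that idea, assuming combinations absent from every file are left masked; the `lookup` and `to_masked_array` helpers and the three-level toy data are hypothetical illustrations, not PMP code.

```python
import itertools

import numpy as np


def lookup(tree, keys):
    """Follow `keys` down a nested dict; return None if any key is missing."""
    node = tree
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_masked_array(trees, axis_ids, axis_values):
    """Hypothetical helper: one cell per combination; cells no tree provides stay masked."""
    shape = tuple(len(axis_values[a]) for a in axis_ids)
    out = np.ma.masked_all(shape, dtype=float)
    for index in itertools.product(*(range(n) for n in shape)):
        keys = [axis_values[a][i] for a, i in zip(axis_ids, index)]
        for tree in trees:
            value = lookup(tree, keys)
            if value is not None:
                out[index] = value  # assigning a value also unmasks the cell
                break
    return out


axis_ids = ["variable", "model", "rip"]
axis_values = {
    "variable": ["tas", "tos"],
    "model": ["GFDL-ESM2G", "GFDL-ESM2Gb"],
    "rip": ["r1i1p1", "r2i1p1"],
}
file1 = {"tas": {"GFDL-ESM2G": {"r1i1p1": 1.2}}}  # made-up values
file2 = {"tos": {"GFDL-ESM2Gb": {"r2i1p1": 0.8}}}

data = to_masked_array([file1, file2], axis_ids, axis_values)
print(data.shape)    # (2, 2, 2)
print(data.count())  # 2: only two of the 8 cells hold a value
```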

### Getting only some elements

#### For one dimension

But we might not be interested in everything, so let's subset the **statistic** dimension.
Note that the length of that dimension in the output array went from 16 down to 2.

In [8]:
data = J1(statistic=['rms_xy','std_xy'])
print(data.shape)
data.getAxis(-2)[:]

(2, 2, 2, 2, 6, 2, 5)


array(['rms_xy', 'std_xy'], dtype='<U6')
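Conceptually, subsetting one axis amounts to looking up the index of each requested label and taking those positions along that axis. A minimal NumPy sketch using the statistic names listed above and a dummy all-zero array with the full shape; `subset_axis` is a hypothetical helper, not part of pcmdi_metrics.

```python
import numpy as np

statistics = ['bias_xy', 'cor_xy', 'mae_xy', 'mean-obs_xy', 'mean_xy',
              'rms_devzm', 'rms_xy', 'rms_xyt', 'rms_y', 'rmsc_xy', 'std-obs_xy',
              'std-obs_xy_devzm', 'std-obs_xyt', 'std_xy', 'std_xy_devzm',
              'std_xyt']


def subset_axis(data, axis_values, axis, wanted):
    """Hypothetical helper: keep only the `wanted` labels along `axis`, in that order."""
    indices = [axis_values.index(label) for label in wanted]
    return np.take(data, indices, axis=axis)


full = np.zeros((2, 2, 2, 2, 6, 16, 5))  # same shape as J1() above
subset = subset_axis(full, statistics, axis=-2, wanted=['rms_xy', 'std_xy'])
print(subset.shape)  # (2, 2, 2, 2, 6, 2, 5)
```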

#### For multiple dimensions

Of course, we can subset multiple axes at once. Notice that the region axis is now smaller as well (3 values instead of 6).

In [9]:
data = J1(statistic=['rms_xy','std_xy'], region=['NHEX', 'global', 'terre'])
print(data.shape)
data.getAxis(-3)[:]

(2, 2, 2, 2, 3, 2, 5)


array(['NHEX', 'global', 'terre'], dtype='<U6')
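Subsetting several axes can be pictured as applying one selection per axis; because each selection only touches its own axis, they can be applied in any order. A short NumPy sketch on a dummy array with the full shape, using the six region names as they appear in this notebook's outputs (the array contents are placeholders).

```python
import numpy as np

regions = ['NHEX', 'SHEX', 'TROPICS', 'global', 'ocean', 'terre']
full = np.zeros((2, 2, 2, 2, 6, 16, 5))  # region is axis -3, statistic is axis -2

# 'rms_xy' and 'std_xy' sit at positions 6 and 13 of the statistic axis listed above.
step = np.take(full, [6, 13], axis=-2)
# Then subset the region axis of the already-reduced array.
both = np.take(step, [regions.index(r) for r in ['NHEX', 'global', 'terre']], axis=-3)
print(both.shape)  # (2, 2, 2, 2, 3, 2, 5)
```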

#### Reordering the elements

You can also reorder the values along an axis as the data are read in: they come back in the order you request them. Notice that 'terre' is now first.

In [10]:
data = J1(statistic=['rms_xy','std_xy'], region=['terre', 'NHEX', 'global'])
print(data.shape)
data.getAxis(-3)[:]

(2, 2, 2, 2, 3, 2, 5)


array(['terre', 'NHEX', 'global'], dtype='<U6')
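Reordering needs no extra step when the index list is built in the requested order. A minimal sketch with a toy 1-D array whose entries are simply the region positions (placeholder data, not PMP output):

```python
import numpy as np

regions = ['NHEX', 'SHEX', 'TROPICS', 'global', 'ocean', 'terre']
wanted = ['terre', 'NHEX', 'global']

indices = [regions.index(r) for r in wanted]
print(indices)  # [5, 0, 3]

values = np.arange(len(regions))  # toy data: each cell holds its own position
reordered = np.take(values, indices)
print(reordered)  # [5 0 3]
print([regions[i] for i in reordered])  # ['terre', 'NHEX', 'global']
```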

#### Merging dimensions together

##### Two dimensions merged in one

Sometimes it can be useful to *join* two or more dimensions together.
For example, let's merge `model` and `rip`.

In [11]:
data = J1(merge=['model','rip'])
print(data.shape)
data.getAxis(1)[:]

(2, 3, 1, 6, 16, 5)




array(['GFDL-ESM2G_r1i1p1', 'GFDL-ESM2Gb_r1i1p1', 'GFDL-ESM2Gb_r2i1p1'],
      dtype='<U18')

Notice that the resulting array now has 6 dimensions rather than 7.

The newly constructed axis takes all possible values of `model` and matches them with all possible values of `rip`.

Note that although every model/rip combination is considered, `GFDL-ESM2G_r2i1p1` was excluded because that combination contained no valid data.
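The merge can be sketched in NumPy as three steps: move the merged axes next to each other, flatten them into one axis labelled with the joined values, then drop every combination that is entirely missing. This is a minimal sketch, not the pcmdi_metrics code: it assumes NaN marks missing data, `merge_axes` is a hypothetical helper, and placing the merged axis where its earliest member used to be is inferred from the shapes printed in this notebook.

```python
import itertools

import numpy as np


def merge_axes(data, axis_values, axes):
    """Hypothetical helper: merge the axes listed in `axes` into a single axis.

    data: ndarray in which NaN marks missing values
    axis_values: one list of labels per axis of `data`
    axes: axis indices to merge, in the order their labels are joined
    """
    first = min(axes)
    # Move the merged axes to the front, in join order, and flatten them together.
    moved = np.moveaxis(data, list(axes), list(range(len(axes))))
    n_combos = int(np.prod([data.shape[i] for i in axes]))
    flat = moved.reshape((n_combos,) + moved.shape[len(axes):])
    labels = ["_".join(combo)
              for combo in itertools.product(*(axis_values[i] for i in axes))]
    # Drop combinations that contain no valid (non-NaN) value at all.
    keep = [k for k in range(n_combos) if not np.isnan(flat[k]).all()]
    flat = flat[keep]
    labels = [labels[k] for k in keep]
    # Put the merged axis back where the earliest merged axis used to be.
    return np.moveaxis(flat, 0, first), labels


# Toy data with axes (variable, model, rip); GFDL-ESM2G has no r2i1p1 values.
data = np.array([[[1.0, np.nan],
                  [2.0, 3.0]]])
axis_values = [["tas"], ["GFDL-ESM2G", "GFDL-ESM2Gb"], ["r1i1p1", "r2i1p1"]]

merged, labels = merge_axes(data, axis_values, [1, 2])
print(merged.shape)  # (1, 3)
print(labels)  # ['GFDL-ESM2G_r1i1p1', 'GFDL-ESM2Gb_r1i1p1', 'GFDL-ESM2Gb_r2i1p1']
```

Passing `[2, 1]` instead of `[1, 2]` joins the labels rip-first, which is the situation explored next.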

##### Order matters

Let's switch the order in which we combine these dimensions:

In [12]:
data = J1(merge=['rip','model'])
print(data.shape)
data.getAxis(1)[:]

(2, 3, 1, 6, 16, 5)


array(['r1i1p1_GFDL-ESM2G', 'r1i1p1_GFDL-ESM2Gb', 'r2i1p1_GFDL-ESM2Gb'],
      dtype='<U18')

Notice that each axis value now starts with the `rip` value followed by the `model` value, the opposite of the previous example.
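In terms of labels, the first dimension listed in `merge` varies slowest and supplies the prefix. A minimal sketch reproducing both label lists with `itertools.product`; the set of (model, rip) pairs that hold data is an assumption read off the outputs above.

```python
import itertools

models = ["GFDL-ESM2G", "GFDL-ESM2Gb"]
rips = ["r1i1p1", "r2i1p1"]
# (model, rip) pairs that hold data in these files, as seen in the outputs above
has_data = {("GFDL-ESM2G", "r1i1p1"), ("GFDL-ESM2Gb", "r1i1p1"),
            ("GFDL-ESM2Gb", "r2i1p1")}

model_rip = [m + "_" + r for m, r in itertools.product(models, rips)
             if (m, r) in has_data]
rip_model = [r + "_" + m for r, m in itertools.product(rips, models)
             if (m, r) in has_data]
print(model_rip)  # ['GFDL-ESM2G_r1i1p1', 'GFDL-ESM2Gb_r1i1p1', 'GFDL-ESM2Gb_r2i1p1']
print(rip_model)  # ['r1i1p1_GFDL-ESM2G', 'r1i1p1_GFDL-ESM2Gb', 'r2i1p1_GFDL-ESM2Gb']
```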

##### Combining more than two dimensions together

You can also merge more than two dimensions together. Again, the axis values are all the possible combinations that are not entirely empty.

In [13]:
data = J1(merge=['statistic', 'region', 'season'])
print(data.shape)
data.getAxis(-1)[:]


(2, 2, 1, 2, 312)


array(['bias_xy_NHEX_ann', 'bias_xy_NHEX_djf', 'bias_xy_NHEX_jja',
       'bias_xy_NHEX_mam', 'bias_xy_NHEX_son', 'bias_xy_SHEX_ann',
       'bias_xy_SHEX_djf', 'bias_xy_SHEX_jja', 'bias_xy_SHEX_mam',
       'bias_xy_SHEX_son', 'bias_xy_TROPICS_ann', 'bias_xy_TROPICS_djf',
       'bias_xy_TROPICS_jja', 'bias_xy_TROPICS_mam',
       'bias_xy_TROPICS_son', 'bias_xy_global_ann', 'bias_xy_global_djf',
       'bias_xy_global_jja', 'bias_xy_global_mam', 'bias_xy_global_son',
       'bias_xy_ocean_ann', 'bias_xy_ocean_djf', 'bias_xy_ocean_jja',
       'bias_xy_ocean_mam', 'bias_xy_ocean_son', 'bias_xy_terre_ann',
       'bias_xy_terre_djf', 'bias_xy_terre_jja', 'bias_xy_terre_mam',
       'bias_xy_terre_son', 'cor_xy_NHEX_ann', 'cor_xy_NHEX_djf',
       'cor_xy_NHEX_jja', 'cor_xy_NHEX_mam', 'cor_xy_NHEX_son',
       'cor_xy_SHEX_ann', 'cor_xy_SHEX_djf', 'cor_xy_SHEX_jja',
       'cor_xy_SHEX_mam', 'cor_xy_SHEX_son', 'cor_xy_TROPICS_ann',
       'cor_xy_TROPICS_djf', 'cor_xy_TROPICS_jja', ...
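The number of candidate labels is the product of the merged axis lengths. A quick plain-Python check using the axis values listed earlier: 16 statistics × 6 regions × 5 seasons give 480 candidate combinations, so the 312 values reported above suggest that 168 combinations held no data and were dropped.

```python
import itertools

statistics = ['bias_xy', 'cor_xy', 'mae_xy', 'mean-obs_xy', 'mean_xy',
              'rms_devzm', 'rms_xy', 'rms_xyt', 'rms_y', 'rmsc_xy', 'std-obs_xy',
              'std-obs_xy_devzm', 'std-obs_xyt', 'std_xy', 'std_xy_devzm',
              'std_xyt']
regions = ['NHEX', 'SHEX', 'TROPICS', 'global', 'ocean', 'terre']
seasons = ['ann', 'djf', 'jja', 'mam', 'son']

# Every statistic/region/season combination, with statistic varying slowest.
candidates = ["_".join(c) for c in itertools.product(statistics, regions, seasons)]
print(len(candidates))  # 480
print(candidates[:3])   # ['bias_xy_NHEX_ann', 'bias_xy_NHEX_djf', 'bias_xy_NHEX_jja']
print(len(candidates) - 312)  # 168 candidates absent from the merged axis above
```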

#### Merging and subsetting can be combined

Merging can be combined with subsetting, which lets you choose both the order and the number of values for each merged dimension.

In [14]:
data = J1(merge=['statistic', 'region', 'season'],
          season=['mam','djf'], statistic=['rms_xy', 'cor_xy'], region=['global','TROPICS'])
print(data.shape)
data.getAxis(-1)[:]


(2, 2, 1, 2, 8)


array(['rms_xy_global_mam', 'rms_xy_global_djf', 'rms_xy_TROPICS_mam',
       'rms_xy_TROPICS_djf', 'cor_xy_global_mam', 'cor_xy_global_djf',
       'cor_xy_TROPICS_mam', 'cor_xy_TROPICS_djf'], dtype='<U18')
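The merged labels follow the order of the requested values, with the first merged dimension varying slowest. Taking the product of the three subset lists in plain Python reproduces the eight labels above exactly:

```python
import itertools

statistic = ['rms_xy', 'cor_xy']
region = ['global', 'TROPICS']
season = ['mam', 'djf']

labels = ["_".join(c) for c in itertools.product(statistic, region, season)]
print(len(labels))  # 8
print(labels[:2])   # ['rms_xy_global_mam', 'rms_xy_global_djf']
```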

#### Multiple ***combined dimensions*** can be created (and subsetted) at once

You can also build several merged dimensions in a single call by passing a list of lists to `merge`:

In [15]:
data = J1(merge=[['model','rip'], ['statistic', 'region', 'season']],
          season=['mam','djf'], statistic=['rms_xy', 'cor_xy'], region=['global','TROPICS'])
print(data.shape)

(2, 3, 1, 8)
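Judging from the shapes printed in this notebook, each merged group becomes a single axis placed where its earliest member used to be, while unmerged axes keep their order. A minimal sketch of that bookkeeping, assuming a merged axis is named by joining its members with `_`; the helper `merged_axis_ids` and those joined names are hypothetical, so check `data.getAxis(i).id` for the real ids.

```python
def merged_axis_ids(axis_ids, groups):
    """Hypothetical helper: axis ids left after merging each group in `groups`."""
    placed = {}
    for group in groups:
        first = min(axis_ids.index(a) for a in group)
        placed[first] = "_".join(group)  # join in the order the group lists them
    merged = {a for group in groups for a in group}
    result = []
    for i, axis in enumerate(axis_ids):
        if i in placed:
            result.append(placed[i])
        elif axis not in merged:
            result.append(axis)
    return result


axis_ids = ['variable', 'model', 'reference', 'rip', 'region', 'statistic', 'season']
print(merged_axis_ids(axis_ids, [['model', 'rip'], ['statistic', 'region', 'season']]))
# ['variable', 'model_rip', 'reference', 'statistic_region_season']
```

Four axes come out, matching the four entries of `(2, 3, 1, 8)`. The same bookkeeping gives six axes for `merge=['model','rip']` and five for `merge=['statistic', 'region', 'season']`, consistent with the earlier cells.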
