# Accessing Data With CDAT

Data I/O in CDAT is done via the `cdms2` module.

In [1]:
import cdms2

* In-depth documentation can be found at: https://cdms.readthedocs.io/en/latest/
* Some tutorials can be found at: https://cdat.llnl.gov/tutorials.html

## Opening a file

CDMS can open a variety of file formats: NetCDF, GrADS/GRIB (with a .ctl file), pp, cdms2's XML, and OPeNDAP.

To *open* a file, simply use `cdms2`'s `open` function.

As with Python's built-in `open`, this call does not load any data into memory; it simply returns a *handle* to the file.

Let's open one of our local files.

In [2]:
import os
# Build the full path to the data file under the user's home directory
home = os.path.expandvars("$HOME")
path = os.path.join(home, "cmip6_data/CMIP6/CMIP/E3SM-Project/E3SM-1-0/piControl/r1i1p1f1/Amon/tas/gr/v20180608/tas_Amon_E3SM-1-0_piControl_r1i1p1f1_gr_000101-050012.nc_0")
f = cdms2.open(path)

## Querying the file

Now that we have a handle to the file, let's query it a bit.

### Global Attributes

First, let's query the file's global attributes:

In [4]:
print(f.attributes.keys())

dict_keys(['Conventions', 'activity_id', 'branch_method', 'branch_time_in_child', 'branch_time_in_parent', 'contact', 'creation_date', 'data_specs_version', 'description', 'experiment', 'experiment_id', 'external_variables', 'forcing_index', 'frequency', 'further_info_url', 'grid', 'grid_label', 'history', 'initialization_index', 'institution', 'institution_id', 'mip_era', 'nominal_resolution', 'parent_activity_id', 'parent_experiment_id', 'parent_mip_era', 'parent_source_id', 'parent_time_units', 'parent_variant_label', 'physics_index', 'product', 'realization_index', 'realm', 'references', 'source', 'source_id', 'source_type', 'sub_experiment', 'sub_experiment_id', 'table_id', 'table_info', 'title', 'variable_id', 'variant_label', 'license', 'cmor_version', 'tracking_id'])


You can then simply access the values as you would for a regular Python attribute:

In [5]:
print(f.institution)

LLNL (Lawrence Livermore National Laboratory, Livermore, CA 94550, USA); ANL (Argonne National Laboratory, Argonne, IL 60439, USA); BNL (Brookhaven National Laboratory, Upton, NY 11973, USA); LANL (Los Alamos National Laboratory, Los Alamos, NM 87545, USA); LBNL (Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA); ORNL (Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA); PNNL (Pacific Northwest National Laboratory, Richland, WA 99352, USA); SNL (Sandia National Laboratories, Albuquerque, NM 87185, USA). Mailing address: LLNL Climate Program, c/o David C. Bader, Principal Investigator, L-103, 7000 East Avenue, Livermore, CA 94550, USA


### Variables

Now let's query the variables contained in the file:

In [7]:
print(f.variables.keys())

dict_keys(['time_bnds', 'lat_bnds', 'lon_bnds', 'height', 'tas'])


This file contains 5 variables, but only `tas` holds the data of interest: three are bounds variables (`time_bnds`, `lat_bnds`, `lon_bnds`) and `height` is a scalar coordinate.
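
To make that split explicit, here is a minimal sketch in plain Python that works on the list of names printed above. It assumes the bounds variables follow the `<axis>_bnds` naming seen in this particular file; that is a naming habit, not a guarantee, and the names `bounds` and `others` are only illustrative.

```python
# Variable names copied from the output above.
names = ['time_bnds', 'lat_bnds', 'lon_bnds', 'height', 'tas']

# Assumption: bounds variables are named "<axis>_bnds", as in this file.
bounds = [n for n in names if n.endswith("_bnds")]
others = [n for n in names if not n.endswith("_bnds")]

print("bounds:", bounds)
print("others:", others)
```

With a live file handle, the same filtering could be applied to `f.variables.keys()` instead of the hard-coded list.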

Let's obtain a handle to the `tas` variable. Nothing is loaded into memory at this point.

In [8]:
tas = f["tas"]

Here again, we can query the variable's attributes:

In [9]:
print(tas.attributes)

{'missing_value': array([1.e+20], dtype=float32), 'standard_name': 'air_temperature', 'long_name': 'Near-Surface Air Temperature', 'comment': 'near-surface (usually, 2 meter) air temperature', 'units': 'K', 'cell_methods': 'area: time: mean', 'cell_measures': 'area: areacella', 'history': "2018-06-08T17:59:44Z altered by CMOR: Treated scalar dimension: 'height'.", 'coordinates': 'height', '_FillValue': array([1.e+20], dtype=float32), 'ndim': 3}


Let's print a few attributes:

In [10]:
print("units:", tas.units)
print("longname:", tas.long_name)
# or
for att in ["comment", "standard_name"]:
    print("{}: {}".format(att, getattr(tas, att)))

units: K
longname: Near-Surface Air Temperature
comment: near-surface (usually, 2 meter) air temperature
standard_name: air_temperature
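
In plain Python, `getattr` raises `AttributeError` when the name is missing, so passing a default can be handy when looping over attributes that not every variable carries. The sketch below assumes only ordinary Python attribute lookup. `FakeVariable` and `describe` are hypothetical names, not part of `cdms2`; the stand-in class lets the snippet run without a data file, and its values are copied from the output above.

```python
class FakeVariable:
    """Hypothetical stand-in for a variable handle (not part of cdms2)."""
    units = "K"
    long_name = "Near-Surface Air Temperature"


def describe(var, names, default="<not set>"):
    """Return {name: value}, using `default` for attributes that are missing."""
    return {name: getattr(var, name, default) for name in names}


# "positive" is deliberately absent from the stand-in to show the fallback.
summary = describe(FakeVariable(), ["units", "long_name", "positive"])
for name, value in summary.items():
    print("{}: {}".format(name, value))
```

If the `tas` handle behaves as the text describes, `describe(tas, [...])` should work the same way, but that has not been checked here.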


### Dimensions on a variable

Let's now query the variable's dimensions:

In [11]:
print(tas.getAxisIds())

['time', 'lat', 'lon']


We can retrieve an axis via its index:

In [12]:
print(tas.getAxis(2))

   id: lon
   Designated a longitude axis.
   units:  degrees_east
   Length: 256
   First:  0.0
   Last:   358.59375
   Other axis attributes:
      axis: X
      long_name: longitude
      standard_name: longitude
   Python id:  0x2aaaaf08de10



Or via its name:

In [14]:
print(tas.getAxis(tas.getAxisIndex("lat")))

   id: lat
   Designated a latitude axis.
   units:  degrees_north
   Length: 129
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      axis: Y
      long_name: latitude
      standard_name: latitude
   Python id:  0x2aaaaf08def0
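
The two summaries above are enough for a quick sanity check of the grid. The sketch below assumes both axes are evenly spaced, which the printout does not confirm (it shows only the first value, last value, and length), so treat the result as an estimate. `regular_step` is a hypothetical helper, not a `cdms2` function.

```python
def regular_step(first, last, length):
    """Spacing of an evenly spaced axis with `length` points from `first` to `last`."""
    return (last - first) / (length - 1)


# First, last, and length copied from the axis summaries above.
lon_step = regular_step(0.0, 358.59375, 256)
lat_step = regular_step(-90.0, 90.0, 129)

print("lon step:", lon_step)
print("lat step:", lat_step)

# One more step past the last longitude; a value of 360.0 would mean
# the axis covers the globe exactly once without repeating 0 degrees.
lon_wrap = 358.59375 + lon_step
print("next lon after last:", lon_wrap)
```

Under the even-spacing assumption, both axes come out at the same spacing in degrees.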



`time`, `level`, `latitude`, and `longitude`, if present, can be accessed directly without needing to know their exact name or index:

In [15]:
print(tas.getTime())

   id: time
   Designated a time axis.
   units:  days since 0001-01-01 00:00:00
   Length: 6000
   First:  15.5
   Last:   182484.5
   Other axis attributes:
      calendar: noleap
      axis: T
      long_name: time
      standard_name: time
   Python id:  0x2aaaaf08de48
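
With all three axis lengths now known, we can estimate how much memory the full `tas` array would take, which shows why working with a handle first is useful. This is a back-of-the-envelope sketch: it assumes 4 bytes per value (float32), which the `float32` dtype of `missing_value` suggests but does not prove.

```python
# Axis lengths copied from the axis summaries above: time, lat, lon.
shape = (6000, 129, 256)

# Assumption: 4 bytes per value (float32), suggested by the dtype of `missing_value`.
BYTES_PER_VALUE = 4

n_values = 1
for length in shape:
    n_values *= length

size_mib = n_values * BYTES_PER_VALUE / 2**20
print("values:", n_values)
print("approx. size: {:.1f} MiB".format(size_mib))
```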



And here again, we can query an axis's attributes:

In [16]:
time = tas.getTime()
print("Attributes:", time.attributes)
print("units:", time.units)

Attributes: {'units': 'days since 0001-01-01 00:00:00', 'bounds': 'time_bnds', 'calendar': 'noleap', 'axis': 'T', 'long_name': 'time', 'standard_name': 'time'}
units: days since 0001-01-01 00:00:00
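
To see what these time values mean, the sketch below decodes a day offset by hand. It assumes the `noleap` calendar has 365 days every year, that the origin is January 1 of year 1 as in the units string above, and that offsets are non-negative. `noleap_to_date` is a hypothetical helper written for illustration; it is not a CDAT function, and CDAT's own time tools are the better choice in real code.

```python
import math

# Month lengths in a calendar with no leap years.
MONTH_LENGTHS = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_to_date(days_since_origin, origin_year=1):
    """Convert days since January 1 of `origin_year` to (year, month, day).

    Assumes a 365-day year and a non-negative offset; the fractional
    part of the day (the time of day) is dropped.
    """
    whole_days = math.floor(days_since_origin)
    year = origin_year + whole_days // 365
    day_of_year = whole_days % 365  # 0-based index within the year
    month = 1
    for length in MONTH_LENGTHS:
        if day_of_year < length:
            break
        day_of_year -= length
        month += 1
    return year, month, day_of_year + 1


# First and last values copied from the time axis summary above.
print("first:", noleap_to_date(15.5))
print("last: ", noleap_to_date(182484.5))
```

Under these assumptions, the first and last values fall in mid-January of year 1 and mid-December of year 500, which matches the `000101-050012` range in the file name.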


## Bringing in the data

This concludes our first tutorial. Please see [Bringing In The Data](01_Bring_In_Data.ipynb) next.