In [1]:
from IPython.core.display import HTML
HTML("""<style>
.answers { 
    visibility: hidden;
}
</style>""")

In [2]:
from notebook.services.config import ConfigManager
icm = ConfigManager()
icm.update('livereveal', {
              'theme': 'simple',
              'transition': 'convex',
              'start_slideshow_at': 'selected'
});

# Python Common Data Model
Presented by: Barron H. Henderson, Byeong-Uk Kim

# Loading Plotting and Science Libraries

In [3]:
# Prepare my slides
%pylab inline
%cd working

Populating the interactive namespace from numpy and matplotlib
/Users/barronh/Development/RAQMSandPython/working


# Download DC3 Observations

In this exercise, we will use publicly available data as a playground for learning about the Common Data Model and how to use it in Python. The data are merged observations from the DC8 aircraft during the Deep Convective Clouds and Chemistry (DC3) campaign.

We will download the data in a moment, not yet.



In [4]:
%mkdir icartt

mkdir: cannot create directory ‘icartt’: File exists


In [5]:
from urllib.request import urlretrieve
url='http://www-air.larc.nasa.gov/cgi-bin/enzFile?e38EE03EFAE02C04F06E9647DAF98F48D6A2f7075622d6169722f534541433452532d4443332f4d45524745532f315f4d494e5554452e4443385f4d52472f6463332d6d726736302d6463385f6d657267655f32303132303531385f52375f7468727532303132303632322e696374'
dc3path = 'icartt/dc3-mrg60-dc8_merge_20120518_R7_thru20120622.ict'
import os
if not os.path.exists(dc3path):
    urlretrieve(url, filename = dc3path);

# Download CMAQ Benchmark

Benchmark data is available from CMAS's website. We're only using 1 day, so we won't download the whole thing.

# Downloading just what we need

In [18]:
from urllib.request import urlretrieve, url2pathname
import os
ftproot = 'ftp://data.as.essie.ufl.edu/pub/exch/CMAQandPython/'
aconcpath = ftproot +\
            'cmaq/CCTM_D502a_Linux2_x86_64intel.ACONC.GA12C_20060801.nc4'
metcro3dpath = ftproot +\
            'cmaq/METCRO3D_Benchmark.nc4'
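urlpaths = [aconcpath, metcro3dpath]
# Make sure the target directory exists before downloading into it
os.makedirs('cmaq', exist_ok=True)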
for urlpath in urlpaths:
    local_filename = os.path.join('cmaq', os.path.basename(urlpath))
    if not os.path.exists(local_filename):
        urlretrieve(urlpath, filename = local_filename)
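
Both download cells follow the same pattern: skip the transfer when the file is already on disk. A minimal sketch of that pattern as a reusable function is below. `fetch_if_missing` and its `retrieve` parameter are hypothetical names introduced here, not part of the workshop code, and the function also creates the parent directory if it does not exist.

```python
import os
from urllib.request import urlretrieve


def fetch_if_missing(url, local_path, retrieve=urlretrieve):
    """Download url to local_path unless the file is already there.

    Hypothetical helper: `retrieve` defaults to urllib's urlretrieve and
    can be swapped for a stub when testing without a network connection.
    Returns True if a download was attempted, False if it was skipped.
    """
    if os.path.exists(local_path):
        return False
    # Create the parent directory (e.g., 'cmaq' or 'icartt') if needed
    parent = os.path.dirname(local_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    retrieve(url, filename=local_path)
    return True
```

For example, `fetch_if_missing(url, dc3path)` could stand in for the `if not os.path.exists(...)` block in the DC3 cell.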

# CHECK POINT:
List the sizes of the files you downloaded.

In [19]:
import os
aconcpath = 'cmaq/CCTM_D502a_Linux2_x86_64intel.ACONC.GA12C_20060801.nc4'
metcro3dpath = 'cmaq/METCRO3D_Benchmark.nc4'
dc3path = 'icartt/dc3-mrg60-dc8_merge_20120518_R7_thru20120622.ict'
for path in [aconcpath, metcro3dpath, dc3path]:
    print('%5.1fM' % (os.path.getsize(path)/1024.**2), path)


253.5M cmaq/CCTM_D502a_Linux2_x86_64intel.ACONC.GA12C_20060801.nc4
 51.7M cmaq/METCRO3D_Benchmark.nc4
 30.6M icartt/dc3-mrg60-dc8_merge_20120518_R7_thru20120622.ict


# ANSWERS Hidden

<div class="answers">
```
253.5M cmaq/CCTM_D502a_Linux2_x86_64intel.ACONC.GA12C_20060801.nc4
 51.7M cmaq/METCRO3D_Benchmark.nc4
 30.6M icartt/dc3-mrg60-dc8_merge_20120518_R7_thru20120622.ict
```
</div>

# Intro to the Common Data Model

1. files and groups
2. dimensions
3. properties
4. variables
5. Conventions
  * IOAPI and WRF-IOAPI
  * COARDS
  * Climate Forecasting (CF) Conventions
6. Conceptualizing any data set as CDM
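
To make the list above concrete, here is a minimal sketch of the Common Data Model as plain Python objects. The class names (`Variable`, `Dataset`) and the sample values are assumptions for illustration only; they are not the NetCDF or PseudoNetCDF API.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """A named array tied to named dimensions, with its own properties."""
    dimensions: tuple          # names of the dimensions, in order
    values: list               # the data (flat here, for simplicity)
    properties: dict = field(default_factory=dict)


@dataclass
class Dataset:
    """A file (or group): dimensions, global properties, and variables."""
    dimensions: dict = field(default_factory=dict)   # name -> length
    properties: dict = field(default_factory=dict)   # global attributes
    variables: dict = field(default_factory=dict)    # name -> Variable


# Illustrative content only; the values are made up
ds = Dataset(
    dimensions={'POINTS': 3},
    properties={'Conventions': 'none'},
    variables={
        'O3': Variable(('POINTS',), [40.0, 55.0, 61.0], {'units': 'ppbv'}),
    },
)

# Every dimension a variable uses must be declared in the dataset
for name, var in ds.variables.items():
    for dim in var.dimensions:
        assert dim in ds.dimensions, (name, dim)
print(ds.variables['O3'].properties['units'])
```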

# NetCDF and PseudoNetCDF

- NetCDF is a library (and was a format) that utilizes the CDM
- NetCDF-Java extended the CDM to other formats in Java
- PseudoNetCDF extends the idea to atmospheric data of many formats in Python

(I/O API is both a library and a meta-data convention.)


# [P]NC Tools

PseudoNetCDF has pncdump.py, pncgen.py, pncmap.py, and many more utilities that can be used from the command line or from within Python. These give a Common Data Model interface to many formats.

The Common Data Model is most useful when dimensions can be directly translated to physical time and space. pnc tools, like CDO, help by creating common, standardized internal meta-data. This allows for space/time aware processing.



In [8]:
# Import PseudoNetCDF Processor (PNC)
from PseudoNetCDF import PNC
help(PNC)

Help on function PNC in module PseudoNetCDF.pncparse:

PNC(*args, ifiles=[], actions=None, **kwds)
    Arguments:
        args - Command Line arguments/options for PseudoNetCDF
               for full list of potential args PNC('--help')
        ifiles - (optional) pre-loaded input files
        actions -  (default: False)
            False: only open files do not make outputs
            True: enable dump,gen,map,etc output actions
                  for action options see subparsers help
                  e.g., PNC('dump', '--help', actions = True)
    
    Returns:
        out - Namespace object with parsed arguments
              including a list of processed files (out.ifiles)
    
    Example:
        # Single File
        out = PNC('--format=netcdf', inpath)    
        infile = out.ifiles[0]
        O3 = infile.variables['O3']
    
        # Multiple Files
        out = PNC('--format=netcdf', inpath1, inpath2)    
        infile1, infile2 = out.ifiles
        O3_1 = infile1.vari

# Check Point

- List 5 formats that PNC tools can read.
- How do you reduce the time dimension using the `mean` function?
- How do you get the 3rd element of the latitude dimension?

# Answers Hidden

<div class="answers">
```
PNC('--list-formats')
```
Example answer:
- netcdf (e.g., CMAQ, WRF, new GEOS-Chem, etc)
- geoschemfiles.bpch (aka bpch)
- icarttfiles.ffi1001.ffi1001 (aka ffi1001)
- camxfiles.uamiv.Memmap.temperature (aka uamiv)
- textfiles.csv
- camxfiles.temperature.Memmap.temperature (aka temperature)
- camxfiles.point_source.Memmap.point_source (aka point_source)

```
PNC('--help')
```

    -s dim,start[,stop[,step]], --slice dim,start[,stop[,step]]
                        Variables have dimensions (time, layer, lat, lon),
                        which can be subset using dim,start,stop,stride (e.g.,
                        --slice=layer,0,47,5 would sample every fifth layer
                        starting at 0)
    -r dim,function[,weight], --reduce dim,function[,weight]
                        Variable dimensions can be reduced using
                        dim,function,weight syntax (e.g.,
                        --reduce=layer,mean,weight). Weighting is not fully
                        functional.

PNC('--reduce=time,mean', '--slice=latitude,2')
</div>
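
The `--slice` syntax maps naturally onto Python's built-in `slice`. The sketch below shows one way to read a `dim,start[,stop[,step]]` string. `parse_slice_option` is a hypothetical helper, not how PseudoNetCDF parses its arguments, and treating a lone `start` as a single-element selection is an assumption.

```python
def parse_slice_option(option):
    """Turn 'dim,start[,stop[,step]]' into (dim, slice).

    Hypothetical illustration of the --slice syntax; a lone start
    is assumed to select just that one element.
    """
    dim, *numbers = option.split(',')
    numbers = [int(n) for n in numbers]
    if len(numbers) == 1:
        start = numbers[0]
        return dim, slice(start, start + 1)
    return dim, slice(*numbers)


layers = list(range(47))
dim, layer_slice = parse_slice_option('layer,0,47,5')
print(dim, layers[layer_slice])   # every fifth layer, starting at 0

dim, lat_slice = parse_slice_option('latitude,2')
print(dim, lat_slice)             # third element only (index 2)
```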

# Example DC3

Explore the Common Data Language (CDL), which describes the Common Data Model, using the `dump` action. 
The dump action has the same general functionality as the ncdump NetCDF utility. Use the `dump` 
action to display the CDL for the dc3path.

```
dc3args = PNC('dump', '--header',
              '--format=ffi1001', dc3path,
              actions = True)
```

In [9]:
dc3args = PNC('dump', '--header',
              '--format=ffi1001', dc3path,
              actions = True);

PseudoNetCDF.icarttfiles.ffi1001.ffi1001 icartt/dc3-mrg60-dc8_merge_20120518_R7_thru20120622.ict {
dimensions:
        POINTS = 6817 ;

variables:
        double Fractional_Day(POINTS);
                Fractional_Day:units = "Fractional_Day, none" ;
                Fractional_Day:standard_name = "Fractional_Day" ;
                Fractional_Day:missing_value = -999999 ;
                Fractional_Day:fill_value = -999999.0 ;
                Fractional_Day:scale = 1.0 ;
                Fractional_Day:llod_flag = -888888 ;
                Fractional_Day:llod_value = "N/A" ;
                Fractional_Day:ulod_flag = -777777 ;
                Fractional_Day:ulod_value = "N/A" ;
        double UTC(POINTS);
                UTC:units = "s" ;
                UTC:standard_name = "UTC" ;
                UTC:missing_value = -999999 ;
                UTC:fill_value = -999999.0 ;
                UTC:scale = 1 ;
                UTC:llod_flag = -888888 ;
                UTC:llod_value = "N/A" ;
    

# CHECK POINT

1. List 5 attributes of the file

    -

2. Modify the command above to extract only the O3_ESRL variable.

    -
    
3. What attributes does O3_ESRL have?

    -
    

# Answers Hidden
<div class="answers">
1. global properties
  * fmt = "1001" ;
  * n_header_lines = 409
  * PI_NAME = "Shook, Michael"
  * ORGANIZATION_NAME = "NASA Atmospheric Composition Branch, NASA Langley Research Center (SSAI)"
  * SOURCE_DESCRIPTION = "Merged data file for DC3, Flights 01-18 (20120518-20120622), on the DC-8 platform. Data is merged to 60 seconds/timeline."

2. 
```
dc3args = PNC('dump', '--header',
              '--variables=O3_ESRL',
              '--format=ffi1001', dc3path,
              actions = True)
```
3. O3_ESRL attributes
  * scale = 1 ;
  * ulod_flag = -777777 ;
  * units = "ppbv" ;
  * fill_value = -999999.0 ;
  * llod_flag = -888888 ;
  * missing_value = -999999 ;
  * standard_name = "O3_ESRL" ;
  * ulod_value = "N/A" ;
  * llod_value = "N/A" ;
</div>
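
The attributes in the dump matter for any statistics: `missing_value`, `llod_flag`, and `ulod_flag` are sentinel numbers, not measurements. A minimal sketch of masking them with NumPy before averaging follows. The sample array is made up, and whether PseudoNetCDF already masks some of these on read is not shown in this notebook, so treat this as a defensive pattern rather than a required step.

```python
import numpy as np

# Sentinel values copied from the variable attributes in the dump above
MISSING = -999999
LLOD_FLAG = -888888   # below lower limit of detection
ULOD_FLAG = -777777   # above upper limit of detection

# Made-up sample standing in for a variable such as O3_ESRL (ppbv)
raw = np.array([60.0, MISSING, 72.5, LLOD_FLAG, 80.5, ULOD_FLAG])

# Mask every sentinel so it is ignored by reductions
flags = [MISSING, LLOD_FLAG, ULOD_FLAG]
clean = np.ma.masked_where(np.isin(raw, flags), raw)

print(clean.count())   # number of valid measurements
print(clean.mean())    # mean of valid measurements only
```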

# Common Processing and Terminology

This section will explain many of the techniques used in the tile plot section and in all subsequent sections.

1. slicing in numpy
2. dimensional reductions
3. Loading data from different formats
  * CMAQ (already done)
  * CAMx, WRF, GEOS-Chem, CSV, NASA AMES, AQS
4. Adding coordinate variables
5. Using named dimensions via PseudoNetCDF
6. Adding derived variables via PseudoNetCDF

# Slicing and Aggregating

In [20]:
# Load file (no extra processing or actions)
dc3args = PNC('--format=ffi1001', dc3path)
dc3file = dc3args.ifiles[0]

# Get a variable
O3_ESRL = dc3file.variables['O3_ESRL']

# Slice (start:stop:step)
O3_subset = O3_ESRL[1:100:10]

# Mean of slice
O3_subset_mean = O3_subset.mean()
print(O3_subset_mean)

73.782911928


# Check Point

- Describe the elements selected based on the slice.
- What dimension was averaged?

# Answers Hidden

<div class="answers">
- Start with the second element (index 1) and stop before index 100, so the 100th element is the last candidate; of that set, start with the first and select every 10th after that.
- Use help(O3_subset.mean) to learn that mean applies to all dimensions unless specified.
- In this case, it is applied to POINTS. We know because there is only one.
</div>
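
To check the first answer, apply the same slice to a list of positions. This uses only built-in Python, and the 200-element length is an arbitrary stand-in for the 6817 POINTS in the file.

```python
# Positions 0..199 stand in for indices along POINTS
indices = list(range(200))

# Same start:stop:step as O3_ESRL[1:100:10]
selected = indices[1:100:10]
print(selected)        # indices that contribute to the mean
print(len(selected))   # how many values were averaged
```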

# PNC

- uses named dimensions
- adjusts dimensions

In [11]:
# Repeat, but apply same processing to all variables
dc3procargs = PNC('--format=ffi1001', '--slice=POINTS,1,100,10', '--reduce=POINTS,mean', dc3path)
dc3file = dc3procargs.ifiles[0]
O3_pnc_subset_mean = dc3file.variables['O3_ESRL']

In [12]:
print(O3_pnc_subset_mean)

    double O3_ESRL(POINTS); // shape: (1,)
        O3_ESRL:scale = 1 ;
        O3_ESRL:missing_value = -999999 ;
        O3_ESRL:ulod_flag = -777777 ;
        O3_ESRL:units = "ppbv" ;
        O3_ESRL:fill_value = -999999.0 ;
        O3_ESRL:llod_value = "N/A" ;
        O3_ESRL:llod_flag = -888888 ;
        O3_ESRL:ulod_value = "N/A" ;
        O3_ESRL:standard_name = "O3_ESRL" ;
array: [73.782911928]
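
What PNC adds over plain NumPy is that the slice and the reduction are addressed by dimension name and applied to every variable. A rough sketch of that idea follows. The `variables` dictionary layout and the `slice_by_name` and `mean_by_name` helpers are hypothetical, and the reduction keeps a length-1 dimension to mirror the `shape: (1,)` shown in the output above.

```python
import numpy as np


def slice_by_name(dims, data, dim, selection):
    """Apply a slice along the axis whose name is `dim`."""
    index = [slice(None)] * data.ndim
    index[dims.index(dim)] = selection
    return data[tuple(index)]


def mean_by_name(dims, data, dim):
    """Average along the named axis, keeping it with length 1."""
    return data.mean(axis=dims.index(dim), keepdims=True)


# Made-up variables that share the POINTS dimension
variables = {
    'O3': (('POINTS',), np.arange(200, dtype='d')),
    'CO': (('POINTS',), np.arange(200, dtype='d') * 2),
}

processed = {}
for name, (dims, data) in variables.items():
    subset = slice_by_name(dims, data, 'POINTS', slice(1, 100, 10))
    processed[name] = mean_by_name(dims, subset, 'POINTS')

print({name: arr.shape for name, arr in processed.items()})
```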


# Exercise

1. Describe the coordinates of both files.
2. Start with the DC3 data
3. Then, use the CMAQ data

Goals: familiarity with coordinates, combining files, using convolutions, adding derived variables.

# CHECK POINT 2:

What are the dimensions in the DC3 file?

List 4 "coordinate" variable names and units that can be used to define an area of interest.

1. _
2. _
3. _
4. _

# ANSWERS Hidden

<div class="answers">

- Dimensions = POINTS
- Variables: 
  - LATITUDE,degs
  - LONGITUDE,degs
  - PRESSURE,hPa
  - Fractional_Day,none = by convention, days since 2011-12-31 (day of year in 2012)

</div>

# Get the range of values (min, mean, max) for each variable



In [13]:
from PseudoNetCDF import PNC
dc3args = PNC("--format=ffi1001",
           "--variables=Fractional_Day,LATITUDE,LONGITUDE,PRESSURE",
           dc3path)

# CHECK POINT 3:
What are the minimum, mean, and maximum values for each variable?

* _
* _
* _
* _

# ANSWERS Hidden

<div class="answers">
For `--slice=POINTS,100,200`:
```
for varkey in dc3args.variables:
    print(varkey, dc3file.variables[varkey].min(), dc3file.variables[varkey].mean(), dc3file.variables[varkey].max())
```

- Fractional_Day 139.86354 139.8979167 139.93229
- LATITUDE 40.27951427 41.1336588365 42.14375258
- LONGITUDE 255.4862488 256.739798669 258.6718221
- PRESSURE 519.8900667 583.703467165 693.8251833
</div>
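
Note that the LONGITUDE values in the answer (about 255 to 259) are degrees east on a 0 to 360 scale. If the dataset you compare against uses -180 to 180, the values need converting first. A minimal sketch, using a hypothetical helper name `wrap_longitude`:

```python
def wrap_longitude(lon_east):
    """Convert degrees east on 0..360 to the -180..180 convention."""
    return (lon_east + 180.0) % 360.0 - 180.0


# Minimum and maximum LONGITUDE from the answer above
for lon in (255.4862488, 258.6718221):
    print(lon, '->', round(wrap_longitude(lon), 7))
```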

# Now we'll work with CMAQ

CHECK POINT:
What are the relevant dimensions in CMAQ?

1. _
2. _
3. _
4. _

# ANSWERS Hidden

<div class="answers">
```
from PseudoNetCDF import PNC
args = PNC(aconcpath)
cmaqfile = args.ifiles[0]
print(cmaqfile.dimensions.keys())
```

- TSTEP
- LAY
- ROW
- COL

</div>

# What are the Latitude/Longitude Coordinates?

Many applications of CMAQ use Lambert Conformal Conic or another projected coordinate system. This can make comparing with geographic coordinates difficult.

PNC offers helper tools to convert x/y to lat/lon (uses the IOAPI_ISPH environment variable). 

Try running the command below:
1. *without* and then,
2. with the "--from-conv=ioapi" option.

In [14]:
from PseudoNetCDF import PNC
args = PNC(aconcpath)
cmaqfile = args.ifiles[0]
print(cmaqfile.variables.keys())
args = PNC("--from-conv=ioapi", aconcpath)
cmaqfile = args.ifiles[0]
print(cmaqfile.variables.keys())

odict_keys(['TFLAG', 'O3', 'NO', 'CO', 'NO2', 'ASO4I', 'ASO4J', 'NH3'])
odict_keys(['ASO4J', 'NO2', 'TFLAG', 'ASO4I', 'CO', 'NH3', 'NO', 'O3', 'layer', 'level', 'time', 'time_bounds', 'LambertConformalProjection', 'x', 'y', 'latitude', 'longitude', 'latitude_bounds', 'longitude_bounds'])


# CHECK POINT

What are the relevant coordinate variables in CMAQ? With and without conventions?

1. _
2. _
3. _
4. _

What is missing?

# ANSWERS Hidden

<div class="answers">

- time,TFLAG
- layer,N/A
- latitude,N/A
- longitude,N/A

- missing: pressure
</div>

## What is the vertical layer coordinate variable?

WRF CMAQ applications use the hydrostatic pressure coordinate system ($\eta$).

$\eta = \frac{P - P_{top}}{P_{surf} - P_{top}}$ = `any3dfile.VGLVLS`

$P = \eta (P_{surf} - P_{top}) + P_{top} \approx $ `METCRO3D.variables['PRES']`

$P_{top}$ = `any3dfile.VGTOP`; $P_{surf}$ = `METCRO2D.variables['PRSFC']`

## Big approximation
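In flat and low terrain, set $P_{surf}$ = 101325 Pa. The layer edges in `VGLVLS` are averaged in pairs (`mode='valid'`) so that there is one value per layer:
```
--expr=PRES=O3*0.+np.convolve(ifile.VGLVLS[:],[0.5,.5],mode='valid')[None,:,None,None]*(101325.-ifile.VGTOP)+ifile.VGTOP;PRES.units='Pa'
```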
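
The `--expr` above packs three steps into one line: average adjacent `VGLVLS` edges to get layer-middle $\eta$, scale by $(P_{surf} - P_{top})$, and add $P_{top}$. A sketch of those steps in plain NumPy, with made-up `vglvls` and `vgtop` values standing in for the file attributes:

```python
import numpy as np

# Made-up stand-ins for ifile.VGLVLS (layer edges) and ifile.VGTOP (Pa)
vglvls = np.array([1.0, 0.995, 0.99, 0.98, 0.96])
vgtop = 5000.0
psurf = 101325.0   # flat, low-terrain approximation (Pa)

# Layer-middle eta: mean of each pair of adjacent edges.
# mode='valid' yields len(vglvls) - 1 values, one per layer.
eta_mid = np.convolve(vglvls, [0.5, 0.5], mode='valid')

# P = eta * (Psurf - Ptop) + Ptop
pres_mid = eta_mid * (psurf - vgtop) + vgtop

# Add singleton axes so it broadcasts against (TSTEP, LAY, ROW, COL)
pres_4d = pres_mid[None, :, None, None]
print(pres_4d.shape)
print(pres_mid)
```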

### Combining files to get pressure

Here we make one file from two. We get PRES from METCRO3D. 

In [15]:
from PseudoNetCDF import PNC

args = PNC("--merge", "--",\
           "--from-conv=ioapi", aconcpath,\
           "--sep", "--variables=PRES", "--convolve=TSTEP,valid,.5,.5", metcro3dpath)
aconcfile = args.ifiles[0]
time = aconcfile.variables['time']
lat = aconcfile.variables['latitude']
lon = aconcfile.variables['longitude']
pres = aconcfile.variables['PRES']

#### CHECK POINT:

The "--" forces commands to be solved in order. What do the remaining options do?

--sep = 

--variables = 

--convolve = 

# ANSWERS HIDDEN
<div class="answers">

```
--sep = separates groups of commands so that they only apply to a group of files

--variables = subsets the variables in each file

--convolve = applies a convolution along a dimension (TSTEP) using a mode (valid) and a sequence of weighting factors.
```
</div>
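
As an illustration of the `--convolve` option, the sketch below averages each pair of adjacent time steps along the first axis, which turns N+1 instantaneous values into N interval means. That this is done so the METCRO3D times line up with the ACONC averaging periods is an interpretation, not something stated above, and the array is made up.

```python
import numpy as np

# Made-up pressure: 4 time steps (rows) by 2 cells (columns)
pres = np.array([[100., 200.],
                 [102., 204.],
                 [104., 208.],
                 [108., 216.]])

# Equivalent in spirit to --convolve=TSTEP,valid,.5,.5:
# weights .5,.5 applied along axis 0 with mode 'valid'
weights = [0.5, 0.5]
averaged = np.apply_along_axis(
    lambda column: np.convolve(column, weights, mode='valid'),
    axis=0, arr=pres)

print(averaged.shape)   # one fewer time step than the input
print(averaged)
```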

## Check Point: What is the range of each coordinate variable?

Answers Hidden.
<div class="answers">
```
for varkey in 'latitude longitude PRES time'.split():
    cvar = aconcfile.variables[varkey]
    print(cvar.units, np.percentile(cvar[:], [0, 50, 100]))
```
</div>


Optionally, use `PNC('dump', '--help', actions = True)` (the equivalent of `pncdump.py -h`) and describe what these options do.

- -t, --timestring
- --full-indices=c
- --full-indices=f

In [21]:
PNC('dump', '--help', actions = True);

usage: PNC dump [-H] [-t] [--full-indices [c|f]] [-l LEN]
                [--float-precision FDIG] [--double-precision PDIG]
                [--dump-name CDLNAME] [--verbose] [--help] [--pnc PNC]
                [-f {see --list-formats for choices}] [--list-formats]
                [--help-format HELPFORMAT] [--sep] [--inherit] [--mangle]
                [--rename RENAME] [--remove-singleton REMOVESINGLETON]
                [--coordkeys key1,key2] [-v varname1[,varname2[,...,varnameN]]
                [-a att_nm,var_nm,mode,att_typ,att_val] [-m MASKS]
                [--from-convention FROMCONV] [--to-convention TOCONV]
                [--stack STACK] [--merge] [-s dim,start[,stop[,step]]]
                [-r dim,function[,weight]] [--mesh dim,weight,function]
                [-c dim,mode,wgt1,wgt2,...wgtN] [-e EXTRACT]
                [--extract-file EXTRACTFILE]
                [--extractmethod {nn,linear,cubic,quintic,KDTree}]
                [--op-typ OPERATORS] [--expr EXPRESSIONS