# The grid as an object

In this notebook, the grid object and the xarray dataset are explained and demonstrated with a few illustrative examples. 

---

The module grid.py contains one class, `Grid()`. When we import the module, a few convenient variables are also generated. Most packages are imported with the module, but some rarely used dependencies are imported in functions when needed. More about this later. 

We import grid: 

In [1]:
# Depending on where grid.py is located, you can either call it from that location...

config_file = "../code/grid.py"
with open(config_file) as f:
    code = compile(f.read(), config_file, 'exec')
    exec(code, globals(), locals())

# ... or use your Python path or present working directory:
#from grid import *



print(Grid)
print(km, milli) #-This is so handy! -You're welcome. 

#Check if some modules have been imported
print('numpy' in sys.modules, 
      'rasterio' in sys.modules, 
      'bokeh' in sys.modules, sep='\n') #(bokeh is imported in a function)

print(type(np), type(plt), type(xr)) # Use of standard aliases 

<class '__main__.Grid'>
1000 0.001
True
True
False
<class 'module'> <class 'module'> <class 'module'>
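
As an alternative to `exec`, a source file can also be loaded from an explicit path with the standard library's `importlib`. The sketch below is only an illustration: `load_module_from_path` and the stand-in file `demo_grid.py` are hypothetical names, and the real `grid.py` defines far more than two constants.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(name, path):
    """Load a Python source file as a module object (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module        # register before executing the module body
    spec.loader.exec_module(module)
    return module


# Stand-in for grid.py so that the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_grid.py"
    demo_path.write_text("km = 1000\nmilli = 0.001\n")
    demo = load_module_from_path("demo_grid", demo_path)

print(demo.km, demo.milli)
```

Unlike `exec(code, globals(), locals())`, this keeps the loaded names inside a module namespace (`demo.km`) instead of placing them directly in the notebook's globals.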


We can now define a grid object. 

Each grid is a model frame with a defined, regular$^1$ spatial extent. The frame is generated when the object is instantiated and the resolution, coordinate system and coordinates are defined. 

The class defines an object. Let's say that we would like to develop a gridded model of the African continent. We want to use WGS 1984, EPSG:4326, and a resolution of $0.5^\circ$. We define the extent as left, right, up, down in the units of the projection, which are degrees in this example. 

When the object is instantiated, a number of instance variables are defined. These contain the size of the grid, the name of the grid and the affine transform of the grid. (`use_dask` will be explained later). 

---
1. This will be updated in later versions. 

In [2]:
africa = Grid(crs_tgt=4326, left=-20, down= -45, right=55, up=40, 
              res = (0.5, 0.5), use_dask=False)

print(africa)

<__main__.Grid object at 0x10cfef780>


With the object, variables are generated, both class variables and variables for the instance. 

For example, `verbose` is a switch for print statements from the functions. `nx`, `ny`, `nn` and `shape3` are integers and tuples that conveniently give the fundamental size of the grid. `transform` is the affine transform. 

In [3]:
# Class variables
print(Grid.verbose) # Switch for print commands

# Instance variables
print(africa.ny, africa.nx, africa.nn, africa.shape3)
print(africa.transform)

False
170 150 (170, 150) (170, 150, 5)
| 0.50, 0.00,-20.00|
| 0.00,-0.50, 40.00|
| 0.00, 0.00, 1.00|
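
The printed sizes and the affine matrix can be reproduced by hand from the extent and resolution. The following is a minimal sketch, assuming that the cell count is simply the extent divided by the resolution and that the transform maps pixel corners to coordinates; `pixel_to_world` is a hypothetical helper, and `grid.py` may compute these values differently.

```python
# Extent and resolution from the Africa example above.
left, right, down, up = -20.0, 55.0, -45.0, 40.0
res_x, res_y = 0.5, 0.5

# Number of cells along each axis (assumed: extent divided by resolution).
nx = int(round((right - left) / res_x))
ny = int(round((up - down) / res_y))

# Affine coefficients (a, b, c, d, e, f), matching the printed matrix rows:
#   x = a*col + b*row + c
#   y = d*col + e*row + f
a, b, c = res_x, 0.0, left
d, e, f = 0.0, -res_y, up


def pixel_to_world(col, row):
    """Map a (col, row) pixel corner to (x, y) in the grid's CRS."""
    return a * col + b * row + c, d * col + e * row + f


print(ny, nx)                    # 170 150
print(pixel_to_world(0, 0))      # (-20.0, 40.0)
print(pixel_to_world(nx, ny))    # (55.0, -45.0)
```

The negative `e` coefficient reflects that row numbers increase downwards while Y coordinates decrease from `up` to `down`.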


Note that dimensions are given as y, x, not x, y. This was a tricky decision to make. The reasoning is that numpy arrays are indexed as rows, columns. By giving the y dimension first, we are consistent throughout the project, but I expect that this might cause confusion. I'm happy to receive suggestions on how to simplify the indexing. Indexing is, however, less of a problem when we move to the main feature of the grid object, the xarray dataset. 
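
The row-column convention can be illustrated with a plain numpy array of the same shape as the grid. This is only a sketch; the array name `field` is made up for the example.

```python
import numpy as np

ny, nx = 170, 150                 # rows (Y), columns (X), as printed above
field = np.zeros((ny, nx))

# The first index selects the row (Y), the second the column (X).
field[0, :] = 1.0                 # northernmost row when Y runs from up to down
field[:, -1] = 2.0                # easternmost column

print(field.shape)                # (170, 150)
print(field[0, 0], field[5, -1])  # 1.0 2.0
print(field.T.shape)              # (150, 170), if an x, y layout is ever needed
```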

---

Most importantly, an xarray dataset is created, and it is already populated with dimensions that contain coordinates. X, Y and Z are (in this example) the three spatial dimensions. X and Y are set to the selected projection (EPSG:4326) and are hence in degrees. 

Xarray datasets can be a bit tricky to understand at first. 

The [xarray project page](http://xarray.pydata.org/en/stable/api.html) has a number of code examples. As always, [Stack Overflow](https://stackoverflow.com/search?q=xarray) is a good resource as well. 

Think of it as Pandas, but multidimensional. It can also be understood as a very organized way to arrange numpy arrays in relation to each other. Alternatively, if you are familiar with [NetCDF](https://www.unidata.ucar.edu/software/netcdf/), xarray has a similar internal structure. 
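
To make the idea of "organized numpy arrays" concrete, here is a small stand-alone dataset built without the `Grid` class. The variable names `ELEVATION` and `MASK` and all values are invented for illustration.

```python
import numpy as np
import xarray as xr

# A tiny dataset with two named dimensions and their coordinate values.
ds = xr.Dataset(coords={"X": np.array([10.0, 10.5, 11.0]),
                        "Y": np.array([5.0, 4.5])})

# Each data variable is a numpy array tied to named dimensions.
ds["ELEVATION"] = (("Y", "X"), np.arange(6.0).reshape(2, 3))
ds["MASK"] = (("Y", "X"), np.array([[True, False, True],
                                    [False, True, True]]))

print(ds)
print(ds["ELEVATION"].dims, ds["ELEVATION"].shape)
```

Both variables share the same X and Y coordinates, which is what keeps the arrays aligned with each other.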

In [4]:
print(africa.ds)

<xarray.Dataset>
Dimensions:  (RGB: 3, X: 150, Y: 170, Z: 5)
Coordinates:
  * X        (X) float32 -20.0 -19.496645 -18.993288 ... 53.99329 54.496643 55.0
  * Y        (Y) float32 40.0 39.49704 38.994083 ... -43.994083 -44.49704 -45.0
  * Z        (Z) float32 0.0 8000.0 16000.0 40000.0 350000.0
  * RGB      (RGB) <U1 'R' 'G' 'B'
    XV       (Y, X) float32 -20.0 -19.496645 -18.993288 ... 54.496643 55.0
    YV       (Y, X) float32 40.0 40.0 40.0 40.0 40.0 ... -45.0 -45.0 -45.0 -45.0
    lat      (Y, X) float32 40.0 40.0 40.0 40.0 40.0 ... -45.0 -45.0 -45.0 -45.0
    lon      (Y, X) float32 -20.0 -19.496645 -18.993288 ... 54.496643 55.0
Data variables:
    *empty*


We use standard Python / numpy methods to generate or process data and assign it to the dataset. Here we generate a 2D array of data points and assign it to the Y and X dimensions of the grid. 

In [5]:
random_data = np.random.random(africa.nn)
africa.ds['RANDOM'] = (('Y', 'X'), random_data)

The object contains a number of functions to import, process, export and visualize data in the grid. More about import functions in the next tutorial. For now, we download a global polygon shapefile. 

In [None]:
# Using standard programs: 

! mkdir -p ../../data/vector
! wget -nc http://data.openstreetmapdata.com/simplified-land-polygons-complete-3857.zip \
    -O ../../data/vector/simplified-land-polygons-complete-3857.zip
! unzip -n ../../data/vector/simplified-land-polygons-complete-3857.zip -d ../../data/vector

We then use the shapefile to generate a Boolean raster. The function `assign_shape` in the object `africa` reads a shapefile and rasterises it. Here, all polygons are assigned `True`, as the FID attribute is above 0. This might take some time, as the vector layer is very large. It might also generate some warnings, as no vector layer is perfect. 

Note again, the order of dimension: Y, X. 

In [None]:
africa.ds['LAND'] = (('Y', 'X'), 
                     0 < africa.assign_shape('../../data/vector/simplified-land-polygons-complete-3857/simplified_land_polygons.shp', 'FID') )

Now, if we look at our dataset, we have two data variables, or frames: RANDOM and LAND. Both have an extent along the X and Y axes. 

In [None]:
africa.ds

Xarray works similarly to numpy in many ways. Here are some examples of arithmetic and conditional computations. We will look closer at the `map_grid` function in a later tutorial. 

In [None]:
africa.ds['BIG_RANDOM'] = africa.ds['RANDOM'] * 2 

africa.ds['DATA'] = africa.ds['BIG_RANDOM']*africa.ds['LAND']

africa.ds['DATA'] = africa.ds['DATA'].where(africa.ds['DATA'] != 0.)  

africa.map_grid('DATA', cmap='viridis') # We only need to send the label to the function 
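
The same masking pattern can be tried on a small stand-alone dataset. This sketch does not use the `Grid` class; the values are random and the `LAND` mask is made up. It also shows masking on `LAND` directly, which is an alternative and not the approach used in the cell above.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
ds = xr.Dataset(coords={"X": [0.0, 1.0, 2.0], "Y": [1.0, 0.0]})
ds["RANDOM"] = (("Y", "X"), rng.random((2, 3)))
ds["LAND"] = (("Y", "X"), np.array([[True, False, True],
                                    [False, False, True]]))

# Multiplying by a Boolean mask zeroes the cells where LAND is False ...
ds["DATA"] = ds["RANDOM"] * 2 * ds["LAND"]
# ... and where() then replaces those zeros with NaN.
ds["DATA"] = ds["DATA"].where(ds["DATA"] != 0.0)

# Masking on LAND directly gives the same result here and does not
# depend on the data never being exactly zero on land.
ds["DATA_DIRECT"] = (ds["RANDOM"] * 2).where(ds["LAND"])

print(ds["DATA"].values)
```

Cells set to NaN are typically left blank when the variable is plotted, which is why the zeros are replaced before mapping.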

Xarray has a number of options to index and select data. See details about the API at the [Xarray project page](http://xarray.pydata.org/en/stable/api.html). 

In [None]:
print('Data by numpy index:')
print(africa.ds['DATA'][66, 33])

print('Coord by index:')
print(africa.ds.coords['lon'][66, 33])

print('Data by dimension index:')
print(africa.ds['DATA'].isel(X=33, Y=66) )

print('Closest to coordinate values:')
print(africa.ds['DATA'].sel(X=[14, 71], method='nearest') )

print('Or another coordinate set:')
print(africa.ds.coords['lon'][66, 33] )


These calls return cells or slices of the dataset as xarray objects. Plain numpy arrays and numbers can be extracted with the `values` attribute. 
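
A minimal sketch of the difference between positional selection, nearest-coordinate selection and extracting raw values, using an invented 3 x 4 array rather than the Africa grid:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12.0).reshape(3, 4),
                  dims=("Y", "X"),
                  coords={"Y": [2.0, 1.0, 0.0], "X": [10.0, 11.0, 12.0, 13.0]})

cell = da.isel(Y=1, X=2)                         # by integer position
near = da.sel(Y=0.9, X=11.8, method="nearest")   # by nearest coordinate value

print(type(cell).__name__)        # DataArray
print(cell.values, float(cell))   # 6.0 6.0
print(near.item())                # 6.0
print(type(da.values).__name__)   # ndarray
```

`isel` and `sel` both return xarray objects that keep their coordinates; `values`, `float()` and `item()` strip the labels and return plain numpy or Python types.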

Attribute data are important:

In [6]:
# Add attribute data directly
africa.ds.attrs['units'] = 'degrees'
africa.ds.attrs['contact'] = 'mail@address.au'

#Or as text, xml, json etc
import json
with open('attr.json', 'r') as fp:
    meta_data = json.loads(fp.read())

#africa.ds.attrs['Imported from json'] = meta_data
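
The content of `attr.json` is not shown in this notebook. As a sketch only, the following writes a made-up metadata file, reads it back, and stores it as a single string attribute; the keys and the attribute name `imported_from_json` are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical metadata; the content of the original attr.json is not shown.
meta_data = {"title": "Demo grid", "units": "degrees", "version": 1}

with tempfile.TemporaryDirectory() as tmp:
    attr_path = Path(tmp) / "attr.json"
    attr_path.write_text(json.dumps(meta_data, indent=2))

    # json.load parses directly from the open file handle.
    with open(attr_path, "r") as fp:
        loaded = json.load(fp)

# A nested dict stored as one attribute may not survive export to NetCDF;
# serialising it to a string is one way to keep it portable.
attrs = {"units": "degrees"}
attrs["imported_from_json"] = json.dumps(loaded)

print(json.loads(attrs["imported_from_json"]) == meta_data)  # True
```

The plain dict `attrs` stands in for `africa.ds.attrs`, which behaves like a dictionary.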


Handy attributes: 

In [7]:
print('Coords:')
print(africa.ds.coords)

print('Attributes, metadata:')
print(africa.ds.attrs)

print('Size in bytes:')
print(africa.ds.nbytes)

Coords:
Coordinates:
  * X        (X) float32 -20.0 -19.496645 -18.993288 ... 53.99329 54.496643 55.0
  * Y        (Y) float32 40.0 39.49704 38.994083 ... -43.994083 -44.49704 -45.0
  * Z        (Z) float32 0.0 8000.0 16000.0 40000.0 350000.0
  * RGB      (RGB) <U1 'R' 'G' 'B'
    XV       (Y, X) float32 -20.0 -19.496645 -18.993288 ... 54.496643 55.0
    YV       (Y, X) float32 40.0 40.0 40.0 40.0 40.0 ... -45.0 -45.0 -45.0 -45.0
    lat      (Y, X) float32 40.0 40.0 40.0 40.0 40.0 ... -45.0 -45.0 -45.0 -45.0
    lon      (Y, X) float32 -20.0 -19.496645 -18.993288 ... 54.496643 55.0
Attributes, metadata:
OrderedDict([('units', 'degrees'), ('contact', 'mail@address.au')])
Size in bytes:
613312


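We can release some memory by dropping data variables, or simply by deleting the entire object from memory. Note that `drop` returns a new dataset rather than modifying the existing one, so the result must be assigned back. In the run shown here, `BIG_RANDOM` had not been created in the session (only `RANDOM` is listed), so the call raises a `ValueError`. 

In [8]:
print('Before:', africa.ds.data_vars)

# drop() returns a new dataset, so assign the result back.
# (Newer xarray versions use drop_vars() instead of drop().)
africa.ds = africa.ds.drop('BIG_RANDOM')

print('After:', africa.ds.data_vars)

# One can also just free all memory at once, as usual. 
#del africa 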

Before: Data variables:
    RANDOM   (Y, X) float64 0.5254 0.4761 0.619 0.07548 ... 0.7273 0.5514 0.9283


ValueError: One or more of the specified variables cannot be found in this dataset
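
How dropping behaves can be checked on a small stand-alone dataset. This sketch assumes a recent xarray (0.14 or later), where `drop_vars` replaces `drop` for removing variables; the dataset itself is invented.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(coords={"X": [0.0, 1.0], "Y": [1.0, 0.0]})
ds["RANDOM"] = (("Y", "X"), np.ones((2, 2)))
ds["BIG_RANDOM"] = ds["RANDOM"] * 2

ds.drop_vars("BIG_RANDOM")               # result discarded: ds is unchanged
print("BIG_RANDOM" in ds.data_vars)      # True

ds = ds.drop_vars("BIG_RANDOM")          # keep the returned dataset
print("BIG_RANDOM" in ds.data_vars)      # False

# errors="ignore" skips names that are not present instead of raising.
ds = ds.drop_vars("BIG_RANDOM", errors="ignore")
print(list(ds.data_vars))                # ['RANDOM']
```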

We use the `save` function of the class to write the dataset to a file. It returns the file size in bytes. 

In [11]:
africa.save(file_name='africa.nc')

614152

In [13]:
!stat africa.nc  

16777220 8620593019 -rw-r--r-- 1 tobiasstal staff 0 614152 "Jan 28 15:16:05 2019" "Jan 28 15:16:05 2019" "Jan 28 15:16:05 2019" "Jan 28 15:16:05 2019" 4194304 1200 0 africa.nc


Clear some memory: 

In [16]:
del africa 

NameError: name 'africa' is not defined