# File Input and Output in Xarray

Xarray supports direct serialization and I/O to several file formats including pickle, netCDF, OPeNDAP (read-only), GRIB1/2 (read-only), and HDF by integrating with third-party libraries. Additional serialization formats for 1-dimensional data are available through pandas.

File types
- Pickle
- NetCDF 3/4
- RasterIO
- Zarr
- PyNio

Interoperability
- Pandas
- Iris
- CDMS
- dask DataFrame

### Tutorial Duration
20 minutes

### Going Further

Xarray I/O Documentation: http://xarray.pydata.org/en/latest/io.html

### Import library

In [5]:
%matplotlib inline

import glob
import pandas as pd
import xarray as xr

### Function for creating a pandas DatetimeIndex from your raster filenames

In [6]:
def time_index_from_filenames(filenames):
    '''Helper function to create a pandas DatetimeIndex.

    Assumes each filename starts with the date as YYYYMMDD,
    for example: 20150520_0164.tif'''
    return pd.DatetimeIndex([pd.Timestamp(f[:8]) for f in filenames])
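
The helper slices the first eight characters of each string, so it only works when the list holds bare filenames. A minimal sketch of a variant that also accepts paths with a directory prefix follows; the name `time_index_from_paths` and the sample paths are hypothetical and used only for illustration.

```python
import os

import pandas as pd


def time_index_from_paths(paths):
    """Hypothetical variant: parse a leading YYYYMMDD from each basename."""
    # Strip any directory prefix before slicing out the date string.
    stamps = [os.path.basename(p)[:8] for p in paths]
    # An explicit format makes malformed names fail loudly instead of being guessed.
    return pd.to_datetime(stamps, format="%Y%m%d")


index = time_index_from_paths(
    ["rasters/20150520_0164.tif", "rasters/20150601_0164.tif"]
)
print(index)
```

Passing an explicit `format` is a design choice: a filename that does not start with a valid date raises an error rather than producing a silently wrong timestamp.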

### Loading all your raster files 

In [7]:
filenames = glob.glob('*.tif')

### Create time dimension for xarray dataset

In [8]:
time = xr.Variable('time', time_index_from_filenames(filenames))

### Define chunk sizes for the x, y and band dimensions

In [9]:
chunks = {'x': 5490, 'y': 5490, 'band': 1}  # adjust the x and y chunk sizes to suit your own rasters

### Concat data arrays along time dimension 

In [10]:
da = xr.concat([xr.open_rasterio(f, chunks=chunks) for f in filenames], dim=time)

ValueError: must supply at least one object to concatenate
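
The `ValueError` above is what `xr.concat` raises when it receives an empty list, which happens here when `glob.glob('*.tif')` matches no files in the working directory. A minimal sketch of a guard that fails earlier with a clearer message is shown below; the function name `list_rasters` is hypothetical, and the temporary directory with empty placeholder files exists only to make the example self-contained.

```python
import glob
import os
import tempfile


def list_rasters(pattern):
    """Return sorted matches, failing early with a clear message if none exist."""
    # glob returns matches in arbitrary order; sorting keeps the dates ascending
    # when filenames start with YYYYMMDD.
    filenames = sorted(glob.glob(pattern))
    if not filenames:
        raise FileNotFoundError("no files match %r" % pattern)
    return filenames


with tempfile.TemporaryDirectory() as tmp:
    # Create two empty placeholder files, deliberately out of date order.
    for name in ("20150601_0164.tif", "20150520_0164.tif"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(f) for f in list_rasters(os.path.join(tmp, "*.tif"))]

print(found)
```

Sorting also matters for the time coordinate: the order of `filenames` determines the order of the concatenated arrays, so an unsorted list would give a non-monotonic time axis.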

### Export xarray dataset to netCDF format

In [None]:
da.to_netcdf('test.nc')

## Multifile datasets (advanced)

Xarray can read/write multifile datasets using the `open_mfdataset` and `save_mfdataset` functions. 

In [12]:
# load the 'rasm' tutorial dataset; replace it with your own dataset as needed
ds = xr.tutorial.load_dataset('rasm')
ds

<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) datetime64[ns] 1980-09-16T12:00:00 1980-10-17 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
Dimensions without coordinates: x, y
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...
Attributes:
    title:                     /workspace/jhamman/processed/R1002RBRxaaa01a/l...
    institution:               U.W.
    source:                    RACM R1002RBRxaaa01a
    output_frequency:          daily
    output_mode:               averaged
    convention:                CF-1.4
    references:                Based on the initial model of Liang et al., 19...
    comment:                   Output from the Variable Infiltration Capacity...
    nco_openmp_thread_number:  1
    NCO:                       "4.6.0"
    history:                   Tue Dec 2

In [13]:
# open many netCDF files
years, datasets = zip(*ds.groupby('time.year'))
paths = ['./data/%s.nc' % y for y in years]
print(paths)

['./data/1980.nc', './data/1981.nc', './data/1982.nc', './data/1983.nc']


In [14]:
# write the 4 netcdf files
xr.save_mfdataset(datasets, paths)
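
The split-and-recombine pattern used in this section can be checked in memory, without writing any files. The sketch below uses a small synthetic dataset (the values are made up; only the variable name `Tair` mirrors the tutorial data) and uses `xr.concat` as a stand-in for the merge that `open_mfdataset` performs on files.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly data spanning two calendar years (values are made up).
time = pd.date_range("1980-09-01", periods=8, freq="MS")
values = np.arange(8 * 2 * 3, dtype="float64").reshape(8, 2, 3)
ds_demo = xr.Dataset(
    {"Tair": (("time", "y", "x"), values)},
    coords={"time": time},
)

# Split into one dataset per year, as done before calling save_mfdataset.
years, pieces = zip(*ds_demo.groupby("time.year"))
demo_paths = ["./data/%s.nc" % y for y in years]

# Recombine along time; this is an in-memory stand-in for open_mfdataset.
combined = xr.concat(pieces, dim="time")

print(demo_paths)
print(combined.equals(ds_demo))
```

If the recombined dataset equals the original here, a failure of the on-disk round trip points to the files themselves (for example, a write that did not complete) rather than to the grouping logic.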

PermissionError: [Errno 13] Permission denied: b'/mnt/d/UW_work/git/Xarray_tutorial_test/tutorial/data/1980.nc'

In [15]:
!ls ./data/
!ncdump -h data/1981.nc

1980.nc  1981.nc  1982.nc  1983.nc  co2.csv  rasm.nc
netcdf \1981 {
dimensions:
	time = 12 ;
	y = 205 ;
	x = 275 ;
variables:
	double Tair(time, y, x) ;
		Tair:_FillValue = 9.96920996838687e+36 ;
		Tair:units = "C" ;
		Tair:long_name = "Surface air temperature" ;
		Tair:type_preferred = "double" ;
		Tair:time_rep = "instantaneous" ;
		Tair:coordinates = "xc yc" ;
	double time(time) ;
		time:_FillValue = NaN ;
		time:long_name = "time" ;
		time:type_preferred = "int" ;
		time:units = "days since 0001-01-01" ;
		time:calendar = "noleap" ;
	double xc(y, x) ;
		xc:_FillValue = NaN ;
		xc:long_name = "longitude of grid cell center" ;
		xc:units = "degrees_east" ;
		xc:bounds = "xv" ;
	double yc(y, x) ;
		yc:_FillValue = NaN ;
		yc:long_name = "latitude of grid cell center" ;
		yc:units = "degrees_north" ;
		yc:bounds = "yv" ;

// global attributes:
		:title = "/workspace/jhamman/processed/R1002RBRxaaa01a/lnd/temp/R1002RBRxaaa01a.vic.ha.1979-09-01.nc" ;
		:institution = "U.W." ;
		:source = 

In [16]:
# open a group of files and concatenate them into a single xarray.Dataset
ds3 = xr.open_mfdataset('./data/19*nc')
assert ds3.equals(ds)  # they are not the same object, but their contents are equal

OSError: [Errno -51] NetCDF: Unknown file format: b'/mnt/d/UW_work/git/Xarray_tutorial_test/tutorial/data/1980.nc'

## Interoperability

Xarray objects include export methods that allow users to transform data from the Xarray data model to other data models such as Pandas, Iris, and CDMS.

Below is a quick example of how to export a time series from Xarray to Pandas.  

In [17]:
# select a single grid cell and convert its time series to a pandas Series
t_series = ds.isel(x=100, y=100)['Tair'].to_pandas()
t_series.head()

time
1980-09-16 12:00:00    -3.419416
1980-10-17 00:00:00   -15.496765
1980-11-16 12:00:00   -26.294163
1980-12-17 00:00:00   -29.829660
1981-01-17 00:00:00   -21.887778
dtype: float64

In [18]:
# export the pandas Series to CSV format
t_series.to_csv('data/rasm_point.csv')
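
The same export path can be tried without the tutorial data. The following sketch builds a small synthetic `DataArray` (zeros, purely illustrative), selects one grid cell, and writes the resulting pandas object to an in-memory buffer instead of a file, so it does not depend on a writable `data/` directory.

```python
import io

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic (time, y, x) array; the values are placeholders.
time = pd.date_range("1980-09-01", periods=4, freq="MS")
da_demo = xr.DataArray(
    np.zeros((4, 2, 3)),
    dims=("time", "y", "x"),
    coords={"time": time},
    name="Tair",
)

# Selecting a single x and y leaves a 1-dimensional array over time,
# which to_pandas() converts to a pandas Series indexed by time.
series = da_demo.isel(x=1, y=1).to_pandas()

# Write CSV text to an in-memory buffer rather than to disk.
buffer = io.StringIO()
series.to_csv(buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Note that `to_pandas()` picks the pandas type from the number of dimensions, so a 2-dimensional selection would give a DataFrame instead of a Series.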