# File Input and Output in Xarray

Xarray supports direct serialization and I/O for several file formats, including pickle, netCDF, OPeNDAP (read-only), GRIB1/2 (read-only), and HDF, by integrating with third-party libraries. Additional serialization formats for 1-dimensional data are available through pandas.
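Pickle is the simplest of these formats because xarray objects can be passed straight to Python's `pickle` module. The sketch below uses a small synthetic Dataset (the variable name `Tair` and the `source` attribute are placeholders). Pickle files are Python-specific and tied to library versions, so they suit short-lived caching better than long-term archiving.

```python
import pickle

import numpy as np
import xarray as xr

# Small synthetic Dataset standing in for real data
ds = xr.Dataset(
    {'Tair': (('y', 'x'), np.arange(6.0).reshape(2, 3))},
    coords={'x': [10, 20, 30], 'y': [0, 1]},
    attrs={'source': 'synthetic example'},
)

# Serialize to bytes and back; data, coordinates and attributes all survive
payload = pickle.dumps(ds)
restored = pickle.loads(payload)
print(restored.identical(ds))
```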

File formats and backends
- Pickle
- NetCDF 3/4
- RasterIO
- Zarr
- PyNio

Interoperability
- Pandas
- Iris
- CDMS
- dask DataFrame

### Tutorial Duration
20 minutes

### Going Further

Xarray I/O Documentation: http://xarray.pydata.org/en/latest/io.html

### Import library

In [1]:
%matplotlib inline

import glob
import pandas as pd
import xarray as xr
import os



###  Function for creating pandas DatetimeIndex for your raster files

In [2]:
def time_index_from_filenames(filenames):
    '''helper function to create a pandas DatetimeIndex
       Filename example: 20150520_0164.tif'''
    # the first 8 characters of each base name hold the date as YYYYMMDD
    return pd.DatetimeIndex([pd.Timestamp(os.path.basename(f)[:8]) for f in filenames])
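A quick check of this helper, assuming names follow the docstring pattern `YYYYMMDD_<id>.tif`. The copy below is self-contained, and the two file names are hypothetical.

```python
import os

import pandas as pd


def time_index_from_filenames(filenames):
    '''Build a DatetimeIndex from names like 20150520_0164.tif.'''
    # Assumption: the first 8 characters of the base name are YYYYMMDD
    return pd.DatetimeIndex([pd.Timestamp(os.path.basename(f)[:8]) for f in filenames])


# Hypothetical file names; a leading directory is stripped by os.path.basename
example_files = ['20150520_0164.tif', 'tiles/20150607_0171.tif']
idx = time_index_from_filenames(example_files)
print(idx)
```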

In [3]:
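def dummytime(flist):
    '''helper function to build a DatetimeIndex from names whose second
       underscore-separated field is YYYYMM, e.g. <prefix>_198010.tif'''
    datetimecollect = []
    for eachfile in flist:
        # drop the directory and the extension, then take the field after the first underscore
        stem = os.path.splitext(os.path.basename(eachfile))[0]
        obj = stem.split('_')[1]
        # pd.datetime was removed in pandas 2.0; pd.to_datetime parses YYYYMM directly
        datetimecollect.append(pd.to_datetime(obj, format='%Y%m'))
    return pd.DatetimeIndex(datetimecollect)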
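The same idea on two hypothetical names (`tmax_198010.tif`, `tmax_198001_v2.tif`); the prefix `tmax` is a placeholder. Each month is mapped to its first day, which matches the `YYYY-MM-01` values in the output below.

```python
import os

import pandas as pd


def dummytime(flist):
    datetimecollect = []
    for eachfile in flist:
        # strip directory and extension, keep the second underscore-separated field
        stem = os.path.splitext(os.path.basename(eachfile))[0]
        obj = stem.split('_')[1]
        # a YYYYMM string becomes the first day of that month
        datetimecollect.append(pd.to_datetime(obj, format='%Y%m'))
    return pd.DatetimeIndex(datetimecollect)


example_files = ['data/tmax_198010.tif', 'tmax_198001_v2.tif']
idx = dummytime(example_files)
print(idx)
```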


### Loading all your raster files 

In [4]:
os.chdir('../data')
os.getcwd()

'/mnt/d/UW_work/geohack18/Xarray/data'

In [5]:
filenames = glob.glob('*.tif')
filenames
dummytime(filenames)

DatetimeIndex(['1980-10-01', '1980-11-01', '1980-12-01', '1980-01-01',
               '1980-02-01', '1980-03-01', '1980-04-01', '1980-05-01',
               '1980-06-01', '1980-07-01', '1980-08-01', '1980-09-01'],
              dtype='datetime64[ns]', freq=None)

### Create time dimension for xarray dataset

In [6]:
time = xr.Variable('time', dummytime(filenames))

### Define chunk sizes for the x, y, and band dimensions

In [7]:
chunks = {'x': 5490, 'y': 5490, 'band': 1}  # chunk sizes in pixels; set x/y to your raster's width/height to load one whole band per chunk
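Chunk sizes set how much memory each dask task holds. The raster data type is not stated here, so the sketch below (with a hypothetical helper `chunk_nbytes`) computes the size of one 5490 x 5490 x 1 chunk for a few common dtypes.

```python
import numpy as np


def chunk_nbytes(chunks, dtype):
    """Bytes held by one chunk with the given per-dimension sizes."""
    n_elements = 1
    for size in chunks.values():
        n_elements *= size
    return n_elements * np.dtype(dtype).itemsize


chunks = {'x': 5490, 'y': 5490, 'band': 1}
for dtype in ('uint16', 'float32', 'float64'):
    # report in MiB (2**20 bytes)
    print(dtype, round(chunk_nbytes(chunks, dtype) / 2**20, 1), 'MiB')
```

If a chunk is too large for your machine, halving `x` and `y` cuts the per-chunk memory by a factor of four.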

### Concatenate the data arrays along the time dimension

In [9]:
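import rioxarray  # xr.open_rasterio was removed in recent xarray releases; rioxarray.open_rasterio replaces it
da = xr.concat([rioxarray.open_rasterio(f, chunks=chunks) for f in filenames], dim=time)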
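Because the raster files are not available here, the sketch below uses three synthetic single-band arrays to show what `xr.concat` does when `dim` is an `xr.Variable`. The Variable's name becomes the new leading dimension and its values become the `time` coordinate. Since `glob` returned the files out of date order, a `sortby('time')` step is added. That step is an assumption, not part of the original cell.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Three synthetic (band, y, x) arrays, deliberately out of date order
dates = pd.to_datetime(['1980-10-01', '1980-01-01', '1980-02-01'])
arrays = [
    xr.DataArray(np.full((1, 2, 3), i, dtype='float32'), dims=('band', 'y', 'x'))
    for i in range(len(dates))
]

# The Variable supplies both the new dimension name and its coordinate values
time = xr.Variable('time', dates)
da = xr.concat(arrays, dim=time)

# Put the time steps in chronological order
da = da.sortby('time')
print(da.dims, da['time'].values)
```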


### Export the xarray DataArray to netCDF format

In [None]:
da.to_netcdf('test.nc')
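`DataArray.to_netcdf` first converts the array to a single-variable Dataset. If the array has no name, xarray stores it as `__xarray_dataarray_variable__`. The sketch below gives the variable an explicit name first. The name `reflectance` is a placeholder, and the write itself is left commented out because it needs a netCDF backend (netCDF4, h5netcdf or scipy).

```python
import numpy as np
import xarray as xr

# An unnamed array, as a freshly concatenated raster stack may be
da = xr.DataArray(np.zeros((2, 3), dtype='float32'), dims=('y', 'x'))
print(da.name)

# Choose the variable name explicitly before writing
ds = da.to_dataset(name='reflectance')
print(list(ds.data_vars))
# ds.to_netcdf('test.nc')
```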

## Multifile datasets (advanced)

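Xarray can read and write multifile datasets with the `open_mfdataset` and `save_mfdataset` functions. The paths below are relative to the current working directory, so a `data/` subdirectory must exist there before `save_mfdataset` writes into it.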

In [None]:
# load the 'rasm' tutorial dataset (downloaded on first use)
ds = xr.tutorial.load_dataset('rasm')  # for your own data, use xr.open_dataset('your_file.nc') instead
ds

In [None]:
# split the dataset by year and build one output path per year
years, datasets = zip(*ds.groupby('time.year'))
paths = ['./data/%s.nc' % y for y in years]
print(paths)
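The split step can be checked without downloading anything. The sketch below builds a hypothetical stand-in for the tutorial data (36 monthly steps starting in September 1980 on a 2 x 2 grid) and shows how `zip(*ds.groupby('time.year'))` separates the year labels from the yearly sub-datasets.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly data: Sep 1980 through Aug 1983
time = pd.date_range('1980-09-01', periods=36, freq='MS')
ds = xr.Dataset(
    {'Tair': (('time', 'y', 'x'), np.arange(36 * 4, dtype='float64').reshape(36, 2, 2))},
    coords={'time': time},
)

# groupby yields (label, sub-dataset) pairs; zip(*...) splits them into two tuples
years, datasets = zip(*ds.groupby('time.year'))
paths = ['./data/%s.nc' % y for y in years]
print(paths)
print([d.sizes['time'] for d in datasets])
```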

In [None]:
# write one netCDF file per year (4 files for the rasm sample data)
xr.save_mfdataset(datasets, paths)

In [None]:
!ls ./data/
!ncdump -h data/1981.nc

In [None]:
# open a group of files and concatenate them into a single xarray.Dataset
ds3 = xr.open_mfdataset('./data/19*nc')
assert ds3.equals(ds)  # ds3 and ds are different objects, but their contents are equal
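In recent xarray versions, `open_mfdataset` defaults to `combine='by_coords'`. This orders the files by their coordinate values, not by file name. The in-memory sketch below uses `xr.combine_by_coords` on synthetic yearly pieces, so no files or netCDF backend are needed. The pieces are passed in reverse order to show that the `time` values decide the result.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical monthly data split into yearly pieces
time = pd.date_range('1980-09-01', periods=36, freq='MS')
ds = xr.Dataset(
    {'Tair': (('time', 'y', 'x'), np.arange(36 * 4, dtype='float64').reshape(36, 2, 2))},
    coords={'time': time},
)
pieces = [group for _, group in ds.groupby('time.year')]

# Reassemble from reversed input; combine_by_coords sorts along time
recombined = xr.combine_by_coords(pieces[::-1])
print(recombined.equals(ds))
```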

## Interoperability

Xarray objects include export methods that convert data from the Xarray data model to other data models such as Pandas, Iris, and CDMS.
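For whole Datasets, the pandas route is `Dataset.to_dataframe` with its inverse `xr.Dataset.from_dataframe`. In the sketch below, each dimension becomes one level of the DataFrame's MultiIndex and each data variable becomes one column. The small Dataset is synthetic.

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {'Tair': (('time', 'x'), np.arange(6.0).reshape(3, 2))},
    coords={'time': pd.date_range('1980-01-01', periods=3, freq='D'), 'x': [10, 20]},
)

# Dimensions -> MultiIndex levels, variables -> columns
df = ds.to_dataframe()
print(df.index.names, list(df.columns))

# Inverse conversion: MultiIndex levels become dimensions again
back = xr.Dataset.from_dataframe(df)
print(back.equals(ds))
```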

Below is a quick example of how to export a time series from Xarray to Pandas.  

In [None]:
# select a single grid cell and convert its time series to a pandas Series
t_series = ds.isel(x=100, y=100)['Tair'].to_pandas()
t_series.head()
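`DataArray.to_pandas` chooses the pandas type from the number of dimensions. A 1-D selection like the one above becomes a `Series`, and a 2-D slice becomes a `DataFrame`. The sketch uses a small synthetic `Tair` array in place of the tutorial data.

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.arange(12.0).reshape(3, 2, 2),
    dims=('time', 'y', 'x'),
    coords={'time': pd.date_range('1980-09-01', periods=3, freq='MS')},
    name='Tair',
)

point = da.isel(x=1, y=0).to_pandas()  # 1-D -> Series indexed by time
grid = da.isel(time=0).to_pandas()     # 2-D -> DataFrame with y rows and x columns
print(type(point).__name__, type(grid).__name__)
```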

In [None]:
# export the pandas Series to CSV format
t_series.to_csv('data/rasm_point.csv')
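To check the CSV export, the sketch below writes a synthetic point series to an in-memory buffer (`io.StringIO`) instead of `data/rasm_point.csv`. It then reads the file back with `pd.read_csv` and returns to xarray with `Series.to_xarray`. Taking the single data column by position avoids depending on the column header that `to_csv` writes.

```python
import io

import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.arange(4) * 0.5,
    dims='time',
    coords={'time': pd.date_range('1980-09-01', periods=4, freq='MS')},
    name='Tair',
)
t_series = da.to_pandas()

# Write to an in-memory buffer in place of a file on disk
buf = io.StringIO()
t_series.to_csv(buf)
buf.seek(0)

# First column is the time index; take the single data column
restored = pd.read_csv(buf, index_col=0, parse_dates=True).iloc[:, 0]
restored.index.name = 'time'
roundtrip = restored.to_xarray()
print(np.allclose(roundtrip.values, da.values), roundtrip.dims)
```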