# Input and Output

Xarray supports direct serialization and I/O to several file formats including pickle, netCDF, OPeNDAP (read-only), GRIB1/2 (read-only), and HDF by integrating with third-party libraries. Additional serialization formats for 1-dimensional data are available through pandas.

File types
- Pickle
- NetCDF 3/4
- RasterIO
- Zarr
- PyNio

Interoperability
- Pandas
- Iris
- CDMS
- dask DataFrame

### Tutorial Duration
10 minutes

### Going Further

Xarray I/O Documentation: http://xarray.pydata.org/en/latest/io.html

In [None]:
%matplotlib inline

import os

import xarray as xr

## Setup

In [None]:
ds = xr.tutorial.load_dataset('rasm')  # this actually loads data using xr.open_dataset
ds

## Saving xarray datasets as netCDF files

Xarray provides a high-level method for writing netCDF files directly from Xarray Datasets/DataArrays.

In [None]:
# make sure the output directory exists, then write the data to a netCDF file
os.makedirs('./data', exist_ok=True)
ds.to_netcdf('./data/rasm.nc')

In [None]:
!ncdump -h ./data/rasm.nc

## Opening xarray datasets

Xarray's `open_dataset` and `open_mfdataset` are the primary functions for opening local or remote datasets in formats such as netCDF, GRIB, OPeNDAP, and HDF. Reading is delegated to third-party libraries (engines), for which xarray provides a common interface.
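
Which engines are usable depends on what is installed alongside xarray. As a rough sketch, and assuming a reasonably recent xarray release that provides `xr.backends.list_engines`, the available engines can be listed like this:

```python
import xarray as xr

# Mapping of engine name -> backend entrypoint for the engines xarray can
# currently use. The contents depend on which optional libraries
# (netCDF4, h5netcdf, scipy, zarr, ...) are installed in this environment.
engines = xr.backends.list_engines()
print(sorted(engines))
```

Any name printed here can be passed as the `engine=` argument of `open_dataset`.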

In [None]:
ds2 = xr.open_dataset('./data/rasm.nc', engine='netcdf4')
ds2

In [None]:
assert ds is not ds2  # they aren't the same dataset
assert ds.equals(ds2) # but they are equal

*Definition*

**Roundtrip**: the ability to read/write a dataset without changing its contents
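
The same idea applies to every serialization format, not just netCDF. The sketch below illustrates a roundtrip through pickle, which needs no extra backend. The small `toy` dataset is a hypothetical stand-in for the tutorial data, not part of the original notebook.

```python
import pickle

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic dataset used only for this illustration.
toy = xr.Dataset(
    {"Tair": (("time", "x"), np.arange(12.0).reshape(4, 3))},
    coords={"time": pd.date_range("1980-09-01", periods=4, freq="D"), "x": [0, 1, 2]},
    attrs={"title": "synthetic example"},
)

# Serialize to bytes and back again (protocol=-1 selects the newest pickle protocol).
payload = pickle.dumps(toy, protocol=-1)
toy_restored = pickle.loads(payload)

assert toy_restored is not toy       # a new object ...
assert toy_restored.identical(toy)   # ... with the same values, names, and attributes
```

`identical` is stricter than `equals`: it also compares attributes, so it is the stronger roundtrip check when metadata matters.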

## Multifile datasets

Xarray can read/write multifile datasets using the `open_mfdataset` and `save_mfdataset` functions. 

In [None]:
years, datasets = zip(*ds.groupby('time.year'))
paths = ['./data/%s.nc' % y for y in years]
print(paths)

In [None]:
# write the 4 netcdf files
xr.save_mfdataset(datasets, paths)

In [None]:
!ls ./data/
!ncdump -h data/1981.nc

In [None]:
# open a group of files and concatenate them into a single xarray.Dataset
ds3 = xr.open_mfdataset('./data/19*nc')
assert ds3.equals(ds2)
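
The split-and-recombine logic behind this workflow can also be sketched in memory, without touching disk. The example below uses a hypothetical synthetic dataset (`toy`) and `xr.concat` as a stand-in for what `open_mfdataset` does after opening the files; it is an illustration under those assumptions, not a description of the function's internals.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: 36 monthly time steps starting in September 1980.
toy = xr.Dataset(
    {"Tair": (("time", "x"), np.random.default_rng(0).normal(size=(36, 3)))},
    coords={"time": pd.date_range("1980-09-01", periods=36, freq="MS"), "x": [0, 1, 2]},
)

# Split into one dataset per calendar year, as done above for the rasm data.
toy_years, toy_parts = zip(*toy.groupby("time.year"))
toy_paths = ["./data/%s.nc" % y for y in toy_years]  # illustrative only; nothing is written

# Recombine the pieces along the time dimension.
toy_recombined = xr.concat(list(toy_parts), dim="time")
assert toy_recombined.equals(toy)
```

Because each yearly piece keeps its own time coordinate, concatenating along `time` restores the original dataset.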

## Zarr

Zarr is a Python package providing an implementation of chunked, compressed, N-dimensional arrays. Zarr has the ability to store arrays in a range of ways, including in memory, in files, and in cloud-based object storage such as Amazon S3 and Google Cloud Storage. Xarray’s Zarr backend allows xarray to leverage these capabilities.

*Note: Zarr support is still an experimental feature. Please report any bugs or unexpected behavior via GitHub issues.*

In [None]:
# Zarr
ds.to_zarr('./data/rasm.zarr', mode='w')

In [None]:
!ls data/*zarr
!du -h data/*zarr

In [None]:
import zarr
compressor = zarr.Blosc(clevel=2, shuffle=-1)
ds.to_zarr('./data/rasm_compressed.zarr', mode='w', encoding={var: {'compressor': compressor} for var in ds.variables})

In [None]:
!ls data/*zarr
!du -h data/*zarr

## Interoperability

Xarray objects include export methods that allow users to transform data from the Xarray data model to other data models such as Pandas, Iris, and CDMS.

Below is a quick example of how to export a time series from Xarray to Pandas.  

In [None]:
t_series = ds.isel(x=100, y=100)['Tair'].to_pandas()
t_series.head()

In [None]:
t_series.to_csv('data/rasm_point.csv')
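
Conversion works in both directions and is not limited to 1-dimensional data. The following sketch uses a hypothetical synthetic dataset (`toy`) to show a multi-dimensional `Dataset` exported to a pandas `DataFrame` and converted back.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical synthetic dataset used only for this illustration.
toy = xr.Dataset(
    {"Tair": (("time", "x"), np.arange(12.0).reshape(4, 3))},
    coords={"time": pd.date_range("1980-09-01", periods=4, freq="D"), "x": [0, 1, 2]},
)

# Dataset -> DataFrame: one row per (time, x) pair, one column per data variable.
toy_frame = toy.to_dataframe()

# 1-D DataArray -> Series indexed by time, as in the example above.
toy_point = toy["Tair"].isel(x=1).to_pandas()

# DataFrame -> Dataset: the MultiIndex levels become dimensions again.
toy_back = toy_frame.to_xarray()
assert toy_back.equals(toy)
```

The `DataFrame` form is convenient for tabular tools such as `to_csv`, while `to_xarray` restores the labeled N-dimensional structure.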