# **This notebook aims to show the difference between the zarr and netcdf files for the [Atlantic-Iberian Biscay Irish- Ocean Physics Reanalysis](https://data.marine.copernicus.eu/product/IBI_MULTIYEAR_PHY_005_002/description) product**

In [1]:
import xarray as xr
import s3fs
import copernicusmarine
import numpy as np
fs = s3fs.S3FileSystem(
    anon=True,
    endpoint_url="https://s3.waw3-1.cloudferro.com",
)


In [2]:
name = "cmems_mod_ibi_phy_my_0.083deg"

catalogue = copernicusmarine.describe(
    include_datasets=True,
    contains=[name],
)

## **1. Metadata in thetao variable for depth coordinate**

### 1.1 NetCDF file
I selected a single time step (one daily file) to keep the NetCDF load time short.

In [3]:
#fs.ls("mdl-native-10/native/IBI_MULTIYEAR_PHY_005_002/cmems_mod_ibi_phy_my_0.083deg-3D_P1D-m_202012/2015/06")

In [4]:
thetao_nc_uri = "mdl-native-10/native/IBI_MULTIYEAR_PHY_005_002/cmems_mod_ibi_phy_my_0.083deg-3D_P1D-m_202012/2015/06/CMEMS_v5r1_IBI_PHY_MY_PdE_01dav_20150611_20150611_R20201201_RE01.nc"

In [5]:
thetao_nc = xr.open_dataset(fs.open(thetao_nc_uri), engine="h5netcdf")


In [6]:
thetao_nc

### 1.2 Loading from zarr

In [7]:
thetao_zarr = xr.open_dataset(
    "https://s3.waw3-1.cloudferro.com/mdl-arco-time-032/arco/IBI_MULTIYEAR_PHY_005_002/cmems_mod_ibi_phy_my_0.083deg-3D_P1D-m_202012/timeChunked.zarr",
    engine="zarr",
)[["thetao"]]

In [8]:
thetao_zarr

### 1.3 Data comparison

In [9]:
thetao_nc.coords

Coordinates:
  * time       (time) datetime64[ns] 8B 2015-06-11T12:00:00
  * longitude  (longitude) float32 1kB -19.0 -18.92 -18.83 ... 4.833 4.917 5.0
  * latitude   (latitude) float32 1kB 26.0 26.08 26.17 ... 55.83 55.92 56.0
  * depth      (depth) float32 200B 0.5058 1.556 2.668 ... 5.292e+03 5.698e+03

In [10]:
thetao_zarr.coords

Coordinates:
  * elevation  (elevation) float32 200B -5.698e+03 -5.292e+03 ... -1.556 -0.5058
  * latitude   (latitude) float32 1kB 26.0 26.08 26.17 ... 55.83 55.92 56.0
  * longitude  (longitude) float32 1kB -19.0 -18.92 -18.83 ... 4.833 4.917 5.0
  * time       (time) datetime64[ns] 85kB 1993-01-01 1993-01-02 ... 2021-12-28

Notice that the coordinates of the two datasets are not listed in the same order. That is not a problem as long as the data is still correctly assigned, but one dimension has also changed.  
In the NetCDF file, the coordinate along the Z axis is `depth`, while in the Zarr store it is `elevation`, with negative values stored in reverse order.

In [11]:
thetao_nc.depth.attrs

{'long_name': 'Depth',
 'units': 'm',
 'axis': 'Z',
 'positive': 'down',
 'unit_long': 'Meters',
 'standard_name': 'depth',
 '_CoordinateAxisType': 'Height',
 '_CoordinateZisPositive': 'down'}

In [12]:
thetao_zarr.elevation.attrs

{'_CoordinateAxisType': 'Height',
 '_CoordinateZisPositive': 'down',
 'axis': 'Z',
 'long_name': 'Depth',
 'positive': 'down',
 'standard_name': 'depth',
 'unit_long': 'Meters',
 'units': 'm'}

The attributes, however, are identical in the two datasets. This suggests that the coordinate was transformed when the Zarr store was created without its metadata being updated: the values no longer represent the same quantity (elevation is the opposite of depth), yet `standard_name` is still `depth` and `positive` is still `down`. As far as I understand, the metadata should have changed along with the values.
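If depth-style values are needed from the Zarr store, the coordinate can be converted back by negating and re-sorting it. The following is a minimal sketch on a small synthetic dataset (the `thetao` numbers are made up for illustration). It assumes that `elevation` is exactly the negative of `depth`, which matches the coordinate listings above but is not confirmed by the product documentation here.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the Zarr dataset: three levels only (illustrative values).
depth_values = np.array([0.5058, 1.556, 2.668], dtype="float32")
ds = xr.Dataset(
    {"thetao": ("elevation", np.array([10.0, 12.0, 14.0], dtype="float32"))},
    coords={"elevation": -depth_values[::-1]},  # deepest level first, negative values
)

# Assumption: depth == -elevation. Negate, rename, then sort from surface to bottom.
converted = (
    ds.assign_coords(elevation=-ds["elevation"].values)
    .rename({"elevation": "depth"})
    .sortby("depth")
)

print(converted["depth"].values)
print(converted["thetao"].values)
```

Passing plain values to `assign_coords` drops the original attributes, so any metadata wanted on the new `depth` coordinate would have to be set again explicitly.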

### 1.4 Downloaded files, available with the following [tool](https://data.marine.copernicus.eu/product/IBI_MULTIYEAR_PHY_005_002/download)

In [40]:
thetao_dl = xr.open_dataset("thetao.nc")

In [43]:
thetao_dl

In [42]:
thetao_nc[["thetao","zos"]]

In [16]:
thetao_zarr

Finally, I downloaded the NetCDF files with the download tool to compare them, and I noticed further differences in the data, such as the longitude, which is not strictly identical. There may be other differences that I did not spot. It is also possible that I am mistaken about which files I downloaded.
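To make such coordinate differences easier to spot, a small helper can report, for every coordinate shared by two datasets, either the largest absolute difference or the fact that the sizes do not match. This is only a sketch: `coord_report` is a hypothetical helper name, and the two datasets below are synthetic stand-ins, not the actual downloaded and native files.

```python
import numpy as np
import xarray as xr


def coord_report(a, b):
    """Map each shared numeric coordinate to its max absolute difference.

    The value is None when the two coordinates do not have the same shape.
    """
    report = {}
    for name in sorted(set(a.coords) & set(b.coords)):
        va, vb = a[name].values, b[name].values
        if not np.issubdtype(va.dtype, np.number):
            continue  # non-numeric coordinates such as time are skipped in this sketch
        if va.shape != vb.shape:
            report[name] = None
        else:
            report[name] = float(np.abs(va - vb).max())
    return report


# Synthetic example: longitude shifted slightly, latitude truncated.
lon = np.array([-19.0, -18.916667, -18.833333])
native = xr.Dataset(coords={"longitude": lon, "latitude": [26.0, 26.083333]})
downloaded = xr.Dataset(coords={"longitude": lon + 1e-5, "latitude": [26.0]})

report = coord_report(native, downloaded)
print(report)
```

With the real data, the call would be something like `coord_report(thetao_nc, thetao_dl)`.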

___
## **2. Comparison between the latitude dimension from the deptho variable and the thetao variable** 

### 2.1 NetCDF file
As in section 1.1, I selected a single time step for this file to keep the NetCDF load time short.

In [17]:
fs.ls("mdl-native-10/native/IBI_MULTIYEAR_PHY_005_002/cmems_mod_ibi_phy_my_0.083deg-3D_static_202012/")

['mdl-native-10/native/IBI_MULTIYEAR_PHY_005_002/cmems_mod_ibi_phy_my_0.083deg-3D_static_202012/IBI-MFC_005_002_coordinates.nc',
 'mdl-native-10/native/IBI_MULTIYEAR_PHY_005_002/cmems_mod_ibi_phy_my_0.083deg-3D_static_202012/IBI-MFC_005_002_mask_bathy.nc',
 'mdl-native-10/native/IBI_MULTIYEAR_PHY_005_002/cmems_mod_ibi_phy_my_0.083deg-3D_static_202012/IBI-MFC_005_002_mdt.nc']

In [18]:
deptho_nc_uri = 's3://mdl-native-10/native/IBI_MULTIYEAR_PHY_005_002/cmems_mod_ibi_phy_my_0.083deg-3D_static_202012/IBI-MFC_005_002_mask_bathy.nc'

In [19]:
deptho_nc = xr.open_dataset(fs.open(deptho_nc_uri), engine='h5netcdf')

In [20]:
deptho_dl = xr.open_dataset("deptho.nc")  # Update this path to point to your downloaded dataset

### 2.2 Loading from zarr

In [21]:
deptho_zarr = xr.open_dataset(
    "https://s3.waw3-1.cloudferro.com:443/mdl-arco-time-032/arco/IBI_MULTIYEAR_PHY_005_002/cmems_mod_ibi_phy_my_0.083deg-3D_static_202012--ext--bathy/static.zarr",
    engine="zarr",
)

### 2.3 Data comparison

In [22]:
deptho_zarr

In [23]:
deptho_nc["deptho"].equals(deptho_zarr["deptho"])

True

In [24]:
np.array_equal(deptho_dl["deptho"].data,deptho_nc["deptho"].data)

False

Here, the `deptho` variable differs between the file I downloaded and the file I loaded from the store, but this is not my main concern.
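A `False` result from `np.array_equal` does not by itself show that the values differ. The function also returns `False` when the shapes differ, or when both arrays hold NaN at the same positions (NaN never compares equal to itself unless `equal_nan=True` is passed), whereas xarray's `.equals` treats NaNs in the same locations as equal. The sketch below separates these cases on synthetic arrays. `diagnose` is a hypothetical helper, and whether any of these cases explains the mismatch with the downloaded file has not been checked here.

```python
import numpy as np


def diagnose(a, b):
    """Return a short description of how two arrays differ."""
    if a.shape != b.shape:
        return "shape mismatch"
    if np.array_equal(a, b):
        return "identical"
    if np.array_equal(a, b, equal_nan=True):
        return "identical once NaNs are treated as equal"
    if not np.array_equal(np.isnan(a), np.isnan(b)):
        return "different NaN masks"
    return f"values differ (max abs diff {np.nanmax(np.abs(a - b)):.3g})"


# Synthetic bathymetry with one masked (NaN) cell, for illustration only.
bathy = np.array([[np.nan, 10.0], [20.0, 30.0]])

same_with_nan = diagnose(bathy, bathy.copy())
subset = diagnose(bathy, bathy[:1])
shifted = diagnose(bathy, bathy + 0.5)
print(same_with_nan, subset, shifted, sep="\n")
```

With the real data, the call would be something like `diagnose(deptho_dl["deptho"].data, deptho_nc["deptho"].data)`.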

In [25]:
thetao_nc["latitude"]

In [26]:
deptho_nc["latitude"]

In [27]:
deptho_nc["latitude"].equals(thetao_nc["latitude"])

False

In [28]:
deptho_zarr["latitude"].equals(thetao_zarr["latitude"])

False

In [29]:
abs(deptho_zarr["latitude"].data - thetao_zarr["latitude"].data).max()

0.00023269653

The two latitudes differ only slightly (about 2.3e-4 degrees at most), but because the latitude of the deptho dataset and that of the thetao dataset are not exactly aligned, operations combining them cause issues in my work.  
I fixed this by replacing the deptho latitude with the thetao latitude in my data, but I find it strange that this kind of difference exists within the same product. Still, I do not think it is related to a bug in the creation of the Zarr files, since the data looks consistent between the NetCDF loaded (not downloaded) from the store and the Zarr loaded from the store.
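The workaround described above can be sketched as follows. xarray aligns arithmetic operands on exact coordinate labels, so latitudes that differ by a tiny amount are treated as different points and dropped from the result. The example uses small synthetic arrays rather than the real datasets, and the `tolerance` value is a hypothetical threshold chosen to be well below the 0.083 degree grid spacing.

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins: same grid, but one deptho latitude is off by 2.3e-4 degrees.
lat = np.array([26.0, 26.083333, 26.166667])
thetao = xr.DataArray([10.0, 11.0, 12.0], coords={"latitude": lat}, dims="latitude")
deptho = xr.DataArray(
    [100.0, 200.0, 300.0],
    coords={"latitude": lat + np.array([0.0, 2.3e-4, 0.0])},
    dims="latitude",
)

# Arithmetic keeps only the labels present in both operands,
# so the slightly shifted middle point is lost.
misaligned = thetao + deptho

# Check that the grids really are the same up to a small tolerance
# before overwriting one coordinate with the other.
tolerance = 1e-3  # hypothetical threshold, in degrees
max_diff = np.abs(deptho["latitude"].values - thetao["latitude"].values).max()
if max_diff > tolerance:
    raise ValueError(f"latitudes differ by {max_diff}, refusing to overwrite")

deptho_fixed = deptho.assign_coords(latitude=thetao["latitude"].values)
aligned = thetao + deptho_fixed

print(misaligned.sizes["latitude"], aligned.sizes["latitude"])
```

An alternative with the same intent is `deptho.reindex_like(thetao, method="nearest", tolerance=tolerance)`, which matches each target latitude to its nearest neighbour within the tolerance instead of overwriting the labels.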