# Importing and saving WINDS dataset

## Example for: April 1997 - half-hour dataset for 70 days

In [1]:
import numpy as np
import xarray as xr
import pandas as pd
import netCDF4 as nc

### Loading the half-hour velocity dataset using xarray

Problems with remote data access may occur, so it is better to save the dataset locally to have it always available, even though it is quite heavy: after dropping the unnecessary variables and clipping the domain, the dataset for 301x301 grid points is about 1.7 GB for 100 days of half-hourly observations.
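One way to follow this advice is to fetch and save the dataset only when no local copy exists yet, and otherwise open the saved file. A minimal sketch, assuming a hypothetical helper open_cached and a user-supplied fetch function (for instance one that opens the THREDDS URL below, drops variables and clips the domain); it relies only on xr.open_dataset and Dataset.to_netcdf.

```python
import os
from typing import Callable

import xarray as xr


def open_cached(path: str, fetch: Callable[[], xr.Dataset]) -> xr.Dataset:
    """Open the local copy at 'path'; if it does not exist yet, build the
    dataset with 'fetch', save it to 'path', then open the saved file.

    'open_cached' and 'fetch' are hypothetical names used for illustration.
    """
    if not os.path.exists(path):
        ds = fetch()        # e.g. remote open + drop_vars + clipping
        ds.to_netcdf(path)  # the slow, network-dependent step happens only once
        ds.close()
    return xr.open_dataset(path)
```

For example, open_cached("april_1997_clipped.nc", my_fetch) would contact the server on the first call only; later calls just read the local file.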

In [2]:
base_url_1997 = 'https://dap.ceda.ac.uk/thredds/dodsC/bodc/UOX220077/WINDS-M/1997/WINDS-M_SFC_1997.nc'
ds_complete_1997 = xr.open_dataset(base_url_1997)
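The next cell removes the unneeded variables by listing them one by one. A hedged alternative sketch is to state the variables to keep and drop everything else; the keep-list below is an assumption based on the variables used later in this notebook, and keep_only is a hypothetical helper.

```python
import xarray as xr

# Assumed keep-list: the variables used later in this notebook (adjust as needed)
KEEP = {"u_surf", "v_surf", "nav_lon_u", "nav_lat_u",
        "nav_lon_v", "nav_lat_v", "time", "time_counter"}


def keep_only(ds: xr.Dataset, keep=KEEP) -> xr.Dataset:
    """Drop every variable (data or coordinate) whose name is not in 'keep'."""
    to_drop = [name for name in ds.variables if name not in keep]
    return ds.drop_vars(to_drop)
```

Dimensions that no remaining variable uses disappear from the result, so if the rho variables are all dropped, the following .sel call must leave out y_rho and x_rho.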

In [None]:
# Dropping unnecessary variables
dropping_variables_1997 = ds_complete_1997.drop_vars(['s_w', 'hc', 's_rho', 'theta_s', 'theta_b', 'Tcline',
                                                      'Vtransform', 'h', 'f', 'pm', 'pn', 'lon_rho', 'lat_rho',
                                                      'angle', 'mask_rho', 'Cs_r', 'sc_r', 'Cs_w', 'lon_u', 'lon_v',
                                                      'lat_u', 'lat_v', 'sc_w'])

# Clipping to the domain of interest for the thesis case study
# (positions 800-1100 along y and 150-450 along x: 301 points each)
clipped_domain_1997 = dropping_variables_1997.sel(y_rho=slice(800, 1101), x_rho=slice(150, 451),
                                                  y_u=slice(800, 1101), x_u=slice(150, 451),
                                                  y_v=slice(800, 1101), x_v=slice(150, 451))

# Keeping 70 + 30 = 100 days from 1 April 1997, i.e. from 01-04-1997 to 09-07-1997:
# 100*24*2 + 1 = 4801 half-hourly values, starting at index 4320 (90 days of January-March * 48)
days100_from_april_1997 = clipped_domain_1997.isel(time_counter=slice(4320, 4320+4801))
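The index arithmetic in the comment above can be checked with the standard library. A sketch, assuming record 0 of each yearly file is at 00:00 on 1 January and records are exactly 30 minutes apart (17520 = 365 * 48 records for 1997 is consistent with this, but the time axis itself should be checked); time_window is a hypothetical helper.

```python
from datetime import datetime, timedelta

STEPS_PER_DAY = 48  # half-hourly records


def time_window(start: datetime, n_days: int):
    """Return (positional slice, end time) for n_days of half-hourly records
    starting at 'start', in a yearly file whose record 0 is 1 January 00:00."""
    year_start = datetime(start.year, 1, 1)
    first = (start - year_start).days * STEPS_PER_DAY
    n_steps = n_days * STEPS_PER_DAY + 1  # +1 keeps the closing time step
    return slice(first, first + n_steps), start + timedelta(days=n_days)


april, april_end = time_window(datetime(1997, 4, 1), 100)
print(april, april_end)  # slice(4320, 9121, None) 1997-07-10 00:00:00
```

Under this assumption the closing step falls at 00:00 on 10 July, i.e. the end of 9 July. For a start on 1 November 1997 the stop index exceeds 17520, the length of the 1997 file, which is why the 1998 file is needed in the November case.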

- The complete dataset has these dimensions (rho points span longitudes 34.62 to 77.50 and latitudes -23.50 to 0.01111261):

time_counter: 17520, y_rho: 1211, x_rho: 2145, s_rho: 50, s_w: 51, y_v: 1210, x_v: 2145, y_u: 1211, x_u: 2144


- For the thesis case study dataset (rho points span longitudes 37.62 to 43.62 and latitudes -8.161049 to -2.1883478):

time_counter: 4801, y_rho: 301, x_rho: 301, y_v: 301, x_v: 301, y_u: 301, x_u: 301
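A rough size estimate helps anticipate how heavy the saved files will be. A back-of-envelope sketch, assuming each velocity component is stored as uncompressed float32 (4 bytes per value); the actual on-disk size depends on the stored dtype and on compression.

```python
# Case-study dimensions from the list above
n_time, n_y, n_x = 4801, 301, 301
BYTES_PER_VALUE = 4  # assumption: float32, no compression

bytes_per_component = n_time * n_y * n_x * BYTES_PER_VALUE
print(f"{bytes_per_component:,} bytes = {bytes_per_component / 1e9:.2f} GB per component")
# 1,739,901,604 bytes = 1.74 GB per component
```

This is of the same order as the 1.7 GB quoted above; for an opened dataset, days100_from_april_1997.nbytes / 1e9 gives the in-memory figure directly.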

##### NOTE: Case of November as spawning month

Starting from 1 November 1997, the 100th day falls on 8 February 1998, so the 1998 dataset must be used too, processed in the same way as above.

To concatenate the two datasets (61 + 39 days), proceed as follows; rechunking first makes the operation much faster (about 9 s with rechunking versus 15 min without):
- Chunk sizes:

  chunk_sizes = {'time_counter': 1, 'y_rho': 301, 'x_rho': 301, 'y_v': 301, 'x_v': 301, 'y_u': 301, 'x_u': 301}
- Rechunking the two datasets (nov_dec_1997 with the values of November and December 1997, jan_feb_1998 with the following days of January and February 1998):

  nov_dec_1997_rechunked = nov_dec_1997.chunk(chunk_sizes)

  jan_feb_1998_rechunked = jan_feb_1998.chunk(chunk_sizes)
- Creating the rechunked 100-day dataset:

  days70_from_november_1997 = xr.concat([nov_dec_1997_rechunked, jan_feb_1998_rechunked], dim='time_counter')
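The concatenation step can be tried on small synthetic data before touching the real files. A minimal sketch with made-up arrays (names and grid size are illustrative only); it uses xr.concat along time_counter and leaves out .chunk, which additionally requires dask. It also checks that the boundary time step is not duplicated, in case both yearly files contain it (an assumption worth verifying on the real data).

```python
import numpy as np
import pandas as pd
import xarray as xr


def fake_period(start: str, n_steps: int) -> xr.Dataset:
    """Synthetic half-hourly u field on a 3x3 grid (illustration only)."""
    times = pd.date_range(start, periods=n_steps, freq="30min")
    data = np.zeros((n_steps, 3, 3), dtype="float32")
    return xr.Dataset({"u_surf": (("time_counter", "y_u", "x_u"), data)},
                      coords={"time_counter": times})


nov_dec = fake_period("1997-11-01", 61 * 48)      # 1 Nov to 31 Dec 1997
jan_feb = fake_period("1998-01-01", 39 * 48 + 1)  # 39 days + closing step

combined = xr.concat([nov_dec, jan_feb], dim="time_counter")

# Sanity checks: strictly increasing time axis without repeated steps
times = combined["time_counter"].to_index()
assert times.is_monotonic_increasing and not times.has_duplicates
print(combined.sizes["time_counter"])  # 4801
```

If the duplicate check fails on the real data, recent xarray versions provide Dataset.drop_duplicates("time_counter") to keep a single copy of the repeated step.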

### Mesh grid

The same process used for the velocities applies. NOTE that the mesh grid does not change in time, so it does not have to be imported for every study period (e.g. November and April), but just once.

In [None]:
mask_url = 'https://dap.ceda.ac.uk/thredds/dodsC/bodc/UOX220077/WINDS-M/supplementary_files/croco_grd.nc'
ds_mask = xr.open_dataset(mask_url)

clipped_mask_one = ds_mask.sel(eta_rho=slice(800, 1101), xi_rho=slice(150, 451),
                               eta_psi=slice(800, 1101), xi_psi=slice(150, 451),
                               eta_u=slice(800, 1101), xi_u=slice(150, 451),
                               eta_v=slice(800, 1101), xi_v=slice(150, 451)) # clipping the domain

# Dropping the grid variables not needed downstream
clipped_mask = clipped_mask_one.drop_vars(['xl', 'el', 'depthmin', 'depthmax', 'spherical', 'angle', 'h', 'hraw',
                                           'alpha', 'f', 'pm', 'pn', 'dndx', 'dmde', 'x_rho', 'x_u', 'x_v', 'x_psi',
                                           'y_rho', 'y_v', 'y_u', 'y_psi', 'mask_u', 'mask_v', 'mask_psi'])
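Both clipping cells repeat the same two slices for every grid-point type, only with different dimension names (y_*/x_* in the velocity file, eta_*/xi_* in the grid file). A sketch of a hypothetical helper, clip_indexers, that builds these dictionaries once for use with .isel, which makes the positional intent explicit; this assumes, as the 301-point result suggests, that these dimensions carry no coordinate values, in which case xarray's .sel falls back to the same positions.

```python
def clip_indexers(y_prefix, x_prefix, points=("rho", "u", "v"),
                  y_slice=slice(800, 1101), x_slice=slice(150, 451)):
    """Build a Dataset.isel() indexer such as {'y_rho': slice(800, 1101), ...}."""
    indexers = {}
    for p in points:
        indexers[f"{y_prefix}_{p}"] = y_slice
        indexers[f"{x_prefix}_{p}"] = x_slice
    return indexers


# Indexers for the velocity file (y_*/x_*) and for the grid file (eta_*/xi_*)
velocity_idx = clip_indexers("y", "x")
grid_idx = clip_indexers("eta", "xi", points=("rho", "psi", "u", "v"))
```

For example, ds_mask.isel(grid_idx) should reproduce clipped_mask_one, while passing missing_dims="ignore" to .isel skips any listed dimension that a given file does not have.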

### Velocity fields and mesh grid

To work more easily with OceanParcels, it is good practice to split the u surface velocity field (u_surf) and the v surface velocity field (v_surf) into separate datasets. The FieldSet also needs the mesh grid, so it must be imported too, as done above.
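Since the FieldSet combines the mesh grid with the velocity fields, it may be worth checking that the two clips refer to the same points. A hedged sketch, assuming the grid file provides lon_u/lat_u and the velocity file nav_lon_u/nav_lat_u as 2-D arrays on the u points; the helper same_points is hypothetical and compares values only, ignoring the different dimension names.

```python
import numpy as np
import xarray as xr


def same_points(grid: xr.Dataset, vel: xr.Dataset, point: str = "u",
                atol: float = 1e-4) -> bool:
    """True if grid-file lon/lat and velocity-file nav_lon/nav_lat agree."""
    pairs = [(f"lon_{point}", f"nav_lon_{point}"),
             (f"lat_{point}", f"nav_lat_{point}")]
    for grid_name, vel_name in pairs:
        g, v = grid[grid_name].values, vel[vel_name].values
        # Different shapes mean different clips; otherwise compare values
        if g.shape != v.shape or not np.allclose(g, v, atol=atol):
            return False
    return True
```

With the datasets above, same_points(clipped_mask, clipped_domain_1997) would be expected to return True if both clips cover the same u points.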

In [None]:
days100_from_april_1997_velocity_u = days100_from_april_1997.drop_vars(['nav_lon_v', 'nav_lat_v', 'v_surf', 'time'])
days100_from_april_1997_velocity_v = days100_from_april_1997.drop_vars(['nav_lon_u', 'nav_lat_u', 'u_surf', 'time'])
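The same split can be written once for both components by dropping the variables that belong to the other one. A sketch assuming the naming convention seen in this file (component variables start with 'u_'/'v_' or end with '_u'/'_v', e.g. u_surf and nav_lon_u); split_component is a hypothetical helper and, like the cell above, it also drops the 'time' variable.

```python
import xarray as xr


def split_component(ds: xr.Dataset, comp: str) -> xr.Dataset:
    """Keep only the variables of one velocity component ('u' or 'v')."""
    other = {"u": "v", "v": "u"}[comp]
    to_drop = [name for name in ds.variables
               if name.startswith(f"{other}_") or name.endswith(f"_{other}")]
    if "time" in ds.variables:
        to_drop.append("time")  # dropped in the cell above as well
    return ds.drop_vars(to_drop)
```

Here split_component(days100_from_april_1997, "u") would play the role of days100_from_april_1997_velocity_u, provided no other variable happens to match the naming pattern.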

### Saving datasets into nc files

To save the dataset locally, the following works (U velocity field from 1 April 1997 to 9 July 1997):

chunks = {'time_counter': 100, 'y_u': 301, 'x_u': 301}

days100_from_april_1997_velocity_u_chunked = days100_from_april_1997_velocity_u.chunk(chunks)

days100_from_april_1997_velocity_u_chunked.to_netcdf("days100_from_april_1997_velocity_u_OK.nc")

### Opening saved datasets

It is quick and easy: just assign the result of the 'xr.open_dataset' function to a variable:

days100_from_april_1997_velocity_u_OK = xr.open_dataset("days100_from_april_1997_velocity_u_OK.nc")