### Description

This notebook selects and subsets EN4 profile data for comparison with HYCOM. The data are downloaded from https://hadleyserver.metoffice.gov.uk/en4/download-en4-2-1.html and the profile file format is described at https://hadleyserver.metoffice.gov.uk/en4/en4-0-2-profile-file-format.html

The files need to be prepared for ingesting by Bjorn's scripts to create station and depthlevels files (https://gitlab.com/backeb/hycom_enoi/-/blob/master/scripts/hycom/sandbox/make_hyc2station_infiles.py).

The goal is to:
1. subset the profiles to the model domain; then
2. export the required data to a .txt (or .csv) file for generating the station and depth files

*Alternatively,* the station and depth files could be generated directly from the profile NetCDF files, without first writing to a .txt file.
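
As a rough illustration of step 1, the domain test can be written as a single boolean mask over per-profile latitude and longitude. The sketch below uses small synthetic arrays rather than EN4 data. The bounds (10°S to 50°S, 0° to 70°E) are the ones used in the code cells of this notebook, and `in_domain` is a hypothetical helper name, not part of any existing script.

```python
import numpy as np

def in_domain(lat, lon, lat_bounds=(-50.0, -10.0), lon_bounds=(0.0, 70.0)):
    """Return a boolean mask of profiles inside the model domain (bounds inclusive)."""
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    return ((lat >= lat_bounds[0]) & (lat <= lat_bounds[1]) &
            (lon >= lon_bounds[0]) & (lon <= lon_bounds[1]))

# synthetic positions, one entry per profile (not real EN4 values)
lat = np.array([-43.9, -5.0, -30.0, -60.0])
lon = np.array([10.4, 20.0, 80.0, 30.0])

mask = in_domain(lat, lon)
kept = np.flatnonzero(mask)  # positional indices of the profiles to keep
```

A combined mask keeps only profiles that satisfy both conditions, which is the same set an inner join of separately filtered latitude and longitude tables would give.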

In [1]:
import glob
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# import cartopy.crs as ccrs
# import cartopy
# from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import xarray as xr
# from scipy import stats



In [2]:
plt.rcParams['figure.facecolor']='white'
plt.rcParams['axes.facecolor']='white'

In [13]:
# loading a profile data file
EN4_profile = xr.open_dataset('../Data/EN4_profiles/EN.4.2.1.f.profiles.g10.200901.nc')
# ds_EN4 = ds_EN4.sel(time=slice('2009-01','2014-04'))
# ds_EN4['temperature'] = ds_EN4['temperature'] - 273.15

# We now need to import all the files, or a list thereof, for sequential
# preprocessing in a loop.

In [14]:
# This conversion to dataframes will need to happen inside the loop that writes
# all data into a single file or variable.

# convert the required fields to dataframe
n_prof = EN4_profile['N_PROF'].to_dataframe()
juld = EN4_profile['JULD'].to_dataframe()
lat = EN4_profile['LATITUDE'].to_dataframe()
lon = EN4_profile['LONGITUDE'].to_dataframe()
sal = EN4_profile['PSAL_CORRECTED'].to_dataframe()
temp = EN4_profile['TEMP'].to_dataframe()
depth = EN4_profile['DEPH_CORRECTED'].to_dataframe()

# subset the lats and lons to model domain
lat_ind = np.where((lat <= -10) & (lat >= -50 ))[0]
lat = lat.iloc[lat_ind]
lon_ind = np.where((lon <= 70) & (lon >= 0))[0]
lon = lon.iloc[lon_ind]

In [15]:
# Join lats and lons for first dataset, 'tester'
tester = lat.join(lon, how='inner')

In [16]:
# join other dataframes with inner join (Use intersection of keys from both frames)
tester2 = tester.join(depth,how='inner').join(temp, how='inner').join(sal, how='inner').join(juld, how='inner')
tester2

N_PROF,N_LEVELS,LATITUDE,LONGITUDE,DEPH_CORRECTED,TEMP,PSAL_CORRECTED,JULD
21,0,-43.943001,10.397,4.959837,7.933,34.209751,2009-01-01 13:34:46.000017
21,1,-43.943001,10.397,9.919554,7.851,34.218380,2009-01-01 13:34:46.000017
21,2,-43.943001,10.397,14.879150,7.834,34.221432,2009-01-01 13:34:46.000017
21,3,-43.943001,10.397,19.838627,7.814,34.225231,2009-01-01 13:34:46.000017
21,4,-43.943001,10.397,24.797983,7.698,34.236179,2009-01-01 13:34:46.000017
...,...,...,...,...,...,...,...
32588,395,-29.983000,13.367,,,,2009-01-31 22:39:00.000008
32588,396,-29.983000,13.367,,,,2009-01-31 22:39:00.000008
32588,397,-29.983000,13.367,,,,2009-01-31 22:39:00.000008
32588,398,-29.983000,13.367,,,,2009-01-31 22:39:00.000008
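
The inner joins above combine per-profile tables (indexed by `N_PROF`) with per-level tables (indexed by `N_PROF` and `N_LEVELS`). The following is a minimal sketch of that behaviour on synthetic frames, assuming pandas matches the single index against the MultiIndex level of the same name. All values are made up for illustration.

```python
import pandas as pd

# per-profile frames indexed by N_PROF (synthetic values)
idx = pd.Index([0, 1, 2], name='N_PROF')
lat = pd.DataFrame({'LATITUDE': [-60.0, -30.0, -20.0]}, index=idx)
lon = pd.DataFrame({'LONGITUDE': [10.0, 80.0, 20.0]}, index=idx)

# per-level frame indexed by (N_PROF, N_LEVELS)
levels = pd.MultiIndex.from_product([[0, 1, 2], [0, 1]],
                                    names=['N_PROF', 'N_LEVELS'])
depth = pd.DataFrame({'DEPH_CORRECTED': [5.0, 10.0, 5.0, 10.0, 5.0, 10.0]},
                     index=levels)

# filter each table separately, as in the cells above
lat = lat[(lat['LATITUDE'] >= -50) & (lat['LATITUDE'] <= -10)]   # keeps profiles 1, 2
lon = lon[(lon['LONGITUDE'] >= 0) & (lon['LONGITUDE'] <= 70)]    # keeps profiles 0, 2

# inner joins keep only profiles present in every table
joined = lat.join(lon, how='inner').join(depth, how='inner')
kept_profiles = sorted(set(joined.index.get_level_values('N_PROF')))
```

Only profile 2 passes both filters, so the joined frame holds its two depth levels and nothing else.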


In [7]:
# Create list of index values to be referenced later
tester2.index[0::400]
lst = [i[0] for i in tester2.index[0::400]]
lst[0]

# Next, NaN depths should be removed and the subsequent months of profile data
# added. Station values must stay unique across months, so the current thinking
# is to add len(N_PROF) to the next month's N_PROF consecutively.
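
The ID-offset idea in the note above can be sketched with plain arrays. This is only an illustration under the assumption that each file numbers its profiles from 0. `offset_profile_ids` is a hypothetical helper, and the counts are synthetic.

```python
import numpy as np

def offset_profile_ids(counts):
    """Yield globally unique ID arrays for files holding `counts` profiles each."""
    start = 0
    for n in counts:
        yield np.arange(n) + start
        start += n  # the next file starts one past the last ID used so far

# two synthetic "months" with 3 and 2 profiles
ids_by_file = list(offset_profile_ids([3, 2]))
all_ids = np.concatenate(ids_by_file)
```

Offsetting by the profile count (rather than by the last ID, which is one less) is what keeps the first ID of a new month from repeating the last ID of the previous one.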


In [107]:
# Testing retrieval of max depth value
tester2.loc[lst[0]]['DEPH_CORRECTED'].max()

# The max value can be retrieved here, but keeping every depth value in the final
# dataframe means the depthlevels file can match the profile's own depth values,
# as opposed to Bjorn's 5 m incremental approach.

1876.2327
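
To make the two options concrete, the sketch below builds both sets of depth levels for one synthetic profile: regular 5 m increments down to the deepest observation, and the observed depths used as they are. The depth values are invented for illustration and are not taken from an EN4 file.

```python
import numpy as np

# synthetic corrected depths for one profile, in metres
profile_depths = np.array([4.96, 9.92, 14.88, 19.84, 24.80])
max_depth = profile_depths.max()

# option 1: regular 5 m levels, with the maximum depth appended as the last level
regular_levels = np.append(np.arange(5, max_depth, 5), max_depth)

# option 2: the observed levels, used as-is
observed_levels = profile_depths
```

With option 2 the model is sampled exactly where the instrument measured, at the cost of every station having its own irregular set of levels.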

In [38]:
EN4_profile

<xarray.Dataset>
Dimensions:                       (N_CALIB: 1, N_HISTORY: 0, N_LEVELS: 400, N_PARAM: 5, N_PROF: 32595)
Dimensions without coordinates: N_CALIB, N_HISTORY, N_LEVELS, N_PARAM, N_PROF
Data variables:
    CALIBRATION_DATE              (N_PROF, N_CALIB, N_PARAM) |S14 b'' ... b''
    CYCLE_NUMBER                  (N_PROF) int32 -2147483647 ... -2147483647
    DATA_CENTRE                   (N_PROF) |S2 b'MO' b'MO' b'MO' ... b'MO' b'MO'
    DATA_MODE                     (N_PROF) |S1 b'D' b'D' b'D' ... b'D' b'D' b'D'
    DATA_STATE_INDICATOR          (N_PROF) |S4 b'2C+ ' b'2C+ ' ... b'2C+ '
    DATA_TYPE                     |S16 b'ENSEMBLES EN3 v1'
    DATE_CREATION                 |S14 b'20170421133031'
    DATE_UPDATE                   |S14 b'20170421133031'
    DC_REFERENCE                  (N_PROF) |S16 b' A20090101-02729' ... b' A20090131-65980'
    DEPH_CORRECTED                (N_PROF, N_LEVELS) float32 4.3566914 ... nan
    DEPH_CORRECTED_QC             (N_PROF, N_LEVEL

## Building the loop

In [25]:
# Get a sorted list of profile NetCDF files (glob does not guarantee order)
profiles = sorted(glob.glob('../Data/EN4_profiles/EN.4.2.1.f.profiles.g10.20090[12].nc'))
total_profiles = 0
profile_list = []
frames = []

for filename in profiles:
    print(filename)
    ds = xr.open_dataset(filename)
    
    # offset N_PROF by the number of profiles already read to ensure unique IDs
    ds['N_PROF']+=total_profiles
    # one past the last ID used, so the next file does not repeat it
    total_profiles = int(ds['N_PROF'][-1]) + 1
    
    # convert the required fields to dataframes
    n_prof = ds['N_PROF'].to_dataframe()
    juld = ds['JULD'].to_dataframe()
    lat = ds['LATITUDE'].to_dataframe()
    lon = ds['LONGITUDE'].to_dataframe()
    sal = ds['PSAL_CORRECTED'].to_dataframe()
    temp = ds['TEMP'].to_dataframe()
    depth = ds['DEPH_CORRECTED'].to_dataframe()

    # subset the lats and lons to model domain
    lat_ind = np.where((lat <= -10) & (lat >= -50 ))[0]
    lat = lat.iloc[lat_ind]
    lon_ind = np.where((lon <= 70) & (lon >= 0))[0]
    lon = lon.iloc[lon_ind]
    
    # join dataframes with inner join (Use intersection of keys from both frames)
    file_metadata = lat.join(lon, how='inner').join(depth,how='inner').join(temp, how='inner').join(sal, how='inner').join(juld, how='inner')
    # every profile has 400 levels (N_LEVELS), so every 400th row starts a new profile
    profile_list.extend([i[0] for i in file_metadata.index[0::400]])
    file_metadata = file_metadata.dropna(subset=['DEPH_CORRECTED'])
    frames.append(file_metadata)

# DataFrame.append was removed in pandas 2.0, so concatenate once after the loop
metadata = pd.concat(frames)

../Data/EN4_profiles/EN.4.2.1.f.profiles.g10.200901.nc
../Data/EN4_profiles/EN.4.2.1.f.profiles.g10.200902.nc
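
The loop relies on every profile having exactly 400 levels when it picks out profile IDs with a stride of 400. A possible alternative, sketched here on synthetic frames, reads the unique `N_PROF` values straight from the index and collects the monthly frames in a list for a single `pd.concat`. `fake_month` is a hypothetical stand-in for one month's joined frame.

```python
import numpy as np
import pandas as pd

def fake_month(first_id):
    """Build a tiny stand-in for one month's joined frame: 2 profiles x 3 levels."""
    index = pd.MultiIndex.from_product(
        [[first_id, first_id + 1], [0, 1, 2]], names=['N_PROF', 'N_LEVELS'])
    depths = [5.0, 10.0, np.nan, 5.0, np.nan, np.nan]
    return pd.DataFrame({'DEPH_CORRECTED': depths}, index=index)

frames = []
profile_ids = []
for first_id in (0, 2):
    month = fake_month(first_id).dropna(subset=['DEPH_CORRECTED'])
    # unique N_PROF values, independent of how many levels each profile has
    profile_ids.extend(month.index.get_level_values('N_PROF').unique().tolist())
    frames.append(month)

# concatenate all months in one call
metadata = pd.concat(frames)
```

Note that this sketch collects IDs after dropping NaN depths, so a profile with no valid depths at all would not appear in `profile_ids`, unlike in the stride-based version.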


In [34]:
metadata.loc[profile_list[10]]['DEPH_CORRECTED'].values

array([  78.440575,   82.9078  ,   88.06981 ,   92.53683 ,   97.301544,
        102.36393 ,  106.9299  ,  111.69428 ,  121.12346 ,  130.5522  ,
        140.57597 ,  150.2023  ,  159.332   ,  169.65204 ,  179.6739  ,
        188.90152 ,  198.6248  ,  208.2484  ,  218.36758 ,  227.39507 ,
        237.5133  ,  247.13509 ,  256.75644 ,  265.98056 ,  285.7169  ,
        304.9555  ,  324.09317 ,  343.42734 ,  362.7597  ,  391.11057 ,
        420.05215 ,  457.8089  ,  505.66382 ,  553.9039  ,  602.5287  ,
        651.43915 ,  699.64526 ,  748.7307  ,  798.596   ,  847.1636  ,
        895.7199  ,  945.35223 ,  994.0833  , 1043.1982  , 1092.104   ,
       1141.591   , 1190.6714  , 1239.4441  , 1288.8962  , 1338.2382  ,
       1387.3711  , 1436.1967  , 1534.9971  , 1632.963   , 1731.2773  ,
       1829.2509  , 1926.9824  , 2024.3739  ], dtype=float32)

In [47]:
'''
Next step is to create the depthlevels files (completed; below) and the station files (to be completed) in the loop or in a new loop
'''

# depthlevels = np.arange(5, maxdepth[i], 5)
# depthlevels = np.append(depthlevels, maxdepth[i])
# f = open("DEPTHLEVEL_FILES/depthlevels.in."+str(stn[i])+"."+str(year[i])+"_"+str("%03d" % (doy[i])), "w")
# f.write(str(len(depthlevels))+"                  # Number of z levels"+"\n")
# for x in range(0, len(depthlevels)):
# f.write("%s\n" % depthlevels[x])

# f.close()

stn = profile_list[10]
profile = metadata.loc[stn]
# take the first remaining level by position, since label 0 may have been dropped with the NaNs
date = pd.to_datetime(profile['JULD'].iloc[0])
doy = date.dayofyear - 1
year = date.year

depthlevels = profile['DEPH_CORRECTED'].values
# the with-block closes the file even if a write fails
with open("depthlevels.in."+str(stn)+"."+str(year)+"_"+str("%03d" % (doy)), "w") as f:
    f.write(str(len(depthlevels))+"                  # Number of z levels"+"\n")
    for level in depthlevels:
        f.write("%s\n" % level)
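
For use inside a loop over stations, the file-writing step could be wrapped in a function. The sketch below is one way to do that, assuming the same file name pattern and header line as the cell above. `write_depthlevels` and its parameters are hypothetical names, and the station file format is not covered here.

```python
import os
import tempfile
from datetime import datetime

def write_depthlevels(depths, station, date, out_dir='.'):
    """Write one depthlevels file and return its path."""
    doy = date.timetuple().tm_yday - 1  # zero-based day of year, as in the cell above
    name = "depthlevels.in.%s.%d_%03d" % (station, date.year, doy)
    path = os.path.join(out_dir, name)
    with open(path, "w") as f:
        f.write("%d                  # Number of z levels\n" % len(depths))
        for z in depths:
            f.write("%s\n" % z)
    return path

# usage with synthetic depths, written to a temporary directory
out_dir = tempfile.mkdtemp()
path = write_depthlevels([5.0, 10.0, 15.5], station=21,
                         date=datetime(2009, 1, 1), out_dir=out_dir)
with open(path) as f:
    lines = f.read().splitlines()
```

Returning the path makes it easy to log or check which files were produced for each station.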

In [27]:
ds['N_PROF']

<xarray.DataArray 'N_PROF' (N_PROF: 36406)>
array([36545, 36546, 36547, ..., 72948, 72949, 72950])
Coordinates:
  * N_PROF   (N_PROF) int64 36545 36546 36547 36548 ... 72947 72948 72949 72950

In [26]:
ds['N_PROF']+=test

In [24]:
test = ds['N_PROF'][-1].values

In [25]:
test

array(36475)