<h1>DOWNLOAD IN SITU NETCDF FILES PROGRAMMATICALLY</h1>

STEPS
<ul>
<li>Set your credentials</li>
<li>Target an In Situ product, host and index file</li>
<li>Loop over the index file and download the netCDFs that suit you (select by metadata or file name)</li>
</ul>

<H2>MANDATORY PYTHON LIBRARIES</H2>

In [157]:
import ftputil
import numpy as np
import os

<H2>SET OUTPUT DIRECTORY</H2>

In [167]:
output_directory = os.getcwd() #default to current working directory

<h2> SET YOUR CREDENTIALS</h2>

In [163]:
user = '' #type CMEMS user name
password = '' #type CMEMS password

<h2> TARGET A PRODUCT, HOST AND INDEX FILE</h2>

There are Near Real Time (NRT) and Reprocessed (REP) In Situ products. Each product type is served from its own host and offers its own set of index files.<br><br>
NRT products:<ul>
<li> host: <i>nrt.cmems-du.eu</i> </li><li> available index files: <i>index_latest.txt</i>, <i>index_monthly.txt</i> and <i>index_history.txt</i> </li>
</ul>
<br>
REP products: <ul>
<li> host: <i>my.cmems-du.eu</i> </li><li> available index file: <i>index_history.txt</i> </li>
</ul>
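
The host and index file choices listed above can be written down as a small lookup table, which makes it easy to check a target before connecting. This is only a sketch: <i>TARGETS</i> and <i>is_valid_target</i> are illustrative names, and the full REP host name is assumed to be <i>my.cmems-du.eu</i>.

```python
# Hosts and index files per product type, as listed above.
# TARGETS and is_valid_target are illustrative names, not part of any CMEMS tool.
# Assumption: the full REP host name is my.cmems-du.eu.
TARGETS = {
    'NRT': {'host': 'nrt.cmems-du.eu',
            'index_files': ['index_latest.txt', 'index_monthly.txt', 'index_history.txt']},
    'REP': {'host': 'my.cmems-du.eu',
            'index_files': ['index_history.txt']},
}

def is_valid_target(product_type, host, index_file):
    """Return True if the host and index file belong to the given product type."""
    target = TARGETS[product_type]
    return host == target['host'] and index_file in target['index_files']

print(is_valid_target('NRT', 'nrt.cmems-du.eu', 'index_latest.txt'))  # True
print(is_valid_target('REP', 'my.cmems-du.eu', 'index_latest.txt'))   # False
```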

In [164]:
product_name = '' #type the targeted In Situ product, e.g. INSITU_MED_NRT_OBSERVATIONS_013_035
host = '' #type the targeted host (nrt.cmems-du.eu or my.cmems-du.eu), e.g. nrt.cmems-du.eu
index_file = '' #type the targeted index file, e.g. index_latest.txt

<h2>DOWNLOAD ALL AVAILABLE NETCDFS - NO SELECTION CRITERIA</h2>

In [172]:
#connect to CMEMS FTP
with ftputil.FTPHost(host, user, password) as ftp_host: 
    
    #open the index file to read
    with ftp_host.open("Core"+'/'+product_name+'/'+index_file, "r") as indexfile:
        
        #read the index file as comma-separated values, skipping its 6 header lines
        index = np.genfromtxt(indexfile, skip_header=6, unpack=False, delimiter=',', dtype=None, names=['catalog_id', 'file_name','geospatial_lat_min', 'geospatial_lat_max', 'geospatial_lon_min','geospatial_lon_max','time_coverage_start', 'time_coverage_end', 'provider', 'date_update', 'data_mode', 'parameters'])
        
        #loop over the lines/netCDFs and download the ones that suit you
        for netCDF in index:
            
            #getting ftplink, filepath and filename
            ftplink = netCDF['file_name'].decode('utf-8')
            filepath = '/'.join(ftplink.split('/')[3:])  #drop 'ftp:', '' and the host
            ncdf_file_name = ftplink[ftplink.rfind('/')+1:]
            
            #download netCDF
            if ftp_host.path.isfile(filepath):
                cwd = os.getcwd()
                os.chdir(output_directory)
                ftp_host.download(filepath, ncdf_file_name)  # remote, local
                os.chdir(cwd)
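
The path handling inside the loop can be tried offline, without an FTP connection. The sketch below splits an index link into the remote path and the file name; <i>split_ftplink</i> is an illustrative helper, and the link is made up for the example.

```python
def split_ftplink(ftplink):
    """Return (remote_path, file_name) for an ftp:// link taken from the index file."""
    parts = ftplink.split('/')
    # parts[0:3] are 'ftp:', '' and the host; the rest is the path on the server
    remote_path = '/'.join(parts[3:])
    file_name = parts[-1]
    return remote_path, file_name

# Hypothetical link, made up for illustration only
link = ('ftp://nrt.cmems-du.eu/Core/INSITU_MED_NRT_OBSERVATIONS_013_035'
        '/latest/20180501/MO_LATEST_TS_MO_EXAMPLE_20180501.nc')
remote_path, file_name = split_ftplink(link)
print(remote_path)
print(file_name)
```

Since <i>ftp_host.download</i> takes the local target as its second argument, passing <i>os.path.join(output_directory, file_name)</i> there is one way to avoid changing the working directory for every file.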

<h2>DOWNLOAD NETCDFS MATCHING CERTAIN CRITERIA</h2>

Each line of the index file describes one netCDF through 12 metadata fields:
<ul>
<li>catalog_id</li>
<li>file_name</li>
<li>geospatial_lat_min</li>
<li>geospatial_lat_max</li>
<li>geospatial_lon_min</li>
<li>geospatial_lon_max</li>
<li>time_coverage_start</li>
<li>time_coverage_end</li>
<li>provider</li>
<li>date_update</li>
<li>data_mode</li>
<li>parameters</li>
</ul>
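
To see how these fields line up with one line of an index file, the sketch below maps a single line onto the 12 column names passed to <i>np.genfromtxt</i> in this notebook. <i>parse_index_line</i> is an illustrative helper and every value in the sample line is made up; it also assumes that no field contains a comma.

```python
# Column names, in the order used by the np.genfromtxt calls in this notebook
INDEX_FIELDS = ['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max',
                'geospatial_lon_min', 'geospatial_lon_max', 'time_coverage_start',
                'time_coverage_end', 'provider', 'date_update', 'data_mode', 'parameters']

def parse_index_line(line):
    """Map one comma-separated index line onto the field names (assumes no commas inside fields)."""
    values = line.strip().split(',')
    return dict(zip(INDEX_FIELDS, values))

# Hypothetical index line: all values are made up for illustration
sample_line = ','.join([
    'EXAMPLE-CATALOG-ID',
    'ftp://nrt.cmems-du.eu/Core/INSITU_MED_NRT_OBSERVATIONS_013_035/latest/20180501/MO_LATEST_TS_MO_EXAMPLE_20180501.nc',
    '39.5', '39.5', '2.1', '2.1',
    '2018-05-01T00:00:00Z', '2018-05-01T23:00:00Z',
    'Example Provider', '2018-05-02T06:00:00Z', 'R', 'DEPH TEMP PSAL',
])

record = parse_index_line(sample_line)
print(record['time_coverage_start'])
print(record['parameters'].split(' '))
```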

<h3>DOWNLOAD NETCDFS WITH CERTAIN PARAMETERS</h3>

In [None]:
#connect to CMEMS FTP
with ftputil.FTPHost(host, user, password) as ftp_host: 
    
    #open the index file to read
    with ftp_host.open("Core"+'/'+product_name+'/'+index_file, "r") as indexfile:
        
        #read the index file as comma-separated values, skipping its 6 header lines
        index = np.genfromtxt(indexfile, skip_header=6, unpack=False, delimiter=',', dtype=None, names=['catalog_id', 'file_name','geospatial_lat_min', 'geospatial_lat_max', 'geospatial_lon_min','geospatial_lon_max','time_coverage_start', 'time_coverage_end', 'provider', 'date_update', 'data_mode', 'parameters'])
        
        #selection criteria: parameter, e.g. PSAL
        parameter = 'PSAL'
            
        #loop over the lines/netCDFs and download the ones that suit you
        for netCDF in index:
            
            #getting ftplink, filepath and filename
            ftplink = netCDF['file_name'].decode('utf-8')
            filepath = '/'.join(ftplink.split('/')[3:])  #drop 'ftp:', '' and the host
            ncdf_file_name = ftplink[ftplink.rfind('/')+1:]
            
            #download netCDF if meeting selection criteria
            parameters = netCDF['parameters'].decode('utf-8')
            param_list = parameters.split(' ')
            if parameter in param_list: 
                if ftp_host.path.isfile(filepath):
                    cwd = os.getcwd()
                    os.chdir(output_directory)
                    ftp_host.download(filepath, ncdf_file_name)  # remote, local
                    os.chdir(cwd)
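
The parameter test works on whole names rather than on substrings, which matters when one parameter name is contained in another. The sketch below shows the same idea extended to several required parameters; <i>has_parameters</i> is an illustrative helper and the parameter strings are made up.

```python
def has_parameters(parameters_field, wanted):
    """Return True if every wanted parameter appears as a whole name in the field."""
    available = set(parameters_field.split())
    return set(wanted) <= available

# Hypothetical 'parameters' values, made up for illustration
print(has_parameters('DEPH TEMP PSAL', ['PSAL']))          # True
print(has_parameters('DEPH TEMP PSAL', ['TEMP', 'PSAL']))  # True
print(has_parameters('DEPH TEMP_DOXY', ['TEMP']))          # False: TEMP_DOXY is a different name
print('TEMP' in 'DEPH TEMP_DOXY')                          # True: a plain substring test would match
```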

<h3>DOWNLOAD NETCDFS FROM A CERTAIN PLATFORM</h3>

In [58]:
def get_filename_structure(filename):
    elements = filename.split('.nc')[0].split('_')
    json = {}
    if len(elements[1]) == 2: #history file RR_XX_YY_CODE<_ZZZ>.nc
        json['region'] = elements[0]
        json['data'] = elements[1]
        json['type'] = elements[2]
        json['code'] = elements[3]
    elif elements[1] == 'LATEST': #RR_LATEST_XX_YY_CODE_YYYYMMDD.nc
        json['region'] = elements[0]
        json['data'] = elements[2]
        json['type'] = elements[3]
        json['code'] = elements[4]
        json['timestamp'] = elements[5]
    else: #monthly file RR_YYYYMM_XX_YY_CODE.nc
        json['region'] = elements[0]
        json['data'] = elements[2]
        json['type'] = elements[3]
        json['code'] = elements[4]
        json['timestamp'] = elements[1]
    return json

In [155]:
#connect to CMEMS FTP
with ftputil.FTPHost(host, user, password) as ftp_host: 
    
    #open the index file to read
    with ftp_host.open("Core"+'/'+product_name+'/'+index_file, "r") as indexfile:
        
        #read the index file as comma-separated values, skipping its 6 header lines
        index = np.genfromtxt(indexfile, skip_header=6, unpack=False, delimiter=',', dtype=None, names=['catalog_id', 'file_name','geospatial_lat_min', 'geospatial_lat_max', 'geospatial_lon_min','geospatial_lon_max','time_coverage_start', 'time_coverage_end', 'provider', 'date_update', 'data_mode', 'parameters'])
        
        #selection criteria: platform_type, e.g. MO
        platform_type = 'MO'
        
        #loop over the lines/netCDFs and download the ones that suit you
        for netCDF in index:
            
            #getting ftplink, filepath and filename
            ftplink = netCDF['file_name'].decode('utf-8')
            filepath = '/'.join(ftplink.split('/')[3:])  #drop 'ftp:', '' and the host
            ncdf_file_name = ftplink[ftplink.rfind('/')+1:]
            
            #download netCDF if meeting selection criteria
            filename_convention = get_filename_structure(ncdf_file_name)
            if filename_convention['type'] == platform_type: 
                if ftp_host.path.isfile(filepath):
                    cwd = os.getcwd()
                    os.chdir(output_directory)
                    ftp_host.download(filepath, ncdf_file_name)  # remote, local
                    os.chdir(cwd)

<h3>DOWNLOAD NETCDFS WITHIN A CERTAIN TIME RANGE</h3>

In [83]:
import datetime

In [88]:
#connect to CMEMS FTP
with ftputil.FTPHost(host, user, password) as ftp_host: 
    
    #open the index file to read
    with ftp_host.open("Core"+'/'+product_name+'/'+index_file, "r") as indexfile:
        
        #read the index file as comma-separated values, skipping its 6 header lines
        index = np.genfromtxt(indexfile, skip_header=6, unpack=False, delimiter=',', dtype=None, names=['catalog_id', 'file_name','geospatial_lat_min', 'geospatial_lat_max', 'geospatial_lon_min','geospatial_lon_max','time_coverage_start', 'time_coverage_end', 'provider', 'date_update', 'data_mode', 'parameters'])
        
        #selection criteria: time coverage
        date_format = "%Y-%m-%dT%H:%M:%SZ" 
        ini = datetime.datetime.strptime('2018-05-01T00:00:00Z', date_format)
        end = datetime.datetime.strptime('2018-05-31T23:59:59Z', date_format)        
        
        #loop over the lines/netCDFs and download the ones that suit you
        for netCDF in index:
            
            #getting ftplink, filepath and filename
            ftplink = netCDF['file_name'].decode('utf-8')
            filepath = '/'.join(ftplink.split('/')[3:])  #drop 'ftp:', '' and the host
            ncdf_file_name = ftplink[ftplink.rfind('/')+1:]
            
            #download netCDF if meeting selection criteria
            time_start = datetime.datetime.strptime(netCDF['time_coverage_start'].decode('utf-8'), date_format)
            time_end = datetime.datetime.strptime(netCDF['time_coverage_end'].decode('utf-8'), date_format)
            if time_start > ini and time_end < end: 
                if ftp_host.path.isfile(filepath):
                    cwd = os.getcwd()
                    os.chdir(output_directory)
                    ftp_host.download(filepath, ncdf_file_name)  # remote, local
                    os.chdir(cwd)
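
A containment test keeps only the files whose whole time coverage falls inside the window; files that merely overlap it are skipped. The sketch below contrasts containment with an overlap test, which may suit long-running files better. The helper names and the sample dates are illustrative.

```python
import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def parse_time(timestamp):
    """Parse a timestamp written like the time_coverage_* fields of the index file."""
    return datetime.datetime.strptime(timestamp, DATE_FORMAT)

def is_within(file_start, file_end, ini, end):
    """True if the file coverage lies strictly inside the window."""
    return file_start > ini and file_end < end

def overlaps(file_start, file_end, ini, end):
    """True if the file coverage and the window share at least one instant."""
    return file_start <= end and file_end >= ini

ini = parse_time('2018-05-01T00:00:00Z')
end = parse_time('2018-05-31T23:59:59Z')

# Hypothetical file coverage that starts before the window and ends inside it
file_start = parse_time('2018-04-20T00:00:00Z')
file_end = parse_time('2018-05-10T00:00:00Z')

print(is_within(file_start, file_end, ini, end))  # False
print(overlaps(file_start, file_end, ini, end))   # True
```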

<h3>DOWNLOAD NETCDFS WITHIN A CERTAIN BOUNDING BOX</h3>

In [141]:
from shapely.geometry import box
import folium

In [152]:
#connect to CMEMS FTP
with ftputil.FTPHost(host, user, password) as ftp_host: 
    
    #open the index file to read
    with ftp_host.open("Core"+'/'+product_name+'/'+index_file, "r") as indexfile:
        
        #read the index file as comma-separated values, skipping its 6 header lines
        index = np.genfromtxt(indexfile, skip_header=6, unpack=False, delimiter=',', dtype=None, names=['catalog_id', 'file_name','geospatial_lat_min', 'geospatial_lat_max', 'geospatial_lon_min','geospatial_lon_max','time_coverage_start', 'time_coverage_end', 'provider', 'date_update', 'data_mode', 'parameters'])
        
        #selection criteria: spatial coverage
        targeted_geospatial_lat_min = 39.0  # enter min latitude of your bounding box
        targeted_geospatial_lat_max = 40.0  # enter max latitude of your bounding box
        targeted_geospatial_lon_min = 2.0   # enter min longitude of your bounding box
        targeted_geospatial_lon_max = 3.0   # enter max longitude of your bounding box
        targeted_bounding_box = box(targeted_geospatial_lon_min, targeted_geospatial_lat_min, targeted_geospatial_lon_max, targeted_geospatial_lat_max)
        
        map = folium.Map(location=[targeted_bounding_box.centroid.y, targeted_bounding_box.centroid.x], zoom_start=7)
        folium.PolyLine([[targeted_geospatial_lat_min, targeted_geospatial_lon_min],[targeted_geospatial_lat_min, targeted_geospatial_lon_max],[targeted_geospatial_lat_max, targeted_geospatial_lon_max],[targeted_geospatial_lat_max, targeted_geospatial_lon_min], [targeted_geospatial_lat_min, targeted_geospatial_lon_min]]).add_to(map)
            
        #loop over the lines/netCDFs and download the ones that suit you
        for netCDF in index:
            
            #getting ftplink, filepath and filename
            ftplink = netCDF['file_name'].decode('utf-8')
            filepath = '/'.join(ftplink.split('/')[3:])  #drop 'ftp:', '' and the host
            ncdf_file_name = ftplink[ftplink.rfind('/')+1:]

            #download netCDF if meeting selection criteria
            geospatial_lat_min = float(netCDF['geospatial_lat_min'])
            geospatial_lat_max = float(netCDF['geospatial_lat_max'])
            geospatial_lon_min = float(netCDF['geospatial_lon_min'])
            geospatial_lon_max = float(netCDF['geospatial_lon_max'])
            bounding_box = box(geospatial_lon_min, geospatial_lat_min, geospatial_lon_max, geospatial_lat_max)
                    
            if not targeted_bounding_box.disjoint(bounding_box): 
                if ftp_host.path.isfile(filepath):
                    folium.Marker(location = [bounding_box.centroid.y, bounding_box.centroid.x], popup=ncdf_file_name).add_to(map)
                    cwd = os.getcwd()
                    os.chdir(output_directory)
                    ftp_host.download(filepath, ncdf_file_name)  # remote, local
                    os.chdir(cwd)

In [153]:
map
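
For axis-aligned boxes, the "not disjoint" selection used above comes down to four comparisons, so it can also be checked without shapely. The sketch below is a plain-Python equivalent under two assumptions: boxes are given as (lon_min, lat_min, lon_max, lat_max) and none of them crosses the antimeridian. <i>boxes_intersect</i> is an illustrative helper and the file boxes are made up.

```python
def boxes_intersect(a, b):
    """True if two (lon_min, lat_min, lon_max, lat_max) boxes overlap or touch.

    Assumes neither box crosses the antimeridian.
    """
    a_lon_min, a_lat_min, a_lon_max, a_lat_max = a
    b_lon_min, b_lat_min, b_lon_max, b_lat_max = b
    separated = (a_lon_max < b_lon_min or b_lon_max < a_lon_min or
                 a_lat_max < b_lat_min or b_lat_max < a_lat_min)
    return not separated

# Target box from the cell above: lon 2 to 3, lat 39 to 40
target = (2.0, 39.0, 3.0, 40.0)

# Hypothetical file boxes, made up for illustration
fixed_platform = (2.5, 39.5, 2.5, 39.5)  # a single point: min equals max
far_away = (5.0, 41.0, 6.0, 42.0)

print(boxes_intersect(target, fixed_platform))  # True
print(boxes_intersect(target, far_away))        # False
```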