<h1>MANDATORY PACKAGES</h1>

In [1]:
import os
import datetime
import numpy as np
from collections import namedtuple
import pandas as pd
import ftputil #pip install ftputil
from shapely.geometry import box #conda install Shapely
import folium # conda install -c conda-forge folium
from folium import plugins

<ul><b>Warning!</b>: Some of the packages need to be installed first. If one of them is missing, running the imports cell raises a <i style="color:red">ModuleNotFoundError: No module named '{module name}'</i>.<br>
For each package raising this error, first open the Anaconda Powershell Prompt and run the installation command given in the comment after the '#' next to that package.</ul>
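If you prefer to check beforehand which of these packages are missing, a small helper along these lines could be run first. It is only a sketch: `missing_packages` is a hypothetical name, and it checks the top-level module names imported above (note that Shapely is imported as `shapely`).

```python
import importlib.util

def missing_packages(modules):
    """Return the names in `modules` that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in modules if importlib.util.find_spec(name) is None]

# top-level modules imported in the cell above
required = ['numpy', 'pandas', 'ftputil', 'shapely', 'folium']
print(missing_packages(required))
```

Any name it prints still needs the installation command shown next to it in the imports cell.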

<h1>AUXILIARY FUNCTIONS</h1>

The following functions have been created to ease the search for In Situ data on the CMEMS FTP servers. <br>
These are basic implementations, so feel free to tune them further, especially the checkers (e.g. bbox_check), to better fit your needs.

In [2]:
def cmems_hosts(): 
    #dictionary of available FTP servers hosting CMEMS data
    return {'NRT': 'nrt.cmems-du.eu', 'REP': 'my.cmems-du.eu'} 
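The `search` function picks the server by looking for the `'NRT'` or `'REP'` key inside the product name. The same logic, isolated in a hypothetical `pick_host` helper that raises an explicit error when no key matches (instead of a bare `IndexError`), could look like this:

```python
def cmems_hosts():
    # dictionary of available FTP servers hosting CMEMS data
    return {'NRT': 'nrt.cmems-du.eu', 'REP': 'my.cmems-du.eu'}

def pick_host(product):
    """Return the FTP host whose key ('NRT' or 'REP') appears in the product name."""
    hosts = [host for key, host in cmems_hosts().items() if key in product]
    if not hosts:
        raise ValueError('cannot infer the FTP host from product ' + repr(product))
    return hosts[0]

print(pick_host('INSITU_IBI_NRT_OBSERVATIONS_013_033'))     # nrt.cmems-du.eu
print(pick_host('INSITU_IBI_TS_REP_OBSERVATIONS_013_040'))  # my.cmems-du.eu
```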

In [3]:
def bbox_check(netCDF, search_area):
    # filter out netCDFs whose bounding box is not within the aimed area (search_area)
    # see the Shapely documentation (https://shapely.readthedocs.io/en/stable/manual.html) to try other relationships between geometric objects: contains, intersects, overlaps, touches, etc.
    geospatial_lat_min = float(netCDF['geospatial_lat_min'])
    geospatial_lat_max = float(netCDF['geospatial_lat_max'])
    geospatial_lon_min = float(netCDF['geospatial_lon_min'])
    geospatial_lon_max = float(netCDF['geospatial_lon_max'])
    targeted_bounding_box = box(search_area[0], search_area[1], search_area[2], search_area[3])
    bounding_box = box(geospatial_lon_min, geospatial_lat_min, geospatial_lon_max, geospatial_lat_max)
    return targeted_bounding_box.contains(bounding_box)
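`bbox_check` keeps a file only when the search area *contains* its bounding box. To see the rule without Shapely, here is a plain-Python sketch of the same test for axis-aligned boxes (the function names are hypothetical). It sorts each pair of corners first, so it also accepts the corner order used for `bbox` in the SETTINGS section, and it adds the looser `intersects` alternative mentioned in the comment above. It treats boxes as closed (boundaries count as inside), so edge cases may differ from Shapely's predicates, and it ignores boxes crossing the antimeridian.

```python
def normalize(area):
    """Return (lon_min, lat_min, lon_max, lat_max) whatever the corner order of `area`."""
    lon_a, lat_a, lon_b, lat_b = area
    return min(lon_a, lon_b), min(lat_a, lat_b), max(lon_a, lon_b), max(lat_a, lat_b)

def box_contains(outer, inner):
    """True when box `inner` lies within box `outer` (boundaries count as inside)."""
    o, i = normalize(outer), normalize(inner)
    return o[0] <= i[0] and o[1] <= i[1] and i[2] <= o[2] and i[3] <= o[3]

def box_intersects(a, b):
    """True when boxes `a` and `b` share at least one point."""
    a, b = normalize(a), normalize(b)
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

search_area = [-8.6, 36.78, -12.6, 41.9]    # bbox from the SETTINGS section
platform = [-9.58, 41.15, -9.58, 41.15]     # a fixed platform: lon_min, lat_min, lon_max, lat_max
print(box_contains(search_area, platform))  # True
```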

In [4]:
def timerange_check(netCDF, search_timerange):
    #filter-out netCDFs whose timerange does not overlap with the aimed times (search_timerange)
    date_format = "%Y-%m-%dT%H:%M:%SZ" 
    targeted_ini = datetime.datetime.strptime(search_timerange[0], date_format)
    targeted_end = datetime.datetime.strptime(search_timerange[1], date_format)

    time_start = datetime.datetime.strptime(netCDF['time_coverage_start'].decode('utf-8'), date_format)
    time_end = datetime.datetime.strptime(netCDF['time_coverage_end'].decode('utf-8'), date_format)
    
    Range = namedtuple('Range', ['start', 'end'])
    r1 = Range(start=targeted_ini, end=targeted_end)
    r2 = Range(start=time_start, end=time_end)
    
    latest_start = max(r1.start, r2.start)
    earliest_end = min(r1.end, r2.end)
    delta = (earliest_end - latest_start).days + 1
    overlap = max(0, delta)
    if overlap != 0:
        return True
    else:
        return False
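The overlap test in `timerange_check` can be expressed as a single comparison: two time ranges overlap exactly when the later of the two starts is not after the earlier of the two ends. A stdlib-only sketch (the names `ranges_overlap` and `parse` are hypothetical), checked against the search window from the SETTINGS section and the coverage of the first platform found later in this notebook:

```python
import datetime

DATE_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def ranges_overlap(start_a, end_a, start_b, end_b):
    """True when [start_a, end_a] and [start_b, end_b] share at least one instant."""
    return max(start_a, start_b) <= min(end_a, end_b)

def parse(text):
    return datetime.datetime.strptime(text, DATE_FORMAT)

# search window vs. the time coverage of one matching platform
print(ranges_overlap(parse('2019-04-01T00:00:00Z'), parse('2019-05-30T23:59:59Z'),
                     parse('2010-05-23T00:00:00Z'), parse('2019-04-03T10:00:00Z')))  # True
```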

In [5]:
def parameters_check(netCDF, search_parameters):
    #filter-out those netCDFs not containing any of the aimed parameters (search_parameters)
    #see more at: https://archimer.ifremer.fr/doc/00422/53381/
    params = netCDF['parameters'].decode('utf-8').split(' ')
    result = False
    for param in params:
        if param in search_parameters:
            result = True
    return result
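Each index row stores its parameters as one space-separated byte string, so the loop in `parameters_check` amounts to asking whether that list shares any element with `search_parameters`. A sketch using a plain dict in place of the NumPy record; `has_any_parameter` is a hypothetical name and the row below is illustrative, not taken from a real index file:

```python
def has_any_parameter(row, search_parameters):
    """True when the row's space-separated 'parameters' field shares a code with search_parameters."""
    params = row['parameters'].decode('utf-8').split()
    return not set(params).isdisjoint(search_parameters)

row = {'parameters': b'DEPH VMDR WSPD TEMP PSAL'}  # illustrative index row
print(has_any_parameter(row, ['TEMP', 'PSAL']))   # True
print(has_any_parameter(row, ['ATMS']))           # False
```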

In [6]:
def sources_check(netCDF, search_sources):
    #filter-out those netCDFs not coming from the aimed data sources (search_sources)
    #see more at  http://resources.marine.copernicus.eu/documents/PUM/CMEMS-INS-PUM-013-048.pdf
    ftplink = netCDF['file_name'].decode('utf-8')
    result = False
    for source in search_sources:
        if source == 'TS':
            source = 'TS_TS'
        if '_'+source+'_' in ftplink:
            result = True
    return result
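`sources_check` relies on the naming convention of the netCDF files, where the data source code appears between underscores (e.g. `MO` in `IR_TS_MO_6200191.nc`, one of the files downloaded at the end of this notebook). The mapping of `'TS'` to `'TS_TS'` is presumably there because `'_TS_'` alone would also match the `TS` part that precedes the source code in names like this one. The sketch below applies the same rule to bare file names (the helper name is hypothetical):

```python
def matches_source(file_name, search_sources):
    """Same rule as sources_check, applied to a plain file name or FTP link."""
    for source in search_sources:
        if source == 'TS':
            source = 'TS_TS'  # avoid matching the 'TS' segment found in every such name
        if '_' + source + '_' in file_name:
            return True
    return False

print(matches_source('IR_TS_MO_6200191.nc', ['MO']))  # True
print(matches_source('IR_TS_MO_6200191.nc', ['TS']))  # False: '_TS_TS_' is not in the name
```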

In [7]:
#dictionary of checkers
checkers = {
    'bbox': bbox_check,
    'timerange': timerange_check,
    'parameters': parameters_check,
    'sources': sources_check
}
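`search` uses this dictionary to run only the criteria whose value in `searching_criteria` is not `None`, and keeps a file only when all of them return `True`. A self-contained sketch of that dispatch, with toy checkers standing in for the real ones (all names here are hypothetical):

```python
def passes_all(row, searching_criteria, checkers):
    """True when every active criterion (value not None) accepts the row."""
    return all(checkers[key](row, value)
               for key, value in searching_criteria.items() if value is not None)

toy_checkers = {
    'parameters': lambda row, wanted: any(p in wanted for p in row['parameters']),
    'sources': lambda row, wanted: row['source'] in wanted,
}
row = {'parameters': ['TEMP', 'DEPH'], 'source': 'MO'}
print(passes_all(row, {'parameters': ['TEMP'], 'sources': ['MO']}, toy_checkers))  # True
print(passes_all(row, {'parameters': ['TEMP'], 'sources': ['DB']}, toy_checkers))  # False
print(passes_all(row, {'parameters': ['TEMP'], 'sources': None}, toy_checkers))    # True: 'sources' is skipped
```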

In [8]:
def search(configuration, threshold=None, output=None, output_dir=None):
    # Access the FTP server, product and archives to search for the netCDFs matching the conditions set in the configuration.
    # If output is None (default behaviour), it returns the list of matching files per archive.
    # If output is 'files', it downloads the matching files to output_dir (or to the current working directory).
    # If output is 'map', it returns a map with the bounding-box centroids of the matching files.
    # threshold (optional) caps the number of matches per archive.
    matches = {}
    host = [cmems_hosts()[key] for key in cmems_hosts().keys() if key in configuration['product']][0]
    columns = ['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max', 'geospatial_lon_min', 'geospatial_lon_max', 'time_coverage_start', 'time_coverage_end', 'provider', 'date_update', 'data_mode', 'parameters']
    search_map = folium.Map(zoom_start=5)
    with ftputil.FTPHost(host, configuration['user'], configuration['password']) as ftp_host:
        archives = configuration['archives']
        for item in archives:
            marker_cluster = plugins.MarkerCluster(name=item, overlay=True, control=True)
            counter, matches[item], index_file = 0, [], 'index_' + item + '.txt'
            # open the archive's index file to read
            with ftp_host.open('Core/' + configuration['product'] + '/' + index_file, 'r') as indexfile:
                # read the index file as a comma-separated-value file (skipping its 6 header lines)
                index = np.genfromtxt(indexfile, skip_header=6, unpack=False, delimiter=',', dtype=None, names=columns)
                # loop over the lines (one per netCDF) and keep those meeting every criterion
                for netCDF in index:
                    if threshold is not None and counter >= threshold:
                        break
                    values = [checkers[key](netCDF, val) for key, val in configuration['searching_criteria'].items() if val is not None]
                    if all(values):  # save the netCDF only if it meets all the selection criteria
                        counter += 1
                        decoded_metadata = [metadata.decode('utf-8') if isinstance(metadata, bytes) else metadata for metadata in list(netCDF)]
                        matches[item].append(dict(zip(columns, decoded_metadata)))
                        # get the ftplink, the remote file path (relative to the host root) and the file name
                        ftplink = netCDF['file_name'].decode('utf-8')
                        filepath = '/'.join(ftplink.split('/')[3:])
                        ncdf_file_name = ftplink[ftplink.rfind('/') + 1:]
                        if output == 'map':
                            bounding_box = box(float(netCDF['geospatial_lon_min']), float(netCDF['geospatial_lat_min']),
                                               float(netCDF['geospatial_lon_max']), float(netCDF['geospatial_lat_max']))
                            x, y = bounding_box.centroid.x, bounding_box.centroid.y
                            marker = folium.Marker([y, x])  # folium expects [latitude, longitude]
                            popup_content = '<br>'.join('<b>' + key + '</b> : ' + str(val) for key, val in zip(columns, decoded_metadata))
                            folium.Popup(popup_content).add_to(marker)
                            marker_cluster.add_child(marker)
                        if output == 'files':
                            directory = output_dir if output_dir is not None else os.getcwd()
                            print('...Downloading from ' + item + ' : ' + ncdf_file_name)
                            # download the netCDF into directory without changing the working directory
                            ftp_host.download(filepath, os.path.join(directory, ncdf_file_name))
            marker_cluster.add_to(search_map)
        folium.LayerControl().add_to(search_map)
        if output is None:
            for item in archives:
                print('Found ' + str(len(matches[item])) + ' matches in ' + item)
            print('Search completed!')
            return matches
        if output == 'map':
            for item in archives:
                print('Found ' + str(len(matches[item])) + ' matches in ' + item)
            print('....Displaying files boundingBox centroids:')
            print('warning!: open the notebook with chrome if map does not display')
            return search_map
        if output == 'files':
            print('Download completed!')
            return
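Inside `search`, each index line is turned into a metadata record, and the FTP link in `file_name` is split into the remote path to download and the local file name. The sketch below reproduces those two steps with the standard library only (`csv` instead of `np.genfromtxt`, `urllib.parse` instead of string slicing). The index content is hypothetical: six placeholder header lines followed by one data line in the column order used by `search`, with a placeholder host and directories in the link.

```python
import csv
import io
import posixpath
from urllib.parse import urlparse

COLUMNS = ['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max',
           'geospatial_lon_min', 'geospatial_lon_max', 'time_coverage_start',
           'time_coverage_end', 'provider', 'date_update', 'data_mode', 'parameters']

# hypothetical index content: 6 header lines, then one comma-separated line per netCDF
index_text = '\n'.join(['# header line'] * 6 + [
    'COP-IR-01,ftp://ftp.example.org/Core/PRODUCT_ID/history/PLATFORM_DIR/IR_TS_MO_6200191.nc,'
    '41.15,41.15,-9.58,-9.58,2010-05-23T00:00:00Z,2019-04-03T10:00:00Z,'
    'Some provider,2019-04-03T10:20:02Z,R,DEPH TEMP PSAL'])

def read_index(text, skip_header=6):
    """Yield one dict per index line, skipping the header lines."""
    lines = io.StringIO(text).readlines()[skip_header:]
    for record in csv.reader(lines):
        yield dict(zip(COLUMNS, record))

def split_ftplink(ftplink):
    """Return (remote path relative to the host root, local file name)."""
    path = urlparse(ftplink).path.lstrip('/')
    return path, posixpath.basename(path)

for metadata in read_index(index_text):
    print(split_ftplink(metadata['file_name']))
```

Unlike the NumPy version, the `csv` reader returns every field as a string, so numeric fields would need an explicit `float()` before use.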

<h1>SETTINGS</h1>

In [9]:
configuration = {
    'user': '', # your CMEMS username (don't have one? register here: http://marine.copernicus.eu/services-portfolio/register-now/)
    'password': '', # your CMEMS password (don't have one? register here: http://marine.copernicus.eu/services-portfolio/register-now/)
    'product': 'INSITU_IBI_NRT_OBSERVATIONS_013_033', # options: INSITU_IBI_TS_REP_OBSERVATIONS_013_040 or INSITU_IBI_NRT_OBSERVATIONS_013_033
    'archives': ['history'], # options: history (NRT & REP), monthly (NRT), latest (NRT)
    'searching_criteria': {
        'bbox': [-8.6, 36.78, -12.6, 41.9], # area to check for data (expected order: south-east longitude, south-east latitude, north-west longitude, north-west latitude)
        'timerange': ['2019-04-01T00:00:00Z', '2019-05-30T23:59:59Z'], # time range to check for data (expected format: "YYYY-mm-ddTHH:MM:SSZ")
        'parameters': ['TEMP', 'PSAL'], # parameters you are interested in (see more at: https://archimer.ifremer.fr/doc/00422/53381/)
        'sources': ['MO'] # data sources you are interested in (see more at: http://resources.marine.copernicus.eu/documents/PUM/CMEMS-INS-PUM-013-048.pdf)
    }
}

If you do not want to apply one of the searching_criteria above, set its value to None.
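Because a malformed value only surfaces as an error deep inside `search`, it can help to check `searching_criteria` before connecting. A minimal sketch; `validate_criteria` is a hypothetical helper, and it only checks the shapes and formats described in the comments of the configuration cell:

```python
import datetime

def validate_criteria(criteria):
    """Raise ValueError if an active criterion does not have the expected shape or format."""
    if criteria.get('bbox') is not None and len(criteria['bbox']) != 4:
        raise ValueError('bbox needs 4 numbers: lon, lat of one corner, then lon, lat of the opposite corner')
    if criteria.get('timerange') is not None:
        # strptime raises ValueError if a date does not follow "YYYY-mm-ddTHH:MM:SSZ"
        start, end = (datetime.datetime.strptime(t, '%Y-%m-%dT%H:%M:%SZ') for t in criteria['timerange'])
        if start > end:
            raise ValueError('timerange start is after its end')
    for key in ('parameters', 'sources'):
        if criteria.get(key) is not None and not isinstance(criteria[key], list):
            raise ValueError(key + ' must be a list of codes or None')

validate_criteria({'bbox': [-8.6, 36.78, -12.6, 41.9],
                   'timerange': ['2019-04-01T00:00:00Z', '2019-05-30T23:59:59Z'],
                   'parameters': ['TEMP', 'PSAL'],
                   'sources': None})  # passes silently; 'sources' is disabled
```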

Archives refers to the collection of netCDFs to explore:
<ul>
<li><b>Latest</b>: access the last 30 days of data => one file/platform/day</li>
<li><b>Monthly</b>: access the last 5 years of data => one file/platform/month</li>
<li><b>History</b>: access all available data => one file/platform</li>
</ul>
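According to the comment on `'archives'` in the configuration cell, `history` exists for both NRT and REP products while `monthly` and `latest` are NRT-only. A sketch that rejects unsupported combinations before searching; the helper name is hypothetical and the mapping is taken from that comment, not from an official list:

```python
AVAILABLE_ARCHIVES = {
    'NRT': {'history', 'monthly', 'latest'},
    'REP': {'history'},
}

def check_archives(product, archives):
    """Raise ValueError for archives not available for the product's NRT/REP type."""
    kind = 'NRT' if 'NRT' in product else 'REP' if 'REP' in product else None
    if kind is None:
        raise ValueError('cannot tell whether ' + product + ' is NRT or REP')
    unsupported = [a for a in archives if a not in AVAILABLE_ARCHIVES[kind]]
    if unsupported:
        raise ValueError(kind + ' products have no ' + ', '.join(unsupported) + ' archive')

check_archives('INSITU_IBI_NRT_OBSERVATIONS_013_033', ['history', 'latest'])  # OK
```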

<h1>MATCHING FILES</h1>

Search for files matching the configuration above:

In [10]:
search_result = search(configuration)



Found 3 matches in history
Search completed!


Get a quick view of the files:

In [11]:
pd.DataFrame(search_result['history'])

<table>
<tr><th></th><th>catalog_id</th><th>data_mode</th><th>date_update</th><th>file_name</th><th>geospatial_lat_max</th><th>geospatial_lat_min</th><th>geospatial_lon_max</th><th>geospatial_lon_min</th><th>parameters</th><th>provider</th><th>time_coverage_end</th><th>time_coverage_start</th></tr>
<tr><td>0</td><td>COP-IR-01</td><td>R</td><td>2019-04-03T10:20:02Z</td><td>ftp://nrt.cmems-du.eu/Core/INSITU_IBI_NRT_OBSE...</td><td>41.15</td><td>41.15</td><td>-9.58</td><td>-9.58</td><td>DEPH VMDR WSPD FLU3 VTM02 ATMS VTPK RELH VHM0 ...</td><td>Instituto Hidrográfico da Marinha Portuguesa (...</td><td>2019-04-03T10:00:00Z</td><td>2010-05-23T00:00:00Z</td></tr>
<tr><td>1</td><td>COP-IR-01</td><td>R</td><td>2019-04-03T10:20:02Z</td><td>ftp://nrt.cmems-du.eu/Core/INSITU_IBI_NRT_OBSE...</td><td>39.51</td><td>39.51</td><td>-9.64</td><td>-9.64</td><td>DEPH VMDR WSPD FLU3 ATMS VTPK RELH VTM02 VZMX ...</td><td>Instituto Hidrográfico da Marinha Portuguesa (...</td><td>2019-04-03T10:00:00Z</td><td>2009-04-27T19:00:00Z</td></tr>
<tr><td>2</td><td>COP-IR-01</td><td>R</td><td>2019-04-03T10:20:02Z</td><td>ftp://nrt.cmems-du.eu/Core/INSITU_IBI_NRT_OBSE...</td><td>39.56</td><td>39.56</td><td>-9.21</td><td>-9.21</td><td>DEPH VMDR FLU3 WSPD ATMS VTM02 VTPK RELH VHM0 ...</td><td>Instituto Hidrográfico da Marinha Portuguesa (...</td><td>2019-04-03T10:00:00Z</td><td>2010-06-12T12:00:00Z</td></tr>
</table>
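The search result is a plain dictionary (one key per archive) of lists of dictionaries (one per file), so it can also be processed without pandas. For instance, the matches can be ordered by the start of their time coverage. The `sample_result` below is a hypothetical stand-in that only reuses a few values from the table above, and `oldest_first` is a hypothetical helper:

```python
sample_result = {'history': [
    {'geospatial_lat_min': 41.15, 'geospatial_lon_min': -9.58, 'time_coverage_start': '2010-05-23T00:00:00Z'},
    {'geospatial_lat_min': 39.51, 'geospatial_lon_min': -9.64, 'time_coverage_start': '2009-04-27T19:00:00Z'},
    {'geospatial_lat_min': 39.56, 'geospatial_lon_min': -9.21, 'time_coverage_start': '2010-06-12T12:00:00Z'},
]}

def oldest_first(records):
    """Sort match records by the start of their time coverage (these fixed-width ISO strings sort chronologically)."""
    return sorted(records, key=lambda r: r['time_coverage_start'])

for record in oldest_first(sample_result['history']):
    print(record['time_coverage_start'], record['geospatial_lat_min'], record['geospatial_lon_min'])
```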


Locate the bounding-box centroids of the matching files on a map:

In [12]:
search(configuration, output='map')



Found 3 matches in history
....Displaying files boundingBox centroids:
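Each marker sits at the centroid of the file's bounding box. For an axis-aligned box that centroid is the midpoint of the longitude and latitude ranges, and `folium.Marker` takes it as `[latitude, longitude]`, which is why `search` passes `[y, x]`. A plain-Python sketch of that computation (hypothetical helper, ignoring boxes that cross the antimeridian):

```python
def marker_location(lon_min, lat_min, lon_max, lat_max):
    """Return [lat, lon] of a bounding box's centroid, in the order folium.Marker expects."""
    return [(lat_min + lat_max) / 2, (lon_min + lon_max) / 2]

print(marker_location(-9.58, 41.15, -9.58, 41.15))  # [41.15, -9.58]: a fixed platform collapses to a point
```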


Download the files matching the configuration:

In [13]:
search(configuration, output='files')



...Downloading from history : IR_TS_MO_6200191.nc
...Downloading from history : IR_TS_MO_6200192.nc
...Downloading from history : IR_TS_MO_6200199.nc
Download completed!
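With `output='files'`, the netCDFs are saved in `output_dir` (or in the current working directory when it is not given). A quick way to confirm what landed there, sketched with `pathlib`; `list_netcdfs` is a hypothetical helper, and the demo uses a temporary directory with empty placeholder files named after the downloads above:

```python
import pathlib
import tempfile

def list_netcdfs(directory):
    """Return the sorted names of the .nc files found in `directory`."""
    return sorted(path.name for path in pathlib.Path(directory).glob('*.nc'))

with tempfile.TemporaryDirectory() as tmp:
    for name in ['IR_TS_MO_6200191.nc', 'IR_TS_MO_6200192.nc', 'IR_TS_MO_6200199.nc']:
        (pathlib.Path(tmp) / name).touch()  # empty placeholders, not real downloads
    print(list_netcdfs(tmp))
```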
