<h3> ABSTRACT </h3>

All CMEMS in situ data products can be found and downloaded after [registration](http://marine.copernicus.eu/services-portfolio/register-now/) via the [CMEMS catalogue](http://marine.copernicus.eu/services-portfolio/access-to-products/).

This channel is advisable only for sporadic netCDF downloads: in an operational setting, interacting with the web user interface is not practical. In that context, using scripts for FTP file transfer is a much more advisable approach.

Because every line of the index files contains information about one of the netCDFs held in the different directories ([see the tips for why](https://github.com/CopernicusMarineInsitu/INSTACTraining/blob/master/tips/README.md)), users can loop over those lines and download only the files that match a number of specifications, such as spatial coverage, time coverage, provider, data_mode, parameters or file_name-related attributes (region, data type, TS or PF, platform code, and/or platform category, timestamp).

<h3>PREREQUISITES</h3>

- [credentials](http://marine.copernicus.eu/services-portfolio/register-now/)
- target [in situ product name](http://cmems-resources.cls.fr/documents/PUM/CMEMS-INS-PUM-013.pdf)
- target [hosting distribution unit](https://github.com/CopernicusMarineInsitu/INSTACTraining/blob/master/tips/README.md)
- target [index file](https://github.com/CopernicusMarineInsitu/INSTACTraining/blob/master/tips/README.md)

For example:

In [5]:
user = ''  # type your CMEMS user name between the quotes
password = ''  # type your CMEMS password between the quotes
product_name = 'INSITU_BAL_NRT_OBSERVATIONS_013_032'  # target CMEMS in situ product
distribution_unit = 'cmems.smhi.se'  # target hosting institution
index_file = 'index_history.txt'  # target index file name

#remember! platform category only for history and monthly directories
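
To avoid leaving credentials in a shared notebook, they could also be read from the environment. The following is only a sketch: the variable names `CMEMS_USER` and `CMEMS_PASSWORD` and the helper `read_credentials` are hypothetical, not something CMEMS defines.

```python
import os


def read_credentials(env=os.environ):
    """Return (user, password) taken from a mapping of environment variables.

    CMEMS_USER and CMEMS_PASSWORD are hypothetical names chosen for this sketch.
    Missing variables yield empty strings, like the placeholders above.
    """
    return env.get('CMEMS_USER', ''), env.get('CMEMS_PASSWORD', '')


user, password = read_credentials()
if not user or not password:
    print('CMEMS credentials are not set; the FTP login will fail.')
```

The variables would be set in the shell before starting the notebook, so that `user` and `password` never appear in the saved file.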

<h3>DOWNLOAD</h3>

1. Index file download

In [6]:
import ftplib 

In [7]:
ftp = ftplib.FTP(distribution_unit, user, password)  # connect and log in
ftp.cwd("Core")
ftp.cwd(product_name)
with open(index_file, 'wb') as local_file:  # the file is closed automatically
    ftp.retrbinary('RETR ' + index_file, local_file.write)
ftp.quit()
# the download is complete when '221 Goodbye.' is returned

'221 Goodbye.'

<h3>QUICK VIEW</h3>

In [6]:
import numpy as np
import pandas as pd
from random import randint

In [7]:
# an explicit encoding yields str rather than bytes for the text columns on Python 3
index = np.genfromtxt(index_file, skip_header=6, unpack=False, delimiter=',',
                      dtype=None, encoding='utf-8',
                      names=['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max',
                             'geospatial_lon_min', 'geospatial_lon_max',
                             'time_coverage_start', 'time_coverage_end',
                             'provider', 'date_update', 'data_mode', 'parameters'])

In [8]:
dataset = randint(0, len(index) - 1)  # random line of the index file (randint includes both ends)

In [9]:
values = [index[dataset]['catalog_id'], '<a href='+index[dataset]['file_name']+'>'+index[dataset]['file_name']+'</a>', index[dataset]['geospatial_lat_min'], index[dataset]['geospatial_lat_max'],
                 index[dataset]['geospatial_lon_min'], index[dataset]['geospatial_lon_max'], index[dataset]['time_coverage_start'],
                 index[dataset]['time_coverage_end'], index[dataset]['provider'], index[dataset]['date_update'], index[dataset]['data_mode'],
                 index[dataset]['parameters']]
headers = ['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max',
                     'geospatial_lon_min', 'geospatial_lon_max',
                     'time_coverage_start', 'time_coverage_end', 
                     'provider', 'date_update', 'data_mode', 'parameters']
df = pd.DataFrame(values, index=headers, columns=[dataset])
df.style

Unnamed: 0,922
catalog_id,COP-BO-01
file_name,ftp://cmems.smhi.se/Core/INSITU_BAL_NRT_OBSERVATIONS_013_032/history/vessel/BO_PR_CT_VEJ0005790.nc
geospatial_lat_min,55.8501
geospatial_lat_max,56
geospatial_lon_min,9.6357
geospatial_lon_max,10
time_coverage_start,1981-05-11T12:00:00Z
time_coverage_end,2012-12-06T10:57:00Z
provider,DMU
date_update,2017-05-07T16:03:51Z
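
For readers who prefer not to depend on numpy for this step, the same twelve columns can be read with the standard `csv` module. This is a minimal sketch, assuming the header lines of the index file start with `#`; the helper `read_index` is hypothetical and the sample line is made up for illustration, not real catalogue content.

```python
import csv
import io

COLUMNS = ['catalog_id', 'file_name', 'geospatial_lat_min', 'geospatial_lat_max',
           'geospatial_lon_min', 'geospatial_lon_max',
           'time_coverage_start', 'time_coverage_end',
           'provider', 'date_update', 'data_mode', 'parameters']


def read_index(lines):
    """Yield one dict per data line, skipping blank lines and '#' header lines."""
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith('#'))
    for row in csv.DictReader(data_lines, fieldnames=COLUMNS):
        yield row


# Made-up sample in the index layout (hypothetical host, product and values).
sample = io.StringIO(
    "# hypothetical header line\n"
    "COP-XX-01,ftp://example.org/Core/PRODUCT/history/drifter/file_a.nc,"
    "55.0,56.0,9.0,10.0,2001-01-01T00:00:00Z,2002-01-01T00:00:00Z,"
    "PROV,2017-01-01T00:00:00Z,R,TEMP PSAL\n"
)
rows = list(read_index(sample))
print(rows[0]['file_name'])
```

With a downloaded index, the call would look like `with open(index_file) as f: rows = list(read_index(f))`. All values come back as strings, so numeric columns need an explicit `float()` before comparison.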


<h3>FILTERING CRITERIA</h3>

From the glimpse above, it is possible to filter by 12 criteria. As an example, we will next set up a filter that downloads only the files belonging to a given platform category.

    1. Target category

In [20]:
targeted_category = 'drifter'

    2. netCDF filtering/selection

In [21]:
selected_netCDFs = []

for netCDF in index:
    file_name = netCDF['file_name']

    # the platform category is the directory that directly holds the file
    category = file_name.split('/')[-2]

    if category == targeted_category:
        selected_netCDFs.append(file_name)

print("total: " + str(len(selected_netCDFs)))

total: 11
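
The same loop can filter on other index columns, for instance a bounding box built from the four `geospatial_*` fields. The sketch below keeps every file whose own box intersects the target box; the helper `overlaps_bbox`, the sample rows and the box limits are hypothetical, and boxes that cross the antimeridian are not handled.

```python
def overlaps_bbox(row, lat_min, lat_max, lon_min, lon_max):
    """Return True if the file's bounding box intersects the target box.

    `row` is anything indexable by column name: a dict or one record of `index`.
    """
    return (float(row['geospatial_lat_min']) <= lat_max and
            float(row['geospatial_lat_max']) >= lat_min and
            float(row['geospatial_lon_min']) <= lon_max and
            float(row['geospatial_lon_max']) >= lon_min)


# Made-up rows standing in for lines of the index file.
sample_rows = [
    {'file_name': 'ftp://example.org/Core/PRODUCT/history/vessel/inside.nc',
     'geospatial_lat_min': 55.5, 'geospatial_lat_max': 56.0,
     'geospatial_lon_min': 9.5, 'geospatial_lon_max': 10.0},
    {'file_name': 'ftp://example.org/Core/PRODUCT/history/vessel/outside.nc',
     'geospatial_lat_min': 60.0, 'geospatial_lat_max': 61.0,
     'geospatial_lon_min': 20.0, 'geospatial_lon_max': 21.0},
]

# Hypothetical target box: latitudes 54 to 57, longitudes 8 to 11.
in_box = [row['file_name'] for row in sample_rows
          if overlaps_bbox(row, 54.0, 57.0, 8.0, 11.0)]
print("total: " + str(len(in_box)))
```

Applied to the real index, the list comprehension would iterate over `index` instead of `sample_rows`. The category test and the box test can be combined with `and` in a single condition.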


<h3> SELECTION DOWNLOAD </h3>

In [24]:
for nc in selected_netCDFs:

    parts = nc.split('/')
    host = parts[2]  # or distribution unit
    folders = parts[3:-1]
    ncdf_file_name = parts[-1]

    ftp = ftplib.FTP(host, user, password)
    for folder in folders:
        ftp.cwd(folder)

    with open(ncdf_file_name, 'wb') as local_file:
        ftp.retrbinary('RETR ' + ncdf_file_name, local_file.write)

    ftp.quit()
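
When many files are selected, logging in once per file becomes slow. A possible variant, sketched below, groups the URLs by host and reuses one connection per host. The helpers `group_by_host` and `download_all` are hypothetical, and the sketch assumes that the path in each URL is valid relative to the FTP login directory, as the `Core/...` paths used earlier suggest.

```python
import ftplib
import os
import posixpath
from collections import defaultdict
from urllib.parse import urlparse


def group_by_host(urls):
    """Map each FTP host to the list of remote paths requested from it."""
    grouped = defaultdict(list)
    for url in urls:
        parsed = urlparse(url)
        # drop the leading slash so the path is relative to the login directory
        grouped[parsed.netloc].append(parsed.path.lstrip('/'))
    return dict(grouped)


def download_all(urls, user, password, target_dir='.'):
    """Download every URL, opening a single FTP session per host."""
    for host, remote_paths in group_by_host(urls).items():
        with ftplib.FTP(host, user, password) as ftp:  # QUIT is sent on exit
            for remote_path in remote_paths:
                local_path = os.path.join(target_dir, posixpath.basename(remote_path))
                with open(local_path, 'wb') as local_file:
                    ftp.retrbinary('RETR ' + remote_path, local_file.write)
```

A call such as `download_all(selected_netCDFs, user, password)` would then fetch the whole selection. Because the working directory never changes, the same session can serve files from different folders.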