<img src="images/ProjectPythia_Logo_Final-01-Blue.svg" width=250 alt="Project Pythia Logo"></img>

# Accessing Argo Data

---

## Overview

Building upon the previous notebook, [Introduction to Argo](notebooks/argo-introduction.ipynb), we next explore how to access Argo data using various methods.

These methods are described in more detail on their respective websites, linked below. Our goal here is to provide a brief overview of some of the different tools available. 

1. [GO-BGC Toolbox](https://github.com/go-bgc/workshop-python) 
2. [Argopy](https://argopy.readthedocs.io/en/latest/user-guide/fetching-argo-data/index.html), a dedicated Python package
3. [Argovis](https://argovis.colorado.edu/argo) for API-based queries 

<!-- 2. Downloading [monthly snapshots](http://www.argodatamgt.org/Access-to-data/Argo-DOI-Digital-Object-Identifier) using Argo DOI's -->
<!-- 4. Using the [GO-BGC Toolbox](https://github.com/go-bgc/workshop-python) -->

After going through this notebook, you will be able to retrieve Argo data of interest within a certain time frame, geographical location, or by platform identifier. There are many other ways of working with Argo data, so we encourage users to explore what applications work best for their needs. 
Further information on Argo access can be found on the [Argo website](https://argo.ucsd.edu/data/).

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Numpy](https://numpy.org/learn/) | Necessary | |
| [Intro to NetCDF](https://foundations.projectpythia.org/core/data-formats/netcdf-cf.html) | Necessary | Familiarity with metadata structure |
| [Intro to Xarray](https://foundations.projectpythia.org/core/xarray.html) | Necessary | |

- **Time to learn**: 20 min


---

## Imports

In [59]:
# Import packages
import sys
import os
import numpy as np
import pandas as pd
import scipy
import xarray as xr
from datetime import datetime, timedelta

import requests
import time
import urllib3
import shutil

import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import seaborn as sns
from cmocean import cm as cmo

from argovisHelpers import helpers as avh

## 1. Downloading with the GO-BGC Toolbox

In the previous notebook, [Introduction to Argo](notebooks/argo-introduction.ipynb), we saw how Argo synthetic profile ('[sprof](https://archimer.ifremer.fr/doc/00445/55637/)') data are stored in NetCDF4 format.

The GDAC function defined in this section lets you subset and download Sprof files for multiple floats. 
We recommend this tool for users who only need a few profiles in a specific area of interest. 
Considerations: 
- Easy to use and understand
- Downloads float data as individual .nc files to your local machine (takes up storage space)
- Must download all variables available (cannot subset only variables of interest)

The two major functions below are courtesy of the [GO-BGC Toolbox](https://github.com/go-bgc/workshop-python) (Ethan Campbell). A full tutorial is available in the Toolbox.


In [65]:
# Base filepath, needed by the Argo GDAC function.
# Both paths must end with a trailing slash because filenames are appended to them directly.
root = '../data/'
profile_dir = root + 'bgc-argo/'
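
`download_file` (defined in the next section) lists and writes into `save_to`, so the target directory has to exist before anything is downloaded. The following is a minimal sketch for creating it. `ensure_dir` is a hypothetical helper, not part of the GO-BGC Toolbox, and a temporary directory stands in for `../data/` so the example has no side effects.

```python
import os
import tempfile

def ensure_dir(path):
    """Create `path` (and any missing parents) and return it with a trailing separator."""
    os.makedirs(path, exist_ok=True)
    # os.path.join(path, '') appends the separator only if it is missing
    return os.path.join(path, '')

# Demonstrate on a temporary directory so nothing is written next to the notebook
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = ensure_dir(os.path.join(tmp, 'bgc-argo'))
    print(os.path.isdir(demo_dir), demo_dir.endswith(os.sep))
```

In this notebook the equivalent call would be `profile_dir = ensure_dir(root + 'bgc-argo/')`.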

### 1.0 GO-BGC Toolbox Functions

In [63]:
# Function to download a single file (From GO-BGC Toolbox)
def download_file(url_path,filename,save_to=None,overwrite=False,verbose=True):
    """ Downloads and saves a file from a given URL using HTTP protocol.

    Note: If '404 file not found' error returned, function will return without downloading anything.
    
    Arguments:
        url_path: root URL to download from including trailing slash ('/')
        filename: filename to download including suffix
        save_to: None (to download to the `root` directory defined above)
                 or directory path including trailing slash ('/')
        overwrite: False to leave existing files in place
                   or True to overwrite existing files
        verbose: True to announce progress
                 or False to stay silent
    
    """
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    if save_to is None:
      save_to = root #profile_dir  # EDITED HERE

    try:
      if filename in os.listdir(save_to):
          if not overwrite:
              if verbose: print('>>> File ' + filename + ' already exists. Leaving current version.')
              return
          else:
              if verbose: print('>>> File ' + filename + ' already exists. Overwriting with new version.')

      def get_func(url,stream=True):
          try:
              return requests.get(url,stream=stream,auth=None,verify=False)
          except requests.exceptions.ConnectionError as error_tag:
              print('Error connecting:',error_tag)
              time.sleep(1)
              return get_func(url,stream=stream)

      response = get_func(url_path + filename,stream=True)

      if response.status_code == 404:
          if verbose: print('>>> File ' + filename + ' returned 404 error during download.')
          return
      with open(save_to + filename,'wb') as out_file:
          shutil.copyfileobj(response.raw,out_file)
      del response
      if verbose: print('>>> Successfully downloaded ' + filename + '.')

    except Exception as error:
      if verbose: print('>>> An error occurred while trying to download ' + filename + ': ' + str(error))
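
Note that `get_func` retries by calling itself for as long as the connection keeps failing. A bounded alternative could look like the sketch below. `with_retries` and its parameters are hypothetical names introduced for illustration, and `flaky_request` only simulates a failing connection.

```python
import time

def with_retries(func, max_attempts=3, wait_seconds=1.0, exceptions=(ConnectionError,)):
    """Call func() until it succeeds or max_attempts is reached.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions as error:
            if attempt == max_attempts:
                raise
            print(f'Attempt {attempt} failed ({error}); retrying in {wait_seconds} s')
            time.sleep(wait_seconds)

# Simulated request that fails twice before succeeding
calls = {'count': 0}

def flaky_request():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('simulated connection failure')
    return 'response'

result = with_retries(flaky_request, max_attempts=5, wait_seconds=0.0)
print(result, calls['count'])
```

With `requests`, the callable would be something like `lambda: requests.get(url, stream=True)`, and `exceptions` would need to be `(requests.exceptions.ConnectionError,)`, since that class does not derive from the built-in `ConnectionError`.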

In [64]:
# Function to download and parse GDAC synthetic profile index file (GO-BGC Toolbox)
def argo_gdac(lat_range=None,lon_range=None,start_date=None,end_date=None,sensors=None,floats=None,
              overwrite_index=False,overwrite_profiles=False,skip_download=False,
              download_individual_profs=False,save_to=None,verbose=True):
  """ Downloads GDAC Sprof index file, then selects float profiles based on criteria.
      Either returns information on profiles and floats (if skip_download=True) or downloads them (if False).

      Arguments:
          lat_range: None, to select all latitudes
                     or [lower, upper] within -90 to 90 (selection is inclusive)
          lon_range: None, to select all longitudes
                     or [lower, upper] within either -180 to 180 or 0 to 360 (selection is inclusive)
                     NOTE: longitude range is allowed to cross -180/180 or 0/360
          start_date: None or datetime object
          end_date:   None or datetime object
          sensors: None, to select profiles with any combination of sensors
                   or string or list of strings to specify required sensors
                   > note that common options include PRES, TEMP, PSAL, DOXY, CHLA, BBP700,
                                                      PH_IN_SITU_TOTAL, and NITRATE
          floats: None, to select any floats matching other criteria
                  or int or list of ints specifying floats' WMOID numbers
          overwrite_index: False to keep existing downloaded GDAC index file, or True to download new index
          overwrite_profiles: False to keep existing downloaded profile files, or True to download new files
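          skip_download: True to skip download and return: (array of WMOIDs, DataFrame of index file subset)
                         or False to download those profiles and return:
                         (array of WMOIDs, DataFrame of index file subset, list of downloaded filenames)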
          download_individual_profs: False to download single Sprof file containing all profiles for each float
                                     or True to download individual profile files for each float
          save_to: None to download to the `root` directory defined above
                   or string to specify directory path for profile downloads (including trailing slash)
          verbose: True to announce progress, or False to stay silent

  """
  # Paths
  url_root = 'https://www.usgodae.org/ftp/outgoing/argo/'
  dac_url_root = url_root + 'dac/'
  index_filename = 'argo_synthetic-profile_index.txt'
  if save_to is None: save_to = root

  # Download GDAC synthetic profile index file
  download_file(url_root,index_filename,overwrite=overwrite_index)

  # Load index file into Pandas DataFrame
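  gdac_index = pd.read_csv(root + index_filename,delimiter=',',header=8,dtype={'date':str,'date_update':str})
  # Parse timestamps explicitly (the date_parser argument of read_csv is deprecated in pandas 2.x)
  for date_column in ['date','date_update']:
    gdac_index[date_column] = pd.to_datetime(gdac_index[date_column],format='%Y%m%d%H%M%S')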

  # Establish time and space criteria
  if lat_range is None:  lat_range = [-90.0,90.0]
  if lon_range is None:  lon_range = [-180.0,180.0]
  elif lon_range[0] > 180 or lon_range[1] > 180:
    if lon_range[0] > 180: lon_range[0] -= 360
    if lon_range[1] > 180: lon_range[1] -= 360
  if start_date is None: start_date = datetime(1900,1,1)
  if end_date is None:   end_date = datetime(2200,1,1)

  float_wmoid_regexp = r'[a-z]*/[0-9]*/profiles/[A-Z]*([0-9]*)_[0-9]*[A-Z]*.nc'
  gdac_index['wmoid'] = gdac_index['file'].str.extract(float_wmoid_regexp).astype(int)
  filepath_main_regexp = '([a-z]*/[0-9]*/)profiles/[A-Z]*[0-9]*_[0-9]*[A-Z]*.nc'
  gdac_index['filepath_main'] = gdac_index['file'].str.extract(filepath_main_regexp)
  filepath_regexp = '([a-z]*/[0-9]*/profiles/)[A-Z]*[0-9]*_[0-9]*[A-Z]*.nc'
  gdac_index['filepath'] = gdac_index['file'].str.extract(filepath_regexp)
  filename_regexp = '[a-z]*/[0-9]*/profiles/([A-Z]*[0-9]*_[0-9]*[A-Z]*.nc)'
  gdac_index['filename'] = gdac_index['file'].str.extract(filename_regexp)

  # Subset profiles based on time and space criteria
  gdac_index_subset = gdac_index.loc[np.logical_and.reduce([gdac_index['latitude'] >= lat_range[0],
                                                            gdac_index['latitude'] <= lat_range[1],
                                                            gdac_index['date'] >= start_date,
                                                            gdac_index['date'] <= end_date]),:]
  if lon_range[1] >= lon_range[0]:    # range does not cross -180/180 or 0/360
    gdac_index_subset = gdac_index_subset.loc[np.logical_and(gdac_index_subset['longitude'] >= lon_range[0],
                                                             gdac_index_subset['longitude'] <= lon_range[1])]
  elif lon_range[1] < lon_range[0]:   # range crosses -180/180 or 0/360
    gdac_index_subset = gdac_index_subset.loc[np.logical_or(gdac_index_subset['longitude'] >= lon_range[0],
                                                            gdac_index_subset['longitude'] <= lon_range[1])]

  # If requested, subset profiles using float WMOID criteria
  if floats is not None:
    if type(floats) is not list: floats = [floats]
    gdac_index_subset = gdac_index_subset.loc[gdac_index_subset['wmoid'].isin(floats),:]

  # If requested, subset profiles using sensor criteria
  if sensors is not None:
    if type(sensors) is not list: sensors = [sensors]
    for sensor in sensors:
      gdac_index_subset = gdac_index_subset.loc[gdac_index_subset['parameters'].str.contains(sensor),:]

  # Examine subsetted profiles
  wmoids = gdac_index_subset['wmoid'].unique()
  wmoid_filepaths = gdac_index_subset['filepath_main'].unique()

  # Just return list of floats and DataFrame with subset of index file, or download each profile
  if not skip_download:
    downloaded_filenames = []
    if download_individual_profs:
      for p_idx in gdac_index_subset.index:
        download_file(dac_url_root + gdac_index_subset.loc[p_idx]['filepath'],
                      gdac_index_subset.loc[p_idx]['filename'],
                      save_to=save_to,overwrite=overwrite_profiles,verbose=verbose)
        downloaded_filenames.append(gdac_index_subset.loc[p_idx]['filename'])
    else:
      for f_idx, wmoid_filepath in enumerate(wmoid_filepaths):
        download_file(dac_url_root + wmoid_filepath,str(wmoids[f_idx]) + '_Sprof.nc',
                      save_to=save_to,overwrite=overwrite_profiles,verbose=verbose)
        downloaded_filenames.append(str(wmoids[f_idx]) + '_Sprof.nc')
    return wmoids, gdac_index_subset, downloaded_filenames
  else:
    return wmoids, gdac_index_subset
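
To make the regular expressions in `argo_gdac` easier to follow, here is a sketch that applies the same pattern shape to a single index entry using only the standard library. The sample path is illustrative rather than copied from the index file, `split_index_path` is a hypothetical helper, and the pattern escapes the dot before `nc`, which the original patterns leave unescaped.

```python
import re

# One pattern with named groups instead of four separate patterns
INDEX_PATH = re.compile(
    r'(?P<filepath_main>[a-z]*/[0-9]*/)profiles/'
    r'(?P<filename>[A-Z]*(?P<wmoid>[0-9]*)_[0-9]*[A-Z]*\.nc)'
)

def split_index_path(path):
    """Return the pieces argo_gdac extracts from an index 'file' entry, or None if no match."""
    match = INDEX_PATH.fullmatch(path)
    if match is None:
        return None
    parts = match.groupdict()
    parts['wmoid'] = int(parts['wmoid'])
    parts['filepath'] = parts['filepath_main'] + 'profiles/'
    return parts

# Build the Sprof URL the same way argo_gdac does for a whole-float download
dac_url_root = 'https://www.usgodae.org/ftp/outgoing/argo/dac/'
parts = split_index_path('aoml/5906030/profiles/SD5906030_001.nc')
sprof_url = dac_url_root + parts['filepath_main'] + str(parts['wmoid']) + '_Sprof.nc'
print(parts)
print(sprof_url)
```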

### 1.1 Using GDAC function to access Argo subsets

In [None]:
# don't download, just get wmoids
# wmoids, gdac_index = argo_gdac(lat_range=lat_bounds,lon_range=lon_bounds,
#                                start_date=start_yd,end_date=end_yd,
#                                sensors=None,floats=None,
#                                overwrite_index=True,overwrite_profiles=False,
#                                skip_download=True,download_individual_profs=False,
#                                save_to=profile_dir,verbose=True)

# download specific float #5906030 
wmoids, gdac_index, downloaded_filenames \
                   = argo_gdac(lat_range=None,lon_range=None,
                               start_date=None,end_date=None,
                               sensors=None,floats=5906030,
                               overwrite_index=True,overwrite_profiles=False,
                               skip_download=False,download_individual_profs=False,
                               save_to=profile_dir,verbose=True)
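
The call above selects a float by WMO ID. When subsetting by region instead, `argo_gdac` accepts longitude bounds in either the -180 to 180 or the 0 to 360 convention, including ranges that cross the antimeridian. The sketch below reproduces that selection logic on plain numbers. `lon_in_range` is a hypothetical helper, and it assumes the longitude being tested is already expressed in -180 to 180.

```python
def lon_in_range(lon, lon_range):
    """Return True if `lon` (in -180..180) falls inside `lon_range` (inclusive)."""
    lower, upper = lon_range  # unpack into locals so the caller's list is not modified
    # Convert bounds given in the 0..360 convention to -180..180
    if lower > 180:
        lower -= 360
    if upper > 180:
        upper -= 360
    if upper >= lower:
        # Ordinary range: both conditions must hold
        return lower <= lon <= upper
    # Range crosses -180/180: either condition is enough
    return lon >= lower or lon <= upper

for test_lon in (-175, 0, 175):
    print(test_lon, lon_in_range(test_lon, [170, 190]))
```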

In [None]:
# DSdict = {}
# for filename in os.listdir(profile_dir):
#     if filename.endswith(".nc"):
#         fp = profile_dir + filename
#         single_dataset = xr.open_dataset(fp, decode_times=False)
#         DSdict[filename[0:7]] = single_dataset
# # DSdict['5906030']

## 2. Using the Argopy Python Package

## 3. Querying Data with Argovis

Argovis provides an API that allows us to interact with Argo data while only downloading the exact subsets of data needed for analysis. 
Our examples here are modified from the [tutorial notebooks](https://github.com/argovis/demo_notebooks) released by Argovis. We showcase only a few of the functionalities, but more information can be found in the previous link.

The introduction published by Argovis:
>"Argovis is a REST API and web application for searching, downloading, co-locating and visualizing oceanographic data, including Argo array data, ship-based profile data, data from the Global Drifter Program, tropical cyclone data, and several gridded products. Our API is meant to be integrated into living documents like Jupyter notebooks and analyses intended to update their consumption of Argo data in near-real-time, and our web frontend is intended to make it easy for students and educators to explore data about Earth's oceans at will."

Argovis should be cited as:

Tucker, T., D. Giglio, M. Scanderbeg, and S.S.P. Shen: Argovis: A Web Application for Fast Delivery, Visualization, and Analysis of Argo Data. J. Atmos. Oceanic Technol., 37, 401–416, https://doi.org/10.1175/JTECH-D-19-0041.1


### Getting started with `argovisHelpers`

From the Argovis tutorial: 
> In order to allocate Argovis's limited computing resources fairly, users are encouraged to register and request a free API key. This works like a password that identifies your requests to Argovis. To do so:
>
> - Visit [https://argovis-keygen.colorado.edu/](https://argovis-keygen.colorado.edu/)
> - Fill out the form under _New Account Registration_
> - An API key will be emailed to you shortly.
>
> Treat this API key like a password - don't share it or leave it anywhere public. If you ever forget it or accidentally reveal it to a third party, see the same website above to change or deactivate your token.
>
> Put your API key in the quotes in the variable below before moving on:

In [5]:
API_ROOT='https://argovis-api.colorado.edu/'
API_KEY=''  # paste your own key between the quotes; never commit a real key
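
One way to keep the key out of the notebook entirely is to read it from an environment variable. This is only a sketch: `ARGOVIS_API_KEY` is a variable name chosen here for illustration, not something Argovis defines, and `load_api_key` is a hypothetical helper.

```python
import os

def load_api_key(env=os.environ, name='ARGOVIS_API_KEY'):
    """Return the API key stored under `name`, or an empty string if it is not set."""
    return env.get(name, '').strip()

API_KEY = load_api_key()
if not API_KEY:
    print('No API key found in the environment; set ARGOVIS_API_KEY before querying.')
```

The variable would be set in the shell that launches Jupyter, so the key never appears in a saved notebook or in version control.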

### Getting Argo data documents

Before actually getting Argo measurements, we can query information about the profile (including pointers to the metadata).

In [9]:
argoSearch = {
    'startDate': '2013-05-01T00:00:00Z',
    'endDate': '2023-05-01T00:00:00Z',
    'center': '-22.5,0',
    'radius': 100
}

argoProfiles = avh.query('argo', options=argoSearch, apikey=API_KEY, apiroot=API_ROOT)
argoProfiles[0]

{'_id': '1901820_256',
 'geolocation': {'type': 'Point', 'coordinates': [-22.75594, -0.2218]},
 'basin': 1,
 'timestamp': '2023-04-09T18:34:30.001Z',
 'date_updated_argovis': '2023-07-14T10:44:14.125Z',
 'source': [{'source': ['argo_core'],
   'url': 'ftp://ftp.ifremer.fr/ifremer/argo/dac/aoml/1901820/profiles/R1901820_256.nc',
   'date_updated': '2023-07-13T22:33:14.000Z'}],
 'cycle_number': 256,
 'geolocation_argoqc': 1,
 'profile_direction': 'A',
 'timestamp_argoqc': 1,
 'vertical_sampling_scheme': 'Primary sampling: averaged [nominal 2 dbar binned data sampled at 0.5 Hz from a SBE41CP]',
 'data_info': [['pressure',
   'pressure_argoqc',
   'salinity',
   'salinity_argoqc',
   'temperature',
   'temperature_argoqc'],
  ['units', 'data_keys_mode'],
  [['decibar', 'A'],
   [None, None],
   ['psu', 'A'],
   [None, None],
   ['degree_Celsius', 'A'],
   [None, None]]],
 'metadata': ['1901820_m0']}
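
The `data_info` field in this output packs three parallel lists: the variable names, the labels of the per-variable attributes, and one row of attribute values per variable. That layout is read off from the output shown here rather than from the Argovis documentation, so treat the sketch below as an interpretation. `describe_variables` is a hypothetical helper.

```python
def describe_variables(data_info):
    """Turn a data_info triple into {variable: {attribute_label: value}}."""
    names, labels, rows = data_info
    return {name: dict(zip(labels, row)) for name, row in zip(names, rows)}

# data_info copied from the profile document shown above
data_info = [
    ['pressure', 'pressure_argoqc', 'salinity', 'salinity_argoqc',
     'temperature', 'temperature_argoqc'],
    ['units', 'data_keys_mode'],
    [['decibar', 'A'], [None, None], ['psu', 'A'], [None, None],
     ['degree_Celsius', 'A'], [None, None]],
]

variables = describe_variables(data_info)
print(variables['temperature'])
```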

In [33]:
argoProfiles[0]['_id']

'1901820_256'

Note that the first object in `argoProfiles` is a single vertical Argo "profile". 
In `argoProfiles[0]['_id']`, the seven digits before the underscore are the float's unique WMO identification number. 
The digits after the underscore are the profile (cycle) number. 

In the above example, we are looking at data from the 256th profile from float WMO #1901820.
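
As a small sketch of that naming convention, the identifier can be split into its two parts. `split_profile_id` is a hypothetical helper. Ignoring any non-digit characters after the cycle number is an assumption about identifiers not shown in this notebook.

```python
def split_profile_id(profile_id):
    """Split an id such as '1901820_256' into (platform, cycle_number)."""
    platform, _, cycle = profile_id.partition('_')
    # Keep only the digits of the cycle part (assumption: any suffix is not numeric)
    digits = ''.join(ch for ch in cycle if ch.isdigit())
    return platform, int(digits)

platform, cycle_number = split_profile_id('1901820_256')
print(platform, cycle_number)
```

These values correspond to the `platform` entry of the metadata document and the `cycle_number` field of the profile document.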

We can get more information about this particular float by querying `argo/meta`.

In [34]:
metaOptions = {
    'id': argoProfiles[0]['metadata'][0]
}
argoMeta = avh.query('argo/meta', options=metaOptions, apikey=API_KEY, apiroot=API_ROOT)
argoMeta

[{'_id': '1901820_m0',
  'data_type': 'oceanicProfile',
  'data_center': 'AO',
  'instrument': 'profiling_float',
  'pi_name': ['BRECK OWENS', ' STEVEN JAYNE', ' P.E. ROBBINS'],
  'platform': '1901820',
  'platform_type': 'S2A',
  'fleetmonitoring': 'https://fleetmonitoring.euro-argo.eu/float/1901820',
  'oceanops': 'https://www.ocean-ops.org/board/wa/Platform?ref=1901820',
  'positioning_system': 'GPS',
  'wmo_inst_type': '854'}]

We can also retrieve all of the profiles collected by the same float (WMO ID 1901820) by searching on its `platform`.

In [31]:
platformSearch = {
    'platform': argoMeta[0]['platform']
}

platformProfiles = avh.query('argo', options=platformSearch, apikey=API_KEY, apiroot=API_ROOT)
print(len(platformProfiles))

301


### Making `data` queries

Now, we want to retrieve actual measurements. We can use any number of identifiers. 

Below, we are specifying float WMO 4901283 and profile #003. The `data` variable can be:

- A comma-separated list of variable names, e.g. `'temperature,doxy'`
- `'all'`, meaning get all available variables. 

In [40]:
dataQuery = {
    'id': '4901283_003',
    'data': 'all'
}
profile = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
# avh.data_inflate(profile[0])[0:10]

We can query float profiles within larger bounds: 

In [26]:
dataQuery = {
    'startDate': '2020-01-01T00:00:00Z',
    'endDate': '2024-01-01T00:00:00Z',
    'polygon': [[-150,-30],[-155,-30],[-155,-35],[-150,-35],[-150,-30]],
    'data': 'doxy'
}

profiles = avh.query('argo', options=dataQuery, apikey=API_KEY, apiroot=API_ROOT)
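
Query dictionaries like the one above can also be assembled from Python objects. The sketch below formats `datetime` values as the timestamp strings used in this notebook and repeats the first polygon vertex at the end, as the example does. The helper names are hypothetical, and treating timezone-naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone

def iso_z(dt):
    """Format a datetime as 'YYYY-MM-DDTHH:MM:SSZ' (naive datetimes are assumed to be UTC)."""
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')

def close_ring(vertices):
    """Return [lon, lat] vertices as a list whose last point repeats the first."""
    ring = [list(vertex) for vertex in vertices]
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return ring

def build_polygon_query(start, end, vertices, data=None):
    """Assemble an options dictionary with the same keys as the query above."""
    query = {'startDate': iso_z(start), 'endDate': iso_z(end), 'polygon': close_ring(vertices)}
    if data is not None:
        query['data'] = data
    return query

query = build_polygon_query(datetime(2020, 1, 1), datetime(2024, 1, 1),
                            [[-150, -30], [-155, -30], [-155, -35], [-150, -35]],
                            data='doxy')
print(query)
```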

In [22]:
inflated_data = avh.data_inflate(profiles[0])
inflated_data[0:10]

[{'doxy': 235.335724, 'pressure': 7.6},
 {'doxy': 235.327026, 'pressure': 13.07},
 {'doxy': 235.418045, 'pressure': 17.720001},
 {'doxy': 235.212158, 'pressure': 22.02},
 {'doxy': 235.242828, 'pressure': 26.68},
 {'doxy': 235.235306, 'pressure': 31.320002},
 {'doxy': 235.273743, 'pressure': 36.709999},
 {'doxy': 235.165115, 'pressure': 41.73},
 {'doxy': 235.16153, 'pressure': 48.260002},
 {'doxy': 235.032471, 'pressure': 54.619999}]
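
To show what this per-level view represents, here is a sketch of the same transformation in plain Python. It assumes the raw document stores `data` as one list per variable, in the order given by `data_info[0]`; that layout is an assumption, not something this notebook confirms. The `toy_profile` is a hand-built stand-in that reuses the first three levels printed above, and both function names are hypothetical.

```python
def inflate(profile):
    """Return one {variable: value} dictionary per vertical level."""
    names = profile['data_info'][0]
    return [dict(zip(names, level)) for level in zip(*profile['data'])]

def to_columns(levels):
    """Regroup per-level dictionaries into {variable: [values...]} for plotting."""
    columns = {}
    for level in levels:
        for name, value in level.items():
            columns.setdefault(name, []).append(value)
    return columns

# Stand-in document (assumed layout: one list per variable)
toy_profile = {
    'data_info': [['doxy', 'pressure'], ['units', 'data_keys_mode'], []],
    'data': [[235.335724, 235.327026, 235.418045], [7.6, 13.07, 17.720001]],
}

levels = inflate(toy_profile)
columns = to_columns(levels)
print(levels[0])
print(columns['pressure'])
```

The column form is convenient for passing a variable and its pressures directly to a plotting call.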

### Querying within geospatial bounds

In [None]:
qs = {
    'startDate': '2017-08-01T00:00:00Z',
    'endDate': '2017-09-01T00:00:00Z',
    'box': [[-20,70],[20,72]]
}

profiles = avh.query('argo', options=qs, apikey=API_KEY, apiroot=API_ROOT)
latitudes = [x['geolocation']['coordinates'][1] for x in profiles]
print(min(latitudes))
print(max(latitudes))
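
The printed minimum and maximum latitudes are a quick check that the results fall inside the requested box. A more explicit check is sketched below. It assumes the box is given as `[[lon_min, lat_min], [lon_max, lat_max]]` and that `coordinates` are ordered `[longitude, latitude]`, as in the profile document shown earlier. `in_box` is a hypothetical helper and does not handle boxes that cross the antimeridian.

```python
def in_box(coordinates, box):
    """Return True if a [lon, lat] point lies inside [[lon_min, lat_min], [lon_max, lat_max]]."""
    lon, lat = coordinates
    (lon_min, lat_min), (lon_max, lat_max) = box
    return lon_min <= lon <= lon_max and lat_min <= lat <= lat_max

box = [[-20, 70], [20, 72]]
# Stand-in documents; real ones would come from avh.query
sample_profiles = [
    {'geolocation': {'coordinates': [0.5, 71.2]}},
    {'geolocation': {'coordinates': [-22.75594, -0.2218]}},
]
print([in_box(p['geolocation']['coordinates'], box) for p in sample_profiles])
```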

---

## Summary
In this notebook we accessed Argo data in two ways: downloading synthetic profile (Sprof) files from the GDAC with functions from the GO-BGC Toolbox, and querying the Argovis API for profile documents, float metadata, and measurements selected by time frame, geographic region, or platform identifier.


## Resources and references
- [Argo data access](https://argo.ucsd.edu/data/)
- [GO-BGC Toolbox](https://github.com/go-bgc/workshop-python) (Ethan Campbell)
- [Argopy documentation](https://argopy.readthedocs.io/en/latest/user-guide/fetching-argo-data/index.html)
- [Argovis demo notebooks](https://github.com/argovis/demo_notebooks)
- Tucker, T., D. Giglio, M. Scanderbeg, and S.S.P. Shen: Argovis: A Web Application for Fast Delivery, Visualization, and Analysis of Argo Data. J. Atmos. Oceanic Technol., 37, 401–416, https://doi.org/10.1175/JTECH-D-19-0041.1