### Build local cache file from Argo data sources
*Execute commands to pull data from the Internet into a local HDF file so that we can better interact with the data*

On a development system (where we have not executed `pip install oxyfloat`), add the oxyfloat directory to the Python search path before starting the notebook server. Replace `~/dev/oxyfloatgit/` with the directory where you cloned the oxyfloat project:

```bash
export PYTHONPATH=~/dev/oxyfloatgit/
cd ~/dev/oxyfloatgit/notebooks
ipython notebook
```

Alternatively, you can set the path interactively, e.g.:

In [1]:
import sys
sys.path.insert(0, r'c:\Users\saca\Documents\GitHub\oxyfloat')  # raw string keeps the backslashes literal
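
A more portable variant of the interactive approach might look like the sketch below. It builds the path with `os.path` so the same cell works on Windows and Unix, and it guards against inserting the directory twice when the cell is re-run. The directory name `~/dev/oxyfloatgit` is only a placeholder taken from the shell example above; substitute your own clone location.

```python
import os
import sys

# Placeholder clone location (assumption); replace with wherever you cloned oxyfloat.
repo_dir = os.path.join(os.path.expanduser('~'), 'dev', 'oxyfloatgit')

# Put the clone first on the search path, but only once, so that
# re-running the cell does not pile up duplicate entries.
if repo_dir not in sys.path:
    sys.path.insert(0, repo_dir)
```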

Import the OxyFloat class and instantiate an OxyFloat object (`of`) with verbosity set to 2 so that we get INFO messages.

In [2]:
from oxyfloat import OxyFloat
of = OxyFloat(verbosity=2)

You can now explore what methods the `of` object has by typing `of.` in a cell and pressing the tab key. One of the methods is `get_oxy_floats()`; to see what it does, select it and press shift-tab with the cursor inside the parentheses of `of.get_oxy_floats()`. Let's get a list of all the floats that have been out for at least 340 days and print the length of that list.

In [3]:
%%time
floats340 = of.get_oxy_floats(age_gte=340)
print('{} floats at least 340 days old'.format(len(floats340)))

563 floats at least 340 days old
CPU times: user 159 ms, sys: 55 ms, total: 214 ms
Wall time: 278 ms
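
To make the meaning of `age_gte` concrete, here is a minimal sketch of an "at least N days old" filter. It is not taken from oxyfloat's source: the function `floats_at_least`, the float IDs, and the deployment dates are all made up for illustration, and the real method works from the Argo status information rather than a hand-built dictionary.

```python
from datetime import date

# Made-up deployment records for illustration; these are not real Argo floats.
deployments = {
    'float_a': date(2013, 6, 1),
    'float_b': date(2014, 11, 1),
    'float_c': date(2015, 9, 20),
}

def floats_at_least(deployments, age_gte, today):
    """Return sorted IDs of floats deployed at least `age_gte` days before `today`."""
    return sorted(
        float_id for float_id, launched in deployments.items()
        if (today - launched).days >= age_gte
    )

# A fixed reference date keeps the example reproducible.
today = date(2015, 12, 1)
old_340 = floats_at_least(deployments, 340, today)
old_730 = floats_at_least(deployments, 730, today)
```

With these sample dates, raising the threshold from 340 to 730 days shrinks the list, just as the float counts in this notebook drop when the age threshold is raised.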


If this is the first time you've executed the cell, it will take half a minute or so to read the Argo status information from the Internet (the PerformanceWarning can be ignored; for this small table it doesn't matter much).

Once the status information is read, it is cached locally and further calls to `get_oxy_floats()` will execute much faster. To demonstrate, let's count all the oxygen-labeled floats that have been out for at least 2 years (730 days).

In [4]:
%%time
floats730 = of.get_oxy_floats(age_gte=730)
print('{} floats at least 730 days old'.format(len(floats730)))

400 floats at least 730 days old
CPU times: user 124 ms, sys: 32 ms, total: 156 ms
Wall time: 196 ms
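
The speed-up comes from the general "fetch once, then read the local copy" pattern. The sketch below illustrates that pattern only; it is not oxyfloat's implementation. oxyfloat caches to a local HDF file, whereas this example uses a JSON file in a temporary directory to stay dependency-free, and `cached_fetch` and `slow_fetch` are hypothetical names.

```python
import json
import os
import tempfile

def cached_fetch(cache_path, fetch):
    """Return the data stored at `cache_path` if it exists; otherwise call
    `fetch()`, save its result to `cache_path`, and return it."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    data = fetch()
    with open(cache_path, 'w') as f:
        json.dump(data, f)
    return data

calls = []

def slow_fetch():
    # Stand-in for a slow network read; records each invocation.
    calls.append(1)
    return {'status': ['float_a', 'float_b']}

cache_path = os.path.join(tempfile.mkdtemp(), 'status_cache.json')
first = cached_fetch(cache_path, slow_fetch)   # cache miss: calls slow_fetch
second = cached_fetch(cache_path, slow_fetch)  # cache hit: reads the file
```

The second call never touches `slow_fetch`, which is why repeated queries return in a fraction of the time of the first one.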


Now let's find the Data Assembly Center URL for each of the floats in our list.

In [5]:
%%time
dac_urls = of.get_dac_urls(floats340)
print(len(dac_urls))
dac_urls[:5]

562
CPU times: user 756 ms, sys: 4 ms, total: 760 ms
Wall time: 807 ms


In [6]:
import xray

In [None]:
for url in of.get_profile_opendap_urls(dac_urls[0]):
    ds = xray.open_dataset(url)
    import pdb; pdb.set_trace()
    d = of.get_profile_data(url)
    print(d)
    break

> <ipython-input-7-4a3b5665cf4e>(4)<module>()
-> d = of.get_profile_data(url)
(Pdb) p ds
<xray.Dataset>
Dimensions:                       (N_CALIB: 1, N_HISTORY: 4, N_LEVELS: 70, N_PARAM: 4, N_PROF: 1)
Coordinates:
  * N_CALIB                       (N_CALIB) int64 0
  * N_HISTORY                     (N_HISTORY) int64 0 1 2 3
  * N_LEVELS                      (N_LEVELS) int64 0 1 2 3 4 5 6 7 8 9 10 11 ...
  * N_PARAM                       (N_PARAM) int64 0 1 2 3
  * N_PROF                        (N_PROF) int64 0
Data variables:
    DATA_TYPE                     object 'Argo profile merged             '
    FORMAT_VERSION                object '3.1 '
    HANDBOOK_VERSION              object '1.2 '
    REFERENCE_DATE_TIME           object '19500101000000'
    DATE_CREATION                 object '20150603183627'
    DATE_UPDATE                   object '20150806164718'
    PLATFORM_NUMBER               (N_PROF) object '1900722 '
    PROJECT_NAME                  (N_PROF) object 'US ARGO P
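
The date fields in this output are 14-digit strings, and text fields such as `PLATFORM_NUMBER` carry trailing padding spaces. As a small sketch, assuming the digits are laid out as year, month, day, hour, minute, second, they can be turned into `datetime` objects with the standard library. The helper name `parse_argo_timestamp` is hypothetical; the sample values are copied from the output above.

```python
from datetime import datetime

def parse_argo_timestamp(text):
    """Parse a 14-digit 'YYYYMMDDHHMMSS' string, ignoring padding spaces."""
    return datetime.strptime(text.strip(), '%Y%m%d%H%M%S')

# Values copied from the dataset printout above.
created = parse_argo_timestamp('20150603183627')   # DATE_CREATION
updated = parse_argo_timestamp('20150806164718')   # DATE_UPDATE

# Fixed-width text fields are space-padded; strip before comparing or printing.
platform = '1900722 '.strip()
```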

In [None]:
ds