# Data Access through Flux

Data must be accessible for it to be useful.

Standard APIs let us bridge the gap between new technologies (Zarr/Icechunk/Arraylake) and legacy workflows.

In practice, this means the same data can be delivered in whichever format a given tool expects.

This notebook demonstrates a few ways this can work using *Flux*, Earthmover's Data Delivery service.

Fundamentally, Flux enables access through standard web APIs. A request URL is composed of the following parts:

```python
url = (
    "https://compute.earthmover.io"
    # the "web tiles" service
    "/v1/services/tiles/"
    # {org}/{repo}
    "earthmover-public/era5-surface-aws"
    # {branch}/{path/to/group}/{service}
    "/main/spatial/tiles/"
    # {tiling-system}/... <- as required by the OGC Tiles standard
    "WebMercatorQuad/{z}/{y}/{x}"
    # query parameters, here selecting variable and setting colorbar range
    "?variables=tcc&colorscalerange=0,0.0000001"
    # setting size of PNGs
    "&width=512&height=512"
)
```
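As a sketch of how these parts fit together, the hypothetical helper `tiles_url` below assembles the same kind of tile-template URL from its pieces using only the standard library. It assumes the path layout shown above; the function name, parameter names, and defaults are illustrative and not part of any Earthmover client.

```python
from urllib.parse import urlencode

# Base of all Flux services, taken from the URL above.
FLUX = "https://compute.earthmover.io/v1/services"


def tiles_url(org, repo, group, variables, colorscalerange,
              branch="main", tms="WebMercatorQuad", width=512, height=512):
    """Hypothetical helper: assemble a Flux tile-template URL from its parts."""
    query = urlencode(
        {
            "variables": variables,
            "colorscalerange": f"{colorscalerange[0]},{colorscalerange[1]}",
            "width": width,
            "height": height,
        },
        safe=",",  # leave the comma in colorscalerange unescaped
    )
    # Doubled braces emit literal {z}/{y}/{x}; the map client fills these per tile.
    return (f"{FLUX}/tiles/{org}/{repo}/{branch}/{group}/tiles/"
            f"{tms}/{{z}}/{{y}}/{{x}}?{query}")


url = tiles_url("earthmover-public", "era5-surface-aws", "spatial", "t2", (280, 310))
print(url)
```

Keeping `{z}/{y}/{x}` as literal placeholders is the point of the design: map libraries such as ipyleaflet substitute them for every tile they request.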

Read the docs about Flux here: https://docs.earthmover.io/flux/

_As we work through the material below, notice that we never use the Arraylake client._

## Grab tabular data via the EDR (Environmental Data Retrieval) API


### Serve data to tabular library users

Here we use pandas and request the _csv_ output format (`f=csv`).

```python
import pandas as pd

pd.read_csv(
    "https://compute.earthmover.io/v1/services/edr/"
    # {org}/{repo}
    "earthmover-public/era5-surface-aws"
    # {branch}/{path/to/group}/edr/{type-of-query}?
    "/main/temporal/edr/position?"
    # select variable `sd` at POINT(lon lat); `%20` is a URL-encoded space
    "parameter-name=sd&coords=POINT(286%2040.0150)"
    # output format
    "&f=csv"
)
```

| Unnamed: 0 | time | latitude | longitude | sd | spatial_ref |
|---|---|---|---|---|---|
| 0 | 1975-01-01 00:00:00 | 40.0 | 286.0 | 0.000159 | 0 |
| 1 | 1975-01-01 01:00:00 | 40.0 | 286.0 | 0.000172 | 0 |
| 2 | 1975-01-01 02:00:00 | 40.0 | 286.0 | 0.000179 | 0 |
| 3 | 1975-01-01 03:00:00 | 40.0 | 286.0 | 0.000195 | 0 |
| 4 | 1975-01-01 04:00:00 | 40.0 | 286.0 | 0.000203 | 0 |
| ... | ... | ... | ... | ... | ... |
| 438307 | 2024-12-31 19:00:00 | 40.0 | 286.0 | 0.000000 | 0 |
| 438308 | 2024-12-31 20:00:00 | 40.0 | 286.0 | 0.000000 | 0 |
| 438309 | 2024-12-31 21:00:00 | 40.0 | 286.0 | 0.000000 | 0 |
| 438310 | 2024-12-31 22:00:00 | 40.0 | 286.0 | 0.000000 | 0 |
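The query string above can also be built programmatically. This is a minimal sketch, assuming the EDR path layout used in the cell above; `edr_position_url` is a hypothetical helper, and its optional `time_range` argument produces the `&datetime=start/end` interval used in the Google Sheets exercise below.

```python
from urllib.parse import quote

# Base of all Flux services, taken from the URLs in this notebook.
FLUX = "https://compute.earthmover.io/v1/services"


def edr_position_url(org, repo, parameter, lon, lat, group="temporal",
                     branch="main", fmt="csv", time_range=None):
    """Hypothetical helper: build an EDR `position` query URL for one point."""
    # The WKT point is "POINT(lon lat)"; quote() turns the space into %20.
    coords = quote(f"POINT({lon} {lat})", safe="()")
    url = (f"{FLUX}/edr/{org}/{repo}/{branch}/{group}/edr/position"
           f"?parameter-name={parameter}&coords={coords}&f={fmt}")
    if time_range is not None:
        start, end = time_range
        url += f"&datetime={start}/{end}"  # interval written as start/end
    return url


# Latitude passed as a string to keep the exact "40.0150" spelling.
url = edr_position_url("earthmover-public", "era5-surface-aws", "sd",
                       lon=286, lat="40.0150",
                       time_range=("2020-01-01", "2020-12-31"))
print(url)
# df = pd.read_csv(url, parse_dates=["time"])  # needs network access
```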


### Export a CSV to a file

```bash
curl "https://compute.earthmover.io/v1/services/edr/"\
"earthmover-public/era5-surface-aws"\
"/main/temporal/edr/position?"\
"parameter-name=sd&coords=POINT(286%2040.0150)"\
"&f=csv" > era5_timeseries.csv
```

```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    67  100    67    0     0    237      0 --:--:-- --:--:-- --:--:--   237
```


```bash
head era5_timeseries.csv
```

```
time,latitude,longitude,sd,spatial_ref
2020-01-01,40.0,286.0,0.0,0
```
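If you would rather not shell out to `curl`, the same export can be sketched with Python's standard library. `download` is a hypothetical helper name; it streams the HTTP response body into a file, which is what `curl ... > era5_timeseries.csv` does.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, path, timeout=60):
    """Hypothetical helper: stream the body at `url` into `path`, like `curl url > path`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(path, "wb") as out:
            shutil.copyfileobj(response, out)  # copy in chunks, not all at once
    return Path(path)


# Same EDR query as the curl command above.
EDR_CSV = (
    "https://compute.earthmover.io/v1/services/edr/"
    "earthmover-public/era5-surface-aws"
    "/main/temporal/edr/position?"
    "parameter-name=sd&coords=POINT(286%2040.0150)"
    "&f=csv"
)
# download(EDR_CSV, "era5_timeseries.csv")  # needs network access
```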


### Exercise: Arraylake data in Google Sheets

Use the `IMPORTDATA` function to pull in data from the CSV URL.

Requesting the whole time series will fail because it is too much data for a sheet, but you can restrict the time range by appending `&datetime=2020-01-01/2020-12-31`.

Your cell should look like `=IMPORTDATA(URL)`.


### Exercise 2: Does this work in Excel?
### Exercise 3: Adapt that solution to your own dataset.

## Serve map tiles via the Tiles service

### In the notebook

```python
from ipyleaflet import Map, TileLayer

# Tile template; ipyleaflet fills in {z}/{y}/{x} for each tile it requests
url = (
    "https://compute.earthmover.io"
    "/v1/services/tiles/earthmover-public/era5-surface-aws"
    "/main/spatial/tiles/WebMercatorQuad/{z}/{y}/{x}"
    "?variables=t2&colorscalerange=280,310&width=512&height=512"
)
m = Map(center=(-40, 160), zoom=4)  # center is (lat, lon)
tile_layer = TileLayer(url=url, show_loading=True, opacity=0.5)
m.add(tile_layer)
m
```

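To see what the map actually requests, the sketch below computes which tile covers the map's center and fills in the template. It assumes `WebMercatorQuad` uses the usual top-left-origin XYZ numbering, so the standard Web Mercator tile formula applies; `lonlat_to_tile` is a hypothetical helper. Note that ipyleaflet's `center` is `(lat, lon)`, so `(-40, 160)` means longitude 160, latitude -40.

```python
import math

TEMPLATE = (
    "https://compute.earthmover.io"
    "/v1/services/tiles/earthmover-public/era5-surface-aws"
    "/main/spatial/tiles/WebMercatorQuad/{z}/{y}/{x}"
    "?variables=t2&colorscalerange=280,310&width=512&height=512"
)


def lonlat_to_tile(lon, lat, z):
    """Hypothetical helper: Web Mercator tile column/row (x, y) at zoom z."""
    n = 2 ** z  # tiles per axis at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * n)
    # asinh(tan(lat)) is the Mercator y coordinate in radians
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # clamp to valid indices (lon = 180, or latitudes beyond the Mercator limit)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


x, y = lonlat_to_tile(160, -40, 4)  # the notebook map's center and zoom
tile = TEMPLATE.format(z=4, y=y, x=x)
print(tile)
```

Assuming the service is reachable, opening the printed URL in a browser should return the single PNG tile under the map's center.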

### As a standalone Mapbox app

See `mapbox.html` in this same directory. Right-click it and open it in a new browser window.

Note that this HTML file uses a Mapbox access token that I will delete after the workshop. If you try this later, replace the value of `mapboxgl.accessToken` with your own token.

### Exercise: Update `mapbox.html` to visualize your own data

## Access your data in netCDF tools using OPeNDAP


Let's use `ncdump -h` to print the header (dimensions, variables, and attributes) of this _Zarr_ group.

```bash
ncdump -h https://compute.earthmover.io/v1/services/dap2/earthmover-public/era5-surface-aws/main/temporal/opendap
```

```
netcdf opendap {
dimensions:
	latitude = 721 ;
	longitude = 1440 ;
	time = 438312 ;
variables:
	double time(time) ;
		time:long_name = "time" ;
		time:axis = "T" ;
		time:units = "hours since 1975-01-01" ;
		time:calendar = "proleptic_gregorian" ;
	double latitude(latitude) ;
		latitude:long_name = "latitude" ;
		latitude:short_name = "lat" ;
		latitude:units = "degrees_north" ;
		latitude:axis = "Y" ;
		latitude:_FillValue = NaN ;
	double longitude(longitude) ;
		longitude:long_name = "longitude" ;
		longitude:short_name = "lon" ;
		longitude:units = "degrees_east" ;
		longitude:axis = "X" ;
		longitude:_FillValue = NaN ;
	float blh(time, latitude, longitude) ;
		blh:long_name = "Boundary layer height" ;
		blh:short_name = "blh" ;
		blh:units = "m" ;
		blh:original_format = "WMO GRIB 1 with ECMWF local table" ;
		blh:ecmwf_local_table = 128 ;
		blh:ecmwf_parameter = 159 ;
		blh:minimum_value = 7.5058069229126 ;
		blh:maximum_value = 6227.10009765625 ;
		blh:grid_specification = "0.25 
```
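The header gives enough information to request a subset instead of the whole variable. The sketch below assumes the Flux DAP2 endpoint honors standard DAP2 constraint expressions (`var[start:stride:stop]`, with an inclusive `stop`). It converts datetimes to indices along `time` using the `hours since 1975-01-01` units shown above, which presumes an unbroken hourly axis (consistent with 438312 steps covering 1975 through 2024), and appends a hyperslab to the URL. The helper names and the grid indices (row 200, column 1144) are hypothetical: they would correspond to 40°N, 286°E only if latitude runs from 90 to -90 and longitude from 0 to 359.75 in 0.25° steps, so check the coordinate values first.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1975, 1, 1)  # from `time:units = "hours since 1975-01-01"`
DAP = ("https://compute.earthmover.io/v1/services/dap2/"
       "earthmover-public/era5-surface-aws/main/temporal/opendap")


def time_index(when):
    """Hypothetical helper: index along `time` for an hourly axis starting at EPOCH."""
    return int((when - EPOCH) / timedelta(hours=1))


def dap2_subset_url(base, var, *ranges):
    """Hypothetical helper: append a DAP2 hyperslab, one [start:1:stop] per dimension."""
    slab = "".join(f"[{start}:1:{stop}]" for start, stop in ranges)  # stop is inclusive
    return f"{base}?{var}{slab}"


# Hourly boundary-layer height for all of 2020 at a single grid cell.
t0 = time_index(datetime(2020, 1, 1))
t1 = time_index(datetime(2020, 12, 31, 23))
url = dap2_subset_url(DAP, "blh", (t0, t1), (200, 200), (1144, 1144))
print(url)
```

Many DAP2 clients, including the netCDF-C library behind `ncdump`, accept a constraint appended to the URL this way. Quote the URL in the shell (for example `ncdump -v blh "<url>"`) so the square brackets are not treated as glob patterns.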

### Exercise: Work with your data in your favorite netCDF viewer (ncview?)

## Discussion

1. What other programs or interfaces would you like to expose for your data?