
# Create XML snippets to copy existing ERDDAP datasets to a local ERDDAP

This process copies the data from one ERDDAP to a local ERDDAP and serves that data via the local ERDDAP.

It relies heavily on ERDDAP's built-in capabilities, specifically [EDDTableCopy](https://erddap.github.io/docs/server-admin/datasets#eddtablecopy) and [EDDTableFromErddap](https://erddap.github.io/docs/server-admin/datasets#eddfromerddap).

In [None]:
!pip install erddapy



In [None]:
from erddapy import ERDDAP
import pandas as pd

First, we search the CeNCOOS ERDDAP. To be sure we know what data we are working with, we narrow the search to datasets that are served through ERDDAP's `tabledap` protocol and have a `cdm_data_type` of `TimeSeries`. We keep the results as a dataframe to use later.

In [None]:
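from erddapy import servers

# Keep only the servers in erddapy's built-in list whose URL mentions CeNCOOS.
cencoos_servers = {k: v.url for k, v in servers.items() if 'cencoos' in v.url}

df_out = pd.DataFrame()

for name, server_url in cencoos_servers.items():
    e = ERDDAP(server=server_url, protocol="tabledap")

    # Restrict the search to TimeSeries datasets.
    kw = {"cdm_data_type": "TimeSeries"}
    search_url = e.get_search_url(response="csv", **kw)

    # Read the search results and append them to the running table.
    df = pd.read_csv(search_url)
    df_out = pd.concat([df_out, df], ignore_index=True)

df_out.sample(n=5)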

Unnamed: 0,griddap,Subset,tabledap,Make A Graph,wms,files,Title,Summary,FGDC,ISO 19115,Info,Background Info,RSS,Email,Institution,Dataset ID
289,,,http://erddap.cencoos.org/erddap/tabledap/jws-...,http://erddap.cencoos.org/erddap/tabledap/jws-...,,,JWS Patton Cove,Timeseries data from 'JWS Patton Cove' (jws-pa...,http://erddap.cencoos.org/erddap/metadata/fgdc...,http://erddap.cencoos.org/erddap/metadata/iso1...,http://erddap.cencoos.org/erddap/info/jws-patt...,https://sensors.ioos.us/#metadata/135153/station,http://erddap.cencoos.org/erddap/rss/jws-patto...,http://erddap.cencoos.org/erddap/subscriptions...,California State University Long Beach,jws-patton-cove
67,,,http://erddap.cencoos.org/erddap/tabledap/edu_...,http://erddap.cencoos.org/erddap/tabledap/edu_...,,,"234 - Santa Barbara Island North, CA (46262)",Timeseries data from '234 - Santa Barbara Isla...,http://erddap.cencoos.org/erddap/metadata/fgdc...,http://erddap.cencoos.org/erddap/metadata/iso1...,http://erddap.cencoos.org/erddap/info/edu_ucsd...,https://sensors.ioos.us/#metadata/103480/station,http://erddap.cencoos.org/erddap/rss/edu_ucsd_...,http://erddap.cencoos.org/erddap/subscriptions...,Coastal Data Information Program (CDIP),edu_ucsd_cdip_234
8,,,http://erddap.cencoos.org/erddap/tabledap/edu_...,http://erddap.cencoos.org/erddap/tabledap/edu_...,,,"073 - Scripps Pier, La Jolla, CA (LJPC1)","Timeseries data from '073 - Scripps Pier, La J...",http://erddap.cencoos.org/erddap/metadata/fgdc...,http://erddap.cencoos.org/erddap/metadata/iso1...,http://erddap.cencoos.org/erddap/info/edu_ucsd...,https://sensors.ioos.us/#metadata/103411/station,http://erddap.cencoos.org/erddap/rss/edu_ucsd_...,http://erddap.cencoos.org/erddap/subscriptions...,Coastal Data Information Program (CDIP),edu_ucsd_cdip_073
17,,,http://erddap.cencoos.org/erddap/tabledap/edu_...,http://erddap.cencoos.org/erddap/tabledap/edu_...,,,"101 - Torrey Pines Inner, CA (46273)",Timeseries data from '101 - Torrey Pines Inner...,http://erddap.cencoos.org/erddap/metadata/fgdc...,http://erddap.cencoos.org/erddap/metadata/iso1...,http://erddap.cencoos.org/erddap/info/edu_ucsd...,https://sensors.ioos.us/#metadata/103420/station,http://erddap.cencoos.org/erddap/rss/edu_ucsd_...,http://erddap.cencoos.org/erddap/subscriptions...,Coastal Data Information Program (CDIP),edu_ucsd_cdip_101
461,,,http://erddap.cencoos.org/erddap/tabledap/gov_...,http://erddap.cencoos.org/erddap/tabledap/gov_...,,,Sacramento,Timeseries data from 'Sacramento' (gov_noaa_wa...,http://erddap.cencoos.org/erddap/metadata/fgdc...,http://erddap.cencoos.org/erddap/metadata/iso1...,http://erddap.cencoos.org/erddap/info/gov_noaa...,https://sensors.ioos.us/#metadata/130065/station,http://erddap.cencoos.org/erddap/rss/gov_noaa_...,http://erddap.cencoos.org/erddap/subscriptions...,"NOAA Water Resources Regions, National Weather...",gov_noaa_water_atic1
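
To make the search step less opaque, here is a rough sketch of the kind of request behind it. It assumes the query goes to ERDDAP's Advanced Search endpoint (`search/advanced.csv`) with `protocol` and `cdm_data_type` filters. The helper name `advanced_search_url` is made up for illustration, and the URL that erddapy actually builds may carry additional parameters.

```python
from urllib.parse import urlencode


def advanced_search_url(server, **filters):
    """Build an ERDDAP Advanced Search URL that returns CSV (illustrative helper)."""
    # page/itemsPerPage control paging of the result list.
    params = {"page": 1, "itemsPerPage": 1000, **filters}
    return f"{server.rstrip('/')}/search/advanced.csv?{urlencode(params)}"


url = advanced_search_url(
    "http://erddap.cencoos.org/erddap",
    protocol="tabledap",
    cdm_data_type="TimeSeries",
)
print(url)
```

Opening a URL like this in a browser (with `.html` in place of `.csv`) is a quick way to confirm that the filters return the datasets you expect.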


Okay, we have a table of datasets we'd like to copy and add to our ERDDAP. Our next task is to build an XML snippet for each of those datasets.

Here, we will use the Python package Jinja2 to generate the XML snippets for the datasets of interest. First, we need a template snippet that combines `EDDTableCopy` and `EDDTableFromErddap`, from which we can build the individual snippets.

In [None]:
%%writefile templates/datasets_template.xml
<dataset type="EDDTableCopy" datasetID="{{configs.datasetID}}_EDDTableCopy" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <extractDestinationNames>{{configs.destinationNames}}</extractDestinationNames>
  <checkSourceData>true</checkSourceData>
  <dataset type="EDDTableFromErddap" datasetID="{{configs.datasetID}}" active="true">
    <sourceUrl>{{configs.datasetURL}}</sourceUrl>
  </dataset>
</dataset>

Overwriting templates/datasets_template.xml
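
As a quick illustration of how the `{{configs.*}}` placeholders get filled, the sketch below renders a shortened copy of the template from an in-memory string. The dataset ID and URL are invented placeholders, not real CeNCOOS datasets.

```python
from jinja2 import Template

# Shortened copy of the template above, kept in memory instead of on disk.
template_text = """<dataset type="EDDTableCopy" datasetID="{{configs.datasetID}}_EDDTableCopy" active="true">
  <extractDestinationNames>{{configs.destinationNames}}</extractDestinationNames>
  <dataset type="EDDTableFromErddap" datasetID="{{configs.datasetID}}" active="true">
    <sourceUrl>{{configs.datasetURL}}</sourceUrl>
  </dataset>
</dataset>"""

# Hypothetical values, for illustration only.
example_configs = {
    "datasetID": "example-station",
    "destinationNames": "station latitude longitude",
    "datasetURL": "https://example.org/erddap/tabledap/example-station",
}

# Jinja2 resolves configs.datasetID against the dictionary key "datasetID".
rendered = Template(template_text).render(configs=example_configs)
print(rendered)
```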


Using the template we just created, we can write some functions to help us insert the appropriate information into the files and write the files out.

In [None]:
import os
from jinja2 import Environment, FileSystemLoader


def write_snippet(template, configs):
    # Render the template and write it to datasets/datasets_<datasetID>_EDDTableCopy.xml
    os.makedirs("datasets", exist_ok=True)  # create the output directory if it is missing
    fname = f"datasets_{configs['datasetID']}_EDDTableCopy.xml"
    filename = os.path.join("datasets", fname)
    with open(filename, "w", encoding="utf-8") as fh:
        fh.write(template.render(configs=configs))


def load_template():
    # Load datasets_template.xml from the templates/ directory
    env = Environment(loader=FileSystemLoader("templates"))
    return env.get_template("datasets_template.xml")


def write_templates(configs):
    template = load_template()
    write_snippet(template, configs)


def main(configs):
    write_templates(configs)

Before we go whole hog, let's test this with one dataset.

In [None]:
dset = df_out.loc[df_out['Institution'].str.contains("University")].iloc[0]

configs = {
    'datasetID': dset['Dataset ID'],
    'destinationNames': 'station latitude longitude',
    'datasetURL': dset['tabledap']
}

main(configs)

Take a look at the resultant dataset snippet.

In [None]:
%cat datasets/datasets_bodega-head-intertidal-shore-sta_EDDTableCopy.xml

<dataset type="EDDTableCopy" datasetID="bodega-head-intertidal-shore-sta_EDDTableCopy" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <extractDestinationNames>station latitude longitude</extractDestinationNames>
  <checkSourceData>true</checkSourceData>
  <dataset type="EDDTableFromErddap" datasetID="bodega-head-intertidal-shore-sta" active="true">
    <sourceUrl>http://erddap.cencoos.org/erddap/tabledap/bodega-head-intertidal-shore-sta</sourceUrl>
  </dataset>
</dataset>
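
Beyond eyeballing the output, a small sanity check can confirm that a rendered snippet is well-formed XML and has the expected nesting. The sketch below uses only the standard library. The helper `check_snippet` is hypothetical, and the rule that the outer and inner `datasetID` values differ is taken from how the template names them, not from a check that ERDDAP itself is known to perform.

```python
import xml.etree.ElementTree as ET


def check_snippet(xml_text):
    """Return (outer_id, inner_id, source_url) for an EDDTableCopy snippet.

    Raises ET.ParseError if the text is not well-formed XML and
    ValueError if the structure is not what the template produces.
    """
    root = ET.fromstring(xml_text)
    inner = root.find("dataset")  # the wrapped source dataset
    if root.get("type") != "EDDTableCopy" or inner is None:
        raise ValueError("expected an EDDTableCopy dataset wrapping a source dataset")
    if root.get("datasetID") == inner.get("datasetID"):
        raise ValueError("outer and inner datasetIDs should differ")
    return root.get("datasetID"), inner.get("datasetID"), inner.findtext("sourceUrl")


snippet = """<dataset type="EDDTableCopy" datasetID="bodega-head-intertidal-shore-sta_EDDTableCopy" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <extractDestinationNames>station latitude longitude</extractDestinationNames>
  <checkSourceData>true</checkSourceData>
  <dataset type="EDDTableFromErddap" datasetID="bodega-head-intertidal-shore-sta" active="true">
    <sourceUrl>http://erddap.cencoos.org/erddap/tabledap/bodega-head-intertidal-shore-sta</sourceUrl>
  </dataset>
</dataset>"""

ids = check_snippet(snippet)
print(ids)
```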


## Create xml snippet for each dataset of interest
Now, let's go whole hog and create snippets for all the "University" affiliated datasets.


Assumptions:
1. The source provider (CeNCOOS, in this case) is okay with all of this.
1. All `destinationNames` (`station`, `latitude`, `longitude`) exist in each dataset.
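
The second assumption can be checked before any snippets are written. The sketch below assumes you already have the list of variable names for a dataset, for example from its ERDDAP info page. The list here is hard-coded and hypothetical, and `missing_destination_names` is an illustrative helper, not part of erddapy or ERDDAP.

```python
def missing_destination_names(dataset_variables, destination_names):
    """Return the names in the space-separated destination_names string
    that are not among the dataset's variables."""
    available = set(dataset_variables)
    return [name for name in destination_names.split() if name not in available]


# Hypothetical variable list for one dataset.
variables = ["time", "latitude", "longitude", "z", "sea_water_temperature"]

missing = missing_destination_names(variables, "station latitude longitude")
print(missing)  # ['station']
```

A dataset that reports any missing names could be skipped, or given its own `destinationNames` value, before its snippet is generated.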


In [None]:
dsets = df_out.loc[df_out['Institution'].str.contains("University")]

for index, dset in dsets.iterrows():

    configs = {
        'datasetID': dset['Dataset ID'],
        'destinationNames': 'station latitude longitude',
        'datasetURL': dset['tabledap']
    }

    main(configs)

How many snippets did we make?

In [None]:
%ls datasets/ | wc

    171     171    7600


Show me a few of those filenames.

In [None]:
%ls datasets/ | head -5

datasets_bodega-head-intertidal-shore-sta_EDDTableCopy.xml
datasets_bodega-marine-laboratory-bml-_EDDTableCopy.xml
datasets_bodega-marine-laboratory-weather_EDDTableCopy.xml
datasets_carquinez_EDDTableCopy.xml
datasets_cordell-banks-mooring_EDDTableCopy.xml


Take a look at one of those XML snippets.

In [None]:
%cat datasets/datasets_cordell-banks-mooring_EDDTableCopy.xml

<dataset type="EDDTableCopy" datasetID="cordell-banks-mooring_EDDTableCopy" active="true">
  <reloadEveryNMinutes>10080</reloadEveryNMinutes>
  <extractDestinationNames>station latitude longitude</extractDestinationNames>
  <checkSourceData>true</checkSourceData>
  <dataset type="EDDTableFromErddap" datasetID="cordell-banks-mooring" active="true">
    <sourceUrl>http://erddap.cencoos.org/erddap/tabledap/cordell-banks-mooring</sourceUrl>
  </dataset>
</dataset>


Now we have 171 dataset XML snippets that we can add to our `datasets.xml` file to be loaded into ERDDAP. Once you add these snippets to `datasets.xml` and flag the datasets for reloading, ERDDAP will fetch the data from the remote server, download it to `/erddap/data/copy/`, and then serve the data from that local copy.
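
One way to flag a dataset is to create an empty file named after its `datasetID` in ERDDAP's flag directory. The sketch below derives the outer `datasetID` from each snippet filename and creates the matching flag file. It runs against a temporary directory so that it is safe to execute anywhere. The helper `flag_datasets` is hypothetical, and the location of the real flag directory (a `flag` folder under your ERDDAP data directory) is an assumption to confirm against your own installation.

```python
import tempfile
from pathlib import Path


def flag_datasets(snippet_dir, flag_dir):
    """Create one empty flag file per snippet; return the flagged datasetIDs."""
    flagged = []
    for snippet in sorted(Path(snippet_dir).glob("datasets_*_EDDTableCopy.xml")):
        # datasets_<datasetID>_EDDTableCopy.xml -> <datasetID>_EDDTableCopy
        dataset_id = snippet.stem[len("datasets_"):]
        (Path(flag_dir) / dataset_id).touch()
        flagged.append(dataset_id)
    return flagged


# Demonstration with throwaway directories standing in for datasets/ and the flag directory.
with tempfile.TemporaryDirectory() as tmp:
    snippets = Path(tmp, "datasets")
    flags = Path(tmp, "flag")
    snippets.mkdir()
    flags.mkdir()
    Path(snippets, "datasets_cordell-banks-mooring_EDDTableCopy.xml").write_text("<dataset/>")

    flagged = flag_datasets(snippets, flags)
    flag_files = sorted(p.name for p in flags.iterdir())

print(flagged)
```

On a real server, the call would point at the notebook's `datasets/` folder and your ERDDAP flag directory instead of the temporary paths.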

Now you have a local copy of all 171 datasets of interest. If the remote ERDDAP (CeNCOOS in this case) goes down, you will still be able to provide access to the data via your ERDDAP since you have a local copy.