# 3. **Data Requests**
In this tutorial, we'll explain how to download data.

In [None]:
using Dates, SeisIO
include("safe_rm.jl")

## A. **Requesting data**
`get_data` is the wrapper for online time-series data requests. <br>
You can use it with FDSN dataselect and IRIS timeseries functions.

In [None]:
?get_data

\
**Let's try an example**.\
First, we'll take the current local time, step back one day, and truncate to the whole minute.

In [None]:
ds = Dates.now(); ds -= (Day(1) + Millisecond(ds) + Second(ds))
s = string(ds)
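
The one-liner above subtracts one day plus the seconds and milliseconds components of `ds`, which leaves yesterday's time truncated to the whole minute. Here is a minimal sketch of that arithmetic using only the `Dates` standard library. The fixed timestamp and the `_demo` names are our assumptions, chosen so the result is reproducible; the tutorial itself uses `Dates.now()`.

```julia
using Dates

# Fixed example timestamp (assumed value); the tutorial uses Dates.now() here.
t_now = DateTime(2024, 3, 5, 14, 27, 43, 512)

# Same arithmetic as the cell above: back one day, drop seconds and milliseconds.
ds_demo = t_now - (Day(1) + Millisecond(t_now) + Second(t_now))

# Equivalent formulation with floor(), which may be easier to read.
ds_floor = floor(t_now - Day(1), Minute)

# ISO 8601 string, the form that is passed to `s=`.
s_demo = string(ds_demo)
println(s_demo)
```

Note that `Dates.now()` returns local time, while FDSN web services interpret request times as UTC. If the exact window matters, `Dates.now(UTC)` is probably the safer starting point.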

Now, let's use that to request some data. From the help text,\
the keywords `s=` and `t=` accept Strings, DateTime objects,\
and numbers. So let's start at `s`, as defined above, and end\
at `t=600`, or 10 minutes later.

In [None]:
cha_str = "UW.MBW..EHZ, UW.SHW..EHZ, UW.HSR..EHZ, UW.TDH..EHZ, CC.PALM..EH?" 
S = get_data("FDSN", cha_str, src="IRIS", s=s, t=600) 
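
To make the time keywords concrete, here is a small stdlib-only sketch of the case used in this request: a start time given as a String and a numeric `t` read as seconds after that start. The example timestamp and variable names are ours. The sketch only mirrors the arithmetic and says nothing about how SeisIO parses times internally.

```julia
using Dates

s_demo = "2024-03-04T14:27:00"   # example start time (assumed value)
t_demo = 600                      # seconds after the start

t_start = DateTime(s_demo)        # parse the ISO 8601 string
t_end   = t_start + Second(t_demo)

println(t_start, " to ", t_end)
```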

#### **What each positional argument does**
* `"FDSN"` tells `get_data` to use the FDSN dataselect service for our request
* The long string gives the data channels requested as a comma-separated list

#### **What each keyword does**
* `s=s` sets the start time to `s`, the string created in the cell above.
* `t=600` sets the termination (end) time to 600 seconds after `s`.
* `src="IRIS"` tells `get_data` to check the IRIS FDSN dataselect server.

Note: the last keyword isn't equivalent to setting the first positional\
argument to "IRIS", because IRIS runs FDSN dataselect and its own timeseries\
request service (plus many others).\
\
...which channels were there today?

In [None]:
S.id   # channel IDs returned by the request

\
Any sign of TDH? (It's a pleasant hike in summer, but winter outages happen...)

In [None]:
findid("UW.TDH..EHZ", S)

\
Where can we look for data? What servers are available?

In [None]:
?seis_www

\
I bet that Caltech is happy to handle a random download request!

In [None]:
S2 = get_data("FDSN", "CI.SDD..BHZ", src="SCEDC", s=s, t=600, fmt="mseed", msr=true, w=true, demean=true, rr=true)

#### **What the new keywords do:**
* `src="SCEDC"` tells `get_data` to use the SCEDC FDSN servers.
* `fmt="mseed"` specifies the data format for the download. (Note: mseed is actually the default, but including this keyword is useful for tutorial purposes.)
* `w=true` writes the download **directly** to disk, byte for byte, before any parsing happens. The file extension is always ".`fmt`". The entire request is saved even if a parsing error happens -- which is rare, but possible with SEED. (Some Blockettes and data decoders are so rare that we've literally never seen them.)
* `demean=true` removes the mean of each channel after downloading.
* `rr=true` removes the instrument response, flattening to DC.
* `msr=true` uses the multi-stage instrument response. Most users don't need that much detail, so `msr` defaults to `false`.
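
As a rough illustration of what `demean=true` does to a channel, the sketch below removes the mean from a plain vector using the `Statistics` standard library. It is not SeisIO's implementation, and the sample values are made up.

```julia
using Statistics

# Made-up samples with a constant offset of 10.
x = Float32[10.5, 9.5, 11.0, 9.0]

# Subtract the mean so the series is centered on zero.
x_demeaned = x .- mean(x)

println(x_demeaned)
```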


In [None]:
# Example of single-stage response
S.resp[1]

In [None]:
# Example of multi-stage response
S2.resp[1]

### **Check logs early and often**
All file I/O and data processing operations are logged to the `:notes` \
fields of SeisIO data structures. For example:

In [None]:
S2.notes[1]

Let's make sense of these logs using SeisIO built-ins:

In [None]:
?show_src

In [None]:
show_src(S2)

In [None]:
?show_writes

In [None]:
show_writes(S2)

## B. **Saving requests**
Remember, from above: **data requests can be written directly to disk with\
keyword `w=true`**. This writes raw output to file, even if data parsing\
somehow fails.\
\
In addition, SeisData and SeisChannel structures can be written to ASDF, SAC,\
or to SeisIO's native format, as we saw in the last tutorial.

In [None]:
wseis("req_1.seis", S)

### 1. **Data request syntax is always the same**
\
NN.SSSSS.LL.CCC (net.sta.loc.cha, separated by periods) is the expected syntax \
for all web functions. The maximum field width in characters corresponds to the \
length of each field (e.g. 2 for network, 3 for channel). Fields can’t contain whitespace. \
\
Data requests in SeisIO all use this syntax, even though IRIS timeseries, \
FDSN dataselect, and SeedLink format strings differently. Request strings are \
converted to the appropriate syntax for the request protocol.

In [None]:
# these are identical requests
channels = "UW.KMO.., IU.COR.00.BHZ, CC.LON..BH?"                          # single String
channels = ["UW.KMO..", "IU.COR.00.BHZ", "CC.LON..BH?"]                    # Vector{String}
channels = ["UW" "KMO" "" ""; "IU" "COR" "00" "BHZ"; "CC" "LON" "" "BH?"]  # Matrix{String}
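
The three forms above carry the same information. The stdlib-only sketch below shows one way to convert between them and to check the rules stated earlier (four period-separated fields, no whitespace). The helper names are hypothetical and are not SeisIO functions. The maximum widths of 2, 5, 2, and 3 characters are our assumption, based on the SEED naming convention.

```julia
# Hypothetical helpers for illustration; these are not SeisIO functions.

# Split a comma-separated request string into one ID per element.
split_request(s::AbstractString) = String.(strip.(split(s, ',')))

# Break one ID into [net, sta, loc, cha], checking the rules stated above.
function id_fields(id::AbstractString)
    any(isspace, id) && throw(ArgumentError("ID contains whitespace: \"$id\""))
    parts = String.(split(id, '.'))      # empty fields are kept
    length(parts) == 4 || throw(ArgumentError("expected net.sta.loc.cha: \"$id\""))
    # Assumed maximum widths (SEED convention): net 2, sta 5, loc 2, cha 3.
    all(length.(parts) .<= (2, 5, 2, 3)) || throw(ArgumentError("field too long: \"$id\""))
    return parts
end

# Stack the fields of each ID into one row of a Matrix{String}.
to_matrix(ids) = permutedims(reduce(hcat, id_fields.(ids)))

single    = "UW.KMO.., IU.COR.00.BHZ, CC.LON..BH?"
as_vector = split_request(single)
as_matrix = to_matrix(as_vector)

println(as_vector)
println(as_matrix)
```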

In [None]:
?chanspec # where to find help on channel specification

See also: https://seisio.readthedocs.io/en/latest/src/Appendices/web_syntax.html

### 2. **Multiple data requests to one structure**
\
Web requests to a structure **S** always create a new channel in **S** for \
each channel in each request, even if a channel with the same ID exists. \
This is necessary to prevent TCP congestion. \
\
This is different from multiple file reads to one structure; file reads \
always attempt to append channels with matching IDs. \
\
You can "flatten" structures with redundant channels by calling `merge!`. \
To see how this works, let's append a new data request to our first one:

In [None]:
get_data!(S, "FDSN", cha_str, src="IRIS", s=ds+Minute(10), t=600) 
S.id

With two sets of requests to the same channels, each \
channel ID should appear twice. Let's clean this up.

In [None]:
merge!(S)
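
One quick way to confirm that redundant channels are gone is to compare the ID list against its unique entries. The stdlib-only sketch below uses plain vectors as stand-ins for `S.id` before and after merging. The "after" vector is only what we would expect the ID list to look like, not output from `merge!`, and `id_counts` is a hypothetical helper.

```julia
# Stand-ins for S.id; the IDs are taken from the request string used above.
ids_before = ["UW.MBW..EHZ", "UW.SHW..EHZ", "UW.MBW..EHZ", "UW.SHW..EHZ"]
ids_after  = unique(ids_before)   # expected shape of the ID list after merging

# Hypothetical helper: count how many times each ID occurs.
function id_counts(ids)
    counts = Dict{String,Int}()
    for id in ids
        counts[id] = get(counts, id, 0) + 1
    end
    return counts
end

println(id_counts(ids_before))
println(allunique(ids_before), " ", allunique(ids_after))
```

With a real structure, the same check would be applied to `S.id` directly.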

Check the results:

In [None]:
S.t[1]

## C. **Other acquisition methods**
See the docstrings for these functions for details:

* `FDSNsta`: request only station information
* `seedlink`: stream to a SeisData structure in the background

See also: SeisIO.Quake submodule

### Streaming Data with SeedLink (Optional Section)
SeedLink streams data to objects in the background.
Here's a quick example:

In [None]:
channels = ["UW.KMO..", "IU.COR.00.BH?", "CC.LON..BH?", "CC.VALT..???", "UW.ELK..EHZ"]
S3 = SeisData()
seedlink!(S3, "DATA", channels)

A `SeisData` object like `S3` has a field `:c` that tracks connections.\
When finished with a SeedLink session, close the corresponding connection\
in `S3.c`:

In [None]:
sleep(30)          # sleep 30 seconds; SeedLink doesn't engage immediately
close(S3.c[1])     # close the SeedLink connection

This ends the SeedLink session and processes all buffered data to the \
parent (`SeisData`) structure. You can see the details of the streaming \
process by engaging SeedLink with higher verbosity, but beware: `v=3` is \
for developer debugging and spams stdout, defeating the purpose of running \
SeedLink in the background.

In [None]:
S3

*Caution*: SeedLink requests appear to start with the first packet *after* \
the requested start time. If data need to start precisely at some time \
`t0`, start around a minute *earlier*, then sync to `t0` with the `sync!`\
command (to be covered in the Processing tutorial).

## D. **Cleanup**
Let's remove these extraneous downloads. The creator of SeisIO used to \
receive regular automated warnings from his grad school SysAdmin \
for being the \#1 "disk hog" and still feels bad about it. Sorry, Ed!

In [None]:
files = ls("*.SAC")
for f in files
    safe_rm(f)
end
safe_rm("req_1.seis")   # the file written by wseis in section B

In [None]:
files = ls("*.mseed")
for f in files
    safe_rm(f)
end
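
The contents of `safe_rm.jl` are not shown in this tutorial. For readers who want a similar cleanup step without it, here is a stdlib-only sketch. The function `remove_by_extension` is hypothetical, and the demonstration runs in a temporary directory so that no real files are touched.

```julia
# Hypothetical helper; the tutorial's own safe_rm.jl is not shown here.
# Removes regular files in `dir` whose names end with `ext` (case-insensitive)
# and returns the names that were removed.
function remove_by_extension(dir::AbstractString, ext::AbstractString)
    removed = String[]
    for name in readdir(dir)
        path = joinpath(dir, name)
        if isfile(path) && endswith(lowercase(name), lowercase(ext))
            rm(path; force=true)     # force=true: no error if already gone
            push!(removed, name)
        end
    end
    return removed
end

# Demonstrate in a temporary directory, which is deleted afterwards.
demo_removed, demo_left = mktempdir() do dir
    for name in ("a.mseed", "b.SAC", "keep.txt")
        touch(joinpath(dir, name))
    end
    gone = remove_by_extension(dir, ".mseed")
    (gone, readdir(dir))
end

println(demo_removed, " ", demo_left)
```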

## **Saving Your Work**
All data from this tutorial can be written to file using commands from the File IO tutorial. To review:

`wseis("fname.seis", S)` writes `S` to low-level SeisIO native format file `fname.seis`.

`writesac(S)` writes `S` to SAC files with auto-generated names.

`write_hdf5("fname.h5", S)` writes `S` to ASDF (HDF5) file `fname.h5`.

## E. **Further help**
Please consult the official SeisIO documentation:

### **Web requests with `get_data`**
https://seisio.readthedocs.io/en/latest/src/Web/webclients.html

### **Streaming with `seedlink`**
https://seisio.readthedocs.io/en/latest/src/Web/seedlink.html

## F. **Additional examples**
The examples below are also found in https://seisio.readthedocs.io/en/latest/src/Appendices/examples.html

### **FDSN get_data**
Request the last 600 seconds of data from the IRIS FDSNWS server \
for stations CC.PALM, UW.HOOD, CC.TIMB, CC.HIYU, and UW.TDH.

In [None]:
S_fdsn = get_data("FDSN", "CC.PALM, UW.HOOD, CC.TIMB, CC.HIYU, UW.TDH", src="IRIS", s=-600, t=0)

In [None]:
S_fdsn.id

### **IRIS get_data**
Request 30 minutes of recent data from the IRISWS timeseries server \
for channels CC.TIMB..BHE, CC.TIMB..BHN, CC.TIMB..BHZ, \
UW.HOOD..HHE, UW.HOOD..HHN, UW.HOOD..HHZ.

In [None]:
S_iris = get_data("IRIS", ["CC.TIMB..BHE", "CC.TIMB..BHN", "CC.TIMB..BHZ", "UW.HOOD..HHE", "UW.HOOD..HHN", "UW.HOOD..HHZ"], s=-3600, t=-1800)

### **FDSNevt**
Request waveform data for the Tohoku-Oki great earthquake, \
recorded by some borehole strain meters and seismometers \
in WA (USA), from IRIS (USA). \
\
**This function is part of the Quake submodule.**

In [None]:
using SeisIO.Quake
S_evt = FDSNevt("201103110547", "PB.B004..EH?,PB.B004..BS?,PB.B001..BS?,PB.B001..EH?")

### **SeedLink**
A short SeisComp3 SeedLink session using the IRIS server.

The `seedlink!` command below only executes if the test data \
are installed.

In [None]:
sl_conf = realpath(dirname(pathof(SeisIO))*"/../test/SampleFiles/SL_long_test.conf")
S_sl = seedlink("TIME", "UW.GRUT,UW.H1K,UW.MDW", s=-120, t=120)
if isfile(sl_conf)
    seedlink!(S_sl, "DATA", sl_conf)
end
sleep(30)
for conn in S_sl.c; close(conn); end

In [None]:
S_sl