# 2. **Data Acquisition**
In this tutorial, we'll explain how to read and download data.

In [None]:
using SeisIO

## **Loading Files**
`read_data` reads one or more entire files into memory:

In [None]:
?read_data

\
If you know the file format you're trying to read, pass\
it as the first argument to `read_data` in lowercase:

In [None]:
S = read_data("mseed", "DATA/2018.224.00.00.00.000.TA.C26K..BHZ.R.mseed")

In [None]:
S = read_data("sac", "DATA/2018.224.00.00.00.000.TA.C26K..BHZ.R.SAC")

\
If you don't know a file's format, `read_data` calls a (somewhat slower)\
function called `guess` that can usually identify it:

In [None]:
S2 = read_data("DATA/2018.224.00.00.00.000.TA.C26K..BHZ.R.SAC")

In [None]:
S == S2

`read_data` accepts file string wildcards.

In [None]:
path = dirname(pathof(SeisIO))*"/../test/SampleFiles/SUDS/"
S = read_data("sac", path * "*.sac")

### File and Format Information (Optional Section)
Information on files and formats can be found in a number of places,\
including the command-line interface.

In [None]:
guess("DATA/2018.224.00.00.00.000.TA.C26K..BHZ.R.SAC")

In [None]:
path = dirname(pathof(SeisIO))*"/../test/SampleFiles/SUDS/"
fname = path * "10081701.WVP"
g = guess(fname)

In [None]:
SeisIO.formats[g[1]] # what are we looking at...?

In [None]:
S = read_data(g[1], fname, swap = g[2])

In [None]:
# since volcano colleagues keep asking
using SeisIO.SUDS
suds_support()

In [None]:
# while I'm at it
using SeisIO.SEED

In [None]:
?seed_support

In [None]:
?mseed_support

...and knowing is half the battle.

## **Requesting Data Online**
`get_data` is the wrapper to online time-series data requests.\
You can use it with FDSN dataselect and IRIS timeseries functions.

In [None]:
?get_data

\
**Let's try an example**.\
First, we'll build a start time: the current local time minus one day, with seconds and milliseconds zeroed.

In [None]:
using Dates
ds = Dates.now(); ds -= (Day(1) + Millisecond(ds) + Second(ds))
s = string(ds)
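
The arithmetic in the cell above can be checked offline. The sketch below is only an illustration: it swaps `Dates.now()` for a fixed, made-up timestamp so the result is reproducible, and the names `now_local`, `ds_alt`, and `t_end` are introduced here for the example.

```julia
using Dates

# A fixed timestamp stands in for Dates.now() (made-up value, for reproducibility).
now_local = DateTime(2024, 3, 15, 14, 27, 43, 512)

# Same arithmetic as the cell above: go back one day, drop seconds and milliseconds.
ds = now_local - (Day(1) + Millisecond(now_local) + Second(now_local))

# floor(..., Minute) is an equivalent way to truncate to the minute.
ds_alt = floor(now_local - Day(1), Minute)

s = string(ds)             # ISO 8601 string
t_end = ds + Second(600)   # where a 600-second window starting at ds ends

println(s)
println(t_end)
```

Both approaches give the same `DateTime`; `floor` may read more clearly if you adapt this for other window lengths.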

Now, let's use that to request some data. From the help text,\
the keywords `s=` and `t=` accept Strings, DateTime objects,\
and numbers. So let's start at `s`, as defined above, and end\
at `t=600`, or 10 minutes later.

In [None]:
S = get_data("FDSN", "UW.MBW..EHZ, UW.SHW..EHZ, UW.HSR..EHZ, UW.TDH..EHZ, CC.PALM..EH?", src="IRIS", s=s, t=600) 

#### **What each positional argument does**
* `"FDSN"` tells `get_data` to use the FDSN dataselect service for our request.
* The second string is the request itself: a comma-separated list of channel IDs.

#### **What each keyword does**
* `src="IRIS"` tells `get_data` to check the IRIS FDSN dataselect server. Note that this is not the same as setting the first positional argument to "IRIS" (rather than "FDSN"): IRIS runs both FDSN dataselect and its own timeseries service.
* `s=s` sets the start time to `s`, the string created in the cell above.
* `t=600` sets the termination (end) time to 600 seconds after `s`.
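
To make "FDSN dataselect" a little more concrete, here is a rough sketch of what a single-channel query URL looks like under the FDSN web service conventions. This is not how SeisIO builds its requests internally; `dataselect_url` is a hypothetical helper written for this example, the start time is made up, and the base URL is assumed to be the IRIS dataselect endpoint.

```julia
using Dates

# Hypothetical helper (not part of SeisIO): build an FDSN dataselect query URL
# for one channel ID, a start time, and a duration in seconds.
function dataselect_url(id::AbstractString, start::DateTime, dur_s::Integer;
                        base::AbstractString = "http://service.iris.edu/fdsnws/dataselect/1/query")
    net, sta, loc, cha = split(id, ".")
    loc = isempty(loc) ? "--" : loc          # FDSN writes an empty location code as "--"
    fmt = dateformat"yyyy-mm-dd\THH:MM:SS"
    t0 = Dates.format(start, fmt)
    t1 = Dates.format(start + Second(dur_s), fmt)
    return string(base, "?net=", net, "&sta=", sta, "&loc=", loc, "&cha=", cha,
                  "&starttime=", t0, "&endtime=", t1)
end

url = dataselect_url("UW.MBW..EHZ", DateTime(2024, 3, 14, 14, 27), 600)
println(url)
```

The point is only that `s` and `t` end up as a start time and an end time in the request, and that each field of the channel ID maps to its own query parameter.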

\
...which channels were there today?

In [None]:
S.id

\
Any sign of TDH? (It's a pleasant hike in summer, but winter outages happen...)

In [None]:
findid("UW.TDH..EHZ", S)

\
Where can we look for data? What servers are available?

In [None]:
?seis_www

\
I bet that Caltech is happy to handle a random download request.

In [None]:
S2 = get_data("FDSN", "CI.SDD..BHZ", src="SCEDC", s=s, t=600, fmt="mseed", msr=true, w=true, demean=true, rr=true)

#### **What the new keywords do:**
* `src="SCEDC"` tells `get_data` to use the SCEDC FDSN servers.
* `fmt="mseed"` specifies the data format for the download. (Note: mseed is actually the default, but including this keyword is useful for tutorial purposes.)
* `w=true` writes the download **directly** to disk, byte for byte, before any parsing happens. The file extension is always ".`fmt`". The entire request is saved even if a parsing error happens -- which is rare, but possible with SEED. (Some Blockettes and data decoders are so rare that we've literally never seen them.)
* `demean=true` removes the mean of each channel after downloading.
* `rr=true` removes the instrument response, flattening to DC.
* `msr=true` uses the multi-stage instrument response. Most users don't need that much detail, so `msr` defaults to `false`.
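
As a rough picture of what removing the mean does, the sketch below applies the operation to plain Julia vectors. It is not SeisIO's implementation: the nested vector `x` only mimics one sample vector per channel, and the sample values are made up.

```julia
using Statistics

# Stand-in for per-channel data: one vector of samples per channel (made-up values).
x = [Float32[10, 12, 14, 16], Float32[-3, 0, 3]]

# Subtract each channel's own mean, in place.
for xi in x
    xi .-= mean(xi)
end

println(x)
```

After this, each channel is centered on zero, which is usually what you want before filtering or removing an instrument response.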


In [None]:
S.resp[1]

In [None]:
S2.resp[1]

## **Saving Data**
Remember, from above: **data requests can be written directly to disk with keyword w=true**.\
In addition, SeisData and SeisChannel structures can be written to SAC or to SeisIO's native format.\
SAC has the advantage that it's almost universally readable; SeisIO format saves more information.\
\
To write to SAC:

In [None]:
writesac(S)                         # filenames are auto-generated. no need to specify.
writesacpz(S, "req_1.pz")           # in case you need instrument responses later.

To write to SeisIO format:

In [None]:
wseis("req_1.seis", S)

SeisIO format can hold multiple structures in one file.\
So, to read from a SeisIO file, you'll need to specify\
one or more object numbers:

In [None]:
S2 = rseis("req_1.seis")[1]

In [None]:
S == S2

## **Data Request Syntax is Always the Same**
\
NN.SSSSS.LL.CCC (net.sta.loc.cha, separated by periods) is the expected syntax \
for all web functions. The number of letters in each field of this template is \
that field's maximum width in characters (e.g. 2 for network). Fields can’t contain whitespace. \
\
Data requests in SeisIO all use this syntax, even though IRIS timeseries, \
FDSN dataselect, and SeedLink format strings differently. Request strings are \
converted to the appropriate syntax for the request protocol.

In [None]:
# these are identical requests
channels = "UW.KMO.., IU.COR.00.BHZ, CC.LON..BH?"                          # single String
channels = ["UW.KMO..", "IU.COR.00.BHZ", "CC.LON..BH?"]                    # Vector{String}
channels = ["UW" "KMO" "" ""; "IU" "COR" "00" "BHZ"; "CC" "LON" "" "BH?"]  # Matrix{String}

In [None]:
?chanspec

See also: https://seisio.readthedocs.io/en/latest/src/Appendices/web_syntax.html
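
To see why the three forms above describe the same request, here is a small offline sketch that converts the single-String form into the Matrix form. `request_matrix`, `fits`, and `MAX_WIDTH` are hypothetical names written for this example and are not SeisIO functions; the three-character channel limit is an assumption based on SEED channel codes such as `BHZ`.

```julia
# Hypothetical helper (not a SeisIO function): turn a comma-separated request
# string into a Matrix{String} with one row per channel and four columns.
function request_matrix(str::AbstractString)
    rows = [String.(split(strip(id), ".")) for id in split(str, ",")]
    all(r -> length(r) == 4, rows) || error("each ID needs four period-separated fields")
    return permutedims(reduce(hcat, rows))
end

# Assumed maximum widths: network 2, station 5, location 2, channel 3.
const MAX_WIDTH = (2, 5, 2, 3)
fits(M::Matrix{String}) = all(length(M[i, j]) <= MAX_WIDTH[j] for i in axes(M, 1), j in axes(M, 2))

as_string = "UW.KMO.., IU.COR.00.BHZ, CC.LON..BH?"
as_matrix = ["UW" "KMO" "" ""; "IU" "COR" "00" "BHZ"; "CC" "LON" "" "BH?"]

M = request_matrix(as_string)
println(M == as_matrix)   # do the two forms agree?
println(fits(M))          # does every field respect its maximum width?
```

Empty fields (such as the blank location and channel in `UW.KMO..`) survive the split as empty strings, which is why the trailing periods matter.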

## **Other Useful Data Acquisition Functions**
* FDSNsta: request only station information
* SeedLink: stream to a SeisData structure in the background
* See also: SeisIO.Quake submodule

### Streaming Data with SeedLink (Optional Section)
SeedLink streams data to objects in the background.\
Here's a quick example:

In [None]:
channels = ["UW.KMO..", "IU.COR.00.BH?", "CC.LON..BH?", "CC.VALT..???", "UW.ELK..EHZ"]
S3 = SeisData()
SeedLink!(S3, channels)

A `SeisData` object like `S3` has a field `:c` that tracks connections.\
When finished with a SeedLink session, close the corresponding connection\
in `S3.c`:

In [None]:
sleep(30)          # sleep 30 seconds; SeedLink doesn't engage immediately
close(S3.c[1])     # close the SeedLink connection

This ends the SeedLink session and processes all buffered data to the \
parent (`SeisData`) structure. You can see the details of the streaming \
process by engaging SeedLink with higher verbosity, but beware: `v=3` is \
for developer debugging and spams stdout, defeating the purpose of running \
SeedLink in the background.

In [None]:
S3

*Caution*: SeedLink requests appear to start with the first packet *after* \
the requested start time. If data need to start precisely at some time \
`t0`, start around a minute *earlier*, then sync to `t0` with the `sync!`\
command (to be covered in 3-Processing).
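
The bookkeeping behind "start early, then trim" can be sketched offline. This is not `sync!` and does not use SeisIO; every value below (the target time `t0`, the 2.5-second packet delay, the 100 Hz sampling rate, the sample values) is made up for illustration.

```julia
using Dates

t0 = DateTime(2024, 3, 14, 14, 27)        # time the data should start at (made up)
t_req = t0 - Minute(1)                    # request a minute earlier, as suggested above
t_first = t_req + Millisecond(2500)       # assume the first packet starts 2.5 s after t_req

fs = 100.0                                # assumed sampling rate in Hz
x = collect(1.0:7000.0)                   # 70 s of fake samples, starting at t_first

# Number of samples between the first received sample and t0.
n_skip = round(Int, Dates.value(t0 - t_first) / 1000 * fs)

# Keep only the samples from t0 onward.
x_sync = x[n_skip+1:end]

println(n_skip)
println(length(x_sync))
```

In practice `sync!` handles this for you; the sketch only shows why the extra minute of padding guarantees there is something to trim back to `t0`.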

## Cleanup (Optional Section)
Let's remove these extraneous downloads. The creator of SeisIO used to \
receive regular automated warnings from his grad school SysAdmin \
for being the \#1 "disk hog" and still feels bad about it. Sorry, Ed!

In [None]:
files = ls("*.SAC")

In [None]:
for f in files
    rm(f)
end
rm("req_1.pz")

In [None]:
files = ls("*.mseed")

In [None]:
for f in files
    rm(f)
end
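
If you prefer a single pass over several extensions, the same cleanup can be sketched with functions from Julia's Base library only. `remove_by_ext` is a hypothetical helper written for this example, not a SeisIO function, and the demo runs in a temporary directory with made-up file names so it cannot touch real downloads.

```julia
# Hypothetical helper (not part of SeisIO): delete every file in `dir` whose
# name ends with one of the given extensions; return the names removed.
function remove_by_ext(dir::AbstractString, exts::Vector{String})
    removed = String[]
    for f in readdir(dir)
        if any(ext -> endswith(f, ext), exts)
            rm(joinpath(dir, f))
            push!(removed, f)
        end
    end
    return removed
end

# Demo in a scratch directory with empty placeholder files.
dir = mktempdir()
for name in ["a.SAC", "b.SAC", "c.mseed", "keep.txt"]
    touch(joinpath(dir, name))
end

removed = remove_by_ext(dir, [".SAC", ".mseed"])
println(removed)
println(readdir(dir))
```

Note that `endswith` is case-sensitive, so `.SAC` and `.sac` would need to be listed separately.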

## **For More Help**
Please consult the official SeisIO documentation:

### **Reading files with `read_data`**
https://seisio.readthedocs.io/en/latest/src/Formats/fileformats.html
 
### **Web requests with `get_data`**
https://seisio.readthedocs.io/en/latest/src/Web/webclients.html

### **Streaming with `SeedLink`**
https://seisio.readthedocs.io/en/latest/src/Web/seedlink.html

## **Additional Examples**
https://seisio.readthedocs.io/en/latest/src/Appendices/examples.html