# Spool Basics

December 2, 2024

This notebook introduces the basics of DASCore's [`Spool`](https://dascore.org/api/dascore/core/spool/BaseSpool.html). It is a shortened version of [DASCore's Spool tutorial](https://dascore.org/tutorial/spool.html).

<a target="_blank" href="https://colab.research.google.com/github/DASDAE/seg_tutorial/blob/master/03_spool.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

#### Useful links: 
* [Colab Link](https://colab.research.google.com/github/DASDAE/seg_tutorial/blob/master/03_spool.ipynb)
* [DASCore Documentation](https://dascore.org)


In [None]:
%%capture

# First ensure DASCore is installed. If not, install and restart the kernel.
try:
    import dascore as dc
except ImportError:
    !pip install dascore
    !pip install ipympl
    # Restart the kernel so the newly installed packages can be imported.
    import IPython
    IPython.Application.instance().kernel.do_shutdown(True)  # True means restart after shutdown

from rich import print


# Spool
The `Spool` class manages a collection of patches. A `Spool` can be created in several ways, including:
- from in-memory patches
- from a single file
- from a directory of DAS files

In [None]:
# Create an example spool, then write its patches to a directory of files.
in_memory_spool = dc.get_example_spool("diverse_das")

# save patches to disk
das_folder_path = dc.examples.spool_to_directory(in_memory_spool)
das_file_path = next(das_folder_path.glob("*.hdf5"))

In [None]:
# From a patch or list of patches
patch = dc.get_example_patch()
spool = dc.spool([patch])

In [None]:
# From a single file
spool = dc.spool(das_file_path)

In [None]:
# From a directory of files
# update() indexes the directory contents for fast querying/access
spool = dc.spool(das_folder_path).update()

In [None]:
print(spool)

In [None]:
# Display the contents of a spool as a dataframe
contents_df = spool.get_contents()
contents_df.head()

### **Exercise** (Spool 1)

Using the `diverse_das` example spool, determine how many unique stations it contains. Then print the duration of each patch in the spool.

In [None]:
diverse_spool = dc.get_example_spool("diverse_das")

### Accessing Patches

Patches are retrieved from a spool by indexing or by iteration.

In [None]:
first_patch = spool[0]
last_patch = spool[-1]

In [None]:
for patch in spool:
    # Each iteration yields one Patch; replace ... with your own processing.
    ...

In [None]:
# Spools can also be sliced (sub-indexed), which returns a new spool
sub = spool[1:-1]

### **Exercise** (Spool 2)

Sort the diverse spool by time (using [`Spool.sort`](https://dascore.org/api/dascore/core/spool/DataFrameSpool/sort.html)), then create a sub-spool containing the last 4 patches. Print the attrs of each patch in this sub-spool.

### Selecting

`Spool` contents can be selected (filtered) with `Spool.select`.

In [None]:
# Return a spool with patches that contain data before 1990 (... leaves the start open)
sub_spool = spool.select(time=(..., '1990-01-01'))
print(sub_spool)

In [None]:
# Return a spool with patches whose station attribute is "wayout"
sub_spool = spool.select(station="wayout")
print(sub_spool)

In [None]:
# Return a spool with patches whose tag matches a Unix-style wildcard pattern
sub_spool = spool.select(tag="*dom")
print(sub_spool)

### **Exercise** (Spool 3)

Create a sub-spool by selecting all patches with a station code that ends with an 's'. 

### Chunking
`Spool.chunk` merges contiguous or overlapping patches, or splits the data into patches of a new size.

In [None]:
# Chunk the spool into 3 second windows with 1 second overlaps,
# keeping any segments at the end that don't span the full 3 seconds.
subspool = spool.chunk(time=3, overlap=1, keep_partial=True)

# Merge all contiguous segments along time dimension.
merged_spool = spool.chunk(time=None)

The `tolerance` parameter sets how far apart patches can be and still be treated as contiguous; increasing it can help when slight gaps in the data prevent patches from merging.

### **Exercise** (Spool 4)

Chunk the diverse spool to combine all compatible patches along the time dimension. Determine how many patches remain in the spool. Vary the tolerance parameter over reasonable values. Does this change the result? 