# Basic Usage

This notebook demonstrates how to extract rhythmic segments from interval data and how to attach aggregate metadata using the utilities provided by the `rhythmic_segments` package.

In [1]:
import numpy as np
import pandas as pd

from rhythmic_segments import RhythmicSegments

## Rhythmic Segment Analysis

A *rhythmic segment analysis (RSA)* analyzes every fixed-length *segment* of a sequence of time intervals: the short groups you obtain by sliding a window across the data. Each segment has a *duration* (the sum of its intervals) and a *pattern*. The pattern captures the relative durations of a segment's intervals, either as a normalized vector or as a ratio. For example, the segment $(2, 4, 4)$ has duration $10$ and pattern $(0.2, 0.4, 0.4)$, or equivalently $1 : 2 : 2$. Thinking of patterns as normalized vectors shows that all patterns of length $n$ live on a *simplex*: a line segment when $n = 2$, a triangle when $n = 3$, and so on. The goal is to study rhythmic material by analysing how its segments are distributed on that simplex.
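The definitions above translate directly into array operations. The following is a minimal NumPy sketch of the idea, not the package's implementation; it assumes NumPy 1.20 or later for `sliding_window_view`, and the variable names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

intervals = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
length = 3

# Each window of `length` consecutive intervals is one segment (one row)
segments = sliding_window_view(intervals, length)

# The duration of a segment is the sum of its intervals
durations = segments.sum(axis=1)

# The pattern is the segment divided by its duration
patterns = segments / durations[:, None]

# Patterns have non-negative entries summing to 1, so they lie on the simplex
assert np.all(patterns >= 0)
assert np.allclose(patterns.sum(axis=1), 1.0)

# Duration and pattern together recover the segment
assert np.allclose(durations[:, None] * patterns, segments)

print(segments.shape)       # (7, 3)
print(durations.tolist())   # [6.0, 9.0, 12.0, 15.0, 18.0, 21.0, 24.0]
```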

Computing a pattern is as simple as dividing the segment by its duration:

In [2]:
segment = np.array([2, 4, 4])
pattern = segment / segment.sum()
pattern

array([0.2, 0.4, 0.4])
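Because a pattern only records relative durations, uniformly scaling a segment leaves its pattern unchanged, and dividing the pattern by its smallest entry recovers the ratio notation. A small sketch; the helper `pattern` is hypothetical and not part of the package:

```python
import numpy as np

def pattern(segment):
    """Hypothetical helper: normalise a segment so its entries sum to 1."""
    segment = np.asarray(segment, dtype=float)
    return segment / segment.sum()

# Uniformly scaling a segment leaves its pattern unchanged
assert np.allclose(pattern([2, 4, 4]), pattern([1, 2, 2]))
assert np.allclose(pattern([2, 4, 4]), pattern([0.5, 1, 1]))

# Dividing by the smallest entry gives the ratio notation
p = pattern([2, 4, 4])
ratio = p / p.min()
print(" : ".join(f"{r:g}" for r in ratio))  # 1 : 2 : 2
```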

So you can do a rhythmic segment analysis without this package. The package does, however, provide utilities that make things easier. In particular, the `RhythmicSegments` class conveniently stores large numbers of segments together with their metadata and exposes them via `.segments`, `.patterns`, `.durations`, and `.meta`:

In [3]:
intervals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
rs = RhythmicSegments.from_intervals(intervals, length=3)
rs.segments

array([[1., 2., 3.],
       [2., 3., 4.],
       [3., 4., 5.],
       [4., 5., 6.],
       [5., 6., 7.],
       [6., 7., 8.],
       [7., 8., 9.]], dtype=float32)

In [4]:
rs.patterns

array([[0.16666667, 0.33333334, 0.5       ],
       [0.22222222, 0.33333334, 0.44444445],
       [0.25      , 0.33333334, 0.41666666],
       [0.26666668, 0.33333334, 0.4       ],
       [0.2777778 , 0.33333334, 0.3888889 ],
       [0.2857143 , 0.33333334, 0.3809524 ],
       [0.29166666, 0.33333334, 0.375     ]], dtype=float32)

Often, interval data is composed of multiple blocks (e.g. bouts or songs), and segments should not cross block boundaries. `RhythmicSegments` treats `np.nan` entries as block boundaries unless you pass `split_at_nan=False`.

In [5]:
intervals = [1, 2, 3, 4, np.nan, 5, 6, np.nan, 7, 8, 9, np.nan]
rs = RhythmicSegments.from_intervals(intervals, length=2)
rs.segments 

array([[1., 2.],
       [2., 3.],
       [3., 4.],
       [5., 6.],
       [7., 8.],
       [8., 9.]], dtype=float32)
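One way to implement this splitting, sketched here with a hypothetical helper `segments_from_blocks` (not part of the package), is to give each NaN-delimited block its own id and slide the window over each block separately. Dropping blocks shorter than the segment length is an assumption made for this sketch:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def segments_from_blocks(intervals, length):
    """Hypothetical helper: window each NaN-delimited block separately."""
    intervals = np.asarray(intervals, dtype=float)
    is_boundary = np.isnan(intervals)
    # The block id increases by one at every NaN, so each block gets its own id
    block_ids = np.cumsum(is_boundary)
    blocks = []
    for block_id in np.unique(block_ids[~is_boundary]):
        block = intervals[(block_ids == block_id) & ~is_boundary]
        # Blocks shorter than `length` contribute no segments
        if len(block) >= length:
            blocks.append(sliding_window_view(block, length))
    if not blocks:
        return np.empty((0, length))
    return np.concatenate(blocks)

intervals = [1, 2, 3, 4, np.nan, 5, 6, np.nan, 7, 8, 9, np.nan]
print(segments_from_blocks(intervals, length=2).tolist())
# [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [8.0, 9.0]]
```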

You can also construct `RhythmicSegments` directly from segments:

In [6]:
segments = [[2, 8], [.3, .6], [1, 1]]
rs = RhythmicSegments.from_segments(segments)
rs.patterns

array([[0.2       , 0.8       ],
       [0.33333334, 0.6666667 ],
       [0.5       , 0.5       ]], dtype=float32)
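For two-interval segments such as these, the simplex is a line segment: the second coordinate of a pattern is always one minus the first, so a single number per segment describes it completely. A quick NumPy check on the same segments:

```python
import numpy as np

segments = np.array([[2, 8], [0.3, 0.6], [1, 1]])
patterns = segments / segments.sum(axis=1, keepdims=True)

# The second coordinate is determined by the first
first = patterns[:, 0]
assert np.allclose(patterns[:, 1], 1 - first)

print(np.round(first, 3).tolist())  # [0.2, 0.333, 0.5]
```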

## Metadata

`RhythmicSegments` really shines when you need to handle metadata. Suppose individual intervals carry annotations (for example, performer labels) and you want to derive segment-level metadata from them. You can supply an *aggregator* that receives the per-interval metadata for each segment and returns a dictionary describing the segment. Every key in that dictionary becomes a column in the resulting segment metadata.

Below, the aggregator concatenates the labels of the intervals into a `label` field and records the first label separately in `first`:

In [15]:
def my_aggregator(interval_meta):
    return {
        "label": "-".join(interval_meta['label']), 
        "first": interval_meta.iloc[0]['label']
    }

interval_meta = pd.DataFrame(["a", "b", "c"], columns=["label"])
my_aggregator(interval_meta)

{'label': 'a-b-c', 'first': 'a'}
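An aggregator can return any number of keys, each of which becomes its own metadata column. As a further illustration, here is a hypothetical aggregator that records how many distinct labels occur in a segment; it could be passed to `from_intervals` through `meta_aggregator` just like `my_aggregator`:

```python
import pandas as pd

def performer_aggregator(interval_meta):
    """Hypothetical aggregator: summarise the labels within one segment."""
    n_labels = int(interval_meta["label"].nunique())
    return {
        # Number of distinct labels in the segment
        "n_labels": n_labels,
        # True when all intervals of the segment share one label
        "homogeneous": n_labels == 1,
    }

print(performer_aggregator(pd.DataFrame({"label": ["a", "a", "b"]})))
# {'n_labels': 2, 'homogeneous': False}
```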

If you now pass both the metadata and the aggregator to `from_intervals`, you get segment-level metadata:

In [8]:
intervals = [1, 2, 3, 4, np.nan, 5, 6, 7]
meta = {"label": ['a', 'b', 'c', 'd', np.nan, 'e', 'f', 'g']}
rs = RhythmicSegments.from_intervals(intervals, length=3, meta=meta, meta_aggregator=my_aggregator)
rs.meta

|   | label | first |
|---|-------|-------|
| 0 | a-b-c | a |
| 1 | b-c-d | b |
| 2 | e-f-g | e |


Note that block boundaries are respected: no segment crosses an `np.nan` value.
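To make the boundary handling concrete, here is a sketch of how segment-level metadata could be computed with plain NumPy and pandas. `aggregate_segment_meta` is a hypothetical helper, not the package's code; it reuses `my_aggregator` (redefined so the block runs on its own) and reproduces the segment metadata table for the same input:

```python
import numpy as np
import pandas as pd

def aggregate_segment_meta(intervals, meta, length, aggregator):
    """Hypothetical helper: apply `aggregator` to the interval metadata of
    every segment, never letting a window cross a NaN boundary."""
    intervals = np.asarray(intervals, dtype=float)
    meta = pd.DataFrame(meta)
    is_boundary = np.isnan(intervals)
    block_ids = np.cumsum(is_boundary)
    rows = []
    for block_id in np.unique(block_ids[~is_boundary]):
        positions = np.flatnonzero((block_ids == block_id) & ~is_boundary)
        # Slide a window of `length` positions across this block only
        for start in range(len(positions) - length + 1):
            window = meta.iloc[positions[start:start + length]]
            rows.append(aggregator(window))
    return pd.DataFrame(rows)

def my_aggregator(interval_meta):
    return {
        "label": "-".join(interval_meta["label"]),
        "first": interval_meta.iloc[0]["label"],
    }

intervals = [1, 2, 3, 4, np.nan, 5, 6, 7]
meta = {"label": ["a", "b", "c", "d", np.nan, "e", "f", "g"]}
print(aggregate_segment_meta(intervals, meta, 3, my_aggregator))
```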

## Operations

`RhythmicSegments` supports several convenience operations, such as selecting subsets, filtering, querying metadata, and shuffling. Each of these returns a new instance with the metadata kept in sync.

In [9]:
intervals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
meta = { "label": list('abcdefghi') }
rs = RhythmicSegments.from_intervals(intervals, length=2, meta=meta, meta_aggregator=my_aggregator)
rs.segments

array([[1., 2.],
       [2., 3.],
       [3., 4.],
       [4., 5.],
       [5., 6.],
       [6., 7.],
       [7., 8.],
       [8., 9.]], dtype=float32)

In [10]:
# You can do very basic indexing: select the first three segments:
head = rs.take([0, 1, 2])
head.segments

array([[1., 2.],
       [2., 3.],
       [3., 4.]], dtype=float32)

Metadata is preserved as well:

In [11]:
head.meta

|   | label | first |
|---|-------|-------|
| 0 | a-b | a |
| 1 | b-c | b |
| 2 | c-d | c |
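Keeping metadata in sync amounts to applying the same positional index to both the segment array and the metadata table. The sketch below shows that idea with plain NumPy and pandas; `take` is a hypothetical stand-in for the method, and resetting the metadata index is a design choice made here:

```python
import numpy as np
import pandas as pd

# Stand-ins for the segments and segment metadata of `rs` above
segments = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]], dtype=float)
meta = pd.DataFrame({"label": ["a-b", "b-c", "c-d", "d-e", "e-f", "f-g", "g-h", "h-i"]})
meta["first"] = meta["label"].str[0]

def take(segments, meta, indices):
    """Hypothetical helper: select rows by position from both containers."""
    indices = np.asarray(indices)
    return segments[indices], meta.iloc[indices].reset_index(drop=True)

head_segments, head_meta = take(segments, meta, [0, 1, 2])
print(head_segments.tolist())       # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(head_meta["label"].tolist())  # ['a-b', 'b-c', 'c-d']
```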


You can also filter segments based on a condition, for example selecting all segments whose duration exceeds 10:

In [12]:
rs.filter(rs.durations > 10).segments

array([[5., 6.],
       [6., 7.],
       [7., 8.],
       [8., 9.]], dtype=float32)
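Since `rs.durations > 10` is an ordinary NumPy boolean array, masks can be built from any per-segment quantity and combined with `&` and `|`. The sketch below shows the mask logic on a plain array; passing such combined masks to `filter` assumes it accepts any boolean array of matching length:

```python
import numpy as np

segments = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]], dtype=float)
durations = segments.sum(axis=1)  # 3, 5, 7, ..., 17
patterns = segments / durations[:, None]

# Combine conditions with & (and) or | (or); each comparison needs parentheses
mid_length = (durations > 5) & (durations < 12)
print(segments[mid_length].tolist())  # [[3.0, 4.0], [4.0, 5.0], [5.0, 6.0]]

# Masks can also come from patterns, e.g. segments close to a 1 : 1 ratio
near_even = np.abs(patterns[:, 0] - 0.5) < 0.05
print(segments[near_even].tolist())   # [[5.0, 6.0], [6.0, 7.0], [7.0, 8.0], [8.0, 9.0]]
```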

Or shuffle the segments, using `random_state` for a reproducible order:

In [13]:
rs.shuffle(random_state=1).segments

array([[6., 7.],
       [1., 2.],
       [2., 3.],
       [5., 6.],
       [3., 4.],
       [7., 8.],
       [4., 5.],
       [8., 9.]], dtype=float32)
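Shuffling without breaking the link to metadata means drawing one permutation and applying it to both containers. A sketch with NumPy's `default_rng`; this is not necessarily how the package shuffles, and it is not expected to reproduce the order shown above:

```python
import numpy as np
import pandas as pd

segments = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]], dtype=float)
meta = pd.DataFrame({"first": list("abcdefgh")})

# Draw a single permutation and apply it to both containers
rng = np.random.default_rng(1)
order = rng.permutation(len(segments))
shuffled_segments = segments[order]
shuffled_meta = meta.iloc[order].reset_index(drop=True)

# Segment [k, k + 1] started with the k-th label, and still does after shuffling
letters = "abcdefgh"
expected = [letters[int(start) - 1] for start in shuffled_segments[:, 0]]
assert shuffled_meta["first"].tolist() == expected
```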

Finally, you can query metadata using pandas query syntax. For example, selecting all segments whose label contains the letter 'b':

In [14]:
rs.query("label.str.contains('b')").meta

|   | label | first |
|---|-------|-------|
| 0 | a-b | a |
| 1 | b-c | b |
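The query string is evaluated against the metadata table, so the same selection can be reproduced with `DataFrame.query` directly and then mapped back to the segment array. A sketch with plain pandas; `engine="python"` is used here because it supports method calls such as `.str.contains` inside the expression:

```python
import numpy as np
import pandas as pd

segments = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7], [7, 8], [8, 9]], dtype=float)
meta = pd.DataFrame({
    "label": ["a-b", "b-c", "c-d", "d-e", "e-f", "f-g", "g-h", "h-i"],
    "first": list("abcdefgh"),
})

# With the default RangeIndex, index labels coincide with row positions
hits = meta.query("label.str.contains('b')", engine="python")
print(hits.index.tolist())                       # [0, 1]
print(segments[hits.index.to_numpy()].tolist())  # [[1.0, 2.0], [2.0, 3.0]]

# Plain comparisons and membership tests work too
print(meta.query("first in ['c', 'e']")["label"].tolist())  # ['c-d', 'e-f']
```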
