Add more flexible sampler types through `Range` #2758

lostella · 2023-03-24T22:36:22Z

Description of changes: This proposes a new kind of sampler that selects (not necessarily random) instances to construct training or validation batches. The main addition compared to the existing samplers is that sampling ranges can be configured with timestamps (`pd.Period`) in addition to integer indices. This is achieved through the `Range` class.

This should allow composing batches as required in this discussion. In addition, I believe it improves the code compared to the existing samplers.

In summary, `Range` represents a "partially specified" range object, which can only be turned into a concrete range once we know what sequence we intend to range over (where in time it starts, and how long it is). Once the sequence is known, a regular Python `range` object is constructed, and samplers take (not necessarily random) elements from it according to their own strategy.
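To make the resolution step concrete, here is a minimal sketch of how such a partially specified range could be turned into a Python `range`. This is not the code in this PR: `RangeSketch` is a hypothetical name, and the sketch assumes that integer bounds follow Python slicing rules (negative values count from the end), that `pd.Period` bounds are converted to offsets from the series start, and that all periods share the same frequency.

```python
from dataclasses import dataclass
from typing import Optional, Union

import pandas as pd

# A bound is an integer index (negative counts from the end), a timestamp, or None.
Bound = Union[int, pd.Period, None]


@dataclass(frozen=True)
class RangeSketch:
    """Hypothetical stand-in for `Range`: bounds stay abstract until the
    series start and length are known."""

    start: Bound = None
    stop: Bound = None
    step: int = 1

    @staticmethod
    def _resolve(bound: Bound, series_start: pd.Period) -> Optional[int]:
        if isinstance(bound, pd.Period):
            # Periods elapsed since the series start (same frequency assumed);
            # dates before the start are clamped to 0 so they are not
            # misread as "counting from the end".
            return max((bound - series_start).n, 0)
        return bound  # ints and None keep Python slicing semantics

    def __call__(self, start: pd.Period, length: int) -> range:
        # Slicing a built-in range applies Python's usual rules: negative
        # indices count from the end and out-of-bounds values are clipped.
        window = slice(
            self._resolve(self.start, start),
            self._resolve(self.stop, start),
            self.step,
        )
        return range(length)[window]


series_start = pd.Period("2023-01-01", freq="D")
print(list(RangeSketch(-30, -10, step=4)(start=series_start, length=90)))
# [60, 64, 68, 72, 76]
```

Slicing a built-in `range` reuses Python's own negative-index and clipping rules, so the sketch needs no separate bounds arithmetic beyond converting timestamps to offsets.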

Examples (`Range`, `SampleAll` and `SampleOnAverage` are the classes added in this PR):

```python
import pandas as pd

# consider observations from the 30th-last up to (excluding) the 10th-last, going 4 by 4
rge = Range(-30, -10, step=4)

# let's just get all of them
sample = SampleAll(rge)

sample(start=pd.Period("2023-01-01", freq="D"), length=90)
# [60, 64, 68, 72, 76]
```

```python
import pandas as pd

# consider observations between these dates
rge = Range(pd.Period("2023-03-10"), pd.Period("2023-03-20"))

# sample one on average
sample = SampleOnAverage(rge, 1)

# sampling is random: repeated calls give different results, for example
sample(start=pd.Period("2023-01-01", freq="D"), length=90)
# [68, 74]
# [76]
# [71]
# []
# [70, 75, 76]
# [76]
```
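For the sampling step itself, here is a minimal sketch of two strategies operating on an already-resolved `range`. `sample_all` mirrors what `SampleAll` returns in the first example; `sample_on_average` is only an assumption about how `SampleOnAverage` could work (keeping each index independently with probability `num / len(indices)`), chosen because it is consistent with the varying output sizes above. Both function names are hypothetical and not part of this PR.

```python
import random
from typing import List, Optional


def sample_all(indices: range) -> List[int]:
    """Keep every index of the resolved range."""
    return list(indices)


def sample_on_average(
    indices: range, num: float, rng: Optional[random.Random] = None
) -> List[int]:
    """Assumed strategy: keep each index independently with probability
    num / len(indices), so that `num` indices are selected on average."""
    if len(indices) == 0:
        return []
    if rng is None:
        rng = random.Random()
    # Cap at 1 so asking for more than len(indices) keeps everything.
    p = min(num / len(indices), 1.0)
    return [i for i in indices if rng.random() < p]


print(sample_all(range(60, 80, 4)))  # [60, 64, 68, 72, 76]
rng = random.Random(0)
for _ in range(3):
    # Indices 68..77 correspond to the date window of the second example;
    # the output varies from call to call.
    print(sample_on_average(range(68, 78), 1, rng))
```

Independent selection makes the number of sampled instances itself random, which matches the empty and multi-element draws above; a fixed-size alternative would be `random.sample`.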

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Please tag this PR with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

jaheba · 2023-03-27T11:47:03Z

I like the idea of splitting (heh) the tasks of defining windows where splits should happen and selecting the concrete instances. Still, I think splitting a dataset into train/val/test might be something different.

Say I want to incrementally train an existing model using new data. I can first define a train/test split, and then restrict the training windows to a recent range.
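A minimal sketch of that workflow, assuming daily `pd.Period` timestamps: everything before a `split` date is the training part, and training windows are only drawn from its last `recent` time steps. `recent_training_indices`, `split` and `recent` are hypothetical names for illustration, not part of this PR.

```python
import pandas as pd


def recent_training_indices(
    start: pd.Period, length: int, split: pd.Period, recent: int
) -> range:
    """Hypothetical helper: indices before `split` are the training part,
    and only its last `recent` indices are eligible for training windows.
    `start` and `split` must share the same frequency."""
    # Number of steps before the split, clipped to [0, length].
    train_len = min(max((split - start).n, 0), length)
    # Restrict sampling to the most recent part of the training data.
    return range(max(train_len - recent, 0), train_len)


start = pd.Period("2023-01-01", freq="D")
train_idx = recent_training_indices(start, 90, pd.Period("2023-03-10", freq="D"), 14)
print(train_idx)  # range(54, 68)
```

The returned `range` could then be handed to any sampling strategy, which keeps the split definition separate from instance selection.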

Commit 5bb5515: add Range and new Sampler types

lostella force-pushed the sampler-range branch from f5c8a6b to 5bb5515 on April 3, 2023 14:46
