Add Time Series Block #239

Merged
merged 22 commits into from
Jul 29, 2022
Changes from 7 commits
4 changes: 4 additions & 0 deletions src/FastAI.jl
@@ -109,6 +109,9 @@ include("Tabular/Tabular.jl")
include("Textual/Textual.jl")
@reexport using .Textual

include("TimeSeries/TimeSeries.jl")
@reexport using .TimeSeries

include("deprecations.jl")
export
    methodmodel,
@@ -168,6 +171,7 @@ export
    Continuous,
    Image,
    Paragraph,
    TimeSeriesRow,

    # encodings
    encode,
52 changes: 52 additions & 0 deletions src/TimeSeries/TimeSeries.jl
@@ -0,0 +1,52 @@
module TimeSeries


using ..FastAI
using ..FastAI:
    # blocks
    Block, WrapperBlock, AbstractBlock, OneHotTensor, OneHotTensorMulti, Label,
    LabelMulti, wrapped, Continuous, getencodings, getblocks, encodetarget, encodeinput,
    # encodings
    Encoding, StatefulEncoding, OneHot,
    # visualization
    ShowText,
    # other
    Context, Training, Validation
import ..FastAI: Datasets
using ..FastAI.Datasets
# for tests
using ..FastAI: testencoding

# extending
import ..FastAI:
    blockmodel, blockbackbone, blocklossfn, encode, decode, checkblock,
    encodedblock, decodedblock, showblock!, mockblock, setup

import MLUtils: MLUtils, eachobs, getobs, numobs
import Requires: @require

using FilePathsBase
using InlineTest

# Blocks
include("blocks/timeseriesrow.jl")

include("encodings/timeseriespreprocessing.jl");

const _tasks = Dict{String, Any}()
include("tasks/classification.jl")

include("recipes.jl")

function __init__()
    _registerrecipes()
    foreach(values(_tasks)) do t
        if !haskey(FastAI.learningtasks(), t.id)
            push!(FastAI.learningtasks(), t)
        end
    end
end

export
    TimeSeriesRow, TSClassificationSingle, TimeSeriesPreprocessing
Member

Suggested change
- export
-     TimeSeriesRow, TSClassificationSingle, TimeSeriesPreprocessing
+ export
+     TimeSeriesRow, TSClassificationSingle, TSPreprocessing

end
42 changes: 42 additions & 0 deletions src/TimeSeries/blocks/timeseriesrow.jl
@@ -0,0 +1,42 @@
"""
    TimeSeriesRow(nfeatures, obslength) <: Block

[`Block`](#) for a multivariate time series with `nfeatures` variables and `obslength` time steps.
`obs` is valid for `TimeSeriesRow(nfeatures, obslength)` if it is a matrix of size
`(nfeatures, obslength)` with a numeric element type.

## Examples

Creating a block:

```julia
TimeSeriesRow(1, 51) # Univariate time series with length 51.
TimeSeriesRow(2, 51) # Multivariate time series with 2 variables and length 51.
```

You can create a random observation using [`mockblock`](#):

{cell=main}
```julia
using FastAI
FastAI.mockblock(TimeSeriesRow(1, 10))
```

"""
struct TimeSeriesRow <: Block
    nfeatures::Int
    obslength::Union{Int, Colon}
end

function checkblock(row::TimeSeriesRow, obs::AbstractArray{T,2}) where {T<:Number}
    size(obs) == (row.nfeatures, row.obslength)
end
codeboy5 marked this conversation as resolved.

function mockblock(row::TimeSeriesRow)
    rand(Float64, (row.nfeatures, row.obslength))
end

function setup(::Type{TimeSeriesRow}, data)
    nfeatures, obslength = size(getindex(data, 1))
    return TimeSeriesRow(nfeatures, obslength)
end
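
For readers following the block definition above, here is a minimal, self-contained sketch of how such a shape check could behave, including the `Colon` case that the `obslength::Union{Int, Colon}` field permits. `RowSketch` and `checkrow` are hypothetical names used only for illustration and are not part of FastAI.jl; treating `:` as "any length" is an assumption about the intended meaning, not something the diff states.

```julia
# Hypothetical stand-in for the block in the diff; not FastAI.jl API.
struct RowSketch
    nfeatures::Int
    obslength::Union{Int, Colon}
end

# An observation is valid if it is a numeric matrix with `nfeatures` rows and,
# unless `obslength` is `:`, exactly `obslength` columns.
# Assumption: `:` means "any number of time steps".
function checkrow(row::RowSketch, obs::AbstractMatrix{<:Number})
    nfeatures, obslength = size(obs)
    nfeatures == row.nfeatures || return false
    return row.obslength isa Colon || obslength == row.obslength
end

# Anything that is not a numeric matrix is rejected.
checkrow(::RowSketch, obs) = false

fixed = RowSketch(1, 51)
anylen = RowSketch(2, :)

checkrow(fixed, rand(1, 51))   # matches the declared shape
checkrow(fixed, rand(1, 50))   # wrong number of time steps
checkrow(anylen, rand(2, 7))   # length is unconstrained
```

Comparing `size(obs)` directly against a tuple containing `:` would never be equal, which is why the sketch handles the `Colon` case separately.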
51 changes: 51 additions & 0 deletions src/TimeSeries/encodings/timeseriespreprocessing.jl
@@ -0,0 +1,51 @@
"""
    TSPreprocessing() <: Encoding

Encodes `TimeSeriesRow`s by normalizing the time-series values. The time series can
be normalized either per variable or per time step.

Encodes
- `TimeSeriesRow` -> `TimeSeriesRow`
"""
struct TSPreprocessing <: Encoding
    tfms
end

function TSPreprocessing()
    base_tfms = [
    ]
    return TSPreprocessing(base_tfms)
end
Member

What kind of transforms will be in here?

Contributor Author

Currently only Standardize; that's the only one used in the tutorials.
If time permits, we can also add min-max normalisation, IQR-based outlier clipping, and handling of missing values in the time series.
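
To make the transforms mentioned in this reply concrete, here is a rough sketch of min-max scaling and IQR-based clipping for a single observation. The function names `minmaxscale` and `clipiqr`, the (variables × time steps) matrix layout, and the default `k = 1.5` are assumptions for illustration; none of them come from the PR.

```julia
using Statistics

# Each function maps a (variables × time steps) matrix to a matrix of the same size.
# Both names are hypothetical and not part of FastAI.jl.

# Min-max scaling of every variable (row) to [0, 1]; constant rows map to 0.
function minmaxscale(x::AbstractMatrix{<:Real})
    lo = minimum(x, dims = 2)
    hi = maximum(x, dims = 2)
    span = hi .- lo
    # Avoid division by zero for constant rows.
    return (x .- lo) ./ ifelse.(span .== 0, one(eltype(span)), span)
end

# Clip every variable to [Q1 - k*IQR, Q3 + k*IQR], computed per row.
function clipiqr(x::AbstractMatrix{<:Real}; k = 1.5)
    out = float.(x)                      # fresh copy, input is left untouched
    for row in eachrow(out)
        q1, q3 = quantile(row, (0.25, 0.75))
        iqr = q3 - q1
        clamp!(row, q1 - k * iqr, q3 + k * iqr)
    end
    return out
end

x = [1.0 2.0 3.0 100.0; 5.0 5.0 5.0 5.0]
scaled = minmaxscale(x)
clipped = clipiqr(x)
```

Both functions take and return a plain matrix, so under the assumption that `tfms` holds callables applied one after another, they could be chained in any order.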


function encodedblock(p::TSPreprocessing, block::TimeSeriesRow)
    return block
end
Member

If the format of the time series is changed by the encoding, this should return a different block

Contributor Author

No, the format won't be changed. As I discussed with Brian earlier, different models might require different formats, so the encoding shouldn't depend on the model.


function encode(p::TSPreprocessing, context, block::TimeSeriesRow, obs)
    for tfm in values(p.tfms)
        obs = tfm(obs)
    end
    obs
end

function tsdatasetstats(
        data;
        by_var=false,
        by_step=false
    )
    drop_axes = []
    if (by_var)
        append!(drop_axes,2)
    else
        append!(drop_axes,3)
    end
    axes = [ax for ax in [1, 2, 3] if !(ax in drop_axes)]
codeboy5 marked this conversation as resolved.
    mean = Statistics.mean(data, dims=axes)
    std = Statistics.std(data, dims=axes)
    return mean, std
end
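
As a rough illustration of what reducing over a subset of axes yields, the sketch below computes per-variable statistics for a 3-D array and standardizes with them. The layout `data[observation, variable, time step]` is an assumption (the diff does not spell it out), and `variablestats` and `standardize` are hypothetical helper names, not functions from the PR.

```julia
using Statistics

# Assumed layout: data[observation, variable, time step].
function variablestats(data::AbstractArray{<:Real, 3})
    # Reduce over observations (1) and time steps (3), keeping the variable axis.
    μ = mean(data, dims = (1, 3))
    σ = std(data, dims = (1, 3))
    return μ, σ
end

# The reduced axes have length 1, so the statistics broadcast against `data`.
standardize(data, μ, σ) = (data .- μ) ./ σ

# 2 observations × 2 variables × 2 time steps
data = cat([1.0 10.0; 3.0 30.0], [5.0 50.0; 7.0 70.0]; dims = 3)
μ, σ = variablestats(data)
z = standardize(data, μ, σ)
```

Because `mean` and `std` keep reduced dimensions as singleton axes, no reshaping is needed before broadcasting.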

function setup(::Type{TSPreprocessing}, ::TimeSeriesRow, data)
    means, stds = tsdatasetstats(data)
end
Member

Suggested change
- function setup(::Type{TSPreprocessing}, ::TimeSeriesRow, data)
-     means, stds = tsdatasetstats(data)
- end
+ setup(::Type{TSPreprocessing}, ::TimeSeriesRow, data) = means, stds = tsdatasetstats(data)

Empty file added src/TimeSeries/makie.jl
Empty file.
63 changes: 63 additions & 0 deletions src/TimeSeries/recipes.jl
@@ -0,0 +1,63 @@
"""
    TimeSeriesDatasetRecipe(; train_file, test_file = nothing, loadfn = Datasets.loadfile)

Recipe for loading a time series dataset stored in `.ts` files.
"""
Base.@kwdef struct TimeSeriesDatasetRecipe <: Datasets.DatasetRecipe
    train_file
    test_file = nothing
    loadfn = Datasets.loadfile
end

Datasets.recipeblocks(::Type{TimeSeriesDatasetRecipe}) = Tuple{TimeSeriesRow, Label}

# TODO: Add check if test_file is nothing.
function Datasets.loadrecipe(recipe::TimeSeriesDatasetRecipe, path)
    path = convert(String, path)
    datasetpath_train = joinpath(path, recipe.train_file)
    rows_train, labels_train = recipe.loadfn(datasetpath_train)
    datasetpath_test = joinpath(path, recipe.test_file)
    rows_test, labels_test = recipe.loadfn(datasetpath_test)
    rows = [rows_train; rows_test]
    labels = [labels_train; labels_test]
    rows = TimeSeriesDataset(rows)
    data = rows, labels
    blocks = (
        setup(TimeSeriesRow,rows),
        Label(unique(eachobs(labels))),
    )
    return data, blocks
end

# Registering recipes

const RECIPES = Dict{String,Vector{Datasets.DatasetRecipe}}(
    "ecg5000" => [
        TimeSeriesDatasetRecipe(train_file="ECG5000_TRAIN.ts", test_file="ECG5000_TEST.ts")
    ],
)

function _registerrecipes()
    for (name, recipes) in RECIPES, recipe in recipes
        if !haskey(datarecipes(), name)
            push!(datarecipes(), (
                id = name,
                datasetid = name,
                blocks = Datasets.recipeblocks(recipe),
                package = @__MODULE__,
                recipe = recipe,
            ))
        end
    end
end

# ## Tests

@testset "TimeSeriesDataset [recipe]" begin
    path = datasetpath("ecg5000")
Member

Not sure how big this dataset is, but if it's really big, we may not want to run this on the CI, since it'll need to download it every time

Contributor Author

The dataset is around 10 MB. Should we run this on CI?

Member

That should be fine 👍

    recipe = TimeSeriesDatasetRecipe(train_file="ECG5000_TRAIN.ts", test_file="ECG5000_TEST.ts")
    data, block = loadrecipe(recipe, path)
    sample = getobs(data, 1)
    @test checkblock(block, sample)
end
29 changes: 29 additions & 0 deletions src/TimeSeries/tasks/classification.jl
@@ -0,0 +1,29 @@
"""
    TSClassificationSingle(blocks[, data])

Learning task for single-label time-series classification. Samples are normalized and
classified into one of the `classes`.
"""
function TSClassificationSingle(
        blocks::Tuple{<:TimeSeriesRow, <:Label},
        data
    )
    return SupervisedTask(
        blocks,
        (
            OneHot(),  # trailing comma makes this a 1-tuple rather than a parenthesized value
        )
    )
end

_tasks["tsclfsingle"] = (
    id = "timeseries/single",
    name = "Time-Series Classification (single-label)",
    constructor = TSClassificationSingle,
    blocks = (TimeSeriesRow, Label),
    category = "supervised",
    description = """
        Time-Series classification task where every time-series has a single
        class label associated with it.
        """,
    package = @__MODULE__,
)
2 changes: 2 additions & 0 deletions src/datasets/Datasets.jl
@@ -38,12 +38,14 @@ end
export
    # primitive containers
    TableDataset,
    TimeSeriesDataset,

    mapobs, eachobs, groupobs, shuffleobs, ObsView,

    # utilities
    isimagefile,
    istextfile,
    istimeseriesfile,
    matches,
    loadfile,
    loadmask,