Add Time Series Block #239
Conversation
Hey, for downloading the datasets, the message as well as the post-fetch method need to be different, which I think might require some restructuring. How should I go about that?
src/datasets/containers.jl
Outdated
elseif startswith(ln, "@timestamps")
    # Check that the associated value is valid
    tokens = split(ln, " ")
    token_len = length(tokens)

    if tokens[2] == "true"
        timestamps = true
    else
        timestamps = false
    end

    has_timestamps_tag = true
    metadata_started = true
There should be a way to extract out some of this logic. For now, having tests would help verify it's working.
Yeah, I think it could be extracted out for the current datasets we are using. I just copied it over from the Python library to get it working.
By tests, do you mean automated tests, and creating a sample .ts file to run the test on?
Correct. You could also use a small existing one as long as the license is compatible.
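Picking up the "extract out some of this logic" suggestion above, one hedged way to factor out the repeated tag parsing could look like this (the helper name `parse_bool_tag` is hypothetical, not part of the PR):

```julia
# Hypothetical helper extracting the repeated "@tag true/false" parsing.
# Assumes each metadata line looks like "@timestamps true".
function parse_bool_tag(ln::AbstractString)
    tokens = split(ln, " ")
    length(tokens) == 2 || error("Malformed tag line: $ln")
    return tokens[2] == "true"
end

# Usage inside the parser loop:
# elseif startswith(ln, "@timestamps")
#     timestamps = parse_bool_tag(ln)
#     has_timestamps_tag = true
#     metadata_started = true
```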
"""

struct TimeSeriesRow{M,N} <: Block end
I'm not sure of the block name here. I would prefer TimeSeries, but that's already the module name. Once #240 is through, the time-series functionality will be a subpackage FastTimeSeries, so then the name TimeSeries for the block would be available. Let's leave it for now, and maybe change it then.
"""

struct TimeSeriesRow{M,N} <: Block end
Do we need both the number of features and the observation length as type parameters? We should only do this if we need to dispatch on the number of features or observation length.
Additionally, a Block is constant for a dataset, so including the observation length means we wouldn't be able to support datasets where different samples have varying observation lengths. Is that the case for any of the datasets we're using? Do we need this information somewhere on the block level? If we don't need it, I would suggest dropping the observation length from the block or allowing passing a colon `:` to allow variable-length observations.
Also, if we don't need to dispatch on the number of features (do we?), it can be added as a field.
So we'd have something like

struct TimeSeriesRow <: Block
    nfeatures::Int
    obslength::Union{Int, Colon}
end
- For the current datasets we are planning to use, the length is the same for all observations, so we can have a constant block.
From my understanding, we build the model from the block? So the parameters might depend on the time series length and the number of variables.
So ideally the block should just dispatch on parameters which are required in model building?
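The field-based block sketched in the review, with `:` marking variable-length observations, could be exercised like this (illustrative only; the `Block` supertype is stubbed out here so the snippet is self-contained):

```julia
abstract type Block end  # stand-in for FastAI.Block in this sketch

struct TimeSeriesRow <: Block
    nfeatures::Int
    obslength::Union{Int, Colon}
end

# Fixed-length dataset: every observation has 140 time steps.
row_fixed = TimeSeriesRow(1, 140)

# Variable-length dataset: pass a colon instead of a length.
row_var = TimeSeriesRow(3, :)
```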
src/TimeSeries/recipes.jl
Outdated
# ## Tests

@testset "TimeSeriesDataset [recipe]" begin
    path = datasetpath("ecg5000")
Not sure how big this dataset is, but if it's really big, we may not want to run this on the CI, since it'll need to download it every time
The dataset is around 10 MB; should we run this on CI?
That should be fine 👍
src/datasets/containers.jl
Outdated
table::AbstractArray{Float64,3}
end

function LearnBase.getobs(dataset::TimeSeriesDataset, idx)
This will need to be updated to Base.getindex and Base.length now that #229 is merged.
Sure, will update it.
rebased with master: force-pushed from 89a0de1 to dd57498
I have implemented the changes we discussed in the last call.
Great. Just to be safe, I think it would be best to download the file on the fly for now.
function setup(::Type{TSPreprocessing}, ::TimeSeriesRow, data)
    means, stds = tsdatasetstats(data)
end
Suggested change:

setup(::Type{TSPreprocessing}, ::TimeSeriesRow, data) = means, stds = tsdatasetstats(data)
src/TimeSeries/TimeSeries.jl
Outdated
export
    TimeSeriesRow, TSClassificationSingle, TimeSeriesPreprocessing
Suggested change:

export
    TimeSeriesRow, TSClassificationSingle, TSPreprocessing
function TSPreprocessing()
    base_tfms = [
    ]
    return TSPreprocessing(base_tfms)
end
What kind of transforms will be in here?
Currently only Standardize; that's the only one used in the tutorials.
If time permits, we can also add normalisation using min-max, clipping outliers based on IQR, and handling missing values in the time series.
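For the min-max option mentioned above, a minimal sketch could look like the following (the function name and layout assumptions are mine, not from the PR; it assumes a `(features, timesteps)` matrix per observation):

```julia
# Hypothetical min-max normalisation to [0, 1] along the time axis.
# `eps` guards against division by zero for constant series.
function minmax_normalize(x::AbstractMatrix; eps = 1f-8)
    lo = minimum(x; dims = 2)
    hi = maximum(x; dims = 2)
    return (x .- lo) ./ (hi .- lo .+ eps)
end
```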
function encodedblock(p::TSPreprocessing, block::TimeSeriesRow)
    return block
end
If the format of the time series is changed by the encoding, this should return a different block
No, the format won't be changed. As I discussed with Brian earlier, different models might require different formats, so the encoding shouldn't depend on the model.
I have added the basic structure and code. There seem to be a couple of errors, which I will solve by tonight.
* Move domain-specific functionality to subpackages
* Add FastMakie.jl
* Update tests
* Add subpackage CI
* Run SciMLStyle
* Add sysimage example
* Update documentation
* Rerun notebooks
Left some minor comments, but let me know when you're done with the other changes :) 👍
struct TSStats{T}
    means::AbstractArray{T,2}
    stds::AbstractArray{T,2}
end

function TSStats(means, stds)
    TSStats{eltype(means)}(means, stds)
end
Not sure we need this struct; it may be simpler to add means and stds fields to the Encoding. Then that also makes it easier to construct TSPreprocessing manually (i.e. without setup).
obs = tfm(obs)
end
obs

function TSStandardize(
Should this be an encode method?
end
axes = [ax for ax in [1, 2, 3] if !(ax in drop_axes)]
mean = Statistics.mean(data.table, dims=axes)
std = Statistics.std(data.table, dims=axes)
Suggested change:

std = Statistics.std(data.table, dims=axes, mean=mean)

If the data source supports it, this is more efficient.
Yes, I tried using Statistics.stdm(itr, mean) earlier but couldn't seem to get it working. I will look into it further and keep this comment open till then.
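For reference, `Statistics.std` on dense arrays accepts a precomputed `mean` keyword, which skips the second mean pass; a quick sanity check on synthetic data (assuming a plain `Array`, not the lazy dataset container):

```julia
using Statistics

data = randn(Float32, 1, 24, 100)      # (features, timesteps, samples)
dims = (1, 3)                          # reduce over all but the time axis

m  = mean(data; dims = dims)
s1 = std(data; dims = dims)
s2 = std(data; dims = dims, mean = m)  # reuses the precomputed mean

@assert s1 ≈ s2
```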
means = reshape(means, ( size( means)[2:3] ))
stds = reshape(stds, ( size( stds)[2:3] ))
A little weird formatting going on here. Why is the reshape needed?
src/datasets/containers.jl
Outdated
if class_labels
    return data, class_val_list
else
    return data
end
Returning one or two things based on a conditional is a little surprising. Consider either always returning class_val_list and/or making the second return value some meaningful null value (nothing, empty array, whatever makes the most sense).
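The "meaningful null value" variant suggested here could be as simple as the following sketch (function name hypothetical; `nothing` stands in for "no class labels requested"):

```julia
# Always return a 2-tuple so callers can destructure unconditionally.
function split_labels(data, class_val_list, class_labels::Bool)
    return class_labels ? (data, class_val_list) : (data, nothing)
end

# data, labels = split_labels(...); labels === nothing means no labels.
```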
means::AbstractArray{T,2}
stds::AbstractArray{T,2}
You could also consider changing these to AbstractMatrix{T}, but it would be best to confirm first that they will always be 2-dimensional (what about higher-dimensional time series, for example?)
Not sure about higher-dimensional series; I will have to check the literature, or some examples online if I can find them.
Oh I can come up with examples easily enough, the question is whether the fastai docs have any :)
Oh, I will check that before our meeting today.
Hey, the encodings are working now.

> using FastAI
> data, blocks = load(datarecipes()["ecg5000"]);
> task = FastAI.TSClassificationSingle(blocks, data);
> input, target = getobs(data, 2)
(Float32[-1.1008778 -3.9968398 … 1.1196209 -1.4362499], "1")
> encodesample(task, Training(), (input, target))
(Float32[-1.1048299 -4.011188 … 1.1236403 -1.4414059], Float32[1.0, 0.0, 0.0, 0.0, 0.0])

I am looking to add some tests and will resolve all the comments you made before the meeting. I have also started working on the demo notebook side-by-side, as you suggested.
Could you push the notebook here as well? You can just place it in the
Since I merged #240, this shows a lot of merge conflicts, but don't worry, I will do the merge myself manually once this PR is ready 👍
Rendering of Unicode plots can be a bit off depending on the notebook environment/viewer. It should end up looking fine in the docs, though. For regular usage, the Makie backend will probably be preferable, but we can get to that later 👍
Having some trouble pushing the subpackage merge to your fork, @codeboy5. Can you check that I have permissions to push to your fork? Or alternatively, just give me committer access to the fork.
If checked, users with write access to FluxML/FastAI.jl can add new commits to your timeseries-blocks branch. You can always change this setting later.
Yeah, please try that
Yeah, just sent you an invite for the same. Hopefully it should work now.
I merged this PR into master, so it now lives in the subpackage FastTimeSeries. The CI is failing right now for unrelated reasons, I'll rerun it when those are fixed (by #247)
With this commit, we are able to do the following now:

> using FastAI, FastTimeSeries
> data, blocks = load(datarecipes()["ecg5000"]);
> task = FastTimeSeries.TSClassificationSingle(blocks, data);
> backbone = FastTimeSeries.Models.StackedLSTM(1, 16, 10, 2);
> model = FastAI.taskmodel(task, backbone)
Chain(
  StackedLSTMCell(
    Recur(
      LSTMCell(1 => 10),              # 500 parameters
    ),
    Recur(
      LSTMCell(10 => 16),             # 1_760 parameters
    ),
  ),
  identity,
  Dense(16 => 5),                     # 85 parameters
)        # Total: 12 trainable arrays, 2_345 parameters,
         # plus 4 non-trainable, 13_312 parameters, summarysize 62.145 KiB.
function StackedLSTM(in::Int, out::Integer, hiddensize::Integer, layers::Integer;
                     init=Flux.glorot_uniform)
    if layers == 1
        chain = Chain(LSTM(in, out; init=init))
    elseif layers == 2
        chain = Chain(LSTM(in, hiddensize; init=init),
                      LSTM(hiddensize, out; init=init))
    else
        chain_vec = [LSTM(in, hiddensize; init=init)]
        for i = 1:layers - 2
            push!(chain_vec, LSTM(hiddensize, hiddensize; init=init))
        end
        chain = Chain(chain_vec..., LSTM(hiddensize, out; init=init))
    end
    return StackedLSTMCell(chain)
end
Something is a little weird here: you're returning a Cell, but inside that cell are Recurs and not LSTMCells. Ideally StackedLSTMCell would be structured like the other Flux RNN cells, but if that's not possible I'd recommend renaming to StackedLSTM to better represent the model as stateful + containing internal mutation (transitively via Recur from LSTM).
function initialize_bias!(l::StackedLSTMCell)
    for i = 1:length(l.chain)
        l.chain[i].cell.b .= 1
    end
    return nothing
end
This should be done explicitly through the initb argument of the LSTM(Cell) constructor if possible.
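Concretely, recent Flux versions accept an `initb` keyword on the cell constructors, so the bias could be set at construction time instead of mutating `l.chain[i].cell.b` afterwards (a sketch; check the keyword against the Flux version pinned in this repo):

```julia
using Flux

# Initialise all biases to one at construction rather than mutating them later.
lstm = LSTM(8 => 16; initb = Flux.ones32)
```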
[m.chain(x) for x ∈ X[1:end-1]]
return m.chain(X[end])
Is this always a sequence-to-one model?
I think for classification and regression, this will always be a seq-to-one model? For some NLP tasks, it would not be.
Ok, then I think you should be able to rewrite this using foldl. It'll be more correct and likely perform better as well. It may also let you drop the Recur wrapper for the inner layer stack and use the cells directly instead (i.e. LSTM -> LSTMCell).
FastTimeSeries/src/models/RNN.jl
Outdated
@@ -24,5 +30,5 @@ function RNNModel(recbackbone,
                   dropout_rate = 0.0)

     dropout = dropout_rate == 0 ? identity : Dropout(dropout_rate)
-    Chain(recbackbone, dropout, finalclassifier)
+    Chain(tabular2rnn, recbackbone, dropout, finalclassifier)
 end
I don't think we should be doing this dense-to-slices transform in the model itself. If you just need something RNN-friendly, relying on the built-in support for dense inputs should be enough. The permutedims can stay for now, but even that probably shouldn't be in the gradient hot path (it allocates O(n) both ways).
An alternative for now is to make the data pipeline spit out the vector of arrays. We can then revisit if/when you add models like CNNs which expect dense inputs.
I think the "ideal" place to do this transform would be inside the training loop. I am not sure exactly how to do that for FastAI.jl. Is there a way?
Since the second phase will involve using some CNNs, having the data pipeline spit out a vector of arrays would not work.
FastTimeSeries/src/models/RNN.jl
Outdated
@@ -1,3 +1,9 @@
function tabular2rnn(X::AbstractArray{Float32, 3})
I think "tabular" is a bit of a misnomer here, but naming is not a priority.
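For context, the dense-to-slices transform being debated is roughly the following (a sketch of the idea; the merged implementation may differ in layout assumptions):

```julia
# Convert a dense (features, timesteps, batch) array into the
# vector-of-matrices layout Flux's Recur-based RNNs consume:
# a length-`timesteps` vector of (features, batch) slices.
function tabular2rnn(X::AbstractArray{Float32, 3})
    return [X[:, t, :] for t in 1:size(X, 2)]
end
```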
Since this PR is getting quite long, it may be best to move the newest changes to a new PR and merge the parts that are ready. We could revert the PR to the point where I did the manual merge, merge this PR with the changes up to that point into master, and then open a new PR based on master with the more recent changes, including models and so on. @codeboy5 let me know if you aren't sure about how to do this 🙂
Yeah sure, I just have to revert the code to that PR, right?
Yeah, this looks good! Just to make sure, @darsnack, does it make sense to rebase this on master when merging, or should we squash and merge?
Either is okay. Maybe squash and merge will be less painful?
I've squashed and merged it. @codeboy5 you should be able to open a new PR for the model code based on master now.
Will do, thanks 👍🏻
Added Time Series Container and Block. It is capable of loading all datasets from timeseriesclassification. The .ts files are loaded using a Julia translation of this method. I have also added a basic test case for the recipe.
This allows us to do the following.
Just wanted to get some initial thoughts on the work; there might be more changes as I continue to work on the other parts.