Train on custom dataset #7

Open
tomyjara opened this issue Jan 23, 2024 · 1 comment

Comments

@tomyjara

Hi! How are you?

I think TSDiff could be a great tool for generating EEG data. I have a dataset of per-channel measurements from an EEG recorded during an experiment, and I would like to train your model on it. How should I go about training the model on a custom dataset?

Thanks!

@abdulfatir
Contributor

Hi @tomyjara!

You can use something like this to build a custom dataset.

  1. Create a JSON Lines file with your time series data. Each line holds one time series as a JSON object with two keys: start (the start timestamp) and target (the actual time series values). I have attached an example file. Note that the time series are not required to have the same start or length.

  2. Use this function to load the file as a GluonTS dataset.

from pathlib import Path
from typing import Optional

from gluonts.dataset.split import split
from gluonts.dataset.common import (
    MetaData,
    TrainDatasets,
    FileDataset,
)


def get_custom_dataset(
    jsonl_path: Path,
    freq: str,
    prediction_length: int,
    split_offset: Optional[int] = None,
):
    """Create a custom GluonTS dataset from a JSON Lines file and the
    given parameters.

    Parameters
    ----------
    jsonl_path
        Path to a JSONLines file with time series
    freq
        Frequency in pandas format
        (e.g., `H` for hourly, `D` for daily)
    prediction_length
        Prediction length
    split_offset, optional
        Offset to split data into train and test sets, by default None

    Returns
    -------
        A GluonTS `TrainDatasets` object with train and test sets
    """
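    if split_offset is None:
        # By default, hold out the last `prediction_length` steps of each series.
        split_offset = -prediction_length

    metadata = MetaData(freq=freq, prediction_length=prediction_length)
    # The test set is the full series; the training set is everything
    # before the split offset.
    test_ts = FileDataset(jsonl_path, freq)
    train_ts, _ = split(test_ts, offset=split_offset)
    dataset = TrainDatasets(metadata=metadata, train=train_ts, test=test_ts)
    return dataset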
  3. Call get_custom_dataset in place of the following line:
    dataset = get_gts_dataset(dataset_name)
  4. Modify the default config to match your data, especially the context length, lags, etc.
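
For step 1, a minimal sketch of writing such a JSON Lines file with the standard library might look like the following. It is only an illustration of the format described above (one JSON object per line with the keys start and target). The helper name write_jsonl, the channel names, the sample values, the timestamp, and the output file name are all hypothetical, and treating each EEG channel as its own univariate series is an assumption, not something the attached example file is known to do.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(channels, start, path):
    """Write one JSON object per line, one line per channel.

    `channels` maps a (hypothetical) channel name to its samples.
    Only the keys `start` and `target` are written, as described in step 1.
    """
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for values in channels.values():
            record = {"start": start, "target": [float(v) for v in values]}
            f.write(json.dumps(record) + "\n")
    return path


# Made-up EEG readings; the series may differ in length.
eeg_channels = {
    "Fp1": [0.12, 0.15, 0.11, 0.09],
    "Fp2": [0.20, 0.18, 0.22, 0.25, 0.21],
}

out_path = write_jsonl(
    eeg_channels,
    start="2024-01-23 00:00:00",
    path=Path(tempfile.gettempdir()) / "eeg_example.jsonl",
)

# Read the file back to check that every line parses and has both keys.
records = [
    json.loads(line)
    for line in out_path.read_text(encoding="utf-8").splitlines()
]
for record in records:
    assert {"start", "target"} <= set(record)
```

The resulting path can then be passed as jsonl_path to get_custom_dataset, together with a freq string that matches the sampling rate of your recording.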

Thanks @marcelkollovieh for helping with the response!
