TSDC - Time Series Dataset Creator

A powerful and intuitive Python library for creating time series datasets ready for machine learning models like LSTM, GRU, and Transformers. No more manual data preprocessing - just load your data and start training!

Why TSDC?

When working with time series models (especially LSTM), you always need to:

Convert raw data into sliding window sequences
Split data temporally (not randomly!)
Normalize/scale features properly
Handle multivariate inputs with single target outputs
Create proper shapes for neural networks

TSDC automates all of this in just a few lines of code.

Installation

From PyPI (Recommended)

Install TSDC using pip:

pip install tsdc

That's it! You're ready to use TSDC.

With optional dependencies

For financial data loading and examples:

pip install tsdc[examples]

This includes yfinance for financial data and matplotlib for visualization.

From source (for development)

If you want to contribute or modify the code:

git clone https://github.com/DeepPythonist/tsdc.git
cd tsdc
pip install -e .

Quick Start

Get started with TSDC in just 3 steps!

Step 1: Install

pip install tsdc

Step 2: Create Your Dataset

import numpy as np
from tsdc import TimeSeriesDataset

# Your time series data (e.g., stock prices, sensor data, etc.)
data = np.random.randn(1000) * 100 + 500

# Create dataset with 60 timesteps looking back, predicting 1 step ahead
dataset = TimeSeriesDataset(
    data=data,
    lookback=60,
    horizon=1,
    scaler_type='minmax'
)

# Prepare train/val/test splits
dataset.prepare()

Step 3: Get Your Data (Ready for LSTM!)

X_train, y_train = dataset.get_train()
X_val, y_val = dataset.get_val()
X_test, y_test = dataset.get_test()

print(f"X_train shape: {X_train.shape}")  # (samples, 60, 1)
print(f"y_train shape: {y_train.shape}")  # (samples,)

# Now feed directly to your LSTM model!

Real-World Example: Bitcoin Price Prediction

# First, install with financial data support
# pip install tsdc[examples]

from tsdc import TimeSeriesDataset
from tsdc.loaders import FinancialLoader

# Load Bitcoin data
loader = FinancialLoader()
btc_data = loader.load(
    symbol="BTC-USD",
    start_date="2023-01-01",
    end_date="2024-01-01"
)

# Create dataset for LSTM
dataset = TimeSeriesDataset(
    data=btc_data[['Close', 'Volume']],
    lookback=60,
    horizon=1,
    target_column='Close',
    scaler_type='minmax'
)
dataset.prepare()

X_train, y_train = dataset.get_train()

# Ready for your LSTM model!
# X_train shape: (samples, 60, 2)  - 60 days, 2 features
# y_train shape: (samples,)         - Next day's close price

Core Concepts

1. Lookback and Horizon

lookback: Number of past timesteps to use as input
horizon: Number of future timesteps to predict

lookback=60, horizon=1   # Use 60 past points to predict next 1 point
lookback=24, horizon=12  # Use 24 hours to predict next 12 hours

2. Stride

Control how windows overlap:

stride=1   # Maximum overlap, windows shift by 1 timestep
stride=5   # Less overlap, windows shift by 5 timesteps

3. Train/Val/Test Splits

IMPORTANT: TSDC uses temporal (sequential) splitting, NOT random splitting!

Time series splitting preserves temporal order to prevent data leakage:

TimeSeriesDataset(
    data=data,
    train_split=0.7,   # First 70% for training (oldest data)
    val_split=0.15,    # Next 15% for validation (middle data)
    test_split=0.15    # Last 15% for testing (newest data)
)

# Train ← Val ← Test (sequential, no shuffling)
# This prevents training on future data and testing on past data!

4. Scaling Options

scaler_type='minmax'    # Scale to [0, 1]
scaler_type='standard'  # Zero mean, unit variance
scaler_type='robust'    # Robust to outliers
scaler_type='none'      # No scaling

API Reference

TimeSeriesDataset

Main class for dataset creation.

TimeSeriesDataset(
    data: Union[np.ndarray, pd.DataFrame, pd.Series, str],
    lookback: int = 10,
    horizon: int = 1,
    stride: int = 1,
    target_column: Optional[Union[int, str]] = None,
    scaler_type: str = "minmax",
    train_split: float = 0.7,
    val_split: float = 0.15,
    test_split: float = 0.15
)

Methods:

prepare(preprocess=True): Prepare the dataset
get_train(): Returns (X_train, y_train)
get_val(): Returns (X_val, y_val)
get_test(): Returns (X_test, y_test)
get_all(): Returns dictionary with all splits
get_info(): Get dataset information
inverse_transform_predictions(predictions): Convert scaled predictions back

Sequencer

Low-level API for creating sequences.

from tsdc import Sequencer

sequencer = Sequencer(lookback=10, horizon=5, stride=1)
X, y = sequencer.create_sequences(data)

Preprocessor

Standalone preprocessing utilities.

from tsdc import Preprocessor

preprocessor = Preprocessor(
    scaler_type='minmax',
    handle_missing='forward_fill',
    remove_outliers=True,
    outlier_threshold=3.0
)
scaled_data = preprocessor.fit_transform(data)
original_data = preprocessor.inverse_transform(scaled_data)

FinancialLoader

Load financial data from Yahoo Finance.

from tsdc.loaders import FinancialLoader

loader = FinancialLoader()
btc_data = loader.load(
    symbol="BTC-USD",
    start_date="2023-01-01",
    end_date="2024-01-01",
    source="yahoo"
)

btc_data = loader.add_technical_indicators(
    sma_periods=[20, 50],
    ema_periods=[12, 26],
    rsi_period=14,
    macd=True
)

Examples

Example 1: Bitcoin Price Prediction with LSTM

import numpy as np
from tsdc import TimeSeriesDataset
from tsdc.loaders import FinancialLoader

loader = FinancialLoader()
btc_data = loader.load(symbol="BTC-USD", start_date="2022-01-01")

dataset = TimeSeriesDataset(
    data=btc_data[['Close', 'Volume']],
    lookback=60,
    horizon=1,
    target_column='Close',
    scaler_type='minmax'
)
dataset.prepare()

X_train, y_train = dataset.get_train()

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential([
    LSTM(100, return_sequences=True, input_shape=(60, 2)),
    Dropout(0.2),
    LSTM(50, return_sequences=False),
    Dropout(0.2),
    Dense(1)
])

model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=50, batch_size=32)

Example 2: Walk-Forward Validation

from tsdc import TimeSeriesDataset, Sequencer
from tsdc.utils.splitters import walk_forward_validation

data = np.random.randn(1000, 3)
sequencer = Sequencer(lookback=20, horizon=1)
X, y = sequencer.create_sequences(data)

for X_train, y_train, X_test, y_test in walk_forward_validation(X, y, n_splits=5):
    model.fit(X_train, y_train)
    score = model.evaluate(X_test, y_test)
    print(f"Test Score: {score}")

Example 3: Loading from CSV

from tsdc import TimeSeriesDataset

dataset = TimeSeriesDataset(
    data="path/to/data.csv",
    lookback=30,
    horizon=7,
    target_column="sales"
)
dataset.prepare()

Example 4: Custom Preprocessing

from tsdc import Preprocessor

preprocessor = Preprocessor(
    scaler_type='robust',
    handle_missing='interpolate',
    remove_outliers=True,
    outlier_threshold=2.5
)

cleaned_data = preprocessor.fit_transform(raw_data)

Advanced Usage

Multi-step Forecasting

Predict multiple timesteps ahead:

dataset = TimeSeriesDataset(
    data=data,
    lookback=48,
    horizon=24,
    target_column='price'
)
dataset.prepare()
X_train, y_train = dataset.get_train()

Custom Splits with Indices

from tsdc.utils.splitters import expanding_window_split

for X_train, y_train, X_test, y_test in expanding_window_split(
    X, y, 
    initial_train_size=100,
    test_size=20,
    step=10
):
    pass

Inverse Transform Predictions

predictions = model.predict(X_test)
original_scale = dataset.inverse_transform_predictions(predictions)

Project Structure

tsdc/
├── tsdc/
│   ├── __init__.py
│   ├── core/
│   │   ├── dataset.py       # Main TimeSeriesDataset class
│   │   ├── sequencer.py     # Sliding window operations
│   │   └── preprocessor.py  # Data preprocessing
│   ├── loaders/
│   │   ├── base.py         # Base loader class
│   │   └── financial.py    # Financial data loaders
│   └── utils/
│       ├── validators.py   # Input validation
│       └── splitters.py    # Time series splitting
├── examples/
│   ├── basic_usage.py      # Basic examples
│   ├── lstm_bitcoin.py     # Bitcoin prediction
│   └── quick_start.py      # Quick start guide
├── tests/
│   └── test_core.py        # Unit tests
├── setup.py
├── requirements.txt
└── README.md

Development

Running Tests

pytest tests/ -v

Running Examples

python examples/basic_usage.py
python examples/lstm_bitcoin.py

Code Style

This project follows PEP 8 guidelines. Format your code with:

black tsdc/
flake8 tsdc/

Features

✅ Easy sequence creation for LSTM/GRU/Transformer models
✅ Built-in preprocessing and normalization
✅ Proper train/validation/test splitting for time series
✅ Support for univariate and multivariate data
✅ Target column selection for multivariate inputs
✅ Financial data loaders with technical indicators
✅ Walk-forward and expanding window validation
✅ Flexible sliding window operations
✅ Missing value handling
✅ Outlier detection and removal
✅ Inverse transform for predictions
✅ Multiple scaling methods

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use TSDC in your research, please cite:

@software{tsdc2024,
  title={TSDC: Time Series Dataset Creator},
  author={DeepPythonist},
  year={2024},
  url={https://github.com/DeepPythonist/tsdc}
}

Support

For issues and questions:

Open an issue on GitHub Issues
Check the examples/ directory for usage examples

Roadmap

Add more data loaders (crypto, weather, etc.)
Add data augmentation techniques
Support for irregular time series
Integration with PyTorch DataLoader
Built-in visualization tools
Automated hyperparameter tuning for lookback/horizon

Made with ❤️ for the ML community

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
tests		tests
tsdc		tsdc
.gitignore		.gitignore
.pypirc.example		.pypirc.example
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

License

DeepPythonist/tsdc

Folders and files

Latest commit

History

Repository files navigation

TSDC - Time Series Dataset Creator

Why TSDC?

Table of Contents

Installation

From PyPI (Recommended)

With optional dependencies

From source (for development)

Quick Start

Step 1: Install

Step 2: Create Your Dataset

Step 3: Get Your Data (Ready for LSTM!)

Real-World Example: Bitcoin Price Prediction

Core Concepts

1. Lookback and Horizon

2. Stride

3. Train/Val/Test Splits

4. Scaling Options

API Reference

TimeSeriesDataset

Sequencer

Preprocessor

FinancialLoader

Examples

Example 1: Bitcoin Price Prediction with LSTM

Example 2: Walk-Forward Validation

Example 3: Loading from CSV

Example 4: Custom Preprocessing

Advanced Usage

Multi-step Forecasting

Custom Splits with Indices

Inverse Transform Predictions

Project Structure

Development

Running Tests

Running Examples

Code Style

Features

Contributing

License

Citation

Support

Roadmap

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages