This repository introduces a novel optimization strategy for fitting smooth, differentiable functions to noisy or complex data while simultaneously preserving two critical mathematical properties:
- Integral preservation (total area under the curve)
- Derivative preservation (rate of change / shape characteristics)
The method combines computational efficiency with mathematical rigor through a two-stage pipeline that transforms a traditionally slow, iterative problem into a fast, deterministic solution.
Traditional curve fitting methods face a fundamental trade-off:
Method | Speed | Smoothness | Integral Accuracy | Derivative Accuracy
---|---|---|---|---
Polynomial Regression | Slow (O(n³)) | ✅ Good | ❌ Poor | ❌ Uncontrolled
Spline Interpolation | Fast (O(n)) | ✅ Good | ❌ Not guaranteed | ⚠️ Continuous but unconstrained
Piecewise Linear | Very Fast (O(n)) | ❌ Not smooth | ✅ Excellent | ❌ Discontinuous
Key Challenge: How can we achieve all of the following simultaneously?
- Fast computation (no slow iterative optimization)
- Smooth, differentiable output function
- Guaranteed integral preservation
- Guaranteed derivative matching
This method introduces a hybrid approach that leverages the strengths of both fast interpolation and rigorous regression:
**Stage 1: Reference path construction** (sketched below)
- Uses piecewise linear interpolation to create a fast, clean "reference path"
- Generates a dense set of ordered sample points
- Computes reference derivatives (local slopes)
- Output: a noise-free dataset with known derivatives
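A minimal sketch of what Stage 1 computes, using only NumPy. The helper name `stage1_reference` is illustrative, not the repository's API:

```python
import numpy as np

def stage1_reference(x, y, n_points=1000):
    """Hypothetical helper: build the dense piecewise-linear reference path."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    order = np.argsort(x)                            # the method assumes x can be sorted
    x_sorted, y_sorted = x[order], y[order]
    x_dense = np.linspace(x_sorted[0], x_sorted[-1], n_points)
    y_ref = np.interp(x_dense, x_sorted, y_sorted)   # piecewise-linear "reference path"
    dy_ref = np.gradient(y_ref, x_dense)             # local slopes along the path
    return x_dense, y_ref, dy_ref
```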
**Stage 2: Dual-constraint regression**
- Applies weighted polynomial regression that simultaneously fits:
  - Function values: f(x) ≈ y_reference
  - Derivative values: f'(x) ≈ dy/dx_reference
- Uses dual constraints to enforce both integral and shape preservation
- Output: a single, smooth, infinitely differentiable polynomial
The method is motivated by a uniqueness principle for smooth functions:
If two smooth functions f(x) and g(x) agree in both their values AND their derivatives on a sufficiently dense set of points, they are approximately the same function.
By constructing a polynomial that matches both the reference path's values and slopes, we create an "equivalent" smooth function that preserves the essential characteristics of the original data.
The Stage 2 regression solves:
minimize: w_p · ||f(x) - y_ref||² + w_d · ||f'(x) - dy/dx_ref||²
where:
- w_p = weight for point fitting (integral preservation)
- w_d = weight for derivative fitting (shape preservation)
- f(x) = polynomial of degree d
This creates an augmented least squares system:
[ w_p · X_points ]         [ w_p · y_ref      ]
[ w_d · X_derivs ] [c]  =  [ w_d · dy/dx_ref  ]

where X_points is the Vandermonde matrix of the sample points, X_derivs its derivative counterpart, and c the vector of polynomial coefficients.
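A minimal sketch of solving this stacked system with NumPy, assuming `x`, `y_ref`, and `dy_ref` come from a Stage 1 reference path as above. The function name is illustrative, not the repository's API:

```python
import numpy as np

def fit_dual_constraints(x, y_ref, dy_ref, degree=5, w_p=1.0, w_d=1.0):
    """Illustrative solver for the augmented least-squares system above."""
    x = np.asarray(x, float)
    # Value rows: row i evaluates f(x_i) = c_0 + c_1·x_i + ... + c_d·x_i^d
    X_points = np.vander(x, degree + 1, increasing=True)
    # Derivative rows: row i evaluates f'(x_i) = c_1 + 2·c_2·x_i + ... + d·c_d·x_i^(d-1)
    powers = np.arange(degree + 1)
    X_derivs = np.zeros_like(X_points)
    X_derivs[:, 1:] = X_points[:, :-1] * powers[1:]
    # Stack both blocks, scaled by their weights, and solve once.
    A = np.vstack([w_p * X_points, w_d * X_derivs])
    b = np.concatenate([w_p * np.asarray(y_ref), w_d * np.asarray(dy_ref)])
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs  # c_0 ... c_d, lowest power first
```

Because the stacked system is solved once in closed form, the result is deterministic: no iterations and no convergence tuning.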
The core insight: Instead of trying to improve a slow method (direct polynomial regression on noisy data), we:
- First use a fast method to "clean" the data and establish clear constraints
- Then use the cleaned data as an oracle for a single, high-quality regression
This is fundamentally different from traditional approaches because:
- ✅ No iterative optimization needed
- ✅ No hyperparameter tuning for regularization
- ✅ Deterministic and reproducible results
- ✅ Scale-independent validation metrics
- Stage 1: O(n) - eliminates noise and establishes reference
- Stage 2: O(n³) - but only executed once with clean data
- No iterations, no convergence issues
- Relative Integral Error: Scale-independent metric for area preservation
- Derivative RMSE: Quantifies shape/slope matching quality
- Adjustable polynomial degree
- Tunable weights for point vs. derivative importance (see the sketch after this list)
- Configurable interpolation density
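For example, raising the derivative weight relative to the point weight biases the fit toward slope fidelity. An illustrative configuration using the constructor shown in Quick Start below (the parameter values here are arbitrary):

```python
from dual_guaranteed_approximation import DualGuaranteedApproximation

# Emphasize slope fidelity over pointwise fit; values are illustrative.
shape_model = DualGuaranteedApproximation(
    degree=7,
    n_interpolation_points=2000,  # denser Stage 1 reference path
    derivative_weight=5.0,        # weight slope matching 5x more than values
    point_weight=1.0,
)
```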
Built-in metrics that go beyond simple error measurement:
- Integral preservation (mathematical correctness)
- Derivative matching (physical interpretation)
- Coefficient analysis (model interpretability)
```bash
pip install numpy scipy matplotlib
```
Requirements:
- Python ≥ 3.7
- NumPy ≥ 1.18
- SciPy ≥ 1.4
- Matplotlib ≥ 3.1
```python
from dual_guaranteed_approximation import DualGuaranteedApproximation
import numpy as np

# Generate noisy data
x = np.linspace(0, 10, 30)
y = np.sin(x) + 0.5 * x + np.random.normal(0, 0.2, len(x))

# Fit the model
model = DualGuaranteedApproximation(
    degree=5,                     # Polynomial degree
    n_interpolation_points=1000,  # Density of Stage 1 sampling
    derivative_weight=1.0,        # Weight for slope matching
    point_weight=1.0,             # Weight for value matching
)
model.fit(x, y)

# Make predictions
x_new = np.array([2.5, 5.0, 7.5])
y_pred = model.predict(x_new)
derivatives = model.derivative(x_new)

# Validate results
metrics = model.validate()
print(f"Relative Integral Error: {metrics['relative_integral_error']:.6f}")
print(f"Derivative RMSE: {metrics['derivative_rmse']:.6f}")

# Visualize
model.plot_results(x, y, show_detailed=True)
```
This method is particularly valuable for:
**Animation and graphics**
- Path smoothing for motion trajectories
- Interpolation with velocity constraints
- Smooth camera movements

**Signal processing**
- Noise reduction while preserving frequency characteristics
- Feature extraction from time-series data
- Signal reconstruction with derivative constraints

**Robotics and control**
- Trajectory planning with acceleration limits
- Smooth path following
- Motion profile generation

**Scientific computing**
- Curve fitting for experimental data
- Physical model approximation
- Differentiation of noisy measurements (see the sketch after this list)

**Data science**
- Feature engineering for machine learning
- Trend extraction from noisy time-series
- Smoothing with shape preservation
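As one concrete illustration, differentiating noisy measurements with the Quick Start API (the signal, seed, and parameter values are arbitrary choices for this sketch):

```python
import numpy as np
from dual_guaranteed_approximation import DualGuaranteedApproximation

# Recover a smooth derivative from noisy samples of cos(t).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
signal = np.cos(t) + rng.normal(0, 0.1, t.size)

model = DualGuaranteedApproximation(
    degree=6, n_interpolation_points=1000,
    derivative_weight=1.0, point_weight=1.0,
)
model.fit(t, signal)
slope = model.derivative(t)  # smooth estimate of d(signal)/dt ≈ -sin(t)
```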
The method provides two key validation metrics:
**Relative Integral Error**

error = |Integral(smooth) - Integral(reference)| / |Integral(reference)|

- Scale-independent
- Measures area preservation (computed as sketched below)
- Target: ≈ 0 (typically < 0.001)
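A minimal sketch of this computation using NumPy's trapezoidal rule (the function name is illustrative; the repository exposes the value through `model.validate()`):

```python
import numpy as np

def relative_integral_error(x_dense, y_smooth, y_ref):
    """Illustrative computation of the metric above via trapezoidal integration."""
    area_smooth = np.trapz(y_smooth, x_dense)  # area under the smooth fit
    area_ref = np.trapz(y_ref, x_dense)        # area under the Stage 1 reference
    return abs(area_smooth - area_ref) / abs(area_ref)
```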
**Derivative RMSE**

error = sqrt(mean((dy/dx_smooth - dy/dx_reference)²))

- Measures shape preservation (computed as sketched below)
- Lower values indicate better slope matching
- Target: minimize while maintaining smoothness
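And the corresponding sketch for the RMSE (again illustrative, not the repository's internal code):

```python
import numpy as np

def derivative_rmse(dy_smooth, dy_ref):
    """Illustrative computation: root-mean-square slope mismatch."""
    return np.sqrt(np.mean((np.asarray(dy_smooth) - np.asarray(dy_ref)) ** 2))
```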
Criterion | Direct Regression | Spline Interpolation | This Method
---|---|---|---
Speed | ❌ Slow | ✅ Fast | ✅ Fast
Smoothness | ✅ Perfect | ✅ Good | ✅ Perfect
Integral Accuracy | ❌ Poor | ❌ Not guaranteed | ✅ Guaranteed
Derivative Control | ❌ None | ⚠️ Implicit only | ✅ Explicit
No Iterations | ❌ No | ✅ Yes | ✅ Yes
Validation Metrics | ❌ None | ❌ None | ✅ Dual-constraint
- Polynomial degree must be chosen manually (no automatic selection)
- Not optimized for very large datasets (n > 10⁶)
- Assumes data can be sorted along x-axis
- Best for smooth underlying functions
If you use this method in your research, please cite:
```bibtex
@software{dual_guaranteed_approximation,
  author = {Arsalan Peiman},
  title  = {Dual-Guaranteed Approximation: A Two-Stage Curve Fitting Method},
  year   = {2025},
  url    = {https://github.com/Normal0arsalan/piecewise-regression}
}
```
Contributions are welcome! Areas of interest:
- Performance benchmarks against existing methods
- Real-world application examples
- Extensions to higher dimensions
- Alternative optimization formulations
Please open an issue or submit a pull request.
This method was developed to address the computational bottlenecks in traditional curve fitting while maintaining mathematical rigor. The key insight—using fast interpolation as a preprocessing step for high-quality regression—emerged from observations about the trade-offs between speed and accuracy in numerical methods.
Thank you 🤗