This repository introduces a novel optimization strategy for fitting smooth, differentiable functions to noisy or complex data while simultaneously preserving two critical mathematical properties:
- Integral preservation (total area under the curve)
- Derivative preservation (rate of change / shape characteristics)
The method combines computational efficiency with mathematical rigor through a two-stage pipeline that transforms a traditionally slow, iterative problem into a fast, deterministic solution.
Traditional curve fitting methods face a fundamental trade-off:
Method | Speed | Smoothness | Integral Accuracy | Derivative Accuracy
---|---|---|---|---
Polynomial Regression | Slow (O(n³)) | ✅ Good | ❌ Poor | ❌ Uncontrolled
Spline Interpolation | Fast (O(n)) | ✅ Good | ❌ Not guaranteed | ⚠️ Continuous but unconstrained
Piecewise Linear | Very Fast (O(n)) | ❌ Not smooth | ✅ Excellent | ❌ Discontinuous
Key Challenge: How can we achieve all of the following simultaneously?
- Fast computation (no slow iterative optimization)
- Smooth, differentiable output function
- Guaranteed integral preservation
- Guaranteed derivative matching
This method introduces a hybrid approach that leverages the strengths of both fast interpolation and rigorous regression:
**Stage 1: Reference path construction** (sketched below)
- Uses piecewise linear interpolation to create a fast, clean "reference path"
- Generates a dense set of ordered sample points
- Computes reference derivatives (local slopes)
- Output: a noise-free dataset with known derivatives
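A minimal sketch of what Stage 1 computes, using only NumPy. The helper name `stage1_reference` is illustrative, not the repository's API:

```python
import numpy as np

def stage1_reference(x, y, n_points=1000):
    """Hypothetical helper: build the dense piecewise-linear reference path."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    order = np.argsort(x)                            # the method assumes x can be sorted
    x_sorted, y_sorted = x[order], y[order]
    x_dense = np.linspace(x_sorted[0], x_sorted[-1], n_points)
    y_ref = np.interp(x_dense, x_sorted, y_sorted)   # piecewise-linear "reference path"
    dy_ref = np.gradient(y_ref, x_dense)             # local slopes along the path
    return x_dense, y_ref, dy_ref
```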
**Stage 2: Dual-constraint regression**
- Applies weighted polynomial regression that simultaneously fits:
  - Function values: f(x) ≈ y_reference
  - Derivative values: f'(x) ≈ dy/dx_reference
- Uses dual constraints to enforce both integral and shape preservation
- Output: a single, smooth, infinitely differentiable polynomial
The method is motivated by a uniqueness principle for smooth functions:
If two smooth functions f(x) and g(x) agree in both their values AND their derivatives on a sufficiently dense set of points, they are approximately the same function.
By constructing a polynomial that matches both the reference path's values and slopes, we create an "equivalent" smooth function that preserves the essential characteristics of the original data.
The Stage 2 regression solves:
minimize: w_p · ||f(x) - y_ref||² + w_d · ||f'(x) - dy/dx_ref||²
where:
- w_p = weight for point fitting (integral preservation)
- w_d = weight for derivative fitting (shape preservation)
- f(x) = polynomial of degree d
This creates an augmented least squares system:
[ w_p · X_points ]         [ w_p · y_ref      ]
[ w_d · X_derivs ] [c]  =  [ w_d · dy/dx_ref  ]

where X_points is the Vandermonde matrix of the sample points, X_derivs its derivative counterpart, and c the vector of polynomial coefficients.
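A minimal sketch of solving this stacked system with NumPy, assuming `x`, `y_ref`, and `dy_ref` come from a Stage 1 reference path as above. The function name is illustrative, not the repository's API:

```python
import numpy as np

def fit_dual_constraints(x, y_ref, dy_ref, degree=5, w_p=1.0, w_d=1.0):
    """Illustrative solver for the augmented least-squares system above."""
    x = np.asarray(x, float)
    # Value rows: row i evaluates f(x_i) = c_0 + c_1·x_i + ... + c_d·x_i^d
    X_points = np.vander(x, degree + 1, increasing=True)
    # Derivative rows: row i evaluates f'(x_i) = c_1 + 2·c_2·x_i + ... + d·c_d·x_i^(d-1)
    powers = np.arange(degree + 1)
    X_derivs = np.zeros_like(X_points)
    X_derivs[:, 1:] = X_points[:, :-1] * powers[1:]
    # Stack both blocks, scaled by their weights, and solve once.
    A = np.vstack([w_p * X_points, w_d * X_derivs])
    b = np.concatenate([w_p * np.asarray(y_ref), w_d * np.asarray(dy_ref)])
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs  # c_0 ... c_d, lowest power first
```

Because the stacked system is solved once in closed form, the result is deterministic: no iterations and no convergence tuning.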
The core insight: Instead of trying to improve a slow method (direct polynomial regression on noisy data), we:
- First use a fast method to "clean" the data and establish clear constraints
- Then use the cleaned data as an oracle for a single, high-quality regression
This is fundamentally different from traditional approaches because:
- ✅ No iterative optimization needed
- ✅ No hyperparameter tuning for regularization
- ✅ Deterministic and reproducible results
- ✅ Scale-independent validation metrics
- Stage 1: O(n) - eliminates noise and establishes reference
- Stage 2: O(n³) - but only executed once with clean data
- No iterations, no convergence issues
- Relative Integral Error: Scale-independent metric for area preservation
- Derivative RMSE: Quantifies shape/slope matching quality
- Adjustable polynomial degree
- Tunable weights for point vs. derivative importance (see the sketch after this list)
- Configurable interpolation density
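For example, raising the derivative weight relative to the point weight biases the fit toward slope fidelity. An illustrative configuration using the constructor shown in Quick Start below (the parameter values here are arbitrary):

```python
from dual_guaranteed_approximation import DualGuaranteedApproximation

# Emphasize slope fidelity over pointwise fit; values are illustrative.
shape_model = DualGuaranteedApproximation(
    degree=7,
    n_interpolation_points=2000,  # denser Stage 1 reference path
    derivative_weight=5.0,        # weight slope matching 5x more than values
    point_weight=1.0,
)
```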
Built-in metrics that go beyond simple error measurement:
- Integral preservation (mathematical correctness)
- Derivative matching (physical interpretation)
- Coefficient analysis (model interpretability)
```bash
pip install numpy scipy matplotlib
```
Requirements:
- Python ≥ 3.7
- NumPy ≥ 1.18
- SciPy ≥ 1.4
- Matplotlib ≥ 3.1
```python
from dual_guaranteed_approximation import DualGuaranteedApproximation
import numpy as np

# Generate noisy data
x = np.linspace(0, 10, 30)
y = np.sin(x) + 0.5 * x + np.random.normal(0, 0.2, len(x))

# Fit the model
model = DualGuaranteedApproximation(
    degree=5,                     # Polynomial degree
    n_interpolation_points=1000,  # Density of Stage 1 sampling
    derivative_weight=1.0,        # Weight for slope matching
    point_weight=1.0,             # Weight for value matching
)
model.fit(x, y)

# Make predictions
x_new = np.array([2.5, 5.0, 7.5])
y_pred = model.predict(x_new)
derivatives = model.derivative(x_new)

# Validate results
metrics = model.validate()
print(f"Relative Integral Error: {metrics['relative_integral_error']:.6f}")
print(f"Derivative RMSE: {metrics['derivative_rmse']:.6f}")

# Visualize
model.plot_results(x, y, show_detailed=True)
```
This method is particularly valuable for:
**Animation and graphics**
- Path smoothing for motion trajectories
- Interpolation with velocity constraints
- Smooth camera movements

**Signal processing**
- Noise reduction while preserving frequency characteristics
- Feature extraction from time-series data
- Signal reconstruction with derivative constraints

**Robotics and control**
- Trajectory planning with acceleration limits
- Smooth path following
- Motion profile generation

**Scientific computing**
- Curve fitting for experimental data
- Physical model approximation
- Differentiation of noisy measurements (see the sketch after this list)

**Data science**
- Feature engineering for machine learning
- Trend extraction from noisy time-series
- Smoothing with shape preservation
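As one concrete illustration, differentiating noisy measurements with the Quick Start API (the signal, seed, and parameter values are arbitrary choices for this sketch):

```python
import numpy as np
from dual_guaranteed_approximation import DualGuaranteedApproximation

# Recover a smooth derivative from noisy samples of cos(t).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
signal = np.cos(t) + rng.normal(0, 0.1, t.size)

model = DualGuaranteedApproximation(
    degree=6, n_interpolation_points=1000,
    derivative_weight=1.0, point_weight=1.0,
)
model.fit(t, signal)
slope = model.derivative(t)  # smooth estimate of d(signal)/dt ≈ -sin(t)
```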
The method provides two key validation metrics:
**Relative Integral Error**

error = |Integral(smooth) - Integral(reference)| / |Integral(reference)|

- Scale-independent
- Measures area preservation (computed as sketched below)
- Target: ≈ 0 (typically < 0.001)
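A minimal sketch of this computation using NumPy's trapezoidal rule (the function name is illustrative; the repository exposes the value through `model.validate()`):

```python
import numpy as np

def relative_integral_error(x_dense, y_smooth, y_ref):
    """Illustrative computation of the metric above via trapezoidal integration."""
    area_smooth = np.trapz(y_smooth, x_dense)  # area under the smooth fit
    area_ref = np.trapz(y_ref, x_dense)        # area under the Stage 1 reference
    return abs(area_smooth - area_ref) / abs(area_ref)
```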
**Derivative RMSE**

error = sqrt(mean((dy/dx_smooth - dy/dx_reference)²))

- Measures shape preservation (computed as sketched below)
- Lower values indicate better slope matching
- Target: minimize while maintaining smoothness
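And the corresponding sketch for the RMSE (again illustrative, not the repository's internal code):

```python
import numpy as np

def derivative_rmse(dy_smooth, dy_ref):
    """Illustrative computation: root-mean-square slope mismatch."""
    return np.sqrt(np.mean((np.asarray(dy_smooth) - np.asarray(dy_ref)) ** 2))
```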
Criterion | Direct Regression | Spline Interpolation | This Method
---|---|---|---
Speed | ❌ Slow | ✅ Fast | ✅ Fast
Smoothness | ✅ Perfect | ✅ Good | ✅ Perfect
Integral Accuracy | ❌ Poor | ❌ Not guaranteed | ✅ Guaranteed
Derivative Control | ❌ None | ⚠️ Implicit only | ✅ Explicit
No Iterations | ❌ No | ✅ Yes | ✅ Yes
Validation Metrics | ❌ None | ❌ None | ✅ Dual-constraint
- Polynomial degree must be chosen manually (no automatic selection)
- Not optimized for very large datasets (n > 10⁶)
- Assumes data can be sorted along x-axis
- Best for smooth underlying functions
If you use this method in your research, please cite:
```bibtex
@software{dual_guaranteed_approximation,
  author = {Arsalan Peiman},
  title  = {Dual-Guaranteed Approximation: A Two-Stage Curve Fitting Method},
  year   = {2025},
  url    = {https://github.com/Normal0arsalan/piecewise-regression}
}
```
Contributions are welcome! Areas of interest:
- Performance benchmarks against existing methods
- Real-world application examples
- Extensions to higher dimensions
- Alternative optimization formulations
Please open an issue or submit a pull request.
This method was developed to address the computational bottlenecks in traditional curve fitting while maintaining mathematical rigor. The key insight—using fast interpolation as a preprocessing step for high-quality regression—emerged from observations about the trade-offs between speed and accuracy in numerical methods.
Thank you 🤗