In [1]:
# We first use tsfresh/TSFEL to generate some features and get a feel for what we are dealing with:
#   1. tsfresh takes long-format rows of (id, time, feature values), plus one label per unique id, as input.
#   2. tsfresh does not test or preprocess the input time series, so make them stationary yourself first.
#   3. Each id is one independent time series, long or short.
#   4. Each id has exactly 1 label, which is at odds with the idea that a time series should carry a label at every time step.
#   5. However, at each timestamp we have short-memory and long-memory features. Short-memory features are equivalent to sliding a short window over the series and emitting one scalar label per window, which is exactly how tsfresh/TSFEL work.
#   6. Indeed, it may be best practice to use these tools for short-memory features and to handcraft long-memory or more hidden features.
#   7. These tools cannot capture cross-sectional features between ids. Generating features for 1 id at a time works exactly the same; passing all ids together is just more computationally efficient because the work can be parallelized.
#   8. Do features that work on a long time series work equally well, on average, on the many short rolling windows it is split into? Not necessarily.
#   9. So how do you evaluate whether features generated this way work consistently over time? Either average the statistical performance across windows, or train a model and then compare the model weights.
#   10. Try window lengths from short to long when assessing feature importance (e.g. FFT does not work well on short windows).
#   11. To evaluate the effect of a feature you do not need a huge number of samples, just enough to make sure the feature is stationary and, preferably, normally distributed.
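
Points 3, 5 and 7 can be made concrete with a small, library-free sketch. It is not tsfresh itself: the helper names (`roll_windows`, `window_features`), the three features, and the toy data are assumptions made purely for illustration. It takes long-format rows `(id, time, value)`, cuts each id into sliding windows, and computes one feature row per window. In tsfresh the analogous steps are `roll_time_series` followed by `extract_features`.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy long-format data: one row per (id, time, value). Values are made up.
rows = [
    ("A", 0, 1.0), ("A", 1, 2.0), ("A", 2, 4.0), ("A", 3, 3.0),
    ("B", 0, 10.0), ("B", 1, 10.0), ("B", 2, 12.0),
]

def roll_windows(rows, window):
    """Yield ((id, end_time), values) for every full sliding window of each id."""
    by_id = defaultdict(list)
    for series_id, t, value in rows:
        by_id[series_id].append((t, value))
    for series_id, points in by_id.items():
        points.sort()  # order by time within each id
        for end in range(window, len(points) + 1):
            chunk = points[end - window:end]
            yield (series_id, chunk[-1][0]), [v for _, v in chunk]

def window_features(values):
    """A few short-memory summary features for a single window."""
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "last_minus_first": values[-1] - values[0],
    }

# One feature row per (id, window end time); windows never mix ids.
features = {key: window_features(vals) for key, vals in roll_windows(rows, window=3)}
for key, feats in features.items():
    print(key, feats)
```

Each `(id, end_time)` key can then be paired with its own scalar label (for example, the value at the next time step), which is how one long series turns into many `(window, label)` samples.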

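One way to act on points 8 and 9 is to score a feature separately in consecutive time blocks and then look at the mean and spread of those scores, rather than at a single full-sample number. The sketch below uses the Pearson correlation between feature and label as the score. The block length, the helper names (`pearson`, `blockwise_scores`), and the synthetic data are all assumptions for illustration, not part of any library.

```python
import random
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def blockwise_scores(feature, label, block):
    """Correlation of feature vs. label in consecutive, non-overlapping blocks."""
    return [
        pearson(feature[i:i + block], label[i:i + block])
        for i in range(0, len(feature) - block + 1, block)
    ]

# Synthetic example: the label is the feature plus noise, so the link is stable.
rng = random.Random(0)
feature = [rng.gauss(0, 1) for _ in range(600)]
label = [f + rng.gauss(0, 1) for f in feature]

scores = blockwise_scores(feature, label, block=100)
avg, spread = mean(scores), pstdev(scores)
print(f"per-block correlation: mean={avg:.2f}, std={spread:.2f}")
# A feature that only works in one regime shows a large spread or sign flips.
print("blocks with positive correlation:", sum(s > 0 for s in scores), "of", len(scores))
```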
| **Method**                                       | **Approach**                                | **Pattern Type Learned**                 | **Interpretable?**            | **Best Use Case**                                      | **Tools / Libraries**                |
| ------------------------------------------------ | ------------------------------------------- | ---------------------------------------- | ----------------------------- | ------------------------------------------------------ | ------------------------------------ |
| **Shapelet Transform**                           | Distance-based, supervised                  | Local subsequence "shapes"               | ✅ High                        | Finding interpretable, discriminative patterns         | `tslearn`, `sktime`, `pyts`          |
| **1D Convolutional Neural Networks (CNNs)**      | Deep learning                               | Localized filters (motifs)               | ⚠️ Limited                    | Predicting future outcomes from raw price shapes       | `Keras`, `PyTorch`                   |
| **Dynamic Time Warping + Supervised Clustering** | Similarity + outcome aggregation            | Whole series similarity (flexible time)  | ✅ Medium                      | Pattern grouping + average label scoring               | `tslearn`, `dtaidistance`, `HDBSCAN` |
| **Siamese / Triplet Networks**                   | Metric learning                             | Latent similarity between time windows   | ⚠️ Medium                     | Learning similarity among high-performing windows      | `PyTorch`, `TensorFlow`              |
| **Time Series Forest (TSF)**                     | Tree ensemble over random intervals         | Random intervals + summaries             | ⚠️ Partial                    | Strong classification baseline for sequences           | `sktime`, `pyts`                     |
| **ROCKET / MiniROCKET / MultiROCKET**            | Random convolutional kernels                | Statistical response to many filters     | ❌ No                          | Very fast, accurate classification/regression          | `sktime`, `aeon`                     |
| **Bag-of-SFA Symbols (BOSS)**                    | Symbolic Fourier Approx.                    | Frequency of symbolic subsequences       | ✅ High                        | When symbolic patterns matter (e.g., zigzags)          | `sktime`, `pyts`                     |
| **HIVE-COTE 2.0**                                | Ensemble of diverse time series classifiers | Multiple feature types                   | ✅ Partial (individual models) | State-of-the-art accuracy on many tasks                | `sktime`                             |
| **RNNs / LSTMs / Transformers**                  | Deep sequential models                      | Temporal dynamics, memory, long patterns | ❌ No                          | Capturing long-term dependencies                       | `PyTorch`, `Keras`, `Hugging Face`   |
| **Autoencoder + Regressor**                      | Latent pattern extraction                   | Abstract embeddings                      | ⚠️ Medium                     | Unsupervised pretraining + supervised label prediction | `PyTorch`, `scikit-learn`            |
| **TDE (Temporal Dictionary Ensemble)**           | Ensemble of SFA dictionary classifiers      | Frequency of symbolic (SFA) words        | ✅ High                        | Detecting repeated motifs with outcome correlation     | `sktime`                             |
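
To illustrate what "whole series similarity (flexible time)" means in the Dynamic Time Warping row, here is a minimal pure-Python sketch of the classic DTW recurrence. It assumes an absolute-difference cost and no warping-window constraint; the function name `dtw_distance` and the toy series are made up for this example. Libraries such as `tslearn` and `dtaidistance` provide optimized implementations and would be the practical choice.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference cost, no window constraint."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

bump = [0, 1, 2, 1, 0]
delayed_bump = [0, 0, 1, 2, 1, 0]  # same shape, shifted by one step
flat = [1, 1, 1, 1, 1]

print(dtw_distance(bump, delayed_bump))  # the shift is absorbed by warping
print(dtw_distance(bump, flat))          # a different shape stays far away
```

Because the time shift costs nothing under warping, windows with the same shape but slightly different timing end up close together, which is what makes DTW a reasonable distance for grouping patterns before averaging their labels.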
