![qlib](figure/qlib_flow.png)
![junk](figure/junk.png)
# Categories
- Gradient Boosting Decision Trees (GBDT):

    - XGBoost (Tianqi Chen, et al. KDD 2016)
    - LightGBM (Guolin Ke, et al. NIPS 2017)
    - CatBoost (Liudmila Prokhorenkova, et al. NIPS 2018)
    - DoubleEnsemble based on LightGBM (Chuheng Zhang, et al. ICDM 2020)
- Neural Networks (NN):

    - Feedforward Neural Networks (MLP) based on PyTorch
    - Recurrent Neural Networks (RNN):
        - Long Short-Term Memory (LSTM) based on PyTorch (Sepp Hochreiter, et al. Neural computation 1997)
        - Gated Recurrent Unit (GRU) based on PyTorch (Kyunghyun Cho, et al. 2014)
        - Attention-based LSTM (ALSTM) based on PyTorch (Yao Qin, et al. IJCAI 2017)
        - Kernelized Recurrent Neural Networks (KRNN) based on PyTorch
        - Adaptive Recurrent Neural Networks (ADARNN) based on PyTorch (YunTao Du, et al. 2021)
    - Graph Neural Networks (GNN):
        - Graph Attention Networks (GATs) based on PyTorch (Petar Velickovic, et al. 2017)
    - Convolutional Neural Networks (CNN):
        - Temporal Convolutional Networks (TCN) based on PyTorch (Shaojie Bai, et al. 2018)
    - Transformer-based Models:
        - Transformer based on PyTorch (Ashish Vaswani, et al. NeurIPS 2017)
        - Localformer based on PyTorch (Juyong Jiang, et al.)
    - Spatial-Temporal Models:
        - HIST, a graph-based framework for mining concept-oriented shared information, based on PyTorch (Wentao Xu, et al. 2021)
    - Attentional Models:
        - Augmented Disentanglement Distillation (ADD) based on PyTorch (Hongshun Tang, et al. 2020)
        - Instance-wise Graph-based Framework for Multivariate Time Series Forecasting (IGMTF) based on PyTorch (Wentao Xu, et al. 2021)
    - Other NN architectures:
        - Sandwich based on PyTorch
- Other Models:
    - Temporal Fusion Transformers (TFT) based on TensorFlow (Bryan Lim, et al. International Journal of Forecasting 2019)
    - TabNet based on PyTorch (Sercan O. Arik, et al. AAAI 2019)
    - Temporal Routing Adaptor (TRA) based on PyTorch (Hengxu Lin, et al. KDD 2021)
    - State Frequency Memory (SFM) based on PyTorch (Liheng Zhang, et al. KDD 2017)

# Strengths & Weaknesses

## Gradient Boosting Decision Trees (GBDT):

### XGBoost:

- Pros:
    - Well-optimized, efficient implementation with parallelization and regularization techniques.
    - Often achieves state-of-the-art results in structured/tabular data competitions.
    - Supports various objective functions and custom evaluation metrics.
    - Offers advanced features like monotonicity constraints and handling missing values.

- Cons:
    - Can be computationally expensive for large datasets due to its inherently sequential nature.
    - Prone to overfitting if not carefully tuned, especially with noisy or imbalanced data.
    - Limited support for handling categorical features directly without preprocessing.
    - Interpretability diminishes with increasing model complexity.
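
The sequential cost noted above follows from how boosting works: each round fits the errors left by all previous rounds, so rounds cannot run independently. The following is a minimal pure-Python sketch with decision stumps and squared error. It is not XGBoost's implementation (no regularization, second-order gradients, or parallel split finding), and every name in it is illustrative.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split on a 1-D feature with the lowest squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps one after another; each round needs the previous round's predictions."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, stumps, preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data, assumed for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
base, stumps, preds = boost(xs, ys)
baseline_error = mse(ys, [base] * len(ys))
boosted_error = mse(ys, preds)
```

Libraries such as XGBoost parallelize the work inside each round (split finding across features), but the loop over rounds stays sequential.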

### LightGBM:

- Pros:
    - Optimized for speed and efficiency with distributed computing and GPU support.
    - Handles categorical features natively with efficient tree building algorithms.
    - Lower memory usage than other GBDT implementations, thanks to histogram-based feature binning, making it suitable for large datasets.
    - Offers built-in feature importance calculation.

- Cons:
    - Sensitive to hyperparameters, particularly the number of leaves and maximum depth.
    - May require more data preprocessing compared to XGBoost for optimal performance.
    - Limited support for custom loss functions and evaluation metrics compared to XGBoost.
    - Less mature compared to XGBoost in terms of community support and documentation.
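
The sensitivity to the number of leaves and maximum depth comes partly from how the two interact: a binary tree of depth `d` has at most `2**d` leaves, so a leaf budget above that bound is never reached. A small sanity check, assuming LightGBM's parameter names `num_leaves` and `max_depth` and its convention that a non-positive `max_depth` means no limit. The helper itself is hypothetical and not part of LightGBM.

```python
def leaf_budget_ok(num_leaves, max_depth):
    """Return True if num_leaves can be reached within max_depth.

    A binary tree of depth d has at most 2**d leaves. A max_depth <= 0
    is treated as "no depth limit".
    """
    if max_depth <= 0:
        return True
    return num_leaves <= 2 ** max_depth


# 31 leaves fit in depth 5 (at most 32 leaves) but not in depth 4 (at most 16).
fits_depth_5 = leaf_budget_ok(31, 5)
fits_depth_4 = leaf_budget_ok(31, 4)
```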

### CatBoost:

- Pros:
    - Handles categorical features automatically without manual preprocessing.
    - More robust to overfitting than plain gradient boosting, thanks to ordered boosting and symmetric (oblivious) trees.
    - Supports GPU training for faster computation.
    - Ships built-in utilities such as an overfitting detector, cross-validation, and grid search for parameter tuning.

- Cons:
    - Slower training speed compared to other GBDT libraries, especially with large datasets.
    - Limited support for custom loss functions and evaluation metrics compared to XGBoost.
    - Higher memory usage compared to LightGBM due to its categorical feature handling.
    - May not consistently outperform XGBoost and LightGBM in all scenarios.
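
The automatic categorical handling mentioned above rests on ordered target statistics: each row's category is encoded using only the targets of rows that came before it, which avoids leaking the row's own label. The sketch below shows that idea in pure Python under simplifying assumptions: it uses the given row order rather than random permutations, and the function name and the `prior_weight` parameter are illustrative rather than CatBoost's API.

```python
def ordered_target_stats(categories, targets, prior, prior_weight=1.0):
    """Encode each category from the targets of earlier rows only."""
    sums, counts = {}, {}
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean of previously seen targets for this category.
        encoded.append((seen_sum + prior_weight * prior) / (seen_count + prior_weight))
        # Only now add the current row, so it never encodes itself.
        sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


categories = ["a", "b", "a", "a", "b"]
targets = [1, 0, 0, 1, 1]
encoded = ordered_target_stats(categories, targets, prior=0.5)
```

The first occurrence of each category falls back to the prior, and later occurrences drift toward the running mean of the targets seen so far.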

## Neural Networks (NN):

### Feedforward Neural Networks (MLP) based on PyTorch:

- Pros:
    - Highly flexible architecture suitable for various tasks including regression, classification, and function approximation.
    - Easy to implement and experiment with different network structures and hyperparameters.
    - Effective for modeling non-linear relationships and complex patterns in data.
    - Suitable for both small and large datasets.

- Cons:
    - Prone to overfitting, especially with deep architectures and limited data.
    - Requires careful tuning of hyperparameters like learning rate, batch size, and regularization.
    - Interpretability is limited compared to simpler models like linear regression or decision trees.
    - Training can be computationally intensive, particularly for deep networks.
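
As a concrete instance of the non-linear modeling point above, a two-unit hidden layer with ReLU activations can represent XOR, which no linear model can. The weights below are set by hand for illustration rather than learned, and the function names are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def tiny_mlp(x1, x2):
    """One hidden layer with two ReLU units and a linear output."""
    h1 = relu(x1 + x2)        # active when at least one input is on
    h2 = relu(x1 + x2 - 1.0)  # active only when both inputs are on
    return h1 - 2.0 * h2


xor_table = {(a, b): tiny_mlp(a, b) for a in (0.0, 1.0) for b in (0.0, 1.0)}
```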

### Recurrent Neural Networks (RNN):

- Pros:
    - Suitable for sequential data processing tasks such as time series prediction, natural language processing, and speech recognition.
    - Can capture temporal dependencies, including long-range ones when gated variants such as LSTM and GRU are used.
    - Offers various architectures like LSTM and GRU, each with its advantages for different tasks.
    - Supports variable-length inputs and outputs.

- Cons:
    - Prone to vanishing and exploding gradients, which can hinder training stability and learning long-term dependencies.
    - Computationally intensive, especially with deep architectures and long sequences.
    - Sensitive to the choice of activation functions and initialization methods.
    - Processes time steps one after another, which limits parallelism, and can lose information over long sequences.
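
The vanishing and exploding gradient problem can be seen in the simplest possible case. For a scalar linear recurrence `h_t = w * h_{t-1} + x_t`, the gradient of the last state with respect to the first is `w` multiplied by itself once per step. This is a toy sketch with illustrative names. Real RNNs have weight matrices and non-linearities, but the repeated-product effect is the same.

```python
def gradient_through_time(recurrent_weight, n_steps):
    """d h_T / d h_0 for h_t = w * h_{t-1} + x_t is w multiplied n_steps times."""
    grad = 1.0
    for _ in range(n_steps):
        grad *= recurrent_weight
    return grad


vanishing = gradient_through_time(0.9, 100)  # shrinks toward zero
exploding = gradient_through_time(1.1, 100)  # grows without bound
```

Gated cells such as LSTM and GRU mitigate the vanishing case by adding a path through which the state can pass with a factor close to one.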

### Graph Neural Networks (GNN):

- Pros:
    - Effective for learning representations of graph-structured data such as social networks, molecular graphs, and citation networks.
    - Can capture both node-level and graph-level information through message passing mechanisms.
    - Offers flexibility in designing architectures for different graph-based tasks.
    - State-of-the-art performance in tasks like node classification, link prediction, and graph classification.

- Cons:
    - Limited scalability for large graphs due to computational complexity.
    - Vulnerable to over-smoothing and information loss in deep architectures.
    - Requires careful design and tuning of aggregation functions and neighborhood sampling strategies.
    - Interpretability can be challenging, especially with deep GNN architectures.
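
Both message passing and over-smoothing can be illustrated with a mean aggregator on a four-node path graph. This is a pure-Python sketch under simplifying assumptions: real GNN layers also apply learned weights and non-linearities, and the names here are illustrative.

```python
def mean_aggregate(features, neighbors):
    """One message-passing round: each node averages itself and its neighbors."""
    updated = {}
    for node, nbrs in neighbors.items():
        group = [node] + nbrs
        updated[node] = sum(features[n] for n in group) / len(group)
    return updated


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features.values()) - min(features.values())


# Path graph 0 - 1 - 2 - 3 with two distinct groups of feature values.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = {0: 0.0, 1: 0.0, 2: 1.0, 3: 1.0}
initial_spread = spread(features)

for _ in range(50):
    features = mean_aggregate(features, neighbors)
final_spread = spread(features)
```

After many rounds all nodes carry nearly the same value, which is the over-smoothing effect: stacking more layers makes node representations harder to tell apart.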

### Convolutional Neural Networks (CNN):

- Pros:
    - Effective for tasks involving grid-like structured data such as images, video frames, and 1D signals.
    - Can automatically learn hierarchical features through convolutional and pooling layers.
    - Convolutions are translation-equivariant, and pooling adds approximate translation invariance, which makes them suitable for tasks like object recognition and image classification.
    - Supports transfer learning and fine-tuning pretrained models on new datasets.

- Cons:
    - Limited in handling sequential and graph-structured data compared to RNNs and GNNs.
    - Requires large amounts of labeled data for training, especially for deep architectures.
    - Vulnerable to issues like overfitting, especially with small datasets and complex models.
    - Interpretability can be challenging, particularly with deep convolutional architectures.
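
The translation property listed among the pros can be checked directly on a 1D signal: shifting the input shifts the convolution output by the same amount, and a global max-pool over that output is unchanged. A minimal sketch with an illustrative helper and toy data:

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, as used in CNN layers."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]


kernel = [1, 0, -1]
signal = [0, 0, 1, 2, 1, 0, 0, 0]
shifted = [0, 0, 0, 1, 2, 1, 0, 0]  # same pattern, moved one step right

out = conv1d(signal, kernel)
out_shifted = conv1d(shifted, kernel)

# The feature map moves with the input; the pooled value does not change.
feature_map_moved = out_shifted[1:] == out[:-1]
pooled_equal = max(out) == max(out_shifted)
```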

### Transformer-based Models:

- Pros:
    - Highly parallelizable architecture suitable for processing long sequences with self-attention mechanisms.
    - State-of-the-art performance in various natural language processing tasks including machine translation, text generation, and sentiment analysis.
    - Supports efficient training and inference through techniques like attention masking and scaled dot-product attention.
    - Offers flexibility in model size and depth, allowing trade-offs between performance and computational resources.

- Cons:
    - Computationally expensive: self-attention cost grows quadratically with sequence length, and large vocabularies add to the cost in language tasks.
    - Limited ability to handle structured data and non-sequence data compared to CNNs and GNNs.
    - Interpretability can be challenging due to the complexity of attention mechanisms and multiple layers.
    - Requires large-scale pretraining data for achieving state-of-the-art performance, limiting applicability in low-resource scenarios.
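
Scaled dot-product attention with masking, mentioned in the pros above, can be sketched in pure Python. The example applies a causal mask so that step `t` only attends to steps `0..t`, which is how a forecasting model avoids looking into the future. Function names and the toy input are illustrative, and a real implementation would add learned projections and multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries, keys, values):
    """Scaled dot-product attention where step t only sees steps 0..t."""
    scale = math.sqrt(len(keys[0]))
    all_weights, outputs = [], []
    for t, query in enumerate(queries):
        # Masking: score only the keys at positions <= t.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in keys[: t + 1]
        ]
        # Masked (future) positions get zero weight.
        weights = softmax(scores) + [0.0] * (len(keys) - t - 1)
        all_weights.append(weights)
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return all_weights, outputs


# Toy sequence of three 2-D vectors used as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, outputs = causal_attention(x, x, x)
```

The weight matrix has one entry per pair of positions, which is where the quadratic cost in sequence length comes from.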

### Spatial-Temporal Models:

- Pros:
    - Effective for capturing both spatial and temporal dependencies in data such as video sequences, sensor data, and spatiotemporal graphs.
    - Offers flexibility in designing architectures for different tasks including video action recognition, trajectory prediction, and dynamic graph modeling.
    - Can capture both short-term and long-term dependencies through convolutional and recurrent layers.
    - State-of-the-art performance in tasks like video understanding, human pose estimation, and traffic forecasting.

- Cons:
    - Computationally intensive, especially with large-scale datasets and complex architectures.
    - Requires careful design and tuning of hyperparameters like kernel sizes, dilation rates, and temporal receptive fields.
    - Vulnerable to issues like overfitting, especially with limited training data and deep architectures.
    - Interpretability can be challenging due to the complexity of spatial-temporal relationships and model architectures.
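
Kernel size and dilation rates jointly determine how far back in time a convolutional stack can see. The helper below is a sketch of that relationship, assuming one convolution per layer with stride 1 and the same kernel size throughout. Architectures with several convolutions per block have a larger receptive field than this formula gives. The function name is illustrative.

```python
def receptive_field(kernel_size, dilations):
    """Time steps visible to one output of a stack of dilated convolutions.

    Each layer with dilation d extends the field by (kernel_size - 1) * d.
    """
    return 1 + (kernel_size - 1) * sum(dilations)


# Doubling dilations cover far more history than undilated layers of equal depth.
dilated = receptive_field(3, [1, 2, 4, 8])
undilated = receptive_field(3, [1, 1, 1, 1])
```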

### Attentional Models:

- Pros:
    - Effective for modeling complex relationships and dependencies in data through attention mechanisms.
    - Can selectively attend to relevant parts of input data, improving performance in tasks like machine translation, question answering, and image captioning.
    - Offers flexibility in incorporating attention mechanisms into various neural network architectures including RNNs, CNNs, and Transformers.
    - State-of-the-art performance in tasks requiring long-range dependencies and context understanding.

- Cons:
    - Computationally expensive, especially with large-scale datasets and deep architectures.
    - Vulnerable to issues like overfitting, especially with limited training data and complex attention mechanisms.
    - Requires careful design and tuning of attention mechanisms, including the number of heads and layers and the choice of scoring function.
    - Interpretability can be challenging due to the complexity of attention distributions and aggregation methods.

## Other Models (TFT, TabNet, SFM, etc.):

- Pros:
    - Specialized architectures tailored for specific tasks like time series forecasting and tabular data processing.
    - Can offer state-of-the-art performance in their respective domains with appropriate data preprocessing and model tuning.
    - Offers flexibility in incorporating domain-specific knowledge and constraints into model architectures.
    - Can provide interpretable insights into model predictions and feature importance through specialized techniques and visualizations.

- Cons:
    - Limited applicability outside their specialized domains, potentially requiring additional effort for adaptation to new tasks and datasets.
    - May require specific data preprocessing or feature engineering steps for optimal performance.
    - Performance heavily depends on hyperparameter tuning and architecture choices, requiring domain expertise and experimentation.
    - Less established compared to mainstream architectures like GBDT, NNs, and Transformers, potentially leading to fewer resources and community support.
