- Generate npy from csv: csv2npy.py
- Generate fft features with prepare_fft_feature.py
- Preprocess (normalize and concatenate adjacent-epoch features) with preprocessing_ffts
- Model
  - 4.1. Run cross-validation with cnn_cv_single_epoch.py or cnn_cv_multi_epoch.py
  - 4.2. Make predictions with cnn_pred_single_epoch.py or cnn_pred_multi_epoch.py
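
The preprocessing steps above (FFT features, normalization, concatenation of adjacent-epoch features) can be sketched roughly as follows. This is a minimal illustration with NumPy, not the code in prepare_fft_feature.py or preprocessing_ffts: the function names, the edge-padding choice, and the random input are all assumptions made for the example.

```python
import numpy as np


def fft_features(epochs):
    """Magnitude spectrum per epoch: (n_epochs, n_samples) -> (n_epochs, n_samples // 2 + 1)."""
    return np.abs(np.fft.rfft(epochs, axis=1))


def normalize(features, eps=1e-8):
    """Standardize each frequency bin across epochs (zero mean, unit variance)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)


def concat_adjacent(features, context=1):
    """Join each epoch's features with those of `context` neighbours on each side.

    Assumption: the first and last epochs are padded by repeating themselves.
    """
    padded = np.pad(features, ((context, context), (0, 0)), mode="edge")
    n_epochs = features.shape[0]
    windows = [padded[i:i + n_epochs] for i in range(2 * context + 1)]
    return np.concatenate(windows, axis=1)


# Hypothetical input: 10 epochs of 64 samples each (stand-in for the .npy data).
rng = np.random.default_rng(0)
epochs = rng.standard_normal((10, 64))

feats = concat_adjacent(normalize(fft_features(epochs)), context=1)
print(feats.shape)  # (10, 99): 3 epochs x 33 frequency bins
```

With `context=1`, each row holds the previous, current, and next epoch's features side by side, which is one plausible input layout for a multi-epoch model.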
Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 |
---|---|---|---|---|
Visualization & statistics | Preprocessing & feature extraction | Simple XGBoost model | CNN-based model | RNN-based model |
Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 | Stage 6 |
---|---|---|---|---|---|
Visualization & statistics | Data preprocessing: abnormal data / noise (auto-encoder / FFT) | Slicing & statistics | Feature extraction (manual, auto-encoder, FFT) | Aggregation model | Sequence model |
Stage 1 | Stage 2 | Stage 3 | Stage 4 |
---|---|---|---|
Code refactoring | Handling imbalanced data | Feature selection | Model selection and hyperparameter tuning |
- https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/discussion/19240#110095
- https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/discussion/20247#latest-356655
- https://www.kaggle.com/c/bnp-paribas-cardif-claims-management/discussion/20258#latest-133476
- https://www.kaggle.com/c/telstra-recruiting-network/discussion/19239#latest-381687
- https://www.kaggle.com/c/prudential-life-insurance-assessment/discussion/19003#latest-229720
- https://www.kaggle.com/c/otto-group-product-classification-challenge/discussion/14335#latest-622005
- https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings/discussion/18918#latest-627461
- https://www.kaggle.com/c/mlsp-2014-mri/discussion/9854#latest-568751
- https://github.com/diefimov/santander_2016/blob/master/README.pdf
Filling method | Feature count before selection | Feature selection model (2nd) | Model | Outlier detection |
---|---|---|---|---|
Random forest | 200 | 126 (Lasso selection with alpha = 0.02) | Ensemble | Ask Chen Le |
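
As a rough illustration of the Lasso selection step in the table above, the sketch below fits a Lasso by coordinate descent and keeps the features whose coefficients are non-zero. Only `alpha = 0.02` comes from the table. The solver, the function names, and the synthetic data are assumptions for the example, and it assumes centered features and target (no intercept).

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold` (the L1 proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            residual += X[:, j] * w[j]  # remove feature j's current contribution
            rho = X[:, j] @ residual / n_samples
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * w[j]  # add the updated contribution back
    return w


# Synthetic data (assumption): only features 0 and 5 influence the target.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
X -= X.mean(axis=0)
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 0.05 * rng.standard_normal(200)
y -= y.mean()

w = lasso_coordinate_descent(X, y, alpha=0.02)
selected = np.flatnonzero(w != 0).tolist()  # indices of the features to keep
print(selected)
```

In practice, scikit-learn's `Lasso` combined with `SelectFromModel` covers the same idea without a hand-written solver.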
Stage 1 | Stage 2 | Stage 3 | Stage 4 | Stage 5 |
---|---|---|---|---|
Code refactoring | Filling missing data | Outlier detection | Feature selection | Model selection and hyperparameter tuning |
Ensemble Reference: https://www.kaggle.com/serigne/stacked-regressions-top-4-on-leaderboard