- Step 1: Pre-Processing {Refer to file Final/TheDataMoguls_Stage1.ipynb}
  This file contains code for:
  - preprocessing
  - model fitting
  - LightGBM classifier: obtained an AUC of 0.72
  - feature extraction
  - visualization
  Since the dataset is large, running this file takes around 15 minutes.
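The notebook itself is not reproduced in this README. As an illustration of the kind of preprocessing a dataset this large needs, the sketch below downcasts numeric columns to smaller dtypes to cut memory use; the column names are hypothetical, not taken from the notebook.

```python
import numpy as np
import pandas as pd

def reduce_memory(df):
    """Downcast numeric columns to the smallest dtype that holds
    their values, so a large dataset fits in memory."""
    for col in df.select_dtypes(include=["int64"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    for col in df.select_dtypes(include=["float64"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    return df

# Hypothetical toy frame standing in for the real dataset.
df = pd.DataFrame({"counter": [1, 2, 3], "score": [0.5, 1.5, 2.5]})
df = reduce_memory(df)
print(df.dtypes)  # counter: int8, score: float32
```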
- Step 2: Data Preparation {Refer to file Final/TheDataMoguls_DataPreparation.ipynb}
  This file contains code for merging the original dataset with the external data provided in the Kaggle notebook (https://www.kaggle.com/cdeotte/external-data-malware-0-50).
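A merge like the one in this step can be sketched with pandas. The key column name (`MachineIdentifier`) and the external column are assumptions for illustration, not read from the notebook:

```python
import pandas as pd

# Toy frames standing in for the original and external datasets;
# the column names here are assumptions, not the notebook's.
original = pd.DataFrame({
    "MachineIdentifier": ["m1", "m2", "m3"],
    "HasDetections": [0, 1, 0],
})
external = pd.DataFrame({
    "MachineIdentifier": ["m1", "m2", "m3"],
    "ExternalTimestamp": [1.0, 2.0, 3.0],
})

# A left merge keeps every row of the original dataset and attaches
# the external columns where the identifiers match.
combined = original.merge(external, on="MachineIdentifier", how="left")
print(combined.shape)  # → (3, 3)
```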
- Step 3: LSTM Classifier {Refer to file Final/TheDataMoguls_LSTM.ipynb}
  This file contains code for the LSTM model applied to the combined dataset. The accuracy obtained was 50%.
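The README does not record the notebook's framework or architecture, so rather than guess at it, here is the LSTM recurrence itself, one time step in plain NumPy, to show what the model computes internally (shapes and initialization are illustrative):

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: four gates are computed from the current
    input x and the previous hidden state h_prev."""
    z = W @ x + U @ h_prev + b           # stacked gate pre-activations
    n = len(h_prev)
    i = 1 / (1 + np.exp(-z[:n]))         # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))      # forget gate
    o = 1 / (1 + np.exp(-z[2*n:3*n]))    # output gate
    g = np.tanh(z[3*n:])                 # candidate cell state
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                        # illustrative sizes
W = rng.normal(size=(4 * n_hid, n_in))
U = rng.normal(size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in),
                 np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```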
- Step 4: AdaBoost Classifier {Refer to file Final/TheDataMoguls_Adaboost.ipynb}
  This file contains code for the AdaBoost classifier applied to the combined dataset. The accuracy obtained was 55%.
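An AdaBoost classifier of this kind can be sketched with scikit-learn; the toy data and hyperparameters below are placeholders, not the notebook's actual settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the combined malware dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# AdaBoost fits a sequence of weak learners (shallow trees by default),
# re-weighting the samples each round toward those previously misclassified.
clf = AdaBoostClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(round(acc, 3))
```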
- Step 5: LightGBM Classifier {Refer to file Final/TheDataMoguls_LightGBM.ipynb}
  This file contains code for the LightGBM model applied to the combined dataset. The AUC obtained was 0.57.
  LightGBM does not require categorical features to be one-hot or label encoded, it is fast on large datasets, and it suits this time-dependent classification task well.