Skip to content

SIakovlev/LANL-Earthquake

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LANL-Earthquake

LANL Earthquake Prediction Challenge: https://www.kaggle.com/c/LANL-Earthquake-Prediction#description

Date: 11 Feb 2019

Table of contents

  1. Team
  2. Abstract
  3. Introduction
    1. Project structure
    2. Setup
  4. Pipeline architecture
    1. Data processing
    2. Validation
    3. Testing
  5. Results
  6. Summary

Team

Name Email Responsibilities
Denish Shitov - -
Alexey Shaymanov - -
Pavel Tolmachev - -
Sergey Iakovlev siakovlev@student.unimelb.edu.au -

Abstract

Introduction

Project structure

Under the /src directory there is the following structure:

  • /configs - configs for data processing, model training and validation.
  • /data_processing
    • dp_run.py - main data processing script with parameters specified in dp_config.json.
    • feature.py - Feature class implementation.
    • dp_utils.py -
  • /folds - directory with custom fold implementation
  • /models - custom models that follow model.py interface
  • /preproc - ?
  • /test_directory - the main script generating test prediction results
  • /validation - directory with training and validation scripts:
    • train_single_model.py - script for training of a single model with parameters specified in /configs/train_config.json
    • validation_run.py- script for training of a single/multiple models with parameters specified in /configs/validation_config.json

Configs

Data processing, model training and generation of test results can be managed via the following configs (in /configs dir):

  • dp_config.json
    • A user needs to specify directories with the original dataframe (data_dir), output directory for the processed dataframe (data_processed_dir), and their names (data_fname and data_processed_fname);
    • There are two global parameters: window_size and window_stride that are used by default during each feature calculation. Note, these parameters can be overriden by each feature locally (see below).
    • features is the list of features to be calculated. In the configuration file, each feature has 3 parameters:
      • name - feature name;
      • on - the feature caluculation can be enabled or disabled;
      • functions - a dictionary of functions with corresponding parameters from feature.py. An example of a single feature is provided below:
    {
       "name": "q_05_std_rolling_50",
       "on": true,
       "functions": {
         "r_std": {
           "window_size": 50,
           "window_stride": null
         },
         "w_quantile": {
           "q": 0.05
         }
       }
     }
    
  • train_config.json or validation_config.json
  • multi_test_config.json

Setup

Windows

Mac OS

Linux

Results

Notes:

  • Fold: CustomFold(n_splits=1, shuffle=True, fragmentation=0, pad=150)
  • 10 runs is equivalent to 90 model trainings

Best performing models:

Feature config Model Params 10 runs score(std) 100 runs score(std) 300 runs score(std) Public score
e9 XGBRegressor {'booster': 'gbtree', 'colsample_bytree': 1.0, 'eta': 0.177, 'eval_metric': 'mae', 'gamma': 0.93, 'max_depth': 5, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 65001, 'silent': 1, 'subsample': 0.65, 'tree_method': 'gpu_hist'} 2.0092 - 2.1364 (0.7928) 1.647
e9 XGBRegressor {'booster': 'gbtree', 'colsample_bytree': 1.00, 'eta': 0.165, 'eval_metric': 'mae', 'gamma': 0.95, 'max_depth': 5, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 0, 'silent': 1, 'subsample': 0.60, 'tree_method': 'gpu_hist'} 2.0096 - 2.1368 (0.7929) 1.646
e7 XGBRegressor {'booster': 'gbtree', 'colsample_bytree': 0.59, 'eta': 0.273, 'eval_metric': 'mae', 'gamma': 0.82, 'max_depth': 4, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 0, 'silent': 1, 'subsample': 0.78, 'tree_method': 'gpu_hist'} 2.0105 2.0917 (0.7735) 2.1393 (0.7966) 1.650

Other models:

Feature config Model Params 10 runs score(std) 100 runs score(std) 300 runs score(std) Public score
e1 XGBRegressor {'booster': 'gbtree', 'colsample_bytree': 0.8, 'eta': 0.07, 'eval_metric': 'mae', 'gamma': 0.6, 'max_depth': 4, 'min_child_weight': 3, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 0, 'silent': 1, 'subsample': 1.0, 'tree_method': 'gpu_hist'} 2.011 2.092 2.1397 (0.7959) 1.650
e6 XGBRegressor {'booster': 'gbtree', 'colsample_bytree': 0.95, 'eta': 0.015, 'eval_metric': 'mae', 'gamma': 0.75, 'max_depth': 6, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 314159265, 'silent': 1, 'subsample': 0.95, 'tree_method': 'gpu_hist'} 2.013 2.094 2.1418 1.680
e3 XGBRegressor {'booster': 'gbtree', 'colsample_bytree': 0.7, 'eta': 0.15, 'eval_metric': 'mae', 'gamma': 0.75, 'max_depth': 4, 'min_child_weight': 4, 'n_estimators': 21, 'objective': 'gpu:reg:linear', 'seed': 314159265, 'silent': 1, 'subsample': 0.8, 'tree_method': 'gpu_hist'} 2.031 2.1099 2.1556 (0.7787) -
e6 AdaBoost {'learning_rate': 0.23, 'loss': 'linear', 'n_estimators': 13, 'random_state': 0} 2.0728 - - -

About

LANL Earthquake Prediction Challenge

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •