# LANL Earthquake Prediction
Can you predict upcoming laboratory earthquakes?

![map](https://storage.googleapis.com/kaggle-media/competitions/LANL/nik-shuliahin-585307-unsplash.jpg)

Forecasting earthquakes is one of the most important problems in Earth science because of their devastating consequences. Current scientific studies related to earthquake forecasting focus on three key points: **when** the event will occur, **where** it will occur, and **how large** it will be.

In this competition, you will address **when** the earthquake will take place. Specifically, you’ll predict the time remaining before laboratory earthquakes occur from real-time seismic data.

If this challenge is solved and the physics are ultimately shown to scale from the laboratory to the field, researchers will have the potential to improve earthquake hazard assessments that could save lives and billions of dollars in infrastructure.

This challenge is hosted by [Los Alamos National Laboratory](https://www.lanl.gov/), which enhances national security by ensuring the safety of the U.S. nuclear stockpile, developing technologies to reduce threats from weapons of mass destruction, and solving problems related to energy, environment, infrastructure, health, and global security concerns.

**Acknowledgments:**

![LANL](https://storage.googleapis.com/kaggle-competitions/kaggle/11000/logos/thumb76_76.png?t=2019-01-03-23-31-16)

**Geophysics Group:** The competition builds on initial work from Bertrand Rouet-Leduc, Claudia Hulbert, and Paul Johnson. B. Rouet-Leduc prepared the data for the competition.

[![Penn State](https://storage.googleapis.com/kaggle-media/competitions/LANL/PS-HOR-RGB-2C.png)](https://www.psu.edu//)

**Department of Geosciences:** Data are from experiments performed by Chas Bolton, Jacques Riviere, Paul Johnson and Prof. Chris Marone.

[![Purdue](https://storage.googleapis.com/kaggle-media/competitions/LANL/PurdueCropped.png)](https://www.purdue.edu/)

**Department of Physics & Astronomy:** This competition stemmed from the DOE Council workshop “Information is in the Noise: Signatures of Evolving Fracture and Fracture Networks”, held in March 2018 and organized by Prof. Laura J. Pyrak-Nolte.

**Department of Energy**

**Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences and Biosciences Division:** The Geosciences core research.

Photo by Nik Shuliahin on Unsplash

Link: https://www.kaggle.com/competitions/LANL-Earthquake-Prediction

In [1]:
import numpy as np
import pandas as pd
from pathlib import Path
from tqdm.notebook import tqdm
from catboost import CatBoostRegressor, Pool, sum_models, to_regressor
from sklearn.model_selection import KFold, train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

In [2]:
%load_ext nb_black

In [3]:
scaler = StandardScaler()

In [4]:
sample_submission_df = pd.read_csv(
    "../../data/LANL-Earthquake-Prediction/sample_submission.csv"
).set_index("seg_id")
sample_submission_df

seg_id,time_to_failure
seg_00030f,0
seg_0012b5,0
seg_00184e,0
seg_003339,0
seg_0042cc,0
...,...
seg_ff4236,0
seg_ff7478,0
seg_ff79d9,0
seg_ffbd6a,0


In [5]:
train_df = pd.read_csv(
    "../../data/LANL-Earthquake-Prediction/train_prepared.csv"
).set_index("seg_id")
train_df

seg_id,mean,std,max,min,mean_change_abs,mean_change_rate,abs_max,abs_min,std_first_50000,std_last_50000,...,min_roll_mean_1000,q01_roll_mean_1000,q05_roll_mean_1000,q95_roll_mean_1000,q99_roll_mean_1000,av_change_abs_roll_mean_1000,abs_max_roll_mean_1000,av_change_rate_roll_std_1000,av_change_rate_roll_mean_1000,target
seq_1630,4.107287,6.480113,145,-136,0.000000,-0.027335,145,0,7.836271,5.689157,...,3.368,3.500,3.755,4.467,4.606,-7.651007e-07,4.780,74636.781485,74636.614967,0.345198
seq_2808,4.466447,3.153520,68,-65,0.000020,0.089663,68,0,3.147644,2.916550,...,3.896,4.029,4.145,4.816,4.944,-6.040268e-08,5.125,74323.679238,74323.590862,10.548500
seq_3552,4.657553,5.043771,240,-201,0.000047,0.086331,240,0,2.820486,3.875171,...,3.543,4.079,4.282,5.036,5.195,1.053691e-06,5.281,74685.306689,74682.808668,13.706999
seq_845,4.884513,7.607336,149,-135,0.000053,0.030698,149,0,8.977789,8.621918,...,3.870,4.292,4.467,5.340,5.527,1.483221e-06,5.850,74541.657276,74540.147038,3.123896
seq_3337,4.760460,4.649864,143,-96,0.000033,0.082509,143,0,3.548759,6.118750,...,4.125,4.271,4.413,5.154,5.404,9.060403e-07,5.582,74609.219499,74607.875394,7.332897
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
seq_1250,4.607953,3.419499,73,-67,0.000000,0.086249,73,0,3.075176,3.372700,...,3.927,4.087,4.195,5.005,5.186,-4.496644e-07,5.466,74584.023781,74583.720903,0.036798
seq_3550,4.643567,4.430010,161,-115,0.000013,0.082237,161,0,3.713703,3.154168,...,3.761,3.948,4.060,5.130,5.233,4.093960e-07,5.380,74337.677991,74337.647333,13.785696
seq_1279,4.633420,4.032830,126,-95,-0.000020,0.075549,126,0,3.350287,5.080004,...,3.962,4.138,4.245,5.018,5.182,-5.637584e-07,5.313,74514.158415,74514.362947,6.962798
seq_814,4.907873,4.094302,64,-52,-0.000033,0.065538,64,0,4.043301,4.273086,...,4.029,4.276,4.518,5.276,5.408,1.697987e-06,5.558,74491.330173,74493.512766,4.331597


In [6]:
test_df = pd.read_csv(
    "../../data/LANL-Earthquake-Prediction/test_prepared.csv"
).set_index("seg_id")
test_df

seg_id,mean,std,max,min,mean_change_abs,mean_change_rate,abs_max,abs_min,std_first_50000,std_last_50000,...,max_roll_mean_1000,min_roll_mean_1000,q01_roll_mean_1000,q05_roll_mean_1000,q95_roll_mean_1000,q99_roll_mean_1000,av_change_abs_roll_mean_1000,abs_max_roll_mean_1000,av_change_rate_roll_std_1000,av_change_rate_roll_mean_1000
seg_e5c033,3.942853,5.259674,95,-82,-0.000073,-0.037545,95,0,5.449322,4.589241,...,4.682,3.229,3.467,3.598,4.319,4.451,1.389262e-06,4.682,74523.248722,74521.949339
seg_74537f,3.854553,5.112639,83,-78,-0.000020,-0.047260,83,0,5.199925,3.726957,...,4.648,3.275,3.390,3.537,4.203,4.380,1.268456e-06,4.648,74346.990413,74348.891220
seg_5009d9,4.370267,7.194114,149,-139,0.000067,-0.021931,149,0,5.491165,5.823179,...,5.314,3.466,3.656,3.967,4.814,4.987,-1.718121e-06,5.314,74746.926510,74747.193733
seg_cc7a19,3.799813,7.241903,170,-138,-0.000033,-0.077529,170,0,8.823961,4.709930,...,4.772,2.810,3.355,3.458,4.179,4.276,-2.550336e-07,4.772,74327.728054,74328.160760
seg_abb03a,4.182333,5.515830,141,-138,0.000000,-0.023226,141,0,6.163469,5.194700,...,5.055,3.132,3.348,3.697,4.620,4.721,2.604027e-06,5.055,74597.028976,74597.458084
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
seg_06d7ba,4.327347,8.357528,245,-280,0.000013,-0.014522,280,0,10.606502,7.690559,...,5.597,3.378,3.783,3.931,4.720,4.836,7.986577e-07,5.597,74633.246043,74632.616225
seg_1d980f,3.942400,6.597127,158,-141,-0.000040,-0.053942,158,0,7.324298,6.860940,...,5.232,2.931,3.265,3.366,4.625,4.921,-6.395973e-06,5.232,74523.117383,74522.521642
seg_217eed,4.096700,10.373600,360,-251,0.000027,-0.055808,360,0,5.974448,6.722152,...,6.879,2.314,3.464,3.667,4.490,4.637,-1.208054e-06,6.879,74595.332978,74594.434906
seg_b08e9d,4.252653,3.199164,69,-59,0.000007,0.075376,69,0,3.079637,2.811069,...,4.888,3.494,3.754,3.902,4.614,4.755,4.322148e-06,4.888,74422.349095,74423.027285


# Prepare

In [7]:
X_test = test_df
X_test.shape

(2624, 138)

In [8]:
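# NOTE: this fits the scaler on the test features themselves. The training
# features get their own fit further down, so the two sets are standardized
# with different statistics.
X_test[X_test.columns] = scaler.fit_transform(X_test)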
X_test

seg_id,mean,std,max,min,mean_change_abs,mean_change_rate,abs_max,abs_min,std_first_50000,std_last_50000,...,max_roll_mean_1000,min_roll_mean_1000,q01_roll_mean_1000,q05_roll_mean_1000,q95_roll_mean_1000,q99_roll_mean_1000,av_change_abs_roll_mean_1000,abs_max_roll_mean_1000,av_change_rate_roll_std_1000,av_change_rate_roll_mean_1000
seg_e5c033,-0.815617,-0.162929,-0.255670,0.247493,-1.356792,-0.552591,-0.253094,0.0,-0.088439,-0.346953,...,-0.163706,0.054030,-0.193765,-0.513107,-0.837391,-0.591700,0.558760,-0.163706,0.149637,0.141970
seg_74537f,-1.165627,-0.179747,-0.299075,0.261224,-0.389373,-0.790399,-0.292006,0.0,-0.106171,-0.555749,...,-0.172549,0.068827,-0.345798,-0.732184,-1.257543,-0.747114,0.511962,-0.172549,-1.053902,-1.040474
seg_5009d9,0.878598,0.058327,-0.060347,0.051832,1.182682,-0.170423,-0.077993,0.0,-0.085464,-0.048163,...,0.000665,0.130269,0.179405,0.812128,0.955498,0.581563,-0.644997,0.000665,1.676970,1.680984
seg_cc7a19,-1.382610,0.063793,0.015613,0.055265,-0.631228,-1.531270,-0.009898,0.0,0.151503,-0.317729,...,-0.140299,-0.080755,-0.414904,-1.015906,-1.344470,-0.974762,-0.078217,-0.140299,-1.185431,-1.182118
seg_abb03a,0.133653,-0.133631,-0.089283,0.055265,-0.026591,-0.202127,-0.103934,0.0,-0.037662,-0.200345,...,-0.066696,0.022827,-0.428725,-0.157556,0.252830,-0.000691,1.029343,-0.066696,0.653429,0.657894
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
seg_06d7ba,0.708469,0.191396,0.286895,-0.432171,0.215264,0.010918,0.346789,0.0,0.278244,0.404010,...,0.074268,0.101961,0.430160,0.682837,0.615030,0.251036,0.329968,0.074268,0.900729,0.898117
seg_1d980f,-0.817414,-0.009955,-0.027793,0.044967,-0.752155,-0.953958,-0.048810,0.0,0.044875,0.203124,...,-0.020662,-0.041832,-0.592604,-1.346317,0.270940,0.437094,-2.457131,-0.020662,0.148740,0.145880
seg_217eed,-0.205787,0.421989,0.702862,-0.332624,0.457118,-0.999616,0.606198,0.0,-0.051102,0.169517,...,0.407693,-0.240310,-0.199689,-0.265299,-0.218029,-0.184560,-0.447404,0.407693,0.641848,0.637238
seg_b08e9d,0.412393,-0.398606,-0.349715,0.326444,0.094336,2.211348,-0.337402,0.0,-0.256926,-0.777525,...,-0.110130,0.139276,0.372901,0.578686,0.231098,0.073733,1.694919,-0.110130,-0.539333,-0.533929


In [9]:
X_train = train_df.drop("target", axis=1)
X_train.shape

(4195, 138)

In [10]:
X_train[X_train.columns] = scaler.fit_transform(X_train)
X_train

seg_id,mean,std,max,min,mean_change_abs,mean_change_rate,abs_max,abs_min,std_first_50000,std_last_50000,...,max_roll_mean_1000,min_roll_mean_1000,q01_roll_mean_1000,q05_roll_mean_1000,q95_roll_mean_1000,q99_roll_mean_1000,av_change_abs_roll_mean_1000,abs_max_roll_mean_1000,av_change_rate_roll_std_1000,av_change_rate_roll_mean_1000
seq_1630,-1.609992,-0.007959,-0.067873,0.049767,0.002659,-1.228181,-0.084373,0.0,0.220153,-0.054544,...,-0.274702,-0.043952,-0.903219,-1.285883,-1.650748,-1.102628,-0.277643,-0.274702,0.183255,0.183232
seq_2808,-0.207127,-0.399188,-0.350030,0.317634,0.334985,1.585990,-0.343762,0.0,-0.364299,-0.370887,...,-0.164986,0.178254,0.197815,0.121206,-0.384541,-0.337305,-0.012621,-0.164986,-0.198015,-0.197955
seq_3552,0.539328,-0.176882,0.280243,-0.195464,0.778086,1.505833,0.235652,0.0,-0.405081,-0.261513,...,-0.115375,0.029696,0.301882,0.615492,0.413642,0.231026,0.406366,-0.115375,0.242345,0.239484
seq_845,1.425825,0.124610,-0.053215,0.053539,0.888861,0.167677,-0.070898,0.0,0.362447,0.280072,...,0.065577,0.167312,0.745210,1.282957,1.516584,0.982764,0.567903,0.065577,0.067420,0.065758
seq_3337,0.941277,-0.223208,-0.075201,0.200678,0.556536,1.413907,-0.091110,0.0,-0.314299,-0.005529,...,-0.019652,0.274627,0.701501,1.088129,0.841757,0.704258,0.350837,-0.019652,0.149692,0.148234
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
seq_1250,0.345592,-0.367907,-0.331708,0.310089,0.002659,1.503864,-0.326918,0.0,-0.373333,-0.318843,...,-0.056542,0.191300,0.318533,0.301602,0.301170,0.210648,-0.159014,-0.056542,0.119011,0.118820
seq_3550,0.484697,-0.249065,-0.009243,0.128995,0.224210,1.407370,-0.030474,0.0,-0.293738,-0.343776,...,-0.083891,0.121440,0.029225,-0.185467,0.754683,0.317068,0.164060,-0.083891,-0.180969,-0.180837
seq_1279,0.445064,-0.295776,-0.137496,0.204451,-0.329667,1.246493,-0.148378,0.0,-0.339039,-0.124046,...,-0.105199,0.206029,0.424682,0.481998,0.348336,0.201591,-0.201922,-0.105199,0.033934,0.034359
seq_814,1.517068,-0.288546,-0.364687,0.366680,-0.551217,1.005701,-0.357237,0.0,-0.252653,-0.216112,...,-0.027284,0.234226,0.711908,1.466961,1.284386,0.713316,0.648671,-0.027284,0.006136,0.008968
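
The two scaling cells above each call `fit_transform`, so the train and test features are standardized with different statistics. A more conventional arrangement estimates the mean and standard deviation on the training features only and reuses them for the test features, which is what `scaler.fit(X_train)` followed by `scaler.transform(X_test)` would do. The sketch below shows that arithmetic with plain NumPy on made-up arrays; the names `train`, `test`, `mu` and `sigma` are illustrative and do not come from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=4.5, scale=2.0, size=(100, 3))  # stand-in for X_train
test = rng.normal(loc=4.0, scale=2.5, size=(40, 3))  # stand-in for X_test

# Estimate the statistics on the training data only.
# StandardScaler also uses the population standard deviation (ddof=0).
mu = train.mean(axis=0)
sigma = train.std(axis=0)

train_scaled = (train - mu) / sigma
test_scaled = (test - mu) / sigma  # reuse the training statistics

print(train_scaled.mean(axis=0).round(6))  # close to zero by construction
print(test_scaled.mean(axis=0).round(3))  # generally not zero
```

With this arrangement a given raw feature value maps to the same scaled value in both sets. Tree-based models such as CatBoost split on thresholds, so they are sensitive to a mismatch between the two scalings.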


In [11]:
y_train = train_df[["target"]].copy()
y_train.describe()

,target
count,4195.0
mean,5.68367
std,3.673246
min,0.006398
25%,2.635348
50%,5.358796
75%,8.1775
max,16.103196


In [12]:
# Train on log1p(target): RMSE in this space corresponds to RMSLE on the original target
y_train["target"] = np.log1p(y_train["target"])
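
Training on `log1p(target)` means that predictions come out in log space and have to be mapped back with `expm1`, the exact inverse of `log1p`. The small NumPy check below uses the min, median and max from the `describe()` summary above; the variable names are illustrative.

```python
import numpy as np

t = np.array([0.006398, 5.358796, 16.103196])  # min, 50% and max of the target

y = np.log1p(t)  # log(1 + t), the transformed training target
back = np.expm1(y)  # exp(y) - 1, the exact inverse of log1p
shifted = np.exp(y)  # exp(y) = 1 + t, one unit too large in the original scale

print(back)
print(shifted - t)
```

Inverting with plain `np.exp` would add exactly 1 to every value, which matters for a target whose median is about 5.4.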

In [13]:
X_train, X_true, y_train, y_true = train_test_split(
    X_train, y_train, test_size=0.1, random_state=42
)
X_train.shape, X_true.shape, y_train.shape, y_true.shape

((3775, 138), (420, 138), (3775, 1), (420, 1))

# Train

## Hyperparameter tuning

In [14]:
model = CatBoostRegressor(logging_level="Silent")

# https://docs.aws.amazon.com/sagemaker/latest/dg/catboost-tuning.html
tuned_params = {
    "learning_rate": [
        0.001,
        0.002,
        0.003,
        0.004,
        0.005,
        0.006,
        0.007,
        0.008,
        0.009,
        0.01,
    ],
    "depth": [4, 5, 6, 7, 8, 9, 10],
    "l2_leaf_reg": [2, 3, 4, 5, 6, 7, 8, 9, 10],
    "random_strength": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "iterations": [500, 600, 700, 800, 900, 1000],
}

grid_search_result = model.randomized_search(
    tuned_params, Pool(X_train, y_train), verbose=False, plot=True
)

In [15]:
best_model_params = grid_search_result["params"]
best_model_params

{'depth': 9,
 'l2_leaf_reg': 8,
 'iterations': 1000,
 'random_strength': 3.0,
 'learning_rate': 0.008}
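
The grid above contains 10 × 7 × 9 × 11 × 6 = 41,580 combinations, so evaluating all of them is impractical; a randomized search scores only a small random sample. The sketch below shows the sampling idea using only the standard library. `sample_params` is a hypothetical helper and not part of CatBoost, whose `randomized_search` additionally takes care of the train/validation split and the scoring of each candidate.

```python
import math
import random

# Same search space as in the tuning cell, written compactly.
tuned_params = {
    "learning_rate": [round(0.001 * k, 3) for k in range(1, 11)],
    "depth": list(range(4, 11)),
    "l2_leaf_reg": list(range(2, 11)),
    "random_strength": [float(k) for k in range(0, 11)],
    "iterations": list(range(500, 1001, 100)),
}

# Size of the full grid: the product of the number of options per parameter.
grid_size = math.prod(len(values) for values in tuned_params.values())
print(grid_size)  # 41580


def sample_params(space, n_iter, seed=0):
    """Draw n_iter random combinations (hypothetical helper, repeats possible)."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_iter)
    ]


candidates = sample_params(tuned_params, n_iter=10)
for params in candidates[:3]:
    print(params)
```

Only a handful of the 41,580 combinations are ever scored this way, so the reported best parameters are the best of the sample, not of the grid.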

## Feature selection

In [16]:
importance_df = pd.DataFrame(
    {
        "Column": X_train.columns,
        "Score": model.get_feature_importance(),
    }
).sort_values(by="Score", ascending=False)

X_sf = X_train[importance_df["Column"]]
y_sf = y_train

X_sf.shape, y_sf.shape

((3775, 138), (3775, 1))

In [17]:
importance_df

,Column,Score
99,q05_roll_std_100,6.487594
121,q05_roll_std_1000,5.072706
76,q01_roll_std_10,4.570004
98,q01_roll_std_100,4.400302
77,q05_roll_std_10,4.104194
...,...,...
69,q999,0.174422
36,abs_q95,0.159388
26,count_big,0.051889
7,abs_min,0.000000
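
Because `importance_df` is sorted by score, the first `k` names are the `k` most important features, which is what the `X.iloc[:, :num_features]` slice in the next cell relies on. The sketch below uses a few name and score pairs copied from the table above; `top_k` is a hypothetical helper.

```python
# A few (feature, importance) pairs from the table, deliberately out of order.
scores = {
    "abs_min": 0.000000,
    "q05_roll_std_100": 6.487594,
    "count_big": 0.051889,
    "q01_roll_std_10": 4.570004,
    "q05_roll_std_1000": 5.072706,
}


def top_k(score_by_name, k):
    """Return the k feature names with the highest importance (hypothetical helper)."""
    ranked = sorted(score_by_name, key=score_by_name.get, reverse=True)
    return ranked[:k]


print(top_k(scores, 3))
# ['q05_roll_std_100', 'q05_roll_std_1000', 'q01_roll_std_10']
```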


In [18]:
num_list = list(range(10, X_sf.shape[1], 3))


def select_features_loop(X, y, num_features=10):
    X = X.iloc[:, :num_features]

    X_sub_train, X_sub_val, y_sub_train, y_sub_val = train_test_split(
        X, y, test_size=0.1, shuffle=False, random_state=42
    )

    model = CatBoostRegressor(**best_model_params, logging_level="Silent")
    model.fit(
        Pool(X_sub_train, y_sub_train),
        eval_set=Pool(X_sub_val, y_sub_val),
        verbose=False,
    )

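    # Score on the hold-out split, restricted to the columns this model was trained on.
    # np.sqrt(MSE) gives the RMSE without the `squared=False` argument, which is
    # deprecated in recent scikit-learn releases.
    score = np.sqrt(mean_squared_error(y_true, model.predict(X_true[X.columns])))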

    return [num_features, score]


loss_list = []
for num_features in tqdm(num_list):
    loss_values = select_features_loop(X_sf, y_sf, num_features)
    loss_list.append(loss_values)

num_features_df = (
    pd.DataFrame(loss_list, columns=["num_features", "score"])
    .set_index("num_features")
    .sort_values(by="score")
)
num_features_df.head(10)

num_features,score
31,0.485927
97,0.486847
28,0.486855
37,0.486989
106,0.48705
10,0.48728
43,0.48731
13,0.487334
124,0.487335
100,0.487351


In [19]:
# X_train = X_train.iloc[:, :97]
# X_train.columns

## Loop

In [20]:
kf = KFold(n_splits=5)

In [21]:
ensemble = []

for i, (train_index, val_index) in enumerate(kf.split(X_train)):
    X_sub_train, X_sub_val = X_train.iloc[train_index], X_train.iloc[val_index]
    y_sub_train, y_sub_val = y_train.iloc[train_index], y_train.iloc[val_index]

    model = CatBoostRegressor(**best_model_params, logging_level="Silent")

    model.fit(
        Pool(X_sub_train, y_sub_train),
        eval_set=Pool(X_sub_val, y_sub_val),
        verbose=False,
    )

    ensemble.append(model)
    print(model.best_score_)

{'learn': {'RMSE': 0.43246460281026383}, 'validation': {'RMSE': 0.4944214772186618}}
{'learn': {'RMSE': 0.43634412091130975}, 'validation': {'RMSE': 0.46809686512402593}}
{'learn': {'RMSE': 0.4211571640964347}, 'validation': {'RMSE': 0.5430729933344842}}
{'learn': {'RMSE': 0.4279563166797775}, 'validation': {'RMSE': 0.5059561410872376}}
{'learn': {'RMSE': 0.43602785971092345}, 'validation': {'RMSE': 0.4776016643188267}}
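
`KFold(n_splits=5)` without `shuffle=True` cuts the rows into five contiguous blocks in their current order, and each block serves once as the validation set. The pure-Python sketch below reproduces that index layout for the 3,775 training rows; `contiguous_folds` is an illustrative helper, not scikit-learn code.

```python
def contiguous_folds(n_samples, n_splits):
    """Yield (train_idx, val_idx) index lists the way an unshuffled K-fold does."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds absorb the remainder, one row each.
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


folds = list(contiguous_folds(3775, 5))
for train_idx, val_idx in folds:
    print(len(train_idx), len(val_idx), val_idx[0], val_idx[-1])
```

Each model therefore trains on 3,020 rows and is validated on the remaining 755, which is one reason the validation RMSE varies from fold to fold.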


In [22]:
models_avrg = to_regressor(
    sum_models(ensemble, weights=[1.0 / len(ensemble)] * len(ensemble))
)
models_avrg

<catboost.core.CatBoostRegressor at 0x7fe56c291fd0>
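
With equal weights of `1 / len(ensemble)`, the summed model should return the plain average of the five fold models' predictions. The NumPy sketch below illustrates that arithmetic on made-up prediction vectors; `fold_preds` is hypothetical data, not output from this notebook.

```python
import numpy as np

# One row per fold model, one column per sample (made-up log-space predictions).
fold_preds = np.array(
    [
        [1.9, 1.3, 2.2],
        [2.0, 1.4, 2.1],
        [1.8, 1.2, 2.3],
        [2.1, 1.3, 2.2],
        [1.7, 1.3, 2.2],
    ]
)

# Equal weights that sum to one, as in the cell above.
weights = np.full(len(fold_preds), 1.0 / len(fold_preds))

# A weighted sum of predictions with these weights is their mean.
blended = weights @ fold_preds
print(blended)
```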

# Validate

In [23]:
val_df = pd.DataFrame(
    {
        # Invert the log1p transform with expm1; np.exp would overstate every value by exactly 1.
        "True": np.expm1(y_true["target"]),
        "Pred": np.expm1(models_avrg.predict(X_true)),
    }
)
val_df["Diff"] = val_df["True"] - val_df["Pred"]
val_df

seg_id,True,Pred,Diff
seq_3673,8.992399,3.596844,5.395554
seq_3937,8.165699,9.144746,-0.979047
seq_1040,8.219896,5.227696,2.992200
seq_335,14.138100,8.369640,5.768459
seq_602,3.735396,3.494477,0.240919
...,...,...,...
seq_3800,4.043799,3.406569,0.637230
seq_2540,9.966597,7.397354,2.569243
seq_2498,0.176098,5.589622,-5.413524
seq_2548,9.654897,6.712045,2.942853


In [24]:
y_true["target"].describe()

count    420.000000
mean       1.714235
std        0.657042
min        0.067096
25%        1.302093
50%        1.819870
75%        2.222249
max        2.813874
Name: target, dtype: float64

In [25]:
val_df["Diff"].describe()

count    420.000000
mean       0.575268
std        2.631749
min       -8.961155
25%       -1.141636
50%        0.187076
75%        2.194831
max        7.654584
Name: Diff, dtype: float64

In [28]:
# RMSE in the original time units
np.sqrt(mean_squared_error(val_df["True"], val_df["Pred"]))

2.6908262782763432
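
The value above is an RMSE in the original time units. The competition itself scores submissions by mean absolute error, so it can be useful to look at both. The NumPy sketch below uses only the nine `Diff` values visible in the truncated table above, so its numbers will not match the full 420-row result.

```python
import numpy as np

# The nine residuals (True - Pred) shown in the truncated validation table.
diff = np.array(
    [5.395554, -0.979047, 2.992200, 5.768459, 0.240919,
     0.637230, 2.569243, -5.413524, 2.942853]
)

rmse = np.sqrt(np.mean(diff**2))  # penalizes large misses more heavily
mae = np.mean(np.abs(diff))  # the competition's evaluation metric

print(rmse, mae)
```

MAE is never larger than RMSE on the same residuals, and a wide gap between the two points to a few large misses dominating the error.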

# Submission

In [29]:
y_preds_avrg = models_avrg.predict(X_test)
y_preds_avrg

array([1.96367014, 1.89664576, 1.33549159, ..., 1.16420999, 2.19821801,
       1.66554375])

In [30]:
submission = pd.DataFrame(
    # expm1 inverts the log1p transform applied to the training target.
    {"seg_id": X_test.index, "time_to_failure": np.expm1(y_preds_avrg)}
).set_index("seg_id")
submission

seg_id,time_to_failure
seg_e5c033,6.125430
seg_74537f,5.663506
seg_5009d9,2.801864
seg_cc7a19,3.213722
seg_abb03a,4.579360
...,...
seg_06d7ba,2.735242
seg_1d980f,2.745206
seg_217eed,2.203391
seg_b08e9d,8.008945


In [31]:
submission.to_csv("../../data/LANL-Earthquake-Prediction/submission.csv")
