<h2><center>Welcome to the Landslide Prediction Challenge</center></h2>

A landslide is the movement of a mass of rock, debris, or earth (soil) down a slope. As a common natural hazard, it can cause significant loss of life and property.


Hong Kong, a hilly and densely populated city, is frequently affected by extreme rainstorms, making it highly susceptible to rain-induced natural terrain landslides.

<img src = "https://drive.google.com/uc?export=view&id=1-8sSI75AG3HM89nDJEwo6_KJbAEUXS-r">

Landslides are commonly identified by visual interpretation, which is labor-intensive and time-consuming.

***Thus, this hack will focus on automating the landslide identification process using artificial intelligence techniques.***

This will be achieved by using high-resolution terrain information to perform terrain-based landslide identification. Auxiliary data, such as the lithology of the surface materials and the rainfall intensification factor, are also provided.


Table of contents:

1. [Import relevant libraries](#Libraries)
2. [Load files](#Load)
3. [Preview files](#Preview)
4. [Data dictionary](#Dictionary)
5. [Data exploration](#Exploration)
6. [Target distribution](#Target)
7. [Outliers](#Outliers)
8. [Correlations](#Correlations)
9. [Model training](#Model)
10. [Test set predictions](#Predictions)
11. [Creating a submission file](#Submission)
12. [Tips to improve model performance](#Tips)

<a name = "Libraries"></a>
## 1. Import relevant libraries

In [None]:
!pip install catboost --quiet

In [None]:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import f1_score, classification_report, confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.ensemble import RandomForestClassifier
from catboost import CatBoostClassifier
pd.set_option('display.max_columns', None)
import warnings
warnings.filterwarnings('ignore')

<a name = "Load"></a>
## 2. Load files

In [None]:
# Read files to pandas dataframes
train = pd.read_csv('Train.csv')
test = pd.read_csv('Test.csv')
sample_submission = pd.read_csv('SampleSubmission.csv')

<a name = "Preview"></a>
## 3. Preview files

In [None]:
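# Average each feature over its 25 cell columns (e.g. 1_elevation ... 25_elevation)
# Note: geology looks like a categorical code and aspect is an angle,
# so their plain means are only rough summaries
df = pd.DataFrame()
for i in ['elevation', 'geology', 'lsfactor', 'placurv', 'procurv', 'sdoif', 'slope', 'twi', 'aspect']:
  df[i] = train[[x for x in train.columns if i in x]].mean(axis = 1)

# Keep the target next to the aggregated features
df['Label'] = train.Label
df.head()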

Unnamed: 0,elevation,geology,lsfactor,placurv,procurv,sdoif,slope,twi,aspect,Label
0,119.44,3.0,9.013694,0.012659,-0.008915,1.281733,36.830369,3.466513,116.864561,0
1,156.2,3.0,8.013825,0.010663,-0.003822,1.359578,28.524667,4.660537,184.633436,1
2,162.56,2.0,10.958018,0.028086,0.022452,1.365054,38.336258,4.227799,284.942436,0
3,76.4,2.0,3.78572,0.029064,0.008895,1.100818,19.271319,3.737841,173.076448,0
4,109.16,3.8,7.742521,-0.004572,-0.007679,1.284221,28.257938,4.54206,125.512572,0


In [None]:
mainn_cols = ['elevation', 'geology', 'lsfactor', 'placurv', 'procurv', 'sdoif', 'slope', 'twi', 'aspect']

In [None]:
# Build the same per-feature means for the test set
test_df = pd.DataFrame()
for i in mainn_cols:
  test_df[i] = test[[x for x in test.columns if i in x]].mean(axis = 1)

test_df.head()

Unnamed: 0,elevation,geology,lsfactor,placurv,procurv,sdoif,slope,twi,aspect
0,117.0,2.0,6.773817,-0.004682,0.006115,1.310373,28.226128,4.371106,270.232716
1,184.64,2.6,4.992889,-0.018759,-0.017793,1.333094,19.058626,5.830182,152.756784
2,37.92,2.0,3.488714,-0.012212,-0.026296,1.238632,17.829861,5.233077,55.114209
3,132.24,3.0,9.090602,-0.010364,-0.00056,1.30085,30.864442,4.681347,51.401185
4,332.64,3.0,8.841271,0.017569,0.000417,1.355565,35.562708,3.591821,87.580359
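
The two cells above collapse the 25 cell columns of each feature into a single mean. A minimal sketch of the same aggregation as a reusable helper, extended with a few more summary statistics, might look like the following. The helper name `summarize_cells`, the chosen statistics, and the tiny demo frame are illustrative assumptions, not part of the original notebook; columns are matched by their `_<feature>` suffix.

```python
import pandas as pd

def summarize_cells(data, features, stats=("mean", "std", "min", "max")):
    """Aggregate the per-cell columns of each feature (e.g. 1_slope ... 25_slope)."""
    out = pd.DataFrame(index=data.index)
    for feature in features:
        # Match columns by suffix so one feature name never picks up another's columns
        cols = [c for c in data.columns if c.endswith("_" + feature)]
        for stat in stats:
            # Row-wise aggregation across the cell columns of this feature
            out[f"{feature}_{stat}"] = data[cols].agg(stat, axis=1)
    return out

# Tiny made-up frame with two cells per feature, for illustration only
demo = pd.DataFrame({
    "1_slope": [30.0, 20.0], "2_slope": [34.0, 22.0],
    "1_twi": [3.0, 5.0], "2_twi": [4.0, 5.0],
})
summary = summarize_cells(demo, ["slope", "twi"])
print(summary)
```

On the notebook's data this could be called as `summarize_cells(train, mainn_cols)` and `summarize_cells(test, mainn_cols)`, which would give both frames identical columns. Whether the extra statistics actually help the model is something to check with cross-validation.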


In [None]:
# Check the first five rows of the train set
train.head()

Unnamed: 0,Sample_ID,1_elevation,2_elevation,3_elevation,4_elevation,5_elevation,6_elevation,7_elevation,8_elevation,9_elevation,10_elevation,11_elevation,12_elevation,13_elevation,14_elevation,15_elevation,16_elevation,17_elevation,18_elevation,19_elevation,20_elevation,21_elevation,22_elevation,23_elevation,24_elevation,25_elevation,1_slope,2_slope,3_slope,4_slope,5_slope,6_slope,7_slope,8_slope,9_slope,10_slope,11_slope,12_slope,13_slope,14_slope,15_slope,16_slope,17_slope,18_slope,19_slope,20_slope,21_slope,22_slope,23_slope,24_slope,25_slope,1_aspect,2_aspect,3_aspect,4_aspect,5_aspect,6_aspect,7_aspect,8_aspect,9_aspect,10_aspect,11_aspect,12_aspect,13_aspect,14_aspect,15_aspect,16_aspect,17_aspect,18_aspect,19_aspect,20_aspect,21_aspect,22_aspect,23_aspect,24_aspect,25_aspect,1_placurv,2_placurv,3_placurv,4_placurv,5_placurv,6_placurv,7_placurv,8_placurv,9_placurv,10_placurv,11_placurv,12_placurv,13_placurv,14_placurv,15_placurv,16_placurv,17_placurv,18_placurv,19_placurv,20_placurv,21_placurv,22_placurv,23_placurv,24_placurv,25_placurv,1_procurv,2_procurv,3_procurv,4_procurv,5_procurv,6_procurv,7_procurv,8_procurv,9_procurv,10_procurv,11_procurv,12_procurv,13_procurv,14_procurv,15_procurv,16_procurv,17_procurv,18_procurv,19_procurv,20_procurv,21_procurv,22_procurv,23_procurv,24_procurv,25_procurv,1_lsfactor,2_lsfactor,3_lsfactor,4_lsfactor,5_lsfactor,6_lsfactor,7_lsfactor,8_lsfactor,9_lsfactor,10_lsfactor,11_lsfactor,12_lsfactor,13_lsfactor,14_lsfactor,15_lsfactor,16_lsfactor,17_lsfactor,18_lsfactor,19_lsfactor,20_lsfactor,21_lsfactor,22_lsfactor,23_lsfactor,24_lsfactor,25_lsfactor,1_twi,2_twi,3_twi,4_twi,5_twi,6_twi,7_twi,8_twi,9_twi,10_twi,11_twi,12_twi,13_twi,14_twi,15_twi,16_twi,17_twi,18_twi,19_twi,20_twi,21_twi,22_twi,23_twi,24_twi,25_twi,1_geology,2_geology,3_geology,4_geology,5_geology,6_geology,7_geology,8_geology,9_geology,10_geology,11_geology,12_geology,13_geology,14_geology,15_geology,16_geology,17_geology,18_geology,19_geology,20_geology,21_geology,22_geology,23_geology,24_geology,25_geology,1_sdoif,2_sdoif,3_sdoif,4_sdoif,5_sdoif,6_sdoif,7_sdoif,8_sdoif,9_sdoif,10_sdoif,11_sdoif,12_sdoif,13_sdoif,14_sdoif,15_sdoif,16_sdoif,17_sdoif,18_sdoif,19_sdoif,20_sdoif,21_sdoif,22_sdoif,23_sdoif,24_sdoif,25_sdoif,Label
0,1,130,129,127,126,123,126,125,124,122,119,122,121,119,117,115,119,117,115,114,112,116,114,113,111,110,35.26439,37.29208,33.85452,35.79576,40.31554,38.87666,39.50971,40.51059,45.83452,45.0,36.05503,40.51059,44.56372,41.81031,38.87666,33.85452,38.87666,33.85452,33.85452,32.63194,32.63194,30.24626,30.24626,30.24626,30.24626,98.1301,113.1986,116.565,123.6901,135.0,97.12502,104.0362,110.556,119.0546,126.8699,105.9454,110.556,113.9625,116.565,119.7449,116.565,119.7449,116.565,116.565,128.6598,128.6598,120.9638,120.9638,120.9638,120.9638,0.038514,0.029463,0.031405,0.025771,0.010453,0.028321,0.02736,0.019831,0.009338,0.008806,0.021451,0.021244,0.017273,0.006708,-0.008671,0.023851,0.012586,-0.004249,0.003646,-0.006809,0.013328,0.005337,0.007039,-0.017753,-0.007777,0.013903,0.014674,0.018249,0.018368,0.003341,0.002023,0.005743,0.002238,-0.00382,-0.003302,-0.002141,0.003585,0.002025,-0.012269,-0.027184,0.000977,-0.01263,-0.031644,-0.025683,-0.037371,-0.018934,-0.021875,-0.023553,-0.048495,-0.039092,8.045186,8.333038,7.819405,8.032228,9.818933,9.295772,9.375107,9.49945,10.92291,11.0019,9.339861,9.88323,11.03584,10.29803,9.618946,8.703197,10.17575,8.651121,8.251134,8.074524,8.519887,7.989215,7.634287,7.804186,7.219216,3.17334,2.961406,3.315935,3.06125,3.246914,3.221661,3.153512,3.048637,2.915506,3.073973,3.765278,3.246664,3.154479,3.237765,3.392537,3.851345,3.673898,3.821337,3.584646,3.734637,4.003083,4.218082,3.990867,4.100921,3.715154,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1.281767,1.281743,1.281708,1.281684,1.281649,1.28178,1.281757,1.281721,1.281698,1.281662,1.281789,1.281765,1.28173,1.281707,1.281671,1.281802,1.281779,1.281743,1.28172,1.281684,1.281811,1.281788,1.281752,1.281729,1.281693,0
1,2,161,158,155,153,151,162,159,155,153,151,162,159,156,153,151,162,160,157,153,151,162,160,157,154,150,32.31153,32.31153,26.56505,21.80141,22.40687,27.01712,35.26439,31.31116,21.80141,21.80141,30.96376,31.31116,32.31153,26.56505,21.80141,26.56505,27.01712,35.26439,31.31116,27.01712,26.56505,26.56505,30.96376,34.99202,31.31116,198.435,198.435,180.0,180.0,194.0362,191.3099,188.1301,189.4623,180.0,180.0,180.0,189.4623,198.435,180.0,180.0,180.0,191.3099,188.1301,189.4623,168.6901,180.0,180.0,180.0,180.0,170.5377,0.00703,0.012162,0.017932,0.010745,0.017279,0.005049,0.010853,0.013505,0.022271,0.0171,0.018708,0.00841,0.017014,0.019566,0.016704,0.00218,0.008424,0.000325,-0.002641,0.002723,0.019174,0.010954,0.005607,0.005583,-8.1e-05,-0.012546,-0.006645,-0.001383,-0.013503,-0.006243,-0.010565,-0.008093,-0.005231,-0.000201,-0.008825,-0.002157,-0.01117,-0.00322,-0.000254,-0.008429,0.000579,0.00537,-0.005845,-0.005632,-0.005483,0.019445,0.008355,-0.002847,-0.002825,-0.008197,9.089893,8.944574,7.34106,5.833743,5.661154,7.023552,9.701378,9.165056,6.005569,5.987373,8.666435,8.960055,9.067206,7.571757,6.037197,7.225154,7.111054,9.881402,9.140442,7.429777,7.35627,7.560902,9.259974,10.54493,9.779708,4.396348,4.315768,4.722144,5.003747,4.654147,4.379892,4.109311,4.659479,5.148889,5.133717,4.458713,4.546371,4.383853,4.876853,5.175152,4.642571,4.441799,4.201243,4.646032,4.661026,4.732493,4.86968,4.789931,4.580217,4.984037,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,1.359568,1.359547,1.359516,1.359495,1.359463,1.359602,1.359582,1.35955,1.35953,1.359498,1.359625,1.359605,1.359574,1.359553,1.359521,1.35966,1.359639,1.359608,1.359587,1.359556,1.359683,1.359662,1.359631,1.35961,1.359579,1
2,3,149,151,154,156,158,154,157,158,160,161,162,164,164,164,165,166,168,168,167,167,170,171,171,170,169,42.67464,43.33172,40.70319,35.79576,32.31153,56.77032,53.67613,46.23402,40.51059,36.05503,54.25307,48.18969,45.0,35.26439,31.31116,45.0,36.05503,35.26439,31.31116,21.80141,37.99073,27.01712,27.01712,28.3032,26.56505,310.6013,302.0054,305.5377,303.6901,288.4349,301.6075,287.1027,286.6992,290.556,285.9454,300.2564,280.3048,270.0,278.1301,279.4623,306.8699,285.9454,261.8699,260.5377,270.0,309.8056,281.3099,258.6901,248.1986,270.0,0.007581,0.016033,0.032177,0.026111,0.01928,0.027318,0.022358,0.03834,0.028284,0.013423,0.033055,0.033681,0.043121,0.025839,0.001646,0.060813,0.050363,0.042203,0.011481,-0.002997,0.081775,0.057673,0.035016,0.010485,-0.012906,-0.015857,0.000517,0.009202,0.009751,0.002789,0.005787,0.007985,0.014076,0.010337,0.003131,0.008324,0.018734,0.025843,0.021055,0.009388,0.046773,0.048948,0.037799,0.021622,0.022307,0.080986,0.060944,0.044985,0.041931,0.023939,12.11522,13.97331,13.24888,11.46899,9.537852,13.84683,14.69048,15.17353,13.28684,11.81065,12.31657,11.8182,13.69647,11.96699,10.10036,10.49596,8.481891,9.427755,10.07561,6.985573,8.660597,6.171284,6.751987,8.461001,9.387614,3.91202,4.52253,4.679744,4.842173,4.636874,2.708843,3.363745,4.501287,4.726338,4.938828,2.413693,2.977847,4.169325,5.158732,5.145343,2.838589,3.283489,3.966261,5.133079,5.904688,3.026198,3.733081,4.18273,4.977861,5.951684,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1.365062,1.365091,1.365111,1.365141,1.365161,1.365037,1.365067,1.365087,1.365117,1.365137,1.365,1.36503,1.36505,1.36508,1.3651,1.364975,1.365005,1.365025,1.365055,1.365075,1.364937,1.364967,1.364988,1.365018,1.365038,0
3,4,80,78,77,75,73,80,78,77,75,73,80,78,77,75,73,80,78,77,75,73,79,78,76,73,72,19.82703,17.5484,17.5484,22.40687,16.69924,16.69924,16.69924,16.69924,21.80141,16.69924,16.69924,16.69924,16.69924,21.80141,16.69924,17.5484,16.69924,17.5484,24.09484,22.40687,15.79317,19.82703,30.24626,26.56505,19.82703,213.6901,198.435,198.435,194.0362,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0,180.0,161.565,180.0,161.565,153.435,165.9638,135.0,146.3099,149.0362,143.1301,146.3099,0.033529,0.030153,0.032375,0.04122,0.038542,0.03562,0.038546,0.036121,0.033105,0.035626,0.040746,0.03236,0.032324,0.023415,0.031775,0.026844,0.028277,0.022927,0.021857,0.011739,0.023054,0.02145,0.019539,0.017652,0.017803,0.018884,0.011227,0.009005,0.008436,0.011113,0.019552,0.011109,0.008016,-0.008278,0.005753,0.019945,0.009019,0.011816,-0.01238,0.009604,0.020053,0.01586,0.007418,0.000211,-0.000705,0.021083,0.014413,0.008047,0.001658,0.001507,3.416162,3.085953,3.132439,4.491739,3.354165,3.176818,3.280135,3.354467,4.792292,3.511842,3.194233,3.309584,3.400196,4.811203,3.487214,3.021057,3.175435,3.263664,4.828807,4.465866,2.672297,3.412282,6.316696,5.806777,3.881669,3.021531,3.409663,3.48442,3.497207,4.191795,3.920182,4.080204,4.192245,4.020496,4.421484,3.947516,4.124894,4.259946,4.040189,4.386297,3.303394,3.918005,3.689611,3.331842,3.468323,3.467204,3.015849,3.043601,3.549856,3.660269,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1.100921,1.100941,1.100971,1.100992,1.101023,1.100834,1.100854,1.100884,1.100904,1.100935,1.100775,1.100796,1.100826,1.100846,1.100877,1.100688,1.100708,1.100738,1.100759,1.100789,1.10063,1.10065,1.10068,1.1007,1.100731,0
4,5,117,115,114,112,110,115,113,111,110,108,112,111,109,107,106,110,109,107,105,104,108,106,105,103,102,32.63194,26.56505,30.24626,29.49621,30.24626,32.63194,29.49621,30.24626,30.24626,26.56505,30.24626,26.56505,29.49621,30.24626,24.09484,26.56505,30.24626,29.49621,26.56505,24.09484,26.56505,30.24626,26.56505,22.98977,24.09484,128.6598,126.8699,120.9638,135.0,120.9638,128.6598,135.0,120.9638,120.9638,126.8699,120.9638,126.8699,135.0,120.9638,116.565,126.8699,120.9638,135.0,126.8699,116.565,126.8699,120.9638,126.8699,135.0,116.565,0.005615,-0.002092,-0.011539,-0.011659,-0.009951,0.015373,-0.002021,0.00054,-0.01114,-0.006108,0.012858,-0.005641,-0.007245,-0.009231,-0.009098,-0.000948,-0.002362,-0.002638,-0.006659,-0.014926,0.007847,-0.005283,-0.015737,-0.003549,-0.018709,-9.8e-05,-0.003424,-0.01053,-0.007653,-0.006601,0.009456,-0.000739,0.002218,-0.005411,-0.002167,0.003693,-0.01091,-0.012066,-0.01008,-0.010237,-0.012844,-0.008673,-0.005639,-0.009892,-0.015465,-0.007849,-0.014026,-0.020124,-0.010269,-0.022651,7.740312,6.932358,8.302799,9.026284,10.56961,7.795554,7.587172,8.664989,8.446903,8.63765,7.566395,6.296785,8.043085,9.212396,7.337839,6.622842,7.470787,7.503838,7.774821,7.381567,6.479749,8.017678,6.438013,6.345137,7.368461,3.523277,4.435728,4.410583,5.006793,5.617533,3.558835,4.138383,4.624073,4.496619,5.535381,3.946203,3.954924,4.430152,4.930369,5.424066,4.207351,3.882622,4.083162,5.00918,5.453774,4.098137,4.235864,4.065827,5.037767,5.444889,2,2,2,5,5,2,2,5,5,5,2,2,5,5,5,2,2,5,5,5,2,5,5,5,5,1.284558,1.284483,1.284433,1.284358,1.284308,1.284471,1.284397,1.284347,1.284272,1.284222,1.284341,1.284267,1.284217,1.284142,1.284092,1.284255,1.28418,1.28413,1.284056,1.284006,1.284125,1.28405,1.284001,1.283926,1.283876,0


<a name = "Model"></a>
## 9. Model training

In [None]:
%%time
# Features: the per-feature means built in the preview step; target: Label
X = df[mainn_cols]
y = df.Label
tess = test_df[mainn_cols]

# Stratified 10-fold validation keeps the class ratio in every fold
folds = StratifiedKFold(n_splits = 10)

# Dataframe to store feature importance
feature_importance_df = pd.DataFrame()

# Lists to store test-set probabilities and validation F1 scores per fold
# (the name "losses" is kept, but higher F1 is better)
season_predictions = []
losses = []
for i, (train_index, test_index) in enumerate(folds.split(X, y)):
  # split() returns positions, so index both X and y with .iloc
  X_train, X_test = X.iloc[train_index], X.iloc[test_index]
  y_train, y_test = y.iloc[train_index], y.iloc[test_index]

  # Instantiate model (task_type='GPU' needs a GPU runtime; drop it to train on CPU)
  model = CatBoostClassifier(n_estimators=20000, task_type='GPU')

  # Train model with early stopping on the validation fold
  model.fit(X_train, y_train,
            eval_set=[(X_test, y_test)],
            early_stopping_rounds=200,
            verbose = 1000,
            use_best_model = True)

  # Predict class probabilities for the test set
  preds = model.predict_proba(tess)

  # Store the fold's test predictions and compute its validation F1 score
  season_predictions.append(preds)
  loss = f1_score(y_test, model.predict(X_test))

  # Append feature importance per fold
  fold_importance_df = pd.DataFrame({'feature': X_train.columns.tolist(), 'importance': model.feature_importances_})
  feature_importance_df = pd.concat([feature_importance_df, fold_importance_df], axis=0)

  # Print the fold's F1 score
  print(f'{i+1}:  {loss}\n')
  losses.append(loss)

print(f'Mean F1 score: {np.mean(losses)}')

Learning rate set to 0.016972
0:	learn: 0.6815755	test: 0.6822532	best: 0.6822532 (0)	total: 25.6ms	remaining: 8m 32s
1000:	learn: 0.3195705	test: 0.3832497	best: 0.3832148 (999)	total: 22s	remaining: 6m 57s
2000:	learn: 0.2871814	test: 0.3730578	best: 0.3730578 (2000)	total: 43.8s	remaining: 6m 33s
3000:	learn: 0.2673617	test: 0.3679129	best: 0.3678930 (2998)	total: 1m 5s	remaining: 6m 9s
4000:	learn: 0.2531607	test: 0.3662698	best: 0.3662226 (3960)	total: 1m 26s	remaining: 5m 46s
bestTest = 0.3662226448
bestIteration = 3960
Shrink model to first 3961 iterations.
1:  0.638623326959847

Learning rate set to 0.016972
0:	learn: 0.6819324	test: 0.6819400	best: 0.6819400 (0)	total: 26.1ms	remaining: 8m 42s
1000:	learn: 0.3251408	test: 0.3635707	best: 0.3634936 (998)	total: 21.5s	remaining: 6m 47s
2000:	learn: 0.2950613	test: 0.3592371	best: 0.3592007 (1892)	total: 41.9s	remaining: 6m 16s
bestTest = 0.3584237893
bestIteration = 2228
Shrink model to first 2229 iterations.
2:  0.6307053941908

In [None]:
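# Average the class-1 probabilities over all folds and threshold at 0.5
preds = [1 if x >= 0.5 else 0 for x in np.mean(season_predictions, axis = 0)[:, 1]]
sub_file = pd.DataFrame({'Sample_ID': test.Sample_ID, 'Label': preds})

# Score the predictions against the labels in privateRef.csv
# (after the merge, Label_x is the reference label and Label_y the prediction)
pp = pd.read_csv('privateRef.csv')
pp = pp.merge(sub_file[sub_file.Sample_ID.isin(pp.Sample_ID)], how = 'left', on = 'Sample_ID')
f1_score(pp.Label_x, pp.Label_y)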

0.6905487804878049
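
The cell above averages the per-fold test probabilities before thresholding. As a rough illustration of what that averaging does, here is a sketch on made-up numbers: three folds, four test samples, and rows of `[P(class 0), P(class 1)]` in the layout `predict_proba` returns. The array values and variable names are assumptions for illustration only.

```python
import numpy as np

# Made-up probabilities from three folds for four test samples;
# each row is [P(class 0), P(class 1)]
fold_probas = [
    np.array([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.6, 0.4]]),
    np.array([[0.8, 0.2], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]),
    np.array([[0.7, 0.3], [0.3, 0.7], [0.1, 0.9], [0.4, 0.6]]),
]

# Average across folds (axis 0), keeping one row per test sample
mean_proba = np.mean(fold_probas, axis=0)

# Column 1 holds the averaged probability of the positive class
positive_proba = mean_proba[:, 1]

# Threshold at 0.5 to obtain hard labels
labels = (positive_proba >= 0.5).astype(int)
print(positive_proba.round(3), labels)
```

Samples 2 and 4 show why averaging can help: individual folds disagree on them, and the mean settles the label.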

In [None]:
# Select main columns to be used in training
main_cols = train.columns.difference(['Sample_ID', 'Label'])
X = train[main_cols]
y = train.Label

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_state=2022)

# Train model
model = RandomForestClassifier(random_state = 2022)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Check the F1 score of the model
print(f'RandomForest F1 score on the X_test is: {f1_score(y_test, y_pred)}\n')

# print classification report
print(classification_report(y_test, y_pred))

In [None]:
# Confusion matrix
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=model.classes_)
fig, ax = plt.subplots(figsize=(15,7))
disp.plot(ax=ax)
plt.show()

 - True positives - 442
 - True negatives - 2287
 - False positives - 128
 - False negatives - 403

 Precision  = TP / (TP + FP) = 442 / (442 + 128) = 0.775438596491228

 Recall = TP / (TP + FN) = 442 / (442 + 403) = 0.5230769230769231

 F1 score = harmonic mean of Precision and Recall

 F1 score = (2 * Precision * Recall) / (Precision + Recall)

 F1 score = (2 * 0.775438596491228 * 0.5230769230769231) / (0.775438596491228 + 0.5230769230769231) = 0.6247349823321554
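
The arithmetic above can be reproduced with a few lines of plain Python. This is a minimal sketch: the function name `f1_from_counts` is an illustrative choice, and it assumes the denominators are non-zero (at least one predicted and one actual positive).

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of predicted landslides that are real
    recall = tp / (tp + fn)      # share of real landslides that were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts taken from the confusion matrix above
precision, recall, f1 = f1_from_counts(tp=442, fp=128, fn=403)
print(f"Precision: {precision:.4f}, Recall: {recall:.4f}, F1: {f1:.4f}")
# Precision: 0.7754, Recall: 0.5231, F1: 0.6247
```

True negatives do not appear in the formula, which is why F1 is often preferred over accuracy when the positive class is the minority.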

In [None]:
# Feature importance
impo_df = pd.DataFrame({'feature': X.columns, 'importance': model.feature_importances_}).set_index('feature').sort_values(by = 'importance', ascending = False)
impo_df = impo_df[:10].sort_values(by = 'importance', ascending = True)
impo_df.plot(kind = 'barh', figsize = (10, 10))
plt.legend(loc = 'center right')
plt.title('Bar chart showing top ten features', fontsize = 14)
plt.xlabel('Importance', fontsize = 12, color = 'indigo')
plt.show()

<a name = "Predictions"></a>
## 10. Test set predictions

In [None]:
# Make prediction on the test set
test_df = test[main_cols]
predictions = model.predict(test_df)

# Create a submission file
sub_file = pd.DataFrame({'Sample_ID': test.Sample_ID, 'Label': predictions})

# Check the distribution of your predictions
sns.countplot(x = sub_file.Label)
plt.title('Predicted Variable Distribution');

<a name = "Submission"></a>
## 11. Creating a submission file

In [None]:
# Create a csv file and upload to Zindi
sub_file.to_csv('Baseline.csv', index = False)
sub_file.head()

<a name = "Tips"></a>
## 12. Tips to improve model performance
 - Use cross-validation techniques
 - Feature engineering
 - Handle the class imbalance of the target variable
 - Try different modelling techniques: stacking classifiers, voting classifiers, ensembling...
 - Data transformations
 - Feature selection techniques such as RFE, tree-based feature importance...
 - Domain knowledge: research how the provided features affect landslides, soil topology...
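
As one possible way to act on the class-imbalance tip, the decision threshold can be tuned instead of being fixed at 0.5. The sketch below scans a grid of thresholds and keeps the one with the best F1. The function names `f1_binary` and `best_f1_threshold`, the grid, and the toy arrays are assumptions for illustration; in practice the probabilities would come from out-of-fold predictions, so the threshold is not tuned on data the model was trained on.

```python
import numpy as np

def f1_binary(y_true, y_pred):
    """F1 score for binary labels, returning 0.0 when there are no positives at all."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def best_f1_threshold(y_true, proba, grid=None):
    """Return (threshold, f1) for the grid value that maximizes F1."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)   # 0.05, 0.10, ..., 0.95
    scores = [f1_binary(y_true, (proba >= t).astype(int)) for t in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), scores[best]

# Toy validation labels and predicted positive-class probabilities (made up)
y_val = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
proba_val = np.array([0.02, 0.11, 0.18, 0.22, 0.27, 0.31, 0.41, 0.37, 0.46, 0.83])

threshold, score = best_f1_threshold(y_val, proba_val)
default_score = f1_binary(y_val, (proba_val >= 0.5).astype(int))
print(f"best threshold: {threshold:.2f}, F1: {score:.3f}, F1 at 0.5: {default_score:.3f}")
```

On this toy data the tuned threshold lands below 0.5, trading one false positive for two recovered landslides. The same idea applies to the averaged fold probabilities used earlier in the notebook.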

# GOOD LUCK AND HAPPY HACKING 😊


