### Assignment-05


Build, train, and save LightGBM and SVM classifiers with integrated cross-validation and hyperparameter tuning. Evaluate the models using appropriate metrics, compare their performance, and identify which model performs best, with reasoning.

In [11]:
import pandas as pd
import joblib
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error, r2_score
from lightgbm import LGBMRegressor
from sklearn.svm import SVR
from sklearn.impute import SimpleImputer

In [2]:
df = pd.read_csv('/content/sample_data/preprocessed_earthquake_data.csv')

In [3]:
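target = 'Magnitude'
categorical_cols = ['Type', 'Magnitude Type', 'Source', 'Status']

# Use only the numeric features: drop the target and the categorical columns.
X = df.drop(columns=[target] + categorical_cols)
y = df[target]

# Hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)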

### LIGHTGBM

In [4]:
lgb = LGBMRegressor(random_state=42)
param_grid_lgb = {
    "n_estimators": [100, 200],
    "learning_rate": [0.1, 0.01],
}

grid_lgb = GridSearchCV(lgb, param_grid_lgb, cv=3, scoring="neg_mean_squared_error", n_jobs=-1)
grid_lgb.fit(X_train, y_train)

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001026 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 914
[LightGBM] [Info] Number of data points in the train set: 4855, number of used features: 19
[LightGBM] [Info] Start training from score 0.051454
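
To make the cost of this grid search concrete, the sketch below enumerates the candidates with scikit-learn's `ParameterGrid` and counts the fits implied by `cv=3`. It reuses only the grid values from the cell above; the names `cv_folds`, `n_candidates`, and `n_fits` are illustrative and do not appear in the notebook.

```python
from sklearn.model_selection import ParameterGrid

# Same grid values as param_grid_lgb in the notebook cell.
param_grid_lgb = {
    "n_estimators": [100, 200],
    "learning_rate": [0.1, 0.01],
}
cv_folds = 3  # matches cv=3 in the GridSearchCV call

candidates = list(ParameterGrid(param_grid_lgb))
n_candidates = len(candidates)

# Each candidate is fitted once per fold; GridSearchCV then refits the best
# candidate on the full training set (refit=True is the default).
n_fits = n_candidates * cv_folds + 1

for params in candidates:
    print(params)
print("candidates:", n_candidates, "total fits:", n_fits)
```

The single training log shown above, on all 4855 training rows, appears to correspond to that final refit rather than to one of the cross-validation folds.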


In [7]:
print("Best LightGBM:", grid_lgb.best_params_)
y_pred_lgb = grid_lgb.predict(X_test)
print("LightGBM Mean Squared Error:", mean_squared_error(y_test, y_pred_lgb))
print("LightGBM R-squared:", r2_score(y_test, y_pred_lgb))

Best LightGBM: {'learning_rate': 0.01, 'n_estimators': 200}
LightGBM Mean Squared Error: 0.9259012847889126
LightGBM R-squared: 0.13992595937555363


### SVM

In [12]:
svm = Pipeline([
    ("imputer", SimpleImputer(strategy='mean')),
    ("scaler", StandardScaler()),
    ("svr", SVR())
])
param_grid_svm = {"svr__C": [0.1, 1, 10], "svr__kernel": ["linear", "rbf"]}
grid_svm = GridSearchCV(svm, param_grid_svm, cv=3, scoring="neg_mean_squared_error", n_jobs=-1)
grid_svm.fit(X_train, y_train)

In [13]:
print("Best SVR:", grid_svm.best_params_)
y_pred_svr = grid_svm.predict(X_test)
print("SVR R2 Score:", r2_score(y_test, y_pred_svr))
print("SVR MSE:", mean_squared_error(y_test, y_pred_svr))

Best SVR: {'svr__C': 1, 'svr__kernel': 'linear'}
SVR R2 Score: 0.07371379951378776
SVR MSE: 0.9971811060472632
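
The assignment also asks for the trained models to be saved, and `joblib` is already imported for that purpose. Below is a minimal sketch of one way to persist a tuned pipeline and reload it. It runs on synthetic data from `make_regression` as a stand-in for the earthquake features, and the file name `svr_best.joblib` is hypothetical. The same `joblib.dump` call should also work for `grid_svm.best_estimator_` and `grid_lgb.best_estimator_` in this notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the earthquake features (assumption, not the real data).
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("svr", SVR()),
])
grid = GridSearchCV(pipe, {"svr__C": [0.1, 1, 10]}, cv=3,
                    scoring="neg_mean_squared_error")
grid.fit(X_train, y_train)

# Persist the refitted best pipeline; the file name is hypothetical.
model_path = os.path.join(tempfile.mkdtemp(), "svr_best.joblib")
joblib.dump(grid.best_estimator_, model_path)

# Reload and confirm the restored pipeline predicts identically.
restored = joblib.load(model_path)
preds_before = grid.predict(X_test)
preds_after = restored.predict(X_test)
print("Predictions match after reload:", np.allclose(preds_before, preds_after))
```

Saving `best_estimator_` rather than the whole `GridSearchCV` object keeps the file small and stores the imputer and scaler together with the model, so new data can be passed to the reloaded pipeline without repeating the preprocessing by hand.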


### SUMMARY

We compared a LightGBM regressor and an SVR for predicting earthquake magnitude, and LightGBM came out on top. On the held-out test set it reached an R2 of 0.140 and an MSE of 0.926, against an R2 of 0.074 and an MSE of 0.997 for the best SVR (linear kernel, C = 1). Both R2 values are low, so neither model explains much of the variance in magnitude; LightGBM is the better of the two rather than a strong predictor in absolute terms. Training time was not measured in this notebook, but gradient-boosted trees generally scale better to large datasets than kernel SVMs, and the SVR additionally needed imputation and feature scaling in its pipeline. Overall, LightGBM was the more accurate model here, so it is the better option of the two for predicting earthquake magnitude.
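
The comparison can be made more concrete with some arithmetic on the metrics printed above. The sketch below copies the reported test-set values into a dictionary (the names `results`, `implied_var`, and `mse_reduction` are illustrative) and derives the RMSE, the relative MSE reduction, and the test-target variance implied by the identity R2 = 1 - MSE / Var(y_test).

```python
import math

# Test-set metrics copied from the notebook output above.
results = {
    "LightGBM": {"mse": 0.9259012847889126, "r2": 0.13992595937555363},
    "SVR": {"mse": 0.9971811060472632, "r2": 0.07371379951378776},
}

for name, m in results.items():
    m["rmse"] = math.sqrt(m["mse"])
    # R^2 = 1 - MSE / Var(y_test)  =>  Var(y_test) = MSE / (1 - R^2),
    # where Var is the population variance of the test target.
    m["implied_var"] = m["mse"] / (1.0 - m["r2"])
    print(f"{name}: RMSE={m['rmse']:.4f}, "
          f"implied Var(y_test)={m['implied_var']:.4f}")

# Relative MSE reduction of LightGBM over SVR.
mse_reduction = 1.0 - results["LightGBM"]["mse"] / results["SVR"]["mse"]
print(f"LightGBM reduces test MSE by {mse_reduction:.1%} relative to SVR")
```

Both models imply the same test-target variance, which is a quick sanity check that they were scored on the same split. Each RMSE is only slightly below the standard deviation of the test target, which is another way of seeing why the R2 values are low.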