
# In this notebook we will compare two popular gradient boosting libraries: XGBoost and LightGBM.

# Ready? Let's go!


# Set Up

In [1]:
# Set up code checking
import os
if not os.path.exists("../input/train.csv"):
    os.symlink("../input/home-data-for-ml-course/train.csv", "../input/train.csv")  
    os.symlink("../input/home-data-for-ml-course/test.csv", "../input/test.csv") 
from learntools.core import binder
from time import time
import pandas as pd
from sklearn.model_selection import train_test_split
binder.bind(globals())
from learntools.ml_intermediate.ex6 import *
# Only the regression metric is needed; unused imports (accuracy_score, seaborn, tensorflow) are dropped
from sklearn.metrics import mean_absolute_error
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
print("Setup Complete")

Setup Complete


# Pre-Processing the Data (Ames Housing).

In [2]:
# Read the data
X = pd.read_csv('../input/train.csv', index_col='Id')

X_test_full = pd.read_csv('../input/test.csv', index_col='Id')

# Remove rows with missing target, separate target from predictors
X.dropna(axis=0, subset=['SalePrice'], inplace=True)
y = X.SalePrice              
X.drop(['SalePrice'], axis=1, inplace=True)

# Break off validation set from training data
X_train_full, X_valid_full, y_train, y_valid = train_test_split(X, y, train_size=0.8, test_size=0.2,
                                                                random_state=0)

# "Cardinality" means the number of unique values in a column
# Select categorical columns with relatively low cardinality (convenient but arbitrary)
low_cardinality_cols = [cname for cname in X_train_full.columns if X_train_full[cname].nunique() < 10 and 
                        X_train_full[cname].dtype == "object"]

# Select numeric columns
numeric_cols = [cname for cname in X_train_full.columns if X_train_full[cname].dtype in ['int64', 'float64']]

# Keep selected columns only
my_cols = low_cardinality_cols + numeric_cols
X_train = X_train_full[my_cols].copy()
X_valid = X_valid_full[my_cols].copy()
X_test = X_test_full[my_cols].copy()

# One-hot encode the data (to shorten the code, we use pandas)
X_train = pd.get_dummies(X_train)
X_valid = pd.get_dummies(X_valid)
X_test = pd.get_dummies(X_test)
X_train, X_valid = X_train.align(X_valid, join='left', axis=1)
X_train, X_test = X_train.align(X_test, join='left', axis=1)
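
The encode-then-align step above is easy to misread, so here is a minimal sketch on toy data (the frames and column names are made up for illustration, not taken from the housing dataset). It shows that `align(..., join='left', axis=1)` keeps exactly the training columns: a category seen only in validation is dropped, and a category missing from validation becomes an all-NaN column. Tree boosters such as XGBoost and LightGBM generally treat NaN as a missing value, which is presumably why the notebook does not fill these columns.

```python
import pandas as pd

# Hypothetical toy data: validation lacks "blue" and contains an unseen "green"
train = pd.DataFrame({"size": [1, 2, 3], "color": ["red", "blue", "red"]})
valid = pd.DataFrame({"size": [4, 5], "color": ["red", "green"]})

# One indicator column per category value present in each frame
train_enc = pd.get_dummies(train)
valid_enc = pd.get_dummies(valid)

# join="left" reindexes the right frame to the left frame's columns
train_aligned, valid_aligned = train_enc.align(valid_enc, join="left", axis=1)

print(list(train_aligned.columns))
print(list(valid_aligned.columns))
# Columns absent from validation are filled with NaN
print(valid_aligned["color_blue"].isna().all())
```

After alignment both frames have identical columns in identical order, which is what the models need at prediction time.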

# The XGBoost

In [3]:
from xgboost import XGBRegressor

# Define the model
model = XGBRegressor(n_estimators = 1000, learning_rate = 0.05)

t0 = time()
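# Fit the model, stopping once the validation score has not improved for 5 rounds
# (newer XGBoost releases expect early_stopping_rounds in the constructor instead of fit())
model.fit(X_train, y_train, early_stopping_rounds=5, eval_set=[(X_valid, y_valid)],
          verbose=False)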

print("Execution Time: ", time() - t0)
# Get predictions
predictions = model.predict(X_valid)

# Calculate MAE
mae_2 = mean_absolute_error(y_valid, predictions)

print("Mean absolute error is: ", mae_2)

# "Accuracy" here is the R^2 score as a percentage (score() returns R^2 for regressors)
print("Accuracy is: ", (model.score(X_valid, y_valid)) * 100)



Execution Time:  2.7361762523651123
Mean absolute error is:  16802.965325342466
Accuracy is:  84.67858042263228
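
The figure printed as "Accuracy" comes from `model.score`, which for scikit-learn style regressors returns the coefficient of determination (R²), not a classification accuracy. As a rough sketch of how the two reported metrics differ, the snippet below computes MAE and R² by hand in plain Python. The helper names and the toy numbers are illustrative assumptions, not part of the notebook.

```python
def mae_manual(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_manual(y_true, y_pred):
    """R^2: 1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical targets and predictions
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

mae = mae_manual(y_true, y_pred)
r2 = r2_manual(y_true, y_pred)
print("MAE:", mae)
print("R^2 as a percentage:", r2 * 100)
```

Because R² squares the errors while MAE does not, a model can rank better on one metric and worse on the other.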


# The LightGBM

In [4]:
import lightgbm as lgb 

# Define the model
model = lgb.LGBMRegressor(n_estimators = 1000, learning_rate = 0.05)

t0 = time()

# Fit the model (no early stopping is used here, so all 1000 trees are trained)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)

print("Execution Time: ", time() - t0)

# Get predictions
predictions = model.predict(X_valid)

# Calculate MAE
mae_2 = mean_absolute_error(y_valid, predictions)

print("Mean absolute error is: ", mae_2)

# "Accuracy" here is the R^2 score as a percentage (score() returns R^2 for regressors)
print("Accuracy is: ", (model.score(X_valid, y_valid)) * 100)



Execution Time:  2.134910821914673
Mean absolute error is:  17259.09657799767
Accuracy is:  87.6791254297951


# Conclusion

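Both algorithms performed well, and their validation results are close: XGBoost has the lower mean absolute error (about 16,803 vs. 17,259), while LightGBM has the higher score (about 87.7% vs. 84.7%; the "accuracy" printed above is the R² returned by `model.score`). What LightGBM is best known for is its speed, and in this run it was faster (about 2.1 s vs. 2.7 s) even though it trained without early stopping. This is a single timing on one small dataset, so treat the speed gap as indicative rather than conclusive.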
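
Single wall-clock timings like the ones in the cells above are noisy, so a speed comparison is more convincing when each fit is repeated. The sketch below is one possible way to do that with the standard library only. `median_runtime` and `dummy_fit` are hypothetical names introduced here; in the notebook you would pass a function that builds and fits a fresh model instead of `dummy_fit`.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Call fn() `repeats` times and return the median duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def dummy_fit():
    # Stand-in workload; replace with e.g. a function that fits a new model
    sum(i * i for i in range(100_000))


elapsed = median_runtime(dummy_fit, repeats=5)
print(f"Median runtime over 5 runs: {elapsed:.4f} s")
```

The median is used rather than the mean so that one slow run (for example, a cold cache) does not dominate the result.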

Thanks for reading! If you found this notebook helpful, please upvote it, and share your favorite feature of XGBoost or LightGBM in the comments.