## **XGBOOST**


XGBoost, short for Extreme Gradient Boosting, is a popular machine learning algorithm known for its performance and versatility in handling structured and tabular data. It belongs to the family of gradient boosting algorithms, which are ensemble methods that combine multiple weak predictive models, typically decision trees, to create a strong predictive model.

Here are some key features and characteristics of XGBoost:

1. Gradient Boosting: XGBoost is based on the gradient boosting framework, which iteratively builds an ensemble of weak models, where each subsequent model tries to correct the mistakes made by the previous models.

2. Decision Trees: XGBoost primarily uses decision trees as the base learners. It constructs these trees in a greedy manner by recursively partitioning the data based on the features that provide the most information gain or reduction in the loss function.

3. Regularization: XGBoost adds regularization terms to its training objective to limit overfitting. L1 and L2 penalties on the leaf weights (the `alpha` and `lambda` parameters, both tuned later in this notebook) control the complexity of the individual trees and of the overall model.

4. Feature Importance: XGBoost provides a measure of feature importance, which helps in understanding the relative contribution of different features in the prediction task.

5. Handling Missing Values: XGBoost handles missing values natively. At each split it learns a default direction for rows whose feature value is missing, so it can train on incomplete data without a separate imputation step.

6. Parallel Processing: XGBoost parallelizes the search for split points across CPU threads and can also train on GPUs, which speeds up training on large datasets.

7. Cross-Validation: XGBoost ships a built-in cross-validation routine (`xgb.cv`) that helps evaluate a model and tune its hyperparameters.

8. Flexibility: XGBoost can be used for both classification and regression problems, and it supports various loss functions suitable for different types of tasks.

9. Wide Adoption: XGBoost is widely used in various domains, including industry, academia, and data science competitions, due to its excellent performance and ability to handle complex, real-world problems.

Overall, XGBoost is a powerful algorithm that has gained popularity due to its scalability, efficiency, and strong predictive performance across a wide range of applications.
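
The residual-fitting idea behind gradient boosting (point 1 above) can be sketched in plain Python. This is a minimal illustration for squared-error loss with one-split "stumps" as weak learners. The function names `fit_stump` and `boost` are made up for this example, and the sketch leaves out what XGBoost adds on top (second-order gradients, regularization, column and row sampling).

```python
# Minimal gradient boosting for squared-error loss on a single feature.
# Illustrative only: this is not how XGBoost is implemented internally.

def fit_stump(xs, residuals):
    """Return a one-split predictor that best fits the residuals."""
    best = None
    # Try every distinct value except the largest, so both sides stay non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps sequentially; return (predict_fn, training MSE per round)."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    stumps = []
    preds = [base] * len(ys)
    history = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history


xs = [i / 10 for i in range(40)]
ys = [x ** 2 for x in xs]
predict, mse_history = boost(xs, ys)
print(f"MSE after round 1: {mse_history[0]:.3f}, after round 50: {mse_history[-1]:.3f}")
```

Each round fits a new stump to what the current ensemble still gets wrong, so the training error never increases from one round to the next.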

## **OPTUNA**

Optuna is a popular open-source hyperparameter optimization framework for machine learning. It provides a flexible and efficient way to automatically search for the best set of hyperparameters for a given machine learning model.

Here are some key features and concepts related to Optuna:

1. Hyperparameter Optimization: Hyperparameters are parameters of a machine learning model that are not learned from the data but are set by the user before training. Examples include learning rate, number of layers, and batch size. Optuna automates the process of searching for the optimal combination of hyperparameters to maximize the performance of the model.

2. Bayesian Optimization: By default, Optuna samples hyperparameters with the Tree-structured Parzen Estimator (TPE), a Bayesian optimization method. It builds a probabilistic model of the objective function (typically the validation loss or accuracy) from the trials completed so far and uses this model to choose the next set of hyperparameters to evaluate, balancing exploration of new regions against exploitation of promising ones.

3. Study and Trials: In Optuna, a study represents a hyperparameter optimization task. It consists of multiple trials, where each trial represents a specific set of hyperparameters. Optuna keeps track of the trials, their respective hyperparameter values, and the resulting objective function scores.

4. Pruning: Pruning stops unpromising trials early. During training, Optuna can monitor intermediate results and terminate trials that are unlikely to beat the best results so far, which allocates computational resources more efficiently.

5. Integration with Machine Learning Frameworks: Optuna can be easily integrated with popular machine learning frameworks such as TensorFlow, PyTorch, scikit-learn, and XGBoost. It provides a unified interface to define the hyperparameters and objective function, making it convenient to incorporate hyperparameter optimization into the training pipeline.

6. Visualization and Analysis: Optuna provides various visualization tools to analyze the optimization process, including optimization curves, parallel coordinate plots, and parameter importance plots. These visualizations help in understanding the search progress and the impact of different hyperparameters on the model's performance.

Optuna is widely used in the machine learning community for automating the hyperparameter optimization process. By leveraging its functionalities, researchers and practitioners can save time and resources while improving the performance of their machine learning models.
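
The pruning idea from point 4 can be illustrated with a simplified median rule in plain Python. This is a sketch of the concept, not Optuna's implementation (Optuna provides it as `optuna.pruners.MedianPruner`). The helper names `should_prune` and `run_trial` and the learning curves are hypothetical.

```python
from statistics import median


def should_prune(current_value, step, earlier_curves, n_startup_trials=2):
    """Median rule for a minimized metric.

    Prune when the current value at `step` is worse (larger) than the median
    of what earlier trials reported at the same step.
    """
    if len(earlier_curves) < n_startup_trials:
        return False  # not enough history to judge yet
    reference = [curve[step] for curve in earlier_curves if len(curve) > step]
    if not reference:
        return False
    return current_value > median(reference)


def run_trial(curve, earlier_curves):
    """Replay a learning curve and stop as soon as the rule says to prune."""
    reported = []
    for step, value in enumerate(curve):
        reported.append(value)
        if should_prune(value, step, earlier_curves):
            break
    return reported


# Hypothetical validation-loss curves for four trials (lower is better).
curves = [
    [1.0, 0.8, 0.6, 0.5],
    [1.0, 0.7, 0.5, 0.4],
    [1.2, 1.1, 1.0, 0.9],   # starts worse than the median and is cut short
    [0.9, 0.6, 0.4, 0.3],
]

history = []
for curve in curves:
    history.append(run_trial(curve, history))

print([len(h) for h in history])  # steps actually run per trial: [4, 4, 1, 4]
```

In this toy run the third trial is stopped after its first step, so three of its four training steps are never paid for.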

In [37]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import optuna
import xgboost as xgb

from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


In [39]:
!pip list

Package                       Version
----------------------------- -------
alembic                       1.11.1
asttokens                     2.2.1
attrs                         23.1.0
backcall                      0.2.0
backports.functools-lru-cache 1.6.4
cmaes                         0.9.1
colorama                      0.4.6
colorlog                      6.7.0
contourpy                     1.0.7
cycler                        0.11.0
debugpy                       1.5.1
decorator                     5.1.1
executing                     1.2.0
fastjsonschema                2.17.0
fonttools                     4.39.4
greenlet                      2.0.2
importlib-metadata            6.6.0
ipykernel                     6.15.0
ipython                       8.13.2
jedi                          0.18.2
joblib                        1.2.0
jsonschema                    4.17.3
jupyter_client                8.2.0
jupyter_core                  5.3.0
kiwisolver                    1.4.4
Mako           

In [40]:
df = pd.read_csv("D:\\Downloads\\Admission_Prediction.csv")

In [41]:
df.head()

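|   | Serial No. | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 337.0 | 118.0 | 4.0 | 4.5 | 4.5 | 9.65 | 1 | 0.92 |
| 1 | 2 | 324.0 | 107.0 | 4.0 | 4.0 | 4.5 | 8.87 | 1 | 0.76 |
| 2 | 3 | NaN | 104.0 | 3.0 | 3.0 | 3.5 | 8.0 | 1 | 0.72 |
| 3 | 4 | 322.0 | 110.0 | 3.0 | 3.5 | 2.5 | 8.67 | 1 | 0.8 |
| 4 | 5 | 314.0 | 103.0 | 2.0 | 2.0 | 3.0 | 8.21 | 0 | 0.65 |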


In [42]:
df.shape

(500, 9)

In [43]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Serial No.         500 non-null    int64  
 1   GRE Score          485 non-null    float64
 2   TOEFL Score        490 non-null    float64
 3   University Rating  485 non-null    float64
 4   SOP                500 non-null    float64
 5   LOR                500 non-null    float64
 6   CGPA               500 non-null    float64
 7   Research           500 non-null    int64  
 8   Chance of Admit    500 non-null    float64
dtypes: float64(7), int64(2)
memory usage: 35.3 KB


In [44]:
df.describe()

|   | Serial No. | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|---|
| count | 500.0 | 485.0 | 490.0 | 485.0 | 500.0 | 500.0 | 500.0 | 500.0 | 500.0 |
| mean | 250.5 | 316.558763 | 107.187755 | 3.121649 | 3.374 | 3.484 | 8.57644 | 0.56 | 0.72174 |
| std | 144.481833 | 11.274704 | 6.112899 | 1.14616 | 0.991004 | 0.92545 | 0.604813 | 0.496884 | 0.14114 |
| min | 1.0 | 290.0 | 92.0 | 1.0 | 1.0 | 1.0 | 6.8 | 0.0 | 0.34 |
| 25% | 125.75 | 308.0 | 103.0 | 2.0 | 2.5 | 3.0 | 8.1275 | 0.0 | 0.63 |
| 50% | 250.5 | 317.0 | 107.0 | 3.0 | 3.5 | 3.5 | 8.56 | 1.0 | 0.72 |
| 75% | 375.25 | 325.0 | 112.0 | 4.0 | 4.0 | 4.0 | 9.04 | 1.0 | 0.82 |
| max | 500.0 | 340.0 | 120.0 | 5.0 | 5.0 | 5.0 | 9.92 | 1.0 | 0.97 |


In [45]:
df.isna().sum()

Serial No.            0
GRE Score            15
TOEFL Score          10
University Rating    15
SOP                   0
LOR                   0
CGPA                  0
Research              0
Chance of Admit       0
dtype: int64

In [46]:
# Fill the missing values in each affected column with that column's median.
for col in ['GRE Score', 'TOEFL Score', 'University Rating']:
    df[col] = df[col].fillna(df[col].median())

In [47]:
df.isna().sum()

Serial No.           0
GRE Score            0
TOEFL Score          0
University Rating    0
SOP                  0
LOR                  0
CGPA                 0
Research             0
Chance of Admit      0
dtype: int64
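
The median imputation used above can be shown on a tiny hand-made column. This is a plain-Python sketch of the idea: `fill_with_median` is a hypothetical helper, and `None` stands in for pandas' `NaN`. The five values are the first GRE scores from `df.head()`, so the fill value here (323.0) differs from the full-column median of 317.0 that the notebook actually uses.

```python
from statistics import median


def fill_with_median(values):
    """Replace None entries with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


gre = [337.0, 324.0, None, 322.0, 314.0]
filled = fill_with_median(gre)
print(filled)  # [337.0, 324.0, 323.0, 322.0, 314.0]
```

The median is less sensitive to extreme scores than the mean, which makes it a common default for filling numeric gaps.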

In [48]:
df.head()

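|   | Serial No. | GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Research | Chance of Admit |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 337.0 | 118.0 | 4.0 | 4.5 | 4.5 | 9.65 | 1 | 0.92 |
| 1 | 2 | 324.0 | 107.0 | 4.0 | 4.0 | 4.5 | 8.87 | 1 | 0.76 |
| 2 | 3 | 317.0 | 104.0 | 3.0 | 3.0 | 3.5 | 8.0 | 1 | 0.72 |
| 3 | 4 | 322.0 | 110.0 | 3.0 | 3.5 | 2.5 | 8.67 | 1 | 0.8 |
| 4 | 5 | 314.0 | 103.0 | 2.0 | 2.0 | 3.0 | 8.21 | 0 | 0.65 |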


In [49]:
X = df.drop(["Serial No.", "Chance of Admit"], axis = 1)
y = df["Chance of Admit"]

In [50]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .30, random_state = 42)

In [51]:
X_train.shape

(350, 7)

In [52]:
X_test.shape

(150, 7)

In [53]:
y_train.shape

(350,)

In [54]:
y_test.shape

(150,)

In [55]:
std_sca = StandardScaler()

In [56]:
X_train = std_sca.fit_transform(X_train)

In [57]:
X_train

array([[ 1.22751102,  1.29082036,  1.66979232, ..., -0.5291228 ,
         1.28550609,  0.88127734],
       [-1.64471122, -0.86291365, -0.08261341, ...,  0.01556244,
         0.07349047, -1.13471657],
       [ 0.48629237,  0.46246113, -0.08261341, ...,  0.56024767,
         0.88150088,  0.88127734],
       ...,
       [-1.36675423, -1.35992919, -1.83501915, ..., -1.61849327,
        -2.23270591, -1.13471657],
       [-0.71818792, -0.36589811, -0.95881628, ...,  0.56024767,
        -1.50886325, -1.13471657],
       [-0.25492627, -0.20022626, -0.95881628, ...,  0.01556244,
        -0.54935089, -1.13471657]])

In [58]:
X_test = std_sca.transform(X_test)

In [59]:
X_test

array([[ 1.59812034,  1.45649221,  0.79358945, ...,  0.01556244,
         1.62217709,  0.88127734],
       [-0.25492627,  0.13111743,  0.79358945, ...,  0.56024767,
         0.78049958,  0.88127734],
       [-0.16227394, -0.36589811, -0.95881628, ..., -1.07380803,
        -1.5593639 , -1.13471657],
       ...,
       [ 0.67159703,  0.95947667,  0.79358945, ...,  0.56024767,
         0.35966083, -1.13471657],
       [-0.44023093, -0.53156995, -0.08261341, ...,  0.56024767,
        -0.81868769, -1.13471657],
       [-0.44023093, -0.20022626, -0.08261341, ...,  1.64961814,
        -0.01067728, -1.13471657]])
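
The scaling step above calls `fit_transform` on the training set but only `transform` on the test set. A plain-Python sketch of why: the mean and standard deviation are learned from the training data and then reused unchanged on the test data. `fit_scaler` and `transform` are hypothetical helpers and the scores are illustrative values, not the notebook's actual split. `StandardScaler` uses the population standard deviation, which is what `pstdev` computes here.

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation from training data."""
    return mean(column), pstdev(column)


def transform(column, mu, sigma):
    """Apply z = (x - mu) / sigma using previously learned statistics."""
    return [(x - mu) / sigma for x in column]


# Illustrative GRE-like scores only.
train_scores = [337.0, 324.0, 317.0, 322.0, 314.0]
test_scores = [330.0, 300.0]

mu, sigma = fit_scaler(train_scores)           # statistics come from train only
train_z = transform(train_scores, mu, sigma)
test_z = transform(test_scores, mu, sigma)     # test reuses the train statistics

print([round(z, 3) for z in train_z])
print([round(z, 3) for z in test_z])
```

The scaled training column has mean 0 and standard deviation 1 by construction. The test column generally does not, which is expected: it is measured on the training set's scale so that no information from the test set leaks into preprocessing.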

In [60]:
!nvidia-smi

Sun May 21 17:19:18 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 441.45       Driver Version: 441.45       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce MX350      WDDM  | 00000000:06:00.0 Off |                  N/A |
| N/A   46C    P8    N/A /  N/A |     68MiB /  2048MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru

In [97]:
def objective(trial):
    # Note: tuning runs on synthetic regression data, not on the admissions data prepared above.
    data, target = make_regression(n_samples=1000, n_features=10, random_state=42)

    train_x, test_x, train_y, test_y = train_test_split(data, target, test_size=0.25, random_state=30)

    # Search space: Optuna samples one value per hyperparameter for each trial.
    param = {
        # suggest_float(..., log=True) replaces the deprecated suggest_loguniform.
        'lambda': trial.suggest_float('lambda', 1e-4, 10.0, log=True),
        'alpha': trial.suggest_float('alpha', 1e-4, 10.0, log=True),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]),
        'subsample': trial.suggest_categorical('subsample', [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.00001, 0.0003, 0.008, 0.02, 0.01, 1, 8]),
        'n_estimators': trial.suggest_int('n_estimators', 100, 3000),
        'max_depth': trial.suggest_categorical('max_depth', [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]),
        'random_state': trial.suggest_categorical('random_state', [10, 20, 30, 2000, 3454, 243123]),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 200),
    }

    xgb_reg_model = xgb.XGBRegressor(**param)
    # eval_set only logs the validation RMSE per boosting round; it does not change training here.
    xgb_reg_model.fit(train_x, train_y, eval_set=[(test_x, test_y)], verbose=True)
    pred_xgb = xgb_reg_model.predict(test_x)
    mse = mean_squared_error(test_y, pred_xgb)

    # The study minimizes this value (MSE on the held-out split).
    return mse
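
The objective returns the mean squared error, while XGBoost's per-round log reports the root mean squared error, so the two sets of numbers in the output are on different scales. A small plain-Python sketch of the relationship follows. The helpers `mse` and `rmse` are written out by hand for illustration, and `trial_1_mse` is copied from the Optuna log for trial 1.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Toy check: squared errors are 4 and 0, so the MSE is 2.0.
toy_mse = mse([3.0, 5.0], [1.0, 5.0])
toy_rmse = rmse([3.0, 5.0], [1.0, 5.0])
print(toy_mse, round(toy_rmse, 4))

# Value reported by Optuna for trial 1, converted to the scale of the XGBoost log.
trial_1_mse = 931.1609971893504
trial_1_rmse = math.sqrt(trial_1_mse)
print(round(trial_1_rmse, 2))  # 30.51
```

Read this way, the best trial's value of about 931 corresponds to a final validation RMSE of about 30.5, which is directly comparable to the `validation_0-rmse` lines.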


In [98]:
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=10)

best_params = study.best_trial.params
print("Best Parameters:", best_params)

[I 2023-05-21 17:42:15,696] A new study created in memory with name: no-name-2b9102a4-4cfd-43ec-be99-6d39c100003a


[0]	validation_0-rmse:117.60885
[1]	validation_0-rmse:107.14754
[2]	validation_0-rmse:90.43681
[3]	validation_0-rmse:88.02690
[4]	validation_0-rmse:87.14219
[5]	validation_0-rmse:81.32593
[6]	validation_0-rmse:77.81712
[7]	validation_0-rmse:77.43928
[8]	validation_0-rmse:76.12968
[9]	validation_0-rmse:75.54556
[10]	validation_0-rmse:73.72435
[11]	validation_0-rmse:74.71313
[12]	validation_0-rmse:73.39092
[13]	validation_0-rmse:73.12631
[14]	validation_0-rmse:72.17153
[15]	validation_0-rmse:70.93454
[16]	validation_0-rmse:71.64827
[17]	validation_0-rmse:70.46782
[18]	validation_0-rmse:71.51257
[19]	validation_0-rmse:70.53224
[20]	validation_0-rmse:70.22640
[21]	validation_0-rmse:71.13400
[22]	validation_0-rmse:69.83970
[23]	validation_0-rmse:70.24271
[24]	validation_0-rmse:71.01891
[25]	validation_0-rmse:69.87958
[26]	validation_0-rmse:70.46799
[27]	validation_0-rmse:70.95114
[28]	validation_0-rmse:70.16674
[29]	validation_0-rmse:70.03729
[30]	validation_0-rmse:71.00127
[31]	validation_

[101]	validation_0-rmse:69.61664
[102]	validation_0-rmse:69.72185
[103]	validation_0-rmse:69.41079
[104]	validation_0-rmse:70.16497
[105]	validation_0-rmse:69.91250
[106]	validation_0-rmse:70.22215
[107]	validation_0-rmse:70.73528
[108]	validation_0-rmse:70.69742
[109]	validation_0-rmse:71.68935
[110]	validation_0-rmse:71.51724
[111]	validation_0-rmse:69.48098
[112]	validation_0-rmse:69.30595
[113]	validation_0-rmse:69.45043
[114]	validation_0-rmse:70.30609
[115]	validation_0-rmse:70.77813
[116]	validation_0-rmse:69.97501
[117]	validation_0-rmse:69.17509
[118]	validation_0-rmse:69.58765
[119]	validation_0-rmse:69.29118
[120]	validation_0-rmse:69.24986
[121]	validation_0-rmse:69.35203
[122]	validation_0-rmse:69.48716
[123]	validation_0-rmse:68.91023
[124]	validation_0-rmse:69.17304
[125]	validation_0-rmse:70.03970
[126]	validation_0-rmse:69.51010
[127]	validation_0-rmse:69.80745
[128]	validation_0-rmse:69.57151
[129]	validation_0-rmse:69.88272
[130]	validation_0-rmse:69.64491
[131]	vali

[I 2023-05-21 17:42:17,162] Trial 0 finished with value: 5260.945078433159 and parameters: {'lambda': 0.03054016419489117, 'alpha': 0.0007841169952190815, 'colsample_bytree': 0.8, 'subsample': 0.5, 'learning_rate': 1, 'n_estimators': 937, 'max_depth': 11, 'random_state': 3454, 'min_child_weight': 157}. Best is trial 0 with value: 5260.945078433159.


[0]	validation_0-rmse:130.83211
[1]	validation_0-rmse:130.09315
[2]	validation_0-rmse:129.36490
[3]	validation_0-rmse:128.64725
[4]	validation_0-rmse:127.90052
[5]	validation_0-rmse:127.16388
[6]	validation_0-rmse:126.45281
[7]	validation_0-rmse:125.74410
[8]	validation_0-rmse:125.04568
[9]	validation_0-rmse:124.37668
[10]	validation_0-rmse:123.67445
[11]	validation_0-rmse:123.06012
[12]	validation_0-rmse:122.38126
[13]	validation_0-rmse:121.74172
[14]	validation_0-rmse:121.11795
[15]	validation_0-rmse:120.47106
[16]	validation_0-rmse:119.81730
[17]	validation_0-rmse:119.22857
[18]	validation_0-rmse:118.63546
[19]	validation_0-rmse:118.04986
[20]	validation_0-rmse:117.41436
[21]	validation_0-rmse:116.85772
[22]	validation_0-rmse:116.26208
[23]	validation_0-rmse:115.68429
[24]	validation_0-rmse:115.12923
[25]	validation_0-rmse:114.55565
[26]	validation_0-rmse:113.99380
[27]	validation_0-rmse:113.42430
[28]	validation_0-rmse:112.88266
[29]	validation_0-rmse:112.31949
[30]	validation_0-rm

[46]	validation_0-rmse:104.46621
[47]	validation_0-rmse:104.00983
[48]	validation_0-rmse:103.60392
[49]	validation_0-rmse:103.17253
[50]	validation_0-rmse:102.74653
[51]	validation_0-rmse:102.38925
[52]	validation_0-rmse:101.99452
[53]	validation_0-rmse:101.65005
[54]	validation_0-rmse:101.22320
[55]	validation_0-rmse:100.81918
[56]	validation_0-rmse:100.42253
[57]	validation_0-rmse:100.07995
[58]	validation_0-rmse:99.62551
[59]	validation_0-rmse:99.30756
[60]	validation_0-rmse:98.90341
[61]	validation_0-rmse:98.53781
[62]	validation_0-rmse:98.15152
[63]	validation_0-rmse:97.80564
[64]	validation_0-rmse:97.46300
[65]	validation_0-rmse:97.06326
[66]	validation_0-rmse:96.71931
[67]	validation_0-rmse:96.38457
[68]	validation_0-rmse:96.05658
[69]	validation_0-rmse:95.70415
[70]	validation_0-rmse:95.36948
[71]	validation_0-rmse:95.04632
[72]	validation_0-rmse:94.71241
[73]	validation_0-rmse:94.38568
[74]	validation_0-rmse:94.07680
[75]	validation_0-rmse:93.77581
[76]	validation_0-rmse:93.48

[I 2023-05-21 17:42:20,311] Trial 1 finished with value: 931.1609971893504 and parameters: {'lambda': 0.002005101608919934, 'alpha': 0.13200548602265308, 'colsample_bytree': 1, 'subsample': 1, 'learning_rate': 0.01, 'n_estimators': 1387, 'max_depth': 4, 'random_state': 10, 'min_child_weight': 15}. Best is trial 1 with value: 931.1609971893504.


[0]	validation_0-rmse:131.58189
[1]	validation_0-rmse:131.58189
[2]	validation_0-rmse:131.58189
[3]	validation_0-rmse:131.58189
[4]	validation_0-rmse:131.58189
[5]	validation_0-rmse:131.58189
[6]	validation_0-rmse:131.58189
[7]	validation_0-rmse:131.58189
[8]	validation_0-rmse:131.58189
[9]	validation_0-rmse:131.58189
[10]	validation_0-rmse:131.58189
[11]	validation_0-rmse:131.58189
[12]	validation_0-rmse:131.58189
[13]	validation_0-rmse:131.58189
[14]	validation_0-rmse:131.58189
[15]	validation_0-rmse:131.58189
[16]	validation_0-rmse:131.58189
[17]	validation_0-rmse:131.58189
[18]	validation_0-rmse:131.58189
[19]	validation_0-rmse:131.58189
[20]	validation_0-rmse:131.58189
[21]	validation_0-rmse:131.58189
[22]	validation_0-rmse:131.58189
[23]	validation_0-rmse:131.58189
[24]	validation_0-rmse:131.58189
[25]	validation_0-rmse:131.58189
[26]	validation_0-rmse:131.58189
[27]	validation_0-rmse:131.58189
[28]	validation_0-rmse:131.58189
[29]	validation_0-rmse:131.58189
[30]	validation_0-rm

[67]	validation_0-rmse:131.58189
[68]	validation_0-rmse:131.58189
[69]	validation_0-rmse:131.58189
[70]	validation_0-rmse:131.58189
[71]	validation_0-rmse:131.58189
[72]	validation_0-rmse:131.58189
[73]	validation_0-rmse:131.58189
[74]	validation_0-rmse:131.58189
[75]	validation_0-rmse:131.58189
[76]	validation_0-rmse:131.58189
[77]	validation_0-rmse:131.58189
[78]	validation_0-rmse:131.58189
[79]	validation_0-rmse:131.58189
[80]	validation_0-rmse:131.58189
[81]	validation_0-rmse:131.58189
[82]	validation_0-rmse:131.58189
[83]	validation_0-rmse:131.58189
[84]	validation_0-rmse:131.58189
[85]	validation_0-rmse:131.58189
[86]	validation_0-rmse:131.58189
[87]	validation_0-rmse:131.58189
[88]	validation_0-rmse:131.58189
[89]	validation_0-rmse:131.58189
[90]	validation_0-rmse:131.58189
[91]	validation_0-rmse:131.58189
[92]	validation_0-rmse:131.58189
[93]	validation_0-rmse:131.58189
[94]	validation_0-rmse:131.58189
[95]	validation_0-rmse:131.58189
[96]	validation_0-rmse:131.58189
[97]	valid

[I 2023-05-21 17:42:21,559] Trial 2 finished with value: 17313.79424612686 and parameters: {'lambda': 0.05787709645832966, 'alpha': 4.515605341911871, 'colsample_bytree': 1, 'subsample': 0.1, 'learning_rate': 1, 'n_estimators': 615, 'max_depth': 7, 'random_state': 2000, 'min_child_weight': 188}. Best is trial 1 with value: 931.1609971893504.


[0]	validation_0-rmse:131.56270
[1]	validation_0-rmse:131.54640
[2]	validation_0-rmse:131.52689
[3]	validation_0-rmse:131.50712
[4]	validation_0-rmse:131.48697
[5]	validation_0-rmse:131.47205
[6]	validation_0-rmse:131.45420
[7]	validation_0-rmse:131.43619
[8]	validation_0-rmse:131.41709
[9]	validation_0-rmse:131.39965
[10]	validation_0-rmse:131.38203
[11]	validation_0-rmse:131.36385
[12]	validation_0-rmse:131.34376
[13]	validation_0-rmse:131.32560
[14]	validation_0-rmse:131.30754
[15]	validation_0-rmse:131.28676
[16]	validation_0-rmse:131.26891
[17]	validation_0-rmse:131.24891
[18]	validation_0-rmse:131.22783
[19]	validation_0-rmse:131.20772
[20]	validation_0-rmse:131.19494
[21]	validation_0-rmse:131.17562
[22]	validation_0-rmse:131.15631
[23]	validation_0-rmse:131.13787
[24]	validation_0-rmse:131.12034
[25]	validation_0-rmse:131.10110
[26]	validation_0-rmse:131.08217
[27]	validation_0-rmse:131.06306
[28]	validation_0-rmse:131.04498
[29]	validation_0-rmse:131.02597
[30]	validation_0-rm

[78]	validation_0-rmse:130.14275
[79]	validation_0-rmse:130.12354
[80]	validation_0-rmse:130.10434
[81]	validation_0-rmse:130.08575
[82]	validation_0-rmse:130.06704
[83]	validation_0-rmse:130.04942
[84]	validation_0-rmse:130.03297
[85]	validation_0-rmse:130.01117
[86]	validation_0-rmse:129.99286
[87]	validation_0-rmse:129.97577
[88]	validation_0-rmse:129.95753
[89]	validation_0-rmse:129.93871
[90]	validation_0-rmse:129.92031
[91]	validation_0-rmse:129.90212
[92]	validation_0-rmse:129.88313
[93]	validation_0-rmse:129.86998
[94]	validation_0-rmse:129.85109
[95]	validation_0-rmse:129.83154
[96]	validation_0-rmse:129.81330
[97]	validation_0-rmse:129.79471
[98]	validation_0-rmse:129.77940
[99]	validation_0-rmse:129.76001
[100]	validation_0-rmse:129.74243
[101]	validation_0-rmse:129.72599
[102]	validation_0-rmse:129.70604
[103]	validation_0-rmse:129.68641
[104]	validation_0-rmse:129.66833
[105]	validation_0-rmse:129.65429
[106]	validation_0-rmse:129.63669
[107]	validation_0-rmse:129.61911
[1

[I 2023-05-21 17:42:29,072] Trial 3 finished with value: 10565.646333732046 and parameters: {'lambda': 0.7833854732245917, 'alpha': 0.0006308067440236689, 'colsample_bytree': 0.9, 'subsample': 0.5, 'learning_rate': 0.0003, 'n_estimators': 2084, 'max_depth': 3, 'random_state': 30, 'min_child_weight': 30}. Best is trial 1 with value: 931.1609971893504.


[0]	validation_0-rmse:131.58159
[1]	validation_0-rmse:131.58128
[2]	validation_0-rmse:131.58090
[3]	validation_0-rmse:131.58056
[4]	validation_0-rmse:131.58029
[5]	validation_0-rmse:131.57988
[6]	validation_0-rmse:131.57956
[7]	validation_0-rmse:131.57917
[8]	validation_0-rmse:131.57881
[9]	validation_0-rmse:131.57842
[10]	validation_0-rmse:131.57805
[11]	validation_0-rmse:131.57769
[12]	validation_0-rmse:131.57744
[13]	validation_0-rmse:131.57705
[14]	validation_0-rmse:131.57665
[15]	validation_0-rmse:131.57626
[16]	validation_0-rmse:131.57588
[17]	validation_0-rmse:131.57552
[18]	validation_0-rmse:131.57517
[19]	validation_0-rmse:131.57491
[20]	validation_0-rmse:131.57453
[21]	validation_0-rmse:131.57413
[22]	validation_0-rmse:131.57372
[23]	validation_0-rmse:131.57331
[24]	validation_0-rmse:131.57291
[25]	validation_0-rmse:131.57254
[26]	validation_0-rmse:131.57216
[27]	validation_0-rmse:131.57179
[28]	validation_0-rmse:131.57140
[29]	validation_0-rmse:131.57104
[30]	validation_0-rm

[50]	validation_0-rmse:131.56454
[51]	validation_0-rmse:131.56415
[52]	validation_0-rmse:131.56393
[53]	validation_0-rmse:131.56357
[54]	validation_0-rmse:131.56323
[55]	validation_0-rmse:131.56302
[56]	validation_0-rmse:131.56268
[57]	validation_0-rmse:131.56227
[58]	validation_0-rmse:131.56188
[59]	validation_0-rmse:131.56158
[60]	validation_0-rmse:131.56124
[61]	validation_0-rmse:131.56086
[62]	validation_0-rmse:131.56056
[63]	validation_0-rmse:131.56035
[64]	validation_0-rmse:131.56006
[65]	validation_0-rmse:131.55968
[66]	validation_0-rmse:131.55938
[67]	validation_0-rmse:131.55900
[68]	validation_0-rmse:131.55863
[69]	validation_0-rmse:131.55822
[70]	validation_0-rmse:131.55782
[71]	validation_0-rmse:131.55744
[72]	validation_0-rmse:131.55708
[73]	validation_0-rmse:131.55672
[74]	validation_0-rmse:131.55636
[75]	validation_0-rmse:131.55600
[76]	validation_0-rmse:131.55564
[77]	validation_0-rmse:131.55527
[78]	validation_0-rmse:131.55507
[79]	validation_0-rmse:131.55467
[80]	valid

[I 2023-05-21 17:42:34,291] Trial 4 finished with value: 17192.20098004151 and parameters: {'lambda': 5.192180558523169, 'alpha': 0.42583766273061696, 'colsample_bytree': 0.6, 'subsample': 0.7, 'learning_rate': 1e-05, 'n_estimators': 1365, 'max_depth': 9, 'random_state': 20, 'min_child_weight': 132}. Best is trial 1 with value: 931.1609971893504.


[0]	validation_0-rmse:131.58686
[1]	validation_0-rmse:131.57284
[2]	validation_0-rmse:131.58817
[3]	validation_0-rmse:131.62324
[4]	validation_0-rmse:131.63021
[5]	validation_0-rmse:131.64033
[6]	validation_0-rmse:131.62036
[7]	validation_0-rmse:131.57851
[8]	validation_0-rmse:131.00029
[9]	validation_0-rmse:130.98159
[10]	validation_0-rmse:131.00224
[11]	validation_0-rmse:130.51590


[12]	validation_0-rmse:130.50739
[13]	validation_0-rmse:130.50207
[14]	validation_0-rmse:130.51836
[15]	validation_0-rmse:130.49832
[16]	validation_0-rmse:130.04728
[17]	validation_0-rmse:129.58910
[18]	validation_0-rmse:129.54410
[19]	validation_0-rmse:129.53344
[20]	validation_0-rmse:129.53098
[21]	validation_0-rmse:129.52637
[22]	validation_0-rmse:129.11706
[23]	validation_0-rmse:128.56341
[24]	validation_0-rmse:128.54553
[25]	validation_0-rmse:128.53958
[26]	validation_0-rmse:128.54635
[27]	validation_0-rmse:128.54423
[28]	validation_0-rmse:128.08897
[29]	validation_0-rmse:128.08068
[30]	validation_0-rmse:128.10118
[31]	validation_0-rmse:128.10797
[32]	validation_0-rmse:128.11001
[33]	validation_0-rmse:128.11020
[34]	validation_0-rmse:127.47937
[35]	validation_0-rmse:127.45344
[36]	validation_0-rmse:127.12030
[37]	validation_0-rmse:126.57873
[38]	validation_0-rmse:126.60877
[39]	validation_0-rmse:126.55966
[40]	validation_0-rmse:126.55340
[41]	validation_0-rmse:126.23292
[42]	valid

[I 2023-05-21 17:42:35,290] Trial 5 finished with value: 11274.060283129204 and parameters: {'lambda': 0.046168833356494694, 'alpha': 0.009224433672295926, 'colsample_bytree': 0.9, 'subsample': 0.1, 'learning_rate': 0.02, 'n_estimators': 222, 'max_depth': 5, 'random_state': 20, 'min_child_weight': 40}. Best is trial 1 with value: 931.1609971893504.


[0]	validation_0-rmse:131.59195
[1]	validation_0-rmse:131.58347
[2]	validation_0-rmse:131.58290
[3]	validation_0-rmse:131.58183
[4]	validation_0-rmse:131.58472
[5]	validation_0-rmse:131.58098
[6]	validation_0-rmse:131.58278
[7]	validation_0-rmse:131.58558
[8]	validation_0-rmse:131.57948
[9]	validation_0-rmse:131.57630
[10]	validation_0-rmse:131.57060
[11]	validation_0-rmse:131.56000
[12]	validation_0-rmse:131.57555
[13]	validation_0-rmse:131.57561
[14]	validation_0-rmse:131.58166
[15]	validation_0-rmse:131.58259
[16]	validation_0-rmse:131.58604
[17]	validation_0-rmse:131.56902
[18]	validation_0-rmse:131.57417
[19]	validation_0-rmse:131.56702
[20]	validation_0-rmse:131.54568
[21]	validation_0-rmse:131.56183
[22]	validation_0-rmse:131.54936
[23]	validation_0-rmse:131.54972
[24]	validation_0-rmse:131.53545
[25]	validation_0-rmse:131.53193
[26]	validation_0-rmse:131.53750
[27]	validation_0-rmse:131.51419
[28]	validation_0-rmse:131.50329
[29]	validation_0-rmse:131.50751
[30]	validation_0-rm

[57]	validation_0-rmse:131.51019
[58]	validation_0-rmse:131.52131
[59]	validation_0-rmse:131.52356
[60]	validation_0-rmse:131.52226
[61]	validation_0-rmse:131.53579
[62]	validation_0-rmse:131.53045
[63]	validation_0-rmse:131.54270
[64]	validation_0-rmse:131.52896
[65]	validation_0-rmse:131.51554
[66]	validation_0-rmse:131.51928
[67]	validation_0-rmse:131.52038
[68]	validation_0-rmse:131.52289
[69]	validation_0-rmse:131.52819
[70]	validation_0-rmse:131.53225
[71]	validation_0-rmse:131.53159
[72]	validation_0-rmse:131.52887
[73]	validation_0-rmse:131.52922
[74]	validation_0-rmse:131.51492
[75]	validation_0-rmse:131.51290
[76]	validation_0-rmse:131.51468
[77]	validation_0-rmse:131.51605
[78]	validation_0-rmse:131.52248
[79]	validation_0-rmse:131.54360
[80]	validation_0-rmse:131.52512
[81]	validation_0-rmse:131.52217
[82]	validation_0-rmse:131.51766
[83]	validation_0-rmse:131.52526
[84]	validation_0-rmse:131.53249
[85]	validation_0-rmse:131.54123
[86]	validation_0-rmse:131.55114
[87]	valid

[I 2023-05-21 17:42:45,373] Trial 6 finished with value: 17339.98349612421 and parameters: {'lambda': 1.4067041018778614, 'alpha': 0.0025308347887732295, 'colsample_bytree': 0.1, 'subsample': 0.3, 'learning_rate': 0.02, 'n_estimators': 2025, 'max_depth': 3, 'random_state': 2000, 'min_child_weight': 141}. Best is trial 1 with value: 931.1609971893504.


[0]	validation_0-rmse:130.81515
[1]	validation_0-rmse:130.09786
[2]	validation_0-rmse:129.28888
[3]	validation_0-rmse:128.73596
[4]	validation_0-rmse:128.35371
[5]	validation_0-rmse:127.65218
[6]	validation_0-rmse:126.92903
[7]	validation_0-rmse:126.43498
[8]	validation_0-rmse:125.96814
[9]	validation_0-rmse:125.50894
[10]	validation_0-rmse:125.05450
[11]	validation_0-rmse:124.34645
[12]	validation_0-rmse:123.98409
[13]	validation_0-rmse:123.61771
[14]	validation_0-rmse:122.89558
[15]	validation_0-rmse:122.29759
[16]	validation_0-rmse:121.91817
[17]	validation_0-rmse:121.54267
[18]	validation_0-rmse:120.91448
[19]	validation_0-rmse:120.26187
[20]	validation_0-rmse:119.58642
[21]	validation_0-rmse:119.23978
[22]	validation_0-rmse:118.84358
[23]	validation_0-rmse:118.27642
[24]	validation_0-rmse:117.65286
[25]	validation_0-rmse:117.11011
[26]	validation_0-rmse:116.53747
[27]	validation_0-rmse:115.99323
[28]	validation_0-rmse:115.43189
[29]	validation_0-rmse:115.05792
[30]	validation_0-rm

[39]	validation_0-rmse:111.04816
[40]	validation_0-rmse:110.55232
[41]	validation_0-rmse:110.07587
[42]	validation_0-rmse:109.55018
[43]	validation_0-rmse:109.05140
[44]	validation_0-rmse:108.65019
[45]	validation_0-rmse:108.40963
[46]	validation_0-rmse:107.97593
[47]	validation_0-rmse:107.48276
[48]	validation_0-rmse:106.95259
[49]	validation_0-rmse:106.63795
[50]	validation_0-rmse:106.14320
[51]	validation_0-rmse:105.73627
[52]	validation_0-rmse:105.33005
[53]	validation_0-rmse:105.05298
[54]	validation_0-rmse:104.76043
[55]	validation_0-rmse:104.34538
[56]	validation_0-rmse:104.11214
[57]	validation_0-rmse:103.69324
[58]	validation_0-rmse:103.29622
[59]	validation_0-rmse:102.87374
[60]	validation_0-rmse:102.48622
[61]	validation_0-rmse:102.09807
[62]	validation_0-rmse:101.70185
[63]	validation_0-rmse:101.32846
[64]	validation_0-rmse:101.06332
[65]	validation_0-rmse:100.66005
[66]	validation_0-rmse:100.38067
[67]	validation_0-rmse:100.00977
[68]	validation_0-rmse:99.68401
[69]	valida

[I 2023-05-21 17:42:51,584] Trial 7 finished with value: 2758.1253250087384 and parameters: {'lambda': 0.832361778469978, 'alpha': 0.025394166526994614, 'colsample_bytree': 0.8, 'subsample': 0.6, 'learning_rate': 0.02, 'n_estimators': 1447, 'max_depth': 4, 'random_state': 10, 'min_child_weight': 134}. Best is trial 1 with value: 931.1609971893504.


[0]	validation_0-rmse:131.27059
[1]	validation_0-rmse:131.02203
[2]	validation_0-rmse:130.78155
[3]	validation_0-rmse:130.49904
[4]	validation_0-rmse:130.31284
[5]	validation_0-rmse:130.03582
[6]	validation_0-rmse:129.73937
[7]	validation_0-rmse:129.47940
[8]	validation_0-rmse:129.22988
[9]	validation_0-rmse:128.94784
[10]	validation_0-rmse:128.75047
[11]	validation_0-rmse:128.51592
[12]	validation_0-rmse:128.31997
[13]	validation_0-rmse:128.15012
[14]	validation_0-rmse:127.96079
[15]	validation_0-rmse:127.70810
[16]	validation_0-rmse:127.46036
[17]	validation_0-rmse:127.22310




[18]	validation_0-rmse:126.96650
[19]	validation_0-rmse:126.70523
[20]	validation_0-rmse:126.47302
[21]	validation_0-rmse:126.23877
[22]	validation_0-rmse:126.00819
[23]	validation_0-rmse:125.78379
[24]	validation_0-rmse:125.56550
[25]	validation_0-rmse:125.34841
[26]	validation_0-rmse:125.13206
[27]	validation_0-rmse:124.91347
[28]	validation_0-rmse:124.68459
[29]	validation_0-rmse:124.44329
[30]	validation_0-rmse:124.22739
[31]	validation_0-rmse:124.02843
[32]	validation_0-rmse:123.83733
[33]	validation_0-rmse:123.64846
[34]	validation_0-rmse:123.44242
[35]	validation_0-rmse:123.24325
[36]	validation_0-rmse:123.02116
[37]	validation_0-rmse:122.80706
[38]	validation_0-rmse:122.61185
[39]	validation_0-rmse:122.42777
[40]	validation_0-rmse:122.20563
[41]	validation_0-rmse:122.00465
[42]	validation_0-rmse:121.79726
[43]	validation_0-rmse:121.56999
[44]	validation_0-rmse:121.36689
[45]	validation_0-rmse:121.18494
[46]	validation_0-rmse:121.00716
[47]	validation_0-rmse:120.81201
[48]	valid

[I 2023-05-21 17:42:56,729] Trial 8 finished with value: 4302.972010589634 and parameters: {'lambda': 0.1423102553859714, 'alpha': 0.0008050770168997233, 'colsample_bytree': 0.7, 'subsample': 0.5, 'learning_rate': 0.01, 'n_estimators': 1218, 'max_depth': 6, 'random_state': 10, 'min_child_weight': 138}. Best is trial 1 with value: 931.1609971893504.


[0]	validation_0-rmse:131.12316
[1]	validation_0-rmse:130.70868
[2]	validation_0-rmse:130.24106
[3]	validation_0-rmse:129.78462
[4]	validation_0-rmse:129.38547
[5]	validation_0-rmse:128.92790
[6]	validation_0-rmse:128.49233
[7]	validation_0-rmse:128.07487
[8]	validation_0-rmse:127.66491
[9]	validation_0-rmse:127.25904
[10]	validation_0-rmse:126.89901
[11]	validation_0-rmse:126.40227
[12]	validation_0-rmse:126.00339
[13]	validation_0-rmse:125.60545
[14]	validation_0-rmse:125.23443
[15]	validation_0-rmse:124.74597
[16]	validation_0-rmse:124.28445
[17]	validation_0-rmse:123.93278
[18]	validation_0-rmse:123.45738
[19]	validation_0-rmse:123.07328
[20]	validation_0-rmse:122.72362
[21]	validation_0-rmse:122.35229
[22]	validation_0-rmse:121.96719
[23]	validation_0-rmse:121.53850
[24]	validation_0-rmse:121.20643
[25]	validation_0-rmse:120.85285
[26]	validation_0-rmse:120.48841
[27]	validation_0-rmse:120.13367
[28]	validation_0-rmse:119.82225
[29]	validation_0-rmse:119.48394
[30]	validation_0-rm



[39]	validation_0-rmse:116.20149
[40]	validation_0-rmse:115.82490
[41]	validation_0-rmse:115.46197
[42]	validation_0-rmse:115.12144
[43]	validation_0-rmse:114.74938
[44]	validation_0-rmse:114.43527
[45]	validation_0-rmse:114.10944
[46]	validation_0-rmse:113.81973
[47]	validation_0-rmse:113.49278
[48]	validation_0-rmse:113.17738
[49]	validation_0-rmse:112.85023
[50]	validation_0-rmse:112.48702
[51]	validation_0-rmse:112.20854
[52]	validation_0-rmse:111.93036
[53]	validation_0-rmse:111.67065
[54]	validation_0-rmse:111.38765
[55]	validation_0-rmse:111.09801
[56]	validation_0-rmse:110.82700
[57]	validation_0-rmse:110.47901
[58]	validation_0-rmse:110.16418
[59]	validation_0-rmse:109.86211
[60]	validation_0-rmse:109.55544
[61]	validation_0-rmse:109.29091
[62]	validation_0-rmse:108.98942
[63]	validation_0-rmse:108.69670
[64]	validation_0-rmse:108.46041
[65]	validation_0-rmse:108.18612
[66]	validation_0-rmse:107.90563
[67]	validation_0-rmse:107.66021
[68]	validation_0-rmse:107.33736
[69]	valid

[I 2023-05-21 17:43:13,066] Trial 9 finished with value: 1699.1953673321336 and parameters: {'lambda': 8.716800907803476, 'alpha': 0.12503286562094842, 'colsample_bytree': 0.9, 'subsample': 0.9, 'learning_rate': 0.01, 'n_estimators': 2579, 'max_depth': 7, 'random_state': 3454, 'min_child_weight': 126}. Best is trial 1 with value: 931.1609971893504.


Best Parameters: {'lambda': 0.002005101608919934, 'alpha': 0.13200548602265308, 'colsample_bytree': 1, 'subsample': 1, 'learning_rate': 0.01, 'n_estimators': 1387, 'max_depth': 4, 'random_state': 10, 'min_child_weight': 15}


In [104]:
study.trials_dataframe()

| | number | value | datetime_start | datetime_complete | duration | params_alpha | params_colsample_bytree | params_lambda | params_learning_rate | params_max_depth | params_min_child_weight | params_n_estimators | params_random_state | params_subsample | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 5260.945078 | 2023-05-21 17:42:15.701837 | 2023-05-21 17:42:17.161735 | 0 days 00:00:01.459898 | 0.000784 | 0.8 | 0.03054 | 1.0 | 11 | 157 | 937 | 3454 | 0.5 | COMPLETE |
| 1 | 1 | 931.160997 | 2023-05-21 17:42:17.163731 | 2023-05-21 17:42:20.310025 | 0 days 00:00:03.146294 | 0.132005 | 1.0 | 0.002005 | 0.01 | 4 | 15 | 1387 | 10 | 1.0 | COMPLETE |
| 2 | 2 | 17313.794246 | 2023-05-21 17:42:20.313017 | 2023-05-21 17:42:21.558270 | 0 days 00:00:01.245253 | 4.515605 | 1.0 | 0.057877 | 1.0 | 7 | 188 | 615 | 2000 | 0.1 | COMPLETE |
| 3 | 3 | 10565.646334 | 2023-05-21 17:42:21.560269 | 2023-05-21 17:42:29.071294 | 0 days 00:00:07.511025 | 0.000631 | 0.9 | 0.783385 | 0.0003 | 3 | 30 | 2084 | 30 | 0.5 | COMPLETE |
| 4 | 4 | 17192.20098 | 2023-05-21 17:42:29.073289 | 2023-05-21 17:42:34.289258 | 0 days 00:00:05.215969 | 0.425838 | 0.6 | 5.192181 | 1e-05 | 9 | 132 | 1365 | 20 | 0.7 | COMPLETE |
| 5 | 5 | 11274.060283 | 2023-05-21 17:42:34.294244 | 2023-05-21 17:42:35.289260 | 0 days 00:00:00.995016 | 0.009224 | 0.9 | 0.046169 | 0.02 | 5 | 40 | 222 | 20 | 0.1 | COMPLETE |
| 6 | 6 | 17339.983496 | 2023-05-21 17:42:35.292252 | 2023-05-21 17:42:45.371623 | 0 days 00:00:10.079371 | 0.002531 | 0.1 | 1.406704 | 0.02 | 3 | 141 | 2025 | 2000 | 0.3 | COMPLETE |
| 7 | 7 | 2758.125325 | 2023-05-21 17:42:45.375613 | 2023-05-21 17:42:51.583868 | 0 days 00:00:06.208255 | 0.025394 | 0.8 | 0.832362 | 0.02 | 4 | 134 | 1447 | 10 | 0.6 | COMPLETE |
| 8 | 8 | 4302.972011 | 2023-05-21 17:42:51.586861 | 2023-05-21 17:42:56.727862 | 0 days 00:00:05.141001 | 0.000805 | 0.7 | 0.14231 | 0.01 | 6 | 138 | 1218 | 10 | 0.5 | COMPLETE |
| 9 | 9 | 1699.195367 | 2023-05-21 17:42:56.732584 | 2023-05-21 17:43:13.065361 | 0 days 00:00:16.332777 | 0.125033 | 0.9 | 8.716801 | 0.01 | 7 | 126 | 2579 | 3454 | 0.9 | COMPLETE |
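
The "Best is trial 1" message in the log can be checked against the `value` column: assuming the study minimizes its objective (which the log messages suggest), the best trial is simply the one with the smallest value. A minimal sketch in plain Python, with the values copied from the trials output above; the name `trial_values` is illustrative and not part of Optuna's API.

```python
# Objective values per trial, copied from the `value` column of the trials output.
trial_values = {
    0: 5260.945078, 1: 931.160997, 2: 17313.794246, 3: 10565.646334,
    4: 17192.20098, 5: 11274.060283, 6: 17339.983496, 7: 2758.125325,
    8: 4302.972011, 9: 1699.195367,
}

# Assumption: the study minimizes, so the best trial has the smallest value.
best_trial = min(trial_values, key=trial_values.get)

# Trial numbers ordered from best to worst objective value.
ranking = sorted(trial_values, key=trial_values.get)

print(best_trial, trial_values[best_trial])
print(ranking[:3])
```

The three strongest trials all use a learning rate of 0.01 or 0.02, though ten trials are too few to draw firm conclusions from that.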


In [105]:
optuna.visualization.plot_optimization_history(study)

In [106]:
optuna.visualization.plot_slice(study)

In [107]:
optuna.visualization.plot_contour(study,params=['alpha','lambda'])
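
Both parameters in the contour plot, `alpha` and `lambda`, are sampled in this notebook with `trial.suggest_loguniform(name, 1e-4, 10.0)`. Log-uniform sampling draws uniformly in log space, so each decade between 1e-4 and 10 is about equally likely to be tried. The sketch below illustrates the idea with the standard library only; `sample_loguniform` is a hypothetical helper, not Optuna's implementation. Note that `suggest_loguniform` is deprecated in recent Optuna releases in favour of `trial.suggest_float(name, low, high, log=True)`.

```python
import math
import random


def sample_loguniform(low, high, rng):
    """Draw one value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_loguniform(1e-4, 10.0, rng) for _ in range(10_000)]

# The range spans five decades, so roughly one fifth of the draws
# should fall in the lowest one, [1e-4, 1e-3).
share_lowest_decade = sum(s < 1e-3 for s in samples) / len(samples)
print(min(samples), max(samples), round(share_lowest_decade, 2))
```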

In [108]:
best_params={'lambda': 0.002005101608919934, 'alpha': 0.13200548602265308, 'colsample_bytree': 1, 'subsample': 1, 'learning_rate': 0.01, 'n_estimators': 1387, 'max_depth': 4, 'random_state': 10, 'min_child_weight': 15}

In [109]:
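# Build a regressor from the best hyperparameters found by the study
model=xgb.XGBRegressor(**best_params)

In [110]:
# Refit on the full training split
model.fit(X_train,y_train)

In [111]:
# Predict on the held-out test split
y_pred=model.predict(X_test)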

In [113]:
from sklearn.metrics import r2_score

# R² on the test set, expressed as a percentage
round(r2_score(y_test,y_pred)*100, 2)

79.22
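
The figure above is the R² score scaled to a percentage. As a reminder of what the metric measures, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the targets from their mean. A minimal sketch with toy numbers (hypothetical data, not the notebook's `y_test` and `y_pred`); the helper name `r2` is illustrative.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy targets and predictions, made up for illustration only.
y_true_toy = [3.0, 5.0, 7.0, 9.0]
y_pred_toy = [2.5, 5.5, 6.5, 9.5]

score = r2(y_true_toy, y_pred_toy)
print(round(score * 100, 2))
```

A score of 1 means perfect predictions, 0 means the model does no better than always predicting the mean of the targets, and negative values are possible for models that do worse than that.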