### 📌 Feature Selection Based on XGBoost Feature Importance  
According to the **XGBoost Feature Importance** technique, `SW` and `Klogh` do not significantly impact the **Rate of Penetration (ROP)**. Therefore, we exclude these parameters and consider only the following features:  

- **`Depth`**: Depth of drilling  
- **`WOB`**: Weight on Bit  
- **`SURF_RPM`**: Rotation Per Minute (RPM)  
- **`PHIF`**: Porosity  
- **`VSH`**: Volume of Shale

In [29]:
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler

In [45]:
data = pd.read_csv("dataset_1.csv")

In [47]:
data.describe()

Unnamed: 0,Depth,WOB,SURF_RPM,ROP_AVG,PHIF,VSH,SW,KLOGH
count,151.0,151.0,151.0,151.0,151.0,151.0,151.0,151.0
mean,3697.417219,45393.934391,2.034981,0.0078,0.084423,0.299809,0.975579,37.072228
std,227.169433,15784.246882,0.208492,0.001476,0.06823,0.264596,0.108946,127.18621
min,3305.0,16961.916,1.31472,0.002666,0.002968,0.049451,0.351393,-0.001124
25%,3502.5,34320.9465,1.998711,0.007368,0.03755,0.108539,1.0,0.001
50%,3700.0,44243.48,1.999697,0.008157,0.059274,0.1931,1.0,0.001
75%,3887.5,53212.685,2.001069,0.00876,0.097212,0.387664,1.0,0.001
max,4085.0,97087.882,2.639233,0.010447,0.279346,1.0,1.013335,709.158935


In [49]:
#target = "ROP_AVG"
X = data[["Depth", "WOB", "SURF_RPM", "PHIF", "VSH"]]
y = data["ROP_AVG"]

In [51]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

In [53]:
# Normalize/Standardize data
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [55]:
from xgboost import XGBRegressor

In [57]:
# Train an XGBoost surrogate model
surrogate_model = XGBRegressor(n_estimators=100, random_state=42)
surrogate_model.fit(X_train, y_train)

# Define the objective function for Bayesian Optimization
def objective_function(Depth, WOB, SURF_RPM, PHIF, VSH):
    # Scale the input
    X_new = scaler.transform([[Depth, WOB, SURF_RPM, PHIF, VSH]])
    # Predict ROP_AVG using the XGBoost model
    rop = surrogate_model.predict(X_new)
    return rop[0]  # Return the predicted ROP_AVG

In [61]:
pip install bayesian-optimization

Collecting bayesian-optimization
  Downloading bayesian_optimization-2.0.3-py3-none-any.whl.metadata (9.0 kB)
Downloading bayesian_optimization-2.0.3-py3-none-any.whl (31 kB)
Installing collected packages: bayesian-optimization
Successfully installed bayesian-optimization-2.0.3
Note: you may need to restart the kernel to use updated packages.


In [63]:
from bayes_opt import BayesianOptimization

In [65]:
# Define parameter bounds for Bayesian Optimization
pbounds = {
    "Depth": (data["Depth"].min(), data["Depth"].max()),  # Bounds for Depth
    "WOB": (data["WOB"].min(), data["WOB"].max()),        # Bounds for WOB
    "SURF_RPM": (data["SURF_RPM"].min(), data["SURF_RPM"].max()),  # Bounds for SURF_RPM
    "PHIF": (data["PHIF"].min(), data["PHIF"].max()),      # Bounds for PHIF
    "VSH": (data["VSH"].min(), data["VSH"].max())          # Bounds for VSH
}

In [67]:
# Initialize Bayesian Optimization
optimizer = BayesianOptimization(
    f=objective_function,
    pbounds=pbounds,
    random_state=42
)

In [69]:
 # Run optimization
optimizer.maximize(init_points=5, n_iter=25)  # Adjust init_points and n_iter as needed

# Get the best parameters
best_params = optimizer.max
print("Best Parameters:", best_params)

|   iter    |  target   |   Depth   |   PHIF    | SURF_RPM  |    VSH    |    WOB    |
-------------------------------------------------------------------------------------
| [39m1        [39m | [39m0.008541 [39m | [39m3.597e+03[39m | [39m0.2657   [39m | [39m2.284    [39m | [39m0.6185   [39m | [39m2.946e+04[39m |
| [39m2        [39m | [39m0.007122 [39m | [39m3.427e+03[39m | [39m0.01902  [39m | [39m2.462    [39m | [39m0.6208   [39m | [39m7.37e+04 [39m |
| [39m3        [39m | [39m0.006057 [39m | [39m3.321e+03[39m | [39m0.271    [39m | [39m2.417    [39m | [39m0.2513   [39m | [39m3.153e+04[39m |
| [39m4        [39m | [39m0.007122 [39m | [39m3.448e+03[39m | [39m0.08705  [39m | [39m2.01     [39m | [39m0.46     [39m | [39m4.03e+04 [39m |
| [39m5        [39m | [39m0.004326 [39m | [39m3.782e+03[39m | [39m0.04152  [39m | [39m1.702    [39m | [39m0.3977   [39m | [39m5.35e+04 [39m |




| [39m6        [39m | [39m0.004326 [39m | [39m3.607e+03[39m | [39m0.194    [39m | [39m1.657    [39m | [39m0.8403   [39m | [39m6.582e+04[39m |
| [39m7        [39m | [39m0.004326 [39m | [39m4.012e+03[39m | [39m0.03209  [39m | [39m1.51     [39m | [39m0.929    [39m | [39m7.872e+04[39m |




| [39m8        [39m | [39m0.004326 [39m | [39m3.761e+03[39m | [39m0.01617  [39m | [39m1.362    [39m | [39m0.1488   [39m | [39m3.565e+04[39m |
| [39m9        [39m | [39m0.008541 [39m | [39m3.586e+03[39m | [39m0.0168   [39m | [39m2.534    [39m | [39m0.9481   [39m | [39m2.924e+04[39m |




| [39m10       [39m | [39m0.004326 [39m | [39m4.031e+03[39m | [39m0.2006   [39m | [39m1.701    [39m | [39m0.2915   [39m | [39m2.604e+04[39m |
| [39m11       [39m | [39m0.008541 [39m | [39m3.601e+03[39m | [39m0.1435   [39m | [39m2.59     [39m | [39m0.7761   [39m | [39m2.924e+04[39m |




| [39m12       [39m | [39m0.007508 [39m | [39m3.343e+03[39m | [39m0.192    [39m | [39m1.38     [39m | [39m0.7715   [39m | [39m4.379e+04[39m |
| [39m13       [39m | [39m0.008195 [39m | [39m4.06e+03 [39m | [39m0.2165   [39m | [39m1.927    [39m | [39m0.9232   [39m | [39m4.696e+04[39m |




| [39m14       [39m | [39m0.007122 [39m | [39m3.395e+03[39m | [39m0.1882   [39m | [39m2.176    [39m | [39m0.799    [39m | [39m9.708e+04[39m |
| [39m15       [39m | [39m0.004326 [39m | [39m4.053e+03[39m | [39m0.04171  [39m | [39m1.393    [39m | [39m0.09009  [39m | [39m9.289e+04[39m |




| [39m16       [39m | [39m0.005866 [39m | [39m3.311e+03[39m | [39m0.1178   [39m | [39m2.371    [39m | [39m0.7332   [39m | [39m4.908e+04[39m |
| [39m17       [39m | [39m0.004326 [39m | [39m3.582e+03[39m | [39m0.2192   [39m | [39m1.546    [39m | [39m0.9808   [39m | [39m2.923e+04[39m |
| [39m18       [39m | [39m0.007122 [39m | [39m3.443e+03[39m | [39m0.2038   [39m | [39m2.434    [39m | [39m0.2116   [39m | [39m4.031e+04[39m |
| [39m19       [39m | [39m0.007122 [39m | [39m3.354e+03[39m | [39m0.1314   [39m | [39m2.511    [39m | [39m0.09935  [39m | [39m4.378e+04[39m |




| [35m20       [39m | [35m0.008605 [39m | [35m3.979e+03[39m | [35m0.1611   [39m | [35m1.866    [39m | [35m0.2236   [39m | [35m3.951e+04[39m |
| [39m21       [39m | [39m0.008058 [39m | [39m3.35e+03 [39m | [39m0.2094   [39m | [39m2.162    [39m | [39m0.7318   [39m | [39m4.378e+04[39m |
| [39m22       [39m | [39m0.008605 [39m | [39m3.986e+03[39m | [39m0.06334  [39m | [39m2.29     [39m | [39m0.07578  [39m | [39m3.953e+04[39m |




| [39m23       [39m | [39m0.007122 [39m | [39m3.441e+03[39m | [39m0.02668  [39m | [39m2.132    [39m | [39m0.7274   [39m | [39m7.37e+04 [39m |
| [39m24       [39m | [39m0.008605 [39m | [39m3.623e+03[39m | [39m0.08379  [39m | [39m2.408    [39m | [39m0.3569   [39m | [39m2.945e+04[39m |
| [39m25       [39m | [39m0.004326 [39m | [39m3.997e+03[39m | [39m0.0631   [39m | [39m1.394    [39m | [39m0.8866   [39m | [39m3.953e+04[39m |




| [39m26       [39m | [39m0.008195 [39m | [39m4.068e+03[39m | [39m0.2632   [39m | [39m2.559    [39m | [39m0.4813   [39m | [39m4.697e+04[39m |
| [39m27       [39m | [39m0.007122 [39m | [39m3.402e+03[39m | [39m0.2604   [39m | [39m2.336    [39m | [39m0.5368   [39m | [39m9.708e+04[39m |




| [39m28       [39m | [39m0.008541 [39m | [39m3.607e+03[39m | [39m0.08267  [39m | [39m2.04     [39m | [39m0.1528   [39m | [39m2.923e+04[39m |
| [39m29       [39m | [39m0.008605 [39m | [39m3.812e+03[39m | [39m0.2383   [39m | [39m1.813    [39m | [39m0.964    [39m | [39m3.375e+04[39m |
| [39m30       [39m | [39m0.008605 [39m | [39m3.994e+03[39m | [39m0.1874   [39m | [39m1.836    [39m | [39m0.2015   [39m | [39m3.949e+04[39m |
Best Parameters: {'target': 0.008605283685028553, 'params': {'Depth': 3978.827820487244, 'PHIF': 0.1610870462271162, 'SURF_RPM': 1.8660667076689712, 'VSH': 0.2236417294718831, 'WOB': 39513.85444566769}}




In [71]:
# Evaluate the best parameters on the test set
best_rop = objective_function(**best_params["params"])
print("Optimized ROP_AVG:", best_rop)

Optimized ROP_AVG: 0.008605284




In [74]:
# Compare with baseline performance
baseline_rop = surrogate_model.predict(X_test).mean()
print("Baseline ROP_AVG:", baseline_rop)

Baseline ROP_AVG: 0.00779683


---
The **magnitude of improvement** on ROP_AVG is calculated as follows:

- **Improvement** = Optimized ROP_AVG - Baseline ROP_AVG  
  \( 0.008605284 - 0.00779683 = 0.000808454 \)

- **Percentage Improvement** = \( \frac{{\text{Improvement}}}{{\text{Baseline ROP_AVG}}} \times 100 \)  
  \( \frac{{0.000808454}}{{0.00779683}} \times 100 \approx 10.4\% \)

A **10.4% improvement** is considered significant, especially in industrial applications like drilling, where even small improvements can lead to **substantial cost savings** or **efficiency gains** over time. However, to ensure robustness, we will also compare the results of **Bayesian Optimization** with other **optimization techniques**

---