This section generates a simulated stand-in for the OBP/VOC dataset, splits it into training and test sets, and fits an XGBRegressor model to predict binding affinities. It reports the R² score on the test set and draws a scatter plot of predicted versus actual values for visual assessment.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt

# Simulated data: 1459 samples x 3048 features of uniform random noise.
# Replace X and y with the actual OBP/VOC features and binding affinities.
np.random.seed(42)
X = np.random.rand(1459, 3048)
y = np.random.rand(1459)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the XGBRegressor model
model = XGBRegressor(objective='reg:squarederror', random_state=42)
model.fit(X_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(X_test)

# Evaluate model performance
r2 = r2_score(y_test, y_pred)
print('R2:', r2)

# Plot actual vs predicted
plt.figure(figsize=(8,6))
plt.scatter(y_test, y_pred, alpha=0.5, color='#6A0C76')
# Identity line y = x; the [0, 1] range matches the simulated targets
plt.plot([0,1],[0,1], color='red', linewidth=2)
plt.xlabel('Actual Binding Affinity')
plt.ylabel('Predicted Binding Affinity')
plt.title('XGBoostRegressor: Actual vs Predicted')
plt.show()

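The code computes the R² score on the held-out test set and plots predicted against actual values. Because the simulated features and targets are independent random noise, R² should come out near zero or negative here; the cell demonstrates the workflow, not a meaningful result. A single train/test split also says little about robustness, which would require cross-validation on the real dataset.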
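To make the R² value easier to interpret, the following is a minimal sketch of the definition R² = 1 − SS_res / SS_tot, written with NumPy only. The helper `r2_manual` and the synthetic arrays are illustrative assumptions, not part of the notebook above. It compares three cases: predicting the test-set mean, predicting unrelated noise, and predicting perfectly.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Synthetic targets, roughly the size of a 20% test split of 1459 samples (assumption)
rng = np.random.default_rng(0)
y_true = rng.random(292)

mean_pred = np.full_like(y_true, y_true.mean())  # always predict the mean
noise_pred = rng.random(292)                     # predictions unrelated to the targets

r2_mean = r2_manual(y_true, mean_pred)      # zero by construction
r2_noise = r2_manual(y_true, noise_pred)    # negative: worse than predicting the mean
r2_perfect = r2_manual(y_true, y_true)      # one by construction

print('R2 (mean baseline):', r2_mean)
print('R2 (unrelated noise):', r2_noise)
print('R2 (perfect):', r2_perfect)
```

An R² of zero means the model does no better than predicting the mean of the test targets, so a score at or below zero on the simulated data is the expected outcome rather than a sign of a bug.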






### [Created with BioloGPT](https://biologpt.com/?q=Paper%20Review%3A%20Insight%20into%20the%20Relationships%20Between%20Chemical%2C%20Protein%20and%20Functional%20Variables%20in%20the%20PBP%2FGOBP%20Family%20in%20Moths%20Based%20on%20Machine%20Learning)
***