### Thanks to : https://www.kaggle.com/code/digantabhattacharya/neurips25-rdkit-xgb-additionaldata-starter-i
### Thanks to : https://www.kaggle.com/code/richolson/smiles-rdkit-lgbm-ftw for the additional datasets.
Additional datasets were considered for the targets 'Tg' and 'Tc'. Tg 🚫: the extra Tg dataset is confirmed as unusable, so only the extra Tc data is used.

### Baseline XGBoost with RDKit Feature Extraction : https://www.kaggle.com/code/digantabhattacharya/neurips25-opp-with-rdkit-xgb-starter-i/
### Baseline LGBM with RDKit Feature Extraction : https://www.kaggle.com/code/digantabhattacharya/neurips25-opp-with-rdkit-lgbm-starter-ii/ [In Progress]

The main changes are as follows:

Comprehensive Feature Expansion:

Changes: The original restriction of truncating or padding feature vectors to 200 dimensions has been removed. The new scheme computes every molecular descriptor in RDKit's descriptor list (roughly 200; the exact count depends on the RDKit version) and expands the Morgan fingerprint to a full 2048 bits.
Purpose: Give the model the most complete chemical information available, avoid losing potentially useful signal through artificial truncation, and in principle let the model capture more complex structure-property relationships.

Robust Cross-Validation:

Changes: The single train_test_split validation of the original notebook is replaced with 5-fold cross-validation (K-Fold).
Purpose: Both during Optuna hyperparameter optimization and in the final training and prediction stage, cross-validation gives a more reliable performance estimate that depends less on one random split. Averaging the predictions of the 5 fold models also yields more stable submissions.

SMILES Canonicalization:

Changes: A preprocessing step converts every SMILES string into a unique canonical representation.

Purpose: Ensure that the same molecular structure is treated as the same model input regardless of how its SMILES was written, which removes redundant and inconsistent records.
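
The fold-averaging idea can be sketched without any ML library. The snippet below is only an illustration: `kfold_indices`, `fit_line`, and `cv_ensemble_predict` are hypothetical helper names, and a one-dimensional least-squares line stands in for the XGBoost regressor used in this notebook. It trains one model per fold, scores each on its held-out fold, and averages the per-fold test predictions.

```python
import random
from statistics import mean


def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, valid_idx) pairs for a shuffled K-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for k in range(n_splits):
        valid_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, valid_idx


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a stand-in for the real regressor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def cv_ensemble_predict(xs, ys, x_test, n_splits=5):
    """Train one model per fold and average the per-fold test predictions."""
    fold_preds, fold_maes = [], []
    for train_idx, valid_idx in kfold_indices(len(xs), n_splits):
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        # Score the fold model on the samples it did not see during fitting.
        fold_maes.append(mean(abs(ys[i] - (a * xs[i] + b)) for i in valid_idx))
        fold_preds.append([a * x + b for x in x_test])
    averaged = [mean(column) for column in zip(*fold_preds)]
    return averaged, mean(fold_maes)


# Toy data (made up for illustration): y = 2x + 1 without noise.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
preds, cv_mae = cv_ensemble_predict(xs, ys, x_test=[10.0, 25.0])
print(preds, cv_mae)
```

The mean of the per-fold MAE values plays the role of the score Optuna minimizes, and the averaged list corresponds to the final submission values for one property.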

In [1]:
!pip install /kaggle/input/rdkit-2025-3-3-cp311/rdkit-2025.3.3-cp311-cp311-manylinux_2_28_x86_64.whl

Processing /kaggle/input/rdkit-2025-3-3-cp311/rdkit-2025.3.3-cp311-cp311-manylinux_2_28_x86_64.whl
Installing collected packages: rdkit
Successfully installed rdkit-2025.3.3


# ----------------------------------------------------------------------------
# 1. Library imports and environment settings
# ----------------------------------------------------------------------------


In [2]:
import pandas as pd
import numpy as np
from typing import List, Dict, Tuple
import warnings
import logging
import os
import joblib
from tqdm.notebook import tqdm

# RDKit related libraries
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.Chem.rdFingerprintGenerator import GetMorganGenerator

# Machine learning related libraries
import xgboost as xgb
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
import optuna

# ----------------------------------------------------------------------------
# 2. Logging and warning configuration
# ----------------------------------------------------------------------------


In [3]:
# Suppress all warning messages
warnings.filterwarnings("ignore")

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),  # output to console
        logging.FileHandler('polymer_prediction_optimized.log')  # output to file
    ]
)
logger = logging.getLogger(__name__)


# ----------------------------------------------------------------------------
# 3. Helper functions
# ----------------------------------------------------------------------------


In [4]:

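def check_gpu_availability() -> Dict:
    """Check whether a GPU is usable and return matching XGBoost parameters."""
    try:
        # Constructing an estimator never touches the device, so fit a tiny
        # model to actually exercise the GPU code path.
        probe = xgb.XGBRegressor(tree_method='gpu_hist', n_estimators=1)
        probe.fit(np.arange(8, dtype=np.float32).reshape(4, 2), np.arange(4, dtype=np.float32))
        logger.info("GPU detected! GPU acceleration will be enabled for XGBoost.")
        # Note: 'gpu_hist', 'gpu_id' and 'predictor' follow the pre-2.0 XGBoost API;
        # XGBoost 2.x expects tree_method='hist' together with device='cuda'.
        return {
            'tree_method': 'gpu_hist',
            'gpu_id': 0,
            'predictor': 'gpu_predictor'
        }
    except Exception as e:
        logger.warning(f"GPU is not available: {e}. Falling back to the CPU for computation.")
        return {
            'tree_method': 'hist',
            'predictor': 'cpu_predictor'
        }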

def canonicalize_smiles(smiles: str) -> str:
    """Convert a SMILES string to its canonical representation, guaranteed to be unique."""
    try:
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None:
            return Chem.MolToSmiles(mol, canonical=True)
    except Exception:
        # If SMILES is invalid, returns None
        return None
    return None

# ----------------------------------------------------------------------------
# 4. Data loading and preprocessing
# ----------------------------------------------------------------------------

In [5]:

def load_data() -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Load competition data and additional datasets, and perform normalization and deduplication."""
    logger.info("Starting to load contest data...")
    comp_train_df = pd.read_csv('/kaggle/input/neurips-open-polymer-prediction-2025/train.csv')
    test_df = pd.read_csv('/kaggle/input/neurips-open-polymer-prediction-2025/test.csv')
    
    # --- Modification section starts ---
    logger.info("Loading additional datasets...")
    # extra_tg_df = pd.read_csv("/kaggle/input/smiles-tg/Tg_SMILES_class_pid_polyinfo_median.csv")  # removed
    extra_tc_df = pd.read_csv("/kaggle/input/tc-smiles/Tc_SMILES.csv")
    logger.info(f"{len(extra_tc_df)} extra Tc samples loaded.")

    comp_train_df_slim = comp_train_df[['SMILES', 'Tg', 'FFV', 'Tc', 'Density', 'Rg']]
    extra_tc_clean = extra_tc_df[['SMILES', 'TC_mean']].rename(columns={'TC_mean': 'Tc'}).copy()
    
    # Merge only the competition data and the extra Tc data
    train_df = pd.concat([comp_train_df_slim, extra_tc_clean], ignore_index=True)
    # --- End of modification section ---
    
    logger.info(f"The total number of training samples after merging is {len(train_df)}.")
    logger.info("Canonicalizing SMILES and removing duplicates...")
    train_df['SMILES'] = train_df['SMILES'].apply(canonicalize_smiles)
    test_df['SMILES'] = test_df['SMILES'].apply(canonicalize_smiles)
    train_df.dropna(subset=['SMILES'], inplace=True)
    train_df.drop_duplicates(subset=['SMILES'], keep='first', inplace=True)
    
    logger.info(f"After standardization and deduplication, the number of remaining valid training samples is {len(train_df)}.")
    return train_df.reset_index(drop=True), test_df
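
One detail of the deduplication step is worth spelling out: `drop_duplicates(subset=['SMILES'], keep='first')` keeps the whole first row for each SMILES. If a molecule appears both in the competition data and in the extra Tc file, the extra row is dropped even when the first row has no Tc value. Whether this matters depends on how much the two sources overlap, which this notebook does not report. As a rough sketch of an alternative, the plain-Python helper below (a hypothetical `merge_first_non_missing`, with made-up SMILES and values) keeps the first non-missing value per column instead.

```python
from typing import Dict, List


def merge_first_non_missing(rows: List[dict], key: str = "SMILES") -> List[dict]:
    """Collapse rows that share `key`, keeping the first non-missing value per column."""
    merged: Dict[str, dict] = {}
    for row in rows:
        # dicts preserve insertion order, so the first occurrence fixes the row order.
        target = merged.setdefault(row[key], {key: row[key]})
        for column, value in row.items():
            if column != key and value is not None and target.get(column) is None:
                target[column] = value
    return list(merged.values())


# Illustrative rows only; None marks a missing measurement.
rows = [
    {"SMILES": "*CC*", "Tg": 1.0, "Tc": None},   # competition row without Tc
    {"SMILES": "*CCO*", "Tg": None, "Tc": 0.3},
    {"SMILES": "*CC*", "Tc": 0.25},              # extra Tc row for the same SMILES
]
result = merge_first_non_missing(rows)
print(result)
```

In pandas, `GroupBy.first()` offers similar per-column behaviour, since it returns the first non-null entry of each column within a group.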


# ----------------------------------------------------------------------------
# 5. Core model class: PolymerPredictor
# ----------------------------------------------------------------------------


In [6]:

class PolymerPredictor:
    """
    A complete machine learning pipeline for predicting polymer properties.
    """
    def __init__(self, random_state=42):
        logger.info("Initialize PolymerPredictor...")
        self.random_state = random_state
        self.xgb_params = check_gpu_availability()
        
        # --- Corrected parts ---
        self.descriptor_calculators = [desc[1] for desc in Descriptors._descList]
        
        # Store the fingerprint size as an attribute
        self.fp_size = 2048
        self.morgan_gen = GetMorganGenerator(radius=3, fpSize=self.fp_size) 
        
        # Dynamically determine the total length of the feature vector
        num_descriptors = len(self.descriptor_calculators)
        # Directly use the fingerprint size attribute we set
        num_fp_bits = self.fp_size
        self.feature_length = num_descriptors + num_fp_bits
        logger.info(f"The dimension of the feature vector has been determined to be: {self.feature_length} ({num_descriptors} descriptors + {num_fp_bits} fingerprints)")
        # --- End of revision ---
        
        self.models = {}   # Stores the models for each target property
        self.scalers = {}  # Stores the scaler for each target property

    def _extract_features(self, smiles_list: List[str]) -> np.ndarray:
        """Efficiently extract rich chemical features from a list of SMILES strings."""
        logger.info(f"Start extracting features from {len(smiles_list)} SMILES...")
        
        features = []
        for smiles in tqdm(smiles_list, desc="Feature extraction"):
            try:
                mol = Chem.MolFromSmiles(smiles)
                if mol is None:
                    features.append(np.zeros(self.feature_length))
                    continue
                
                # Calculate all RDKit descriptors
                descriptors = [func(mol) for func in self.descriptor_calculators]
                
                # Calculating Morgan fingerprint
                fp = self.morgan_gen.GetFingerprint(mol)
                fp_bits = [int(x) for x in fp.ToBitString()]
                
                feature_vector = descriptors + fp_bits
                features.append(feature_vector)
            except Exception as e:
                logger.warning(f"Error processing SMILES '{smiles}': {e}")
                features.append(np.zeros(self.feature_length))
        
        # Convert the result to a NumPy array and replace any inf or NaN values produced by the descriptor calculations
        feature_array = np.array(features, dtype=np.float32)
        feature_array = np.nan_to_num(feature_array, nan=0.0, posinf=0.0, neginf=0.0)
        
        return feature_array

    def optimize_hyperparameters(self, X: np.ndarray, y: pd.Series, n_trials: int = 50) -> Dict:
        """Use Optuna with 5-fold cross-validation to tune the XGBoost hyperparameters for a single target property."""
        logger.info(f"Starting Optuna hyperparameter optimization ({n_trials} trials) for the target property...")

        def objective(trial: optuna.Trial) -> float:
            # Define a wider hyperparameter search space for XGBoost
            params = {
                'n_estimators': trial.suggest_int('n_estimators', 1000, 8000),
                'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1, log=True),
                'max_depth': trial.suggest_int('max_depth', 5, 20),
                'subsample': trial.suggest_float('subsample', 0.6, 1.0),
                'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0),
                'gamma': trial.suggest_float('gamma', 1e-8, 1.0, log=True),
                'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 1.0, log=True),
                'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 2.0, log=True),
                'min_child_weight': trial.suggest_int('min_child_weight', 1, 10),
                **self.xgb_params,
                'objective': 'reg:squarederror',
                'random_state': self.random_state,
                # Set on the estimator, as in the final training step; passing it
                # to fit() has been deprecated since XGBoost 1.6.
                'early_stopping_rounds': 50
            }
            
            # Use 5-fold cross-validation to evaluate each set of hyperparameters
            kf = KFold(n_splits=5, shuffle=True, random_state=self.random_state)
            mae_scores = []
            
            for train_idx, valid_idx in kf.split(X):
                X_train, X_valid = X[train_idx], X[valid_idx]
                y_train, y_valid = y.iloc[train_idx], y.iloc[valid_idx]
                
                model = xgb.XGBRegressor(**params)
                model.fit(X_train, y_train,
                          eval_set=[(X_valid, y_valid)],
                          verbose=False)
                
                preds = model.predict(X_valid)
                mae_scores.append(mean_absolute_error(y_valid, preds))
            
            return np.mean(mae_scores)

        study = optuna.create_study(direction='minimize')
        study.optimize(objective, n_trials=n_trials, show_progress_bar=True)
        
        logger.info(f"Optimization completed. Best mean cross-validation MAE: {study.best_value:.5f}")
        logger.info(f"The best parameters found: {study.best_params}")
        return study.best_params

    def train_and_predict_with_cv(self, X_train: np.ndarray, y_train: pd.Series, X_test: np.ndarray, best_params: Dict) -> Tuple[np.ndarray, list, StandardScaler]:
        """Train one model per fold with 5-fold cross-validation and average their predictions on the test set."""
        test_predictions_per_fold = []
        trained_models = []
        
        final_params = best_params.copy()
        final_params.update(self.xgb_params)
        final_params['objective'] = 'reg:squarederror'
        final_params['random_state'] = self.random_state
        final_params['early_stopping_rounds'] = 100
        
        kf = KFold(n_splits=5, shuffle=True, random_state=self.random_state)

        # Standardize the features
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        for fold, (train_idx, valid_idx) in enumerate(kf.split(X_train_scaled, y_train)):
            logger.info(f"--- Training fold {fold+1}/5 ---")
            
            X_fold_train, y_fold_train = X_train_scaled[train_idx], y_train.iloc[train_idx]
            X_fold_valid, y_fold_valid = X_train_scaled[valid_idx], y_train.iloc[valid_idx]
            
            # Train a model for a single fold using the best parameters.
            # Early stopping monitors the held-out fold; monitoring the training
            # data itself would almost never trigger it.
            model = xgb.XGBRegressor(**final_params)
            model.fit(X_fold_train, y_fold_train, eval_set=[(X_fold_valid, y_fold_valid)], verbose=1000)
            
            # Make predictions on the test set and store them
            preds = model.predict(X_test_scaled)
            test_predictions_per_fold.append(preds)
            trained_models.append(model)
        
        # Average the five models' predictions to form the final ensemble prediction
        final_predictions = np.mean(test_predictions_per_fold, axis=0)
        
        return final_predictions, trained_models, scaler
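
A note on `early_stopping_rounds`: training stops once the metric on the evaluation set has failed to improve for that many consecutive rounds, so it only acts as a safeguard against overfitting when the evaluation set is held out from training. The plain-Python sketch below illustrates the patience logic under simplified assumptions. `early_stopping_round` is a hypothetical helper, not XGBoost's implementation, and the two score lists are made up.

```python
from typing import List, Tuple


def early_stopping_round(eval_scores: List[float], patience: int) -> Tuple[int, bool]:
    """Return (best_round, stopped_early) for a metric where lower is better."""
    best_round, best_score = 0, float("inf")
    for round_idx, score in enumerate(eval_scores):
        if score < best_score:
            best_round, best_score = round_idx, score
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop and keep the best round.
            return best_round, True
    return best_round, False


# Made-up error curves: the held-out error turns upward, the training error keeps falling.
held_out = [5.0, 4.0, 3.0, 3.2, 3.4, 3.6, 3.8]
training = [5.0, 4.0, 3.0, 2.5, 2.0, 1.5, 1.0]

held_out_result = early_stopping_round(held_out, patience=3)
training_result = early_stopping_round(training, patience=3)
print(held_out_result, training_result)
```

With the held-out curve the loop stops and reports round 2 as the best one. With the training curve it never stops, because the training error of a boosted model typically keeps decreasing as trees are added.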

# ----------------------------------------------------------------------------
# 6. Main execution process
# ----------------------------------------------------------------------------


In [7]:
if __name__ == "__main__":
    # Create folders to save models and results
    os.makedirs('/kaggle/working/models', exist_ok=True)
    

    train_df, test_df = load_data()

    predictor = PolymerPredictor()
    
    X_test_features = predictor._extract_features(test_df['SMILES'].tolist())
    
    final_submission_predictions = {}
    
    TARGET_COLUMNS = ['Tg', 'FFV', 'Tc', 'Density', 'Rg']
    
    for prop in TARGET_COLUMNS:
        logger.info(f"\n{'='*25} Processing property: {prop} {'='*25}")
        
        prop_train_df = train_df.dropna(subset=[prop]).reset_index(drop=True)
        if prop_train_df.empty:
            logger.warning(f"Property {prop} has no valid training data and will be skipped.")
            continue
            
        X_prop_train = predictor._extract_features(prop_train_df['SMILES'].tolist())
        y_prop_train = prop_train_df[prop]
        logger.info(f"Number of valid training samples for property {prop}: {len(prop_train_df)}")

        best_params_for_prop = predictor.optimize_hyperparameters(X_prop_train, y_prop_train, n_trials=75)
        
        logger.info(f"Use the best parameters found to train {prop} with 5-fold cross validation...")
        prop_predictions, prop_models, prop_scaler = predictor.train_and_predict_with_cv(
            X_prop_train, y_prop_train, X_test_features, best_params_for_prop
        )
        
        final_submission_predictions[prop] = prop_predictions
        
        logger.info(f"Save the model and scaler for property {prop}...")
        joblib.dump(prop_models, f'/kaggle/working/models/{prop}_models.joblib')
        joblib.dump(prop_scaler, f'/kaggle/working/models/{prop}_scaler.joblib')

    logger.info("All properties processed, creating submission file...")
    submission_df = pd.DataFrame({'id': test_df['id']})
    for prop in TARGET_COLUMNS:
        if prop in final_submission_predictions:
            submission_df[prop] = final_submission_predictions[prop]
        else:
            submission_df[prop] = 0 
            
    submission_df.to_csv('/kaggle/working/submission.csv', index=False)
    
    logger.info("The submission file submission.csv has been generated successfully!")
    logger.info("The process is complete!")


Feature extraction:   0%|          | 0/3 [00:00<?, ?it/s]

Feature extraction:   0%|          | 0/511 [00:00<?, ?it/s]

[I 2025-06-26 12:57:05,537] A new study created in memory with name: no-name-876df08d-c252-416c-ab2c-fcc10932fe73


  0%|          | 0/75 [00:00<?, ?it/s]

[I 2025-06-26 12:57:14,404] Trial 0 finished with value: 51.12066795562354 and parameters: {'n_estimators': 1642, 'learning_rate': 0.027847999939903605, 'max_depth': 9, 'subsample': 0.8877254055698418, 'colsample_bytree': 0.940637943608009, 'gamma': 0.03365279063253202, 'reg_alpha': 0.06147265188172035, 'reg_lambda': 0.04257520922761776, 'min_child_weight': 7}. Best is trial 0 with value: 51.12066795562354.
[I 2025-06-26 12:57:19,052] Trial 1 finished with value: 51.162691758227474 and parameters: {'n_estimators': 6559, 'learning_rate': 0.05494580525585855, 'max_depth': 7, 'subsample': 0.617142645987601, 'colsample_bytree': 0.6198277719246386, 'gamma': 0.07263605121679674, 'reg_alpha': 0.22050162909916896, 'reg_lambda': 1.0663463052110725e-07, 'min_child_weight': 1}. Best is trial 0 with value: 51.12066795562354.
[I 2025-06-26 12:57:30,214] Trial 2 finished with value: 51.070797102055835 and parameters: {'n_estimators': 5674, 'learning_rate': 0.013094087928426078, 'max_depth': 7, 'subs

Feature extraction:   0%|          | 0/7030 [00:00<?, ?it/s]

[I 2025-06-26 13:10:46,603] A new study created in memory with name: no-name-68beaab8-9b64-456c-9277-2c3000035508


  0%|          | 0/75 [00:00<?, ?it/s]

[I 2025-06-26 13:12:15,615] Trial 0 finished with value: 0.0057110126609957335 and parameters: {'n_estimators': 3255, 'learning_rate': 0.03537784055207119, 'max_depth': 9, 'subsample': 0.7760873192663857, 'colsample_bytree': 0.6554130444437665, 'gamma': 1.6012799807080253e-06, 'reg_alpha': 0.0005246503964168727, 'reg_lambda': 0.08391786653978257, 'min_child_weight': 4}. Best is trial 0 with value: 0.0057110126609957335.
[I 2025-06-26 13:13:19,942] Trial 1 finished with value: 0.006316545984770687 and parameters: {'n_estimators': 4664, 'learning_rate': 0.03262256773090326, 'max_depth': 13, 'subsample': 0.8981216680373351, 'colsample_bytree': 0.9087360968322749, 'gamma': 3.4161641232881816e-05, 'reg_alpha': 0.00012190437483045665, 'reg_lambda': 3.1418048247210694e-08, 'min_child_weight': 3}. Best is trial 0 with value: 0.0057110126609957335.
[I 2025-06-26 13:14:09,384] Trial 2 finished with value: 0.006394600572252459 and parameters: {'n_estimators': 5528, 'learning_rate': 0.066085528654

Feature extraction:   0%|          | 0/866 [00:00<?, ?it/s]

[I 2025-06-26 15:16:10,839] A new study created in memory with name: no-name-5ada5367-0f74-4c79-8164-6023ea179cb3


  0%|          | 0/75 [00:00<?, ?it/s]

[I 2025-06-26 15:16:16,765] Trial 0 finished with value: 0.03713322786812938 and parameters: {'n_estimators': 5995, 'learning_rate': 0.036064748876988295, 'max_depth': 5, 'subsample': 0.9093636592649006, 'colsample_bytree': 0.6596342968666876, 'gamma': 1.5970652582181748e-07, 'reg_alpha': 2.68486108407324e-06, 'reg_lambda': 0.0004154137386692049, 'min_child_weight': 1}. Best is trial 0 with value: 0.03713322786812938.
[I 2025-06-26 15:16:18,993] Trial 1 finished with value: 0.033265242255994566 and parameters: {'n_estimators': 5590, 'learning_rate': 0.044379253270507034, 'max_depth': 10, 'subsample': 0.7271649356061842, 'colsample_bytree': 0.6423972704683597, 'gamma': 0.015494828312289745, 'reg_alpha': 1.3022488204967887e-05, 'reg_lambda': 1.0118456687008359e-07, 'min_child_weight': 9}. Best is trial 1 with value: 0.033265242255994566.
[I 2025-06-26 15:16:26,768] Trial 2 finished with value: 0.032330467161883855 and parameters: {'n_estimators': 2408, 'learning_rate': 0.0217436510374871

Feature extraction:   0%|          | 0/613 [00:00<?, ?it/s]

[I 2025-06-26 15:30:59,598] A new study created in memory with name: no-name-db70e0de-b24a-4e61-bd8e-c820d0339440


  0%|          | 0/75 [00:00<?, ?it/s]

[I 2025-06-26 15:31:02,755] Trial 0 finished with value: 0.037660636031394926 and parameters: {'n_estimators': 4491, 'learning_rate': 0.08142210720539649, 'max_depth': 8, 'subsample': 0.8865876442170708, 'colsample_bytree': 0.8609407761154962, 'gamma': 0.005122865991835187, 'reg_alpha': 2.6628470479405153e-05, 'reg_lambda': 0.00448832684211765, 'min_child_weight': 1}. Best is trial 0 with value: 0.037660636031394926.
[I 2025-06-26 15:31:05,240] Trial 1 finished with value: 0.05300222219045427 and parameters: {'n_estimators': 3951, 'learning_rate': 0.037239867122313836, 'max_depth': 13, 'subsample': 0.7559328664980335, 'colsample_bytree': 0.611843543817742, 'gamma': 0.13331632584153438, 'reg_alpha': 0.46778056145690233, 'reg_lambda': 3.0436500384906495e-05, 'min_child_weight': 3}. Best is trial 0 with value: 0.037660636031394926.
[I 2025-06-26 15:31:19,542] Trial 2 finished with value: 0.03753475975731823 and parameters: {'n_estimators': 1466, 'learning_rate': 0.026432097785256164, 'max

Feature extraction:   0%|          | 0/614 [00:00<?, ?it/s]

[I 2025-06-26 16:04:46,688] A new study created in memory with name: no-name-e9a2e9a5-6c70-4ea1-b2db-545dfed8f7ea


  0%|          | 0/75 [00:00<?, ?it/s]

[I 2025-06-26 16:04:51,150] Trial 0 finished with value: 1.6618850920789945 and parameters: {'n_estimators': 2098, 'learning_rate': 0.07439305112833697, 'max_depth': 7, 'subsample': 0.6181276828662414, 'colsample_bytree': 0.6372121331957332, 'gamma': 5.099691063871447e-08, 'reg_alpha': 0.06610960897293638, 'reg_lambda': 1.165550534506859e-05, 'min_child_weight': 2}. Best is trial 0 with value: 1.6618850920789945.
[I 2025-06-26 16:04:57,346] Trial 1 finished with value: 1.6020798114339918 and parameters: {'n_estimators': 3103, 'learning_rate': 0.06533260143732912, 'max_depth': 12, 'subsample': 0.9518302788276198, 'colsample_bytree': 0.7038487860497845, 'gamma': 6.017412849398579e-05, 'reg_alpha': 0.0011499579378930515, 'reg_lambda': 6.05237843210191e-05, 'min_child_weight': 8}. Best is trial 1 with value: 1.6020798114339918.
[I 2025-06-26 16:05:13,511] Trial 2 finished with value: 1.6221929157030754 and parameters: {'n_estimators': 7337, 'learning_rate': 0.010224863410352593, 'max_depth