#### 00 - Exploratory Data Analysis and Feature Engineering

This notebook performs initial Exploratory Data Analysis (EDA) on the provided dataset, preprocesses the data for machine learning, and engineers new features. The goal is to prepare the data for training a classification model to predict `Personality` (Extrovert/Introvert).


In [16]:
import pandas as pd
import numpy as np
import optuna
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
import math
import lightgbm as lgb
from sklearn.model_selection import StratifiedKFold 
from sklearn.metrics import accuracy_score, roc_auc_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

In [22]:


train_df = pd.read_csv("../data/raw/train.csv") 
test_df = pd.read_csv("../data/raw/test.csv")
original_df = pd.read_csv("../data/raw/personality_dataset.csv")

original_test_ids = test_df['id']

print(train_df.columns)


Index(['id', 'Time_spent_Alone', 'Stage_fear', 'Social_event_attendance',
       'Going_outside', 'Drained_after_socializing', 'Friends_circle_size',
       'Post_frequency', 'Personality'],
      dtype='object')


### Training Data Inspection and EDA

In [2]:

# Rename columns for consistency 
renamed_columns = {'Time_spent_Alone' : 'time_spent_alone', 'Stage_fear' : 'stage_fear', 'Social_event_attendance' : 'social_event_attendance',
       'Going_outside':'going_outside', 'Drained_after_socializing':'drained_after_socializing', 'Friends_circle_size':'friends_circle_size',
       'Post_frequency':'post_frequency', 'Personality':'personality'}

train_df.rename(columns = renamed_columns , inplace=True)


display(train_df.head())

# Basic Info : Shape and number of unique values
print("Shape of dataset:", train_df.shape)
print("\n# of unique values per column:")
display(train_df.nunique())

# Missing values check
print("\n# of missing values per column:")
display(train_df.isnull().sum())

# Calculation for missing percentage data
total_cells = train_df.shape[0] * train_df.shape[1]
total_missing = train_df.isnull().sum().sum()
missing_percentage = (total_missing / total_cells) * 100
print(f"Percentage of missing data: {missing_percentage:.2f}%")

Unnamed: 0,id,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency,personality
0,0,0.0,No,6.0,4.0,No,15.0,5.0,Extrovert
1,1,1.0,No,7.0,3.0,No,10.0,8.0,Extrovert
2,2,6.0,Yes,1.0,0.0,,3.0,0.0,Introvert
3,3,3.0,No,7.0,3.0,No,11.0,5.0,Extrovert
4,4,1.0,No,4.0,4.0,No,13.0,,Extrovert


Shape of dataset: (18524, 9)

# of unique values per column:


id                           18524
time_spent_alone                12
stage_fear                       2
social_event_attendance         11
going_outside                    8
drained_after_socializing        2
friends_circle_size             16
post_frequency                  11
personality                      2
dtype: int64


# of missing values per column:


id                              0
time_spent_alone             1190
stage_fear                   1893
social_event_attendance      1180
going_outside                1466
drained_after_socializing    1149
friends_circle_size          1054
post_frequency               1264
personality                     0
dtype: int64

Percentage of missing data: 5.52%


In [3]:
display(train_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18524 entries, 0 to 18523
Data columns (total 9 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   id                         18524 non-null  int64  
 1   time_spent_alone           17334 non-null  float64
 2   stage_fear                 16631 non-null  object 
 3   social_event_attendance    17344 non-null  float64
 4   going_outside              17058 non-null  float64
 5   drained_after_socializing  17375 non-null  object 
 6   friends_circle_size        17470 non-null  float64
 7   post_frequency             17260 non-null  float64
 8   personality                18524 non-null  object 
dtypes: float64(5), int64(1), object(3)
memory usage: 1.3+ MB


None

### Test data inspection

In [4]:
test_df.rename(columns = renamed_columns , inplace=True)


display(test_df.head())

# Basic Info : Shape and number of unique values
print("Shape of dataset:", test_df.shape)
print("\n# of unique values per column:")
display(test_df.nunique())

# Missing values check
print("\n# of missing values per column:")
display(test_df.isnull().sum())

# Calculation for missing percentage data
total_cells = test_df.shape[0] * test_df.shape[1]
total_missing = test_df.isnull().sum().sum()
missing_percentage = (total_missing / total_cells) * 100
print(f"Percentage of missing data: {missing_percentage:.2f}%")

Unnamed: 0,id,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency
0,18524,3.0,No,7.0,4.0,No,6.0,
1,18525,,Yes,0.0,0.0,Yes,5.0,1.0
2,18526,3.0,No,5.0,6.0,No,15.0,9.0
3,18527,3.0,No,4.0,4.0,No,5.0,6.0
4,18528,9.0,Yes,1.0,2.0,Yes,1.0,1.0


Shape of dataset: (6175, 8)

# of unique values per column:


id                           6175
time_spent_alone               12
stage_fear                      2
social_event_attendance        11
going_outside                   8
drained_after_socializing       2
friends_circle_size            16
post_frequency                 11
dtype: int64


# of missing values per column:


id                             0
time_spent_alone             425
stage_fear                   598
social_event_attendance      397
going_outside                466
drained_after_socializing    432
friends_circle_size          350
post_frequency               408
dtype: int64

Percentage of missing data: 6.23%


### Original Data Set 

In [5]:
original_df.rename(columns = renamed_columns , inplace=True)


display(original_df.head())

# Basic Info : Shape and number of unique values
print("Shape of dataset:", original_df.shape)
print("\n# of unique values per column:")
display(original_df.nunique())

# Missing values check
print("\n# of missing values per column:")
display(original_df.isnull().sum())

# Calculation for missing percentage data
total_cells = original_df.shape[0] * test_df.shape[1]
total_missing = original_df.isnull().sum().sum()
missing_percentage = (total_missing / total_cells) * 100
print(f"Percentage of missing data: {missing_percentage:.2f}%")

Unnamed: 0,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency,personality
0,4.0,No,4.0,6.0,No,13.0,5.0,Extrovert
1,9.0,Yes,0.0,0.0,Yes,0.0,3.0,Introvert
2,9.0,Yes,1.0,2.0,Yes,5.0,2.0,Introvert
3,0.0,No,6.0,7.0,No,14.0,8.0,Extrovert
4,3.0,No,9.0,4.0,No,8.0,5.0,Extrovert


Shape of dataset: (2900, 8)

# of unique values per column:


time_spent_alone             12
stage_fear                    2
social_event_attendance      11
going_outside                 8
drained_after_socializing     2
friends_circle_size          16
post_frequency               11
personality                   2
dtype: int64


# of missing values per column:


time_spent_alone             63
stage_fear                   73
social_event_attendance      62
going_outside                66
drained_after_socializing    52
friends_circle_size          77
post_frequency               65
personality                   0
dtype: int64

Percentage of missing data: 1.97%


### Encoding the data 

In [6]:
mapping_no_yes = {'No': 0, 'Yes': 1}
mapping_personality = {'Extrovert': 0, 'Introvert': 1} 

for col in ['stage_fear', 'drained_after_socializing', 'personality']:
    if col == 'personality':
        original_df[col] = original_df[col].map(mapping_personality)
        train_df[col] = train_df[col].map(mapping_personality)
    else:
        original_df[col] = original_df[col].map(mapping_no_yes)
        train_df[col] = train_df[col].map(mapping_no_yes)
        test_df[col] = test_df[col].map(mapping_no_yes)

display(original_df.head())
display(test_df.head())
display(train_df.head())

Unnamed: 0,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency,personality
0,4.0,0.0,4.0,6.0,0.0,13.0,5.0,0
1,9.0,1.0,0.0,0.0,1.0,0.0,3.0,1
2,9.0,1.0,1.0,2.0,1.0,5.0,2.0,1
3,0.0,0.0,6.0,7.0,0.0,14.0,8.0,0
4,3.0,0.0,9.0,4.0,0.0,8.0,5.0,0


Unnamed: 0,id,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency
0,18524,3.0,0.0,7.0,4.0,0.0,6.0,
1,18525,,1.0,0.0,0.0,1.0,5.0,1.0
2,18526,3.0,0.0,5.0,6.0,0.0,15.0,9.0
3,18527,3.0,0.0,4.0,4.0,0.0,5.0,6.0
4,18528,9.0,1.0,1.0,2.0,1.0,1.0,1.0


Unnamed: 0,id,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency,personality
0,0,0.0,0.0,6.0,4.0,0.0,15.0,5.0,0
1,1,1.0,0.0,7.0,3.0,0.0,10.0,8.0,0
2,2,6.0,1.0,1.0,0.0,,3.0,0.0,1
3,3,3.0,0.0,7.0,3.0,0.0,11.0,5.0,0
4,4,1.0,0.0,4.0,4.0,0.0,13.0,,0


In [12]:
 X_original = original_df.drop('personality', axis=1)
y_original = original_df['personality']

X_train = train_df.drop('personality', axis=1)
y_train = train_df['personality']

X_test = test_df.copy()

features_to_impute = [col for col in X_train.columns if col != 'id']

# Combine for fitting
combined_features_for_imputation = pd.concat(
    [X_train[features_to_impute], X_original[features_to_impute]],
    axis=0
)


print(f"\nShape of combined features for Imputer fitting: {combined_features_for_imputation.shape}")
print("Missing values in combined features before imputation:\n", combined_features_for_imputation.isnull().sum())


Shape of combined features for Imputer fitting: (21424, 7)
Missing values in combined features before imputation:
 time_spent_alone             1253
stage_fear                   1966
social_event_attendance      1242
going_outside                1532
drained_after_socializing    1201
friends_circle_size          1131
post_frequency               1329
dtype: int64


In [11]:
imputer_mf_like = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1), # n_jobs=-1 for parallel processing
    max_iter=10,
    random_state=42,
    initial_strategy='mean' # Can also be 'median', 'most_frequent'
)

print("\n--- Fitting IterativeImputer with RandomForest Estimator ---")
imputed_combined_features_array = imputer_mf_like.fit_transform(combined_features_for_imputation)

imputed_combined_features_df = pd.DataFrame(
    imputed_combined_features_array,
    columns=features_to_impute,
    index=combined_features_for_imputation.index
)

print("\n--- Imputation complete for combined features (used for fitting) ---")
print("Missing values in imputed combined features:\n", imputed_combined_features_df.isnull().sum())


# --- 5. Transform individual DataFrames ---

print("\n--- Transforming X_train ---")
imputed_X_train_array = imputer_mf_like.transform(X_train[features_to_impute])
imp_train = pd.DataFrame(
    imputed_X_train_array,
    columns=features_to_impute,
    index=X_train.index
)
imp_train['personality'] = y_train


print("\n--- Transforming X_original ---")
imputed_X_original_array = imputer_mf_like.transform(X_original[features_to_impute])
imp_original = pd.DataFrame(
    imputed_X_original_array,
    columns=features_to_impute,
    index=X_original.index
)
imp_original['personality'] = y_original


print("\n--- Transforming X_test ---")
imputed_X_test_array = imputer_mf_like.transform(X_test[features_to_impute])
imp_test = pd.DataFrame(
    imputed_X_test_array,
    columns=features_to_impute,
    index=X_test.index
)


print("\n--- Imputation complete for all individual DataFrames ---")

# --- Post-imputation rounding for binary columns ---
# Since RandomForestRegressor might output floats like 0.1, 0.9, etc.,
# you will want to round these back to 0 or 1 for the binary columns.
binary_cols = ['stage_fear', 'drained_after_socializing'] # Ensure these names match your data

for df in [imp_train, imp_original, imp_test]:
    for col in binary_cols:
        if col in df.columns:
            # Round to nearest integer and convert to appropriate type (e.g., Int64 for nullable int)
            df[col] = df[col].round().astype('Int64') # Using nullable integer dtype is good practice here

print("\nImputed train_df (imp_train) head (with rounding for binary cols):")
display(imp_train.head())
print("\nMissing values in imp_train after Imputation:\n", imp_train.isnull().sum())

print("\nImputed original_df (imp_original) head (with rounding for binary cols):")
display(imp_original.head())
print("\nMissing values in imp_original after Imputation:\n", imp_original.isnull().sum())

print("\nImputed test_df (imp_test) head (with rounding for binary cols):")
display(imp_test.head())
print("\nMissing values in imp_test after Imputation:\n", imp_test.isnull().sum())

# --- Final combined training data ---
fin_train_mf = pd.concat([imp_train, imp_original], axis=0)

print("\n--- Final combined training data (fin_train_mf) head ---")
display(fin_train_mf.head())
print("Missing values in fin_train_mf:\n", fin_train_mf.isnull().sum())


--- Fitting IterativeImputer with RandomForest Estimator ---





--- Imputation complete for combined features (used for fitting) ---
Missing values in imputed combined features:
 time_spent_alone             0
stage_fear                   0
social_event_attendance      0
going_outside                0
drained_after_socializing    0
friends_circle_size          0
post_frequency               0
dtype: int64

--- Transforming X_train ---

--- Transforming X_original ---

--- Transforming X_test ---

--- Imputation complete for all individual DataFrames ---

Imputed train_df (imp_train) head (with rounding for binary cols):


Unnamed: 0,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency,personality
0,0.0,0,6.0,4.0,0,15.0,5.0,0
1,1.0,0,7.0,3.0,0,10.0,8.0,0
2,6.0,1,1.0,0.0,1,3.0,0.0,1
3,3.0,0,7.0,3.0,0,11.0,5.0,0
4,1.0,0,4.0,4.0,0,13.0,7.268801,0



Missing values in imp_train after Imputation:
 time_spent_alone             0
stage_fear                   0
social_event_attendance      0
going_outside                0
drained_after_socializing    0
friends_circle_size          0
post_frequency               0
personality                  0
dtype: int64

Imputed original_df (imp_original) head (with rounding for binary cols):


Unnamed: 0,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency,personality
0,4.0,0,4.0,6.0,0,13.0,5.0,0
1,9.0,1,0.0,0.0,1,0.0,3.0,1
2,9.0,1,1.0,2.0,1,5.0,2.0,1
3,0.0,0,6.0,7.0,0,14.0,8.0,0
4,3.0,0,9.0,4.0,0,8.0,5.0,0



Missing values in imp_original after Imputation:
 time_spent_alone             0
stage_fear                   0
social_event_attendance      0
going_outside                0
drained_after_socializing    0
friends_circle_size          0
post_frequency               0
personality                  0
dtype: int64

Imputed test_df (imp_test) head (with rounding for binary cols):


Unnamed: 0,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency
0,3.0,0,7.0,4.0,0,6.0,5.924252
1,7.065985,1,0.0,0.0,1,5.0,1.0
2,3.0,0,5.0,6.0,0,15.0,9.0
3,3.0,0,4.0,4.0,0,5.0,6.0
4,9.0,1,1.0,2.0,1,1.0,1.0



Missing values in imp_test after Imputation:
 time_spent_alone             0
stage_fear                   0
social_event_attendance      0
going_outside                0
drained_after_socializing    0
friends_circle_size          0
post_frequency               0
dtype: int64

--- Final combined training data (fin_train_mf) head ---


Unnamed: 0,time_spent_alone,stage_fear,social_event_attendance,going_outside,drained_after_socializing,friends_circle_size,post_frequency,personality
0,0.0,0,6.0,4.0,0,15.0,5.0,0
1,1.0,0,7.0,3.0,0,10.0,8.0,0
2,6.0,1,1.0,0.0,1,3.0,0.0,1
3,3.0,0,7.0,3.0,0,11.0,5.0,0
4,1.0,0,4.0,4.0,0,13.0,7.268801,0


Missing values in fin_train_mf:
 time_spent_alone             0
stage_fear                   0
social_event_attendance      0
going_outside                0
drained_after_socializing    0
friends_circle_size          0
post_frequency               0
personality                  0
dtype: int64


In [15]:

# --- 1. Prepare Features (X) and Target (y) ---
# Drop the 'personality' column to get features for training
X_train_lgbm = fin_train_mf.drop('personality', axis=1)
# Select the 'personality' column as the target
y_train_lgbm = fin_train_mf['personality']

# For the test set, we only have features, so copy it directly
X_test_lgbm = imp_test.copy()

# Ensure 'id' column is removed if it's a regular column and not the index.
# Models typically don't use 'id' as a feature.
if 'id' in X_train_lgbm.columns:
    X_train_lgbm = X_train_lgbm.drop('id', axis=1)
if 'id' in X_test_lgbm.columns:
    X_test_lgbm = X_test_lgbm.drop('id', axis=1)

print(f"\nFeatures for training (X_train_lgbm) shape: {X_train_lgbm.shape}")
print(f"Target for training (y_train_lgbm) shape: {y_train_lgbm.shape}")
print(f"Features for test (X_test_lgbm) shape: {X_test_lgbm.shape}")


# --- 2. Initialize the LightGBM Classifier ---
# These are basic parameters. Hyperparameter tuning with Optuna will improve performance.
lgbm_clf = lgb.LGBMClassifier(
    objective='binary',          # Specifies a binary classification task
    metric='auc',                # We'll evaluate using AUC (Area Under the Curve)
    n_estimators=1000,           # Number of boosting rounds (trees). Can be tuned.
    learning_rate=0.05,          # Step size shrinkage. Can be tuned.
    num_leaves=31,               # Max number of leaves in one tree. Can be tuned.
    max_depth=-1,                # No limit on tree depth (-1 means unlimited, can be tuned for regularization)
    min_child_samples=20,        # Minimum data in a leaf. Can be tuned.
    subsample=0.8,               # Fraction of samples used for boosting (for bagging).
    colsample_bytree=0.8,        # Fraction of features used per tree (for feature subsampling).
    random_state=42,             # Ensures reproducibility of results
    n_jobs=-1                    # Use all available CPU cores for faster training
)


# --- 3. Train the Model (using a simple train-validation split for now) ---
# For more robust evaluation, especially in competitions, you'd typically use K-Fold Cross-Validation.
# Here, we create a single train-validation split to monitor performance and use early stopping.
X_train_split, X_val_split, y_train_split, y_val_split = train_test_split(
    X_train_lgbm, y_train_lgbm,
    test_size=0.2,               # 20% of data for validation
    random_state=42,             # For reproducibility of the split
    stratify=y_train_lgbm        # Ensures the same proportion of classes in train and validation sets
)

print("\n--- Training LightGBM Model ---")
lgbm_clf.fit(
    X_train_split, y_train_split,
    eval_set=[(X_val_split, y_val_split)], # Data to evaluate performance on during training
    eval_metric='auc',                     # Metric to monitor for early stopping
    callbacks=[lgb.early_stopping(100, verbose=False)] # Stop if validation AUC doesn't improve for 100 rounds
)

print("\n--- LightGBM Model Training Complete ---")


# --- 4. Make Predictions on the Test Set ---
# Predict probabilities for ROC AUC score (often required for Kaggle submissions)
# We take the probability of the positive class (index 1)
y_pred_proba = lgbm_clf.predict_proba(X_test_lgbm)[:, 1]

# Predict class labels (0 or 1) based on a default threshold (usually 0.5)
y_pred_class = lgbm_clf.predict(X_test_lgbm)

print(f"\nExample probabilities for test set (first 10): {y_pred_proba[:10]}")
print(f"Example class predictions for test set (first 10): {y_pred_class[:10]}")


# --- 5. Evaluate Model Performance on the Validation Set ---
# It's good practice to check the model's performance on the validation set where true labels are known.
y_val_pred_proba = lgbm_clf.predict_proba(X_val_split)[:, 1]
y_val_pred_class = lgbm_clf.predict(X_val_split)

print("\n--- Model Evaluation on Validation Set ---")
print(f"Validation Accuracy: {accuracy_score(y_val_split, y_val_pred_class):.4f}")
print(f"Validation ROC AUC: {roc_auc_score(y_val_split, y_val_pred_proba):.4f}")
print(f"Validation Precision: {precision_score(y_val_split, y_val_pred_class):.4f}")
print(f"Validation Recall: {recall_score(y_val_split, y_val_pred_class):.4f}")
print(f"Validation F1-Score: {f1_score(y_val_split, y_val_pred_class):.4f}")


# --- Optional: Prepare for Submission ---
# For a Kaggle submission, you'd typically format your predictions into a DataFrame
# with an 'id' column and your predicted 'Personality' column.
# Ensure you use the original 'id' from the test_df.
# submission_df = pd.DataFrame({'id': test_df.index, 'Personality': y_pred_proba}) # If submission expects probability
# # Or for class labels if required:
# # submission_df = pd.DataFrame({'id': test_df.index, 'Personality': y_pred_class})
# display(submission_df.head())
# submission_df.to_csv('submission_lgbm.csv', index=False)


Features for training (X_train_lgbm) shape: (21424, 7)
Target for training (y_train_lgbm) shape: (21424,)
Features for test (X_test_lgbm) shape: (6175, 7)

--- Training LightGBM Model ---
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007977 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659

--- LightGBM Model Training Complete ---

Example probabilities for test set (first 10): [0.13735561 0.66566348 0.14386527 0.13744269 0.6539281  0.13738201
 0.14386527 0.6388785  0.13792372 0.64846233]
Example class predictions for test set (first 10): [

In [17]:
def objective(trial):
    # 1. Define hyperparameters to optimize
    # Optuna will suggest values for these within defined ranges
    param = {
        'objective': 'binary',
        'metric': 'auc',
        'n_estimators': trial.suggest_int('n_estimators', 500, 2000), # Range for number of trees
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1, log=True), # Log scale for learning rate
        'num_leaves': trial.suggest_int('num_leaves', 20, 100), # Max number of leaves in one tree
        'max_depth': trial.suggest_int('max_depth', 5, 15), # Max tree depth
        'min_child_samples': trial.suggest_int('min_child_samples', 20, 100), # Min data in a leaf
        'subsample': trial.suggest_float('subsample', 0.6, 1.0), # Fraction of samples
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.6, 1.0), # Fraction of features
        'reg_alpha': trial.suggest_float('reg_alpha', 1e-8, 10.0, log=True), # L1 regularization
        'reg_lambda': trial.suggest_float('reg_lambda', 1e-8, 10.0, log=True), # L2 regularization
        'random_state': 42,
        'n_jobs': -1,
        # 'is_unbalance': True, # Consider adding this if class imbalance is a concern and affecting minority class recall
        # 'scale_pos_weight': trial.suggest_float('scale_pos_weight', 1.0, 5.0), # Another way to handle imbalance
    }

    # 2. Implement Cross-Validation within the objective function
    # This gives a more robust evaluation of the chosen hyperparameters
    kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    auc_scores = []

    for fold, (train_idx, val_idx) in enumerate(kf.split(X_train_lgbm, y_train_lgbm)):
        X_train_fold, X_val_fold = X_train_lgbm.iloc[train_idx], X_train_lgbm.iloc[val_idx]
        y_train_fold, y_val_fold = y_train_lgbm.iloc[train_idx], y_train_lgbm.iloc[val_idx]

        model = lgb.LGBMClassifier(**param)
        model.fit(X_train_fold, y_train_fold,
                  eval_set=[(X_val_fold, y_val_fold)],
                  eval_metric='auc',
                  callbacks=[lgb.early_stopping(100, verbose=False)], # Early stopping in each fold
                  # categorical_feature=categorical_feature_names # If you had specific categorical features by name
                 )
        
        # Predict probabilities on the validation fold
        val_preds = model.predict_proba(X_val_fold)[:, 1]
        auc_scores.append(roc_auc_score(y_val_fold, val_preds))
    
    # Optuna aims to minimize by default, so we return the negative AUC if we want to maximize AUC
    return np.mean(auc_scores)

# 3. Run the Optuna study
print("--- Starting Optuna Hyperparameter Tuning ---")
# Use study.optimize with a certain number of trials. More trials = better chance of finding optimal params.
# n_trials: Number of different parameter combinations Optuna will try.
# timeout: Max time in seconds.
study = optuna.create_study(direction='maximize') # We want to maximize AUC
study.optimize(objective, n_trials=50, timeout=3600) # Example: 50 trials or 1 hour timeout

print("\n--- Optuna Tuning Complete ---")
print(f"Best trial value (mean AUC): {study.best_value:.4f}")
print(f"Best parameters: {study.best_params}")

# 4. Train the final model with the best parameters found by Optuna
best_lgbm_params = study.best_params
final_lgbm_clf = lgb.LGBMClassifier(
    objective='binary',
    metric='auc',
    random_state=42,
    n_jobs=-1,
    **best_lgbm_params # Unpack the best parameters found by Optuna
)

print("\n--- Training Final LightGBM Model with Best Parameters ---")
# Train on the entire fin_train_mf dataset (no split for final training)
final_lgbm_clf.fit(X_train_lgbm, y_train_lgbm)

print("\n--- Final Model Training Complete ---")

# 5. Make final predictions with the optimized model
y_pred_proba_optimized = final_lgbm_clf.predict_proba(X_test_lgbm)[:, 1]
# And prepare for submission
# submission_df_optimized = pd.DataFrame({'id': test_df.index, 'Personality': y_pred_proba_optimized})
# submission_df_optimized.to_csv('submission_lgbm_optimized.csv', index=False

[I 2025-07-22 13:05:17,162] A new study created in memory with name: no-name-8e8c025b-ae0e-4866-91c2-12c66b44cc18


--- Starting Optuna Hyperparameter Tuning ---
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.016831 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000429 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, num

[I 2025-07-22 13:05:26,127] Trial 0 finished with value: 0.9696593975684774 and parameters: {'n_estimators': 1778, 'learning_rate': 0.015083210004949154, 'num_leaves': 49, 'max_depth': 12, 'min_child_samples': 93, 'subsample': 0.8531738664549511, 'colsample_bytree': 0.60006313987095, 'reg_alpha': 2.3126301943855773e-08, 'reg_lambda': 3.36068522577933}. Best is trial 0 with value: 0.9696593975684774.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007354 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.004796 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [bin

[I 2025-07-22 13:05:32,669] Trial 1 finished with value: 0.9697306733213192 and parameters: {'n_estimators': 1501, 'learning_rate': 0.02743762766060252, 'num_leaves': 92, 'max_depth': 7, 'min_child_samples': 76, 'subsample': 0.673579988620027, 'colsample_bytree': 0.7804451480113332, 'reg_alpha': 0.0011990865669799853, 'reg_lambda': 0.002370418753215495}. Best is trial 1 with value: 0.9697306733213192.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003308 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001345 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:05:40,813] Trial 2 finished with value: 0.9689138979152114 and parameters: {'n_estimators': 812, 'learning_rate': 0.03408602535890296, 'num_leaves': 84, 'max_depth': 12, 'min_child_samples': 24, 'subsample': 0.7472276703410474, 'colsample_bytree': 0.8219233071078376, 'reg_alpha': 0.0461433292230757, 'reg_lambda': 0.37463732640811515}. Best is trial 1 with value: 0.9697306733213192.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.005403 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001327 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:05:47,883] Trial 3 finished with value: 0.9696819131391063 and parameters: {'n_estimators': 1412, 'learning_rate': 0.027664365485017096, 'num_leaves': 60, 'max_depth': 7, 'min_child_samples': 42, 'subsample': 0.8825479019570777, 'colsample_bytree': 0.8626259279508819, 'reg_alpha': 2.5467543898908583e-08, 'reg_lambda': 3.2087889943355294e-08}. Best is trial 1 with value: 0.9697306733213192.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009528 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002137 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:00,654] Trial 4 finished with value: 0.9693197504135981 and parameters: {'n_estimators': 905, 'learning_rate': 0.0120662543525929, 'num_leaves': 55, 'max_depth': 6, 'min_child_samples': 27, 'subsample': 0.7140563948306085, 'colsample_bytree': 0.8684848819469131, 'reg_alpha': 6.06623221001136e-07, 'reg_lambda': 2.674614293691497e-07}. Best is trial 1 with value: 0.9697306733213192.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.012726 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001580 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:04,127] Trial 5 finished with value: 0.9688337185244258 and parameters: {'n_estimators': 681, 'learning_rate': 0.06842616746450678, 'num_leaves': 53, 'max_depth': 14, 'min_child_samples': 55, 'subsample': 0.826778185316928, 'colsample_bytree': 0.8023776139792819, 'reg_alpha': 3.0621460306779353e-06, 'reg_lambda': 2.167549543219316e-05}. Best is trial 1 with value: 0.9697306733213192.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007823 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001339 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:06,605] Trial 6 finished with value: 0.9691188212815736 and parameters: {'n_estimators': 1768, 'learning_rate': 0.020418730272355773, 'num_leaves': 75, 'max_depth': 9, 'min_child_samples': 28, 'subsample': 0.8506785281476341, 'colsample_bytree': 0.8110329661078157, 'reg_alpha': 0.0001667997782487244, 'reg_lambda': 1.8717089862262605e-07}. Best is trial 1 with value: 0.9697306733213192.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009011 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.012606 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:10,341] Trial 7 finished with value: 0.9699226315531388 and parameters: {'n_estimators': 941, 'learning_rate': 0.06431147407623979, 'num_leaves': 34, 'max_depth': 14, 'min_child_samples': 71, 'subsample': 0.9066684103641156, 'colsample_bytree': 0.6036240225082832, 'reg_alpha': 0.00044963601878269497, 'reg_lambda': 0.0005786196469057266}. Best is trial 7 with value: 0.9699226315531388.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008216 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.006286 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:19,079] Trial 8 finished with value: 0.9690075935885684 and parameters: {'n_estimators': 1684, 'learning_rate': 0.01020820214854434, 'num_leaves': 61, 'max_depth': 8, 'min_child_samples': 82, 'subsample': 0.7067581074905039, 'colsample_bytree': 0.9951198800808453, 'reg_alpha': 0.23911892293000414, 'reg_lambda': 0.3042309123456871}. Best is trial 7 with value: 0.9699226315531388.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.004772 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.010399 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:27,140] Trial 9 finished with value: 0.9696853783425174 and parameters: {'n_estimators': 807, 'learning_rate': 0.0255923514854267, 'num_leaves': 38, 'max_depth': 7, 'min_child_samples': 94, 'subsample': 0.6488945015798456, 'colsample_bytree': 0.7805451744300518, 'reg_alpha': 0.00029768088064788197, 'reg_lambda': 3.0665606328108224e-05}. Best is trial 7 with value: 0.9699226315531388.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000458 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001721 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:29,564] Trial 10 finished with value: 0.9701824093379987 and parameters: {'n_estimators': 1082, 'learning_rate': 0.09693678247965194, 'num_leaves': 22, 'max_depth': 14, 'min_child_samples': 64, 'subsample': 0.980325335546167, 'colsample_bytree': 0.6045352741645675, 'reg_alpha': 1.4919457662797337, 'reg_lambda': 0.004559789253411537}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000910 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000467 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:33,811] Trial 11 finished with value: 0.9699337389569846 and parameters: {'n_estimators': 1151, 'learning_rate': 0.09992832632829698, 'num_leaves': 21, 'max_depth': 15, 'min_child_samples': 66, 'subsample': 0.9980393271404497, 'colsample_bytree': 0.632419260076192, 'reg_alpha': 2.488204308597611, 'reg_lambda': 0.0024390786405619836}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001017 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000973 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:06:37,257] Trial 12 finished with value: 0.9697998049015633 and parameters: {'n_estimators': 1206, 'learning_rate': 0.08608346940201952, 'num_leaves': 20, 'max_depth': 15, 'min_child_samples': 56, 'subsample': 0.9952136983825814, 'colsample_bytree': 0.6713493591921371, 'reg_alpha': 7.314473973698622, 'reg_lambda': 0.008156462103810588}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.006642 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002597 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:39,213] Trial 13 finished with value: 0.9701373061678626 and parameters: {'n_estimators': 1145, 'learning_rate': 0.0991131754197037, 'num_leaves': 20, 'max_depth': 12, 'min_child_samples': 60, 'subsample': 0.9993261783202597, 'colsample_bytree': 0.6966439051403241, 'reg_alpha': 1.373285668072293, 'reg_lambda': 0.019759219959426064}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008621 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001116 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:42,551] Trial 14 finished with value: 0.9699149765358005 and parameters: {'n_estimators': 515, 'learning_rate': 0.04334074930547373, 'num_leaves': 33, 'max_depth': 11, 'min_child_samples': 47, 'subsample': 0.9505070477328454, 'colsample_bytree': 0.7002417067948539, 'reg_alpha': 0.08009044438888133, 'reg_lambda': 0.05751311215010335}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007854 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001156 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:06:45,690] Trial 15 finished with value: 0.9696218938898884 and parameters: {'n_estimators': 1016, 'learning_rate': 0.052087789910976665, 'num_leaves': 42, 'max_depth': 13, 'min_child_samples': 44, 'subsample': 0.9363262287441254, 'colsample_bytree': 0.7164339396882301, 'reg_alpha': 0.007009769622746375, 'reg_lambda': 0.00011749846065672764}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000980 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001355 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:06:50,066] Trial 16 finished with value: 0.9696640487909249 and parameters: {'n_estimators': 1370, 'learning_rate': 0.07640737493185011, 'num_leaves': 27, 'max_depth': 11, 'min_child_samples': 63, 'subsample': 0.7905270215399749, 'colsample_bytree': 0.6727597728517074, 'reg_alpha': 0.4147566277228289, 'reg_lambda': 0.023991754134140127}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001707 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001339 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:06:57,693] Trial 17 finished with value: 0.969987419813205 and parameters: {'n_estimators': 1986, 'learning_rate': 0.05060068397453822, 'num_leaves': 74, 'max_depth': 10, 'min_child_samples': 82, 'subsample': 0.950390693911563, 'colsample_bytree': 0.7328714737871338, 'reg_alpha': 8.760063277206832, 'reg_lambda': 6.553413275663698}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001342 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001593 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:07:01,857] Trial 18 finished with value: 0.9697842870423881 and parameters: {'n_estimators': 1091, 'learning_rate': 0.0885026565595274, 'num_leaves': 28, 'max_depth': 13, 'min_child_samples': 50, 'subsample': 0.9696638383155348, 'colsample_bytree': 0.6557185308974012, 'reg_alpha': 0.00991740477839237, 'reg_lambda': 4.331136902528376e-06}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000968 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001303 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:07:05,532] Trial 19 finished with value: 0.9697498653100227 and parameters: {'n_estimators': 1282, 'learning_rate': 0.059192385279410294, 'num_leaves': 44, 'max_depth': 13, 'min_child_samples': 39, 'subsample': 0.609164469297826, 'colsample_bytree': 0.7342994058559036, 'reg_alpha': 0.7600103107326267, 'reg_lambda': 0.21162189876388224}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000569 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.008377 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:07:07,652] Trial 20 finished with value: 0.9698987129870702 and parameters: {'n_estimators': 1547, 'learning_rate': 0.03987343966939721, 'num_leaves': 29, 'max_depth': 10, 'min_child_samples': 35, 'subsample': 0.9085936896568443, 'colsample_bytree': 0.6311804999085682, 'reg_alpha': 1.6115069494173427e-05, 'reg_lambda': 0.0004650193656332007}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000969 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.007505 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:07:11,560] Trial 21 finished with value: 0.9699044937918092 and parameters: {'n_estimators': 1961, 'learning_rate': 0.04866979337607929, 'num_leaves': 71, 'max_depth': 10, 'min_child_samples': 83, 'subsample': 0.9550506812799392, 'colsample_bytree': 0.7407164077644887, 'reg_alpha': 9.313399494188937, 'reg_lambda': 3.0377757962656458}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001317 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001156 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:07:16,802] Trial 22 finished with value: 0.9698343126236983 and parameters: {'n_estimators': 1248, 'learning_rate': 0.07906994299529747, 'num_leaves': 69, 'max_depth': 12, 'min_child_samples': 75, 'subsample': 0.9253396601956905, 'colsample_bytree': 0.6951543552606926, 'reg_alpha': 1.561602704028839, 'reg_lambda': 3.944048053474485}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008454 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001371 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:07:24,091] Trial 23 finished with value: 0.9694496460640316 and parameters: {'n_estimators': 1937, 'learning_rate': 0.05899027358722868, 'num_leaves': 99, 'max_depth': 9, 'min_child_samples': 87, 'subsample': 0.9992850712299843, 'colsample_bytree': 0.7448156253113232, 'reg_alpha': 0.08643782316600596, 'reg_lambda': 0.030400795610978}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.008045 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000981 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:07:29,771] Trial 24 finished with value: 0.9698746034173638 and parameters: {'n_estimators': 1062, 'learning_rate': 0.07194438389184915, 'num_leaves': 84, 'max_depth': 11, 'min_child_samples': 100, 'subsample': 0.8833498127349907, 'colsample_bytree': 0.643009741660948, 'reg_alpha': 2.7718476442051183, 'reg_lambda': 0.5464781961847323}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001484 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001065 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:07:36,963] Trial 25 finished with value: 0.9693106499461518 and parameters: {'n_estimators': 1358, 'learning_rate': 0.0950525151752271, 'num_leaves': 65, 'max_depth': 14, 'min_child_samples': 68, 'subsample': 0.9680332297394035, 'colsample_bytree': 0.6843334778629298, 'reg_alpha': 0.018466694453006322, 'reg_lambda': 0.0038452971017153208}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001326 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001427 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:07:44,395] Trial 26 finished with value: 0.9699256692016022 and parameters: {'n_estimators': 1582, 'learning_rate': 0.05203512449655085, 'num_leaves': 78, 'max_depth': 9, 'min_child_samples': 59, 'subsample': 0.9706400201037533, 'colsample_bytree': 0.7613348585288641, 'reg_alpha': 0.3492941900292407, 'reg_lambda': 0.10799855521348505}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.009952 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001093 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:07:46,570] Trial 27 finished with value: 0.9700541388920947 and parameters: {'n_estimators': 739, 'learning_rate': 0.09877656697577596, 'num_leaves': 26, 'max_depth': 12, 'min_child_samples': 77, 'subsample': 0.7766859307103555, 'colsample_bytree': 0.9691626132576682, 'reg_alpha': 9.385803394761261, 'reg_lambda': 1.3583557059354614}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4988, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001350 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17140, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.291015 -> initscore=-0.890459
[LightGBM] [Info] Start training from score -0.890459
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001357 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:07:51,961] Trial 28 finished with value: 0.9699051527924911 and parameters: {'n_estimators': 616, 'learning_rate': 0.08096797536614722, 'num_leaves': 23, 'max_depth': 13, 'min_child_samples': 73, 'subsample': 0.7840425935706845, 'colsample_bytree': 0.9929858745120347, 'reg_alpha': 0.0026583001212089202, 'reg_lambda': 0.015923010299398344}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000870 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001373 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:08:02,314] Trial 29 finished with value: 0.9696659601619511 and parameters: {'n_estimators': 768, 'learning_rate': 0.017513017727796863, 'num_leaves': 48, 'max_depth': 12, 'min_child_samples': 52, 'subsample': 0.7561308440230803, 'colsample_bytree': 0.9559396495541364, 'reg_alpha': 1.225741942933749, 'reg_lambda': 1.4253799833236895}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001516 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001005 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:08:08,566] Trial 30 finished with value: 0.9698648417671846 and parameters: {'n_estimators': 937, 'learning_rate': 0.09896525314532377, 'num_leaves': 35, 'max_depth': 12, 'min_child_samples': 63, 'subsample': 0.8639426439172048, 'colsample_bytree': 0.9208847735132576, 'reg_alpha': 0.12287418256111067, 'reg_lambda': 1.1603609978130773}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.013164 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001422 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:08:10,875] Trial 31 finished with value: 0.969918735256891 and parameters: {'n_estimators': 1148, 'learning_rate': 0.06321707203534561, 'num_leaves': 25, 'max_depth': 11, 'min_child_samples': 88, 'subsample': 0.9333449764594806, 'colsample_bytree': 0.6056113298955439, 'reg_alpha': 7.165598836898796, 'reg_lambda': 6.022838044280576}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001377 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001352 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:08:16,160] Trial 32 finished with value: 0.9700261580455833 and parameters: {'n_estimators': 1000, 'learning_rate': 0.08412414969799129, 'num_leaves': 20, 'max_depth': 10, 'min_child_samples': 76, 'subsample': 0.8228264756293406, 'colsample_bytree': 0.8544870929006039, 'reg_alpha': 2.6352286657028237, 'reg_lambda': 0.08013753853125959}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.005588 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001365 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:08:19,002] Trial 33 finished with value: 0.9698479835141388 and parameters: {'n_estimators': 982, 'learning_rate': 0.08248244645388443, 'num_leaves': 20, 'max_depth': 14, 'min_child_samples': 77, 'subsample': 0.8272133468702558, 'colsample_bytree': 0.8985558362480657, 'reg_alpha': 0.9214125342567694, 'reg_lambda': 0.07921565439117702}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001367 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.011273 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:08:23,944] Trial 34 finished with value: 0.9700387751323489 and parameters: {'n_estimators': 866, 'learning_rate': 0.09960462265196737, 'num_leaves': 29, 'max_depth': 11, 'min_child_samples': 69, 'subsample': 0.7607797444194937, 'colsample_bytree': 0.9482793537432672, 'reg_alpha': 2.443397173900478, 'reg_lambda': 0.0012401872282795387}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001138 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002827 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:08:27,497] Trial 35 finished with value: 0.9698082374706429 and parameters: {'n_estimators': 827, 'learning_rate': 0.07127017842017726, 'num_leaves': 30, 'max_depth': 12, 'min_child_samples': 69, 'subsample': 0.7306988963908404, 'colsample_bytree': 0.9489563581301028, 'reg_alpha': 0.04368131314406484, 'reg_lambda': 0.0019821531569451083}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.005170 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000588 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [bin

[I 2025-07-22 13:08:33,772] Trial 36 finished with value: 0.9693480602626036 and parameters: {'n_estimators': 696, 'learning_rate': 0.03357199858551962, 'num_leaves': 38, 'max_depth': 13, 'min_child_samples': 59, 'subsample': 0.7660537232524154, 'colsample_bytree': 0.9551314892255328, 'reg_alpha': 2.889498335267209e-08, 'reg_lambda': 0.00012033731555335716}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.010667 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.005473 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:08:40,818] Trial 37 finished with value: 0.9701278139056884 and parameters: {'n_estimators': 874, 'learning_rate': 0.09882058405424764, 'num_leaves': 25, 'max_depth': 5, 'min_child_samples': 64, 'subsample': 0.6878013720724426, 'colsample_bytree': 0.842588796036165, 'reg_alpha': 0.31183772750512007, 'reg_lambda': 0.001072601877047924}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001325 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000973 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:08:49,790] Trial 38 finished with value: 0.9701802038349296 and parameters: {'n_estimators': 718, 'learning_rate': 0.07093884265329671, 'num_leaves': 24, 'max_depth': 6, 'min_child_samples': 63, 'subsample': 0.6697740557504774, 'colsample_bytree': 0.8351927209562011, 'reg_alpha': 0.17390992228846036, 'reg_lambda': 0.006605606284566212}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.011139 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003215 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:09:00,205] Trial 39 finished with value: 0.9701011066356706 and parameters: {'n_estimators': 551, 'learning_rate': 0.07020208670524887, 'num_leaves': 33, 'max_depth': 5, 'min_child_samples': 62, 'subsample': 0.6698214394521275, 'colsample_bytree': 0.8880472656073277, 'reg_alpha': 0.026492372242600187, 'reg_lambda': 0.006260965839065021}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001155 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001327 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:09:05,289] Trial 40 finished with value: 0.9701154249174324 and parameters: {'n_estimators': 659, 'learning_rate': 0.023258313405586272, 'num_leaves': 43, 'max_depth': 5, 'min_child_samples': 54, 'subsample': 0.619345284712611, 'colsample_bytree': 0.8331439440658647, 'reg_alpha': 5.2224107492079995e-05, 'reg_lambda': 0.00017915436104359614}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003621 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003958 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:09:13,643] Trial 41 finished with value: 0.9700648695001671 and parameters: {'n_estimators': 642, 'learning_rate': 0.023394325727103822, 'num_leaves': 42, 'max_depth': 5, 'min_child_samples': 55, 'subsample': 0.601734122516796, 'colsample_bytree': 0.8466137793550116, 'reg_alpha': 2.769981559583036e-05, 'reg_lambda': 0.00014444789926077738}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000993 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000969 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:09:18,290] Trial 42 finished with value: 0.9700264457373974 and parameters: {'n_estimators': 890, 'learning_rate': 0.028912977118749437, 'num_leaves': 56, 'max_depth': 6, 'min_child_samples': 51, 'subsample': 0.6232216789615461, 'colsample_bytree': 0.8312591181228751, 'reg_alpha': 4.858915667315303e-05, 'reg_lambda': 2.722381077094198e-05}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000747 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001331 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:09:24,597] Trial 43 finished with value: 0.9694413165809879 and parameters: {'n_estimators': 725, 'learning_rate': 0.014072436001486993, 'num_leaves': 24, 'max_depth': 6, 'min_child_samples': 65, 'subsample': 0.6994581104586342, 'colsample_bytree': 0.7756306073976342, 'reg_alpha': 3.353523329668865e-06, 'reg_lambda': 4.762355577235795e-06}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001572 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000979 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:09:39,119] Trial 44 finished with value: 0.9697903583989115 and parameters: {'n_estimators': 567, 'learning_rate': 0.01999411862119397, 'num_leaves': 37, 'max_depth': 5, 'min_child_samples': 56, 'subsample': 0.6572798274043254, 'colsample_bytree': 0.8301479574930852, 'reg_alpha': 0.0022767165453298557, 'reg_lambda': 0.0009759300746136513}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.014792 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001360 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:09:46,113] Trial 45 finished with value: 0.9697700303453173 and parameters: {'n_estimators': 1091, 'learning_rate': 0.08844074005882857, 'num_leaves': 51, 'max_depth': 7, 'min_child_samples': 60, 'subsample': 0.6250184936271287, 'colsample_bytree': 0.7981621634047908, 'reg_alpha': 0.30540075113597137, 'reg_lambda': 0.0003363949815899745}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002309 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001381 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:09:51,074] Trial 46 finished with value: 0.9701281269347193 and parameters: {'n_estimators': 807, 'learning_rate': 0.038748522643461764, 'num_leaves': 46, 'max_depth': 6, 'min_child_samples': 66, 'subsample': 0.6903851720400866, 'colsample_bytree': 0.8024842964694352, 'reg_alpha': 0.0007850647155404936, 'reg_lambda': 0.011487171070823332}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001290 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001329 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[Light

[I 2025-07-22 13:09:54,283] Trial 47 finished with value: 0.9700128187422029 and parameters: {'n_estimators': 825, 'learning_rate': 0.04154222769234812, 'num_leaves': 32, 'max_depth': 6, 'min_child_samples': 66, 'subsample': 0.729626018697206, 'colsample_bytree': 0.8088979065318664, 'reg_alpha': 0.12963485805670666, 'reg_lambda': 0.011003672952590336}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001280 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000987 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:09:56,464] Trial 48 finished with value: 0.9701459035983339 and parameters: {'n_estimators': 1173, 'learning_rate': 0.06275060078462034, 'num_leaves': 23, 'max_depth': 7, 'min_child_samples': 46, 'subsample': 0.6962575300572912, 'colsample_bytree': 0.8738322995503156, 'reg_alpha': 0.00041894514116045246, 'reg_lambda': 0.004502823631760421}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000981 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info] Number of positive: 4987, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001256 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17139, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290974 -> initscore=-0.890659
[LightGBM] [Info] Start training from score -0.890659
[LightGBM] [Info

[I 2025-07-22 13:09:58,059] Trial 49 finished with value: 0.9698881288518401 and parameters: {'n_estimators': 1471, 'learning_rate': 0.036109455841462676, 'num_leaves': 23, 'max_depth': 8, 'min_child_samples': 47, 'subsample': 0.6520798370349626, 'colsample_bytree': 0.8878172552412797, 'reg_alpha': 0.000333058663303998, 'reg_lambda': 0.005195999027859686}. Best is trial 10 with value: 0.9701824093379987.


[LightGBM] [Info] Number of positive: 4988, number of negative: 12152
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001355 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 17140, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.291015 -> initscore=-0.890459
[LightGBM] [Info] Start training from score -0.890459

--- Optuna Tuning Complete ---
Best trial value (mean AUC): 0.9702
Best parameters: {'n_estimators': 1082, 'learning_rate': 0.09693678247965194, 'num_leaves': 22, 'max_depth': 14, 'min_child_samples': 64, 'subsample': 0.980325335546167, 'colsample_bytree': 0.6045352741645675, 'reg_alpha': 1.4919457662797337, 'reg_lambda': 0.004559789253411537}

--- Training Final LightGBM Model with Best Parameters ---
[LightGBM] [Info] Number of positive: 6234, number of negative: 15190
[LightGBM] [Info] Auto-choosing

In [19]:
final_lgbm_clf
print(study.best_params)

{'n_estimators': 1082, 'learning_rate': 0.09693678247965194, 'num_leaves': 22, 'max_depth': 14, 'min_child_samples': 64, 'subsample': 0.980325335546167, 'colsample_bytree': 0.6045352741645675, 'reg_alpha': 1.4919457662797337, 'reg_lambda': 0.004559789253411537}


In [21]:
import lightgbm as lgb
import pandas as pd # Assuming pandas is imported for submission_df

# ... (Previous code including data loading, imputation, and Optuna study setup) ...

# Assuming 'study' object exists and has completed its optimization:
# best_lgbm_params = study.best_params # This would be the dictionary you just provided

# If you want to hardcode them directly after the study is finished and you have the values:
best_lgbm_params = {
    'n_estimators': 1082,
    'learning_rate': 0.09693678247965194,
    'num_leaves': 22,
    'max_depth': 14,
    'min_child_samples': 64,
    'subsample': 0.980325335546167,
    'colsample_bytree': 0.6045352741645675,
    'reg_alpha': 1.4919457662797337,
    'reg_lambda': 0.004559789253411537
}

print("Best parameters found by Optuna:")
print(best_lgbm_params)

# 4. Train the final model with the best parameters found by Optuna
final_lgbm_clf = lgb.LGBMClassifier(
    objective='binary',
    metric='auc',
    random_state=42, # Keep for reproducibility
    n_jobs=-1,       # Use all CPU cores
    **best_lgbm_params # This unpacks the dictionary of best parameters into keyword arguments
)

print("\n--- Training Final LightGBM Model with Best Parameters ---")
# Train on the entire fin_train_mf dataset (X_train_lgbm, y_train_lgbm)
# No validation split needed here, as the parameters are already optimized.
final_lgbm_clf.fit(X_train_lgbm, y_train_lgbm)

print("\n--- Final Model Training Complete ---")

# 5. Make final predictions with the optimized model
y_pred_proba_optimized = final_lgbm_clf.predict_proba(X_test_lgbm)[:, 1]

print(f"\nFirst 10 optimized test predictions (probabilities): {y_pred_proba_optimized[:10]}")

# Prepare for submission
# Assuming test_df has the original 'id' (before dropping from X_test_lgbm) as its index
submission_df_optimized = pd.DataFrame({
    'id': test_df.index, # Ensure test_df.index correctly holds the original IDs
    'Personality': y_pred_proba_optimized
})

print("\nOptimized Submission DataFrame Head:")
display(submission_df_optimized.head())

# Save to CSV for submission
submission_df_optimized.to_csv('submission_lgbm_optimized.csv', index=False)
print("\nSubmission file ready to be saved as 'submission_lgbm_optimized.csv'")

Best parameters found by Optuna:
{'n_estimators': 1082, 'learning_rate': 0.09693678247965194, 'num_leaves': 22, 'max_depth': 14, 'min_child_samples': 64, 'subsample': 0.980325335546167, 'colsample_bytree': 0.6045352741645675, 'reg_alpha': 1.4919457662797337, 'reg_lambda': 0.004559789253411537}

--- Training Final LightGBM Model with Best Parameters ---
[LightGBM] [Info] Number of positive: 6234, number of negative: 15190
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.014797 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1279
[LightGBM] [Info] Number of data points in the train set: 21424, number of used features: 7
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.290982 -> initscore=-0.890619
[LightGBM] [Info] Start training from score -0.890619

--- Final Model Training Complete ---

First 10 optimized test predictions (probabilities): [0.00240724 0.98568456 0.03849625 0.00467518 0.95821825 0.00

Unnamed: 0,id,Personality
0,0,0.002407
1,1,0.985685
2,2,0.038496
3,3,0.004675
4,4,0.958218



Submission file ready to be saved as 'submission_lgbm_optimized.csv'


In [24]:
print("\n--- Making Final Class Predictions ---")
# Predict class labels (0 or 1) using your trained model
y_pred_class_optimized = final_lgbm_clf.predict(X_test_lgbm)

print(f"First 10 predicted numerical classes: {y_pred_class_optimized[:10]}")

# Define the reverse mapping from numerical labels back to string labels
# Based on your previous mapping: {'Extrovert': 0, 'Introvert': 1}
reverse_mapping_personality = {0: 'Extrovert', 1: 'Introvert'}

# Apply the reverse mapping to convert numerical predictions to string labels
predicted_personality_labels = pd.Series(y_pred_class_optimized).map(reverse_mapping_personality)

print(f"First 10 predicted string labels: {predicted_personality_labels[:10].tolist()}")


print("\n--- Preparing Submission File with String Labels ---")


submission_df_final = pd.DataFrame({
    'id': original_test_ids,
    'Personality': predicted_personality_labels
})

print("Final Submission DataFrame Head:")
display(submission_df_final.head()) 

submission_df_final.to_csv('submission_lgbm_final_labels.csv', index=False)

print("\nSubmission file saved as 'submission_lgbm_final_labels.csv'.")
print("This file now contains string labels for 'Personality' and should be ready for submission.")


--- Making Final Class Predictions ---
First 10 predicted numerical classes: [0 1 0 0 1 0 0 1 0 1]
First 10 predicted string labels: ['Extrovert', 'Introvert', 'Extrovert', 'Extrovert', 'Introvert', 'Extrovert', 'Extrovert', 'Introvert', 'Extrovert', 'Introvert']

--- Preparing Submission File with String Labels ---
Final Submission DataFrame Head:


Unnamed: 0,id,Personality
0,18524,Extrovert
1,18525,Introvert
2,18526,Extrovert
3,18527,Extrovert
4,18528,Introvert



Submission file saved as 'submission_lgbm_final_labels.csv'.
This file now contains string labels for 'Personality' and should be ready for submission.


### Personality and Social Behavior Overview

The dataset consists of 74% extroverts, who are generally expected to post more frequently, attend more events, go outside more often, spend less time alone, and maintain larger friend groups. These tendencies are clearly reflected in the histograms shown above.

Time Spent Alone
Time spent alone ranges from 0 to 10 hours, with a strongly right-skewed distribution. Most individuals prefer spending between 0 and 3.5 hours alone, and the frequency drops sharply as time spent alone increases. This indicates that people in the dataset generally favor spending less time alone.

Social Event Attendance
Event attendance ranges from 0 to 10 events. The distribution is multimodal, with a smaller peak in the 0–3 range, followed by a rapid increase in frequency between 3 and 9 events, then tapering off near 10. This pattern suggests that a majority of individuals prefer to attend social events regularly.

Going Outside
The frequency of leaving the house ranges from 0 to 7. Most individuals tend to leave the house frequently, as evidenced by the peak in the 3–7 range, reflecting active social engagement.

Friend Circle Size
Friend group sizes vary from 0 to 15, with a majority falling between 4 and 15. This highlights that most people in the dataset maintain relatively large social circles.

Post Frequency
Post frequency ranges from 0 to 10, with a peak at 3 posts. There is a gradual decrease in frequency as the number of posts increases, suggesting that while many individuals post regularly, fewer post very frequently.