# Pump It Up

## High Level Description

This code was developed as part of the [Pump It Up Competition](https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/page/23/), on DrivenData. The goal of this ML project is to promote access to clean and portable water accross Tanzania. 

The model being developed is an ensemble model, utilizing XGBoost, and CatBoost.

The final ML model will predict whether water pumps should be labelled as "functional", "functional needs repair", or "non functional".  

## Table of Contents
1. [Import and Setup](#import-and-setup)
2. [Data Cleaning and Feature Engineering](#data-cleaning--feature-engineering)
3. [Convert Data Types](#convert-data-types)
4. [Training and Hpyerparameter Tuning](#training-and-hyperparameter-tuning)
   1. [Random Forest](#random-forest)
   2. [XGBoost](#xgboost)
   3. [CatBoost](#catboost)
5. [Ensemble Voting](#ensemble-voting)

### Import and Setup
Import the standard python libraries and datasets.

To download a copy of the dataset for yourself, please sign up for the [Pump It Up Competition](https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/page/23/) on DrivenData, and download it for free.

In [43]:
import numpy as np
import pandas as pd
import HelperFunctions

In [44]:
xtrain = pd.read_csv(r'D:\MMAI\Pump-It-Up\data\x_train.csv')
xtest = pd.read_csv(r'D:\MMAI\Pump-It-Up\data\x_test.csv')
ytrain = pd.read_csv(r'D:\MMAI\Pump-It-Up\data\y_train.csv')
raw = pd.concat([xtrain, xtest])

In [45]:
del ytrain['id']

### Data Cleaning & Feature Engineering

In [46]:
# cleanup - make sure everything is lowercase
raw = HelperFunctions.toLowerCase(raw)

In [47]:
# FE - Handle DateType variables
# Convert date into year, month, and days since recorded

def fixDates(df):
    df['date_recorded'] = pd.to_datetime(df['date_recorded'])
    df['year_recorded'] = (df['date_recorded'].dt.year).astype('int')
    df['month_recorded'] = (df['date_recorded'].dt.month).astype('str')
    df['days_since_recorded'] = ((pd.to_datetime('2001-03-26') - df['date_recorded']).dt.days).astype('int')
    del df['date_recorded']
    return df

xtrain = fixDates(xtrain)
xtest = fixDates(xtest)
raw = fixDates(raw)

In [48]:
# Impute 'longitude' based on subvillage, ward, lga, region
raw = HelperFunctions.imputeLong(raw)

# Impute 'permit' based on subvillage, ward, lga, region
permit_geo_mode = raw.groupby(['subvillage', 'ward', 'lga', 'region'])['permit'].agg(pd.Series.mode).reset_index()
permit_geo_mode = permit_geo_mode.rename(columns={"permit": "imputed_permit_geo"})
raw = raw.merge(permit_geo_mode, how='left', on=['subvillage', 'ward', 'lga', 'region'])
raw['imputed_permit'] = np.where(raw['permit'].isna(), raw['imputed_permit_geo'], raw['permit'])
raw['imputed_permit'] = np.where(raw['imputed_permit'].isna(), raw['permit'].mode()[0], raw['imputed_permit'])
raw = raw.drop(['permit', 'imputed_permit_geo'], axis=1)

# Impute 'population' based on subvillage, ward, lga, region
population_geo_mode = raw.groupby(['subvillage', 'ward', 'lga', 'region'])['population'].agg(pd.Series.mode).reset_index()
population_geo_mode = population_geo_mode.rename(columns={"population": "imputed_population_geo"})
raw = raw.merge(population_geo_mode, how='left', on=['subvillage', 'ward', 'lga', 'region'])
raw['imputed_population'] = np.where(raw['population'].isna(), raw['imputed_population_geo'], raw['population'])
raw['imputed_population'] = np.where(raw['imputed_population'].isna(), raw['population'].mode()[0], raw['imputed_population'])
raw = raw.drop(['population', 'imputed_population_geo'], axis=1)

# Impute 'gps_height' based on basin, subvillage, ward, lga, region
gps_height_geo_mode = raw.groupby(['basin', 'subvillage', 'ward', 'lga', 'region'])['gps_height'].agg(pd.Series.mode).reset_index()
gps_height_geo_mode = gps_height_geo_mode.rename(columns={"gps_height": "imputed_gps_height_geo"})
raw = raw.merge(gps_height_geo_mode, how='left', on=['basin','subvillage', 'ward', 'lga', 'region'])
raw['imputed_gps_height'] = np.where(raw['gps_height'].isna(), raw['imputed_gps_height_geo'], raw['gps_height'])
raw['imputed_gps_height'] = np.where(raw['imputed_gps_height'].isna(), raw['gps_height'].mode()[0], raw['imputed_gps_height'])
raw = raw.drop(['gps_height', 'imputed_gps_height_geo'], axis=1)

# Impute public meeting based on mode
raw['public_meeting']=raw['public_meeting'].fillna(raw['public_meeting'].mode()[0])

# Impute construction_year by mode
raw['construction_year'] = raw['construction_year'].fillna(raw['construction_year'].mode()[0])




In [49]:
# FE - Binning
# Group columns with high feature cardinality

cols = [i for i in raw.columns if type(raw[i].iloc[0]) == str]
raw[cols] = raw[cols].where(raw[cols].apply(lambda x: x.map(x.value_counts())) > 100, "other")
for column in cols:
    for i in raw[column].unique():
        if i not in raw[column].unique():
            raw[column].replace(i, 'other', inplace=True)

In [50]:
# Cleanup - Drop columns that are poorly distributed or have high co-linearity

# Poorly distributed
xtrain, xtest = HelperFunctions.removeCol(xtrain, xtest, 'amount_tsh')
xtrain, xtest = HelperFunctions.removeCol(xtrain, xtest, 'wpt_name')
xtrain, xtest = HelperFunctions.removeCol(xtrain, xtest, 'num_private')

# High co-linearity
xtrain, xtest = HelperFunctions.removeCol(xtrain, xtest, 'quantity_group')
xtrain, xtest = HelperFunctions.removeCol(xtrain, xtest, 'region')
xtrain, xtest = HelperFunctions.removeCol(xtrain, xtest, 'waterpoint_type_group')
xtrain, xtest = HelperFunctions.removeCol(xtrain, xtest, 'extraction_type_group')
xtrain, xtest = HelperFunctions.removeCol(xtrain, xtest, 'source_type')


### Convert Data Types

In [51]:

#replace string to integer
raw['public_meeting'] = raw['public_meeting'].replace({True: 1, False: 0})
raw['imputed_permit'] = raw['imputed_permit'].apply(lambda x: 0 if isinstance(x, list) and not x else x)
raw['imputed_permit'] = raw['imputed_permit'].apply(lambda x: x if x in [True, False] else 0)
raw['imputed_permit'] = raw['imputed_permit'].replace({True: 1, False: 0})


#change to integer
raw[['imputed_gps_height', 'construction_year', 'imputed_population']] = raw[['imputed_gps_height', 'construction_year', 'imputed_population']].astype('int')

#change type to categorical
raw[[ 'region_code', 'district_code', 'num_private']] = raw[[ 'region_code', 'district_code', 'num_private']].astype('str')

#remove decimal
raw['district_code'] = raw['district_code'].str.split(".").str[0]

raw= raw.rename(columns={"imputed_permit": "permit",
                    "imputed_gps_height": "gps_height", 
                   'imputed_population': 'population', 'imputed_longitude': 'longitude'}, errors="raise")


  raw['imputed_permit'] = raw['imputed_permit'].apply(lambda x: x if x in [True, False] else 0)


In [52]:
del raw['amount_tsh']
del raw['wpt_name']
del raw['num_private']
del raw['quantity_group']
del raw['region']
del raw['waterpoint_type_group']
del raw['extraction_type_group']
del raw['source_type']


In [53]:
raw.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 74250 entries, 0 to 74249
Data columns (total 34 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   id                     74250 non-null  int64  
 1   funder                 74250 non-null  object 
 2   installer              74250 non-null  object 
 3   latitude               74250 non-null  float64
 4   basin                  74250 non-null  object 
 5   subvillage             74250 non-null  object 
 6   region_code            74250 non-null  object 
 7   district_code          74250 non-null  object 
 8   lga                    74250 non-null  object 
 9   ward                   74250 non-null  object 
 10  public_meeting         74250 non-null  int64  
 11  recorded_by            74250 non-null  object 
 12  scheme_management      74250 non-null  object 
 13  scheme_name            74250 non-null  object 
 14  construction_year      74250 non-null  int32  
 15  ex

In [54]:
x_train = raw[raw['id'].isin(xtrain['id'])]
x_test = raw[raw['id'].isin(xtest['id'])]
del x_train['id']
del x_test['id']

In [55]:
def oneHotEncoding(X_train, X_test):
    columns = [i for i in X_train.columns if type(X_train[i].iloc[0]) == str]
    for column in columns:
        X_train[column].fillna('NULL', inplace = True)
        good_cols = [column+'_'+i for i in X_train[column].unique() if i in X_test[column].unique()]
        X_train = pd.concat((X_train, pd.get_dummies(X_train[column], prefix = column)[good_cols]), axis = 1)
        X_test = pd.concat((X_test, pd.get_dummies(X_test[column], prefix = column)[good_cols]), axis = 1)
        del X_train[column]
        del X_test[column]
    return X_train, X_test

x_train, x_test = oneHotEncoding(x_train, x_test)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_train[column].fillna('NULL', inplace = True)


### Training and Hyperparameter Tuning

Trained and tested the following Models: Random Forest, XGBoost, LGBM, CatBoost. Hypertuned with Optuna.

In [56]:
from sklearn.model_selection import train_test_split

x_train, x_validate, y_train, y_validate = train_test_split(x_train, ytrain, test_size=0.2, random_state=42)

#### Random Forest

In [15]:
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rf = RandomForestClassifier(criterion='gini',
                                max_features='sqrt',
                                min_samples_split=6,
                                oob_score=True,
                                random_state=1,
                                n_jobs=-1)

# param_grid = {"n_estimators" : [500, 750, 1000]}
# param_grid = {"n_estimators" : [1000]}

def objective(trial):
    criterion = trial.suggest_categorical('criterion', ['gini'])
    n_estimators = 700
    # n_estimators = trial.suggest_int('n_estimators', 500, 1500)
    max_depth = trial.suggest_int('max_depth', 10, 100, 10)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 40)
    max_features = trial.suggest_categorical('max_features', ['sqrt'])

    clf = RandomForestClassifier(
        n_estimators=n_estimators, 
        max_depth=max_depth, 
        min_samples_split=min_samples_split,
        max_features=max_features,
        random_state=42
    )
    
    return cross_val_score(clf, x_train, ytrain.values.ravel(), cv=5, scoring='accuracy', verbose=1, error_score='raise', n_jobs=-2).mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=5, n_jobs=-1)

print('Best trial: score {}, params {}'.format(study.best_trial.value, study.best_trial.params))

[I 2024-09-25 15:03:34,261] A new study created in memory with name: no-name-363fa0a7-5146-42cb-ab98-2c50542c068b
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend LokyBackend with 11 concurrent workers.
[Parallel(n_jobs=-2)]: Using backend Lo

Best trial: score 0.8095117845117846, params {'criterion': 'gini', 'max_depth': 80, 'min_samples_split': 12, 'max_features': 'sqrt'}


In [39]:
from sklearn.metrics import accuracy_score
top_params = study.best_params

# Previously found best parameters
# top_params = {'criterion': 'gini', 'n_estimators': 958, 'max_depth': 30, 'min_samples_split': 5, 'max_features': 'sqrt'}
# [I 2023-12-18 11:03:19,199] Trial 15 finished with value: 0.8129966329966329 and parameters: {'criterion': 'gini', 'n_estimators': 958, 'max_depth': 30, 'min_samples_split': 5, 'max_features': 'sqrt'}. Best is trial 15 with value: 0.8129966329966329.

# Train a model on the entire training dataset using the top parameters
final_model = RandomForestClassifier(**top_params, random_state=42, n_jobs=-1)
final_model.fit(x_train, ytrain.values.ravel())

# Make predictions on the test set
test_predictions = final_model.predict(x_validate)

# Optionally, evaluate the predictions
test_accuracy = accuracy_score(y_validate, test_predictions)
print(f'Test Accuracy: {test_accuracy}')

Test Accuracy: 0.8893939393939394


In [None]:
# predictions = best_clf.predict(X_test)
import os
y_test = pd.read_csv('data\y_test.csv')
pred = pd.DataFrame(test_predictions, columns = [y_test.columns[1]])
del y_test['status_group']
y_test = pd.concat((y_test, pred), axis = 1)
y_test.to_csv(os.path.join('data', 'y_test101.csv'), sep=",", index = False)

#### XGBoost

In [58]:
import optuna
from xgboost import XGBClassifier
from sklearn.model_selection import cross_val_score

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
ytrain_encoded = le.fit_transform(y_train.values.ravel())

def objective(trial):
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 750, 1250),
        'max_depth': trial.suggest_int('max_depth', 1, 12),
        'min_child_samples': trial.suggest_int('min_child_samples', 20, 70),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.5),
        'reg_lambda': trial.suggest_float('reg_lambda', 0, 10, step=0.5),
        'reg_alpha': trial.suggest_float('reg_alpha', 0, 10, step=0.5),
        'subsample': trial.suggest_float('subsample', 0.5, 1, step=0.05),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.5, 1, step=0.05),
        'gamma': trial.suggest_int('gamma', 0, 5),
        'max_bin': trial.suggest_int('max_bin', 50, 100),
        'objective': 'multi:softmax'
    }
    
    clf = XGBClassifier(**params)
    
    return cross_val_score(clf, x_train, ytrain_encoded, cv=2, verbose=1, scoring='accuracy').mean()

study2 = optuna.create_study(direction='maximize')
study2.optimize(objective, n_trials=50, n_jobs=-2)


[I 2024-09-25 16:32:30,236] A new study created in memory with name: no-name-7ec4aa6c-5c6f-4235-bf70-0dd3136c083d
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent work

In [59]:
print('Best trial: score {}, params {}'.format(study2.best_trial.value, study2.best_trial.params))

from sklearn.metrics import accuracy_score
top_params2 = study2.best_params

# Previously found best parameters
# [I 2023-12-18 11:03:19,199] Trial 15 finished with value: 0.8129966329966329 and parameters: {'criterion': 'gini', 'n_estimators': 958, 'max_depth': 30, 'min_samples_split': 5, 'max_features': 'sqrt'}. Best is trial 15 with value: 0.8129966329966329.
# [Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.

# Train a model on the entire training dataset using the top parameters
final_model2 = XGBClassifier(**top_params2, random_state=42, n_jobs=-1)
final_model2.fit(x_train, ytrain_encoded)

# Make predictions on the test set
test_predictions2 = final_model2.predict(x_validate)

# Optionally, evaluate the predictions
test_accuracy2 = accuracy_score(y_validate, test_predictions)
print(f'Test Accuracy: {test_accuracy2}')

Best trial: score 0.7934132996632997, params {'n_estimators': 825, 'max_depth': 8, 'min_child_samples': 36, 'learning_rate': 0.4408580704679031, 'reg_lambda': 7.0, 'reg_alpha': 8.0, 'subsample': 0.8500000000000001, 'colsample_bytree': 0.55, 'gamma': 0, 'max_bin': 50}


Parameters: { "min_child_samples" } are not used.



Test Accuracy: 0.8893939393939394


In [None]:
# predictions = best_clf.predict(X_test)
import os
y_test = pd.read_csv('data\y_test.csv')
decoded_predictions = le.inverse_transform(test_predictions2)
pred = pd.DataFrame(decoded_predictions, columns = [y_test.columns[1]])
del y_test['status_group']
y_test = pd.concat((y_test, pred), axis = 1)
y_test.to_csv(os.path.join('data', 'y_test102.csv'), sep=",", index = False)

#### CatBoost

In [62]:
from catboost import CatBoostClassifier
from sklearn.model_selection import cross_val_score
import optuna

def objective(trial):
    params = {
        'iterations': trial.suggest_int('iterations', 750, 1250),
        # 'n_estimators': trial.suggest_int('n_estimators', 750, 1250),
        'max_depth': trial.suggest_int('max_depth', 2, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3),
        'random_strength': trial.suggest_int('random_strength', 0, 100),
        'bagging_temperature': trial.suggest_float('bagging_temperature', 0.0, 1.0),
        'od_type': trial.suggest_categorical('od_type', ['IncToDec', 'Iter']),
        'od_wait': trial.suggest_int('od_wait', 10, 50),
        'max_bin': trial.suggest_int('max_bin', 50, 100)
    }
    
    clf = CatBoostClassifier(**params, verbose=1)
    
    return cross_val_score(clf, x_train, y_train.values.ravel(), cv=2, scoring='accuracy').mean()

study3 = optuna.create_study(direction='maximize')
study3.optimize(objective, n_trials=50, n_jobs=10)

print('Best trial: score {}, params {}'.format(study3.best_trial.value, study3.best_trial.params))

[I 2024-09-25 17:28:27,487] A new study created in memory with name: no-name-6147c16e-50c1-4b92-a147-63ff7134e57d


0:	learn: 1.0321821	total: 273ms	remaining: 4m 30s
0:	learn: 1.0538078	total: 354ms	remaining: 5m 36s
0:	learn: 1.0424228	total: 298ms	remaining: 5m 36s
0:	learn: 0.9505142	total: 281ms	remaining: 5m 3s
0:	learn: 1.0103642	total: 461ms	remaining: 6m 35s
1:	learn: 0.8787671	total: 327ms	remaining: 2m 56s
1:	learn: 0.9642685	total: 496ms	remaining: 3m 32s
1:	learn: 1.0086351	total: 421ms	remaining: 3m 20s
0:	learn: 1.0569662	total: 20ms	remaining: 16.9s
2:	learn: 0.9814540	total: 443ms	remaining: 2m 20s
1:	learn: 0.9893021	total: 332ms	remaining: 3m 7s
1:	learn: 0.9872549	total: 326ms	remaining: 2m 41s
2:	learn: 0.9251681	total: 549ms	remaining: 2m 36s
0:	learn: 1.0293834	total: 24.1ms	remaining: 18.4s
2:	learn: 0.8382579	total: 384ms	remaining: 2m 17s
1:	learn: 1.0242465	total: 60ms	remaining: 25.3s
2:	learn: 0.9505969	total: 364ms	remaining: 2m
2:	learn: 0.9531456	total: 372ms	remaining: 2m 19s
3:	learn: 0.9463115	total: 491ms	remaining: 1m 56s
1:	learn: 0.9681983	total: 48ms	remaining

[I 2024-09-25 17:29:02,551] Trial 6 finished with value: 0.7792508417508417 and parameters: {'iterations': 764, 'max_depth': 3, 'learning_rate': 0.22191814222752515, 'random_strength': 23, 'bagging_temperature': 0.532719309869351, 'od_type': 'IncToDec', 'od_wait': 27, 'max_bin': 98}. Best is trial 6 with value: 0.7792508417508417.


717:	learn: 0.3650102	total: 33.2s	remaining: 6.53s
663:	learn: 0.4611082	total: 32s	remaining: 26.5s
979:	learn: 0.5271819	total: 32.8s	remaining: 435ms
870:	learn: 0.5198793	total: 33s	remaining: 3.1s
724:	learn: 0.4816406	total: 32.8s	remaining: 18.4s
113:	learn: 0.6815057	total: 3.26s	remaining: 20.9s
740:	learn: 0.3429307	total: 33.1s	remaining: 15.2s
payment_pay monthly, bin=0 score 3.383750483
population, bin=3 score 3.380401821
235:	learn: 0.5994772	total: 33.4s	remaining: 1m 48s
980:	learn: 0.5271207	total: 32.8s	remaining: 401ms
quantity_seasonal, bin=0 score 4.688030456


871:	learn: 0.5198224	total: 33s	remaining: 3.06s
725:	learn: 0.4815051	total: 32.8s	remaining: 18.3s

district_code_1, bin=0 score 3.947620741

funder_concern world wide, bin=0 score 0.6276069387

source_machine dbh, bin=0 score 0.9971122389
subvillage_majengo, bin=0 score 15.58631446
ward_ihanda, bin=0 score 1.054533124
funder_roman catholic, bin=0 score 1.630557336
664:	learn: 0.4609565	total: 32.1s	rema

[I 2024-09-25 17:29:26,952] Trial 5 finished with value: 0.7623106060606061 and parameters: {'iterations': 846, 'max_depth': 2, 'learning_rate': 0.12677143349226025, 'random_strength': 47, 'bagging_temperature': 0.129115390749024, 'od_type': 'IncToDec', 'od_wait': 33, 'max_bin': 59}. Best is trial 6 with value: 0.7792508417508417.


165:	learn: 0.4762819	total: 7.71s	remaining: 42.5s
1141:	learn: 0.4168287	total: 55.8s	remaining: 3.42s
675:	learn: 0.5430290	total: 21.9s	remaining: 10.2s
357:	learn: 0.4271498	total: 16.4s	remaining: 22.9s
166:	learn: 0.4760254	total: 7.8s	remaining: 42.7s
119:	learn: 0.6042792	total: 4.2s	remaining: 35.4s
542:	learn: 0.5417415	total: 19.1s	remaining: 14.4s
0:	learn: 1.0346892	total: 20.3ms	remaining: 22.3s
1142:	learn: 0.4167718	total: 56s	remaining: 3.38s
167:	learn: 0.4754478	total: 7.84s	remaining: 42.6s
394:	learn: 0.5007618	total: 57.7s	remaining: 1m 28s
358:	learn: 0.4268058	total: 16.4s	remaining: 22.8s
1:	learn: 0.9767892	total: 39.3ms	remaining: 21.7s
120:	learn: 0.6039382	total: 4.23s	remaining: 35.3s
676:	learn: 0.5429728	total: 21.9s	remaining: 10.2s
2:	learn: 0.9462202	total: 61.1ms	remaining: 22.4s
121:	learn: 0.6028861	total: 4.25s	remaining: 35.1s
3:	learn: 0.9045097	total: 82.3ms	remaining: 22.6s
677:	learn: 0.5429136	total: 21.9s	remaining: 10.2s
1143:	learn: 0.41

[I 2024-09-25 17:29:37,752] Trial 2 finished with value: 0.7713383838383838 and parameters: {'iterations': 993, 'max_depth': 2, 'learning_rate': 0.2163424617729864, 'random_strength': 23, 'bagging_temperature': 0.35042904234563954, 'od_type': 'IncToDec', 'od_wait': 31, 'max_bin': 70}. Best is trial 6 with value: 0.7792508417508417.


gps_height, bin=12 score 2.467956345
ward_mkula, bin=0 score 0.9242846216
extraction_type_class_submersible, bin=0 score 6.216451983
extraction_type_other - play pump, bin=0 score 3.000537771
construction_year, bin=26 score 1.671841926
lga_kilosa, bin=0 score 3.823728326
460:	learn: 0.4779047	total: 1m 8s	remaining: 1m 20s
source_class_surface, bin=0 score 3.944504228
807:	learn: 0.5188110	total: 29.5s	remaining: 5.29s
ward_nduruma, bin=0 score 2.352625167
construction_year, bin=44 score 2.993892862


125:	learn: 0.5850055	total: 6.37s	remaining: 54.9s
permit, bin=0 score 3.599879119
ward_chimala, bin=0 score 1.20947093
water_quality_fluoride, bin=0 score 2.660011573
503:	learn: 0.5167670	total: 10.4s	remaining: 12.3s

latitude, bin=42 score 4.426489067
waterpoint_type_other, bin=0 score 3.470634743
lga_uyui, bin=0 score 1.656225199

extraction_type_class_handpump, bin=0 score 2.440893059
596:	learn: 0.3756517	total: 26.8s	remaining: 11.8s
lga_ngara, bin=0 score 4.866729106
month_recor

[I 2024-09-25 17:29:44,101] Trial 1 finished with value: 0.7755892255892256 and parameters: {'iterations': 953, 'max_depth': 3, 'learning_rate': 0.13759131342630318, 'random_strength': 60, 'bagging_temperature': 0.6976311800110973, 'od_type': 'Iter', 'od_wait': 27, 'max_bin': 100}. Best is trial 6 with value: 0.7792508417508417.


803:	learn: 0.4871633	total: 16.5s	remaining: 6.15s
733:	learn: 0.3548650	total: 33s	remaining: 5.63s
400:	learn: 0.5525376	total: 15.7s	remaining: 31.1s
538:	learn: 0.3673420	total: 24.6s	remaining: 24.8s
804:	learn: 0.4870843	total: 16.5s	remaining: 6.13s
464:	learn: 0.5068875	total: 20.9s	remaining: 29.9s
401:	learn: 0.5523939	total: 15.8s	remaining: 31.1s
734:	learn: 0.3547059	total: 33.1s	remaining: 5.58s
805:	learn: 0.4869358	total: 16.6s	remaining: 6.11s
248:	learn: 0.5273830	total: 12.7s	remaining: 49.1s
539:	learn: 0.3672410	total: 24.7s	remaining: 24.7s
402:	learn: 0.5523134	total: 15.8s	remaining: 31s
806:	learn: 0.4869234	total: 16.6s	remaining: 6.08s
735:	learn: 0.3546375	total: 33.1s	remaining: 5.54s
249:	learn: 0.5270227	total: 12.8s	remaining: 49.2s
807:	learn: 0.4868051	total: 16.6s	remaining: 6.07s
498:	learn: 0.4664179	total: 1m 14s	remaining: 1m 15s
736:	learn: 0.3544430	total: 33.2s	remaining: 5.49s
540:	learn: 0.3671159	total: 24.7s	remaining: 24.7s
478:	learn: 0.

[I 2024-09-25 17:29:50,038] Trial 3 finished with value: 0.7944654882154882 and parameters: {'iterations': 859, 'max_depth': 7, 'learning_rate': 0.1999892584389942, 'random_strength': 20, 'bagging_temperature': 0.31648844439455914, 'od_type': 'IncToDec', 'od_wait': 47, 'max_bin': 50}. Best is trial 3 with value: 0.7944654882154882.


68:	learn: 0.6469266	total: 11.6s	remaining: 2m 2s
91:	learn: 0.7087107	total: 5.26s	remaining: 51.5s
515:	learn: 0.3011142	total: 1m 20s	remaining: 1m 43s
543:	learn: 0.5354064	total: 22s	remaining: 26.2s
354:	learn: 0.5025308	total: 18.9s	remaining: 45.7s
676:	learn: 0.3455871	total: 30.9s	remaining: 18.4s
587:	learn: 0.4914140	total: 27.1s	remaining: 25s


[I 2024-09-25 17:29:50,635] Trial 10 finished with value: 0.7820496632996633 and parameters: {'iterations': 1103, 'max_depth': 3, 'learning_rate': 0.2020325954420567, 'random_strength': 55, 'bagging_temperature': 0.14796417134907802, 'od_type': 'Iter', 'od_wait': 43, 'max_bin': 80}. Best is trial 3 with value: 0.7944654882154882.


7:	learn: 0.8555637	total: 308ms	remaining: 31.4s
594:	learn: 0.4907429	total: 27.6s	remaining: 24.9s
538:	learn: 0.4556911	total: 1m 21s	remaining: 1m 10s
550:	learn: 0.5347374	total: 22.5s	remaining: 26.3s
8:	learn: 0.8412648	total: 330ms	remaining: 29.9s
551:	learn: 0.5345994	total: 22.5s	remaining: 26.2s
683:	learn: 0.3446408	total: 31.4s	remaining: 18.2s
595:	learn: 0.4905965	total: 27.6s	remaining: 24.8s
362:	learn: 0.5010971	total: 19.5s	remaining: 45.6s
9:	learn: 0.8288471	total: 355ms	remaining: 28.9s
97:	learn: 0.7044760	total: 5.85s	remaining: 53.5s
596:	learn: 0.4905251	total: 27.7s	remaining: 24.7s
552:	learn: 0.5345409	total: 22.6s	remaining: 26.2s
10:	learn: 0.8216125	total: 381ms	remaining: 28.2s
71:	learn: 0.6444449	total: 12.2s	remaining: 2m 3s
597:	learn: 0.4903464	total: 27.7s	remaining: 24.7s
684:	learn: 0.3445773	total: 31.5s	remaining: 18.2s
363:	learn: 0.5008423	total: 19.6s	remaining: 45.6s
11:	learn: 0.8125830	total: 410ms	remaining: 27.7s
98:	learn: 0.7035794

[I 2024-09-25 17:30:08,944] Trial 9 finished with value: 0.7927819865319865 and parameters: {'iterations': 1081, 'max_depth': 7, 'learning_rate': 0.25080291382243264, 'random_strength': 8, 'bagging_temperature': 0.6756941270927622, 'od_type': 'IncToDec', 'od_wait': 17, 'max_bin': 50}. Best is trial 3 with value: 0.7944654882154882.


basin_lake rukwa, bin=0 score 3.282638338

days_since_recorded, bin=42 score 4.124814036
construction_year, bin=41 score 4.950658394
installer_mission, bin=0 score 2.44928514
scheme_management_other, bin=0 score 5.933250556
subvillage_madukani, bin=0 score 2.971496937
extraction_type_class_submersible, bin=0 score 4.924445179
population, bin=15 score 6.009864708
lga_kisarawe, bin=0 score 1.818641517
source_machine dbh, bin=0 score 5.892487691
738:	learn: 0.5112344	total: 18s	remaining: 2.07s

district_code_3, bin=0 score 7.238388197
lga_shinyanga urban, bin=0 score 1.00019327
payment_pay per bucket, bin=0 score 1.800302082
installer_isf, bin=0 score 2.41700478
quantity_insufficient, bin=0 score 3.164845638
458:	learn: 0.5667941	total: 17.2s	remaining: 15.4s
1011:	learn: 0.4992580	total: 40.2s	remaining: 7.23s


scheme_name_maambreni gravity water supply, bin=0 score 1.725160471
funder_government of tanzania, bin=0 score 2.791797657
lga_mpwapwa, bin=0 score 1.921722879
648:	learn: 0.432

[I 2024-09-25 17:30:15,620] Trial 4 finished with value: 0.7860058922558923 and parameters: {'iterations': 1131, 'max_depth': 4, 'learning_rate': 0.1724124533186708, 'random_strength': 43, 'bagging_temperature': 0.6170398667191153, 'od_type': 'IncToDec', 'od_wait': 24, 'max_bin': 57}. Best is trial 3 with value: 0.7944654882154882.


216:	learn: 0.5942848	total: 5.58s	remaining: 19.2s
495:	learn: 0.5422399	total: 30s	remaining: 30.1s
229:	learn: 0.4840032	total: 36.6s	remaining: 1m 29s
111:	learn: 0.6553594	total: 3.1s	remaining: 19.7s
1169:	learn: 0.4900136	total: 46.6s	remaining: 956ms
687:	learn: 0.4250999	total: 1m 45s	remaining: 48.6s
668:	learn: 0.2708952	total: 1m 44s	remaining: 1m 20s
841:	learn: 0.4378819	total: 43.7s	remaining: 19.2s
1170:	learn: 0.4899513	total: 46.6s	remaining: 916ms
642:	learn: 0.5356891	total: 23.7s	remaining: 8.34s
112:	learn: 0.6546008	total: 3.12s	remaining: 19.6s
lga_manyoni, bin=0 score 1.794961208
496:	learn: 0.5420575	total: 30s	remaining: 30s
funder_undp, bin=0 score 2.619820537

subvillage_amani, bin=0 score 3.108962552
extraction_type_class_submersible, bin=0 score 2.924839402
funder_h, bin=0 score 1.373455679

management_group_user-group, bin=0 score 4.33835741

population, bin=44 score 3.582380992
days_since_recorded, bin=19 score 3.696143987
installer_hsw, bin=0 score 1.2

[I 2024-09-25 17:30:32,993] Trial 14 finished with value: 0.7791245791245791 and parameters: {'iterations': 824, 'max_depth': 4, 'learning_rate': 0.10023110807697315, 'random_strength': 13, 'bagging_temperature': 0.22697441701154475, 'od_type': 'Iter', 'od_wait': 32, 'max_bin': 89}. Best is trial 3 with value: 0.7944654882154882.


subvillage_kibaoni, bin=0 score 41.38144663
management_vwc, bin=0 score 3.085530025
ward_ruhembe, bin=0 score 1.738246736
extraction_type_class_handpump, bin=0 score 5.115753184
extraction_type_india mark ii, bin=0 score 2.429967788
376:	learn: 0.5502353	total: 14.8s	remaining: 32.2s
permit, bin=0 score 4.548777437
installer_centr, bin=0 score 3.535335513
lga_mtwara urban, bin=0 score 3.611370802
month_recorded_6, bin=0 score 34.16625088
installer_lga, bin=0 score 1.967436598
963:	learn: 0.4957012	total: 23.1s	remaining: 24ms
386:	learn: 0.5487587	total: 15.3s	remaining: 31.9s
759:	learn: 0.5073512	total: 47.8s	remaining: 14.6s
243:	learn: 0.6439620	total: 9.15s	remaining: 23.4s
964:	learn: 0.4956446	total: 23.1s	remaining: 0us
340:	learn: 0.4384361	total: 54.4s	remaining: 1m 12s
1146:	learn: 0.4106056	total: 1m 1s	remaining: 3.48s
481:	learn: 0.5781606	total: 16.5s	remaining: 15.3s
387:	learn: 0.5486205	total: 15.3s	remaining: 31.8s
771:	learn: 0.2528429	total: 2m 2s	remaining: 1m 5s


[I 2024-09-25 17:30:37,743] Trial 7 finished with value: 0.7913089225589225 and parameters: {'iterations': 1212, 'max_depth': 5, 'learning_rate': 0.16620531444863948, 'random_strength': 100, 'bagging_temperature': 0.6745984494934189, 'od_type': 'Iter', 'od_wait': 14, 'max_bin': 56}. Best is trial 3 with value: 0.7944654882154882.



payment_pay per bucket, bin=0 score 5.139141371
funder_danida, bin=0 score 3.90627735
356:	learn: 0.5887631	total: 13.3s	remaining: 19.1s
ward_zinga/ikerege, bin=0 score 3.000969547
basin_lake victoria, bin=0 score 6.233277031
management_company, bin=0 score 9.718083495
waterpoint_type_improved spring, bin=0 score 4.529238011

region_code_19, bin=0 score 10.45608134
ward_other, bin=0 score 12.4031111
ward_other, bin=0 score 3.347118919
503:	learn: 0.5341813	total: 19.5s	remaining: 26.7s
days_since_recorded, bin=59 score 14.76181911
lga_newala, bin=0 score 2.259370572

extraction_type_nira/tanira, bin=0 score 3.255445719
108:	learn: 0.6677242	total: 2.85s	remaining: 22.4s
management_wua, bin=0 score 1.36561934
latitude, bin=37 score 4.866114438

lga_ilemela, bin=0 score 1.308337619
gps_height, bin=93 score 1.528477812
waterpoint_type_communal standpipe multiple, bin=0 score 2.148858554
population, bin=69 score 4.399156981
region_code_1, bin=0 score 5.766635931
602:	learn: 0.5674967	tot

[I 2024-09-25 17:30:57,242] Trial 15 finished with value: 0.7775252525252525 and parameters: {'iterations': 869, 'max_depth': 6, 'learning_rate': 0.045262792486265684, 'random_strength': 46, 'bagging_temperature': 0.022580886477585493, 'od_type': 'IncToDec', 'od_wait': 42, 'max_bin': 71}. Best is trial 3 with value: 0.7944654882154882.


935:	learn: 0.4986287	total: 38.9s	remaining: 10.7s
871:	learn: 0.4971879	total: 22.3s	remaining: 2.38s
144:	learn: 0.6356245	total: 4.81s	remaining: 26.1s
193:	learn: 0.5380846	total: 23.7s	remaining: 1m 43s
85:	learn: 0.7079059	total: 5.97s	remaining: 1m 3s
895:	learn: 0.2352096	total: 2m 27s	remaining: 47.1s
872:	learn: 0.4971244	total: 22.3s	remaining: 2.35s
70:	learn: 0.7646652	total: 18.7s	remaining: 3m 41s
928:	learn: 0.3886969	total: 2m 27s	remaining: 11.9s
873:	learn: 0.4970480	total: 22.3s	remaining: 2.33s
936:	learn: 0.4985484	total: 39s	remaining: 10.7s
145:	learn: 0.6352815	total: 4.86s	remaining: 26.1s
874:	learn: 0.4969990	total: 22.4s	remaining: 2.3s
937:	learn: 0.4984890	total: 39s	remaining: 10.6s
146:	learn: 0.6348893	total: 4.89s	remaining: 26.1s
875:	learn: 0.4968632	total: 22.4s	remaining: 2.27s
86:	learn: 0.7074322	total: 6.06s	remaining: 1m 3s
147:	learn: 0.6344412	total: 4.91s	remaining: 26s
876:	learn: 0.4968131	total: 22.4s	remaining: 2.25s
938:	learn: 0.4984

[I 2024-09-25 17:31:00,283] Trial 16 finished with value: 0.7796296296296297 and parameters: {'iterations': 965, 'max_depth': 4, 'learning_rate': 0.11134354866754764, 'random_strength': 36, 'bagging_temperature': 0.8011975135581709, 'od_type': 'Iter', 'od_wait': 48, 'max_bin': 89}. Best is trial 3 with value: 0.7944654882154882.


907:	learn: 0.2338269	total: 2m 29s	remaining: 45.2s
118:	learn: 0.6849580	total: 8.36s	remaining: 1m 1s
214:	learn: 0.6136113	total: 7.17s	remaining: 23.8s
12:	learn: 0.6052432	total: 1.97s	remaining: 2m 40s
lga_nkasi, bin=0 score 5.776483997
quantity_dry, bin=0 score 2.712773714
gps_height, bin=5 score 46.7151156

lga_magu, bin=0 score 3.761225142
population, bin=63 score 2.468588172
district_code_6, bin=0 score 0.9054434077
funder_amref, bin=0 score 3.603668665
extraction_type_afridev, bin=0 score 3.367314101
district_code_6, bin=0 score 58.53273762
funder_w.b, bin=0 score 5.365737318
998:	learn: 0.4947948	total: 41.4s	remaining: 8.07s
region_code_14, bin=0 score 45.62607689

latitude, bin=83 score 1.194670737
subvillage_misufini, bin=0 score 84.52871587
region_code_20, bin=0 score 6.595309621
215:	learn: 0.6134686	total: 7.2s	remaining: 23.8s
installer_artisan, bin=0 score 2.162832079
management_other - school, bin=0 score 46.75664514

scheme_name_shallow well, bin=0 score 3.824865

[I 2024-09-25 17:31:08,627] Trial 11 finished with value: 0.7787668350168351 and parameters: {'iterations': 1194, 'max_depth': 3, 'learning_rate': 0.17976281032695623, 'random_strength': 98, 'bagging_temperature': 0.9036428159089007, 'od_type': 'Iter', 'od_wait': 16, 'max_bin': 62}. Best is trial 3 with value: 0.7944654882154882.


funder_world vision, bin=0 score 5.612094285
quantity_insufficient, bin=0 score 1.484157926
458:	learn: 0.5760164	total: 15.1s	remaining: 15.5s
region_code_5, bin=0 score 10.04473
month_recorded_7, bin=0 score 108.3004063
241:	learn: 0.6155788	total: 16.4s	remaining: 51s
longitude, bin=12 score 6.886901488

521:	learn: 0.3909864	total: 1m 29s	remaining: 46.8s
459:	learn: 0.5758563	total: 15.1s	remaining: 15.5s
276:	learn: 0.4960454	total: 34.3s	remaining: 1m 34s
47:	learn: 0.5430471	total: 7.06s	remaining: 2m 31s
460:	learn: 0.5757530	total: 15.2s	remaining: 15.4s
74:	learn: 0.4723349	total: 10.2s	remaining: 2m 15s
461:	learn: 0.5756792	total: 15.2s	remaining: 15.4s
242:	learn: 0.6150247	total: 16.5s	remaining: 50.8s
949:	learn: 0.2286227	total: 2m 37s	remaining: 38.7s
462:	learn: 0.5756086	total: 15.2s	remaining: 15.4s
981:	learn: 0.3815806	total: 2m 38s	remaining: 3.55s
463:	learn: 0.5755271	total: 15.2s	remaining: 15.3s
277:	learn: 0.4957037	total: 34.4s	remaining: 1m 33s
522:	learn

[I 2024-09-25 17:31:24,507] Trial 17 finished with value: 0.767824074074074 and parameters: {'iterations': 930, 'max_depth': 2, 'learning_rate': 0.15990277605763975, 'random_strength': 38, 'bagging_temperature': 0.3823489718402179, 'od_type': 'IncToDec', 'od_wait': 50, 'max_bin': 65}. Best is trial 3 with value: 0.7944654882154882.


168:	learn: 0.4118625	total: 22.8s	remaining: 2m 2s
extraction_type_nira/tanira, bin=0 score 6.716726978
lga_magu, bin=0 score 5.844278961

latitude, bin=16 score 5.098184018
ward_maji ya chai, bin=0 score 1.088277085
scheme_management_vwc, bin=0 score 6.698449518
population, bin=63 score 4.455108626
payment_other, bin=0 score 114.2813025
485:	learn: 0.5381796	total: 31.9s	remaining: 33.3s
subvillage_bondeni, bin=0 score 1.462999126
installer_other, bin=0 score 5.756577539

construction_year, bin=18 score 2.178275763
funder_lga, bin=0 score 1.811409626
construction_year, bin=52 score 2.762355002
installer_isf, bin=0 score 3.791381041
waterpoint_type_hand pump, bin=0 score 5.198931147
payment_type_annually, bin=0 score 4.330271484
quantity_enough, bin=0 score 6.548925737
installer_rwe, bin=0 score 2.54883808
payment_pay monthly, bin=0 score 4.953305614
extraction_type_class_submersible, bin=0 score 3.127465787
lga_serengeti, bin=0 score 154.7912915
construction_year, bin=44 score 2.9333

[I 2024-09-25 17:31:55,721] Trial 13 finished with value: 0.782260101010101 and parameters: {'iterations': 993, 'max_depth': 6, 'learning_rate': 0.056253454796460885, 'random_strength': 42, 'bagging_temperature': 0.6206818051821733, 'od_type': 'IncToDec', 'od_wait': 21, 'max_bin': 57}. Best is trial 3 with value: 0.7944654882154882.


longitude, bin=20 score 1.946104784
source_shallow well, bin=0 score 3.473516305
longitude, bin=53 score 0.83175544
population, bin=66 score 72.09102773
228:	learn: 0.3725178	total: 45.6s	remaining: 2m 46s
basin_wami / ruvu, bin=0 score 1.200637634

quality_group_unknown, bin=0 score 35.70073557
extraction_type_windmill, bin=0 score 2.265847587
lga_karagwe, bin=0 score 3.967973461
funder_other, bin=0 score 6.579327573
761:	learn: 0.3470410	total: 2m 15s	remaining: 6.06s
139:	learn: 0.4272127	total: 30s	remaining: 3m 18s
425:	learn: 0.3136302	total: 57s	remaining: 1m 26s
396:	learn: 0.3143202	total: 54s	remaining: 1m 32s
620:	learn: 0.4063243	total: 1m 21s	remaining: 54.3s
207:	learn: 0.6060028	total: 41s	remaining: 2m 36s
1179:	learn: 0.2041266	total: 3m 24s	remaining: 520ms
229:	learn: 0.3721433	total: 45.8s	remaining: 2m 46s
762:	learn: 0.3469286	total: 2m 16s	remaining: 5.89s
306:	learn: 0.6187146	total: 1m 16s	remaining: 2m 30s
426:	learn: 0.3133291	total: 57.1s	remaining: 1m 25s
3

[I 2024-09-25 17:34:40,620] Trial 0 finished with value: 0.7950126262626263 and parameters: {'iterations': 1004, 'max_depth': 10, 'learning_rate': 0.05040885439946374, 'random_strength': 84, 'bagging_temperature': 0.13173456172805698, 'od_type': 'Iter', 'od_wait': 23, 'max_bin': 95}. Best is trial 0 with value: 0.7950126262626263.


734:	learn: 0.3425854	total: 2m 36s	remaining: 13s
1044:	learn: 0.2049172	total: 3m 29s	remaining: 4.01s
766:	learn: 0.2484146	total: 2m 41s	remaining: 1m 27s
467:	learn: 0.2954041	total: 1m 11s	remaining: 1m 31s
419:	learn: 0.3043058	total: 1m 7s	remaining: 1m 44s
654:	learn: 0.3929649	total: 1m 41s	remaining: 59s
952:	learn: 0.2158100	total: 3m 13s	remaining: 22.8s
23:	learn: 0.8776760	total: 6.36s	remaining: 3m 55s
1045:	learn: 0.2048461	total: 3m 29s	remaining: 3.81s
806:	learn: 0.2314585	total: 2m 42s	remaining: 53.8s
735:	learn: 0.3424118	total: 2m 36s	remaining: 12.7s
468:	learn: 0.2951324	total: 1m 11s	remaining: 1m 31s
420:	learn: 0.3040104	total: 1m 7s	remaining: 1m 44s
767:	learn: 0.2482646	total: 2m 42s	remaining: 1m 27s
655:	learn: 0.3927853	total: 1m 41s	remaining: 58.8s
953:	learn: 0.2157303	total: 3m 14s	remaining: 22.6s
807:	learn: 0.2314141	total: 2m 42s	remaining: 53.6s
1046:	learn: 0.2048159	total: 3m 30s	remaining: 3.61s
24:	learn: 0.8738545	total: 6.6s	remaining: 

[I 2024-09-25 17:34:52,858] Trial 12 finished with value: 0.7956860269360269 and parameters: {'iterations': 796, 'max_depth': 9, 'learning_rate': 0.11445497411028978, 'random_strength': 36, 'bagging_temperature': 0.5293318992859163, 'od_type': 'IncToDec', 'od_wait': 20, 'max_bin': 73}. Best is trial 12 with value: 0.7956860269360269.


732:	learn: 0.3806202	total: 1m 53s	remaining: 46.9s
76:	learn: 0.7524967	total: 18.5s	remaining: 3m 20s
33:	learn: 0.5298291	total: 7.25s	remaining: 3m 39s
51:	learn: 0.5618185	total: 11.1s	remaining: 3m 33s
869:	learn: 0.2229420	total: 2m 54s	remaining: 41s
828:	learn: 0.2397958	total: 2m 54s	remaining: 1m 14s
546:	learn: 0.2785152	total: 1m 23s	remaining: 1m 19s
499:	learn: 0.2826394	total: 1m 19s	remaining: 1m 31s
1017:	learn: 0.2097874	total: 3m 26s	remaining: 9.52s
733:	learn: 0.3805550	total: 1m 53s	remaining: 46.8s
547:	learn: 0.2782739	total: 1m 23s	remaining: 1m 19s
52:	learn: 0.5587166	total: 11.3s	remaining: 3m 32s
34:	learn: 0.5281028	total: 7.48s	remaining: 3m 40s
77:	learn: 0.7518250	total: 18.7s	remaining: 3m 20s
870:	learn: 0.2227867	total: 2m 54s	remaining: 40.8s
829:	learn: 0.2397102	total: 2m 54s	remaining: 1m 14s
500:	learn: 0.2823690	total: 1m 19s	remaining: 1m 31s
734:	learn: 0.3804685	total: 1m 53s	remaining: 46.6s
548:	learn: 0.2781179	total: 1m 23s	remaining: 

[I 2024-09-25 17:35:42,738] Trial 18 finished with value: 0.7952651515151514 and parameters: {'iterations': 1036, 'max_depth': 9, 'learning_rate': 0.08240943499628922, 'random_strength': 14, 'bagging_temperature': 0.5898632399160075, 'od_type': 'IncToDec', 'od_wait': 50, 'max_bin': 95}. Best is trial 12 with value: 0.7956860269360269.


month_recorded_12, bin=0 score 2.94591118
quantity_enough, bin=0 score 1.638243425
extraction_type_class_handpump, bin=0 score 2.302887659
installer_other, bin=0 score 5.81350844
payment_pay when scheme fails, bin=0 score 2.615449628
extraction_type_class_wind-powered, bin=0 score 85.65355988
funder_government of tanzania, bin=0 score 2.547453296
water_quality_salty abandoned, bin=0 score 2.426921176
gps_height, bin=21 score 3.467020947
population, bin=35 score 7.071150318
payment_never pay, bin=0 score 3.3044357
816:	learn: 0.2267270	total: 2m 8s	remaining: 40.9s

management_parastatal, bin=0 score 123.0708438
waterpoint_type_communal standpipe, bin=0 score 3.742093977
funder_tassaf, bin=0 score 0.5792242088
extraction_type_nira/tanira, bin=0 score 2.922103909
ward_other, bin=0 score 3.791237138
latitude, bin=0 score 0.9244556987
population, bin=41 score 55.84178
lga_sengerema, bin=0 score 3.274629931
ward_ihanda, bin=0 score 96.09808613
quantity_enough, bin=0 score 5.159750772
popula

[I 2024-09-25 17:36:05,599] Trial 8 finished with value: 0.7936868686868687 and parameters: {'iterations': 1183, 'max_depth': 9, 'learning_rate': 0.24253837995229488, 'random_strength': 10, 'bagging_temperature': 0.4802248236552292, 'od_type': 'IncToDec', 'od_wait': 47, 'max_bin': 94}. Best is trial 12 with value: 0.7956860269360269.


district_code_3, bin=0 score 1.063651441
scheme_name_ngamanga water supplied sch, bin=0 score 55.45635428
latitude, bin=3 score 1.898179631
installer_water board, bin=0 score 1.580509936
days_since_recorded, bin=44 score 50.83678946
installer_ddca, bin=0 score 1.282978139
funder_unicef, bin=0 score 114.9841695
source_rainwater harvesting, bin=0 score 2.335677132
population, bin=42 score 3.965909216
population, bin=34 score 2.121370396
source_class_surface, bin=0 score 23.39238328
gps_height, bin=0 score 71.52139225
funder_district council, bin=0 score 1.601864334
latitude, bin=16 score 75.06089478
385:	learn: 0.2765459	total: 1m 23s	remaining: 2m 24s
1000:	learn: 0.2148474	total: 2m 35s	remaining: 10.7s
962:	learn: 0.2084393	total: 2m 31s	remaining: 17.9s
95:	learn: 0.6407421	total: 21.6s	remaining: 3m 40s
373:	learn: 0.5923119	total: 1m 31s	remaining: 2m 11s
392:	learn: 0.3087430	total: 1m 19s	remaining: 2m 16s
96:	learn: 0.6403676	total: 21.7s	remaining: 3m 38s
296:	learn: 0.3383241	

[I 2024-09-25 17:36:17,220] Trial 20 finished with value: 0.7914772727272728 and parameters: {'iterations': 1070, 'max_depth': 9, 'learning_rate': 0.2836576373684275, 'random_strength': 1, 'bagging_temperature': 0.9631878934538346, 'od_type': 'IncToDec', 'od_wait': 10, 'max_bin': 50}. Best is trial 12 with value: 0.7956860269360269.


437:	learn: 0.2612094	total: 1m 35s	remaining: 2m 14s
448:	learn: 0.2935203	total: 1m 31s	remaining: 2m 5s
152:	learn: 0.5833489	total: 33.8s	remaining: 3m 22s
352:	learn: 0.3203157	total: 1m 13s	remaining: 2m 29s
48:	learn: 0.9266754	total: 10.9s	remaining: 3m 43s
1035:	learn: 0.2013861	total: 2m 43s	remaining: 6.48s
203:	learn: 0.3744494	total: 43.7s	remaining: 3m 6s
323:	learn: 0.6617929	total: 1m 23s	remaining: 3m 5s
420:	learn: 0.5755270	total: 1m 43s	remaining: 2m
449:	learn: 0.2933190	total: 1m 32s	remaining: 2m 5s
438:	learn: 0.2610706	total: 1m 35s	remaining: 2m 14s
1036:	learn: 0.2013326	total: 2m 43s	remaining: 6.32s
153:	learn: 0.5823386	total: 34s	remaining: 3m 22s
353:	learn: 0.3200441	total: 1m 14s	remaining: 2m 28s
49:	learn: 0.9248869	total: 11.1s	remaining: 3m 44s
204:	learn: 0.3737557	total: 43.9s	remaining: 3m 6s
324:	learn: 0.6613277	total: 1m 23s	remaining: 3m 5s
421:	learn: 0.5749881	total: 1m 43s	remaining: 2m
1037:	learn: 0.2012182	total: 2m 44s	remaining: 6.17

[I 2024-09-25 17:36:25,142] Trial 21 finished with value: 0.7927609427609428 and parameters: {'iterations': 1077, 'max_depth': 9, 'learning_rate': 0.2811847072934104, 'random_strength': 4, 'bagging_temperature': 0.3924656933540206, 'od_type': 'IncToDec', 'od_wait': 10, 'max_bin': 50}. Best is trial 12 with value: 0.7956860269360269.


47:	learn: 0.7036062	total: 6.57s	remaining: 1m 36s
83:	learn: 0.8615501	total: 18.2s	remaining: 3m 30s
237:	learn: 0.3587157	total: 50.9s	remaining: 2m 58s
483:	learn: 0.2860561	total: 1m 39s	remaining: 1m 59s
470:	learn: 0.2530918	total: 1m 43s	remaining: 2m 7s
184:	learn: 0.5446839	total: 41.3s	remaining: 3m 18s
448:	learn: 0.5611076	total: 1m 50s	remaining: 1m 54s
48:	learn: 0.7002634	total: 6.7s	remaining: 1m 36s
388:	learn: 0.3096966	total: 1m 21s	remaining: 2m 21s
354:	learn: 0.6536998	total: 1m 30s	remaining: 2m 56s
84:	learn: 0.8603986	total: 18.3s	remaining: 3m 29s
471:	learn: 0.2530384	total: 1m 43s	remaining: 2m 7s
238:	learn: 0.3581686	total: 51.1s	remaining: 2m 58s
484:	learn: 0.2858869	total: 1m 39s	remaining: 1m 58s
49:	learn: 0.6996438	total: 6.78s	remaining: 1m 35s
185:	learn: 0.5441204	total: 41.5s	remaining: 3m 17s
485:	learn: 0.2856826	total: 1m 39s	remaining: 1m 58s
449:	learn: 0.5605565	total: 1m 50s	remaining: 1m 54s
50:	learn: 0.6963952	total: 6.96s	remaining: 

[I 2024-09-25 17:38:27,275] Trial 19 finished with value: 0.7857323232323232 and parameters: {'iterations': 913, 'max_depth': 10, 'learning_rate': 0.03143782007192593, 'random_strength': 80, 'bagging_temperature': 0.9828760475009949, 'od_type': 'IncToDec', 'od_wait': 50, 'max_bin': 69}. Best is trial 12 with value: 0.7956860269360269.


quantity_enough, bin=0 score 11.89301367
funder_tcrs, bin=0 score 1.300990048

month_recorded_3, bin=0 score 7.770135249
489:	learn: 0.4178428	total: 2m	remaining: 2m 17s
scheme_name_s, bin=0 score 1.07736751

days_since_recorded, bin=24 score 1.560653557
construction_year, bin=46 score 4.986199294
ward_itete, bin=0 score 0.6294700256
installer_kkkt, bin=0 score 1.150133033
674:	learn: 0.6479885	total: 2m 19s	remaining: 1m 18s
710:	learn: 0.3548575	total: 2m 42s	remaining: 1m 22s
189:	learn: 0.5629176	total: 26.1s	remaining: 1m 17s
814:	learn: 0.5625144	total: 3m 32s	remaining: 59.4s
997:	learn: 0.1719367	total: 3m 44s	remaining: 12.4s
822:	learn: 0.2253929	total: 2m 52s	remaining: 52.6s
1061:	learn: 0.1996587	total: 3m 40s	remaining: 624ms
970:	learn: 0.2076636	total: 3m 22s	remaining: 19.6s
190:	learn: 0.5617239	total: 26.2s	remaining: 1m 17s
490:	learn: 0.4175958	total: 2m	remaining: 2m 17s
711:	learn: 0.3546910	total: 2m 43s	remaining: 1m 22s
675:	learn: 0.6479179	total: 2m 19s	rem

[I 2024-09-25 17:38:28,549] Trial 22 finished with value: 0.7928240740740741 and parameters: {'iterations': 1065, 'max_depth': 9, 'learning_rate': 0.27733566977579105, 'random_strength': 1, 'bagging_temperature': 0.4097059713807487, 'od_type': 'IncToDec', 'od_wait': 20, 'max_bin': 51}. Best is trial 12 with value: 0.7956860269360269.


198:	learn: 0.5552839	total: 27.3s	remaining: 1m 16s
495:	learn: 0.4165014	total: 2m 1s	remaining: 2m 15s
716:	learn: 0.3538091	total: 2m 44s	remaining: 1m 21s
975:	learn: 0.2070871	total: 3m 23s	remaining: 18.6s
828:	learn: 0.2242618	total: 2m 53s	remaining: 51.4s
1003:	learn: 0.1711986	total: 3m 45s	remaining: 11s
682:	learn: 0.6473528	total: 2m 21s	remaining: 1m 17s
199:	learn: 0.5547259	total: 27.4s	remaining: 1m 16s
819:	learn: 0.5608546	total: 3m 33s	remaining: 58.1s
496:	learn: 0.4163855	total: 2m 1s	remaining: 2m 15s
1004:	learn: 0.1710790	total: 3m 46s	remaining: 10.8s
717:	learn: 0.3536816	total: 2m 44s	remaining: 1m 21s
829:	learn: 0.2242261	total: 2m 53s	remaining: 51.1s
683:	learn: 0.6473184	total: 2m 21s	remaining: 1m 16s
200:	learn: 0.5539397	total: 27.6s	remaining: 1m 16s
976:	learn: 0.2069417	total: 3m 24s	remaining: 18.4s
201:	learn: 0.5535630	total: 27.7s	remaining: 1m 16s
718:	learn: 0.3536178	total: 2m 44s	remaining: 1m 21s
1005:	learn: 0.1710197	total: 3m 46s	rema

[I 2024-09-25 17:38:48,369] Trial 23 finished with value: 0.7911195286195286 and parameters: {'iterations': 1065, 'max_depth': 9, 'learning_rate': 0.28067817399328493, 'random_strength': 3, 'bagging_temperature': 0.45737372359167283, 'od_type': 'IncToDec', 'od_wait': 10, 'max_bin': 52}. Best is trial 12 with value: 0.7956860269360269.


349:	learn: 0.4887056	total: 46.9s	remaining: 54.4s
73:	learn: 0.6749411	total: 19.5s	remaining: 4m 10s
579:	learn: 0.3968148	total: 2m 21s	remaining: 1m 54s
890:	learn: 0.5429740	total: 3m 53s	remaining: 39.8s
778:	learn: 0.6350843	total: 2m 40s	remaining: 57.1s
25:	learn: 0.6256116	total: 6.83s	remaining: 4m 29s
350:	learn: 0.4885296	total: 47s	remaining: 54.3s
922:	learn: 0.2136368	total: 3m 13s	remaining: 31.7s
802:	learn: 0.3397192	total: 3m 3s	remaining: 1m 1s
351:	learn: 0.4882345	total: 47.2s	remaining: 54.1s
63:	learn: 0.6836456	total: 18.6s	remaining: 4m 41s
580:	learn: 0.3966356	total: 2m 21s	remaining: 1m 54s
26:	learn: 0.6230805	total: 7.08s	remaining: 4m 28s
74:	learn: 0.6743884	total: 19.8s	remaining: 4m 10s
891:	learn: 0.5427462	total: 3m 53s	remaining: 39.5s
779:	learn: 0.6349837	total: 2m 40s	remaining: 57s
803:	learn: 0.3393026	total: 3m 4s	remaining: 1m 1s
352:	learn: 0.4880884	total: 47.3s	remaining: 54s
923:	learn: 0.2135811	total: 3m 13s	remaining: 31.5s
64:	lear

[I 2024-09-25 17:39:22,993] Trial 24 finished with value: 0.7913089225589225 and parameters: {'iterations': 1074, 'max_depth': 9, 'learning_rate': 0.2830526225587729, 'random_strength': 1, 'bagging_temperature': 0.4562563432074875, 'od_type': 'IncToDec', 'od_wait': 10, 'max_bin': 50}. Best is trial 12 with value: 0.7956860269360269.


945:	learn: 0.3187730	total: 3m 38s	remaining: 29.3s
605:	learn: 0.4345718	total: 1m 21s	remaining: 20.1s
163:	learn: 0.3752537	total: 41.2s	remaining: 3m 43s
199:	learn: 0.5517043	total: 54.1s	remaining: 3m 42s
190:	learn: 0.5500537	total: 52.9s	remaining: 3m 52s
1016:	learn: 0.5189370	total: 4m 27s	remaining: 6.84s
939:	learn: 0.6179914	total: 3m 15s	remaining: 24.1s
114:	learn: 0.6324226	total: 33.2s	remaining: 4m 19s
723:	learn: 0.3687114	total: 2m 55s	remaining: 1m 19s
606:	learn: 0.4343766	total: 1m 21s	remaining: 20s
724:	learn: 0.3685836	total: 2m 56s	remaining: 1m 18s
164:	learn: 0.3745267	total: 41.4s	remaining: 3m 43s
940:	learn: 0.6177391	total: 3m 15s	remaining: 23.9s
607:	learn: 0.4341793	total: 1m 21s	remaining: 19.9s
200:	learn: 0.5506060	total: 54.3s	remaining: 3m 42s
946:	learn: 0.3186173	total: 3m 38s	remaining: 29.1s
1017:	learn: 0.5187090	total: 4m 28s	remaining: 6.58s
191:	learn: 0.5489509	total: 53.2s	remaining: 3m 52s
115:	learn: 0.6320574	total: 33.6s	remaining

[I 2024-09-25 17:39:44,744] Trial 29 finished with value: 0.7909301346801347 and parameters: {'iterations': 756, 'max_depth': 8, 'learning_rate': 0.07963797933908605, 'random_strength': 32, 'bagging_temperature': 0.5152181530311706, 'od_type': 'IncToDec', 'od_wait': 37, 'max_bin': 81}. Best is trial 12 with value: 0.7956860269360269.


274:	learn: 0.4957926	total: 1m 16s	remaining: 3m 27s
1037:	learn: 0.3068262	total: 4m	remaining: 8.11s
1037:	learn: 0.6065053	total: 3m 37s	remaining: 3.77s
71:	learn: 0.6718902	total: 21s	remaining: 4m 37s
265:	learn: 0.4931098	total: 1m 15s	remaining: 3m 36s
252:	learn: 0.3239473	total: 1m 3s	remaining: 3m 20s
814:	learn: 0.3541089	total: 3m 18s	remaining: 57.1s
38:	learn: 0.8985314	total: 12.1s	remaining: 5m 11s
191:	learn: 0.5551953	total: 55.6s	remaining: 3m 58s
1038:	learn: 0.3067266	total: 4m	remaining: 7.88s
275:	learn: 0.4954001	total: 1m 16s	remaining: 3m 27s
1038:	learn: 0.6064416	total: 3m 37s	remaining: 3.56s
39:	learn: 0.8954264	total: 12.2s	remaining: 5m 7s
72:	learn: 0.6706166	total: 21.3s	remaining: 4m 37s
253:	learn: 0.3232945	total: 1m 3s	remaining: 3m 20s
815:	learn: 0.3540472	total: 3m 18s	remaining: 56.9s
266:	learn: 0.4926246	total: 1m 15s	remaining: 3m 35s
1039:	learn: 0.3066005	total: 4m 1s	remaining: 7.65s
192:	learn: 0.5540534	total: 55.9s	remaining: 3m 57s


[I 2024-09-25 17:43:20,689] Trial 25 finished with value: 0.7896675084175084 and parameters: {'iterations': 1053, 'max_depth': 10, 'learning_rate': 0.2917065191815675, 'random_strength': 77, 'bagging_temperature': 0.3271509975831832, 'od_type': 'Iter', 'od_wait': 10, 'max_bin': 81}. Best is trial 12 with value: 0.7956860269360269.


funder_other, bin=0 score 5.138002628
latitude, bin=36 score 2.788200255
extraction_type_class_gravity, bin=0 score 5.469981168
construction_year, bin=51 score 2.279351938
841:	learn: 0.5466227	total: 3m 34s	remaining: 47s
734:	learn: 0.3442571	total: 3m 23s	remaining: 1m 33s
958:	learn: 0.3262055	total: 4m 49s	remaining: 21.5s
528:	learn: 0.3986316	total: 2m 34s	remaining: 2m 32s
898:	learn: 0.6176214	total: 3m 28s	remaining: 36.4s
775:	learn: 0.3513705	total: 3m 55s	remaining: 1m 14s
887:	learn: 0.3383337	total: 4m 30s	remaining: 38.3s
768:	learn: 0.5683326	total: 3m 46s	remaining: 1m 20s
1002:	learn: 0.3262935	total: 4m 51s	remaining: 6.38s
842:	learn: 0.5464483	total: 3m 34s	remaining: 46.8s
735:	learn: 0.3440712	total: 3m 23s	remaining: 1m 33s
899:	learn: 0.6175643	total: 3m 28s	remaining: 36.2s
776:	learn: 0.3512749	total: 3m 56s	remaining: 1m 14s
529:	learn: 0.3982991	total: 2m 34s	remaining: 2m 31s
959:	learn: 0.3260799	total: 4m 50s	remaining: 21.2s
888:	learn: 0.3382017	total

[I 2024-09-25 17:43:56,926] Trial 28 finished with value: 0.748169191919192 and parameters: {'iterations': 1056, 'max_depth': 9, 'learning_rate': 0.011566409321786791, 'random_strength': 70, 'bagging_temperature': 0.5216002036339797, 'od_type': 'IncToDec', 'od_wait': 37, 'max_bin': 78}. Best is trial 12 with value: 0.7956860269360269.


895:	learn: 0.3335114	total: 4m 31s	remaining: 38.2s
39:	learn: 0.7196231	total: 12.3s	remaining: 5m 3s
1013:	learn: 0.3216752	total: 5m 6s	remaining: 0us
224:	learn: 0.5929582	total: 34.8s	remaining: 2m 3s
977:	learn: 0.5193361	total: 4m 10s	remaining: 12.5s
889:	learn: 0.5345889	total: 4m 22s	remaining: 45.2s
661:	learn: 0.3707055	total: 3m 10s	remaining: 1m 51s
871:	learn: 0.3227154	total: 3m 59s	remaining: 55.3s
88:	learn: 0.6545854	total: 26.5s	remaining: 4m 39s
225:	learn: 0.5918610	total: 34.9s	remaining: 2m 2s
896:	learn: 0.3333723	total: 4m 32s	remaining: 37.9s
40:	learn: 0.7181556	total: 12.6s	remaining: 5m 4s
978:	learn: 0.5192161	total: 4m 10s	remaining: 12.3s
890:	learn: 0.5344243	total: 4m 23s	remaining: 44.9s
662:	learn: 0.3704817	total: 3m 10s	remaining: 1m 51s
872:	learn: 0.3225947	total: 4m	remaining: 55s
226:	learn: 0.5908783	total: 35.1s	remaining: 2m 2s
89:	learn: 0.6510334	total: 26.8s	remaining: 4m 38s
227:	learn: 0.5901735	total: 35.3s	remaining: 2m 2s
897:	lear

[I 2024-09-25 17:44:43,423] Trial 26 finished with value: 0.7758838383838383 and parameters: {'iterations': 1043, 'max_depth': 10, 'learning_rate': 0.01770353923439543, 'random_strength': 77, 'bagging_temperature': 0.4418667230093568, 'od_type': 'Iter', 'od_wait': 12, 'max_bin': 81}. Best is trial 12 with value: 0.7956860269360269.


115:	learn: 0.7624957	total: 30.5s	remaining: 3m 59s
839:	learn: 0.3414060	total: 3m 56s	remaining: 59.2s
1051:	learn: 0.2997908	total: 4m 45s	remaining: 5.71s
256:	learn: 0.4947579	total: 1m 12s	remaining: 3m 36s
208:	learn: 0.5299887	total: 58.5s	remaining: 3m 49s
536:	learn: 0.4815084	total: 1m 20s	remaining: 1m 12s
153:	learn: 0.5933558	total: 43.2s	remaining: 4m 1s
25:	learn: 0.7472259	total: 8.12s	remaining: 5m 11s
261:	learn: 0.5345661	total: 44.9s	remaining: 2m 9s
537:	learn: 0.4812485	total: 1m 20s	remaining: 1m 12s
1052:	learn: 0.2997177	total: 4m 46s	remaining: 5.43s
840:	learn: 0.3411610	total: 3m 56s	remaining: 58.9s
116:	learn: 0.7618447	total: 30.8s	remaining: 3m 59s
262:	learn: 0.5340120	total: 45.1s	remaining: 2m 9s
257:	learn: 0.4943038	total: 1m 12s	remaining: 3m 36s
209:	learn: 0.5290099	total: 58.8s	remaining: 3m 49s
154:	learn: 0.5914827	total: 43.5s	remaining: 4m
538:	learn: 0.4808555	total: 1m 21s	remaining: 1m 12s
26:	learn: 0.7433309	total: 8.48s	remaining: 5m

[I 2024-09-25 17:44:50,129] Trial 27 finished with value: 0.7976641414141414 and parameters: {'iterations': 1073, 'max_depth': 10, 'learning_rate': 0.084778945186221, 'random_strength': 69, 'bagging_temperature': 0.4717755636425249, 'od_type': 'IncToDec', 'od_wait': 37, 'max_bin': 80}. Best is trial 27 with value: 0.7976641414141414.


234:	learn: 0.5088463	total: 1m 5s	remaining: 3m 42s
587:	learn: 0.4721335	total: 1m 28s	remaining: 1m 4s
179:	learn: 0.5630088	total: 50.5s	remaining: 3m 53s
146:	learn: 0.7359956	total: 37.9s	remaining: 3m 46s
869:	learn: 0.3370457	total: 4m 4s	remaining: 50.5s
282:	learn: 0.4791329	total: 1m 19s	remaining: 3m 29s
53:	learn: 0.6933275	total: 15.5s	remaining: 4m 37s
33:	learn: 0.7561389	total: 6.07s	remaining: 3m 17s
305:	learn: 0.5166307	total: 52.2s	remaining: 2m 1s
588:	learn: 0.4719993	total: 1m 28s	remaining: 1m 4s
870:	learn: 0.3368535	total: 4m 4s	remaining: 50.2s
180:	learn: 0.5618047	total: 50.7s	remaining: 3m 53s
235:	learn: 0.5083777	total: 1m 6s	remaining: 3m 42s
306:	learn: 0.5162341	total: 52.4s	remaining: 2m
34:	learn: 0.7492286	total: 6.3s	remaining: 3m 19s
147:	learn: 0.7352163	total: 38.2s	remaining: 3m 46s
589:	learn: 0.4718307	total: 1m 28s	remaining: 1m 4s
283:	learn: 0.4785433	total: 1m 20s	remaining: 3m 29s
54:	learn: 0.6910960	total: 15.8s	remaining: 4m 37s
35:

[I 2024-09-25 17:45:34,104] Trial 30 finished with value: 0.7954545454545454 and parameters: {'iterations': 1050, 'max_depth': 10, 'learning_rate': 0.07563544768430439, 'random_strength': 71, 'bagging_temperature': 0.5112068331879371, 'od_type': 'Iter', 'od_wait': 38, 'max_bin': 81}. Best is trial 27 with value: 0.7976641414141414.


gps_height, bin=15 score 104.1909589
management_company, bin=0 score 4.552850138
population, bin=22 score 4.688332598
payment_pay when scheme fails, bin=0 score 7.419797949
permit, bin=0 score 8.217483374
funder_wsdp, bin=0 score 1.099924984
longitude, bin=15 score 7.686176764
scheme_name_upper ruvu, bin=0 score 1.788893649
273:	learn: 0.5410214	total: 42.4s	remaining: 2m 15s

latitude, bin=79 score 2.229400842
890:	learn: 0.4282692	total: 2m 11s	remaining: 19s

lga_pangani, bin=0 score 2.174233829
funder_hifab, bin=0 score 101.4769904
ward_bumera, bin=0 score 1.128813433
ward_nkoma, bin=0 score 2.688975545
lga_muleba, bin=0 score 2.974458714
payment_other, bin=0 score 1.945179199
extraction_type_gravity, bin=0 score 9.166683364
construction_year, bin=36 score 7.027665483
578:	learn: 0.4464825	total: 1m 35s	remaining: 1m 11s
subvillage_songambele, bin=0 score 2.442079729
extraction_type_india mark iii, bin=0 score 3.051463511
district_code_3, bin=0 score 8.081051346
scheme_name_n, bin=

[I 2024-09-25 17:48:06,099] Trial 31 finished with value: 0.7960858585858586 and parameters: {'iterations': 1025, 'max_depth': 10, 'learning_rate': 0.07589916803206027, 'random_strength': 65, 'bagging_temperature': 0.49178180791949977, 'od_type': 'Iter', 'od_wait': 36, 'max_bin': 80}. Best is trial 27 with value: 0.7976641414141414.


171:	learn: 0.6079768	total: 25.6s	remaining: 2m 25s
962:	learn: 0.5129762	total: 3m 52s	remaining: 15.5s
887:	learn: 0.4206589	total: 2m 13s	remaining: 19.8s
942:	learn: 0.3209960	total: 4m 20s	remaining: 24s
894:	learn: 0.3299526	total: 4m 5s	remaining: 32.6s
466:	learn: 0.4640357	total: 1m 20s	remaining: 1m 34s
81:	learn: 0.6698125	total: 14.2s	remaining: 3m 4s
601:	learn: 0.3844069	total: 2m 30s	remaining: 2m 10s
172:	learn: 0.6077350	total: 25.7s	remaining: 2m 25s
775:	learn: 0.3469996	total: 3m 30s	remaining: 1m 6s
888:	learn: 0.4205692	total: 2m 13s	remaining: 19.7s
963:	learn: 0.5127733	total: 3m 52s	remaining: 15.2s
173:	learn: 0.6071112	total: 25.9s	remaining: 2m 25s
467:	learn: 0.4639220	total: 1m 20s	remaining: 1m 34s
82:	learn: 0.6690581	total: 14.5s	remaining: 3m 5s
889:	learn: 0.4204942	total: 2m 13s	remaining: 19.5s
943:	learn: 0.3208146	total: 4m 20s	remaining: 23.8s
602:	learn: 0.3841065	total: 2m 30s	remaining: 2m 10s
895:	learn: 0.3298570	total: 4m 5s	remaining: 32.

[I 2024-09-25 17:48:23,198] Trial 35 finished with value: 0.776010101010101 and parameters: {'iterations': 1027, 'max_depth': 10, 'learning_rate': 0.01832504489317518, 'random_strength': 72, 'bagging_temperature': 0.30426091546932815, 'od_type': 'Iter', 'od_wait': 23, 'max_bin': 80}. Best is trial 27 with value: 0.7976641414141414.


295:	learn: 0.5249357	total: 43.2s	remaining: 2m 4s
184:	learn: 0.5756648	total: 31.8s	remaining: 2m 44s
837:	learn: 0.3369709	total: 3m 47s	remaining: 50s
572:	learn: 0.4416401	total: 1m 37s	remaining: 1m 15s
54:	learn: 0.7118857	total: 16.1s	remaining: 5m 14s
1008:	learn: 0.3126171	total: 4m 38s	remaining: 5.79s
1011:	learn: 0.4069647	total: 2m 31s	remaining: 1.2s
958:	learn: 0.3217840	total: 4m 22s	remaining: 15.1s
185:	learn: 0.5747964	total: 31.9s	remaining: 2m 44s
296:	learn: 0.5247661	total: 43.4s	remaining: 2m 4s
673:	learn: 0.3703831	total: 2m 48s	remaining: 1m 52s
573:	learn: 0.4415134	total: 1m 38s	remaining: 1m 15s
1012:	learn: 0.4068699	total: 2m 31s	remaining: 1.04s
838:	learn: 0.3368467	total: 3m 48s	remaining: 49.8s
186:	learn: 0.5737307	total: 32.1s	remaining: 2m 43s
297:	learn: 0.5242321	total: 43.6s	remaining: 2m 4s
1009:	learn: 0.3124840	total: 4m 38s	remaining: 5.51s
55:	learn: 0.7074005	total: 16.4s	remaining: 5m 14s
574:	learn: 0.4413359	total: 1m 38s	remaining: 

[I 2024-09-25 17:48:25,390] Trial 36 finished with value: 0.792276936026936 and parameters: {'iterations': 1020, 'max_depth': 8, 'learning_rate': 0.06296586424813831, 'random_strength': 62, 'bagging_temperature': 0.5550627164134855, 'od_type': 'Iter', 'od_wait': 23, 'max_bin': 89}. Best is trial 27 with value: 0.7976641414141414.


581:	learn: 0.4400163	total: 1m 39s	remaining: 1m 13s
59:	learn: 0.7036649	total: 17.5s	remaining: 5m 11s

population, bin=9 score 9.079772761

district_code_8, bin=0 score 3.202660863
month_recorded_6, bin=0 score 3.663606146
ward_kikatiti, bin=0 score 71.375755
installer_kiliwater, bin=0 score 1.007542612
963:	learn: 0.3210509	total: 4m 24s	remaining: 13.7s
gps_height, bin=39 score 5.552428636
gps_height, bin=38 score 4.787528015

basin_lake rukwa, bin=0 score 11.12458502
funder_danida, bin=0 score 4.359970109
construction_year, bin=39 score 172.2254023
basin_pangani, bin=0 score 1.622766129
lga_moshi urban, bin=0 score 1.069207838
population, bin=30 score 119.5836371
subvillage_kibaoni, bin=0 score 4.998184669
4:	learn: 0.9650784	total: 724ms	remaining: 2m 43s

region_code_5, bin=0 score 5.273715996
construction_year, bin=45 score 2.249951447
month_recorded_6, bin=0 score 73.91222534
source_class_surface, bin=0 score 117.9903378
latitude, bin=72 score 3.696818975
payment_pay per buc

[I 2024-09-25 17:48:31,272] Trial 32 finished with value: 0.7954335016835017 and parameters: {'iterations': 1030, 'max_depth': 10, 'learning_rate': 0.07972608475337103, 'random_strength': 69, 'bagging_temperature': 0.5024249564691712, 'od_type': 'Iter', 'od_wait': 36, 'max_bin': 81}. Best is trial 27 with value: 0.7976641414141414.


703:	learn: 0.3645629	total: 2m 55s	remaining: 1m 45s
30:	learn: 0.7298673	total: 4.29s	remaining: 2m 33s
617:	learn: 0.4330716	total: 1m 44s	remaining: 1m 7s
83:	learn: 0.6751662	total: 23.3s	remaining: 4m 48s
986:	learn: 0.3182159	total: 4m 30s	remaining: 7.39s
348:	learn: 0.5071637	total: 50.5s	remaining: 1m 55s
46:	learn: 0.7388272	total: 6.39s	remaining: 2m 28s
865:	learn: 0.3329991	total: 3m 55s	remaining: 42.3s
231:	learn: 0.5406080	total: 39s	remaining: 2m 33s
349:	learn: 0.5067228	total: 50.6s	remaining: 1m 55s
31:	learn: 0.7292200	total: 4.42s	remaining: 2m 33s
47:	learn: 0.7377365	total: 6.51s	remaining: 2m 27s
618:	learn: 0.4328787	total: 1m 45s	remaining: 1m 7s
987:	learn: 0.3180957	total: 4m 30s	remaining: 7.11s
232:	learn: 0.5401753	total: 39.2s	remaining: 2m 32s
350:	learn: 0.5064997	total: 50.7s	remaining: 1m 55s
704:	learn: 0.3644314	total: 2m 55s	remaining: 1m 44s
48:	learn: 0.7359647	total: 6.62s	remaining: 2m 27s
84:	learn: 0.6745706	total: 23.5s	remaining: 4m 48s


[I 2024-09-25 17:48:39,609] Trial 33 finished with value: 0.796506734006734 and parameters: {'iterations': 1014, 'max_depth': 10, 'learning_rate': 0.0776887973142513, 'random_strength': 72, 'bagging_temperature': 0.546115642937573, 'od_type': 'Iter', 'od_wait': 36, 'max_bin': 80}. Best is trial 27 with value: 0.7976641414141414.


95:	learn: 0.6324616	total: 12.5s	remaining: 2m 16s
740:	learn: 0.3571201	total: 3m 3s	remaining: 1m 35s
111:	learn: 0.6593760	total: 14.5s	remaining: 2m 13s
898:	learn: 0.3281626	total: 4m 3s	remaining: 33.3s
414:	learn: 0.4884160	total: 58.7s	remaining: 1m 44s
288:	learn: 0.5158969	total: 47.3s	remaining: 2m 19s
44:	learn: 0.6776565	total: 6.91s	remaining: 2m 48s
675:	learn: 0.4238054	total: 1m 53s	remaining: 56.8s
119:	learn: 0.6363428	total: 31.6s	remaining: 4m 24s
96:	learn: 0.6301144	total: 12.6s	remaining: 2m 15s
415:	learn: 0.4879749	total: 58.9s	remaining: 1m 43s
112:	learn: 0.6573871	total: 14.7s	remaining: 2m 13s
289:	learn: 0.5153764	total: 47.4s	remaining: 2m 19s
741:	learn: 0.3569889	total: 3m 3s	remaining: 1m 35s
676:	learn: 0.4236748	total: 1m 53s	remaining: 56.6s
45:	learn: 0.6772043	total: 7.14s	remaining: 2m 49s
97:	learn: 0.6266237	total: 12.8s	remaining: 2m 16s
416:	learn: 0.4875701	total: 59s	remaining: 1m 43s
899:	learn: 0.3279989	total: 4m 3s	remaining: 33s
113:

[I 2024-09-25 17:49:09,621] Trial 34 finished with value: 0.7964225589225589 and parameters: {'iterations': 1022, 'max_depth': 10, 'learning_rate': 0.08012709961220266, 'random_strength': 65, 'bagging_temperature': 0.5560684445877264, 'od_type': 'Iter', 'od_wait': 36, 'max_bin': 80}. Best is trial 27 with value: 0.7976641414141414.


245:	learn: 0.5300345	total: 1m 1s	remaining: 3m 39s
247:	learn: 0.4786448	total: 36.7s	remaining: 2m 12s
185:	learn: 0.5064297	total: 28.5s	remaining: 1m 59s
647:	learn: 0.4425960	total: 1m 28s	remaining: 1m 8s
498:	learn: 0.4581617	total: 1m 17s	remaining: 1m 39s
343:	learn: 0.5182451	total: 44.4s	remaining: 1m 42s
882:	learn: 0.3964752	total: 2m 22s	remaining: 21.4s
333:	learn: 0.4609349	total: 42.3s	remaining: 1m 42s
875:	learn: 0.3373583	total: 3m 33s	remaining: 1m
246:	learn: 0.5293066	total: 1m 1s	remaining: 3m 38s
334:	learn: 0.4606181	total: 42.4s	remaining: 1m 42s
344:	learn: 0.5178373	total: 44.5s	remaining: 1m 42s
648:	learn: 0.4425006	total: 1m 28s	remaining: 1m 8s
499:	learn: 0.4579945	total: 1m 17s	remaining: 1m 39s
876:	learn: 0.3372715	total: 3m 33s	remaining: 1m
248:	learn: 0.4784094	total: 36.9s	remaining: 2m 12s
186:	learn: 0.5058269	total: 28.7s	remaining: 1m 59s
883:	learn: 0.3963872	total: 2m 23s	remaining: 21.2s
335:	learn: 0.4603439	total: 42.5s	remaining: 1m 4

[I 2024-09-25 17:49:28,131] Trial 37 finished with value: 0.79375 and parameters: {'iterations': 1015, 'max_depth': 8, 'learning_rate': 0.0793627936281063, 'random_strength': 90, 'bagging_temperature': 0.28160493232923617, 'od_type': 'Iter', 'od_wait': 23, 'max_bin': 88}. Best is trial 27 with value: 0.7976641414141414.


797:	learn: 0.4198523	total: 1m 46s	remaining: 47.1s
492:	learn: 0.4795250	total: 1m 2s	remaining: 1m 21s
323:	learn: 0.4855849	total: 1m 19s	remaining: 3m 17s
water_quality_salty, bin=0 score 6.419496956
113:	learn: 0.5747849	total: 17s	remaining: 2m 32s
485:	learn: 0.4228828	total: 1m	remaining: 1m 21s
lga_chato, bin=0 score 1.846078973


631:	learn: 0.4344485	total: 1m 35s	remaining: 1m 16s

scheme_name_uroki-bomang'ombe water sup, bin=0 score 1.114487681

lga_missungwi, bin=0 score 1.217915218

subvillage_misufini, bin=0 score 1.546068371
316:	learn: 0.4508356	total: 46.8s	remaining: 1m 36s
lga_mpwapwa, bin=0 score 1.539940196
lga_pangani, bin=0 score 2.55389014
lga_pangani, bin=0 score 2.809878519
quantity_unknown, bin=0 score 2.046317263
days_since_recorded, bin=76 score 5.287243688
quantity_insufficient, bin=0 score 2.02620788

population, bin=7 score 2.171240168
lga_kilolo, bin=0 score 3.089163102
management_group_user-group, bin=0 score 2.447018455
967:	learn: 0.3246722	total:

[I 2024-09-25 17:50:09,709] Trial 39 finished with value: 0.7932449494949495 and parameters: {'iterations': 1150, 'max_depth': 8, 'learning_rate': 0.07194717904028844, 'random_strength': 68, 'bagging_temperature': 0.5826299209677501, 'od_type': 'IncToDec', 'od_wait': 36, 'max_bin': 86}. Best is trial 27 with value: 0.7976641414141414.


840:	learn: 0.3634297	total: 1m 41s	remaining: 36.6s
permit, bin=0 score 5.50350343
installer_dwe, bin=0 score 5.663869362

payment_unknown, bin=0 score 2.423636202
funder_swedish, bin=0 score 3.297611413
construction_year, bin=4 score 2.621617803
501:	learn: 0.4281891	total: 2m 1s	remaining: 2m 31s
38:	learn: 0.7211653	total: 8.52s	remaining: 3m 57s
411:	learn: 0.4253623	total: 59s	remaining: 1m 44s
277:	learn: 0.4665156	total: 40.6s	remaining: 2m 6s
845:	learn: 0.3626527	total: 1m 42s	remaining: 36s
633:	learn: 0.3774325	total: 1m 28s	remaining: 46.7s
844:	learn: 0.4247941	total: 1m 44s	remaining: 36.2s
695:	learn: 0.3700471	total: 1m 37s	remaining: 1m 1s
944:	learn: 0.3935878	total: 2m 17s	remaining: 28.6s
39:	learn: 0.7193397	total: 8.65s	remaining: 3m 54s
412:	learn: 0.4250146	total: 59.2s	remaining: 1m 44s
278:	learn: 0.4661892	total: 40.7s	remaining: 2m 6s
846:	learn: 0.3624892	total: 1m 42s	remaining: 35.9s
845:	learn: 0.4247198	total: 1m 44s	remaining: 36.1s
696:	learn: 0.3699

[I 2024-09-25 17:50:36,395] Trial 38 finished with value: 0.794297138047138 and parameters: {'iterations': 1142, 'max_depth': 8, 'learning_rate': 0.08093181222682022, 'random_strength': 64, 'bagging_temperature': 0.7647889562937077, 'od_type': 'Iter', 'od_wait': 27, 'max_bin': 86}. Best is trial 27 with value: 0.7976641414141414.


scheme_name_maambreni gravity water supply, bin=0 score 2.940363076
funder_rwssp, bin=0 score 1.592398429
extraction_type_ksb, bin=0 score 5.366190154
ward_diongoya, bin=0 score 3.476020463
ward_bungu, bin=0 score 4.851664312
latitude, bin=58 score 2.189288283

531:	learn: 0.4368454	total: 25.1s	remaining: 21.1s
quantity_insufficient, bin=0 score 4.007716057
installer_amref, bin=0 score 4.893705078
scheme_name_s, bin=0 score 5.821175175

extraction_type_swn 80, bin=0 score 2.695034736
management_group_user-group, bin=0 score 5.707242738
funder_norad, bin=0 score 3.920767288
funder_world vision, bin=0 score 0.9072897292
600:	learn: 0.3851453	total: 1m 25s	remaining: 1m 16s
ward_simbo, bin=0 score 1.125221301

extraction_type_nira/tanira, bin=0 score 1.612965724
610:	learn: 0.4016149	total: 2m 27s	remaining: 2m 4s
gps_height, bin=11 score 6.137768908
lga_ngara, bin=0 score 2.305908801
gps_height, bin=39 score 3.442147676
quantity_insufficient, bin=0 score 4.058451836

extraction_type_ind

[I 2024-09-25 17:51:45,843] Trial 48 finished with value: 0.7930976430976431 and parameters: {'iterations': 979, 'max_depth': 7, 'learning_rate': 0.1353087810449855, 'random_strength': 59, 'bagging_temperature': 0.6497526626336946, 'od_type': 'Iter', 'od_wait': 28, 'max_bin': 76}. Best is trial 27 with value: 0.7976641414141414.


installer_dwe, bin=0 score 5.118322625
management_unknown, bin=0 score 7.155691977
source_shallow well, bin=0 score 6.162925557
909:	learn: 0.3516984	total: 3m 36s	remaining: 51.5s
gps_height, bin=2 score 7.437121832
quantity_enough, bin=0 score 5.0677028
days_since_recorded, bin=60 score 4.386561593
permit, bin=0 score 4.441015669
532:	learn: 0.4064201	total: 59.5s	remaining: 1m 8s


installer_district water department, bin=0 score 1.05135696
payment_pay per bucket, bin=0 score 4.898577289
latitude, bin=0 score 1.466829797
management_group_user-group, bin=0 score 3.236742208
construction_year, bin=11 score 5.301728809
quantity_insufficient, bin=0 score 9.388512476
district_code_43, bin=0 score 1.439051755
extraction_type_other - swn 81, bin=0 score 2.197572376
956:	learn: 0.3359194	total: 2m 15s	remaining: 26.5s
345:	learn: 0.4338187	total: 48.5s	remaining: 1m 27s
latitude, bin=14 score 11.61962018
installer_he, bin=0 score 0.9798201716
271:	learn: 0.4599648	total: 39.6s	remaining: 2m

[I 2024-09-25 17:52:34,189] Trial 49 finished with value: 0.7933922558922559 and parameters: {'iterations': 967, 'max_depth': 7, 'learning_rate': 0.14758695793945914, 'random_strength': 55, 'bagging_temperature': 0.6097472460781701, 'od_type': 'Iter', 'od_wait': 29, 'max_bin': 77}. Best is trial 27 with value: 0.7976641414141414.


lga_ruangwa, bin=0 score 1.826500832
installer_da, bin=0 score 0.8848403556
source_rainwater harvesting, bin=0 score 5.704669508
quantity_enough, bin=0 score 2.891156856
gps_height, bin=2 score 1.380342663
population, bin=10 score 2.15685232
waterpoint_type_communal standpipe, bin=0 score 6.183374216
water_quality_unknown, bin=0 score 3.369081622
gps_height, bin=53 score 2.582651337
1014:	learn: 0.3974535	total: 1m 48s	remaining: 13s
quality_group_salty, bin=0 score 3.406668069
source_class_surface, bin=0 score 122.7946693
ward_kibaoni, bin=0 score 2.907432553
funder_hifab, bin=0 score 3.761832801

scheme_name_n, bin=0 score 1.020961545
307:	learn: 0.4516815	total: 40.1s	remaining: 1m 48s
management_water board, bin=0 score 4.082045985
funder_other, bin=0 score 2.580261219
source_class_surface, bin=0 score 3.745690245
days_since_recorded, bin=61 score 3.310201832
ward_other, bin=0 score 2.027992809
scheme_management_water authority, bin=0 score 1.848297671
payment_pay per bucket, bin=0

[I 2024-09-25 17:52:45,986] Trial 42 finished with value: 0.7940867003367003 and parameters: {'iterations': 1137, 'max_depth': 8, 'learning_rate': 0.0693805221770847, 'random_strength': 64, 'bagging_temperature': 0.5591223105855195, 'od_type': 'Iter', 'od_wait': 36, 'max_bin': 85}. Best is trial 27 with value: 0.7976641414141414.


installer_artisan, bin=0 score 77.94563394
population, bin=69 score 4.334488605
ward_mang'ula, bin=0 score 2.081344594


[I 2024-09-25 17:52:46,844] Trial 43 finished with value: 0.7966119528619529 and parameters: {'iterations': 1143, 'max_depth': 8, 'learning_rate': 0.12611528613532186, 'random_strength': 89, 'bagging_temperature': 0.5724527380392078, 'od_type': 'Iter', 'od_wait': 36, 'max_bin': 75}. Best is trial 27 with value: 0.7976641414141414.


gps_height, bin=22 score 2.785682005


[I 2024-09-25 17:52:56,793] Trial 45 finished with value: 0.7941287878787879 and parameters: {'iterations': 967, 'max_depth': 8, 'learning_rate': 0.14376606564895358, 'random_strength': 61, 'bagging_temperature': 0.5652969447509207, 'od_type': 'Iter', 'od_wait': 28, 'max_bin': 75}. Best is trial 27 with value: 0.7976641414141414.
[I 2024-09-25 17:53:19,153] Trial 44 finished with value: 0.7967171717171717 and parameters: {'iterations': 1140, 'max_depth': 8, 'learning_rate': 0.14196279692216365, 'random_strength': 62, 'bagging_temperature': 0.5744288135382767, 'od_type': 'Iter', 'od_wait': 40, 'max_bin': 77}. Best is trial 27 with value: 0.7976641414141414.





[I 2024-09-25 17:53:19,257] Trial 40 finished with value: 0.7958964646464646 and parameters: {'iterations': 1126, 'max_depth': 10, 'learning_rate': 0.08075514556661337, 'random_strength': 62, 'bagging_temperature': 0.5854046600113317, 'od_type': 'Iter', 'od_wait': 36, 'max_bin': 86}. Best is trial 27 with value: 0.7976641414141414.
[I 2024-09-25 17:53:32,707] Trial 46 finished with value: 0.7954545454545454 and parameters: {'iterations': 1141, 'max_depth': 8, 'learning_rate': 0.1420666372918805, 'random_strength': 88, 'bagging_temperature': 0.5774632861900331, 'od_type': 'Iter', 'od_wait': 34, 'max_bin': 86}. Best is trial 27 with value: 0.7976641414141414.
[I 2024-09-25 17:53:37,115] Trial 47 finished with value: 0.7964436026936026 and parameters: {'iterations': 1144, 'max_depth': 8, 'learning_rate': 0.1380191116381684, 'random_strength': 62, 'bagging_temperature': 0.5926761378268766, 'od_type': 'Iter', 'od_wait': 29, 'max_bin': 86}. Best is trial 27 with value: 0.7976641414141414.
[I

Best trial: score 0.7976641414141414, params {'iterations': 1073, 'max_depth': 10, 'learning_rate': 0.084778945186221, 'random_strength': 69, 'bagging_temperature': 0.4717755636425249, 'od_type': 'IncToDec', 'od_wait': 37, 'max_bin': 80}


In [64]:
print('Best trial: score {}, params {}'.format(study3.best_trial.value, study3.best_trial.params))

from sklearn.metrics import accuracy_score
top_params3 = study3.best_params

# Previously found best parameters
# [I 2023-12-18 11:03:19,199] Trial 15 finished with value: 0.8129966329966329 and parameters: {'criterion': 'gini', 'n_estimators': 958, 'max_depth': 30, 'min_samples_split': 5, 'max_features': 'sqrt'}. Best is trial 15 with value: 0.8129966329966329.
# [Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.

# Train a model on the entire training dataset using the top parameters
final_model3 = CatBoostClassifier(**top_params3, random_state=42)
final_model3.fit(x_train, y_train)

# Make predictions on the test set
test_predictions3 = final_model3.predict(x_validate)

# Optionally, evaluate the predictions
test_accuracy3 = accuracy_score(y_validate, test_predictions3)
print(f'Test Accuracy: {test_accuracy3}')

Best trial: score 0.7976641414141414, params {'iterations': 1073, 'max_depth': 10, 'learning_rate': 0.084778945186221, 'random_strength': 69, 'bagging_temperature': 0.4717755636425249, 'od_type': 'IncToDec', 'od_wait': 37, 'max_bin': 80}
0:	learn: 1.0625036	total: 64.1ms	remaining: 1m 8s
1:	learn: 1.0201838	total: 127ms	remaining: 1m 7s
2:	learn: 0.9924759	total: 168ms	remaining: 1m
3:	learn: 0.9634362	total: 225ms	remaining: 1m
4:	learn: 0.9348844	total: 287ms	remaining: 1m 1s
5:	learn: 0.9097769	total: 345ms	remaining: 1m 1s
6:	learn: 0.8964112	total: 405ms	remaining: 1m 1s
7:	learn: 0.8754043	total: 464ms	remaining: 1m 1s
8:	learn: 0.8582105	total: 524ms	remaining: 1m 1s
9:	learn: 0.8482128	total: 580ms	remaining: 1m 1s
10:	learn: 0.8363971	total: 640ms	remaining: 1m 1s
11:	learn: 0.8209202	total: 699ms	remaining: 1m 1s
12:	learn: 0.8122935	total: 759ms	remaining: 1m 1s
13:	learn: 0.7989260	total: 817ms	remaining: 1m 1s
14:	learn: 0.7916614	total: 874ms	remaining: 1m 1s
15:	learn: 0

In [None]:
import os
y_test = pd.read_csv(r'D:\MMAI\Pump-It-Up\data\y_test.csv')
pred = pd.DataFrame(test_predictions3, columns = [y_test.columns[1]])
del y_test['status_group']
y_test = pd.concat((y_test, pred), axis = 1)
y_test.to_csv(os.path.join('data', 'y_test102.csv'), sep=",", index = False)

### Ensemble Voting

In [None]:
import pandas as pd
from scipy.stats import mode

# Load the prediction files
y_test_RF = pd.read_csv('data/y_test_RF.csv')
y_test_XG = pd.read_csv('data/y_test_XG.csv')
y_test_Cat = pd.read_csv('data/y_test_Cat.csv')

# Extract the predictions
pred_RF = y_test_RF['status_group']
pred_XG = y_test_XG['status_group']
pred_Cat = y_test_Cat['status_group']

# Perform majority voting
majority_vote = mode([pred_RF, pred_XG, pred_Cat])[0][0]

# Create a new DataFrame for the majority vote predictions
majority_vote_df = pd.DataFrame(majority_vote, columns=['status_group'])

# Add the 'id' field from the original dataframes
majority_vote_df['id'] = pred_RF['id']

# Rearrange the columns to make 'id' the first column
majority_vote_df = majority_vote_df.reindex(columns=['id', 'status_group'])

# Save the majority vote predictions to a CSV file
majority_vote_df.to_csv('data/y_test_majority_vote.csv', index=False)

  majority_vote = mode([preds100, preds101, preds102])[0][0]
  majority_vote = mode([preds100, preds101, preds102])[0][0]
