# **Dataset**

Marital status: The marital status of the student. (Categorical) <br>
Application mode: The method of application used by the student. (Categorical)<br>
Application order: The order in which the student applied. (Numerical)<br>
Course: The course taken by the student. (Categorical)<br>
Daytime/evening attendance: Whether the student attends classes during the day or in the evening. (Categorical)<br>
Previous qualification: The qualification obtained by the student before enrolling in higher education. (Categorical)<br>
Nacionality: The nationality of the student. (Categorical)<br>
Mother's qualification: The qualification of the student's mother. (Categorical)<br>
Father's qualification: The qualification of the student's father. (Categorical)<br>
Mother's occupation: The occupation of the student's mother. (Categorical)<br>
Father's occupation: The occupation of the student's father. (Categorical)<br>
Displaced: Whether the student is a displaced person. (Categorical)<br>
Educational special needs: Whether the student has any special educational needs. (Categorical)<br>
Debtor: Whether the student is a debtor. (Categorical)<br>
Tuition fees up to date: Whether the student's tuition fees are up to date. (Categorical)<br>
Gender: The gender of the student. (Categorical)<br>
Scholarship holder: Whether the student is a scholarship holder. (Categorical)<br>
Age at enrollment: The age of the student at the time of enrollment. (Numerical)<br>
International: Whether the student is an international student. (Categorical)<br>
Curricular units 1st sem (credited): The number of curricular units credited by the student in the first semester. (Numerical)<br>
Curricular units 1st sem (enrolled): The number of curricular units enrolled by the student in the first semester. (Numerical)<br>
Curricular units 1st sem (evaluations): The number of curricular units evaluated by the student in the first semester. (Numerical)<br>
Curricular units 1st sem (approved): The number of curricular units approved by the student in the first semester. (Numerical)<br>

# **Install Libraries**

In [None]:
!pip install scikit-learn
!pip install xgboost
!pip install featuretools
!pip install optuna

# **Import Packages**

In [1]:
import sys
import os
import pandas as pd
import numpy as np
import seaborn as sb
import sklearn as sk
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, f1_score,classification_report,accuracy_score
from sklearn.preprocessing import LabelEncoder
import xgboost as xgb
import featuretools as ft
import optuna
import matplotlib.pyplot as plt

# __Exploratory Data Analysis__

In [2]:
df = pd.read_csv('dataset.csv')
df.columns

Index(['Marital status', 'Application mode', 'Application order', 'Course',
       'Daytime/evening attendance', 'Previous qualification', 'Nacionality',
       'Mother's qualification', 'Father's qualification',
       'Mother's occupation', 'Father's occupation', 'Displaced',
       'Educational special needs', 'Debtor', 'Tuition fees up to date',
       'Gender', 'Scholarship holder', 'Age at enrollment', 'International',
       'Curricular units 1st sem (credited)',
       'Curricular units 1st sem (enrolled)',
       'Curricular units 1st sem (evaluations)',
       'Curricular units 1st sem (approved)',
       'Curricular units 1st sem (grade)',
       'Curricular units 1st sem (without evaluations)',
       'Curricular units 2nd sem (credited)',
       'Curricular units 2nd sem (enrolled)',
       'Curricular units 2nd sem (evaluations)',
       'Curricular units 2nd sem (approved)',
       'Curricular units 2nd sem (grade)',
       'Curricular units 2nd sem (without evaluations)

In [3]:
df.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4424 entries, 0 to 4423
Data columns (total 35 columns):
 #   Column                                          Non-Null Count  Dtype  
---  ------                                          --------------  -----  
 0   Marital status                                  4424 non-null   int64  
 1   Application mode                                4424 non-null   int64  
 2   Application order                               4424 non-null   int64  
 3   Course                                          4424 non-null   int64  
 4   Daytime/evening attendance                      4424 non-null   int64  
 5   Previous qualification                          4424 non-null   int64  
 6   Nacionality                                     4424 non-null   int64  
 7   Mother's qualification                          4424 non-null   int64  
 8   Father's qualification                          4424 non-null   int64  
 9   Mother's occupation                      

In [None]:
countA, countB,countC = df.Target.value_counts()
print("Ratio of classes = ", countA, ":", countB, ":", countC)

# __Preprocessing__

In [4]:
# Rename columns
df.rename(columns={'Nacionality': 'Nationality'}, inplace=True)

features_list = ['Marital status', 
                 'Application mode',
                 'Application order',
                 'Course',
                 'Daytime/evening attendance',
                 'Previous qualification',
                 'Nationality',
                 "Mother's qualification",
                 "Father's qualification",
                 "Mother's occupation",
                 "Father's occupation",
                 'Displaced',
                 'Educational special needs',
                 'Debtor',
                 'Tuition fees up to date',
                 'Gender',
                 'Scholarship holder',
                 'Age at enrollment',
                 'International',
                 'Curricular units 1st sem (credited)',
                 'Curricular units 1st sem (enrolled)',
                 'Curricular units 1st sem (evaluations)',
                 'Curricular units 1st sem (approved)',
                 'Curricular units 1st sem (grade)',
                 'Curricular units 1st sem (without evaluations)',
                 'Curricular units 2nd sem (credited)',
                 'Curricular units 2nd sem (enrolled)',
                 'Curricular units 2nd sem (evaluations)',
                 'Curricular units 2nd sem (approved)',
                 'Curricular units 2nd sem (grade)',
                 'Curricular units 2nd sem (without evaluations)',
                 'Unemployment rate',
                 'Inflation rate',
                 'GDP']

forecast_var = ['Target']

In [5]:
df["Marital status"] = df["Marital status"].astype("category")
df["Application mode"] = df["Application mode"].astype("category")
df["Course"] = df["Course"].astype("category")
df["Daytime/evening attendance"] = df["Daytime/evening attendance"].astype("category")
df["Nationality"] = df["Nationality"].astype("category")
df["Mother's occupation"] = df["Mother's occupation"].astype("category")
df["Father's occupation"] = df["Father's occupation"].astype("category")
df["Displaced"] = df["Displaced"].astype("category")
df["Educational special needs"] = df["Educational special needs"].astype("category")
df["Debtor"] = df["Debtor"].astype("category")
df["Tuition fees up to date"] = df["Tuition fees up to date"].astype("category")
df["Gender"] = df["Gender"].astype("category")
df["Scholarship holder"] = df["Scholarship holder"].astype("category")
df["International"] = df["International"].astype("category")


## Creating Ordinal Data

Ordinal scale is where the order matters but not the difference between values. <br>
<br> Level of qualification can be ranked through an ordinal scale; however, it was not done so in the dataset. We hence need to reassign the numerical representations assigned to the qualification variables after ranking the various qualification levels provided.

Reference: https://www.graphpad.com/support/faq/what-is-the-difference-between-ordinal-interval-and-ratio-variables-why-should-i-care/ 

### Replacing and ranking the levels of qualifications under 'Previous qualification'

#### Initial list of variables for Previousqualification
1—Secondary education <br>
2—Higher education—bachelor’s degree <br>
3—Higher education—degree <br>
4—Higher education—master’s degree <br>
5—Higher education—doctorate <br>
6—Frequency of higher education <br>
7—12th year of schooling—not completed <br>
8—11th year of schooling—not completed <br>
9—Other—11th year of schooling <br>
10—10th year of schooling <br>
11—10th year of schooling—not completed <br>
12—Basic education 3rd cycle (9th/10th/11th year) or equivalent <br>
13—Basic education 2nd cycle (6th/7th/8th year) or equivalent <br>
14—Technological specialization course <br>
15—Higher education—degree (1st cycle) <br>
16—Professional higher technical course <br>
17—Higher education—master’s degree (2nd cycle)

#### Rearranged list of variables for Previousqualification

From https://en.wikipedia.org/wiki/Education_in_Portugal, high school = secondary education which constitutes years 10, 11 and 12. 

1 - Basic education 2nd cycle (6th/7th/8th year) or equivalent <br>
2 - 10th year of schooling—not completed <br>
3 - 10th year of schooling <br>
4 - 11th year of schooling—not completed <br>
5 - Other—11th year of schooling = Basic education 3rd cycle (9th/10th/11th year) or equivalent <br>
6 - 12th year of schooling—not completed <br>
7 - Secondary education <br>
8 - Frequency of higher education #assmue that higher education was incomplete <br>
9 - Technological specialization course <br>
10 - Professional higher technical course <br>
11 - Higher education—degree = Higher education—degree (1st cycle) <br>
12 - Higher education—bachelor’s degree <br>
13 - Higher education—master’s degree <br>
14 - Higher education—master’s degree (2nd cycle) <br>
15 - Higher education—doctorate <br>

In [6]:
#using key:value to replace values under 'Previous qualification'
previous_qualification_mapper = {1:7, 2:12, 3:11, 4:13, 5:15, 6:8, 7:6, 8:4, 9:5, 10:3, 11:2, 12:5, 13:1, 14:9, 15:11, 16:10, 17:14}
df["Previous qualification"] = df['Previous qualification'].replace(previous_qualification_mapper)


### Replacing and ranking the levels of qualifications under 'Mother's qualification' and 'Father's qualification'

#### Initial list of variables for Mother's qualification and Father's qualification 
1—Secondary Education—12th Year of Schooling or Equivalent <br>
2—Higher Education—bachelor’s degree <br>
3—Higher Education—degree <br>
4—Higher Education—master’s degree <br>
5—Higher Education—doctorate <br>
6—Frequency of Higher Education <br>
7—12th Year of Schooling—not completed <br>
8—11th Year of Schooling—not completed <br>
9—7th Year (Old) <br>
10—Other—11th Year of Schooling <br>
11—2nd year complementary high school course <br>
12—10th Year of Schooling <br>
13—General commerce course <br>
14—Basic Education 3rd Cycle (9th/10th/11th Year) or Equivalent <br>
15—Complementary High School Course <br>
16—Technical-professional course <br>
17—Complementary High School Course—not concluded <br>
18—7th year of schooling <br>
19—2nd cycle of the general high school course <br>
20—9th Year of Schooling—not completed <br>
21—8th year of schooling <br>
22—General Course of Administration and Commerce <br>
23—Supplementary Accounting and Administration <br>
24—Unknown <br>
25—Cannot read or write <br>
26—Can read without having a 4th year of schooling <br>
27—Basic education 1st cycle (4th/5th year) or equivalent <br>
28—Basic Education 2nd Cycle (6th/7th/8th Year) or equivalent <br>
29—Technological specialization course <br>
30—Higher education—degree (1st cycle) <br>
31—Specialized higher studies course <br>
32—Professional higher technical course <br>
33—Higher Education—master’s degree (2nd cycle) <br>
34—Higher Education—doctorate (3rd cycle) 

#### Rearranged list of variables for Mother's qualification and Father's qualification

0 - Unknown <br>
1 - Cannnot read or write <br>
2 - Can read without having a 4th year of schooling <br>
3 - Basic education 1st cycle (4th/5th year) or equivalent <br>
4 - Basic Education 2nd Cycle (6th/7th/8th Year) or equivalent <br>
5 - 7th year (Old) = 7th year of schooling <br>
6 - 8th year of schooling <br>
7 - 9th Year of Schooling—not completed <br>
8 - Complementary High School Course—not concluded <br> 
9 - 10th Year of Schooling <br>
10 - 11th Year of Schooling—not completed <br>
11 - Other—11th Year of Schooling = 2nd year complementary high school course = Basic Education 3rd Cycle (9th/10th/11th Year) or Equivalent <br>
12 - 12th Year of Schooling—not completed <br>
13 - Secondary Education—12th Year of Schooling or Equivalent = Complementary High School Course = 2nd cycle of the general high school course #assume multiple cycles of high school refers to retaking high school after completing it. <br>
14 - Frequency of Higher Education #assume that higher education was incomplete <br> 
15 - General commerce course = General Course of Administration and Commerce = Supplementary Accounting and Administration = Technical-professional course = Technological specialization course #assume these courses are merely general specialisation courses and part of higher education but does not constitute as degrees <br>
16 - Specialized higher studies course = Professional higher technical course #assume these are higher level specialised that does not constitute degrees <br>
17 - Higher Education—degree = Higher education—degree (1st cycle) #assume that 1st cycle means higher completed first higher education degree (eg double degree holders etc), we also assume that the unspecified degree refers to the lowest level of college associate degree https://thebestschools.org/degrees/college-degree-levels/ <br>
18 - Higher Education—bachelor’s degree <br>
19 - Higher Education—master’s degree <br>
20 - Higher Education—master’s degree (2nd cycle) <br>
21 - Higher Education—doctorate <br>
22 - Higher Education—doctorate (3rd cycle) <br>
                    

In [7]:
parent_qualification_mapper = {1:13, 2:18, 3:17, 4:19, 5:21, 6:14, 7:12, 8:10, 9:5, 10:11, 11:11, 12:9, 13:15, 14:11, 15:13, 16:15, 17:8, 18:5, 19:13, 20:7, 21:6, 22:15, 23:15, 24:0, 25:1, 26:2, 27:3, 28:4, 29:15, 30:17, 31:16, 32:16, 33:20, 34:22}
df["Mother's qualification"] = df["Mother's qualification"].replace(parent_qualification_mapper)
df["Father's qualification"] = df["Father's qualification"].replace(parent_qualification_mapper)

#### Imputing unknown values in 'Mother's qualification', 'Father's qualification', 'Mother's occupation', 'Father's occupation'
Mean imputation is often used when the missing values are numerical and the distribution of the variable is approximately normal. <br>
Median imputation is preferred when the distribution is skewed, as the median is less sensitive to outliers than the mean.<br>
Mode imputation is suitable for categorical variables or numerical variables with a small number of unique values.<br>
Hence, missing values should be imputed with the mode for this dataset.

#### Find skew and mode of each feature

In [None]:
print('Mode: ',df["Mother's qualification"].mode())
print('Mode: ',df["Father's qualification"].mode())
print('Mode: ',df["Mother's occupation"].mode())
print('Mode: ',df["Father's occupation"].mode())

sb.catplot(y = "Mother's qualification", data = df, kind = "count")
sb.catplot(y= "Father's qualification", data = df, kind = 'count')
sb.catplot(y = "Mother's occupation", data = df, kind = "count")
sb.catplot(y= "Father's occupation", data = df, kind = 'count')

plt.show()

In [8]:
#replace features 'Unknown' with the mode
df["Mother's qualification"] = df["Mother's qualification"].replace({24:15})
df["Father's qualification"] = df["Father's qualification"].replace({24:3})

#replace features 'Other Situation' and '(blank)' with the mode
df["Mother's occupation"] = df["Mother's occupation"].replace({12:10, 13:10})
df["Father's occupation"] = df["Father's occupation"].replace({12:10, 13:10})

In [10]:
#Save preprocessed dataset
df.to_csv("clean_dataset_zon.csv", index=False )

In [11]:
""""""
from sklearn.preprocessing import StandardScaler

#Normalise numerical features to reduce differences between values of different columns
ss = StandardScaler()
numerical_features = ['Age at enrollment', 'Curricular units 1st sem (credited)',
                 'Curricular units 1st sem (enrolled)',
                 'Curricular units 1st sem (evaluations)',
                 'Curricular units 1st sem (approved)',
                 'Curricular units 1st sem (grade)',
                 'Curricular units 1st sem (without evaluations)',
                 'Curricular units 2nd sem (credited)',
                 'Curricular units 2nd sem (enrolled)',
                 'Curricular units 2nd sem (evaluations)',
                 'Curricular units 2nd sem (approved)',
                 'Curricular units 2nd sem (grade)',
                 'Curricular units 2nd sem (without evaluations)',
                 'Unemployment rate',
                 'Inflation rate',
                 'GDP',"Mother's qualification","Father's qualification","Previous qualification"]
df[numerical_features] = ss.fit_transform(df[numerical_features])



## Feature Engineering w FeatureTools

In [12]:
from featuretools.primitives import *

In [13]:
# Add tracker column to track index
df.insert(0, 'Tracker', range(4424))

In [14]:
# create an entity set 'es'
es = ft.EntitySet(id = 'Target')

# adding a dataframe 
es = es.add_dataframe(
    dataframe_name="students",
    dataframe=df,
    index="Tracker",
)

### Run Deep Feature Synthesis (DFS)

In [21]:
feature_matrix, feature_names = ft.dfs(entityset=es, target_dataframe_name = 'students',
                                       max_depth = 1, verbose = 1, n_jobs = 3)

Built 35 features
EntitySet scattered to 3 workers in 9 seconds                                                                          
Elapsed: 00:00 | Progress: 100%|███████████████████████████████████████████████████████████████████████████████████████


In [22]:
feature_matrix.columns.reindex(df['Tracker'])
feature_matrix = feature_matrix.reset_index()
feature_matrix.head()

Unnamed: 0,Tracker,Marital status,Application mode,Application order,Course,Daytime/evening attendance,Previous qualification,Nationality,Mother's qualification,Father's qualification,...,Curricular units 2nd sem (credited),Curricular units 2nd sem (enrolled),Curricular units 2nd sem (evaluations),Curricular units 2nd sem (approved),Curricular units 2nd sem (grade),Curricular units 2nd sem (without evaluations),Unemployment rate,Inflation rate,GDP,Target
0,0,1,8,5,2,1,-0.182347,1,0.20634,0.49313,...,-0.282442,-2.838337,-2.04263,-1.471527,-1.963489,-0.199441,-0.287638,0.124386,0.765761,Dropout
1,1,1,6,1,11,1,-0.182347,1,-0.848976,1.645184,...,-0.282442,-0.105726,-0.522682,0.518904,0.659562,-0.199441,0.876222,-1.105222,0.347199,Graduate
2,2,1,1,5,5,1,-0.182347,1,0.20634,-1.042942,...,-0.282442,-0.105726,-2.04263,-1.471527,-1.963489,-0.199441,-0.287638,0.124386,0.765761,Dropout
3,3,1,8,2,15,1,-0.182347,1,0.20634,-1.042942,...,-0.282442,-0.105726,0.490616,0.187165,0.41645,-0.199441,-0.813253,-1.466871,-1.375511,Graduate
4,4,2,12,1,3,0,-0.182347,1,0.20634,-0.850933,...,-0.282442,-0.105726,-0.522682,0.518904,0.531608,-0.199441,0.876222,-1.105222,0.347199,Graduate


In [23]:
#feature_matrix.drop(['Tracker'], axis=1, inplace=True)
X = feature_matrix.iloc[:,:35]
y = feature_matrix.iloc[:,35:]

In [24]:
#Drop features with no importance: Previous qualification, Daytime/evening attendance, International 
#df.drop(['Previous qualification','Daytime/evening attendance','International'], axis = 1)
#X = df[features_list]
#y = df[forecast_var] 

# Encode categorical features
y = y.astype("category")
le = LabelEncoder()
y = le.fit_transform(np.ravel(y))

# **Hyperparameter Tuning w Optuna**

In [25]:
#hide warnings so they do not affect the functionality of the package
import warnings
warnings.filterwarnings('ignore')

In [27]:
# Split train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.9)
# Split validation set from initial train set to form 8:1:1 train:validation:test ratio
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, train_size=0.89,)

In [31]:
type(dtrain)

xgboost.core.DMatrix

In [52]:
from sklearn.metrics import log_loss

def objective_with_pruning(trial):
    #define the hyperparameters to optimize
    params = {
        'eval_metric': 'mlogloss',
        'objective': 'multi:softmax',
        'max_depth': trial.suggest_int('max_depth', 5, 8),
        'subsample': trial.suggest_discrete_uniform('subsample', 0.6, 1.0, 0.05),
        'n_estimators': trial.suggest_int('n_estimators', 1800, 4000, 50),
        'eta': trial.suggest_loguniform('eta', 0.005, 0.1),
        'reg_alpha': trial.suggest_loguniform('reg_alpha', 1e-8, 1e-5),
        'reg_lambda': trial.suggest_loguniform('reg_lambda', 1e-6, 0.5),
        'min_split_loss': trial.suggest_loguniform('gamma', 1e-5, 1), 
        'min_child_weight': trial.suggest_int('min_child_weight', 5, 15),
        "colsample_bytree": trial.suggest_uniform("colsample_bytree", 0.1, 0.5)}
    pruning_callback = optuna.integration.XGBoostPruningCallback(trial, "validation_0-mlogloss")
    model = xgb.XGBClassifier(early_stopping_rounds = 50, booster = 'gbtree',tree_method= 'approx',enable_categorical=True)
    model.set_params(**params)
    model.fit(X_train,y_train, eval_set = [(X_val,y_val)], callbacks=[pruning_callback], verbose = 10) 
    
    preds = model.predict_proba(X_val)
    loss = log_loss(y_val, preds)
    return loss

In [53]:
"""
1. 100 trials
Trial 72 finished with value: 0.552958542544134 and parameters: 
{'max_depth': 5, 'subsample': 0.8, 'n_estimators': 1950, 'eta': 0.02935051557466342, 
'reg_alpha': 2.7452323045088246e-08, 'reg_lambda': 3.771084614288098e-05, 'gamma': 0.000392937067496624,
'min_child_weight': 12, 'colsample_bytree': 0.2739207213317758}. 

2.500 trials
Trial 244 finished with value: 0.5555424854967693 and parameters: 
{'max_depth': 6, 'subsample': 0.95, 'n_estimators': 3350, 'eta': 0.008666006292274537, 
'reg_alpha': 1.2870589813811533e-06, 'reg_lambda': 0.13154643432513044, 'gamma': 0.026360214147244408
'min_child_weight': 5, 'colsample_bytree': 0.22708548202884274}. 

3. 100 trials, changed parameter search space, added pruning callback
Trial 22 finished with value: 0.5520506186121606 and parameters: 
{'max_depth': 6, 'subsample': 0.65, 'n_estimators': 2500, 'eta': 0.027735798765723383, 
'reg_alpha': 2.8442623168179127e-08, 'reg_lambda': 9.97186254008275e-05, 'gamma': 0.00042214727119007766, 
'min_child_weight': 11, 'colsample_bytree': 0.38249471476294605}.

4. 500 trials
Trial 217 finished with value: 0.5511272925050019 and parameters: 
{'max_depth': 7, 'subsample': 0.8, 'n_estimators': 3950, 'eta': 0.049656985841559836, 
'reg_alpha': 1.9886858590626805e-08, 'reg_lambda': 8.750237693599536e-05, 'gamma': 0.02578091778103115, 
'min_child_weight': 14, 'colsample_bytree': 0.3834674148298466}.

5. Trial 49 finished with value: 0.5502489012619268 and parameters: 
{'max_depth': 7, 'subsample': 0.7, 'n_estimators': 3800, 'eta': 0.09967223617945833, 
'reg_alpha': 4.1181276209901314e-08, 'reg_lambda': 0.040098666589886055, 'gamma': 0.924533874823556, 
'min_child_weight': 5, 'colsample_bytree': 0.33274357291224116}
Accuracy:0.78, F1 Score:[0.77 0.46  0.87]
"""
study = optuna.create_study(direction='minimize')
study.optimize(objective_with_pruning, n_trials=100) #change number of trials later

[32m[I 2023-03-29 13:57:14,034][0m A new study created in memory with name: no-name-694a23af-4bd9-4c55-98b9-49ed3bfdb459[0m


[0]	validation_0-mlogloss:1.09354
[10]	validation_0-mlogloss:1.04707
[20]	validation_0-mlogloss:1.00211
[30]	validation_0-mlogloss:0.96300
[40]	validation_0-mlogloss:0.92815
[50]	validation_0-mlogloss:0.89649
[60]	validation_0-mlogloss:0.86808
[70]	validation_0-mlogloss:0.84245
[80]	validation_0-mlogloss:0.81943
[90]	validation_0-mlogloss:0.79717
[100]	validation_0-mlogloss:0.77775
[110]	validation_0-mlogloss:0.76050
[120]	validation_0-mlogloss:0.74560
[130]	validation_0-mlogloss:0.73075
[140]	validation_0-mlogloss:0.71750
[150]	validation_0-mlogloss:0.70434
[160]	validation_0-mlogloss:0.69358
[170]	validation_0-mlogloss:0.68325
[180]	validation_0-mlogloss:0.67408
[190]	validation_0-mlogloss:0.66521
[200]	validation_0-mlogloss:0.65765
[210]	validation_0-mlogloss:0.64991
[220]	validation_0-mlogloss:0.64310
[230]	validation_0-mlogloss:0.63636
[240]	validation_0-mlogloss:0.63078
[250]	validation_0-mlogloss:0.62540
[260]	validation_0-mlogloss:0.62044
[270]	validation_0-mlogloss:0.61609
[28

[32m[I 2023-03-29 13:59:12,576][0m Trial 0 finished with value: 0.5571405587571175 and parameters: {'max_depth': 8, 'subsample': 0.9, 'n_estimators': 3900, 'eta': 0.008365103949136306, 'reg_alpha': 7.410669113021105e-08, 'reg_lambda': 0.3843956956709221, 'gamma': 0.03983310447516707, 'min_child_weight': 7, 'colsample_bytree': 0.4152706258048521}. Best is trial 0 with value: 0.5571405587571175.[0m


[0]	validation_0-mlogloss:1.08840
[10]	validation_0-mlogloss:1.00007
[20]	validation_0-mlogloss:0.92643
[30]	validation_0-mlogloss:0.86874
[40]	validation_0-mlogloss:0.82035
[50]	validation_0-mlogloss:0.78507
[60]	validation_0-mlogloss:0.75506
[70]	validation_0-mlogloss:0.72727
[80]	validation_0-mlogloss:0.70462
[90]	validation_0-mlogloss:0.68399
[100]	validation_0-mlogloss:0.66736
[110]	validation_0-mlogloss:0.65391
[120]	validation_0-mlogloss:0.64234
[130]	validation_0-mlogloss:0.63211
[140]	validation_0-mlogloss:0.62382
[150]	validation_0-mlogloss:0.61573
[160]	validation_0-mlogloss:0.61011
[170]	validation_0-mlogloss:0.60419
[180]	validation_0-mlogloss:0.59982
[190]	validation_0-mlogloss:0.59465
[200]	validation_0-mlogloss:0.59093
[210]	validation_0-mlogloss:0.58770
[220]	validation_0-mlogloss:0.58507
[230]	validation_0-mlogloss:0.58199
[240]	validation_0-mlogloss:0.57990
[250]	validation_0-mlogloss:0.57829
[260]	validation_0-mlogloss:0.57615
[270]	validation_0-mlogloss:0.57494
[28

[32m[I 2023-03-29 14:00:38,294][0m Trial 1 finished with value: 0.5562613600446124 and parameters: {'max_depth': 6, 'subsample': 0.9, 'n_estimators': 3600, 'eta': 0.018503924224253954, 'reg_alpha': 9.32868321616411e-08, 'reg_lambda': 0.19890153457209536, 'gamma': 0.004900517560155265, 'min_child_weight': 8, 'colsample_bytree': 0.2987209922107936}. Best is trial 1 with value: 0.5562613600446124.[0m


[0]	validation_0-mlogloss:1.08697
[10]	validation_0-mlogloss:0.97601
[20]	validation_0-mlogloss:0.89286
[30]	validation_0-mlogloss:0.82720
[40]	validation_0-mlogloss:0.77668
[50]	validation_0-mlogloss:0.73600
[60]	validation_0-mlogloss:0.70358
[70]	validation_0-mlogloss:0.67746
[80]	validation_0-mlogloss:0.65680
[90]	validation_0-mlogloss:0.63894
[100]	validation_0-mlogloss:0.62498
[110]	validation_0-mlogloss:0.61391
[120]	validation_0-mlogloss:0.60534
[130]	validation_0-mlogloss:0.59758
[140]	validation_0-mlogloss:0.59148
[150]	validation_0-mlogloss:0.58619
[160]	validation_0-mlogloss:0.58236
[170]	validation_0-mlogloss:0.57861
[180]	validation_0-mlogloss:0.57606
[190]	validation_0-mlogloss:0.57285
[200]	validation_0-mlogloss:0.56963
[210]	validation_0-mlogloss:0.56779
[220]	validation_0-mlogloss:0.56656
[230]	validation_0-mlogloss:0.56497
[240]	validation_0-mlogloss:0.56484
[250]	validation_0-mlogloss:0.56396
[260]	validation_0-mlogloss:0.56336
[270]	validation_0-mlogloss:0.56277
[28

[32m[I 2023-03-29 14:01:38,720][0m Trial 2 finished with value: 0.5605040767695755 and parameters: {'max_depth': 8, 'subsample': 0.85, 'n_estimators': 3350, 'eta': 0.019617579914960742, 'reg_alpha': 3.599726897901707e-07, 'reg_lambda': 0.243484850603095, 'gamma': 0.002152494978292825, 'min_child_weight': 5, 'colsample_bytree': 0.492705176818364}. Best is trial 1 with value: 0.5562613600446124.[0m


[0]	validation_0-mlogloss:1.08502
[10]	validation_0-mlogloss:0.95678
[20]	validation_0-mlogloss:0.86328
[30]	validation_0-mlogloss:0.79482
[40]	validation_0-mlogloss:0.74631
[50]	validation_0-mlogloss:0.71140
[60]	validation_0-mlogloss:0.68358
[70]	validation_0-mlogloss:0.66054
[80]	validation_0-mlogloss:0.64120
[90]	validation_0-mlogloss:0.62491
[100]	validation_0-mlogloss:0.61399
[110]	validation_0-mlogloss:0.60434
[120]	validation_0-mlogloss:0.59798
[130]	validation_0-mlogloss:0.59133
[140]	validation_0-mlogloss:0.58693
[150]	validation_0-mlogloss:0.58373
[160]	validation_0-mlogloss:0.58076
[170]	validation_0-mlogloss:0.57683
[180]	validation_0-mlogloss:0.57415
[190]	validation_0-mlogloss:0.57155
[200]	validation_0-mlogloss:0.56965
[210]	validation_0-mlogloss:0.56725
[220]	validation_0-mlogloss:0.56556
[230]	validation_0-mlogloss:0.56308
[240]	validation_0-mlogloss:0.56347
[250]	validation_0-mlogloss:0.56175
[260]	validation_0-mlogloss:0.56043
[270]	validation_0-mlogloss:0.56107
[28

[32m[I 2023-03-29 14:02:31,460][0m Trial 3 finished with value: 0.5541950169255421 and parameters: {'max_depth': 7, 'subsample': 0.6, 'n_estimators': 3950, 'eta': 0.029761503205703116, 'reg_alpha': 7.000640567835314e-08, 'reg_lambda': 0.00040137622513727617, 'gamma': 0.3458339392995608, 'min_child_weight': 8, 'colsample_bytree': 0.27476430683585007}. Best is trial 3 with value: 0.5541950169255421.[0m


[0]	validation_0-mlogloss:1.07719
[10]	validation_0-mlogloss:0.86049
[20]	validation_0-mlogloss:0.75556
[30]	validation_0-mlogloss:0.69563
[40]	validation_0-mlogloss:0.65684
[50]	validation_0-mlogloss:0.63403
[60]	validation_0-mlogloss:0.61878
[70]	validation_0-mlogloss:0.60265
[80]	validation_0-mlogloss:0.59340
[90]	validation_0-mlogloss:0.58336
[100]	validation_0-mlogloss:0.57785
[110]	validation_0-mlogloss:0.57383
[120]	validation_0-mlogloss:0.56949
[130]	validation_0-mlogloss:0.56699
[140]	validation_0-mlogloss:0.56409
[150]	validation_0-mlogloss:0.56362
[160]	validation_0-mlogloss:0.56463
[170]	validation_0-mlogloss:0.56379
[180]	validation_0-mlogloss:0.56233
[190]	validation_0-mlogloss:0.55977
[200]	validation_0-mlogloss:0.55974
[210]	validation_0-mlogloss:0.55911
[220]	validation_0-mlogloss:0.55954
[230]	validation_0-mlogloss:0.55913
[240]	validation_0-mlogloss:0.55906
[250]	validation_0-mlogloss:0.55855
[260]	validation_0-mlogloss:0.55781
[270]	validation_0-mlogloss:0.55676
[28

[32m[I 2023-03-29 14:02:57,225][0m Trial 4 finished with value: 0.5558071736465358 and parameters: {'max_depth': 5, 'subsample': 0.8, 'n_estimators': 3400, 'eta': 0.07474573179889897, 'reg_alpha': 5.141636470332441e-07, 'reg_lambda': 0.2702668839372833, 'gamma': 3.870259449424183e-05, 'min_child_weight': 8, 'colsample_bytree': 0.17480754911661922}. Best is trial 3 with value: 0.5541950169255421.[0m


[0]	validation_0-mlogloss:1.08635
[10]	validation_0-mlogloss:0.95220
[20]	validation_0-mlogloss:0.86482
[30]	validation_0-mlogloss:0.80196
[40]	validation_0-mlogloss:0.75994
[50]	validation_0-mlogloss:0.73213


[32m[I 2023-03-29 14:03:03,928][0m Trial 5 pruned. Trial was pruned at iteration 55.[0m


[0]	validation_0-mlogloss:1.08760


[32m[I 2023-03-29 14:03:04,235][0m Trial 6 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:03:04,567][0m Trial 7 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.09343


[32m[I 2023-03-29 14:03:04,934][0m Trial 8 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.08393
[10]	validation_0-mlogloss:0.94665
[20]	validation_0-mlogloss:0.85110
[30]	validation_0-mlogloss:0.78494
[40]	validation_0-mlogloss:0.73664
[50]	validation_0-mlogloss:0.70385
[60]	validation_0-mlogloss:0.67837
[70]	validation_0-mlogloss:0.65473
[80]	validation_0-mlogloss:0.63765
[90]	validation_0-mlogloss:0.62349
[100]	validation_0-mlogloss:0.61352
[110]	validation_0-mlogloss:0.60515
[120]	validation_0-mlogloss:0.59835
[130]	validation_0-mlogloss:0.59176
[140]	validation_0-mlogloss:0.58680
[150]	validation_0-mlogloss:0.58355
[160]	validation_0-mlogloss:0.58050
[170]	validation_0-mlogloss:0.57665
[180]	validation_0-mlogloss:0.57459
[190]	validation_0-mlogloss:0.57233


[32m[I 2023-03-29 14:03:25,220][0m Trial 9 pruned. Trial was pruned at iteration 195.[0m


[0]	validation_0-mlogloss:1.06195
[10]	validation_0-mlogloss:0.80158
[20]	validation_0-mlogloss:0.69939
[30]	validation_0-mlogloss:0.64589
[40]	validation_0-mlogloss:0.61611
[50]	validation_0-mlogloss:0.59863
[60]	validation_0-mlogloss:0.58820
[70]	validation_0-mlogloss:0.57893
[80]	validation_0-mlogloss:0.57361
[90]	validation_0-mlogloss:0.56929
[100]	validation_0-mlogloss:0.56911
[110]	validation_0-mlogloss:0.56890
[120]	validation_0-mlogloss:0.56639
[130]	validation_0-mlogloss:0.56544
[140]	validation_0-mlogloss:0.56588
[150]	validation_0-mlogloss:0.56570
[160]	validation_0-mlogloss:0.56742
[170]	validation_0-mlogloss:0.56781
[180]	validation_0-mlogloss:0.56782


[32m[I 2023-03-29 14:03:44,423][0m Trial 10 finished with value: 0.56544308328882 and parameters: {'max_depth': 7, 'subsample': 1.0, 'n_estimators': 2750, 'eta': 0.08422935940350774, 'reg_alpha': 1.5144008002615458e-08, 'reg_lambda': 1.1424468064715759e-06, 'gamma': 0.7387828015414005, 'min_child_weight': 10, 'colsample_bytree': 0.24954507324023234}. Best is trial 3 with value: 0.5541950169255421.[0m


[0]	validation_0-mlogloss:1.07504
[10]	validation_0-mlogloss:0.85917
[20]	validation_0-mlogloss:0.75890
[30]	validation_0-mlogloss:0.70261
[40]	validation_0-mlogloss:0.67067
[50]	validation_0-mlogloss:0.64762
[60]	validation_0-mlogloss:0.63394
[70]	validation_0-mlogloss:0.61790
[80]	validation_0-mlogloss:0.60695
[90]	validation_0-mlogloss:0.59555
[100]	validation_0-mlogloss:0.58757
[110]	validation_0-mlogloss:0.58494
[120]	validation_0-mlogloss:0.58255
[130]	validation_0-mlogloss:0.57778
[140]	validation_0-mlogloss:0.57537
[150]	validation_0-mlogloss:0.57567
[160]	validation_0-mlogloss:0.57644
[170]	validation_0-mlogloss:0.57677
[180]	validation_0-mlogloss:0.57585


[32m[I 2023-03-29 14:04:07,780][0m Trial 11 pruned. Trial was pruned at iteration 184.[0m


[0]	validation_0-mlogloss:1.07661
[10]	validation_0-mlogloss:0.89520
[20]	validation_0-mlogloss:0.79320
[30]	validation_0-mlogloss:0.73117
[40]	validation_0-mlogloss:0.68524
[50]	validation_0-mlogloss:0.65953
[60]	validation_0-mlogloss:0.64026
[70]	validation_0-mlogloss:0.61953
[80]	validation_0-mlogloss:0.60637
[90]	validation_0-mlogloss:0.59417
[100]	validation_0-mlogloss:0.58744
[110]	validation_0-mlogloss:0.58343
[120]	validation_0-mlogloss:0.57825
[130]	validation_0-mlogloss:0.57376
[140]	validation_0-mlogloss:0.57097
[150]	validation_0-mlogloss:0.56927
[160]	validation_0-mlogloss:0.56813
[170]	validation_0-mlogloss:0.56589
[180]	validation_0-mlogloss:0.56540
[190]	validation_0-mlogloss:0.56341
[200]	validation_0-mlogloss:0.56313
[210]	validation_0-mlogloss:0.56158
[220]	validation_0-mlogloss:0.56205
[230]	validation_0-mlogloss:0.56081
[240]	validation_0-mlogloss:0.56212
[250]	validation_0-mlogloss:0.56240
[260]	validation_0-mlogloss:0.56428
[270]	validation_0-mlogloss:0.56356
[27

[32m[I 2023-03-29 14:04:37,292][0m Trial 12 finished with value: 0.5608080476628761 and parameters: {'max_depth': 7, 'subsample': 0.7, 'n_estimators': 3650, 'eta': 0.0515611703239618, 'reg_alpha': 9.136000587598434e-07, 'reg_lambda': 0.00420831563865723, 'gamma': 0.00018457823338438427, 'min_child_weight': 11, 'colsample_bytree': 0.2122642688934603}. Best is trial 3 with value: 0.5541950169255421.[0m
[32m[I 2023-03-29 14:04:37,673][0m Trial 13 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.07122
[10]	validation_0-mlogloss:0.86534
[20]	validation_0-mlogloss:0.76167
[30]	validation_0-mlogloss:0.69949
[40]	validation_0-mlogloss:0.65904
[50]	validation_0-mlogloss:0.63531
[60]	validation_0-mlogloss:0.61903
[70]	validation_0-mlogloss:0.60167
[80]	validation_0-mlogloss:0.59039
[90]	validation_0-mlogloss:0.57968
[100]	validation_0-mlogloss:0.57553
[110]	validation_0-mlogloss:0.57210
[120]	validation_0-mlogloss:0.56803
[130]	validation_0-mlogloss:0.56401
[140]	validation_0-mlogloss:0.56245
[150]	validation_0-mlogloss:0.56200
[160]	validation_0-mlogloss:0.56261
[170]	validation_0-mlogloss:0.56120
[180]	validation_0-mlogloss:0.56140
[190]	validation_0-mlogloss:0.55978
[200]	validation_0-mlogloss:0.56045
[210]	validation_0-mlogloss:0.56006
[220]	validation_0-mlogloss:0.56022
[230]	validation_0-mlogloss:0.55999
[240]	validation_0-mlogloss:0.56065
[242]	validation_0-mlogloss:0.56042


[32m[I 2023-03-29 14:04:49,793][0m Trial 14 finished with value: 0.5593407289869233 and parameters: {'max_depth': 6, 'subsample': 1.0, 'n_estimators': 3700, 'eta': 0.06192664194243773, 'reg_alpha': 3.902902532252812e-08, 'reg_lambda': 0.025413942393386135, 'gamma': 0.005850479954470622, 'min_child_weight': 5, 'colsample_bytree': 0.21205212443225147}. Best is trial 3 with value: 0.5541950169255421.[0m


[0]	validation_0-mlogloss:1.08107


[32m[I 2023-03-29 14:04:49,998][0m Trial 15 pruned. Trial was pruned at iteration 1.[0m
[32m[I 2023-03-29 14:04:50,143][0m Trial 16 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:04:50,305][0m Trial 17 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.04837
[10]	validation_0-mlogloss:0.77099
[20]	validation_0-mlogloss:0.66262
[30]	validation_0-mlogloss:0.61513
[40]	validation_0-mlogloss:0.59209
[50]	validation_0-mlogloss:0.58146
[60]	validation_0-mlogloss:0.57273
[70]	validation_0-mlogloss:0.56886
[80]	validation_0-mlogloss:0.56554
[90]	validation_0-mlogloss:0.56430
[100]	validation_0-mlogloss:0.56544
[110]	validation_0-mlogloss:0.56630
[120]	validation_0-mlogloss:0.56524
[130]	validation_0-mlogloss:0.56561
[140]	validation_0-mlogloss:0.56646
[146]	validation_0-mlogloss:0.56647


[32m[I 2023-03-29 14:04:58,515][0m Trial 18 finished with value: 0.5636126284214314 and parameters: {'max_depth': 7, 'subsample': 0.85, 'n_estimators': 3700, 'eta': 0.09493943459732897, 'reg_alpha': 1.6923844715031918e-07, 'reg_lambda': 0.00902209918700889, 'gamma': 0.00032690619284249817, 'min_child_weight': 15, 'colsample_bytree': 0.2902555186538342}. Best is trial 3 with value: 0.5541950169255421.[0m


[0]	validation_0-mlogloss:1.08623


[32m[I 2023-03-29 14:04:58,894][0m Trial 19 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:04:59,236][0m Trial 20 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:04:59,599][0m Trial 21 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:04:59,982][0m Trial 22 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.07653


[32m[I 2023-03-29 14:05:00,689][0m Trial 23 pruned. Trial was pruned at iteration 2.[0m


[0]	validation_0-mlogloss:1.08649


[32m[I 2023-03-29 14:05:01,159][0m Trial 24 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.06653
[10]	validation_0-mlogloss:0.83807
[20]	validation_0-mlogloss:0.73435
[30]	validation_0-mlogloss:0.67673
[40]	validation_0-mlogloss:0.64040
[50]	validation_0-mlogloss:0.62102
[60]	validation_0-mlogloss:0.60678
[70]	validation_0-mlogloss:0.59303
[80]	validation_0-mlogloss:0.58181
[90]	validation_0-mlogloss:0.57417
[100]	validation_0-mlogloss:0.57058
[110]	validation_0-mlogloss:0.56767
[120]	validation_0-mlogloss:0.56361
[130]	validation_0-mlogloss:0.56081
[140]	validation_0-mlogloss:0.55816
[150]	validation_0-mlogloss:0.55866
[160]	validation_0-mlogloss:0.55896
[170]	validation_0-mlogloss:0.55831
[180]	validation_0-mlogloss:0.55712
[190]	validation_0-mlogloss:0.55508
[200]	validation_0-mlogloss:0.55503
[210]	validation_0-mlogloss:0.55481
[220]	validation_0-mlogloss:0.55631
[230]	validation_0-mlogloss:0.55598
[240]	validation_0-mlogloss:0.55826
[245]	validation_0-mlogloss:0.55775


[32m[I 2023-03-29 14:05:15,361][0m Trial 25 finished with value: 0.5542978874400266 and parameters: {'max_depth': 6, 'subsample': 0.8, 'n_estimators': 3100, 'eta': 0.0752645265425817, 'reg_alpha': 2.663739238696e-07, 'reg_lambda': 0.4645383543742413, 'gamma': 0.0021981090211964137, 'min_child_weight': 10, 'colsample_bytree': 0.22662636189698684}. Best is trial 3 with value: 0.5541950169255421.[0m


[0]	validation_0-mlogloss:1.07656
[10]	validation_0-mlogloss:0.86511
[20]	validation_0-mlogloss:0.75943
[30]	validation_0-mlogloss:0.69915
[40]	validation_0-mlogloss:0.65941
[50]	validation_0-mlogloss:0.63767
[60]	validation_0-mlogloss:0.62099
[70]	validation_0-mlogloss:0.60430
[80]	validation_0-mlogloss:0.59333
[90]	validation_0-mlogloss:0.58304
[100]	validation_0-mlogloss:0.57816
[110]	validation_0-mlogloss:0.57405
[120]	validation_0-mlogloss:0.56974
[130]	validation_0-mlogloss:0.56697
[140]	validation_0-mlogloss:0.56503
[150]	validation_0-mlogloss:0.56348
[160]	validation_0-mlogloss:0.56440
[170]	validation_0-mlogloss:0.56415
[180]	validation_0-mlogloss:0.56216
[190]	validation_0-mlogloss:0.56170
[200]	validation_0-mlogloss:0.56089
[210]	validation_0-mlogloss:0.55941
[220]	validation_0-mlogloss:0.56090
[230]	validation_0-mlogloss:0.55977
[240]	validation_0-mlogloss:0.56102
[250]	validation_0-mlogloss:0.56067
[260]	validation_0-mlogloss:0.56105
[262]	validation_0-mlogloss:0.56125


[32m[I 2023-03-29 14:05:28,862][0m Trial 26 finished with value: 0.5591636312903292 and parameters: {'max_depth': 7, 'subsample': 0.8, 'n_estimators': 3050, 'eta': 0.07157712527194895, 'reg_alpha': 2.926331704793976e-07, 'reg_lambda': 0.01235544110796169, 'gamma': 0.0026789875078156444, 'min_child_weight': 12, 'colsample_bytree': 0.1998603426360932}. Best is trial 3 with value: 0.5541950169255421.[0m


[0]	validation_0-mlogloss:1.07713


[32m[I 2023-03-29 14:05:29,267][0m Trial 27 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.07400


[32m[I 2023-03-29 14:05:29,854][0m Trial 28 pruned. Trial was pruned at iteration 1.[0m


[0]	validation_0-mlogloss:1.05601
[10]	validation_0-mlogloss:0.77758
[20]	validation_0-mlogloss:0.67330
[30]	validation_0-mlogloss:0.62628
[40]	validation_0-mlogloss:0.59984
[50]	validation_0-mlogloss:0.58595
[60]	validation_0-mlogloss:0.57558
[70]	validation_0-mlogloss:0.56904
[80]	validation_0-mlogloss:0.56451
[90]	validation_0-mlogloss:0.55855
[100]	validation_0-mlogloss:0.56038
[110]	validation_0-mlogloss:0.55725
[120]	validation_0-mlogloss:0.55374
[130]	validation_0-mlogloss:0.55483
[140]	validation_0-mlogloss:0.55435
[150]	validation_0-mlogloss:0.55380
[160]	validation_0-mlogloss:0.55778
[168]	validation_0-mlogloss:0.55954


[32m[I 2023-03-29 14:05:39,056][0m Trial 29 finished with value: 0.5530204807788763 and parameters: {'max_depth': 8, 'subsample': 0.85, 'n_estimators': 3850, 'eta': 0.09759008931163002, 'reg_alpha': 1.1609075288025477e-07, 'reg_lambda': 0.49656169941578915, 'gamma': 0.03969180560465921, 'min_child_weight': 6, 'colsample_bytree': 0.24059183567792758}. Best is trial 29 with value: 0.5530204807788763.[0m


[0]	validation_0-mlogloss:1.05605
[10]	validation_0-mlogloss:0.77342
[20]	validation_0-mlogloss:0.66991
[30]	validation_0-mlogloss:0.62478
[40]	validation_0-mlogloss:0.59828
[50]	validation_0-mlogloss:0.58567
[60]	validation_0-mlogloss:0.57805
[70]	validation_0-mlogloss:0.57366
[80]	validation_0-mlogloss:0.56635
[90]	validation_0-mlogloss:0.56293
[100]	validation_0-mlogloss:0.56517
[110]	validation_0-mlogloss:0.56311
[120]	validation_0-mlogloss:0.56158
[130]	validation_0-mlogloss:0.56393
[140]	validation_0-mlogloss:0.56539
[150]	validation_0-mlogloss:0.56576
[160]	validation_0-mlogloss:0.56731
[167]	validation_0-mlogloss:0.56921


[32m[I 2023-03-29 14:05:48,312][0m Trial 30 finished with value: 0.5598075220783424 and parameters: {'max_depth': 8, 'subsample': 0.85, 'n_estimators': 3850, 'eta': 0.09790246158307318, 'reg_alpha': 5.988349700644604e-08, 'reg_lambda': 0.00029622625495259107, 'gamma': 0.3542206204634347, 'min_child_weight': 6, 'colsample_bytree': 0.23909733969187297}. Best is trial 29 with value: 0.5530204807788763.[0m


[0]	validation_0-mlogloss:1.06455
[10]	validation_0-mlogloss:0.81827
[20]	validation_0-mlogloss:0.71544
[30]	validation_0-mlogloss:0.66117
[40]	validation_0-mlogloss:0.62658
[50]	validation_0-mlogloss:0.60962
[60]	validation_0-mlogloss:0.59682
[70]	validation_0-mlogloss:0.58438
[80]	validation_0-mlogloss:0.57631
[90]	validation_0-mlogloss:0.57073
[100]	validation_0-mlogloss:0.56623
[110]	validation_0-mlogloss:0.56313
[120]	validation_0-mlogloss:0.56006
[130]	validation_0-mlogloss:0.55916
[140]	validation_0-mlogloss:0.55888
[150]	validation_0-mlogloss:0.55948
[160]	validation_0-mlogloss:0.56280
[170]	validation_0-mlogloss:0.56246
[180]	validation_0-mlogloss:0.56347
[190]	validation_0-mlogloss:0.56526


[32m[I 2023-03-29 14:05:59,803][0m Trial 31 finished with value: 0.5588781589984503 and parameters: {'max_depth': 8, 'subsample': 0.8, 'n_estimators': 3850, 'eta': 0.08180196730663859, 'reg_alpha': 1.361915400236759e-07, 'reg_lambda': 0.3625746833970429, 'gamma': 0.023090193389064435, 'min_child_weight': 6, 'colsample_bytree': 0.2058386259865726}. Best is trial 29 with value: 0.5530204807788763.[0m


[0]	validation_0-mlogloss:1.06742
[10]	validation_0-mlogloss:0.83145
[20]	validation_0-mlogloss:0.71675
[30]	validation_0-mlogloss:0.65555
[40]	validation_0-mlogloss:0.62298
[50]	validation_0-mlogloss:0.60161
[60]	validation_0-mlogloss:0.58719
[70]	validation_0-mlogloss:0.57845
[80]	validation_0-mlogloss:0.56982
[90]	validation_0-mlogloss:0.56452
[100]	validation_0-mlogloss:0.56572
[110]	validation_0-mlogloss:0.56263
[120]	validation_0-mlogloss:0.56004
[130]	validation_0-mlogloss:0.55774
[140]	validation_0-mlogloss:0.55716
[150]	validation_0-mlogloss:0.55602
[160]	validation_0-mlogloss:0.55848
[170]	validation_0-mlogloss:0.55937
[180]	validation_0-mlogloss:0.56184
[190]	validation_0-mlogloss:0.56297
[199]	validation_0-mlogloss:0.56527


[32m[I 2023-03-29 14:06:13,109][0m Trial 32 finished with value: 0.5560193557715358 and parameters: {'max_depth': 8, 'subsample': 0.85, 'n_estimators': 3800, 'eta': 0.06858891667258572, 'reg_alpha': 1.0384803706435423e-07, 'reg_lambda': 0.46161554675510824, 'gamma': 0.052102235547442394, 'min_child_weight': 7, 'colsample_bytree': 0.26500642953710873}. Best is trial 29 with value: 0.5530204807788763.[0m


[0]	validation_0-mlogloss:1.07197


[32m[I 2023-03-29 14:06:13,308][0m Trial 33 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.06358
[10]	validation_0-mlogloss:0.81725
[20]	validation_0-mlogloss:0.71310
[30]	validation_0-mlogloss:0.65789
[40]	validation_0-mlogloss:0.62540
[50]	validation_0-mlogloss:0.60665
[60]	validation_0-mlogloss:0.59595
[70]	validation_0-mlogloss:0.58408
[80]	validation_0-mlogloss:0.57558
[90]	validation_0-mlogloss:0.56855
[100]	validation_0-mlogloss:0.56657
[110]	validation_0-mlogloss:0.56344
[120]	validation_0-mlogloss:0.56493
[130]	validation_0-mlogloss:0.56371
[140]	validation_0-mlogloss:0.56421
[150]	validation_0-mlogloss:0.56371
[160]	validation_0-mlogloss:0.56684
[170]	validation_0-mlogloss:0.56752
[180]	validation_0-mlogloss:0.56956
[182]	validation_0-mlogloss:0.57019


[32m[I 2023-03-29 14:06:24,025][0m Trial 34 finished with value: 0.5623990249922116 and parameters: {'max_depth': 8, 'subsample': 0.9, 'n_estimators': 3450, 'eta': 0.08327718987180972, 'reg_alpha': 2.85168969289288e-07, 'reg_lambda': 0.23221847355919645, 'gamma': 0.001848769985740355, 'min_child_weight': 5, 'colsample_bytree': 0.21722192555009126}. Best is trial 29 with value: 0.5530204807788763.[0m


[0]	validation_0-mlogloss:1.08242


[32m[I 2023-03-29 14:06:24,408][0m Trial 35 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.08574


[32m[I 2023-03-29 14:06:24,801][0m Trial 36 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:06:25,203][0m Trial 37 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.08174


[32m[I 2023-03-29 14:06:25,603][0m Trial 38 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.06490
[10]	validation_0-mlogloss:0.81519
[20]	validation_0-mlogloss:0.70870
[30]	validation_0-mlogloss:0.65335
[40]	validation_0-mlogloss:0.61926
[50]	validation_0-mlogloss:0.60061
[60]	validation_0-mlogloss:0.58797
[70]	validation_0-mlogloss:0.57664
[80]	validation_0-mlogloss:0.56871
[90]	validation_0-mlogloss:0.56507
[100]	validation_0-mlogloss:0.56249
[110]	validation_0-mlogloss:0.56005
[120]	validation_0-mlogloss:0.55573
[130]	validation_0-mlogloss:0.55585
[140]	validation_0-mlogloss:0.55693
[150]	validation_0-mlogloss:0.55851
[160]	validation_0-mlogloss:0.55997
[170]	validation_0-mlogloss:0.56107
[174]	validation_0-mlogloss:0.56152


[32m[I 2023-03-29 14:06:34,551][0m Trial 39 finished with value: 0.555124945391063 and parameters: {'max_depth': 6, 'subsample': 0.8, 'n_estimators': 3400, 'eta': 0.07842383934033467, 'reg_alpha': 2.285470389587829e-07, 'reg_lambda': 0.059354573323654465, 'gamma': 0.03893552372903505, 'min_child_weight': 6, 'colsample_bytree': 0.2490920199630137}. Best is trial 29 with value: 0.5530204807788763.[0m


[0]	validation_0-mlogloss:1.06243
[10]	validation_0-mlogloss:0.81094
[20]	validation_0-mlogloss:0.70090
[30]	validation_0-mlogloss:0.64231
[40]	validation_0-mlogloss:0.61385
[50]	validation_0-mlogloss:0.59911
[60]	validation_0-mlogloss:0.58580
[70]	validation_0-mlogloss:0.57740
[80]	validation_0-mlogloss:0.57155
[90]	validation_0-mlogloss:0.56649
[100]	validation_0-mlogloss:0.56567
[110]	validation_0-mlogloss:0.56395
[120]	validation_0-mlogloss:0.56076
[130]	validation_0-mlogloss:0.55887
[140]	validation_0-mlogloss:0.56041
[150]	validation_0-mlogloss:0.56044
[160]	validation_0-mlogloss:0.56329
[170]	validation_0-mlogloss:0.56502
[180]	validation_0-mlogloss:0.56548


[32m[I 2023-03-29 14:06:43,709][0m Trial 40 finished with value: 0.5588697497892411 and parameters: {'max_depth': 6, 'subsample': 0.85, 'n_estimators': 3300, 'eta': 0.0776780794956047, 'reg_alpha': 6.86771531079228e-08, 'reg_lambda': 0.04895510804480606, 'gamma': 0.03742197356073913, 'min_child_weight': 5, 'colsample_bytree': 0.26778069932752935}. Best is trial 29 with value: 0.5530204807788763.[0m


[0]	validation_0-mlogloss:1.05623
[10]	validation_0-mlogloss:0.77284
[20]	validation_0-mlogloss:0.67199
[30]	validation_0-mlogloss:0.62698
[40]	validation_0-mlogloss:0.59809
[50]	validation_0-mlogloss:0.58480
[60]	validation_0-mlogloss:0.57603
[70]	validation_0-mlogloss:0.56810
[80]	validation_0-mlogloss:0.56269
[90]	validation_0-mlogloss:0.55925
[100]	validation_0-mlogloss:0.55956
[110]	validation_0-mlogloss:0.55573
[120]	validation_0-mlogloss:0.55307
[130]	validation_0-mlogloss:0.55641
[140]	validation_0-mlogloss:0.55703
[150]	validation_0-mlogloss:0.55823
[160]	validation_0-mlogloss:0.56082
[168]	validation_0-mlogloss:0.56269


[32m[I 2023-03-29 14:06:52,503][0m Trial 41 finished with value: 0.5522449565596865 and parameters: {'max_depth': 6, 'subsample': 0.8, 'n_estimators': 3450, 'eta': 0.09924877620060063, 'reg_alpha': 2.0812543641220952e-07, 'reg_lambda': 0.12560426992160323, 'gamma': 0.1321801023158906, 'min_child_weight': 6, 'colsample_bytree': 0.23623464764576532}. Best is trial 41 with value: 0.5522449565596865.[0m


[0]	validation_0-mlogloss:1.05853
[10]	validation_0-mlogloss:0.78368
[20]	validation_0-mlogloss:0.68214
[30]	validation_0-mlogloss:0.63348
[40]	validation_0-mlogloss:0.60518
[50]	validation_0-mlogloss:0.59097
[60]	validation_0-mlogloss:0.58151
[70]	validation_0-mlogloss:0.57378
[80]	validation_0-mlogloss:0.56712
[90]	validation_0-mlogloss:0.56217
[100]	validation_0-mlogloss:0.56251
[110]	validation_0-mlogloss:0.55971
[120]	validation_0-mlogloss:0.55721
[130]	validation_0-mlogloss:0.55893
[140]	validation_0-mlogloss:0.55972
[150]	validation_0-mlogloss:0.56065
[160]	validation_0-mlogloss:0.56415
[168]	validation_0-mlogloss:0.56547


[32m[I 2023-03-29 14:07:02,339][0m Trial 42 finished with value: 0.5571382746534193 and parameters: {'max_depth': 6, 'subsample': 0.8, 'n_estimators': 3050, 'eta': 0.09369498270787004, 'reg_alpha': 1.768953393351831e-07, 'reg_lambda': 0.1291305759575195, 'gamma': 0.160144734181664, 'min_child_weight': 6, 'colsample_bytree': 0.24515620817429343}. Best is trial 41 with value: 0.5522449565596865.[0m


[0]	validation_0-mlogloss:1.05571
[10]	validation_0-mlogloss:0.77404
[20]	validation_0-mlogloss:0.67283
[30]	validation_0-mlogloss:0.62820
[40]	validation_0-mlogloss:0.60027
[50]	validation_0-mlogloss:0.58794
[60]	validation_0-mlogloss:0.57989
[70]	validation_0-mlogloss:0.57115
[80]	validation_0-mlogloss:0.56426
[90]	validation_0-mlogloss:0.55985
[100]	validation_0-mlogloss:0.55769
[110]	validation_0-mlogloss:0.55538
[120]	validation_0-mlogloss:0.55384
[130]	validation_0-mlogloss:0.55618
[140]	validation_0-mlogloss:0.55735
[150]	validation_0-mlogloss:0.55854
[160]	validation_0-mlogloss:0.56050
[168]	validation_0-mlogloss:0.56196


[32m[I 2023-03-29 14:07:10,771][0m Trial 43 finished with value: 0.5532977303913178 and parameters: {'max_depth': 6, 'subsample': 0.8, 'n_estimators': 3600, 'eta': 0.09977331932892333, 'reg_alpha': 2.7423001031410245e-07, 'reg_lambda': 0.2590535786720027, 'gamma': 0.21293834840018372, 'min_child_weight': 5, 'colsample_bytree': 0.23159014490514565}. Best is trial 41 with value: 0.5522449565596865.[0m


[0]	validation_0-mlogloss:1.05726
[10]	validation_0-mlogloss:0.78031
[20]	validation_0-mlogloss:0.68029
[30]	validation_0-mlogloss:0.63360
[40]	validation_0-mlogloss:0.60834
[50]	validation_0-mlogloss:0.59459
[60]	validation_0-mlogloss:0.58537
[70]	validation_0-mlogloss:0.57788
[80]	validation_0-mlogloss:0.57279
[90]	validation_0-mlogloss:0.56630
[100]	validation_0-mlogloss:0.56540
[110]	validation_0-mlogloss:0.56554
[120]	validation_0-mlogloss:0.56428
[130]	validation_0-mlogloss:0.56237
[140]	validation_0-mlogloss:0.56322
[150]	validation_0-mlogloss:0.56703
[160]	validation_0-mlogloss:0.56822
[170]	validation_0-mlogloss:0.56742
[179]	validation_0-mlogloss:0.57009


[32m[I 2023-03-29 14:07:21,598][0m Trial 44 finished with value: 0.5623732044959024 and parameters: {'max_depth': 6, 'subsample': 0.7, 'n_estimators': 3650, 'eta': 0.09852104444431992, 'reg_alpha': 1.290698673885535e-07, 'reg_lambda': 0.4962961352268565, 'gamma': 0.5129703751442928, 'min_child_weight': 5, 'colsample_bytree': 0.2302779100798914}. Best is trial 41 with value: 0.5522449565596865.[0m


[0]	validation_0-mlogloss:1.06952


[32m[I 2023-03-29 14:07:22,108][0m Trial 45 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05794
[10]	validation_0-mlogloss:0.78892
[20]	validation_0-mlogloss:0.68190
[30]	validation_0-mlogloss:0.62830
[40]	validation_0-mlogloss:0.60294
[50]	validation_0-mlogloss:0.58823
[60]	validation_0-mlogloss:0.57617
[70]	validation_0-mlogloss:0.56679
[80]	validation_0-mlogloss:0.56152
[90]	validation_0-mlogloss:0.55715
[100]	validation_0-mlogloss:0.55872
[110]	validation_0-mlogloss:0.55602
[120]	validation_0-mlogloss:0.55403
[130]	validation_0-mlogloss:0.55345
[140]	validation_0-mlogloss:0.55505
[150]	validation_0-mlogloss:0.55362
[160]	validation_0-mlogloss:0.55673
[170]	validation_0-mlogloss:0.55838
[179]	validation_0-mlogloss:0.55973


[32m[I 2023-03-29 14:07:32,141][0m Trial 46 finished with value: 0.5534480294036216 and parameters: {'max_depth': 6, 'subsample': 0.85, 'n_estimators': 3950, 'eta': 0.08849162393043734, 'reg_alpha': 1.0020403115681234e-07, 'reg_lambda': 0.1395102647295898, 'gamma': 0.2546652215118435, 'min_child_weight': 5, 'colsample_bytree': 0.2767067897210072}. Best is trial 41 with value: 0.5522449565596865.[0m


[0]	validation_0-mlogloss:1.05997
[10]	validation_0-mlogloss:0.79656
[20]	validation_0-mlogloss:0.68650
[30]	validation_0-mlogloss:0.63028
[40]	validation_0-mlogloss:0.60400
[50]	validation_0-mlogloss:0.58882
[60]	validation_0-mlogloss:0.57776
[70]	validation_0-mlogloss:0.56981
[80]	validation_0-mlogloss:0.56645
[90]	validation_0-mlogloss:0.56250
[100]	validation_0-mlogloss:0.56503
[110]	validation_0-mlogloss:0.56583
[120]	validation_0-mlogloss:0.56736
[130]	validation_0-mlogloss:0.56793


[32m[I 2023-03-29 14:07:39,451][0m Trial 47 pruned. Trial was pruned at iteration 133.[0m


[0]	validation_0-mlogloss:1.06603


[32m[I 2023-03-29 14:07:39,615][0m Trial 48 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.04368
[10]	validation_0-mlogloss:0.75171
[20]	validation_0-mlogloss:0.64817
[30]	validation_0-mlogloss:0.60464
[40]	validation_0-mlogloss:0.57999
[50]	validation_0-mlogloss:0.56778
[60]	validation_0-mlogloss:0.56099
[70]	validation_0-mlogloss:0.55547
[80]	validation_0-mlogloss:0.55328
[90]	validation_0-mlogloss:0.55295
[100]	validation_0-mlogloss:0.55190
[110]	validation_0-mlogloss:0.55044
[120]	validation_0-mlogloss:0.55127
[130]	validation_0-mlogloss:0.55265
[140]	validation_0-mlogloss:0.55329
[150]	validation_0-mlogloss:0.55596
[155]	validation_0-mlogloss:0.55615


[32m[I 2023-03-29 14:07:47,321][0m Trial 49 finished with value: 0.5502489012619268 and parameters: {'max_depth': 7, 'subsample': 0.7, 'n_estimators': 3800, 'eta': 0.09967223617945833, 'reg_alpha': 4.1181276209901314e-08, 'reg_lambda': 0.040098666589886055, 'gamma': 0.924533874823556, 'min_child_weight': 5, 'colsample_bytree': 0.33274357291224116}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.04963
[10]	validation_0-mlogloss:0.77200
[20]	validation_0-mlogloss:0.66303
[30]	validation_0-mlogloss:0.61456
[40]	validation_0-mlogloss:0.58935
[50]	validation_0-mlogloss:0.57650
[60]	validation_0-mlogloss:0.56786
[70]	validation_0-mlogloss:0.56207
[80]	validation_0-mlogloss:0.55907
[90]	validation_0-mlogloss:0.55498
[100]	validation_0-mlogloss:0.55439
[110]	validation_0-mlogloss:0.55423
[120]	validation_0-mlogloss:0.55516
[130]	validation_0-mlogloss:0.55660
[140]	validation_0-mlogloss:0.55629
[150]	validation_0-mlogloss:0.55778
[153]	validation_0-mlogloss:0.55854


[32m[I 2023-03-29 14:07:55,241][0m Trial 50 finished with value: 0.5529238920399417 and parameters: {'max_depth': 7, 'subsample': 0.7, 'n_estimators': 3750, 'eta': 0.0884220766119296, 'reg_alpha': 4.8580379309643674e-08, 'reg_lambda': 0.037937614193874626, 'gamma': 0.9466905650616205, 'min_child_weight': 5, 'colsample_bytree': 0.34101145640553926}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.04978
[10]	validation_0-mlogloss:0.77277
[20]	validation_0-mlogloss:0.66470
[30]	validation_0-mlogloss:0.61499
[40]	validation_0-mlogloss:0.58931
[50]	validation_0-mlogloss:0.57579
[60]	validation_0-mlogloss:0.56845
[70]	validation_0-mlogloss:0.56211
[80]	validation_0-mlogloss:0.55934
[90]	validation_0-mlogloss:0.55655
[100]	validation_0-mlogloss:0.55553
[110]	validation_0-mlogloss:0.55674
[120]	validation_0-mlogloss:0.55612
[130]	validation_0-mlogloss:0.55847
[140]	validation_0-mlogloss:0.55908
[150]	validation_0-mlogloss:0.56151
[160]	validation_0-mlogloss:0.56253
[165]	validation_0-mlogloss:0.56227


[32m[I 2023-03-29 14:08:04,031][0m Trial 51 finished with value: 0.5550357365000517 and parameters: {'max_depth': 7, 'subsample': 0.7, 'n_estimators': 3750, 'eta': 0.08814433153987479, 'reg_alpha': 3.057262759716854e-08, 'reg_lambda': 0.044372690397025495, 'gamma': 0.9887097080866559, 'min_child_weight': 5, 'colsample_bytree': 0.34235222043102076}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.04492
[10]	validation_0-mlogloss:0.75151
[20]	validation_0-mlogloss:0.64017
[30]	validation_0-mlogloss:0.59949
[40]	validation_0-mlogloss:0.58336
[50]	validation_0-mlogloss:0.57338
[60]	validation_0-mlogloss:0.56660
[70]	validation_0-mlogloss:0.56592
[80]	validation_0-mlogloss:0.56678
[90]	validation_0-mlogloss:0.56448
[100]	validation_0-mlogloss:0.56382
[110]	validation_0-mlogloss:0.56601


[32m[I 2023-03-29 14:08:09,954][0m Trial 52 pruned. Trial was pruned at iteration 113.[0m


[0]	validation_0-mlogloss:1.05169
[10]	validation_0-mlogloss:0.78298
[20]	validation_0-mlogloss:0.67194
[30]	validation_0-mlogloss:0.61954
[40]	validation_0-mlogloss:0.59511
[50]	validation_0-mlogloss:0.58324
[60]	validation_0-mlogloss:0.57368
[70]	validation_0-mlogloss:0.56508
[80]	validation_0-mlogloss:0.56218
[90]	validation_0-mlogloss:0.55972
[100]	validation_0-mlogloss:0.55781
[110]	validation_0-mlogloss:0.55976
[120]	validation_0-mlogloss:0.55746
[130]	validation_0-mlogloss:0.56085
[140]	validation_0-mlogloss:0.56138
[150]	validation_0-mlogloss:0.56490
[160]	validation_0-mlogloss:0.56784
[166]	validation_0-mlogloss:0.56755


[32m[I 2023-03-29 14:08:19,090][0m Trial 53 finished with value: 0.5566382544916804 and parameters: {'max_depth': 7, 'subsample': 0.7, 'n_estimators': 3600, 'eta': 0.0867527403905134, 'reg_alpha': 5.578771129615542e-08, 'reg_lambda': 0.2411600938522127, 'gamma': 0.5203605197954911, 'min_child_weight': 5, 'colsample_bytree': 0.30926717170154844}. Best is trial 49 with value: 0.5502489012619268.[0m
[32m[I 2023-03-29 14:08:19,406][0m Trial 54 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05023
[10]	validation_0-mlogloss:0.77750
[20]	validation_0-mlogloss:0.66924
[30]	validation_0-mlogloss:0.61756
[40]	validation_0-mlogloss:0.59170
[50]	validation_0-mlogloss:0.58169
[60]	validation_0-mlogloss:0.57537
[70]	validation_0-mlogloss:0.56830
[80]	validation_0-mlogloss:0.56630
[90]	validation_0-mlogloss:0.56207
[100]	validation_0-mlogloss:0.56057
[110]	validation_0-mlogloss:0.56228
[120]	validation_0-mlogloss:0.56120
[130]	validation_0-mlogloss:0.56382
[140]	validation_0-mlogloss:0.56377
[150]	validation_0-mlogloss:0.56612
[152]	validation_0-mlogloss:0.56663


[32m[I 2023-03-29 14:08:28,475][0m Trial 55 finished with value: 0.5599432037005703 and parameters: {'max_depth': 7, 'subsample': 0.8, 'n_estimators': 3800, 'eta': 0.08726535831530405, 'reg_alpha': 3.670072381361713e-08, 'reg_lambda': 0.040765814453741694, 'gamma': 0.778035289161364, 'min_child_weight': 5, 'colsample_bytree': 0.29362831848487964}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.04489
[10]	validation_0-mlogloss:0.75196
[20]	validation_0-mlogloss:0.64286
[30]	validation_0-mlogloss:0.60020
[40]	validation_0-mlogloss:0.58418
[50]	validation_0-mlogloss:0.57414
[60]	validation_0-mlogloss:0.56716
[70]	validation_0-mlogloss:0.56445
[80]	validation_0-mlogloss:0.56514
[90]	validation_0-mlogloss:0.56127
[100]	validation_0-mlogloss:0.56297
[110]	validation_0-mlogloss:0.56367
[120]	validation_0-mlogloss:0.56478
[130]	validation_0-mlogloss:0.56581
[140]	validation_0-mlogloss:0.56697
[143]	validation_0-mlogloss:0.56800


[32m[I 2023-03-29 14:08:37,099][0m Trial 56 finished with value: 0.5602666659835625 and parameters: {'max_depth': 6, 'subsample': 0.85, 'n_estimators': 3900, 'eta': 0.0991900174757277, 'reg_alpha': 2.341018273569626e-08, 'reg_lambda': 0.07395126488243102, 'gamma': 0.2226394497836731, 'min_child_weight': 6, 'colsample_bytree': 0.37045025111270374}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.05868


[32m[I 2023-03-29 14:08:37,390][0m Trial 57 pruned. Trial was pruned at iteration 2.[0m


[0]	validation_0-mlogloss:1.05892
[10]	validation_0-mlogloss:0.78779
[20]	validation_0-mlogloss:0.67969
[30]	validation_0-mlogloss:0.62415
[40]	validation_0-mlogloss:0.59982
[50]	validation_0-mlogloss:0.58545
[60]	validation_0-mlogloss:0.57632
[70]	validation_0-mlogloss:0.56924
[80]	validation_0-mlogloss:0.56144
[90]	validation_0-mlogloss:0.55565
[100]	validation_0-mlogloss:0.55543
[110]	validation_0-mlogloss:0.55709
[120]	validation_0-mlogloss:0.55665
[130]	validation_0-mlogloss:0.55505
[140]	validation_0-mlogloss:0.55498
[150]	validation_0-mlogloss:0.55858
[160]	validation_0-mlogloss:0.56032
[170]	validation_0-mlogloss:0.56242
[177]	validation_0-mlogloss:0.56393


[32m[I 2023-03-29 14:08:46,692][0m Trial 58 finished with value: 0.5541785312569317 and parameters: {'max_depth': 7, 'subsample': 0.75, 'n_estimators': 4000, 'eta': 0.08801436926139794, 'reg_alpha': 4.682038444288569e-08, 'reg_lambda': 0.02912437633677731, 'gamma': 0.10655681765024187, 'min_child_weight': 5, 'colsample_bytree': 0.2577132737762833}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.06891


[32m[I 2023-03-29 14:08:46,851][0m Trial 59 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05616


[32m[I 2023-03-29 14:08:47,065][0m Trial 60 pruned. Trial was pruned at iteration 1.[0m


[0]	validation_0-mlogloss:1.06024
[10]	validation_0-mlogloss:0.78905
[20]	validation_0-mlogloss:0.68670
[30]	validation_0-mlogloss:0.63746
[40]	validation_0-mlogloss:0.60820
[50]	validation_0-mlogloss:0.59268
[60]	validation_0-mlogloss:0.58512


[32m[I 2023-03-29 14:08:51,743][0m Trial 61 pruned. Trial was pruned at iteration 62.[0m


[0]	validation_0-mlogloss:1.05982
[10]	validation_0-mlogloss:0.78828
[20]	validation_0-mlogloss:0.68448
[30]	validation_0-mlogloss:0.63508
[40]	validation_0-mlogloss:0.60690
[50]	validation_0-mlogloss:0.59139
[60]	validation_0-mlogloss:0.58398
[70]	validation_0-mlogloss:0.57585
[80]	validation_0-mlogloss:0.56784
[90]	validation_0-mlogloss:0.56178
[100]	validation_0-mlogloss:0.56030
[110]	validation_0-mlogloss:0.55710
[120]	validation_0-mlogloss:0.55657
[130]	validation_0-mlogloss:0.55692
[140]	validation_0-mlogloss:0.55843
[150]	validation_0-mlogloss:0.55983
[160]	validation_0-mlogloss:0.55924
[168]	validation_0-mlogloss:0.56030


[32m[I 2023-03-29 14:09:00,090][0m Trial 62 finished with value: 0.5549551805217653 and parameters: {'max_depth': 7, 'subsample': 0.75, 'n_estimators': 3750, 'eta': 0.0903003687457943, 'reg_alpha': 5.3963608586258535e-08, 'reg_lambda': 0.1661966853358494, 'gamma': 0.3828932916826813, 'min_child_weight': 5, 'colsample_bytree': 0.2543837396946422}. Best is trial 49 with value: 0.5502489012619268.[0m
[32m[I 2023-03-29 14:09:00,266][0m Trial 63 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05310
[10]	validation_0-mlogloss:0.78475
[20]	validation_0-mlogloss:0.67274
[30]	validation_0-mlogloss:0.62196
[40]	validation_0-mlogloss:0.59768
[50]	validation_0-mlogloss:0.58284
[60]	validation_0-mlogloss:0.57501
[70]	validation_0-mlogloss:0.57061
[80]	validation_0-mlogloss:0.56770


[32m[I 2023-03-29 14:09:05,769][0m Trial 64 pruned. Trial was pruned at iteration 82.[0m


[0]	validation_0-mlogloss:1.05959


[32m[I 2023-03-29 14:09:05,991][0m Trial 65 pruned. Trial was pruned at iteration 1.[0m


[0]	validation_0-mlogloss:1.06467


[32m[I 2023-03-29 14:09:06,174][0m Trial 66 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05645
[10]	validation_0-mlogloss:0.77702
[20]	validation_0-mlogloss:0.67718
[30]	validation_0-mlogloss:0.63128
[40]	validation_0-mlogloss:0.60747
[50]	validation_0-mlogloss:0.59241
[60]	validation_0-mlogloss:0.58464


[32m[I 2023-03-29 14:09:08,488][0m Trial 67 pruned. Trial was pruned at iteration 62.[0m
[32m[I 2023-03-29 14:09:08,669][0m Trial 68 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05596


[32m[I 2023-03-29 14:09:08,877][0m Trial 69 pruned. Trial was pruned at iteration 1.[0m


[0]	validation_0-mlogloss:1.06874


[32m[I 2023-03-29 14:09:09,060][0m Trial 70 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05655
[10]	validation_0-mlogloss:0.78163
[20]	validation_0-mlogloss:0.67182
[30]	validation_0-mlogloss:0.61774
[40]	validation_0-mlogloss:0.59510
[50]	validation_0-mlogloss:0.58186
[60]	validation_0-mlogloss:0.57241
[70]	validation_0-mlogloss:0.56850
[80]	validation_0-mlogloss:0.56376
[90]	validation_0-mlogloss:0.55691
[100]	validation_0-mlogloss:0.55820
[110]	validation_0-mlogloss:0.55969
[120]	validation_0-mlogloss:0.55777
[130]	validation_0-mlogloss:0.55855
[140]	validation_0-mlogloss:0.56062
[150]	validation_0-mlogloss:0.56301
[160]	validation_0-mlogloss:0.56864
[165]	validation_0-mlogloss:0.56816


[32m[I 2023-03-29 14:09:19,907][0m Trial 71 finished with value: 0.555150951476253 and parameters: {'max_depth': 7, 'subsample': 0.6, 'n_estimators': 3950, 'eta': 0.09057166914764751, 'reg_alpha': 1.0584796416719197e-07, 'reg_lambda': 0.0675913814055837, 'gamma': 0.22886093448542502, 'min_child_weight': 5, 'colsample_bytree': 0.2657051712500884}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.06456


[32m[I 2023-03-29 14:09:20,165][0m Trial 72 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05573
[10]	validation_0-mlogloss:0.77361
[20]	validation_0-mlogloss:0.67421
[30]	validation_0-mlogloss:0.62565
[40]	validation_0-mlogloss:0.60239
[50]	validation_0-mlogloss:0.58795
[60]	validation_0-mlogloss:0.58160
[70]	validation_0-mlogloss:0.57479


[32m[I 2023-03-29 14:09:23,182][0m Trial 73 pruned. Trial was pruned at iteration 72.[0m


[0]	validation_0-mlogloss:1.06977


[32m[I 2023-03-29 14:09:23,398][0m Trial 74 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05327


[32m[I 2023-03-29 14:09:24,241][0m Trial 75 pruned. Trial was pruned at iteration 3.[0m
[32m[I 2023-03-29 14:09:24,691][0m Trial 76 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.06705


[32m[I 2023-03-29 14:09:25,172][0m Trial 77 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05863
[10]	validation_0-mlogloss:0.78476
[20]	validation_0-mlogloss:0.68131
[30]	validation_0-mlogloss:0.63365
[40]	validation_0-mlogloss:0.60410
[50]	validation_0-mlogloss:0.59093
[60]	validation_0-mlogloss:0.58100
[70]	validation_0-mlogloss:0.57208
[80]	validation_0-mlogloss:0.56552
[90]	validation_0-mlogloss:0.56054
[100]	validation_0-mlogloss:0.56195
[110]	validation_0-mlogloss:0.55946
[120]	validation_0-mlogloss:0.55775
[130]	validation_0-mlogloss:0.55920
[140]	validation_0-mlogloss:0.55957
[150]	validation_0-mlogloss:0.55867
[160]	validation_0-mlogloss:0.56159
[169]	validation_0-mlogloss:0.56357


[32m[I 2023-03-29 14:09:34,966][0m Trial 78 finished with value: 0.5573158231537287 and parameters: {'max_depth': 7, 'subsample': 0.85, 'n_estimators': 2800, 'eta': 0.09231573146467008, 'reg_alpha': 1.2035670337792175e-07, 'reg_lambda': 0.005630317094926402, 'gamma': 0.2670417659833924, 'min_child_weight': 6, 'colsample_bytree': 0.25427676605435245}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.07155


[32m[I 2023-03-29 14:09:35,146][0m Trial 79 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:09:35,354][0m Trial 80 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.06617


[32m[I 2023-03-29 14:09:35,535][0m Trial 81 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:09:35,724][0m Trial 82 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:09:35,933][0m Trial 83 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05700
[10]	validation_0-mlogloss:0.78109
[20]	validation_0-mlogloss:0.67394
[30]	validation_0-mlogloss:0.62092
[40]	validation_0-mlogloss:0.59856
[50]	validation_0-mlogloss:0.58350
[60]	validation_0-mlogloss:0.57255
[70]	validation_0-mlogloss:0.56477
[80]	validation_0-mlogloss:0.56018
[90]	validation_0-mlogloss:0.55624
[100]	validation_0-mlogloss:0.55363
[110]	validation_0-mlogloss:0.55363
[120]	validation_0-mlogloss:0.55207
[130]	validation_0-mlogloss:0.55360
[140]	validation_0-mlogloss:0.55279
[150]	validation_0-mlogloss:0.55310
[160]	validation_0-mlogloss:0.55438
[168]	validation_0-mlogloss:0.55466


[32m[I 2023-03-29 14:09:44,051][0m Trial 84 finished with value: 0.5515019200055448 and parameters: {'max_depth': 6, 'subsample': 0.8, 'n_estimators': 2950, 'eta': 0.09298310479679191, 'reg_alpha': 1.5582959601439845e-07, 'reg_lambda': 0.4957760791951248, 'gamma': 0.2830237961398822, 'min_child_weight': 9, 'colsample_bytree': 0.25957692835515744}. Best is trial 49 with value: 0.5502489012619268.[0m
[32m[I 2023-03-29 14:09:44,229][0m Trial 85 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:09:44,428][0m Trial 86 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.04804
[10]	validation_0-mlogloss:0.76556
[20]	validation_0-mlogloss:0.65730
[30]	validation_0-mlogloss:0.61014
[40]	validation_0-mlogloss:0.58802
[50]	validation_0-mlogloss:0.57808
[60]	validation_0-mlogloss:0.56942
[70]	validation_0-mlogloss:0.56544
[80]	validation_0-mlogloss:0.56528
[90]	validation_0-mlogloss:0.56210
[100]	validation_0-mlogloss:0.56107
[110]	validation_0-mlogloss:0.56321


[32m[I 2023-03-29 14:09:50,931][0m Trial 87 pruned. Trial was pruned at iteration 117.[0m
[32m[I 2023-03-29 14:09:51,115][0m Trial 88 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.06132


[32m[I 2023-03-29 14:09:51,295][0m Trial 89 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:09:51,485][0m Trial 90 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.06554


[32m[I 2023-03-29 14:09:51,678][0m Trial 91 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:09:51,858][0m Trial 92 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:09:52,230][0m Trial 93 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05551
[10]	validation_0-mlogloss:0.77188
[20]	validation_0-mlogloss:0.66587
[30]	validation_0-mlogloss:0.61701
[40]	validation_0-mlogloss:0.59456
[50]	validation_0-mlogloss:0.58143
[60]	validation_0-mlogloss:0.57158
[70]	validation_0-mlogloss:0.56723
[80]	validation_0-mlogloss:0.56326
[90]	validation_0-mlogloss:0.55837
[100]	validation_0-mlogloss:0.55833
[110]	validation_0-mlogloss:0.55824
[120]	validation_0-mlogloss:0.55741
[130]	validation_0-mlogloss:0.55699
[140]	validation_0-mlogloss:0.55720
[150]	validation_0-mlogloss:0.55966
[160]	validation_0-mlogloss:0.56111
[170]	validation_0-mlogloss:0.56144
[176]	validation_0-mlogloss:0.56221


[32m[I 2023-03-29 14:10:02,543][0m Trial 94 finished with value: 0.5564786816695195 and parameters: {'max_depth': 6, 'subsample': 0.8, 'n_estimators': 2650, 'eta': 0.0984413406182523, 'reg_alpha': 2.160155211063961e-07, 'reg_lambda': 0.47871426320147453, 'gamma': 0.527388355207618, 'min_child_weight': 11, 'colsample_bytree': 0.2625309037363252}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.06598


[32m[I 2023-03-29 14:10:02,766][0m Trial 95 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05891


[32m[I 2023-03-29 14:10:02,945][0m Trial 96 pruned. Trial was pruned at iteration 0.[0m
[32m[I 2023-03-29 14:10:03,127][0m Trial 97 pruned. Trial was pruned at iteration 0.[0m


[0]	validation_0-mlogloss:1.05312
[10]	validation_0-mlogloss:0.76655
[20]	validation_0-mlogloss:0.66513
[30]	validation_0-mlogloss:0.61568
[40]	validation_0-mlogloss:0.59364
[50]	validation_0-mlogloss:0.58208
[60]	validation_0-mlogloss:0.57316
[70]	validation_0-mlogloss:0.56817
[80]	validation_0-mlogloss:0.56239
[90]	validation_0-mlogloss:0.55830
[100]	validation_0-mlogloss:0.55947
[110]	validation_0-mlogloss:0.56007
[120]	validation_0-mlogloss:0.55668
[130]	validation_0-mlogloss:0.55825
[140]	validation_0-mlogloss:0.55985
[150]	validation_0-mlogloss:0.56220
[160]	validation_0-mlogloss:0.56384
[170]	validation_0-mlogloss:0.56613
[171]	validation_0-mlogloss:0.56598


[32m[I 2023-03-29 14:10:11,934][0m Trial 98 finished with value: 0.556330393610435 and parameters: {'max_depth': 7, 'subsample': 0.85, 'n_estimators': 3750, 'eta': 0.09994074015308665, 'reg_alpha': 1.481474233924668e-07, 'reg_lambda': 0.3523683310609304, 'gamma': 0.1285048751486935, 'min_child_weight': 6, 'colsample_bytree': 0.2801881217273825}. Best is trial 49 with value: 0.5502489012619268.[0m


[0]	validation_0-mlogloss:1.06711


[32m[I 2023-03-29 14:10:12,128][0m Trial 99 pruned. Trial was pruned at iteration 0.[0m


In [54]:
print('Number of finished trials: {}'.format(len(study.trials)))
print('Best trial:')
trial = study.best_trial

print('  Value: {}'.format(trial.value))
print('  Params: ')

for key, value in trial.params.items():
    print('    {}: {}'.format(key, value))

Number of finished trials: 100
Best trial:
  Value: 0.5502489012619268
  Params: 
    max_depth: 7
    subsample: 0.7
    n_estimators: 3800
    eta: 0.09967223617945833
    reg_alpha: 4.1181276209901314e-08
    reg_lambda: 0.040098666589886055
    gamma: 0.924533874823556
    min_child_weight: 5
    colsample_bytree: 0.33274357291224116


In [55]:
# Train new model on best params from optuna and evaluate performance
best_params = study.best_params
model = xgb.XGBClassifier(early_stopping_rounds = 50, booster = 'gbtree',tree_method= 'approx',enable_categorical=True)
model.set_params(**best_params)
model.fit(X_train,y_train, eval_set = [(X_val,y_val)], verbose = 10) 
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

[0]	validation_0-mlogloss:1.04368
[10]	validation_0-mlogloss:0.75171
[20]	validation_0-mlogloss:0.64817
[30]	validation_0-mlogloss:0.60464
[40]	validation_0-mlogloss:0.57999
[50]	validation_0-mlogloss:0.56778
[60]	validation_0-mlogloss:0.56099
[70]	validation_0-mlogloss:0.55547
[80]	validation_0-mlogloss:0.55328
[90]	validation_0-mlogloss:0.55295
[100]	validation_0-mlogloss:0.55190
[110]	validation_0-mlogloss:0.55044
[120]	validation_0-mlogloss:0.55127
[130]	validation_0-mlogloss:0.55265
[140]	validation_0-mlogloss:0.55329
[150]	validation_0-mlogloss:0.55596
[154]	validation_0-mlogloss:0.55651
              precision    recall  f1-score   support

           0       0.80      0.75      0.77       138
           1       0.59      0.38      0.46        80
           2       0.81      0.95      0.87       225

    accuracy                           0.78       443
   macro avg       0.73      0.69      0.70       443
weighted avg       0.77      0.78      0.77       443



In [56]:
optuna.visualization.plot_param_importances(study)

In [57]:
optuna.visualization.plot_optimization_history(study)

# **Multi-Class Classification with XGBoost**

# **Create Model**

In [None]:
"""
Experiments:
#Zon's dataset (no ordinal features)
1. n_estimators = 5000 (overfit beyond 3900), early_stopping_rounds = 50, learning_rate = 0.001, max_depth = 12, tree_method = 'approx',booster = 'gbtree',enable_categorical=True
logloss:0.55598, Accuracy:0.7742663656884876, F1 Score:[0.76760563 0.4822695  0.86767896]
2. Training Time of Several Hours: n_estimators = 3900 (overfit beyond 2990), early_stopping_rounds = 50, learning_rate = 0.001, max_depth = 12, tree_method = 'approx',booster = 'dart',enable_categorical=True
logloss:0.59134, Accuracy:0.7652370203160271, F1 Score:[0.79856115 0.49350649 0.83700441]
#Wynette's dataset (ordinal features)
3. n_estimators = 3900 (overfit beyond 3400), early_stopping_rounds = 50, learning_rate = 0.001, max_depth = 12, tree_method = 'approx',booster = 'gbtree',enable_categorical=True
logloss:0.55960, Accuracy:0.77, F1 Score:[0.77 0.45  0.85]
4. # drop features with lowest feature_importance score: Previous qualification, Daytime/evening attendance, International
n_estimators = 3900 (overfit beyond 2920), early_stopping_rounds = 50, learning_rate = 0.001, max_depth = 12, tree_method = 'approx',booster = 'gbtree',enable_categorical=True
logloss:0.61181, Accuracy:0.78, F1 Score:[0.76 0.48  0.87]

5. # drop features + scaling of numerical features
n_estimators = 5000 (overfit beyond 3980), early_stopping_rounds = 50, learning_rate = 0.001, max_depth = 12, tree_method = 'approx',booster = 'gbtree',enable_categorical=True
logloss:0.53519, Accuracy:0.77, F1 Score:[0.78 0.43  0.86]

6. # scaling of numerical features
n_estimators = 5000 (overfit beyond 2700), early_stopping_rounds = 50, learning_rate = 0.001, max_depth = 12, tree_method = 'approx',booster = 'gbtree',enable_categorical=True
logloss:0.61050, Accuracy:0.71, F1 Score:[0.71 0.41  0.83]

7. # feature engineering w FeatureTools
n_estimators = 5000 (overfit beyond 3860), early_stopping_rounds = 50, learning_rate = 0.001, max_depth = 12, tree_method = 'approx',booster = 'gbtree',enable_categorical=True
logloss:0.52097, Accuracy:0.75, F1 Score:[0.77 0.39  0.83]
"""
# Logloss better metric for training than accuracy/F1 score. 
# Logloss measures model skill, accuracy/F1 score measures performance on test set, which varies.
model = xgb.XGBClassifier(n_estimators = 5000, early_stopping_rounds = 50, learning_rate = 0.001, max_depth = 12, tree_method = 'approx',booster = 'gbtree',enable_categorical=True)
#model.load_model('model_1.json')
model.fit(X_train, y_train, eval_set = [(X_val,y_val)], verbose = 10)

In [59]:
model.save_model('model_3.json')

In [58]:
# Rank features by importance
df_feature_importance = pd.DataFrame(index = model.get_booster().feature_names, data = model.feature_importances_, columns = ['Importance'])
df_feature_importance.sort_values('Importance',ascending=False)

Unnamed: 0,Importance
Tuition fees up to date,0.106407
Curricular units 2nd sem (approved),0.086541
Curricular units 1st sem (approved),0.078598
Curricular units 2nd sem (grade),0.052596
Scholarship holder,0.036215
Curricular units 1st sem (evaluations),0.035013
Curricular units 2nd sem (enrolled),0.034551
Course,0.034456
Debtor,0.034421
Curricular units 1st sem (enrolled),0.033152


If a feature has a low feature_importance, it doesn’t mean that this feature is uninformative. It only means that the feature was not picked by the tree, likely because another feature encodes the same information.

Drop features with no importance
Train model on new dataset and evaluate

# **Predict and Evaluate Performance**

In [None]:
# Use model to predict on test data
y_test_pred = model.predict(X_test)
df.head()

# Print the Classification Accuracy
print("Test Data")

#print("Accuracy  :\t", model.score(X_test, y_test))
# Dropout, Enrolled, Graduate
#print("F1 Score  :\t",f1_score(y_test, y_test_pred, average=None))

print(classification_report(y_test, y_test_pred))

# Plot the two-way Confusion Matrix
plt.ylabel('Actual')
plt.xlabel('Predicted')
sb.heatmap(confusion_matrix(y_test, y_test_pred), 
           annot = True, fmt=".0f", annot_kws={"size": 18})
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()

# **Visualise Tree**

In [None]:
# Visualise first tree of model
fig = xgb.plot_tree(model)
plt.savefig('XGB_tree_3.png', dpi = 4800)

# **Anomaly Detection with Isolation Forest**

In [None]:
from sklearn.ensemble import IsolationForest

In [None]:
df = pd.read_csv('dataset.csv')

# **Create Model**

In [None]:
model_IF = IsolationForest(n_estimators=5000, contamination = 0.32, max_features= 34, verbose = 10, bootstrap=True)
model_IF.fit(X.values)

# **Predict and Evaluate Performance**

In [None]:
# model.load_model('model_IF_1.json')
# predict on testset
y_test_pred = model_IF.predict(X_test.values)
y_test_pred = pd.Series(y_test_pred)

# merge 'Graduate' and 'Enrolled' classes in y_test
y_test = pd.Series(y_test).replace(to_replace=2, value=1)
# 0:outlier (Dropout), 1: inlier
y_test_pred = y_test_pred.map({1: 1, -1: 0})

print(classification_report(y_test, y_test_pred))
# Plot the two-way Confusion Matrix
sb.heatmap(confusion_matrix(y_test, y_test_pred),annot = True, fmt=".0f", annot_kws={"size": 18})
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()    

In [None]:
model.save_model('model_IF_1.json')