## Formal Methods in Software Engineering - Anja Ljiljak 1124/22

### Libraries:

For working with neural networks in this project, I use the TensorFlow library.

We did not use this library in class, so I am including installation instructions:

#### Installing the TensorFlow library

Run the following command in your terminal:

```bash
pip install tensorflow
```

#### Loading the libraries

In [12]:
import numpy as np

# Library for working with neural networks
import tensorflow as tf
from sklearn.datasets import fetch_rcv1
from sklearn.model_selection import train_test_split

#### Loading and preparing the data

In [13]:
# Load the RCV1 dataset
rcv1 = fetch_rcv1()
X = rcv1.data
y = rcv1.target

# Split the data into training and test sets (50% each)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Convert the sparse matrices into the SparseTensor objects that TensorFlow needs
X_train_sparse = tf.sparse.SparseTensor(
    indices=np.column_stack(X_train.nonzero()),
    values=X_train.data,
    dense_shape=X_train.shape,
)
X_test_sparse = tf.sparse.SparseTensor(
    indices=np.column_stack(X_test.nonzero()),
    values=X_test.data,
    dense_shape=X_test.shape,
)

# Convert the labels (outputs) to dense tensors
y_train_tensor = tf.convert_to_tensor(y_train.toarray(), dtype=tf.float32)
y_test_tensor = tf.convert_to_tensor(y_test.toarray(), dtype=tf.float32)
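To make the conversion above more concrete, here is a minimal pure-Python sketch of the three pieces a `SparseTensor` is built from: the coordinates of the non-zero entries, their values, and the overall shape. The helper `to_coo` and the `toy` matrix are hypothetical names used only for illustration; they are not part of the notebook, SciPy, or TensorFlow.

```python
# Hypothetical helper (illustration only): list the coordinate-format pieces
# (indices, values, dense_shape) of a small dense matrix.
def to_coo(dense):
    indices, values = [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                indices.append((i, j))   # (row, column) of a non-zero entry
                values.append(v)         # the entry itself, in the same order
    dense_shape = (len(dense), len(dense[0]) if dense else 0)
    return indices, values, dense_shape


# Toy 2x3 matrix standing in for a tiny document-term matrix
toy = [
    [0.0, 0.5, 0.0],
    [0.2, 0.0, 0.0],
]

indices, values, dense_shape = to_coo(toy)
print(indices)      # [(0, 1), (1, 0)]
print(values)       # [0.5, 0.2]
print(dense_shape)  # (2, 3)
```

The entries come out row by row, left to right. As far as I know, this row-major ordering is what `SparseTensor` expects by convention, and `indices` and `values` must be listed in the same order.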

#### Defining the neural network architecture

In [14]:
# I get warnings even though I installed the library, but it works normally
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, Callback
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

model = Sequential([
    Input(shape=(X_train_sparse.shape[1],)),
    # Dense hidden layer with 512 neurons
    Dense(512, activation='relu'),
    Dropout(0.5),
    # Final dense layer (103 output neurons, one per label)
    Dense(y_train_tensor.shape[1], activation='sigmoid')
])

# Configure the model for training and evaluation
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['precision', 'recall'])
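Since the network has one sigmoid output per label, precision and recall here are multi-label metrics. The sketch below shows one way they can be computed by hand. It assumes that each output is turned into a 0/1 prediction with a 0.5 threshold and that the counts are pooled over all samples and labels. As far as I understand, that matches the default behavior of the Keras metrics, but I am treating it as an assumption. The function name and the toy data are made up for illustration.

```python
# Hypothetical helper (illustration only): pooled precision and recall
# for multi-label predictions, using a fixed decision threshold.
def pooled_precision_recall(y_true, y_prob, threshold=0.5):
    tp = fp = fn = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for t, p in zip(true_row, prob_row):
            pred = 1 if p > threshold else 0
            if pred == 1 and t == 1:
                tp += 1      # label predicted and actually present
            elif pred == 1 and t == 0:
                fp += 1      # label predicted but not present
            elif pred == 0 and t == 1:
                fn += 1      # label present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Two toy documents with three possible labels each
y_true = [[1, 0, 1], [0, 1, 0]]
y_prob = [[0.9, 0.6, 0.2], [0.1, 0.8, 0.3]]

precision, recall = pooled_precision_recall(y_true, y_prob)
# Three labels are predicted and two of them are correct;
# three labels are actually present and two of them are found.
print(precision, recall)
```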

#### Training and testing the neural network

In [15]:
class MetricsCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Keras evaluates validation_data (here the test set) before this hook runs,
        # so logs already holds the val_* metrics and no extra model.evaluate call is needed
        print(f"Epoch {epoch+1}/{self.params['epochs']}, train_precision: {logs['precision']:.4f}, train_recall: {logs['recall']:.4f}, test_precision: {logs['val_precision']:.4f}, test_recall: {logs['val_recall']:.4f}")

checkpoint = ModelCheckpoint('best_model.keras', monitor='val_precision', save_best_only=True, mode='max')
early_stopping = EarlyStopping(monitor='val_precision', patience=5, mode='max')

# Train the model, evaluating on the test set after every epoch
history = model.fit(X_train_sparse, y_train_tensor, epochs=10, batch_size=128, validation_data=(X_test_sparse, y_test_tensor), callbacks=[MetricsCallback(), checkpoint, early_stopping])

Epoch 1/10
3143/3143 ━━━━━━━━━━━━━━━━━━━━ 0s 173ms/step - loss: 0.0826 - precision: 0.8037 - recall: 0.5009
Epoch 1/10, train_precision: 0.8917, train_recall: 0.6698, test_precision: 0.9182, test_recall: 0.7943
3143/3143 ━━━━━━━━━━━━━━━━━━━━ 668s 212ms/step - loss: 0.0826 - precision: 0.8037 - recall: 0.5010 - val_loss: 0.0239 - val_precision: 0.9182 - val_recall: 0.7943
Epoch 2/10
3143/3143 ━━━━━━━━━━━━━━━━━━━━ 0s 189ms/step - loss: 0.0231 - precision: 0.9104 - recall: 0.8157
Epoch 2/10, train_precision: 0.9089, train_recall: 0.8215, test_precision: 0.9117, test_recall: 0.8220
3143/3143 ━━━━━━━━━━━━━━━━━━━━ 714s 227ms/step - loss: 0.0231 - precision: 0.9104 - recall: 0.8157 - val_loss: 0.0221 - val_precision: 0.9117 - val_recall: 0.8220
Epoch 3/10
3143/3143 ━━━━━━━━━━━━━━━━━━━━ 0s 189ms/step - loss: 0.0194 - pre

#### Importing the required libraries

In [1]:
import lightgbm as lgb
import optuna
import numpy as np
from sklearn.datasets import fetch_rcv1
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import precision_score
from scipy.sparse import csr_matrix


#### Loading and preparing the RCV1 dataset

In [2]:
# Load the training subset of the RCV1 dataset
rcv1 = fetch_rcv1(subset='train')
X = rcv1.data
y = rcv1.target

# Convert the sparse label matrix to a dense array
y = y.toarray()

# Use 50% of the samples, stratified by the first active label of each document
X, _, y, _ = train_test_split(X, y, train_size=0.5, stratify=np.argmax(y, axis=1))

# Split the remaining samples into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=np.argmax(y, axis=1))
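RCV1 is a multi-label dataset, so a row of `y` can contain several ones. The `np.argmax(y, axis=1)` calls above reduce each row to a single class index. The small sketch below imitates that reduction in plain Python to show what it implies. `first_argmax` and the toy rows are hypothetical and only meant as an illustration.

```python
# Hypothetical helper (illustration only): index of the first maximum in a row,
# which is how argmax behaves when several entries tie for the largest value.
def first_argmax(row):
    return max(range(len(row)), key=lambda j: row[j])


# Toy multi-label rows: 1 means the document carries that label
rows = [
    [0, 1, 1],   # labels 1 and 2
    [1, 0, 1],   # labels 0 and 2
    [0, 0, 1],   # label 2 only
]

classes = [first_argmax(row) for row in rows]
print(classes)  # [1, 0, 2]
```

In other words, a document with several labels is treated as belonging only to its lowest-indexed label. The LightGBM part of this notebook therefore solves a single-label multiclass version of the problem, unlike the neural network above.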


#### Standardizing the data so the model can train more efficiently

In [3]:
# StandardScaler without centering, because the matrix is sparse
scaler = StandardScaler(with_mean=False)
# Fit the scaler on the training data, then apply the same scaling to the test data
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
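The reason for `with_mean=False` is easier to see on a single feature column. Subtracting the mean would turn every zero into a non-zero value and destroy sparsity, while dividing by the standard deviation alone leaves zeros untouched. The sketch below is a minimal illustration of that idea, assuming the population standard deviation is used. The function and the toy column are hypothetical.

```python
import math


# Hypothetical helper (illustration only): scale one feature column to unit
# variance without subtracting the mean, so zeros stay exactly zero.
def scale_without_centering(column):
    n = len(column)
    mean = sum(column) / n
    variance = sum((x - mean) ** 2 for x in column) / n  # population variance
    std = math.sqrt(variance)
    if std == 0:
        return list(column)  # constant column: leave it unchanged
    return [x / std for x in column]


# Toy column that is mostly zeros, like a term weight in a sparse matrix
column = [0.0, 0.0, 2.0, 4.0]
scaled = scale_without_centering(column)
print(scaled)
```

The two zero entries remain zero after scaling, which is what keeps the matrix sparse.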

#### Creating the LightGBM datasets for training and validation

In [4]:
train_data = lgb.Dataset(X_train, label=np.argmax(y_train, axis=1))
valid_data = lgb.Dataset(X_test, label=np.argmax(y_test, axis=1), reference=train_data)

#### Defining the objective function used for hyperparameter optimization

In [5]:
def objective(trial):
    param = {
        'objective': 'multiclass',
        'num_class': y.shape[1],  # RCV1 has 103 classes
        'metric': 'multi_logloss',
        # Turn off logging to minimize the information printed during training
        'verbosity': -1,
        'boosting_type': 'gbdt',  # Gradient Boosting Decision Trees
        # L1 and L2 regularization reduce model complexity to help avoid overfitting
        'lambda_l1': trial.suggest_float('lambda_l1', 1e-8, 10.0, log=True),
        'lambda_l2': trial.suggest_float('lambda_l2', 1e-8, 10.0, log=True),
        # Maximum number of leaves per tree
        'num_leaves': trial.suggest_int('num_leaves', 2, 256),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.4, 1.0),
        'bagging_fraction': trial.suggest_float('bagging_fraction', 0.4, 1.0),
        'bagging_freq': trial.suggest_int('bagging_freq', 1, 10),
        'min_child_samples': trial.suggest_int('min_child_samples', 5, 100),
        'learning_rate': trial.suggest_float('learning_rate', 0.001, 0.1),
        'feature_pre_filter': False  # Disable feature pre-filtering
    }
    # Train the model with the current parameters and track the validation results
    model = lgb.train(param, train_data, valid_sets=[valid_data],
                      callbacks=[lgb.early_stopping(stopping_rounds=10), lgb.log_evaluation(0)])
    # Predict on the test set using the best iteration; returns a probability for each class for each sample
    y_pred = model.predict(X_test, num_iteration=model.best_iteration)
    if y_pred.ndim > 1:
        # Take the index of the class with the highest probability for each sample
        y_pred = np.argmax(y_pred, axis=1)
    # Compute the weighted precision from the true classes and the predictions
    precision = precision_score(np.argmax(y_test, axis=1), y_pred, average='weighted')
    return precision
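The objective relies on `lgb.early_stopping(stopping_rounds=10)` and then predicts with `model.best_iteration`. The sketch below is my simplified reading of that mechanism, not LightGBM's actual code. It tracks the lowest validation loss seen so far and stops once no improvement has occurred for `stopping_rounds` consecutive iterations. The function name and the toy loss values are hypothetical.

```python
# Hypothetical helper (illustration only): simplified early stopping on a
# validation loss that should be minimized. Iterations are numbered from 1.
def best_iteration_with_early_stopping(losses, stopping_rounds):
    best_loss = float("inf")
    best_iteration = 0
    for iteration, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_iteration = iteration
        elif iteration - best_iteration >= stopping_rounds:
            return best_iteration, True    # stopped early
    return best_iteration, False           # ran through all iterations


# Made-up validation losses: the minimum is reached at iteration 2
toy_losses = [0.9, 0.8, 0.85, 0.82, 0.81]

print(best_iteration_with_early_stopping(toy_losses, stopping_rounds=3))   # (2, True)
print(best_iteration_with_early_stopping(toy_losses, stopping_rounds=10))  # (2, False)
```

The two outcomes correspond to the two kinds of messages in the training output: "Early stopping, best iteration is" when the patience runs out, and "Did not meet early stopping" when training uses all boosting rounds.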


#### Hyperparameter optimization using the Optuna library

In [6]:
study = optuna.create_study(direction='maximize')  # Maximize precision
study.optimize(objective, n_trials=50)

print("Number of finished trials: {}".format(len(study.trials)))

print("Best trial:")
trial = study.best_trial

print("  Value: {}".format(trial.value))

print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))
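The value that the study maximizes is the weighted precision returned by `objective`. The sketch below shows how I understand that average: the precision of each class is weighted by how often the class occurs among the true labels. It also shows why a class that is never predicted contributes zero, which is probably what the repeated scikit-learn warning in the output is about. The function and the toy labels are hypothetical, and treating an undefined precision as 0 is an assumption on my part.

```python
from collections import Counter


# Hypothetical helper (illustration only): support-weighted precision for
# single-label multiclass predictions.
def weighted_precision(y_true, y_pred):
    support = Counter(y_true)          # how often each class truly occurs
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        # A class that is never predicted has undefined precision; use 0 here
        precision = correct / predicted if predicted else 0.0
        score += (count / total) * precision
    return score


# Toy example: class 2 occurs once but is never predicted
toy_true = [0, 0, 1, 2]
toy_pred = [0, 1, 1, 1]

result = weighted_precision(toy_true, toy_pred)
# class 0: precision 1/1, weight 2/4
# class 1: precision 1/3, weight 1/4
# class 2: precision 0 (never predicted), weight 1/4
print(result)
```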


[I 2024-06-30 12:01:07,460] A new study created in memory with name: no-name-e7789d00-8e32-4a8c-9ec2-559aba6737ac


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[74]	valid_0's multi_logloss: 0.757088


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 12:02:17,149] Trial 0 finished with value: 0.7806287675090874 and parameters: {'lambda_l1': 1.8058825326333135e-08, 'lambda_l2': 0.011964338539988762, 'num_leaves': 220, 'feature_fraction': 0.9756338269197193, 'bagging_fraction': 0.4384668251439875, 'bagging_freq': 4, 'min_child_samples': 55, 'learning_rate': 0.0675543190548982}. Best is trial 0 with value: 0.7806287675090874.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[69]	valid_0's multi_logloss: 0.761773


[I 2024-06-30 12:04:16,745] Trial 1 finished with value: 0.7912306603993167 and parameters: {'lambda_l1': 0.003597311493596239, 'lambda_l2': 0.14276755814816638, 'num_leaves': 166, 'feature_fraction': 0.5762855118240453, 'bagging_fraction': 0.7077349604660885, 'bagging_freq': 9, 'min_child_samples': 24, 'learning_rate': 0.08165312579161246}. Best is trial 1 with value: 0.7912306603993167.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 1.18502


[I 2024-06-30 12:05:09,840] Trial 2 finished with value: 0.7110263892791447 and parameters: {'lambda_l1': 4.0064243600062805e-05, 'lambda_l2': 0.02460379789129369, 'num_leaves': 3, 'feature_fraction': 0.41993322031716895, 'bagging_fraction': 0.5932725019056392, 'bagging_freq': 9, 'min_child_samples': 37, 'learning_rate': 0.027466270086103677}. Best is trial 1 with value: 0.7912306603993167.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[76]	valid_0's multi_logloss: 0.730609


[I 2024-06-30 12:07:41,118] Trial 3 finished with value: 0.7936896130632477 and parameters: {'lambda_l1': 7.107905663170374e-07, 'lambda_l2': 3.4228918445044586e-08, 'num_leaves': 157, 'feature_fraction': 0.5321255550637529, 'bagging_fraction': 0.7886005318098357, 'bagging_freq': 1, 'min_child_samples': 88, 'learning_rate': 0.05971685131865686}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.831635


[I 2024-06-30 12:08:57,648] Trial 4 finished with value: 0.7724205788106728 and parameters: {'lambda_l1': 0.01342424708316229, 'lambda_l2': 5.864792204661227e-06, 'num_leaves': 129, 'feature_fraction': 0.7224328139439489, 'bagging_fraction': 0.6516154323094716, 'bagging_freq': 9, 'min_child_samples': 62, 'learning_rate': 0.02218305837345736}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[91]	valid_0's multi_logloss: 0.776647


[I 2024-06-30 12:09:55,824] Trial 5 finished with value: 0.7843113389670231 and parameters: {'lambda_l1': 0.00010211826986705488, 'lambda_l2': 6.566647096805194e-05, 'num_leaves': 7, 'feature_fraction': 0.7261702081482679, 'bagging_fraction': 0.9118911338163314, 'bagging_freq': 3, 'min_child_samples': 41, 'learning_rate': 0.09015114288639348}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 1.92552


[I 2024-06-30 12:12:28,801] Trial 6 finished with value: 0.7036653060580546 and parameters: {'lambda_l1': 7.158331034372394e-05, 'lambda_l2': 0.0002192061269136244, 'num_leaves': 108, 'feature_fraction': 0.9092104674485357, 'bagging_fraction': 0.8387732569234121, 'bagging_freq': 1, 'min_child_samples': 16, 'learning_rate': 0.0019368851894893879}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.73093


[I 2024-06-30 12:13:42,195] Trial 7 finished with value: 0.7818781569102851 and parameters: {'lambda_l1': 2.0674228581737166, 'lambda_l2': 7.158840916134377e-07, 'num_leaves': 20, 'feature_fraction': 0.7535198823426492, 'bagging_fraction': 0.9764853704713307, 'bagging_freq': 3, 'min_child_samples': 38, 'learning_rate': 0.05498203687944041}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.995943


[I 2024-06-30 12:14:39,501] Trial 8 finished with value: 0.7450311354008747 and parameters: {'lambda_l1': 0.00013867302878859715, 'lambda_l2': 0.0008797695854257082, 'num_leaves': 30, 'feature_fraction': 0.6616436218563654, 'bagging_fraction': 0.47012547257248366, 'bagging_freq': 2, 'min_child_samples': 85, 'learning_rate': 0.015907806875421874}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 1.58042


[I 2024-06-30 12:16:49,747] Trial 9 finished with value: 0.7552079243001939 and parameters: {'lambda_l1': 2.8727335630988806e-05, 'lambda_l2': 2.2877792807588024e-07, 'num_leaves': 79, 'feature_fraction': 0.7059999988328125, 'bagging_fraction': 0.5557758188467694, 'bagging_freq': 4, 'min_child_samples': 10, 'learning_rate': 0.0041134309949839385}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[99]	valid_0's multi_logloss: 0.732764


[I 2024-06-30 12:19:10,558] Trial 10 finished with value: 0.7883034708348818 and parameters: {'lambda_l1': 7.465917200331241e-08, 'lambda_l2': 1.3568880409775957e-08, 'num_leaves': 239, 'feature_fraction': 0.40721119970239017, 'bagging_fraction': 0.7739896850601388, 'bagging_freq': 6, 'min_child_samples': 96, 'learning_rate': 0.05267321370565553}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[87]	valid_0's multi_logloss: 0.702732


[I 2024-06-30 12:21:07,276] Trial 11 finished with value: 0.7899311286844907 and parameters: {'lambda_l1': 0.023007865306495987, 'lambda_l2': 1.6594424279837832, 'num_leaves': 188, 'feature_fraction': 0.5535001062386931, 'bagging_fraction': 0.7391951746750482, 'bagging_freq': 7, 'min_child_samples': 82, 'learning_rate': 0.08876487605905131}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.707716


[I 2024-06-30 12:23:06,163] Trial 12 finished with value: 0.7837162089446628 and parameters: {'lambda_l1': 4.731394472527256e-07, 'lambda_l2': 9.1198962118842, 'num_leaves': 162, 'feature_fraction': 0.5341327259667793, 'bagging_fraction': 0.8101945165524764, 'bagging_freq': 10, 'min_child_samples': 69, 'learning_rate': 0.07285522087056498}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[79]	valid_0's multi_logloss: 0.761201


[I 2024-06-30 12:25:25,200] Trial 13 finished with value: 0.7923380504366261 and parameters: {'lambda_l1': 1.589711691299787e-06, 'lambda_l2': 0.11545972027193886, 'num_leaves': 172, 'feature_fraction': 0.5657566201914281, 'bagging_fraction': 0.6588824229119915, 'bagging_freq': 8, 'min_child_samples': 23, 'learning_rate': 0.07297001680983389}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.734547


[I 2024-06-30 12:27:13,019] Trial 14 finished with value: 0.7783349944415957 and parameters: {'lambda_l1': 1.324460972024247e-06, 'lambda_l2': 0.00414523365588682, 'num_leaves': 209, 'feature_fraction': 0.48461966286162467, 'bagging_fraction': 0.6416837119832973, 'bagging_freq': 6, 'min_child_samples': 100, 'learning_rate': 0.04104978197745633}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[42]	valid_0's multi_logloss: 0.762471


[I 2024-06-30 12:28:30,659] Trial 15 finished with value: 0.7874297218242635 and parameters: {'lambda_l1': 1.7724901287954228e-06, 'lambda_l2': 1.509858210525433e-08, 'num_leaves': 83, 'feature_fraction': 0.6248300625573262, 'bagging_fraction': 0.8788627321005782, 'bagging_freq': 8, 'min_child_samples': 72, 'learning_rate': 0.0993943773623846}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[50]	valid_0's multi_logloss: 0.948128


[I 2024-06-30 12:30:06,281] Trial 16 finished with value: 0.7631272003749885 and parameters: {'lambda_l1': 4.723117098748917e-06, 'lambda_l2': 7.77560850893292e-06, 'num_leaves': 148, 'feature_fraction': 0.8248231765299752, 'bagging_fraction': 0.5531010588011349, 'bagging_freq': 5, 'min_child_samples': 27, 'learning_rate': 0.06787390112081197}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.74543


[I 2024-06-30 12:33:17,077] Trial 17 finished with value: 0.7882735559742967 and parameters: {'lambda_l1': 1.207680944237511e-07, 'lambda_l2': 0.29009196738918036, 'num_leaves': 254, 'feature_fraction': 0.48344580660417225, 'bagging_fraction': 0.6844017864634135, 'bagging_freq': 1, 'min_child_samples': 5, 'learning_rate': 0.041048539626701325}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[69]	valid_0's multi_logloss: 0.756744


[I 2024-06-30 12:35:06,448] Trial 18 finished with value: 0.7869014597767332 and parameters: {'lambda_l1': 1.0559658453875805e-08, 'lambda_l2': 2.3278971699958408e-05, 'num_leaves': 183, 'feature_fraction': 0.6225204137925673, 'bagging_fraction': 0.7666137897385491, 'bagging_freq': 7, 'min_child_samples': 49, 'learning_rate': 0.059672201459066175}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[98]	valid_0's multi_logloss: 0.736314


[I 2024-06-30 12:37:55,714] Trial 19 finished with value: 0.7826152337430221 and parameters: {'lambda_l1': 7.632608073073884e-06, 'lambda_l2': 2.631032163274184e-07, 'num_leaves': 125, 'feature_fraction': 0.48507382898826756, 'bagging_fraction': 0.9841814501543715, 'bagging_freq': 7, 'min_child_samples': 86, 'learning_rate': 0.04150949754091981}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[53]	valid_0's multi_logloss: 0.77411


[I 2024-06-30 12:38:50,891] Trial 20 finished with value: 0.778694393505794 and parameters: {'lambda_l1': 0.0005174816264228288, 'lambda_l2': 0.0011964074229342194, 'num_leaves': 55, 'feature_fraction': 0.8004062466379417, 'bagging_fraction': 0.5014453382588084, 'bagging_freq': 5, 'min_child_samples': 49, 'learning_rate': 0.0765300811566248}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[62]	valid_0's multi_logloss: 0.75501


[I 2024-06-30 12:40:46,650] Trial 21 finished with value: 0.7842874668277872 and parameters: {'lambda_l1': 0.00220888504653347, 'lambda_l2': 0.18626679893930131, 'num_leaves': 170, 'feature_fraction': 0.5799768699765415, 'bagging_fraction': 0.7041092633028984, 'bagging_freq': 10, 'min_child_samples': 21, 'learning_rate': 0.08147440087139528}. Best is trial 3 with value: 0.7936896130632477.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.717562


[I 2024-06-30 12:42:50,293] Trial 22 finished with value: 0.795304352465662 and parameters: {'lambda_l1': 0.19851783968615042, 'lambda_l2': 0.2659664427322425, 'num_leaves': 144, 'feature_fraction': 0.5953465502714416, 'bagging_fraction': 0.7278572543466821, 'bagging_freq': 8, 'min_child_samples': 33, 'learning_rate': 0.0632571713421006}. Best is trial 22 with value: 0.795304352465662.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.717791


[I 2024-06-30 12:44:00,801] Trial 23 finished with value: 0.7830811859439654 and parameters: {'lambda_l1': 1.4782493950293996, 'lambda_l2': 1.8261744218548577, 'num_leaves': 135, 'feature_fraction': 0.6452602020064969, 'bagging_fraction': 0.6171011417987144, 'bagging_freq': 8, 'min_child_samples': 32, 'learning_rate': 0.061387054225955195}. Best is trial 22 with value: 0.795304352465662.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.709912


[I 2024-06-30 12:46:13,026] Trial 24 finished with value: 0.7891554760753606 and parameters: {'lambda_l1': 0.1813797859836417, 'lambda_l2': 0.04786455115543745, 'num_leaves': 203, 'feature_fraction': 0.5143117787004845, 'bagging_fraction': 0.8189196558640031, 'bagging_freq': 8, 'min_child_samples': 44, 'learning_rate': 0.04639944212281798}. Best is trial 22 with value: 0.795304352465662.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[97]	valid_0's multi_logloss: 0.692559


[I 2024-06-30 12:48:32,476] Trial 25 finished with value: 0.7980040001619023 and parameters: {'lambda_l1': 0.22602522336559708, 'lambda_l2': 0.9012888384757944, 'num_leaves': 102, 'feature_fraction': 0.581333596366441, 'bagging_fraction': 0.7497642328270097, 'bagging_freq': 7, 'min_child_samples': 32, 'learning_rate': 0.0642891549663305}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.713226


[I 2024-06-30 12:50:24,083] Trial 26 finished with value: 0.7890742038808151 and parameters: {'lambda_l1': 0.21792674110934032, 'lambda_l2': 9.01159144620319, 'num_leaves': 99, 'feature_fraction': 0.4536652457018757, 'bagging_fraction': 0.8827706522689267, 'bagging_freq': 6, 'min_child_samples': 58, 'learning_rate': 0.06271210658246767}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.709355


[I 2024-06-30 12:52:35,815] Trial 27 finished with value: 0.7871049772414223 and parameters: {'lambda_l1': 0.08734264420134367, 'lambda_l2': 0.8490728006981257, 'num_leaves': 108, 'feature_fraction': 0.5912144578992116, 'bagging_fraction': 0.7411789123333421, 'bagging_freq': 7, 'min_child_samples': 31, 'learning_rate': 0.05273950599034503}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.808017


[I 2024-06-30 12:54:12,602] Trial 28 finished with value: 0.765924066061384 and parameters: {'lambda_l1': 4.453609108671093, 'lambda_l2': 0.0073044326542149535, 'num_leaves': 60, 'feature_fraction': 0.6752322734786544, 'bagging_fraction': 0.9370484835774756, 'bagging_freq': 4, 'min_child_samples': 66, 'learning_rate': 0.03298690434131696}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[99]	valid_0's multi_logloss: 0.708989


[I 2024-06-30 12:56:37,261] Trial 29 finished with value: 0.7941948712494963 and parameters: {'lambda_l1': 0.7133983682971755, 'lambda_l2': 0.0018323674602320297, 'num_leaves': 134, 'feature_fraction': 0.9907717256949282, 'bagging_fraction': 0.7850350294308805, 'bagging_freq': 5, 'min_child_samples': 56, 'learning_rate': 0.06711211749287342}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[85]	valid_0's multi_logloss: 0.720249


[I 2024-06-30 12:58:45,591] Trial 30 finished with value: 0.7881142687377239 and parameters: {'lambda_l1': 0.4741093675066939, 'lambda_l2': 0.02469017644787624, 'num_leaves': 139, 'feature_fraction': 0.933348662990078, 'bagging_fraction': 0.8539137163655685, 'bagging_freq': 5, 'min_child_samples': 54, 'learning_rate': 0.06847111875011186}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.827852


[I 2024-06-30 12:59:50,068] Trial 31 finished with value: 0.7611613899509817 and parameters: {'lambda_l1': 8.550318118102666, 'lambda_l2': 0.0019960236361966626, 'num_leaves': 154, 'feature_fraction': 0.9600939306073208, 'bagging_fraction': 0.7882832584185346, 'bagging_freq': 2, 'min_child_samples': 45, 'learning_rate': 0.05964927411261145}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[64]	valid_0's multi_logloss: 0.775715


[I 2024-06-30 13:01:16,587] Trial 32 finished with value: 0.7858052637851257 and parameters: {'lambda_l1': 0.031759462655423655, 'lambda_l2': 0.0001713738466358791, 'num_leaves': 115, 'feature_fraction': 0.8916255683927995, 'bagging_fraction': 0.7318400565970908, 'bagging_freq': 3, 'min_child_samples': 33, 'learning_rate': 0.06586558725454689}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[92]	valid_0's multi_logloss: 0.729278


[I 2024-06-30 13:02:09,938] Trial 33 finished with value: 0.7802629008354584 and parameters: {'lambda_l1': 0.004685569147403016, 'lambda_l2': 0.5444192926254577, 'num_leaves': 143, 'feature_fraction': 0.6165568282282227, 'bagging_fraction': 0.4110943539371121, 'bagging_freq': 9, 'min_child_samples': 80, 'learning_rate': 0.07943067843620097}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.718318


[I 2024-06-30 13:04:23,642] Trial 34 finished with value: 0.7869961293596168 and parameters: {'lambda_l1': 0.5324454714102429, 'lambda_l2': 0.044139085554488394, 'num_leaves': 89, 'feature_fraction': 0.5221469338651581, 'bagging_fraction': 0.6924218728407389, 'bagging_freq': 6, 'min_child_samples': 16, 'learning_rate': 0.04904067811708499}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[58]	valid_0's multi_logloss: 0.758082


[I 2024-06-30 13:05:37,966] Trial 35 finished with value: 0.782119134322576 and parameters: {'lambda_l1': 0.0015720708449900329, 'lambda_l2': 1.9087032706923125e-06, 'num_leaves': 68, 'feature_fraction': 0.7760159177935475, 'bagging_fraction': 0.7998771607108738, 'bagging_freq': 9, 'min_child_samples': 76, 'learning_rate': 0.07221830840077191}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[85]	valid_0's multi_logloss: 0.728826


[I 2024-06-30 13:07:07,828] Trial 36 finished with value: 0.7848240612923463 and parameters: {'lambda_l1': 0.05227626950810117, 'lambda_l2': 6.555061497321777e-08, 'num_leaves': 123, 'feature_fraction': 0.8485751601189451, 'bagging_fraction': 0.7622066928908088, 'bagging_freq': 4, 'min_child_samples': 90, 'learning_rate': 0.05646251821451632}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[97]	valid_0's multi_logloss: 0.684308


[I 2024-06-30 13:09:13,668] Trial 37 finished with value: 0.7920343171854755 and parameters: {'lambda_l1': 0.01227237557730757, 'lambda_l2': 3.0563557796316916, 'num_leaves': 156, 'feature_fraction': 0.446432901593087, 'bagging_fraction': 0.8383671915085512, 'bagging_freq': 2, 'min_child_samples': 61, 'learning_rate': 0.08763826355067557}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.715167


[I 2024-06-30 13:10:40,060] Trial 38 finished with value: 0.7890877505917951 and parameters: {'lambda_l1': 1.2679495862478167, 'lambda_l2': 0.0005505554640431702, 'num_leaves': 186, 'feature_fraction': 0.6855207633962468, 'bagging_fraction': 0.7275077992712998, 'bagging_freq': 8, 'min_child_samples': 38, 'learning_rate': 0.04637816897956927}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[63]	valid_0's multi_logloss: 0.773652


[I 2024-06-30 13:12:19,735] Trial 39 finished with value: 0.7877680150243297 and parameters: {'lambda_l1': 0.00041226961165251373, 'lambda_l2': 0.007936770119031756, 'num_leaves': 107, 'feature_fraction': 0.7287584633627231, 'bagging_fraction': 0.6690728330048354, 'bagging_freq': 1, 'min_child_samples': 26, 'learning_rate': 0.06815861146716883}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.800364


[I 2024-06-30 13:15:00,489] Trial 40 finished with value: 0.7811175704799107 and parameters: {'lambda_l1': 0.0075546566875114695, 'lambda_l2': 7.438825323692868e-05, 'num_leaves': 44, 'feature_fraction': 0.5997561211844536, 'bagging_fraction': 0.9420872254858542, 'bagging_freq': 7, 'min_child_samples': 14, 'learning_rate': 0.03302731891335771}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[61]	valid_0's multi_logloss: 0.768966


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 13:16:48,913] Trial 41 finished with value: 0.7824953519351484 and parameters: {'lambda_l1': 4.5721199797622164e-08, 'lambda_l2': 0.06180787041981161, 'num_leaves': 173, 'feature_fraction': 0.5701263290119994, 'bagging_fraction': 0.639801764686227, 'bagging_freq': 8, 'min_child_samples': 35, 'learning_rate': 0.08422702641039141}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[77]	valid_0's multi_logloss: 0.740665


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 13:19:12,073] Trial 42 finished with value: 0.7897928723651769 and parameters: {'lambda_l1': 4.032206776451219e-07, 'lambda_l2': 0.26957447234983284, 'num_leaves': 195, 'feature_fraction': 0.5509066978402717, 'bagging_fraction': 0.6134919523814362, 'bagging_freq': 8, 'min_child_samples': 19, 'learning_rate': 0.07459046620573592}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[82]	valid_0's multi_logloss: 0.768594


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 13:21:06,460] Trial 43 finished with value: 0.789745988389876 and parameters: {'lambda_l1': 2.0272487921264845e-05, 'lambda_l2': 0.024941920431847786, 'num_leaves': 218, 'feature_fraction': 0.6535113935821582, 'bagging_fraction': 0.7199582751666188, 'bagging_freq': 9, 'min_child_samples': 28, 'learning_rate': 0.06420538988991978}. Best is trial 25 with value: 0.7980040001619023.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.73767


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 13:23:35,566] Trial 44 finished with value: 0.7982770235060412 and parameters: {'lambda_l1': 0.17522744413925673, 'lambda_l2': 0.08870705429122101, 'num_leaves': 130, 'feature_fraction': 0.5454918173863869, 'bagging_fraction': 0.7885201159827951, 'bagging_freq': 10, 'min_child_samples': 22, 'learning_rate': 0.07120189868694556}. Best is trial 44 with value: 0.7982770235060412.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.730829


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 13:25:14,158] Trial 45 finished with value: 0.789161440725552 and parameters: {'lambda_l1': 0.3962780501150196, 'lambda_l2': 0.00286560443546977, 'num_leaves': 128, 'feature_fraction': 0.9836070030864125, 'bagging_fraction': 0.7572557924153135, 'bagging_freq': 10, 'min_child_samples': 41, 'learning_rate': 0.05596469544086149}. Best is trial 44 with value: 0.7982770235060412.


Training until validation scores don't improve for 10 rounds
Early stopping, best iteration is:
[81]	valid_0's multi_logloss: 0.737482


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 13:28:04,492] Trial 46 finished with value: 0.784171429401446 and parameters: {'lambda_l1': 0.08897782367142237, 'lambda_l2': 0.6362367163407842, 'num_leaves': 97, 'feature_fraction': 0.5396147420495843, 'bagging_fraction': 0.8256134192883045, 'bagging_freq': 10, 'min_child_samples': 8, 'learning_rate': 0.06996145731725653}. Best is trial 44 with value: 0.7982770235060412.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.727393


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 13:29:37,888] Trial 47 finished with value: 0.7806760474395225 and parameters: {'lambda_l1': 4.0705545762288775, 'lambda_l2': 0.11306606869255453, 'num_leaves': 118, 'feature_fraction': 0.46266217365373535, 'bagging_fraction': 0.8645576804460433, 'bagging_freq': 9, 'min_child_samples': 94, 'learning_rate': 0.09435992122115279}. Best is trial 44 with value: 0.7982770235060412.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.70176


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 13:31:54,795] Trial 48 finished with value: 0.7904104516619228 and parameters: {'lambda_l1': 1.3363102004028158, 'lambda_l2': 3.166710667546944, 'num_leaves': 136, 'feature_fraction': 0.512491124729581, 'bagging_fraction': 0.7728986852574921, 'bagging_freq': 3, 'min_child_samples': 20, 'learning_rate': 0.07850742792067096}. Best is trial 44 with value: 0.7982770235060412.


Training until validation scores don't improve for 10 rounds
Did not meet early stopping. Best iteration is:
[100]	valid_0's multi_logloss: 0.812704


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
[I 2024-06-30 13:34:33,767] Trial 49 finished with value: 0.7853725968307694 and parameters: {'lambda_l1': 0.02079819165683564, 'lambda_l2': 0.013135628464685007, 'num_leaves': 162, 'feature_fraction': 0.6048069088687937, 'bagging_fraction': 0.7958275027018878, 'bagging_freq': 6, 'min_child_samples': 11, 'learning_rate': 0.06398537823364654}. Best is trial 44 with value: 0.7982770235060412.


Number of finished trials: 50
Best trial:
  Value: 0.7982770235060412
  Params: 
    lambda_l1: 0.17522744413925673
    lambda_l2: 0.08870705429122101
    num_leaves: 130
    feature_fraction: 0.5454918173863869
    bagging_fraction: 0.7885201159827951
    bagging_freq: 10
    min_child_samples: 22
    learning_rate: 0.07120189868694556


#### The neural network performs better in terms of precision
lambda_l1 and lambda_l2 control regularization, i.e. they reduce model complexity in order to avoid overfitting. Their range is 1e-8 to 10.0 on a logarithmic scale. I chose this wide range so that the model can find a balance between overfitting and underfitting.
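To illustrate what a logarithmic scale means for this range, here is a minimal sketch that uses only the standard library. It draws values whose logarithm is uniform between log(1e-8) and log(10.0) and counts how many land in each power-of-ten interval. The helper name `sample_log_uniform` is hypothetical and this is not Optuna's own sampler, only an approximation of the idea.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(1e-8, 10.0, rng) for _ in range(10_000)]

# Count the samples per decade: [1e-8, 1e-7), [1e-7, 1e-6), ..., [1, 10).
decade_counts = {}
for value in samples:
    # Clamp to guard against floating-point rounding at the range boundaries.
    decade = min(max(math.floor(math.log10(value)), -8), 0)
    decade_counts[decade] = decade_counts.get(decade, 0) + 1

for decade in sorted(decade_counts):
    print(f"[1e{decade}, 1e{decade + 1}): {decade_counts[decade]}")
```

Each of the nine decades receives roughly the same number of samples, so very small regularization strengths are explored as thoroughly as large ones. With a linear scale, almost all samples would fall between 1 and 10.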

num_leaves ranges from 2 to 256. A smaller number of leaves gives a simpler model and can reduce overfitting, while a larger number lets the model learn more complex patterns but increases the risk of overfitting. Within this range Optuna can test different values to find the best balance between model complexity and performance.
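As a rough guide to what these leaf counts mean for tree size, the sketch below computes the smallest depth a binary tree needs to hold a given number of leaves. The function name `min_depth_for_leaves` is hypothetical. Since LightGBM grows trees leaf-wise, actual trees can be deeper than this lower bound.

```python
def min_depth_for_leaves(num_leaves):
    """Smallest depth d with 2**d >= num_leaves (a perfectly balanced tree)."""
    return (num_leaves - 1).bit_length()


# Bounds of the search range plus the value chosen in the best trial.
for num_leaves in (2, 130, 256):
    print(num_leaves, "leaves need depth >=", min_depth_for_leaves(num_leaves))
```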

feature_fraction, the share of features randomly selected for each tree, ranges from 0.4 to 1.0. The wide range lets Optuna explore how different values affect performance, which can lead to better results.

bagging_fraction ranges from 0.4 to 1.0 and helps reduce overfitting by training trees on a randomly selected subset of the data. The wide range allows better optimization.

bagging_freq determines how often the data is resampled and ranges from 1 to 10. I consider this a suitable range because it applies bagging often enough to benefit from it without overloading the training process.
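The interaction between bagging_fraction and bagging_freq can be sketched with a small simulation. This is a simplified model under stated assumptions, not LightGBM's implementation: every bagging_freq iterations a new subset of rows is drawn without replacement, and the iterations in between reuse it. The function name `bagging_schedule` and the toy sizes are hypothetical.

```python
import random


def bagging_schedule(n_rows, n_iterations, bagging_fraction, bagging_freq, seed=0):
    """Return, for each boosting iteration, the row indices used for training."""
    rng = random.Random(seed)
    subset_size = int(n_rows * bagging_fraction)
    current = list(range(n_rows))  # without bagging, every row is used
    schedule = []
    for iteration in range(n_iterations):
        # Draw a fresh subset only every bagging_freq iterations.
        if bagging_freq > 0 and iteration % bagging_freq == 0:
            current = sorted(rng.sample(range(n_rows), subset_size))
        schedule.append(current)
    return schedule


schedule = bagging_schedule(n_rows=20, n_iterations=6,
                            bagging_fraction=0.5, bagging_freq=3)
for iteration, rows in enumerate(schedule):
    print(iteration, rows)
```

In this toy run, iterations 0 to 2 share one subset of 10 rows and iterations 3 to 5 share another. A smaller bagging_freq resamples more often, and a smaller bagging_fraction makes each subset smaller.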

min_child_samples ranges from 5 to 100, which allows different levels of node splitting. Smaller values can produce very small leaves, which can lead to overfitting, while larger values force every leaf to cover more samples.

learning_rate ranges from 0.001 to 0.1. A lower learning rate can lead to better performance and often lets the model settle into a better minimum, but it needs more boosting rounds. A higher learning rate lets the model learn faster, but it may overshoot the minimum.
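The trade-off between learning rate and number of boosting rounds can be illustrated with an idealized calculation. Assume, purely for illustration, that every tree fits the current residual exactly and is then scaled by learning_rate. After n rounds the fraction of the initial residual that remains is then (1 - learning_rate)^n. Real trees do not fit residuals exactly, so this is only a sketch. The function name `remaining_residual` is hypothetical.

```python
def remaining_residual(learning_rate, n_rounds):
    """Fraction of the initial residual left after n_rounds, assuming each
    tree fits the current residual exactly and is scaled by learning_rate."""
    return (1.0 - learning_rate) ** n_rounds


# Compare the two ends of the search range and a value in between.
for lr in (0.001, 0.01, 0.1):
    left = remaining_residual(lr, 100)
    print(f"learning_rate={lr}: residual left after 100 rounds = {left:.4f}")
```

Under this assumption, a learning rate near 0.001 leaves most of the residual unexplained after 100 rounds, which is one reason very small rates only pay off when many more rounds are allowed.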

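feature_pre_filter is set to False so that all available features stay in use, which can improve model performance. This also matters because min_child_samples changes between trials: with pre-filtering enabled, LightGBM drops features that cannot be split under the min_child_samples value in effect when the dataset is first constructed.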
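Putting the ranges above together, the search space could be written roughly as follows. This is a sketch under assumptions, not necessarily the exact code used for the trials above: the function name `suggest_lgbm_params` is hypothetical, and sampling learning_rate on a linear scale is an assumption, since only lambda_l1 and lambda_l2 are stated to use a logarithmic scale. The `trial` argument is expected to be an Optuna trial, whose `suggest_float` and `suggest_int` methods are used.

```python
def suggest_lgbm_params(trial):
    """Build a LightGBM parameter dict from an Optuna trial object."""
    return {
        # Regularization strengths, sampled on a logarithmic scale.
        "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
        # Tree complexity.
        "num_leaves": trial.suggest_int("num_leaves", 2, 256),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        # Column and row subsampling.
        "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
        "bagging_freq": trial.suggest_int("bagging_freq", 1, 10),
        # Linear scale is an assumption; the text only gives the bounds.
        "learning_rate": trial.suggest_float("learning_rate", 0.001, 0.1),
        # Fixed, not tuned.
        "feature_pre_filter": False,
    }
```

Inside an Optuna objective function this would be called as `params = suggest_lgbm_params(trial)` before training the model with those parameters.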