## Weights & Biases
For tracking and analyzing model performance.

### Dataset & Runs
Names for datasets and runs, for better traceability on wandb.

#### Runs
- **gb-hnf-uc**: Gradient Boosting with HNF & UsageCluster
- **gb-hnf-cu**: Gradient Boosting with HNF & Combined Usage
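The run names follow the pattern `<model>-hnf-<usage variant>`. A minimal sketch, assuming that reading of the abbreviations (`gb` = Gradient Boosting, `uc` = UsageCluster, `cu` = Combined Usage); `MODELS`, `USAGE_VARIANTS`, `run_name` and `describe` are hypothetical helpers, not part of the project. The resulting string can be passed to `wandb.init` through its `name` argument.

```python
# Hypothetical lookup tables for the abbreviations used in the run names above.
MODELS = {"gb": "Gradient Boosting"}
USAGE_VARIANTS = {"uc": "UsageCluster", "cu": "Combined Usage"}


def run_name(model: str, usage: str) -> str:
    """Build a run name such as 'gb-hnf-uc' (model, HNF feature, usage variant)."""
    if model not in MODELS:
        raise ValueError(f"unknown model abbreviation: {model!r}")
    if usage not in USAGE_VARIANTS:
        raise ValueError(f"unknown usage variant: {usage!r}")
    return f"{model}-hnf-{usage}"


def describe(name: str) -> str:
    """Turn a run name back into its human-readable description."""
    model, feature, usage = name.split("-")
    if feature != "hnf":
        raise ValueError(f"unexpected feature part: {feature!r}")
    return f"{MODELS[model]} with HNF & {USAGE_VARIANTS[usage]}"


if __name__ == "__main__":
    name = run_name("gb", "uc")
    print(name)            # gb-hnf-uc
    print(describe(name))  # Gradient Boosting with HNF & UsageCluster
    # e.g. wandb.init(project='Metriken Bauwesen', entity='devcore', name=name, config=config)
```

Validating the abbreviations up front keeps typos out of the wandb project, where a misspelled run name would otherwise show up as a separate, unrelated run.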


### Gradient Boosting
Imports & loading the dataset

In [7]:
```python
import wandb
from numpy import mean
from sklearn import linear_model
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

import src.package.charts as charts
import src.package.importer as im
import src.package.importer_usages as imp_usg
import src.package.ml_helper as ml_helper

# load dataset and expand the usage and garage details
df = im.get_dataset('../package/dataset.csv')
df = imp_usg.extract_usage_details(df)
df = imp_usg.extract_garage_details(df)

# feature matrix X and target y for the HNF dataset
X, y = ml_helper.hnf_dataset_full(df)

wandb.login()

# run metadata shown on wandb
config = {
    'model': 'LinearRegression',
    'features': ', '.join(X.columns.values),
    'dataset-count': len(X.index),
    'description': 'Drop all, Default Hyperparameter'
}
wandb.init(project='Metriken Bauwesen', entity='devcore', config=config)

# train and evaluate model
# NOTE: the GradientBoostingRegressor line is commented out, so this run
# uses LinearRegression (matching config['model'])
# regr = GradientBoostingRegressor(random_state=0)
regr = linear_model.LinearRegression()
scores_map = ml_helper.cross_validation(regr, X, y)

# log the mean of each cross-validation score to wandb (timings are skipped)
for key, scores in scores_map.items():
    if key in {'fit_time', 'score_time'}:
        continue
    wandb.log({f'{key}_mean': mean(scores)})

# use wandb's built-in regression plots on a single 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
regr.fit(X_train, y_train)

# model_name is only the label used in the wandb plots
wandb.sklearn.plot_regressor(regr, X_train, X_test, y_train, y_test, model_name='Gradient Boost')

# feature_importances_ exists on GradientBoostingRegressor, not on LinearRegression
# charts.plot_feature_importance(regr.feature_importances_, X.columns, 'GRADIENT BOOSTING')

# close the run so the summary is uploaded
wandb.finish()
```

```
id                                                   353
source                               wbw_193_1993_3     
verification_status                          VERIFIED_OK
title                  Überbauung Hellmutstrasse, Zürich
neubau_umbau                                      NEUBAU
                                     ...                
nom_bki_kostenstand                                  NaN
total_expenses                                13647000.0
cost_ref_gf                                       6380.0
cost_ref_gfs                                      3660.0
ratio_hnf_gf                                         NaN
Name: 217, Length: 186, dtype: object
```

```
wandb: Plotting Gradient Boost.
wandb: Logged summary metrics.
wandb: Logged learning curve.
wandb: Logged outlier candidates.
wandb: Logged residuals.
```



Run summary:

| Metric | Value |
|---|---|
| `test_r2_mean` | 0.90552 |
| `_runtime` | 11.0 |
| `_timestamp` | 1623078139.0 |
| `_step` | 8.0 |
| `test_neg_mean_absolute_percentage_error_mean` | -0.19485 |
| `test_neg_root_mean_squared_error_mean` | -1251.86036 |
| `test_neg_mean_absolute_error_mean` | -550.31913 |
| `test_max_error_mean` | -8067.56527 |


Run history:

| Metric | History |
|---|---|
| `test_r2_mean` | ▁ |
| `_runtime` | ▁▁▁▁▁▁▃▆█ |
| `_timestamp` | ▁▁▁▁▁▁▃▆█ |
| `_step` | ▁▂▃▄▅▅▆▇█ |
| `test_neg_mean_absolute_percentage_error_mean` | ▁ |
| `test_neg_root_mean_squared_error_mean` | ▁ |
| `test_neg_mean_absolute_error_mean` | ▁ |
| `test_max_error_mean` | ▁ |
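The negative error values in the summary follow scikit-learn's scorer convention: every scorer is oriented so that greater is better, so `neg_mean_absolute_error`, `neg_root_mean_squared_error`, `neg_mean_absolute_percentage_error` and `max_error` are reported negated, and MAPE is a fraction rather than a percent (-0.19485 corresponds to roughly 19.5 %). The helper below is a hypothetical sketch, not part of `ml_helper`, that converts the logged means back into positive error values; its input simply copies the run summary above.

```python
def readable_metrics(summary: dict) -> dict:
    """Strip the 'test_'/'_mean' affixes and undo scikit-learn's sign flip on error scorers."""
    readable = {}
    for key, value in summary.items():
        if key.startswith("_"):  # wandb bookkeeping such as _runtime, _step
            continue
        name = key.removeprefix("test_").removesuffix("_mean")
        if name.startswith("neg_"):
            readable[name.removeprefix("neg_")] = -value
        elif name == "max_error":  # the 'max_error' scorer is negated as well
            readable[name] = -value
        else:  # r2 is already "greater is better"
            readable[name] = value
    return readable


# values copied from the run summary above
summary = {
    "test_r2_mean": 0.90552,
    "_runtime": 11.0,
    "_step": 8.0,
    "test_neg_mean_absolute_percentage_error_mean": -0.19485,
    "test_neg_root_mean_squared_error_mean": -1251.86036,
    "test_neg_mean_absolute_error_mean": -550.31913,
    "test_max_error_mean": -8067.56527,
}

if __name__ == "__main__":
    for name, value in readable_metrics(summary).items():
        print(f"{name}: {value}")
```

The errors are in the units of the target `y`, which this notebook does not state.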


### Linear Regression
Training and testing the model with repeated cross-validation
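`ml_helper.cross_validation` is not shown in this notebook. Judging from the keys it returns (`fit_time`, `score_time`, `test_r2`, `test_neg_...`), it likely wraps scikit-learn's `cross_validate`. The sketch below assumes that, combined with `RepeatedKFold`; the function name `repeated_cross_validation`, the 5 splits × 3 repeats and the synthetic `make_regression` data are hypothetical choices, not the project's settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_validate

# Scorers whose result keys match the metrics logged to wandb above.
SCORING = [
    "r2",
    "neg_mean_absolute_percentage_error",
    "neg_root_mean_squared_error",
    "neg_mean_absolute_error",
    "max_error",
]


def repeated_cross_validation(model, X, y, n_splits=5, n_repeats=3, seed=0):
    """Hypothetical stand-in for ml_helper.cross_validation: repeated k-fold CV."""
    cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    # returns a dict with 'fit_time', 'score_time' and one 'test_<scorer>' array per scorer
    return cross_validate(model, X, y, cv=cv, scoring=SCORING)


if __name__ == "__main__":
    X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
    scores_map = repeated_cross_validation(LinearRegression(), X, y)
    for key, scores in scores_map.items():
        if key in {"fit_time", "score_time"}:
            continue
        print(f"{key}_mean: {np.mean(scores):.4f}")
```

Repeating the k-fold split with different shuffles averages out the luck of a single partition, which matters for a small dataset like this one; each score array has `n_splits * n_repeats` entries.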

In [8]:
```python
# TODO: evaluate model
```

