import lightgbm as lgb
import numpy as np

X = np.array(30*[[1]] + 30*[[2]] + 30*[[0]])
y = np.array(60*[5] + 30*[10])
train_data = lgb.Dataset(X, label=y, categorical_feature=[0])
bst = lgb.train({}, train_data, 1)
bst.predict(np.array([[np.NaN], [0.0]]))  # array([6.5 , 6.99999999])
bst.save_model("model.txt")

import treelite
import treelite_runtime

model = treelite.Model.load('model.txt', model_format='lightgbm')
model.export_lib(toolchain="gcc", libpath='./mymodel.so', verbose=True)
predictor = treelite_runtime.Predictor('./mymodel.so', verbose=True)
dmat = treelite_runtime.DMatrix(np.array([[np.NaN], [0.0]]))
predictor.predict(dmat)  # array([6.99999999, 6.99999999])

# just to make sure it's not due to LightGBM model export
x = lgb.Booster(model_file="model.txt")
x.predict(np.array([[np.NaN], [0.0]]))  # array([6.5 , 6.99999999])
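To make the mismatch explicit, here is a small sketch that compares the two prediction arrays reported in the comments above and lists the rows that disagree. The arrays are copied by hand from those comments rather than recomputed, and the variable names are only illustrative.

```python
import numpy as np

# Outputs copied from the comments in the reproduction script above.
lightgbm_pred = np.array([6.5, 6.99999999])         # rows: [NaN], [0.0]
treelite_pred = np.array([6.99999999, 6.99999999])  # rows: [NaN], [0.0]

# Row indices where the two libraries disagree beyond float tolerance.
mismatched_rows = np.flatnonzero(~np.isclose(lightgbm_pred, treelite_pred))
print(mismatched_rows)  # [0], i.e. only the NaN row differs
```

Only row 0, the NaN input, differs; the row with category 0.0 agrees.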
Model plot:
I think this happens because Treelite implements CategoricalDecision from LightGBM's code, where NaNs get mapped to 0.0, but LightGBM seems to run CategoricalDecisionInner from here instead. CategoricalDecisionInner forgoes the NaN ⇒ 0.0 mapping, so NaNs take the right branch, which would explain the different outputs.
So far this is just my hypothesis; I haven't checked yet whether LightGBM runs CategoricalDecision or CategoricalDecisionInner.
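The hypothesis can be illustrated with a pure-Python sketch of the single split in this model. It is not the code of either library: the two function names are hypothetical, and the constants are copied from the Model.txt dump in this issue (cat_threshold=1, i.e. only category 0 goes left). One variant maps NaN to category 0 before the lookup; the other sends NaN straight to the right child.

```python
import math

# Values copied from the Model.txt dump in this issue.
CAT_THRESHOLD = [1]               # bitset words; bit 0 set => category 0 goes left
LEFT_LEAF = 6.9999999920527145    # leaf reached by category 0
RIGHT_LEAF = 6.5000000039736436   # leaf reached by every other category


def in_bitset(words, category):
    """Return True if `category` is set in a list of 32-bit bitset words."""
    word, bit = divmod(category, 32)
    return word < len(words) and (words[word] >> bit) & 1 == 1


def predict_nan_as_zero(value):
    """Hypothetical variant: NaN is mapped to category 0 before the lookup."""
    if math.isnan(value):
        value = 0.0
    category = int(value)
    if category < 0:
        return RIGHT_LEAF
    return LEFT_LEAF if in_bitset(CAT_THRESHOLD, category) else RIGHT_LEAF


def predict_nan_goes_right(value):
    """Hypothetical variant: NaN always takes the right branch."""
    if math.isnan(value) or int(value) < 0:
        return RIGHT_LEAF
    return LEFT_LEAF if in_bitset(CAT_THRESHOLD, int(value)) else RIGHT_LEAF


for v in (float("nan"), 0.0):
    print(v, predict_nan_as_zero(v), predict_nan_goes_right(v))
```

Under these assumptions the first variant reproduces the Treelite output (both rows land in the left leaf), and the second reproduces the LightGBM output (the NaN row lands in the right leaf).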
Version: LightGBM == 3.2.1, treelite == 1.3.0
Model.txt
tree
version=v3
num_class=1
num_tree_per_iteration=1
label_index=0
max_feature_idx=0
objective=regression
feature_names=Column_0
feature_infos=-1:0:1:2
tree_sizes=322

Tree=0
num_leaves=2
num_cat=1
split_feature=0
split_gain=500
threshold=0
decision_type=1
left_child=-1
right_child=-2
leaf_value=6.9999999920527145 6.5000000039736436
leaf_weight=30 60
leaf_count=30 60
internal_value=6.66667
internal_weight=0
internal_count=90
cat_boundaries=0 1
cat_threshold=1
is_linear=0
shrinkage=1
end of trees
feature_importances:
Column_0=1
parameters:
[boosting: gbdt]
[objective: regression]
[metric: ]
[tree_learner: serial]
[device_type: cpu]
[linear_tree: 0]
[data: ]
[valid: ]
[num_iterations: 1]
[learning_rate: 0.1]
[num_leaves: 31]
[num_threads: 0]
[deterministic: 0]
[force_col_wise: 0]
[force_row_wise: 0]
[histogram_pool_size: -1]
[max_depth: -1]
[min_data_in_leaf: 20]
[min_sum_hessian_in_leaf: 0.001]
[bagging_fraction: 1]
[pos_bagging_fraction: 1]
[neg_bagging_fraction: 1]
[bagging_freq: 0]
[bagging_seed: 3]
[feature_fraction: 1]
[feature_fraction_bynode: 1]
[feature_fraction_seed: 2]
[extra_trees: 0]
[extra_seed: 6]
[early_stopping_round: 0]
[first_metric_only: 0]
[max_delta_step: 0]
[lambda_l1: 0]
[lambda_l2: 0]
[linear_lambda: 0]
[min_gain_to_split: 0]
[drop_rate: 0.1]
[max_drop: 50]
[skip_drop: 0.5]
[xgboost_dart_mode: 0]
[uniform_drop: 0]
[drop_seed: 4]
[top_rate: 0.2]
[other_rate: 0.1]
[min_data_per_group: 100]
[max_cat_threshold: 32]
[cat_l2: 10]
[cat_smooth: 10]
[max_cat_to_onehot: 4]
[top_k: 20]
[monotone_constraints: ]
[monotone_constraints_method: basic]
[monotone_penalty: 0]
[feature_contri: ]
[forcedsplits_filename: ]
[refit_decay_rate: 0.9]
[cegb_tradeoff: 1]
[cegb_penalty_split: 0]
[cegb_penalty_feature_lazy: ]
[cegb_penalty_feature_coupled: ]
[path_smooth: 0]
[interaction_constraints: ]
[verbosity: 1]
[saved_feature_importance_type: 0]
[max_bin: 255]
[max_bin_by_feature: ]
[min_data_in_bin: 3]
[bin_construct_sample_cnt: 200000]
[data_random_seed: 1]
[is_enable_sparse: 1]
[enable_bundle: 1]
[use_missing: 1]
[zero_as_missing: 0]
[feature_pre_filter: 1]
[pre_partition: 0]
[two_round: 0]
[header: 0]
[label_column: ]
[weight_column: ]
[group_column: ]
[ignore_column: ]
[categorical_feature: 0]
[forcedbins_filename: ]
[objective_seed: 5]
[num_class: 1]
[is_unbalance: 0]
[scale_pos_weight: 1]
[sigmoid: 1]
[boost_from_average: 1]
[reg_sqrt: 0]
[alpha: 0.9]
[fair_c: 1]
[poisson_max_delta_step: 0.7]
[tweedie_variance_power: 1.5]
[lambdarank_truncation_level: 30]
[lambdarank_norm: 1]
[label_gain: ]
[eval_at: ]
[multi_error_top_k: 1]
[auc_mu_weights: ]
[num_machines: 1]
[local_listen_port: 12400]
[time_out: 120]
[machine_list_filename: ]
[machines: ]
[gpu_platform_id: -1]
[gpu_device_id: -1]
[gpu_use_dp: 0]
[num_gpu: 1]
end of parameters
pandas_categorical:null
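For reading the dump, the decision_type field can be decoded with a short sketch. The bit layout used here is my understanding of LightGBM's encoding and should be treated as an assumption to verify against the LightGBM source: bit 0 marks a categorical split, bit 1 marks default-left, and bits 2 to 3 hold the missing type (0 = None, 1 = Zero, 2 = NaN). The helper name is hypothetical.

```python
# Assumed bit layout of LightGBM's decision_type (verify against the source).
CATEGORICAL_MASK = 1
DEFAULT_LEFT_MASK = 2
MISSING_TYPE_NAMES = {0: "None", 1: "Zero", 2: "NaN"}


def decode_decision_type(decision_type):
    """Split a decision_type integer into its assumed component flags."""
    missing_code = (decision_type >> 2) & 3
    return {
        "categorical": bool(decision_type & CATEGORICAL_MASK),
        "default_left": bool(decision_type & DEFAULT_LEFT_MASK),
        "missing_type": MISSING_TYPE_NAMES.get(missing_code, "unknown"),
    }


# decision_type=1 is the value stored for the single split in Model.txt.
print(decode_decision_type(1))
```

If that layout is right, the split in this model is categorical with missing type None, which is the case where how NaN gets handled depends on the code path taken.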