Closed
Description
When I run rule induction for survival analysis on the complete dataset (training and evaluating on the same data), the Python wrapper and the Java RuleKit give inconsistent results. The differences show up in the IBS (Integrated Brier Score), the number of rules, and the number of conditions per rule. The datasets used for this comparison are BMT-ch and mgus, both taken from RuleKit/data/contrast-sets/survival.
Here is a snippet of my code:
import time

import numpy as np
import pandas as pd
from IPython.display import display
from rulekit.survival import SurvivalRules


def get_ruleset_stats(model) -> pd.DataFrame:
    # Copy the parameters so that dropping the Java handle does not mutate the model.
    params = dict(model.parameters.__dict__)
    params.pop('_java_object', None)
    return pd.DataFrame.from_records([{**params, **model.stats.__dict__}])


def survival_python_wrapper(x, y):
    clf = SurvivalRules(survival_time_attr='survival_time')
    clf.fit(x, y)

    # Time the prediction on the training data.
    start_time = time.time()
    prediction = clf.predict(x)
    end_time = time.time()
    prediction_time = end_time - start_time

    model_stats = get_ruleset_stats(clf.model)
    display(model_stats)

    ibs = clf.score(x, y)
    rules_count = model_stats['rules_count'][0]
    results = {
        'IBS': np.round(ibs, 4),
        'number of rules': rules_count,
        # Total conditions = average conditions per rule * number of rules.
        'number of conditions': model_stats['conditions_per_rule'][0] * rules_count,
        'model building time': model_stats['time_total_s'][0],
        'prediction time': prediction_time,
    }
    display(results)
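To make the comparison concrete, here is a minimal sketch of how the two sets of results could be checked against each other. It assumes the Java figures have been copied by hand into a dictionary with the same keys as `results` above. The helper name `compare_results`, its `metrics` default, and the tolerance are hypothetical and not part of RuleKit.

```python
import math


def compare_results(python_results, java_results,
                    metrics=('IBS', 'number of rules', 'number of conditions'),
                    rel_tol=1e-4):
    """Return {metric: (python_value, java_value)} for every metric that differs.

    Hypothetical helper: both arguments are plain dicts keyed like `results`.
    Timing entries are left out by default because they always vary between runs.
    """
    differences = {}
    for metric in metrics:
        py_val = python_results.get(metric)
        java_val = java_results.get(metric)
        if py_val is None or java_val is None:
            # A metric missing on one side is reported as a difference.
            differences[metric] = (py_val, java_val)
        elif not math.isclose(py_val, java_val, rel_tol=rel_tol):
            differences[metric] = (py_val, java_val)
    return differences
```

An empty dictionary would mean the three metrics agree within the chosen tolerance; any entry names the metric and shows the Python value next to the Java one.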