from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
iris = pd.concat(load_iris(return_X_y=True, as_frame=True), axis=1)
iris["target"] = iris["target"].astype("category")
amp_iris = iris.copy()
na_where = {}
for c in iris.columns:
    # Blank out 25 randomly chosen rows in each column and remember which ones
    na_where[c] = sorted(np.random.choice(amp_iris.shape[0], size=25, replace=False))
    amp_iris.loc[na_where[c], c] = np.nan
# Only class 0 was imputed
from isotree import IsolationForest
imputer = IsolationForest(
    ntrees=100,
    build_imputer=True,
    ndim=1,
    missing_action="impute"
)
imp_iris = imputer.fit_transform(amp_iris)
t = "target"
imp_iris.loc[na_where[t], t].unique()
# With fewer trees, the imputation is much more accurate
imputer = IsolationForest(
    ntrees=10,
    build_imputer=True,
    ndim=1,
    missing_action="impute"
)
imp_iris = imputer.fit_transform(amp_iris)
(imp_iris.loc[na_where[t], t] == iris.loc[na_where[t], t]).mean()
Using 100 or more trees caused only the first class (0) to ever be imputed. Using only 10 trees usually makes the imputation much more accurate. I tried different max_depth values, but to no avail. Are there any obvious parameters I am missing that would make the categorical imputation more accurate?
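To put numbers on the symptom, it may help to report both the accuracy on the masked rows and which classes were actually imputed. The following is only a minimal sketch using pandas: `imputation_report` is a hypothetical helper (not part of isotree), and the toy series stand in for `iris[t]`, `imp_iris[t]` and `na_where[t]` from the code above.

```python
import pandas as pd


def imputation_report(original, imputed, masked_rows):
    """Hypothetical helper: compare imputed values with the held-out truth.

    Returns (accuracy, counts), where counts maps each imputed class label
    (as a string) to how often it was imputed on the masked rows.
    """
    truth = original.loc[masked_rows].astype(str)
    guess = imputed.loc[masked_rows].astype(str)
    # Compare as strings so that differing categorical dtypes do not raise.
    accuracy = float((truth == guess).mean())
    counts = guess.value_counts().to_dict()
    return accuracy, counts


# Toy stand-in data (assumed for illustration, not real isotree output).
original = pd.Series([0, 0, 1, 1, 2, 2], dtype="category")
imputed = pd.Series([0, 0, 0, 1, 0, 2], dtype="category")
masked_rows = [1, 2, 4]

accuracy, counts = imputation_report(original, imputed, masked_rows)
print(accuracy, counts)
```

With the real data, the call would presumably be `imputation_report(iris[t], imp_iris[t], na_where[t])`. A `counts` dictionary with a single key would confirm that only one class is ever imputed.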
Thanks again for the bug report. There is an issue in the calculations where some numbers become infinite, so in the meantime it is better not to use fit_transform.
See this example: