**WARNING**
For an unknown reason, the Decision Lattice currently works only on small datasets. Earlier (around January 2021) it handled medium-sized datasets (thousands of elements).

In [1]:
!pip install -U -q fcapy==0.1.3

In [2]:
# Load the California housing data from scikit-learn
from sklearn.datasets import fetch_california_housing
california_data = fetch_california_housing(as_frame=True)
df = california_data['data']
y = california_data['target']

In [3]:
from fcapy.mvcontext import MVContext, PS
# define a specific type of PatternStructure for each column of a dataframe
ptypes = {f: PS.IntervalPS for f in df.columns}
# create a MVContext
K = MVContext(
    df.values, target=y.values,
    pattern_types=ptypes, attribute_names=df.columns
)
K

ManyValuedContext (20640 objects, 8 attributes)

In [4]:
n_train, n_test = 100, 100 # 16000, 4000

Split the context into train and test sets

In [5]:
#K_train, K_test = K[:16000], K[16000:]
K_train, K_test = K[:n_train], K[n_train:n_train+n_test]

Initialize a DecisionLattice model (which uses RandomForest in the construction process)

In [6]:
from fcapy.ml.decision_lattice import DecisionLatticeRegressor
rf_params = {'n_estimators':5, 'max_depth':10}
dlr = DecisionLatticeRegressor(algo='RandomForest', algo_params={'rf_params':rf_params})

Fit the model

In [7]:
%time dlr.fit(K_train, use_tqdm=True)

CPU times: user 13.4 s, sys: 14.2 ms, total: 13.4 s
Wall time: 13.5 s


Predict the values

In [8]:
preds_train_dlr = dlr.predict(K_train)
preds_test_dlr = dlr.predict(K_test)

# Sometimes a test object cannot be described by any concept in the ConceptLattice.
# In that case the model predicts None; we replace it with the mean target value of the train context.
preds_test_dlr = [p if p is not None else K_train.target.mean() for p in preds_test_dlr]
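The fallback used above can be sketched without fcapy. This is a minimal illustration on toy values, not part of the library: `fill_missing_predictions` is a hypothetical helper name, and the arrays are made up for the example. It also counts how many predictions were missing, which is worth checking because a large share of fallbacks would make the test MSE mostly reflect the mean baseline.

```python
import numpy as np

def fill_missing_predictions(preds, fallback):
    """Return predictions as a float array, with every None replaced by `fallback`.

    Hypothetical helper for illustration; it mirrors the list comprehension above.
    """
    return np.array([fallback if p is None else p for p in preds], dtype=float)

# Toy stand-ins for K_train.target and the raw model output
train_target = np.array([1.0, 2.0, 3.0])
raw_preds = [1.5, None, 2.5]

# Count the objects the model could not describe before filling them in
n_missing = sum(p is None for p in raw_preds)
filled = fill_missing_predictions(raw_preds, train_target.mean())

print(f"{n_missing} of {len(raw_preds)} predictions were replaced by the train mean")
print(filled)
```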

Calculate the MSE on the train and test sets

In [9]:
from sklearn.metrics import mean_squared_error
mean_squared_error(K_train.target, preds_train_dlr), mean_squared_error(K_test.target, preds_test_dlr)

(0.005696215460586414, 1.2633592142061563)
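For reference, the mean squared error reported above is the average of the squared differences between targets and predictions. The sketch below computes it by hand with NumPy on toy values (the arrays are illustrative, not taken from the housing data); `mse_by_hand` is a hypothetical name.

```python
import numpy as np

def mse_by_hand(y_true, y_pred):
    """Mean of squared residuals; the same quantity mean_squared_error reports."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Toy values: squared residuals are 0.25, 0.25, 0.0 and 1.0
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

toy_mse = mse_by_hand(y_true, y_pred)
print(toy_mse)  # 0.375
```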

Fit a Random Forest model with the same parameters for comparison

In [10]:
from sklearn.ensemble import RandomForestRegressor
rf = RandomForestRegressor(**rf_params)

%time rf.fit(df[:n_train], y[:n_train])

CPU times: user 8.41 ms, sys: 3 µs, total: 8.41 ms
Wall time: 7.94 ms


RandomForestRegressor(max_depth=10, n_estimators=5)

In [11]:
preds_train_rf = rf.predict(df[:n_train])
preds_test_rf = rf.predict(df[n_train:n_train+n_test])

mean_squared_error(K_train.target, preds_train_rf), mean_squared_error(K_test.target, preds_test_rf)

(0.07154953684880447, 0.6678685627999998)
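One way to read the two result pairs side by side is to look at the gap between test and train error for each model. The sketch below only rearranges the numbers printed above; they come from a single run on 100 train and 100 test objects, and the Random Forest is not seeded, so the exact values will differ between runs.

```python
# MSE values copied from the outputs above (single run, 100 train / 100 test objects)
mse = {
    "DecisionLattice": {"train": 0.005696215460586414, "test": 1.2633592142061563},
    "RandomForest": {"train": 0.07154953684880447, "test": 0.6678685627999998},
}

# Gap between test and train error; a larger gap suggests stronger overfitting
gaps = {name: scores["test"] - scores["train"] for name, scores in mse.items()}

for name, gap in gaps.items():
    print(f"{name}: test MSE - train MSE = {gap:.3f}")
```

In this run the Decision Lattice fits the train set more closely but shows the larger gap on the test set.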