# 45-dataset tabular benchmark

We use the tabular data benchmark from [27], taken from its repository at https://github.com/LeoGrin/tabular-benchmark, and modified the code as needed to incorporate our method. On all datasets, we grid-search over 5 iterations of RFM with the Laplace kernel, solving kernel regression in closed form at every step. The benchmark consists of:
- 20 medium regression datasets (without categorical variables), 
- 3 large regression datasets (without categorical variables), 
- 15 medium classification datasets (without categorical variables), 
- 4 large classification datasets (without categorical variables), 
- 13 medium regression datasets (with categorical variables), 
- 5 large regression datasets (with categorical variables), 
- 7 medium classification datasets (with categorical variables), and 
- 2 large classification datasets (with categorical variables). 

Following the terminology from [27], “medium” refers to datasets with at most 10000 training examples and “large” refers to those with more than 10000 training examples. 

**Hyperparameter Tuning:** For RFM, we generally grid-searched over ridge regularization parameters in $\{10^{-4}, 10^{-3}, 10^{-2}, 10^{-1}, 1\}$ with fixed bandwidth $L = 10$. For regression, we centered the labels and scaled their variance to 1. On large regression datasets, we also optimized the bandwidth over $\{1, 5, 10, 15, 20\}$.

We searched over two target transformations: the log transform ($\hat{y} = \operatorname{sign}(y) \log(1 + |y|)$) and `sklearn.preprocessing.QuantileTransformer`. In both cases, we inverted the transform before testing. We also searched over two data transformations: `sklearn.preprocessing.StandardScaler` and `sklearn.preprocessing.QuantileTransformer`. Finally, we tuned whether to center the gradients in our computation and whether to keep only the diagonal of the feature matrix. For non-kernel methods, we compare to the metrics reported in [27].
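The log target transform and its inverse can be sketched as follows. This is a minimal illustration, assuming the signed form $\hat{y} = \operatorname{sign}(y)\log(1+|y|)$; the helper names `signed_log` and `signed_log_inverse` are hypothetical and are not taken from the benchmark code.

```python
import numpy as np

def signed_log(y):
    """Signed log transform: sign(y) * log(1 + |y|)."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.log1p(np.abs(y))

def signed_log_inverse(z):
    """Inverse of signed_log: sign(z) * (exp(|z|) - 1)."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.expm1(np.abs(z))

y = np.array([-100.0, -1.0, 0.0, 0.5, 1000.0])   # made-up targets
z = signed_log(y)                  # the model would be fit on z instead of y
y_back = signed_log_inverse(z)     # predictions are mapped back before scoring
print(np.allclose(y, y_back))
```

Keeping the sign is what makes the transform invertible for negative targets, which is needed to undo it before testing.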

**Evaluation Scores:** For classification, we report the average accuracy across the random iterations in each sweep (including random train/val/test splits). For regression, we report the average $R^2$. The reported test score is the average test performance of the model with the highest average validation performance.
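The selection rule can be illustrated with a small sketch. The score arrays below are made-up numbers for illustration only (rows are hyperparameter configurations, columns are random splits); they are not results from the benchmark.

```python
import numpy as np

# Hypothetical validation and test scores:
# one row per configuration, one column per random split.
val_scores = np.array([
    [0.80, 0.82, 0.81],
    [0.85, 0.84, 0.86],
    [0.83, 0.86, 0.82],
])
test_scores = np.array([
    [0.79, 0.81, 0.80],
    [0.84, 0.83, 0.85],
    [0.82, 0.85, 0.80],
])

mean_val = val_scores.mean(axis=1)       # average validation score per configuration
best = int(np.argmax(mean_val))          # configuration with the highest average
reported = test_scores[best].mean()      # its average test score is what gets reported
print(best, reported)
```

Selection uses only the validation scores; the test scores of the other configurations never influence the reported number.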

| Dataset name  | Size (rows) | Features | Description |
|---------------|-------------|----------|-------------|
| california    | 20634       | 8        | https://www.dcc.fc.up.pt/ltorgo/Regression/cal_housing.html |
| Diabetes130US | 71090       | 7        | https://www.openml.org/d/4541 |
| jannis        | 57580       | 54       | https://www.openml.org/search?type=data&sort=runs&id=41168&status=active |
| covertype     | 566602      | 10       | https://www.openml.org/search?type=data&sort=runs&id=293&status=active |
| Higgs         | 940160      | 24       | https://www.openml.org/search?type=data&sort=runs&id=42769&status=active |

In [1]:
# install using `conda install -c conda-forge line_profiler`
%load_ext line_profiler
%load_ext autoreload
%autoreload 2

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, accuracy_score
from sklearn.preprocessing import StandardScaler, QuantileTransformer
import numpy as np
import pandas as pd
from copy import deepcopy

# utils for plotting
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.express as px

# utils for kernel ridge regression
from goodpoints.krr.util_estimators import get_estimator, get_sigma_heuristic
# utils for evaluating kernels
from goodpoints.krr.util_k_mmd import kernel_eval, to_regression_kernel
# utils for generate samples from the data distribution
from goodpoints.krr.util_sample import get_Xy #, ToyData , get_toy_dataset, logistic
from goodpoints.krr.util_load_data import get_real_dataset
# utils for dataset thinning
from goodpoints.krr.util_thin import sd_thin, kt_thin2

In [3]:
# add this to be able to render plotly plots in non-vscode notebooks
import plotly.io as pio
pio.renderers.default = "notebook_connected"

In [4]:
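# helper functions
def sample(arr, n=1000):
    """Draw n entries of arr uniformly at random, with replacement."""
    return arr[np.random.choice(len(arr), n, replace=True)]

def histogram(arr, height=400, width=600):
    """Plot a plotly histogram of arr."""
    return px.histogram(arr, width=width, height=height)

def classification_accuracy(labels, pred):
    """Accuracy after thresholding real-valued predictions at 0.5."""
    decision = pred.copy()
    # classification rule: predict class 1 if the score exceeds 0.5, else class 0
    decision[decision > 0.5] = 1
    decision[decision <= 0.5] = 0
    return accuracy_score(labels, decision)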

## Set hyperparameters

In [5]:
### Regression parameters

kernel = 'laplace'  # ['gauss', 'laplace']
sigma = 10
alpha = 1e-3 # 1.0

### RFM parameters
# usually one is enough to learn "reasonable" features
rfm_iters = 1

### Experiment parameters

k_fold = 5      # k >= 2
n_repeats = 10
use_cross_validation = False

n_jobs = 2 # -1 = use all CPUs
save = False

### Thinning parameters

m = None # Thinned dataset will have size n/2**m

In [6]:
# Determine auxiliary parameters

task = 'classification'
refit = 'accuracy'
postprocess = 'threshold'
ydim = 1

Kernels:
- RBF:
$$\mathbf{k}(x, y) = \exp(-\gamma ||x-y||_2^2)$$
- Laplacian:
$$\mathbf{k}(x, y) = \exp(-\gamma ||x-y||_1)$$

Median heuristic to choose the bandwidth parameter, i.e., the median of the (squared) pairwise distances:
- For Gaussian data, we can compute this approximately in closed form. Assume $X\sim \mathcal{N}(0,\sigma^2 I_d)$, so that $X_1-X_2\sim \mathcal{N}(0,2\sigma^2 I_d)$ for independent copies $X_1, X_2$. For the RBF kernel, $||X_1-X_2||_2^2/(2\sigma^2)$ follows a chi-squared distribution with $d$ degrees of freedom, so $||X_1-X_2||_2^2$ has mean $2\sigma^2 d$ and median roughly $2\sigma^2 d(1-\frac{2}{9d})^3$. For the Laplacian kernel, $||X_1-X_2||_1$ is a sum of $d$ independent half-normal variables (folded normal distributions with zero mean, https://en.wikipedia.org/wiki/Folded_normal_distribution), each with scale $\sqrt{2}\sigma$ and mean $2\sigma/\sqrt{\pi}$, so its mean is $2d\sigma/\sqrt{\pi}$ and, for moderately large $d$, its median is close to that value.
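A quick simulation can sanity-check these Gaussian calculations. The sketch below assumes $X\sim\mathcal{N}(0,\sigma^2 I_d)$ with arbitrary illustrative values of $d$ and $\sigma$; it compares the empirical median of the squared Euclidean distances with $2\sigma^2 d(1-\frac{2}{9d})^3$ and the empirical mean of the L1 distances with $2d\sigma/\sqrt{\pi}$. It does not use `get_sigma_heuristic`, whose internals are not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma, n = 8, 1.5, 20000            # illustrative values, not from the experiments

X = rng.normal(scale=sigma, size=(n, d))
# Pair up disjoint rows so that the differences are independent draws.
diff = X[: n // 2] - X[n // 2 :]

sq_l2 = (diff ** 2).sum(axis=1)        # squared Euclidean distances
l1 = np.abs(diff).sum(axis=1)          # L1 distances

# Chi-squared median approximation, rescaled by the variance 2 * sigma^2.
approx_sq_median = 2 * sigma**2 * d * (1 - 2 / (9 * d)) ** 3
# Mean of a sum of d half-normal variables with scale sqrt(2) * sigma.
l1_mean = 2 * d * sigma / np.sqrt(np.pi)

print("median squared L2 distance:", np.median(sq_l2), "approx:", approx_sq_median)
print("mean L1 distance:", l1.mean(), "theory:", l1_mean)
```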

Available kernels in sklearn: 
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics.pairwise

## Get dataset

For list of available subsets and their OpenML links:
https://huggingface.co/datasets/inria-soda/tabular-benchmark

```
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103185/credit.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361055
Task URL.............: https://www.openml.org/t/361055
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: SeriousDlqin2yrs
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103245/electricity.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361060
Task URL.............: https://www.openml.org/t/361060
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: class
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103246/covertype.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361061
Task URL.............: https://www.openml.org/t/361061
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: Y
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103247/pol.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361062
Task URL.............: https://www.openml.org/t/361062
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: binaryClass
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103248/house_16H.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361063
Task URL.............: https://www.openml.org/t/361063
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: binaryClass
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103250/MagicTelescope.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361065
Task URL.............: https://www.openml.org/t/361065
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: class
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103251/bank-marketing.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361066
Task URL.............: https://www.openml.org/t/361066
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: Class
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103253/MiniBooNE.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361068
Task URL.............: https://www.openml.org/t/361068
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: signal
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103254/Higgs.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361069
Task URL.............: https://www.openml.org/t/361069
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: target
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22103255/eye_movements.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361070
Task URL.............: https://www.openml.org/t/361070
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: label
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22111908/Diabetes130US.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361273
Task URL.............: https://www.openml.org/t/361273
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: readmitted
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22111907/jannis.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361274
Task URL.............: https://www.openml.org/t/361274
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: class
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22111906/default-of-credit-card-clients.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361275
Task URL.............: https://www.openml.org/t/361275
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: y
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22111905/Bioresponse.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361276
Task URL.............: https://www.openml.org/t/361276
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: target
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22111914/california.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361277
Task URL.............: https://www.openml.org/t/361277
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: price_above_median
# of Classes.........: 2
Cost Matrix..........: Available
WARNING:root:Received uncompressed content from OpenML for https://api.openml.org/data/v1/download/22111912/heloc.arff.
OpenML Classification Task
==========================
Task Type Description: https://www.openml.org/tt/TaskType.SUPERVISED_CLASSIFICATION
Task ID..............: 361278
Task URL.............: https://www.openml.org/t/361278
Estimation Procedure.: crossvalidation
Evaluation Measure...: predictive_accuracy
Target Feature.......: RiskPerformance
# of Classes.........: 2
Cost Matrix..........: Available

```

In [7]:
import openml
openml.config.apikey = 'e6d1ecc68afe6fbcd296c034335dd888'  # set the OpenML Api Key

# # SUITE_ID = 336 # Regression on numerical features
# SUITE_ID = 337 # Classification on numerical features
# #SUITE_ID = 335 # Regression on numerical and categorical features
# #SUITE_ID = 334 # Classification on numerical and categorical features
# benchmark_suite = openml.study.get_suite(SUITE_ID)  # obtain the benchmark suite
# for task_id in benchmark_suite.tasks:  # iterate over all tasks
#     task = openml.tasks.get_task(task_id)  # download the OpenML task
#     dataset = task.get_dataset()
#     X, y, categorical_indicator, attribute_names = dataset.get_data(
#         dataset_format="dataframe", target=dataset.default_target_attribute
#     )

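dataset = openml.tasks.get_task(361277).get_dataset() # download the OpenML dataset (task 361277: california)
print(task)  # NOTE: `task` is the task-type string set in In [6], not the OpenML task object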

classification


In [8]:
X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format="array", target=dataset.default_target_attribute
)
print('converting types')
X = X.astype(np.float64)
y = y.astype(np.float64)

print('normalizing X')
X_mean = X.mean(0, keepdims=True)
X_std = X.std(0, keepdims=True)
X -= X_mean
X /= X_std
# use QuantileTransformer to normalize X
# X = QuantileTransformer(
#     output_distribution="normal", random_state=42
# ).fit_transform(X)

converting types
normalizing X


In [9]:
X.shape, y.shape

((20634, 8), (20634,))

In [10]:
X[:10]

array([[-8.88588377e-01, -2.09816541e-01, -3.66863611e-01,
        -3.69682934e-01, -9.89654814e-01, -8.58627479e-02,
         2.06878536e+00, -1.26304969e+00],
       [-4.18631666e-01,  2.66967114e-01, -3.25786175e-01,
        -2.39017596e-01,  2.15756595e+00,  1.47184040e-01,
        -1.33995416e+00,  1.25266373e+00],
       [-1.07766586e+00,  9.02678654e-01, -2.94229551e-01,
         6.28993468e-02, -4.39354916e-01,  4.67020988e-02,
         9.91848597e-01, -1.29300125e+00],
       [-1.26769093e+00,  6.64286826e-01, -4.77432337e-01,
         1.89394221e-02,  2.18708364e-01,  5.15203017e-02,
        -7.68708685e-01,  6.43700852e-01],
       [-1.03408122e+00, -1.24284779e+00, -7.98617715e-01,
        -2.36861393e-01, -1.00643764e+00,  1.34617040e-01,
        -7.78073570e-01,  7.03596342e-01],
       [ 2.21325638e-03,  1.61785414e+00, -2.00243811e-01,
        -2.50522920e-01, -8.45675868e-01, -4.66739360e-02,
         1.00589503e+00, -1.29799000e+00],
       [-1.19678682e+00,  4.258949

In [11]:
y[:10]

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=(k_fold-1)/k_fold, 
                                                    shuffle=True, random_state=42)

In [13]:
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

(16507, 8) (16507,)
(4127, 8) (4127,)


In [14]:
histogram(np.linalg.norm(X_train, axis=1, ord=2))

In [15]:
heur_sigma, distances = get_sigma_heuristic(X_train, sample_size=200, return_dist=True)
print('heuristic bandwidth:', heur_sigma)

heuristic bandwidth: 2.928627765678848


In [16]:
histogram(sample(distances, 10000))

In [17]:
if ydim == 1:
    fig = histogram(y_train)
else:
    fig = histogram(np.argmax(y_train, axis=-1))
fig.show()

### Standard Thinning (ST)

In [18]:
%%time
sd_coreset = sd_thin(X_train, m=m)
print('sd coreset:', len(sd_coreset))
X_train_sd_thin, y_train_sd_thin = X_train[sd_coreset], y_train[sd_coreset]

sd coreset: 128
CPU times: user 453 µs, sys: 266 µs, total: 719 µs
Wall time: 515 µs
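For intuition, one common reading of standard thinning is to keep evenly spaced points of the input sequence, so that the output has size $n/2^m$. The sketch below illustrates only that idea; `standard_thin` is a hypothetical helper, and it is an assumption that `sd_thin` behaves similarly (in particular, how it picks the output size when `m=None` is not shown here).

```python
import numpy as np

def standard_thin(n, m):
    """Return indices that keep every 2**m-th point of a length-n sequence."""
    step = 2 ** m
    return np.arange(step - 1, n, step)

idx = standard_thin(16, m=2)   # keeps 16 / 2**2 = 4 points
print(idx)
```

The returned indices can then be used to subset both inputs and labels, as in `X_train[idx], y_train[idx]`.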


### Kernel Thinning (KT)

In [19]:
from functools import partial

# KERNEL THINNING

# Define kernel params
d = X_train.shape[-1]
var_k = sigma**2
params_k_swap = {"name": kernel, "var": var_k, "d": int(d)}
params_k_split = {"name": kernel, "var": var_k, "d": int(d)}

split_kernel = partial(kernel_eval, params_k=params_k_split)
swap_kernel = partial(kernel_eval, params_k=params_k_swap)

regression_split_kernel = to_regression_kernel(split_kernel, ydim=ydim)
regression_swap_kernel = to_regression_kernel(swap_kernel, ydim=ydim)

In [20]:
Xy_train = get_Xy(X_train, y_train)
print(Xy_train.shape)


(16507, 9)


In [21]:
# %lprun -f kt_thin3 kt_coreset = kt_thin3(X_train, split_kernel, swap_kernel, m=m)

In [22]:
# from goodpoints.compress import compress_gsn_kt
# X_intermediate = compress_gsn_kt(X_train)

In [23]:
# from goodpoints import compress
# %lprun -f compress.compresspp ktr_coreset = kt_thin2(Xy_train, regression_split_kernel, regression_swap_kernel, m=m, store_K=True)

| n | 5,000 | 20,000 |
| -------- | -------- | -------- |
| store_K=True | 7.9s | 46.9s |
| store_K=False | 20.8s | 1m59s |

In [24]:
# X_train_ktr_thin, y_train_ktr_thin = X_train[ktr_coreset], y_train[ktr_coreset]

In [25]:
# X_train_ktr_thin.shape

In [26]:
# print('n:', len(Xy_train))
# log2n = int(np.log2(len(Xy_train)))
# log4n = int(np.log2(len(Xy_train)) / 2)
# print('log2n:', log2n)
# print('log4n:', log4n) 

# print('2^log2n:', 2**log2n)
# print('4^log4n:', 4**log4n)

# for i in range(log2n // 2 + 1):
#     with TicToc():
#         print(i, kt_thin2(Xy_train, regression_split_kernel, regression_swap_kernel, m=i).shape[0])
#         print(i, sd_thin(X_train, m=i).shape[0])

## KRR (Full)

In [27]:
krr_full = get_estimator(
    'regression',
    'full', 
    alpha=alpha, 
    kernel=kernel, 
    sigma=sigma, 
    postprocess=None, # no postprocessing so that we can compute the MSE
)

In [28]:
krr_full

In [29]:
%%time
K_full = krr_full.fit(X_train, y_train)

CPU times: user 39.5 s, sys: 7.36 s, total: 46.9 s
Wall time: 13 s
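As a rough picture of what a full KRR fit involves, the sketch below solves kernel ridge regression in closed form, $(K + \alpha I)\,c = y$, on synthetic data. It assumes the L2-norm Laplace kernel $\exp(-||x-z||_2/\sigma)$ that is commonly used with RFM; whether `get_estimator` uses exactly this form is an assumption, and `laplace_kernel`, `krr_fit` and `krr_predict` are hypothetical helper names.

```python
import numpy as np

def laplace_kernel(X, Z, sigma):
    """Laplace kernel exp(-||x - z||_2 / sigma) for all pairs of rows."""
    sq = (X**2).sum(1)[:, None] + (Z**2).sum(1)[None, :] - 2 * X @ Z.T
    dist = np.sqrt(np.maximum(sq, 0.0))   # clip tiny negatives caused by round-off
    return np.exp(-dist / sigma)

def krr_fit(X, y, sigma, alpha):
    """Solve (K + alpha * I) coef = y in closed form."""
    K = laplace_kernel(X, X, sigma)
    return np.linalg.solve(K + alpha * np.eye(len(X)), y)

def krr_predict(X_new, X_fit, coef, sigma):
    """Predict as a kernel-weighted combination of the fitted coefficients."""
    return laplace_kernel(X_new, X_fit, sigma) @ coef

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 8))
y_demo = (X_demo[:, 0] > 0).astype(float)    # synthetic binary labels
coef = krr_fit(X_demo, y_demo, sigma=10.0, alpha=1e-3)
train_pred = krr_predict(X_demo, X_demo, coef, sigma=10.0)
print("train accuracy:", ((train_pred > 0.5) == (y_demo > 0.5)).mean())
```

The linear solve costs $O(n^3)$ time and the kernel matrix $O(n^2)$ memory, which is why the thinned variants below fit on far fewer points.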


In [30]:
histogram(sample(K_full.flatten(), n=10000))

In [31]:
pred_full = krr_full.predict(X_test)
train_pred_full = krr_full.predict(X_train)

In [32]:
fig = make_subplots(rows=2, cols=1, subplot_titles=['train', 'test'])

fig.add_trace(go.Histogram(x=train_pred_full.flatten(), name='train', opacity=0.5), row=1, col=1)
fig.add_trace(go.Histogram(x=y_train.flatten(), name='ground truth', opacity=0.5, legendgroup=1), row=1, col=1)

fig.add_trace(go.Histogram(x=pred_full.flatten(), name='test', opacity=0.5), row=2, col=1)
fig.add_trace(go.Histogram(x=y_test.flatten(), name='ground truth', opacity=0.5, legendgroup=1), row=2, col=1)
fig.show()

In [33]:
print('Train acc:', classification_accuracy(y_train, train_pred_full))
print('acc:', classification_accuracy(y_test, pred_full))
print()
print('Train MSE:', mean_squared_error(y_train, train_pred_full))
print('MSE:', mean_squared_error(y_test, pred_full))

Train acc: 1.0
acc: 0.8684274291252726

Train MSE: 0.00016669262750389154
MSE: 0.09648699338001852


In [34]:
histogram(krr_full.sol_)

In [35]:
len(krr_full.sol_)

16507

## KRR + ST

In [36]:
krr_sd_thin = get_estimator(
    'regression', 
    'st', 
    alpha=alpha, # / np.power(len(X_train), 1/4), 
    kernel=kernel, 
    sigma=sigma, 
    m=m, 
    postprocess=None
)

In [37]:
%%time
krr_sd_thin.fit(X_train, y_train)

CPU times: user 370 ms, sys: 45.5 ms, total: 416 ms
Wall time: 121 ms


In [38]:
krr_sd_thin.X_fit_.shape

(128, 8)

In [39]:
%%time
pred_sd = krr_sd_thin.predict(X_test)
train_pred_sd = krr_sd_thin.predict(X_train)


CPU times: user 192 ms, sys: 133 ms, total: 324 ms
Wall time: 40.7 ms


In [40]:
print('Train acc:', classification_accuracy(y_train, train_pred_sd))
print('acc:', classification_accuracy(y_test, pred_sd))
print()
print('train MSE:', mean_squared_error(y_train, train_pred_sd))
print('MSE:', mean_squared_error(y_test, pred_sd))

Train acc: 0.7742775792088205
acc: 0.7656893627332203

train MSE: 0.1518759002719003
MSE: 0.15658489621511842


## KRR + KT

In [41]:
krr_kt_thin = get_estimator(
    'regression',
    'kt', 
    kernel=kernel, 
    alpha=alpha, # / np.power(len(X_train), 1/4), 
    sigma=sigma, 
    m=m, 
    postprocess=None,
    ydim=ydim,
)

In [42]:
%%time
krr_kt_thin.fit(X_train, y_train)

# To run line profiler, uncomment the next line
# %lprun -f krr_kt_thin.fit krr_kt_thin.fit(X_train, y_train)

CPU times: user 1.36 s, sys: 540 ms, total: 1.9 s
Wall time: 748 ms


In [43]:
krr_kt_thin.X_fit_.shape

(128, 8)

In [44]:
%%time
pred_kt = krr_kt_thin.predict(X_test)
train_pred_kt = krr_kt_thin.predict(X_train)


CPU times: user 225 ms, sys: 102 ms, total: 327 ms
Wall time: 40.3 ms


In [45]:
print('Train acc:', classification_accuracy(y_train, train_pred_kt))
print('acc:', classification_accuracy(y_test, pred_kt))
print()
print('train MSE:', mean_squared_error(y_train, train_pred_kt))
print('MSE:', mean_squared_error(y_test, pred_kt))

Train acc: 0.7999636517840916
acc: 0.7933123334141022

train MSE: 0.14136561664580938
MSE: 0.14312094008133905


## RFM

Note: changing the bandwidth for RFM makes little difference to accuracy, since a larger bandwidth is compensated by larger weight values. It does, however, make a big difference to numerical stability, so it is better to keep the default bandwidth $L=10$.

In [46]:
rfm = get_estimator(
    'regression', 
    'rfm', 
    alpha=alpha, 
    kernel=kernel, 
    sigma=sigma,
    iters=rfm_iters,
    ydim=ydim,
)

In [47]:
rfm

In [48]:
Ms, mses, preds = rfm.fit(
    X_train, y_train, 
    val_data=(X_test, y_test),
)

Round 0, Test MSE: 0.0965
Using batch size of 4032


  0%|          | 0/5 [00:00<?, ?it/s]

Final MSE: 0.0853


In [49]:
# plot the feature matrices Ms, one subplot per iteration
fig = make_subplots(rows=1, cols=len(Ms), subplot_titles=[f'iter {i}' for i in range(len(Ms))])
for i, M in enumerate(Ms):
    # add the heatmap for iteration i
    fig.add_trace(go.Heatmap(z=M, showlegend=False), row=1, col=i+1)
# set the layout once, after all traces have been added
fig.update_layout(height=400, width=1000, title_text="Feature matrix per iteration")
fig.show()

In [50]:
np.linalg.eigvals(Ms[0])

array([4.29717805, 0.83292077, 0.34060664, 0.24095768, 0.21245195,
       0.18559495, 0.20099358, 0.1997036 ])

In [51]:
histogram(rfm._model.weights)

In [52]:
%%time
pred_rfm = rfm.predict(X_test)
train_pred_rfm = rfm.predict(X_train)

CPU times: user 2.49 s, sys: 283 ms, total: 2.77 s
Wall time: 642 ms


In [53]:
pred_rfm

array([[-0.0484735 ],
       [-0.02122692],
       [ 0.35967129],
       ...,
       [ 0.05908065],
       [ 0.694575  ],
       [ 0.80817998]])

In [54]:
print('Train acc:', classification_accuracy(y_train, train_pred_rfm))
print('acc:', classification_accuracy(y_test, pred_rfm))
print()
print('train MSE:', mean_squared_error(y_train, train_pred_rfm))
print('MSE:', mean_squared_error(y_test, pred_rfm))

Train acc: 1.0
acc: 0.8819966077053549

train MSE: 0.0004091322083310464
MSE: 0.08525310082048927


## KRR + KT + Feature Learning

In [55]:
krr_kf_thin = get_estimator(
    'regression',
    'kf', 
    kernel=kernel, 
    alpha=alpha, # / np.power(len(X_train), 1/4), 
    sigma=10, 
    m=m, 
    postprocess=None,
    ydim=ydim,
    rfm_iters=rfm_iters,
    # rank=8
)

In [56]:
krr_kf_thin

In [70]:
%%time
# krr_kf_thin.rank = 8
K = krr_kf_thin.fit(X_train, y_train, val_data=(X_test, y_test))

CPU times: user 1.23 s, sys: 7.49 ms, total: 1.24 s
Wall time: 1.25 s


In [71]:
%%time
pred_kf = krr_kf_thin.predict(X_test)
train_pred_kf = krr_kf_thin.predict(X_train)

CPU times: user 218 ms, sys: 43.4 ms, total: 262 ms
Wall time: 34.9 ms


In [72]:
print('Train acc:', classification_accuracy(y_train, train_pred_kf))
print('acc:', classification_accuracy(y_test, pred_kf))
print()
print('train MSE:', mean_squared_error(y_train, train_pred_kf))
print('MSE:', mean_squared_error(y_test, pred_kf))

Train acc: 0.8003271339431757
acc: 0.7911315725708747

train MSE: 0.6409641741123815
MSE: 0.6092712594556142


In [60]:
print('Train acc:', classification_accuracy(y_train, train_pred_kf))
print('acc:', classification_accuracy(y_test, pred_kf))
print()
print('train MSE:', mean_squared_error(y_train, train_pred_kf))
print('MSE:', mean_squared_error(y_test, pred_kf))

Train acc: 0.650330162961168
acc: 0.6442936757935547

train MSE: 0.4118836534282637
MSE: 0.4096418895517681


In [220]:
%%time
# krr_kf_thin.rank = 8
K = krr_kf_thin.fit(X_train, y_train, val_data=(X_test, y_test), rank=2)
pred_kf = krr_kf_thin.predict(X_test)
train_pred_kf = krr_kf_thin.predict(X_train)

(16507, 2)
distances: (128, 128)
CPU times: user 1.63 s, sys: 1.55 s, total: 3.18 s
Wall time: 844 ms


In [221]:
print('Train acc:', classification_accuracy(y_train, train_pred_kf))
print('acc:', classification_accuracy(y_test, pred_kf))
print()
print('train MSE:', mean_squared_error(y_train, train_pred_kf))
print('MSE:', mean_squared_error(y_test, pred_kf))

Train acc: 0.6582661901011692
acc: 0.6588320814150714

train MSE: 2.302327620598286
MSE: 2.2777875646349313


In [48]:
fig = go.Figure(data=[go.Heatmap(z=krr_kf_thin.M)])
fig.update_layout(height=400, width=400, title_text="Feature matrix")
fig.show()

In [73]:
np.linalg.svd(krr_kf_thin.M)[1]

array([4.29717805, 0.83292077, 0.34060664, 0.24095768, 0.21245195,
       0.20099358, 0.1997036 , 0.18559495])

In [93]:
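# M is symmetric PSD, so its SVD gives the factorization M = W^T W
U, S, Vh = np.linalg.svd(krr_kf_thin.M)  # np.linalg.svd returns V^T (here Vh), not V
W = np.diag(np.sqrt(S)) @ Vh
print(np.isclose(krr_kf_thin.M, W.T @ W))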

[[ True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True]]


In [94]:
from goodpoints.krr.util_k_mmd import laplacian, laplacian_M

In [95]:
laplacian(X_test, X_test, sigma)

distances: (4127, 4127)


array([[1.        , 0.57134342, 0.65761896, ..., 0.62982597, 0.6431303 ,
        0.6449978 ],
       [0.57134342, 1.        , 0.68731841, ..., 0.70634685, 0.71904356,
        0.64224553],
       [0.65761896, 0.68731841, 1.        , ..., 0.64361118, 0.6827845 ,
        0.77619631],
       ...,
       [0.62982597, 0.70634685, 0.64361118, ..., 1.        , 0.86351658,
        0.57323381],
       [0.6431303 , 0.71904356, 0.6827845 , ..., 0.86351658, 1.        ,
        0.6028259 ],
       [0.6449978 , 0.64224553, 0.77619631, ..., 0.57323381, 0.6028259 ,
        0.99999999]])

In [96]:
laplacian(X_test @ W.T, X_test @ W.T, sigma)

distances: (4127, 4127)


array([[1.        , 0.82770122, 0.58661727, ..., 0.80951522, 0.73720113,
        0.60595797],
       [0.82770122, 1.        , 0.67377517, ..., 0.95626749, 0.85388203,
        0.6926639 ],
       [0.58661727, 0.67377517, 0.99999999, ..., 0.67007667, 0.75887219,
        0.95945539],
       ...,
       [0.80951522, 0.95626749, 0.67007667, ..., 1.        , 0.86158484,
        0.68844649],
       [0.73720113, 0.85388203, 0.75887219, ..., 0.86158484, 1.        ,
        0.78191943],
       [0.60595797, 0.6926639 , 0.95945539, ..., 0.68844649, 0.78191943,
        1.        ]])

In [111]:
laplacian_M(X_test, X_test, krr_kf_thin.M, sigma)

array([[1.        , 0.82770122, 0.58661727, ..., 0.80951522, 0.73720113,
        0.60595797],
       [0.82770122, 1.        , 0.67377517, ..., 0.95626749, 0.85388203,
        0.6926639 ],
       [0.58661727, 0.67377517, 1.        , ..., 0.67007667, 0.75887219,
        0.95945539],
       ...,
       [0.80951522, 0.95626749, 0.67007667, ..., 1.        , 0.86158484,
        0.68844649],
       [0.73720113, 0.85388203, 0.75887219, ..., 0.86158484, 1.        ,
        0.78191943],
       [0.60595797, 0.6926639 , 0.95945539, ..., 0.68844649, 0.78191943,
        1.        ]])

In [109]:
# np.isclose(laplacian(X_test @ W.T, X_test @ W.T, sigma), laplacian_M(X_test, X_test, krr_kf_thin.M, sigma), rtol=0, atol=1e-12).all()
np.median(np.abs(laplacian(X_test @ W.T, X_test @ W.T, sigma) -laplacian_M(X_test, X_test, krr_kf_thin.M, sigma)).flatten())

distances: (4127, 4127)


1.1102230246251565e-16
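The agreement between the two kernel matrices follows from $(x-z)^\top M (x-z) = ||W(x-z)||_2^2$ whenever $M = W^\top W$: a kernel with feature matrix $M$ on the raw inputs equals the plain kernel on the mapped inputs $XW^\top$. The sketch below reproduces this on synthetic data. It assumes an L2-norm Laplace kernel, and `laplace_kernel` and `laplace_kernel_M` are hypothetical stand-ins rather than the `goodpoints` implementations.

```python
import numpy as np

def laplace_kernel(X, Z, sigma):
    """exp(-||x - z||_2 / sigma) for all pairs of rows."""
    diff = X[:, None, :] - Z[None, :, :]
    return np.exp(-np.sqrt((diff**2).sum(-1)) / sigma)

def laplace_kernel_M(X, Z, M, sigma):
    """exp(-sqrt((x - z)^T M (x - z)) / sigma) for all pairs of rows."""
    diff = X[:, None, :] - Z[None, :, :]
    sq = np.einsum("ijk,kl,ijl->ij", diff, M, diff)
    return np.exp(-np.sqrt(np.maximum(sq, 0.0)) / sigma)

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8))
M_demo = A.T @ A / 8                       # synthetic symmetric PSD feature matrix
X_demo = rng.normal(size=(50, 8))

# Factor M = W^T W via the SVD (Vh holds the right singular vectors as rows).
_, S, Vh = np.linalg.svd(M_demo)
W_demo = np.diag(np.sqrt(S)) @ Vh

K_mapped = laplace_kernel(X_demo @ W_demo.T, X_demo @ W_demo.T, sigma=10.0)
K_direct = laplace_kernel_M(X_demo, X_demo, M_demo, sigma=10.0)
print(np.abs(K_mapped - K_direct).max())
```

Working with the mapped inputs $XW^\top$ lets existing kernel code be reused unchanged once the feature matrix has been learned.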

In [99]:
krr_kf_thin.W.shape

(8, 8)

In [50]:
krr_kf_thin.X_fit_.shape

(128, 8)

In [51]:
K.shape

(128, 128)

In [52]:
histogram(K.flatten())

In [53]:
histogram(krr_kf_thin.sol_)

## RFM-Thin

In [65]:
rfm_thin = get_estimator(
    'regression', 
    'rfm', 
    alpha=alpha, 
    kernel=kernel, 
    sigma=sigma,
    iters=rfm_iters,
    ydim=ydim,
    use_kt = True,
)

In [66]:
Ms, mses, preds = rfm_thin.fit(
    X_train, y_train, 
    val_data=(X_test, y_test),
)

Using kernel thinning to select centers...
Round 0, Test MSE: 0.1355
Using batch size of 4032


  0%|          | 0/5 [00:00<?, ?it/s]

Using kernel thinning to select centers...
Round 1, Test MSE: 0.1281
Using batch size of 4032


  0%|          | 0/5 [00:00<?, ?it/s]

Using kernel thinning to select centers...
Final MSE: 0.1361


In [67]:
%%time
pred_rfm_thin = rfm_thin.predict(X_test)
train_pred_rfm_thin = rfm_thin.predict(X_train)

CPU times: user 19.7 ms, sys: 4.17 ms, total: 23.8 ms
Wall time: 8.12 ms


In [68]:
print('Train acc:', classification_accuracy(y_train, train_pred_rfm_thin))
print('acc:', classification_accuracy(y_test, pred_rfm_thin))
print()
print('train MSE:', mean_squared_error(y_train, train_pred_rfm_thin))
print('MSE:', mean_squared_error(y_test, pred_rfm_thin))

Train acc: 0.8225601260071485
acc: 0.8134237945238673

train MSE: 0.1305584630883244
MSE: 0.13614135210584952


## FALKON

In [69]:
krr_falkon = get_estimator(
    task,
    'falkon',
    kernel=kernel,
    sigma=sigma,
    alpha=alpha,
    m=m,
    postprocess=postprocess,
)

No module named 'falkon'


In [70]:
%%time
if krr_falkon:
    krr_falkon.fit(X_train, y_train)

CPU times: user 1e+03 ns, sys: 1e+03 ns, total: 2 µs
Wall time: 2.86 µs


In [71]:
%%time
if krr_falkon:
    pred_falkon = krr_falkon.predict(X_test)
    train_pred_falkon = krr_falkon.predict(X_train)

    print('Train acc:', classification_accuracy(y_train, train_pred_falkon))
    print('acc:', classification_accuracy(y_test, pred_falkon))
    print()
    print('train MSE:', mean_squared_error(y_train, train_pred_falkon))
    print('MSE:', mean_squared_error(y_test, pred_falkon))

CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs
Wall time: 3.1 µs


## FALKON + KT

In [72]:
# krr_falkon_kt = get_estimator(
#     task,
#     'falkon+kt',
#     kernel=kernel,
#     sigma=sigma,
#     alpha=alpha,
#     m=m,
#     postprocess=postprocess,
#     ydim=ydim,
# )

In [73]:
# %lprun -f krr_falkon_kt.fit krr_falkon_kt.fit(X_train, y_train)

In [74]:
# %%time
# if krr_falkon_kt:
#     pred_falkon_kt = krr_falkon_kt.predict(X_test)
#     print('Score:', accuracy_score(y_test, pred_falkon_kt))
#     print('RMSE:', np.sqrt(mean_squared_error(y_test, pred_falkon_kt)))

## Run experiment

We now run a full grid search with cross-validation for every model configuration, across datasets of different sizes.

Reference: https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_hist_grad_boosting_comparison.html#sphx-glr-auto-examples-ensemble-plot-forest-hist-grad-boosting-comparison-py
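
The repeated-split scheme used in the experiment loop can be sketched in isolation. This is a minimal illustration under stated assumptions, not the notebook's code: scikit-learn's `KernelRidge` stands in for the notebook's estimators, the data are synthetic, and the values of `k_fold` and `trials` are chosen only for the example. It passes the same K-fold partition to `GridSearchCV` several times and then averages the fold scores within each repetition.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic regression data (assumption: stands in for X_train, y_train).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X[:, 0] + 0.1 * rng.normal(size=60)

k_fold, trials = 5, 3  # illustrative values only

# Build one fixed K-fold partition and repeat it `trials` times, so that
# any variation between repetitions comes from the estimator, not the split.
base_split = list(KFold(n_splits=k_fold).split(X))
split = base_split * trials

search = GridSearchCV(
    estimator=KernelRidge(kernel="rbf"),  # stand-in estimator
    param_grid={"alpha": [1, 1e-1, 1e-2]},
    cv=split,
    scoring="neg_mean_squared_error",
    refit=False,
).fit(X, y)

# cv_results_ holds one "split{i}_test_score" entry per (repetition, fold) pair.
best = search.best_index_
scores = np.array(
    [search.cv_results_[f"split{i}_test_score"][best] for i in range(len(split))]
)
# Splits are ordered repetition by repetition, so reshape and average over folds.
per_trial = scores.reshape(trials, k_fold).mean(axis=1)
print(search.best_params_, per_trial)
```

Because `KernelRidge` is deterministic, every repetition yields the same score here; with a randomized estimator the spread of `per_trial` would reflect the estimator's own randomness only.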

In [75]:
from sklearn.model_selection import GridSearchCV, KFold

In [76]:
# NOTE: these will only be applied if `use_cross_validation` is True
# Default param grid to search for each model
default_param_grid = {
    "sigma": [10],
    "alpha": [10, 1, 1e-1, 1e-2, 1e-3, 1e-4],
}
falkon_param_grid = {
    "sigma": [10],
    "alpha": [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 0],  # Falkon requires smaller alpha
}
# falkon_param_grid = default_param_grid

# rfm_param_grid = {
#     "sigma" :   [10,], 
#     "alpha" :   [1e-3,1e-4,1e-5],
#     # "iters" :   [1,2,3],
# }
rfm_param_grid = default_param_grid
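
To get a feel for the cost of a grid, the candidate settings can be enumerated as a Cartesian product. The sketch below is plain Python; the values are copied from `default_param_grid`, while `expand_grid` and the fold and repetition counts are hypothetical and used only for illustration.

```python
from itertools import product


def expand_grid(param_grid):
    """Return one dict per combination of values in `param_grid`."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


grid = {"sigma": [10], "alpha": [10, 1, 1e-1, 1e-2, 1e-3, 1e-4]}
candidates = expand_grid(grid)
print(len(candidates))  # 6 candidate settings

# Assumed, illustrative numbers of folds and repetitions.
n_folds, n_trials = 5, 1
print(len(candidates) * n_folds * n_trials)  # 30 fits in total
```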

In [77]:
# The different values will correspond to different columns in the final plots
varying_variable = 'kernel'
varying_variable_values = ['gauss', 'laplace',]
datasets = ['housing',]

In [78]:
# Model constructors and data size for each model
# We allow for different data sizes to avoid running Full KR on large datasets
model_configs = {
    'full' : {
        'dataset' : datasets,
        'kwargs': {
            'postprocess' : postprocess
        },
        'param_grid' : default_param_grid
    },
}

# for m in [None,]:
model_configs['st'] = {
    'dataset' : datasets,
    'kwargs' : {
        'm' : m,
        'postprocess' : postprocess
    },
    'param_grid' : default_param_grid
}

model_configs['kt'] = {
    'dataset' : datasets,
    'kwargs' : {
        'm' : m,
        'postprocess' : postprocess,
        'ydim' : ydim,
    },
    'param_grid' : default_param_grid
}

model_configs['falkon'] = {
    'dataset' : datasets,
    'kwargs' : {
        'm' : m,
        'postprocess' : postprocess,
    },
    'param_grid' : falkon_param_grid
}

# model_configs[f'falkon+kt_{m}'] = {
#     'dataset' : datasets,
#     'kwargs' : {
#         'm' : m,
#         'postprocess' : postprocess,
#         'ydim' : ydim,
#     },
#     'param_grid' : falkon_param_grid
# }
model_configs['rfm'] = {
    'dataset' : datasets,
    'kwargs' : {
        'iters' : rfm_iters,
        'postprocess' : postprocess,
    },
    'param_grid' : rfm_param_grid
}

model_configs['kf'] = {
    'dataset' : datasets,
    'kwargs' : {
        'm' : m,
        'postprocess' : postprocess,
        'ydim' : ydim,
        'rfm_iters' : rfm_iters,
    },
    'param_grid' : rfm_param_grid
}

model_configs['rfm-thin'] = {
    'dataset' : datasets,
    'kwargs' : {
        'iters' : rfm_iters,
        'use_kt' : True,
        'postprocess' : postprocess,
    },
    'param_grid' : rfm_param_grid
}

In [79]:
model_configs

{'full': {'dataset': ['housing'],
  'kwargs': {'postprocess': 'threshold'},
  'param_grid': {'sigma': [10], 'alpha': [1, 0.1, 0.01, 0.001, 0.0001]}},
 'st': {'dataset': ['housing'],
  'kwargs': {'m': None, 'postprocess': 'threshold'},
  'param_grid': {'sigma': [10], 'alpha': [1, 0.1, 0.01, 0.001, 0.0001]}},
 'kt': {'dataset': ['housing'],
  'kwargs': {'m': None, 'postprocess': 'threshold', 'ydim': 1},
  'param_grid': {'sigma': [10], 'alpha': [1, 0.1, 0.01, 0.001, 0.0001]}},
 'falkon': {'dataset': ['housing'],
  'kwargs': {'m': None, 'postprocess': 'threshold'},
  'param_grid': {'sigma': [10], 'alpha': [1, 0.1, 0.01, 0.001, 0.0001]}},
 'rfm': {'dataset': ['housing'],
  'kwargs': {'iters': 2, 'postprocess': 'threshold'},
  'param_grid': {'sigma': [10], 'alpha': [1, 0.1, 0.01, 0.001, 0.0001]}},
 'kf': {'dataset': ['housing'],
  'kwargs': {'m': None, 'postprocess': 'threshold', 'ydim': 1, 'rfm_iters': 2},
  'param_grid': {'sigma': [10], 'alpha': [1, 0.1, 0.01, 0.001, 0.0001]}},
 'rfm-thin'

In [80]:
use_cross_validation

True

In [81]:
# Run experiment (depending on experiment_type)

results = []

count = 0
for name, config in model_configs.items():
    for dataset in config['dataset']:

        for v in varying_variable_values:
            kwargs = deepcopy(config['kwargs'])
            kwargs[varying_variable] = v
            model_name = f"{name}_{v}"
            # NOTE: full and rfm are deterministic, so we only need to run them once
            trials = (1 if name in ['full', 'rfm'] else n_repeats)

            # STEP 1: Get data
            # use X_train, y_train, X_test, y_test from above
            
            if 'kernel' not in kwargs:
                kwargs['kernel'] = kernel

            model = get_estimator(task, name=name, **kwargs)
            if model is None:
                continue
            print(f'i={count+1}: dataset={dataset}, model={model}')

            # STEP 2: Get optimal parameters through grid search
            # NOTE: we want to get rid of randomness in the Kernel Thinning (or Standard Thinning) routine
            # so we do k-fold cross validation `trials` times using the *same* split.
            # This is different from sklearn's repeated k-fold implementation which uses a 
            # different random split each time.            

            if use_cross_validation:
                split = list(KFold(n_splits=k_fold).split(X_train)) * trials
                grid_search = GridSearchCV(
                    estimator=model,
                    param_grid=config['param_grid'],
                    return_train_score=True,
                    cv=split,
                    scoring=refit,
                    refit=False,
                    n_jobs=n_jobs,
                ).fit(X_train, y_train)
                # get validation scores
                cv_results = pd.DataFrame(grid_search.cv_results_)
                val_scores = []
                for i in range(trials):
                    # compute classification error from accuracy
                    val_scores.append(1 - cv_results.iloc[grid_search.best_index_][f'split{i}_test_score'])
            
                # get optimal parameters
                best_params = grid_search.best_params_
            
            else:
                # Dummy values (these won't be displayed in the plots)
                val_scores = [np.nan,] * trials
                
                best_params = {
                    'sigma' : sigma,
                    'alpha' : alpha, # * (len(X_train)**(1/4) if name in ['st', 'kt'] else 1),
                }
            print(f"best params: {best_params}")
            best_model = get_estimator(task, name=name, 
                                       sigma=best_params['sigma'],
                                       alpha=best_params['alpha'],
                                       **kwargs)
            print(best_model)

            # STEP 3: Estimate train and test scores (classification error)
            train_scores = []
            test_scores = []
            for _ in range(trials):
                best_model.fit(X_train, y_train)

                # compute train score
                train_pred = best_model.predict(X_train).squeeze()
                # compute test score
                test_pred = best_model.predict(X_test).squeeze()

                train_score = 1 - classification_accuracy(y_train, train_pred)
                test_score = 1 - classification_accuracy(y_test, test_pred)
                
                train_scores.append( train_score )
                test_scores.append( test_score )

            results.append({
                "dataset": dataset, 
                "model": model_name, 
                "cv_results": pd.DataFrame(grid_search.cv_results_) if use_cross_validation else None,
                "best_index_" : grid_search.best_index_ if use_cross_validation else 0,
                "best_params_" : best_params,
                "val_scores" : val_scores,
                "train_scores" : train_scores,
                "test_scores" : test_scores,
            })

            count += 1

i=1: dataset=housing, model=KernelRidgeClassifier(kernel='gauss', postprocess='threshold')
distances: (13205, 13205)
distances: (13205, 13205)
distances: (13205, 3302)
distances: (13205, 3302)
distances: (13205, 13205)
distances: (13205, 13205)
distances: (13206, 13206)
distances: (13206, 13206)
distances: (13206, 3301)
distances: (13206, 13206)
distances: (13206, 3301)
distances: (13206, 13206)
distances: (13206, 13206)
distances: (13205, 13205)
distances: (13206, 3301)
distances: (13206, 13206)
distances: (13205, 3302)
distances: (13205, 13205)
distances: (13205, 13205)
distances: (13206, 13206)
distances: (13205, 3302)
distances: (13205, 13205)
distances: (13206, 3301)
distances: (13206, 13206)
distances: (13206, 13206)
distances: (13206, 13206)
distances: (13206, 3301)
distances: (13206, 13206)
distances: (13206, 3301)
distances: (13206, 13206)
distances: (13205, 13205)
distances: (13205, 13205)
distances: (13205, 3302)
distances: (13205, 13205)
distances: (13205, 3302)
distances: 

100%|██████████| 3/3 [00:00<00:00,  6.07it/s]
100%|██████████| 3/3 [00:00<00:00,  5.78it/s]


Round 1, Test MSE: 0.1486
Round 1, Test MSE: 0.1473
Using batch size of 5056
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.86it/s]
100%|██████████| 3/3 [00:00<00:00,  5.13it/s]


Final MSE: 0.1484
Final MSE: 0.1499


Traceback (most recent call last):
  File "/Users/ag2435/anaconda3/envs/goodpoints/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 813, in _score
    scores = scorer(estimator, X_test, y_test)
  File "/Users/ag2435/anaconda3/envs/goodpoints/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 266, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true, **_kwargs)
  File "/Users/ag2435/anaconda3/envs/goodpoints/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 353, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "/Users/ag2435/anaconda3/envs/goodpoints/lib/python3.10/site-packages/sklearn/metrics/_scorer.py", line 86, in _cached_call
    result, _ = _get_response_values(
  File "/Users/ag2435/anaconda3/envs/goodpoints/lib/python3.10/site-packages/sklearn/utils/_response.py", line 74, in _get_response_values
    classes = estimator.classes_
AttributeError: 'RFMClassifier' object has no at
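
The traceback above fails on the line `classes = estimator.classes_` inside scikit-learn's scorer, which reads that attribute from any estimator it treats as a classifier. A common convention is to set `classes_` in `fit`. The sketch below shows that convention on a hypothetical stand-in, `ThresholdKernelClassifier`, which wraps scikit-learn's `KernelRidge`; it is not the notebook's `RFMClassifier`, and the 0.5 threshold and synthetic data are assumptions made for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score


class ThresholdKernelClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical binary classifier: kernel ridge on 0/1 targets, thresholded."""

    def __init__(self, alpha=1.0, gamma=None):
        self.alpha = alpha
        self.gamma = gamma

    def fit(self, X, y):
        # Scorers read `classes_` from classifiers, so record the labels here.
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        if len(self.classes_) != 2:
            raise ValueError("this sketch only handles binary labels")
        self.model_ = KernelRidge(kernel="rbf", alpha=self.alpha, gamma=self.gamma)
        self.model_.fit(X, y_idx.astype(float))
        return self

    def predict(self, X):
        # Regress onto {0, 1}, threshold at 0.5, and map back to the labels.
        scores = self.model_.predict(X)
        return self.classes_[(scores > 0.5).astype(int)]


# Synthetic binary problem (assumption: for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = (X[:, 0] > 0).astype(int)

clf = ThresholdKernelClassifier(alpha=0.1)
cv_scores = cross_val_score(clf, X, y, cv=3, scoring="accuracy")
print(cv_scores)
```

With `classes_` set, the accuracy scorer returns finite values for every fold instead of raising during scoring.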

Round 0, Test MSE: 0.1301
Round 0, Test MSE: 0.1320
Using batch size of 5056
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.39it/s]
100%|██████████| 3/3 [00:00<00:00,  6.16it/s]


Round 1, Test MSE: 0.1494
Round 1, Test MSE: 0.1474
Using batch size of 5056
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.45it/s]
100%|██████████| 3/3 [00:00<00:00,  5.61it/s]


Final MSE: 0.1507
Final MSE: 0.1489
Round 0, Test MSE: 0.1324
Round 0, Test MSE: 0.1268
Using batch size of 5056


  0%|          | 0/3 [00:00<?, ?it/s]

Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.60it/s]
100%|██████████| 3/3 [00:00<00:00,  5.02it/s]


Round 1, Test MSE: 0.1498
Round 1, Test MSE: 0.1426
Using batch size of 5056
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.15it/s]
100%|██████████| 3/3 [00:00<00:00,  5.71it/s]


Final MSE: 0.1512
Final MSE: 0.1449
Round 0, Test MSE: 0.1254
Round 0, Test MSE: 0.1256
Using batch size of 5056
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.05it/s]
100%|██████████| 3/3 [00:00<00:00,  5.58it/s]


Round 1, Test MSE: 0.1417
Round 1, Test MSE: 0.1418
Using batch size of 5056
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.78it/s]
100%|██████████| 3/3 [00:00<00:00,  5.71it/s]


Final MSE: 0.1435
Final MSE: 0.1444
Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.34it/s]


Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  6.03it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.66it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.53it/s]


Final MSE: 0.1405
Final MSE: 0.1451
Round 0, Test MSE: 0.1214
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.19it/s]


Round 0, Test MSE: 0.1196
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.60it/s]


Round 1, Test MSE: 0.1351
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.30it/s]


Round 1, Test MSE: 0.1331
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.63it/s]


Final MSE: 0.1409
Final MSE: 0.1399
Round 0, Test MSE: 0.1197
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.32it/s]


Round 0, Test MSE: 0.1189
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.08it/s]


Round 1, Test MSE: 0.1340
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.32it/s]


Round 1, Test MSE: 0.1315
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.56it/s]


Final MSE: 0.1404
Final MSE: 0.1375
Round 0, Test MSE: 0.1206
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.70it/s]


Round 0, Test MSE: 0.1149
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.03it/s]


Round 1, Test MSE: 0.1347
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.20it/s]


Round 1, Test MSE: 0.1259
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.58it/s]


Final MSE: 0.1409
Final MSE: 0.1282
Round 0, Test MSE: 0.1137
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.50it/s]


Round 0, Test MSE: 0.1137
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.59it/s]


Round 1, Test MSE: 0.1251
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.00it/s]


Round 1, Test MSE: 0.1252
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.57it/s]


Final MSE: 0.1268
Final MSE: 0.1283
Round 0, Test MSE: 0.1137
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.00it/s]


Round 0, Test MSE: 0.1144
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.43it/s]


Round 1, Test MSE: 0.1244
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.84it/s]


Round 1, Test MSE: 0.1259
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.45it/s]


Final MSE: 0.1265
Final MSE: 0.1291
Round 0, Test MSE: 0.1095
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.64it/s]


Round 0, Test MSE: 0.1079
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.67it/s]


Round 1, Test MSE: 0.1218
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.70it/s]


Round 1, Test MSE: 0.1212
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.53it/s]


Final MSE: 0.1244
Final MSE: 0.1236
Round 0, Test MSE: 0.1080
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.68it/s]


Round 0, Test MSE: 0.1080
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.13it/s]


Round 1, Test MSE: 0.1209
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.18it/s]


Round 1, Test MSE: 0.1201
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.20it/s]


Final MSE: 0.1239
Final MSE: 0.1224
Round 0, Test MSE: 0.1083
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  6.17it/s]


Round 1, Test MSE: 0.1217
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  8.13it/s]


Final MSE: 0.1249
best params: {'alpha': 1, 'sigma': 10}
RFMClassifier(iters=2, kernel='gauss', postprocess='threshold', sigma=10)



One or more of the test scores are non-finite: [nan nan nan nan nan]


One or more of the train scores are non-finite: [nan nan nan nan nan]



Round 0, Test MSE: 0.1312
Using batch size of 4032


  0%|          | 0/5 [00:00<?, ?it/s]

Round 1, Test MSE: 0.1481
Using batch size of 4032


  0%|          | 0/5 [00:00<?, ?it/s]

Final MSE: 0.1494
i=8: dataset=housing, model=RFMClassifier(alpha=0.001, iters=2, kernel='laplace', postprocess='threshold',
              sigma=10)
Round 0, Test MSE: 0.0922
Round 0, Test MSE: 0.0914
Using batch size of 5056
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.11it/s]
100%|██████████| 3/3 [00:00<00:00,  3.04it/s]


Round 1, Test MSE: 0.0970
Round 1, Test MSE: 0.0955
Using batch size of 5056
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.10it/s]
100%|██████████| 3/3 [00:01<00:00,  2.77it/s]


Final MSE: 0.1009
Final MSE: 0.0993
Round 0, Test MSE: 0.0915
Round 0, Test MSE: 0.0908
Using batch size of 5056


  0%|          | 0/3 [00:00<?, ?it/s]

Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.70it/s]
100%|██████████| 3/3 [00:01<00:00,  2.90it/s]


Round 1, Test MSE: 0.0955
Round 1, Test MSE: 0.0953
Using batch size of 5056


  0%|          | 0/3 [00:00<?, ?it/s]

Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.78it/s]
100%|██████████| 3/3 [00:00<00:00,  3.01it/s]


Final MSE: 0.0994
Final MSE: 0.0990
Round 0, Test MSE: 0.0911
Using batch size of 5056
Round 0, Test MSE: 0.0506


 33%|███▎      | 1/3 [00:00<00:00,  2.76it/s]

Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.05it/s]
100%|██████████| 3/3 [00:01<00:00,  2.75it/s]


Round 1, Test MSE: 0.0953
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.82it/s]


Round 1, Test MSE: 0.0670
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.76it/s]


Final MSE: 0.0988
Final MSE: 0.0699
Round 0, Test MSE: 0.0504
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.28it/s]


Round 0, Test MSE: 0.0502
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.13it/s]


Round 1, Test MSE: 0.0666
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.53it/s]


Round 1, Test MSE: 0.0667
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.48it/s]


Final MSE: 0.0695
Final MSE: 0.0698
Round 0, Test MSE: 0.0496
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.57it/s]


Round 0, Test MSE: 0.0498
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.44it/s]


Round 1, Test MSE: 0.0659
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.65it/s]


Round 1, Test MSE: 0.0656
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.65it/s]


Final MSE: 0.0691
Final MSE: 0.0684
Round 0, Test MSE: 0.0071
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.39it/s]


Round 0, Test MSE: 0.0071
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.49it/s]


Round 1, Test MSE: 0.0152
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.60it/s]


Round 1, Test MSE: 0.0149
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.44it/s]


Final MSE: 0.0177
Final MSE: 0.0176
Round 0, Test MSE: 0.0070
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.69it/s]


Round 0, Test MSE: 0.0069
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.87it/s]


Round 1, Test MSE: 0.0151
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.60it/s]


Round 1, Test MSE: 0.0147
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.81it/s]


Final MSE: 0.0178
Final MSE: 0.0175
Round 0, Test MSE: 0.0070
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.62it/s]


Round 0, Test MSE: 0.0002
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.85it/s]


Round 1, Test MSE: 0.0147
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.80it/s]


Round 1, Test MSE: 0.0004
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.46it/s]


Final MSE: 0.0172
Final MSE: 0.0005
Round 0, Test MSE: 0.0002
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.52it/s]


Round 0, Test MSE: 0.0002
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.57it/s]


Round 1, Test MSE: 0.0004
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.59it/s]


Round 1, Test MSE: 0.0004
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.58it/s]


Final MSE: 0.0005
Final MSE: 0.0005
Round 0, Test MSE: 0.0002
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.40it/s]


Round 0, Test MSE: 0.0002
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.51it/s]


Round 1, Test MSE: 0.0004
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.77it/s]


Round 1, Test MSE: 0.0004
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.75it/s]


Final MSE: 0.0005
Final MSE: 0.0005
Round 0, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.81it/s]


Round 0, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.69it/s]


Round 1, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.72it/s]


Round 1, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.93it/s]


Final MSE: 0.0000
Final MSE: 0.0000
Round 0, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.80it/s]


Round 0, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.64it/s]


Round 1, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  2.64it/s]


Round 1, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:01<00:00,  1.94it/s]


Final MSE: 0.0000
Final MSE: 0.0000
Round 0, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.29it/s]


Round 1, Test MSE: 0.0000
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.46it/s]


Final MSE: 0.0000
best params: {'alpha': 1, 'sigma': 10}
RFMClassifier(iters=2, kernel='laplace', postprocess='threshold', sigma=10)



One or more of the test scores are non-finite: [nan nan nan nan nan]


One or more of the train scores are non-finite: [nan nan nan nan nan]



Round 0, Test MSE: 0.0904
Using batch size of 4032


  0%|          | 0/5 [00:00<?, ?it/s]

Round 1, Test MSE: 0.0941
Using batch size of 4032


  0%|          | 0/5 [00:00<?, ?it/s]

Final MSE: 0.0977
i=9: dataset=housing, model=KernelRidgeKTFeatureClassifier(kernel='M_gauss', postprocess='threshold',
                               rfm_iters=2)
learning feature matrix...
learning feature matrix...
Round 0, Test MSE: 0.1328
Round 0, Test MSE: 0.1320
Using batch size of 5056


 67%|██████▋   | 2/3 [00:00<00:00,  4.95it/s]

Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.65it/s]
100%|██████████| 3/3 [00:00<00:00,  4.87it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  6.08it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.82it/s]


Final MSE: 0.1484
learning feature matrix...
Final MSE: 0.1499
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.86it/s]


Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.74it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.19it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.07it/s]


Final MSE: 0.1507
learning feature matrix...
Final MSE: 0.1489
learning feature matrix...
Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.13it/s]


Round 0, Test MSE: 0.1328
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.04it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.09it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.86it/s]


Final MSE: 0.1512
learning feature matrix...
Final MSE: 0.1484
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.67it/s]


Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.53it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.60it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.42it/s]


Final MSE: 0.1499
learning feature matrix...
Final MSE: 0.1507
learning feature matrix...
Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.26it/s]


Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.51it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.43it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.65it/s]


Final MSE: 0.1489
learning feature matrix...
Final MSE: 0.1512
learning feature matrix...
Round 0, Test MSE: 0.1328
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.02it/s]


Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.79it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.27it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.40it/s]


Final MSE: 0.1484
learning feature matrix...
Final MSE: 0.1499
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.71it/s]


Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.88it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.37it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.80it/s]


Final MSE: 0.1507
learning feature matrix...
Final MSE: 0.1489
learning feature matrix...
Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.31it/s]


Round 0, Test MSE: 0.1328
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.14it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.22it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.09it/s]


Final MSE: 0.1512
learning feature matrix...
Final MSE: 0.1484
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.58it/s]


Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.80it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.64it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.29it/s]


Final MSE: 0.1499
learning feature matrix...
Final MSE: 0.1507
learning feature matrix...
Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.89it/s]


Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.21it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.75it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.83it/s]


Final MSE: 0.1489
learning feature matrix...
Final MSE: 0.1512
learning feature matrix...
Round 0, Test MSE: 0.1328
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.22it/s]


Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.80it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.04it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.34it/s]


Final MSE: 0.1484
learning feature matrix...
Final MSE: 0.1499
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.42it/s]


Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.63it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.77it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.36it/s]


Final MSE: 0.1507
learning feature matrix...
Final MSE: 0.1489
learning feature matrix...
Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.71it/s]


Round 0, Test MSE: 0.1328
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.82it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.21it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.07it/s]


Final MSE: 0.1512
learning feature matrix...
Final MSE: 0.1484
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.41it/s]


Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.17it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.36it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.54it/s]


Final MSE: 0.1499
learning feature matrix...
Final MSE: 0.1507
learning feature matrix...
Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.96it/s]


Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.96it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.22it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.41it/s]


Final MSE: 0.1489
learning feature matrix...
Final MSE: 0.1512
learning feature matrix...
Round 0, Test MSE: 0.1328
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.68it/s]


Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.08it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.07it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.54it/s]


Final MSE: 0.1484
learning feature matrix...
Final MSE: 0.1499
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.71it/s]


Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.14it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.68it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.34it/s]


Final MSE: 0.1507
learning feature matrix...
Final MSE: 0.1489
learning feature matrix...
Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.45it/s]


Round 0, Test MSE: 0.1328
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.57it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.75it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.63it/s]


Final MSE: 0.1512
learning feature matrix...
Final MSE: 0.1484
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.30it/s]


Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.64it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.57it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.09it/s]


Final MSE: 0.1499
learning feature matrix...
Final MSE: 0.1507
learning feature matrix...
Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.29it/s]


Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.78it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.63it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.44it/s]


Final MSE: 0.1489
learning feature matrix...
Final MSE: 0.1512
learning feature matrix...
Round 0, Test MSE: 0.1328
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.43it/s]


Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.33it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.46it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.66it/s]


Final MSE: 0.1484
learning feature matrix...
Final MSE: 0.1499
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.65it/s]


Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.60it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.55it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.99it/s]


Final MSE: 0.1507
learning feature matrix...
Final MSE: 0.1489
learning feature matrix...
Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.34it/s]


Round 0, Test MSE: 0.1328
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.85it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.90it/s]


Round 1, Test MSE: 0.1473
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.77it/s]


Final MSE: 0.1512
learning feature matrix...
Final MSE: 0.1484
learning feature matrix...
Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.08it/s]


Round 0, Test MSE: 0.1320
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.97it/s]


Round 1, Test MSE: 0.1486
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.41it/s]


Round 1, Test MSE: 0.1494
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.56it/s]


Final MSE: 0.1499
learning feature matrix...
Final MSE: 0.1507
learning feature matrix...
Round 0, Test MSE: 0.1301
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.45it/s]


Round 0, Test MSE: 0.1324
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.66it/s]


Round 1, Test MSE: 0.1474
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.87it/s]


Round 1, Test MSE: 0.1498
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.45it/s]


Final MSE: 0.1489
learning feature matrix...
Final MSE: 0.1512
learning feature matrix...
Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.47it/s]


Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.70it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.64it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.50it/s]


Final MSE: 0.1449
learning feature matrix...
Final MSE: 0.1435
learning feature matrix...
Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.13it/s]


Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.31it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.58it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.21it/s]


Final MSE: 0.1444
learning feature matrix...
Final MSE: 0.1405
learning feature matrix...
Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.37it/s]


Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.35it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.87it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.62it/s]


Final MSE: 0.1451
learning feature matrix...
Final MSE: 0.1449
learning feature matrix...
Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.23it/s]


Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.23it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.09it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.45it/s]


Final MSE: 0.1435
learning feature matrix...
Final MSE: 0.1444
learning feature matrix...
Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.92it/s]


Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.44it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.85it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.05it/s]


Final MSE: 0.1405
learning feature matrix...
Final MSE: 0.1451
learning feature matrix...
Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.49it/s]


Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.91it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.06it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.68it/s]


Final MSE: 0.1449
learning feature matrix...
Final MSE: 0.1435
learning feature matrix...
Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.64it/s]


Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.36it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.96it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.27it/s]


Final MSE: 0.1444
learning feature matrix...
Final MSE: 0.1405
learning feature matrix...
Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.12it/s]


Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.45it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.34it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.93it/s]


Final MSE: 0.1451
learning feature matrix...
Final MSE: 0.1449
learning feature matrix...
Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.65it/s]


Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.50it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.28it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.72it/s]


Final MSE: 0.1435
learning feature matrix...
Final MSE: 0.1444
learning feature matrix...
Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.17it/s]


Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.36it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.31it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.30it/s]


Final MSE: 0.1405
learning feature matrix...
Final MSE: 0.1451
learning feature matrix...
Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.28it/s]


Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.31it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.03it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.88it/s]


Final MSE: 0.1449
learning feature matrix...
Final MSE: 0.1435
learning feature matrix...
Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.76it/s]


Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.96it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.95it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.43it/s]


Final MSE: 0.1444
learning feature matrix...
Final MSE: 0.1405
learning feature matrix...
Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.02it/s]


Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.69it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.16it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.11it/s]


Final MSE: 0.1451
learning feature matrix...
Final MSE: 0.1449
learning feature matrix...
Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.58it/s]


Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.87it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.75it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.40it/s]


Final MSE: 0.1435
learning feature matrix...
Final MSE: 0.1444
learning feature matrix...
Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.43it/s]


Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.46it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.06it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.78it/s]


Final MSE: 0.1405
learning feature matrix...
Final MSE: 0.1451
learning feature matrix...
Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.44it/s]


Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.98it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.16it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.56it/s]


Final MSE: 0.1449
learning feature matrix...
Final MSE: 0.1435
learning feature matrix...
Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.25it/s]


Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.52it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.53it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.70it/s]


Final MSE: 0.1444
learning feature matrix...
Final MSE: 0.1405
learning feature matrix...
Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.42it/s]


Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.94it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.68it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.00it/s]


Final MSE: 0.1451
learning feature matrix...
Final MSE: 0.1449
learning feature matrix...
Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.58it/s]


Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.19it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.57it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.16it/s]


Final MSE: 0.1435
learning feature matrix...
Final MSE: 0.1444
learning feature matrix...
Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.64it/s]


Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.28it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.45it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.16it/s]


Final MSE: 0.1405
learning feature matrix...
Final MSE: 0.1451
learning feature matrix...
Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.56it/s]


Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.52it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.44it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.57it/s]


Final MSE: 0.1449
learning feature matrix...
Final MSE: 0.1435
learning feature matrix...
Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.17it/s]


Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.96it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.36it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.51it/s]


Final MSE: 0.1444
learning feature matrix...
Final MSE: 0.1405
learning feature matrix...
Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.78it/s]


Round 0, Test MSE: 0.1268
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.59it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.46it/s]


Round 1, Test MSE: 0.1426
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.57it/s]


Final MSE: 0.1451
learning feature matrix...
Final MSE: 0.1449
learning feature matrix...
Round 0, Test MSE: 0.1254
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.02it/s]


Round 0, Test MSE: 0.1256
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.77it/s]


Round 1, Test MSE: 0.1417
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.47it/s]


Round 1, Test MSE: 0.1418
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.41it/s]


Final MSE: 0.1435
learning feature matrix...
Final MSE: 0.1444
learning feature matrix...
Round 0, Test MSE: 0.1241
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.17it/s]


Round 0, Test MSE: 0.1262
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.54it/s]


Round 1, Test MSE: 0.1380
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.88it/s]


Round 1, Test MSE: 0.1425
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.97it/s]


Final MSE: 0.1405
learning feature matrix...
Final MSE: 0.1451
learning feature matrix...
Round 0, Test MSE: 0.1214
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.79it/s]


Round 0, Test MSE: 0.1196
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.49it/s]


Round 1, Test MSE: 0.1351
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.60it/s]


Round 1, Test MSE: 0.1331
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.40it/s]


Final MSE: 0.1409
learning feature matrix...
Final MSE: 0.1399
learning feature matrix...
Round 0, Test MSE: 0.1197
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.29it/s]


Round 0, Test MSE: 0.1189
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.01it/s]


Round 1, Test MSE: 0.1340
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.78it/s]


Round 1, Test MSE: 0.1315
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.90it/s]


Final MSE: 0.1404
learning feature matrix...
Final MSE: 0.1375
learning feature matrix...
Round 0, Test MSE: 0.1206
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.56it/s]


Round 0, Test MSE: 0.1214
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.06it/s]


Round 1, Test MSE: 0.1347
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.71it/s]


Round 1, Test MSE: 0.1351
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.43it/s]


Final MSE: 0.1409
learning feature matrix...
Final MSE: 0.1409
learning feature matrix...
Round 0, Test MSE: 0.1196
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.65it/s]


Round 0, Test MSE: 0.1197
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.37it/s]


Round 1, Test MSE: 0.1331
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.70it/s]


Round 1, Test MSE: 0.1340
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.51it/s]


Final MSE: 0.1399
learning feature matrix...
Final MSE: 0.1404
learning feature matrix...
Round 0, Test MSE: 0.1189
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.86it/s]


Round 0, Test MSE: 0.1206
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.94it/s]


Round 1, Test MSE: 0.1315
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.73it/s]


Round 1, Test MSE: 0.1347
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.03it/s]


Final MSE: 0.1375
learning feature matrix...
Final MSE: 0.1409
learning feature matrix...
Round 0, Test MSE: 0.1214
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.96it/s]


Round 0, Test MSE: 0.1196
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.59it/s]


Round 1, Test MSE: 0.1351
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.07it/s]


Round 1, Test MSE: 0.1331
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.88it/s]


Final MSE: 0.1409
learning feature matrix...
Final MSE: 0.1399
learning feature matrix...
Round 0, Test MSE: 0.1197
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.76it/s]


Round 0, Test MSE: 0.1189
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.42it/s]


Round 1, Test MSE: 0.1340
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.77it/s]


Round 1, Test MSE: 0.1315
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.77it/s]


Final MSE: 0.1404
learning feature matrix...
Final MSE: 0.1375
learning feature matrix...
Round 0, Test MSE: 0.1206
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.31it/s]


Round 0, Test MSE: 0.1214
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.75it/s]


Round 1, Test MSE: 0.1347
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.62it/s]


Round 1, Test MSE: 0.1351
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.65it/s]


Final MSE: 0.1409
learning feature matrix...
Final MSE: 0.1409
learning feature matrix...
Round 0, Test MSE: 0.1196
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.59it/s]


Round 0, Test MSE: 0.1197
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.76it/s]


Round 1, Test MSE: 0.1331
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.14it/s]


Round 1, Test MSE: 0.1340
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.36it/s]


Final MSE: 0.1399
learning feature matrix...
Final MSE: 0.1404
learning feature matrix...
Round 0, Test MSE: 0.1189
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.19it/s]


Round 0, Test MSE: 0.1206
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.50it/s]


Round 1, Test MSE: 0.1315
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.80it/s]


Round 1, Test MSE: 0.1347
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  3.93it/s]


Final MSE: 0.1375
learning feature matrix...
Final MSE: 0.1409
learning feature matrix...
Round 0, Test MSE: 0.1214
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.69it/s]


Round 0, Test MSE: 0.1196
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.84it/s]


Round 1, Test MSE: 0.1351
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.10it/s]


Round 1, Test MSE: 0.1331
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.03it/s]


Final MSE: 0.1409
learning feature matrix...
Final MSE: 0.1399
learning feature matrix...
Round 0, Test MSE: 0.1197
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.43it/s]


Round 0, Test MSE: 0.1189
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.62it/s]


Round 1, Test MSE: 0.1340
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.00it/s]


Round 1, Test MSE: 0.1315
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.80it/s]


Final MSE: 0.1404
learning feature matrix...
Final MSE: 0.1375
learning feature matrix...
Round 0, Test MSE: 0.1206
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.43it/s]


Round 0, Test MSE: 0.1214
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.44it/s]


Round 1, Test MSE: 0.1347
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.61it/s]


Round 1, Test MSE: 0.1351
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.44it/s]


Final MSE: 0.1409
learning feature matrix...
Final MSE: 0.1409
learning feature matrix...
Round 0, Test MSE: 0.1196
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.17it/s]


Round 0, Test MSE: 0.1197
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.77it/s]


Round 1, Test MSE: 0.1331
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.11it/s]


Round 1, Test MSE: 0.1340
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.44it/s]


Final MSE: 0.1399
learning feature matrix...
Final MSE: 0.1404
learning feature matrix...
Round 0, Test MSE: 0.1189
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.26it/s]


Round 0, Test MSE: 0.1206
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.05it/s]


Round 1, Test MSE: 0.1315
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.11it/s]


Round 1, Test MSE: 0.1347
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.68it/s]


Final MSE: 0.1375
learning feature matrix...
Final MSE: 0.1409
learning feature matrix...
Round 0, Test MSE: 0.1214
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.83it/s]


Round 0, Test MSE: 0.1196
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.73it/s]


Round 1, Test MSE: 0.1351
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.54it/s]


Round 1, Test MSE: 0.1331
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.77it/s]


Final MSE: 0.1409
learning feature matrix...
Final MSE: 0.1399
learning feature matrix...
Round 0, Test MSE: 0.1197
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.25it/s]


Round 0, Test MSE: 0.1189
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.79it/s]


Round 1, Test MSE: 0.1340
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  5.32it/s]


Round 1, Test MSE: 0.1315
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.72it/s]


Final MSE: 0.1404
learning feature matrix...
Final MSE: 0.1375
learning feature matrix...
Round 0, Test MSE: 0.1206
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.41it/s]


Round 0, Test MSE: 0.1214
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.52it/s]


Round 1, Test MSE: 0.1347
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.22it/s]


Round 1, Test MSE: 0.1351
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.91it/s]


Final MSE: 0.1409
learning feature matrix...
Final MSE: 0.1409
learning feature matrix...
Round 0, Test MSE: 0.1196
Using batch size of 5056


100%|██████████| 3/3 [00:00<00:00,  4.54it/s]


Round 0, Test MSE: 0.1197
Using batch size of 5056


 67%|██████▋   | 2/3 [00:00<00:00,  3.80it/s]

KeyboardInterrupt: 

In [82]:
results

[{'dataset': 'housing',
  'model': 'full_gauss',
  'cv_results':    mean_fit_time  std_fit_time  mean_score_time  std_score_time param_alpha  \
  0      20.412423      0.840643         0.472376        0.015221           1   
  1      18.134334      2.392218         0.437903        0.021440         0.1   
  2      16.807481      1.331455         0.470624        0.036863        0.01   
  3      16.490436      1.135387         0.463899        0.084647       0.001   
  4      15.251296      1.385214         0.419785        0.016398      0.0001   
  
    param_sigma                          params  split0_test_score  \
  0          10       {'alpha': 1, 'sigma': 10}           0.841308   
  1          10     {'alpha': 0.1, 'sigma': 10}           0.852211   
  2          10    {'alpha': 0.01, 'sigma': 10}           0.857056   
  3          10   {'alpha': 0.001, 'sigma': 10}           0.860388   
  4          10  {'alpha': 0.0001, 'sigma': 10}           0.864022   
  
     split1_test_score  s

In [None]:
# Save results with pickle (only when `save` is set)
if save:
    import pickle

    # Base file name, e.g. 'housing' or 'housing_cv'; reused below for the figure files
    filename = dataset + ('_cv' if use_cross_validation else '')
    pickle_file = filename + '.p'
    print(pickle_file)

    with open(pickle_file, 'wb') as f:
        pickle.dump(results, f)
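
To reuse saved results in a later session, the pickle file can be read back. The following is a minimal sketch, assuming the file was written by a `pickle.dump` call like the one above. `load_results` is a hypothetical helper that is not part of this notebook, and the demo record is placeholder data rather than real benchmark output.

```python
import os
import pickle
import tempfile


def load_results(pickle_file):
    """Read back a results list written with pickle.dump (hypothetical helper)."""
    with open(pickle_file, 'rb') as f:
        return pickle.load(f)


# Round-trip demo on placeholder data (not real benchmark output)
demo_results = [{'dataset': 'demo', 'model': 'demo_model', 'test_scores': [0.1, 0.2]}]
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.p')
with open(demo_path, 'wb') as f:
    pickle.dump(demo_results, f)

loaded = load_results(demo_path)
```

Since unpickling can execute arbitrary code, this should only be used on files you wrote yourself.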

## Plot Results

In [83]:
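# np, go and make_subplots are used by the plotting functions below
import numpy as np
import plotly.colors as colors
import plotly.graph_objects as go
import seaborn as sns
from plotly.subplots import make_subplots

from functools import reduce
from operator import concat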

### Varying variable (e.g., kernel choice)

In [84]:
def plot_results(varying_variable, varying_variable_values, scale='linear'):
    # One subplot row per score type, one column per value of the varying variable
    row_subplot_titles = ["Test Score", "Val Score", "Train Score"]

    fig = make_subplots(
        rows=len(row_subplot_titles),
        cols=len(varying_variable_values),
        shared_yaxes=True,
        subplot_titles=reduce(concat, [[f'{varying_variable}={v}' for v in varying_variable_values] for _ in row_subplot_titles]),
        vertical_spacing=0.1,
    )
    model_names = [model_name.split('_')[0] for model_name in model_configs.keys()]
    colors_list = colors.qualitative.Plotly * (
        len(model_names) // len(colors.qualitative.Plotly) + 1
    )
    colors_used = set()

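    def plot_vs_n(print_name, attr_name, vvv, r, c, is_better='higher', scale='log2'):
        """
        Add one box trace per matching model to the subplot at row r, column c.

        Args:
        - print_name: label used in the y-axis title
        - attr_name: key of each result holding the scores (e.g. "test_scores")
        - vvv: varying variable value; only models whose name ends in it are plotted
        - r, c: 1-based subplot row and column
        - is_better: 'higher' or 'lower', shown in the y-axis title
        - scale: 'linear' or 'log2'
        """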
        
        for result in results:
            model_name = result["model"]
            name_components = model_name.split('_') # E.g., Kernel-Thin_rbf -> Kernel-Thin, rbf
            if len(name_components) == 2:
                model_name_prefix, vv_name = name_components
                m = '0'
            else:
                model_name_prefix, m, vv_name = name_components        
            best_params = result["best_params_"]

            if vv_name != vvv:
                continue

            color = colors_list[model_names.index(model_name_prefix)]

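            if scale == 'log2':
                y = np.log2(np.abs(result[attr_name]))
            elif scale == 'linear':
                y = np.abs(result[attr_name])
            else:
                # Without this branch, y would be undefined for any other value
                raise ValueError(f"Unknown scale: {scale!r}")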

            trace = go.Box(
                x=[result['dataset']]*len(result[attr_name]),
                y=y,
                name=model_name_prefix,
                # opacity=0.5,
                legendgroup=model_name_prefix,
                line_color=color,
                offsetgroup=model_name_prefix,
                showlegend=color not in colors_used,
                boxmean=True,
            )

            fig.add_trace(trace, row=r, col=c)
            colors_used.add(color)

        if c == 1: fig.update_yaxes(title_text=f"{scale}({print_name}) - {is_better} is better", row=r, col=c)
        fig.update_xaxes(title_text="dataset", row=r, col=c)
        fig.update_layout(boxmode='group')

    def plot_test_score_vs_n(vvv, r, c, scale):
        plot_vs_n("Test MSE", "test_scores", vvv, r, c, is_better='lower', scale=scale)

    def plot_val_score_vs_n(vvv, r, c, scale):
        plot_vs_n("Val MSE", "val_scores", vvv, r, c, is_better='lower', scale=scale)

    def plot_train_score_vs_n(vvv, r, c, scale):
        plot_vs_n("Train MSE", "train_scores", vvv, r, c, is_better='lower', scale=scale)

    for c, vvv in enumerate(varying_variable_values):
        plot_test_score_vs_n(str(vvv), 1, c+1, scale=scale)
        plot_val_score_vs_n(str(vvv), 2, c+1, scale=scale)
        plot_train_score_vs_n(str(vvv), 3, c+1, scale=scale)

    return fig
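
The model-name convention that `plot_vs_n` relies on can be isolated as below. This is a sketch, assuming names have the form `prefix_value` or `prefix_m_value`, as the inline comment in the function suggests. `split_model_name` is a hypothetical helper that is not part of this notebook, and `'Kernel-Thin_2_rbf'` is a made-up three-part name used only for illustration.

```python
def split_model_name(model_name):
    """Split 'prefix_value' or 'prefix_m_value' into (prefix, m, value).

    Hypothetical helper mirroring the parsing in plot_vs_n; m defaults to '0'
    when the name has only two components.
    """
    parts = model_name.split('_')
    if len(parts) == 2:
        prefix, value = parts
        m = '0'
    elif len(parts) == 3:
        prefix, m, value = parts
    else:
        raise ValueError(f"Unexpected model name: {model_name!r}")
    return prefix, m, value


two_part = split_model_name('Kernel-Thin_rbf')
three_part = split_model_name('Kernel-Thin_2_rbf')  # made-up example name
```

Note that a prefix containing an underscore would be split incorrectly under this convention, which is presumably why the names use a hyphen inside the prefix.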

In [85]:
fig = plot_results(varying_variable, varying_variable_values, scale='linear')
fig.update_layout(
    legend=dict(traceorder="normal", borderwidth=1),
    title=dict(x=0.5, text=f"Evaluation for {varying_variable} in {varying_variable_values}"), # \
            #    f"sigma {param_grid['sigma']} / alpha {param_grid['alpha']}"),
    width=800,
    height=1000,
)
fig.show()
if save:
    fig_file = filename + '.png'
    print(fig_file)
    fig.write_image(fig_file)

In [86]:
fig = plot_results(varying_variable, varying_variable_values, scale='log2')
fig.update_layout(
    legend=dict(traceorder="normal", borderwidth=1),
    title=dict(x=0.5, text=f"Evaluation for {varying_variable} in {varying_variable_values}"), # \
            #    f"sigma {param_grid['sigma']} / alpha {param_grid['alpha']}"),
    width=800,
    height=1000,
)
fig.show()
if save:
    fig_file = filename + '_log2.png'
    print(fig_file)
    fig.write_image(fig_file)

### Generalization / Overfitting

In [None]:
def plot_results_overfitting(varying_variable, varying_variable_values, scale='linear'):
    col_subplot_titles = ["Test Score", "Val Score", "Train Score"]

    fig = make_subplots(
        rows=len(varying_variable_values),
        cols=len(col_subplot_titles),
        shared_yaxes=True,
        subplot_titles=col_subplot_titles + [None,] * len(varying_variable_values),
        vertical_spacing=0.1,
    )
    model_names = [model_name.split('_')[0] for model_name in model_configs.keys()]
    colors_list = colors.qualitative.Plotly * (
        len(model_names) // len(colors.qualitative.Plotly) + 1
    )
    colors_used = set()

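    def plot(print_name, attr_name, vvv, r, c, is_better='higher', scale='log2'):
        """
        Add one box trace per matching model to the subplot at row r, column c.

        Args:
        - print_name: descriptive label for the score (not shown in this layout)
        - attr_name: key of each result holding the scores (e.g. "test_scores")
        - vvv: varying variable value; only models whose name ends in it are plotted
        - r, c: 1-based subplot row and column
        - scale: 'linear' or 'log2'
        """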
        
        for result in results:
            model_name = result["model"]
            name_components = model_name.split('_') # E.g., Kernel-Thin_rbf -> Kernel-Thin, rbf
            if len(name_components) == 2:
                model_name_prefix, vv_name = name_components
                m = '0'
            else:
                model_name_prefix, m, vv_name = name_components        
            best_params = result["best_params_"]

            if vv_name != vvv:
                continue

            color = colors_list[model_names.index(model_name_prefix)]

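            if scale == 'log2':
                y = np.log2(np.abs(result[attr_name]))
            elif scale == 'linear':
                y = np.abs(result[attr_name])
            else:
                # Without this branch, y would be undefined for any other value
                raise ValueError(f"Unknown scale: {scale!r}")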

            trace = go.Box(
                x=[result['dataset']]*len(result[attr_name]),
                y=y,
                name=model_name_prefix,
                # opacity=0.5,
                legendgroup=model_name_prefix,
                line_color=color,
                offsetgroup=model_name_prefix,
                showlegend=color not in colors_used,
                boxmean=True,
            )

            fig.add_trace(trace, row=r, col=c)
            colors_used.add(color)

        if c == 1: fig.update_yaxes(title_text=f"{varying_variable}={vvv}", row=r, col=c)
        fig.update_xaxes(title_text="dataset", row=r, col=c)
        fig.update_layout(boxmode='group')

    def plot_test_score(vvv, r, c, scale):
        plot("Test MSE", "test_scores", vvv, r, c, is_better='lower', scale=scale)
    def plot_val_score(vvv, r, c, scale):
        plot("Val MSE", "val_scores", vvv, r, c, is_better='lower', scale=scale)
    def plot_train_score(vvv, r, c, scale):
        plot("Train MSE", "train_scores", vvv, r, c, is_better='lower', scale=scale)

    for r, vvv in enumerate(varying_variable_values):
        plot_test_score(str(vvv), r+1, 1, scale=scale)
        plot_val_score(str(vvv), r+1, 2, scale=scale)
        plot_train_score(str(vvv), r+1, 3, scale=scale)

    return fig

In [None]:
fig = plot_results_overfitting(varying_variable, varying_variable_values, scale='linear')
fig.update_layout(
    legend=dict(traceorder="normal", borderwidth=1),
    title=dict(x=0.5, text=f"Evaluation for {varying_variable} in {varying_variable_values}"
            #    f"sigma {param_grid['sigma']} / alpha {param_grid['alpha']}"
               "<br>scale: linear"
               ),
    width=1000,
    height=600,
)
fig.show()

In [None]:
fig = plot_results_overfitting(varying_variable, varying_variable_values, scale='log2')
fig.update_layout(
    legend=dict(traceorder="normal", borderwidth=1),
    title=dict(x=0.5, text=f"Evaluation for {varying_variable} in {varying_variable_values}"
            #    f"sigma {param_grid['sigma']} / alpha {param_grid['alpha']}"
               "<br>scale: log2"
               ),
    width=1000,
    height=600,
)
fig.show()


divide by zero encountered in log2
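
The `divide by zero encountered in log2` warning above is what NumPy emits when `np.log2` receives an exact zero, which presumably happens here when at least one score is exactly 0. A minimal sketch of one way to guard against it, assuming it is acceptable to map zeros either to NaN or to a small floor value. `safe_log2_abs` and its `floor` parameter are hypothetical names for illustration and are not part of the notebook:

```python
import numpy as np

def safe_log2_abs(values, floor=None):
    """Return log2(|values|) without triggering the divide-by-zero warning.

    Exact zeros become NaN by default, or log2(floor) when a positive
    floor is given. NaN inputs stay NaN.
    """
    arr = np.abs(np.asarray(values, dtype=float))
    out = np.full(arr.shape, np.nan)

    # Take the logarithm only where it is finite.
    positive = arr > 0
    out[positive] = np.log2(arr[positive])

    # Optionally replace exact zeros with a finite placeholder.
    if floor is not None:
        out[arr == 0] = np.log2(floor)
    return out

scores = [0.25, 0.0, -4.0]
print(safe_log2_abs(scores))
print(safe_log2_abs(scores, floor=1e-12))
```

Under these assumptions, the `log2` branch of `plot` could call `safe_log2_abs(result[attr_name])` in place of `np.log2(np.abs(result[attr_name]))`. Whether zeros should be dropped or floored depends on how the figure is meant to be read.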

