MultiOutputClassifier does not rely on estimator to provide pairwise tag #29016

Alex-Xenos · 2024-05-14T10:55:28Z

Describe the bug

I use MultiOutputClassifier to make SVC handle a multilabel target.

With the linear or rbf kernel, cross_validate works perfectly fine.

However, when SVC uses kernel='precomputed', every fit fails with ValueError: Precomputed matrix must be a square matrix.

Steps/Code to Reproduce

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import pairwise_distances
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import cross_val_score, cross_validate

svm = SVC(kernel='precomputed', C=100, random_state=42)
multilabel_classifier = MultiOutputClassifier(svm, n_jobs=-1)

X = np.random.rand(1000, 1000)
y = np.random.randint(0, 2, size=(1000, 6))

kernel_eucl = pairwise_distances(X, metric='euclidean')

cross_validate(
    multilabel_classifier, kernel_eucl, y, cv=10, scoring='f1_weighted', n_jobs=-1
)
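Until cross_validate handles this combination, one possible workaround is to run the folds by hand and slice the precomputed matrix on both axes: the square train x train block for fit, and the test x train block for predict. This is a minimal sketch, assuming the same SVC settings as in the reproducer; the helper name `cv_precomputed_multioutput` is hypothetical, and the data are a smaller random stand-in so that it runs quickly.

```python
import numpy as np
from sklearn.metrics import f1_score, pairwise_distances
from sklearn.model_selection import KFold
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC


def cv_precomputed_multioutput(K, y, n_splits=10):
    """Hypothetical helper: K-fold CV that slices the precomputed matrix itself."""
    scores = []
    for train, test in KFold(n_splits=n_splits).split(K):
        clf = MultiOutputClassifier(SVC(kernel="precomputed", C=100, random_state=42))
        # fit needs the square block: rows and columns are training samples.
        clf.fit(K[np.ix_(train, train)], y[train])
        # predict needs rows = test samples, columns = training samples.
        y_pred = clf.predict(K[np.ix_(test, train)])
        scores.append(f1_score(y[test], y_pred, average="weighted", zero_division=0))
    return np.array(scores)


# Smaller random stand-in for the data in the report.
rng = np.random.default_rng(0)
X = rng.random((100, 20))
y = rng.integers(0, 2, size=(100, 6))
K = pairwise_distances(X, metric="euclidean")  # same matrix type as the reproducer

scores = cv_precomputed_multioutput(K, y)
print(scores.mean())
```

The double indexing with `np.ix_` is the step that the wrapped SVC expects but that the cross-validation helpers skip when the wrapper is not recognised as pairwise.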

Expected Results

A weighted F1 score.

Actual Results

ValueError: 
All the 10 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:

 File "C:\Users\bscuser\anaconda3\Lib\site-packages\sklearn\svm\_base.py", line 217, in fit
    raise ValueError(
ValueError: Precomputed matrix must be a square matrix. Input is a 900x1000 matrix.
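For reference, the 900x1000 shape is what a 10-fold split of 1000 samples produces when only the rows of the precomputed matrix are selected; scikit-learn's cross-validation utilities select both rows and columns only for estimators tagged as pairwise. A minimal NumPy illustration of the two indexing patterns, using random stand-in data (this is not scikit-learn's internal code):

```python
import numpy as np

n_samples = 1000
# Random stand-in for the precomputed 1000 x 1000 matrix.
K = np.random.default_rng(0).random((n_samples, n_samples))
# One fold of a 10-fold split: the first 100 samples are held out.
train = np.arange(100, n_samples)

rows_only = K[train]                # estimator not tagged as pairwise
pairwise = K[np.ix_(train, train)]  # estimator tagged as pairwise

print(rows_only.shape)  # (900, 1000)
print(pairwise.shape)   # (900, 900)
```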

Versions

System:
    python: 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\bscuser\anaconda3\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.2.2
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.26.4
        scipy: 1.11.4
       Cython: None
       pandas: 2.1.4
   matplotlib: 3.8.0
       joblib: 1.2.0
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: C:\Users\bscuser\anaconda3\Library\bin\mkl_rt.2.dll
         prefix: mkl_rt
       user_api: blas
   internal_api: mkl
        version: 2023.1-Product
    num_threads: 4
threading_layer: intel

       filepath: C:\Users\bscuser\anaconda3\vcomp140.dll
         prefix: vcomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 4

       filepath: C:\Users\bscuser\anaconda3\Library\bin\libiomp5md.dll
         prefix: libiomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 4


glemaitre · 2024-05-15T17:54:59Z

It seems that the issue boils down to the meta-estimator not setting the pairwise tag properly. We could solve the issue by including the following entry in the dictionary returned by _more_tags:

"pairwise": _safe_tags(self.estimator, key="pairwise"),

This delegates the pairwise tag to the underlying estimator. I assume MultiOutputRegressor, and maybe other meta-estimators in this module, have the same bug.

@Alex-Xenos Would you like to fix this bug and open a pull request including the fix and non-regression tests?
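The delegation proposed in the _more_tags entry can be illustrated with a toy model, sketched under the assumption that tags are plain dictionaries returned by a `_more_tags()` method, as in the scikit-learn version reported here. Every class and the `get_pairwise` helper are hypothetical stand-ins, not scikit-learn code; the real fix would go through `_safe_tags`, and newer scikit-learn releases use a different tag API.

```python
def get_pairwise(est):
    """Hypothetical helper: read 'pairwise' from a _more_tags() dict, default False."""
    more_tags = getattr(est, "_more_tags", None)
    return bool(more_tags().get("pairwise", False)) if more_tags else False


class ToyPrecomputedSVC:
    """Stands in for SVC(kernel='precomputed'), which declares pairwise=True."""

    def _more_tags(self):
        return {"pairwise": True}


class ToyMultiOutputBefore:
    """Toy wrapper that does not forward the tag (the reported bug)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def _more_tags(self):
        return {}


class ToyMultiOutputAfter(ToyMultiOutputBefore):
    """Toy wrapper that delegates the tag to the wrapped estimator."""

    def _more_tags(self):
        return {"pairwise": get_pairwise(self.estimator)}


svc = ToyPrecomputedSVC()
print(get_pairwise(ToyMultiOutputBefore(svc)))  # False
print(get_pairwise(ToyMultiOutputAfter(svc)))   # True
```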

glemaitre · 2024-05-16T09:32:03Z

Actually, we should probably come up with a common test to make sure that all meta-estimators rely on the underlying estimator to set this tag.
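A common test along these lines could wrap each meta-estimator around one estimator that declares pairwise=True and one that does not, then check that the wrapper reports the same tag as what it wraps. This is a toy sketch, assuming dictionary-style tags; all names are hypothetical, and a real common test would more likely be parametrized with pytest over scikit-learn's actual meta-estimators.

```python
def get_pairwise(est):
    """Hypothetical helper: read 'pairwise' from a _more_tags() dict, default False."""
    more_tags = getattr(est, "_more_tags", None)
    return bool(more_tags().get("pairwise", False)) if more_tags else False


class PairwiseEst:
    def _more_tags(self):
        return {"pairwise": True}


class PlainEst:
    def _more_tags(self):
        return {}


class DelegatingMeta:
    def __init__(self, estimator):
        self.estimator = estimator

    def _more_tags(self):
        return {"pairwise": get_pairwise(self.estimator)}


class NonDelegatingMeta(DelegatingMeta):
    def _more_tags(self):
        return {}


def meta_estimator_delegates_pairwise(meta_cls):
    """True if meta_cls reports the same pairwise tag as each inner estimator."""
    return all(
        get_pairwise(meta_cls(inner)) == get_pairwise(inner)
        for inner in (PairwiseEst(), PlainEst())
    )


META_ESTIMATORS = [DelegatingMeta, NonDelegatingMeta]
failures = [
    cls.__name__ for cls in META_ESTIMATORS
    if not meta_estimator_delegates_pairwise(cls)
]
print(failures)  # ['NonDelegatingMeta']
```

Checking both a pairwise and a non-pairwise inner estimator guards against a wrapper that simply hard-codes pairwise=True.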

glemaitre · 2024-05-31T19:16:53Z

Please don't post LLM-generated suggestions that do not answer the question. Here we need to work on a common test, not just a non-regression test specific to this estimator.

Alex-Xenos added the Bug and Needs Triage labels on May 14, 2024.

glemaitre removed the Needs Triage label on May 15, 2024.

glemaitre changed the title from "Cross_validation/cross_val_score does not work with precomputed kernel in Multilabel SVC" to "MultiOutputClassifier does not rely on estimator to provide pairwise tag" on May 16, 2024.

Alex-Xenos's original report (May 14, 2024) was later edited by glemaitre.
