MultiOutputClassifier does not rely on estimator to provide pairwise tag #29016

Open
Alex-Xenos opened this issue May 14, 2024 · 3 comments

Alex-Xenos commented May 14, 2024

Describe the bug

I use MultiOutputClassifier to make SVC multilabel.

With the linear or RBF kernel, cross_validate works fine.

However, when I use SVC with kernel='precomputed', it fails with ValueError: Precomputed matrix must be a square matrix.

Steps/Code to Reproduce

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import pairwise_distances
from sklearn.multioutput import MultiOutputClassifier
from sklearn.model_selection import cross_val_score, cross_validate

svm = SVC(kernel='precomputed', C=100, random_state=42)
multilabel_classifier = MultiOutputClassifier(svm, n_jobs=-1)

X = np.random.rand(1000, 1000)
y = np.random.randint(0, 2, size=(1000, 6))

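# A Euclidean distance matrix is not a valid (PSD) kernel, but any square
# n x n matrix is enough to reproduce the shape error.
kernel_eucl = pairwise_distances(X, metric='euclidean')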

cross_validate(
    multilabel_classifier, kernel_eucl, y, cv=10, scoring='f1_weighted', n_jobs=-1
)

Expected Results

A weighted F1 score.

Actual Results

ValueError: 
All the 10 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:

 File "C:\Users\bscuser\anaconda3\Lib\site-packages\sklearn\svm\_base.py", line 217, in fit
    raise ValueError(
ValueError: Precomputed matrix must be a square matrix. Input is a 900x1000 matrix.
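The 900x1000 shape comes from how a cross-validation fold slices X. For a precomputed kernel, each fold has to restrict both rows and columns to the training samples, and scikit-learn only does this when the estimator reports the pairwise tag (in 1.2.x the check appears to live in the private helper sklearn.utils.metaestimators._safe_split). The following is a simplified, hypothetical re-implementation of that slicing; split_precomputed is an illustrative name, not a library function. It shows the shapes for one fold of a 10-fold split of 1,000 samples:

```python
import numpy as np


def split_precomputed(K, train_idx, test_idx, pairwise):
    """Hypothetical helper: slice a precomputed kernel K for one CV fold."""
    if pairwise:
        # Rows AND columns restricted: train x train and test x train.
        return K[np.ix_(train_idx, train_idx)], K[np.ix_(test_idx, train_idx)]
    # Rows only, as for an ordinary n_samples x n_features matrix.
    return K[train_idx], K[test_idx]


n = 1000
K = np.zeros((n, n))
idx = np.arange(n)
test_idx = idx[:100]   # one fold of a 10-fold split
train_idx = idx[100:]

K_tr, K_te = split_precomputed(K, train_idx, test_idx, pairwise=False)
print(K_tr.shape, K_te.shape)  # (900, 1000) (100, 1000)
K_tr, K_te = split_precomputed(K, train_idx, test_idx, pairwise=True)
print(K_tr.shape, K_te.shape)  # (900, 900) (100, 900)
```

Without the tag, SVC receives the 900x1000 row slice and rejects it as non-square, which matches the error reported here.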

Versions

System:
    python: 3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023, 18:05:47) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\bscuser\anaconda3\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.2.2
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.26.4
        scipy: 1.11.4
       Cython: None
       pandas: 2.1.4
   matplotlib: 3.8.0
       joblib: 1.2.0
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: C:\Users\bscuser\anaconda3\Library\bin\mkl_rt.2.dll
         prefix: mkl_rt
       user_api: blas
   internal_api: mkl
        version: 2023.1-Product
    num_threads: 4
threading_layer: intel

       filepath: C:\Users\bscuser\anaconda3\vcomp140.dll
         prefix: vcomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 4

       filepath: C:\Users\bscuser\anaconda3\Library\bin\libiomp5md.dll
         prefix: libiomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 4
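Until the meta-estimator forwards the tag, one possible workaround is to write the cross-validation loop by hand and slice the kernel on both axes. This is only a sketch: cv_precomputed_multioutput is a hypothetical helper, and it uses a linear kernel (X @ X.T) on random data instead of the Euclidean distance matrix from the reproducer, because a distance matrix is not a valid positive semi-definite kernel.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC


def cv_precomputed_multioutput(K, y, n_splits=5, C=1.0, seed=0):
    """Hypothetical helper: cross-validate MultiOutputClassifier(SVC) on a
    square Gram matrix K, slicing rows and columns by hand."""
    scores = []
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in folds.split(K):
        # Training kernel: train x train (square), as SVC requires.
        K_train = K[np.ix_(train_idx, train_idx)]
        # Test kernel: test rows against the training samples only.
        K_test = K[np.ix_(test_idx, train_idx)]
        clf = MultiOutputClassifier(SVC(kernel="precomputed", C=C))
        clf.fit(K_train, y[train_idx])
        y_pred = clf.predict(K_test)
        scores.append(f1_score(y[test_idx], y_pred, average="weighted"))
    return np.array(scores)


rng = np.random.default_rng(42)
X = rng.random((200, 20))
y = rng.integers(0, 2, size=(200, 3))
K = X @ X.T  # linear kernel: a valid (PSD) Gram matrix
scores = cv_precomputed_multioutput(K, y)
print(scores.mean())
```

Slicing K_test against the training columns mirrors what SVC expects at predict time: one kernel value per training sample.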
Alex-Xenos added the Bug and Needs Triage labels May 14, 2024
glemaitre (Member)

It seems that the issue boils down to the pairwise tag not being set properly on the meta-estimator. We could fix it by adding the following entry to the dictionary returned by _more_tags:

"pairwise": _safe_tags(self.estimator, key="pairwise"),

This delegates the pairwise tag to the underlying estimator. I assume MultiOutputRegressor, and possibly other meta-estimators in this module, have the same bug.
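To illustrate the proposed delegation, here is a minimal pure-Python sketch. KernelSVCStub, MultiOutputStub, FixedMultiOutputStub and safe_tag are hypothetical stand-ins, not scikit-learn classes; safe_tag is a simplified version of the private _safe_tags that only reads the object's own _more_tags() and does not merge tags across the class hierarchy as the real helper does.

```python
def safe_tag(estimator, key, default=False):
    """Hypothetical, simplified stand-in for the private _safe_tags:
    read one tag from estimator._more_tags(), with a fallback."""
    more_tags = getattr(estimator, "_more_tags", None)
    if more_tags is None:
        return default
    return more_tags().get(key, default)


class KernelSVCStub:
    """Hypothetical inner estimator: pairwise only for a precomputed kernel."""

    def __init__(self, kernel="rbf"):
        self.kernel = kernel

    def _more_tags(self):
        return {"pairwise": self.kernel == "precomputed"}


class MultiOutputStub:
    """Hypothetical meta-estimator that does NOT forward the tag (the bug)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def _more_tags(self):
        return {}


class FixedMultiOutputStub(MultiOutputStub):
    """Same meta-estimator with the proposed delegation added."""

    def _more_tags(self):
        tags = super()._more_tags()
        tags["pairwise"] = safe_tag(self.estimator, key="pairwise")
        return tags


inner = KernelSVCStub(kernel="precomputed")
print(safe_tag(MultiOutputStub(inner), "pairwise"))       # False
print(safe_tag(FixedMultiOutputStub(inner), "pairwise"))  # True
```

Because the value is read from the wrapped estimator at call time, the meta-estimator reports pairwise=False again as soon as the inner kernel is not precomputed.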

@Alex-Xenos Would you like to fix this bug and open a pull request with the fix and non-regression tests?

glemaitre removed the Needs Triage label May 15, 2024
glemaitre changed the title from "Cross_validation/cross_val_score does not work with precomputed kernel in Multilabel SVC" to "MultiOutputClassifier does not rely on estimator to provide pairwise tag" May 16, 2024
glemaitre (Member)

Actually, we should probably come up with a common test to make sure that all meta-estimators rely on the underlying estimator to set this tag.
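A common test along these lines could wrap an inner estimator declaring pairwise=True and one declaring pairwise=False in each meta-estimator, and check that the meta-estimator reports the same value. The sketch below uses hypothetical stub classes and a hypothetical check_meta_estimator_delegates_pairwise function; an actual scikit-learn test would presumably iterate over the real meta-estimators (for example by filtering sklearn.utils.all_estimators()) and use the library's own tag machinery, which differs between versions.

```python
def get_pairwise(estimator):
    """Hypothetical tag reader; False when the tag is absent."""
    more_tags = getattr(estimator, "_more_tags", None)
    return bool(more_tags().get("pairwise", False)) if more_tags else False


class PairwiseInner:
    """Hypothetical inner estimator that declares pairwise=True."""

    def _more_tags(self):
        return {"pairwise": True}


class NonPairwiseInner:
    """Hypothetical inner estimator that declares pairwise=False."""

    def _more_tags(self):
        return {"pairwise": False}


class ForwardingMeta:
    """Hypothetical meta-estimator that delegates the tag."""

    def __init__(self, estimator):
        self.estimator = estimator

    def _more_tags(self):
        return {"pairwise": get_pairwise(self.estimator)}


class NonForwardingMeta:
    """Hypothetical meta-estimator with the bug: tag never forwarded."""

    def __init__(self, estimator):
        self.estimator = estimator

    def _more_tags(self):
        return {}


def check_meta_estimator_delegates_pairwise(meta_cls):
    """Common check: the meta-estimator's pairwise tag must follow the
    inner estimator's, in both directions."""
    for inner in (PairwiseInner(), NonPairwiseInner()):
        if get_pairwise(meta_cls(inner)) != get_pairwise(inner):
            return False
    return True


META_ESTIMATORS = [ForwardingMeta, NonForwardingMeta]
failures = [
    cls.__name__
    for cls in META_ESTIMATORS
    if not check_meta_estimator_delegates_pairwise(cls)
]
print(failures)  # ['NonForwardingMeta']
```

Checking both directions matters: a meta-estimator that hard-coded pairwise=True would pass a check that only used a pairwise inner estimator.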

glemaitre (Member)

Please don't post LLM-generated suggestions that do not answer the question. What we need here is a common test, not just a non-regression test specific to this estimator.

Projects: Status: Moderate
Development: No branches or pull requests
2 participants