Numpy arrays and Unknown label type: 'continuous' #16

Closed
clementpoiret opened this issue May 22, 2020 · 5 comments
@clementpoiret

clementpoiret commented May 22, 2020

Hi,
I'm quickly experimenting by implementing ppscore in my pipeline for the assessment of functional connectivity between brain regions, and I noticed two things:
1/ I think we should be able to use pps.matrix() even on a 2D numpy array when we don't have explicit column names. As of now, it raises AttributeError: 'numpy.ndarray' object has no attribute 'columns'
2/ I got a strange error telling me that "continuous" is an unknown label type:
  File "/home/clementpoiret/anaconda3/envs/nilearn/lib/python3.8/site-packages/sklearn/utils/multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
Code to reproduce the error:

import numpy as np
import pandas as pd
import ppscore as pps

X = pd.DataFrame(np.random.randn(10,10))
pps.matrix(X)

The error is solved by passing task='regression'. I have sklearn 0.23.0
One additional comment: perhaps the diagonal of the resulting matrix should be 1, since it makes sense that the predictive power of a vector on itself is 1, no?
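
For readers hitting the same traceback, the check that raises it can be exercised directly in sklearn, without ppscore. This is a minimal sketch, assuming a float target similar to one column of X above; the variable names are illustrative. It only shows how sklearn labels a float-valued target and why a classifier rejects it.

```python
import numpy as np
from sklearn.utils.multiclass import check_classification_targets, type_of_target

rng = np.random.default_rng(0)
y = rng.standard_normal(10)  # float values, similar to one column of X

# sklearn decides the target type from the values themselves.
target_type = type_of_target(y)
print(target_type)

# Classifiers run this check before fitting; it rejects continuous targets.
try:
    check_classification_targets(y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

This matches the observation above that passing task='regression' avoids the error: the float column is then never handed to a classifier.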

@8080labs
Owner

Hi Clement,

Thank you for your suggestions.

  1. I think that makes sense and we should be able to add this easily to the API.
  2. I think we already saw a similar error, which occurs when the values are floats (so the task is really a regression) but the task is inferred as a classification. This can also be fixed by changing the dtype of the series.
  3. It makes total sense that the diagonal should be 1, and this should already be the case. In which of your examples was the diagonal not 1?

Thank you,
Florian

@clementpoiret
Author

Hi Florian,

Thanks for your answer. It occurs when I use the pps on time series extracted from fMRI data.
But it also occurs with the code in the original post. For example, I just ran:

import numpy as np
import pandas as pd
import ppscore as pps

X = pd.DataFrame(np.random.randn(10,10))
pps.matrix(X, task='regression')

and it returned the following matrix:

>>> pps.matrix(X, task='regression')
          0        1         2  3  4         5       6  7  8  9
0  0.000000  0.00000  0.000000  0  0  0.000000  0.0000  0  0  0
1  0.000000  0.00000  0.000000  0  0  0.000000  0.0000  0  0  0
2  0.000000  0.00000  0.000000  0  0  0.000000  0.0000  0  0  0
3  0.000000  0.00000  0.000000  0  0  0.000000  0.0000  0  0  0
4  0.000000  0.00000  0.000000  0  0  0.000000  0.0000  0  0  0
5  0.000000  0.00000  0.000000  0  0  0.000000  0.0000  0  0  0
6  0.000000  0.00000  0.000000  0  0  0.085524  0.0000  0  0  0
7  0.000000  0.29528  0.000000  0  0  0.000000  0.0422  0  0  0
8  0.255183  0.00000  0.000000  0  0  0.000000  0.0000  0  0  0
9  0.000000  0.00000  0.027208  0  0  0.000000  0.0000  0  0  0

@8080labs
Owner

Thank you for the example. Passing a task to the matrix bypasses the logic for the diagonal.
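
Until that is handled inside the library, the diagonal can be set after the fact. The following is only a post-processing sketch, not the library's own fix: `with_unit_diagonal` is a hypothetical helper, and it assumes the result is a square DataFrame whose rows and columns list the same features in the same order.

```python
import numpy as np
import pandas as pd


def with_unit_diagonal(scores: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of a square score matrix with its diagonal set to 1."""
    if scores.shape[0] != scores.shape[1]:
        raise ValueError("expected a square matrix")
    # Copy to a float array so the input frame is left untouched.
    values = scores.to_numpy(dtype=float, copy=True)
    np.fill_diagonal(values, 1.0)
    return pd.DataFrame(values, index=scores.index, columns=scores.columns)


# Stand-in for a score matrix whose diagonal came back as 0.
scores = pd.DataFrame(np.zeros((3, 3)), index=list("abc"), columns=list("abc"))
fixed = with_unit_diagonal(scores)
print(fixed)
```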

I would love to see your example with the time series data if it is not under an NDA. If you want, we could have a quick video session about it.

Florian

@clementpoiret
Author

Sorry for the delay,
I have some deadlines soon with the end of my MSc and the beginning of my PhD, so I don't have a lot of free time, but I'd be happy to discuss the potential benefits of the pps in neuroimaging!
If you want to take a look, here is the repo of the script where I added support for pps: https://github.com/clementpoiret/fmri_connectivity_measures

@FlorianWetschoreck FlorianWetschoreck self-assigned this Jul 13, 2020
@FlorianWetschoreck
Collaborator

To summarize this issue:

  • Support for numpy arrays: we will not provide this. The fix is to convert the array to a DataFrame with pd.DataFrame(matrix).
  • There should be no error for numeric targets. The error happened because the series was inferred as a classification task but the model expected a LabelEncoded series. This won't happen in the future because the task will be derived based only on the dtype.
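
Both points can be sketched in a few lines of pandas. The helper names `array_to_frame` and `infer_task` are hypothetical, and the dtype rule shown (numeric, non-boolean dtypes mean regression, everything else means classification) is an assumption made for illustration, not necessarily the exact rule ppscore applies.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def array_to_frame(array: np.ndarray, prefix: str = "col_") -> pd.DataFrame:
    """Wrap a 2D array in a DataFrame with explicit string column names."""
    if array.ndim != 2:
        raise ValueError("expected a 2D array")
    names = [f"{prefix}{i}" for i in range(array.shape[1])]
    return pd.DataFrame(array, columns=names)


def infer_task(series: pd.Series) -> str:
    """Assumed rule: numeric, non-boolean dtypes are treated as regression."""
    if is_numeric_dtype(series) and not is_bool_dtype(series):
        return "regression"
    return "classification"


matrix = np.random.default_rng(0).standard_normal((10, 4))
df = array_to_frame(matrix)
tasks = {name: infer_task(df[name]) for name in df.columns}
print(tasks)
# df now has a .columns attribute and can be passed on, e.g. to pps.matrix(df).
```

Deciding from the dtype alone means a float column is never routed to a classifier, whatever its values look like.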

If you want to discuss the pps in neuroimaging, please open a new issue :)
