[BUG] nan value when complexity measure is calculated #131

Closed
Yeoonsu opened this issue Jul 14, 2023 · 2 comments
Labels
bug Something isn't working

Yeoonsu commented Jul 14, 2023

Describe the bug
Hello. This is my first time posting in the Issues tab (I'm a new GitHub user).
If my GitHub formatting or my English isn't quite right, please bear with me.

The problem is this:
when I calculate complexity measures, I often get 'nan' values, seemingly at random. I have spent a few weeks on this and read the code alongside some papers, but I can't figure out why.
I replaced the nan values with 0, but I'm worried there's no sound logic behind that.
What is the difference between nan and 0.0? Do you think this replacement (nan to 0) is okay?

Could you give me any advice?
Thank you.

Screenshots
(Screenshot of the extracted complexity measures; image not preserved.)

Additional context
Thank you for creating pymfe. It has been a big help in my first research project.

Yeoonsu added the "bug" label on Jul 14, 2023
FelSiq (Collaborator) commented Jul 14, 2023

Hi @Yeoonsu, thanks for your feedback.

NaN values typically appear when either a meta-feature extraction method or a summary function fails to compute. In your case, since the mean values all appear to be valid, it seems the standard deviation (sd) returned NaN because only a single value was available for the calculation. This behavior is expected and is not considered a bug.
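A minimal sketch of how a single value produces a valid mean but a NaN standard deviation, using NumPy directly rather than pymfe. It assumes the "sd" summary behaves like a sample standard deviation (dividing by n - 1); that detail is an assumption here and was not confirmed in this thread. The value 0.42 is made up for illustration.

```python
import warnings

import numpy as np

# Hypothetical output of one extraction method: only a single value
# is available to summarize.
values = np.array([0.42])

# The mean of one value is simply that value, so it is always valid.
mean = np.mean(values)

# Sample standard deviation (ddof=1) divides by n - 1, which is 0 here,
# so NumPy returns nan and emits a RuntimeWarning (silenced below).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    sd_sample = np.std(values, ddof=1)

# Population standard deviation (ddof=0) divides by n and returns 0.0.
sd_population = np.std(values, ddof=0)

print(mean, sd_sample, sd_population)
```

This also illustrates the difference between NaN and 0.0 raised in the question: 0.0 claims "there is no spread", while NaN means "the spread is undefined for this input".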

It is difficult to tell you what you should do, because the preferable approach depends on the nature of your analysis. You can set the missing values to an obviously invalid value (e.g. -1.0 for the standard deviation), or fill them using mean/median values from other datasets. Each method for filling missing data has its advantages and downsides, and there is no universally "correct" answer.
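A rough sketch of the two filling strategies mentioned above, applied to a small meta-feature matrix with NumPy. The matrix layout (one row per dataset, one column per summarized measure) and all numbers are hypothetical; pymfe itself is not called here.

```python
import numpy as np

# Hypothetical meta-features: rows are datasets, columns are summarized
# measures (e.g. a ".mean" column and an ".sd" column).
# The nan marks a dataset where the standard deviation was undefined.
meta_features = np.array([
    [0.31, 0.05],
    [0.47, np.nan],
    [0.52, 0.12],
])

missing = np.isnan(meta_features)

# Option 1: an obviously invalid sentinel. A standard deviation can never
# be negative, so -1.0 stays recognizable as "missing" downstream.
sentinel_filled = np.where(missing, -1.0, meta_features)

# Option 2: impute each column with the median over the other datasets,
# ignoring the missing entries when computing it.
column_medians = np.nanmedian(meta_features, axis=0)
median_filled = np.where(missing, column_medians, meta_features)

print(sentinel_filled)
print(median_filled)
```

Replacing NaN with 0.0, as in the original question, follows the same pattern as Option 1, but 0.0 is also a legitimate standard deviation, so later steps cannot distinguish "no spread" from "could not be computed".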

FelSiq (Collaborator) commented Jul 20, 2023

I'm closing this issue. Please feel free to reopen it if necessary.

Best regards,
Felipe.
