dudi.mix fails when ordered variable only has two levels #10

Closed
CeresBarros opened this issue Mar 15, 2018 · 4 comments

@CeresBarros

Hi everyone,

I'm trying to run dudi.mix on a data set that has a mix of categorical (factor and ordered variables) and continuous variables and I've noticed that it can't compute when an ordered variable only has two levels.

I believe the problem is in the internal floc function, where apply(w, 2, sum) fails because there is only one column in this particular case (and thus w is no longer a matrix, but a vector).
I first thought that perhaps we should always have more than two levels (if that is the case, would it be possible to add some checks to dudi.mix?). But then I noticed that, before the dudi object X is created, the function does handle the case of only two levels (see if (deg.poly == 1) cha <- names(df)).

I'm not sure what the best way of dealing with this is, so I thought it best to post an issue here. Here's a reproducible example:

library(ade4)
data(dunedata)

## remove one level of the ordered factor, so that only two are present
dunedata$envir$use <- as.character(dunedata$envir$use)
dunedata$envir$use[dunedata$envir$use == "grazing"] <- c(rep("both", 3), 
                                                   rep("hayfield", 2))
dunedata$envir$use <- as.ordered(dunedata$envir$use)
dudi.mix(dunedata$envir, scannf = FALSE, nf = ncol(dunedata$envir))

Any help is much appreciated!
Ceres Barros
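
A note for readers: the vector-versus-matrix behaviour described in this comment can be reproduced in base R. The sketch below is only an illustration. The matrix `m` and the other variable names are hypothetical stand-ins and are not taken from the ade4 source.

```r
# Hypothetical stand-in for the matrix handled inside floc(); not ade4 code.
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

w_vec <- m[, 1]                # single column: dimensions are dropped, plain vector
w_mat <- m[, 1, drop = FALSE]  # single column: still a 3 x 1 matrix

# apply() needs an object with dimensions, so the vector case raises an error.
res <- tryCatch(apply(w_vec, 2, sum), error = function(e) conditionMessage(e))
print(res)

# With the matrix structure preserved, the column sum is computed as usual.
col_sums <- apply(w_mat, 2, sum)
print(col_sums)
```

Using `drop = FALSE` (or wrapping the result in `as.matrix()`) is one common way to keep single-column results compatible with `apply()`. Whether the fix in ade4 takes this form is not stated in this thread.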

@CeresBarros
Author

CeresBarros commented Mar 15, 2018

Not entirely related to the issue, but I also noticed that dudi.mix uses nlevels() to count the number of levels in an ordered variable. This causes problems when the dataset used for the ordination is a subset of a larger dataset: the number of levels defined for the variable can be larger than the number of distinct values actually present in the data.

When this is the case, nlevels() causes computations to fail (in dudi.mix) with the error:

Error in poly(w, deg.poly) : 
  'degree' must be less than number of unique points

So I suggest replacing nlevels() with length(unique()):
3be8b03

Let me know if you'd like me to do a pull request :)

Cheers,
Ceres Barros
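
For readers, the mismatch between declared and observed levels can be shown with a small base-R sketch. It assumes, for illustration only, that the polynomial degree is set to the number of levels minus one. The factor `full` and the other names are hypothetical and are not taken from dudi.mix.

```r
# Hypothetical ordered factor with three declared levels.
full <- factor(c("low", "mid", "high", "low", "mid", "high"),
               levels = c("low", "mid", "high"), ordered = TRUE)

# Subsetting keeps all declared levels, even those no longer observed.
sub <- full[full != "high"]
n_declared <- nlevels(sub)         # counts declared levels
n_observed <- length(unique(sub))  # counts values actually present
print(c(declared = n_declared, observed = n_observed))

# Assumption: degree = number of levels - 1.
# With the declared count the degree is too high for the observed points.
err <- tryCatch(poly(as.numeric(sub), degree = n_declared - 1),
                error = function(e) conditionMessage(e))
print(err)

# With the observed count the degree is valid.
ok <- poly(as.numeric(sub), degree = n_observed - 1)
print(dim(ok))
```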

@sdray
Collaborator

sdray commented Mar 16, 2018

Hi Ceres,

Thanks for the report. I have just fixed the bug from your first point. Regarding the second point, we consider that data should be cleaned before the analysis (for instance by using droplevels(df[1:5])).

We could add this check to ade4 in the future, but it would have to be done for all functions that take factors as arguments (I opened issue #11 for this point).

Cheers !
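
As a sketch of the cleaning step suggested here, `droplevels()` can be applied to a subset of a data frame before the analysis. The data frame below is made up for illustration and only loosely mimics the shape of dunedata$envir. Its column names and values are assumptions.

```r
# Made-up data frame with one numeric column and one ordered factor.
df <- data.frame(
  moisture = c(1.2, 3.4, 2.2, 5.1, 4.0),
  use = factor(c("hayfield", "both", "grazing", "both", "hayfield"),
               levels = c("hayfield", "both", "grazing"), ordered = TRUE)
)

# Subsetting rows leaves "grazing" as an unused but still declared level.
sub_df <- df[df$use != "grazing", ]
print(nlevels(sub_df$use))

# droplevels() on a data frame removes unused levels from every factor column.
clean_df <- droplevels(sub_df)
print(levels(clean_df$use))
```

The cleaned data frame can then be passed to the analysis function in place of the raw subset.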

@sdray sdray closed this as completed Mar 16, 2018
@CeresBarros
Author

Thanks for your prompt answer and for fixing the bug :)
I see. I think I'll add those checks in my fork (I'll try to do it for all functions) and, if you want, I can open a pull request.

Cheers,
Ceres

@sdray
Collaborator

sdray commented Mar 16, 2018

Yes! Thanks, Ceres.
