Dependent (ordered factor) variable comes up as a feature when running mRMR.ensemble #10

Closed
akbaramed opened this issue Nov 24, 2017 · 2 comments

@akbaramed

Hello,

I have a classification problem. My independent variables are a combination of ordered factor columns and numeric fields. My dependent variable is an ordered factor in binary format (Yes/No) [1/2].

I have tried the following examples:
TitanicDF, from https://amunategui.github.io/variable-importance-shuffler/, where all the variables are numeric. This works fine.
CGPS, from https://cran.r-project.org/web/packages/mRMRe/vignettes/mRMRe.pdf, where the dependent variable cgps.ic50 comes up alongside the other selected features. Is that the expected result?

My problem is similar to the CGPS dataset: in the final feature selection (variableImportance), the dependent variable also comes up. The dependent variable is at index 1 (see the first line of the code), and the feature count is 30 including the dependent variable. To counter this, I added the line commented "### removing the dependent variable". My question is: is this the right approach?

If I instead append the dependent variable as the last column of the dataset, set target_indices to ncol(dataset), and run the code below, I get all the variables of the dataset (97 columns minus the dependent variable); the feature count does not stop at 30. So I then filter the features (the line that keeps importance > 0.1 or importance < -0.1). If this is the right approach, can you please suggest a threshold to filter the features with, such as 0.5?

ds_fe <- data.frame( CLASS=ds_fe$CLASS, ds_fe[-(1:7)])
data <- mRMR.data(data = ds_fe)
feature_new <- mRMR.ensemble(data = data, target_indices = c(1), feature_count = 30, solution_count = 5)
variableImportance <-data.frame('importance'= feature_new@mi_matrix[nrow(feature_new@mi_matrix),])
variableImportance$feature <- rownames(variableImportance)
row.names(variableImportance) <- NULL
variableImportance <- na.omit(variableImportance)
variableImportance <- variableImportance[variableImportance$feature != "CLASS",] ### removing the dependent variable
variableImportance <- variableImportance[variableImportance$importance > 0.1 | variableImportance$importance < -0.1, ]
print(variableImportance)

@spapillon
Contributor

It seems that your problem stems from this line:
data.frame('importance'= feature_new@mi_matrix[nrow(feature_new@mi_matrix),])

Here you are selecting the last row of the mutual information (MI) matrix, not the entries for the selected features. See the code below, which extracts the MI between the selected features and the dependent (target) variable.

selected_features = unique(as.vector(solutions(feature_new)[[1]]))
variableImportance = data.frame(importance = feature_new@mi_matrix[selected_features,1])
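
For readers who want to try the corrected extraction end to end, here is a minimal sketch on made-up data. The toy data frame, its column names (x1 to x6), and the small feature_count and solution_count values are assumptions for illustration only; the mRMRe calls are the ones already used in this thread, plus featureNames() to attach feature names to the indices.

```r
library(mRMRe)

set.seed(1)
n <- 200

# Hypothetical toy data: x1 drives the target, the rest are noise.
x1 <- rnorm(n)
toy <- data.frame(
  # Binary ordered-factor target in column 1, as in the original question
  CLASS = factor(ifelse(x1 + rnorm(n, sd = 0.5) > 0, "Yes", "No"),
                 levels = c("No", "Yes"), ordered = TRUE),
  x1 = x1,
  x2 = rnorm(n),
  x3 = rnorm(n),
  x4 = ordered(sample(1:3, n, replace = TRUE)),  # an ordered-factor feature
  x5 = rnorm(n),
  x6 = rnorm(n)
)

dd <- mRMR.data(data = toy)
fs <- mRMR.ensemble(data = dd, target_indices = 1,
                    feature_count = 3, solution_count = 2)

# Column indices of the features chosen across all solutions for the target
selected <- unique(as.vector(solutions(fs)[[1]]))

# MI between each selected feature and the target (column 1)
variableImportance <- data.frame(
  feature = featureNames(dd)[selected],
  importance = fs@mi_matrix[selected, 1],
  row.names = NULL
)

# Strongest associations first, regardless of sign
variableImportance <-
  variableImportance[order(-abs(variableImportance$importance)), ]
print(variableImportance)
```

Because only the selected indices are used, the target never needs to be filtered out by name afterwards. If the target sits in a different column, the same pattern should apply with that column index in place of 1 in both target_indices and the mi_matrix subscript, though that variant is not shown in this thread.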

@akbaramed
Author

Thanks for the correction. Closing the query.
