dependent(ordered factor) variable is coming as a feature when doing mRMR.ensemble #10

akbaramed · 2017-11-24T02:29:23Z

Hello,

I have a classification problem. My independent variables are a mix of ordered factor columns and numeric fields, and my dependent variable is a binary ordered factor (Yes/No, coded 1/2).

I have tried two examples:
TitanicDF, from https://amunategui.github.io/variable-importance-shuffler/, where all the variables are numeric; this works fine.
CGPS, from https://cran.r-project.org/web/packages/mRMRe/vignettes/mRMRe.pdf, where the dependent variable cgps.ic50 comes up along with the other selected features. Is this the expected result?

My problem is similar to the CGPS dataset. In the final feature selection below (variableImportance), the dependent variable also comes up. The dependent variable is at index 1 (first line of code), and the feature count is 30 including the dependent variable. To counter this, I have written the line of code highlighted in yellow (the one commented "removing the dependent variable"). My question is: is this the right approach?

If I instead append the dependent variable as the last column of the dataset, set target_indices to ncol(dataset), and run the code below, I get all the variables of the dataset (97 columns minus the dependent variable); the feature count does not stop at 30. So I then filter the features (the line highlighted in gray, which thresholds on importance). If this is the right approach, can you please suggest a threshold to filter the features with, such as 0.5?

ds_fe <- data.frame( CLASS=ds_fe$CLASS, ds_fe[-(1:7)])
data <- mRMR.data(data = ds_fe)
feature_new <- mRMR.ensemble(data = data, target_indices = c(1), feature_count = 30, solution_count = 5)
variableImportance <-data.frame('importance'= feature_new@mi_matrix[nrow(feature_new@mi_matrix),])
variableImportance$feature <- rownames(variableImportance)
row.names(variableImportance) <- NULL
variableImportance <- na.omit(variableImportance)
variableImportance <- variableImportance[variableImportance$feature != "CLASS",] ### removing the dependent variable
variableImportance <- variableImportance[variableImportance$importance > 0.1 | variableImportance$importance < -0.1, ]
print(variableImportance)

spapillon · 2018-01-17T19:03:50Z

It seems that your problem stems from the
data.frame('importance'= feature_new@mi_matrix[nrow(feature_new@mi_matrix),])

Here you are selecting the last row of the mutual information (MI) matrix; with the target at index 1, that is not the target's row. The code below instead extracts the MI between the selected features and the dependent (target) variable.

selected_features = unique(as.vector(solutions(feature_new)[[1]]))
variableImportance = data.frame(importance = feature_new@mi_matrix[selected_features,1])
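
As a self-contained illustration of the indexing difference, here is a minimal sketch that does not need mRMRe at all. The toy matrix stands in for feature_new@mi_matrix; its values, the feature names f1 to f3, and the `selected` indices are all invented for illustration and are not real mRMRe output.

```r
# Toy stand-in for feature_new@mi_matrix (all values invented for illustration).
# Row/column 1 is the target, CLASS, matching target_indices = c(1).
vars <- c("CLASS", "f1", "f2", "f3")
mi <- matrix(
  c(  NA, 0.40, 0.05, 0.30,
    0.40,   NA, 0.10, 0.20,
    0.05, 0.10,   NA, 0.15,
    0.30, 0.20, 0.15,   NA),
  nrow = 4, byrow = TRUE, dimnames = list(vars, vars)
)

# Original approach: the last row holds the scores of f3 against every
# variable, so it is not relative to CLASS and contains a CLASS entry.
last_row <- mi[nrow(mi), ]

# Suggested approach: index the target column by the selected feature indices.
# `selected` is a hypothetical stand-in for
# unique(as.vector(solutions(feature_new)[[1]])).
selected <- c(2, 4)
variableImportance <- data.frame(
  feature    = rownames(mi)[selected],
  importance = mi[selected, 1],
  stringsAsFactors = FALSE
)
print(variableImportance)
```

Because only the selected indices are used to subset column 1, the target never appears among the features, so no separate step to drop CLASS is needed in this sketch.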

akbaramed · 2018-01-31T09:40:55Z

Thanks for the correction. Closing the query.

akbaramed closed this as completed Jan 31, 2018
