Hello,
I have a classification problem. My independent variables are a mix of ordered-factor columns and numeric fields, and my dependent variable is an ordered factor in binary format (Yes/No, coded 1/2).
I have tried two examples:
1. TitanicDF, using https://amunategui.github.io/variable-importance-shuffler/, where all the variables are numeric; this works fine.
2. CGPS, using https://cran.r-project.org/web/packages/mRMRe/vignettes/mRMRe.pdf, where the dependent variable cgps.ic50 comes up alongside the other selected features. Is that the expected result?
My problem is similar to the CGPS dataset. In the final feature selection (variableImportance) from the code below, the dependent variable itself also comes up: its position is index 1 (set in the first line of code), and feature_count is 30 including the dependent variable. To counter this I added the line marked "### removing the dependent variable" below. My question is: is that the right approach?
Alternatively, if I append the dependent variable as the last column of the dataset, pass target_indices = ncol(dataset), and run the code below, I get back all the variables of the dataset (97 columns, minus the dependent variable); the feature count does not stop at 30. So I then filter the features with the final thresholding line ("### filtering on absolute importance"); a sketch of this second setup follows the code below. If that is the right approach, can you please suggest a threshold to filter the features with, e.g. 0.5?
# Put the dependent variable (CLASS) first and drop the original first seven columns
ds_fe <- data.frame(CLASS = ds_fe$CLASS, ds_fe[-(1:7)])

data <- mRMR.data(data = ds_fe)
feature_new <- mRMR.ensemble(data = data, target_indices = c(1),
                             feature_count = 30, solution_count = 5)

# Take the last row of the MI matrix as the importance scores
variableImportance <- data.frame(importance = feature_new@mi_matrix[nrow(feature_new@mi_matrix), ])
variableImportance$feature <- rownames(variableImportance)
row.names(variableImportance) <- NULL
variableImportance <- na.omit(variableImportance)

variableImportance <- variableImportance[variableImportance$feature != "CLASS", ] ### removing the dependent variable
variableImportance <- variableImportance[abs(variableImportance$importance) > 0.1, ] ### filtering on absolute importance
print(variableImportance)
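For reference, here is a minimal sketch of the second setup described above, with the dependent variable appended as the last column. It reuses the ds_fe and CLASS names from the code above and is illustrative only:

# Sketch: same ensemble call, but with CLASS moved to the last column
ds_last <- data.frame(ds_fe[, setdiff(names(ds_fe), "CLASS")], CLASS = ds_fe$CLASS)
data_last <- mRMR.data(data = ds_last)
feature_last <- mRMR.ensemble(data = data_last,
                              target_indices = ncol(ds_last),
                              feature_count = 30, solution_count = 5)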
It seems that your problem stems from this line:

variableImportance <- data.frame('importance' = feature_new@mi_matrix[nrow(feature_new@mi_matrix), ])

Here you are selecting the last row of the mutual information (MI) matrix. See the code below, which extracts the MI between the selected features and the dependent variable.
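(The snippet that reply refers to was not preserved in this thread; what follows is a hedged reconstruction of the idea, not the original code. It assumes the feature_new filter and data object from the question above, with CLASS at target index 1, and uses mRMRe's solutions() and featureNames() accessors from the vignette linked earlier.)

# A sketch, not the original reply code: pull MI between the
# selected features and the target (CLASS, at index 1 of the data)
target_idx <- 1

# Indices of every feature chosen across the 5 ensemble solutions
selected <- unique(as.vector(solutions(feature_new)[[1]]))

# The target's column of the MI matrix holds MI(feature, CLASS)
variableImportance <- data.frame(
  feature    = featureNames(data)[selected],
  importance = feature_new@mi_matrix[selected, target_idx]
)

# The estimates can be negative, so rank by absolute value
variableImportance <- variableImportance[order(-abs(variableImportance$importance)), ]
print(variableImportance)

This keeps the thresholding idea from the final filtering line of the question: since the importance values can carry a sign, ranking or thresholding on absolute importance is the natural choice.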