Entering factors for classification #99

Closed
Nathaniel-Haines opened this issue May 5, 2017 · 3 comments

@Nathaniel-Haines
Member

I have realized that when a factor is entered into easy_glmnet as the outcome variable, the "Replicating Metrics" step of model fitting just sits at 0% complete. However, if the factor is converted to 0 and 1 (i.e. as.integer(outcome_variable) - 1), things work fine. Could this conversion be done automatically to make things easier for users?
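A minimal base-R illustration of that conversion (the factor below is toy data, not from the project): `as.integer()` on a factor returns the level codes, which start at 1, so subtracting 1 gives the 0/1 coding.

```r
# Toy two-class outcome stored as a factor (illustrative data only)
outcome_variable <- factor(c("control", "case", "case", "control"))

# Levels are sorted, so "case" is code 1 and "control" is code 2
levels(outcome_variable)      # "case" "control"
as.integer(outcome_variable)  # 2 1 1 2

# Subtract 1 to obtain a 0/1 numeric outcome
y01 <- as.integer(outcome_variable) - 1
y01                           # 1 0 0 1
```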

@paulhendricks
Contributor

paulhendricks commented May 26, 2017

Here's a minimal reproducible example (reprex) for this issue as of 4573549. We can use it to write a unit test and submit a patch.

library(easyml) # https://github.com/CCS-Lab/easyml

# Settings used below (small illustrative values so the example runs quickly)
.n_samples <- 5
.n_divisions <- 5
.n_iterations <- 2
.n_core <- 1

# Load data
data("cocaine_dependence", package = "easyml")

# Store the outcome as a factor, which triggers the bug
cocaine_dependence$diagnosis <- factor(cocaine_dependence$diagnosis)

results <- easy_support_vector_machine(cocaine_dependence, "diagnosis",
                                       family = "binomial", preprocess = preprocess_scale,
                                       exclude_variables = c("subject"),
                                       categorical_variables = c("male"),
                                       n_samples = .n_samples, n_divisions = .n_divisions,
                                       n_iterations = .n_iterations, random_state = 12345, n_core = .n_core)
[1] "Generating predictions for a single train test split:"
  |==================================================| 100% elapsed =00s, remaining ~00s
 Error in stats::cor(y_true, y_pred) : 'x' must be numeric 
> 
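The error comes from `stats::cor()`, which requires numeric input and rejects a factor. A small base-R sketch with made-up values reproduces the message and shows that a 0/1 numeric outcome avoids it; `y_true` and `y_pred` only mirror the argument names in the error message and are not taken from easyml's internals.

```r
# Illustrative values: a factor outcome and numeric predictions
y_true <- factor(c(0, 1, 1, 0))
y_pred <- c(0.2, 0.8, 0.6, 0.3)

# Passing the factor straight to cor() fails with "'x' must be numeric"
err <- tryCatch(stats::cor(y_true, y_pred),
                error = function(e) conditionMessage(e))
print(err)

# Converting the factor to 0/1 first lets cor() run
y_num <- as.numeric(y_true) - 1
r <- stats::cor(y_num, y_pred)
print(r)
```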

@paulhendricks paulhendricks self-assigned this Jul 8, 2017
@paulhendricks paulhendricks added this to the 0.1.0 milestone Jul 8, 2017
@paulhendricks
Contributor

Proposal for solution within core:

  # Set dependent variable
  y <- set_dependent_variable(.data, dependent_variable)
  
  # Process dependent variable
  if (family == "binomial") {
    # Check that dependent variable has two classes
    if (length(unique(y)) != 2) {
      stop("Error! Dependent variable must have two classes!")
    }
    
    # Check if dependent variable is a factor; drop unused levels so the
    # codes are 1 and 2 before shifting them to 0 and 1
    if (is.factor(y)) {
      y <- as.numeric(droplevels(y)) - 1
    }
  }
  
  # Capture dependent variable
  object[["y"]] <- y

Will test and merge if it works.
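The proposed check can also be tried outside the package. This is a minimal sketch assuming the logic is pulled into a standalone function; `process_binomial_outcome()` is a hypothetical name, not an easyml function. It calls `droplevels()` before converting, because a factor with an unused level (e.g. levels `no`, `maybe`, `yes` with only `no` and `yes` observed) would otherwise map to 0 and 2 instead of 0 and 1.

```r
# Hypothetical standalone helper based on the proposal; in easyml the
# logic would live inside the core fitting routine instead
process_binomial_outcome <- function(y) {
  # A binomial outcome must have exactly two distinct values
  if (length(unique(y)) != 2) {
    stop("Dependent variable must have two classes!")
  }
  # Map a two-level factor onto 0/1; droplevels() removes unused levels
  # so the underlying codes are 1 and 2 before subtracting 1
  if (is.factor(y)) {
    y <- as.numeric(droplevels(y)) - 1
  }
  y
}

# A factor with an unused level still maps to 0/1
y <- factor(c("no", "yes", "yes", "no"), levels = c("no", "maybe", "yes"))
print(process_binomial_outcome(y))  # 0 1 1 0
```

Checks like these could be turned into `testthat` unit tests once the logic is merged.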

@paulhendricks
Contributor

See c9c9184.


2 participants