Better handling of missing values in GenerateReport #16

Closed
djhurio opened this issue Mar 4, 2016 · 6 comments

Comments

@djhurio commented Mar 4, 2016

The example looks very nice. I am trying to run it on my own data, but I am getting the following error:

```
label: correlation_continuous
Quitting from lines 51-52 (report.rmd)
Error in seq.default(from = best$lmin, to = best$lmax, by = best$lstep) :
  'from' must be of length 1
```

@boxuancui (Owner) commented Mar 4, 2016

Could you run the PlotMissing function first and check whether certain features are mostly NA? If so, that could be the cause.

As a quick fix, I would remove those features and run GenerateReport again.

I plan to add some missing-value scanning before plotting. Please confirm that this is the actual cause, and I will use this issue to track the enhancement.
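As a rough sketch of the quick fix described above (find mostly-missing features and drop them before calling GenerateReport), the base-R snippet below computes each column's NA rate and keeps only the columns at or below a threshold. `drop_mostly_na` is a hypothetical helper, not a DataExplorer function, and the 0.5 default is an illustrative assumption.

```r
# Hypothetical helper (not part of DataExplorer): drop columns whose
# share of missing values exceeds `threshold`.
drop_mostly_na <- function(df, threshold = 0.5) {
  # is.na() on a data frame gives a logical matrix; colMeans() turns it
  # into the NA rate of each column
  na_rate <- colMeans(is.na(df))
  dropped <- names(na_rate)[na_rate > threshold]
  if (length(dropped) > 0) {
    message("Dropping mostly-NA columns: ", paste(dropped, collapse = ", "))
  }
  df[, na_rate <= threshold, drop = FALSE]
}

# Example data: `b` is 75% missing, `a` is 25% missing, `c` is complete
example_df <- data.frame(
  a = c(1, 2, 3, NA),
  b = c(NA, NA, NA, 4),
  c = c("x", "y", "z", "w")
)
clean_df <- drop_mostly_na(example_df)
names(clean_df)  # "a" "c"
```

The cleaned data frame would then be passed to GenerateReport in place of the original. The 50% cut-off matches the one reported in this thread, but it is a judgment call rather than a rule.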

@djhurio (Author) commented Mar 7, 2016

Yes, I confirm this. Removing variables with an NA rate of more than 50% fixed the first error. But now I am stuck on the next error:

```
label: correlation_discrete
Quitting from lines 63-64 (report.rmd)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
```

@boxuancui (Owner) commented Mar 7, 2016

This is probably because you also have some problematic discrete features. Could you update the package to the latest develop branch? I have pushed some bug fixes that should address your issues. Please let me know if they don't.

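```r
# Install devtools if it is missing, then install DataExplorer from the develop branch
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("boxuancui/DataExplorer", ref = "develop")
```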
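R raises the `contrasts<-` error above when a factor with fewer than two levels is used in a model, which typically means a discrete feature is constant or entirely missing (or becomes so once incomplete rows are dropped). The base-R sketch below lists such features so they can be removed or inspected before running GenerateReport. `find_degenerate_discrete` is a hypothetical helper, not a DataExplorer function, and treating only factor and character columns as "discrete" is an assumption.

```r
# Hypothetical helper (not part of DataExplorer): names of discrete
# (factor or character) columns with fewer than two distinct non-missing values.
find_degenerate_discrete <- function(df) {
  is_discrete <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  # Count distinct values actually present, ignoring NA
  n_distinct <- vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))
  names(df)[is_discrete & n_distinct < 2]
}

# Example data: `grp` has a single observed value, `all_na` has none
survey_df <- data.frame(
  id = 1:4,
  grp = factor(c("a", "a", NA, "a")),
  all_na = factor(c(NA, NA, NA, NA)),
  city = c("Riga", "Tallinn", "Riga", NA)
)
find_degenerate_discrete(survey_df)  # "grp" "all_na"
```

Counting observed values rather than declared levels is deliberate: a factor can carry unused levels and still collapse to a single level once empty levels are dropped.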
@boxuancui boxuancui added the type: bug label Mar 7, 2016
@boxuancui boxuancui self-assigned this Mar 7, 2016
@boxuancui boxuancui changed the title from "Error in seq.default" to "Better handling of missing values in GenerateReport" Mar 7, 2016
@boxuancui boxuancui added this to the 0.2.6 milestone Mar 9, 2016
@boxuancui boxuancui removed the type: bug label Mar 10, 2016

@djhurio (Author) commented Mar 14, 2016

I have installed the development version, and I no longer get errors. The report is generated, but with a warning:

```
Warning message:
In writeLines(if (encoding == "") res else native_encode(res, to = encoding),  :
  invalid char string in output conversion
```

And the report is unreadable.

@boxuancui (Owner) commented Mar 14, 2016

I believe this is due to non-ASCII characters in the data. I have created #19 to address it. For now, the output encoding is inherited from the default rmarkdown settings.
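To check the non-ASCII hypothesis on a given data set, one option is to list the text columns containing at least one value that cannot be converted to ASCII. The base-R sketch below does this with `iconv()`, which returns `NA` for elements it cannot convert. `find_non_ascii_columns` is a hypothetical helper; it assumes the strings are correctly declared as UTF-8 or the native encoding, and it only locates the data, it does not change how rmarkdown writes the report.

```r
# Hypothetical helper: names of character/factor columns containing
# at least one non-ASCII value.
find_non_ascii_columns <- function(df) {
  has_non_ascii <- vapply(df, function(x) {
    if (!(is.character(x) || is.factor(x))) return(FALSE)
    x <- enc2utf8(as.character(x))
    # iconv() yields NA for elements that cannot be represented in ASCII
    converted <- iconv(x, from = "UTF-8", to = "ASCII")
    any(!is.na(x) & is.na(converted))
  }, logical(1))
  names(df)[has_non_ascii]
}

# Example data: `city` contains Latvian characters with diacritics
text_df <- data.frame(
  id = 1:3,
  city = c("Riga", "R\u0113zekne", "Liep\u0101ja"),
  code = c("LV", "LV", NA)
)
find_non_ascii_columns(text_df)  # "city"
```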

@boxuancui (Owner) commented Mar 14, 2016

I will close this ticket, since it tracks the missing-value bug; the encoding problem is covered by #19.

2 participants