[SPARK-11569] [ML] Fix StringIndexer to handle null value properly #9920
I was having some problems rebasing #9709, so I had to close that PR and create a new pull request with my latest fix.
Thanks to @jkbradley and @holdenk for your comments. I have updated my fix so that users can configure StringIndexer to either filter out null values, with StringIndexer.setHandleInvalid("skip"), or throw an error. The default is StringIndexer.setHandleInvalid("error").
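To make the two modes concrete, here is a minimal pure-Python sketch of the behavior described above. It is not the Spark implementation: `fit_labels` and `index_values` are hypothetical names, nulls are modeled as `None`, and ignoring nulls while fitting and breaking frequency ties alphabetically are assumptions made only for this illustration.

```python
from collections import Counter


def fit_labels(values):
    """Order labels by descending frequency, most frequent first.

    Assumptions for this sketch: None is ignored while counting, and
    ties are broken alphabetically.
    """
    counts = Counter(v for v in values if v is not None)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ordered]


def index_values(values, handle_invalid="error"):
    """Map each string to a float index; None is skipped or rejected."""
    if handle_invalid not in ("skip", "error"):
        raise ValueError(f"unsupported handle_invalid: {handle_invalid!r}")
    labels = fit_labels(values)
    label_to_index = {label: float(i) for i, label in enumerate(labels)}
    indexed = []
    for v in values:
        if v is None:
            if handle_invalid == "skip":
                continue  # drop the row with the null value
            raise ValueError("null value encountered")
        indexed.append((v, label_to_index[v]))
    return indexed
```

With `handle_invalid="skip"` the row holding `None` is dropped from the output; with the default `"error"` the same input raises a `ValueError`.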
Please let me know what you think. Thanks again!
referenced this pull request
Nov 23, 2015
Sorry for my slow reply. Looking at this, it seems like you've changed the meaning of handleInvalid so that it no longer serves its original purpose (unless I've missed something). This is probably not the best path forward. Maybe add something like handleNulls and keep the old handleInvalid behavior?
I really like the thoroughness of the tests, and I think the logic is pretty solid (it's just that changing the meaning of an existing API is to be avoided).
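The suggestion above, keeping handleInvalid as it was and adding a separate option for nulls, could look roughly like the following pure-Python sketch. Everything here is hypothetical: `transform_with_separate_options` and the `handle_nulls` parameter are illustrative names, not part of Spark, and the sketch assumes that the original purpose of handleInvalid is dealing with labels that were not seen during fitting.

```python
def transform_with_separate_options(values, labels,
                                    handle_invalid="error",
                                    handle_nulls="error"):
    """Index values against already-fitted labels.

    handle_invalid: policy for labels not present in `labels`
                    (assumed original meaning of handleInvalid).
    handle_nulls:   hypothetical separate policy for None values.
    """
    for name, option in (("handle_invalid", handle_invalid),
                         ("handle_nulls", handle_nulls)):
        if option not in ("skip", "error"):
            raise ValueError(f"unsupported {name}: {option!r}")

    label_to_index = {label: float(i) for i, label in enumerate(labels)}
    indexed = []
    for v in values:
        if v is None:
            if handle_nulls == "skip":
                continue  # drop null rows without touching handle_invalid
            raise ValueError("null value encountered")
        if v not in label_to_index:
            if handle_invalid == "skip":
                continue  # drop rows with labels unseen during fitting
            raise ValueError(f"unseen label: {v!r}")
        indexed.append(label_to_index[v])
    return indexed
```

Keeping the two policies independent means a caller can skip nulls while still failing on unseen labels, or the reverse, without either option changing what the other one means.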