AnalyzeSpark.getUnique for more than a single feature column #71
Currently getUnique only handles a single column per call.
For larger datasets where the unique values of several columns are needed, it would be handy to be able to specify multiple columns in a single method call.
The static method could be something like .getUniques(List columnNames, Schema schema,
with a return type HashMap<String, List> where the String is the column name.
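To make the proposal concrete, here is a minimal sketch of the idea in plain Java. It is not the DataVec API: the class name `UniquesSketch`, the `schemaColumns` list (standing in for `Schema`, column order only), and the in-memory `rows` list (standing in for the Spark data) are all assumptions made for illustration. The sketch collects the unique values for every requested column in one pass over the data, which is presumably where the benefit over repeated `getUnique` calls would come from.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the proposed method; not part of DataVec.
class UniquesSketch {

    // schemaColumns plays the role of the Schema (column names in order);
    // rows plays the role of the distributed data set.
    static Map<String, List<Object>> getUniques(List<String> columnNames,
                                                List<String> schemaColumns,
                                                List<List<Object>> rows) {
        // Resolve each requested column name to its index once, up front.
        Map<String, Integer> indexByName = new LinkedHashMap<>();
        Map<String, Set<Object>> seen = new LinkedHashMap<>();
        for (String name : columnNames) {
            int idx = schemaColumns.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            indexByName.put(name, idx);
            seen.put(name, new LinkedHashSet<>());
        }

        // Single pass over the data, updating the set of every requested column.
        for (List<Object> row : rows) {
            for (Map.Entry<String, Integer> e : indexByName.entrySet()) {
                seen.get(e.getKey()).add(row.get(e.getValue()));
            }
        }

        // Convert each set to a list, keyed by column name as in the proposal.
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<Object>> e : seen.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }
}
```

Calling `UniquesSketch.getUniques` with the column names `city` and `country` would return a map with one entry per name, each holding that column's distinct values in first-seen order. A Spark version would need to do the same per-column set accumulation inside an aggregation rather than a local loop.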