This repository has been archived by the owner. It is now read-only.

AnalyzeSpark.getUnique for more than a single feature column #71

gffde3 opened this Issue Oct 12, 2016 · 2 comments



gffde3 commented Oct 12, 2016

Currently getUnique only handles a single column per call.

For larger datasets where the unique values of several columns are of interest, it would be handy to be able to specify multiple columns in a single method call.

The static method could be something like .getUniques(List columnNames, Schema schema, JavaRDD<java.util.List> data)

with a return type HashMap<String, List> where the String is the column name.
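
To make the proposal concrete, here is a minimal sketch of the idea in plain Java. It assumes the rows are in-memory lists rather than a JavaRDD, and that the schema is simply an ordered list of column names. The class name UniqueValuesSketch and this getUniques signature are hypothetical illustrations, not part of the library's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not the DataVec API.
class UniqueValuesSketch {

    // Collects the unique values of several columns in a single pass over the rows.
    // schemaColumns stands in for a Schema: the ordered list of column names.
    static Map<String, List<Object>> getUniques(
            List<String> columnNames, List<String> schemaColumns, List<List<Object>> rows) {
        int[] indices = new int[columnNames.size()];
        Map<String, Set<Object>> seen = new LinkedHashMap<>();
        for (int i = 0; i < columnNames.size(); i++) {
            String name = columnNames.get(i);
            int idx = schemaColumns.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            indices[i] = idx;
            // LinkedHashSet drops duplicates and keeps first-seen order.
            seen.put(name, new LinkedHashSet<>());
        }
        // One pass over the data, updating every requested column per row.
        for (List<Object> row : rows) {
            for (int i = 0; i < indices.length; i++) {
                seen.get(columnNames.get(i)).add(row.get(indices[i]));
            }
        }
        // Convert each set to a list, keyed by column name.
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<Object>> e : seen.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> schemaColumns = Arrays.asList("letter", "number", "tag");
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("a", 1, "x"),
                Arrays.<Object>asList("b", 1, "y"),
                Arrays.<Object>asList("a", 2, "x"));
        System.out.println(getUniques(Arrays.asList("letter", "tag"), schemaColumns, rows));
    }
}
```

The point of the sketch is the single pass: calling a one-column method once per column would scan the data once for each column, which is what makes a multi-column variant attractive on large datasets. A Spark version would need to do the equivalent aggregation on the RDD instead of a local loop.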

raver119 added the ETL label Apr 29, 2018



raver119 commented Apr 29, 2018

Is this issue still viable?



AlexDBlack commented Apr 30, 2018

Yes, might as well implement this.
