This repository has been archived by the owner. It is now read-only.

AnalyzeSpark.getUnique for more than a single feature column #71

Closed
gffde3 opened this Issue Oct 12, 2016 · 2 comments

gffde3 commented Oct 12, 2016

Currently getUnique only handles a single column per call.

For larger datasets where unique values are wanted for several columns, it would be handy to be able to specify multiple columns in a single method call.

The static method could be something like .getUniques(List columnNames, Schema schema, JavaRDD<java.util.List> data)

with a return type HashMap<String, List> where the String is the column name.
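
A rough sketch of what such a method might do, written in plain Java so it runs without Spark. The names here are hypothetical and not part of AnalyzeSpark: a list of column names stands in for Schema, and a List of String rows stands in for the JavaRDD. The point is only that all requested columns can be collected in one pass over the data instead of one pass per column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UniqueColumns {

    // Hypothetical helper, not part of AnalyzeSpark. schemaColumns stands in for
    // Schema (it only supplies the column order); rows stands in for the RDD.
    public static Map<String, List<String>> getUniques(List<String> columnNames,
                                                       List<String> schemaColumns,
                                                       List<List<String>> rows) {
        // Resolve each requested column to its index once, up front.
        int[] indices = new int[columnNames.size()];
        Map<String, Set<String>> seen = new LinkedHashMap<>();
        for (int i = 0; i < columnNames.size(); i++) {
            String name = columnNames.get(i);
            int idx = schemaColumns.indexOf(name);
            if (idx < 0) {
                throw new IllegalArgumentException("Unknown column: " + name);
            }
            indices[i] = idx;
            seen.put(name, new LinkedHashSet<>());
        }

        // Single pass over the data: every row updates all requested columns.
        for (List<String> row : rows) {
            for (int i = 0; i < indices.length; i++) {
                seen.get(columnNames.get(i)).add(row.get(indices[i]));
            }
        }

        // Convert the sets to lists to match the proposed return shape.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> schemaColumns = Arrays.asList("city", "year", "status");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Oslo", "2015", "ok"),
                Arrays.asList("Lima", "2016", "ok"),
                Arrays.asList("Oslo", "2016", "error"));
        // prints {city=[Oslo, Lima], status=[ok, error]}
        System.out.println(getUniques(Arrays.asList("city", "status"), schemaColumns, rows));
    }
}
```

On Spark the same per-column sets would presumably be built per partition and then merged, but that part is left out of this sketch.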

raver119 added the ETL label Apr 29, 2018

raver119 commented Apr 29, 2018

Is this issue still viable?

AlexDBlack commented Apr 30, 2018

Yes, might as well implement this.
