[SPARK-13610][ML] Create a Transformer to disassemble vectors in DataFrames #16486

leonfl · 2017-01-06T07:43:15Z

JIRA Issue: https://issues.apache.org/jira/browse/SPARK-13610

What changes were proposed in this pull request?

Add a VectorDisassembler, which disassembles a vector field into single fields.

How was this patch tested?

Unit tests for this feature have been added to ml.


srowen · 2017-01-06T09:21:09Z

I don't think this is worth adding. It's pretty easy to pull out a single field from a vector already.
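A minimal sketch of what pulling out a single field can look like in Scala, assuming the `org.apache.spark.ml.linalg.Vector` type and a plain Scala UDF (an approach available on the Spark 2.x line discussed in this thread). `SingleElementSketch`, `withElement`, and the column names are hypothetical and are not part of this PR.

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, udf}

object SingleElementSketch {
  // JVM-side UDF returning element i of a dense or sparse ml.linalg.Vector.
  // Assumes non-null vectors; v(i) throws if i is out of range.
  private val elementAt = udf((v: Vector, i: Int) => v(i))

  // Hypothetical helper: adds `outputCol` holding element `i` of `inputCol`.
  def withElement(df: DataFrame, inputCol: String, i: Int, outputCol: String): DataFrame =
    df.withColumn(outputCol, elementAt(col(inputCol), lit(i)))
}
```

Because the UDF runs in the JVM it avoids the Python serialization cost of a PySpark UDF, but it still adds only one column per call, so extracting every element means repeating it once per index.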

leonfl · 2017-01-06T12:27:15Z

It's the counterpart of VectorAssembler, making it easy for users to convert between single fields and a vector field.
Pulling out a single field is easy, but pulling out all the fields of a vector still requires users to write some code.

Our business uses the disassemble transform a lot, and it always has to be handled by writing custom code. This Transformer would make it easy for users to understand and use, right?
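For the all-fields case, a minimal sketch assuming Spark 3.0 or later, which added `org.apache.spark.ml.functions.vector_to_array` (with a PySpark counterpart in `pyspark.ml.functions`) well after this thread. It is not the `VectorDisassembler` from this PR; the `VectorDisassemblySketch` object, the `disassemble` helper, and the `<inputCol>_<i>` naming scheme are hypothetical.

```scala
import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object VectorDisassemblySketch {
  // Hypothetical helper, not the VectorDisassembler proposed in this PR.
  // Expands `inputCol` (an ml.linalg.Vector column of known length `size`)
  // into `size` Double columns named s"${inputCol}_$i", keeping existing columns.
  def disassemble(df: DataFrame, inputCol: String, size: Int): DataFrame = {
    // vector_to_array (Spark 3.0+) converts dense or sparse vectors to array<double>
    val arr = vector_to_array(col(inputCol))
    // One output column per element, built from ordinary column expressions (no UDF)
    val parts = (0 until size).map(i => arr.getItem(i).alias(s"${inputCol}_$i"))
    df.select(col("*") +: parts: _*)
  }
}
```

The vector length is passed in explicitly because a DataFrame schema does not record it for a vector column; a full Transformer would likely need to read it from ML attribute metadata or from the data itself.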

leonfl · 2017-01-06T15:52:47Z

@mengxr, could you help review this patch? Thanks.

leonfl · 2017-01-09T07:31:13Z

@jkbradley, could you also help review this patch, since you are familiar with this issue? Thanks.

mrjrdnthms · 2017-04-19T22:37:01Z

I could use this. I have a UDF to pick out the single values I want, but my implementation is slow. Here is my Python UDF:

    probTrue_udf = udf(lambda value: value[1].item(), FloatType())

I was hoping there would be a lower-level API that performs the disassemble transformation quickly.

leonfl · 2017-04-24T08:13:01Z

@mrjrdnthms, this is implemented with a UDF, which runs a little slower but is easy to use.
If you want it to run faster, you can implement it with mapPartitions and a row iterator instead of a UDF.
That implementation will reduce the running time a lot.

mrjrdnthms · 2017-04-30T17:39:35Z

@leonfl The Python UDF is too slow for my task. By "mapPartitions and row iterator", do you mean doing the transformation on the RDD directly instead of on the DataFrame? Sorry for the basic question; I am new to Spark. And thanks for the help.

leonfl · 2017-05-02T02:48:12Z

@mrjrdnthms, yes, your understanding is correct. In Scala it looks like this:

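    import org.apache.spark.ml.linalg.Vector
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

    val n = 3 // hypothetical: length of the vectors in the (hypothetical) "features" column
    val rows: RDD[Row] = df.rdd.map { rowIn =>
      val v = rowIn.getAs[Vector]("features")
      Row.fromSeq(rowIn.toSeq ++ v.toArray.toSeq) // keep existing fields, append one field per element
    }
    val newDF = df.sqlContext.createDataFrame(rows,
      StructType(df.schema.fields ++ (0 until n).map(i => StructField(s"features_$i", DoubleType))))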

AmplabJenkins · 2017-12-14T21:05:08Z

Can one of the admins verify this patch?

AlbertPlaPlanas · 2018-08-02T15:35:05Z

Was this ever implemented?

HarborZeng · 2020-08-18T03:34:07Z

Such a great transformer; I don't understand why they chose to ignore this patch.

diegoxfx · 2021-03-23T13:55:01Z

> I don't think this is worth adding. It's pretty easy to pull out a single field from a vector already.

It is not possible to retrieve a single element from the VectorAssembler output; it's only possible to retrieve a subset of the array, and the result is still an array, not the element itself.

Commits:

3076617 · [SPARK-13610][ML] Create a Transformer to disassemble vectors in DataFrames

8b5319e · add shapefile of china

leonfl closed this Dec 15, 2017
