[SPARK-21633][ML][Python] UnaryTransformer in Python #18746

ajaysaini725 · 2017-07-27T05:06:09Z

What changes were proposed in this pull request?

Implemented UnaryTransformer in Python.

How was this patch tested?

This patch was tested by creating a MockUnaryTransformer class in the unit tests that extends UnaryTransformer and testing that the transform function produced correct output.

ajaysaini725 · 2017-07-27T05:06:36Z

@jkbradley @thunterdb @MrBago Could you please review this?

SparkQA · 2017-07-27T05:24:18Z

Test build #79988 has finished for PR 18746 at commit 960de95.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-07-27T06:54:18Z

Test build #79991 has finished for PR 18746 at commit 11f8f29.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

WeichenXu123

I suggest adding a Python example for UnaryTransformer, like the Scala example MyTransformer.

WeichenXu123 · 2017-08-01T04:38:36Z

python/pyspark/ml/base.py

+        return StructType(outputFields)
+
+    def transform(self, dataset, paramMap=None):
+        transformSchema(dataset.schema())


There seems to be a problem here.
transform accepts a paramMap, but createTransformFunc has no way to access the paramMap that was passed in, so it is lost.
Because a custom UnaryTransformer only needs to override createTransformFunc, the base class needs to handle the passed-in paramMap properly.

ajaysaini725 (reply, Aug 2, 2017): Right, I accidentally overrode transform instead of _transform. Fixed!
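
For readers following this thread, here is a minimal pure-Python sketch of the pattern being discussed: the public transform applies the parameter overrides to a copy and then delegates to _transform, so a subclass that overrides only createTransformFunc still sees them. MiniTransformer, MiniUnaryTransformer, Shift, and the list-of-dicts dataset are hypothetical stand-ins for illustration, not PySpark classes.

```python
import copy


class MiniTransformer:
    """Hypothetical stand-in for a transformer base class (not PySpark's)."""

    def __init__(self, **params):
        self.params = dict(params)

    def copy(self, extra=None):
        # New instance whose params are overridden by `extra`.
        that = copy.copy(self)
        that.params = {**self.params, **(extra or {})}
        return that

    def transform(self, dataset, param_map=None):
        # Public entry point: apply overrides on a copy, then delegate.
        if param_map:
            return self.copy(param_map)._transform(dataset)
        return self._transform(dataset)

    def _transform(self, dataset):
        raise NotImplementedError


class MiniUnaryTransformer(MiniTransformer):
    """Owns _transform; subclasses supply only the per-value function."""

    def create_transform_func(self):
        raise NotImplementedError

    def _transform(self, dataset):
        func = self.create_transform_func()
        return [dict(row, output=func(row["input"])) for row in dataset]


class Shift(MiniUnaryTransformer):
    def create_transform_func(self):
        shift = self.params["shift"]  # read from the (possibly copied) instance
        return lambda x: x + shift


rows = [{"input": 1.0}, {"input": 2.0}]
t = Shift(shift=1.0)
default_out = t.transform(rows)
override_out = t.transform(rows, {"shift": 10.0})
```

Overriding transform directly in the unary base class would bypass the copy step, which is why the override has to live in _transform.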

WeichenXu123 · 2017-08-01T04:47:18Z

python/pyspark/ml/base.py

+    def transform(self, dataset, paramMap=None):
+        transformSchema(dataset.schema())
+        transformUDF = udf(self.createTransformFunc(), self.outputDataType())
+        dataset.withColumn(self.getOutputCol(), transformUDF(self.getInputCol()))


udf needs its first argument to be a function, so why pass in the return value of self.createTransformFunc() here?

ajaysaini725 (reply, Aug 2, 2017): self.createTransformFunc() returns a function, and that function is what gets passed to udf, so I think it is okay in this case.
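
A small plain-Python sketch of the point made in this reply: the factory is called once, and its return value is itself a callable, which is what the wrapper receives. fake_udf is a hypothetical stand-in that only checks for a callable and maps it over a list; it does not reproduce pyspark.sql.functions.udf.

```python
def create_transform_func(shift):
    """Factory: called once, returns the per-value function."""
    def shift_value(x):
        return x + shift
    return shift_value


def fake_udf(func):
    """Hypothetical stand-in for a UDF wrapper: requires a callable."""
    if not callable(func):
        raise TypeError("first argument must be a function")
    return lambda column: [func(v) for v in column]


transform_func = create_transform_func(2.0)  # a function, not a number
shift_udf = fake_udf(transform_func)
shifted = shift_udf([1.0, 2.0, 3.0])
```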

SparkQA · 2017-08-03T01:08:28Z

Test build #80183 has finished for PR 18746 at commit 692aa5d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2017-08-03T01:10:46Z

@ajaysaini725 Is there a JIRA for this PR? Please tag this PR in the title.

jkbradley · 2017-08-03T01:11:14Z

Also, you can remove "Implemented" from the title, and please update the description now that you have tests.

jkbradley

OK done with review pass. Thanks for the PR!

jkbradley · 2017-08-03T18:21:14Z

python/pyspark/ml/base.py

+@inherit_doc
+class UnaryTransformer(HasInputCol, HasOutputCol, Transformer):
+    """
+    Abstract class for transformers that tae one input column, apply a transoformation to it,


Actually multiple typos. Why not just copy the text from Scala?

jkbradley · 2017-08-03T18:23:50Z

python/pyspark/ml/base.py

+    @abstractmethod
+    def createTransformFunc(self):
+        """
+        Creates the transoform function using the given param map.


Please use the IntelliJ spellcheck feature

jkbradley · 2017-08-03T18:27:39Z

python/pyspark/ml/base.py

+
+    def _transform(self, dataset):
+        self.transformSchema(dataset.schema)
+        transformFunc = self.createTransformFunc()


jkbradley · 2017-08-03T20:58:52Z

python/pyspark/ml/tests.py

+        df = df.withColumn("input", df.input.cast(dataType="double"))
+
+        transformed_df = transformer.transform(df)
+        output = transformed_df.select("output").collect()


It's better practice to select both input & output and collect both for comparison, rather than relying on DataFrame rows maintaining their order.
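
To show why this matters, here is a plain-Python sketch in which the rows come back in a different order than they were created. Row is a namedtuple standing in for a collected row, and shift_val is an assumed constant; neither is taken from the PR's test code.

```python
from collections import namedtuple

Row = namedtuple("Row", ["input", "output"])  # stand-in for a collected row
shift_val = 1.0

collected = [Row(float(i), float(i) + shift_val) for i in range(5)]
collected.reverse()  # simulate rows returned in a different order

# Fragile: compares outputs by position against separately built expectations.
expected_in_order = [float(i) + shift_val for i in range(5)]
positional_match = [r.output for r in collected] == expected_in_order

# Robust: each row carries its own input, so order no longer matters.
per_row_match = all(r.input + shift_val == r.output for r in collected)
```

The positional comparison fails on reordered rows even though every row is correct, while the per-row check still passes.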

jkbradley · 2017-08-03T21:00:26Z

python/pyspark/ml/tests.py

@@ -1957,6 +1987,24 @@ def test_chisquaretest(self):
        self.assertTrue(all(field in fieldNames for field in expectedFields))


+class UnaryTransformerTests(SparkSessionTestCase):
+
+    def test_unary_transformer_transform(self):


Could you also please test validateInputType?
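
One possible shape for such a test, sketched in plain Python with unittest. MockShift and its validate_input_type method are hypothetical, and data types are represented as plain strings here rather than Spark SQL types; the real test would use the mock transformer from the PR.

```python
import unittest


class MockShift:
    """Hypothetical stand-in; type names are plain strings in this sketch."""

    def validate_input_type(self, input_type):
        if input_type != "double":
            raise TypeError("Bad input type: %s. Requires double." % input_type)


class ValidateInputTypeTest(unittest.TestCase):
    def test_accepts_double(self):
        # Should not raise for the supported type.
        MockShift().validate_input_type("double")

    def test_rejects_string(self):
        # An unsupported type should be rejected with TypeError.
        with self.assertRaises(TypeError):
            MockShift().validate_input_type("string")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ValidateInputTypeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```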

SparkQA · 2017-08-03T23:56:53Z

Test build #80225 has finished for PR 18746 at commit 527bc88.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2017-08-04T00:57:58Z

@ajaysaini725 Is there a JIRA for this PR? Please tag this PR in the title.

jkbradley

Just 1 comment left!

jkbradley · 2017-08-04T01:09:28Z

python/pyspark/ml/tests.py

+        df = df.withColumn("input", df.input.cast(dataType="double"))
+
+        transformed_df = transformer.transform(df)
+        inputCol = transformed_df.select("input").collect()


Do this instead:

    results = transformed_df.select("input", "output").collect()
    for res in results:
        self.assertEqual(res.input + shiftVal, res.output)

SparkQA · 2017-08-04T02:34:22Z

Test build #80228 has finished for PR 18746 at commit a30ae39.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jkbradley · 2017-08-04T08:00:32Z

LGTM
Merging with master
Thanks @ajaysaini725 !

ajaysaini725 added 4 commits July 10, 2017 14:06

Started

10ab7bc

Started

4fe9420

Implemented Unary Transforme in Python.

7d25c70

Fixed merge conflict.

960de95

ajaysaini725 changed the title ~~Implemented UnaryTransformer in Python~~ [ML][Python]Implemented UnaryTransformer in Python Jul 27, 2017

ajaysaini725 changed the title ~~[ML][Python]Implemented UnaryTransformer in Python~~ [ML][Python] Implemented UnaryTransformer in Python Jul 27, 2017

Fixed small issue with None being returned.

11f8f29

WeichenXu123 reviewed Aug 1, 2017

ajaysaini725 added 2 commits August 1, 2017 20:23

Some progress on testing

0eed7c3

Added test for unary transformer.

692aa5d

ajaysaini725 changed the title ~~[ML][Python] Implemented UnaryTransformer in Python~~ [ML][Python] UnaryTransformer in Python Aug 3, 2017

jkbradley reviewed Aug 3, 2017

Fixed based on pull request comments.

527bc88

jkbradley reviewed Aug 4, 2017

Fixed test based on comments

a30ae39

ajaysaini725 changed the title ~~[ML][Python] UnaryTransformer in Python~~ [SPARK-21633][ML][Python] UnaryTransformer in Python Aug 4, 2017

asfgit closed this in 1347b2a Aug 4, 2017
