[SPARK-28131][PYTHON] Update document type conversion between Python data and SQL types in normal UDFs (Python 3.7) #24929

HyukjinKwon · 2019-06-21T06:29:12Z

What changes were proposed in this pull request?

This PR updates the type conversion chart generated in SPARK-25666. Python 2 is deprecated, so the chart is now generated with Python 3.

The unicode and long cases no longer need to be tested under Python 3, so they were removed.
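As a quick illustration of why those two cases can be dropped, here is a minimal standard-library sketch (not part of the patch): in Python 3 the separate `unicode` and `long` builtins no longer exist, because `str` holds Unicode text and `int` has arbitrary precision.

```python
import builtins

# Python 2's separate text and big-integer builtins are gone in Python 3.
has_unicode = hasattr(builtins, "unicode")
has_long = hasattr(builtins, "long")

# str already holds Unicode text, and int grows past 64 bits without overflow.
text = "a"
big = 2 ** 64

print(has_unicode, has_long)
print(type(text).__name__, type(big).__name__)
```

So the remaining `"a"` and `1` entries in the data list already cover what `unicode` and `long` covered under Python 2.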

Use this code to generate the chart:

import sys
import array
import datetime
from decimal import Decimal

from pyspark.sql import Row
from pyspark.sql.types import *
from pyspark.sql.functions import udf

data = [
    None,
    True,
    1,
    "a",
    datetime.date(1970, 1, 1),
    datetime.datetime(1970, 1, 1, 0, 0),
    1.0,
    array.array("i", [1]),
    [1],
    (1,),
    bytearray([65, 66, 67]),
    Decimal(1),
    {"a": 1},
    Row(kwargs=1),
    Row("namedtuple")(1),
]

types = [
    BooleanType(),
    ByteType(),
    ShortType(),
    IntegerType(),
    LongType(),
    StringType(),
    DateType(),
    TimestampType(),
    FloatType(),
    DoubleType(),
    ArrayType(IntegerType()),
    BinaryType(),
    DecimalType(10, 0),
    MapType(StringType(), IntegerType()),
    StructType([StructField("_1", IntegerType())]),
]


df = spark.range(1)
results = []
count = 0
total = len(types) * len(data)
spark.sparkContext.setLogLevel("FATAL")
for t in types:
    result = []
    for v in data:
        try:
            row = df.select(udf(lambda: v, t)()).first()
            ret_str = repr(row[0])
        except Exception:
            ret_str = "X"
        result.append(ret_str)
        progress = "SQL Type: [%s]\n  Python Value: [%s(%s)]\n  Result Python Value: [%s]" % (
            t.simpleString(), str(v), type(v).__name__, ret_str)
        count += 1
        print("%s/%s:\n  %s" % (count, total, progress))
    results.append([t.simpleString()] + list(map(str, result)))

schema = ["SQL Type \\ Python Value(Type)"] + list(map(lambda v: "%s(%s)" % (str(v), type(v).__name__), data))
strings = spark.createDataFrame(results, schema=schema)._jdf.showString(20, 20, False)
print("\n".join(map(lambda line: "    # %s  # noqa" % line, strings.strip().split("\n"))))
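The structure of the script above (try every value against every target type, record `repr` of the result or "X" on failure, then emit comment-prefixed lines) can be sketched without a Spark session. The `converters` mapping below is a hypothetical stand-in for the SQL return types and the `" | "` join is a simplification of `showString`; the results say nothing about what Spark itself returns.

```python
import datetime
from decimal import Decimal

# Hypothetical converters standing in for Spark SQL return types (assumption).
converters = {
    "int": int,
    "float": float,
    "date": datetime.date.fromisoformat,
}

values = [1, "a", "1970-01-01", Decimal(1)]

results = []
for name, convert in converters.items():
    row = [name]
    for v in values:
        try:
            cell = repr(convert(v))
        except Exception:
            # Same convention as the chart: "X" marks a failed conversion.
            cell = "X"
        row.append(cell)
    results.append(row)

header = ["Type \\ Value(Type)"] + ["%s(%s)" % (v, type(v).__name__) for v in values]
lines = [" | ".join(header)] + [" | ".join(r) for r in results]

# Prefix each line as a comment and suppress line-length linting,
# mirroring the final print in the script above.
chart = "\n".join("    # %s  # noqa" % line for line in lines)
print(chart)
```

Catching the broad `Exception` is deliberate here, as in the original script: the goal is a complete matrix, not a diagnosis of each failure.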

How was this patch tested?

Manually.


HyukjinKwon · 2019-06-21T06:43:23Z

cc @BryanCutler and @dongjoon-hyun

SparkQA · 2019-06-21T06:58:26Z

Test build #106751 has finished for PR 24929 at commit 4ce268a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-06-21T07:05:02Z

Test build #106753 has finished for PR 24929 at commit 59a4374.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

HyukjinKwon · 2019-06-21T07:11:20Z

retest this please

SparkQA · 2019-06-21T07:44:25Z

Test build #106757 has finished for PR 24929 at commit 59a4374.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

BryanCutler

LGTM

BryanCutler · 2019-06-21T17:28:18Z

merged to master, thanks @HyukjinKwon !

HyukjinKwon · 2019-06-22T02:02:54Z

Thanks @BryanCutler !

Closes apache#24929 from HyukjinKwon/SPARK-28131.

Lead-authored-by: HyukjinKwon <gurwls223@apache.org>
Co-authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>

HyukjinKwon added 2 commits June 21, 2019 15:26

Update document type conversion between Python data and SQL types in normal UDFs (Python 3.7)

4ce268a

typo

59a4374

BryanCutler approved these changes Jun 21, 2019


BryanCutler closed this in 9b9d81b Jun 21, 2019

dongjoon-hyun added the SQL label Feb 5, 2020

HyukjinKwon deleted the SPARK-28131 branch March 3, 2020 01:19
