[SPARK-33480][SQL] Support char/varchar type #30412
Conversation
v2Write.withNewQuery(projection)
val cleanedTable = v2Write.table match {
  case r: DataSourceV2Relation =>
    r.copy(output = r.output.map(CharVarcharUtils.cleanAttrMetadata))
We remove the char/varchar metadata after length check expressions are added, so that we don't do it repeatedly and this rule is idempotent.
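A toy model of the idempotency argument (Attr, CharVarcharKey, and applyRule are illustrative stand-ins, not Spark's real classes): once the metadata key is stripped from the attributes, re-running the rule is a no-op.

```scala
// Attributes carry a metadata map; cleaning removes the char/varchar key,
// so applying the rule a second time changes nothing.
case class Attr(name: String, metadata: Map[String, String])

val CharVarcharKey = "__CHAR_VARCHAR_TYPE_STRING"

def cleanAttrMetadata(a: Attr): Attr =
  a.copy(metadata = a.metadata - CharVarcharKey)

def applyRule(attrs: Seq[Attr]): Seq[Attr] = attrs.map(cleanAttrMetadata)
```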
Does the current implementation assume the analyzer removes the metadata in plans before the optimizer phase? If so, how about checking in CheckAnalysis that plans don't have the metadata?
No, it doesn't. The metadata is fine as it's harmless. We only need to watch out for the specific rules that look at the char/varchar metadata, and make sure they are idempotent.
In fact, the added cast and length check expression is wrapped in an Alias, which retains the char/varchar metadata. So the output attributes of the Project above the v2 relation still have the metadata. This is necessary, as we rely on it later to do padding for char type column comparison.
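The comparison-padding behavior referenced here can be sketched in plain Scala (padForComparison is an illustrative helper, not Spark API): the shorter side is right-padded with spaces to the longer declared char length before comparing.

```scala
// Pad both operands of a char-type comparison to the wider declared length,
// so "ab" in a char(2) column compares equal to "ab " in a char(3) column.
def padForComparison(a: String, lenA: Int, b: String, lenB: Int): (String, String) = {
  val target = math.max(lenA, lenB)
  (a.padTo(target, ' '), b.padTo(target, ' '))
}
```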
if (projectList == r.output) {
  r -> Nil
} else {
  val cleanedOutput = r.output.map(CharVarcharUtils.cleanAttrMetadata)
Same here: we remove the char/varchar metadata to make this rule idempotent.
@@ -1000,18 +995,10 @@ private[hive] object HiveClientImpl extends Logging {
  /** Builds the native StructField from Hive's FieldSchema. */
  def fromHiveColumn(hc: FieldSchema): StructField = {
    val columnType = getSparkSQLDataType(hc)
    val replacedVoidType = HiveVoidType.replaceVoidType(columnType)
This comes from 339eec5#diff-45c9b065d76b237bcfecda83b8ee08c1ff6592d6f85acca09c0fa01472e056afR1009 but is actually unused. The tests still pass without it.
import org.apache.spark.unsafe.types.UTF8String

@Experimental
case class CharType(length: Int) extends AtomicType {
Shall we add SPARK-6412 (Add Char support in dataTypes) into the PR title as a secondary JIRA? SPARK-6412 was resolved as "Later".
The PR description says "To be safe, this PR doesn't add char/varchar type to the core type system.", but this adds CharType into org.apache.spark.sql.types. Could you revise the PR description a little? It seems that you wanted to mention the SQL execution parts.
It's already there: https://github.com/apache/spark/pull/30412/files#diff-aacc5ed42589c636615a3c09a44fa6a5248195242c9fbe0e996db17471cd35fdL70 I just moved it to a new file and removed the parent class.
And type system means a lot more, e.g. expression type check, InternalRow.get, etc. Maybe I should say "type system framework"?
PR description updated
val output = table.schema().toAttributes
DataSourceV2Relation(table, output, catalog, identifier, options)
// The v2 source may return schema containing char/varchar type. We replace char/varchar
// with string type here as Spark's type system doesn't support char/varchar yet.
`with string type` -> `with string type with metadata`?
Kubernetes integration test starting
Kubernetes integration test status success
Test build #131291 has finished for PR 30412 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
import org.apache.spark.unsafe.types.UTF8String

@Experimental
case class VarcharType(length: Int) extends AtomicType {
How about checking that length has a valid value in the constructor of char/varchar?

if (length < 0) {
  throw new AnalysisException("XXX")
}
Also, how about setting a reasonable max length for these types, like PostgreSQL does?
postgres=# create table t (v char(100000000));
ERROR: length for type char cannot exceed 10485760
LINE 1: create table t (v char(100000000));
VarcharType was already there: https://github.com/apache/spark/pull/30412/files#diff-aacc5ed42589c636615a3c09a44fa6a5248195242c9fbe0e996db17471cd35fdL79 I think it's a good idea to reject negative values, but adding a max length should be done separately.
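A minimal sketch of the constructor-time validation agreed on here (this VarcharType is a simplified stand-in for the real class, kept only to show the `require` guard):

```scala
// Reject negative lengths at construction time; require throws
// IllegalArgumentException for invalid input.
case class VarcharType(length: Int) {
  require(length >= 0, "The length of varchar type cannot be negative.")
}
```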
}
val strLenChecked = CharVarcharUtils.stringLengthCheck(casted, tableAttr)
To avoid accidentally adding the length check exprs again, can't we remove the metadata at the same time as adding the exprs?
nit: `strLenChecked` -> `exprWithStrLenCheck`?
> To avoid accidentally adding the length check exprs again, we cannot remove the metadata at the same time as adding the exprs?
We can't access the table relation here, only table output attrs.
> We can't access the table relation here, only table output attrs.
Oh, I see.
private def stringLengthCheck(expr: Expression, dt: DataType): Expression = dt match {
  case CharType(length) =>
    val trimmed = StringTrimRight(expr)
    val errorMsg = Concat(Seq(
nit: How about extracting `errorMsg` into a private method?
private def raiseError(expr: Expression, typeName: String, length: Int): Expression = {
  val errorMsg = Concat(Seq(
    Literal("input string '"),
    expr,
    Literal(s"' exceeds $typeName type length limitation: $length")))
  Cast(RaiseError(errorMsg), StringType)
}

private def stringLengthCheck(expr: Expression, dt: DataType): Expression = dt match {
  case CharType(length) =>
    val trimmed = StringTrimRight(expr)
    // Trailing spaces do not count in the length check. We don't need to retain the trailing
    // spaces, as we will pad char type columns/fields at read time.
    If(
      GreaterThan(Length(trimmed), Literal(length)),
      raiseError(expr, "char", length),
      trimmed)
  ...
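The write-side semantics the expression tree above encodes can be modeled in plain Scala (charWriteCheck is an illustrative name, not Spark API, and this is only a sketch of the behavior, assuming only spaces are trimmed): trailing spaces don't count toward a char column's length, and values still too long after trimming raise an error.

```scala
// Right-trim trailing spaces, then enforce the declared char length.
def charWriteCheck(value: String, length: Int): String = {
  val trimmed = value.reverse.dropWhile(_ == ' ').reverse
  if (trimmed.length > length) {
    throw new IllegalArgumentException(
      s"input string '$value' exceeds char type length limitation: $length")
  }
  trimmed
}
```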
// We replace char/varchar with string type in the table schema, as Spark's type system doesn't
// support char/varchar yet.
private def removeCharVarcharFromTableSchema(t: CatalogTable): CatalogTable = {
Could you inline this in L476?
Test build #131356 has finished for PR 30412 at commit
@cloud-fan You should rebase on master to see the errors #30412 (comment) locally. The check came from #30351.
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test starting
Test build #131365 has finished for PR 30412 at commit
Kubernetes integration test status success
Do you have a plan to update the doc: https://github.com/apache/spark/blob/master/docs/sql-ref-datatypes.md ? It would be nice to describe something about char/varchar there, I think.
 * Re-construct the original StructType from the type strings in the metadata of StructFields.
 * This is needed when dealing with char/varchar columns/fields.
 */
def getRawSchema(schema: StructType): StructType = {
not used now?
removed.
  }
}

test("char type comparison: nested in array of array") {
Could you add char comparison tests for a map case, too?
map type is not comparable.
Ah, I see. If so, why does the function typeWithWiderCharLength have an entry for Map? https://github.com/apache/spark/pull/30412/files#diff-16753285e80505e04c445ea8ccee1dde7ae601ed7bae224e212d90d395f57928R225-R226 It seems the function is only used for addPaddingInStringComparison.
You are right, I should remove it.
withTable("t") {
  sql(s"CREATE TABLE t(i STRING, c CHAR(5)) USING $format")
  sql("INSERT INTO t VALUES ('1', 'a')")
  checkAnswer(spark.table("t"), Row("1", "a" + " " * 4))
How about checking an output schema, too, in these tests?
scala> sql("CREATE TABLE t(i STRING, c CHAR(5)) USING parquet PARTITIONED BY (c)")
scala> spark.table("t").printSchema
root
|-- i: string (nullable = true)
|-- c: string (nullable = true) <---- this check
btw, how do users check a char length after defining a table? In pg, users can check a char length via some commands, e.g., `\d`:
postgres=# create table t (c char(5));
CREATE TABLE
postgres=# \d t
Table "public.t"
Column | Type | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
c | character(5) | | |
We ensure that LogicalPlan.output won't have char/varchar type, and we have a UT to verify it. It seems unnecessary to check it again here.
For DDL command support, can we leave it to a followup? This PR is already very big. I can create a ticket for it.
The followup looks fine.
@@ -94,6 +94,10 @@ trait CheckAnalysis extends PredicateHelper {

    case p if p.analyzed => // Skip already analyzed sub-plans

    case p if p.output.map(_.dataType).exists(CharVarcharUtils.hasCharVarchar) =>
      throw new IllegalStateException(
        "[BUG] logical plan should not have output of char/varchar type: " + p)
In the case below, could we use AnalysisException instead?
scala> sql("""SELECT from_json("{'a': 'aaa'}", "a char(3)")""").printSchema()
java.lang.IllegalStateException: [BUG] logical plan should not have output of char/varchar type: Project [from_json(StructField(a,CharType(3),true), {'a': 'aaa'}, Some(Asia/Tokyo)) AS from_json({'a': 'aaa'})#37]
+- OneRowRelation
I chose IllegalStateException because this should not happen. Thanks for catching this corner case; I've fixed it in DataType.fromDDL.
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #131786 has finished for PR 30412 at commit
Test build #131803 has finished for PR 30412 at commit
Test build #131797 has finished for PR 30412 at commit
Test build #131798 has finished for PR 30412 at commit
docs/sql-ref-datatypes.md
@@ -37,6 +37,8 @@ Spark SQL and DataFrames support the following data types:
  - `DecimalType`: Represents arbitrary-precision signed decimal numbers. Backed internally by `java.math.BigDecimal`. A `BigDecimal` consists of an arbitrary precision integer unscaled value and a 32-bit integer scale.
* String type
  - `StringType`: Represents character string values.
  - `VarcharType(length)`: A variant of `StringType` which has a length limitation. Data writing will fail if the input string exceeds the length limitation. Note: this type can only be used in table schema, not functions/operators.
  - `CharType(length)`: A variant of `VarcharType(length)` which is fixed length. Reading column of type `VarcharType(n)` always returns string values of length `n`. Char type column comparison will pad the short one to the longer length.
`Reading column of type VarcharType(n)` -> `Reading column of type CharType(n)`?
}

/**
 * Returns an expression to apply write-side char type padding for the given expression. A string
char type padding -> char type checking?

@Experimental
case class VarcharType(length: Int) extends AtomicType {
  require(length >= 0, "The length if varchar type cannot be negative.")
if -> of

@Experimental
case class CharType(length: Int) extends AtomicType {
  require(length >= 0, "The length if char type cannot be negative.")
if -> of
def cast(to: DataType): Column = withExpr {
  Cast(expr, CharVarcharUtils.replaceCharVarcharWithString(to))
}
So we can do cast(CharType)? It actually casts to StringType? But don't we lose the length info?
If you do col.cast("char(5)") before this PR, Spark already silently treats it as a cast to string type. I don't want to change this behavior here. We can make it better later, by failing explicitly and saying that cast to char type is not supported.
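At the type-string level, the replacement being discussed can be sketched like this (the regex-based helper is illustrative only; it is not how Spark's CharVarcharUtils.replaceCharVarcharWithString is actually implemented, which works on DataType objects):

```scala
// Map a char(n)/varchar(n) type string to plain "string"; leave others alone.
def replaceCharVarcharWithString(typeStr: String): String =
  if (typeStr.trim.matches("""(?i)(char|varchar)\(\s*\d+\s*\)""")) "string"
  else typeStr
```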
Test build #131827 has finished for PR 30412 at commit
Okay, it looks good to me. Btw, will this new feature land in the v3.1 release? There is still some remaining work (e.g., #30412 (comment) and #30412 (comment)) though. cc: @HyukjinKwon as the v3.1 release manager
Note that, before this PR, people could already create varchar type columns, but there was no length check during write. People could already create char type columns with Hive tables, but the input string was truncated during write. This PR doesn't make things worse.
Test build #131866 has finished for PR 30412 at commit
Test build #131881 has finished for PR 30412 at commit
retest this please
Test build #131876 has finished for PR 30412 at commit
Test build #131884 has finished for PR 30412 at commit
thanks for the review, merging to master!
Thank you, @cloud-fan and all!
…rovider for CREATE TABLE command

### What changes were proposed in this pull request?

For CREATE TABLE [AS SELECT] command, creates a native Parquet table if neither USING nor STORED AS is specified and `spark.sql.legacy.createHiveTableByDefault` is false. This is a retry after we unify the CREATE TABLE syntax. It partially reverts d2bec5e

This PR allows `CREATE EXTERNAL TABLE` when `LOCATION` is present. This was not allowed for data source tables before, which is an unnecessary behavior difference with hive tables.

### Why are the changes needed?

Changing from Hive text table to native Parquet table has many benefits:
1. be consistent with `DataFrameWriter.saveAsTable`.
2. better performance
3. better support for nested types (Hive text table doesn't work well with nested types, e.g. `insert into t values struct(null)` actually inserts a null value, not `struct(null)`, if `t` is a Hive text table, which leads to wrong results)
4. better interoperability as Parquet is a more popular open file format.

### Does this PR introduce _any_ user-facing change?

No by default. If the config is set, the behavior change is described below:

Behavior-wise, the change is very small as the native Parquet table is also Hive-compatible. All the Spark DDL commands that work for hive tables also work for native Parquet tables, with two exceptions: `ALTER TABLE SET [SERDE | SERDEPROPERTIES]` and `LOAD DATA`.

char/varchar behavior has been taken care of by #30412, and there is no behavior difference between data source and hive tables.

One potential issue is `CREATE TABLE ... LOCATION ...` while users want to directly access the files later. It's more like a corner case and the legacy config should be good enough.

Another potential issue is users may use Spark to create the table and then use Hive to add partitions with a different serde. This is not allowed for Spark native tables.

### How was this patch tested?

Re-enable the tests

Closes #30554 from cloud-fan/create-table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request?

This is a followup of #30412. This PR updates the error message of the char/varchar table insertion length check, to not expose user data.

### Why are the changes needed?

It is risky to expose user data in the error message, especially string data, as it may contain sensitive data.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

updated tests

Closes #30653 from cloud-fan/minor2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?

This PR adds the char/varchar type, which is kind of a variant of string type:

To implement the char/varchar semantics, this PR:

To simplify the implementation, this PR doesn't propagate char/varchar type info through functions/operators (e.g. substring). That said, a column can only be char/varchar type if it's a table column, not a derived column like SELECT substring(col).

To be safe, this PR doesn't add char/varchar type to the query engine (expression input check, internal row framework, codegen framework, etc.). We will replace char/varchar type by string type with metadata (Attribute.metadata or StructField.metadata) that includes the original type string before it goes into the query engine. That said, the existing code will not see char/varchar type but only string type.

char/varchar type may come from several places:
- spark.read.schema and spark.readStream.schema
- Column.cast
- from_json, pandas UDF, etc. These places use the SQL parser, which replaces char/varchar with string already, even before this PR.

This PR covers all the above cases, and implements the length check and padding feature by looking at string type with special metadata.
Why are the changes needed?
char and varchar are standard SQL types. varchar is widely used in other databases instead of string type.
Does this PR introduce any user-facing change?
For hive tables: now the table insertion fails if the value exceeds char/varchar length. Previously we truncate the value silently.
For other tables:
How was this patch tested?
new tests