
[CALCITE-5806] add a String data type (enabled in Apache Spark and BigQuery)#3468

Closed
MasseGuillaume wants to merge 3 commits into apache:main from MasseGuillaume:CALCITE-5806

Conversation

@MasseGuillaume
Contributor

No description provided.

Contributor

@mihaibudiu mihaibudiu left a comment


How is string different from varchar?
If string is the same as varchar, can it be implemented by just having
rootSchema.add("STRING", factory -> factory.createSqlType(SqlTypeName.VARCHAR))?
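For context, the one-liner above can be expanded into a small sketch (assuming Calcite on the classpath; `SchemaPlus.add` accepts a `RelProtoDataType`, which the lambda satisfies):

```java
// Sketch: register STRING as a type alias for VARCHAR on the root schema,
// so "CAST(x AS STRING)" resolves without modifying Calcite itself.
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.type.SqlTypeName;

public class StringAlias {
  public static void register(SchemaPlus rootSchema) {
    // The lambda implements RelProtoDataType: given a type factory,
    // produce the underlying VARCHAR type.
    rootSchema.add("STRING",
        typeFactory -> typeFactory.createSqlType(SqlTypeName.VARCHAR));
  }
}
```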

@MasseGuillaume
Contributor Author

Good point, I also considered this alias.

While implementing this I saw there is an extension point for types: the parser.dataTypeParserMethods parser configuration property.
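For the record, that extension point is the dataTypeParserMethods list in the parser's FMPP configuration; a hedged sketch (the method name is illustrative, the layout follows Calcite's default_config.fmpp):

```
# Fragment of a custom config.fmpp, hooking an extra data-type
# production into the generated parser:
parser: {
  # ...
  dataTypeParserMethods: [
    "SqlTypeNameSpec CustomStringTypeName()"
  ]
}
```

The referenced method is then implemented in the project's parserImpls.ftl template and returns a SqlTypeNameSpec for the custom type.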

For RelToSqlConverter, how would I handle unparsing VARCHAR for Apache Spark?

scala> spark.sql("""select cast("foo" as VARCHAR)""").show()
org.apache.spark.sql.catalyst.parser.ParseException:
[DATATYPE_MISSING_SIZE] DataType "VARCHAR" requires a length parameter, for example "VARCHAR"(10). Please specify the length.(line 1, pos 21)

== SQL ==
select cast("foo" as VARCHAR)
---------------------^^^

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
5 Code Smells

100.0% Coverage
0.0% Duplication

@MasseGuillaume
Contributor Author

Ah, to unparse user-defined types, override SqlDialect.getCastSpec; there you can use SqlUserDefinedTypeNameSpec:

val pos = SqlParserPos.ZERO
new SqlDataTypeSpec(new SqlUserDefinedTypeNameSpec(new SqlIdentifier("STRING", pos), pos), pos)
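Putting the two pieces together, a sketch of a dialect override (assuming Calcite on the classpath; the class name is hypothetical):

```java
// Sketch: a Spark dialect that unparses VARCHAR casts as STRING,
// since Spark rejects VARCHAR without a length parameter.
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.sql.SqlDataTypeSpec;
import org.apache.calcite.sql.SqlDialect;
import org.apache.calcite.sql.SqlIdentifier;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.SqlUserDefinedTypeNameSpec;
import org.apache.calcite.sql.dialect.SparkSqlDialect;
import org.apache.calcite.sql.parser.SqlParserPos;
import org.apache.calcite.sql.type.SqlTypeName;

public class StringCastSparkDialect extends SparkSqlDialect {
  public StringCastSparkDialect(SqlDialect.Context context) {
    super(context);
  }

  @Override public SqlNode getCastSpec(RelDataType type) {
    if (type.getSqlTypeName() == SqlTypeName.VARCHAR) {
      SqlParserPos pos = SqlParserPos.ZERO;
      // Emit "STRING" as a user-defined type name instead of VARCHAR.
      return new SqlDataTypeSpec(
          new SqlUserDefinedTypeNameSpec(new SqlIdentifier("STRING", pos), pos),
          pos);
    }
    return super.getCastSpec(type);
  }
}
```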

While it's possible to add a type without modifying Calcite's codebase, it's a lot of gymnastics for a type that is fairly common in practice:

https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type
https://spark.apache.org/docs/latest/sql-ref-datatypes.html

@mihaibudiu
Contributor

Once you add a new type you have to maintain it forever. Moreover, technically you should write many more tests; everything that works with varchar should be tested with string if you want to be sure it works. If something can be done without modifying Calcite I think that's the right way to do it.

@MasseGuillaume
Contributor Author

Okay, I will close this PR and write a how-to guide on defining custom types, using string as an example.

However, there is a bug in the Spark SQL dialect in RelToSqlConverter, since it does not unparse VARCHAR as STRING.

@mihaibudiu
Contributor

I just expressed my personal opinion above; my experience with Calcite isn't that significant, so I have been wrong before.

Regarding SPARK, you should file an issue, and it looks like you could perhaps send a PR with the fix as well.
Does the SPARK parser parse STRING types already?

@MasseGuillaume
Contributor Author

MasseGuillaume commented Oct 13, 2023

Yes, string is a first-class citizen.

scala> spark.sql("""select cast(NULL as string)""").show()
+--------------------+
|CAST(NULL AS STRING)|
+--------------------+
|                null|
+--------------------+

scala> spark.sql("""select cast(NULL as string)""").printSchema
root
 |-- CAST(NULL AS STRING): string (nullable = true)
