
[SPARK-4985][SQL Parquet] Parquet date support #3855

Closed
wants to merge 3 commits

Conversation

alexbaretta
Contributor

No description provided.

@ash211
Contributor

ash211 commented Jan 1, 2015

Jenkins, this is ok to test

@SparkQA

SparkQA commented Jan 1, 2015

Test build #24979 has finished for PR 3855 at commit d29224e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@nchammas
Contributor

nchammas commented Jan 8, 2015

cc @marmbrus

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25509 has finished for PR 3855 at commit 8851d1a.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25510 has finished for PR 3855 at commit 929f294.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -55,7 +55,7 @@ class ParquetSchemaSuite extends FunSuite with ParquetTest {
     |}
   """.stripMargin)

-  testSchema[(Byte, Short, Int, Long)](
+  testSchema[(Byte, Short, Int, Long, java.sql.Date)](
Contributor
Nit: Usually I'd prefer import java.sql.Date and just use Date here.
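For context on what this test exercises: per the Parquet format specification, a date column is written as an int32 annotated with DATE. A schema for a tuple like the one above might look roughly as follows (an illustrative sketch only — the exact field names and annotations Spark emitted at the time may differ):

```
message root {
  required int32 _1;        // Byte
  required int32 _2;        // Short
  required int32 _3;        // Int
  required int64 _4;        // Long
  required int32 _5 (DATE); // java.sql.Date, stored as days since epoch
}
```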

@marmbrus
Contributor

Seems like this is subsumed by #3822, and thus we can close this issue.

@@ -207,6 +207,7 @@ private[parquet] class RowWriteSupport extends WriteSupport[Row] with Logging {
     case DoubleType => writer.addDouble(value.asInstanceOf[Double])
     case FloatType => writer.addFloat(value.asInstanceOf[Float])
     case BooleanType => writer.addBoolean(value.asInstanceOf[Boolean])
+    case DateType => writer.addInteger(value.asInstanceOf[java.sql.Date].getTime.toInt)
Contributor
This doesn't conform to the Parquet specification for DATE and produces invalid data.

When using the DATE annotation, the value must be the number of days from the Unix epoch, 1 January 1970. java.sql.Date and java.util.Date are backed by a long timestamp, the number of milliseconds from the Unix epoch (which corresponds to Parquet's TIMESTAMP_MILLIS), and casting that value to an integer makes it impossible to recover the real date.

I recommend using TIMESTAMP_MILLIS instead of DATE here (you won't need the toInt part). That seems to be what you want if you're interested in using java.sql.Date. The reason for the name mismatch is that the Parquet types mirror SQL types more closely than Java objects.
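To illustrate the distinction above: a correct conversion to the Parquet DATE representation counts days, not milliseconds. The sketch below (my own illustration, not code from this PR) uses java.time.LocalDate, which sidesteps the timezone ambiguity of java.sql.Date's millisecond timestamp; the helper names toParquetDays/fromParquetDays are hypothetical.

```java
import java.time.LocalDate;

public class ParquetDateDemo {
    // Parquet's DATE annotation stores an int32 counting whole days
    // since the Unix epoch (1970-01-01), not milliseconds.
    static int toParquetDays(LocalDate date) {
        return (int) date.toEpochDay(); // days since 1970-01-01
    }

    static LocalDate fromParquetDays(int days) {
        return LocalDate.ofEpochDay(days); // exact round trip
    }

    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2015, 1, 14);
        int days = toParquetDays(d);
        System.out.println(days);                  // 16449
        System.out.println(fromParquetDays(days)); // 2015-01-14
    }
}
```

By contrast, truncating java.sql.Date.getTime (milliseconds) to an int overflows for any date more than about 24 days from the epoch, which is why the cast in the diff loses the real date.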

@rdblue
Contributor

rdblue commented Apr 10, 2015

I just looked at #3822 and it looks correct, so you can ignore my review comment above. In the future, please feel free to ping me for reviews when you're using logical types like Date, Timestamp, Decimal, etc. in either Parquet or Avro. I've been working on the specs in those communities and I'm happy to make sure the implementations look correct.

@liancheng
Contributor

@rdblue Thanks all the same for the review! BTW, it would be great if you could have a look at #5422, which refactors the Spark SQL Parquet converter (Parquet records to Spark SQL rows) and implements backwards-compatibility rules.

@asfgit asfgit closed this in 0cc8fcb Apr 12, 2015
@rdblue
Contributor

rdblue commented Apr 14, 2015

@liancheng, I'll take a look as soon as I can. I'm a little swamped this week though, so I can't guarantee it'll be quick. Sorry!

8 participants