Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8307] [SQL] improve timestamp from parquet #6759

Closed
wants to merge 16 commits into from

Conversation

davies
Copy link
Contributor

@davies davies commented Jun 11, 2015

This PR change to convert julian day to unix timestamp directly (without Calendar and Timestamp).

cc @adrian-wang @rxin

@@ -26,7 +26,7 @@ import org.apache.spark.sql.catalyst.expressions.Cast
/**
* Helper function to convert between Int value of days since 1970-01-01 and java.sql.Date
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java doc here

@SparkQA
Copy link

SparkQA commented Jun 11, 2015

Test build #34675 has finished for PR 6759 at commit ea196d4.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


def convertToTimestamp(value: Binary): Timestamp = {
def convertToTimestamp(value: Binary): Long = {
val nt = NanoTime.fromBinary(value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should just get rid of NanoTime ...

@SparkQA
Copy link

SparkQA commented Jun 11, 2015

Test build #34724 has finished for PR 6759 at commit 361fd62.

  • This patch fails Scala style tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 11, 2015

Test build #34726 has finished for PR 6759 at commit 6e445e6.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
	sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala
@SparkQA
Copy link

SparkQA commented Jun 12, 2015

Test build #34728 has finished for PR 6759 at commit a3171b8.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2015

Test build #898 has finished for PR 6759 at commit a3171b8.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Davies Liu added 2 commits June 13, 2015 14:19
Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
	sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
	sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala
	sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetIOSuite.scala
@SparkQA
Copy link

SparkQA commented Jun 13, 2015

Test build #34841 has finished for PR 6759 at commit c834108.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 13, 2015

Test build #34842 has finished for PR 6759 at commit 212143b.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 13, 2015

Test build #34849 has finished for PR 6759 at commit 8ace611.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #34858 has finished for PR 6759 at commit 2f2e48c.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #911 has finished for PR 6759 at commit 2f2e48c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #912 has finished for PR 6759 at commit 2f2e48c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 14, 2015

Test build #913 has finished for PR 6759 at commit 2f2e48c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 15, 2015

Test build #34899 has finished for PR 6759 at commit 602b969.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor Author

davies commented Jun 17, 2015

@rxin This is ready to review

* and nanoseconds in a day
*/
def fromJulianDay(day: Int, nanoseconds: Long): Long = {
// use integer to avoid rounding errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're using longs here to avoid the rounding errors, not integers, so maybe we should update this comment.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35296 has finished for PR 6759 at commit bfc437c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

DateTimeUtils.fromJulianDay(julianDay, timeOfDayNanos)
}

def convertFromTimestamp(ts: Long): Binary = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I think we should add a comment here as well (it can be the inverse of my other suggested Scaladoc comment).

@JoshRosen
Copy link
Contributor

At a high level, this looks like it's in really good shape. I might have more comments in a bit; have to step out to handle some calls now, but I'll loop back later in the afternoon to finish looking this over.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35321 has finished for PR 6759 at commit 4891efb.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -313,10 +314,16 @@ private[parquet] class RowWriteSupport extends WriteSupport[InternalRow] with Lo
writer.addBinary(Binary.fromByteArray(scratchBytes, 0, numBytes))
}

// array used to write Timestamp as Int96 (fixed-length binary)
private val int96buf = new Array[Byte](12)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes that RowWriteSupport instances are thread-safe; just want to check that that assumption is true.

This could also be private[this].

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I guess this is fine since we use the same pattern with scratchBytes for writeDecimal in the preceding lines.

@SparkQA
Copy link

SparkQA commented Jun 19, 2015

Test build #35338 has finished for PR 6759 at commit 8e2d56f.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #35332 has finished for PR 6759 at commit 634b9f5.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #941 has finished for PR 6759 at commit 8e2d56f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #942 has finished for PR 6759 at commit 8e2d56f.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 20, 2015

Test build #945 has finished for PR 6759 at commit 8e2d56f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 21, 2015

Test build #948 has finished for PR 6759 at commit 8e2d56f.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

Conflicts:
	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@SparkQA
Copy link

SparkQA commented Jun 22, 2015

Test build #35481 has finished for PR 6759 at commit b0e4cad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Conflicts:
	sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/UnsafeRowConverterSuite.scala
@davies
Copy link
Contributor Author

davies commented Jun 22, 2015

@JoshRosen Is this ready to go?

@JoshRosen
Copy link
Contributor

@davies, yeah, I think this should be good to go; looks like all of my comments have been addressed.

@SparkQA
Copy link

SparkQA commented Jun 23, 2015

Test build #35489 has finished for PR 6759 at commit 849e301.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor Author

davies commented Jun 23, 2015

The last failure is not related, I'd merge this into master.

@asfgit asfgit closed this in 6b7f2ce Jun 23, 2015
animeshbaranawal pushed a commit to animeshbaranawal/spark that referenced this pull request Jun 25, 2015
This PR change to convert julian day to unix timestamp directly (without Calendar and Timestamp).

cc adrian-wang rxin

Author: Davies Liu <davies@databricks.com>

Closes apache#6759 from davies/improve_ts and squashes the following commits:

849e301 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
b0e4cad [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
8e2d56f [Davies Liu] address comments
634b9f5 [Davies Liu] fix mima
4891efb [Davies Liu] address comment
bfc437c [Davies Liu] fix build
ae5979c [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
602b969 [Davies Liu] remove jodd
2f2e48c [Davies Liu] fix test
8ace611 [Davies Liu] fix mima
212143b [Davies Liu] fix mina
c834108 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
a3171b8 [Davies Liu] Merge branch 'master' of github.com:apache/spark into improve_ts
5233974 [Davies Liu] fix scala style
361fd62 [Davies Liu] address comments
ea196d4 [Davies Liu] improve timestamp from parquet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants