ARROW-510 ARROW-582 ARROW-663 ARROW-729: [Java] Added units for Time and Date types, and integration tests #475

Closed

leifwalsh
Contributor

@leifwalsh leifwalsh commented Apr 1, 2017

closes #366

@leifwalsh leifwalsh changed the title ARROW-670: [Java] Added day and milli implementations for date, change Time to TimeMilli ARROW-729: [Java] Added day and milli implementations for date, change Time to TimeMilli Apr 1, 2017
@leifwalsh leifwalsh force-pushed the feature/java-date-time-types branch from 8a85ab9 to d10713e Compare April 1, 2017 00:13
@icexelloss
Contributor

This is how I would do it.

@leifwalsh
Contributor Author

@julienledem @wesm where are the roundtrip integration tests so I can add these to them and how do I run them?

@icexelloss
Contributor

icexelloss commented Apr 1, 2017

@leifwalsh
Contributor Author

It seems to work, but I think the test data doesn't have any timestamps, dates, or times in it. Were the JSON files constructed manually or programmatically?

@wesm
Member

wesm commented Apr 1, 2017

The integration tests for dates/times are in the patch for ARROW-510. There's a commented-out line that generates the test cases.

@leifwalsh
Contributor Author

@wesm awesome. I created a branch off this and merged yours into that, uncommented the test, and got a failure:

Command failed: java -cp /home/leif/git/arrow/java/tools/target/arrow-tools-0.2.1-SNAPSHOT-jar-with-dependencies.jar org.apache.arrow.tools.Integration -a /tmp/tmp4dpiwmfk/c377ba55676a45d28e09e9a5ea70b617 -j /tmp/tmpadyd70vm/628548c02e5440eba693410d754f7c31.json -c VALIDATE
With output:
--------------
12:32:16.427 [main] DEBUG i.n.u.i.l.InternalLoggerFactory - Using SLF4J as the default logging framework
12:32:16.432 [main] DEBUG io.netty.buffer.AbstractByteBuf - -Dio.netty.buffer.bytebuf.checkAccessible: true
12:32:16.434 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.level: simple
12:32:16.434 [main] DEBUG io.netty.util.ResourceLeakDetector - -Dio.netty.leakDetection.maxRecords: 4
12:32:16.447 [main] DEBUG i.n.util.internal.PlatformDependent0 - java.nio.Buffer.address: available
12:32:16.447 [main] DEBUG i.n.util.internal.PlatformDependent0 - sun.misc.Unsafe.theUnsafe: available
12:32:16.448 [main] DEBUG i.n.util.internal.PlatformDependent0 - sun.misc.Unsafe.copyMemory: available
12:32:16.448 [main] DEBUG i.n.util.internal.PlatformDependent0 - direct buffer constructor: available
12:32:16.449 [main] DEBUG i.n.util.internal.PlatformDependent0 - java.nio.Bits.unaligned: available, true
12:32:16.449 [main] DEBUG i.n.util.internal.PlatformDependent0 - java.nio.DirectByteBuffer.<init>(long, int): available
12:32:16.450 [main] DEBUG io.netty.util.internal.Cleaner0 - java.nio.ByteBuffer.cleaner(): available
12:32:16.451 [main] DEBUG i.n.util.internal.PlatformDependent - Java version: 8
12:32:16.451 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.noUnsafe: false
12:32:16.451 [main] DEBUG i.n.util.internal.PlatformDependent - sun.misc.Unsafe: available
12:32:16.451 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.noJavassist: false
12:32:16.452 [main] DEBUG i.n.util.internal.PlatformDependent - Javassist: unavailable
12:32:16.452 [main] DEBUG i.n.util.internal.PlatformDependent - You don't have Javassist in your class path or you don't have enough permission to load dynamically generated classes.  Please check the configuration for better performance.
12:32:16.452 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.tmpdir: /tmp (java.io.tmpdir)
12:32:16.453 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.bitMode: 64 (sun.arch.data.model)
12:32:16.453 [main] DEBUG i.n.util.internal.PlatformDependent - -Dio.netty.noPreferDirect: false
12:32:16.453 [main] DEBUG i.n.util.internal.PlatformDependent - io.netty.maxDirectMemory: 1836580864 bytes
12:32:16.454 [main] DEBUG i.n.util.ResourceLeakDetectorFactory - Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector@1c6b6478
12:32:16.464 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numHeapArenas: 8
12:32:16.464 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.numDirectArenas: 8
12:32:16.464 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.pageSize: 8192
12:32:16.465 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxOrder: 11
12:32:16.465 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.chunkSize: 16777216
12:32:16.465 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.tinyCacheSize: 512
12:32:16.465 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.smallCacheSize: 256
12:32:16.465 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.normalCacheSize: 64
12:32:16.465 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.maxCachedBufferCapacity: 32768
12:32:16.465 [main] DEBUG i.n.buffer.PooledByteBufAllocator - -Dio.netty.allocator.cacheTrimInterval: 8192
12:32:16.724 [main] DEBUG o.a.arrow.vector.file.ReadChannel - Reading buffer with size: 10
12:32:16.726 [main] DEBUG o.a.a.vector.file.ArrowFileReader - Footer starts at 5968, length: 1040
12:32:16.726 [main] DEBUG o.a.arrow.vector.file.ReadChannel - Reading buffer with size: 1040
12:32:16.808 [main] DEBUG org.apache.arrow.tools.Integration - Arrow Input file size: 7018
12:32:16.809 [main] DEBUG org.apache.arrow.tools.Integration - ARROW schema: Schema<f0: Date(DAY), f1: Date(MILLISECOND), f2: Time(SECOND, 0), f3: Time(MILLISECOND, 0), f4: Time(MICROSECOND, 0), f5: Time(NANOSECOND, 0), f6: Timestamp(SECOND, null), f7: Timestamp(MILLISECOND, null), f8: Timestamp(MICROSECOND, null), f9: Timestamp(NANOSECOND, null), f10: Timestamp(MILLISECOND, America/New_York)>
12:32:16.809 [main] DEBUG org.apache.arrow.tools.Integration - JSON Input file size: 15314
12:32:16.809 [main] DEBUG org.apache.arrow.tools.Integration - JSON schema: Schema<f0: Date(DAY), f1: Date(MILLISECOND), f2: Time(SECOND, 32), f3: Time(MILLISECOND, 32), f4: Time(MICROSECOND, 64), f5: Time(NANOSECOND, 64), f6: Timestamp(SECOND, null), f7: Timestamp(MILLISECOND, null), f8: Timestamp(MICROSECOND, null), f9: Timestamp(NANOSECOND, null), f10: Timestamp(MILLISECOND, America/New_York)>
Incompatible files
Different schemas:
Schema<f0: Date(DAY), f1: Date(MILLISECOND), f2: Time(SECOND, 0), f3: Time(MILLISECOND, 0), f4: Time(MICROSECOND, 0), f5: Time(NANOSECOND, 0), f6: Timestamp(SECOND, null), f7: Timestamp(MILLISECOND, null), f8: Timestamp(MICROSECOND, null), f9: Timestamp(NANOSECOND, null), f10: Timestamp(MILLISECOND, America/New_York)>
Schema<f0: Date(DAY), f1: Date(MILLISECOND), f2: Time(SECOND, 32), f3: Time(MILLISECOND, 32), f4: Time(MICROSECOND, 64), f5: Time(NANOSECOND, 64), f6: Timestamp(SECOND, null), f7: Timestamp(MILLISECOND, null), f8: Timestamp(MICROSECOND, null), f9: Timestamp(NANOSECOND, null), f10: Timestamp(MILLISECOND, America/New_York)>
12:32:16.815 [main] ERROR org.apache.arrow.tools.Integration - Incompatible files
java.lang.IllegalArgumentException: Different schemas:
Schema<f0: Date(DAY), f1: Date(MILLISECOND), f2: Time(SECOND, 0), f3: Time(MILLISECOND, 0), f4: Time(MICROSECOND, 0), f5: Time(NANOSECOND, 0), f6: Timestamp(SECOND, null), f7: Timestamp(MILLISECOND, null), f8: Timestamp(MICROSECOND, null), f9: Timestamp(NANOSECOND, null), f10: Timestamp(MILLISECOND, America/New_York)>
Schema<f0: Date(DAY), f1: Date(MILLISECOND), f2: Time(SECOND, 32), f3: Time(MILLISECOND, 32), f4: Time(MICROSECOND, 64), f5: Time(NANOSECOND, 64), f6: Timestamp(SECOND, null), f7: Timestamp(MILLISECOND, null), f8: Timestamp(MICROSECOND, null), f9: Timestamp(NANOSECOND, null), f10: Timestamp(MILLISECOND, America/New_York)>
	at org.apache.arrow.vector.util.Validator.compareSchemas(Validator.java:43) ~[arrow-tools-0.2.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.tools.Integration$Command$3.execute(Integration.java:177) ~[arrow-tools-0.2.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.tools.Integration.run(Integration.java:101) ~[arrow-tools-0.2.1-SNAPSHOT-jar-with-dependencies.jar:na]
	at org.apache.arrow.tools.Integration.main(Integration.java:62) ~[arrow-tools-0.2.1-SNAPSHOT-jar-with-dependencies.jar:na]

I'm not sure I'm reading this right, but it looks like the schema in the JSON file specifies bit widths for the Time fields (32 or 64), while the Arrow file that C++ generated sets them to 0?

@wesm
Member

wesm commented Apr 1, 2017

Yeah, that looks like the case: https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/metadata.cc#L389

Let me fix that in #458 right now and add a check to validate that the bit width is what we expect for each unit
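
The check described here can be sketched in Java. This is only an illustration, not the C++ fix that went into #458: the class, enum, and method names are hypothetical, and the expected widths (32 bits for SECOND and MILLISECOND, 64 bits for MICROSECOND and NANOSECOND) are read off the JSON schema in the log above.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch; these names are not Arrow's actual Java or C++ API.
final class TimeBitWidthCheck {
    enum Unit { SECOND, MILLISECOND, MICROSECOND, NANOSECOND }

    // Expected widths as they appear in the JSON schema of the failing run.
    private static final Map<Unit, Integer> EXPECTED = new EnumMap<>(Unit.class);
    static {
        EXPECTED.put(Unit.SECOND, 32);
        EXPECTED.put(Unit.MILLISECOND, 32);
        EXPECTED.put(Unit.MICROSECOND, 64);
        EXPECTED.put(Unit.NANOSECOND, 64);
    }

    static int expectedBitWidth(Unit unit) {
        return EXPECTED.get(unit);
    }

    // Rejects metadata whose bit width does not match the unit.
    static void validate(Unit unit, int bitWidth) {
        int expected = expectedBitWidth(unit);
        if (bitWidth != expected) {
            throw new IllegalArgumentException(
                "Time(" + unit + ") expects bit width " + expected + " but got " + bitWidth);
        }
    }

    public static void main(String[] args) {
        validate(Unit.MICROSECOND, 64); // matches, returns normally
        try {
            validate(Unit.SECOND, 0);   // the width the C++-written file carried
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check like this on the reading side, a zero bit width would be reported at schema load time instead of surfacing later as a schema mismatch.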

@leifwalsh leifwalsh changed the title ARROW-729: [Java] Added day and milli implementations for date, change Time to TimeMilli ARROW-729: [Java] Added units for Time and Date types Apr 1, 2017
@wesm
Member

wesm commented Apr 1, 2017

done 06c4eed

@leifwalsh
Contributor Author

Okay, I implemented these types in JsonFileReader and JsonFileWriter and now I get this failure:

-- Java producing, C++ consuming
Testing file /home/leif/git/arrow/integration/data/struct_example.json
-- Creating binary inputs
-- Validating file
-- Validating stream
Testing file /home/leif/git/arrow/integration/data/simple.json
-- Creating binary inputs
-- Validating file
-- Validating stream
Testing file /tmp/tmpugxk26ua/053a211f7b8f448cb7093a44b16258c5.json
-- Creating binary inputs
-- Validating file
-- Validating stream
Testing file /tmp/tmpugxk26ua/9b7a257132e74a5f81364c88d94c1b12.json
-- Creating binary inputs
-- Validating file
-- Validating stream
Testing file /tmp/tmpugxk26ua/34173bff0e02405eb44015751d80d2cd.json
-- Creating binary inputs
-- Validating file
-- Validating stream
Testing file /tmp/tmpugxk26ua/1e0a01f57e7740a7bc715f95d6b9ea7c.json
-- Creating binary inputs
-- Validating file
Command failed: /home/leif/git/arrow/cpp/test-build/debug/json-integration-test --integration --arrow=/tmp/tmp4wae8kve/641c555737874c5e80c5f2ea5aa91179 --json=/tmp/tmpugxk26ua/1e0a01f57e7740a7bc715f95d6b9ea7c.json --mode=VALIDATE
With output:
--------------
Error message: Invalid: Record batch 0 did not match
JSON:
f0: [null, -1534653433, null, -471089378, null, null, 1726023356]
f1: [1218641917, -2011903818, -470743102, null, -1756786353, null, null]
f2: [810115012, null, null, -883898073, -1982197003, null, -1729996989]
f3: [1591774331, -314120230, null, 1196117999, -1539384497, null, 872289996]
f4: [658927759, null, null, -961381010, -1231869285, -1369355730, null]
f5: [null, 552443512, null, 2013848155, 709970719, null, -1972820288]
f6: [null, null, -102932588, 1536992262, null, null, null]
f7: [-1730827518, 1939266448, -678382409, null, null, null, null]
f8: [null, -264948505, -1250649123, 2006597095, null, 1050350308, -394676957]
f9: [33833774, null, 114513685, null, 1900471701, -669924741, 1015463225]
f10: [399170240, null, null, null, null, null, -548617532]

Arrow:
f0: [null, -1534653433, null, -471089378, null, null, 1726023356]
f1: [1218641917, -2011903818, -470743102, null, -1756786353, null, null]
f2: [810115012, null, null, -883898073, -1982197003, null, -1729996989]
f3: [1591774331, -314120230, null, 1196117999, -1539384497, null, 872289996]
f4: [658927759, null, null, -961381010, -1231869285, -1369355730, null]
f5: [null, 552443512, null, 2013848155, 709970719, null, -1972820288]
f6: [null, null, -102932588, 1536992262, null, null, null]
f7: [-1730827518, 1939266448, -678382409, null, null, null, null]
f8: [null, -264948505, -1250649123, 2006597095, null, 1050350308, -394676957]
f9: [33833774, null, 114513685, null, 1900471701, -669924741, 1015463225]
f10: [399170240, null, null, null, null, null, -548617532]

Funny thing is, those are identical according to diff.

@wesm
Member

wesm commented Apr 1, 2017

Interesting. Can you pull my commits from #458 into this patch and I'll take a look to see what's up?

@leifwalsh
Contributor Author

Why do you not check if right.IsNull() here? https://github.com/apache/arrow/blob/master/cpp/src/arrow/compare.cc#L472

Is the expectation that left.IsNull() iff right.IsNull()?
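
To make the question concrete, here is a minimal Java sketch of a slot-by-slot comparison that checks the null flag on both sides. It is not the C++ code in compare.cc; the class and method names are hypothetical, and a `null` element stands in for an Arrow null slot.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; a null list element models a null slot.
final class NullAwareEquals {
    static boolean slotsEqual(List<Integer> left, List<Integer> right) {
        if (left.size() != right.size()) {
            return false;
        }
        for (int i = 0; i < left.size(); i++) {
            Integer l = left.get(i);
            Integer r = right.get(i);
            boolean leftNull = (l == null);
            boolean rightNull = (r == null);
            if (leftNull != rightNull) {
                return false; // null on one side only
            }
            if (!leftNull && !l.equals(r)) {
                return false; // both present, values differ
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(null, -1534653433, null);
        List<Integer> b = Arrays.asList(0, -1534653433, null);
        System.out.println(slotsEqual(a, a)); // prints "true"
        System.out.println(slotsEqual(a, b)); // prints "false"
    }
}
```

If the null bitmaps have already been compared elsewhere, checking only the left side is sufficient; the sketch shows the version that does not rely on that assumption.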

@leifwalsh
Contributor Author

Sure.

@leifwalsh
Contributor Author

Pushed

@wesm
Member

wesm commented Apr 1, 2017

looking

@wesm
Member

wesm commented Apr 1, 2017

Found the issue: the bit width on Date32Type in arrow/type.h was incorrect. Fixed at wesm@2e5570c. Integration tests pass now... awesome!

@leifwalsh
Contributor Author

leifwalsh commented Apr 1, 2017 via email

@wesm
Member

wesm commented Apr 1, 2017

I closed my PR -- do you mind doing all the code reviews in this patch? Feel free to review the integration test / C++ stuff (the changes were pretty minimal)

@wesm
Member

wesm commented Apr 1, 2017

Also, if you can add "closes #366" into your PR description then that will close that PR when this is merged

@leifwalsh
Contributor Author

leifwalsh commented Apr 1, 2017 via email

@leifwalsh leifwalsh changed the title ARROW-729: [Java] Added units for Time and Date types ARROW-510 ARROW-582 ARROW-729: [Java] Added units for Time and Date types, and integration tests Apr 1, 2017
@leifwalsh leifwalsh changed the title ARROW-510 ARROW-582 ARROW-729: [Java] Added units for Time and Date types, and integration tests ARROW-510 ARROW-582 ARROW-663 ARROW-729: [Java] Added units for Time and Date types, and integration tests Apr 1, 2017
@icexelloss
Contributor

Wow, awesome! Looks like you guys are pretty much all set. Let me know if I can help with anything.

@leifwalsh
Contributor Author

@icexelloss would appreciate review, especially of what I'm doing with micros and nanos to create joda objects here #475 (comment)

@leifwalsh leifwalsh force-pushed the feature/java-date-time-types branch from 2e5570c to 0bfe92e Compare April 1, 2017 17:58
@leifwalsh
Contributor Author

Reorganized commits a little

@wesm
Member

wesm commented Apr 1, 2017

@leifwalsh looks like you pulled in a few unintentional commits, can you rebase on apache/master?


<#elseif minor.class == "TimeStampMilli">
switch (dateUnit) {
case DAY:
Contributor

@icexelloss icexelloss Apr 2, 2017

I think this works for both day and millis:

long millisSinceEpoch = timeUnit.toMillis((long) get(index));
DateTime date = new DateTime(millisSinceEpoch, DateTimeZone.getDefault());

long millis;
switch (timeUnit) {
case SECOND:
millis = java.util.concurrent.TimeUnit.SECONDS.toMillis(get(index));
Contributor

You can use the timeUnit in the class
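
The two review comments above both suggest letting a unit object do the conversion to milliseconds instead of a hand-written `switch`. A minimal sketch of that idea, assuming a hypothetical `Unit` enum that wraps `java.util.concurrent.TimeUnit` (Arrow's own `TimeUnit` and `DateUnit` types may look different):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; Unit is an illustration, not Arrow's TimeUnit type.
final class ToMillisSketch {
    enum Unit {
        SECOND(TimeUnit.SECONDS),
        MILLISECOND(TimeUnit.MILLISECONDS),
        MICROSECOND(TimeUnit.MICROSECONDS),
        NANOSECOND(TimeUnit.NANOSECONDS);

        private final TimeUnit javaUnit;

        Unit(TimeUnit javaUnit) {
            this.javaUnit = javaUnit;
        }

        // Delegates to the JDK; finer units lose their sub-millisecond part.
        long toMillis(long value) {
            return javaUnit.toMillis(value);
        }
    }

    // Same idea for a date stored as days since the epoch.
    static long daysToMillis(long days) {
        return TimeUnit.DAYS.toMillis(days);
    }

    public static void main(String[] args) {
        System.out.println(Unit.SECOND.toMillis(3));         // prints 3000
        System.out.println(Unit.MICROSECOND.toMillis(1500)); // prints 1
        System.out.println(daysToMillis(1));                 // prints 86400000
    }
}
```

Note that converting microseconds or nanoseconds to milliseconds discards the finer-grained part, which is worth keeping in mind when building millisecond-precision date objects from those units.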

Member

@julienledem julienledem left a comment

Overall, this looks good to me.
@jacques-n ?

@wesm
Member

wesm commented Apr 3, 2017

@leifwalsh there is a conflict (hopefully not too horrible to resolve) after merging #409 -- the idea is that this will make it easier to add more types that have additional metadata

@jacques-n
Contributor

@julienledem, I think this patch further propagates the issues we have with using DateTime from Joda instead of LocalDateTime. I think those should probably be addressed separately, but I wanted to confirm your thoughts.

@julienledem
Member

@jacques-n: right. This should probably be fixed in a separate change. I'd suggest the new vectors should just use Integer and Long in getObject().
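
The thread does not spell out which Joda issues are meant. One possible reading is that a zone-attached object bakes a time zone into a value that carries none. The sketch below uses `java.time` rather than Joda, and the accessor is hypothetical rather than the real vector API; it returns the stored `Long` unchanged and shows how the same raw value renders differently depending on the zone a caller picks.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical sketch; uses java.time in place of Joda, not Arrow's accessor API.
final class RawValueSketch {
    // Returns the stored value as-is, or null when the slot is not set.
    static Long getObject(long[] values, boolean[] isSet, int index) {
        return isSet[index] ? Long.valueOf(values[index]) : null;
    }

    // Interpretation is left to the caller, who must choose a zone explicitly.
    static LocalDateTime render(long epochMillis, ZoneId zone) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
    }

    public static void main(String[] args) {
        long[] values = {0L, 5L};
        boolean[] isSet = {true, false};
        Long raw = getObject(values, isSet, 0);
        System.out.println(render(raw, ZoneOffset.UTC));                 // prints 1970-01-01T00:00
        System.out.println(render(raw, ZoneId.of("America/New_York"))); // prints 1969-12-31T19:00
        System.out.println(getObject(values, isSet, 1));                 // prints null
    }
}
```

Returning the boxed primitive keeps the accessor free of any default-zone dependency and leaves the choice of date library to a later change.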

@wesm
Member

wesm commented Apr 3, 2017

I just opened https://issues.apache.org/jira/browse/ARROW-768 and made it a blocker for 0.3

@leifwalsh
Contributor Author

@julienledem do you want me to change getObject() to return Integer and Long or leave the way it is for now and address later?

@leifwalsh
Contributor Author

Resolved merge conflicts by rebasing on master; this now passes the Java unit tests and the C++/Java integration tests.

leifwalsh and others added 3 commits April 3, 2017 23:23
…r consistency with FixedSizeList

As discussed on JIRA

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#473 from wesm/ARROW-733 and squashes the following commits:

0e30af3 [Wes McKinney] Rename FixedWidthBinary to FixedSizeBinary for consistency with FixedSizeList type
Change-Id: I79f5d942f64c275c87568703f9edf6a7e89467ac
@leifwalsh leifwalsh force-pushed the feature/java-date-time-types branch from 49392f2 to 47f83a8 Compare April 4, 2017 03:26
@julienledem
Member

@leifwalsh: Yes, please make getObject() return Integer and Long (as appropriate) for new vectors.

@leifwalsh
Contributor Author

@julienledem should I also do that to the existing TimeStamp* types?

@leifwalsh
Contributor Author

Also, ComplexReaders.java defines read${minor.boxedType} for all the timestamp types and my new types. Should I remove that as well?

@julienledem
Member

@leifwalsh I think for now it's better to do it only for new vectors. Changing the behavior of existing vectors requires more work on the code using them. Which is why I'd rather have those as separate changes.

@leifwalsh
Contributor Author

Okay then I think this is good to go.

@wesm
Member

wesm commented Apr 5, 2017

Cool. +1 -- thanks @leifwalsh for your contribution!

@asfgit asfgit closed this in f4fcb42 Apr 5, 2017
Member

@julienledem julienledem left a comment

I think we'd better not change the behavior of dateVector.getAccessor().getObject() just yet.
Otherwise, this looks good to me.

{ class: "TimeStampMilli", javaType: "long", boxedType: "Long", friendlyType: "DateTime" }
{ class: "TimeStampMicro", javaType: "long", boxedType: "Long", friendlyType: "DateTime" }
{ class: "TimeStampNano", javaType: "long", boxedType: "Long", friendlyType: "DateTime" }
{ class: "DateMilli" },
Member

This is changing an existing vector, right?

@@ -482,12 +483,15 @@ public long getTwoAsLong(int index) {

</#if>

<#if minor.class == "Date">
Member

Here we're changing the behavior of Date.

@leifwalsh
Contributor Author

leifwalsh commented Apr 6, 2017 via email

@wesm
Member

wesm commented Apr 6, 2017

@julienledem sorry for merging early, should have waited for your final review. Let's revert what needs to be reverted for 0.3?

@wesm
Member

wesm commented Apr 6, 2017

I opened https://issues.apache.org/jira/browse/ARROW-777, will leave it to you all to resolve ahead of 0.3
