Adding TopWikipediaSessions integration test #6311

Merged: 3 commits merged into apache:master on Sep 14, 2018

pabloem (Member) commented Aug 30, 2018

This test should run with postcommits.

pabloem (Member, Author) commented Aug 31, 2018

Run Java PostCommit

try {
  timestamp = ((BigDecimal) row.get("timestamp")).intValue();
} catch (ClassCastException e) {
  timestamp = ((Integer) row.get("timestamp")).intValue();
}
pabloem (Member, Author):

The Dataflow runner deserializes the timestamp as a BigDecimal, while the Direct runner deserializes it as an Integer.
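
One possible alternative to the try/catch above, offered only as a sketch: both BigDecimal and Integer extend java.lang.Number, so a single cast to Number covers either case. The class and method names below (TimestampReader, readTimestamp) are hypothetical and not part of the PR, and a plain Map<String, Object> stands in for the row.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR.
class TimestampReader {
  // Reads "timestamp" without depending on which Number subclass
  // (Integer, Long, BigDecimal, ...) the JSON parser produced.
  static int readTimestamp(Map<String, Object> row) {
    Object value = row.get("timestamp");
    if (!(value instanceof Number)) {
      throw new IllegalArgumentException("timestamp is missing or not numeric: " + value);
    }
    return ((Number) value).intValue();
  }

  public static void main(String[] args) {
    // Simulates the two shapes reported in the comment above.
    Map<String, Object> asBigDecimal = new HashMap<>();
    asBigDecimal.put("timestamp", new BigDecimal("1536962880"));
    Map<String, Object> asInteger = new HashMap<>();
    asInteger.put("timestamp", 1536962880);
    System.out.println(readTimestamp(asBigDecimal) == readTimestamp(asInteger));
  }
}
```

Like the original code, intValue() silently truncates fractional or out-of-range values, so this keeps the same behavior while avoiding exception-driven control flow.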

Contributor:

Can you elaborate on this issue a bit, please? Do we see a similar issue, or different types, on other runners?

pabloem (Member, Author):

@apilloud do you know why this happens when parsing JSON files?

Member:

If you're thinking the TableRow here is related to SQL, it isn't; it is a BigQuery data type. The JSON parsing is configured to use a global instance of Jackson in ParseTableRowJson, later in this file. The TableRow type is really just a Map<String,Object>, so Jackson picks a type for numbers based on global configuration, which must not be the same on all runners. (See https://fasterxml.github.io/jackson-databind/javadoc/2.8/index.html?com/fasterxml/jackson/databind/DeserializationFeature.html for some of these config options.) One way to fix this would be to provide the actual types for JSON decoding rather than using Map<String,Object>.
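
To illustrate the "provide the actual types" suggestion without pulling in Jackson or Beam, here is a minimal sketch that converts the loosely typed map into a typed value once, at the decode boundary. TypedEvent and fromRow are hypothetical names; only the "timestamp" field comes from this thread, and treating it as a long is an assumption. With Jackson itself, the equivalent step would presumably be binding the JSON to a concrete class instead of a map.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed record; only the "timestamp" field comes from the thread.
final class TypedEvent {
  final long timestamp;

  private TypedEvent(long timestamp) {
    this.timestamp = timestamp;
  }

  // Converts the loosely typed row once, so downstream code never casts.
  static TypedEvent fromRow(Map<String, Object> row) {
    Object raw = row.get("timestamp");
    if (raw instanceof BigDecimal) {
      // longValueExact throws ArithmeticException for fractional or
      // out-of-range values instead of silently truncating them.
      return new TypedEvent(((BigDecimal) raw).longValueExact());
    }
    if (raw instanceof Integer || raw instanceof Long) {
      return new TypedEvent(((Number) raw).longValue());
    }
    throw new IllegalArgumentException("Unsupported timestamp value: " + raw);
  }

  public static void main(String[] args) {
    Map<String, Object> row = new HashMap<>();
    row.put("timestamp", new BigDecimal("1536962880"));
    System.out.println(fromRow(row).timestamp);
  }
}
```

The design choice here is to fail loudly on unexpected types, so a difference in parser configuration between runners surfaces as a clear error rather than a ClassCastException deep in the pipeline.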

pabloem (Member, Author) commented Aug 31, 2018

r: @Ardagan

huygaa11 (Contributor) commented

@apilloud @Ardagan friendly ping!

pabloem (Member, Author) commented Sep 11, 2018

@Ardagan and I had a chat offline, and I need to look into what Andrew recommended. Give me a couple of weeks to get back to this.

pabloem (Member, Author) commented Sep 14, 2018

@Ardagan I think I'd like to get this in for now and keep a TODO (https://issues.apache.org/jira/browse/BEAM-5390). LMK what you think.

@pabloem pabloem merged commit 859b8d5 into apache:master Sep 14, 2018
@pabloem pabloem deleted the twsit branch September 14, 2018 22:08

4 participants