
streaming_perf #52

Merged
merged 100 commits into master from wip/tests/tpch on Dec 11, 2015

Conversation

ymahajan
Contributor

No description provided.

ymahajan and others added 30 commits October 26, 2015 17:53
Adding StreamingSnappyContext, SchemaDStream, window clause in CQs
Adding DDLs to create stream tables for sockets, file and kafka sources
…commons into wip/streaming

Conflicts:
	snappy-core/build.gradle
	snappy-core/src/main/scala/org/apache/spark/sql/snappyParsers.scala
	snappy-core/src/main/scala/org/apache/spark/sql/streaming/SchemaDStream.scala
	snappy-core/src/main/scala/org/apache/spark/sql/streaming/StreamingSnappyContext.scala
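
To make the features in the commits above concrete, here is a minimal sketch of how the pieces might fit together. StreamingSnappyContext, SchemaDStream, stream-table DDLs and the window clause are all named in this PR; the factory call, the exact DDL syntax, the registerCQ method and the option names below are assumptions, not the reviewed code:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Boilerplate streaming context; any local two-core master works for a sketch.
    val conf = new SparkConf().setAppName("cq-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))

    // StreamingSnappyContext comes from this PR; the factory call is assumed.
    val ssnc = StreamingSnappyContext(ssc)

    // Declare a stream table over a socket source (DDL syntax and options assumed):
    ssnc.sql(
      """CREATE STREAM TABLE tweets (id LONG, text STRING)
        |USING socket_stream
        |OPTIONS (hostname 'localhost', port '9999')""".stripMargin)

    // A continuous query using a window clause over the stream table,
    // returning a SchemaDStream (registerCQ is assumed):
    val cq = ssnc.registerCQ(
      "SELECT text FROM tweets WINDOW (DURATION '5' SECONDS, SLIDE '5' SECONDS)")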
Throw exception when truncate/drop stream tables
… and unit tests

Enhanced Twitter quickstart to use twitter4j.StatusJSONImpl class
Added a JSON parser to parse the tweet JSON
Cleaned up code and added docs to key classes
Conflicts:
	snappy-core/src/main/scala/org/apache/spark/sql/SnappyContext.scala
a) A new join operator, LocalJoin, is added to perform a replicated table join with either a replicated or a partitioned table.
This join mimics broadcast join: instead of taking the build side from a broadcast relation, we iterate over the single partition of the replicated relation.
A relation can declare itself replicated by implementing PartitionedDataSourceScan and setting numPartitions to 1.
A new RDD, NarrowPartitionsRDD, is used to execute both the build-side and stream-side RDDs. The stream side is iterated for all of its partitions, while
 the build side, which has a single partition, is iterated once for each stream-side partition.
NarrowPartitionsRDD takes care of preferred locations based on the common node.

b) For partition-to-partition joins Spark always shuffles if the relations come from DataSources, as PhysicalRDD does not have a partitioner.
   We added a new physical plan, PartitionedPhysicalRDD, which has a partitioner based on the partitioning column.
   If the join is on the same columns as the partition columns of the underlying store, we can avoid the shuffle and do a
   partition-to-partition join. Thankfully ZipPartitionRDD, which is used by both merge join and shuffled join, takes care of the preferred locations.

 c) I have tested this for equijoins, but not LeftSemiJoin.
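
As a sketch of the contract described in (a): a relation opts into LocalJoin by reporting a single partition. PartitionedDataSourceScan and numPartitions are named above; the exact method signatures here are assumptions:

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.Row

    // Assumed shape of the trait named above: a data-source scan that exposes
    // its partitioning so the planner can choose LocalJoin.
    trait PartitionedDataSourceScan {
      def numPartitions: Int            // 1 => replicated; LocalJoin applies
      def partitionColumns: Seq[String] // used for partition-to-partition joins
      def buildScan(): RDD[Row]
    }

    // A replicated relation reports one partition; the planner then iterates
    // that single partition once per stream-side partition instead of
    // broadcasting it.
    class ReplicatedRelation(data: RDD[Row]) extends PartitionedDataSourceScan {
      override def numPartitions: Int = 1
      override def partitionColumns: Seq[String] = Nil
      override def buildScan(): RDD[Row] = data.coalesce(1)
    }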
Joining stream to static tables
Added sql method to StreamingSnappyContext to route SQL queries
Conflicts:
	snappy-core/src/main/scala/org/apache/spark/sql/sources/errorEstimates.scala
Store optimization is now the default. Column store integration can only be done after Suranjan's checkin
Conflicts:
	snappy-core/src/main/scala/org/apache/spark/sql/SnappyContext.scala
@@ -65,6 +65,7 @@ object StreamingInputWithLoadData extends Serializable {
val ingestionStream = stream.window(Seconds(5), Seconds(5))


import org.apache.spark.sql.streaming.snappy._
Contributor

Remove it.

@hbhanawat
Contributor

First round of review done. I have only reviewed the streaming changes in this PR, no other changes. I have also not reviewed tests. There are a few things that we need immediately for quickstart testing:

  1. A foreachDataframe function.
  2. A SQL way to insert into gemxd, as mentioned in the spec.
  3. Clean up the API to insert into gemxd.

@ymahajan
Contributor Author

ymahajan commented Dec 8, 2015

We are using the following in the foreachRDD code instead of foreachDataFrame:
val df = ssnc.createDataFrame(rdd, tableStream.schema)
That should be sufficient for now.

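
For reference, a minimal sketch of that workaround, assuming ssnc is the StreamingSnappyContext and tableStream is a SchemaDStream exposing its schema; the final write call is an assumption:

    // Until foreachDataFrame exists, build a DataFrame from each micro-batch
    // RDD inside foreachRDD; createDataFrame(rdd, schema) is standard
    // SQLContext API.
    tableStream.foreachRDD { rdd =>
      val df = ssnc.createDataFrame(rdd, tableStream.schema)
      // Use the DataFrame as usual, e.g. persist it (table name hypothetical):
      df.write.insertInto("rawTweets")
    }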

@hbhanawat
Contributor

We are using following in foreachRDD code instead for foreachDataFrame val df = snc.createDataFrame(rdd, tableStream.schema) That should be sufficient for now.

Ok. But since this was part of the spec, we need to get it reviewed/accepted by Jags, Sumedh and team.

rmishra and others added 3 commits December 10, 2015 13:52
2) Modified stream-related tests to honour the cleanup
Fixed small issue of case in ExternalShellDunit
DirectKafka is a separate DDL/Relation/Source now.
Cleaned up tests; combined all streaming tests in StreamingSuite
@ymahajan
Contributor Author

foreachDataFrame is not yet implemented; will file a JIRA to track this story.
Currently we are using the following in foreachRDD as a workaround:
val df = ssnc.createDataFrame(rdd, tableStream.schema)

On Fri, Dec 4, 2015 at 3:59 PM, hbhanawat notifications@github.com wrote:

General comments:foreachDataFrame which was mentioned in Spec is not
implemented?



@ymahajan
Contributor Author

buildScan is not implemented yet; will track it as a story. We are using the
df.save APIs.

On Fri, Dec 4, 2015 at 3:59 PM, hbhanawat notifications@github.com wrote:

General comments:We have not implemented a buildscan on our stream?



@ymahajan
Contributor Author

We need to see how this will be supported from the CLI; it works with a Scala
program though.

On Fri, Dec 4, 2015 at 4:00 PM, hbhanawat notifications@github.com wrote:

General comments:Once a stream table is created using sql, how do I insert
my stream table into gem xd using a sql command?



@hbhanawat
Contributor

Remove dat and log files in the checkin.

@hbhanawat hbhanawat closed this Dec 10, 2015
@hbhanawat
Contributor

Sorry I closed this by mistake. Reopening it.

@hbhanawat hbhanawat reopened this Dec 10, 2015
ymahajan and others added 12 commits December 10, 2015 21:10
+ adding provision to pass a closure for SparkConf additions
+ StreamToRow now returns Seq[InternalRow]
+ minor refactoring
Conflicts:
	snappy-core/src/main/scala/org/apache/spark/sql/SnappyContext.scala
	snappy-core/src/main/scala/org/apache/spark/sql/columnar/CacheBatchHolder.scala
	snappy-core/src/main/scala/org/apache/spark/sql/columnar/ExternalStoreUtils.scala
	snappy-core/src/main/scala/org/apache/spark/sql/columnar/JDBCAppendableRelation.scala
	snappy-core/src/main/scala/org/apache/spark/sql/row/JDBCMutableRelation.scala
	snappy-core/src/main/scala/org/apache/spark/sql/snappyParsers.scala
	snappy-core/src/main/scala/org/apache/spark/sql/store/ExternalStore.scala
	snappy-core/src/main/scala/org/apache/spark/sql/store/JDBCSourceAsStore.scala
	snappy-core/src/test/scala/io/snappydata/SnappyFunSuite.scala
	snappy-core/src/test/scala/io/snappydata/core/LocalTestData.scala
	snappy-spark
	snappy-tools/src/main/scala/org/apache/spark/sql/columntable/ColumnFormatRelation.scala
	snappy-tools/src/main/scala/org/apache/spark/sql/store/StoreInitRDD.scala
	snappy-tools/src/main/scala/org/apache/spark/sql/store/StoreUtils.scala
	snappy-tools/src/main/scala/org/apache/spark/sql/store/impl/JDBCSourceAsColumnarStore.scala
	snappy-tools/src/test/scala/org/apache/spark/sql/store/ColumnTableTest.scala
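
A short sketch of what the StreamToRow change in the bullets above implies; the trait name comes from the commit message, while the method signature is an assumption:

    import org.apache.spark.sql.catalyst.InternalRow

    // One raw stream message can now expand into several rows, hence the
    // return type Seq[InternalRow] rather than a single InternalRow.
    trait StreamToRow extends Serializable {
      def toRows(message: Any): Seq[InternalRow]
    }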
…-commons into wip/tests/tpch

Conflicts:
	snappy-core/src/test/scala/io/snappydata/SnappyFunSuite.scala
…ninng needs to be worked

to make it enabled again.
Conflicts:
	snappy-spark
rishitesh added a commit that referenced this pull request Dec 11, 2015
@rishitesh rishitesh merged commit f175da2 into master Dec 11, 2015
@sumwale sumwale deleted the wip/tests/tpch branch January 19, 2016 19:49