Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: streamsx.cassandra #99

Closed
ecurtin opened this issue Oct 6, 2016 · 9 comments
Closed

Proposal: streamsx.cassandra #99

ecurtin opened this issue Oct 6, 2016 · 9 comments

Comments

@ecurtin
Copy link

ecurtin commented Oct 6, 2016

Proposal: streamsx.cassandra

streamsx.cassandra is a toolkit in active development (and production!) at The Weather Company.
It consists of an operator that writes Streams tuples to Cassandra.

The operator is a very thin Java facade for a Scala implementation. It's built using SBT.

Basic Capabilities

The operator is configured by specifying connection information in a ZooKeeper node.
Additionally, it provides mechanisms, also configurable in ZK, for writing values as NULLS.

Nearly all SPL types are supported, including sets, lists, and maps.

Null Value Mechanism

For a real-life example, say we have a tuple representing a report from a weather station:

stream<rstring stationID, uint64 timestamp, uint32 tempF, uint32 windspeedMPH, ......>

Old-school meteorological convention specifies that invalid observations are reported as -9999. When these observations are written in Cassandra, however, we don't want to keep them as -9999, we want to take advantage of Cassandra's ability to write nulls.
So if I specify in the JSON blob that I store in ZooKeeper:

{
  "tempF" : -9999
}

Any tuples that pass into the operator with the tempF value of -9999 will be written with a tempF value of NULL in Cassandra.

Licensing

The source is licensed under Apache V2.

Currently Supported Versions

  • Streams versions 4.0, 4.1
  • Cassandra versions 2.0, 2.1 (aka, Cassandra versions that use CQL 3.1)

Sample SPL Code

namespace com.weather.test;

composite CassandraTest {

    graph

        stream<rstring greeting, uint64 count, list<int32> testList, set<int32> testSet, map<int32, boolean> testMap, int32 nInt> Greeting = Beacon() {
            param
                iterations: 1000000u; //generate 1000000 tuples
                period : 0.5; //generate a tuple every 0.5 seconds
            output
                Greeting:
                    greeting =  "Hello Streams!",
                    count = IterationCount() + 1ul,
                    testList = [1,2,3],
                    testSet = {4, 5, 6},
                    testMap = {7: true, 8 : false, 9: true},
                    nInt = -2147483647;
        }

        () as CoolStuff = com.weather.streamsx.cassandra::CassandraSink(Greeting) {
            param
                connectionConfigZNode: "/cassandra_config";
                nullMapZnode: "/null_values";
        }
}

The znodes specify the connection to a dev Cassandra cluster and specifies that the null value for "nint" is -2147483647.

And here's a sample of the output, which I am pulling using CQL using a call that you should never ever use on a real table :)

cqlsh:testkeyspace> select * from testtable;

 count | greeting       | nint | testlist  | testmap                      | testset
-------+----------------+------+-----------+------------------------------+-----------
    19 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
     2 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    24 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
     3 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    35 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    30 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    16 | Hello Streams! | null | [1, 2, 3] | {7: True, 8: False, 9: True} | {4, 5, 6}
    ... etc etc

Future Work

  • Support for Streams 4.2 and Cassandra 3.x
  • Consistent Region support
  • Cassandra read operator
@mikespicer
Copy link
Collaborator

+1
FYI, for the configuration you currently store in Zookeeper we added secure application config in V4.2 that could be used for configuration. See details here http://www.ibm.com/support/knowledgecenter/SSCRJU_4.2.0/com.ibm.streams.dev.doc/doc/creating-secure-app-configs-dev.html

@petenicholls
Copy link
Member

+1

@chanskw
Copy link
Collaborator

chanskw commented Oct 6, 2016

+1! very nice proposal btw!

@leongor
Copy link

leongor commented Oct 7, 2016

+1

2016-10-07 0:04 GMT+03:00 Samantha Chan notifications@github.com:

+1! very nice proposal btw!


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#99 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AGvlA2CSLe0VJaqiVc0FQUgWkWdSW0tJks5qxWJLgaJpZM4KQRUL
.

Best regards,
Leonid Gorelik.

@petenicholls
Copy link
Member

Repository creation underway.

@ecurtin
Copy link
Author

ecurtin commented Oct 7, 2016

Sweet deal, I see the repo that got created!
Two questions

  1. Am I good to go ahead and start pushing code?
  2. I've been doing releases for internal teams at TWC, so I'm currently on version 1.3.0. Can I keep that versioning here or do I need to start over? Either is fine, just don't want to step on toes :)

@chanskw
Copy link
Collaborator

chanskw commented Oct 11, 2016

@ecurtin before you can push code, can you please sign this document:
https://github.com/IBMStreams/administration/blob/master/IBMStreams-cla-individual.pdf

Yes, I think you can keep the current version number.

@chanskw
Copy link
Collaborator

chanskw commented Oct 11, 2016

Repository created and set up. @ecurtin and @cin are initial committers.

@chanskw chanskw closed this as completed Oct 11, 2016
@ecurtin
Copy link
Author

ecurtin commented Oct 12, 2016

Woo!! Thanks all!! Here it is! https://github.com/IBMStreams/streamsx.cassandra

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants