From e3dd18bcba2c796c75b839574e1d3eb9ccb8abf1 Mon Sep 17 00:00:00 2001 From: DuyHai DOAN Date: Mon, 2 Nov 2015 12:57:29 +0100 Subject: [PATCH 1/8] [ZEPPELIN-382] Add Documentation for Cassandra interpreter in the doc pages --- docs/docs/index.md | 54 ++ docs/docs/interpreter/cassandra.md | 807 +++++++++++++++++++++++++++++ 2 files changed, 861 insertions(+) create mode 100644 docs/docs/index.md create mode 100644 docs/docs/interpreter/cassandra.md diff --git a/docs/docs/index.md b/docs/docs/index.md new file mode 100644 index 00000000000..2c14026fedf --- /dev/null +++ b/docs/docs/index.md @@ -0,0 +1,54 @@ +--- +layout: page +title: "Docs" +description: "" +group: nav-right +--- +{% include JB/setup %} + +### Install + +* [Install](./install/install.html) +* [YARN Install](./install/yarn_install.html) + +### Tutorial + +* [Tutorial](./tutorial/tutorial.html) + +### Interpreter + +**[Interpreters in zeppelin](manual/interpreters.html)** + +* [cassandra](./interpreter/cassandra.html) +* [flink](./interpreter/flink.html) +* [geode](./interpreter/geode.html) +* [hive](../docs/pleasecontribute.html) +* [ignite](../docs/pleasecontribute.html) +* [lens](./interpreter/lens.html) +* [md](../docs/pleasecontribute.html) +* [postgresql, hawq](./interpreter/postgresql.html) +* [sh](../docs/pleasecontribute.html) +* [spark](./interpreter/spark.html) +* [tajo](../docs/pleasecontribute.html) + +### Display System + +* [text](./displaysystem/display.html) +* [html](./displaysystem/display.html#html) +* [table](./displaysystem/table.html) +* [angular](./displaysystem/angular.html) (Beta) + +### Manual + +* [Dynamic Form](./manual/dynamicform.html) +* [Notebook as Homepage](./manual/notebookashomepage.html) + +### REST API + * [Interpreter API](./rest-api/rest-interpreter.html) + * [Notebook API](./rest-api/rest-notebook.html) + +### Development + +* [Writing Zeppelin Interpreter](./development/writingzeppelininterpreter.html) +* [How to contribute 
(code)](./development/howtocontribute.html) +* [How to contribute (website)](./development/howtocontributewebsite.html) diff --git a/docs/docs/interpreter/cassandra.md b/docs/docs/interpreter/cassandra.md new file mode 100644 index 00000000000..b53295c1468 --- /dev/null +++ b/docs/docs/interpreter/cassandra.md @@ -0,0 +1,807 @@ +--- +layout: page +title: "Cassandra Interpreter" +description: "Cassandra Interpreter" +group: manual +--- +{% include JB/setup %} + +
+## 1. Cassandra CQL Interpreter for Apache Zeppelin + +
+ + + + + + + + + + + +
+| Name | Class | Description |
+|------|-------|-------------|
+| %cassandra | CassandraInterpreter | Provides interpreter for Apache Cassandra CQL query language |
+ +
+ +## 2. Enabling Cassandra Interpreter + + In a notebook, to enable the **Cassandra** interpreter, click on the **Gear** icon and select **Cassandra** + +
+ ![Interpreter Binding](/assets/themes/zeppelin/img/docs-img/cassandra-InterpreterBinding.png) + + ![Interpreter Selection](/assets/themes/zeppelin/img/docs-img/cassandra-InterpreterSelection.png) +
+ +
+ +## 3. Using the Cassandra Interpreter + + In a paragraph, use **_%cassandra_** to select the **Cassandra** interpreter and then input all commands. + + To access the interactive help, type **HELP;** + +
+ ![Interactive Help](/assets/themes/zeppelin/img/docs-img/cassandra-InteractiveHelp.png) +
+ +
+ +## 4. Interpreter Commands + + The **Cassandra** interpreter accepts the following commands + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Command Type | Command Name | Description |
+|--------------|--------------|-------------|
+| Help command | HELP | Display the interactive help menu |
+| Schema commands | DESCRIBE KEYSPACE, DESCRIBE CLUSTER, DESCRIBE TABLES ... | Custom commands to describe the Cassandra schema |
+| Option commands | @consistency, @retryPolicy, @fetchSize ... | Inject runtime options to all statements in the paragraph |
+| Prepared statement commands | @prepare, @bind, @remove_prepare | Let you register a prepared command and re-use it later by injecting bound values |
+| Native CQL statements | All CQL-compatible statements (SELECT, INSERT, CREATE ...) | All CQL statements are executed directly against the Cassandra server |
+
+ +
+## 5. CQL statements + +This interpreter is compatible with any CQL statement supported by Cassandra. Ex: + +```sql + + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + SELECT * FROM users WHERE login='jdoe'; +``` + +Each statement should be separated by a semi-colon ( **;** ) except the special commands below: + +1. @prepare +2. @bind +3. @remove_prepare +4. @consistency +5. @serialConsistency +6. @timestamp +7. @retryPolicy +8. @fetchSize + +Multi-line statements as well as multiple statements on the same line are also supported as long as they are +separated by a semi-colon. Ex: + +```sql + + USE spark_demo; + + SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM countries LIMIT 1; + + SELECT * + FROM artists + WHERE login='jlennon'; +``` + +Batch statements are supported and can span multiple lines, as well as DDL(CREATE/ALTER/DROP) statements: + +```sql + + BEGIN BATCH + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC'); + APPLY BATCH; + + CREATE TABLE IF NOT EXISTS test( + key int PRIMARY KEY, + value text + ); +``` + +CQL statements are case-insensitive (except for column names and values). +This means that the following statements are equivalent and valid: + +```sql + + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + Insert into users(login,name) vAlues('hsue','Helen SUE'); +``` + +The complete list of all CQL statements and versions can be found below: +
+ + + + + + + + + + + + + + + + + +
+| Cassandra Version | Documentation Link |
+|-------------------|--------------------|
+| 2.2 | http://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html |
+| 2.1 & 2.0 | http://docs.datastax.com/en/cql/3.1/cql/cql_intro_c.html |
+| 1.2 | http://docs.datastax.com/en/cql/3.0/cql/aboutCQL.html |
+
+ +
+ +## 6. Comments in statements + +It is possible to add comments between statements. Single line comments start with the hash sign (#). Multi-line comments are enclosed between /** and **/. Ex: + +```sql + + #First comment + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + + /** + Multi line + comments + **/ + Insert into users(login,name) vAlues('hsue','Helen SUE'); +``` + +
+
+## 7. Syntax Validation
+
+The interpreter ships with a built-in syntax validator. This validator only checks for basic syntax errors.
+All CQL-related syntax validation is delegated directly to **Cassandra**.
+
+Most of the time, syntax errors are due to **missing semi-colons** between statements or **typos**.
+
+ +## 8. Schema commands + +To make schema discovery easier and more interactive, the following commands are supported: +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Command | Description |
+|---------|-------------|
+| `DESCRIBE CLUSTER;` | Show the current cluster name and its partitioner |
+| `DESCRIBE KEYSPACES;` | List all existing keyspaces in the cluster and their configuration (replication factor, durable write ...) |
+| `DESCRIBE TABLES;` | List all existing keyspaces in the cluster and, for each, all the table names |
+| `DESCRIBE TYPES;` | List all existing user defined types in the current (logged) keyspace |
+| `DESCRIBE FUNCTIONS <keyspace_name>;` | List all existing user defined functions in the given keyspace |
+| `DESCRIBE AGGREGATES <keyspace_name>;` | List all existing user defined aggregates in the given keyspace |
+| `DESCRIBE KEYSPACE <keyspace_name>;` | Describe the given keyspace configuration and all its table details (name, columns, ...) |
+| `DESCRIBE TABLE (<keyspace_name>).<table_name>;` | Describe the given table. If the keyspace is not provided, the current logged in keyspace is used. If there is no logged in keyspace, the default system keyspace is used. If no table is found, an error message is raised |
+| `DESCRIBE TYPE (<keyspace_name>).<type_name>;` | Describe the given type (UDT). If the keyspace is not provided, the current logged in keyspace is used. If there is no logged in keyspace, the default system keyspace is used. If no type is found, an error message is raised |
+| `DESCRIBE FUNCTION (<keyspace_name>).<function_name>;` | Describe the given user defined function. The keyspace is optional |
+| `DESCRIBE AGGREGATE (<keyspace_name>).<aggregate_name>;` | Describe the given user defined aggregate. The keyspace is optional |
+
+
+The schema objects (cluster, keyspace, table, type, function and aggregate) are displayed in a tabular format.
+A drop-down menu in the top left corner expands object details. The icon legend is shown in the top right menu.
+
+
+ ![Describe Schema](/assets/themes/zeppelin/img/docs-img/cassandra-DescribeSchema.png) +
+ +
+ +## 9. Runtime Parameters + +Sometimes you want to be able to pass runtime query parameters to your statements. +Those parameters are not part of the CQL specs and are specific to the interpreter. +Below is the list of all parameters: + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Parameter | Syntax | Description |
+|-----------|--------|-------------|
+| Consistency Level | @consistency=value | Apply the given consistency level to all queries in the paragraph |
+| Serial Consistency Level | @serialConsistency=value | Apply the given serial consistency level to all queries in the paragraph |
+| Timestamp | @timestamp=long value | Apply the given timestamp to all queries in the paragraph. Please note that a timestamp value passed directly in a CQL statement will override this value |
+| Retry Policy | @retryPolicy=value | Apply the given retry policy to all queries in the paragraph |
+| Fetch Size | @fetchSize=integer value | Apply the given fetch size to all queries in the paragraph |
+
+ + Some parameters only accept restricted values: + +
+
+ + + + + + + + + + + + + + + + + + + + + + + + + +
+| Parameter | Possible Values |
+|-----------|-----------------|
+| Consistency Level | ALL, ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM |
+| Serial Consistency Level | SERIAL, LOCAL_SERIAL |
+| Timestamp | Any long value |
+| Retry Policy | DEFAULT, DOWNGRADING_CONSISTENCY, FALLTHROUGH, LOGGING_DEFAULT, LOGGING_DOWNGRADING, LOGGING_FALLTHROUGH |
+| Fetch Size | Any integer value |
+
+
+> Please note that you should **not** add a semi-colon ( **;** ) at the end of a parameter statement
+
+Some examples:
+
+```sql
+
+    CREATE TABLE IF NOT EXISTS spark_demo.ts(
+        key int PRIMARY KEY,
+        value text
+    );
+    TRUNCATE spark_demo.ts;
+
+    # Timestamp in the past
+    @timestamp=10
+
+    # Force timestamp directly in the first insert
+    INSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100;
+
+    # Select some data to make the clock turn
+    SELECT * FROM spark_demo.albums LIMIT 100;
+
+    # Now insert using the timestamp parameter set at the beginning(10)
+    INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert');
+
+    # Check for the result. You should see 'first insert'
+    SELECT value FROM spark_demo.ts WHERE key=1;
+```
+
+Some remarks about query parameters:
+
+> 1. **many** query parameters can be set in the same paragraph
+> 2. if the **same** query parameter is set many times with different values, the interpreter only takes the first value into account
+> 3. each query parameter applies to **all CQL statements** in the same paragraph, unless you override the option using plain CQL text (like forcing timestamp with the USING clause)
+> 4. the order of query parameters relative to CQL statements does not matter
+
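The remarks above (first value wins, statement-level options override paragraph-level ones) can be illustrated with a small Python sketch. This is not the interpreter's actual code: `resolve_parameters`, `effective_timestamp` and `PARAM_PATTERN` are hypothetical names chosen for illustration, and the parsing is deliberately simplified.

```python
import re

# Hypothetical pattern for the runtime parameters listed in this section.
PARAM_PATTERN = re.compile(
    r"^@(consistency|serialConsistency|timestamp|retryPolicy|fetchSize)=(.+)$"
)


def resolve_parameters(paragraph_lines):
    """Collect runtime parameters, keeping only the first value seen per name."""
    resolved = {}
    for line in paragraph_lines:
        match = PARAM_PATTERN.match(line.strip())
        if match:
            name, value = match.group(1), match.group(2).strip()
            # setdefault ignores later values for a name that is already set
            resolved.setdefault(name, value)
    return resolved


def effective_timestamp(statement, resolved):
    """A USING TIMESTAMP clause in the CQL text overrides @timestamp."""
    inline = re.search(r"USING\s+TIMESTAMP\s+(\d+)", statement, re.IGNORECASE)
    if inline:
        return int(inline.group(1))
    if "timestamp" in resolved:
        return int(resolved["timestamp"])
    return None  # no explicit timestamp; the server or driver assigns one
```

Because parameters are collected for the whole paragraph before any statement is considered, their position relative to the CQL statements does not matter in this sketch, which mirrors remark 4.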
+
+## 10. Support for Prepared Statements
+
+For performance reasons, it is better to prepare statements beforehand and reuse them later by providing bound values.
+This interpreter provides 3 commands to handle prepared and bound statements:
+
+1. **@prepare**
+2. **@bind**
+3. **@remove_prepare**
+
+Example:
+
+```
+
+    @prepare[statement_name]=...
+
+    @bind[statement_name]='text', 1223, '2015-07-30 12:00:01', null, true, ['list_item1', 'list_item2']
+
+    @bind[statement_name_with_no_bound_value]
+
+    @remove_prepare[statement_name]
+```
+
+#### a. @prepare +
+You can use the syntax _"@prepare[statement_name]=SELECT ..."_ to create a prepared statement.
+The _statement_name_ is **mandatory** because the interpreter prepares the given statement with the Java driver and
+saves the generated prepared statement in an **internal hash map**, using the provided _statement_name_ as the search key.
+
+> Please note that this internal prepared statement map is shared by **all notebooks** and **all paragraphs** because
+there is only one instance of the interpreter for Cassandra
+
+> If the interpreter encounters **several** @prepare commands for the **same _statement_name_ (key)**, only the **first** statement is taken into account.
+
+Example:
+
+```
+
+    @prepare[select]=SELECT * FROM spark_demo.albums LIMIT ?
+
+    @prepare[select]=SELECT * FROM spark_demo.artists LIMIT ?
+```
+
+For the above example, the prepared statement is _SELECT * FROM spark_demo.albums LIMIT ?_.
+_SELECT * FROM spark_demo.artists LIMIT ?_ is ignored because an entry already exists in the prepared statements map with the key select.
+
+In the context of **Zeppelin**, a notebook can be scheduled to run at regular intervals,
+so it is necessary to **avoid re-preparing the same statement many times (considered an anti-pattern)**.
+
+
+#### b. @bind +
+Once the statement is prepared (possibly in a separate notebook/paragraph), you can bind values to it:
+
+```
+    @bind[select_first]=10
+```
+
+Bound values are not mandatory for the **@bind** statement. However, if you provide bound values, they must follow this syntax:
+
+* String values should be enclosed in single quotes ( ' )
+* Date values should be enclosed in single quotes ( ' ) and respect the formats:
+  1. yyyy-MM-dd HH:MM:ss
+  2. yyyy-MM-dd HH:MM:ss.SSS
+* **null** is parsed as-is
+* **boolean** (true|false) are parsed as-is
+* collection values must follow the **[standard CQL syntax]**:
+  * list: ['list_item1', 'list_item2', ...]
+  * set: {'set_item1', 'set_item2', …}
+  * map: {'key1': 'val1', 'key2': 'val2', …}
+* **tuple** values should be enclosed in parentheses (see **[Tuple CQL syntax]**): ('text', 123, true)
+* **udt** values should be enclosed in braces (see **[UDT CQL syntax]**): {street_name: 'Beverly Hills', number: 104, zip_code: 90020, state: 'California', …}
+
+> It is possible to use the @bind statement inside a batch:
+>
+> ```sql
+>
+>     BEGIN BATCH
+>         @bind[insert_user]='jdoe','John DOE'
+>         UPDATE users SET age = 27 WHERE login='hsue';
+>     APPLY BATCH;
+> ```
+
+
+#### c. @remove_prepare +
+To prevent a prepared statement from staying in the prepared statement map forever, you can use the
+**@remove_prepare[statement_name]** syntax to remove it.
+Removing a non-existing prepared statement yields no error.
+
+
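The life cycle of a named prepared statement described in this section (first registration wins, binding looks the statement up by name, removal is silent) can be sketched in Python. This is a hedged illustration only: `PreparedStatementRegistry` is a hypothetical class, it stores plain CQL strings instead of driver objects, and raising `KeyError` when binding an unknown name is an assumption that the text above does not specify.

```python
class PreparedStatementRegistry:
    """Toy model of the shared prepared statement map (illustrative only)."""

    def __init__(self):
        # statement_name -> CQL text; the real map would hold driver objects
        self._statements = {}

    def prepare(self, name, cql):
        """Register a statement; a later @prepare with the same name is ignored."""
        return self._statements.setdefault(name, cql)

    def bind(self, name, *values):
        """Pair a registered statement with bound values (values are optional)."""
        if name not in self._statements:
            # Assumption: the text does not say what happens for unknown names
            raise KeyError(f"no prepared statement named {name!r}")
        return self._statements[name], values

    def remove(self, name):
        """Remove a statement; removing an unknown name raises no error."""
        self._statements.pop(name, None)
```

Since a single interpreter instance holds one such map, every notebook and paragraph bound to that instance would see the same entries, which is why the first registration for a given name is the one that sticks.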
+
+## 11. Using Dynamic Forms
+
+Instead of hard-coding your CQL queries, it is possible to use the mustache syntax ( **\{\{ \}\}** ) to inject simple-value or multiple-choice forms.
+
+The syntax for a simple parameter is: **\{\{input_Label=default value\}\}**. The default value is mandatory because the first time the paragraph is executed,
+the CQL query is launched before the form is rendered, so at least one value must be provided.
+
+The syntax for a multiple-choice parameter is: **\{\{input_Label=value1 | value2 | … | valueN \}\}**. By default, the first choice is used for the CQL query
+the first time the paragraph is executed.
+
+Example:
+
+{% raw %}
+    #Secondary index on performer style
+    SELECT name, country, performer
+    FROM spark_demo.performers
+    WHERE name='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}'
+    AND styles CONTAINS '{{style=Rock}}';
+{% endraw %}
+
+
+In the above example, the first CQL query will be executed for _performer='Sheryl Crow' AND style='Rock'_.
+For subsequent queries, you can change the value directly using the form.
+
+> Please note that we enclosed the **\{\{ \}\}** block in single quotes ( **'** ) because Cassandra expects a String here.
+> We could also have used the **\{\{style='Rock'\}\}** syntax, but then the value displayed on the form would be **_'Rock'_** and not **_Rock_**.
+
+It is also possible to use dynamic forms for **prepared statements**:
+
+{% raw %}
+
+    @bind[select]='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}', '{{style=Rock}}'
+
+{% endraw %}
+
+
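To make the default-value rule concrete, here is a minimal Python sketch of how the mustache blocks could be parsed and substituted. It is an assumption-laden illustration, not Zeppelin's implementation: `FORM_PATTERN`, `parse_forms` and `render` are hypothetical names, and only the two form shapes described above (single default, pipe-separated choices) are handled.

```python
import re

# Matches one form block: a label, an equals sign and either a default value
# or pipe-separated choices, wrapped in double curly braces.
FORM_PATTERN = re.compile(r"\{\{\s*(\w+)\s*=\s*([^}]*?)\s*\}\}")


def parse_forms(query):
    """Return {label: [choices]}; the first choice acts as the default."""
    forms = {}
    for label, raw in FORM_PATTERN.findall(query):
        forms[label] = [choice.strip() for choice in raw.split("|")]
    return forms


def render(query, selected=None):
    """Replace each form block with the selected value, or with its default."""
    selected = selected or {}

    def substitute(match):
        label = match.group(1)
        choices = [choice.strip() for choice in match.group(2).split("|")]
        return selected.get(label, choices[0])

    return FORM_PATTERN.sub(substitute, query)
```

On the first run no form values exist yet, so `render(query)` falls back to the first choice of every block; later runs would pass the values picked in the form through `selected`.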
+
+## 12. Execution parallelism and shared states
+
+It is possible to execute many paragraphs in parallel. However, on the back-end, we are still using synchronous queries.
+_Asynchronous execution_ is only possible if a `Future` value can be returned in the `InterpreterResult`.
+It may be an interesting proposal for the **Zeppelin** project.
+
+Another caveat is that the same `com.datastax.driver.core.Session` object is used for **all** notebooks and paragraphs.
+Consequently, if you use the **USE _keyspace name_;** statement to log into a keyspace, it will change the keyspace for
+**all current users** of the **Cassandra** interpreter because we only create 1 `com.datastax.driver.core.Session` object
+per instance of the **Cassandra** interpreter.
+
+The same remark applies to the **prepared statement hash map**: it is shared by **all users** of the same instance of the **Cassandra** interpreter.
+
+Until **Zeppelin** offers real multi-user separation, there is a work-around to segregate user environments and state:
+_create different **Cassandra** interpreter instances_
+
+For this, first go to the **Interpreter** menu and click on the **Create** button
+
+
+
+ ![Create Interpreter](/assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInstance.png) +
+
+In the interpreter creation form, put **cass-instance2** as **Name** and select **cassandra**
+in the interpreter drop-down list
+
+
+
+ ![Interpreter Name](/assets/themes/zeppelin/img/docs-img/cassandra-InterpreterName.png) +
+ + Click on **Save** to create the new interpreter instance. Now you should be able to see it in the interpreter list. + +
+
+
+ ![Interpreter In List](/assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInList.png) +
+ +Go back to your notebook and click on the **Gear** icon to configure interpreter bindings. +You should be able to see and select the **cass-instance2** interpreter instance in the available +interpreter list instead of the standard **cassandra** instance. + +
+
+
+ ![Interpreter Instance Selection](/assets/themes/zeppelin/img/docs-img/cassandra-InterpreterInstanceSelection.png) +
+ +
+ +## 13. Interpreter Configuration + +To configure the **Cassandra** interpreter, go to the **Interpreter** menu and scroll down to change the parameters. +The **Cassandra** interpreter is using the official **[Cassandra Java Driver]** and most of the parameters are used +to configure the Java driver + +Below are the configuration parameters and their default value. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Property Name | Description | Default Value |
+|---------------|-------------|---------------|
+| cassandra.cluster | Name of the Cassandra cluster to connect to | Test Cluster |
+| cassandra.compression.protocol | On wire compression. Possible values are: NONE, SNAPPY, LZ4 | NONE |
+| cassandra.credentials.username | If security is enabled, provide the login | none |
+| cassandra.credentials.password | If security is enabled, provide the password | none |
+| cassandra.hosts | Comma separated Cassandra hosts (DNS name or IP address). Ex: '192.168.0.12,node2,node3' | localhost |
+| cassandra.interpreter.parallelism | Number of concurrent paragraphs (query blocks) that can be executed | 10 |
+| cassandra.keyspace | Default keyspace to connect to. It is strongly recommended to leave the default value and prefix the table name with the actual keyspace in all of your queries | system |
+| cassandra.load.balancing.policy | Load balancing policy. Default = new TokenAwarePolicy(new DCAwareRoundRobinPolicy()). To specify your own policy, provide the fully qualified class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using Class.forName(FQCN) | DEFAULT |
+| cassandra.max.schema.agreement.wait.second | Cassandra max schema agreement wait in seconds | 10 |
+| cassandra.pooling.core.connection.per.host.local | Protocol V2 and below default = 2. Protocol V3 and above default = 1 | 2 |
+| cassandra.pooling.core.connection.per.host.remote | Protocol V2 and below default = 1. Protocol V3 and above default = 1 | 1 |
+| cassandra.pooling.heartbeat.interval.seconds | Cassandra pool heartbeat interval in seconds | 30 |
+| cassandra.pooling.idle.timeout.seconds | Cassandra idle timeout in seconds | 120 |
+| cassandra.pooling.max.connection.per.host.local | Protocol V2 and below default = 8. Protocol V3 and above default = 1 | 8 |
+| cassandra.pooling.max.connection.per.host.remote | Protocol V2 and below default = 2. Protocol V3 and above default = 1 | 2 |
+| cassandra.pooling.max.request.per.connection.local | Protocol V2 and below default = 128. Protocol V3 and above default = 1024 | 128 |
+| cassandra.pooling.max.request.per.connection.remote | Protocol V2 and below default = 128. Protocol V3 and above default = 256 | 128 |
+| cassandra.pooling.new.connection.threshold.local | Protocol V2 and below default = 100. Protocol V3 and above default = 800 | 100 |
+| cassandra.pooling.new.connection.threshold.remote | Protocol V2 and below default = 100. Protocol V3 and above default = 200 | 100 |
+| cassandra.pooling.pool.timeout.millisecs | Cassandra pool timeout in milliseconds | 5000 |
+| cassandra.protocol.version | Cassandra binary protocol version | 3 |
+| cassandra.query.default.consistency | Cassandra query default consistency level. Available values: ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL | ONE |
+| cassandra.query.default.fetchSize | Cassandra query default fetch size | 5000 |
+| cassandra.query.default.serial.consistency | Cassandra query default serial consistency level. Available values: SERIAL, LOCAL_SERIAL | SERIAL |
+| cassandra.reconnection.policy | Cassandra Reconnection Policy. Default = new ExponentialReconnectionPolicy(1000, 10 * 60 * 1000). To specify your own policy, provide the fully qualified class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using Class.forName(FQCN) | DEFAULT |
+| cassandra.retry.policy | Cassandra Retry Policy. Default = DefaultRetryPolicy.INSTANCE. To specify your own policy, provide the fully qualified class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using Class.forName(FQCN) | DEFAULT |
+| cassandra.socket.connection.timeout.millisecs | Cassandra socket default connection timeout in milliseconds | 500 |
+| cassandra.socket.read.timeout.millisecs | Cassandra socket read timeout in milliseconds | 12000 |
+| cassandra.socket.tcp.no_delay | Cassandra socket TCP no delay | true |
+| cassandra.speculative.execution.policy | Cassandra Speculative Execution Policy. Default = NoSpeculativeExecutionPolicy.INSTANCE. To specify your own policy, provide the fully qualified class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using Class.forName(FQCN) | DEFAULT |
+ +
+ +## 14. Bugs & Contacts + + If you encounter a bug for this interpreter, please create a **[JIRA]** ticket and ping me on Twitter + at **[@doanduyhai]** + + +[Cassandra Java Driver]: https://github.com/datastax/java-driver +[standard CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html +[Tuple CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tupleType.html +[UDT CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_using/cqlUseUDT.html +[JIRA]: https://issues.apache.org/jira/browse/ZEPPELIN-382?jql=project%20%3D%20ZEPPELIN +[@doanduyhai]: https://twitter.com/doanduyhai From 01716e1592ceafad4d3637e5e763334016137a69 Mon Sep 17 00:00:00 2001 From: DuyHai DOAN Date: Wed, 6 Jan 2016 10:30:35 +0100 Subject: [PATCH 2/8] Cassandra Interpreter V2 doc --- docs/docs/index.md | 54 -- docs/docs/interpreter/cassandra.md | 807 -------------------- docs/interpreter/cassandra.md | 1095 +++++++++++++++------------- 3 files changed, 602 insertions(+), 1354 deletions(-) delete mode 100644 docs/docs/index.md delete mode 100644 docs/docs/interpreter/cassandra.md diff --git a/docs/docs/index.md b/docs/docs/index.md deleted file mode 100644 index 2c14026fedf..00000000000 --- a/docs/docs/index.md +++ /dev/null @@ -1,54 +0,0 @@ ---- -layout: page -title: "Docs" -description: "" -group: nav-right ---- -{% include JB/setup %} - -### Install - -* [Install](./install/install.html) -* [YARN Install](./install/yarn_install.html) - -### Tutorial - -* [Tutorial](./tutorial/tutorial.html) - -### Interpreter - -**[Interpreters in zeppelin](manual/interpreters.html)** - -* [cassandra](./interpreter/cassandra.html) -* [flink](./interpreter/flink.html) -* [geode](./interpreter/geode.html) -* [hive](../docs/pleasecontribute.html) -* [ignite](../docs/pleasecontribute.html) -* [lens](./interpreter/lens.html) -* [md](../docs/pleasecontribute.html) -* [postgresql, hawq](./interpreter/postgresql.html) -* [sh](../docs/pleasecontribute.html) -* 
[spark](./interpreter/spark.html) -* [tajo](../docs/pleasecontribute.html) - -### Display System - -* [text](./displaysystem/display.html) -* [html](./displaysystem/display.html#html) -* [table](./displaysystem/table.html) -* [angular](./displaysystem/angular.html) (Beta) - -### Manual - -* [Dynamic Form](./manual/dynamicform.html) -* [Notebook as Homepage](./manual/notebookashomepage.html) - -### REST API - * [Interpreter API](./rest-api/rest-interpreter.html) - * [Notebook API](./rest-api/rest-notebook.html) - -### Development - -* [Writing Zeppelin Interpreter](./development/writingzeppelininterpreter.html) -* [How to contribute (code)](./development/howtocontribute.html) -* [How to contribute (website)](./development/howtocontributewebsite.html) diff --git a/docs/docs/interpreter/cassandra.md b/docs/docs/interpreter/cassandra.md deleted file mode 100644 index b53295c1468..00000000000 --- a/docs/docs/interpreter/cassandra.md +++ /dev/null @@ -1,807 +0,0 @@ ---- -layout: page -title: "Cassandra Interpreter" -description: "Cassandra Interpreter" -group: manual ---- -{% include JB/setup %} - -
-## 1. Cassandra CQL Interpreter for Apache Zeppelin - -
- - - - - - - - - - - -
-| Name | Class | Description |
-|------|-------|-------------|
-| %cassandra | CassandraInterpreter | Provides interpreter for Apache Cassandra CQL query language |
- -
- -## 2. Enabling Cassandra Interpreter - - In a notebook, to enable the **Cassandra** interpreter, click on the **Gear** icon and select **Cassandra** - -
- ![Interpreter Binding](/assets/themes/zeppelin/img/docs-img/cassandra-InterpreterBinding.png) - - ![Interpreter Selection](/assets/themes/zeppelin/img/docs-img/cassandra-InterpreterSelection.png) -
- -
- -## 3. Using the Cassandra Interpreter - - In a paragraph, use **_%cassandra_** to select the **Cassandra** interpreter and then input all commands. - - To access the interactive help, type **HELP;** - -
- ![Interactive Help](/assets/themes/zeppelin/img/docs-img/cassandra-InteractiveHelp.png) -
- -
- -## 4. Interpreter Commands - - The **Cassandra** interpreter accepts the following commands - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| Command Type | Command Name | Description |
-|--------------|--------------|-------------|
-| Help command | HELP | Display the interactive help menu |
-| Schema commands | DESCRIBE KEYSPACE, DESCRIBE CLUSTER, DESCRIBE TABLES ... | Custom commands to describe the Cassandra schema |
-| Option commands | @consistency, @retryPolicy, @fetchSize ... | Inject runtime options to all statements in the paragraph |
-| Prepared statement commands | @prepare, @bind, @remove_prepare | Let you register a prepared command and re-use it later by injecting bound values |
-| Native CQL statements | All CQL-compatible statements (SELECT, INSERT, CREATE ...) | All CQL statements are executed directly against the Cassandra server |
-
- -
-## 5. CQL statements - -This interpreter is compatible with any CQL statement supported by Cassandra. Ex: - -```sql - - INSERT INTO users(login,name) VALUES('jdoe','John DOE'); - SELECT * FROM users WHERE login='jdoe'; -``` - -Each statement should be separated by a semi-colon ( **;** ) except the special commands below: - -1. @prepare -2. @bind -3. @remove_prepare -4. @consistency -5. @serialConsistency -6. @timestamp -7. @retryPolicy -8. @fetchSize - -Multi-line statements as well as multiple statements on the same line are also supported as long as they are -separated by a semi-colon. Ex: - -```sql - - USE spark_demo; - - SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM countries LIMIT 1; - - SELECT * - FROM artists - WHERE login='jlennon'; -``` - -Batch statements are supported and can span multiple lines, as well as DDL(CREATE/ALTER/DROP) statements: - -```sql - - BEGIN BATCH - INSERT INTO users(login,name) VALUES('jdoe','John DOE'); - INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC'); - APPLY BATCH; - - CREATE TABLE IF NOT EXISTS test( - key int PRIMARY KEY, - value text - ); -``` - -CQL statements are case-insensitive (except for column names and values). -This means that the following statements are equivalent and valid: - -```sql - - INSERT INTO users(login,name) VALUES('jdoe','John DOE'); - Insert into users(login,name) vAlues('hsue','Helen SUE'); -``` - -The complete list of all CQL statements and versions can be found below: -
- - - - - - - - - - - - - - - - - -
-| Cassandra Version | Documentation Link |
-|-------------------|--------------------|
-| 2.2 | http://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html |
-| 2.1 & 2.0 | http://docs.datastax.com/en/cql/3.1/cql/cql_intro_c.html |
-| 1.2 | http://docs.datastax.com/en/cql/3.0/cql/aboutCQL.html |
-
- -
- -## 6. Comments in statements - -It is possible to add comments between statements. Single line comments start with the hash sign (#). Multi-line comments are enclosed between /** and **/. Ex: - -```sql - - #First comment - INSERT INTO users(login,name) VALUES('jdoe','John DOE'); - - /** - Multi line - comments - **/ - Insert into users(login,name) vAlues('hsue','Helen SUE'); -``` - -
-
-## 7. Syntax Validation
-
-The interpreter ships with a built-in syntax validator. This validator only checks for basic syntax errors.
-All CQL-related syntax validation is delegated directly to **Cassandra**.
-
-Most of the time, syntax errors are due to **missing semi-colons** between statements or **typos**.
-
- -## 8. Schema commands - -To make schema discovery easier and more interactive, the following commands are supported: -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| Command | Description |
-|---------|-------------|
-| `DESCRIBE CLUSTER;` | Show the current cluster name and its partitioner |
-| `DESCRIBE KEYSPACES;` | List all existing keyspaces in the cluster and their configuration (replication factor, durable write ...) |
-| `DESCRIBE TABLES;` | List all existing keyspaces in the cluster and, for each, all the table names |
-| `DESCRIBE TYPES;` | List all existing user defined types in the current (logged) keyspace |
-| `DESCRIBE FUNCTIONS <keyspace_name>;` | List all existing user defined functions in the given keyspace |
-| `DESCRIBE AGGREGATES <keyspace_name>;` | List all existing user defined aggregates in the given keyspace |
-| `DESCRIBE KEYSPACE <keyspace_name>;` | Describe the given keyspace configuration and all its table details (name, columns, ...) |
-| `DESCRIBE TABLE (<keyspace_name>).<table_name>;` | Describe the given table. If the keyspace is not provided, the current logged in keyspace is used. If there is no logged in keyspace, the default system keyspace is used. If no table is found, an error message is raised |
-| `DESCRIBE TYPE (<keyspace_name>).<type_name>;` | Describe the given type (UDT). If the keyspace is not provided, the current logged in keyspace is used. If there is no logged in keyspace, the default system keyspace is used. If no type is found, an error message is raised |
-| `DESCRIBE FUNCTION (<keyspace_name>).<function_name>;` | Describe the given user defined function. The keyspace is optional |
-| `DESCRIBE AGGREGATE (<keyspace_name>).<aggregate_name>;` | Describe the given user defined aggregate. The keyspace is optional |
-
-
-The schema objects (cluster, keyspace, table, type, function and aggregate) are displayed in a tabular format.
-A drop-down menu in the top left corner expands object details. The icon legend is shown in the top right menu.
-
-
- ![Describe Schema](/assets/themes/zeppelin/img/docs-img/cassandra-DescribeSchema.png) -
- -
- -## 9. Runtime Parameters - -Sometimes you want to be able to pass runtime query parameters to your statements. -Those parameters are not part of the CQL specs and are specific to the interpreter. -Below is the list of all parameters: - -
-
-<table>
-  <tr><th>Parameter</th><th>Syntax</th><th>Description</th></tr>
-  <tr><td>Consistency Level</td><td>@consistency=value</td><td>Apply the given consistency level to all queries in the paragraph</td></tr>
-  <tr><td>Serial Consistency Level</td><td>@serialConsistency=value</td><td>Apply the given serial consistency level to all queries in the paragraph</td></tr>
-  <tr><td>Timestamp</td><td>@timestamp=long value</td><td>
-    Apply the given timestamp to all queries in the paragraph.
-    Please note that timestamp value passed directly in CQL statement will override this value
-  </td></tr>
-  <tr><td>Retry Policy</td><td>@retryPolicy=value</td><td>Apply the given retry policy to all queries in the paragraph</td></tr>
-  <tr><td>Fetch Size</td><td>@fetchSize=integer value</td><td>Apply the given fetch size to all queries in the paragraph</td></tr>
-</table>
-
- - Some parameters only accept restricted values: - -
-
-<table>
-  <tr><th>Parameter</th><th>Possible Values</th></tr>
-  <tr><td>Consistency Level</td><td>ALL, ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM</td></tr>
-  <tr><td>Serial Consistency Level</td><td>SERIAL, LOCAL_SERIAL</td></tr>
-  <tr><td>Timestamp</td><td>Any long value</td></tr>
-  <tr><td>Retry Policy</td><td>DEFAULT, DOWNGRADING_CONSISTENCY, FALLTHROUGH, LOGGING_DEFAULT, LOGGING_DOWNGRADING, LOGGING_FALLTHROUGH</td></tr>
-  <tr><td>Fetch Size</td><td>Any integer value</td></tr>
-</table>
-
-
->Please note that you should **not** add a semi-colon ( **;** ) at the end of a parameter statement.
-
-Some examples:
-
-```sql
-
-    CREATE TABLE IF NOT EXISTS spark_demo.ts(
-        key int PRIMARY KEY,
-        value text
-    );
-    TRUNCATE spark_demo.ts;
-
-    # Timestamp in the past
-    @timestamp=10
-
-    # Force timestamp directly in the first insert
-    INSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100;
-
-    # Select some data to make the clock turn
-    SELECT * FROM spark_demo.albums LIMIT 100;
-
-    # Now insert using the timestamp parameter set at the beginning(10)
-    INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert');
-
-    # Check for the result. You should see 'first insert'
-    SELECT value FROM spark_demo.ts WHERE key=1;
-```
-
-Some remarks about query parameters:
-
-> 1. **several** query parameters can be set in the same paragraph
-> 2. if the **same** query parameter is set several times with different values, the interpreter only takes the first value into account
-> 3. each query parameter applies to **all CQL statements** in the same paragraph, unless you override the option using plain CQL text (like forcing timestamp with the USING clause)
-> 4. the order of each query parameter with regard to CQL statements does not matter
-
-
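The timestamp example above relies on Cassandra's last-write-wins rule: for a given cell, the write carrying the highest timestamp is the one a read returns. The following is a toy Python model of that rule and of remark 3 (a `USING TIMESTAMP` clause overrides `@timestamp`). It does not talk to Cassandra or to the interpreter; `effective_timestamp` and `ToyCell` are hypothetical names made up for this sketch.

```python
from typing import Optional


def effective_timestamp(paragraph_ts: Optional[int], using_ts: Optional[int], now: int) -> int:
    """Pick the write timestamp for one statement (illustrative model only).

    Priority modelled here: USING TIMESTAMP in the CQL text, then the
    paragraph-level @timestamp parameter, then the current clock.
    """
    if using_ts is not None:
        return using_ts
    if paragraph_ts is not None:
        return paragraph_ts
    return now


class ToyCell:
    """A single column value resolved with last-write-wins."""

    def __init__(self) -> None:
        self.timestamp: Optional[int] = None
        self.value: Optional[str] = None

    def write(self, value: str, timestamp: int) -> None:
        # Keep the incoming value only if it is strictly newer than the
        # stored one (ties are not modelled in this sketch).
        if self.timestamp is None or timestamp > self.timestamp:
            self.timestamp, self.value = timestamp, value


cell = ToyCell()
paragraph_ts = 10  # corresponds to @timestamp=10

# INSERT ... 'first insert' USING TIMESTAMP 100
cell.write("first insert", effective_timestamp(paragraph_ts, 100, now=1_000_000))
# INSERT ... 'second insert' (no USING clause, so @timestamp applies)
cell.write("second insert", effective_timestamp(paragraph_ts, None, now=1_000_000))

print(cell.value)  # first insert
```

The second insert is issued later but carries timestamp 10, which is older than 100, so the read still sees the first value.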
-
-## 10. Support for Prepared Statements
-
-For performance reasons, it is better to prepare statements beforehand and reuse them later by providing bound values.
-This interpreter provides 3 commands to handle prepared and bound statements:
-
-1. **@prepare**
-2. **@bind**
-3. **@remove_prepare**
-
-Example:
-
-```
-
-    @prepare[statement_name]=...
-
-    @bind[statement_name]='text', 1223, '2015-07-30 12:00:01', null, true, ['list_item1', 'list_item2']
-
-    @bind[statement_name_with_no_bound_value]
-
-    @remove_prepare[statement_name]
-```
-
-
-#### a. @prepare -
-You can use the syntax _"@prepare[statement_name]=SELECT ..."_ to create a prepared statement.
-The _statement_name_ is **mandatory** because the interpreter prepares the given statement with the Java driver and
-saves the generated prepared statement in an **internal hash map**, using the provided _statement_name_ as search key.
-
-> Please note that this internal prepared statement map is shared with **all notebooks** and **all paragraphs** because
-there is only one instance of the interpreter for Cassandra.
-
-> If the interpreter encounters **several** @prepare for the **same _statement_name_ (key)**, only the **first** statement will be taken into account.
-
-Example:
-
-```
-
-    @prepare[select]=SELECT * FROM spark_demo.albums LIMIT ?
-
-    @prepare[select]=SELECT * FROM spark_demo.artists LIMIT ?
-```
-
-For the above example, the prepared statement is _SELECT * FROM spark_demo.albums LIMIT ?_.
-_SELECT * FROM spark_demo.artists LIMIT ?_ is ignored because an entry already exists in the prepared statements map with the key select.
-
-In the context of **Zeppelin**, a notebook can be scheduled to be executed at regular intervals,
-thus it is necessary to **avoid re-preparing the same statement many times (considered an anti-pattern)**.
-
-
-#### b. @bind -
-Once the statement is prepared (possibly in a separate notebook/paragraph), you can bind values to it:
-
-```
-    @bind[select_first]=10
-```
-
-Bound values are not mandatory for the **@bind** statement. However, if you provide bound values, they need to comply with the following syntax:
-
-* String values should be enclosed between single quotes ( ' )
-* Date values should be enclosed between single quotes ( ' ) and respect the formats:
-  1. yyyy-MM-dd HH:mm:ss
-  2. yyyy-MM-dd HH:mm:ss.SSS
-* **null** is parsed as-is
-* **boolean** (true|false) are parsed as-is
-* collection values must follow the **[standard CQL syntax]**:
-  * list: ['list_item1', 'list_item2', ...]
-  * set: {'set_item1', 'set_item2', …}
-  * map: {'key1': 'val1', 'key2': 'val2', …}
-* **tuple** values should be enclosed between parentheses (see **[Tuple CQL syntax]**): ('text', 123, true)
-* **udt** values should be enclosed between braces (see **[UDT CQL syntax]**): {street_name: 'Beverly Hills', number: 104, zip_code: 90020, state: 'California', …}
-
-> It is possible to use the @bind statement inside a batch:
->
-> ```sql
->
-> BEGIN BATCH
->     @bind[insert_user]='jdoe','John DOE'
->     UPDATE users SET age = 27 WHERE login='hsue';
-> APPLY BATCH;
-> ```
-
-
-#### c. @remove_prepare -
-To prevent a prepared statement from staying in the prepared statement map forever, you can use the
-**@remove_prepare[statement_name]** syntax to remove it.
-Removing a non-existing prepared statement yields no error.
-
-
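The two map rules described in this section (the first **@prepare** for a name wins, and **@remove_prepare** on an unknown name is silent) can be sketched with a plain dictionary. This is a toy Python model under those stated rules, not the interpreter's actual code; `PreparedStatementRegistry` and its methods are hypothetical names, and the stored value is just the CQL text rather than a driver object.

```python
from typing import Dict, Optional


class PreparedStatementRegistry:
    """Toy stand-in for the interpreter's shared prepared-statement map."""

    def __init__(self) -> None:
        self._statements: Dict[str, str] = {}

    def prepare(self, name: str, cql: str) -> str:
        # setdefault stores cql only when name is absent, so the first
        # @prepare for a given name wins and later ones are ignored.
        return self._statements.setdefault(name, cql)

    def get(self, name: str) -> Optional[str]:
        return self._statements.get(name)

    def remove(self, name: str) -> None:
        # pop with a default never raises, mirroring the rule that
        # removing an unknown statement is not an error.
        self._statements.pop(name, None)


registry = PreparedStatementRegistry()
registry.prepare("select", "SELECT * FROM spark_demo.albums LIMIT ?")
registry.prepare("select", "SELECT * FROM spark_demo.artists LIMIT ?")
print(registry.get("select"))  # SELECT * FROM spark_demo.albums LIMIT ?

registry.remove("select")
registry.remove("select")  # second removal is a no-op
print(registry.get("select"))  # None
```

Because a scheduled notebook re-runs its paragraphs, the first-wins rule means a repeated **@prepare** with the same name costs nothing; to change the statement behind a name, remove it first.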
-
-## 11. Using Dynamic Forms
-
-Instead of hard-coding your CQL queries, it is possible to use the mustache syntax ( **\{\{ \}\}** ) to inject simple value or multiple choice forms.
-
-The syntax for a simple parameter is: **\{\{input_Label=default value\}\}**. The default value is mandatory because the first time the paragraph is executed,
-we launch the CQL query before rendering the form, so at least one value should be provided.
-
-The syntax for a multiple choice parameter is: **\{\{input_Label=value1 | value2 | … | valueN \}\}**. By default the first choice is used for the CQL query
-the first time the paragraph is executed.
-
-Example:
-
-{% raw %}
-    #Secondary index on performer style
-    SELECT name, country, performer
-    FROM spark_demo.performers
-    WHERE name='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}'
-    AND styles CONTAINS '{{style=Rock}}';
-{% endraw %}
-
-
-In the above example, the first CQL query will be executed for _performer='Sheryl Crow' AND style='Rock'_.
-For subsequent queries, you can change the value directly using the form.
-
-> Please note that we enclosed the **\{\{ \}\}** block between single quotes ( **'** ) because Cassandra expects a String here.
-> We could have also used the **\{\{style='Rock'\}\}** syntax, but then the value displayed on the form would be **_'Rock'_** and not **_Rock_**.
-
-It is also possible to use dynamic forms for **prepared statements**:
-
-{% raw %}
-
-    @bind[select]='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}', '{{style=Rock}}'
-
-{% endraw %}
-
-
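To make the first-execution rule concrete, here is a minimal Python sketch of how a paragraph's text could be resolved before any form has been rendered: a simple parameter yields its default value and a multiple choice parameter yields its first choice. It is an illustration under those stated rules only, not Zeppelin's form engine; `FORM` and `first_run_query` are hypothetical names.

{% raw %}
```python
import re

# Matches one form placeholder: two opening braces, a label, "=", the
# default value or "|"-separated choices, then two closing braces.
FORM = re.compile(r"\{\{([^=}]+)=([^}]*)\}\}")


def first_run_query(template: str) -> str:
    """Return the CQL text with every form replaced by its initial value."""

    def initial_value(match: re.Match) -> str:
        # A single default has no "|", so split() returns it unchanged;
        # for multiple choices the first entry is the initial value.
        choices = match.group(2).split("|")
        return choices[0].strip()

    return FORM.sub(initial_value, template)


template = (
    "SELECT name, country, performer FROM spark_demo.performers "
    "WHERE name='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}' "
    "AND styles CONTAINS '{{style=Rock}}';"
)
print(first_run_query(template))
```
{% endraw %}

The surrounding single quotes stay in the CQL text, which is why the form itself shows _Rock_ rather than _'Rock'_.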
-
-## 12. Execution parallelism and shared states
-
-It is possible to execute many paragraphs in parallel. However, on the back-end side, we're still using synchronous queries.
-_Asynchronous execution_ is only possible if a `Future` value can be returned in the `InterpreterResult`.
-It may be an interesting proposal for the **Zeppelin** project.
-
-Another caveat is that the same `com.datastax.driver.core.Session` object is used for **all** notebooks and paragraphs.
-Consequently, if you use the **USE _keyspace name_;** statement to log into a keyspace, it will change the keyspace for
-**all current users** of the **Cassandra** interpreter because we only create 1 `com.datastax.driver.core.Session` object
-per instance of the **Cassandra** interpreter.
-
-The same remark applies to the **prepared statement hash map**: it is shared by **all users** using the same instance of the **Cassandra** interpreter.
-
-Until **Zeppelin** offers real multi-user separation, there is a work-around to segregate user environments and states:
-_create different **Cassandra** interpreter instances_
-
-For this, first go to the **Interpreter** menu and click on the **Create** button
-
-
-
- ![Create Interpreter](/assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInstance.png) -
- -In the interpreter creation form, put **cass-instance2** as **Name** and select the **cassandra** -in the interpreter drop-down list -
-
-
- ![Interpreter Name](/assets/themes/zeppelin/img/docs-img/cassandra-InterpreterName.png) -
- - Click on **Save** to create the new interpreter instance. Now you should be able to see it in the interpreter list. - -
-
-
- ![Interpreter In List](/assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInList.png) -
- -Go back to your notebook and click on the **Gear** icon to configure interpreter bindings. -You should be able to see and select the **cass-instance2** interpreter instance in the available -interpreter list instead of the standard **cassandra** instance. - -
-
-
- ![Interpreter Instance Selection](/assets/themes/zeppelin/img/docs-img/cassandra-InterpreterInstanceSelection.png) -
- -
-
-## 13. Interpreter Configuration
-
-To configure the **Cassandra** interpreter, go to the **Interpreter** menu and scroll down to change the parameters.
-The **Cassandra** interpreter uses the official **[Cassandra Java Driver]** and most of the parameters are used
-to configure the Java driver.
-
-Below are the configuration parameters and their default values.
-
-<table>
-  <tr><th>Property Name</th><th>Description</th><th>Default Value</th></tr>
-  <tr><td>cassandra.cluster</td><td>Name of the Cassandra cluster to connect to</td><td>Test Cluster</td></tr>
-  <tr><td>cassandra.compression.protocol</td><td>On wire compression. Possible values are: NONE, SNAPPY, LZ4</td><td>NONE</td></tr>
-  <tr><td>cassandra.credentials.username</td><td>If security is enabled, provide the login</td><td>none</td></tr>
-  <tr><td>cassandra.credentials.password</td><td>If security is enabled, provide the password</td><td>none</td></tr>
-  <tr><td>cassandra.hosts</td><td>
-    Comma separated Cassandra hosts (DNS name or IP address).
-    Ex: '192.168.0.12,node2,node3'
-  </td><td>localhost</td></tr>
-  <tr><td>cassandra.interpreter.parallelism</td><td>Number of concurrent paragraphs (query blocks) that can be executed</td><td>10</td></tr>
-  <tr><td>cassandra.keyspace</td><td>
-    Default keyspace to connect to.
-    It is strongly recommended to keep the default value
-    and prefix the table name with the actual keyspace
-    in all of your queries
-  </td><td>system</td></tr>
-  <tr><td>cassandra.load.balancing.policy</td><td>
-    Load balancing policy. Default = new TokenAwarePolicy(new DCAwareRoundRobinPolicy()).
-    To specify your own policy, provide the fully qualified class name (FQCN) of your policy.
-    At runtime the interpreter will instantiate the policy using
-    Class.forName(FQCN)
-  </td><td>DEFAULT</td></tr>
-  <tr><td>cassandra.max.schema.agreement.wait.second</td><td>Cassandra max schema agreement wait in seconds</td><td>10</td></tr>
-  <tr><td>cassandra.pooling.core.connection.per.host.local</td><td>Protocol V2 and below default = 2. Protocol V3 and above default = 1</td><td>2</td></tr>
-  <tr><td>cassandra.pooling.core.connection.per.host.remote</td><td>Protocol V2 and below default = 1. Protocol V3 and above default = 1</td><td>1</td></tr>
-  <tr><td>cassandra.pooling.heartbeat.interval.seconds</td><td>Cassandra pool heartbeat interval in secs</td><td>30</td></tr>
-  <tr><td>cassandra.pooling.idle.timeout.seconds</td><td>Cassandra idle time out in seconds</td><td>120</td></tr>
-  <tr><td>cassandra.pooling.max.connection.per.host.local</td><td>Protocol V2 and below default = 8. Protocol V3 and above default = 1</td><td>8</td></tr>
-  <tr><td>cassandra.pooling.max.connection.per.host.remote</td><td>Protocol V2 and below default = 2. Protocol V3 and above default = 1</td><td>2</td></tr>
-  <tr><td>cassandra.pooling.max.request.per.connection.local</td><td>Protocol V2 and below default = 128. Protocol V3 and above default = 1024</td><td>128</td></tr>
-  <tr><td>cassandra.pooling.max.request.per.connection.remote</td><td>Protocol V2 and below default = 128. Protocol V3 and above default = 256</td><td>128</td></tr>
-  <tr><td>cassandra.pooling.new.connection.threshold.local</td><td>Protocol V2 and below default = 100. Protocol V3 and above default = 800</td><td>100</td></tr>
-  <tr><td>cassandra.pooling.new.connection.threshold.remote</td><td>Protocol V2 and below default = 100. Protocol V3 and above default = 200</td><td>100</td></tr>
-  <tr><td>cassandra.pooling.pool.timeout.millisecs</td><td>Cassandra pool time out in millisecs</td><td>5000</td></tr>
-  <tr><td>cassandra.protocol.version</td><td>Cassandra binary protocol version</td><td>3</td></tr>
-  <tr><td>cassandra.query.default.consistency</td><td>
-    Cassandra query default consistency level.
-    Available values: ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL
-  </td><td>ONE</td></tr>
-  <tr><td>cassandra.query.default.fetchSize</td><td>Cassandra query default fetch size</td><td>5000</td></tr>
-  <tr><td>cassandra.query.default.serial.consistency</td><td>
-    Cassandra query default serial consistency level.
-    Available values: SERIAL, LOCAL_SERIAL
-  </td><td>SERIAL</td></tr>
-  <tr><td>cassandra.reconnection.policy</td><td>
-    Cassandra Reconnection Policy.
-    Default = new ExponentialReconnectionPolicy(1000, 10 * 60 * 1000).
-    To specify your own policy, provide the fully qualified class name (FQCN) of your policy.
-    At runtime the interpreter will instantiate the policy using
-    Class.forName(FQCN)
-  </td><td>DEFAULT</td></tr>
-  <tr><td>cassandra.retry.policy</td><td>
-    Cassandra Retry Policy.
-    Default = DefaultRetryPolicy.INSTANCE.
-    To specify your own policy, provide the fully qualified class name (FQCN) of your policy.
-    At runtime the interpreter will instantiate the policy using
-    Class.forName(FQCN)
-  </td><td>DEFAULT</td></tr>
-  <tr><td>cassandra.socket.connection.timeout.millisecs</td><td>Cassandra socket default connection timeout in millisecs</td><td>500</td></tr>
-  <tr><td>cassandra.socket.read.timeout.millisecs</td><td>Cassandra socket read timeout in millisecs</td><td>12000</td></tr>
-  <tr><td>cassandra.socket.tcp.no_delay</td><td>Cassandra socket TCP no delay</td><td>true</td></tr>
-  <tr><td>cassandra.speculative.execution.policy</td><td>
-    Cassandra Speculative Execution Policy.
-    Default = NoSpeculativeExecutionPolicy.INSTANCE.
-    To specify your own policy, provide the fully qualified class name (FQCN) of your policy.
-    At runtime the interpreter will instantiate the policy using
-    Class.forName(FQCN)
-  </td><td>DEFAULT</td></tr>
-</table>
- -
-
-## 14. Bugs & Contacts
-
- If you encounter a bug for this interpreter, please create a **[JIRA]** ticket and ping me on Twitter
- at **[@doanduyhai]**
-
-
-[Cassandra Java Driver]: https://github.com/datastax/java-driver
-[standard CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html
-[Tuple CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tupleType.html
-[UDT CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_using/cqlUseUDT.html
-[JIRA]: https://issues.apache.org/jira/browse/ZEPPELIN-382?jql=project%20%3D%20ZEPPELIN
-[@doanduyhai]: https://twitter.com/doanduyhai
diff --git a/docs/interpreter/cassandra.md b/docs/interpreter/cassandra.md
index 3cec02d18e3..861bffcc765 100644
--- a/docs/interpreter/cassandra.md
+++ b/docs/interpreter/cassandra.md
@@ -6,7 +6,10 @@ group: manual
 ---
 {% include JB/setup %}
 
-## Cassandra CQL Interpreter for Apache Zeppelin
+
+## 1. Cassandra CQL Interpreter for Apache Zeppelin + +
@@ -20,26 +23,35 @@ group: manual
Name
-## Enabling Cassandra Interpreter -In a notebook, to enable the **Cassandra** interpreter, click on the **Gear** icon and select **Cassandra**. +
+ +## 2. Enabling Cassandra Interpreter + + In a notebook, to enable the **Cassandra** interpreter, click on the **Gear** icon and select **Cassandra** -
- -
- -
+
+ ![Interpreter Binding](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterBinding.png) + + ![Interpreter Selection](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterSelection.png) +
-## Using the Cassandra Interpreter -In a paragraph, use **_%cassandra_** to select the **Cassandra** interpreter and then input all commands. +
-To access the interactive help, type `HELP;` +## 3. Using the Cassandra Interpreter + + In a paragraph, use **_%cassandra_** to select the **Cassandra** interpreter and then input all commands. -
- -
+ To access the interactive help, type **HELP;** + +
+ ![Interactive Help](../assets/themes/zeppelin/img/docs-img/cassandra-InteractiveHelp.png) +
+ +
+ +## 4. Interpreter Commands -## Interpreter Commands -The **Cassandra** interpreter accepts the following commands. + The **Cassandra** interpreter accepts the following commands
@@ -73,17 +85,19 @@ The **Cassandra** interpreter accepts the following commands. -
All CQL-compatible statements (SELECT, INSERT, CREATE ...) All CQL statements are executed directly against the Cassandra server
+
-## CQL statements +
+## 5. CQL statements + This interpreter is compatible with any CQL statement supported by Cassandra. Ex: ```sql -INSERT INTO users(login,name) VALUES('jdoe','John DOE'); -SELECT * FROM users WHERE login='jdoe'; -``` + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + SELECT * FROM users WHERE login='jdoe'; +``` Each statement should be separated by a semi-colon ( **;** ) except the special commands below: @@ -96,276 +110,320 @@ Each statement should be separated by a semi-colon ( **;** ) except the special 7. @retryPolicy 8. @fetchSize -Multi-line statements as well as multiple statements on the same line are also supported as long as they are separated by a semi-colon. Ex: +Multi-line statements as well as multiple statements on the same line are also supported as long as they are +separated by a semi-colon. Ex: ```sql -USE spark_demo; + USE spark_demo; -SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM countries LIMIT 1; + SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM countries LIMIT 1; -SELECT * -FROM artists -WHERE login='jlennon'; + SELECT * + FROM artists + WHERE login='jlennon'; ``` Batch statements are supported and can span multiple lines, as well as DDL(CREATE/ALTER/DROP) statements: ```sql -BEGIN BATCH - INSERT INTO users(login,name) VALUES('jdoe','John DOE'); - INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC'); -APPLY BATCH; + BEGIN BATCH + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC'); + APPLY BATCH; -CREATE TABLE IF NOT EXISTS test( - key int PRIMARY KEY, - value text -); + CREATE TABLE IF NOT EXISTS test( + key int PRIMARY KEY, + value text + ); ``` -CQL statements are case-insensitive (except for column names and values). +CQL statements are case-insensitive (except for column names and values). 
This means that the following statements are equivalent and valid: ```sql -INSERT INTO users(login,name) VALUES('jdoe','John DOE'); -Insert into users(login,name) vAlues('hsue','Helen SUE'); + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + Insert into users(login,name) vAlues('hsue','Helen SUE'); ``` The complete list of all CQL statements and versions can be found below: +
+ + + + + + + + + + + + + + + + + +
Cassandra VersionDocumentation Link
2.2 + + http://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html + +
2.1 & 2.0 + + http://docs.datastax.com/en/cql/3.1/cql/cql_intro_c.html + +
1.2 + + http://docs.datastax.com/en/cql/3.0/cql/aboutCQL.html + +
+
- - - - - - - - - - - - - - - - - -
Cassandra VersionDocumentation Link
2.2 - - http://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html - -
2.1 & 2.0 - - http://docs.datastax.com/en/cql/3.1/cql/cql_intro_c.html - -
1.2 - - http://docs.datastax.com/en/cql/3.0/cql/aboutCQL.html - -
+
+ +## 6. Comments in statements -## Comments in statements -It is possible to add comments between statements. Single line comments start with the hash sign (#). Multi-line comments are enclosed between /** and **/. Ex: +It is possible to add comments between statements. Single line comments start with the **hash sign** (#) or **double slashes** (//). Multi-line comments are enclosed between /** and **/. Ex: ```sql -#First comment -INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + #Single line comment style 1 + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); -/** - Multi line - comments - **/ -Insert into users(login,name) vAlues('hsue','Helen SUE'); + //Single line comment style 2 + + /** + Multi line + comments + **/ + Insert into users(login,name) vAlues('hsue','Helen SUE'); ``` -## Syntax Validation -The interpreters is shipped with a built-in syntax validator. This validator only checks for basic syntax errors. -All CQL-related syntax validation is delegated directly to **Cassandra**. +
-Most of the time, syntax errors are due to **missing semi-colons** between statements or **typo errors**. +## 7. Syntax Validation -## Schema commands -To make schema discovery easier and more interactive, the following commands are supported: +The interpreter is shipped with a built-in syntax validator. This validator only checks for basic syntax errors. +All CQL-related syntax validation is delegated directly to **Cassandra**. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
CommandDescription
DESCRIBE CLUSTER;Show the current cluster name and its partitioner
DESCRIBE KEYSPACES;List all existing keyspaces in the cluster and their configuration (replication factor, durable write ...)
DESCRIBE TABLES;List all existing keyspaces in the cluster and for each, all the tables name
DESCRIBE TYPES;List all existing user defined types in the current (logged) keyspace
DESCRIBE FUNCTIONS <keyspace_name>;List all existing user defined functions in the given keyspace
DESCRIBE AGGREGATES <keyspace_name>;List all existing user defined aggregates in the given keyspace
DESCRIBE KEYSPACE <keyspace_name>;Describe the given keyspace configuration and all its table details (name, columns, ...)
DESCRIBE TABLE (<keyspace_name>).<table_name>; - Describe the given table. If the keyspace is not provided, the current logged in keyspace is used. - If there is no logged in keyspace, the default system keyspace is used. - If no table is found, an error message is raised. -
DESCRIBE TYPE (<keyspace_name>).<type_name>; - Describe the given type(UDT). If the keyspace is not provided, the current logged in keyspace is used. - If there is no logged in keyspace, the default system keyspace is used. - If no type is found, an error message is raised. -
DESCRIBE FUNCTION (<keyspace_name>).<function_name>;Describe the given user defined function. The keyspace is optional.
DESCRIBE AGGREGATE (<keyspace_name>).<aggregate_name>;Describe the given user defined aggregate. The keyspace is optional.
+Most of the time, syntax errors are due to **missing semi-colons** between statements or **typos**.
+
+
+ +## 8. Schema commands -The schema objects (cluster, keyspace, table, type, function and aggregate) are displayed in a tabular format. +To make schema discovery easier and more interactive, the following commands are supported: +
+<table>
+  <tr><th>Command</th><th>Description</th></tr>
+  <tr><td>DESCRIBE CLUSTER;</td><td>Show the current cluster name and its partitioner</td></tr>
+  <tr><td>DESCRIBE KEYSPACES;</td><td>List all existing keyspaces in the cluster and their configuration (replication factor, durable write ...)</td></tr>
+  <tr><td>DESCRIBE TABLES;</td><td>List all existing keyspaces in the cluster and, for each, all table names</td></tr>
+  <tr><td>DESCRIBE TYPES;</td><td>List all existing keyspaces in the cluster and, for each, all user-defined type names</td></tr>
+  <tr><td>DESCRIBE FUNCTIONS;</td><td>List all existing keyspaces in the cluster and, for each, all function names</td></tr>
+  <tr><td>DESCRIBE AGGREGATES;</td><td>List all existing keyspaces in the cluster and, for each, all aggregate names</td></tr>
+  <tr><td>DESCRIBE MATERIALIZED VIEWS;</td><td>List all existing keyspaces in the cluster and, for each, all materialized view names</td></tr>
+  <tr><td>DESCRIBE KEYSPACE &lt;keyspace_name&gt;;</td><td>Describe the given keyspace configuration and all its table details (name, columns, ...)</td></tr>
+  <tr><td>DESCRIBE TABLE (&lt;keyspace_name&gt;).&lt;table_name&gt;;</td><td>
+    Describe the given table. If the keyspace is not provided, the current logged-in keyspace is used.
+    If there is no logged-in keyspace, the default system keyspace is used.
+    If no table is found, an error message is raised
+  </td></tr>
+  <tr><td>DESCRIBE TYPE (&lt;keyspace_name&gt;).&lt;type_name&gt;;</td><td>
+    Describe the given type (UDT). If the keyspace is not provided, the current logged-in keyspace is used.
+    If there is no logged-in keyspace, the default system keyspace is used.
+    If no type is found, an error message is raised
+  </td></tr>
+  <tr><td>DESCRIBE FUNCTION (&lt;keyspace_name&gt;).&lt;function_name&gt;;</td><td>
+    Describe the given function. If the keyspace is not provided, the current logged-in keyspace is used.
+    If there is no logged-in keyspace, the default system keyspace is used.
+    If no function is found, an error message is raised
+  </td></tr>
+  <tr><td>DESCRIBE AGGREGATE (&lt;keyspace_name&gt;).&lt;aggregate_name&gt;;</td><td>
+    Describe the given aggregate. If the keyspace is not provided, the current logged-in keyspace is used.
+    If there is no logged-in keyspace, the default system keyspace is used.
+    If no aggregate is found, an error message is raised
+  </td></tr>
+  <tr><td>DESCRIBE MATERIALIZED VIEW (&lt;keyspace_name&gt;).&lt;view_name&gt;;</td><td>
+    Describe the given view. If the keyspace is not provided, the current logged-in keyspace is used.
+    If there is no logged-in keyspace, the default system keyspace is used.
+    If no view is found, an error message is raised
+  </td></tr>
+</table>
+
+ +The schema objects (cluster, keyspace, table, type, function and aggregate) are displayed in a tabular format. There is a drop-down menu on the top left corner to expand objects details. On the top right menu is shown the Icon legend. +
![Describe Schema](../assets/themes/zeppelin/img/docs-img/cassandra-DescribeSchema.png)
-## Runtime Parameters +
-Sometimes you want to be able to pass runtime query parameters to your statements. -Those parameters are not part of the CQL specs and are specific to the interpreter. -Below is the list of all parameters: +## 9. Runtime Parameters - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ParameterSyntaxDescription
Consistency Level@consistency=valueApply the given consistency level to all queries in the paragraph.
Serial Consistency Level@serialConsistency=valueApply the given serial consistency level to all queries in the paragraph.
Timestamp@timestamp=long value - Apply the given timestamp to all queries in the paragraph. - Please note that timestamp value passed directly in CQL statement will override this value. -
Retry Policy@retryPolicy=valueApply the given retry policy to all queries in the paragraph.
Fetch Size@fetchSize=integer valueApply the given fetch size to all queries in the paragraph.
+Sometimes you want to be able to pass runtime query parameters to your statements. +Those parameters are not part of the CQL specs and are specific to the interpreter. +Below is the list of all parameters: -Some parameters only accept restricted values: +
+
+<table>
+  <tr><th>Parameter</th><th>Syntax</th><th>Description</th></tr>
+  <tr><td>Consistency Level</td><td>@consistency=value</td><td>Apply the given consistency level to all queries in the paragraph</td></tr>
+  <tr><td>Serial Consistency Level</td><td>@serialConsistency=value</td><td>Apply the given serial consistency level to all queries in the paragraph</td></tr>
+  <tr><td>Timestamp</td><td>@timestamp=long value</td><td>
+    Apply the given timestamp to all queries in the paragraph.
+    Please note that timestamp value passed directly in CQL statement will override this value
+  </td></tr>
+  <tr><td>Retry Policy</td><td>@retryPolicy=value</td><td>Apply the given retry policy to all queries in the paragraph</td></tr>
+  <tr><td>Fetch Size</td><td>@fetchSize=integer value</td><td>Apply the given fetch size to all queries in the paragraph</td></tr>
+</table>
+
- - - - - - - - - - - - - - - - - - - - - - - - - -
ParameterPossible Values
Consistency LevelALL, ANY, ONE, TWO, THREE, QUORUM, LOCAL\_ONE, LOCAL\_QUORUM, EACH\_QUORUM
Serial Consistency LevelSERIAL, LOCAL\_SERIAL
TimestampAny long value
Retry PolicyDEFAULT, DOWNGRADING\_CONSISTENCY, FALLTHROUGH, LOGGING\_DEFAULT, LOGGING\_DOWNGRADING, LOGGING\_FALLTHROUGH
Fetch SizeAny integer value
+ Some parameters only accept restricted values: + +
+
+<table>
+  <tr><th>Parameter</th><th>Possible Values</th></tr>
+  <tr><td>Consistency Level</td><td>ALL, ANY, ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM</td></tr>
+  <tr><td>Serial Consistency Level</td><td>SERIAL, LOCAL_SERIAL</td></tr>
+  <tr><td>Timestamp</td><td>Any long value</td></tr>
+  <tr><td>Retry Policy</td><td>DEFAULT, DOWNGRADING_CONSISTENCY, FALLTHROUGH, LOGGING_DEFAULT, LOGGING_DOWNGRADING, LOGGING_FALLTHROUGH</td></tr>
+  <tr><td>Fetch Size</td><td>Any integer value</td></tr>
+</table>
+
->Please note that you should **not** add semi-colon ( **;** ) at the end of each parameter statement. +>Please note that you should **not** add a semi-colon ( **;** ) at the end of a parameter statement Some examples: ```sql -CREATE TABLE IF NOT EXISTS spark_demo.ts( - key int PRIMARY KEY, - value text -); -TRUNCATE spark_demo.ts; -# Timestamp in the past -@timestamp=10 + CREATE TABLE IF NOT EXISTS spark_demo.ts( + key int PRIMARY KEY, + value text + ); + TRUNCATE spark_demo.ts; -# Force timestamp directly in the first insert -INSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100; + # Timestamp in the past + @timestamp=10 -# Select some data to make the clock turn -SELECT * FROM spark_demo.albums LIMIT 100; + # Force timestamp directly in the first insert + INSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100; -# Now insert using the timestamp parameter set at the beginning(10) -INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert'); + # Select some data to make the clock turn + SELECT * FROM spark_demo.albums LIMIT 100; -# Check for the result. You should see 'first insert' -SELECT value FROM spark_demo.ts WHERE key=1; -``` + # Now insert using the timestamp parameter set at the beginning(10) + INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert'); + # Check for the result. You should see 'first insert' + SELECT value FROM spark_demo.ts WHERE key=1; +``` + Some remarks about query parameters: + +> 1. **several** query parameters can be set in the same paragraph +> 2. if the **same** query parameter is set several times with different values, the interpreter only takes the first value into account +> 3. each query parameter applies to **all CQL statements** in the same paragraph, unless you override the option using plain CQL text (like forcing timestamp with the USING clause) +> 4. the order of each query parameter with regard to CQL statements does not matter + +
-> 1. **Many** query parameters can be set in the same paragraph. -> 2. If the **same** query parameter is set many time with different values, the interpreter only take into account the first value. -> 3. Each query parameter applies to **all CQL statements** in the same paragraph, unless you override the option using plain CQL text. ( Like forcing timestamp with the USING clause ) -> 4. The order of each query parameter with regard to CQL statement does not matter. +## 10. Support for Prepared Statements -## Support for Prepared Statements -For performance reason, it is better to prepare statements before-hand and reuse them later by providing bound values. +For performance reasons, it is better to prepare statements beforehand and reuse them later by providing bound values. This interpreter provides 3 commands to handle prepared and bound statements: 1. **@prepare** @@ -375,77 +433,95 @@ This interpreter provides 3 commands to handle prepared and bound statements: Example: ``` -@prepare[statement_name]=... - -@bind[statement_name]=’text’, 1223, ’2015-07-30 12:00:01’, null, true, [‘list_item1’, ’list_item2’] -@bind[statement_name_with_no_bound_value] + @prepare[statement_name]=... -@remove_prepare[statement_name] -``` - -#### @prepare -You can use the syntax `@prepare[statement_name]=SELECT ...` to create a prepared statement. -The `statement_name` is **mandatory** because the interpreter prepares the given statement with the Java driver and saves the generated prepared statement in an **internal hash map**, using the provided `statement_name` as search key. + @bind[statement_name]='text', 1223, '2015-07-30 12:00:01', null, true, ['list_item1', 'list_item2'] -> Please note that this internal prepared statement map is shared with **all notebooks** and **all paragraphs** because there is only one instance of the interpreter for Cassandra. 
+ @bind[statement_name_with_no_bound_value] -> If the interpreter encounters **many** @prepare for the **same statement_name (key)**, only the **first** statement will be taken into account. + @remove_prepare[statement_name] +``` +
+#### a. @prepare +
+You can use the syntax _"@prepare[statement_name]=SELECT ..."_ to create a prepared statement. +The _statement_name_ is **mandatory** because the interpreter prepares the given statement with the Java driver and +saves the generated prepared statement in an **internal hash map**, using the provided _statement_name_ as search key. + +> Please note that this internal prepared statement map is shared with **all notebooks** and **all paragraphs** because +there is only one instance of the interpreter for Cassandra + +> If the interpreter encounters **many** @prepare for the **same _statement_name_ (key)**, only the **first** statement will be taken into account. + Example: ``` -@prepare[select]=SELECT * FROM spark_demo.albums LIMIT ? -@prepare[select]=SELECT * FROM spark_demo.artists LIMIT ? -``` + @prepare[select]=SELECT * FROM spark_demo.albums LIMIT ? -For the above example, the prepared statement is `SELECT * FROM spark_demo.albums LIMIT ?`. -`SELECT * FROM spark_demo.artists LIMIT ?` is ignored because an entry already exists in the prepared statements map with the key select. + @prepare[select]=SELECT * FROM spark_demo.artists LIMIT ? +``` -In the context of **Zeppelin**, a notebook can be scheduled to be executed at regular interval, thus it is necessary to **avoid re-preparing many time the same statement (considered an anti-pattern)**. +For the above example, the prepared statement is _SELECT * FROM spark_demo.albums LIMIT ?_. +_SELECT * FROM spark_demo.artists LIMIT ?_ is ignored because an entry already exists in the prepared statements map with the key select. -#### @bind -Once the statement is prepared ( possibly in a separated notebook/paragraph ). You can bind values to it: +In the context of **Zeppelin**, a notebook can be scheduled to be executed at regular interval, +thus it is necessary to **avoid re-preparing many time the same statement (considered an anti-pattern)**. +
+
+#### b. @bind +
+Once the statement is prepared (possibly in a separated notebook/paragraph). You can bind values to it: ``` -@bind[select_first]=10 -``` + @bind[select_first]=10 +``` -Bound values are not mandatory for the `@bind` statement. However if you provide bound values, they need to comply to some syntax: +Bound values are not mandatory for the **@bind** statement. However if you provide bound values, they need to comply to some syntax: * String values should be enclosed between simple quotes ( ‘ ) * Date values should be enclosed between simple quotes ( ‘ ) and respect the formats: 1. yyyy-MM-dd HH:MM:ss 2. yyyy-MM-dd HH:MM:ss.SSS -* **null** is parsed as-is. -* **boolean** (true|false) is parsed as-is. +* **null** is parsed as-is +* **boolean** (true|false) are parsed as-is * collection values must follow the **[standard CQL syntax]**: * list: [‘list_item1’, ’list_item2’, ...] * set: {‘set_item1’, ‘set_item2’, …} * map: {‘key1’: ‘val1’, ‘key2’: ‘val2’, …} -* **tuple** values should be enclosed between parenthesis ( see **[Tuple CQL syntax]** ): (‘text’, 123, true) -* **udt** values should be enclosed between brackets ( see **[UDT CQL syntax]** ): {stree_name: ‘Beverly Hills’, number: 104, zip_code: 90020, state: ‘California’, …} +* **tuple** values should be enclosed between parenthesis (see **[Tuple CQL syntax]**): (‘text’, 123, true) +* **udt** values should be enclosed between brackets (see **[UDT CQL syntax]**): {stree_name: ‘Beverly Hills’, number: 104, zip_code: 90020, state: ‘California’, …} > It is possible to use the @bind statement inside a batch: > -> ```sql -> BEGIN BATCH -> @bind[insert_user]='jdoe','John DOE' -> UPDATE users SET age = 27 WHERE login='hsue'; -> APPLY BATCH; +> ```sql +> +> BEGIN BATCH +> @bind[insert_user]='jdoe','John DOE' +> UPDATE users SET age = 27 WHERE login='hsue'; +> APPLY BATCH; > ``` -#### @remove_prepare -To avoid for a prepared statement to stay forever in the prepared statement map, you can use the 
`@remove_prepare[statement_name]` syntax to remove it. +
+#### c. @remove_prepare +
+To avoid for a prepared statement to stay forever in the prepared statement map, you can use the +**@remove_prepare[statement_name]** syntax to remove it. Removing a non-existing prepared statement yields no error. -## Using Dynamic Forms -Instead of hard-coding your CQL queries, it is possible to use the mustache syntax ( **\{\{ \}\}** ) to inject simple value or multiple choices forms. +
+ +## 11. Using Dynamic Forms -The syntax for simple parameter is: **\{\{input_Label=default value\}\}**. The default value is mandatory because the first time the paragraph is executed, we launch the CQL query before rendering the form so at least one value should be provided. +Instead of hard-coding your CQL queries, it is possible to use the mustache syntax ( **\{\{ \}\}** ) to inject simple value or multiple choices forms. -The syntax for multiple choices parameter is: **\{\{input_Label=value1 | value2 | … | valueN \}\}**. By default the first choice is used for CQL query the first time the paragraph is executed. +The syntax for simple parameter is: **\{\{input_Label=default value\}\}**. The default value is mandatory because the first time the paragraph is executed, +we launch the CQL query before rendering the form so at least one value should be provided. + +The syntax for multiple choices parameter is: **\{\{input_Label=value1 | value2 | … | valueN \}\}**. By default the first choice is used for CQL query +the first time the paragraph is executed. Example: @@ -456,12 +532,13 @@ Example: WHERE name='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}' AND styles CONTAINS '{{style=Rock}}'; {% endraw %} + -In the above example, the first CQL query will be executed for _performer='Sheryl Crow' AND style='Rock'_. -For subsequent queries, you can change the value directly using the form. +In the above example, the first CQL query will be executed for _performer='Sheryl Crow' AND style='Rock'_. +For subsequent queries, you can change the value directly using the form. -> Please note that we enclosed the **\{\{ \}\}** block between simple quotes ( **'** ) because Cassandra expects a String here. -> We could have also use the **\{\{style='Rock'\}\}** syntax but this time, the value displayed on the form is **_'Rock'_** and not **_Rock_**. +> Please note that we enclosed the **\{\{ \}\}** block between simple quotes ( **'** ) because Cassandra expects a String here. 
+> We could have also use the **\{\{style='Rock'\}\}** syntax but this time, the value displayed on the form is **_'Rock'_** and not **_Rock_**. It is also possible to use dynamic forms for **prepared statements**: @@ -471,253 +548,285 @@ It is also possible to use dynamic forms for **prepared statements**: {% endraw %} -## Execution parallelism and shared states -It is possible to execute many paragraphs in parallel. However, at the back-end side, we’re still using synchronous queries. -_Asynchronous execution_ is only possible when it is possible to return a `Future` value in the `InterpreterResult`. +
+ +## 12. Shared states + +It is possible to execute many paragraphs in parallel. However, at the back-end side, we’re still using synchronous queries. +_Asynchronous execution_ is only possible when it is possible to return a `Future` value in the `InterpreterResult`. It may be an interesting proposal for the **Zeppelin** project. Another caveat is that the same `com.datastax.driver.core.Session` object is used for **all** notebooks and paragraphs. -Consequently, if you use the **USE _keyspace name_;** statement to log into a keyspace, it will change the keyspace for **all current users** of the **Cassandra** interpreter because we only create 1 `com.datastax.driver.core.Session` object per instance of **Cassandra** interpreter. +Consequently, if you use the **USE _keyspace name_;** statement to log into a keyspace, it will change the keyspace for +**all current users** of the **Cassandra** interpreter because we only create 1 `com.datastax.driver.core.Session` object +per instance of **Cassandra** interpreter. The same remark does apply to the **prepared statement hash map**, it is shared by **all users** using the same instance of **Cassandra** interpreter. -Until **Zeppelin** offers a real multi-users separation, there is a work-around to segregate user environment and states: -create different **Cassandra** interpreter instances. -For this, first go to the **Interpreter** menu and click on the **Create** button. +Until **Zeppelin** offers a real multi-users separation, there is a work-around to segregate user environment and states: +_create different **Cassandra** interpreter instances_ +For this, first go to the **Interpreter** menu and click on the **Create** button +
+
![Create Interpreter](../assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInstance.png)
-In the interpreter creation form, put **cass-instance2** as **Name** and select the **cassandra** in the interpreter drop-down list
-
+In the interpreter creation form, put **cass-instance2** as **Name** and select the **cassandra**
+in the interpreter drop-down list
+
+
![Interpreter Name](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterName.png) -
-
-Click on **Save** to create the new interpreter instance. Now you should be able to see it in the interpreter list.
+
+ Click on **Save** to create the new interpreter instance. Now you should be able to see it in the interpreter list.
+
+
+
![Interpreter In List](../assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInList.png) -
+
 Go back to your notebook and click on the **Gear** icon to configure interpreter bindings.
-You should be able to see and select the **cass-instance2** interpreter instance in the available interpreter list instead of the standard **cassandra** instance.
+You should be able to see and select the **cass-instance2** interpreter instance in the available
+interpreter list instead of the standard **cassandra** instance.
+
+
![Interpreter Instance Selection](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterInstanceSelection.png)
-## Interpreter Configuration +
+
+## 13. Interpreter Configuration
+
 To configure the **Cassandra** interpreter, go to the **Interpreter** menu and scroll down to change the parameters.
-The **Cassandra** interpreter is using the official **[Cassandra Java Driver]** and most of the parameters are used to configure the Java driver
+The **Cassandra** interpreter is using the official **[Cassandra Java Driver]** and most of the parameters are used
+to configure the Java driver
 Below are the configuration parameters and their default value.
-
-| Property Name | Description | Default Value |
-|---|---|---|
-| cassandra.cluster | Name of the Cassandra cluster to connect to | Test Cluster |
-| cassandra.compression.protocol | On wire compression. Possible values are: NONE, SNAPPY, LZ4 | NONE |
-| cassandra.credentials.username | If security is enable, provide the login | none |
-| cassandra.credentials.password | If security is enable, provide the password | none |
-| cassandra.hosts | Comma separated Cassandra hosts (DNS name or IP address).<br/>Ex: '192.168.0.12,node2,node3' | localhost |
-| cassandra.interpreter.parallelism | Number of concurrent paragraphs(queries block) that can be executed | 10 |
-| cassandra.keyspace | Default keyspace to connect to. It is strongly recommended to let the default value and prefix the table name with the actual keyspace in all of your queries. | system |
-| cassandra.load.balancing.policy | Load balancing policy. Default = new TokenAwarePolicy(new DCAwareRoundRobinPolicy()) To Specify your own policy, provide the fully qualify class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using Class.forName(FQCN). | DEFAULT |
-| cassandra.max.schema.agreement.wait.second | Cassandra max schema agreement wait in second | 10 |
-| cassandra.pooling.core.connection.per.host.local | Protocol V2 and below default = 2. Protocol V3 and above default = 1 | 2 |
-| cassandra.pooling.core.connection.per.host.remote | Protocol V2 and below default = 1. Protocol V3 and above default = 1 | 1 |
-| cassandra.pooling.heartbeat.interval.seconds | Cassandra pool heartbeat interval in secs | 30 |
-| cassandra.pooling.idle.timeout.seconds | Cassandra idle time out in seconds | 120 |
-| cassandra.pooling.max.connection.per.host.local | Protocol V2 and below default = 8. Protocol V3 and above default = 1 | 8 |
-| cassandra.pooling.max.connection.per.host.remote | Protocol V2 and below default = 2. Protocol V3 and above default = 1 | 2 |
-| cassandra.pooling.max.request.per.connection.local | Protocol V2 and below default = 128. Protocol V3 and above default = 1024 | 128 |
-| cassandra.pooling.max.request.per.connection.remote | Protocol V2 and below default = 128. Protocol V3 and above default = 256 | 128 |
-| cassandra.pooling.new.connection.threshold.local | Protocol V2 and below default = 100. Protocol V3 and above default = 800 | 100 |
-| cassandra.pooling.new.connection.threshold.remote | Protocol V2 and below default = 100. Protocol V3 and above default = 200 | 100 |
-| cassandra.pooling.pool.timeout.millisecs | Cassandra pool time out in millisecs | 5000 |
-| cassandra.protocol.version | Cassandra binary protocol version | 3 |
-| cassandra.query.default.consistency | Cassandra query default consistency level<br/>Available values: ONE, TWO, THREE, QUORUM, LOCAL\_ONE, LOCAL\_QUORUM, EACH\_QUORUM, ALL | ONE |
-| cassandra.query.default.fetchSize | Cassandra query default fetch size | 5000 |
-| cassandra.query.default.serial.consistency | Cassandra query default serial consistency level<br/>Available values: SERIAL, LOCAL_SERIAL | SERIAL |
-| cassandra.reconnection.policy | Cassandra Reconnection Policy. Default = new ExponentialReconnectionPolicy(1000, 10 * 60 * 1000) To Specify your own policy, provide the fully qualify class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using Class.forName(FQCN). | DEFAULT |
-| cassandra.retry.policy | Cassandra Retry Policy. Default = DefaultRetryPolicy.INSTANCE To Specify your own policy, provide the fully qualify class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using Class.forName(FQCN). | DEFAULT |
-| cassandra.socket.connection.timeout.millisecs | Cassandra socket default connection timeout in millisecs | 500 |
-| cassandra.socket.read.timeout.millisecs | Cassandra socket read timeout in millisecs | 12000 |
-| cassandra.socket.tcp.no_delay | Cassandra socket TCP no delay | true |
-| cassandra.speculative.execution.policy | Cassandra Speculative Execution Policy. Default = NoSpeculativeExecutionPolicy.INSTANCE To Specify your own policy, provide the fully qualify class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using Class.forName(FQCN). | DEFAULT |
-## Bugs & Contacts
+
+| Property Name | Description | Default Value |
+|---|---|---|
+| cassandra.cluster | Name of the Cassandra cluster to connect to | Test Cluster |
+| cassandra.compression.protocol | On-wire compression. Possible values are: NONE, SNAPPY, LZ4 | NONE |
+| cassandra.credentials.username | If security is enabled, provide the login | none |
+| cassandra.credentials.password | If security is enabled, provide the password | none |
+| cassandra.hosts | Comma-separated Cassandra hosts (DNS name or IP address).<br/>Ex: '192.168.0.12,node2,node3' | localhost |
+| cassandra.interpreter.parallelism | Number of concurrent paragraphs (query blocks) that can be executed | 10 |
+| cassandra.keyspace | Default keyspace to connect to. It is strongly recommended to keep the default value and prefix the table name with the actual keyspace in all of your queries | system |
+| cassandra.load.balancing.policy | Load balancing policy. Default = `new TokenAwarePolicy(new DCAwareRoundRobinPolicy())`. To specify your own policy, provide the fully qualified class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using `Class.forName(FQCN)` | DEFAULT |
+| cassandra.max.schema.agreement.wait.second | Cassandra max schema agreement wait in seconds | 10 |
+| cassandra.pooling.core.connection.per.host.local | Protocol V2 and below default = 2. Protocol V3 and above default = 1 | 2 |
+| cassandra.pooling.core.connection.per.host.remote | Protocol V2 and below default = 1. Protocol V3 and above default = 1 | 1 |
+| cassandra.pooling.heartbeat.interval.seconds | Cassandra pool heartbeat interval in seconds | 30 |
+| cassandra.pooling.idle.timeout.seconds | Cassandra idle timeout in seconds | 120 |
+| cassandra.pooling.max.connection.per.host.local | Protocol V2 and below default = 8. Protocol V3 and above default = 1 | 8 |
+| cassandra.pooling.max.connection.per.host.remote | Protocol V2 and below default = 2. Protocol V3 and above default = 1 | 2 |
+| cassandra.pooling.max.request.per.connection.local | Protocol V2 and below default = 128. Protocol V3 and above default = 1024 | 128 |
+| cassandra.pooling.max.request.per.connection.remote | Protocol V2 and below default = 128. Protocol V3 and above default = 256 | 128 |
+| cassandra.pooling.new.connection.threshold.local | Protocol V2 and below default = 100. Protocol V3 and above default = 800 | 100 |
+| cassandra.pooling.new.connection.threshold.remote | Protocol V2 and below default = 100. Protocol V3 and above default = 200 | 100 |
+| cassandra.pooling.pool.timeout.millisecs | Cassandra pool timeout in milliseconds | 5000 |
+| cassandra.protocol.version | Cassandra binary protocol version | 3 |
+| cassandra.query.default.consistency | Cassandra query default consistency level<br/>Available values: ONE, TWO, THREE, QUORUM, LOCAL_ONE, LOCAL_QUORUM, EACH_QUORUM, ALL | ONE |
+| cassandra.query.default.fetchSize | Cassandra query default fetch size | 5000 |
+| cassandra.query.default.serial.consistency | Cassandra query default serial consistency level<br/>Available values: SERIAL, LOCAL_SERIAL | SERIAL |
+| cassandra.reconnection.policy | Cassandra reconnection policy. Default = `new ExponentialReconnectionPolicy(1000, 10 * 60 * 1000)`. To specify your own policy, provide the fully qualified class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using `Class.forName(FQCN)` | DEFAULT |
+| cassandra.retry.policy | Cassandra retry policy. Default = `DefaultRetryPolicy.INSTANCE`. To specify your own policy, provide the fully qualified class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using `Class.forName(FQCN)` | DEFAULT |
+| cassandra.socket.connection.timeout.millisecs | Cassandra socket default connection timeout in milliseconds | 500 |
+| cassandra.socket.read.timeout.millisecs | Cassandra socket read timeout in milliseconds | 12000 |
+| cassandra.socket.tcp.no_delay | Cassandra socket TCP no delay | true |
+| cassandra.speculative.execution.policy | Cassandra speculative execution policy. Default = `NoSpeculativeExecutionPolicy.INSTANCE`. To specify your own policy, provide the fully qualified class name (FQCN) of your policy. At runtime the interpreter will instantiate the policy using `Class.forName(FQCN)` | DEFAULT |
+ +
+ +## 14. Change Log + +**2.0** : +* Update help menu and add changelog +* Add Support for **User Defined Functions**, **User Defined Aggregates** and **Materialized Views** +* Upgrade Java driver version to **3.0.0-rc1** + +**1.0** : +* Initial version + +## 15. Bugs & Contacts If you encounter a bug for this interpreter, please create a **[JIRA]** ticket and ping me on Twitter - at **[@doanduyhai]**. + at **[@doanduyhai]** + [Cassandra Java Driver]: https://github.com/datastax/java-driver [standard CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html From 88811eea57c7dad55531d166b5a5eae8ef6be9dc Mon Sep 17 00:00:00 2001 From: DuyHai DOAN Date: Wed, 27 Jan 2016 14:33:39 +0100 Subject: [PATCH 3/8] Add Zeppelin version along-side with interpreter version --- docs/interpreter/cassandra.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/interpreter/cassandra.md b/docs/interpreter/cassandra.md index 861bffcc765..7a9c97a0a03 100644 --- a/docs/interpreter/cassandra.md +++ b/docs/interpreter/cassandra.md @@ -814,12 +814,12 @@ Below are the configuration parameters and their default value. ## 14. Change Log -**2.0** : +**2.0** _(Zeppelin 0.5.7-incubating)_ : * Update help menu and add changelog * Add Support for **User Defined Functions**, **User Defined Aggregates** and **Materialized Views** * Upgrade Java driver version to **3.0.0-rc1** -**1.0** : +**1.0** _(Zeppelin 0.5.5-incubating)_ : * Initial version ## 15. 
Bugs & Contacts From c05d489a788588184d3efd78532799c446696885 Mon Sep 17 00:00:00 2001 From: DuyHai DOAN Date: Wed, 27 Jan 2016 23:44:49 +0100 Subject: [PATCH 4/8] Revert commit of doc cleaning PR #648 --- docs/interpreter/cassandra.md | 222 ++++++++++++++-------------------- 1 file changed, 89 insertions(+), 133 deletions(-) diff --git a/docs/interpreter/cassandra.md b/docs/interpreter/cassandra.md index 7a9c97a0a03..cd64e9dd185 100644 --- a/docs/interpreter/cassandra.md +++ b/docs/interpreter/cassandra.md @@ -6,10 +6,8 @@ group: manual --- {% include JB/setup %} -
-## 1. Cassandra CQL Interpreter for Apache Zeppelin
+## Cassandra CQL Interpreter for Apache Zeppelin
-
@@ -23,35 +21,29 @@ group: manual
Name
-
-
-## 2. Enabling Cassandra Interpreter
-
- In a notebook, to enable the **Cassandra** interpreter, click on the **Gear** icon and select **Cassandra**
+## Enabling Cassandra Interpreter
+
+In a notebook, to enable the **Cassandra** interpreter, click on the **Gear** icon and select **Cassandra**
![Interpreter Binding](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterBinding.png) ![Interpreter Selection](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterSelection.png)
- -
-## 3. Using the Cassandra Interpreter
-
- In a paragraph, use **_%cassandra_** to select the **Cassandra** interpreter and then input all commands.
+## Using the Cassandra Interpreter
- To access the interactive help, type **HELP;**
+In a paragraph, use **_%cassandra_** to select the **Cassandra** interpreter and then input all commands.
+
+To access the interactive help, type **HELP;**
- ![Interactive Help](../assets/themes/zeppelin/img/docs-img/cassandra-InteractiveHelp.png) + ![Interactive Help](../assets/themes/zeppelin/img/docs-img/cassandra-InteractiveHelp.png)
-
- -## 4. Interpreter Commands - - The **Cassandra** interpreter accepts the following commands +## Interpreter Commands + +The **Cassandra** interpreter accepts the following commands
@@ -88,15 +80,14 @@ group: manual
-
-## 5. CQL statements - +## CQL statements + This interpreter is compatible with any CQL statement supported by Cassandra. Ex: ```sql - INSERT INTO users(login,name) VALUES('jdoe','John DOE'); - SELECT * FROM users WHERE login='jdoe'; +INSERT INTO users(login,name) VALUES('jdoe','John DOE'); +SELECT * FROM users WHERE login='jdoe'; ``` Each statement should be separated by a semi-colon ( **;** ) except the special commands below: @@ -110,42 +101,40 @@ Each statement should be separated by a semi-colon ( **;** ) except the special 7. @retryPolicy 8. @fetchSize -Multi-line statements as well as multiple statements on the same line are also supported as long as they are -separated by a semi-colon. Ex: +Multi-line statements as well as multiple statements on the same line are also supported as long as they are separated by a semi-colon. Ex: ```sql - USE spark_demo; +USE spark_demo; - SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM countries LIMIT 1; +SELECT * FROM albums_by_country LIMIT 1; SELECT * FROM countries LIMIT 1; - SELECT * - FROM artists - WHERE login='jlennon'; +SELECT * +FROM artists +WHERE login='jlennon'; ``` Batch statements are supported and can span multiple lines, as well as DDL(CREATE/ALTER/DROP) statements: ```sql - BEGIN BATCH - INSERT INTO users(login,name) VALUES('jdoe','John DOE'); - INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC'); - APPLY BATCH; +BEGIN BATCH + INSERT INTO users(login,name) VALUES('jdoe','John DOE'); + INSERT INTO users_preferences(login,account_type) VALUES('jdoe','BASIC'); +APPLY BATCH; - CREATE TABLE IF NOT EXISTS test( - key int PRIMARY KEY, - value text - ); +CREATE TABLE IF NOT EXISTS test( + key int PRIMARY KEY, + value text +); ``` -CQL statements are case-insensitive (except for column names and values). -This means that the following statements are equivalent and valid: +CQL statements are case-insensitive (except for column names and values). 
This means that the following statements are equivalent and valid: ```sql - INSERT INTO users(login,name) VALUES('jdoe','John DOE'); - Insert into users(login,name) vAlues('hsue','Helen SUE'); +INSERT INTO users(login,name) VALUES('jdoe','John DOE'); +Insert into users(login,name) vAlues('hsue','Helen SUE'); ``` The complete list of all CQL statements and versions can be found below: @@ -185,40 +174,36 @@ The complete list of all CQL statements and versions can be found below: -
- ## 6. Comments in statements It is possible to add comments between statements. Single line comments start with the **hash sign** (#) or **double slashes** (//). Multi-line comments are enclosed between /** and **/. Ex: ```sql - #Single line comment style 1 - INSERT INTO users(login,name) VALUES('jdoe','John DOE'); +#Single line comment style 1 +INSERT INTO users(login,name) VALUES('jdoe','John DOE'); - //Single line comment style 2 - - /** - Multi line - comments - **/ - Insert into users(login,name) vAlues('hsue','Helen SUE'); -``` +//Single line comment style 2 -
+/** + Multi line + comments + **/ +Insert into users(login,name) vAlues('hsue','Helen SUE'); +``` -## 7. Syntax Validation +## Syntax Validation The interpreters is shipped with a built-in syntax validator. This validator only checks for basic syntax errors. + All CQL-related syntax validation is delegated directly to **Cassandra** Most of the time, syntax errors are due to **missing semi-colons** between statements or **typo errors**. - -
-## 8. Schema commands +## Schema commands To make schema discovery easier and more interactive, the following commands are supported: +
@@ -300,20 +285,18 @@ To make schema discovery easier and more interactive, the following commands are The schema objects (cluster, keyspace, table, type, function and aggregate) are displayed in a tabular format. There is a drop-down menu on the top left corner to expand objects details. On the top right menu is shown the Icon legend. -
![Describe Schema](../assets/themes/zeppelin/img/docs-img/cassandra-DescribeSchema.png)
-
- -## 9. Runtime Parameters +## Runtime Parameters Sometimes you want to be able to pass runtime query parameters to your statements. + Those parameters are not part of the CQL specs and are specific to the interpreter. + Below is the list of all parameters: -
@@ -352,9 +335,8 @@ Below is the list of all parameters:
- Some parameters only accept restricted values: +Some parameters only accept restricted values: -
@@ -390,26 +372,26 @@ Some examples: ```sql - CREATE TABLE IF NOT EXISTS spark_demo.ts( - key int PRIMARY KEY, - value text - ); - TRUNCATE spark_demo.ts; +CREATE TABLE IF NOT EXISTS spark_demo.ts( + key int PRIMARY KEY, + value text +); +TRUNCATE spark_demo.ts; - # Timestamp in the past - @timestamp=10 +# Timestamp in the past +@timestamp=10 - # Force timestamp directly in the first insert - INSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100; +# Force timestamp directly in the first insert +INSERT INTO spark_demo.ts(key,value) VALUES(1,'first insert') USING TIMESTAMP 100; - # Select some data to make the clock turn - SELECT * FROM spark_demo.albums LIMIT 100; +# Select some data to make the clock turn +SELECT * FROM spark_demo.albums LIMIT 100; - # Now insert using the timestamp parameter set at the beginning(10) - INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert'); +# Now insert using the timestamp parameter set at the beginning(10) +INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert'); - # Check for the result. You should see 'first insert' - SELECT value FROM spark_demo.ts WHERE key=1; +# Check for the result. You should see 'first insert' +SELECT value FROM spark_demo.ts WHERE key=1; ``` Some remarks about query parameters: @@ -419,11 +401,10 @@ Some remarks about query parameters: > 3. each query parameter applies to **all CQL statements** in the same paragraph, unless you override the option using plain CQL text (like forcing timestamp with the USING clause) > 4. the order of each query parameter with regard to CQL statement does not matter -
- -## 10. Support for Prepared Statements +## Support for Prepared Statements For performance reason, it is better to prepare statements before-hand and reuse them later by providing bound values. + This interpreter provides 3 commands to handle prepared and bound statements: 1. **@prepare** @@ -433,19 +414,17 @@ This interpreter provides 3 commands to handle prepared and bound statements: Example: ``` +@prepare[statement_name]=... - @prepare[statement_name]=... +@bind[statement_name]=’text’, 1223, ’2015-07-30 12:00:01’, null, true, [‘list_item1’, ’list_item2’] - @bind[statement_name]=’text’, 1223, ’2015-07-30 12:00:01’, null, true, [‘list_item1’, ’list_item2’] +@bind[statement_name_with_no_bound_value] - @bind[statement_name_with_no_bound_value] - - @remove_prepare[statement_name] +@remove_prepare[statement_name] ``` -
-#### a. @prepare -
+#### @prepare + You can use the syntax _"@prepare[statement_name]=SELECT ..."_ to create a prepared statement. The _statement_name_ is **mandatory** because the interpreter prepares the given statement with the Java driver and saves the generated prepared statement in an **internal hash map**, using the provided _statement_name_ as search key. @@ -458,25 +437,22 @@ there is only one instance of the interpreter for Cassandra Example: ``` +@prepare[select]=SELECT * FROM spark_demo.albums LIMIT ? - @prepare[select]=SELECT * FROM spark_demo.albums LIMIT ? - - @prepare[select]=SELECT * FROM spark_demo.artists LIMIT ? +@prepare[select]=SELECT * FROM spark_demo.artists LIMIT ? ``` -For the above example, the prepared statement is _SELECT * FROM spark_demo.albums LIMIT ?_. +For the above example, the prepared statement is _SELECT * FROM spark_demo.albums LIMIT ?_. _SELECT * FROM spark_demo.artists LIMIT ?_ is ignored because an entry already exists in the prepared statements map with the key select. In the context of **Zeppelin**, a notebook can be scheduled to be executed at regular interval, thus it is necessary to **avoid re-preparing many time the same statement (considered an anti-pattern)**. -
-
-#### b. @bind -
+ +#### @bind Once the statement is prepared (possibly in a separated notebook/paragraph). You can bind values to it: ``` - @bind[select_first]=10 +@bind[select_first]=10 ``` Bound values are not mandatory for the **@bind** statement. However if you provide bound values, they need to comply to some syntax: @@ -497,23 +473,19 @@ Bound values are not mandatory for the **@bind** statement. However if you provi > It is possible to use the @bind statement inside a batch: > > ```sql -> -> BEGIN BATCH -> @bind[insert_user]='jdoe','John DOE' -> UPDATE users SET age = 27 WHERE login='hsue'; -> APPLY BATCH; +>BEGIN BATCH +> @bind[insert_user]='jdoe','John DOE' +> UPDATE users SET age = 27 WHERE login='hsue'; +>APPLY BATCH; > ``` -
-#### c. @remove_prepare -
+#### @remove_prepare + To avoid for a prepared statement to stay forever in the prepared statement map, you can use the **@remove_prepare[statement_name]** syntax to remove it. Removing a non-existing prepared statement yields no error. -
- -## 11. Using Dynamic Forms +## Using Dynamic Forms Instead of hard-coding your CQL queries, it is possible to use the mustache syntax ( **\{\{ \}\}** ) to inject simple value or multiple choices forms. @@ -548,9 +520,7 @@ It is also possible to use dynamic forms for **prepared statements**: {% endraw %} -
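To make the first-execution rule concrete, here is a rough sketch of how the default values could be substituted into a paragraph before the form is rendered. It is an illustration under stated assumptions, not Zeppelin's implementation: the function `first_run_query`, the regular expression, and the `performers` table name are all hypothetical.

{% raw %}
```python
import re

# Matches a form placeholder: label=value or label=v1|v2|... inside double braces.
# Group 1 is the form label (unused here), group 2 the default or the choices.
FORM_PATTERN = re.compile(r"\{\{\s*([^=}]+?)\s*=\s*([^}]*?)\s*\}\}")


def first_run_query(template):
    """Replace each placeholder with its default value, or its first choice."""
    def substitute(match):
        choices = [choice.strip() for choice in match.group(2).split("|")]
        return choices[0]
    return FORM_PATTERN.sub(substitute, template)


# Hypothetical query; only the two placeholders come from the example above.
template = (
    "SELECT * FROM performers "
    "WHERE name='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}' "
    "AND styles CONTAINS '{{style=Rock}}';"
)
print(first_run_query(template))
```
{% endraw %}

The single quotes stay outside the placeholders, so the substituted text is still a valid CQL string literal and the form shows the bare value.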
- -## 12. Shared states +## Shared states It is possible to execute many paragraphs in parallel. However, at the back-end side, we’re still using synchronous queries. _Asynchronous execution_ is only possible when it is possible to return a `Future` value in the `InterpreterResult`. @@ -567,24 +537,17 @@ Until **Zeppelin** offers a real multi-users separation, there is a work-around _create different **Cassandra** interpreter instances_ For this, first go to the **Interpreter** menu and click on the **Create** button -
-
![Create Interpreter](../assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInstance.png)
In the interpreter creation form, enter **cass-instance2** as the **Name** and select **cassandra** in the interpreter drop-down list.
-
![Interpreter Name](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterName.png)
Click on **Save** to create the new interpreter instance. Now you should be able to see it in the interpreter list. - -
-
![Interpreter In List](../assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInList.png)
@@ -593,15 +556,11 @@ Go back to your notebook and click on the **Gear** icon to configure interpreter You should be able to see and select the **cass-instance2** interpreter instance in the available interpreter list instead of the standard **cassandra** instance. -
-
![Interpreter Instance Selection](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterInstanceSelection.png)
-
- -## 13. Interpreter Configuration +## Interpreter Configuration To configure the **Cassandra** interpreter, go to the **Interpreter** menu and scroll down to change the parameters. The **Cassandra** interpreter is using the official **[Cassandra Java Driver]** and most of the parameters are used @@ -609,7 +568,6 @@ to configure the Java driver Below are the configuration parameters and their default value. -
@@ -810,9 +768,7 @@ Below are the configuration parameters and their default value.
Property Name
-
- -## 14. Change Log +## Change Log **2.0** _(Zeppelin 0.5.7-incubating)_ : * Update help menu and add changelog From d3f78714d7c93395819b18e8b27880800e8a08a9 Mon Sep 17 00:00:00 2001 From: DuyHai DOAN Date: Wed, 27 Jan 2016 23:45:13 +0100 Subject: [PATCH 5/8] Use ZEPPELIN_VERSION variable instead of hard-coding --- docs/interpreter/cassandra.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/interpreter/cassandra.md b/docs/interpreter/cassandra.md index cd64e9dd185..f15ad9f675b 100644 --- a/docs/interpreter/cassandra.md +++ b/docs/interpreter/cassandra.md @@ -770,7 +770,7 @@ Below are the configuration parameters and their default value. ## Change Log -**2.0** _(Zeppelin 0.5.7-incubating)_ : +**2.0** _(Zeppelin {{ ZEPPELIN_VERSION }})_ : * Update help menu and add changelog * Add Support for **User Defined Functions**, **User Defined Aggregates** and **Materialized Views** * Upgrade Java driver version to **3.0.0-rc1** @@ -783,7 +783,6 @@ Below are the configuration parameters and their default value. If you encounter a bug for this interpreter, please create a **[JIRA]** ticket and ping me on Twitter at **[@doanduyhai]** - [Cassandra Java Driver]: https://github.com/datastax/java-driver [standard CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html [Tuple CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tupleType.html From f052bd8624d00109e75a9d351f027509fbda90be Mon Sep 17 00:00:00 2001 From: DuyHai DOAN Date: Thu, 4 Feb 2016 16:32:41 +0100 Subject: [PATCH 6/8] Fixes reference to ZEPPELIN_VERSION in markdown --- docs/interpreter/cassandra.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/interpreter/cassandra.md b/docs/interpreter/cassandra.md index f15ad9f675b..4d13ef282d2 100644 --- a/docs/interpreter/cassandra.md +++ b/docs/interpreter/cassandra.md @@ -770,7 +770,7 @@ Below are the configuration parameters and their default value. 
## Change Log -**2.0** _(Zeppelin {{ ZEPPELIN_VERSION }})_ : +**2.0** _(Zeppelin {{ site.ZEPPELIN_VERSION }})_ : * Update help menu and add changelog * Add Support for **User Defined Functions**, **User Defined Aggregates** and **Materialized Views** * Upgrade Java driver version to **3.0.0-rc1** From 80fcea43e3121915c3885347bdd37b65d9a5655f Mon Sep 17 00:00:00 2001 From: DuyHai DOAN Date: Fri, 5 Feb 2016 01:02:44 +0100 Subject: [PATCH 7/8] Add ZEPPELIN_VERSION in _config.yml --- docs/_config.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/_config.yml b/docs/_config.yml index 8a875510a27..14aca6eed9d 100644 --- a/docs/_config.yml +++ b/docs/_config.yml @@ -21,6 +21,8 @@ author : twitter : ASF feedburner : feedname +ZEPPELIN_VERSION : 0.6.0-incubating-SNAPSHOT + # The production_url is only used when full-domain names are needed # such as sitemap.txt # Most places will/should use BASE_PATH to make the urls From b1e70cbea35c3550d1a28af18a34f1f94ea0968d Mon Sep 17 00:00:00 2001 From: DuyHai DOAN Date: Fri, 5 Feb 2016 12:05:47 +0100 Subject: [PATCH 8/8] Remove un-necessary whitespaces --- docs/interpreter/cassandra.md | 222 +++++++++++++++++----------------- 1 file changed, 112 insertions(+), 110 deletions(-) diff --git a/docs/interpreter/cassandra.md b/docs/interpreter/cassandra.md index 4d13ef282d2..7a0837bf1ce 100644 --- a/docs/interpreter/cassandra.md +++ b/docs/interpreter/cassandra.md @@ -22,29 +22,29 @@ group: manual ## Enabling Cassandra Interpreter - + In a notebook, to enable the **Cassandra** interpreter, click on the **Gear** icon and select **Cassandra** - +
![Interpreter Binding](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterBinding.png) ![Interpreter Selection](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterSelection.png)
- + ## Using the Cassandra Interpreter - + In a paragraph, use **_%cassandra_** to select the **Cassandra** interpreter and then input all commands. - + To access the interactive help, type **HELP;** - +
![Interactive Help](../assets/themes/zeppelin/img/docs-img/cassandra-InteractiveHelp.png)
## Interpreter Commands - + The **Cassandra** interpreter accepts the following commands - +
@@ -77,18 +77,18 @@ The **Cassandra** interpreter accepts the following commands -
All CQL-compatible statements (SELECT, INSERT, CREATE ...) All CQL statements are executed directly against the Cassandra server
+
## CQL statements -This interpreter is compatible with any CQL statement supported by Cassandra. Ex: +This interpreter is compatible with any CQL statement supported by Cassandra. Ex: ```sql INSERT INTO users(login,name) VALUES('jdoe','John DOE'); SELECT * FROM users WHERE login='jdoe'; -``` +``` Each statement should be separated by a semi-colon ( **;** ) except the special commands below: @@ -100,8 +100,8 @@ Each statement should be separated by a semi-colon ( **;** ) except the special 6. @timestamp 7. @retryPolicy 8. @fetchSize - -Multi-line statements as well as multiple statements on the same line are also supported as long as they are separated by a semi-colon. Ex: + +Multi-line statements as well as multiple statements on the same line are also supported as long as they are separated by a semi-colon. Ex: ```sql @@ -114,7 +114,7 @@ FROM artists WHERE login='jlennon'; ``` -Batch statements are supported and can span multiple lines, as well as DDL(CREATE/ALTER/DROP) statements: +Batch statements are supported and can span multiple lines, as well as DDL(CREATE/ALTER/DROP) statements: ```sql @@ -129,7 +129,7 @@ CREATE TABLE IF NOT EXISTS test( ); ``` -CQL statements are case-insensitive (except for column names and values). This means that the following statements are equivalent and valid: +CQL statements are case-insensitive (except for column names and values). This means that the following statements are equivalent and valid: ```sql @@ -138,7 +138,8 @@ Insert into users(login,name) vAlues('hsue','Helen SUE'); ``` The complete list of all CQL statements and versions can be found below: -
+ +
@@ -147,36 +148,36 @@ The complete list of all CQL statements and versions can be found below: - + - + - +
Cassandra Version
2.2 - http://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html
2.1 & 2.0 - http://docs.datastax.com/en/cql/3.1/cql/cql_intro_c.html
1.2 - http://docs.datastax.com/en/cql/3.0/cql/aboutCQL.html
-## 6. Comments in statements +## Comments in statements -It is possible to add comments between statements. Single line comments start with the **hash sign** (#) or **double slashes** (//). Multi-line comments are enclosed between /** and **/. Ex: +It is possible to add comments between statements. Single line comments start with the **hash sign** (#) or **double slashes** (//). Multi-line comments are enclosed between /** and **/. Ex: ```sql @@ -194,17 +195,17 @@ Insert into users(login,name) vAlues('hsue','Helen SUE'); ## Syntax Validation -The interpreters is shipped with a built-in syntax validator. This validator only checks for basic syntax errors. +The interpreters is shipped with a built-in syntax validator. This validator only checks for basic syntax errors. -All CQL-related syntax validation is delegated directly to **Cassandra** +All CQL-related syntax validation is delegated directly to **Cassandra** Most of the time, syntax errors are due to **missing semi-colons** between statements or **typo errors**. - + ## Schema commands To make schema discovery easier and more interactive, the following commands are supported: -
+
@@ -213,23 +214,23 @@ To make schema discovery easier and more interactive, the following commands are - + - + - + - + - + @@ -237,52 +238,52 @@ To make schema discovery easier and more interactive, the following commands are - + - + - + - + - - + - - + - - +
Command
DESCRIBE CLUSTER; Show the current cluster name and its partitioner
DESCRIBE KEYSPACES; List all existing keyspaces in the cluster and their configuration (replication factor, durable write ...)
DESCRIBE TABLES; List all existing keyspaces in the cluster and, for each, the names of all its tables
DESCRIBE TYPES; List all existing keyspaces in the cluster and, for each, the names of all its user-defined types
DESCRIBE FUNCTIONS; List all existing keyspaces in the cluster and, for each, the names of all its functions
DESCRIBE AGGREGATES; List all existing keyspaces in the cluster and, for each, the names of all its aggregates
DESCRIBE MATERIALIZED VIEWS; List all existing keyspaces in the cluster and, for each, the names of all its materialized views
DESCRIBE KEYSPACE <keyspace_name>; Describe the given keyspace configuration and all its table details (name, columns, ...)
DESCRIBE TABLE (<keyspace_name>).<table_name>; - Describe the given table. If the keyspace is not provided, the current logged in keyspace is used. - If there is no logged in keyspace, the default system keyspace is used. + Describe the given table. If the keyspace is not provided, the current logged in keyspace is used. + If there is no logged in keyspace, the default system keyspace is used. If no table is found, an error message is raised
DESCRIBE TYPE (<keyspace_name>).<type_name>; - Describe the given type(UDT). If the keyspace is not provided, the current logged in keyspace is used. - If there is no logged in keyspace, the default system keyspace is used. + Describe the given type(UDT). If the keyspace is not provided, the current logged in keyspace is used. + If there is no logged in keyspace, the default system keyspace is used. If no type is found, an error message is raised
DESCRIBE FUNCTION (<keyspace_name>).<function_name>;Describe the given function. If the keyspace is not provided, the current logged in keyspace is used. - If there is no logged in keyspace, the default system keyspace is used. + Describe the given function. If the keyspace is not provided, the current logged in keyspace is used. + If there is no logged in keyspace, the default system keyspace is used. If no function is found, an error message is raised
DESCRIBE AGGREGATE (<keyspace_name>).<aggregate_name>;Describe the given aggregate. If the keyspace is not provided, the current logged in keyspace is used. - If there is no logged in keyspace, the default system keyspace is used. + Describe the given aggregate. If the keyspace is not provided, the current logged in keyspace is used. + If there is no logged in keyspace, the default system keyspace is used. If no aggregate is found, an error message is raised
DESCRIBE MATERIALIZED VIEW (<keyspace_name>).<view_name>;Describe the given view. If the keyspace is not provided, the current logged in keyspace is used. - If there is no logged in keyspace, the default system keyspace is used. + Describe the given view. If the keyspace is not provided, the current logged in keyspace is used. + If there is no logged in keyspace, the default system keyspace is used. If no view is found, an error message is raised
-
- -The schema objects (cluster, keyspace, table, type, function and aggregate) are displayed in a tabular format. +
+ +The schema objects (cluster, keyspace, table, type, function and aggregate) are displayed in a tabular format. There is a drop-down menu in the top left corner to expand object details. The icon legend is shown in the top right menu.
@@ -291,13 +292,13 @@ There is a drop-down menu on the top left corner to expand objects details. On t ## Runtime Parameters -Sometimes you want to be able to pass runtime query parameters to your statements. +Sometimes you want to be able to pass runtime query parameters to your statements. -Those parameters are not part of the CQL specs and are specific to the interpreter. +Those parameters are not part of the CQL specs and are specific to the interpreter. -Below is the list of all parameters: +Below is the list of all parameters: -
+
@@ -335,9 +336,9 @@ Below is the list of all parameters:
Parameter
-Some parameters only accept restricted values: +Some parameters only accept restricted values: -
+
@@ -364,11 +365,11 @@ Some parameters only accept restricted values:
ParameterAny integer value
-
+
>Please note that you should **not** add semi-colon ( **;** ) at the end of each parameter statement -Some examples: +Some examples: ```sql @@ -393,9 +394,9 @@ INSERT INTO spark_demo.ts(key,value) VALUES(1,'second insert'); # Check for the result. You should see 'first insert' SELECT value FROM spark_demo.ts WHERE key=1; ``` - + Some remarks about query parameters: - + > 1. **many** query parameters can be set in the same paragraph > 2. if the **same** query parameter is set many time with different values, the interpreter only take into account the first value > 3. each query parameter applies to **all CQL statements** in the same paragraph, unless you override the option using plain CQL text (like forcing timestamp with the USING clause) @@ -403,15 +404,15 @@ Some remarks about query parameters: ## Support for Prepared Statements -For performance reason, it is better to prepare statements before-hand and reuse them later by providing bound values. +For performance reason, it is better to prepare statements before-hand and reuse them later by providing bound values. -This interpreter provides 3 commands to handle prepared and bound statements: +This interpreter provides 3 commands to handle prepared and bound statements: 1. **@prepare** 2. **@bind** 3. **@remove_prepared** -Example: +Example: ``` @prepare[statement_name]=... @@ -425,35 +426,35 @@ Example: #### @prepare -You can use the syntax _"@prepare[statement_name]=SELECT ..."_ to create a prepared statement. -The _statement_name_ is **mandatory** because the interpreter prepares the given statement with the Java driver and +You can use the syntax _"@prepare[statement_name]=SELECT ..."_ to create a prepared statement. +The _statement_name_ is **mandatory** because the interpreter prepares the given statement with the Java driver and saves the generated prepared statement in an **internal hash map**, using the provided _statement_name_ as search key. 
- -> Please note that this internal prepared statement map is shared with **all notebooks** and **all paragraphs** because + +> Please note that this internal prepared statement map is shared with **all notebooks** and **all paragraphs** because there is only one instance of the interpreter for Cassandra - + > If the interpreter encounters **many** @prepare for the **same _statement_name_ (key)**, only the **first** statement will be taken into account. - -Example: + +Example: ``` @prepare[select]=SELECT * FROM spark_demo.albums LIMIT ? @prepare[select]=SELECT * FROM spark_demo.artists LIMIT ? -``` +``` For the above example, the prepared statement is _SELECT * FROM spark_demo.albums LIMIT ?_. -_SELECT * FROM spark_demo.artists LIMIT ?_ is ignored because an entry already exists in the prepared statements map with the key select. +_SELECT * FROM spark_demo.artists LIMIT ?_ is ignored because an entry already exists in the prepared statements map with the key select. -In the context of **Zeppelin**, a notebook can be scheduled to be executed at regular interval, +In the context of **Zeppelin**, a notebook can be scheduled to be executed at regular interval, thus it is necessary to **avoid re-preparing many time the same statement (considered an anti-pattern)**. #### @bind -Once the statement is prepared (possibly in a separated notebook/paragraph). You can bind values to it: +Once the statement is prepared (possibly in a separated notebook/paragraph). You can bind values to it: ``` @bind[select_first]=10 -``` +``` Bound values are not mandatory for the **@bind** statement. However if you provide bound values, they need to comply to some syntax: @@ -471,7 +472,7 @@ Bound values are not mandatory for the **@bind** statement. 
However if you provi * **udt** values should be enclosed between brackets (see **[UDT CQL syntax]**): {stree_name: ‘Beverly Hills’, number: 104, zip_code: 90020, state: ‘California’, …} > It is possible to use the @bind statement inside a batch: -> +> > ```sql >BEGIN BATCH > @bind[insert_user]='jdoe','John DOE' @@ -481,21 +482,21 @@ Bound values are not mandatory for the **@bind** statement. However if you provi #### @remove_prepare -To avoid for a prepared statement to stay forever in the prepared statement map, you can use the -**@remove_prepare[statement_name]** syntax to remove it. +To avoid for a prepared statement to stay forever in the prepared statement map, you can use the +**@remove_prepare[statement_name]** syntax to remove it. Removing a non-existing prepared statement yields no error. ## Using Dynamic Forms -Instead of hard-coding your CQL queries, it is possible to use the mustache syntax ( **\{\{ \}\}** ) to inject simple value or multiple choices forms. +Instead of hard-coding your CQL queries, it is possible to use the mustache syntax ( **\{\{ \}\}** ) to inject simple value or multiple choices forms. -The syntax for simple parameter is: **\{\{input_Label=default value\}\}**. The default value is mandatory because the first time the paragraph is executed, -we launch the CQL query before rendering the form so at least one value should be provided. +The syntax for simple parameter is: **\{\{input_Label=default value\}\}**. The default value is mandatory because the first time the paragraph is executed, +we launch the CQL query before rendering the form so at least one value should be provided. -The syntax for multiple choices parameter is: **\{\{input_Label=value1 | value2 | … | valueN \}\}**. By default the first choice is used for CQL query -the first time the paragraph is executed. +The syntax for multiple choices parameter is: **\{\{input_Label=value1 | value2 | … | valueN \}\}**. 
By default the first choice is used for CQL query +the first time the paragraph is executed. -Example: +Example: {% raw %} #Secondary index on performer style @@ -504,26 +505,26 @@ Example: WHERE name='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}' AND styles CONTAINS '{{style=Rock}}'; {% endraw %} - -In the above example, the first CQL query will be executed for _performer='Sheryl Crow' AND style='Rock'_. -For subsequent queries, you can change the value directly using the form. -> Please note that we enclosed the **\{\{ \}\}** block between simple quotes ( **'** ) because Cassandra expects a String here. -> We could have also use the **\{\{style='Rock'\}\}** syntax but this time, the value displayed on the form is **_'Rock'_** and not **_Rock_**. +In the above example, the first CQL query will be executed for _performer='Sheryl Crow' AND style='Rock'_. +For subsequent queries, you can change the value directly using the form. + +> Please note that we enclosed the **\{\{ \}\}** block between simple quotes ( **'** ) because Cassandra expects a String here. +> We could have also use the **\{\{style='Rock'\}\}** syntax but this time, the value displayed on the form is **_'Rock'_** and not **_Rock_**. -It is also possible to use dynamic forms for **prepared statements**: +It is also possible to use dynamic forms for **prepared statements**: {% raw %} @bind[select]=='{{performer=Sheryl Crow|Doof|Fanfarlo|Los Paranoia}}', '{{style=Rock}}' - + {% endraw %} ## Shared states -It is possible to execute many paragraphs in parallel. However, at the back-end side, we’re still using synchronous queries. -_Asynchronous execution_ is only possible when it is possible to return a `Future` value in the `InterpreterResult`. +It is possible to execute many paragraphs in parallel. However, at the back-end side, we’re still using synchronous queries. +_Asynchronous execution_ is only possible when it is possible to return a `Future` value in the `InterpreterResult`. 
It may be an interesting proposal for the **Zeppelin** project. Another caveat is that the same `com.datastax.driver.core.Session` object is used for **all** notebooks and paragraphs. @@ -533,24 +534,24 @@ per instance of **Cassandra** interpreter. The same remark does apply to the **prepared statement hash map**, it is shared by **all users** using the same instance of **Cassandra** interpreter. -Until **Zeppelin** offers a real multi-users separation, there is a work-around to segregate user environment and states: +Until **Zeppelin** offers a real multi-users separation, there is a work-around to segregate user environment and states: _create different **Cassandra** interpreter instances_ For this, first go to the **Interpreter** menu and click on the **Create** button
![Create Interpreter](../assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInstance.png)
- -In the interpreter creation form, put **cass-instance2** as **Name** and select the **cassandra** -in the interpreter drop-down list + +In the interpreter creation form, put **cass-instance2** as **Name** and select the **cassandra** +in the interpreter drop-down list
![Interpreter Name](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterName.png) -
+
Click on **Save** to create the new interpreter instance. Now you should be able to see it in the interpreter list.
![Interpreter In List](../assets/themes/zeppelin/img/docs-img/cassandra-NewInterpreterInList.png) -
+
Go back to your notebook and click on the **Gear** icon to configure interpreter bindings. You should be able to see and select the **cass-instance2** interpreter instance in the available @@ -558,7 +559,7 @@ interpreter list instead of the standard **cassandra** instance.
![Interpreter Instance Selection](../assets/themes/zeppelin/img/docs-img/cassandra-InterpreterInstanceSelection.png) -
+
## Interpreter Configuration @@ -625,7 +626,7 @@ Below are the configuration parameters and their default value. Load balancing policy. Default = new TokenAwarePolicy(new DCAwareRoundRobinPolicy()) To Specify your own policy, provide the fully qualify class name (FQCN) of your policy. - At runtime the interpreter will instantiate the policy using + At runtime the interpreter will instantiate the policy using Class.forName(FQCN) DEFAULT @@ -724,7 +725,7 @@ Below are the configuration parameters and their default value. Cassandra Reconnection Policy. Default = new ExponentialReconnectionPolicy(1000, 10 * 60 * 1000) To Specify your own policy, provide the fully qualify class name (FQCN) of your policy. - At runtime the interpreter will instantiate the policy using + At runtime the interpreter will instantiate the policy using Class.forName(FQCN) DEFAULT @@ -735,7 +736,7 @@ Below are the configuration parameters and their default value. Cassandra Retry Policy. Default = DefaultRetryPolicy.INSTANCE To Specify your own policy, provide the fully qualify class name (FQCN) of your policy. - At runtime the interpreter will instantiate the policy using + At runtime the interpreter will instantiate the policy using Class.forName(FQCN) DEFAULT @@ -761,7 +762,7 @@ Below are the configuration parameters and their default value. Cassandra Speculative Execution Policy. Default = NoSpeculativeExecutionPolicy.INSTANCE To Specify your own policy, provide the fully qualify class name (FQCN) of your policy. - At runtime the interpreter will instantiate the policy using + At runtime the interpreter will instantiate the policy using Class.forName(FQCN) DEFAULT @@ -770,19 +771,20 @@ Below are the configuration parameters and their default value. 
## Change Log -**2.0** _(Zeppelin {{ site.ZEPPELIN_VERSION }})_ : +**2.0** _(Zeppelin {{ site.ZEPPELIN_VERSION }})_ : * Update help menu and add changelog * Add Support for **User Defined Functions**, **User Defined Aggregates** and **Materialized Views** * Upgrade Java driver version to **3.0.0-rc1** -**1.0** _(Zeppelin 0.5.5-incubating)_ : +**1.0** _(Zeppelin 0.5.5-incubating)_ : * Initial version -## 15. Bugs & Contacts +## Bugs & Contacts If you encounter a bug for this interpreter, please create a **[JIRA]** ticket and ping me on Twitter at **[@doanduyhai]** + [Cassandra Java Driver]: https://github.com/datastax/java-driver [standard CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_collections_c.html [Tuple CQL syntax]: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/tupleType.html