CDAP-4957 et al: Fix Third-Party Plugin Documentation and JDBC examples #5589

Merged
merged 6 commits into from Apr 23, 2016
24 changes: 15 additions & 9 deletions cdap-docs/cdap-apps/build.sh
@@ -41,16 +41,18 @@ function get_hydrator_version() {

function download_md_doc_file() {
# Downloads a Markdown docs file to a directory
#
# https://raw.githubusercontent.com/caskdata/hydrator-plugins/develop/cassandra-plugins/docs/Cassandra-batchsink.md
# goes to
# hydrator-plugins/batchsinks/cassandra.md
# 1:Includes dir 2:GitHub Hydrator source dir 3:Hydrator dir 4:Type 5:Target filename 6:Source Markdown filename
# download_md_doc_file base_target base_source source_dir source_file_name
# download_md_doc_file $base_target $hydrator_source cassandra-plugins Cassandra-batchsink.md
#
# download_md_doc_file base_target base_source source_dir source_file_name append_file (optional)
# download_md_doc_file $base_target $hydrator_source cassandra-plugins Cassandra-batchsink.md append.txt
local base_target="${1}"
local base_source="${2}"
local source_dir="${3}"
local source_file_name="${4}" # JavaScript-transform.md
local append_file="${5}"
local source_name="${source_file_name%-*}"

local type=$(echo "${source_file_name#*-}" | tr [:upper:] [:lower:]) # batchsink
@@ -76,21 +78,25 @@ function download_md_doc_file() {
if curl --output /dev/null --silent --head --fail "${source_url}"; then
echo "Downloading ${source_file_name} from ${source_dir} to ${type_plural}/${target_file_name}"
curl --silent ${source_url} --output ${target}
echo "${RETURN_STRING}" >> ${target}
echo "${VERSION_STRING}${HYDRATOR_VERSION}" >> ${target}
# FIXME if file does not begin with a "#" character, append "# title\n" to start
local first=$(head -1 ${target})
if [ "x${first:0:2}" != "x# " ]; then
local m="Markdown file missing initial title: ${source_file_name}: ${source_name} ${type_capital}"
echo_red_bold "${m}"
set_message "${m}"
echo -e "# ${source_name} ${type_capital}\n$(cat ${target})" > ${target}
fi
fi
if [[ "x${append_file}" != "x" ]]; then
echo " Appending ${append_file} to ${target_file_name}"
cat ${base_target}/${append_file} >> ${target}
fi
echo "${RETURN_STRING}" >> ${target}
echo "${VERSION_STRING}${HYDRATOR_VERSION}" >> ${target}
else
local m="URL does not exist: ${source_url}"
echo_red_bold "${m}"
set_message "${m}"
fi
fi
}
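
As a rough illustration of the filename handling in `download_md_doc_file`, the sketch below splits a source name such as `Cassandra-batchsink.md` into a plugin name and a lowercase type, and checks whether a downloaded file already starts with a Markdown title. The helper names `split_doc_name` and `has_title` are hypothetical, and stripping the `.md` suffix is an assumption about lines this diff does not show.

```shell
# Hypothetical helper: derive the plugin name and lowercase type from a
# Markdown file name such as Cassandra-batchsink.md.
split_doc_name() (
  source_file_name="$1"
  source_name="${source_file_name%-*}"   # drop the shortest "-..." suffix: Cassandra
  type="${source_file_name#*-}"          # drop the shortest "...-" prefix: batchsink.md
  type="${type%.md}"                     # assumption: the extension is stripped at this point
  type=$(printf '%s' "${type}" | tr '[:upper:]' '[:lower:]')
  printf '%s %s\n' "${source_name}" "${type}"
)

# Hypothetical helper: succeed when the first line of a file starts with "# ".
has_title() (
  first=$(head -n 1 "$1")
  case "${first}" in
    "# "*) exit 0 ;;
    *) exit 1 ;;
  esac
)
```

Quoting the `tr` character classes keeps the shell from treating `[:upper:]` as a glob pattern if a matching file name happens to exist in the working directory.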

function extract_table() {
@@ -168,8 +174,8 @@ function download_includes() {
download_md_doc_file $base_target $hydrator_source core-plugins Twitter-realtimesource.md
download_md_doc_file $base_target $hydrator_source core-plugins Validator-transform.md

download_md_doc_file $base_target $hydrator_source database-plugins Database-batchsink.md
download_md_doc_file $base_target $hydrator_source database-plugins Database-batchsource.md
download_md_doc_file $base_target $hydrator_source database-plugins Database-batchsink.md database-batchsink-append.txt
download_md_doc_file $base_target $hydrator_source database-plugins Database-batchsource.md database-batchsource-append.txt

download_md_doc_file $base_target $hydrator_source elasticsearch-plugins Elasticsearch-batchsink.md
download_md_doc_file $base_target $hydrator_source elasticsearch-plugins Elasticsearch-batchsource.md
@@ -0,0 +1,5 @@

Using Third-Party JARs
----------------------
For information on how to use the JDBC jar to talk to the database sink, see
[Using Third-Party JARs](../third-party.html).
@@ -0,0 +1,5 @@

Using Third-Party JARs
----------------------
For information on how to use the JDBC jar to talk to the database source, see
[Using Third-Party JARs](../third-party.html).
172 changes: 97 additions & 75 deletions cdap-docs/developers-manual/source/building-blocks/plugins.rst
@@ -121,11 +121,42 @@ Sometimes there is a need to use classes in a third-party JAR as plugins. For ex
a JDBC driver as a plugin. In these situations, you have no control over the code, which means you cannot
annotate the relevant class with the ``@Plugin`` annotation. If this is the case, you can explicitly specify
the plugins when deploying the artifact. For example, if you are using the RESTful API, you set the
``Artifact-Plugins`` and ``Artifact-Version`` headers when deploying the artifact::
``Artifact-Plugins``, ``Artifact-Version``, and ``Artifact-Extends`` headers when deploying the artifact:

.. tabbed-parsed-literal::

$ curl -w"\n" -X POST "localhost:10000/v3/namespaces/default/artifacts/mysql-connector-java" \
-H "Artifact-Plugins: [ { 'name': 'mysql', 'type': 'jdbc', 'className': 'com.mysql.jdbc.Driver' } ]" \
-H "Artifact-Version: 5.1.35" \
-H "Artifact-Extends: system:cdap-etl-batch[|version|, |version|]/system:cdap-etl-realtime[|version|, |version|]" \
--data-binary @mysql-connector-java-5.1.35.jar

Or, using the CDAP CLI:

.. tabbed-parsed-literal::
:tabs: "CDAP CLI"

|cdap >| load artifact /path/to/mysql-connector-java-5.1.35.jar config-file /path/to/config.json


where ``config.json`` contains:

.. highlight:: xml

.. container:: highlight

.. parsed-literal::
{
"parents": [ "system:cdap-etl-batch\[|version|,\ |version|]", "system:cdap-etl-realtime[|version|,\ |version|]" ],
"plugins": [
{
"name": "mysql",
"type": "jdbc",
"className": "com.mysql.jdbc.Driver"
}
]
}
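
For readers who want to generate such a file, here is a minimal sketch that writes a `config.json` with the same `parents` and `plugins` entries. The function name `write_plugin_config` and the version passed to it are hypothetical; the sketch also assumes that the backslashes in the snippet above are reStructuredText escapes rather than part of the JSON.

```shell
# Hypothetical helper: write a plugin config file for the MySQL JDBC driver.
#   $1: CDAP version used for both ends of the parent version range (assumed value)
#   $2: path of the file to write
write_plugin_config() {
  cat > "$2" <<EOF
{
  "parents": [ "system:cdap-etl-batch[$1,$1]", "system:cdap-etl-realtime[$1,$1]" ],
  "plugins": [
    { "name": "mysql", "type": "jdbc", "className": "com.mysql.jdbc.Driver" }
  ]
}
EOF
}
```

Writing the JSON through a here-document keeps the double quotes that JSON requires without any shell escaping.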

$ curl -w'\n' localhost:10000/v3/namespaces/default/artifacts/mysql-connector-java \
-H 'Artifact-Plugins: [ { "name": "mysql", "type": "jdbc", "className": "com.mysql.jdbc.Driver" } ]' \
-H 'Artifact-Version: 5.1.35' --data-binary @mysql-connector-java-5.1.35.jar

.. _plugins-deployment:

@@ -265,21 +296,18 @@ When using the CLI, a configuration file exactly like the one described in the

For example, to deploy ``custom-transforms-1.0.0.jar`` using the RESTful API:

.. highlight:: console

.. container:: highlight
.. tabbed-parsed-literal::

.. parsed-literal::
|$| curl -w'\\n' localhost:10000/v3/namespaces/default/artifacts/custom-transforms \\
-H 'Artifact-Extends: system:cdap-etl-batch[|version|, |version|]/system:cdap-etl-realtime[|version|, |version|]' \\
--data-binary @/path/to/custom-transforms-1.0.0.jar
$ curl -w"\n" -X POST "localhost:10000/v3/namespaces/default/artifacts/custom-transforms" \
-H "Artifact-Extends: system:cdap-etl-batch[|version|, |version|]/system:cdap-etl-realtime[|version|, |version|]" \
--data-binary @/path/to/custom-transforms-1.0.0.jar

Using the CLI:

.. container:: highlight

.. parsed-literal::
|$| cdap-cli.sh load artifact /path/to/custom-transforms-1.0.0.jar config-file /path/to/config.json
.. tabbed-parsed-literal::
:tabs: "CDAP CLI"

|cdap >| load artifact /path/to/custom-transforms-1.0.0.jar config-file /path/to/config.json

where ``config.json`` contains:

@@ -305,22 +333,20 @@ first be deleted.
Using the RESTful API (note that if the artifact version is not in the JAR manifest file,
it needs to be set explicitly, as the JAR contents are uploaded without the filename):

.. highlight:: console

.. container:: highlight
.. tabbed-parsed-literal::

.. parsed-literal::
|$| curl -w'\\n' localhost:10000/v3/namespaces/default/artifacts/mysql-connector-java \\
-H 'Artifact-Extends: system:cdap-etl-batch[|version|,\ |version|]/system:cdap-etl-realtime[|version|,\ |version|]' \\
-H 'Artifact-Plugins: [ { "name": "mysql", "type": "jdbc", "className": "com.mysql.jdbc.Driver" } ]' \\
-H 'Artifact-Version: 5.1.35' --data-binary @/path/to/mysql-connector-java-5.1.35.jar
$ curl -w"\n" -X POST "localhost:10000/v3/namespaces/default/artifacts/mysql-connector-java" \
-H "Artifact-Plugins: [ { 'name': 'mysql', 'type': 'jdbc', 'className': 'com.mysql.jdbc.Driver' } ]" \
-H "Artifact-Version: 5.1.35" \
-H "Artifact-Extends: system:cdap-etl-batch[|version|, |version|]/system:cdap-etl-realtime[|version|, |version|]" \
--data-binary @mysql-connector-java-5.1.35.jar

Using the CLI (note that the artifact version, if not explicitly set, is derived from the JAR filename):

.. container:: highlight

.. parsed-literal::
|$| cdap-cli.sh load artifact /path/to/mysql-connector-java-5.1.35.jar config-file /path/to/config.json
.. tabbed-parsed-literal::
:tabs: "CDAP CLI"

|cdap >| load artifact /path/to/mysql-connector-java-5.1.35.jar config-file /path/to/config.json

where ``config.json`` contains:

@@ -348,12 +374,9 @@ You can verify that a plugin artifact was added successfully by using the
:ref:`RESTful Artifact API <http-restful-api-artifact-detail>` to retrieve artifact details.
For example, to retrieve detail about our ``custom-transforms`` artifact:

.. highlight:: console
.. tabbed-parsed-literal::

.. container:: highlight

.. parsed-literal::
|$| curl -w'\\n' localhost:10000/v3/namespaces/default/artifacts/custom-transforms/versions/1.0.0?scope=[system | user]
$ curl -w"\n" -X POST "localhost:10000/v3/namespaces/default/artifacts/custom-transforms/versions/1.0.0?scope=[system | user]

If you deployed the ``custom-transforms`` artifact as a system artifact, the scope is ``system``.
If you deployed the ``custom-transforms`` artifact as a user artifact, the scope is ``user``.
@@ -363,10 +386,9 @@ You can verify that the plugins in your newly-added artifact are available to it
specific type. For example, to check if ``cdap-etl-batch`` can access the plugins in the
``custom-transforms`` artifact:

.. container:: highlight
.. tabbed-parsed-literal::

.. parsed-literal::
|$| curl -w'\\n' localhost:10000/v3/namespaces/default/artifacts/cdap-etl-batch/versions/|version|/extensions/transform?scope=system
$ curl -w"\n" -X POST "localhost:10000/v3/namespaces/default/artifacts/cdap-etl-batch/versions/|version|/extensions/transform?scope=system"

You can then check the list returned to see if your transforms are in the list. Note that
the scope here refers to the scope of the parent artifact. In this example it is the ``system``
@@ -437,40 +459,40 @@ in those files into words, and then counts how many times each word appears. The

.. highlight:: console

We package our code into a JAR file named ``wordcount-1.0.0.jar`` and add it to CDAP::
We package our code into a JAR file named ``wordcount-1.0.0.jar`` and add it to CDAP:

.. tabbed-parsed-literal::

$ curl -w'\n' localhost:10000/v3/namespaces/default/artifacts/wordcount --data-binary @wordcount-1.0.0.jar
$ curl -w"\n" -X POST "localhost:10000/v3/namespaces/default/artifacts/wordcount" --data-binary @wordcount-1.0.0.jar

We then create an application from that artifact::
We then create an application from that artifact:

$ curl -w'\n' -X PUT localhost:10000/v3/namespaces/default/apps/basicwordcount -H 'Content-Type: application/json' \
-d '{
"artifact": { "name": "wordcount", "version": "1.0.0", "scope": "user" }
}'
.. tabbed-parsed-literal::

$ curl -w"\n" -X PUT "localhost:10000/v3/namespaces/default/apps/basicwordcount" -H "Content-Type: application/json" \
-d "{ 'artifact': { 'name': 'wordcount', 'version': '1.0.0', 'scope': 'user' } }"

This program runs just fine. It counts all words in the input. However, what if we want to count phrases
instead of words? Or what if we want to filter out common words such as 'the' and 'a'? We would not want
instead of words? Or what if we want to filter out common words such as ``'the'`` and ``'a'``? We would not want
to copy and paste our application class and then make just small tweaks.

.. rubric:: A Configurable Application

Instead, we would like to be able to create applications that
are configured to tokenize the line in different ways. That is, if we want an application that filters
stopwords, we want to be able to create it through a configuration::
stopwords, we want to be able to create it through a configuration:

$ curl -w'\n' -X PUT localhost:10000/v3/namespaces/default/apps/stopwordcount -H 'Content-Type: application/json' \
-d '{
"artifact": { "name": "wordcount", "version": "1.0.0", "scope": "user" },
"config": { "tokenizer": "stopword" }
}'
.. tabbed-parsed-literal::

Similarly, we want to be able to create an application that counts phrases through a configuration::
$ curl -w"\n" -X PUT "localhost:10000/v3/namespaces/default/apps/stopwordcount" -H "Content-Type: application/json" \
-d "{ 'artifact': { 'name': 'wordcount', 'version': '1.0.0', 'scope': 'user' }, 'config': { 'tokenizer': 'stopword' } }"

Similarly, we want to be able to create an application that counts phrases through a configuration:

$ curl -w'\n' -X PUT localhost:10000/v3/namespaces/default/apps/phrasecount -H 'Content-Type: application/json' \
-d '{
"artifact": { "name": "wordcount", "version": "1.0.0", "scope": "user" },
"config": { "tokenizer": "phrase" }
}'
.. tabbed-parsed-literal::

$ curl -w"\n" -X PUT "localhost:10000/v3/namespaces/default/apps/phrasecount" -H "Content-Type: application/json" \
-d "{ 'artifact': { 'name': 'wordcount', 'version': '1.0.0', 'scope': 'user' }, 'config': { 'tokenizer': 'phrase' } }"

.. highlight:: java

@@ -575,9 +597,9 @@ package in our pom.xml:

We then package the code in a new version of the artifact ``wordcount-1.1.0.jar`` and deploy it:

.. code-block:: console
.. tabbed-parsed-literal::

$ curl -w'\n' localhost:10000/v3/namespaces/default/artifacts/wordcount --data-binary @wordcount-1.1.0.jar
$ curl -w"\n" -X POST "localhost:10000/v3/namespaces/default/artifacts/wordcount" --data-binary @wordcount-1.1.0.jar

.. rubric:: Implementing Tokenizer Plugins

@@ -665,19 +687,20 @@ we need to expose the ``com.example.tokenizer`` package in our pom.xml:
.. highlight:: console

When deploying this artifact, we tell CDAP that the artifact extends the ``wordcount`` artifact, versions
``1.1.0`` inclusive to ``2.0.0`` exclusive::
``1.1.0`` inclusive to ``2.0.0`` exclusive:

.. tabbed-parsed-literal::

$ curl -w'\n' localhost:10000/v3/namespaces/default/artifacts/tokenizers --data-binary @tokenizers-1.0.0.jar \
-H 'Artifact-Extends:wordcount[1.1.0,2.0.0)'
$ curl -w"\n" "localhost:10000/v3/namespaces/default/artifacts/tokenizers" --data-binary @tokenizers-1.0.0.jar \
-H "Artifact-Extends:wordcount[1.1.0,2.0.0)"
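
To make the interval notation concrete, the sketch below tests whether a version falls inside a range such as `[1.1.0,2.0.0)`, where the lower bound is inclusive and the upper bound is exclusive. It is not CDAP's implementation: the function names are hypothetical, and it assumes versions with at most three purely numeric, dot-separated components.

```shell
# version_lt A B: succeed when version A sorts strictly before version B.
# Assumes up to three purely numeric, dot-separated components.
version_lt() (
  a="$1"; b="$2"
  IFS=.
  set -- $a; a1=${1:-0}; a2=${2:-0}; a3=${3:-0}
  set -- $b; b1=${1:-0}; b2=${2:-0}; b3=${3:-0}
  if [ "$a1" -ne "$b1" ]; then [ "$a1" -lt "$b1" ]; exit; fi
  if [ "$a2" -ne "$b2" ]; then [ "$a2" -lt "$b2" ]; exit; fi
  [ "$a3" -lt "$b3" ]
)

# in_range VERSION LOWER UPPER: succeed when LOWER <= VERSION < UPPER,
# which mirrors the "[LOWER,UPPER)" notation.
in_range() {
  ! version_lt "$1" "$2" && version_lt "$1" "$3"
}
```

Under these assumptions, `in_range 1.1.0 1.1.0 2.0.0` succeeds while `in_range 2.0.0 1.1.0 2.0.0` fails, matching "``1.1.0`` inclusive to ``2.0.0`` exclusive".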

This will make the plugins available to those versions of the ``wordcount`` artifact. We can now create
applications that use the tokenizer we want::
applications that use the tokenizer we want:

.. tabbed-parsed-literal::

$ curl -w'\n' -X PUT localhost:10000/v3/namespaces/default/apps/phrasecount -H 'Content-Type: application/json' \
-d '{
"artifact": { "name": "wordcount", "version": "1.1.0", "scope": "user" },
"config": { "tokenizer": "phrase" }
}'
$ curl -w"\n" -X PUT localhost:10000/v3/namespaces/default/apps/phrasecount -H "Content-Type: application/json" \
-d "{ 'artifact': { 'name': 'wordcount', 'version': '1.1.0', 'scope': 'user' }, 'config': { 'tokenizer': 'phrase' } }"
Contributor

We can't swap the quotes; this is no longer valid JSON.

Contributor Author

I have had to do this in the examples (swap quotes); otherwise, they don't work under Windows.

Contributor Author

See http://docs.cask.co/cdap/3.4.0-SNAPSHOT/en/examples-manual/examples/log-analysis.html#running-the-example under "Querying the Results":

cdap-cli.bat call service LogAnalysis.HitCounterService POST "hitcount" body "{'url':'/index.html'}"

Contributor Author

This means all our Windows examples will need to change, as they require the double-quotes on the outside, to

cdap-cli.bat call service LogAnalysis.HitCounterService POST "hitcount" body "{\"url\":\"/index.html\"}"

Contributor

Oh, I guess it is fine then. I don't know why the linter I tried choked on it.

Contributor Author

Well, when I went and looked up the spec, it does say double-quotes, so I think you are correct there. See http://www.json.org
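
For reference, a small sketch of the quoting options discussed in this thread. The variable names are hypothetical, and the note about Windows reflects the comments above rather than anything verified here: the first two forms produce the same valid JSON string, while the swapped form uses single-quoted keys and values, which the JSON grammar does not allow.

```shell
# POSIX shells: single quotes outside, JSON double quotes inside.
payload_posix='{"url":"/index.html"}'

# Double quotes outside with the inner ones escaped; per the thread, this is
# the shape the Windows examples would need.
payload_escaped="{\"url\":\"/index.html\"}"

# Swapped quotes as used in this diff: single-quoted strings, not valid JSON.
payload_swapped="{'url':'/index.html'}"

printf '%s\n' "$payload_posix" "$payload_escaped" "$payload_swapped"
```

The first two lines printed are identical, so either form can be passed to `curl -d` when strict JSON is required.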


.. rubric:: Adding a Plugin Configuration to the Application

@@ -734,14 +757,13 @@ property must be given when registering the plugin::

.. highlight:: console

Now we can create an application that uses a comma instead of a space to split text::
Now we can create an application that uses a comma instead of a space to split text (re-formatted for display):

$ curl -w'\n' -X PUT localhost:10000/v3/namespaces/default/apps/wordcount2 -H 'Content-Type: application/json' \
-d '{
"artifact": { "name": "wordcount", "version": "1.2.0", "scope": "user" },
"config": {
"tokenizer": "default",
"tokenizerProperties": { "delimiter": "," }
}
}'
.. tabbed-parsed-literal::

$ curl -w"\n" -X PUT "localhost:10000/v3/namespaces/default/apps/wordcount2" -H "Content-Type: application/json" \
-d "{
Contributor

Can't change quotes; this is no longer valid JSON.

Contributor

Same comment about other similar changes in this commit.

'artifact': { 'name': 'wordcount', 'version': '1.2.0', 'scope': 'user' },
'config': { 'tokenizer': 'default', 'tokenizerProperties': { 'delimiter': ',' }
}
}"