Skip to content

dturanski/scdf-app-tool

 
 

Repository files navigation

Spring Cloud Data Flow App Tool

What is this?

Use this tool to import selected pre-built stream and task app jars to a local Spring boot scdf-app-repo project tree which you build and deploy to provide an http app repository on the local network that Data Flow can use to register and deploy stream and task apps without requiring an access to an external Maven repository.

Why would you use this?

The pre-built jars are updated in conjunction with Spring Cloud Data Flow releases and published to the Spring Maven Repository. While the Spring Cloud deployers used by Data Flow load jars directly from this repository by default, Data Flow users frequently must deploy these apps to an environment that is behind a firewall, either requiring a Maven proxy or possibly without public internet access at all. This tool makes it easy to build and install a resource on the local network which allows the Data Flow deployer to pull the binaries and runtime metadata via http.

Getting Started

The scdf-app-tool is a simple Java program built with Spring Shell 2.0.x. Since it is a Java application, it can run on Unix, Mac, and Windows. Download the binary archive (zip, tar.gz, tar .bz2) from the latest release or build from source and run

$ ./scdf-app-tool

or on Windows

$ .\scdf-app-tool.bat

This will run a REPL Shell to perform various commands.

$./scdf-app-tool
binder-undefined:>
Note
The prompt indicates that no binder (rabbit, kafka) has been set for stream apps. The stream apps are packaged as separate binaries for each supported messaging transport in accordance with Spring Cloud Stream architecture. To import stream apps, it is convenient to specify the binder up front rather than as an option for each import command. While it is possible to mix and match binders, typically, users chose one for all stream pipelines.

In this intial state, only commands related to stream apps are disabled. Type help to see what you can do.

binder-undefined:>help
AVAILABLE COMMANDS

Binder
        binder: Set the Binder to use for stream apps.

Built-In Commands
        clear: Clear the shell screen.
        exit, quit: Exit the shell.
        help: Display help about available commands.
        history: Display or save the history of previously run commands
        script: Read and execute commands from a file.
        stacktrace: Display the full stacktrace of the last error.

Download
      * get stream-apps: Download stream app jars from maven.
        get task-apps: Download task app jars from maven or given URL.
      * list stream-apps: List stream apps available for download.
        list task-apps: List task apps available for download.

Repo
        repo add: Add to local repository from a URL.
        repo clean: Clean local repository.
        repo list, repo ls: List local repository.
        repo rm: Remove entries from local repository.

Commands marked with (*) are currently unavailable.
Type `help <command>` to learn more.

Commands marked with (*) are currently unavailable.
Type `help <command>` to learn more.

Once you set the binder (kafka or rabbit), the prompt will change to the selected binder and the stream commands, marked with * above will be available.

binder-undefined:>binder rabbit
rabbit:>
Tip
binder is alqo Spring property that can be set as a command line argument, e.g., $./scdf-app-tool --binder=rabbit

Note that kafka is an alias for kafka-10, so you can do this:

rabbit:>binder kafka
kafka-10:>

Besides, the binder command, there are commands related to downloading binaries from Maven and commands related to the Local repository. The Download commands allow you to select from available binaries using simple pattern matching. There are name and type options for the get stream-apps command, each of which can include wildcards. If you type list stream-apps you get a list of what is available for download. Below is the head of the list.

kafka-10:>list stream-apps
processor:aggregator
processor:bridge
processor:filter
processor:groovy-filter
processor:groovy-transform
processor:header-enricher
processor:httpclient
processor:pmml
processor:python-http
processor:python-jython
processor:scriptable-transform
processor:splitter
processor:tasklaunchrequest-transform
processor:tcp-client
processor:tensorflow
processor:transform
processor:twitter-sentiment
sink:aggregate-counter
sink:cassandra
sink:counter
sink:field-value-counter
sink:file
sink:ftp
sink:gemfire
sink:gpfdist
sink:hdfs
sink:hdfs-dataset
sink:jdbc
sink:log
sink:mongodb
sink:mqtt
...

The available apps are configured in a properties file, in this case config/kafka-10-stream-apps.properties, this command displays the property keys (excluding the associated metadata entries) in the form <type>:<name>.

You can filter by name and/or type. For instance:

kafka-10:>get stream-apps --name jdbc
downloading https://repo.spring.io/release/org/springframework/cloud/stream/app/jdbc-sink-kafka-10/1.3.1.RELEASE/jdbc-sink-kafka-10-1.3.1.RELEASE.jar...
downloading https://repo.spring.io/release/org/springframework/cloud/stream/app/jdbc-sink-kafka-10/1.3.1.RELEASE/jdbc-sink-kafka-10-1.3.1.RELEASE-metadata.jar...
downloading https://repo.spring.io/release/org/springframework/cloud/stream/app/jdbc-source-kafka-10/1.3.1.RELEASE/jdbc-source-kafka-10-1.3.1.RELEASE.jar...
downloading https://repo.spring.io/release/org/springframework/cloud/stream/app/jdbc-source-kafka-10/1.3.1.RELEASE/jdbc-source-kafka-10-1.3.1.RELEASE-metadata.jar...
Note
You may also edit or comment (#) the contents of these files and bulk import everything via get stream-apps with no parameters.

Now check the current state of the local repo:

kafka-10:>repo list
sink.jdbc=jdbc-sink-kafka-10-1.3.1.RELEASE.jar
sink.jdbc.metadata=jdbc-sink-kafka-10-1.3.1.RELEASE-metadata.jar
source.jdbc=jdbc-source-kafka-10-1.3.1.RELEASE.jar
source.jdbc.metadata=jdbc-source-kafka-10-1.3.1.RELEASE-metadata.jar

As we can see we have downloaded the jdbc source and jdbc sink apps and metadata which is really useful for configuring these apps in Data Flow. But maybe we made a mistake because we really just wanted the sink.

kafka-10:>repo rm  --name jdbc --type sink
rm jdbc-sink-kafka-10-1.3.1.RELEASE-metadata.jar
rm jdbc-sink-kafka-10-1.3.1.RELEASE.jar
kafka-10:>repo list
source.jdbc=jdbc-source-kafka-10-1.3.1.RELEASE.jar
source.jdbc.metadata=jdbc-source-kafka-10-1.3.1.RELEASE-metadata.jar
kafka-10:>

So we can continue this way until we have everything we need for the time being. There are similar commands for tasks:

kafka-10:>list task-apps
composed-task-runner
jdbchdfs-local
spark-client
spark-cluster
spark-yarn
timestamp
timestamp-batch
kafka-10:>

Building and running the app repo

Once the local repo contains everything we will need, we can quit or exit the shell. The repo is conveniently located under config/scdf-app-repo which is a Spring boot project for serving the internal app repositor. To build scdf-app-repo, you need to have JDK 8+ installed. To build and run it locally:

$cd config/scdf-app-repo
$./mvnw package
$java -jar target/scdf-app-repo-0.0.1-SNAPSHOT.jar

While it is running, you can test it by retrieving a download jar from another terminal window. For example:

$wget http://localhost:8080/jdbc-source-kafka-10-1.3.1.RELEASE.jar

You would use this URL to register this app in a local Spring Cloud Data Flow server.

You can list the contents of the repo at http://localhost:8080/repo:

$curl http://localhost:8080/repo

And bulk import the apps in Spring Cloud Data Flow using the URI http://localhost:8080/import

You can also deploy scdf-app-repo to Pivotal Cloud Foundry, typically the same space in which your Data Flow server is running.

Note
You will need to rebuild and deploy in order to change the repo contents.

Custom apps

Use repo add to add a locally built jar to the repo from a file URL (or any URL). For example:

binder-undefined:>repo add --name my-app --type processor --url file:///Users/me/workspace/my-app/target/my-app-0.0
.1-SNAPSHOT.jar

binder-undefined:>repo add --type source --name foo --url https://github.com/user/repo/blob/master/binaries/foo-v1.1
.jar?raw=true
Note
The repo add command will by default look for an associated metadata jar using the normal naming convention (e,g, file:///Users/me/workspace/my-app/target/my-app-0.0.1-SNAPSHOT-metadata.jar). You may also use the --metadata option to provide the location of the metadata resource).

Alternately, You may edit the appropriate properties files in the config dir to add entries for custom built stream and task apps that exist in a different Maven repository, an external site, e.g., github, or the local file system. The entry should use an http(s) or file URL for the jar. e.g.,

sink.my-app=file://Users/me/workspace/my-app/target/my-app-1.0.0.jar
source.foo=https://github.com/user/repo/blob/master/binaries/foo-v1.1.jar?raw=true
Note
Directly copying jars to the target directory is strongly discouraged since the tool maintains metadata which will cause some Repo and Data Flow functions to fail.

Building SCDF app tool from source:

$./mvnw clean package

This will create the jar file along with binary distributions of the app as zip, tar.gz, and tar.bz2 in the dist directory. You can run the app as any Spring Boot application,

 $java -jar target/scdf-app-tool-0.0.1-SNAPSHOT.jar

or

$./scdf-app-tool

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Java 97.0%
  • HTML 2.1%
  • Other 0.9%