hildrum edited this page Oct 23, 2015 · 1 revision

What is the Bluemix toolkit?

The com.ibm.streamsx.hdfs.bluemix toolkit contains operators for interacting with WebHDFS on Bluemix. BigInsights on Bluemix is accessed through the Knox gateway, which requires a username and password. Although the com.ibm.streamsx.hdfs::HDFS2* operators support WebHDFS, they do not support access through the Knox gateway.
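
To make the gateway requirement concrete, here is a minimal Python sketch (not the toolkit's SPL code) of what a WebHDFS request routed through Knox might look like. It assumes the common Knox URL layout `/gateway/<topology>/webhdfs/v1/<path>` and HTTP Basic authentication; the host, port, topology name, user, and password are placeholders, and the helper names `knox_webhdfs_url` and `basic_auth_header` are hypothetical.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(host, port, path, op, topology="default", **params):
    """Build a WebHDFS REST URL that goes through a Knox gateway.

    Assumes the usual Knox layout; the gateway path and topology name
    are deployment-specific and may differ on a given cluster.
    """
    query = urlencode({"op": op, **params})
    hdfs_path = quote(path.lstrip("/"))  # keeps "/" separators, escapes the rest
    return f"https://{host}:{port}/gateway/{topology}/webhdfs/v1/{hdfs_path}?{query}"


def basic_auth_header(username, password):
    """Return an HTTP Basic Authorization header for the given credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder values, for illustration only.
url = knox_webhdfs_url("example.host", 8443, "/user/alice/data.txt", "OPEN")
headers = basic_auth_header("alice", "secret")
print(url)
```

The URL and header can then be handed to any HTTP client; the toolkit itself does the equivalent with the streamsx.inet HTTP functions.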

As a temporary measure, we are providing Bluemix-ready operators in a separate toolkit.

These operators are composite operators (written in SPL) that use the HTTP functions of the streamsx.inet toolkit to make the calls needed to read and write files.

What operators are included?

  • WebHdfsRead -- read a single file from HDFS
  • WebHdfsReadFiles -- read files whose names arrive on the input stream
  • WebHdfsWrite -- write a file to HDFS
  • WebHdfsDirectoryScan -- periodically scan a directory and output the names of new or modified files
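
The "new or modified" behavior of a directory scan can be sketched as follows. This is an illustrative Python sketch, not the operator's SPL implementation: it assumes each scan yields a mapping from file name to modification time, and the function name `new_or_modified` is hypothetical.

```python
def new_or_modified(listing, seen):
    """Return names that are new or whose modification time changed.

    listing: dict mapping file name -> modification time for the current scan.
    seen:    dict carried across scans; updated in place.
    """
    changed = [name for name, mtime in sorted(listing.items())
               if seen.get(name) != mtime]
    seen.update(listing)
    return changed


seen = {}
# First scan: everything is new.
first = new_or_modified({"a.txt": 100, "b.txt": 100}, seen)
# Second scan: b.txt was modified and c.txt appeared.
second = new_or_modified({"a.txt": 100, "b.txt": 250, "c.txt": 300}, seen)
print(first, second)
```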

Design notes

  • WebHdfsDirectoryScan retrieves a JSON string via an httpGet call. Rather than using a JSON parser, it uses findFirst and findFirstOf to extract the necessary information from the string. This works for this simple case, is more efficient than processing the string with the JSONToTuple operator, and keeps this toolkit from depending on the JSON toolkit.
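
The string-search approach described above can be illustrated with a short Python sketch, using `str.find` as a stand-in for the SPL findFirst and findFirstOf functions. It is not the operator's actual code. It assumes a WebHDFS LISTSTATUS-style response in compact form (no whitespace after colons), with no escaped quotes or braces in file names; the helper names `field` and `list_entries` and the sample data are made up for illustration.

```python
def field(obj, key):
    """Extract the raw value of `key` from one flat JSON object string."""
    marker = '"' + key + '":'
    pos = obj.find(marker)
    if pos < 0:
        return None
    pos += len(marker)
    if obj[pos] == '"':
        # String value: take everything up to the closing quote.
        return obj[pos + 1:obj.find('"', pos + 1)]
    # Numeric value: take everything up to the next "," or "}".
    end = pos
    while end < len(obj) and obj[end] not in ",}":
        end += 1
    return obj[pos:end]


def list_entries(body):
    """Return (name, modification time) pairs found in a LISTSTATUS body."""
    entries = []
    pos = body.find('"FileStatus":[')
    if pos < 0:
        return entries
    while True:
        start = body.find("{", pos)
        end = body.find("}", start) if start >= 0 else -1
        if start < 0 or end < 0:
            break
        obj = body[start:end + 1]
        name, mtime = field(obj, "pathSuffix"), field(obj, "modificationTime")
        if name is not None and mtime is not None:
            entries.append((name, int(mtime)))
        pos = end + 1
    return entries


sample = ('{"FileStatuses":{"FileStatus":['
          '{"length":12,"modificationTime":1445600000000,"pathSuffix":"a.txt","type":"FILE"},'
          '{"length":0,"modificationTime":1445600500000,"pathSuffix":"logs","type":"DIRECTORY"}]}}')
entries = list_entries(sample)
print(entries)
```

The trade-off is the one the note describes: the search is cheap and needs no JSON dependency, but it relies on the response keeping this simple, flat shape.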