This repository has been archived by the owner on Dec 13, 2023. It is now read-only.

Feeding Java Regex into a task's input #145

Closed
blueelephants opened this issue Mar 31, 2017 · 1 comment

@blueelephants

blueelephants commented Mar 31, 2017

Hello,

I have a problem with a worker to which I feed some Java regex patterns via a task's input.

My workflow definition is this:

...
{
	"name": "test-service",
	"taskReferenceName": "regex-copy-test-03",
	"inputParameters": {
		"inputHeader": {
			"operation": "/api/v1/regexCopy"
		},
		"inputContent": {
			"srcHdfsNameNodes": "${workflow.input.hdfsNamenodes}",
			"srcHdfsUsername": "${workflow.input.hdfsUsername}",
			"srcHdfsBaseDirectory": "${workflow.input.testRootHdfsPath}/regex-copy/regex-copy-test-03/src",
			"srcHdfsPathRegexFilter": "(?<relativeFilepath>.*$)",
			"recursive": true,
			"dstHdfsPathRegexRename": "${workflow.input.testRootHdfsPath}/regex-copy/regex-copy-test-03/dst/${relativeFilepath}"
		}
	},
	"type": "SIMPLE"
},
...

As you can see, some of the task inputs are filled from workflow inputs:

  • srcHdfsNameNodes
  • srcHdfsUsername
  • srcHdfsBaseDirectory
  • dstHdfsPathRegexRename

Some inputs also contain Java regex syntax:

  • srcHdfsPathRegexFilter
  • dstHdfsPathRegexRename

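Some background: my business logic matches and rewrites strings with Java regexes (making use of named groups and so on):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rewrites input with the task's regex filter and rename template
static String rewrite(String input, String srcHdfsPathRegexFilter, String dstHdfsPathRegexRename) {
	Pattern p = Pattern.compile(srcHdfsPathRegexFilter);
	Matcher matcher = p.matcher(input);

	if (matcher.find()) {
		// ${name} / $n in the template refer to regex groups; replaceFirst because
		// replaceAll would also rewrite the empty match ".*$" leaves at the end
		return matcher.replaceFirst(dstHdfsPathRegexRename);
	}
	return input; // no match: leave the input unchanged (assumed fallback)
}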
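For reference, here is a self-contained sketch of that logic (the class name, method name and sample path are illustrative, not from my real worker). It shows that Java's replacement syntax uses ${name} for a named group, the same ${...} form that Conductor substitutes, while $1 selects the same group by number. It uses replaceFirst because a pattern like ".*$" also matches the empty string at the end of the input, which replaceAll would rewrite a second time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupReplacement {
    // Applies the filter regex to a path and rewrites it with the rename template.
    // In Java replacement strings, ${name} refers to a named group and $n to a numbered one.
    static String rewrite(String input, String filter, String template) {
        Matcher matcher = Pattern.compile(filter).matcher(input);
        // replaceFirst rather than replaceAll: ".*$" also matches the empty string
        // at the end of the input, which replaceAll would rewrite too.
        return matcher.find() ? matcher.replaceFirst(template) : input;
    }

    public static void main(String[] args) {
        String filter = "(?<relativeFilepath>.*$)";
        String path = "sub/dir/file.txt"; // hypothetical relative path
        System.out.println(rewrite(path, filter, "/dst/${relativeFilepath}")); // named group
        System.out.println(rewrite(path, filter, "/dst/$1"));                  // numbered group
    }
}
```

Both lines print /dst/sub/dir/file.txt, so the two reference styles are interchangeable on the Java side.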

I start my workflow with the following input JSON:

{
	"hdfsNamenodes": "hdfs://myHost:8020",
	"hdfsUsername": "myUser",
	"testRootHdfsPath": "/tmp/service-tests/resources"
}

When the workflow starts, Conductor's workflow input variable substitution resolves the task's input in the running workflow to the following:

{
   "inputHeader": {
      "operation": "/api/v1/regexCopy"
   },
   "inputContent": {
      "srcHdfsNameNodes": "hdfs://myHost:8020",
      "srcHdfsUsername": "myUser",
      "srcHdfsBaseDirectory": "/tmp/service-tests/resources/regex-copy/regex-copy-test-03/src",
      "srcHdfsPathRegexFilter": "(?<relativeFilepath>.*$)",
      "recursive": true,
      "dstHdfsPathRegexRename": "/tmp/service-tests/resources/regex-copy/regex-copy-test-03/dst/"
   }
}

As you can see in inputContent.dstHdfsPathRegexRename, Conductor removed my regex group reference "${relativeFilepath}".
It seems that Conductor treated it as a normal variable and, since no such variable is part of the workflow input, replaced it with an empty string.
That's the problem: I need this group reference to reach the worker unchanged, so Conductor should not remove it.
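To illustrate the collision, here is a simplified model of the substitution as observed above. It is NOT Conductor's actual implementation; the class and method names are made up, and it only assumes what the output shows: known ${...} keys are replaced by their values and unknown ones by an empty string. Because $1 has no braces, it passes through untouched.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubstitutionModel {
    // Matches ${...} tokens; this is also Java's syntax for a named-group reference.
    private static final Pattern TOKEN = Pattern.compile("\\$\\{([^}]*)\\}");

    // Simplified model of the observed behavior, NOT Conductor's code:
    // known keys are replaced by their values, unknown keys by an empty string.
    static String resolve(String template, Map<String, String> vars) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), "");
            // quoteReplacement so '$' or '\' inside a value is copied literally
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = Collections.singletonMap(
                "workflow.input.testRootHdfsPath", "/tmp/service-tests/resources");
        String base = "${workflow.input.testRootHdfsPath}/regex-copy/regex-copy-test-03/dst/";
        System.out.println(resolve(base + "${relativeFilepath}", vars)); // group reference lost
        System.out.println(resolve(base + "$1", vars));                  // $1 survives
    }
}
```

The first line reproduces the dstHdfsPathRegexRename value shown above, ending in ".../dst/".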

Is there any way to work around this?

@blueelephants
Author

I found a way to work around the issue:

Instead of referencing the named group (here "relativeFilepath") in the rename template, I can use the numbered references $1, $2, $3 to access the regex groups:

"srcHdfsPathRegexFilter": "(?<relativeFilepath>.*$)",
"dstHdfsPathRegexRename": "${workflow.input.testRootHdfsPath}/regex-copy/regex-copy-test-03/dst/$1"

This works, but if there is another workaround that still lets me use the named group, that would be great.
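One possible worker-side approach, sketched under assumptions: the workflow definition writes named-group references with a hypothetical #{name} placeholder, and the worker translates it into Java's ${name} syntax before applying the regex. This assumes Conductor's substitution only touches ${...} and leaves #{...} alone, which should be verified; the placeholder syntax and all names here are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderTranslation {
    // Hypothetical convention: named-group references are written as #{name};
    // assumes Conductor's ${...} substitution leaves them untouched.
    private static final Pattern PLACEHOLDER = Pattern.compile("#\\{(\\w+)\\}");

    // Converts every #{name} into Java's ${name} replacement syntax inside the worker.
    static String toJavaReplacement(String template) {
        // In the replacement, \$ is a literal '$' and $1 is the captured group name.
        return PLACEHOLDER.matcher(template).replaceAll("\\${$1}");
    }

    // Same rewrite as the business logic above, applied to the translated template.
    static String rewrite(String input, String filter, String template) {
        Matcher matcher = Pattern.compile(filter).matcher(input);
        return matcher.find() ? matcher.replaceFirst(toJavaReplacement(template)) : input;
    }

    public static void main(String[] args) {
        String filter = "(?<relativeFilepath>.*$)";
        String template = "/tmp/service-tests/resources/regex-copy/regex-copy-test-03/dst/#{relativeFilepath}";
        System.out.println(rewrite("sub/dir/file.txt", filter, template));
    }
}
```

This prints /tmp/service-tests/resources/regex-copy/regex-copy-test-03/dst/sub/dir/file.txt. The trade-off is that any literal "#{...}" in a path would also be translated, so the placeholder prefix should be something that never occurs in real paths.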
