jwestberg edited this page Apr 25, 2013 · 2 revisions

One thing that often comes up when creating new stages in Hydra is the need to debug them. Most things can be examined simply by running your stage locally and calling the process method by hand, but sometimes you need to look at real-life data in the pipeline. Both approaches are described below.

Running your stage against a test document

Q: I need to verify my stage does the right thing with a specific document!

Ok, so let's use the example of the RemoveFieldsStage, which has the following signature:

@Stage(description = "Removes fields specified by regular expressions.")
public class RemoveFieldsStage extends AbstractProcessStage {
	@Parameter(name = "removeFields", description = "List of regular expressions defining what fields to remove from the document")
	private List<String> removeFields;

	// ... //
}

Clearly, we will want to set the removeFields member to the same value it would have in the pipeline. This is done through the setParameters() method. So, a unit test for this stage for a particular document might look like this:

@Test
public void testRemoveFieldsStage() throws Exception {
	RemoveFieldsStage stage = new RemoveFieldsStage();
	
	/* Create the parameter map for this instance */
	HashMap<String, Object> parameters = new HashMap<String, Object>();
	parameters.put("removeFields", Arrays.asList("batman"));
	
	/* Set the parameters. Populates the removeFields member of stage with [ "batman" ] */
	stage.setParameters(parameters);
	
	/* Create a test document */
	LocalDocument doc = new LocalDocument();
	doc.putContentField("ironman", "tony stark");
	doc.putContentField("batman", "bruce wayne");
	
	/* Run the stage on that document */
	stage.process(doc);
	
	/* Verify behavior */
	Assert.assertFalse(doc.hasContentField("batman"));
	Assert.assertTrue(doc.hasContentField("ironman"));
	Assert.assertEquals(1, doc.getContentFields().size());
}

Q: Ok, that worked, but it fails when I run it in Hydra!

Well, then perhaps we should test it against the production configuration of your stage instead? If you have an instance of Hydra Core running somewhere, you can easily grab the live configuration and inject it into your stage:

	/* Set the parameters from what is configured for stage 'removeBatman' in Hydra */
	String hostName = "localhost";
	int port = 12001;
	String stageName = "removeBatman";
	Map<String, Object> properties = new RemotePipeline(hostName, port, stageName).getProperties();
	stage.setParameters(properties);

Adjust stageName, hostName, and port according to your needs. If you really are running on localhost with the default port, new RemotePipeline(stageName).getProperties() works too.

This lets you inspect the properties object you actually get back from core, and use a debugger to see what the fields in the stage are actually set to.
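
When the values coming back from core look suspicious, it can help to print each property together with its runtime type, since a value that arrives as a String where the stage expects a List is easy to overlook. The following is a minimal sketch of such a helper. PropertyDump and describe are hypothetical names and not part of Hydra, and the map is filled in by hand here in place of the one returned by getProperties().

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical debugging helper, not part of Hydra.
public class PropertyDump {

    /** Renders one line per property (name, runtime type, value), sorted by name. */
    static String describe(Map<String, Object> properties) {
        StringBuilder sb = new StringBuilder();
        // Copy into a TreeMap so the output order is stable between runs.
        for (Map.Entry<String, Object> e : new TreeMap<String, Object>(properties).entrySet()) {
            Object value = e.getValue();
            String type = (value == null) ? "null" : value.getClass().getSimpleName();
            sb.append(e.getKey()).append(" (").append(type).append(") = ").append(value).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by getProperties() (assumption for this sketch).
        Map<String, Object> properties = new TreeMap<String, Object>();
        properties.put("removeFields", Arrays.asList("batman"));
        System.out.print(describe(properties));
    }
}
```

In a test, the same call could be made on the properties map just before it is passed to stage.setParameters(), which makes it easy to compare the live configuration with the hand-built one.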

Running your stage against live data

Q: I need to see how my stage behaves in the wild, what do I do?

Here you have two options: launch your stage locally against your pipeline, or attach a remote debugger.

Run the stage locally

From your IDE, you can start your stage locally in the same way that Hydra Core starts it.

In Eclipse

  1. Open up Run Configurations and create a new Java Application run configuration.
  2. Make sure the "Project" field points to the project that contains your stage.
  3. Set the "Main Class" field to com.findwise.hydra.stage.GroupStarter, which is provided by hydra-api.
  4. Go to the "Arguments" tab and enter ${string_prompt} in the "Program arguments" text box.
  5. Hit run! You will be presented with a text box (the string prompt you asked for above).
  6. Enter stageGroup localhost 12001 into the text box that pops up. Replace stageGroup with the name of the stage group you wish to debug. localhost and 12001 identify the Hydra Core you are connecting to, and should be changed accordingly.

If all goes well, you will now have spun up the stages found in the stage group you specified, and you'll have them running inside your IDE. From here, you can debug your code normally, as the stages will start to receive documents from the core they are connected to.

NOTE however that you will now have two instances of the same group running. One is managed by the Hydra core you are connecting to, and the other is run inside your IDE. In order to turn off the one managed by Hydra and only keep the local one, set the stage group to the "DEBUG" state in the pipeline configuration.

Attach a remote debugger

You will need to provide some JVM options when the stage is launched. This can be done via the jvm_parameters stage property; see this wiki page for more information on that particular parameter.

Set that parameter to -Xdebug -Xrunjdwp:transport=dt_socket,address=3333,server=y,suspend=y in the stage group configuration. Hydra Core will then restart the stage group, and the JVM it spins up will listen for remote debug connections on port 3333.

With that done, you can now attach a remote debugger from your IDE.
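
If the debugger cannot connect, one thing worth ruling out is that the options never reached the stage JVM. The sketch below uses the standard java.lang.management API to list the arguments the current JVM was started with and to check for the debug agent. DebugAgentCheck and hasJdwpAgent are hypothetical names and not part of Hydra. Where you call this from, for example temporarily from your stage's own code so that the result ends up in its log, is left as an assumption.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic helper, not part of Hydra.
public class DebugAgentCheck {

    /** Returns true if any of the given JVM arguments enables the JDWP debug agent. */
    static boolean hasJdwpAgent(List<String> jvmArguments) {
        for (String arg : jvmArguments) {
            // -Xrunjdwp is the form used on this page; -agentlib:jdwp is the newer equivalent.
            if (arg.startsWith("-Xrunjdwp") || arg.startsWith("-agentlib:jdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The arguments passed to this JVM on startup (not the program arguments).
        List<String> jvmArguments = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArguments);
        System.out.println("JDWP agent requested: " + hasJdwpAgent(jvmArguments));
    }
}
```

Note that with suspend=y the JVM waits for a debugger before running any code, so output from this check only shows up either after you attach or when the options were not applied at all.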