This repository has been archived by the owner on Jan 7, 2020. It is now read-only.

Distributed Resources and Dependency Injection

Patrick Jaromin edited this page Apr 13, 2015 · 9 revisions

Mara greatly simplifies the distribution of resources by hiding the complexities of the distributed cache behind two annotations, @Distribute and @Resource. Together they pass values from the Tool class to MapReduce components. Both static and dynamically built objects and values may be passed into your component classes.

@Distribute

Driver or context properties that you want to make available in your Hadoop framework components may be annotated with @Distribute at the FIELD or METHOD level. If you annotate a method, it is invoked once during job initialization and its return value is distributed through the framework. Presently the annotation works with primitives, Strings, org.apache.hadoop.fs.Path, java.io.File, or any class that implements java.io.Serializable. Future enhancements will likely include support for serialization via Kryo.
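As a rough illustration of the collection step described above, the following self-contained sketch gathers annotated field values and method return values into a name-to-value map using plain reflection. It is not Mara's implementation: the `Distribute` annotation here is a local stand-in, and `DistributeCollectSketch`, `Driver`, `collect`, and `propertyName` are hypothetical names. The rule that `getBlacklist` is distributed under the name `blacklist` is inferred from the usage example on this page.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DistributeCollectSketch {

    // Local stand-in for Mara's @Distribute (assumption), so the sketch compiles on its own.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD})
    @interface Distribute {}

    // Hypothetical driver with one annotated field and one annotated method.
    static class Driver {
        int blacklistCalls = 0; // not annotated; only counts method invocations

        @Distribute
        private String myProperty = "hello";

        @Distribute
        public Set<String> getBlacklist() {
            blacklistCalls++;
            Set<String> blacklist = new HashSet<>();
            blacklist.add("spam");
            return blacklist;
        }
    }

    // Collects every @Distribute value into a name -> value map.
    // Each annotated method is invoked exactly once per call to collect().
    static Map<String, Object> collect(Object driver) {
        Map<String, Object> values = new HashMap<>();
        try {
            for (Field f : driver.getClass().getDeclaredFields()) {
                if (f.isAnnotationPresent(Distribute.class)) {
                    f.setAccessible(true);
                    values.put(f.getName(), f.get(driver));
                }
            }
            for (Method m : driver.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(Distribute.class)) {
                    m.setAccessible(true);
                    values.put(propertyName(m.getName()), m.invoke(driver));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not read @Distribute value", e);
        }
        return values;
    }

    // Assumed naming rule: getBlacklist -> blacklist; other names are kept as-is.
    static String propertyName(String methodName) {
        if (methodName.startsWith("get") && methodName.length() > 3) {
            return Character.toLowerCase(methodName.charAt(3)) + methodName.substring(4);
        }
        return methodName;
    }

    public static void main(String[] args) {
        Driver driver = new Driver();
        Map<String, Object> values = collect(driver);
        System.out.println("myProperty = " + values.get("myProperty"));
        System.out.println("blacklist  = " + values.get("blacklist"));
        System.out.println("getBlacklist() invocations: " + driver.blacklistCalls);
    }
}
```

In the real framework the collected values would then be serialized and shipped to the tasks; the sketch stops at the in-memory map.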

@Resource

To access the object in your Hadoop component (currently mappers and reducers are supported), annotate the class with either the Spring @Service stereotype annotation or the appropriate framework-specific @MapperService, @CombinerService, or @ReducerService annotation. Then annotate the member you wish to inject with @Resource. Basic type conversion is applied: for example, a String from the context is converted to a primitive value if it can be parsed into the target type, and legal casts work as expected. Use the optional ‘name’ parameter if you wish the distributed name to differ from the target resource bean property.
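The name resolution and String-to-primitive conversion described above can be sketched with plain reflection. This is a minimal illustration under stated assumptions, not Mara's injection code: the `Resource` annotation is a local stand-in with only a `name` attribute, and `ResourceInjectSketch`, `MyMapper`, `inject`, and `convert` are hypothetical names. Only a few primitive targets are handled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ResourceInjectSketch {

    // Local stand-in for the @Resource annotation (assumption), so the sketch compiles alone.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Resource {
        String name() default "";
    }

    // Hypothetical component with two injection targets.
    static class MyMapper {
        @Resource
        private int threshold;                // looked up by field name, parsed from a String

        @Resource(name = "blacklist")
        private Set<String> blacklistedWords; // looked up by the explicit name

        int getThreshold() { return threshold; }
        Set<String> getBlacklistedWords() { return blacklistedWords; }
    }

    // Copies matching values from the distributed map into @Resource fields.
    static void inject(Object component, Map<String, Object> distributed) {
        for (Field f : component.getClass().getDeclaredFields()) {
            Resource r = f.getAnnotation(Resource.class);
            if (r == null) {
                continue;
            }
            // Explicit name wins; otherwise fall back to the field name.
            String key = r.name().isEmpty() ? f.getName() : r.name();
            if (!distributed.containsKey(key)) {
                continue;
            }
            try {
                f.setAccessible(true);
                f.set(component, convert(distributed.get(key), f.getType()));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Could not inject " + key, e);
            }
        }
    }

    // Parses Strings into a handful of primitive targets; other values pass through unchanged.
    static Object convert(Object value, Class<?> target) {
        if (value instanceof String) {
            String s = (String) value;
            if (target == int.class || target == Integer.class) return Integer.valueOf(s);
            if (target == long.class || target == Long.class) return Long.valueOf(s);
            if (target == double.class || target == Double.class) return Double.valueOf(s);
            if (target == boolean.class || target == Boolean.class) return Boolean.valueOf(s);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Object> distributed = new HashMap<>();
        distributed.put("threshold", "42"); // a String, as it might arrive from the job context
        Set<String> words = new HashSet<>();
        words.add("spam");
        distributed.put("blacklist", words);

        MyMapper mapper = new MyMapper();
        inject(mapper, distributed);
        System.out.println("threshold = " + mapper.getThreshold());
        System.out.println("blacklistedWords = " + mapper.getBlacklistedWords());
    }
}
```

A value whose String form cannot be parsed into the target type raises a `NumberFormatException` in this sketch; how Mara itself reports such a mismatch is not covered here.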

Using @Distribute and @Resource

In your driver:

// Simple FIELD-level annotation of a property
@Distribute
private String myProperty;

// Can apply at FIELD or METHOD level
@Distribute
public Set<String> getBlacklist() throws IOException {
	Set<String> blacklist = null;
	if (StringUtils.isNotBlank(context.blacklist)) {
		blacklist = new HashSet<String>();
		// …<populate set>…
	}
	return blacklist;
}

// May also be applied to context-level elements
private static class MyContext extends DriverContextBase {
   @Distribute
   @Option(required=true, argName="string-value", description="A distributed cli option")
   private String myCliOption;
}

In your mapreduce component:

@Resource(name="blacklist")
private Set<String> blacklistedWords;

Unit Testing with Resource Distribution

Mara supports these annotations within unit tests as well, something that is difficult to replicate when using the distributed cache mechanism. See Unit Testing with Dependency Injection for details and examples.