StructureOfDiamond

Terminology

searchlet: A collection of executable code and parameters to be executed on one or more Diamond servers. A searchlet processes each of a series of objects and makes pass/drop decisions on each of them.
search: One execution run of a searchlet.
object: A unit of data to be searched, such as a single image or text file. Each object to be examined is processed by one or more filters. The result of running a filter on an object is a score, a floating-point value which is thresholded against a minimum and maximum specified in the searchlet to determine whether the object should be passed or dropped. Filters are free to consider internal structure within an object (such as individual pixels within an image), but must make a pass or drop decision on the object as a whole.
filter: A single program designed to be included in a searchlet. Responsible for a single task, such as face detection or texture recognition. A filter is started when the search starts and killed when the search completes. Filters accept zero or more string arguments and exactly one blob (binary) argument. Filters can have dependencies on other filters; for example, a face detection filter can depend on another filter that decodes a JPEG image to an RGB pixel array. The results of running a filter against an object are cached by OpenDiamond to improve performance on subsequent searches.
attribute: A named binary value associated with an object during searchlet execution. As an object is retrieved and filters are executed against it, attributes are associated with the object which can store result values, image thumbnail data, and so on. Attributes can be used to pass intermediate results between cooperating filters. All attribute names, as well as attribute values requested by the client, are reported to the client when an object is passed. However, attributes marked omitted are never reported to the client. Attribute naming conventions are documented here.
search predicate: The building block from which the user constructs a searchlet. Whereas the filter is the basic unit of searchlet execution, the predicate is the basic unit of searchlet definition. A predicate defines a set of configuration options that can be adjusted by the user, such as the minimum number of faces that must be found by a face detection filter or a string to find in a text file. It also specifies one or more filters that must execute when the predicate is included in a searchlet, and defines how to use the option values to set the filters' argument lists, dependencies, and minimum/maximum scores. Predicates are encapsulated in bundle files with extension .pred.
codec: A special type of search predicate which preprocesses each object. A searchlet typically includes a single codec, which typically does not have any options or drop any objects. The codec's responsibility is to perform basic transcoding tasks such as JPEG decompression and thumbnail generation, and to store the results in attributes for use by other filters. Codecs are encapsulated in bundle files with extension .codec.
example: Some predicates, such as texture or color detectors, work by finding images similar to one or more example images. These examples are chosen by the user and passed to the underlying filter via its blob argument.
scope: The set of objects to be examined during a particular search, typically computed via user interaction with a scope server. The scope is encoded in a scope cookie which is downloaded to the Diamond client, and thence uploaded to one or more Diamond servers.
reexecution: A search normally executes against every object in the scope. Reexecution, in contrast, occurs when a Diamond client requests that a searchlet be executed against one particular object, typically to obtain additional information about it.
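
The pass/drop rule described under "object" above can be sketched as follows. This is a minimal illustration, not OpenDiamond's implementation: the names FilterSpec and evaluate are hypothetical, the bounds are assumed to be inclusive, and an object is assumed to be dropped as soon as any filter's score falls outside its range.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attributes = Dict[str, bytes]


@dataclass
class FilterSpec:
    """Hypothetical stand-in for one filter entry in a searchlet."""
    name: str
    run: Callable[[Attributes], float]   # computes the score for one object
    min_score: float = float("-inf")     # assumed inclusive lower bound
    max_score: float = float("inf")      # assumed inclusive upper bound


def evaluate(obj: Attributes, filters: List[FilterSpec]) -> bool:
    """Return True if the object passes every filter, False if it is dropped."""
    for spec in filters:
        score = spec.run(obj)
        if not (spec.min_score <= score <= spec.max_score):
            return False  # drop: the decision applies to the whole object
    return True


# Example: pass only objects whose "data" attribute holds at least 10 bytes.
size_filter = FilterSpec("size", lambda o: float(len(o["data"])), min_score=10.0)
```

Calling evaluate({"data": b"abc"}, [size_filter]) drops the object, since its score of 3.0 is below the minimum.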

Scopeserver

The scopeserver provides a mechanism for the user to select a set of objects to search. This is done through an ordinary website, over HTTP or HTTPS. Objects can be selected in arbitrary ways, including:

selection from preset lists of objects
SQL query
Mirage query
live search

Once the user has specified the objects to search, the scopeserver generates a "scope cookie", which the browser downloads to the user's computer. A custom MIME type handler tells the browser to run a simple script that takes the cookie and deposits it where the client library will look for it (currently $HOME/.diamond/NEWSCOPE). The cookie contains a secret and a list of servers for OpenDiamond to contact and send the secret to.
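
As an illustration of the deposit step, the sketch below writes a downloaded cookie to the location named above. It is not the actual handler script: the function name deposit_cookie is hypothetical, NEWSCOPE is assumed to be a regular file, and a new cookie is assumed to replace any previous one.

```python
import os
import tempfile
from pathlib import Path


def deposit_cookie(cookie_text: str, home: Path) -> Path:
    """Write the cookie to <home>/.diamond/NEWSCOPE and return that path."""
    target_dir = home / ".diamond"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "NEWSCOPE"

    # Write to a temporary file first, then rename it into place, so a
    # reader never sees a half-written cookie. mkstemp creates the file
    # readable only by its owner, which suits a file that holds a secret.
    fd, tmp_name = tempfile.mkstemp(dir=target_dir)
    with os.fdopen(fd, "w") as tmp:
        tmp.write(cookie_text)
    os.replace(tmp_name, target)
    return target
```

A handler would call something like deposit_cookie(downloaded_text, Path.home()).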

diamondd

diamondd is the name of the OpenDiamond server program. It executes searchlet code against Diamond objects on behalf of the user.

diamondd listens on a single port for two incoming connections, which it pairs by way of a random nonce. Once a pair of connections is established, diamondd forks and begins speaking the OpenDiamond protocol, a custom XDR-based RPC protocol. One connection is the control connection; the other is the blast connection.

The client uses the control connection to configure and start the search, obtain search statistics, execute a searchlet on specific objects, and read and update session variables for anomaly detection. All responses come across the control connection and have no effect on the blast connection, with one exception. The start RPC will start the flow of objects on the blast connection. Note that there is no stop RPC. Clients are expected to disconnect from the server (HTTP-style) to signal their intent to stop receiving objects.

The blast connection uses a similar RPC protocol, but it only has one procedure: get_object. The client makes a get_object call on the blast connection when an object is requested. This is a blocking call that will wait until an object is ready. The get_object calls are typically pipelined in order to avoid round trip latency between object requests. In the current client library, 10 calls are pipelined at a time.
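
The idea of keeping a fixed number of get_object requests outstanding can be sketched as below. This is a simplified model rather than the client library's code: the real client pipelines calls over a single connection, whereas this sketch uses a thread per outstanding call. FakeBlast and pipelined_fetch are hypothetical names, and using None to mark the end of the search is an assumption made only for this example.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class FakeBlast:
    """Hypothetical stand-in for the server end of the blast connection."""

    def __init__(self, objects):
        self._it = iter(objects)
        self._lock = threading.Lock()

    def get_object(self):
        # Returns the next object, or None once the source is exhausted
        # (an assumed end-of-search marker, for illustration only).
        with self._lock:
            return next(self._it, None)


def pipelined_fetch(get_object, depth=10):
    """Yield objects while keeping up to `depth` calls outstanding."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = deque(pool.submit(get_object) for _ in range(depth))
        finished = False
        while pending:
            obj = pending.popleft().result()
            if obj is None:
                finished = True   # stop issuing new calls, drain the rest
                continue
            if not finished:
                pending.append(pool.submit(get_object))  # refill the window
            yield obj
```

Because the calls run concurrently in this model, the order in which objects are yielded is not guaranteed to match the order in which the source produced them.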

Once diamondd receives a scope cookie during the search setup process, it will decode the cookie and get the URL for the object list. Once it connects to the URL (typically a dataretriever, see below), a list of URLs will stream back. These represent individual objects that are within the scope of the search. diamondd will then fetch each raw object and process it through the filters in the searchlet.

Objects are processed by the searchlet via a set of filters. A Diamond object is a key-value dictionary of attributes. Passed results are streamed back. All names in the dictionary are sent, but the only values sent are the "push attributes" specified during search setup, as well as an object ID. This allows use of the reexecute call to retrieve arbitrary attribute values for a specific object.
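
The selection of what is sent for a passed object can be sketched as follows. This is only an illustration of the rule described above, not the wire format: the function build_result and the shape of the returned dictionary are hypothetical, and omitted attributes are assumed to be excluded from both the names and the values.

```python
from typing import Dict, FrozenSet, Set


def build_result(
    object_id: str,
    attributes: Dict[str, bytes],
    push_attributes: Set[str],
    omitted: FrozenSet[str] = frozenset(),
) -> dict:
    """Assemble what would be reported to the client for one passed object."""
    # Attributes marked omitted are never reported, neither name nor value.
    visible = {k: v for k, v in attributes.items() if k not in omitted}
    return {
        "object_id": object_id,
        # Every remaining attribute name is reported...
        "names": sorted(visible),
        # ...but values are sent only for the requested push attributes.
        "values": {k: v for k, v in visible.items() if k in push_attributes},
    }
```

A client that later wants a value it did not request up front would use reexecution, as described above, rather than expecting it in this result.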

Filters

Diamond filters are executable programs, and can be compiled x86 code or scripts. Filters communicate with diamondd over stdio using a simple key-value protocol that allows for the efficient exchange of blobs of binary data. Upon invocation, a filter waits for:

the protocol version (currently 1)
the filter's given name
the argument list to the filter
the blob (binary argument to the filter)

If all goes well, the filter replies with init-success. It then begins issuing commands to diamondd. These commands are:

get-attribute: get the value of the named attribute
set-attribute: set the value of the named attribute
omit-attribute: "omit" (do not send over the network) the name or value of the named attribute
get-session-variables: get the current values for the given session variables
update-session-variables: atomically add the given values to the named session variables
log: log a message
stdout: relay text sent from the filter's stdout
result: set the current object's result and signal the end of processing for this object

Note that the result command is special: it is like a return statement in C and signifies the end of working with a particular object. diamondd will record the result and move on to the next object.
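
The per-object flow of commands can be sketched with an in-memory stand-in for diamondd. This does not implement the stdio protocol and is not the libdiamondfilter or opendiamond.filter API: FakeHost, run_filter_on_object, text_match_filter, and the attribute names "text" and "match-count" are all hypothetical. The result command is modeled as the filter function's return value, mirroring the comparison to a return statement above.

```python
class FakeHost:
    """Hypothetical in-memory stand-in for diamondd's side of the exchange."""

    def __init__(self, attributes):
        self.attributes = dict(attributes)
        self.log_lines = []
        self.result = None

    def get_attribute(self, name):          # models get-attribute
        return self.attributes.get(name)

    def set_attribute(self, name, value):   # models set-attribute
        self.attributes[name] = value

    def log(self, message):                 # models log
        self.log_lines.append(message)


def text_match_filter(host, needle: bytes) -> float:
    """Example filter: score an object by how often `needle` occurs in it."""
    text = host.get_attribute("text") or b""
    count = text.count(needle)
    host.set_attribute("match-count", str(count).encode())
    host.log(f"found {count} matches")
    return float(count)                     # models the result command


def run_filter_on_object(host, filter_fn, *args) -> float:
    """Run the filter on one object and record its result, ending the object."""
    host.result = filter_fn(host, *args)
    return host.result
```

After the result is recorded, the host side would move on to the next object, so a filter should finish all attribute reads and writes for an object before returning.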

Diamond includes a C library, libdiamondfilter, that can handle the filter protocol on behalf of filters written in C or C++. The libdiamondfilter API is defined in <opendiamond/lib_filter.h>. There is also a Python opendiamond.filter package that provides similar services to filters written in Python.

Note that with the filter protocol, it is easy to run Diamond filters without using diamondd at all. This is useful for trying out filters inside of other existing systems, or replacing diamondd with something very different.

dataretriever

The dataretriever is a simple HTTP server that emits an object list and objects in the format that diamondd expects. In a simple Diamond setup, the scopeserver would be configured to produce a URL that points to a dataretriever local to the system that diamondd is running on. The dataretriever will read objects locally and feed them to diamondd.

cookiecutter

The cookiecutter is a utility program that can be used to generate a scope cookie for diamondd. Given certain parameters, it will generate a cookie that contains a list of servers and a URL to fetch the list of objects to search.

Client library (Java)

One client library currently exists, and it is written in Java. It fully implements the client side of all the RPC protocols used by diamondd. The library carefully avoids mutable shared state and allows for multiple searches simultaneously. It is heavily threaded, leveraging java.util.concurrent to provide a fairly simple design that also allows for good performance.

Applications

On top of the client library, we have several applications:

HyperFind
StrangeFind
MassFind
FatFind
PathFind

All of these are written in Java.

JSON Blaster

The JSON Blaster provides a web service allowing applications to perform Diamond searches using only HTTP, JSON, and SockJS. This allows Diamond web applications to be written in any language. More information about the JSON Blaster is available here.

Search predicates

In FatFind, MassFind, and StrangeFind, the process of selecting and configuring filters for a searchlet is hard-coded. HyperFind and PathFind, however, construct searchlets from one or more predicates configured by the user. Predicates are loaded from predicate bundle files stored in a particular directory. A predicate bundle is a specially-formatted Zip file containing an XML manifest and optionally some filter code or blob argument data. The manifest describes configuration options to be shown within the application, and how to use those options to configure one or more filters to be included in the searchlet. Documentation on the structure of the manifest file can be found here. The diamond-bundle-predicate tool can be used to create a predicate bundle from a manifest and any supporting data files.
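
Since a predicate bundle is a Zip file containing an XML manifest, its contents can be inspected with standard tools. The sketch below uses only Python's zipfile and xml.etree modules. It makes no claim about the real manifest's file name or schema: inspect_bundle is a hypothetical helper that treats the first member ending in .xml as the manifest.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List, Tuple


def inspect_bundle(data: bytes) -> Tuple[str, List[str]]:
    """Return the manifest's root element tag and the other member names."""
    with zipfile.ZipFile(io.BytesIO(data)) as bundle:
        names = bundle.namelist()
        # Assumption: the manifest is the first member ending in ".xml".
        xml_names = [n for n in names if n.lower().endswith(".xml")]
        if not xml_names:
            raise ValueError("no XML manifest found in bundle")
        manifest_name = xml_names[0]
        root = ET.fromstring(bundle.read(manifest_name))
        others = [n for n in names if n != manifest_name]
        return root.tag, others
```

Given the bytes of a bundle file, this reports which element the manifest starts with and which supporting files (filter code or blob argument data) travel alongside it.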

ImageJ and MATLAB predicates

ImageJFind and MATLABFind provide filters allowing specially-constructed ImageJ or MATLAB macros to be included in a searchlet. They also provide diamond-bundle-imagej and diamond-bundle-matlab command-line tools which can be used to create a predicate bundle from a macro.
