
Crawler API

Amit Jain edited this page Apr 3, 2017 · 4 revisions

The raster type crawler is responsible for crawling a data store and returning the crawled data source items. A crawler's function is to filter items that satisfy given criteria. For example, a simple file crawler can filter the files in a folder by their extension.

A Raster Type Definition can optionally provide a custom crawler implementation that filters items using its own rules. If no crawler class is defined, ArcGIS identifies and uses a default built-in crawler based on the input data store.

It is recommended that you define a crawler class if your input data is laid out in a specific folder structure or if you wish to group URIs (for example, the Multispectral and Panchromatic URIs of the same group are combined to give a Pan-sharpened item). A custom crawler implementation can also support other types of input formats (for example, a CSV file that contains paths to the items).
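The grouping idea mentioned above can be illustrated with a minimal sketch. It assumes a purely hypothetical naming convention in which the multispectral and panchromatic files of one scene share a base name and differ only by an `_MS` or `_PAN` suffix; the function name `group_by_scene` and the suffixes are illustrative and not part of the raster type API.

```python
import os
from collections import defaultdict


def group_by_scene(paths, suffixes=("_MS", "_PAN")):
    """Group paths whose file names share a scene name and differ by suffix.

    The suffix convention is an assumption made for this sketch only.
    """
    groups = defaultdict(dict)
    for path in paths:
        # File name without folder and without extension, e.g. "scene1_MS".
        stem = os.path.splitext(os.path.basename(path))[0]
        for suffix in suffixes:
            if stem.endswith(suffix):
                scene = stem[: -len(suffix)]
                groups[scene][suffix.lstrip("_")] = path
                break  # a file belongs to at most one suffix
    return dict(groups)


grouped = group_by_scene([
    "/data/scene1_MS.tif",
    "/data/scene1_PAN.tif",
    "/data/scene2_MS.tif",
    "/data/readme.txt",  # no known suffix, so it is ignored
])
```

A custom crawler could use a mapping like this to assign the same group name to every URI that belongs to one scene.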

The name of the crawler class must match the crawler name defined in the dictionary returned by the Raster Type Definition.

The crawler class must contain the following methods:

__init__(self, **kwargs)

This method is used to initialize the crawler class. kwargs is a dictionary that contains the following items:

  • paths (tuple) - All the paths that require crawling; each can be any string that represents a resource (for example, folders, file paths, or URLs).
  • recurse (bool) - Flag indicating whether the crawler should recurse into folders while crawling in order to search for valid items.
  • filter (string) - The filter used to search for datasets. This is a regular expression that can be used to filter file or folder names.
  • rasterTypeInfo (dictionary) - Basic raster-type-specific information, such as the raster type name.
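As a rough sketch of how a constructor might unpack these items, consider the following. The class name `SampleCrawler`, the attribute names, and the default values used when a key is absent are all assumptions made for illustration; the text above does not say whether any key can be missing.

```python
import re


class SampleCrawler:
    """Hypothetical crawler class; only the kwargs keys come from the text."""

    def __init__(self, **kwargs):
        # Paths to crawl: folders, file paths, URLs, or other resource strings.
        self.paths = kwargs.get("paths", ())
        # Whether to descend into subfolders (default is an assumption).
        self.recurse = kwargs.get("recurse", False)
        # The filter is a regular expression, so compile it once up front.
        pattern = kwargs.get("filter")
        self.pattern = re.compile(pattern) if pattern else None
        # Raster type information, such as the raster type name.
        self.raster_type_info = kwargs.get("rasterTypeInfo", {})


crawler = SampleCrawler(
    paths=("/data/scenes",),
    recurse=True,
    filter=r"\.tif$",
    rasterTypeInfo={"name": "Sample"},  # key name is illustrative
)
```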


This method returns a dictionary that represents the next crawled item URI. The dictionary contains the following items:

  • path (string) - The resource that is crawled. In the case of file based datasets, this will be the path to the file that was crawled.
  • tag (string) - Identifies the type of template to apply to the item that is built by the Builder::build function.
  • displayName (string) - The name of the file that is displayed in various user interface controls.
  • groupName (string) - The name of the collection group to which the item belongs.
  • productName (string) - The product type of the item.
  • uriProperties (dictionary) - Any additional properties associated with the item URI. Values in this dictionary must be either numeric or string (only basic data types are supported).

When there are no more items to crawl in the data store, this method must return None to mark the end of crawler iteration.
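The iteration contract described above (one dictionary per call, then None) could be sketched as follows for a simple folder crawl. This is an illustration under stated assumptions, not the reference implementation: the class name `FolderCrawler` and the method name `next_item` are placeholders (the text above does not show the method's name), and the values used for tag, groupName, productName, and uriProperties are made up.

```python
import os
import re


class FolderCrawler:
    """Hypothetical crawler that returns one item dictionary per matching file."""

    def __init__(self, **kwargs):
        self.paths = kwargs.get("paths", ())
        self.recurse = kwargs.get("recurse", False)
        pattern = kwargs.get("filter")
        self.pattern = re.compile(pattern) if pattern else None
        # A generator keeps the crawl position between calls.
        self._items = self._walk()

    def _walk(self):
        for root in self.paths:
            for folder, subfolders, files in os.walk(root):
                if not self.recurse:
                    subfolders[:] = []  # stops os.walk from descending further
                for name in sorted(files):
                    if self.pattern is None or self.pattern.search(name):
                        yield os.path.join(folder, name)

    def next_item(self):  # placeholder name, see the note above
        path = next(self._items, None)
        if path is None:
            return None  # marks the end of crawler iteration
        name = os.path.basename(path)
        return {
            "path": path,
            "tag": "Default",          # illustrative value
            "displayName": name,
            "groupName": os.path.splitext(name)[0],
            "productName": "Sample",   # illustrative value
            # Only numeric or string values are placed here.
            "uriProperties": {"sizeBytes": os.path.getsize(path)},
        }
```

A caller would invoke `next_item()` repeatedly and stop at the first None. Using a generator internally keeps the crawl lazy, so large folder trees are not listed in full before the first item is returned.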