Command line interface

Marten edited this page Dec 11, 2020 · 3 revisions

Harvester Command Line Interface

The Command Line Interface application is a Java application designed to run harvesting tasks outside the harvester web application. It is useful whenever a harvesting job needs to be integrated with the operating system, for example to schedule a job through the Unix cron mechanism.

General syntax:

java -jar harvest.jar [options] [-f <file>] | [-t <task definition>]

where:

  • options are:
   -h,--help: display help for the harvest command  
   -v,--version: display the harvester version number  
   -V,--verbose: print more detailed information during harvesting  
  • commands are:
   -f,--file <file>: perform harvesting using a task defined in a file; you can either use a file exported from the harvester application or create the file manually.  
   -t,--task <task definition>: perform harvesting using a task defined on the command line; this involves entering all parameters of the task.  
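
When the harvester is launched from a script rather than typed by hand, the syntax above can be assembled programmatically. The following Python sketch is only an illustration: the helper names `build_command` and `run_harvest` are hypothetical, and it assumes that `java` is on the PATH and that `harvest.jar` sits in the working directory.

```python
import subprocess


def build_command(file=None, task=None, verbose=False, jar="harvest.jar"):
    """Assemble the argument list for one harvesting run.

    Exactly one of `file` (path to a task definition file) or `task`
    (the task definition as a JSON string) must be given.
    """
    if (file is None) == (task is None):
        raise ValueError("give exactly one of file= or task=")
    command = ["java", "-jar", jar]
    if verbose:
        command.append("-V")  # -V,--verbose
    if file is not None:
        command += ["-f", file]  # -f,--file
    else:
        command += ["-t", task]  # -t,--task
    return command


def run_harvest(**kwargs):
    """Start the harvester and return the exit code of the java process."""
    return subprocess.run(build_command(**kwargs)).returncode
```

For example, `run_harvest(file="task.json", verbose=True)` would start `java -jar harvest.jar -V -f task.json`.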

Task definition syntax:

A task definition is a JSON document with the following structure:

{
	"source" : {
		"type" : <Type of the adaptor: WAF | CSW | UNC | GPTSRC | AGS | AGP-IN | CKAN>,
		"label" : <Short name of the source>,
		"properties" : {
			<properties custom for each adaptor type>
		}
	},
	"destinations" : [{
			"action" : {
				"type" : <Type of the adaptor : FOLDER | GPT | AGP-OUT>,
				"label" : <Short name of the destination>,
				"properties" : {
					<properties custom for each adaptor type>
				}
			}
		}
	],
	"incremental" : <Perform incremental harvesting if possible: true|false; default is false>,
	"ignoreRobots" : <Ignore directives from robots.txt if present: true|false; default is false>
}
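
A hand-written task definition can be checked against this structure before it is passed to the harvester. The Python sketch below is a hypothetical pre-check built only from the syntax above; the function name `check_task` is an assumption, and the harvester's own validation may be stricter or more lenient.

```python
SOURCE_TYPES = {"WAF", "CSW", "UNC", "GPTSRC", "AGS", "AGP-IN", "CKAN"}
DESTINATION_TYPES = {"FOLDER", "GPT", "AGP-OUT"}
TOP_LEVEL_KEYS = {"source", "destinations", "incremental", "ignoreRobots"}


def check_task(task):
    """Return a list of problems found in a parsed task definition (a dict)."""
    problems = []
    # Keys that do not appear in the documented syntax, e.g. misspellings.
    for key in sorted(set(task) - TOP_LEVEL_KEYS):
        problems.append(f"unknown top-level key: {key}")
    source = task.get("source")
    if not isinstance(source, dict) or source.get("type") not in SOURCE_TYPES:
        problems.append("source.type must be one of " + ", ".join(sorted(SOURCE_TYPES)))
    destinations = task.get("destinations")
    if not isinstance(destinations, list):
        problems.append("destinations must be a list")
    else:
        for index, destination in enumerate(destinations):
            action = destination.get("action") if isinstance(destination, dict) else None
            if not isinstance(action, dict) or action.get("type") not in DESTINATION_TYPES:
                problems.append(
                    f"destinations[{index}].action.type must be one of "
                    + ", ".join(sorted(DESTINATION_TYPES))
                )
    # Both flags are optional; when present they must be JSON booleans.
    for flag in ("incremental", "ignoreRobots"):
        if flag in task and not isinstance(task[flag], bool):
            problems.append(f"{flag} must be true or false")
    return problems
```

An empty list means that no problem was found. A misspelled top-level key is reported as unknown, which makes it easier to notice a flag that would otherwise not be recognized under its documented name.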

The following is an example task definition:

{
	"source" : {
		"type" : "CSW",
		"label" : "GEOSUR",
		"properties" : {
			"csw-host-url" : "http://www.geosur.info/geoportal/csw",
			"csw-profile-id" : "urn:ogc:CSW:2.0.2:HTTP:OGCCORE:ESRI:GPT"
		}
	},
	"destinations" : [{
			"action" : {
				"type" : "GPT",
				"label" : "GPT",
				"properties" : {
					"gpt-host-url" : "http://localhost:8080/geoportal",
					"cred-username" : "user name",
					"cred-password" : "user password",
					"gpt-cleanup" : "true"
				}
			}
		}
	],
	"incremental" : false,
	"ignoreRobots" : true
}
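
A task definition like this one can also be generated by a script, which keeps credentials out of a file that is edited by hand. The Python sketch below rebuilds the example and writes it to a file for use with the -f option. It assumes that a file passed with -f contains the same JSON as a task definition; the environment variable names `HARVEST_USER` and `HARVEST_PASSWORD` and the helper names are hypothetical, and the key names follow the task definition syntax.

```python
import json
import os
import tempfile


def make_task(username, password):
    """Return the CSW-to-GPT example task as a Python dict."""
    return {
        "source": {
            "type": "CSW",
            "label": "GEOSUR",
            "properties": {
                "csw-host-url": "http://www.geosur.info/geoportal/csw",
                "csw-profile-id": "urn:ogc:CSW:2.0.2:HTTP:OGCCORE:ESRI:GPT",
            },
        },
        "destinations": [{
            "action": {
                "type": "GPT",
                "label": "GPT",
                "properties": {
                    "gpt-host-url": "http://localhost:8080/geoportal",
                    "cred-username": username,
                    "cred-password": password,
                    "gpt-cleanup": "true",
                },
            },
        }],
        "incremental": False,
        "ignoreRobots": True,
    }


def write_task_file(path, task):
    """Write the task definition as JSON so it can be passed with -f."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(task, handle, indent=2)


# Credentials come from the (hypothetical) environment variables, with the
# placeholder values of the example as fallback.
task = make_task(os.environ.get("HARVEST_USER", "user name"),
                 os.environ.get("HARVEST_PASSWORD", "user password"))
task_path = os.path.join(tempfile.gettempdir(), "task.json")
write_task_file(task_path, task)
print("java -jar harvest.jar -f " + task_path)
```

The printed line is the command that would run the generated task.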

Note:

  • With the -t option, supply the task definition itself on the command line rather than the path to a task definition file:
java -jar harvest.jar -t "<task definition>"
  • The task definition is wrapped in quotes and contains double quotes itself, so the command looks different on Linux/Unix and on Windows. For simplicity, assume the task looks like this (JSON):
{"source": {}, "destinations": []}
On Linux (Unix), the task can be wrapped in single quotes, so the command looks like:
java -jar harvest.jar -t '{"source": {}, "destinations": []}'
On Windows, the inner double quotes have to be escaped by doubling them, so the command looks like:
java -jar harvest.jar -t "{""source"": {}, ""destinations"": []}"
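
The two quoting styles can be produced mechanically. The Python sketch below is a minimal illustration with hypothetical helper names: `posix_quote` relies on the standard `shlex.quote`, and `windows_quote` simply doubles every inner double quote as described in the note. It covers only the double-quote case shown here, not other special characters.

```python
import json
import shlex


def posix_quote(task_json):
    """Quote a task definition for a Linux/Unix shell (single quotes)."""
    return shlex.quote(task_json)


def windows_quote(task_json):
    """Quote a task definition for Windows by doubling inner double quotes."""
    return '"' + task_json.replace('"', '""') + '"'


task_json = json.dumps({"source": {}, "destinations": []})
print("java -jar harvest.jar -t " + posix_quote(task_json))
print("java -jar harvest.jar -t " + windows_quote(task_json))
```

When the harvester is started from a program instead of a shell, passing the arguments as a list, for example `subprocess.run(["java", "-jar", "harvest.jar", "-t", task_json])`, sidesteps shell quoting on Linux/Unix because no shell parses the command.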

Adaptor types and their parameters:

Input adaptors:

WAF - Web Accessible Folder  
	"waf-host-url" - harvesting root URL  
	"waf-pattern" - pattern to filter harvested files (GLOB pattern syntax); default: **.xml  
	"cred-username" - user name (optional)  
	"cred-password" - user password (optional)  
CSW - Catalog Service for the Web  
	"csw-host-url" - CSW service URL  
	"csw-profile-id" - CSW profile id  
	"cred-username" - user name (optional)  
	"cred-password" - user password (optional)  
UNC - Uniform Naming Convention folder  
	"unc-root-folder" - root folder (must be accessible by the harvester)  
	"unc-pattern" - pattern to filter harvested files (GLOB pattern syntax); default: **.xml  
GPTSRC - Geoportal Server 2.0  
	"gpt-host-url" - Geoportal Server URL  
	"gpt-index" - name of the index to harvest (optional)  
	"cred-username" - user name  
	"cred-password" - user password  
AGS - ArcGIS Server services  
	"ags-host-url" - ArcGIS Server host URL (before rest/services)  
	"ags-enable-layers" - true to harvest layers from map service (default: false)  
AGP-IN - ArcGIS Portal (or ArcGIS Online)  
	"agp-host-url" - ArcGIS Portal host URL (before sharing)  
	"agp-folder-id" - folder id or folder name (optional)  
	"cred-username" - user name  
	"cred-password" - user password  
CKAN - CKAN  
	"ckan-host-url" - CKAN host URL  
	"ckan-apikey" - CKAN API key (optional)  

Output adaptors:

FOLDER - local folder  
	"folder-root-folder" - root folder  
	"folder-cleanup" - cleanup data  
GPT - Geoportal Server 2.0  
	"gpt-host-url" - Geoportal Server URL  
	"gpt-index" - name of the index to harvest (optional)  
	"gpt-force-add" - force adding records instead of checking whether they exist  
	"gpt-cleanup" - cleanup data  
	"cred-username" - user name  
	"cred-password" - user password  
AGP-OUT - ArcGIS Portal (or ArcGIS Online)  
	"agp-host-url" - ArcGIS Portal host URL (before sharing)  
	"agp-folder-id" - folder id or folder name (optional)  
	"agp-folder-cleanup" - cleanup data  
	"cred-username" - user name  
	"cred-password" - user password