Details of Lucene Indexing in the Geoportal
The Geoportal Server uses Lucene to index metadata for search. How metadata is indexed is important because it determines what search results are returned when a user submits search criteria to the geoportal. When a metadata document is published, certain content from the document is submitted for indexing. When a user conducts a search, it is the index, not the geoportal database, that is searched. Indexed information is assigned a particular meaning. A 'meaning' refers to a concept you would like to specifically search or query. This 'meaning' determines how Lucene will index the content and how it may be used in searching.
Before a 'meaning' value can be used, it has to be defined in a file called property-meanings.xml, located in \\geoportal\WEB-INF\classes\gpt\metadata. The geoportal references property-meanings.xml to index the metadata value for search and retrieval.
Each geoportal metadata profile definition.xml file can specify the set of properties that will be indexed. These properties are usually captured in that profile's indexables.xml file. The indexables.xml file makes a connection between an element's XML xpath and its associated meaning in the property-meanings.xml file. This in turn defines how that element will be indexed and searched.
Note: The geoportal can be customized so that it automatically indexes all metadata content, regardless of which parameter it is associated with in the metadata. To enable this customization, see Index All Metadata Content.
Determine if a metadata element is already indexed by default
To check if a specific XML element is already indexed, identify the definition.xml file for the profile that references the metadata element. For example, to investigate whether the Lineage element from the INSPIRE (Data) profile is indexed, we start by opening the inspire-iso-19115-definition.xml file. Here, we need to identify which indexables.xml file is referenced by this profile. To find the indexables.xml file, look at the fileName attribute of the <indexables fileName=""></indexables> element in definition.xml. In our example, this points to the apiso-indexables.xml file from the \\geoportal\WEB-INF\classes\gpt\metadata\iso folder. Once you have identified which indexables.xml file is referenced, open that indexables.xml.
Now, search in the indexables.xml file for the xpath of the metadata element in which you are interested. If the xpath is not referenced, then it is not indexed. Alternatively, you may see that it is present in the file and therefore indexed, but may want to change how it is indexed.
In our Lineage example, the xpath is /gmd:MD_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:statement/gco:CharacterString. We do find this xpath in the apiso-indexables.xml file, and see that it is indexed by the property meaning name apiso:Lineage. When we look up the property meaning with name="apiso:Lineage" in property-meanings.xml, we see that the queriable for this is the text apiso.Lineage. So we could type apiso.Lineage:searchTerm in the Search field on the geoportal search page to search the Lineage elements for searchTerm.
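The lookup described above can also be scripted. The following is a minimal sketch, assuming an indexables file made up of <property meaning="..." xpath="..."/> entries like the one shown in this topic; the <indexables> root element and the function name are assumptions made for illustration, and a real indexables.xml may be structured differently.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, hypothetical stand-in for an indexables file. Only the
# <property meaning="..." xpath="..."/> form comes from this topic; the
# <indexables> root element is an assumption made for the sketch.
SAMPLE_INDEXABLES = """
<indexables>
  <property meaning="resource.url"
            xpath="/metadata/distinfo/stdorder/digform/digtopt/onlinopt/computer/networka/networkr[contains(.,'thredds')]"/>
  <property meaning="apiso:Lineage"
            xpath="/gmd:MD_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:statement/gco:CharacterString"/>
</indexables>
"""

LINEAGE_XPATH = (
    "/gmd:MD_Metadata/gmd:dataQualityInfo/gmd:DQ_DataQuality"
    "/gmd:lineage/gmd:LI_Lineage/gmd:statement/gco:CharacterString"
)


def meanings_for_xpath(indexables_xml, xpath):
    """Return the property meanings whose xpath attribute equals `xpath`."""
    root = ET.fromstring(indexables_xml)
    return [
        prop.get("meaning")
        for prop in root.iter("property")
        if prop.get("xpath") == xpath
    ]


if __name__ == "__main__":
    # An empty list means the xpath is not referenced, so it is not indexed.
    print(meanings_for_xpath(SAMPLE_INDEXABLES, LINEAGE_XPATH))
```

The comparison is an exact string match, so an xpath that differs only by a predicate or trailing whitespace would not be found; a plain text search in the file remains the simplest check.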
If a metadata element is not already indexed, add it to the indexables.xml file
If the xpath to the metadata element is not provided in indexables.xml, you can add its xpath to one of the property meanings listed in that file. After adding the xpath to the property meaning that matches your metadata element's meaning, save indexables.xml and restart the geoportal web application. You will need to re-approve the resources through the geoportal Administration interface for them to be reindexed with your new property meaning.
Note: It is possible to implement conditional indexing as well. For example, if you wanted to index a URL only if it contained a certain phrase, you could leverage a [contains] component in the xpath. In the snippet below, we are indexing the resource.url only if it contains the word "thredds". URLs that do not contain the word "thredds" would not be indexed:
<property meaning="resource.url"
xpath="/metadata/distinfo/stdorder/digform/digtopt/onlinopt/computer/networka/networkr[contains(.,'thredds')]"/>
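To make the effect of the [contains] predicate concrete, here is a small sketch that applies the same filter in plain Python. The sample metadata and its URLs are hypothetical, and the filtering is done in Python because the standard library's ElementTree does not support the XPath contains() function; this only mimics which values would be selected and is not how the geoportal evaluates the xpath.

```python
import xml.etree.ElementTree as ET

# Hypothetical FGDC-style fragment with two network resource URLs.
SAMPLE_METADATA = """
<metadata><distinfo><stdorder><digform><digtopt><onlinopt><computer><networka>
  <networkr>http://example.org/thredds/catalog.xml</networkr>
  <networkr>http://example.org/data/download.zip</networkr>
</networka></computer></onlinopt></digtopt></digform></stdorder></distinfo></metadata>
"""

# Path relative to the <metadata> root, matching the xpath in the snippet above.
NETWORKR_PATH = "distinfo/stdorder/digform/digtopt/onlinopt/computer/networka/networkr"


def urls_matching(metadata_xml, phrase):
    """Return networkr values that contain `phrase`, like [contains(.,'phrase')]."""
    root = ET.fromstring(metadata_xml)
    return [
        element.text
        for element in root.findall(NETWORKR_PATH)
        if element.text and phrase in element.text
    ]


if __name__ == "__main__":
    # Only the URL containing "thredds" would be indexed as resource.url.
    print(urls_matching(SAMPLE_METADATA, "thredds"))
```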
Instructions for defining a new property meaning are provided later in this topic, in the section How to define a new property meaning, but first read the section on the property-meanings.xml file below.
The property-meanings.xml file
Before adding new meanings, check the property-meanings.xml file to see if an existing meaning will suit your need. Some of the meanings already defined in the file are listed in the table below, along with any functionality the geoportal code associates with that meaning. Additional meanings defined for ISO-based standards are also found in property-meanings.xml, but are not listed in the table. By using existing meanings, the effort to upgrade to future versions of the Geoportal Server is minimized. The existing meanings should satisfy most of the search needs.
| property-meaning name | description | geoportal function |
| uuid | geoportal's primary key for identifying the document. | Typically you will see this value in URLs. For example: http://host:port/geoportal/rest/document?id=[uuid] |
| fileIdentifier | Represents an identifier from within the metadata document. Not all metadata standards support an internal identifier metadata element. If present, it is recommended that it be globally unique. | Used by the geoportal to avoid duplication of resources and as an alternative identifier for most of the REST-based functions. For example: http://host:port/geoportal/rest/document?id=[fileIdentifier] |
| sys.siteuuid | Internally used by the geoportal, associated with documents that are harvested from remote catalogs. It is the identifier of the remote catalog. Do not alter this. | Available for query. |
| dateModified | geoportal's modification datestamp associated with the last time the resource's XML was updated. | Used in the Additional Options dialog on the geoportal Search page, and for sorting by date. |
| geometry | Represents the bounding envelope associated with the resource. | Used for spatial queries. |
| keywords | Keywords associated with the resource. | Available for query. |
| body | Non-specific query; a catch-all for indexing and searching text in a metadata document. | If you want to index a certain element, but do not plan to query for that specific element, index it as body. |
| anytext | Anytext is not actually indexed. It represents a collection of properties that will be searched when the queriable anytext is specified. | General searches that are not directed to a specific property are anytext queries. |
| title | Title of the resource. | Used when the resource's title is displayed, for example in the list of search results on the Search page. |
| title.org | Captures the original title as provided from a resource's GetCapabilities response. | Enables geoportal to search both a user-given title for a registered resource, and its original title as per the GetCapabilities response. |
| abstract | Abstract associated with the resource. | Maps to the information displayed as text below the title of a record in the list of search results. |
| contentType | Esri concept for categorizing resources. | Used for generating the icon for the resource listed in Search page results, and also as a filter on the Additional Options dialog. |
| dataTheme | ISO Topic Category code associated with the resource. ISO has defined the Topic Category codelist in the 19115 standard. | Maps to the ISO Categories in the Additional Options dialog. |
| resource.url | Primary endpoint for accessing the resource through the internet. | Used for generation of links in search results. For example, it is the URL accessed when the Preview or Open link is clicked. It is also sometimes used to determine the Esri contentType for the resource. |
| thumbnail.url | URL to the thumbnail image for the resource. | Used for generation of the thumbnail image next to the resource in the list of search results. |
| website.url | URL to a website associated with the resource. | Used for generation of a website link for the resource in the list of search results. |
Each property-meaning in property-meanings.xml has attributes. These attributes for property-meanings are described below.
| Attribute Name | Description |
| name | Unique name for the meaning in this file, and should match the meaning="" attribute in the definition.xml file. The name designated becomes a Lucene field that can be used for advanced searches, as per Lucene documentation. For example, designating a name of title and then typing title:water on your geoportal search page will only return items with water in the index Lucene has associated with the property-meaning title. |
| meaningType | Used to flag metadata elements that are tied to functionality within the geoportal. It is good practice to avoid altering the meaningType of a property-meaning. |
| valueType | Data type of the property value, e.g. Double, Geometry, Long, String, or Timestamp. |
| comparisonType | Indicates how Lucene will index the property values. Three options are defined in the property-meanings.xml file; the examples in this topic use terms and value. |
Some property-meanings have one or two additional sub-elements, <dc></dc> and <consider></consider>.
- The <dc></dc> element facilitates the connection of property-meanings to Dublin Core concepts. This is essential to supporting the CS-W OGCORE profile, defining what is queriable and returnable through CS-W. Within the <dc></dc> element, there is an attribute for name and one for aliases. The name attribute defines the name of the Dublin Core element. The aliases attribute defines alternate words that will be recognized when supplied as a CS-W property name.
- The <consider></consider> element is used only for the anytext property. It defines other property-meanings that should be included when a search target is anytext. For example, the property-meaning for anytext is shown below. Because anytext has six other property-meanings listed in its <consider></consider> element, a search for anytext results in the title, abstract, keywords, body, contentType, and dataTheme properties being searched.
<property-meaning name="anytext"
meaningType="anytext"
valueType="String"
comparisonType="terms"
allowLeadingWildcard="true">
<consider>title,abstract,keywords,body,contentType,dataTheme</consider>
<dc name="AnyText" aliases="csw:AnyText,any,csw:Any"/>
</property-meaning>
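As an illustration of how the consider list fans a general search out to several properties, the sketch below reads the property-meaning shown above and builds an equivalent field-by-field query string. The function names are hypothetical, and the OR-joined string is only a way to picture the behavior; it is not the query the geoportal actually constructs.

```python
import xml.etree.ElementTree as ET

# The anytext property-meaning as shown in this topic.
ANYTEXT_MEANING = """
<property-meaning name="anytext" meaningType="anytext" valueType="String"
                  comparisonType="terms" allowLeadingWildcard="true">
  <consider>title,abstract,keywords,body,contentType,dataTheme</consider>
  <dc name="AnyText" aliases="csw:AnyText,any,csw:Any"/>
</property-meaning>
"""


def considered_fields(property_meaning_xml):
    """Return the property-meaning names listed in the <consider> element."""
    node = ET.fromstring(property_meaning_xml)
    consider = node.findtext("consider", default="")
    return [name.strip() for name in consider.split(",") if name.strip()]


def expand_anytext(term, fields):
    """Picture an anytext search as one field:term clause per considered field."""
    return " OR ".join(f"{field}:{term}" for field in fields)


if __name__ == "__main__":
    fields = considered_fields(ANYTEXT_MEANING)
    print(expand_anytext("water", fields))
```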
How to define a new property meaning
If you have created a custom metadata profile, or added new elements to an existing geoportal metadata profile, and none of the existing property meanings in property-meanings.xml suit your needs for indexing a specific element, then you may need to define a new property meaning. Follow the instructions below.
- Choose an existing property meaning from property-meanings.xml that is conceptually similar to the new one you'd like to create. Make a copy of the existing property meaning, and use it as a template to add a new property meaning per the specifications discussed above.
- Add a reference to your property meaning to indexables.xml for the profile for your metadata. Make sure that the xpath for the property meaning in indexables.xml correctly references the xpath for the element in the metadata.
- Save the files, and restart the geoportal web application. You will need to re-approve the resources through the geoportal Administration interface for them to be reindexed with your new property meaning.
Configure the property meaning to be returned in the CSW response
If you would like your new property meaning information to be returned when users query your geoportal through CSW, there are some additional steps. You will add the property meanings you want returned to the brief, summary, and/or full CSW response. You will also map the property to an appropriate Dublin Core element of your choosing (see http://dublincore.org/documents/dcmi-terms/; dct:references is used in this example) so it can be returned. Follow the steps below:
- In property-meanings.xml, find the brief, summary, and full section.
- Add the property meaning that you want returned in the CSW response to the list(s) of meanings in the meaning-names tags for the brief, summary, and/or full section (depending on whether you want to include it in the brief, summary, or full response). An example for a property meaning called new.property is shown below for the summary section; note that it doesn't really matter which of the meaning-names tags you put the new property in.
<summary>
<dc>
<meaning-names>fileIdentifier,uuid,title</meaning-names>
<meaning-names>contentType,dataTheme</meaning-names>
<meaning-names>dateModified,abstract</meaning-names>
<meaning-names>resource.url,website.url,thumbnail.url,xml.url</meaning-names>
<meaning-names>geometry,date,relation,new.property</meaning-names>
</dc>
</summary>
- Now, go to the place in the property-meanings.xml file where the property meaning of interest is defined, for example:
<property-meaning name="new.property" valueType="String" comparisonType="value">
</property-meaning>
- Map the new property to a Dublin Core element by adding a dc element that is appropriate to your property meaning. Note you'll need to update the scheme attribute as well - e.g., scheme="urn:x-esri:specification:new.property". Because the CSW GetRecords response typically provides Dublin Core elements by default, this maps your property meaning into an acceptable Dublin Core element. The dct:references can be used, or another of your choosing - the element you choose will be holding your property's information in the response. See the example below:
<property-meaning name="new.property" valueType="String" comparisonType="value">
<dc name="dct:references" scheme="urn:x-esri:specification:new.property"/>
</property-meaning>
- Save the property-meanings.xml file and restart your geoportal web application. Now when you post a CSW query, you should be able to see the property information in the response, in the element corresponding to the dc element you chose.
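If you maintain several geoportal deployments, the meaning-names edit from the steps above could be scripted. The following is a rough sketch, assuming a section shaped like the summary example in this topic; the function name is hypothetical, it works on an XML string rather than the real file, and you should back up property-meanings.xml before automating any change to it.

```python
import xml.etree.ElementTree as ET

# Shortened version of the summary section shown in this topic.
SUMMARY_SECTION = """
<summary>
  <dc>
    <meaning-names>fileIdentifier,uuid,title</meaning-names>
    <meaning-names>geometry,date,relation</meaning-names>
  </dc>
</summary>
"""


def add_meaning_name(section_xml, new_name):
    """Append `new_name` to the last <meaning-names> list unless already present."""
    section = ET.fromstring(section_xml)
    groups = section.findall("dc/meaning-names")
    if not groups:
        raise ValueError("section has no dc/meaning-names elements")
    existing = {
        name.strip()
        for group in groups
        for name in (group.text or "").split(",")
    }
    if new_name not in existing:
        last = groups[-1]
        last.text = f"{last.text},{new_name}" if last.text else new_name
    return ET.tostring(section, encoding="unicode")


if __name__ == "__main__":
    print(add_meaning_name(SUMMARY_SECTION, "new.property"))
```

The name is appended to the last meaning-names list only because, as noted above, it does not matter which list holds it.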
Tips and Tools
- A useful tool for investigating your Lucene index is Luke (http://code.google.com/p/luke/). The version that works with Geoportal 1.2 is http://code.google.com/p/luke/downloads/detail?name=lukeall-3.1.0.jar. Just double-click the jar after downloading, and the interface appears. You may have to stop your geoportal web application to open your index; otherwise, Luke complains that the file is locked and does not allow you to force an unlock.