Geoportal Facets using Apache Solr
The Geoportal Facets Customization (GFC) is a prototype component that can be deployed standalone or in conjunction with the Esri Geoportal Server. It can be downloaded from https://github.com/Esri/geoportal-server/wiki/Geoportal-Server-Downloads. This topic describes the GFC as used with the Esri Geoportal Server: an overview of the GFC, how to set it up, how to use the sample user interface, and how to configure it further.
The GFC consists of a backend component and a user interface component, both of which use an Apache Solr index (Solr versions 4.1.0 and 4.6.1 have been tested and are supported). The backend component parses XML into the Solr index according to customizable rules, and also writes documents from a Geoportal Server database to the Solr index. The user interface interacts with the Solr index through a faceting and filtering display. Architecture for the GFC is shown below. Solr is deployed separately from the Esri Geoportal Server and maintains a separate Lucene index. Records can be pushed into this index from the Esri Geoportal Server or from other systems. The custom Solr UI exposes the Solr index; this UI can be placed into the Esri Geoportal Server page structure or maintained outside of the Esri Geoportal Server.
How to set up the GFC
To set up the GFC, you first set up Solr. You then deploy a custom web application that connects your geoportal to the Solr instance. Finally, you configure the web application and the user interface for viewing the Solr index.
Step One: Verify the Apache Solr instance
To set up the GFC, you must first have an Apache Solr instance deployed. This can, but does not have to, be deployed in the same web server instance as your geoportal. Download and install Apache Solr (http://lucene.apache.org/solr).
In this example, the provided Jetty servlet container is used, following the Solr tutorial examples. It is possible to run Solr in another servlet container (e.g., Tomcat or GlassFish), but specific Java properties in your Solr instance will need to be defined. See the Apache Solr documentation for configuring Solr in other servlet containers.
Step Two: Decide where you are going to put your geoportal solr documents
You can customize this deployment to put the Solr documents in the default collection or in a new collection. Each collection is an independent Lucene index; collections are also referred to as 'cores' in the Solr administrative interface. In this example, we put them in the default collection, called ‘collection1’.
Step Three: Modify the Solr schema.xml file
A couple of small modifications to the Solr schema.xml in your collection location are necessary to support the geoportal integration. Open the schema.xml file from the collection location, for example: \\solr-4.6.1\example\solr\collection1\conf\schema.xml
The changes are in 2 sections:
- Find the section where the fields are defined. Just before the closing </fields> tag, enter the following:
<!-- CUSTOMIZATION -->
<dynamicField name="*_geo" type="geo" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_xml" type="string" indexed="false" stored="true" multiValued="false"/>
<dynamicField name="*_json" type="string" indexed="false" stored="true" multiValued="false"/>
<dynamicField name="*_clob" type="string" indexed="false" stored="true" multiValued="false"/>
<dynamicField name="*_blob" type="binary" indexed="false" stored="true" multiValued="false"/>
- Now, find this section:
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
- Just after that element, paste the following:
<!-- CUSTOMIZATION -->
<!-- we had trouble with the default distErrPct="0.025", changed to distErrPct="0.25", may no longer be necessary -->
<fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
geo="true" distErrPct="0.25" maxDistErr="0.000009" units="degrees" />
- Save the schema.xml file, and then restart the Solr service.
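Before restarting, it can help to confirm that the edits actually landed in schema.xml. The following is a minimal sketch, not part of the GFC distribution: it parses a schema document and reports which of the dynamic fields and field type listed in this step are absent. The function name `missing_customizations` and the abbreviated `SAMPLE` schema are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Dynamic field patterns and the field type named in the schema.xml edits of this step.
REQUIRED_DYNAMIC_FIELDS = {"*_geo", "*_xml", "*_json", "*_clob", "*_blob"}
REQUIRED_FIELD_TYPE = "geo"


def missing_customizations(schema_xml: str) -> list:
    """Return the expected customizations that are absent from a schema.xml string."""
    root = ET.fromstring(schema_xml)
    dynamic = {el.get("name") for el in root.iter("dynamicField")}
    types = {el.get("name") for el in root.iter("fieldType")}
    missing = sorted(REQUIRED_DYNAMIC_FIELDS - dynamic)
    if REQUIRED_FIELD_TYPE not in types:
        missing.append("fieldType:" + REQUIRED_FIELD_TYPE)
    return missing


# Hypothetical, heavily abbreviated schema in which only two dynamic fields were added.
SAMPLE = """<schema name="example" version="1.5">
  <fields>
    <dynamicField name="*_geo" type="geo" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_xml" type="string" indexed="false" stored="true" multiValued="false"/>
  </fields>
  <types>
    <fieldType name="geo" class="solr.SpatialRecursivePrefixTreeFieldType"
               geo="true" distErrPct="0.25" maxDistErr="0.000009" units="degrees"/>
  </types>
</schema>"""

print(missing_customizations(SAMPLE))
```

An empty list means every customization from this step was found. To check a real file, read its contents and pass them to the function.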
Step Four: Deploy and configure the GcService.war file
Deploy the included GcService.war file to Tomcat or GlassFish. The GcService.war file is available for download at https://github.com/Esri/geoportal-server/wiki/Geoportal-Server-Downloads. You can deploy it into the same Tomcat or GlassFish as your geoportal web application.
After the GcService web application is deployed, open the //[GcService]/WEB-INF/classes/gc-config/gptdb2solr.xml file. You will now configure this file to automatically copy your geoportal documents from your geoportal database to Solr. The copy process first runs after you make the change and restart Tomcat, and then runs again once every 24 hours.
Update the following section in gptdb2solr.xml as described in the table below. Note that this is information about your geoportal database, so take the values from the geoportal's database JDBC configuration (e.g., the Tomcat\conf\Catalina\localhost\geoportal.xml file):
<property name="gptdb2solr.instance.id" value="gptinstance.mygeoportal">
  <property name="gpt.instance.url" value="http://mygeoportal:8080/geoportal"/>
  <property name="gpt.include.collections" value="false"/>
  <property name="solr.collection.url" value="http://test.esri.com:8983/solr/collection1/"/>
  <property name="sql.database">
    <property name="driver" value="jdbcDriverClassnameHere"/>
    <property name="url" value="jdbcDatabaseUrlHere"/>
    <property name="username" value="username"/>
    <property name="password" value="password"/>
  </property>
</property>
| property name | description | example |
|---|---|---|
| gptdb2solr.instance.id | A unique identifier for this solr collection instance. If you have multiple instances defined in this file, this value should always be unique. | gptinstance.geoportal.1 |
| gpt.instance.url | The URL to the Esri Geoportal Server instance from which you want to push records into solr. | http://mygeoportal:8080/geoportal |
| gpt.include.collections | If you've enabled the Collections feature on your geoportal and want to have a facet for those collections, set this to "true"; the default is "false". | false |
| solr.collection.url | The URL to your Solr collection. When you view your Solr collection in a browser, the URL may contain a ‘#’; leave this out in this configuration. The trailing forward slash must be included. | http://test.esri.com:8983/solr/collection1/ |
| driver | The JDBC driver class name, which depends on the driver used to connect your geoportal to its database. You can copy this setting from the \\Tomcat\conf\Catalina\localhost\geoportal.xml JDBC connection file. | Oracle: oracle.jdbc.driver.OracleDriver<br>SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver<br>PostgreSQL: org.postgresql.Driver<br>MySQL: com.mysql.jdbc.Driver |
| url | The JDBC connection URL. This should be the same as the JDBC connection information from your geoportal instance to its database; see the note in the driver row above. | Oracle: jdbc:oracle:thin:@serverName:1521:oracleSID<br>SQL Server: jdbc:sqlserver://serverName:1433;databaseName=geoportal<br>PostgreSQL: jdbc:postgresql://serverName:5432/postgres<br>MySQL: jdbc:mysql://serverName:3306/geoportal |
| username | The database user configured to access the Esri Geoportal Server database. Also configured in \\Tomcat\conf\Catalina\localhost\geoportal.xml. | a username |
| password | The password of the database user configured to access the Esri Geoportal Server database. Also configured in \\Tomcat\conf\Catalina\localhost\geoportal.xml. | user's password |
After configuring this section, save the file. Do not restart Tomcat yet; two more changes remain, in Steps Five and Six.
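The two requirements on solr.collection.url (no ‘#’, and a trailing slash) are easy to get wrong when copying the address from a browser. The sketch below shows one way to clean up such a URL before pasting it into gptdb2solr.xml. It assumes the browser URL has the form `.../solr/#/collection1`, as the Solr 4 administrative interface typically shows; the helper name `normalize_collection_url` is hypothetical.

```python
def normalize_collection_url(url: str) -> str:
    """Drop the admin-UI '#' segment and make sure the URL ends with a slash."""
    # Assumption: the admin UI inserts '/#/' between '/solr' and the collection name.
    cleaned = url.strip().replace("/#/", "/")
    if "#" in cleaned:
        raise ValueError("unexpected '#' left in URL: " + cleaned)
    if not cleaned.endswith("/"):
        cleaned += "/"
    return cleaned


print(normalize_collection_url("http://test.esri.com:8983/solr/#/collection1"))
```

A URL that is already in the expected form passes through unchanged.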
Step Five: Update the GcService web.xml
The \\GcService\WEB-INF\web.xml file must be updated to point to your Solr location. Open the \\GcService\WEB-INF\web.xml file, and find the param-name element with the value of targetUri. In the param-value element below it, enter the URL to your Solr instance, e.g., http://test.esri.com:8983/solr/collection1. Then save the file. Do not restart Tomcat yet; one more change remains, in Step Six.
Step Six: Update the HTML sample page
Open the //[GcService]/g.html file, and find this section:
data-dojo-id="solrClient" data-dojo-type="g/solr/base/SolrClient"
data-dojo-props='
collectionUrl:"http://test.esri.com:8983/solr/collection1",
filters:[{viewable:true, removable:false, urlParameterName:"fq",
urlParameterValue:"id.table_s:table.docindex"}]'
Update the collectionUrl to match the URL to your solr collection. Then save the g.html file.
Step Seven: Start the GcService web application
After configuring the g.html file, restart Tomcat or whichever servlet container hosts your GcService web application.
Note: The time it takes for your Solr index to be populated with the geoportal records depends on the number of records in your geoportal database. When you first launch the Geoportal Facets Customization user interface, it may be slow because the index it displays is still growing. You can view and interact with the index by opening the HTML page, e.g., http://yourserver:8080/GcService/g.html
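To watch the index grow while the initial copy runs, you can ask Solr for a document count without retrieving any documents. The sketch below builds such a request and reads the count from a JSON response. It uses Solr's standard `select` handler with `rows=0` and `wt=json`; the filter `id.table_s:table.docindex` is taken from the g.html sample, and treating it as the marker for geoportal records is an assumption. The function names and the sample response are hypothetical.

```python
import json
from urllib.parse import urlencode


def count_query_url(collection_url: str, filter_query: str = "id.table_s:table.docindex") -> str:
    """Build a Solr select URL that returns only the number of matching documents."""
    params = urlencode({"q": "*:*", "fq": filter_query, "rows": 0, "wt": "json"})
    return collection_url.rstrip("/") + "/select?" + params


def num_found(response_text: str) -> int:
    """Extract the match count from a Solr JSON response body."""
    return json.loads(response_text)["response"]["numFound"]


# Hypothetical response body, shaped like a Solr JSON reply with rows=0.
SAMPLE_RESPONSE = '{"responseHeader": {"status": 0}, "response": {"numFound": 1250, "start": 0, "docs": []}}'

print(count_query_url("http://test.esri.com:8983/solr/collection1/"))
print(num_found(SAMPLE_RESPONSE))
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the body to `num_found` at intervals shows whether the count is still rising.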
How to Use the Sample User Interface
The sample user interface provides a view into the Solr index. When you first deploy the sample user interface, you will see something like the following (note that you must click the Map filter to show the map):
The categories on the left correspond to facets and filters. See the section below on Changing the GFC Configurations for how these can be customized. If you click on a category within a facet, that facet is applied to the list of results on the right, and the counts in the categories are updated.
The list on the right shows indexed records that meet the criteria defined by the facets and filters. A section at the very top, called Filters, lists the facets currently applied to the list of results. To stop a facet from affecting the list, click the red X next to it in the Filters list. Each result has three links associated with it: xml, solrxml, and solrjson. Clicking each link does the following:
- xml: retrieves the originating metadata XML from its host Geoportal Server instance.
- solrxml: shows the record as stored in the Solr index. Here you can see the names of the Solr index fields and the value indexed for each of them from the originating metadata.
- solrjson: shows a JSON view of the record's information as indexed by Solr.
The difference between facets and filters is that a facet provides count information about the documents (how many documents have a given value for a field when zero or more facets are applied), whereas a filter provides no counts and simply excludes results that do not match the filter query. In this default interface all the categories are facets except Map and Temporal Extent, which are filters. To apply the Map filter, check the box next to the map. The map can be zoomed in or out using the + or - buttons, and recentered by dragging a spot on the map. To set the Temporal Extent, enter a start date, an end date, or both in YYYY-MM-DD format, or pick a recent date from the calendar that appears when you click the dropdown arrow on the right of the date field.
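In Solr request terms, the distinction between facets and filters corresponds to two different kinds of parameters: `facet.field` asks Solr to return value counts for a field, while `fq` narrows the result set. The sketch below assembles a query string with both. The parameter names are standard Solr request parameters; how the GFC user interface builds its own requests is not documented here, so treat this as an illustration only. The field name comes from the resources.js sample, and the organization value is hypothetical.

```python
from urllib.parse import urlencode


def build_select_query(facet_fields, filter_queries, rows=10):
    """Return a Solr query string that requests facet counts and applies filters."""
    params = [("q", "*:*"), ("rows", str(rows)), ("wt", "json"), ("facet", "true")]
    # Each facet.field asks for counts of the distinct values in that field.
    params += [("facet.field", field) for field in facet_fields]
    # Each fq restricts the documents that are returned and counted.
    params += [("fq", fq) for fq in filter_queries]
    return urlencode(params)


query = build_select_query(
    facet_fields=["contact.organizations_ss"],
    filter_queries=[
        "id.table_s:table.docindex",
        'contact.organizations_ss:"Example Survey Office"',  # hypothetical value
    ],
)
print(query)
```

Clicking a facet category in the interface has the effect of adding a filter such as the second `fq` above, which is why the remaining counts change afterwards.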
Developer Tools for the Facets
Each facet has a tool icon to the right of it (this example is the Organizations facet):
If you click this icon, a popup like the following displays:
This popup is a developer tool, included in the release for this version. It can be used to manipulate how the facets display information, as described below. When you click the OK button, your new settings are applied to the facet in the user interface.
| property | description |
|---|---|
| Field | The Solr index field that you want to associate with this facet |
| Alias Key | Used for faceting on the same field multiple times with different exclusions |
| Minimum Count | Minimum number of records that should be available for this facet to appear in the list; e.g., setting to ‘10’ means there must be at least 10 records for this facet to appear |
| Limit (rows) | How many categories should be shown in the list (e.g., ‘10’ = ten categories can be shown) |
| Value Prefix | If you want to constrain the values by including a case sensitive prefix, designate the prefix here. For example, you could limit the organizations in this list to those that begin with ‘US’ |
| Show Missing | Yes = show the facet category even when no records meet the other facet criteria; No = only show facet categories that have results associated with them |
| Sort By | Count = the facet categories with the highest number of results will be at the top of the list, and ordered by descending number of results; Index = the facet categories will be ordered alphabetically |
For more about how the facet options work, see the solr documentation available at http://wiki.apache.org/solr/#Search_and_Indexing and http://wiki.apache.org/solr/SimpleFacetParameters
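The options in the developer popup resemble Solr's per-field facet parameters. The sketch below shows one plausible mapping from the popup settings to request parameters, using Solr's `f.<field>.facet.*` override syntax and the `{!key=...}` local parameter for aliases. This mapping is an assumption made for illustration; the source does not state how the GFC translates these settings, and the function name `facet_params` is hypothetical.

```python
def facet_params(field, alias_key=None, min_count=1, limit=10,
                 prefix=None, show_missing=False, sort_by="count"):
    """Translate popup-style facet settings into a list of Solr request parameters."""
    if sort_by not in ("count", "index"):
        raise ValueError("sort_by must be 'count' or 'index'")
    # An alias key lets the same field be faceted more than once under different names.
    facet_field = "{!key=%s}%s" % (alias_key, field) if alias_key else field
    per_field = "f.%s.facet." % field
    params = [
        ("facet", "true"),
        ("facet.field", facet_field),
        (per_field + "mincount", str(min_count)),
        (per_field + "limit", str(limit)),
        (per_field + "missing", "true" if show_missing else "false"),
        (per_field + "sort", sort_by),
    ]
    if prefix:
        # The prefix is matched case-sensitively against indexed values.
        params.append((per_field + "prefix", prefix))
    return params


for name, value in facet_params("contact.organizations_ss", min_count=10, prefix="US"):
    print(name, "=", value)
```

The resulting pairs can be URL-encoded and appended to a `select` request against the collection.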
Changing the GFC Configurations (optional)
Change how terms are indexed
This initial version of the GFC supports the following metadata schemas for parsing into the Solr index: ISO 19115, ISO 19115-2, FGDC CSDGM, and Dublin Core. The mapping of XPaths to index fields is defined in the XSLT file for each metadata schema:
- ISO 19115 and ISO 19115-2: \\GcService\WEB-INF\classes\gc-config\xmltypes\iso19115-base-toSolr.xslt
- FGDC CSDGM: \\GcService\WEB-INF\classes\gc-config\xmltypes\fgdc-base-toSolr.xslt
- Dublin Core: \\GcService\WEB-INF\classes\gc-config\xmltypes\dc-base-toSolr.xslt
Show additional indices on the Facet user interface
These instructions are for adding a simple facet – e.g., one that facets on text. Configuring a facet on numbers, dates, or mixed content is not yet documented.
To show additional indices on the facet user interface page, you will add content to the \\GcService\g.html page. Open the g.html file and find the section below:
<div data-dojo-type="dijit/layout/ContentPane" data-dojo-props="splitter:true, region:'leading'" class="gPanel gLeadingContentPane">
Within this div are other divs that define the facets. To add a new facet, choose a position after one of the closing </div> tags and add the following:
<div data-dojo-type="g/solr/SimpleFacetPane" data-dojo-props="facet:{field:'placeholder.s', mincount:1, limit:10}"> </div>
Now replace placeholder.s with the name of the Solr index field on which you want to facet. Then save the g.html file.
Note: you can also keep all the strings for your g.html user interface decoupled by adding the facet name and text you'd like to display to the \\GcService\g\solr\nls\resources.js file. Add the mapping in the "Fields" section towards the end of the resources.js file. For example:
Fields: {
"title": "Title",
"keywords": "Keywords (terms)",
"keywords_ss": "Keywords (values)",
"contact.organizations_ss": "Organizations (contact)",
"contact.people_ss": "People (contact)",
"placeholder.s": "Placeholder facet"
}