High Availability and Large Number of Records
'High Availability' and 'Large' are relative terms; in this topic they refer to geoportals that must meet a requirement for system failover, contain 500,000 or more records, or both. If your organization plans to implement such a geoportal, there are several things you can do to improve its performance and reliability. This topic discusses architectural considerations and gpt.xml settings that accommodate high availability and larger geoportals.
User store and database
The geoportal architecture typically includes a server hosting a user store (LDAP), a server hosting the geoportal database, and a server hosting the geoportal web application. For a failover environment, follow the guidance provided for your LDAP and RDBMS software on backing up both the user store and the database. This topic does not include steps for configuring those backups.
Architecture Overview
The diagram below provides a visual overview of one way to set up the geoportal environment for high availability. It is recommended that you deploy the geoportal web application in two server instances and use a load balancer to direct web traffic to them. Each instance should have its own Lucene index. The instances should not share an index; instead, each instance's gpt.xml file should point to that instance's own index. As a result, searches may return slightly different results for newly published documents depending on which instance handles the request, because the indexes typically synchronize with the database only at night.
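As a rough sketch of the separate-index setup, each instance's gpt.xml would point to a different folder. This assumes the index folder is set through the indexLocation attribute of the lucene element, as in typical gpt.xml files; the folder paths are hypothetical placeholders, and any other attributes already present on the element in your file should be left as they are.

```xml
<!-- gpt.xml on Geoportal1 (path is a hypothetical example) -->
<lucene indexLocation="/data/geoportal1/lucene_index"/>

<!-- gpt.xml on Geoportal2 (path is a hypothetical example) -->
<lucene indexLocation="/data/geoportal2/lucene_index"/>
```

The point of the sketch is only that the two values differ, so that neither instance reads from or writes to the other's index.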
If you have scheduled harvesting of several repositories, a likely scenario for large geoportal deployments, you should reduce the workload of the server hosting the geoportal by separating out the harvesting functionality. To do this, deploy an additional geoportal instance that runs behind the scenes and is used solely for harvesting. The gpt.xml files for the two user-facing geoportals and for the geoportal harvester must be configured in a specific way, discussed in the gpt.xml configuration section below.
Note: In the diagram, the Database1 and LDAP1 servers are backed up to the Database2 and LDAP2 servers. This is part of best practice for maintaining user stores and databases, but is not addressed in this topic.
gpt.xml configuration
The gpt.xml configuration for the two geoportal servers and the geoportal harvester instance is shown below. Here we show only the Web Harvester parameters section; the rest of the gpt.xml configuration will be determined by your organization's preferences.
Geoportal1 and Geoportal2
For the two user-facing geoportals, update or add the following parameters in the gpt.xml file, as shown in the example below. Remember to replace url_to_the_harvester_machine with the URL of the geoportal instance that you want to dedicate solely to harvesting.
<parameter key="webharvester.updateindex" value="false"/>
<thread class="com.esri.gpt.catalog.context.CatalogSynchronizer" at="01:00"/>
<thread class="com.esri.gpt.control.webharvest.engine.ScheduledPause" at="00:45">
  <parameter key="remoteIndexingUrls" value="url_to_the_harvester_machine"/>
  <parameter key="connectionTimeout" value="30[MINUTE]"/>
  <parameter key="responseTimeout" value="30[MINUTE]"/>
  <parameter key="initialSleepTime" value="30[MINUTE]"/>
  <parameter key="consecutiveSleepTime" value="15[MINUTE]"/>
</thread>
Geoportal Harvester
On the harvesting geoportal, the configuration is the same except that the value of the remoteIndexingUrls parameter is set to 'self', as shown in the example below:
<parameter key="webharvester.updateindex" value="false"/>
<thread class="com.esri.gpt.catalog.context.CatalogSynchronizer" at="01:00"/>
<thread class="com.esri.gpt.control.webharvest.engine.ScheduledPause" at="00:45">
  <parameter key="remoteIndexingUrls" value="self"/>
  <parameter key="connectionTimeout" value="30[MINUTE]"/>
  <parameter key="responseTimeout" value="30[MINUTE]"/>
  <parameter key="initialSleepTime" value="30[MINUTE]"/>
  <parameter key="consecutiveSleepTime" value="15[MINUTE]"/>
</thread>
Spatial Ranking
Another gpt.xml setting that can be adjusted controls spatial ranking. Spatial ranking is an automatic attempt to rank records in the geoportal catalog by their spatial relevance. When the catalog contains many records, this ranking becomes resource-intensive. To change the maximum number of records for which your geoportal supports spatial ranking, update the spatialRelevance.ranking.maxDoc parameter in the gpt.xml file. You may want to lower this number, for example to 100,000, so that spatial ranking is not performed for a large catalog.
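As an illustration, the lowered limit might look like the line below. This is a sketch that assumes the setting uses the same parameter key/value form as the other gpt.xml parameters shown in this topic; the value 100000 is only the example figure mentioned above, and where the line sits in your gpt.xml file depends on your version.

```xml
<!-- Spatial ranking is not attempted once the catalog exceeds this many records -->
<parameter key="spatialRelevance.ranking.maxDoc" value="100000"/>
```

Changes to gpt.xml generally require a restart of the geoportal web application before they take effect.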