Aleph2: configuration reference

Alex edited this page Mar 31, 2016 · 7 revisions
  • Global

    • globals.local_root_dir will usually be /opt/aleph2-home - all the configuration files of the Aleph2 system are assumed to be below this directory
    • globals.local_cached_jar_dir is a location where shared library bundles (eg JARs) are cached locally (default /opt/aleph2-home/cached-jars/)
    • globals.distributed_root_dir will usually be /apps/aleph2 - all of the shared configuration and data files on the distributed filesystem in the cluster (typically HDFS) are assumed to be below this directory
    • globals.local_yarn_config_dir is the directory where the system assumes that all Hadoop/YARN-related configuration files ("*-site.xml") are to be found. Defaults to /opt/aleph2-home/yarn-config/.
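
    As a rough illustration, the global settings above might look like this in a properties-style configuration file. The key=value syntax is an assumption, borrowed from the V1DocumentService lines further down this page; the values are simply the usual locations and defaults listed above.

    ```properties
    # Sketch only: usual/default locations from the list above
    globals.local_root_dir=/opt/aleph2-home
    globals.local_cached_jar_dir=/opt/aleph2-home/cached-jars/
    globals.distributed_root_dir=/apps/aleph2
    globals.local_yarn_config_dir=/opt/aleph2-home/yarn-config/
    ```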
  • Core Distributed Services

    • service.CoreDistributedServices.interface - should always be com.ikanow.aleph2.data_model.interfaces.shared_services.ICoreDistributedServices
    • service.CoreDistributedServices.service - for production, should be com.ikanow.aleph2.management_db.services.CoreDistributedServices
      • For system testing, can use com.ikanow.aleph2.management_db.services.MockCoreDistributedServices
    • CoreDistributedServices.zookeeper_connection - defaults to "localhost:2181"
    • CoreDistributedServices.zookeeper_connection - for Kafka defaults to "localhost:6667"
    • CoreDistributedServices.application_name - "DataImportManager", "DataAnalyticsManager", "AccessManager" etc (or null for transient nodes such as external harvest processes)
    • CoreDistributedServices.application_port.<application_name> - for the applications defined using the above parameter, the server port number to sit on (eg 2252 for "DataImportManager")
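
    A minimal sketch of a production-style block for a node running the "DataImportManager" application, assuming the same key=value syntax as the V1DocumentService lines further down this page. The port is the example value quoted above and the ZooKeeper address is the stated default; adjust both for a real cluster.

    ```properties
    # Sketch only: production implementation of the core distributed services
    service.CoreDistributedServices.interface=com.ikanow.aleph2.data_model.interfaces.shared_services.ICoreDistributedServices
    service.CoreDistributedServices.service=com.ikanow.aleph2.management_db.services.CoreDistributedServices

    # ZooKeeper connection (the documented default)
    CoreDistributedServices.zookeeper_connection=localhost:2181

    # Application identity, and the port for that named application
    CoreDistributedServices.application_name=DataImportManager
    CoreDistributedServices.application_port.DataImportManager=2252
    ```

    For system testing, the service line would instead point at the MockCoreDistributedServices class mentioned above.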
  • MongoDB Management DB

    • service.ManagementDbService.interface - should always be com.ikanow.aleph2.data_model.interfaces.shared_services.IManagementDbService
    • service.ManagementDbService.service - should always be com.ikanow.aleph2.management_db.mongodb.services.MongoDbManagementDbService
      • For system testing, can use com.ikanow.aleph2.management_db.mongodb.services.MockMongoDbManagementDbService
    • MongoDbManagementDbService.mongodb_connection - the connection string for the database, eg "localhost:27017"
    • MongoDbManagementDbService.v1_enabled - if true, then runs the V1/V2 synchronization service
  • Elasticsearch Search Index service

    • service.SearchIndexService.interface - should always be com.ikanow.aleph2.data_model.interfaces.data_services.ISearchIndexService

    • service.SearchIndexService.service - should always be com.ikanow.aleph2.search_service.elasticsearch.services.ElasticsearchIndexService

    • ElasticsearchCrudService.elasticsearch_connection - the connection string for the index, eg localhost:9300

    • ElasticsearchCrudService.cluster_name - (optional) the cluster name; if not present, connects to whatever cluster is running at the location pointed to by elasticsearch_connection

    • ElasticsearchIndexService.search_technology_override.* - options for setting the default search index settings, see link

    • ElasticsearchIndexService.columnar_technology_override.* - options for setting the default columnar settings, see link

    • ElasticsearchIndexService.temporal_technology_override.* - options for setting the default temporal settings, see link
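
    The connection-related settings above could be combined roughly as follows, assuming the same key=value syntax as the V1DocumentService lines further down this page. The cluster name "aleph2" is a hypothetical placeholder; omit that line to connect to whatever cluster is running at the given address. The *_technology_override.* options are left out because their sub-keys are not described on this page.

    ```properties
    # Sketch only: Elasticsearch-backed search index service
    service.SearchIndexService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.ISearchIndexService
    service.SearchIndexService.service=com.ikanow.aleph2.search_service.elasticsearch.services.ElasticsearchIndexService

    # Connection string for the index (example value from the list above)
    ElasticsearchCrudService.elasticsearch_connection=localhost:9300

    # Optional; "aleph2" is a hypothetical cluster name used for illustration
    ElasticsearchCrudService.cluster_name=aleph2
    ```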

  • Core Management DB

    • service.CoreManagementDbService.interface - should always be com.ikanow.aleph2.data_model.interfaces.shared_services.IManagementDbService
    • service.CoreManagementDbService.service - should always be com.ikanow.aleph2.management_db.services.CoreManagementDbService
  • Security Service

    • service.SecurityService.interface - should always be com.ikanow.aleph2.data_model.interfaces.shared_services.ISecurityService
    • service.SecurityService.service - the technology service that provides user authentication and authorization, options: com.ikanow.aleph2.security.service.IkanowV1SecurityService
  • Document Service

    • service.DocumentService.interface - should always be com.ikanow.aleph2.data_model.interfaces.data_services.IDocumentService
    • service.DocumentService.service - the technology service that provides document-oriented storage, options: com.ikanow.aleph2.search_service.elasticsearch.services.ElasticsearchIndexService
    • An additional "v1" document service that provides read only access to documents in v1 format stored in MongoDB can be provided with the following 2 lines:
      • service.V1DocumentService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IDocumentService
      • service.V1DocumentService.service=com.ikanow.aleph2.v1.document_db.services.V1DocumentDbService
  • Enrichment Services

    • service.BatchEnrichmentService.interface - should always be com.ikanow.aleph2.data_model.interfaces.data_analytics.IAnalyticsTechnologyService
    • service.BatchEnrichmentService.service - The technology service that provides batch enrichment to harvesters, can be: com.ikanow.aleph2.analytics.hadoop.services.HadoopTechnologyService (can use analytic_technology_name_or_id:BatchEnrichmentService in analytic jobs)
    • service.StreamingEnrichmentService.interface - should always be com.ikanow.aleph2.data_model.interfaces.data_analytics.IAnalyticsTechnologyService
    • service.StreamingEnrichmentService.service - The technology service that provides streaming enrichment to harvesters, can be: com.ikanow.aleph2.analytics.storm.services.StormAnalyticTechnologyService (can use analytic_technology_name_or_id:StreamingEnrichmentService in analytic jobs)
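
    Put together, the two enrichment services might be declared as below. This is a sketch that assumes the same key=value syntax as the V1DocumentService lines above, with Hadoop backing batch enrichment and Storm backing streaming enrichment, as named in the list.

    ```properties
    # Sketch only: batch enrichment via the Hadoop technology service
    service.BatchEnrichmentService.interface=com.ikanow.aleph2.data_model.interfaces.data_analytics.IAnalyticsTechnologyService
    service.BatchEnrichmentService.service=com.ikanow.aleph2.analytics.hadoop.services.HadoopTechnologyService

    # Sketch only: streaming enrichment via the Storm technology service
    service.StreamingEnrichmentService.interface=com.ikanow.aleph2.data_model.interfaces.data_analytics.IAnalyticsTechnologyService
    service.StreamingEnrichmentService.service=com.ikanow.aleph2.analytics.storm.services.StormAnalyticTechnologyService
    ```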
  • MongoDB CRUD service

    • MockMongoDbCrudServiceFactory.one_per_thread - if true (the default), each thread gets a separate mock MongoDB instance, which is useful for unit testing where each test might run on a different thread. To share one instance within a multi-threaded test, set to false instead.
  • Data Import Manager

    • DataImportManager.harvest_enabled - whether this data import manager supports harvest orchestration (default true)
    • DataImportManager.analytics_enabled - whether this data import manager supports analytics orchestration (default false)
    • DataImportManager.governance_enabled - whether this data import manager supports data governance, eg deletion based on age (default true)
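
    As an illustration of how these flags combine, the hypothetical node below is dedicated to analytics orchestration: harvest is switched off, analytics is switched on, and governance keeps its default. The key=value syntax is assumed, following the V1DocumentService lines above.

    ```properties
    # Sketch only: a data import manager used purely for analytics orchestration
    DataImportManager.harvest_enabled=false
    DataImportManager.analytics_enabled=true
    # Same as the documented default, shown for completeness
    DataImportManager.governance_enabled=true
    ```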
  • Logging Service

    • service.LoggingService.interface - should always be com.ikanow.aleph2.data_model.interfaces.shared_services.ILoggingService
    • service.LoggingService.service - can be one of com.ikanow.aleph2.logging.service.LoggingService (standard - writes to a "logging bucket"), com.ikanow.aleph2.logging.service.Log4jLoggingService (Writes to log4j), or com.ikanow.aleph2.logging.service.NoLoggingService (doesn't write any logging out at all)
    • LoggingService.default_time_field - field name to output logging message timestamps as (defaults to 'date')
    • LoggingService.default_user_log_level - the default log4j Level for user messages, see https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html (defaults to 'OFF')
    • LoggingService.default_system_log_level - the default log4j Level for system messages, see https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html (defaults to 'OFF')
    • LoggingService.system_mirror_to_log4j_level - the log4j Level at which system messages are also output to log4j; set to 'OFF' to not output any messages to log4j. Mirroring allows for external logging rather than messages going only into ES (defaults to 'OFF')
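
    A sketch of a logging block that uses the standard "logging bucket" service, assuming the same key=value syntax as the V1DocumentService lines above. The INFO and ERROR levels are illustrative choices, not recommendations from this page; any log4j 1.2 Level name (OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, ALL) should be usable.

    ```properties
    # Sketch only: standard logging service writing to a "logging bucket"
    service.LoggingService.interface=com.ikanow.aleph2.data_model.interfaces.shared_services.ILoggingService
    service.LoggingService.service=com.ikanow.aleph2.logging.service.LoggingService

    # Timestamp field name (the documented default)
    LoggingService.default_time_field=date

    # Illustrative levels; all three default to OFF if left unset
    LoggingService.default_user_log_level=INFO
    LoggingService.default_system_log_level=INFO
    LoggingService.system_mirror_to_log4j_level=ERROR
    ```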