Senzing with ElasticSearch

Overview

This code project demonstrates how the G2 engine may be used with an ElasticSearch indexing engine. ElasticSearch provides enhanced searching capabilities on entity data.

The G2 data repository contains data records and observations about known entities. The G2 engine determines which records match and merge into single resolved entities. These resolved entities can be indexed through the ElasticSearch engine to make the entity data more searchable.

ElasticSearch stores its indexed entity data in a data repository separate from the one the G2 engine uses. Thus, ElasticSearch and G2 must both be managed in order to keep them in sync.

Preamble

At Senzing, we strive to create GitHub documentation in a "don't make me think" style. For the most part, instructions are copy and paste. Whenever thinking is needed, it's marked with a "thinking" icon 🤔. Whenever customization is needed, it's marked with a "pencil" icon ✏️. If the instructions are not clear, please let us know by opening a new Documentation issue describing where we can improve. Now on with the show...

Legend

  1. 🤔 - A "thinker" icon means that a little extra thinking may be required. Perhaps there are some choices to be made. Perhaps it's an optional step.
  2. ✏️ - A "pencil" icon means that the instructions may need modification before performing.
  3. ⚠️ - A "warning" icon means that something tricky is happening, so pay attention.

Expectations

  • Space: This repository and demonstration require X GB free disk space.
  • Time: Budget 30 minutes to get the demonstration up-and-running, depending on CPU and network speeds.
  • Background knowledge: This repository assumes a working knowledge of:

Prerequisites

  1. Maven
  2. Java
  3. Git
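
A quick way to confirm that the three tools are on the PATH before starting is sketched below. The helper name `missing_tools` is illustrative and not part of this repository.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on the PATH.
# `missing_tools` is a hypothetical helper, not part of this repository.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Report any prerequisite that still needs to be installed.
missing=$(missing_tools mvn java git)
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found."
fi
```

This only checks that the commands exist; it does not check versions.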

Demonstration

Load Data

  • 🤔 Data must already be loaded into a Senzing project before it can be posted to elasticsearch. If you don't have any data to load, or don't know how, visit our quickstart. Whether you use an existing Senzing installation or a new installation from the quickstart, the following instructions refer to that installation.

Start up elasticsearch

  • Start an instance of elasticsearch and your favorite elasticsearch UI. Kibana is recommended and is assumed for the remainder of this demonstration. For guidance on getting an instance of elasticsearch and Kibana running, visit our doc on How to Bring Up an ELK Stack.
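
Before continuing, it can help to confirm that both services answer over HTTP. The sketch below assumes elasticsearch on localhost:9200 and Kibana on localhost:5601, both reachable over plain HTTP without authentication; adjust it if your instances use TLS, credentials, or other ports. The helper names `service_url` and `is_up` are illustrative.

```shell
#!/bin/sh
# Assumed defaults; adjust if your instances listen elsewhere.
ES_HOST="${ELASTIC_HOSTNAME:-localhost}"
ES_PORT="${ELASTIC_PORT:-9200}"
KIBANA_PORT=5601

# Build a plain-HTTP URL from a host, a port, and an optional path.
service_url() {
  echo "http://$1:$2${3:-/}"
}

# Succeed if the URL answers with a non-error HTTP status within 5 seconds.
is_up() {
  curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Usage (requires curl and running services):
#   is_up "$(service_url "$ES_HOST" "$ES_PORT")" && echo "elasticsearch is up"
#   is_up "$(service_url "$ES_HOST" "$KIBANA_PORT" /api/status)" && echo "Kibana is up"
```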

Build project

  1. ✏️ Set local environment variables. These variables may be modified, but do not need to be modified. The variables are used throughout the installation procedure.

    export GIT_ACCOUNT=senzing
    export GIT_REPOSITORY=elasticsearch
    export GIT_ACCOUNT_DIR=~/${GIT_ACCOUNT}.git
    export GIT_REPOSITORY_DIR="${GIT_ACCOUNT_DIR}/${GIT_REPOSITORY}"
  2. Clone the repository

    mkdir -p "${GIT_ACCOUNT_DIR}"
    cd "${GIT_ACCOUNT_DIR}"
    git clone https://github.com/Senzing/elasticsearch.git
    cd "${GIT_REPOSITORY_DIR}"
  3. 🤔 Make sure the SENZING_ENGINE_CONFIGURATION_JSON environment variable is set and points to the Senzing installation that the data was loaded into earlier.

  4. 🤔 Set elasticsearch local environment variables. The hostname and port must point to the port that the elasticsearch instance exposes. The index name can be anything that conforms to elasticsearch's index naming rules.

    export ELASTIC_HOSTNAME=localhost
    export ELASTIC_PORT=9200
    export ELASTIC_INDEX_NAME=g2index
  5. Build the interface for ElasticSearch.

    cd ${GIT_REPOSITORY_DIR}/elasticsearch
    
    mvn \
      -Dmaven.repo.local=${GIT_REPOSITORY_DIR}/elasticsearch/maven_resources \
      install
  6. ✏️ Copy the library into a working directory.

    sudo mkdir -p /opt/senzing/g2/elasticsearch
    cd /opt/senzing/g2/elasticsearch
    
    sudo cp \
      ${GIT_REPOSITORY_DIR}/elasticsearch/target/g2elasticsearch-1.0.0-SNAPSHOT.jar \
      /opt/senzing/g2/elasticsearch/g2elasticsearch.jar
  7. 🤔 Make sure to set the LD_LIBRARY_PATH variable in the same console window that will run the indexer. If you are using a Senzing project like the one set up in the quickstart, the setupenv script can be used, as in the quickstart, to achieve this. Example:

    export LD_LIBRARY_PATH=/opt/senzing/g2/lib/
  8. 🤔 Navigate to the directory where the library was stored and run the indexer.

    cd /opt/senzing/g2/elasticsearch
    java -jar g2elasticsearch.jar
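
The indexer depends on several environment variables set in the steps above, so a pre-flight check in the same console window can catch a missing one before the run. This is a minimal sketch: `require_vars` and `valid_index_name` are hypothetical helpers, and the index-name check is deliberately conservative (lowercase letters, digits, `.`, `_`, and `-`, starting with a letter or digit), so elasticsearch may accept some names that it rejects.

```shell
#!/bin/sh
# Succeed only if every named environment variable is set and non-empty;
# print the name of each one that is missing.
require_vars() {
  status=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "not set: $name"
      status=1
    fi
  done
  return $status
}

# Conservative index-name check (assumption: stricter than elasticsearch itself).
# Runs in a subshell so the locale change does not leak into the caller.
valid_index_name() (
  LC_ALL=C
  case "$1" in
    ''|[!a-z0-9]*|*[!a-z0-9._-]*) false ;;
    *) true ;;
  esac
)

# Pre-flight check before running `java -jar g2elasticsearch.jar`.
ready=yes
require_vars SENZING_ENGINE_CONFIGURATION_JSON ELASTIC_HOSTNAME ELASTIC_PORT \
             ELASTIC_INDEX_NAME LD_LIBRARY_PATH || ready=no
if ! valid_index_name "${ELASTIC_INDEX_NAME:-}"; then
  echo "unsuitable index name: '${ELASTIC_INDEX_NAME:-}'"
  ready=no
fi
echo "ready to run the indexer: $ready"
```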

Search data

  1. Open Kibana in a web browser. The default address is localhost:5601.

  2. Navigate to the Discover tab.

  3. Create Index.

    • If all was done correctly, a new screen with a "Create data view" button should appear.
    • Click it and, in the index pattern box, type the name of the index that was created. This is the value of the ELASTIC_INDEX_NAME variable set earlier, and it should also appear on the right side of the popup.
    • The Name field can be set but is not required.
  4. Press "Save data view to Kibana" at the bottom of the screen. You can now view the created index and run searches. If fuzzy searches are needed, click "Saved Query" and switch the language to Lucene. There you can view the Lucene syntax and how to do fuzzy searches.
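
The same fuzzy syntax can also be tried from the command line. In the Lucene-style query string syntax, appending `~` and an optional maximum edit distance to a term (for example `smith~1`) also matches terms within that many single-character edits. The sketch below assumes the environment variables from the Build project section and an elasticsearch endpoint reachable over plain HTTP without authentication; `fuzzy_term`, `search_url`, and the search term `smith` are illustrative only.

```shell
#!/bin/sh
# Build a fuzzy term: the word followed by '~' and a maximum edit distance
# (defaults to 2 when no distance is given).
fuzzy_term() {
  echo "$1~${2:-2}"
}

# Build the search URL for the index, falling back to the values used
# in this demonstration when the variables are not set.
search_url() {
  echo "http://${ELASTIC_HOSTNAME:-localhost}:${ELASTIC_PORT:-9200}/${ELASTIC_INDEX_NAME:-g2index}/_search"
}

# Usage (requires curl and a running elasticsearch instance):
#   curl --silent --get "$(search_url)" --data-urlencode "q=$(fuzzy_term smith 1)"
```

Because no field name is given, the query runs against the index's default search fields; prefix the term with a field name and a colon to restrict it.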