OpenG2P Search Service

OpenG2P is a set of open-source digital building blocks that help large-scale cash transfer programs digitize key parts of their delivery chain: 1) beneficiary targeting and enrollment, 2) beneficiary list management, 3) payment digitization, and 4) recourse.

This project provides an extensible entity resolution framework for finding and matching persons who usually lack unique identities, helping programs maintain the uniqueness of, and deduplicate, their beneficiary lists. It can also be used to enable linkages against external databases.

It can be used as a standalone component and also integrates with the OpenG2P CRM.

Background

Government-to-persons programs must be confident of the uniqueness of the beneficiaries they serve and avoid double-dipping. A social protection transfer program, for example, should not pay the same individual multiple times per period. Yet this is a significant challenge in countries where coverage of a universal unique ID, e.g., a national ID, is low. The OpenG2P Search Service helps alleviate this and improves confidence in disbursement lists by using combinations of a beneficiary's attributes (e.g., name, address, dob) to find likely duplicates. The strategy employed is called entity resolution; it also accounts for typos and for variations in how these attributes (names, addresses, etc.) are represented.

By default, it uses Elasticsearch and zentity for entity resolution, but it provides an extensible framework for adopters to add more methods, e.g. facial recognition.

Getting Started

WARNING:
Do not use it as your data store! Do not expose it to the internet or untrusted networks in production!

Using Docker Compose

You can get started with the bundled docker-compose file, which starts both the server and its dependencies.

docker-compose up -d

Manually

You will need Elasticsearch 7.6.1 up and running with the following plugins installed:

zentity 1.6.0
analysis-phonetic
analysis-icu

elasticsearch-plugin install https://zentity.io/releases/zentity-1.6.0-elasticsearch-7.6.1.zip
elasticsearch-plugin install analysis-phonetic
elasticsearch-plugin install analysis-icu

Set searchservice.elastic.endpoint to your Elasticsearch endpoint, e.g. http://localhost:9200, then start the service:

./gradlew bootRun

NOTE:
The service will not start if it cannot connect to Elasticsearch!

User Guide

The service provides a REST API for indexing and querying person/beneficiary data. A typical use case will:

index all enrolled beneficiaries into the search service
query the search service to assert that a beneficiary is not already enrolled before proceeding
use it to deduplicate existing enrollments
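
The flow above can be sketched in a few lines. This is only an illustration: `enroll_if_unique`, `search` and `index` are hypothetical names, and the two callables stand in for whatever client code you write around the search and index endpoints described below.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def enroll_if_unique(
    record: Record,
    search: Callable[[Record], List[dict]],
    index: Callable[[Record], None],
) -> List[dict]:
    """Index `record` only when the search finds no likely duplicates.

    `search` and `index` are hypothetical wrappers around the service's
    search and index endpoints. They are injected so the flow can be
    exercised without a running server.
    Returns the list of matches (empty when the record was indexed).
    """
    # Query with every attribute except the ID, which belongs to the new record.
    attributes = {k: v for k, v in record.items() if k != "id"}
    matches = search(attributes)
    if not matches:
        index(record)
    return matches
```

A non-empty return value means the record was not indexed and the matches should be reviewed before enrollment proceeds.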

API

Indexing Beneficiary

Adding a beneficiary to the search service, e.g., when a beneficiary is added to your program

POST /index
{
    "city": "Freetown",
    "email": "saltonmassally@gmail.com",
    "first_name": "Salton",
    "id": "{id}",
    "last_name": "Massally",
    "phone": "07722015",
    "state": "Freetown",
    "street": "5 Foday Drive",
    "street2": "Hill Station"
}

id must be unique, ideally the beneficiary ID in your database. This is what will be returned if that beneficiary matches a query.
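
As a rough client-side sketch, the request above could be built in Python with the standard library alone. The base URL `http://localhost:8080` and the function name `build_index_request` are assumptions made for illustration; adjust them to your deployment.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: where the search service listens


def build_index_request(beneficiary: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /index request for one beneficiary."""
    if not beneficiary.get("id"):
        raise ValueError("id is required and must be unique")
    body = json.dumps(beneficiary).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/index",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` while the service is running.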

Searching for Beneficiary

Query for beneficiaries that very likely refer to the same person as the query data

POST /index/search
{
  "attributes": {
    "first_name": "Salton",
    "last_name": "Massally",
    "phone": ["202-555-1234", "317-555-1234"],
    "email": "saltonmassally@gmail.com"
  }
}

RESPONSE: HTTP 200 

[
    {
        "beneficiary": "{id}",
        "reasons": [
            "email ->  Input: saltonmassally@gmail.com | Match: saltonmassally@gmail.com | Type: match_fuzzy ",
            "first_name.metaphone ->  Input: Salton | Match: Salton | Type: match ",
            "first_name.nysiis ->  Input: Salton | Match: Salton | Type: match ",
            "first_name.soundex ->  Input: Salton | Match: Salton | Type: match ",
            "first_name ->  Input: Salton | Match: Salton | Type: match_fuzzy ",
            "last_name.metaphone ->  Input: Massally | Match: Massally | Type: match ",
            "last_name.nysiis ->  Input: Massally | Match: Massally | Type: match ",
            "last_name.soundex ->  Input: Massally | Match: Massally | Type: match ",
            "last_name ->  Input: Massally | Match: Massally | Type: match_fuzzy "
        ]
    }
]

Returns a list of matching beneficiaries. Each entry carries the id that was used to index the record and a list of reasons why the record matched the query. Querying with as much data as possible increases the likelihood of finding duplicates, if any exist.
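
The reasons are plain strings, so a client may want to break them into fields. The sketch below assumes every reason follows the `field -> Input: ... | Match: ... | Type: ...` layout seen in the sample response; `parse_reason` and `matched_attributes` are hypothetical helper names, and values containing a `|` character are not handled.

```python
import re
from typing import Dict, Set

# Layout inferred from the sample response above, not from a formal spec.
_REASON = re.compile(
    r"^(?P<field>\S+)\s*->\s*Input:\s*(?P<input>.*?)\s*\|"
    r"\s*Match:\s*(?P<match>.*?)\s*\|\s*Type:\s*(?P<type>\S+)\s*$"
)


def parse_reason(reason: str) -> Dict[str, str]:
    """Split one reason string into field, input, match and type."""
    m = _REASON.match(reason)
    if m is None:
        raise ValueError(f"unrecognized reason format: {reason!r}")
    return m.groupdict()


def matched_attributes(result: dict) -> Set[str]:
    """Collect base attribute names, so first_name.soundex counts as first_name."""
    return {parse_reason(r)["field"].split(".")[0] for r in result["reasons"]}
```

For the sample response, `matched_attributes` would report `email`, `first_name` and `last_name`.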

De-indexing Beneficiary

Removing a beneficiary from the search service, e.g., when a beneficiary is removed from your program

DELETE /index/{id}

Here, id is the value that was used to index the beneficiary.
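
A minimal sketch of the same call from Python follows. The base URL and the name `build_deindex_request` are assumptions; the ID is percent-encoded in case it contains characters such as `/` that would otherwise change the path.

```python
import urllib.parse
import urllib.request


def build_deindex_request(
    beneficiary_id: str, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build (but do not send) a DELETE /index/{id} request."""
    if not beneficiary_id:
        raise ValueError("beneficiary_id must not be empty")
    # safe="" also encodes "/" so the ID stays a single path segment.
    quoted = urllib.parse.quote(beneficiary_id, safe="")
    return urllib.request.Request(url=f"{base_url}/index/{quoted}", method="DELETE")
```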

Allowed fields

Table shows a list of beneficiary attributes allowed for indexing and querying:

Attribute	Type	Note
id	String	Required; beneficiary's unique ID in your database
identity	String	Identity records in the form type-number. e.g. passport-1234567
first_name	String
middle_name	String
last_name	String
phone	String	Remove the country code, e.g. use 0778763839 and not +232778763839
email	String
street	String
street2	String
city	String	City or town depending on context
state	String	State or District depending on context
postal_code	String
dob	String	Date of birth in the form YYYY/MM/DD, e.g. 1990/07/23
bank	String	Name of bank or any organization that payment is sent to
bank_account	String	Bank account number
emergency_contact_name	String	Name of person listed as emergency contact
emergency_contact_phone	String	Phone of person listed as emergency contact; without country code

You do not need to provide all this data when indexing or querying for records; however, the more attributes you supply, the better the precision of your results. Try to supply as many of these fields as possible for both index and query operations.
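
The notes in the table can be applied before sending data. The helpers below are a sketch under stated assumptions: the names are hypothetical, the default country code `232` is taken from the example in the table, and the national format is assumed to start with a leading `0` as that example does.

```python
from datetime import date
from typing import Dict

ALLOWED_ATTRIBUTES = frozenset({
    "id", "identity", "first_name", "middle_name", "last_name", "phone",
    "email", "street", "street2", "city", "state", "postal_code", "dob",
    "bank", "bank_account", "emergency_contact_name", "emergency_contact_phone",
})


def normalize_phone(phone: str, country_code: str = "232") -> str:
    """Strip punctuation and replace a leading +<country_code> with 0."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    if phone.strip().startswith("+") and digits.startswith(country_code):
        return "0" + digits[len(country_code):]
    return digits


def format_dob(born: date) -> str:
    """Render a date of birth in the YYYY/MM/DD form the table asks for."""
    return born.strftime("%Y/%m/%d")


def prepare_record(raw: Dict[str, str]) -> Dict[str, str]:
    """Drop unknown or empty attributes and normalize the phone fields."""
    record = {k: v for k, v in raw.items() if k in ALLOWED_ATTRIBUTES and v}
    for key in ("phone", "emergency_contact_phone"):
        if key in record:
            record[key] = normalize_phone(record[key])
    return record
```

Silently dropping unknown attributes is a design choice made here for brevity; a stricter client might raise an error instead.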

Kibana UI

The docker-compose setup ships with Kibana, which lets you visualize your beneficiary data in custom ways and run queries against the Elasticsearch backend.

Navigate to http://<your-ip-address>:5601

We welcome contributions of visualizations.

How it works

The default implementation works via the process of entity resolution. It compares attributes of the query provided against its database of indexed beneficiaries to find a match very likely referencing the same person.

Below is the set of matching rules employed:

identity (ID Type Number)
first_name, last_name, phone
first_name, phone
first_name, last_name, phone
last_name, phone
first_name, last_name, email
first_name, phone
last_name, phone
first_name, last_name, middle_name, dob
first_name, last_name, street, street2, city, state
first_name, last_name, street, city, state
first_name, last_name, street, state
first_name, last_name, middle_name, street2, state
first_name, last_name, street, postal_code
first_name, last_name, dob, city, state
first_name, last_name, emergency_contact_phone
first_name, last_name, middle_name, emergency_contact_name
first_name, last_name, bank_account_bank, bank_account_number
first_name, bank_account_bank, bank_account_number
last_name, bank_account_bank, bank_account_number
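
One way to read these rules is that a rule fires only when every attribute it lists matches. The sketch below illustrates that reading with a handful of the rules; it is an assumption about the semantics, not the service's actual code, and `RULES` and `satisfied_rules` are hypothetical names.

```python
from typing import List, Set, Tuple

# A subset of the rules listed above, copied as attribute combinations.
RULES: List[Tuple[str, ...]] = [
    ("identity",),
    ("first_name", "last_name", "phone"),
    ("first_name", "phone"),
    ("last_name", "phone"),
    ("first_name", "last_name", "email"),
    ("first_name", "last_name", "middle_name", "dob"),
]


def satisfied_rules(matched: Set[str]) -> List[Tuple[str, ...]]:
    """Return every rule whose attributes are all in the matched set."""
    return [rule for rule in RULES if set(rule) <= matched]
```

Under this reading, a candidate that matches only on first_name and last_name satisfies no rule, which is why supplying more attributes improves the chance of a match.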

NOTE:
To compensate for typos and the many possible representations of the same information, we employ a selection of fuzzy matching and phonetic algorithms when indexing and querying.
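
To give a feel for what fuzzy matching means, here is a plain Levenshtein edit distance in Python. It is only an illustration of the idea: the actual matching is done by Elasticsearch and zentity and may behave differently, and `fuzzy_equal` with its `max_edits` threshold is a hypothetical helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character inserts, deletes and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_equal(a: str, b: str, max_edits: int = 1) -> bool:
    """Treat two values as equal when they differ by at most max_edits edits."""
    return levenshtein(a.lower(), b.lower()) <= max_edits
```

With this helper, "Massally" and "Masally" compare as equal, while "Salton" and "Samuel" do not.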

Development

Dockerizing

Creating a docker image

./gradlew clean
./gradlew build
docker build --build-arg JAR_FILE=build/libs/*.jar -t openg2p/searchservice .

Adding A New Search Backend

Adopters can add and replace backends. Consider the example of adding a facial recognition backend that, alongside the existing Elasticsearch implementation, compares the facial portrait of a person pending enrollment against beneficiaries already enrolled in a program. Assuming beneficiaries' photos are stored in an attribute called photo, the implementation will look like this:

Provide a new implementation of org.openg2p.searchservice.services.backends.Backend and annotate it with Spring's @Service annotation
Add photo to searchservice.allowed_query_attributes while redeclaring the defaults in org.openg2p.searchservice.config.Configurations.allowedQueryAttributes

Your newly implemented backend will be passed data to index when the index API is called; a good practice is to check that the photo attribute exists before running your logic. That same method is called for both create and update document operations.

To remove an existing backend, simply remove Spring's @Service annotation from that backend's implementation.

Roadmap

Automated Testing
Queue deduplication tasks

