# G2Engine

In [None]:
import os
import sys
import json

## System path

Update system path.

In [None]:
sys.path.append('/opt/senzing/g2/python')

## G2Engine
The G2Engine API...

In [None]:
from G2Engine import G2Engine

## Initialize variables

Create variables used for G2Engine.

In [None]:
module_name = 'pyG2EngineForAddRecord'
senzing_directory = os.environ.get("SENZING_DIR", "/opt/senzing")
senzing_python_directory = "{0}/g2/python".format(senzing_directory)
g2module_ini_pathname = "{0}/G2Module.ini".format(senzing_python_directory)
verbose_logging = True
from G2Config import G2Config
from G2ConfigMgr import G2ConfigMgr

## Initialization

To start using Senzing G2Engine, create and initialize an instance.
This should be done once per process.
The `init()` method accepts the following parameters:

- **module_name:** A short name given to this instance of the G2 engine (i.e. your G2Module object)
- **g2module_ini_pathname:** A fully qualified path to the G2 engine INI file (often `/opt/senzing/g2/python/G2Module.ini`)
- **verbose_logging:** A boolean that enables diagnostic logging; this prints a large amount of information to stdout (default = False)
- **config_id:** (optional) The identifier value for the engine configuration can be returned here.

The cell below calls `initV2()` instead, passing the engine settings as a JSON string (`iniParams`) in place of the INI file path.

Calling this function will return "0" upon success.

In [None]:
# Engine settings as a JSON string, passed to each initV2() call
iniParams = "{\"PIPELINE\": {\"SUPPORTPATH\": \"/opt/senzing/g2/data\"},\"SQL\": {\"CONNECTION\": \"sqlite3://na:na@/opt/senzing/g2/sqldb/G2C.db\",\"RESOURCEPATH\": \"/opt/senzing/g2/python/g2config.json\"}}"

# Create a configuration, save it, and register it as the default
g2ConfigMgr = G2ConfigMgr()
g2ConfigMgr.initV2(module_name, iniParams, verbose_logging)
g2config = G2Config()
config_bytearray = bytearray("", 'utf-8')
g2config.initV2(module_name, iniParams, verbose_logging)
config = g2config.create()
g2config.save(config, config_bytearray)
configJsonToUse = config_bytearray.decode()
config_comment = "Configuration added from G2SetupConfig."
new_config_id = bytearray()
return_code = g2ConfigMgr.addConfig(configJsonToUse, config_comment, new_config_id)
g2ConfigMgr.setDefaultConfigID(new_config_id)

# Create and initialize the engine itself
g2_engine = G2Engine()
g2_engine.initV2(module_name, iniParams, verbose_logging)
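
The hand-escaped `iniParams` string above is easy to mistype. As a minimal sketch, the same settings can be written as a Python dictionary and serialized with `json.dumps()`. The names `ini_settings` and `ini_params_json` are illustrative only and do not appear elsewhere in this notebook.

```python
import json

# The same settings as the hand-escaped iniParams string, written as a dictionary.
# ini_settings and ini_params_json are hypothetical names used only in this sketch.
ini_settings = {
    "PIPELINE": {"SUPPORTPATH": "/opt/senzing/g2/data"},
    "SQL": {
        "CONNECTION": "sqlite3://na:na@/opt/senzing/g2/sqldb/G2C.db",
        "RESOURCEPATH": "/opt/senzing/g2/python/g2config.json",
    },
}

# json.dumps() takes care of quoting and escaping.
ini_params_json = json.dumps(ini_settings)
print(ini_params_json)
```

The resulting string could then be passed wherever `iniParams` is used above.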

## Prime Engine

The `primeEngine()` method may optionally be called to pre-initialize some of the heavier weight internal resources of the G2 engine.

In [None]:
response = g2_engine.primeEngine()

In [None]:
configID = bytearray("", 'utf-8')
ret = g2_engine.getActiveConfigID(configID)

print("Active configID: " + str(configID.decode()))

## addRecord()

Once the Senzing engine is initialized, use `addRecord()` to load a record into the Senzing repository. `addRecord()` can be called as many times as desired and from multiple threads at the same time. The `addRecord()` function returns "0" upon success, and accepts four parameters as input:

- **datasource_code:** The name of the data source the record is associated with. This value is configurable to the system
- **record_id:** The record ID, used to identify distinct records
- **data_string:** A JSON document with the attribute data for the record
- **load_id:** The observation load ID for the record; value can be null and will default to datasource_code


In [None]:
datasource_code = "TEST"
record_id = "1"
load_id = None
data = {
	"NAMES": [{
		"NAME_TYPE": "PRIMARY",
		"NAME_LAST": "Smith",
		"NAME_FIRST": "John",
		"NAME_MIDDLE": "M"
	}],
	"PASSPORT_NUMBER": "PP11111",
	"PASSPORT_COUNTRY": "US",
	"DRIVERS_LICENSE_NUMBER": "DL11111",
	"SSN_NUMBER": "111-11-1111"
}
data_string = json.dumps(data)

result = g2_engine.addRecord(datasource_code, record_id, data_string, load_id)
print(result)

responseBuffer = bytearray("", 'utf-8')
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
ret = g2_engine.addRecordWithInfo(datasource_code, record_id, data_string, responseBuffer, load_id, flags)
print("Modified Entities: "+str(responseBuffer.decode()))
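
The text above says these calls return "0" upon success. A minimal sketch of a helper that turns any other value into an exception is shown below. It assumes failure is signalled only through the return value, which may not hold for every version of the Python wrapper, and `check_return_code` is a hypothetical name rather than part of the Senzing API.

```python
def check_return_code(return_code, operation):
    """Raise an error when a call reports anything other than 0.

    Hypothetical helper for this notebook; not part of the Senzing API.
    """
    if return_code != 0:
        raise RuntimeError(
            "{0} failed with return code {1}".format(operation, return_code)
        )
    return return_code


# A return code of 0 passes through unchanged.
print(check_return_code(0, "addRecord"))
```

In the cell above it could wrap a call, for example `check_return_code(result, "addRecord")`.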

## Retrieve a Record
Use `getRecordV2()` to retrieve a single record from the data repository. The record is written in JSON form to a user-designated buffer, and the function itself returns "0" upon success. Once the Senzing engine is initialized, `getRecordV2()` can be called as many times as desired and from multiple threads at the same time. The `getRecordV2()` function accepts the following parameters as input:

- **datasource_code:** The name of the data source the record is associated with. This value is configurable to the system
- **record_id:** The record ID, used to identify the record for retrieval
- **flags:** Control flags for specifying what data about the record to retrieve
- **response_string:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
response_string = bytearray("",'utf-8')
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
result = g2_engine.getRecordV2(datasource_code, record_id, flags, response_string)

response_dictionary = json.loads(response_string)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))
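
Most cells in this notebook repeat the same three steps: parse the response buffer, re-serialize it with indentation, and print it. A minimal sketch of a helper that bundles the first two steps follows. `parse_response` and the contents of `sample_buffer` are made up for illustration and are not real engine output.

```python
import json


def parse_response(response_buffer):
    """Decode a response buffer and return (dictionary, pretty-printed text)."""
    document = json.loads(response_buffer.decode("utf-8"))
    pretty = json.dumps(document, sort_keys=True, indent=4)
    return document, pretty


# Hypothetical buffer contents standing in for a real engine response.
sample_buffer = bytearray('{"RECORD_ID": "1", "DATA_SOURCE": "TEST"}', "utf-8")
sample_document, sample_text = parse_response(sample_buffer)
print(sample_text)
```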

The function `getRecordV2()` is an improved version of `getRecord()` that also allows you to use control flags. The `getRecord()` function has been deprecated.

## Entity Search
##### By Record

Entity searching is a key component for interactive use of Entity Resolution intelligence. The core Senzing engine provides real-time search capabilities that are easily accessed via the Senzing API. Senzing offers methods for entity searching, all of which can be called as many times as desired and from multiple threads at the same time (and all of which return "0" upon success).

Use `getEntityByRecordIDV2()` to retrieve entity data based on the ID of a record that belongs to the resolved entity. This function accepts the following parameters as input:

- **datasource_code:** The name of the data source the record is associated with
- **record_id:** The record ID of a record that belongs to the entity
- **flags:** Control flags for specifying what data about the entity to retrieve
- **response_string:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
response_string = bytearray()
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
result = g2_engine.getEntityByRecordIDV2(datasource_code, record_id, flags, response_string)

response_dictionary = json.loads(response_string)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## Entity Search
##### By Entity

Entity searching is a key component for interactive use of Entity Resolution intelligence. The core Senzing engine provides real-time search capabilities that are easily accessed via the Senzing API. Senzing offers methods for entity searching, all of which can be called as many times as desired and from multiple threads at the same time (and all of which return "0" upon success).

Use `getEntityByEntityIDV2()` to retrieve entity data based on the ID of a resolved identity. This function accepts the following parameters as input:

- **entity_id:** The numeric ID of a resolved entity
- **flags:** Control flags for specifying what data about the entity to retrieve
- **response_string:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
# Entity IDs can change, so take the entity ID from the getEntityByRecordIDV2() results in the previous cell
entity_id = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
response_string = bytearray()
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
result = g2_engine.getEntityByEntityIDV2(entity_id, flags, response_string)

response_dictionary = json.loads(response_string)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

The functions `getEntityByEntityIDV2()` and `getEntityByRecordIDV2()` are improved versions of `getEntityByEntityID()` and `getEntityByRecordID()` that also allow you to use control flags. The `getEntityByEntityID()` and `getEntityByRecordID()` functions have been deprecated.

## Search By Attributes

Entity searching is a key component for interactive use of Entity Resolution intelligence. The core Senzing engine provides real-time search capabilities that are easily accessed via the Senzing API. Senzing offers a method for entity searching by attributes, which can be called as many times as desired and from multiple threads at the same time (and which returns "0" upon success).

Use `searchByAttributes()` to retrieve entity data based on a user-specified set of entity attributes. This function accepts the following parameters as input:

- **data_string:** A JSON document with the attribute data to search for
- **response:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
response_string = bytearray()
result = g2_engine.searchByAttributes(data_string, response_string)

response_dictionary = json.loads(response_string)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))
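
The search document is ordinary JSON, so it can be assembled from whichever attributes are known at search time. The sketch below builds such a document and drops attributes that have no value. `build_search_document` is a hypothetical helper, and the attribute names are the top-level ones used in the record earlier in this notebook.

```python
import json


def build_search_document(**attributes):
    """Return a JSON string holding only the attributes that have a value."""
    known = {name: value for name, value in attributes.items() if value}
    return json.dumps(known, sort_keys=True)


# DRIVERS_LICENSE_NUMBER is unknown here, so it is left out of the document.
search_json = build_search_document(
    SSN_NUMBER="111-11-1111",
    PASSPORT_NUMBER="PP11111",
    DRIVERS_LICENSE_NUMBER=None,
)
print(search_json)
```

The resulting string could be passed as `data_string` in the call above.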

## Search By Attributes V2

This function is similar but preferable to the searchByAttributes() function. This function has improved functionality and a better standardized output structure.

Use `searchByAttributesV2()` to retrieve entity data based on a user-specified set of entity attributes. This function accepts the following parameters as input:

- **data_string:** A JSON document with the attribute data to search for
- **flags:** Operational flags
- **response:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes
- **resizeFunc (C only):** A function pointer that can be used to resize the memory buffer specified in the response argument. This function will be called to allocate more memory if the response buffer is not large enough. This argument may be NULL. If so, the function will return an error if the result is larger than the buffer

In [None]:
response = bytearray()
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
result = g2_engine.searchByAttributesV2(data_string, flags, response)

response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## Replace the record
Use the `replaceRecord()` function to update or replace a record in the data repository (if the record doesn't exist, a new record is added to the data repository). Like the above functions, `replaceRecord()` returns "0" upon success, and it can be called as many times as desired and from multiple threads at the same time. The `replaceRecord()` function accepts four parameters as input:

- **dataSourceCode:** The name of the data source the record is associated with. This value is configurable to the system
- **recordID:** The record ID, used to identify distinct records
- **jsonData:** A JSON document with the attribute data for the record
- **loadID:** The observation load ID for the record; value can be null and will default to dataSourceCode

In [None]:
datasource_code = "TEST"
record_id = "1"
load_id = None
data = {
	"NAMES": [{
		"NAME_TYPE": "PRIMARY",
		"NAME_LAST": "Miller",
		"NAME_FIRST": "John",
		"NAME_MIDDLE": "M"
	}],
	"PASSPORT_NUMBER": "PP11111",
	"PASSPORT_COUNTRY": "US",
	"DRIVERS_LICENSE_NUMBER": "DL11111",
	"SSN_NUMBER": "111-11-1111"
}
data_string = json.dumps(data)
result = g2_engine.replaceRecord(datasource_code, record_id, data_string, load_id)
print(result)

## Export JSON Entity Report

There are three steps to exporting resolved entity data from the G2Engine object in JSON format. First, use the `exportJSONEntityReport()` method to generate a long integer, referred to here as an 'exportHandle'. The `exportJSONEntityReport()` method accepts one parameter as input:

- **flags**: An integer specifying which entity details should be included in the export. See the "Entity Export Flags" section for further details.

Second, use the `fetchNext()` method to read the exportHandle and export a row of JSON output containing the entity data for a single entity. Note that successive calls of `fetchNext()` will export successive rows of entity data. The cell below stops when `fetchNext()` leaves the response buffer empty. The `fetchNext()` method accepts the following parameters as input:

- **exportHandle:** A long integer from which resolved entity data may be read and exported
- **response:** A memory buffer for returning the response document; if an error occurred, an error response is stored here.

Third, close the export once all rows have been read. The cell below does not show that step.

In [None]:
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
exportHandle = g2_engine.exportJSONEntityReport(flags)
while True:
    response_string = bytearray()
    g2_engine.fetchNext(exportHandle, response_string)
    if not response_string:
        break  # an empty buffer means there are no more rows
    response_dictionary = json.loads(response_string)
    response = json.dumps(response_dictionary, sort_keys=True, indent=4)
    print(response)
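
The fetch loop can also be wrapped in a generator so that callers simply iterate over exported entities. The sketch below runs against `StubEngine`, a hypothetical stand-in that only mimics how `fetchNext()` is used in this notebook (it fills the buffer, or leaves it empty when no rows remain). It is not the real engine, and the row contents are invented.

```python
import json


class StubEngine:
    """Hypothetical stand-in for the engine so this sketch runs on its own."""

    def __init__(self, rows):
        self._rows = list(rows)

    def fetchNext(self, export_handle, response):
        # Write the next row into the buffer, or leave it empty at the end.
        if self._rows:
            response.extend(self._rows.pop(0).encode("utf-8"))
        return 0


def iterate_export(engine, export_handle):
    """Yield one parsed JSON document per row until the buffer comes back empty."""
    while True:
        response = bytearray()
        engine.fetchNext(export_handle, response)
        if not response:
            return
        yield json.loads(response)


stub = StubEngine([
    '{"RESOLVED_ENTITY": {"ENTITY_ID": 1}}',
    '{"RESOLVED_ENTITY": {"ENTITY_ID": 2}}',
])
entity_ids = [row["RESOLVED_ENTITY"]["ENTITY_ID"] for row in iterate_export(stub, 0)]
print(entity_ids)
```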

## Export CSV Entity Report

There are three steps to exporting resolved entity data from the G2Engine object in CSV format. First, use the `exportCSVEntityReportV2()` method to generate a long integer, referred to here as an 'exportHandle'.

The `exportCSVEntityReportV2()` method accepts these parameters as input:

- **csvColumnList:** A comma-separated list of column names for the CSV export. (These are listed a little further down.)
- **flags:** An integer specifying which entity details should be included in the export. See the "Entity Export Flags" section for further details.

Second, use the `fetchNext()` method to read the exportHandle and export a row of CSV output containing the entity data for a single entity. Note that the first call of `fetchNext()` will yield a header row, and that successive calls of `fetchNext()` will export successive rows of entity data. The `fetchNext()` method accepts the following parameters as input:

- **exportHandle:** A long integer from which resolved entity data may be read and exported
- **response:** A memory buffer for returning the response document; if an error occurred, an error response is stored here
- **bufSize (C only):** The max number of bytes that can be stored in response. The response buffer MUST be able to hold at least this many bytes

Third, close the export once all rows have been read. The cell below does not show that step. Note also that the cell calls `exportCSVEntityReport()`, which takes only the `flags` argument.
In [None]:
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
exportHandle = g2_engine.exportCSVEntityReport(flags)
# Keep the ID of the last entity from the JSON export for use in later cells
entity_id = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
while True:
    response_string = bytearray()
    g2_engine.fetchNext(exportHandle, response_string)
    if not response_string:
        break  # an empty buffer means there are no more rows
    print(response_string.decode())
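
Because the first fetched row is a header, the collected rows can be handed to Python's `csv` module to get one dictionary per entity row. The sketch below uses invented rows. `COLUMN_A` and `COLUMN_B` are placeholders, not the real export column names, and it assumes each fetched row ends with a newline.

```python
import csv
import io

# Hypothetical rows as they might be collected from successive fetchNext() calls.
# The column names are placeholders and not the real export columns.
raw_rows = [
    'COLUMN_A,COLUMN_B\n',
    '1,"Smith, John"\n',
    '2,"Miller, Max"\n',
]

# DictReader uses the header row for the keys and handles quoted commas.
reader = csv.DictReader(io.StringIO("".join(raw_rows)))
parsed_rows = list(reader)
for row in parsed_rows:
    print(row["COLUMN_A"], row["COLUMN_B"])
```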

## Finding Paths
The `FindPathByEntityID()` and `FindPathByRecordID()` functions can be used to find single relationship paths between two entities. Paths are found using known relationships with other entities.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen.

These functions have the following parameters:

- **entityID1:** The entity ID for the starting entity of the search path
- **entityID2:** The entity ID for the ending entity of the search path
- **dataSourceCode1:** The data source for the starting entity of the search path
- **recordID1:** The record ID for the starting entity of the search path
- **dataSourceCode2:** The data source for the ending entity of the search path
- **recordID2:** The record ID for the ending entity of the search path
- **maxDegree:** The number of relationship degrees to search
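
To make the role of `maxDegree` concrete, the sketch below runs a plain breadth-first search over a toy relationship graph and gives up once a path would need more steps than allowed. It illustrates the idea only. It is not how the Senzing engine is implemented, and `find_path` and `toy_relationships` are hypothetical names.

```python
from collections import deque


def find_path(relationships, start, end, max_degree):
    """Breadth-first search for a shortest path of at most max_degree steps."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        if len(path) - 1 >= max_degree:
            continue  # extending this path would exceed the degree limit
        for neighbor in relationships.get(path[-1], ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Toy graph: entity 1 relates to 2, 2 to 3, and 3 to 4.
toy_relationships = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(find_path(toy_relationships, 1, 4, max_degree=3))
print(find_path(toy_relationships, 1, 4, max_degree=2))
```

With three degrees allowed the search reaches entity 4. With two it finds no path.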

First, create some additional records so that there are entities to compare. Can you see what this record has in common with the previous one?

In [None]:
data = {"NAMES": [{"NAME_TYPE": "PRIMARY","NAME_LAST": "Miller","NAME_FIRST": "Max", "NAME_MIDDLE": "W"}],"SSN_NUMBER": "111-11-1111"}
data_string = json.dumps(data)
result = g2_engine.replaceRecord("TEST", "2", data_string, None)
print(result)
data = {"NAMES": [{"NAME_TYPE": "PRIMARY","NAME_LAST": "Miller","NAME_FIRST": "Mildred"}],"SSN_NUMBER": "111-11-1111" }
data_string = json.dumps(data)
result = g2_engine.replaceRecord("TEST", "3", data_string, None)
print(result)

response_string=bytearray()
result = g2_engine.getEntityByRecordID("TEST", "2", response_string)
response_dictionary = json.loads(response_string)
entityID1 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

response_string=bytearray()
result = g2_engine.getEntityByRecordID("TEST", "3", response_string)
response_dictionary = json.loads(response_string)
entityID2 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

## `FindPathByEntityID()`

In [None]:
# Define search variables
maxDegree = 3

# Find the path by entity ID
response = bytearray()
result = g2_engine.findPathByEntityID(entityID1, entityID2, maxDegree, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## `FindPathByEntityIDV2()`
The function `FindPathByEntityIDV2()` is an improved version of `FindPathByEntityID()` that also allows you to use control flags.

In [None]:
# Define search variables
maxDegree = 3

# Find the path by entity ID
response = bytearray()
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
result = g2_engine.findPathByEntityIDV2(entityID1, entityID2, maxDegree, flags, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## `FindPathByRecordID()`

In [None]:
# Define search variables
dataSourceCode1 = "TEST"
recordID1 = "2"
dataSourceCode2 = "TEST"
recordID2 = "3"
maxDegree = 3

# Find the path by record ID
response = bytearray()
result = g2_engine.findPathByRecordID(dataSourceCode1, recordID1, dataSourceCode2, recordID2, maxDegree, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## `FindPathByRecordIDV2()`
The function `FindPathByRecordIDV2()` is an improved version of `FindPathByRecordID()` that also allows you to use control flags.

In [None]:
# Define search variables
dataSourceCode1 = "TEST"
recordID1 = "2"
dataSourceCode2 = "TEST"
recordID2 = "3"
maxDegree = 3

# Find the path by record ID
response = bytearray()
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
result = g2_engine.findPathByRecordIDV2(dataSourceCode1, recordID1, dataSourceCode2, recordID2, maxDegree, flags, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## Finding Paths with Exclusions
The `FindPathExcludingByEntityID()` and `FindPathExcludingByRecordID()` functions can be used to find single relationship paths between two entities. Paths are found using known relationships with other entities. In addition, it will find paths that exclude certain entities from being on the path.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen. Additionally, entities to be excluded can also be specified by either Entity ID or by Record ID.

When excluding entities, the user may choose to either (a) strictly exclude the entities, or (b) prefer to exclude the entities, but still include them if no other path is found. By default, entities will be strictly excluded. A "preferred exclude" may be done by specifying the G2_FIND_PATH_PREFER_EXCLUDE control flag.

These functions have the following parameters:

- **entityID1:** The entity ID for the starting entity of the search path
- **entityID2:** The entity ID for the ending entity of the search path
- **dataSourceCode1:** The data source for the starting entity of the search path
- **recordID1:** The record ID for the starting entity of the search path
- **dataSourceCode2:** The data source for the ending entity of the search path
- **recordID2:** The record ID for the ending entity of the search path
- **maxDegree:** The number of relationship degrees to search
- **excludedEntities:** Entities that should be avoided on the path (JSON document)
- **flags:** Operational flags
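
The difference between a strict exclude and a "preferred exclude" can be sketched on a toy graph: first search while avoiding the excluded entities, and only in preferred mode fall back to an unrestricted search when nothing is found. This illustrates the behaviour described above and is not the engine's implementation. All names in the block are hypothetical.

```python
from collections import deque


def shortest_path(relationships, start, end, max_degree, excluded=frozenset()):
    """Breadth-first search that never steps onto an excluded entity."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        if len(path) - 1 >= max_degree:
            continue
        for neighbor in relationships.get(path[-1], ()):
            if neighbor not in visited and neighbor not in excluded:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def find_path_excluding(relationships, start, end, max_degree, excluded, prefer_exclude=False):
    """Strict mode avoids excluded entities; preferred mode falls back if needed."""
    path = shortest_path(relationships, start, end, max_degree, set(excluded))
    if path is None and prefer_exclude:
        path = shortest_path(relationships, start, end, max_degree)
    return path


# Two routes from 1 to 3: a short one through 2 and a longer one through 5 and 6.
toy_relationships = {1: [2, 5], 2: [1, 3], 3: [2, 6], 5: [1, 6], 6: [5, 3]}
print(find_path_excluding(toy_relationships, 1, 3, 4, excluded=[2]))
print(find_path_excluding(toy_relationships, 1, 3, 4, excluded=[2, 5]))
print(find_path_excluding(toy_relationships, 1, 3, 4, excluded=[2, 5], prefer_exclude=True))
```

Excluding entity 2 forces the longer route, excluding both 2 and 5 strictly leaves no path, and the preferred mode then falls back to the short route.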

## `FindPathExcludingByEntityID()`

In [None]:
# Define search variables
maxDegree = 4
excludedEntities = {"ENTITIES":[{"ENTITY_ID":entity_id}]}
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
excluded_string = json.dumps(excludedEntities)

# Find the path by entity ID
response = bytearray()
result = g2_engine.findPathExcludingByEntityID(entityID1, entityID2, maxDegree, excluded_string, flags, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## `FindPathExcludingByRecordID()`

In [None]:
# Define search variables (maxDegree and flags are reused from the previous cell)
dataSourceCode1 = "TEST"
recordID1 = "2"
dataSourceCode2 = "TEST"
recordID2 = "3"
excludedRecords = "{\"RECORDS\":[{\"RECORD_ID\":\"1\",\"DATA_SOURCE\":\"TEST\"}]}"

# Find the path by record ID
response = bytearray()
result = g2_engine.findPathExcludingByRecordID(dataSourceCode1, recordID1, dataSourceCode2,
                                               recordID2, maxDegree, excludedRecords, flags, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))
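
The exclusion documents used in the two cells above follow a simple shape, so they can be generated from Python lists instead of being typed by hand. A minimal sketch follows. The helper names are hypothetical, and the JSON shapes are copied from the `excludedEntities` and `excludedRecords` values above.

```python
import json


def excluded_entities_document(entity_ids):
    """Build the {"ENTITIES": [...]} document from a list of entity IDs."""
    return json.dumps({"ENTITIES": [{"ENTITY_ID": entity_id} for entity_id in entity_ids]})


def excluded_records_document(records):
    """Build the {"RECORDS": [...]} document from (data_source, record_id) pairs."""
    return json.dumps({
        "RECORDS": [
            {"RECORD_ID": record_id, "DATA_SOURCE": data_source}
            for data_source, record_id in records
        ]
    })


excluded_entities_json = excluded_entities_document([1, 2])
excluded_records_json = excluded_records_document([("TEST", "1")])
print(excluded_entities_json)
print(excluded_records_json)
```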

## Finding Paths with Required Sources
The `FindPathIncludingSourceByEntityID()` and `FindPathIncludingSourceByRecordID()` functions can be used to find single relationship paths between two entities. In addition, one of the entities along the path must include a specified data source.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen. The required data source or sources are specified by a JSON document list.

Specific entities may also be excluded, using the same methodology as the `FindPathExcludingByEntityID()` and `FindPathExcludingByRecordID()` functions use.

These functions have the following parameters:

- **entityID1:** The entity ID for the starting entity of the search path
- **entityID2:** The entity ID for the ending entity of the search path
- **dataSourceCode1:** The data source for the starting entity of the search path
- **recordID1:** The record ID for the starting entity of the search path
- **dataSourceCode2:** The data source for the ending entity of the search path
- **recordID2:** The record ID for the ending entity of the search path
- **maxDegree:** The number of relationship degrees to search
- **excludedEntities:** Entities that should be avoided on the path (JSON document)
- **requiredDsrcs:** Data sources, at least one of which must be included by an entity on the path (JSON document)
- **flags:** Operational flags

## `FindPathIncludingSourceByEntityID()`

In [None]:
# Define search variables
maxDegree = 4
excludedEntities = {"ENTITIES":[{"ENTITY_ID":entity_id}]}
requiredDsrcs = "{\"DATA_SOURCES\":[\"TEST\"]}"
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
excluded_string = json.dumps(excludedEntities)

# Find the path by entity ID
response = bytearray()
result = g2_engine.findPathIncludingSourceByEntityID(entityID1, entityID2, maxDegree,
                                                     excluded_string, requiredDsrcs, flags, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## `FindPathIncludingSourceByRecordID()`

In [None]:
# Define search variables (maxDegree, requiredDsrcs and flags are reused from the previous cell)
dataSourceCode1 = "TEST"
recordID1 = "2"
dataSourceCode2 = "TEST"
recordID2 = "3"
excludedRecords = "{\"RECORDS\":[{\"RECORD_ID\":\"1\",\"DATA_SOURCE\":\"TEST\"}]}"

# Find the path by record ID
response = bytearray()
result = g2_engine.findPathIncludingSourceByRecordID(dataSourceCode1, recordID1, dataSourceCode2, recordID2,
                                                     maxDegree, excludedRecords, requiredDsrcs, flags, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## Finding Networks

The `FindNetworkByEntityID()` and `FindNetworkByRecordID()` functions can be used to find all entities surrounding a requested set of entities. This includes the requested entities, paths between them, and relations to other nearby entities.

Entities can be searched for by either Entity ID or by Record ID, depending on which function is chosen.

These functions have the following parameters:

- **entityList:** A list of entities, specified by Entity ID (JSON document)
- **recordList:** A list of entities, specified by Record ID (JSON document)
- **maxDegree:** The maximum number of degrees in paths between search entities
- **buildOutDegree:** The number of degrees of relationships to show around each search entity
- **maxEntities:** The maximum number of entities to return in the discovered network

They also have various arguments used to return response documents.

The functions return a JSON document that identifies the path between each set of search entities (if the path exists), and the information on the entities in question (search entities, path entities, and build-out entities).

In [None]:
# Define search variables
entityList = {"ENTITIES":[{"ENTITY_ID":entity_id},{"ENTITY_ID":entityID1},{"ENTITY_ID":entityID2}]}
entity_string = json.dumps(entityList)
maxDegree = 2
buildOutDegree = 1
maxEntities = 12
response = bytearray()

# Find the network by entity ID
g2_engine.findNetworkByEntityID(entity_string, maxDegree, buildOutDegree, maxEntities, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print(response)

## `findNetworkByRecordID()`

In [None]:
# Define search variables (maxDegree, buildOutDegree and maxEntities are reused from the previous cell)
recordList = "{\"RECORDS\":[{\"RECORD_ID\":\"1\",\"DATA_SOURCE\":\"TEST\"},{\"RECORD_ID\":\"2\",\"DATA_SOURCE\":\"TEST\"},{\"RECORD_ID\":\"3\",\"DATA_SOURCE\":\"TEST\"}]}"

# Find the network by record ID
response = bytearray()
g2_engine.findNetworkByRecordID(recordList, maxDegree, buildOutDegree, maxEntities, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print(response)

## `findNetworkByEntityIDV2()`
The function `FindNetworkByEntityIDV2()` is an improved version of `FindNetworkByEntityID()` that also allows you to use control flags.

In [None]:
# Define search variables
entityList = {"ENTITIES":[{"ENTITY_ID":entity_id},{"ENTITY_ID":entityID1},{"ENTITY_ID":entityID2}]}
entity_string = json.dumps(entityList)
maxDegree = 2
buildOutDegree = 1
maxEntities = 12
response = bytearray()
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS

# Find the network by entity ID
g2_engine.findNetworkByEntityIDV2(entity_string, maxDegree, buildOutDegree, maxEntities, flags, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print(response)

## `findNetworkByRecordIDV2()`
The function `FindNetworkByRecordIDV2()` is an improved version of `FindNetworkByRecordID()` that also allows you to use control flags.

In [None]:
# Define search variables (maxDegree, buildOutDegree, maxEntities and flags are reused from the previous cell)
recordList = "{\"RECORDS\":[{\"RECORD_ID\":\"1\",\"DATA_SOURCE\":\"TEST\"},{\"RECORD_ID\":\"2\",\"DATA_SOURCE\":\"TEST\"},{\"RECORD_ID\":\"3\",\"DATA_SOURCE\":\"TEST\"}]}"

# Find the network by record ID, using the V2 function with control flags
response = bytearray()
g2_engine.findNetworkByRecordIDV2(recordList, maxDegree, buildOutDegree, maxEntities, flags, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print(response)

## Why records belong to an entity

The `WhyEntityByEntityID()` and `WhyEntityByRecordID()` functions can be used to determine why records belong to their resolved entities. These functions will compare the record data within an entity against the rest of the entity data, and show why they are connected. This is calculated based on the features that record data represents.

Records can be chosen by either Record ID or by Entity ID, depending on which function is chosen. If a single record ID is used, then comparison results for that single record will be generated, as part of its entity. If an Entity ID is used, then comparison results will be generated for every record within that entity.

These functions have the following parameters:

- **entityID:** The entity ID for the entity to be analyzed
- **dataSourceCode:** The data source for the record to be analyzed
- **recordID:** The record ID for the record to be analyzed
- **flags:** Control flags for outputting entities

They also have various arguments used to return response documents.

The functions return a JSON document that gives the results of the record analysis. The document contains a section called "WHY_RESULTS", which shows how specific records relate to the rest of the entity. It has a "WHY_KEY", which is similar to a match key, in defining the relevant connected data. It shows candidate keys for features that initially cause the records to be analyzed for a relationship, plus a series of feature scores that show how similar the feature data was.

The response document also contains a separate ENTITIES section, with the full information about the resolved entity. (Note: When working with this entity data, Senzing recommends using the flags G2_ENTITY_SHOW_FEATURES_EXPRESSED and G2_ENTITY_SHOW_FEATURES_STATS. This will provide detailed feature data that is not included by default, but is useful for understanding the WHY_RESULTS data.)

The functions `WhyEntityByEntityIDV2()` and `WhyEntityByRecordIDV2()` are enhanced versions of `WhyEntityByEntityID()` and `WhyEntityByRecordID()` that also allow you to use control flags. The `WhyEntityByEntityID()` and `WhyEntityByRecordID()` functions work in the same way, but use the default flag value G2_WHY_ENTITY_DEFAULT_FLAGS.

In [None]:
# Define input variables
dataSourceCode = "TEST"
recordID = "1"
response = bytearray()

# Find the why-information for a single record
ret = g2_engine.whyEntityByRecordID(dataSourceCode, recordID, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print(response)

# Find the why-information for every record in an entity
entityID = entity_id
response = bytearray()
ret = g2_engine.whyEntityByEntityID(entityID, response)

# Print the results
response_dictionary = json.loads(response)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print(response)
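
The "WHY_KEY" values described above can be pulled out of a response document without relying on its exact layout. The following is a minimal sketch, not part of the Senzing API: `collect_values` is a hypothetical helper, and the sample document (including the `MATCH_INFO` nesting and the key values) is made up for illustration. A real response contains many more fields.

```python
import json


def collect_values(node, key):
    """Recursively gather every value stored under `key` in parsed JSON."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            else:
                found.extend(collect_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found


# Hypothetical, heavily trimmed response buffer (illustrative values only)
sample_response = bytearray(json.dumps({
    "WHY_RESULTS": [
        {"MATCH_INFO": {"WHY_KEY": "+NAME+SSN"}},
        {"MATCH_INFO": {"WHY_KEY": "+SSN"}},
    ],
    "ENTITIES": [],
}), "utf-8")

why_keys = collect_values(json.loads(sample_response), "WHY_KEY")
print(why_keys)
```

With a live engine, the buffer filled by `whyEntityByRecordID()` or `whyEntityByEntityID()` would take the place of `sample_response`.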

## Redo Processing
Redo records are created automatically by Senzing when it detects conditions that may need more processing. Some examples:
* A value becomes generic and previous decisions may need to be revisited
* Clean up after some record deletes
* Detected related entities were being changed at the same time
* A table inconsistency exists, potentially after a non-graceful shutdown

First we will need a total of 6 records, so let's add 4 more.

In [None]:
# Add four records that share the same SSN; print each return code
data = {"NAMES": [{"NAME_TYPE": "PRIMARY","NAME_LAST": "Owens","NAME_FIRST": "Lily"}],"SSN_NUMBER": "111-11-1111"}
data_string = json.dumps(data)
ret = g2_engine.replaceRecord("TEST","4",data_string,None)
print(ret)
data = {"NAMES": [{"NAME_TYPE": "PRIMARY","NAME_LAST": "Bauler","NAME_FIRST": "August", "NAME_MIDDLE": "E"}],"SSN_NUMBER": "111-11-1111"}
data_string = json.dumps(data)
ret = g2_engine.replaceRecord("TEST","5",data_string,None)
print(ret)
data = {"NAMES": [{"NAME_TYPE": "PRIMARY","NAME_LAST": "Barcy","NAME_FIRST": "Brian", "NAME_MIDDLE": "H"}],"SSN_NUMBER": "111-11-1111"}
data_string = json.dumps(data)
ret = g2_engine.replaceRecord("TEST","6",data_string,None)
print(ret)
data = {"NAMES": [{"NAME_TYPE": "PRIMARY","NAME_LAST": "Miller","NAME_FIRST": "Jack", "NAME_MIDDLE": "H"}],"SSN_NUMBER": "111-11-1111"}
data_string = json.dumps(data)
ret = g2_engine.replaceRecord("TEST","7",data_string,None)
print(ret)

response_string=bytearray()
result = g2_engine.getEntityByRecordID("TEST", "4", response_string)
response_dictionary = json.loads(response_string)
entityID4 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

response_string=bytearray()
result = g2_engine.getEntityByRecordID("TEST", "5", response_string)
response_dictionary = json.loads(response_string)
entityID5 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

response_string=bytearray()
result = g2_engine.getEntityByRecordID("TEST", "6", response_string)
response_dictionary = json.loads(response_string)
entityID6 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]

response_string=bytearray()
result = g2_engine.getEntityByRecordID("TEST", "7", response_string)
response_dictionary = json.loads(response_string)
entityID7 = response_dictionary["RESOLVED_ENTITY"]["ENTITY_ID"]
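
The four lookups above repeat the same parsing step, which could be factored into a small helper. This is a sketch only: `entity_id_from_response` is a hypothetical name, and the sample buffers and entity IDs are invented stand-ins for what `getEntityByRecordID()` would write into `response_string`.

```python
import json


def entity_id_from_response(response):
    """Return RESOLVED_ENTITY.ENTITY_ID from a response buffer."""
    document = json.loads(response)
    return document["RESOLVED_ENTITY"]["ENTITY_ID"]


# Hypothetical response buffers keyed by record ID (illustrative values only)
sample_responses = {
    "4": bytearray(b'{"RESOLVED_ENTITY": {"ENTITY_ID": 101}}'),
    "5": bytearray(b'{"RESOLVED_ENTITY": {"ENTITY_ID": 102}}'),
    "6": bytearray(b'{"RESOLVED_ENTITY": {"ENTITY_ID": 103}}'),
    "7": bytearray(b'{"RESOLVED_ENTITY": {"ENTITY_ID": 104}}'),
}

# Map each record ID to the entity it resolved to
entity_ids = {
    record_id: entity_id_from_response(buffer)
    for record_id, buffer in sample_responses.items()
}
print(entity_ids)
```

With a live engine, each buffer would be filled inside the loop by `g2_engine.getEntityByRecordID("TEST", record_id, buffer)` before being parsed.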

## Counting the number of redos
`countRedoRecords()` returns the number of redo records that are awaiting processing.

In [None]:
response = g2_engine.countRedoRecords()
print(response)

## Getting a redo record
`getRedoRecord()` gets a redo record so that it can be processed.

In [None]:
response_string=bytearray()
response = g2_engine.getRedoRecord(response_string)
print(response)
if (response == 0 and response_string):
  g2_engine.process(response_string.decode())

## Processing redo records
`processRedoRecord()` processes the next redo record and returns it. If the return code is 0 and `response_string` is empty, there are no more redo records to process, and calling `countRedoRecords()` again will return 0.
Processing a redo record can create more redo records in certain situations.

In [None]:
response_string=bytearray()
response = g2_engine.processRedoRecord(response_string)
print(response)
print(response_string.decode())
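
To work through the whole redo queue, the call above can be repeated until the response buffer comes back empty. The sketch below assumes only the behaviour described in this section (return code 0 and an empty buffer mean nothing is left). `FakeRedoEngine` is a hypothetical stand-in so the loop can run without a Senzing installation, and `drain_redo_records` is an illustrative helper, not part of the API.

```python
class FakeRedoEngine:
    """Hypothetical stand-in that mimics the described processRedoRecord() behaviour."""

    def __init__(self, pending):
        self._pending = list(pending)

    def processRedoRecord(self, response):
        # Clear the caller's buffer, then write the next redo record if one exists
        response[:] = b""
        if self._pending:
            response += self._pending.pop(0).encode("utf-8")
        return 0


def drain_redo_records(engine, max_records=1000):
    """Process redo records until the queue is empty or the cap is reached."""
    processed = []
    for _ in range(max_records):
        response = bytearray()
        return_code = engine.processRedoRecord(response)
        if return_code != 0 or not response:
            break  # an error, or no more redo records to process
        processed.append(response.decode())
    return processed


# Illustrative redo payloads; real redo records are produced by the engine
engine = FakeRedoEngine(['{"REDO": 1}', '{"REDO": 2}'])
processed = drain_redo_records(engine)
print(len(processed))
```

The `max_records` cap is a design choice: because processing can create further redo records, an uncapped loop could run for longer than expected.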

## Deleting Records
Use `deleteRecord()` to remove a record from the data repository; it returns "0" upon success. `deleteRecord()` can be called as many times as desired and from multiple threads at the same time. It accepts three parameters as input:

- **dataSourceCode:** The name of the data source the record is associated with. This value is configurable to the system
- **recordID:** The record ID, used to identify distinct records
- **loadID:** The observation load ID for the record; value can be null and will default to dataSourceCode

In [None]:
datasource_code = 'TEST'
record_ID = '1'
load_ID = None
ret = g2_engine.deleteRecord(datasource_code, record_ID, load_ID)
print(str(ret))

## deleteRecordWithInfo()
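`deleteRecordWithInfo()` behaves the same as `deleteRecord()` but also returns a JSON document containing the IDs of the affected entities. It accepts the following parameters:

- **dataSourceCode:** The name of the data source the record is associated with
- **recordID:** The record ID, used to identify distinct records
- **response:** A buffer that receives the JSON document listing the affected entities
- **loadID:** The observation load ID for the record; value can be null and will default to dataSourceCode
- **flags:** Control flags for the returned information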

In [None]:
datasource_code = 'TEST'
record_ID = '2'
load_ID = None
# `flags` is assumed to have been set in an earlier cell

responseBuffer = bytearray("", 'utf-8')
ret2 = g2_engine.deleteRecordWithInfo(datasource_code, record_ID, responseBuffer, load_ID, flags)
print("Modified Entities: "+str(responseBuffer.decode()))
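
The affected entity IDs can be read out of the returned document once it is parsed. The sketch below is illustrative only: the key names `AFFECTED_ENTITIES` and `ENTITY_ID` are assumptions about the document layout and are passed as parameters so they can be adjusted, `affected_entity_ids` is a hypothetical helper, and the sample buffer is invented.

```python
import json


def affected_entity_ids(response, list_key="AFFECTED_ENTITIES", id_key="ENTITY_ID"):
    """Extract entity IDs from a with-info response buffer.

    The default key names are assumptions; change them if the real
    document uses different ones.
    """
    document = json.loads(response) if response else {}
    return [entry[id_key] for entry in document.get(list_key, [])]


# Hypothetical response buffer (illustrative values only)
sample_buffer = bytearray(
    b'{"DATA_SOURCE": "TEST", "RECORD_ID": "2", "AFFECTED_ENTITIES": [{"ENTITY_ID": 7}]}'
)

ids = affected_entity_ids(sample_buffer)
print(ids)
```

With a live engine, `responseBuffer` from the cell above would be passed in place of `sample_buffer`.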

Attempt to get the record again. It should error and give an output similar to "Unknown record".

In [None]:
response_string = bytearray("",'utf-8')
flags = G2Engine.G2_EXPORT_DEFAULT_FLAGS
result = g2_engine.getRecordV2(datasource_code, record_ID, flags, response_string)

response_dictionary = json.loads(response_string)
response = json.dumps(response_dictionary, sort_keys=True, indent=4)
print("Result: {0}\n{1}".format(result, response))

## Purge Repository
To purge the G2 repository, use the aptly named `purgeRepository()` method. This will remove every record in your current repository.

In [None]:
g2_engine.purgeRepository()