If you are beginning your journey with Senzing, please start with Senzing Quick Start guides.
You are in the Senzing Garage where projects are "tinkered" on. Although this GitHub repository may help you understand an approach to using Senzing, it's not considered to be "production ready" and is not considered to be part of the Senzing product. Heck, it may not even be appropriate for your application of Senzing!
The G2Explorer.py utility is an interactive command line utility that allows a user to:
- search - Find an entity by name, date of birth, address, etc.
- get - Display an entity's records in either summary or detail format.
- compare – Compare two or more entities side by side.
- why – Display the internal engine keys and stats about an entity to help determine why it did or did not match another.
- how – Display how all the records in a single entity came together, showing why at each step
- export – Export the json records that make up an entity for debugging or reingesting after changes are made.
It can also be used to view the reports generated by the following companion projects:
Usage:
python3 G2Explorer.py --help
usage: G2Explorer.py [-h] [-c INI_FILE_NAME] [-s SNAPSHOT_FILE_NAME]
[-a AUDIT_FILE_NAME] [-D DEBUG_OUTPUT]
optional arguments:
-h, --help show this help message and exit
-c INI_FILE_NAME, --config_file_name INI_FILE_NAME
name of the g2.ini file, defaults to
/etc/opt/senzing/G2Module.ini
-s SNAPSHOT_FILE_NAME, --snapshot_json_file SNAPSHOT_FILE_NAME
the name of a json statistics file computed by
G2Snapshot.py
-a AUDIT_FILE_NAME, --audit_json_file AUDIT_FILE_NAME
the name of a json statistics file computed by
G2Audit.py
-D DEBUG_OUTPUT, --debug_output DEBUG_OUTPUT
print raw api json to screen or <filename.txt>
- Python 3.6 or higher
- Senzing API version 3.0 or higher
- python pretty table module (pip3 install ptable)
-
Place the following file in a directory of your choice:
-
Set PYTHONPATH environment variable to python directory where you installed Senzing.
- Example: export PYTHONPATH=/opt/senzing/g2/python
-
The senzing environment must be set to your project by sourceing the setupEnv script created for it.
Its a good idea to place these settings in your .bashrc file to make sure the enviroment is always setup and ready to go. These will already be set if you are using a Senzing docker image such as the sshd or console.
python3 G2Explorer.py
Optional parameters:
- The -c config_file parameter is only required if your project's G2Module.ini file can't be found in the usual location.
- The -s snapshot_file_name parameter is for convenience if you just took a snapshot and want to immediately view it.
- The -a audit_file_name parameter is for convenience if you just ran an audit and want to immediately view it.
- The -d debug_output parameter turns on debug to either screen or to a file name. This shows you what the API is returning
as you search and get records.
-- use -d SCREEN to print the debug output to the screen -- or -d /path/to/filename.txt to print the debug output to whatever file you choose.
Further documentation for how to use the G2Explorer and its companion projects is located at:
Here are some basic instructions to get you started ...
python3 G2Explorer.py
____| __ \ \
__| | | _ \ Senzing G2
| | | ___ \ Exploratory Data Analysis
_____| ____/ _/ _\
Type help or ? to list commands.
(g2)
Next type "help" to see the available commands ...
(g2) help
Documented commands (type help <topic>):
========================================
auditSummary dataSourceSummary get quickLook search why
compare entitySizeBreakdown help score setColorScheme
crossSourceSummary export load scroll versions
Type "help" on a specific command to find out how to use it ...
(poc) help search
Searches for entities by their attributes.
Syntax:
search Joe Smith (without a json structure performs a search on name alone)
search {"name_full": "Joe Smith"}
search {"name_org": "ABC Company"}
search {"name_last": "Smith", "name_first": "Joe", "date_of_birth": "1992-12-10"}
search {"name_org": "ABC Company", "addr_full": "111 First St, Anytown, USA 11111"}
Notes:
Searching by name alone may not locate a specific entity.
Try adding a date of birth, address, or phone number if not found by name alone.
The next thing to do is to start exploring your data. Try the following:
adhoc flow:
- search joe smith (where "joe smith" is the name of an entity you want to lookup)
- get 123 (where 123 is one of the entity_ids returned by the search)
- why 123 (if entity 123 consists of multiple records and you want to know why they resolved)
- compare 123,145 (where 123 and 145 are two entity_ids you want to compare)
- why 123,145 (where 123 and 145 are two entity_ids you want to see why they did not resolve)
Notes:
- Type "help why" to understand what the colors and symbols mean.
- When the output of a command is too large to fit on the screen, it is displayed with the linux command less and all of its scrolling functionality is enabled. Make sure to type q to quit when done with the oputput and ready to try another command.
browsing statistics and examples:
- load /project/snapshots/snapshot1.json (where snapshot1.json is a file created by G2Snapshot.py)
- dataSourceSummary (when a list of the data sources and their statistics are desired)
- dataSourceSummary customer duplicates (to view a list of duplicate records in the customer data source)
- Select (N)ext or (P)revious to move through the examples.
- Select (W)hy on any example you want more explanation for.
- Select (Q)uit to get out of the report.
Thats it, happy exploring!