This proposal is to modify the Senzing Software Development Kit (SDK) for Python to bring it in line with industry standards.
The current Senzing SDK for Python is shipped in the RPM/DEB file as a directory (`g2/sdk/python`) and is intermingled with client programs in another directory (`g2/python`).
To accommodate this setup, a user has to modify `PYTHONPATH` to locate these files.
This approach has shortcomings:
- Isn't conducive to the industry-standard method of installing Python libraries, `pip`.
- User modification of `PYTHONPATH` isn't the usual method of adding Python libraries.
- Cannot be independently installed; requires the yum/apt installer.
- Does not work well with Python's virtual environments.
- `__pycache__` is created dynamically, which may mutate containers.
- Not conducive to toolchains (Integrated Development Environments, Docker build).
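To make the `PYTHONPATH` dependency concrete, the following is a minimal sketch and not part of the Senzing SDK. It uses a temporary directory and a hypothetical module name, `standin_sdk_module`, as a stand-in for a module shipped under `g2/python`, and checks whether a fresh Python interpreter can import it with and without `PYTHONPATH` set.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def can_import(module_name, extra_pythonpath=None):
    """Return True if a fresh interpreter can import module_name."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)  # start from a clean slate
    if extra_pythonpath is not None:
        env["PYTHONPATH"] = str(extra_pythonpath)
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        env=env,
        cwd=tempfile.gettempdir(),  # run away from the stand-in directory
        capture_output=True,
    )
    return result.returncode == 0


with tempfile.TemporaryDirectory() as sdk_dir:
    # Hypothetical stand-in for a module shipped in g2/python.
    Path(sdk_dir, "standin_sdk_module.py").write_text("VALUE = 1\n")
    without_path = can_import("standin_sdk_module")
    with_path = can_import("standin_sdk_module", sdk_dir)

print(without_path, with_path)  # False True
```

The import only succeeds when the caller has set `PYTHONPATH`, which is the manual step that a `pip`-installed package would make unnecessary.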
The proposed approach:
- Creates industry-standard Python packaging.
- Hosts packages in the industry-standard location, the Python Package Index (PyPI).
- Updates Senzing Python programs (i.e. `g2/python/*`) to use packages correctly.
Note: This will not preclude the use of the directories shipped with the SenzingAPI RPM/DEB packages. However, the directory structure of the Senzing SDK for Python will need to be modified to the industry-standard packaging format.
Because this is a "breaking change", the recommendation is to introduce the functionality into Senzing API 3.0.0.
- Installation
- Modification to client code
- Modification to Senzing SDK for Python
- Modification to RPM/DEB directory structure
Installation is done with `pip`.

- Using `pip` against the test Python Package Index (`test.pypi.org`). Example:

  ```console
  python3 -m pip install \
      --index-url https://test.pypi.org/simple/ \
      --no-deps \
      senzing
  ```
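After installation, one quick way to confirm that the package is visible to the interpreter is to ask the import system for it. The following is a sketch using only the standard library; the helper name `is_importable` is an illustrative assumption, not part of the Senzing SDK.

```python
import importlib.util


def is_importable(package_name):
    """Return True if the named top-level package can be found on sys.path."""
    return importlib.util.find_spec(package_name) is not None


# "json" ships with Python, so this is expected to be True.
print(is_importable("json"))

# True only in an environment where the senzing package has been installed.
print(is_importable("senzing"))
```

Running this inside a virtual environment reports on that environment only, so it also helps confirm that the package landed where intended.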
- Before. Example:

  ```python
  import G2Exception
  from G2Config import G2Config
  from G2ConfigMgr import G2ConfigMgr
  from G2Diagnostic import G2Diagnostic
  from G2Engine import G2Engine
  from G2Product import G2Product
  ```
- After. Example:

  ```python
  from senzing import G2Config, G2ConfigMgr, G2Diagnostic, G2Engine, G2Exception, G2Hasher, G2IniParams, G2Product
  ```

- After alternative. Example:

  ```python
  from senzing import *
  ```
- Before. Example:

  ```python
  an_object = G2Product()
  ```

- After. Example:

  ```python
  an_object = G2Product.G2Product()
  ```
An alternative `from`/`import` syntax allows object creation to remain unchanged.

- Use `from` with the full module path. Example:

  ```python
  from senzing.G2Exception import G2Exception
  from senzing.G2Config import G2Config
  from senzing.G2ConfigMgr import G2ConfigMgr
  from senzing.G2Diagnostic import G2Diagnostic
  from senzing.G2Engine import G2Engine
  from senzing.G2Product import G2Product
  ```

- Then object creation can remain:

  ```python
  an_object = G2Product()
  ```
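The difference between these import styles comes down to whether the imported name is bound to a module or to a class. The sketch below illustrates this with a throwaway package built in a temporary directory. The package name `standin_pkg` and its empty `G2Product` class are stand-ins invented for the demonstration, not the real Senzing code.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in package laid out like the proposed "senzing" directory.
    pkg = Path(tmp, "standin_pkg")
    pkg.mkdir()
    (pkg / "__init__.py").write_text('__all__ = ["G2Product"]\n')
    (pkg / "G2Product.py").write_text("class G2Product:\n    pass\n")

    sys.path.insert(0, tmp)
    try:
        # Style 1: import the module from the package.
        from standin_pkg import G2Product as module_form

        # Style 2: star import, driven by __all__ (run in its own namespace).
        star_namespace = {}
        exec("from standin_pkg import *", star_namespace)

        # Style 3: import the class from the full module path.
        from standin_pkg.G2Product import G2Product as class_form
    finally:
        sys.path.remove(tmp)

print(type(module_form).__name__)                  # module
print(type(star_namespace["G2Product"]).__name__)  # module
print(type(class_form).__name__)                   # type

# Styles 1 and 2 need the module-qualified call; style 3 does not.
first = module_form.G2Product()
second = class_form()
print(type(first) is type(second))                 # True
```

Because styles 1 and 2 bind the name to a module, client code must call `G2Product.G2Product()`. Style 3 binds the name to the class, so existing `G2Product()` calls keep working.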
- Before. Example:

  ```python
  from G2Exception import TranslateG2ModuleException, G2ModuleNotInitialized, G2ModuleGenericException
  ```

- After. Example:

  ```python
  from .G2Exception import TranslateG2ModuleException, G2ModuleNotInitialized, G2ModuleGenericException
  ```
- Notice the preceding dot in `.G2Exception`. It means "look for `G2Exception` in the same package (directory) as the importing Python module". It is a relative-import indicator.
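The effect of the leading dot can be checked with a small experiment. The sketch below builds a throwaway package in a temporary directory; every name in it (`standin_sdk`, `standin_errors`, `StandinError`) is hypothetical. One module uses the relative form and one uses the old absolute form, and only the package's parent directory is placed on `sys.path`, as would be the case for an installed package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp, "standin_sdk")
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "standin_errors.py").write_text(
        "class StandinError(Exception):\n    pass\n"
    )
    # Relative form: resolved inside the package itself.
    (pkg / "relative_user.py").write_text(
        "from .standin_errors import StandinError\n"
    )
    # Absolute form: only works if the package directory is on sys.path.
    (pkg / "absolute_user.py").write_text(
        "from standin_errors import StandinError\n"
    )

    sys.path.insert(0, tmp)  # the parent of the package, not the package
    try:
        importlib.import_module("standin_sdk.relative_user")
        relative_ok = True
        try:
            importlib.import_module("standin_sdk.absolute_user")
            absolute_ok = True
        except ModuleNotFoundError:
            absolute_ok = False
    finally:
        sys.path.remove(tmp)

print(relative_ok, absolute_ok)  # True False
```

The absolute form fails once the modules live inside a package, which is why the SDK's internal imports need the dot.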
- `g2/python` before. Example:

  ```console
  $ tree
  .
  ├── CompressedFile.py
  ├── demo
  │   ├── sample
  │   │   ├── project.csv
  │   │   ├── project.json
  │   │   ├── sample_company.csv
  │   │   ├── sample_company.json
  │   │   ├── sample_person.csv
  │   │   └── sample_person.json
  │   └── truth
  │       ├── project.csv
  │       ├── project.json
  │       ├── truthset-person-v1-set1-data.csv
  │       ├── truthset-person-v1-set1-key.csv
  │       ├── truthset-person-v1-set1.sh
  │       ├── truthset-person-v1-set2-data.csv
  │       ├── truthset-person-v1-set2-key.csv
  │       └── truthset-person-v1-set2.sh
  ├── DumpStack.py
  ├── G2Audit.py
  ├── G2Command.py
  ├── G2ConfigMgr.py
  ├── G2Config.py
  ├── G2ConfigTables.py
  ├── G2ConfigTool.py
  ├── G2ConfigTool.readme
  ├── G2CreateProject.py
  ├── G2Database.py
  ├── G2Diagnostic.py
  ├── G2Engine.py
  ├── G2Exception.py
  ├── G2Explorer.py
  ├── G2Export.py
  ├── G2Hasher.py
  ├── G2Health.py
  ├── G2IniParams.py
  ├── G2Loader.py
  ├── G2Paths.py
  ├── G2Product.py
  ├── G2Project.py
  ├── g2purge.umf
  ├── G2S3.py
  ├── G2SetupConfig.py
  ├── G2Snapshot.py
  ├── G2UpdateProject.py
  └── governor_postgres_xid.py
  ```
- `g2/python` after. Example:

  ```console
  $ tree
  .
  ├── CompressedFile.py
  ├── demo
  │   ├── sample
  │   │   ├── project.csv
  │   │   ├── project.json
  │   │   ├── sample_company.csv
  │   │   ├── sample_company.json
  │   │   ├── sample_person.csv
  │   │   └── sample_person.json
  │   └── truth
  │       ├── project.csv
  │       ├── project.json
  │       ├── truthset-person-v1-set1-data.csv
  │       ├── truthset-person-v1-set1-key.csv
  │       ├── truthset-person-v1-set1.sh
  │       ├── truthset-person-v1-set2-data.csv
  │       ├── truthset-person-v1-set2-key.csv
  │       └── truthset-person-v1-set2.sh
  ├── DumpStack.py
  ├── G2Audit.py
  ├── G2Command.py
  ├── G2ConfigTables.py
  ├── G2ConfigTool.py
  ├── G2ConfigTool.readme
  ├── G2CreateProject.py
  ├── G2Database.py
  ├── G2Explorer.py
  ├── G2Export.py
  ├── G2Health.py
  ├── G2Loader.py
  ├── G2Paths.py
  ├── G2Project.py
  ├── g2purge.umf
  ├── G2S3.py
  ├── G2SetupConfig.py
  ├── G2Snapshot.py
  ├── G2UpdateProject.py
  ├── governor_postgres_xid.py
  ├── senzing
  │   ├── G2ConfigMgr.py
  │   ├── G2Config.py
  │   ├── G2Diagnostic.py
  │   ├── G2Engine.py
  │   ├── G2Exception.py
  │   ├── G2Hasher.py
  │   ├── G2IniParams.py
  │   ├── G2Product.py
  │   └── __init__.py
  └── senzing_governor.py
  ```
- In terms of code it would look like this. Move Senzing SDK for Python modules to a `senzing` subdirectory. Example:

  ```console
  export SENZING_PYTHON_DIR=~/my-senzing/g2/python
  mkdir ${SENZING_PYTHON_DIR}/senzing
  mv ${SENZING_PYTHON_DIR}/G2Config.py     ${SENZING_PYTHON_DIR}/senzing/
  mv ${SENZING_PYTHON_DIR}/G2ConfigMgr.py  ${SENZING_PYTHON_DIR}/senzing/
  mv ${SENZING_PYTHON_DIR}/G2Diagnostic.py ${SENZING_PYTHON_DIR}/senzing/
  mv ${SENZING_PYTHON_DIR}/G2Engine.py     ${SENZING_PYTHON_DIR}/senzing/
  mv ${SENZING_PYTHON_DIR}/G2Exception.py  ${SENZING_PYTHON_DIR}/senzing/
  mv ${SENZING_PYTHON_DIR}/G2Hasher.py     ${SENZING_PYTHON_DIR}/senzing/
  mv ${SENZING_PYTHON_DIR}/G2IniParams.py  ${SENZING_PYTHON_DIR}/senzing/
  mv ${SENZING_PYTHON_DIR}/G2Product.py    ${SENZING_PYTHON_DIR}/senzing/
  ```
- Then add `__init__.py`. Example:

  ```console
  cat <<EOT > ${SENZING_PYTHON_DIR}/senzing/__init__.py
  __all__ = ["G2Config", "G2ConfigMgr", "G2Diagnostic", "G2Engine", "G2Exception", "G2Hasher", "G2IniParams", "G2Product"]
  EOT
  ```
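The same relocation can be sketched in Python, for example as part of a build script. This is an illustration under assumptions rather than the packaging procedure itself. The function name `relocate_sdk_modules` is hypothetical, and the demonstration runs against empty placeholder files in a temporary directory rather than a real `g2/python` directory.

```python
import shutil
import tempfile
from pathlib import Path

# Module names taken from the __all__ list above.
SDK_MODULES = [
    "G2Config", "G2ConfigMgr", "G2Diagnostic", "G2Engine",
    "G2Exception", "G2Hasher", "G2IniParams", "G2Product",
]


def relocate_sdk_modules(python_dir):
    """Move SDK modules into python_dir/senzing and write its __init__.py."""
    python_dir = Path(python_dir)
    package_dir = python_dir / "senzing"
    package_dir.mkdir(exist_ok=True)
    for name in SDK_MODULES:
        source = python_dir / f"{name}.py"
        if source.exists():
            shutil.move(str(source), str(package_dir / source.name))
    quoted = ", ".join(f'"{name}"' for name in SDK_MODULES)
    (package_dir / "__init__.py").write_text(f"__all__ = [{quoted}]\n")
    return package_dir


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder files: the SDK modules plus one client program.
    for name in SDK_MODULES + ["G2Loader"]:
        Path(tmp, f"{name}.py").write_text("")
    package_dir = relocate_sdk_modules(tmp)
    moved = sorted(p.name for p in package_dir.iterdir())
    remaining = sorted(p.name for p in Path(tmp).glob("*.py"))
    init_text = (package_dir / "__init__.py").read_text()

print(moved)      # the eight SDK modules plus __init__.py
print(remaining)  # ['G2Loader.py']
```

Client programs such as `G2Loader.py` stay where they are; only the SDK modules move into the package directory.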
- init-container
- stream-loader