analytics_toolkit is a small utility package for:
- AB-test related helpers
- SQL I/O and table-loading helpers for Trino, Greenplum, and ClickHouse
- date helpers for common period calculations
- Excel helpers for writing pivoted tables from long-format data
From PyPI:
pip install analytics-toolkit
From GitHub:
pip install git+https://github.com/Karapsin/analytics_toolkit.git
PyPI publishing uses GitHub Actions trusted publishing. Before each release,
run the full local tox matrix from AGENTS.md, update the changelog, and verify
the package build:
python -m pip install --upgrade build twine
python -m build
python -m twine check dist/*
Run the publish workflow manually to publish the current version to TestPyPI.
After the TestPyPI install check passes, create and publish a GitHub release
tagged as v<pyproject.toml version>; the release event publishes to PyPI.
The PyPI trusted publisher uses repository Karapsin/analytics_toolkit,
workflow .github/workflows/publish.yml, and environment pypi. The TestPyPI
trusted publisher uses the same repository and workflow with environment
testpypi.
from analytics_toolkit.ab_utils import compute_test_metrics
from analytics_toolkit import sql
from analytics_toolkit.dates.dates import first_day
from analytics_toolkit.excel import break_table, pivot_and_break_table
Supported SQL imports are from analytics_toolkit import sql or
import analytics_toolkit.sql as sql. Deep imports under
analytics_toolkit.sql.* are internal implementation details and may change;
call SQL helpers through the sql facade, for example sql.create_sql_table(...)
or sql.transfer(...). Do not restore removed root implementation paths.
SQL connection settings are read from .connections. The package searches
from the current working directory upward through parent directories. Each key is
the public connection alias used by analytics_toolkit.sql; each value must
include type as one of gp, trino, or ch.
{
  "gp": {
    "type": "gp",
    "host": "gp.example",
    "port": 5432,
    "user": "user",
    "password": "password",
    "database": "db"
  },
  "gp_sandbox": {
    "type": "gp",
    "host": "gp-sandbox.example",
    "user": "user",
    "password": "password",
    "database": "sandbox"
  }
}
Legacy variables such as GP_HOST, TRINO_HOST, CH_HOST, SQL_CONNECTIONS,
and TRINO_INSERT_CHUNK_SIZE are not read. Move connection settings into
.connections; Trino insert chunk sizing is the Trino connection field
insert_chunk_size.
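The upward search for .connections and the type requirement described above can be illustrated with a short sketch. This is not the package's implementation: find_connections_file and load_connections are hypothetical names, and the sketch assumes the file is JSON, as the example suggests.

```python
from __future__ import annotations

import json
from pathlib import Path

# Connection types listed in this README.
VALID_TYPES = {"gp", "trino", "ch"}


def find_connections_file(start: Path) -> Path | None:
    """Return the first .connections found in `start` or any of its parents."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".connections"
        if candidate.is_file():
            return candidate
    return None


def load_connections(path: Path) -> dict:
    """Parse the file as JSON and check that every alias declares a known type."""
    connections = json.loads(path.read_text(encoding="utf-8"))
    for alias, settings in connections.items():
        if settings.get("type") not in VALID_TYPES:
            raise ValueError(f"connection {alias!r} has a missing or unknown type")
    return connections
```

Calling find_connections_file(Path.cwd()) from a nested project directory would pick up a .connections file placed at the project root, which matches the lookup order described above.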
If a Trino connection sets use_keychain_certs=true, the generated CA bundle is
written to:
<connections_file_directory>/certs/trino-<connection-key>-keychain-ca.pem
You can override the state/output directory with MAGNIT_UTILS_HOME.
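To make the path template concrete, here is a small sketch of how the bundle location could be computed. keychain_ca_bundle_path is a hypothetical helper, and the override behavior is an assumption: the sketch treats MAGNIT_UTILS_HOME as a replacement for the connections file directory and keeps the certs/ subdirectory, which the text above does not spell out.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def keychain_ca_bundle_path(
    connections_file: Path,
    connection_key: str,
    env: Mapping[str, str] | None = None,
) -> Path:
    """Build the CA bundle path for a Trino connection (illustrative only)."""
    env = os.environ if env is None else env
    # Assumption: the override replaces the base directory, not the full path.
    override = env.get("MAGNIT_UTILS_HOME")
    base = Path(override) if override else connections_file.parent
    return base / "certs" / f"trino-{connection_key}-keychain-ca.pem"
```

Passing env explicitly keeps the function easy to test without touching the real environment.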
- analytics_toolkit/ab_utils: AB-test metric comparison helpers, including compute_test_metrics
- analytics_toolkit/dates: date and period helpers
- analytics_toolkit/excel: Excel formatting helpers
- analytics_toolkit/sql: SQL execution, loading, and transfer helpers