A procurement classification framework for Australian government contract disclosure data.
This package turns rows from a contract-disclosure register — Queensland's standard schema today, more sources to follow — into a stable, machine-comparable set of classification dimensions. Pure-Python, zero runtime dependencies, deterministic outputs.
The intended audience is anyone who wants analytical leverage over public procurement disclosures: government policy researchers, academic institutions, civil-society organisations, journalists, and platform builders. It is released as open source so that the analytical layer over the $127.5B Queensland infrastructure pipeline (and other Australian government spend) does not need to be rebuilt from scratch by every team that needs it.
pip install demiton-classifier

Requires Python 3.10 or newer.
Classify a single contract:
from demiton_classifier import Contract, classify
contract = Contract(
    value_aud=1_588_168.0,
    title="Strategic Rail Advisory Services",
    supplier_name="Lunarr Advisory Pty Ltd",
    supplier_postcode="3206",
    procurement_method="Limited",
    number_of_offers_sought=1,
    has_variation=True,
)
result = classify(contract)
print(result.value_tier.value) # "1m_to_10m"
print(result.supplier_locality.value) # "interstate_au"
print(result.procurement_competition.value) # "limited_sole_source"
print(result.engagement_category.value) # "engineering_consulting"

Parse a Queensland disclosure CSV and run concentration analysis:
from demiton_classifier import (
    parse_qld_disclosure_csv,
    classify_batch,
    analyze_supplier_concentration,
)
contracts = parse_qld_disclosure_csv("crr_disclosure_april_2025.csv")
print(f"Loaded {len(contracts)} contracts")
for contract, classification in classify_batch(contracts):
    print(contract.supplier_name, "→", classification.engagement_category.value)
concentration = analyze_supplier_concentration(contracts)
print(f"Top-5 supplier concentration: {concentration.top_5_concentration_pct}%")
print(f"Suppliers with ABN: {concentration.suppliers_with_abn}/{concentration.distinct_suppliers}")

The classifier produces seven deterministic dimensions for every contract. Every dimension has an explicit UNKNOWN (or equivalent) variant for cases where the input is missing or unparseable.
| Dimension | Variants | Inputs |
|---|---|---|
| value_tier | under_10k, 10k_to_100k, 100k_to_1m, 1m_to_10m, 10m_to_100m, over_100m | value_aud |
| procurement_competition | open_competitive, selective, limited_sole_source, unknown | procurement_method, number_of_offers_sought |
| supplier_locality | local_qld, interstate_au, international, unknown | supplier_country, supplier_state, supplier_postcode |
| engagement_category | legal, engineering_consulting, financial_advisory, assurance, it_software, construction, facilities, project_management, security, other | title, description, contract_category_group, supplier_name |
| panel_or_direct | panel, direct, unknown | parent_contract_id, supplier_name |
| has_variation | True, False, None | has_variation |
| confidentiality_used | True, False, None | confidentiality_provision_used |
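To make the value_tier row concrete, here is a minimal sketch of how an AUD amount could be bucketed into the tier names listed in the table. It is not the package's implementation: the function name `sketch_value_tier` is hypothetical, treating each boundary as lower-inclusive is an assumption, and returning `None` for a missing or invalid amount is a placeholder because the table does not show how value_tier represents that case.

```python
import math
from typing import Optional

# Exclusive upper bounds paired with the tier names from the table.
# Treating each boundary as lower-inclusive is an assumption.
_TIER_BOUNDS = [
    (10_000, "under_10k"),
    (100_000, "10k_to_100k"),
    (1_000_000, "100k_to_1m"),
    (10_000_000, "1m_to_10m"),
    (100_000_000, "10m_to_100m"),
]


def sketch_value_tier(value_aud: Optional[float]) -> Optional[str]:
    """Map an AUD amount to a tier name; None stands in for a missing value."""
    if value_aud is None or math.isnan(value_aud) or value_aud < 0:
        return None
    for upper_bound, tier in _TIER_BOUNDS:
        if value_aud < upper_bound:
            return tier
    return "over_100m"


print(sketch_value_tier(1_588_168.0))  # 1m_to_10m
```

The printed tier matches the value_tier shown for the same amount in the single-contract example.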
Each Classification also carries a locality_evidence enum (which signal resolved the locality dimension) and a category_match_score (0.0-1.0) so downstream consumers can filter low-confidence assignments.
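One way a downstream consumer might use category_match_score is a simple threshold filter. The sketch below uses a hypothetical stand-in dataclass, `ScoredRow`, rather than the package's own Classification type, and the 0.6 cut-off is an arbitrary illustration, not a recommended value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredRow:
    """Hypothetical stand-in for a classified contract; not the package's type."""

    supplier_name: str
    engagement_category: str
    category_match_score: float  # 0.0-1.0, as described above


def keep_confident(rows, min_score=0.6):
    """Return rows whose category score meets the threshold (0.6 is arbitrary)."""
    return [row for row in rows if row.category_match_score >= min_score]


rows = [
    ScoredRow("Supplier A", "legal", 0.92),
    ScoredRow("Supplier B", "other", 0.10),
    ScoredRow("Supplier C", "construction", 0.60),
]
for row in keep_confident(rows):
    print(row.supplier_name, row.engagement_category)
```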
What the classifier deliberately does not do:
- No external API calls. The classifier is pure Python; classification is deterministic given the inputs. Resolving an ABN to entity details or checking Indigenous-business-register membership is an enrichment step performed before classification, by the caller, against whatever source they choose.
- No automatic data fetching. Use the parser to load a file you already have, or feed parse_qld_disclosure_rows from your own data source (a CKAN client, an XLSX reader, a SQL query, etc.).
- No commercial pitch baked into outputs. Dimensions are descriptive, not prescriptive.
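As an illustration of the "your own data source" path, the sketch below turns CSV text into one plain dict per record using only the standard library. The column names and sample values are hypothetical and do not reproduce the Queensland schema, and the assumption that parse_qld_disclosure_rows accepts an iterable of such dicts should be checked against the package before relying on it.

```python
import csv
import io

# Hypothetical column names and values, for illustration only; the real
# Queensland disclosure schema is not reproduced here.
RAW_CSV = """supplier_name,title,value_aud
Example Supplier Pty Ltd,Example Advisory Services,125000.00
Another Supplier Pty Ltd,Example Construction Works,2400000.00
"""


def rows_from_text(text):
    """Yield one dict per CSV record, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    for record in reader:
        # Strip stray whitespace so downstream parsing sees clean strings.
        yield {key: (value or "").strip() for key, value in record.items()}


rows = list(rows_from_text(RAW_CSV))
print(len(rows), rows[0]["supplier_name"])
# Assumption: an iterable of dicts like `rows` is the kind of input
# parse_qld_disclosure_rows expects; its real signature is not shown here.
```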
Supported data sources:
- Queensland Government: standard contract disclosure schema as published on data.qld.gov.au by the Cross River Rail Delivery Authority, Queensland Treasury, Economic Development Queensland, and other state agencies. The bundled test fixture is derived from the publicly available Cross River Rail Delivery Authority Contract Disclosure Report (CC-BY-4.0).
The classifier is deliberately conservative about which dimensions ship in v0.1. The following are tracked for future releases:
- AusTender / Commonwealth procurement parser. The OCDS schema differs from the QLD shape; needs its own field mapping.
- NSW eTendering parser.
- ABR-resolved entity classification: once consumers have ABN-to-entity-type data, expand the classification with entity_kind (sole trader, proprietary company, public company, government entity, trust) and gst_registered dimensions.
- Indigenous-owned classification: surface Indigenous-business-register membership as a dimension. Requires a registry source the classifier can be pointed at; Supply Nation's Indigenous Business Direct and Queensland's Black Business Finder are both candidates.
- Foreign-owned / parent-ownership classification — needs ASIC ledger or equivalent ownership-chain source.
- Repeat-supplier / variation-history dimensions: requires multi-snapshot inputs; the analyze_supplier_concentration analyzer is the v0.1 starting point.
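As background for the last item, a top-N concentration figure of the kind the analyzer reports can be understood as the share of total contract value held by the N largest suppliers. The sketch below is one plausible definition, not the package's own: the function name and the `(supplier_name, value_aud)` input shape are simplifications chosen for illustration.

```python
from collections import defaultdict


def top_n_concentration_pct(contracts, n=5):
    """Percentage of total value awarded to the n largest suppliers.

    `contracts` is an iterable of (supplier_name, value_aud) pairs; this
    input shape is a simplification, not the package's Contract type.
    """
    totals = defaultdict(float)
    for supplier_name, value_aud in contracts:
        totals[supplier_name] += value_aud
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    top_total = sum(sorted(totals.values(), reverse=True)[:n])
    return round(100.0 * top_total / grand_total, 1)


sample = [("A", 500.0), ("B", 300.0), ("A", 100.0), ("C", 100.0)]
print(top_n_concentration_pct(sample, n=1))  # 60.0
```

Extending this across several disclosure snapshots would be one route toward repeat-supplier dimensions, though how the package will approach that is not stated here.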
Issues and pull requests welcome at github.com/demitonapp/classifier.
This project follows SemVer. Until 1.0.0, dimension enum values are the public contract — adding a variant is a minor bump, renaming or removing one is a major bump.
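Because enum values are the public contract, a consumer may want a small regression check that flags renamed or removed variants after an upgrade. The sketch below shows the idea against a stand-in enum; the class name `ValueTier` and its member names are hypothetical, and only the string values are taken from the dimension table.

```python
from enum import Enum


class ValueTier(Enum):
    """Stand-in enum; member names are hypothetical, values come from the table."""

    UNDER_10K = "under_10k"
    FROM_10K_TO_100K = "10k_to_100k"
    FROM_100K_TO_1M = "100k_to_1m"
    FROM_1M_TO_10M = "1m_to_10m"
    FROM_10M_TO_100M = "10m_to_100m"
    OVER_100M = "over_100m"


# Snapshot of the values a downstream consumer depends on.
PINNED_VALUE_TIERS = {
    "under_10k", "10k_to_100k", "100k_to_1m",
    "1m_to_10m", "10m_to_100m", "over_100m",
}


def removed_or_renamed(enum_cls, pinned):
    """Pinned values the enum no longer provides (a breaking change)."""
    return pinned - {member.value for member in enum_cls}


def newly_added(enum_cls, pinned):
    """Enum values missing from the snapshot (an additive change)."""
    return {member.value for member in enum_cls} - pinned


print(removed_or_renamed(ValueTier, PINNED_VALUE_TIERS))
print(newly_added(ValueTier, PINNED_VALUE_TIERS))
```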
Apache License 2.0 — see LICENSE.
The bundled test fixture in tests/fixtures/crr_disclosure_sample.csv is derived from the public Cross River Rail Delivery Authority Contract Disclosure Report, licensed CC-BY-4.0 by the State of Queensland.