demiton-classifier

A procurement classification framework for Australian government contract disclosure data.

This package turns rows from a contract-disclosure register — Queensland's standard schema today, more sources to follow — into a stable, machine-comparable set of classification dimensions. Pure-Python, zero runtime dependencies, deterministic outputs.

The intended audience is anyone who wants analytical leverage over public procurement disclosures: government policy researchers, academic institutions, civil-society organisations, journalists, and platform builders. It is released as open source so that the analytical layer over the $127.5B Queensland infrastructure pipeline (and other Australian government spend) does not need to be rebuilt from scratch by every team that needs it.

Install

pip install demiton-classifier

Requires Python 3.10 or newer.

Quickstart

Classify a single contract:

from demiton_classifier import Contract, classify

contract = Contract(
    value_aud=1_588_168.0,
    title="Strategic Rail Advisory Services",
    supplier_name="Lunarr Advisory Pty Ltd",
    supplier_postcode="3206",
    procurement_method="Limited",
    number_of_offers_sought=1,
    has_variation=True,
)

result = classify(contract)
print(result.value_tier.value)             # "1m_to_10m"
print(result.supplier_locality.value)      # "interstate_au"
print(result.procurement_competition.value) # "limited_sole_source"
print(result.engagement_category.value)     # "engineering_consulting"

Parse a Queensland disclosure CSV and run concentration analysis:

from demiton_classifier import (
    parse_qld_disclosure_csv,
    classify_batch,
    analyze_supplier_concentration,
)

contracts = parse_qld_disclosure_csv("crr_disclosure_april_2025.csv")
print(f"Loaded {len(contracts)} contracts")

for contract, classification in classify_batch(contracts):
    print(contract.supplier_name, "→", classification.engagement_category.value)

concentration = analyze_supplier_concentration(contracts)
print(f"Top-5 supplier concentration: {concentration.top_5_concentration_pct}%")
print(f"Suppliers with ABN: {concentration.suppliers_with_abn}/{concentration.distinct_suppliers}")
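
The README does not spell out how top_5_concentration_pct is computed. One plausible reading is the share of total contract value held by the five largest suppliers. The standalone sketch below illustrates that reading only; the function name, the (supplier, value) input shape, and the rounding are assumptions for illustration, not the package's implementation.

```python
from collections import defaultdict

def top_n_concentration_pct(rows, n=5):
    """Percent of total value held by the n largest suppliers (assumed definition)."""
    totals = defaultdict(float)
    for supplier_name, value_aud in rows:
        totals[supplier_name] += value_aud
    grand_total = sum(totals.values())
    if grand_total == 0:
        return 0.0
    top = sorted(totals.values(), reverse=True)[:n]
    return round(100.0 * sum(top) / grand_total, 1)

# Illustrative figures only, not taken from any disclosure report.
rows = [
    ("Supplier A", 500.0), ("Supplier B", 200.0), ("Supplier A", 100.0),
    ("Supplier C", 100.0), ("Supplier D", 50.0), ("Supplier E", 30.0),
    ("Supplier F", 20.0),
]
print(top_n_concentration_pct(rows))       # 98.0
print(top_n_concentration_pct(rows, n=2))  # 80.0
```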

Classification dimensions (v0.1)

The classifier produces seven deterministic dimensions for every contract. Every dimension has an explicit UNKNOWN (or equivalent) variant for cases where the input is missing or unparseable.

| Dimension | Variants | Inputs |
| --- | --- | --- |
| value_tier | under_10k, 10k_to_100k, 100k_to_1m, 1m_to_10m, 10m_to_100m, over_100m | value_aud |
| procurement_competition | open_competitive, selective, limited_sole_source, unknown | procurement_method, number_of_offers_sought |
| supplier_locality | local_qld, interstate_au, international, unknown | supplier_country, supplier_state, supplier_postcode |
| engagement_category | legal, engineering_consulting, financial_advisory, assurance, it_software, construction, facilities, project_management, security, other | title, description, contract_category_group, supplier_name |
| panel_or_direct | panel, direct, unknown | parent_contract_id, supplier_name |
| has_variation | True, False, None | has_variation |
| confidentiality_used | True, False, None | confidentiality_provision_used |
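
To make the value_tier row concrete, here is a minimal standalone sketch of bucketing a dollar value into the tier names listed above. The boundary handling (lower bound inclusive) and the "unknown" fallback for missing or negative values are assumptions; the package's actual thresholds may treat edge values differently.

```python
from bisect import bisect_right

# Upper bounds in AUD, inferred from the tier names (assumption).
_UPPER_BOUNDS = [10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]
_TIERS = [
    "under_10k", "10k_to_100k", "100k_to_1m",
    "1m_to_10m", "10m_to_100m", "over_100m",
]

def value_tier_sketch(value_aud):
    """Map a contract value to a tier name; hypothetical helper, not the package API."""
    if value_aud is None or value_aud < 0:
        return "unknown"  # assumed fallback for missing or invalid input
    # bisect_right counts how many bounds are <= value, which is the tier index.
    return _TIERS[bisect_right(_UPPER_BOUNDS, value_aud)]

print(value_tier_sketch(1_588_168.0))  # 1m_to_10m
print(value_tier_sketch(9_999.99))     # under_10k
print(value_tier_sketch(None))         # unknown
```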

Each Classification also carries a locality_evidence enum (which signal resolved the locality dimension) and a category_match_score (0.0-1.0) so downstream consumers can filter low-confidence assignments.
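
The idea of recording which signal resolved the locality can be sketched as follows. This is an illustration under stated assumptions, not the package's logic: the Evidence enum names, the country-then-state-then-postcode priority, and the accepted spellings are all hypothetical. The postcode ranges used for Queensland (4000-4999 and 9000-9999) are the standard Australia Post allocations.

```python
from enum import Enum

class Locality(Enum):
    LOCAL_QLD = "local_qld"
    INTERSTATE_AU = "interstate_au"
    INTERNATIONAL = "international"
    UNKNOWN = "unknown"

class Evidence(Enum):  # hypothetical names for illustration
    COUNTRY = "country"
    STATE = "state"
    POSTCODE = "postcode"
    NONE = "none"

def resolve_locality(country=None, state=None, postcode=None):
    """Return (locality, evidence) using the first usable signal (assumed order)."""
    if country and country.strip().upper() not in {"AU", "AUS", "AUSTRALIA"}:
        return Locality.INTERNATIONAL, Evidence.COUNTRY
    if state and state.strip():
        is_qld = state.strip().upper() in {"QLD", "QUEENSLAND"}
        return (Locality.LOCAL_QLD if is_qld else Locality.INTERSTATE_AU), Evidence.STATE
    pc = (postcode or "").strip()
    if len(pc) == 4 and pc.isdigit():
        code = int(pc)
        is_qld = 4000 <= code <= 4999 or 9000 <= code <= 9999
        return (Locality.LOCAL_QLD if is_qld else Locality.INTERSTATE_AU), Evidence.POSTCODE
    return Locality.UNKNOWN, Evidence.NONE

locality, evidence = resolve_locality(postcode="3206")
print(locality.value, evidence.value)  # interstate_au postcode
```

A consumer could then treat postcode-only resolutions as weaker evidence than an explicit state or country field.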

What the package will not do

  • No external API calls. The classifier is pure Python; classification is deterministic given the inputs. Resolving an ABN to entity details or checking Indigenous-business-register membership is an enrichment step performed before classification, by the caller, against whatever source they choose.
  • No automatic data fetching. Use the parser to load a file you already have, or feed parse_qld_disclosure_rows from your own data source (a CKAN client, an XLSX reader, a SQL query, etc.).
  • No commercial pitch baked into outputs. Dimensions are descriptive, not prescriptive.
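
As an illustration of bringing your own data source, the sketch below pulls rows out of a SQL query as plain dictionaries using only the standard library. The table and column names are hypothetical, and the row shape that parse_qld_disclosure_rows expects is not documented in this README, so check the package before passing these rows in.

```python
import sqlite3

# In-memory database standing in for whatever store holds your disclosure data.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name

# Hypothetical table and columns, for illustration only.
conn.execute(
    "CREATE TABLE disclosures (supplier_name TEXT, title TEXT, value_aud REAL)"
)
conn.executemany(
    "INSERT INTO disclosures VALUES (?, ?, ?)",
    [
        ("Lunarr Advisory Pty Ltd", "Strategic Rail Advisory Services", 1_588_168.0),
        ("Example Supplier Pty Ltd", "Example Works Package", 250_000.0),
    ],
)

query = "SELECT supplier_name, title, value_aud FROM disclosures ORDER BY value_aud DESC"
rows = [dict(row) for row in conn.execute(query)]
conn.close()

# `rows` is now a list of dicts that could be adapted to the parser's input.
print(rows[0]["supplier_name"])  # Lunarr Advisory Pty Ltd
```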

Data sources tested against

Roadmap

The classifier is deliberately conservative about which dimensions ship in v0.1. The following are tracked for future releases:

  • AusTender / Commonwealth procurement parser. The OCDS schema differs from the QLD shape; needs its own field mapping.
  • NSW eTendering parser.
  • ABR-resolved entity classification — once consumers have ABN-to-entity-type data, expand the classification with entity_kind (sole trader, proprietary company, public company, government entity, trust) and gst_registered dimensions.
  • Indigenous-owned classification — surface Indigenous-business-register membership as a dimension. Requires a registry source the classifier can be pointed at; Supply Nation's Indigenous Business Direct and Queensland's Black Business Finder are both candidates.
  • Foreign-owned / parent-ownership classification — needs ASIC ledger or equivalent ownership-chain source.
  • Repeat-supplier / variation-history dimensions — requires multi-snapshot inputs; the analyze_supplier_concentration analyzer is the v0.1 starting point.

Issues and pull requests welcome at github.com/demitonapp/classifier.

Versioning

This project follows SemVer. Until 1.0.0, dimension enum values are the public contract: adding a variant is a minor bump, and renaming or removing one is a major bump.

License

Apache License 2.0 — see LICENSE.

The bundled test fixture in tests/fixtures/crr_disclosure_sample.csv is derived from the public Cross River Rail Delivery Authority Contract Disclosure Report, licensed CC-BY-4.0 by the State of Queensland.
