Fast, zero-dependency streaming reader for AEMO NEM12 and NEM13 metering files. Implements AEMO MDFF (Meter Data File Format) v2.6.
- O(1) memory — iterate through millions of intervals.
- Pure stdlib core; pandas / PyMySQL are opt-in extras.
- ~2 M readings/sec on the columnar fast path.
- Includes an `aemo-mdff-reader` CLI.
```shell
pip install aemo-mdff-reader

# optional extras
pip install aemo-mdff-reader[pandas]  # to_dataframe() / parquet
pip install aemo-mdff-reader[mysql]   # SQL persistence
```

```python
from aemo_mdff_reader import parse

for r in parse("metering.csv"):
    print(r.nmi, r.interval_start, r.value, r.uom)
```

Or as a flat CSV / DataFrame:
```python
from aemo_mdff_reader import parse, write_csv, to_dataframe

write_csv(parse("metering.csv"), "out.csv")  # no pandas
df = to_dataframe("metering.csv")            # needs [pandas]
```

From the command line:
```shell
aemo-mdff-reader metering.csv -o out.csv
aemo-mdff-reader metering.csv --validate   # spec check
aemo-mdff-reader metering.csv --nmi NMI1234567 --start 2024-01-01 --end 2024-01-31
aemo-mdff-reader manual.csv --records accumulations   # NEM13
```

Each parsed record is a slots-based class with named attributes plus a `to_dict()` for JSON / dict pipelines:

```python
for r in parse("metering.csv"):
    payload = r.to_dict()                 # {"nmi": "...", "value": 0.12, ...}
    print(r.quality_flag, r.method_flag)  # split of the QMM field, e.g. "S", "52"
```

For aggregation, `aemo_mdff_reader.aggregate` provides streaming helpers:
```python
from aemo_mdff_reader import parse
from aemo_mdff_reader.aggregate import group_by_nmi, daily_totals

for key, group in group_by_nmi(parse("metering.csv")):
    # key = ChannelKey(nmi, register_id, nmi_suffix)
    intervals = list(group)

for day in daily_totals(parse("metering.csv")):
    # day.total, day.interval_count, day.unique_quality_flags
    print(day.nmi, day.interval_date.date(), day.total, day.uom)
```

End-to-end recipes — load + inspect, daily roll-up, filter to pandas, spec validation — live in `examples/`.
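Under the hood, a streaming daily roll-up is just a single pass over an already-ordered stream. A minimal stdlib sketch of the idea (plain dicts standing in for `IntervalReading` records; illustrative only, not the library's implementation):

```python
from datetime import datetime
from itertools import groupby

# Hypothetical plain-dict readings; a real stream would come from parse(...).
readings = [
    {"nmi": "NMI1234567", "start": datetime(2024, 1, 1, 0, 0), "value": 1.5},
    {"nmi": "NMI1234567", "start": datetime(2024, 1, 1, 0, 30), "value": 2.0},
    {"nmi": "NMI1234567", "start": datetime(2024, 1, 2, 0, 0), "value": 0.5},
]

def daily_totals_sketch(rows):
    # groupby buffers only one group at a time, so memory stays O(one day)
    # as long as the input arrives ordered by (nmi, date) — as NEM12 rows do.
    for (nmi, day), group in groupby(rows, key=lambda r: (r["nmi"], r["start"].date())):
        yield nmi, day, sum(r["value"] for r in group)

totals = list(daily_totals_sketch(readings))
```

This is why `daily_totals(parse(path))` can run with a near-zero memory delta: nothing outside the current group is ever retained.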
| You want | Call |
|---|---|
| 300 interval readings (NEM12) | `parse(src)` |
| 250 accumulations (NEM13) | `parse_accumulations(src)` |
| Both, in file order | `parse_all(src)` |
| 400 quality / event flags | `parse_events(src)` |
| 500 / 550 B2B transactions | `parse_b2b(src)` |
| Just the 100 header | `parse_header(src)` |
| Build a pandas DataFrame | `to_dataframe(src)` |
| Write a flat CSV (no pandas) | `write_csv(rows, out)` |
| Validate against AEMO MDFF v2.6 | `validate_file(src)` |
| Compute / verify an NMI checksum | `nmi_checksum`, `validate_nmi` |
| Group readings by NMI / channel | `aggregate.group_by_nmi(rows)` |
| Roll up to daily totals | `aggregate.daily_totals(rows)` |
| Convert any record to a plain dict | `r.to_dict()` |
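The NMI checksum helpers in the table follow AEMO's published Luhn-style procedure over ASCII character codes. A stdlib sketch of that algorithm (illustrative; use `nmi_checksum` / `validate_nmi` in practice):

```python
def nmi_checksum_sketch(nmi: str) -> int:
    """Luhn-style checksum over ASCII values, per AEMO's NMI Procedure.

    Working right to left, the ASCII code of every second character
    (starting with the rightmost) is doubled; the decimal digits of all
    resulting values are summed; the checksum is whatever brings that
    sum up to the next multiple of 10.
    """
    total = 0
    for i, ch in enumerate(reversed(nmi)):
        v = ord(ch)
        if i % 2 == 0:  # rightmost character, then every second one
            v *= 2
        total += sum(int(d) for d in str(v))
    return (10 - total % 10) % 10

check = nmi_checksum_sketch("2001985732")  # → 8 for this example NMI
```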
`src` can be a path, a file-like object, an iterable of CSV lines, or an iterable of pre-split rows. The v1 `NEMReader` facade (`read_from_file`, `to_dataframe`, `to_csv`) still works.
Each `parse(...)` yields an `IntervalReading` with `nmi`, `meter_serial_number`, `register_id`, `nmi_suffix`, `uom`, `interval_length`, `interval_date`, `interval_start`, `interval_end`, `interval_index`, `value`, `quality_method`, `reason_code`, `reason_description`, `update_datetime`, `msats_load_datetime`. See the type stubs (`from aemo_mdff_reader import IntervalReading`) for the exact signatures.
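To see where those per-interval fields come from: in NEM12, a `200` row sets the channel context (NMI, suffix, UOM, IntervalLength) and each following `300` row carries one IntervalDate plus `1440 / IntervalLength` values. A hand-rolled sketch of that expansion (assuming a preceding `200` row set IntervalLength = 30; illustrative, not the parser's code):

```python
from datetime import datetime, timedelta

interval_length = 30  # minutes, taken from the 200 row

# A synthetic 300 row: indicator, IntervalDate, 48 values, then QMM + trailer fields.
row = ("300,20240101," + ",".join(["1.0"] * 48) + ",A,,,20240102120000").split(",")

interval_date = datetime.strptime(row[1], "%Y%m%d")
n = 1440 // interval_length  # 48 half-hourly values per day
values = [float(v) for v in row[2:2 + n]]
starts = [interval_date + timedelta(minutes=i * interval_length) for i in range(n)]
# starts[0] is midnight; starts[-1] is 23:30; row[2 + n] is the quality method ("A")
```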
- Spec: AEMO Meter Data File Format Specification NEM12 & NEM13, v2.6 (effective 29 September 2024). Records `100`, `200`, `250`, `300`, `400`, `500`, `550`, `900` are all surfaced; unknown indicators are ignored. Allowed values for quality flags, transaction codes, reason codes, units of measure, and direction indicators are exposed as constants in `aemo_mdff_reader.spec` for callers that want stricter validation than the parser performs.
- Tolerant: UTF-8 BOM is consumed silently, LF and CRLF both work, and empty interval cells are coerced to `0.0` (use `quality_method` to distinguish missing from zero). Datetime fields accept the spec forms (`YYYYMMDD`, `YYYYMMDDhhmmss`) and a few common non-spec variants (`YYYY-MM-DD`, ISO `YYYY-MM-DDTHH:MM:SS`, with or without a `Z` / `±HH:MM` / `±HHMM` timezone suffix — the suffix is stripped and parsed datetimes are returned naive). `direction_indicator` on 250 records passes through whatever the file emits; the spec set is `spec.DIRECTION_INDICATORS = {"I", "E"}` but `B` and `N` appear in the wild. The parser also accepts non-spec IntervalLengths (1, 60, etc.); strict callers should compare against `spec.ALLOWED_INTERVAL_LENGTHS` (= `{5, 15, 30}`).
- Migration from v1: `NEMReader` still works. The internal `aemo_mdff_reader.nemstructure` package is gone — see the API table above. `pandas` is now opt-in. See `CHANGELOG.md` for details.
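The tolerant datetime handling described above (spec forms plus ISO variants, timezone suffix stripped, naive result) can be sketched in a few lines of stdlib code — an illustration of the documented behaviour, not the library's parser:

```python
import re
from datetime import datetime

# Spec forms first, then the tolerated ISO-style variants.
_FORMATS = ("%Y%m%d%H%M%S", "%Y%m%d", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")

def parse_mdff_datetime(raw: str) -> datetime:
    # Strip a trailing Z / ±HH:MM / ±HHMM timezone suffix; the result stays naive.
    raw = re.sub(r"(Z|[+-]\d{2}:?\d{2})$", "", raw.strip())
    for fmt in _FORMATS:
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised MDFF datetime: {raw!r}")
```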
420,480 readings (4 NMIs × 365 days × 5-min, 2.8 MiB CSV), Python 3.11:

| Operation | Time |
|---|---|
| `for r in parse(path): ...` | 0.45 s |
| `parse_to_columns(path)` | 0.21 s |
| `to_dataframe(path)` (pandas) | 0.76 s |

~2.7× faster than v1 end-to-end; reproduce with `python benchmarks/bench_parser.py`.
The parser is built to scale to gigabyte-class NEM12 files without loading them into RAM. Measured peak memory delta on a synthetic 10.5 M-reading file (100 NMIs × 365 days × 5-min, 71 MiB CSV), Python 3.12:
| API | Memory profile | Peak Δ |
|---|---|---|
| `for r in parse(path): ...` | streaming | 1.3 MiB |
| `daily_totals(parse(path))` | streaming | 0 MiB |
| `write_csv(parse(path), out)` | streaming | 0 MiB |
| `iter_dataframes(path, chunk_size=N)` | bounded O(N) | ~30 MiB / 100k |
| `iter_columns_chunks(path, chunk_size=N)` | bounded O(N) | ~10 MiB / 100k |
| `parse_to_columns(path)` | full materialise | ~600 MiB |
| `list(parse(path))` / `to_dataframe(path)` / `NEMReader.read_from_file()` | full materialise | ~2.5 GiB |
Rule of thumb: stay on the streaming or chunked APIs for any file larger than a few hundred MiB. The chunked variants make pandas-based workflows safe on arbitrarily large inputs:

```python
from aemo_mdff_reader import iter_dataframes

# Process a multi-GiB file 50,000 readings at a time.
for df in iter_dataframes("huge.csv", chunk_size=50_000):
    daily = df.groupby(["NMI", "IntervalDate"])["Value"].sum()
    daily.to_csv("out.csv", mode="a", header=False)
```

The `NEMReader` facade and `to_dataframe(path)` materialise their inputs by design (so `len(reader)` and `df.iloc[...]` work). Avoid them for files that won't fit in RAM.
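The bounded-memory chunked APIs boil down to a standard pattern: slice one shared iterator into fixed-size batches, so at most `chunk_size` records are materialised at once. A stdlib sketch of that pattern:

```python
from itertools import islice

def chunked(iterable, chunk_size):
    # islice pulls at most chunk_size items from the SAME underlying iterator
    # each time round, so peak memory is O(chunk_size) regardless of stream length.
    it = iter(iterable)
    while chunk := list(islice(it, chunk_size)):
        yield chunk

# e.g. hand each batch to pandas.DataFrame(batch) instead of materialising everything
sizes = [len(c) for c in chunked(range(10), 4)]
```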
```shell
git clone https://github.com/Utilified/aemo-mdff-reader.git
cd aemo-mdff-reader
pip install -e .[dev]
pytest
```

CI runs ruff, `mypy --strict`, the test matrix on Python 3.11 → 3.12 / Linux / macOS / Windows, pip-audit, bandit, CodeQL, OpenSSF Scorecard, and a wheel-install smoke test.
Releases are automated by release-please
from Conventional Commits on
main, then signed with sigstore, attested with SLSA build provenance
and a CycloneDX SBOM, and published to PyPI via Trusted Publishing.
See CONTRIBUTING.md for the contributor commit
conventions and the full release flow.
MIT — see LICENSE.