Apache-2.0 clean-room Rust/Python toolkit for inspecting Autodesk Revit files (.rvt, .rfa, .rte, .rft) without a Revit installation. Opens the OLE/CFB container, decodes Revit's truncated-gzip streams, extracts metadata and previews, parses the embedded Formats/Latest schema, and classifies all observed schema field encodings across an 11-release 2016–2026 reference corpus.
This is not yet a full Revit model reader. Schema-directed instance walking has a verified ADocument beachhead on Revit 2024–2026, 80 per-class decoders registered in elements::all_decoders() (Wall, Floor, Door, Window, Column, Beam, Stair, Railing, Rebar, Room/Area/Space, Furniture, DesignOption, Phase, Workset, and many more), and IFC4 STEP emission that produces a valid spatial tree with per-element entities, rectangular geometry, compound-layer materials, typed property sets, and door/window openings. Real-world project-file corpus validation (Q-01) is active research — the shipped 11-release test corpus is family-scale (.rfa); one .rvt project probe already surfaced and fixed a latent bounds bug in gzip_header_len. See What works today for the precise boundary.
A zero-upload, client-side browser viewer ships alongside the library, live at https://drunkonjava.github.io/rvt-rs/. Drop a .rvt / .rfa file onto the page — the WebAssembly build parses it in-tab, renders 3D via Three.js with orbit controls + element picking + scene tree, and offers one-click Export glTF / Export IFC / Export plan SVG. No upload, no account, no telemetry. CI asserts the compiled .wasm has zero fetch / XMLHttpRequest / WebSocket imports.
Rust 2024 edition (MSRV 1.85). Fourteen CLIs ship (rvt-analyze, rvt-info, rvt-schema, rvt-history, rvt-diff, rvt-corpus, rvt-dump, rvt-doc, rvt-ifc, rvt-write, rvt-gltf, rvt-sheet, rvt-elem-table, gen-fixture) plus 36 reproducible probes under examples/. Python bindings are built with pyo3 + maturin in the separate rvt-py workspace member, so the core rvt crate stays unconditionally #![forbid(unsafe_code)] (SEC-12/13). Install with pip install rvt.
| Layer | Status | Notes |
|---|---|---|
| OLE/CFB container open | ✓ | No Revit required |
| Truncated-gzip stream decode | ✓ | |
| BasicFileInfo metadata | ✓ | Version, build, GUID, original path |
| PartAtom XML | ✓ | Title, OmniClass code, taxonomies |
| Stream preview extraction | ✓ | Clean PNG, wrapper stripped |
| Formats/Latest schema parse | ✓ | 395 classes; 13,570 fields across the 11-release corpus |
| Field-type classification | ✓ | 100% over rac_basic_sample_family 11-release corpus — CI regression gate |
| Cross-release tag-drift table | ✓ | First public 122×11 dataset |
| Layer 5a ADocument walker | partial | Reliable on Revit 2024–2026; 2016–2023 entry-point detection pending |
| Stream-level modifying writer | ✓ | 13/13 streams byte-preserving; rvt-write CLI + JSON patch manifests |
| Field-level semantic writer | pending | Phase 7 |
| Layer 5b per-class decoders | ✓ | 80 decoders registered in elements::all_decoders(): Level/Category/Subcategory/Material/FillPattern/LinePattern/LineStyle/BasePoint/SurveyPoint/ProjectPosition/Grid/GridType/ReferencePlane/Wall/WallType/Floor/FloorType/Roof/RoofType/Ceiling/CeilingType/Door/Window/Column/StructuralColumn/Beam/StructuralFraming/Stair/StairType/Railing/RailingType/Room/Area/Space/Furniture/FurnitureSystem/Casework/Rebar/Foundation/Mass/GenericModel/View/Sheet/Schedule/ScheduleView/Titleblock/Viewport/Dimension/TextNote/Tag/Annotation/Revision/Phase/DesignOption/Workset/ParameterElement/SharedParameter/Symbol and every Electrical*/Mechanical*/Plumbing*/Specialty FamilyInstance subtype. Each decoder takes instance bytes → typed view. Validated on synthesized schema+bytes fixtures; real-project-file corpus validation is Q-01. |
| IFC4 STEP export — spatial tree | ✓ | IfcProject + IfcSite + IfcBuilding + IfcBuildingStorey + OmniClass classifications |
| IFC4 STEP export — elements | ✓ | IfcWall/IfcSlab/IfcRoof/IfcCovering/IfcDoor/IfcWindow/IfcColumn/IfcBeam/IfcStair/IfcRailing/IfcFurniture/IfcFooting/IfcReinforcingBar/IfcSpace/IfcBuildingElementProxy constructors + IfcLocalPlacement + IfcRelContainedInSpatialStructure. Each element is valid IFC4 and appears in BlenderBIM / IfcOpenShell spatial browsers. |
| IFC4 STEP export — geometry | ✓ (rectangular) | IfcExtrudedAreaSolid + IfcRectangleProfileDef chain wired to the element's Representation slot. Rectangular profiles only (curved doors, non-orthogonal walls still pending — IFC-17/24). |
| IFC4 STEP export — materials | ✓ | Single-material via IfcMaterial + IfcRelAssociatesMaterial; compound assemblies via IfcMaterialLayerSet + IfcMaterialLayerSetUsage (IFC-28/29). Walls / floors / roofs with layered composition emit correctly. |
| IFC4 STEP export — properties | ✓ | IfcPropertySet + IfcPropertySingleValue with typed values (IfcText, IfcInteger, IfcReal, IfcBoolean, IfcLengthMeasure, IfcPlaneAngleMeasure, IfcAreaMeasure, IfcVolumeMeasure, IfcCountMeasure, IfcTimeMeasure, IfcMassMeasure) wired via IfcRelDefinesByProperties. |
| IFC4 STEP export — openings | ✓ | IfcOpeningElement + IfcRelVoidsElement + IfcRelFillsElement — doors and windows cut actual holes in their host walls (BlenderBIM verified). |
| Geometry extraction | partial | Extrusion helpers ship for walls/slabs/roofs/ceilings/columns/beams/stairs/doors/windows (GEO-27..35, IFC-16..26). Swept / revolved / BRep variants exist (IFC-17/18/19/20), but the default emission path still uses feature-flagged rectangular fallbacks. |
| glTF 2.0 binary export | ✓ | model_to_glb() produces a valid .glb file that loads in Three.js's GLTFLoader (VW1-04). rvt-gltf CLI. |
| 2D plan-view SVG export | ✓ | render_plan_svg() produces per-category-coloured SVG (walls black, doors blue, columns red, …) (VW1-11). rvt-sheet CLI. |
| Browser viewer | ✓ | Live at https://drunkonjava.github.io/rvt-rs/. WebAssembly build of the core library + Three.js + Vite. Zero-upload, in-tab parse, Export glTF/IFC/SVG buttons, URL-based share via share::ViewerState. (VW1-01 through VW1-24 shipped.) |
| Fuzz-regression harness | ✓ | 9 libFuzzer targets + 38 synthetic adversarial regression cases under tests/fuzz_regressions.rs. Caught a real gzip_header_len bounds bug on 9-byte truncated headers (Q-04). |
The openBIM community — anchored by buildingSMART International and the IFC standard — has spent years working on Revit interoperability. Autodesk's own revit-ifc exporter runs inside Revit using the Revit API, so it can only emit what the API surfaces. Real-world IFC exports from Revit are described, routinely and publicly, as "very limited" (thinkmoult.com), "data loss" (Reddit r/bim), and "out of the box, just crap" (the OSArch Wiki's guide to Revit for openBIM).
The schema work here — decoding Formats/Latest and classifying 100% of field encodings across 11 Revit releases — is the dictionary a byte-level reader needs. Once per-element decoders (Phase 4 in TODO) and geometry extraction (Phase 5) land on top, the resulting IFC export can carry more than what the Revit API chooses to expose. That is the thesis. It is not yet the delivered product.
If you're building BIM/AEC tooling and want an Apache-2 Revit reader to compose into your stack, the current release covers metadata, schema introspection, 80 per-class decoders (Level, Wall, Floor, Roof, Ceiling, Door, Window, Column, Beam, Stair, Railing, Room, Furniture, Rebar, Phase, DesignOption, Workset, and every Electrical/Mechanical/Plumbing FamilyInstance subtype), and IFC4 STEP emission with a real spatial tree + per-element entities + glTF 2.0 binary + 2D plan-view SVG. See tests/fixtures/synthetic-project.ifc for a committed sample output you can open in BlenderBIM or IfcOpenShell — IfcProject, IfcSite, IfcBuilding, three IfcBuildingStoreys (Ground / Second / Roof Deck at real elevations), and ten IfcWall / IfcSlab / IfcDoor / IfcWindow / IfcStair / IfcBuildingElementProxy entities all wired to the storey via IfcRelContainedInSpatialStructure. Or just drop your file at https://drunkonjava.github.io/rvt-rs/ and click Export IFC — same code path, running in your browser.
One command produces the full forensic picture — identity, upgrade history, format anchors, schema table, Phase D link histogram, content metadata, and a disclosure scan:
cargo build --release
./target/release/rvt-analyze --redact path/to/your.rfa

import rvt
f = rvt.RevitFile("my-project.rfa")
print(f.version, f.part_atom_title) # 2024 "0610 x 0915mm"
print(f.read_adocument()["fields"][-1]) # {name: m_devBranchInfo, kind: element_id, tag: 0, id: 35}
open("out.ifc", "w").write(f.write_ifc())

Install: pip install rvt, or build from source with maturin build --release --manifest-path rvt-py/Cargo.toml. Full API + Jupyter notebook walkthrough: docs/python.md and docs/rvt-python-quickstart.ipynb.
Drop a .rvt / .rfa / .rte / .rft at https://drunkonjava.github.io/rvt-rs/ — nothing leaves the tab. The viewer compiles the core library to WebAssembly (wasm-pack build --target web --features wasm), runs the parse in a dedicated worker, and renders 3D via Three.js. One-click buttons export the model as glTF 2.0 binary, IFC4 STEP, or plan-view SVG. URL state (camera pose + category filters) is shareable via the hash fragment.
Privacy posture is CI-enforced: the deploy workflow (.github/workflows/deploy-viewer.yml) runs wasm-objdump -j Import on every build and fails if the compiled .wasm imports fetch, XMLHttpRequest, or WebSocket. See docs/viewer-privacy-posture.md.
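The gate described above amounts to a deny-list check over the module's import names. Here is a minimal sketch of that idea, assuming the import names have already been extracted into a list of strings (for example from wasm-objdump output). The function name `forbidden_imports` and the sample names are illustrative and are not taken from the deploy workflow.

```python
# Hypothetical deny-list check; the real gate lives in deploy-viewer.yml.
FORBIDDEN = ("fetch", "xmlhttprequest", "websocket")


def forbidden_imports(import_names):
    """Return the import names that mention a network-capable browser API."""
    hits = []
    for name in import_names:
        lowered = name.lower()
        if any(token in lowered for token in FORBIDDEN):
            hits.append(name)
    return hits


# Illustrative names only; not real output from the viewer build.
clean = ["wbg.__wbindgen_throw", "wbg.__wbg_log_0001"]
dirty = clean + ["wbg.__wbg_fetch_0002"]
```

A CI step built this way would fail the build whenever the returned list is non-empty.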
Sample output (all pre-scrubbed with --redact, committed for review):
- One-screen teaser: docs/demo/rvt-analyze-2024-teaser.txt. The four highlight sections fit in one terminal screen (identity, format anchors, Phase D linkage, disclosure scan).
- Full terminal report: docs/demo/rvt-analyze-2024-redacted.txt (130 lines of structured output)
- JSON report: docs/demo/rvt-analyze-2024-redacted.json (machine-readable version)
- Tag-drift heatmap: docs/data/tag-drift-heatmap.svg (visual proof of class-ID drift across 11 Revit releases)
The --redact flag (on by default in every committed artifact) scrubs Windows usernames, Autodesk-internal paths, and project-ID folder names to <redacted> markers while preserving path shape so claims remain verifiable. Omit the flag when running privately against your own files.
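To illustrate what "preserving path shape" means, here is a minimal sketch that scrubs only the username component of a Windows profile path. The regular expression and the function name `redact_user_dir` are assumptions made for this example. The shipped redact module covers more patterns (Autodesk-internal paths, project-ID folders) and may work differently.

```python
import re

# Assumed pattern: a Windows profile path such as C:\Users\<name>\...
_USER_DIR = re.compile(r"([A-Za-z]:\\Users\\)[^\\]+", re.IGNORECASE)


def redact_user_dir(text):
    """Replace the username segment, keeping drive, folders, and file name."""
    return _USER_DIR.sub(lambda m: m.group(1) + "<redacted>", text)


# Hypothetical path, used only to show the shape-preserving rewrite.
sample = r"C:\Users\jdoe\Documents\family.rfa"
scrubbed = redact_user_dir(sample)
```

The directory depth and file name survive, so a reader can still check claims about where a path points without seeing who authored it.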
Running the shipped CLIs against one 400 KB RFA fixture:
- Metadata: version, build tag, creator path, file GUID, locale (rvt-info)
- Atom XML: title, OmniClass code, taxonomies (rvt-info parses PartAtom)
- Preview: clean PNG thumbnail, 300-byte Revit wrapper stripped (rvt-info --extract-preview)
- Schema: 395 classes + 1,114 fields + per-field typed encoding (rvt-schema)
- History: every Revit release that ever saved this file (rvt-history)
- Bulk strings: 3,746 length-prefixed UTF-16LE records from Partitions/NN, covering Autodesk unit/spec/parameter-group identifiers, OmniClass + Uniformat codes, Revit category labels, and localized format strings (rvt-history --partitions)
Every class and field name that rvt-schema extracts was cross-checked against the public RevitAPI.dll NuGet package's exported C++ symbol list. All top-level tagged class names we've inspected (ADocument, DBView, HostObj, LoadBCBase, Symbol, APIAppInfo, APropertyDouble3, ElementId, and the rest) appear in that export with their decorated signatures (e.g. __cdecl NotNull<class ADocument *,void>::NotNull(class ADocument *)), confirming the on-disk schema names match the compiled symbols one-to-one.
A build-server path also appears in C++ assertion strings inside the same DLL; it is mentioned in the recon report for completeness and does not represent anything the reader extracts from .rvt / .rfa files.
Six reproducible discoveries, all documented in docs/rvt-moat-break-reconnaissance.md and reproducible from examples/:
- The schema indexes the data. Class names do not appear as ASCII in Global/Latest; class tags from Formats/Latest (u16 after class name, with 0x8000 flag set) occur ~340× the uniform-random rate. The top tag, AbsCurveGStep, appears 19,415 times in 938 KB of decompressed Global/Latest. [examples/link_schema.rs]
- Tags drift across releases but are stable-sort-assigned. ADocWarnings = 0x001b 2016→2026 because no class sorted alphabetically before it has ever been added. AbsCurveGStep shifted 0x0053 → 0x0066 across the decade as 19 new A-class entries were inserted. Full 122-class × 11-release drift table: docs/data/tag-drift-2016-2026.csv, visualised in docs/data/tag-drift-heatmap.svg. First publicly-available version of this data. [examples/tag_drift.rs]
- Revit 2021 was a major undocumented format transition. Global/Latest grew 27× (~26 KB → ~715 KB) while simultaneously the Forge Design Data Schema namespaces (autodesk.unit.*, autodesk.spec.*) debuted in Partitions/NN. Two symptoms, one event. Any reader built for 2016-2020 silently drops roughly 27× more data when pointed at 2021+.
- Parameter-group namespace shipped separately in Revit 2024. autodesk.parameter.group.* identifiers appear in 2024+ only, three releases after units/specs. Dating the Forge schema rollout from on-disk bytes: examples/tag_drift.rs, src/object_graph.rs.
- A stable Revit format-identifier GUID in family files. Global/PartitionTable is 167 bytes decompressed in .rfa family files, and 165 of those bytes are byte-for-byte identical across every Revit release 2016-2026 (98.8% invariant). The invariant region contains a never-before-published UUIDv1: 3529342d-e51e-11d4-92d8-0000863f27ad. The MAC suffix 0000863f27ad matches a known Autodesk-dev-workstation signature from circa 2000. Useful for family-file detection. Scope correction (2026-04-21): this invariant is a family-file anchor, not a universal Revit-file anchor. Three real .rvt project files we probed carry three different GUIDs (6a6261fd-... on Revit 2023, 552368c6-... on 2024, all-zero on 2025) in a shorter 87-byte PartitionTable. File-type sniffers using the family GUID will correctly reject non-family files but can't identify them. See docs/project-file-corpus-probe-2026-04-21.md. [examples/partition_full.rs]
- Tagged class record structure decoded. Every class declaration in Formats/Latest carries an explicit tag (u16 with 0x8000 flag), optional parent class, and declared field count, followed by N field records each with name + C++ type encoding. HostObjAttr now resolves to {tag=107, parent=Symbol, declared_field_count=3} with all three field names (m_symbolInfo, m_renderStyleId, m_previewElemId) extracted byte-for-byte. [examples/record_framing.rs, src/formats.rs]
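The tag-drift observation in the list above can be reproduced with a toy model: if tags are assigned by rank in a byte-wise sorted class list, a class's tag moves only when new names sort ahead of it. The sketch below assumes exactly that. The helper `assign_tags`, the zero starting tag, and the inserted class names are hypothetical, and the real assignment may have details this model does not capture.

```python
def assign_tags(class_names, first_tag=0):
    """Map each class name to its rank in byte-wise (code-point) sorted order."""
    ordered = sorted(class_names)  # 'ADoc...' sorts before 'Abs...' ('D' < 'b')
    return {name: first_tag + rank for rank, name in enumerate(ordered)}


old = ["ADocWarnings", "AbsCurveGStep", "Wall"]
# Hypothetical newcomers: two sort between the first two names, one sorts last.
new = old + ["ANewClassOne", "ANewClassTwo", "Zone"]

before = assign_tags(old)
after = assign_tags(new)

# Shift per class = number of inserted names that sort ahead of it.
shift = {name: after[name] - before[name] for name in old}
```

In this model ADocWarnings keeps its tag while AbsCurveGStep moves by two, the same mechanism behind the documented 0x0066 − 0x0053 = 19 shift.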
Three unintended disclosure patterns also surfaced in Autodesk's shipped reference content — the specific values are withheld from this README to avoid re-broadcasting them; they are documented in docs/rvt-moat-break-reconnaissance.md for security-research reproducibility:
- A customer-facing OneDrive path that leaks the directory structure of an Autodesk employee's personal sample-authoring workflow.
- A build-server path baked into C++ assertion strings inside the public RevitAPI.dll.
- A creator-name field inside the Contents stream that travels with every copy of the sample family, preserving the name of one of Revit's original 1997 developers.
Downstream safety: the rvt-analyze CLI ships with a --redact flag (on by default for any of the committed demo output in this repo) that rewrites creator paths, Autodesk-internal paths, and build-server paths to <redacted> markers while preserving the surrounding structure. Any tool consuming rvt-rs output and displaying it publicly should do the same.
All modules compile under both the default build and the wasm feature flag. See src/ for type docs:
| Module | What it does |
|---|---|
| reader | Open any Revit file with OpenLimits, enumerate every OLE stream, fetch raw stream bytes, bounded reads |
| compression | Truncated-gzip decode (inflate_at, inflate_at_auto, inflate_at_with_limits) + multi-chunk (inflate_all_chunks_with_limits) + truncated-gzip encoder for write-back (truncated_gzip_encode) |
| basic_file_info | Version, build tag, GUID, creator path, locale; read path + byte-back encoder (BasicFileInfo::encode) |
| part_atom | Atom XML with Autodesk partatom namespace (title, OmniClass, taxonomies); read + encode |
| formats | Parse + encode Formats/Latest with FieldType classification (100% over the 11-release corpus) |
| walker | Schema-directed instance walker + 80-decoder dispatch + detect_adocument_start entry-point finder |
| elements | 80 ElementDecoder implementations (Wall, Floor, Door, Window, Column, Beam, Stair, Railing, Rebar, Room, Furniture, …) |
| geometry | Curve / Face / Solid variants (Line, Arc, Ellipse, NURBS, Hermite, Ruled, Revolved, Extrusion, Sweep, Blend, SweptBlend, Boolean, Mesh, PointCloud) |
| object_graph | DocumentHistory, string-record extractor for Global/Latest + Partitions/NN |
| class_index | Quick class-name inventory (BTreeSet) |
| corpus | Cross-version byte-delta classifier |
| elem_table | Global/ElemTable header parser + rough record enumeration |
| partitions | Partitions/NN 44-byte header decoder + gzip-chunk splitter |
| writer | Byte-preserving round-trip copy_file + write_with_patches (atomic temp-file rename, stream-hash verification) + GUID + history preservation |
| round_trip | Per-class encoder round-trip verification (verify_instance_round_trip) |
| ifc | Full IFC4 spatial tree + elements + materials + properties + openings + extrusion geometry + glTF 2.0 binary (gltf::model_to_glb) + plan-view SVG (sheet::render_plan_svg) + viewer data model (scene_graph, camera, clipping, sheet, share, measure, annotation, pbr) |
| streams | Named constants for every invariant OLE stream in a Revit file |
| redact | Shared PII scrubbers for all CLIs (--redact flag) |
| wasm | #[cfg(feature = "wasm")]: 14 JS-callable wasm-bindgen bindings powering the browser viewer |
| error | Structured error type (Error / Result) |
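The writer row mentions an atomic temp-file rename. As a language-neutral illustration of that general pattern (not the crate's Rust implementation), the sketch below writes to a temporary file in the destination directory and swaps it into place only after the bytes are flushed. The function name `atomic_write` is hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Write data to a temp file beside `path`, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory keeps the rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp_path, path)  # readers see the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # never leave a half-written temp file behind
        raise
```

If the process dies mid-write, the original file is untouched, because the destination name is only ever pointed at a complete file.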
Runtime capabilities:
- Open any Revit file from disk (magic D0 CF 11 E0 A1 B1 1A E1)
- Enumerate every OLE stream; find the version-specific Partitions/NN
- Decompress any stream (truncated-gzip format: standard gzip header, no trailing CRC/ISIZE)
- Parse BasicFileInfo, PartAtom, extract preview PNG
- Extract 395 class records from Formats/Latest with tag + parent + ancestor-tag + declared field count for every tagged class
- Decode the 167-byte Global/PartitionTable structure including the stable Revit format-identifier GUID
- Decode the 307-byte Contents stream including the embedded UTF-16LE metadata chunk
- Produce a byte-for-byte round-trip copy of any .rfa / .rvt file
- Run across the full 11-release corpus in < 500 ms per file (release build)
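The format-identifier GUID decoded from Global/PartitionTable is a version-1 UUID, so its timestamp and node fields can be unpacked with nothing but the standard library. The sketch below uses the GUID quoted in this README. The helper name `uuid1_datetime` is an assumption for illustration, and the byte offset of the GUID inside the stream is deliberately left out.

```python
import uuid
from datetime import datetime, timedelta, timezone

GUID = uuid.UUID("3529342d-e51e-11d4-92d8-0000863f27ad")

# UUIDv1 timestamps count 100 ns ticks since 1582-10-15 (Gregorian epoch).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_datetime(u):
    """Convert a version-1 UUID's 60-bit timestamp to a UTC datetime."""
    if u.version != 1:
        raise ValueError("not a version-1 UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


created = uuid1_datetime(GUID)
node = f"{GUID.node:012x}"  # the 48-bit node field, i.e. the MAC suffix
```

For this GUID the node field is the 0000863f27ad suffix discussed above, and the timestamp lands in early 2001, consistent with the circa-2000 dating.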
Fourteen CLIs ship in the box. Thirteen are shown below; gen-fixture is omitted:
cargo build --release
# One-shot forensic analysis — all subsystems in one report
./target/release/rvt-analyze --redact my-project.rvt
./target/release/rvt-analyze --redact --json my-project.rvt > report.json
# Quick metadata + schema summary
./target/release/rvt-info --show-classes my-project.rvt
# Machine-readable (JSON)
./target/release/rvt-info -f json my-project.rvt > meta.json
# Pull the embedded thumbnail
./target/release/rvt-info --extract-preview preview.png my-project.rvt
# Compare two versions of the same file (cross-version byte diff)
./target/release/rvt-diff --decompress 2018.rfa 2024.rfa
# Dump the full class schema (395 classes, 13,570 fields)
./target/release/rvt-schema my-project.rvt
# Document upgrade history (which Revit releases have opened this file)
./target/release/rvt-history my-project.rvt
# Pull every UTF-16LE string record out of Partitions/NN
# (categories, OmniClass, Uniformat, Autodesk unit identifiers, …)
./target/release/rvt-history --partitions my-project.rvt
# Hex-dump every decompressed stream (for Phase D work)
./target/release/rvt-dump my-project.rvt
# IFC4 STEP export — spatial tree + elements + geometry + openings
./target/release/rvt-ifc my-project.rvt -o out.ifc
# glTF 2.0 binary export — loads in Three.js / Blender / any glTF viewer
./target/release/rvt-gltf my-project.rvt -o out.glb
# 2D plan-view SVG — per-category colours, ready for plot/laser-cut/printing
./target/release/rvt-sheet my-project.rvt -o out.svg
# Global/ElemTable dump — declared element-ids + record layout (family 12B / project 28B/40B)
./target/release/rvt-elem-table my-project.rvt --limit 20
# Byte-preserving write path — patch stream bytes via JSON manifest
./target/release/rvt-write my-project.rvt --patches patches.json -o patched.rvt
# Per-file doc generator (schema + sample-data render for any RVT)
./target/release/rvt-doc my-project.rvt -o doc.md
# Cross-version corpus analysis (11 releases in one pass)
./target/release/rvt-corpus /path/to/corpus-dir

Thirty-six reproducible probes live in examples/, one per FACT in the recon report:
cargo build --release --examples
# --- schema ↔ data linkage (Phase D) ---
./target/release/examples/probe_link <file> # null-hypothesis: class names absent from Global/Latest
./target/release/examples/tag_bytes <file> # hex around known class names in Formats/Latest
./target/release/examples/tag_dump <file> # statistical sweep of post-name u16 patterns
./target/release/examples/link_schema <file> # tag-frequency histogram in Global/Latest (340× non-uniformity)
./target/release/examples/tag_drift <sample-dir> <out.csv> # per-class drift table 2016-2026
./target/release/examples/tag_drift_svg <in.csv> <out.svg> # render drift table as colour-coded SVG heatmap
# --- record framing (Phase 4c) ---
./target/release/examples/record_framing <file> # dump bytes at tagged-class defs + first tag occurrence
./target/release/examples/elem_table_probe <sample-dir> # Global/ElemTable structural sweep across releases
./target/release/examples/partitions_header_probe <sample-dir> # 44-byte Partitions/NN header + chunk offsets
./target/release/examples/contents_probe <file> # Contents stream decoder (creator name + build tag)
# --- stable anchors ---
./target/release/examples/partition_invariant <sample-dir> # find 165-byte invariant in Global/PartitionTable
./target/release/examples/partition_diff <sample-dir> # show the 2 varying bytes per release
./target/release/examples/partition_full <file> # full annotated hex dump + UUID decode
# --- write path (Phase 6) ---
./target/release/examples/roundtrip # copy 2024 sample, verify all 13 streams identical

Every Revit file is a Microsoft Compound File Binary (OLE2) container with this stream layout (constant across 11 years of Revit releases):
<root>
├── BasicFileInfo UTF-16LE metadata
├── Contents custom 4-byte header + DEFLATE body
├── Formats/Latest DEFLATE — class schema inventory
├── Global/
│ ├── ContentDocuments tiny document list
│ ├── DocumentIncrementTable DEFLATE — change tracking
│ ├── ElemTable DEFLATE — element ID index
│ ├── History DEFLATE — edit history (GUIDs)
│ ├── Latest DEFLATE — current object state (17:1 ratio)
│ └── PartitionTable DEFLATE — partition metadata
├── PartAtom plain XML (Atom + Autodesk partatom namespace)
├── Partitions/NN bulk data: 5-10 concatenated DEFLATE segments
│ NN = 58, 60-69 for Revit 2016-2026
├── RevitPreview4.0 custom header + PNG thumbnail
└── TransmissionData UTF-16LE transmission metadata
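Before handing a file to a CFB parser, the container can be recognised from its first eight bytes, the D0 CF 11 E0 A1 B1 1A E1 magic listed under runtime capabilities. This is a minimal pre-check sketch. The function name `looks_like_cfb` is hypothetical, and a match only says "this is an OLE2 container", not "this is a Revit file".

```python
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_cfb(head):
    """True if the buffer starts with the 8-byte Compound File Binary magic."""
    return head[:8] == CFB_MAGIC


# Typical use: read only the first 8 bytes of a candidate file.
# with open(path, "rb") as fh:
#     ok = looks_like_cfb(fh.read(8))
```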
All compressed streams use a "truncated gzip" format — the standard 10-byte
gzip header (magic 1F 8B 08 ...) followed by raw DEFLATE, but without
the trailing 8-byte CRC32 + ISIZE that conforming gzip writers produce.
Python's gzip.GzipFile and Rust's flate2::read::GzDecoder both refuse
these streams. The fix is to skip the 10-byte header manually and use
flate2::read::DeflateDecoder on the raw body.
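The same skip-the-header approach works in Python using zlib with a negative wbits value, which selects raw DEFLATE. The sketch below assumes the simple case described above: a fixed 10-byte header with no optional fields. Both function names are hypothetical, and `make_truncated_gzip` exists only to build a test input without needing a Revit file.

```python
import zlib


def make_truncated_gzip(payload):
    """Build a test stream: 10-byte gzip header + raw DEFLATE, no CRC/ISIZE."""
    header = bytes([0x1F, 0x8B, 0x08, 0, 0, 0, 0, 0, 0, 0xFF])
    comp = zlib.compressobj(wbits=-15)  # negative wbits = raw DEFLATE
    return header + comp.compress(payload) + comp.flush()


def inflate_truncated_gzip(stream):
    """Inflate a gzip stream that lacks its 8-byte trailer."""
    if len(stream) < 10 or stream[:3] != b"\x1f\x8b\x08":
        raise ValueError("missing or truncated gzip header")
    if stream[3] != 0:
        # FEXTRA / FNAME / FCOMMENT / FHCRC would lengthen the header.
        raise ValueError("optional header fields not handled in this sketch")
    return zlib.decompressobj(wbits=-15).decompress(stream[10:])


stream = make_truncated_gzip(b"Formats/Latest " * 4)
restored = inflate_truncated_gzip(stream)
```

Because the DEFLATE body carries its own end-of-stream marker, the decoder stops cleanly without ever looking for the missing trailer.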
| Layer | Description | Status |
|---|---|---|
| 1 · Container | OLE2 / Microsoft Compound File ([MS-CFB]) | Done |
| 2 · Compression | Truncated gzip → raw DEFLATE | Done |
| 3 · Stream framing | Per-stream custom headers, Partitions/NN chunk layout, Contents / Preview / PartitionTable wrappers | Done: 165/167 bytes of PartitionTable invariant; 44-byte Partitions/NN header decoded; 62 19 22 05 wrapper magic confirmed on Contents + RevitPreview4.0 |
| 4a · Schema table | Class names + fields + C++ type signatures from Formats/Latest; per-class tag + parent + declared field count; cross-release tag-drift map | Done |
| 4b · Schema→data link | Tags from Formats/Latest occur at ~340× the noise rate in Global/Latest; schema IS the live type dictionary for the object graph | Done |
| 4c.1 · Record framing | Tagged class records in Formats/Latest parse into structured records: {tag, parent, ancestor_tag, declared_field_count}; HostObjAttr → {tag=107, parent=Symbol, ancestor_tag=0x0025 → APIVSTAMacroElem, declared_field_count=3} | Done |
| 4c.2 · Field-body decoding | FieldType enum classifies 100% of schema fields across 8 variants (Primitive, String, Guid, ElementId, ElementIdRef, Pointer, Vector, Container). 11 discriminator bytes mapped, including generalized scalar-base Vector/Container ({kind} 0x10 ... / {kind} 0x50 ...) and the 0x0d point-type base. | Done (100.00% on 13,570 fields across the 11-version corpus; zero Unknown) |
| 4d · ElemTable | Global/ElemTable header parser + rough record enumeration; record semantics TBD (blocked on per-element schema lookup) | Partial |
| 5 · IFC4 export | Full spatial tree + per-element IFC entities + IfcLocalPlacement + IfcExtrudedAreaSolid + compound material layers + typed property sets + IfcOpeningElement/IfcRelVoidsElement/IfcRelFillsElement for doors and windows. Deterministic ISO-10303-21 output. IfcOpenShell + BlenderBIM verified. | Done (rectangular profiles; swept / revolved / BRep fallbacks ship but use rectangular in the default emission path; IFC-17/24 is the remaining refinement) |
| 6 · Write path | Byte-preserving read-modify-write round-trip (13/13 streams identical); rvt-write CLI + JSON patch manifest + atomic temp-file rename + per-stream SHA verification (WRT-11..14). Stream-level patch is end-to-end; field-level semantic patching is Phase 7. | Done (stream-level); field-level pending |
| 7 · Browser viewer | WebAssembly build of the core + Three.js + Vite + Pages deploy. Zero-upload, in-tab parse, export buttons for glTF/IFC/SVG, URL-state share. Live at https://drunkonjava.github.io/rvt-rs/. | Done (VW1-01..24) |
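The per-stream verification in the write-path row can be pictured as comparing one digest per stream name before and after a round-trip. The sketch below assumes the streams are already available as a name-to-bytes mapping and uses SHA-256. The README only says "SHA", so the hash choice, the function names, and the sample data are all assumptions.

```python
import hashlib


def stream_digests(streams):
    """Hash every stream body; `streams` maps stream name -> bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in streams.items()}


def changed_streams(before, after):
    """Names whose bytes differ, including streams added or removed."""
    a, b = stream_digests(before), stream_digests(after)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Illustrative data only; real stream bodies come from the CFB container.
original = {"BasicFileInfo": b"\x01\x02", "Global/Latest": b"\x03\x04"}
patched = {"BasicFileInfo": b"\x01\x02", "Global/Latest": b"\x03\x05"}
```

A byte-preserving copy yields an empty list, while a stream-level patch should list exactly the streams named in the patch manifest.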
All 5 original P0 research questions (Q4-Q7) are resolved. Layer 4c.2 reaches 100.00% field-type classification on the 11-version reference corpus (13,570 total schema fields, zero Unknown). IFC4 emission, glTF export, 2D plan view, and the browser viewer all ship. The next frontier is real-world project-file corpus validation (Q-01) — one .rvt probe already caught a gzip_header_len bounds bug that family files never hit.
Key findings from this phase:
- Q4: The u16 "flag" word in each tagged-class preamble is a class-tag reference (ancestor / mixin / protocol). 9/9 non-zero values resolve to named classes in the same schema.
- Q5: Each field's type_encoding is [byte category][u16 sub_type][optional body]. 9 category bytes mapped (0x01 bool, 0x02 u16, 0x04/0x05 u32, 0x06 f32, 0x07 f64, 0x08 string, 0x09 GUID, 0x0b u64, 0x0e reference/container).
- Q5.1: Coverage extended to 84% of fields.
- Q5.2: Coverage reaches 100% of fields (13,570 across 11 releases). Generalized {scalar_base} 0x10 ... / {scalar_base} 0x50 ... as vector/container modifiers; added 0x0d point-type base; added 0x08 0x60 ... alternate string encoding; added ElementIdRef { referenced_tag, sub } for references that carry a specific target-class tag; added deprecated 0x03 i32-alias seen only in 2016–2018. See docs/rvt-moat-break-reconnaissance.md §Q5.2.
- Q6: Global/Latest is not an index + heap; it's a flat TLV stream.
- Q6.1: Instance data is schema-directed (tag-less, protobuf-style). Decoding requires schema-first sequential walk from a known entry point.
- Q7: Partitions/NN trailer u32 fields are not per-chunk offsets. Gzip-magic scan remains correct.
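The Q5 layout, [byte category][u16 sub_type][optional body], can be split with a few lines of struct unpacking. The sketch below covers only the category bytes listed under Q5 and assumes the u16 is little-endian, which the list above does not state. The names `CATEGORY` and `split_type_encoding` are hypothetical and are not the crate's FieldType API.

```python
import struct

# Category bytes as listed under Q5; anything else is reported as unmapped.
CATEGORY = {
    0x01: "bool", 0x02: "u16", 0x04: "u32", 0x05: "u32",
    0x06: "f32", 0x07: "f64", 0x08: "string", 0x09: "guid",
    0x0B: "u64", 0x0E: "reference/container",
}


def split_type_encoding(enc):
    """Split a type_encoding into (category name, sub_type, body bytes)."""
    if len(enc) < 3:
        raise ValueError("type_encoding needs at least 3 bytes")
    category = enc[0]
    (sub_type,) = struct.unpack_from("<H", enc, 1)  # assumed little-endian
    return CATEGORY.get(category, "unmapped"), sub_type, enc[3:]


# Hand-made inputs for illustration; not bytes copied from a real schema.
plain_f64 = split_type_encoding(bytes([0x07, 0x00, 0x00]))
reference = split_type_encoding(bytes([0x0E, 0x34, 0x12, 0xAA]))
```

Returning "unmapped" instead of raising keeps a classifier running across a whole schema, so unknown categories can be counted rather than aborting the pass.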
The full analysis narrative with 12 dated addenda lives in docs/rvt-moat-break-reconnaissance.md. Session-length synthesis in docs/rvt-phase4c-session-2026-04-19.md.
Integration tests run against 11 versions of Autodesk's public
rac_basic_sample_family RFA fixture (one per Revit release from 2016
through 2026). These are distributed via Git LFS in the phi-ag/rvt
repository. To pull them:
cd /path/to/rvt-recon/samples
git clone https://github.com/phi-ag/rvt.git _phiag
cd _phiag && git lfs pull
cd .. && cp _phiag/examples/Autodesk/*.rfa .

The integration tests in tests/samples.rs skip any year whose RFA file
is absent, so partial corpora are okay — you'll just see
skipping 2024: sample not present messages.
- cfb crate over custom OLE parser: the cfb crate is mature, tested against Office documents, and handles both short and regular sectors. Faster than writing our own.
- flate2 over miniz_oxide direct: flate2 wraps both miniz_oxide (pure Rust) and libz backends. We pick the default pure-Rust build to avoid a C toolchain dependency.
- quick-xml over xml-rs: ~3x faster, zero-copy friendly, and the .from_str + event-loop pattern is closer to what Go/Python parsers do.
- encoding_rs over stdlib: Revit's UTF-16LE streams sometimes have malformed pairs at boundaries (single-byte markers get interleaved). encoding_rs recovers gracefully where the stdlib's strict String::from_utf16 returns an error.
- BTreeSet for class names: deterministic ordering in output (plus sorted JSON) matters for diffable CLI output.
cargo test --release

Expected output (as of 2026-04-21):
test result: ok. 697 passed; 0 failed (lib unit tests)
test result: ok. 38 passed; 0 failed (fuzz-regression harness, Q-04)
test result: ok. 9 passed; 0 failed (integration tests, 11-version corpus)
test result: ok. 3 passed; 0 failed (ifc_roundtrip + ifc_synthetic_project/structural)
...
Integration tests are skipped if the sample RFAs are absent. The fuzz-regression harness (tests/fuzz_regressions.rs) runs hand-crafted adversarial inputs through each libFuzzer target's entry point — no libFuzzer runtime needed — so any future commit that regresses crash-resistance trips the gate locally.
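The kind of bounds checking that the truncated-header regression cases exercise can be sketched from the gzip header layout in RFC 1952. The Python below is an illustrative re-sketch, not the crate's gzip_header_len. Returning None for a short or malformed header is a choice made for this example.

```python
def gzip_header_len(data):
    """Return the gzip header length, or None if the header is cut short."""
    if len(data) < 10 or data[:3] != b"\x1f\x8b\x08":
        return None  # a 9-byte header must fail here, not index out of range
    flags, pos = data[3], 10
    if flags & 0x04:  # FEXTRA: 2-byte little-endian length, then payload
        if pos + 2 > len(data):
            return None
        pos += 2 + int.from_bytes(data[pos:pos + 2], "little")
        if pos > len(data):
            return None
    for bit in (0x08, 0x10):  # FNAME, then FCOMMENT: zero-terminated strings
        if flags & bit:
            end = data.find(b"\x00", pos)
            if end < 0:
                return None
            pos = end + 1
    if flags & 0x02:  # FHCRC: 2-byte header checksum
        pos += 2
    return pos if pos <= len(data) else None


# Hand-built inputs: plain header, truncated header, header with a file name.
plain = b"\x1f\x8b\x08\x00" + bytes(6) + b"body"
truncated = plain[:9]
named = b"\x1f\x8b\x08\x08" + bytes(6) + b"a\x00" + b"body"
```

Every offset is compared against the buffer length before it is used, which is the property an adversarial 9-byte input is meant to test.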
- Code: Apache License 2.0. See LICENSE for the full text and NOTICE for attribution detail.
- Trademarks: "Autodesk" and "Revit" are registered trademarks of Autodesk, Inc. This project is not affiliated with, endorsed by, or sponsored by Autodesk. References to "Autodesk" and "Revit" in this project identify the file format this reader parses and are nominative fair use.
- Interoperability basis: reverse engineering for the purpose of creating an independently-developed interoperable program is recognised as lawful fair use under Sega Enterprises v. Accolade, 977 F.2d 1510 (9th Cir. 1992) and Sony Computer Entertainment v. Connectix, 203 F.3d 596 (9th Cir. 2000) in the United States, and under Article 6 of the EU Software Directive 2009/24/EC in the European Union. File formats themselves are not copyrightable subject matter (Baker v. Selden, 101 U.S. 99 (1879); Lotus Development v. Borland, 516 U.S. 233 (1996)).
- No Autodesk proprietary code is used, referenced, or redistributed by this project. All file-format observations were made by inspecting the bytes of publicly-shipped Autodesk sample content and by parsing the public RevitAPI.dll NuGet package's exported symbol list. See NOTICE.