RDPuzzle

Reconstructs RDP bitmap cache fragments into readable screenshots using neural edge-matching, auto-stitching, and OCR - all locally in the browser.

Loads 64x64 bitmap tiles and scores likely neighbors using HSV color similarity, Pearson correlation, and a self-trained neural network (EdgeMatch) that runs in the page. It then semi-automatically stitches the tiles into larger images. Also includes auto-stitching, multi-tab workspaces, low-information tile filtering, OCR, undo/redo, and session save/load.

Use it directly

Demo:

RDPuzzle.Demo.mp4

Why

Existing tooling for reconstructing images from the RDP bitmap cache is largely manual and time-consuming.

This tool is my attempt to fix that. The goal is to take an artifact that has always been powerful but impractical at scale and make it usable in a real IR workflow, without spending half a day producing a few snippets.

Features

Neural EdgeMatch scoring (self-trained model, runs in-browser via ONNX)
Auto-stitching with configurable thresholds
OCR
Single HTML file, nothing to install
Low-information tile filtering (ignores blank/solid tiles)
Near-duplicate tile skipping on import (99% visual similarity threshold)
Loads RDP bitmap cache fragments (BMC, BIN)
Move and swap placed tiles
Multi-tab workspaces
Undo / redo
Save and load sessions
Export reconstructed grids as images
All tiles displayed in a grid, with drag and drop onto a reconstruction canvas
Import image tiles directly
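The 99% threshold for near-duplicate skipping is stated above, but the similarity measure is not. Below is a minimal sketch that assumes similarity is 1 minus the mean absolute pixel difference (scaled to 0-255) between two equally sized tiles. `visual_similarity` and `dedupe_tiles` are hypothetical names, not RDPuzzle functions:

```python
import numpy as np


def visual_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Return a similarity in [0, 1] for two equally sized uint8 RGB tiles.

    Assumed metric: 1 minus the mean absolute channel difference, scaled by 255.
    """
    diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
    return 1.0 - float(diff.mean()) / 255.0


def dedupe_tiles(tiles, threshold=0.99):
    """Keep a tile only if it is less than `threshold` similar to every tile already kept."""
    kept = []
    for tile in tiles:
        if all(visual_similarity(tile, k) < threshold for k in kept):
            kept.append(tile)
    return kept
```

This version compares each new tile against every kept tile, so its cost is quadratic. A real importer would probably bucket or hash tiles first.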

Matching

The tool combines several signals to decide whether two tiles are likely neighbors. When an empty cell has more than one placed neighbor, each neighbor contributes evidence, so a candidate's match grows stronger the more placed neighbors the cell has.

Combined score:

(HSV score × HSV weight) + (Pearson score × Pearson weight) + (EdgeMatch score × EdgeMatch weight)

Weights are configurable. A low-information multiplier penalizes blank or near-solid tiles.
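Here is a sketch of the combined score together with the multi-neighbor evidence described above. The weights are placeholders. Two parts are assumptions about how RDPuzzle works, and it may do either differently:

- The low-information multiplier scales the whole score.
- Per-neighbor scores are summed, so a cell with two agreeing neighbors outranks a cell with one.

```python
from dataclasses import dataclass


@dataclass
class MatchWeights:
    """Per-signal weights (configurable in the UI; values used below are placeholders)."""
    hsv: float
    pearson: float
    edgematch: float


def combined_score(hsv: float, pearson: float, edgematch: float,
                   w: MatchWeights, low_info_multiplier: float = 1.0) -> float:
    """Weighted sum of the three signals for one anchor/candidate edge.

    low_info_multiplier < 1.0 penalizes blank or near-solid tiles
    (assumed here to scale the whole score).
    """
    raw = hsv * w.hsv + pearson * w.pearson + edgematch * w.edgematch
    return raw * low_info_multiplier


def cell_score(per_neighbor_scores: list[float]) -> float:
    """Assumed pooling: sum the evidence from every placed neighbor of an empty cell."""
    return sum(per_neighbor_scores)
```

Summing also lets a contradicting neighbor (for example, a negative Pearson score) pull the total down. Averaging would not reward extra agreeing neighbors.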

HSV

A single bundled metric that combines two signals:

70% HSV histogram overlap across adjacent tile edges
30% per-pixel color continuity across the touching edge

Good for UI regions, photos, gradients. Weak on flat or repetitive areas.
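A minimal sketch of the 70/30 HSV metric for a horizontal pair (candidate to the right of the anchor). The following choices are assumptions, not taken from RDPuzzle:

- a 4-pixel strip on each side
- per-channel 16-bin histograms
- histogram intersection as the "overlap"
- 1 minus mean absolute difference as "continuity"

It uses the standard-library `colorsys` for the RGB to HSV conversion:

```python
import colorsys

import numpy as np


def hsv_pixels(region: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB region to an (N, 3) array of HSV values in [0, 1]."""
    flat = region.reshape(-1, 3).astype(np.float64) / 255.0
    return np.array([colorsys.rgb_to_hsv(*px) for px in flat])


def histogram_overlap(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """Mean histogram intersection over the H, S and V channels (1.0 = identical distributions)."""
    overlaps = []
    for ch in range(3):
        ha, _ = np.histogram(a[:, ch], bins=bins, range=(0.0, 1.0))
        hb, _ = np.histogram(b[:, ch], bins=bins, range=(0.0, 1.0))
        overlaps.append(np.minimum(ha / ha.sum(), hb / hb.sum()).sum())
    return float(np.mean(overlaps))


def hsv_score(left_tile: np.ndarray, right_tile: np.ndarray, strip: int = 4) -> float:
    """Score `right_tile` as the right-hand neighbor of `left_tile`."""
    # 70%: color distribution of the strips on either side of the shared border.
    overlap = histogram_overlap(hsv_pixels(left_tile[:, -strip:]),
                                hsv_pixels(right_tile[:, :strip]))
    # 30%: per-pixel continuity of the two columns that actually touch.
    a = left_tile[:, -1].astype(np.float64)
    b = right_tile[:, 0].astype(np.float64)
    continuity = 1.0 - float(np.abs(a - b).mean()) / 255.0
    return 0.7 * overlap + 0.3 * continuity
```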

Pearson

Compares the luminance relationship between two touching edges. Useful when brightness patterns continue across the boundary. Its weight is dynamically reduced when either tile in a pair is low-detail.
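A sketch of the Pearson signal for a horizontal pair, using Rec. 601 luma on the two touching columns. Returning 0 for a constant edge is an assumption. So is the down-weighting rule (halve the weight if either tile is low-detail), and `pearson_weight` is a hypothetical helper:

```python
import numpy as np


def luminance(rgb: np.ndarray) -> np.ndarray:
    """Rec. 601 luma for an (..., 3) RGB array."""
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def pearson_edge_score(left_tile: np.ndarray, right_tile: np.ndarray) -> float:
    """Pearson r between the left tile's right column and the right tile's left column."""
    a = luminance(left_tile[:, -1])
    b = luminance(right_tile[:, 0])
    if a.std() == 0 or b.std() == 0:
        return 0.0  # r is undefined for a constant edge; treat as no evidence
    return float(np.corrcoef(a, b)[0, 1])


def pearson_weight(base_weight: float, left_low_detail: bool, right_low_detail: bool,
                   damping: float = 0.5) -> float:
    """Hypothetical down-weighting rule: shrink the weight if either tile is low-detail."""
    return base_weight * damping if (left_low_detail or right_low_detail) else base_weight
```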

EdgeMatch (neural)

EdgeMatch is a self-trained CNN, trained with an InfoNCE contrastive objective, that maps a tile and a direction to a 256-dimensional embedding. Tiles that are likely neighbors end up with similar embeddings. The model was trained on real RDP session data with hard-negative mining.
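For readers unfamiliar with InfoNCE, here is a generic numpy version of the batch loss. It treats row i of `positives` as the true neighbor of anchor i and every other row in the batch as a negative. Hard-negative mining would add extra, deliberately confusable negatives on top of these. The 0.07 temperature is a common default, not a value taken from EdgeMatch. This is training-side code: what runs in the page is the exported ONNX model.

```python
import numpy as np


def info_nce_loss(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.07) -> float:
    """Batch InfoNCE: row i of `positives` is the true neighbor of anchor i,
    and every other row in the batch serves as a negative.

    Both inputs are (B, D) and assumed L2-normalized, so dot products are cosines.
    """
    logits = anchors @ positives.T / temperature            # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)     # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))              # cross-entropy on the diagonal
```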

Architecture:

The input is a 64x64 RGB tile plus a direction (left/right/top/bottom). Before the CNN sees the tile, it is rotated or flipped so that the edge of interest is always on the right. That way the network only needs to learn right-edge matching, and the direction is handled separately.
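One way to implement this canonicalization with numpy is shown below. The exact rotations and flips EdgeMatch uses are not documented here, so this mapping is an assumption. It is chosen so that, for both horizontal and vertical pairs, the two touching edges land in the rightmost column in the same pixel order:

- the anchor's right edge and the candidate's left edge, read top to bottom
- the anchor's top edge and the candidate's bottom edge, read left to right

```python
import numpy as np


def canonicalize(tile: np.ndarray, direction: str) -> np.ndarray:
    """Re-orient an (H, W, C) tile so the edge facing `direction` is the rightmost column."""
    if direction == "right":
        return tile
    if direction == "left":
        return tile[:, ::-1]                  # mirror horizontally
    if direction == "top":
        return tile.swapaxes(0, 1)[:, ::-1]   # rotate 90 degrees clockwise
    if direction == "bottom":
        return tile.swapaxes(0, 1)            # transpose: bottom row -> right column
    raise ValueError(f"unknown direction: {direction!r}")
```

For "bottom" the sketch uses a transpose rather than a pure rotation. A rotation would put the bottom row into the right column in reversed order, so it would no longer line up pixel for pixel with the "top" side of the tile below.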

The backbone is four ResNet stages with GroupNorm. Channel progression is 3 -> 32 -> 64 -> 128 -> 256. After the backbone, a learned per-direction FiLM modulation scales the features differently for each direction.
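FiLM (feature-wise linear modulation) applies a per-channel scale and, optionally, a shift. Below is a minimal sketch of per-direction FiLM on a (C, H, W) feature map. Random or constant tables stand in for the learned parameters. Whether EdgeMatch also learns a shift term is an assumption; setting `beta` to zero gives the scale-only variant.

```python
import numpy as np

DIRECTIONS = ("left", "right", "top", "bottom")


def film(features: np.ndarray, direction: str, gamma: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Per-direction FiLM on a (C, H, W) feature map.

    gamma, beta: (4, C) tables with one row per direction (learned in a real model).
    Channel c is scaled by gamma[d, c] and shifted by beta[d, c].
    """
    d = DIRECTIONS.index(direction)
    return features * gamma[d][:, None, None] + beta[d][:, None, None]
```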

There's also a separate small conv net that processes only the right-edge strip of pixels (16px wide by default). This gives the model a dedicated pathway that focuses on the actual tile boundary.

A learned 32-dim embedding for each direction preserves context lost during canonicalization.

Everything gets concatenated (backbone features + edge features + side embedding) and fed through two hidden layers to produce a 256-dim vector, L2 normalized.
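The fusion head can be sketched with random weights to show the shapes. The following come from the description above:

- the 256-dim backbone vector
- the 32-dim side embedding
- two hidden layers
- the 256-dim L2-normalized output

The following are assumptions:

- the backbone output is globally pooled to a vector
- `EDGE_DIM` and `HIDDEN`
- ReLU activations
- no bias terms

```python
import numpy as np

SIDES = ("left", "right", "top", "bottom")
BACKBONE_DIM, SIDE_DIM, OUT_DIM = 256, 32, 256   # sizes from the description
EDGE_DIM, HIDDEN = 64, 512                        # assumed sizes

rng = np.random.default_rng(0)
side_table = rng.normal(size=(len(SIDES), SIDE_DIM))  # stands in for the learned side embedding
W1 = rng.normal(scale=0.05, size=(BACKBONE_DIM + EDGE_DIM + SIDE_DIM, HIDDEN))
W2 = rng.normal(scale=0.05, size=(HIDDEN, HIDDEN))
W3 = rng.normal(scale=0.05, size=(HIDDEN, OUT_DIM))


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def embed(backbone_feats: np.ndarray, edge_feats: np.ndarray, side: str) -> np.ndarray:
    """Concatenate the three inputs, apply two hidden layers, project, then L2-normalize."""
    z = np.concatenate([backbone_feats, edge_feats, side_table[SIDES.index(side)]])
    h = relu(z @ W1)   # hidden layer 1
    h = relu(h @ W2)   # hidden layer 2
    out = h @ W3       # projection to the 256-dim embedding
    return out / max(float(np.linalg.norm(out)), 1e-12)
```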

Usage: given an anchor tile and a direction, compute the anchor's embedding for that direction, compute each candidate's embedding for the opposite direction, and compare them by cosine similarity. Higher similarity means the candidate is more likely to be the actual neighbor.
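The ranking step can be sketched as follows. The embeddings are plain arrays here; producing them from the ONNX model is out of scope. `rank_candidates` and `OPPOSITE` are hypothetical names:

```python
import numpy as np

# A candidate to the anchor's right is embedded for its own "left" side, and so on.
OPPOSITE = {"left": "right", "right": "left", "top": "bottom", "bottom": "top"}


def rank_candidates(anchor_emb: np.ndarray, candidate_embs: dict) -> list:
    """anchor_emb: (D,) embedding of the anchor for direction d.
    candidate_embs: tile_id -> (D,) embedding computed for OPPOSITE[d].
    Returns (tile_id, cosine) pairs, most likely neighbor first."""
    a = anchor_emb / np.linalg.norm(anchor_emb)
    scores = {tid: float(a @ (e / np.linalg.norm(e))) for tid, e in candidate_embs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

If the embeddings are already L2-normalized, as EdgeMatch's are described to be, the two normalizations are redundant and cosine similarity reduces to a dot product.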

EdgeMatch handles things that simple pixel comparison doesn't: photos, anti-aliased UI, text-heavy tiles, noisy regions, gradients.

Auto-Stitch

Grows reconstructions conservatively:

Start from user-placed "seed" tiles if any exist
Score all candidates for the frontier (empty cells adjacent to placed tiles)
Place the strongest match first (above threshold)
Recalculate (because new tiles can create multi-neighbor opportunities)
Repeat until no frontier match qualifies
Then start a new "island" by placing the unused tile with the highest "detail" score in an empty cell
Continue until no good matches remain or island limit is reached

The island limit controls how many new islands auto-stitch can start. User-placed seeds don't count against it. For best results, place a few tiles manually before running auto-stitch.
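The loop above can be sketched as a greedy search over a sparse grid. `score(tile, cell, grid)` is assumed to return the pooled combined score for placing `tile` at `cell` given its placed neighbors. `detail(tile)` is the detail score. `next_island_origin` is a hypothetical placement rule for new islands.

```python
from typing import Callable, Dict, List, Set, Tuple

Cell = Tuple[int, int]
Grid = Dict[Cell, str]


def neighbors(cell: Cell) -> List[Cell]:
    r, c = cell
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def frontier(grid: Grid) -> Set[Cell]:
    """Empty cells adjacent to at least one placed tile."""
    return {n for cell in grid for n in neighbors(cell) if n not in grid}


def next_island_origin(grid: Grid) -> Cell:
    """Hypothetical: open each new island two empty columns to the right of everything placed."""
    if not grid:
        return (0, 0)
    return (0, max(c for _, c in grid) + 3)


def auto_stitch(grid: Grid, unused: Set[str],
                score: Callable[[str, Cell, Grid], float],
                detail: Callable[[str], float],
                threshold: float, island_limit: int) -> Grid:
    """Greedy, conservative growth. `grid` holds user-placed seeds (may be empty)."""
    islands_started = 0
    while True:
        # Place the single best frontier match, then rescore, until nothing clears the threshold.
        while True:
            best = None
            for cell in sorted(frontier(grid)):
                for tile in sorted(unused):
                    s = score(tile, cell, grid)
                    if s >= threshold and (best is None or s > best[0]):
                        best = (s, tile, cell)
            if best is None:
                break
            _, tile, cell = best
            grid[cell] = tile
            unused.discard(tile)
        # Seeds don't count against the island limit; only islands opened here do.
        if not unused or islands_started >= island_limit:
            return grid
        seed = max(sorted(unused), key=detail)
        grid[next_island_origin(grid)] = seed
        unused.discard(seed)
        islands_started += 1
```

Every placement rescans every frontier cell against every unused tile. That is simple, and it picks up new multi-neighbor cells automatically. A real implementation would likely cache scores and invalidate only the cells around the newly placed tile.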

Low-information tile filtering

Detects blank, solid, or near-empty tiles. Solid-color backgrounds and empty cmd (console) tiles carry no information and don't help establish adjacency. The detail score is computed from luminance variance, gradient energy, Sobel edge density, and entropy. Tiles below the threshold are penalized, which lowers their neighbor scores.
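Below is a sketch of the four detail measures named above, computed on Rec. 601 luma. The following are assumptions, not RDPuzzle's values:

- the thresholds
- the Sobel magnitude cutoff of 100
- the rule that a tile counts as low-information only when all four measures are low
- the 0.3 penalty

```python
import numpy as np


def luma(tile: np.ndarray) -> np.ndarray:
    t = tile.astype(np.float64)
    return 0.299 * t[..., 0] + 0.587 * t[..., 1] + 0.114 * t[..., 2]


def sobel_magnitude(y: np.ndarray) -> np.ndarray:
    """3x3 Sobel gradient magnitude over the interior pixels of a 2-D array."""
    gx = (y[:-2, 2:] + 2 * y[1:-1, 2:] + y[2:, 2:]) - (y[:-2, :-2] + 2 * y[1:-1, :-2] + y[2:, :-2])
    gy = (y[2:, :-2] + 2 * y[2:, 1:-1] + y[2:, 2:]) - (y[:-2, :-2] + 2 * y[:-2, 1:-1] + y[:-2, 2:])
    return np.hypot(gx, gy)


def detail_features(tile: np.ndarray, edge_thresh: float = 100.0) -> dict:
    y = luma(tile)
    d_rows, d_cols = np.gradient(y)                 # finite differences along each axis
    hist, _ = np.histogram(y, bins=256, range=(0.0, 256.0))
    p = hist[hist > 0] / hist.sum()
    return {
        "variance": float(y.var()),
        "gradient_energy": float(np.mean(d_rows ** 2 + d_cols ** 2)),
        "edge_density": float(np.mean(sobel_magnitude(y) > edge_thresh)),
        "entropy": float(-(p * np.log2(p)).sum()),   # Shannon entropy of the luma histogram, in bits
    }


def low_info_multiplier(tile: np.ndarray, min_variance: float = 25.0,
                        min_gradient_energy: float = 1.0, min_edge_density: float = 0.01,
                        min_entropy: float = 1.0, penalty: float = 0.3) -> float:
    """Hypothetical rule: penalize only if every detail measure falls below its threshold."""
    f = detail_features(tile)
    low = (f["variance"] < min_variance and f["gradient_energy"] < min_gradient_energy
           and f["edge_density"] < min_edge_density and f["entropy"] < min_entropy)
    return penalty if low else 1.0
```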

Cache order

RDP cache order is not a reliable timeline: the cache stores reusable fragments for performance, not a chronological record. File order may offer hints, but visual matching is the primary signal.

Saving and loading

Sessions save tile metadata, grid placement, tabs, settings, undo history, reconstruction state and embeddings.

OCR

Uses OCR to identify text-heavy tiles, which helps with browser fragments, terminal windows, document text, and dialogs. OCR is an assistive signal only. Two OCR engines are available: Tesseract running on 4 threads for performance, or PaddleOCR running on WebGPU.

Running

Open RDPuzzle.html in a browser. Chrome, Edge, or another Chromium-based browser is recommended (ONNX inference, canvas, drag and drop, OCR workers).

Privacy

Runs locally. Parsing, scoring, stitching, export all client-side. Some OCR/model assets may load from external sources depending on configuration.

Limitations

Cache fragments may be incomplete
Fragments aren't chronological
Repeated UI patterns may cause false matches
Flat backgrounds are hard to stitch, though the low-information filter should penalize them based on how little "data" they hold
Neural similarity is a ranking signal, not proof
Auto-stitch still needs manual correction

License

RDPuzzle is available free of charge for personal, educational, academic, and non-commercial research use.

Commercial use, including use by companies, consultancies, MDR/IR providers, internal corporate security teams, or commercial forensic services, requires a separate commercial license.

For commercial licensing, contact: mrdanielbenzano@gmail.com

Author

Daniel Ben Zano

Special thanks to Tal Gaffen

