kcgen edited this page May 29, 2023 · 3 revisions

DOS ROM format

Requirements

Completion level: 75%. Please review and refine the requirements so they can be finalized.

Preamble

The requirements describe what's needed or wanted, as opposed to how they will be provided.

We use Shall to indicate make-or-break requirements. These must be met and their implementations verified.

Shall is an awkward word; however, the goal of this document is to ensure everyone is working from the same understanding, so it's important that we use unambiguous wording.

If you see a shall requirement that would not deter you from using the DOS ROM Format, then please speak up so it can be considered as an optional requirement.

We use Should for requirements that aren't make-or-break. These may be implemented after the hard requirements or, in the worst case, may not be implemented at all. If you see a should requirement that, if not implemented, will prevent you from using the DOS ROM format, then please speak up so it can be moved to a shall.

The ROM Shall be Self-Contained

Zero or more files and/or subdirectories may be packaged into the ROM and the result will be a single file, and only a single file, capable of meeting the requirements listed here.

The Content Shall be Immutable

No content in the ROM will be modifiable when mounted.

Although the scope of this ROM format is limited to the binary file and its format, DOS programs will still need the ability to update existing files as well as add or delete files from the archive.

In these cases, changes will not touch the ROM itself. Changes will need to be managed externally in a separate writable area. DOSBox already includes this using an 'overlay' mount-point, however, separate requirements should be written detailing that behavior.
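
The split between the immutable ROM and an external writable area can be pictured as a two-layer lookup. The sketch below is only an illustration: the class name `OverlayView` and the use of plain dictionaries to stand in for the ROM and the writable area are assumptions, and it does not describe how DOSBox's overlay mount is actually implemented.

```python
class OverlayView:
    """Read-only base layer plus a writable overlay (illustrative only)."""

    _DELETED = object()  # tombstone marking a file the DOS program removed

    def __init__(self, rom_files):
        self._rom = dict(rom_files)  # never modified after construction
        self._overlay = {}           # every write and delete lands here

    def read(self, path):
        # The overlay wins over the ROM, so updated files shadow the originals.
        if path in self._overlay:
            data = self._overlay[path]
            if data is self._DELETED:
                raise FileNotFoundError(path)
            return data
        if path in self._rom:
            return self._rom[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        self._overlay[path] = data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the file is absent
        self._overlay[path] = self._DELETED
```

With this arrangement a saved game written by the DOS program shadows the copy in the ROM, while the ROM's own bytes stay untouched.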

Packaging a ROM Shall be Easy

Adoption of the ROM format will hinge on minimizing the extra effort it takes to create a ROM and iteratively work on a game packaged in a ROM. Therefore, it needs to be as simple as possible to both pack and unpack a ROM.

Emphasis (and effort) should go toward packing as opposed to inline-updates or unpacking, because integrators are likely to maintain a master tree from which new ROMs are periodically generated. Prior ROMs will simply be deprecated or discarded.

Content Shall be Compressed

If deemed compressible, ROM content will be compressed using best-in-class, widely used, and well supported compression codec(s).

The resulting ROM size needs to be competitive enough with current compression formats (circa 2020 examples include ZIP, RAR, 7zip, and tar.xz/gz) that users deem the ROM format an acceptable choice for storing DOS games.

Context: compression codecs (deflate, gzip, LZMA, ZStandard, ..) are mathematical algorithms that compress some input data into output data. Container formats, which aren't discussed in this requirement, describe how the output data is stored and organized in a file (ZIP, RAR, tar, ...).

DOS Content Shall be Organized into Drive Mount Directories

The ROM format will provide a layout to associate content with drives and drive-types. Images for floppies, CDROMs, and HDDs will follow a naming pattern.

When the ROM is opened, the content and/or images will be mounted accordingly.

For example, some games want floppies mounted in A: and B:. Some want a basic C: and their game directory under that, while some early HDD games used a floppy for save-games. Late-stage DOS games typically installed to C: and used one or more CDROM images rotated through D: or even allowed multiple CDROM drives to service the game's needs.

The design and implementation will specify the actual layout and image naming; the following is provided merely as an example:

A 2-disk game that uses A: and B: floppy drives:

ROM/
 |- a_floppy/image.img
 `- b_floppy/image.img

A game that uses three floppies, all rotated through A:

ROM/
 `- a_floppy/image-1.img, image-2.img, image-3.img

A HDD-installed game that uses one multimedia CDROM in D:

ROM/
 |- c_hdd/game/...
 `- d_cdrom/image.iso

A HDD-installed game that uses one mixed-mode CD-DA CDROM in D:

ROM/
 |- c_hdd/jones/<files> ...
 `- d_cdrom/image.cue, image.bin

A HDD-installed game that uses 4 CDROMs rotated through D:

ROM/
 |- c_hdd/wc4/<files> ...
 `- d_cdrom/image-1.cue, image-1.bin, image-2.cue, image-2.bin, ...

A HDD-installed game that can read all its CDROMs from multiple drives, shown with various CDROM-content options:

ROM/
 |- c_hdd/muppets/<files> ...
 |- d_cdrom/image.cue, image.bin
 |- e_cdrom/image.iso
 `- f_cdrom/<files>
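
To show how a directory naming pattern like the one in these examples could drive mounting, here is a minimal sketch. It assumes the `<drive letter>_<drive type>` names used above, which the text offers only as an example; the function names `parse_mount_dir` and `plan_mounts` are hypothetical and the real design may choose a different pattern.

```python
import re

# Pattern taken from the example layouts: "<drive letter>_<drive type>".
_MOUNT_DIR = re.compile(r"^([a-z])_(floppy|hdd|cdrom)$")


def parse_mount_dir(name):
    """Return (drive, drive_type) for a mount directory name, else None."""
    match = _MOUNT_DIR.match(name)
    if match is None:
        return None
    letter, drive_type = match.groups()
    return letter.upper() + ":", drive_type


def plan_mounts(top_level_dirs):
    """Map each recognised top-level directory to the drive it describes."""
    plan = {}
    for name in sorted(top_level_dirs):
        parsed = parse_mount_dir(name)
        if parsed is not None:  # unrecognised names are simply ignored
            drive, drive_type = parsed
            plan[drive] = drive_type
    return plan
```

For the single-CDROM example above, `plan_mounts(["c_hdd", "d_cdrom"])` maps `C:` to `hdd` and `D:` to `cdrom`.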

Performance Shall be No Slower Than Baseline

When a ROM is in use, reading content from it will be at least as fast as reading the same content without the ROM format.

Note that if performance tests are conducted, fair cache states must be ensured. That is, both the ROM and external directory should be in cold-cache prior to testing.

Content Shall be Recoverable

The files, directory structure, and their date/time stamps placed into the ROM format will be restorable exactly as such, in bit-identical form.

Corruption Shall be Detectable

Data checksum record(s) will be generated at the time of ROM creation, and will allow the integrity of the content to be verified.

Integrity records should (typically) make up no more than 1% of the total ROM size.

Well maintained, open-standard, and 3rd party tested technology should be used as the basis of the integrity records. In addition, the solution should provide best-in-class false positive rates (to minimize collision-odds) and throughput (to maximize performance).
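
As a rough illustration of how checksum records can detect corruption while staying well under the 1% size budget, the sketch below stores one digest per fixed-size block. The 64 KiB block size and the choice of SHA-256 are assumptions made for the example; the requirements do not name an algorithm or block size.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # assumed block size, not specified by the requirements


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def find_corrupt_blocks(data, digests, block_size=BLOCK_SIZE):
    """Return the indexes of blocks whose digest no longer matches."""
    current = block_digests(data, block_size)
    if len(current) != len(digests):
        raise ValueError("data length changed since the digests were recorded")
    return [i for i, (now, then) in enumerate(zip(current, digests))
            if now != then]


def overhead_ratio(data_len, block_size=BLOCK_SIZE, digest_len=32):
    """Fraction of extra space the digests add relative to the data."""
    blocks = -(-data_len // block_size)  # ceiling division
    return blocks * digest_len / data_len
```

With these assumed parameters each 65,536-byte block carries a 32-byte record, which is about 0.05% overhead.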

Extraction Shall be Possible Using Multiple Tools (Future-Proof)

The format will be extractable by more than one open-source, separately maintained, off-the-shelf tool.

This will permit archival restoration and avoid the ROMs being "locked up" due to employing a one-off format in the event the main project becomes defunct or unmaintained.

This doesn't limit the format to the most ubiquitous formats, such as 1970s TAR or Phil Katz's 1989 ZIP format. Current, recent, and niche formats are all on the table provided they meet this requirement.

It Shall Be Packed in a Consistent Order

The packing process will follow a consistent order such that unchanged sequences of files will result in the same sequence of packed blocks but possibly at a different starting position within the updated package.

This property of having sequences of unchanged blocks will make the package suitable for block-level delta transfers, such as using the zsync protocol.

Using zsync as an example, one or more servers can host the most up-to-date ROM packages along with a corresponding .zsync file per package.

Users will be able to perform a delta-update using zsync regardless of whether their ROM package is one or several revisions out-of-date. In all cases, only the minimum number of compressed blocks will be transferred to get the user's package in-sync with the hosted revision.
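
One way to obtain a consistent packing order is to walk the master tree in sorted order, so the result does not depend on the order in which the host filesystem happens to list entries. The sketch below assumes a plain string sort of names; the function name `packing_order` is hypothetical and the design may specify a different ordering rule.

```python
import os


def packing_order(root):
    """Return file paths under root, relative to root, in a stable order."""
    ordered = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place fixes the order os.walk descends
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            relative = os.path.relpath(full, root)
            ordered.append(relative.replace(os.sep, "/"))
    return ordered
```

Because the order depends only on the names, adding or changing one file leaves the relative order of all other files unchanged.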

The Format Shall be Extensible and Versioned

Over time, new or unforeseen needs might arise that necessitate changing or adding to the ROM format itself. The design shall allow this without damaging backward compatibility; i.e., be extensible.

The ROM format itself will be versioned, and that version shall exist inside the generated ROMs.

The ROM will provide a way to be queried for its version.

All future ROM formats will provide the version content in the same place and form, such that older software implementations will be able to query newer versions of the ROM format (and not read garbage or an absent file).

The format of the version must be sortable as strings without requiring numeric inference. For example:

  • 2020.01.01
  • 2020.07.20
  • 2020.11.02

This is merely an illustrative example; the design will specify the actual format.
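
The short check below shows why zero-padded, most-significant-first version strings sort correctly as plain strings, and what goes wrong without the padding. It uses the illustrative versions above; the helper `as_date` is an assumption made for the comparison.

```python
from datetime import date

versions = ["2020.11.02", "2020.01.01", "2020.07.20"]


def as_date(version):
    """Interpret a "YYYY.MM.DD" style version numerically."""
    year, month, day = (int(part) for part in version.split("."))
    return date(year, month, day)


# Zero-padded fields: plain string order equals chronological order.
by_string = sorted(versions)
by_date = sorted(versions, key=as_date)
strings_sort_correctly = by_string == by_date

# Without zero padding the two orders diverge, because "1" < "2" as text.
unpadded = ["2020.2.1", "2020.11.2"]
unpadded_sorts_correctly = sorted(unpadded) == sorted(unpadded, key=as_date)
```

Here `strings_sort_correctly` is true and `unpadded_sorts_correctly` is false, which is the property the requirement is asking for.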

An Optional Primary Executable Entry Point Shall be Definable

The format shall define a default executable filename (likely a .bat file) which, if it exists, will be launched automatically on startup (see requirement below).

The ROM shall provide a way to be queried for whether a default entry point exists.

An Optional Alternate Executable Entry Point Shall be Definable

The format shall define an alternate executable filename (likely an installation or configuration .bat file) which, if it exists, will be launched instead of the primary entry point when requested.

The ROM should provide a way to be queried for whether an alternate entry point exists.

It Should Accept an Integration Version and Date

The integrator should be able to apply a version (in the format described above) and a current or specified date stamp.

The purpose of this is to version the integration task, to allow the same content to undergo additions and fixes over time, and for those ROMs to be easily differentiated and compared (and deprecated versions easily pruned).

It Should Define an Extensible Metadata File and Format

It should be possible and easy to query/see, update, and add or remove metadata describing the content.

An extensible best-in-class human-centered format should be used. Metadata fields covering attributes of the majority of commercial and home-brew titles should be defined.

Examples of fields might include: name, alternate_name(s), publisher, developer, release_date, game_version, patched, copy_protection_type, crack_type, cracked_by, description, genre, perspective, gameplay, setting, credit(s).

This might also include system requirements such as: machine-type, cpu speed, ram, cdrom, floppy, and hdd space; and supported hardware such as video modes, sound cards for effects, sound cards for music, general midi, cdrom-audio, joystick, and mouse control.

The metadata format itself should be versioned and the specific version included when writing metadata. This is to permit changes over time, on an as-needed basis.

This metadata file should use a standardized name and path. Once defined, it should remain as-is for all future revisions of the ROM format (to ensure backward compatibility).
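
As a sketch of what versioned metadata might look like, the example below round-trips a small record through JSON. JSON is used only as a stand-in because it ships with Python; the requirements do not pick a format, the key `metadata_version` is a hypothetical name, and the field values are placeholders.

```python
import json

example_metadata = {
    "metadata_version": "2020.07.20",  # hypothetical key for the format version
    "name": "Example Game",            # placeholder values throughout
    "publisher": "Example Publisher",
    "release_date": "1993",
    "genre": "adventure",
}


def dump_metadata(metadata):
    """Serialise metadata with stable key order so diffs stay readable."""
    return json.dumps(metadata, indent=2, sort_keys=True)


def load_metadata(text):
    """Parse metadata and insist that it declares its format version."""
    metadata = json.loads(text)
    if "metadata_version" not in metadata:
        raise ValueError("metadata does not declare a metadata_version")
    return metadata
```

Requiring the version field on load is what lets a reader decide how to interpret fields that change meaning in later revisions.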

It Should Define a Layout for Metadata Files

It should be possible and easy to query/see, update, add, or remove file-based supporting metadata.

A directory layout and standardized filenames for metadata files should be defined. These standardized names should include the ability to include more than one or sequential records of the same type.

Examples of file-based metadata:

  • box cover
  • box side
  • box back
  • floppy
  • floppy sleeve front
  • floppy sleeve back
  • cdrom
  • cdrom booklet
  • cdrom case front
  • cdrom case back
  • manual
  • guide book
  • strategy book
  • hint book
  • map
  • quick ref
  • copy protection ref
  • code wheel
  • warranty
  • registration
  • notice
  • promo

It Should Allow the Addition of a Digital Signature during Creation

Background

The purpose of the signature is to provide a cryptographically secure way to positively confirm a ROM is binary-identical to the signer's, at the time of generation.

The key here is "positively confirm". It's entirely possible for someone to modify the ROM and attach their own signature; however, they will never be able to impersonate the original signer because they don't have the original signer's private key. They will simply be able to put their own (different) signature on the ROM, which users can detect and discard as not coming from the desired release team.

Description

The creator may digitally sign their ROM using an RFC 4880 compliant (public-private key pair) detached signature during the ROM generation process.

The resulting detached signature shall be made part of the ROM in a way that can be cleanly separated from the ROM container format during the verification process. One example of how to do this is to simply append the signature to the end of the container's format.

Existing 3rd party best-in-class tools should be used.

Effort should be made to streamline and simplify this for the user. They shouldn't have to become public-key infrastructure experts or wizards with GPG's command-line tool. Enabling signed ROM creation should be as easy as it was to enable ARJ's "envelope" feature (i.e., a simple argument or setting).

To allow verification of the signature, the signer's public key should be provided inside the ROM format.

It Should Define the Name and Location for the Signer's Public Key

If the ROM creator included their signature, then their public key shall be copied into a fixed metadata location with a standardized name such that it can be used during the signature authentication process.

It Should Allow the Authentication of a Digital Signature

If a ROM contains a digital signature and public key in the expected locations, the format should allow the verification of the two records.

Existing 3rd party best-in-class tools shall be leveraged, and effort should be made to streamline and simplify this.

Design

Completion level: 10%. This is a first cut meant to get ideas and discussion going.

Besides meeting the requirements, the design needs to take into consideration the pros and cons of the project's ability to realistically achieve the desired design elements, the technical lay of the land (existing libraries), and any other real-world constraints or issues.

Container Format

Leverage SquashFS, its portable library and user-space tools to meet our needs (See compression).

This standardization will transform a general .sqfs ROM into a DOS ROM Pack. It will be extractable by current generation SquashFS tools; however, its content will nonetheless be tailored specifically to the needs described here.

Behaviour

In order for the format to hold records suitable for configuring DOSBox on startup (similar to an internal conf file), the ROM will need to be opened and "mounted" in a host-like manner with the content available very early in DOSBox's startup sequence, similar to when the existing conf files are loaded.

Compression

Zstandard is the ideal compression codec for meeting the requirements because:

  • It's in wide-spread use (RPM, Deb, Linux kernel, TAR/zstd, and even ZIP)
  • It's dual-licensed BSD and GPL-2
  • Its maintainers are funded by large corporations (Facebook, Google, etc)
  • Performance-wise, it reaches the current Pareto frontier with decompression rates faster than any other currently-available algorithm while providing similar or better compression ratios (often approaching LZMA and XZ).

The MAME project wants ZStandard support for CHD to be contributed upstream, and libchdr, the library used to read CHD files, is also not opposed to adding ZStandard (https://github.com/rtissera/libchdr/issues/35).

Performance

On a 4 GHz Skylake-based processor, a single ZStandard thread is capable of decompressing at roughly 2.7 GiB/s. Assuming an SSD read-rate of 600 MiB/s and HDD read-rate of 120 MiB/s, we can expect the following performance:

  • Data that's 20% compressible will be read and decompressed in roughly the same time it takes to read the data if it were left uncompressed on the SSD and 20% faster than reading it from HDD.
  • Data that's 40% compressible will be read and decompressed roughly 37% faster than reading it uncompressed from the SSD and 61% faster than reading it uncompressed from the HDD.
  • Data that's 65% compressible will be read and decompressed in less than half the time it takes to read it uncompressed from the SSD and in one third of the time it takes to read it uncompressed from the HDD.

If we scale these numbers down to a Raspberry Pi 3, assuming a decompression rate of 750 MiB/s and SDCard read speed of 20 MiB/s:

  • 20% compression reduces read duration by 18%.
  • 40% compression reduces read duration by 39%.
  • 65% compression reduces read duration by 64%.
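
The estimates above can be approximated with a simple model. The sketch below assumes the read and the decompression happen one after the other and that the decompression rate applies to the compressed bytes; both are assumptions about how the figures were derived, and under them the results land within roughly a percentage point of the numbers quoted.

```python
MIB_PER_GIB = 1024


def read_time_ratio(compressible, read_mib_s, decompress_mib_s):
    """Time to read and decompress, relative to reading the data uncompressed.

    compressible is the fraction of size removed by compression (0.4 = 40%).
    A result below 1.0 means the compressed path is faster.
    """
    compressed = 1.0 - compressible
    with_rom = compressed / read_mib_s + compressed / decompress_mib_s
    without_rom = 1.0 / read_mib_s
    return with_rom / without_rom


desktop_decompress = 2.7 * MIB_PER_GIB  # 4 GHz Skylake figure from the text
scenarios = {
    "SSD": (600, desktop_decompress),
    "HDD": (120, desktop_decompress),
    "Raspberry Pi 3 SDCard": (20, 750),
}

for label, (read_rate, decompress_rate) in scenarios.items():
    for compressible in (0.20, 0.40, 0.65):
        ratio = read_time_ratio(compressible, read_rate, decompress_rate)
        print(f"{label}: {compressible:.0%} compressible, "
              f"{(1 - ratio):.1%} less time, {(1 / ratio - 1):.1%} faster")
```

The two printed percentages differ because "less time" compares durations while "faster" compares throughput; the SSD and HDD bullets use the latter and the Raspberry Pi bullets the former.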

To further accelerate performance, it's suggested that threaded read-ahead and a least-recently-used (LRU) cache be added to libchdr. The author of libchdr is amenable to both.

More Design elements coming ...

Implementation

The https://github.com/Exzap/ZArchive project looks like a perfect implementation. Thanks to @granminigun for discovering it!
