MediaConch Matroska Survey

Introduction

This repository contains a research corpus used in the development of MediaConch. The collections consist of MediaArea XML documents, each of which contains both a MediaInfo report and a MediaTrace report. Together, these reports document most of the structure of each Matroska file along with a list of its significant characteristics.

Whenever an XML file exceeds 2 megabytes, it is compressed with gzip before being added to the repository, which substantially reduces its size. To read a gzip-compressed file, gzcat is recommended, for example:

gzcat MatroskaFile.mkv_maxml.xml.gz | grep "<CompleteName>"
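The compression rule and the reading step above can be sketched as two small shell functions. This is a minimal sketch, not the repository's own tooling: the function names compress_if_large and read_report are hypothetical, and "2 megabytes" is assumed to mean 2 MiB (2,097,152 bytes). It uses gzip -dc, which behaves like gzcat and works wherever gzip is installed; on most Linux systems the gzcat equivalent is named zcat.

```shell
# Hypothetical helpers; not part of the MediaConch tooling.

# Assumed threshold: "2 megabytes" interpreted as 2 MiB.
MAX_XML_BYTES=2097152

# Compress an XML report in place (FILE becomes FILE.gz) if it exceeds the threshold.
# The parenthesized body runs in a subshell, so "size" does not leak out.
compress_if_large() (
    size=$(wc -c < "$1" | tr -d ' ')
    if [ "$size" -gt "$MAX_XML_BYTES" ]; then
        gzip -- "$1"    # gzip replaces FILE with FILE.gz
    fi
)

# Print a report to stdout, decompressing it first if its name ends in .gz.
read_report() {
    case "$1" in
        *.gz) gzip -dc -- "$1" ;;
        *)    cat -- "$1" ;;
    esac
}
```

With these defined, the example above becomes read_report MatroskaFile.mkv_maxml.xml.gz | grep "<CompleteName>", and the same command also works on reports that were small enough to stay uncompressed.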

Because most of the structure of a Matroska file lies within its Cluster elements (which hold the media data blocks), a second report with a "_nocluster" suffix may also be present. This XML report is identical to its neighboring report, except that all MediaTrace elements documenting Cluster elements have been removed. This allows nearly every sample to be documented by an XML file under 2 MB in size.
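The idea behind the "_nocluster" reports, dropping every MediaTrace element that documents a Cluster together with everything nested inside it, can be sketched in awk. This is an illustration under stated assumptions, not how the repository's reports were actually produced. It assumes that MediaTrace records each Matroska element as a <block> element whose name attribute holds the element name (e.g. name="Cluster") and that each tag sits on its own line, as in pretty-printed output; a robust tool would use a real XML parser. The function name strip_clusters is hypothetical.

```shell
# Hypothetical helper: copy a MediaTrace XML report to stdout, omitting every
# <block ... name="Cluster"> element and all of its descendants.
# Assumes one tag per line; reads the named files, or stdin if none are given.
strip_clusters() {
    awk '
        # Entering a Cluster block: start skipping.
        !skip && /<block[^>]* name="Cluster"/ { skip = 1; depth = 0 }

        skip {
            # Track nesting depth: opening tags minus self-closing and closing tags.
            # gsub with "&" leaves the line unchanged and returns the match count.
            opened = gsub(/<block[ \t\/>]/, "&")
            selfclosed = gsub(/<block[^>]*\/>/, "&")
            closed = gsub(/<\/block>/, "&")
            depth += opened - selfclosed - closed
            if (depth <= 0) skip = 0   # matching </block> reached
            next                       # never print lines inside a Cluster
        }

        { print }
    ' "$@"
}
```

For a compressed report, the input can be piped in, e.g. gzip -dc report.mkv_maxml.xml.gz | strip_clusters > report_nocluster.xml (the filenames here are illustrative only).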

For instance, the file "(1919) Das Cabinet des Dr. Caligari.mkv_maxml.xml.gz" is a gzip-compressed MediaArea XML document containing both a MediaInfo and a MediaTrace report on a Matroska file of Das Cabinet des Dr. Caligari. The file "(1919) Das Cabinet des Dr. Caligari.mkv_maxml.xml_nocluster.xml" presents the same report, but without the MediaTrace reporting on the Matroska Cluster elements.

Collections

archive_org

The archive_org collection consists of reports on Matroska files found in the public collections of archive.org. Within this collection, each subdirectory is named after an Internet Archive item identifier, and each report within it is named after the source file of that item that it describes.
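Because the directory layout encodes the Internet Archive identifier, a report can be traced back to its source item. The sketch below assumes report paths of the form archive_org/<identifier>/<source file>_maxml.xml, optionally followed by .gz or another suffix, and uses the standard https://archive.org/details/<identifier> item page URL. The function name describe_report is hypothetical.

```shell
# Hypothetical helper: given a report path of the assumed form
#   archive_org/<identifier>/<source file>_maxml.xml[.gz]
# print the Internet Archive identifier, the source filename, and the item page URL.
# The parenthesized body runs in a subshell, so its variables stay local.
describe_report() (
    identifier=$(basename "$(dirname "$1")")
    src=$(basename "$1")
    # Drop "_maxml.xml" and anything that follows it (e.g. ".gz").
    src=${src%%_maxml.xml*}
    printf 'identifier: %s\n' "$identifier"
    printf 'source: %s\n' "$src"
    printf 'item: https://archive.org/details/%s\n' "$identifier"
)
```

For example, describe_report "archive_org/<identifier>/MatroskaFile.mkv_maxml.xml.gz" prints the identifier, the source name MatroskaFile.mkv, and the item page link. Internet Archive identifiers are URL-safe, so no percent-encoding is needed for the item URL; source filenames with spaces or parentheses would need encoding before being used in a download URL.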
