- Specification for Media Processing
This specification is intended to outline the media processing stages and their
outputs.
CGR: this document is mostly for my benefit so I know what I'm making.
-- Stages
The expectation is that each media processing stage will either import
external data or reprocess existing data, producing output in the form of
new data or annotations on existing data (which are used by later steps).
The processing stages are as follows:
prepare -> import -> index -> pick-and-mix -> encode -> preview -> thumbnail
With the exception of the prepare/import stage, all stages should be re-executable
and should only generate new output where a change has occurred in their input.
-- Directory Structure
Each item in the media catalogue is held in a separate subdirectory within
the catalogue directory. The subdirectory name identifies the specific
media item.
Within the directory holding a media item, the file and directory structure is as follows:
/description.json
A description of the media item used to inform the web frontend.
This file is generated and updated by the media processing stage.
/sources.json
Metadata on the source media files.
This file is generated by the index stage and used to inform the encode
stage. If this file is updated it should preserve data in the
user-editable fields (see specification).
/encoding.json
Contains parameters which inform the encode stage, such as subtitle
shifts and output compositions.
This file should be generated by the pick-and-mix stage if it
does not exist. If this file is updated it should preserve user
modifications.
/tags.txt
Holds (initial) tags on the media item, one per line.
This file is to be generated with an initial set of tags during
the import stage. Its contents are reprocessed into description.json.
This is not JSON so that it is easier for users to update.
/source/
Contains the video, audio, subtitles and images used to generate the media
item. Metadata on these files is held in sources.json.
/preview/
Contains preview videos generated by the preview stage.
The expectation is that many of these may exist to support different client devices.
Specifications of these are held in description.json.
This directory should be publicly accessible.
/thumbnail/
Contains thumbnails of the video content generated by the thumbnail stage.
A list and details of thumbnails is held in description.json.
This directory should be publicly accessible.
/video/
Contains the full scale video content generated by the encode stage.
/temp/
Contains temporary files used when encoding previews and videos.
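As a point of reference, the sketch below (Python, with illustrative names
that are not part of this specification) creates this layout for a new
media item:

import os

# Fixed set of per-item subdirectories described above.
ITEM_SUBDIRS = ('source', 'preview', 'thumbnail', 'video', 'temp')

def create_item_layout(catalogue_dir, item_name):
    # Create the media item directory and its subdirectories;
    # directories that already exist are left untouched.
    item_dir = os.path.join(catalogue_dir, item_name)
    for subdir in ITEM_SUBDIRS:
        os.makedirs(os.path.join(item_dir, subdir), exist_ok=True)
    return item_dir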
-- Specification: description.json
This file provides the description of a given media item to the frontend.
The following example is given to elucidate all fields.
{
  'name'   : 'test item',                -- identifier (as directory name)
  'source' : 'path/test item',           -- full path of import source
  'tags'   : [ 'j-pop', 'the pillows' ], -- tags from tags.txt
  'previews': [                          -- in video order
    {
      'target'   : 'iphone',               -- system type for playback
      'url'      : 'preview/0_iphone.mp4', -- url
      'name'     : '0_iphone.mp4',         -- name of preview
      'src'      : '0.mp4',                -- name of preview source
      'width'    : 480,                    -- in pixels
      'height'   : 320,                    -- in pixels
      'length'   : 30.0,                   -- in seconds
      'size'     : 105834,                 -- in bytes
      'language' : 'eng'                   -- of subtitles
    },
    {
      'target'   : 'android',
      'url'      : 'preview/0_android.mp4',
      'name'     : '0_android.mp4',
      'src'      : '0.mp4',
      'width'    : 320,
      'height'   : 240,
      'length'   : 30.0,
      'size'     : 187502,
      'language' : 'eng'
    }
  ],
  'thumbnails': [                        -- in presentation order
    {
      'url'       : 'thumbnail/0_0.jpg', -- url
      'name'      : '0_0.jpg',           -- name of thumbnail
      'src'       : '0.mp4',             -- name of thumbnail source
      'width'     : 120,                 -- in pixels
      'height'    : 80,                  -- in pixels
      'size'      : 8192,                -- in bytes
      'time-index': 3.7                  -- in seconds (offset in video)
    },
    {
      'url'       : 'thumbnail/0_1.jpg',
      'name'      : '0_1.jpg',
      'src'       : '0.mp4',
      'width'     : 120,
      'height'    : 80,
      'size'      : 8192,
      'time-index': 16.2
    }
  ],
  'subtitles': [
    {
      'url'      : 'source/subtitles.ssa',
      'name'     : 'subtitles.ssa',
      'language' : 'eng',
      'lines'    : [ ... ]
    }
  ],
  'videos': [
    {
      'url'        : 'video/0.mp4',  -- url
      'name'       : '0.mp4',        -- matches encoding name
      'width'      : 1024,           -- in pixels
      'height'     : 768,            -- in pixels
      'length'     : 190.2,          -- in seconds
      'size'       : 4292982,        -- in bytes
      'fps'        : 23.98,          -- frames per second
      'language'   : 'eng',          -- of subtitles
      'encode-hash': '03972619348'   -- internal to encode stage
    }
  ]
}
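Assuming the file on disk is written as strict JSON (the single quotes
above are for readability only), a minimal sketch of how a frontend might
consume it; the function names are illustrative:

import json
import os

def load_description(item_dir):
    # An absent description.json simply means the item has not been
    # processed yet.
    path = os.path.join(item_dir, 'description.json')
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

def previews_for_target(description, target):
    # Previews are stored in video order; preserve that order while
    # filtering on the client device type.
    return [p for p in description.get('previews', [])
            if p['target'] == target]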
-- Specification: sources.json
This file holds metadata extracted from files in the source directory.
Data is held as a dictionary where the key is the source file name.
The following example is given to document all possible fields. Only fields
marked as "user modifiable" should be updated by the user; all other fields
are generated and maintained by the processing stages.
{
  'video.avi': {
    'width'   : 1024,  -- in pixels
    'height'  : 768,   -- in pixels
    'aspect'  : '4:3', -- to be defined later
    'length'  : 10.5,  -- in seconds
    'fps'     : 23.98, -- frames per second
    'md5'     : ...,   -- of data
    'mtime'   : 0,     -- of file (UTC Unix timestamp)
    'score'   : 1.0,   -- user modifiable, subjective score
    'language': 'und'  -- user modifiable
  },
  'audio.mp3': {
    'length'  : 10.1, -- in seconds
    'md5'     : ...,  -- of data
    'mtime'   : 0,    -- of file (UTC Unix timestamp)
    'score'   : 1.0,  -- user modifiable, subjective score
    'language': 'jpn' -- user modifiable
  },
  'subtitles.ssa': {
    'md5'     : ...,  -- of data
    'mtime'   : 0,    -- of file (UTC Unix timestamp)
    'language': 'eng' -- user modifiable
  },
  'image.bmp': {
    'width'   : 1024, -- in pixels
    'height'  : 768,  -- in pixels
    'md5'     : ...,  -- of data
    'mtime'   : 0     -- of file (UTC Unix timestamp)
  }
}
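Since regeneration must not clobber user edits, updates to an entry can be
expressed as a merge. A minimal sketch (names illustrative):

# Fields marked "user modifiable" in the example above.
USER_FIELDS = ('score', 'language')

def merge_source_entry(existing, fresh):
    # Start from the freshly extracted metadata, then carry
    # user-modifiable values over from the existing entry.
    merged = dict(fresh)
    for field in USER_FIELDS:
        if existing and field in existing:
            merged[field] = existing[field]
    return merged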
-- Specification: encoding.json
This file holds information about the composition of source files to produce
output videos. It is expected that the pick-and-mix stage generates suitable
compositions based on the available source files. The user may need to update
parameters in this file, e.g. subtitle-shift, hence any programmatic updates
to this file must take care to preserve user changes.
There are four supported types of composition:
1. video
2. video + subtitles
3. video + audio + subtitles
4. image + audio + subtitles
The video file is expected to contain audio, which is overridden if an
external audio file is supplied. Subtitles within a video file are never
used.
--- Example 1: video
{
  '0.mp4': {
    'video'    : 'video.avi', -- source file
    'language' : 'eng'        -- of output file
  }
}
--- Example 2: video + subtitles
{
  '0.mp4': {
    'video'         : 'video.avi',     -- source file
    'subtitles'     : 'subtitles.ssa', -- source file
    'subtitle-shift': 0.0,             -- in seconds relative to video
    'language'      : 'eng'            -- of output file
  }
}
--- Example 3: video + audio + subtitles
{
  '0.mp4': {
    'video'         : 'video.avi',     -- source file
    'audio'         : 'audio.mp3',     -- source file
    'audio-shift'   : 0.0,             -- in seconds relative to video
    'subtitles'     : 'subtitles.ssa', -- source file
    'subtitle-shift': 0.0,             -- in seconds relative to video
    'language'      : 'eng'            -- of output file
  }
}
--- Example 4: image + audio + subtitles
{
  '0.mp4': {
    'image'         : 'image.bmp',     -- source file
    'audio'         : 'audio.mp3',     -- source file
    'subtitles'     : 'subtitles.ssa', -- source file
    'subtitle-shift': 0.0,             -- in seconds relative to video
    'language'      : 'eng'            -- of output file
  }
}
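A sketch of classifying a composition into one of the four supported types,
e.g. for validation before encoding (hypothetical helper, not part of the
specification):

def composition_type(composition):
    # Map the set of source keys present to a supported form;
    # anything else is rejected.
    forms = {
        frozenset(['video']): 'video',
        frozenset(['video', 'subtitles']): 'video + subtitles',
        frozenset(['video', 'audio', 'subtitles']): 'video + audio + subtitles',
        frozenset(['image', 'audio', 'subtitles']): 'image + audio + subtitles',
    }
    keys = frozenset(composition) & {'video', 'audio', 'subtitles', 'image'}
    try:
        return forms[keys]
    except KeyError:
        raise ValueError('unsupported composition: %s' % sorted(keys))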
-- Stage: import
Take data from the existing system (i.e. the old system) and place it in the catalogue layout.
For each media item:
1. locate all associated files
2. create directory in catalogue (modifying item name if required)
3. create directory structure
4. copy the files to the /source/ directory
5. (if required) extract subtitle files from video files
6. populate /tags.txt using regular expressions on source path and metadata
In step 6 knowledge of the source system is expected, for example: files in the
"J-Pop" directory should be given the tag "j-pop".
-- Stage: index
Extract metadata from the media sources.
For each media item:
1. (if it exists) load the sources.json
2. scan the /source/ directory
3. for each file not in sources.json, or with an mtime inconsistent with
that stored in sources.json:
3.1. extract metadata like size and length
3.2. update metadata in sources.json
(making sure not to overwrite user modifiable fields)
4. update description.json with details of subtitle files
For the purposes of metadata extraction, if the language of a source is not
identifiable it should be recorded as 'und'. This allows on-the-fly
replacement of 'und' based on simple heuristics at runtime.
The md5 checksum may be used to avoid processor-intensive metadata extraction
if a file's contents have not changed.
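A sketch of the change checks described above (names illustrative):

import hashlib
import os

def needs_reindex(path, entry):
    # A file is (re)indexed when it is new or its recorded mtime no
    # longer matches the file on disk.
    return entry is None or int(os.path.getmtime(path)) != entry.get('mtime')

def file_md5(path):
    # Checksum of the file contents; an unchanged md5 lets the stage
    # skip processor-intensive metadata extraction.
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            digest.update(chunk)
    return digest.hexdigest()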
-- Stage: pick-and-mix
For each media item where encoding.json does not exist create suitable
compositions based on the sources described in sources.json.
With preference given to sources of higher quality, a composition should be
created for every matching audio and subtitle language, choosing one of:
1. video + audio + subtitle
2. video + subtitle
3. image + audio + subtitle
4. video
These forms are listed in descending order of preference.
The first composition should be the one with English subtitles (if available).
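A sketch of the preference order for a single language; the arguments are
the best available source file of each kind (or None), and the returned
dictionary is an encoding.json entry:

def pick_composition(video, audio, subtitles, image, language):
    if video and audio and subtitles:
        return {'video': video, 'audio': audio, 'audio-shift': 0.0,
                'subtitles': subtitles, 'subtitle-shift': 0.0,
                'language': language}
    if video and subtitles:
        return {'video': video, 'subtitles': subtitles,
                'subtitle-shift': 0.0, 'language': language}
    if image and audio and subtitles:
        return {'image': image, 'audio': audio, 'subtitles': subtitles,
                'subtitle-shift': 0.0, 'language': language}
    if video:
        return {'video': video, 'language': language}
    # No supported composition can be built from these sources.
    return None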
-- Stage: encode
Process full media files for playback in the queue player.
For each media item:
1. load sources.json
2. load encoding.json
3. calculate the parameter hash of each composition (see below)
4. load description.json
5. for each composition where the parameter hash is not consistent with
description.json 'encode-hash',
or if description.json is absent,
or if the composition file does not exist:
5.1. encode the composition to the associated output file
5.2. update description.json with details of the new output
6. remove non-existent compositions from description.json
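The specification does not fix how the parameter hash in step 3 is
computed; one possible sketch hashes a canonical serialisation of the
composition:

import hashlib
import json

def parameter_hash(composition):
    # Sorting the keys makes the serialisation stable, so identical
    # parameters always produce the same hash. md5 is illustrative;
    # any stable digest would do.
    canonical = json.dumps(composition, sort_keys=True)
    return hashlib.md5(canonical.encode('utf-8')).hexdigest()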
-- Stage: preview
Generate preview files for viewing on mobile clients.
For each media item:
1. load description.json
2. for each video, for each target: if the mtime of the source video is
after the mtime of the preview,
or there is no preview in description.json,
or there is no file for the preview:
2.1. encode the preview for the target
2.2. update description.json
3. remove any previews and associated files where they do not map to a
composition in description.json
File names for previews should be composed: <video index> . '_' . <target name>
e.g. '0_iphone.mp4'.
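A sketch of the naming rule and the staleness check from step 2; the
'.mp4' extension is assumed constant across targets, as in the examples
above:

import os

def preview_name(video_index, target):
    # e.g. preview_name(0, 'iphone') -> '0_iphone.mp4'
    return '%d_%s.mp4' % (video_index, target)

def preview_is_stale(video_path, preview_path):
    # Re-encode when the preview is missing or older than its source.
    return (not os.path.exists(preview_path)
            or os.path.getmtime(preview_path) < os.path.getmtime(video_path))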
--- Target specifications
The following encoding formats and specifications should be used for targets:
* iphone: h.264, baseline, generation 3.0
* ipad:
* android:
TODO: finish these
-- Stage: thumbnail
Generate thumbnails for each video.
For each media item:
1. load description.json
2. for each thumbnail that exists, if the mtime of the source is after the
mtime of the thumbnail:
2.1. remove the thumbnail
3. for each video:
3.1. generate time indexes of thumbnails (see note below)
3.2. for each of these indexes:
3.2.1. make sure thumbnail entry exists for time index
3.2.2. update description.json
3.3. if thumbnail files are missing for any thumbnail entries:
3.3.1. generate thumbnail
Thumbnails should be taken from 20%, 40%, 60% and 80% through the file's
running time. In future work these time indexes should be adjusted based on
the visual stability and interest of frames around those time indexes.
Thumbnails have a width of 240 pixels and a height dependent on the aspect
ratio of the source.
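A sketch of index generation and frame extraction; the specification does
not name an encoder, so the use of ffmpeg here is an assumption:

import subprocess

def thumbnail_time_indexes(length):
    # 20%, 40%, 60% and 80% through the file's running time.
    return [round(length * fraction, 1)
            for fraction in (0.2, 0.4, 0.6, 0.8)]

def generate_thumbnail(video_path, time_index, out_path):
    # Seek to the offset, grab one frame and scale it to a width of
    # 240 pixels; '-1' tells ffmpeg to derive the height from the
    # source aspect ratio.
    subprocess.check_call([
        'ffmpeg', '-ss', str(time_index), '-i', video_path,
        '-frames:v', '1', '-vf', 'scale=240:-1', '-y', out_path,
    ])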