file-structure.txt

File Structure
==============

These are considerations concerning the future design of the file structure
layout.

Goals
-----

Balance between:
- Human readable
- Machine readable

- Well reasoned

Open issues
-----------

- Should recording/user directory names contain more than the uuid?
    - Just the UUIDs are not very human readable.
    - We could add metadata information in the directory, like so:
        - Speaker: andy-<uuid>
        - Recording: andy,peter_the-story-of-the-mountain-<uuid>
    -> We can still extract the uuid data, but it also "speaks" to a user.
        - Problem with the approach: If metadata is changed, the directory name would change.

- Should the mapping of original and respoken segments be in samples, or time (as with the transcriptions)?
    - Time: Resistant to resampling of recordings.
    - Samples: Easier to handle on the machine side (no recalculations necessary).

- Should commentaries also be subordinate to "original" recordings?
    - In the example below, should <uuid2> be in a subdirectory /commentaries under <uuid1>.
    -> Only actual original recordings would be top level in the /recordings directory.
    -> Deeper structure, everything is subordinate to the "original" recording. 

Overview & Example
------------------

Note that <uuidN> stands for a generated UUID.
uuid1 and uuid1 are the same while uuid1 and uuid2 are different uuids.

/aikuma
    
    # Contains original recordings.
    #
    /recordings
        
        # All data of the recording uuid1 (and commentaries/transcriptions) are in this directory.
        #
        /<uuid1>
        
            # The audio data in WAV format.
            #
            data.wav
            
            # Metadata of this recording:
            #   { "uuid":"<uuid2>", "people":["<uuid3>","<uuid4>"], "languages":["usa","gsw"],
            #   "description":"...", "location":"12.3456,-98.7654", "timestamp":"2007-04-05T14:30Z" }
            #
            metadata.json
            
            # Human readable version of metadata.json.
            #
            # Only ever gets written, never read by a machine.
            #
            # Example:
            #   uuid: <uuid2>
            #   people: <uuid3>
            #   languages: usa, gsw
            #   description: Blah.
            #   location: 12.3456,-98.7654
            #   timestamp: 2007-04-05T14:30Z
            #
            # Note: We could use YAML here http://en.wikipedia.org/wiki/YAML.
            #
            metadata.txt
            
            # Directory for audio commentaries (respeakings/translations).
            #
            /commentaries
            
                # All data of commentary uuid2.
                #
                /<uuid2>
                    
                    # The audio data in WAV format.
                    #
                    data.wav
                    
                    # Metadata of this commentary:
                    # 
                    #
                    metadata.json
            
                    # The mapping CSV, mapping original segment <start,length> in
                    # seconds to respoken/translated segment <start,length> in seconds.
                    #
                    # Example:
                    #   0.0000000,1.1234567,0.0000000,2.1234567
                    #   1.0123456,3.1234567,2.1234567,4.1234567
                    #
                    mapping.csv
                    
                    # Transcriptions of this commentary.
                    #
                    /transcriptions
                        
                        # All data of transcription uuid6 are in this directory.
                        #
                        /<uuid6>
                            
                            #
                            #
                            metadata.json
                            
                            # Transcription data for the commentary uuid2.
                            #
                            # Example:
                            #   0.0000000,4.5678900,"Four score and..."
                            #   4.5678900,2.1234567,"hello, world!"
                            #   6.6913467,1.2345678,"Ding!"
                            #
                            transcript.csv
                    
                    # Images of commentary uuid2.
                    #
                    /images
                        /<uuid7>
                            metadata.json
                            image.jpg
                        /<uuid8>
                            metadata.json
                            image.jpg
            
            /transcriptions
                /<uuid4>
                    # Metadata of this transcription:
                    #
                    # It is a simple transcription if the language of the recording
                    # is the same as the language of the transcription.
                    metadata.json
                    
                    # Transcription data for the recording uuid1.
                    #
                    # 0.0000000,4.5678900,"Four score and..."
                    # 4.5678900,2.1234567,"hello, world!"
                    # 6.6913467,1.2345678,"Ding!"
                    #
                    mapping.csv
                /<uuid5>
                    # Metadata of the transcription.
                    #
                    # It is a translation if the language of the recording is
                    # different from the language of the transcription.
                    #
                    metadata.json
                    
                    # Transcription data for the recording uuid1.
                    #
                    # 0.0000000,4.5678900,"Honni soit qui mal y pense."
                    # 4.5678900,2.1234567,"Bonjour."
                    # 6.6913467,1.2345678,"Et voilà!"
                    #
                    mapping.csv
    
    # People - referred to in metadata.json of recordings/transcriptions/commentaries.
    #
    /people
        
        # Person.
        #
        /<uuid3>
        
            # Metadata of person uuid3.
            #
            # { "uuid":"<uuid1>", "languages":["usa","gsw"], "description":"...",
            # "location":"12.3456,-98.7654", "timestamp":"2007-04-05T14:30Z" }
            #
            metadata.json
            
            # Picture for person uuid3.
            #
            image.jpg
        /...

File internals
--------------

- Generally, open ended, database "record"-like data is stored as CSV (lists).
- Metadata structure which needs to be expanded in the future is stored as JSON (hashes).

This is usually both human-readable and easily processed via machine (and libraries are readily available).


Recording data.wav
------------------

- The data.wav contains only raw data.
- While WAVs can contain additional metadata in XMP format in the INFO chunk, we do not use it as not all applications can handle metadata in WAVs (and may strip it).


Recording metadata.json
-----------------------

- We use JSON hashes, as this enables us to:
    - extend the metadata
    - load "old" versions (applications need to be able to load hashes that might not contain all information)
- However: We do not store hash values that are open ended in size, as that makes the metadata hard to use for humans. Examples of this is the commentary mapping data which is far easier to read in a CSV format.
- If the recording references another recording (i.e. it is a commentary of another), that recording is referenced via "parent_uuid".

Example:
{ "uuid":"<uuid2>", "parent_uuid":"<uuid1>", "languages":["usa","gsw"], "description":"...", "location":"12.3456,-98.7654", "timestamp":"2007-04-05T14:30Z" }

location: Decimal coordinates (http://en.wikipedia.org/wiki/Geographic_coordinate_conversion#Ways_of_writing_coordinates)
timestamp: ISO 8601 (http://en.wikipedia.org/wiki/ISO_8601)


Recording mapping.csv
---------------------

- We use CSV, as it is essentially an open ended list of "records", where a record signifies a mapping between two segments.
- Structure: <orig_start,orig_end,commentary_start,commentary_end>

Example:
0,10000,0,20000
9850,12000,20001,24000
etc.


Transcription mapping.csv
-------------------------

- We use CSV, as it is essentially an open ended list of "records", where a record signifies a mapping between two segments (orig_start,orig_end,commentary_start,commentary_end).
- Structure: <orig_segment_start_time,orig_segment_length,transcription>

Example:
0.000000,4.567890,"Four score and..."
4.567890,2.123456,"hello, world!"
6.691346,1.234567,"Ding!"
etc.


Speaker metadata.json
---------------------

- We use JSON hashes, as this enables us to:
    - extend the metadata
    - load "old" versions (applications need to be able to load hashes that might not contain all information)
- However: We do not store hash values that are open ended in size, as that makes the metadata hard to use for humans.
- If the recording references another recording (i.e. it is a commentary of another), the 

Example:
{ "uuid":"<uuid1>", "languages":["usa","gsw"], "description":"...", "location":"12.3456,-98.7654", "timestamp":"2007-04-05T14:30Z" }

location: Decimal coordinates (http://en.wikipedia.org/wiki/Geographic_coordinate_conversion#Ways_of_writing_coordinates)
timestamp: ISO 8601 (http://en.wikipedia.org/wiki/ISO_8601)


Notes
-----

Prefix Trees
------------

Should the amount of recordings be too large we could relatively easily switch to a structure which uses the first 2 characters of a uuid to create a tree directory structure 2 one level deeper but much less wider:
/abd53... -> /ab/d53...