This repository has been archived by the owner on Apr 12, 2019. It is now read-only.
File Structure
==============
These are considerations concerning the future design of the file structure
layout.
Goals
-----
Balance between:
- Human readable
- Machine readable
- Well reasoned
Open issues
-----------
- Should recording/user directory names contain more than the uuid?
- Just the UUIDs are not very human readable.
- We could add metadata information in the directory, like so:
- Speaker: andy-<uuid>
- Recording: andy,peter_the-story-of-the-mountain-<uuid>
-> We can still extract the uuid data, but it also "speaks" to a user.
- Problem with the approach: If metadata is changed, the directory name would change.
- Should the mapping of original and respoken segments be in samples, or time (as with the transcriptions)?
- Time: Resistant to resampling of recordings.
- Samples: Easier to handle on the machine side (no recalculations necessary).
- Should commentaries also be subordinate to "original" recordings?
- In the example below, should <uuid2> be in a subdirectory /commentaries under <uuid1>?
-> Only actual original recordings would be top level in the /recordings directory.
-> Deeper structure, everything is subordinate to the "original" recording.
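To illustrate the first open issue, here is a minimal sketch of how a directory name carrying a human-readable label could still yield its UUID. The "<label>-<uuid>" convention follows the examples above; the function names make_dir_name and split_dir_name are hypothetical and not part of any existing code.

```python
import re
import uuid

# Assumed convention: "<label>-<uuid>", or just "<uuid>" when there is no label.
# The UUID is expected in its canonical 8-4-4-4-12 hexadecimal form.
_UUID_AT_END = re.compile(
    r"^(?:(?P<label>.*)-)?"
    r"(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})$"
)


def make_dir_name(label, item_uuid):
    """Build a directory name from an optional label and a UUID."""
    return f"{label}-{item_uuid}" if label else str(item_uuid)


def split_dir_name(name):
    """Split a directory name into (label, UUID); the label may be empty."""
    match = _UUID_AT_END.match(name)
    if match is None:
        raise ValueError(f"no UUID at the end of directory name: {name!r}")
    return match.group("label") or "", uuid.UUID(match.group("uuid"))
```

Because only the trailing UUID identifies the item, a renamed label would not break lookups that go through split_dir_name, although the directory path itself would still change.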
Overview & Example
------------------
Note that <uuidN> stands for a generated UUID.
Every occurrence of <uuid1> denotes the same UUID, while <uuid1> and <uuid2> denote different UUIDs.
/aikuma
# Contains original recordings.
#
/recordings
# All data of the recording uuid1 (and commentaries/transcriptions) are in this directory.
#
/<uuid1>
# The audio data in WAV format.
#
data.wav
# Metadata of this recording:
# { "uuid":"<uuid1>", "people":["<uuid3>","<uuid4>"], "languages":["usa","gsw"],
# "description":"...", "location":"12.3456,-98.7654", "timestamp":"2007-04-05T14:30Z" }
#
metadata.json
# Human readable version of metadata.json.
#
# Only ever gets written, never read by a machine.
#
# Example:
# uuid: <uuid1>
# people: <uuid3>, <uuid4>
# languages: usa, gsw
# description: Blah.
# location: 12.3456,-98.7654
# timestamp: 2007-04-05T14:30Z
#
# Note: We could use YAML here http://en.wikipedia.org/wiki/YAML.
#
metadata.txt
# Directory for audio commentaries (respeakings/translations).
#
/commentaries
# All data of commentary uuid2.
#
/<uuid2>
# The audio data in WAV format.
#
data.wav
# Metadata of this commentary:
#
#
metadata.json
# The mapping CSV, mapping original segment <start,length> in
# seconds to respoken/translated segment <start,length> in seconds.
#
# Example:
# 0.0000000,1.1234567,0.0000000,2.1234567
# 1.0123456,3.1234567,2.1234567,4.1234567
#
mapping.csv
# Transcriptions of this commentary.
#
/transcriptions
# All data of transcription uuid6 are in this directory.
#
/<uuid6>
#
#
metadata.json
# Transcription data for the commentary uuid2.
#
# Example:
# 0.0000000,4.5678900,"Four score and..."
# 4.5678900,2.1234567,"hello, world!"
# 6.6913467,1.2345678,"Ding!"
#
transcript.csv
# Images of commentary uuid2.
#
/images
/<uuid7>
metadata.json
image.jpg
/<uuid8>
metadata.json
image.jpg
/transcriptions
/<uuid4>
# Metadata of this transcription:
#
# It is a simple transcription if the language of the recording
# is the same as the language of the transcription.
metadata.json
# Transcription data for the recording uuid1.
#
# 0.0000000,4.5678900,"Four score and..."
# 4.5678900,2.1234567,"hello, world!"
# 6.6913467,1.2345678,"Ding!"
#
mapping.csv
/<uuid5>
# Metadata of the transcription.
#
# It is a translation if the language of the recording is
# different from the language of the transcription.
#
metadata.json
# Transcription data for the recording uuid1.
#
# 0.0000000,4.5678900,"Honni soit qui mal y pense."
# 4.5678900,2.1234567,"Bonjour."
# 6.6913467,1.2345678,"Et voilà!"
#
mapping.csv
# People - referred to in metadata.json of recordings/transcriptions/commentaries.
#
/people
# Person.
#
/<uuid3>
# Metadata of person uuid3.
#
# { "uuid":"<uuid3>", "languages":["usa","gsw"], "description":"...",
# "location":"12.3456,-98.7654", "timestamp":"2007-04-05T14:30Z" }
#
metadata.json
# Picture for person uuid3.
#
image.jpg
/...
File internals
--------------
- Generally, open ended, database "record"-like data is stored as CSV (lists).
- Metadata structure which needs to be expanded in the future is stored as JSON (hashes).
This is usually both human-readable and easily processed via machine (and libraries are readily available).
Recording data.wav
------------------
- data.wav contains only the audio data, without embedded metadata.
- While WAV files can contain additional metadata (for example in the INFO chunk or as XMP), we do not use it, as not all applications can handle metadata in WAVs (and may strip it).
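A rough sketch of how one might check that a data.wav carries no metadata chunks. It walks the top-level RIFF chunks and reports their IDs. The function names and the rule "only 'fmt ' and 'data' are allowed" are assumptions made for illustration, not a decided policy.

```python
import struct


def list_riff_chunks(stream):
    """Return the top-level chunk IDs of a RIFF/WAVE stream, in file order."""
    header = stream.read(12)
    if len(header) < 12:
        raise ValueError("too short to be a RIFF/WAVE file")
    riff, _size, wave_id = struct.unpack("<4sI4s", header)
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunk_ids = []
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            break  # End of file.
        chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
        chunk_ids.append(chunk_id.decode("ascii", errors="replace"))
        # Chunk bodies are padded to an even number of bytes.
        stream.seek(chunk_size + (chunk_size & 1), 1)
    return chunk_ids


def has_only_audio_chunks(stream):
    """True if the stream holds nothing but the format and sample data chunks."""
    return set(list_riff_chunks(stream)) <= {"fmt ", "data"}
```

Usage would be along the lines of has_only_audio_chunks(open(path, "rb")), where path points at a data.wav.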
Recording metadata.json
-----------------------
- We use JSON hashes, as this enables us to:
- extend the metadata
- load "old" versions (applications need to be able to load hashes that might not contain all information)
- However: We do not store hash values that are open ended in size, as that makes the metadata hard to use for humans. An example of this is the commentary mapping data, which is far easier to read in CSV format.
- If the recording references another recording (i.e. it is a commentary of another), that recording is referenced via "parent_uuid".
Example:
{ "uuid":"<uuid2>", "parent_uuid":"<uuid1>", "languages":["usa","gsw"], "description":"...", "location":"12.3456,-98.7654", "timestamp":"2007-04-05T14:30Z" }
location: Decimal coordinates (http://en.wikipedia.org/wiki/Geographic_coordinate_conversion#Ways_of_writing_coordinates)
timestamp: ISO 8601 (http://en.wikipedia.org/wiki/ISO_8601)
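The requirement that applications load "old" versions can be sketched as follows: missing keys are filled with defaults, and the location and timestamp strings are parsed into usable values. The default values and the name load_metadata are assumptions for illustration. The timestamp parser only handles the "YYYY-MM-DDTHH:MMZ" form used in the example, not all of ISO 8601.

```python
import json
from datetime import datetime, timezone


def _defaults():
    # Assumed defaults for keys that an older metadata.json might lack.
    return {
        "parent_uuid": None,
        "people": [],
        "languages": [],
        "description": "",
        "location": None,
        "timestamp": None,
    }


def load_metadata(text):
    """Parse metadata.json content, tolerating missing keys."""
    meta = {**_defaults(), **json.loads(text)}
    if meta["location"] is not None:
        # Decimal coordinates, written as "latitude,longitude".
        latitude, longitude = (float(part) for part in meta["location"].split(","))
        meta["location"] = (latitude, longitude)
    if meta["timestamp"] is not None:
        # The trailing "Z" marks UTC.
        parsed = datetime.strptime(meta["timestamp"], "%Y-%m-%dT%H:%MZ")
        meta["timestamp"] = parsed.replace(tzinfo=timezone.utc)
    return meta
```

Unknown keys are kept as they are, so newer files remain loadable by older code that follows the same approach.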
Recording mapping.csv
---------------------
- We use CSV, as it is essentially an open ended list of "records", where a record signifies a mapping between two segments.
- Structure: <orig_start,orig_end,commentary_start,commentary_end>
Example:
0,10000,0,20000
9850,12000,20001,24000
etc.
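A minimal sketch of reading such a mapping, assuming the values are sample indices as in the example above. The sample rate needed for the conversion to seconds is not stored in mapping.csv and would have to come from data.wav. The names SegmentMapping, read_mapping and samples_to_seconds are hypothetical.

```python
import csv
from typing import NamedTuple


class SegmentMapping(NamedTuple):
    orig_start: int
    orig_end: int
    commentary_start: int
    commentary_end: int


def read_mapping(lines):
    """Read mapping rows from an iterable of CSV lines (e.g. an open file)."""
    return [
        SegmentMapping(*(int(value) for value in row))
        for row in csv.reader(lines)
        if row  # Skip blank lines.
    ]


def samples_to_seconds(sample_index, sample_rate):
    """Convert a sample index to a time in seconds."""
    return sample_index / sample_rate
```

This also shows the trade-off from the open issues: stored sample indices need no recalculation when seeking, but they are tied to one sample rate, whereas stored times survive resampling.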
Transcription mapping.csv
-------------------------
- We use CSV, as it is essentially an open ended list of "records", where a record signifies a mapping between a segment of the original and its transcription.
- Structure: <orig_segment_start_time,orig_segment_length,transcription>
Example:
0.000000,4.567890,"Four score and..."
4.567890,2.123456,"hello, world!"
6.691346,1.234567,"Ding!"
etc.
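A short sketch of reading this structure. Python's csv module handles the quoted transcription field, so a comma inside the text (as in "hello, world!") does not split the record. The names TranscribedSegment and read_transcript are hypothetical.

```python
import csv
from typing import NamedTuple


class TranscribedSegment(NamedTuple):
    start: float   # Start time of the original segment, in seconds.
    length: float  # Length of the original segment, in seconds.
    text: str      # Transcription of the segment.


def read_transcript(lines):
    """Read transcription rows from an iterable of CSV lines."""
    segments = []
    for row in csv.reader(lines):
        if not row:
            continue  # Skip blank lines.
        start, length, text = row
        segments.append(TranscribedSegment(float(start), float(length), text))
    return segments
```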
Speaker metadata.json
---------------------
- We use JSON hashes, as this enables us to:
- extend the metadata
- load "old" versions (applications need to be able to load hashes that might not contain all information)
- However: We do not store hash values that are open ended in size, as that makes the metadata hard to use for humans.
- If the recording references another recording (i.e. it is a commentary of another), the
Example:
{ "uuid":"<uuid1>", "languages":["usa","gsw"], "description":"...", "location":"12.3456,-98.7654", "timestamp":"2007-04-05T14:30Z" }
location: Decimal coordinates (http://en.wikipedia.org/wiki/Geographic_coordinate_conversion#Ways_of_writing_coordinates)
timestamp: ISO 8601 (http://en.wikipedia.org/wiki/ISO_8601)
Notes
-----
Prefix Trees
------------
Should the number of recordings become too large, we could relatively easily switch to a structure which uses the first 2 characters of a uuid as a directory name, making the tree one level deeper but much narrower:
/abd53... -> /ab/d53...
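The path computation for such a prefix tree could look like this. It is a sketch only: the function name prefixed_path and the root directory in the usage note are assumptions.

```python
from pathlib import PurePosixPath


def prefixed_path(root, item_uuid, prefix_len=2):
    """Place an item under a directory named after the first characters of its uuid."""
    name = str(item_uuid)
    return PurePosixPath(root) / name[:prefix_len] / name[prefix_len:]
```

For example, prefixed_path("/aikuma/recordings", "abd53") yields /aikuma/recordings/ab/d53. With two hexadecimal characters there are at most 256 prefix directories.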