Adapt SHAWI data management workflows to include audio files #10

simar0at · 2023-05-05T10:31:58Z

We have wav files and we have timestamps in TEI XML.

We need a way to cut the wav files and also probably to export them as mp4.

One way to do this is to transform the TEI files into Audacity 2 project files, which are also XML files.

simar0at · 2023-07-25T14:33:16Z

We can generate the labels in text format:

0.000000	0.439679	H
0.346413	0.912666	IG
0.925990	2.394558	IG
2.394558	5.009671	IG
2.711361	2.851247	H
5.006803	5.558277	ID
5.558277	8.576871	IG
8.576871	11.116553	IG
11.116553	23.942498	H

We can use this with Audacity 2 or 3. Audacity 3 moved away from any (visible) XML.
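As an alternative to cutting by hand, the label list above could also drive a small script. The following is only a minimal sketch using Python's standard `wave` module, not an agreed part of the workflow: the function names `parse_labels` and `cut_wav` and the indexed output file names are assumptions made for illustration, and it only handles plain three-column label lines (start, end, label, tab-separated, in seconds).

```python
import wave
from pathlib import Path


def parse_labels(text):
    """Parse Audacity label text into (start_s, end_s, label) tuples."""
    regions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        start, end, label = line.split("\t", 2)
        regions.append((float(start), float(end), label))
    return regions


def cut_wav(src_path, regions, out_dir):
    """Write one uncompressed WAV file per region and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for index, (start, end, label) in enumerate(regions):
            first = int(round(start * rate))
            count = max(0, int(round(end * rate)) - first)
            # Clamp the start position so setpos() never runs past the end.
            src.setpos(min(first, src.getnframes()))
            frames = src.readframes(count)
            # Labels such as "IG" repeat, so a running index keeps names unique.
            target = out_dir / f"{index:04d}_{label}.wav"
            with wave.open(str(target), "wb") as dst:
                dst.setparams(params)  # frame count is corrected on close
                dst.writeframes(frames)
            written.append(target)
    return written
```

Calling `cut_wav("recording.wav", parse_labels(label_text), "snippets")` (both paths are placeholders) would write one file per labelled region. Overlapping regions, as in the list above, are simply cut independently of each other.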

dasch124 · 2023-11-17T15:23:24Z

IMO we should go for the CSV/TSV format, which seems to take less effort to create.

dasch124 · 2023-12-19T12:58:24Z

General Workflow:

For each TEI transcription document, we generate "region labels" (i.e. named time spans in Audacity) in the format mentioned by Omar in the comment above and add them to the SHAWI data repository.
Team members at the university with access to the original audio files open the files in Audacity, import the region label list and export the audio snippets both as uncompressed WAV (for archiving) and as MP3 (for publishing in the application).
They upload the resulting audio files to https://oeawcloud.oeaw.ac.at/index.php/apps/files/?dir=/R_Shawi_19367&fileid=36169042
We add references to both versions of the audio files to the TEI documents.

TEI > Audacity labels conversion

The TSV format is described here: https://manual.audacityteam.org/man/importing_and_exporting_labels.html

This should be generated by taking all <u> elements in the transcription documents, and re-calculating the absolute timestamps from the @interval attribute on the <when> elements inside of the <timeline>:

<timeline unit="ms">
   <when xml:id="T0"/>
   …
   <when interval="197124" since="#T0" xml:id="T19"/>
   <when interval="197256" since="#T0" xml:id="T20"/>
   …
</timeline>
…
<annotationBlock>
   <u xml:lang="ar-acm-x-shawi-vicav" xml:id="URFA-034_a20" who="#default" end="#T20" start="#T0">
      …
   </u>
</annotationBlock>

Instead of having the speaker name as the label, we should use the utterance's xml:id, so the exported audio snippet can be named after the utterance id.
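The conversion described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's actual script: the function names `absolute_seconds` and `tei_to_labels` are made up, the documents are assumed to use the standard TEI namespace, the timeline unit is assumed to be milliseconds as in the sample, and a `<when>` without `@interval` is treated as the origin at 0 ms.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def absolute_seconds(whens, when_id):
    """Resolve a <when> to seconds from the origin by following @since."""
    when = whens[when_id]
    interval = when.get("interval")
    if interval is None:
        return 0.0  # assumption: a <when> without @interval is the origin
    since = when.get("since")
    base = absolute_seconds(whens, since.lstrip("#")) if since else 0.0
    return base + float(interval) / 1000.0  # assumption: unit="ms"


def tei_to_labels(tei_xml):
    """Return Audacity label text with one line per <u>, labelled by xml:id."""
    root = ET.fromstring(tei_xml)
    whens = {w.get(XML_ID): w for w in root.iter(f"{TEI}when")}
    lines = []
    for u in root.iter(f"{TEI}u"):
        start = absolute_seconds(whens, u.get("start").lstrip("#"))
        end = absolute_seconds(whens, u.get("end").lstrip("#"))
        lines.append(f"{start:.6f}\t{end:.6f}\t{u.get(XML_ID)}")
    return "\n".join(lines) + "\n"
```

Because the label is the utterance's xml:id, the snippets that Audacity exports from these labels can carry the utterance id in their file names.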

dasch124 · 2024-02-01T11:45:47Z

For some reason, the xml:id is missing from the @url attribute on <media>, e.g. https://github.com/acdh-oeaw/shawi-data/blob/main/010_manannot/Urfa-097_Three_Daughters-Harran-2010.xml#L210

rausch-supola · 2024-02-05T09:28:23Z

As I said, I only inserted the two lines with the media tag. I guess the link to the data is missing.

rausch-supola · 2024-06-04T07:44:49Z

I guess this issue can be closed, @dasch124?

simar0at mentioned this issue May 5, 2023

Enhancements for SHAWI acdh-oeaw/vicav-app#56

Open

10 tasks

dasch124 assigned rausch-supola Dec 19, 2023

dasch124 added the enhancement and data-processing labels Dec 20, 2023

rausch-supola closed this as completed Jun 14, 2024
