allow multiple audio files per sentence #183

alanfgh · 2014-04-07T22:55:03Z

This would of course require changes to the database schema and UI, but it would help users, who repeatedly request this feature. Assuming we have a working duplicate merging script (see #182) by the time we make this change, we'd need to make sure that we preserve the links to the various audio files for each merged sentence.

jiru · 2014-10-09T07:32:56Z

For the record, this is much needed for Hebrew. Users are currently recording multiple versions of the sentences so that they can be added once we add support for it.

ckjpn · 2016-04-20T21:37:41Z

While maybe not a high priority, this is something that definitely needs to be done eventually.

Members with different dialects or accents could contribute audio for the same sentence.
Different readings of sentences with the same spelling but different meanings could be contributed.

Example:
I read my horoscope every day.
https://tatoeba.org/eng/sentences/show/4563313
(present vs. past)

Even members with the same dialect could contribute alternate recordings. Perhaps not as necessary, but this could also be helpful.

ckjpn · 2018-06-28T02:24:47Z

Here is another example of why it would be nice to eventually be able to have more than one audio file per sentence.

This same sentence can have a rising intonation at the end of the sentence or a falling intonation.

http://www.manythings.org/tatoeba/6249075.mp3
http://www.manythings.org/tatoeba/6249075-alt.mp3

https://tatoeba.org/sentences/show/6249075

Here's another example.
https://tatoeba.org/eng/sentences/show/6482226
You're not Canadian, are you?

https://audio.tatoeba.org/sentences/eng/6482226.mp3
http://study.aitech.ac.jp/6482226-alt.mp3

jiru · 2018-06-28T14:05:07Z

To whoever will implement this: read commit 61e9e43.

ckjpn · 2020-08-05T07:26:12Z

Here are 2 mp3 files of the same sentence with intonation differences.
I've uploaded them here in case my links above ever become "not found."
It's a different sentence, but I uploaded one of these today. It would have been nice to be able to upload both of them.

You don't think I'm going to do that, do you?
6236802-2 files with different intonation.zip
Uploaded as a zip, since MP3 files can't be directly added here.

jiru · 2020-08-09T10:36:20Z

@ckjpn Thanks! I think you could just as well upload these yet-to-be-imported recordings to our server, in much the same way you upload regular audio. I just set up a new directory called github_issue_183 for you. You can upload them there. Choose any naming convention as long as it’s consistent.

ckjpn · 2020-08-10T03:41:02Z

I created a folder in that folder named "eng."

These seem to be the situations in which there can be different recordings.

1. The same voice, different intonations or other differences, possibly no difference in meaning.
2. The same voice, different intonations or other differences, with different meanings.
3. The same voice, a duplicate recording, basically the same.
4. A different voice.
5. Another different voice.

At this point, 3, 4, and 5 are only accidental duplications, since I try to make sure that I don't record the same sentence again and that others aren't duplicating each other's work.

ckjpn · 2021-09-07T01:57:46Z

To get an idea of audio files in different dialects, you can use this advanced search.
This is sort of a preview of what would be possible.

Search for English sentences with audio that contain any of a number of words with UK spelling or vocabulary and that are linked to other English sentences with audio. A number of the results are the same sentence recorded in more than one dialect.

One possible query:

https://tatoeba.org/en/sentences/advanced_search?from=eng&has_audio=yes&native=&orphans=no&query=favourite%7Ccolour%7Ccentre%7Ccinema%7Cneighbour%7Cbehaviour%7Cpostman%7Ctyres%7Cfavour%7Cmum%7Clorry%7Capologise%7C%3Dmaths%7Cmotorcar%7C%3Dtravelling%7C%3Dtravelled%7C%22the+lift%22%7Cpractise%7Clicence%7Ctelly&sort=relevance&sort_reverse=&tags=&to=eng&trans_filter=limit&trans_has_audio=yes&trans_link=&trans_orphan=&trans_to=eng&trans_unapproved=&trans_user=&unapproved=no&user=

Refs #183.

Importing a new audio on a sentence that already has audio no longer overwrites the existing audio. The new one is added alongside the existing ones. The already existing "audio id" is used to differentiate several audio files belonging to the same sentence.

The file name is now: <sentenceid>-<audioid>.mp3

This makes audio filenames unique even after they are moved to a different directory or downloaded, thus avoiding potential mixups.

The current directory structure /<lang>/<sentenceid>.mp3 is not good practice because the folder tree is not balanced. If some program (such as a file indexer) tries to browse /eng/, it takes ages to parse that folder (620k+ files at the moment). So while I was at it, I reorganized the folder structure into something more scalable with a more balanced tree, based on the 6 least significant digits of the audio id.

By the way, I used the following code to do some perf measurements on the production server, on the disk where audio files are stored:
https://gist.github.com/dmke/7f42ba41c777a34845894d7bfb8b16bd

Here are the results:

    Ruby 2.5.5 x86_64-linux-gnu, depth 5, iterations 100000

                        user     system      total        real
    prep-entries    0.397915   0.028046   0.425961 (  0.425973)
    prep-paths     48.487086  24.663179  73.150265 (126.698493)
    write-5         3.235112  11.991072  15.226184 ( 52.606463)
    read-5          6.674915  20.073439  26.748354 (156.287056)
    delete-5       15.319855  19.824883  35.144738 ( 82.240757)
    write-4         2.903518   8.197650  11.101168 ( 49.753295)
    read-4          1.700867   3.193910   4.894777 (  7.036485)
    delete-4       14.444888  18.088059  32.532947 ( 90.465323)
    write-3         2.719723   7.325887  10.045610 ( 43.367942)
    read-3          1.503995   1.974494   3.478489 (  3.540495)
    delete-3       13.942245  17.697191  31.639436 ( 79.525504)
    write-2         2.614040   6.904246   9.518286 ( 44.898698)
    read-2          1.459150   2.130480   3.589630 (  3.730895)
    delete-2        7.905721  10.256166  18.161887 ( 24.583616)
    write-1         2.308070   5.635690   7.943760 (  8.327393)
    read-1          1.327464   1.688212   3.015676 (  3.294940)
    delete-1        2.983404   4.173568   7.156972 (  7.658301)
    write-0         2.429562   5.719596   8.149158 (  8.717924)
    read-0          1.225677   1.823673   3.049350 (  3.070201)
    delete-0        3.330306   4.291284   7.621590 (  7.921333)

https://chart.googleapis.com/chart?cht=bvg&chs=650x450&chd=t:3.07,3.29,3.73,3.54,7.04,156.29|8.72,8.33,44.9,43.37,49.75,52.61&chds=a&chbh=a,1,50&chco=ff7f0e,1f77b4&chtt=File%20access%20time%20for%20100000%20files&chdl=read|write&chxt=x,x,y,y&chxl=1:|depth|3:|time%20[s]|

This commit implements a "depth 2" folder tree. According to this data, the new folder tree has almost no impact on file read performance (3.73 s vs. 3.07 s of real time at depth 0), while writing takes about 5 times as long (44.90 s vs. 8.72 s).
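
The naming and layout described in this commit message could look roughly like the Python sketch below. It is only an illustration, not the project's code: the function name `audio_relative_path` is hypothetical, and splitting the six least significant digits of the audio id into two three-digit directory levels is an assumption, since the message only says "depth 2" and "6 least significant digits" without spelling out the exact layout.

```python
from pathlib import PurePosixPath


def audio_relative_path(sentence_id: int, audio_id: int) -> PurePosixPath:
    """Return a relative path for one recording (hypothetical layout).

    Assumption: the two directory levels are the upper and lower three
    digits of the six least significant digits of the audio id.
    """
    if sentence_id < 0 or audio_id < 0:
        raise ValueError("ids must be non-negative")

    # Keep only the six least significant digits of the audio id.
    low6 = audio_id % 1_000_000
    # Split them into two zero-padded, three-digit directory names.
    first, second = divmod(low6, 1000)

    # File name format taken from the commit message: <sentenceid>-<audioid>.mp3
    filename = f"{sentence_id}-{audio_id}.mp3"
    return PurePosixPath(f"{first:03d}", f"{second:03d}", filename)


if __name__ == "__main__":
    # The audio ids here are made up for the example.
    print(audio_relative_path(6249075, 1234567))
    print(audio_relative_path(6249075, 42))
```

Because the file name embeds both ids, it stays unique even if the file is later copied out of its directory, which is the property the commit message is after.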

With multiple audio recordings per sentence, there can now be multiple links here, so it’s hard to display this information clearly. It’s better to hide it and let the user look at the sentence page instead, which has all the details. Refs #183.

Clicking on the audio button now plays the first audio; clicking again plays the next audio, and so on, starting over from the first one after all have been played. The tooltip also gets updated with authorship information regarding the audio that is "next to be played". Refs #183.
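
The cycling behavior of the audio button can be pictured as a small piece of state: an index pointing at the recording that is "next to be played". The Python sketch below is only a model of that logic, not the actual front-end code; the class name `AudioButton`, the tooltip wording, and the sample authors and file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AudioButton:
    """Model of a play button cycling through a sentence's recordings."""

    # Each recording is an (author, url) pair; both values are made up here.
    recordings: List[Tuple[str, str]]
    next_index: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        if not self.recordings:
            raise ValueError("a button needs at least one recording")

    def tooltip(self) -> str:
        # The tooltip describes the recording that the next click will play.
        author, _ = self.recordings[self.next_index]
        return f"Play audio recorded by {author}"

    def click(self) -> str:
        # Return the recording to play, then advance, wrapping to the start.
        _, url = self.recordings[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.recordings)
        return url


if __name__ == "__main__":
    button = AudioButton([("alice", "first.mp3"), ("bob", "second.mp3")])
    for _ in range(3):
        print(button.tooltip(), "->", button.click())
```

Keeping the index of the next recording, rather than the last one played, means the tooltip can be read straight from the state without extra bookkeeping.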

So that we can easily use it outside the sentence too, such as in the audio details section of the sentence page. In the audio details section, we can play a particular audio by clicking on its icon. Refs #183.

jiru · 2022-05-29T20:58:26Z

I implemented this.

ckjpn · 2022-05-29T23:21:04Z

Maybe it's not a problem, but the number of total audio files went down by 4.

Yesterday, when I finished uploading files it was this.
Sentences with audio (total 998,888)

This morning (my time), it was this.
Sentences with audio (total 998,884)

Perhaps an admin unlinked 4 audio files overnight. That might be the reason.

URL: https://tatoeba.org/en/audio/index

ckjpn · 2022-05-29T23:45:07Z

This may be a problem.

I've only tried it a few times, but the following page took a very long time to load.

https://tatoeba.org/en/audio/index

Twice when I tried it, I got the error message that is displayed when we have time-out errors.

Tatoeba is currently unavailable. We are sorry for the inconvenience. You can check our blog or Twitter for more information.

(Not really a time-out error message, but a message saying that tatoeba.org is offline.)

ckjpn · 2022-05-30T00:56:07Z

Note that it's implied that one audio file can be disabled, leaving the other one enabled.
However, the save button doesn't actually save the setting.

https://tatoeba.org/en/sentences/show/2958714

ckjpn · 2022-05-30T01:39:36Z

Note that it is possible to have one audio file disabled and another one enabled.
To do so, I had to uncheck "is enabled" and save the earlier audio file before importing the new file.

https://tatoeba.org/en/sentences/show/10869364
Temporarily, I left this online for you, but I plan to delete the first one in the near future.

ckjpn · 2022-05-30T02:08:39Z

Here is one example with about 30 audio files.

https://tatoeba.org/en/sentences/show/280288
Birds of a feather flock together.

ckjpn · 2022-05-30T07:43:02Z

This seems to be a bug.
I can't disable this one to edit the text.

https://tatoeba.org/en/sentences/show/3991877

It's interesting that I could disable the audio on a sentence I owned, but not on this one, which has a different owner. I wonder if that is the reason.

alanfgh added the enhancement label Apr 7, 2014

alanfgh mentioned this issue Apr 9, 2014

links to audio should not be hard-coded to one place #132

Closed

tommy-3 mentioned this issue Dec 24, 2014

Indicate the contributors of audio #547

Closed

Sobsz mentioned this issue Mar 7, 2021

Common Voice sentence import? Maybe? #2637

Open

jiru added a commit that referenced this issue Sep 12, 2021

Regenerate audio fixtures using latest `cake bake'

1f1b467

Refs #183.

jiru added a commit that referenced this issue Sep 12, 2021

Add new column audios.audio_idx

530a2a5

This column will be used to differentiate audios belonging to the same sentence. Refs #183.

jiru added a commit that referenced this issue Sep 12, 2021

Add numeric validation for audios.audio_idx

9e48b49

Refs #183.

jiru added a commit that referenced this issue Sep 12, 2021

Add uniqueness constraint for multiaudio sentences

7ff581c

Refs #183.

jiru added a commit that referenced this issue Sep 12, 2021

Automatically set value of audios.audio_idx

bc5b23b

This should make life easier for users of the Audio model. Refs #183.

jiru added a commit that referenced this issue Oct 15, 2021

Add audio download controller action

d9ef999

Accessing the files using that action instead of the mp3 file directly will prevent breaking audio third-party tools if we ever decide to change file naming again. Refs #183.

jiru mentioned this issue Nov 29, 2021

Support multiple audio recordings per sentence #2880

Merged

jiru closed this as completed May 29, 2022

trang added this to the 2022-05-29 milestone May 30, 2022

trang mentioned this issue May 30, 2022

Cannot disable some audio #2947

Open

DJ-Saidez mentioned this issue Jun 15, 2022

Allow dialect labels on audio files #2958

Open
