allow multiple audio files per sentence #183

Closed
alanfgh opened this issue Apr 7, 2014 · 15 comments
Labels
enhancement Issue that describes a problem that requires a change in the current functionalities of Tatoeba.

Comments

Contributor

alanfgh commented Apr 7, 2014

This would of course require changes to the database schema and UI, but it would help users, who repeatedly request this feature. Assuming we have a working duplicate merging script (see #182) by the time we make this change, we'd need to make sure that we preserve the links to the various audio files for each merged sentence.

Member

jiru commented Oct 9, 2014

For the record, this is much needed for Hebrew. Users are currently recording multiple versions of the sentences so that they can be added once we support it.


ckjpn commented Apr 20, 2016

While maybe not a high priority, this is something that definitely needs to be done eventually.

  • Members with different dialects or accents could contribute audio for the same sentence.
  • Recordings could be contributed for sentences with the same spelling but different meanings.

Example:
I read my horoscope every day.
https://tatoeba.org/eng/sentences/show/4563313
(present vs. past)

  • Even members with the same dialect could contribute alternate recordings. Perhaps not as necessary, but this could also be helpful.


ckjpn commented Jun 28, 2018

Here is another example of why it would be nice to eventually be able to have more than one audio file per sentence.

This same sentence can have a rising intonation at the end of the sentence or a falling intonation.

http://www.manythings.org/tatoeba/6249075.mp3
http://www.manythings.org/tatoeba/6249075-alt.mp3

https://tatoeba.org/sentences/show/6249075

Here's another example.
https://tatoeba.org/eng/sentences/show/6482226
You're not Canadian, are you?

https://audio.tatoeba.org/sentences/eng/6482226.mp3
http://study.aitech.ac.jp/6482226-alt.mp3

Member

jiru commented Jun 28, 2018

To whoever will implement this: read commit 61e9e43.


ckjpn commented Aug 5, 2020

Here are 2 mp3 files of the same sentence with intonation differences.
I've uploaded them here in case my links above ever become "not found."
It's a different sentence, but I uploaded one of these today. It would have been nice to be able to upload both of them.

You don't think I'm going to do that, do you?
6236802-2 files with different intonation.zip
Uploaded as a zip, since MP3 files can't be directly added here.

Member

jiru commented Aug 9, 2020

@ckjpn Thanks! I think you could just as well upload these yet-to-be-imported recordings to our server, pretty much the same way you upload regular audio. I just set up a new directory called github_issue_183 for you. You can upload them here. Choose any naming convention as long as it’s consistent.


ckjpn commented Aug 10, 2020

I created a folder in that folder named "eng."

These seem to be the situations in which there can be different recordings.

  1. The same voice, different intonations, or other differences, possibly no difference in meaning.

  2. The same voice, different intonations or other differences, with different meanings.

  3. The same voice, a duplicate recording, but basically the same.

  4. A different voice.

  5. Another different voice.

At this point, 3, 4, 5 are only accidental duplications, since I try to make sure that I don't record the same sentence again, and that others aren't duplicating each other's work.


ckjpn commented Sep 7, 2021

To get an idea of audio files in different dialects, you can use this advanced search.
This is sort of a preview of what would be possible.

Search for English with audio for a number of words with UK spelling, or vocabulary, linked to other English sentences with audio. A number of these are the same sentence with more than one dialect.

One possible query:

favourite|colour|centre|cinema|neighbour|behaviour|postman|tyres|favour|mum|lorry|apologise|=maths|motorcar|=travelling|=travelled|"the lift"|practise|licence|telly

https://tatoeba.org/en/sentences/advanced_search?from=eng&has_audio=yes&native=&orphans=no&query=favourite%7Ccolour%7Ccentre%7Ccinema%7Cneighbour%7Cbehaviour%7Cpostman%7Ctyres%7Cfavour%7Cmum%7Clorry%7Capologise%7C%3Dmaths%7Cmotorcar%7C%3Dtravelling%7C%3Dtravelled%7C%22the+lift%22%7Cpractise%7Clicence%7Ctelly&sort=relevance&sort_reverse=&tags=&to=eng&trans_filter=limit&trans_has_audio=yes&trans_link=&trans_orphan=&trans_to=eng&trans_unapproved=&trans_user=&unapproved=no&user=
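For anyone who wants to vary the word list, here is a minimal Python sketch of how a URL like the one above could be assembled. The parameter names and values are copied from that URL; only the non-empty ones are included, and whether the server accepts the shorter parameter list is an assumption. The names `BASE_URL`, `TERMS` and `build_search_url` are made up for this illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://tatoeba.org/en/sentences/advanced_search"

# Search terms copied from the query above, joined with "|" as in the original.
TERMS = [
    "favourite", "colour", "centre", "cinema", "neighbour", "behaviour",
    "postman", "tyres", "favour", "mum", "lorry", "apologise", "=maths",
    "motorcar", "=travelling", "=travelled", '"the lift"', "practise",
    "licence", "telly",
]


def build_search_url(terms):
    """Return an advanced-search URL for English sentences with audio
    that are linked to other English sentences with audio."""
    # Assumption: parameters that are empty in the original URL can be omitted.
    params = {
        "from": "eng",
        "has_audio": "yes",
        "orphans": "no",
        "query": "|".join(terms),
        "sort": "relevance",
        "to": "eng",
        "trans_filter": "limit",
        "trans_has_audio": "yes",
        "trans_to": "eng",
        "unapproved": "no",
    }
    # urlencode percent-encodes "|", "=" and quotes, and turns spaces into "+".
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url(TERMS))
```

Swapping in a different list of terms gives a similar preview for other spelling or vocabulary differences.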

jiru added a commit that referenced this issue Sep 12, 2021
jiru added a commit that referenced this issue Sep 12, 2021
This column will be used to differentiate
audios belonging to the same sentence.

Refs #183.
jiru added a commit that referenced this issue Sep 12, 2021
jiru added a commit that referenced this issue Sep 12, 2021
jiru added a commit that referenced this issue Sep 12, 2021
This should make life easier for users of the Audio model.

Refs #183.
jiru added a commit that referenced this issue Sep 30, 2021
Refs #183.

Importing a new audio on a sentence already having audio
no longer overwrites the existing audio. The new one is
added instead along with existing ones.

The already existing "audio id" is used to differentiate
several audio files belonging to the same sentence. The file
name is now: <sentenceid>-<audioid>.mp3

This makes audio filenames unique even after they got moved to a
different directory or downloaded, thus avoiding potential mixups.

The current directory structure /<lang>/<sentenceid>.mp3 is not
good practice because the folder tree is not balanced. If, by any
chance, some program tries to browse /eng/ (such as a file indexer),
it takes ages to parse that folder (620k+ files at the moment). So
while I was at it, I reorganized the folder structure to something
more scalable with a more balanced tree, based on the 6 least
significant digits of the audio id.

By the way, I used the following code to do some perf measurements
on the production server on the disk where audio files are stored:

https://gist.github.com/dmke/7f42ba41c777a34845894d7bfb8b16bd

Here are the results:

Ruby 2.5.5 x86_64-linux-gnu, depth 5, iterations 100000
                      user     system      total        real
prep-entries      0.397915   0.028046   0.425961 (  0.425973)
prep-paths       48.487086  24.663179  73.150265 (126.698493)
write-5           3.235112  11.991072  15.226184 ( 52.606463)
read-5            6.674915  20.073439  26.748354 (156.287056)
delete-5         15.319855  19.824883  35.144738 ( 82.240757)
write-4           2.903518   8.197650  11.101168 ( 49.753295)
read-4            1.700867   3.193910   4.894777 (  7.036485)
delete-4         14.444888  18.088059  32.532947 ( 90.465323)
write-3           2.719723   7.325887  10.045610 ( 43.367942)
read-3            1.503995   1.974494   3.478489 (  3.540495)
delete-3         13.942245  17.697191  31.639436 ( 79.525504)
write-2           2.614040   6.904246   9.518286 ( 44.898698)
read-2            1.459150   2.130480   3.589630 (  3.730895)
delete-2          7.905721  10.256166  18.161887 ( 24.583616)
write-1           2.308070   5.635690   7.943760 (  8.327393)
read-1            1.327464   1.688212   3.015676 (  3.294940)
delete-1          2.983404   4.173568   7.156972 (  7.658301)
write-0           2.429562   5.719596   8.149158 (  8.717924)
read-0            1.225677   1.823673   3.049350 (  3.070201)
delete-0          3.330306   4.291284   7.621590 (  7.921333)
https://chart.googleapis.com/chart?cht=bvg&chs=650x450&chd=t:3.07,3.29,3.73,3.54,7.04,156.29|8.72,8.33,44.9,43.37,49.75,52.61&chds=a&chbh=a,1,50&chco=ff7f0e,1f77b4&chtt=File%20access%20time%20for%20100000%20files&chdl=read|write&chxt=x,x,y,y&chxl=1:|depth|3:|time%20[s]|

This commit implements a "depth 2" folder tree.

According to this data, the new folder tree has little impact on file
read performance (3.73 s vs. 3.07 s at depth 0), while writing is about
5 times slower (44.90 s vs. 8.72 s).
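
The naming scheme and folder layout described in the commit message above could look roughly like the following Python sketch. The file name format <sentenceid>-<audioid>.mp3 comes from the commit message. How the 6 least significant digits of the audio id are split across the two directory levels is not stated there, so the 3+3 split below is purely an assumption, and the function names are hypothetical. See the commit itself for the actual layout.

```python
def audio_filename(sentence_id: int, audio_id: int) -> str:
    """File name as described in the commit message: <sentenceid>-<audioid>.mp3."""
    return f"{sentence_id}-{audio_id}.mp3"


def audio_relative_path(sentence_id: int, audio_id: int) -> str:
    """Relative path of an audio file inside a two-level ("depth 2") tree.

    ASSUMPTION: the 6 least significant digits of the audio id are
    zero-padded and split into two 3-digit directory names. The real
    implementation may split them differently.
    """
    digits = f"{audio_id % 1_000_000:06d}"
    first_level, second_level = digits[:3], digits[3:]
    return f"{first_level}/{second_level}/{audio_filename(sentence_id, audio_id)}"


if __name__ == "__main__":
    # Two recordings of the same sentence get distinct names and locations.
    print(audio_relative_path(6482226, 12345))
    print(audio_relative_path(6482226, 12346))
```

Because the audio id is part of the file name, a second recording of the same sentence never collides with the first one, whichever directory it ends up in.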
jiru added a commit that referenced this issue Oct 15, 2021
Accessing the files using that action instead of the mp3
file directly will prevent breaking audio third-party
tools if we ever decide to change file naming again.

Refs #183.
jiru added a commit that referenced this issue Oct 15, 2021
With multiple audio per sentence, there can now be multiple
links here so it’s hard to display this information
clearly. It’s better to hide it and let the user look at
the sentence page instead which will have all the details.

Refs #183.
jiru added a commit that referenced this issue Oct 15, 2021
Clicking on the audio button now plays the first audio, and
if clicked again, plays the next audio, and so on, starting over
from the first one after all have been played. The tooltip
also gets updated with authorship information regarding the
audio that is "next to be played".

Refs #183.
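
The click behaviour described in the commit message above amounts to a round-robin over the sentence's audio files. Here is a minimal Python sketch of that logic, not the actual front-end code. The class name `AudioCycler`, the `author` and `file` keys, the sample data and the tooltip wording are all invented for this illustration.

```python
class AudioCycler:
    """Round-robin over the audio files of one sentence (illustrative only)."""

    def __init__(self, audios):
        if not audios:
            raise ValueError("a sentence needs at least one audio to cycle through")
        self._audios = list(audios)
        self._next = 0  # index of the audio that is "next to be played"

    def tooltip(self) -> str:
        # The tooltip describes the audio that the next click will play.
        return f"Play audio recorded by {self._audios[self._next]['author']}"

    def click(self):
        """Return the audio to play now and advance, wrapping around at the end."""
        audio = self._audios[self._next]
        self._next = (self._next + 1) % len(self._audios)
        return audio


if __name__ == "__main__":
    # Hypothetical data: two recordings of the same sentence.
    cycler = AudioCycler([
        {"author": "alice", "file": "1-10.mp3"},
        {"author": "bob", "file": "1-11.mp3"},
    ])
    for _ in range(3):
        print(cycler.tooltip(), "->", cycler.click()["file"])
```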
jiru added a commit that referenced this issue Oct 15, 2021
So that we can easily use it outside the sentence too,
such as in the audio details section of the sentence page.

In the audio details section, we can play a particular
audio by clicking on its icon.

Refs #183.
Member

jiru commented May 29, 2022

I implemented this.

jiru closed this as completed May 29, 2022

ckjpn commented May 29, 2022

Maybe it's not a problem, but the total number of sentences with audio went down by 4.

Yesterday, when I finished uploading files it was this.
Sentences with audio (total 998,888)

This morning (my time), it was this.
Sentences with audio (total 998,884)

Perhaps an admin unlinked 4 audio files overnight. That might be the reason.

URL: https://tatoeba.org/en/audio/index


ckjpn commented May 29, 2022

This may be a problem.

I've only tried it a few times, but the following page took a very long time to load.

https://tatoeba.org/en/audio/index

Twice when I tried it, I got the error message that is displayed when we have time-out errors.

Tatoeba is currently unavailable. We are sorry for the inconvenience. You can check our blog or Twitter for more information.

(Not really a time-out error message, but a message saying that tatoeba.org is offline.)


ckjpn commented May 30, 2022

Note that it's implied that one audio file can be disabled, leaving the other one enabled.
However, the save button doesn't actually save the setting.

https://tatoeba.org/en/sentences/show/2958714
[Screenshot: Screen Shot 2022-05-30 at 9 53 54]


ckjpn commented May 30, 2022

Note that it is possible to have one audio file disabled and another one enabled.
To do so, I had to uncheck "is enabled" and save the earlier audio file before importing the new file.

https://tatoeba.org/en/sentences/show/10869364
Temporarily, I left this online for you, but I plan to delete the first one in the near future.


ckjpn commented May 30, 2022

Here is one example with about 30 audio files.

https://tatoeba.org/en/sentences/show/280288
Birds of a feather flock together.


ckjpn commented May 30, 2022

This seems to be a bug.
I can't disable this one to edit the text.

https://tatoeba.org/en/sentences/show/3991877

It's interesting that I could disable the audio on a sentence I owned, but not on this one, which has another owner. I wonder if that is the reason.


4 participants