allow multiple audio files per sentence #183
Comments
For the record, this is much needed for Hebrew. Users are currently recording multiple versions of the sentences so that they can be added once we add support for it. |
While maybe not a high priority, this is something that definitely needs to be done eventually.
Example:
|
Here is another example of why it would be nice to eventually be able to have more than one audio file per sentence. This same sentence can have a rising intonation at the end of the sentence or a falling intonation. http://www.manythings.org/tatoeba/6249075.mp3 https://tatoeba.org/sentences/show/6249075 Here's another example. https://audio.tatoeba.org/sentences/eng/6482226.mp3 |
To whoever will implement this: read commit 61e9e43. |
Here are 2 mp3 files of the same sentence with intonation differences. You don't think I'm going to do that, do you? |
@ckjpn Thanks! I think you could as well upload these yet-to-be-imported recordings to our server pretty much the same way you upload regular audio. I just set up a new directory called |
I created a folder in that folder named "eng." These seem to be the situations for why there can be different recordings.
At this point, 3, 4, 5 are only accidental duplications, since I try to make sure that I don't record the same sentence again, and that others aren't duplicating each other's work. |
To get an idea of audio files in different dialects, you can use this advanced search. Search for English with audio for a number of words with UK spelling, or vocabulary, linked to other English sentences with audio. A number of these are the same sentence with more than one dialect. One possible query: favourite|colour|centre|cinema|neighbour|behaviour|postman|tyres|favour|mum|lorry|apologise|=maths|motorcar|=travelling|=travelled|"the lift"|practise|licence|telly |
This column will be used to differentiate audios belonging to the same sentence. Refs #183.
This should make life easier for users of the Audio model. Refs #183.
Refs #183.

Importing a new audio on a sentence already having audio no longer overwrites the existing audio. The new one is added instead along with existing ones. The already existing "audio id" is used to differentiate several audio files belonging to the same sentence.

The file name is now: <sentenceid>-<audioid>.mp3

This makes audio filenames unique even after they get moved to a different directory or downloaded, thus avoiding potential mixups.

The current directory structure /<lang>/<sentenceid>.mp3 is not good practice because the folder tree is not balanced. If, by any chance, some program tries to browse /eng/ (such as a file indexer), it takes ages to parse that folder (620k+ files at the moment). So while I was at it, I reorganized the folder structure into something more scalable with a more balanced tree, based on the 6 least significant digits of the audio id.

By the way, I used the following code to do some perf measurements on the production server, on the disk where audio files are stored:
https://gist.github.com/dmke/7f42ba41c777a34845894d7bfb8b16bd

Here are the results:

Ruby 2.5.5 x86_64-linux-gnu, depth 5, iterations 100000

                   user     system      total         real
prep-entries   0.397915   0.028046   0.425961 (  0.425973)
prep-paths    48.487086  24.663179  73.150265 (126.698493)
write-5        3.235112  11.991072  15.226184 ( 52.606463)
read-5         6.674915  20.073439  26.748354 (156.287056)
delete-5      15.319855  19.824883  35.144738 ( 82.240757)
write-4        2.903518   8.197650  11.101168 ( 49.753295)
read-4         1.700867   3.193910   4.894777 (  7.036485)
delete-4      14.444888  18.088059  32.532947 ( 90.465323)
write-3        2.719723   7.325887  10.045610 ( 43.367942)
read-3         1.503995   1.974494   3.478489 (  3.540495)
delete-3      13.942245  17.697191  31.639436 ( 79.525504)
write-2        2.614040   6.904246   9.518286 ( 44.898698)
read-2         1.459150   2.130480   3.589630 (  3.730895)
delete-2       7.905721  10.256166  18.161887 ( 24.583616)
write-1        2.308070   5.635690   7.943760 (  8.327393)
read-1         1.327464   1.688212   3.015676 (  3.294940)
delete-1       2.983404   4.173568   7.156972 (  7.658301)
write-0        2.429562   5.719596   8.149158 (  8.717924)
read-0         1.225677   1.823673   3.049350 (  3.070201)
delete-0       3.330306   4.291284   7.621590 (  7.921333)

https://chart.googleapis.com/chart?cht=bvg&chs=650x450&chd=t:3.07,3.29,3.73,3.54,7.04,156.29|8.72,8.33,44.9,43.37,49.75,52.61&chds=a&chbh=a,1,50&chco=ff7f0e,1f77b4&chtt=File%20access%20time%20for%20100000%20files&chdl=read|write&chxt=x,x,y,y&chxl=1:|depth|3:|time%20[s]|

This commit implements a "depth 2" folder tree. According to this data, the new folder tree barely affects file read performance (3.73 s at depth 2 vs. 3.07 s at depth 0 for 100000 files), while writing is about 5 times slower (44.90 s vs. 8.72 s).
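For readers who want to picture the new layout, here is a minimal sketch of how such a path could be derived. The file name format <sentenceid>-<audioid>.mp3 and the use of the 6 least significant digits of the audio id come from the commit message above; the split into two three-digit folder levels, the zero padding, and the function name `audio_path` are assumptions made for illustration and may not match the actual implementation.

```python
from pathlib import PurePosixPath


def audio_path(sentence_id: int, audio_id: int) -> PurePosixPath:
    """Build a relative path for an audio file (hypothetical helper)."""
    # Keep only the 6 least significant digits of the audio id, zero-padded.
    digits = f"{audio_id % 1_000_000:06d}"
    # ASSUMPTION: "depth 2" means two folder levels of three digits each.
    level1, level2 = digits[:3], digits[3:]
    # File name format from the commit message: <sentenceid>-<audioid>.mp3
    return PurePosixPath(level1) / level2 / f"{sentence_id}-{audio_id}.mp3"


if __name__ == "__main__":
    print(audio_path(6249075, 1234567))
```

Because the audio id is part of the file name, two recordings of the same sentence never collide, whatever folder they end up in.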
Accessing the files using that action instead of the mp3 file directly will prevent breaking audio third-party tools if we ever decide to change file naming again. Refs #183.
With multiple audio per sentence, there can now be multiple links here so it’s hard to display this information clearly. It’s better to hide it and let the user look at the sentence page instead which will have all the details. Refs #183.
Clicking on the audio button now plays the first audio, and if clicked again, plays the next audio, etc., and starts over from the first one after all have been played. The tooltip also gets updated with authorship information regarding the audio that is "next to be played". Refs #183.
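The cycling behavior described in this commit can be sketched roughly as below. This is not the site's front-end code; the class name `AudioCycler`, the `author` field, and the tooltip wording are all hypothetical and only illustrate the "play next, then wrap around" logic.

```python
class AudioCycler:
    """Cycle through the audio files of one sentence (illustrative sketch)."""

    def __init__(self, audios):
        if not audios:
            raise ValueError("a sentence needs at least one audio to cycle")
        self._audios = list(audios)
        self._next = 0  # index of the audio that is "next to be played"

    def tooltip(self) -> str:
        # The tooltip describes the audio that the next click will play.
        return f"Play audio recorded by {self._audios[self._next]['author']}"

    def click(self) -> dict:
        # Return the audio to play, then advance, wrapping back to the first.
        audio = self._audios[self._next]
        self._next = (self._next + 1) % len(self._audios)
        return audio


if __name__ == "__main__":
    cycler = AudioCycler([{"author": "alice"}, {"author": "bob"}])
    for _ in range(3):
        print(cycler.tooltip(), "->", cycler.click())
```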
So that we can easily use it outside the sentence too, such as in the audio details section of the sentence page. In the audio details section, we can play a particular audio by clicking on its icon. Refs #183.
I implemented this. |
Maybe it's not a problem, but the number of total audio files went down by 4. Yesterday, when I finished uploading files it was this. This morning (my time), it was this. Perhaps an admin unlinked 4 audio files overnight. That might be the reason. |
This may be a problem. I've only tried it a few times, but the following page took a very long time to load. https://tatoeba.org/en/audio/index Twice when I tried it, I got the error message that gets displayed when we have time-out errors.
(Not really a time-out error message, but a message saying that tatoeba.org is offline.) |
Note that it's implied that one audio file can be disabled, leaving the other one enabled. |
Note that it is possible to have one audio file disabled and another one enabled. https://tatoeba.org/en/sentences/show/10869364 |
Here is one example with about 30 audio files. https://tatoeba.org/en/sentences/show/280288 |
This seems to be a bug. https://tatoeba.org/en/sentences/show/3991877 It's interesting that I could disable the audio on a sentence I owned, but not on this one, which has another owner. I wonder if that is the reason. |
This would of course require changes to the database schema and UI, but it would help users, who repeatedly request this feature. Assuming we have a working duplicate merging script (see #182) by the time we make this change, we'd need to make sure that we preserve the links to the various audio files for each merged sentence.
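As a rough illustration of the merging concern, the sketch below re-points the audio rows of merged duplicates at the sentence that is kept, so no recording is lost. The row layout (dicts with `id` and `sentence_id` keys) and the function name `merge_audio_links` are hypothetical; the real schema and the merging script from #182 may look quite different.

```python
def merge_audio_links(audio_rows, duplicate_ids, kept_id):
    """Return audio rows with duplicates' audio re-linked to kept_id.

    Hypothetical helper: each row is a dict with at least "id" and
    "sentence_id". Rows for unrelated sentences are returned unchanged.
    """
    duplicates = set(duplicate_ids)
    return [
        {**row, "sentence_id": kept_id}
        if row["sentence_id"] in duplicates
        else row
        for row in audio_rows
    ]


if __name__ == "__main__":
    rows = [
        {"id": 1, "sentence_id": 100},
        {"id": 2, "sentence_id": 200},  # 200 is a duplicate of 100
        {"id": 3, "sentence_id": 300},
    ]
    print(merge_audio_links(rows, duplicate_ids=[200], kept_id=100))
```

After the merge, the kept sentence simply ends up with several audio rows, which is exactly the situation this issue asks to support.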