WIP: Adding Transcription/Subtitle Viewing Support to the Web Player (VTT) #2918

mfcar · 2024-05-04T20:49:00Z

I have begun work on adding transcription support to the Web Player.
I've used Whisper to generate transcriptions for some audiobooks and podcasts. Many tools based on Whisper support exports in VTT and SRT formats.
For this pull request, I'm only supporting VTT as it is natively supported by browsers. Support for SRT can be added in a future pull request.

How does it work?

A new endpoint, api/items/:id/file/:fileid/transcript, has been created on the backend. This endpoint attempts to return a transcription for each audio track. For instance, if there's an audio file named adventuresherlockholmes_01_doyle_64kb.mp3, this endpoint will attempt to return the file adventuresherlockholmes_01_doyle_64kb.vtt.
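The filename mapping described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `transcriptPathFor` is a hypothetical helper name, and it assumes the transcript sits next to the audio file with the same base name.

```javascript
const path = require('path')

// Hypothetical helper (not taken from the PR): derive the sibling .vtt path
// for an audio file by swapping only the final extension.
function transcriptPathFor(audioPath) {
  const { dir, name } = path.parse(audioPath)
  return path.join(dir, `${name}.vtt`)
}

console.log(transcriptPathFor('adventuresherlockholmes_01_doyle_64kb.mp3'))
```

Using `path.parse` rather than a string replace keeps dots inside the base name intact, so only the last extension is swapped.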

On the frontend, when an audio file is set as the src of the <audio> element, a <track> element is created and attached to that <audio>. The src of the <track> is populated with the link to the aforementioned endpoint.
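A rough sketch of that wiring is shown below. The helper names (`transcriptUrl`, `attachTranscriptTrack`), the `kind`, and the `srclang` value are assumptions for illustration; the actual component code in this PR may differ.

```javascript
// Hypothetical helper: build the transcript endpoint URL for one audio file.
function transcriptUrl(libraryItemId, fileId) {
  return `/api/items/${encodeURIComponent(libraryItemId)}/file/${encodeURIComponent(fileId)}/transcript`
}

// Attach a <track> to an existing <audio> element. `doc` is the DOM document.
function attachTranscriptTrack(doc, audioEl, libraryItemId, fileId) {
  // Drop any track left over from the previously loaded audio file
  audioEl.querySelectorAll('track').forEach((t) => t.remove())

  const trackEl = doc.createElement('track')
  trackEl.kind = 'subtitles'
  trackEl.srclang = 'en' // placeholder, the real language is not known here
  trackEl.src = transcriptUrl(libraryItemId, fileId)
  audioEl.appendChild(trackEl)

  // 'hidden' makes the browser load the cues and fire cuechange events
  // without rendering them itself, so a custom UI can display them
  trackEl.track.mode = 'hidden'
  return trackEl
}
```

In a browser this would be called as `attachTranscriptTrack(document, audioEl, itemId, fileId)` each time the audio source changes.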

What does this PR support?

Show/Hide transcription block
Highlighting the current transcription line
Clicking on a line to seek the player to that time
Changing transcriptions when the audio file changes (supports audiobooks and podcasts)
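Highlighting and click-to-seek both come down to mapping between playback time and cues. The sketch below is one possible approach, not necessarily what the PR does: it assumes cues are sorted by start time and do not overlap, and the function names are made up for illustration.

```javascript
// Return the index of the cue that contains `time`, or -1 if none does.
// `cues` is an array-like of objects with startTime/endTime in seconds
// (the shape of VTTCue objects), sorted and non-overlapping.
function findActiveCueIndex(cues, time) {
  let lo = 0
  let hi = cues.length - 1
  while (lo <= hi) {
    const mid = (lo + hi) >> 1
    if (time < cues[mid].startTime) hi = mid - 1
    else if (time >= cues[mid].endTime) lo = mid + 1
    else return mid
  }
  return -1
}

// Seek the player to the start of the clicked cue.
function seekToCue(audioEl, cue) {
  audioEl.currentTime = cue.startTime
}
```

Computing the index from `audioEl.currentTime` directly, rather than waiting for the next `cuechange` event, lets the highlighted line be refreshed on demand, for example right after a new track has loaded.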

Demo

Screen.Recording.2024-05-04.at.20.12.35.mov

What is missing for the scope of this PR

Hiding the "Show transcription" button when the transcription is not available for the audio file

Known issues

When playing an audio file with transcription, if you close the web player and reopen it, the transcription block is not displayed, even though the transcription is still available. Clicking the "Show transcription" button displays the block again. I think this is related to the MediaPlayerContainer.vue component not reloading the TranscriptionUi component.

Screen.Recording.2024-05-04.at.14.14.37.mov

When playing an audio file with transcription, if you change the audio file, the active transcription line for the new audio file focuses on the first line. The focus shifts to the correct line only when the next line change occurs.

@@ -116,6 +116,7 @@
    this.router.post('/items/:id/chapters', LibraryItemController.middleware.bind(this), LibraryItemController.updateMediaChapters.bind(this))
    this.router.get('/items/:id/ffprobe/:fileid', LibraryItemController.middleware.bind(this), LibraryItemController.getFFprobeData.bind(this))
    this.router.get('/items/:id/file/:fileid', LibraryItemController.middleware.bind(this), LibraryItemController.getLibraryFile.bind(this))
+    this.router.get('/items/:id/file/:fileid/transcript', LibraryItemController.middleware.bind(this), LibraryItemController.getTranscriptionFile.bind(this))


barolo · 2024-05-28T00:46:25Z

The placement irks me for some reason. I think that this feature demands something like "Now Playing" screen.
But even in this form I really really want this feature in.
My sister is hearing impaired and this would really help her.

mfcar · 2024-05-28T20:44:22Z

> The placement irks me for some reason. I think that this feature demands something like "Now Playing" screen. But even in this form I really really want this feature in. My sister is hearing impaired and this would really help her.

I also don't like the placement. I was thinking of putting it in a side panel or a floating, movable modal.
However, the side panel raises concerns about taking up too much space, especially if the user has a narrow display.
The floating, movable modal adds more complexity to the JavaScript and CSS.
I will run some tests with both approaches and try to post updates here.

Sidebar like Apple Music:

Floating transcription window:

ashwinm4friends · 2024-06-02T17:48:25Z

Great job on the project! For UX improvement, please consider looking into word highlighting in Snipd, as shown in these videos: https://www.youtube.com/watch?v=jBi-OId37Uw
https://www.youtube.com/watch?v=jzPekGpC4uw

barolo · 2024-06-02T20:01:02Z

> Great job on the project! For UX improvement, please consider looking into word highlighting in Snipd, as shown in this video: https://www.youtube.com/watch?v=jBi-OId37Uw https://www.youtube.com/watch?v=jzPekGpC4uw

Snipd uses word-level timestamps. While such subs are easy to generate, the only sane format for them is SSA/ASS, afaik (SRT can blow up into megabytes, which is insane), and SSA/ASS is not natively supported by browsers.
No idea if it's possible with VTT.
(I'm guessing that Snipd just uses raw JSON or something, since the subs can be generated on the fly.)

mfcar · 2024-06-02T22:14:22Z

> Great job on the project! For UX improvement, please consider looking into word highlighting in Snipd, as shown in this video: https://www.youtube.com/watch?v=jBi-OId37Uw https://www.youtube.com/watch?v=jzPekGpC4uw
>
> Snipd uses word level timestamps, while such subs are easy to generate the only sane format is ssa/ass (srt can blow into megabytes which is insane) afaik. Which is not natively supported by browsers. No idea if it's possible with vtt (and I'm guessing that snippd just uses raw JSON or something since the subs can be generated on the fly)

Take a look at WebVTT: it supports something similar to "karaoke style" using the :past and :future pseudo-classes. However, the VTT files need to be adapted for this as well (the cue text has to carry inline timestamps), and I think it's not common to get a VTT file with this information.
I was using Whisper to generate transcriptions, but I'm not sure if we can generate word-by-word transcriptions with it.
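For reference, WebVTT expresses karaoke-style timing with timestamp tags inside the cue text; text before the current playback time matches `:past` and text after it matches `:future`. The sketch below builds one such cue from word timings. The input shape (`{ text, start }`) and the function names are hypothetical, and it assumes every word after the first starts strictly inside the cue's time range.

```javascript
// Format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatVttTime(seconds) {
  const ms = Math.round(seconds * 1000)
  const pad = (n, width = 2) => String(n).padStart(width, '0')
  const h = Math.floor(ms / 3600000)
  const m = Math.floor((ms % 3600000) / 60000)
  const s = Math.floor((ms % 60000) / 1000)
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(ms % 1000, 3)}`
}

// Build one cue whose text has a timestamp tag before every word except
// the first (the first word starts with the cue itself).
// `words` is an assumed shape: [{ text, start }], with start in seconds.
function buildKaraokeCue(start, end, words) {
  const text = words
    .map((w, i) => (i === 0 ? w.text : `<${formatVttTime(w.start)}>${w.text}`))
    .join(' ')
  return `${formatVttTime(start)} --> ${formatVttTime(end)}\n${text}`
}

console.log(buildKaraokeCue(1, 2.5, [{ text: 'Hello', start: 1 }, { text: 'world', start: 1.75 }]))
```

When the browser renders the cues itself, the words can be styled with the `::cue(:past)` and `::cue(:future)` selectors; a custom transcription panel would need to read the inline timestamps and apply its own highlighting.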

As for SSA/ASS and SRT support, I've been checking what the best approach is. I was considering converting them to VTT to keep the implementation consistent with how we show the transcriptions, but I'm not sure yet if this is the best way.
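For SRT, the conversion to VTT is mostly mechanical: add the `WEBVTT` header and switch the millisecond separator in timing lines from a comma to a dot. The following is a minimal sketch for the common case only, with a hypothetical function name; it does not handle SRT styling tags or positioning, and SSA/ASS would need a real parser.

```javascript
// Convert SRT text to WebVTT text (common case only).
function srtToVtt(srt) {
  const body = srt
    .replace(/^\uFEFF/, '') // strip a byte order mark if present
    .replace(/\r\n?/g, '\n') // normalize line endings
    .split('\n')
    .map((line) =>
      // Only touch timing lines, so commas in the spoken text are left alone
      line.includes('-->') ? line.replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, '$1.$2') : line
    )
    .join('\n')
    .trim()
  return `WEBVTT\n\n${body}\n`
}
```

The numeric counters that precede each SRT entry can stay in place, since WebVTT accepts them as cue identifiers.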

barolo · 2024-06-02T22:26:49Z

@mfcar I've used https://github.com/jianfch/stable-ts to generate ASS/SSA karaoke-style captions with a custom style for my podcasts/books. I don't remember if VTT is one of the options.
Whisper.cpp can spit out word-level output too, but you have to process it with a script to get a valid subs file.

ashwinm4friends · 2024-06-03T02:53:05Z

In the past, I have used stable-ts to create VTT files. I generated word-level timestamps with Whisper’s base.en model.

mfcar added 9 commits May 4, 2024 11:00

Initial transcription support

b37a863

Avoid duplicated code

8e5fc4a

Add seek support to transcriptions

ee8e7cf

Automatically scrolls to active cue when enable/disable the transcription panel

282203a

Avoid error: "Cannot read properties of null (reading 'track')"

68d4dac

Fix formatting

35f51f4

Remove semicolon

bfcf4e3

Fix small bug on the AudioTrack

1a9aaf1

Add support to recognize srt and vtt as subtitles formats on the file table

2f515cc

github-advanced-security bot found potential problems May 4, 2024

mfcar mentioned this pull request May 4, 2024

[Enhancement]: Adding Transcription/Subtitle Viewing Support #2919

Open

18 tasks

mfcar changed the title ~~WIP: Adding Transcription Support to the Web Player (VTT)~~ WIP: Adding Transcription Playing Support to the Web Player (VTT) May 4, 2024

mfcar changed the title ~~WIP: Adding Transcription Playing Support to the Web Player (VTT)~~ WIP: Adding Transcription/Subtitle Viewing Support to the Web Player (VTT) May 5, 2024
