Support for podcast:transcript tag #4935

Closed
3 tasks done
tonytamsf opened this issue Feb 15, 2021 · 66 comments · Fixed by #7186
Labels
Area: Podcast Index / Podcasting 2.0, Type: Feature request

Comments

@tonytamsf
Member

tonytamsf commented Feb 15, 2021

Checklist

  • I have used the search function to see if someone else has already submitted the same feature request.
  • I will only create one feature request per issue.
  • I will describe the problem with as much detail as possible

Feature description

Some podcasts include a text transcript using the podcast:transcript tag.

Suggested solution:
Display transcript and sync it with the audio.

Meeting notes from Nov 2nd, 2023 #4935 (comment)

@ByteHamster added the Area: Podcast Index / Podcasting 2.0 label Feb 15, 2021
@tonytamsf
Member Author

@keunes or @ByteHamster please assign this to me. Since I have the podcastindex namespace parsing done in #4933, I could tackle podcast:transcript.

@joelostblom

I have a few suggestions for transcript-related features that I think would be great to include in AntennaPod (originally mentioned in #5002):

  1. Optionally including timestamps when copying from the transcript. This would be helpful when using a note app to write down interesting segments from the podcast, as it makes it easier to go back to the podcast and relisten to the particular segment.
  2. Having the transcript autoscroll as the episode is playing, so that when switching to the transcript tab, the text is in the same position as the audio. This would facilitate note-taking while listening.
  3. Searching the transcript. I am referring to searching within a single transcript, which would make it easier to find interesting segments (which in turn could be navigated to if combined with the item below).
  4. Using the transcript for seeking in the audio (clicking sentences to jump to that position).

There are currently at least two podcast apps on iOS that provide some of these features, but none on Android as far as I know. Those apps are https://podcast9.com/ and https://www.airr.io/. They are labeling themselves as more educational / productive alternatives to regular podcast apps and their main value proposition is the ability to interact with and remember what you listen to as well as easily share your favorite bits with others.

A future item to consider which might be out of scope currently is to automatically create transcripts for episodes that don't provide it, e.g. using the live transcribe feature on phones where it is available.

@keunes
Member

keunes commented Mar 7, 2021

Nice ideas 👍 My guess would be that # 4 wouldn't be too difficult. But for the others, I think we (the app) would need to receive the timestamps with the transcript - as I reckon it's not feasible for AP to automatically match the text with the audio. Do you know if transcripts typically have such timestamps?

@joelostblom

Thanks @keunes ! I am not very experienced with podcast data and metadata, but I looked around a bit and it seems like at least some transcripts provide this. The example used on the podcastindex API docs page includes this segment:

transcriptUrl: "https://mp3s.nashownotes.com/NA-1322-Captions.srt"

That srt file includes timestamps which look like this:

1
00:00:00,000 --> 00:00:05,340
Pull my mask over my nose. Adam
Curry John C Dvorak, Thursday,

2
00:00:05,340 --> 00:00:09,030
February 18 2021. This is your
award winning media

3
00:00:09,030 --> 00:00:13,380
assassination Episode 13. This
is no agenda.

...

I am not sure how common it is, but at least it exists.

A complementary approach that is less robust but does not rely on external data could be to match the relative position of sentences in the transcript with the relative time of the audio file (when no timestamps are available). Even if this was only able to generate precision to the minute, I think it would be helpful for most items I mentioned above. For some, like the scrolling text, it would probably perform quite well since so much of the transcript would be shown on the screen that the matched position does not have to be exact. It would probably benefit from calculating the relative audio position without silences (I noted that there is already an option in AP to skip these when listening).
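A minimal sketch of that fallback, assuming only the plain transcript text and the episode duration are available (the names are illustrative, not AntennaPod code):

// Estimate a playback position for a character offset in a transcript that has
// no timing information, by mapping relative text position to relative time.
static long estimatePositionMs(int charOffset, int totalChars, long durationMs) {
    if (totalChars <= 0) {
        return 0;
    }
    double relativePosition = (double) charOffset / totalChars;
    return Math.round(relativePosition * durationMs);
}

As noted above, this would only give rough, roughly minute-level precision, since speaking rate varies and silences skew the mapping.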

@ueen
Contributor

ueen commented Apr 2, 2021

I'd like to add an idea mentioned in the issue above: hide empty tabs - that could be empty transcripts and especially nonexistent chapters. If only one tab remains, the tab layout itself should probably be hidden.

I think that's a sensible solution: it avoids crowding the AudioPlayer with empty tabs and requiring the user to swipe a thousand times to get to the desired tab, which makes the UX clearer :)

@ByteHamster
Member

@tonytamsf Any idea how to implement the data store? Downloading all transcripts of all episodes could result in a huge amount of text to be stored. We already have the problem that the database is pretty big. I think it would be better to only store the URLs and then load the transcript when actually viewing the transcript page in AntennaPod. We could enable caching with okhttp to reduce the number of web requests.
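For reference, OkHttp's response cache only needs a directory and a size limit; a rough sketch (cache location and size are placeholder choices, not AntennaPod's actual configuration):

import java.io.File;
import android.content.Context;
import okhttp3.Cache;
import okhttp3.OkHttpClient;

// Sketch: an OkHttpClient with a small disk cache, so re-opening the transcript
// page does not re-download the file every time.
static OkHttpClient newTranscriptClient(Context context) {
    File cacheDir = new File(context.getCacheDir(), "transcript-cache");
    return new OkHttpClient.Builder()
            .cache(new Cache(cacheDir, 10L * 1024 * 1024)) // 10 MiB, arbitrary example size
            .build();
}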

@tonytamsf
Member Author

@tonytamsf Any idea how to implement the data store? Downloading all transcripts of all episodes could result in a huge amount of text to be stored. We already have the problem that the database is pretty big. I think it would be better to only store the URLs and then load the transcript when actually viewing the transcript page in AntennaPod. We could enable caching with okhttp to reduce the number of web requests.

I like your idea; there is no need to store the transcription. Depending on the user, they may not want to download transcripts over mobile data, so we might have to get user consent to use mobile data for transcript downloads.

@Matth7878

Matth7878 commented Apr 6, 2021

Why not store transcripts as text files on the device, the same way episodes are stored? They would be removed along with the episodes, and it would make them available in the same way episodes are.

Edit:
And it won't clutter the database, except maybe for a flag to know that a transcript is downloaded. I am not sure that's even needed, as all the logic would parallel what is done for downloaded episodes.

@ByteHamster
Member

Sounds like a good idea. When streaming, users need a data connection anyways. When downloading, we can download the transcripts, too. 👍

@keunes
Member

keunes commented Apr 8, 2021

Why not storing transcript on file text on device?

I'm just wondering: this wouldn't hinder 'processing' the info (esp. the timestamps, for # 4 in the OP), would it?

When downloading, we can download the transcripts, too

Just to throw an idea out: couldn't this be done, but then in the db? As in, download & store in the db when the episode is downloaded; remove from the db when the file is deleted? (Don't know if this would have any actual benefit, see above.)

@keunes added the Needs: Mock-up or user story label Apr 8, 2021
@Matth7878

I don't think storing and using text files would cause any problem: it is what video players do with SRT files for subtitles.

@ByteHamster
Member

Storing it in the database but only for downloaded episodes should work :) Then we don't need to store the file path, we can just store the transcript directly.

@Matth7878

Don't know if you saw how Podverse implemented transcripts with search. For reference, there is a video of what they did: https://twitter.com/Podverse/status/1413642917760184325?s=19

IMHO about their screen :

  • I would prefer the timecode on the left side, with a smaller font and grey as the text color. That way the timecode would be there but wouldn't distract your eyes when reading the transcription
  • I wouldn't separate the blocks like they did. It's like there is an empty line between each segment, which is annoying when reading

@keunes
Member

keunes commented Jul 11, 2021

Thanks for sharing that @Matth7878!

I would prefer timecode on left side and with a smaller font and using grey as text color

Fully agree

I wouldn't separate block like they did

I understand your point, but I don't think we can really merge sentences that are separated into two blocks:

  • we don't know where we can stitch parts together, and where a hard enter should be kept (in the interface). If there's no whitespace, it might still be complicated to read:
    [screenshot]
  • if we do stitch together the sentences, there might be a situation where one line on the screen covers three lines in the transcript file. Then we can only display the first timestamp, which can be problematic in some cases (e.g. when there is a line with [silence] that takes a minute - you wouldn't be able to jump to the bit after the silence).

@tonytamsf
Member Author

Let me take a stab at getting the podcast:transcript tag downloaded and stored. Then we can work on the UI in the next step.

@tonytamsf
Member Author

Let me take a stab at getting the podcast:transcript tag downloaded and stored. Then we can work on the UI in the next step.

Sorry, I have been taking a break from AntennaPod. I'm unassigning myself until I get back.

@tonytamsf self-assigned this Feb 16, 2023
@tonytamsf
Member Author

We will be targeting the transcript branch on GitHub for this work, starting with PR #6739.

@tonytamsf
Member Author

tonytamsf commented Nov 8, 2023

I have a test build of how we can display transcripts in 2 lines of text on the player screen.

It works for JSON (word by word) or SRT (a few seconds of text).
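For context, the Podcasting 2.0 JSON transcript format looks roughly like this (values invented for illustration); each segment often covers a single word, which is why most segments are shorter than one second:

{
  "version": "1.0.0",
  "segments": [
    { "speaker": "Adam", "startTime": 0.8, "endTime": 1.1, "body": "Pull " },
    { "speaker": "Adam", "startTime": 1.1, "endTime": 1.4, "body": "my " },
    { "speaker": "Adam", "startTime": 1.4, "endTime": 1.9, "body": "mask " }
  ]
}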

The way I am able to display the transcript nicely is as follows (see the sketch after this list):

  • join up segments of text into chunks of at least one second; most JSON segments are shorter than one second
  • string together N segments and display them on the two-line TextView, which has support for ellipsizing
  • when I detect an ellipsis in the TextView, which means words are cut off at the end, start trimming words from the end until the display no longer has an ellipsis
  • move the words that have been ellipsized to the next segment, and trim a little more to get some buffer
  • calculate a reasonable new start time for the next segment (based on the percentage of words trimmed)
  • display the text in sync with the position of the audio
  • move to the next segment
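A rough sketch of the trimming step, measuring with StaticLayout instead of reading the TextView's ellipsis count (a hypothetical helper, not the code in the test build):

import android.text.StaticLayout;
import android.text.TextPaint;
import android.text.TextUtils;
import java.util.Arrays;

// Returns how many of the given words fit into maxLines lines of width widthPx
// without being cut off; the caller moves the remaining words (and a
// proportional share of the segment's duration) to the next segment.
static int wordsThatFit(String[] words, TextPaint paint, int widthPx, int maxLines) {
    for (int count = words.length; count > 0; count--) {
        String candidate = TextUtils.join(" ", Arrays.copyOf(words, count));
        StaticLayout layout = StaticLayout.Builder
                .obtain(candidate, 0, candidate.length(), paint, widthPx)
                .build();
        if (layout.getLineCount() <= maxLines) {
            return count;
        }
    }
    return 0;
}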

@keunes I would like to give this test build to you and maybe just one or two others (maybe @jamescridland) to test out against different podcasts which have transcripts, to make sure the algorithm works. I would like to do this before the code reviews, because those will take many weeks, during which we can fine-tune how transcripts will function in the UI.

note: detecting words will be difficult with languages like Chinese and Arabic

@tonytamsf
Member Author

@keunes - I have a test build of the transcript functionality in the 3rd PR; it will display 2 lines of transcript on the player screen. When you have time, please check it out. If you want more podcasts that have the transcript tag, to see the variety of how folks format their transcripts: I have crawled all 4 million podcasts from podcastindex.org and found the 15,233 URLs of podcasts which have at least one podcast:transcript tag.

@tonytamsf
Member Author

Just updating on the progress: in the transcript branch, the 2nd PR, which downloads transcripts from the URL in podcast:transcript, has been merged (#6797).

The third PR parses the SRT and JSON formats, and that is being reviewed in #6852.
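For anyone following along, the fiddly part of SRT is the cue timestamps; a minimal parsing sketch (not the parser from the PR):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses an SRT timestamp such as "00:00:05,340" into milliseconds.
static long parseSrtTimestampMs(String timestamp) {
    Matcher m = Pattern.compile("(\\d+):(\\d+):(\\d+)[,.](\\d+)").matcher(timestamp.trim());
    if (!m.matches()) {
        throw new IllegalArgumentException("Not an SRT timestamp: " + timestamp);
    }
    long hours = Long.parseLong(m.group(1));
    long minutes = Long.parseLong(m.group(2));
    long seconds = Long.parseLong(m.group(3));
    long millis = Long.parseLong(m.group(4));
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
}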

@tonytamsf
Member Author

4th PR to display 2 lines of transcript on the cover #6912

@tonytamsf
Member Author

@keunes - take a look at the preview of the iOS Podcasts player and how the UX is for transcripts; play the video at https://podnews.net/article/apple-podcasts-transcriptions-faq

  • Looks like they don't display 2 lines of transcript and just go with a full-screen display
  • I like that they retain some basic episode information at the top
  • They do word-by-word highlighting and increase the font size of the current line

@tonytamsf
Member Author

Also adding some interesting stats here: https://podcastindustryinsights.com/podcasting-2-0-features/

Currently: 46,565 podcasts with transcripts, and 2,088,707 episodes with transcripts.

@keunes
Member

keunes commented Feb 25, 2024

Looks like they don't display 2 lines of transcript and just go with a full-screen display

That's interesting. Makes you wonder if we need the two-line fragment on the cover. Let's take some use-cases:

  • Language learning: Having the full text instead of just two lines is probably preferred. I'm purposefully processing the written text, so it's OK it takes the full screen. However, I would want to be able to pause & resume from that big screen (which doesn't seem possible in Apple podcasts).
  • Understanding of speakers with accents: I'm mainly listening, and occasionally checking the text. Meanwhile, I'm cleaning my inbox. Having the transcript full-screen is fine if it stays open while I switch between the mini-player and full player (so I don't have to reopen the transcript each time). Otherwise I'd rather have two lines which I keep on for the whole episode.
  • Understanding who speaks when, as hosts have similar voices: I don't really read the text, so the full transcript screen would be a bit much, but would be just as fine as two lines on the player screen.
  • Understanding what someone said, as a car was honking while I'm waiting for a traffic light: Here I'll prefer a full screen, so I can just 'activate' it for a moment and read the bit I missed while letting the podcast continue playing, without having to rewind.

Seeing Apple's implementation and the use cases I can think of, I'm starting to think we might not need the subtitle version… @ByteHamster what do you think?

Plus it might be easier to implement?

Also adding some interesting stats here

Do you know what that does in terms of percentages vis-à-vis the full index?

@jamescridland

Apple Podcasts actually has two views

The first is a full transcript view in the episode page, which is just text. Interestingly, there appear to be some restrictions on copy/paste from this view, with no "Select All" available, and it seems to only copy a few paragraphs over. This is a "dumb" transcript, in that it shows the whole transcript but there's nothing you can do with it.
[screenshot]

Additionally, when playing the audio, you can turn on the closed-caption view. Apple does additional processing on these to enable word-by-word highlights; this is partially so that they can avoid spam appearing in the service, I would presume. This is an active transcript, so you can scroll down and tap on a section you want to hear.

[screenshot]

Apple's own transcripts recognise changes of voices (judging by a look at the downloadable transcript), but the app doesn't show the voice in the transcription. It's supposed to show voice names if you provide it with a marked-up VTT file, but it doesn't - the "Unknown" above is its attempt to parse my own VTT file.
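For reference, a marked-up VTT cue with a speaker name uses WebVTT's voice tag and looks roughly like this (content invented for illustration):

WEBVTT

00:00:00.000 --> 00:00:05.340
<v Alice>Welcome back to the show.

The name inside the <v> tag is what a player is supposed to show as the speaker.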

My own view is that the "only show two lines" view - which the VTT file naturally lends itself to - is not a particularly good thing; it requires you to look at the screen at all times, and offers little benefit in terms of navigation. Having seen Apple's version, it seems to avoid clutter by simply adding a "captions" button on the player itself, which opens a new screen.

The highlighting experience subtly makes the current paragraph larger; and then colours the text as the speaker says it. Assuming the VTT file is the only file you have, I'd suggest it's fine showing the current highlighted sentence in the paragraph; the word-by-word thing is nice, but doesn't really add much to the comprehension or experience.

@ByteHamster
Member

Seeing Apple's implementation and the use cases I can think of, I'm starting to think we might not need the subtitle version… @ByteHamster what do you think?

Plus it might be easier to implement?

I would say it's probably a bit easier to implement, yes (at least we don't need to deal with doing line breaks manually). Also, I'm not a big fan of the moving text on the otherwise quite clean player screen anyway. So if @tonytamsf and you agree, I'm totally happy with just having the full transcript screen.

@tonytamsf
Member Author

tonytamsf commented Mar 6, 2024

Apple Podcasts actually has two views

The first is a full transcript view in the episode page, which is just text. Interestingly, there appear to be some restrictions on copy/paste from this view, with no "Select All" available, and it seems to only copy a few paragraphs over. This is a "dumb" transcript, in that it shows the whole transcript but there's nothing you can do with it.
...
Additionally, when playing the audio, you can turn on the closed-caption view. Apple does additional processing on these to enable word-by-word highlights; this is partially so that they can avoid spam appearing in the service, I would presume. This is an active transcript, so you can scroll down and tap on a section you want to hear.

@jamescridland - is there an issue with the transcript rollout in iTunes today?

  1. On the Podcasts app on my mac, they are featuring a 'Transcript' section, but none of those podcast RSS feeds show the podcast:transcript tag
  2. When I use view source or curl I also cannot see the podcast:transcript tag, for example the latest episode of RadioLab
  3. Is it possible that Apple only shows these transcripts in its own iOS Podcasts app, so Android apps or any non-Apple podcast apps will not benefit from them?
  4. In this article on MacRumors: "For now, Apple's generated transcripts are only available for podcasts published in the Apple Podcasts catalogue, not for podcasts you have manually added via an RSS feed" - does that mean not all of the 4 million podcasts on iTunes will get transcripts?

@jamescridland

  1. On the Podcasts app on my mac, they are featuring a 'Transcript' section, but none of those podcast RSS feeds show the podcast:transcript tag

Apple produces its own transcripts, as well as using the podcast:transcript tag.

  2. When I use view source or curl I also cannot see the podcast:transcript tag, for example the latest episode of RadioLab

As above!

  3. Is it possible that Apple only shows these transcripts in its own iOS Podcasts app, so Android apps or any non-Apple podcast apps will not benefit from them?

If there's a podcast:transcript tag listed, then anyone (like AntennaPod) can use the transcript.

But Apple isn't making its own transcripts (in the app) available to anyone other than the podcast creator. Which seems fair enough.

  4. In this article on MacRumors: "For now, Apple's generated transcripts are only available for podcasts published in the Apple Podcasts catalogue, not for podcasts you have manually added via an RSS feed" - does that mean not all of the 4 million podcasts on iTunes will get transcripts?

There are only 2.4mn podcasts in Apple Podcasts. All of those come with auto-transcripts unless the creator has opted out.

@jamescridland

4. Apple’s generated transcripts are only available for podcasts published in the Apple Podcasts catalogue, not for podcasts you have manually added via an RSS feed - Does that mean not all of 4 million podcasts on iTunes will get transcripts?

Just since I'm updating my article...

Apple won't produce automated transcripts for RSS feeds you've manually added. But the app will, apparently, show transcripts from the podcast:transcript tag.

@tonytamsf
Member Author

Thank you @jamescridland for the explanation. Wow, talk about fragmenting the ecosystem with iOS wall.

Today I am writing a school paper on why television and movies have gotten close to 100% closed captioning but transcripts in podcasts are still at 1%. It is mind-boggling.

@jamescridland

Wow, talk about fragmenting the ecosystem with iOS wall.

Actually, it's the opposite here. Apple has taken an open standard that's been under development for three years and implemented it in their app. But for those podcasts that can't be bothered to create a transcript to appear in Apple Podcasts, Apple will do it for them. That seems generous and the right thing to do.

transcripts in podcasts are still at 1%

In Apple Podcasts, transcripts in podcasts are, from today, almost 100%.
And Apple have ensured that creators remain in control of their work.
If there's anything to take away, it's that Apple have done 100% the correct thing.

@tonytamsf
Member Author

Wow, talk about fragmenting the ecosystem with iOS wall.

Actually, it's the opposite here. Apple has taken an open standard that's been under development for three years and implemented it in their app. But for those podcasts that can't be bothered to create a transcript to appear in Apple Podcasts, Apple will do it for them. That seems generous and the right thing to do.

So if podcast A is transcribed by Apple, the RSS feed that is served out to non-iOS podcast players will not show the podcast:transcript tag (let's say it's hosted on transistor.fm). Apple is transcribing podcast A only for the Apple Podcasts player.

So I will assume that the Apple Podcasts app no longer fetches from the transistor.fm RSS feed directly?

Then transistor.fm has to do its own transcript in order for Spotify, AntennaPod, and Overcast to consume it. Is that the correct understanding?

I guess it's too much to ask Apple to do us all a favor and give us the transcript for free via a https://transcript.proxy.apple.com/?url=https://transistor.fm/feed.rss

@jamescridland

So if podcast A is transcribed by Apple, the RSS feed that is served out to non-iOS podcast players will not show the podcast:transcript tag (let's say it's hosted on transistor.fm). Apple is transcribing podcast A only for the Apple Podcasts player. So I will assume that the Apple Podcasts app no longer fetches from the transistor.fm RSS feed directly? Then transistor.fm has to do its own transcript in order for Spotify, AntennaPod, and Overcast to consume it. Is that the correct understanding?

No, precisely wrong. If the creator is supplying a transcript, Apple always uses it.
If the creator doesn't supply a transcript, Apple - at its own expense - makes a transcript for its own app.

I guess it's too much to ask Apple to do us all a favor and give us the transcript for free

As the creator - Apple can do what it wants in its own app, but I don't want Apple to suddenly take control of my own RSS feed, no; and nor do I want it to produce transcripts for others.

Apple here is respecting the creator. And, good on them for doing so.

@ssb22

ssb22 commented Mar 7, 2024

re the references to Podverse: I can't find any obvious place in podcast-rn source that actually loads <podcast:transcript> tags from RSS feeds, nor have I been able to make Play Store's Podverse 4.15.4 fetch a transcript JSON file from my test server (after adding the test RSS feed to Podverse, my Apache logs show Podverse fetching the RSS but not the JSON, even when I play the episode, so it's unsurprising that no transcript is shown), nor was I able to make that version of Podverse show a transcript for feeds.buzzsprout.com/1.rss which has been cited as a working example of podcast:transcript. So unless I've missed something, it seems Podverse gets its transcripts by some means other than the <podcast:transcript> tag and they are available only on select podcasts, therefore it's unfortunately not suitable as a player for testing the behaviour of new <podcast:transcript> tags.

@jamescridland

Podverse may not support JSON, but probably does SRT. Have you tried that?

@nathangathright

I see Podverse using the JSON transcript from feeds.buzzsprout.com/1.rss in their oldest episode: https://podverse.fm/episode/ubyZFbGg8f

Here’s the line in podverse-rn where they fetch a transcript URL:

if (episode?.transcript?.[0]?.url && episode?.transcript?.[0]?.type) {
  try {
    if (episode?.id) {
      parsedTranscript = await getEpisodeProxyTranscript(episode.id)
    } else {
      parsedTranscript = await getParsedTranscript(episode.transcript[0].url)
    }
  } catch (error) {
    errorLogger(_fileName, 'componentDidMount', error)
  }
}

@jamescridland

Oh yes.

[screenshot]

To the point of @ssb22 - the JSON call is absolutely made from the Podverse player.

@ssb22

ssb22 commented Mar 7, 2024

Ah yes - getParsedTranscript in src/lib/transcriptHelpers.ts fetches transcriptUrl and then calls convertFile from the Transcriptator library, which isn't part of the podverse-rn repo, which explains why my grepping for JSON handling wasn't finding it. Still, my Apache logs were telling me that the JSON file referred to in the RSS below wasn't being fetched by Podverse, although it was fetched by Anytime Podcast Player:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:podcast="https://podcastindex.org/namespace/1.0">
<channel>
<atom:link href="https://url.removed/" rel="self" type="application/rss+xml" />
<title>Test Podcast Title</title>
<description>Test podcast description</description>
<link>https://url.removed</link>
<image>
<url>https://url.removed.jpg</url>
<title>Test Title</title>
<link>https://url.removed</link>
</image>
<item>
<guid>https://url.removed.mp3</guid>
<link>https://url.removed</link>
<title>Test Episode Title</title>
<description>Test episode description</description>
<podcast:transcript url="https://url.removed.json" type="application/json" />
<pubDate>Fri, 1 Mar 2024 00:00:00 +0000</pubDate>
<itunes:episode>1</itunes:episode>
<enclosure url="https://url.removed.mp3" type="audio/mpeg" length="2179906"/>
<itunes:duration>153</itunes:duration>
</item>
</channel>
</rss>

@AntennaPod locked and limited conversation to collaborators Mar 30, 2024