[YoutubeTranscript] 🚨 TypeError: Cannot read properties of undefined (reading 'transcriptBodyRenderer') #19

Closed
Ozanaydinn opened this issue Apr 5, 2024 · 30 comments

Comments

@Ozanaydinn

Hello,
I'm not sure whether this project is still maintained, but I wanted to open an issue for this anyway!

We are using v1.1.0 in our project, and this morning we suddenly started getting this error:

Error message: [YoutubeTranscript] 🚨 TypeError: Cannot read properties of undefined (reading 'transcriptBodyRenderer')

I also tried to run the example code in a brand new project but still the same error, so I guess that eliminates any errors that might have happened on our side. Any help regarding the issue would be greatly appreciated!

@Klajver07

Hello, I'm having the same problem, @Ozanaydinn.

@PaulBratslavsky

I haven't checked in too much detail what is happening but I think it has something to do with YouTube itself.

`https://www.youtube.com/youtubei/v1/get_transcript?key=${innerTubeApiKey}`,

Maybe this request is no longer hitting a working API endpoint.

@canadaduane

It looks like the API must have changed. There is no longer a .body property in actions[0].updateEngagementPanelAction.content.transcriptRenderer:

image

In other words this line is failing:

const transcripts =
  body.actions[0].updateEngagementPanelAction.content
    .transcriptRenderer.body.transcriptBodyRenderer.cueGroups
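
A minimal sketch of how this lookup could fail with a clearer message instead of a bare TypeError. It assumes only the property path quoted above; the `TranscriptResponse` type and the `extractCueGroups` name are hypothetical and not part of the library:

```typescript
// Hypothetical response shape, inferred only from the property path quoted above.
type TranscriptResponse = {
  actions?: Array<{
    updateEngagementPanelAction?: {
      content?: {
        transcriptRenderer?: {
          body?: { transcriptBodyRenderer?: { cueGroups?: unknown[] } };
        };
      };
    };
  }>;
};

// Returns the cue groups, or throws a descriptive error when any link in the
// chain is missing (for example, when `.body` is no longer present).
function extractCueGroups(body: TranscriptResponse): unknown[] {
  const cueGroups =
    body.actions?.[0]?.updateEngagementPanelAction?.content
      ?.transcriptRenderer?.body?.transcriptBodyRenderer?.cueGroups;
  if (!cueGroups) {
    throw new Error("Transcript response no longer contains cueGroups");
  }
  return cueGroups;
}
```

This does not bring the data back; it only makes the failure easier to diagnose when the upstream response changes shape.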

@canadaduane

Does anyone know more about the /timedtext internal API? It seems to provide the transcript data, but is behind a signature field.

image

@timothycarambat

timothycarambat commented Apr 5, 2024

Yeah, looks like the endpoint was killed - RIP.
The script below emulates a few of the steps the library takes, but instead of getting the INNERTUBE API key it gets the signed URL for that session and uses it to fetch /timedtext.

Probably very flaky, and it requires an HTML parser.

import { parse } from "node-html-parser";

const PAGE = await fetch("https://www.youtube.com/watch?v=bZQun8Y4L2A")
  .then((res) => res.text())
  .then((html) => parse(html));

const scripts = PAGE.getElementsByTagName("script");
const playerScript = scripts.find((script) =>
  script.textContent.includes("var ytInitialPlayerResponse = {"),
);

const dataString = playerScript.textContent
  ?.split("var ytInitialPlayerResponse = ")?.[1]
  ?.slice(0, -1);
const data = JSON.parse(dataString.trim());
const captionsUrl =
  data.captions.playerCaptionsTracklistRenderer.captionTracks[0].baseUrl;

const resXML = await fetch(captionsUrl)
  .then((res) => res.text())
  .then((xml) => parse(xml));

let transcript = ""; // initialize, otherwise the output starts with "undefined"
const chunks = resXML.getElementsByTagName("text");
for (const chunk of chunks) {
  transcript += chunk.textContent;
}
console.log(transcript); // :)

@SchmitzAndrew

Appreciate the quick fix!

@timothycarambat

timothycarambat commented Apr 5, 2024

If anybody is looking at making a class for this, here is my currently working example for a quick patch. No promises it will keep working :)

const { parse } = require("node-html-parser");
const RE_YOUTUBE =
  /(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/\s]{11})/i;
const USER_AGENT =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36,gzip(gfe)";

class YoutubeTranscriptError extends Error {
  constructor(message) {
    super(`[YoutubeTranscript] ${message}`);
  }
}

/**
 * Class to retrieve transcript if exist
 */
class YoutubeTranscript {
  /**
   * Fetch transcript from YTB Video
   * @param videoId Video url or video identifier
   * @param config Object with lang param (eg: en, es, hk, uk) format.
   * Will just grab the first caption if it can find one, so no special lang caption support.
   */
  static async fetchTranscript(videoId, config = {}) {
    const identifier = this.retrieveVideoId(videoId);
    const lang = config?.lang ?? "en";
    try {
      const transcriptUrl = await fetch(
        `https://www.youtube.com/watch?v=${identifier}`,
        {
          headers: {
            "User-Agent": USER_AGENT,
          },
        }
      )
        .then((res) => res.text())
        .then((html) => parse(html))
        .then((html) => this.#parseTranscriptEndpoint(html, lang));

      if (!transcriptUrl)
        throw new Error("Failed to locate a transcript for this video!");

      // Result is hopefully some XML.
      const transcriptXML = await fetch(transcriptUrl)
        .then((res) => res.text())
        .then((xml) => parse(xml));

      let transcript = "";
      const chunks = transcriptXML.getElementsByTagName("text");
      for (const chunk of chunks) {
        transcript += chunk.textContent;
      }

      return transcript;
    } catch (e) {
      throw new YoutubeTranscriptError(e);
    }
  }

  static #parseTranscriptEndpoint(document, langCode = null) {
    try {
      // Get all script tags on document page
      const scripts = document.getElementsByTagName("script");

      // find the player data script.
      const playerScript = scripts.find((script) =>
        script.textContent.includes("var ytInitialPlayerResponse = {")
      );

      const dataString =
        playerScript.textContent
          ?.split("var ytInitialPlayerResponse = ")?.[1] //get the start of the object {....
          ?.split("};")?.[0] + // chunk off any code after object closure.
        "}"; // add back that curly brace we just cut.

      const data = JSON.parse(dataString.trim()); // Attempt a JSON parse
      const availableCaptions =
        data?.captions?.playerCaptionsTracklistRenderer?.captionTracks || [];

      // If languageCode was specified then search for its code, otherwise get the first.
      let captionTrack = availableCaptions?.[0];
      if (langCode)
        captionTrack =
          availableCaptions.find((track) =>
            track.languageCode.includes(langCode)
          ) ?? availableCaptions?.[0];

      return captionTrack?.baseUrl;
    } catch (e) {
      console.error(`YoutubeTranscript.#parseTranscriptEndpoint ${e.message}`);
      return null;
    }
  }

  /**
   * Retrieve video id from url or string
   * @param videoId video url or video id
   */
  static retrieveVideoId(videoId) {
    if (videoId.length === 11) {
      return videoId;
    }
    const matchId = videoId.match(RE_YOUTUBE);
    if (matchId && matchId.length) {
      return matchId[1];
    }
    throw new YoutubeTranscriptError(
      "Impossible to retrieve Youtube video ID."
    );
  }
}

module.exports = {
  YoutubeTranscript,
  YoutubeTranscriptError,
};
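
One fragile spot in the class above is the `split("};")` step: it cuts the object short if `};` ever appears inside a string value. A hedged alternative is to count braces while skipping string literals. The sketch below is not part of the posted patch; `extractJsonObject` is a hypothetical helper name, and the sample script body is made up for illustration:

```typescript
// Extracts the JSON object literal that follows `marker` in `source` by
// counting braces, ignoring any braces that appear inside string literals.
function extractJsonObject(source: string, marker: string): string | null {
  const markerIndex = source.indexOf(marker);
  if (markerIndex === -1) return null;
  const start = source.indexOf("{", markerIndex + marker.length);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (escaped) escaped = false; // character after a backslash is literal
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return source.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

// Made-up script body whose string value contains "};".
const sampleScript =
  'var ytInitialPlayerResponse = {"title":"a};b","captions":{"n":1}};var other = 1;';
const extracted = extractJsonObject(sampleScript, "var ytInitialPlayerResponse = ");
const parsed = extracted === null ? null : JSON.parse(extracted);
```

With this sample, splitting on `};` would stop inside the title string, while the brace counter returns the whole object.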

@sbbeez

sbbeez commented Apr 6, 2024

My code relied entirely on the TranscriptionResponse schema from the package, leading to the entire API breaking down.

I made a slight adjustment to @timothycarambat 's code (which, by the way, functions flawlessly 🫡) to ensure we maintain the same signature:

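    // Use the following snippet at the end of `fetchTranscript`, in place of
    // the loop that concatenates the chunks into a single string.
    const transcriptions = [];
    // Assumes the offset attribute comes first, followed by the duration attribute.
    const convertToMs = (text: string) =>
      parseFloat(text.split("=")[1].replace(/"/g, "")) * 1000;
    for (const chunk of chunks) {
      const [offset, duration] = chunk.rawAttrs.split(" ");
      transcriptions.push({
        text: chunk.text,
        offset: convertToMs(offset),
        duration: convertToMs(duration),
      });
    }
    return transcriptions;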
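
The snippet above depends on the order of the attributes in `rawAttrs`. A small sketch that reads each attribute by name instead is shown below. It assumes the caption XML uses `start` and `dur` attributes expressed in seconds, which is not stated in this thread, and `attrToMs` is a hypothetical helper name:

```typescript
// Reads a numeric attribute (in seconds) by name from a raw attribute string
// and converts it to whole milliseconds. Returns 0 if the attribute is absent
// or not a number.
function attrToMs(rawAttrs: string, name: string): number {
  const match = rawAttrs.match(new RegExp(`(?:^|\\s)${name}="([^"]*)"`));
  const seconds = parseFloat(match?.[1] ?? "");
  return Number.isNaN(seconds) ? 0 : Math.round(seconds * 1000);
}

// Attribute order no longer matters (assumed attribute names: start, dur).
const sampleAttrs = 'dur="4.5" start="1.23"';
const sampleOffset = attrToMs(sampleAttrs, "start");
const sampleDuration = attrToMs(sampleAttrs, "dur");
```

Inside the loop this would replace the positional destructuring, for example `offset: attrToMs(chunk.rawAttrs, "start")`.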

@dcsan

dcsan commented Apr 6, 2024

Is there a new published version?

@dcsan

dcsan commented Apr 6, 2024

If anyone wants a TypeScript version, here it is, slightly cleaned up compared to the one above:

// https://github.com/Kakulukian/youtube-transcript/issues/19
// If anybody is looking at making a class for this. Here is my currently working example for a quick patch. No promises it will keep working :)

import { parse } from "node-html-parser"
const RE_YOUTUBE =
  /(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/\s]{11})/i
const USER_AGENT =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36,gzip(gfe)"

class YoutubeTranscriptError extends Error {
  constructor(message: string) {
    super(`[YoutubeTranscript] ${message}`)
  }
}

type YtFetchConfig = {
  lang?: string // Object with lang param (eg: en, es, hk, uk) format.
}

/**
 * Class to retrieve transcript if exist
 */
class YoutubeGrabTool {
  /**
   * Fetch transcript from YTB Video
   * @param videoId Video url or video identifier
   * @param config Object with lang param (eg: en, es, hk, uk) format.
   * Will just grab the first caption if it can find one, so no special lang caption support.
   */
  static async fetchTranscript(videoId: string, config: YtFetchConfig = {}) {
    const identifier = this.retrieveVideoId(videoId)
    const lang = config?.lang ?? "en"
    try {
      const transcriptUrl = await fetch(
        `https://www.youtube.com/watch?v=${identifier}`,
        {
          headers: {
            "User-Agent": USER_AGENT,
          },
        }
      )
        .then((res) => res.text())
        .then((html) => parse(html))
        .then((html) => this.#parseTranscriptEndpoint(html, lang))

      if (!transcriptUrl)
        throw new Error("Failed to locate a transcript for this video!")

      // Result is hopefully some XML.
      const transcriptXML = await fetch(transcriptUrl)
        .then((res) => res.text())
        .then((xml) => parse(xml))

      const chunks = transcriptXML.getElementsByTagName("text")

      function convertToMs(text: string) {
        const float = parseFloat(text.split("=")[1].replace(/"/g, "")) * 1000
        return Math.round(float)
      }

      let transcriptions = []
      for (const chunk of chunks) {
        const [offset, duration] = chunk.rawAttrs.split(" ")
        transcriptions.push({
          text: chunk.text,
          offset: convertToMs(offset),
          duration: convertToMs(duration),
        })
      }
      return transcriptions
    } catch (e: any) {
      throw new YoutubeTranscriptError(e)
    }
  }

  static #parseTranscriptEndpoint(document: any, langCode?: string) {
    try {
      // Get all script tags on document page
      const scripts = document.getElementsByTagName("script")

      // find the player data script.
      const playerScript = scripts.find((script: any) =>
        script.textContent.includes("var ytInitialPlayerResponse = {")
      )

      const dataString =
        playerScript.textContent
          ?.split("var ytInitialPlayerResponse = ")?.[1] //get the start of the object {....
          ?.split("};")?.[0] + // chunk off any code after object closure.
        "}" // add back that curly brace we just cut.

      const data = JSON.parse(dataString.trim()) // Attempt a JSON parse
      const availableCaptions =
        data?.captions?.playerCaptionsTracklistRenderer?.captionTracks || []

      // If languageCode was specified then search for its code, otherwise get the first.
      let captionTrack = availableCaptions?.[0]
      if (langCode)
        captionTrack =
          availableCaptions.find((track: any) =>
            track.languageCode.includes(langCode)
          ) ?? availableCaptions?.[0]

      return captionTrack?.baseUrl
    } catch (e: any) {
      console.error(`YoutubeTranscript.#parseTranscriptEndpoint ${e.message}`)
      return null
    }
  }

  /**
   * Retrieve video id from url or string
   * @param videoId video url or video id
   */
  static retrieveVideoId(videoId: string) {
    if (videoId.length === 11) {
      return videoId
    }
    const matchId = videoId.match(RE_YOUTUBE)
    if (matchId && matchId.length) {
      return matchId[1]
    }
    throw new YoutubeTranscriptError("Impossible to retrieve Youtube video ID.")
  }
}

export { YoutubeGrabTool, YoutubeTranscriptError }

Used like this:

      const transcriptChunks = await YoutubeGrabTool.fetchTranscript(videoUrl)

I changed the name so I can uninstall the other one.
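
As a rough illustration of consuming that return value, the sketch below turns an array of `{ text, offset, duration }` objects (milliseconds, as produced by the class above) into timestamped lines. The `TranscriptChunk` type and both function names are hypothetical, and the sample data is made up:

```typescript
// Shape of one entry returned by fetchTranscript in the version above.
type TranscriptChunk = { text: string; offset: number; duration: number };

// Formats a millisecond offset as m:ss.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds < 10 ? "0" + seconds : String(seconds)}`;
}

// Joins chunks into one line per caption, prefixed with its start time.
function toTimestampedText(chunks: TranscriptChunk[]): string {
  return chunks
    .map((chunk) => `[${formatTimestamp(chunk.offset)}] ${chunk.text}`)
    .join("\n");
}

// Made-up sample data.
const sampleChunks: TranscriptChunk[] = [
  { text: "hi", offset: 0, duration: 1000 },
  { text: "there", offset: 65500, duration: 2000 },
];
const sampleText = toTimestampedText(sampleChunks);
```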

@alexmartinezm

Hope we get a fixed version ASAP. Thanks for your contributions!

@AbbasPlusPlus

AbbasPlusPlus commented Apr 8, 2024

I've made a TS version of the class @timothycarambat created. I've also added support for youtube shorts in the regex:

import { parse } from 'node-html-parser';
const USER_AGENT =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36,gzip(gfe)';

export class YoutubeTranscriptError extends Error {
  constructor(message: string) {
    super(`[YoutubeTranscript] ${message}`);
  }
}

export class YoutubeTranscript {
  /**
   * Fetch transcript from YouTube Video
   * @param videoId Video url or video identifier
   * @param config Object with lang param (eg: en, es, hk, uk) format.
   * Will just grab the first caption if it can find one, so no special lang caption support.
   */
  static async fetchTranscript(videoId: string, config: { lang?: string } = {}) {
    const identifier = this.retrieveVideoId(videoId);
    const lang = config?.lang ?? 'en';
    try {
      const transcriptUrl = await fetch(`https://www.youtube.com/watch?v=${identifier}`, {
        headers: {
          'User-Agent': USER_AGENT,
        },
      })
        .then((res) => res.text())
        .then((html) => parse(html))
        .then((html) => this.parseTranscriptEndpoint(html, lang));

      if (!transcriptUrl) throw new Error('Failed to locate a transcript for this video!');

      const transcriptXML = await fetch(transcriptUrl)
        .then((res) => res.text())
        .then((xml) => parse(xml));

      let transcript = '';
      const chunks = transcriptXML.getElementsByTagName('text');
      for (const chunk of chunks) {
        transcript += chunk.textContent + ' ';
      }

      return transcript.trim();
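    } catch (e) {
      // `e` may be typed `unknown` under strict TypeScript settings, so narrow it first.
      throw new YoutubeTranscriptError(e instanceof Error ? e.message : String(e));
    }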
  }

  private static parseTranscriptEndpoint(document: any, langCode: string | null = null) {
    try {
      const scripts = document.getElementsByTagName('script');
      const playerScript = scripts.find((script: any) =>
        script.textContent.includes('var ytInitialPlayerResponse = {')
      );

      const dataString = playerScript.textContent?.split('var ytInitialPlayerResponse = ')?.[1]?.split('};')?.[0] + '}';

      const data = JSON.parse(dataString.trim());
      const availableCaptions = data?.captions?.playerCaptionsTracklistRenderer?.captionTracks || [];

      let captionTrack = availableCaptions?.[0];
      if (langCode) {
        captionTrack =
          availableCaptions.find((track: any) => track.languageCode.includes(langCode)) ?? availableCaptions?.[0];
      }

      return captionTrack?.baseUrl;
    } catch (e) {
      const reason = e instanceof Error ? e.message : String(e);
      console.error(`YoutubeTranscript.parseTranscriptEndpoint ${reason}`);
      return null;
    }
  }

  /**
   * Retrieve video id from url or string
   * @param videoId video url or video id
   */
  static retrieveVideoId(videoId: string) {
    // Accept a bare 11-character video ID, as the earlier versions did.
    if (videoId.length === 11) {
      return videoId;
    }
    const regex =
      /(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=|shorts\/)|youtu\.be\/)([^"&?\/\s]{11})/i;
    const matchId = videoId.match(regex);
    if (matchId && matchId.length) {
      return matchId[1];
    }
    throw new YoutubeTranscriptError('Impossible to retrieve Youtube video ID.');
  }
}
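
To sanity-check the Shorts-aware regex, a small standalone sketch can run it against a few URL shapes. The regex is copied from the class above; `extractVideoId`, the constant name, and the example URLs (all built from the video ID used earlier in this thread) are illustrative assumptions:

```typescript
// Regex copied from the class above (includes the `shorts\/` alternative).
const RE_YOUTUBE_WITH_SHORTS =
  /(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=|shorts\/)|youtu\.be\/)([^"&?\/\s]{11})/i;

// Returns the video ID for a URL or a bare 11-character ID, or null otherwise.
function extractVideoId(input: string): string | null {
  if (input.length === 11) return input; // treat an 11-character string as a bare ID
  const match = input.match(RE_YOUTUBE_WITH_SHORTS);
  return match?.[1] ?? null;
}

const exampleInputs = [
  "https://www.youtube.com/watch?v=bZQun8Y4L2A",
  "https://youtu.be/bZQun8Y4L2A",
  "https://www.youtube.com/shorts/bZQun8Y4L2A",
  "https://www.youtube.com/embed/bZQun8Y4L2A",
  "bZQun8Y4L2A",
];
const exampleIds = exampleInputs.map(extractVideoId);
```

Returning `null` rather than throwing is a design choice for this sketch only; the class above throws a `YoutubeTranscriptError` instead.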

@piktur

piktur commented Apr 10, 2024

Thanks @AbbasPlusPlus. I have handled Shorts URLs in #21, which also provides a test suite.

const RE_PATH = /v|e(?:mbed)?|shorts/;

// ...

export const getVideoId = (videoUrlOrId: string): string | null => {
  if (!videoUrlOrId) {
    return null
  }

  if (videoUrlOrId.length === ID_LENGTH) {
    return videoUrlOrId;
  }

  try {
    const url = new URL(videoUrlOrId);
    const segments = url.pathname.split('/');

    if (segments[1]?.length === ID_LENGTH) {
      return segments[1];
    }

    return (
      (RE_PATH.test(segments[1]) ? segments[2] : url.searchParams.get('v')) ||
      null
    );
  } catch (err) {
    return null;
  }
};

@sinedied

@timothycarambat @sbbeez Thanks a LOT, you just saved my demo for a session I have this afternoon 🙏 ❤️

@canadaduane

canadaduane commented Apr 11, 2024

Thanks to all who've helped out. I wrapped the typescript up in a fork that implements the above as a package, with dist/ included so we could replace (for now) the existing package. https://github.com/SchoolAI/youtube-transcript

We include it as follows in our package.json:

    "youtube-transcript": "github:schoolai/youtube-transcript#6455ee21aab22e631f0c290df21b9e34e10adc4f",

(Note that this is not directly API compatible, as the function above changed the API)

@sinedied

@canadaduane Could you include the changes from @sbbeez ? It makes code API compatible with the existing youtube-transcript package 🙂

@piktur

piktur commented Apr 11, 2024

Thanks @canadaduane I see you used the earlier PR, but I recommend merging the latter #21 and rebuilding. It replaces cheerio with node-html-parser, implements broader support for all youtube URLs and provides a test suite.

@sinedied the public interface of the above linked PR preserves the former interface. YoutubeTranscript.retrieveVideoId has been promoted to public property.
Screenshot 2024-04-12 at 08 33 13

@SahilPulikal

For those who are facing this issue, you can refer to the solution below, originally posted by @sinedied in langchain-ai/langchainjs#4994 (comment).
It worked for me.

For a temporary workaround until this is fixed upstream, you can:

npm i https://github.com/sinedied/youtube-transcript\#a10a073ac325b3b88018f321fa1bc5d62fa69b1c

This will use my fork, which uses a compatible drop-in code replacement from Kakulukian/youtube-transcript#19; all the code credit goes to the folks there.

When the issue is fixed upstream, you can simply:

npm rm youtube-transcript

and it will return to the upstream version.

Thank you so much @sinedied .

@Medullitus

Where is the fix?? I didn't understand anything!!

@Medullitus

I've made a TS version of the class @timothycarambat created. I've also added support for youtube shorts in the regex:

Thanks, but how will we use this??

@PaulBratslavsky

Weren't the changes merged already? So you should be able to use the library.

@Gitmaxd

Gitmaxd commented May 8, 2024

The error is still present. Just installed.

@dcsan

dcsan commented May 8, 2024

you can install from this URL as someone documented above:

npm i https://github.com/sinedied/youtube-transcript\#a10a073ac325b3b88018f321fa1bc5d62fa69b1c

@Medullitus

you can install from this URL as someone documented above:

npm i https://github.com/sinedied/youtube-transcript\#a10a073ac325b3b88018f321fa1bc5d62fa69b1c

How are we going to install from there? We are using the download section in Obsidian?

@Medullitus

@Kakulukian
Why are you closing unresolved topic threads? People are trying to solve issues in your application. Meanwhile, you're not even addressing them!

@PaulBratslavsky

@Medullitus, what issue are you still running into? I just use the library directly. It works in my current project.

@Medullitus

Medullitus commented May 9, 2024

@Medullitus what issue are you still running into. I just use the library directly. It works in my current project.

When I try to add a YT video URL and push the "Generate summary" button, it gives me an error! The error is "Error: [YoutubeTranscript] TypeError: Cannot read properties of undefined (reading 'transcriptBodyRenderer')". So I can't use the plugin...

@Medullitus

HELLOOOOO

@dcsan

dcsan commented May 12, 2024

FYI, you seem to be confused here: there is no "button". This is an NPM library that you use to write your own code.

We are using the download section in Obsidian

this is not the repo for any obsidian plugin.

HELLOOOOO

Chill out a bit. Nobody is getting paid to solve your problem, and your questions are so wildly off base that it's clear you need to spend some time gathering a base level of information yourself, and maybe find the right support channel for whatever tool you're using.

edit: my guess is there's some Obsidian plugin that uses this library (this repo), and they need to update their code to use the updated version of this library. Perhaps the error message shown led you to come here by mistake.
So you need to find the right support channel for that plugin and go and annoy them.

@Medullitus

Medullitus commented May 27, 2024

Hello. How are you? I'm very sorry, I came here from the link on the Youtube Summarizer's GitHub page. It's really an important plugin for Obsidian, but it's not working. If you understand these things, is it possible for you to take a look? What do I need to do? Thanks...

ozdemir08/youtube-video-summarizer#14
