[YoutubeTranscript] 🚨 TypeError: Cannot read properties of undefined (reading 'transcriptBodyRenderer') #19
Comments
Hello, I'm having the same problem, @Ozanaydinn. |
I haven't checked in much detail what is happening, but I think it has something to do with YouTube itself (see youtube-transcript/src/index.ts, line 52 in 0c6c6e7).
Maybe this line of code is no longer calling the correct API. |
Yeah, looks like the endpoint was killed - RIP. Here is a workaround; it is probably very flaky and requires an HTML parser:
import { parse } from "node-html-parser";

// Fetch the watch page and parse it into a DOM-like tree.
const PAGE = await fetch("https://www.youtube.com/watch?v=bZQun8Y4L2A")
  .then((res) => res.text())
  .then((html) => parse(html));
// Find the inline script that assigns the player response object.
const scripts = PAGE.getElementsByTagName("script");
const playerScript = scripts.find((script) =>
  script.textContent.includes("var ytInitialPlayerResponse = {"),
);
// Cut out the object literal; slice(0, -1) assumes the script ends with ";".
const dataString = playerScript.textContent
  ?.split("var ytInitialPlayerResponse = ")?.[1]
  ?.slice(0, -1);
const data = JSON.parse(dataString.trim());
// Use the first available caption track.
const captionsUrl =
  data.captions.playerCaptionsTracklistRenderer.captionTracks[0].baseUrl;
const resXML = await fetch(captionsUrl)
  .then((res) => res.text())
  .then((xml) => parse(xml));
let transcript = ""; // start from an empty string so "undefined" is not prepended
const chunks = resXML.getElementsByTagName("text");
for (const chunk of chunks) {
  transcript += chunk.textContent;
}
console.log(transcript); // :) |
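The string slicing in the workaround above depends on exactly what follows the object in the script tag. One possible way to make that step less fragile is to walk the text and stop at the brace that closes the object. This is only a sketch: `extractPlayerResponse` is a hypothetical helper, not part of youtube-transcript, and it assumes the page still assigns a JSON object to `var ytInitialPlayerResponse`.

```typescript
// Hypothetical helper: extract the JSON object assigned to
// `ytInitialPlayerResponse` from a script's source text.
// Returns null when the marker or a balanced object cannot be found.
function extractPlayerResponse(scriptText: string): unknown {
  const marker = "var ytInitialPlayerResponse = ";
  const markerIndex = scriptText.indexOf(marker);
  if (markerIndex === -1) return null;

  const start = scriptText.indexOf("{", markerIndex + marker.length);
  if (start === -1) return null;

  let depth = 0; // current nesting level of braces
  let inString = false; // true while inside a double-quoted JSON string
  let escaped = false; // true right after a backslash inside a string

  for (let i = start; i < scriptText.length; i++) {
    const ch = scriptText[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue; // braces inside strings do not count
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(scriptText.slice(start, i + 1));
        } catch {
          return null; // balanced braces but not valid JSON
        }
      }
    }
  }
  return null; // ran out of text before the object closed
}
```

Unlike splitting on `};`, this does not get confused when a string value inside the object happens to contain `};`.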
Appreciate the quick fix! |
If anybody is looking at making a class for this, here is my currently working example for a quick patch. No promises it will keep working :)
const { parse } = require("node-html-parser");
const RE_YOUTUBE =
/(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/\s]{11})/i;
const USER_AGENT =
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36,gzip(gfe)";
class YoutubeTranscriptError extends Error {
constructor(message) {
super(`[YoutubeTranscript] ${message}`);
}
}
/**
* Class to retrieve a transcript, if one exists
*/
class YoutubeTranscript {
/**
* Fetch the transcript of a YouTube video
* @param videoId Video URL or video identifier
* @param config Object with an optional lang param (e.g. en, es, hk, uk).
* Falls back to the first caption track if no track matches lang.
*/
static async fetchTranscript(videoId, config = {}) {
const identifier = this.retrieveVideoId(videoId);
const lang = config?.lang ?? "en";
try {
const transcriptUrl = await fetch(
`https://www.youtube.com/watch?v=${identifier}`,
{
headers: {
"User-Agent": USER_AGENT,
},
}
)
.then((res) => res.text())
.then((html) => parse(html))
.then((html) => this.#parseTranscriptEndpoint(html, lang));
if (!transcriptUrl)
throw new Error("Failed to locate a transcript for this video!");
// Result is hopefully some XML.
const transcriptXML = await fetch(transcriptUrl)
.then((res) => res.text())
.then((xml) => parse(xml));
let transcript = "";
const chunks = transcriptXML.getElementsByTagName("text");
for (const chunk of chunks) {
transcript += chunk.textContent;
}
return transcript;
} catch (e) {
throw new YoutubeTranscriptError(e);
}
}
static #parseTranscriptEndpoint(document, langCode = null) {
try {
// Get all script tags on document page
const scripts = document.getElementsByTagName("script");
// find the player data script.
const playerScript = scripts.find((script) =>
script.textContent.includes("var ytInitialPlayerResponse = {")
);
const dataString =
playerScript.textContent
?.split("var ytInitialPlayerResponse = ")?.[1] //get the start of the object {....
?.split("};")?.[0] + // chunk off any code after object closure.
"}"; // add back that curly brace we just cut.
const data = JSON.parse(dataString.trim()); // Attempt a JSON parse
const availableCaptions =
data?.captions?.playerCaptionsTracklistRenderer?.captionTracks || [];
// If a language code was specified, search for its track; otherwise take the first.
let captionTrack = availableCaptions?.[0];
if (langCode)
captionTrack =
availableCaptions.find((track) =>
track.languageCode.includes(langCode)
) ?? availableCaptions?.[0];
return captionTrack?.baseUrl;
} catch (e) {
console.error(`YoutubeTranscript.#parseTranscriptEndpoint ${e.message}`);
return null;
}
}
/**
* Retrieve video id from url or string
* @param videoId video url or video id
*/
static retrieveVideoId(videoId) {
if (videoId.length === 11) {
return videoId;
}
const matchId = videoId.match(RE_YOUTUBE);
if (matchId && matchId.length) {
return matchId[1];
}
throw new YoutubeTranscriptError(
"Impossible to retrieve Youtube video ID."
);
}
}
module.exports = {
YoutubeTranscript,
YoutubeTranscriptError,
}; |
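The track selection in `#parseTranscriptEndpoint` above uses `includes`, so a short code can match inside an unrelated one. A slightly stricter variation, sketched here as a standalone function, prefers an exact match, then a regional variant, and only then falls back to the first track. `pickCaptionTrack` and the `CaptionTrack` interface are hypothetical names; the `baseUrl` and `languageCode` fields are the ones the code above already reads.

```typescript
// Only the two fields the patch above relies on.
interface CaptionTrack {
  baseUrl: string;
  languageCode: string;
}

// Hypothetical helper: choose a caption track for `lang`.
// Order of preference: exact code, regional variant ("en" -> "en-GB"),
// then the first available track.
function pickCaptionTrack(
  tracks: CaptionTrack[],
  lang?: string,
): CaptionTrack | undefined {
  if (tracks.length === 0) return undefined;
  if (!lang) return tracks[0];

  const wanted = lang.toLowerCase();
  return (
    tracks.find((t) => t.languageCode.toLowerCase() === wanted) ??
    tracks.find((t) => t.languageCode.toLowerCase().startsWith(wanted + "-")) ??
    tracks[0]
  );
}
```

The return value is `undefined` when the video has no caption tracks at all, which the caller can turn into the "Failed to locate a transcript" error.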
My code relied entirely on the TranscriptionResponse schema from the package, so this change broke my entire API. I made a slight adjustment to @timothycarambat's code (which, by the way, functions flawlessly 🫡) to ensure we maintain the same signature:
// Use the following snippet at the end of `fetchTranscript`,
// in place of the loop that concatenates the text.
// Converts an attribute such as start="1.23" from seconds to milliseconds.
const convertToMs = (text: string) =>
  parseFloat(text.split("=")[1].replace(/"/g, "")) * 1000;
const transcriptions: { text: string; offset: number; duration: number }[] = [];
for (const chunk of chunks) {
  // Assumes the offset attribute comes first and the duration second.
  const [offset, duration] = chunk.rawAttrs.split(" ");
  transcriptions.push({
    text: chunk.text,
    offset: convertToMs(offset),
    duration: convertToMs(duration),
  });
}
return transcriptions; |
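The snippet above depends on the two attributes arriving in a fixed order. A possible alternative is to read each attribute by name. This sketch assumes the caption XML uses `start` and `dur` attributes holding seconds, which is not shown in this thread; `readTimingAttrs` is a hypothetical helper name.

```typescript
// Hypothetical helper: read timing attributes from a raw attribute string
// such as `start="0.16" dur="3.2"` and convert them to whole milliseconds.
// Assumes the attributes are named `start` and `dur` (an assumption).
function readTimingAttrs(rawAttrs: string): { offset: number; duration: number } {
  const read = (name: string): number => {
    // Match name="value" at the start of the string or after whitespace.
    const match = rawAttrs.match(new RegExp(`(?:^|\\s)${name}="([^"]*)"`));
    const seconds = match ? parseFloat(match[1]) : NaN;
    // Missing or malformed values become 0 rather than NaN.
    return Number.isFinite(seconds) ? Math.round(seconds * 1000) : 0;
  };
  return { offset: read("start"), duration: read("dur") };
}
```

Inside the loop this would be used as `const { offset, duration } = readTimingAttrs(chunk.rawAttrs)`.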
Is there a new published version? |
If anyone wants a TypeScript version, here it is, slightly cleaned up compared to the one above:
// https://github.com/Kakulukian/youtube-transcript/issues/19
// If anybody is looking at making a class for this. Here is my currently working example for a quick patch. No promises it will keep working :)
import { parse } from "node-html-parser"
const RE_YOUTUBE =
/(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/\s]{11})/i
const USER_AGENT =
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36,gzip(gfe)"
class YoutubeTranscriptError extends Error {
constructor(message: string) {
super(`[YoutubeTranscript] ${message}`)
}
}
type YtFetchConfig = {
lang?: string // Language code (e.g. en, es, hk, uk)
}
/**
* Class to retrieve a transcript, if one exists
*/
class YoutubeGrabTool {
/**
* Fetch the transcript of a YouTube video
* @param videoId Video URL or video identifier
* @param config Object with an optional lang param (e.g. en, es, hk, uk).
* Falls back to the first caption track if no track matches lang.
*/
static async fetchTranscript(videoId: string, config: YtFetchConfig = {}) {
const identifier = this.retrieveVideoId(videoId)
const lang = config?.lang ?? "en"
try {
const transcriptUrl = await fetch(
`https://www.youtube.com/watch?v=${identifier}`,
{
headers: {
"User-Agent": USER_AGENT,
},
}
)
.then((res) => res.text())
.then((html) => parse(html))
.then((html) => this.#parseTranscriptEndpoint(html, lang))
if (!transcriptUrl)
throw new Error("Failed to locate a transcript for this video!")
// Result is hopefully some XML.
const transcriptXML = await fetch(transcriptUrl)
.then((res) => res.text())
.then((xml) => parse(xml))
const chunks = transcriptXML.getElementsByTagName("text")
function convertToMs(text: string) {
const float = parseFloat(text.split("=")[1].replace(/"/g, "")) * 1000
return Math.round(float)
}
let transcriptions = []
for (const chunk of chunks) {
const [offset, duration] = chunk.rawAttrs.split(" ")
transcriptions.push({
text: chunk.text,
offset: convertToMs(offset),
duration: convertToMs(duration),
})
}
return transcriptions
} catch (e: any) {
throw new YoutubeTranscriptError(e)
}
}
static #parseTranscriptEndpoint(document: any, langCode?: string) {
try {
// Get all script tags on document page
const scripts = document.getElementsByTagName("script")
// find the player data script.
const playerScript = scripts.find((script: any) =>
script.textContent.includes("var ytInitialPlayerResponse = {")
)
const dataString =
playerScript.textContent
?.split("var ytInitialPlayerResponse = ")?.[1] //get the start of the object {....
?.split("};")?.[0] + // chunk off any code after object closure.
"}" // add back that curly brace we just cut.
const data = JSON.parse(dataString.trim()) // Attempt a JSON parse
const availableCaptions =
data?.captions?.playerCaptionsTracklistRenderer?.captionTracks || []
// If a language code was specified, search for its track; otherwise take the first.
let captionTrack = availableCaptions?.[0]
if (langCode)
captionTrack =
availableCaptions.find((track: any) =>
track.languageCode.includes(langCode)
) ?? availableCaptions?.[0]
return captionTrack?.baseUrl
} catch (e: any) {
console.error(`YoutubeTranscript.#parseTranscriptEndpoint ${e.message}`)
return null
}
}
/**
* Retrieve video id from url or string
* @param videoId video url or video id
*/
static retrieveVideoId(videoId: string) {
if (videoId.length === 11) {
return videoId
}
const matchId = videoId.match(RE_YOUTUBE)
if (matchId && matchId.length) {
return matchId[1]
}
throw new YoutubeTranscriptError("Impossible to retrieve Youtube video ID.")
}
}
export { YoutubeGrabTool, YoutubeTranscriptError }
Used like:
const transcriptChunks = await YoutubeGrabTool.fetchTranscript(videoUrl)
I changed the name so I can uninstall the other one. |
Hope we get a fix version asap. Thanks for your contributions! |
I've made a TS version of the class @timothycarambat created. I've also added support for YouTube Shorts in the regex:
import { parse } from 'node-html-parser';
const USER_AGENT =
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36,gzip(gfe)';
export class YoutubeTranscriptError extends Error {
constructor(message: string) {
super(`[YoutubeTranscript] ${message}`);
}
}
export class YoutubeTranscript {
/**
* Fetch transcript from YouTube Video
* @param videoId Video url or video identifier
* @param config Object with an optional lang param (e.g. en, es, hk, uk).
* Falls back to the first caption track if no track matches lang.
*/
static async fetchTranscript(videoId: string, config: { lang?: string } = {}) {
const identifier = this.retrieveVideoId(videoId);
const lang = config?.lang ?? 'en';
try {
const transcriptUrl = await fetch(`https://www.youtube.com/watch?v=${identifier}`, {
headers: {
'User-Agent': USER_AGENT,
},
})
.then((res) => res.text())
.then((html) => parse(html))
.then((html) => this.parseTranscriptEndpoint(html, lang));
if (!transcriptUrl) throw new Error('Failed to locate a transcript for this video!');
const transcriptXML = await fetch(transcriptUrl)
.then((res) => res.text())
.then((xml) => parse(xml));
let transcript = '';
const chunks = transcriptXML.getElementsByTagName('text');
for (const chunk of chunks) {
transcript += chunk.textContent + ' ';
}
return transcript.trim();
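} catch (e) {
// `e` is `unknown` under strict TypeScript settings, so narrow it before reading `.message`.
throw new YoutubeTranscriptError(e instanceof Error ? e.message : String(e));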
}
}
private static parseTranscriptEndpoint(document: any, langCode: string | null = null) {
try {
const scripts = document.getElementsByTagName('script');
const playerScript = scripts.find((script: any) =>
script.textContent.includes('var ytInitialPlayerResponse = {')
);
const dataString = playerScript.textContent?.split('var ytInitialPlayerResponse = ')?.[1]?.split('};')?.[0] + '}';
const data = JSON.parse(dataString.trim());
const availableCaptions = data?.captions?.playerCaptionsTracklistRenderer?.captionTracks || [];
let captionTrack = availableCaptions?.[0];
if (langCode) {
captionTrack =
availableCaptions.find((track: any) => track.languageCode.includes(langCode)) ?? availableCaptions?.[0];
}
return captionTrack?.baseUrl;
} catch (e) {
console.error(`YoutubeTranscript.parseTranscriptEndpoint ${e instanceof Error ? e.message : String(e)}`);
return null;
}
}
/**
* Retrieve video id from url or string
* @param videoId video url or video id
*/
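static retrieveVideoId(videoId: string) {
// Accept a bare 11-character video ID as well as a URL, as the doc comment promises.
if (videoId.length === 11) {
return videoId;
}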
const regex =
/(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=|shorts\/)|youtu\.be\/)([^"&?\/\s]{11})/i;
const matchId = videoId.match(regex);
if (matchId && matchId.length) {
return matchId[1];
}
throw new YoutubeTranscriptError('Impossible to retrieve Youtube video ID.');
}
} |
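To see which inputs the Shorts-aware regex above accepts, it can be exercised on its own. The sketch below reuses that regex unchanged and adds a check for bare IDs; `extractVideoId` and `BARE_ID` are hypothetical names, and the assumption that an ID is exactly 11 characters from `A-Z`, `a-z`, `0-9`, `_` and `-` is mine, not something confirmed in this thread.

```typescript
// Regex copied from the class above (includes the `shorts/` alternative).
const YOUTUBE_URL =
  /(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=|shorts\/)|youtu\.be\/)([^"&?\/\s]{11})/i;

// Assumed shape of a bare video ID (hypothetical, see lead-in).
const BARE_ID = /^[A-Za-z0-9_-]{11}$/;

// Hypothetical helper: return the video ID for a URL or bare ID, else null.
function extractVideoId(input: string): string | null {
  const trimmed = input.trim();
  if (BARE_ID.test(trimmed)) return trimmed;
  const match = trimmed.match(YOUTUBE_URL);
  return match ? match[1] : null;
}

// Example inputs covering the URL shapes discussed in this thread.
const examples = [
  "https://www.youtube.com/watch?v=bZQun8Y4L2A",
  "https://youtu.be/bZQun8Y4L2A",
  "https://www.youtube.com/shorts/bZQun8Y4L2A",
  "https://www.youtube.com/embed/bZQun8Y4L2A",
  "bZQun8Y4L2A",
];
for (const url of examples) {
  console.log(url, "->", extractVideoId(url));
}
```

Returning `null` instead of throwing keeps the helper easy to test; a caller can still wrap the `null` case in `YoutubeTranscriptError`.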
Thanks @AbbasPlusPlus, I have handled the shorts URL in #21. It also provides a test suite.
|
@timothycarambat @sbbeez Thanks a LOT, you just saved my demo for a session I have this afternoon 🙏 ❤️ |
Thanks to all who've helped out. I wrapped the typescript up in a fork that implements the above as a package, with We include it as follows in our package.json:
(Note that this is not directly API compatible, as the function above changed the API) |
@canadaduane Could you include the changes from @sbbeez ? It makes code API compatible with the existing youtube-transcript package 🙂 |
Thanks @canadaduane I see you used the earlier PR, but I recommend merging the latter #21 and rebuilding. It replaces cheerio with node-html-parser, implements broader support for all youtube URLs and provides a test suite. @sinedied the public interface of the above linked PR preserves the former interface. |
For those who are facing this issue, you can refer to the solution below: Originally posted by @sinedied in langchain-ai/langchainjs#4994 (comment)
Thank you so much @sinedied . |
Where is the fix?? I didn't understand anything!! |
Thanks, but how will we use this?? |
Weren't the changes merged already? So you should be able to use the library. |
The error is still present. Just installed. |
you can install from this URL as someone documented above:
|
How are we going to install from there? We are using the download section in Obsidian? |
@Kakulukian |
@Medullitus what issue are you still running into? I just use the library directly. It works in my current project. |
When I try to add a YT video URL and push the "Generate summary" button, it gives me an error! The error is: "Error: [YoutubeTranscript] TypeError: Cannot read properties of undefined (reading 'transcriptBodyRenderer')". So I can't use the plugin... |
HELLOOOOO |
FYI, you seem to be confused here. There is no "button"; this is an NPM library that you use to write your own code.
This is not the repo for any Obsidian plugin.
Chill out a bit. Nobody is getting paid to solve your problem, and your questions are so wildly off base that it's clear you need to spend some time gathering a base level of information yourself, and maybe find the right support channel for whatever tool you're using. Edit: my guess is there's some Obsidian plugin that uses this library (this repo) and they need to update their code to use the updated version of this library. Perhaps the error message shown led you to come here by mistake. |
Hello. How are you? I'm very sorry, I came here from the link on the Youtube Summarizer's GitHub page. It's really an important plugin for Obsidian, but it's not working. If you understand these things, is it possible for you to take a look? What do I need to do? Thanks... |
Hello,
I'm not sure if this project is still getting maintenance but still wanted to create an issue for this!
We are using v1.1.0 in our project and this morning suddenly we started getting this error :
I also tried to run the example code in a brand new project but still the same error, so I guess that eliminates any errors that might have happened on our side. Any help regarding the issue would be greatly appreciated!