[YouTube] Add "most replayed" aka heatmap data #3888
Comments
I think some other porn website (I don't remember which one exactly) has a heatmap-like feature, so it may be worth defining a new field |
@Lesmiscore The implementation on the web app is a one-to-one copy. It's just uncanny. The way it uses an
I bet money that YouTube's front-end Dev went like *clicks on search bar* P *autocompletes* ⏎ *clicks on a video* F12 "ok, so that's what we're going with..." Ctrl+C |
YT renders that SVG from the initial data as explained in #3888 (comment) |
We could extract the raw data and put it in a new field, but what would users do with it? |
@pukkandan It's soft data, but there's still a lot you can do with it:
|
I understand how this data is useful when shown overlaid on the video. But obviously, yt-dlp can't do that. We can only extract the data and put it in a field in the

We shouldn't repeat the mistake here that yt-dlc made with live chat downloads. Implementing the live chat as a subtitle has made it quite difficult to expand this to other sites. Furthermore, by making it download the raw content, it made it difficult for any third parties to extract useful info out of the live chat.

When implementing this, I just want to avoid similar mistakes. Once the feature is implemented and released, it is assumed that third parties may depend on it. Any changes after that point will be bound by backward compatibility requirements |
On a related note, has anyone requested this feature to any video player (mpv/mpc/vlc)? I would like to see how they want to handle this |
@pukkandan, I agree with you, the layout should be well thought out. But I found no standards, drafts or suggestions on how to format this. Not for any file format or player. There also seems to be no obvious standard throughout proprietary software.

P***Hub calls this feature "hotspots". It seems they just yeet the data to the front-end by embedding it into HTML using a JavaScript variable called

"hotspots":["2126473","1509162","1415385","1263371","1130396","1063220","1032573","1012590","1049435","1042380","1047685","1097360","1098579","1046719","1120822","1071058","1024836","1004157","1010109","1007019","1009978","1014976","1088071","1141994","1142713","1115550","1004629","909680","921539","865001","830299","827111","831371","803551","788714","755455","728531","699063","665349","642931","619397","716136","663634","661081","651885","598485","556105","549820","580571","677124","697647","763739","828103","779439","780161","821711","782140","745356","702643","676545","675660","718809","755582","766731","695390","665713","637099","633781","641765","630462","618966","610812","590461","581882","556716","549597","536563","578955","554676","593583","607611","721988","627043","619569","621223","609902","594560","588217","569446","638016","552857","549362","517154","508932","492234","495604","469092","467487","468587","516710","556376","659476","727488","774049","702127","653983","588627","548224","505261","471961","445253","397921"]

Naming it heatmap in the

One of the possible formats could look like this:

"heatmap": [{
"start": 0, // Start of the marker in seconds. Always provided.
// Might be calculated with: video duration ÷ number of all markers · index of this marker
"end": 4.321, // End of the marker in seconds. Always provided.
// Might be calculated with: video duration ÷ number of all markers · (index of this marker + 1)
"normalized": 0.81, // Normalized heat value from 0 to 1. Always provided.
// Might be calculated with: absolute value of this marker ÷ biggest absolute value of all markers
"absolute": 2126473, // Absolute heat value. Might be not provided at all.
// ... add more attributes for markers in the future here
}, ...],
"heatmapMeta": { // ... add more data about the heat map in the future here |
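The calculations named in the comments of that proposal can be sketched in a few lines of Python. This is only an illustration under assumptions, not an agreed yt-dlp format: the function name `build_heatmap` is hypothetical, the input is assumed to be a list of absolute heat values (strings or numbers, like the "hotspots" array above) that split the video into equally long markers, and the video duration is assumed to be known in seconds.

```python
def build_heatmap(values, duration):
    """Turn equally spaced absolute heat values into marker dicts.

    values   -- absolute heat values, one per marker (strings or numbers)
    duration -- video duration in seconds
    Both the function name and the input shape are assumptions for this sketch.
    """
    counts = [int(v) for v in values]
    if not counts:
        return []
    peak = max(counts)
    step = duration / len(counts)  # video duration / number of all markers
    return [
        {
            'start': index * step,          # step * index of this marker
            'end': (index + 1) * step,      # step * (index of this marker + 1)
            # absolute value of this marker / biggest absolute value
            'normalized': count / peak if peak else 0.0,
            'absolute': count,
        }
        for index, count in enumerate(counts)
    ]


if __name__ == '__main__':
    # Made-up values, only to show the shape of the result.
    for marker in build_heatmap(['4', '2', '1', '2'], duration=8.0):
        print(marker)
```

For a site that only exposes relative values, the `absolute` key would simply be left out, which matches the "might be not provided at all" remark in the proposal.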
I don't want to make an inappropriate remark but I have been contacted about this thread so I thought I may help some people in immediate need: my open-source YouTube operational API is able to retrieve the most replayed data from a YouTube video from its id by fetching https://A_USUAL_INSTANCE/videos?part=mostReplayed&id=VIDEO_ID (note that the official instance |
To extract data youtube-heatmap |
Are we going anywhere with this data?
I didn't understand the calculation, can someone help a dumb guy, if I understand I can implement it. EDIT: ok I got the instructions. |
I think niklas-englert's proposal is mostly good.
Note that once we add a field, it can never be removed/changed due to compatibility requirements. But any field that we skip can always be added back in future if needed |
My suggestion was designed to provide a standard able to represent heatmap data in a uniform way as well as to include all additional information that other sites provide as well as information that may come in the future.
For YT (and for now) we can only provide
Well, you could drop
Can be completely omitted at the moment. I just wanted to demonstrate what could be done if more data than just the points of the heatmap are provided. PS: I'm still looking forward to any implementation in yt-dlp. I have been receiving non-stop private messages asking for a solution ever since I opened the original issue on youtube-dl (50+ and still counting). Since I took my public email off GitHub, it's gotten a little better. But this is a compromise that I would like to reverse as soon as possible. |
About the duration and normalized: IMO the duration is necessary. I believe we don't need the start and end, because the heatmap will probably cover most of the video, even if we sometimes have some peaks (mountains) in it. This is what I propose:
Edit: I am having a look in the code at the moment, and the theory about the
is not usable when we have chapters. What is the rationale for the 5.0? Can you explain? Maybe I will find out about the chapters.
|
Following the approach from @Benjamin-Loison and @WillianAgostini, I made the following gist in Python. It does not use the methodology explained here, as I was not able to fully translate the logic for videos with chapters. Instead, the gist fetches the HTML of a video link and searches for the specific script that contains ytInitialData; inside this data we can retrieve the original data from YouTube. You can also see the same code in JavaScript, which is my original version. The Python version runs, but with a few errors in puppeteer; since Python is not my "native" language and I only use it as a hobby, I will leave it to the experts to fix. https://github.com/guifeliper/yt-heatmap#readme
|
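For readers who want to see the idea without opening the gist, here is a minimal sketch of the "find the script that contains ytInitialData" step. It is not the gist's code. It assumes the page embeds the data as a JavaScript assignment of the form `ytInitialData = {...}`, and the helper name `extract_initial_data` is made up for this example; fetching the HTML is left out on purpose.

```python
import json
import re

# Assumption: the watch page contains "ytInitialData = { ...JSON... }".
_INITIAL_DATA_RE = re.compile(r'ytInitialData\s*=\s*')


def extract_initial_data(html):
    """Return the parsed ytInitialData object from page HTML, or None."""
    match = _INITIAL_DATA_RE.search(html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (the trailing ";" and "</script>").
        data, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return data


if __name__ == '__main__':
    # Tiny made-up page, only to show the call.
    page = '<script>var ytInitialData = {"playerOverlays": {}};</script>'
    print(extract_initial_data(page))
```

Using `raw_decode` avoids having to guess where the JSON object ends with a regular expression, which tends to break on nested braces.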
There is an abandoned project that creates a timebar of thumbnails and other data, with a script that rendered it in MPV: https://github.com/nordlicht/nordlicht Related to that project, YouTube also has a way to show several seekbar thumbnails at the same time, and there was in the past a userscript to show them all at the same time as a gallery. It would be nice to be able to get something like nordlicht/nordlicht#67 (comment) from them (click the image if it is slow to load). |
@aleksejrs How's that related to this issue? yt-dlp can already download storyboards (thumbnails in timebar). This post is about the heatmap (see image in OP), not storyboards.
|
@pukkandan The program is related by having a script for MPV (I don't know if the script still works though). How do I learn about the existence of storyboards (they seem to be mentioned in the format list, so the word appears a lot in issues, with no details) and how to use them? |
They are listed in |
I don't know if it's a YouTube-side change or if I just found a video with this behavior, but sometimes (after multiple webpage refreshes)

from yt_dlp import YoutubeDL
import json
with YoutubeDL() as ydl:
    info_dict = ydl.extract_info('https://www.youtube.com/watch?v=MX5GkDRIdno', download=False)
    print(json.dumps(info_dict.get('heatmap'), indent=4))

Instead of considering:

ytInitialData['playerOverlays']['playerOverlayRenderer']['decoratedPlayerBarRenderer']['decoratedPlayerBarRenderer']['playerBar']['multiMarkersPlayerBarRenderer']['markersMap'][-1]['value']['heatmap']['heatmapRenderer']

we have to consider:

ytInitialData['frameworkUpdates']['entityBatchUpdate']['mutations'][0]['payload']['macroMarkersListEntity']['markersList']

Note that the data structure doesn't just go from an element to an array, as for instance I am also currently managing this issue in my YouTube operational API. |
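A defensive way to cope with both layouts is to try each path in turn and report which one matched. The sketch below does only that: the two key paths are copied from the comment above, while the helper names `dig` and `find_heatmap_source` and the labels `'player_bar'` and `'mutations'` are hypothetical. Since the two layouts are shaped differently, the caller still has to interpret the returned node according to the label.

```python
# Key paths taken from the comment above; everything else is illustrative.
PLAYER_BAR_PATH = (
    'playerOverlays', 'playerOverlayRenderer', 'decoratedPlayerBarRenderer',
    'decoratedPlayerBarRenderer', 'playerBar', 'multiMarkersPlayerBarRenderer',
    'markersMap', -1, 'value', 'heatmap', 'heatmapRenderer',
)
MUTATIONS_PATH = (
    'frameworkUpdates', 'entityBatchUpdate', 'mutations', 0,
    'payload', 'macroMarkersListEntity', 'markersList',
)


def dig(obj, path):
    """Follow a sequence of dict keys / list indices; None if any step fails."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return None
    return obj


def find_heatmap_source(initial_data):
    """Return (label, node) for the first layout that is present."""
    for label, path in (('player_bar', PLAYER_BAR_PATH),
                        ('mutations', MUTATIONS_PATH)):
        node = dig(initial_data, path)
        if node is not None:
            return label, node
    return None, None
```

Calling `find_heatmap_source(initial_data)` on a page without either layout returns `(None, None)`, so a missing heatmap does not raise.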
Closes yt-dlp#3888 Authored by: tntmod54321
Checklist
Region
Germany
Example URLs
https://www.youtube.com/watch?v=Z8Z51no1TD0
Description
Since May 18, YouTube has been rolling out a feature that adds a "most replayed" graph (internal name seems to be "heatmap") to the progress bar, after experimenting with it for at least two years. (see this tweet)
The data for this new feature seems to be missing right now. For now I'm laboriously working around this with a self-written web extension. I hope what I found out so far is somehow helpful:
YouTube's implementation on the web page is relatively straightforward and easy to extract (using an extension inject). An SVG tag on the page (svg.ytp-heat-map-svg, 1000x100) contains a path defined with cubic Bézier curves (a C followed by three x,y pairs).
Every third x,y parameter after a C, where x ends with 5.0, is a usable data point:
x is the time stamp in percent. Just compute (x-5)/1000 for a value from 0 to 1.
y is the heat value for this time period. Just compute (100-y)/100 for a value from 0 to 1.
Example:
Here's the SVG tag for this video:
...and boiled down data:
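As a rough illustration of the rule described above (this is not the data from the original post), the extraction can be sketched in Python. The function name `heatmap_points` and the sample path string are made up; the sketch also assumes every curve segment is written as an explicit C followed by exactly three x,y pairs.

```python
import re

_NUMBER = r'(-?\d+(?:\.\d+)?)'
# One cubic Bezier segment: "C" followed by three x,y pairs (six numbers).
_CUBIC_RE = re.compile(r'C\s*' + r'[\s,]+'.join([_NUMBER] * 6))


def heatmap_points(path_d):
    """Extract (time, heat) pairs, both in the range 0..1, from a path string."""
    points = []
    for match in _CUBIC_RE.finditer(path_d):
        # The third pair of each segment is the candidate data point.
        x_text, y_text = match.group(5), match.group(6)
        if not x_text.endswith('5.0'):
            continue
        x, y = float(x_text), float(y_text)
        points.append(((x - 5) / 1000, (100 - y) / 100))
    return points


if __name__ == '__main__':
    # Hypothetical path data, only to demonstrate the parsing.
    sample = ('M 0.0,100.0 C 1.0,89.5 2.0,52.2 5.0,47.6 '
              'C 8.0,43.0 11.0,60.0 15.0,66.5 '
              'C 19.0,73.0 21.0,80.0 25.0,90.0')
    for time_fraction, heat in heatmap_points(sample):
        print(time_fraction, heat)
```

Multiplying the first value of each pair by the video duration gives the position in seconds.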