author metadata field is null for YouTube videos #397

Closed
basilioss opened this issue Jul 27, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@basilioss

I want to get the channel name from a YouTube video. The author field looks like it should provide this value, but when I use the command-line interface to get the metadata, the author key is always null. Here's an example:

$ trafilatura --json -u 'https://youtu.be/lTRiuFIWV54' | jq
{
  "title": "1 A.M Study Session 📚 - [lofi hip hop/chill beats]",
  "author": null,
  "hostname": "youtube.com",
  "date": "2019-12-08",
  "categories": "",
  "tags": "chilledcow, chilled cow, lofi, lofi hiphop, lofi hip hop, lo-fi hiphop, lo fi hip-hop, 1.AM Study session, study, study music, study mix, lofi mix, lofi hip hop mix, lofi compilation, chilledcow mix, chillhop, chill beats, chill, chill music",
  "fingerprint": "3f62dbb16fd42ffe",
  "id": null,
  "license": null,
  "comments": "",
  "raw_text": "Про розділ\nДля преси\nАвторські права\nЗв'язатися з нами\nДля авторів\nДля рекламодавців\nДля розробників\nУмови\nКонфіденційність\nПравила й безпека\nЯк працює YouTube\nСпробувати нові функції\n© 2023 Google LLC",
  "text": "Про розділ\nДля преси\nАвторські права\nЗв'язатися з нами\nДля авторів\nДля рекламодавців\nДля розробників\nУмови\nКонфіденційність\nПравила й безпека\nЯк працює YouTube\nСпробувати нові функції\n© 2023 Google LLC",
  "language": null,
  "image": "https://i.ytimg.com/vi/lTRiuFIWV54/maxresdefault.jpg",
  "pagetype": "video.other",
  "source": "https://www.youtube.com/watch?v=lTRiuFIWV54",
  "source-hostname": "YouTube",
  "excerpt": "🎭 | Create your lofi avatar now→ https://bit.ly/lofigirl-generator🎼 | Listen on Spotify, Apple music and more→ https://fanlink.to/lofigirl-music🌎 | Lofi..."
}
@adbar added the enhancement label Aug 2, 2023
@adbar
Owner

adbar commented Aug 2, 2023

Hi @basilioss, I can reproduce the issue. I assume an additional XPath expression is needed to target author names on YouTube.
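
For illustration only, here is a rough sketch of the kind of XPath expression that could target an author name stored as schema.org microdata. The sample markup, the expression, and the `extract_author` helper are all assumptions, not Trafilatura code or confirmed YouTube markup. It uses the standard library's limited XPath support on a well-formed snippet; real pages would need an HTML parser such as lxml.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, simplified snippet modeled on schema.org microdata.
# The markup YouTube actually serves may differ and changes over time.
SAMPLE = """
<html>
  <body>
    <div itemscope="" itemtype="http://schema.org/VideoObject">
      <span itemprop="author" itemscope="" itemtype="http://schema.org/Person">
        <link itemprop="url" href="http://www.youtube.com/@ExampleChannel" />
        <link itemprop="name" content="Example Channel" />
      </span>
    </div>
  </body>
</html>
"""

# Illustrative expression, not one taken from Trafilatura's source.
AUTHOR_XPATH = ".//span[@itemprop='author']/link[@itemprop='name']"


def extract_author(markup: str) -> Optional[str]:
    """Return the author name found in the microdata, or None if absent."""
    root = ET.fromstring(markup)
    node = root.find(AUTHOR_XPATH)
    if node is None:
        return None
    # The name is carried in the "content" attribute of the <link> element.
    name = (node.get("content") or "").strip()
    return name or None


print(extract_author(SAMPLE))
```

Returning `None` when nothing matches mirrors the `"author": null` seen in the JSON output, so a missing or changed element degrades to the current behavior rather than raising.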

@adbar
Owner

adbar commented Apr 19, 2024

I regularly add XPath expressions to address metadata issues, e.g. #567. I tried to fix this issue, but YouTube extraction is too variable for a generic extractor like Trafilatura and would be too difficult to maintain. You can use a package focusing on YouTube for better results.
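
As one possible YouTube-specific route (an assumption, not a recommendation made in this thread), the channel name can be requested from YouTube's oEmbed endpoint with the standard library alone. This swaps in oEmbed in place of a dedicated package; the helper names below are hypothetical, and the `author_name` field is the one defined by the oEmbed format.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# YouTube's oEmbed endpoint; availability and rate limits are not guaranteed.
OEMBED_ENDPOINT = "https://www.youtube.com/oembed"


def build_oembed_url(video_url: str) -> str:
    """Build the oEmbed request URL for a given video URL."""
    query = urllib.parse.urlencode({"url": video_url, "format": "json"})
    return f"{OEMBED_ENDPOINT}?{query}"


def parse_author(payload: str) -> Optional[str]:
    """Extract the author (channel) name from an oEmbed JSON payload."""
    data = json.loads(payload)
    return data.get("author_name") or None


def fetch_author(video_url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch the oEmbed document over the network and return the channel name."""
    request_url = build_oembed_url(video_url)
    with urllib.request.urlopen(request_url, timeout=timeout) as response:
        return parse_author(response.read().decode("utf-8"))
```

Calling `fetch_author("https://www.youtube.com/watch?v=lTRiuFIWV54")` performs the network request; the URL building and parsing steps are kept separate so they can be checked offline.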

@adbar closed this as not planned Apr 19, 2024