[Scrapers] Introduce new TV show scraper interface #972

Merged
merged 1 commit into from Dec 6, 2020

bugwelle
Collaborator

@bugwelle bugwelle commented May 3, 2020

The new scraper is described in docs/contributing/scraper-interface.md.

May fix:

There is still a lot to do. Currently, only the interface is defined, along with some basic changes to make MediaElch compile. Nothing works at the moment, not even the search.

  • Make search work
    • search with different languages
    • Integrate language into UI
  • Make TV show scraping work
    • basic details loaded
    • scrape with different languages
    • load all show details
  • Make episode scraping work
    • single episode scraping works
    • multi episode scraping works
    • episode loading through TV show scraping works
  • Settings overview with language dropdown
  • Custom TV scraper works
  • IMDb scraper
  • Initialization can be repeated
  • "Fill missing episodes" works

@bugwelle
Collaborator Author

Language/Scraper drop-down menu in search window works. :)

Screen Shot 2020-05-13 at 14 38 12

@bugwelle bugwelle force-pushed the tvshowscraper branch 3 times, most recently from 146b4d8 to 0504e8a Compare May 19, 2020 14:30
@bugwelle
Collaborator Author

Single episode scraping now works as well.

@bugwelle
Collaborator Author

Multi-episode scraping works as well. Just some minor issues left. :-)

@bugwelle bugwelle force-pushed the tvshowscraper branch 4 times, most recently from 23891a5 to 7707c13 Compare May 22, 2020 18:04
@bugwelle
Collaborator Author

Episode scraping through TV show works now as well.
What's left is proper show/episode merging as well as images (that will be the most difficult part I guess).

@bugwelle
Collaborator Author

New TV scraper settings:

Screen Shot 2020-05-30 at 18 55 07

@ticao2

ticao2 commented Dec 1, 2020

I was having disk-space problems on my hard drive, so I'm only answering now. I backed up my data and solved it.
So, let's go in parts, as Jack The Ripper would say.

I installed MediaElch in portable mode.
I configured the options as I always do: Brazilian Portuguese as the language.
In the advancedsettings.xml file, I enabled the debug option.
I have the Dr. House series to test, with all seasons and episodes.
I put only seasons 1 and 2 in the folder to be scraped.

When I opened the Scraper interface it looked like this:
Scraper = TMDb TV
Language = English-US> Different from what I chose before.
Found 2 results
In the dropdown, 5 options:

  1. Update TV Show only
  2. Update TV Show and new Episodes
  3. Update TV Show and all Episodes
  4. Update new Episodes
  5. Update all Episodes

I didn't change any of the selected details.
I didn't run the scraper.
I closed MediaElch and copied the debug log file:
Wrong: MediaElch_New_2.6.7_Debug_2020-12-01-17-44.log
Right: MediaElch_New_2.6.7_Debug_2020-12-01-18-59.log
I mixed up the log files. If necessary, I can redo everything:
create another portable installation and start from the beginning.

Would it be necessary to test TMDb and also TVDB and IMDB?
I assume it would be interesting to establish an order.
With seasons 1 and 2 complete:

  1. Open MediaElch, run "Update TV Show only", close MediaElch and copy the debug log file.
  2. Open MediaElch, run "Update TV Show and all Episodes", close MediaElch and copy the debug log file.

Add season 3 (complete):

  3. Open MediaElch, run "Update TV Show and new Episodes", close MediaElch and copy the debug log file.

Add season 4 partially (10 episodes):

  4. Open MediaElch, run "Update new Episodes", close MediaElch and copy the debug log file.

Am I crazy? Awaiting instructions.

@bugwelle
Collaborator Author

bugwelle commented Dec 2, 2020

Thanks! To answer some questions:

Scraper = TMDb TV
Language = English-US> Different from what I chose before.

That happened because TMDb is a new scraper. The language should be saved for each scraper. It is a different one than the movie TMDb scraper. ;-)
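A minimal sketch of what "the language is saved for each scraper" could look like, written in Python for brevity rather than in the project's C++. The class, the scraper IDs (`tmdb-movie`, `tmdb-tv`) and the `en-US` fallback are assumptions for illustration, not MediaElch's actual code:

```python
from dataclasses import dataclass, field
from typing import Dict

# Assumed fallback for a scraper that has no saved language yet.
DEFAULT_LANGUAGE = "en-US"


@dataclass
class ScraperLanguageSettings:
    """Keeps one language per scraper ID (hypothetical structure)."""
    languages: Dict[str, str] = field(default_factory=dict)

    def language_for(self, scraper_id: str) -> str:
        # A scraper without a saved choice falls back to the default.
        return self.languages.get(scraper_id, DEFAULT_LANGUAGE)

    def set_language(self, scraper_id: str, language: str) -> None:
        self.languages[scraper_id] = language


settings = ScraperLanguageSettings()
# The user previously chose Brazilian Portuguese for the movie scraper only.
settings.set_language("tmdb-movie", "pt-BR")
movie_language = settings.language_for("tmdb-movie")
tv_language = settings.language_for("tmdb-tv")
```

Because the TV scraper has its own entry, it starts with the fallback language until the user picks one, which matches the behavior reported above.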

The search results for TMDb seem correct: https://www.themoviedb.org/search/tv?query=Dr%20%20House


Would it be necessary to test TMDb and also TVDB and IMDB?

That would be great. 👍

I'm only interested in the Debug log if MediaElch should crash.
Otherwise just play around and have a look if MediaElch behaves as you would think. :-)

@txtsd

txtsd commented Dec 2, 2020

What settings do I need to change to search with TMDb?
My TV Scraper settings still show TVDB:
[Screenshot: 2020-12-02-193354_604x509_scrot]
and the only options are TVDB and IMDB.

@bugwelle
Collaborator Author

bugwelle commented Dec 2, 2020

Hi,

are you sure that you're on the git branch tvshowscraper of my repository? :-)

Because that settings page now looks like this:

Screen Shot 2020-12-02 at 15 11 51

Screen Shot 2020-12-02 at 15 12 18

Regards,
Andre

@txtsd

txtsd commented Dec 2, 2020

Whoops. Looks like I built master instead. Built the correct branch this time.

@txtsd

txtsd commented Dec 2, 2020

These settings don't stick after changing them. They revert to TMDb:
[Screenshot: 2020-12-02-204553_763x651_scrot]

@bugwelle
Collaborator Author

bugwelle commented Dec 2, 2020

@txtsd Thanks! It turns out the settings were saved, but I had introduced a bug when displaying them. It's now fixed. 😄

@txtsd

txtsd commented Dec 2, 2020

I should've mentioned explicitly that it happens in the next tab ("Episode Show Details") too.

@bugwelle
Collaborator Author

bugwelle commented Dec 2, 2020

That should be fixed as well. Or did I forget to commit all changes?

@txtsd

txtsd commented Dec 2, 2020

The first tab is sticking with the last commit, but the second tab isn't.

@bugwelle
Collaborator Author

bugwelle commented Dec 2, 2020

Strange. But thanks. I had tested it :/
Will look at it tomorrow.

Thanks for reporting!

@bugwelle
Collaborator Author

bugwelle commented Dec 3, 2020

Yep, forgot to commit Settings.cpp. There was a typo in storing the settings. The code needs to be refactored in the future. Copy&Pasting (or rather re-typing) strings everywhere is error-prone...
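One common way to avoid re-typed key strings is to define each settings key once and refer to it by name. The sketch below is in Python for brevity rather than the project's C++, and the key strings, class names and detail values are hypothetical, not taken from Settings.cpp:

```python
class SettingsKey:
    """Each key string is written exactly once (hypothetical key names)."""
    SHOW_DETAILS = "Scrapers/TvShows/ShowDetails"
    EPISODE_DETAILS = "Scrapers/TvShows/EpisodeDetails"


class SettingsStore:
    """Tiny in-memory stand-in for a persistent settings backend."""

    def __init__(self):
        self._values = {}

    def save(self, key, value):
        self._values[key] = value

    def load(self, key, default=None):
        return self._values.get(key, default)


store = SettingsStore()
store.save(SettingsKey.EPISODE_DETAILS, ["title", "overview"])

# Reading through the same constant always hits the same key.
episode_details = store.load(SettingsKey.EPISODE_DETAILS)

# A re-typed string with a typo silently returns the default instead.
mistyped = store.load("Scrapers/TvShows/EpisodeDetals")
```

With constants, a misspelled name such as `SettingsKey.EPISODE_DETALS` fails loudly with an `AttributeError`, whereas a misspelled string literal just reads or writes a different key.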

@txtsd

txtsd commented Dec 3, 2020

Ok it sticks now!

@bugwelle
Collaborator Author

bugwelle commented Dec 4, 2020

Any major issues? :-)

If not then I'll merge it and will release a new nightly. Then translators can translate all new strings and I'll notify users on the Kodi-forum that they can test it. :-)

@txtsd

txtsd commented Dec 4, 2020

I've scraped with all the scrapers, with all the options in the dropdown. No crashes yet so we should be good to go!

@ticao2

ticao2 commented Dec 5, 2020

Sorry I'm late.

MediaElch NEW

I will change some translations for pt-BR.
Some phrases or words are not in Locale. Is there another way for translation?
The new tab, Scrapers TV, is only for defining language. It does not define the standard font, the chosen one, the default. Am I right?
TMDb language list - must be the same for Movies, Concerts and Series.
IMDB has no language option? Just English?

Icon Spacing - I've complained about this before. :-)
Next to the title of the item, in the left column, we have an arrow to open and close the list of Seasons and Episodes.

  • Right Arrow (Closed List) - Invisible
  • Down Arrow (Open List) - Almost Invisible

Update Dropdown options - move up? next to Language? I'm not really sure.

TMDb OK (Season 1 and 2)
TVDB OK (Season 3)
IMDB - very slow (Season 4) English only.

In Episode > Info tab we have fields for TVDB ID and IMDB ID.
Scraper with:
TMDb = does not add either.
TVDB = adds both
IMDB = only IMDB ID

@bugwelle
Collaborator Author

bugwelle commented Dec 5, 2020

Some phrases or words are not in Locale. Is there another way for translation?

When the PR is merged, I'll update the strings on transifex. Then you'll be able to translate them. :-)
I'll do so for German then.

The new tab, Scrapers TV, is only for defining language. It does not define the standard font, the chosen one, the default. Am I right?

In the settings dialog, the only option for all scrapers is the language. I don't get the "standard font". What do you refer to? :-)

TMDb language list - must be the same for Movies, Concerts and Series.

Should be. But I've looked at their API for the TV scraper more intensively. If the movie/music scraper does not list as many languages, then I have to update them as well.

IMDB has no language option? Just English?

Correct. To be more precise, I need to update it to "no language available". IMDb has no HTTP header/API option/... to set the language. It is based on the user's IP address... Very annoying, especially for testing the scraper...

Icon Spacing - I've complained about this before. :-)

Yeah... :-/


Sorry, I'm in a hurry. I'll answer the rest of your points tomorrow. :-)

Thank you all for testing this PR!

@ticao2

ticao2 commented Dec 6, 2020

In the settings dialog, the only option for all scrapers is the language. I don't get the "standard font". What do you refer to? :-)

Google translate issues: Font = Source
When I reopened MediaElch to test IMDB, I noticed that the selected scraper was TMDb and not the last one I had used, which was TVDB.
I suppose that users usually choose a data source they are going to use (IMDB, TVDB or TMDb) and always want to use it.
So either the dropdown should remember the last selection, or there should be a way to select the preferred one (the "standard font", i.e. the default source) somewhere.
It doesn't bother me because I prefer TMDb. :-)

TMDb language list - must be the same for Movies, Concerts and Series.

Should be. But I've looked at their API for the TV scraper more intensively. If the movie/music scraper does not list as many languages, then I have to update them as well.

I don't know how many language-COUNTRY codes are on the TMDb list here in MediaElch.
But...

TMDb primary_translations

https://developers.themoviedb.org/3/configuration/get-primary-translations
Valid for everything: Movies, TV Shows, Concerts (which no longer exists separately) and Profiles of people.
There are 68 on the list but I believe there is one that is not on the list.
hr-HR - hr Croatian - HR Croatia
So there are 69.

And there it is written:

Get a list of the officially supported translations on TMDb.
While it's technically possible to add a translation in any one of the languages (186) we have added to TMDb (we don't restrict content), the ones listed in this method are the ones we also support for localizing the website with which means they are what we refer to as the "primary" translations.

https://api.themoviedb.org/3/configuration/primary_translations?api_key=THE_KEY
TMDb primary_translations.txt

Sorry, I'm in a hurry. I'll answer the rest of your points tomorrow. :-)

Do not drink too much :-)

@bugwelle
Collaborator Author

bugwelle commented Dec 6, 2020

I suppose that users usually choose a data source they are going to use (IMDB, TVDB or TMDb) and always want to use it.

True. I've added this functionality so that the last used scraper is auto-selected. :-)
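As a rough illustration of auto-selecting the last used scraper, here is a minimal Python sketch. The function name, the scraper IDs and the fallback order are assumptions, not the code that was added to MediaElch:

```python
def pick_initial_scraper(available, last_used, fallback="tmdb-tv"):
    """Return the scraper ID to preselect in the search dialog's dropdown."""
    # Prefer the scraper the user picked last time, if it still exists.
    if last_used in available:
        return last_used
    # Otherwise use the assumed default, or the first entry as a last resort.
    if fallback in available:
        return fallback
    return available[0]


scrapers = ["tmdb-tv", "thetvdb", "imdb-tv"]  # hypothetical scraper IDs
remembered = pick_initial_scraper(scrapers, last_used="thetvdb")
first_run = pick_initial_scraper(scrapers, last_used=None)
```

On a first run nothing has been stored yet, so the fallback is used; after that, the dropdown opens on whatever the user chose most recently.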

TMDb primary_translations

I'm currently listing 68 languages. I've added hr-HR. :-)
I currently do not load the languages from their site. That may come in a future version, though.
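If the list were loaded from TMDb in a future version, the step could look roughly like the Python sketch below. It assumes the `primary_translations` endpoint linked above returns a JSON array of codes such as `"pt-BR"`; the function names are hypothetical, and the network request itself is left out so the parsing can be shown on a sample body:

```python
import json

# Endpoint taken from the link posted earlier in this thread.
PRIMARY_TRANSLATIONS_URL = (
    "https://api.themoviedb.org/3/configuration/primary_translations"
)


def build_url(api_key):
    """Build the request URL for a given API key."""
    return f"{PRIMARY_TRANSLATIONS_URL}?api_key={api_key}"


def merge_languages(response_body, extra=("hr-HR",)):
    """Parse the response and add codes that are missing from it.

    Assumes the body is a JSON array of language-COUNTRY strings.
    """
    codes = json.loads(response_body)
    return sorted(set(codes) | set(extra))


# Shortened sample body standing in for the real response.
sample_body = '["pt-BR", "en-US", "de-DE"]'
languages = merge_languages(sample_body)
```

Keeping a small `extra` tuple would preserve manual additions like hr-HR even if the endpoint does not list them.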

Do not drink too much :-)

Didn't drink anything. 😄
Just a lot of university stuff.


From your previous post:

IMDB - very slow (Season 4)

True. I'm currently loading each episode one after the other, and their website has always been very slow...

In Episode > Info tab we have fields for TVDB ID and IMDB ID.
Scraper with:
TMDb = does not add either.
TVDB = adds both
IMDB = only IMDB ID

Episodes don't have a TMDb ID. But the TMDb episode loader should load all other IDs if and only if you scrape each episode on its own. If you use "update all new episodes", that information is not available yet.
That's a limitation on their side (in the API). I have to load each episode in a separate request which is annoying. I can't use a "batch request". I still have to update that...
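When no batch request exists, one generic option is to issue the per-episode requests concurrently instead of strictly one after the other. The Python sketch below only illustrates that idea: `fetch_episode_ids` is a stand-in that performs no real request, its return values are placeholders, and whether the API's rate limits would allow this is not something this thread establishes:

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_episode_ids(episode):
    """Stand-in for one per-episode request (no real network call)."""
    season, number = episode
    # Placeholder values; a real loader would return IDs from the API.
    return {"season": season, "episode": number, "ids_loaded": True}


def load_all(episodes, fetch=fetch_episode_ids, max_workers=4):
    """Run one request per episode on a small thread pool.

    pool.map returns results in the same order as the input list.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, episodes))


episodes = [(1, 1), (1, 2), (2, 1)]
results = load_all(episodes)
```

The number of requests stays the same; only the waiting time overlaps, so the limitation on the API side remains.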

@bugwelle bugwelle merged commit c40bda5 into Komet:master Dec 6, 2020
New Scraper Interfaces automation moved this from In progress to Done Dec 6, 2020
@bugwelle bugwelle deleted the tvshowscraper branch December 6, 2020 16:53
@bugwelle
Collaborator Author

bugwelle commented Dec 6, 2020

I've merged this PR. But I have to provide new nightlies manually because Travis CI no longer works for free projects as it did before.

If you find any other issues, please open a normal GitHub issue. :-)

@ticao2

ticao2 commented Dec 7, 2020

That's a limitation on their side (in the API). I have to load each episode in a separate request which is annoying. I can't use a "batch request". I still have to update that...

Perhaps an API request to TvDB, specifically to obtain only those two IDs (TvDB and IMDB)?
I have no idea if it's simpler or more complex.

@bugwelle
Collaborator Author

bugwelle commented Dec 7, 2020

TvDB won't work in the future (starting Jan 2021). :-/

Will work on it.

Labels: component: TV shows, priority: high
3 participants