IMDB collections not working #1496

Closed
ryant523 opened this issue Jul 7, 2023 · 11 comments
Assignees
Labels
bug Bug is with Plex Meta Manager status:not-yet-viewed I haven't reviewed the Feature or Bug yet

Comments

@ryant523

ryant523 commented Jul 7, 2023

Version Number

1.19.0

What branch are you on?

master

Describe the Bug

The IMDB Chart collections are no longer being updated.

I'm not sure when this started, but I traced it to a 403 Forbidden error when retrieving the IMDB resources, e.g. https://www.imdb.com/chart/moviemeter.

The User-Agent header is not being overwritten, which is causing IMDB to reject these requests.

modules/imdb.py

def _ids_from_chart(self, chart):
    if chart == "box_office":
        url = "chart/boxoffice"
    elif chart == "popular_movies":
        url = "chart/moviemeter"
    elif chart == "popular_shows":
        url = "chart/tvmeter"
    elif chart == "top_movies":
        url = "chart/top"
    elif chart == "top_shows":
        url = "chart/toptv"
    elif chart == "top_english":
        url = "chart/top-english-movies"
    elif chart == "top_indian":
        url = "india/top-rated-indian-movies"
    elif chart == "lowest_rated":
        url = "chart/bottom"
    else:
        raise Failed(f"IMDb Error: chart: {chart} not ")
    #return self.config.get_html(f"https://www.imdb.com/{url}").xpath("//div[@class='wlb_ribbon']/@data-tconst")
    # works when I comment the line above and add this 
    return self.config.get_html(f"https://www.imdb.com/{url}", headers=util.header()).xpath("//div[@class='wlb_ribbon']/@data-tconst")
[2023-07-07 12:37:31,918] [plex_meta_manager.py:697]  [INFO]     |================================= Running IMDb Popular Collection ==================================|
[2023-07-07 12:37:31,919] [plex_meta_manager.py:711]  [INFO]     |                                                                                                    |
[2023-07-07 12:37:31,919] [plex_meta_manager.py:712]  [INFO]     | Sync Mode: sync                                                                                    |
[2023-07-07 12:37:31,919] [plex_meta_manager.py:715]  [DEBUG]    |                                                                                                    |
[2023-07-07 12:37:31,919] [plex_meta_manager.py:716]  [DEBUG]    | Builder: imdb_chart: popular_movies                                                                |
[2023-07-07 12:37:31,919] [plex_meta_manager.py:717]  [INFO]     |                                                                                                    |
[2023-07-07 12:37:31,920] [imdb.py:231]               [INFO]     | Processing IMDb Chart: Most Popular Movies                                                         |
[2023-07-07 12:37:32,179] [builder.py:2871]           [INFO]     |                                                                                                    |
[2023-07-07 12:37:32,179] [builder.py:2872]           [INFO]     |=========================== Updating Details of IMDb Popular Collection ============================|
[2023-07-07 12:37:32,180] [builder.py:2873]           [INFO]     |                                                                                                    |
[2023-07-07 12:37:32,180] [builder.py:2927]           [INFO]     | !020_IMDb Popular                                                                                  |
[2023-07-07 12:37:32,197] [builder.py:2947]           [INFO]     | Collection Metadata Edits                                                                          |
[2023-07-07 12:37:32,197] [util.py:208]               [DEBUG]    | 1 poster found:                                                                                    |
[2023-07-07 12:37:32,197] [util.py:210]               [DEBUG]    | Method: url_poster Poster: https://raw.githubusercontent.com/meisnate12/Plex-Meta-Manager-Images/master/chart/IMDb%20Popular.jpg |
[2023-07-07 12:37:32,198] [library.py:193]            [INFO]     | Detail: poster update not needed                                                                   |
[2023-07-07 12:37:32,760] [builder.py:3061]           [INFO]     |                                                                                                    |
[2023-07-07 12:37:32,760] [builder.py:3062]           [INFO]     |================================= Sorting IMDb Popular Collection ==================================|
[2023-07-07 12:37:32,760] [builder.py:3063]           [INFO]     |                                                                                                    |
[2023-07-07 12:37:32,761] [builder.py:3097]           [INFO]     | No Sorting Required                                                                                |
[2023-07-07 12:37:32,761] [plex_meta_manager.py:839]  [INFO]     |                                                                                                    |
[2023-07-07 12:37:32,761] [plex_meta_manager.py:840]  [INFO]     |====================================================================================================|
[2023-07-07 12:37:32,761] [plex_meta_manager.py:840]  [INFO]     |                                  Finished IMDb Popular Collection                                  |
[2023-07-07 12:37:32,761] [plex_meta_manager.py:840]  [INFO]     |                                    Collection Run Time: 0:00:01                                    |
[2023-07-07 12:37:32,762] [plex_meta_manager.py:840]  [INFO]     |====================================================================================================|

Relevant Collection/Overlay/Playlist Definition

No response

Logs

No response

@ryant523 ryant523 added bug Bug is with Plex Meta Manager status:not-yet-viewed I haven't reviewed the Feature or Bug yet labels Jul 7, 2023
@chazlarson
Contributor

There was apparently a recent change on the IMDB side such that PMM's requests now fail or return empty responses. It is yet to be fully characterized, and there's no ETA for a fix.

@ryant523
Author

ryant523 commented Jul 7, 2023

I'm sure this wasn't clear in my initial post, but I was able to fix this by passing headers=util.header() to the get_html method. This argument is included in most of the get_html calls, but is missing in a few places. Without it we send the user agent as python-requests/2.28.2; with it we send Mozilla/5.0 Firefox/102.0.

The actual solution probably should pass the language to the headers function.
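
A minimal sketch of that idea, not PMM's actual code: `browser_headers` and `build_request` are hypothetical helpers, and the standard library's `urllib.request` stands in for the `requests` calls PMM makes so the snippet runs without extra dependencies. The User-Agent string is the one quoted above; sending the language as `Accept-Language` is an assumption about what "pass the language" would mean.

```python
from typing import Optional
from urllib.request import Request

# User-Agent string quoted in the comment above.
BROWSER_USER_AGENT = "Mozilla/5.0 Firefox/102.0"


def browser_headers(language: Optional[str] = None) -> dict:
    """Return browser-like headers (hypothetical helper, not PMM's util.header)."""
    headers = {"User-Agent": BROWSER_USER_AGENT}
    if language:
        # Assumption: the language is sent as an Accept-Language header.
        headers["Accept-Language"] = language
    return headers


def build_request(url: str, language: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request carrying the browser-like headers."""
    return Request(url, headers=browser_headers(language))


req = build_request("https://www.imdb.com/chart/moviemeter", language="en-US")
# urllib stores header names capitalized, e.g. "User-agent".
print(req.get_header("User-agent"))
print(req.get_header("Accept-language"))
```

Nothing is sent over the network here; the point is only that every request is built through one helper, so no call site can forget the headers.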

@chazlarson
Contributor

Oh, thanks; I've been answering this same question reflexively over and over recently so didn't read this as closely as I obviously should have.

@Glenn332

Glenn332 commented Jul 9, 2023

#1499 this should fix it <3

@jonathan-bloodworth

Getting a similar error too, I've switched over to the nightly branch but the imdb fix recently committed there hasn't done the trick. I've got two collections that just check the imdb popular lists and neither have updated or been sorted correctly for the last few weeks.

@Glenn332

Glenn332 commented Jul 14, 2023

> Getting a similar error too, I've switched over to the nightly branch but the imdb fix recently committed there hasn't done the trick. I've got two collections that just check the imdb popular lists and neither have updated or been sorted correctly for the last few weeks.

Can you share the specific urls to these imdb lists?

@jonathan-bloodworth

> > Getting a similar error too, I've switched over to the nightly branch but the imdb fix recently committed there hasn't done the trick. I've got two collections that just check the imdb popular lists and neither have updated or been sorted correctly for the last few weeks.
>
> Can you share the specific urls to these imdb lists?

Just using the default templates by PMM, here's the config I'm using:

libraries:
  Movies:
    metadata_path:
    - pmm: imdb
      template_variables:
        item_radarr_tag_popular: popular
        name_popular: Popular
        order_popular: 0
        summary_popular: ''
        use_popular: true
        use_top: false
        use_lowest: false
        visible_home_popular: true
        visible_library: true
        visible_shared_popular: true
  TV:
    metadata_path:
    - pmm: imdb
      template_variables:
        name_popular: Popular
        order_popular: 0
        summary_popular: ''
        use_popular: true
        use_top: false
        use_lowest: false
        visible_home_popular: true
        visible_library: true
        visible_shared_popular: true

@Glenn332

> > > Getting a similar error too, I've switched over to the nightly branch but the imdb fix recently committed there hasn't done the trick. I've got two collections that just check the imdb popular lists and neither have updated or been sorted correctly for the last few weeks.
> >
> > Can you share the specific urls to these imdb lists?
>
> Just using the default templates by PMM, here's the config I'm using:

These use IMDb charts. IMDb is currently changing the HTML structure of its chart pages, which might have broken the IMDb scraper.

@ryant523
Author

I just noticed that this was no longer working too. I was able to make a quick fix to get this to work. I tested only the popular and top 250 urls:

https://www.imdb.com/chart/moviemeter
https://www.imdb.com/chart/top

It's been a while since I've used Python, but I got this to work by changing the last line of _ids_from_chart in imdb.py. This is from the nightly branch.

    def _ids_from_chart(self, chart, language):
        if chart == "box_office":
            url = "chart/boxoffice"
        elif chart == "popular_movies":
            url = "chart/moviemeter"
        elif chart == "popular_shows":
            url = "chart/tvmeter"
        elif chart == "top_movies":
            url = "chart/top"
        elif chart == "top_shows":
            url = "chart/toptv"
        elif chart == "top_english":
            url = "chart/top-english-movies"
        elif chart == "top_indian":
            url = "india/top-rated-indian-movies"
        elif chart == "lowest_rated":
            url = "chart/bottom"
        else:
            raise Failed(f"IMDb Error: chart: {chart} not ")
        #return self._request(f"https://www.imdb.com/{url}", language=language, xpath="//div[@class='wlb_ribbon']/@data-tconst")
        ids = self._request(f"https://www.imdb.com/{url}", language=language, xpath="//a[@class='ipc-lockup-overlay ipc-focusable']/@href")
        return [i.split('/')[2] for i in ids]

This at least works with the current HTML returned from IMDb.
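
The `split('/')[2]` step relies on every href having the shape `/title/<id>/...`. A slightly more defensive sketch follows, assuming IMDb title IDs are `tt` followed by digits and that the same link may appear more than once on a page. `imdb_ids_from_hrefs` is a hypothetical helper, and the sample hrefs are made up for illustration.

```python
import re

# Assumption: a title ID is "tt" followed by digits, inside a /title/<id> path.
TITLE_ID = re.compile(r"/title/(tt\d+)")


def imdb_ids_from_hrefs(hrefs):
    """Extract title IDs from hrefs, skipping non-title links and duplicates."""
    seen = set()
    ids = []
    for href in hrefs:
        match = TITLE_ID.search(href)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids


sample = [
    "/title/tt0111161/?ref_=chttp_i_1",
    "/title/tt0068646/?ref_=chttp_i_2",
    "/title/tt0111161/?ref_=chttp_i_1",  # duplicate link
    "/chart/top/",                       # not a title link
]
print(imdb_ids_from_hrefs(sample))
```

Matching on the ID pattern rather than a fixed path position means an unexpected link is skipped instead of raising an IndexError or returning a bogus ID.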

@JohnFawkes
Contributor

yea imdb just pushed out more changes. the current nightly69 should fix it again

@JohnFawkes
Contributor

i imagine this will be a cat and mouse game for a while, until imdb is done doing what they're doing
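
One way to soften that cat and mouse game is to try each known page layout in turn and fail loudly when none matches. This is only a sketch, not what the nightly build does: `ids_with_fallback` is a hypothetical helper, the two patterns mirror the XPath queries quoted earlier in this thread, and regexes over raw HTML (which assume `class` comes before the other attribute) stand in for real XPath parsing.

```python
import re

# Each entry: (label, pattern). The patterns mirror the two XPath queries
# quoted in this thread; attribute order in the tag is an assumption.
EXTRACTORS = [
    ("old wlb_ribbon markup",
     re.compile(r'class="wlb_ribbon"[^>]*data-tconst="(tt\d+)"')),
    ("new ipc-lockup-overlay markup",
     re.compile(r'class="ipc-lockup-overlay ipc-focusable"[^>]*href="/title/(tt\d+)')),
]


def ids_with_fallback(html):
    """Return (layout label, ids) from the first extractor that finds any IDs."""
    for label, pattern in EXTRACTORS:
        ids = pattern.findall(html)
        if ids:
            return label, ids
    # Raising keeps a layout change from passing silently as an empty chart.
    raise ValueError("no known IMDb chart markup matched; the page layout may have changed")


# Made-up fragments for illustration only.
old_html = '<div class="wlb_ribbon" data-tconst="tt0111161"></div>'
new_html = '<a class="ipc-lockup-overlay ipc-focusable" href="/title/tt0068646/?ref_=x"></a>'
print(ids_with_fallback(old_html))
print(ids_with_fallback(new_html))
```

Raising on an empty result matters here because, as the log in the original report shows, a run that finds no IDs otherwise completes without any visible error.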

6 participants