Python script to quickly make Plex-Meta-Manager poster entries from ThePosterDatabase (TPDb) sets.
Because TPDb doesn't permit automated scraping, this tool reads HTML files.
This tool handles collection, movie, show, and season posters all in one file and outputs YAML that can be used in Plex-Meta-Manager metadata files. An example of part of the output for TheDoctor30's Marvel Television Set is shown below:
# --------------------------------------------------------------------------------
# collections
Marvel Television:
  url_poster: https://theposterdb.com/api/assets/19724
# --------------------------------------------------------------------------------
# shows
Marvel's Daredevil:
  url_poster: https://theposterdb.com/api/assets/19725
  seasons:
    1: {url_poster: https://theposterdb.com/api/assets/19726}
    2: {url_poster: https://theposterdb.com/api/assets/19727}
    3: {url_poster: https://theposterdb.com/api/assets/19728}
Marvel's Jessica Jones:
  url_poster: https://theposterdb.com/api/assets/19729
  seasons:
    1: {url_poster: https://theposterdb.com/api/assets/19730}
    2: {url_poster: https://theposterdb.com/api/assets/19731}
    3: {url_poster: https://theposterdb.com/api/assets/19732}
Marvel's Luke Cage:
  url_poster: https://theposterdb.com/api/assets/19733
  seasons:
    1: {url_poster: https://theposterdb.com/api/assets/19734}
    2: {url_poster: https://theposterdb.com/api/assets/19735}
# etc..
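To make the layout of a show entry concrete, here is a minimal Python sketch that renders one show and its season posters in the shape shown above. It is not the tool's actual code: the function name `render_show`, its parameters, and the two-space indentation are assumptions made for illustration.

```python
# Hypothetical helper, not taken from main.py: renders one show entry
# in the YAML layout shown in the example output.
ASSET_BASE = "https://theposterdb.com/api/assets"


def render_show(title, poster_id, season_ids):
    """Return YAML text for a show and its season posters.

    season_ids maps a season number to a TPDb asset ID.
    Two-space indentation is an assumption.
    """
    lines = [f"{title}:", f"  url_poster: {ASSET_BASE}/{poster_id}"]
    if season_ids:
        lines.append("  seasons:")
        # Emit seasons in numeric order, one flow mapping per line.
        for number in sorted(season_ids):
            asset_id = season_ids[number]
            lines.append(f"    {number}: {{url_poster: {ASSET_BASE}/{asset_id}}}")
    return "\n".join(lines)
```

Calling `render_show("Marvel's Luke Cage", 19733, {1: 19734, 2: 19735})` produces the last show entry of the example.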
This is a Python command-line tool. All arguments are shown with --help:
$ poetry run python main.py -h
usage: main.py [-h] [-p] [-q] HTML_FILE

TPDb Collection Maker

positional arguments:
  HTML_FILE           file with TPDb Collection page HTML to scrape

optional arguments:
  -h, --help          show this help message and exit
  -p, --primary-only  only parse the primary set (ignore any Additional Sets)
  -q, --always-quote  put all titles in quotes ("")
NOTE: If copying these commands, do not copy the leading $; it only marks the line as a command.
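For readers who want to see how such an interface is typically declared, the following is a sketch of an argparse parser that accepts the same arguments as the help text above. The function name `build_parser` and the attribute name `html_file` are assumptions; the real main.py may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented CLI (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="TPDb Collection Maker",
    )
    # Required positional argument, displayed as HTML_FILE in the usage line.
    parser.add_argument(
        "html_file",
        metavar="HTML_FILE",
        help="file with TPDb Collection page HTML to scrape",
    )
    # Boolean flags default to False and become True when given.
    parser.add_argument(
        "-p", "--primary-only",
        action="store_true",
        help="only parse the primary set (ignore any Additional Sets)",
    )
    parser.add_argument(
        "-q", "--always-quote",
        action="store_true",
        help='put all titles in quotes ("")',
    )
    return parser
```

With this sketch, `build_parser().parse_args(["in.html", "-q"])` yields a namespace whose `html_file` is `"in.html"`, `always_quote` is `True`, and `primary_only` is `False`.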
- Install poetry, using the pipx method described in its installation documentation.
- Download this tool:
  $ git clone https://github.com/CollinHeist/TPDbCollectionMaker/
- Install the Python dependencies:
  $ cd TPDbCollectionMaker
  $ poetry install
- Run the script (see Arguments for details):
  $ poetry run python main.py -h
Because TPDb doesn't permit automated scraping, this tool reads HTML files. To get the HTML of a set, right-click the set page and select Inspect. This should launch your browser's HTML inspector.
Go to the top-most HTML element (if HTML is selected, hold the left-arrow key to collapse all the HTML). The top-most HTML should look like:
<!DOCTYPE html>
<html class="h-100" lang="en"><head>
...
Right-click the <html class="h-100" lang="en"><head> element and go to Copy > Inner HTML. Your clipboard now has the complete HTML of the set page; paste this into a file alongside this project's main.py file. This file will be the input to the script (see below).
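A common mistake when copying from the inspector is to grab only a nested element instead of the whole page. The sketch below is one way to sanity-check a saved file before running the tool, using only the standard library. It is not part of this project; the names `TagCollector` and `looks_like_full_page` are hypothetical, and the check assumes only that a complete copy contains both a head and a body element.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every opening tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        self.tags.add(tag)


def looks_like_full_page(html_text):
    """Return True if the text contains both <head> and <body> elements."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return {"head", "body"} <= collector.tags
```

To use it, read the saved file into a string (for example with `open("in.html", encoding="utf-8").read()`, where `in.html` is whatever name you chose) and pass it to `looks_like_full_page`. A `False` result suggests the copy was incomplete.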
HTML_FILE: Input HTML file to parse.
-p, --primary-only: Only parse the primary content on the given HTML page, ignoring any Additional Sets. If unspecified, the entire page is parsed.
-q, --always-quote: Quote all titles in the output. If unspecified, only titles with colons are quoted.
Below is an example of this argument:
$ poetry run python main.py in.html --always-quote
"Iron Man (2008)":
  url_poster: https://theposterdb.com/api/assets/9773
"The Incredible Hulk (2008)":
  url_poster: https://theposterdb.com/api/assets/9775
"Iron Man 2 (2010)":
  url_poster: https://theposterdb.com/api/assets/9776
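The quoting rule described for --always-quote can be sketched in a few lines of Python. This is an illustration of the documented behavior (quote everything when the flag is set, otherwise quote only titles containing a colon), not the tool's own implementation; `format_title` is a hypothetical name, and the escaping of embedded quotes is an added assumption.

```python
def format_title(title, always_quote=False):
    """Return the title as it would appear as a YAML key.

    Titles are wrapped in double quotes when always_quote is set or when
    the title contains a colon, which YAML could otherwise misread as a
    key/value separator.
    """
    if always_quote or ":" in title:
        # Assumption: escape backslashes and double quotes so the result
        # stays a valid double-quoted YAML scalar.
        escaped = title.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return title
```

Under this sketch, `format_title("Iron Man (2008)")` is left unquoted, while passing `always_quote=True` wraps it in double quotes as in the example output above.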