# Invader ZIM (DVD)

<https://www.thetvdb.com/series/invader-zim>

![Invader ZIM Clearart](https://artworks.thetvdb.com/banners/v4/series/75545/clearart/611bab4a8f233_t.png)


These notes preserve my memories of auto-generating [Kodi](https://kodi.tv/) [[GitHub](https://github.com/xbmc)] `episodedetails` XML files in the `*.nfo` format for _Invader Zim_, volumes 1 & 2 (a four-disc DVD set, 2004 [Viacom International](https://en.wikipedia.org/wiki/ViacomCBS)), a masterpiece of [Jhonen Vasquez](https://en.wikipedia.org/wiki/Jhonen_Vasquez), [Richard Steven Horvitz](https://en.wikipedia.org/wiki/Richard_Steven_Horvitz) and other people.

This `*.nfo` format is partially documented in the following Kodi wiki pages:

- [NFO files](https://kodi.wiki/view/NFO_files)
- [NFO files/TV shows](https://kodi.wiki/view/NFO_files/TV_shows)
- [NFO files/Episodes](https://kodi.wiki/view/NFO_files/Episodes)

Today my research is telling me to be satisfied with this format:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<episodedetails>
    <title></title>
    <plot></plot>
</episodedetails>
```

The top-level location of the `episodedetails` data is saved in the variable below:


In [1]:
dvd_episode_location = 'https://www.thetvdb.com/series/invader-zim/seasons/dvd/1'


Let’s use this location to save the HTML page to the `html` variable:


In [2]:

import requests

response = requests.get(dvd_episode_location)
html = response.content.decode()

We expect to find one `table` in this document:


In [3]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')

table = soup.find('table')

We then use the `IPython.display` API (a regular import, not a magic command) to render this table in the notebook:


In [4]:
from IPython.display import display, HTML

display(HTML(table.decode()))

| Episode | Name | First Aired | Runtime | Image | Certified |
|---|---|---|---|---|---|
| S01E01 | The Nightmare Begins | March 30, 2001 Nickelodeon | 30 | | |
| S01E02 | Bestest Friend | April 13, 2001 Nickelodeon | 30 | | |
| S01E02 | NanoZIM | April 13, 2001 Nickelodeon | 30 | | |
| S01E03 | Parent Teacher Night | April 6, 2001 Nickelodeon | 30 | | |
| S01E03 | Walk of Doom | April 6, 2001 Nickelodeon | 30 | | |
| S01E04 | Germs | April 20, 2001 Nickelodeon | 30 | | |
| S01E04 | Dark Harvest | April 20, 2001 Nickelodeon | 30 | | |
| S01E05 | The Wettening | April 27, 2001 Nickelodeon | 30 | | |
| S01E05 | Attack of the Saucer Morons | April 27, 2001 Nickelodeon | 30 | | |
| S01E06 | Career Day | May 4, 2001 Nickelodeon | 30 | | |


## the episode data and media file rendering

The episode data and media file rendering present the following challenges:

**One episode can contain two shows.** The table lists some episode numbers twice because one episode can span two shows (e.g. S01E15 has shows “Future Dib” and “Mysterious Mysteries”).

**Opening credits do not appear for the second show.** Using something like [FFmpeg](https://github.com/FFmpeg/) to split the DVD files into separate episodes will not work here because opening credits do not appear for the _second_ show. The intention of the DVD authors was to keep two shows under one episode.

There are two known ways to meet these challenges:

1. add multi-part episodes by declaring multiple `<episodedetails>` XML blocks in succession in the same `*.nfo` file [📖 [docs](https://kodi.wiki/view/NFO_files/Episodes)] (which makes the file an invalid `standalone` XML document because it has more than one root element; this is not a concern for Kodi) like the following:

```xml
<episodedetails>
    <title>[first show title]</title>
    <plot>[plot for first show]</plot>
</episodedetails>
<episodedetails>
    <title>[second show title]</title>
    <plot>[plot for second show]</plot>
</episodedetails>
```

2. make a valid `standalone` XML document that will not look great in Kodi like the following:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<episodedetails>
    <title>Episode [n]</title>
    <plot>[first show title]: [plot for first show] // [second show title]: [plot for second show]</plot>
</episodedetails>
```
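As a quick check of the difference between the two options, the sketch below feeds both shapes to Python's `xml.etree.ElementTree`. The sample strings and the helpers `is_well_formed` and `count_episodedetails` are hypothetical, made up for illustration; only the element names come from the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped like the two options above.
multi_root = (
    '<episodedetails><title>A</title><plot>a</plot></episodedetails>\n'
    '<episodedetails><title>B</title><plot>b</plot></episodedetails>'
)
single_root = (
    '<episodedetails><title>Episode 2</title>'
    '<plot>A: a // B: b</plot></episodedetails>'
)

def is_well_formed(xml_text):
    """Return True when xml_text parses as a single XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def count_episodedetails(xml_text):
    """Count the blocks by parsing them inside a throwaway wrapper root."""
    wrapped = ET.fromstring(f'<wrapper>{xml_text}</wrapper>')
    return len(wrapped.findall('episodedetails'))

print(is_well_formed(multi_root))
print(is_well_formed(single_root))
print(count_episodedetails(multi_root))
```

A strict XML parser rejects the multi-block text, so any tooling outside Kodi that needs to read these files would have to wrap them in a temporary root element first, as `count_episodedetails` does.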

Let’s make the invalid XML documents by looping through the `table` rows with the following plan:

1. get `episode_data` without plot information
2. test one episode page to discover how to get plot information
3. loop through `episode_data` to set plot information
4. use `episode_data` to generate `*.nfo` files


## `episode_data` without plot information


In [5]:
rows = table.tbody.find_all('tr')

def getTitle(cell):
    a = cell.find('a')
    return str(a.string).strip()

def getPlotUri(cell):
    a = cell.find('a')
    return a['href']

episode_data = []

for row in rows:
    cells = row.find_all('td')

    episode_data.append({
        'episode' : str(cells[0].string).strip(),
        'title': getTitle(cells[1]),
        'plotUri': getPlotUri(cells[1]),
        'plot': ''
    })

episode_data

[{'episode': 'S01E01',
  'title': 'The Nightmare Begins',
  'plotUri': '/series/invader-zim/episodes/172884',
  'plot': ''},
 {'episode': 'S01E02',
  'title': 'Bestest Friend',
  'plotUri': '/series/invader-zim/episodes/172887',
  'plot': ''},
 {'episode': 'S01E02',
  'title': 'NanoZIM',
  'plotUri': '/series/invader-zim/episodes/172888',
  'plot': ''},
 {'episode': 'S01E03',
  'title': 'Parent Teacher Night',
  'plotUri': '/series/invader-zim/episodes/172885',
  'plot': ''},
 {'episode': 'S01E03',
  'title': 'Walk of Doom',
  'plotUri': '/series/invader-zim/episodes/172886',
  'plot': ''},
 {'episode': 'S01E04',
  'title': 'Germs',
  'plotUri': '/series/invader-zim/episodes/172889',
  'plot': ''},
 {'episode': 'S01E04',
  'title': 'Dark Harvest',
  'plotUri': '/series/invader-zim/episodes/172890',
  'plot': ''},
 {'episode': 'S01E05',
  'title': 'The Wettening',
  'plotUri': '/series/invader-zim/episodes/172892',
  'plot': ''},
 {'episode': 'S01E05',
  'title': 'Attack of the Saucer M

## testing one episode page to discover how to get plot information

The `getPlot` function:


In [6]:
def getPlot(uriPath, title):
    response = requests.get(f'https://www.thetvdb.com{uriPath}')
    html = response.content.decode()

    soup = BeautifulSoup(html, 'html.parser')

    div = soup.find('div', { 'data-title' : title })

    return str(div.p.string).strip()

getPlot('/series/invader-zim/episodes/172884', 'The Nightmare Begins')

'The Irken race consists of beings that have desires of gaining universal domination.  Their leaders, the Almighty Tallest Red and Purple, have just begun assigning Invaders, the elite soldiers who will capture the universe’s most dangerous planets.  However, when Zim, an Irken soldier who was banished from the Irken Empire, returns and demands to be assigned a task as an Invader, the Tallest decide to send Zim to a “planet that nobody has ever heard of”: Earth.  Zim gets assigned a cheap robot that is malfunctioning and he begins the trek towards Earth, but little does he know that a human child on Earth named Dib is already aware that he is coming, and plans to prevent Zim from completing his goal.'

## looping through `episode_data` to set plot information


In [7]:
for i in episode_data:
    plot = getPlot(i['plotUri'], i['title'])
    i['plot'] = plot

episode_data

[{'episode': 'S01E01',
  'title': 'The Nightmare Begins',
  'plotUri': '/series/invader-zim/episodes/172884',
  'plot': 'The Irken race consists of beings that have desires of gaining universal domination.  Their leaders, the Almighty Tallest Red and Purple, have just begun assigning Invaders, the elite soldiers who will capture the universe’s most dangerous planets.  However, when Zim, an Irken soldier who was banished from the Irken Empire, returns and demands to be assigned a task as an Invader, the Tallest decide to send Zim to a “planet that nobody has ever heard of”: Earth.  Zim gets assigned a cheap robot that is malfunctioning and he begins the trek towards Earth, but little does he know that a human child on Earth named Dib is already aware that he is coming, and plans to prevent Zim from completing his goal.'},
 {'episode': 'S01E02',
  'title': 'Bestest Friend',
  'plotUri': '/series/invader-zim/episodes/172887',
  'plot': 'When Zim makes a huge scene over his inability to ea

## using `episode_data` to generate `*.nfo` files


In [8]:
import xml.etree.ElementTree as ET

def getXml(titleValue, plotValue):
    episodedetails = ET.Element('episodedetails')

    title = ET.SubElement(episodedetails, 'title')
    title.text = titleValue

    plot = ET.SubElement(episodedetails, 'plot')
    plot.text = plotValue

    return ET.tostring(episodedetails, 'unicode')

def writeXml(fileName, xml):
    fileName = f'./beautifulsoup-invader-zim/Invader ZIM (2004) {fileName}.nfo'

    with open(fileName, 'w', encoding='utf-8') as f:
        print(xml, file=f)
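One benefit of building the elements with `ElementTree`, rather than formatting strings by hand, is that characters reserved in XML are escaped on serialization. A minimal sketch, assuming a made-up plot string (not taken from TheTVDB) and a hypothetical `build_episodedetails` helper that mirrors `getXml`:

```python
import xml.etree.ElementTree as ET

def build_episodedetails(title_value, plot_value):
    """Mirror of getXml: serialize one episodedetails element to a str."""
    root = ET.Element('episodedetails')
    ET.SubElement(root, 'title').text = title_value
    ET.SubElement(root, 'plot').text = plot_value
    # encoding='unicode' returns a str and omits the XML declaration
    return ET.tostring(root, encoding='unicode')

# Made-up sample text containing characters that are reserved in XML.
original_plot = 'Zim & GIR fight <germs>'
sample = build_episodedetails('Germs', original_plot)
print(sample)

# Parsing the serialized text recovers the original plot unchanged.
round_trip = ET.fromstring(sample).find('plot').text
print(round_trip == original_plot)
```

The ampersand and angle brackets come out as entity references in the serialized string, so a plot scraped from a web page should not break the generated file.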


### anomaly 1: the need to ‘flip’ the order of selected records

To my horror, some of the duplicate episodes need to be flipped as the HTML table is not in the order of the actual shows:


In [9]:
flipped_episodes = [
    'S01E05',
    'S01E07',
    'S01E09',
    'S01E12',
    'S01E15',
    'S01E16',
    'S01E18',
]

### anomaly 2: Kodi is not reading `*.nfo` files with only one `episodedetails` element

Not only is Kodi _not_ reading `*.nfo` files with only one `episodedetails` element, the `*.nfo` files with two `episodedetails` elements are not displayed in Kodi as expected: only the data in the first element is displayed. The assumption here is that Kodi is not designed to handle a TV show with a _mixture_ of single-element and multiple-element `*.nfo` files; I assume the Kodi design prefers one format or the other, not a mix.

I can fix the first part of this issue by duplicating the `episodedetails` element in `*.nfo` files that have only one. This totally sucks, but it is a short-term patch until a real solution is found.

An additional function, `orderXml`, represents the patch:


In [10]:
def orderXml(previous_xml, current_xml, flipXml = False):
    if flipXml: return f'{current_xml}\n{previous_xml}'
    else: return f'{previous_xml}\n{current_xml}'

In [12]:
episode_key = 'episode'
title_key = 'title'
plot_key = 'plot'

for i, item in enumerate(episode_data):

    current_episode = item[episode_key]
    current_xml = getXml(item[title_key], item[plot_key])

    if i > 0:
        previous_episode = episode_data[i - 1][episode_key]

        if current_episode == previous_episode:
            previous_xml = getXml(episode_data[i - 1][title_key], episode_data[i - 1][plot_key])

            writeXml(
                current_episode,
                orderXml(previous_xml, current_xml, flipXml=current_episode in flipped_episodes))
        else:
            writeXml(current_episode, orderXml(current_xml, current_xml))
    else:
        writeXml(current_episode, orderXml(current_xml, current_xml))
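The loop above writes the file for a two-show episode twice: first with the first show duplicated, then again with the actual pair, which overwrites the first file. As an alternative sketch, not the approach used in these notes, the rows could be grouped by episode code first with `itertools.groupby`, which works here because rows for the same episode are adjacent in the table. The names `sample_rows`, `sample_flipped` and `build_nfo_bodies`, and the stand-in for `getXml`, are hypothetical.

```python
from itertools import groupby

# Hypothetical rows in table order; real rows would come from episode_data.
sample_rows = [
    {'episode': 'S01E01', 'title': 'The Nightmare Begins'},
    {'episode': 'S01E05', 'title': 'The Wettening'},
    {'episode': 'S01E05', 'title': 'Attack of the Saucer Morons'},
]
sample_flipped = ['S01E05']

def build_nfo_bodies(rows, flipped, to_xml):
    """Map each episode code to the text of its *.nfo file."""
    bodies = {}
    for episode, group in groupby(rows, key=lambda r: r['episode']):
        blocks = [to_xml(r) for r in group]
        if episode in flipped:
            blocks.reverse()      # table order differs from disc order
        if len(blocks) == 1:
            blocks = blocks * 2   # the duplication patch for single shows
        bodies[episode] = '\n'.join(blocks)
    return bodies

def title_only_xml(row):
    """Stand-in for getXml so the sketch runs on its own."""
    return f"<episodedetails><title>{row['title']}</title></episodedetails>"

bodies = build_nfo_bodies(sample_rows, sample_flipped, title_only_xml)
for episode, body in bodies.items():
    print(episode, body, sep='\n')
```

Each episode code is then handled once, and the flip and duplication rules sit next to each other, so a later fix for the single-element problem would only need to touch one place.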


@[BryanWilhite](https://twitter.com/BryanWilhite)
