list index out of range #10

Closed
cognitivetech opened this issue Apr 20, 2024 · 1 comment

cognitivetech commented Apr 20, 2024

This works fine with other books, so the failure must have something to do with this book's structure.

$ python3 epubsplit.py --split-by-section books/Charlottes-Web_E-B-White.epub 
Traceback (most recent call last):
  File "/home/asimo/jamstack/summarization/workspace/epubsplit.py", line 1354, in <module>
    main(sys.argv[1:])
  File "/home/asimo/jamstack/summarization/workspace/epubsplit.py", line 1285, in main
    lines = epubO.get_split_lines()
  File "/home/asimo/jamstack/summarization/workspace/epubsplit.py", line 734, in get_split_lines
    if href in self.get_toc_map():
  File "/home/asimo/jamstack/summarization/workspace/epubsplit.py", line 665, in get_toc_map
    textnode = navpoint.getElementsByTagName("text")[0].firstChild
IndexError: list index out of range

Charlottes-Web_E-B-White.epub.zip

(Edit: as I explore this more, I wonder whether it might be easier to unpack the epub archive and use the HTML files directly.)

JimmXinu added a commit that referenced this issue Apr 20, 2024
@JimmXinu
Owner

This version of epubsplit.py addresses this issue.

Arguably, I shouldn't fix this. The epub is technically in violation of the standard.

I'm not likely to do a full release of the Calibre plugin for this unless it comes up again.

Details

NCX DTD says a <navPoint> tag requires one or more <navLabel> tags inside it, which this epub doesn't have.

OTOH, epubsplit.py was looking for a <text> tag inside an assumed <navLabel> tag and the <text> tag isn't required if there's an <audio> tag.
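For illustration, here is a minimal sketch of a lookup that tolerates both cases described above: a <navPoint> with no <navLabel>, and a <navLabel> with no <text>. This is not the code from the referenced commit; `node_text`, `navpoint_label`, and `SAMPLE_NCX` are hypothetical names, and the sketch assumes the NCX uses unprefixed tag names.

```python
from xml.dom.minidom import parseString

def node_text(element):
    """Concatenate the text and CDATA children of an element."""
    wanted = (element.TEXT_NODE, element.CDATA_SECTION_NODE)
    return "".join(c.data for c in element.childNodes if c.nodeType in wanted)

def navpoint_label(navpoint, default=""):
    """Return the label text of a navPoint, or `default` if none exists.

    Only direct <navLabel> children are inspected, so labels belonging to
    nested navPoints are not picked up by mistake.
    """
    for child in navpoint.childNodes:
        if child.nodeType != child.ELEMENT_NODE or child.tagName != "navLabel":
            continue
        texts = child.getElementsByTagName("text")
        if texts:
            # An empty <text/> yields "", which falls back to the default.
            return node_text(texts[0]).strip() or default
    return default

# Hypothetical NCX fragment: one valid navPoint, one with no navLabel,
# and one whose navLabel has an empty <text>.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/">
<navMap>
<navPoint id="a" playOrder="1"><navLabel><text>Chapter 1</text></navLabel><content src="ch1.html"/></navPoint>
<navPoint id="b" playOrder="2"><content src="ch2.html"/></navPoint>
<navPoint id="c" playOrder="3"><navLabel><text/></navLabel><content src="ch3.html"/></navPoint>
</navMap></ncx>"""

dom = parseString(SAMPLE_NCX)
labels = [navpoint_label(np) for np in dom.getElementsByTagName("navPoint")]
print(labels)
```

Indexing `getElementsByTagName("text")[0]` directly, as in the traceback, raises IndexError on the second navPoint; checking for an empty result first avoids that.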

DTD

<!-- Navigation Point - contains description(s) of target, as well as a pointer to 
entire content of target.
Hierarchy is represented by nesting navPoints.  "class" attribute describes the kind 
of structural unit this object represents (e.g., "chapter", "section").
-->
<!ELEMENT navPoint (navLabel+, content, navPoint*)>
<!-- Revised, 3/29/2004:  Removed onFocus/onBlur -->
<!-- Revised, 3/29/2004:  Removed value -->
<!-- Revised, 3/31/2004:  Removed pageRef -->
<!-- Revised, 4/5/2004:  Added playOrder -->
<!ATTLIST navPoint
  id    ID      #REQUIRED
  class    CDATA    #IMPLIED
  playOrder CDATA       #REQUIRED
>

<!-- Navigation Label - Contains a description of a given <navMap>, <navPoint>, 
<navList>, or <navTarget> in various media for presentation to the user. Can be 
repeated so descriptions can be provided in multiple languages. -->
<!ELEMENT navLabel (((text, audio?) | audio), img?)>
<!ATTLIST navLabel
  %i18n; 
>
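
To check whether a given NCX violates the `navLabel+` requirement in the DTD above, something like the following sketch could be used. It is an assumption-laden illustration, not part of epubsplit.py: `unlabeled_navpoints` and `SAMPLE_NCX` are hypothetical names, and the check only looks for a direct <navLabel> child rather than performing full DTD validation.

```python
from xml.dom.minidom import parseString

def unlabeled_navpoints(ncx_source):
    """Return the ids of navPoints that have no direct <navLabel> child."""
    dom = parseString(ncx_source)
    missing = []
    for navpoint in dom.getElementsByTagName("navPoint"):
        has_label = any(
            c.nodeType == c.ELEMENT_NODE and c.tagName == "navLabel"
            for c in navpoint.childNodes
        )
        if not has_label:
            missing.append(navpoint.getAttribute("id"))
    return missing

# Hypothetical NCX fragment: the nested navPoint "ch2" lacks a navLabel.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/">
<navMap>
<navPoint id="ch1" playOrder="1">
  <navLabel><text>Chapter 1</text></navLabel>
  <content src="ch1.html"/>
  <navPoint id="ch2" playOrder="2"><content src="ch2.html"/></navPoint>
</navPoint>
</navMap></ncx>"""

print(unlabeled_navpoints(SAMPLE_NCX))
```

The NCX file would first have to be read out of the epub archive, for example with the standard `zipfile` module; its path inside the archive varies from book to book.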
