Ebook import sometimes messes up chapter order. #264

Closed
simjanos-dev opened this issue May 16, 2024 · 1 comment
Labels
bug Something isn't working

simjanos-dev (Owner) commented May 16, 2024

There should be an option to either keep the current default chapter order or order chapters by the spine element.

Someone sent me this code. I will look at it later.

import ebooklib
import lxml.html
import lxml.html.clean
from ebooklib import epub

def loadBook(file):
    # rp and rt tags add pronunciation annotations over words, so their content must be removed
    cleaner = lxml.html.clean.Cleaner(allow_tags=[''], remove_unknown_tags=False, kill_tags=['rp', 'rt'], page_structure=False)
    content = ''
    book = epub.read_epub(file)
    items = list(book.get_items())
    # map each spine idref to its position in the reading order
    spine_keys = {idref: ii for ii, (idref, _) in enumerate(book.spine)}
    # items that are not listed in the spine sort last
    sorted_items = sorted(items, key=lambda item: spine_keys.get(item.id, float('inf')))

    for item in sorted_items:
        if item.get_type() == ebooklib.ITEM_DOCUMENT:
            epubPage = cleaner.clean_html(item.get_content()).decode('utf-8')
            # needed to remove the extra div created by the cleaner
            epubPage = lxml.html.fromstring(epubPage).text_content()
            content += epubPage

    return content

simjanos-dev added the bug label May 16, 2024

simjanos-dev (Owner, Author) commented

Added an option to import e-books based on spine metadata. It fixes this issue.
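
For anyone reading later, here is a rough sketch of what such an option could look like, reduced to plain Python so it runs without ebooklib. It is not the code that was merged. The function name `order_chapters` and the flag `use_spine` are made up for illustration, and the `(idref, linear)` pair shape is assumed from the snippet in the first comment.

```python
from typing import Iterable, List, Sequence, Tuple


def order_chapters(
    item_ids: Sequence[str],
    spine: Iterable[Tuple[str, str]],
    use_spine: bool = True,
) -> List[str]:
    """Return document ids in the order they should be imported.

    item_ids  -- document ids in the order the book lists them (default order)
    spine     -- (idref, linear) pairs in reading order (assumed shape)
    use_spine -- hypothetical option; False keeps the default order
    """
    if not use_spine:
        return list(item_ids)

    # Position of each idref in the spine, i.e. in the reading order.
    position = {idref: index for index, (idref, _) in enumerate(spine)}

    # sorted() is stable, so ids missing from the spine keep their
    # relative default order and end up after every spine entry.
    return sorted(item_ids, key=lambda item_id: position.get(item_id, float("inf")))


if __name__ == "__main__":
    manifest = ["chapter2", "cover", "chapter1", "notes"]
    spine = [("cover", "yes"), ("chapter1", "yes"), ("chapter2", "yes")]
    print(order_chapters(manifest, spine))                   # spine order
    print(order_chapters(manifest, spine, use_spine=False))  # default order
```

Keeping the default order behind a flag means books that already import correctly are not affected, while books whose files are listed out of reading order can be imported by spine position instead.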
