Skip to content

queelius/ebk

Repository files navigation

ebk

ebk Logo

ebk is a lightweight and versatile tool for managing eBook metadata. It provides a rich Typer-based CLI (with colorized output courtesy of Rich), supports import/export of libraries from multiple sources (Calibre, raw ebooks, ZIP archives), enables advanced set-theoretic merges, and offers an interactive Streamlit web dashboard.

Note: We have future plans to integrate Large Language Model (LLM) features for automated tagging, summarization, and metadata generation—stay tuned!


Table of Contents


Features

  • Typer + Rich CLI: A colorized, easy-to-use, and extensible command-line interface.
  • Multiple Import Paths:
    • Calibre libraries → JSON-based ebk library
    • Raw eBook folders → Basic metadata inference (cover extraction, PDF metadata)
    • Existing ebk libraries in .zip format
  • Advanced Metadata:
    • Set-theoretic merges (union, intersect, diff, symdiff)
    • Unique entry identification (hash-based)
    • Automatic cover image extraction
  • Flexible Exports:
    • Export to ZIP
    • Hugo-compatible Markdown for static site integration
  • Streamlit Dashboard:
    • Interactive web interface for browsing, filtering, and managing your eBook library
    • Search by title, author, subjects, language, etc.
    • Download eBooks from the dashboard
  • Regex & JMESPath Searching: Perform advanced queries on your metadata (CLI + Streamlit).
  • (Planned) LLM Extensions: Automatic summarization, tagging, or classification using large language models.

Installation

  1. Clone the Repository

    git clone https://github.com/queelius/ebk.git
    cd ebk
  2. (Optional) Create a Virtual Environment

    Using venv:

    python -m venv venv
    source venv/bin/activate  # (On Windows: venv\Scripts\activate)

    Using conda:

    conda create -n ebk python=3.8
    conda activate ebk
  3. Install Dependencies & ebk

    pip install -r requirements.txt
    pip install .

Note: You need Python 3.8+.


Configuration

The primary configuration file should be placed in ~/.ebkrc. Here’s a sample configuration:

[llm]
endpoint = <your_llm_endpoint>
api_key = <your_llm_api_key>
model = <your_llm_model>

[streamlit]
port = 8501
host = "0.0.0.0" # this allows external access

[export]
hugo = "/path/to/hugo_site"


CLI Usage

ebk uses Typer under the hood, providing subcommands for imports, exports, merges, searches, listing, updates, etc. The CLI also leverages Rich for colorized/logging output.

General CLI Structure

ebk --help
ebk <command> --help     # see specific usage, options

The primary commands include:

  • import-zip
  • import-calibre
  • import-ebooks
  • export
  • merge
  • search
  • stats
  • list
  • add
  • remove
  • remove-index
  • update-index
  • update-id
  • dash
  • …and more!

Importing Libraries

Import from Zip (import-zip)

Load an existing ebk library archive (which has a metadata.json plus eBook/cover files) into a folder:

ebk import-zip /path/to/ebk_library.zip --output-dir /path/to/output
  • If --output-dir is omitted, the default will be derived from the zip filename.
  • This unpacks the ZIP while retaining the metadata.json structure.

Import Calibre Library (import-calibre)

Convert your Calibre library into an ebk JSON library:

ebk import-calibre /path/to/calibre/library --output-dir /path/to/output
  • Extracts metadata from metadata.opf files (if present) or from PDF/EPUB fallback.
  • Copies ebook files + covers into the output directory, producing a consolidated metadata.json.

Import Raw Ebooks (import-ebooks)

Import a folder of eBooks (PDF, EPUB, etc.) by inferring minimal metadata:

ebk import-ebooks /path/to/raw/ebooks --output-dir /path/to/output
  • Uses PyPDF2 for PDF metadata and attempts a best-effort cover extraction (first page → thumbnail).
  • Creates metadata.json and copies files + covers to /path/to/output.

Exporting Libraries

Available formats:

  • Hugo:

    ebk export hugo /path/to/ebk_library /path/to/hugo_site

    This writes Hugo-compatible Markdown files (and copies covers/ebooks) into your Hugo content + static folders.

  • Zip:

    ebk export zip /path/to/ebk_library /path/to/export.zip

    Creates a .zip archive containing the entire library.


Merging Libraries

Use set-theoretic operations to combine multiple ebk libraries:

ebk merge <operation> /path/to/merged_dir [libs...]

Where <operation> can be:

  • union: Combine all unique entries
  • intersect: Keep only entries common to all libraries
  • diff: Keep entries present in the first library but not others
  • symdiff: Entries in exactly one library (exclusive-or)

Example:

ebk merge union /path/to/merged_lib /path/to/lib1 /path/to/lib2

Searching

Regex Search

ebk search <regex> /path/to/ebk_library

By default, it searches the title field. You can specify additional fields:

ebk search "Python" /path/to/lib --regex-fields title creators

JMESPath Search

For more powerful, structured searches:

ebk search "[?language=='en']" /path/to/lib --jmespath

JMESPath expressions allow you to filter, project fields, etc. If you want to see these results as JSON:

ebk search "[?language=='en']" /path/to/lib --jmespath --json

Listing, Adding, Updating, and Removing Entries

  • List:

    ebk list /path/to/lib

    Prints all ebooks with indexes, clickable file links (via Rich).

  • Add:

    ebk add /path/to/lib --title "My Book" --creators "Alice" --ebooks "/path/to/book.pdf"

    or

    ebk add /path/to/lib --json /path/to/new_entries.json

    to bulk-add entries from a JSON file.

  • Update:

    • By index:
      ebk update-index /path/to/lib 12 --title "New Title"
    • By unique ID:
      ebk update-id /path/to/lib <unique_id> --cover /path/to/new_cover.jpg
  • Remove:

    • By regex in title, creators, or identifiers:
      ebk remove /path/to/lib "SomeRegex" --apply-to title creators
    • By index:
      ebk remove-index /path/to/lib 3 4 5
    • By unique ID:
      ebk remove-id /path/to/lib <unique_id>
  • Stats:

    ebk stats /path/to/lib --keywords python data "machine learning"

    Returns aggregated statistics (common languages, top creators, subject frequency, etc.).


Launch Streamlit Dashboard

ebk dash --port 8501
  • By default, the dashboard runs at http://localhost:8501.

Streamlit Dashboard Usage

  1. Prepare a ZIP Archive
    From any ebk library folder (containing metadata.json), compress the entire folder into a .zip. Or use:

    ebk export zip /path/to/lib /path/to/lib.zip
  2. Upload it via the Streamlit interface (ebk dash).

  3. Browse & Filter your library:

    • Advanced filtering (author, subject, language, year, etc.).
    • View cover images, descriptions, and download eBooks.
    • JMESPath-based advanced search in the “Advanced Search” tab.
  4. Enjoy a modern, interactive interface for eBook exploration.


Library Management Class (Python API)

For programmatic usage, ebk includes a simple LibraryManager class:

from ebk.manager import LibraryManager

manager = LibraryManager("metadata.json")

# List all books
all_books = manager.list_books()

# Add a book
manager.add_book({
    "Title": "Example Book",
    "Author": "Alice",
    "Tags": "fiction"
})

# Delete or update
manager.delete_book("Old Title")
manager.update_book("Example Book", {"Tags": "fiction, fantasy"})

LLM Integration

The ebk library may be queried using a natural language interface using the streamlit dashboard's chat interface or the command line. For the comamnd line interface, the llm subcommand is used:

ebk llm <ebklib> "What are the books about Python and machine learning published after 2020?"

The llm subcommand uses the ebk library to answer questions about the library using a large language model. The configuration file should contain the endpoint of the LLM server, the API key, and the model to use. Either an Ollama compatible endpoint or an OpenAI compatible endpoint can be used.


Contributing

Contributions are welcome! Here’s how to get involved:

  1. Fork the Repo
  2. Create a Branch for your feature or fix
  3. Commit & Push your changes
  4. Open a Pull Request describing the changes

We appreciate code contributions, bug reports, and doc improvements alike.


License

Distributed under the MIT License.


Known Issues & TODOs

  1. Exporter Module:
    • Switch from os.system to shutil for safer file operations
    • Expand supported eBook formats & metadata fields
  2. Merger Module:
    • Resolve conflicts automatically or allow user-specified conflict resolution
    • Performance optimization for large libraries
  3. Consistent Entry Identification:
    • Support multiple eBook files per entry seamlessly
    • Improve hash-based deduplication for large files
  4. LLM-Based Metadata (Planned):
    • Summaries or tags automatically generated via language models
    • Potential GPU/accelerator support for on-device inference

Stay Updated


Support


Happy eBook managing! 📚✨

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages