ebk

ebk is a lightweight and versatile tool for managing eBook metadata. It provides a rich Typer-based CLI (with colorized output courtesy of Rich), supports import/export of libraries from multiple sources (Calibre, raw ebooks, ZIP archives), enables advanced set-theoretic merges, and offers an interactive Streamlit web dashboard.

Note: We have future plans to integrate Large Language Model (LLM) features for automated tagging, summarization, and metadata generation—stay tuned!

Features

Typer + Rich CLI: A colorized, easy-to-use, and extensible command-line interface.
Multiple Import Paths:
- Calibre libraries → JSON-based ebk library
- Raw eBook folders → Basic metadata inference (cover extraction, PDF metadata)
- Existing ebk libraries in .zip format
Advanced Metadata:
- Set-theoretic merges (union, intersect, diff, symdiff)
- Unique entry identification (hash-based)
- Automatic cover image extraction
Flexible Exports:
- Export to ZIP
- Hugo-compatible Markdown for static site integration
Streamlit Dashboard:
- Interactive web interface for browsing, filtering, and managing your eBook library
- Search by title, author, subjects, language, etc.
- Download eBooks from the dashboard
Regex & JMESPath Searching: Perform advanced queries on your metadata (CLI + Streamlit).
(Planned) LLM Extensions: Automatic summarization, tagging, or classification using large language models.

Installation

Clone the Repository

git clone https://github.com/queelius/ebk.git
cd ebk

(Optional) Create a Virtual Environment

Using venv:

python -m venv venv
source venv/bin/activate  # (On Windows: venv\Scripts\activate)

Using conda:

conda create -n ebk python=3.8
conda activate ebk

Install Dependencies & ebk

pip install -r requirements.txt
pip install .

Note: You need Python 3.8+.

Configuration

The primary configuration file should be placed in ~/.ebkrc. Here’s a sample configuration:

[llm]
endpoint = <your_llm_endpoint>
api_key = <your_llm_api_key>
model = <your_llm_model>

[streamlit]
port = 8501
host = "0.0.0.0" # this allows external access

[export]
hugo = "/path/to/hugo_site"

CLI Usage

ebk uses Typer under the hood, providing subcommands for imports, exports, merges, searches, listing, updates, etc. The CLI also leverages Rich for colorized/logging output.

General CLI Structure

ebk --help
ebk <command> --help     # see specific usage, options

The primary commands include:

import-zip
import-calibre
import-ebooks
export
merge
search
stats
list
add
remove
remove-index
update-index
update-id
dash
…and more!

Importing Libraries

Import from Zip (`import-zip`)

Load an existing ebk library archive (which has a metadata.json plus eBook/cover files) into a folder:

ebk import-zip /path/to/ebk_library.zip --output-dir /path/to/output

If --output-dir is omitted, the default will be derived from the zip filename.
This unpacks the ZIP while retaining the metadata.json structure.

Import Calibre Library (`import-calibre`)

Convert your Calibre library into an ebk JSON library:

ebk import-calibre /path/to/calibre/library --output-dir /path/to/output

Extracts metadata from metadata.opf files (if present) or from PDF/EPUB fallback.
Copies ebook files + covers into the output directory, producing a consolidated metadata.json.

Import Raw Ebooks (`import-ebooks`)

Import a folder of eBooks (PDF, EPUB, etc.) by inferring minimal metadata:

ebk import-ebooks /path/to/raw/ebooks --output-dir /path/to/output

Uses PyPDF2 for PDF metadata and attempts a best-effort cover extraction (first page → thumbnail).
Creates metadata.json and copies files + covers to /path/to/output.

Exporting Libraries

Available formats:

Hugo:
```
ebk export hugo /path/to/ebk_library /path/to/hugo_site
```
This writes Hugo-compatible Markdown files (and copies covers/ebooks) into your Hugo content + static folders.

Zip:

ebk export zip /path/to/ebk_library /path/to/export.zip

Creates a .zip archive containing the entire library.

Merging Libraries

Use set-theoretic operations to combine multiple ebk libraries:

ebk merge <operation> /path/to/merged_dir [libs...]

Where <operation> can be:

union: Combine all unique entries
intersect: Keep only entries common to all libraries
diff: Keep entries present in the first library but not others
symdiff: Entries in exactly one library (exclusive-or)

Example:

ebk merge union /path/to/merged_lib /path/to/lib1 /path/to/lib2

Searching

Regex Search

ebk search <regex> /path/to/ebk_library

By default, it searches the title field. You can specify additional fields:

ebk search "Python" /path/to/lib --regex-fields title creators

JMESPath Search

For more powerful, structured searches:

ebk search "[?language=='en']" /path/to/lib --jmespath

JMESPath expressions allow you to filter, project fields, etc. If you want to see these results as JSON:

ebk search "[?language=='en']" /path/to/lib --jmespath --json

Listing, Adding, Updating, and Removing Entries

List:
```
ebk list /path/to/lib
```
Prints all ebooks with indexes, clickable file links (via Rich).

Add:

ebk add /path/to/lib --title "My Book" --creators "Alice" --ebooks "/path/to/book.pdf"

or

ebk add /path/to/lib --json /path/to/new_entries.json

to bulk-add entries from a JSON file.

Update:

By index:

ebk update-index /path/to/lib 12 --title "New Title"

By unique ID:

ebk update-id /path/to/lib <unique_id> --cover /path/to/new_cover.jpg

Remove:

By regex in title, creators, or identifiers:

ebk remove /path/to/lib "SomeRegex" --apply-to title creators

By index:
```
ebk remove-index /path/to/lib 3 4 5
```
By unique ID:
```
ebk remove-id /path/to/lib <unique_id>
```

Stats:
```
ebk stats /path/to/lib --keywords python data "machine learning"
```
Returns aggregated statistics (common languages, top creators, subject frequency, etc.).

Launch Streamlit Dashboard

ebk dash --port 8501

By default, the dashboard runs at http://localhost:8501.

Streamlit Dashboard Usage

Prepare a ZIP Archive
From any ebk library folder (containing metadata.json), compress the entire folder into a .zip. Or use:
```
ebk export zip /path/to/lib /path/to/lib.zip
```
Upload it via the Streamlit interface (ebk dash).
Browse & Filter your library:
- Advanced filtering (author, subject, language, year, etc.).
- View cover images, descriptions, and download eBooks.
- JMESPath-based advanced search in the “Advanced Search” tab.
Enjoy a modern, interactive interface for eBook exploration.

Library Management Class (Python API)

For programmatic usage, ebk includes a simple LibraryManager class:

from ebk.manager import LibraryManager

manager = LibraryManager("metadata.json")

# List all books
all_books = manager.list_books()

# Add a book
manager.add_book({
    "Title": "Example Book",
    "Author": "Alice",
    "Tags": "fiction"
})

# Delete or update
manager.delete_book("Old Title")
manager.update_book("Example Book", {"Tags": "fiction, fantasy"})

LLM Integration

The ebk library may be queried using a natural language interface using the streamlit dashboard's chat interface or the command line. For the comamnd line interface, the llm subcommand is used:

ebk llm <ebklib> "What are the books about Python and machine learning published after 2020?"

The llm subcommand uses the ebk library to answer questions about the library using a large language model. The configuration file should contain the endpoint of the LLM server, the API key, and the model to use. Either an Ollama compatible endpoint or an OpenAI compatible endpoint can be used.

Contributing

Contributions are welcome! Here’s how to get involved:

Fork the Repo
Create a Branch for your feature or fix
Commit & Push your changes
Open a Pull Request describing the changes

We appreciate code contributions, bug reports, and doc improvements alike.

License

Distributed under the MIT License.

Known Issues & TODOs

Exporter Module:
- Switch from os.system to shutil for safer file operations
- Expand supported eBook formats & metadata fields
Merger Module:
- Resolve conflicts automatically or allow user-specified conflict resolution
- Performance optimization for large libraries
Consistent Entry Identification:
- Support multiple eBook files per entry seamlessly
- Improve hash-based deduplication for large files
LLM-Based Metadata (Planned):
- Summaries or tags automatically generated via language models
- Potential GPU/accelerator support for on-device inference

Stay Updated

GitHub: https://github.com/queelius/ebk
Website: https://metafunctor.com

Support

Issues: Open an Issue on GitHub
Contact: lex@metafunctor.com

Happy eBook managing! 📚✨

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
ebk		ebk
tests		tests
.gitignore		.gitignore
.src2mdignore		.src2mdignore
MANIFEST.in		MANIFEST.in
README.md		README.md
environment.yml		environment.yml
project.md		project.md
requirements.txt		requirements.txt
requirements2.txt		requirements2.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ebk

Table of Contents

Features

Installation

Configuration

CLI Usage

General CLI Structure

Importing Libraries

Import from Zip (`import-zip`)

Import Calibre Library (`import-calibre`)

Import Raw Ebooks (`import-ebooks`)

Exporting Libraries

Merging Libraries

Searching

Regex Search

JMESPath Search

Listing, Adding, Updating, and Removing Entries

Launch Streamlit Dashboard

Streamlit Dashboard Usage

Library Management Class (Python API)

LLM Integration

Contributing

License

Known Issues & TODOs

Stay Updated

Support

About

Uh oh!

Releases

Packages

Uh oh!

Languages

queelius/ebk

Folders and files

Latest commit

History

Repository files navigation

ebk

Table of Contents

Features

Installation

Configuration

CLI Usage

General CLI Structure

Importing Libraries

Import from Zip (import-zip)

Import Calibre Library (import-calibre)

Import Raw Ebooks (import-ebooks)

Exporting Libraries

Merging Libraries

Searching

Regex Search

JMESPath Search

Listing, Adding, Updating, and Removing Entries

Launch Streamlit Dashboard

Streamlit Dashboard Usage

Library Management Class (Python API)

LLM Integration

Contributing

License

Known Issues & TODOs

Stay Updated

Support

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Import from Zip (`import-zip`)

Import Calibre Library (`import-calibre`)

Import Raw Ebooks (`import-ebooks`)

Packages