ebk is a lightweight and versatile tool for managing eBook metadata. It provides a rich Typer-based CLI (with colorized output courtesy of Rich), supports import/export of libraries from multiple sources (Calibre, raw ebooks, ZIP archives), enables advanced set-theoretic merges, and offers an interactive Streamlit web dashboard.
Note: We have future plans to integrate Large Language Model (LLM) features for automated tagging, summarization, and metadata generation—stay tuned!
- Features
- Installation
- Configuration
- CLI Usage
- Streamlit Dashboard Usage
- Library Management Class (Python API)
- Future LLM Integration
- Contributing
- License
- Known Issues & TODOs
- Stay Updated
- Support
- Typer + Rich CLI: A colorized, easy-to-use, and extensible command-line interface.
- Multiple Import Paths:
  - Calibre libraries → JSON-based ebk library
  - Raw eBook folders → Basic metadata inference (cover extraction, PDF metadata)
  - Existing ebk libraries in .zip format
- Advanced Metadata:
  - Set-theoretic merges (union, intersect, diff, symdiff)
  - Unique entry identification (hash-based)
  - Automatic cover image extraction
- Flexible Exports:
  - Export to ZIP
  - Hugo-compatible Markdown for static site integration
- Streamlit Dashboard:
  - Interactive web interface for browsing, filtering, and managing your eBook library
  - Search by title, author, subjects, language, etc.
  - Download eBooks from the dashboard
- Regex & JMESPath Searching: Perform advanced queries on your metadata (CLI + Streamlit).
- (Planned) LLM Extensions: Automatic summarization, tagging, or classification using large language models.
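To make the hash-based identification listed above concrete, here is a minimal sketch of one way such an ID could be derived. It is not ebk's actual scheme: the helper name `entry_id`, the choice of SHA-256, and the field names `title`, `creators`, and `language` are all assumptions for illustration.

```python
import hashlib
import json


def entry_id(entry, fields=("title", "creators", "language")):
    """Derive a stable hex ID from selected metadata fields (hypothetical helper)."""
    # Only the chosen fields contribute, so unrelated edits do not change the ID.
    subset = {field: entry.get(field) for field in fields}
    # sort_keys and fixed separators give one canonical string per subset.
    canonical = json.dumps(
        subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the ID depends only on content, the same book imported from two sources would hash to the same value, which is what makes set-style merges possible.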
- Clone the Repository
  git clone https://github.com/queelius/ebk.git
  cd ebk
- (Optional) Create a Virtual Environment
  Using venv:
  python -m venv venv
  source venv/bin/activate # (On Windows: venv\Scripts\activate)
  Using conda:
  conda create -n ebk python=3.8
  conda activate ebk
- Install Dependencies & ebk
  pip install -r requirements.txt
  pip install .
Note: You need Python 3.8+.
The primary configuration file should be placed in ~/.ebkrc.
Here’s a sample configuration:
[llm]
endpoint = <your_llm_endpoint>
api_key = <your_llm_api_key>
model = <your_llm_model>
[streamlit]
port = 8501
host = "0.0.0.0" # this allows external access
[export]
hugo = "/path/to/hugo_site"
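As an illustration, a file in this shape can be read with Python's standard `configparser`. This is a minimal sketch rather than ebk's own loader: the helper name `load_ebk_config`, the quote stripping, and the inline-comment handling are assumptions.

```python
import configparser
from pathlib import Path


def load_ebk_config(path=None):
    """Parse an ~/.ebkrc-style INI file into a nested dict (hypothetical helper)."""
    path = Path(path) if path else Path.home() / ".ebkrc"
    # inline_comment_prefixes lets a line like: host = "0.0.0.0" # note  parse cleanly.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read(path, encoding="utf-8")  # a missing file simply yields no sections
    return {
        section: {key: value.strip('"') for key, value in parser[section].items()}
        for section in parser.sections()
    }
```

All values come back as strings, so a caller would convert the port itself, for example `int(cfg["streamlit"]["port"])`.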
ebk uses Typer under the hood, providing subcommands for imports, exports, merges, searches, listing, updates, etc. The CLI also uses Rich for colorized output and logging.
ebk --help
ebk <command> --help # see specific usage, options
The primary commands include:
- import-zip
- import-calibre
- import-ebooks
- export
- merge
- search
- stats
- list
- add
- remove
- remove-index
- update-index
- update-id
- dash
- …and more!
Load an existing ebk library archive (which has a metadata.json plus eBook/cover files) into a folder:
ebk import-zip /path/to/ebk_library.zip --output-dir /path/to/output
- If --output-dir is omitted, the default will be derived from the zip filename.
- This unpacks the ZIP while retaining the metadata.json structure.
Convert your Calibre library into an ebk JSON library:
ebk import-calibre /path/to/calibre/library --output-dir /path/to/output
- Extracts metadata from metadata.opf files if present, otherwise falls back to the PDF/EPUB files themselves.
- Copies ebook files + covers into the output directory, producing a consolidated metadata.json.
Import a folder of eBooks (PDF, EPUB, etc.) by inferring minimal metadata:
ebk import-ebooks /path/to/raw/ebooks --output-dir /path/to/output
- Uses PyPDF2 for PDF metadata and attempts a best-effort cover extraction (first page → thumbnail).
- Creates metadata.json and copies files + covers to /path/to/output.
Available formats:
- Hugo:
  ebk export hugo /path/to/ebk_library /path/to/hugo_site
  This writes Hugo-compatible Markdown files (and copies covers/ebooks) into your Hugo content + static folders.
- Zip:
  ebk export zip /path/to/ebk_library /path/to/export.zip
  Creates a .zip archive containing the entire library.
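For readers curious what a ZIP export amounts to, the following is a minimal sketch using Python's standard `zipfile` module. It only illustrates the idea of archiving a library folder with relative paths intact; the function name `zip_library` is hypothetical and ebk's exporter may behave differently.

```python
import zipfile
from pathlib import Path


def zip_library(lib_dir, zip_path):
    """Write every file under lib_dir into zip_path, keeping relative paths."""
    lib_dir = Path(lib_dir)
    zip_path = Path(zip_path).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(lib_dir.rglob("*")):
            # Skip directories, and the archive itself if it lives inside lib_dir.
            if path.is_file() and path.resolve() != zip_path:
                archive.write(path, path.relative_to(lib_dir).as_posix())
    return zip_path
```

Storing paths relative to the library root keeps metadata.json at the top level of the archive, which is the layout the import-zip description above expects.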
Use set-theoretic operations to combine multiple ebk libraries:
ebk merge <operation> /path/to/merged_dir [libs...]
Where <operation> can be:
- union: Combine all unique entries
- intersect: Keep only entries common to all libraries
- diff: Keep entries present in the first library but not others
- symdiff: Entries in exactly one library (exclusive-or)
Example:
ebk merge union /path/to/merged_lib /path/to/lib1 /path/to/lib2
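The four operations map directly onto set operations over entry IDs. The sketch below shows that mapping on in-memory lists of entry dicts; it is illustrative only, and the function name `merge_libraries` and the `unique_id` key are assumptions rather than ebk's internal API.

```python
def merge_libraries(operation, libraries):
    """Combine lists of entry dicts keyed by 'unique_id' (assumed field name)."""
    if not libraries:
        return []
    id_sets = [{entry["unique_id"] for entry in lib} for lib in libraries]
    # Remember the first entry seen for every ID, in encounter order.
    first_seen = {}
    for lib in libraries:
        for entry in lib:
            first_seen.setdefault(entry["unique_id"], entry)

    if operation == "union":
        keep = set().union(*id_sets)
    elif operation == "intersect":
        keep = set.intersection(*id_sets)
    elif operation == "diff":
        keep = id_sets[0].difference(*id_sets[1:])
    elif operation == "symdiff":
        # "Exactly one library", as described above. Chaining Python's ^ over
        # three or more sets would instead keep IDs found in an odd number.
        keep = {uid for uid in first_seen if sum(uid in s for s in id_sets) == 1}
    else:
        raise ValueError(f"unknown operation: {operation}")
    # Preserve first-seen order for stable output.
    return [entry for uid, entry in first_seen.items() if uid in keep]
```

When the same ID appears in several libraries, this sketch keeps the first copy it encounters; resolving conflicting metadata between copies is a separate concern.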
ebk search <regex> /path/to/ebk_library
By default, it searches the title field. You can specify additional fields:
ebk search "Python" /path/to/lib --regex-fields title creators
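Conceptually, a field-restricted regex search tests the pattern against each selected field of each entry. The following minimal sketch shows that idea on a list of entry dicts; the function name `regex_search`, the case-insensitive matching, and the handling of list-valued fields are assumptions, not a description of ebk's implementation.

```python
import re


def regex_search(entries, pattern, fields=("title",)):
    """Return entries where `pattern` matches any of the given fields."""
    compiled = re.compile(pattern, re.IGNORECASE)  # case-insensitivity is an assumption
    matches = []
    for entry in entries:
        for field in fields:
            value = entry.get(field) or ""
            # Fields such as creators may hold a list; treat both shapes alike.
            values = value if isinstance(value, list) else [value]
            if any(compiled.search(str(v)) for v in values):
                matches.append(entry)
                break  # one matching field is enough for this entry
    return matches
```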
For more powerful, structured searches:
ebk search "[?language=='en']" /path/to/lib --jmespath
JMESPath expressions allow you to filter, project fields, etc. If you want to see these results as JSON:
ebk search "[?language=='en']" /path/to/lib --jmespath --json
- List:
  ebk list /path/to/lib
  Prints all ebooks with indexes and clickable file links (via Rich).
- Add:
  ebk add /path/to/lib --title "My Book" --creators "Alice" --ebooks "/path/to/book.pdf"
  or
  ebk add /path/to/lib --json /path/to/new_entries.json
  to bulk-add entries from a JSON file.
- Update:
  - By index:
    ebk update-index /path/to/lib 12 --title "New Title"
  - By unique ID:
    ebk update-id /path/to/lib <unique_id> --cover /path/to/new_cover.jpg
- Remove:
  - By regex in title, creators, or identifiers:
    ebk remove /path/to/lib "SomeRegex" --apply-to title creators
  - By index:
    ebk remove-index /path/to/lib 3 4 5
  - By unique ID:
    ebk remove-id /path/to/lib <unique_id>
- Stats:
  ebk stats /path/to/lib --keywords python data "machine learning"
  Returns aggregated statistics (common languages, top creators, subject frequency, etc.).
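To give a feel for the kind of aggregation the stats command describes, here is a minimal sketch built on `collections.Counter`. The function name `library_stats`, the field names (`language`, `creators`, `subjects`, `title`), and the choice to count keywords by title substring are all assumptions; the real command may compute these differently.

```python
from collections import Counter


def library_stats(entries, keywords=(), top_n=5):
    """Aggregate simple frequency statistics over a list of entry dicts."""
    languages = Counter(e.get("language", "unknown") for e in entries)
    creators = Counter(c for e in entries for c in e.get("creators", []))
    subjects = Counter(s for e in entries for s in e.get("subjects", []))
    # Count how many titles mention each keyword (case-insensitive substring).
    keyword_hits = {
        kw: sum(kw.lower() in (e.get("title") or "").lower() for e in entries)
        for kw in keywords
    }
    return {
        "total": len(entries),
        "languages": languages.most_common(top_n),
        "top_creators": creators.most_common(top_n),
        "subjects": subjects.most_common(top_n),
        "keywords": keyword_hits,
    }
```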
ebk dash --port 8501
- By default, the dashboard runs at http://localhost:8501.
- Prepare a ZIP Archive
  From any ebk library folder (containing metadata.json), compress the entire folder into a .zip. Or use:
  ebk export zip /path/to/lib /path/to/lib.zip
- Upload it via the Streamlit interface (ebk dash).
- Browse & Filter your library:
  - Advanced filtering (author, subject, language, year, etc.).
  - View cover images, descriptions, and download eBooks.
  - JMESPath-based advanced search in the “Advanced Search” tab.
- Enjoy a modern, interactive interface for eBook exploration.
For programmatic usage, ebk includes a simple LibraryManager class:
from ebk.manager import LibraryManager
manager = LibraryManager("metadata.json")
# List all books
all_books = manager.list_books()
# Add a book
manager.add_book({
"Title": "Example Book",
"Author": "Alice",
"Tags": "fiction"
})
# Delete or update
manager.delete_book("Old Title")
manager.update_book("Example Book", {"Tags": "fiction, fantasy"})
The ebk library may be queried in natural language, either through the Streamlit dashboard's chat interface or from the command line. On the command line, use the llm subcommand:
ebk llm <ebklib> "What are the books about Python and machine learning published after 2020?"
The llm subcommand uses a large language model to answer questions about the library. The configuration file should contain the endpoint of the LLM server, the API key, and the model to use. Either an Ollama-compatible or an OpenAI-compatible endpoint can be used.
Contributions are welcome! Here’s how to get involved:
- Fork the Repo
- Create a Branch for your feature or fix
- Commit & Push your changes
- Open a Pull Request describing the changes
We appreciate code contributions, bug reports, and doc improvements alike.
Distributed under the MIT License.
- Exporter Module:
  - Switch from os.system to shutil for safer file operations
  - Expand supported eBook formats & metadata fields
- Merger Module:
  - Resolve conflicts automatically or allow user-specified conflict resolution
  - Performance optimization for large libraries
- Consistent Entry Identification:
  - Support multiple eBook files per entry seamlessly
  - Improve hash-based deduplication for large files
- LLM-Based Metadata (Planned):
  - Summaries or tags automatically generated via language models
  - Potential GPU/accelerator support for on-device inference
- GitHub: https://github.com/queelius/ebk
- Website: https://metafunctor.com
- Issues: Open an Issue on GitHub
- Contact: lex@metafunctor.com
Happy eBook managing! 📚✨