AIβAssisted Cataloging for Omeka S
OmekaRapper is an AI-powered cataloging assistant for Omeka S that helps curators, archivists, and researchers generate metadata automatically from article text, PDFs, OCR output, or web content.
The module integrates modern AI systems to analyze uploaded materials and generate suggested metadata such as:
- Title
- Abstract / Description
- Subjects / Keywords
- Creators
- Publication information
- Identifiers
- Language
These suggestions can then be reviewed and applied to Omeka items by a curator.
Automatically extract structured metadata from:
- journal articles
- reports
- scanned documents
- OCR text
- web content
The AI analyzes text and produces metadata suggestions suitable for:
- Dublin Core
- Custom vocabularies
- Resource templates
OmekaRapper adds an AI Assistant sidebar panel directly to:
Admin β Items β Add Item
Admin β Items β Edit Item
From there curators can:
- Paste article or OCR text
- Upload a PDF
- Select an AI provider
- Generate metadata suggestions
- Apply suggestions to item fields
Current built-in apply buttons support:
Apply titleβdcterms:titleApply abstractβdcterms:abstract(fallbackdcterms:description)Apply creatorsβdcterms:creatorApply subjectsβdcterms:subjectApply dateβdcterms:dateApply publisherβdcterms:publisherApply languageβdcterms:languageApply identifiersβdcterms:identifier
Current limitations:
- values are applied as literal text entries only
- publication/container title is displayed but not yet mapped to an Omeka property automatically
- PDF import requires the
pdftotextcommand-line tool (for example from Poppler) - OCR fallback for scanned PDFs requires
pdftoppmandtesseract - OCR currently uses English (
eng) and processes up to 10 PDF pages per request - OpenAI integration depends on PHP cURL being available server-side
OmekaRapper supports multiple AI providers through a pluggable provider system.
Supported or planned providers:
| Provider | Status |
|---|---|
| DummyProvider | Included |
| ChatGPT | Included |
| Codex | Included |
| Claude | Included |
| Ollama / OpenAI-compatible local LLM | Included |
| Claude Code | Not yet implemented as a direct provider |
Developers can add additional providers easily.
The module uses a provider abstraction layer so AI systems can be swapped without affecting the rest of the module.
Omeka Admin UI
β
βΌ
OmekaRapper Panel
β
βΌ
AssistController
β
βΌ
AiClientManager
β
βΌ
ProviderInterface
β
ββββββ΄βββββββββββββββ
βΌ βΌ
OpenAIProvider ClaudeProvider
Download or clone the repository.
/modules/OmekaRapper
Admin β Modules β Install β OmekaRapper
Admin β Modules β OmekaRapper β Configure
Install pdftotext so OmekaRapper can extract text from uploaded PDFs before sending that text to the selected provider. For scanned PDFs, install tesseract too so OmekaRapper can fall back to OCR.
On macOS with Homebrew:
brew install poppler tesseractAvailable settings:
- default provider for the item editor
- enable or disable the ChatGPT provider
- OpenAI API key
- ChatGPT model dropdown
- ChatGPT endpoint
- enable or disable the Codex provider
- Codex model dropdown
- Codex endpoint
- enable or disable the Claude provider
- Anthropic API key
- Claude model dropdown
- Claude endpoint
- enable or disable the Ollama provider
- Ollama model dropdown
- Ollama endpoint
- optional Ollama API key
OmekaRapper can talk to local models through an OpenAI-compatible endpoint.
For Ollama:
- Install and run Ollama.
- Pull a model, for example:
ollama pull llama3.2
- In
Admin β Modules β OmekaRapper β Configure, enableOllama. - Use:
Model: llama3.2
Endpoint: http://localhost:11434/v1/chat/completions
API key: ollama
Notes:
- the API key is usually ignored by Ollama, but a placeholder value is acceptable
- other OpenAI-compatible local servers such as LM Studio can be used by changing the endpoint and model name
- Claude Code is not exposed as a direct OmekaRapper provider yet; the module currently integrates Claude through the Anthropic Messages API
- model lists are loaded dynamically from the configured provider endpoints
- Go to:
Admin β Items β Add Item
-
Locate the OmekaRapper AI Assistant panel.
-
Paste article or OCR text, upload a PDF, or do both.
-
Select a provider.
-
Click:
Suggest Metadata
AI-generated metadata suggestions will appear in the panel and can be applied to item fields.
OmekaRapper exposes internal endpoints used by the admin UI.
GET /admin/omeka-rapper/providers
POST /admin/omeka-rapper/suggest
Example request:
text=Example article text
provider=dummy
Example response:
{
"ok": true,
"provider": "dummy",
"suggestions": {
"title": "Example Article Title",
"abstract": "First portion of article text"
}
}OmekaRapper
βββ Module.php
βββ config
β βββ module.config.php
β βββ module.ini
βββ src
β βββ Controller
β β βββ AssistController.php
β βββ Factory
β β βββ AiClientManagerFactory.php
β β βββ AssistControllerFactory.php
β βββ Service
β βββ AiClientManager.php
β βββ Provider
β βββ ProviderInterface.php
β βββ DummyProvider.php
βββ view
β βββ omeka-rapper
β βββ admin
β βββ assist
β βββ panel.phtml
βββ asset
βββ js
βββ omeka-rapper.js
When integrating external AI providers:
- Avoid sending restricted archival data without approval
- Store API keys securely
- Avoid exposing keys in JavaScript
- Implement request throttling and input limits
- Consider local models for sensitive collections
Planned features for future versions:
- OpenAI provider
- Claude provider
- PDF ingestion
- OCR processing
- Autoβmapping to Dublin Core fields
- Resource template awareness
- Batch cataloging
- Datasetβspecific prompt profiles
- Linked open data enrichment
Contributions are welcome.
Suggested areas for development:
- new AI providers
- metadata extraction improvements
- vocabulary integrations
- UI enhancements
- automated ingestion pipelines
MIT License
OmekaRapper aims to become a full AIβassisted archival cataloging platform for Omeka S that helps institutions:
- reduce manual metadata entry
- improve metadata consistency
- accelerate digitization workflows
- enhance discoverability of cultural collections