TFFHRTP/OmekaRapper


🎀 OmekaRapper

AI‑Assisted Cataloging for Omeka S

OmekaRapper is an AI-powered cataloging assistant for Omeka S that helps curators, archivists, and researchers generate metadata automatically from article text, PDFs, OCR output, or web content.

The module sends the supplied text to a configured AI provider and returns suggested metadata such as:

  • Title
  • Abstract / Description
  • Subjects / Keywords
  • Creators
  • Publication information
  • Identifiers
  • Language

These suggestions can then be reviewed and applied to Omeka items by a curator.


✨ Key Features

AI Metadata Extraction

Automatically extract structured metadata from:

  • journal articles
  • reports
  • scanned documents
  • OCR text
  • web content

The AI analyzes text and produces metadata suggestions suitable for:

  • Dublin Core
  • Custom vocabularies
  • Resource templates
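Provider replies arrive as free text, so they have to be turned into structured suggestions before a curator can review them. Below is a minimal sketch of that normalization step in PHP. It assumes the provider was asked to answer with a single JSON object; the normalizeSuggestions() name and the list of suggestion keys are hypothetical and may not match OmekaRapper's internal schema.

```php
<?php
// Hypothetical helper: turn a provider's raw reply into a suggestion array.
// Assumes the provider was prompted to answer with one JSON object.
function normalizeSuggestions(string $raw): array
{
    // Suggestion keys assumed for this sketch; the module's real keys may differ.
    $allowed = ['title', 'abstract', 'creators', 'subjects', 'date',
                'publisher', 'language', 'identifiers', 'publication'];
    $listKeys = ['creators', 'subjects', 'identifiers'];

    // Models sometimes wrap JSON in prose; keep only the outermost {...} span.
    $start = strpos($raw, '{');
    $end = strrpos($raw, '}');
    if ($start === false || $end === false || $end < $start) {
        return [];
    }
    $data = json_decode(substr($raw, $start, $end - $start + 1), true);
    if (!is_array($data)) {
        return [];
    }

    $out = [];
    foreach ($allowed as $key) {
        if (!isset($data[$key])) {
            continue; // unknown keys are ignored, missing ones skipped
        }
        $value = $data[$key];
        if (in_array($key, $listKeys, true)) {
            // Accept a list or a single string; keep non-empty scalar entries.
            $clean = [];
            foreach ((is_array($value) ? $value : [$value]) as $item) {
                if (is_scalar($item) && trim((string) $item) !== '') {
                    $clean[] = trim((string) $item);
                }
            }
            if ($clean) {
                $out[$key] = $clean;
            }
        } elseif (is_scalar($value) && trim((string) $value) !== '') {
            $out[$key] = trim((string) $value);
        }
    }
    return $out;
}
```

Dropping unknown keys and empty values up front keeps the review panel limited to fields that can actually be applied.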

Omeka Admin Integration

OmekaRapper adds an AI Assistant sidebar panel directly to:

Admin → Items → Add Item
Admin → Items → Edit Item

From there curators can:

  1. Paste article or OCR text
  2. Upload a PDF
  3. Select an AI provider
  4. Generate metadata suggestions
  5. Apply suggestions to item fields

Current built-in apply buttons support:

  • Apply title → dcterms:title
  • Apply abstract → dcterms:abstract (fallback dcterms:description)
  • Apply creators → dcterms:creator
  • Apply subjects → dcterms:subject
  • Apply date → dcterms:date
  • Apply publisher → dcterms:publisher
  • Apply language → dcterms:language
  • Apply identifiers → dcterms:identifier
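The button-to-property mapping above can be expressed as a lookup table. The following PHP sketch uses a hypothetical buildLiteralValues() helper to build literal value entries shaped like Omeka S API values (type "literal" plus "@value"). It assumes the abstract fallback applies when dcterms:abstract is not available on the item form, and it omits the numeric property_id that real Omeka S API writes require. It illustrates the mapping, not the module's own code.

```php
<?php
// Hypothetical mapping from suggestion keys to Dublin Core terms,
// mirroring the apply buttons listed above.
const SUGGESTION_TERMS = [
    'title' => 'dcterms:title',
    'abstract' => 'dcterms:abstract',
    'creators' => 'dcterms:creator',
    'subjects' => 'dcterms:subject',
    'date' => 'dcterms:date',
    'publisher' => 'dcterms:publisher',
    'language' => 'dcterms:language',
    'identifiers' => 'dcterms:identifier',
];

/**
 * Build literal value entries keyed by term.
 *
 * @param array    $suggestions    key => string|string[]
 * @param string[] $availableTerms terms present on the item form (assumption)
 */
function buildLiteralValues(array $suggestions, array $availableTerms): array
{
    $values = [];
    foreach (SUGGESTION_TERMS as $key => $term) {
        if (!isset($suggestions[$key])) {
            continue;
        }
        // Assumed fallback rule: use dcterms:description when abstract is unavailable.
        if ($term === 'dcterms:abstract' && !in_array($term, $availableTerms, true)) {
            $term = 'dcterms:description';
        }
        // Every value is written as plain literal text, matching the current limitation.
        foreach ((array) $suggestions[$key] as $text) {
            $values[$term][] = ['type' => 'literal', '@value' => (string) $text];
        }
    }
    return $values;
}
```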

Current limitations:

  • values are applied as literal text entries only
  • publication/container title is displayed but not yet mapped to an Omeka property automatically
  • PDF import requires the pdftotext command-line tool (for example from Poppler)
  • OCR fallback for scanned PDFs requires pdftoppm and tesseract
  • OCR currently uses English (eng) and processes up to 10 PDF pages per request
  • OpenAI integration depends on PHP cURL being available server-side
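The PDF limitations above describe a two-stage pipeline: read the text layer with pdftotext, and rasterize pages with pdftoppm for tesseract when that layer is empty. Below is a hedged PHP sketch that only builds the command strings, so it can be checked without Poppler or Tesseract installed. The function names, the -layout and 300 dpi choices, and the 20-character "needs OCR" threshold are assumptions, not OmekaRapper's actual settings.

```php
<?php
// Hypothetical sketch of the PDF text-extraction commands. It only builds
// command strings; executing them (e.g. with shell_exec) is left out.
const OCR_MAX_PAGES = 10;   // documented per-request OCR page limit
const OCR_LANGUAGE = 'eng'; // documented OCR language

// Text layer: "-layout" keeps reading order close to the page; "-" writes to stdout.
function pdftotextCommand(string $pdf): string
{
    return sprintf('pdftotext -layout %s -', escapeshellarg($pdf));
}

// Assumed heuristic: treat the PDF as scanned when its text layer is (almost) empty.
function needsOcr(string $extractedText): bool
{
    return strlen(trim($extractedText)) < 20;
}

// Rasterize the first OCR_MAX_PAGES pages to "<prefix>-N.png" files.
// pdftoppm may zero-pad N, so callers should glob() the results afterwards.
function pdftoppmCommand(string $pdf, string $prefix): string
{
    return sprintf(
        'pdftoppm -r 300 -png -f 1 -l %d %s %s',
        OCR_MAX_PAGES,
        escapeshellarg($pdf),
        escapeshellarg($prefix)
    );
}

// One tesseract call per page image, in natural page order. Using "stdout"
// as the output base makes tesseract print the text instead of writing a file.
function tesseractCommands(array $images): array
{
    sort($images, SORT_NATURAL);
    $images = array_slice($images, 0, OCR_MAX_PAGES);
    return array_map(
        fn (string $image): string => sprintf('tesseract %s stdout -l %s', escapeshellarg($image), OCR_LANGUAGE),
        $images
    );
}
```

Natural sorting keeps page-10.png after page-2.png, so OCR text is concatenated in page order.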

Multi‑Provider AI Architecture

OmekaRapper supports multiple AI providers through a pluggable provider system.

Supported or planned providers:

  • DummyProvider: included
  • ChatGPT: included
  • Codex: included
  • Claude: included
  • Ollama / OpenAI-compatible local LLM: included
  • Claude Code: not yet implemented as a direct provider

Developers can add further providers by implementing ProviderInterface (src/Service/Provider/ProviderInterface.php).
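A provider only has to turn article text into suggestions. The PHP sketch below shows that shape with an offline provider, loosely modeled on the dummy example response further down. MetadataProvider, SketchDummyProvider and their methods are hypothetical stand-ins; the module's real ProviderInterface may declare different methods.

```php
<?php
// Hypothetical provider contract for illustration only.
interface MetadataProvider
{
    public function getName(): string;

    /** @return array<string, string|string[]> suggestion key => value(s) */
    public function suggest(string $text): array;
}

// Offline example provider with no network access, handy for testing the panel.
// Title = first line of the trimmed text; abstract = its opening characters.
final class SketchDummyProvider implements MetadataProvider
{
    private int $abstractLength;

    public function __construct(int $abstractLength = 200)
    {
        $this->abstractLength = $abstractLength;
    }

    public function getName(): string
    {
        return 'sketch-dummy';
    }

    public function suggest(string $text): array
    {
        $lines = preg_split('/\R/u', trim($text));
        $body = trim(preg_replace('/\s+/u', ' ', $text));
        // "/u" makes "." match whole UTF-8 characters, so the cut never splits one.
        preg_match('/^.{0,' . $this->abstractLength . '}/us', $body, $match);

        return [
            'title' => trim($lines[0] ?? ''),
            'abstract' => $match[0] ?? '',
        ];
    }
}

// Registering providers by name lets the UI list them and the controller pick one.
$providers = [];
foreach ([new SketchDummyProvider()] as $provider) {
    $providers[$provider->getName()] = $provider;
}
```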


🧠 Architecture

The module uses a provider abstraction layer so AI systems can be swapped without affecting the rest of the module.

Omeka Admin UI
      │
      ▼
OmekaRapper Panel
      │
      ▼
AssistController
      │
      ▼
AiClientManager
      │
      ▼
ProviderInterface
      │
 ┌────┴──────────────┐
 ▼                   ▼
OpenAIProvider   ClaudeProvider
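The AiClientManager layer in the diagram chooses which provider handles a request. A minimal PHP sketch of that dispatch step follows, with providers registered as callables. ProviderRegistry and its methods are hypothetical, since AiClientManager's real API is not documented here. One design choice worth noting: the default provider is used only when none is requested, and an unknown or disabled name is rejected rather than silently routed elsewhere, so text never reaches a service the curator did not select.

```php
<?php
// Hypothetical stand-in for AiClientManager: providers are callables keyed
// by name; disabled providers are never registered.
final class ProviderRegistry
{
    /** @var array<string, callable> */
    private array $providers = [];
    private string $default;

    public function __construct(string $default)
    {
        $this->default = $default;
    }

    public function register(string $name, callable $provider, bool $enabled = true): void
    {
        if ($enabled) {
            $this->providers[$name] = $provider;
        }
    }

    // What a "list providers" endpoint could return.
    public function names(): array
    {
        return array_keys($this->providers);
    }

    // Default only when nothing was requested; unknown names are an error.
    public function suggest(?string $name, string $text): array
    {
        $chosen = ($name === null || $name === '') ? $this->default : $name;
        if (!isset($this->providers[$chosen])) {
            throw new RuntimeException("Provider '$chosen' is not enabled");
        }
        return [
            'provider' => $chosen,
            'suggestions' => ($this->providers[$chosen])($text),
        ];
    }
}
```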

📦 Installation

1. Download the module

Download or clone the repository.

2. Copy to Omeka modules directory

modules/OmekaRapper (inside your Omeka S installation directory)

3. Install in Omeka

Admin → Modules → Install → OmekaRapper

4. Configure providers

Admin → Modules → OmekaRapper → Configure

Available settings:

  • default provider for the item editor
  • enable or disable the ChatGPT provider
  • OpenAI API key
  • ChatGPT model dropdown
  • ChatGPT endpoint
  • enable or disable the Codex provider
  • Codex model dropdown
  • Codex endpoint
  • enable or disable the Claude provider
  • Anthropic API key
  • Claude model dropdown
  • Claude endpoint
  • enable or disable the Ollama provider
  • Ollama model dropdown
  • Ollama endpoint
  • optional Ollama API key

5. Enable PDF import

Install pdftotext so OmekaRapper can extract text from uploaded PDFs before sending that text to the selected provider. For scanned PDFs, also install pdftoppm and tesseract so OmekaRapper can fall back to OCR. Poppler provides both pdftotext and pdftoppm.

On macOS with Homebrew:

brew install poppler tesseract

Local LLMs with Ollama

OmekaRapper can talk to local models through an OpenAI-compatible endpoint.

For Ollama:

  1. Install and run Ollama.
  2. Pull a model, for example:
     ollama pull llama3.2
  3. In Admin → Modules → OmekaRapper → Configure, enable Ollama.
  4. Use:
     Model: llama3.2
     Endpoint: http://localhost:11434/v1/chat/completions
     API key: ollama
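Below is a sketch of the kind of request an OpenAI-compatible endpoint such as Ollama's /v1/chat/completions accepts, written in PHP. The request shape (model, messages with roles, a Bearer Authorization header, and the reply at choices[0].message.content) follows the chat-completions format. The function names, the prompt wording and the temperature setting are hypothetical, not OmekaRapper's actual prompt. Sending requires the PHP cURL extension.

```php
<?php
// Hypothetical sketch: build a chat-completions request for an
// OpenAI-compatible endpoint such as Ollama.
function buildChatRequest(string $model, string $apiKey, string $articleText): array
{
    $payload = [
        'model' => $model,
        'messages' => [
            ['role' => 'system', 'content' => 'Return catalog metadata for the text as a single JSON object.'],
            ['role' => 'user', 'content' => $articleText],
        ],
        'temperature' => 0, // favour repeatable output for cataloging
    ];

    return [
        'headers' => [
            'Content-Type: application/json',
            'Authorization: Bearer ' . $apiKey, // Ollama ignores the key; a placeholder works
        ],
        'body' => json_encode($payload, JSON_THROW_ON_ERROR),
    ];
}

// Requires the PHP cURL extension.
function sendChatRequest(string $endpoint, array $request): string
{
    $ch = curl_init($endpoint);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_HTTPHEADER => $request['headers'],
        CURLOPT_POSTFIELDS => $request['body'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 120, // local models can be slow on long articles
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $response;
}

// The reply text lives at choices[0].message.content.
function extractChatContent(string $responseJson): ?string
{
    $data = json_decode($responseJson, true);
    $content = $data['choices'][0]['message']['content'] ?? null;
    return is_string($content) ? $content : null;
}
```

With the settings above, a call would look like sendChatRequest('http://localhost:11434/v1/chat/completions', buildChatRequest('llama3.2', 'ollama', $text)).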

Notes:

  • the API key is usually ignored by Ollama, but a placeholder value is acceptable
  • other OpenAI-compatible local servers such as LM Studio can be used by changing the endpoint and model name
  • Claude Code is not exposed as a direct OmekaRapper provider yet; the module currently integrates Claude through the Anthropic Messages API
  • model lists are loaded dynamically from the configured provider endpoints
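Claude is reached through the Anthropic Messages API, which differs from the chat-completions format: the key goes in an x-api-key header, an anthropic-version header is required, max_tokens is mandatory, the system prompt is a top-level field, and the reply arrives as a list of content blocks. A hedged PHP sketch of building that request and reading the text back follows. The function names, prompt and default token limit are assumptions, and the model name is left to the configured Claude model dropdown.

```php
<?php
// Hypothetical sketch: build an Anthropic Messages API request.
function buildClaudeRequest(string $model, string $apiKey, string $articleText, int $maxTokens = 1024): array
{
    $payload = [
        'model' => $model,
        'max_tokens' => $maxTokens, // required by the Messages API
        'system' => 'Return catalog metadata for the text as a single JSON object.',
        'messages' => [
            ['role' => 'user', 'content' => $articleText],
        ],
    ];

    return [
        'url' => 'https://api.anthropic.com/v1/messages', // default public endpoint
        'headers' => [
            'content-type: application/json',
            'x-api-key: ' . $apiKey,
            'anthropic-version: 2023-06-01',
        ],
        'body' => json_encode($payload, JSON_THROW_ON_ERROR),
    ];
}

// Concatenate the text blocks of a Messages API response; '' if none.
function extractClaudeText(string $responseJson): string
{
    $data = json_decode($responseJson, true);
    $parts = [];
    foreach ($data['content'] ?? [] as $block) {
        if (is_array($block) && ($block['type'] ?? '') === 'text') {
            $parts[] = (string) ($block['text'] ?? '');
        }
    }
    return implode('', $parts);
}
```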

🚀 Usage

  1. Go to Admin → Items → Add Item.
  2. Locate the OmekaRapper AI Assistant panel.
  3. Paste article or OCR text, upload a PDF, or do both.
  4. Select a provider.
  5. Click Suggest Metadata.

AI-generated metadata suggestions will appear in the panel and can be applied to item fields.


🔌 API Endpoints

OmekaRapper exposes internal endpoints used by the admin UI.

GET  /admin/omeka-rapper/providers
POST /admin/omeka-rapper/suggest

Example request:

text=Example article text
provider=dummy

Example response:

{
  "ok": true,
  "provider": "dummy",
  "suggestions": {
    "title": "Example Article Title",
    "abstract": "First portion of article text"
  }
}
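Below is a small PHP sketch of encoding the example request and reading the example response. The field names (text, provider, ok, suggestions) come from the examples above. Sending the body as application/x-www-form-urlencoded is an assumption, the helper names are hypothetical, and a real call to these admin routes must carry an authenticated Omeka S admin session.

```php
<?php
// Encode the documented suggest fields as a form body (assumed encoding).
function encodeSuggestRequest(string $text, string $provider): string
{
    return http_build_query(['text' => $text, 'provider' => $provider]);
}

// Return the suggestions map, or throw when the response is not ok.
function decodeSuggestResponse(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || empty($data['ok'])) {
        throw new RuntimeException('Suggestion request failed');
    }
    return $data['suggestions'] ?? [];
}
```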

🗂 Project Structure

OmekaRapper
├── Module.php
├── config
│   ├── module.config.php
│   └── module.ini
├── src
│   ├── Controller
│   │   └── AssistController.php
│   ├── Factory
│   │   ├── AiClientManagerFactory.php
│   │   └── AssistControllerFactory.php
│   └── Service
│       ├── AiClientManager.php
│       └── Provider
│           ├── ProviderInterface.php
│           └── DummyProvider.php
├── view
│   └── omeka-rapper
│       └── admin
│           └── assist
│               └── panel.phtml
└── asset
    └── js
        └── omeka-rapper.js

🔒 Security Considerations

When integrating external AI providers:

  • Avoid sending restricted archival data without approval
  • Store API keys securely
  • Avoid exposing keys in JavaScript
  • Implement request throttling and input limits
  • Consider local models for sensitive collections
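The throttling and input-limit advice above could look like the PHP sketch below: a length cap on submitted text plus a fixed-window rate limit per user. SuggestGuard and its limits are hypothetical and illustrative. Because PHP state does not survive between requests, the in-memory counters would need a shared store, such as a cache or database table, in a real deployment.

```php
<?php
// Hypothetical guard for the suggest endpoint: input cap + fixed-window rate limit.
final class SuggestGuard
{
    private int $maxChars;
    private int $maxRequests;
    private int $windowSeconds;
    /** @var array<string, array{0: int, 1: int}> user id => [window start, count] */
    private array $windows = [];

    // Keep $maxChars below 65536: PCRE caps repeat counts at 65535.
    public function __construct(int $maxChars = 20000, int $maxRequests = 10, int $windowSeconds = 60)
    {
        $this->maxChars = $maxChars;
        $this->maxRequests = $maxRequests;
        $this->windowSeconds = $windowSeconds;
    }

    // Keep at most $maxChars UTF-8 characters; returns '' for invalid UTF-8.
    public function clampText(string $text): string
    {
        return preg_match('/^.{0,' . $this->maxChars . '}/us', $text, $m) === 1 ? $m[0] : '';
    }

    // True if $userId may make another request at Unix time $now.
    public function allow(string $userId, int $now): bool
    {
        [$start, $count] = $this->windows[$userId] ?? [$now, 0];
        if ($now - $start >= $this->windowSeconds) {
            $start = $now; // window expired: start a new one
            $count = 0;
        }
        if ($count >= $this->maxRequests) {
            return false;
        }
        $this->windows[$userId] = [$start, $count + 1];
        return true;
    }
}
```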

🛣 Roadmap

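Planned features and further work for future versions:

  • OpenAI provider (basic ChatGPT and Codex support is already included)
  • Claude provider (basic support through the Anthropic Messages API is already included)
  • PDF ingestion (text extraction with pdftotext is already included)
  • OCR processing (an English OCR fallback for scanned PDFs is already included)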
  • Auto‑mapping to Dublin Core fields
  • Resource template awareness
  • Batch cataloging
  • Dataset‑specific prompt profiles
  • Linked open data enrichment

🧑‍💻 Contributing

Contributions are welcome.

Suggested areas for development:

  • new AI providers
  • metadata extraction improvements
  • vocabulary integrations
  • UI enhancements
  • automated ingestion pipelines

📜 License

MIT License


🌍 Vision

OmekaRapper aims to become a full AI‑assisted archival cataloging platform for Omeka S that helps institutions:

  • reduce manual metadata entry
  • improve metadata consistency
  • accelerate digitization workflows
  • enhance discoverability of cultural collections

About

Official Omeka Rapper Module
