AI Cookbook for Libraries

CENL-AI-WG edited this page Aug 16, 2021 · 19 revisions

AI Recipes

The recipes are grouped by technical domain, and each recipe lists typical library use cases.

Text

Natural Language Processing (NLP)

Recipe #1: Named entity extraction & linking

enrichment of digital collections (creating new metadata such as person names, organizations, locations) for information retrieval, scientific objectives, etc.
establishing links between documents and authority data

Recipe #2: Topic Modeling

understanding large collections of unstructured text documents (text mining use case)
enrichment of digital collections with topics (information retrieval)

Recipe #3: Text Classification, computational semantics

enrichment of digitized collections with genres (novel, poetry, science, ...) or other classification schemes (e.g. Dewey)
cataloguing of born-digital materials

Recipe #4: Language models

creation and use of language models for NLP tasks

Recipe #5: OCR Post-correction

correction of OCRed collections
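
As an illustration of the post-correction use case, here is a minimal sketch of lexicon-based correction using Python's standard `difflib` module. The function names `correct_token` and `correct_text`, the sample lexicon, and the 0.8 similarity cutoff are assumptions made for this example; production workflows often rely on language models or trained sequence-to-sequence correctors instead.

```python
import difflib


def correct_token(token, lexicon, cutoff=0.8):
    """Return the closest lexicon entry for a token, or the token unchanged.

    Hypothetical helper: the matching is case-sensitive and the cutoff
    value is an arbitrary assumption, not a recommended setting.
    """
    if token in lexicon:
        return token
    matches = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def correct_text(text, lexicon, cutoff=0.8):
    """Correct each whitespace-separated token independently."""
    return " ".join(correct_token(t, lexicon, cutoff) for t in text.split())


# Illustrative lexicon; a real one would come from a dictionary or authority file.
lexicon = ["digital", "library", "collection"]
corrected = correct_text("digital librnry collectlon", lexicon)
```

Tokens with no sufficiently close lexicon entry are left untouched, which keeps the sketch from "correcting" proper nouns or rare words into unrelated dictionary entries.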

Optical Character Recognition (OCR)

Recipe #1: Training an OCR model on a specific corpus

Recipe #2: Evaluating OCR quality
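
One common way to quantify OCR quality is the character error rate (CER): the edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The sketch below is a plain-Python illustration; the function names are chosen for this example and it assumes a ground-truth transcription is available for the sample being evaluated.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One substituted character out of seven gives a CER of 1/7.
cer = character_error_rate("library", "librnry")
```

The same distance computed over word tokens rather than characters gives the word error rate, which is often reported alongside CER.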

Handwritten Text Recognition (HTR)

Recipe #1: HTR

HTR for full text indexing
HTR for transcription

Recipe #2: Authorship attribution

Attribution of authorship based on handwriting
Identification of script style (uncial, Carolingian, etc.)

Document Analysis

Recipe #1: Document Classification

preprocessing of uncatalogued collections (filtering, pre-indexing, ...) based on the document type (letter, typewritten, map, etc.)
enrichment of digital catalogued collections with document types

Recipe #2: Page Segmentation

extraction of text from heritage documents for full text indexing
segmentation of illustrations from heritage documents

Recipe #3: Article Recognition for newspapers, dictionaries, sales catalogues (arts, coins/medals...)

Recipe #4: Table Recognition (census tables, stock market, etc.)

Recipe #5: Date Attribution

Computer vision

Recipe #1: Image Classification

preprocessing of uncatalogued collections (filtering, pre-indexing, ...)
enrichment of digital catalogued collections (creating new metadata) for information retrieval, scientific objectives, etc.

Recipe #2: Object Detection and Face Detection, Instance Search

enrichment of digital collections
data analysis, visual studies

Recipe #3: Image Similarity

information retrieval based on visual similarity
navigating massive digital collections
curation of digital collections (duplicate detection, variant detection)
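
For the duplicate-detection use case, a simple baseline is perceptual hashing. The sketch below shows average hashing on images assumed to be already converted to grayscale and downscaled to a small grid (loading and resizing would normally be done with an imaging library). The function names and the `max_distance` threshold are assumptions for illustration; similarity search over large collections more commonly uses learned image embeddings.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 if brighter than the image mean, else 0.

    `pixels` is a 2D list of grayscale values, assumed to be a small
    downscaled version of the image (for example 8x8).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Number of positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(hash_a, hash_b))


def is_near_duplicate(pixels_a, pixels_b, max_distance=1):
    """Hypothetical threshold test; max_distance is an arbitrary assumption."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= max_distance


# Tiny 2x2 grids standing in for downscaled images.
original = [[10, 200], [220, 30]]
rescanned = [[12, 198], [225, 28]]   # same picture with slight noise
inverted = [[200, 10], [30, 220]]    # a clearly different picture
```

Because the hash depends only on whether each pixel is above or below the mean, small brightness or noise differences between two scans of the same item tend to leave it unchanged.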

Recipe #4: Video Indexing

preprocessing of uncatalogued collections (cutting into sequences, ...)
enrichment of digital catalogued collections (creating new metadata): object detection, scene classification, etc.
subtitle transcription (OCR)

Recipe #5: Audio Transcription

speech to text
speaker detection

To add a new AI recipe, use the recipe template