AI Cookbook for Libraries

CENL-AI-WG edited this page Aug 16, 2021 · 19 revisions

AI Recipes

The recipes are grouped by technical domain, and each recipe lists typical use cases in libraries.


Natural Language Processing (NLP)

Recipe #1: Named entity extraction & linking

  • enrichment of digital collections (creating new metadata such as person names, organizations, locations) for information retrieval, scientific objectives, etc.
  • establishing links between documents and authority data

Recipe #2: Topic Modeling

  • understanding large collections of unstructured text documents (text mining use case)
  • enrichment of digital collections with topics (information retrieval)

Recipe #3: Text Classification, computational semantics

  • enrichment of digitized collections with genres (novel, poetry, science, ...) or other classification schemes (Dewey, ...)
  • cataloguing of born-digital materials

Recipe #4: Language models

  • creation and use of language models for NLP tasks
  • correction of OCRed collections

Optical Character Recognition (OCR)

Recipe #1: Training an OCR engine on a specific corpus

Recipe #2: Evaluating OCR quality
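
One common way to quantify OCR quality is the character error rate (CER): the edit distance between the OCR output and a manually corrected reference, divided by the length of the reference. The sketch below is only an illustration of that metric, not the method used in the recipe itself; the function names `levenshtein` and `character_error_rate` are hypothetical and introduced here for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertion, deletion, substitution each cost 1)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the reference (ground truth)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: the OCR engine read the letter "l" as the digit "1".
    print(character_error_rate("library", "1ibrary"))
```

A CER of 0.0 means the OCR output matches the reference exactly; values can exceed 1.0 when the output is much longer than the reference. In practice the same idea is often applied at word level (word error rate) by splitting both strings into tokens first.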


Handwritten Text Recognition (HTR)

Recipe #1: HTR

  • HTR for full text indexing
  • HTR for transcription

Recipe #2: Authorship attribution

  • Attribution of authorship based on handwriting
  • Identification of script style (uncial, Carolingian, etc.)

Document Analysis

Recipe #1: Document Classification

  • preprocessing of uncatalogued collections (filtering, pre-indexing, ...) based on the document type (letter, typewritten, map, etc.)
  • enrichment of digital catalogued collections with document types

Recipe #2: Page Segmentation

  • extraction of text from heritage documents for full text indexing
  • segmentation of illustrations from heritage documents

Recipe #3: Article Recognition for newspapers, dictionaries, sales catalogues (arts, coins/medals...)

Recipe #4: Table Recognition (census tables, stock market, etc.)

Recipe #5: Date Attribution


Computer Vision

  • preprocessing of uncatalogued collections (filtering, pre-indexing, ...)
  • enrichment of digital catalogued collections (creating new metadata) for information retrieval, scientific objectives, etc.

Recipe #2: Object Detection and Face Detection, Instance Search

  • enrichment of digital collections
  • data analysis, visual studies

Recipe #3: Image Similarity

  • information retrieval based on visual similarity
  • navigating massive digital collections
  • curation of digital collections (duplicate detection, variant detection)

Recipe #4: Video Indexing

  • preprocessing of uncatalogued collections (cutting into sequences, ...)
  • enrichment of digital catalogued collections (creating new metadata): object detection, scene classification, etc.
  • subtitle transcription (OCR)

Recipe #5: Audio Transcription

  • speech to text
  • speaker detection

To add a new AI recipe, use the recipe template.