mpuren/jdh_articlemethodo

Abstract

The historian's profession has been considerably renewed by the digitisation of sources and the conversion of images into text (HTR and OCR), to the point that some historians consider the production and exploitation of digital historical data to be the main intellectual and technological turning point in the discipline. More and more texts are digitised and now processed with OCR; it is therefore becoming increasingly possible to use the methods of distant reading, as defined by Franco Moretti, to explore these corpora of digital sources. This first step paves the way for statistical analysis methods that are particularly useful for processing large corpora. Some methods, such as factor analysis, hierarchical classification or the various tools of lexicometric analysis, are familiar to a growing number of historians. However, many are reluctant to use them for fear of misusing them, which risks fostering a rejection of these methods. It is indeed crucial to know the limits of a tool, to be able to evaluate the quality of a result, and even to know when to switch to other, less commonly used algorithms. In this article, we will draw on the study of French parliamentary debates from the end of the nineteenth century to highlight the many pitfalls that researchers may face, as well as ways of getting around them when possible. In particular, we will study how to evaluate the quality of a classification, the relevance of the choice of a representation, and the adequacy of the modelling of a time series. We also wish to question which methods are most relevant in the context of political history. We will take this opportunity to present in an accessible way some tools that are not yet widely used by historians, from generative models in language processing to graph generation tools, via the self-organising map.
For each method, we will try to show how it can solve a concrete difficulty posed by the corpus, but also illustrate its limits and present some safeguards.

Keywords

NLP, Contemporary History, Parliamentary debates, Methodology

Processing pipeline
