CURRENS

Multi-Purpose Universal Latin Text Cleaner and Normaliser

PURPOSE AND MODULES

CURRENS is a custom-written program for processing Latin texts whose orthography is not standardised. It consists of four main modules, each addressing a different issue.

General module

Handles orthographic issues: tokenization, j > i and v > u normalisation, removal of punctuation, Roman and Arabic numerals and non-UTF-8 symbols, and conversion of all capital letters to lowercase.
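The steps of the general module can be sketched in plain Python as follows. This is an illustrative sketch, not the code in replacer.py or currens.py; the names ROMAN, WORD and normalise are hypothetical. One design choice is an assumption worth flagging: Roman numerals are recognised only when written in capitals and are removed before lowercasing, because lowercase tokens such as vi or di are also ordinary Latin words.

```python
import re

# Strict Roman-numeral pattern. It is only tested against fully uppercase
# tokens (an assumption of this sketch), so lowercase words like "vi" survive.
ROMAN = re.compile(r"^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

# Runs of Unicode letters only: punctuation, digits (Arabic numerals),
# underscores and other symbols all act as token separators and are dropped.
WORD = re.compile(r"[^\W\d_]+")


def normalise(text: str) -> list[str]:
    """Tokenize and normalise a Latin text (hypothetical sketch)."""
    tokens = []
    for raw in WORD.findall(text):
        # Drop uppercase Roman numerals before case information is lost.
        if raw.isupper() and ROMAN.match(raw):
            continue
        # Lowercase first, so that J and V are also caught by the replacements.
        word = raw.lower().replace("j", "i").replace("v", "u")
        tokens.append(word)
    return tokens
```

For example, normalise("Arma virumque cano, Troiae qui primus ab oris. Liber XIV, 1520: Juppiter") returns ['arma', 'uirumque', 'cano', 'troiae', 'qui', 'primus', 'ab', 'oris', 'liber', 'iuppiter']. Bytes that are not valid UTF-8 can be dropped earlier, when reading the file, with open(path, encoding="utf-8", errors="ignore").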

Optional modules

Enclitics handler

Splits enclitics from their host words. Exceptions (words that merely end in the same letters as an enclitic) are handled through custom-written exception lists, one for every main Latin enclitic (que, ne-n, st, ue-ve: CartellaQueExceptions, CartellaNeExceptions, CartellaNExceptions, CartellaStExceptions, CartellaUeExceptions, CartellaVeExceptions).
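The idea of splitting enclitics with exception lists can be sketched as below. This is a hypothetical illustration, not the logic of EncliticsHandler.py: the exception sets are tiny made-up samples, the file format of the Cartella*Exceptions folders is not assumed, and the -n and -st enclitics are left out for brevity.

```python
# Hypothetical, deliberately tiny exception lists (not the CURRENS lists).
EXCEPTIONS = {
    "que": {"atque", "itaque", "neque", "quisque", "usque"},
    "ne": {"bene", "paene", "sine"},
    "ue": {"neue", "siue"},
}

# Longer suffixes first, so "-que" is tested before "-ue".
ENCLITICS = ("que", "ne", "ue")


def split_enclitic(word: str) -> list[str]:
    """Split a lowercase token into host word and enclitic (sketch)."""
    for enc in ENCLITICS:
        if word.endswith(enc):
            # The first matching suffix decides: listed words and bare
            # enclitics are returned unchanged instead of trying shorter ones.
            if word in EXCEPTIONS[enc] or word == enc:
                return [word]
            return [word[: -len(enc)], enc]
    return [word]
```

With these sets, split_enclitic("uirumque") gives ['uirum', 'que'] while split_enclitic("atque") stays ['atque']. Stopping at the first matching suffix matters: otherwise atque would be rejected for -que and then wrongly split as atq + ue. Real exception lists have to be far longer than this sample, since ordinary forms such as the ablative homine also end in -ne.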
Archaisms handler

Converts archaisms into their classical Latin equivalents. Not yet implemented: *ont-*unt, med-me, ostr-estr, uelt-uult, *umus-*imus/ume-ime/uma-ima, oncul-uncul, ube-ibe, issum-issim, quoi-cui, acherun-acheron.
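The README does not say how the archaisms handler stores its rules. One common approach is an ordered table of regular-expression rewrites applied to each token, sketched below under that assumption. The four rules are taken from the to-do list above purely as illustration; they are not claimed to be part of ArchaismsHandler.py, and RULES and modernise are hypothetical names.

```python
import re

# Hypothetical rule table: (pattern, replacement), applied in order.
# Rules are borrowed from the README's to-do list for illustration only.
RULES = [
    (re.compile(r"^quoi$"), "cui"),   # whole word: quoi > cui
    (re.compile(r"^med$"), "me"),     # whole word: med > me
    (re.compile(r"uelt"), "uult"),    # anywhere in the word: uelt > uult
    (re.compile(r"ont$"), "unt"),     # word ending: *ont > *unt
]


def modernise(word: str) -> str:
    """Rewrite an archaic spelling with every matching rule (sketch)."""
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word
```

Anchoring matters here: ^med$ turns med into me but leaves medicus alone. A suffix rule such as *ont > *unt also shows why these patterns need care, since applied blindly it rewrites every token ending in -ont.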
Stopwords handler

Removes stopwords using a custom-built list of Latin stopwords (stored in the folder "cartellastopwords", divided by letter). You can add your own stopwords to these files.
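A minimal sketch of loading per-letter stopword files and filtering tokens with them. It assumes that each file in cartellastopwords is a UTF-8 .txt file with one stopword per line; the actual file names and format may differ. load_stopwords and remove_stopwords are hypothetical names, not functions from removeStopwords.py.

```python
from pathlib import Path


def load_stopwords(folder: str) -> set[str]:
    """Collect stopwords from every .txt file in folder (one word per line)."""
    stopwords: set[str] = set()
    for path in sorted(Path(folder).glob("*.txt")):
        with path.open(encoding="utf-8") as handle:
            # Skip blank lines; strip surrounding whitespace and newlines.
            stopwords.update(line.strip() for line in handle if line.strip())
    return stopwords


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Keep only the tokens that are not in the stopword set."""
    return [token for token in tokens if token not in stopwords]
```

Because all files are merged into one set, dividing the list by letter only affects how it is edited, not how it is looked up, and each membership test stays constant-time on average.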

Developed and maintained by:

Andrea Peverelli, junior researcher and PhD candidate in Digital Humanities for the Translatin Project (PI: Jan Bloemendal) at the Huygens Institute, KNAW Humanities Cluster, Amsterdam (Netherlands)
Alessandro Rossi, Politecnico di Milano, Dipartimento di Ingegneria Informatica (Computer Engineering Department)

The program is completely free and open access. If you use CURRENS in your own research, please cite this GitHub repository (AndrewPeverells/CURRENS) and its authors.

HOW TO USE THE PROGRAM

Download the whole repository folder and store it locally.
Open currens.py in a Python editor or environment (VSCode, Sublime, Atom, Anaconda, Jupyter...).
At line 10 of currens.py (with open('Path/to/your/file.txt', 'r') as file:), replace the placeholder 'Path/to/your/file.txt' with the full path to your text file (.txt format).
Run the script. The console will prompt you to choose whether to use each of the three optional modules of CURRENS (archaisms, stopwords and enclitics handlers):

Type yes or no for each, depending on which modules your experiment requires.

You should now have a .txt file called "temp" in the CURRENS folder: this output file contains the cleaned text.
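The run described above (read a .txt file, answer yes or no for each optional module, write the result to "temp") can be sketched as follows. The function names, the order in which steps are offered, and the output layout (space-separated tokens in temp.txt) are assumptions for illustration, not a description of currens.py. The reader parameter lets the prompts be answered programmatically, for example in tests.

```python
from pathlib import Path
from typing import Callable

Tokens = list[str]
Step = Callable[[Tokens], Tokens]


def ask_yes_no(question: str, reader: Callable[[str], str] = input) -> bool:
    """Ask until the answer is 'yes' or 'no' (case-insensitive)."""
    while True:
        answer = reader(f"{question} (yes/no): ").strip().lower()
        if answer in ("yes", "no"):
            return answer == "yes"


def run_optional_steps(tokens: Tokens, steps: dict[str, Step],
                       reader: Callable[[str], str] = input) -> Tokens:
    """Apply each optional step the user accepts, in dictionary order."""
    for name, step in steps.items():
        if ask_yes_no(f"Use the {name}?", reader):
            tokens = step(tokens)
    return tokens


def write_output(tokens: Tokens, folder: str = ".") -> Path:
    """Write the cleaned tokens to temp.txt (hypothetical layout)."""
    out_path = Path(folder) / "temp.txt"
    out_path.write_text(" ".join(tokens), encoding="utf-8")
    return out_path
```

In a full script, the input could be read with Path("Path/to/your/file.txt").read_text(encoding="utf-8", errors="ignore"), tokenized by the general module, passed to run_optional_steps together with whichever handlers exist, and saved with write_output.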
