Named Entity Extraction II
Date: Thursday, April 20, 2017, 17h00-18h15 (CEST)
Session coordinators: Matteo Romanello and Francesco Mambrini (Deutsches Archäologisches Institut, Berlin)
YouTube link: https://www.youtube.com/watch?v=mD5icsPJIG4
Slides: SunoikisisDC, session 13
After reviewing the main technical concepts introduced in the previous session, this session addresses some more advanced topics in both Python programming and Named Entity Recognition (NER). We will introduce the object-oriented paradigm in Python and write some more advanced functions to derive basic statistics about the extracted named entities. We will then move on to extracting dates from journal articles using regular expressions. Finally, we will cover how to compare and evaluate different solutions (i.e. algorithms) for the same NLP task.
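As a rough illustration of how a small class can bundle tagged text with the functions that compute statistics over it, here is a minimal sketch. The class name `TaggedText`, its methods, the IOB-style labels and the toy sentence are all assumptions made for this example; they are not the code or data used in the session.

```python
from collections import Counter


class TaggedText:
    """Hypothetical container for (token, IOB tag) pairs."""

    def __init__(self, tagged_tokens):
        self.tagged_tokens = list(tagged_tokens)

    def entities(self):
        """Group consecutive B-/I- tokens into (entity string, label) pairs."""
        found, current, label = [], [], None
        for token, tag in self.tagged_tokens:
            # A B- tag (or a stray I- tag with no open entity) starts an entity.
            if tag.startswith("B-") or (tag.startswith("I-") and not current):
                if current:
                    found.append((" ".join(current), label))
                current, label = [token], tag[2:]
            elif tag.startswith("I-"):
                current.append(token)
            else:  # an "O" tag closes any open entity
                if current:
                    found.append((" ".join(current), label))
                current, label = [], None
        if current:
            found.append((" ".join(current), label))
        return found

    def label_counts(self):
        """Count how many entities were found for each label."""
        return Counter(label for _, label in self.entities())


# Toy, hand-tagged data (invented for illustration only).
sample = TaggedText([
    ("Gaius", "B-PER"), ("Iulius", "I-PER"), ("Caesar", "I-PER"),
    ("in", "O"), ("Galliam", "B-LOC"), ("contendit", "O"), (".", "O"),
    ("Caesar", "B-PER"), ("Rhenum", "B-LOC"), ("transit", "O"), (".", "O"),
])

print(sample.entities())
print(sample.label_counts())
```

Keeping the grouping logic in one method means that any further statistic (most frequent entity, entities per sentence, and so on) can be built on top of `entities()` without re-parsing the tags.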
- recap of main concepts of programming in Python
- tagging of Caesar's De Bello Gallico: how to calculate some basic statistics about the extracted NEs?
- how to extract dates from journal articles by using regular expressions?
- evaluation of NER: error types and accuracy measures
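To give an idea of what date extraction with regular expressions can look like, here is a minimal sketch using Python's standard `re` module. The pattern, the name `extract_dates` and the sample sentence are assumptions made for this example. The pattern only covers English month names, four-digit years starting with 1 or 2, and years followed by an era marker, so it will both miss dates and pick up some non-dates (a page number such as 1200, for instance).

```python
import re

MONTH = (
    r"(?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December)"
)

# Verbose mode lets us lay the alternatives out on separate, commented lines.
DATE_RE = re.compile(
    rf"""
    \b(?:
        \d{{1,4}}\s+(?:BCE|BC|CE|AD)                 # year with an era marker
      | (?:(?:\d{{1,2}}\s+)?{MONTH}\s+)?[12]\d{{3}}  # optional day and month, then a 4-digit year
    )\b
    """,
    re.VERBOSE,
)


def extract_dates(text):
    """Return every substring of `text` that matches the date pattern."""
    return [match.group(0) for match in DATE_RE.finditer(text)]


# Invented sample sentence, for illustration only.
sample_text = (
    "Published on 20 April 2017. Caesar began the Gallic campaign in 58 BC; "
    "see the 1998 edition, reprinted in March 2004."
)
print(extract_dates(sample_text))
```

Listing the era alternative first means that a string such as "58 BC" is captured whole rather than being skipped for lacking a four-digit year.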
S. Bird, E. Klein and E. Loper, Natural Language Processing with Python, O’Reilly, 2009. Available at: http://www.nltk.org/book/.
Dan Jurafsky and James H. Martin. Speech and Language Processing, Chapter 21, pp. 1-7 (Information Extraction). 3rd edition draft available at https://web.stanford.edu/~jurafsky/slp3/21.pdf.
- Erdmann, Alex, Christopher Brown, Brian D. Joseph, Mark Janse, and Petra Ajaka. 2016. “Challenges and Solutions for Latin Named Entity Recognition.” In Coling. Association for Computational Linguistics. Available at https://www.clarin-d.de/images/lt4dh/pdf/LT4DH12.pdf.
The exercise is described in the last section of this notebook.
Students are asked to apply some of the notions discussed in the common classes and to reuse some of the code presented in order to extract named entities and annotate them in IOB format. Students will also evaluate the performance of the NER tagger using the methodology and metrics (precision, recall, F-score) explained in the lecture.
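As a reminder of how the three metrics relate, here is a minimal sketch that scores predicted IOB tags against gold IOB tags. It assumes exact-match, entity-level scoring (an entity counts as correct only if its boundaries and its label both match), which is one common convention and may differ from the one used in the lecture. The function names and the toy tag sequences are invented for this example.

```python
def iob_to_spans(tags):
    """Convert a list of IOB tags into a set of (start, end, label) spans."""
    spans, start, label = set(), None, None
    # A trailing "O" acts as a sentinel that closes the last open span.
    for i, tag in enumerate(list(tags) + ["O"]):
        inside = tag.startswith("I-") and start is not None and tag[2:] == label
        if inside:
            continue
        if start is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O":
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold_tags, predicted_tags):
    """Entity-level precision, recall and F1 under exact span matching."""
    gold = iob_to_spans(gold_tags)
    predicted = iob_to_spans(predicted_tags)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy tag sequences (invented): the tagger finds one of three gold entities
# and mislabels another.
gold_tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-PER"]
predicted_tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]

precision, recall, f1 = precision_recall_f1(gold_tags, predicted_tags)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy case the tagger proposes two entities, one of which is correct (precision 1/2), out of three gold entities (recall 1/3). The F-score is the harmonic mean of the two values.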