Named Entity Recognition: SNAP and Recogito (February 24)

Monica Berti edited this page Feb 24, 2016 · 17 revisions

February 24, 2016: 17h00-18h15 CET

Gabriel Bodard (ICS London) and Chiara Palladino (University of Bari and Leipzig)


Aim of the session

This session provides a framework for manual techniques of Named Entity Recognition (NER), focusing on two categories of named entity: personal names and place-names. It concentrates on the manual annotation of ancient sources and presents two interfaces, both still in development, created for this purpose: SNAP (for personal names) and Recogito (for place-names). The teachers will first introduce the theoretical framework of these projects within the larger contexts of prosopography and geography. A practical introduction to Recogito follows, showing how to annotate place-names in ancient texts and on maps from the Pelagios database (in English or in other languages). Finally, the process of geotagging will be presented and analysed in depth, to show how a simple annotated name is associated with a place on the map.

In the final part of the session, some advanced applications of the annotated data will be presented, showing how geotagged texts can be manipulated with a few basic features to give a better understanding of their spatial concept and model.

Outline of the class

  1. Introduction to NER (15 min)
  2. The Pelagios project (5 min)
  3. The SNAP:DRGN project (10 min)
  4. Pelagios and Recogito practical session (20 min)
  5. What can you do with geotagged data? (15 min)
  6. Exercise (10 min)

Required reading

Further reading

Essay title

With reference to either prosopography (people) or geography (places) or another type of ancient data, describe and assess the value Linked Open Data currently has for ancient studies.

Exercise

As a possible assignment, students will create their own small dataset by annotating and geotagging a chosen text or image in Recogito.

  1. Familiarise yourself with the Pelagios Annotation Principles
  2. Read and work through the Recogito Beginner’s Tutorial
  3. Select a dataset in a chosen language (text or map image) in Recogito (probably with guidance from your tutor)
  4. Annotate roughly 50-100 place-names in a text, or as many place-names on a map as you can (ideally 15-30). NB: follow the Annotation Principles carefully
  5. Work through the names you have annotated, and georesolve or flag as many as you can (ideally at least half of them). NB: follow the beginner’s tutorial carefully
  6. Optional: Look at the maps, document stats and other visualizations in Recogito for the material you have added. Try downloading the geodata as a CSV file, and visualize it in QGIS following the tutorial provided. What can you learn from the data you have annotated?

Recogito QGIS Tutorial (by Leif Isaksen)

This tutorial is intended to give a flavour of what can be done with data downloaded from Recogito in a GIS. It is not intended to be exhaustive, nor is it specific to QGIS.

Preparation:

  1. If necessary, download and install QGIS
  2. Download your chosen CSV file from Recogito:
  3. Within QGIS, activate the OpenLayers and Qgis2threejs plugins and ensure they are up to date.

Add base layer:

  1. Create a New Project in QGIS. (Project | New )
  2. Change the canvas unit to metres (Project | Project Properties…)
  3. Add an aerial base layer map (Web | OpenLayers plugin | MapQuest | MapQuest Open Aerial )

Load Recogito data:

  1. Add a csv file to the project (Layer | Add Delimited Text Layer…)
  2. Select the Bordeaux itinerary file and ensure the delimiters are set correctly, that the first line has field names and that the x and y values represent longitude (E-W) and latitude (N-S) respectively.
  3. Select the EPSG:4326 (WGS 84) CRS. You should now see the places on the map.
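
If the points do not appear where expected, it can help to check the CSV before loading it. The following is a minimal Python sketch of the checks described above: that the first line holds field names, and that the x and y values are plausible longitudes and latitudes for EPSG:4326. The column names `lng` and `lat`, the sample rows and their approximate coordinates are assumptions made for illustration; the actual Recogito download may name its fields differently.

```python
import csv
import io

# Illustrative sample only: the column names and values are assumptions,
# not the actual layout of a Recogito download.
SAMPLE = """toponym,lat,lng,tags
Burdigala,44.84,-0.58,city
Arelate,43.68,4.63,city
Unknown place,,,
Swapped,143.68,4.63,city
"""

def check_points(csv_text, x_field="lng", y_field="lat", delimiter=","):
    """Return (valid, problems) for a delimited-text file with a header row.

    valid    -- list of (toponym, longitude, latitude) tuples
    problems -- list of (line_number, message) tuples
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    valid, problems = [], []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            x = float(row[x_field])  # longitude (E-W)
            y = float(row[y_field])  # latitude (N-S)
        except (KeyError, TypeError, ValueError):
            problems.append((line_no, "missing or non-numeric coordinate"))
            continue
        # EPSG:4326 uses degrees: longitude in [-180, 180], latitude in [-90, 90].
        if not (-180.0 <= x <= 180.0 and -90.0 <= y <= 90.0):
            problems.append((line_no, "coordinate outside WGS 84 range"))
            continue
        valid.append((row.get("toponym", ""), x, y))
    return valid, problems

if __name__ == "__main__":
    valid, problems = check_points(SAMPLE)
    print(len(valid), "usable rows")
    for line_no, message in problems:
        print("line", line_no, ":", message)
```

Rows without coordinates are typically names that have not been georesolved yet. A latitude outside the range -90 to 90 often means the x and y fields have been swapped.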

Symbolize the data:

  1. Double-click the layer (or right-click | Properties)
  2. In the Labels tab select Label this Layer with, and select the toponym field
  3. In the Style tab, change the Single Symbol drop down menu to Categorized
  4. Under Column select tags
  5. Click Classify. You will now see that each distinct entry in the tags field has been assigned a colour. QGIS cannot separate out the individual tags within an entry, so you will need to colour-code multiple entries the same way, or you can use filtering to create an individual layer for each type of feature.
  6. Choose an individual feature class and change it by double-clicking on the symbol or selecting change…. You can change multiple features at once. Try representing different categories in different styles. Experiment with different combinations of colour and shape.
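
Because the Categorized style treats each whole tags entry as a single value, it can be useful to split the tags outside QGIS before deciding how to colour-code or filter them. The sketch below groups toponyms by individual tag using only the Python standard library. The comma separator inside the tags field, the `(untagged)` label and the sample rows are assumptions made for illustration; check how your own download stores multiple tags.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample only: place names, tags and the comma separator
# inside the tags field are assumptions.
SAMPLE = """toponym,tags
Place A,"city,port"
Place B,city
Place C,
Place D,"port, river"
"""

def group_by_tag(csv_text, tag_field="tags", name_field="toponym", separator=","):
    """Map each individual tag to the list of toponyms that carry it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = defaultdict(list)
    for row in reader:
        raw = row.get(tag_field) or ""
        # Split the combined entry and drop empty or whitespace-only pieces.
        tags = [t.strip() for t in raw.split(separator) if t.strip()]
        # Rows with no tag at all are collected under a placeholder label.
        for tag in tags or ["(untagged)"]:
            groups[tag].append(row.get(name_field, ""))
    return dict(groups)

if __name__ == "__main__":
    for tag, names in sorted(group_by_tag(SAMPLE).items()):
        print(tag, "->", ", ".join(names))
```

Each resulting group corresponds to one feature type, which is the same grouping you would obtain by filtering the layer once per tag in QGIS.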