This script takes a list of persons and extracts all their movements, starting from their English Wikipedia pages.
The file needs to be encoded in
UTF-8, and the names of the persons in the list must be the same as those used in their Wikipedia URLs, e.g.
Albert_Einstein. An example of the list can be found in the file
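The snippet below is a minimal sketch of preparing such a list. It assumes one person per line (the expected layout is not stated here), and the helper names `to_wikipedia_name`, `parse_person_list` and `read_person_list` are hypothetical, not part of the script. It only replaces spaces with underscores, so names should still be checked against the actual Wikipedia URLs.

```python
def to_wikipedia_name(name: str) -> str:
    """Turn a display name into English Wikipedia URL form by replacing
    spaces with underscores, e.g. "Albert Einstein" -> "Albert_Einstein"."""
    return name.strip().replace(" ", "_")


def parse_person_list(text: str) -> list:
    """ASSUMPTION: one person per line; blank lines are skipped."""
    return [to_wikipedia_name(line) for line in text.splitlines() if line.strip()]


def read_person_list(path: str) -> list:
    """Read the list file explicitly as UTF-8, as the script requires."""
    with open(path, encoding="utf-8") as f:
        return parse_person_list(f.read())
```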
The script returns three files as output:
.tsv file: all the movements in tab-separated format. This file also includes the movements that are missing some information (e.g. the coordinates).
_clean.tsv file: only the movements that have all the information, in tab-separated format.
.json file: the movements in .json format. This is the file to be used with the Ramble On Navigator.
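The relation between the full and the clean output can be illustrated with a short sketch: keep only the rows in which every field is filled, then serialize them as JSON. This is not the script's own code. It assumes the .tsv has a header row and that missing information appears as an empty field; the column names used in the example (`person`, `place`, `lat`, `lon`) are hypothetical.

```python
import csv
import io
import json


def split_movements(tsv_text: str):
    """Return (all_rows, complete_rows) from tab-separated movements.

    ASSUMPTIONS: the first line is a header, and a missing piece of
    information (e.g. a coordinate) is an empty field.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    all_rows = list(reader)
    # A row counts as complete only if every field has a non-empty value.
    complete = [row for row in all_rows
                if all(value not in (None, "") for value in row.values())]
    return all_rows, complete


def to_json(rows) -> str:
    # ensure_ascii=False keeps non-ASCII place names readable in the output.
    return json.dumps(rows, ensure_ascii=False, indent=2)
```

With a hypothetical input such as `person\tplace\tlat\tlon` followed by one row for Ulm with coordinates and one row for Bern without them, only the Ulm row ends up in the complete list.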
This script requires the KafNafParserPy and
SPARQLWrapper Python modules.
To install them:
    sudo easy_install pip
    pip install --user KafNafParserPy
    pip install --user SPARQLWrapper
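To check that the installation worked before running the script, a small standalone check like the following can be used. It is a sketch that only tests whether the two modules can be found on the import path; `missing_modules` is a hypothetical helper, not part of Ramble On.

```python
import importlib.util

# Module names as installed by the pip commands above.
REQUIRED_MODULES = ("KafNafParserPy", "SPARQLWrapper")


def missing_modules(names=REQUIRED_MODULES) -> list:
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are installed.")
```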
To run the code:
    python ramble_on.py -l list.txt -p pantheon_subset.txt -e -o output_movements
-l: file containing the list of names
-o: name of the output
-p: file with metadata from the Pantheon dataset (optional). If present, it provides the information the Ramble On Navigator needs to filter the queries (e.g. by nation or profession)
-e: extend the number of coreference chains to use
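The options above can be mirrored with `argparse` as in the sketch below. It is a hypothetical reconstruction for illustration, not the parser used by `ramble_on.py`: the destination names, the choice to make `-l` and `-o` required, and the `output_paths` helper (which assumes the three output files share the `-o` name as a prefix) are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # HYPOTHETICAL mirror of the documented options; the real script may differ.
    parser = argparse.ArgumentParser(
        description="Extract movements from Wikipedia biographies.")
    parser.add_argument("-l", dest="list_file", required=True,
                        help="file containing the list of names")
    parser.add_argument("-o", dest="output", required=True,
                        help="name of the output")
    parser.add_argument("-p", dest="pantheon", default=None,
                        help="Pantheon metadata file (optional)")
    parser.add_argument("-e", dest="extend", action="store_true",
                        help="extend the number of coreference chains to use")
    return parser


def output_paths(base: str) -> dict:
    # ASSUMPTION: the three outputs are named after the -o value.
    return {"all": base + ".tsv",
            "clean": base + "_clean.tsv",
            "json": base + ".json"}
```

Parsing the example command line (`-l list.txt -p pantheon_subset.txt -e -o output_movements`) sets `extend` to `True`; leaving out `-p` and `-e` gives `pantheon=None` and `extend=False`.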
This script relies on
DBpedia. The URLs of these services can be configured in the file
config.ini. For heavy use of this script (e.g. dozens of biographies), we suggest installing these services locally.
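When pointing the script at locally installed services, it can help to see how an INI file of this kind is read in Python. The sketch below uses the standard `configparser` module; the section name `services`, the key `dbpedia` and the localhost URL are hypothetical and do not describe the actual layout of `config.ini`.

```python
import configparser

# HYPOTHETICAL layout: the real config.ini may use different section and key names.
EXAMPLE_CONFIG = """
[services]
dbpedia = http://localhost:8890/sparql
"""


def load_service_urls(text: str) -> dict:
    """Parse INI text and return the service URLs as a plain dict."""
    config = configparser.ConfigParser()
    config.read_string(text)
    return dict(config["services"])
```

A URL loaded this way could then be passed as the endpoint argument of `SPARQLWrapper(...)` to query a local DBpedia mirror instead of the public one.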