## Getting started
First, let's load the Pleiades data from our repository using the pandas package.    
The 'usecols' parameter selects which columns we load into Python, and 'index_col' makes the 'id' column the row index.

In [1]:
import re
import pandas as pd
import json
datatable = pd.read_csv("./data/pleiades-places.csv", 
                        usecols=["authors", "title", "id", "description", "featureTypes", 
                                 "reprLat", "reprLong", "path","timePeriodsKeys", "timePeriodsRange", "minDate", "maxDate"],
                       index_col="id")

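import ast

def load_dict_from_file(path):
    # Read a file that contains a Python dict literal and parse it.
    # ast.literal_eval accepts only literals; eval would run arbitrary code.
    with open(path, 'r') as f:
        return ast.literal_eval(f.read())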
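
To see the effect of 'usecols' and 'index_col' in isolation, here is a minimal sketch on a tiny in-memory CSV. The 'creators' column and the name `sample` are assumptions made for illustration; the two rows reuse values for Ebura and Edeba from the Pleiades table.

```python
import io
import pandas as pd

# Hypothetical miniature CSV; the real file has many more rows and columns.
csv_text = io.StringIO(
    "id,title,reprLat,reprLong,creators\n"
    "265882,Ebura,37.463838,-3.924959,someone\n"
    "265886,Edeba,38.640276,-3.362253,someone else\n"
)

sample = pd.read_csv(
    csv_text,
    usecols=["id", "title", "reprLat", "reprLong"],  # 'creators' is not loaded
    index_col="id",                                   # rows are labelled by id
)

print(list(sample.columns))  # the index column is not listed among the columns
print(sample.index.name)
```

Note that a column used as 'index_col' still has to be listed in 'usecols', as it is in the cell above.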


Now let's have a look at our imported datatable:

In [2]:
datatable

id,authors,description,featureTypes,maxDate,minDate,path,reprLat,reprLong,timePeriodsKeys,timePeriodsRange,title
48210385,"Becker, J., T. Elliott",The post-Roman settlement at Alba Fucens becam...,settlement,1453.0,640.0,/places/48210385,42.082885,13.411984,mediaeval-byzantine,"640.0,1453.0",Borgo Medievale
48210386,"Becker, J., T. Elliott",A major urban sanctuary at Vulci with a long p...,temple-2,300.0,-750.0,/places/48210386,42.419374,11.628546,"archaic,classical,hellenistic-republican,roman","-750.0,300.0",Tempio Grande at Vulci
265876,"Spann, P., DARMC, R. Talbert, S. Gillies, R. W...","An ancient settlement, likely of Celtic origin...",settlement,640.0,-330.0,/places/265876,39.460299,-3.606772,"hellenistic-republican,roman,late-antique","-330.0,640.0",Consabura/Consabrum
265877,"Spann, P., R. Warner, R. Talbert, S. Gillies, ...","An ancient place, cited: BAtlas 27 D3 Contestania",region,300.0,-330.0,/places/265877,38.988847,-0.515639,"hellenistic-republican,roman","-330.0,300.0",Contestania
265878,"Spann, P., R. Warner, R. Talbert, T. Elliott, ...","An ancient place, cited: BAtlas 27 C4 Cueva de...",mine,300.0,-330.0,/places/265878,37.241792,-2.403309,"hellenistic-republican,roman","-330.0,300.0",Cueva de la Paloma
265880,"Spann, P., J. Becker, DARMC, T. Elliott, S. Gi...","An ancient place, cited: BAtlas 27 F3 Dianium/...",settlement,2100.0,-750.0,/places/265880,38.842795,0.107511,"archaic,classical,hellenistic-republican,roman...","-750.0,2100.0",Dianium/Hemeroskopeion
265882,"Spann, P., R. Warner, R. Talbert, T. Elliott, ...","An ancient place, cited: BAtlas 27 B4 Ebura",settlement,300.0,-30.0,/places/265882,37.463838,-3.924959,roman,"-30.0,300.0",Ebura
265883,"Spann, P., DARMC, R. Talbert, R. Warner, J. Be...","An ancient place, cited: BAtlas 27 G3 Ebusus",settlement,640.0,-750.0,/places/265883,38.908393,1.432146,"archaic,classical,hellenistic-republican,roman...","-750.0,640.0",Ebusus
265884,"Spann, P., R. Warner, R. Talbert, S. Gillies, ...",Ebusus Ins. (modern Ibiza) is an island in the...,island,2100.0,-750.0,/places/265884,38.980000,1.430000,"archaic,classical,hellenistic-republican,roman...","-750.0,2100.0",Ebusus (island)
265886,"Spann, P., DARMC, R. Talbert, J. Becker, R. Wa...","An ancient place, cited: BAtlas 27 B3 Edeba",settlement,300.0,-30.0,/places/265886,38.640276,-3.362253,roman,"-30.0,300.0",Edeba


Consider the 'id' column. This number is a unique identifier for every row in our table.    
We can access a row through this id:

In [3]:
datatable.loc[48210386]

authors                                        Becker, J., T. Elliott
description         A major urban sanctuary at Vulci with a long p...
featureTypes                                                 temple-2
maxDate                                                           300
minDate                                                          -750
path                                                 /places/48210386
reprLat                                                       42.4194
reprLong                                                      11.6285
timePeriodsKeys        archaic,classical,hellenistic-republican,roman
timePeriodsRange                                         -750.0,300.0
title                                          Tempio Grande at Vulci
Name: 48210386, dtype: object

In [4]:
datatable.loc[48210386, 'reprLat']

42.4193742
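
The two kinds of access shown above can be tried on a small stand-in table. This is only a sketch: the DataFrame `sample` is built by hand from two rows of the output above and is not the real datatable.

```python
import pandas as pd

# Two rows copied from the table above; only a few columns are kept.
sample = pd.DataFrame(
    {
        "title": ["Tempio Grande at Vulci", "Ebura"],
        "reprLat": [42.419374, 37.463838],
        "reprLong": [11.628546, -3.924959],
    },
    index=pd.Index([48210386, 265882], name="id"),
)

row = sample.loc[48210386]             # whole row, returned as a Series
lat = sample.loc[48210386, "reprLat"]  # single cell: row label, then column label

print(row["title"])
print(lat)
```

Asking for an id that is not in the index raises a KeyError, so every id stored in a lookup table has to exist in the datatable.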

The big problem here: we need to go the other way around. Instead of looking up a place by its id, we scan the corpus for the titles of places!    
At the same time, we need to be aware that different words may indicate the same place (Athens, Athenian, Athenians, ...).    
The idea is to create a large Python dictionary that maps each word representing a place to the corresponding id in our datatable.
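
As a rough sketch of what such a dictionary could look like: several word forms point to the same id. The ids, the entries, and the helper `lookup_place_id` are placeholders invented for illustration; they are not taken from the real lookup file.

```python
# Placeholder ids for illustration only; they are not real Pleiades ids.
ATHENS_ID = 1
SPARTA_ID = 2

sample_lookup = {
    "Athens": ATHENS_ID,
    "Athenian": ATHENS_ID,
    "Athenians": ATHENS_ID,
    "Sparta": SPARTA_ID,
    "Spartans": SPARTA_ID,
}

def lookup_place_id(word, lookup):
    """Return the place id for a word, or None if the word is not a known place."""
    return lookup.get(word)

print(lookup_place_id("Athenians", sample_lookup))  # same id as "Athens"
print(lookup_place_id("glory", sample_lookup))      # None: not a place word
```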

In [5]:
word_lookup = load_dict_from_file("./data/word_lookup.txt")

In [6]:
word_lookup_extension = load_dict_from_file("./data/word_lookup_extension.txt")
word_lookup = {**word_lookup, **word_lookup_extension}

Now let's load and scan our Herodotus corpus!

In [7]:
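with open('./data/herodotus_history.txt', 'r', newline='\n') as f:
    herodotus = f.read()
herodotus = herodotus.replace('\n', ' ')
# Replace every non-word character (punctuation etc.) with a space
withoutMarks = re.sub(pattern=r'\W', string=herodotus, repl=" ")
# Collapse runs of spaces, then split the text into single words
result = re.sub(pattern=r' +', string=withoutMarks, repl=" ")
words_in_Herodotus = result.split()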
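
To see what this cleaning does, here is a small sketch on a made-up sentence. The helper name `tokenize` and the sample text are assumptions for illustration; the regular expression is the same as in the cell above.

```python
import re

def tokenize(text):
    """Replace line breaks and non-word characters with spaces, then split."""
    text = text.replace("\n", " ")
    return re.sub(r"\W", " ", text).split()

sample_text = "the Greeks and\nthe Barbarians; Athens' walls."
print(tokenize(sample_text))
```

Because the apostrophe is a non-word character, a possessive such as "Athens'" is reduced to "Athens" and can then match a key in the lookup dictionary. Matching stays case-sensitive, since the text is not lowercased.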

In [8]:
words_in_Herodotus

['BOOK',
 'I',
 'Clio',
 'These',
 'are',
 'the',
 'researches',
 'of',
 'Herodotus',
 'of',
 'Halicarnassus',
 'which',
 'he',
 'publishes',
 'in',
 'the',
 'hope',
 'of',
 'thereby',
 'preserving',
 'from',
 'decay',
 'the',
 'remembrance',
 'of',
 'what',
 'men',
 'have',
 'done',
 'and',
 'of',
 'preventing',
 'the',
 'great',
 'and',
 'wonderful',
 'actions',
 'of',
 'the',
 'Greeks',
 'and',
 'the',
 'Barbarians',
 'from',
 'losing',
 'their',
 'due',
 'meed',
 'of',
 'glory',
 'and',
 'withal',
 'to',
 'put',
 'on',
 'record',
 'what',
 'were',
 'their',
 'grounds',
 'of',
 'feuds',
 'According',
 'to',
 'the',
 'Persians',
 'best',
 'informed',
 'in',
 'history',
 'the',
 'Phoenicians',
 'began',
 'to',
 'quarrel',
 'This',
 'people',
 'who',
 'had',
 'formerly',
 'dwelt',
 'on',
 'the',
 'shores',
 'of',
 'the',
 'Erythraean',
 'Sea',
 'having',
 'migrated',
 'to',
 'the',
 'Mediterranean',
 'and',
 'settled',
 'in',
 'the',
 'parts',
 'which',
 'they',
 'now',
 'inhabit',
 ...]

In [9]:
word_count = {}

# Count how often each known place word occurs in the text
for word in words_in_Herodotus:
    if word in word_lookup:
        word_count[word] = word_count.get(word, 0) + 1
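
The same counting can also be sketched with `collections.Counter` from the standard library. The word list and the lookup below are made-up stand-ins with placeholder ids, not the real data.

```python
from collections import Counter

sample_words = ["Athens", "glory", "Athenians", "Athens", "Sparta"]
sample_lookup = {"Athens": 1, "Athenians": 1, "Sparta": 2}  # placeholder ids

# Count per word, as in the loop above
sample_word_count = Counter(w for w in sample_words if w in sample_lookup)

# Alternative: count per place id, merging all word forms of one place
sample_place_count = Counter(
    sample_lookup[w] for w in sample_words if w in sample_lookup
)

print(sample_word_count)
print(sample_place_count)
```

The loop above counts per word, so "Athens" and "Athenians" are tallied separately even though they map to the same id. Whether to merge them per place is a design choice; the second counter only illustrates that option.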
        
        

In [10]:
def feature2Json(title, description, Lat, Lng, count, featureTypes,
                 timePeriodsKeys, timePeriodsRange, minDate, maxDate):
    # Build one GeoJSON Feature; GeoJSON coordinates are [longitude, latitude]
    result = {
        "type": "Feature",
        "properties": {
            "title": title,
            "description": description,
            "count": count,
            "featureTypes": featureTypes,
            "timePeriodsKeys": timePeriodsKeys,
            "timePeriodsRange": timePeriodsRange,
            "minDate": minDate,
            "maxDate": maxDate
        },
        "geometry": {
            "type": "Point",
            "coordinates": [
                Lng,
                Lat
            ]
        }
    }
    return result
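
For orientation, here is a reduced sketch of the structure this function produces. The helper `point_feature` is a simplified stand-in with fewer properties, and the count of 3 is invented; the title and coordinates are those of the Tempio Grande at Vulci row.

```python
import json

def point_feature(title, lat, lng, count):
    """Build a minimal GeoJSON Feature; note that longitude comes first."""
    return {
        "type": "Feature",
        "properties": {"title": title, "count": count},
        "geometry": {"type": "Point", "coordinates": [lng, lat]},
    }

# count=3 is a made-up value for illustration
feature = point_feature("Tempio Grande at Vulci", 42.419374, 11.628546, count=3)
print(json.dumps(feature, indent=1))
```

The longitude-first order is easy to get wrong, because the datatable lists 'reprLat' before 'reprLong'.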

In [11]:
for word, the_id in word_lookup.items():
    if the_id == 491527:
        print(word, the_id)

In [12]:
features = []
for word, count in word_count.items():
    # Look up the row once instead of once per column
    row = datatable.loc[word_lookup[word]]
    features.append(
        feature2Json(
            row['title'],
            row['description'],
            row['reprLat'],
            row['reprLong'],
            count,
            row['featureTypes'],
            row['timePeriodsKeys'],
            row['timePeriodsRange'],
            row['minDate'],
            row['maxDate']
        )
    )
FeatureCollection = {
    "type": "FeatureCollection",
    "features": features
}

# Write the JS variable declaration, then the JSON object, in a single pass
with open('./FeatureCollection.js', 'w') as f:
    f.write("var placeFeatures = \n")
    json.dump(FeatureCollection, f, indent=1)
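
One way to check the written file is to strip the JavaScript prefix and parse the remainder as JSON. The sketch below does this in a temporary directory with an empty stand-in collection; the names `collection`, `prefix`, and `restored` are assumptions for illustration.

```python
import json
import os
import tempfile

collection = {"type": "FeatureCollection", "features": []}  # empty stand-in
prefix = "var placeFeatures = \n"

# Write to a temporary location so the real output file is not touched
path = os.path.join(tempfile.mkdtemp(), "FeatureCollection.js")
with open(path, "w") as f:
    f.write(prefix)
    json.dump(collection, f, indent=1)

with open(path) as f:
    content = f.read()

# Everything after the prefix should be plain JSON again
restored = json.loads(content[len(prefix):])
print(restored == collection)
```

One caveat: by default `json.dump` writes missing float values as `NaN`. JavaScript accepts that token, but strict JSON parsers may reject it, so rows with missing values may need attention if the file is reused as JSON.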

In [29]:
len(word_count)

208