## Getting started
First, let's load the Pleiades data from our repository using the pandas package.
The 'usecols' parameter selects which columns we load into Python, and 'index_col' makes the 'id' column the row index.

In [88]:
import re
import pandas as pd
import numpy as np
import json
datatable = pd.read_csv("./data/pleiades-places.csv", 
                        usecols=["authors", "title", "id", "description", "featureTypes", 
                                 "reprLat", "reprLong", "path","timePeriodsKeys", "timePeriodsRange", "minDate", "maxDate"],
                       index_col="id")

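import ast

def load_dict_from_file(path):
    # The lookup files are assumed to contain a plain Python dict literal;
    # literal_eval parses it without executing arbitrary code the way eval would.
    with open(path, 'r') as f:
        data = f.read()
    return ast.literal_eval(data)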


Now let's have a look at our imported datatable:

In [89]:
datatable

id,authors,description,featureTypes,maxDate,minDate,path,reprLat,reprLong,timePeriodsKeys,timePeriodsRange,title
48210385,"Becker, J., T. Elliott",The post-Roman settlement at Alba Fucens becam...,settlement,1453.0,640.0,/places/48210385,42.082885,13.411984,mediaeval-byzantine,"640.0,1453.0",Borgo Medievale
48210386,"Becker, J., T. Elliott",A major urban sanctuary at Vulci with a long p...,temple-2,300.0,-750.0,/places/48210386,42.419374,11.628546,"archaic,classical,hellenistic-republican,roman","-750.0,300.0",Tempio Grande at Vulci
265876,"Spann, P., DARMC, R. Talbert, S. Gillies, R. W...","An ancient settlement, likely of Celtic origin...",settlement,640.0,-330.0,/places/265876,39.460299,-3.606772,"hellenistic-republican,roman,late-antique","-330.0,640.0",Consabura/Consabrum
265877,"Spann, P., R. Warner, R. Talbert, S. Gillies, ...","An ancient place, cited: BAtlas 27 D3 Contestania",region,300.0,-330.0,/places/265877,38.988847,-0.515639,"hellenistic-republican,roman","-330.0,300.0",Contestania
265878,"Spann, P., R. Warner, R. Talbert, T. Elliott, ...","An ancient place, cited: BAtlas 27 C4 Cueva de...",mine,300.0,-330.0,/places/265878,37.241792,-2.403309,"hellenistic-republican,roman","-330.0,300.0",Cueva de la Paloma
265880,"Spann, P., J. Becker, DARMC, T. Elliott, S. Gi...","An ancient place, cited: BAtlas 27 F3 Dianium/...",settlement,2100.0,-750.0,/places/265880,38.842795,0.107511,"archaic,classical,hellenistic-republican,roman...","-750.0,2100.0",Dianium/Hemeroskopeion
265882,"Spann, P., R. Warner, R. Talbert, T. Elliott, ...","An ancient place, cited: BAtlas 27 B4 Ebura",settlement,300.0,-30.0,/places/265882,37.463838,-3.924959,roman,"-30.0,300.0",Ebura
265883,"Spann, P., DARMC, R. Talbert, R. Warner, J. Be...","An ancient place, cited: BAtlas 27 G3 Ebusus",settlement,640.0,-750.0,/places/265883,38.908393,1.432146,"archaic,classical,hellenistic-republican,roman...","-750.0,640.0",Ebusus
265884,"Spann, P., R. Warner, R. Talbert, S. Gillies, ...",Ebusus Ins. (modern Ibiza) is an island in the...,island,2100.0,-750.0,/places/265884,38.980000,1.430000,"archaic,classical,hellenistic-republican,roman...","-750.0,2100.0",Ebusus (island)
265886,"Spann, P., DARMC, R. Talbert, J. Becker, R. Wa...","An ancient place, cited: BAtlas 27 B3 Edeba",settlement,300.0,-30.0,/places/265886,38.640276,-3.362253,roman,"-30.0,300.0",Edeba


Consider the 'id' column. This number is a unique identifier for every row in our table.
We can access a row through this id:

In [90]:
datatable.loc[48210386]

authors                                        Becker, J., T. Elliott
description         A major urban sanctuary at Vulci with a long p...
featureTypes                                                 temple-2
maxDate                                                           300
minDate                                                          -750
path                                                 /places/48210386
reprLat                                                       42.4194
reprLong                                                      11.6285
timePeriodsKeys        archaic,classical,hellenistic-republican,roman
timePeriodsRange                                         -750.0,300.0
title                                          Tempio Grande at Vulci
Name: 48210386, dtype: object

In [91]:
datatable.loc[48210386, 'reprLat']

42.4193742
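
The loading and lookup steps above can be tried on a tiny in-memory table. This is only a sketch: the two rows copy values from the table shown earlier, while the `extra` column and the name `demo` are made up for illustration.

```python
import io
import pandas as pd

# Two rows in the same layout as the real CSV; "extra" stands in for
# any column we do not want to load (it is not part of the Pleiades data).
csv_text = io.StringIO(
    "id,title,reprLat,reprLong,extra\n"
    "48210386,Tempio Grande at Vulci,42.419374,11.628546,x\n"
    "265882,Ebura,37.463838,-3.924959,y\n"
)

# usecols limits the columns that are read, index_col turns "id" into the row index
demo = pd.read_csv(csv_text,
                   usecols=["id", "title", "reprLat", "reprLong"],
                   index_col="id")

row = demo.loc[48210386]              # the whole row as a Series
lat = demo.loc[48210386, "reprLat"]   # a single cell: row label, column label
print(row["title"], lat)
```

Passing the row label and the column label to `.loc` in one step avoids building the intermediate row Series when only one value is needed.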

The catch: we need to go the other way around. We scan the corpus for the names of places and then have to find the matching row.
At the same time we need to be aware that different words may indicate the same place (Athens, Athenae, ...).
The idea is to create a large Python dictionary that maps words representing places to the corresponding id in our datatable.
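
How the real lookup file was generated is not shown in this notebook. As a rough sketch of the idea, assuming we had a list of (id, name variants) pairs, such a dictionary could be built as follows; `name_variants` and `build_word_lookup` are hypothetical names.

```python
# Hypothetical input: (place id, list of name variants).
name_variants = [
    (265876, ["Consabura", "Consabrum"]),
    (265882, ["Ebura"]),
]

def build_word_lookup(variants):
    """Map every lower-cased name variant to the id of its place."""
    lookup = {}
    for place_id, names in variants:
        for name in names:
            # setdefault keeps the first id seen if two places share a name
            lookup.setdefault(name.lower(), place_id)
    return lookup

demo_lookup = build_word_lookup(name_variants)
print(demo_lookup)
```

Lower-casing the keys matters because the corpus is lower-cased before it is scanned.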

In [92]:
word_lookup = load_dict_from_file("./data/word_lookup.txt")

In [93]:
word_lookup_extension = load_dict_from_file("./data/word_lookup_extension.txt")
word_lookup = {**word_lookup, **word_lookup_extension}
filteredWords = pd.read_csv('./data/FilteredWords.txt', header=None, delimiter="\t")
# remove the filtered words (first row of the file) from the lookup
for word in filteredWords.values[0].tolist():
    word_lookup.pop(word, None)
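
The merge-and-filter step can be illustrated on toy data. The ids below are made up, and `stop_words` stands in for the contents of the filter file, whose exact purpose is an assumption here (presumably ordinary words that happen to collide with a place name).

```python
base = {"athens": 1, "lance": 2}          # made-up ids, for illustration only
extension = {"athenae": 1, "abide": 3}
stop_words = ["lance", "abide", "missing"]

merged = {**base, **extension}            # on duplicate keys the later dict wins
for word in stop_words:
    merged.pop(word, None)                # no error if the word is absent

print(merged)
```

Using `pop` with a default removes the need for a separate membership test before deleting.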

In [94]:
for word, the_id in word_lookup.items():
    if the_id == 491528:
        print(word, the_id)

arrolos 491528
arolos 491528
arauros 491528
arason 491528


Now let's load and scan our Herodotus corpus!

In [95]:
with open('./data/herodotus_history.txt', 'r', newline='\n') as f:
    herodotus_tmp = f.read()
herodotus_tmp = herodotus_tmp.replace('\n', ' ')
# raw string so that \W (any non-word character) reaches the regex engine unchanged
herodotus_tmp = re.sub(pattern=r'\W', string=herodotus_tmp, repl=" ")
herodotus_tmp = re.sub(pattern=' +', string=herodotus_tmp, repl=" ").lower()
herodotus_tmp = herodotus_tmp.split()
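
The cleaning steps above amount to splitting the text into lower-case runs of word characters. A more compact sketch that should yield the same tokens is shown below; `tokenize` and `sample` are names introduced only for this example.

```python
import re

def tokenize(text):
    # \w+ matches runs of letters, digits and underscores;
    # everything else (punctuation, whitespace, newlines) acts as a separator
    return re.findall(r"\w+", text.lower())

sample = "Croesus, king of Lydia,\nmarched on Pteria!"
print(tokenize(sample))
```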

In [96]:
words_in_Herodotus = np.asarray(herodotus_tmp)

In [97]:
words_in_Herodotus.size

188799

In [98]:
def analyse_Text(topic_lists=[[]], topic_name_list=["Count"],scope=15):
    if len(topic_lists)!=len(topic_name_list):
        raise Exception('Number of topic lists needs to be equal to number of topic names!')

    text_analysis = {}
    
    
    for topic_index, topic_words in enumerate(topic_lists, start=0):
        topic_name = topic_name_list[topic_index]
    
        for index, place_name in enumerate(words_in_Herodotus, start=0):
            if place_name in word_lookup.keys():
                if len(topic_words) == 0:
                #Simple Count Analysis:
                    if place_name not in text_analysis.keys():
                        #initialize place entry for all topics:
                        text_analysis[place_name]={}
                        for t in topic_name_list:
                            text_analysis[place_name][t]= [0,[]]
                    text_analysis[place_name][topic_name][0]+=1
                    text_analysis[place_name][topic_name][1].append(index//1000)
                        
                else:
                #Topic specific Analysis:
                    for i in range(max(index-scope,0),
                                   min(index+scope+1, len(words_in_Herodotus))):
                        if words_in_Herodotus[i] in topic_words: 
                            if place_name not in text_analysis.keys():
                                #initialize place entry for all topics:
                                text_analysis[place_name]={}
                                for t in topic_name_list:
                                    text_analysis[place_name][t]= [0,[]]
                            text_analysis[place_name][topic_name][0]+=1
                            text_analysis[place_name][topic_name][1].append(index//1000)
    return text_analysis
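
The topic-specific branch counts how many topic words fall inside a window of `scope` words on either side of a place name. The stripped-down sketch below shows only that windowing logic on a toy sentence; `count_near` is a hypothetical helper, not a replacement for `analyse_Text`, and it leaves out the per-topic bookkeeping and the occurrence positions.

```python
def count_near(words, places, topic_words, scope=2):
    """Count topic words within `scope` words of each place name."""
    topic_words = set(topic_words)          # fast membership tests
    counts = {}
    for index, word in enumerate(words):
        if word not in places:
            continue
        lo = max(index - scope, 0)                  # clip the window at the start
        hi = min(index + scope + 1, len(words))     # and at the end of the text
        hits = sum(1 for w in words[lo:hi] if w in topic_words)
        if hits:
            counts[word] = counts.get(word, 0) + hits
    return counts

words = "the battle at marathon ended and athens rejoiced after the war".split()
places = {"marathon", "athens"}
print(count_near(words, places, ["battle", "war"], scope=2))
print(count_near(words, places, ["battle", "war"], scope=4))
```

With `scope=2` only "marathon" has a topic word nearby; widening the window to 4 also brings "war" into range of "athens". The choice of `scope` therefore directly controls how loosely a place is associated with a topic.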

In [99]:
place_analysis = analyse_Text([["battle", "war", "weapon"],["battle", "war", "weapon"]], ["Count", "War"])
place_analysis

{'colchis': {'Count': [1, [0]], 'War': [1, [0]]},
 'phasis': {'Count': [1, [0]], 'War': [1, [0]]},
 'priene': {'Count': [1, [2]], 'War': [1, [2]]},
 'miletus': {'Count': [3, [2, 156, 160]], 'War': [3, [2, 156, 160]]},
 'cimmerians': {'Count': [2, [2, 2]], 'War': [2, [2, 2]]},
 'alyattes': {'Count': [4, [2, 2, 3, 13]], 'War': [4, [2, 2, 3, 13]]},
 'lydia': {'Count': [2, [2, 4]], 'War': [2, [2, 4]]},
 'erythrae': {'Count': [2, [2, 2]], 'War': [2, [2, 2]]},
 'eleusis': {'Count': [1, [4]], 'War': [1, [4]]},
 'sardis': {'Count': [6, [7, 12, 14, 150, 150, 159]],
  'War': [6, [7, 12, 14, 150, 150, 159]]},
 'athens': {'Count': [5, [11, 151, 181, 184, 185]],
  'War': [5, [11, 151, 181, 184, 185]]},
 'tegea': {'Count': [1, [11]], 'War': [1, [11]]},
 'delphi': {'Count': [2, [11, 164]], 'War': [2, [11, 164]]},
 'pteria': {'Count': [1, [14]], 'War': [1, [14]]},
 'phocaea': {'Count': [1, [15]], 'War': [1, [15]]},
 'thyrea': {'Count': [1, [15]], 'War': [1, [15]]},
 'nineveh': {'Count': [1, [20]], 'Wa

In [100]:
place_analysis.keys()

dict_keys(['colchis', 'phasis', 'priene', 'miletus', 'cimmerians', 'alyattes', 'lydia', 'erythrae', 'eleusis', 'sardis', 'athens', 'tegea', 'delphi', 'pteria', 'phocaea', 'thyrea', 'nineveh', 'babylon', 'ionia', 'assyria', 'syria', 'sidon', 'thebes', 'corinth', 'lycia', 'samos', 'lance', 'nysa', 'scythia', 'euxine', 'thermodon', 'sauromatae', 'maeotis', 'libya', 'psylli', 'strymon', 'crotona', 'sybaris', 'sicily', 'sicyon', 'argos', 'attica', 'ephesus', 'salamis', 'maeander', 'aeolis', 'cyme', 'chios', 'lade', 'atarneus', 'lampsacus', 'tiryns', 'hercules', 'marathon', 'sparta', 'paros', 'lemnos', 'macedonia', 'abide'])

In [54]:
len(words_in_Herodotus)

188799

In [101]:
def analysis2Json(single_place_analysis):
    result = {
                "count": single_place_analysis[0],
                "occurrences": single_place_analysis[1]
               }
    return result

In [102]:
def feature2Json( title,  description,  Lat,  Lng,  single_place_analysis, featureTypes, timePeriodsKeys, timePeriodsRange, minDate, maxDate):
    result = {
         "type": "Feature",
         "properties": {
             "title": title,
             "description": description,
             "featureTypes": featureTypes,
             "timePeriodsKeys": timePeriodsKeys,
             "timePeriodsRange": timePeriodsRange,
             "minDate": minDate,
             "maxDate": maxDate,
             "Analysis": single_place_analysis,
         },
        "geometry": {
             "type": "Point",
             "coordinates": [
                 Lng,
                 Lat
             ]  
         }
        }
    return result
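
Note that GeoJSON stores point coordinates as longitude first, then latitude, which is why the function above swaps the order of its arguments. A minimal round trip illustrates this; the coordinates are those of Ebura from the table above, and the `Analysis` value is made up.

```python
import json

feature = {
    "type": "Feature",
    "properties": {"title": "Ebura", "Analysis": {"War": [0, []]}},  # made-up analysis
    "geometry": {"type": "Point",
                 "coordinates": [-3.924959, 37.463838]},  # [longitude, latitude]
}

encoded = json.dumps(feature)
lng, lat = json.loads(encoded)["geometry"]["coordinates"]
print(lng, lat)
```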

In [83]:
def write_Feature_Collection(filepath, place_analysis, collection_name):
    features = []
    for word, analysis in place_analysis.items():
        row = datatable.loc[word_lookup[word]]  # look up the place's row once
        features.append(
            feature2Json(
                row['title'],
                row['description'],
                row['reprLat'],
                row['reprLong'],
                analysis,
                row['featureTypes'],
                row['timePeriodsKeys'],
                row['timePeriodsRange'],
                row['minDate'],
                row['maxDate']
            )
        )
    FeatureCollection = {
        "type": "FeatureCollection",
        "features": features
     }

    # write a JS variable assignment followed by the JSON object
    with open(filepath, 'w') as f:
        f.write("var " + collection_name + " = \n")
        json.dump(FeatureCollection, f, indent=1)
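
The output file is a JavaScript variable assignment rather than plain JSON, so it can be included in a web page with a script tag. The sketch below writes such a file to a temporary directory and reads it back to check the result; `write_js_variable` and `read_js_variable` are hypothetical helpers used only for this illustration.

```python
import json
import os
import tempfile

def write_js_variable(filepath, name, obj):
    # one file handle: the "var ... =" prefix first, then the JSON body
    with open(filepath, "w") as f:
        f.write("var " + name + " = \n")
        json.dump(obj, f, indent=1)

def read_js_variable(filepath):
    with open(filepath) as f:
        text = f.read()
    # drop everything up to and including the first "=", parse the rest as JSON
    return json.loads(text.split("=", 1)[1])

path = os.path.join(tempfile.mkdtemp(), "demo.js")
collection = {"type": "FeatureCollection", "features": []}
write_js_variable(path, "Demo", collection)
print(read_js_variable(path) == collection)
```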

In [103]:
#place_analysis = analyse_Text([[]])
#write_Feature_Collection( filepath = './count.js', 
#                         place_analysis = place_analysis, 
#                         collection_name = "count")

place_analysis = analyse_Text([["war", "weapon", "battle"]], ["War"])
write_Feature_Collection( filepath = './war.js', 
                         place_analysis = place_analysis, 
                         collection_name = "War_Topic")

place_analysis = analyse_Text([["croesus"]], ["Count"] )
write_Feature_Collection( filepath = './croesus.js', 
                         place_analysis = place_analysis, 
                         collection_name = "Croesus")

place_analysis = analyse_Text([["temple", "god", "goddess", "sacrifice"]], ["Religion"] )
write_Feature_Collection( filepath = './religion.js', 
                         place_analysis = place_analysis, 
                         collection_name = "Religion_Topic")

place_analysis = analyse_Text([["war", "weapon", "battle"],
                               ["temple", "god", "goddess", "sacrifice"]
                              ], 
                              ["War", "Religion"] )
write_Feature_Collection( filepath = './war_religion.js', 
                         place_analysis = place_analysis, 
                         collection_name = "War_Religion_Topic")

In [217]:
# print the analysis of the first 30 places
for i, (word, analysis) in enumerate(place_analysis.items()):
    if i >= 30:
        break
    print(word, analysis)

In [207]:
len([[]])

1

In [209]:
[[]][0]

[]

In [35]:
place_analysis

{}

In [17]:
word_lookup

{'consabura': 265876,
 'consabrum': 265876,
 'kondabora': 265876,
 'consaburrenses': 265876,
 'contestania': 265877,
 'contestani': 265877,
 'hemeroskopeion': 265880,
 'dénia': 265880,
 'ebura': 265882,
 'cerialis': 265882,
 'edeba': 265886,
 'edeta': 265887,
 'leiria': 265887,
 'eliocroca': 265891,
 'epora': 265893,
 'epora foederatorum': 265893,
 'etouissa': 265894,
 'villa del faro': 265896,
 'ferrarium': 265897,
 'tenebrium': 265897,
 'ficariensis locus': 265898,
 'fortuna': 265899,
 'guium': 265906,
 'cinium': 265906,
 'iamo': 265916,
 'iamna': 265916,
 'iaspis': 265917,
 'igabrum': 265919,
 'egabrum': 265919,
 'municipium iulium': 265919,
 'cabra': 265919,
 'ildum': 265920,
 'iliberri': 265921,
 'elvira': 265921,
 'municipium florentinum': 265921,
 'granada': 265921,
 'ilsh': 265922,
 'ilici': 265922,
 'ecclesia elotana': 265922,
 'elche': 265922,
 'ilicitanus sinus': 265923,
 'iliturgi': 265924,
 'iliturgicola': 265925,
 'illikitanos limen': 265926,
 'ilorci': 265927,
 'ilourgei

In [31]:
x = []
x.append([3, [3, 6]])  # append takes a single argument, so pass the pair as one list

In [30]:
x

[[3, [3, 6]]]