## Place-based analysis of literary textes

### NER and corpus linguistics

#### Useful methods from the Spacy (https://spacy.io/) NLP library in Python.

### Prerequisites

You can download and install on your computer the latest Python version from here: https://www.python.org/downloads/. For Windows OS, the utility program Pip will be installed, for other OS, it needs to be installed separatedly.

Then, install the Spacy library via the Pip utility by running the command line on your terminal: 

**pip install spacy**

If you have multiple versions of Python installed, make sure you use the pip version that corresponds to your last installed version of Python.

After installation, you can check the version of Spacy (and Python) via the command on your terminal: 

**python -m spacy info**

We will download and install the NER model named en_core_web_lg provided by Spacy, its documentation (as well as the other templates) can be found here: https://spacy.io/models/en. On your terminal, run in command line:

**python -m spacy download en_core_web_lg**

**Import Spacy model and load an available model for English. 
Increase memory usage limits to be able to process long texts.**

In [1]:
import spacy
nlp = spacy.load("en_core_web_lg")
nlp.max_length = 2500000

**Read and load existing text files.**

In [2]:
def read_text(file_name):
    with open (file_name, "r", encoding="utf_8") as myfile:
        lines = list(line for line in (l.strip() for l in myfile) if line)
    return str(lines)  

data_je = read_text("JaneEyre.txt")
data_bl = read_text("BleakHouse.txt")

**Apply language model on texts.**

In [3]:
doc_je = nlp(data_je)
doc_bl = nlp(data_bl)

**Highlight NE mentions in texts.**

In [4]:
from spacy import displacy
options = {"ents": ["LOC", "FAC", "GPE"]}
displacy.render(doc_je, style = "ent",jupyter = True, options=options)

*FAC Buildings, airports, highways, bridges, etc.*

*GPE Countries, cities, states.*

*LOC Non-GPE locations, mountain ranges, bodies of water.*

**Display detected place names acoording to the categories defined in Spacy NER pretrained model (LOC, FAC, GPE).**

In [5]:
def read_save_ents(doc):
    ne_mentions = []
    if doc.ents: 
        for ent in doc.ents: 
            if (ent.label_ in ["LOC","FAC","GPE"] ):
                ne_mentions.append(ent.text)
                print(ent.text+' - ' +str(ent.start_char) +' - '+ str(ent.end_char) +' - '+ent.label_+ ' - '+str(spacy.explain(ent.label_)))             
    return ne_mentions

In [6]:
print("--- Jane Eyre ---")
ne_mentions_je = read_save_ents(doc_je)

--- Jane Eyre ---
London - 227 - 233 - GPE - Countries, cities, states
Israel - 3411 - 3417 - GPE - Countries, cities, states
Norway - 8346 - 8352 - GPE - Countries, cities, states
North Cape - 8433 - 8443 - LOC - Non-GPE locations, mountain ranges, bodies of water
Atlantic - 8564 - 8572 - LOC - Non-GPE locations, mountain ranges, bodies of water
Hebrides - 8608 - 8616 - LOC - Non-GPE locations, mountain ranges, bodies of water
Lapland - 8687 - 8694 - GPE - Countries, cities, states
Siberia - 8696 - 8703 - LOC - Non-GPE locations, mountain ranges, bodies of water
Iceland - 8731 - 8738 - GPE - Countries, cities, states
Greenland - 8740 - 8749 - GPE - Countries, cities, states
the Arctic Zone - 8775 - 8790 - LOC - Non-GPE locations, mountain ranges, bodies of water
Alpine - 8947 - 8953 - LOC - Non-GPE locations, mountain ranges, bodies of water
Moreland - 10666 - 10674 - GPE - Countries, cities, states
Gateshead Hall - 20041 - 20055 - FAC - Buildings, airports, highways, bridges, etc.
Ma

In [7]:
print("--- Bleak House ---")
ne_mentions_bl = read_save_ents(doc_bl)

--- Bleak House ---
London - 30 - 36 - GPE - Countries, cities, states
Inn Hall - 112 - 120 - FAC - Buildings, airports, highways, bridges, etc.
Essex - 1284 - 1289 - GPE - Countries, cities, states
Kentish - 1310 - 1317 - GPE - Countries, cities, states
Greenwich - 1542 - 1551 - GPE - Countries, cities, states
Inn Hall - 2501 - 2509 - FAC - Buildings, airports, highways, bridges, etc.
Shropshire - 7084 - 7094 - GPE - Countries, cities, states
day\ - 7169 - 7173 - GPE - Countries, cities, states
Chancery Lane - 8854 - 8867 - FAC - Buildings, airports, highways, bridges, etc.
Shropshire - 12733 - 12743 - GPE - Countries, cities, states
Shropshire - 12841 - 12851 - GPE - Countries, cities, states
Shropshire - 14509 - 14519 - GPE - Countries, cities, states
Paris - 16324 - 16329 - GPE - Countries, cities, states
Lincolnshire - 16668 - 16680 - GPE - Countries, cities, states
Lincolnshire - 16704 - 16716 - GPE - Countries, cities, states
Lincolnshire - 18418 - 18430 - GPE - Countries, citie

**Counting distinct NE mentions.**

In [8]:
def count_ents(ne_mentions):
    sorted_count_occ = {}
    count_occ = {i:ne_mentions.count(i) for i in ne_mentions}
    #print(count_occ)
    sorted_count_occ = {k: v for k, v in sorted(count_occ.items(), key=lambda item: item[1])}
    #print(sorted_count_occ)
    for k in sorted_count_occ:
        print(k, sorted_count_occ[k])
    return sorted_count_occ

In [9]:
print("--- Jane Eyre ---")
sorted_count_occ_je = count_ents(ne_mentions_je)

--- Jane Eyre ---
Israel 1
North Cape 1
Hebrides 1
Lapland 1
Siberia 1
Iceland 1
Greenland 1
the Arctic Zone 1
Alpine 1
Moreland 1
earth 1
near,—I 1
Augusta 1
Brocklehurst Hall 1
Lisle 1
Scotland 1
Deepden 1
Cumberland 1
St. Matthew 1
Bethesda 1
Pisa 1
the Prince of Wales 1
the Holy Virgin 1
Turkey 1
Newfoundland 1
Heidelberg 1
Rhine 1
the Evening Star 1
Medes 1
Sphynx 1
the Apollo Belvidere 1
Havannah 1
Mdlle 1
the Bois de Boulogne 1
Italy 1
Beulah 1
Ladyship 1
Kingston 1
Spanish Town 1
Eshton 1
Andes 1
Fairfax Rochester 1
Mason 1
Carthage 1
Bewick 1
South 1
Venice 1
Vienna 1
Hercules 1
qu’elle 1
Stamboul 1
Grimsby Retreat 1
Mediterranean 1
West Indian 1
the Grimbsy Retreat 1
St. Petersburg 1
Eden 1
Deutsch 1
door:—oh 1
East 1
the Vale of Morton 1
Rosamond 1
Madagascar 1
Cape 1
Himalayan 1
Guinea Coast 1
Whitcross 1
East Indiaman 1
Macedonia 1
the Rock of Ages 1
Calcutta 1
Marsh Glen 1
St. John’s 1
ST 1
North-Midland 1
Antipodes 1
the Manor House 1
Norway 2
Atlantic 2
Marseilles 2
Nor

In [15]:
print("--- Bleak House ---")
sorted_count_occ_bl = count_ents(ne_mentions_bl)

--- Bleak House ---
Essex 1
Kentish 1
Greenwich 1
Lilliput 1
Windsor 1
France 1
Piccadilly 1
yesterday\ 1
Barnet 1
Know\'d 1
\'On 1
Botheration Buildings 1
the Ghost's Walk 1
Peaks 1
the Great Seal\'s 1
Cobweb 1
the Himalaya Mountains 1
Penton Place 1
Jellyby\ 1
St. Andrews 1
the Rolls Yard of a Sunday 1
Inn Garden 1
school!—in 1
Phairy 1
Hampstead 1
Turk 1
the Rue de Rivoli 1
the Palace Garden 1
the Elysian Fields 1
Avignon 1
Marseilles 1
the Dedlock Arms 1
Hoodle 1
Koodle 1
Colonies 1
Cuffy 1
Rome 1
Greece 1
M.R.C.S. 1
Plymouth Harbour 1
Providence 1
the Great Wall of China 1
Pavilion 1
the Opera Colonnade 1
the Great Bailiff 1
Zoodle 1
Pacific 1
Queen Square 1
the Torrid Zone 1
Switzerland 1
Venice 1
Nile 1
Margate 1
Ramsgate 1
Pastoral Gardens 1
the Rolls Yard 1
Imeantersay 1
Blackfriars Bridge 1
St. Paul\'s 1
Old Square 1
the Tower of Babel 1
Pollys 1
Waterloo Bridge 1
Haymarket 1
South 1
murder\ 1
Soho Square 1
Ireland 1
\'If 1
Liverpool 1
\'Lo 1
Clerkenwell 1
Smiffeld 1
Cornwall

**Correct the following NE using rules on Jane Eyre's model: Spitzbergen and Nova Zembla (not recognised) and Fairfax and Adèle (wrongly categorised),  examples taken from Jane Eyre.**

In [11]:
#print(nlp.pipe_names)
print(nlp.config.to_str())
if ("entity_ruler" in nlp.pipe_names):
    nlp.remove_pipe("entity_ruler")
config = {"overwrite_ents": True }
ruler = nlp.add_pipe("entity_ruler", config=config)
#print(nlp.pipe_names)
placenames = ["Spitzbergen", "Nova Zembla"]
for p in placenames:
    ruler.add_patterns([{"label": "GPE", "pattern": p}])

not_placenames = ["Fairfax", "Adèle", "Jane"]
for p in not_placenames:
    ruler.add_patterns([{"label": "PER", "pattern": p}])  
     
doc_je = nlp(data_je)
ne_mentions = read_save_ents(doc_je)

#print(nlp.config.to_str())

[paths]
train = "corpus/en-core-web/train.spacy"
dev = "corpus/en-core-web/dev.spacy"
vectors = "corpus/en_vectors"
raw = null
init_tok2vec = null
vocab_data = null

[system]
gpu_allocator = null
seed = 0

[nlp]
lang = "en"
pipeline = ["tok2vec","tagger","parser","senter","ner","attribute_ruler","lemmatizer"]
disabled = ["senter"]
before_creation = null
after_creation = null
after_pipeline_creation = null
batch_size = 256
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}

[components]

[components.attribute_ruler]
factory = "attribute_ruler"
validate = false

[components.lemmatizer]
factory = "lemmatizer"
mode = "rule"
model = null
overwrite = false

[components.ner]
factory = "ner"
moves = null
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2Vec.v1"

[component

**Check that errors were corrected.** 

In [16]:
print("--- Jane Eyre ---")
ne_mentions_je = read_save_ents(doc_je)

--- Jane Eyre ---
London - 227 - 233 - GPE - Countries, cities, states
Israel - 3411 - 3417 - GPE - Countries, cities, states
Norway - 8346 - 8352 - GPE - Countries, cities, states
North Cape - 8433 - 8443 - LOC - Non-GPE locations, mountain ranges, bodies of water
Atlantic - 8564 - 8572 - LOC - Non-GPE locations, mountain ranges, bodies of water
Hebrides - 8608 - 8616 - LOC - Non-GPE locations, mountain ranges, bodies of water
Lapland - 8687 - 8694 - GPE - Countries, cities, states
Siberia - 8696 - 8703 - LOC - Non-GPE locations, mountain ranges, bodies of water
Spitzbergen - 8705 - 8716 - GPE - Countries, cities, states
Nova Zembla - 8718 - 8729 - GPE - Countries, cities, states
Iceland - 8731 - 8738 - GPE - Countries, cities, states
Greenland - 8740 - 8749 - GPE - Countries, cities, states
the Arctic Zone - 8775 - 8790 - LOC - Non-GPE locations, mountain ranges, bodies of water
Alpine - 8947 - 8953 - LOC - Non-GPE locations, mountain ranges, bodies of water
Moreland - 10666 - 10674 

**Find common place names between authors**

In [17]:
je_ent_as_set = set(sorted_count_occ_je.keys())
intersection = je_ent_as_set.intersection(sorted_count_occ_bl.keys())
intersection_as_list = list(intersection)
common_places = intersection_as_list
print(common_places)

['Ireland', 'Scotland', 'Paris', 'Providence', 'England', 'Turkey', 'Venice', 'Mediterranean', 'London', 'Rome', 'Europe', 'Marseilles', 'India', 'South', 'the West Indies', 'France']


**Listing adjectives automatically detected in texts.**

In [18]:
def show_adj(doc):
    adjectives = []
    for token in doc:
        if token.pos_ == 'ADJ':
            adjectives.append(token)
    print(adjectives)
    return adjectives

In [19]:
print("--- Jane Eyre ---")
adjectives_je = show_adj(doc_je)

--- Jane Eyre ---
[first, unnecessary, second, few, miscellaneous, due, indulgent, plain, few, fair, honest, obscure, practical, frank, unknown, unrecommended, vague, vague, definite, certain, generous, large, hearted, high, minded, select, small, timorous, few, such, unusual, wrong, such, certain, obvious, certain, simple, first, last, impious, distinct, mistaken, narrow, human, few, good, bad, convenient, external, worth, white, vouch, clean, indebted, good, bloody, faithful, own, delicate, great, deep, like, vital, dauntless, daring, high, Greek, fatal, unique, first, social, very, warped, comic, bright, attractive, same, serious, mere, lambent, electric, total, second, third, other, due, future, leafless, cold, sombre, further, outdoor, glad, long, chilly, dreadful, raw, physical, happy, own, good, sociable, childlike, attractive, sprightly, lighter, franker, natural, contented, happy, little, silent, cross, -, legged, red, close, double, scarlet, right, clear, drear, pale, wet, ce

In [20]:
print("--- Bleak House ---")
adjectives_bl = show_adj(doc_bl)

# There are some missed matches, for instance, in "thriving place", thriving is categorised as verb.

--- Bleak House ---
[Implacable, much, wonderful, long, elephantine, soft, black, big, undistinguishable, better, very, general, ill, other, new, green, waterside, great, dirty, great, small, ancient, wrathful, close, little, nether, misty, spongey, Most, haggard, unwilling, raw, rawest, dense, densest, muddy, muddiest, leaden, old, appropriate, leaden, old, very, thick, deep, pestilent, hoary, foggy, crimson, large, great, little, interminable, endless, slippery, serious, various, not?—ranged, matted, red, costly, dim, heavy, uninitiated, owlish, padded, attendant, blighted, lunatic, dead, slipshod, threadbare, monied, honourable, murky, petty, privy, legal, dry, short, blank, better, curtained, little, mad, old, incomprehensible, certain, small, dry, sallow, half, dozenth, personal, solitary, likely, ignorant, desolate, good, ready, sonorous, few, dismal, little, complicated, alive, total, Innumerable, innumerable, young, innumerable, old, whole, legendary, little, new, real, other, 

**Calculating concordances of NE in text, we also keep syntax information for the following step.**

In [23]:
#see: https://spacy.io/api/token#attributes
#IOB code of named entity tag. 3 means the token begins an entity, 2 means it is outside an entity, 
#1 means it is inside an entity, and 0 means no entity tag is set.
#print(len(doc))
# Adapt to more than 3 mutitoken NE.
    
def calculate_concordances(doc, window):
    count = 0
    ent_context = []
    id_ent = 0
    right_context, left_context = [], []
    for token in doc:

        if (token.ent_type_ in ["LOC","FAC","GPE"]):

            if (token.ent_iob == 3 
                and doc[count+1].ent_iob == 2): #single token NE

                ne = token.text
                left_context = doc[count-window:count]
                right_context = doc[count+1:count+window+1]

                ent_context.append({"id":id_ent, "ent": ne, "left_context":left_context, "right_context":right_context})
                id_ent = id_ent + 1

            elif (token.ent_iob == 3  and doc[count+1].ent_iob == 1): #multi-token NE, begin
                ent_started = True
                ne = token.text
                left_context = doc[count-window:count]
                right_context = ""

            elif (ent_started == True and token.ent_iob == 1):  #multi-token NE, continuation
                ent_started = False
                ne = ne + " " + token.text
                right_context = doc[count+1:count+window+1]

                ent_context.append({"id":id_ent, "ent": ne, "left_context":left_context, "right_context":right_context})
                id_ent = id_ent + 1

        count = count + 1   
    return ent_context    

ent_context_je = calculate_concordances(doc_je, 20)
ent_context_bl = calculate_concordances(doc_bl, 20)

**What each author is saying about places?**

**Adjectives are good indicators for obtaining somehow the emotional sense of the text associated to the context of a place. We display concordances of NE with adjectives highlighted.**

In [24]:
from termcolor import colored
def print_concordances(ent_context):
    for nec in ent_context:
        exc_msg = colored("Excerpt #"+str(nec["id"]), color='blue')
        print(exc_msg)
        print(' '.join(["_"+x.text+"_" if x.pos_=='ADJ' else x.text for x in nec["left_context"]]) 
          + " **" + nec["ent"] + "** " +
          ' '.join(["_"+x.text+"_" if x.pos_=='ADJ' else x.text for x in nec["right_context"]])) 
        print("")    

In [25]:
print("--- Jane Eyre ---")
print_concordances(ent_context_je)

--- Jane Eyre ---
[34mExcerpt #0[0m
' The Illustrations ' , ' in this Volume are the copyright of ' , ' SERVICE & PATON , **London** ' , ' TO ' , ' W. M. THACKERAY , ESQ . , ' , ' This Work '

[34mExcerpt #1[0m
the _great_ ones of society , much as the son of Imlah came before the throned Kings of Judah and **Israel** ; and who speaks truth as _deep_ , with a power as prophet - _like_ and as _vital_ — a

[34mExcerpt #2[0m
- fowl ; of “ the _solitary_ rocks and promontories ” by them only inhabited ; of the coast of **Norway** , studded with isles from its _southern_ extremity , the Lindeness , or Naze , to the North Cape —

[34mExcerpt #3[0m
coast of Norway , studded with isles from its _southern_ extremity , the Lindeness , or Naze , to the **North Cape** — ' , ' “ Where the Northern Ocean , in _vast_ whirls , ' , ' Boils round the

[34mExcerpt #4[0m
, ' , ' Boils round the _naked_ , _melancholy_ isles ' , ' Of _farthest_ Thule ; and the **Atlantic** surge ' , ' Pours in amo

In [26]:
print("--- Bleak House ---")
print_concordances(ent_context_bl)

--- Bleak House ---
[34mExcerpt #0[0m
 **London** . Michaelmas term lately over , and the Lord Chancellor sitting in Lincoln 's Inn Hall . _Implacable_ November weather

[34mExcerpt #1[0m
In Chancery ' , " London . Michaelmas term lately over , and the Lord Chancellor sitting in Lincoln 's **Inn Hall** . _Implacable_ November weather . As _much_ mud in the streets as if the waters had but newly retired from

[34mExcerpt #2[0m
the tiers of shipping and the _waterside_ pollutions of a _great_ ( and _dirty_ ) city . Fog on the **Essex** marshes , fog on the Kentish heights . Fog creeping into the cabooses of collier - brigs ; fog lying

[34mExcerpt #3[0m
_waterside_ pollutions of a _great_ ( and _dirty_ ) city . Fog on the Essex marshes , fog on the **Kentish** heights . Fog creeping into the cabooses of collier - brigs ; fog lying out on the yards and hovering

[34mExcerpt #4[0m
; fog drooping on the gunwales of barges and _small_ boats . Fog in the eyes and throats of _ancient

perfection with the trooper 's help . He is borne into Mr. Tulkinghorn 's _great_ room and deposited on the **Turkey** rug before the fire . Mr. Tulkinghorn is not within at the _present_ moment but will be back directly .

[34mExcerpt #393[0m
on _military_ compulsion , I am not a man of business . Among civilians I am what they call in **Scotland** a ne\'er - do - weel . I have no head for papers , sir . I can stand any

[34mExcerpt #394[0m
Sword Alley , which would seem to be something in his way ) , and by Blackfriars Bridge , and **Blackfriars Road** , Mr. George sedately marches to a street of _little_ shops lying somewhere in that ganglion of roads from Kent

[34mExcerpt #395[0m
Road , Mr. George sedately marches to a street of _little_ shops lying somewhere in that ganglion of roads from **Kent** and Surrey , and of streets from the bridges of London , centring in the far - famed elephant who

[34mExcerpt #396[0m
Mr. George sedately marches to a street of _little_ shops 

[34mExcerpt #732[0m
their _little_ _feeble_ _prismatic_ twinkling , all seem Volumnias . ' , ' For the rest , Lincolnshire life to **Volumnia** is a _vast_ blank of _overgrown_ house looking out upon trees , sighing , wringing their hands , bowing their

[34mExcerpt #733[0m
it — passion and pride , even to the stranger 's eye , have died away from the place in **Lincolnshire** and yielded it to _dull_ repose . " , ' CHAPTER LXVII ' , " The Close of Esther 's

[34mExcerpt #734[0m
Caddy keeps her _own_ _little_ carriage now instead of hiring one , and lives _full_ two miles further westward than **Newman Street** . She works very hard , her husband ( an _excellent_ one ) being _lame_ and _able_ to do very

[34mExcerpt #735[0m
_ignoble_ marriage and pursuits , but I hope she got over it in time . She has been disappointed in **Borrioboola -** Gha , which turned out a failure in consequence of the king of Borrioboola wanting to sell everybody — who

[34mExcerpt #736[0m
to have do

*A dependency parsing at the sentence level may help to distinguish accurately a polarity associated to a place name.*

**Functions to display place mentions and adjectives in the context of the mention.**

In [27]:
from IPython.display import HTML, display
import tabulate

def create_display_table_ent_context(ent_context):
    table = []
    for nec in ent_context:
        table.append([[x.text for x in nec["left_context"] if x.pos_=='ADJ'], 
                      nec["ent"], 
                      [x.text for x in nec["right_context"] if x.pos_=='ADJ']])
       
    display(HTML(tabulate.tabulate(table, tablefmt='html')))
    return table

def create_display_table_ent_context_from_mention(ent_context, mention):
    table = []
    for nec in ent_context:
        if (nec["ent"] == mention):
            table.append([[x.text for x in nec["left_context"] if x.pos_=='ADJ'], 
                      nec["ent"], 
                      [x.text for x in nec["right_context"] if x.pos_=='ADJ']])
       
    display(HTML(tabulate.tabulate(table, tablefmt='html')))
    return table

In [28]:
print("--- Jane Eyre ---")
table_ent_cont_je = create_display_table_ent_context(ent_context_je)
print(len(table_ent_cont_je))

--- Jane Eyre ---


0,1,2
[],London,[]
['great'],Israel,"['deep', 'like', 'vital']"
['solitary'],Norway,['southern']
['southern'],North Cape,['vast']
"['naked', 'melancholy', 'farthest']",Atlantic,['stormy']
"['farthest', 'stormy']",Hebrides,['bleak']
"['stormy', 'bleak']",Lapland,['vast']
['bleak'],Siberia,['vast']
['bleak'],Spitzbergen,['vast']
['bleak'],Nova Zembla,"['vast', 'forlorn']"


351


In [29]:
print("--- Bleak House ---")
table_ent_cont_bl = create_display_table_ent_context(ent_context_bl)
print(len(table_ent_cont_bl))

--- Bleak House ---


0,1,2
[],London,['Implacable']
[],Inn Hall,"['Implacable', 'much']"
"['waterside', 'great', 'dirty']",Essex,[]
"['waterside', 'great', 'dirty']",Kentish,[]
"['small', 'ancient']",Greenwich,[]
"['leaden', 'old']",Inn Hall,['very']
[],Shropshire,[]
[],day\,['ignorant']
['old'],Chancery Lane,"['dreary', 'hopeless']"
[],Shropshire,[]


737


**Display similar information for common places.**

In [93]:
print("--- Jane Eyre ---")
for p in common_places:
    create_display_table_ent_context_from_mention(ent_context_je, p)

0,1,2
['genuine'],Paris,"['deep', 'sarcastic']"
"['warm', 'tired']",Paris,['happy']
"['destitute', 'poor']",Paris,"['clean', 'wholesome', 'English']"
[],Paris,[]
[],Paris,[]


0,1,2
"['hard', 'tough']",India,"['pervious', 'sentient']"
['final'],India,['much']
['similar'],India,[]
"['deep', 'relentless']",India,[]
[],India,['premature']
['premature'],India,[]
['premature'],India,['clear']
['ready'],India,['free']
"['short', 'sharp']",India,[]
"['whole', 'void']",India,['married']


0,1,2
"['own', 'cordial']",Rome,['Italian']
[],Rome,[]
[],Rome,[]
[],Rome,['old']


0,1,2
['whitewashed'],Mediterranean,"['happy', 'innocent']"


0,1,2
[],Europe,['distant']
[],Europe,['mad']
"['past', 'fresh']",Europe,['open']
"['right', 'sweet']",Europe,"['refreshed', 'glorious']"
[],Europe,"['sullied', 'filthy']"
[],Europe,"['peculiar', 'subdued', 'emphatic']"
"['British', 'mannered', 'most']",Europe,"['best', 'ignorant']"


0,1,2
[],Scotland,[]


0,1,2
"['large', 'stately', 'purple']",Turkey,"['vast', 'rich', 'stained', 'lofty']"


0,1,2
['sad'],England,"['savage', 'wilder', 'thicker', 'scant']"
[],England,[]
[],England,[]
[],England,[]
[],England,[]
['certain'],England,"['mule', 'large']"
['new'],England,['hot']
[],England,[]
['splendid'],England,"['pure', 'radiant', 'long']"
"['West', 'Indian', 'large', 'gay']",England,[]


0,1,2
['thankful'],Providence,"['invaluable', 'careful']"
[],Providence,[]
[],Providence,[]
[],Providence,"['unmarried', 'childless']"
['undisturbed'],Providence,['fine']
['ugly'],Providence,"['last', 'little', 'better']"
['ghastly'],Providence,[]
['cold'],Providence,['glazed']


0,1,2
[],Ireland,"['such', 'warm', 'hearted']"
[],Ireland,"['such', 'warm', 'hearted']"
['sure'],Ireland,['certain']
['certain'],Ireland,"['much', 'good']"
['long'],Ireland,"['sorry', 'little', 'such', 'weary']"
['better'],Ireland,[]
[],Ireland,[]


0,1,2
[],London,[]
"['Different', 'benevolent', 'minded']",London,[]
['nearer'],London,['remote']
['foreign'],London,[]
['surprised'],London,[]
[],London,[]
['same'],London,['dead']
[],London,[]
[],London,['second']
[],London,['certain']


0,1,2
"['white', 'snowy']",Marseilles,"['prominent', 'ample', 'easy']"
['better'],Marseilles,"['fevered', 'delusive', 'bitterest', 'next']"


0,1,2
['Italian'],South,['glorious']


0,1,2
[],Venice,[]


0,1,2
['yellow'],France,[]
[],France,"['whitewashed', 'happy']"
['southern'],France,['delirious']


In [94]:
print("--- Bleak House ---")
for p in common_places:
    create_display_table_ent_context_from_mention(ent_context_bl, p)

0,1,2
"['few', 'previous']",Paris,"['uncertain', 'fashionable']"
"['fashionable', 'few', 'previous']",Paris,['uncertain']
"['cause""—cautious', 'more']",Paris,[]
[],Paris,['fashionable']
"['much', 'particular']",Paris,"['dusky', 'brooding']"
"['full', 'hospitable']",Paris,"['fashionable', 'glad', 'benighted']"
[],Paris,[]
"['rusty', 'little', 'full', 'gusty', 'little']",Paris,[]
['own'],Paris,[]
"['imperfect', 'last']",Paris,['endless']


0,1,2
['great'],India,[]
[],India,['red']
[],India,"['long', 'long']"
"['many', 'handsome', 'English']",India,[]
[],India,['great']
['curious'],India,['such']
"['future', 'probable']",India,['doubtful']


0,1,2
"['sure', 'young', 'classic']",Rome,"['same', 'young']"


0,1,2
[],Mediterranean,['twelfth']
[],Mediterranean,[]
"['dear', 'old']",Mediterranean,[]


0,1,2
['astonishing'],Europe,"['other', 'wonderful']"
"['little', 'second']",Europe,[]
['grey'],Europe,['old']
"['identical', 'interesting']",Europe,"['latter', 'faithful']"
['oldest'],Europe,"['clean', 'blue', 'white', 'essential']"


0,1,2
['military'],Scotland,[]
['old'],Scotland,"['old', 'little']"


0,1,2
"['great', 'last', 'last', 'thick', 'dingy']",Turkey,"['old', 'fashioned']"
['great'],Turkey,['present']


0,1,2
"['same', 'dear']",England,[]
"['fashionable', 'glad', 'benighted']",England,"['brilliant', 'distinguished']"
[],England,['many']
"['hot', 'motionless', 'little']",England,['open']
[],England,['long']
[],England,['long']
[],England,"['bitter', 'universal']"
[],England,"['proud', 'whole', 'whole']"
['eloquent'],England,[]
['hopeful'],England,['likely']


0,1,2
['probable'],Providence,"['own', 'heathen']"


0,1,2
[],Ireland,"['post', '-', 'long']"


0,1,2
[],London,['Implacable']
[],London,"['superior', 'eloquent', 'majestic']"
[],London,[]
[],London,[]
['dear'],London,['particular']
['strange'],London,[]
[],London,"['particular', 'delighted']"
[],London,"['narrow', 'high', 'oblong']"
"['forewarned', 'early', 'curious']",London,['good']
"['hairy', 'tall']",London,['weary']


0,1,2
['southern'],Marseilles,"['large', 'eyed', 'brown', 'black', 'handsome', 'certain', 'feline']"


0,1,2
['certain'],South,"['quick', 'strong', 'high']"


0,1,2
"['Other', 'same', 'great']",Venice,['second']


0,1,2
['little'],France,['fat']


**Display results side by side.**

In [91]:
comparison_table = []
comparison_table.append(["place mention", "Jane Eyre", "Bleak House"])
for p in common_places:
    
    je_left_context = [x[0] for x in table_ent_cont_je if x[1]==p]
    je_right_context = [x[2] for x in table_ent_cont_je if x[1]==p]
    
    bl_left_context = [x[0] for x in table_ent_cont_bl if x[1]==p]
    bl_right_context = [x[0] for x in table_ent_cont_bl if x[1]==p]
    
    comparison_table.append((p, [item for sublist in je_left_context + je_right_context for item in sublist], 
                             [item for sublist in bl_left_context + bl_right_context for item in sublist]))
    
display(HTML(tabulate.tabulate(comparison_table, tablefmt='html')))

0,1,2
place mention,Jane Eyre,Bleak House
Paris,"['genuine', 'warm', 'tired', 'destitute', 'poor', 'deep', 'sarcastic', 'happy', 'clean', 'wholesome', 'English']","['few', 'previous', 'fashionable', 'few', 'previous', 'cause""—cautious', 'more', 'much', 'particular', 'full', 'hospitable', 'rusty', 'little', 'full', 'gusty', 'little', 'own', 'imperfect', 'last', 'few', 'previous', 'fashionable', 'few', 'previous', 'cause""—cautious', 'more', 'much', 'particular', 'full', 'hospitable', 'rusty', 'little', 'full', 'gusty', 'little', 'own', 'imperfect', 'last']"
India,"['hard', 'tough', 'final', 'similar', 'deep', 'relentless', 'premature', 'premature', 'ready', 'short', 'sharp', 'whole', 'void', 'slightest', 'Indian', 'pervious', 'sentient', 'much', 'premature', 'clear', 'free', 'married', 'kinder', 'considerable', 'resolute']","['great', 'many', 'handsome', 'English', 'curious', 'future', 'probable', 'great', 'many', 'handsome', 'English', 'curious', 'future', 'probable']"
Rome,"['own', 'cordial', 'Italian', 'old']","['sure', 'young', 'classic', 'sure', 'young', 'classic']"
Mediterranean,"['whitewashed', 'happy', 'innocent']","['dear', 'old', 'dear', 'old']"
Europe,"['past', 'fresh', 'right', 'sweet', 'British', 'mannered', 'most', 'distant', 'mad', 'open', 'refreshed', 'glorious', 'sullied', 'filthy', 'peculiar', 'subdued', 'emphatic', 'best', 'ignorant']","['astonishing', 'little', 'second', 'grey', 'identical', 'interesting', 'oldest', 'astonishing', 'little', 'second', 'grey', 'identical', 'interesting', 'oldest']"
Scotland,[],"['military', 'old', 'military', 'old']"
Turkey,"['large', 'stately', 'purple', 'vast', 'rich', 'stained', 'lofty']","['great', 'last', 'last', 'thick', 'dingy', 'great', 'great', 'last', 'last', 'thick', 'dingy', 'great']"
the West Indies,[],[]
England,"['sad', 'certain', 'new', 'splendid', 'West', 'Indian', 'large', 'gay', 'requisite', 'previous', 'unlikely', 'filthy', 'intellectual', 'faithful', 'mere', 'larger', 'free', 'honest', 'breezy', 'healthy', 'own', 'cold', 'dismayed', 'worst', 'true', 'Unchanged', 'unchangeable', 'plain', 'premature', 'main', 'equivalent', 'cold', 'savage', 'wilder', 'thicker', 'scant', 'mule', 'large', 'hot', 'pure', 'radiant', 'long', 'eager', 'married', 'due', 'frosty', 'other', 'sure', 'singular', 'right', 'such', 'reckless', 'former', 'severe', 'ordinary', 'loved', 'empty', 'future', 'certain', 'greater', 'sole', 'fancy', 'fancy']","['same', 'dear', 'fashionable', 'glad', 'benighted', 'hot', 'motionless', 'little', 'eloquent', 'hopeful', 'greatest', 'National', 'other', 'own', 'marvellous', 'other', 'glad', 'first', 'young', 'beautiful', 'unmarried', 'same', 'dear', 'fashionable', 'glad', 'benighted', 'hot', 'motionless', 'little', 'eloquent', 'hopeful', 'greatest', 'National', 'other', 'own', 'marvellous', 'other', 'glad', 'first', 'young', 'beautiful', 'unmarried']"
