# Cognitive Mapping

The data consists of texts and relations. Each relation has three parts (Concept1, Explanation and Concept2), and each part corresponds to a multi-word phrase in a text. All relations and texts can be read into a Jupyter notebook.

The goal is to identify the three parts of the relation in a text automatically.

**Example**: '3-2: <span style="background-color: lightblue;">[concept Giving to the ECB the ultimate responsibility for supervision of banks in the euro area concept]</span> <span style="background-color: pink;">[explanation will decisively contribute to increase explanation]</span> <span style="background-color: lightblue;">[concept confidence between the banks concept]</span> <span style="background-color: pink;">[explanation and in this way increase explanation]</span> <span style="background-color: lightblue;">[concept the financial stability in the euro area concept]</span>. The euro area governments and the European institutions, including naturally the European Commission and the ECB, will do whatever is necessary to secure the financial stability of the euro area.\n'

## 1. Plan
    
1. classify paragraphs with machine learning: does a paragraph contain a causal relation or not?
2. find the phrases of relations in the text: label each phrase as a concept, an explanation, or not part of a relation
3. identify relations based on the recognized concept phrases and explanation phrases

We need a tagger or an entity recognition program, for example transformers (huggingface/bert: https://github.com/huggingface/transformers) or spaCy.
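Step 3 of the plan could, in its simplest form, be sketched as follows. This is a naive heuristic and not the method used in this notebook: it assumes that step 2 has already labelled every token as C (concept), E (explanation) or X (not part of a relation), and it treats every concept, explanation, concept sequence as one relation. The function names `group_phrases` and `find_relations` are hypothetical.

```python
from itertools import groupby

def group_phrases(labelled_tokens):
    """Collapse runs of identical labels into (label, phrase) pairs, dropping X runs."""
    phrases = []
    for label, run in groupby(labelled_tokens, key=lambda pair: pair[1]):
        if label != "X":
            phrases.append((label, " ".join(token for token, _ in run)))
    return phrases

def find_relations(phrases):
    """Naive heuristic: each concept, explanation, concept sequence is a relation."""
    relations = []
    for i in range(len(phrases) - 2):
        window = phrases[i:i + 3]
        if [label for label, _ in window] == ["C", "E", "C"]:
            relations.append(tuple(text for _, text in window))
    return relations

# Shortened, hand-labelled version of the example paragraph 3-2
labelled = [
    ("supervision", "C"), ("will", "E"), ("increase", "E"),
    ("confidence", "C"), ("and", "E"), ("increase", "E"),
    ("stability", "C"), (".", "X"),
]
relations = find_relations(group_phrases(labelled))
for relation in relations:
    print(relation)
```

Because a concept may close one relation and open the next, the windows overlap: the middle concept in the example appears in both extracted relations.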

## 1.1. Data encoding for plan step 2

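Each token is paired with a label: E for explanation, C for concept and X for tokens that are not part of a relation.

contribute E\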
to E\
increase E\
confidence C\
between C\
the C\
banks C\
the X
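The encoding above could be derived from the bracketed annotation used in the example at the top of this notebook. A minimal sketch, assuming the annotation has the form `[concept ... concept]` and `[explanation ... explanation]` with the HTML markup already removed; the helper name `encode_tokens` is hypothetical and not part of the original notebook.

```python
import re

# C = concept, E = explanation, X = not part of a relation
TAG_TO_LABEL = {"concept": "C", "explanation": "E"}

def encode_tokens(annotated_text):
    """Return (token, label) pairs for a bracket-annotated text."""
    pairs = []
    current_label = "X"
    for token in annotated_text.split():
        opening = re.fullmatch(r"\[(concept|explanation)", token)
        # A closing marker may be followed by punctuation, as in "concept]."
        closing = re.fullmatch(r"(concept|explanation)\]\W*", token)
        if opening:
            current_label = TAG_TO_LABEL[opening.group(1)]
        elif closing:
            current_label = "X"
        else:
            pairs.append((token, current_label))
    return pairs

example = ("[explanation will decisively contribute to increase explanation] "
           "[concept confidence between the banks concept] and more")
encoded = encode_tokens(example)
for token, label in encoded:
    print(token, label)
```

Note that this sketch discards punctuation attached to a closing marker; a real converter would probably keep it as an X token.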

## 1.2. References

Hosseini, M.J., Chambers, N., Reddy, S., Holt, X.R., Cohen, S.B., Johnson, M., & Steedman, M. (2018). Learning Typed Entailment Graphs with Global Soft Constraints. Transactions of the Association for Computational Linguistics.

Sizov, G. & Ozturk, P. (2013). Automatic Extraction of Reasoning Chains from Textual Reports. Proceedings of TextGraphs-8 Workshop “Graph-based Methods for Natural Language Processing”, Empirical Methods in Natural Language Processing.

https://colab.research.google.com/drive/14V9Ooy3aNPsRfTK88krwsereia8cfSPc?usp=sharing#scrollTo=H_kiqxjbW3lh

## 2. Load data

In [1]:
import pandas as pd
import re

In [2]:
REQUIRED_COLUMNS = "Content_Concept_1 Content_Merged_Concept_1 Content_Relation_Explanation Content_Concept_2 Content_Merged_Concept_2 Content_Source_ID".split()

def read_relations(file_name):
    return pd.read_csv(file_name).loc[:, REQUIRED_COLUMNS]

In [3]:
def read_paragraphs(file_name):
    # Read one paragraph per line; the file is opened as Latin-1
    with open(file_name, "r", encoding="latin1") as data_file:
        return [line.strip() for line in data_file]

In [4]:
relation_df = read_relations("Map_Contents.csv")

In [5]:
paragraph_list = read_paragraphs("2012-07-26 Barroso European Commission ann.txt")

In [54]:
relation_df

| | Content_Concept_1 | Content_Merged_Concept_1 | Content_Relation_Explanation | Content_Concept_2 | Content_Merged_Concept_2 | Content_Source_ID |
|---|---|---|---|---|---|---|
| 0 | to ask Greece to reform | austerity programme problemstates | | so this is why it is the right approach | Benefit of the MS | 1-2 |
| 1 | to increase its [Greece's] competitiveness | competitiveness | | so this is why it is the right approach | Benefit of the MS | 1-2 |
| 2 | you in Greece, with out support, you need to r... | taking your own responsibility | to | increase the competitiveness of Greece | competitiveness | 1-2 |
| 3 | you in Greece, with out support, you need to r... | Structural reforms | to | increase the competitiveness of Greece | competitiveness | 1-2 |
| 4 | you in Greece, with out support, you need to r... | Improve the national administration | to | increase the competitiveness of Greece | competitiveness | 1-2 |
| 5 | you in Greece, with out support, you need to r... | Economic policy change | to | increase the competitiveness of Greece | competitiveness | 1-2 |
| 6 | you in Greece, with out support, you need to r... | taking your own responsibility | and the best hope of a | return to growth | economic growth | 1-2 |
| 7 | you in Greece, with out support, you need to r... | Structural reforms | and the best hope of a | return to growth | economic growth | 1-2 |
| 8 | you in Greece, with out support, you need to r... | Improve the national administration | and the best hope of a | return to growth | economic growth | 1-2 |
| 9 | you in Greece, with out support, you need to r... | Economic policy change | and the best hope of a | return to growth | economic growth | 1-2 |


In [53]:
paragraph_list

['Statement by President Barroso following his meeting with Mr Antonis Samaras, Prime Minister of Greece',
 'Press point/Athens',
 '26 July 2012',
 '',
 'http://europa.eu/rapid/press-release_SPEECH-12-571_en.htm',
 'Thank you very much.',
 "I really want to thank my good friend Antonis Samaras for the invitation. I really appreciate our friendship and the Prime Minister's clear commitment to Europe. This is so crucial during these difficult times. And I know that this aspiration and this determination are shared by the two other respective leaders of the coalition government Mr Venizelos and Mr Kouvelis and I want to thank them also for their strong pro-European commitment.",
 '1-2: Today I want to send a clear message to the people of this great country, of Greece. I know that many people feel without hope. Many are making extremely difficult sacrifices. And many people ask why they should do more. I understand those concerns. And I agree that some of the efforts seem unfair. But I as


## 3. Causal concepts

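Utility relations state that A is good for B, or that A is a good thing, where B is defined as the benefit of society.

We are interested in causal relations only. The code below therefore skips every relation whose merged second concept contains "benefit" and counts the tokens in the explanations of the remaining relations.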

In [6]:
token_dict = {}
for i, row in relation_df.iterrows():
    if not re.search("benefit", row["Content_Merged_Concept_2"], flags=re.IGNORECASE):
        token_list = row["Content_Relation_Explanation"].split()
        for token in token_list:
            if token not in token_dict:
                token_dict[token] = 0
            token_dict[token] += 1
token_dict

{'to': 6,
 'and': 9,
 'the': 8,
 'best': 8,
 'hope': 8,
 'of': 8,
 'a': 8,
 'safeguard': 1,
 'as': 1,
 'we': 1,
 'said': 1,
 'there': 1,
 'will': 2,
 'not': 1,
 'be': 1,
 'devisively': 1,
 'contribute': 1,
 'increase': 1,
 'in': 1,
 'this': 1,
 'way': 1}
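The same count can be written more compactly with `collections.Counter`. A minimal sketch on a small hand-made stand-in for `relation_df`: the three rows below are illustrative, modelled on the table in section 2, and are not the full data. The sketch also skips empty explanations, which pandas would read as NaN; the function name `count_explanation_tokens` is hypothetical.

```python
import re
from collections import Counter

# Illustrative stand-in rows (assumption: same column names as relation_df)
rows = [
    {"Content_Relation_Explanation": None,
     "Content_Merged_Concept_2": "Benefit of the MS"},
    {"Content_Relation_Explanation": "to",
     "Content_Merged_Concept_2": "competitiveness"},
    {"Content_Relation_Explanation": "and the best hope of a",
     "Content_Merged_Concept_2": "economic growth"},
]

def count_explanation_tokens(rows):
    """Count explanation tokens of all relations that are not utility relations."""
    counts = Counter()
    for row in rows:
        explanation = row["Content_Relation_Explanation"]
        is_utility = re.search("benefit", row["Content_Merged_Concept_2"],
                               flags=re.IGNORECASE)
        # Skip utility relations and missing explanations
        if is_utility or not isinstance(explanation, str):
            continue
        counts.update(explanation.split())
    return counts

token_counts = count_explanation_tokens(rows)
print(token_counts.most_common())
```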

## 4. Task 1: Predict presence of causal relations in paragraphs

Steps:

1. store the paragraphs in the data structure X (data) after separating punctuation from words and replacing upper case by lower case
2. create a data structure y (labels) with True for paragraphs with causal relations and False for others
3. predict a label for each paragraph with a machine learning model generated from the other paragraphs
4. evaluate the results

The code in this task uses the packages `fasttext` (for machine learning) and `nltk` (for language processing).

The task uses limited natural language processing to prepare the data for machine learning:

1. tokenization: separate punctuation from words
2. conversion of upper case characters to lower case

Other interesting natural language preprocessing steps:

3. part-of-speech tagging
4. full parsing (Stanford parser)

In [40]:
import fasttext
from nltk.tokenize import word_tokenize

In [41]:
def make_X(paragraph_list):
    X = {}
    for paragraph in paragraph_list:
        tokens = paragraph.split()
        if len(tokens) > 0 and re.search(r'^\d+-\d+:$', tokens[0]):
            key = re.sub(":", "", tokens[0])
            X[key] = " ".join(word_tokenize(" ".join(tokens[1:]))).lower()
    return X          

In [42]:
def make_y(relation_df, X):
    y = {}
    for i, row in relation_df.iterrows():
        if not re.search("benefit", row["Content_Merged_Concept_2"], flags=re.IGNORECASE):
            y[row["Content_Source_ID"]] = True
    for key in X:
        if key not in y:
            y[key] = False
    return y

In [43]:
def make_train_test(X, y, test_index=0):
    train_list = []
    test_list = []
    index = 0
    for key in sorted(X.keys()):
        if index == test_index:
            test_list.append(f"__label__{str(y[key])} {X[key]}")
        else:
            train_list.append(f"__label__{str(y[key])} {X[key]}")
        index += 1
    return train_list, test_list

In [44]:
def make_train_file(file_name, train_list):
    # Write one fastText training example per line
    with open(file_name, "w") as data_file:
        for line in train_list:
            print(line, file=data_file)

In [45]:
def decode_label(label):
    return re.sub("__label__", "", label)

In [46]:
def show_results(results):
    return pd.DataFrame(list(results.values()), index=list(results.keys()))

In [47]:
def evaluate_results(results):
    correct_count = 0
    for key in results:
        if decode_label(results[key]["predicted"]) == str(results[key]["correct"]):
            correct_count += 1
    return correct_count/len(results)

In [48]:
X = make_X(paragraph_list)
y = make_y(relation_df, X)

In [56]:
results = {}
for i in range(0, len(X)):
    key = list(sorted(X.keys()))[i]
    train_list, test_list = make_train_test(X, y, i)
    make_train_file("train_file.txt", train_list)
    model = fasttext.train_supervised("train_file.txt", dim=10)
    predicted_label = model.predict(test_list)
    results[key] = {"correct": y[key], "predicted": decode_label(predicted_label[0][0][0])}

In [57]:
evaluate_results(results)

0.4

In [58]:
show_results(results)

| | correct | predicted |
|---|---|---|
| 1-2 | True | True |
| 2-3 | True | True |
| 3-1 | False | True |
| 3-2 | True | False |
| 3-3 | False | True |
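With only five paragraphs, accuracy alone says little. A minimal sketch that computes precision, recall and F1 for the `True` class from the results shown above and compares the accuracy with a baseline that always predicts `True`. The function name `evaluate_binary` is hypothetical and not part of the original notebook.

```python
# Results copied from the table above: paragraph key -> (correct, predicted)
observed = {
    "1-2": (True, True),
    "2-3": (True, True),
    "3-1": (False, True),
    "3-2": (True, False),
    "3-3": (False, True),
}

def evaluate_binary(observed):
    """Compute accuracy, precision, recall and F1 for the True class."""
    pairs = list(observed.values())
    tp = sum(1 for correct, predicted in pairs if correct and predicted)
    fp = sum(1 for correct, predicted in pairs if not correct and predicted)
    fn = sum(1 for correct, predicted in pairs if correct and not predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = sum(1 for correct, predicted in pairs if correct == predicted) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

scores = evaluate_binary(observed)
# Baseline: always predict True, so accuracy equals the share of True labels
baseline_accuracy = sum(1 for correct, _ in observed.values() if correct) / len(observed)
print(scores)
print(baseline_accuracy)
```

For these results the precision is 0.5 and the recall about 0.67 for the `True` class, while always predicting `True` would already reach an accuracy of 0.6. That is higher than the 0.4 of the model, which suggests that five paragraphs are too few to learn from.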


In [59]:
model["euro"]

array([-0.08877166,  0.06041456,  0.0967663 , -0.08981141, -0.05204509,
       -0.07906921,  0.0069568 ,  0.00426694, -0.03400405, -0.03646002],
      dtype=float32)

In [63]:
X

{'1-2': 'today i want to send a clear message to the people of this great country , of greece . i know that many people feel without hope . many are making extremely difficult sacrifices . and many people ask why they should do more . i understand those concerns . and i agree that some of the efforts seem unfair . but i ask people to recognise the other alternatives which will be much more difficult for greece and will affect even more the most vulnerable in the greek society . so this is why it is the right approach to ask greece to reform , to increase its competitiveness to have a viable future , irrespective of the crisis . you , in greece , with our support , need to rebuild your country , your structures , your administration , your economy to increase the competitiveness of greece . and the best hope of a return to growth and job creation is inside the euro area . staying in the euro is the best chance to avoid worse hardship and difficulties to the greek people , namely for tho

## 5. Task 2: Find relevant phrases in text

In [19]:
from transformers import pipeline

## 5.1. Testing the pretrained Named Entity Recognition (NER) model

In [22]:
classifier = pipeline('ner')

In [27]:
print(paragraph_list[0])
classifier(paragraph_list[0])

Statement by President Barroso following his meeting with Mr Antonis Samaras, Prime Minister of Greece


[{'word': 'Barr',
  'score': 0.9984531998634338,
  'entity': 'I-PER',
  'index': 5,
  'start': 23,
  'end': 27},
 {'word': '##oso',
  'score': 0.9939749836921692,
  'entity': 'I-PER',
  'index': 6,
  'start': 27,
  'end': 30},
 {'word': 'Anton',
  'score': 0.9995124936103821,
  'entity': 'I-PER',
  'index': 12,
  'start': 61,
  'end': 66},
 {'word': '##is',
  'score': 0.9988749027252197,
  'entity': 'I-PER',
  'index': 13,
  'start': 66,
  'end': 68},
 {'word': 'Sam',
  'score': 0.9991353154182434,
  'entity': 'I-PER',
  'index': 14,
  'start': 69,
  'end': 72},
 {'word': '##ara',
  'score': 0.9854997992515564,
  'entity': 'I-PER',
  'index': 15,
  'start': 72,
  'end': 75},
 {'word': '##s',
  'score': 0.9926432967185974,
  'entity': 'I-PER',
  'index': 16,
  'start': 75,
  'end': 76},
 {'word': 'Greece',
  'score': 0.9994153380393982,
  'entity': 'I-LOC',
  'index': 21,
  'start': 96,
  'end': 102}]

## 5.2. Training a phrase recognition model

Source: https://huggingface.co/transformers/task_summary.html#named-entity-recognition

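This does not change the behaviour of the system: the model loaded below was already fine-tuned on CoNLL-03, so it still predicts its own labels (such as I-PER and I-LOC), and `label_list` is never used. To train a model on concept and explanation labels, we perhaps need to start from https://github.com/huggingface/transformers/blob/master/examples/pytorch/token-classification/run_ner.py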

In [60]:
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

model = AutoModelForTokenClassification.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
label_list = [
    "O",       # Outside of a phrase
    "I-CON",   # Concept
    "I-EXP",   # Explanation
]
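To fine-tune a model on the concept and explanation labels, the label list would have to be turned into the id mappings that a token classification model stores in its configuration. A minimal sketch, assuming the three labels above. Passing the mappings to `from_pretrained` is shown only as a comment, since that step was not tried in this notebook.

```python
label_list = ["O", "I-CON", "I-EXP"]

# Mappings between label names and the integer ids predicted by the model
id2label = dict(enumerate(label_list))
label2id = {label: label_id for label_id, label in id2label.items()}

print(id2label)
print(label2id)

# Untested assumption: a model with a fresh three-label classification head
# could then be requested with
# AutoModelForTokenClassification.from_pretrained(
#     "bert-base-cased", num_labels=len(label_list),
#     id2label=id2label, label2id=label2id)
```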

In [61]:
sequence = paragraph_list[0]

# Bit of a hack to get the tokens with the special tokens
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
inputs = tokenizer.encode(sequence, return_tensors="pt")
outputs = model(inputs).logits
predictions = torch.argmax(outputs, dim=2)

In [62]:
predictions

tensor([[0, 0, 0, 0, 0, 4, 4, 0, 0, 0, 0, 0, 4, 4, 4, 4, 4, 0, 0, 0, 0, 8, 0]])

In [34]:
for token, prediction in zip(tokens, predictions[0].numpy()):
    print((token, model.config.id2label[prediction]))

('[CLS]', 'O')
('State', 'O')
('##ment', 'O')
('by', 'O')
('President', 'O')
('Barr', 'I-PER')
('##oso', 'I-PER')
('following', 'O')
('his', 'O')
('meeting', 'O')
('with', 'O')
('Mr', 'O')
('Anton', 'I-PER')
('##is', 'I-PER')
('Sam', 'I-PER')
('##ara', 'I-PER')
('##s', 'I-PER')
(',', 'O')
('Prime', 'O')
('Minister', 'O')
('of', 'O')
('Greece', 'I-LOC')
('[SEP]', 'O')
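The output above labels word pieces such as `Barr` and `##oso` separately. To obtain phrases, the pieces have to be merged into words and adjacent words with the same label have to be grouped. A minimal sketch, assuming that continuation pieces start with `##` as in the output above and that a word takes the label of its first piece; the function names `merge_wordpieces` and `extract_phrases` are hypothetical.

```python
def merge_wordpieces(token_label_pairs):
    """Join '##' continuation pieces to the preceding word; drop special tokens."""
    words = []
    for token, label in token_label_pairs:
        if token in ("[CLS]", "[SEP]"):
            continue
        if token.startswith("##") and words:
            previous_word, previous_label = words[-1]
            words[-1] = (previous_word + token[2:], previous_label)
        else:
            words.append((token, label))
    return words

def extract_phrases(word_label_pairs):
    """Group consecutive words that share a non-O label into phrases."""
    phrases = []
    current_words, current_label = [], "O"
    # The final ("", "O") pair is a sentinel that flushes the last phrase
    for word, label in word_label_pairs + [("", "O")]:
        if label != current_label:
            if current_label != "O":
                phrases.append((" ".join(current_words), current_label))
            current_words, current_label = [], label
        if label != "O":
            current_words.append(word)
    return phrases

# Part of the output above
pairs = [
    ("[CLS]", "O"), ("President", "O"), ("Barr", "I-PER"), ("##oso", "I-PER"),
    ("following", "O"), ("Mr", "O"), ("Anton", "I-PER"), ("##is", "I-PER"),
    ("Sam", "I-PER"), ("##ara", "I-PER"), ("##s", "I-PER"), (",", "O"),
    ("of", "O"), ("Greece", "I-LOC"), ("[SEP]", "O"),
]
phrases = extract_phrases(merge_wordpieces(pairs))
print(phrases)
```

The same grouping would apply to I-CON and I-EXP labels once a model has been trained on them.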

