<a href="https://colab.research.google.com/github/beatriceyapsm/temporaltest/blob/main/SurveyTemporalinfo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Temporal Information Extraction
Temporal information can be represented as a triple {T, E, R}, where T denotes temporal points, durations, or intervals, E denotes events, and R denotes the temporal relations among them.
There are three main approaches to temporal information extraction: rule-based, data-driven, and hybrid.
TempEval-1, TempEval-2, and TempEval-3 are shared tasks (evaluation exercises) on temporal information extraction.
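
The {T, E, R} triple can be pictured as a small data structure. The sketch below is illustrative only: the class names, field names, and the relation label `IS_INCLUDED` are assumptions chosen for this example, not part of any library.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimeExpr:
    """T: a temporal point, duration, or interval."""
    text: str
    kind: str  # e.g. "DATE" or "DURATION" (labels are assumptions)


@dataclass(frozen=True)
class Event:
    """E: something that happens or holds."""
    text: str


@dataclass(frozen=True)
class Relation:
    """R: a temporal relation between two items."""
    source: object
    target: object
    label: str  # e.g. "BEFORE", "IS_INCLUDED" (labels are assumptions)


@dataclass
class TemporalInfo:
    times: List[TimeExpr]
    events: List[Event]
    relations: List[Relation]


# Hand-built example: the "closed" event falls within "Friday, Sept. 30"
closed = Event("closed")
sept30 = TimeExpr("Friday, Sept. 30", "DATE")
info = TemporalInfo([sept30], [closed], [Relation(closed, sept30, "IS_INCLUDED")])
print(info.relations[0].label)
```

Keeping T, E, and R in separate lists mirrors the notation, and a relation can then link any pair of items (event to event, event to time, or time to time).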

TimeML is a markup language for annotating events, temporal expressions, and the relations between them in text.
- The EVENT tag annotates the elements of a text that denote events.
- The TIMEX3 tag marks up explicit temporal expressions such as times, dates, and durations.
- The MAKEINSTANCE tag creates a separate instance of an event when a single mention covers more than one occurrence (e.g. "He taught last Wednesday and today." describes two teaching events).
- The TLINK (temporal link) tag represents the temporal relationship between two events, two times, or an event and a time.
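
To make the four tags concrete, the sketch below parses a tiny hand-written TimeML-style fragment with Python's standard `xml.etree.ElementTree`. The fragment, its attribute values, the IDs, and the assumed dates are made up for illustration; they follow the general shape of TimeML annotations but were not taken from TimeBank.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation of "He taught last Wednesday and today."
# (the dates assume "today" is 2022-10-10, which is an assumption of this sketch)
SAMPLE = """<TimeML>He <EVENT eid="e1" class="OCCURRENCE">taught</EVENT>
<TIMEX3 tid="t1" type="DATE" value="2022-10-05">last Wednesday</TIMEX3> and
<TIMEX3 tid="t2" type="DATE" value="2022-10-10">today</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1"/>
<MAKEINSTANCE eiid="ei2" eventID="e1"/>
<TLINK eventInstanceID="ei1" relatedToTime="t1" relType="IS_INCLUDED"/>
<TLINK eventInstanceID="ei2" relatedToTime="t2" relType="IS_INCLUDED"/>
</TimeML>"""

root = ET.fromstring(SAMPLE)

# Index each tag type by its ID attribute
events = {e.get("eid"): e.text for e in root.iter("EVENT")}
times = {t.get("tid"): (t.text, t.get("value")) for t in root.iter("TIMEX3")}
instances = {m.get("eiid"): m.get("eventID") for m in root.iter("MAKEINSTANCE")}

# Follow each TLINK from event instance -> event text and from time ID -> time text
for link in root.iter("TLINK"):
    event_text = events[instances[link.get("eventInstanceID")]]
    time_text, value = times[link.get("relatedToTime")]
    print(event_text, link.get("relType"), time_text, value)
```

The single EVENT "taught" gets two MAKEINSTANCE entries, one per day, and each TLINK attaches one instance to one TIMEX3.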

Best F1-score for TIMEX3: the rule-based system HeidelTime.
Best F1-score for EVENT and MAKEINSTANCE: ATT-1, which uses a maximum entropy classifier.

TimeBank is a publicly available annotated dataset.

In [51], a method for extracting temporal relations between two events was proposed. It had two stages: (1) a machine-learning model that classifies event attributes (tense, aspect, modality, polarity, and event class), and (2) a machine-learning model that classifies the relation type between two events. The experiments used TimeBank and reported that Naive Bayes (NB) generally outperforms maximum entropy (ME). (https://aclanthology.org/P07-2044/)

In [61], WikiWars, a new corpus for the extraction of temporal expressions, was introduced (https://aclanthology.org/D10-1089/).

In SemEval-2018, "Task 6: Parsing Time Normalizations" was held as a shared task on time information extraction [93].





### SPACY

In [1]:
# Load SPACY 
import spacy
#from spacy.lang.en import English
from spacy import displacy
nlp=spacy.load('en_core_web_sm')
import pandas as pd
import numpy as np
import re

In [11]:
# Load Data & Temporal Extraction
raw_text = '04.10.2022 Tesla shares dropped nearly 16% during what CEO Elon Musk called a “very intense 7 days indeed” to one of his 108 million followers on Twitter. Tesla shares closed at $265.25 on Friday, Sept. 30. At market’s close one week later, Tesla shares were trading at $223.07, a decline of nearly 16%. It was the worst week for the stock since Mar. 2020, when the Covid-19 pandemic began to grip the U.S., shutting down businesses and public life. Over the weekend, Tesla reported electric vehicle production and delivery numbers that did not meet analysts’ expectations. On Monday, Musk proceeded to stir up a political firestorm by opining about how he thought Russia’s brutal invasion of Ukraine should be resolved. After that, public records revealed that Musk had informed the Delaware Chancery Court that he would complete a $44 billion acquisition of Twitter in 10/10/2022 October, a deal he had been trying to evade for months.'
#nlp = English()
#nlp.add_pipe('sentencizer')



In [12]:
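# Replace numeric months (e.g. ".10.") with month abbreviations (e.g. ".Oct.") so that spaCy reads the string as a DATE entity.
# calendar.month_abbr is locale-dependent; this assumes an English locale and day-month-year order.
import calendar
regEx2 = r'[\.\/\-](0[1-9]|1[012])[\.\/\-]'

raw_text = re.sub(regEx2, lambda m: '.' + calendar.month_abbr[int(m.group(1))] + '.', raw_text)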
    


In [4]:
doc = nlp(raw_text)
sentences = [sent.text.strip() for sent in doc.sents]
df = pd.DataFrame()
df['Sentences']= sentences

#print(f"\033[1mSentence{x*250}Date{x*22}RDate{x*11}\033[0m")

### Spacy Visualisation

In [None]:
displacy.render(doc, style="ent")

In [5]:
#regEx1 = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)?(?:\d{2,4})'
#regEx2 = r'(\d{2,4})[\.\/\-](0[1-9]|1[012])[\.\/\-](\d{2,4})'
df['RDates'] = df['Sentences'].apply(lambda sent: [(ent.text) for ent in nlp(sent).ents if ent.label_ == "DATE"])  
#df['RDates']=df['RDates'].str.split(",").str[0] 
#df['Dates'] = df['Sentences'].apply(lambda sent: re.findall(regEx1, sent))  
#df['Dates2'] = df['Sentences'].apply(lambda sent: re.findall(regEx2, sent))
#df['Dates'] =  df['Dates']+df['Dates2']  
#df=df.drop(['Dates2'],axis=1)
#df = pd.DataFrame(df['RDates'].values.tolist(), index=df.index)
df.head(8)


| | Sentences | RDates |
|---|---|---|
| 0 | 04.Oct.2022 Tesla shares dropped nearly 16% du... | [7 days] |
| 1 | Tesla shares closed at $265.25 on Friday, Sept... | [Friday, Sept. 30] |
| 2 | At market’s close one week later, Tesla shares... | [one week later] |
| 3 | It was the worst week for the stock since Mar.... | [the worst week, Mar. 2020] |
| 4 | Over the weekend, Tesla reported electric vehi... | [the weekend] |
| 5 | On Monday, Musk proceeded to stir up a politic... | [Monday] |
| 6 | After that, public records revealed that Musk ... | [10.Oct.2022 October, months] |


### EVENTS
https://www.qualicen.de/natural-language-processing-timeline-extraction-with-regexes-and-spacy/

In [6]:
pip install daterangeparser



In [7]:
import spacy
import requests
import re
import IPython
from daterangeparser import parse

In [None]:
#response = requests.get('https://raw.githubusercontent.com/qualicen/timeline/master/history_of_germany.txt')
#text = response.text
#print('Loaded {} lines'.format(text.count('\n')))

In [13]:
doc = nlp(raw_text)
for ent in filter(lambda e: e.label_=='DATE',doc.ents):
  print(ent.text)

7 days
Friday, Sept. 30
one week later
the worst week
Mar. 2020
the weekend
Monday
10.Oct.2022 October
months
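
Several of the entities printed above ("one week later", "Monday") only become calendar dates once they are resolved against an anchor date. A minimal sketch with the standard `datetime` module, assuming that "Friday, Sept. 30" refers to 30 September 2022 (the year is an assumption taken from the date at the start of the article text) and that "Monday" means the first Monday after that anchor. The helper `next_weekday` is hypothetical and written for this example.

```python
from datetime import date, timedelta

# Assumed anchor: "Friday, Sept. 30" read as 30 September 2022
anchor = date(2022, 9, 30)


def next_weekday(start, weekday):
    """Return the first date strictly after `start` that falls on `weekday` (Monday=0)."""
    days_ahead = (weekday - start.weekday() - 1) % 7 + 1
    return start + timedelta(days=days_ahead)


# "one week later" relative to the anchor
one_week_later = anchor + timedelta(weeks=1)

# "Monday" read as the first Monday after the anchor (an interpretation, not a given)
following_monday = next_weekday(anchor, 0)

print(anchor.weekday(), one_week_later.isoformat(), following_monday.isoformat())
```

Expressions such as "the worst week" or "months" carry no anchor of their own, so a resolver of this kind would have to leave them unresolved.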


In [None]:
#doc = nlp("After that, public records revealed that Musk had informed the Delaware Chancery Court that he would complete a $44 billion acquisition of Twitter in 10/10/2022 October, a deal he had been trying to evade for months.")
#IPython.display.HTML(spacy.displacy.render(doc,style="dep", page=True, options={"compact":True}))

https://downloads.cs.stanford.edu/nlp/software/dependencies_manual.pdf

In [19]:
def dep_subtree(token, dep):
  # Return the text of the subtree under the first child attached with dependency label `dep`, or "" if there is none
  child = next(filter(lambda c: c.dep_ == dep, token.children), None)
  if child is not None:
    return " ".join([c.text for c in child.subtree])
  else:
    return ""

# Remove citation markers such as "[91]", which cause problems for spaCy
p = re.compile(r'\[\d+\]')
  

In [None]:
def extract_events_spacy(line):
  line=p.sub('', line)
  events = []
  doc = nlp(line)
  for ent in filter(lambda e: e.label_=='DATE',doc.ents):
    try:
      start,end = parse(ent.text)
    except Exception:
      # daterangeparser could not parse this expression, so skip it
      continue
    current = ent.root
    while current.dep_ != "ROOT":
      current = current.head
    desc = " ".join(filter(None,[
                                 dep_subtree(current,"nsubj"),
                                 dep_subtree(current,"nsubjpass"),
                                 dep_subtree(current,"auxpass"),
                                 dep_subtree(current,"amod"),
                                 dep_subtree(current,"det"),
                                 current.text, 
                                 dep_subtree(current,"acl"),
                                 dep_subtree(current,"dobj"),
                                 dep_subtree(current,"attr"),
                                 dep_subtree(current,"advmod")]))
    events.append((start, ent.text, desc))
  return events

In [27]:
def extract_all_events(text, extract_function):
  all_events = []
  # Extract events line by line
  for line in text.splitlines():
    all_events.extend(extract_function(line))

  print("Extracted {} events.".format(len(all_events)))

  # Print out the events
  for event in all_events:
    print(event)

  # Return the events as a DataFrame (one row per event)
  return pd.DataFrame(all_events)

In [16]:
def extract_events_spacytest(line):
  line=p.sub('', line)
  events = []
  doc = nlp(line)
  for ent in filter(lambda e: e.label_=='DATE',doc.ents):

    current = ent.root
    while current.dep_ != "ROOT":
      current = current.head
    desc = " ".join(filter(None,[
                                 dep_subtree(current,"nsubj"),
                                 dep_subtree(current,"csubj"),
                                 dep_subtree(current,"auxpass"),
                                 dep_subtree(current,"pobj"),
                                 current.text,
                                 dep_subtree(current,"prep"),
                                 dep_subtree(current,"dobj"),
                                 dep_subtree(current,"advmod"),
                                 dep_subtree(current,"xcomp"),
                                 dep_subtree(current,"acl"),
                                 dep_subtree(current,"attr")]))
    events = events + [(ent.text,desc)]
  return events

In [17]:
text = raw_text

In [28]:
extract_all_events(text,extract_events_spacytest)

Extracted 9 events.
('7 days', '04.Oct.2022 Tesla shares dropped during what CEO Elon Musk called a “ very intense 7 days indeed ” to one of his 108 million followers on Twitter')
('Friday, Sept. 30', 'Tesla shares closed at $ 265.25')
('one week later', 'Tesla shares trading At market ’s close one week later a decline of nearly 16 %')
('the worst week', 'It was the worst week for the stock since Mar. 2020 , when the Covid-19 pandemic began to grip the U.S. , shutting down businesses and public life')
('Mar. 2020', 'It was the worst week for the stock since Mar. 2020 , when the Covid-19 pandemic began to grip the U.S. , shutting down businesses and public life')
('the weekend', 'Tesla reported Over the weekend electric vehicle production and delivery numbers that did not meet analysts ’ expectations')
('Monday', 'Musk proceeded On Monday to stir up a political firestorm by opining about how he thought Russia ’s brutal invasion of Ukraine should be resolved')
('10.Oct.2022 October', 'pu

| | 0 | 1 |
|---|---|---|
| 0 | 7 days | 04.Oct.2022 Tesla shares dropped during what C... |
| 1 | Friday, Sept. 30 | Tesla shares closed at $ 265.25 |
| 2 | one week later | Tesla shares trading At market ’s close one we... |
| 3 | the worst week | It was the worst week for the stock since Mar.... |
| 4 | Mar. 2020 | It was the worst week for the stock since Mar.... |
| 5 | the weekend | Tesla reported Over the weekend electric vehic... |
| 6 | Monday | Musk proceeded On Monday to stir up a politica... |
| 7 | 10.Oct.2022 October | public records revealed After that |
| 8 | months | public records revealed After that |


### BERT INITIATION

In [None]:
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

In [20]:
!pip install pytorch-pretrained-bert pytorch-nlp


In [None]:
import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from pytorch_pretrained_bert import BertTokenizer, BertConfig
from pytorch_pretrained_bert import BertAdam, BertForSequenceClassification
from tqdm import tqdm, trange
import pandas as pd
import io
import numpy as np
import matplotlib.pyplot as plt

In [None]:
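device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()
# Query the device name only when a GPU is present; get_device_name(0) raises an error on CPU-only machines
if n_gpu > 0:
  print(torch.cuda.get_device_name(0))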

## BERT NER
https://analyticsindiamag.com/how-to-perform-named-entity-recognition-ner-using-a-transformer/

In [1]:
pip install transformers



https://github.com/satya77/Transformer_Temporal_Tagger

In [13]:
from transformers import AutoTokenizer, AutoModelForTokenClassification, BertForTokenClassification, EncoderDecoderModel
from transformers import pipeline

In [18]:
tokenizer = AutoTokenizer.from_pretrained("satyaalmasian/temporal_tagger_roberta2roberta")
model = EncoderDecoderModel.from_pretrained("satyaalmasian/temporal_tagger_roberta2roberta")

In [77]:
ARTICLE_TO_SUMMARIZE = (raw_text)

In [78]:
input_ids = tokenizer(ARTICLE_TO_SUMMARIZE, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text)



04.10.2022 Tesla shares dropped nearly 16% during what CEO Elon Musk called a “very intense  <timex3 type="DURATION" value="P7D"> 7 days </timeX3>  indeed” to one of his 108 million followers on Twitter.Tesla shares closed at $265.25 on  than  (<time x3 scale. 25, Sept. At market’s close one week later, Elon shares were trading at$223.07, a decline of nearly 160%. It was the worst week for the stock since Mar. 2020, when the Covid-19 pandemic began to grip the U.S., shutting down businesses and public life. Over the weekend, Tesla reported electric vehicle production and delivery numbers that did not meet analysts‘ expectations. On  Mr thought to stir up a political firestorm by opining about how he thought Russia‧s brutal invasion of Ukraine should be resolved. After that, public records revealed that Musk had informed the Delaware Chancery Court that he would complete a $44 billion acquisition of Twitter in 10/10/20 22 <timeXX-10-20"> 10 </ timex5>, a deal he had been trying to evade
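
The generated text above mixes one well-formed tag with several malformed ones, so any downstream use has to be defensive. Below is a minimal sketch, using only the standard `re` module, that keeps the tags whose opening and closing forms both read `timex3` (case-insensitively) and ignores the rest. The function name `extract_timex3` and the returned dictionary keys are assumptions for this example, not part of the temporal tagger project.

```python
import re

# Opening tag with attributes, tagged text, closing tag; case-insensitive
# because the model emitted "</timeX3>" for an opening "<timex3 ...>"
TIMEX3_RE = re.compile(
    r'<timex3\s+([^>]*)>(.*?)</\s*timex3\s*>',
    re.IGNORECASE | re.DOTALL,
)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def extract_timex3(text):
    """Return one dict per well-formed timex3 tag: its attributes plus the tagged text."""
    results = []
    for match in TIMEX3_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        attrs["text"] = match.group(2).strip()
        results.append(attrs)
    return results


# Two fragments copied from the generated output: one valid tag, one garbled
sample = (
    'called a “very intense  <timex3 type="DURATION" value="P7D"> 7 days </timeX3>  indeed” '
    'acquisition of Twitter in 10/10/20 22 <timeXX-10-20"> 10 </ timex5>, a deal'
)
print(extract_timex3(sample))
```

Because the pattern is non-greedy but not structure-aware, an unclosed opening tag followed later by an unrelated closing tag would still be paired; a stricter parser would be needed to rule that out.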

## ARCHIVE CODES

In [None]:
#skip
 
sentence = "Tesla shares dropped nearly 16% during what CEO Elon Musk called a “very intense 7 days indeed” to one of his 108 million followers on Twitter. Tesla shares closed at $265.25 on Friday, Sept. 30. At market’s close one week later, Tesla shares were trading at $223.07, a decline of nearly 16%. It was the worst week for the stock since Mar. 2020, when the Covid-19 pandemic began to grip the U.S., shutting down businesses and public life. Over the weekend, Tesla reported electric vehicle production and delivery numbers that did not meet analysts’ expectations. On Monday, Musk proceeded to stir up a political firestorm by opining about how he thought Russia’s brutal invasion of Ukraine should be resolved. After that, public records revealed that Musk had informed the Delaware Chancery Court that he would complete a $44 billion acquisition of Twitter in October, a deal he had been trying to evade for months."
x=' '
doc=nlp(sentence)
print(f"\033[1mText{x*22}Label{x*11}\033[0m")
for entities in doc.ents:
  if entities.label_ == "DATE":
    print(f"{entities.text:<25} {entities.label_:<15}")

In [None]:
# Step Three: Load Data & Temporal Extraction
raw_text = '2022.07.01 Tesla shares dropped nearly 16% during what CEO Elon Musk called a “very intense 7 days indeed” to one of his 108 million followers on Twitter. Tesla shares closed at $265.25 on Friday, Sept. 30. At market’s close one week later, Tesla shares were trading at $223.07, a decline of nearly 16%. It was the worst week for the stock since Mar. 2020, when the Covid-19 pandemic began to grip the U.S., shutting down businesses and public life. Over the weekend, Tesla reported electric vehicle production and delivery numbers that did not meet analysts’ expectations. On Monday, Musk proceeded to stir up a political firestorm by opining about how he thought Russia’s brutal invasion of Ukraine should be resolved. After that, public records revealed that Musk had informed the Delaware Chancery Court that he would complete a $44 billion acquisition of Twitter in 10/10/2022 October, a deal he had been trying to evade for months.'
doc = nlp(raw_text)
sentences = [sent.text.strip() for sent in doc.sents]
print(f"\033[1mSentence{x*250}Date{x*22}RDate{x*22}\033[0m")
regEx1 = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)?(?:\d{2,4})'
regEx2 = r'(\d{2,4})[\.\/\-](0[1-9]|1[012])[\.\/\-](\d{2,4})'
for s in sentences:
  doc=nlp(s)
  datef1=re.findall(regEx1, s)
  datef2=re.findall(regEx2, s)
  for entities in doc.ents:
    if entities.label_ == "DATE":
      print(f"{s:<250} {entities.text:<22} ")
      print(datef1)
      print(datef2)

In [None]:
regEx1 = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)?(?:\d{2,4})'
for s in sentences:
  new=re.findall(regEx1, s)
  print(new)

regEx2 = r'(\d{2,4})[\.\/\-](0[1-9]|1[012])[\.\/\-](\d{2,4})'
for s in sentences:
  new=re.findall(regEx2, s)
  print(new)
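
In `regEx1`, `[-/th|st|nd|rd\s]` is a character class, so it matches any single one of the listed characters rather than the ordinal suffixes `th`, `st`, `nd`, `rd`. One possible rewrite using alternation groups is sketched below. The names `MONTH`, `DAY`, and `TEXT_DATE_RE` are assumptions for this example, and the pattern is deliberately narrower than `regEx1` (for instance, it only accepts four-digit years).

```python
import re

# Month names and common abbreviations, spelled out so that words such as
# "Market" do not match
MONTH = (
    r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|'
    r'Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)'
)
# Day of month with an optional ordinal suffix
DAY = r'\d{1,2}(?:st|nd|rd|th)?'

# Optional leading day, month, optional trailing day, optional four-digit year
TEXT_DATE_RE = re.compile(
    rf'\b(?:{DAY}\s+)?{MONTH}\b\.?(?:\s+{DAY}\b)?(?:,?\s+\d{{4}}\b)?'
)

sample = (
    "Tesla shares closed at $265.25 on Friday, Sept. 30. "
    "It was the worst week for the stock since Mar. 2020, when the Covid-19 pandemic began."
)
print(TEXT_DATE_RE.findall(sample))
```

Since every group is non-capturing, `findall` returns the full matched strings. A bare month name also matches, so the modal verb "May" would be a false positive that needs separate handling.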


https://github.com/mmxgn/spacy-clausie

In [63]:
!pip install git+https://github.com/mmxgn/spacy-clausie.git
import claucy   
claucy.add_to_pipe(nlp)   



In [65]:
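# Run the pipeline once and inspect the clauses found by ClausIE
# (the earlier loop re-parsed the whole text for every DATE entity and discarded the result)
doc = nlp(raw_text)
doc._.clauses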