<a href="https://colab.research.google.com/github/beatriceyapsm/Temporal-Information/blob/main/SurveyTemporalinfo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Temporal Information Extraction
Temporal information can be represented as a triple {T, E, R}, where T denotes temporal points, durations or intervals, E denotes events, and R denotes the temporal relations.
There are three main approaches to temporal information extraction: rule-based, data-driven, and hybrid.
TempEval-1, TempEval-2 and TempEval-3 are shared tasks on temporal information extraction.
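
As a rough illustration of the {T, E, R} representation described above, the triple can be modelled with three small record types. This is only a sketch: the class names, the `kind` field and the relation label `"ON"` are assumptions made for the example, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalExpression:
    """T: a temporal point, duration or interval."""
    text: str
    kind: str  # hypothetical field: "point", "duration" or "interval"


@dataclass(frozen=True)
class Event:
    """E: an event mentioned in the text."""
    text: str


@dataclass(frozen=True)
class TemporalRelation:
    """R: a relation linking an event to a temporal expression."""
    source: Event
    relation: str  # hypothetical label set, e.g. "ON", "BEFORE", "AFTER"
    target: TemporalExpression


# "Tesla shares closed at $265.25 on Friday, Sept. 30."
closed = Event("closed")
friday = TemporalExpression("Friday, Sept. 30", "point")
rel = TemporalRelation(closed, "ON", friday)
print(rel)
```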

TimeML is a markup language, i.e. a set of rules for encoding documents electronically, used to annotate events, temporal expressions and the relations between them. Its main tags are:
- EVENT: annotates the elements in a text that mark semantic events.
- TIMEX3: marks up explicit temporal expressions, such as times, dates and durations.
- MAKEINSTANCE: creates a new event instance, needed when one event mention occurs on multiple days (e.g. "He taught last Wednesday and today.").
- TLINK (temporal link): represents the temporal relationship between two events, between two times, or between an event and a time.
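
To make the four tags above concrete, here is a small hand-written TimeML-style fragment for the example sentence, parsed with Python's standard `xml.etree.ElementTree`. The fragment is an illustrative sketch, not taken from TimeBank: the ids, the `value` dates (which assume the sentence was written on 2022-10-17) and the choice of `IS_INCLUDED` as relation type are assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment; ids and date values are hypothetical.
timeml = """<TimeML>
He <EVENT eid="e1" class="OCCURRENCE">taught</EVENT>
<TIMEX3 tid="t1" type="DATE" value="2022-10-12">last Wednesday</TIMEX3> and
<TIMEX3 tid="t2" type="DATE" value="2022-10-17">today</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE"/>
<MAKEINSTANCE eiid="ei2" eventID="e1" tense="PAST" aspect="NONE"/>
<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1" relType="IS_INCLUDED"/>
<TLINK lid="l2" eventInstanceID="ei2" relatedToTime="t2" relType="IS_INCLUDED"/>
</TimeML>"""

root = ET.fromstring(timeml)

# Index each tag type by its id attribute.
events = {e.get("eid"): e.text for e in root.iter("EVENT")}
times = {t.get("tid"): t.text for t in root.iter("TIMEX3")}
instances = {m.get("eiid"): m.get("eventID") for m in root.iter("MAKEINSTANCE")}

# Resolve every TLINK to (event text, relation type, time text).
links = [
    (events[instances[l.get("eventInstanceID")]], l.get("relType"), times[l.get("relatedToTime")])
    for l in root.iter("TLINK")
]
for link in links:
    print(link)
```

The single EVENT mention "taught" yields two MAKEINSTANCE elements, one per day, and each instance gets its own TLINK to the matching TIMEX3.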

Best F1 score for TIMEX3: the rule-based system HeidelTime.
Best F1 score for EVENT and MAKEINSTANCE: ATT-1, which uses maximum entropy.

TimeBank is a publicly available dataset for this task.

In [51], a method for extracting temporal relations between two events was proposed. It had two stages: (1) a machine-learning model for classifying event attributes (i.e., tense, aspect, modality, polarity, and event class), and (2) a machine-learning model for classifying the relation type between two events. The experiments used TimeBank and reported that Naive Bayes (NB) generally performs better than maximum entropy (ME). (https://aclanthology.org/P07-2044/)
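
The two-stage idea can be sketched with scikit-learn as follows. This is not the implementation from [51]: the features, the tiny hand-made training examples and the use of `LogisticRegression` as a stand-in for a maximum entropy classifier are all assumptions made for illustration, and only one attribute (tense) is predicted in stage 1.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Stage 1: predict an event attribute (here only tense) from surface features.
# Toy, hand-made examples; NOT TimeBank data.
stage1_X = [
    {"suffix": "ed", "aux": "none"},
    {"suffix": "ed", "aux": "had"},
    {"suffix": "s", "aux": "none"},
    {"suffix": "none", "aux": "will"},
]
stage1_y = ["PAST", "PAST", "PRESENT", "FUTURE"]
tense_clf = make_pipeline(DictVectorizer(), MultinomialNB()).fit(stage1_X, stage1_y)


def pair_features(ev1, ev2):
    """Build stage 2 input from the predicted attributes of both events."""
    t1, t2 = (str(t) for t in tense_clf.predict([ev1, ev2]))
    return {"tense1": t1, "tense2": t2, "same_tense": str(t1 == t2)}


# Stage 2: classify the relation type between two events.
pairs = [
    (stage1_X[0], stage1_X[2]),
    (stage1_X[3], stage1_X[0]),
    (stage1_X[2], stage1_X[3]),
    (stage1_X[3], stage1_X[2]),
]
stage2_y = ["BEFORE", "AFTER", "BEFORE", "AFTER"]
stage2_X = [pair_features(a, b) for a, b in pairs]

# Naive Bayes vs. a maximum-entropy-style model on the same features.
nb_rel = make_pipeline(DictVectorizer(), MultinomialNB()).fit(stage2_X, stage2_y)
me_rel = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000)).fit(stage2_X, stage2_y)

print(nb_rel.predict(stage2_X))
print(me_rel.predict(stage2_X))
```

With data this small the two classifiers cannot be meaningfully compared; the sketch only shows how the output of the attribute model feeds the relation model.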

In [61], WikiWars, a new corpus for the task of extracting temporal expressions, was introduced. (https://aclanthology.org/D10-1089/)

In SemEval-2018, ‘Task 6: Parsing Time Normalizations’ was held as a shared task related to time information extraction [93].





### spaCy

In [None]:
!python3 -m spacy download en_core_web_trf
!pip install spacy-transformers

2022-10-17 02:53:17.605786: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting en-core-web-trf==3.4.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_trf-3.4.0/en_core_web_trf-3.4.0-py3-none-any.whl (460.3 MB)
     |████████████████████████████████| 460.3 MB 26 kB/s
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_trf')
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
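# Load spaCy and supporting libraries
import re

import numpy as np
import pandas as pd
import spacy
from spacy import displacy
#from spacy.lang.en import English

# Small English pipeline, used for sentence splitting and as the baseline NER
nlp = spacy.load('en_core_web_sm')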

In [None]:
# Load Data & Temporal Extraction
raw_text = '04.10.2022 Tesla shares dropped nearly 16% during what CEO Elon Musk called a “very intense 7 days indeed” to one of his 108 million followers on Twitter. Tesla shares closed at $265.25 on Friday, Sept. 30. At market’s close one week later, Tesla shares were trading at $223.07, a decline of nearly 16%. It was the worst week for the stock since Mar. 2020, when the Covid-19 pandemic began to grip the U.S., shutting down businesses and public life. Over the weekend, Tesla reported electric vehicle production and delivery numbers that did not meet analysts’ expectations. On Monday, Musk proceeded to stir up a political firestorm by opining about how he thought Russia’s brutal invasion of Ukraine should be resolved. After that, public records revealed that Musk had informed the Delaware Chancery Court that he would complete a $44 billion acquisition of Twitter in Q2, a deal he had been trying to evade for months.'
#nlp = English()
#nlp.add_pipe('sentencizer')

In [None]:
# Replace numeric months with month abbreviations. Not needed with the transformer pipeline, but needed when using the small nlp pipeline alone.
#regEx2 = r'[\.\/\-](0[1-9]|1[012])[\.\/\-]'
#raw_text = re.sub(regEx2, '.Oct.', raw_text)  # TODO: write a function that maps each number to the right month; 'Oct' is hard-coded for now so that spaCy reads it as a DATE entity.
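
One possible shape for the month-mapping function mentioned in the comment above is sketched here. It assumes day-first dates with a four-digit year (so `04.10.2022` is read as 4 October 2022); the names `MONTH_ABBR`, `NUMERIC_DATE` and `spell_out_month` are made up for this example, and whether spaCy then tags the result as DATE still has to be checked on the real text.

```python
import re

MONTH_ABBR = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# day, month, year separated by '.', '/' or '-'; assumes day-first order.
NUMERIC_DATE = re.compile(r'\b(\d{1,2})[./-](0[1-9]|1[012])[./-](\d{4})\b')


def spell_out_month(text):
    """Rewrite numeric dates such as 04.10.2022 as '4 Oct 2022'."""
    def _sub(match):
        day, month, year = match.groups()
        return f"{int(day)} {MONTH_ABBR[int(month) - 1]} {year}"
    return NUMERIC_DATE.sub(_sub, text)


print(spell_out_month('04.10.2022 Tesla shares dropped nearly 16%'))
# 4 Oct 2022 Tesla shares dropped nearly 16%
```

Prices such as `$265.25` are left untouched because they lack the third, four-digit component.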

In [None]:
# Run the small pipeline once and split the text into sentences
doc = nlp(raw_text)
sentences = [sent.text.strip() for sent in doc.sents]
df = pd.DataFrame({'Sentences': sentences})

#print(f"\033[1mSentence{x*250}Date{x*22}RDate{x*11}\033[0m")

In [None]:
# Load the transformer-based (RoBERTa) pipeline
trf = spacy.load('en_core_web_trf')

### spaCy Visualisation

In [None]:
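# jupyter=True displays the highlighted entities inline instead of returning the raw HTML string
displacy.render(doc, style="ent", jupyter=True)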

Rendered output (truncated in the capture): highlighted entities include `04.10.2022 Tesla` (ORG), `nearly 16%` (PERCENT) and `Elon Musk` (label cut off in the captured output).

In [None]:
#regEx1 = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)?(?:\d{2,4})'
#regEx2 = r'(\d{2,4})[\.\/\-](0[1-9]|1[012])[\.\/\-](\d{2,4})'
# Collect DATE entities per sentence from the small pipeline (nlp) and the transformer pipeline (trf)
df['NLPDates'] = df['Sentences'].apply(lambda sent: [ent.text for ent in nlp(sent).ents if ent.label_ == "DATE"])
df['TRFDates'] = df['Sentences'].apply(lambda sent: [ent.text for ent in trf(sent).ents if ent.label_ == "DATE"])
#df['RDates']=df['RDates'].str.split(",").str[0] 
#df['Dates'] = df['Sentences'].apply(lambda sent: re.findall(regEx1, sent))  
#df['Dates2'] = df['Sentences'].apply(lambda sent: re.findall(regEx2, sent))
#df['Dates'] =  df['Dates']+df['Dates2']  
#df=df.drop(['Dates2'],axis=1)
#df = pd.DataFrame(df['RDates'].values.tolist(), index=df.index)
df.head(8)




| | Sentences | NLPDates | TRFDates |
|---|---|---|---|
| 0 | 04.10.2022 Tesla shares dropped nearly 16% dur... | [7 days] | [04.10.2022, 7 days] |
| 1 | Tesla shares closed at $265.25 on Friday, Sept... | [Friday, Sept. 30] | [Friday, Sept. 30] |
| 2 | At market’s close one week later, Tesla shares... | [one week later] | [one week later] |
| 3 | It was the worst week for the stock since Mar.... | [the worst week, Mar. 2020] | [the worst week, Mar. 2020] |
| 4 | Over the weekend, Tesla reported electric vehi... | [the weekend] | [the weekend] |
| 5 | On Monday, Musk proceeded to stir up a politic... | [Monday] | [Monday] |
| 6 | After that, public records revealed that Musk ... | [months] | [Q2, months] |
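
To see at a glance where the two pipelines disagree in the table above, the two date columns can be compared row by row. The sketch below uses plain lists for three rows copied from the table so that it runs without loading any model; the helper name `trf_only` is made up for this example.

```python
def trf_only(nlp_dates, trf_dates):
    """Return the DATE strings found by the transformer pipeline but not by the small one."""
    return [d for d in trf_dates if d not in nlp_dates]


# Three rows copied from the output table: (row index, NLPDates, TRFDates)
rows = [
    (0, ['7 days'], ['04.10.2022', '7 days']),
    (5, ['Monday'], ['Monday']),
    (6, ['months'], ['Q2', 'months']),
]

missed = {idx: trf_only(n, t) for idx, n, t in rows}
print(missed)  # {0: ['04.10.2022'], 5: [], 6: ['Q2']}
```

On the notebook's DataFrame the same helper could be applied with `df.apply(lambda r: trf_only(r['NLPDates'], r['TRFDates']), axis=1)`. In these rows the small pipeline misses the numeric date `04.10.2022` and the quarter `Q2`, both of which the transformer pipeline picks up.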
