
## Temporal Information Extraction
Temporal information can be represented as a triple {T, E, R}, where T denotes temporal points, durations, or intervals, E denotes events, and R denotes the temporal relations between them.
There are three main approaches to temporal information extraction: rule-based, data-driven, and hybrid.
TempEval-1, TempEval-2, and TempEval-3 are shared tasks on temporal information extraction.
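
The {T, E, R} representation can be sketched as three small record types. This is only an illustrative data model: the class names, field names, and the `IS_INCLUDED` label are assumptions made for the example, not something defined by the survey. The example sentence is taken from the text used later in this notebook.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TimeExpr:
    """T: a temporal point, duration, or interval as it appears in the text."""
    text: str
    kind: str  # illustrative label, e.g. "DATE" or "DURATION"


@dataclass(frozen=True)
class Event:
    """E: an event mention."""
    text: str


@dataclass(frozen=True)
class Relation:
    """R: a temporal relation holding from a source to a target."""
    source: Union[Event, TimeExpr]
    rel_type: str
    target: Union[Event, TimeExpr]


# "Tesla shares closed at $265.25 on Friday, Sept. 30."
closed = Event("closed")
friday = TimeExpr("Friday, Sept. 30", "DATE")
rel = Relation(closed, "IS_INCLUDED", friday)
print(rel.source.text, rel.rel_type, rel.target.text)
```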

TimeML is a markup language for annotating events, temporal expressions, and the relations between them in text. Its main tags are:
- The EVENT tag annotates elements in a text that denote events.
- The TIMEX3 tag marks up explicit temporal expressions, such as times, dates, and durations.
- The MAKEINSTANCE tag creates a separate instance of an event when a single mention covers several occurrences, for example an event that takes place on multiple days (e.g., "He taught last Wednesday and today.").
- The TLINK (temporal link) tag represents the temporal relation between two events, between two times, or between an event and a time.
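
To make the four tags concrete, here is a small sketch that parses a hand-written TimeML-style fragment with Python's standard `xml.etree.ElementTree`. The fragment is written for illustration and is not taken from TimeBank; in particular, the `value` dates and the `IS_INCLUDED` relation types are assumptions chosen to fit the "He taught last Wednesday and today" example.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only (dates and relTypes are assumed).
SAMPLE = """<TimeML>
He <EVENT eid="e1" class="OCCURRENCE">taught</EVENT>
<TIMEX3 tid="t1" type="DATE" value="2022-10-05">last Wednesday</TIMEX3> and
<TIMEX3 tid="t2" type="DATE" value="2022-10-10">today</TIMEX3>.
<MAKEINSTANCE eiid="ei1" eventID="e1" tense="PAST" aspect="NONE" pos="VERB" polarity="POS"/>
<MAKEINSTANCE eiid="ei2" eventID="e1" tense="PAST" aspect="NONE" pos="VERB" polarity="POS"/>
<TLINK lid="l1" eventInstanceID="ei1" relatedToTime="t1" relType="IS_INCLUDED"/>
<TLINK lid="l2" eventInstanceID="ei2" relatedToTime="t2" relType="IS_INCLUDED"/>
</TimeML>"""

root = ET.fromstring(SAMPLE)

# Index each tag type by its identifier attribute.
events = {e.get("eid"): e.text for e in root.iter("EVENT")}
times = {t.get("tid"): t.text for t in root.iter("TIMEX3")}
instances = {m.get("eiid"): m.get("eventID") for m in root.iter("MAKEINSTANCE")}

# Resolve every TLINK to (event text, relation type, time text).
links = [
    (events[instances[l.get("eventInstanceID")]], l.get("relType"), times[l.get("relatedToTime")])
    for l in root.iter("TLINK")
]
for link in links:
    print(link)
```

The single EVENT mention "taught" yields two MAKEINSTANCE entries, so each day gets its own TLINK.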

- Best F1 score for TIMEX3: the rule-based system HeidelTime.
- Best F1 score for EVENT and MAKEINSTANCE: ATT-1, which uses a maximum entropy classifier.
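
For reference, an F1 score over extracted expressions can be computed as sketched below. This assumes strict matching on the expression text, which may differ from the matching scheme used in the shared-task evaluations; the `gold` and `predicted` sets are made-up examples, not results from any system.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of extracted strings."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical annotations, for illustration only.
gold = {"Friday, Sept. 30", "Mar. 2020", "Monday"}
predicted = {"Friday, Sept. 30", "Mar. 2020", "the worst week", "the weekend"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```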

TimeBank is a publicly available annotated corpus for this task.

In [51], a method for extracting temporal relations between two events was proposed. It has two stages: (1) a machine-learning model that classifies event attributes (tense, aspect, modality, polarity, and event class), and (2) a machine-learning model that classifies the relation type between two events. Experiments on TimeBank showed that Naive Bayes (NB) generally outperforms maximum entropy (ME). (https://aclanthology.org/P07-2044/)

In [61], WikiWars, a new corpus for the extraction of temporal expressions, was introduced. (https://aclanthology.org/D10-1089/)

In SemEval-2018, ‘Task 6: Parsing Time Normalizations’ was held as a shared task on temporal information extraction [93].





### SPACY

In [111]:
# Load SPACY 
import spacy
nlp=spacy.load('en_core_web_sm')
import pandas as pd
import numpy as np
import re

In [251]:
# Load Data & Temporal Extraction
raw_text = '2022.07.01 Tesla shares dropped nearly 16% during what CEO Elon Musk called a “very intense 7 days indeed” to one of his 108 million followers on Twitter. Tesla shares closed at $265.25 on Friday, Sept. 30. At market’s close one week later, Tesla shares were trading at $223.07, a decline of nearly 16%. It was the worst week for the stock since Mar. 2020, when the Covid-19 pandemic began to grip the U.S., shutting down businesses and public life. Over the weekend, Tesla reported electric vehicle production and delivery numbers that did not meet analysts’ expectations. On Monday, Musk proceeded to stir up a political firestorm by opining about how he thought Russia’s brutal invasion of Ukraine should be resolved. After that, public records revealed that Musk had informed the Delaware Chancery Court that he would complete a $44 billion acquisition of Twitter in 10/10/2022 October, a deal he had been trying to evade for months.'
doc = nlp(raw_text)
sentences = [sent.text.strip() for sent in doc.sents]
df = pd.DataFrame()
df['Sentences']= sentences
#print(f"\033[1mSentence{x*250}Date{x*22}RDate{x*11}\033[0m")
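# regEx1: month-name dates such as "Sept. 30" or "Mar. 2020".
# Caveat: [-/th|st|nd|rd\s] is a character class (any one of those characters), not an alternation of ordinal suffixes.
regEx1 = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)?(?:\d{2,4})'
# regEx2: numeric dates such as "2022.07.01" or "10/10/2022"; the capture groups make findall return tuples.
regEx2 = r'(\d{2,4})[\.\/\-](0[1-9]|1[012])[\.\/\-](\d{2,4})'
# RDates: spaCy entities labelled DATE, which include relative expressions such as "one week later".
df['RDates'] = df['Sentences'].apply(lambda sent: [ent.text for ent in nlp(sent).ents if ent.label_ == "DATE"])
# Dates: explicit dates matched by either regular expression.
df['Dates'] = df['Sentences'].apply(lambda sent: re.findall(regEx1, sent) + re.findall(regEx2, sent))
# Not working: each RDates cell is a list rather than a string, so .str.replace cannot strip the regex matches.
# df['RDates'] = df['RDates'].str.replace(regEx1, '')
df.head(8)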

| | Sentences | RDates | Dates |
|---|---|---|---|
| 0 | 2022.07.01 Tesla shares dropped nearly 16% dur... | [7 days] | [(2022, 07, 01)] |
| 1 | Tesla shares closed at $265.25 on Friday, Sept... | [Friday, Sept. 30] | [Sept. 30] |
| 2 | At market’s close one week later, Tesla shares... | [one week later] | [] |
| 3 | It was the worst week for the stock since Mar.... | [the worst week, Mar. 2020] | [Mar. 2020] |
| 4 | Over the weekend, Tesla reported electric vehi... | [the weekend] | [] |
| 5 | On Monday, Musk proceeded to stir up a politic... | [Monday] | [] |
| 6 | After that, public records revealed that Musk ... | [10/10/2022 October, months] | [(10, 10, 2022)] |
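
The bracketed part `[-/th|st|nd|rd\s]` in regEx1 is a character class, so it matches any single one of those characters rather than an ordinal suffix, and the capture groups in regEx2 are why the numeric dates come back as tuples. The sketch below shows one possible alternative that uses an explicit suffix alternation and non-capturing groups so that `findall` returns plain strings. The names `MONTH`, `ORDINAL`, `named_date`, `numeric_date`, and `extract_dates` are introduced here for illustration, the sentences are shortened from the raw text above, and the patterns cover only the date shapes that appear in it.

```python
import re

MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
ORDINAL = r"(?:st|nd|rd|th)?"

# Three shapes of month-name date, tried in order.
named_date = re.compile(
    rf"\b(?:\d{{1,2}}{ORDINAL}\s+{MONTH}(?:,?\s+\d{{4}})?"     # e.g. 1st October 2022
    rf"|{MONTH}\s+\d{{1,2}}{ORDINAL}(?!\d)(?:,?\s+\d{{4}})?"    # e.g. Sept. 30
    rf"|{MONTH}\s+\d{{4}})\b"                                     # e.g. Mar. 2020
)
# Numeric dates whose middle field is a month, e.g. 2022.07.01 or 10/10/2022.
numeric_date = re.compile(r"\b\d{2,4}[./-](?:0[1-9]|1[0-2])[./-]\d{2,4}\b")


def extract_dates(sentence):
    """Return all explicit dates in a sentence as plain strings."""
    return named_date.findall(sentence) + numeric_date.findall(sentence)


examples = [
    "2022.07.01 Tesla shares dropped nearly 16%.",
    "Tesla shares closed at $265.25 on Friday, Sept. 30.",
    "It was the worst week for the stock since Mar. 2020.",
    "He would complete the acquisition in 10/10/2022 October.",
]
for sentence in examples:
    print(extract_dates(sentence))
```

Prices such as `$265.25` are not picked up because the middle field of `numeric_date` must be a valid month number followed by another separator.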


### BERT INITIATION

In [None]:
import tensorflow as tf

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

In [None]:
!pip install pytorch-pretrained-bert pytorch-nlp

In [4]:
import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from pytorch_pretrained_bert import BertTokenizer, BertConfig
from pytorch_pretrained_bert import BertAdam, BertForSequenceClassification
from tqdm import tqdm, trange
import pandas as pd
import io
import numpy as np
import matplotlib.pyplot as plt

In [None]:
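device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n_gpu = torch.cuda.device_count()
# Query the device name only when a GPU is present; get_device_name(0) raises on CPU-only machines.
if n_gpu > 0:
  print(torch.cuda.get_device_name(0))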

## ARCHIVED CODE

In [None]:
#skip
 
sentence = "Tesla shares dropped nearly 16% during what CEO Elon Musk called a “very intense 7 days indeed” to one of his 108 million followers on Twitter. Tesla shares closed at $265.25 on Friday, Sept. 30. At market’s close one week later, Tesla shares were trading at $223.07, a decline of nearly 16%. It was the worst week for the stock since Mar. 2020, when the Covid-19 pandemic began to grip the U.S., shutting down businesses and public life. Over the weekend, Tesla reported electric vehicle production and delivery numbers that did not meet analysts’ expectations. On Monday, Musk proceeded to stir up a political firestorm by opining about how he thought Russia’s brutal invasion of Ukraine should be resolved. After that, public records revealed that Musk had informed the Delaware Chancery Court that he would complete a $44 billion acquisition of Twitter in October, a deal he had been trying to evade for months."
x=' '
doc=nlp(sentence)
print(f"\033[1mText{x*22}Label{x*11}\033[0m")
for entities in doc.ents:
  if entities.label_ == "DATE":
    print(f"{entities.text:<25} {entities.label_:<15}")

In [None]:
# Step Three: Load Data & Temporal Extraction
raw_text = '2022.07.01 Tesla shares dropped nearly 16% during what CEO Elon Musk called a “very intense 7 days indeed” to one of his 108 million followers on Twitter. Tesla shares closed at $265.25 on Friday, Sept. 30. At market’s close one week later, Tesla shares were trading at $223.07, a decline of nearly 16%. It was the worst week for the stock since Mar. 2020, when the Covid-19 pandemic began to grip the U.S., shutting down businesses and public life. Over the weekend, Tesla reported electric vehicle production and delivery numbers that did not meet analysts’ expectations. On Monday, Musk proceeded to stir up a political firestorm by opining about how he thought Russia’s brutal invasion of Ukraine should be resolved. After that, public records revealed that Musk had informed the Delaware Chancery Court that he would complete a $44 billion acquisition of Twitter in 10/10/2022 October, a deal he had been trying to evade for months.'
doc = nlp(raw_text)
sentences = [sent.text.strip() for sent in doc.sents]
print(f"\033[1mSentence{x*250}Date{x*22}RDate{x*22}\033[0m")
regEx1 = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)?(?:\d{2,4})'
regEx2 = r'(\d{2,4})[\.\/\-](0[1-9]|1[012])[\.\/\-](\d{2,4})'
for s in sentences:
  doc=nlp(s)
  datef1=re.findall(regEx1, s)
  datef2=re.findall(regEx2, s)
  for entities in doc.ents:
    if entities.label_ == "DATE":
      print(f"{s:<250} {entities.text:<22} ")
      print(datef1)
      print(datef2)

In [None]:
regEx1 = r'(?:\d{1,2}[-/th|st|nd|rd\s]*)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z\s,.]*(?:\d{1,2}[-/th|st|nd|rd)\s,]*)?(?:\d{2,4})'
for s in sentences:
  new=re.findall(regEx1, s)
  print(new)

regEx2 = r'(\d{2,4})[\.\/\-](0[1-9]|1[012])[\.\/\-](\d{2,4})'
for s in sentences:
  new=re.findall(regEx2, s)
  print(new)
