# Filter temporal answers

To conduct temporal error analysis, I need to focus on expected answers that are temporal, e.g., dates, times, or durations.

## Remove non-temporal answer types using the authors' script
From the head dataset, I remove non-temporal questions. An in-depth analysis can be found in a [quarto report](../quarto_reports/temporal_eval.qmd). I used a script by the authors that classifies answer types, and I dropped the types listed in the cell below. In the quarto report, I double-checked that the removed answer types are indeed non-temporal, and they are.

In [24]:
import pandas as pd

answer_types_to_drop = [
    "PERSON",
    "ORGANIZATION",
    "PLACE",
    "YES/NO",
    "PERCENTAGE",
    "MONEY",
    "PRODUCT",
    "ORDINAL",
]
head_dataset = (
    pd.read_csv("../../data/maindata/qapairs/head-set/head-set_analysis.csv")
    .loc[:, ["question", "answer", "table_id", "answer_type"]]
    .query("~answer_type.isin(@answer_types_to_drop)")
)
head_dataset.shape

(1567, 4)

Next, I flag rows that are probably temporal with a keyword heuristic on the question and answer text.

In [28]:
prob_temp = head_dataset.query(
    "answer.str.lower().str.contains('year') "
    "or answer.str.lower().str.contains('month') "
    "or answer.str.lower().str.contains('day') "
    "or question.str.lower().str.contains('when') "
    "or question.str.lower().str.contains('year') "
    "or question.str.lower().str.contains('what age') "
    "or question.str.lower().str.contains('how long') "
    "or question.str.lower().str.contains('how many days') "
    "or question.str.lower().str.contains('how many full months') "
    "or question.str.lower().str.contains('how many months') "
    "or question.str.lower().str.contains('what is the date') "
    "or question.str.lower().str.contains('what was the opening date') "
    "or question.str.lower().str.contains('what was the release date') "
    "or question.str.lower().str.contains('what was the time span') "
    "or question.str.lower().str.contains('if mars express begins')"
)
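
The keyword heuristic above could also be written as one regular expression per column, which keeps the keyword lists easy to extend. The following is only a minimal sketch under stated assumptions: the toy rows are invented for illustration and are not taken from the head dataset, the keyword lists are shortened, and `contains_any` is a hypothetical helper, not part of the authors' script.

```python
import re

import pandas as pd

# Hypothetical toy rows, invented for illustration (not from the head dataset).
toy = pd.DataFrame(
    {
        "question": [
            "When was the stadium opened?",
            "Which team scored the most goals?",
            "How long did the mission last?",
            "What is the total attendance?",
        ],
        "answer": ["1998", "Ajax", "3 years", "45000"],
    }
)

# Shortened keyword lists; extend them to mirror the full query if needed.
answer_keywords = ["year", "month", "day"]
question_keywords = ["when", "year", "what age", "how long", "how many days"]


def contains_any(series, keywords):
    """Return True where the lower-cased text contains any keyword as a substring."""
    pattern = "|".join(re.escape(keyword) for keyword in keywords)
    # na=False treats missing text as "no match" instead of propagating NaN.
    return series.str.lower().str.contains(pattern, regex=True, na=False)


toy["is_temporal"] = contains_any(toy["answer"], answer_keywords) | contains_any(
    toy["question"], question_keywords
)
```

Like the original query, this matches substrings, so a keyword such as `day` would also flag words like "holiday". That is one reason the result is treated as a probable label rather than a final one.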

In [36]:
head_dataset.loc[:, "is_temporal"] = False
head_dataset.loc[prob_temp.index, "is_temporal"] = True
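
The two assignments above rely on index labels: `prob_temp` keeps the original row labels of `head_dataset`, so the matched rows can be found again by label. A one-step variant is sketched below on a hypothetical toy frame; `toy` and `matched` are illustrative stand-ins, not objects from this notebook.

```python
import pandas as pd

# Hypothetical toy frame with a non-contiguous index, as left behind by filtering.
toy = pd.DataFrame({"answer": ["1998", "Ajax", "3 years"]}, index=[2, 5, 9])

# Stand-in for the keyword-matched subset (the role `prob_temp` plays above).
matched = toy.loc[[2, 9]]

# One-step equivalent of "set everything to False, then set matched rows to True":
# a row is temporal exactly when its index label appears in the matched subset.
toy["is_temporal"] = toy.index.isin(matched.index)
```

Because the comparison uses labels rather than positions, it stays correct even though filtering has left gaps in the index.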

In [39]:
head_dataset.to_csv("temp_head_labeling.csv", index=False)