# What

As per issue https://github.com/1jamesthompson1/TAIC-report-summary/issues/181

Extracting the event type for each report would add a useful summary data point to the table produced by the safety issue extraction.

## Modules

In [None]:
# Local
import engine.gather.WebsiteScraping as WebsiteScraping

# Third party
import pandas as pd

# Built-in
import os

## Problem

We need to assign an event type to every report ID.

The goal is to do this by reading only the report title, which can always be found on the TAIC investigation website.

Each mode has a given list of event types, and each report title should be pigeonholed into one of the event types for its mode.
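
As a rough illustration of what "pigeonholing" a title means here, the sketch below restricts the candidates to the report's mode and picks the first event type whose name appears in the title. This is not the assignment method used in this project (that is not shown in this notebook). The event type lists, the example titles and the `assign_event_type` helper are all hypothetical stand-ins.

```python
from typing import Optional

# Hypothetical, abbreviated event type lists. The real lists come from the
# CSV files loaded in the next section.
EVENT_TYPES_BY_MODE = {
    "marine": ["Collision", "Grounding", "Fire"],
    "aviation": ["Loss of control", "Engine failure"],
    "rail": ["Derailment", "Collision"],
}


def assign_event_type(title: str, mode: str) -> Optional[str]:
    """Return the first event type for `mode` whose name occurs in `title`.

    Naive substring matching, used only to illustrate the constraint that a
    title may only be assigned an event type belonging to its own mode.
    """
    lowered = title.lower()
    for event_type in EVENT_TYPES_BY_MODE[mode]:
        if event_type.lower() in lowered:
            return event_type
    return None  # no event type name appears in the title


# Invented titles, not real report titles.
rail_result = assign_event_type("Derailment of freight train near Example Station", "rail")
marine_result = assign_event_type("Passenger ferry grounding, Example Harbour", "marine")
unmatched_result = assign_event_type("Helicopter hard landing", "aviation")
```

Plain substring matching leaves many titles unmatched (the last example returns `None`), which is one reason a real assignment step would likely need something more flexible.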

# Collecting event types data

In [None]:
marine_event_types = pd.read_csv(os.path.join("data", "MarineOccurrenceCategory_ValueListItem.csv"))
aviation_event_types = pd.read_csv(os.path.join("data", "OccurrenceCategoryLevel123_ValueListItem.csv"))
rail_event_types = pd.read_csv(os.path.join("data", "OCG1CategoryLevel1_ValueListItem.csv"))
display(marine_event_types)
display(aviation_event_types)
display(rail_event_types)

In [None]:
aviation_event_types_level2 = aviation_event_types.query("ValueListName == 'OccurrenceCategoryLevel2'")[['ValueListName', 'Value']]
aviation_event_types_level3 = aviation_event_types.query("ValueListName == 'OccurrenceCategoryLevel3'")[['ValueListName', 'Value']]
display(aviation_event_types_level2)
display(aviation_event_types_level3)

In [None]:
print(rail_event_types['Value'].tolist())

In [None]:
all_event_types = pd.concat(
    [
        marine_event_types.assign(mode="marine"),
        aviation_event_types_level2.assign(mode="aviation"),
        rail_event_types.assign(mode="rail"),
    ],
    ignore_index=True,
    axis=0
).drop(columns=["ValueListName"])

all_event_types

In [None]:
all_event_types.groupby("mode").count()

In [None]:
all_event_types.to_csv("../../data/event_types.csv")

# Getting report titles via web scraping

In [None]:
report_titles = pd.read_pickle("../../output/report_titles.pkl")
report_titles

# Assigning event types to reports

To test the event type assignment properly, I need a sample of report titles to use as test data.

In [None]:
report_titles.sample(frac=0.1, random_state=42).to_pickle("../../tests/data/report_titles.pkl")

In [None]:
assigned_reports = pd.read_pickle("../../output/report_event_types.pkl")
assigned_reports
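
One sanity check worth running on the assigned reports is that every assigned event type is actually in the permitted list for that report's mode. The sketch below shows the idea on toy data. The `Value` and `mode` columns mirror `all_event_types` above, but the `report_id` and `event_type` column names and all of the rows are assumptions, since the real structure of `report_event_types.pkl` is not shown here.

```python
import pandas as pd

# Toy stand-in for all_event_types (same column names as above).
allowed = pd.DataFrame(
    {
        "Value": ["Collision", "Grounding", "Derailment"],
        "mode": ["marine", "marine", "rail"],
    }
)

# Toy stand-in for assigned_reports; column names are hypothetical.
assigned = pd.DataFrame(
    {
        "report_id": ["r1", "r2", "r3"],
        "mode": ["marine", "rail", "rail"],
        "event_type": ["Grounding", "Derailment", "Grounding"],
    }
)

# Left-join the assignments onto the allowed (mode, event type) pairs.
# Rows marked "left_only" have an event type that is not valid for their mode.
allowed_pairs = allowed.rename(columns={"Value": "event_type"}).drop_duplicates()
checked = assigned.merge(
    allowed_pairs, how="left", on=["mode", "event_type"], indicator=True
)
invalid = checked.loc[
    checked["_merge"] == "left_only", ["report_id", "mode", "event_type"]
]
```

In this toy data `invalid` holds only `r3`, because "Grounding" is a marine event type that was assigned to a rail report.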