We have Inspection Reports (IRs), which are documents that describe Human Errors in Nuclear Power Plants.
Every sentence in a document can be labelled with a type of Human Error.
The types of Human Errors are:
T: Team Cognition
P: Procedural
O: Organizational
D: Design
H: Human
The sentences in IR documents need to be classified by Human Error type.
We have 3 coders: Sally, Trixy and Frenard. They have labelled the sentences of various Inspection Reports (IRs). Sentences that do not belong to any type of Human Error were labelled U: useless. Sentences that were labelled with one of the Human Error types will be referred to as useful.
The entire text of an IR file is split on periods (.) into sentences. Every IR .txt file is converted into a .csv file with the columns: text, line, start_pos, end_pos, file, label. Each row gives, respectively, a sentence of the IR, the line number on which that sentence lies, the position where the sentence starts, the position where it ends, the filename, and the label assigned by the coders.
Because there are 3 different coders, we consider 9 'Coder Types'. Each 'Coder Type' defines a different 'Dataset Type':
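The conversion described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the function names `split_into_rows` and `rows_to_csv` are hypothetical, and it assumes that `start_pos` and `end_pos` are character offsets into the whole file and that `line` is the 1-based line on which the sentence starts. The `label` column is left empty for the coders to fill in.

```python
import csv
import io

FIELDS = ["text", "line", "start_pos", "end_pos", "file", "label"]


def _append_row(rows, text, start, end, filename):
    """Add one sentence row for text[start:end], skipping empty chunks."""
    chunk = text[start:end]
    stripped = chunk.strip()
    if not stripped:
        return
    # Shift the start past any leading whitespace so offsets match the text.
    begin = start + (len(chunk) - len(chunk.lstrip()))
    rows.append({
        "text": stripped,
        "line": text.count("\n", 0, begin) + 1,  # assumption: 1-based line
        "start_pos": begin,                      # assumption: file offset
        "end_pos": begin + len(stripped),
        "file": filename,
        "label": "",                             # assigned later by a coder
    })


def split_into_rows(text, filename):
    """Split the text on periods and return one row per sentence."""
    rows = []
    start = 0
    for i, ch in enumerate(text):
        if ch == ".":
            _append_row(rows, text, start, i + 1, filename)
            start = i + 1
    # Keep any trailing text that does not end with a period.
    _append_row(rows, text, start, len(text), filename)
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with the six columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Note that splitting on every period also splits decimal numbers and abbreviations; that is a consequence of the rule as stated, not something this sketch tries to correct.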
- Sally - for all useful sentences labelled by Sally
- Trixy - for all useful sentences labelled by Trixy
- Frenard - for all useful sentences labelled by Frenard
- Sally_Trixy_Agree - for all useful sentences labelled by Sally and Trixy where they gave same labels
- Sally_Frenard_Agree - for all useful sentences labelled by Sally and Frenard where they gave same labels
- Sally_Trixy_DisAgree - for all useful sentences labelled by Sally where Sally and Trixy gave different labels
- Trixy_Sally_DisAgree - for all useful sentences labelled by Trixy where Sally and Trixy gave different labels
- Sally_Frenard_DisAgree - for all useful sentences labelled by Sally where Sally and Frenard gave different labels
- Frenard_Sally_DisAgree - for all useful sentences labelled by Frenard where Sally and Frenard gave different labels
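The agree and disagree 'Dataset Types' listed above can be derived from two coders' labels roughly as follows. This is a sketch under stated assumptions: each coder's labels are held in a dictionary keyed by a sentence identifier, the helper names `useful`, `agree` and `disagree` are hypothetical, and a sentence counts as a disagreement whenever the second coder gave any different label, including U. The source does not say how U from the second coder is treated, so that rule may need adjusting.

```python
USELESS = "U"


def useful(labels):
    """Keep only sentences whose label is a Human Error type."""
    return {key: lab for key, lab in labels.items() if lab != USELESS}


def agree(first, second):
    """Useful sentences to which both coders gave the same label."""
    second_useful = useful(second)
    return {key: lab for key, lab in useful(first).items()
            if second_useful.get(key) == lab}


def disagree(first, second):
    """Useful sentences labelled by `first` where `second` differed.

    Assumption: the sentence must also have been labelled by `second`,
    and a U from `second` counts as a different label.
    """
    return {key: lab for key, lab in useful(first).items()
            if key in second and second[key] != lab}


# Hypothetical labels keyed by sentence id.
sally = {1: "T", 2: "P", 3: "U", 4: "H"}
trixy = {1: "T", 2: "O", 3: "D", 4: "U"}

sally_trixy_agree = agree(sally, trixy)
sally_trixy_disagree = disagree(sally, trixy)
trixy_sally_disagree = disagree(trixy, sally)
```

The two disagree sets cover different sentences and carry different labels, which is why each pair of coders yields two separate disagree datasets.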
Kappa and Alpha scores are used as metrics of Inter-coder Reliability. They are calculated between Sally and Trixy and between Sally and Frenard, in order to compare which pair of coders has better Inter-coder Reliability.
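As an illustration of the first metric, the sketch below computes a kappa score for two coders over the same sentences. It assumes that "Kappa" means Cohen's kappa, which the source does not state, and the function name `cohen_kappa` is hypothetical. Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each coder's label frequencies. The Alpha score is not shown here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of sentences given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[lab] * count_b[lab] for lab in count_a) / (n * n)
    if p_e == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

A score of 1 means perfect agreement and a score of 0 means agreement no better than chance, so the coder pair with the higher score is the more reliable one under this metric.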
This classification model includes: GloVe + CNN + LSTM.
This classification model includes: pre-trained Google Word2Vec + CNN + LSTM.
This classification model includes: self-trained Word2Vec + CNN + LSTM.
This classification model implements BERT using TensorFlow 1.x.
This classification model implements ULMFiT using fastai.
All the classification results of the above models are compiled and compared.
We have implemented different Classification Models and have compared their results for different 'Dataset Types'.
All the above Classification Models are used in:
This is done to observe how accurately a model predicts whether a sentence is useless or useful. It uses the Dataset that contains all the sentences labelled by Sally. An equal number of useless and useful sentences is taken in this Dataset.
This is done to observe how accurately a model predicts the Human Error type of a sentence. It is done on all the 'Dataset Types' for the 'Coder Types' mentioned above.
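One way to build such a balanced Dataset is sketched below. The source does not say how the equal numbers are obtained, so this assumes random downsampling of the larger class; the function name `balanced_binary_dataset` and the `seed` parameter are hypothetical. Each row is expected to carry the `text` and `label` columns of the .csv files.

```python
import random


def balanced_binary_dataset(rows, seed=0):
    """Return (text, 'useless' | 'useful') pairs with equal class sizes."""
    useless = [row for row in rows if row["label"] == "U"]
    useful = [row for row in rows if row["label"] != "U"]
    # Assumption: downsample the larger class to the size of the smaller.
    size = min(len(useless), len(useful))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    chosen = rng.sample(useless, size) + rng.sample(useful, size)
    rng.shuffle(chosen)
    return [(row["text"], "useless" if row["label"] == "U" else "useful")
            for row in chosen]
```

Collapsing the five Human Error types into a single useful class turns the task into binary classification, which any of the models above can be trained on.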