Neptune Classification

We have Inspection Reports (IRs), documents that describe Human Errors in Nuclear Power Plants.
Every sentence in an IR can be labelled with a type of Human Error.

The types of Human Errors are:
T: Team Cognition
P: Procedural
O: Organizational
D: Design
H: Human

The goal is to classify each sentence of an IR document into one of these Human Error types.

We have 3 coders: Sally, Trixy and Frenard. They have labelled the sentences of various Inspection Reports (IRs). Sentences that do not belong to any type of Human Error were labelled U: useless. Sentences that were labelled with one of the Human Error types are referred to as useful.
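The label scheme above can be written down as a small lookup, which is a minimal sketch rather than code from this repository. The names `ERROR_TYPES`, `USELESS` and `is_useful` are hypothetical and chosen only for illustration.

```python
# Label codes as listed in this README.
# The constant and function names are illustrative assumptions.
ERROR_TYPES = {
    "T": "Team Cognition",
    "P": "Procedural",
    "O": "Organizational",
    "D": "Design",
    "H": "Human",
}
USELESS = "U"  # sentences that carry no Human Error type


def is_useful(label: str) -> bool:
    """Return True when the label is one of the five Human Error types."""
    return label in ERROR_TYPES
```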

Dataset

The entire text of an IR file is split into sentences at periods (.). Every IR .txt file is converted into a .csv file with the columns: text, line, start_pos, end_pos, file, label. Each row describes one sentence of the IR: the sentence text, the line number on which it lies, the position where it starts, the position where it ends, the filename, and the label assigned by the coders, respectively.
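The conversion described above could look roughly like the following sketch. It is not the repository's actual script: the function names are hypothetical, and it assumes that start_pos and end_pos are character offsets into the whole file and that line is the 1-based line on which the sentence starts. The label column is left empty because labels come from the coders.

```python
import csv
import io

FIELDS = ["text", "line", "start_pos", "end_pos", "file", "label"]


def make_row(text, start, end, filename):
    """Build one CSV row for text[start:end], trimming surrounding whitespace."""
    raw = text[start:end]
    stripped = raw.strip()
    if not stripped:
        return None
    begin = start + (len(raw) - len(raw.lstrip()))
    return {
        "text": stripped,
        "line": text.count("\n", 0, begin) + 1,  # assumed: 1-based line of sentence start
        "start_pos": begin,                      # assumed: offset into the whole file
        "end_pos": begin + len(stripped),
        "file": filename,
        "label": "",                             # filled in later by the coders
    }


def split_into_rows(text, filename):
    """Split text at periods and return one row per sentence."""
    rows, start = [], 0
    for i, ch in enumerate(text):
        if ch == ".":
            row = make_row(text, start, i + 1, filename)
            if row:
                rows.append(row)
            start = i + 1
    tail = make_row(text, start, len(text), filename)  # text after the last period
    if tail:
        rows.append(tail)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with the column order used in this README."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Splitting on every period also breaks at abbreviations and decimal numbers, which is a known limitation of this simple rule.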

Because there are 3 different coders, we consider 9 'Coder Types'. Each 'Coder Type' defines a different type of Dataset, so the 'Dataset Types' depend on the 'Coder Types':

Sally - for all useful sentences labelled by Sally
Trixy - for all useful sentences labelled by Trixy
Frenard - for all useful sentences labelled by Frenard
Sally_Trixy_Agree - for all useful sentences labelled by Sally and Trixy where they gave same labels
Sally_Frenard_Agree - for all useful sentences labelled by Sally and Frenard where they gave same labels
Sally_Trixy_DisAgree - for all useful sentences labelled by Sally where Sally and Trixy gave different labels
Trixy_Sally_DisAgree - for all useful sentences labelled by Trixy where Sally and Trixy gave different labels
Sally_Frenard_DisAgree - for all useful sentences labelled by Sally where Sally and Frenard gave different labels
Frenard_Sally_DisAgree - for all useful sentences labelled by Frenard where Sally and Frenard gave different labels
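The nine 'Dataset Types' listed above can be derived from per-coder labels with three small filters. This is a sketch under stated assumptions, not the repository's code: each coder's labels are assumed to be a dict from a sentence id to a label code, the function names are hypothetical, and a "different label" in the DisAgree sets is assumed to include the case where the second coder marked the sentence U.

```python
USELESS = "U"


def useful(labels):
    """Keep only sentences labelled with a Human Error type (e.g. the 'Sally' set)."""
    return {sid: lab for sid, lab in labels.items() if lab != USELESS}


def agree(labels_a, labels_b):
    """Useful sentences to which both coders gave the same label (e.g. Sally_Trixy_Agree)."""
    useful_b = useful(labels_b)
    return {sid: lab for sid, lab in useful(labels_a).items()
            if useful_b.get(sid) == lab}


def disagree(labels_a, labels_b):
    """Useful sentences of coder A where coder B gave a different label.

    The returned labels are coder A's, so disagree(sally, trixy) corresponds to
    Sally_Trixy_DisAgree and disagree(trixy, sally) to Trixy_Sally_DisAgree.
    """
    return {sid: lab for sid, lab in useful(labels_a).items()
            if sid in labels_b and labels_b[sid] != lab}
```

For example, with `sally = {1: "T", 2: "P"}` and `trixy = {1: "T", 2: "O"}`, `agree(sally, trixy)` keeps sentence 1 and `disagree(sally, trixy)` keeps sentence 2 with Sally's label P.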

Prerequisites

Project Structure

1. InterCoderReliability

Kappa and Alpha scores are used as metrics of Inter-coder Reliability. They are calculated between Sally and Trixy and between Sally and Frenard, and then compared to see which pair of coders has the better Inter-coder Reliability.
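As an illustration of the Kappa score, the sketch below computes Cohen's kappa for two coders over the same sentences. The README does not say which Kappa variant is used, so Cohen's kappa is an assumption here, and the function name is hypothetical. The Alpha score is not covered by this sketch.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences over the same sentences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of sentences with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each coder's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance, so the pair of coders with the higher score is the more reliable pair.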

2. ClassificationModel_GloVe

This classification model includes: GloVe + CNN + LSTM.

3. ClassificationModel_GoogleW2V

This classification model includes: pre-trained Google Word2Vec + CNN + LSTM.

4. ClassificationModel_TrainedW2V

This classification model includes: self-trained Word2Vec + CNN + LSTM.

5. ClassificationModel_BERT

This classification model implements BERT using TensorFlow 1.x.

6. ClassificationModel_ULMFit

This classification model implements ULMFiT using fastai.

7. Classification_Results_Compilations

The classification results of all the above models are compiled and compared.

Classification

We have implemented different Classification Models and compared their results for the different 'Dataset Types'.
All the above Classification Models are used for:

1. Binary Classification:

This is done to observe how accurately a model predicts whether a sentence is useless or useful. It uses the Dataset that contains all the sentences labelled by Sally. An equal number of useless and useful sentences is taken in this Dataset.
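One way to build such a balanced Dataset is to downsample the larger class, as in the sketch below. This is an assumption about the procedure, not the repository's code: the function name, the fixed seed and the 1/0 encoding of useful/useless are all hypothetical. Rows are assumed to be dicts with the text and label columns described in the Dataset section.

```python
import random


def balanced_binary(rows, seed=0):
    """Return shuffled (text, target) pairs with equal useful (1) and useless (0) counts."""
    useful = [r for r in rows if r["label"] != "U"]
    useless = [r for r in rows if r["label"] == "U"]
    k = min(len(useful), len(useless))  # size of the smaller class
    rng = random.Random(seed)           # fixed seed so the sample is reproducible
    pairs = [(r["text"], 1) for r in rng.sample(useful, k)]
    pairs += [(r["text"], 0) for r in rng.sample(useless, k)]
    rng.shuffle(pairs)
    return pairs
```

Balancing the classes keeps plain accuracy meaningful, since a model that always predicts the majority class would otherwise score well.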

2. Multi-class Classification:

This is done to observe how accurately a model predicts the type of Human Error of a sentence. It is run on all the 'Dataset Types' for the 'Coder Types' mentioned above.

UnitForDataScience/Neptune-Classification
