evaluator

Python script for evaluating classification tasks against ground truth.

Usage

usage: evaluator.py [-h] -p PREDICTION_FILE -g GROUND_TRUTH_FILE
                    [-b BINARY_PREDICTION_FILE] [-bw] [--icd] [-t] [-w]

Evaluate the predictions against ground truth.

optional arguments:
  -h, --help            show this help message and exit
  -p PREDICTION_FILE, --prediction_file PREDICTION_FILE
                        Predictions file.
  -g GROUND_TRUTH_FILE, --ground_truth_file GROUND_TRUTH_FILE
                        Ground truth file.
  -b BINARY_PREDICTION_FILE, --binary_prediction_file BINARY_PREDICTION_FILE
                        Binary prediction to filter the prediction_file.
  -bw, --binary_weka    Binary prediction is in Weka format.
  --icd                 Whether the evaluation is using ICD code; in which
                        case consider only first three characters (CXX).
                        Default is False
  -t, --tex             Print results in LaTeX table format.
  -w, --weka            Predictions are in weka format

Inputs

The format of the predictions file is:

docId[tab]classification1[tab]classification2[tab]...[tab]classificationN

i.e., one or more classifications may be given per document. If any one of them matches the ground truth, the document is counted as correct.

The ground truth file is simply:

docId[tab]classification
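The two formats and the "any one is correct" rule can be sketched as below. This is an illustrative sketch only, not the code in evaluator.py: the helper names `read_predictions`, `read_ground_truth`, and `is_correct` are hypothetical, and the sample rows are made up for the demonstration.

```python
import csv
import io


def read_predictions(handle):
    """Map each docId to its list of predicted classes (hypothetical helper)."""
    rows = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1:] for row in rows if row}


def read_ground_truth(handle):
    """Map each docId to its single ground-truth class (hypothetical helper)."""
    rows = csv.reader(handle, delimiter="\t")
    return {row[0]: row[1] for row in rows if row}


def is_correct(predicted, actual):
    """A document counts as correct if any predicted class equals the truth."""
    return actual in predicted


# In-memory stand-ins for the two tab-separated files (illustrative data).
predictions = read_predictions(io.StringIO("82922\tFlu\tother\n279222\tother\n"))
truth = read_ground_truth(io.StringIO("82922\tFlu\n279222\tFlu\n"))

for doc_id, actual in truth.items():
    print(doc_id, actual, is_correct(predictions.get(doc_id, []), actual))
```

Here a document with no row in the predictions file is treated as having no predictions, which is an assumption of this sketch rather than documented behaviour.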

The binary_prediction_file is used to filter the predictions file according to a binary "0" or "1" label. Its format is:

docId[tab]classification

where classification = 0 or 1
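One way such a filter could work is sketched below. The README does not say which label is retained, so this sketch assumes that documents labelled "1" are kept and all others are dropped. The function `filter_predictions` and its `keep_label` parameter are hypothetical and not taken from evaluator.py.

```python
def filter_predictions(predictions, binary_labels, keep_label="1"):
    """Keep only documents whose binary label equals keep_label.

    ASSUMPTION: "1" means "keep"; the README does not specify this.
    Documents missing from binary_labels are dropped in this sketch.
    """
    return {
        doc_id: classes
        for doc_id, classes in predictions.items()
        if binary_labels.get(doc_id) == keep_label
    }


# Illustrative data only.
predictions = {"82922": ["Flu"], "1475": ["Flu"], "380770": ["other"]}
binary_labels = {"82922": "1", "1475": "0", "380770": "1"}

filtered = filter_predictions(predictions, binary_labels)
print(sorted(filtered))
```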

The --icd flag is for ICD codes where only the first three characters matter, e.g., CXX; only that prefix is considered. The default is false (i.e., not ICD).
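A minimal sketch of comparing codes on their first three characters follows. The helpers `icd_prefix` and `icd_match` are hypothetical, and the example codes are illustrative. Whether evaluator.py also normalises case or whitespace is not stated in this README, so this sketch only strips surrounding whitespace.

```python
def icd_prefix(code, length=3):
    """Return the leading characters of an ICD code (hypothetical helper)."""
    return code.strip()[:length]


def icd_match(predicted, actual):
    """Compare two ICD codes on their three-character prefix only."""
    return icd_prefix(predicted) == icd_prefix(actual)


print(icd_match("C34.1", "C349"))  # both reduce to "C34"
print(icd_match("C50", "C34"))     # prefixes differ
```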

Outputs

The program will print the individual predictions for each docId in the following format:

docId[tab]GroundTruthValue[tab]classification1[tab]...[tab]classificationN

A * character is appended to a classification when it matches the ground truth. A sample output is provided below:

docId   Actual  Predictions (1..n) *=correct
380770  other   other*
210836  other   other*
279222  Flu     other
323953  other   other*
305748  other   other*
82922   Flu     Flu*
346547  other   other*
1475    other   Flu

After the individual predictions, a summary is printed for each class. For example, the summary for Flu is shown below:

Flu results:
				Classifier
				-	+
	Ground	-	0	7
	Truth	+	2	36


	Flu Recall: 0.9474
	Flu Precsion: 0.8372
	Flu Fmeasure: 0.8889
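The figures in the Flu summary follow from the confusion matrix using the standard definitions of precision, recall, and F-measure: 36 true positives, 7 false positives (ground truth -, classifier +), and 2 false negatives (ground truth +, classifier -). The sketch below reproduces that arithmetic. The function name `precision_recall_f1` is hypothetical, and the zero-denominator handling is an assumption of this sketch, not a description of evaluator.py.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts read off the Flu confusion matrix above.
precision, recall, f1 = precision_recall_f1(tp=36, fp=7, fn=2)

print(f"Flu Recall: {recall:.4f}")
print(f"Flu Precision: {precision:.4f}")
print(f"Flu Fmeasure: {f1:.4f}")
```

With these counts, recall is 36/38, precision is 36/43, and the F-measure is 72/81, which round to the values in the sample output.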
