## Purpose of this notebook

- Examine similarities and differences between raters' grades on the reliability reports
- Calculate Cohen's kappa between pairs of raters
- Identify and print reports where raters strongly disagree (one rater gave a grade of 2 and the other a grade of 0)
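
Cohen's kappa corrects the raw agreement rate for the agreement expected by chance: $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the fraction of reports on which two raters gave the same grade and $p_e$ is the agreement expected if each rater assigned grades independently at their own observed rates. The sketch below computes unweighted kappa from two lists of grades for the same reports. It is an illustration only, not necessarily how `reliabilityLib` computes its `kappa` metric, and the function name `cohens_kappa` and the example grades are hypothetical.

```python
from collections import Counter


def cohens_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical grades.

    Hypothetical helper for illustration; not part of reliabilityLib.
    """
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two non-empty lists of the same length")
    n = len(grades_a)
    # Observed agreement: fraction of reports where both raters gave the same grade
    p_o = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement: product of each rater's marginal frequency, summed over grades
    counts_a = Counter(grades_a)
    counts_b = Counter(grades_b)
    p_e = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical grade for every report; kappa is undefined
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Made-up grades (0, 1, or 2) from two raters on the same six reports
rater_a = [2, 2, 1, 0, 0, 1]
rater_b = [2, 1, 1, 0, 2, 1]
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.5
```

With these example grades the raters agree on 4 of 6 reports ($p_o = 2/3$) and the chance agreement is $p_e = 1/3$, so $\kappa = 0.5$. For unweighted kappa, scikit-learn's `sklearn.metrics.cohen_kappa_score` should give the same value and can serve as a cross-check.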

## How to use this notebook

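- Run the cells in order, starting with Cell 01, which loads the libraries the other cells need.
- Cell 02 retrieves the proc_ord_id values that uniquely identify each reliability report.
- Cell 03 lists the graders to compare and computes the chosen agreement metric for them. Edit the `graders` list and the metric name to change the comparison.
- Cell 04 prints the reports on which two graders strongly disagree. Edit `grader1` and `grader2` to choose the pair.
- Cell 05 releases reports back into a grader's queue. It is commented out by default; only run it if you are certain.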

In [None]:
# Cell 01: load libraries
from reliabilityLib import *
from google.cloud import bigquery # SQL table interface on Arcus
import pandas
import numpy
import matplotlib.pyplot as plt

In [None]:
# Cell 02: Get the list of proc_ord_id values used to identify the reliability reports
procIds = getReliabilityProcOrdIds()

In [None]:
# Cell 03: Compare the reliability reports for the users we want to evaluate
graders = ['Jenna Schabdach', 
           'Megan M. Himes', 
           'Naomi Shifman', 
           'Shreya Gudapati']

# Metric options: "disagreement", "kappa", "kappa2vAll", "kappa0vAll"
df = calculateMetricForGraders(graders, "disagreement")
print(df)
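
The metric names in Cell 03 suggest that `kappa2vAll` and `kappa0vAll` collapse the grades to "grade 2 versus any other grade" and "grade 0 versus any other grade" before computing kappa. That reading is an assumption, as is the data layout below: a hypothetical mapping `{grader_name: {proc_ord_id: grade}}`. The sketch shows one way a pairwise kappa table could be built over the reports each pair has both graded; it is not the implementation of `calculateMetricForGraders`.

```python
from collections import Counter
from itertools import combinations


def kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length, non-empty lists."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[g] * cb[g] for g in ca) / (n * n)
    return float("nan") if p_e == 1 else (p_o - p_e) / (1 - p_e)


def pairwise_kappa(grades_by_grader, binarize_grade=None):
    """Kappa for every pair of graders over the reports both have graded.

    grades_by_grader: hypothetical {grader_name: {proc_ord_id: grade}} mapping.
    binarize_grade: if set (e.g. 2), each grade becomes True/False for
        "equals binarize_grade" before kappa is computed (assumed "2 vs all").
    """
    results = {}
    for g1, g2 in combinations(sorted(grades_by_grader), 2):
        d1, d2 = grades_by_grader[g1], grades_by_grader[g2]
        shared = sorted(set(d1) & set(d2))  # reports graded by both graders
        if not shared:
            results[(g1, g2)] = float("nan")
            continue
        a = [d1[pid] for pid in shared]
        b = [d2[pid] for pid in shared]
        if binarize_grade is not None:
            a = [x == binarize_grade for x in a]
            b = [x == binarize_grade for x in b]
        results[(g1, g2)] = kappa(a, b)
    return results


# Made-up grades for three hypothetical graders
grades = {
    "A": {101: 2, 102: 1, 103: 0, 104: 2},
    "B": {101: 2, 102: 1, 103: 0, 104: 2},
    "C": {101: 0, 102: 1, 103: 2},
}
for pair, value in pairwise_kappa(grades).items():
    print(pair, round(value, 3))
for pair, value in pairwise_kappa(grades, binarize_grade=2).items():
    print("2 vs all:", pair, round(value, 3))
```

In this toy data graders A and B agree perfectly (kappa 1.0), while A and C agree no better than chance on the three reports they share (kappa 0.0). Collapsing to "2 vs all" drops A and C to -0.5, because every report one of them gave a 2, the other did not. Restricting each pair to shared reports keeps kappa defined even when graders have completed different subsets.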

In [None]:
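# Cell 04: print the reports where two graders strongly disagree (2 vs 0)
# Set grader1 and grader2 to the pair of users you want to compare
grader1 = "Jenna Schabdach"
grader2 = "Megan M. Himes"
# Re-fetch the report IDs (Cell 02 also does this) so this cell does not depend on Cell 02
procIds = getReliabilityProcOrdIds()
# Fetch each grader's grades for the reliability reports
grades1 = getReportsForUser(grader1, procIds)
grades2 = getReportsForUser(grader2, procIds)
# Find the reports on which the two graders strongly disagree
disagreement = identifyDisagreementReports(grades1, grades2)

printDisagreementReports(disagreement, grades1, grades2)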
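
`identifyDisagreementReports` is defined in `reliabilityLib`, and its internals are not shown in this notebook. As a minimal sketch of the 2-vs-0 rule described at the top, assume each grader's grades can be represented as a dict mapping proc_ord_id to a grade of 0, 1, or 2. That is an assumption: `getReportsForUser` may return a different structure, such as a pandas DataFrame. The function `find_strong_disagreements` is a hypothetical stand-in, not the library's implementation.

```python
def find_strong_disagreements(grades1, grades2, high=2, low=0):
    """Return sorted proc_ord_ids where one grader gave `high` and the other `low`.

    grades1, grades2: hypothetical {proc_ord_id: grade} mappings, one per grader.
    Only reports graded by both graders are considered.
    """
    shared = set(grades1) & set(grades2)
    # The set comparison catches the disagreement in either direction (2/0 or 0/2)
    return sorted(
        pid for pid in shared
        if {grades1[pid], grades2[pid]} == {high, low}
    )


# Made-up grades keyed by hypothetical proc_ord_id values
g1 = {101: 2, 102: 0, 103: 1, 104: 2}
g2 = {101: 0, 102: 2, 103: 1, 105: 0}
print(find_strong_disagreements(g1, g2))  # [101, 102]
```

Reports 104 and 105 are skipped because only one grader has graded each of them, and report 103 is skipped because both graders agree.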

In [None]:
# Cell 05: release the disagreement reports back into grader1's queue
# ONLY UNCOMMENT AND RUN THIS IF YOU'RE CERTAIN
# releaseReports(grader1, disagreement)