## Purpose of this notebook

- Compare the grades that different raters gave to the same reliability reports
- Calculate Cohen's kappa between pairs of raters
- Identify and print reports where raters strongly disagree (grade of 2 vs grade of 0)
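
Cohen's kappa compares how often two raters agree with how often they would agree by chance alone. Below is the standard unweighted form. Here $p_o$ is the fraction of shared reports that both raters gave the same grade, and $p_{1k}$ and $p_{2k}$ are the fractions of reports that rater 1 and rater 2 placed in grade $k \in \{0, 1, 2\}$:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_e = \sum_{k} p_{1k}\, p_{2k}
```

$\kappa = 1$ means perfect agreement, and $\kappa = 0$ means agreement no better than chance. The notebook does not say whether `calc_kappa` uses this unweighted form or a weighted variant, so treat this as background rather than its exact definition.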

## How to use this notebook

- Run the cells in order, starting with Cell 01. It loads the libraries, creates the BigQuery `client`, and backs up the grader table.
- Cell 02 shows how many reports each grader has graded since a given date (YYYY-MM-DD format).
- Cell 03 gets the proc_ord_id values that identify the reliability reports. Cell 04 lists the graders to compare and calculates an agreement metric, such as Cohen's kappa, for every pair of them.
- Cell 05 shows the reports on which a chosen pair of graders disagree.
- Cell 06 releases those reports back to a specified grader for regrading.
- Cell 07 can be used to examine and regrade reports marked with a -1 flag.
- Cell 08 (clinicians only) does the same for reports marked with a -2 flag.

In [4]:
# Cell 01: load libraries
from reliabilityLib import *
from reportMarkingFunctions import *
from google.cloud import bigquery # SQL table interface on Arcus
import pandas
import numpy
import random
import matplotlib.pyplot as plt

client = bigquery.Client()
backup_grader_table()

lab.grader_table_with_metadata backup successful


## Evolution of SLIP over time

In [5]:
# Cell 02: count the reports each grader has graded since a date (YYYY-MM-DD)
getGradeCountsSinceDate("2024-09-01")

# Reports 	 Grader Name
81 		 Megan M. Himes
338 		 Sepp Kohler
159 		 Dabriel Zimmerman
201 		 Leila Abdel-Qader
304 		 Laura Mercedes
628 		 Jenna Schabdach
151 		 Zhiqiang Sha
215 		 Benjamin Jung

Any graders not in the displayed table have not graded any reports since before 2024-09-01
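
The counts from Cell 02 can be thought of as a group-and-count over graded rows. Below is a minimal pandas sketch. The notebook does not show how `getGradeCountsSinceDate` stores dates, so the `grade_date` column is a hypothetical stand-in. The sketch also treats grade 999 as an ungraded, still-queued report, following the convention of the retired-grader cleanup cell below:

```python
import pandas as pd

QUEUED = 999  # grade value the cleanup cell uses for reports still in a queue


def grade_counts_since(grades, since):
    """Count graded reports per grader on or after `since` (YYYY-MM-DD).

    Assumes hypothetical columns 'grader_name', 'grade' and 'grade_date'.
    """
    cutoff = pd.Timestamp(since)
    dates = pd.to_datetime(grades["grade_date"])
    # Keep rows graded on/after the cutoff that are not still queued
    graded = grades[(dates >= cutoff) & (grades["grade"] != QUEUED)]
    return graded.groupby("grader_name").size().sort_values(ascending=False)


example = pd.DataFrame({
    "grader_name": ["A", "A", "B", "B", "C"],
    "grade": [0, 2, 1, QUEUED, 2],
    "grade_date": ["2024-09-02", "2024-08-30", "2024-09-05", "2024-09-06", "2024-08-01"],
})
print(grade_counts_since(example, "2024-09-01"))
```

Graders with no qualifying rows do not appear in the result at all. This matches the note printed under the Cell 02 table.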


## Grader agreement on reliability reports

In [3]:
# Cell 03: Get the list of proc_ord_id values used to identify the reliability reports
procIds = get_reliability_proc_ord_ids()

In [8]:
# Cell 04: Compare the reliability reports for the users we want to evaluate
graders = ['Jenna Schabdach', 
           'Megan M. Himes', 
           # 'Naomi Shifman', 
           # 'Alexa DeJean',
           # 'Julia Katowitz',
           'Matt Buczek',
           'Shivaram Karandikar',
           'Dabriel Zimmerman',
           # 'Shreya Gudapati',
           'Harry Hearn', 
           'Sepp Kohler', 
           'Eren Kafadar', 
           'Leila Abdel-Qader', 
           'Laura Mercedes', 
           'Zhiqiang Sha',
           'Benjamin Jung']

# Metric options: "disagreement", "kappa", "kappa2vAll", "kappa0vAll"
df = calculateMetricForGraders(graders, "kappa")
print()
print(df)

                     Megan M. Himes  Naomi Shifman  Alexa DeJean  \
Jenna Schabdach            0.827278       0.659910      0.765639   
Megan M. Himes             0.000000       0.670741      0.766317   
Naomi Shifman              0.000000       0.000000      0.670720   
Alexa DeJean               0.000000       0.000000      0.000000   
Julia Katowitz             0.000000       0.000000      0.000000   
Matt Buczek                0.000000       0.000000      0.000000   
Shivaram Karandikar        0.000000       0.000000      0.000000   
Dabriel Zimmerman          0.000000       0.000000      0.000000   
Shreya Gudapati            0.000000       0.000000      0.000000   
Harry Hearn                0.000000       0.000000      0.000000   
Sepp Kohler                0.000000       0.000000      0.000000   
Eren Kafadar               0.000000       0.000000      0.000000   
Leila Abdel-Qader          0.000000       0.000000      0.000000   
Laura Mercedes             0.000000       0.000
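
The table above is upper-triangular. Each pair of graders appears once, and the unused lower half stays at 0. Below is a sketch of how such a table could be built from a hypothetical long-format frame with one row per (grader, report), using the standard unweighted kappa. This is an illustration, not the implementation of `calculateMetricForGraders`:

```python
import itertools

import numpy as np
import pandas as pd


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length sequences of grades."""
    a = np.asarray(a)
    b = np.asarray(b)
    p_o = np.mean(a == b)  # observed agreement
    # Chance agreement: product of each rater's marginal rates, summed over grades
    p_e = sum(np.mean(a == k) * np.mean(b == k) for k in np.union1d(a, b))
    if p_e == 1.0:
        return float("nan")  # both raters used one identical grade: kappa undefined
    return float((p_o - p_e) / (1 - p_e))


def pairwise_kappa(grades, graders):
    """Upper-triangular table of kappa for every pair of graders.

    Assumes hypothetical columns 'grader_name', 'proc_ord_id' and 'grade'.
    Only reports graded by both members of a pair are compared.
    """
    wide = grades.pivot_table(index="proc_ord_id", columns="grader_name",
                              values="grade", aggfunc="first")
    # Start from a float table so assigning kappa values never changes dtype
    table = pd.DataFrame(0.0, index=graders, columns=graders)
    for g1, g2 in itertools.combinations(graders, 2):
        if g1 not in wide.columns or g2 not in wide.columns:
            continue  # this grader has no reliability grades yet
        both = wide[[g1, g2]].dropna()
        if len(both) > 0:
            table.loc[g1, g2] = cohen_kappa(both[g1], both[g2])
    return table


example = pd.DataFrame({
    "grader_name": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "proc_ord_id": [1, 2, 3, 4] * 3,
    "grade": [0, 1, 2, 2, 0, 1, 2, 2, 0, 1, 2, 0],
})
print(pairwise_kappa(example, ["A", "B", "C"]))
```

Creating the table with a float dtype up front avoids pandas' incompatible-dtype warnings when kappa values are assigned. If scikit-learn is installed, `sklearn.metrics.cohen_kappa_score` computes the same unweighted value.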

In [9]:
# Cell 05: examine the reports a pair of graders disagree on
# Set grader1 and grader2 to the two graders you want to compare
grader1 = "Jenna Schabdach"
grader2 = "Dabriel Zimmerman"
procIds = get_reliability_proc_ord_ids()
grades1 = get_reports_for_user(grader1, procIds)
grades2 = get_reports_for_user(grader2, procIds)
# For testing only: replace grader2's grades with random (or all-zero) grades
# grades2 = grades1.copy(deep=True)
# grades2['grade'] = [random.randint(0, 2) for i in range(len(grades1))]
# grades2['grade'] = [ 0 for i in range(len(grades1)) ]
disagreement = identify_disagreement_reports(grades1, grades2)
print(len(disagreement))

calc_kappa(grades1, grades2)

# print_disagreement_reports(disagreement, grades1, grades2)

23


0.7650043981324852
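
The notebook's purpose defines a strong disagreement as one grader giving a 2 and the other a 0. Below is a minimal pandas sketch of that check. It assumes each grader's grades arrive as a frame with hypothetical `proc_ord_id` and `grade` columns. The notebook does not show the logic inside `identify_disagreement_reports`, so this is an illustration only:

```python
import pandas as pd


def strong_disagreements(grades1, grades2):
    """Reports where one grader gave a 2 and the other gave a 0.

    Assumes each frame has hypothetical columns 'proc_ord_id' and 'grade'.
    """
    # Keep only reports graded by both graders
    merged = grades1.merge(grades2, on="proc_ord_id", suffixes=("_1", "_2"))
    g1, g2 = merged["grade_1"], merged["grade_2"]
    # Require the pair to be exactly {0, 2}, so flag values such as -1
    # never count as a disagreement even when they differ by 2
    strong = g1.isin([0, 2]) & g2.isin([0, 2]) & (g1 != g2)
    return merged.loc[strong, ["proc_ord_id", "grade_1", "grade_2"]]


example1 = pd.DataFrame({"proc_ord_id": [1, 2, 3, 4], "grade": [0, 2, 2, -1]})
example2 = pd.DataFrame({"proc_ord_id": [1, 2, 3, 4, 5], "grade": [2, 2, 0, 1, 0]})
print(strong_disagreements(example1, example2))
```

An explicit {0, 2} test is used instead of `abs(grade_1 - grade_2) == 2`. This avoids treating pairs like (-1, 1) as disagreements when flagged reports share the same column.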

In [10]:
# Cell 06: release the disagreement reports from Cell 05 back into grader2's queue for regrading
# ONLY USE THIS IF YOU'RE CERTAIN
release_reports(grader2, disagreement)

23 were released back into the queue for Dabriel Zimmerman
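
Releasing a report presumably means putting it back into the grader's queue. Below is an in-memory pandas sketch of that step. It assumes, following the retired-grader cleanup cell below, that a queued report carries grade 999 (`QUEUED` here). The column names are hypothetical, and the real `release_reports` may update the BigQuery table differently:

```python
import pandas as pd

QUEUED = 999  # assumption: grade value the cleanup cell treats as "still in a queue"


def release_for_regrade(table, grader, proc_ids):
    """Put one grader's graded copies of `proc_ids` back in their queue.

    `proc_ids` is an iterable of proc_ord_id values. Modifies `table` in
    place and returns how many rows were released. Assumes hypothetical
    columns 'grader_name', 'proc_ord_id' and 'grade'.
    """
    # Only this grader's rows for the listed reports are touched
    mask = (table["grader_name"] == grader) & table["proc_ord_id"].isin(list(proc_ids))
    table.loc[mask, "grade"] = QUEUED
    return int(mask.sum())


example = pd.DataFrame({
    "grader_name": ["A", "A", "B"],
    "proc_ord_id": [1, 2, 1],
    "grade": [0, 2, 2],
})
print(release_for_regrade(example, "A", [1]), "released")
print(example)
```

Filtering on both the grader name and the report IDs keeps the release scoped to a single grader. Against BigQuery, the analogous statement would be an `UPDATE` with both conditions in its `WHERE` clause.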


## Release reports of retired graders

In [None]:
# # Edit this list to include the name of everyone who is actively grading
# active_graders = ['Jenna Schabdach', 
#                   'Harry Hearn', 
#                   'Dabriel Zimmerman', 
#                   'Sepp Kohler', 
#                   'Megan M. Himes', 
#                   'Eren Kafadar', 
#                   'Matt Buczek', 
#                   'Leila Abdel-Qader', 
#                   'Laura Mercedes', 
#                   'Benjamin Jung', 
#                   'Zhiqiang Sha', 
#                   'Shivaram Karandikar']

# # Run this cell once with just_check = True to review the grader list and queue sizes,
# # then set it to False to delete the queued reports of inactive graders
# just_check = True

# # Get the list of graders (note: this overwrites the graders list from Cell 04)
# q = "select distinct grader_name from lab.grader_table_with_metadata;"
# graders = client.query(q).to_dataframe()['grader_name'].values

# # Drop "Coarse Text Search"
# graders = [i for i in graders if "Coarse Text Search" not in i]
# print(graders)

# # For every grader
# for grader in graders:
#     # If the grader is no longer active
#     if grader not in active_graders:
#         # Get the reports still waiting in their queue (grade = 999)
#         q = 'select * from lab.grader_table_with_metadata where grader_name = "'+grader+'" and grade = 999 and grade_category = "Unique"'
#         grader_df = client.query(q).to_dataframe()
#         # Print the number of queued reports for the grader
#         print(grader, len(grader_df))
#         # Delete those queued reports, but only when just_check is False
#         if len(grader_df) > 0 and not just_check:
#             q = 'delete from lab.grader_table_with_metadata where grader_name = "'+grader+'" and grade = 999 and grade_category = "Unique"'
#             job = client.query(q)
#             job.result()
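
The "check" pass of the cell above counts the queued reports of each inactive grader. Below is a minimal pandas sketch of that selection. It works on a hypothetical local copy of the table with `grader_name`, `grade` and `grade_category` columns, mirroring the cell's SQL filters:

```python
import pandas as pd

QUEUED = 999  # grade value the cleanup cell uses for reports still in a queue


def queued_for_inactive(table, active_graders, exclude="Coarse Text Search"):
    """Count queued 'Unique' reports still assigned to inactive graders.

    Assumes hypothetical columns 'grader_name', 'grade' and 'grade_category'.
    """
    names = table["grader_name"]
    # Inactive = not on the active list and not a "Coarse Text Search" entry
    inactive = ~names.isin(list(active_graders)) & ~names.str.contains(exclude, regex=False)
    queued = (table["grade"] == QUEUED) & (table["grade_category"] == "Unique")
    return table[inactive & queued].groupby("grader_name").size()


example = pd.DataFrame({
    "grader_name": ["A", "Old", "Old", "Old", "Old", "Coarse Text Search v1"],
    "grade": [QUEUED, QUEUED, QUEUED, 1, QUEUED, QUEUED],
    "grade_category": ["Unique", "Unique", "Unique", "Unique", "Other", "Unique"],
})
print(queued_for_inactive(example, ["A"]))
```

The deletion itself still has to run in BigQuery. There, you can pass the name with `bigquery.QueryJobConfig(query_parameters=[bigquery.ScalarQueryParameter("grader", "STRING", grader)])` and write `grader_name = @grader` in the query. Then call `client.query(q, job_config=...)`. This avoids building the SQL by string concatenation, which breaks on names that contain quotes.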

## Examine flagged reports

In [None]:
# Cell 07: regrade skipped reports
# client: A bigquery client object (created in Cell 01)
# skippedGrader: A string of the grader's name (leave blank to review all flagged reports)
skippedGrader = "" # "Naomi Shifman"
regrade_skipped_reports(client, grader=skippedGrader)
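
Cells 07 and 08 differ only in the flag they review (-1 or -2) and in whether they filter to one grader. Below is a minimal pandas sketch of that selection. It assumes the flags are stored in the same hypothetical `grade` column as ordinary grades, which the notebook does not confirm:

```python
import pandas as pd


def flagged_reports(table, flag=-1, grader=""):
    """Rows whose grade equals `flag`, optionally limited to one grader.

    An empty `grader` means all graders, mirroring skippedGrader = "" in Cell 07.
    Assumes hypothetical columns 'grader_name' and 'grade'.
    """
    rows = table[table["grade"] == flag]
    if grader:
        rows = rows[rows["grader_name"] == grader]
    return rows


example = pd.DataFrame({
    "grader_name": ["A", "A", "B", "B"],
    "grade": [-1, -2, -1, 2],
})
print(flagged_reports(example))                # every -1 flag
print(flagged_reports(example, grader="A"))    # one grader's -1 flags
print(flagged_reports(example, flag=-2))       # clinician-review flags
```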

## For clinician review only

In [None]:
# Cell 08: clinician regrade of reports marked with a -2 flag
# client: A bigquery client object (created in Cell 01)
# flag: the flag value to review (-2 here, for clinician review)
regrade_skipped_reports(client, flag=-2)