# Re-Curation Statistics

This notebook outlines basic statistics about the re-curation of the ten NeuroMMSig subgraphs.

In [1]:
import sys
import time

import os

import pybel
import pybel_tools
from pybel_tools.summary import edge_summary

import pandas as pd

from nltk import agreement

import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
%matplotlib inline

In [3]:
print(sys.version)

3.6.5 (default, Apr 20 2018, 08:54:42) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)]


In [4]:
print(time.asctime())

Tue Apr  2 12:05:05 2019


In [5]:
print(pybel.get_version())

0.13.1


In [6]:
print(pybel_tools.get_version())

0.7.3-dev


In [7]:
try:
    NEUROMMSIG_BASE = os.environ['NEUROMMSIG_KNOWLEDGE']
except KeyError:
    # Fail early: only printing here would leave NEUROMMSIG_BASE undefined below
    raise RuntimeError(
        'Please set the NEUROMMSIG_KNOWLEDGE environment variable to point to a clone of '
        'this repo: https://github.com/bel-enrichment/neurommsig-knowledge'
    )

BEL_DIRECTORY = os.path.join(NEUROMMSIG_BASE, 'neurommsig_knowledge')

Get the JSON files located in the NeuroMMSig Knowledge repo

In [8]:
json_graph_paths = [
    os.path.join(BEL_DIRECTORY, file) 
    for file in os.listdir(BEL_DIRECTORY)
    if os.path.isfile(os.path.join(BEL_DIRECTORY, file)) and file.endswith('.json')
]

Load and combine the graphs using PyBEL

In [9]:
combined_graph = pybel.union([
    pybel.from_json_path(json_file)
    for json_file in json_graph_paths
])

# Rename the combined subgraph
combined_graph.name = '10 re-curated NeuroMMSig subgraphs '

In [10]:
combined_graph.summarize()

10 re-curated NeuroMMSig subgraphs  v5.1.2
Number of Nodes: 2003
Number of Edges: 6829
Network Density: 1.70E-03
Number of Components: 14


Count Confidence annotation values

In [11]:
input_kappa = edge_summary.count_annotation_values(combined_graph, 'Confidence')

In [12]:
# One column per Confidence value, with a single row of counts
data_df = pd.DataFrame({
    k: [v]
    for k, v in input_kappa.items()
})

# Plot
    
fig = plt.figure(figsize=(10, 5), dpi=140)

ax = plt.gca()

sns.barplot(data=data_df)

plt.show()

RuntimeError: libpng signaled error

<Figure size 1400x700 with 1 Axes>

In [17]:
data_df

| | High | Medium | Very High | Low |
| --- | ---: | ---: | ---: | ---: |
| 0 | 3169 | 2333 | 430 | 139 |


In [21]:
# Assume that all statements were correct before re-curation
total_statements = sum(input_kappa.values())

# y1: the original curator, for whom every statement is correct (1)
y1 = [1] * total_statements

# y2: the second curator; 'High' and 'Very High' count as correct (1), 'Low' and 'Medium' as wrong (0)
y2 = (
    [1] * input_kappa['High']
    + [1] * input_kappa['Very High']
    + [0] * input_kappa['Low']
    + [0] * input_kappa['Medium']
)

# nltk expects (coder, item, label) triples; coder 0 is the original curator, coder 1 the second
taskdata = (
    [[0, str(i), str(label)] for i, label in enumerate(y1)]
    + [[1, str(i), str(label)] for i, label in enumerate(y2)]
)

ratingtask = agreement.AnnotationTask(data=taskdata)
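The construction of the (coder, item, label) triples can also be sketched as a small reusable helper. This is only an illustration: the function name `build_task_data`, the coder names, and the toy counts are hypothetical and not part of the notebook. Items come out in a different order than in the cell above, which does not affect agreement statistics.

```python
def build_task_data(counts, correct_labels):
    """Build (coder, item, label) triples for two coders.

    counts: mapping from a confidence value to the number of statements.
    correct_labels: confidence values that the second curator treats as correct.
    The original curator is assumed to have marked every statement as correct.
    """
    second = []
    for confidence, n in counts.items():
        label = "1" if confidence in correct_labels else "0"
        second.extend([label] * n)

    # The original curator labels every item as correct
    first = ["1"] * len(second)

    triples = [("original", str(i), label) for i, label in enumerate(first)]
    triples += [("recurator", str(i), label) for i, label in enumerate(second)]
    return triples


# Toy counts, for illustration only (not the notebook's data)
toy_triples = build_task_data({"High": 3, "Low": 2}, correct_labels={"High"})
print(len(toy_triples))
```

With the notebook's objects, the result of `build_task_data(input_kappa, {'High', 'Very High'})` could be passed as `data=` to `agreement.AnnotationTask`.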

Calculate Cohen's Kappa and Scott's pi coefficient

In [22]:
ratingtask.pi()

-0.2556359875904863

In [20]:
ratingtask.kappa()

0.0

Because we followed a **re-curation approach** (i.e., a second curator re-curates BEL triplets that have previously been annotated), two issues arise when calculating Cohen's kappa and Scott's pi:

1. The second curator knows what the original curator has already coded. In other words, the first curator decides what is "right", and the second curator evaluates whether those annotations are correct. This violates an assumption shared by Cohen's kappa and Scott's pi, namely that the annotators work independently of each other, which does not hold for our approach.
2. We assume that the original curator annotated all the BEL triplets correctly. This also conflicts with the way these two coefficients are calculated: because the original curator never assigns the label "Wrong", the agreement expected by chance equals the observed agreement, so Cohen's kappa is exactly 0. To illustrate the problem, the confusion matrix used to calculate Cohen's kappa is shown below. Note that the curation results of the original curator correspond to the table columns and those of the second curator to the table rows.

| Second curator (rows) / Original curator (columns) | Correct | Wrong |
| ------------- |:-------------:| -----:|
| Correct | 3599 | 0 |
| Wrong | 2472 | 0 |
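As a cross-check, both coefficients can be recomputed directly from this confusion matrix using their textbook definitions, without nltk. The sketch below is a minimal illustration, not the code used in the notebook; the function name `agreement_coefficients` is hypothetical.

```python
def agreement_coefficients(matrix):
    """Return (Cohen's kappa, Scott's pi) for a square confusion matrix of counts.

    Rows are the second curator's labels, columns the original curator's labels.
    """
    size = len(matrix)
    n = sum(sum(row) for row in matrix)

    # Observed agreement: share of items on the diagonal
    p_observed = sum(matrix[i][i] for i in range(size)) / n

    # Marginal label distributions of each curator
    row_marginals = [sum(row) / n for row in matrix]
    col_marginals = [sum(matrix[i][j] for i in range(size)) / n for j in range(size)]

    # Cohen's kappa: chance agreement from each curator's own distribution
    pe_kappa = sum(r * c for r, c in zip(row_marginals, col_marginals))

    # Scott's pi: chance agreement from the pooled distribution of both curators
    pe_pi = sum(((r + c) / 2) ** 2 for r, c in zip(row_marginals, col_marginals))

    kappa = (p_observed - pe_kappa) / (1 - pe_kappa)
    pi = (p_observed - pe_pi) / (1 - pe_pi)
    return kappa, pi


# Counts from the confusion matrix above
confusion = [[3599, 0], [2472, 0]]
kappa, scott_pi = agreement_coefficients(confusion)
print(kappa, scott_pi)
```

Since the "Wrong" column is empty, the chance agreement for kappa equals the observed agreement (3599 / 6071), which gives a kappa of 0. The pooled distribution used by Scott's pi yields a higher chance agreement than the observed one, hence the negative value of about -0.2556 reported above.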

**Ultimately, we have decided not to report these two statistics in the manuscript.**

*References*
- Scott, W. (1955). "Reliability of content analysis: The case of nominal scale coding." Public Opinion Quarterly, 19(3), 321-325.
- Cohen, J. (1960). "A coefficient of agreement for nominal scales." Educational and Psychological Measurement, 20(1), 37-46. doi:10.1177/001316446002000104.