# CAC from Raw Ratings

Raw ratings, also called the wide data format, organize the data as a table in
which each row represents a subject, each column represents a rater, and each
cell holds the rating that rater assigned to that subject. The main advantage
of this format is completeness: no information is lost, because the table
shows which rater rated which subject and the specific rating assigned. A
secondary advantage is that it accommodates categorical ratings as well as
quantitative measurements.
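
As an illustration of this layout, the following minimal sketch reshapes a few long-format records into the wide format with pandas. The column names `subject`, `rater`, and `rating` and the values themselves are hypothetical and are not part of irrCAC.

```python
import pandas as pd

# Hypothetical long-format records: one row per (subject, rater, rating).
long_ratings = pd.DataFrame(
    {
        "subject": [1, 1, 2, 2, 3],
        "rater": ["Rater1", "Rater2", "Rater1", "Rater2", "Rater1"],
        "rating": [1, 1, 2, 3, 3],
    }
)

# Pivot so that rows are subjects and columns are raters.
# Subject 3 has no rating from Rater2, so that cell becomes NaN.
wide_ratings = long_ratings.pivot(index="subject", columns="rater", values="rating")
print(wide_ratings)
```

Each cell of `wide_ratings` now sits at the junction of one subject and one rater, which is the shape described above.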

Datasets in this format carry the `raw_` prefix in the
[datasets](../irrCAC.rst#module-irrCAC.datasets) module.
One example is [raw_4raters](../irrCAC.rst#irrCAC.datasets.raw_4raters).

In [1]:
from irrCAC.datasets import raw_4raters

data = raw_4raters()
data

| Units | Rater1 | Rater2 | Rater3 | Rater4 |
|-------|--------|--------|--------|--------|
| 1     | 1.0    | 1.0    | NaN    | 1.0    |
| 2     | 2.0    | 2.0    | 3.0    | 2.0    |
| 3     | 3.0    | 3.0    | 3.0    | 3.0    |
| 4     | 3.0    | 3.0    | 3.0    | 3.0    |
| 5     | 2.0    | 2.0    | 2.0    | 2.0    |
| 6     | 1.0    | 2.0    | 3.0    | 4.0    |
| 7     | 4.0    | 4.0    | 4.0    | 4.0    |
| 8     | 1.0    | 1.0    | 2.0    | 1.0    |
| 9     | 2.0    | 2.0    | 2.0    | 2.0    |
| 10    | NaN    | 5.0    | 5.0    | 5.0    |


As you can see, a dataset of raw ratings is merely a listing of ratings that
the raters assigned to the subjects. Each row is associated with a single
subject. Typically, the same subject would be rated by all or some of the
raters. The dataset `raw_4raters` contains some missing ratings represented by
the symbol `NaN`, suggesting that some raters did not rate all subjects. As a
matter of fact, in this particular case, no rater rated all subjects.

.. note:: The categories appear as floating-point numbers because of the `NaN` values.
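
The effect described in the note can be reproduced with pandas alone. This is a small sketch on made-up values, not on `raw_4raters`. The nullable `Int64` dtype at the end is shown only as a pandas feature; whether `CAC` accepts such columns is not covered here.

```python
import pandas as pd

# Integer ratings with no gaps keep an integer dtype.
complete = pd.Series([1, 2, 3])

# A single missing rating is stored as NaN, which is a float,
# so the whole column is upcast to float64.
with_missing = pd.Series([1, 2, None])

# pandas' nullable integer dtype keeps integers and marks the gap as <NA>.
nullable = pd.Series([1, 2, None], dtype="Int64")

print(complete.dtype, with_missing.dtype, nullable.dtype)
```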


## Initialize CAC

To compute the various agreement coefficients from raw ratings, first
initialize a [CAC](../irrCAC.rst#module-irrCAC.raw) object.
Once initialized, the object holds information about the subjects, raters,
categories, and weights.

In [2]:
from irrCAC.raw import CAC

cac_4raters = CAC(data)
print(cac_4raters)


<irrCAC.raw.CAC Subjects: 12, Raters: 4, Categories: [1.0, 2.0, 3.0, 4.0, 5.0], Weights: "identity">


To calculate the agreement coefficients, you call the appropriate methods.

### Fleiss' Coefficient

To calculate Fleiss' coefficient, call the `fleiss()` method.

In [3]:
fleiss = cac_4raters.fleiss()
fleiss

{'est': {'coefficient_value': 0.76117,
  'coefficient_name': "Fleiss' kappa",
  'confidence_interval': (0.42438, 1),
  'p_value': 0.00042,
  'z': 4.97434,
  'se': 0.15302,
  'pa': 0.81818,
  'pe': 0.23872},
 'weights': array([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]]),
 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}

### Gwet's Coefficient

To calculate Gwet's coefficient, call the `gwet()` method.

In [4]:
gwet = cac_4raters.gwet()
gwet

{'est': {'coefficient_value': 0.77544,
  'coefficient_name': 'AC1',
  'confidence_interval': (0.46081, 1),
  'p_value': 0.00021,
  'z': 5.42458,
  'se': 0.14295,
  'pa': 0.81818,
  'pe': 0.19032},
 'weights': array([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]]),
 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}
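
The `pa` and `pe` entries are the percent agreement and the percent chance agreement. The reported values are consistent with the usual chance-corrected form, coefficient = (pa - pe) / (1 - pe), and with `z` being the coefficient divided by `se`. The sketch below checks this arithmetic on numbers copied from the outputs above. The helper `chance_corrected` is hypothetical and is not part of irrCAC.

```python
def chance_corrected(pa, pe):
    """Return (pa - pe) / (1 - pe), the generic chance-corrected agreement."""
    return (pa - pe) / (1 - pe)


# Values copied from the fleiss() and gwet() outputs shown above.
reported = {
    "Fleiss' kappa": {"coefficient_value": 0.76117, "se": 0.15302,
                      "z": 4.97434, "pa": 0.81818, "pe": 0.23872},
    "AC1": {"coefficient_value": 0.77544, "se": 0.14295,
            "z": 5.42458, "pa": 0.81818, "pe": 0.19032},
}

for name, est in reported.items():
    coefficient = chance_corrected(est["pa"], est["pe"])
    z = est["coefficient_value"] / est["se"]
    # Both recomputed values should agree with the reported ones up to rounding.
    print(name, round(coefficient, 4), round(z, 2))
```

Both coefficients share the same `pa`; they differ only in how the chance agreement `pe` is estimated.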

## Weighted Analysis

We can pass custom weights or a predefined weight type when initializing the
[CAC](../irrCAC.rst#module-irrCAC.raw) object. For the available weight
types, see the [Weights](../irrCAC.rst#module-irrCAC.weights) module.

In the following example, we initialize a new object on the same data using
[quadratic](../irrCAC.rst#irrCAC.weights.Weights.quadratic) weights.

In [5]:
cac_4raters_quadratic = CAC(data, weights='quadratic')
cac_4raters_quadratic

<irrCAC.raw.CAC Subjects: 12, Raters: 4, Categories: [1.0, 2.0, 3.0, 4.0, 5.0], Weights: "quadratic">

To see the weight matrix, we can display the `weights_mat` attribute of the
object.

In [6]:
cac_4raters_quadratic.weights_mat

array([[1.    , 0.9375, 0.75  , 0.4375, 0.    ],
       [0.9375, 1.    , 0.9375, 0.75  , 0.4375],
       [0.75  , 0.9375, 1.    , 0.9375, 0.75  ],
       [0.4375, 0.75  , 0.9375, 1.    , 0.9375],
       [0.    , 0.4375, 0.75  , 0.9375, 1.    ]])
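
The entries of this matrix match the standard quadratic weighting formula, w_kl = 1 - (x_k - x_l)^2 / (x_max - x_min)^2, applied to the category values. The following NumPy sketch rebuilds the matrix from that formula. It is an independent reconstruction and does not claim to mirror the library's internal implementation.

```python
import numpy as np

categories = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Pairwise differences between category values via broadcasting.
diff = categories[:, None] - categories[None, :]

# Scale by the range of the categories, square, and subtract from 1, so that
# identical categories get weight 1 and the two extremes get weight 0.
value_range = categories.max() - categories.min()
quadratic = 1 - (diff / value_range) ** 2

print(quadratic)
```

Under this scheme, a disagreement between adjacent categories still earns a weight of 0.9375, which is why weighted coefficients are typically higher than their unweighted counterparts on ordinal data.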

Next, we call the method for the coefficient we want to calculate.
Here, for example, we compute the weighted Gwet coefficient.

In [7]:
gwet_quadratic = cac_4raters_quadratic.gwet()
gwet_quadratic

{'est': {'coefficient_value': 0.914,
  'coefficient_name': 'AC2',
  'confidence_interval': (0.68518, 1),
  'p_value': 0.0,
  'z': 8.79166,
  'se': 0.10396,
  'pa': 0.97538,
  'pe': 0.7137},
 'weights': array([[1.    , 0.9375, 0.75  , 0.4375, 0.    ],
        [0.9375, 1.    , 0.9375, 0.75  , 0.4375],
        [0.75  , 0.9375, 1.    , 0.9375, 0.75  ],
        [0.4375, 0.75  , 0.9375, 1.    , 0.9375],
        [0.    , 0.4375, 0.75  , 0.9375, 1.    ]]),
 'categories': [1.0, 2.0, 3.0, 4.0, 5.0]}

To compare the unweighted results (which use
[identity](../irrCAC.rst#irrCAC.weights.Weights.identity) weights) with
the results from [quadratic](../irrCAC.rst#irrCAC.weights.Weights.quadratic)
weights, we display them side by side.

In [8]:
import pandas as pd

df = pd.DataFrame(
    zip(gwet['est'].items(),
        gwet_quadratic['est'].items()),
    columns=['Identity Weights', 'Quadratic Weights'])
df

|   | Identity Weights | Quadratic Weights |
|---|------------------|-------------------|
| 0 | (coefficient_value, 0.77544) | (coefficient_value, 0.914) |
| 1 | (coefficient_name, AC1) | (coefficient_name, AC2) |
| 2 | (confidence_interval, (0.46081, 1)) | (confidence_interval, (0.68518, 1)) |
| 3 | (p_value, 0.00021) | (p_value, 0.0) |
| 4 | (z, 5.42458) | (z, 8.79166) |
| 5 | (se, 0.14295) | (se, 0.10396) |
| 6 | (pa, 0.81818) | (pa, 0.97538) |
| 7 | (pe, 0.19032) | (pe, 0.7137) |
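
The cells above hold `(name, value)` tuples because the frame is built from zipped `.items()` pairs. An alternative layout, sketched below, uses the statistic names as the row index so each cell holds only a value. To stay self-contained, the two dictionaries are copied from the outputs shown earlier rather than recomputed; in a live session, `gwet['est']` and `gwet_quadratic['est']` could presumably be used in their place.

```python
import pandas as pd

# Values copied from the gwet() outputs above (identity and quadratic weights).
gwet_est = {
    "coefficient_value": 0.77544, "coefficient_name": "AC1",
    "confidence_interval": (0.46081, 1), "p_value": 0.00021,
    "z": 5.42458, "se": 0.14295, "pa": 0.81818, "pe": 0.19032,
}
gwet_quadratic_est = {
    "coefficient_value": 0.914, "coefficient_name": "AC2",
    "confidence_interval": (0.68518, 1), "p_value": 0.0,
    "z": 8.79166, "se": 0.10396, "pa": 0.97538, "pe": 0.7137,
}

# One Series per weighting scheme; the dictionary keys become the row index.
comparison = pd.DataFrame(
    {
        "Identity Weights": pd.Series(gwet_est),
        "Quadratic Weights": pd.Series(gwet_quadratic_est),
    }
)
print(comparison)
```

With this layout, a single statistic can be looked up by name, for example `comparison.loc["pe"]`.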
