This competition uses kappa (specifically, quadratic weighted kappa) for evaluation. Unlike accuracy, this metric takes agreement by **chance** into account. [Here](https://en.wikipedia.org/wiki/Cohen%27s_kappa#Example) is a nice example.

## So what is (Cohen's) Kappa?

Kappa is a *chance-adjusted index* for the reliability of categorical measurements. This means it accounts for the amount of agreement between two raters, A and B, that can be expected to occur by chance (i.e., random guessing). This is unlike accuracy, where you can get a fairly decent score by guessing randomly in an informed way, for example by always predicting the most frequent class.
Three matrices are involved:
* N&times;N (N is the number of categories) histogram matrix **O**, where each element o<sub>ij</sub> of **O** is the number of observations that received category i from A and category j from B. In our case N=5, so **O** is a 5&times;5 matrix. Each element o<sub>ij</sub> is the count of images that received category i from A (say, a human) and category j from B (our model). So the larger the numbers on the diagonal, the better.
* N&times;N weights matrix **W**, where each element is calculated using the distance between ratings. More on this later.
* N&times;N histogram matrix of expected ratings **E**, which is calculated as the outer product of the two raters' histogram *vectors* of ratings (the row sums and column sums of **O**).

**E** is normalized so that **E** and **O** have the same sum.
Now each cell in **O** is multiplied by the corresponding cell in **W**, the results are summed across all cells, and the sum is divided by the total number of observations so that it is a proportion between 0 and 1. Call this P<sub>o</sub>. The same is done for **E**. Call this P<sub>e</sub>.
Then kappa is calculated as below: $$\kappa = \frac{P_o - P_e}{1 - P_e}$$
You can find a good example [here](http://vassarstats.net/kappaexp.html).
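
To make the steps above concrete, here is a minimal NumPy sketch of the computation. The function name `kappa_from_ratings` and the toy ratings are made up for illustration; the weight matrix is passed in, and the demo uses the identity matrix (full credit only on the diagonal), which should correspond to plain unweighted kappa.

```python
import numpy as np

def kappa_from_ratings(rater_a, rater_b, weights):
    """Chance-adjusted agreement from O, E and an agreement-weight matrix W."""
    n = weights.shape[0]
    # O: observed counts, O[i, j] = items rated i by A and j by B
    observed = np.zeros((n, n))
    for i, j in zip(rater_a, rater_b):
        observed[i, j] += 1
    # E: outer product of the two raters' rating histograms
    hist_a = observed.sum(axis=1)
    hist_b = observed.sum(axis=0)
    expected = np.outer(hist_a, hist_b)
    # Normalize both matrices to sum to 1 so P_o and P_e are proportions
    observed = observed / observed.sum()
    expected = expected / expected.sum()
    p_o = (weights * observed).sum()
    p_e = (weights * expected).sum()
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toy ratings over 3 categories
rater_a = [0, 0, 1, 1, 2, 2, 2, 1]
rater_b = [0, 1, 1, 1, 2, 2, 0, 1]

accuracy = np.mean(np.array(rater_a) == np.array(rater_b))
kappa = kappa_from_ratings(rater_a, rater_b, np.eye(3))
print(f"accuracy={accuracy:.3f}, kappa={kappa:.3f}")
```

On this toy data the raters agree on 6 of 8 items, but kappa comes out lower than the accuracy because part of that agreement is expected by chance.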

## What is "quadratic weighted"?

When no weight matrix is involved, it's called unweighted kappa. This treats the categories as nominal, i.e., there is no progression between them, so every disagreement counts the same. But when categories are ordinal, i.e., they have some kind of progression, for example: sad, ok, happy, very happy or No DR, Mild, Moderate, Severe, Proliferative DR, then weighted kappa is used.
The concept of **distance** is used to calculate each element of **W**. The distance between categories 2 and 0 is 2, between 3 and 0 is 3, between 4 and 1 is 3, and so on.
Linear weight is calculated as: $$weight = 1 - \frac{|\text{distance}|}{\text{Maximum Possible Distance}}$$
Quadratic weight is calculated as: $$weight = 1 - \frac{|\text{distance}|^2 }{\text{(Maximum Possible Distance)}^2}$$
In this competition, the maximum possible distance is 4 (between categories 0 and 4).
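
As a quick check of the two formulas, the sketch below builds both weight matrices for 5 categories. The helper name `agreement_weights` and its `power` parameter are assumptions made for this example, not part of any library.

```python
import numpy as np

def agreement_weights(n_categories, power):
    """W[i, j] = 1 - (|i - j| / max_distance) ** power.

    power=1 gives linear weights, power=2 gives quadratic weights.
    """
    labels = np.arange(n_categories)
    distance = np.abs(labels[:, None] - labels[None, :])
    max_distance = n_categories - 1  # 4 in this competition
    return 1 - (distance / max_distance) ** power

linear = agreement_weights(5, power=1)
quadratic = agreement_weights(5, power=2)
print(quadratic)
```

Both matrices have 1 on the diagonal and 0 in the two far corners. In between, quadratic weights stay close to 1 for near misses and drop off faster as the distance grows.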

## So, what does this mean for us?

* First things first: random guessing earns no credit, since its expected kappa is about 0.
* *If you ensemble models by averaging their predictions*, you will get floating-point values between 0 and 4, which you then have to convert to integers using round, clip or some other method. Keep in mind that the larger the distance, the larger the penalty. So if your average is 1.6 and the true label is 3, predicting 1 gives a distance of 2 and hence less weight: $$1-\frac{2^2}{4^2} = 0.75$$. Predicting 2 gives a distance of 1 and thus comparatively more weight: $$1-\frac{1^2}{4^2} = 0.9375$$.
* This means we need to come up with a threshold between each pair of adjacent categories. This is where the [OptimizedRounder class for Quadratic Weighted Kappa (QWK)](https://www.kaggle.com/abhishek/optimizer-for-quadratic-weighted-kappa) by Abhishek comes into play.
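
The thresholding idea can be sketched as follows. This is not the OptimizedRounder from the linked kernel, only an illustration with hand-picked cut points; the names `to_classes`, `default_thresholds` and `shifted_thresholds` are hypothetical. The default cut points behave roughly like rounding plus clipping, and a tuned rounder would instead search for cut points that give a better QWK on validation data.

```python
import numpy as np

def to_classes(averaged_predictions, thresholds):
    """Map float predictions to integer classes 0..4 using ascending cut points."""
    return np.digitize(averaged_predictions, thresholds)

# Hypothetical cut points between the 5 classes (roughly round-and-clip)
default_thresholds = [0.5, 1.5, 2.5, 3.5]
# Hypothetical shifted cut points: class 1 now extends up to 1.7
shifted_thresholds = [0.5, 1.7, 2.5, 3.5]

averaged = np.array([-0.2, 0.4, 1.6, 2.5, 3.49, 4.7])
default_classes = to_classes(averaged, default_thresholds)
shifted_classes = to_classes(averaged, shifted_thresholds)
print(default_classes)
print(shifted_classes)
```

Values below the first cut point fall into class 0 and values above the last one into class 4, so no separate clipping step is needed. Moving a single cut point changes which class the 1.6 average from the example above ends up in.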

## Using Dummy Classifier

Let's now try predicting with a random but stratified guess, using a dummy classifier. Note the difference between the accuracy score and QWK.

In [None]:
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.metrics import cohen_kappa_score

import os
print(os.listdir("../input"))

In [None]:
train = pd.read_csv('../input/train.csv')
train.head()

Let's use the stratified strategy. With it, the classifier generates random predictions that respect the training set’s class distribution.

In [None]:
# Set the strategy explicitly: the default changed from 'stratified' to 'prior' in scikit-learn 0.24
dummy_clf = DummyClassifier(strategy='stratified')
dummy_clf.fit(train.id_code, train.diagnosis) # Inputs don't matter, it's a dummy
train_predictions = dummy_clf.predict(train.id_code)
# score() draws a fresh set of random predictions, so it is not computed on train_predictions
print(f"Score: {dummy_clf.score(train.id_code, train.diagnosis)}")
print(f"Cohen kappa score: {cohen_kappa_score(train_predictions, train.diagnosis, weights='quadratic')}")

In [None]:
test = pd.read_csv('../input/test.csv')
test.head()

In [None]:
predictions = dummy_clf.predict(test.id_code)
submissions = pd.read_csv('../input/sample_submission.csv')
submissions['diagnosis'] = predictions
submissions.head()

In [None]:
submissions.to_csv('submission.csv', index=False)

Check out the dummy's super high LB score: 1st, if you look bottom-up :D

Please let me know if you have any questions, corrections or other comments.
Also, a bonus: you can use this metric as a loss ;). Happy Kaggling!!