# 1. Assessor and analyst work

## 1.0. Rating and criteria

Please [open this document](https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf)
and study chapters 13.0-13.4. Your task will be to assess the organic results that different search engines return for the same query.

## 1.1. Explore the page

For the following search engines:
- https://duckduckgo.com/
- https://www.bing.com/
- https://ya.ru/
- https://www.google.com/

Perform the same query: "**How to get from Kazan to Voronezh**".

Discuss with your TA the following:
1. Which elements can you identify on the SERP: ads, snippets, blends from other sources, ...?
2. Where are the organic results? How many of them are there?

## 1.2. Rate the results of the search engine

If your group is large, assess all search engines; otherwise choose 1 or 2. There should be no fewer than 5 of you for each search engine. Use the scale from the handbook, with the numerical equivalents 0..4 for `[FailsM, SM, MM, HM, FullyM]`.

Compute:
- average relevance and standard deviation for each SERP element.
- [Fleiss kappa score](https://en.wikipedia.org/wiki/Fleiss%27_kappa#Worked_example) for your group. Use [this implementation](https://www.statsmodels.org/dev/generated/statsmodels.stats.inter_rater.fleiss_kappa.html).
- [Kendall rank coefficient](https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient) for some pairs in your group. Use [this implementation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html).

Discuss the numerical results. Did you agree on the relevance? Did you agree on the rank? What is the difference?

In [19]:
import numpy as np
# example input by users
ranking_data = np.array([
    [4, 4, 4, 4, 4, 4, 4, 4, 4, 4], # assessor 1 relevance
    [4, 2, 3, 3, 2, 2, 2, 2, 3, 2], # 2
    [4, 4, 4, 2, 3, 2, 3, 2, 2, 2],
    [4, 4, 4, 3, 4, 3, 3, 3, 3, 4],
    [4, 2, 4, 4, 4, 4, 4, 3, 4, 0],
    [4, 1, 4, 4, 1, 3, 3, 2, 4, 2],
    [4, 2, 4, 4, 3, 2, 2, 2, 2, 2],
    [4, 2, 4, 3, 2, 4, 2, 2, 4, 2],
    [4, 4, 4, 3, 4, 3, 4, 3, 3, 0],
    [4, 4, 4, 2, 4, 4, 3, 0, 3, 2],
    [4, 4, 4, 3, 2, 2, 2, 2, 2, 2],
    [4, 2, 4, 3, 3, 3, 3, 2, 3, 2],
    [4, 4, 4, 2, 2, 2, 2, 1, 1, 1],
])

Averages and standard deviations per item.

In [9]:
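import math

# Per-item mean and population standard deviation (divide by N, not N - 1).
res = []
n_assessors = len(ranking_data)
for i in range(len(ranking_data[0])):
    total = 0  # named `total` to avoid shadowing the built-in sum()
    for j in range(n_assessors):
        total += ranking_data[j][i]

    mean = total / n_assessors

    sq_dev = 0  # sum of squared deviations from the mean
    for j in range(n_assessors):
        term = ranking_data[j][i] - mean
        sq_dev += term * term

    std = math.sqrt(sq_dev / n_assessors)
    res.append((mean, std))

print(res)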

[(4.0, 0.0), (3.0, 1.1094003924504583), (3.923076923076923, 0.2664693550105965), (3.076923076923077, 0.7297563831157798), (2.923076923076923, 0.997037030524286), (2.923076923076923, 0.8284868934053083), (2.8461538461538463, 0.7692307692307693), (2.1538461538461537, 0.9483713850721502), (2.923076923076923, 0.9166442529086912), (1.9230769230769231, 1.1409536133993328)]
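
The same per-item statistics can probably be obtained without explicit loops. The sketch below uses a small subset of the example data (first three assessors, first four items) and shows both the population standard deviation (`ddof=0`, which is what the loop above computes) and the sample one (`ddof=1`). The variable names are illustrative only.

```python
import numpy as np

# Subset of the example data: 3 assessors (rows) x 4 items (columns).
subset = np.array([
    [4, 4, 4, 4],
    [4, 2, 3, 3],
    [4, 4, 4, 2],
])

# axis=0 aggregates over assessors, giving one value per item.
means = subset.mean(axis=0)
std_population = subset.std(axis=0)       # divides by N
std_sample = subset.std(axis=0, ddof=1)   # divides by N - 1

for m, sp, ss in zip(means, std_population, std_sample):
    print(f"mean={m:.3f} std(N)={sp:.3f} std(N-1)={ss:.3f}")
```

With only a handful of assessors per item, the two standard deviations differ noticeably, so it is worth stating which one you report.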


Fleiss kappa score

In [12]:
#!pip install statsmodels

In [27]:
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
tr = ranking_data.T

ar, cl = aggregate_raters(tr)
fk = fleiss_kappa(ar)
print(ar)
print(cl)
print(fk)

[[ 0  0  0  0 13]
 [ 0  1  5  0  7]
 [ 0  0  0  1 12]
 [ 0  0  3  6  4]
 [ 0  1  4  3  5]
 [ 0  0  5  4  4]
 [ 0  0  5  5  3]
 [ 1  1  7  3  1]
 [ 0  1  3  5  4]
 [ 2  1  8  0  2]]
[0 1 2 3 4]
0.1643502432244614
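
To see where this number comes from, here is a minimal sketch of the Fleiss kappa formula applied directly to the aggregated table printed above. The function name `fleiss_kappa_manual` is hypothetical, and the sketch assumes every item was rated by the same number of assessors.

```python
import numpy as np

def fleiss_kappa_manual(table):
    """Fleiss kappa from an items x categories table of rating counts."""
    table = np.asarray(table, dtype=float)
    n_items = table.shape[0]
    n_raters = table[0].sum()  # assumed identical for every item

    # Share of all ratings that fell into each category.
    p_cat = table.sum(axis=0) / (n_items * n_raters)
    # Per-item agreement: fraction of rater pairs that agree.
    p_item = (np.square(table).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))

    p_observed = p_item.mean()
    p_expected = np.square(p_cat).sum()
    return (p_observed - p_expected) / (1 - p_expected)

# The aggregated table from the output above (10 items, 5 grades, 13 assessors).
counts = [
    [0, 0, 0, 0, 13], [0, 1, 5, 0, 7], [0, 0, 0, 1, 12], [0, 0, 3, 6, 4],
    [0, 1, 4, 3, 5], [0, 0, 5, 4, 4], [0, 0, 5, 5, 3], [1, 1, 7, 3, 1],
    [0, 1, 3, 5, 4], [2, 1, 8, 0, 2],
]
kappa = fleiss_kappa_manual(counts)
print(kappa)
```

Note that Fleiss kappa treats the grades as unordered categories: a 3 versus a 4 counts as much of a disagreement as a 0 versus a 4.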


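Kendall's tau is a pairwise score: it compares one assessor's ratings to another's. A `nan` result means that one of the two assessors gave every item the same rating, so no ranking can be derived from it.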

In [29]:
from scipy.stats import kendalltau

for i in range(9):
    print(i, i + 1, kendalltau(ranking_data[i], ranking_data[i + 1]))

0 1 KendalltauResult(correlation=nan, pvalue=nan)
1 2 KendalltauResult(correlation=0.20739033894608505, pvalue=0.49841635220157854)
2 3 KendalltauResult(correlation=0.6465790872963897, pvalue=0.04125001659393949)
3 4 KendalltauResult(correlation=-0.28577380332470415, pvalue=0.36836447825890395)
4 5 KendalltauResult(correlation=0.5443310539518175, pvalue=0.06285135804535168)
5 6 KendalltauResult(correlation=0.4169751944147297, pvalue=0.159503942345546)
6 7 KendalltauResult(correlation=0.3931079294405248, pvalue=0.2061280020855033)
7 8 KendalltauResult(correlation=0.0, pvalue=1.0)
8 9 KendalltauResult(correlation=0.6141827746434741, pvalue=0.040531251037785)
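
The `nan` for the pair `0 1` is worth a closer look. The sketch below reproduces it with the first three assessors from the example data and shows one possible way to skip assessors whose ratings do not vary. The helper `has_variation` is a hypothetical name, not part of scipy.

```python
import math
from scipy.stats import kendalltau

assessor_1 = [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]  # constant ratings
assessor_2 = [4, 2, 3, 3, 2, 2, 2, 2, 3, 2]
assessor_3 = [4, 4, 4, 2, 3, 2, 3, 2, 2, 2]

def has_variation(ratings):
    """True if the assessor used at least two different grades."""
    return len(set(ratings)) > 1

# All items are tied for assessor 1, so the coefficient is undefined.
tau_constant, _ = kendalltau(assessor_1, assessor_2)
print(math.isnan(tau_constant))

# Compare only assessors whose ratings induce some ordering.
if has_variation(assessor_2) and has_variation(assessor_3):
    tau_23, _ = kendalltau(assessor_2, assessor_3)
    print(tau_23)
```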


# 2. Engineer work

You will create a bucket of URLs which are relevant for the query **"free cloud git"**. Then you will automate the search procedure using https://serpapi.com/, or https://developers.google.com/custom-search/v1/overview, or any other tool you prefer.

Then you will compute MRR@10 and Precision@10.

## 2.1. Build your bucket here

In [30]:
rel_bucket = [
    "github.com",
    "bitbucket.org",
    "gitpod.io",
    "gitlab.com",
    "azure.microsoft.com",
    "gitea.io",
    "sourceforge.net",
    "codebasehq.com",
    "source.cloud.google.com",
]

query = "free cloud git"

## 2.2. Relevance assessment

Write the code to check that the obtained document is relevant (True) or not (False).

In [31]:
def is_rel(resp_url):
    return resp_url in rel_bucket
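
Search tools usually return full URLs such as `https://github.com/features`, which would not match a bucket of bare domains by simple membership. A possible variant, sketched under that assumption, compares the host name instead. The name `is_rel_by_host` and the explicit `bucket` parameter are illustrative.

```python
from urllib.parse import urlparse

def is_rel_by_host(resp_url, bucket):
    """True if the URL's host is a bucket domain or one of its subdomains."""
    # urlparse only recognises the host when a scheme or leading // is present.
    if "://" not in resp_url:
        resp_url = "//" + resp_url
    host = (urlparse(resp_url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in bucket)

bucket = ["github.com", "gitlab.com"]
print(is_rel_by_host("https://www.github.com/features", bucket))
print(is_rel_by_host("https://notgithub.com/", bucket))
```

Matching on `"." + domain` rather than a plain suffix keeps look-alike hosts such as `notgithub.com` out of the relevant set.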

## 2.3. Automation

Get search results from the automation tool you use.

In [36]:
import requests
from bs4 import BeautifulSoup  # installed as the beautifulsoup4 package

url = "http://google.com/search?client=safari&rls=en&q=Google+Search+api&ie=UTF-8&oe=UTF-8"
doc = requests.get(url).content

# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(doc, "html.parser")
# Your code here
rels = []

print(soup)

In [75]:
rels

[0, 1, 0, 1, 1, 0, 0, 1, 0, 0]
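
One way a list like the one above might be produced is sketched below. It assumes a SerpApi-style JSON endpoint (`https://serpapi.com/search.json` with `engine`, `q`, and `api_key` parameters) and a response with an `organic_results` list whose entries carry a `link` field. Check these names against the documentation of the tool you actually use. The function names are hypothetical, and the standard library is used instead of `requests` only to keep the sketch dependency-free.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def fetch_serp(query, api_key, engine="google"):
    """Fetch one SERP as a dict (endpoint and parameter names are assumptions)."""
    params = urlencode({"engine": engine, "q": query, "api_key": api_key})
    with urlopen("https://serpapi.com/search.json?" + params, timeout=30) as resp:
        return json.load(resp)

def payload_to_rels(payload, is_rel, k=10):
    """Map the top-k organic links to 0/1 relevance labels."""
    results = payload.get("organic_results", [])[:k]
    return [int(is_rel(r.get("link", ""))) for r in results]

# Offline usage with a hand-made payload, so no API key is needed.
fake_payload = {"organic_results": [
    {"link": "https://github.com/"},
    {"link": "https://example.com/"},
]}
print(payload_to_rels(fake_payload, lambda url: "github.com" in url))
```

Keeping the network call separate from the labelling step makes the labelling testable without spending API quota.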

## 2.4. MRR

Compute MRR:

In [None]:
def mrr(list_of_lists, k=10):
    # todo your code here
    return 0.

In [None]:
mrr([rels]) # BTW, why do I wrap the list into additional brackets? :)
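
One possible way to fill in the stub is sketched below under the name `mrr_sketch`, so that it does not replace your own solution. It assumes each inner list holds the 0/1 relevance labels of one query in rank order, which is also why a single `rels` list is wrapped in extra brackets: MRR is a mean over queries.

```python
def mrr_sketch(list_of_lists, k=10):
    """Mean reciprocal rank of the first relevant result within the top k."""
    if not list_of_lists:
        return 0.0
    total = 0.0
    for rels in list_of_lists:
        for rank, rel in enumerate(rels[:k], start=1):
            if rel:
                total += 1.0 / rank
                break  # only the first relevant result counts
        # A query with no relevant result in the top k contributes 0.
    return total / len(list_of_lists)

print(mrr_sketch([[0, 1, 0, 1, 1, 0, 0, 1, 0, 0]]))  # first hit at rank 2 -> 0.5
```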

## 2.5. Precision
Compute mean precision:

In [None]:
def mp(list_of_lists, k=10):
    # todo your code here
    return 0.

In [None]:
mp([rels])
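
A matching sketch for mean Precision@k follows, again under a separate name (`mp_sketch`). It assumes the usual convention of dividing by `k` even when a query returned fewer than `k` results. If you prefer to divide by the number of results actually returned, adjust the denominator.

```python
def mp_sketch(list_of_lists, k=10):
    """Mean over queries of (relevant results in the top k) / k."""
    if not list_of_lists:
        return 0.0
    per_query = [sum(1 for rel in rels[:k] if rel) / k for rels in list_of_lists]
    return sum(per_query) / len(per_query)

print(mp_sketch([[0, 1, 0, 1, 1, 0, 0, 1, 0, 0]]))  # 4 relevant out of 10 -> 0.4
```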