# 1. Assessor and analyst work

## 1.0. Rating and criteria

Please [open this document](https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf)
and study chapters 13.0-13.4. Your task is to assess the organic results that different search engines return for the same query.

## 1.1. Explore the page

Run the query "**How to get from Kazan to Voronezh**" in each of the following search engines:
- https://duckduckgo.com/
- https://www.bing.com/
- https://ya.ru/
- https://www.google.com/

Discuss with your TA the following:
1. Which elements can you identify on the SERP? Ads, snippets, blends from other sources, ...?
2. Where are organic results? How many of them are there?

## 1.2. Rate the results of the search engine

If there are many of you in the group, assess all search engines; otherwise choose 1 or 2. There should be at least 5 of you for each search engine. Use the scale from the handbook, with the numerical equivalents 0..4 for `[FailsM, SM, MM, HM, FullyM]`.
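The mapping from handbook labels to numbers can be kept in one place so that every assessor encodes ratings the same way. The sketch below is only an illustration; the names `NEEDS_MET_SCALE` and `to_numeric` and the sample ratings are assumptions, not part of the assignment.

```python
# Map the handbook's labels to the 0..4 numeric scale used in this notebook.
NEEDS_MET_SCALE = {"FailsM": 0, "SM": 1, "MM": 2, "HM": 3, "FullyM": 4}


def to_numeric(labels):
    """Convert a list of label strings into their numeric equivalents."""
    return [NEEDS_MET_SCALE[label] for label in labels]


# Hypothetical ratings of one assessor for a five-result SERP.
print(to_numeric(["FullyM", "HM", "MM", "SM", "FailsM"]))  # [4, 3, 2, 1, 0]
```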

Compute:
- average relevance and standard deviation for each SERP element.
- [Fleiss kappa score](https://en.wikipedia.org/wiki/Fleiss%27_kappa#Worked_example) for your group. Use [this implementation](https://www.statsmodels.org/dev/generated/statsmodels.stats.inter_rater.fleiss_kappa.html).
- [Kendall rank coefficient](https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient) for some pairs in your group. Use [this implementation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html).

Discuss the numerical results. Did you agree on the relevance? Did you agree on the ranking? How do these two kinds of agreement differ?
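To see why agreement on relevance and agreement on rank are different things, consider two assessors whose grades never coincide but who order the results identically. The sketch below is a plain-Python illustration with made-up ratings; `kendall_tau_b` is a simple pair-counting version of the tau-b statistic (the variant `scipy.stats.kendalltau` computes by default), written here only so the example runs without extra packages.

```python
from itertools import combinations
from math import sqrt


def sign(value):
    return (value > 0) - (value < 0)


def exact_agreement(a, b):
    """Fraction of items on which two assessors gave the same grade."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def kendall_tau_b(a, b):
    """Kendall's tau-b by counting pairs (O(n^2), fine for a 10-item SERP).

    Undefined (division by zero) if one assessor gave every item the same grade.
    """
    score = 0      # concordant minus discordant pairs
    untied_a = 0   # pairs that are not tied in a
    untied_b = 0   # pairs that are not tied in b
    for i, j in combinations(range(len(a)), 2):
        da = sign(a[i] - a[j])
        db = sign(b[i] - b[j])
        score += da * db
        untied_a += da != 0
        untied_b += db != 0
    return score / sqrt(untied_a * untied_b)


# Hypothetical ratings: assessor_b grades every result one level lower.
assessor_a = [4, 3, 2, 1]
assessor_b = [3, 2, 1, 0]

print(exact_agreement(assessor_a, assessor_b))  # 0.0: no grade matches
print(kendall_tau_b(assessor_a, assessor_b))    # 1.0: identical ordering
```

Here the assessors disagree on every grade, yet their rankings coincide, which is the kind of difference the discussion should surface.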

In [1]:
import numpy as np

ranking_data = np.array([
    [4, 4, 4, 3, 4, 2, 2, 1, 1, 0],  # Assessor 1 relevance
    [3, 4, 4, 2, 4, 3, 2, 1, 1, 0],  # Assessor 2 relevance
    [4, 3, 3, 3, 4, 2, 2, 0, 0, 0],  # Assessor 3 relevance
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 0],  # Assessor 4 relevance
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 3],  # Assessor 5 relevance
])

Averages and standard deviations per item.

In [9]:
avg_relevance = np.mean(ranking_data, axis=0)
std_deviation = np.std(ranking_data, axis=0)
for i in range(len(avg_relevance)):
    print(f"average = {avg_relevance[i]:.2f} std {std_deviation[i]:.2f}")

average = 3.80 std 0.40
average = 3.80 std 0.40
average = 3.80 std 0.40
average = 3.20 std 0.75
average = 3.60 std 0.49
average = 2.20 std 0.40
average = 2.00 std 0.00
average = 0.80 std 0.40
average = 0.80 std 0.40
average = 0.60 std 1.20


Fleiss kappa score

In [10]:
!pip install statsmodels



In [42]:
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa


Agreement matrix:
[[0 0 0 1 4]
 [0 0 0 1 4]
 [0 0 0 0 5]
 [0 0 0 2 3]
 [0 0 0 3 2]
 [0 0 4 1 0]
 [0 1 4 0 0]
 [0 5 0 0 0]
 [0 5 0 0 0]
 [2 2 0 1 0]]
Categories: [0 1 2 3 4]
Kappa: 0.5156081808396124
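The cell above shows only the import and the printed result, so here is a minimal sketch of the computation itself in plain NumPy. It assumes ratings are integers in `0..n_categories-1` arranged as `(n_raters, n_items)`, like `ranking_data`; the function name and the toy arrays are illustrative. Applied to `ranking_data`, it may not reproduce the printed value exactly, since the printed matrix appears to come from a slightly different set of ratings.

```python
import numpy as np


def fleiss_kappa_from_ratings(ratings, n_categories):
    """Fleiss' kappa for an integer array of shape (n_raters, n_items)."""
    ratings = np.asarray(ratings)
    n_raters, n_items = ratings.shape

    # counts[i, c] = number of raters who gave item i the grade c
    counts = np.zeros((n_items, n_categories), dtype=int)
    for item in range(n_items):
        counts[item] = np.bincount(ratings[:, item], minlength=n_categories)

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    p_items = (np.sum(counts**2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_observed = p_items.mean()

    # Chance agreement derived from the overall grade proportions.
    p_categories = counts.sum(axis=0) / (n_items * n_raters)
    p_expected = np.sum(p_categories**2)

    return (p_observed - p_expected) / (1 - p_expected)


# Toy data (hypothetical): three raters, three items, full agreement.
unanimous = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
print(fleiss_kappa_from_ratings(unanimous, n_categories=3))  # 1.0

# Toy data (hypothetical): the raters split on the second and third items.
mixed = [[4, 3, 2], [4, 3, 1], [4, 2, 2]]
print(fleiss_kappa_from_ratings(mixed, n_categories=5))

# With statsmodels, the equivalent calls would be:
#   table, categories = aggregate_raters(ranking_data.T)  # expects (items, raters)
#   kappa = fleiss_kappa(table, method="fleiss")
```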


Kendall's tau is a pairwise score: compute it for every pair of assessors and compare.

In [16]:
from scipy.stats import kendalltau

n_assessors = ranking_data.shape[0]
kendall_matrix = np.zeros((n_assessors, n_assessors))
p_val_matrix = np.zeros((n_assessors, n_assessors))

for i in range(n_assessors):
    for j in range(i + 1, n_assessors):
        tau, p_value = kendalltau(ranking_data[i], ranking_data[j])
        kendall_matrix[i, j] = tau
        kendall_matrix[j, i] = tau
        p_val_matrix[i, j] = p_value
        p_val_matrix[j, i] = p_value

kendall_values = kendall_matrix[np.triu_indices(n_assessors, k=1)]
average_kendall_tau = np.mean(kendall_values)

p_val_values = p_val_matrix[np.triu_indices(n_assessors, k=1)]
average_p_val = np.mean(p_val_values)

print("Kendall's Tau matrix:")
print(kendall_matrix)

print("P_val matrix")
print(p_val_matrix)

print("Mean tau")
print(average_kendall_tau)

print("Mean p_value")
print(average_p_val)

Kendall's Tau matrix:
[[0.         0.86872191 0.89189189 0.86486486 0.63019612]
 [0.86872191 0.         0.71077247 0.71077247 0.48038446]
 [0.89189189 0.71077247 0.         0.75675676 0.63019612]
 [0.86486486 0.71077247 0.75675676 0.         0.76719527]
 [0.63019612 0.48038446 0.63019612 0.76719527 0.        ]]
P_val matrix
[[0.         0.00155415 0.00138283 0.00180704 0.02435451]
 [0.00155415 0.         0.01006933 0.00962451 0.08310589]
 [0.00138283 0.01006933 0.         0.00665497 0.02523974]
 [0.00180704 0.00962451 0.00665497 0.         0.00612629]
 [0.02435451 0.08310589 0.02523974 0.00612629 0.        ]]
Mean tau
0.7311752335271237
Mean p_value
0.016991926471320622


# 2. Engineer work

You will create a bucket of URLs that are relevant to the query **"free cloud git"**. Then you will automate the search procedure using https://serpapi.com/, https://developers.google.com/custom-search/v1/overview, or any other tool.

Then you will compute MRR@10 and Precision@10.
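For reference, the two metrics can be written as follows. The notation ($Q$ for the set of queries, $\mathrm{rank}_q$ for the position of the first relevant result of query $q$) is introduced here for illustration and follows the usual textbook definitions.

$$
\mathrm{RR}@k(q) =
\begin{cases}
\dfrac{1}{\mathrm{rank}_q} & \text{if } \mathrm{rank}_q \le k \\
0 & \text{if no relevant result appears in the top } k
\end{cases}
\qquad
\mathrm{MRR}@k = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{RR}@k(q)
$$

$$
\mathrm{P}@k(q) = \frac{\text{number of relevant results among the top } k}{k}
$$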

## 2.1. Build your bucket here

In [17]:
rel_bucket = [
    "gitpod.io",
    "github.com",
    "bitbucket.org",
    "source.cloud.google.com",
    "gitlab.com",
    "sourceforge.net",
    "aws.amazon.com/codecommit/",
    "launchpad.net",
]

query = "free git cloud"

## 2.2. Relevance assessment

Write the code that checks whether an obtained document is relevant (True) or not (False).

In [29]:
def is_rel(resp_url):
    return any(bucket_url in resp_url for bucket_url in rel_bucket)
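Plain substring matching also accepts URLs that merely mention a bucket entry, for example in a query string. A stricter variant could compare the host (and the path prefix, when the bucket entry has one). The sketch below is one possible approach, not the required solution; `is_rel_strict` and the shortened `bucket` list are illustrative names.

```python
from urllib.parse import urlparse

# Same kind of entries as rel_bucket; repeated so the sketch runs on its own.
bucket = ["github.com", "gitlab.com", "aws.amazon.com/codecommit/"]


def is_rel_strict(resp_url, bucket_urls=bucket):
    """Match on host, plus path prefix if the bucket entry contains a path.

    URLs without a scheme have no parsed host and are treated as not relevant.
    """
    parsed = urlparse(resp_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lstrip("/")
    for entry in bucket_urls:
        entry_host, _, entry_path = entry.partition("/")
        host_ok = host == entry_host or host.endswith("." + entry_host)
        path_ok = path.startswith(entry_path.rstrip("/"))
        if host_ok and path_ok:
            return True
    return False


print(is_rel_strict("https://github.com/user/repo"))         # True
print(is_rel_strict("https://example.com/?ref=github.com"))  # False
```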

## 2.3. Automation

Get search results from the automation tool you use.

In [22]:
!pip install google-search-results



In [25]:
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": query,  # defined in section 2.1
    "api_key": "secret_api_key",  # placeholder: substitute your own SerpApi key
    "num": 10,
    "google_domain": "google.com",
}


def get_search_results(params):
    search = GoogleSearch(params)
    results = search.get_dict()
    return results.get('organic_results', [])


def assess_relevance(results):
    rels = []
    for result in results:
        url = result['link']
        rels.append(is_rel(url))
    return rels
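Since a placeholder key returns no organic results, it may be convenient to read the real key from the environment instead of pasting it into the notebook. This is only a sketch: the helper `build_params` and the variable name `SERPAPI_API_KEY` are assumptions chosen for illustration, not something SerpApi requires.

```python
import os


def build_params(query, num=10, env_var="SERPAPI_API_KEY"):
    """Build the request parameters, taking the API key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "engine": "google",
        "q": query,
        "api_key": api_key,
        "num": num,
        "google_domain": "google.com",
    }
```

The result can be passed to `get_search_results` in place of the hard-coded `params` dictionary.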

In [26]:
rels = assess_relevance(get_search_results(params))
print(rels)
rels = [1, 0, 0, 1, 0, 1, 0, 1]

[]


## 2.4. MRR

Compute MRR:

In [43]:
def mrr(list_of_lists, k=10):
    reciprocal_ranks = []
    for result_list in list_of_lists:
        for i, is_relevant in enumerate(result_list[:k]):
            if is_relevant == 1:
                reciprocal_ranks.append(1 / (i + 1))
                break
        else:
            reciprocal_ranks.append(0)
    
    return sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0

search_results = [
    [0, 1, 1, 0, 1, 1, 1, 1, 0, 0], 
    [1, 0, 0, 1, 0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

print(mrr(search_results))

0.2916666666666667
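The value can be checked by hand. In the six lists of `search_results`, the first relevant result appears at rank 2, rank 1, and rank 4; the last three lists contain no relevant result and contribute 0:

$$
\mathrm{MRR}@10 = \frac{1}{6}\left(\frac{1}{2} + 1 + \frac{1}{4} + 0 + 0 + 0\right) = \frac{1.75}{6} \approx 0.2917
$$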


## 2.5. Precision
Compute mean precision:

In [44]:
def mp(list_of_relevances, k=10):
    """Mean Precision@k over several result lists of 0/1 relevance labels."""
    total = 0
    for relevances in list_of_relevances:
        total += sum(relevances[:k]) / k

    return total / len(list_of_relevances)

In [45]:
mp(search_results)

0.23333333333333336
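As a hand check against the `search_results` lists from section 2.4: they contain 6, 5, 3, 0, 0, and 0 relevant results in their top 10, so

$$
\text{mean } \mathrm{P}@10 = \frac{1}{6}\left(0.6 + 0.5 + 0.3 + 0 + 0 + 0\right) = \frac{1.4}{6} \approx 0.2333
$$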