# 1. Assessor and analyst work

## 1.0. Rating and criteria

Please [open this document](https://static.googleusercontent.com/media/guidelines.raterhub.com/en//searchqualityevaluatorguidelines.pdf)
and study chapters 13.0-13.4. Your task will be to rate the organic results that different search engines return for the same query.

## 1.1. Explore the page

For the following search engines:
- https://duckduckgo.com/
- https://www.bing.com/
- https://ya.ru/
- https://www.google.com/

Run the same query: "**How to get from Kazan to Voronezh**".

Discuss with your TA the following:
1. Which elements can you identify on the SERP? Ads, snippets, blends from other sources, ...?
2. Where are the organic results? How many of them are there?

## 1.2. Rate the results of the search engine

If your group is large, assess all the search engines; otherwise choose one or two. Each search engine should be rated by at least 5 of you. Use the rating scale from the handbook with the numerical equivalents 0..4 for `[FailsM, SM, MM, HM, FullyM]`.
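The handbook labels (Fails to Meet, Slightly Meets, Moderately Meets, Highly Meets, Fully Meets) have to be turned into numbers before any statistics can be computed. A minimal sketch of that conversion, assuming each assessor's labels are collected as a list of strings in SERP order; `LABEL_TO_SCORE`, `to_scores`, and the example labels are hypothetical.

```python
import numpy as np

# Handbook labels, ordered from worst to best match of the user's need.
LABELS = ["FailsM", "SM", "MM", "HM", "FullyM"]
# Map each label to its 0..4 numerical equivalent.
LABEL_TO_SCORE = {label: score for score, label in enumerate(LABELS)}


def to_scores(labels_per_assessor):
    """Convert per-assessor label lists into an (assessors x items) int array."""
    return np.array(
        [[LABEL_TO_SCORE[label] for label in row] for row in labels_per_assessor],
        dtype=int,
    )


# Hypothetical labels from two assessors for the first three SERP items.
example = [
    ["FullyM", "HM", "MM"],
    ["HM", "HM", "SM"],
]
scores = to_scores(example)
print(scores)
```

The resulting array has the same layout as `ranking_data` below: one row per assessor, one column per SERP item.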

Compute:
- average relevance and standard deviation for each SERP element.
- [Fleiss kappa score](https://en.wikipedia.org/wiki/Fleiss%27_kappa#Worked_example) for your group. Use [this implementation](https://www.statsmodels.org/dev/generated/statsmodels.stats.inter_rater.fleiss_kappa.html).
- [Kendall rank coefficient](https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient) for some pairs in your group. Use [this implementation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kendalltau.html).

Discuss numerical results. Did you agree on the relevance? Did you agree on the rank? What is the difference?

```python
import numpy as np

# example input by users
ranking_data = np.array([
    [4, 4, 4, 3, 4, 2, 2, 1, 1, 0],  # assessor 1 relevance
    [4, 3, 4, 3, 3, 2, 1, 1, 1, 1],  # assessor 2 relevance
    [3, 4, 4, 4, 4, 3, 2, 1, 1, 1],  # ...
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 0],
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 3]
])
```

Averages and standard deviations per item:

1) Calculate the average (hint: use `mean(axis=0)`, because each column holds one SERP item).

2) Calculate `sigma2`, the variance: `mean((item - mean)**2)`.

3) Calculate `sigma`, the standard deviation: the square root of the variance.



```python
# Mean relevance of each SERP item across assessors (one column per item).
average_relevance = ranking_data.mean(axis=0)
# Population variance: mean squared deviation from the item's mean.
sigma2 = ((ranking_data - average_relevance) ** 2).mean(axis=0)
# Standard deviation: square root of the variance.
sigma = np.sqrt(sigma2)

for i in range(ranking_data.shape[1]):
    print(f" {i} relevance {average_relevance[i]:.2f} ± {sigma[i]:.3f}")
```

```
 0 relevance 3.80 ± 0.400
 1 relevance 3.80 ± 0.400
 2 relevance 4.00 ± 0.000
 3 relevance 3.60 ± 0.490
 4 relevance 3.40 ± 0.490
 5 relevance 2.20 ± 0.400
 6 relevance 1.80 ± 0.400
 7 relevance 1.00 ± 0.000
 8 relevance 1.00 ± 0.000
 9 relevance 1.00 ± 1.095
```
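The hand-computed `sigma` divides by the number of assessors (population formula), which is what `np.std` does by default (`ddof=0`). With only five assessors, the sample estimate (`ddof=1`, dividing by n - 1) is noticeably larger. A small self-contained check, reusing the example ratings; which variant to report is a judgment call for your group.

```python
import numpy as np

# Same example ratings as above: rows are assessors, columns are SERP items.
ranking_data = np.array([
    [4, 4, 4, 3, 4, 2, 2, 1, 1, 0],
    [4, 3, 4, 3, 3, 2, 1, 1, 1, 1],
    [3, 4, 4, 4, 4, 3, 2, 1, 1, 1],
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 0],
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 3],
])

# Manual population standard deviation, as in the exercise.
manual_sigma = np.sqrt(((ranking_data - ranking_data.mean(axis=0)) ** 2).mean(axis=0))
# np.std uses ddof=0 by default, so it should agree with the manual version.
print(np.allclose(manual_sigma, ranking_data.std(axis=0)))
# Sample standard deviation (divide by n - 1 instead of n).
sample_sigma = ranking_data.std(axis=0, ddof=1)
print(np.round(sample_sigma, 3))
```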


Fleiss kappa score
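Fleiss' kappa compares the observed agreement between assessors with the agreement expected by chance. For $N$ items, $n$ assessors per item, and counts $n_{ij}$ (how many assessors put item $i$ into category $j$): $p_j = \frac{1}{Nn}\sum_i n_{ij}$, $P_i = \frac{1}{n(n-1)}\left(\sum_j n_{ij}^2 - n\right)$, $\bar{P} = \frac{1}{N}\sum_i P_i$, $\bar{P}_e = \sum_j p_j^2$, and $\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}$. A from-scratch sketch of that formula, assuming every item is rated by the same number of assessors; `fleiss_kappa_manual` is a hypothetical helper, useful for cross-checking the statsmodels result (for the example data it gives about 0.516).

```python
import numpy as np

# Example ratings: rows are assessors, columns are SERP items, values are 0..4.
ranking_data = np.array([
    [4, 4, 4, 3, 4, 2, 2, 1, 1, 0],
    [4, 3, 4, 3, 3, 2, 1, 1, 1, 1],
    [3, 4, 4, 4, 4, 3, 2, 1, 1, 1],
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 0],
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 3],
])


def fleiss_kappa_manual(counts):
    """Fleiss' kappa for an (items x categories) table of rating counts."""
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()  # assumed identical for every item
    # Share of all assignments that fell into each category.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    # Per-item agreement: fraction of assessor pairs that agree on the item.
    p_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()      # observed agreement
    p_e = (p_j ** 2).sum()  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Count table: for each item (column of ranking_data), how many assessors chose 0..4.
counts = np.array([np.bincount(col, minlength=5) for col in ranking_data.T])
kappa_manual = fleiss_kappa_manual(counts)
print(counts)
print(f"Kappa: {kappa_manual:.4f}")
```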

```
!pip install statsmodels
```


```python
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# aggregate_raters expects one row per subject (SERP item) and one column per rater,
# so transpose the (assessors x items) matrix.
transposed_ranking_data = ranking_data.transpose()
# n_cat=5 keeps all categories 0..4 as columns, even if some label is never used.
agreement_matrix, categories = aggregate_raters(transposed_ranking_data, n_cat=5)
kappa = fleiss_kappa(agreement_matrix)
print("Agreement matrix:")
print(agreement_matrix)
print(f"Categories: {categories}")
print(f"Kappa: {kappa}")
```

```
Agreement matrix:
[[0 0 0 1 4]
 [0 0 0 1 4]
 [0 0 0 0 5]
 [0 0 0 2 3]
 [0 0 0 3 2]
 [0 0 4 1 0]
 [0 1 4 0 0]
 [0 5 0 0 0]
 [0 5 0 0 0]
 [2 2 0 1 0]]
Categories: [0 1 2 3 4]
Kappa: 0.5156081808396124
```


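Kendall's tau is pairwise: compare one assessor's ranking with another's.

```python
from scipy.stats import kendalltau

# Compare assessor 1 with assessor 2. Store the result under a new name so that
# the imported kendalltau function is not shadowed and can be called again.
tau_result = kendalltau(ranking_data[0], ranking_data[1])
tau_result
```

```
SignificanceResult(statistic=0.8336550215650926, pvalue=0.0031006074932690315)
```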
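A single pair says little about the whole group. A sketch that computes Kendall's tau for every pair of assessors with `itertools.combinations` and then averages the values; the mean of pairwise tau is just one simple summary chosen here, not a standard group statistic like Fleiss' kappa.

```python
from itertools import combinations

import numpy as np
from scipy.stats import kendalltau

# Example ratings: rows are assessors, columns are SERP items.
ranking_data = np.array([
    [4, 4, 4, 3, 4, 2, 2, 1, 1, 0],
    [4, 3, 4, 3, 3, 2, 1, 1, 1, 1],
    [3, 4, 4, 4, 4, 3, 2, 1, 1, 1],
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 0],
    [4, 4, 4, 4, 3, 2, 2, 1, 1, 3],
])

# Kendall's tau for every pair of assessors (a, b) with a < b.
pairwise_tau = {}
for a, b in combinations(range(ranking_data.shape[0]), 2):
    tau, p_value = kendalltau(ranking_data[a], ranking_data[b])
    pairwise_tau[(a, b)] = tau
    print(f"assessors {a + 1} vs {b + 1}: tau = {tau:.3f}, p = {p_value:.4f}")

# One summary number for the group: the mean over all pairs.
mean_tau = np.mean(list(pairwise_tau.values()))
print(f"mean pairwise tau: {mean_tau:.3f}")
```

Low-tau pairs point to the assessors whose orderings disagree most, which is a good starting point for the discussion.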

# 2. Engineer work

You will create a bucket of URLs that are relevant to the query **"free cloud git"**. Then you will automate the search procedure using https://serpapi.com/, https://developers.google.com/custom-search/v1/overview, or any other search API.

Then you will compute MRR@10 and Precision@10.

## 2.1. Build your bucket here

```python
rel_bucket = [
    "gitpod.io",
    "github.com",
    "bitbucket.org",
    "source.cloud.google.com",
    "gitlab.com",
    "sourceforge.net",
    "aws.amazon.com/codecommit/",
    "launchpad.net",
]

query = "free git cloud"
```

## 2.2. Relevance Assessment

The purpose of this section is to introduce a function, `is_rel`, which evaluates the relevance of a document based on its URL.

### Function Overview

**Name:** `is_rel(resp_url)`

**Arguments:**
- `resp_url`: A string representing the URL of the document we wish to assess for relevance.

**Returns:**
- An integer: `1` if the document is considered relevant, `0` otherwise.

### Code Explanation

```python
def is_rel(resp_url):
    # A document is relevant (1) if any URL from the known-relevant bucket
    # occurs as a substring of resp_url; otherwise it is not relevant (0).
    return int(any(url in resp_url for url in rel_bucket))
```
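Substring matching is simple but can misfire: `"https://example.com/?ref=github.com"` contains `github.com` and would count as relevant. A stricter sketch compares the parsed hostname (allowing subdomains such as `about.gitlab.com`) and, for entries like `aws.amazon.com/codecommit/`, also requires the path prefix. `is_rel_by_host` is a hypothetical alternative, not part of the original exercise.

```python
from urllib.parse import urlparse

# Same bucket as above: a host, optionally followed by a path prefix.
rel_bucket = [
    "gitpod.io",
    "github.com",
    "bitbucket.org",
    "source.cloud.google.com",
    "gitlab.com",
    "sourceforge.net",
    "aws.amazon.com/codecommit/",
    "launchpad.net",
]


def is_rel_by_host(resp_url, bucket=rel_bucket):
    """Return 1 if the URL's host (and path prefix, if any) matches a bucket entry, else 0."""
    parsed = urlparse(resp_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path or "/"
    for entry in bucket:
        entry_host, _, entry_path = entry.partition("/")
        # Exact host or a subdomain of it: "about.gitlab.com" matches "gitlab.com",
        # but "notgitlab.com" does not.
        host_ok = host == entry_host or host.endswith("." + entry_host)
        # Entries without a path match any path ("/" prefix).
        path_ok = path.startswith("/" + entry_path)
        if host_ok and path_ok:
            return 1
    return 0


print(is_rel_by_host("https://about.gitlab.com/"))
print(is_rel_by_host("https://example.com/?ref=github.com"))
```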

## 2.3. Automation

This section introduces a procedure to fetch search results from a search engine using an automation tool (in this case, `serpapi`). This tool allows us to programmatically obtain search results based on a query.

### Procedure Overview

1. Define the `api_key` to authenticate with the service.
2. Construct a URL endpoint that specifies the search query and other parameters.
3. Fetch the search results.
4. Parse and display the results, while also assessing their relevance.


```python
# Import os (to read the API key) and requests (to send HTTP requests).
import os

import requests

# The API key for the serpapi service. Rather than hard-coding a secret in the
# notebook, read it from an environment variable (the name SERPAPI_API_KEY is
# this notebook's choice; set it before running the cell).
api_key = os.environ["SERPAPI_API_KEY"]

# Search parameters: the query, interface language (English), country (US),
# and the Google domain. Passing them via `params` lets requests URL-encode the query.
params = {
    "q": query,
    "hl": "en",
    "gl": "us",
    "google_domain": "google.com",
    # "location": "Austin, Texas, United States",  # optional location targeting
    "api_key": api_key,
}
# Send a GET request, fail loudly on HTTP errors, and parse the JSON response.
response = requests.get("https://serpapi.com/search.json", params=params)
response.raise_for_status()
js = response.json()

# Initialize a list to store relevance indicators (1 for relevant, 0 for non-relevant).
rels = []

# Iterate over the 'organic_results' section of the response.
for result in js["organic_results"]:
    # Display the position, title, and link (URL) of the search result.
    print(f'Position: {result["position"]}')
    print(f'Title: {result["title"]}')
    print(f'URL: {result["link"]}')
    # Check the relevance of the link using is_rel and display the result.
    relevance = is_rel(result["link"])
    print(f"Relevance: {relevance}")
    # Append the relevance indicator (1 or 0) to the rels list.
    rels.append(relevance)
    print()  # Print a blank line for better visual separation.
```

```
Position: 1
Title: 6 places to host your git repository
URL: https://opensource.com/article/18/8/github-alternatives
Relevance: 0

Position: 2
Title: GitLab: The DevSecOps Platform
URL: https://about.gitlab.com/
Relevance: 1

Position: 3
Title: GitHub: Let's build from here · GitHub
URL: https://github.com/
Relevance: 1

Position: 4
Title: Gitpod: Always ready-to-code.
URL: https://www.gitpod.io/
Relevance: 1

Position: 5
Title: Bitbucket | Git solution for teams using Jira
URL: https://bitbucket.org/product
Relevance: 1

Position: 6
Title: Best 13 Free Version Control Hosting Software Picks in 2023
URL: https://www.g2.com/categories/version-control-hosting/free
Relevance: 0

Position: 7
Title: 14 Git Hosting Services Compared | Tower Blog
URL: https://www.git-tower.com/blog/git-hosting-services-compared/
Relevance: 0

Position: 8
Title: Best free git hosting? : r/git
URL: https://www.reddit.com/r/git/comments/46t07s/best_free_git_hosting/
Relevance: 0

Position: 9
Title: Sourcetree | 
```

```python
# Display the final list of relevance indicators for all the search results.
print(rels)
```

```
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```
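Every API call costs credits, so it can help to save the JSON response (for example with `json.dump`) and rebuild the relevance vector offline. A sketch of such a helper, assuming only the keys the cell above already uses (`organic_results`, `position`, `link`); `rels_from_response` and the tiny hand-made response are hypothetical, not real API output.

```python
import json


def rels_from_response(js, is_rel_fn, k=10):
    """Turn a SerpAPI-style response dict into a 0/1 relevance list for the top-k organic results."""
    # Order by the reported position, in case the list is not already sorted.
    organic = sorted(js.get("organic_results", []), key=lambda r: r["position"])
    return [is_rel_fn(r["link"]) for r in organic[:k]]


# Hand-made response with the same keys as the output above (not real API data).
saved = json.loads("""
{"organic_results": [
  {"position": 2, "title": "GitHub", "link": "https://github.com/"},
  {"position": 1, "title": "Blog post", "link": "https://opensource.com/article"}
]}
""")
print(rels_from_response(saved, lambda url: int("github.com" in url)))  # [0, 1]
```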


## 2.4. MRR (Mean Reciprocal Rank)

MRR stands for "Mean Reciprocal Rank". It is a statistical measure used to evaluate the quality of a list of ranked items, specifically in information retrieval systems like search engines. MRR calculates the average of the reciprocal ranks of the first relevant item in the list.

### Concept:

If we have a set of ranked items, the reciprocal rank is the multiplicative inverse of the rank of the first relevant item. For example, if the first relevant item is in the 2nd position, its reciprocal rank is $\frac{1}{2}$. If no relevant item is found, the reciprocal rank is usually taken to be 0; the implementation below instead uses a default rank of $k + 1$ (the length of the list plus one), which gives $\frac{1}{k+1}$.

The MRR is the average of the reciprocal ranks for a set of queries or lists.
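In symbols, for a set of queries $Q$, where $\text{rank}_q$ is the position of the first relevant result for query $q$:

$$ \text{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{\text{rank}_q} $$

For example, a single query whose first relevant result is at position 2 has $\text{MRR} = \frac{1}{2} = 0.5$. For MRR@k, only the top $k$ positions are inspected.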


```python
import numpy as np


def mrr(list_of_lists, k=10):
    # Collect the reciprocal rank of each list.
    r = []
    # Iterate over each list in the list of lists; only the top k items count.
    for lst in list_of_lists:
        top_k = list(lst[:k])
        try:
            # Reciprocal rank of the first relevant item (ranks are 1-based).
            r.append(1 / (top_k.index(1) + 1))
        except ValueError:
            # No relevant item in the top k: use the default rank k + 1.
            r.append(1 / (k + 1))
    # Return the mean of the collected reciprocal ranks.
    return np.mean(r)
```


```python
mrr([rels])  # BTW, why do I wrap the list into additional brackets? :)
```

```
0.5
```
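The metric is defined over a set of queries, so the function expects a list of per-query relevance lists; a single query's `rels` therefore has to be wrapped in an outer list. A self-contained sketch with three hypothetical queries, using the same $\frac{1}{k+1}$ convention as `mrr` above; `mean_reciprocal_rank` and `runs` are illustrative names.

```python
import numpy as np


def mean_reciprocal_rank(list_of_lists, k=10):
    """MRR@k; a list with no relevant item in its top k scores 1/(k+1), as in mrr above."""
    rr = []
    for lst in list_of_lists:
        top_k = list(lst[:k])
        # 1-based rank of the first relevant item, or the default rank k + 1.
        rr.append(1 / (top_k.index(1) + 1) if 1 in top_k else 1 / (k + 1))
    return float(np.mean(rr))


# Three hypothetical queries: first relevant hit at rank 1, at rank 3, and none.
runs = [
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
score = mean_reciprocal_rank(runs, k=4)
print(score)  # (1 + 1/3 + 1/5) / 3
```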

## 2.5. Precision

Precision is one of the fundamental metrics in information retrieval. It quantifies how many of the retrieved items (or documents) are relevant. Specifically, precision is defined as the ratio of relevant retrieved items to the total number of retrieved items. 

Mathematically:
$$ \text{Precision} = \frac{\text{Number of Relevant Retrieved Items}}{\text{Total Number of Retrieved Items}} $$


In the context of multiple sets of retrieved items (like multiple search queries), the mean precision is often calculated to provide an average measure across all the sets.
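For the top $k$ results with binary relevance labels $\text{rel}_i \in \{0, 1\}$, this becomes Precision@k, and the mean precision averages it over the set of queries $Q$:

$$ P@k = \frac{1}{k} \sum_{i=1}^{k} \text{rel}_i, \qquad \text{MP}@k = \frac{1}{|Q|} \sum_{q \in Q} P@k(q) $$

For example, 4 relevant results among the top 10 give $P@10 = \frac{4}{10} = 0.4$.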

### Code Explanation:
Compute mean precision:

```python
def mp(list_of_lists, k=10):
    # Accumulate the sum of per-list precisions.
    p = 0
    # Iterate over each list in the list of lists.
    for lst in list_of_lists:
        # Precision@k for the current list: sum(lst[:k]) counts the relevant
        # items among the top k, and k is the number of positions considered.
        p += sum(lst[:k]) / k
    # Return the mean of the accumulated precisions.
    return p / len(list_of_lists)
```

```python
print(mp([rels]))
```

```
0.4
```
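Dividing by $k$ rather than by the number of results actually returned treats missing positions as non-relevant, so a query that returns fewer than $k$ results is penalized. A sketch over two queries, the `rels` vector printed above and a hypothetical query with only three results; `mean_precision_at_k` is an illustrative name.

```python
def mean_precision_at_k(list_of_lists, k=10):
    """Mean Precision@k: relevant items among the top k, divided by k, averaged over lists."""
    return sum(sum(lst[:k]) / k for lst in list_of_lists) / len(list_of_lists)


runs = [
    [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # the rels vector printed above
    [1, 1, 0],                       # hypothetical query with only 3 results
]
print(mean_precision_at_k(runs, k=10))  # (0.4 + 0.2) / 2
```

If short result lists should not be penalized, divide by `min(k, len(lst))` instead; which convention to use is a design choice worth stating when reporting the number.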
