## Unstructured Information
# Assignment 2: Search engine evaluation
### Authors: Íñigo Gómez Carvajal and Jon Zorrilla Gamboa

# 1.  Evaluation with relevance judgments

### Connecting to Google Drive

This gives direct access to the spreadsheet with the data entered by the students.

In [None]:
import numpy as np
from google.colab import auth
import pandas as pd
import time

auth.authenticate_user()
import gspread
from google.auth import default
creds, _ = default()
gc = gspread.authorize(creds)

### Reading data from the spreadsheet

In [None]:
# Convenience function to read data from the tabs of a spreadsheet
def read_sheet(wb, name, cols=False, colnames=False, coltypes=False):
  df = pd.DataFrame(wb.worksheet(name).get_all_values())[1:]
  if cols: df = df[df.columns[cols]] 
  if colnames: df.columns = colnames
  if coltypes: df = df.astype(coltypes)
  return df

# Connect to the spreadsheet
wb = gc.open_by_url('https://docs.google.com/spreadsheets/d/1nWr6r1ZkLH29WTyhqr4oz05HgvQv0tmE-IPHdZUex0c')
print('Reading data...')

# Read the rankings
print('  Reading tab', 'Rankings', end='')
rankings = read_sheet(wb, 'Rankings', [0,1,2,3,4], ['qid', 'docid', 'pos', 'score', 'system'], {'qid':'int', 'pos':'int', 'score':'int'})
rankings = rankings[rankings.docid != '']
print('..... ok')

# Read the relevance judgments
qrels_parts = []
for ws in wb.worksheets():
  if ws.title.startswith('Acierto q'): 
    print('  Reading tab', ws.title, end='')
    qrels_parts.append(read_sheet(wb, ws.title, [0, 1, 2], ['qid', 'docid', 'rel'], {'qid':'int'}))
    print('... ok')
# DataFrame.append was removed in pandas 2.0; concatenate the tabs instead
qrels = pd.concat(qrels_parts).reset_index(drop=True)
print('Done.')

# Check for duplicates
pd.set_option('display.max_colwidth', 50)
duplicates = qrels[qrels.duplicated(['qid', 'docid'])]
if not duplicates.empty: print('\nDuplicate relevance judgments\n-----------------------------\n', 
                               duplicates.to_string(index=False, max_colwidth=70))

# Check that the URLs in the rankings and in the relevance judgments match
# (and, while we are at it, join them into ranking_rels to simplify the metric implementations)
qm = qrels.merge(rankings, how='left')
ranking_rels = rankings.merge(qrels, how='left')
missing = ranking_rels[pd.isna(ranking_rels.rel)]
if missing.size: print('\nMissing relevance judgments\n---------------------------\n', 
                       missing[['qid', 'docid']].to_string(index=False, max_colwidth=70))
missing = qm[pd.isna(qm.pos)]
if missing.size: print('\nMissing results\n---------------\n', 
                       missing[['qid', 'docid']].to_string(index=False, max_colwidth=70))

Reading data...
  Reading tab Rankings..... ok
  Reading tab Acierto q0... ok
  Reading tab Acierto q1... ok
  Reading tab Acierto q2... ok
  Reading tab Acierto q3... ok
  Reading tab Acierto q4... ok
  Reading tab Acierto q5... ok
  Reading tab Acierto q7... ok
  Reading tab Acierto q8... ok
Done.


### Implementing the metrics

In [None]:
ranking_rels = ranking_rels.astype({'rel': 'int64'})

def precision(ranking_rels, qid, system):

  returned = len(ranking_rels[(ranking_rels['qid'] == qid) & (ranking_rels['system'] == system)])
  intersect = len(ranking_rels[(ranking_rels['qid'] == qid) & (ranking_rels['rel'] > 0) & (ranking_rels['system'] == system)])

  return intersect/returned


def recall(ranking_rels, qid, system):

  relevant = len(ranking_rels[(ranking_rels['qid'] == qid) & (ranking_rels['rel'] > 0)].drop_duplicates(subset=["qid", "docid"]))
  intersect = len(ranking_rels[(ranking_rels['qid'] == qid) & 
                               (ranking_rels['rel'] > 0) & 
                               (ranking_rels['system'] == system)])
  return intersect/relevant

def F1_score(ranking_rels, qid, system, alpha=.5):

  prec = precision(ranking_rels, qid, system)
  rec = recall(ranking_rels, qid, system)

  if (prec == 0 or rec == 0):
    return 0
  
  else:
    return 1/(alpha/prec + (1-alpha)/rec)

def _MRR(ranking_rels, qid, system):
    mrr = 0.0
    group = ranking_rels[(ranking_rels["qid"] == qid) & (ranking_rels["system"] == system)]
    relevant = group.loc[(group["rel"] > 0)]
    if len(relevant) > 0:
        mrr = 1./(np.argmax(group['rel'] > 0) + 1)
    return mrr
  
def _nDCG(ranking_rels, qid, system, k):
    group = ranking_rels[(ranking_rels["system"] == system) & (ranking_rels["qid"] == qid)]
    
    topk_scores = group['rel'].values[:k]
    topk_positions = np.arange(1, k+1)
    topk_scores = [int(i) for i in topk_scores]
    topk_positions = [int(i) for i in topk_positions]
    # Compute DCG
    dcg = sum([(2**topk_scores[i] - 1)/np.log2(topk_positions[i] + 1) for i in range(len(topk_scores))])

    # Compute IDCG
    ideal_scores = np.sort(group['rel'].values)[::-1][:k]
    ideal_positions = np.arange(1, k+1)
    ideal_scores = [int(i) for i in ideal_scores]
    ideal_positions = [int(i) for i in ideal_positions]
    idcg = sum([(2**ideal_scores[i] - 1) / np.log2(ideal_positions[i] + 1) for i in range(len(ideal_scores))])

    # Compute nDCG
    if idcg > 0:
        return dcg / idcg
    else:
        return 0

def _ERR(ranking_rels, qid, system):
  ERR = 0.0
  rel_docs = ranking_rels[(ranking_rels["qid"] == qid) & (ranking_rels["system"] == system)]["rel"].values
  rel_docs = [int(i) for i in rel_docs]
  total_rel = sum(rel_docs)
  if total_rel == 0:
    return 0.0

  # Stop probability at each rank; the denominator 2**2 assumes a maximum grade of 2
  p = [(2**rel_docs[i] - 1) / (2**2) for i in range(len(rel_docs))]
  for i in range(len(rel_docs)):
    prod = p[i]
    for j in range(i):
      prod *= (1 - p[j])

    ERR += prod/(i + 1)
  
  return ERR

To display the results in a suitable tabular format that also supports aggregation, we build a DataFrame holding the value of each implemented metric for every query and every system. The averages can then be computed directly from this table.

In [None]:
result = {"qid": [], 
          "system": [],
          "SetP": [],
          "SetR": [],
          "SetF": [],
          "nDCG@10": [],
          "MRR": [],
          "ERR": []}
for system in ["bing", "google", "duckduckgo", "yandex"]:
  for id in range(9):
    if id != 6:
      result["qid"].append(id)
      result["system"].append(system)
      result['SetP'].append(precision(ranking_rels=ranking_rels, qid=id, system=system))
      result['SetR'].append(recall(ranking_rels=ranking_rels, qid=id, system=system))
      result['SetF'].append(F1_score(ranking_rels=ranking_rels, qid=id, system=system))
      result['nDCG@10'].append(_nDCG(ranking_rels=ranking_rels, qid=id, system=system, k=10))
      result['MRR'].append(_MRR(ranking_rels=ranking_rels, qid=id, system=system))
      result['ERR'].append(_ERR(ranking_rels=ranking_rels, qid=id, system=system))
      

df_metrics = pd.DataFrame.from_dict(result)

In [None]:
df_metrics.head()

,qid,system,SetP,SetR,SetF,nDCG@10,MRR,ERR
0,0,bing,0.2,0.222222,0.210526,0.693426,0.5,0.1875
1,1,bing,1.0,0.454545,0.625,0.986325,1.0,0.862874
2,2,bing,0.4,0.266667,0.32,0.75307,1.0,0.779036
3,3,bing,0.6,0.6,0.6,0.764515,1.0,0.467278
4,4,bing,1.0,0.37037,0.540541,0.799035,1.0,0.515868


### Metric values averaged per query

In [None]:
df_metrics.groupby(["qid"]).mean()

qid,SetP,SetR,SetF,nDCG@10,MRR,ERR
0,0.25,0.277778,0.263158,0.606132,0.5,0.315416
1,0.95,0.431818,0.59375,0.978252,1.0,0.86183
2,0.525,0.35,0.42,0.606029,0.431548,0.358068
3,0.375,0.375,0.375,0.617268,0.75,0.444437
4,0.975,0.361111,0.527027,0.84405,1.0,0.670088
5,0.65,0.382353,0.481481,0.898261,0.875,0.739136
7,0.6,0.461538,0.521739,0.895589,0.875,0.751441
8,0.825,0.25,0.383721,0.806598,0.583333,0.512564


### Metric values averaged per search engine

In [None]:
df_metrics[["system", "SetP", "SetR", "SetF", "nDCG@10","MRR", "ERR"]].groupby(["system"]).mean()

system,SetP,SetR,SetF,nDCG@10,MRR,ERR
bing,0.6625,0.378692,0.463552,0.815789,0.8125,0.572029
duckduckgo,0.6625,0.372757,0.458182,0.781978,0.809524,0.587242
google,0.6875,0.391775,0.48197,0.827591,0.739583,0.624492
yandex,0.5625,0.301575,0.379234,0.700732,0.645833,0.542727


### Statistical significance of the comparison between the two best systems

Since they are better than or equal to the rest on 4 of the 6 metrics, we take Bing and Google to be the two best search engines, so the significance test is run between these two. To evaluate it, we use a binomial test.

We compute the per-query difference in each metric between Google and Bing.

In [None]:
df_metrics[df_metrics["system"] == "google"][["SetP", "SetR", "SetF", "nDCG@10","MRR", "ERR"]].reset_index(drop=True) - \
 df_metrics[df_metrics["system"] == "bing"][["SetP", "SetR", "SetF", "nDCG@10","MRR", "ERR"]].reset_index(drop=True)

,SetP,SetR,SetF,nDCG@10,MRR,ERR
0,-0.1,-0.111111,-0.105263,-0.337219,-0.333333,-0.145833
1,-0.1,-0.045455,-0.0625,0.013675,0.0,0.000172
2,0.1,0.066667,0.08,-0.221852,-0.75,-0.565016
3,0.0,0.0,0.0,0.232462,0.0,0.393973
4,0.0,0.0,0.0,0.129691,0.0,0.321831
5,0.2,0.117647,0.148148,0.204868,0.5,0.410938
6,0.1,0.076923,0.086957,0.053144,0.0,0.003314
7,0.0,0.0,0.0,0.019648,0.0,0.000325


For nDCG and ERR, Google is better than Bing on 6 of the 8 queries. The probability of this happening by chance is given by

$$\mathcal{P}(X \geq 6) = \sum_{i=6}^{8} \binom{8}{i} 0.5^i \, 0.5^{8-i} = \frac{37}{256} \approx 0.1445$$

where 0.5 is the probability that one system is better than the other on a given query.

The probability that this result arose by pure chance is therefore about $14.45\%$, which is fairly low, especially considering that the sample used for the test is very small. It does not, however, reach the conventional 5% significance level.

We can therefore conclude, with some degree of confidence, that Google is better than Bing.
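The tail probability above can be checked with a few lines of standard-library Python. This is a minimal sketch of a one-sided binomial test; the helper name `sign_test_p_value` is ours and is not part of the notebook.

```python
from math import comb

def sign_test_p_value(wins: int, n: int, p: float = 0.5) -> float:
    """Probability of at least `wins` successes in `n` independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(wins, n + 1))

# Google beats Bing on 6 of the 8 queries (nDCG@10 and ERR)
p_value = sign_test_p_value(wins=6, n=8)
print(f"P(X >= 6) = {p_value:.5f}")  # (28 + 8 + 1) / 256
```
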


### Checking that our calculations are correct

To do so, we replicate the DataFrame built in the previous sections, this time filled with the results produced by ir_measures.

In [None]:
!pip install ir-measures

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ir-measures
  Downloading ir_measures-0.3.1.tar.gz (46 kB)
  Preparing metadata (setup.py) ... done
Collecting pytrec-eval-terrier>=0.5.2
  Downloading pytrec_eval_terrier-0.5.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (287 kB)
Collecting cwl-eval>=1.0.10
  Downloading cwl-eval-1.0.12.tar.gz (31 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: ir-measures, cwl-eval
  Building wheel for ir-measures (setup.py) ... done
  Created wheel for ir-measures: filename=ir_measures-0.3.1-py3-none-any.whl size=60192 sha256=f92d38c9ab23bf9f9dc82e80eafa86db0

In [None]:
import ir_measures
from ir_measures import *

We prepare the rankings and qrels DataFrames so that they match the format required by ir_measures.

Note that Precision, Recall and their harmonic mean need to know about relevant documents that a search engine may not have returned. We therefore compute these 3 metrics from qrels and rankings, whereas for the remaining metrics we work with ranking_rels at query and system level.
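To illustrate why the set-based metrics need the full set of relevant documents, here is a minimal pure-Python sketch. The document ids and the helper `set_metrics` are made up for the example; the F formula mirrors the weighted harmonic mean used in `F1_score` above.

```python
def set_metrics(retrieved, relevant, alpha=0.5):
    """Set precision, recall and weighted harmonic mean for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0 or recall == 0:
        return precision, recall, 0.0
    return precision, recall, 1 / (alpha / precision + (1 - alpha) / recall)

retrieved = ["d1", "d2", "d3", "d4"]        # what one system returned
relevant = ["d1", "d3", "d7", "d8", "d9"]   # d7..d9 were never returned by it
p, r, f = set_metrics(retrieved, relevant)
print(p, r, round(f, 4))
```

Precision only looks at the returned list, while the recall denominator counts `d7`, `d8` and `d9` as well, which is why the judgments cannot be restricted to a single system's ranking.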

In [None]:
qrels2 = qrels.rename(columns = {"qid":"query_id", "docid":"doc_id", "rel":"relevance"}).astype({"query_id": "str", "relevance": "int64"})

In [None]:
rankings2 = rankings.rename(columns = {"qid":"query_id", "docid":"doc_id"}).astype({"query_id": "str", "score": "int64"})

In [None]:
ranking_rels2 = ranking_rels.rename(columns = {"qid":"query_id", "docid":"doc_id", "rel":"relevance"}).astype({"query_id": "str", "relevance": "int64", "score": "int64"})

In [None]:
result_ir = {"qid": [], 
          "system": [],
          "SetP_ir": [],
          "SetR_ir": [],
          "SetF_ir": [],
          "nDCG@10_ir": [],
          "MRR_ir": [],
          "ERR_ir": []}
for system in ["bing", "google", "duckduckgo", "yandex"]:
  for id in range(9):
    if id != 6:
      rankings_aux = rankings2[(rankings2["query_id"] == str(id)) & (rankings2["system"] == system)]
      qrels_aux = qrels2[(qrels2["query_id"] == str(id))]
      ranking_rels_aux = ranking_rels2[(ranking_rels2["query_id"] == str(id)) & (ranking_rels2["system"] == system)]


      aux = ir_measures.calc_aggregate([SetP, SetR, SetF], qrels_aux, rankings_aux)
      aux2 = ir_measures.calc_aggregate([MRR, nDCG@10, ERR@10], ranking_rels_aux, ranking_rels_aux)
      
      result_ir["qid"].append(id)
      result_ir["system"].append(system)
      result_ir['SetP_ir'].append(aux[SetP])
      result_ir['SetR_ir'].append(aux[SetR])
      result_ir['SetF_ir'].append(aux[SetF])
      result_ir['nDCG@10_ir'].append(aux2[nDCG@10])
      result_ir['MRR_ir'].append(aux2[MRR])
      result_ir['ERR_ir'].append(aux2[ERR@10])
      
df_metrics_ir = pd.DataFrame.from_dict(result_ir)

### Comparison of per-query averages

In [None]:
df_metrics_ir.groupby(["qid"]).mean()

qid,SetP_ir,SetR_ir,SetF_ir,nDCG@10_ir,MRR_ir,ERR_ir
0,0.261111,0.277778,0.269006,0.631483,0.541667,0.096125
1,0.95,0.431818,0.59375,0.980884,1.0,0.36125
2,0.538889,0.35,0.424167,0.632657,0.452381,0.128445
3,0.375,0.375,0.375,0.661015,0.75,0.15359
4,0.975,0.361111,0.527027,0.89034,1.0,0.26472
5,0.65,0.382353,0.481481,0.896533,0.875,0.28756
7,0.6,0.461538,0.521739,0.907744,0.875,0.293285
8,0.825,0.25,0.383721,0.815234,0.583333,0.23424


In [None]:
df_metrics.groupby(["qid"]).mean()

qid,SetP,SetR,SetF,nDCG@10,MRR,ERR
0,0.25,0.277778,0.263158,0.606132,0.5,0.315416
1,0.95,0.431818,0.59375,0.978252,1.0,0.86183
2,0.525,0.35,0.42,0.606029,0.431548,0.358068
3,0.375,0.375,0.375,0.617268,0.75,0.444437
4,0.975,0.361111,0.527027,0.84405,1.0,0.670088
5,0.65,0.382353,0.481481,0.898261,0.875,0.739136
7,0.6,0.461538,0.521739,0.895589,0.875,0.751441
8,0.825,0.25,0.383721,0.806598,0.583333,0.512564


As far as we can tell, the results of the two implementations are quite similar, with one exception.

For ERR, the absolute magnitudes differ: our implementation yields higher values than those obtained from ir_measures. Although we do not know why this happens, the relative ordering is preserved: the highest values in ir_measures correspond to the highest values in our implementation, and vice versa.
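One factor that changes the absolute scale of ERR is the maximum grade assumed when turning a relevance grade into a stop probability, $(2^g - 1)/2^{g_{max}}$. The sketch below shows the effect on a made-up ranking; the grades and the alternative value `max_grade=4` are illustrative assumptions, and we have not verified that this is what explains the gap with ir_measures.

```python
def expected_reciprocal_rank(grades, max_grade):
    """ERR of a ranked list of integer relevance grades."""
    err = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for rank, g in enumerate(grades, start=1):
        p_stop = (2**g - 1) / 2**max_grade
        err += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return err

grades = [2, 0, 1]  # hypothetical grades at ranks 1, 2 and 3
err_g2 = expected_reciprocal_rank(grades, max_grade=2)  # denominator 2**2, as in _ERR
err_g4 = expected_reciprocal_rank(grades, max_grade=4)  # larger assumed maximum grade
print(round(err_g2, 4), round(err_g4, 4))
```

The same ranking scores lower under the larger maximum grade, because every stop probability shrinks.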

### Comparison of per-search-engine averages

In [None]:
df_metrics_ir[["system", "SetP_ir", "SetR_ir", "SetF_ir", "nDCG@10_ir","MRR_ir", "ERR_ir"]].groupby(["system"]).mean()

system,SetP_ir,SetR_ir,SetF_ir,nDCG@10_ir,MRR_ir,ERR_ir
bing,0.6625,0.378692,0.463552,0.838239,0.8125,0.221281
duckduckgo,0.668056,0.372757,0.461106,0.822215,0.830357,0.229701
google,0.694444,0.391775,0.484053,0.841382,0.75,0.258578
yandex,0.5625,0.301575,0.379234,0.706109,0.645833,0.200048


In [None]:
df_metrics[["system", "SetP", "SetR", "SetF", "nDCG@10","MRR", "ERR"]].groupby(["system"]).mean()

system,SetP,SetR,SetF,nDCG@10,MRR,ERR
bing,0.6625,0.378692,0.463552,0.815789,0.8125,0.572029
duckduckgo,0.6625,0.372757,0.458182,0.781978,0.809524,0.587242
google,0.6875,0.391775,0.48197,0.827591,0.739583,0.624492
yandex,0.5625,0.301575,0.379234,0.700732,0.645833,0.542727


The same observation made for the per-query averages applies here: the results are very similar and, where they differ, the relative ordering is preserved.

# 2.  Evaluation with business metrics

### Reading the simulated logs

In [None]:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# Read the simulated interaction log prepared by the instructor into the impressions and engagement dataframes.
drive.CreateFile({'id':'1KwuB8yLDNDYdnW1h-51vImle1iHzv0Bt'}).GetContentFile('impressions.csv')
impressions = pd.read_csv('impressions.csv')

drive.CreateFile({'id':'1ox0FGfOKNCMPSpGcLcPhUE_oBuuTC2uE'}).GetContentFile('engagement-log.csv')
engagement = pd.read_csv('engagement-log.csv')

### Computing the business metrics

Click-based A/B test evaluation of the four systems. We compute the following metrics: clicks per query, abandonment rate, Max RR, Mean RR and "units sold".

First, we can discard the "google+bing" system.

In [None]:
engagement = engagement[engagement.system != "google+bing"]
impressions = impressions[impressions.system != "google+bing"]

We compute the clicks per query. To do so, we count the clicks each search engine received across the queries and average over the number of queries.

In [None]:
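# Distinct query ids in the log (qids is assumed to be defined this way)
qids = np.unique(engagement['qid'])
data = engagement.groupby(['system'], as_index=False)["click"].sum()
data["click"] = data["click"]/len(qids)
data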

,system,click
0,bing,2.375
1,duckduckgo,3.125
2,google,2.375
3,yandex,1.625


We obtained the clicks for each search engine and averaged them over the number of queries. duckduckgo gets the best result, which means that more clicks were made on the queries issued to that search engine. The opposite holds for Yandex, while bing and google are tied.

We compute the abandonment rate, that is, the fraction of sessions that end without any click, for each search engine.

In [None]:
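n_cons = len(np.unique(engagement['qid']))
# Systems under comparison (assumed to match the list used in the cells below)
systems = ["bing", "google", "duckduckgo", "yandex"]

tasa_abandono = dict({ (sistema,
  (n_cons - len(np.unique(impressions[impressions['system'] == sistema]['qid'])))/n_cons
  )
   for sistema in systems
})
pd.DataFrame.from_dict(tasa_abandono, orient="index")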

,0
yandex,0.125
bing,0.0
google,0.0
duckduckgo,0.0


Here we look at the abandonment rate of each search engine. It is zero for every search engine except yandex, for which it is 12.5%.
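The ratio described above, sessions without clicks divided by all sessions, can be sketched in plain Python. The session tuples and system names below are invented for illustration and do not reproduce the schema of impressions.csv or engagement-log.csv.

```python
from collections import defaultdict

# Hypothetical rows: (system, session id, number of clicks in that session)
sessions = [
    ("a", 1, 2), ("a", 2, 0), ("a", 3, 1), ("a", 4, 0),
    ("b", 1, 1), ("b", 2, 3),
]

def abandonment_rate(rows):
    """Fraction of sessions per system that ended without any click."""
    total = defaultdict(int)
    abandoned = defaultdict(int)
    for system, _session, clicks in rows:
        total[system] += 1
        if clicks == 0:
            abandoned[system] += 1
    return {system: abandoned[system] / total[system] for system in total}

rates = abandonment_rate(sessions)
print(rates)  # {'a': 0.5, 'b': 0.0}
```
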

Finally, we compute Max RR and Mean RR. Max RR is the mean value of $1/r$, where $r$ is the rank of the highest-ranked clicked result. Mean RR is the mean, over queries, of the sum of $1/r$ over all the clicked ranks of each query.
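A minimal sketch of these two definitions, applied literally to a made-up click log for a single system. The dictionary layout, the query ids and the choice to score a query without clicks as 0 are assumptions made for the example.

```python
# Hypothetical clicked positions (1-based ranks) per query for one system
clicks_by_query = {
    "q1": [1, 3],
    "q2": [2],
    "q3": [],  # no clicks: contributes 0 to both metrics (assumption)
}

def max_rr(clicks):
    """Mean over queries of 1/r for the highest-ranked clicked result."""
    per_query = [1 / min(pos) if pos else 0.0 for pos in clicks.values()]
    return sum(per_query) / len(per_query)

def mean_rr(clicks):
    """Mean over queries of the sum of 1/r over all clicked ranks."""
    per_query = [sum(1 / r for r in pos) for pos in clicks.values()]
    return sum(per_query) / len(per_query)

print(max_rr(clicks_by_query))   # (1 + 1/2 + 0) / 3
print(mean_rr(clicks_by_query))  # ((1 + 1/3) + 1/2 + 0) / 3
```

Both functions average per query, so the denominator is the number of queries rather than the number of clicks.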

Max RR:

In [None]:
sistemas = ["bing", "google", "duckduckgo", "yandex"]

In [None]:
ranks = []
for sistema in sistemas:
  data = engagement[engagement["system"] == sistema]
  # Row with the smallest clicked position for this system; keep its fourth column
  best = engagement.loc[data["pos"].idxmin()].iloc[3]
  ranks.append(best)

# groupby returns the click totals in alphabetical system order (bing, duckduckgo, google, yandex)
ranks = ranks / engagement.groupby(["system"], as_index = False)["click"].sum().click
df = pd.DataFrame(list(ranks), columns = ["rank"])
df["system"] = sistemas
df.sort_values(by="rank", ascending=False).reset_index(drop=True)

,rank,system
0,0.538462,yandex
1,0.421053,bing
2,0.210526,duckduckgo
3,0.04,google


Here the best Max RR is obtained for yandex, followed by bing, duckduckgo and finally google. We interpret these results as yandex managing to surface more relevant information in its results, while the opposite holds for google.

Mean RR:

In [None]:
ranks = []
c = 0  # initialised once, so the sum carries over from one system to the next
for sistema in sistemas:
  data = engagement[engagement["system"] == sistema]
  for item in data["pos"]:
    c += 1 / int(item)
  ranks.append(c)

# groupby returns the click totals in alphabetical system order (bing, duckduckgo, google, yandex)
ranks = ranks / engagement.groupby(["system"], as_index = False)["click"].sum().click
df = pd.DataFrame(list(ranks), columns = ["rank"])
df["system"] = sistemas
df.sort_values(by="rank", ascending=False).reset_index(drop=True)

,rank,system
0,0.819017,yandex
1,0.483626,duckduckgo
2,0.206746,google
3,0.126441,bing


In this case, the conclusion is that yandex is the best at placing the most important items first in the result list, while the opposite holds for bing.