# Testing your app

Once your app is running, you can query it either by navigating to the URL that ngrok gave you above or from code in a separate Python session. For example, once the frontend app is running, you can navigate to:
http://YOUR_SERVER_DOMAIN/search?query=hello+world
where YOUR_SERVER_DOMAIN is either something like XXXX-XX-XX-XX-XX.ngrok.io (printed above in Colab) or your external IP on GCP.

The code below shows how to issue a query from Python. This is also how our testing code will issue queries to your search engine, so make sure to test your search engine this way after you deploy it to GCP and before submission. Command-line instructions for deploying your search engine to GCP are available in `run_frontend_in_gcp.sh`. Note that we will issue not only training queries to your search engine but also test queries, i.e. queries that you have never seen before.
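
As a quick sanity check before running the full evaluation, a single query can be sent by hand. The sketch below is a minimal example that uses only the standard library; the helper names `build_search_url` and `search` are hypothetical, and it assumes the `/search` endpoint returns a JSON list, as the evaluation loop in this notebook expects. `YOUR_SERVER_DOMAIN` is a placeholder that you must replace.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_search_url(base_url, query):
    """Return the full /search URL with the query string URL-encoded."""
    return f"{base_url.rstrip('/')}/search?{urlencode({'query': query})}"

def search(base_url, query, timeout=35):
    """Issue one GET request and return the decoded JSON body (hypothetical helper)."""
    with urlopen(build_search_url(base_url, query), timeout=timeout) as resp:
        return json.load(resp)

# Placeholder: replace with your ngrok domain or GCP external IP.
base_url = 'http://YOUR_SERVER_DOMAIN'
print(build_search_url(base_url, 'hello world'))
```

Calling `search(base_url, 'hello world')` against a running server should return the same JSON you see in the browser; the 35-second timeout mirrors the one used in the evaluation loop.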

In [None]:
import json

with open('queries_train.json', 'rt') as f:
  queries = json.load(f)

In [None]:
def average_precision_at_10(true_list, predicted_list, k=10):
    true_set = frozenset(true_list)
    predicted_list = predicted_list[:k]
    precisions = []
    for i, doc_id in enumerate(predicted_list):
        if doc_id in true_set:
            prec = (len(precisions) + 1) / (i+1)
            precisions.append(prec)
    if len(precisions) == 0:
        return 0.0
    return round(sum(precisions) / len(precisions), 3)

def average_precision(true_list, predicted_list):
    true_set = frozenset(true_list)
    precisions = []
    for i, doc_id in enumerate(predicted_list):
        if doc_id in true_set:
            prec = (len(precisions) + 1) / (i+1)
            precisions.append(prec)
    if len(precisions) == 0:
        return 0.0
    return round(sum(precisions) / len(precisions), 3)

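def recall(true_list, predicted_list):
    true_set = frozenset(true_list)
    # Mirror recall_at_k: with no relevant documents, treat recall as 1.0
    # instead of dividing by zero.
    if len(true_set) < 1:
        return 1.0
    hits = sum(1 for doc_id in predicted_list if doc_id in true_set)
    return hits / len(true_set)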

In [None]:
def precision_at_k(true_list, predicted_list, k):
    true_set = frozenset(true_list)
    predicted_list = predicted_list[:k]
    if len(predicted_list) == 0:
        return 0.0
    return round(len([1 for doc_id in predicted_list if doc_id in true_set]) / len(predicted_list), 3)

def recall_at_k(true_list, predicted_list, k):
    true_set = frozenset(true_list)
    predicted_list = predicted_list[:k]
    if len(true_set) < 1:
        return 1.0
    return round(len([1 for doc_id in predicted_list if doc_id in true_set]) / len(true_set), 3)

def f1_at_k(true_list, predicted_list, k):
    p = precision_at_k(true_list, predicted_list, k)
    r = recall_at_k(true_list, predicted_list, k)
    if p == 0.0 or r == 0.0:
        return 0.0
    return round(2.0 / (1.0/p + 1.0/r), 3)

def results_quality(true_list, predicted_list):
    p5 = precision_at_k(true_list, predicted_list, 5)
    f1_30 = f1_at_k(true_list, predicted_list, 30)
    if p5 == 0.0 or f1_30 == 0.0:
        return 0.0
    return round(2.0 / (1.0/p5 + 1.0/f1_30), 3)

assert precision_at_k(range(10), [1,2,3] , 2) == 1.0
assert recall_at_k(   range(10), [10,5,3], 2) == 0.1
assert precision_at_k(range(10), []      , 2) == 0.0
assert precision_at_k([],        [1,2,3],  5) == 0.0
assert recall_at_k(   [],        [10,5,3], 2) == 1.0
assert recall_at_k(   range(10), [],       2) == 0.0
assert f1_at_k(       [],        [1,2,3],  5) == 0.0
assert f1_at_k(       range(10), [],       2) == 0.0
assert f1_at_k(       range(10), [0,1,2],  2) == 0.333
assert f1_at_k(       range(50), range(5), 30) == 0.182
assert f1_at_k(       range(50), range(10), 30) == 0.333
assert f1_at_k(       range(50), range(30), 30) == 0.75
assert results_quality(range(50), range(5))  == 0.308
assert results_quality(range(50), range(10)) == 0.5
assert results_quality(range(50), range(30)) == 0.857
assert results_quality(range(50), [-1]*5 + list(range(5,30))) == 0.0
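
To make the numbers in the assertions above easier to follow, here is a worked example for the case of 50 relevant documents and 10 returned results, all of them relevant. It is a standalone sketch that recomputes the same quantities step by step, including the rounding to three decimals at each stage; the helper `harmonic_mean` is introduced only for this illustration.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two scores; 0.0 if either score is 0."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return 2.0 / (1.0 / a + 1.0 / b)

true_set = frozenset(range(50))   # 50 relevant documents
predicted = list(range(10))       # 10 results returned, all relevant

top5, top30 = predicted[:5], predicted[:30]   # top30 holds only 10 items here
p5 = round(sum(d in true_set for d in top5) / len(top5), 3)
p30 = round(sum(d in true_set for d in top30) / len(top30), 3)
r30 = round(sum(d in true_set for d in top30) / len(true_set), 3)
f1_30 = round(harmonic_mean(p30, r30), 3)     # F1@30 from P@30 and R@30
rq = round(harmonic_mean(p5, f1_30), 3)       # harmonic mean of P@5 and F1@30
print(p5, p30, r30, f1_30, rq)
```

Precision is perfect, but recall at 30 is only 10/50 = 0.2, so F1@30 drops to 0.333 and the combined quality score to 0.5. Returning too few results therefore lowers the score even when every result is relevant.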

In [None]:
import requests
from time import time
# place the domain you got from ngrok or GCP IP below.
url = 'http://35.192.185.129:8080'

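qs_res = []
for q, true_wids in queries.items():
  duration, p5, f1_30, p10, ap, r, rq = None, None, None, None, None, None, None
  # reset per query so a failed request does not reuse the previous results
  pred_wids = []
  t_start = time()
  try:
    res = requests.get(url + '/search', params={'query': q}, timeout=35)
    duration = time() - t_start
    if res.status_code == 200:
      # each result is a pair; keep only the first element (the document id)
      pred_wids = [doc_id for doc_id, _ in res.json()]
      p5 = precision_at_k(true_wids, pred_wids, 5)
      f1_30 = f1_at_k(true_wids, pred_wids, 30)
      rq = results_quality(true_wids, pred_wids)
      p10 = average_precision_at_10(true_wids, pred_wids)
      ap = average_precision(true_wids, pred_wids)
      r = recall(true_wids, pred_wids)
  except (requests.RequestException, ValueError, TypeError) as e:
    # report the failure instead of silently swallowing it; metrics stay None
    print(f"query failed: {q!r}: {e}")
  print(f"{q} -> {len(true_wids), len(pred_wids)}")
  qs_res.append((q, duration, p5, f1_30, p10, ap, r, rq))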

print(qs_res)

# Average each metric over the queries that produced a value. Failed queries
# leave None in the tuple; they are skipped here and reported separately.
def mean_of(idx):
  vals = [tup[idx] for tup in qs_res if tup[idx] is not None]
  return sum(vals) / len(vals) if vals else float('nan')

avg_duration = mean_of(1)
avg_p5 = mean_of(2)
avg_f1_30 = mean_of(3)
avg_p10 = mean_of(4)
avg_ap = mean_of(5)
avg_r = mean_of(6)
avg_rq = mean_of(7)

n_failed = sum(1 for tup in qs_res if tup[7] is None)
print(f"queries without results: {n_failed} of {len(qs_res)}")
print(f"average recall is {avg_r}")
print(f"mean average precision is {avg_ap}")
print(f"average precision at 5 is {avg_p5}")
print(f"mean average precision at 10 is {avg_p10}")
print(f"average f1 at 30 is {avg_f1_30}")
print(f"average rq is {avg_rq}")
print(f"average time is {avg_duration}")

genetics -> (46, 100)
Who is considered the "Father of the United States"? -> (50, 100)
economic -> (50, 100)
When was the United Nations founded? -> (46, 100)
video gaming -> (46, 100)
3D printing technology -> (44, 100)
Who is the author of "1984"? -> (44, 100)
bioinformatics -> (19, 100)
Who is known for proposing the heliocentric model of the solar system? -> (45, 100)
Describe the process of water erosion. -> (48, 100)
When was the Berlin Wall constructed? -> (49, 100)
What is the meaning of the term "Habeas Corpus"? -> (49, 100)
telecommunications -> (26, 100)
internet -> (42, 100)
What are the characteristics of a chemical element? -> (50, 100)
Describe the structure of a plant cell. -> (49, 100)
Who painted "Starry Night"? -> (33, 100)
computer -> (46, 100)
What is the structure of the Earth's layers? -> (44, 100)
When did World War II end? -> (50, 100)
When was the Gutenberg printing press invented? -> (46, 100)
medicine -> (45, 100)
Describe the water cycle. -> (47, 100)
arti