# Inconsistencies

This notebook lists inconsistencies between the **manually labeled** type datasets and the **automatically generated** type datasets.
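Each row stores its type as a comma-separated string (for example `'Number'` or `'concert'`), so the comparison treats labels as unordered sets. The following is a minimal sketch of that comparison. It assumes rows shaped like the ones loaded below (`question`, `type`, and an optional `error` key). `parse_type_label` and `compare_type_labels` are hypothetical helpers, and the demo rows are toy data, not taken from the datasets.

```python
from typing import Iterable


def parse_type_label(label: str) -> frozenset[str]:
    """Split a comma-separated type label such as 'concert, stadium' into an unordered set."""
    return frozenset(part.strip() for part in label.split(',') if part.strip())


def compare_type_labels(automatic: Iterable[dict], manual: Iterable[dict]):
    """Return (correct, evaluated, mismatched_questions).

    Rows with an 'error' key, or whose question has no manual label, are skipped
    and do not count toward `evaluated`.
    """
    manual_labels = {row['question']: parse_type_label(row['type']) for row in manual}
    correct, evaluated, mismatches = 0, 0, []
    for row in automatic:
        if 'error' in row or row['question'] not in manual_labels:
            continue
        evaluated += 1
        if parse_type_label(row['type']) == manual_labels[row['question']]:
            correct += 1
        else:
            mismatches.append(row['question'])
    return correct, evaluated, mismatches


# Toy rows, for illustration only.
demo_manual = [{'question': 'q1', 'type': 'Number'},
               {'question': 'q2', 'type': 'concert, stadium'}]
demo_automatic = [{'question': 'q1', 'type': 'concert'},
                  {'question': 'q2', 'type': 'stadium, concert'},
                  {'question': 'q3', 'error': 'some error'}]
print(compare_type_labels(demo_automatic, demo_manual))  # (1, 2, ['q1'])
```

Unlike `split(', ')`, stripping each part also tolerates labels written without a space after the comma. The accuracy is then `correct / evaluated`, so errored rows drop out of the denominator.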

These inconsistencies show that some natural language questions do not fully correspond to their gold QPL.
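Some mismatches can be spotted by reading the output columns of the final QPL step. For example, a question may ask for "1 for each concert" while the QPL only outputs `Year`. The following is a minimal regex sketch. It assumes each QPL step ends with an `Output [ ... ]` clause, as in the rows printed below. `qpl_output_columns` is a hypothetical helper, not part of `src.utils.qpl`.

```python
import re

# Matches the trailing 'Output [ ... ]' clause of a single QPL step.
_OUTPUT_RE = re.compile(r'Output \[\s*(.*?)\s*\]\s*$')


def qpl_output_columns(qpl_line: str) -> list[str]:
    """Return the comma-separated output columns of one QPL step, or [] if none are found."""
    match = _OUTPUT_RE.search(qpl_line)
    if match is None:
        return []
    return [col.strip() for col in match.group(1).split(',') if col.strip()]


print(qpl_output_columns('#1 = Scan Table [ concert ] Predicate [ Year >= 2014 ] Output [ Year , Stadium_ID ]'))
# ['Year', 'Stadium_ID']
```

In the loop below it could be applied to `row['qpl'][-1]`.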

In [1]:
import json
from src.utils.qpl.paths import AUTOMATICALLY_LABLED_TYPES_DATASETS, MANUALLY_LABLED_TYPES_DATASETS

DB_ID = 'concert_singer'
filename = f'auto_{DB_ID}.json'

with open(AUTOMATICALLY_LABLED_TYPES_DATASETS / filename, 'r') as f:
    automatic = json.load(f)

with open(MANUALLY_LABLED_TYPES_DATASETS / filename, 'r') as f:
    manual = json.load(f)

manual_labels = {row['question']: row['type'] for row in manual}

acc = 0
for row in automatic:
    if 'error' in row:
        print(f"\033[93mSkipping row with error: {row['error']}\033[0m")
        continue
    type_set = set(manual_labels[row['question']].split(', '))
    pred_type_set = set(row['type'].split(', '))
    if type_set == pred_type_set:
        acc += 1
    else:
        # print(json.dumps(row, indent=2))
        print(f"Question: \033[92m{row['question']}\033[0m Manual: \033[91m{manual_labels[row['question']]!r}\033[0m")
        print(f"QPL: \033[94m{row['qpl'][-1]}\033[0m Auto: \033[91m{row['type']!r}\033[0m")
        print("-"*80)

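# Errored rows were skipped above, so exclude them from the denominator as well.
evaluated = [row for row in automatic if 'error' not in row]
print(f"Accuracy: {acc}/{len(evaluated)} = {acc/len(evaluated):.2%}")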

Question: [92mList 1 for each concert in year 2014 or 2015.[0m Manual: [91m'Number'[0m
QPL: [94m#1 = Scan Table [ concert ] Predicate [ Year = 2014 OR Year = 2015 ] Output [ Year ][0m Auto: [91m'concert'[0m
--------------------------------------------------------------------------------
Question: [92mList 1 for each concert that occurred in 2014 or 2015.[0m Manual: [91m'Number'[0m
QPL: [94m#1 = Scan Table [ concert ] Predicate [ Year = 2014 OR Year = 2015 ] Output [ Year ][0m Auto: [91m'concert'[0m
--------------------------------------------------------------------------------
[93mSkipping row with error: Column 'Name' not in GroupBy and thus must be aggregated.[0m
[93mSkipping row with error: Column 'Name' not in GroupBy and thus must be aggregated.[0m
Question: [92mShow the stadium id of concerts in year 2014 or after.[0m Manual: [91m'stadium'[0m
QPL: [94m#1 = Scan Table [ concert ] Predicate [ Year >= 2014 ] Output [ Year , Stadium_ID ][0m Auto: [91m'conc