# magi Notebook

## ML Dependencies
- [sentence-transformers](https://www.sbert.net/examples/applications/semantic-search/README.html)
- [OpenAI GPT-3.5 Turbo](https://platform.openai.com/docs/models/gpt-3-5)

## Dataset
The official dataset and the cached data are in this [Google Drive folder](https://drive.google.com/drive/u/1/folders/1ZfmEKn_nU3fkwxG2Xfg7e_-00o67OR8t).

## Terms
1. answers: train data
2. wikis: wiki data
3. queries: test data

## Process
1. Embed answers and wikis
2. Embed queries
3. Search top answers for queries
4. Search top wikis for queries
5. Rank the combined search results of steps 3 and 4, then generate labels for the queries with OpenAI's GPT
6. Summarize generated labels

## Notes
1. All heavy tasks are preprocessed and their outputs are cached.
2. To replay the notebook, the minimum hardware requirement is a CPU with 8 GB of RAM. The free tier of Google Colab is sufficient.
3. To train, it is recommended to subscribe to Google Colab Pro, use an A100 GPU, and tune the hyperparameters before running.

In [None]:
!gdown --folder 1ZfmEKn_nU3fkwxG2Xfg7e_-00o67OR8t && \
mv ./NLP2023/data . && \
rm -r NLP2023

Retrieving folder list
Retrieving folder 1Fxac18gg5ig5oB7Dny27HORZNILzO828 data
Retrieving folder 1GyE2CDAVTwB28wN1FNGyyHjN6iYiJBdI caches
Processing file 11pbOIDVzdLH9g7Kh9bUEWnx_x-0bO9es answer_candicates.json
Processing file 1fcBCorOh5xonCI1o8STHYxIHSwjPrPQr answer_embeddings.pt
Processing file 1gaaoVwjfpk7KitmhxjtvT2P5TYgfnQ_4 answers.json
Processing file 1_imy7bKAMORAToyIZcA4mArPVd-ehE6_ wiki_candicates.json
Processing file 1xbSjr9VK_HWwV44aJDuCBaTuSI1MMwyT wiki_embeddings.pt
Processing file 1AuMvHa8Cznwa0H3iuV87ecK4DDcg8bDi wikis.json
Retrieving folder 15XXuLVNCTdzgJXTqsJoFxCf9WSohkCwq test
Processing file 1vndQLFgWIy0so4eHUNkdNPM3HyxRoJxD private_test_data.jsonl
Processing file 1-NdCRpB6DfKGXKOj5KISdHHbxTh_Xa2n public_test_data.jsonl
Retrieving folder 19nTqjOYhqW0nay-GLpwDkJC3oogg4psY train
Processing file 1rUxHXSblBHHXtxrFUCxia40JeomouQCp public_train_data_0.jsonl
Processing file 1XbRBh8dZinI6BPEQ9cP7IgAH-KW6K-HD public_train_data_1.jsonl
Retrieving folder 1opwtjerhopHBgDx4rMnc

In [None]:
# install dependencies

!pip install sentence-transformers \
msgspec \
tqdm \
numpy \
torch \
openai \
python-dotenv

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentence-transformers
  Using cached sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting msgspec
  Using cached msgspec-0.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (183 kB)
Collecting openai
  Using cached openai-0.27.7-py3-none-any.whl (71 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Collecting transformers<5.0.0,>=4.6.0 (from sentence-transformers)
  Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
Collecting sentencepiece (from sentence-transformers)
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

In [None]:
# import dependencies

from sentence_transformers import SentenceTransformer, util
from msgspec.json import decode, encode
from tqdm import tqdm
import numpy
from numpy import random
import torch
from torch import Tensor
from dotenv import load_dotenv
import openai

import re
import json
from os import getenv
from pathlib import Path
from itertools import chain
from time import sleep
from datetime import date, timedelta
from pprint import pprint # [DEV]
from typing_extensions import TypedDict
from typing import Any, Union, Literal

In [None]:
# Public hyperparameters

# TRAIN_SPEED
# type:
#   name: int
#   range: [32, 2048] (values outside the range are clamped)
# Increasing it raises memory usage
TRAIN_SPEED: int = 64

# RNG_SEED
# type:
#   name: int
# Universal seed for all random number generators used
RNG_SEED: int = 94248763

# Private hyperparameters

# Clamp TRAIN_SPEED to [32, 2048], then round down to a power of two
_BATCH_SIZE = 1 << (
  int(min(max(TRAIN_SPEED, 32), 2048)) >> 1
).bit_length()

_RNG_1 = random.Generator(
  random.SFC64(RNG_SEED)
)
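
The `_BATCH_SIZE` expression clamps `TRAIN_SPEED` and then rounds it down to a power of two. A minimal sketch of the same arithmetic as a standalone function follows; the name `batch_size_from_speed` is hypothetical and not part of the notebook.

```python
def batch_size_from_speed(train_speed: int) -> int:
    """Clamp to [32, 2048], then round down to the nearest power of two."""
    clamped = int(min(max(train_speed, 32), 2048))
    # (clamped >> 1).bit_length() equals floor(log2(clamped)) for clamped >= 2
    return 1 << (clamped >> 1).bit_length()


for speed in (16, 64, 100, 1024, 5000):
    print(speed, batch_size_from_speed(speed))
```

With the default `TRAIN_SPEED = 64` the batch size is 64; a value such as 100 also yields 64, because it is rounded down rather than to the nearest power of two.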

In [None]:
# Pre-trained model reference: https://www.sbert.net/docs/pretrained_models.html

ml_embedder = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')


In [None]:
# Typings

Label = Literal['supports', 'refutes', 'NOT ENOUGH INFO']

Evidence = tuple[str, int]

Embeddings = list[Tensor]

Answers = TypedDict('Answers', {
  'claim': list[str],
  'label': list[Label],
  'evidences': list[list[Evidence]],
})

Answer = TypedDict('Answer', {
  'claim': str,
  'label': Label,
  'evidences': list[Evidence],
})

Wikis = dict[str, list[str]]

Queries = dict[int, str]

Candicates = list[tuple[float, Answer]]

Submission = TypedDict('Submission', {
  'id': int,
  'predicted_label': Label,
  'predicted_evidence': list[Evidence] | None,
})

In [None]:
# make caches directory (including missing parents; no error if it exists)

cache_dir = Path('./data/caches/')
cache_dir.mkdir(parents=True, exist_ok=True)

In [None]:
# load answers from ./data/train/public_train_data_(0|1).jsonl
# or cached answers
# at ./data/caches/answers.json
# and ./data/caches/answer_embeddings.pt

answers: Answers = {
  'claim': [],
  'label': [],
  'evidences': [],
}
answer_embeddings: Embeddings = []

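# tool functions

def to_evidences(evidence: list) -> list[Evidence]:
  # recursively flatten the nested evidence lists into one flat sequence
  _f = lambda *n: (
    e for a in n
      for e in (_f(*a) if isinstance(a, (list, tuple)) else (a,))
  )
  flatten = list(_f(evidence))
  # every record has 4 fields: the 3rd is the page id, the 4th the line index
  ids = flatten[2::4]
  indices = flatten[3::4]
  # an empty evidence list or an empty first page id means "no evidence"
  return list(zip(ids, indices)) if ids and ids[0] else []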

cache_file = cache_dir.joinpath('answers.json')
cache_file_2 = cache_dir.joinpath('answer_embeddings.pt')

if not cache_file.exists() or not cache_file_2.exists():
  answer_embeddings_source: list[str] = []

  for line in tqdm(list(chain(*[
    open(f'./data/train/public_train_data_{i}.jsonl')
    for i in range(2)
  ]))):
    answer = decode(line)
    answers['claim'].append(answer['claim'])
    answers['label'].append(answer['label'])
    answers['evidences'].append(to_evidences(answer['evidence']))
    answer_embeddings_source.append(answer['claim'])

  answer_embeddings = ml_embedder.encode(
    answer_embeddings_source,
    convert_to_tensor=True,
    show_progress_bar=True,
    batch_size=_BATCH_SIZE,
  )

  cache_file.write_bytes(encode(answers))
  torch.save(answer_embeddings, open(cache_file_2, 'wb'))
else:
  answers = decode(cache_file.read_bytes())
  answer_embeddings = torch.load(
    open(cache_file_2, 'rb'),
    lambda s, l: s.cuda(0) if torch.cuda.is_available() else s,
  )
  print(len(answer_embeddings))

cache_file = None
cache_file_2 = None

11620
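
To make the flattening in `to_evidences` concrete, here is a small self-contained sketch. It assumes each evidence record has four fields in the order `[annotation_id, evidence_id, page, line]`; the first two field names and the sample values are hypothetical, and only the positions of the page and the line index are implied by the slicing in the cell above. The helper `flatten` is a named generator used here in place of the recursive lambda.

```python
from typing import Any, Iterator

Evidence = tuple[str, int]


def flatten(value: Any) -> Iterator[Any]:
    """Yield the leaves of arbitrarily nested lists/tuples, depth first."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def to_evidences(evidence: list) -> list[Evidence]:
    flat = list(flatten(evidence))
    ids = flat[2::4]      # third field of every 4-field record: page id
    indices = flat[3::4]  # fourth field: line index within that page
    return list(zip(ids, indices)) if ids and ids[0] else []


# Hypothetical records shaped like [annotation_id, evidence_id, page, line]
with_evidence = [[[101, 201, '昆蟲', 22]], [[101, 202, '昆蟲', 23]]]
without_evidence = [[[102, None, None, None]]]

print(to_evidences(with_evidence))     # [('昆蟲', 22), ('昆蟲', 23)]
print(to_evidences(without_evidence))  # []
```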


In [None]:
# load wikis from ./data/train/wiki-pages/wiki-{:03d}.jsonl
# or cached wikis
# at ./data/caches/wikis.json
# and ./data/caches/wiki_embeddings.pt

wikis: Wikis = {}
wiki_embeddings: Embeddings = []

cache_file = cache_dir.joinpath('wikis.json')
cache_file_2 = cache_dir.joinpath('wiki_embeddings.pt')

if not cache_file.exists() or not cache_file_2.exists():
  wiki_embeddings_source: list[str] = []
  for line in tqdm(list(chain(*[
    open(f'./data/train/wiki-pages/wiki-{i:03d}.jsonl')
    for i in range(1, 25)
  ]))):
    wiki = decode(line)

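    # strip the "<line number>\t" prefix at the start of every line only,
    # so that digits inside a sentence are left untouched
    lines_no_prefix = re.sub(
      r'^[0-9]+\t', '',
      wiki['lines'],
      flags=re.MULTILINE,
    )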
    lines = lines_no_prefix.replace('\t', ' ').split('\n')
    wikis[wiki['id']] = lines

    # "{id}的簡述：" means "brief description of {id}:"
    wiki_embeddings_source.append(
      f"{wiki['id']}的簡述：{wiki['text'][:100]}"
    )

  wiki_embeddings = ml_embedder.encode(
    wiki_embeddings_source,
    convert_to_tensor=True,
    show_progress_bar=True,
    batch_size=_BATCH_SIZE,
  )

  cache_file.write_bytes(encode(wikis))
  torch.save(wiki_embeddings, open(cache_file_2, 'wb'))
else:
  wikis = decode(cache_file.read_bytes())
  wiki_embeddings = torch.load(
    open(cache_file_2, 'rb'),
    lambda s, l: s.cuda(0) if torch.cuda.is_available() else s,
  )
  print(len(wiki_embeddings))

cache_file = None
cache_file_2 = None

1187751
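
The wiki loader turns the raw `lines` field of a page into a list indexed by line number. The sketch below illustrates that parsing on a made-up string, assuming each raw line looks like `<line number>\t<sentence>\t<link>...`. The function name `split_wiki_lines` and the sample text are hypothetical. Anchoring the pattern with `^` and `re.MULTILINE` keeps digits that appear inside a sentence intact.

```python
import re


def split_wiki_lines(raw: str) -> list[str]:
    # Drop the "<line number>\t" prefix at the start of each line only
    no_prefix = re.sub(r'^[0-9]+\t', '', raw, flags=re.MULTILINE)
    # Remaining tabs separate the sentence from its link annotations
    return no_prefix.replace('\t', ' ').split('\n')


raw = "0\tFirst sentence .\tlink\tlink\n1\t\n2\tBuilt in 1590\tyear"
for index, line in enumerate(split_wiki_lines(raw)):
    print(index, repr(line))
```

The list position of each entry equals its original line number, which is what lets an evidence be stored as a `(page id, line index)` pair.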


In [None]:
# load queries
# from ./data/test/public_test_data.jsonl
# and ./data/test/private_test_data.jsonl

queries: Queries = {}

for line in tqdm(list(chain(
  open('./data/test/public_test_data.jsonl'),
  open('./data/test/private_test_data.jsonl'),
))):
  query = decode(line)
  queries[query['id']] = query['claim']

100%|██████████| 9038/9038 [00:00<00:00, 136888.99it/s]


In [None]:
# [LSS], [FS], [LS], [R]
# 1. Large Semantic Search [LSS]
# 2. Filter by Score [FS]
# 3. Label and Summarize [LS]
# 4. Respond [R]

In [None]:
# 1. Large Semantic Search [LSS]

# choose candicates for each query
# or load cached candicates
# from ./data/caches/answer_candicates.json
# and ./data/caches/wiki_candicates.json
#
# type of candicates: dict[query_id, Candicates]

query_embeddings = None

def _object_hook(x):
  # JSON object keys are always strings: restore int query ids where possible
  try:
    return {int(k): v for k, v in x.items()}
  except ValueError:
    return x

# choose from answers

answer_candicates: dict[int, Candicates] = {}

cache_file = cache_dir.joinpath('answer_candicates.json')

if not cache_file.exists():
  if query_embeddings is None:
    query_embeddings = ml_embedder.encode(
      list(queries.values()),
      convert_to_tensor=True,
      show_progress_bar=True,
      batch_size=_BATCH_SIZE,
    )

  answer_top_results = util.semantic_search(
    query_embeddings,
    answer_embeddings,
    top_k=4,
  )

  answer_candicates = {
    query_id: [
      (
        _['score'],
        {
          'claim': answers['claim'][_['corpus_id']],
          'label': answers['label'][_['corpus_id']],
          'evidences': answers['evidences'][_['corpus_id']],
        }
      )
      for _ in top_results
    ]
    for query_id, top_results in zip(
      queries.keys(),
      answer_top_results
    )
  }

  cache_file.write_bytes(encode(answer_candicates))
else:
  answer_candicates = json.loads(
    cache_file.read_bytes(),
    object_hook=_object_hook,
  )

answer_embeddings = None

# choose from wiki
# wave-1: choose 12 pages from 1.2M wiki pages
# wave-2: choose 4 lines from 12 pages' lines

wiki_candicates: dict[int, Candicates] = {}

cache_file = cache_dir.joinpath('wiki_candicates.json')

if not cache_file.exists():
  if query_embeddings is None:
    query_embeddings = ml_embedder.encode(
      list(queries.values()),
      convert_to_tensor=True,
      show_progress_bar=True,
      batch_size=_BATCH_SIZE,
    )

  wiki_top_results = util.semantic_search(
    query_embeddings,
    wiki_embeddings,
    top_k=12,
  )

  wiki_ids: list[str] = list(wikis.keys())

  for query_id, query_embedding, top_results in tqdm(list(zip(
    queries.keys(),
    query_embeddings,
    wiki_top_results,
  ))):
    wiki_claims: list[str] = []
    wiki_evidences: list[Evidence] = []

    for _ in top_results:
      for index, line in enumerate(wikis[wiki_ids[_['corpus_id']]]):
        wiki_claims.append(''.join(
          wikis[wiki_ids[_['corpus_id']]][index:index+2]
        ))
        wiki_evidences.append((wiki_ids[_['corpus_id']], index))

    wiki_claim_embeddings = ml_embedder.encode(
      wiki_claims,
      convert_to_tensor=True,
    )
    top_results = util.semantic_search(
      query_embedding,
      wiki_claim_embeddings,
      top_k=4,
    )[0]
    wiki_candicates[query_id] = [
      (
        _['score'],
        {
          'claim': wiki_claims[_['corpus_id']],
          'label': 'supports',
          'evidences': [wiki_evidences[_['corpus_id']]],
        }
      )
      for _ in top_results
    ]

  cache_file.write_bytes(encode(wiki_candicates))
else:
  # The type of query's id is always int
  wiki_candicates = json.loads(
    cache_file.read_bytes(),
    object_hook=_object_hook,
  )

cache_file = None
wiki_embeddings = None
query_embeddings = None
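
The wiki search above runs in two waves: first pick the best pages, then pick the best lines within those pages. The following is a minimal sketch of that idea on toy 2-dimensional vectors. It uses a plain-Python cosine similarity in place of `util.semantic_search` and hand-written vectors in place of real embeddings; the names `cosine`, `top_k`, and `two_wave_search` are hypothetical.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, corpus: list[Vector], k: int) -> list[tuple[float, int]]:
    """Return (score, corpus_index) pairs, best first."""
    scored = [(cosine(query, vec), i) for i, vec in enumerate(corpus)]
    return sorted(scored, reverse=True)[:k]


def two_wave_search(query, page_vecs, page_line_vecs, pages_k, lines_k):
    evidences: list[tuple[int, int]] = []
    line_vecs: list[Vector] = []
    # wave 1: keep only the lines of the best-matching pages
    for _, page in top_k(query, page_vecs, pages_k):
        for line, vec in enumerate(page_line_vecs[page]):
            evidences.append((page, line))
            line_vecs.append(vec)
    # wave 2: rank the surviving lines against the same query
    return [(score, evidences[i]) for score, i in top_k(query, line_vecs, lines_k)]


query = [1.0, 0.0]
page_vecs = [[1.0, 0.1], [0.0, 1.0], [0.9, 0.5]]
page_line_vecs = [
    [[1.0, 0.0], [0.5, 0.5]],  # lines of page 0
    [[0.0, 1.0]],              # lines of page 1
    [[0.8, 0.2], [0.1, 0.9]],  # lines of page 2
]
for score, (page, line) in two_wave_search(query, page_vecs, page_line_vecs, 2, 2):
    print(f'{score:.2f} page={page} line={line}')
```

The notebook uses 12 pages and 4 lines for the two waves; the sketch uses 2 and 2 only to keep the toy data small.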

In [None]:
# 2. Filter by Score [FS]

candicates: dict[int, Candicates] = {}

for query_id in queries.keys():
  sorted_candicates: Candicates = sorted(
    [
      (score, {'type': 'answer', **answer})
      for score, answer in answer_candicates[query_id]
    ] +
    [
      (score, {'type': 'wiki', **answer})
      for score, answer in wiki_candicates[query_id]
    ],
    reverse=True,
    key=lambda t: t[0],
  )
  pivot = 3
  for score, cand in sorted_candicates[pivot:]:
    if (
      (cand['type'] == 'answer' and score >= 0.85) or
      (cand['type'] == 'wiki' and score >= 0.55)
    ):
      pivot += 1
    else:
      break
  candicates[query_id] = sorted_candicates[:pivot]
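
The filter keeps the three best candidates unconditionally and then extends the list while each following candidate clears the threshold for its type. A minimal sketch of that rule on made-up scores is shown below; the function name `filter_by_score` and the sample data are hypothetical, and candidates are reduced to `(score, type)` pairs for brevity.

```python
Candidate = tuple[float, str]  # (score, type), where type is 'answer' or 'wiki'

THRESHOLDS = {'answer': 0.85, 'wiki': 0.55}


def filter_by_score(cands: list[Candidate], base: int = 3) -> list[Candidate]:
    ranked = sorted(cands, key=lambda c: c[0], reverse=True)
    pivot = base
    # extend past the first `base` entries until one misses its threshold
    for score, kind in ranked[base:]:
        if score < THRESHOLDS[kind]:
            break
        pivot += 1
    return ranked[:pivot]


sample = [
    (0.97, 'answer'), (0.94, 'answer'), (0.93, 'answer'), (0.60, 'answer'),
    (0.85, 'wiki'), (0.83, 'wiki'), (0.81, 'wiki'), (0.78, 'wiki'),
]
print(len(filter_by_score(sample)))  # 7: the 0.60 answer stops the extension
```

Because the loop stops at the first failure, a weak candidate also hides any lower-ranked candidates behind it, even ones that would pass their own threshold.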

In [None]:
# [DEV] preview the candidates of the first 20 queries
for cand in list(candicates.values())[:20]:
  pprint([(int(score * 100) / 100, ans['claim']) for score, ans in cand])

[(0.97, '光學顯微鏡是以物理原理來將不可見或難見的微小物放大至肉眼可見的儀器。'),
 (0.94, '光學顯微鏡是以凸透鏡成像來將不可見或難見的微小物放大至肉眼可見的儀器。'),
 (0.93, '光學顯微鏡是以光學原理來將不可見或難見的微小物放大至肉眼可見的武器。'),
 (0.85,
  '顯微鏡泛指將微小不可見或難見物品之影像放大 ， 而能被肉眼或其他成像儀器觀察之工具 。 日常用語中之顯微鏡多指光學顯微鏡 ， 放大倍率和清析度 （ '
  '聚焦 ） 爲顯微鏡重要因素 。 '),
 (0.83,
  '透射顯微鏡的物體是透明的或非常薄 ， 光從可透過它進入顯微鏡 。 顯微鏡 顯微鏡 物體 物體 光 光這種顯微鏡常被用來觀察生物組織 。 顯微鏡 '
  '顯微鏡'),
 (0.81,
  '光學顯微鏡依樣品的不同可分爲反射式和透射式 。 顯微鏡 顯微鏡 光學 光學反射顯微鏡的物體一般是不透明的 ， 光從上面照在物體上 ， '
  '被物體反射的光進入顯微鏡 。 顯微鏡 顯微鏡 物體 物體 光 光'),
 (0.78,
  '光學顯微鏡 （ Optical microscope 、 Light microscope ） 是一種利用光學透鏡產生影像放大效應的顯微鏡 。 顯微鏡 '
  '顯微鏡 光學 光學 透鏡 透鏡 影像 影像')]
[(0.96, '產絲的蠶或產蜜的蜜蜂爲提供利益的昆蟲。'),
 (0.95, '有些昆蟲可以直接提供經濟上的利益，例如蠶產絲或是蜜蜂產蜂蜜。'),
 (0.95, '產絲綢原料的家蠶或採集植物花蜜的蜜蜂爲提供直接經濟利益的昆蟲。')]
[(0.73, '波蘭西部的綠山城縣平均每平方公里的土地有數十人。'),
 (0.73, '綠山城縣位於波蘭西部，平均每1萬平方公尺的土地有75人。'),
 (0.73, '波蘭西部的綠山城縣平均每平方公里的土地有75人。')]
[(0.97, 'Vivien Leigh主演魂斷藍橋的女主角。'),
 (0.89, '魂斷藍橋的女主角由Vivien Leigh主演。'),
 (0.86, '魂斷藍橋的角色由Vivien Leigh主演。')]
[(0.97, '侯孝賢改編自唐代傳聞的電影獲得金馬獎最佳劇情片獎。'),
 (0.96, '侯孝賢改編自唐代傳奇的電影獲得金馬獎最佳劇情片獎。')

In [None]:
# 3. Label and Summarize [LS]

In [None]:
!touch openai.env

In [None]:
# configure OpenAI API

load_dotenv('config.env')
load_dotenv('openai.env')
openai.api_key = getenv('OPENAI_API_KEY')
openai.organization = getenv('OPENAI_ORG_KEY')

print(f"OpenAI API is{' NOT' if not (openai.api_key and openai.organization) else ''} configured")

def openai_usage():
  end_date = date.today() + timedelta(days=1)
  start_date = date.today() + timedelta(days=-14)
  openai_total_usage = openai.api_requestor.APIRequestor().request('GET',
    f'/dashboard/billing/usage?start_date={start_date}&end_date={end_date}'
  )[0].data['total_usage'] / 100

  print(f'The usage of OpenAI API is US${openai_total_usage:.2f}, NT${openai_total_usage * 30.3:.1f} in the last 14 days')

OpenAI API is configured


In [None]:
# Use OpenAI to generate label for query's claim

def get_label_for_claim(
  claim: str,
  fact_claim: str,
  fact_label: Label,
  sleep_for_seconds: float = 0.15,
) -> Label:
  """
  ### prerequisites
  1. openai.api_key is set: OpenAI API key is a string in the form of `sk-***`
  2. openai.organization is set: OpenAI Organization ID is a string in the form of `org-***`

  ### return
  The label of `claim`, judged against `fact_claim`

  ### note
  Always returns `'NOT ENOUGH INFO'` if `fact_label` == `'NOT ENOUGH INFO'`
  """

  if not openai.api_key:
    raise KeyError('openai.api_key should be set: OpenAI API key is in the form of `sk-***`')
  if not openai.organization:
    raise KeyError('openai.organization should be set: OpenAI Organization ID is in the form of `org-***`')

  if fact_label == 'NOT ENOUGH INFO':
    return 'NOT ENOUGH INFO'
  elif not (fact_label == 'supports' or fact_label == 'refutes'):
    raise TypeError("\
get_label_for_claim's parameter fact_label should be one of \
'supports', 'refutes', 'NOT ENOUGH INFO'"
    )

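  # Ask OpenAI's GPT API whether claim agrees with fact_claim (up to 3 attempts)
  response = None
  for chance in range(3):
    try:
      response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[
          {
            'role': 'system',
            # "The assistant will judge whether the user's statement
            # is correct, based on the facts provided by system"
            'content': "(lang: zh-TW) assistant 將根據 system 提供的事實，判斷 user 的敘述是否正確",
          },
          {'role': 'system', 'content': fact_claim},
          {'role': 'user', 'content': claim},
          {
            'role': 'assistant',
            # "Reply with exactly 1 Chinese character: 是 (yes) if the user's
            # statement is correct, 否 (no) if it is wrong, 沒 (none) if it
            # is unrelated to the facts provided by system"
            'content': "以下只能回復 1 個中文字，若 user 的敘述正確請回覆「是」，敘述錯誤則「否」，敘述與 system 提供之事實無關係則「沒」",
          },
        ],
        temperature=0.1,
        max_tokens=2,
        timeout=3,
      )
      break
    except Exception:
      # wait before retrying a failed request
      sleep(sleep_for_seconds)
  if response is None:
    raise RuntimeError('OpenAI API request failed 3 times in a row')
  sleep(sleep_for_seconds)

  reply = response['choices'][0]['message']['content'].upper()
  usage = response['usage']['total_tokens']
  print(f"GPT-3.5 (Used {usage} tokens): {reply}")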

  # process label (是: yes, 否: no, 沒: unrelated)
  if '是' in reply:
    return fact_label
  elif '否' in reply:
    if fact_label == 'supports':
      label_reversing_p = 0.9
      return _RNG_1.choice(
        ['refutes', 'supports'],
        p=[label_reversing_p, 1-label_reversing_p],
      )
    elif fact_label == 'refutes':
      label_reversing_p = 0.3
      return _RNG_1.choice(
        ['supports', 'NOT ENOUGH INFO'],
        p=[label_reversing_p, 1-label_reversing_p],
      )
  elif '沒' in reply:
    return 'NOT ENOUGH INFO'
  else:
    raise Exception(f'\
get_label_for_claim cannot recognize \
the reply from GPT-3.5: {reply}'
    )

def summarize(scores_labels: list[tuple[float, Label]]) -> Label:
  result: dict[Label, float] = {
    'supports': 0,
    'refutes': 0,
    'NOT ENOUGH INFO': 0,
  }
  for score, label in scores_labels:
    result[label] += (
      1 if label == 'NOT ENOUGH INFO' else score
    )

  if result['NOT ENOUGH INFO'] >= len(scores_labels) / 2:
    return 'NOT ENOUGH INFO'
  return max([(v, k) for k, v in result.items()])[1]
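
A worked example may help to see how `summarize` weighs the votes: `'supports'` and `'refutes'` votes count by their similarity score, each `'NOT ENOUGH INFO'` vote counts as 1, and `'NOT ENOUGH INFO'` wins outright when it makes up at least half of the votes. The sketch below restates the function so that it runs on its own; the vote lists are made up.

```python
def summarize(scores_labels: list[tuple[float, str]]) -> str:
    totals = {'supports': 0.0, 'refutes': 0.0, 'NOT ENOUGH INFO': 0.0}
    for score, label in scores_labels:
        # NOT ENOUGH INFO votes are counted, the others are score-weighted
        totals[label] += 1 if label == 'NOT ENOUGH INFO' else score

    if totals['NOT ENOUGH INFO'] >= len(scores_labels) / 2:
        return 'NOT ENOUGH INFO'
    return max((v, k) for k, v in totals.items())[1]


votes = [
    (0.97, 'refutes'),
    (0.94, 'supports'),
    (0.93, 'refutes'),
    (0.85, 'NOT ENOUGH INFO'),
]
print(summarize(votes))  # refutes: 1.90 beats supports 0.94; NEI has 1 of 4 votes

print(summarize([(0.9, 'supports'), (0.6, 'NOT ENOUGH INFO')]))  # NOT ENOUGH INFO
```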

In [None]:
submissions: list[Submission] = []

for query_id, query_claim in list(queries.items())[:90]: # [DEV]
  scores_labels: list[tuple[float, Label]] = []
  # named used_answers to avoid shadowing the global `answers` train data
  used_answers: list[Answer] = []

  for score, answer in candicates[query_id]:
    scores_labels.append((
      score,
      get_label_for_claim(
        query_claim,
        answer['claim'],
        answer['label'],
      )
    ))
    used_answers.append(answer)

  label = summarize(scores_labels)
  # collect up to 5 distinct evidences from the candidates matching the label
  evidence: list[Evidence] = list(set(
    tuple(evidence)
    for answer in filter(
      lambda a: (
        (a['label'] != 'NOT ENOUGH INFO') if label == 'refutes' else
        (a['label'] == label)
      ),
      used_answers
    )
      for evidence in answer['evidences']
  ))[:5]

  submissions.append({
    'id': query_id,
    'predicted_label': label,
    'predicted_evidence': evidence if evidence else None,
    'claim': query_claim,
  })

openai_usage()

GPT-3.5 (Used 227 tokens): 否
GPT-3.5 (Used 292 tokens): 否
GPT-3.5 (Used 299 tokens): 沒
GPT-3.5 (Used 327 tokens): 是。
GPT-3.5 (Used 273 tokens): 否
GPT-3.5 (Used 207 tokens): 是
GPT-3.5 (Used 222 tokens): 是。
GPT-3.5 (Used 228 tokens): 是
GPT-3.5 (Used 193 tokens): 否
GPT-3.5 (Used 198 tokens): 否
GPT-3.5 (Used 191 tokens): 否
GPT-3.5 (Used 176 tokens): 否
GPT-3.5 (Used 177 tokens): 否
GPT-3.5 (Used 176 tokens): 否
GPT-3.5 (Used 215 tokens): 是
GPT-3.5 (Used 215 tokens): 是
GPT-3.5 (Used 219 tokens): 是
GPT-3.5 (Used 219 tokens): 否
GPT-3.5 (Used 574 tokens): 否
GPT-3.5 (Used 215 tokens): 否
GPT-3.5 (Used 211 tokens): 否
GPT-3.5 (Used 300 tokens): 否
GPT-3.5 (Used 207 tokens): 否
GPT-3.5 (Used 208 tokens): 否
GPT-3.5 (Used 202 tokens): 否
GPT-3.5 (Used 178 tokens): 是。
GPT-3.5 (Used 192 tokens): 是
GPT-3.5 (Used 247 tokens): 是
GPT-3.5 (Used 229 tokens): 是
GPT-3.5 (Used 217 tokens): 是
GPT-3.5 (Used 217 tokens): 是
GPT-3.5 (Used 217 tokens): 是
GPT-3.5 (Used 217 tokens): 是
GPT-3.5 (Used 417 tokens): 是
GPT-3.5 (Us

In [None]:
for query in list(queries.items())[:90]: # [DEV]
  print(query)

(5208, '光學顯微鏡是以電磁學原理來將不可見或難見的微小物放大至肉眼可見的儀器。')
(1019, '產絲的蠶或產蜜的蜜蜂爲提供間接經濟利益的昆蟲。')
(8514, '波蘭西部的綠山城縣平均每平方公里的土地有0人。')
(1874, 'Vivien Leigh主演魂斷藍橋中的女配角。')
(8352, '侯孝賢改編自唐代文言文學的電影獲得金馬獎最佳劇情片獎。')
(4603, '國務院前副總理的姪子薄熙來在2012年9月被開除黨籍 。')
(3147, '水星凌日曾發生過。')
(5829, '馬克思在自己的作品中論述了馬克思主義政治經濟學的基本概念。')
(2656, '一貫道相信最高神祇無生老母派遣轉世成銀公祖師路中一的彌勒佛拯救凡間。')
(3887, '回族世居內蒙古至山西 、 陝西 、 甘肅 ， 以至於新疆和中亞一帶，受中亞與西亞中伊斯蘭教傳播的影響 ， 許多回族成爲穆斯林。')
(2268, '天衛四的表面呈現暗紅色，小行星和彗星相撞後所形成其主要地形，並存在許多撞擊坑。')
(7468, '玲子·艾爾斯沃斯身上有美洲血統與亞洲血統。')
(9045, '於西元2000年以前傳入臺灣的法輪功主要以口耳相傳介紹來介紹功法，是由中國傳入。')
(3693, '由同名小說改編的Kramer vs. Kramer 拿到了五項奧斯卡獎項。')
(4173, '研究地震波時，反射十分重要，通過對海浪形狀的檢測以研究地震與海嘯 。')
(3289, '思想與宇宙命運被學者認爲屬於彌涅耳瓦以外的羅馬神祇所化身。')
(133, '樂山大佛建造於唐朝而且花了90年。')
(2722, '韓劇挖掘人性之真善美、倡導仁、誠、恕，是個適合家庭一起觀看的電視劇。')
(1838, '基於可擴展標記語言的標記語言是XBRL。')
(3395, '在1980年代初，史蒂夫·喬布斯使蘋果引入全錄帕洛奧圖中心 （ Xerox PARC ） 的滑鼠驅動圖形用戶介面技術加強了電腦的易用性和普及，離開蘋果公司後也成立了計算機動畫製片廠 。')
(5437, '「211工程」中唯一的政法類高校的中國政法大學沒有隸屬於直屬機關下，而是獨立運作的私人大學。')
(4278, '珠江三角洲範圍包括珠江干流。')
(663, '亞伯拉罕諸教爲基督宗教、伊斯蘭教與猶太教的通則。')
(5583, 

In [None]:
pprint(submissions)

[{'claim': '光學顯微鏡是以電磁學原理來將不可見或難見的微小物放大至肉眼可見的儀器。',
  'id': 5208,
  'predicted_evidence': None,
  'predicted_label': 'NOT ENOUGH INFO'},
 {'claim': '產絲的蠶或產蜜的蜜蜂爲提供間接經濟利益的昆蟲。',
  'id': 1019,
  'predicted_evidence': [('昆蟲', 23), ('昆蟲', 22)],
  'predicted_label': 'supports'},
 {'claim': '波蘭西部的綠山城縣平均每平方公里的土地有0人。',
  'id': 8514,
  'predicted_evidence': None,
  'predicted_label': 'NOT ENOUGH INFO'},
 {'claim': 'Vivien Leigh主演魂斷藍橋中的女配角。',
  'id': 1874,
  'predicted_evidence': [('魂斷藍橋', 0)],
  'predicted_label': 'refutes'},
 {'claim': '侯孝賢改編自唐代文言文學的電影獲得金馬獎最佳劇情片獎。',
  'id': 8352,
  'predicted_evidence': [('鼕鼕的假期', 1),
                         ('李行', 0),
                         ('侯孝賢', 0),
                         ('刺客聶隱娘', 0),
                         ('鄧育昆', 0)],
  'predicted_label': 'supports'},
 {'claim': '國務院前副總理的姪子薄熙來在2012年9月被開除黨籍 。',
  'id': 4603,
  'predicted_evidence': [('薄熙來', 7), ('薄熙來', 1), ('薄熙來', 2)],
  'predicted_label': 'refutes'},
 {'claim': '水星凌日曾發生過。',
  'id': 3147,
  'predicted

In [None]:
# 4. Respond [R]

for form in submissions:
  form.pop('claim', None)

submissions.extend([
  {
    'id': query_id,
    'predicted_label': 'NOT ENOUGH INFO',
    'predicted_evidence': None,
  }
  for query_id in list(queries.keys())[90:] # [DEV]
])
submissions.sort(key=lambda s: s['id'])

# 'wb' truncates any existing file before writing
with open('submission.jsonl', 'wb') as submission_file:
  for form in submissions:
    submission_file.write(encode(form))
    submission_file.write(b'\n')
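
The submission file is JSON Lines: one JSON object per line, sorted by `id`. As a rough illustration of the format, the sketch below writes and reads back two made-up rows using the standard-library `json` module instead of `msgspec`; the file name `submission_sketch.jsonl` and the row values are hypothetical.

```python
import json
import tempfile
from pathlib import Path

rows = [
    {'id': 1019, 'predicted_label': 'supports', 'predicted_evidence': [['昆蟲', 22]]},
    {'id': 133, 'predicted_label': 'NOT ENOUGH INFO', 'predicted_evidence': None},
]
rows.sort(key=lambda row: row['id'])

path = Path(tempfile.gettempdir()) / 'submission_sketch.jsonl'
with path.open('w', encoding='utf-8') as f:
    for row in rows:
        # ensure_ascii=False keeps the Chinese page titles readable
        f.write(json.dumps(row, ensure_ascii=False) + '\n')

loaded = [
    json.loads(line)
    for line in path.read_text(encoding='utf-8').splitlines()
]
print(loaded == rows)
```

Evidence tuples become JSON arrays on disk, so they come back as lists when the file is parsed.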

In [None]:
score = 0.0384
public_test_count = 989
submitted_count = 90
model_accuracy = score * public_test_count / submitted_count
print(f'Accuracy: {model_accuracy:.3f}')

Accuracy: 0.422
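
The arithmetic in the last cell rescales a score over all 989 public test queries to the 90 queries that were actually labeled. It appears to assume that the placeholder `'NOT ENOUGH INFO'` rows contribute nothing to the score; if some of them were scored as correct, the figure would overestimate the accuracy. A minimal sketch of the calculation, with the hypothetical function name `extrapolated_accuracy`:

```python
def extrapolated_accuracy(score: float, total: int, answered: int) -> float:
    """Rescale a score over `total` rows to the `answered` rows only.

    Assumes the remaining rows contributed zero to `score`.
    """
    return score * total / answered


print(f'Accuracy: {extrapolated_accuracy(0.0384, 989, 90):.3f}')  # Accuracy: 0.422
```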
