<a href="https://colab.research.google.com/github/chyanju/Poe/blob/main/public_PLDI22AE_Poe_make_benchmark.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Poe: Customize your own benchmark and run the tool.
This Colab notebook guides you through the full pipeline with a customized benchmark.

This colab is modified from the TaPas public notebooks. See [https://github.com/google-research/tapas/tree/master/notebooks](https://github.com/google-research/tapas/tree/master/notebooks).

Table of Contents:
1. Set up the environment
2. Customize your own benchmark here
3. Prepare the machine learning model
4. Use the machine learning model to make initial predictions of your customized benchmark
5. Construct new dataset for Poe by using the results generated so far
6. Run Poe here on your customized benchmark

# 1. Set up the environment.
(Note) Run each command in this section only once per runtime.


First we install dependencies. It's OK to see an error that says: "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.". Just move on.

In [None]:
! pip install -U pip setuptools wheel
! pip install simplejson==3.17.5
! pip install xxhash==2.0.2
! pip install nltk==3.6.1
! pip install sexpdata==0.0.3
! pip install tabulate==0.8.9
! pip install pandas==1.4.0
! pip install spacy==3.1.2
! python -m spacy download en_core_web_sm
! pip install tapas-table-parsing==0.0.1.dev0

Collecting pip
  Downloading pip-22.0.4-py3-none-any.whl (2.1 MB)
Collecting setuptools
  Downloading setuptools-62.0.0-py3-none-any.whl (1.1 MB)
Installing collected packages: setuptools, pip
  Attempting uninstall: setuptools
    Found existing installation: setuptools 57.4.0
    Uninstalling setuptools-57.4.0:
      Successfully uninstalled setuptools-57.4.0
  Attempting uninstall: pip
    Found existing installation: pip 21.1.3
    Uninstalling pip-21.1.3:
      Successfully uninstalled pip-21.1.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.8.0 requires tf-estimator-nightly==2.8.0.dev2021122109, which is not installed.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.

Collecting simplejson==3.17.5
  Downloading simplejson-3.17.5-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (129 kB)
Installing collected packages: simplejson
Successfully installed simplejson-3.17.5
Collecting xxhash==2.0.2
  Downloading xxhash-2.0.2-cp37-cp37m-manylinux2010_x86_64.whl (243 kB)
Installing collected packages: xxhash
Successfully installed xxhash-2.0.2
Collecting nltk==3.6.1
  Downloading nltk-3.6.1-py3-none-any.whl (1.5 MB)

## **You need to restart the runtime and resume from here. Click "Runtime" -> "Restart runtime".**

The following commands clone the public Poe repo and perform the setup needed to incorporate the machine learning model.

In [None]:
! git clone https://github.com/chyanju/Poe.git
! cp /content/Poe/benchmarks/VisQA/shared/run_task_main.py /usr/local/lib/python3.7/dist-packages/tapas/

Cloning into 'Poe'...
remote: Enumerating objects: 2804, done.
remote: Counting objects: 100% (2804/2804), done.
remote: Compressing objects: 100% (850/850), done.
remote: Total 2804 (delta 1951), reused 2801 (delta 1951), pack-reused 0
Receiving objects: 100% (2804/2804), 9.20 MiB | 14.79 MiB/s, done.
Resolving deltas: 100% (1951/1951), done.
Checking out files: 100% (2751/2751), done.


The following commands set up the nltk library.

In [None]:
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Unzipping corpora/omw-1.4.zip.


True

# 2. Customize your own benchmark here
You only need to modify the block marked "#----customizable zone----#".

In [None]:
import pickle
import pandas as pd
import spacy
nlp = spacy.load("en_core_web_sm")
data_path = "/content/"

def format_table(arg_df):
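  # `index=False` is the pandas keyword for hiding the index (pandas forwards it to tabulate's `showindex`)
  return arg_df.to_markdown(index=False,tablefmt="jira",numalign="left").replace("||","|").replace("|\n|","\n").strip("|")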
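
To make the string layout concrete, here is a minimal standard-library sketch of the pipe-separated text that `format_table` is meant to produce, and of how such text can be split back into cells the way `predict` does later in this notebook. The helpers `rows_to_pipe_table` and `split_pipe_table` are hypothetical names introduced for illustration; the exact padding produced by `to_markdown` may differ from this approximation.

```python
def rows_to_pipe_table(records):
    """Hypothetical helper: render a list of dicts as 'a | b | c' lines."""
    headers = list(records[0].keys())
    lines = [" | ".join(headers)]
    for rec in records:
        lines.append(" | ".join(str(rec[h]) for h in headers))
    return "\n".join(lines)


def split_pipe_table(text):
    """Split the text back into rows of stripped cell strings."""
    return [[cell.strip() for cell in row.split("|")]
            for row in text.split("\n") if row.strip()]


records = [
    {"name": "Jay", "math": 98, "physics": 99},
    {"name": "Brian", "math": 93, "physics": 100},
]
text = rows_to_pipe_table(records)
table = split_pipe_table(text)
print(text)
print(table[0])  # header row
print(table[1])  # first data row; every cell is a string
```

Note that all cells come back as strings, so numeric columns have to be re-interpreted by whatever consumes the table.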

The following is the customizable zone.

- Construct your own table in Pandas DataFrame and store it in the `customized_table` variable
- Provide your own id in `customized_id`
- Provide a natural language query in `customized_query`

The current block has some examples filled in for your reference.

In [None]:
#----customizable zone----#
table_data = [
  {'name': 'Jay', 'math': 98, 'physics': 99},
  {'name': 'Brian', 'math': 93, 'physics': 100},
  {'name': 'Zoe', 'math': 97, 'physics': 95},
]
customized_table = pd.DataFrame.from_records(table_data)
customized_id = "customized-benchmark-00"
customized_query = "Who's got the highest score?"

In [None]:
# preview table
customized_table

    name  math  physics
0    Jay    98       99
1  Brian    93      100
2    Zoe    97       95


In [None]:
# preview query
processed_query = " ".join([str(p).lower() for p in nlp(customized_query)])
processed_query

"who 's got the highest score ?"

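The preview above tokenizes the query with spaCy, lowercases each token, and joins the tokens with single spaces. As a rough illustration of that normalization only, the sketch below swaps spaCy for a simple regular expression. This is an approximation under stated assumptions, not the notebook's method: `rough_normalize` and `TOKEN_RE` are hypothetical names, and the regex will split many inputs differently from spaCy.

```python
import re

# Words, clitics such as 's, and single punctuation marks become separate tokens.
TOKEN_RE = re.compile(r"\w+|'\w+|[^\w\s]")


def rough_normalize(query):
    """Hypothetical stand-in for the spaCy step: tokenize, lowercase, join."""
    return " ".join(tok.lower() for tok in TOKEN_RE.findall(query))


print(rough_normalize("Who's got the highest score?"))
# prints: who 's got the highest score ?
```
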
In [None]:
# construct inputs for every benchmark
tapas_inputs = []
str_table = format_table(customized_table)
tapas_inputs.append((customized_id, processed_query, str_table))
visqa_dt = [{
    "id": customized_id,
    "short_id": customized_id,
    "query": processed_query,
    "repr_answer": None,
    "rendered_table": customized_table,
}]

In [None]:
with open("{}/tapas_on_visqa_inputs.pkl".format(data_path), "wb") as f:
    pickle.dump(tapas_inputs, f)

In [None]:
with open("{}/visqa_dataset.pkl".format(data_path), "wb") as f:
  pickle.dump(visqa_dt, f)
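
Before moving on, it can help to confirm that a pickle file reads back to the same records. The sketch below shows the round trip in a temporary directory so that it does not touch `/content/`. The record is a stand-in with the same keys as the entries of `visqa_dt`; as an assumption made to keep the example standard-library only, its `rendered_table` is a plain list of dicts rather than a DataFrame.

```python
import os
import pickle
import tempfile

# Stand-in record with the same keys as the entries in visqa_dt.
records = [{
    "id": "customized-benchmark-00",
    "short_id": "customized-benchmark-00",
    "query": "who 's got the highest score ?",
    "repr_answer": None,
    "rendered_table": [{"name": "Jay", "math": 98, "physics": 99}],
}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "visqa_dataset.pkl")
    with open(path, "wb") as f:
        pickle.dump(records, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == records)
print(sorted(loaded[0].keys()))
```
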

# 3. Prepare the machine learning model

In [None]:
import tapas
print(tapas.__file__)

/usr/local/lib/python3.7/dist-packages/tapas/__init__.py


(Note) The following block only needs to be run once during the same runtime.

In [None]:
! gsutil cp "gs://tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_medium_reset.zip" "tapas_model.zip" && unzip tapas_model.zip
! mv tapas_wtq_wikisql_sqa_masklm_medium_reset tapas_model

Copying gs://tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_medium_reset.zip...
| [1 files][388.7 MiB/388.7 MiB]                                                
Operation completed over 1 objects/388.7 MiB.                                    
Archive:  tapas_model.zip
   creating: tapas_wtq_wikisql_sqa_masklm_medium_reset/
  inflating: tapas_wtq_wikisql_sqa_masklm_medium_reset/bert_config.json  
  inflating: tapas_wtq_wikisql_sqa_masklm_medium_reset/README.txt  
  inflating: tapas_wtq_wikisql_sqa_masklm_medium_reset/model.ckpt.index  
  inflating: tapas_wtq_wikisql_sqa_masklm_medium_reset/model.ckpt.data-00000-of-00001  
  inflating: tapas_wtq_wikisql_sqa_masklm_medium_reset/vocab.txt  
  inflating: tapas_wtq_wikisql_sqa_masklm_medium_reset/model.ckpt.meta  


The following two blocks import the model and necessary libraries.

In [None]:
import tensorflow.compat.v1 as tf
import os 
import shutil
import csv
import pandas as pd
import IPython

tf.get_logger().setLevel('ERROR')

In [None]:
from tapas.utils import tf_example_utils
from tapas.protos import interaction_pb2
from tapas.utils import number_annotation_utils
from tapas.scripts import prediction_utils

The following two blocks load the checkpoint and prediction functionalities.

In [None]:
os.makedirs('results/wtq/tf_examples', exist_ok=True)
os.makedirs('results/wtq/model', exist_ok=True)
with open('results/wtq/model/checkpoint', 'w') as f:
  f.write('model_checkpoint_path: "model.ckpt-0"')
for suffix in ['.data-00000-of-00001', '.index', '.meta']:
  shutil.copyfile(f'tapas_model/model.ckpt{suffix}', f'results/wtq/model/model.ckpt-0{suffix}')

In [None]:
max_seq_length = 512
vocab_file = "tapas_model/vocab.txt"
config = tf_example_utils.ClassifierConversionConfig(
    vocab_file=vocab_file,
    max_seq_length=max_seq_length,
    max_column_id=max_seq_length,
    max_row_id=max_seq_length,
    strip_column_names=False,
    add_aggregation_candidates=False,
)
converter = tf_example_utils.ToClassifierTensorflowExample(config)

def convert_interactions_to_examples(tables_and_queries):
  """Calls Tapas converter to convert interaction to example."""
  for idx, (table, queries) in enumerate(tables_and_queries):
    interaction = interaction_pb2.Interaction()
    for position, query in enumerate(queries):
      question = interaction.questions.add()
      question.original_text = query
      question.id = f"{idx}-0_{position}"
    for header in table[0]:
      interaction.table.columns.add().text = header
    for line in table[1:]:
      row = interaction.table.rows.add()
      for cell in line:
        row.cells.add().text = cell
    number_annotation_utils.add_numeric_values(interaction)
    for i in range(len(interaction.questions)):
      try:
        yield converter.convert(interaction, i)
      except ValueError as e:
        print(f"Can't convert interaction: {interaction.id} error: {e}")
        
def write_tf_example(filename, examples):
  with tf.io.TFRecordWriter(filename) as writer:
    for example in examples:
      writer.write(example.SerializeToString())

def aggregation_to_string(index):
  if index == 0:
    return "NONE"
  if index == 1:
    return "SUM"
  if index == 2:
    return "AVERAGE"
  if index == 3:
    return "COUNT"
  raise ValueError(f"Unknown index: {index}")

def predict(table_data, queries):
  table = [list(map(lambda s: s.strip(), row.split("|"))) 
           for row in table_data.split("\n") if row.strip()]
  examples = convert_interactions_to_examples([(table, queries)])
  write_tf_example("results/wtq/tf_examples/test.tfrecord", examples)
  write_tf_example("results/wtq/tf_examples/random-split-1-dev.tfrecord", [])
  
  ! python -m tapas.run_task_main \
    --task="WTQ" \
    --output_dir="results" \
    --noloop_predict \
    --test_batch_size={len(queries)} \
    --tapas_verbosity="ERROR" \
    --compression_type= \
    --reset_position_index_per_cell \
    --init_checkpoint="tapas_model/model.ckpt" \
    --bert_config_file="tapas_model/bert_config.json" \
    --mode="predict" 2> error


  results_path = "results/wtq/model/test.tsv"
  all_coordinates = []
  df = pd.DataFrame(table[1:], columns=table[0])
  # display(IPython.display.HTML(df.to_html(index=False)))
  # print()
  with open(results_path) as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    for row in reader:
      coordinates = sorted(prediction_utils.parse_coordinates(row["answer_coordinates"]))
      all_coordinates.append(coordinates)
      answers = ', '.join([table[row + 1][col] for row, col in coordinates])
      position = int(row['position'])
      aggregation = aggregation_to_string(int(row["pred_aggr"]))
      print(">", queries[position])
      answer_text = str(answers)
      if aggregation != "NONE":
        answer_text = f"{aggregation} of {answer_text}"
      print(answer_text)
  return all_coordinates
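
The last part of `predict` turns predicted cell coordinates and an aggregation index into the printed answer line. The sketch below isolates that step with plain Python. `describe_prediction` is a hypothetical helper, and it assumes the coordinates arrive as text shaped like `[(0, 1), (2, 1)]`; the real parsing is done by `prediction_utils.parse_coordinates`.

```python
import ast

AGGREGATIONS = {0: "NONE", 1: "SUM", 2: "AVERAGE", 3: "COUNT"}


def describe_prediction(table, raw_coordinates, pred_aggr):
    """Hypothetical helper mirroring the answer formatting in `predict`.

    Assumes `raw_coordinates` looks like "[(0, 1), (2, 1)]".
    """
    coordinates = sorted(ast.literal_eval(raw_coordinates))
    # Row 0 of `table` is the header, so data row r lives at table[r + 1].
    cells = ", ".join(table[r + 1][c] for r, c in coordinates)
    aggregation = AGGREGATIONS[pred_aggr]
    return cells if aggregation == "NONE" else f"{aggregation} of {cells}"


table = [
    ["name", "math", "physics"],
    ["Jay", "98", "99"],
    ["Brian", "93", "100"],
    ["Zoe", "97", "95"],
]
print(describe_prediction(table, "[(0, 0)]", 0))          # Jay
print(describe_prediction(table, "[(2, 1), (0, 1)]", 2))  # AVERAGE of 98, 97
```

The `AGGREGATION of cells` text format matters because the log-parsing code in Section 5 keys on the prefixes `COUNT of `, `SUM of `, and `AVERAGE of `.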

# 4. Use the machine learning model to make initial predictions of your customized benchmark

Start collection. This may run for several hours if you have many benchmarks.

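(Note) The `%%capture` magic defines `cap` only after the block finishes, so the first run can fail with an error like "cap is not defined" when the block reads `cap.stdout`. If that happens, run the following block again until the error is gone.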

In [None]:
%%capture cap --no-stderr

import pickle

# reset the outputs file to an empty list
with open("/content/tapas_on_visqa_outputs.pkl", "wb") as f:
  pickle.dump([],f)

# load the inputs prepared in Section 2
with open("/content/tapas_on_visqa_inputs.pkl", "rb") as f:
  dt = pickle.load(f)

for i in range(len(dt)):
  print("# processing benchmark id={}/{}, query={}".format(i, dt[i][0], dt[i][1]))
  predict(dt[i][2], [dt[i][1]])

with open("/content/tapas_on_visqa_outputs.log", "w") as f:
  f.write(cap.stdout)

The block above stores the captured output as a log file for later use. The next block prints it.

In [None]:
print(cap.stdout)
print("# done")

# processing benchmark id=0/customized-benchmark-00, query=who 's got the highest score ?
is_built_with_cuda: True
is_gpu_available: False
GPUs: []
Training or predicting ...
Evaluation finished after training step 0.
> who 's got the highest score ?
Jay

# done


# 5. Construct new dataset for Poe by using the results generated so far

In [None]:
import pickle
import numpy as np
import pandas as pd
from Poe.trinity.utils.visqa import normalize_table, parse_value
from Poe.trinity.utils.visqa_strategy import strategy_TaPas_A, strategy_TaPas_B, strategy_TaPas_C

data_path = "/content/"

In [None]:
def interpret_answer(arg_line):
    if arg_line.startswith("COUNT of "):
        tmp_operands = [parse_value(p) for p in arg_line[len("COUNT of "):].split(", ")]
        return [len(tmp_operands)]
    elif arg_line.startswith("SUM of "):
        tmp_operands = [parse_value(p) for p in arg_line[len("SUM of "):].split(", ")]
        return [sum(tmp_operands)]
    elif arg_line.startswith("AVERAGE of "):
        tmp_operands = [parse_value(p) for p in arg_line[len("AVERAGE of "):].split(", ")]
        return [sum(tmp_operands)/len(tmp_operands)]
    else:
        # no ops
        tmp_operands = [parse_value(p) for p in arg_line.split(", ")]
        if len(tmp_operands)==0:
            return ["<no answer>"]
        elif len(tmp_operands)==1:
            if isinstance(tmp_operands[0], str) and tmp_operands[0].strip()=="":
                return ["<no answer>"]
            else:
                return [tmp_operands[0]]
        else:
            # len>1
            return sorted([p for p in tmp_operands], key=lambda x:str(x))
        
def extract_answers_from_logs(arg_logs):
    tmp_answers = []
    for i in range(len(arg_logs)):
        if arg_logs[i].startswith("Evaluation finished"):
            if arg_logs[i+1].startswith(">"):
                try:
                    tmp_answers.append(interpret_answer(arg_logs[i+2]))
                except TypeError:
                    tmp_answers.append(["<type error>"])
            else:
                # TaPas exception/error
                tmp_answers.append(["<tapas exception>"])
    return tmp_answers
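
To see the log layout that `extract_answers_from_logs` relies on, the simplified sketch below scans a few lines taken from the sample output in Section 4. It only locates the raw answer line for each benchmark and does not interpret it. `find_answer_lines` is a hypothetical helper, and its bounds check is an addition of this sketch rather than part of the notebook code.

```python
def find_answer_lines(log_lines):
    """Hypothetical, simplified scan: return the raw answer line per benchmark."""
    answers = []
    for i, line in enumerate(log_lines):
        if not line.startswith("Evaluation finished"):
            continue
        # The query echo ("> ...") and then the answer follow the marker line.
        if i + 2 < len(log_lines) and log_lines[i + 1].startswith(">"):
            answers.append(log_lines[i + 2].strip())
        else:
            answers.append("<tapas exception>")
    return answers


sample_log = [
    "# processing benchmark id=0/customized-benchmark-00, query=who 's got the highest score ?\n",
    "is_built_with_cuda: True\n",
    "Training or predicting ...\n",
    "Evaluation finished after training step 0.\n",
    "> who 's got the highest score ?\n",
    "Jay\n",
]
print(find_answer_lines(sample_log))  # ['Jay']
```
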

In [None]:
with open("{}/visqa_dataset.pkl".format(data_path), "rb") as f:
    dt = pickle.load(f)
with open("{}/tapas_on_visqa_outputs.log".format(data_path), "r") as f:
    tapas_logs = f.readlines()
tapas_logs = extract_answers_from_logs(tapas_logs)
with open("{}/tapas_on_visqa_outputs.pkl".format(data_path), "rb") as f:
    tapas_outputs = pickle.load(f)
tapas_outputs = [tapas_outputs[i] for i in range(len(tapas_outputs)) if i%2!=0]

In [None]:
assert len(tapas_outputs)==len(dt)
assert len(tapas_logs)==len(dt)
len(tapas_outputs)

1

In [None]:
# first extract all the cell pointers with probs
tapas_parsed_outputs = []
for i in range(len(tapas_outputs)):
    # print("# i={}".format(i))
    if len(tapas_outputs[i])>0:
        p = tapas_outputs[i][0] # always at 0 since we pass 1 benchmark to TaPas at a time
        dop = p["pred_aggr"] # predicted operator
        qlist = p["probabilities"]>0 # find all cells with prob>0
        cpps = []
        for j in range(len(qlist)):
            if qlist[j]:
                drow = p["row_ids"][j]-1
                dcol = p["column_ids"][j]-1
                dprob= p["probabilities"][j]
                cpps.append((drow,dcol,dprob))
        cpps = sorted(cpps, key=lambda x:x[2], reverse=True)
        tapas_parsed_outputs.append((dop,cpps)) # (aggr, cpps)
    else:
        # no outputs, could be something wrong?
        print("# warning: no output for i={}".format(i))
        tapas_parsed_outputs.append((0,[])) # (aggr, cpps)
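
The loop above keeps every entry whose probability is positive, shifts its row and column ids by one, and sorts the resulting triples from most to least probable. The sketch below reproduces that logic with plain lists. `collect_cell_pointers` is a hypothetical helper, and the input values are made up for illustration; real values come from the TaPas output.

```python
def collect_cell_pointers(row_ids, column_ids, probabilities):
    """Hypothetical plain-list version of the loop above.

    Returns (row, col, prob) triples for entries with prob > 0,
    most probable first. Ids are shifted by one, as in the notebook code.
    """
    pointers = [
        (row - 1, col - 1, prob)
        for row, col, prob in zip(row_ids, column_ids, probabilities)
        if prob > 0
    ]
    return sorted(pointers, key=lambda x: x[2], reverse=True)


# Made-up values for illustration only.
row_ids = [0, 1, 1, 2, 3]
column_ids = [0, 1, 2, 1, 3]
probabilities = [0.0, 0.9, 0.0, 0.25, 0.5]
print(collect_cell_pointers(row_ids, column_ids, probabilities))
# [(0, 0, 0.9), (2, 2, 0.5), (1, 0, 0.25)]
```
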

In [None]:
# then build "expected_output" table and "candidate_outputs" table
for i in range(len(dt)):
    print("\r# processing {}/{}".format(i, len(dt)), end="")
    p = dt[i]

    # placeholder
    dt[i]["expected_output"] = pd.DataFrame()

    tmp_outputs_original = tapas_logs[i]
    tmp_outputs_TaPas_A = strategy_TaPas_A(tapas_parsed_outputs[i], p["rendered_table"])
    tmp_outputs_TaPas_B = strategy_TaPas_B(tapas_parsed_outputs[i], p["rendered_table"])
    tmp_probs_TaPas_C, tmp_outputs_TaPas_C = strategy_TaPas_C(tapas_parsed_outputs[i], p["rendered_table"])
    dt[i]["candidate_outputs"] = {
        "TaPas_original": tmp_outputs_original,
        "TaPas_A": tmp_outputs_TaPas_A,
        "TaPas_B": tmp_outputs_TaPas_B,
        "TaPas_C": tmp_outputs_TaPas_C,
        "TaPas_probs_C": tmp_probs_TaPas_C,
    }


# processing 0/1

In [None]:
with open("{}/tapas_on_visqa_dataset.pkl".format(data_path), "wb") as f:
    pickle.dump(dt, f)

# 6. Run Poe here on your customized benchmark
- Since we are running customized benchmarks, we assume there is no known ground-truth answer. The command therefore omits the `--expected-only` argument, which is meant for testing against the ground-truth answer.

Note that, because of a Python compatibility issue (the TaPas model works on Python 3.7, but Poe may experience errors on Python 3.7), the following command may fail. If it does, follow the steps below to run the customized dataset on another machine that is set up for running Poe only:
1. Copy the file `/content/tapas_on_visqa_dataset.pkl` from colab to your local machine (open the "Files" view in colab so that you can browse the files and download them).
2. Specify `-i <your tapas_on_visqa_dataset.pkl>` path when executing the command in the next block.
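
As a small illustration of step 2, the sketch below assembles the same command as the next block but with a custom `-i` path. `build_poe_command` is a hypothetical helper, and `/path/to/tapas_on_visqa_dataset.pkl` is a placeholder that you would replace with the location of your downloaded file; the flags are the ones used in this notebook.

```python
import shlex


def build_poe_command(dataset_path, benchmark="customized-benchmark-00"):
    """Hypothetical helper: assemble the Poe command with a custom -i path."""
    return [
        "python", "./test_TaPas_on_VisQA_benchmark.py",
        "--benchmark", benchmark,
        "--dsl", "meta_visqa",
        "--skeletons", "visqa_simple",
        "--strategy", "TaPas_C",
        "-i", dataset_path,
    ]


# Placeholder path; point it at your downloaded file.
command = build_poe_command("/path/to/tapas_on_visqa_dataset.pkl")
print(" ".join(shlex.quote(part) for part in command))
```

The printed line can be pasted into a shell opened in the `Poe/` directory, or the list can be passed to `subprocess.run(command, cwd="Poe")`.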

In [None]:
! cd Poe/ && python ./test_TaPas_on_VisQA_benchmark.py --benchmark customized-benchmark-00 --dsl meta_visqa --skeletons visqa_simple --strategy TaPas_C  -i ../tapas_on_visqa_dataset.pkl

# parsed arguments: Namespace(benchmark='customized-benchmark-00', dataset='../tapas_on_visqa_dataset.pkl', dsl='meta_visqa', expected_only=False, fallback='none', mode='full', skeletons='visqa_simple', strategy='TaPas_C', timeout=0)
# loading benchmark...
# table keywords: {'100', 'zoe', '95', 'name', '98', 'jay', '93', '99', '97', 'physics', 'brian', 'math'}
# input type: [dtype('O'), dtype('int64'), dtype('int64')]
# input is:
    name  ...  physics
0    Jay  ...       99
1  Brian  ...      100
2    Zoe  ...       95

[3 rows x 3 columns]
# query is: who 's got the highest score ?
# expected output type:[]
# expected output is:
Empty DataFrame
Columns: []
Index: []
# inferred DSL terminals:
  # ConstVal: ['<NULL>']
     # cmap: []
  # AggrFunc: ['max', '<NULL>']
     # amap: [('highest', 'max')]
  # NumFunc: ['<NULL>']
     # nmap: []
  # BoolFunc: ['==', '<NULL>']
     # bmap: [(None, '==')]
  # IndFunc: ['eqmax', '<NULL>']
     # imap: [('highest', 'eqmax')]
# loading skeleton lis