<a href="https://colab.research.google.com/github/chyanju/Poe/blob/main/public_PLDI22AE_Poe_TaPas_on_VisQA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Poe: Collecting TaPas results on the VisQA dataset
This Colab notebook is adapted by ***Poe*** from the original TaPas notebook. Follow the instructions below to collect the prediction results.

##### Copyright 2020 The Google AI Language Team Authors

Licensed under the Apache License, Version 2.0 (the "License");

In [None]:
# Copyright 2019 The Google AI Language Team Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# 1. Clone and install the repository


First, install the TaPas package.

In [None]:
! pip install tapas-table-parsing==0.0.1.dev0

**Poe**: After the installation finishes, restart the runtime, then resume from the next cell.

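**Poe**: Replace TaPas's `run_task_main.py` with the version customized for Poe, and copy the VisQA inputs (`tapas_on_visqa_inputs.pkl`) to `/content/`. This cell only needs to run **once** per runtime. The destination path assumes Python 3.7; if the next cell prints a different location for `tapas`, adjust the `cp` destination to match.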

In [None]:
! git clone https://github.com/chyanju/Poe.git
! cp /content/Poe/benchmarks/VisQA/shared/run_task_main.py /usr/local/lib/python3.7/dist-packages/tapas/
! cp /content/Poe/benchmarks/VisQA/shared/tapas_on_visqa_inputs.pkl /content/

In [None]:
import tapas
print(tapas.__file__)
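
The `cp` cell above hard-codes `/usr/local/lib/python3.7/dist-packages/tapas/` as the destination, while `print(tapas.__file__)` reports where the package actually lives. A minimal sketch of resolving that directory programmatically follows; `package_dir` is a hypothetical helper built on the standard library's `importlib.util.find_spec`, and it is demonstrated on `json` only because that package is always installed.

```python
import importlib.util
import os

def package_dir(name):
  """Return the directory of an installed package (hypothetical helper)."""
  spec = importlib.util.find_spec(name)
  if spec is None:
    raise ModuleNotFoundError(f"package not found: {name}")
  if spec.submodule_search_locations:
    # Regular packages expose their directory here.
    return list(spec.submodule_search_locations)[0]
  # Single-file modules: fall back to the directory of the module file.
  return os.path.dirname(spec.origin)

# In the notebook this would be package_dir("tapas").
print(package_dir("json"))
```

In the notebook, `package_dir("tapas")` would give the directory to copy `run_task_main.py` into; IPython can interpolate such a value into a `!` command with `{...}`, the same mechanism the `predict` cell uses for `{len(queries)}`.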

# 2. Fetch models from Google Cloud Storage

**Poe**: This block only needs to be run once during the runtime.

In [None]:
! gsutil cp "gs://tapas_models/2020_08_05/tapas_wtq_wikisql_sqa_masklm_medium_reset.zip" "tapas_model.zip" && unzip tapas_model.zip
! mv tapas_wtq_wikisql_sqa_masklm_medium_reset tapas_model
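
Later cells read `tapas_model/vocab.txt`, `tapas_model/bert_config.json`, and the three `tapas_model/model.ckpt.*` files, so a quick check after unzipping can catch a failed download before the long collection run. A sketch, with `missing_model_files` as a hypothetical helper and the file list copied from the paths used elsewhere in this notebook:

```python
import os
import tempfile

# Files that later cells read from the unzipped model directory.
REQUIRED_FILES = [
  "vocab.txt",
  "bert_config.json",
  "model.ckpt.data-00000-of-00001",
  "model.ckpt.index",
  "model.ckpt.meta",
]

def missing_model_files(model_dir="tapas_model"):
  """Return the required files absent from model_dir (hypothetical helper)."""
  return [name for name in REQUIRED_FILES
          if not os.path.isfile(os.path.join(model_dir, name))]

# Demonstration on a temporary directory that only contains vocab.txt.
with tempfile.TemporaryDirectory() as tmp:
  open(os.path.join(tmp, "vocab.txt"), "w").close()
  missing = missing_model_files(tmp)
print(missing)
```

After the `mv` above, `missing_model_files()` should return an empty list.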

# 3. Imports

In [None]:
import tensorflow.compat.v1 as tf
import os 
import shutil
import csv
import pandas as pd
import IPython

tf.get_logger().setLevel('ERROR')

In [None]:
from tapas.utils import tf_example_utils
from tapas.protos import interaction_pb2
from tapas.utils import number_annotation_utils
from tapas.scripts import prediction_utils

# 4. Load checkpoint for prediction

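The next cell copies the downloaded checkpoint into the directory layout used for prediction. The cell after it holds the prediction code, which creates an `interaction_pb2.Interaction` protobuf object (the data structure TaPas uses to store examples) and then calls the prediction script.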
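
To make the shape of that data concrete, here is a plain-Python stand-in for the fields the prediction code fills in: one question per query, with ids of the form `<idx>-0_<position>`, plus the table's header and rows. The dataclasses and `build_interaction` are hypothetical illustrations, not the real `interaction_pb2` classes, and they skip the numeric annotation that `number_annotation_utils.add_numeric_values` adds.

```python
from dataclasses import dataclass, field
from typing import List

# Stand-ins for the protobuf fields used by convert_interactions_to_examples.
@dataclass
class Question:
  id: str
  original_text: str

@dataclass
class Table:
  columns: List[str] = field(default_factory=list)
  rows: List[List[str]] = field(default_factory=list)

@dataclass
class Interaction:
  questions: List[Question] = field(default_factory=list)
  table: Table = field(default_factory=Table)

def build_interaction(idx, table, queries):
  """Mirror the field assignments in convert_interactions_to_examples."""
  interaction = Interaction()
  for position, query in enumerate(queries):
    # Id scheme copied from the notebook: "<idx>-0_<position>".
    interaction.questions.append(Question(f"{idx}-0_{position}", query))
  interaction.table.columns = list(table[0])  # first row is the header
  interaction.table.rows = [list(line) for line in table[1:]]
  return interaction

example = build_interaction(
    7, [["Year", "Sales"], ["2019", "10"], ["2020", "12"]],
    ["Which year had higher sales?"])
print(example.questions[0].id)
print(len(example.table.rows))
```

The position part of the id matters later: `predict` reads a `position` column back from `test.tsv` and uses it to index `queries`.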

In [None]:
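# Create the directories that the prediction step writes to and reads from.
os.makedirs('results/wtq/tf_examples', exist_ok=True)
os.makedirs('results/wtq/model', exist_ok=True)
# Checkpoint state file: point at the downloaded weights as step 0.
with open('results/wtq/model/checkpoint', 'w') as f:
  f.write('model_checkpoint_path: "model.ckpt-0"')
# Copy each checkpoint file under the matching model.ckpt-0 name.
for suffix in ['.data-00000-of-00001', '.index', '.meta']:
  shutil.copyfile(f'tapas_model/model.ckpt{suffix}', f'results/wtq/model/model.ckpt-0{suffix}')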
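
The cell above stages the pretrained weights as checkpoint step 0: it writes TensorFlow's one-line `checkpoint` state file and copies each `model.ckpt` file under the `model.ckpt-0` name in `results/wtq/model`. The same steps are sketched below as a hypothetical `stage_checkpoint` helper and dry-run on empty placeholder files in a temporary directory, so the layout can be checked without TensorFlow or the real model.

```python
import os
import shutil
import tempfile

CKPT_SUFFIXES = [".data-00000-of-00001", ".index", ".meta"]

def stage_checkpoint(src_prefix, model_dir, step=0):
  """Copy <src_prefix><suffix> to <model_dir>/model.ckpt-<step><suffix> and
  write the `checkpoint` state file (hypothetical helper)."""
  os.makedirs(model_dir, exist_ok=True)
  name = f"model.ckpt-{step}"
  for suffix in CKPT_SUFFIXES:
    shutil.copyfile(src_prefix + suffix, os.path.join(model_dir, name + suffix))
  with open(os.path.join(model_dir, "checkpoint"), "w") as f:
    # Relative path; TensorFlow's checkpoint lookup resolves it against model_dir.
    f.write(f'model_checkpoint_path: "{name}"\n')
  return sorted(os.listdir(model_dir))

# Dry run: empty placeholder files stand in for the real weights.
with tempfile.TemporaryDirectory() as tmp:
  src_prefix = os.path.join(tmp, "tapas_model", "model.ckpt")
  os.makedirs(os.path.dirname(src_prefix))
  for suffix in CKPT_SUFFIXES:
    open(src_prefix + suffix, "w").close()
  model_dir = os.path.join(tmp, "results", "wtq", "model")
  staged = stage_checkpoint(src_prefix, model_dir)
  with open(os.path.join(model_dir, "checkpoint")) as f:
    state = f.read()

print(staged)
print(state)
```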

In [None]:
max_seq_length = 512
vocab_file = "tapas_model/vocab.txt"
config = tf_example_utils.ClassifierConversionConfig(
    vocab_file=vocab_file,
    max_seq_length=max_seq_length,
    max_column_id=max_seq_length,
    max_row_id=max_seq_length,
    strip_column_names=False,
    add_aggregation_candidates=False,
)
converter = tf_example_utils.ToClassifierTensorflowExample(config)

def convert_interactions_to_examples(tables_and_queries):
  """Calls Tapas converter to convert interaction to example."""
  for idx, (table, queries) in enumerate(tables_and_queries):
    interaction = interaction_pb2.Interaction()
    for position, query in enumerate(queries):
      question = interaction.questions.add()
      question.original_text = query
      question.id = f"{idx}-0_{position}"
    for header in table[0]:
      interaction.table.columns.add().text = header
    for line in table[1:]:
      row = interaction.table.rows.add()
      for cell in line:
        row.cells.add().text = cell
    number_annotation_utils.add_numeric_values(interaction)
    for i in range(len(interaction.questions)):
      try:
        yield converter.convert(interaction, i)
      except ValueError as e:
        print(f"Can't convert question {interaction.questions[i].id}: {e}")
        
def write_tf_example(filename, examples):
  with tf.io.TFRecordWriter(filename) as writer:
    for example in examples:
      writer.write(example.SerializeToString())

def aggregation_to_string(index):
  if index == 0:
    return "NONE"
  if index == 1:
    return "SUM"
  if index == 2:
    return "AVERAGE"
  if index == 3:
    return "COUNT"
  raise ValueError(f"Unknown index: {index}")

def predict(table_data, queries):
  table = [[cell.strip() for cell in line.split("|")]
           for line in table_data.split("\n") if line.strip()]
  examples = convert_interactions_to_examples([(table, queries)])
  write_tf_example("results/wtq/tf_examples/test.tfrecord", examples)
  write_tf_example("results/wtq/tf_examples/random-split-1-dev.tfrecord", [])
  
  ! python -m tapas.run_task_main \
    --task="WTQ" \
    --output_dir="results" \
    --noloop_predict \
    --test_batch_size={len(queries)} \
    --tapas_verbosity="ERROR" \
    --compression_type= \
    --reset_position_index_per_cell \
    --init_checkpoint="tapas_model/model.ckpt" \
    --bert_config_file="tapas_model/bert_config.json" \
    --mode="predict" 2> error


  results_path = "results/wtq/model/test.tsv"
  all_coordinates = []
  df = pd.DataFrame(table[1:], columns=table[0])
  # display(IPython.display.HTML(df.to_html(index=False)))
  # print()
  with open(results_path) as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    for row in reader:
      coordinates = sorted(prediction_utils.parse_coordinates(row["answer_coordinates"]))
      all_coordinates.append(coordinates)
      answers = ', '.join(table[r + 1][c] for r, c in coordinates)
      position = int(row['position'])
      aggregation = aggregation_to_string(int(row["pred_aggr"]))
      print(">", queries[position])
      answer_text = str(answers)
      if aggregation != "NONE":
        answer_text = f"{aggregation} of {answer_text}"
      print(answer_text)
  return all_coordinates
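
`predict` turns each row of `test.tsv` into text in three steps: parse `answer_coordinates`, look the cells up in the table (offset by one row to skip the header), and prefix the aggregation operator unless it is `NONE`. A self-contained sketch of those steps follows. The helper names are hypothetical, and the string format assumed for `answer_coordinates` (a stringified list of `"(row, column)"` strings) is an assumption, since the notebook delegates parsing to `prediction_utils.parse_coordinates`.

```python
import ast

# Index order copied from aggregation_to_string above.
AGGREGATION_OPS = ("NONE", "SUM", "AVERAGE", "COUNT")

def parse_answer_coordinates(raw):
  """Parse e.g. "['(1, 1)', '(0, 1)']" into sorted (row, column) tuples.
  The input format is an assumption about the answer_coordinates column."""
  return sorted(ast.literal_eval(item) for item in ast.literal_eval(raw))

def format_answer(table, coordinates, aggr_index):
  if not 0 <= aggr_index < len(AGGREGATION_OPS):
    raise ValueError(f"Unknown index: {aggr_index}")
  # Coordinates count data rows only, so skip the header at table[0].
  text = ", ".join(table[r + 1][c] for r, c in coordinates)
  aggregation = AGGREGATION_OPS[aggr_index]
  return text if aggregation == "NONE" else f"{aggregation} of {text}"

table = [["Year", "Sales"], ["2019", "10"], ["2020", "12"]]
coords = parse_answer_coordinates("['(1, 1)', '(0, 1)']")
print(coords)                           # [(0, 1), (1, 1)]
print(format_answer(table, coords, 0))  # 10, 12
print(format_answer(table, coords, 1))  # SUM of 10, 12
```

The explicit range check matters: a bare tuple lookup would silently map a negative index such as `-1` to `COUNT`, whereas `aggregation_to_string` raises.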

# 5. Predict

**Poe**: Make an initial (empty) output file first.

In [None]:
import pickle
with open("/content/tapas_on_visqa_outputs.pkl", "wb") as f:
  pickle.dump([],f)

**Poe**: Next, load the VisQA inputs; the collection loop below then runs the predictions and collects the raw results.

In [None]:
import pickle
with open("/content/tapas_on_visqa_inputs.pkl", "rb") as f:
  dt = pickle.load(f)
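
Judging from the collection loop, each input entry is indexed as `dt[i][0]` (benchmark id), `dt[i][1]` (question), and `dt[i][2]` (table text with `|`-separated cells, one row per line). Since `predict` assumes every row has as many cells as the header (it builds a `pd.DataFrame` with `columns=table[0]`), a pre-check can flag ragged tables before a multi-hour run. A sketch with hypothetical helpers `unpack_record` and `ragged_rows`, run on a made-up record:

```python
def unpack_record(entry):
  """Return (benchmark_id, query, table_data) from the first three fields,
  following how the collection loop indexes dt[i] (hypothetical helper)."""
  return entry[0], entry[1], entry[2]

def ragged_rows(table_data):
  """Return indices (header = 0) of rows whose cell count differs from the header's."""
  rows = [line.split("|") for line in table_data.split("\n") if line.strip()]
  if not rows:
    return []
  width = len(rows[0])
  return [i for i, row in enumerate(rows) if len(row) != width]

# Made-up record for illustration: the last row is missing a cell.
record = ("example-1", "Which year had the highest sales?",
          "Year | Sales\n2019 | 10\n2020")
benchmark_id, query, table_data = unpack_record(record)
print(benchmark_id, ragged_rows(table_data))
```

On the real inputs, `[i for i, entry in enumerate(dt) if ragged_rows(entry[2])]` would list the entries worth inspecting first.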

**Poe**: Start collection. This may run for several hours.

In [None]:
for i in range(len(dt)):
  print("# processing benchmark id={}/{}, query={}".format(i, dt[i][0], dt[i][1]))
  predict(dt[i][2], [dt[i][1]])

In [None]:
print("# done")