<a href="https://colab.research.google.com/github/darshanja/tapas/blob/master/sqa_predictions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Clone and install the repository


In [None]:
! git clone https://github.com/google-research/tapas.git

Cloning into 'tapas'...
remote: Enumerating objects: 172, done.
remote: Counting objects: 100% (172/172), done.
remote: Compressing objects: 100% (122/122), done.
remote: Total 172 (delta 72), reused 141 (delta 43), pack-reused 0
Receiving objects: 100% (172/172), 180.64 KiB | 518.00 KiB/s, done.
Resolving deltas: 100% (72/72), done.


In [None]:
! pip install ./tapas

Processing ./tapas
Collecting apache-beam[gcp]==2.20.0
  Downloading https://files.pythonhosted.org/packages/4b/0d/0979ad626578a52887f7df60492ac6759089a9da261ac4c88b112b3f6a5a/apache_beam-2.20.0-cp36-cp36m-manylinux1_x86_64.whl (3.5MB)
Collecting frozendict==1.2
  Downloading https://files.pythonhosted.org/packages/4e/55/a12ded2c426a4d2bee73f88304c9c08ebbdbadb82569ebdd6a0c007cfd08/frozendict-1.2.tar.gz
Collecting tf-models-official~=2.2.0
  Downloading https://files.pythonhosted.org/packages/77/74/14ec628a5be6ef83b95aaddbbc2d19277fb8d3d497188b615ac1377d1bfc/tf_models_official-2.2.1-py2.py3-none-any.whl (711kB)
Collecting tf_slim~=1.1.0
  Downloading https://files.pythonhosted.org/packages/02/97/b0f4a64df018ca018cc035d44f2ef08f91e2e8aa67271f6f19633a015ff7/tf_slim-1.1.0-py2.py3-none-any.whl (352kB)

# Fetch models from Google Storage

Next, we fetch a pretrained checkpoint from Google Storage. For the sake of speed, this is a base-sized model trained on [SQA](https://www.microsoft.com/en-us/download/details.aspx?id=54253). Note that the best results in the paper were obtained with a large model, which has 24 layers instead of 12.

In [None]:
! gsutil cp gs://tapas_models/2020_04_21/tapas_sqa_base.zip . && unzip tapas_sqa_base.zip

Copying gs://tapas_models/2020_04_21/tapas_sqa_base.zip...
/ [1 files][  1.0 GiB/  1.0 GiB]   19.9 MiB/s                                   
Operation completed over 1 objects/1.0 GiB.                                      
Archive:  tapas_sqa_base.zip
   creating: tapas_sqa_base/
  inflating: tapas_sqa_base/model.ckpt.data-00000-of-00001  
  inflating: tapas_sqa_base/model.ckpt.index  
  inflating: tapas_sqa_base/README.txt  
  inflating: tapas_sqa_base/vocab.txt  
  inflating: tapas_sqa_base/bert_config.json  
  inflating: tapas_sqa_base/model.ckpt.meta  
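
To confirm which model size was downloaded, the layer count can be read from the unpacked `bert_config.json`. The sketch below is a minimal check, not part of the TAPAS code base: the helper names `load_config` and `layers_in_config` are hypothetical, and it assumes the file uses the `num_hidden_layers` key that BERT-style configs usually carry.

```python
import json


def load_config(path):
    """Load a BERT-style JSON config file into a dict."""
    with open(path) as f:
        return json.load(f)


def layers_in_config(config):
    """Return the layer count, or None if the assumed key is absent."""
    # Assumption: the config stores the depth under `num_hidden_layers`.
    return config.get("num_hidden_layers")


# Usage in the notebook, once the archive is unpacked:
# print(layers_in_config(load_config("tapas_sqa_base/bert_config.json")))
```
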


# Imports

In [None]:
import tensorflow.compat.v1 as tf
import os 
import shutil
import csv
import pandas as pd
import IPython

tf.get_logger().setLevel('ERROR')

In [None]:
from tapas.utils import tf_example_utils
from tapas.protos import interaction_pb2
from tapas.utils import number_annotation_utils
from tapas.scripts import prediction_utils

# Load checkpoint for prediction

Here's the prediction code. It creates an `interaction_pb2.Interaction` protobuf object, which is the data structure we use to store examples, and then calls the prediction script.
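
Before the table reaches the protobuf, `predict` splits the pipe-delimited text into a header row and data rows. The standalone sketch below mirrors that parsing step on a three-row sample; the helper name `parse_table` is hypothetical and only used for illustration.

```python
def parse_table(table_data):
    """Split pipe-delimited text into a list of rows of stripped cell strings."""
    return [
        [cell.strip() for cell in line.split("|")]
        for line in table_data.split("\n")
        if line.strip()  # skip blank lines, e.g. around the triple quotes
    ]


table = parse_table("""
code   | customer | city
C00001 | Micheal  | New York
C00013 | Holmes   | London
""")
header, rows = table[0], table[1:]
print(header)   # ['code', 'customer', 'city']
print(rows[0])  # ['C00001', 'Micheal', 'New York']
```

The header cells become the table's columns and each remaining row becomes one row of cells in the `Interaction`.
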

In [None]:
os.makedirs('results/sqa/tf_examples', exist_ok=True)
os.makedirs('results/sqa/model', exist_ok=True)
with open('results/sqa/model/checkpoint', 'w') as f:
  f.write('model_checkpoint_path: "model.ckpt-0"')
for suffix in ['.data-00000-of-00001', '.index', '.meta']:
  shutil.copyfile(f'tapas_sqa_base/model.ckpt{suffix}', f'results/sqa/model/model.ckpt-0{suffix}')

In [None]:
max_seq_length = 512
vocab_file = "tapas_sqa_base/vocab.txt"
config = tf_example_utils.ClassifierConversionConfig(
    vocab_file=vocab_file,
    max_seq_length=max_seq_length,
    max_column_id=max_seq_length,
    max_row_id=max_seq_length,
    strip_column_names=False,
    add_aggregation_candidates=False,
)
converter = tf_example_utils.ToClassifierTensorflowExample(config)

def convert_interactions_to_examples(tables_and_queries):
  """Calls Tapas converter to convert interaction to example."""
  for idx, (table, queries) in enumerate(tables_and_queries):
    interaction = interaction_pb2.Interaction()
    for position, query in enumerate(queries):
      question = interaction.questions.add()
      question.original_text = query
      question.id = f"{idx}-0_{position}"
    for header in table[0]:
      interaction.table.columns.add().text = header
    for line in table[1:]:
      row = interaction.table.rows.add()
      for cell in line:
        row.cells.add().text = cell
    number_annotation_utils.add_numeric_values(interaction)
    for i in range(len(interaction.questions)):
      try:
        yield converter.convert(interaction, i)
      except ValueError as e:
        print(f"Can't convert interaction: {interaction.id} error: {e}")
        
def write_tf_example(filename, examples):
  """Writes serialized examples to a TFRecord file."""
  with tf.io.TFRecordWriter(filename) as writer:
    for example in examples:
      writer.write(example.SerializeToString())

def predict(table_data, queries):
  """Runs the SQA model on a pipe-delimited table and a list of queries."""
  # Split the text into rows of stripped cells; the first row is the header.
  table = [[cell.strip() for cell in row.split("|")]
           for row in table_data.split("\n") if row.strip()]
  examples = convert_interactions_to_examples([(table, queries)])
  write_tf_example("results/sqa/tf_examples/test.tfrecord", examples)
  write_tf_example("results/sqa/tf_examples/random-split-1-dev.tfrecord", [])
  
  ! python tapas/tapas/run_task_main.py \
    --task="SQA" \
    --output_dir="results" \
    --noloop_predict \
    --test_batch_size={len(queries)} \
    --tapas_verbosity="ERROR" \
    --compression_type= \
    --init_checkpoint="tapas_sqa_base/model.ckpt" \
    --bert_config_file="tapas_sqa_base/bert_config.json" \
    --mode="predict" 2> error


  results_path = "results/sqa/model/test_sequence.tsv"
  all_coordinates = []
  df = pd.DataFrame(table[1:], columns=table[0])
  display(IPython.display.HTML(df.to_html(index=False)))
  print()
  with open(results_path) as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    for row in reader:
      coordinates = prediction_utils.parse_coordinates(row["answer_coordinates"])
      all_coordinates.append(coordinates)
      # Coordinates index data rows, so `r + 1` skips the header row of `table`.
      answers = ', '.join([table[r + 1][c] for r, c in coordinates])
      position = int(row['position'])
      print(">", queries[position])
      print(answers)
  return all_coordinates
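
The answers printed by `predict` do not follow the order of the table rows. One likely reason, stated here as an assumption rather than a confirmed detail of `prediction_utils`, is that the parsed coordinates come back as an unordered collection. Sorting them by `(row, column)` before the lookup gives answers in table order. The helper `answers_in_table_order` below is a hypothetical sketch of that idea.

```python
def answers_in_table_order(table, coordinates):
    """Return the selected cell texts sorted by (row, column) position.

    `table` includes the header as its first row, so data row `r` lives at
    `table[r + 1]`, the same offset that `predict` uses.
    """
    return [table[r + 1][c] for r, c in sorted(coordinates)]


table = [
    ["customer", "grade"],
    ["Micheal", "2"],
    ["Holmes", "2"],
    ["Albert", "3"],
]
coordinates = {(2, 0), (0, 0), (1, 0)}  # an unordered set of (row, column) pairs
print(answers_in_table_order(table, coordinates))
# ['Micheal', 'Holmes', 'Albert']
```
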

# Predict

In [None]:
result = predict("""
code          | customer      | city          | working area | country  | grade
C00001        | Micheal       | New York      | New York     | USA      | 2     
C00013        | Holmes        | London        | London       | UK       | 2     
C00020        | Albert        | New York      | New York     | USA      | 3     
C00025        | Ravindran     | Bangalore     | Bangalore    | India    | 2     
C00024        | Cook          | London        | London       | UK       | 2     
C00015        | Stuart        | London        | London       | UK       | 1     
C00002        | Bolt          | New York      | New York     | USA      | 3     
C00018        | Fleming       | Brisban       | Brisban      | Australia| 2     
C00021        | Jacks         | Brisban       | Brisban      | Australia| 1     
C00019        | Yearannaidu   | Chennai       | Chennai      | India    | 1     
C00005        | Sasikant      | Mumbai        | Mumbai       | India    | 1     
C00007        | Ramanathan    | Chennai       | Chennai      | India    | 1     
C00022        | Avinash       | Mumbai        | Mumbai       | India    | 2     
C00004        | Winston       | Brisban       | Brisban      | Australia| 1     
C00023        | Karl          | London        | London       | UK       | 0     
C00006        | Shilton       | Torento       | Torento      | Canada   | 1     
C00010        | Charles       | Hampshair     | Hampshair    | UK       | 3     
C00017        | Srinivas      | Bangalore     | Bangalore    | India    | 2     
C00012        | Steven        | San Jose      | San Jose     | USA      | 1     
C00008        | Karolina      | Torento       | Torento      | Canada   | 1     
C00003        | Martin        | Torento       | Torento      | Canada   | 2     
C00009        | Ramesh        | Mumbai        | Mumbai       | India    | 3     
C00014        | Rangarappa    | Bangalore     | Bangalore    | India    | 2     
C00016        | Venkatpati    | Bangalore     | Bangalore    | India    | 2
C00011        | Sundariya     | Chennai       | Chennai      | India    | 3
""", ["what were the customer names?",
      "of these, who scored less that 2 grade?",
      "whose opening amount is highest?"])

is_built_with_cuda: True
is_gpu_available: True
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Training or predicting ...
Evaluation finished after training step 0.


code,customer,city,working area,country,grade
C00001,Micheal,New York,New York,USA,2
C00013,Holmes,London,London,UK,2
C00020,Albert,New York,New York,USA,3
C00025,Ravindran,Bangalore,Bangalore,India,2
C00024,Cook,London,London,UK,2
C00015,Stuart,London,London,UK,1
C00002,Bolt,New York,New York,USA,3
C00018,Fleming,Brisban,Brisban,Australia,2
C00021,Jacks,Brisban,Brisban,Australia,1
C00019,Yearannaidu,Chennai,Chennai,India,1



> what were the customer names?
Avinash, Yearannaidu, Albert, Shilton, Stuart, Steven, Jacks, Ramesh, Karl, Ramanathan, Sundariya, Cook, Holmes, Fleming, Martin, Srinivas, Sasikant, Venkatpati, Micheal, Winston, Bolt, Ravindran, Charles, Rangarappa, Karolina
> of these, who scored less that 2 grade?
Winston, Yearannaidu, Jacks, Shilton, Karl, Ramanathan, Sasikant, Stuart, Karolina, Steven
> whose opening amount is highest?
Steven


In [None]:
result = predict("""
code          | customer      | city          | working area | country  | grade
C00001        | Micheal       | New York      | New York     | USA      | 2     
C00013        | Holmes        | London        | London       | UK       | 2     
C00020        | Albert        | New York      | New York     | USA      | 3     
C00025        | Ravindran     | Bangalore     | Bangalore    | India    | 2     
C00024        | Cook          | London        | London       | UK       | 2     
C00015        | Stuart        | London        | London       | UK       | 1     
C00002        | Bolt          | New York      | New York     | USA      | 3     
C00018        | Fleming       | Brisban       | Brisban      | Australia| 2     
C00021        | Jacks         | Brisban       | Brisban      | Australia| 1     
C00019        | Yearannaidu   | Chennai       | Chennai      | India    | 1     
C00005        | Sasikant      | Mumbai        | Mumbai       | India    | 1     
C00007        | Ramanathan    | Chennai       | Chennai      | India    | 1     
C00022        | Avinash       | Mumbai        | Mumbai       | India    | 2     
C00004        | Winston       | Brisban       | Brisban      | Australia| 1     
C00023        | Karl          | London        | London       | UK       | 0     
C00006        | Shilton       | Torento       | Torento      | Canada   | 1     
C00010        | Charles       | Hampshair     | Hampshair    | UK       | 3     
C00017        | Srinivas      | Bangalore     | Bangalore    | India    | 2     
C00012        | Steven        | San Jose      | San Jose     | USA      | 1     
C00008        | Karolina      | Torento       | Torento      | Canada   | 1     
C00003        | Martin        | Torento       | Torento      | Canada   | 2     
C00009        | Ramesh        | Mumbai        | Mumbai       | India    | 3     
C00014        | Rangarappa    | Bangalore     | Bangalore    | India    | 2     
C00016        | Venkatpati    | Bangalore     | Bangalore    | India    | 2
C00011        | Sundariya     | Chennai       | Chennai      | India    | 3
""", ["venkatapati belongs to which city?"])

is_built_with_cuda: True
is_gpu_available: True
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Training or predicting ...
Evaluation finished after training step 0.


code,customer,city,working area,country,grade
C00001,Micheal,New York,New York,USA,2
C00013,Holmes,London,London,UK,2
C00020,Albert,New York,New York,USA,3
C00025,Ravindran,Bangalore,Bangalore,India,2
C00024,Cook,London,London,UK,2
C00015,Stuart,London,London,UK,1
C00002,Bolt,New York,New York,USA,3
C00018,Fleming,Brisban,Brisban,Australia,2
C00021,Jacks,Brisban,Brisban,Australia,1
C00019,Yearannaidu,Chennai,Chennai,India,1



> venkatapati belongs to which city?
New York


In [None]:
result = predict("""
code          | customer      | city          | working area | country  | grade
C00001        | Micheal       | New York      | New York     | USA      | 2     
C00013        | Holmes        | London        | London       | UK       | 2     
C00020        | Albert        | New York      | New York     | USA      | 3     
C00025        | Ravindran     | Bangalore     | Bangalore    | India    | 2     
C00024        | Cook          | London        | London       | UK       | 2     
C00015        | Stuart        | London        | London       | UK       | 1     
C00002        | Bolt          | New York      | New York     | USA      | 3     
C00018        | Fleming       | Brisban       | Brisban      | Australia| 2     
C00021        | Jacks         | Brisban       | Brisban      | Australia| 1     
C00019        | Yearannaidu   | Chennai       | Chennai      | India    | 1     
C00005        | Sasikant      | Mumbai        | Mumbai       | India    | 1     
C00007        | Ramanathan    | Chennai       | Chennai      | India    | 1     
C00022        | Avinash       | Mumbai        | Mumbai       | India    | 2     
C00004        | Winston       | Brisban       | Brisban      | Australia| 1     
C00023        | Karl          | London        | London       | UK       | 0     
C00006        | Shilton       | Torento       | Torento      | Canada   | 1     
C00010        | Charles       | Hampshair     | Hampshair    | UK       | 3     
C00017        | Srinivas      | Bangalore     | Bangalore    | India    | 2     
C00012        | Steven        | San Jose      | San Jose     | USA      | 1     
C00008        | Karolina      | Torento       | Torento      | Canada   | 1     
C00003        | Martin        | Torento       | Torento      | Canada   | 2     
C00009        | Ramesh        | Mumbai        | Mumbai       | India    | 3     
C00014        | Rangarappa    | Bangalore     | Bangalore    | India    | 2     
C00016        | Venkatpati    | Bangalore     | Bangalore    | India    | 2
C00011        | Sundariya     | Chennai       | Chennai      | India    | 3
""", ["who are the customers belong to India"])

is_built_with_cuda: True
is_gpu_available: True
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Training or predicting ...
Evaluation finished after training step 0.


code,customer,city,working area,country,grade
C00001,Micheal,New York,New York,USA,2
C00013,Holmes,London,London,UK,2
C00020,Albert,New York,New York,USA,3
C00025,Ravindran,Bangalore,Bangalore,India,2
C00024,Cook,London,London,UK,2
C00015,Stuart,London,London,UK,1
C00002,Bolt,New York,New York,USA,3
C00018,Fleming,Brisban,Brisban,Australia,2
C00021,Jacks,Brisban,Brisban,Australia,1
C00019,Yearannaidu,Chennai,Chennai,India,1



> who are the customers belong to India
Avinash, Yearannaidu, Ramesh, Ravindran, Srinivas, Ramanathan, Sasikant, Venkatpati, Sundariya, Rangarappa


In [None]:
result = predict("""
code          | customer      | city          | working area | country  | grade
C00001        | Micheal       | New York      | New York     | USA      | 2     
C00013        | Holmes        | London        | London       | UK       | 2     
C00020        | Albert        | New York      | New York     | USA      | 3     
C00025        | Ravindran     | Bangalore     | Bangalore    | India    | 2     
C00024        | Cook          | London        | London       | UK       | 2     
C00015        | Stuart        | London        | London       | UK       | 1     
C00002        | Bolt          | New York      | New York     | USA      | 3     
C00018        | Fleming       | Brisban       | Brisban      | Australia| 2     
C00021        | Jacks         | Brisban       | Brisban      | Australia| 1     
C00019        | Yearannaidu   | Chennai       | Chennai      | India    | 1     
C00005        | Sasikant      | Mumbai        | Mumbai       | India    | 1     
C00007        | Ramanathan    | Chennai       | Chennai      | India    | 1     
C00022        | Avinash       | Mumbai        | Mumbai       | India    | 2     
C00004        | Winston       | Brisban       | Brisban      | Australia| 1     
C00023        | Karl          | London        | London       | UK       | 0     
C00006        | Shilton       | Torento       | Torento      | Canada   | 1     
C00010        | Charles       | Hampshair     | Hampshair    | UK       | 3     
C00017        | Srinivas      | Bangalore     | Bangalore    | India    | 2     
C00012        | Steven        | San Jose      | San Jose     | USA      | 1     
C00008        | Karolina      | Torento       | Torento      | Canada   | 1     
C00003        | Martin        | Torento       | Torento      | Canada   | 2     
C00009        | Ramesh        | Mumbai        | Mumbai       | India    | 3     
C00014        | Rangarappa    | Bangalore     | Bangalore    | India    | 2     
C00016        | Venkatpati    | Bangalore     | Bangalore    | India    | 2
C00011        | Sundariya     | Chennai       | Chennai      | India    | 3
""", ["Average of grade"])

is_built_with_cuda: True
is_gpu_available: True
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Training or predicting ...
Evaluation finished after training step 0.


code,customer,city,working area,country,grade
C00001,Micheal,New York,New York,USA,2
C00013,Holmes,London,London,UK,2
C00020,Albert,New York,New York,USA,3
C00025,Ravindran,Bangalore,Bangalore,India,2
C00024,Cook,London,London,UK,2
C00015,Stuart,London,London,UK,1
C00002,Bolt,New York,New York,USA,3
C00018,Fleming,Brisban,Brisban,Australia,2
C00021,Jacks,Brisban,Brisban,Australia,1
C00019,Yearannaidu,Chennai,Chennai,India,1



> Average of grade
2, 3, 1, 1, 1, 0, 1, 1, 3, 2, 3, 2, 2, 2, 2, 2, 3, 2, 1, 2, 3, 1, 2, 1, 1
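
The output above lists the selected `grade` cells rather than a single number, so the average still has to be computed from them. A minimal sketch of that post-processing step follows; the helper `mean_of_cells` is hypothetical and simply parses the comma-separated answer string printed above.

```python
def mean_of_cells(answer_text):
    """Average the numeric cells in a comma-separated answer string."""
    values = [float(v) for v in answer_text.split(",") if v.strip()]
    if not values:
        raise ValueError("no numeric cells to average")
    return sum(values) / len(values)


selected = "2, 3, 1, 1, 1, 0, 1, 1, 3, 2, 3, 2, 2, 2, 2, 2, 3, 2, 1, 2, 3, 1, 2, 1, 1"
print(mean_of_cells(selected))  # 1.76
```
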


In [None]:
result = predict("""
code          | customer      | city          | working area | country  | grade
C00001        | Micheal       | New York      | New York     | USA      | 2     
C00013        | Holmes        | London        | London       | UK       | 2     
C00020        | Albert        | New York      | New York     | USA      | 3     
C00025        | Ravindran     | Bangalore     | Bangalore    | India    | 2     
C00024        | Cook          | London        | London       | UK       | 2     
C00015        | Stuart        | London        | London       | UK       | 1     
C00002        | Bolt          | New York      | New York     | USA      | 3     
C00018        | Fleming       | Brisban       | Brisban      | Australia| 2     
C00021        | Jacks         | Brisban       | Brisban      | Australia| 1     
C00019        | Yearannaidu   | Chennai       | Chennai      | India    | 1     
C00005        | Sasikant      | Mumbai        | Mumbai       | India    | 1     
C00007        | Ramanathan    | Chennai       | Chennai      | India    | 1     
C00022        | Avinash       | Mumbai        | Mumbai       | India    | 2     
C00004        | Winston       | Brisban       | Brisban      | Australia| 1     
C00023        | Karl          | London        | London       | UK       | 0     
C00006        | Shilton       | Torento       | Torento      | Canada   | 1     
C00010        | Charles       | Hampshair     | Hampshair    | UK       | 3     
C00017        | Srinivas      | Bangalore     | Bangalore    | India    | 2     
C00012        | Steven        | San Jose      | San Jose     | USA      | 1     
C00008        | Karolina      | Torento       | Torento      | Canada   | 1     
C00003        | Martin        | Torento       | Torento      | Canada   | 2     
C00009        | Ramesh        | Mumbai        | Mumbai       | India    | 3     
C00014        | Rangarappa    | Bangalore     | Bangalore    | India    | 2     
C00016        | Venkatpati    | Bangalore     | Bangalore    | India    | 2
C00011        | Sundariya     | Chennai       | Chennai      | India    | 3
""", ["country having more number of customers"])

is_built_with_cuda: True
is_gpu_available: True
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Training or predicting ...
Evaluation finished after training step 0.


code,customer,city,working area,country,grade
C00001,Micheal,New York,New York,USA,2
C00013,Holmes,London,London,UK,2
C00020,Albert,New York,New York,USA,3
C00025,Ravindran,Bangalore,Bangalore,India,2
C00024,Cook,London,London,UK,2
C00015,Stuart,London,London,UK,1
C00002,Bolt,New York,New York,USA,3
C00018,Fleming,Brisban,Brisban,Australia,2
C00021,Jacks,Brisban,Brisban,Australia,1
C00019,Yearannaidu,Chennai,Chennai,India,1



> country having more number of customers
USA, USA
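
The answer above repeats a cell value instead of counting rows. For comparison, the same question can be answered directly from the table. The sketch below uses `collections.Counter` on the `country` column, copied by hand from the input table; it is a sanity check on the data, not part of the TAPAS pipeline.

```python
from collections import Counter

# The `country` column of the input table, in row order.
countries = [
    "USA", "UK", "USA", "India", "UK", "UK", "USA", "Australia", "Australia",
    "India", "India", "India", "India", "Australia", "UK", "Canada", "UK",
    "India", "USA", "Canada", "Canada", "India", "India", "India", "India",
]

counts = Counter(countries)
print(counts.most_common(1))  # [('India', 10)]
```
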
