
##### Copyright 2020 Priyal Narang




Licensed under the Apache License, Version 2.0 (the "License");

In [0]:
# Copyright 2020 Priyal Narang.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Running a Tapas fine-tuned checkpoint
---
This notebook shows how to load a TAPAS model and make predictions with it. TAPAS was introduced in the paper [TAPAS: Weakly Supervised Table Parsing via Pre-training](https://arxiv.org/abs/2004.02349).

# Clone and install the repository


First, let's fetch the code from the GitHub repository and install it.

In [2]:
! git clone https://github.com/google-research/tapas.git

Cloning into 'tapas'...
remote: Enumerating objects: 109, done.

In [3]:
! pip uninstall -y tensorflow
! pip install ./tapas

Uninstalling tensorflow-2.2.0:
  Successfully uninstalled tensorflow-2.2.0
Processing ./tapas
Collecting bert-tensorflow==1.0.1
  Downloading https://files.pythonhosted.org/packages/a6/66/7eb4e8b6ea35b7cc54c322c816f976167a43019750279a8473d355800a93/bert_tensorflow-1.0.1-py2.py3-none-any.whl (67kB)
Collecting frozendict==1.2
  Downloading https://files.pythonhosted.org/packages/4e/55/a12ded2c426a4d2bee73f88304c9c08ebbdbadb82569ebdd6a0c007cfd08/frozendict-1.2.tar.gz
Collecting tensorflow-gpu~=1.14.0
  Downloading https://files.pythonhosted.org/packages/76/04/43153bfdfcf6c9a4c38ecdb971ca9a75b9a791bb69a764d652c359aca504/tensorflow_gpu-1.14.0-cp36-cp36m-manylinux1_x86_64.whl (377.0MB)
Collecting tensorflow-probability==0.7.0
  Downloading https://files.pythonhosted.org/packages/3e/3a/c10b6c22320531c774402ac7186d1b673374e2a9d12502cbc8d811e4601c/ten

# Fetch models from Google Storage

Next, we fetch a checkpoint from Google Storage. For the sake of speed, this is a base-sized model trained on [SQA](https://www.microsoft.com/en-us/download/details.aspx?id=54253). Note that the best results in the paper were obtained with a large model, which has 24 layers instead of 12.

In [4]:
! gsutil cp gs://tapas_models/2020_04_21/tapas_sqa_base.zip . && unzip tapas_sqa_base.zip

Copying gs://tapas_models/2020_04_21/tapas_sqa_base.zip...
| [1 files][  1.0 GiB/  1.0 GiB]   27.1 MiB/s                                   
Operation completed over 1 objects/1.0 GiB.                                      
Archive:  tapas_sqa_base.zip
   creating: tapas_sqa_base/
  inflating: tapas_sqa_base/model.ckpt.data-00000-of-00001  
  inflating: tapas_sqa_base/model.ckpt.index  
  inflating: tapas_sqa_base/README.txt  
  inflating: tapas_sqa_base/vocab.txt  
  inflating: tapas_sqa_base/bert_config.json  
  inflating: tapas_sqa_base/model.ckpt.meta  
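To confirm that the downloaded checkpoint is the 12-layer base model described above, you can read its `bert_config.json`. The helper below is a small sketch that assumes the file uses the standard BERT config keys (`num_hidden_layers`, `hidden_size`, `num_attention_heads`); `describe_bert_config` is a hypothetical name, not part of the TAPAS code.

```python
import json


def describe_bert_config(path):
    """Return the model-size fields of a BERT-style config file.

    Assumes the standard BERT config keys; any key that is missing
    from the file is reported as None.
    """
    with open(path) as f:
        config = json.load(f)
    keys = ("num_hidden_layers", "hidden_size", "num_attention_heads")
    return {key: config.get(key) for key in keys}
```

In the notebook, `describe_bert_config("tapas_sqa_base/bert_config.json")` should report `num_hidden_layers` as 12 for this base checkpoint.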


# Imports

In [5]:
import tensorflow.compat.v1 as tf
import os
import shutil
import csv
import pandas as pd
import IPython
from IPython.display import display

tf.get_logger().setLevel('ERROR')



In [0]:
from tapas.utils import tf_example_utils
from tapas.protos import interaction_pb2
from tapas.utils import number_annotation_utils
from tapas.scripts import prediction_utils

# Load checkpoint for prediction

Here's the prediction code. It creates an `interaction_pb2.Interaction` protobuf object, which is the data structure we use to store examples, and then calls the prediction script.
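To make the shape of an interaction concrete without installing TAPAS, here is a plain-Python sketch of the fields the code below fills in: one question per query, the first table row as column headers, and the remaining rows as cells. The dataclasses are hypothetical stand-ins for `interaction_pb2.Interaction` (the real protobuf has more fields), and `parse_pipe_table` / `build_interaction` are illustrative names that mirror the parsing in `predict` and the loop in `convert_interactions_to_examples`.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins mirroring only the Interaction fields used here.
@dataclass
class Question:
    id: str
    original_text: str


@dataclass
class Table:
    columns: List[str] = field(default_factory=list)
    rows: List[List[str]] = field(default_factory=list)


@dataclass
class Interaction:
    questions: List[Question] = field(default_factory=list)
    table: Table = field(default_factory=Table)


def parse_pipe_table(table_data):
    """Split a '|'-delimited string into rows of stripped cells, skipping blank lines."""
    return [[cell.strip() for cell in line.split("|")]
            for line in table_data.split("\n") if line.strip()]


def build_interaction(idx, table, queries):
    """Header row becomes the columns; every later row becomes a row of cells."""
    interaction = Interaction()
    for position, query in enumerate(queries):
        # Same id scheme as the notebook: "<table index>-0_<question position>".
        interaction.questions.append(
            Question(id=f"{idx}-0_{position}", original_text=query))
    interaction.table.columns = list(table[0])
    interaction.table.rows = [list(row) for row in table[1:]]
    return interaction
```

For example, `build_interaction(0, parse_pipe_table("Name | Age\nAnn | 30"), ["who is 30?"])` yields one question with id `"0-0_0"`, columns `["Name", "Age"]`, and a single row `["Ann", "30"]`.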

In [0]:
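# Create the results/sqa directory layout used by the prediction run below.
os.makedirs('results/sqa/tf_examples', exist_ok=True)
os.makedirs('results/sqa/model', exist_ok=True)
# Write a TensorFlow "checkpoint" index file pointing at the copied weights.
with open('results/sqa/model/checkpoint', 'w') as f:
  f.write('model_checkpoint_path: "model.ckpt-0"')
# Copy the downloaded checkpoint files into the model directory as step 0.
for suffix in ['.data-00000-of-00001', '.index', '.meta']:
  shutil.copyfile(f'tapas_sqa_base/model.ckpt{suffix}', f'results/sqa/model/model.ckpt-0{suffix}')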

In [0]:
max_seq_length = 512
vocab_file = "tapas_sqa_base/vocab.txt"
config = tf_example_utils.ClassifierConversionConfig(
    vocab_file=vocab_file,
    max_seq_length=max_seq_length,
    max_column_id=max_seq_length,
    max_row_id=max_seq_length,
    strip_column_names=False,
    add_aggregation_candidates=False,
)
converter = tf_example_utils.ToClassifierTensorflowExample(config)

def convert_interactions_to_examples(tables_and_queries):
  """Calls Tapas converter to convert interaction to example."""
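  for idx, (table, queries) in enumerate(tables_and_queries):
    interaction = interaction_pb2.Interaction()
    # Set an id so that conversion errors below can be traced to this table.
    interaction.id = f"{idx}-0"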
    for position, query in enumerate(queries):
      question = interaction.questions.add()
      question.original_text = query
      question.id = f"{idx}-0_{position}"
    for header in table[0]:
      interaction.table.columns.add().text = header
    for line in table[1:]:
      row = interaction.table.rows.add()
      for cell in line:
        row.cells.add().text = cell
    number_annotation_utils.add_numeric_values(interaction)
    for i in range(len(interaction.questions)):
      try:
        yield converter.convert(interaction, i)
      except ValueError as e:
        print(f"Can't convert interaction: {interaction.id} error: {e}")
        
def write_tf_example(filename, examples):
  with tf.io.TFRecordWriter(filename) as writer:
    for example in examples:
      writer.write(example.SerializeToString())

def predict(table_data, queries):
  # Parse the '|'-delimited string into rows of stripped cells; row 0 is the header.
  table = [[cell.strip() for cell in row.split("|")]
           for row in table_data.split("\n") if row.strip()]
  examples = convert_interactions_to_examples([(table, queries)])
  write_tf_example("results/sqa/tf_examples/test.tfrecord", examples)
  write_tf_example("results/sqa/tf_examples/random-split-1-dev.tfrecord", [])
  
  ! python tapas/tapas/run_task_main.py \
    --task="SQA" \
    --output_dir="results" \
    --noloop_predict \
    --test_batch_size={len(queries)} \
    --tapas_verbosity="ERROR" \
    --compression_type= \
    --init_checkpoint="tapas_sqa_base/model.ckpt" \
    --bert_config_file="tapas_sqa_base/bert_config.json" \
    --mode="predict" 2> error


  results_path = "results/sqa/model/test_sequence.tsv"
  all_coordinates = []
  df = pd.DataFrame(table[1:], columns=table[0])
  display(IPython.display.HTML(df.to_html(index=False)))
  print()
  with open(results_path) as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    for row in reader:
      coordinates = prediction_utils.parse_coordinates(row["answer_coordinates"])
      all_coordinates.append(coordinates)
      # Coordinates index data rows only, so skip the header row with +1.
      answers = ', '.join([table[row_idx + 1][col_idx] for row_idx, col_idx in coordinates])
      position = int(row['position'])
      print(">", queries[position])
      print(answers)
  return all_coordinates
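The answer lookup inside `predict` can be isolated as a small function. This sketch assumes coordinates are `(row, column)` pairs over the data rows only, which is why the header row is skipped with a `+1` offset, exactly as in `predict`; `coordinates_to_answers` is a hypothetical name.

```python
def coordinates_to_answers(table, coordinates):
    """Map predicted (row, column) coordinates to cell text.

    `table` is the parsed table whose first row is the header; the row
    indices count data rows only, so the header is skipped with +1.
    """
    return [table[row_idx + 1][col_idx] for row_idx, col_idx in coordinates]
```

`', '.join(coordinates_to_answers(table, coordinates))` reproduces the answer string that `predict` prints for each question.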

# Predictions


In [11]:
# Performs well on simple SELECT queries that do not involve an aggregation operation
res1 = predict("""
ID  | CustomerName         | Address               | PostalCode       |Age   | Country
1   | Patrick              | Obere Str. 57         | 12209            | 23   | Germany  
2   | Alfred               | Via dei Fiorentini 66 | 05021            | 50   | Italy      
3   | Priyal               | Dwarka                | 110078           | 22   | India
4   | Siddharth            | Dwarka                | 110078           | 22   | India  
5   | Alex                 | California            | 90201            | 34   | Spain  

""", ["what were the customer names?",
      "how many countries are present?",
      "which customers have the same country?",
      "what is the average age of customers?",
      "which customer has the maximum age?"])

is_built_with_cuda: True
is_gpu_available: True
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Training or predicting ...
Evaluation finished after training step 0.


ID,CustomerName,Address,PostalCode,Age,Country
1,Patrick,Obere Str. 57,12209,23,Germany
2,Alfred,Via dei Fiorentini 66,5021,50,Italy
3,Priyal,Dwarka,110078,22,India
4,Siddharth,Dwarka,110078,22,India
5,Alex,California,90201,34,Spain



> what were the customer names?
Patrick, Siddharth, Priyal, Alex, Alfred
> how many countries are present?
Spain, Italy, Germany, India, India
> which customers have the same country?
Siddharth, Alex, Priyal
> what is the average age of customers?
34
> which customer has the maximum age?
Alex

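As the outputs above show, this SQA checkpoint only selects cells: "how many countries are present?" returns the list of country cells, and "what is the average age of customers?" returns a single age cell. If you need a number, one option is to aggregate the selected cells yourself. The helper below is a hypothetical post-processing sketch, not part of TAPAS; the caller has to choose the operator, because the model does not predict one here.

```python
def aggregate(cells, operator):
    """Apply a simple aggregation to the cell texts selected by the model.

    Hypothetical helper: COUNT counts the selected cells; SUM and AVERAGE
    parse them as floats. Raises ValueError for any other operator.
    """
    if operator == "COUNT":
        return len(cells)
    values = [float(cell) for cell in cells]
    if operator == "SUM":
        return sum(values)
    if operator == "AVERAGE":
        return sum(values) / len(values) if values else None
    raise ValueError(f"Unsupported operator: {operator}")
```

Note that `aggregate(["Spain", "Italy", "Germany", "India", "India"], "COUNT")` returns 5, not 4, because India appears twice; counting distinct countries would need `len(set(cells))` instead.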

In [10]:
# Test the model on a larger table
res2 = predict("""
ProductID  |	ProductName                                 |	SupplierID	  |  CategoryID  |	Unit                |	Price
1	         |  Chais	                                      |  1	          |  1	         |  10 boxes x 20 bags	|  18
2	         |  Chang	                                      |  1	          |  1           | 	24 - 12 oz bottles	|  19
3	         |  Aniseed Syrup 	                            |  1	          |  2	         |  12 - 550 ml bottles	|  10
4	         |  Chef Anton's Cajun Seasoning                |  2	          |  2	         |  48 - 6 oz jars	    |  22
5	         |  Chef Anton's Gumbo Mix	                    |  2	          |  2	         |  36 boxes	          |  21.35
6	         |  Grandma's Boysenberry Spread                |	 3	          |  2	         |  12 - 8 oz jars	    |  25
7	         |  Uncle Bob's Organic Dried Pears	            |  3	          |  7	         |  12 - 1 lb pkgs.	    |  30
8	         |  Northwoods Cranberry Sauce	                |  3	          |  2	         |  12 - 12 oz jars	    |  40
9	         |  Mishi Kobe Niku	                            |  4	          |  6	         |  18 - 500 g pkgs.	  |  97
10	       |  Ikura	                                      |  4	          |  8	         |  12 - 200 ml jars	  |  31
11	       |  Queso Cabrales	                            |  5	          |  4	         |  1 kg pkg.	          |  21
12	       |  Queso Manchego La Pastora                   |	 5	          |  4	         |  10 - 500 g pkgs.	  |  38
13	       |  Konbu	                                      |  6	          |  8	         |  2 kg box	          |  6
14	       |  Tofu	                                      |  6	          |  7	         |  40 - 100 g pkgs.	  |  23.25
15	       |  Genen Shouyu	                              |  6	          |  2	         |  24 - 250 ml bottles	|  15.5
16	       |  Pavlova	                                    |  7	          |  3	         |  32 - 500 g boxes	  |  17.45
17	       |  Alice Mutton	                              |  7	          |  6	         |  20 - 1 kg tins	    |  39
18	       |  Carnarvon Tigers                            |  7	          |  8	         |  16 kg pkg.	        |  62.5
19	       |  Teatime Chocolate Biscuits	                |  8	          |  3	         |  10 boxes x 12 pieces|  9.2
20	       |  Sir Rodney's Marmalade	                    |  8	          |  3	         |  30 gift boxes	      |  81

""", ["what are the product names?",
      "how many ProductIDs are present?",
      "how many units are present for the product Chais?",
      "what is the highest price?",
      "what is the lowest price of all products?"])

is_built_with_cuda: True
is_gpu_available: True
GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Training or predicting ...
Evaluation finished after training step 0.


ProductID,ProductName,SupplierID,CategoryID,Unit,Price
1,Chais,1,1,10 boxes x 20 bags,18.0
2,Chang,1,1,24 - 12 oz bottles,19.0
3,Aniseed Syrup,1,2,12 - 550 ml bottles,10.0
4,Chef Anton's Cajun Seasoning,2,2,48 - 6 oz jars,22.0
5,Chef Anton's Gumbo Mix,2,2,36 boxes,21.35
6,Grandma's Boysenberry Spread,3,2,12 - 8 oz jars,25.0
7,Uncle Bob's Organic Dried Pears,3,7,12 - 1 lb pkgs.,30.0
8,Northwoods Cranberry Sauce,3,2,12 - 12 oz jars,40.0
9,Mishi Kobe Niku,4,6,18 - 500 g pkgs.,97.0
10,Ikura,4,8,12 - 200 ml jars,31.0



> what are the product names?
Chais, Tofu, Konbu, Ikura, Northwoods Cranberry Sauce, Mishi Kobe Niku, Uncle Bob's Organic Dried Pears, Chef Anton's Cajun Seasoning, Carnarvon Tigers, Aniseed Syrup, Pavlova, Alice Mutton, Genen Shouyu, Queso Manchego La Pastora, Queso Cabrales, Grandma's Boysenberry Spread, Sir Rodney's Marmalade, Chef Anton's Gumbo Mix, Chang, Teatime Chocolate Biscuits
> how many ProductIDs are present?
19, 10, 1, 8, 13, 4, 18, 9, 16, 7, 12, 3, 17, 15, 6, 20, 11, 2, 14, 5
> how many units are present for the product Chais?
10 boxes x 20 bags
> what is the highest price?
18
> what is the lowest price of all products?
6
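A quick way to sanity-check the price answers is to compute the true extremes of the `Price` column directly. The values below are copied from the product table above; `numeric_extremes` is a hypothetical helper. The column maximum is 97, so the model's answer of 18 for "what is the highest price?" is wrong, while its answer of 6 for the lowest price matches.

```python
# Price column copied from the product table above.
prices = ["18", "19", "10", "22", "21.35", "25", "30", "40", "97", "31",
          "21", "38", "6", "23.25", "15.5", "17.45", "39", "62.5", "9.2", "81"]


def numeric_extremes(cells):
    """Return (min, max) of a column of numeric cell strings."""
    values = [float(cell) for cell in cells]
    return min(values), max(values)


lowest, highest = numeric_extremes(prices)
print(f"lowest={lowest}, highest={highest}")  # prints "lowest=6.0, highest=97.0"
```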
