## Exporting a BERT/SQuAD model to CoreML
To use a Fireball model in an iOS application, we can use the [exportToCoreMl](https://interdigitalinc.github.io/Fireball/html/source/model.html#fireball.model.Model.exportToCoreMl) method. This notebook shows how to use this method to create a CoreML model that is ready to be deployed in an iOS app. It assumes that a trained BERT/SQuAD model already exists in the `Models` directory. Please refer to the notebook [Question Answering (BERT/SQuAD)](BertSquad.ipynb) for more information about training and using a BERT/SQuAD model.

Fireball can also export models with a reduced number of parameters, pruned models, and quantized models. Please refer to the following notebooks for more information:

- [Reducing number of parameters of BERT/SQuAD Model](BertSquad-Reduce.ipynb)
- [Pruning BERT/SQuAD Model](BertSquad-Prune.ipynb)
- [Quantizing BERT/SQuAD Model](BertSquad-Quantize.ipynb)

Note: Fireball uses the [coremltools](https://github.com/apple/coremltools) Python package to export CoreML models.

## Load a pretrained model

In [1]:
from fireball import Model

# orgFileName = "Models/BertSquad.fbm"        # Original model
# orgFileName = "Models/BertSquadQR.fbm"      # Quantized - Retrained
# orgFileName = "Models/BertSquadPR.fbm"      # Pruned - Retrained
# orgFileName = "Models/BertSquadPRQR.fbm"    # Pruned - Retrained - Quantized - Retrained
# orgFileName = "Models/BertSquadRR.fbm"      # Reduced - Retrained
# orgFileName = "Models/BertSquadRRQR.fbm"    # Reduced - Retrained - Quantized - Retrained
# orgFileName = "Models/BertSquadRRPR.fbm"    # Reduced - Retrained - Pruned - Retrained
orgFileName = "Models/BertSquadRRPRQR.fbm"  # Reduced - Retrained - Pruned - Retrained - Quantized - Retrained


model = Model.makeFromFile(orgFileName, gpus='0')
model.printLayersInfo()
model.initSession()


Reading from "Models/BertSquadRRPRQR.fbm" ... Done.
Creating the fireball model "Bert-SQuAD" ... Done.

Scope            InShape       Comments                 OutShape      Activ.   Post Act.        # of Params
---------------  ------------  -----------------------  ------------  -------  ---------------  -----------
IN_EMB           ≤512 2        LR512                    ≤512 768      None                      4,743,735  
S1_L1_LN         ≤512 768                               ≤512 768      None     DO:0.1           1,536      
S2_L1_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      2,839,069  
S2_L2_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      2,898,847  
S2_L3_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      2,909,099  
S2_L4_BERT       ≤512 768      768/3072, 12 heads       ≤512 768      GELU                      3,036,692  
S2_L5_BERT       ≤512 768      

## Exporting to CoreML
Our Fireball BERT/SQuAD model can handle combined (question plus context) sequence lengths of up to 512 tokens. Here we choose 384 as the maximum sequence length for the exported model.

In [2]:
cmlFileName = orgFileName.replace('.fbm', '.mlmodel')

seqLen = 384
doc = "This is the question answering model based on BERTbase and fine-tuned on SQuAD dataset. "\
      "The inputs are two lists of token IDs and token types based on word-piece vocabulary embedding "\
      "scheme. The token IDs list must start with a [CLS] and end with an [SEP] code. The question tokens and "\
      "context tokens must also be separated by another [SEP] code. The token types input must have 0's for "\
      "question tokens and 1's for context tokens. Both lists must be 0-padded to the length %d."%(seqLen)
model.exportToCoreMl(cmlFileName, modelDesc=doc, maxSeqLen=seqLen)




Exporting to CoreML model "Models/BertSquadRRPRQR.mlmodel" ... 
    Exported all 16 layers.                               
    Saving to "Models/BertSquadRRPRQR.mlmodel" ... Done.
Done (38.54 Sec.)
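
The input layout described in the model description above can be sketched in a few lines of plain Python. This is only an illustration: `build_inputs` and its `seq_len` parameter are hypothetical names, not part of Fireball or coremltools, and the sketch works on word-piece token strings rather than token IDs. It also assumes that `[PAD]` maps to ID 0 in the vocabulary, which is what makes the "0-padded" wording hold for the token IDs.

```python
def build_inputs(question_tokens, context_tokens, seq_len=384):
    """Lay out word-piece tokens as [CLS] question [SEP] context [SEP] + padding.

    Hypothetical helper for illustration only. Returns (tokens, token_types),
    both of length seq_len.
    """
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    if len(tokens) > seq_len:
        raise ValueError("%d tokens do not fit in seq_len=%d" % (len(tokens), seq_len))
    num_pad = seq_len - len(tokens)

    # Token types: 0 for [CLS], the question, and the first [SEP];
    # 1 for the context and the final [SEP]; 0 for the padding.
    token_types = ([0] * (len(question_tokens) + 2)
                   + [1] * (len(context_tokens) + 1)
                   + [0] * num_pad)
    return tokens + ["[PAD]"] * num_pad, token_types


tokens, types = build_inputs(["who", "?"], ["it", "was", "me", "."], seq_len=12)
print(tokens)
print(types)   # [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
```

Raising an error for over-long inputs is a design choice of this sketch; a real application would more likely truncate or window the context.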


## Using netron to visualize the exported model
We can now visualize the model's network structure using the [netron](https://github.com/lutzroeder/netron) package.

In [3]:
import netron
import platform

if platform.system() == 'Darwin':      # Running on MAC
    netron.start(cmlFileName)   
else:
    import socket
    hostIp = socket.gethostbyname(socket.gethostname())
    netron.start(cmlFileName, address=(hostIp,8084))

Serving 'Models/BertSquadRRPRQR.mlmodel' at http://10.21.16.50:8084


## Running inference on the exported model
To verify the exported model, we can now run inference on it. Currently, the CoreML runtime is available only on Mac. You can run all the previous cells on a GPU-based machine, copy the exported CoreML file to a Mac, and then test the model using the code in the following cell.

Here we have a "context", a paragraph about InterDigital copied from Wikipedia, and 3 different questions related to that context. We use our exported CoreML model to answer the questions.

**Note:** We could use the tokenizer included in Fireball. However, to show that the following code is independent of Fireball, we use Google's original BERT tokenizer (the `tokenization` module) instead.
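
The cell below extracts the answer by taking the argmax of `StartLogits` and `EndLogits` independently. That works for the examples here, but in general it can produce an end position before the start, or a span that falls inside the question. A slightly more defensive selection could look like the following sketch. The function `best_span` and its parameters are hypothetical and not part of Fireball or coremltools, and the sketch assumes each logits output can be flattened to one value per token position.

```python
import numpy as np


def best_span(start_logits, end_logits, context_start, context_end, max_answer_len=30):
    """Pick (start, end) maximizing start_logit + end_logit, subject to
    context_start <= start <= end < context_end and a maximum answer length.

    Hypothetical helper for illustration only.
    """
    start_logits = np.asarray(start_logits, dtype=np.float64).ravel()
    end_logits = np.asarray(end_logits, dtype=np.float64).ravel()

    best, best_score = None, -np.inf
    for s in range(context_start, context_end):
        last = min(context_end, s + max_answer_len)    # exclusive upper bound for the end
        e = s + int(np.argmax(end_logits[s:last]))     # best end position for this start
        score = start_logits[s] + end_logits[e]
        if score > best_score:
            best, best_score = (s, e), score
    return best


# Toy logits: positions 0-2 are "[CLS] question [SEP]", 3-5 are context, 6 is the final [SEP].
start = [0, 5, 0, 1, 4, 0, 0]
end = [0, 0, 0, 6, 0, 5, 0]
print(int(np.argmax(start)), int(np.argmax(end)))   # 1 3  (start falls inside the question)
print(best_span(start, end, 3, 6))                  # (4, 5)
```

With the variable names used in the cell below, `context_start` would be `len(questionTokens) + 2` and `context_end` would be `context_start + len(contextTokens)`.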


In [2]:
# assert platform.system() == 'Darwin', "This is only supported when running on Mac!"
context = r"""
InterDigital is a technology research and development company that provides wireless and video technologies for 
mobile devices, networks, and services worldwide. Founded in 1972, InterDigital is listed on NASDAQ and is 
included in the S&P SmallCap 600. InterDigital had 2020 revenue of $359 million and a portfolio of about 
32,000 U.S. and foreign issued patents and patent applications.
"""

print(context)
questions = [
    "When was InterDigital established?",
    "How much was InterDigital's revenue in 2020?",
    "What does InterDigital provide?",
]

import numpy as np

import tokenization
tokenizer = tokenization.FullTokenizer("/Users/shahab/data/SQuAD/vocab.txt")

import coremltools
cmlFileName = "Models/BertSquadRRPRQR.mlmodel"
seqLen = 384
coreMlModel = coremltools.models.MLModel(cmlFileName)

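# The context is tokenized once and reused for every question.
contextTokens = tokenizer.tokenize(context)
for i, question in enumerate(questions):
    questionTokens = tokenizer.tokenize(question)
    # Input layout: [CLS] question [SEP] context [SEP], padded with [PAD] up to seqLen
    allTokens = ["[CLS]"] + questionTokens + ["[SEP]"] + contextTokens + ["[SEP]"]
    numPad = seqLen - len(allTokens)
    assert numPad >= 0, "Question and context do not fit in %d tokens!"%(seqLen)
    allTokens += numPad*["[PAD]"]
    tokIds = tokenizer.convert_tokens_to_ids(allTokens)
    # Token types: 0 for [CLS], question, and first [SEP]; 1 for context and last [SEP]; 0 for padding
    tokTypes = [0]*(len(questionTokens)+2) + [1]*(len(contextTokens)+1) + numPad*[0]

    outputDic = coreMlModel.predict({ 'TokIds': np.int32(tokIds), 'TokTypes': np.int32(tokTypes) })
    # Convert the most likely start/end positions to indexes into contextTokens
    startTok = np.argmax(outputDic['StartLogits']) - len(questionTokens) - 2
    endTok = np.argmax(outputDic['EndLogits']) - len(questionTokens) - 2
    answer = ' '.join(contextTokens[int(startTok):int(endTok+1)])
    print("\nQ%d: %s\n    %s"%(i+1, question, answer))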



InterDigital is a technology research and development company that provides wireless and video technologies for 
mobile devices, networks, and services worldwide. Founded in 1972, InterDigital is listed on NASDAQ and is 
included in the S&P SmallCap 600. InterDigital had 2020 revenue of $359 million and a portfolio of about 
32,000 U.S. and foreign issued patents and patent applications.


Q1: When was InterDigital established?
    1972

Q2: How much was InterDigital's revenue in 2020?
    $ 35 ##9 million

Q3: What does InterDigital provide?
    wireless and video technologies
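
The answer to Q2 is printed as raw word pieces (`$ 35 ##9 million`) because the tokens are simply joined with spaces. A minimal sketch of merging the `##` continuation pieces back into whole words is shown below. `merge_word_pieces` is a hypothetical helper, not part of Fireball or the tokenizer module.

```python
def merge_word_pieces(tokens):
    """Join word-piece tokens, gluing '##' continuation pieces onto the previous token.

    Hypothetical helper for illustration only.
    """
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]     # continuation piece: append without the '##' marker
        else:
            words.append(tok)        # start of a new word (or a punctuation token)
    return " ".join(words)


print(merge_word_pieces(["$", "35", "##9", "million"]))   # $ 359 million
```

The space after `$` remains because the tokenizer splits punctuation into separate tokens. Recovering the exact original text would require mapping tokens back to character offsets in the context, which this sketch does not attempt.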


## Where do I go from here?

[Exporting BERT/SQuAD Model to ONNX](BertSquad-ONNX.ipynb)

[Exporting BERT/SQuAD Model to TensorFlow](BertSquad-TF.ipynb)

---

[Fireball Playgrounds](../Contents.ipynb)

[Question Answering (BERT/SQuAD)](BertSquad.ipynb)

[Reducing number of parameters of BERT/SQuAD Model](BertSquad-Reduce.ipynb)

[Pruning BERT/SQuAD Model](BertSquad-Prune.ipynb)

[Quantizing BERT/SQuAD Model](BertSquad-Quantize.ipynb)
