## Bert-large-cased-finetuned-conll03-english Deployment

### Model Description
The dbmdz/bert-large-cased-finetuned-conll03-english model is a variant of BERT (Bidirectional Encoder Representations from Transformers) fine-tuned on the CoNLL-2003 dataset for Named Entity Recognition (NER).

It is built on the large cased version of BERT, which means the model is case-sensitive and can distinguish between words with different capitalizations. However, the base BERT checkpoint was trained for masked-token prediction, so we cannot follow its setup tutorial at https://huggingface.co/google-bert/bert-large-cased, which uses the 'fill-mask' task. Instead, we need to use the 'token-classification' task.
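As a minimal sketch of that difference, the model can be loaded locally through the Hugging Face `transformers` `pipeline` helper by passing the 'token-classification' task. This assumes `transformers` and a backend such as PyTorch are installed and that the weights can be downloaded; the helper name `build_ner_pipeline` is hypothetical and is not part of the deployment.

```python
MODEL_ID = "dbmdz/bert-large-cased-finetuned-conll03-english"
TASK = "token-classification"  # not "fill-mask", which applies to the base BERT checkpoint


def build_ner_pipeline():
    """Load the fine-tuned model as a token-classification pipeline."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline(TASK, model=MODEL_ID)


# Usage (downloads the weights on the first call):
# ner = build_ner_pipeline()
# ner("My name is Sarah and I live in London")
```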

---

### Model Interaction
This model is an entity recognizer: it identifies the tokens in a text that belong to named entities. It returns a list of these tokens, each with a confidence score and an entity category, e.g. person or location.

**Example Input**:

"My name is Sarah and I live in London"

**Example Output**:

```python
[{'entity': 'I-PER',
  'score': 0.9986294507980347,
  'index': 4,
  'word': 'Sarah',
  'start': 11,
  'end': 16},
 {'entity': 'I-LOC',
  'score': 0.9990901947021484,
  'index': 9,
  'word': 'London',
  'start': 31,
  'end': 37}]
```

**`PER` is person, `LOC` is location. The `I-` prefix comes from the IOB tagging scheme.**
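To make the labels easier to read, the output can be grouped by category. The sketch below is illustrative only: `group_by_category` and `CATEGORY_NAMES` are hypothetical names, and the `ORG` and `MISC` entries are included on the assumption that the model emits the full set of CoNLL-2003 entity types, even though only `PER` and `LOC` appear in the example.

```python
from collections import defaultdict

# Human-readable names for the CoNLL-2003 entity types.
CATEGORY_NAMES = {
    "PER": "person",
    "LOC": "location",
    "ORG": "organization",
    "MISC": "miscellaneous",
}


def group_by_category(entities):
    """Map each category name to the list of words tagged with it."""
    grouped = defaultdict(list)
    for entity in entities:
        label = entity["entity"]
        # Labels look like "I-PER": an IOB prefix, a dash, then the entity type.
        _, sep, entity_type = label.partition("-")
        if not sep:
            entity_type = label  # label without a prefix, keep it as-is
        name = CATEGORY_NAMES.get(entity_type, entity_type)
        grouped[name].append(entity["word"])
    return dict(grouped)


example_output = [
    {"entity": "I-PER", "score": 0.9986294507980347, "index": 4,
     "word": "Sarah", "start": 11, "end": 16},
    {"entity": "I-LOC", "score": 0.9990901947021484, "index": 9,
     "word": "London", "start": 31, "end": 37},
]

print(group_by_category(example_output))
# {'person': ['Sarah'], 'location': ['London']}
```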

---

### Current Deployment

The current deployment accepts one string per request. Because Triton does not natively support returning arbitrary objects, I have made the model convert the result into a JSON string before returning it. The client can then run `json.loads` to turn it back into a list of dictionaries.
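The round trip can be sketched as follows. The server-side half is an assumption about how the string is produced, since the Triton model code is not shown here, and `serialize_entities` and `deserialize_entities` are hypothetical helper names. The `default` hook is there because the pipeline may return NumPy scalars (for example for `score`), which `json.dumps` cannot encode on its own.

```python
import json


def _to_builtin(value):
    """Fallback for json.dumps: convert NumPy-style scalars to Python numbers."""
    # NumPy scalars expose .item(), which returns the equivalent Python value.
    if hasattr(value, "item"):
        return value.item()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def serialize_entities(entities):
    """Server side (assumed): turn the list of entity dicts into one string."""
    return json.dumps(entities, default=_to_builtin)


def deserialize_entities(payload):
    """Client side: turn the returned string back into a list of dicts."""
    return json.loads(payload)


entities = [{"entity": "I-PER", "score": 0.9986294507980347, "index": 4,
             "word": "Sarah", "start": 11, "end": 16}]
as_string = serialize_entities(entities)
assert deserialize_entities(as_string) == entities
```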

The model was deployed using the vllm method of deployment.

```python
import requests
import json
```

```python
triton_url = "http://triton-route-triton-inference-services.apps.nebula.sl/v2/models/bert-large-cased/infer"

payload = {
    "inputs": [
        {
            "name": "INPUT",
            "shape": [1],  # One string per request
            "datatype": "BYTES",  # Make sure the datatype matches the input configuration
            "data": [
                "My name is Sarah and I live in London, while Steve lives in Greece."
            ]
        }
    ],
    "outputs": [
        {
            "name": "OUTPUT"
        }
    ]
}

# Send the POST request to Triton
headers = {"Content-Type": "application/json"}
response = requests.post(triton_url, json=payload, headers=headers)

# Handle the response: keep the parsed body on success, print the error otherwise
if response.status_code == 200:
    response_data = response.json()
else:
    print(f"Error with Triton request. Status code: {response.status_code}")
    print(f"Error message: {response.text}")
```
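Since the deployment takes one string per request, the request body can be built by a small helper so that only the text changes between calls. This is a sketch: `build_infer_payload` is a hypothetical name, and the tensor names `INPUT` and `OUTPUT` are copied from the payload above.

```python
def build_infer_payload(text):
    """Build the inference request body for a single input string."""
    if not isinstance(text, str):
        raise TypeError("the deployment accepts exactly one string per request")
    return {
        "inputs": [
            {
                "name": "INPUT",
                "shape": [1],         # one element: the single input string
                "datatype": "BYTES",  # strings are sent as BYTES tensors
                "data": [text],
            }
        ],
        "outputs": [{"name": "OUTPUT"}],
    }


payload = build_infer_payload("My name is Sarah and I live in London")
```

The resulting dictionary can be passed to `requests.post(triton_url, json=payload)` exactly as in the cell above.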


```python
json_object = json.loads(response_data['outputs'][0]['data'][0])
json_object
```

```python
[{'entity': 'I-PER',
  'score': 0.9986294507980347,
  'index': 4,
  'word': 'Sarah',
  'start': 11,
  'end': 16},
 {'entity': 'I-LOC',
  'score': 0.9990901947021484,
  'index': 9,
  'word': 'London',
  'start': 31,
  'end': 37},
 {'entity': 'I-PER',
  'score': 0.9976377487182617,
  'index': 12,
  'word': 'Steve',
  'start': 45,
  'end': 50},
 {'entity': 'I-LOC',
  'score': 0.9997974038124084,
  'index': 15,
  'word': 'Greece',
  'start': 60,
  'end': 66}]
```
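The `start` and `end` fields are character offsets into the input string, with `end` exclusive, so they can be used to slice the entity back out of the original text. A small sketch using the values from the output above (`entity_spans` is a hypothetical helper name):

```python
def entity_spans(text, entities):
    """Return (label, substring) pairs by slicing the text with start/end."""
    return [(e["entity"], text[e["start"]:e["end"]]) for e in entities]


text = "My name is Sarah and I live in London, while Steve lives in Greece."
entities = [
    {"entity": "I-PER", "word": "Sarah", "start": 11, "end": 16},
    {"entity": "I-LOC", "word": "London", "start": 31, "end": 37},
    {"entity": "I-PER", "word": "Steve", "start": 45, "end": 50},
    {"entity": "I-LOC", "word": "Greece", "start": 60, "end": 66},
]

for label, span in entity_spans(text, entities):
    print(label, span)
```

For this input, each slice matches the `word` field of the corresponding entity.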