## DistilBART-CNN-12-6 Deployment

### Model Description
DistilBART-CNN-12-6 is a variant of the BART (Bidirectional and Auto-Regressive Transformers) model, specifically distilled for summarization tasks.

It was designed to be more efficient while retaining much of the performance of its larger counterpart, BART. Distillation refers to a process where a smaller model is trained to mimic the behavior of a larger, more complex model. 

This particular model is fine-tuned for summarization tasks, particularly on the CNN/Daily Mail dataset, which is commonly used for news article summarization. 

It generates summaries by processing long articles and reducing them to more concise, readable versions while preserving key information.

This deployment follows the Hugging Face `BartForConditionalGeneration` documentation as a reference: https://huggingface.co/docs/transformers/model_doc/bart#transformers.BartForConditionalGeneration

---

### Model Interaction
The model expects a string-based input, which typically consists of a news article, a long piece of text, or any other content that needs summarizing. 

Inputs are limited to 1024 tokens; anything longer is truncated internally.
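Because truncation happens silently, it can help to check on the client side whether a text is likely to exceed the limit. The following is a minimal sketch, not part of the deployment: `token_overflow` is a hypothetical helper, and `tokenize` is any callable that turns a string into a sequence of tokens. Whether the server counts special tokens toward the 1024 limit is an assumption to verify.

```python
from typing import Callable, Sequence

# Limit stated in this document; longer inputs are truncated by the model.
MAX_INPUT_TOKENS = 1024


def token_overflow(
    text: str,
    tokenize: Callable[[str], Sequence],
    limit: int = MAX_INPUT_TOKENS,
) -> int:
    """Return how many tokens would be cut off (0 if the text fits)."""
    return max(0, len(tokenize(text)) - limit)


# Rough illustration with whitespace splitting as a stand-in tokenizer.
# A real check should use the model's own tokenizer instead.
print(token_overflow("a short article about the stock market", str.split))
```

With Hugging Face `transformers`, one option is to pass `lambda t: tokenizer(t)["input_ids"]`, where `tokenizer` is loaded via `AutoTokenizer.from_pretrained` for the matching checkpoint.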

**Example Input**:

"The stock market saw a major downturn today, with the Dow Jones Industrial Average dropping 500 points. Analysts attribute this decline to rising inflation concerns and geopolitical tensions, which have caused investors to become more cautious. Many businesses are now adjusting their forecasts for the upcoming quarters as a result of these factors."

The model then returns a summarized version of the text.

**Example Output**:

"The stock market dropped 500 points due to rising inflation and geopolitic

---

### Current Deployment

The current deployment accepts exactly one input and returns one output per request.

The model was deployed using the vLLM deployment method and is queried through the Triton HTTP endpoint shown in the cells below.
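Since each request carries a single text, summarizing several documents means sending one request per document. The sketch below illustrates that pattern under stated assumptions: `build_payload` and `summarize_many` are hypothetical helpers, the tensor names `INPUT` and `OUTPUT` are taken from the request used in this section, and `post` is any callable that sends a payload and returns the parsed JSON response.

```python
from typing import Callable, Dict, Iterable, List


def build_payload(text: str) -> Dict:
    """Build a single-input request body (one text per request)."""
    return {
        "inputs": [
            {"name": "INPUT", "shape": [1], "datatype": "BYTES", "data": [text]}
        ],
        "outputs": [{"name": "OUTPUT"}],
    }


def summarize_many(
    texts: Iterable[str], post: Callable[[Dict], Dict]
) -> List[str]:
    """Send one request per text and collect the returned summaries."""
    summaries = []
    for text in texts:
        response_data = post(build_payload(text))
        # Assumes the first output holds the summary as its first data item.
        summaries.append(response_data["outputs"][0]["data"][0])
    return summaries
```

With `requests`, `post` could be something like `lambda p: requests.post(triton_url, json=p).json()`, using the `triton_url` defined in the cells that follow.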

In [1]:
import requests

In [69]:
text = "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."

In [70]:
triton_url = "http://triton-route-triton-inference-services.apps.nebula.sl/v2/models/distilbart-cnn-12-6/infer"

payload = {
    "inputs": [
        {
            "name": "INPUT",
            "shape": [1], 
            "datatype": "BYTES",  # Make sure the datatype matches the input configuration
            "data": [   
                text
            ]
        }
    ],
    "outputs": [
        {
            "name": "OUTPUT"
        }
    ]
}

# Send the POST request to Triton
headers = {"Content-Type": "application/json"}
response = requests.post(triton_url, json=payload, headers=headers)

# Parse the response, or report the error.
# response_data stays None on failure so later cells do not hit an undefined name.
response_data = None
if response.status_code == 200:
    response_data = response.json()
else:
    print(f"Error with Triton request. Status code: {response.status_code}")
    print(f"Error message: {response.text}")


In [71]:
response_data['outputs'][0]

{'name': 'OUTPUT',
 'datatype': 'BYTES',
 'shape': [1],
 'data': [' The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building . It was the first structure to reach a height of 300 metres . It is now taller than the Chrysler Building in New York City by 5.2 metres (17 ft) Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France .']}

In [72]:
len(response_data['outputs'][0]['data'][0])

333
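The last two cells read the summary out of the response and measure its length in characters. A small sketch of the same steps as reusable functions follows; `extract_summary` and `compression_ratio` are hypothetical helpers, and the response layout is assumed to match the output shown above.

```python
def extract_summary(response_data: dict, output_name: str = "OUTPUT") -> str:
    """Return the summary text from a response shaped like the one above."""
    for output in response_data.get("outputs", []):
        if output.get("name") == output_name:
            data = output.get("data") or []
            if not data:
                raise ValueError(f"Output '{output_name}' contains no data")
            # The returned text starts with a space, so strip it.
            return data[0].strip()
    raise KeyError(f"No output named '{output_name}' in response")


def compression_ratio(original: str, summary: str) -> float:
    """Character length of the summary relative to the original text."""
    return len(summary) / len(original)
```

For example, `compression_ratio(text, extract_summary(response_data))` gives the fraction of the input length that the summary occupies.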