## All-distilroberta-v1 Deployment

### Model Description
`all-distilroberta-v1` is a sentence-transformers model based on DistilRoBERTa. It maps sentences and paragraphs into a 768-dimensional dense vector space.

This allows the model to capture rich semantic information about the text, making it well-suited for tasks such as semantic search, clustering, and other applications that require an understanding of the meaning or context of text.
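
To make the semantic search use case concrete, here is a minimal sketch of ranking documents by cosine similarity to a query embedding. The helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the 3-dimensional vectors are toy stand-ins for the real 768-dimensional embeddings, not model output.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors for illustration only (real embeddings have 768 values).
query = [1.0, 0.0, 0.0]
docs = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [1.0, 0.1, 0.0],   # nearly parallel to the query
    [-1.0, 0.0, 0.0],  # opposite direction
]
ranking = rank_by_similarity(query, docs)
print([index for index, _ in ranking])
```

In practice the query and document vectors would each come from a separate embedding request, and the top-ranked indices would point back to the most relevant text chunks.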

I used the model card at https://huggingface.co/sentence-transformers/all-distilroberta-v1 as a reference.

---

### Model Interaction
This model is a sentence-level embedder: it produces one embedding vector per chunk of text. It expects a single chunk of text and returns a vector of shape [1, 768].

Input is limited to 512 tokens; longer input is truncated internally.

**Example Input**:

"The stock market saw a major downturn today, with the Dow Jones Industrial Average dropping 500 points. Analysts attribute this decline to rising inflation concerns and geopolitical tensions, which have caused investors to become more cautious. Many businesses are now adjusting their forecasts for the upcoming quarters as a result of these factors."

**Example Output**:

[0.111, 0.3121, ... , 0.221231]

---

### Current Deployment

The current deployment accepts exactly one input text chunk per request and returns one vector of size 768.

The model was deployed using the vLLM deployment method.

In [1]:
import requests


In [24]:
triton_url = "http://triton-route-triton-inference-services.apps.nebula.sl/v2/models/all-distilroberta-v1/infer"

payload = {
    "inputs": [
        {
            "name": "INPUT",
            "shape": [1], 
            "datatype": "BYTES",  # Make sure the datatype matches the input configuration
            "data": [   
                "This is an example sentence"
            ]
        }
    ],
    "outputs": [
        {
            "name": "OUTPUT"
        }
    ]
}

# Send the POST request to Triton
headers = {"Content-Type": "application/json"}
response = requests.post(triton_url, json=payload, headers=headers)

# Parse the JSON body on success; otherwise report the error
if response.status_code == 200:
    response_data = response.json()
else:
    response_data = None  # avoid reusing stale data from an earlier run
    print(f"Error with Triton request. Status code: {response.status_code}")
    print(f"Error message: {response.text}")


In [25]:
len(response_data['outputs'][0]['data'])

768
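
The request and the length check above can be wrapped in two small helpers. This is only a sketch: `build_payload` and `extract_embedding` are hypothetical names, the tensor names `INPUT` and `OUTPUT` are taken from the request in this notebook, and the response dictionary below is a hand-made stand-in rather than real server output.

```python
def build_payload(text):
    """Build the inference request body for a single text chunk."""
    if not isinstance(text, str):
        raise TypeError("this deployment accepts exactly one text string")
    return {
        "inputs": [
            {
                "name": "INPUT",       # input tensor name used in this notebook
                "shape": [1],          # one text chunk per request
                "datatype": "BYTES",
                "data": [text],
            }
        ],
        "outputs": [{"name": "OUTPUT"}],
    }


def extract_embedding(response_data, expected_dim=768):
    """Return the embedding list from a parsed response, checking its length."""
    output = response_data["outputs"][0]
    data = output["data"]
    if len(data) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(data)}")
    return data


# Stand-in response with the same structure as the one returned above.
fake_response = {
    "model_name": "all-distilroberta-v1",
    "model_version": "1",
    "outputs": [
        {"name": "OUTPUT", "datatype": "FP32", "shape": [1, 768], "data": [0.0] * 768}
    ],
}
embedding = extract_embedding(fake_response)
print(len(embedding))
```

With a live endpoint, `build_payload(text)` would be passed as the `json` argument to `requests.post`, and `extract_embedding` would be called on `response.json()`.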

In [26]:
response_data

{'model_name': 'all-distilroberta-v1',
 'model_version': '1',
 'outputs': [{'name': 'OUTPUT',
   'datatype': 'FP32',
   'shape': [1, 768],
   'data': [-0.033754393458366394,
    -0.06318267434835434,
    -0.03165888041257858,
    -0.04057187959551811,
    0.025460511445999146,
    -0.01036836113780737,
    -0.029329312965273857,
    0.04067103564739227,
    -0.012247906997799873,
    0.009519931860268116,
    0.017907869070768356,
    -0.008985818363726139,
    0.0022824429906904697,
    -0.03642483800649643,
    -0.011522174812853336,
    -0.04585650563240051,
    -0.00043001980520784855,
    -0.01669340766966343,
    -0.012273237109184265,
    0.035609565675258636,
    -0.0018178644822910428,
    0.03461875393986702,
    0.01918163150548935,
    -0.08003159612417221,
    0.01996784284710884,
    0.030607985332608223,
    0.06710664927959442,
    -0.0007088454440236092,
    -0.026898818090558052,
    0.04074525833129883,
    0.02836795523762703,
    0.07364766299724579,
    -0.0357776