# Analyze Service Latency and Some Initial Suggestions for Improvement

In [1]:
import requests
import time

def analyze_latency():
    url = "http://0.0.0.0:8000/sentiment"
    data = {
        "tweet": "Recession hit Veronique Branquinho, she has to quit her company, such a shame!",
        "sentiment": "negative",
    }

    num_requests = 100  # Define the number of requests to average over
    total_time = 0

    for _ in range(num_requests):
        start_time = time.perf_counter()  # monotonic, high-resolution clock meant for timing
        response = requests.post(url, json=data)
        end_time = time.perf_counter()
        total_time += (end_time - start_time)

        # Assert response status for correctness
        assert response.status_code == 200

    avg_latency = total_time / num_requests
    print(f"Average latency: {avg_latency:.4f} seconds per request")

analyze_latency()


Average latency: 1.3389 seconds per request


So the average latency over 100 sequential requests is 1.3389 s per request.
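
A mean alone can hide slow outliers, so it may be worth reporting the median and a tail percentile as well. The snippet below is a minimal sketch, assuming the per-request timings are collected in a list instead of being summed; `summarize_latencies` and the sample values are hypothetical and were not part of the measurement above.

```python
import statistics

def summarize_latencies(latencies):
    """Return mean, median, and 95th percentile for latencies given in seconds."""
    if len(latencies) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the estimate within the observed min/max.
    p95 = statistics.quantiles(latencies, n=20, method="inclusive")[-1]
    return {
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "p95": p95,
    }

# Illustrative values only, not measured data.
sample = [1.2, 1.3, 1.25, 1.4, 2.9, 1.35, 1.3, 1.28, 1.31, 1.33]
stats = summarize_latencies(sample)
print({name: round(value, 4) for name, value in stats.items()})
```

If the median sits well below the mean, a few slow requests (for example a cold start on the first call) are likely dominating the average.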

Potential Improvements:

- If the system allows it, group multiple inputs into a single batched request to amortize per-request overhead. With batch inference, a GPU could also be used to speed up computation.
- Apply model quantization, pruning, or knowledge distillation, if possible.
- Simplify preprocessing, if feasible.
- Tune the container's resources to guarantee it has enough memory to hold the model.
- In a production-scale project, scale the service horizontally: spin up several containers so the load is distributed among them, which reduces average latency under concurrent load.
- Further alternatives can be discussed during the meeting.
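
The batching idea from the first bullet could look roughly like the sketch below. It assumes the service exposed a batch endpoint, which is not confirmed here: the `/sentiment/batch` route, the `{"items": [...]}` payload shape, and all helper names are hypothetical.

```python
from itertools import islice

def chunked(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def send_in_batches(payloads, post_batch, batch_size=8):
    """Send payloads batch by batch and return the results in input order.

    post_batch is any callable that takes a list of payloads and returns
    one result per payload.
    """
    results = []
    for batch in chunked(payloads, batch_size):
        results.extend(post_batch(batch))
    return results

def post_batch_http(batch, url="http://0.0.0.0:8000/sentiment/batch"):
    """Hypothetical HTTP sender; the route and payload shape are assumptions."""
    import requests  # imported here so the helpers above work without it
    response = requests.post(url, json={"items": batch}, timeout=30)
    response.raise_for_status()
    return response.json()
```

With 100 tweets and `batch_size=8`, `send_in_batches(payloads, post_batch_http)` would issue 13 requests instead of 100. Whether that lowers the time per tweet depends on how the server processes a batch, so it should be measured the same way as the single-request latency.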
