# Use models with faster time between tokens

Although this is common knowledge for most GenAI practitioners, different models can have significantly different time-between-token speeds, that is, the time taken to generate each output token, which largely determines how quickly the model produces its response. For example, GPT-4 is considerably slower than GPT-3.5, but has significantly more powerful reasoning capabilities.

Choose the model to match the use case and the level of reasoning it requires (along with any other requirements), so that the latency of the application is kept as low as possible.
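
One way to make this choice explicit in code is a small lookup from task tier to model deployment. The sketch below is illustrative only: the tier names `simple` and `complex` and the `pick_deployment` helper are hypothetical, and it assumes the deployment names are stored in the `MODEL35` and `MODELGPT432k` environment variables that this notebook's model cells read.

```python
import os
from typing import Optional

# Hypothetical tiers mapped to the environment variables that hold the
# deployment names; the tier boundaries are an assumption for illustration.
DEPLOYMENT_ENV_BY_TIER = {
    "simple": "MODEL35",        # e.g. spell checking, light rewriting
    "complex": "MODELGPT432k",  # e.g. tasks that need stronger reasoning
}


def pick_deployment(tier: str) -> Optional[str]:
    """Return the deployment name for a task tier, or None if it is not set."""
    env_var = DEPLOYMENT_ENV_BY_TIER[tier]
    return os.getenv(env_var)
```

Defaulting to the faster model and escalating only when the task demands it keeps latency low for the common case, provided the faster model really can complete the task.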

#### Load Helper Functions and Import Libraries

In [2]:
import copy
import datetime
import json
import os
import textwrap
import time

from dotenv import load_dotenv
from openai import AzureOpenAI

# Load environment variables
load_dotenv()

def aoai_call(system_message, prompt, model):
    """Send one chat completion request and time it end to end.

    Returns the full response as a dict, the prompt and completion token
    counts, the completion text, and the elapsed time in seconds.
    """
    client = AzureOpenAI(
        api_version=os.getenv("API_VERSION"),
        azure_endpoint=os.getenv("AZURE_ENDPOINT"),
        api_key=os.getenv("API_KEY"),
    )

    # perf_counter is a monotonic clock, which suits measuring durations
    start_time = time.perf_counter()

    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": prompt},
        ],
    )

    e2e_time = time.perf_counter() - start_time

    # Convert the response object to a plain dict for simple field access
    result = json.loads(completion.model_dump_json(indent=2))
    prompt_tokens = result["usage"]["prompt_tokens"]
    completion_tokens = result["usage"]["completion_tokens"]
    completion_text = result["choices"][0]["message"]["content"]

    return result, prompt_tokens, completion_tokens, completion_text, e2e_time
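
The helper above measures only end-to-end time. To look at the time between tokens more directly, the gaps between streamed chunks can be timed instead. The following is a minimal sketch, not part of the original notebook: `time_between_chunks` is a hypothetical helper that works on any iterable, and the commented usage assumes the request is made with `stream=True`. Note that one streamed chunk does not necessarily correspond to exactly one token.

```python
import time
from typing import Callable, Iterable, List


def time_between_chunks(
    chunks: Iterable,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Return the gaps, in seconds, between consecutive items of `chunks`.

    The first timestamp is taken when the first item arrives, so the time
    to first token is excluded from the result.
    """
    gaps = []
    previous = None
    for _ in chunks:
        now = clock()
        if previous is not None:
            gaps.append(now - previous)
        previous = now
    return gaps


# Possible usage with an AzureOpenAI client (an assumption, not run here):
# stream = client.chat.completions.create(
#     model=model, messages=messages, stream=True
# )
# gaps = time_between_chunks(stream)
# print(f"Mean gap: {sum(gaps) / len(gaps):.3f} s")
```

The `clock` parameter exists only so the function can be exercised with a fake clock; in normal use the default is sufficient.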


## Use case: Checking for spelling errors and grammar

The use case itself is largely irrelevant to demonstrating the difference in speed between LLMs; what matters is that the smaller model is actually capable of completing the task.

In [5]:
documents_to_check="""

1. Pigeons’ Backflips Linked to Genetics Scientists have unraveled the genetic basis behind a fascinating avian behavior: some pigeons perform backward somersaults mid-flight. Dr. Avani Patel, a researcher at the Avian Genetics Institute, identified specific genes associated with this acrobatic feat in parlor roller pigeons. These findings shed light on the evolution of pigeon behavior and could have implications for understanding complex traits in other animals1.

2. Desert Ants’ Brain Adaptations to Magnetic Fields Researchers from the Desert Ecology Lab have revealed how magnetic fields shape the brains of desert ants. By studying the neural patterns in these tiny navigators, they discovered that the ants’ brains undergo structural changes in response to Earth’s magnetic field. This adaptation enhances their ability to find their way back to the nest, even in the harsh desert environment. The study provides fresh insights into the remarkable adaptations of these resilient insects1.

"""

### A: GPT-4 model response time

**Time taken: 17 seconds**

GPT-4 has a slower generation speed than GPT-3.5. However, it has significantly more powerful reasoning capabilities.

In [7]:
model=os.getenv("MODELGPT432k")

system_message="""
You help spell check documents. Rewrite the entire document word for word, correcting any spelling or grammatical errors.
"""
prompt=f"""
Documents to check and rewrite:
{documents_to_check}
"""

result,prompt_tokens,completion_tokens,completion_text,e2e_time=aoai_call(system_message,prompt,model)
print(f"Prompt Tokens: {prompt_tokens}")
print(f"Completion Tokens: {completion_tokens}")
print(f"Time taken: {e2e_time:.2f} seconds")
print(completion_text)


Prompt Tokens: 229
Completion Tokens: 189
Total cost: $0.0364
Time taken: 17.29 seconds


### B: GPT-3.5 model response time

**Time taken: 5 seconds**

For simple tasks like checking for spelling errors, the faster GPT-3.5 model has sufficient reasoning capability, and is able to perform the task much more quickly.
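
To put the two runs on a per-token footing, the end-to-end time can be divided by the number of completion tokens. This is a rough sketch using the figures reported in this notebook's outputs (17.29 s and 4.68 s, each for 189 completion tokens); `seconds_per_completion_token` is a hypothetical helper. Because end-to-end time also includes prompt processing and network latency, the result is an upper bound on the true time between tokens.

```python
def seconds_per_completion_token(e2e_time: float, completion_tokens: int) -> float:
    """Approximate per-token generation time from an end-to-end measurement."""
    return e2e_time / completion_tokens


# Figures taken from the two runs reported in this notebook
gpt4 = seconds_per_completion_token(17.29, 189)
gpt35 = seconds_per_completion_token(4.68, 189)

print(f"GPT-4:    {gpt4 * 1000:.0f} ms per completion token")
print(f"GPT-3.5:  {gpt35 * 1000:.0f} ms per completion token")
print(f"Speed-up: {gpt4 / gpt35:.1f}x")
```

On these numbers GPT-3.5 comes out at roughly 25 ms per completion token against roughly 91 ms for GPT-4, a speed-up of about 3.7x for the same output length.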

In [9]:
model=os.getenv("MODEL35")

system_message="""
You help spell check documents. Rewrite the entire document word for word, correcting any spelling or grammatical errors.
"""
prompt=f"""
Documents to check and rewrite:
{documents_to_check}
"""

result,prompt_tokens,completion_tokens,completion_text,e2e_time=aoai_call(system_message,prompt,model)
print(f"Prompt Tokens: {prompt_tokens}")
print(f"Completion Tokens: {completion_tokens}")
print(f"Time taken: {e2e_time:.2f} seconds")
print(completion_text)


Prompt Tokens: 229
Completion Tokens: 189
Total cost: $0.0364
Time taken: 4.68 seconds
1. Pigeons' Backflips Linked to Genetics: Scientists have unraveled the genetic basis behind a fascinating avian behavior: some pigeons perform backward somersaults mid-flight. Dr. Avani Patel, a researcher at the Avian Genetics Institute, identified specific genes associated with this acrobatic feat in parlor roller pigeons. These findings shed light on the evolution of pigeon behavior and could have implications for understanding complex traits in other animals.

2. Desert Ants' Brain Adaptations to Magnetic Fields: Researchers from the Desert Ecology Lab have revealed how magnetic fields shape the brains of desert ants. By studying the neural patterns in these tiny navigators, they discovered that the ants' brains undergo structural changes in response to the Earth's magnetic field. This adaptation enhances their ability to find their way back to the nest, even in the harsh desert environment. The