# Local-LLM Tests and Examples

Choose a model from the models list and pass its name as the `model` argument in the API calls. The Language Models section below shows how to list the available models.

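Install the `openai` and `requests` Python packages. The cells below use the v1 client interface (`openai.base_url`), so `openai` 1.0 or later is required: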

```bash
pip install openai requests
```

**Note: you do not need an OpenAI API key. The API key is the `LOCAL_LLM_API_KEY` for your server, if you defined one in your `.env` file.**
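If you would rather not paste the key into the notebook, one option is to read both settings from environment variables. This is a minimal sketch: `load_settings` is a hypothetical helper, not part of the project, and it assumes you have exported `LOCAL_LLM_SERVER` and `LOCAL_LLM_API_KEY` into the environment the notebook runs in (the server's `.env` file is not read automatically here).

```python
import os


def load_settings(env=None):
    """Return (server, api_key) from an environment-style mapping.

    Falls back to the notebook's default server URL and an empty key.
    """
    env = os.environ if env is None else env
    # Strip a trailing slash so f"{server}/v1/" never produces a double slash.
    server = env.get("LOCAL_LLM_SERVER", "http://localhost:8091").rstrip("/")
    api_key = env.get("LOCAL_LLM_API_KEY", "")
    return server, api_key


# Example with an explicit mapping instead of the real environment.
server, api_key = load_settings({"LOCAL_LLM_SERVER": "http://localhost:8091/"})
print(server)  # http://localhost:8091
```

The returned values can then be assigned to `LOCAL_LLM_SERVER` and `LOCAL_LLM_API_KEY` in the definitions cell.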

## Global definitions and helpers


In [21]:
import openai
import requests
import time

# Set LOCAL_LLM_SERVER and LOCAL_LLM_API_KEY here before running the rest of the notebook.
LOCAL_LLM_SERVER = "http://localhost:8091"
LOCAL_LLM_API_KEY = "Your LOCAL_LLM_API_KEY from your .env file"
DEFAULT_LLM = "zephyr-7b-beta"
SYSTEM_MESSAGE = "Act as a creative writer. All of your responses are transcribed to audio and sent to the user. Be concise with all responses. After the request is fulfilled, end with </s>."
DEFAULT_MAX_TOKENS = 64
DEFAULT_TEMPERATURE = 1.33
DEFAULT_TOP_P = 0.95


# ------------------- DO NOT EDIT BELOW THIS LINE IN THIS CELL ------------------- #
openai.base_url = f"{LOCAL_LLM_SERVER}/v1/"
openai.api_key = LOCAL_LLM_API_KEY if LOCAL_LLM_API_KEY else LOCAL_LLM_SERVER
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"{LOCAL_LLM_API_KEY}",
}


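def display_content(content):
    """Print text content and render any audio, image, or video it references."""
    outputs_url = f"{LOCAL_LLM_SERVER}/outputs/"
    try:
        from IPython.display import Audio, display, Image, Video
    except ImportError:
        # Outside Jupyter/IPython, fall back to plain text output.
        print(content)
        return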
    if "<audio controls>" in content or " " not in content:
        import base64
        from datetime import datetime

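        try:
            # Pull the base64 payload out of an embedded <audio> data URI.
            audio_response = content.split("data:audio/wav;base64,")[1].split('" type')[
                0
            ]
        except IndexError:
            # No data URI present, so treat the whole string as base64 audio.
            audio_response = content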
        file_name = f"{datetime.now().strftime('%Y-%m-%d-%H-%M-%S')}.wav"
        with open(file_name, "wb") as fh:
            fh.write(base64.b64decode(audio_response))
        display(Audio(filename=file_name, autoplay=True))
    if outputs_url in content:
        file_name = content.split(outputs_url)[1].split('"')[0]
        url = f"{outputs_url}{file_name}"
        if url.endswith(".jpg") or url.endswith(".png"):
            content = content.replace(url, "")
            display(Image(url=url))
        elif url.endswith(".mp4"):
            content = content.replace(url, "")
            display(Video(url=url, autoplay=True))
        elif url.endswith(".wav"):
            content = content.replace(url, "")
            print(f"URL: {url}")
            display(Audio(url=url, autoplay=True))
    print(content)

## Language Models

List the available models if you don't already know which one you want to use.


In [11]:
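# Poll until the server responds instead of failing on the first refused connection.
while True:
    try:
        models = requests.get(f"{LOCAL_LLM_SERVER}/v1/models", headers=HEADERS)
        if models.status_code == 200:
            break
    except requests.exceptions.RequestException:
        # The server is not accepting connections yet; retry after a short pause.
        pass
    time.sleep(1)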

print(models.json())

['bakllava-1-7b', 'llava-v1.5-7b', 'llava-v1.5-13b', 'yi-vl-6b', 'Tess-34B-v1.5b', 'Tess-34B-v1.5b', 'Tess-10.7B-v1.5b', 'Goliath-longLORA-120b-rope8-32k-fp16', 'Etheria-55b-v0.1', 'EstopianMaid-13B', 'Everyone-Coder-33B-Base', 'FusionNet_34Bx2_MoE', 'WestLake-7B-v2', 'WestSeverus-7B-DPO', 'DiscoLM_German_7b_v1', 'Garrulus', 'DareVox-7B', 'NexoNimbus-7B', 'Lelantos-Maid-DPO-7B', 'stable-code-3b', 'Dr_Samantha-7B', 'NeuralBeagle14-7B', 'tigerbot-13B-chat-v5', 'Nous-Hermes-2-Mixtral-8x7B-SFT', 'Thespis-13B-DPO-v0.7', 'Code-290k-13B', 'Nous-Hermes-2-Mixtral-8x7B-DPO', 'Venus-120b-v1.2', 'LLaMA2-13B-Estopia', 'medicine-LLM', 'finance-LLM-13B', 'Yi-34B-200K-DARE-megamerge-v8', 'phi-2-orange', 'laser-dolphin-mixtral-2x7b-dpo', 'bagel-dpo-8x7b-v0.2', 'Everyone-Coder-4x7b-Base', 'phi-2-electrical-engineering', 'Cosmosis-3x34B', 'HamSter-0.1', 'Helion-4x34B', 'Bagel-Hermes-2x34b', 'deepmoney-34b-200k-chat-evaluator', 'deepmoney-34b-200k-base', 'TowerInstruct-7B-v0.1', 'PiVoT-SUS-RP', 'Noromaid-
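The polling loop above never gives up, which can hang a notebook if the server fails to start. A bounded variant might look like the following sketch. `wait_until_ready` and its parameters are hypothetical names introduced for illustration; the readiness check is passed in as a callable so the helper can be exercised without a running server.

```python
import time


def wait_until_ready(check, timeout=60.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() until it returns True or `timeout` seconds have passed.

    Returns True if the check succeeded, False if the deadline was reached.
    Exceptions raised by check() are treated as "not ready yet".
    """
    deadline = clock() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # e.g. connection refused while the server is starting
        if clock() >= deadline:
            return False
        sleep(interval)


# Intended use against the server (not executed here):
# ready = wait_until_ready(
#     lambda: requests.get(f"{LOCAL_LLM_SERVER}/v1/models", headers=HEADERS).status_code == 200
# )

# Self-contained demonstration: the check succeeds on the third attempt.
attempts = iter([False, False, True])
print(wait_until_ready(lambda: next(attempts), timeout=5, interval=0))  # True
```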

## Voices

Any `wav` file in the `voices` directory will be available to use as a voice.


In [12]:
voices = requests.get(f"{LOCAL_LLM_SERVER}/v1/audio/voices", headers=HEADERS)
print(voices.json())

{'voices': ['default', 'DukeNukem', 'Hal9000_Mono', 'Hal_voice_9000_Synthetic', 'SyntheticStarTrekComputerVoice', 'Synthetic_DukeNukem', 'Synthetic_Female_Hybrid_4_Phonetics_0001', 'Synthetic_Female_Phonetics_0001']}
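Before requesting speech, it can help to confirm that the voice you want is actually installed. The sketch below assumes the response has the `{'voices': [...]}` shape shown in the output above; `pick_voice` is a hypothetical helper, and falling back to `default` is an illustrative choice rather than documented server behavior.

```python
def pick_voice(voices_response, requested, fallback="default"):
    """Return `requested` if the server lists it, otherwise `fallback`."""
    available = voices_response.get("voices", [])
    return requested if requested in available else fallback


# A shortened copy of the response shown above.
sample = {"voices": ["default", "DukeNukem", "Hal9000_Mono"]}
print(pick_voice(sample, "DukeNukem"))  # DukeNukem
print(pick_voice(sample, "NotAVoice"))  # default
```

In the notebook you would pass `voices.json()` as the first argument.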


## Embeddings

[OpenAI API Reference](https://platform.openai.com/docs/api-reference/embeddings)


In [13]:
# Modify this prompt to generate different outputs
prompt = "Tacos are great."

response = openai.embeddings.create(
    input=prompt,
    model=DEFAULT_LLM,
)
print(response.data[0].embedding)

[0.5429642796516418, -8.369122505187988, -5.294323921203613, 5.748764514923096, -0.01756887137889862, -3.200688600540161, 4.215512275695801, -3.6155927181243896, -4.893770694732666, -1.3151473999023438, 0.04899054393172264, 0.8963842988014221, 0.46536651253700256, 7.564478397369385, -11.398655891418457, 1.4146454334259033, 2.083888530731201, -4.350066661834717, 1.4608023166656494, -1.6663378477096558, -1.3684545755386353, 0.7196666598320007, -1.9787977933883667, 1.4073946475982666, -2.8631153106689453, -3.1257877349853516, -0.04228734225034714, -0.47600990533828735, -7.521921157836914, 0.6401062607765198, 7.916580677032471, -0.9487177133560181, 1.754802942276001, 1.6382769346237183, -0.5631095767021179, -4.480162143707275, -2.0241525173187256, 0.4007587432861328, -0.05319162458181381, -0.7118483185768127, 3.2603232860565186, -9.447614669799805, 8.983275413513184, 0.05082497373223305, 6.135091781616211, -1.5933201313018799, -4.028542518615723, -3.379406452178955, 2.9893321990966797, -1.
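Embedding vectors like the one above are usually compared rather than read directly. A common measure is cosine similarity, sketched here in plain Python. `cosine_similarity` is a helper written for this example, not something the server or the `openai` package provides.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

To compare two prompts, call `openai.embeddings.create` once per prompt and pass the two `response.data[0].embedding` lists to this function.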

## Chat Completion

[OpenAI API Reference](https://platform.openai.com/docs/api-reference/chat)


In [25]:
# Modify this prompt to generate different outputs
prompt = "Write a haiku about Taco Bell's Doritos Locos Tacos."


response = openai.chat.completions.create(
    model=DEFAULT_LLM,
    messages=[{"role": "user", "content": prompt}],
    temperature=DEFAULT_TEMPERATURE,
    max_tokens=DEFAULT_MAX_TOKENS,
    top_p=DEFAULT_TOP_P,
    stream=False,
    extra_body={"system_message": SYSTEM_MESSAGE},
)
display_content(response.messages[1]["content"])

Crunchy orb in hand,

  Spicy cheese explosion,

  Tasty Taco joy!


## Completion

[OpenAI API Reference](https://platform.openai.com/docs/api-reference/completions/create)


In [24]:
# Modify this prompt to generate different outputs
prompt = "Write a haiku about Taco Bell's Doritos Locos Tacos."

completion = openai.completions.create(
    model=DEFAULT_LLM,
    prompt=prompt,
    temperature=DEFAULT_TEMPERATURE,
    max_tokens=DEFAULT_MAX_TOKENS,
    top_p=DEFAULT_TOP_P,
    n=1,
    stream=False,
    extra_body={"system_message": SYSTEM_MESSAGE},
)
display_content(completion.choices[0].text)

Crunchy fusion blend,
   Tortilla turned chip, now relish,
   DLT delicacy.


## Cloning Text to Speech

Any `wav` file in the `voices` directory can be used as a voice.


In [26]:
prompt = "Write a haiku about Taco Bell's Doritos Locos Tacos."
response = requests.post(
    f"{LOCAL_LLM_SERVER}/v1/audio/generation",
    headers=HEADERS,
    json={
        "text": prompt,
        "voice": "DukeNukem",
        "language": "en",
    },
)
audio_response = response.json()
display_content(audio_response["data"])

UklGRkaiAwBXQVZFZm10IBAAAAABAAEAwF0AAIC7AAACABAATElTVBoAAABJTkZPSVNGVA4AAABMYXZmNTguNzYuMTAwAGRhdGEAogMAGAANABkAJwAoACwAPABEAD8ARABGAEYAQgBFAEkARABIAE8ATQBMAEQARwBCAD8AQwBAAD0APQBBAD4AOgA7ADcAMwAwADUANAA6ADQAMAApACIAIAAfABwAGwAhABoAFwAdACcAIgAlAB8AJAAeAB4AHgAdACMAIAAtACcAJwAhACcALgArADAANQAxAC8AMwAzADEAPgA+AC0ALQA+ADgANAArAC4AJgA0ADMANwA6ADUAOQA9AD8ANgA0ADcAPQBBAEYARgA+AEcATQBCADwAQABLAEMATgBUAFEAYQBiAFMAVABLAEkAQQA5ADoANgBMAE0ARwA3ACwALwAjACkAKgAnACwAJQAlAB8AMAAwAC8AMAAlACcAHAAnACQAHgAgABUAHAAhACwAKwArACMAIAApACAAJgAuACMALQAuACMAHwAyAC4AIwAlACUAKwAzADQALwApADUAMwAuACYAHwAtABcAGgAiABoAHgAiACEAEwAUAB0AFQASABAAHwAWABUAFQAZABgAEgAZABQADwAZABAAFgAfAA0AHQAYABgAFwAlABIAHAAQAAoAEwAJAAMA/f8CAPb/BwD+/w8ABAD+//z/+//y/+n/3v/U/8//wv/H/77/p/+f/5T/hv+K/3n/c/9m/2L/aP91/3n/jP+Q/5T/kv+k/73/x//d/+//AwAGACQAIABQAFAAagCEAIYAnwDNALMApQCeAFAAUgCJANcALAGVAeABJQJlAnsCewItAtgBSgGrAO//Kf9b/pT9vPzv+yf7dvrl+XD5E/nz+AD5NvmK+Qn6tPpo+xL8xfx0/R3+u/5X/9z/aADhAH8B6wF6Av4CfQP6A1oE7gRoBfwFXAYSB34H9AdECKUItgi3CH4ISgjWBzkH
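The printed string above is base64-encoded WAV data (the leading `UklGR` is how a `RIFF` header looks in base64). As a rough sketch, the standard-library `wave` module can decode such a string and report its basic properties. `wav_info` is a hypothetical helper; the example builds its own one-second silent clip so that it runs without the server.

```python
import base64
import io
import wave


def wav_info(b64_audio):
    """Decode base64 WAV data and return (channels, sample_rate, duration_seconds)."""
    raw = base64.b64decode(b64_audio)
    with wave.open(io.BytesIO(raw), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate


# Build a one-second, mono, 16-bit silent clip for demonstration.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(24000)
    wav.writeframes(b"\x00\x00" * 24000)
demo = base64.b64encode(buffer.getvalue()).decode("ascii")

print(wav_info(demo))  # (1, 24000, 1.0)
```

In the notebook, `wav_info(audio_response["data"])` would be the equivalent call, assuming the server returns a complete WAV file in that field.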

## Speech to Text


In [27]:
# Transcribe the audio generated in the previous cell.
transcription = requests.post(
    f"{LOCAL_LLM_SERVER}/v1/audio/transcriptions",
    json={
        "file": audio_response["data"],
        "audio_format": "wav",
        "model": "base.en",
    },
    headers=HEADERS,
)


print(transcription.json())

{'data': " Write a haiku about Taco Bell's Doritos. Locos."}
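The transcription above is close to the original text but not identical. One way to quantify the round trip is a word-level similarity score, sketched below with the standard-library `difflib`. The helper names are hypothetical, and the normalization (lowercasing and stripping punctuation) is an illustrative choice.

```python
import difflib
import string


def normalize(text):
    """Lowercase, remove ASCII punctuation, and collapse whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def transcript_similarity(expected, transcribed):
    """Word-level similarity between two texts, from 0.0 to 1.0."""
    a = normalize(expected).split()
    b = normalize(transcribed).split()
    return difflib.SequenceMatcher(None, a, b).ratio()


spoken = "Write a haiku about Taco Bell's Doritos Locos Tacos."
heard = " Write a haiku about Taco Bell's Doritos. Locos."
print(round(transcript_similarity(spoken, heard), 2))  # 0.94
```

Here eight of the nine spoken words were recovered, so the score is 16/17.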


## Voice Completion Example


In [28]:
# Use the audio generated in the Cloning Text to Speech cell as the prompt.
completion = openai.completions.create(
    model=DEFAULT_LLM,
    prompt=audio_response["data"],
    temperature=DEFAULT_TEMPERATURE,
    max_tokens=DEFAULT_MAX_TOKENS,
    top_p=DEFAULT_TOP_P,
    n=1,
    stream=False,
    extra_body={
        "system_message": SYSTEM_MESSAGE,
        "audio_format": "wav",
        "voice": "DukeNukem",
    },
)

response_text = completion.choices[0].text
display_content(response_text)

URL: http://localhost:8091/outputs/28ce3345c4fb4eb3979680d11204fbc2.wav


Crunchy shell,
   Spicy filling gleams bright, 
   Taco joy, Doritos.

