# LLM

## Compressing Prompts With Minimal Loss with `llmlingua`

Here is how to reduce the costs of working with LLMs.

When working with LLMs, we often encounter problems like exceeding token limits, losing context, or paying much more for usage than expected.

Researchers from Microsoft try to solve these problems with `llmlingua`.

`llmlingua` compresses your prompt by using a small, well-trained language model to detect and remove unimportant tokens.

They claim to achieve up to 20x compression with little or no performance loss.

I tried it myself and noticed no performance loss, but I would still be cautious about using it in critical applications.

In [None]:
!pip install llmlingua

In [None]:
from llmlingua import PromptCompressor

prompt = "<YOUR_PROMPT>"
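
# Load the small language model that scores which tokens can be dropped
llm_lingua = PromptCompressor("lgaalves/gpt2-dolly")

# Compress the prompt, targeting about 200 tokens
compressed_prompt = llm_lingua.compress_prompt(
    prompt, instruction="", question="", target_token=200
)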

# {'compressed_prompt': 'are- that turns into formatting & with like "[]" best it.......',
#  'origin_tokens': 2430,
#  'compressed_tokens': 261,
#  'ratio': '9.3x',
#  'saving': 'Saving $0.1 in GPT-4.'}
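
The `ratio` field in the sample output appears to be the original token count divided by the compressed token count. A minimal sketch of that arithmetic follows. The helper name `compression_stats` and the price of $0.06 per 1,000 tokens are assumptions made for illustration, not values taken from `llmlingua`.

```python
def compression_stats(origin_tokens, compressed_tokens, price_per_1k_tokens=0.06):
    """Summarize how much a prompt shrank and estimate the cost saving.

    price_per_1k_tokens is an assumed, illustrative price, not an official rate.
    """
    if origin_tokens <= 0 or compressed_tokens <= 0:
        raise ValueError("token counts must be positive")
    tokens_saved = origin_tokens - compressed_tokens
    return {
        # How many times smaller the compressed prompt is
        "ratio": f"{origin_tokens / compressed_tokens:.1f}x",
        "tokens_saved": tokens_saved,
        # Rough saving on input tokens at the assumed price
        "estimated_saving_usd": round(tokens_saved * price_per_1k_tokens / 1000, 2),
    }


# Token counts copied from the sample output
stats = compression_stats(origin_tokens=2430, compressed_tokens=261)
print(stats)
```

With the token counts from the sample output, the computed ratio is `9.3x`, which matches the reported value.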

## One-Function Call to Any LLM with `litellm`

Do you want to call any LLM in Python with a single function?

Try `litellm`.

`litellm` is a Python package that lets you call any LLM with a consistent input format and get back a consistent output format.

You only need to set the API key of the provider and the model name.

It also supports async calls and streaming the model's response.
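
A minimal sketch of both features, assuming `litellm` is installed and an API key is set: `acompletion` is the async counterpart of `completion`, and passing `stream=True` yields OpenAI-style chunks whose text sits in `choices[0].delta.content`. The helper names `join_stream`, `stream_reply`, `async_reply`, and `fake_chunk` are hypothetical, and the stub chunks only imitate the chunk shape so the joining logic can be tried offline.

```python
from types import SimpleNamespace


def join_stream(chunks):
    """Concatenate the text deltas from an OpenAI-style stream of chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (for example the final one) carry no text
            parts.append(delta)
    return "".join(parts)


def stream_reply(model, messages):
    """Stream a reply through litellm and return the full text."""
    # Imported lazily so join_stream can be tried without litellm installed
    from litellm import completion

    return join_stream(completion(model=model, messages=messages, stream=True))


async def async_reply(model, messages):
    """Await a single reply using litellm's async entry point."""
    from litellm import acompletion

    response = await acompletion(model=model, messages=messages)
    return response.choices[0].message.content


def fake_chunk(text):
    """Offline stand-in for one streamed chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = join_stream([fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)])
print(demo)
```

With real credentials, `stream_reply("gpt-3.5-turbo", messages)` would return the assembled reply, and `asyncio.run(async_reply("gpt-3.5-turbo", messages))` would do the same through the async path.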

In [None]:
!pip install litellm

In [None]:
import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-api-key"
os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
os.environ["MISTRAL_API_KEY"] = "your-api-key"

# The same OpenAI-style message list is sent to every provider
messages = [{"role": "user", "content": "Hello, how are you?"}]

# OpenAI
response = completion(model="gpt-3.5-turbo", messages=messages)

# Anthropic
response = completion(model="claude-instant-1", messages=messages)

# Mistral
response = completion(model="mistral/mistral-tiny", messages=messages)
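
Because each provider's reply comes back in the same OpenAI-style shape, the text can be read the same way regardless of the model. Here is a small sketch of that idea, where `ask_all` and `fake_completion` are hypothetical names. The stub stands in for `litellm.completion` so that the snippet runs without API keys.

```python
from types import SimpleNamespace


def ask_all(models, messages, completion_fn):
    """Send the same messages to each model and collect the reply text per model."""
    replies = {}
    for model in models:
        response = completion_fn(model=model, messages=messages)
        # The same access path works for every provider
        replies[model] = response.choices[0].message.content
    return replies


def fake_completion(model, messages):
    """Offline stand-in that mimics the response shape (illustration only)."""
    text = f"{model} got: {messages[-1]['content']}"
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])


messages = [{"role": "user", "content": "Hello, how are you?"}]
replies = ask_all(["gpt-3.5-turbo", "mistral/mistral-tiny"], messages, fake_completion)
for model, text in replies.items():
    print(f"{model}: {text}")

# With real credentials, pass litellm's function instead of the stub:
# from litellm import completion
# replies = ask_all(["gpt-3.5-turbo", "mistral/mistral-tiny"], messages, completion)
```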