# Unify
Unify is a platform that dynamically routes queries to the most appropriate endpoint among those served by providers such as OpenAI, MistralAI, Perplexity AI, Together AI and more. A single API key gives you single sign-on access to all the providers supported by the [platform](https://unify.ai/hub).

When using models through the Unify API endpoint, you get:
- Single sign-on with all providers.
- Dynamic routing to the best provider for your use case.

First, install LlamaIndex 🦙 and the Unify integration.

In [31]:
%pip install llama-index-llms-unify

In [2]:
!pip install llama-index

## Installation and Setup

- Get your Unify API key from the [Unify console](https://console.unify.ai/).
- Set the `UNIFY_API_KEY` environment variable to that key.

In [10]:
import os
os.environ["UNIFY_API_KEY"] = "<YOUR API KEY>"
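If you would rather not hard-code the key in a notebook cell, one option is to prompt for it only when the environment variable is missing. The following is a minimal sketch; `ensure_api_key` and its `prompt_fn` parameter are hypothetical helpers introduced here for illustration and are not part of the Unify integration.

```python
import os
from getpass import getpass


def ensure_api_key(env_var="UNIFY_API_KEY", prompt_fn=getpass):
    """Return the key stored in env_var, prompting for it only if it is unset.

    Hypothetical helper: prompt_fn defaults to getpass so the key is not
    echoed to the screen, and can be swapped out for testing.
    """
    key = os.environ.get(env_var)
    if not key:
        key = prompt_fn(f"Enter {env_var}: ")
        os.environ[env_var] = key  # make it visible to later cells
    return key


# Usage in a notebook cell (prompts only when the variable is not set):
# ensure_api_key()
```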

Call `complete` with a prompt:

In [12]:
from llama_index.llms.unify import Unify
llm = Unify(model="llama-2-70b-chat@deepinfra")
llm.complete("How many companies serve chat model endpoints?")

CompletionResponse(text="  It's difficult to give an exact number of companies that serve chat model endpoints as it constantly changes due to various factors such as market demand, technological advancements, and new entrants in the field. However, there are numerous companies that offer chatbot development and deployment services, and many of them provide pre-built chat models that can be integrated into various platforms.\n\nHere are some of the popular companies that offer chatbot development services and pre-built chat models:\n\n1. Dialogflow (formerly known as API.ai) - Dialogflow is a Google-owned company that provides a platform for building conversational interfaces. It offers a wide range of pre-built chat models and integrations with popular messaging platforms.\n2. IBM Watson Assistant - IBM Watson Assistant is a cloud-based AI platform that allows businesses to create and deploy conversational AI solutions. It provides a range of pre-built chat models and integrations wit

One advantage of using Unify is that a single API key gives you access to endpoints from different providers like Perplexity, Together AI, OctoAI and more.

In [15]:
llm_together = Unify(model="llama-2-70b-chat@together-ai")
print("Together AI response: \n", llm_together.complete("Why are you called llama, isn't it a goat or something?"))

llm_anyscale = Unify(model="llama-2-70b-chat@anyscale")
print("Anyscale response: \n", llm_anyscale.complete("Isn't goat a better name than llama?"))

Together AI response: 
   I'm called Llama because I'm a machine learning model, and my creators wanted to give me a unique and memorable name. They chose the name Llama because it's a nod to the animal of the same name, which is known for its intelligence, social behavior, and ability to communicate with humans.

While I'm not a goat, I am a computer program designed to simulate conversation and answer questions to the best of my ability. My name is meant to evoke the idea of a friendly, intelligent creature that can help people communicate and learn.

So, while I may not be a goat, I'm still a helpful and friendly AI assistant, and I'm here to assist you with any questions or tasks you may have!
Anyscale response: 
   Both goat and llama are good names, but they have different connotations and associations.

Goat is a more general term that can refer to any member of the Bovidae family, including both male and female goats. It's a simple and straightforward name that is easily recogn
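
The calls above address an endpoint with a string of the form `model@provider`. To send the same prompt to several providers, those strings can be built in a loop. This is a small sketch; `endpoint_ids` is a hypothetical helper, and the provider names are simply the two used above.

```python
def endpoint_ids(model, providers):
    """Build 'model@provider' strings, the format used in the calls above."""
    return [f"{model}@{provider}" for provider in providers]


ids = endpoint_ids("llama-2-70b-chat", ["together-ai", "anyscale"])
print(ids)

# Each id can then be passed to the integration, for example:
# for endpoint in ids:
#     print(endpoint, Unify(model=endpoint).complete("Isn't goat a better name than llama?"))
```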

## Streaming and Dynamic routing
You may want to use `mixtral-8x7b-instruct-v0.1` with the fastest output speed without caring which provider serves it. Unify's dynamic routing takes this data from our live benchmarks and routes you to the provider that is performing best at that instant.

You can see how to configure dynamic routing [here](https://unify.ai/docs/hub/concepts/runtime_routing.html#how-to-use-it).

In [16]:
llm = Unify(model="mixtral-8x7b-instruct-v0.1@tks-per-sec")
response = llm.stream_complete("Why is mixtral so competitive with bigger models?")

In [25]:
for r in response:
    print(r.delta, end="")
print("Model and provider used were: ", r.raw['model'])

Model and provider used were:  mixtral-8x7b-instruct-v0.1@fireworks-ai
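
The value reported in `raw['model']` above has the same `model@provider` shape as the strings passed to `Unify`. If you want to log which provider the router picked, you can split that string. This is a sketch that assumes the reported value always contains an `@`; `split_endpoint` is a hypothetical helper, not part of the integration.

```python
def split_endpoint(endpoint):
    """Split 'model@provider' into (model, provider).

    Splits on the last '@', so an '@' inside the model name would be kept.
    """
    model, sep, provider = endpoint.rpartition("@")
    if not sep:
        raise ValueError(f"expected 'model@provider', got {endpoint!r}")
    return model, provider


model, provider = split_endpoint("mixtral-8x7b-instruct-v0.1@fireworks-ai")
print("Model:", model)
print("Provider:", provider)
```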


## Async and cost-based routing
You can also run this asynchronously and optimize for cost. Suppose you need to run large volumes of data through a model by tomorrow, and you care more about saving on input costs than about speed. Unify's dynamic router can be used for this too!

In [30]:
llm = Unify(model="mixtral-8x7b-instruct-v0.1@input-cost")
response = await llm.acomplete("What is the difference between sync and async code?")
print(response)
print("Model and provider used were: ", response.raw['model'])

 Synchronous (sync) code and asynchronous (async) code are two different ways to write code that allow you to control the flow of execution in a program.

Synchronous code is executed in a sequential manner, meaning that each statement is executed one after the other, and the program waits for the current statement to complete before moving on to the next one. This is useful when you want to ensure that the code is executed in a specific order, and you need to wait for the result of one operation before proceeding to the next. However, if one of the operations takes a long time to complete, the entire program will be blocked and unresponsive until the operation is finished.

Asynchronous code, on the other hand, allows you to execute multiple operations concurrently, without blocking the execution of other code. This is achieved by allowing the program to continue executing other statements while waiting for the result of an asynchronous operation. Asynchronous code is typically used w
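
For the large-volume case described in this section, many prompts can be sent concurrently rather than one at a time. The sketch below caps the number of in-flight requests with a semaphore; `complete_many` and `max_concurrency` are hypothetical names, and the function accepts any async callable, so `llm.acomplete` from the cell above is one possible argument. Suitable concurrency limits depend on your account and provider, which this example does not know.

```python
import asyncio


async def complete_many(complete_fn, prompts, max_concurrency=5):
    """Run complete_fn over prompts concurrently, returning results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(prompt):
        # At most max_concurrency calls are awaited at the same time.
        async with semaphore:
            return await complete_fn(prompt)

    return await asyncio.gather(*(run_one(p) for p in prompts))


# Usage in a notebook cell, assuming `llm` is the Unify instance created above:
# responses = await complete_many(llm.acomplete, ["First prompt", "Second prompt"])
```

`asyncio.gather` preserves the order of its inputs, so each response lines up with the prompt at the same index.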