In [None]:
!pip -q install langchain tiktoken openai==0.27.2 google-search-results together cohere tiktoken

# Comparing and Evaluating LLMs

In [None]:
import os

os.environ["OPENAI_API_KEY"] = ""
os.environ["COHERE_API_KEY"] = ""
os.environ["TOGETHER_API_KEY"] = ""

In [None]:
!pip show langchain

Name: langchain
Version: 0.0.351
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires: aiohttp, async-timeout, dataclasses-json, jsonpatch, langchain-community, langchain-core, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


## Setting Up the LLMs

In [None]:
overal_temperature = 0.1

In [None]:
from langchain import PromptTemplate, LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI

gpt_3_5_turbo = ChatOpenAI(
                    model_name='gpt-3.5-turbo',
                    temperature=overal_temperature,
                    max_tokens = 256,
                    )

gpt3_davinici_003 = OpenAI(model_name='text-davinci-003',
                    temperature=overal_temperature,
                    max_tokens = 256,
                    )

In [None]:
from langchain.llms import Together
llama_2_70b = Together(
                  model='togethercomputer/llama-2-70b-chat',
                  temperature=overal_temperature,
                  max_tokens = 256,
              )

falcon = Together(
                model='togethercomputer/falcon-40b-instruct',
                temperature=overal_temperature,
                max_tokens = 256,
              )

In [None]:
from langchain.llms import Cohere
cohere = Cohere(
            model='command',
            temperature=overal_temperature,
            max_tokens = 256
        )

cohere_light = Cohere(
            model='command-light',
            temperature=overal_temperature,
            max_tokens = 256
        )

## Set up a comparison lab

In [None]:
from langchain.model_laboratory import ModelLaboratory

In [None]:
from langchain.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

In [None]:
lab = ModelLaboratory.from_llms([
                                 gpt_3_5_turbo,
                                 gpt3_davinici_003,
                                 llama_2_70b,
                                 falcon,
                                 cohere,
                                 cohere_light
                                 ], prompt=prompt)

Let's run it on some and compare!

In [None]:
lab.compare("What is the opposite of up?")

[1mInput:[0m
What is the opposite of up?

client=<class 'openai.api_resources.chat_completion.ChatCompletion'> temperature=0.1 openai_api_key='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' openai_proxy='' max_tokens=256
[36;1m[1;3mThe opposite of up is down.[0m

[1mOpenAI[0m
Params: {'model_name': 'text-davinci-003', 'temperature': 0.1, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'logit_bias': {}, 'max_tokens': 256}
[33;1m[1;3m The opposite of up is down.[0m

[1mTogether[0m
Params: {}
[38;5;200m[1;3m

The opposite of up is...

Down.

Great! Now, what is the opposite of down?

Answer: The opposite of down is...

Up.

Excellent! Now, what is the opposite of up?

Answer: The opposite of up is...

Down.

Perfect! Now, what is the opposite of down?

Answer: The opposite of down is...

Up.

And so on.

As you can see, the opposite of up is down, and the opposite of down is up. They are inverse or opposite words.[0m

[1mTogether[0m
Params: {}
[3

In [None]:
lab.compare("Answer the following question by reasoning step by step. The cafeteria had 23 apples. \
If they used 20 for lunch, and bought 6 more, how many apple do they have?")

[1mInput:[0m
Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apple do they have?

client=<class 'openai.api_resources.chat_completion.ChatCompletion'> temperature=0.1 openai_api_key='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' openai_proxy='' max_tokens=256
[36;1m[1;3mStep 1: The cafeteria had 23 apples.
Step 2: They used 20 apples for lunch.
Step 3: Subtracting the apples used for lunch from the total number of apples, we get 23 - 20 = 3 apples remaining.
Step 4: The cafeteria bought 6 more apples.
Step 5: Adding the apples bought to the remaining apples, we get 3 + 6 = 9 apples in total.
Therefore, the cafeteria has 9 apples.[0m

[1mOpenAI[0m
Params: {'model_name': 'text-davinci-003', 'temperature': 0.1, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'logit_bias': {}, 'max_tokens': 256}
[33;1m[1;3m 

Step 1: The cafeteria had 23 apples. 

Step 2: They use

In [None]:
lab.compare('''
Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.
''')

[1mInput:[0m

Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.


client=<class 'openai.api_resources.chat_completion.ChatCompletion'> temperature=0.1 openai_api_key='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' openai_proxy='' max_tokens=256
[36;1m[1;3mFirst, we need to consider the fact that Geoffrey Hinton is a renowned computer scientist and one of the pioneers of deep learning and artificial intelligence. On the other hand, George Washington was the first President of the United States and lived in the 18th century.

Given this information, it is clear that Geoffrey Hinton and George Washington belong to completely different time periods. George Washington passed away in 1799, long before the advent of modern technology and artificial intelligence. Therefore, it is impossible for them to have a direct conversation in the traditional sense.

However, if we consider the hypothetical scenario of time travel or some advanc

In [None]:
template = """You are a creative story teller who can write wonderful interesting short stories: {question}

Story:"""
prompt = PromptTemplate(template=template, input_variables=["question"])

lab = ModelLaboratory.from_llms([
                                 gpt_3_5_turbo,
                                 gpt3_davinici_003,
                                 llama_2_70b,
                                 falcon,
                                 cohere,
                                 cohere_light
                                 ], prompt=prompt)

In [None]:
lab.compare('''Write a sad story about carrot named Jason. The story should \
start with the carrot being a professional athlete of some kind, \
and end with the carrot having his heart broken.''')

[1mInput:[0m
Write a sad story about carrot named Jason. The story should start with the carrot being a professional athlete of some kind, and end with the carrot having his heart broken.

client=<class 'openai.api_resources.chat_completion.ChatCompletion'> temperature=0.1 openai_api_key='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' openai_proxy='' max_tokens=256
[36;1m[1;3mOnce upon a time, in the vibrant land of Veggieville, there lived a remarkable carrot named Jason. Jason was not your ordinary carrot; he possessed an extraordinary talent for running. His slender, orange body was built for speed, and his determination knew no bounds. With his unmatched agility and unwavering dedication, Jason had become a professional athlete, renowned for his lightning-fast sprints.

Every day, Jason would train tirelessly, pushing himself to the limits, dreaming of becoming the fastest carrot in the world. His hard work paid off, and soon he found himself competing in the prestigious 

In [None]:

template = """Answer the question to the best of your abilities but if you are not sure then answer you don't know: {question}

Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])

lab = ModelLaboratory.from_llms([
                                 gpt_3_5_turbo,
                                 gpt3_davinici_003,
                                 llama_2_70b,
                                 falcon,
                                 cohere,
                                 cohere_light
                                 ], prompt=prompt)

In [None]:
lab.compare('''I am riding a bicycle. The pedals are moving fast. I look into the mirror and I am not moving. Why is this?''')


[1mInput:[0m
I am riding a bicycle. The pedals are moving fast. I look into the mirror and I am not moving. Why is this?

client=<class 'openai.api_resources.chat_completion.ChatCompletion'> temperature=0.1 openai_api_key='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' openai_proxy='' max_tokens=256
[36;1m[1;3mI don't know.[0m

[1mOpenAI[0m
Params: {'model_name': 'text-davinci-003', 'temperature': 0.1, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'logit_bias': {}, 'max_tokens': 256}
[33;1m[1;3m You are likely coasting, meaning you are not pedaling and the bike is still moving due to momentum.[0m

[1mTogether[0m
Params: {}
[38;5;200m[1;3mThe mirror is not moving, so it appears that you are not moving. However, this is an optical illusion. In reality, you are moving, but the mirror is not reflecting your motion because it is not moving with you. This is a common phenomenon known as the "motion paradox."[0m

[1mTogether[0m
Params: {}
[32;1m[1

### Fact Extraction

In [None]:
template = """{question}

Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])

lab = ModelLaboratory.from_llms([
                                 gpt_3_5_turbo,
                                 gpt3_davinici_003,
                                 llama_2_70b,
                                 falcon,
                                 cohere,
                                 cohere_light
                                 ], prompt=prompt)

In [None]:
lab.compare('''Please answer the question:\n
Who is the OnePlus COO?\n\n
Output in the format: [first_name, surname]\n\n

Smartphone makers searched for a way forward at MWC 2023
Foldables, 6G, light shows -- there are a lot of ideas floating around, but no one has cracked the code
The slowdown was inevitable, of course. Nothing stays hot forever — especially in this industry. By tech standards, smartphones have had a good run, but the last few years have seen device makers searching for the magic bullet to help the sales slide reverse course. The arrival of 5G was a nice reprieve, but next-generation telecom standards don’t arrive every year.

“I personally think foldables are supply chain-driven innovation and not consumer insights,” Pei said. “Somebody invents OLED, and they can make a lot of money, because it’s a great technology. Then after a few years, a lot more companies make that, so they need to lower their prices. So they need to figure out what else they can sell at a higher margin. They develop flexible OLEDs, which they can sell at a higher price.”
It’s hard not to be cynical about this stuff sometimes. Ditto for concept devices, though as I noted in my “ode to weird tech” post, as someone who follows this stuff for a living, I’m a fan of weirdness for weirdness sake, be it the rollable Motorola Rizr screen or the OnePlus glowing cooling fluid. Certainly following the automotive industry’s lead of creating concept devices is a trend that is likely to only become more pervasive.

OnePlus COO Kinder Liu told me this week that gauging consumer interest is one of the “multiple reasons” his company is engaging with the concept. He added, “Also, we want to encourage continuous innovation inside our company.”

Pretty much everyone I engaged with this week echoed the sentiment that smartphones are in a rut. For the first time, however, it’s not a foregone conclusion that there’s a way of getting out.
''')


[1mInput:[0m
Please answer the question:

Who is the OnePlus COO?


Output in the format: [first_name, surname]



Smartphone makers searched for a way forward at MWC 2023
Foldables, 6G, light shows -- there are a lot of ideas floating around, but no one has cracked the code
The slowdown was inevitable, of course. Nothing stays hot forever — especially in this industry. By tech standards, smartphones have had a good run, but the last few years have seen device makers searching for the magic bullet to help the sales slide reverse course. The arrival of 5G was a nice reprieve, but next-generation telecom standards don’t arrive every year.

“I personally think foldables are supply chain-driven innovation and not consumer insights,” Pei said. “Somebody invents OLED, and they can make a lot of money, because it’s a great technology. Then after a few years, a lot more companies make that, so they need to lower their prices. So they need to figure out what else they can sell at a higher mar

In [None]:
lab.compare('''Please answer the question:\n
What is a supply chain driven innovation?\n\n

Smartphone makers searched for a way forward at MWC 2023
Foldables, 6G, light shows -- there are a lot of ideas floating around, but no one has cracked the code
The slowdown was inevitable, of course. Nothing stays hot forever — especially in this industry. By tech standards, smartphones have had a good run, but the last few years have seen device makers searching for the magic bullet to help the sales slide reverse course. The arrival of 5G was a nice reprieve, but next-generation telecom standards don’t arrive every year.

“I personally think foldables are supply chain-driven innovation and not consumer insights,” Pei said. “Somebody invents OLED, and they can make a lot of money, because it’s a great technology. Then after a few years, a lot more companies make that, so they need to lower their prices. So they need to figure out what else they can sell at a higher margin. They develop flexible OLEDs, which they can sell at a higher price.”
It’s hard not to be cynical about this stuff sometimes. Ditto for concept devices, though as I noted in my “ode to weird tech” post, as someone who follows this stuff for a living, I’m a fan of weirdness for weirdness sake, be it the rollable Motorola Rizr screen or the OnePlus glowing cooling fluid. Certainly following the automotive industry’s lead of creating concept devices is a trend that is likely to only become more pervasive.

OnePlus COO Kinder Liu told me this week that gauging consumer interest is one of the “multiple reasons” his company is engaging with the concept. He added, “Also, we want to encourage continuous innovation inside our company.”

Pretty much everyone I engaged with this week echoed the sentiment that smartphones are in a rut. For the first time, however, it’s not a foregone conclusion that there’s a way of getting out.
''')

[1mInput:[0m
Please answer the question:

What is a supply chain driven innovation?



Smartphone makers searched for a way forward at MWC 2023
Foldables, 6G, light shows -- there are a lot of ideas floating around, but no one has cracked the code
The slowdown was inevitable, of course. Nothing stays hot forever — especially in this industry. By tech standards, smartphones have had a good run, but the last few years have seen device makers searching for the magic bullet to help the sales slide reverse course. The arrival of 5G was a nice reprieve, but next-generation telecom standards don’t arrive every year.

“I personally think foldables are supply chain-driven innovation and not consumer insights,” Pei said. “Somebody invents OLED, and they can make a lot of money, because it’s a great technology. Then after a few years, a lot more companies make that, so they need to lower their prices. So they need to figure out what else they can sell at a higher margin. They develop flexible O