<a href="https://colab.research.google.com/github/bobhuff0/langchain-tutorials/blob/main/YT_Langchain_Evaluating_and_Comparing_LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [21]:
!pip -q install langchain huggingface_hub openai==0.27.2 google-search-results tiktoken cohere

# Comparing and Evaluating LLMs

In [38]:
import os

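# Placeholders: replace each value with your own key before running.
# The literal strings below are not valid credentials.
os.environ["OPENAI_API_KEY"] = "OPENAI_API_KEY"
os.environ["COHERE_API_KEY"] = "COHERE_API_KEY"
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "HUGGINGFACEHUB_API_TOKEN"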
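
The values above are placeholders, and the `AuthenticationError` seen in some of the runs further down is consistent with a key that was never filled in. A minimal sketch of a pre-flight check, assuming the three variable names used in this notebook; `find_unset_keys` is a hypothetical helper, not part of LangChain:

```python
import os

# Variable names taken from the cell above.
REQUIRED_KEYS = ["OPENAI_API_KEY", "COHERE_API_KEY", "HUGGINGFACEHUB_API_TOKEN"]


def find_unset_keys(names, env=None):
    """Return the names that are missing, empty, or still equal to their own name."""
    env = os.environ if env is None else env
    return [n for n in names if not env.get(n) or env.get(n) == n]


unset = find_unset_keys(REQUIRED_KEYS)
if unset:
    print("Set real values for:", ", ".join(unset))
```

Running this before creating any model objects gives an early, readable warning instead of a failure in the middle of a comparison.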

In [39]:
!pip show langchain

Name: langchain
Version: 0.0.181
Summary: Building applications with LLMs through composability
Home-page: https://www.github.com/hwchase17/langchain
Author: 
Author-email: 
License: MIT
Location: /Users/robertjhuff/anaconda3/envs/py311AI/lib/python3.11/site-packages
Requires: aiohttp, dataclasses-json, numexpr, numpy, openapi-schema-pydantic, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: 


## Setting Up the LLMs

In [40]:
overal_temperature = 0.1

#### Setting up Flan models


In [41]:
from langchain import PromptTemplate, HuggingFaceHub, LLMChain


flan_20B = HuggingFaceHub(repo_id="google/flan-ul2", 
                         model_kwargs={"temperature":overal_temperature, 
                                       "max_new_tokens":200}
                         ) 

In [42]:
flan_t5xxl = HuggingFaceHub(repo_id="google/flan-t5-xxl", 
                         model_kwargs={"temperature":overal_temperature, 
                                       "max_new_tokens":200}
                         ) 

In [43]:
# unfortunately not working
# GPTNeoXT_20B = HuggingFaceHub(repo_id="togethercomputer/GPT-NeoXT-Chat-Base-20B", 
#                          model_kwargs={"temperature":overal_temperature, 
#                                        "max_new_tokens":200}
#                          )

In [44]:
# unfortunately not working
# bloom7B = HuggingFaceHub(repo_id="bigscience/bloom-7b1", 
#                          model_kwargs={"temperature":overal_temperature, 
#                                        "max_new_tokens":200}
#                          ) 

gpt_j6B = HuggingFaceHub(repo_id="EleutherAI/gpt-j-6B", 
                         model_kwargs={"temperature":overal_temperature, 
                                       "max_new_tokens":100}
                         )

#### Setting up OpenAI models

In [45]:
from langchain.llms import OpenAI, OpenAIChat

chatGPT_turbo = OpenAIChat(model_name='gpt-3.5-turbo', 
             temperature=overal_temperature, 
             max_tokens = 256,
             )

gpt3_davinici_003 = OpenAI(model_name='text-davinci-003', 
             temperature=overal_temperature, 
             max_tokens = 256,
             )

#### Setting up Cohere models

In [46]:
from langchain.llms import Cohere

In [47]:
cohere_command_xl = Cohere(model='command-xlarge', 
             temperature=0.1, 
             max_tokens = 256)

In [48]:
cohere_command_xl_nightly = Cohere(model='command-xlarge-nightly',
             temperature=0.1, 
             max_tokens = 256)

## Set up a comparison lab

In [49]:
from langchain.model_laboratory import ModelLaboratory

In [50]:
from langchain.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])

In [55]:
chatGPT_turbo(prompt=prompt)

ValueError: Argument `prompt` is expected to be a string. Instead found <class 'langchain.prompts.prompt.PromptTemplate'>. If you want to run the LLM on multiple prompts, use `generate` instead.
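
The error above arises because the model object expects a plain string, while `prompt` is a `PromptTemplate`. A minimal sketch of the rendering step in plain Python, using the same template text. With LangChain the equivalent should be `prompt.format(question=...)`, whose result can then be passed to the model; treat that as an assumption to verify against the installed version:

```python
# Same template text as in the cell above.
template = """Question: {question}

Answer: Let's think step by step."""

# Fill in the placeholder to get the plain string an LLM call expects.
rendered = template.format(question="What is the opposite of up?")
print(rendered)

# With LangChain (not executed here, assumes valid API keys):
#   chatGPT_turbo(prompt.format(question="What is the opposite of up?"))
```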

In [51]:
lab = ModelLaboratory.from_llms([
                                 chatGPT_turbo, 
                                 gpt3_davinici_003,
                                 flan_20B, 
                                 cohere_command_xl
                                 ], prompt=prompt)
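
Conceptually, the laboratory fills the prompt template with each input and sends the same text to every model so the answers can be read side by side. A rough stand-in for that loop, where the "models" are ordinary Python functions invented for illustration (`compare_models`, `echo_model` and `shout_model` are hypothetical and call no API):

```python
from typing import Callable, Dict


def compare_models(models: Dict[str, Callable[[str], str]],
                   template: str, question: str) -> Dict[str, str]:
    """Render the template once, then collect each model's answer by name."""
    text = template.format(question=question)
    results = {}
    for name, model in models.items():
        results[name] = model(text)
    return results


# Hypothetical stand-ins for real LLM objects.
def echo_model(text: str) -> str:
    return text.splitlines()[0]


def shout_model(text: str) -> str:
    return text.splitlines()[0].upper()


template = "Question: {question}\n\nAnswer: Let's think step by step."
answers = compare_models({"echo": echo_model, "shout": shout_model},
                         template, "What is the opposite of up?")
for name, answer in answers.items():
    print(f"{name}: {answer}")
```

Keeping the prompt fixed and varying only the model is what makes the outputs comparable.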

Let's run it on a few questions and compare!

In [54]:
lab.compare("What is the opposite of up?")

Input:
What is the opposite of up?

OpenAIChat
Params: {'model_name': 'gpt-3.5-turbo', 'temperature': 0.1, 'max_tokens': 256}


AuthenticationError: <empty message>

In [37]:
lab.compare("Answer the following question by reasoning step by step. The cafeteria had 23 apples. \
If they used 20 for lunch, and bought 6 more, how many apples do they have?")

Input:
Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?

OpenAIChat
Params: {'model_name': 'gpt-3.5-turbo', 'temperature': 0.1, 'max_tokens': 256}


AuthenticationError: <empty message>
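
In the run above the output stops at the first model once `AuthenticationError` is raised. One way to keep a comparison going is to catch failures per model and record them. A minimal sketch with hypothetical stand-in functions (`run_all`, `broken_model` and `working_model` are illustrative names, and `RuntimeError` stands in for the provider's exception):

```python
from typing import Callable, Dict


def run_all(models: Dict[str, Callable[[str], str]], text: str) -> Dict[str, str]:
    """Call every model; store the answer or a short error note instead of aborting."""
    results = {}
    for name, model in models.items():
        try:
            results[name] = model(text)
        except Exception as exc:  # broad on purpose: providers raise different types
            results[name] = f"ERROR: {type(exc).__name__}: {exc}"
    return results


def broken_model(text: str) -> str:
    raise RuntimeError("invalid API key")


def working_model(text: str) -> str:
    return "down"


results = run_all({"broken": broken_model, "working": working_model},
                  "What is the opposite of up?")
print(results)
```

The error note keeps the failed model visible in the results, so a missing answer is not mistaken for an empty one.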

In [None]:
lab.compare('''
Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.
''')

Input:

Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering.


OpenAIChat
Params: {'model_name': 'gpt-3.5-turbo', 'temperature': 0.1, 'max_tokens': 256}

First, we need to establish that Geoffrey Hinton is a real person who is alive today, while George Washington was a historical figure who died in 1799. Therefore, it is impossible for them to have a conversation in the traditional sense.

However, if we were to imagine a hypothetical scenario where time travel was possible, and Geoffrey Hinton could travel back in time to meet George Washington, there would still be significant barriers to having a meaningful conversation.

Firstly, George Washington lived in a very different time period with different cultural norms, language, and technology. It is likely that he would struggle to understand many of the concepts and ideas that Geoffrey Hinton would want to discuss.

Secondly, Geoffrey Hinton is a computer s

In [None]:
template = """You are a creative storyteller who can write wonderful, interesting short stories: {question}

Story:"""
prompt = PromptTemplate(template=template, input_variables=["question"])

lab = ModelLaboratory.from_llms([
                                 chatGPT_turbo, 
                                 gpt3_davinici_003,
                                 gpt_j6B, 
                                 flan_20B,
                                 flan_t5xxl, 
                                 cohere_command_xl, 
                                 cohere_command_xl_nightly
                                 ], prompt=prompt)

In [None]:
lab.compare('''Write a sad story about a carrot named Jason. The story should \
start with the carrot being a professional athlete of some kind, \
and end with the carrot having his heart broken.''')

Input:
Write a sad story about a carrot named Jason. The story should start with the carrot being a professional athlete of some kind, and end with the carrot having his heart broken.

OpenAIChat
Params: {'model_name': 'gpt-3.5-turbo', 'temperature': 0.1, 'max_tokens': 256}
Jason was a carrot like no other. He was a professional athlete, a runner to be exact. He had won numerous races and had a bright future ahead of him. He was the pride of his family and the envy of his peers.

Jason had always been passionate about running. He loved the feeling of the wind in his leaves and the adrenaline rush that came with every race. He trained hard every day, pushing himself to the limit, always striving to be better.

One day, Jason met a beautiful tomato named Sarah. She was a fellow athlete, a swimmer. They hit it off immediately and soon became inseparable. They trained together, ate together, and even slept together.

Jason was head over heels in love with Sarah. 

In [None]:

template = """Answer the question to the best of your abilities but if you are not sure then answer you don't know: {question}

Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])

lab = ModelLaboratory.from_llms([
                                 chatGPT_turbo, 
                                 gpt3_davinici_003,
                                 gpt_j6B, 
                                 flan_20B,
                                 flan_t5xxl, 
                                 cohere_command_xl, 
                                 cohere_command_xl_nightly
                                 ], prompt=prompt)

In [None]:
lab.compare('''I am riding a bicycle. The pedals are moving fast. I look into the mirror and I am not moving. Why is this?''')


Input:
I am riding a bicycle. The pedals are moving fast. I look into the mirror and I am not moving. Why is this?

OpenAIChat
Params: {'model_name': 'gpt-3.5-turbo', 'temperature': 0.1, 'max_tokens': 256}
You are likely on a stationary bicycle or trainer.

OpenAI
Params: {'model_name': 'text-davinci-003', 'temperature': 0.1, 'max_tokens': 256, 'top_p': 1, 'frequency_penalty': 0, 'presence_penalty': 0, 'n': 1, 'best_of': 1, 'request_timeout': None, 'logit_bias': {}}
 You are not moving because you are coasting, meaning you are not pedaling and the bike is still in motion due to the momentum from your previous pedaling.

HuggingFaceHub
Params: {'repo_id': 'EleutherAI/gpt-j-6B', 'task': None, 'model_kwargs': {'temperature': 0.1, 'max_new_tokens': 100}}

The bicycle is moving because the bicycle is moving. The bicycle is moving because the bicycle is moving. The bicycle is moving because the bicycle is mov
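
The GPT-J output above loops on a single sentence. When many outputs are being compared, a simple repetition score can flag this kind of degeneration automatically. A minimal sketch, assuming sentences can be split on periods; `repetition_ratio` is a hypothetical helper and the 0.5 cut-off is an arbitrary choice:

```python
def repetition_ratio(text: str) -> float:
    """Fraction of sentences that repeat an earlier sentence (0.0 = no repeats)."""
    sentences = [s.strip().lower() for s in text.split(".") if s.strip()]
    if not sentences:
        return 0.0
    return 1.0 - len(set(sentences)) / len(sentences)


looping = ("The bicycle is moving because the bicycle is moving. "
           "The bicycle is moving because the bicycle is moving. "
           "The bicycle is moving because the bicycle is moving.")
normal = "You are likely on a stationary bicycle or trainer."

for label, text in [("looping", looping), ("normal", normal)]:
    ratio = repetition_ratio(text)
    print(label, round(ratio, 2), "flagged" if ratio > 0.5 else "ok")
```

This only catches exact sentence repeats; it says nothing about whether a non-repeating answer is correct.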

### Fact Extraction

In [None]:
template = """{question}

Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])

lab = ModelLaboratory.from_llms([
                                 chatGPT_turbo, 
                                 gpt3_davinici_003,
                                 gpt_j6B, 
                                 flan_20B,
                                 flan_t5xxl, 
                                 cohere_command_xl, 
                                 cohere_command_xl_nightly
                                 ], prompt=prompt)

In [None]:
lab.compare('''Please answer the question:\n
Who is the OnePlus COO?\n\n
Output in the format: [first_name, surname]\n\n

Smartphone makers searched for a way forward at MWC 2023
Foldables, 6G, light shows -- there are a lot of ideas floating around, but no one has cracked the code
The slowdown was inevitable, of course. Nothing stays hot forever — especially in this industry. By tech standards, smartphones have had a good run, but the last few years have seen device makers searching for the magic bullet to help the sales slide reverse course. The arrival of 5G was a nice reprieve, but next-generation telecom standards don’t arrive every year.

“I personally think foldables are supply chain-driven innovation and not consumer insights,” Pei said. “Somebody invents OLED, and they can make a lot of money, because it’s a great technology. Then after a few years, a lot more companies make that, so they need to lower their prices. So they need to figure out what else they can sell at a higher margin. They develop flexible OLEDs, which they can sell at a higher price.”
It’s hard not to be cynical about this stuff sometimes. Ditto for concept devices, though as I noted in my “ode to weird tech” post, as someone who follows this stuff for a living, I’m a fan of weirdness for weirdness sake, be it the rollable Motorola Rizr screen or the OnePlus glowing cooling fluid. Certainly following the automotive industry’s lead of creating concept devices is a trend that is likely to only become more pervasive.

OnePlus COO Kinder Liu told me this week that gauging consumer interest is one of the “multiple reasons” his company is engaging with the concept. He added, “Also, we want to encourage continuous innovation inside our company.”

Pretty much everyone I engaged with this week echoed the sentiment that smartphones are in a rut. For the first time, however, it’s not a foregone conclusion that there’s a way of getting out.
''')


Input:
Please answer the question:

Who is the OnePlus COO?


Output in the format: [first_name, surname]



Smartphone makers searched for a way forward at MWC 2023
Foldables, 6G, light shows -- there are a lot of ideas floating around, but no one has cracked the code
The slowdown was inevitable, of course. Nothing stays hot forever — especially in this industry. By tech standards, smartphones have had a good run, but the last few years have seen device makers searching for the magic bullet to help the sales slide reverse course. The arrival of 5G was a nice reprieve, but next-generation telecom standards don’t arrive every year.

“I personally think foldables are supply chain-driven innovation and not consumer insights,” Pei said. “Somebody invents OLED, and they can make a lot of money, because it’s a great technology. Then after a few years, a lot more companies make that, so they need to lower their prices. So they need to figure out what else they can sell at a higher mar
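
The prompt above asks for the answer in the form `[first_name, surname]`, which makes replies easy to check mechanically. A minimal sketch of such a check; `parse_name` is a hypothetical helper, and the sample replies are made up for illustration rather than taken from any model:

```python
import re
from typing import Optional, Tuple


def parse_name(reply: str) -> Optional[Tuple[str, str]]:
    """Return (first_name, surname) if the reply contains '[first, last]', else None."""
    match = re.search(r"\[\s*([^,\[\]]+?)\s*,\s*([^,\[\]]+?)\s*\]", reply)
    if match is None:
        return None
    return match.group(1), match.group(2)


# The article text names Kinder Liu as OnePlus COO, so that is the expected answer.
expected = ("Kinder", "Liu")

for reply in ["[Kinder, Liu]", " [ Kinder ,Liu ] ", "Kinder Liu"]:
    parsed = parse_name(reply)
    verdict = "correct" if parsed == expected else "wrong format or answer"
    print(repr(reply), "->", parsed, verdict)
```

Separating "followed the format" from "got the right name" helps when comparing models, since a model can succeed at one and fail at the other.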

In [None]:
lab.compare('''Please answer the question:\n
What is a supply chain driven innovation?\n\n

Smartphone makers searched for a way forward at MWC 2023
Foldables, 6G, light shows -- there are a lot of ideas floating around, but no one has cracked the code
The slowdown was inevitable, of course. Nothing stays hot forever — especially in this industry. By tech standards, smartphones have had a good run, but the last few years have seen device makers searching for the magic bullet to help the sales slide reverse course. The arrival of 5G was a nice reprieve, but next-generation telecom standards don’t arrive every year.

“I personally think foldables are supply chain-driven innovation and not consumer insights,” Pei said. “Somebody invents OLED, and they can make a lot of money, because it’s a great technology. Then after a few years, a lot more companies make that, so they need to lower their prices. So they need to figure out what else they can sell at a higher margin. They develop flexible OLEDs, which they can sell at a higher price.”
It’s hard not to be cynical about this stuff sometimes. Ditto for concept devices, though as I noted in my “ode to weird tech” post, as someone who follows this stuff for a living, I’m a fan of weirdness for weirdness sake, be it the rollable Motorola Rizr screen or the OnePlus glowing cooling fluid. Certainly following the automotive industry’s lead of creating concept devices is a trend that is likely to only become more pervasive.

OnePlus COO Kinder Liu told me this week that gauging consumer interest is one of the “multiple reasons” his company is engaging with the concept. He added, “Also, we want to encourage continuous innovation inside our company.”

Pretty much everyone I engaged with this week echoed the sentiment that smartphones are in a rut. For the first time, however, it’s not a foregone conclusion that there’s a way of getting out.
''')

Input:
Please answer the question:

What is a supply chain driven innovation?



Smartphone makers searched for a way forward at MWC 2023
Foldables, 6G, light shows -- there are a lot of ideas floating around, but no one has cracked the code
The slowdown was inevitable, of course. Nothing stays hot forever — especially in this industry. By tech standards, smartphones have had a good run, but the last few years have seen device makers searching for the magic bullet to help the sales slide reverse course. The arrival of 5G was a nice reprieve, but next-generation telecom standards don’t arrive every year.

“I personally think foldables are supply chain-driven innovation and not consumer insights,” Pei said. “Somebody invents OLED, and they can make a lot of money, because it’s a great technology. Then after a few years, a lot more companies make that, so they need to lower their prices. So they need to figure out what else they can sell at a higher margin. They develop flexible O