In [13]:
articles = [
    """Almost two years ago, Mark Zuckerberg rebranded his company Facebook to Meta — and since then, he has been focused on building the “metaverse,” a three-dimensional virtual reality. But the metaverse has lost some of its luster since 2021. Companies like Disney have closed down their metaverse divisions and deemphasized using the word, while crypto-based startup metaverses have quietly languished or imploded. In 2022, Meta’s Reality Labs division reported an operational loss of $13.7 billion.""",
    """Despite doubts that the Raspberry Pi 5 would launch this year, the latest version of the microcomputer has arrived with some notable upgrades at a $60 starting price. Not only is it supposed to perform better than its predecessor but it’s also the first Raspberry Pi to come with in-house silicon. Powering the brain of the Raspberry Pi 5 is a 64-bit quad-core Arm Cortex-A76 processor that runs at 2.4GHz, allowing for two to three times the performance boost when compared to the four-year-old Raspberry Pi 4. The device also comes with an 800MHz VideoCore VII graphics chip that the Raspberry Pi Foundation says offers a “substantial uplift” in graphics performance.""",
    """Starting on November 1st, Disney Plus will begin restricting password sharing. In Canada. The company announced the change in an email sent to Canadian subscribers. Disney has not provided many details on how it plans to enforce this policy — its email merely states that “we’re implementing restrictions on your ability to share your account or login credentials outside of your household.” The announcement reads more like a strong finger wag than anything else. “You may not share your subscription outside of your household,” reads the company’s updated Help Center.""",
    """Between Apple no longer making leather products and the utter disappointment of the new FineWoven case, there’s more opportunity this year for third-party iPhone case makers than perhaps ever before. But for a lot of manufacturers, the new Action Button in the iPhone 15 Pro and Pro Max is throwing a big old wrench into their plans. Over the past week, I’ve received case samples from a number of companies, and it’s clear which ones bet correctly on the Action Button’s existence and which ones hedged their bets a little too hard on Apple sticking with the traditional ringer switch. Some brands, such as Nomad and MOFT, bet that Apple would replace the ringer switch with a button this year and designed their cases accordingly: there’s a metal button in the case that allows you to easily activate the Action Button whenever you want.""",
    """Fitbit is back with the Charge 6 — and on paper, this one feels like the most Fitbit-y Fitbit since Google actively began folding the company into its ecosystem. Not only has the price been lowered from $179.95 to $159.95 but the device also adds an improved heart rate tracking algorithm, compatibility with certain gym machines, and better integration with Google services. Oh, and the side button is back, baby.""",
]


In [14]:
from arthur_bench.run.testsuite import TestSuite
from random_scorer import RandomScorer
random_test = TestSuite(
    'random_scorer',
    scoring_method=RandomScorer(),
    input_text_list=articles,
    reference_output_list=[
        "Meta's Reality Labs division lost $13.7 billion in 2022.",
        "The Raspberry Pi 5 is the first Raspberry Pi to come with in-house silicon.",
        "Disney Plus will begin restricting password sharing in Canada.",
        "The Action Button in the iPhone 15 Pro and Pro Max is throwing a big old wrench into case makers' plans.",
        "Fitbit's Charge 6 is the most Fitbit-y Fitbit since Google began folding the company into its ecosystem.",
    ],
)
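
The `random_scorer` module imported above is not shown in this notebook. As a rough sketch only, a scorer of this kind might expose a `name()` and a `run_batch()` method that returns one float per candidate. The class name `RandomScorerSketch`, the `seed` parameter, and the method signature below are illustrative assumptions; a real Arthur Bench scorer would presumably subclass the library's scorer base class rather than stand alone.

```python
import random
from typing import List, Optional


class RandomScorerSketch:
    """Hypothetical stand-in for the RandomScorer used above (not the notebook's code)."""

    def __init__(self, seed: Optional[int] = None) -> None:
        # A private RNG keeps scores reproducible when a seed is given.
        self._rng = random.Random(seed)

    @staticmethod
    def name() -> str:
        return "random_scorer"

    def run_batch(
        self,
        candidate_batch: List[str],
        reference_batch: Optional[List[str]] = None,
        input_text_batch: Optional[List[str]] = None,
    ) -> List[float]:
        # Ignore the text entirely and return one uniform score in [0, 1) per candidate.
        return [self._rng.random() for _ in candidate_batch]


scores = RandomScorerSketch(seed=0).run_batch(["summary a", "summary b", "summary c"])
print(scores)
```

A scorer like this is only useful as a smoke test: it exercises the test-suite plumbing without saying anything about summary quality.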



In [15]:
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.llms import Cohere, LlamaCpp, VertexAI
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain.schema import AIMessage, HumanMessage, SystemMessage
from trulens_eval import Feedback, Huggingface, Tru, TruChain

tru = Tru()

In [16]:
gpt_35 = ChatOpenAI(
    temperature=0,
    model='gpt-3.5-turbo')
gpt_4 = ChatOpenAI(
    temperature=0,
    model='gpt-4')
n_gpu_layers = 1  # With Metal, setting this to 1 is enough.
n_batch = 512  # Should be between 1 and n_ctx; consider how much RAM your Apple Silicon chip has.
# Make sure the model path is correct for your system!
# Note: callback_manager is created here but is not passed to LlamaCpp below.
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
llama_13b = LlamaCpp(
    model_path="/Users/nathan/Code/llama-2-13b.Q5_K_M.gguf",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # Must be True; otherwise you will run into problems after a couple of calls.
)
cohere = Cohere()
vertex = VertexAI()

llama_model_loader: loaded meta data with 19 key-value pairs and 363 tensors from /Users/nathan/Code/llama-2-13b.Q5_K_M.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q5_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight q6_K     [ 13824,  5120,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight q5_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight q5_K     [  5120, 13824,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.attn_k.weight q5_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    7:         blk.0.attn_output.weight q5_K     [  5120,  5120,

In [17]:
# Prompts for chat
system_message = SystemMessage(
    content = "You are an expert in article summarization who can write a summary of an article in exactly one sentence, while still maintaining the main points of the article.",
)
human_message = HumanMessagePromptTemplate.from_template("Summarize the following article in one sentence: {text}")

chat_prompt = ChatPromptTemplate.from_messages([
    system_message,
    human_message
])

# Prompts for completion
completion_prompt = PromptTemplate(template="""We will now summarize the following article in exactly one sentence, while still maintaining the main points of the article.
                                   Article: {text}
                                   Single-Sentence Summary:""", input_variables=['text'])
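
Because the completion template is a triple-quoted string, the indentation used in the source becomes part of the prompt the model sees. The sketch below shows this with plain `str.format` standing in for the prompt template; the helpers `render` and `strip_line_indent` are hypothetical names introduced here, not part of the notebook or of LangChain.

```python
TEMPLATE = """We will now summarize the following article in exactly one sentence, while still maintaining the main points of the article.
                                   Article: {text}
                                   Single-Sentence Summary:"""


def render(template: str, text: str) -> str:
    # Plain str.format stands in for the prompt template's formatting step.
    return template.format(text=text)


def strip_line_indent(template: str) -> str:
    # Remove leading whitespace from every template line so labels start flush left.
    return "\n".join(line.lstrip() for line in template.splitlines())


raw = render(TEMPLATE, "Fitbit is back with the Charge 6.")
clean = render(strip_line_indent(TEMPLATE), "Fitbit is back with the Charge 6.")

# The raw prompt keeps the source indentation; the cleaned one does not.
print(repr(raw.splitlines()[1]))
print(repr(clean.splitlines()[1]))
```

Whether the extra whitespace affects a given model's output is an empirical question; printing the rendered prompt is a cheap way to see exactly what is being sent.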

In [18]:
from trulens_eval.feedback import Feedback, Huggingface, OpenAI
# Initialize HuggingFace-based feedback function collection class:
hugs = Huggingface()
openai = OpenAI()

# Define a language match feedback function using HuggingFace.
lang_match = Feedback(hugs.language_match).on_input_output()
# By default this will check language match on the main app input and main app
# output.

# Question/answer relevance between overall question and answer.
qa_relevance = Feedback(openai.relevance).on_input_output()
# By default this will evaluate feedback on main app input and main app output.

✅ In language_match, input text1 will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In language_match, input text2 will be set to *.__record__.main_output or `Select.RecordOutput` .
✅ In relevance, input prompt will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In relevance, input response will be set to *.__record__.main_output or `Select.RecordOutput` .


In [19]:
cohere_chain = LLMChain(
    prompt=completion_prompt,
    llm=cohere,
)

tru_cohere_chain = TruChain(
    cohere_chain,
    app_id='Cohere',
    feedbacks=[lang_match, qa_relevance]
)

In [20]:
llama_chain = LLMChain(
    prompt=completion_prompt,
    llm=llama_13b,
)
tru_llama_chain = TruChain(
    llama_chain,
    app_id='Llama',
    feedbacks=[lang_match, qa_relevance]
)

ggml_metal_free: deallocating


In [21]:
vertex_chain = LLMChain(
    prompt=completion_prompt,
    llm=vertex,
)

tru_vertex_chain = TruChain(
    vertex_chain,
    app_id='Vertex',
    feedbacks=[lang_match, qa_relevance]
)

In [22]:
gpt_35_summaries = [
    gpt_35(chat_prompt.format_prompt(text=article).to_messages()).content for article in articles
]


In [23]:
gpt_4_summaries = [
    gpt_4(chat_prompt.format_prompt(text=article).to_messages()).content for article in articles
]
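
Both prompts ask for exactly one sentence, so it can help to check whether the returned summaries respect that. The following is a minimal sketch using a naive regular-expression split; `count_sentences` and `flag_multi_sentence` are hypothetical helpers, and the heuristic is an assumption that will miscount text containing abbreviations.

```python
import re
from typing import List


def count_sentences(text: str) -> int:
    # Naive split on ., !, or ? followed by whitespace. Decimals such as "159.95"
    # are not split because no whitespace follows the period, but abbreviations
    # like "Mr. Smith" would be miscounted.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def flag_multi_sentence(summaries: List[str]) -> List[int]:
    # Return the indices of summaries that contain more than one sentence.
    return [i for i, s in enumerate(summaries) if count_sentences(s) > 1]


examples = [
    "Disney Plus will begin restricting password sharing in Canada.",
    "Fitbit is back. The Charge 6 costs $159.95.",
]
print(flag_multi_sentence(examples))  # -> [1]
```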

In [24]:
cohere_summaries = [
    # The recorder returns the chain's output dict; keep only the generated summary string.
    tru_cohere_chain(article)["text"] for article in articles
]

A new object of type <class 'langchain.chains.llm.LLMChain'> at 0x28650b5c0 is calling an instrumented method <function Chain.__call__ at 0x16a38a0c0>. The path of this call may be incorrect.
Guessing path of new object is *.app based on other object (0x51aca21c0) using this function.



`call` will be deprecated soon; To record results of your app's execution, use one of these options to invoke your app:
    (1) Use the `with_` method:
        ```python
        app # your app
        tru_app_recorder: TruChain = TruChain(app, ...)
        result = tru_app_recorder.with_(app, ...args/kwargs-to-app...)
        ```
    (2) Use TruChain as a context manager: 
        ```python
        app # your app
        tru_app_recorder: TruChain = TruChain(app, ...)
        with tru_app_recorder:
            result = app(...args/kwargs-to-app...)
        
        ```



A new object of type <class 'langchain.chains.llm.LLMChain'> at 0x28650b5c0 is calling an instrumented method <function LLMChain._call at 0x16a38b600>. The path of this call may be incorrect.
Guessing path of new object is *.app based on other object (0x51aca21c0) using this function.



Retrying langchain.llms.cohere.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised CohereAPIError: You are using a Trial key, which is limited to 2 API calls / minute. You can continue to use the Trial key for free or upgrade to a Production key with higher rate limits at 'https://dashboard.cohere.ai/api-keys'. Contact us on 'https://discord.gg/XW44jPfYJu' or email us at support@cohere.com with any questions.
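
The retry message above reports a limit of 2 API calls per minute for the trial key. One way to avoid hitting it is to space the calls out on the client side. The sketch below is a generic throttle, not part of LangChain or Cohere's client; `throttled_map` and its `sleep` and `clock` parameters are hypothetical names, injected so the demo runs without actually waiting.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled_map(
    fn: Callable[[T], R],
    items: Iterable[T],
    calls_per_minute: int = 2,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    # Space consecutive calls at least 60 / calls_per_minute seconds apart.
    min_interval = 60.0 / calls_per_minute
    results: List[R] = []
    last_call: Optional[float] = None
    for item in items:
        if last_call is not None:
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(fn(item))
    return results


# Demo with a fake clock and a fake sleep that only records the requested waits.
waits: List[float] = []
out = throttled_map(str.upper, ["a", "b", "c"], sleep=waits.append, clock=lambda: 0.0)
print(out, waits)
```

In the notebook, `fn` would be whatever callable produces one summary per article; with the default of 2 calls per minute, five articles would take roughly two minutes.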

In [25]:
vertex_summaries = [
    # The recorder returns the chain's output dict; keep only the generated summary string.
    tru_vertex_chain(article)["text"] for article in articles
]

A new object of type <class 'langchain.chains.llm.LLMChain'> at 0x51ae136b0 is calling an instrumented method <function Chain.__call__ at 0x16a38a0c0>. The path of this call may be incorrect.
Guessing path of new object is *.app based on other object (0x51aca21c0) using this function.






A new object of type <class 'langchain.chains.llm.LLMChain'> at 0x51ae136b0 is calling an instrumented method <function LLMChain._call at 0x16a38b600>. The path of this call may be incorrect.
Guessing path of new object is *.app based on other object (0x51aca21c0) using this function.





In [27]:
llama_summaries = [
    # The recorder returns the chain's output dict; keep only the generated summary string.
    tru_llama_chain(article)["text"] for article in articles
]






llama_print_timings:        load time =  9503.09 ms
llama_print_timings:      sample time =   239.33 ms /   256 runs   (    0.93 ms per token,  1069.65 tokens per second)
llama_print_timings: prompt eval time =  9503.00 ms /   169 tokens (   56.23 ms per token,    17.78 tokens per second)
llama_print_timings:        eval time = 35487.26 ms /   255 runs   (  139.17 ms per token,     7.19 tokens per second)
llama_print_timings:       total time = 45727.88 ms






Llama.generate: prefix-match hit

llama_print_timings:        load time =  9503.09 ms
llama_print_timings:      sample time =   179.50 ms /   190 runs   (    0.94 ms per token,  1058.50 tokens per second)
llama_print_timings: prompt eval time =  2663.01 ms /   189 tokens (   14.09 ms per token,    70.97 tokens per second)
llama_print_timings:        eval time = 26113.09 ms /   189 runs   (  138.16 ms per token,     7.24 tokens per second)
llama_print_timings:       total time = 29332.24 ms






Llama.generate: prefix-match hit
Waiting for {'error': 'Model papluca/xlm-roberta-base-language-detection is currently loading', 'estimated_time': 44.49275207519531} (44.49275207519531) second(s).

llama_print_timings:        load time =  9503.09 ms
llama_print_timings:      sample time =   235.34 ms /   256 runs   (    0.92 ms per token,  1087.80 tokens per second)
llama_print_timings: prompt eval time =  1643.89 ms /   123 tokens (   13.36 ms per token,    74.82 tokens per second)
llama_print_timings:        eval time = 33232.04 ms /   255 runs   (  130.32 ms per token,     7.67 tokens per second)
llama_print_timings:       total time = 35600.87 ms




Task queue full. Finishing existing tasks.


Llama.generate: prefix-match hit
API error: {'error': 'Rate limit reached. Please log in or use your apiToken'}.
API error: {'error': 'Rate limit reached. Please log in or use your apiToken'}.
Feedback function language_match with aggregation <function mean at 0x11227f4c0> had no inputs.
Feedback function language_match with aggregation <function mean at 0x11227f4c0> had no inputs.


Evaluation of language_match failed on inputs: 
{'text1': 'Despite doubts that the Raspberry Pi 5 would launch this year, the '
          'latest version of the microcomputer h
Rate limit reached. Please log in or use your apiToken.
Evaluation of language_match failed on inputs: 
{'text1': 'Starting on November 1st, Disney Plus will begin restricting '
          'password sharing. In Canada. The company an
Rate limit reached. Please log in or use your apiToken.



llama_print_timings:        load time =  9503.09 ms
llama_print_timings:      sample time =   242.51 ms /   256 runs   (    0.95 ms per token,  1055.62 tokens per second)
llama_print_timings: prompt eval time =  3251.89 ms /   199 tokens (   16.34 ms per token,    61.20 tokens per second)
llama_print_timings:        eval time = 36740.13 ms /   255 runs   (  144.08 ms per token,     6.94 tokens per second)
llama_print_timings:       total time = 40743.14 ms






API error: {'error': 'Rate limit reached. Please log in or use your apiToken'}.
Llama.generate: prefix-match hit
API error: {'error': 'Rate limit reached. Please log in or use your apiToken'}.
Feedback function language_match with aggregation <function mean at 0x11227f4c0> had no inputs.


Evaluation of language_match failed on inputs: 
{'text1': 'Between Apple no longer making leather products and the utter '
          'disappointment of the new FineWoven case, 
Rate limit reached. Please log in or use your apiToken.



llama_print_timings:        load time =  9503.09 ms
llama_print_timings:      sample time =   276.09 ms /   256 runs   (    1.08 ms per token,   927.24 tokens per second)
llama_print_timings: prompt eval time =  2071.93 ms /   118 tokens (   17.56 ms per token,    56.95 tokens per second)
llama_print_timings:        eval time = 36270.86 ms /   255 runs   (  142.24 ms per token,     7.03 tokens per second)
llama_print_timings:       total time = 39155.79 ms
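
The per-token figures in the `llama_print_timings` lines follow directly from the total time and the count on each line. The sketch below recomputes them from a line copied from the log above; the regular expression and the `throughput` helper are assumptions about this log format, not part of llama.cpp.

```python
import re
from typing import Optional, Tuple

# Matches "= <total> ms / <count> runs" or "= <total> ms / <count> tokens".
TIMING_RE = re.compile(r"=\s*([\d.]+) ms /\s*(\d+) (?:runs|tokens)")


def throughput(line: str) -> Optional[Tuple[float, float]]:
    # Return (ms per token, tokens per second), or None if the line has no count.
    match = TIMING_RE.search(line)
    if match is None:
        return None
    total_ms, count = float(match.group(1)), int(match.group(2))
    return total_ms / count, count / (total_ms / 1000.0)


line = (
    "llama_print_timings:        eval time = 35487.26 ms /   255 runs   "
    "(  139.17 ms per token,     7.19 tokens per second)"
)
ms_per_token, tokens_per_second = throughput(line)
print(f"{ms_per_token:.2f} ms/token, {tokens_per_second:.2f} tokens/s")
# -> 139.17 ms/token, 7.19 tokens/s
```

At roughly 7 tokens per second, a generation that runs to 255 tokens takes about 35 seconds, which matches the eval times reported for most of the articles above.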


In [None]:
random_test.run("gpt_35_summaries", candidate_output_list=gpt_35_summaries)
random_test.run("gpt_4_summaries", candidate_output_list=gpt_4_summaries)
random_test.run("cohere_summaries", candidate_output_list=cohere_summaries)
random_test.run("vertex_summaries", candidate_output_list=vertex_summaries)

UserValueError: A test run with the name gpt_35_summaries already exists. Give this test run a unique name and re-run.

API error: {'error': 'Rate limit reached. Please log in or use your apiToken'}.
Feedback function language_match with aggregation <function mean at 0x11227f4c0> had no inputs.
API error: {'error': 'Rate limit reached. Please log in or use your apiToken'}.


Evaluation of language_match failed on inputs: 
{'text1': 'Fitbit is back with the Charge 6 — and on paper, this one feels '
          'like the most Fitbit-y Fitbit since Goog
Rate limit reached. Please log in or use your apiToken.
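
The `UserValueError` above is raised because a run named `gpt_35_summaries` already exists from an earlier execution of the cell. One workaround is to make each run name unique before passing it as the first argument to `random_test.run`. The helper below is a hypothetical sketch, not part of Arthur Bench.

```python
import uuid
from datetime import datetime, timezone


def unique_run_name(base: str) -> str:
    # Append a UTC timestamp and a short random suffix so reruns are very unlikely to collide.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{base}_{stamp}_{uuid.uuid4().hex[:6]}"


print(unique_run_name("gpt_35_summaries"))
```

The trade-off is that reruns accumulate as separate entries in the test suite, so old runs may need to be cleaned up by hand.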
