<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/low_level/oss_ingestion_retrieval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Building RAG from Scratch (Open-source only!)

In this tutorial, we show you how to build a data ingestion pipeline into a vector database, and then build a retrieval pipeline from that vector database, from scratch.

Notably, we use a fully open-source stack:

- Sentence Transformers as the embedding model
- Elasticsearch as the vector store (we support many other [vector stores](https://gpt-index.readthedocs.io/en/stable/module_guides/storing/vector_stores.html) too!)
- Llama 2 as the LLM (through [llama.cpp](https://github.com/ggerganov/llama.cpp))

## Setup

We set up our open-source components:
1. Sentence Transformers
2. Llama 2 (via llama.cpp)
3. Elasticsearch, which we initialize and wrap with our vector store abstraction.

#### Sentence Transformers

In [None]:
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-llama-cpp
%pip install llama-index-vector-stores-elasticsearch
%pip install tqdm

In [None]:
!pip install llama-index

In [2]:
# sentence transformers
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="mixedbread-ai/mxbai-embed-large-v1")



#### Llama CPP

In this notebook, we use the [`llama-2-chat-7b-gguf`](https://huggingface.co/TheBloke/Llama-2-7B-chat-GGUF) model, along with the proper prompt formatting.

Check out our [Llama CPP guide](https://gpt-index.readthedocs.io/en/stable/examples/llm/llama_2_llama_cpp.html) for full setup instructions/details.

In [None]:
!pip install llama-cpp-python


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.3.1[0m[39;49m -> [0m[32;49m23.2.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [37]:
from llama_index.llms.llama_cpp import LlamaCPP

# model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
model_path = "llama-2-7b-chat.Q3_K_S.gguf"

llm = LlamaCPP(
    # You can pass in the URL to a GGML model to download it automatically
    # model_url=model_url,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=model_path,
    temperature=0.1,
    max_new_tokens=512,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=4000,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 0},
    verbose=True,
)

llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from llama-2-7b-chat.Q3_K_S.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 
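The `context_window=4000` and `max_new_tokens=512` settings above imply a budget for the prompt (instructions, retrieved context, and the question). Here is a minimal sketch of that arithmetic, assuming the prompt and the completion share the same window. The helper name `prompt_token_budget` is hypothetical and not part of LlamaIndex.

```python
def prompt_token_budget(context_window: int, max_new_tokens: int) -> int:
    """Tokens left for the prompt once room for the completion is reserved."""
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than context_window")
    return context_window - max_new_tokens


# Values taken from the LlamaCPP configuration above.
budget = prompt_token_budget(context_window=4000, max_new_tokens=512)
print(budget)  # 3488
```

With these settings, retrieved chunks plus the question should stay under roughly 3,488 tokens.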

#### Initialize Elasticsearch

We connect to an Elasticsearch instance running on localhost and create the index we'll be using.

**NOTE**: Of course there are plenty of other open-source/self-hosted databases you can use! e.g. Chroma, Qdrant, Weaviate, and many more. Take a look at our [vector store guide](https://gpt-index.readthedocs.io/en/stable/module_guides/storing/vector_stores.html).

Run this command in a terminal to launch Elasticsearch in a Docker container:

```
docker run -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "xpack.security.http.ssl.enabled=false" \
  -e "xpack.license.self_generated.type=trial" \
  docker.elastic.co/elasticsearch/elasticsearch:8.9.0
```
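Before connecting, it can help to confirm that the container is actually answering. This is a minimal sketch using only the standard library, assuming the container from the command above is exposed at `http://localhost:9200` with security disabled. The helper name `is_elasticsearch_up` is hypothetical.

```python
import urllib.error
import urllib.request


def is_elasticsearch_up(url: str = "http://localhost:9200", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `is_elasticsearch_up()` shortly after `docker run` may return `False` for a while, since the node needs time to start.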

In [21]:
%pip install elasticsearch

Note: you may need to restart the kernel to use updated packages.


In [4]:
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

vector_store = ElasticsearchStore(
    index_name="qna_mental_health",
    es_url="http://localhost:9200",
)

## Build an Ingestion Pipeline from Scratch

We show how to build an ingestion pipeline as mentioned in the introduction.

We fast-track the steps here (for example, we skip metadata extraction). More details can be found [in our dedicated ingestion guide](https://gpt-index.readthedocs.io/en/latest/examples/low_level/ingestion.html).

### 1. Load Data

In [5]:
import pandas as pd

file_path_test = "counsel-chat-best-answer-test.csv"
file_path_train = "counsel-chat-best-answer-train.csv"

df_test = pd.read_csv(file_path_test)
df_train = pd.read_csv(file_path_train)

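### 2. Use a Text Splitter to Split Documents

Each row of this dataset is a self-contained question/answer pair, so this notebook does not run a text splitter. Instead, we combine each row's question, answer, and topic into a single record.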

In [6]:
def combined_data(data):
    return {
        "question": data['questionText'],
        "answer": data['answerText'],
        "topic": data['topic']
    }

df_train['qa_mental_health'] = df_train.apply(combined_data, axis=1)

In [7]:
df_train['qa_mental_health'][0]

{'question': "I have been diagnosed with general anxiety and depression by my family doctor. They wrote a prescription for me to have an emotional support dog, I have the paper work, and I gave it to my apartment manager. They said I can't keep the ESD because I'm not disabled. What do you suggest I do?",
 'answer': 'This can be a difficult situation. \xa0Typically, only animals that are specifically trains to accomplish a specific task are legally protected as Service Animsls. Even though that can be very helpful, emotional support animals are not generally protected in the same way.You might not be able to make your landlord accommodate you. If possible, you may want to consider a different apparent that is more animal friendly.',
 'topic': 'depression'}

In [8]:
df_final = pd.DataFrame(df_train['qa_mental_health'])
df_final['question_title'] = df_train['questionTitle']

In [9]:
df_final.head()

| | qa_mental_health | question_title |
|---|---|---|
| 0 | {'question': 'I have been diagnosed with gener... | My apartment manager won't let me keep an emot... |
| 1 | {'question': 'There are many people willing to... | Why do I feel like I don't belong anywhere? |
| 2 | {'question': 'My girlfriend just quit drinking... | How can I help my girlfriend? |
| 3 | {'question': 'I don't know how else to explain... | How do I stop feeling empty? |
| 4 | {'question': 'I tried telling my husband I was... | How can I get my husband to listen to my needs... |


### 3. Manually Construct Nodes from Text Chunks

In [10]:
from llama_index.core.schema import TextNode

nodes = []
for ans, title in zip(df_final['qa_mental_health'], df_final['question_title']):
    node = TextNode(
        text=str(ans),
    )
    node.metadata = {"title": title}
    nodes.append(node)
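The loop above stores `str(ans)`, the Python repr of the dict, as the node text, which is why retrieved content later prints with braces and escaped quotes. As an alternative sketch (not what this notebook does), each record could be rendered as labeled plain text before building the node. The helper name `format_record` and the example values are hypothetical.

```python
def format_record(record: dict) -> str:
    """Render a question/answer/topic record as labeled plain text."""
    return (
        f"Topic: {record['topic']}\n"
        f"Question: {record['question']}\n"
        f"Answer: {record['answer']}"
    )


# Placeholder record with the same keys that combined_data() produces.
example = {
    "question": "Example question?",
    "answer": "Example answer.",
    "topic": "example-topic",
}
print(format_record(example))
```

Passing `format_record(ans)` instead of `str(ans)` as the node text would change what gets embedded, so retrieval results could differ from the outputs shown in this notebook.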

### 4. Generate Embeddings for each Node

Here we generate an embedding for each node using the Sentence Transformers model loaded above.

In [11]:
for node in nodes:
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode="all")
    )
    node.embedding = node_embedding
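The loop above embeds one node per call. A possible variant embeds several texts per call through `get_text_embedding_batch`, which LlamaIndex embedding models expose. This is a sketch under that assumption; the helper name `embed_nodes_in_batches` and the default `batch_size` are hypothetical choices.

```python
from typing import Any, List


def embed_nodes_in_batches(nodes: List[Any], embed_model: Any, batch_size: int = 32) -> None:
    """Attach an embedding to every node, embedding `batch_size` texts per call."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        batch = nodes[start : start + batch_size]
        # Same text the per-node loop embeds: content plus metadata.
        texts = [node.get_content(metadata_mode="all") for node in batch]
        embeddings = embed_model.get_text_embedding_batch(texts)
        for node, embedding in zip(batch, embeddings):
            node.embedding = embedding
```

With the objects from this notebook, the call would be `embed_nodes_in_batches(nodes, embed_model)`.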

### 5. Load Nodes into a Vector Store

We now insert these nodes into our `ElasticsearchStore`.

In [12]:
vector_store.add(nodes)

['8bb12a5e-ff00-4548-9ad3-6452376f7490',
 '19c84223-b0af-499f-91ad-d02e6b668572',
 '9392cbb0-52e1-4cec-8f24-c1c22133dbfb',
 '9f114635-e505-479f-bdd8-9ce5bdc961d9',
 '3538156f-3fe0-4b7a-9128-229b341d0e8d',
 '055e1e8a-8da3-4bd7-8f14-059bfb456545',
 '42328b14-7b85-4f72-8ae3-6b5c2cb6f2de',
 'd8950f34-c10a-45e8-89e2-9af6bff772ef',
 'ce653213-2c85-4883-a6b9-8cef3e24243e',
 'a9c9d511-8f0f-4eb0-ac15-5e6c1a95e257',
 'ff5e121a-d325-45fe-9439-d4e93236b14f',
 '108c81de-b6c0-48db-a449-0e9ba63c2103',
 '88c004cf-1ea0-4f39-940b-766a31130045',
 '22efc545-66ec-4311-b1a2-f262b41132df',
 '767e8553-1cf7-4286-9c73-a8060b783e51',
 '0607ebfa-3c6b-4d29-bf33-c8a57d8bbb4a',
 'd830b38d-4a67-464b-a83d-f66461ae61e0',
 'e788d21c-e694-41ff-a1fd-08b298f69811',
 '316fad1f-930d-434e-9af2-3d8528c67f32',
 '9fc6dbd9-4e38-44c0-a804-e38b832aa8cd',
 '0dcc145b-f50e-471b-8f76-42a968369df2',
 '039a652f-a2b7-4f18-a2f0-511f6013916e',
 'da315144-5ae7-4eb5-8388-9c394fcd7cac',
 'fd4f2465-b0df-461c-89f7-94ac0cd3eeaf',
 'fe8363b6-12dd-

## Build Retrieval Pipeline from Scratch

We show how to build a retrieval pipeline. Similar to ingestion, we fast-track the steps. Take a look at our [retrieval guide](https://gpt-index.readthedocs.io/en/latest/examples/low_level/retrieval.html) for more details!

In [17]:
query_str = """I have so many issues to address. I have a history of sexual abuse, I’m a breast cancer survivor and I am a lifetime insomniac. 
    I have a long history of depression and I’m beginning to have anxiety. I have low self esteem but I’ve been happily married for almost 35 years. 
    I’ve never had counseling about any of this. Do I have too many issues to address in counseling?
    """

### 1. Generate a Query Embedding

In [18]:
query_embedding = embed_model.get_query_embedding(query_str)

### 2. Query the Vector Database

In [19]:
# construct vector store query
from llama_index.core.vector_stores import VectorStoreQuery

query_mode = "default"
# query_mode = "sparse"
# query_mode = "hybrid"

vector_store_query = VectorStoreQuery(
    query_embedding=query_embedding, similarity_top_k=2, mode=query_mode
)
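Conceptually, a dense query with `similarity_top_k=2` asks for the two stored vectors most similar to the query embedding. The sketch below illustrates that idea with a brute-force cosine-similarity scan in plain Python. It is an illustration only: Elasticsearch typically answers such queries with an approximate nearest-neighbour index, and the similarity metric depends on how the index is configured. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int = 2
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k vectors most similar to `query`."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors standing in for real embeddings.
print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2))
```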

In [20]:
# returns a VectorStoreQueryResult
query_result = vector_store.query(vector_store_query)
print(query_result.nodes[0].get_content())

{'question': 'I stress over everything. If I don\'t have enough "quality time" with my boyfriend, I start to feel resentment towards him. He has three children, and they are great kids, but I find we don\'t have much time together. I break down easily and find myself depressed.', 'answer': 'Everyone has some level of anxiety - it\'s what helps us respond to stressors in our lives and clues us into the fact that we need to respond to something going on. However, if you\'re feeling overwhelmed by racing thoughts, feeling like you spend a lot of energy worrying about something specific or even pretty much anything at all, and you\'re starting to find that it\'s getting in your way when it comes to living your life the way you want, then I\'d suggest seeing a counselor or therapist for an assessment for anxiety.\xa0Your other concerns, though, seem pretty "normal" for someone who is in a relationship with a partner who has children. As a married stepmother, I\'ve been there, and as a thera

### 3. Parse Result into a Set of Nodes

In [21]:
from llama_index.core.schema import NodeWithScore
from typing import Optional

nodes_with_scores = []
for index, node in enumerate(query_result.nodes):
    score: Optional[float] = None
    if query_result.similarities is not None:
        score = query_result.similarities[index]
    nodes_with_scores.append(NodeWithScore(node=node, score=score))
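Once results carry a score, a common follow-up is to drop weak matches. Here is a minimal sketch, assuming each item exposes a `.score` attribute as `NodeWithScore` does. The helper name `filter_by_score` is hypothetical, and a sensible threshold depends on the vector store and its similarity metric.

```python
from typing import Any, List, Optional


def filter_by_score(scored_nodes: List[Any], min_score: Optional[float]) -> List[Any]:
    """Keep items whose `.score` is at least `min_score`.

    Items with score None are kept, since no similarity was reported for them.
    """
    if min_score is None:
        return list(scored_nodes)
    return [n for n in scored_nodes if n.score is None or n.score >= min_score]
```

For example, `filter_by_score(nodes_with_scores, 0.5)` would keep only results scoring 0.5 or higher, plus any result without a score.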

### 4. Put into a Retriever

In [22]:
from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
from typing import Any, List


class VectorDBRetriever(BaseRetriever):
    """Retriever over an Elasticsearch vector store."""

    def __init__(
        self,
        vector_store: ElasticsearchStore,
        embed_model: Any,
        query_mode: str = "default",
        similarity_top_k: int = 2,
    ) -> None:
        """Init params."""
        self._vector_store = vector_store
        self._embed_model = embed_model
        self._query_mode = query_mode
        self._similarity_top_k = similarity_top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve."""
        query_embedding = self._embed_model.get_query_embedding(
            query_bundle.query_str
        )
        vector_store_query = VectorStoreQuery(
            query_embedding=query_embedding,
            similarity_top_k=self._similarity_top_k,
            mode=self._query_mode,
        )
        query_result = self._vector_store.query(vector_store_query)

        nodes_with_scores = []
        for index, node in enumerate(query_result.nodes):
            score: Optional[float] = None
            if query_result.similarities is not None:
                score = query_result.similarities[index]
            nodes_with_scores.append(NodeWithScore(node=node, score=score))

        return nodes_with_scores

In [23]:
retriever = VectorDBRetriever(
    vector_store, embed_model, query_mode="default", similarity_top_k=2
)
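The retriever can be exercised on its own before wiring it into a query engine. This sketch assumes the standard `retrieve()` method that `BaseRetriever` provides, and that each hit has `.score` and `.node.get_content()`. The helper name `summarize_hits` is hypothetical.

```python
from typing import Any, List, Tuple


def summarize_hits(retriever: Any, query: str, preview_chars: int = 80) -> List[Tuple[Any, str]]:
    """Run `retriever.retrieve(query)` and return (score, text preview) pairs."""
    hits = retriever.retrieve(query)
    return [(hit.score, hit.node.get_content()[:preview_chars]) for hit in hits]
```

With the objects defined above, `summarize_hits(retriever, query_str)` would return one pair per retrieved node, which makes it easy to inspect retrieval quality separately from LLM output.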

## Plug this into our RetrieverQueryEngine to synthesize a response

In [38]:
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm)

In [39]:
query_str = """I have so many issues to address. I have a history of sexual abuse, I’m a breast cancer survivor and I am a lifetime insomniac. 
    I have a long history of depression and I’m beginning to have anxiety. I have low self esteem but I’ve been happily married for almost 35 years. 
    I’ve never had counseling about any of this. Do I have too many issues to address in counseling?
    """

response = query_engine.query(query_str)


llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      22.89 ms /   237 runs   (    0.10 ms per token, 10352.96 tokens per second)
llama_print_timings: prompt eval time =   31449.02 ms /   913 tokens (   34.45 ms per token,    29.03 tokens per second)
llama_print_timings:        eval time =   34101.34 ms /   236 runs   (  144.50 ms per token,     6.92 tokens per second)
llama_print_timings:       total time =   65948.91 ms /  1149 tokens


In [40]:
print(str(response))

 No, you don't have too many issues to address in counseling. It's completely normal and healthy to want to work through your past experiences, mental health struggles, and relationship concerns with a trained therapist or counselor. In fact, seeking professional help is often the first step towards healing and growth.
It's important to remember that you don't have to go through any of this alone. A therapist can provide a safe and supportive space for you to process your experiences and emotions, and offer tools and strategies to help you manage your mental health and relationships.
Additionally, it's important to find a therapist who is experienced in working with survivors of sexual abuse and individuals with low self-esteem. They can provide specialized support and guidance to help you work through these issues.
Overall, don't hesitate to seek professional help if you feel like you're struggling to cope with your experiences or mental health. It's okay to ask for help, and it's imp

In [41]:
print(response.source_nodes[0].get_content())

{'question': 'I stress over everything. If I don\'t have enough "quality time" with my boyfriend, I start to feel resentment towards him. He has three children, and they are great kids, but I find we don\'t have much time together. I break down easily and find myself depressed.', 'answer': 'Everyone has some level of anxiety - it\'s what helps us respond to stressors in our lives and clues us into the fact that we need to respond to something going on. However, if you\'re feeling overwhelmed by racing thoughts, feeling like you spend a lot of energy worrying about something specific or even pretty much anything at all, and you\'re starting to find that it\'s getting in your way when it comes to living your life the way you want, then I\'d suggest seeing a counselor or therapist for an assessment for anxiety.\xa0Your other concerns, though, seem pretty "normal" for someone who is in a relationship with a partner who has children. As a married stepmother, I\'ve been there, and as a thera

## Get output of all test questions 

In [51]:
df_test = pd.read_csv(file_path_test)

responses = []

for index, row in df_test.iterrows():
    query_str = row['questionText']
    response = query_engine.query(query_str)
    responses.append(str(response))
    print(f"{index + 1}/{len(df_test)} done")
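On CPU, each query in this run takes from tens of seconds to several minutes according to the timing logs, so losing the whole batch to a single failure is costly. The sketch below shows one way to record per-question errors and persist the results with the standard `csv` module. The names `run_batch` and `save_rows` and the output columns are hypothetical.

```python
import csv
from typing import Callable, Iterable, List


def run_batch(questions: Iterable[str], answer_fn: Callable[[str], str]) -> List[dict]:
    """Call `answer_fn` on each question, recording errors instead of aborting."""
    rows = []
    for question in questions:
        try:
            rows.append({"question": question, "response": answer_fn(question), "error": ""})
        except Exception as exc:  # one failed query should not lose the rest
            rows.append({"question": question, "response": "", "error": repr(exc)})
    return rows


def save_rows(rows: List[dict], path: str) -> None:
    """Write the collected rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["question", "response", "error"])
        writer.writeheader()
        writer.writerows(rows)
```

In this notebook the call could look like `rows = run_batch(df_test["questionText"], lambda q: str(query_engine.query(q)))`, followed by `save_rows(rows, "responses.csv")`, where the output filename is an arbitrary choice.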

Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      34.19 ms /   350 runs   (    0.10 ms per token, 10236.31 tokens per second)
llama_print_timings: prompt eval time =  112741.02 ms /     2 tokens (56370.51 ms per token,     0.02 tokens per second)
llama_print_timings:        eval time =  108006.37 ms /   349 runs   (  309.47 ms per token,     3.23 tokens per second)
llama_print_timings:       total time =  114315.10 ms /   351 tokens


1/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      53.47 ms /   416 runs   (    0.13 ms per token,  7780.21 tokens per second)
llama_print_timings: prompt eval time =   54146.47 ms /  1157 tokens (   46.80 ms per token,    21.37 tokens per second)
llama_print_timings:        eval time =  369499.77 ms /   415 runs   (  890.36 ms per token,     1.12 tokens per second)
llama_print_timings:       total time =  424834.27 ms /  1572 tokens


2/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      59.87 ms /   512 runs   (    0.12 ms per token,  8551.72 tokens per second)
llama_print_timings: prompt eval time =   36736.64 ms /   733 tokens (   50.12 ms per token,    19.95 tokens per second)
llama_print_timings:        eval time =  141932.67 ms /   511 runs   (  277.75 ms per token,     3.60 tokens per second)
llama_print_timings:       total time =  179868.94 ms /  1244 tokens


3/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      47.52 ms /   382 runs   (    0.12 ms per token,  8039.06 tokens per second)
llama_print_timings: prompt eval time =   35662.08 ms /   953 tokens (   37.42 ms per token,    26.72 tokens per second)
llama_print_timings:        eval time =  216309.77 ms /   381 runs   (  567.74 ms per token,     1.76 tokens per second)
llama_print_timings:       total time =  253062.12 ms /  1334 tokens


4/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      53.66 ms /   427 runs   (    0.13 ms per token,  7957.36 tokens per second)
llama_print_timings: prompt eval time =   32355.45 ms /   668 tokens (   48.44 ms per token,    20.65 tokens per second)
llama_print_timings:        eval time =  215743.58 ms /   426 runs   (  506.44 ms per token,     1.97 tokens per second)
llama_print_timings:       total time =  249258.42 ms /  1094 tokens


5/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      35.13 ms /   316 runs   (    0.11 ms per token,  8995.42 tokens per second)
llama_print_timings: prompt eval time =   30504.81 ms /   699 tokens (   43.64 ms per token,    22.91 tokens per second)
llama_print_timings:        eval time =  142151.50 ms /   315 runs   (  451.27 ms per token,     2.22 tokens per second)
llama_print_timings:       total time =  173460.79 ms /  1014 tokens


6/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      14.72 ms /   163 runs   (    0.09 ms per token, 11075.63 tokens per second)
llama_print_timings: prompt eval time =   21414.79 ms /   717 tokens (   29.87 ms per token,    33.48 tokens per second)
llama_print_timings:        eval time =   19216.59 ms /   162 runs   (  118.62 ms per token,     8.43 tokens per second)
llama_print_timings:       total time =   41035.25 ms /   879 tokens


7/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      32.31 ms /   329 runs   (    0.10 ms per token, 10181.98 tokens per second)
llama_print_timings: prompt eval time =   19807.54 ms /   791 tokens (   25.04 ms per token,    39.93 tokens per second)
llama_print_timings:        eval time =   41479.62 ms /   328 runs   (  126.46 ms per token,     7.91 tokens per second)
llama_print_timings:       total time =   61991.36 ms /  1119 tokens


8/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      38.23 ms /   389 runs   (    0.10 ms per token, 10176.59 tokens per second)
llama_print_timings: prompt eval time =   18823.67 ms /   778 tokens (   24.19 ms per token,    41.33 tokens per second)
llama_print_timings:        eval time =   49037.52 ms /   388 runs   (  126.39 ms per token,     7.91 tokens per second)
llama_print_timings:       total time =   68634.56 ms /  1166 tokens


9/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      27.91 ms /   273 runs   (    0.10 ms per token,  9782.84 tokens per second)
llama_print_timings: prompt eval time =   16065.40 ms /   631 tokens (   25.46 ms per token,    39.28 tokens per second)
llama_print_timings:        eval time =   32729.05 ms /   272 runs   (  120.33 ms per token,     8.31 tokens per second)
llama_print_timings:       total time =   49347.66 ms /   903 tokens


10/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      34.56 ms /   358 runs   (    0.10 ms per token, 10357.90 tokens per second)
llama_print_timings: prompt eval time =   23278.11 ms /   973 tokens (   23.92 ms per token,    41.80 tokens per second)
llama_print_timings:        eval time =   44497.70 ms /   357 runs   (  124.64 ms per token,     8.02 tokens per second)
llama_print_timings:       total time =   68578.22 ms /  1330 tokens


11/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      38.84 ms /   416 runs   (    0.09 ms per token, 10710.61 tokens per second)
llama_print_timings: prompt eval time =   27774.50 ms /  1077 tokens (   25.79 ms per token,    38.78 tokens per second)
llama_print_timings:        eval time =   51215.10 ms /   415 runs   (  123.41 ms per token,     8.10 tokens per second)
llama_print_timings:       total time =   79871.27 ms /  1492 tokens


12/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      33.34 ms /   362 runs   (    0.09 ms per token, 10856.20 tokens per second)
llama_print_timings: prompt eval time =   38957.00 ms /  1545 tokens (   25.21 ms per token,    39.66 tokens per second)
llama_print_timings:        eval time =   49448.12 ms /   361 runs   (  136.98 ms per token,     7.30 tokens per second)
llama_print_timings:       total time =   89067.64 ms /  1906 tokens


13/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      35.87 ms /   392 runs   (    0.09 ms per token, 10928.96 tokens per second)
llama_print_timings: prompt eval time =   32831.97 ms /  1244 tokens (   26.39 ms per token,    37.89 tokens per second)
llama_print_timings:        eval time =   50531.80 ms /   391 runs   (  129.24 ms per token,     7.74 tokens per second)
llama_print_timings:       total time =   84112.61 ms /  1635 tokens


14/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      48.94 ms /   512 runs   (    0.10 ms per token, 10462.43 tokens per second)
llama_print_timings: prompt eval time =   25252.76 ms /   941 tokens (   26.84 ms per token,    37.26 tokens per second)
llama_print_timings:        eval time =   64965.15 ms /   511 runs   (  127.13 ms per token,     7.87 tokens per second)
llama_print_timings:       total time =   91187.43 ms /  1452 tokens


15/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      45.13 ms /   471 runs   (    0.10 ms per token, 10436.52 tokens per second)
llama_print_timings: prompt eval time =   21185.85 ms /   846 tokens (   25.04 ms per token,    39.93 tokens per second)
llama_print_timings:        eval time =   57398.14 ms /   470 runs   (  122.12 ms per token,     8.19 tokens per second)
llama_print_timings:       total time =   79470.08 ms /  1316 tokens


16/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      13.23 ms /   143 runs   (    0.09 ms per token, 10807.95 tokens per second)
llama_print_timings: prompt eval time =   24895.10 ms /   931 tokens (   26.74 ms per token,    37.40 tokens per second)
llama_print_timings:        eval time =   16558.51 ms /   142 runs   (  116.61 ms per token,     8.58 tokens per second)
llama_print_timings:       total time =   41700.27 ms /  1073 tokens


17/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      20.13 ms /   221 runs   (    0.09 ms per token, 10980.28 tokens per second)
llama_print_timings: prompt eval time =   17783.53 ms /   709 tokens (   25.08 ms per token,    39.87 tokens per second)
llama_print_timings:        eval time =   24592.97 ms /   220 runs   (  111.79 ms per token,     8.95 tokens per second)
llama_print_timings:       total time =   42755.42 ms /   929 tokens


18/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      36.38 ms /   410 runs   (    0.09 ms per token, 11271.48 tokens per second)
llama_print_timings: prompt eval time =   20429.17 ms /   883 tokens (   23.14 ms per token,    43.22 tokens per second)
llama_print_timings:        eval time =   46883.04 ms /   409 runs   (  114.63 ms per token,     8.72 tokens per second)
llama_print_timings:       total time =   68045.23 ms /  1292 tokens


19/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      38.35 ms /   428 runs   (    0.09 ms per token, 11160.07 tokens per second)
llama_print_timings: prompt eval time =   21266.60 ms /   883 tokens (   24.08 ms per token,    41.52 tokens per second)
llama_print_timings:        eval time =   49688.65 ms /   427 runs   (  116.37 ms per token,     8.59 tokens per second)
llama_print_timings:       total time =   71706.57 ms /  1310 tokens


20/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      29.92 ms /   325 runs   (    0.09 ms per token, 10862.30 tokens per second)
llama_print_timings: prompt eval time =    7518.33 ms /   292 tokens (   25.75 ms per token,    38.84 tokens per second)
llama_print_timings:        eval time =   36975.41 ms /   324 runs   (  114.12 ms per token,     8.76 tokens per second)
llama_print_timings:       total time =   45075.66 ms /   616 tokens


21/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      21.51 ms /   233 runs   (    0.09 ms per token, 10831.67 tokens per second)
llama_print_timings: prompt eval time =   19214.21 ms /   767 tokens (   25.05 ms per token,    39.92 tokens per second)
llama_print_timings:        eval time =   27340.05 ms /   232 runs   (  117.85 ms per token,     8.49 tokens per second)
llama_print_timings:       total time =   46966.19 ms /   999 tokens


22/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      17.50 ms /   186 runs   (    0.09 ms per token, 10630.39 tokens per second)
llama_print_timings: prompt eval time =   12154.15 ms /   477 tokens (   25.48 ms per token,    39.25 tokens per second)
llama_print_timings:        eval time =   20618.60 ms /   185 runs   (  111.45 ms per token,     8.97 tokens per second)
llama_print_timings:       total time =   33088.19 ms /   662 tokens


23/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      41.82 ms /   479 runs   (    0.09 ms per token, 11453.85 tokens per second)
llama_print_timings: prompt eval time =   24113.60 ms /   974 tokens (   24.76 ms per token,    40.39 tokens per second)
llama_print_timings:        eval time =   58068.68 ms /   478 runs   (  121.48 ms per token,     8.23 tokens per second)
llama_print_timings:       total time =   83097.96 ms /  1452 tokens


24/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      46.74 ms /   512 runs   (    0.09 ms per token, 10953.28 tokens per second)
llama_print_timings: prompt eval time =   21269.10 ms /   836 tokens (   25.44 ms per token,    39.31 tokens per second)
llama_print_timings:        eval time =   61154.44 ms /   511 runs   (  119.68 ms per token,     8.36 tokens per second)
llama_print_timings:       total time =   83394.46 ms /  1347 tokens


25/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      19.24 ms /   212 runs   (    0.09 ms per token, 11016.99 tokens per second)
llama_print_timings: prompt eval time =   19573.28 ms /   773 tokens (   25.32 ms per token,    39.49 tokens per second)
llama_print_timings:        eval time =   25523.92 ms /   211 runs   (  120.97 ms per token,     8.27 tokens per second)
llama_print_timings:       total time =   45492.19 ms /   984 tokens


26/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      33.80 ms /   361 runs   (    0.09 ms per token, 10679.53 tokens per second)
llama_print_timings: prompt eval time =   24251.47 ms /   727 tokens (   33.36 ms per token,    29.98 tokens per second)
llama_print_timings:        eval time =   43020.33 ms /   360 runs   (  119.50 ms per token,     8.37 tokens per second)
llama_print_timings:       total time =   67953.44 ms /  1087 tokens


27/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      41.76 ms /   452 runs   (    0.09 ms per token, 10822.98 tokens per second)
llama_print_timings: prompt eval time =   19693.87 ms /   705 tokens (   27.93 ms per token,    35.80 tokens per second)
llama_print_timings:        eval time =   54221.38 ms /   451 runs   (  120.22 ms per token,     8.32 tokens per second)
llama_print_timings:       total time =   74843.67 ms /  1156 tokens


28/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =       8.85 ms /    93 runs   (    0.10 ms per token, 10512.04 tokens per second)
llama_print_timings: prompt eval time =   17007.79 ms /   568 tokens (   29.94 ms per token,    33.40 tokens per second)
llama_print_timings:        eval time =   10343.17 ms /    92 runs   (  112.43 ms per token,     8.89 tokens per second)
llama_print_timings:       total time =   27524.88 ms /   660 tokens


29/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      27.41 ms /   303 runs   (    0.09 ms per token, 11052.34 tokens per second)
llama_print_timings: prompt eval time =   23984.11 ms /   911 tokens (   26.33 ms per token,    37.98 tokens per second)
llama_print_timings:        eval time =   37129.83 ms /   302 runs   (  122.95 ms per token,     8.13 tokens per second)
llama_print_timings:       total time =   61699.08 ms /  1213 tokens


30/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      30.02 ms /   327 runs   (    0.09 ms per token, 10893.83 tokens per second)
llama_print_timings: prompt eval time =   22984.08 ms /   891 tokens (   25.80 ms per token,    38.77 tokens per second)
llama_print_timings:        eval time =   39692.41 ms /   326 runs   (  121.76 ms per token,     8.21 tokens per second)
llama_print_timings:       total time =   63322.43 ms /  1217 tokens


31/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      29.57 ms /   325 runs   (    0.09 ms per token, 10989.75 tokens per second)
llama_print_timings: prompt eval time =    9647.88 ms /   292 tokens (   33.04 ms per token,    30.27 tokens per second)
llama_print_timings:        eval time =   36860.93 ms /   324 runs   (  113.77 ms per token,     8.79 tokens per second)
llama_print_timings:       total time =   47111.57 ms /   616 tokens


32/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      38.88 ms /   429 runs   (    0.09 ms per token, 11032.82 tokens per second)
llama_print_timings: prompt eval time =   21788.68 ms /   891 tokens (   24.45 ms per token,    40.89 tokens per second)
llama_print_timings:        eval time =   49291.27 ms /   428 runs   (  115.17 ms per token,     8.68 tokens per second)
llama_print_timings:       total time =   71901.86 ms /  1319 tokens


33/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      22.10 ms /   253 runs   (    0.09 ms per token, 11449.52 tokens per second)
llama_print_timings: prompt eval time =   51581.00 ms /  1963 tokens (   26.28 ms per token,    38.06 tokens per second)
llama_print_timings:        eval time =   31898.55 ms /   252 runs   (  126.58 ms per token,     7.90 tokens per second)
llama_print_timings:       total time =   83951.06 ms /  2215 tokens


34/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      39.47 ms /   436 runs   (    0.09 ms per token, 11045.24 tokens per second)
llama_print_timings: prompt eval time =   10626.18 ms /   410 tokens (   25.92 ms per token,    38.58 tokens per second)
llama_print_timings:        eval time =   50841.03 ms /   435 runs   (  116.88 ms per token,     8.56 tokens per second)
llama_print_timings:       total time =   62288.77 ms /   845 tokens


35/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      34.34 ms /   366 runs   (    0.09 ms per token, 10659.68 tokens per second)
llama_print_timings: prompt eval time =   17031.50 ms /   628 tokens (   27.12 ms per token,    36.87 tokens per second)
llama_print_timings:        eval time =   42514.50 ms /   365 runs   (  116.48 ms per token,     8.59 tokens per second)
llama_print_timings:       total time =   60227.78 ms /   993 tokens


36/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      11.45 ms /   124 runs   (    0.09 ms per token, 10826.86 tokens per second)
llama_print_timings: prompt eval time =   12416.26 ms /   438 tokens (   28.35 ms per token,    35.28 tokens per second)
llama_print_timings:        eval time =   13409.96 ms /   123 runs   (  109.02 ms per token,     9.17 tokens per second)
llama_print_timings:       total time =   26051.53 ms /   561 tokens


37/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      20.54 ms /   230 runs   (    0.09 ms per token, 11196.03 tokens per second)
llama_print_timings: prompt eval time =   21489.78 ms /   882 tokens (   24.36 ms per token,    41.04 tokens per second)
llama_print_timings:        eval time =   27239.86 ms /   229 runs   (  118.95 ms per token,     8.41 tokens per second)
llama_print_timings:       total time =   49141.84 ms /  1111 tokens


38/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      37.95 ms /   422 runs   (    0.09 ms per token, 11120.19 tokens per second)
llama_print_timings: prompt eval time =   18249.15 ms /   719 tokens (   25.38 ms per token,    39.40 tokens per second)
llama_print_timings:        eval time =   49776.21 ms /   421 runs   (  118.23 ms per token,     8.46 tokens per second)
llama_print_timings:       total time =   68840.86 ms /  1140 tokens


39/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      28.24 ms /   307 runs   (    0.09 ms per token, 10869.57 tokens per second)
llama_print_timings: prompt eval time =   23739.09 ms /   903 tokens (   26.29 ms per token,    38.04 tokens per second)
llama_print_timings:        eval time =   36520.49 ms /   306 runs   (  119.35 ms per token,     8.38 tokens per second)
llama_print_timings:       total time =   60845.47 ms /  1209 tokens


40/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      32.15 ms /   360 runs   (    0.09 ms per token, 11196.12 tokens per second)
llama_print_timings: prompt eval time =   57538.62 ms /  1945 tokens (   29.58 ms per token,    33.80 tokens per second)
llama_print_timings:        eval time =   48716.88 ms /   359 runs   (  135.70 ms per token,     7.37 tokens per second)
llama_print_timings:       total time =  106938.10 ms /  2304 tokens


41/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      26.98 ms /   299 runs   (    0.09 ms per token, 11080.64 tokens per second)
llama_print_timings: prompt eval time =   36753.66 ms /  1387 tokens (   26.50 ms per token,    37.74 tokens per second)
llama_print_timings:        eval time =   38571.16 ms /   298 runs   (  129.43 ms per token,     7.73 tokens per second)
llama_print_timings:       total time =   75899.53 ms /  1685 tokens


42/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      28.81 ms /   317 runs   (    0.09 ms per token, 11004.65 tokens per second)
llama_print_timings: prompt eval time =   22017.13 ms /   843 tokens (   26.12 ms per token,    38.29 tokens per second)
llama_print_timings:        eval time =   38137.66 ms /   316 runs   (  120.69 ms per token,     8.29 tokens per second)
llama_print_timings:       total time =   60764.55 ms /  1159 tokens


43/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      33.44 ms /   357 runs   (    0.09 ms per token, 10675.52 tokens per second)
llama_print_timings: prompt eval time =   17377.88 ms /   595 tokens (   29.21 ms per token,    34.24 tokens per second)
llama_print_timings:        eval time =   41442.46 ms /   356 runs   (  116.41 ms per token,     8.59 tokens per second)
llama_print_timings:       total time =   59531.88 ms /   951 tokens


44/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      27.94 ms /   311 runs   (    0.09 ms per token, 11130.60 tokens per second)
llama_print_timings: prompt eval time =   32470.98 ms /  1184 tokens (   27.42 ms per token,    36.46 tokens per second)
llama_print_timings:        eval time =   37020.84 ms /   310 runs   (  119.42 ms per token,     8.37 tokens per second)
llama_print_timings:       total time =   70058.10 ms /  1494 tokens


45/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      34.36 ms /   377 runs   (    0.09 ms per token, 10972.06 tokens per second)
llama_print_timings: prompt eval time =   28199.74 ms /  1074 tokens (   26.26 ms per token,    38.09 tokens per second)
llama_print_timings:        eval time =   43974.63 ms /   376 runs   (  116.95 ms per token,     8.55 tokens per second)
llama_print_timings:       total time =   72889.34 ms /  1450 tokens


46/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      31.13 ms /   358 runs   (    0.09 ms per token, 11500.90 tokens per second)
llama_print_timings: prompt eval time =   37419.44 ms /  1455 tokens (   25.72 ms per token,    38.88 tokens per second)
llama_print_timings:        eval time =   44331.07 ms /   357 runs   (  124.18 ms per token,     8.05 tokens per second)
llama_print_timings:       total time =   82414.31 ms /  1812 tokens


47/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      28.33 ms /   344 runs   (    0.08 ms per token, 12143.46 tokens per second)
llama_print_timings: prompt eval time =   38457.43 ms /  1529 tokens (   25.15 ms per token,    39.76 tokens per second)
llama_print_timings:        eval time =   42667.86 ms /   343 runs   (  124.40 ms per token,     8.04 tokens per second)
llama_print_timings:       total time =   81738.82 ms /  1872 tokens


48/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      28.87 ms /   322 runs   (    0.09 ms per token, 11151.90 tokens per second)
llama_print_timings: prompt eval time =   18591.59 ms /   694 tokens (   26.79 ms per token,    37.33 tokens per second)
llama_print_timings:        eval time =   36078.67 ms /   321 runs   (  112.39 ms per token,     8.90 tokens per second)
llama_print_timings:       total time =   55259.61 ms /  1015 tokens


49/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      23.61 ms /   282 runs   (    0.08 ms per token, 11945.10 tokens per second)
llama_print_timings: prompt eval time =   47284.60 ms /  1839 tokens (   25.71 ms per token,    38.89 tokens per second)
llama_print_timings:        eval time =   34915.13 ms /   281 runs   (  124.25 ms per token,     8.05 tokens per second)
llama_print_timings:       total time =   82705.56 ms /  2120 tokens


50/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      19.59 ms /   234 runs   (    0.08 ms per token, 11946.70 tokens per second)
llama_print_timings: prompt eval time =   16367.06 ms /   608 tokens (   26.92 ms per token,    37.15 tokens per second)
llama_print_timings:        eval time =   24861.93 ms /   233 runs   (  106.70 ms per token,     9.37 tokens per second)
llama_print_timings:       total time =   41635.08 ms /   841 tokens


51/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      35.32 ms /   419 runs   (    0.08 ms per token, 11864.31 tokens per second)
llama_print_timings: prompt eval time =   18400.54 ms /   718 tokens (   25.63 ms per token,    39.02 tokens per second)
llama_print_timings:        eval time =   46197.25 ms /   418 runs   (  110.52 ms per token,     9.05 tokens per second)
llama_print_timings:       total time =   65358.05 ms /  1136 tokens


52/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      28.79 ms /   342 runs   (    0.08 ms per token, 11880.78 tokens per second)
llama_print_timings: prompt eval time =   19944.66 ms /   818 tokens (   24.38 ms per token,    41.01 tokens per second)
llama_print_timings:        eval time =   38292.98 ms /   341 runs   (  112.30 ms per token,     8.91 tokens per second)
llama_print_timings:       total time =   58828.47 ms /  1159 tokens


53/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      19.72 ms /   227 runs   (    0.09 ms per token, 11512.32 tokens per second)
llama_print_timings: prompt eval time =   16295.34 ms /   606 tokens (   26.89 ms per token,    37.19 tokens per second)
llama_print_timings:        eval time =   23886.96 ms /   226 runs   (  105.69 ms per token,     9.46 tokens per second)
llama_print_timings:       total time =   40586.77 ms /   832 tokens


54/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      26.21 ms /   311 runs   (    0.08 ms per token, 11863.44 tokens per second)
llama_print_timings: prompt eval time =   30797.02 ms /  1189 tokens (   25.90 ms per token,    38.61 tokens per second)
llama_print_timings:        eval time =   36159.93 ms /   310 runs   (  116.64 ms per token,     8.57 tokens per second)
llama_print_timings:       total time =   67519.49 ms /  1499 tokens


55/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      36.89 ms /   426 runs   (    0.09 ms per token, 11548.16 tokens per second)
llama_print_timings: prompt eval time =   14278.76 ms /   532 tokens (   26.84 ms per token,    37.26 tokens per second)
llama_print_timings:        eval time =   45710.38 ms /   425 runs   (  107.55 ms per token,     9.30 tokens per second)
llama_print_timings:       total time =   60767.05 ms /   957 tokens


56/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      29.04 ms /   341 runs   (    0.09 ms per token, 11742.02 tokens per second)
llama_print_timings: prompt eval time =   30288.78 ms /  1176 tokens (   25.76 ms per token,    38.83 tokens per second)
llama_print_timings:        eval time =   39521.69 ms /   340 runs   (  116.24 ms per token,     8.60 tokens per second)
llama_print_timings:       total time =   70412.43 ms /  1516 tokens


57/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      37.70 ms /   434 runs   (    0.09 ms per token, 11510.41 tokens per second)
llama_print_timings: prompt eval time =   21482.03 ms /   881 tokens (   24.38 ms per token,    41.01 tokens per second)
llama_print_timings:        eval time =   49332.77 ms /   433 runs   (  113.93 ms per token,     8.78 tokens per second)
llama_print_timings:       total time =   71588.09 ms /  1314 tokens


58/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      37.94 ms /   432 runs   (    0.09 ms per token, 11387.00 tokens per second)
llama_print_timings: prompt eval time =   24700.83 ms /   977 tokens (   25.28 ms per token,    39.55 tokens per second)
llama_print_timings:        eval time =   49945.78 ms /   431 runs   (  115.88 ms per token,     8.63 tokens per second)
llama_print_timings:       total time =   75449.60 ms /  1408 tokens


59/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      27.82 ms /   319 runs   (    0.09 ms per token, 11465.33 tokens per second)
llama_print_timings: prompt eval time =   32694.41 ms /  1293 tokens (   25.29 ms per token,    39.55 tokens per second)
llama_print_timings:        eval time =   38197.41 ms /   318 runs   (  120.12 ms per token,     8.33 tokens per second)
llama_print_timings:       total time =   71473.43 ms /  1611 tokens


60/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      30.36 ms /   359 runs   (    0.08 ms per token, 11824.38 tokens per second)
llama_print_timings: prompt eval time =   29478.57 ms /  1136 tokens (   25.95 ms per token,    38.54 tokens per second)
llama_print_timings:        eval time =   41943.62 ms /   358 runs   (  117.16 ms per token,     8.54 tokens per second)
llama_print_timings:       total time =   72065.38 ms /  1494 tokens


61/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      31.89 ms /   341 runs   (    0.09 ms per token, 10692.34 tokens per second)
llama_print_timings: prompt eval time =   23277.68 ms /   863 tokens (   26.97 ms per token,    37.07 tokens per second)
llama_print_timings:        eval time =   64349.94 ms /   340 runs   (  189.26 ms per token,     5.28 tokens per second)
llama_print_timings:       total time =   88284.78 ms /  1203 tokens


62/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      15.38 ms /   178 runs   (    0.09 ms per token, 11576.48 tokens per second)
llama_print_timings: prompt eval time =   20445.21 ms /   764 tokens (   26.76 ms per token,    37.37 tokens per second)
llama_print_timings:        eval time =   20172.55 ms /   177 runs   (  113.97 ms per token,     8.77 tokens per second)
llama_print_timings:       total time =   40942.22 ms /   941 tokens


63/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      31.68 ms /   356 runs   (    0.09 ms per token, 11237.37 tokens per second)
llama_print_timings: prompt eval time =   20946.25 ms /   819 tokens (   25.58 ms per token,    39.10 tokens per second)
llama_print_timings:        eval time =   42222.86 ms /   355 runs   (  118.94 ms per token,     8.41 tokens per second)
llama_print_timings:       total time =   63793.21 ms /  1174 tokens


64/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      33.57 ms /   390 runs   (    0.09 ms per token, 11618.21 tokens per second)
llama_print_timings: prompt eval time =   21172.83 ms /   801 tokens (   26.43 ms per token,    37.83 tokens per second)
llama_print_timings:        eval time =   46093.09 ms /   389 runs   (  118.49 ms per token,     8.44 tokens per second)
llama_print_timings:       total time =   67995.99 ms /  1190 tokens


65/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      20.41 ms /   227 runs   (    0.09 ms per token, 11123.09 tokens per second)
llama_print_timings: prompt eval time =   17931.84 ms /   639 tokens (   28.06 ms per token,    35.63 tokens per second)
llama_print_timings:        eval time =   25589.90 ms /   226 runs   (  113.23 ms per token,     8.83 tokens per second)
llama_print_timings:       total time =   43939.45 ms /   865 tokens


66/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      33.95 ms /   391 runs   (    0.09 ms per token, 11515.24 tokens per second)
llama_print_timings: prompt eval time =   30509.53 ms /  1082 tokens (   28.20 ms per token,    35.46 tokens per second)
llama_print_timings:        eval time =   53610.90 ms /   390 runs   (  137.46 ms per token,     7.27 tokens per second)
llama_print_timings:       total time =   84853.90 ms /  1472 tokens


67/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      26.64 ms /   287 runs   (    0.09 ms per token, 10771.66 tokens per second)
llama_print_timings: prompt eval time =   20462.22 ms /   557 tokens (   36.74 ms per token,    27.22 tokens per second)
llama_print_timings:        eval time =   35544.36 ms /   286 runs   (  124.28 ms per token,     8.05 tokens per second)
llama_print_timings:       total time =   56548.10 ms /   843 tokens


68/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      31.00 ms /   364 runs   (    0.09 ms per token, 11741.18 tokens per second)
llama_print_timings: prompt eval time =   33579.18 ms /  1183 tokens (   28.38 ms per token,    35.23 tokens per second)
llama_print_timings:        eval time =   46871.18 ms /   363 runs   (  129.12 ms per token,     7.74 tokens per second)
llama_print_timings:       total time =   81112.03 ms /  1546 tokens


69/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      20.03 ms /   226 runs   (    0.09 ms per token, 11280.82 tokens per second)
llama_print_timings: prompt eval time =   17512.46 ms /   574 tokens (   30.51 ms per token,    32.78 tokens per second)
llama_print_timings:        eval time =   25966.27 ms /   225 runs   (  115.41 ms per token,     8.67 tokens per second)
llama_print_timings:       total time =   43905.62 ms /   799 tokens


70/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      22.64 ms /   256 runs   (    0.09 ms per token, 11306.92 tokens per second)
llama_print_timings: prompt eval time =   26691.61 ms /  1018 tokens (   26.22 ms per token,    38.14 tokens per second)
llama_print_timings:        eval time =   31204.39 ms /   255 runs   (  122.37 ms per token,     8.17 tokens per second)
llama_print_timings:       total time =   58388.11 ms /  1273 tokens


71/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      29.20 ms /   325 runs   (    0.09 ms per token, 11130.14 tokens per second)
llama_print_timings: prompt eval time =   20310.50 ms /   747 tokens (   27.19 ms per token,    36.78 tokens per second)
llama_print_timings:        eval time =   40703.04 ms /   324 runs   (  125.63 ms per token,     7.96 tokens per second)
llama_print_timings:       total time =   61624.83 ms /  1071 tokens


72/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      24.61 ms /   278 runs   (    0.09 ms per token, 11298.52 tokens per second)
llama_print_timings: prompt eval time =   25336.95 ms /   958 tokens (   26.45 ms per token,    37.81 tokens per second)
llama_print_timings:        eval time =   32264.65 ms /   277 runs   (  116.48 ms per token,     8.59 tokens per second)
llama_print_timings:       total time =   58102.82 ms /  1235 tokens


73/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      23.69 ms /   271 runs   (    0.09 ms per token, 11439.91 tokens per second)
llama_print_timings: prompt eval time =   23619.88 ms /   934 tokens (   25.29 ms per token,    39.54 tokens per second)
llama_print_timings:        eval time =   30777.69 ms /   270 runs   (  113.99 ms per token,     8.77 tokens per second)
llama_print_timings:       total time =   54864.02 ms /  1204 tokens


74/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      32.90 ms /   363 runs   (    0.09 ms per token, 11035.11 tokens per second)
llama_print_timings: prompt eval time =   26365.65 ms /  1034 tokens (   25.50 ms per token,    39.22 tokens per second)
llama_print_timings:        eval time =   41922.12 ms /   362 runs   (  115.81 ms per token,     8.64 tokens per second)
llama_print_timings:       total time =   68921.25 ms /  1396 tokens


75/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      34.67 ms /   401 runs   (    0.09 ms per token, 11565.86 tokens per second)
llama_print_timings: prompt eval time =   30291.12 ms /  1174 tokens (   25.80 ms per token,    38.76 tokens per second)
llama_print_timings:        eval time =   47312.38 ms /   400 runs   (  118.28 ms per token,     8.45 tokens per second)
llama_print_timings:       total time =   78303.59 ms /  1574 tokens


76/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      44.42 ms /   502 runs   (    0.09 ms per token, 11301.98 tokens per second)
llama_print_timings: prompt eval time =   29822.46 ms /  1106 tokens (   26.96 ms per token,    37.09 tokens per second)
llama_print_timings:        eval time =   58988.15 ms /   501 runs   (  117.74 ms per token,     8.49 tokens per second)
llama_print_timings:       total time =   89715.24 ms /  1607 tokens


77/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      30.77 ms /   337 runs   (    0.09 ms per token, 10952.94 tokens per second)
llama_print_timings: prompt eval time =   17854.94 ms /   652 tokens (   27.38 ms per token,    36.52 tokens per second)
llama_print_timings:        eval time =   37173.54 ms /   336 runs   (  110.64 ms per token,     9.04 tokens per second)
llama_print_timings:       total time =   55622.77 ms /   988 tokens


78/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      11.76 ms /   129 runs   (    0.09 ms per token, 10967.52 tokens per second)
llama_print_timings: prompt eval time =   17995.66 ms /   641 tokens (   28.07 ms per token,    35.62 tokens per second)
llama_print_timings:        eval time =   20051.98 ms /   128 runs   (  156.66 ms per token,     6.38 tokens per second)
llama_print_timings:       total time =   38269.57 ms /   769 tokens


79/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      30.38 ms /   321 runs   (    0.09 ms per token, 10567.21 tokens per second)
llama_print_timings: prompt eval time =   41283.01 ms /   904 tokens (   45.67 ms per token,    21.90 tokens per second)
llama_print_timings:        eval time =   68898.35 ms /   320 runs   (  215.31 ms per token,     4.64 tokens per second)
llama_print_timings:       total time =  110801.92 ms /  1224 tokens


80/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      17.54 ms /   186 runs   (    0.09 ms per token, 10604.33 tokens per second)
llama_print_timings: prompt eval time =   34823.98 ms /  1038 tokens (   33.55 ms per token,    29.81 tokens per second)
llama_print_timings:        eval time =   25597.27 ms /   185 runs   (  138.36 ms per token,     7.23 tokens per second)
llama_print_timings:       total time =   60763.31 ms /  1223 tokens


81/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      63.09 ms /   475 runs   (    0.13 ms per token,  7529.28 tokens per second)
llama_print_timings: prompt eval time =   29851.25 ms /   964 tokens (   30.97 ms per token,    32.29 tokens per second)
llama_print_timings:        eval time =  150495.64 ms /   474 runs   (  317.50 ms per token,     3.15 tokens per second)
llama_print_timings:       total time =  181487.49 ms /  1438 tokens


82/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      19.38 ms /   190 runs   (    0.10 ms per token,  9806.45 tokens per second)
llama_print_timings: prompt eval time =   18551.73 ms /   524 tokens (   35.40 ms per token,    28.25 tokens per second)
llama_print_timings:        eval time =   39115.75 ms /   189 runs   (  206.96 ms per token,     4.83 tokens per second)
llama_print_timings:       total time =   58075.96 ms /   713 tokens


83/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      22.55 ms /   234 runs   (    0.10 ms per token, 10379.24 tokens per second)
llama_print_timings: prompt eval time =   35760.23 ms /  1160 tokens (   30.83 ms per token,    32.44 tokens per second)
llama_print_timings:        eval time =   39308.37 ms /   233 runs   (  168.71 ms per token,     5.93 tokens per second)
llama_print_timings:       total time =   75527.43 ms /  1393 tokens


84/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      13.53 ms /   144 runs   (    0.09 ms per token, 10641.44 tokens per second)
llama_print_timings: prompt eval time =   21466.50 ms /   576 tokens (   37.27 ms per token,    26.83 tokens per second)
llama_print_timings:        eval time =   24373.40 ms /   143 runs   (  170.44 ms per token,     5.87 tokens per second)
llama_print_timings:       total time =   46106.96 ms /   719 tokens


85/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      26.99 ms /   263 runs   (    0.10 ms per token,  9743.99 tokens per second)
llama_print_timings: prompt eval time =   30475.40 ms /   939 tokens (   32.46 ms per token,    30.81 tokens per second)
llama_print_timings:        eval time =   58640.89 ms /   262 runs   (  223.82 ms per token,     4.47 tokens per second)
llama_print_timings:       total time =   89642.30 ms /  1201 tokens


86/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      16.05 ms /   162 runs   (    0.10 ms per token, 10094.72 tokens per second)
llama_print_timings: prompt eval time =   25214.51 ms /   538 tokens (   46.87 ms per token,    21.34 tokens per second)
llama_print_timings:        eval time =   24276.00 ms /   161 runs   (  150.78 ms per token,     6.63 tokens per second)
llama_print_timings:       total time =   49811.06 ms /   699 tokens


87/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      27.30 ms /   289 runs   (    0.09 ms per token, 10585.31 tokens per second)
llama_print_timings: prompt eval time =   45980.83 ms /  1052 tokens (   43.71 ms per token,    22.88 tokens per second)
llama_print_timings:        eval time =   72511.73 ms /   288 runs   (  251.78 ms per token,     3.97 tokens per second)
llama_print_timings:       total time =  119065.98 ms /  1340 tokens


88/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      31.98 ms /   335 runs   (    0.10 ms per token, 10475.62 tokens per second)
llama_print_timings: prompt eval time =   31947.51 ms /   843 tokens (   37.90 ms per token,    26.39 tokens per second)
llama_print_timings:        eval time =   59500.61 ms /   334 runs   (  178.15 ms per token,     5.61 tokens per second)
llama_print_timings:       total time =   92089.75 ms /  1177 tokens


89/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      23.70 ms /   217 runs   (    0.11 ms per token,  9156.50 tokens per second)
llama_print_timings: prompt eval time =   31250.14 ms /   861 tokens (   36.30 ms per token,    27.55 tokens per second)
llama_print_timings:        eval time =   51330.14 ms /   216 runs   (  237.64 ms per token,     4.21 tokens per second)
llama_print_timings:       total time =   83063.32 ms /  1077 tokens


90/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      27.91 ms /   312 runs   (    0.09 ms per token, 11179.99 tokens per second)
llama_print_timings: prompt eval time =   70443.13 ms /  1592 tokens (   44.25 ms per token,    22.60 tokens per second)
llama_print_timings:        eval time =   63373.04 ms /   311 runs   (  203.77 ms per token,     4.91 tokens per second)
llama_print_timings:       total time =  134600.50 ms /  1903 tokens


91/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      57.67 ms /   512 runs   (    0.11 ms per token,  8877.79 tokens per second)
llama_print_timings: prompt eval time =   29164.54 ms /   751 tokens (   38.83 ms per token,    25.75 tokens per second)
llama_print_timings:        eval time =  117928.31 ms /   511 runs   (  230.78 ms per token,     4.33 tokens per second)
llama_print_timings:       total time =  148234.13 ms /  1262 tokens


92/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      40.49 ms /   403 runs   (    0.10 ms per token,  9951.85 tokens per second)
llama_print_timings: prompt eval time =   25873.52 ms /   594 tokens (   43.56 ms per token,    22.96 tokens per second)
llama_print_timings:        eval time =   57381.07 ms /   402 runs   (  142.74 ms per token,     7.01 tokens per second)
llama_print_timings:       total time =   84045.11 ms /   996 tokens


93/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      23.05 ms /   253 runs   (    0.09 ms per token, 10978.04 tokens per second)
llama_print_timings: prompt eval time =   32663.24 ms /  1148 tokens (   28.45 ms per token,    35.15 tokens per second)
llama_print_timings:        eval time =   43695.97 ms /   252 runs   (  173.40 ms per token,     5.77 tokens per second)
llama_print_timings:       total time =   76900.51 ms /  1400 tokens


94/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      43.10 ms /   378 runs   (    0.11 ms per token,  8770.30 tokens per second)
llama_print_timings: prompt eval time =   53037.58 ms /  1406 tokens (   37.72 ms per token,    26.51 tokens per second)
llama_print_timings:        eval time =  154601.31 ms /   377 runs   (  410.08 ms per token,     2.44 tokens per second)
llama_print_timings:       total time =  208691.81 ms /  1783 tokens


95/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      24.42 ms /   286 runs   (    0.09 ms per token, 11712.67 tokens per second)
llama_print_timings: prompt eval time =   38706.03 ms /  1237 tokens (   31.29 ms per token,    31.96 tokens per second)
llama_print_timings:        eval time =   40253.19 ms /   285 runs   (  141.24 ms per token,     7.08 tokens per second)
llama_print_timings:       total time =   79473.01 ms /  1522 tokens


96/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      24.63 ms /   190 runs   (    0.13 ms per token,  7714.17 tokens per second)
llama_print_timings: prompt eval time =   26493.14 ms /  1015 tokens (   26.10 ms per token,    38.31 tokens per second)
llama_print_timings:        eval time =   30596.83 ms /   189 runs   (  161.89 ms per token,     6.18 tokens per second)
llama_print_timings:       total time =   57446.88 ms /  1204 tokens


97/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      30.28 ms /   342 runs   (    0.09 ms per token, 11293.09 tokens per second)
llama_print_timings: prompt eval time =   18810.84 ms /   613 tokens (   30.69 ms per token,    32.59 tokens per second)
llama_print_timings:        eval time =   44180.47 ms /   341 runs   (  129.56 ms per token,     7.72 tokens per second)
llama_print_timings:       total time =   63607.71 ms /   954 tokens


98/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      15.76 ms /   179 runs   (    0.09 ms per token, 11359.31 tokens per second)
llama_print_timings: prompt eval time =   24615.92 ms /   884 tokens (   27.85 ms per token,    35.91 tokens per second)
llama_print_timings:        eval time =   23035.27 ms /   178 runs   (  129.41 ms per token,     7.73 tokens per second)
llama_print_timings:       total time =   47952.33 ms /  1062 tokens


99/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      14.50 ms /   162 runs   (    0.09 ms per token, 11172.41 tokens per second)
llama_print_timings: prompt eval time =   21431.56 ms /   639 tokens (   33.54 ms per token,    29.82 tokens per second)
llama_print_timings:        eval time =   24866.06 ms /   161 runs   (  154.45 ms per token,     6.47 tokens per second)
llama_print_timings:       total time =   46660.48 ms /   800 tokens


100/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      38.59 ms /   449 runs   (    0.09 ms per token, 11636.65 tokens per second)
llama_print_timings: prompt eval time =   22950.72 ms /   782 tokens (   29.35 ms per token,    34.07 tokens per second)
llama_print_timings:        eval time =   60484.09 ms /   448 runs   (  135.01 ms per token,     7.41 tokens per second)
llama_print_timings:       total time =   84244.86 ms /  1230 tokens


101/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      27.55 ms /   305 runs   (    0.09 ms per token, 11070.38 tokens per second)
llama_print_timings: prompt eval time =   33193.71 ms /  1069 tokens (   31.05 ms per token,    32.20 tokens per second)
llama_print_timings:        eval time =   40499.97 ms /   304 runs   (  133.22 ms per token,     7.51 tokens per second)
llama_print_timings:       total time =   74226.05 ms /  1373 tokens


102/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      40.97 ms /   454 runs   (    0.09 ms per token, 11081.01 tokens per second)
llama_print_timings: prompt eval time =   29907.96 ms /   659 tokens (   45.38 ms per token,    22.03 tokens per second)
llama_print_timings:        eval time =   62309.57 ms /   453 runs   (  137.55 ms per token,     7.27 tokens per second)
llama_print_timings:       total time =   93132.25 ms /  1112 tokens


103/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      43.75 ms /   512 runs   (    0.09 ms per token, 11702.32 tokens per second)
llama_print_timings: prompt eval time =   50008.42 ms /  1563 tokens (   32.00 ms per token,    31.25 tokens per second)
llama_print_timings:        eval time =   72798.06 ms /   511 runs   (  142.46 ms per token,     7.02 tokens per second)
llama_print_timings:       total time =  123896.04 ms /  2074 tokens


104/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      14.66 ms /   172 runs   (    0.09 ms per token, 11729.41 tokens per second)
llama_print_timings: prompt eval time =   19719.17 ms /   791 tokens (   24.93 ms per token,    40.11 tokens per second)
llama_print_timings:        eval time =   19792.50 ms /   171 runs   (  115.75 ms per token,     8.64 tokens per second)
llama_print_timings:       total time =   39800.64 ms /   962 tokens


105/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      28.24 ms /   313 runs   (    0.09 ms per token, 11083.96 tokens per second)
llama_print_timings: prompt eval time =   30942.92 ms /  1248 tokens (   24.79 ms per token,    40.33 tokens per second)
llama_print_timings:        eval time =   39037.89 ms /   312 runs   (  125.12 ms per token,     7.99 tokens per second)
llama_print_timings:       total time =   70517.18 ms /  1560 tokens


106/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      27.36 ms /   307 runs   (    0.09 ms per token, 11221.58 tokens per second)
llama_print_timings: prompt eval time =   31070.87 ms /  1194 tokens (   26.02 ms per token,    38.43 tokens per second)
llama_print_timings:        eval time =   36949.15 ms /   306 runs   (  120.75 ms per token,     8.28 tokens per second)
llama_print_timings:       total time =   68557.39 ms /  1500 tokens


107/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      31.65 ms /   330 runs   (    0.10 ms per token, 10425.88 tokens per second)
llama_print_timings: prompt eval time =   17027.96 ms /   612 tokens (   27.82 ms per token,    35.94 tokens per second)
llama_print_timings:        eval time =   37794.68 ms /   329 runs   (  114.88 ms per token,     8.70 tokens per second)
llama_print_timings:       total time =   55406.57 ms /   941 tokens


108/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      23.73 ms /   263 runs   (    0.09 ms per token, 11080.68 tokens per second)
llama_print_timings: prompt eval time =   10512.28 ms /   410 tokens (   25.64 ms per token,    39.00 tokens per second)
llama_print_timings:        eval time =   31554.70 ms /   262 runs   (  120.44 ms per token,     8.30 tokens per second)
llama_print_timings:       total time =   42618.09 ms /   672 tokens


109/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      28.12 ms /   321 runs   (    0.09 ms per token, 11416.58 tokens per second)
llama_print_timings: prompt eval time =   11996.48 ms /   456 tokens (   26.31 ms per token,    38.01 tokens per second)
llama_print_timings:        eval time =   35105.85 ms /   320 runs   (  109.71 ms per token,     9.12 tokens per second)
llama_print_timings:       total time =   47668.30 ms /   776 tokens


110/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      44.81 ms /   512 runs   (    0.09 ms per token, 11426.28 tokens per second)
llama_print_timings: prompt eval time =   29086.83 ms /  1049 tokens (   27.73 ms per token,    36.06 tokens per second)
llama_print_timings:        eval time =   63096.14 ms /   511 runs   (  123.48 ms per token,     8.10 tokens per second)
llama_print_timings:       total time =   93116.11 ms /  1560 tokens


111/112 done


Llama.generate: prefix-match hit

llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      10.17 ms /   105 runs   (    0.10 ms per token, 10324.48 tokens per second)
llama_print_timings: prompt eval time =   12429.16 ms /   455 tokens (   27.32 ms per token,    36.61 tokens per second)
llama_print_timings:        eval time =   11836.94 ms /   104 runs   (  113.82 ms per token,     8.79 tokens per second)
llama_print_timings:       total time =   24450.93 ms /   559 tokens


112/112 done
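
The per-call `llama_print_timings` output above is hard to read at a glance. As a rough aid, the sketch below aggregates those lines into per-stage totals and an overall tokens-per-second figure. It assumes the captured output is available as a plain string in the format shown above. `TIMING_RE`, `summarize_timings`, and `tokens_per_second` are hypothetical helper names introduced here for illustration; they are not part of llama.cpp or LlamaIndex.

```python
import re

# Matches lines in the format printed above, for example:
#   llama_print_timings:        eval time =   11836.94 ms /   104 runs   (...)
# The "/ N runs|tokens" part is optional because the "load time" line omits it.
TIMING_RE = re.compile(
    r"llama_print_timings:\s+(?P<stage>[a-z ]+?) time\s*=\s*(?P<ms>[\d.]+) ms"
    r"(?:\s*/\s*(?P<count>\d+)\s+(?:runs|tokens))?"
)


def summarize_timings(log_text):
    """Sum milliseconds and token/run counts per stage across all calls."""
    totals = {}
    for match in TIMING_RE.finditer(log_text):
        stage = match.group("stage").strip()
        entry = totals.setdefault(stage, {"ms": 0.0, "count": 0, "calls": 0})
        entry["ms"] += float(match.group("ms"))
        entry["count"] += int(match.group("count") or 0)
        entry["calls"] += 1
    return totals


def tokens_per_second(entry):
    """Average throughput for one stage; returns 0.0 if no time was recorded."""
    if entry["ms"] == 0:
        return 0.0
    return entry["count"] / (entry["ms"] / 1000.0)


# Usage: the timing lines of the final call above, pasted in as a string.
sample_log = """
llama_print_timings:        load time =   16833.79 ms
llama_print_timings:      sample time =      10.17 ms /   105 runs   (    0.10 ms per token, 10324.48 tokens per second)
llama_print_timings: prompt eval time =   12429.16 ms /   455 tokens (   27.32 ms per token,    36.61 tokens per second)
llama_print_timings:        eval time =   11836.94 ms /   104 runs   (  113.82 ms per token,     8.79 tokens per second)
llama_print_timings:       total time =   24450.93 ms /   559 tokens
"""

summary = summarize_timings(sample_log)
for stage, entry in summary.items():
    print(stage, entry, round(tokens_per_second(entry), 2))
```

In this log the `load time` value is identical for every call, so summing it across calls is not meaningful; the `eval` and `prompt eval` stages are the ones worth comparing.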


In [50]:
len(df_test)

112

In [52]:
# Attach the generated responses to the test set (one per row), then save to CSV.
df_test['response'] = responses
df_test.to_csv("counsel-chat-best-answer-test-response.csv", index=False)