In [1]:
import pandas as pd

from langchain.embeddings.huggingface import HuggingFaceEmbeddings

from llama_cpp import Llama

from llama_index import (
    SimpleDirectoryReader,
    LangchainEmbedding,
    VectorStoreIndex,
    PromptHelper,
    LLMPredictor,
    ServiceContext,
    Document
)
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

In [2]:
import llama_index
llama_index.__version__

'0.8.21'

In [3]:
llama_index.__file__

'/Users/rchan/Library/CloudStorage/OneDrive-TheAlanTuringInstitute/llama_index_proper/llama_index/__init__.py'

Note: this notebook assumes that the reginald directory contains a `gguf_models/` folder. Here we have downloaded the 4-bit quantized version of Llama2-13b-chat from [`TheBloke/Llama-2-13B-chat-GGML`](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML).

Note that the version of `llama-cpp-python` we are currently running no longer supports the `ggmlv3` model format and uses `gguf` instead. We therefore need to convert the downloaded model to `gguf` format using the `convert-llama-ggmlv3-to-gguf.py` script in [`llama.cpp`](https://github.com/ggerganov/llama.cpp).
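
Because the converted file used below still carries a `.bin` extension, a quick sanity check on the format can be useful before loading it. The sketch below relies only on the fact that GGUF files begin with the four magic bytes `GGUF`; the helper name `is_gguf` is hypothetical and not part of `llama-cpp-python`.

```python
GGUF_MAGIC = b"GGUF"  # the first four bytes of every GGUF file


def is_gguf(path):
    """Return True if the file at `path` starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(GGUF_MAGIC)) == GGUF_MAGIC
```

For example, `is_gguf("../../gguf_models/llama-2-13b-chat.gguf.q4_K_S.bin")` should return `True` once the conversion has succeeded. The check says nothing about whether the tensors inside the file are valid.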

## Quick example with llama-cpp-python

In [4]:
llama_2_13b_chat_path = "../../gguf_models/llama-2-13b-chat.gguf.q4_K_S.bin"

## Using Metal acceleration

In [5]:
llm = Llama(model_path=llama_2_13b_chat_path, n_gpu_layers=1)

llama_model_loader: loaded meta data with 18 key-value pairs and 363 tensors from ../../gguf_models/llama-2-13b-chat.gguf.q4_K_S.bin (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:               output_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:                    output.weight q6_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_q.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.attn_k.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    5:              blk.0.attn_v.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    6:         blk.0.attn_output.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    7:           blk.0.attn_norm.weight f32      [  5120

llama_new_context_with_model: compute buffer total size =   91.47 MB
llama_new_context_with_model: max tensor size =   128.17 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  7024.61 MB, ( 7025.05 / 21845.34)
ggml_metal_add_buffer: allocated 'eval            ' buffer, size =     1.48 MB, ( 7026.53 / 21845.34)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   402.00 MB, ( 7428.53 / 21845.34)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =    90.02 MB, ( 7518.55 / 21845.34)
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | 


In [6]:
prompt_example = "Name all the planets in the solar system and state their distances to the sun"

In [7]:
output = llm(prompt_example,
             max_tokens=512,
             echo=True)


llama_print_timings:        load time =   610.65 ms
llama_print_timings:      sample time =   196.05 ms /   268 runs   (    0.73 ms per token,  1367.01 tokens per second)
llama_print_timings: prompt eval time =   610.63 ms /    17 tokens (   35.92 ms per token,    27.84 tokens per second)
llama_print_timings:        eval time = 14795.14 ms /   267 runs   (   55.41 ms per token,    18.05 tokens per second)
llama_print_timings:       total time = 15977.86 ms


In [8]:
print(output)

{'id': 'cmpl-0026dd42-1853-4a6c-8e46-5a6e71740986', 'object': 'text_completion', 'created': 1694121339, 'model': '../../gguf_models/llama-2-13b-chat.gguf.q4_K_S.bin', 'choices': [{'text': "Name all the planets in the solar system and state their distances to the sun.\n\nThere are eight planets in the solar system: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Here is a list of each planet along with its distance from the Sun (in astronomical units or AU):\n\n1. Mercury - 0.4 AU (very close to the Sun)\n2. Venus - 1.0 AU (just inside Earth's orbit)\n3. Earth - 1.0 AU (the distance from the Earth to the Sun is called an astronomical unit, or AU)\n4. Mars - 1.6 AU (about 1.5 times the distance from the Earth to the Sun)\n5. Jupiter - 5.2 AU (about 5 times the distance from the Earth to the Sun)\n6. Saturn - 9.5 AU (almost twice the distance from the Earth to the Sun)\n7. Uranus - 19.0 AU (about 4 times the distance from the Earth to the Sun)\n8. Neptune - 30.1 AU (mor

In [9]:
print(output["choices"][0]["text"])

Name all the planets in the solar system and state their distances to the sun.

There are eight planets in the solar system: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. Here is a list of each planet along with its distance from the Sun (in astronomical units or AU):

1. Mercury - 0.4 AU (very close to the Sun)
2. Venus - 1.0 AU (just inside Earth's orbit)
3. Earth - 1.0 AU (the distance from the Earth to the Sun is called an astronomical unit, or AU)
4. Mars - 1.6 AU (about 1.5 times the distance from the Earth to the Sun)
5. Jupiter - 5.2 AU (about 5 times the distance from the Earth to the Sun)
6. Saturn - 9.5 AU (almost twice the distance from the Earth to the Sun)
7. Uranus - 19.0 AU (about 4 times the distance from the Earth to the Sun)
8. Neptune - 30.1 AU (more than 3 times the distance from the Earth to the Sun)


## Using CPU

In [10]:
llm = Llama(model_path=llama_2_13b_chat_path)

llama_model_loader: loaded meta data with 18 key-value pairs and 363 tensors from ../../gguf_models/llama-2-13b-chat.gguf.q4_K_S.bin (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:               output_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:                    output.weight q6_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_q.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.attn_k.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    5:              blk.0.attn_v.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    6:         blk.0.attn_output.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    7:           blk.0.attn_norm.weight f32      [  5120

In [11]:
output = llm(prompt_example,
             max_tokens=512,
             echo=True)


llama_print_timings:        load time =  1480.25 ms
llama_print_timings:      sample time =   162.22 ms /   212 runs   (    0.77 ms per token,  1306.87 tokens per second)
llama_print_timings: prompt eval time =  1480.21 ms /    17 tokens (   87.07 ms per token,    11.48 tokens per second)
llama_print_timings:        eval time = 20115.90 ms /   211 runs   (   95.34 ms per token,    10.49 tokens per second)
llama_print_timings:       total time = 22063.41 ms


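By inspection, Metal acceleration is faster, as expected: according to the `eval time` lines in the timings above, generation ran at about 18 tokens per second with Metal versus about 10.5 tokens per second on CPU.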
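
The comparison can also be made from the logs rather than by eye. The following is a minimal sketch that parses the `eval time` line of a `llama_print_timings` block and computes generation throughput; the two log lines are copied from the runs above, while the helper name `eval_tokens_per_second` and the regular expression are illustrative assumptions rather than anything provided by `llama-cpp-python`.

```python
import re

# Matches the generation ("eval time") line, not the "prompt eval time" line.
EVAL_RE = re.compile(
    r"llama_print_timings:\s+eval time\s*=\s*([\d.]+) ms\s*/\s*(\d+) runs"
)


def eval_tokens_per_second(log):
    """Return generated tokens per second from a llama_print_timings log."""
    match = EVAL_RE.search(log)
    if match is None:
        raise ValueError("no 'eval time' line found in log")
    elapsed_ms = float(match.group(1))
    runs = int(match.group(2))
    return runs / (elapsed_ms / 1000.0)


metal_log = "llama_print_timings:        eval time = 14795.14 ms /   267 runs"
cpu_log = "llama_print_timings:        eval time = 20115.90 ms /   211 runs"

metal_tps = eval_tokens_per_second(metal_log)
cpu_tps = eval_tokens_per_second(cpu_log)
speedup = metal_tps / cpu_tps
print(f"Metal: {metal_tps:.2f} tok/s, CPU: {cpu_tps:.2f} tok/s, speedup: {speedup:.2f}x")
```

This is a single run of each configuration with different completion lengths, so the ratio is only indicative.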

In [12]:
print(output["choices"][0]["text"])

Name all the planets in the solar system and state their distances to the sun.
1. Mercury - 58 million kilometers (36 million miles)
2. Venus - 108 million kilometers (67 million miles)
3. Earth - 149.6 million kilometers (92.96 million miles)
4. Mars - 225 million kilometers (140 million miles)
5. Jupiter - 778.3 million kilometers (483.8 million miles)
6. Saturn - 1.4 billion kilometers (870 million miles)
7. Uranus - 2.9 billion kilometers (1.8 billion miles)
8. Neptune - 4.5 billion kilometers (2.8 billion miles)

Note that the distance of each planet from the Sun is measured in terms of their average distance, as the orbits of the planets are not perfectly circular and the distances vary slightly over the course of a year.


## Using Llama2 in `llama-index`

In [13]:
llm = LlamaCPP(
    model_path="../../gguf_models/llama-2-13b-chat.gguf.q4_K_S.bin",
    temperature=0.1,
    max_new_tokens=1024,
    # llama2 has a context window of 4096 tokens,
    # but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 1},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

llama_model_loader: loaded meta data with 18 key-value pairs and 363 tensors from ../../gguf_models/llama-2-13b-chat.gguf.q4_K_S.bin (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q4_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    1:               output_norm.weight f32      [  5120,     1,     1,     1 ]
llama_model_loader: - tensor    2:                    output.weight q6_K     [  5120, 32000,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_q.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.attn_k.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    5:              blk.0.attn_v.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    6:         blk.0.attn_output.weight q4_K     [  5120,  5120,     1,     1 ]
llama_model_loader: - tensor    7:           blk.0.attn_norm.weight f32      [  5120

llama_new_context_with_model: compute buffer total size =  356.16 MB
llama_new_context_with_model: max tensor size =   128.17 MB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  7024.61 MB, ( 7025.11 / 21845.34)
ggml_metal_add_buffer: allocated 'eval            ' buffer, size =     1.48 MB, ( 7026.59 / 21845.34)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =  3048.88 MB, (10075.47 / 21845.34)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =   354.70 MB, (10430.17 / 21845.34)
AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | 


In [14]:
llm.metadata

LLMMetadata(context_window=3900, num_output=1024, is_chat_model=False, is_function_calling_model=False, model_name='../../gguf_models/llama-2-13b-chat.gguf.q4_K_S.bin')
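
The `messages_to_prompt` and `completion_to_prompt` helpers passed to `LlamaCPP` above wrap inputs in the Llama 2 chat template. As a rough illustration, the sketch below builds a single-turn prompt in the format published for Llama 2 chat models. The function name `llama2_prompt` is hypothetical, and the `llama_index` helpers may differ in whitespace and may insert a default system prompt.

```python
def llama2_prompt(user_message, system_prompt=None):
    """Build a single-turn Llama 2 chat prompt (illustrative sketch only)."""
    if system_prompt:
        # The system prompt is wrapped in <<SYS>> tags inside the first [INST] block.
        return (
            f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"<s>[INST] {user_message} [/INST]"


print(llama2_prompt("What is the Turing Institute?", "You are a helpful assistant."))
```

The model's reply is expected to follow the closing `[/INST]` tag.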

In [15]:
handbook = pd.read_csv("../../data/public/handbook-scraped.csv")
wiki = pd.read_csv("../../data/turing_internal/wiki-scraped.csv")
# turing = pd.read_csv("../../data/public/turingacuk-no-boilerplate.csv")

text_list = list(handbook["body"].astype("str")) + list(wiki["body"].astype("str"))
documents = [Document(text=t) for t in text_list]

In [16]:
hfemb = HuggingFaceEmbeddings()
embed_model = LangchainEmbedding(hfemb)

In [17]:
hfemb

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
), model_name='sentence-transformers/all-mpnet-base-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False)

In [18]:
# set number of output tokens
num_output = 1024
# set maximum input size
context_window = 4096
# set maximum chunk size
chunk_size_limit = 512
# set chunk overlap ratio (0 means no overlap between chunks)
chunk_overlap_ratio = 0

prompt_helper = PromptHelper(
    context_window=context_window,
    num_output=num_output,
    chunk_size_limit=chunk_size_limit,
    chunk_overlap_ratio=chunk_overlap_ratio,
)
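
To make the settings above concrete, here is a rough back-of-the-envelope sketch of the token budget they imply. It assumes the prompt budget is simply the context window minus the tokens reserved for output, and it ignores the tokens taken by the prompt template and the question, so `PromptHelper` itself will behave differently in detail. The function names are hypothetical.

```python
def prompt_budget(context_window, num_output):
    """Tokens left for the prompt after reserving room for the output."""
    return context_window - num_output


def max_full_chunks(context_window, num_output, chunk_size):
    """Upper bound on how many full-size chunks fit in the prompt budget."""
    return prompt_budget(context_window, num_output) // chunk_size


# Values given to PromptHelper in this notebook.
print(prompt_budget(4096, 1024))         # 3072
print(max_full_chunks(4096, 1024, 512))  # 6

# Values given to LlamaCPP earlier in this notebook (context_window=3900).
print(prompt_budget(3900, 1024))         # 2876
print(max_full_chunks(3900, 1024, 512))  # 5
```

Note that `PromptHelper` is configured here with a context window of 4096, while `LlamaCPP` was given 3900, so the two budgets differ by 196 tokens.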

In [19]:
service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(llm=llm),
    embed_model=embed_model,
    prompt_helper=prompt_helper,
    chunk_size=chunk_size_limit,
)

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

In [20]:
response_mode = "simple_summarize"

## "React" chat engine

In [21]:
chat_engine = index.as_chat_engine(chat_mode="react",
                                   verbose=True)

In [22]:
response = chat_engine.chat(
    "hello"
)
print(response)

[38;5;200m[1;3mThought: I need to use a tool to help me answer the question.
Action: query_engine_tool
Action Input: {'input': 'hello world'}
[0m


llama_print_timings:        load time =  7509.25 ms
llama_print_timings:      sample time =    42.94 ms /    59 runs   (    0.73 ms per token,  1374.17 tokens per second)
llama_print_timings: prompt eval time =  7509.19 ms /   447 tokens (   16.80 ms per token,    59.53 tokens per second)
llama_print_timings:        eval time =  3475.79 ms /    58 runs   (   59.93 ms per token,    16.69 tokens per second)
llama_print_timings:       total time = 11105.13 ms
Llama.generate: prefix-match hit


[36;1m[1;3mObservation:   Hello! As a helpful, respectful, and honest assistant, I'm here to assist you with any questions or requests you may have. Based on the context information provided, it seems like you are looking for information about the Turing Institute and its various communications channels and events.

To start, I can provide you with some general information about the Turing Institute and its activities. The Turing Institute is a research centre based in the UK that focuses on the development of algorithms and computational methods for solving complex problems in various fields, such as computer science, mathematics, and biology. The institute has a strong emphasis on interdisciplinary research and collaboration, and it hosts a variety of events and workshops to facilitate these interactions.

In terms of communications channels, the Turing Institute uses a variety of platforms to keep its members and collaborators informed about its activities and research progress. T


llama_print_timings:        load time =  7509.25 ms
llama_print_timings:      sample time =   244.08 ms /   339 runs   (    0.72 ms per token,  1388.92 tokens per second)
llama_print_timings: prompt eval time =  8954.56 ms /   537 tokens (   16.68 ms per token,    59.97 tokens per second)
llama_print_timings:        eval time = 21652.52 ms /   338 runs   (   64.06 ms per token,    15.61 tokens per second)
llama_print_timings:       total time = 31331.83 ms
Llama.generate: prefix-match hit


[38;5;200m[1;3mThought: I need to use a tool to help me answer the question.
Action: query_engine_tool
Action Input: {'input': 'what are the research areas of the Turing Institute?'}
[0m


llama_print_timings:        load time =  7509.25 ms
llama_print_timings:      sample time =    32.78 ms /    46 runs   (    0.71 ms per token,  1403.21 tokens per second)
llama_print_timings: prompt eval time = 15197.21 ms /   832 tokens (   18.27 ms per token,    54.75 tokens per second)
llama_print_timings:        eval time =  2972.41 ms /    45 runs   (   66.05 ms per token,    15.14 tokens per second)
llama_print_timings:       total time = 18262.37 ms
Llama.generate: prefix-match hit

llama_print_timings:        load time =  7509.25 ms
llama_print_timings:      sample time =   216.46 ms /   309 runs   (    0.70 ms per token,  1427.55 tokens per second)
llama_print_timings: prompt eval time = 11918.74 ms /   689 tokens (   17.30 ms per token,    57.81 tokens per second)
llama_print_timings:        eval time = 20660.28 ms /   308 runs   (   67.08 ms per token,    14.91 tokens per second)
llama_print_timings:       total time = 33190.48 ms
Llama.generate: prefix-match hit


[36;1m[1;3mObservation:   Based on the updated context, the research areas of the Turing Institute are focused on impact-driven research in data science and AI, with a particular emphasis on diversity and openness. The institute prioritizes projects that have realistic plans for meeting the priority of reproducible research and open source software development. Additionally, the institute values collaboration across academic, social, ethnic, and racial backgrounds, as well as cross-disciplinary and cross-cultural collaborations.

Some specific research areas that may be of interest to the Turing Institute include:

1. Data-driven approaches to addressing societal challenges, such as healthcare, education, and environmental sustainability.
2. Development of new AI technologies and techniques, such as machine learning, natural language processing, and computer vision.
3. Applications of data science and AI in various domains, such as finance, transportation, and culture.
4. Studies on 


llama_print_timings:        load time =  7509.25 ms
llama_print_timings:      sample time =   277.50 ms /   396 runs   (    0.70 ms per token,  1427.03 tokens per second)
llama_print_timings: prompt eval time = 11135.08 ms /   627 tokens (   17.76 ms per token,    56.31 tokens per second)
llama_print_timings:        eval time = 26738.70 ms /   395 runs   (   67.69 ms per token,    14.77 tokens per second)
llama_print_timings:       total time = 38670.27 ms
Llama.generate: prefix-match hit


[38;5;200m[1;3mResponse: The research areas of the Turing Institute include data-driven approaches to addressing societal challenges, development of new AI technologies and techniques, applications of data science and AI in various domains, studies on the ethical, social, and economic implications of data science and AI, and collaborative research projects that bring together diverse perspectives and expertise from academia, industry, and civil society.
[0mThe research areas of the Turing Institute include data-driven approaches to addressing societal challenges, development of new AI technologies and techniques, applications of data science and AI in various domains, studies on the ethical, social, and economic implications of data science and AI, and collaborative research projects that bring together diverse perspectives and expertise from academia, industry, and civil society.



llama_print_timings:        load time =  7509.25 ms
llama_print_timings:      sample time =    72.06 ms /   103 runs   (    0.70 ms per token,  1429.42 tokens per second)
llama_print_timings: prompt eval time = 27985.35 ms /  1286 tokens (   21.76 ms per token,    45.95 tokens per second)
llama_print_timings:        eval time =  7628.91 ms /   102 runs   (   74.79 ms per token,    13.37 tokens per second)
llama_print_timings:       total time = 35813.39 ms


In [23]:
response = chat_engine.chat("what should a new starter in the research engineering group (REG) at the Alan Turing Institute do?")
print(response)

Llama.generate: prefix-match hit


[38;5;200m[1;3mResponse:   As a new starter in the Research Engineering Group (REG) at the Alan Turing Institute, there are several things you can do to get started and make the most of your time here:

1. Familiarize yourself with the institute's research areas and ongoing projects. This will help you understand the scope of the work being done and how you can contribute.
2. Meet with your supervisor and other members of the REG to discuss your background, interests, and goals. They can provide valuable guidance and help you get settled into your new role.
3. Review the institute's policies and procedures to ensure you understand the expectations and requirements of your position.
4. Attend orientation sessions and training programs offered by the institute to learn more about the Turing Institute's culture, resources, and research practices.
5. Start exploring the tools and technologies used in the REG, such as query_engine_tool, to get a sense of the technical capabilities availab


llama_print_timings:        load time =  7509.25 ms
llama_print_timings:      sample time =   266.23 ms /   380 runs   (    0.70 ms per token,  1427.35 tokens per second)
llama_print_timings: prompt eval time =  2152.57 ms /   121 tokens (   17.79 ms per token,    56.21 tokens per second)
llama_print_timings:        eval time = 24885.95 ms /   379 runs   (   65.66 ms per token,    15.23 tokens per second)
llama_print_timings:       total time = 27796.96 ms


In [24]:
response = chat_engine.chat("Have you used the query engine yet?")
print(response)

Llama.generate: prefix-match hit


[38;5;200m[1;3mResponse:   No, I have not used the query engine yet. As a new starter in the Research Engineering Group at the Alan Turing Institute, I am still in the process of familiarizing myself with the tools and technologies available to me. However, I am eager to learn more about the query engine and how it can be used to support my research activities. Can you tell me more about the query engine and its capabilities?
[0m  No, I have not used the query engine yet. As a new starter in the Research Engineering Group at the Alan Turing Institute, I am still in the process of familiarizing myself with the tools and technologies available to me. However, I am eager to learn more about the query engine and how it can be used to support my research activities. Can you tell me more about the query engine and its capabilities?



llama_print_timings:        load time =  7509.25 ms
llama_print_timings:      sample time =    61.06 ms /    87 runs   (    0.70 ms per token,  1424.85 tokens per second)
llama_print_timings: prompt eval time =   570.47 ms /    20 tokens (   28.52 ms per token,    35.06 tokens per second)
llama_print_timings:        eval time =  5989.95 ms /    86 runs   (   69.65 ms per token,    14.36 tokens per second)
llama_print_timings:       total time =  6726.84 ms


In [25]:
response = chat_engine.chat("What have I asked you so far?")
print(response)

Llama.generate: prefix-match hit


[38;5;200m[1;3mResponse:   You have asked me the following questions so far:

1. What should a new starter in the Research Engineering Group (REG) at the Alan Turing Institute do?
2. Have you used the query engine yet?
[0m  You have asked me the following questions so far:

1. What should a new starter in the Research Engineering Group (REG) at the Alan Turing Institute do?
2. Have you used the query engine yet?



llama_print_timings:        load time =  7509.25 ms
llama_print_timings:      sample time =    34.37 ms /    49 runs   (    0.70 ms per token,  1425.79 tokens per second)
llama_print_timings: prompt eval time =   600.34 ms /    20 tokens (   30.02 ms per token,    33.31 tokens per second)
llama_print_timings:        eval time =  3407.13 ms /    48 runs   (   70.98 ms per token,    14.09 tokens per second)
llama_print_timings:       total time =  4101.47 ms


Resetting the chat engine clears its conversation history:

In [26]:
chat_engine.reset()

In [27]:
response = chat_engine.chat("What did I ask you before?")
print(response)

Llama.generate: prefix-match hit


********** prompt: **********
 [INST] What did I ask you before? [/INST]
********** completion_response: **********
   I apologize, but I don't have the ability to remember previous conversations or keep track of what was asked. Each time you interact with me, it is a new and separate conversation. If you would like to ask something again, I will do my best to assist you.
********** chat: chat_response: **********
 assistant:   I apologize, but I don't have the ability to remember previous conversations or keep track of what was asked. Each time you interact with me, it is a new and separate conversation. If you would like to ask something again, I will do my best to assist you.
[38;5;200m[1;3mResponse:   I apologize, but I don't have the ability to remember previous conversations or keep track of what was asked. Each time you interact with me, it is a new and separate conversation. If you would like to ask something again, I will do my best to assist you.
[0m********** _process_act


llama_print_timings:        load time =   931.62 ms
llama_print_timings:      sample time =    42.12 ms /    60 runs   (    0.70 ms per token,  1424.43 tokens per second)
llama_print_timings: prompt eval time =   288.19 ms /    14 tokens (   20.59 ms per token,    48.58 tokens per second)
llama_print_timings:        eval time =  3228.61 ms /    59 runs   (   54.72 ms per token,    18.27 tokens per second)
llama_print_timings:       total time =  3630.86 ms


## React engine and asking it to use query

In the exchange above, the chat engine answered without calling the query engine. Let's see whether we can force it to use the query engine by asking explicitly.
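
Whether a given turn actually called a tool can be checked mechanically instead of by reading the trace. The sketch below assumes the verbose ReAct trace is available as a string and that tool calls appear on lines of the form `Action: <tool name>`, as in the output earlier in this notebook. The helper name `tools_used` is hypothetical and not part of `llama_index`.

```python
import re

# A tool call shows up as a line beginning with "Action: <tool name>".
# "Action Input: ..." lines do not match because of the colon position.
ACTION_RE = re.compile(r"^Action:\s*(\S+)", re.MULTILINE)


def tools_used(trace):
    """Return the tool names invoked in a verbose ReAct trace, in order."""
    return ACTION_RE.findall(trace)


with_tool = (
    "Thought: I need to use a tool to help me answer the question.\n"
    "Action: query_engine_tool\n"
    "Action Input: {'input': 'hello world'}\n"
)
without_tool = "Response:   No, I have not used the query engine yet.\n"

print(tools_used(with_tool))     # ['query_engine_tool']
print(tools_used(without_tool))  # []
```

An empty list means the model answered directly from its own weights and the chat history, without retrieving anything from the index.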

In [28]:
chat_engine = index.as_chat_engine(chat_mode="react",
                                   response_mode=response_mode,
                                   verbose=True)

In [29]:
response = chat_engine.chat(
    "Please use the query engine. What should a new starter in the research engineering group do?"
)
print(response)

********** prompt: **********
 [INST] Please use the query engine. What should a new starter in the research engineering group do? [/INST]


Llama.generate: prefix-match hit


********** completion_response: **********
   As a new starter in the research engineering group, there are several tasks and activities that you can focus on to get started with your role:

1. Familiarize yourself with the research environment:
	* Learn about the research projects that the group is currently working on, and the technologies and tools being used.
	* Understand the goals and objectives of the research projects, and how they align with the overall research strategy of the organization.
2. Set up your workspace:
	* Install any necessary software or tools required for your work.
	* Set up your computer and other equipment to ensure that you have a comfortable and productive work environment.
3. Meet with your supervisor and colleagues:
	* Schedule a meeting with your supervisor to discuss your role, responsibilities, and expectations.
	* Introduce yourself to your colleagues and learn about their roles and areas of expertise.
4. Familiarize yourself with the organization's


llama_print_timings:        load time =   981.41 ms
llama_print_timings:      sample time =   377.91 ms /   539 runs   (    0.70 ms per token,  1426.28 tokens per second)
llama_print_timings: prompt eval time =   307.84 ms /    23 tokens (   13.38 ms per token,    74.72 tokens per second)
llama_print_timings:        eval time = 31503.15 ms /   538 runs   (   58.56 ms per token,    17.08 tokens per second)
llama_print_timings:       total time = 32916.09 ms


In [30]:
response = chat_engine.chat(
    "I want to specifically know about a new starter in the REG team at the Turing institute"
)
print(response)

********** prompt: **********
 <s>[INST] <<SYS>>

You are designed to help with a variety of tasks, from answering questions     to providing summaries to other types of analyses.

## Tools
You have access to a wide variety of tools. You are responsible for using
the tools in any sequence you deem appropriate to complete the task at hand.
This may require breaking the task into subtasks and using different tools
to complete each subtask.

You have access to the following tools:
> Tool Name: query_engine_tool
Tool Description: Useful for running a natural language query
against a knowledge base and get back a natural language response.

Tool Args: {'title': 'DefaultToolFnSchema', 'description': 'Default tool function Schema.', 'type': 'object', 'properties': {'input': {'title': 'Input', 'type': 'string'}}, 'required': ['input']}


## Output Format
To answer the question, please use the following format.

```
Thought: I need to use a tool to help me answer the question.
Action: tool name

Llama.generate: prefix-match hit


********** completion_response: **********
   As a new starter in the Research Engineering Group (REG) at the Turing Institute, there are several tasks and activities that you can focus on to get started with your role:

1. Familiarize yourself with the Turing Institute's research environment:
	* Learn about the research projects that the REG is currently working on, and the technologies and tools being used.
	* Understand the goals and objectives of the research projects, and how they align with the overall research strategy of the Turing Institute.
2. Set up your workspace:
	* Install any necessary software or tools required for your work.
	* Set up your computer and other equipment to ensure that you have a comfortable and productive work environment.
3. Meet with your supervisor and REG colleagues:
	* Schedule a meeting with your supervisor to discuss your role, responsibilities, and expectations.
	* Introduce yourself to your REG colleagues and learn about their roles and areas of


llama_print_timings:        load time =   981.41 ms
llama_print_timings:      sample time =   403.86 ms /   577 runs   (    0.70 ms per token,  1428.72 tokens per second)
llama_print_timings: prompt eval time = 21240.71 ms /  1045 tokens (   20.33 ms per token,    49.20 tokens per second)
llama_print_timings:        eval time = 43054.91 ms /   576 runs   (   74.75 ms per token,    13.38 tokens per second)
llama_print_timings:       total time = 65498.11 ms
