# Algorithms Prompting with OLMo

At the University of Washington's SSEC, we are very fortunate to have an in-house group of software engineers who have come up with a very interesting set of questions for us to try out with OLMo:

* What is the best method for multiplying large numbers?

Let's analyze how OLMo answers these questions, or your own domain-specific questions, without any additional context.

## Set up the OLMo Model and Prompt

We'll begin with a recap of the previous module, setting up the OLMo model and prompt.

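In [4]:
from langchain_community.llms import LlamaCpp
from langchain_core.callbacks import StreamingStdOutCallbackHandler
from ssec_tutorials import download_olmo_model

# Assumed to match module 1: download the OLMo model (if needed) and keep its local path.
OLMO_MODEL = download_olmo_model()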

This time, during model setup, we'll increase the context length `n_ctx` to `2048` tokens and the maximum number of tokens the model generates, `max_tokens`, to `512` tokens.
The context window has to hold both the prompt and the generated answer, so these larger values leave room to ask longer, more detailed questions and get more expansive answers.
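As a rough illustration of how the two settings interact (our own arithmetic, not a LangChain or llama.cpp function), the prompt tokens and the generated tokens share the `n_ctx` window. With `n_ctx=2048` and `max_tokens=512`, a prompt should therefore stay at or below about 1536 tokens if you want room for the full 512-token answer. The helper names below are hypothetical.

```python
# Hypothetical helpers, not part of LangChain or llama.cpp.
# Assumption: prompt tokens + generated tokens must fit inside n_ctx.


def max_prompt_tokens(n_ctx: int = 2048, max_tokens: int = 512) -> int:
    """Largest prompt (in tokens) that still leaves room for max_tokens of output."""
    if max_tokens >= n_ctx:
        raise ValueError("max_tokens must be smaller than n_ctx")
    return n_ctx - max_tokens


def fits_in_context(prompt_tokens: int, n_ctx: int = 2048, max_tokens: int = 512) -> bool:
    """True if a prompt of prompt_tokens tokens leaves room for a full-length answer."""
    return prompt_tokens <= max_prompt_tokens(n_ctx, max_tokens)


print(max_prompt_tokens())    # prints 1536
print(fits_in_context(1000))  # prints True
print(fits_in_context(1600))  # prints False
```

To measure a real prompt once the model is created, LangChain language models provide a `get_num_tokens` method, e.g. `olmo.get_num_tokens(some_prompt_text)`.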

In [5]:
olmo = LlamaCpp(
    model_path=str(OLMO_MODEL),
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0.8,
    verbose=False,
    n_ctx=2048,
    max_tokens=512,
)

Now that the model is ready, let's set up the prompt template as before,
using the model's built-in chat template.

In [6]:
from langchain_core.prompts import PromptTemplate

In [7]:
# Create a prompt template using OLMo's tokenizer chat template we saw in module 1.
prompt_template = PromptTemplate.from_template(
    template=olmo.client.metadata["tokenizer.chat_template"],
    template_format="jinja2",
    partial_variables={"add_generation_prompt": True, "eos_token": "<|endoftext|>"},
)

As before, we use `partial_variables` to fill in the `add_generation_prompt` and `eos_token` fields,
so that `messages` is the only remaining input variable.

In [8]:
prompt_template.input_variables

['messages']

With the prompt template ready, let's move on to the next step and create a prompt for the model.
To keep this tutorial simple, we'll send the model a single `user` message.
This means we'll ask the model one question at a time,
rather than a series of questions that build on each other.

In [9]:
import textwrap  # a module to wrap text to make it more readable

Just like before, we'll start by checking what our full prompt text is going to look like.
Here we also use `textwrap`, a handy module from Python's standard library: `textwrap.dedent` removes the common leading whitespace from the indented triple-quoted string, so the prompt text comes out clean.
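To see what `dedent` does on its own, here is a small standalone example; the string is illustrative and mirrors the indentation used in the cell that follows.

```python
import textwrap

# An indented triple-quoted string, as it appears inside an indented code block.
# The backslash after the opening quotes stops the string from starting with "\n".
raw = """\
    You are an algorithms expert.
    Question: {question}
"""

# dedent strips the whitespace prefix shared by every non-blank line.
clean = textwrap.dedent(raw)

print(repr(raw))    # '    You are an algorithms expert.\n    Question: {question}\n'
print(repr(clean))  # 'You are an algorithms expert.\nQuestion: {question}\n'
```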

In [10]:
# Test the prompt you want to send to OLMo.
question = "What is the best method for multiplying large numbers?"
input_content = textwrap.dedent(
    f"""\
    You are an algorithms expert. Please answer the following question on algorithms.
    Question: {question}
"""
)
input_messages = [
    {
        "role": "user",
        "content": input_content,
    }
]

full_prompt_text = prompt_template.format(messages=input_messages)

In [11]:
print(full_prompt_text)

<|endoftext|>

<|user|>
You are an algorithms expert. Please answer the following question on algorithms.
Question: What is the best method for multiplying large numbers?



<|assistant|>




Our prompt looks good. Let's now make a chain and invoke it.

In [12]:
# Chain the prompt template and olmo
llm_chain = prompt_template | olmo

In [13]:
# Invoke the chain with a question and other parameters.
captured_answer = llm_chain.invoke({"messages": input_messages})

 The best method for multiplying large numbers is using Multi-precision Computation techniques, such as Miller-Rabin primality test or its variations. These techniques are more efficient than the basic method of repeating subtraction (also known as binary or long hand multiplication) while ensuring greater accuracy and scalability.

The Miller-Rabin primality test is an iterative approach that uses probabilistic arguments to determine if a number is prime, in contrast to linear or exponential time algorithms such as Sieve of Eratosthenes or trial division which can definitely tell whether a given prime number is prime or not, but are less efficient for large numbers.

The Miller-Rabin primality test works by repeatedly applying an ad hoc base function (the "rabbits" method) to a randomly generated base number and the original number to be factored. If it returns a negative value after a specific number of iterations, the probability that the original number is not a prime number increa

KeyboardInterrupt: 
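OLMo's answer above drifts into the Miller-Rabin primality test, which decides whether a number is prime and is not a multiplication method. For comparison, one well-known answer is Karatsuba's algorithm: it splits each number in half and replaces the four half-size products of schoolbook multiplication with three, for roughly O(n^1.585) digit operations instead of O(n^2) (asymptotically faster FFT-based methods also exist). The sketch below handles non-negative integers only and is purely illustrative, since Python's built-in `int` already multiplies integers of any size.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers with Karatsuba's divide-and-conquer method."""
    # Base case: single-digit operands are multiplied directly.
    if x < 10 or y < 10:
        return x * y

    # Split both numbers around m decimal digits: x = a*10^m + b, y = c*10^m + d.
    m = max(len(str(x)), len(str(y))) // 2
    base = 10 ** m
    a, b = divmod(x, base)
    c, d = divmod(y, base)

    # Three recursive products instead of four:
    # (a + b)(c + d) - ac - bd == ad + bc
    ac = karatsuba(a, c)
    bd = karatsuba(b, d)
    ad_plus_bc = karatsuba(a + b, c + d) - ac - bd

    # Recombine: x*y = ac*10^(2m) + (ad + bc)*10^m + bd
    return ac * base * base + ad_plus_bc * base + bd


x, y = 31415926535897932384, 27182818284590452353
print(karatsuba(x, y) == x * y)  # True
```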

Great! At this point we have essentially reviewed module 1.
But to ask different questions, we need a way to pass new questions into the chain.
We could create new values for the `question`, `input_content`, and `input_messages` variables every time,
but that means redoing the same formatting for each new question.
So what can we do?

## Partial prompt

We will now introduce a new concept called [partial formatting](https://python.langchain.com/v0.2/docs/how_to/prompts_partial/).
With partial formatting, we fill in some of a template's variables ahead of time and leave the rest as input variables, so we expose only the variables we want to change and pass new values to.
Essentially, we are building a new prompt template on top of the model's underlying chat template.

We've already seen this feature in module 1 and above, where we used `partial_variables` when setting up the prompt template.
This time, since we know that we're only using one message,
we can simplify the prompt template to take just two variables: `instruction` and `question`.
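Conceptually, partial formatting resembles applying `functools.partial` to string formatting: some values are fixed now and the rest are supplied later. The standard-library sketch below illustrates only that idea; it is an analogy, not how LangChain implements `PromptTemplate.partial`, and the names `question_template` and `algorithms_prompt` are hypothetical.

```python
from functools import partial

question_template = "{instruction}\n\nQuestion: {question}\n"

# Fix the instruction once; the question stays a free keyword argument.
algorithms_prompt = partial(
    question_template.format,
    instruction="You are an algorithms expert. Please answer the following question on algorithms.",
)

# Each call only needs the part that changes.
print(algorithms_prompt(question="What is the best method for multiplying large numbers?"))
```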

First, let's create a simple prompt template string that takes in the variables we want to pass in.

In [14]:
input_prompt_template = textwrap.dedent(
    """\
{instruction}

Question: {question}
"""
)

Notice that the prompt template string above is NOT an f-string but a plain string, like the ones you created in module 1: its `{instruction}` and `{question}` placeholders stay unfilled so that values can be supplied later.
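Here is a small standalone comparison of the two kinds of strings. The `sample_*` names are hypothetical and exist only for this illustration, so they don't overwrite the notebook's own variables.

```python
# A plain string keeps its {placeholders} so they can be filled in later.
plain_template = "{instruction}\n\nQuestion: {question}\n"

# An f-string is evaluated immediately, so every name inside its braces must already exist.
sample_instruction = "You are an algorithms expert."
sample_question = "What is the best method for multiplying large numbers?"
eager = f"{sample_instruction}\n\nQuestion: {sample_question}\n"

print("{question}" in plain_template)  # True: still a template
print("{question}" in eager)           # False: already filled in

# Doubling the braces makes an f-string emit literal placeholders instead.
escaped = f"{{instruction}}\n\nQuestion: {{question}}\n"
print(escaped == plain_template)       # True

# The plain string can be filled in whenever the values are known.
print(plain_template.format(instruction=sample_instruction, question=sample_question))
```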

Now that the prompt template string is ready, let's use it to partially format `prompt_template`.
Recall that `prompt_template` is a string `PromptTemplate` object containing the original Jinja2 template, with the variables `add_generation_prompt` and `eos_token` already filled in. The only variable left is `messages`, which we'll fill with a single `user` message whose content is our new template string.

In [15]:
prompt_template

PromptTemplate(input_variables=['messages'], partial_variables={'add_generation_prompt': True, 'eos_token': '<|endoftext|>'}, template="{{ eos_token }}{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}", template_format='jinja2')

In [16]:
partial_prompt_template = prompt_template.partial(
    messages=[
        {
            "role": "user",
            "content": input_prompt_template,
        }
    ]
)

In [17]:
partial_prompt_template

PromptTemplate(input_variables=[], partial_variables={'add_generation_prompt': True, 'eos_token': '<|endoftext|>', 'messages': [{'role': 'user', 'content': '{instruction}\n\nQuestion: {question}\n'}]}, template="{{ eos_token }}{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}", template_format='jinja2')

As you can see above, the partial formatting simply fills in the `messages` variable, and now we're left with no `input_variables`. So how can we create a new prompt template from this?

The answer is straightforward: call `.format()` to render the template and get the "final" prompt template string.

In [18]:
new_prompt_string = partial_prompt_template.format()

In [19]:
new_prompt_string

'<|endoftext|>\n\n<|user|>\n{instruction}\n\nQuestion: {question}\n\n\n\n<|assistant|>\n\n'
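This string is the Jinja2 chat template rendered once: the message content, including the literal `{instruction}` and `{question}` placeholders, is copied into the output untouched, so the result is itself an f-string-style template. The sketch below mimics that two-stage process in plain Python. `render_olmo_chat` is a hypothetical hand-written stand-in for the Jinja2 template shown earlier (reproducing its default whitespace), not a LangChain or OLMo function.

```python
def render_olmo_chat(messages, eos_token="<|endoftext|>", add_generation_prompt=True):
    """Hypothetical plain-Python mimic of the OLMo Jinja2 chat template shown above."""
    parts = [eos_token]
    for i, message in enumerate(messages):
        parts.append("\n")  # newline after the {% for %} tag
        if message["role"] == "user":
            parts.append("\n<|user|>\n" + message["content"] + "\n")
        elif message["role"] == "assistant":
            parts.append("\n<|assistant|>\n" + message["content"] + eos_token + "\n")
        parts.append("\n")  # newline after {% endif %}
        if i == len(messages) - 1 and add_generation_prompt:
            parts.append("\n<|assistant|>\n")
        parts.append("\n")  # newline before {% endfor %}
    return "".join(parts)


# Stage 1: render the chat template once, leaving {instruction} and {question} intact.
sketch_prompt_string = render_olmo_chat(
    [{"role": "user", "content": "{instruction}\n\nQuestion: {question}\n"}]
)
print(repr(sketch_prompt_string))

# Stage 2: the rendered string is an ordinary str.format-style template.
print(
    sketch_prompt_string.format(
        instruction="You are an algorithms expert. Please answer the following question on algorithms.",
        question="What is the best way to generate the Fibonacci sequence?",
    )
)
```

The second stage works only because none of the chat markers (`<|endoftext|>`, `<|user|>`, `<|assistant|>`) contain curly braces; any literal `{` or `}` in the rendered text would be read as a placeholder.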

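Now we have a simple prompt string that we can create a string `PromptTemplate` from. By default, `PromptTemplate.from_template` treats the string as an f-string-style template, so it picks up `{instruction}` and `{question}` as input variables.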

In [20]:
new_prompt_template = PromptTemplate.from_template(new_prompt_string)

In [21]:
new_prompt_template

PromptTemplate(input_variables=['instruction', 'question'], template='<|endoftext|>\n\n<|user|>\n{instruction}\n\nQuestion: {question}\n\n\n\n<|assistant|>\n\n')

You can see now that the new prompt template takes in `instruction` and `question`. Let's create a new chain and invoke it with this new prompt template.

## Q&A Session with OLMo

We'll first create a single domain instruction, since we know that we're asking questions about algorithms.

In [22]:
domain_instruction = "You are an algorithms expert. Please answer the following question on algorithms."

In [23]:
question = "What is the best way to generate the Fibonacci sequence?"

In [24]:
llm_chain = new_prompt_template.partial(instruction=domain_instruction) | olmo

In [25]:
llm_chain.invoke({"question": question})

The best way to generate the Fibonacci sequence is by using a recursive algorithm, which involves two variables (a and b) and their respective values at each iteration of the loop:

1. Set initial values for a and b as 0 and 1, respectively.
2. Loop until either a fixed point (such as a limit on the number of

KeyboardInterrupt: 
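OLMo calls its approach recursive, but the steps it starts to list (set `a` and `b` to 0 and 1, then loop) describe an iterative algorithm. For reference, here is a minimal iterative generator, one standard answer rather than the only one: it keeps just the last two terms, so producing the first n terms takes n additions, whereas the naive recursive definition recomputes the same subproblems and takes exponential time.

```python
from itertools import islice


def fibonacci():
    """Yield 0, 1, 1, 2, 3, 5, ... indefinitely, keeping only the last two terms."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Take the first ten terms of the infinite generator.
print(list(islice(fibonacci(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```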

### Your Turn 😎

You have two options:

1. Use the questions provided at the beginning of this notebook and reuse the LLM chain to ask questions about algorithms.
2. Use the new prompt template `new_prompt_template` and the `olmo` model to create a new chain with a different domain instruction, and ask questions about that domain.

Feel free to ask any questions you like, and see how OLMo responds to them! If you're open to sharing, we'd love to hear about the questions you asked and the responses you received in the etherpad at https://etherpad.wikimedia.org/p/ipvVZZVxeP2JhpPxE4j6.

In [24]:
# Write your code here