<a href="https://colab.research.google.com/github/RakeshSharma21/Sessions_Notebook/blob/main/Defying_output_token_limit_of_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Installing the required libraries

In [1]:
!pip install langchain
!pip install langchain_core
!pip install langchain-anthropic

Collecting langchain
  Downloading langchain-0.1.16-py3-none-any.whl (817 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-community<0.1,>=0.0.32 (from langchain)
  Downloading langchain_community-0.0.34-py3-none-any.whl (1.9 MB)
Collecting langchain-core<0.2.0,>=0.1.42 (from langchain)
  Downloading langchain_core-0.1.45-py3-none-any.whl (291 kB)
Collecting langchain-text-splitters<0.1,>=0.0.1 (from langchain)
  ...

In [4]:
anthropic_key="put your key here"


### Setting up Anthropic claude-3-opus

In this demonstration, we use Anthropic's Claude 3 Opus model with `max_tokens` set to 100. This deliberately small limit lets us show how to generate an output longer than a single response allows.

In [5]:
from langchain_anthropic import ChatAnthropic
model = ChatAnthropic(model='claude-3-opus-20240229',anthropic_api_key=anthropic_key,max_tokens=100)

#### First chunk generation

In [7]:
first_output = model.invoke('write the pizza receipe')

In [9]:
first_output.content

"Sure, here's a basic recipe for making pizza:\n\nIngredients:\n\n- 1 lb pizza dough (homemade or store-bought)\n- 1/2 cup tomato sauce\n- 1 cup shredded mozzarella cheese\n- 1/4 cup grated Parmesan cheese\n- Toppings of your choice (e.g., pepperoni, mushrooms, olives, onions,"

Here you can see that the model abruptly stops at "onions" and does not provide the full recipe for baking pizza. Let's explore the reason behind this sudden halt.

In [10]:
first_output.response_metadata

{'id': 'msg_014QnaieyFEYZxNekNhktxb7',
 'model': 'claude-3-opus-20240229',
 'stop_reason': 'max_tokens',
 'stop_sequence': None,
 'usage': {'input_tokens': 12, 'output_tokens': 100}}

The metadata explains the abrupt stop: `stop_reason` is `max_tokens`, and `output_tokens` equals the limit of 100 we configured.
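A minimal sketch of turning that observation into a check. The helper name `was_truncated` is hypothetical; the dictionary is the metadata shown above, copied by hand so the snippet runs without calling the model.

```python
def was_truncated(response_metadata):
    """Return True if the response stopped because it hit the token limit."""
    # Assumes the provider reports the reason under the "stop_reason" key,
    # as in the Anthropic metadata shown above.
    return response_metadata.get("stop_reason") == "max_tokens"


# Metadata copied from the first_output cell above.
metadata = {
    "id": "msg_014QnaieyFEYZxNekNhktxb7",
    "model": "claude-3-opus-20240229",
    "stop_reason": "max_tokens",
    "stop_sequence": None,
    "usage": {"input_tokens": 12, "output_tokens": 100},
}

truncated = was_truncated(metadata)
print(truncated)  # True
```

In the notebook, the same check could be applied to `first_output.response_metadata` directly.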

#### Second chunk generation

To generate the next section of the recipe, we call the model again and include the first chunk as context, so it can continue from where it stopped. In this example, we simply append the first output to the original prompt. Let's give it a try.

In [11]:
second_output = model.invoke('write the pizza receipe '+first_output.content)
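The call above folds the partial answer into a single prompt string. An alternative, sketched here under assumptions, is to pass the partial answer as its own assistant turn. The helper `build_continuation_messages` is hypothetical. It relies on LangChain chat models accepting a list of `(role, content)` tuples, and whether a model continues a trailing assistant turn depends on the provider and model. Trailing whitespace is stripped because the Anthropic API rejects a final assistant message that ends in whitespace.

```python
def build_continuation_messages(prompt, partial_answer):
    """Build chat turns: the user prompt, then the partial answer as an assistant turn."""
    return [
        ("human", prompt),
        # Strip trailing whitespace so the final assistant turn is accepted.
        ("ai", partial_answer.rstrip()),
    ]


messages = build_continuation_messages(
    "write the pizza receipe",
    "Ingredients:\n\n- Toppings of your choice (e.g., pepperoni, onions, ",
)
print(messages)

# Hypothetical usage with the notebook's model object (not run here):
# second_output = model.invoke(messages)
```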

In [12]:
second_output

AIMessage(content='bell peppers)\n- 1 tsp olive oil\n- 1/2 tsp dried oregano\n- Salt and pepper to taste\n\nInstructions:\n\n1. Preheat your oven to 450°F (230°C).\n\n2. If using store-bought dough, let it come to room temperature. If making homemade dough, prepare it according to the recipe.\n\n3. Stretch or roll out the pizza dough', response_metadata={'id': 'msg_01G39owdfNeQmiqWhSR3VUfy', 'model': 'claude-3-opus-20240229', 'stop_reason': 'max_tokens', 'stop_sequence': None, 'usage': {'input_tokens': 112, 'output_tokens': 100}}, id='run-11d47821-f943-447f-af21-4db047ee6f3c-0')

The continuation makes sense: it picks up with the next item right after "onions,". Let's see how it looks when we combine the first and second outputs.

In [14]:
print(first_output.content+ second_output.content)

Sure, here's a basic recipe for making pizza:

Ingredients:

- 1 lb pizza dough (homemade or store-bought)
- 1/2 cup tomato sauce
- 1 cup shredded mozzarella cheese
- 1/4 cup grated Parmesan cheese
- Toppings of your choice (e.g., pepperoni, mushrooms, olives, onions,bell peppers)
- 1 tsp olive oil
- 1/2 tsp dried oregano
- Salt and pepper to taste

Instructions:

1. Preheat your oven to 450°F (230°C).

2. If using store-bought dough, let it come to room temperature. If making homemade dough, prepare it according to the recipe.

3. Stretch or roll out the pizza dough


This approach works. Let's turn it into an algorithm.

#### Algorithm

```
# Input → Prompt + Input
# First LLM → First generation from *Input
# Second LLM → Next generation from *Input starting from here: *First LLM output
# Third LLM → Next generation from *Input starting from here: *First LLM output + *Second LLM output
# Fourth LLM → Next generation from *Input starting from here: *First LLM output + *Second LLM output + *Third LLM output
# ...
# Nth LLM → Next generation from *Input starting from here: *First LLM output + *Second LLM output + ... + *(N-1)th LLM output

# Repeat the continuation step until the LLM starts repeating similar generations from the start.

# Merge node: Combine all the generated outputs from the LLMs.

```
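The steps above can be sketched as a small loop. This is an illustration under assumptions, not the notebook's implementation: `continue_until_done` and `make_fake_model` are hypothetical names, the fake model stands in for a real LLM call so the snippet runs offline, and the loop stops when the reported stop reason is no longer `max_tokens` rather than on detecting repetition.

```python
def continue_until_done(prompt, generate, max_rounds=10):
    """Call generate repeatedly, feeding back all earlier chunks, then merge them.

    generate(context) must return (chunk, stop_reason); this signature is an
    assumption made for the sketch.
    """
    chunks = []
    for _ in range(max_rounds):
        # Context = original prompt plus everything generated so far.
        context = prompt + " " + "".join(chunks) if chunks else prompt
        chunk, stop_reason = generate(context)
        chunks.append(chunk)
        if stop_reason != "max_tokens":
            break  # the model finished on its own
    # Merge node: combine all generated chunks.
    return "".join(chunks)


def make_fake_model(full_text, limit):
    """Stand-in for an LLM: emits full_text in slices of at most `limit` characters."""
    position = 0

    def generate(_context):
        nonlocal position
        chunk = full_text[position:position + limit]
        position += limit
        reason = "max_tokens" if position < len(full_text) else "end_turn"
        return chunk, reason

    return generate


recipe = "Mix flour and water. Knead. Bake at 230C."
result = continue_until_done("write the pizza receipe", make_fake_model(recipe, limit=12))
print(result)
```

With a real model, `generate` would wrap `model.invoke` and read the stop reason from `response_metadata`.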



### Code to handle the max token limit

The following is a simple implementation of the algorithm above.

In [18]:
def generate_output(prompt):
    # Call the chat model and return only the text of its reply
    output = model.invoke(prompt)
    return output.content

def generate_generations(initial_prompt, max_generations=5):
    all_outputs = []
    current_input = initial_prompt

    for i in range(max_generations):
        print('iteration ', i)
        print(current_input)
        output = generate_output(current_input)
        all_outputs.append(output)
        # Feed everything generated so far back in as context for the next call
        current_input += " " + output

    return " ".join(all_outputs)

# Usage example
initial_prompt = "write the pizza receipe"
final_output = generate_generations(initial_prompt)
print(final_output)


iteration  0
write the pizza receipe
iteration  1
write the pizza receipe Here's a basic recipe for making pizza:

Ingredients:

1. Pizza dough (homemade or store-bought)
2. Tomato sauce
3. Mozzarella cheese, shredded
4. Toppings of your choice (e.g., pepperoni, mushrooms, onions, bell peppers, olives)
5. Olive oil
6. Salt and pepper
7. Dried oregano (
iteration  2
write the pizza receipe Here's a basic recipe for making pizza:

Ingredients:

1. Pizza dough (homemade or store-bought)
2. Tomato sauce
3. Mozzarella cheese, shredded
4. Toppings of your choice (e.g., pepperoni, mushrooms, onions, bell peppers, olives)
5. Olive oil
6. Salt and pepper
7. Dried oregano ( optional)

Instructions:

1. Preheat your oven to the highest temperature setting, usually between 450°F to 500°F (230°C to 260°C). Place a pizza stone or a large baking sheet in the oven while it heats up.

2. Roll out the pizza dough on a lightly floured surface to your desired thickness and shape.

3. Transfer the dough on

Reviewing the output, it appears that the model starts repeating itself after the fourth iteration. To obtain a complete answer without redundant text, we should stop generating as soon as the output begins to repeat.

#### Code with the stopping condition

Let's introduce a new stopping criterion: halt the generation when the new output is more than 80% similar to any previous output produced by the model. Let's go ahead and implement this.
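Before the full implementation, here is a small sketch of how the similarity score behaves. It uses `difflib.SequenceMatcher.ratio()` from the standard library; the sample strings and the helper name `similarity` are made up for illustration.

```python
import difflib


def similarity(a, b):
    """Return a score in [0, 1]; 1.0 means the two strings are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Two chunks that differ only in their last character.
near_duplicate = similarity("1. Preheat your oven to 450F.", "1. Preheat your oven to 450F!")
# Two chunks with little in common.
unrelated = similarity("1. Preheat your oven to 450F.", "Serve hot.")

print(near_duplicate > 0.8, unrelated > 0.8)  # True False
```

A near-identical chunk clears the 0.8 threshold, while unrelated text stays well below it.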

In [24]:
import difflib

def generate_output(prompt, all_outputs):
    # First call: send the prompt as-is. Later calls: ask for the next chunk,
    # passing everything generated so far as context.
    if len(all_outputs) == 0:
        output = model.invoke(prompt)
    else:
        output = model.invoke(f"write the next chunk of pizza receipe based on the {' '.join(all_outputs)}")
    return output.content

def is_similar(text1, text2, threshold=0.8):
    # Simple similarity check based on sequence matching
    similarity = difflib.SequenceMatcher(None, text1, text2).ratio()
    print(similarity)
    return similarity > threshold

def generate_generations(initial_prompt, max_generations=5):
    all_outputs = []

    for _ in range(max_generations):
        output = generate_output(initial_prompt, all_outputs)
        # Stop as soon as the new chunk closely matches any earlier chunk
        if any(is_similar(output, previous_output) for previous_output in all_outputs):
            print("Stopping generation due to similar output.")
            break
        all_outputs.append(output)

    return " ".join(all_outputs)

# Usage example
initial_prompt = "write the pizza receipe."
final_output = generate_generations(initial_prompt)
print(final_output)


0.12863070539419086
0.09739130434782609
0.24880382775119617
0.029574861367837338
0.003372681281618887
0.043731778425655975
0.03577817531305903
0.029459901800327332
0.0625
0.041791044776119404
Here's a basic recipe for making pizza:

Ingredients:

- 1 1/2 cups (190g) all-purpose flour
- 1 tsp (5g) instant yeast
- 1 tsp (5g) sugar
- 3/4 tsp (4g) salt
- 2 tbsp (30ml) olive oil
- 1/2 cup (120ml) warm water
- Here's the next chunk of the pizza recipe:

 1/2 cup (120ml) tomato sauce
- 1 cup (100g) shredded mozzarella cheese
- 1/4 cup (25g) grated Parmesan cheese
- Toppings of your choice (e.g., pepperoni, mushrooms, onions, bell peppers)

Instructions:

1. In a large bowl, Here's the next chunk of the pizza recipe:

Instructions:

1. In a large bowl, mix the flour, instant yeast, sugar, and salt.

2. Add the olive oil and warm water to the dry ingredients. Mix until a dough forms.

3. Knead the dough on a lightly floured surface for about 5-7 minutes or until it becomes smooth and elastic.



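This looks more complete than a single 100-token response, although the merged text still contains the model's "Here's the next chunk of the pizza recipe:" preambles.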
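Chunks can also overlap: in the output above, the text `1. In a large bowl,` ends one chunk and reappears near the start of the next. A hedged sketch of one way to drop such repeats when merging follows. The helper `merge_with_overlap` is hypothetical, and it assumes the continuation begins directly with the repeated text, so any preamble would have to be stripped first.

```python
def merge_with_overlap(previous, continuation, min_overlap=5):
    """Join two chunks, dropping text repeated at the seam.

    Looks for the longest prefix of `continuation` that is also a suffix of
    `previous`; overlaps shorter than `min_overlap` characters are ignored
    to avoid accidental matches.
    """
    max_len = min(len(previous), len(continuation))
    for size in range(max_len, min_overlap - 1, -1):
        if previous.endswith(continuation[:size]):
            return previous + continuation[size:]
    # No usable overlap: fall back to plain concatenation.
    return previous + continuation


merged = merge_with_overlap(
    "Instructions:\n\n1. In a large bowl,",
    "1. In a large bowl, mix the flour.",
)
print(merged)
```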