In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
from tqdm import tqdm

In [24]:
# give instruction to llm on how to solve the problem
template = """Question: {question}
Answer: Let's think step by step."""

prompt = ChatPromptTemplate.from_template(template)

model = 'vanilj/Phi-4'
llm = OllamaLLM(model=model)  # 'device' is not an OllamaLLM option; GPU offload is decided by the Ollama server

chain = prompt | llm

In [41]:
strategy_prompt = """To solve the New York Times Connections puzzle effectively, follow these steps:

### Strategy

1. **Read Through All Words**: Start by reading all 16 words carefully to get a general sense of their meanings and any obvious connections.

2. **Group Similar Themes**: Look for clusters of words that might share common themes or categories (e.g., animals, tools, emotions).

3. **Identify Patterns**: Consider different types of connections such as:
    * Synonyms (words with similar meanings)
    * Antonyms (words with opposite meanings)
    * Part-to-whole relationships (e.g., branch, leaf, root, tree)
    * Categorical links (words belonging to the same category)
    * Figurative language (e.g., "time flies," "a broken heart")

4. **Test Hypotheses**: Formulate potential groups and see if they fit logically together. If a group doesn’t work, reassess the words and try a different approach.

5. **Iterate**: Continue grouping until all words are categorized into four distinct sets of four words each.

### Constraints

*   Exactly four groups must be formed.
*   Each group must contain exactly four words.
*   Each word must be used once and only once.

### Examples

* **Example 1:** "Happy," "Joyful," "Cheerful," "Merry" 
    * **Connection:** These are all synonyms for feeling good.
* **Example 2:** "Hammer," "Screwdriver," "Wrench," "Banana"
    * **Connection:** Three are tools, but "banana" is a fruit and doesn't belong.
* **Example 3:** "Wheel," "Engine," "Door," "Car"
    * **Connection:** "Wheel," "Engine," and "Door" are parts of a "Car" (a part-to-whole relationship).
* **Example 4:** "Time flies," "A broken heart," "Shattered dreams," "A heavy burden"
    * **Connection:** These are all figures of speech.

By applying this strategy, you can systematically approach the puzzle and find meaningful connections among the words.

Format your final answer under the heading '#### Final Groups'."""
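
The three constraints in the prompt (four groups, four words per group, every word used exactly once) can also be checked in code instead of relying on the model to follow them. The following is a minimal sketch; `validate_groups` is a hypothetical helper that is not part of the notebook, and it assumes the model's answer has already been parsed into a list of word lists.

```python
from collections import Counter

def validate_groups(groups, words):
    """Return a list of constraint violations; an empty list means the grouping is valid.

    `groups` is a list of lists of words, `words` is the original puzzle word list.
    """
    problems = []
    if len(groups) != 4:
        problems.append(f"expected 4 groups, got {len(groups)}")
    for i, group in enumerate(groups, start=1):
        if len(group) != 4:
            problems.append(f"group {i} has {len(group)} words, expected 4")
    # Compare case-insensitively, since model output may capitalize words.
    used = Counter(w.strip().lower() for g in groups for w in g)
    expected = {w.strip().lower() for w in words}
    for word, count in used.items():
        if word not in expected:
            problems.append(f"unknown word: {word}")
        elif count > 1:
            problems.append(f"word used {count} times: {word}")
    for word in sorted(expected - set(used)):
        problems.append(f"word not used: {word}")
    return problems
```

An empty return value means all three constraints hold; otherwise the messages can be fed back to the model as part of a retry prompt.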

In [None]:
def generate_solution():
    # Separate the instructions from the word list so they do not run together in the prompt.
    return chain.invoke({'question': strategy_prompt + "\n\nThe 16 words are: 'joker', 'dice', 'chance', 'casino', 'octopus', 'slot machine', 'cube', 'gamble', 'slice', 'bet', 'shiva', 'julienne', 'heat', 'venus de milo', 'risk', 'taxi driver'", 'answer': ''})

llm_solution = generate_solution()
print(llm_solution)
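# '### Final Groups' also matches the '#### Final Groups' heading requested in the prompt.
# str.find returns -1 instead of raising ValueError when the model omits the heading.
marker = '### Final Groups'
pos = llm_solution.find(marker)
final_groups = llm_solution[pos + len(marker):].strip() if pos != -1 else ''
print(final_groups)

print('-----------------------')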

def critic(llm_solution):
    return chain.invoke({'question': """Evaluate the following solution to the NYT Connections puzzle. Ensure there are exactly 4 words in each group.
                          Which words do not fit neatly into their groups and have a better connection to another group? Which
                          grouping are you most confident in?\n\n""" + llm_solution, 'answer': ''})

critic_eval = critic(llm_solution)
print(critic_eval)

-----------------------
To solve this New York Times connections puzzle, we need to group the 16 given words into four sets of four words each, based on their relationships or categories.

### Step-by-Step Analysis:

1. **Read Through All Words**: 
   - Joker
   - Dice
   - Chance
   - Casino
   - Octopus
   - Slot machine
   - Cube
   - Gamble
   - Slice
   - Bet
   - Shiva
   - Julienne
   - Heat
   - Venus de Milo
   - Risk
   - Taxi driver

2. **Group Similar Themes**:

   **Gambling Theme:**
   - Joker (common card in gambling)
   - Dice (used in games of chance)
   - Chance (related to probability, often used in gambling contexts)
   - Casino (place where gambling occurs)
   - Slot machine (a type of casino game)
   - Cube (can refer to dice, which are cubes)
   - Gamble (the act of betting or risking something)
   - Bet (placing a wager)
   - Risk (associated with uncertain outcomes in gambling)

   **Food Preparation Theme:**
   - Octopus (seafood often prepared by slicing or j
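
The cell above slices the answer with `str.index`, which raises `ValueError` whenever the model leaves out the heading. A more tolerant extraction could look like the sketch below. Both function names are hypothetical, and the parser assumes the model lists each group as `-` or `*` bullets under a heading line, as in the output above; other layouts would need a different parser.

```python
import re

def extract_final_groups(text, heading="Final Groups"):
    """Return the text after the last markdown heading named `heading`, or None if absent."""
    pattern = re.compile(
        r"^#+[ \t]*" + re.escape(heading) + r"[ \t]*:?[ \t]*$",
        re.MULTILINE | re.IGNORECASE,
    )
    matches = list(pattern.finditer(text))
    if not matches:
        return None
    return text[matches[-1].end():].strip()

def parse_groups(section):
    """Collect bullet items under each non-bullet heading line into lists of words."""
    groups, current = [], None
    for raw in section.splitlines():
        line = raw.strip()
        if not line:
            continue
        bullet = re.match(r"^[-*]\s+(.*)$", line)
        if bullet:
            if current is None:
                current = []
                groups.append(current)
            # Drop trailing explanations such as "(place where gambling occurs)".
            word = re.sub(r"\s*\(.*\)\s*$", "", bullet.group(1)).strip()
            current.append(word)
        else:
            # Any non-bullet line is treated as the start of a new group.
            current = []
            groups.append(current)
    return [g for g in groups if g]
```

Returning `None` for a missing heading lets the caller decide whether to retry the model call or fall back to parsing the whole response.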

In [25]:
word_list = ['joker', 'dice', 'chance', 'casino', 'octopus', 'slot machine', 'cube', 'gamble', 'slice', 'bet', 'shiva', 'julienne', 'heat', 'venus de milo', 'risk', 'taxi driver']

In [27]:
templates = ['words that can be synonymous adjectives with each other.', 
             'words that share a pop culture reference.', 
             'words that are each followed by the same word or phrase.']

# Use a separate loop variable so the prompt string `template` defined earlier is not overwritten.
for criterion in templates:
    question = ('Given the word list ' + ', '.join(word_list) + ', group the words into '
                '4 categories of 4 words each based on the following criteria: find ' + criterion)
    output = chain.invoke({"question": question})
    print(output)

To group these words into four categories of synonyms, we need to identify sets of words that are related in meaning or concept as adjectives. Here’s how they might be categorized:

1. **Gambling-related Adjectives**:
   - Joker
   - Dice
   - Slot machine
   - Gamble

2. **Risk and Chance-related Adjectives**:
   - Chance
   - Casino (as a place associated with risk)
   - Risk
   - Bet

3. **Precision or Shape-related Adjectives**:
   - Cube
   - Slice
   - Julienne
   - Venus de Milo (referring to the sculpture's shape/form)

4. **Occupation/Role-related Adjectives**:
   - Taxi driver (as a role)
   - Shiva (a reference, potentially as an adjective for something related to this deity or concept of destruction/rebirth in Hinduism)
   - Heat (possibly referring to high-pressure situations, metaphorically like a driver's environment)
   - Octopus (symbolic of adaptability and multi-tasking, roles)

These groupings are based on thematic connections and potential synonymy as adjectives.
T

In [None]:
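# re-evaluate the output
# The f prefix belongs on the question text (not on the dict key) so that {output} is interpolated.
discernment = chain.invoke({"question": f'''Evaluate the word groupings produced by the previous LLM call. Point out words
that don't seem to match their group.

{output}'''})
print(discernment)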

Certainly! To evaluate the output of a language model's (LLM) word groupings, you can follow these steps:

1. **Understand the Task**: Determine what criteria or theme was used for grouping words initially. This could be based on semantics, syntax, phonetics, etc.

2. **Review Groupings**: Examine each group of words to see if they adhere to the identified criteria or theme.

3. **Identify Outliers**:
   - Look for words that don't fit the pattern established by other words in the same group.
   - Consider aspects like meaning, usage, grammatical role, or sound when evaluating each word's fit within its group.

4. **Provide Examples**: Highlight specific words that seem out of place and explain why they don’t match with their peers in the grouping.

5. **Suggest Adjustments**:
   - Propose a reorganization of groups where necessary.
   - Consider creating new groups for misfit words or redistributing them among existing ones.

6. **Check Consistency**: Ensure that all groups are consis
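
A generic reply like the one above, which describes how to evaluate groupings without naming any word, is consistent with the groupings never having reached the model. One way to guard against that is to build the critique prompt in a small helper and check it before calling the chain. This is a sketch only; `build_critique_prompt` is a hypothetical name that is not part of the notebook.

```python
def build_critique_prompt(groupings: str) -> str:
    """Build the critic question with the groupings embedded in the text."""
    if not groupings.strip():
        raise ValueError("no groupings to evaluate")
    return (
        "Evaluate the word groupings below, produced by a previous LLM call. "
        "Point out words that don't seem to match their group.\n\n"
        f"{groupings}"
    )

prompt_text = build_critique_prompt("1. Gambling: casino, bet, risk, gamble")
# Sanity check: no unfilled placeholder is left in the prompt.
assert "{output}" not in prompt_text
```

With the notebook's `chain` and `output` variables, the call would then be `chain.invoke({"question": build_critique_prompt(output)})`.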

In [22]:
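# eval
import json

with open('datasets/history.json', 'r') as f:
    dataset = json.load(f)  # parse the JSON into puzzle records instead of keeping a raw string

def evaluate(output, dataset):  # renamed so it does not shadow the built-in eval()
    # evaluate the output (left unimplemented in the notebook)
    raise NotImplementedError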


[
    {
        "id": 1,
        "date": "2023-06-12",
        "answers": [
            {
                "level": 0,
                "group": "WET WEATHER",
                "members": [
                    "HAIL",
                    "RAIN",
                    "SLEET",
                    "SNOW"
                ]
            },
            {
                "level": 1,
                "group": "NBA TEAMS",
                "members": [
                    "BUCKS",
                    "HEAT",

KeyboardInterrupt: 
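
The evaluation function is left empty in the notebook. Based only on the record layout visible in the interrupted output above (`answers` entries with `level`, `group`, and `members`), one possible scoring rule is to count predicted groups whose four words exactly match a reference group. The sketch below is an assumption about how scoring might work, not the notebook's method; `score_groups` is a hypothetical name, and the sample record contains only the one group that is fully visible in the output.

```python
def score_groups(predicted, answers):
    """Count distinct predicted groups that exactly match a reference group.

    Matching ignores word order, letter case, and surrounding whitespace.
    """
    reference = {frozenset(m.strip().upper() for m in a["members"]) for a in answers}
    guesses = {frozenset(w.strip().upper() for w in group) for group in predicted}
    return len(reference & guesses)

# Sample record limited to the group that is fully visible in the output above.
puzzle = {
    "id": 1,
    "date": "2023-06-12",
    "answers": [
        {"level": 0, "group": "WET WEATHER", "members": ["HAIL", "RAIN", "SLEET", "SNOW"]},
    ],
}

predicted = [["snow", "rain", "hail", "sleet"], ["bucks", "heat", "dice", "cube"]]
print(score_groups(predicted, puzzle["answers"]))
```

Exact-match counting is strict: a group with three correct words scores zero, so a softer metric (for example, per-group overlap) may be more informative while iterating on prompts.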