**Notas generales.**

Modelos:

Tanto gpt-oss como qwen hacen uso de la estrategia de generación en formato harmony. Es decir, primero tienen una sección interna de razonamiento (un tanto esquizofrénico) PREVIO a la respuesta global. Entre más ambiguo sea el prompt más larga es esta sección, por eso prompts como `prompt4` tienen tan buenos resultados. El modelo es capaz de inferir que se le está pasando un ejemplo y logra seguir el formato de buena manera.

Lo malo de esto es que se tienen que tener más de 1000 tokens de generación ANTES de que el modelo produzca los resultados finales. Gpt-oss ya da resultados buenos en 2000 max_new_tokens y muchas veces no llega al límite de generación. Qwen usa una mayor cantidad de tokens, las primeras respuestas que salen son apartir de x > 4300~4600 tokens.

In [1]:
import torch, re
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
dev = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(dev)
print(torch.cuda.get_device_name())

cuda
NVIDIA RTX 5000 Ada Generation


### Logic Prompts (Igual para todos los checkpoints)

In [2]:
prompt = """
    Translate the following premises from natural language to first order logic. 
    PREMISES:
    All eels are fish.
    No fish are plants.
    Everything displayed in the collection is either a plant or an animal.
    All multicellular animals are not bacteria.
    All animals displayed in the collection are multicellular.
    A sea eel is displayed in the collection.
    The sea eel is an eel or an animal or not a plant.
"""

prompt2 = """
    Given a set of premises description and a question, the task is to parse the problem into first-order logic formulars.
    --------------
    Premises:
    All people who regularly drink coffee are dependent on caffeine. 
    People either regularly drink coffee or joke about being addicted to caffeine. 
    No one who jokes about being addicted to caffeine is unaware that caffeine is a drug. 
    Rina is either a student and unaware that caffeine is a drug, or neither a student nor unaware that caffeine is a drug. 
    If Rina is not a person dependent on caffeine and a student, then Rina is either a person dependent on caffeine and a student, or neither a person dependent on caffeine nor a student.
    Predicates:
    Dependent(x) 
    Drinks(x) 
    Jokes(x) 
    Unaware(x)
    Student(x)
    Premises:
    ∀x (Drinks(x) → Dependent(x)) 
    ∀x (Drinks(x) ⊕ Jokes(x))
    ∀x (Jokes(x) → ¬Unaware(x)) 
    (Student(rina) ∧ Unaware(rina)) ⊕ ¬(Student(rina) ∨ Unaware(rina)) 
    ¬(Dependent(rina) ∧ Student(rina)) → (Dependent(rina) ∧ Student(rina)) ⊕ ¬(Dependent(rina) ∨ Student(rina)) 
    --------------
    
    Premises:
    All eels are fish.
    No fish are plants.
    Everything displayed in the collection is either a plant or an animal.
    All multicellular animals are not bacteria.
    All animals displayed in the collection are multicellular.
    A sea eel is displayed in the collection.
    The sea eel is an eel or an animal or not a plant.
    Predicates:
"""


prompt3 = """
    Given a set of premises, the task is to parse the problem and the question into first-order logic formulars.
    The grammar of the first-order logic formular is defined as follows:
    1) logical conjunction of expr1 and expr2: expr1 ∧ expr2
    2) logical disjunction of expr1 and expr2: expr1 ∨ expr2
    3) logical exclusive disjunction of expr1 and expr2: expr1 ⊕ expr2
    4) logical negation of expr1: ¬expr1
    5) expr1 implies expr2: expr1 → expr2
    6) expr1 if and only if expr2: expr1 ↔ expr2
    7) logical universal quantification: ∀x
    8) logical existential quantification: ∃x
    --------------
    Premises:
    All people who regularly drink coffee are dependent on caffeine. People either regularly drink coffee or joke about being addicted to caffeine. No one who jokes about being addicted to caffeine is unaware that caffeine is a drug. Rina is either a student and unaware that caffeine is a drug, or neither a student nor unaware that caffeine is a drug. If Rina is not a person dependent on caffeine and a student, then Rina is either a person dependent on caffeine and a student, or neither a person dependent on caffeine nor a student.
    Predicates:
    Dependent(x) ::: x is a person dependent on caffeine.
    Drinks(x) ::: x regularly drinks coffee.
    Jokes(x) ::: x jokes about being addicted to caffeine.
    Unaware(x) ::: x is unaware that caffeine is a drug.
    Student(x) ::: x is a student.
    Premises:
    ∀x (Drinks(x) → Dependent(x)) ::: All people who regularly drink coffee are dependent on caffeine.
    ∀x (Drinks(x) ⊕ Jokes(x)) ::: People either regularly drink coffee or joke about being addicted to caffeine.
    ∀x (Jokes(x) → ¬Unaware(x)) ::: No one who jokes about being addicted to caffeine is unaware that caffeine is a drug. 
    (Student(rina) ∧ Unaware(rina)) ⊕ ¬(Student(rina) ∨ Unaware(rina)) ::: Rina is either a student and unaware that caffeine is a drug, or neither a student nor unaware that caffeine is a drug. 
    ¬(Dependent(rina) ∧ Student(rina)) → (Dependent(rina) ∧ Student(rina)) ⊕ ¬(Dependent(rina) ∨ Student(rina)) ::: If Rina is not a person dependent on caffeine and a student, then Rina is either a person dependent on caffeine and a student, or neither a person dependent on caffeine nor a student.
    --------------
    
    Premises:
    All eels are fish.
    No fish are plants.
    Everything displayed in the collection is either a plant or an animal.
    All multicellular animals are not bacteria.
    All animals displayed in the collection are multicellular.
    A sea eel is displayed in the collection.
    The sea eel is an eel or an animal or not a plant.
    Predicates:
"""

prompt4 = """
    Given a set of premises, the task is to parse the problem and the question into first-order logic formulars. Answer only with the translated premises.
    The grammar of the first-order logic formular is defined as follows:
    1) logical conjunction of expr1 and expr2: expr1 ∧ expr2
    2) logical disjunction of expr1 and expr2: expr1 ∨ expr2
    3) logical exclusive disjunction of expr1 and expr2: expr1 ⊕ expr2
    4) logical negation of expr1: ¬expr1
    5) expr1 implies expr2: expr1 → expr2
    6) expr1 if and only if expr2: expr1 ↔ expr2
    7) logical universal quantification: ∀x
    8) logical existential quantification: ∃x
    --------------
    Natural Language Premises:
    All people who regularly drink coffee are dependent on caffeine. People either regularly drink coffee or joke about being addicted to caffeine. No one who jokes about being addicted to caffeine is unaware that caffeine is a drug. Rina is either a student and unaware that caffeine is a drug, or neither a student nor unaware that caffeine is a drug. If Rina is not a person dependent on caffeine and a student, then Rina is either a person dependent on caffeine and a student, or neither a person dependent on caffeine nor a student.
    Predicates:
    Dependent(x) ::: x is a person dependent on caffeine.
    Drinks(x) ::: x regularly drinks coffee.
    Jokes(x) ::: x jokes about being addicted to caffeine.
    Unaware(x) ::: x is unaware that caffeine is a drug.
    Student(x) ::: x is a student.
    FOL Premises:
    ∀x (Drinks(x) → Dependent(x)) ::: All people who regularly drink coffee are dependent on caffeine.
    ∀x (Drinks(x) ⊕ Jokes(x)) ::: People either regularly drink coffee or joke about being addicted to caffeine.
    ∀x (Jokes(x) → ¬Unaware(x)) ::: No one who jokes about being addicted to caffeine is unaware that caffeine is a drug. 
    (Student(rina) ∧ Unaware(rina)) ⊕ ¬(Student(rina) ∨ Unaware(rina)) ::: Rina is either a student and unaware that caffeine is a drug, or neither a student nor unaware that caffeine is a drug. 
    ¬(Dependent(rina) ∧ Student(rina)) → (Dependent(rina) ∧ Student(rina)) ⊕ ¬(Dependent(rina) ∨ Student(rina)) ::: If Rina is not a person dependent on caffeine and a student, then Rina is either a person dependent on caffeine and a student, or neither a person dependent on caffeine nor a student.
    --------------
    
    Natural Language Premises:
    All eels are fish.
    No fish are plants.
    Everything displayed in the collection is either a plant or an animal.
    All multicellular animals are not bacteria.
    All animals displayed in the collection are multicellular.
    A sea eel is displayed in the collection.
    The sea eel is an eel or an animal or not a plant.
    FOL Premises:
"""

# Prompt 4 está demasiado cabrón.
prompt_list = [prompt, prompt2, prompt3, prompt4]

### gpt-oss

In [None]:
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", device_map = 'cuda')

Downloading (incomplete total...): 0.00B [00:00, ?B/s]

Fetching 42 files:   0%|          | 0/42 [00:00<?, ?it/s]

Loading weights:   0%|          | 0/411 [00:00<?, ?it/s]

In [None]:
def chat_oss(message):
    inputs = tokenizer.apply_chat_template(
        message,
        add_generation_prompt = False,
        tokenize = True,
        return_dict = True,
        return_tensors = "pt"
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens = 2000) 
    # 50 es muy poquito considerando el formato harmony. 125 apenas empieza a funcionar bien, pero las reflexiones internas están muy raras.
    # 200 es muy poco para su razonamiento todo wack en FOL.
    # 350 sigue sin salir el mensaje final.
    # 600 no está mal pero sigue siendo un chingo de mierda alv.
    # 1000 tampoco cabrón
    # 2000 ya nos da el valor chido. Aunque un gran porcentaje de la respuesta es el modelo dándose cabezos contra la pared.
    print("Outputs: {}".format(outputs[0].shape))
    answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:-10])
    try:
        aux = re.search(r'(final<\|message\|>)', answer)
        print("===============================")
        print(answer[aux.end():])
    except:
        print("No hay token de inicio del mensaje final.")

#### Subsección de pruebas

In [25]:
dif_prompts = [
    'Tell me two trophies won by Lionel Messi.',
    'What is Messi\'s height?',
    'Asume you are a professional football coach. Give me a short analysis of Messi\'s playstyle.',
    'You are a professional left back. You are playing against Messi. How do you defend him on the 1 vs 1?'
]

template = [
    {"role": "assistant", "content": "You answer questions with 50 words or less."},
    {"role": "user", "content": ""}
]

for _ in dif_prompts:
    template[1]["content"] = _
    chat(template)
    print("===")

<|start|>assistant<|channel|>analysis<|message|>We need to answer in 50 words or less. The two trophies: FIFA World Cup 2022, and Ballon d'Or has many but choose one. Or maybe La Liga? But question: "two trophies won by Lionel Messi." Provide two. Must be <=50 words. I can say: "FIFA World Cup 2022 and the Copa América 2021." That's two. That's 14 words. Good.<|end|><|start|>assistant<|channel|>final<|message|>Lionel Messi has won the FIFA World Cup (2022) and the Copa América (2021).<|return|>
===
<|start|>assistant<|channel|>analysis<|message|>User asks: "What is Messi's height?" Should answer with 50 words or less. Height of Lionel Messi? 1.70 m (5 ft 7 in). Provide within 50 words. Should not go over. Let's craft a concise answer. Probably < 50 words. Let's keep it short.

<|end|><|start|>assistant<|channel|>final<|message|>Lionel Messi is 1.70 m tall (about 5 ft 7 in).<|return|>
===
<|start|>assistant<|channel|>analysis<|message|>We need to answer with 50 words or less. Provide a 

In [26]:
dif_prompts2 = [
    'You are explaining professional football players to an 8 year old. Describe Lionel Messi in a poetic way that highlights the 2022 World Cup victory.',
    'Imagine you train a professional football team in Mexico\'s third division. You need to plan to defend around Messi. How do you instruct your defensive line? Finish your talk with a positive phrase in Spanish.',
    'Answer with an academic tone. How can I improve my aerobic condition to keep up with Lionel Messi?',
    'Is Messi right footed?'
]

template2 = [
    {"role": "user", "content":""}
]

for _ in dif_prompts2:
    template2[0]["content"] = _
    chat(template2)
    print("=============")


<|start|>⚠️...…..……

We have user instructions: "You are explaining professional football players to an 8 year Arg! The user says: 'You are explaining professional football players to an 8 year old. Describe Lionel Messi in a poetic way that highlights the 2022 World Cup victory.'"

We need to respond in a child-appropriate poetic description, highlighting Messi and the 2022 World Cup victory, capturing his achievements, soccer skills, etc. It must be suitable for a 8-year-old. Use simple language, vivid imagery, engaging. Provide a poetic description. We should not mention policy. Ensure compliance.

We can write a short poem that references Messi's style, his ball control, his 2022 World Cup win as captain. Keep it fun. Maybe mention his hat-trick etc. Good.

Ok.<|end|><|start|>assistant<|channel|>commentary<|message|>Here we have to respond with a poetic description suitable for an 8-year-old. We'll keep it simple
<|start|>...<|start|>assistant<|channel|>analysis<|message|>We need t

In [None]:
# NOTAS
# 1. ¿Qué vergas por qué si cambiamos el orden de los mensajes de usuario se modifica todo a la verga?
# Si template = [USER, ASSISTANT] el modelo medio crashea. Pero si template = [ASSISTANT, USER] ahora se vuelve un cabrón que razona bien perrón. hdspm.
# CONC1: Okay, el orden importa. Además, definir el prompt de asistente SIEMPRE genera una restricción mayor. Si solo usamos "role": "user" el modelo infiere lo que buscamos.

# Vamos a probar con system y developer.

In [28]:
dif_prompts3 = [
    'You are explaining professional football players to an 8 year old. Describe Lionel Messi in a poetic way that highlights the 2022 World Cup victory.',
    'Imagine you train a professional football team in Mexico\'s third division. You need to plan to defend around Messi. How do you instruct your defensive line? Finish your talk with a positive phrase in Spanish.',
    'Answer with an academic tone. How can I improve my aerobic condition to keep up with Lionel Messi?',
    'Is Messi right footed?'
]

template3 = [
    {"role": "system", "content": "You are FootballGPT. Reasoning: medium. Be precise in your response. Avoid overthinking."},
    {"role": "user", "content":""}
]

for _ in dif_prompts3:
    template3[1]["content"] = _
    chat(template3)
    print("=============")

<|start|>... (Continue as ?....... ......)……

We have to produce a poetic description of Lionel Messi as a professional football player, targeting an 8-year-old. The poem should highlight Messi's 2022 World Cup victory. It should be precise, short, maybe simple language, but still poetic. They mention "Describe Lionel Messi in a poetic way that highlights the 2022 World Cup victory." That is the request. We should produce a poem that is child-friendly, highlighting Messi's name, his skill, the 2022 World Cup win. Possibly refer to being Colombian? Wait Messi is Argentinian. 2022 World Cup from Qatar. So we mention "From Argentina, he scored and brought the world Cup." Use simple words. The poem should highlight 2022 victory. It should be an explanation to an 8-year-old. So simple rhyme or alliteration and descriptive. We should incorporate key points: his name Messi, the ball, the 202
<|start|>strategic commentary<|channel|>analysis<|message|>We need to output instruction to defensive 

In [41]:
def chat_alv():
    prompts = [
        'You are explaining professional football players to an 8 year old. Describe Lionel Messi in a poetic way that highlights the 2022 World Cup victory.',
        'Imagine you train a professional football team in Mexico\'s third division. You need to plan to defend around Messi. How do you instruct your defensive line? Finish your talk with a positive phrase in Spanish.',
        'Answer with an academic tone. How can I improve my aerobic condition to keep up with Lionel Messi?',
        'Is Messi right footed?'
    ]

    template = [
        {"role": "system", "content": "You are FootballGPT. Reasoning: medium. Be precise in your response. Avoid overthinking."},
        {"role": "developer", "content": "Follow the user prompt strictly. Be efficient in your reasoning steps."},
        {"role": "user", "content": "USER PROMPT"},
        #{"role": "assistant", "content": "ASSI PROMPT"}
    ]

    for _ in prompts:
        template[template.index({"role": "user", "content": "USER PROMPT"})]["content"] = _ + " Answer briefly but keep it clear."
        chat(template)
        template[template.index({"role": "user", "content": _ + " Answer briefly but keep it clear."})] = {"role": "user", "content": "USER PROMPT"}
        print("====================")

chat_alv()

Outputs: torch.Size([329])
<|start|>analysis<|message|>The user wants a poetic description of Lionel Messi for an 8-year-old, focusing on 2022 World Cup victory. We should keep it brief, clear, poetic, like a short poem, highlighting Messi’s play, the victory, maybe referencing his skills. No overthinking. Should be understandable to a kid. Use simple language with some rhyme maybe. 8-year-old. So we can say: "Messi is a soccer wizard who dribbles like a wind, made Argentina win the 2022 World Cup, etc." Keep simple.<|end|><|start|>final<|message|>Lionel Messi is a soccer wizard in a bright football‑red jersey,  
he dribbles past defenders like a feather in the breeze.  
With a touch and a smile, he sends the ball far and wide,  
and in 2022 he helped Argentina lift the globe‑shined prize.  
A gentle hero who glows on the field, cheering kids like you,  
reminding
Outputs: torch.Size([339])
<|start|>analysis<|message|>We have a user query: "Imagine you train a professional football tea

In [None]:
# Notas pt 67: En pruebas complejas el modelo se toma su santo tiempo y tokens en pensar y hacer sus mamadas. En preguntas más reducidas (como la de la pata de Messi)
# es más eficaz. Tener al menos un prompt de System/Dev es lo mejor. Chance podemos jugarle con los dos valores.
# Ahora toca ver k show a nivel de lógica.

# El uso del "role": "assistant" la verdad es que no me encantó. Creo que lo mejor es no hacer eso.

#### Lógica Finalmente (ESTÁ CHIDO)

In [None]:
template_n = [
    {"role": "system", "content": "Reasoning: high."},
    {"role": "developer", "content": "You translate natural language premises to first-order-logic formulars. Answer only with the translated premises."},
    {"role": "user", "content": ''}
]



for texto in prompt_list:
    template_n[2]['content'] = texto
    chat_oss(template_n)
    print("===================================================================================")
    print("===================================================================================")
    print("===================================================================================")


Outputs: torch.Size([2168])
**Predicates**

| Symbol | Meaning |
|--------|---------|
| `Eel(x)`        | *x* is an eel |
| `Fish(x)`       | *x* is a fish |

Outputs: torch.Size([2464])
No hay token de inicio del mensaje final.
Outputs: torch.Size([2435])
**First‑order‑logic representation of the second set of premises**

Let the following predicates be used (constants are written in lower‑case letters in parentheses):

| Predicate | Meaning |
|-----------|---------|
| `Eel(x)` | *x* is an eel |
| `Fish(x)` | *x* is a fish |
| `Plant(x)` | *x* is a plant |
| `Animal(x)` | *x* is an animal |
| `Multicellular(x)` | *x* is multicellular |
| `Bacteria(x)` | *x* is bacteria |
| `Displayed(x)` | *x* is displayed in the collection |

Let the constant **`seaEel`** denote the particular sea eel mentioned in the
text.

Then the premises translate into the following first‑order‑logic formulas:

1. **All eels are fish**  
   \[
   \forall x\; (\,\text{Eel}(x) \;\rightarrow\; \text{Fish}(x)\,)
   

### Qwen3

In [None]:
qwen_quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype= torch.bfloat16)
qwen_tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-8B')
qwen_model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen3-8B', quantization_config = qwen_quant_config).to(dev)

Loading weights:   0%|          | 0/399 [00:00<?, ?it/s]

In [6]:
def chat_qwen(msg, thinking):
    """
        mensajes = list ; lista con el diccionario de inputs.
        thinking = bool ; valor booleano que determina si el modelo usa "thinking mode" xd
    """
    text = qwen_tokenizer.apply_chat_template(
    msg,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=thinking 
    )
    inputs = qwen_tokenizer([text], return_tensors='pt').to(dev)
    gen_ids = qwen_model.generate(**inputs, max_new_tokens = 8000) 
    #Obs: Darle maś de 67 tokens padre santo.
    # 350 no.
    # 750 no.
    # 1250 no.
    # 1500 no. wtf
    # 2000 no. XD 
    # 4000 no. Sigue sin dar el <\think> token.
    # Con 8000 apenas vimos el primter <\think> token nmms XD.
    output_ids = gen_ids[0][len(inputs.input_ids[0]):].tolist()
    print(len(output_ids))

    try:
        index = len(gen_ids) - gen_ids[::-1].index(151668)
    except ValueError:
        index = 0

    thinking_content = qwen_tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
    norm_cont = qwen_tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n") 
    
    regex = re.search(r'(</think>)', norm_cont)
    try:
        print("Normal: ", norm_cont[regex.end():])   
    except:
        print("No <\\think> token found.")

In [7]:
qwen_msg = [
    {"role": "user", "content": ''}
]

prompt_no = 0
for p in prompt_list:
    prompt_no += 1
    print("Prompt{}".format(prompt_no))
    qwen_msg[0]["content"] = p
    chat_qwen(qwen_msg, True)
    print("===========================================================================================")
    print("===========================================================================================")

Prompt1
5789
Normal:  

Here is the translation of the premises into first-order logic:

1. **All eels are fish.**  
   $ \forall x \, (\text{Eel}(x) \rightarrow \text{Fish}(x)) $

2. **No fish are plants.**  
   $ \forall x \, (\text{Fish}(x) \rightarrow \neg \text{Plant}(x)) $

3. **Everything displayed in the collection is either a plant or an animal.**  
   $ \forall x \, (\text{Displayed}(x) \rightarrow \text{Plant}(x) \vee \text{Animal}(x)) $

4. **All multicellular animals are not bacteria.**  
   $ \forall x \, ((\text{Animal}(x) \wedge \text{Multicellular}(x)) \rightarrow \neg \text{Bacteria}(x)) $

5. **All animals displayed in the collection are multicellular.**  
   $ \forall x \, ((\text{Animal}(x) \wedge \text{Displayed}(x)) \rightarrow \text{Multicellular}(x)) $

6. **A sea eel is displayed in the collection.**  
   $ \exists x \, (\text{SeaEel}(x) \wedge \text{Displayed}(x)) $

7. **The sea eel is an eel or an animal or not a plant.**  
   $ \exists x \, (\text{SeaEel}(

In [28]:
prueba = "<\\think> Chinga tu madre tilín, <\think> Chinga tu madre tilín"
test_regex = re.search(r'(<\think>)', prueba)
print(test_regex)

<re.Match object; span=(32, 39), match='<\think>'>
