<a href="https://colab.research.google.com/github/RDGopal/IB9AU-2026/blob/main/RL2_Qwen_Reasoning_LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Qwen3-0.6B 'Thinking' LLM


A 'thinking' or 'reasoning' Large Language Model (LLM) is a model trained to write out intermediate reasoning steps before producing its final answer. It still generates text one token at a time; the difference is that the reasoning is produced first and the answer is conditioned on it. This lets the model break down complex problems, evaluate options, and refine its logic, which tends to improve responses on multi-step tasks and makes the reasoning easier to inspect. For the Qwen3-0.6B model used in this notebook, passing `enable_thinking=True` to the chat template activates this mode, so the model generates a 'thought' block (`<think>...</think>`) before its main response. Reading that block is useful for understanding how the model arrives at its conclusions and for debugging prompts.

This cell loads the Qwen3-0.6B model and its tokenizer from the Hugging Face Hub using the Transformers library. `AutoTokenizer` and `AutoModelForCausalLM` select the correct tokenizer and model architecture from the model name. `torch_dtype="auto"` loads the weights in the precision stored in the checkpoint instead of converting them to 32-bit floats, which reduces memory use. `device_map="auto"` places the model on the available hardware (a GPU if one is present, otherwise the CPU); this option relies on the `accelerate` package being installed.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)


This cell prepares the model input. The `prompt` variable holds the user's query, which is wrapped in a list of chat messages. `tokenizer.apply_chat_template` converts these messages into the prompt string format the model expects; `tokenize=False` returns that string rather than token IDs, and `add_generation_prompt=True` appends the marker that opens the assistant's turn. Setting `enable_thinking=True` selects thinking mode, so the model writes out its reasoning before the final answer. The last line tokenizes the string and moves the resulting tensors to the model's device.

In [None]:
# prepare the model input
prompt = "Tell me how the Q-learning algorithm for Reinforcement Learning works" #@param {type:"string"}
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
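To make the role of the chat template more concrete, the sketch below hand-builds a ChatML-style prompt string. It is only an approximation: `render_chatml_sketch` is a hypothetical helper, the marker strings and the empty think block used for non-thinking mode are assumptions about the template's layout, and the template shipped with the tokenizer remains authoritative. Printing `text` from the cell above shows the real result.

```python
def render_chatml_sketch(messages, add_generation_prompt=True, enable_thinking=True):
    """Hypothetical, simplified stand-in for tokenizer.apply_chat_template.

    Assumes a ChatML-style layout; the real template may differ in detail.
    """
    parts = []
    for message in messages:
        # Each turn is wrapped in start/end markers together with its role.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
        if not enable_thinking:
            # Assumption: non-thinking mode pre-fills an empty think block,
            # so the model skips straight to the answer.
            parts.append("<think>\n\n</think>\n\n")
    return "".join(parts)


example_messages = [{"role": "user", "content": "What is Q-learning?"}]
rendered = render_chatml_sketch(example_messages)
print(rendered)
```

Comparing this output with the real `text` is a quick way to check how the two modes change the prompt the model actually sees.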

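This cell performs the text generation. `model.generate()` is called with the prepared inputs. `max_new_tokens` is an upper limit on the length of the response, not a target: generation stops earlier when the model emits its end-of-sequence token, and the high value simply leaves room for a long reasoning trace. Because `generate()` returns the prompt tokens followed by the new tokens, the last line slices off the prompt so that `output_ids` holds only the newly generated tokens.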

In [None]:
# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
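The slicing step can be illustrated with plain Python lists standing in for tensors. This is a toy sketch: the token IDs are made up and `strip_prompt` is a hypothetical helper, but the indexing mirrors the line above.

```python
def strip_prompt(generated_row, prompt_length):
    """Return only the tokens produced after the prompt.

    generated_row: prompt tokens followed by the new tokens.
    prompt_length: number of tokens in the prompt.
    """
    return generated_row[prompt_length:]


# Made-up IDs purely for illustration.
prompt_ids = [101, 102, 103]
generated_row = prompt_ids + [7, 8, 9]

new_ids = strip_prompt(generated_row, len(prompt_ids))
print(new_ids)  # → [7, 8, 9]
```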

This cell parses the generated output. Because the model runs in 'thinking' mode, its output includes a `<think>...</think>` block. The code finds the last occurrence of the closing marker (special token ID 151668 for `</think>`) and decodes the tokens up to and including it into `thinking_content`, and the tokens after it into `content`. If no `</think>` token is found, `index` falls back to 0, so `thinking_content` is empty and the whole output is treated as the answer. This allows the model's reasoning and its final answer to be analyzed separately.

In [None]:
# parsing thinking content
try:
    # position just after the last occurrence of token 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")
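The same splitting logic can be wrapped in a small function and checked on toy data without loading the model. In this sketch `split_thinking` is a hypothetical helper and the short ID lists are made up; only the marker ID 151668 comes from the cell above.

```python
def split_thinking(output_ids, end_think_id):
    """Split token IDs at the last occurrence of the end-of-thinking marker.

    Returns (thinking_ids, answer_ids). The marker itself stays at the end
    of thinking_ids. If the marker is absent, thinking_ids is empty.
    """
    try:
        # Searching the reversed list finds the last occurrence;
        # subtracting from the length converts it to a forward cut point.
        cut = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        cut = 0
    return output_ids[:cut], output_ids[cut:]


END_THINK = 151668  # ID of </think> as used in the cell above

thinking_ids, answer_ids = split_thinking([11, 12, END_THINK, 21, 22], END_THINK)
print(thinking_ids)  # → [11, 12, 151668]
print(answer_ids)    # → [21, 22]
```

Rather than hard-coding the ID, it could probably also be looked up with `tokenizer.convert_tokens_to_ids("</think>")`, assuming `</think>` is a single token in the vocabulary.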

This cell imports `Markdown` and `display` from `IPython.display`. Together they render markdown-formatted strings directly in the Jupyter/Colab output, which makes the model's output (both thinking and final content) more readable and structured.

In [None]:
from IPython.display import Markdown, display

This cell displays the `thinking_content` extracted in the previous parsing step. Viewing this output helps you understand the model's internal reasoning process, how it breaks down the problem, and what steps it considers before formulating its final answer.

In [None]:
# Display thinking content
display(Markdown(thinking_content))

This cell displays the final `content` generated by the Qwen3-0.6B model. This is the answer or explanation the LLM provides after its internal thought process.

In [None]:
# Display content
display(Markdown(content))

### Try - Fintech Application of LLM Reasoning

**Scenario:** A fintech startup wants to use an LLM to provide basic, personalized financial advice or explain complex financial concepts to users. They want to ensure the LLM's responses are transparent and well-reasoned.

**Task:** Modify the existing code cells in this notebook to perform the following:

1.  **Change the `prompt`**: Update the `prompt` variable to ask the Qwen3-0.6B model to explain a common fintech concept. Examples include:
    *   "Explain the concept of 'Decentralized Finance (DeFi)' and its potential impact on traditional banking."
    *   "Describe how blockchain technology is used in supply chain finance."
    *   "What are the pros and cons of algorithmic trading in financial markets?"

2.  **Analyze Thinking Content**: After running the cells, examine the `thinking_content` output. How does the model's internal 'thought process' contribute to the clarity or accuracy of its final explanation?

3.  **Evaluate Clarity and Accuracy**: Based on the `content` output, how well does the Qwen3-0.6B model explain the chosen fintech concept? If there are any areas for improvement, think about how the prompt could be refined to get a better answer.
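One possible way to approach the prompt refinement in step 3 is to state the audience, length, and structure explicitly. The sketch below is only a suggestion: `build_fintech_prompt` and its parameters are hypothetical, and the wording is one of many reasonable choices.

```python
def build_fintech_prompt(concept, audience="a retail banking customer", max_words=200):
    """Hypothetical helper that adds audience and format constraints to a prompt."""
    return (
        f"Explain {concept} to {audience}. "
        f"Use at most {max_words} words, define any jargon, "
        "and give one concrete example."
    )


refined_prompt = build_fintech_prompt("Decentralized Finance (DeFi)")
print(refined_prompt)
```

Assigning `refined_prompt` to `prompt` and rerunning the cells makes it possible to compare the thinking and final content against the unrefined version.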