---



# Unit 2 - Part 3a: Chain of Thought (CoT)

## 1. Introduction: The Inner Monologue

Standard LLMs try to jump straight to the answer. For complex problems (math, logic), this often fails.

**Chain of Thought (CoT)** forces the model to "think out loud" before answering.

### Why use a "Dumb" Model?
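This unit is written around **Llama3.1-8b** (via Groq), a smaller, faster model.
Why? Because large models (like Gemini Pro or GPT-4) are often *too smart*: they solve logic riddles in one shot, without needing to reason out loud.

To really see the power of Prompt Engineering, we need a model that **needs help**.

Note that the setup cell in this notebook instantiates `gemini-2.5-flash` through `langchain_google_genai`, so the recorded outputs come from that model even though the printed labels say Llama3.1-8b.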

### Visualizing the Process (Flowchart)
```mermaid
graph TD
    Input["Question: 5+5*2?"]
    Input -->|Standard| Wrong["Answer: 20 (Wrong)"]
    Input -->|CoT| Step1["Step 1: 5*2=10"]
    Step1 --> Step2["Step 2: 5+10=15"]
    Step2 --> Correct["Answer: 15 (Correct)"]
```
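
The arithmetic in the flowchart can be checked directly. The sketch below uses two hypothetical helper names (they are not part of the notebook) to contrast a left-to-right reading of `5+5*2` with the two explicit steps of the CoT path.

```python
def left_to_right(a: int, b: int, c: int) -> int:
    """Evaluate a + b * c strictly left to right, ignoring operator precedence."""
    return (a + b) * c


def step_by_step(a: int, b: int, c: int) -> list[str]:
    """Evaluate a + b * c as the CoT branch does: multiply first, then add."""
    product = b * c      # Step 1: 5*2=10
    total = a + product  # Step 2: 5+10=15
    return [f"{b}*{c}={product}", f"{a}+{product}={total}"]


wrong = left_to_right(5, 5, 2)
steps = step_by_step(5, 5, 2)
print(wrong)  # 20
print(steps)  # ['5*2=10', '5+10=15']
```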

## 2. Concept: Latent Reasoning

Why does this work?
Because LLMs are next-token predictors, and each predicted token gets a roughly fixed amount of computation.
- If you force the model to answer immediately, it must predict the digits `1` and `5` with no intermediate work to lean on.
- If you let it "think", it first generates intermediate tokens (`5`, `*`, `2`, `=`, `1`, `0`).
- The model then **attends** to these new tokens when it computes the final answer.

**Writing is Thinking.**
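
The idea can be illustrated with a toy, which is not a description of transformer internals. Assume a hypothetical "model" (`one_step`) that can perform only one arithmetic operation per generation step, and a loop (`generate`) that appends each result to the context so the next step can read it. With a budget of one step the expression is left unresolved; with two steps the intermediate line makes the final answer reachable.

```python
import re


def one_step(expr: str) -> str:
    """Toy 'model': resolve at most ONE operation, multiplication before addition."""
    match = re.search(r"(\d+)\*(\d+)", expr) or re.search(r"(\d+)\+(\d+)", expr)
    if match is None:
        return expr  # nothing left to compute
    left, right = int(match.group(1)), int(match.group(2))
    value = left * right if "*" in match.group(0) else left + right
    return expr[: match.start()] + str(value) + expr[match.end():]


def generate(expr: str, max_steps: int) -> list[str]:
    """Append each intermediate result to the context, like tokens in a CoT trace."""
    context = [expr]
    for _ in range(max_steps):
        next_line = one_step(context[-1])
        if next_line == context[-1]:
            break  # fully reduced
        context.append(next_line)
    return context


direct = generate("5+5*2", max_steps=1)    # forced to stop early
with_cot = generate("5+5*2", max_steps=2)  # allowed to write an intermediate step
print(direct)    # ['5+5*2', '5+10']
print(with_cot)  # ['5+5*2', '5+10', '15']
```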

In [17]:
# Setup
%pip install --upgrade --quiet langchain langchain-google-genai

import os
from langchain_google_genai import ChatGoogleGenerativeAI
from google.colab import userdata

# Get Google API Key from Colab secrets
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

# Using gemini-2.5-flash; temperature=0.0 keeps answers as repeatable as possible
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.0)

## 3. The Experiment: A Tricky Logic Problem

Let's try a problem that requires multi-step reasoning.

**Problem:**
"A startup has 3 AI engineers. They decide to triple their engineering team by hiring from 4 different bootcamps. Each bootcamp provides 2 engineers. How many total AI engineers will the startup have?"

In [18]:
question = "A startup has 3 AI engineers. They decide to triple their engineering team by hiring from 4 different bootcamps. Each bootcamp provides 2 engineers. How many total AI engineers will the startup have?"

# 1. Standard Prompt (Direct Answer)
prompt_standard = f"Answer this question: {question}"
# NOTE: the label says Llama3.1-8b, but `llm` is gemini-2.5-flash in the setup cell
print("--- STANDARD (Llama3.1-8b) ---")
print(llm.invoke(prompt_standard).content)

--- STANDARD (Llama3.1-8b) ---
Here's how to break it down:

1.  **Initial engineers:** 3
2.  **Tripling the team:** 3 engineers * 3 = 9 engineers
3.  **Engineers available from bootcamps:** 4 bootcamps * 2 engineers/bootcamp = 8 engineers

Since they want to triple their team to 9 engineers, and they currently have 3, they need to hire 6 more engineers (9 - 3 = 6). They have 8 engineers available from the bootcamps, which is more than enough to reach their goal.

So, the startup will have **9** AI engineers.


### Critique
Smaller models often get confused by multiple operations. They might just add the visible numbers (3 + 4 = 7) or apply "triple" to everything. The sequence the exercise intends is (bootcamps × engineers per bootcamp) + original team, i.e. 4 × 2 + 3 = 11. In the recorded run above, the model instead treated "triple" as a target and answered 9.

Let's force it to think step by step.

In [19]:
# 2. CoT Prompt (Magic Phrase)
prompt_cot = f"Answer this question. Let's think step by step. {question}"

# NOTE: the label says Llama3.1-8b, but `llm` is gemini-2.5-flash in the setup cell
print("--- Chain of Thought (Llama3.1-8b) ---")
print(llm.invoke(prompt_cot).content)

--- Chain of Thought (Llama3.1-8b) ---
Let's break this down step by step:

1.  **Initial number of AI engineers:** The startup starts with 3 AI engineers.
2.  **Goal for the team size:** They decide to triple their engineering team.
    *   3 engineers * 3 = 9 engineers.
    *   So, their goal is to have 9 AI engineers in total.
3.  **Number of new engineers needed:** To reach 9 engineers from their current 3, they need to hire:
    *   9 (goal) - 3 (current) = 6 new engineers.
4.  **Engineers provided by bootcamps:** They are hiring from 4 different bootcamps, and each provides 2 engineers.
    *   4 bootcamps * 2 engineers/bootcamp = 8 engineers available for hire.
5.  **Total AI engineers:** Since they need 6 new engineers to triple their team, and 8 are available, they will hire the 6 needed to reach their goal.
    *   3 (initial engineers) + 6 (new hires) = 9 total AI engineers.

The startup will have **9** total AI engineers.
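
To compare the standard and CoT runs programmatically, it helps to pull the final number out of the free-form response. The following is a minimal sketch that assumes the answer is the last bolded integer in the text, as it is in both recorded outputs; `extract_final_answer` is a hypothetical helper, not part of LangChain.

```python
import re
from typing import Optional


def extract_final_answer(text: str) -> Optional[int]:
    """Return the last bolded integer (e.g. **9**) in a response, or None if absent."""
    matches = re.findall(r"\*\*(\d+)\*\*", text)
    return int(matches[-1]) if matches else None


sample = (
    "1.  **Initial engineers:** 3\n"
    "The startup will have **9** total AI engineers."
)
answer = extract_final_answer(sample)
print(answer)  # 9
```

Bold labels such as `**Initial engineers:**` are skipped because the pattern only matches digits between the asterisks.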


## 4. Analysis

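The reasoning the exercise intends is:
1.  "The startup currently has 3 AI engineers."
2.  "They hire from 4 bootcamps: 4 × 2 = 8 new engineers."
3.  "Total: 3 + 8 = 11 engineers."

Now look at the recorded outputs. Both prompts arrive at 9, not 11: the model reads "triple" as a hiring target (3 × 3 = 9) and hires only 6 of the 8 available engineers. The word "triple" was meant as a red herring, but the question is ambiguous enough that the 9 reading is defensible. The standard prompt also produced a step-by-step answer without being asked, which is consistent with the earlier point that capable models need less help, so this run does not show a gap between the two prompts.

What the written steps do provide is visibility: because the intermediate reasoning is on the page, you can see exactly where the model's interpretation diverges from the intended one. On a model that would otherwise answer in one shot, generating those steps is how it "debugs" its own logic.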
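
The problem statement supports two readings, and the recorded outputs above follow the second one. The sketch below works through both; the variable names are illustrative only.

```python
initial = 3
bootcamps = 4
engineers_per_bootcamp = 2
available = bootcamps * engineers_per_bootcamp  # 4 * 2 = 8

# Reading A (intended by the exercise): "triple" is a red herring,
# and every bootcamp engineer is hired.
total_a = initial + available                   # 3 + 8 = 11

# Reading B (what the recorded outputs did): "triple" is a target,
# so only as many engineers as needed are hired.
target = initial * 3                            # 3 * 3 = 9
hired_b = min(target - initial, available)      # min(6, 8) = 6
total_b = initial + hired_b                     # 3 + 6 = 9

print(total_a, total_b)  # 11 9
```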

---



# Unit 2 - Part 3b: Tree of Thoughts (ToT) & Graph of Thoughts (GoT)

## 1. Introduction: Beyond A -> B

CoT is linear. But complex reasoning is often nonlinear. We need to explore branches (ToT) or even combine ideas (GoT).

The unit's text continues to target **Llama3.1-8b via Groq** to show how structure improves performance. As in Part 3a, the setup cell below actually instantiates `gemini-2.5-flash`.

In [20]:
# Setup
%pip install --upgrade --quiet langchain langchain-google-genai

import os
from langchain_google_genai import ChatGoogleGenerativeAI
from google.colab import userdata

# Get Google API Key from Colab secrets
if "GOOGLE_API_KEY" not in os.environ:
    os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")

# Using gemini-2.5-flash with a higher temperature so the branches differ from each other
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.7) # Creativity needed

## 2. Tree of Thoughts (ToT)

ToT explores multiple branches before making a decision.
**Analogy:** A chess player considering 3 possible moves before playing one.

### Implementation
We will generate 3 distinct solutions for a problem and then use a "Judge" to pick the best one.
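
Before wiring in an LLM, the generate-then-judge shape can be sketched with plain functions. This is the one-level branch-and-select pattern this section calls ToT, under the assumption that each branch and the judge are ordinary callables; the stub solutions and scores are made up for illustration and stand in for model calls.

```python
from typing import Callable


def tree_of_thoughts(
    problem: str,
    generators: list[Callable[[str], str]],
    judge: Callable[[str, str], float],
) -> tuple[str, list[str]]:
    """Generate one candidate per branch, then return the highest-scoring one."""
    candidates = [generate(problem) for generate in generators]
    best = max(candidates, key=lambda candidate: judge(problem, candidate))
    return best, candidates


# Hypothetical stubs standing in for three LLM branch calls.
def by_architecture(problem: str) -> str:
    return "Move batch jobs to serverless functions"


def by_pricing(problem: str) -> str:
    return "Buy reserved instances for steady workloads"


def by_optimization(problem: str) -> str:
    return "Right-size over-provisioned instances"


def stub_judge(problem: str, candidate: str) -> float:
    """Hypothetical scoring rule standing in for the LLM judge."""
    scores = {"serverless": 1.0, "reserved": 3.0, "Right-size": 2.0}
    return max((s for keyword, s in scores.items() if keyword in candidate), default=0.0)


best, candidates = tree_of_thoughts(
    "Reduce cloud costs by 30%",
    [by_architecture, by_pricing, by_optimization],
    stub_judge,
)
print(best)  # Buy reserved instances for steady workloads
```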

In [21]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

problem = "How can we reduce our cloud infrastructure costs by 30% without impacting performance?"

# Step 1: The Branch Generator
prompt_branch = ChatPromptTemplate.from_template(
    "Problem: {problem}. Give me one unique, actionable solution. Focus on a different aspect (architecture, pricing, optimization). Solution {id}:"
)

branches = RunnableParallel(
    sol1=prompt_branch.partial(id="1") | llm | StrOutputParser(),
    sol2=prompt_branch.partial(id="2") | llm | StrOutputParser(),
    sol3=prompt_branch.partial(id="3") | llm | StrOutputParser(),
)

# Step 2: The Judge
prompt_judge = ChatPromptTemplate.from_template(
    """
    I have three proposed solutions for: '{problem}'

    1: {sol1}
    2: {sol2}
    3: {sol3}

    Act as a Senior Cloud Architect. Pick the solution with the best balance of cost savings and minimal risk. Explain why in 2-3 sentences.
    """
)

# Chain: Input -> Branches -> Judge -> Output
tot_chain = (
    RunnableParallel(problem=RunnableLambda(lambda x: x), branches=branches)
    | (lambda x: {**x["branches"], "problem": x["problem"]})
    | prompt_judge
    | llm
    | StrOutputParser()
)

try:
    print("--- Tree of Thoughts (ToT) Result ---")
    print(tot_chain.invoke(problem))
except Exception as e:
    if "RESOURCE_EXHAUSTED" in str(e) or "429" in str(e):
        print("⚠️ Quota limit reached! ToT uses multiple API calls simultaneously.")
    else:
        print(f"Error: {e}")

--- Tree of Thoughts (ToT) Result ---
As a Senior Cloud Architect, I would choose **Solution 3: Intelligent Data Lifecycle Management & Tiering.**

This solution offers a highly impactful cost reduction by addressing the often-overlooked and substantial expense of storage, with minimal risk to performance since it specifically targets infrequently accessed or "cold" data. Unlike architectural refactorings, it avoids direct changes to active application logic, making its implementation inherently safer for critical system operations.
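
The `except` branch in the ToT cell only reports quota errors. One option is to retry with exponential backoff instead. The wrapper below is a sketch with hypothetical names (`invoke_with_backoff`, `max_attempts`, `base_delay`); it reuses the same string checks as the cell above and accepts an injectable `sleep` so it can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def invoke_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on quota errors, doubling the delay after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as e:
            quota_error = "RESOURCE_EXHAUSTED" in str(e) or "429" in str(e)
            if not quota_error or attempt == max_attempts - 1:
                raise  # not retryable, or out of attempts
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


# Demo with a stub that fails twice with a quota error, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("429 RESOURCE_EXHAUSTED")
    return "ok"


delays: list[float] = []
result = invoke_with_backoff(flaky, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

In the notebook this would wrap the chain call, for example `invoke_with_backoff(lambda: tot_chain.invoke(problem))`.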


## 3. Graph of Thoughts (GoT)

You asked: **"Where is Graph of Thoughts?"**

GoT is more complex: thoughts form a network rather than a tree. A task can split into parts, each part is processed separately, and the results are then **aggregated** back together.

### The Workflow (Writer's Room)
1.  **Split:** Generate 3 independent story plots (Sci-Fi, Fantasy, Mystery).
2.  **Aggregate:** The model reads all 3 and creates a "Master Plot" that combines the best elements of each.
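3.  **Refine:** Polish the Master Plot.

The code cell in this section applies the same split-and-aggregate pattern to a different domain (three perspectives on a SaaS feature: UX, technical, business) and stops after the aggregate step; it has no separate refine call.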

```mermaid
graph LR
   Start(Concept) --> A[Draft 1]
   Start --> B[Draft 2]
   Start --> C[Draft 3]
   A & B & C --> Mixer[Aggregator]
   Mixer --> Final[Final Story]
```
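
The three workflow steps can be sketched without any LLM calls. In this minimal example each step is a plain function with a hypothetical name (`split`, `aggregate`, `refine`); in a real chain each one would be a model call, and the string templates here are placeholders.

```python
def split(concept: str) -> dict[str, str]:
    """Split: produce one independent draft per genre."""
    genres = ("Sci-Fi", "Fantasy", "Mystery")
    return {genre: f"{genre} plot about {concept}" for genre in genres}


def aggregate(drafts: dict[str, str]) -> str:
    """Aggregate: merge all drafts into a single master plot."""
    return "Master plot combining: " + "; ".join(drafts.values())


def refine(master_plot: str) -> str:
    """Refine: a final polishing pass (here it only normalizes the ending)."""
    return master_plot.rstrip(".") + "."


def graph_of_thoughts(concept: str) -> str:
    drafts = split(concept)     # one node fans out to three
    master = aggregate(drafts)  # three nodes converge into one
    return refine(master)       # one more pass over the merged node


final = graph_of_thoughts("a lost robot")
print(final)
```

The difference from the ToT pattern is the middle step: the drafts are merged into one result instead of one draft being selected and the rest discarded.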

In [22]:
# 1. The Generator (Divergence)
prompt_draft = ChatPromptTemplate.from_template(
    "Describe a SaaS product feature for: {topic}. Focus on {aspect}. One sentence only."
)

drafts = RunnableParallel(
    draft_ux=prompt_draft.partial(aspect="User Experience") | llm | StrOutputParser(),
    draft_tech=prompt_draft.partial(aspect="Technical Innovation") | llm | StrOutputParser(),
    draft_business=prompt_draft.partial(aspect="Business Value") | llm | StrOutputParser(),
)

# 2. The Aggregator (Convergence)
prompt_combine = ChatPromptTemplate.from_template(
    """
    I have three perspectives on a SaaS feature for '{topic}':
    1. UX Perspective: {draft_ux}
    2. Technical Perspective: {draft_tech}
    3. Business Perspective: {draft_business}

    Your task: Write a compelling product feature description that combines the best aspects of all three perspectives.
    Make it sound exciting and customer-focused. Write 2-3 sentences.
    """
)

# 3. The Chain
got_chain = (
    RunnableParallel(topic=RunnableLambda(lambda x: x), drafts=drafts)
    | (lambda x: {**x["drafts"], "topic": x["topic"]})
    | prompt_combine
    | llm
    | StrOutputParser()
)

try:
    print("--- Graph of Thoughts (GoT) Result ---")
    print(got_chain.invoke("AI-powered code review"))
except Exception as e:
    if "RESOURCE_EXHAUSTED" in str(e) or "429" in str(e):
        print("⚠️ Quota limit reached! GoT uses multiple API calls simultaneously.")
    else:
        print(f"Error: {e}")

--- Graph of Thoughts (GoT) Result ---
Revolutionize your development with AI-powered code review, delivering clear, actionable insights that seamlessly integrate into your workflow. Our generative AI performs deep context-aware semantic analysis to proactively identify architectural debt, predict performance bottlenecks, and autonomously suggest multi-file refactorings, empowering your team to effortlessly ensure higher quality, secure, and performant code while drastically reducing costs and accelerating time-to-market.


## 4. Summary & Comparison Table

| Method | Structure | Best For... | Cost/Latency |
|--------|-----------|-------------|--------------|
| **Simple Prompt** | Input -> Output | Simple facts, summaries | ⭐ Low |
| **CoT (Chain)** | Input -> Steps -> Output | Math, Logic, Debugging | ⭐⭐ Med |
| **ToT (Tree)** | Input -> 3x Branches -> Select -> Output | Strategic decisions, Brainstorming | ⭐⭐⭐ High |
| **GoT (Graph)** | Input -> Branch -> Mix/Aggregate -> Output | Creative Writing, Research Synthesis | ⭐⭐⭐⭐ V. High |

**Recommendation:** Start with CoT. Only use ToT/GoT if CoT fails.
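
The Cost/Latency column can be made concrete by counting model calls per question. The sketch below counts calls as the chains in this notebook are built (three branches plus one judge or aggregator); `llm_calls` is a hypothetical helper, and the optional refine step is the one described in the GoT workflow.

```python
def llm_calls(method: str, branches: int = 3, refine: bool = False) -> int:
    """Number of model calls per question for each prompting structure."""
    if method in ("simple", "cot"):
        return 1                  # CoT costs more output tokens, not more calls
    if method == "tot":
        return branches + 1       # N branches + 1 judge
    if method == "got":
        return branches + 1 + (1 if refine else 0)  # N drafts + aggregator (+ refine)
    raise ValueError(f"unknown method: {method}")


counts = {m: llm_calls(m) for m in ("simple", "cot", "tot", "got")}
print(counts)                       # {'simple': 1, 'cot': 1, 'tot': 4, 'got': 4}
print(llm_calls("got", refine=True))  # 5
```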