# Beam Search

- Beam search is a heuristic search algorithm that explores a graph by expanding the most promising nodes within a limited set. 
- It is widely used in Natural Language Processing (NLP) and sequence-to-sequence models to balance computational efficiency with the quality of the output.

**The Core Concept**
- Think of beam search as a “focused” version of Breadth-First Search (BFS). While BFS explores every possible path at each level of a tree, beam search restricts itself to the top k most promising paths at each level.
- This limit k is called the beam width.
	-	If the beam width is 1, the algorithm behaves exactly like a Greedy Search (only taking the single best step at a time).
	-	If the beam width is infinite, it behaves like a standard BFS (exploring all possibilities).
	-	By setting the beam width somewhere in between (e.g., 3, 5, or 10), beam search finds a middle ground: it explores more options than a greedy search to avoid dead ends, but fewer than BFS to save memory and time.
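
The effect of the beam width can be sketched with a single pruning step. This is a minimal illustration, not a full search: the `prune` helper and the scored candidates are hypothetical names and values chosen for the example.

```python
import heapq

def prune(candidates, beam_width):
    """Keep the beam_width highest-scoring (score, path) pairs, best first."""
    return heapq.nlargest(beam_width, candidates, key=lambda c: c[0])

# Hypothetical scored candidates at one level of the search tree.
level = [(0.40, "A"), (0.30, "B"), (0.15, "C"), (0.05, "D")]

greedy = prune(level, 1)               # beam width 1: a single survivor, as in greedy search
beam = prune(level, 2)                 # beam width 2: the two best candidates survive
everything = prune(level, len(level))  # width >= number of candidates: nothing is dropped, as in BFS
```

With a width of 1 only `(0.40, "A")` survives, while a width at least as large as the candidate list keeps every option.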

**How It Works: Step-by-Step**
- Imagine an AI trying to generate the next word in a sentence. Here is how beam search with a beam width of 2 would handle it:
	1.	Initialization: Start with the initial input (e.g., the word “The”).
	2.	Expansion: The model predicts all possible next words (e.g., “The cat…”, “The dog…”, “The car…”, “The apple…”).
	3.	Evaluation: It assigns a probability score to each phrase.
	4.	Selection (the pruning step): Instead of keeping all options, it keeps only the top 2 most likely phrases:
        -	“The cat” (Score: 0.40)
        -	“The dog” (Score: 0.30)
        -	Discarded: “The car” (0.15), “The apple” (0.05), etc.
	5.	Iteration: It now expands only from the two survivors (“The cat” and “The dog”) to find the next word, scores each new combination by its cumulative probability (the phrase's score multiplied by the probability of the new word), and again keeps only the 2 best paths overall.
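
The steps above can be sketched in Python roughly as follows. This is a simplified illustration under stated assumptions: the `NEXT_WORD` table and its probabilities are hypothetical stand-ins for a real model (which would condition on the whole phrase, not just the last word), and scores are summed as log-probabilities, which is equivalent to multiplying the probabilities.

```python
import heapq
import math

# Hypothetical next-word probabilities, keyed by the last word of the phrase.
NEXT_WORD = {
    "The": {"cat": 0.40, "dog": 0.30, "car": 0.15, "apple": 0.05},
    "cat": {"sat": 0.5, "ran": 0.3},
    "dog": {"barked": 0.9, "sat": 0.1},
}

def beam_search(start, steps, beam_width):
    # Initialization: one hypothesis, stored as (cumulative log-probability, words).
    beam = [(0.0, [start])]
    for _ in range(steps):
        candidates = []
        for score, words in beam:
            # Expansion: extend the hypothesis with every known next word.
            # Hypotheses with no continuation in the table are simply dropped.
            for word, prob in NEXT_WORD.get(words[-1], {}).items():
                # Evaluation: adding log-probabilities multiplies the probabilities.
                candidates.append((score + math.log(prob), words + [word]))
        if not candidates:
            break  # nothing left to expand
        # Selection: keep only the beam_width best hypotheses overall.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    # Convert back to plain probabilities for readability.
    return [(" ".join(words), math.exp(score)) for score, words in beam]

for phrase, prob in beam_search("The", steps=2, beam_width=2):
    print(f"{phrase}: {prob:.2f}")
```

In this toy table, a beam width of 2 ranks “The dog barked” first even though “The dog” was only the second-best phrase after the first step, whereas `beam_width=1` commits to “The cat” and ends with “The cat sat”.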

**Pros and Cons**

**Advantages:**
-	Memory Efficient: Unlike BFS, it does not store the entire search tree in memory. It keeps only as many partial paths as the beam width allows, so memory use grows with the product of the beam width and the maximum depth.
-	Better than Greedy: It reduces the risk of picking a locally “best” option that leads to a bad outcome later (e.g., a sentence that starts well but ends nonsensically).

**Disadvantages:**
-	Incomplete: Because it prunes paths to save space, it can discard the path that eventually leads to the absolute best solution (the global optimum). Once a path is pruned, it is never revisited.
-	No Guarantee: As a result, it is not guaranteed to find the optimal solution, just a “good enough” one.
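
The trade-off can be seen on a small, made-up example. The sketch below assumes a hypothetical two-level tree (`TREE`) in which the least promising first step leads to the best complete path. The `best_path` helper is illustrative only and multiplies raw probabilities for simplicity.

```python
# Hypothetical tree: probability of each child given the path taken so far.
TREE = {
    (): {"A": 0.40, "B": 0.35, "C": 0.25},
    ("A",): {"x": 0.6, "y": 0.4},
    ("B",): {"x": 0.5, "y": 0.5},
    ("C",): {"z": 1.0},
}

def best_path(beam_width, depth=2):
    beam = [(1.0, ())]  # (probability of the path so far, path)
    for _ in range(depth):
        candidates = [
            (prob * p, path + (child,))
            for prob, path in beam
            for child, p in TREE[path].items()
        ]
        # Pruning: anything outside the top beam_width is gone for good.
        beam = sorted(candidates, reverse=True)[:beam_width]
    return beam[0]

narrow = best_path(beam_width=2)  # "C" is pruned after the first step
wide = best_path(beam_width=3)    # wide enough to keep every first step
```

Here a width of 2 settles on the path A, x (probability about 0.24), because “C” was discarded early. Only a width of 3, which keeps all first steps in this tree, reaches the better path C, z (probability 0.25).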

**Practical Use in AI**
You will most commonly encounter beam search in:
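-	**Large Language Models (LLMs):** Beam search is a standard decoding option for language models. Chat assistants such as ChatGPT or Gemini more often rely on sampling methods (temperature, top-k, or nucleus sampling) for open-ended text, because beam search tends to produce repetitive, generic output in that setting.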
-	**Machine Translation:** Systems such as Google Translate have used it to keep track of the few most likely translations simultaneously rather than committing to one word at a time.
-	**Speech Recognition:** Converting audio signals to text by exploring multiple likely phoneme or word sequences at once.