diff --git a/notes/backtracking.md b/notes/backtracking.md
index b50d1f9..26481f1 100644
--- a/notes/backtracking.md
+++ b/notes/backtracking.md
@@ -209,6 +209,49 @@ Main Idea:
6. If the partial solution is complete and valid, record or output it.
7. If all options are exhausted at a level, remove the last component and backtrack to the previous level.
+General Template (pseudocode)
+
+```
+function backtrack(partial):
+ if is_complete(partial):
+ handle_solution(partial)
+ return // or continue if looking for all solutions
+
+ for candidate in generate_candidates(partial):
+ if is_valid(candidate, partial):
+ place(candidate, partial) // extend partial with candidate
+ backtrack(partial)
+ unplace(candidate, partial) // undo extension (backtrack)
+```
+
+Pieces you supply per problem:
+
+* `is_complete`: does `partial` represent a full solution?
+* `handle_solution`: record/output the solution.
+* `generate_candidates`: possible next choices given current partial.
+* `is_valid`: pruning test to reject infeasible choices early.
+* `place` / `unplace`: apply and revert the choice.
+
+Python-ish Generic Framework
+
+```python
+def backtrack(partial, is_complete, generate_candidates, is_valid, handle_solution):
+ if is_complete(partial):
+ handle_solution(partial)
+ return
+
+ for candidate in generate_candidates(partial):
+ if not is_valid(candidate, partial):
+ continue
+ # make move
+ partial.append(candidate)
+ backtrack(partial, is_complete, generate_candidates, is_valid, handle_solution)
+ # undo move
+ partial.pop()
+```
+
+You can wrap those callbacks into a class or closures for stateful problems.
+
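+A minimal usage sketch of the framework above, enumerating all permutations of a small list (the names `items` and `solutions` are ours, not part of the template):
+
+```python
+items = [1, 2, 3]
+solutions = []
+
+def is_complete(partial):
+    return len(partial) == len(items)
+
+def handle_solution(partial):
+    solutions.append(list(partial))      # copy, because `partial` keeps being mutated
+
+def generate_candidates(partial):
+    return [x for x in items if x not in partial]
+
+def is_valid(candidate, partial):
+    return True                          # candidates are already filtered above
+
+backtrack([], is_complete, generate_candidates, is_valid, handle_solution)
+print(solutions)                         # all 6 permutations of [1, 2, 3]
+```
+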
#### N-Queens Problem
The N-Queens problem is a classic puzzle in which the goal is to place $N$ queens on an $N \times N$ chessboard such that no two queens threaten each other. In chess, a queen can move any number of squares along a row, column, or diagonal. Therefore, no two queens can share the same row, column, or diagonal.
diff --git a/notes/basic_concepts.md b/notes/basic_concepts.md
index fd5d9a8..faf0d5f 100644
--- a/notes/basic_concepts.md
+++ b/notes/basic_concepts.md
@@ -10,12 +10,73 @@ Data structures and algorithms are fundamental concepts in computer science that
A **data structure** organizes and stores data in a way that allows efficient access, modification, and processing. The choice of the appropriate data structure depends on the specific use case and can significantly impact the performance of an application. Here are some common data structures:
-1. Imagine an **array** as a row of lockers, each labeled with a number and capable of holding one item of the same type. Technically, arrays are blocks of memory storing elements sequentially, allowing quick access using an index. However, arrays have a fixed size, which limits their flexibility when you need to add or remove items.
-2. Think of a **stack** like stacking plates: you always add new plates on top (push), and remove them from the top as well (pop). This structure follows the Last-In, First-Out (LIFO) approach, meaning the most recently added item is removed first. Stacks are particularly helpful in managing function calls (like in the call stack of a program) or enabling "undo" operations in applications.
-3. A **queue** is similar to a line at the grocery store checkout. People join at the end (enqueue) and leave from the front (dequeue), adhering to the First-In, First-Out (FIFO) principle. This ensures the first person (or item) that arrives is also the first to leave. Queues work great for handling tasks or events in the exact order they occur, like scheduling print jobs or processing messages.
-4. You can picture a **linked list** as a treasure hunt, where each clue leads you to the next one. Each clue, or node, holds data and a pointer directing you to the next node. Because nodes can be added or removed without shifting other elements around, linked lists offer dynamic and flexible management of data at any position.
-5. A **tree** resembles a family tree, starting from one ancestor (the root) and branching out into multiple descendants (nodes), each of which can have their own children. Formally, trees are hierarchical structures organized across various levels. They’re excellent for showing hierarchical relationships, such as organizing files on your computer or visualizing company structures.
-6. Consider a **graph** like a network of cities connected by roads. Each city represents a node, and the roads connecting them are edges, which can either be one-way (directed) or two-way (undirected). Graphs effectively illustrate complex relationships and networks, such as social media connections, website link structures, or even mapping transportation routes.
+**I. Array**
+
+Imagine an **array** as a row of lockers, each labeled with a number and capable of holding one item of the same type. Technically, arrays are blocks of memory storing elements sequentially, allowing quick access using an index. However, arrays have a fixed size, which limits their flexibility when you need to add or remove items.
+
+```
+Indices: 0 1 2 3
+Array: [A] [B] [C] [D]
+```
+
+**II. Stack**
+
+Think of a **stack** like stacking plates: you always add new plates on top (push), and remove them from the top as well (pop). This structure follows the Last-In, First-Out (LIFO) approach, meaning the most recently added item is removed first. Stacks are particularly helpful in managing function calls (like in the call stack of a program) or enabling "undo" operations in applications.
+
+```
+Top
+ ┌───┐
+ │ C │ ← most recent (pop/push here)
+ ├───┤
+ │ B │
+ ├───┤
+ │ A │
+ └───┘
+Bottom
+```
+
+**III. Queue**
+
+A **queue** is similar to a line at the grocery store checkout. People join at the end (enqueue) and leave from the front (dequeue), adhering to the First-In, First-Out (FIFO) principle. This ensures the first person (or item) that arrives is also the first to leave. Queues work great for handling tasks or events in the exact order they occur, like scheduling print jobs or processing messages.
+
+```
+Front → [A] → [B] → [C] → [D] ← Rear
+(dequeue) (enqueue)
+```
+
+**IV. Linked List**
+
+You can picture a **linked list** as a treasure hunt, where each clue leads you to the next one. Each clue, or node, holds data and a pointer directing you to the next node. Because nodes can be added or removed without shifting other elements around, linked lists offer dynamic and flexible management of data at any position.
+
+```
+Head -> [A] -> [B] -> [C] -> NULL
+```
+
+**V. Tree**
+
+A **tree** resembles a family tree, starting from one ancestor (the root) and branching out into multiple descendants (nodes), each of which can have their own children. Formally, trees are hierarchical structures organized across various levels. They’re excellent for showing hierarchical relationships, such as organizing files on your computer or visualizing company structures.
+
+```
+# Tree
+ (Root)
+ / \
+ (L) (R)
+ / \ \
+ (LL) (LR) (RR)
+```
+
+**VI. Graph**
+
+Consider a **graph** like a network of cities connected by roads. Each city represents a node, and the roads connecting them are edges, which can either be one-way (directed) or two-way (undirected). Graphs effectively illustrate complex relationships and networks, such as social media connections, website link structures, or even mapping transportation routes.
+
+```
+(A) ↔ (B)
+ | \
+(C) ---> (D)
+```
+
+(↔ undirected edge, ---> directed edge)
+
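+These structures map naturally onto Python building blocks; a small illustrative sketch (not tied to any particular library):
+
+```python
+from collections import deque
+
+array = ["A", "B", "C", "D"]      # array-like: Python list with index access
+print(array[2])                   # "C"
+
+stack = []                        # stack: push and pop at the same end (LIFO)
+stack.append("A"); stack.append("B"); stack.append("C")
+print(stack.pop())                # "C", the most recently pushed item
+
+queue = deque(["A", "B"])         # queue: enqueue at the rear, dequeue at the front (FIFO)
+queue.append("C")
+print(queue.popleft())            # "A", first in and first out
+
+class Node:                       # singly linked list node: data plus a pointer to the next node
+    def __init__(self, data, next=None):
+        self.data, self.next = data, next
+
+head = Node("A", Node("B", Node("C")))
+print(head.next.data)             # "B"
+
+tree = {"Root": ["L", "R"], "L": ["LL", "LR"], "R": ["RR"]}   # tree as parent -> children
+graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}       # graph as adjacency lists
+```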

@@ -88,10 +149,7 @@ sum = num1 + num2
print("The sum is", sum)
```
-To recap:
-
-- Algorithms are abstract instructions designed to terminate after a finite number of steps.
-- Programs are concrete implementations, which may sometimes run indefinitely or until an external action stops them. For instance, an operating system is a program designed to run continuously until explicitly terminated.
+Programs may sometimes run indefinitely or until an external action stops them. For instance, an operating system is a program designed to run continuously until explicitly terminated.
#### Types of Algorithms
@@ -101,13 +159,14 @@ I. **Sorting Algorithms** arrange data in a specific order, such as ascending or
Example: Bubble Sort
-```
-Initial Array: [5, 3, 8, 4, 2]
+Initial Array: `[5, 3, 8, 4, 2]`
Steps:
+
1. Compare adjacent elements and swap if needed.
2. Repeat for all elements.
+```
After 1st Pass: [3, 5, 4, 2, 8]
After 2nd Pass: [3, 4, 2, 5, 8]
After 3rd Pass: [3, 2, 4, 5, 8]
@@ -118,16 +177,17 @@ II. **Search Algorithms** are designed to find a specific item or value within a
Example: Binary Search
-```
-Searching 33 in Sorted Array: [1, 3, 5, 7, 9, 11, 33, 45, 77, 89]
+Searching for 33 in the sorted array: `[1, 3, 5, 7, 9, 11, 33, 45, 77, 89]`
Steps:
+
1. Start with the middle element.
2. If the middle element is the target, return it.
3. If the target is greater, ignore the left half.
4. If the target is smaller, ignore the right half.
5. Repeat until the target is found or the subarray is empty.
+```
Mid element at start: 9
33 > 9, so discard left half
New mid element: 45
@@ -137,64 +197,98 @@ New mid element: 11
The remaining element is 33, which is the target.
```
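+
+A short iterative sketch of the steps above (our own function name; the list is 0-indexed and already sorted):
+
+```python
+def binary_search(arr, target):
+    lo, hi = 0, len(arr) - 1
+    while lo <= hi:
+        mid = (lo + hi) // 2          # middle element of the current subarray
+        if arr[mid] == target:
+            return mid                # found: return its index
+        elif target > arr[mid]:
+            lo = mid + 1              # discard the left half
+        else:
+            hi = mid - 1              # discard the right half
+    return -1                         # target not present
+
+print(binary_search([1, 3, 5, 7, 9, 11, 33, 45, 77, 89], 33))   # 6
+```
+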
-**Graph Algorithms** address problems related to graphs, such as finding the shortest path between nodes or determining if a graph is connected. Examples include Dijkstra's algorithm and the Floyd-Warshall algorithm.
+III. **Graph Algorithms** address problems related to graphs, such as finding the shortest path between nodes or determining if a graph is connected. Examples include Dijkstra's algorithm and the Floyd-Warshall algorithm.
Example: Dijkstra's Algorithm
-```
Given a graph with weighted edges, find the shortest path from a starting node to all other nodes.
Steps:
+
1. Initialize the starting node with a distance of 0 and all other nodes with infinity.
2. Visit the unvisited node with the smallest known distance.
3. Update the distances of its neighboring nodes.
4. Repeat until all nodes have been visited.
Example Graph:
+
+```
A -> B (1)
A -> C (4)
B -> C (2)
B -> D (5)
C -> D (1)
+```
+
+Trace Table
+
+| Iter | Extracted Node (u) | PQ before extraction | dist[A,B,C,D] | prev[A,B,C,D] | Visited | Comments / Updates |
+| ---- | ------------------ | ---------------------------------- | -------------- | -------------- | --------- | --------------------------------------------------------------------------------------- |
+| 0 | — (initial) | (0, A) | [0, ∞, ∞, ∞] | [-, -, -, -] | {} | Initialization: A=0, others ∞ |
+| 1 | A (0) | (0, A) | [0, 1, 4, ∞] | [-, A, A, -] | {A} | Relax A→B (1), A→C (4); push (1,B), (4,C) |
+| 2 | B (1) | (1, B), (4, C) | [0, 1, 3, 6] | [-, A, B, B] | {A, B} | Relax B→C: alt=3 <4 ⇒ update C; B→D: dist[D]=6; push (3,C), (6,D). (4,C) becomes stale |
+| 3 | C (3) | (3, C), (4, C) stale, (6, D) | [0, 1, 3, 4] | [-, A, B, C] | {A, B, C} | Relax C→D: alt=4 <6 ⇒ update D; push (4,D). (6,D) becomes stale |
+| 4 | D (4) | (4, D), (4, C) stale, (6, D) stale | [0, 1, 3, 4] | [-, A, B, C] | {A,B,C,D} | No outgoing improvements; done |
+
+Legend:
+
+* `dist[X]`: current best known distance from A to X
+* `prev[X]`: predecessor of X on that best path
+* PQ: min-heap of (tentative distance, node); stale entries (superseded by better distance) are shown in parentheses
+* Visited: nodes whose shortest distance is finalized
Starting from A:
+
- Shortest path to B: A -> B (1)
- Shortest path to C: A -> B -> C (3)
- Shortest path to D: A -> B -> C -> D (4)
-```
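+
+A compact `heapq`-based sketch of the procedure traced above (function and variable names are our own; stale heap entries are simply skipped when popped):
+
+```python
+import heapq
+
+def dijkstra(graph, start):
+    # graph: dict mapping node -> list of (neighbor, weight) pairs
+    dist = {node: float("inf") for node in graph}
+    prev = {node: None for node in graph}
+    dist[start] = 0
+    pq = [(0, start)]                 # min-heap of (tentative distance, node)
+    visited = set()
+
+    while pq:
+        d, u = heapq.heappop(pq)
+        if u in visited:              # stale entry: a shorter path was already finalized
+            continue
+        visited.add(u)
+        for v, w in graph[u]:
+            alt = d + w
+            if alt < dist[v]:         # relaxation step
+                dist[v] = alt
+                prev[v] = u
+                heapq.heappush(pq, (alt, v))
+    return dist, prev
+
+graph = {                             # the example graph above (directed, weighted)
+    "A": [("B", 1), ("C", 4)],
+    "B": [("C", 2), ("D", 5)],
+    "C": [("D", 1)],
+    "D": [],
+}
+print(dijkstra(graph, "A")[0])        # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
+```
+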
-**String Algorithms** deal with problems related to strings, such as finding patterns or matching sequences. Examples include the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore algorithm.
+IV. **String Algorithms** deal with problems related to strings, such as finding patterns or matching sequences. Examples include the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore algorithm.
Example: Boyer-Moore Algorithm
```
Text: "ABABDABACDABABCABAB"
Pattern: "ABABCABAB"
+```
Steps:
+
1. Compare the pattern from right to left.
2. If a mismatch occurs, use the bad character and good suffix heuristics to skip alignments.
+3. Repeat until the pattern is found or the text is exhausted.
+
+| Iter | Start | Text window | Mismatch (pattern vs text) | Shift applied | Next Start | Result |
+| ---- | ----- | ----------- | ----------------------------------------- | -------------------------------------------------- | ---------- | --------------- |
+| 1 | 0 | `ABABDABAC` | pattern[8]=B vs text[8]=C | bad char C → last in pattern at idx4 ⇒ 8−4 = **4** | 4 | no match |
+| 2 | 4 | `DABACDABA` | pattern[8]=B vs text[12]=A | bad char A → last at idx7 ⇒ 8−7 = **1** | 5 | no match |
+| 3 | 5 | `ABACDABAB` | pattern[4]=C vs text[9]=D | D not in pattern ⇒ 4−(−1)= **5** | 10 | no match |
+| 4 | 10 | `ABABCABAB` | full right-to-left comparison → **match** | — | — | **found** at 10 |
+
Pattern matched starting at index 10 in the text.
-```
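+
+A sketch of the bad-character heuristic alone (the full Boyer-Moore algorithm also applies the good-suffix rule; the helper name is ours):
+
+```python
+def boyer_moore_bad_char(text, pattern):
+    n, m = len(text), len(pattern)
+    if m == 0:
+        return 0
+    last = {ch: i for i, ch in enumerate(pattern)}   # rightmost index of each pattern char
+    s = 0                                            # current alignment of the pattern
+    while s <= n - m:
+        j = m - 1
+        while j >= 0 and pattern[j] == text[s + j]:  # compare right to left
+            j -= 1
+        if j < 0:
+            return s                                 # full match at shift s
+        s += max(1, j - last.get(text[s + j], -1))   # bad-character shift
+    return -1                                        # no match
+
+print(boyer_moore_bad_char("ABABDABACDABABCABAB", "ABABCABAB"))   # 10
+```
+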
#### Important Algorithms for Software Engineers
-- As a software engineer, it is not necessary to **master every algorithm**. Instead, knowing how to effectively use libraries and packages that implement widely-used algorithms is more practical.
+- As a software engineer, it is not necessary to **master every algorithm**. Instead, knowing how to use libraries and packages that implement widely-used algorithms is more practical.
- The important skill is the ability to **select the right algorithm** for a task by considering factors such as its efficiency, the problem’s requirements, and any specific constraints.
- Learning **algorithms** during the early stages of programming enhances problem-solving skills. It builds a solid foundation in logical thinking, introduces various problem-solving strategies, and helps in understanding how to approach complex issues.
- Once the **fundamentals of algorithms** are understood, the focus often shifts to utilizing pre-built libraries and tools for solving real-world problems, as writing algorithms from scratch is rarely needed in practice.
+Real Life Story:
+
+When Zara landed her first job at a logistics-tech startup, her assignment was to route delivery vans through a sprawling city in under a second—something she’d never tackled before. She remembered the semester she’d wrestled with graph theory and Dijkstra’s algorithm purely for practice, so instead of hand-coding the logic she opened the company’s Python stack and pulled in NetworkX, benchmarking its built-in shortest-path routines against the map’s size and the firm’s latency budget. The initial results were sluggish, so she compared A* with Dijkstra, toggling heuristics until the run time dipped below 500 ms, well under the one-second target. Her teammates were impressed not because she reinvented an algorithm, but because she knew which one to choose, how to reason about its complexity, and where to find a rock-solid library implementation. Later, in a sprint retrospective, Zara admitted that mastering algorithms in college hadn’t been about memorizing code—it had trained her to dissect problems, weigh trade-offs, and plug in the right tool when every millisecond and memory block counted.
+
### Understanding Algorithmic Complexity
Algorithmic complexity helps us understand the computational resources (time or space) an algorithm needs as the input size increases. Here’s a breakdown of different types of complexity:
-* *Best-case complexity* describes how quickly or efficiently an algorithm runs under the most favorable conditions. For example, an algorithm with a best-case complexity of O(1) performs its task instantly, regardless of how much data it processes.
-* *Average-case complexity* reflects the typical performance of an algorithm across all possible inputs. Determining this can be complex, as it involves analyzing how often different inputs occur and how each one influences the algorithm's overall performance.
-* *Worst-case complexity* defines the maximum amount of time or resources an algorithm could consume when faced with the most difficult or demanding inputs. Understanding the worst-case scenario is crucial because it sets an upper limit on performance, ensuring predictable and reliable behavior.
-* *Space complexity* refers to how much memory an algorithm needs relative to the amount of data it processes. It's an important consideration when memory availability is limited or when optimizing an algorithm to be resource-efficient.
-* *Time complexity* indicates how the execution time of an algorithm increases as the input size grows. Typically, this is the primary focus when evaluating algorithm efficiency because faster algorithms are generally more practical and user-friendly.
+* In an ideal input scenario, *best-case complexity* shows the minimum work an algorithm will do; include it to set expectations for quick interactions, omit it and you may overlook fast paths that are useful for user experience, as when insertion sort finishes almost immediately on a nearly sorted list.
+* When you ask what to expect most of the time, *average-case complexity* estimates typical running time; include it to make useful forecasts under normal workloads, omit it and designs can seem fine in tests but lag on common inputs, as with randomly ordered customer IDs that need $O(n \log n)$ sorting.
+* By establishing an upper bound, *worst-case complexity* tells you the maximum time or space an algorithm might need; include it to ensure predictable behavior, omit it and peak loads can surprise you, as when quicksort degrades to $O(n^2)$ on already sorted input without careful pivot selection.
+* On memory-limited devices, *space complexity* measures how much extra storage an algorithm requires; include it to fit within available RAM, omit it and an otherwise fast solution may crash or swap, as when merge sort’s $O(n)$ auxiliary array overwhelms a phone with little free memory.
+* As your dataset scales, *time complexity* describes how running time expands with input size; include it to choose faster approaches, omit it and performance can degrade sharply, as when an $O(n^2)$ deduplication routine turns a minute-long job into hours after a customer list doubles.
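+
+To make the last bullet concrete, a small hypothetical deduplication example contrasting the quadratic and linear approaches:
+
+```python
+def dedup_quadratic(items):
+    unique = []
+    for x in items:                  # n iterations...
+        if x not in unique:          # ...each scanning a list: O(n^2) overall
+            unique.append(x)
+    return unique
+
+def dedup_linear(items):
+    seen, unique = set(), []
+    for x in items:                  # n iterations with O(1) set lookups: O(n) overall
+        if x not in seen:
+            seen.add(x)
+            unique.append(x)
+    return unique
+
+customers = ["ann", "bob", "ann", "cara", "bob"]
+print(dedup_quadratic(customers))    # ['ann', 'bob', 'cara']
+print(dedup_linear(customers))       # ['ann', 'bob', 'cara']
+```
+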
#### Analyzing Algorithm Growth Rates
@@ -208,6 +302,8 @@ If we designate $f(n)$ as the actual complexity and $g(n)$ as the function in Bi
For instance, if an algorithm has a time complexity of $O(n)$, it signifies that the algorithm's running time does not grow more rapidly than a linear function of the input size, in the worst-case scenario.
+
+
##### Big Omega Notation (Ω-notation)
The Big Omega notation provides an asymptotic lower bound that expresses the best-case scenario for the time or space complexity of an algorithm.
@@ -216,13 +312,17 @@ If $f(n) = Ω(g(n))$, this means that $f(n)$ grows at a rate that is at least as
For example, if an algorithm has a time complexity of $Ω(n)$, it implies that the running time is at the bare minimum proportional to the input size in the best-case scenario.
+
+
##### Theta Notation (Θ-notation)
Theta notation offers a representation of the average-case scenario for an algorithm's time or space complexity. It sets an asymptotically tight bound, implying that the function grows neither more rapidly nor slower than the bound.
Stating $f(n) = Θ(g(n))$ signifies that $f(n)$ grows at the same rate as $g(n)$ under average circumstances. This indicates the time or space complexity is both at most and at least a linear function of the input size.
-Remember, these notations primarily address the growth rate as the input size becomes significantly large. While they offer a high-level comprehension of an algorithm's performance, the actual running time in practice can differ based on various factors, such as the specific input data, the hardware or environment where the algorithm is operating, and the precise way the algorithm is implemented in the code.
+
+
+These notations primarily address the growth rate as the input size becomes significantly large. While they offer a high-level comprehension of an algorithm's performance, the actual running time in practice can differ based on various factors, such as the specific input data, the hardware or environment where the algorithm is operating, and the precise way the algorithm is implemented in the code.
#### Diving into Big O Notation Examples
diff --git a/notes/brain_teasers.md b/notes/brain_teasers.md
index 1ee7a4e..4f0596c 100644
--- a/notes/brain_teasers.md
+++ b/notes/brain_teasers.md
@@ -1,21 +1,21 @@
## Solving Programming Brain Teasers
-Programming puzzles and brain teasers are excellent tools for testing and enhancing your coding abilities and problem-solving skills. They are frequently used in technical interviews to evaluate a candidate's logical thinking, analytical prowess, and ability to devise efficient algorithms. To excel in these scenarios, it is recommended to master effective strategies for approaching and solving these problems.
+Programming puzzles and brain teasers are great ways to improve your coding and problem-solving skills. They're commonly used in technical interviews to assess a candidate's logical thinking, analytical ability, and skill in creating efficient solutions. To do well in these situations, it's important to learn and apply effective strategies for solving these problems.
### General Strategies
When tackling programming puzzles, consider the following strategies:
-- Starting with a **simple solution** can help you understand the problem better and identify challenges. This initial approach often highlights areas where optimization is needed later.
-- Writing **unit tests** ensures your solution works for a variety of input scenarios. These tests are invaluable for catching logical errors and handling edge cases, and they allow for safe updates through regression testing.
-- Analyzing the **time and space complexity** of your algorithm helps you measure its efficiency. Aim for the best possible complexity, such as $O(n)$, while avoiding unnecessary memory usage.
-- Choosing the **appropriate data structure** is important for achieving better performance. Knowing when to use structures like arrays, linked lists, stacks, or trees can greatly enhance your solution.
-- **Hash tables** are ideal for problems that require fast lookups, such as counting elements, detecting duplicates, or associating keys with values, as they offer average-case $O(1)$ complexity.
-- Implementing **memoization or dynamic programming** can optimize problems with overlapping subproblems by storing and reusing previously computed results to save time.
-- Breaking a problem into **smaller subproblems** often simplifies the process. Solving these subproblems individually makes it easier to manage and integrate the solutions.
-- Considering both **recursive and iterative approaches** allows flexibility. Recursion can simplify the logic for certain problems, while iteration may be more efficient and avoid stack overflow risks.
-- Paying attention to **edge cases and constraints** helps ensure robustness. Examples include handling empty inputs, very large or very small values, and duplicate data correctly.
-- While optimizing too early can complicate development, **targeted optimization** at the right time focuses on the most resource-intensive parts of the code, improving performance without reducing clarity or maintainability.
+* Starting with a *simple solution* can be helpful in understanding the problem and revealing areas that may need further optimization later on.
+* Writing *unit tests* is useful for ensuring that your solution works correctly across a range of input scenarios, including edge cases.
+* Analyzing *time and space complexity* of your algorithm is important for assessing its efficiency and striving for an optimal *performance*.
+* Choosing the *appropriate data structure*, such as an array or tree, is beneficial for improving the speed and clarity of your solution.
+* Breaking down the problem into *smaller parts* can make the overall task more manageable and easier to solve.
+* Considering both *recursive* and *iterative* approaches gives you flexibility in selecting the method that best suits the problem’s needs.
+* Paying attention to *edge cases* and *constraints* ensures your solution handles unusual or extreme inputs gracefully.
+* *Targeted optimization*, when applied at the right time, can improve performance in specific areas without sacrificing clarity.
### Data Structures
@@ -23,16 +23,27 @@ Understanding and effectively using data structures is fundamental in programmin
#### Working with Arrays
-Arrays are fundamental data structures that store elements in contiguous memory locations, allowing efficient random access. Here are strategies for working with arrays:
-
-- **Sorting** an array can simplify many problems. Algorithms like Quick Sort and Merge Sort are efficient with $O(n \log n)$ time complexity. For nearly sorted or small arrays, **Insertion Sort** may be a better option due to its simplicity and efficiency in those cases.
-- In **sorted arrays**, binary search provides a fast way to find elements or their positions, working in $O(\log n)$. Be cautious with **mid-point calculations** in languages prone to integer overflow due to fixed-size integer types.
-- The **two-pointer technique** uses two indices, often starting from opposite ends of the array, to solve problems involving pairs or triplets, like finding two numbers that add up to a target sum. It helps optimize time and space.
-- The **sliding window technique** is effective for subarray or substring problems, such as finding the longest substring without repeating characters. It keeps a dynamic subset of the array while iterating, improving efficiency.
-- **Prefix sums** enable quick range sum queries after preprocessing the array in $O(n)$. Similarly, **difference arrays** allow efficient range updates without modifying individual elements one by one.
-- **In-place operations** modify the array directly without using extra memory. This approach saves space but requires careful handling to avoid unintended side effects on other parts of the program.
-- When dealing with **duplicates**, it’s important to adjust the algorithm to handle them correctly. For example, in the two-pointer technique, duplicates may need to be skipped to prevent redundant results or errors.
-- **Memory usage** is a important consideration with large arrays, as they can consume significant space. Be mindful of space complexity in constrained environments to prevent excessive memory usage.
+Arrays are basic data structures that store elements in a continuous block of memory, making it easy to access any element quickly. Here are some tips for working with arrays:
+
+* Sorting an array can often simplify many problems, with algorithms like Quick Sort and Merge Sort offering efficient $O(n \log n)$ time complexity. For nearly sorted or small arrays, *Insertion Sort* might be a better option due to its simplicity and efficiency in such cases.
+* In sorted arrays, *binary search* provides a fast way to find elements or their positions, working in $O(\log n)$. Be cautious with mid-point calculations in languages that may experience integer overflow due to fixed-size integer types.
+* The *two-pointer* technique uses two indices, typically starting from opposite ends of the array, to solve problems involving pairs or triplets, like finding two numbers that sum to a target. It helps optimize both time and space efficiency.
+* The *sliding window* technique is effective for solving subarray or substring problems, such as finding the longest substring without repeating characters. It maintains a dynamic subset of the array while iterating, improving overall efficiency.
+* *Prefix sums* enable fast range sum queries after preprocessing the array in $O(n)$. Likewise, difference arrays allow efficient range updates without the need to modify individual elements one by one.
+* In-place operations modify the array directly without using extra memory. This method saves space but requires careful handling to avoid unintended side effects on other parts of the program.
+* When dealing with duplicates, it’s important to adjust the algorithm to handle them appropriately. For example, the two-pointer technique may need to skip duplicates to prevent redundant results or errors.
+* When working with large arrays, it’s important to be mindful of memory usage, as they can consume a lot of space. To optimize, try to minimize the space complexity by using more memory-efficient data structures or algorithms. For instance, instead of storing a full array of values, consider using a *sliding window* or *in-place modifications* to avoid extra memory allocation. Additionally, analyze the space complexity of your solution and check for operations that create large intermediate data structures, which can lead to excessive memory consumption. In constrained environments, tools like memory profiling or checking the space usage of your program (e.g., using Python’s `sys.getsizeof()`) can help you identify areas for improvement.
+* When using dynamic arrays, it’s helpful to allow automatic resizing, which lets the array expand or shrink based on the data size. This avoids the need for manual memory management and improves flexibility.
+* Resizing arrays frequently can be costly in terms of time complexity. A more efficient approach is to resize the array exponentially, such as doubling its size, rather than resizing it by a fixed amount each time.
+* To avoid unnecessary memory usage, it's important to pass arrays by reference (or using pointers in some languages) when possible, instead of copying the entire array for each function call.
+* For arrays with many zero or null values, using sparse arrays or hash maps can be useful. This allows you to store only non-zero values, saving memory when dealing with large arrays that contain mostly empty data.
+* When dealing with multi-dimensional arrays, flattening them into a one-dimensional array can make it easier to perform operations, but be aware that this can temporarily increase memory usage.
+* To improve performance, accessing memory in contiguous blocks is important. Random access patterns may lead to cache misses, which can slow down operations, so try to access array elements sequentially when possible.
+* The `bisect` module helps maintain sorted order in a list by finding the appropriate index for inserting an element or by performing binary searches.
+* Use `bisect.insort()` to insert elements into a sorted list while keeping it ordered.
+* Use `bisect.bisect_left()` or `bisect.bisect_right()` to find the index where an element should be inserted.
+* Don’t use `bisect` on unsorted lists or when frequent updates are needed, as maintaining sorted order can be inefficient.
+* Binary search operations like `bisect_left()` are `O(log n)`, but `insort()` can be `O(n)` due to shifting elements.
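+
+A quick sketch of the `bisect` calls mentioned above (the list must already be sorted):
+
+```python
+import bisect
+
+scores = [10, 20, 20, 30]                 # already sorted
+
+bisect.insort(scores, 25)                 # insert while keeping the list ordered
+print(scores)                             # [10, 20, 20, 25, 30]
+
+print(bisect.bisect_left(scores, 20))     # 1 -> first index where 20 could be inserted
+print(bisect.bisect_right(scores, 20))    # 3 -> index just past the existing 20s
+```
+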
#### Working with Strings
diff --git a/notes/dynamic_programming.md b/notes/dynamic_programming.md
index da180c0..9233e1c 100644
--- a/notes/dynamic_programming.md
+++ b/notes/dynamic_programming.md
@@ -2,9 +2,9 @@
Dynamic Programming (DP) is a way to solve complex problems by breaking them into smaller, easier problems. Instead of solving the same small problems again and again, DP **stores their solutions** in a structure like an array, table, or map. This avoids wasting time on repeated calculations and makes the process much faster and more efficient.
-DP works best for problems that have two key features. The first is **optimal substructure**, which means you can build the solution to a big problem from the solutions to smaller problems. The second is **overlapping subproblems**, where the same smaller problems show up multiple times during the process. By focusing on these features, DP ensures that each part of the problem is solved only once.
+DP works best for problems that have two features. The first is **optimal substructure**, which means you can build the solution to a big problem from the solutions to smaller problems. The second is **overlapping subproblems**, where the same smaller problems show up multiple times during the process. By focusing on these features, DP ensures that each part of the problem is solved only once.
-This method was introduced by Richard Bellman in the 1950s and has become a valuable tool in areas like computer science, economics, and operations research. It has been used to solve problems that would otherwise take too long by turning slow, exponential-time algorithms into much faster polynomial-time solutions. DP is practical and powerful for tackling real-world optimization challenges.
+This method was introduced by Richard Bellman in the 1950s and has become a valuable tool in areas like computer science, economics, and operations research. It has been used to solve problems that would otherwise take too long by turning slow, exponential-time algorithms into much faster polynomial-time solutions. DP is used in practice for tackling real-world optimization challenges.
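+
+As a minimal illustration of storing subproblem results, a standard memoized Fibonacci sketch (not tied to any specific example in these notes):
+
+```python
+from functools import lru_cache
+
+@lru_cache(maxsize=None)             # cache stores each fib(k) the first time it is computed
+def fib(n):
+    if n < 2:
+        return n
+    return fib(n - 1) + fib(n - 2)   # overlapping subproblems are reused, not recomputed
+
+print(fib(40))                       # 102334155, in linear rather than exponential time
+```
+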
### Principles
diff --git a/notes/graphs.md b/notes/graphs.md
index 1b1aa5f..4ef4fef 100644
--- a/notes/graphs.md
+++ b/notes/graphs.md
@@ -56,21 +56,29 @@ Graph theory has its own language, full of terms that make it easier to talk abo
### Representation of Graphs in Computer Memory
-Graphs, with their versatile applications in numerous domains, necessitate efficient storage and manipulation mechanisms in computer memory. The choice of representation often depends on the graph's characteristics, such as sparsity, and the specific operations to be performed. Among the various methods available, the adjacency matrix and the adjacency list are the most prevalent.
+Graphs, with their versatile applications in numerous domains, necessitate efficient storage and manipulation mechanisms in computer memory. The choice of representation often depends on the graph's characteristics (e.g., dense vs. sparse, directed vs. undirected, weighted vs. unweighted) and the specific operations to be performed. Among the various methods available, the adjacency matrix and the adjacency list are the most prevalent.
#### Adjacency Matrix
-An adjacency matrix represents a graph $G$ as a two-dimensional matrix. Given $V$ vertices, it utilizes a $V \times V$ matrix $A$. The rows and columns correspond to the graph's vertices, and each cell $A_{ij}$ holds:
+An adjacency matrix represents a graph $G$ with $V$ vertices as a two-dimensional matrix $A$ of size $V \times V$. The rows and columns correspond to vertices, and each cell $A_{ij}$ holds:
-- `1` if there is an edge between vertex $i$ and vertex $j$
-- `0` if no such edge exists
+* `1` if there is an edge between vertex $i$ and vertex $j$ (or specifically $i \to j$ in a directed graph)
+* `0` if no such edge exists
+* For weighted graphs, $A_{ij}$ contains the **weight** of the edge; often `0` or `∞` (or `None`) indicates “no edge”
-For graphs with edge weights, $A_{ij}$ contains the weight of the edge between vertices $i$ and $j$.
+**Same graph used throughout (undirected 4-cycle A–B–C–D–A):**
-Example:
+```
+ (A)------(B)
+ | |
+ | |
+ (D)------(C)
+```
+
+**Matrix (table form):**
+
| | A | B | C | D |
-|---|---|---|---|---|
+| - | - | - | - | - |
| A | 0 | 1 | 0 | 1 |
| B | 1 | 0 | 1 | 0 |
| C | 0 | 1 | 0 | 1 |
@@ -78,21 +86,56 @@ Example:
Here, the matrix indicates a graph with vertices A to D. For instance, vertex A connects with vertices B and D, hence the respective 1s in the matrix.
-**Benefits**:
+**Matrix (ASCII view):**
+
+```
+4x4
+ Columns →
+ A B C D
+ +---+---+---+---+
+Row A | 0 | 1 | 0 | 1 |
+↓ B | 1 | 0 | 1 | 0 |
+ C | 0 | 1 | 0 | 1 |
+ D | 1 | 0 | 1 | 0 |
+ +---+---+---+---+
+```
+
+**Notes & Variants**
-- Fixed-time ( $O(1)$ ) edge existence checks.
-- Particularly suitable for dense graphs, where the edge-to-vertex ratio is high.
+* When an *undirected graph* is represented, the adjacency matrix is symmetric because the connection from node $i$ to node $j$ also implies a connection from node $j$ to node $i$; if this property is omitted, the matrix will misrepresent mutual relationships, such as a road existing in both directions between two cities.
+* In the case of a *directed graph*, the adjacency matrix does not need to be symmetric since an edge from node $i$ to node $j$ does not guarantee a reverse edge; without this rule, one might incorrectly assume bidirectional links, such as mistakenly treating a one-way street as two-way.
+* A *self-loop* appears as a nonzero entry on the diagonal of the adjacency matrix, indicating that a node is connected to itself; if ignored, the representation will overlook scenarios like a website containing a hyperlink to its own homepage.
-**Drawbacks**:
+**Benefits**
-- Consumes more space for sparse graphs.
-- Traversing neighbors can be slower due to the need to check all vertices.
+* An *edge existence check* in an adjacency matrix takes constant time $O(1)$ because the presence of an edge is determined by directly inspecting a single cell; if this property is absent, the lookup could require scanning a list, as in adjacency list representations where finding whether two cities are directly connected may take longer.
+* With *simple, compact indexing*, the adjacency matrix aligns well with array-based structures, which makes it helpful for GPU optimizations or bitset operations; without this feature, algorithms relying on linear algebra techniques, such as computing paths with matrix multiplication, become less efficient.
+
+**Drawbacks**
+
+* The *space* requirement of an adjacency matrix is always $O(V^2)$, meaning memory usage grows with the square of the number of vertices even if only a few edges exist; if this property is overlooked, sparse networks such as social graphs with millions of users but relatively few connections will be stored inefficiently.
+* For *neighbor iteration*, each vertex requires $O(V)$ time because the entire row of the matrix must be scanned to identify adjacent nodes; without recognizing this cost, tasks like finding all friends of a single user in a large social network could become unnecessarily slow.
+
+**Common Operations (Adjacency Matrix)**
+
+| Operation | Time |
+| ---------------------------------- | -------- |
+| Check if edge $u\leftrightarrow v$ | $O(1)$ |
+| Add/remove edge | $O(1)$ |
+| Iterate neighbors of $u$ | $O(V)$ |
+| Compute degree of $u$ (undirected) | $O(V)$ |
+| Traverse all edges | $O(V^2)$ |
+
+**Space Tips**
+
+* Using a *boolean or bitset matrix* allows each adjacency entry to be stored in just one bit, which reduces memory consumption by a factor of eight compared to storing each entry as a byte; if this method is not applied, representing even moderately sized graphs, such as a network of 10,000 nodes, can require far more storage than necessary.
+* The approach is most useful when the graph is *dense*, the number of vertices is relatively small, or constant-time edge queries are the primary operation; without these conditions, such as in a sparse graph with millions of vertices, the $V^2$ bit requirement remains wasteful and alternative representations like adjacency lists become more beneficial.
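+
+A small sketch building the matrix above in Python (our own helper names; a nested list stands in for a real bitset):
+
+```python
+vertices = ["A", "B", "C", "D"]
+index = {v: i for i, v in enumerate(vertices)}
+edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]   # the 4-cycle A-B-C-D-A
+
+n = len(vertices)
+matrix = [[0] * n for _ in range(n)]
+for u, v in edges:
+    matrix[index[u]][index[v]] = 1
+    matrix[index[v]][index[u]] = 1        # symmetric, because the graph is undirected
+
+print(matrix[index["A"]][index["B"]])     # 1 -> edge exists, O(1) check
+print(matrix[index["A"]][index["C"]])     # 0 -> no edge
+print([vertices[j] for j in range(n) if matrix[index["A"]][j]])   # ['B', 'D'], an O(V) row scan
+```
+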
#### Adjacency List
-An adjacency list uses a collection (often an array or a linked list) to catalog the neighbors of each vertex. Each vertex points to its own list, enumerating its direct neighbors.
+An adjacency list stores, for each vertex, the list of its neighbors. It’s usually implemented as an array/vector of lists (or vectors), hash sets, or linked structures. For weighted graphs, each neighbor entry also stores the weight.
-Example:
+**Same graph (A–B–C–D–A) as lists:**
```
A -> [B, D]
@@ -101,100 +144,183 @@ C -> [B, D]
D -> [A, C]
```
-This list reflects the same graph as our matrix example. Vertex A's neighbors, for instance, are B and D.
+**“In-memory” view (array of heads + per-vertex chains):**
-**Benefits**:
+```
+Vertices (index) → 0 1 2 3
+Names [ A ] [ B ] [ C ] [ D ]
+ | | | |
+ v v v v
+A-list: head -> [B] -> [D] -> NULL
+B-list: head -> [A] -> [C] -> NULL
+C-list: head -> [B] -> [D] -> NULL
+D-list: head -> [A] -> [C] -> NULL
+```
+
+**Variants & Notes**
+
+* In an *undirected graph* stored as adjacency lists, each edge is represented twice—once in the list of each endpoint—so that both directions can be traversed easily; if this duplication is omitted, traversing from one node to its neighbor may be possible in one direction but not in the other, as with a friendship relation that should be mutual but is stored only once.
+* For a *directed graph*, only out-neighbors are recorded in each vertex’s list, meaning that edges can be followed in their given direction; without a separate structure for in-neighbors, tasks like finding all users who link to a webpage require inefficient scanning of every adjacency list.
+* In a *weighted graph*, each adjacency list entry stores both the neighbor and the associated weight, such as $(\text{destination}, \text{distance})$; if weights are not included, algorithms like Dijkstra’s shortest path cannot be applied correctly.
+* The *order of neighbors* in adjacency lists may be arbitrary, though keeping them sorted allows faster checks for membership; if left unsorted, testing whether two people are directly connected in a social network could require scanning the entire list rather than performing a quicker search.
+
+**Benefits**
-- Space-efficient for sparse graphs, where edges are relatively fewer.
-- Facilitates faster traversal of a vertex's neighbors since the direct neighbors are listed without extraneous checks.
+* The representation is *space-efficient for sparse graphs* because it requires $O(V+E)$ storage, growing only with the number of vertices and edges; without this property, a graph with millions of vertices but relatively few edges, such as a road network, would consume far more memory if stored as a dense matrix.
+* For *neighbor iteration*, the time cost is $O(\deg(u))$, since only the actual neighbors of vertex $u$ are examined; if this benefit is absent, each query would need to scan through all possible vertices, as happens in adjacency matrices when identifying a node’s connections.
+* In *edge traversals and searches*, adjacency lists support breadth-first search and depth-first search efficiently on sparse graphs because only existing edges are processed; without this design, traversals would involve wasted checks on non-edges, making exploration of large but sparsely connected networks, like airline routes, much slower.
-**Drawbacks**:
+**Drawbacks**
-- Edge existence checks can take up to $O(V)$ time in the worst case.
-- Potentially consumes more space for dense graphs.
+* An *edge existence check* in adjacency lists requires $O(\deg(u))$ time in the worst case because the entire neighbor list may need to be scanned; if a hash set is used for each vertex, the expected time improves to $O(1)$, though at the cost of extra memory and overhead, as seen in fast membership tests within large social networks.
+* With respect to *cache locality*, adjacency lists often rely on pointers or scattered memory, which reduces their efficiency on modern hardware; without this drawback, as in dense matrix storage, sequential memory access patterns make repeated operations such as matrix multiplication more beneficial.
+
+**Common Operations (Adjacency List)**
+
+| Operation | Time (typical) |
+| ---------------------------------- | ------------------------------------------- |
+| Check if edge $u\leftrightarrow v$ | $O(\deg(u))$ (or expected $O(1)$ with hash) |
+| Add edge | Amortized $O(1)$ (append to list(s)) |
+| Remove edge | $O(\deg(u))$ (find & delete) |
+| Iterate neighbors of $u$ | $O(\deg(u))$ |
+| Traverse all edges | $O(V + E)$ |
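+
+A matching sketch of the adjacency-list form for the same 4-cycle (our own helper names):
+
+```python
+from collections import defaultdict
+
+adj = defaultdict(list)
+edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
+for u, v in edges:
+    adj[u].append(v)                # each undirected edge is stored twice,
+    adj[v].append(u)                # once in the list of each endpoint
+
+print(adj["A"])                     # ['B', 'D'] -> O(deg(u)) neighbor iteration
+print("C" in adj["A"])              # False -> edge check scans the list
+
+weighted = {"A": [("B", 1.0), ("D", 2.5)]}   # weighted variant: (neighbor, weight) pairs
+```
+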
The choice between these (and other) representations often depends on the graph's characteristics and the specific tasks or operations envisioned.
+* Choosing an *adjacency matrix* is helpful when the graph is dense, the number of vertices is moderate, and constant-time edge queries or linear-algebra formulations are beneficial; if this choice is ignored, operations such as repeatedly checking flight connections in a fully connected air network may become slower or harder to express mathematically.
+* Opting for an *adjacency list* is useful when the graph is sparse or when neighbor traversal dominates, as in breadth-first search or shortest-path algorithms; without this structure, exploring a large but lightly connected road network would waste time scanning nonexistent edges.
+
+**Hybrids/Alternatives:**
+
+* With *CSR/CSC (Compressed Sparse Row/Column)* formats, all neighbors of a vertex are stored contiguously in memory, which improves cache locality and enables fast traversals; without this layout, as in basic pointer-based adjacency lists, high-performance analytics on graphs like web link networks would suffer from slower memory access.
+* An *edge list* stores edges simply as $(u,v)$ pairs, making it convenient for graph input, output, and algorithms like Kruskal’s minimum spanning tree; if used for queries such as checking whether two nodes are adjacent, the lack of structure forces scanning the entire list, which becomes inefficient in large graphs.
+* In *hash-based adjacency* structures, each vertex’s neighbor set is managed as a hash table, enabling expected $O(1)$ membership tests; without this tradeoff, checking connections in dense social networks requires linear scans, while the hash-based design accelerates lookups at the cost of extra memory.
+
### Planarity
-Planarity examines whether a graph can be drawn on a flat surface (a plane) without any of its edges crossing. This idea holds significant importance in areas such as circuit design, urban planning, and geography.
+**Planarity** asks: can a graph be drawn on a flat plane so that edges only meet at their endpoints (no crossings)?
-#### What is a Planar Graph?
+Why it matters: layouts of circuits, road networks, maps, and data visualizations often rely on planar drawings.
-A graph is considered **planar** if there exists a representation (also called a drawing) of it on a two-dimensional plane where its edges intersect only at their vertices and nowhere else. Even if a graph is initially drawn with overlaps or crossings, it may still be planar if it is possible to **redraw** (or **rearrange**) it so that no edges intersect in the interior of the drawing.
+#### What is a planar graph?
-An important theoretical result related to planarity is **Kuratowski’s Theorem**, which states that a graph is planar if and only if it does not contain a subgraph that is a subdivision of either $K_5$ (the complete graph on five vertices) or $K_{3,3}$ (the complete bipartite graph on six vertices, partitioned into sets of three).
+A graph is **planar** if it has **some** drawing in the plane with **no edge crossings**. A messy drawing with crossings doesn’t disqualify it—if you can **redraw** it without crossings, it’s planar.
-#### Planar Embedding
+* A crossing-free drawing of a planar graph is called a **planar embedding** (or **plane graph** once embedded).
+* In a planar embedding, the plane is divided into **faces** (regions), including the unbounded **outer face**.
-A **planar embedding** refers to a specific way of drawing a graph on a plane so that none of its edges cross each other in the interior. If such a crossing-free drawing exists, the graph is planar. A related fact is **Euler’s Formula** for planar graphs:
+**Euler’s Formula (connected planar graphs):**
-$$|V| - |E| + |F| = 2$$
+$$
+|V| - |E| + |F| = 2
+$$
+
+More generally, a planar graph with $c$ connected components satisfies $|V| - |E| + |F| = 1 + c$.
-where:
+#### Kuratowski’s & Wagner’s characterizations
+
+* According to *Kuratowski’s Theorem*, a graph is planar if and only if it does not contain a subgraph that is a subdivision of $K_5$ or $K_{3,3}$; if this condition is not respected, as in a network with five nodes all mutually connected, the graph cannot be drawn on a plane without edge crossings.
+* By *Wagner’s Theorem*, a graph is planar if and only if it has no $K_5$ or $K_{3,3}$ minor, meaning such structures cannot be formed through edge deletions, vertex deletions, or edge contractions; without ruling out these minors, a graph like the complete bipartite structure of three stations each linked to three others cannot be embedded in the plane without overlaps.
+
+These are equivalent “forbidden pattern” views.
+
+#### Handy planar edge bounds (quick tests)
-- $|V|$ is the number of vertices,
-- $|E|$ is the number of edges,
-- $|F|$ is the number of faces (including the "outer" infinite face).
+For a **simple** planar graph with $|V|\ge 3$:
+
+* $|E| \le 3|V| - 6$.
+* If the graph is **bipartite**, then $|E| \le 2|V| - 4$.
+
+These give fast non-planarity proofs:
+
+* $K_5$: $|V|=5, |E|=10 > 3\cdot5-6=9$ ⇒ **non-planar**.
+* $K_{3,3}$: $|V|=6, |E|=9 > 2\cdot6-4=8$ ⇒ **non-planar**.
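+
+These bounds translate into a quick necessary-condition check (our own helper; failing it proves non-planarity, but passing it does not prove planarity):
+
+```python
+def may_be_planar(num_vertices, num_edges, bipartite=False):
+    # Edge-count test for simple graphs; only a necessary condition for planarity.
+    if num_vertices < 3:
+        return True
+    bound = 2 * num_vertices - 4 if bipartite else 3 * num_vertices - 6
+    return num_edges <= bound
+
+print(may_be_planar(5, 10))                   # False -> K5 cannot be planar
+print(may_be_planar(6, 9, bipartite=True))    # False -> K(3,3) cannot be planar
+print(may_be_planar(4, 6))                    # True  -> K4 passes (and is indeed planar)
+```
+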
#### Examples
-I. **Cycle Graphs**
+**I. Cycle graphs $C_n$ (always planar)**
-Simple cycle graphs (triangles, squares, pentagons, hexagons, etc.) are planar because you can easily draw them without any edges crossing. In the square cycle graph $C_4$ example below, there are no intersecting edges:
+A 4-cycle $C_4$:
```
-A-----B
-| |
-C-----D
+A───B
+│ │
+D───C
```
-II. **Complete Graph with Four Vertices ($K_4$)**
+No crossings; faces: 2 (inside + outside).
+
+**II. Complete graph on four vertices $K_4$ (planar)**
-This graph has every vertex connected to every other vertex. Despite having 6 edges, $K_4$ is planar. Its planar drawing can resemble a tetrahedron (triangular pyramid) flattened onto a plane:
+A planar embedding places one vertex inside a triangle:
```
- A
- / \
- B---C
- \ /
- D
+ A
+ / \
+ B───C
+ \ /
+ D
```
-III. **Complete Graph with Five Vertices ($K_5$)**
+All edges meet only at vertices; no crossings.
+
+**III. Complete graph on five vertices $K_5$ (non-planar)**
-$K_5$ has every one of its five vertices connected to the other four, making a total of 10 edges. This graph is **non-planar**: no matter how you try to arrange the vertices and edges, there will always be at least one pair of edges that must cross. A rough sketch illustrating its inherent crossing is shown below:
+No drawing avoids crossings. Even a “best effort” forces at least one:
```
- A
- /|\
- / | \
-B--+--C
- \ | /
- \|/
- D
- |
+A───B
+│╲ ╱│
+│ ╳ │ (some crossing is unavoidable)
+│╱ ╲│
+D───C
+ \ /
E
```
-Attempting to avoid one crossing in $K_5$ inevitably forces another crossing elsewhere, confirming its non-planarity.
+The edge bound $10>9$ (above) certifies non-planarity.
+
+**IV. Complete bipartite $K_{3,3}$ (non-planar)**
+
+Two sets $\{u_1,u_2,u_3\}$ and $\{v_1,v_2,v_3\}$, all cross-set pairs connected:
+
+```
+u1 u2 u3
+│ \ │ \ │ \
+│ \ │ \ │ \
+v1───v2───v3 (many edges must cross in the plane)
+```
+
+The bipartite bound $9>8$ proves non-planarity.
+
+#### How to check planarity in practice
-#### Strategies for Assessing Planarity
+**For small graphs**
-- The **planarity** of a graph refers to whether it can be drawn on a flat surface without any edges crossing each other.
-- **Small graphs** can be tested for planarity by manually rearranging their vertices and edges to check if a crossing-free drawing is possible.
-- **Kuratowski's theorem** states that a graph is planar if it does not contain a subgraph that can be transformed into $K_5$ (a graph with five vertices all connected to each other) or $K_{3,3}$ (a graph with two groups of three vertices, where every vertex in one group connects to every vertex in the other).
-- **$K_5$** is a complete graph with five vertices where every pair of vertices has a direct edge connecting them.
-- **$K_{3,3}$** is a bipartite graph where two sets of three vertices are connected such that each vertex in the first set is connected to all vertices in the second set, with no edges within the same set.
-- **Wagner’s theorem** provides an alternative way to determine planarity, stating that a graph is planar if it does not have $K_5$ or $K_{3,3}$ as a "minor." A minor is a smaller graph formed by deleting edges, deleting vertices, or merging connected vertices.
-- For **larger graphs**, manual testing becomes impractical, and planarity algorithms are often used instead.
-- The **Hopcroft-Tarjan algorithm** is a linear-time method for testing planarity. It uses depth-first search to efficiently decide if a graph can be drawn without crossing edges.
-- The **Boyer-Myrvold algorithm** is another linear-time approach that tests planarity and can also provide an embedding of the graph (a specific way to draw it without crossings) if it is planar.
-- Both **algorithms** are widely used in computer science for applications that involve networks, circuit design, and data visualization, where planarity helps simplify complex structures.
+1. Rearrange vertices and try to remove crossings.
+2. Look for $K_5$ / $K_{3,3}$ (or their subdivisions/minors).
+3. Apply the edge bounds above for quick eliminations.
+
+**For large graphs (efficient algorithms)**
+
+* The *Hopcroft–Tarjan* algorithm uses a depth-first search approach to decide planarity in linear time; without such an efficient method, testing whether a circuit layout can be drawn without wire crossings would take longer on large graphs.
+* The *Boyer–Myrvold* algorithm also runs in linear time but, in addition to deciding planarity, it produces a planar embedding when one exists; if this feature is absent, as in Hopcroft–Tarjan, a separate procedure would be required to actually construct a drawing of a planar transportation network.
+
+Both are widely used in graph drawing, EDA (circuit layout), GIS, and network visualization.
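+
+In code, a linear-time test is usually taken off the shelf; a small sketch assuming NetworkX's `check_planarity` helper (available in recent NetworkX releases):
+
+```python
+import networkx as nx
+
+is_planar, _ = nx.check_planarity(nx.complete_graph(5))        # K5
+print(is_planar)                                               # False
+
+is_planar, embedding = nx.check_planarity(nx.cycle_graph(4))   # C4
+print(is_planar)                                               # True; `embedding` is a planar embedding
+```
+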
### Traversals
-- When we **traverse** a graph, we visit its vertices in an organized way to make sure we don’t miss any vertices or edges.
+What does it mean to traverse a graph?
+
+Graph traversal **can** be done in a way that visits *all* vertices and edges (like a full DFS/BFS), but it doesn’t *have to*.
+
+* If you start DFS or BFS from a single source vertex, you’ll only reach the **connected component** containing that vertex. Any vertices in other components won’t be visited.
+* Some algorithms (like shortest path searches, A*, or even partial DFS) intentionally stop early, meaning not all vertices or edges are visited.
+* In weighted or directed graphs, you may also skip certain edges depending on the traversal rules.
+
+So a precise answer to that question is:
+
+> **Graph traversal is a systematic way of exploring vertices and edges, often ensuring complete coverage of the reachable part of the graph — but whether all vertices/edges are visited depends on the algorithm and stopping conditions.**
+
- Graphs, unlike **trees**, don’t have a single starting point like a root. This means we either need to be given a starting vertex or pick one randomly.
- Let’s say we start from a specific vertex, like **$i$**. From there, the traversal explores all connected vertices according to the rules of the chosen method.
- In both **breadth-first search (BFS)** and **depth-first search (DFS)**, the order of visiting vertices depends on how the algorithm is implemented.
@@ -203,525 +329,816 @@ Attempting to avoid one crossing in $K_5$ inevitably forces another crossing els
#### Breadth-First Search (BFS)
-Breadth-First Search (BFS) is a fundamental graph traversal algorithm that explores the vertices of a graph in layers, starting from a specified source vertex. It progresses by visiting all immediate neighbors of the starting point, then the neighbors of those neighbors, and so on.
+Breadth-First Search (BFS) is a fundamental graph traversal algorithm that explores a graph **level by level** from a specified start vertex. It first visits all vertices at distance 1 from the start, then all vertices at distance 2, and so on. This makes BFS the natural choice whenever “closest in number of edges” matters.
To efficiently keep track of the traversal, BFS employs two primary data structures:
-* A queue, typically named `unexplored` or `queue`, to store nodes that are pending exploration.
-* A hash table or a set called `visited` to ensure that we do not revisit nodes.
+* A **queue** (often named `queue` or `unexplored`) that stores vertices pending exploration in **first-in, first-out (FIFO)** order.
+* A **`visited` set** (or boolean array) that records which vertices have already been discovered to prevent revisiting.
-##### Algorithm Steps
+*Useful additions in practice:*
-1. Begin from a starting vertex, $i$.
-2. Mark the vertex $i$ as visited.
-3. Explore each of its neighbors. If the neighbor hasn't been visited yet, mark it as visited and enqueue it in `unexplored`.
-4. Dequeue the front vertex from `unexplored` and repeat step 3.
-5. Continue this process until the `unexplored` queue becomes empty.
+* An optional **`parent` map** to reconstruct shortest paths (store `parent[child] = current` when you first discover `child`).
+* An optional **`dist` map** to record the edge-distance from the start (`dist[start] = 0`, and when discovering `v` from `u`, set `dist[v] = dist[u] + 1`).
-To ensure the algorithm doesn't fall into an infinite loop due to cycles in the graph, it could be useful to mark nodes as visited as soon as they are enqueued. This prevents them from being added to the queue multiple times.
+**Algorithm Steps**
-##### Example
+1. Pick a start vertex $i$.
+2. Set `visited = {i}`, `parent[i] = None`, optionally `dist[i] = 0`, and enqueue $i$ into `queue`.
+3. While `queue` is nonempty, repeat steps 4–5.
+4. Dequeue the front vertex `u`.
+5. For each neighbor `v` of `u`, if `v` is not in `visited`, add it to `visited`, set `parent[v] = u` (and `dist[v] = dist[u] + 1` if tracking), and enqueue `v`.
+6. Stop when the queue is empty.
-```
-Queue: Empty Visited: A, B, C, D, E
+Marking nodes as **visited at the moment they are enqueued** (not when dequeued) is crucial: it prevents the same node from being enqueued multiple times in graphs with cycles or multiple incoming edges.
- A
- / \
- B C
- | |
- D E
-```
+*Reference pseudocode (adjacency-list graph):*
-In this example, BFS started at the top of the graph and worked its way down, visiting nodes in order of their distance from the starting node. The ASCII representation provides a step-by-step visualization of BFS using a queue and a list of visited nodes.
+```
+BFS(G, i):
+ visited = {i}
+ parent = {i: None}
+ dist = {i: 0} # optional
+ queue = [i]
-##### Applications
+ order = [] # optional: visitation order
-BFS is not only used for simple graph traversal. Its applications span multiple domains:
+ while queue:
+ u = queue.pop(0) # dequeue
+ order.append(u)
-1. BFS can determine the **shortest path** in an unweighted graph from a source to all other nodes.
-2. To find all **connected components** in an undirected graph, you can run BFS on every unvisited node.
-3. BFS mirrors the propagation in broadcasting networks, where a message is forwarded to neighboring nodes, and they subsequently forward it to their neighbors.
-4. If during BFS traversal, an already visited node is encountered (and it's not the parent of the current node in traversal), then there exists a cycle in the graph.
+ for v in G[u]: # iterate neighbors
+ if v not in visited:
+ visited.add(v)
+ parent[v] = u
+ dist[v] = dist[u] + 1 # if tracking
+ queue.append(v)
-##### Implementation
+ return order, parent, dist
+```
-* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/bfs)
-* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/bfs)
+*Sanity notes:*
-#### Depth-First Search (DFS)
+* The *time* complexity of breadth-first search is $O(V+E)$ because each vertex is enqueued once and each edge is examined once; if this property is overlooked, one might incorrectly assume that exploring a large social graph requires quadratic time rather than scaling efficiently with its size.
+* The *space* requirement is $O(V)$ since the algorithm maintains a queue and a visited array, with optional parent or distance arrays if needed; without accounting for this, applying BFS to a network of millions of nodes could be underestimated in memory cost.
+* The order in which BFS visits vertices depends on the *neighbor iteration order*, meaning that traversal results can vary between implementations; if this variation is not recognized, two runs on the same graph—such as exploring a road map—may appear inconsistent even though both are correct BFS traversals.
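+
+The pseudocode above uses `queue.pop(0)`, which is linear-time on a Python list; a runnable sketch of the same procedure (assuming the graph is a plain dict of adjacency lists) keeps the $O(V+E)$ bound by using `collections.deque` for O(1) dequeues:
+
+```python
+from collections import deque
+
+def bfs(graph, start):
+    """graph: dict mapping each vertex to an iterable of neighbors."""
+    visited = {start}
+    parent = {start: None}
+    dist = {start: 0}
+    order = []
+    queue = deque([start])
+
+    while queue:
+        u = queue.popleft()              # O(1) dequeue
+        order.append(u)
+        for v in graph[u]:
+            if v not in visited:         # mark on enqueue, not on dequeue
+                visited.add(v)
+                parent[v] = u
+                dist[v] = dist[u] + 1
+                queue.append(v)
+    return order, parent, dist
+
+# The example graph used below: A–B, A–C, B–D, C–E
+G = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"], "D": ["B"], "E": ["C"]}
+print(bfs(G, "A")[0])                    # ['A', 'B', 'C', 'D', 'E']
+```
+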
-Depth-First Search (DFS) is another fundamental graph traversal algorithm, but unlike BFS which traverses level by level, DFS dives deep into the graph, exploring as far as possible along each branch before backtracking.
+**Example**
-To implement DFS, we use two main data structures:
+Graph (undirected) with start at **A**:
-* A stack, either implicitly using the call stack through recursion or explicitly using a data structure. This stack is responsible for tracking vertices that are to be explored.
-* A hash table or set called `visited` to ensure nodes aren't revisited.
+```
+ ┌─────┐
+ │ A │
+ └──┬──┘
+ ┌───┘ └───┐
+ ┌─▼─┐ ┌─▼─┐
+ │ B │ │ C │
+ └─┬─┘ └─┬─┘
+ ┌─▼─┐ ┌─▼─┐
+ │ D │ │ E │
+ └───┘ └───┘
-##### Algorithm Steps
+Edges: A–B, A–C, B–D, C–E
+```
-1. Begin from a starting vertex, $i$.
-2. Mark vertex $i$ as visited.
-3. Visit an unvisited neighbor of $i$, mark it as visited, and move to that vertex.
-4. Repeat the above step until the current vertex has no unvisited neighbors.
-5. Backtrack to the previous vertex and explore other unvisited neighbors.
-6. Continue this process until you've visited all vertices connected to the initial start vertex.
+*Queue/Visited evolution (front → back):*
-Marking nodes as visited as soon as you encounter them is important to avoid infinite loops, particularly in graphs with cycles.
+```
+Step | Dequeued | Action | Queue | Visited
+-----+----------+-------------------------------------------+------------------+----------------
+0 | — | enqueue A | [A] | {A}
+1 | A | discover B, C; enqueue both | [B, C] | {A, B, C}
+2 | B | discover D; enqueue | [C, D] | {A, B, C, D}
+3 | C | discover E; enqueue | [D, E] | {A, B, C, D, E}
+4 | D | no new neighbors | [E] | {A, B, C, D, E}
+5 | E | no new neighbors | [] | {A, B, C, D, E}
+```
-##### Example
+*BFS tree and distances from A:*
```
-Stack: Empty Visited: A, B, D, C, E
+dist[A]=0
+A → B (1), A → C (1)
+B → D (2), C → E (2)
- A
- / \
- B C
- | |
- D E
+Parents: parent[B]=A, parent[C]=A, parent[D]=B, parent[E]=C
+Shortest path A→E: backtrack E→C→A ⇒ A - C - E
```
-In this example, DFS explored as deep as possible along the left side (branch with B and D) of the graph before backtracking and moving to the right side (branch with C and E). The ASCII representation provides a step-by-step visualization of DFS using a stack and a list of visited nodes.
+**Applications**
-##### Applications
+* In *shortest path computation on unweighted graphs*, BFS finds the minimum number of edges from a source to all reachable nodes and allows path reconstruction via a parent map; without this approach, one might incorrectly use Dijkstra’s algorithm, which is slower for unweighted networks such as social connections.
+* For identifying *connected components in undirected graphs*, BFS is run repeatedly from unvisited vertices, with each traversal discovering one full component; without this method, components in a road map or friendship network may remain undetected.
+* When modeling *broadcast or propagation*, BFS naturally mirrors wavefront-like spreading, such as message distribution or infection spread; ignoring this property makes it harder to simulate multi-hop communication in networks.
+* During BFS-based *cycle detection in undirected graphs*, encountering a visited neighbor that is not the current vertex’s parent signals a cycle; without this check, cycles in structures like utility grids may be overlooked.
+* For *bipartite testing*, BFS alternates colors by level, and the appearance of an edge connecting same-colored nodes disproves bipartiteness; without this strategy, verifying whether a task-assignment graph can be split into two groups becomes more complicated.
+* In *multi-source searches*, initializing the queue with several start nodes at distance zero allows efficient nearest-facility queries, such as finding the closest hospital from multiple candidate sites; without this, repeated single-source BFS runs would be less efficient.
+* In *topological sorting of DAGs*, a BFS-style procedure (Kahn’s algorithm) repeatedly dequeues vertices of indegree zero and lowers the indegree of their successors, producing a valid ordering (see the sketch after this list); without such an ordering, tasks with dependency constraints cannot be scheduled correctly.
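+
+A compact sketch of that queue-based topological sort (Kahn’s algorithm), assuming a directed acyclic graph stored as a dict of adjacency lists:
+
+```python
+from collections import deque
+
+def topological_sort_kahn(graph):
+    """graph: dict mapping each vertex to a list of its successors (a DAG)."""
+    indegree = {u: 0 for u in graph}
+    for u in graph:
+        for v in graph[u]:
+            indegree[v] = indegree.get(v, 0) + 1
+
+    queue = deque(u for u in indegree if indegree[u] == 0)
+    order = []
+    while queue:
+        u = queue.popleft()
+        order.append(u)
+        for v in graph.get(u, []):
+            indegree[v] -= 1
+            if indegree[v] == 0:
+                queue.append(v)
+
+    if len(order) != len(indegree):
+        raise ValueError("graph has a cycle; no topological order exists")
+    return order
+
+# Edges point from a prerequisite task to the task that depends on it.
+tasks = {"buy_soap": ["wash"], "wash": ["dry"], "dry": ["fold"], "fold": []}
+print(topological_sort_kahn(tasks))      # ['buy_soap', 'wash', 'dry', 'fold']
+```
+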
-DFS, with its inherent nature of diving deep, has several intriguing applications:
+**Implementation**
-1. Topological Sorting is used in scheduling tasks, where one task should be completed before another starts.
-2. To find all strongly connected components in a directed graph.
-3. DFS can be employed to find a path between two nodes, though it might not guarantee the shortest path.
-4. If during DFS traversal, an already visited node is encountered (and it's not the direct parent of the current node in traversal), then there's a cycle in the graph.
+* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/bfs)
+* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/bfs)
-##### Implementation
+*Implementation tip:* For dense graphs or when memory locality matters, an adjacency **matrix** can be used, but the usual adjacency **list** representation is more space- and time-efficient for sparse graphs.
-* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/dfs)
-* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/dfs)
+#### Depth-First Search (DFS)
-### Shortest paths
+Depth-First Search (DFS) is a fundamental graph traversal algorithm that explores **as far as possible** along each branch before backtracking. Starting from a source vertex, it dives down one neighbor, then that neighbor’s neighbor, and so on—only backing up when it runs out of new vertices to visit.
-A common task when dealing with weighted graphs is to find the shortest route between two vertices, such as from vertex $A$ to vertex $B$. Note that there might not be a unique shortest path, since several paths could have the same length.
+To track the traversal efficiently, DFS typically uses:
-#### Dijkstra's Algorithm
+* A **call stack** via **recursion** *or* an explicit **stack** data structure (LIFO).
+* A **`visited` set** (or boolean array) to avoid revisiting vertices.
-- **Dijkstra's algorithm** is a method to find the shortest paths from a starting vertex to all other vertices in a weighted graph.
-- A **weighted graph** is one where each edge has a numerical value (cost, distance, or time).
-- The algorithm starts at a **starting vertex**, often labeled **A**, and computes the shortest path to every other vertex.
-- It keeps a **tentative distance** for each vertex, representing the current known shortest distance from the start.
-- It repeatedly **selects the vertex** with the smallest tentative distance that hasn't been finalized (or "finished") yet.
-- Once a vertex is selected, the algorithm **relaxes all its edges**: it checks if going through this vertex offers a shorter path to its neighbors.
-- This continues until all vertices are processed, yielding the shortest paths from the starting vertex to every other vertex.
-- **Important**: Dijkstra’s algorithm requires **non-negative edge weights**, or else results can be incorrect.
+*Useful additions in practice:*
-##### Algorithm Steps
+* A **`parent` map** to reconstruct paths and build the DFS tree (`parent[child] = current` on discovery).
+* Optional **timestamps** (`tin[u]` on entry, `tout[u]` on exit) to reason about edge types, topological order, and low-link computations.
+* Optional **`order` lists**: pre-order (on entry) and post-order (on exit).
-**Input**
+**Algorithm Steps**
-- A weighted graph where each edge has a cost or distance
-- A starting vertex `A`
+1. Pick a start vertex $i$.
+2. Initialize `visited[v]=False` for all $v$; optionally set `parent[v]=None`; set a global timer `t=0`.
+3. Start a DFS from $i$ (recursive or with an explicit stack).
+4. On entry to a vertex $u$: set `visited[u]=True`, record `tin[u]=t++` (and keep `parent[u]=None` if $u=i$).
+5. Scan neighbors $v$ of $u$; whenever `visited[v]=False`, set `parent[v]=u` and visit $v$ (recurse/push), then resume scanning $u$’s neighbors.
+6. After all neighbors of $u$ are processed, record `tout[u]=t++` and backtrack (return or pop).
+7. When the DFS from $i$ finishes, if any vertex remains unvisited, choose one and repeat steps 4–6 to cover disconnected components.
+8. Stop when no unvisited vertices remain.
-**Output**
+Mark vertices **when first discovered** (on entry/push) to prevent infinite loops in cyclic graphs.
-- An array `distances` where `distances[v]` is the shortest distance from `A` to vertex `v`
+*Pseudocode (recursive, adjacency list):*
-**Containers and Data Structures**
+```
-- An array `distances`, initialized to `∞` for all vertices except `A`, which is set to `0`
-- A hash table `finished` to mark vertices with confirmed shortest paths
-- A priority queue to efficiently select the vertex with the smallest current distance
+DFS(G, i):
+ visited = set()
+ parent = {i: None}
+ tin = {}
+ tout = {}
+ pre = [] # optional: order on entry
+    post = []                # optional: order on exit
+    time = 0                 # timestamp counter, shared with explore() via nonlocal
-**Steps**
+ def explore(u):
+ nonlocal time
+ visited.add(u)
+ time += 1
+ tin[u] = time
+ pre.append(u) # preorder
-I. Initialize `distances[A]` to `0`
+ for v in G[u]:
+ if v not in visited:
+ parent[v] = u
+ explore(v)
-II. Initialize `distances[v]` to `∞` for every other vertex `v`
+ time += 1
+ tout[u] = time
+ post.append(u) # postorder
-III. While not all vertices are marked as finished
+ explore(i)
+ return pre, post, parent, tin, tout
+```
-- Select vertex `u` with the smallest `distances[u]` among unfinished vertices
-- Mark `finished[u]` as `true`
-- For each neighbor `w` of `u`, if `distances[u] + weights[u][w]` is less than `distances[w]`, update `distances[w]` to `distances[u] + weights[u][w]`
+*Pseudocode (iterative, traversal order only):*
-##### Step by Step Example
+```
+DFS_iter(G, i):
+ visited = set()
+ parent = {i: None}
+ order = []
+ stack = [i]
-Consider a graph with vertices A, B, C, D, and E, and edges:
+ while stack:
+ u = stack.pop() # take the top
+ if u in visited:
+ continue
+ visited.add(u)
+ order.append(u)
-```
-A-B: 4
-A-C: 2
-C-B: 1
-B-D: 5
-C-D: 8
-C-E: 10
-D-E: 2
+ # Push neighbors in reverse of desired visiting order
+ for v in reversed(G[u]):
+ if v not in visited:
+ parent[v] = u
+ stack.append(v)
+
+ return order, parent
```
-The adjacency matrix looks like this (∞ means no direct edge):
+*Sanity notes:*
-| | A | B | C | D | E |
-|---|----|----|----|----|----|
-| **A** | 0 | 4 | 2 | ∞ | ∞ |
-| **B** | 4 | 0 | 1 | 5 | ∞ |
-| **C** | 2 | 1 | 0 | 8 | 10 |
-| **D** | ∞ | 5 | 8 | 0 | 2 |
-| **E** | ∞ | ∞ | 10 | 2 | 0 |
+* The *time* complexity of DFS is $O(V+E)$ because every vertex and edge is processed a constant number of times; if this property is ignored, one might incorrectly assume exponential growth when analyzing networks like citation graphs.
+* The *space* complexity is $O(V)$, coming from the visited array and the recursion stack (or an explicit stack in iterative form); without recognizing this, applying DFS to very deep structures such as long linked lists could risk stack overflow unless the iterative approach is used.
-**Starting from A**, here’s how Dijkstra’s algorithm proceeds:
+**Example**
-I. Initialize all distances with ∞ except A=0:
+Same graph as the BFS section, start at **A**; assume neighbor order: `B` before `C`, and for `B` the neighbor `D`; for `C` the neighbor `E`.
```
-A: 0
-B: ∞
-C: ∞
-D: ∞
-E: ∞
+ ┌─────────┐
+ │ A │
+ └───┬─┬───┘
+ │ │
+ ┌─────────┘ └─────────┐
+ ▼ ▼
+ ┌─────────┐ ┌─────────┐
+ │ B │ │ C │
+ └───┬─────┘ └───┬─────┘
+ │ │
+ ▼ ▼
+ ┌─────────┐ ┌─────────┐
+ │ D │ │ E │
+ └─────────┘ └─────────┘
+
+Edges: A–B, A–C, B–D, C–E (undirected)
```
-II. From A (distance 0), update neighbors:
+*Recursive DFS trace (pre-order):*
```
-A: 0
-B: 4 (via A)
-C: 2 (via A)
-D: ∞
-E: ∞
+call DFS(A)
+ visit A
+ -> DFS(B)
+ visit B
+ -> DFS(D)
+ visit D
+ return D
+ return B
+ -> DFS(C)
+ visit C
+ -> DFS(E)
+ visit E
+ return E
+ return C
+ return A
```
-III. Pick the smallest unvisited vertex (C with distance 2). Update its neighbors:
-
-- B can be updated to 3 if 2 + 1 < 4
-- D can be updated to 10 if 2 + 8 < ∞
-- E can be updated to 12 if 2 + 10 < ∞
+*Discovery/finish times (one valid outcome):*
```
-A: 0
-B: 3 (via C)
-C: 2
-D: 10 (via C)
-E: 12 (via C)
+Vertex | tin | tout | parent
+-------+-----+------+---------
+A | 1 | 10 | None
+B | 2 | 5 | A
+D | 3 | 4 | B
+C | 6 | 9 | A
+E | 7 | 8 | C
```
-IV. Pick the next smallest unvisited vertex (B with distance 3). Update its neighbors:
-
-- D becomes 8 if 3 + 5 < 10
-- E remains 12 (no direct edge from B to E)
+*Stack/Visited evolution (iterative DFS, top = right):*
```
-A: 0
-B: 3
-C: 2
-D: 8 (via B)
-E: 12
+Step | Action | Stack | Visited
+-----+------------------------------+-----------------------+-----------------
+0 | push A | [A] | {}
+1 | pop A; visit | [] | {A}
+ | push C, B | [C, B] | {A}
+2 | pop B; visit | [C] | {A, B}
+ | push D | [C, D] | {A, B}
+3 | pop D; visit | [C] | {A, B, D}
+4 | pop C; visit | [] | {A, B, D, C}
+ | push E | [E] | {A, B, D, C}
+5 | pop E; visit | [] | {A, B, D, C, E}
```
-V. Pick the next smallest unvisited vertex (D with distance 8). Update its neighbors:
-
-- E becomes 10 if 8 + 2 < 12
+*DFS tree (tree edges shown), with preorder: A, B, D, C, E*
```
-A: 0
-B: 3
-C: 2
-D: 8
-E: 10 (via D)
+A
+├── B
+│ └── D
+└── C
+ └── E
```
-VI. The only remaining vertex is E (distance 10). No further updates are possible.
+**Applications**
-**Final shortest paths from A**:
+* In *path existence and reconstruction*, DFS records parent links so that after reaching a target node, the path can be backtracked to the source; without this, finding an explicit route through a maze-like graph would require re-running the search.
+* For *topological sorting of DAGs*, running DFS and outputting vertices in reverse postorder yields a valid order (see the sketch after this list); if this step is omitted, dependencies in workflows such as build systems cannot be properly sequenced.
+* During *cycle detection*, DFS in undirected graphs reports a cycle when a visited neighbor is not the parent, while in directed graphs the discovery of a back edge to an in-stack node reveals a cycle; without these checks, feedback loops in control systems or task dependencies may go unnoticed.
+* To identify *connected components in undirected graphs*, DFS is launched from every unvisited vertex, with each traversal discovering one component; without this method, clusters in social or biological networks remain hidden.
+* Using *low-link values* in DFS enables detection of bridges (edges whose removal disconnects the graph) and articulation points (vertices whose removal increases components); if these are not identified, critical links in communication or power networks may be overlooked.
+* In *strongly connected components* of directed graphs, algorithms like Tarjan’s and Kosaraju’s use DFS to group vertices where every node is reachable from every other; ignoring this method prevents reliable partitioning of web link graphs or citation networks.
+* For *backtracking and state-space search*, DFS systematically explores decision trees and reverses when hitting dead ends, as in solving puzzles like Sudoku or N-Queens; without DFS, these problems would be approached less efficiently with blind trial-and-error.
+* With *edge classification in directed graphs*, DFS timestamps allow edges to be labeled as tree, back, forward, or cross, which helps analyze structure and correctness; without this classification, reasoning about graph algorithms such as detecting cycles or proving properties becomes more difficult.
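+
+A minimal sketch of the reverse-postorder ordering mentioned above, assuming an acyclic directed graph stored as a dict of adjacency lists (cycle detection is left out for brevity):
+
+```python
+def topological_sort_dfs(graph):
+    """graph: dict mapping each vertex to a list of successors (assumed acyclic)."""
+    visited = set()
+    postorder = []
+
+    def explore(u):
+        visited.add(u)
+        for v in graph.get(u, []):
+            if v not in visited:
+                explore(v)
+        postorder.append(u)              # finished: all descendants are done
+
+    for u in graph:
+        if u not in visited:
+            explore(u)
+
+    return list(reversed(postorder))     # reverse postorder = topological order
+
+deps = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
+print(topological_sort_dfs(deps))        # ['a', 'c', 'b', 'd']
+```
+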
-```
-A: 0
-B: 3
-C: 2
-D: 8
-E: 10
-```
+**Implementation**
-##### Optimizing Time Complexity
+* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/dfs)
+* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/dfs)
-- A basic (array-based) implementation of Dijkstra's algorithm runs in **O(n^2)** time.
-- Using a priority queue (min-heap) to select the vertex with the smallest distance reduces the complexity to **O((V+E) log V)**, where **V** is the number of vertices and **E** is the number of edges.
+*Implementation tips:*
-##### Applications
+* For **very deep** or skewed graphs, prefer the **iterative** form to avoid recursion limits.
+* If neighbor order matters (e.g., lexicographic traversal), control push order (push in reverse for stacks) or sort adjacency lists.
+* For sparse graphs, adjacency **lists** are preferred over adjacency matrices for time/space efficiency.
-- **Internet routing** protocols use it to determine efficient paths for data packets.
-- **Mapping software** (e.g., Google Maps, Waze) employ variations of Dijkstra to compute driving routes.
-- **Telecommunication networks** use it to determine paths with minimal cost.
+### Shortest paths
-##### Implementation
+A common task when dealing with weighted graphs is to find the shortest route between two vertices, such as from vertex $A$ to vertex $B$. Note that there might not be a unique shortest path, since several paths could have the same length.
-* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/dijkstra)
-* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/dijkstra)
+#### Dijkstra’s Algorithm
-#### Bellman-Ford Algorithm
+Dijkstra’s algorithm computes **shortest paths** from a specified start vertex in a graph with **non-negative edge weights**. It grows a “settled” region outward from the start, always choosing the unsettled vertex with the **smallest known distance** and relaxing its outgoing edges to improve neighbors’ distances.
-- **Bellman-Ford algorithm** is a method for finding the shortest paths from a single starting vertex to all other vertices in a weighted graph.
-- Unlike **Dijkstra’s algorithm**, Bellman-Ford can handle **negative edge weights**, making it more flexible for certain types of graphs.
-- The algorithm works by **repeatedly relaxing all edges** in the graph. Relaxing an edge means updating the current shortest distance to a vertex if a shorter path is found via another vertex.
-- The algorithm performs this **relaxation process** exactly **$V - 1$ times**, where $V$ is the number of vertices. This ensures that every possible shortest path is discovered.
-- After completing $V - 1$ relaxations, the algorithm does one more pass to detect **negative weight cycles**. If any edge can still be relaxed, a negative cycle exists and no finite shortest path is defined.
-- Bellman-Ford’s time complexity is **$O(V \times E)$**, which is generally slower than Dijkstra’s algorithm for large graphs.
+To efficiently keep track of the traversal, Dijkstra’s algorithm employs two primary data structures:
-##### Algorithm Steps
+* A **min-priority queue** (often named `pq`, `open`, or `unexplored`) keyed by each vertex’s current best known distance from the start.
+* A **`dist` map** storing the best known distance to each vertex (∞ initially, except the start), a **`visited`/`finalized` set** to mark vertices whose shortest distance is proven, and a **`parent` map** to reconstruct paths.
-**Input**
+*Useful additions in practice:*
-- A weighted graph with possible negative edge weights
-- A starting vertex `A`
+* A *target-aware early stop* allows Dijkstra’s algorithm to halt once the target vertex is extracted from the priority queue, saving work compared to continuing until all distances are finalized; without this optimization, computing the shortest route between two cities would require processing the entire network unnecessarily.
+* With *decrease-key or lazy insertion* strategies, priority queues that lack a decrease-key operation can still work by inserting updated entries and discarding outdated ones when popped; without this adjustment, distance updates in large road networks would be inefficient or require a more complex data structure.
+* Adding optional *predecessor lists* enables reconstruction of multiple optimal paths or counting the number of shortest routes; if these lists are not maintained, applications like enumerating all equally fast routes between transit stations cannot be supported.
-**Output**
+**Algorithm Steps**
-- An array `distances` where `distances[v]` represents the shortest path from `A` to vertex `v`
+1. Pick a start vertex $i$.
+2. Set `dist[i] = 0` and `parent[i] = None`; for all other vertices $v \ne i$, set `dist[v] = ∞`.
+3. Push $i$ into a min-priority queue keyed by `dist[·]`.
+4. While the priority queue is nonempty, repeat steps 5–8.
+5. Extract the vertex $u$ with the smallest `dist[u]`.
+6. If $u$ is already finalized, continue to step 4; otherwise mark $u$ as finalized.
+7. For each neighbor $v$ of $u$ with edge weight $w(u,v) \ge 0$, test whether `dist[u] + w(u,v) < dist[v]`.
+8. If true, set `dist[v] = dist[u] + w(u,v)`, set `parent[v] = u`, and push $v$ into the priority queue keyed by the new `dist[v]`.
+9. Stop when the queue is empty (all reachable vertices finalized) or, if you have a target, when that target is finalized.
+10. Reconstruct any shortest path by following `parent[·]` backward from the target to $i$.
-**Containers and Data Structures**
+Vertices are **finalized when they are dequeued** (popped) from the priority queue. With **non-negative** weights, once a vertex is popped the recorded `dist` is **provably optimal**.
-- An array `distances`, set to `∞` for all vertices except the start vertex (set to `0`)
-- A `predecessor` array to help reconstruct the actual shortest path
+*Reference pseudocode (adjacency-list graph):*
-**Steps**
+```
+Dijkstra(G, i, target=None):
+ INF = +infinity
+ dist = defaultdict(lambda: INF)
+ parent = {i: None}
+ dist[i] = 0
-I. Initialize `distances[A]` to `0` and `distances[v]` to `∞` for all other vertices `v`
+ pq = MinPriorityQueue()
+ pq.push(i, 0)
-II. Repeat `V - 1` times
+ finalized = set()
-- For every edge `(u, v)` with weight `w`, if `distances[u] + w < distances[v]`, update `distances[v]` to `distances[u] + w` and `predecessor[v]` to `u`
+ while pq:
+ u, du = pq.pop_min() # smallest current distance
+ if u in finalized: # ignore stale entries
+ continue
-III. Check for negative cycles by iterating over all edges `(u, v)` again
+ finalized.add(u)
-- If `distances[u] + w < distances[v]` for any edge, a negative weight cycle exists
+ if target is not None and u == target:
+ break # early exit: target finalized
-##### Step by Step Example
+ for (v, w_uv) in G[u]: # w_uv >= 0
+ alt = du + w_uv
+ if alt < dist[v]:
+ dist[v] = alt
+ parent[v] = u
+ pq.push(v, alt) # decrease-key or lazy insert
-We have vertices A, B, C, D, and E. The edges and weights (including a self-loop on E):
+ return dist, parent
+# Reconstruct path i -> t (if t reachable):
+reconstruct(parent, t):
+ path = []
+ while t is not None:
+ path.append(t)
+ t = parent.get(t)
+ return list(reversed(path))
```
-A-B: 6
-A-C: 7
-B-C: 8
-B-D: -4
-B-E: 5
-C-E: -3
-D-A: 2
-D-C: 7
-E-E: 9
-```
-Adjacency matrix (∞ means no direct edge):
+*Sanity notes:*
+
+* The *time* complexity of Dijkstra’s algorithm depends on the priority queue: $O((V+E)\log V)$ with a binary heap, $O(E+V\log V)$ with a Fibonacci heap, and $O(V^2)$ with a plain array; without this distinction, one might wrongly assume that all implementations scale equally on dense versus sparse road networks.
+* The *space* complexity is $O(V)$, needed to store distance values, parent pointers, and priority queue bookkeeping; if underestimated, running Dijkstra on very large graphs such as nationwide transit systems may exceed available memory.
+* The *precondition* is that all edge weights must be nonnegative, since the algorithm assumes distances only improve as edges are relaxed; if negative weights exist, as in certain financial models with losses, the computed paths can be incorrect and Bellman–Ford must be used instead.
+* In terms of *ordering*, the sequence in which neighbors are processed does not affect correctness, only the handling of ties and slight performance differences; without recognizing this, variations in output order between implementations might be mistakenly interpreted as errors.
-| | A | B | C | D | E |
-|---|----|----|----|----|----|
-| **A** | 0 | 6 | 7 | ∞ | ∞ |
-| **B** | ∞ | 0 | 8 | -4 | 5 |
-| **C** | ∞ | ∞ | 0 | ∞ | -3 |
-| **D** | 2 | ∞ | 7 | 0 | ∞ |
-| **E** | ∞ | ∞ | ∞ | ∞ | 9 |
+**Example**
-**Initialization**:
+Weighted, undirected graph; start at **A**. Edge weights are on the links.
```
-dist[A] = 0
-dist[B] = ∞
-dist[C] = ∞
-dist[D] = ∞
-dist[E] = ∞
+ ┌────────┐
+ │ A │
+ └─┬──┬───┘
+ 4/ │1
+ ┌── │ ──┐
+ ┌─────▼──┐ │ ┌▼──────┐
+ │ B │──┘2 │ C │
+ └───┬────┘ └──┬────┘
+ 1 │ 4 │
+ │ │
+ ┌───▼────┐ 3 ┌──▼───┐
+ │ E │────────│ D │
+ └────────┘ └──────┘
+
+Edges: A–B(4), A–C(1), C–B(2), B–E(1), C–D(4), D–E(3)
```
-**Iteration 1** (relax edges from A):
+*Priority queue / Finalized evolution (front = smallest key):*
```
-dist[B] = 6
-dist[C] = 7
+Step | Pop (u,dist) | Relaxations (v: new dist, parent) | PQ after push | Finalized
+-----+--------------+--------------------------------------------+----------------------------------+----------------
+0 | — | init A: dist[A]=0 | [(A,0)] | {}
+1 | (A,0) | B:4←A , C:1←A | [(C,1), (B,4)] | {A}
+2 | (C,1) | B:3←C , D:5←C | [(B,3), (B,4), (D,5)] | {A,C}
+3 | (B,3) | E:4←B | [(E,4), (B,4), (D,5)] | {A,C,B}
+4 | (E,4) | D:7 via E (no improve; current 5) | [(B,4), (D,5)] | {A,C,B,E}
+5 | (B,4) stale | (ignore; B already finalized) | [(D,5)] | {A,C,B,E}
+6 | (D,5) | — | [] | {A,C,B,E,D}
```
-**Iteration 2** (relax edges from B, then C):
+*Distances and parents (final):*
```
-dist[D] = 2 (6 + (-4))
-dist[E] = 11 (6 + 5)
-dist[E] = 4 (7 + (-3)) // C → E is better
+dist[A]=0 (—)
+dist[C]=1 (A)
+dist[B]=3 (C)
+dist[E]=4 (B)
+dist[D]=5 (C)
+
+Shortest path A→E: A → C → B → E (total cost 4)
+```
+
+*Big-picture view of the expanding frontier:*
+
```
+ Settled set grows outward from A by increasing distance.
+ After Step 1: {A}
+ After Step 2: {A, C}
+ After Step 3: {A, C, B}
+ After Step 4: {A, C, B, E}
+ After Step 6: {A, C, B, E, D} (all reachable nodes done)
+```
+
+**Applications**
+
+* In *single-source shortest paths* with non-negative edge weights, Dijkstra’s algorithm efficiently finds minimum-cost routes in settings like roads, communication networks, or transit systems; without it, travel times or costs could not be computed reliably when distances vary.
+* For *navigation and routing*, stopping the search as soon as the destination is extracted from the priority queue avoids unnecessary work; without this early stop, route planning in a road map continues exploring irrelevant regions of the network.
+* In *network planning and quality of service (QoS)*, Dijkstra selects minimum-latency or minimum-cost routes when weights are additive and non-negative; without this, designing efficient data or logistics paths becomes more error-prone.
+* As a *building block*, Dijkstra underlies algorithms like A* (with zero heuristic), Johnson’s algorithm for all-pairs shortest paths in sparse graphs, and $k$-shortest path variants; without it, these higher-level methods would lack a reliable core procedure.
+* In *multi-source Dijkstra*, initializing the priority queue with several starting nodes at distance zero solves nearest-facility queries, such as finding the closest hospital; without this extension, repeated single-source runs would waste time.
+* As a *label-setting baseline*, Dijkstra provides the reference solution against which heuristics like A*, ALT landmarks, or contraction hierarchies are compared; without this baseline, heuristic correctness and performance cannot be properly evaluated.
+* For *grid pathfinding with terrain costs*, Dijkstra handles non-negative cell costs when no admissible heuristic is available; without it, finding a least-effort path across weighted terrain would require less efficient exhaustive search.
+
+**Implementation**
+
+* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/dijkstra)
+* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/dijkstra)
+
+*Implementation tip:* If your PQ has no decrease-key, **push duplicates** on improvement and, when popping a vertex, **skip it** if it’s already finalized or if the popped key doesn’t match `dist[u]`. This “lazy” approach is simple and fast in practice.
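+
+A runnable sketch of that lazy approach using Python’s `heapq`, assuming the graph is a dict mapping each vertex to a list of `(neighbor, weight)` pairs with non-negative weights:
+
+```python
+import heapq
+
+def dijkstra(graph, start):
+    """graph: dict u -> list of (v, weight) with non-negative weights."""
+    dist = {start: 0}
+    parent = {start: None}
+    pq = [(0, start)]                    # (distance, vertex)
+
+    while pq:
+        d_u, u = heapq.heappop(pq)
+        if d_u > dist.get(u, float("inf")):
+            continue                     # stale entry: a shorter path was already found
+        for v, w in graph.get(u, []):
+            alt = d_u + w
+            if alt < dist.get(v, float("inf")):
+                dist[v] = alt
+                parent[v] = u
+                heapq.heappush(pq, (alt, v))   # push a duplicate instead of decrease-key
+    return dist, parent
+
+# The weighted example graph from this section (undirected, so edges go both ways).
+G = {
+    "A": [("B", 4), ("C", 1)],
+    "B": [("A", 4), ("C", 2), ("E", 1)],
+    "C": [("A", 1), ("B", 2), ("D", 4)],
+    "D": [("C", 4), ("E", 3)],
+    "E": [("B", 1), ("D", 3)],
+}
+dist, parent = dijkstra(G, "A")
+print(dist)   # {'A': 0, 'B': 3, 'C': 1, 'D': 5, 'E': 4} — matches the trace above
+```
+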
+
+#### Bellman–Ford Algorithm
+
+Bellman–Ford computes **shortest paths** from a start vertex in graphs that may have **negative edge weights** (but no negative cycles reachable from the start). It works by repeatedly **relaxing** every edge; each full pass can reduce some distances until they stabilize. A final check detects **negative cycles**: if an edge can still be relaxed after $(V-1)$ passes, a reachable negative cycle exists.
+
+To efficiently keep track of the computation, Bellman–Ford employs two primary data structures:
+
+* A **`dist` map** (or array) with the best-known distance to each vertex (initialized to ∞ except the start).
+* A **`parent` map** to reconstruct shortest paths (store `parent[v] = u` when relaxing edge $u\!\to\!v$).
+
+*Useful additions in practice:*
-**Iteration 3** (relax edges from D):
+* With an *edge list*, iterating directly over edges simplifies implementation and keeps updates fast, even if the graph is stored in adjacency lists; without this practice, repeatedly scanning adjacency structures adds unnecessary overhead in each relaxation pass.
+* Using an *early exit* allows termination once a full iteration over edges yields no updates, improving efficiency; without this check, the algorithm continues all $V-1$ passes even on graphs like road networks where distances stabilize early.
+* For *negative-cycle extraction*, if an update still occurs on the $V$-th pass, backtracking through parent links reveals a cycle; without this step, applications such as financial arbitrage detection cannot identify opportunities caused by negative cycles.
+* Adding a *reachability guard* skips edges from vertices with infinite distance, avoiding wasted work on unreached nodes; without this filter, the algorithm needlessly inspects irrelevant edges in disconnected parts of the graph.
+
+**Algorithm Steps**
+
+1. Pick a start vertex $i$.
+2. Set `dist[i] = 0` and `parent[i] = None`; for all other vertices $v \ne i$, set `dist[v] = ∞`.
+3. Do up to $V-1$ passes: in each pass, scan every directed edge $(u,v,w)$; if `dist[u] + w < dist[v]`, set `dist[v] = dist[u] + w` and `parent[v] = u`. If a full pass makes no changes, stop early.
+4. (Optional) Detect negative cycles: if any edge $(u,v,w)$ still satisfies `dist[u] + w < dist[v]`, a reachable negative cycle exists. To extract one, follow `parent` from $v$ for $V$ steps to enter the cycle, then continue until a vertex repeats, collecting the cycle.
+5. To get a shortest path to a target $t$ (when no relevant negative cycle exists), follow `parent[t]` backward to $i$.
+
+*Reference pseudocode (edge list):*
```
-dist[A] = 4 (2 + 2)
-(No update for C since dist[C]=7 is already < 9)
+BellmanFord(V, E, i): # V: set/list of vertices
+ INF = +infinity # E: list of (u, v, w) edges
+ dist = {v: INF for v in V}
+ parent = {v: None for v in V}
+ dist[i] = 0
+
+ # (V-1) relaxation passes
+ for _ in range(len(V) - 1):
+ changed = False
+ for (u, v, w) in E:
+ if dist[u] != INF and dist[u] + w < dist[v]:
+ dist[v] = dist[u] + w
+ parent[v] = u
+ changed = True
+ if not changed:
+ break
+
+ # Negative-cycle check
+ cycle_vertex = None
+ for (u, v, w) in E:
+ if dist[u] != INF and dist[u] + w < dist[v]:
+ cycle_vertex = v
+ break
+
+ return dist, parent, cycle_vertex # cycle_vertex=None if no neg cycle
+
+# Reconstruct shortest path i -> t (if safe):
+reconstruct(parent, t):
+ path = []
+ while t is not None:
+ path.append(t)
+ t = parent[t]
+ return list(reversed(path))
```
-**Iteration 4**:
+*Sanity notes:*
+
+* The *time* complexity of Bellman–Ford is $O(VE)$ because each of the $V-1$ relaxation passes scans all edges; without this understanding, one might underestimate the cost of running it on dense graphs with many edges.
+* The *space* complexity is $O(V)$, needed for storing distance estimates and parent pointers; if this is not accounted for, memory use may be underestimated in large-scale applications such as road networks.
+* The algorithm *handles negative weights* correctly and can also *detect negative cycles* that are reachable from the source; without this feature, Dijkstra’s algorithm would produce incorrect results on graphs with negative edge costs.
+* When a reachable *negative cycle* exists, shortest paths to nodes that can be reached from it are undefined, effectively taking value $-\infty$; without recognizing this, results such as infinitely decreasing profit in arbitrage graphs would be misinterpreted as valid finite paths.
+
+**Example**
+
+Directed, weighted graph; start at **A**. (Negative edges allowed; **no** negative cycles here.)
```
-No changes in this round
+          ┌─────────┐
+          │    A    │
+          └──┬───┬──┘
+           4 │   │ 2
+     ┌───────┘   └───────┐
+┌────▼────┐         ┌────▼────┐
+│    B    │──(-1)──►│    C    │
+└────┬────┘         └────┬────┘
+   2 │                 3 │
+┌────▼────┐         ┌────▼────┐
+│    D    │──(-3)──►│    E    │
+└─────────┘         └─────────┘
+
+Not drawn above: C → B (1), C → D (5)
+(All edges are directed; weights are shown on the arrows)
```
-**Final distances from A**:
+*Edges list:*
+
+`A→B(4), A→C(2), B→C(-1), B→D(2), C→B(1), C→D(5), C→E(3), D→E(-3)`
+
+*Relaxation trace (dist after each full pass; start A):*
```
-dist[A] = 0
-dist[B] = 6
-dist[C] = 7
-dist[D] = 2
-dist[E] = 4
+Init (pass 0):
+ dist[A]=0, dist[B]=∞, dist[C]=∞, dist[D]=∞, dist[E]=∞
+
+After pass 1:
+ A=0, B=3, C=2, D=6, E=3
+  (in edge order: A→B gives B=4; A→C gives C=2; B→D gives D=6 while B is still 4; C→B improves B to 3; C→E gives E=5; D→E(-3) improves E to 3)
+
+After pass 2:
+ A=0, B=3, C=2, D=5, E=2
+ (B→D improved D to 5; D→E improved E to 2)
+
+After pass 3:
+ A=0, B=3, C=2, D=5, E=2 (no changes → early stop)
```
-##### Special Characteristics
+*Parents / shortest paths (one valid set):*
-- It can manage **negative edge weights** but cannot produce valid results when **negative cycles** are present.
-- It is often used when edges can be negative, though it is slower than Dijkstra’s algorithm.
+```
+parent[A]=None
+parent[C]=A
+parent[B]=C
+parent[D]=B
+parent[E]=D
-##### Applications
+Example shortest path A→E:
+A → C → B → D → E with total cost 2 + 1 + 2 + (-3) = 2
+```
-- **Financial arbitrage** detection in currency exchange markets.
-- **Routing** in networks where edges might have negative costs.
-- **Game development** scenarios with penalties or negative terrain effects.
+*Negative-cycle detection (illustration):*
+If we **add** an extra edge `E→C(-4)`, the cycle `C → D → E → C` has total weight `5 + (-3) + (-4) = -2` (negative).
+Bellman–Ford would perform a $V$-th pass and still find an improvement (e.g., relaxing `E→C(-4)`), so it reports a **reachable negative cycle**.
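+
+A small sketch of the extraction step described in the algorithm above, assuming the `parent` map and the `cycle_vertex` flagged on the $V$-th pass (as returned by the pseudocode earlier in this section):
+
+```python
+def extract_negative_cycle(parent, cycle_vertex, num_vertices):
+    """Return one reachable negative cycle as a list of vertices."""
+    # Walk back num_vertices steps so we are guaranteed to stand inside the cycle.
+    v = cycle_vertex
+    for _ in range(num_vertices):
+        v = parent[v]
+
+    # Follow parent pointers until we return to v, collecting the cycle.
+    cycle = [v]
+    u = parent[v]
+    while u != v:
+        cycle.append(u)
+        u = parent[u]
+    cycle.reverse()                      # report the cycle in forward edge order
+    return cycle
+
+# Usage sketch with the pseudocode above:
+#   dist, parent, cycle_vertex = BellmanFord(V, E, source)
+#   if cycle_vertex is not None:
+#       print(extract_negative_cycle(parent, cycle_vertex, len(V)))
+```
+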
+
+**Applications**
+
+* In *shortest path problems with negative edges*, Bellman–Ford is applicable where Dijkstra or A* fail, such as road networks with toll credits; without this method, these graphs cannot be handled correctly.
+* For *arbitrage detection* in currency or financial markets, converting exchange rates into $\log$ weights makes profit loops appear as negative cycles; without Bellman–Ford, such opportunities cannot be systematically identified.
+* In solving *difference constraints* of the form $x_v - x_u \leq w$, the algorithm checks feasibility by detecting whether any negative cycles exist; without this check, inconsistent scheduling or timing systems may go unnoticed.
+* As a *robust baseline*, Bellman–Ford verifies results of faster algorithms or initializes methods like Johnson’s for all-pairs shortest paths; without it, correctness guarantees in sparse-graph all-pairs problems would be weaker.
+* For *graphs with penalties or credits*, where some transitions decrease accumulated cost, Bellman–Ford models these adjustments accurately; without it, such systems—like transport discounts or energy recovery paths—cannot be represented properly.
##### Implementation
* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/bellman_ford)
* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/bellman_ford)
-
+
+*Implementation tip:* For **all-pairs** on sparse graphs with possible negative edges, use **Johnson’s algorithm**: run Bellman–Ford once from a super-source to reweight edges (no negatives), then run **Dijkstra** from each vertex.
+
#### A* (A-Star) Algorithm
-- **A\*** is an informed search algorithm used for **pathfinding** and **graph traversal**.
-- It is a **best-first search** because it prioritizes the most promising paths first, combining known and estimated costs.
-- The algorithm relies on:
-- **g(n)**: The actual cost from the start node to the current node **n**.
-- **h(n)**: A **heuristic** estimating the cost from **n** to the goal.
-- The total cost function is **f(n) = g(n) + h(n)**, guiding the search toward a potentially optimal path.
-- At each step, A* expands the node with the **lowest f(n)** in the priority queue.
-- The heuristic **h(n)** must be **admissible** (never overestimates the real cost) to guarantee an optimal result.
-- A* terminates when it either reaches the **goal** or exhausts all possibilities if no solution exists.
-- It is efficient for many applications because it balances **exploration** with being **goal-directed**, but its performance depends on the heuristic quality.
-- A* is broadly used in **games**, **robotics**, and **navigation** due to its effectiveness in real-world pathfinding.
+A* is a best-first search that finds a **least-cost path** from a start to a goal by minimizing
-##### Algorithm Steps
+$$
+f(n) = g(n) + h(n),
+$$
-**Input**
+where:
-- A graph
-- A start vertex `A`
-- A goal vertex `B`
-- A heuristic function `h(v)` that estimates the cost from `v` to `B`
+* $g(n)$ = cost from start to $n$ (so far),
+* $h(n)$ = heuristic estimate of the remaining cost from $n$ to the goal.
-**Output**
+If $h$ is **admissible** (never overestimates) and **consistent** (triangle inequality), A* is **optimal** and never needs to “reopen” closed nodes.
-- The shortest path from `A` to `B` if one exists
+**Core data structures**
-**Used Data Structures**
+* The *open set* is a min-priority queue keyed by the evaluation function $f=g+h$, storing nodes pending expansion; without it, selecting the next most promising state in pathfinding would require inefficient linear scans.
+* The *closed set* contains nodes already expanded and finalized, preventing reprocessing; if omitted, the algorithm may revisit the same grid cells or graph states repeatedly, wasting time.
+* The *$g$ map* tracks the best known cost-so-far to each node, ensuring paths are only updated when improvements are found; without it, the algorithm cannot correctly accumulate and compare path costs.
+* The *parent map* stores predecessors so that a complete path can be reconstructed once the target is reached; if absent, the algorithm would output only a final distance without the actual route.
+* An optional *heuristic cache* and *tie-breaker* (such as preferring larger $g$ or smaller $h$ when $f$ ties) can improve efficiency and consistency; without these, the search may expand more nodes than necessary or return different paths under equivalent conditions.
-I. **g(n)**: The best-known cost from the start vertex to vertex `n`
+**Algorithm Steps**
-II. **h(n)**: The heuristic estimate from vertex `n` to the goal
+1. Put `start` in `open` (a min-priority queue by `f`); set `g[start]=0`, `f[start]=h(start)`, `parent[start]=None`; initialize `closed = ∅`.
+2. While `open` is nonempty, repeat steps 3–7.
+3. Pop the node `u` with the smallest `f(u)` from `open`.
+4. If `u` is the goal, reconstruct the path by following `parent` back to `start` and return it.
+5. Add `u` to `closed`.
+6. For each neighbor `v` of `u` with edge cost $w(u,v) \ge 0$, set `tentative = g[u] + w(u,v)`.
+7. If `v` not in `g` or `tentative < g[v]`, set `parent[v]=u`, `g[v]=tentative`, `f[v]=g[v]+h(v)`, and push `v` into `open` (even if it was already there with a worse key).
+8. If the loop ends because `open` is empty, no path exists.
-III. **f(n) = g(n) + h(n)**: The estimated total cost from start to goal via `n`
+*Mark neighbors **when you enqueue them** (by storing their best `g`) to avoid duplicate work; with **consistent** $h$, any node popped from `open` is final and will not improve later.*
-IV. **openSet**: Starting with the initial node, contains nodes to be evaluated
+**Reference pseudocode**
-V. **closedSet**: Contains nodes already fully evaluated
+```
+A_star(G, start, goal, h):
+ open = MinPQ() # keyed by f = g + h
+ open.push(start, h(start))
+ g = {start: 0}
+ parent = {start: None}
+ closed = set()
-VI. **cameFrom**: Structure to record the path taken
+ while open:
+ u = open.pop_min() # node with smallest f
+ if u == goal:
+ return reconstruct_path(parent, goal), g[goal]
-**Steps**
+ closed.add(u)
-I. Add the starting node to the **openSet**
+ for (v, w_uv) in G.neighbors(u): # w_uv >= 0
+ tentative = g[u] + w_uv
+ if v in closed and tentative >= g.get(v, +inf):
+ continue
-II. While the **openSet** is not empty
+ if tentative < g.get(v, +inf):
+ parent[v] = u
+ g[v] = tentative
+ f_v = tentative + h(v)
+ open.push(v, f_v) # decrease-key OR push new entry
-- Get the node `current` in **openSet** with the lowest **f(n)**
-- If `current` is the goal node, reconstruct the path and return it
-- Remove `current` from **openSet** and add it to **closedSet**
-- For each neighbor `n` of `current`, skip it if it is in **closedSet**
-- If `n` is not in **openSet**, add it and compute **g(n)**, **h(n)**, and **f(n)**
-- If a better path to `n` is found, update **cameFrom** for `n`
+ return None, +inf
-III. If the algorithm terminates without finding the goal, no path exists
+reconstruct_path(parent, t):
+ path = []
+ while t is not None:
+ path.append(t)
+ t = parent[t]
+ return list(reversed(path))
+```
-##### Step by Step Example
+*Sanity notes:*
-We have a graph with vertices A, B, C, D, and E:
+* The *time* complexity of A* is worst-case exponential, though in practice it runs much faster when the heuristic $h$ provides useful guidance; without an informative heuristic, the search can expand nearly the entire graph, as in navigating a large grid without directional hints.
+* The *space* complexity is $O(V)$, covering the priority queue and bookkeeping maps, which makes A* memory-intensive; without recognizing this, applications such as robotics pathfinding may exceed available memory on large maps.
+* In *special cases*, A* reduces to Dijkstra’s algorithm when $h \equiv 0$, and further reduces to BFS when all edges have cost 1 and $h \equiv 0$; without this perspective, one might overlook how A* generalizes these familiar shortest-path algorithms.
-```
-A-B: 1
-A-C: 2
-B-D: 3
-C-D: 2
-D-E: 1
-```
+**Visual walkthrough (grid with 4-neighborhood, Manhattan $h$)**
-Heuristic estimates to reach E:
+Legend: `S` start, `G` goal, `#` wall, `.` free, `◉` expanded (closed), `•` frontier (open), `×` final path
```
-h(A) = 3
-h(B) = 2
-h(C) = 2
-h(D) = 1
-h(E) = 0
+Row/Col → 1 2 3 4 5 6 7 8 9
+ ┌────────────────────────────┐
+ 1 │ S . . . . # . . . │
+ 2 │ . # # . . # . # . │
+ 3 │ . . . . . . . # . │
+ 4 │ # . # # . # . . . │
+ 5 │ . . . # . . . # G │
+ └────────────────────────────┘
+Movement cost = 1 per step; 4-dir moves; h = Manhattan distance
```
-Adjacency matrix (∞ = no direct path):
+**Early expansion snapshot (conceptual):**
-| | A | B | C | D | E |
-|---|---|---|---|---|----|
-| **A** | 0 | 1 | 2 | ∞ | ∞ |
-| **B** | ∞ | 0 | ∞ | 3 | ∞ |
-| **C** | ∞ | ∞ | 0 | 2 | ∞ |
-| **D** | ∞ | ∞ | ∞ | 0 | 1 |
-| **E** | ∞ | ∞ | ∞ | ∞ | 0 |
+```
+Step 0:
+Open: [(S, g=0, h=|S-G|, f=g+h)] Closed: {}
+Grid: S is • (on frontier)
-**Initialization**:
+Step 1: pop S → expand neighbors
+Open: [((1,2), g=1, h=?, f=?), ((2,1), g=1, h=?, f=?)]
+Closed: {S}
+Marks: S→ ◉, its valid neighbors → •
-```
-g(A) = 0
-f(A) = g(A) + h(A) = 0 + 3 = 3
-openSet = [A]
-closedSet = []
+Step 2..k:
+A* keeps popping the lowest f, steering toward G.
+Nodes near the straight line to G are preferred over detours around '#'.
```
-Expand **A**:
+**When goal is reached, reconstruct the path:**
```
-f(B) = 0 + 1 + 2 = 3
-f(C) = 0 + 2 + 2 = 4
+Final path (example rendering):
+Row/Col →   1 2 3 4 5 6 7 8 9
+          ┌───────────────────┐
+        1 │ S × × × . # . . . │
+        2 │ . # # × × # . # . │
+        3 │ . . . . × × × # . │
+        4 │ # . # # . # × × × │
+        5 │ . . . # . . . # G │
+          └───────────────────┘
+Path length (g at G) = 12 moves, which equals the Manhattan distance from S to G,
+so the route is optimal (guaranteed with an admissible/consistent h).
```
-Expand **B** next (lowest f=3):
+**Priority queue evolution (toy example)**
```
-f(D) = g(B) + cost(B,D) + h(D) = 1 + 3 + 1 = 5
+Step | Popped u | Inserted neighbors (v: g,h,f) | Note
+-----+----------+-------------------------------------------------+---------------------------
+0    | —        | push S: g=0, h=12, f=12                         | S at (1,1), G at (5,9)
+1    | S        | (1,2): g=1,h=11,f=12 ; (2,1): g=1,h=11,f=12     | tie on f; pick (2,1) next
+2    | (2,1)    | (3,1): g=2,h=10,f=12 ; (2,2) blocked            | ...
+3    | (3,1)    | (4,1) wall; (3,2): g=3,h=9,f=12                 | still the f=12 band
+… | … | frontier slides along the corridor toward G | A* hugs the beeline
```
-Next lowest is **C** (f=4):
+(Exact numbers depend on the specific grid and walls; shown for intuition.)
-```
-f(D) = g(C) + cost(C,D) + h(D) = 2 + 2 + 1 = 5 (no improvement)
-```
+**Heuristic design**
-Expand **D** (f=5):
+For **grids**:
-```
-f(E) = g(D) + cost(D,E) + h(E) = 5 + 1 + 0 = 6
-E is the goal; algorithm stops.
-```
+* *4-dir moves:* $h(n)=|x_n-x_g|+|y_n-y_g|$ (Manhattan).
+* *8-dir (diag cost √2):* **Octile**: $h=\Delta_{\max} + (\sqrt{2}-1)\Delta_{\min}$.
+* *Euclidean* when motion is continuous and diagonal is allowed.
-Resulting path: **A -> B -> D -> E** with total cost **5**.
+For **sliding puzzles (e.g., 8/15-puzzle)**:
-##### Special Characteristics
+* *Misplaced tiles* (admissible, weak).
+* *Manhattan sum* (stronger).
+* *Linear conflict / pattern databases* (even stronger).
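+
+The grid heuristics above, written out as small functions (a sketch; cells are `(row, col)` tuples and the goal is fixed by closure):
+
+```python
+import math
+
+def manhattan(goal):
+    """4-directional moves, unit step cost."""
+    gr, gc = goal
+    return lambda n: abs(n[0] - gr) + abs(n[1] - gc)
+
+def octile(goal):
+    """8-directional moves, diagonal cost sqrt(2)."""
+    gr, gc = goal
+    def h(n):
+        dr, dc = abs(n[0] - gr), abs(n[1] - gc)
+        return max(dr, dc) + (math.sqrt(2) - 1) * min(dr, dc)
+    return h
+
+def euclidean(goal):
+    """Continuous / any-angle movement."""
+    gr, gc = goal
+    return lambda n: math.hypot(n[0] - gr, n[1] - gc)
+
+h = manhattan((5, 9))
+print(h((1, 1)))   # 12 — matches the start cell in the grid example above
+```
+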
-- **A\*** finds an optimal path if the heuristic is **admissible**.
-- Edges must have **non-negative weights** for A* to work correctly.
-- A good heuristic drastically improves its efficiency.
+**Admissible vs. consistent**
-##### Applications
+* An *admissible* heuristic satisfies $h(n) \leq h^*(n)$, meaning it never overestimates the true remaining cost, which guarantees that A* finds an optimal path; without admissibility, the algorithm may return a suboptimal route, such as a longer-than-necessary driving path.
+* A *consistent (monotone)* heuristic obeys $h(u) \leq w(u,v) + h(v)$ for every edge, ensuring that $f$-values do not decrease along paths and that once a node is removed from the open set, its $g$-value is final; without consistency, nodes may need to be reopened, increasing complexity in searches like grid navigation.
-- Used in **video games** for enemy AI or player navigation.
-- Employed in **robotics** for motion planning.
-- Integral to **mapping** and **GPS** systems for shortest route calculations.
+**Applications**
-##### Implementation
+* In *pathfinding* for maps, games, and robotics, A* computes shortest or least-risk routes by combining actual travel cost with heuristic guidance; without it, movement planning in virtual or physical environments becomes slower or less efficient.
+* For *route planning* with road metrics such as travel time, distance, or tolls, A* incorporates these costs and constraints into its evaluation; without heuristic search, navigation systems must fall back to slower methods like plain Dijkstra.
+* In *planning and scheduling* tasks, A* serves as a general shortest-path algorithm in abstract state spaces, supporting AI decision-making; without it, solving resource allocation or task sequencing problems may require less efficient exhaustive search.
+* In *puzzle solving* domains such as the 8-puzzle or Sokoban, A* uses problem-specific heuristics to guide the search efficiently; without heuristics, the state space may grow exponentially and become impractical to explore.
+* For *network optimization* problems with nonnegative edge costs, A* applies whenever a useful heuristic is available to speed convergence; without heuristics, computations on communication or logistics networks may take longer than necessary.
+
+**Variants & practical tweaks**
+
+* Viewing *Dijkstra* as A* with $h \equiv 0$ shows that A* generalizes the classic shortest-path algorithm; without this equivalence, the connection between uninformed and heuristic search may be overlooked.
+* In *Weighted A**, the evaluation function becomes $f = g + \varepsilon h$ with $\varepsilon > 1$, trading exact optimality for faster performance with bounded suboptimality; without this variant, applications needing quick approximate routing, like logistics planning, would run slower.
+* The *A*ε / Anytime A** approach begins with $\varepsilon > 1$ for speed and gradually reduces it to converge toward optimal paths; without this strategy, incremental refinement in real-time systems like navigation aids is harder to achieve.
+* With *IDA** (Iterative Deepening A*), the search is conducted by gradually increasing an $f$-cost threshold, greatly reducing memory usage but sometimes increasing runtime; without it, problems like puzzle solving could exceed memory limits.
+* *RBFS and Fringe Search* are memory-bounded alternatives that manage recursion depth or fringe sets more carefully; without these, large state spaces in AI planning can overwhelm storage.
+* In *tie-breaking*, preferring larger $g$ or smaller $h$ when $f$ ties reduces unnecessary re-expansions; without careful tie-breaking, searches on uniform-cost grids may explore more nodes than needed.
+* For the *closed-set policy*, when heuristics are inconsistent, nodes must be reopened if a better $g$ value is found; without allowing this, the algorithm may miss shorter paths, as in road networks with varying travel times.
+
+**Pitfalls & tips**
+
+* The algorithm requires *non-negative edge weights* because A* assumes $w(u,v) \ge 0$; without this, negative costs can cause nodes to be expanded too early, breaking correctness in applications like navigation.
+* If the heuristic *overestimates* actual costs, A* loses its guarantee of optimality; without enforcing admissibility, a routing system may return a path that is faster to compute but longer in distance.
+* With *floating-point precision issues*, comparisons of $f$-values should include small epsilons to avoid instability; without this safeguard, two nearly equal paths may lead to inconsistent queue ordering in large-scale searches.
+* In *state hashing*, equivalent states must hash identically so duplicates are merged properly; without this, search in puzzles or planning domains may blow up due to treating the same state as multiple distinct ones.
+* While *neighbor order* does not affect correctness, it influences performance and the aesthetics of the returned path trace; without considering this, two identical problems might yield very different expansion sequences or outputs.
+
+**Implementation**
* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/a_star)
* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/a_star)
+*Implementation tip:* If your PQ lacks decrease-key, **push duplicates** with improved keys and ignore stale entries when popped (check if popped `g` matches current `g[u]`). This is simple and fast in practice.
+
### Minimal Spanning Trees
Suppose we have a graph that represents a network of houses. Weights represent the distances between vertices, which each represent a single house. All houses must have water, electricity, and internet, but we want the cost of installation to be as low as possible. We need to identify a subgraph of our graph with the following properties:
@@ -732,228 +1149,445 @@ Suppose we have a graph that represents a network of houses. Weights represent t
Such a subgraph is called a minimal spanning tree.
-#### Prim's Algorithm
+#### Prim’s Algorithm
+
+Prim’s algorithm builds a **minimum spanning tree (MST)** of a **weighted, undirected** graph by growing a tree from a start vertex. At each step it adds the **cheapest edge** that connects a vertex **inside** the tree to a vertex **outside** the tree.
+
+To efficiently keep track of the construction, Prim’s algorithm employs two primary data structures:
+
+* A **min-priority queue** (often named `pq`, `open`, or `unexplored`) keyed by a vertex’s **best known connection cost** to the current tree.
+* An **`in_mst`/`visited` set** to mark vertices already added to the tree, plus a **`parent` map** to record the chosen incoming edge for each vertex.
+
+*Useful additions in practice:*
+
+* A *key map* stores, for each vertex, the lightest edge weight connecting it to the current spanning tree, initialized to infinity except for the starting vertex at zero; without this, Prim’s algorithm cannot efficiently track which edges should be added next to grow the tree.
+* With *lazy updates*, when the priority queue lacks a decrease-key operation, improved entries are simply pushed again and outdated ones are skipped upon popping; without this adjustment, priority queues become harder to manage, slowing down minimum spanning tree construction.
+* For *component handling*, if the graph is disconnected, Prim’s algorithm must either restart from each unvisited vertex or seed multiple starts with key values of zero to produce a spanning forest; without this, the algorithm would stop after one component, leaving parts of the graph unspanned.
+
+**Algorithm Steps**
+
+1. Pick a start vertex $i$.
+2. Set `key[i] = 0`, `parent[i] = None`; for all other vertices $v \ne i$, set `key[v] = ∞`; push $i$ into a min-priority queue keyed by `key`.
+3. While the priority queue is nonempty, repeat steps 4–6.
+4. Extract the vertex $u$ with the smallest `key[u]`.
+5. If $u$ is already in the MST, continue; otherwise add $u$ to the MST and, if `parent[u] ≠ None`, record the tree edge `(parent[u], u)`.
+6. For each neighbor $v$ of $u$ with weight $w(u,v)$, if $v$ is not in the MST and $w(u,v) < key[v]$, set `key[v] = w(u,v)`, set `parent[v] = u`, and push $v$ into the priority queue keyed by the new `key[v]`.
+7. Stop when the queue is empty or when all vertices are in the MST (for a connected graph).
+8. The edges $\{(parent[v], v) : v \ne i\}$ form an MST; the MST total weight equals $\sum key[v]$ at the moments when each $v$ is added.
+
+Vertices are **finalized when they are dequeued**: at that moment, `key[u]` is the **minimum** cost to connect `u` to the growing tree (by the **cut property**).
+
+*Reference pseudocode (adjacency-list graph):*
+
+```
+Prim(G, i):
+ INF = +infinity
+ key = defaultdict(lambda: INF)
+ parent = {i: None}
+ key[i] = 0
+
+ pq = MinPriorityQueue() # holds (key[v], v)
+ pq.push((0, i))
+
+ in_mst = set()
+ mst_edges = []
+
+ while pq:
+ ku, u = pq.pop_min() # smallest key
+ if u in in_mst:
+ continue
+ in_mst.add(u)
-- **Prim's Algorithm** is used to find a **minimum spanning tree (MST)**, which is a subset of a graph that connects all its vertices with the smallest total edge weight.
-- It works on a **weighted undirected graph**, meaning the edges have weights, and the direction of edges doesn’t matter.
-- It starts with an **arbitrary vertex** and grows the MST by adding one edge at a time.
-- At each step, it chooses the **smallest weight edge** that connects a vertex in the MST to a vertex not yet in the MST (a **greedy** approach).
-- This process continues until **all vertices** are included.
-- The resulting MST is **connected**, ensuring a path between any two vertices, and the total edge weight is minimized.
-- Using a **priority queue** (min-heap), it can achieve a time complexity of **O(E log V)** with adjacency lists, where E is the number of edges and V is the number of vertices.
-- With an adjacency matrix, the algorithm can be implemented in **O(V^2)** time.
+ if parent[u] is not None:
+ mst_edges.append((parent[u], u, ku))
-##### Algorithm Steps
+ for (v, w_uv) in G[u]: # undirected: each edge seen twice
+ if v not in in_mst and w_uv < key[v]:
+ key[v] = w_uv
+ parent[v] = u
+ pq.push((key[v], v)) # decrease-key or lazy insert
-**Input**
+ return mst_edges, parent, sum(w for (_,_,w) in mst_edges)
+```
-- A connected, undirected graph with weighted edges
-- A start vertex `A`
+*Sanity notes:*
-**Output**
+* The *time* complexity of Prim’s algorithm is $O(E \log V)$ with a binary heap, $O(E + V \log V)$ with a Fibonacci heap, and $O(V^2)$ for the dense-graph adjacency-matrix variant; without knowing this, one might apply the wrong implementation and get poor performance on sparse or dense networks.
+* The *space* complexity is $O(V)$, required for storing the key values, parent pointers, and bookkeeping to build the minimum spanning tree; without this allocation, the algorithm cannot track which edges belong to the MST.
+* The *graph type* handled is a weighted, undirected graph, and edge weights may be negative as well as positive; without this flexibility, graphs with negative costs, such as energy-saving transitions, could not be processed.
+* In terms of *uniqueness*, if all edge weights are distinct, the minimum spanning tree is unique; without distinct weights, multiple MSTs may exist, such as in networks where two equally light connections are available.
-- A minimum spanning tree, which is a subset of the edges that connects all vertices together without any cycles and with the minimum total edge weight
+**Example**
-**Containers and Data Structures**
+Undirected, weighted graph; start at **A**. Edge weights shown on links.
-- An array `key[]` to store the minimum reachable edge weight for each vertex. Initially, `key[v] = ∞` for all `v` except the first chosen vertex (set to `0`)
-- A boolean array `mstSet[]` to keep track of whether a vertex is included in the MST. Initially, all values are `false`
-- An array `parent[]` to store the MST. Each `parent[v]` indicates the vertex connected to `v` in the MST
+```
+ ┌────────┐
+ │ A │
+ └─┬──┬───┘
+ 4│ │1
+ │ │
+ ┌───────────┘ └───────────┐
+ │ │
+ ┌────▼────┐ ┌────▼────┐
+ │ B │◄──────2────────│ C │
+ └───┬─────┘ └─────┬───┘
+ 1 │ 4 │
+ │ │
+ ┌───▼────┐ 3 ┌────▼───┐
+ │ E │─────────────────▶│ D │
+ └────────┘ └────────┘
-**Steps**
+Edges: A–B(4), A–C(1), C–B(2), B–E(1), C–D(4), D–E(3)
+```
-I. Start with an arbitrary node as the initial MST node
+*Frontier (keys) / In-tree evolution (min at front):*
-II. While there are vertices not yet included in the MST
+```
+Legend: key[v] = cheapest known connection to tree; parent[v] = chosen neighbor
-- Pick a vertex `v` with the smallest `key[v]`
-- Include `v` in `mstSet[]`
-- For each neighboring vertex `u` of `v` not in the MST
-- If the weight of edge `(u, v)` is less than `key[u]`, update `key[u]` and set `parent[u]` to `v`
+Step | Action | PQ (key:vertex) after push | In MST | Updated keys / parents
+-----+---------------------------------+------------------------------------+--------+-------------------------------
+0 | init at A | [0:A] | {} | key[A]=0, others=∞
+1 | pop A → add | [1:C, 4:B] | {A} | key[C]=1 (A), key[B]=4 (A)
+2 | pop C → add | [2:B, 4:D, 4:B] | {A,C} | key[B]=min(4,2)=2 (C), key[D]=4 (C)
+3 | pop B(2) → add | [1:E, 4:D, 4:B] | {A,C,B}| key[E]=1 (B)
+4 | pop E(1) → add | [3:D, 4:D, 4:B] | {A,C,B,E}| key[D]=min(4,3)=3 (E)
+5 | pop D(3) → add | [4:D, 4:B] | {A,C,B,E,D}| done
+```
-III. The MST is formed using the `parent[]` array once all vertices are included
+*MST edges chosen (with weights):*
-##### Step by Step Example
+```
+A—C(1), C—B(2), B—E(1), E—D(3)
+Total weight = 1 + 2 + 1 + 3 = 7
+```
-Consider a simple graph with vertices **A**, **B**, **C**, **D**, and **E**. The edges with weights are:
+*Resulting MST (tree edges only):*
```
-A-B: 2
-A-C: 3
-B-D: 1
-B-E: 3
-C-D: 4
-C-E: 5
-D-E: 2
+A
+└── C (1)
+ └── B (2)
+ └── E (1)
+ └── D (3)
```
-The adjacency matrix for the graph (using ∞ where no direct edge exists) is:
+**Applications**
+
+* In *network design*, Prim’s or Kruskal’s MST construction connects all sites such as offices, cities, or data centers with the least total cost of wiring, piping, or fiber; without using MSTs, infrastructure plans risk including redundant and more expensive links.
+* As an *approximation for the traveling salesman problem (TSP)*, building an MST and performing a preorder walk of it yields a tour within twice the optimal length for metric TSP; without this approach, even approximate solutions for large instances may be much harder to obtain.
+* In *clustering with single linkage*, removing the $k-1$ heaviest edges of the MST partitions the graph into $k$ clusters; without this technique, hierarchical clustering may require recomputing pairwise distances repeatedly.
+* For *image processing and segmentation*, constructing an MST over pixels or superpixels highlights low-contrast boundaries as cut edges; without MST-based grouping, segmentations may fail to respect natural intensity or color edges.
+* In *map generalization and simplification*, the MST preserves a connectivity backbone with minimal redundancy, reducing complexity while maintaining essential routes; without this, simplified maps may show excessive or unnecessary detail.
+* In *circuit design and VLSI*, MSTs minimize interconnect length under simple wiring models, supporting efficient layouts; without this method, chip designs may consume more area and power due to avoidable wiring overhead.
+
+##### Implementation
+
+* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/prim)
+* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/prim)
+
+*Implementation tip:*
+For **dense graphs** ($E \approx V^2$), skip heaps: store `key` in an array and, at each step, scan all non-MST vertices to pick the minimum `key` in $O(V)$. Overall $O(V^2)$ but often **faster in practice** on dense inputs due to low overhead.
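+
+A sketch of that dense variant, assuming the graph comes as an adjacency matrix `W` with `None` marking missing edges (illustrative, not the linked implementation):
+
+```python
+def prim_dense(W, start=0):
+    # W[u][v] = weight of edge u-v, or None if absent; vertices are 0..n-1
+    n = len(W)
+    INF = float("inf")
+    key = [INF] * n          # cheapest known connection to the tree
+    parent = [None] * n
+    in_mst = [False] * n
+    key[start] = 0
+    mst_edges = []
+    for _ in range(n):
+        # O(V) scan instead of a heap: pick the cheapest vertex outside the tree
+        u = min((v for v in range(n) if not in_mst[v]), key=lambda v: key[v])
+        if key[u] == INF:
+            break            # remaining vertices are unreachable (disconnected graph)
+        in_mst[u] = True
+        if parent[u] is not None:
+            mst_edges.append((parent[u], u, key[u]))
+        for v in range(n):
+            w = W[u][v]
+            if w is not None and not in_mst[v] and w < key[v]:
+                key[v] = w
+                parent[v] = u
+    return mst_edges
+```
+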
-| | A | B | C | D | E |
-|---|---|---|---|---|---|
-| **A** | 0 | 2 | 3 | ∞ | ∞ |
-| **B** | 2 | 0 | ∞ | 1 | 3 |
-| **C** | 3 | ∞ | 0 | 4 | 5 |
-| **D** | ∞ | 1 | 4 | 0 | 2 |
-| **E** | ∞ | 3 | 5 | 2 | 0 |
+#### Kruskal’s Algorithm
-Run Prim's algorithm starting from vertex **A**:
+Kruskal’s algorithm builds a **minimum spanning tree (MST)** for a **weighted, undirected** graph by sorting all edges by weight (lightest first) and repeatedly adding the next lightest edge that **does not create a cycle**. It grows the MST as a forest of trees that gradually merges until all vertices are connected.
-I. **Initialization**
+To efficiently keep track of the construction, Kruskal’s algorithm employs two primary data structures:
+
+* A *sorted edge list* arranged in ascending order of weights ensures that Kruskal’s algorithm always considers the lightest available edge next; without this ordering, the method cannot guarantee that the resulting spanning tree has minimum total weight.
+* A *Disjoint Set Union (DSU)*, or Union–Find structure, tracks which vertices belong to the same tree and prevents cycles by only uniting edges from different sets; without this mechanism, the algorithm could inadvertently form cycles instead of building a spanning tree.
+
+*Useful additions in practice:*
+
+* Using *Union–Find with path compression and union by rank/size* enables near-constant-time merge and find operations, making Kruskal’s algorithm efficient; without these optimizations, edge processing in large graphs such as communication networks would slow down significantly.
+* Applying an *early stop* allows the algorithm to terminate once $V-1$ edges have been added in a connected graph, since the MST is then complete; without this, unnecessary edges are still considered, adding avoidable work.
+* Enforcing *deterministic tie-breaking* ensures that when multiple edges share equal weights, the same MST is consistently produced; without this, repeated runs on the same weighted graph might yield different but equally valid spanning trees, complicating reproducibility.
+* On *disconnected graphs*, Kruskal’s algorithm naturally outputs a minimum spanning forest with one tree per component; without this property, handling graphs such as multiple separate road systems would require additional adjustments.
+
+**Algorithm Steps**
+
+1. Gather all edges $E=\{(u,v,w)\}$ and sort them by weight $w$ in ascending order.
+2. Initialize a DSU with each vertex in its own set: `parent[v]=v`, `rank[v]=0`.
+3. Traverse the edges in the sorted order.
+4. For the current edge $(u,v,w)$, compute `ru = find(u)` and `rv = find(v)` in the DSU.
+5. If `ru ≠ rv`, add $(u,v,w)$ to the MST and `union(ru, rv)`.
+6. If `ru = rv`, skip the edge (it would create a cycle).
+7. Continue until $V-1$ edges have been chosen (connected graph) or until all edges are processed (forest).
+8. The chosen edges form the MST; the total weight is the sum of their weights.
+
+By the **cycle** and **cut** properties of MSTs, selecting the minimum-weight edge that crosses any cut between components is always safe; rejecting edges that close a cycle preserves optimality.
+
+*Reference pseudocode (edge list + DSU):*
```
-Chosen vertex: A
-Not in MST: B, C, D, E
+Kruskal(V, E):
+ # V: iterable of vertices
+ # E: list of edges (u, v, w) for undirected graph
+
+ sort E by weight ascending
+
+ make_set(v) for v in V # DSU init: parent[v]=v, rank[v]=0
+
+ mst_edges = []
+ total = 0
+
+ for (u, v, w) in E:
+ if find(u) != find(v):
+ union(u, v)
+ mst_edges.append((u, v, w))
+ total += w
+ if len(mst_edges) == len(V) - 1: # early stop if connected
+ break
+
+ return mst_edges, total
+
+# Union-Find helpers (path compression + union by rank):
+find(x):
+ if parent[x] != x:
+ parent[x] = find(parent[x])
+ return parent[x]
+
+union(x, y):
+ rx, ry = find(x), find(y)
+ if rx == ry: return
+ if rank[rx] < rank[ry]:
+ parent[rx] = ry
+ elif rank[rx] > rank[ry]:
+ parent[ry] = rx
+ else:
+ parent[ry] = rx
+ rank[rx] += 1
```
-II. **Pick the smallest edge from A**
+*Sanity notes:*
+
+* The *time* complexity of Kruskal’s algorithm is dominated by sorting edges, which takes $O(E \log E)$, or equivalently $O(E \log V)$, while DSU operations run in near-constant amortized time; without recognizing this, one might wrongly attribute the main cost to the union–find structure rather than sorting.
+* The *space* complexity is $O(V)$ for the DSU arrays and $O(E)$ to store the edges; without this allocation, the algorithm cannot track connectivity or efficiently access candidate edges.
+* With respect to *weights*, Kruskal’s algorithm works on undirected graphs with either negative or positive weights; without this flexibility, cases like networks where some connections represent cost reductions could not be handled.
+* Regarding *uniqueness*, if all edge weights are distinct, the MST is guaranteed to be unique; without distinct weights, multiple equally valid minimum spanning trees may exist, such as in graphs where two different links have identical costs.
+
+**Example**
+
+Undirected, weighted graph (we’ll draw the key edges clearly and list the rest).
+Start with all vertices as separate sets: `{A} {B} {C} {D} {E} {F}`.
```
-Closest vertex is B with a weight of 2.
-MST now has: A, B
-Not in MST: C, D, E
+Top row: A────────4────────B────────2────────C
+ │ │ │
+ │ │ │
+ 7 3 5
+ │ │ │
+Bottom row: F────────1────────E────────6────────D
+
+Other edges (not all drawn to keep the picture clean):
+A–C(4), B–D(5), C–E(5), D–F(2)
```
-III. **From A and B, pick the smallest edge**
+*Sorted edge list (ascending):*
+`E–F(1), B–C(2), D–F(2), B–E(3), A–B(4), A–C(4), B–D(5), C–D(5), C–E(5), D–E(6), A–F(7)`
+
+*Union–Find / MST evolution (take the edge if it connects different sets):*
```
-Closest vertex is D (from B) with a weight of 1.
-MST now has: A, B, D
-Not in MST: C, E
+Step | Edge (w) | Find(u), Find(v) | Action | Components after union | MST so far | Total
+-----+-----------+-------------------+------------+-----------------------------------+----------------------------+------
+ 1 | E–F (1) | {E}, {F} | TAKE | {E,F} {A} {B} {C} {D} | [E–F(1)] | 1
+ 2 | B–C (2) | {B}, {C} | TAKE | {E,F} {B,C} {A} {D} | [E–F(1), B–C(2)] | 3
+ 3 | D–F (2) | {D}, {E,F} | TAKE | {B,C} {D,E,F} {A} | [E–F(1), B–C(2), D–F(2)] | 5
+ 4 | B–E (3) | {B,C}, {D,E,F} | TAKE | {A} {B,C,D,E,F} | [..., B–E(3)] | 8
+ 5 | A–B (4) | {A}, {B,C,D,E,F} | TAKE | {A,B,C,D,E,F} (all connected) | [..., A–B(4)] | 12
+ | (stop: we have V−1 = 5 edges for 6 vertices)
```
-IV. **Next smallest edge from A, B, or D**
+*Resulting MST edges and weight:*
```
-Closest vertex is E (from D) with a weight of 2.
-MST now has: A, B, D, E
-Not in MST: C
+E–F(1), B–C(2), D–F(2), B–E(3), A–B(4) ⇒ Total = 1 + 2 + 2 + 3 + 4 = 12
```
-V. **Pick the final vertex**
+*Clean MST view (tree edges only):*
```
-The closest remaining vertex is C (from A) with a weight of 3.
-MST now has: A, B, D, E, C
+A
+└── B (4)
+ ├── C (2)
+ └── E (3)
+ └── F (1)
+ └── D (2)
```
-The MST includes the edges: **A-B (2), B-D (1), D-E (2),** and **A-C (3)**, with a total weight of **8**.
+**Applications**
-##### Special Characteristics
+* In *network design*, Kruskal’s algorithm builds the least-cost backbone, such as roads, fiber, or pipelines, that connects all sites with minimal total expense; without MST construction, the resulting infrastructure may include redundant and costlier links.
+* For *clustering with single linkage*, constructing the MST and then removing the $k-1$ heaviest edges partitions the graph into $k$ clusters; without this method, grouping data points into clusters may require repeated and slower distance recalculations.
+* In *image segmentation*, applying Kruskal’s algorithm to pixel or superpixel graphs groups regions by intensity or feature similarity through MST formation; without MST-based grouping, boundaries between regions may be less well aligned with natural contrasts.
+* As an *approximation for the metric traveling salesman problem*, building an MST and performing a preorder walk (with shortcutting) yields a tour at most twice the optimal length; without this approach, near-optimal solutions would be harder to compute efficiently.
+* In *circuit and VLSI layout*, Kruskal’s algorithm finds minimal interconnect length under simplified wiring models; without this, designs may require more area and energy due to unnecessarily long connections.
+* For *maze generation*, a randomized Kruskal process selects edges in random order while maintaining acyclicity, producing mazes that remain connected without loops; without this structure, generated mazes could contain cycles or disconnected regions.
-- It always selects the smallest edge that can connect a new vertex to the existing MST.
-- Different choices of starting vertex can still result in the same total MST weight (though the exact edges might differ if multiple edges have the same weight).
-- With adjacency lists and a priority queue, the time complexity is **O(E log V)**; with an adjacency matrix, it is **O(V^2)**.
+**Implementation**
-##### Applications
+* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/kruskal)
+* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/kruskal)
-- **Network design**: Building telecommunication networks with minimal cable length.
-- **Road infrastructure**: Constructing roads, tunnels, or bridges at minimal total cost.
-- **Utility services**: Designing water, electrical, or internet infrastructure to connect all locations at minimum cost.
+*Implementation tip:*
+On huge graphs that **stream from disk**, you can **external-sort** edges by weight, then perform a single pass with DSU. For reproducibility across platforms, **stabilize** sorting by `(weight, min(u,v), max(u,v))`.
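+
+A compact in-memory sketch combining the DSU, the early stop, and the stabilized tie-breaking key from the tip above (vertex labels are assumed comparable, e.g. strings or integers):
+
+```python
+def kruskal(vertices, edges):
+    # edges: iterable of (u, v, w) for an undirected graph
+    parent = {v: v for v in vertices}
+    size = {v: 1 for v in vertices}
+
+    def find(x):
+        while parent[x] != x:
+            parent[x] = parent[parent[x]]    # path compression (halving)
+            x = parent[x]
+        return x
+
+    def union(a, b):
+        ra, rb = find(a), find(b)
+        if ra == rb:
+            return False                     # edge would close a cycle
+        if size[ra] < size[rb]:
+            ra, rb = rb, ra
+        parent[rb] = ra                      # union by size
+        size[ra] += size[rb]
+        return True
+
+    mst, total = [], 0
+    # deterministic tie-breaking: (weight, min endpoint, max endpoint)
+    for u, v, w in sorted(edges, key=lambda e: (e[2], min(e[0], e[1]), max(e[0], e[1]))):
+        if union(u, v):
+            mst.append((u, v, w))
+            total += w
+            if len(mst) == len(parent) - 1:
+                break                        # early stop for connected graphs
+    return mst, total
+```
+
+On the worked example above it selects the same five edges and reports a total weight of 12.
+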
-##### Implementation
+### Topological Sort
-* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/prim)
-* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/prim)
+Topological sort orders the vertices of a **directed acyclic graph (DAG)** so that **every directed edge** $u \rightarrow v$ goes **from left to right** in the order (i.e., $u$ appears before $v$). It’s the canonical tool for scheduling tasks with dependencies.
-#### Kruskal's Algorithm
+To efficiently keep track of the process (Kahn’s algorithm), we use:
-- **Kruskal's Algorithm** is used to find a **minimum spanning tree (MST)** in a connected, undirected graph with weighted edges.
-- It **sorts all edges** from smallest to largest by weight.
-- It **adds edges** one by one to the MST if they do not form a cycle.
-- **Cycle detection** is managed by a **disjoint-set** (union-find) data structure, which helps quickly determine if two vertices belong to the same connected component.
-- If adding an edge connects two different components, it is safe to include; if both vertices are already in the same component, including that edge would create a cycle and is skipped.
-- The process continues until the MST has **V-1** edges, where **V** is the number of vertices.
-- Its time complexity is **O(E \log E)**, dominated by sorting the edges, while union-find operations typically take near-constant time (**O(α(V))**, where α is the inverse Ackermann function).
+* A **queue** (or min-heap if you want lexicographically smallest order) holding all vertices with **indegree = 0** (no unmet prerequisites).
+* An **`indegree` map/array** that counts for each vertex how many prerequisites remain.
+* An **`order` list** to append vertices as they are “emitted.”
-##### Algorithm Steps
+*Useful additions in practice:*
-**Input**
+* Maintaining a *visited count* or tracking the length of the output order lets you detect cycles, since producing fewer than $V$ vertices indicates that some could not be placed due to a cycle; without this check, algorithms like Kahn’s may silently return incomplete results on cyclic task graphs.
+* Using a *min-heap* instead of a simple FIFO queue ensures that, among available candidates, the smallest-indexed vertex is always chosen, yielding the lexicographically smallest valid topological order; without this modification, the output order depends on arbitrary queueing, which may vary between runs.
+* A *DFS-based alternative* computes a valid topological order by recording vertices in reverse postorder, also in $O(V+E)$ time, while detecting cycles via a three-color marking or recursion stack; without DFS, cycle detection must be handled separately in Kahn’s algorithm.
-- A connected, undirected graph with weighted edges
+**Algorithm Steps (Kahn’s algorithm)**
-**Output**
+1. Compute `indegree[v]` for every vertex $v$; set `order = []`.
+2. Initialize a queue `Q` with all vertices of indegree 0.
+3. While `Q` is nonempty, repeat steps 4–6.
+4. Dequeue a vertex `u` from `Q` and append it to `order`.
+5. For each outgoing edge `u → v`, decrement `indegree[v]` by 1.
+6. If `indegree[v]` becomes 0, enqueue `v` into `Q`.
+7. If `len(order) < V` at the end, a cycle exists and no topological order is possible; otherwise `order` is a valid topological ordering.
-- A subset of edges forming a MST, ensuring all vertices are connected with no cycles and minimal total weight
+*Reference pseudocode (adjacency-list graph):*
-**Containers and Data Structures**
+```
+TopoSort_Kahn(G):
+ # G[u] = iterable of neighbors v with edge u -> v
+ V = all_vertices(G)
+ indeg = {v: 0 for v in V}
+ for u in V:
+ for v in G[u]:
+ indeg[v] += 1
-- A list or priority queue to sort the edges by weight
-- A `disjoint-set (union-find)` structure to manage and merge connected components
+ Q = Queue()
+ for v in V:
+ if indeg[v] == 0:
+ Q.enqueue(v)
-**Steps**
+ order = []
-I. Sort all edges in increasing order of their weights
+ while not Q.empty():
+ u = Q.dequeue()
+ order.append(u)
+ for v in G[u]:
+ indeg[v] -= 1
+ if indeg[v] == 0:
+ Q.enqueue(v)
-II. Initialize a forest where each vertex is its own tree
+ if len(order) != len(V):
+ return None # cycle detected
+ return order
+```
-III. Iterate through the sorted edges
+*Sanity notes:*
-- If the edge `(u, v)` connects two different components, include it in the MST and perform a `union` of the sets
-- If it connects vertices in the same component, skip it
+* The *time* complexity of topological sorting is $O(V+E)$ because each vertex is enqueued exactly once and every edge is processed once when its indegree decreases; without this efficiency, ordering tasks in large dependency graphs would be slower.
+* The *space* complexity is $O(V)$, required for storing indegree counts, the processing queue, and the final output order; without allocating this space, the algorithm cannot track which vertices are ready to be placed.
+* The required *input* is a directed acyclic graph (DAG), since if a cycle exists, no valid topological order is possible; without this restriction, attempts to schedule cyclic dependencies, such as tasks that mutually depend on each other, will fail.
-IV. Once `V-1` edges have been added, the MST is complete
+**Example**
-##### Step by Step Example
+DAG; we’ll start with all indegree-0 vertices. (Edges shown as arrows.)
-Consider a graph with vertices **A**, **B**, **C**, **D**, and **E**. The weighted edges are:
+```
+            ┌───────┐
+            │   A   │
+            └───┬───┘
+                │
+                │
+┌───────┐   ┌───▼───┐   ┌───────┐
+│   B   │──►│   C   │──►│   D   │
+└───┬───┘   └───┬───┘   └───▲───┘
+    │           │           │
+    │           │           │
+    │       ┌───▼───┐       │
+    │       │   E   │───────┘
+    │       └───────┘
+    │
+┌───▼───┐               ┌───────┐
+│   G   │               │   F   │
+└───────┘               └───────┘
+Edges:
+A→C, B→C, C→D, C→E, E→D, B→G   (F is isolated: it has no edges)
```
-A-B: 2
-A-C: 3
-B-D: 1
-B-E: 3
-C-D: 4
-C-E: 5
-D-E: 2
+
+*Initial indegrees:*
+
+```
+indeg[A]=0, indeg[B]=0, indeg[C]=2, indeg[D]=2, indeg[E]=1, indeg[F]=0, indeg[G]=1
```
-The adjacency matrix (∞ indicates no direct edge):
+*Queue/Indegree evolution (front → back; assume we keep the queue **lexicographically** by using a min-heap):*
-| | A | B | C | D | E |
-|---|---|---|---|---|---|
-| **A** | 0 | 2 | 3 | ∞ | ∞ |
-| **B** | 2 | 0 | ∞ | 1 | 3 |
-| **C** | 3 | ∞ | 0 | 4 | 5 |
-| **D** | ∞ | 1 | 4 | 0 | 2 |
-| **E** | ∞ | 3 | 5 | 2 | 0 |
+```
+Step | Pop u | Emit order | Decrease indeg[...] | Newly 0 → Enqueue | Q after
+-----+-------+--------------------+------------------------------+-------------------+-----------------
+0 | — | [] | — | A, B, F | [A, B, F]
+1 | A | [A] | C:2→1 | — | [B, F]
+2 | B | [A, B] | C:1→0, G:1→0 | C, G | [C, F, G]
+3 | C | [A, B, C] | D:2→1, E:1→0 | E | [E, F, G]
+4 | E | [A, B, C, E] | D:1→0 | D | [D, F, G]
+5 | D | [A, B, C, E, D] | — | — | [F, G]
+6 | F | [A, B, C, E, D, F] | — | — | [G]
+7 | G | [A, B, C, E, D, F, G] | — | — | []
+```
+
+*A valid topological order:*
+`A, B, C, E, D, F, G` (others like `B, A, C, E, D, F, G` are also valid.)
-**Sort edges** by weight:
+*Clean left-to-right view (one possible ordering):*
```
-B-D: 1
-A-B: 2
-D-E: 2
-A-C: 3
-B-E: 3
-C-D: 4
-C-E: 5
+A B F C E D G
+│ │ │ │ │
+└──►└──► └──►└──►└──► (all arrows go left→right)
```
-1. **Pick B-D (1)**: Include it. MST has {B-D}, weight = 1.
-2. **Pick A-B (2)**: Include it. MST has {B-D, A-B}, weight = 3.
-3. **Pick D-E (2)**: Include it. MST has {B-D, A-B, D-E}, weight = 5.
-4. **Pick A-C (3)**: Include it. MST has {B-D, A-B, D-E, A-C}, weight = 8.
-5. **Pick B-E (3)**: Would form a cycle (B, D, E already connected), skip.
-6. **Pick C-D (4)**: Would form a cycle (C, D already connected), skip.
-7. **Pick C-E (5)**: Would form a cycle as well, skip.
+**Cycle detection (why it fails on cycles)**
-The MST edges are **B-D, A-B, D-E, and A-C**, total weight = **8**.
+If there’s a cycle, some vertices **never** reach indegree 0. Example:
-##### Special Characteristics
+```
+ ┌─────┐ ┌─────┐
+ │ X │ ───► │ Y │
+ └──┬──┘ └──┬──┘
+ └───────────►┘
+ (Y ───► X creates a cycle)
+```
-- It always picks the **smallest available edge** that won't create a cycle.
-- In case of a **tie**, any equally weighted edge can be chosen.
-- The approach is particularly efficient for **sparse graphs**.
-- Sorting edges takes **O(E \log E)** time, and disjoint-set operations can be considered almost **O(1)** on average.
+Here `indeg[X]=indeg[Y]=1` initially; `Q` starts empty ⇒ `order=[]` and `len(order) < V` ⇒ **cycle reported**.
-##### Applications
+**Applications**
-- **Network design**: Connecting servers or cities using minimal cable length.
-- **Infrastructure**: Building road systems, water lines, or power grids with the smallest total cost.
-- **Any MST requirement**: Ensuring connectivity among all nodes at minimum cost.
+* In *build systems and compilation*, topological sorting ensures that each file is compiled only after its prerequisites are compiled; without it, a build may fail by trying to compile a module before its dependencies are available.
+* For *course scheduling*, topological order provides a valid sequence in which to take courses respecting prerequisite constraints; without it, students may be assigned courses they are not yet eligible to take.
+* In *data pipelines and DAG workflows* such as Airflow or Spark, tasks are executed when their inputs are ready by following a topological order; without this, pipeline stages might run prematurely and fail due to missing inputs.
+* For *dependency resolution* in package managers or container systems, topological sorting installs components in an order that respects their dependencies; without it, software may be installed in the wrong sequence and break.
+* In *dynamic programming on DAGs*, problems like longest path, shortest path, or path counting are solved efficiently by processing vertices in topological order; without this ordering, subproblems may be computed before their dependencies are solved.
+* For *circuit evaluation or spreadsheets*, topological order ensures that each cell or net is evaluated only after its referenced inputs; without it, computations could use undefined or incomplete values.
-##### Implementation
+**Implementation**
-* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/kruskal)
-* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/kruskal)
+* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/topological_sort)
+* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/topological_sort)
+
+*Implementation tips:*
+
+* Use a **deque** for FIFO behavior; use a **min-heap** to get the **lexicographically smallest** topological order.
+* When the graph is large and sparse, store adjacency as **lists** and compute indegrees in one pass for $O(V+E)$.
+* **DFS variant** (brief): color states `0=unseen,1=visiting,2=done`; on exploring `u`, mark `1`; DFS to neighbors; if you see `1` again, there’s a cycle; on finish, push `u` to a stack. Reverse the stack for the order.
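+
+A short sketch of that DFS variant (recursive for brevity; `graph` is assumed to be a dict that lists every vertex as a key):
+
+```python
+def topo_sort_dfs(graph):
+    # graph: dict mapping vertex -> iterable of out-neighbors
+    UNSEEN, VISITING, DONE = 0, 1, 2
+    color = {v: UNSEEN for v in graph}
+    order = []
+
+    def visit(u):
+        color[u] = VISITING
+        for v in graph[u]:
+            if color[v] == VISITING:
+                raise ValueError("cycle detected")
+            if color[v] == UNSEEN:
+                visit(v)
+        color[u] = DONE
+        order.append(u)                 # postorder
+
+    for v in graph:
+        if color[v] == UNSEEN:
+            visit(v)
+    return order[::-1]                  # reverse postorder = topological order
+```
+
+Feeding it the example DAG above returns one valid order (which one depends on iteration order), and raising on a cycle replaces Kahn's length check.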
diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md
new file mode 100644
index 0000000..576bf9b
--- /dev/null
+++ b/notes/greedy_algorithms.md
@@ -0,0 +1,804 @@
+## Greedy algorithms
+
+Greedy algorithms build a solution one step at a time. At each step, grab the option that looks best *right now* by some simple rule (highest value, earliest finish, shortest length, etc.). Keep it if it doesn’t break the rules of the problem.
+
+1. Sort by your rule (the “key”).
+2. Scan items in that order.
+3. If adding this item keeps the partial answer valid, keep it.
+4. Otherwise skip it.
+
+Picking the best “now” doesn’t obviously give the best “overall.” The real work is showing that these local choices still lead to a globally best answer.
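+
+Written once as a generic skeleton (a sketch; `key` and `fits` stand for whatever sorting rule and feasibility test a specific problem supplies):
+
+```python
+def greedy(items, key, fits):
+    # Sort by the rule, scan, keep an item iff it stays feasible.
+    chosen = []
+    for item in sorted(items, key=key):
+        if fits(item, chosen):
+            chosen.append(item)
+    return chosen
+```
+
+Interval scheduling later in these notes instantiates this with the finish time as `key` and "starts after the last kept finish" as `fits`.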
+
+**Two proof tricks you’ll see a lot:**
+
+* *Exchange argument.* Take any optimal solution that disagrees with greedy at the first point. Show you can “swap in” the greedy choice there without making the solution worse or breaking feasibility. Do this repeatedly and you morph some optimal solution into the greedy one—so greedy must be optimal.
+* *Loop invariant.* Write down a sentence that’s true after every step of the scan (e.g., “the current set is feasible and as good as any other set built from the items we’ve seen”). Prove it stays true as you process the next item; at the end, that sentence implies optimality.
+
+*Picture it like this:*
+
+```
+position → 1 2 3 4 5
+greedy: [✓] [✗] [✓] [✓] [✗]
+some optimal:
+ ✓ ✓ ✗ ? ?
+First mismatch at 3 → swap in greedy’s pick without harm.
+Repeat until both rows match → greedy is optimal.
+```
+
+**Where greedy shines automatically: matroids (nice constraint systems).**
+There’s a tidy setting where greedy is *always* right (for nonnegative weights): when your “what’s allowed” rules form a **matroid**. You don’t need the symbols—just the vibe:
+
+1. **You can start from empty.**
+2. **Throwing things out never hurts.** If a set is allowed, any subset is allowed.
+3. **Smooth growth (augmentation).** If one allowed set is smaller than another, you can always add *something* from the bigger one to the smaller and stay allowed.
+
+That third rule prevents dead ends and is exactly what exchange arguments rely on. In matroids, the simple “sort by weight and take what fits” greedy is guaranteed optimal. Outside matroids, greedy can still work—but you must justify it for the specific problem using exchange/invariants.
+
+### Reachability on a line
+
+- You stand at square $0$ on squares $0,1,\ldots,n-1$.
+- Each square $i$ has a jump power $a[i]$. From $i$ you may land on any of $i+1, i+2, \dots, i+a[i]$.
+- Goal: decide if you can reach $n-1$; if not, report the furthest reachable square.
+
+**Example**
+
+Input: $a=[3,1,0,0,4,1]$, so $n=6$ (squares $0..5$).
+
+```
+indices: 0 1 2 3 4 5
+a[i] : 3 1 0 0 4 1
+reach : ^ start at 0
+```
+
+From any $i$, the allowed landings are a range:
+
+```
+i=0 (a[0]=3): 1..3
+i=1 (a[1]=1): 2
+i=2 (a[2]=0): —
+i=3 (a[3]=0): —
+i=4 (a[4]=4): 5..8 (board ends at 5)
+```
+
+**Baseline idea**
+
+“Paint everything reachable, one wave at a time.”
+
+1. Start with $\{0\}$ reachable.
+2. For each already-reachable $i$, add all $i+1..i+a[i]$.
+3. Stop when nothing new appears.
+
+*Walkthrough:*
+
+```
+start: reachable = {0}
+from 0: add {1,2,3} → reachable = {0,1,2,3}
+from 1: add {2} → no change
+from 2: add {} → a[2]=0
+from 3: add {} → a[3]=0
+stop: no new squares → furthest = 3; last (5) unreachable
+```
+
+Correct, but can reprocess many squares.
+
+**One-pass trick**
+
+Carry one number while scanning left→right: the furthest frontier $F$ seen so far.
+
+Rules:
+
+* If you are at $i$ with $i>F$, you hit a gap → stuck forever.
+* Otherwise, extend $F \leftarrow \max(F, i+a[i])$ and continue.
+
+At the end:
+
+* Can reach last iff $F \ge n-1$.
+* Furthest reachable square is $F$ (capped by $n-1$).
+
+*Pseudocode*
+
+```
+F = 0
+for i in 0..n-1:
+ if i > F: break
+ F = max(F, i + a[i])
+
+can_reach_last = (F >= n-1)
+furthest = min(F, n-1)
+```
+
+Why this is safe (one line): $F$ always equals “best jump end discovered from any truly-reachable square $\le i$,” and never decreases; if $i>F$, no earlier jump can help because its effect was already folded into $F$.
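+
+The same rule as runnable Python (a direct transcription of the pseudocode above):
+
+```python
+def furthest_reach(a):
+    # a[i] = jump power at square i; returns (can_reach_last, furthest_square)
+    n = len(a)
+    F = 0                       # furthest frontier discovered so far
+    for i in range(n):
+        if i > F:               # gap: square i is unreachable
+            break
+        F = max(F, i + a[i])
+    return F >= n - 1, min(F, n - 1)
+```
+
+On the example it returns `(False, 3)`: the last square is unreachable and the furthest reachable square is 3.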
+
+*Walkthrough:*
+
+We draw the frontier as a bracket reaching to $F$.
+
+Step $i=0$ (inside frontier since $0\le F$); update $F=\max(0,0+3)=3$.
+
+```
+indices: 0 1 2 3 4 5
+ [===============F]
+ 0 1 2 3
+F=3
+```
+
+Step $i=1$: still $i\le F$. Update $F=\max(3,1+1)=3$ (no change).
+Step $i=2$: $F=\max(3,2+0)=3$ (no change).
+Step $i=3$: $F=\max(3,3+0)=3$ (no change).
+
+Now $i=4$ but $4>F(=3)$ → gap → stuck.
+
+```
+indices: 0 1 2 3 4 5
+ [===============F] x (i=4 is outside)
+F=3
+```
+
+Final: $F=3$. Since $F < n-1 = 5$, the last square is unreachable; the furthest reachable square is $3$.
+
+*Complexity*
+
+* Time: $O(n)$ (one pass)
+* Space: $O(1)$
+
+### Single-source shortest paths (Dijkstra)
+
+Given a weighted graph with non-negative edge weights and a source vertex, compute the distance $d[v]$ from the source to every vertex $v$, together with parent pointers $\pi[v]$ for reconstructing routes. The greedy step: repeatedly settle the unsettled vertex with the smallest tentative distance; with non-negative weights that distance can never improve later.
+
+**Example inputs and outputs**
+
+Source $A$; edges A–B(2), A–C(5), B–C(1), B–D(2), B–E(7), C–D(3), C–E(1), D–E(2). Expected distances from $A$: A:0, B:2, C:3, D:4, E:4.
+
+*Pseudocode (min-heap $H$ of $(d[v], v)$ pairs):*
+
+```
+Dijkstra(G, s):
+    d[v] = ∞ and π[v] = - for every vertex v;  d[s] = 0
+    S = ∅                        # settled vertices
+    H = min-heap; push (0, s) into H
+    while H not empty:
+        (du, u) = pop_min(H)
+        if u in S: continue      # stale entry
+        add u to S
+        for each edge (u, v) with weight w:
+            if d[v] > d[u] + w:
+                d[v] = d[u] + w
+                π[v] = u
+                push (d[v], v) into H
+```
+
+Time $O((|V|+|E|)\log|V|)$; space $O(|V|)$.
+
+*Walkthrough*
+
+Legend: “S” = settled, “π\[x]” = parent of $x$. Ties break arbitrarily.
+
+Round 0 (init)
+
+```
+S = ∅
+d: A:0 B:∞ C:∞ D:∞ E:∞
+π: A:- B:- C:- D:- E:-
+```
+
+Round 1 — pick min unsettled → A(0); relax neighbors
+
+```
+S = {A}
+relax A-B (2): d[B]=2 π[B]=A
+relax A-C (5): d[C]=5 π[C]=A
+d: A:0S B:2 C:5 D:∞ E:∞
+π: A:- B:A C:A D:- E:-
+```
+
+Round 2 — pick B(2); relax
+
+```
+S = {A,B}
+B→C (1): 2+1=3 <5 → d[C]=3 π[C]=B
+B→D (2): 2+2=4 → d[D]=4 π[D]=B
+B→E (7): 2+7=9 → d[E]=9 π[E]=B
+d: A:0S B:2S C:3 D:4 E:9
+π: A:- B:A C:B D:B E:B
+```
+
+Round 3 — pick C(3); relax
+
+```
+S = {A,B,C}
+C→D (3): 3+3=6 (no improv; keep 4)
+C→E (1): 3+1=4 <9 → d[E]=4 π[E]=C
+d: A:0S B:2S C:3S D:4 E:4
+π: A:- B:A C:B D:B E:C
+```
+
+Round 4 — pick D(4); relax
+
+```
+S = {A,B,C,D}
+D→E (2): 4+2=6 (no improv; keep 4)
+d: A:0S B:2S C:3S D:4S E:4
+```
+
+Round 5 — pick E(4); done
+
+```
+S = {A,B,C,D,E} (all settled)
+Final d: A:0 B:2 C:3 D:4 E:4
+Parents π: B←A, C←B, D←B, E←C
+```
+
+Reconstruct routes by following parents backward:
+
+* $B$: $A\to B$
+* $C$: $A\to B\to C$
+* $D$: $A\to B\to D$
+* $E$: $A\to B\to C\to E$
+
+Complexity
+
+* Time: $O((V+E)\log V)$ with a binary heap (often written $O(E \log V)$ when $E\ge V$).
+* Space: $O(V)$ for distances, parent pointers, and heap entries.
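+
+A heap-based Python version of the same procedure (a sketch; the adjacency-list format `graph[u] = [(v, w), ...]` is an assumption):
+
+```python
+import heapq
+
+def dijkstra(graph, source):
+    # graph: dict u -> list of (v, w) pairs with w >= 0
+    dist = {source: 0}
+    parent = {source: None}
+    settled = set()
+    heap = [(0, source)]
+    while heap:
+        du, u = heapq.heappop(heap)
+        if u in settled:
+            continue                     # stale duplicate entry
+        settled.add(u)
+        for v, w in graph.get(u, ()):
+            cand = du + w
+            if cand < dist.get(v, float("inf")):
+                dist[v] = cand
+                parent[v] = u
+                heapq.heappush(heap, (cand, v))
+    return dist, parent
+```
+
+Storing the example graph undirected (each edge in both adjacency lists) and running `dijkstra(graph, "A")` reproduces the final distances A:0, B:2, C:3, D:4, E:4.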
+
+### Maximum contiguous sum
+
+You’re given a list of numbers laid out in a line. You may pick one **contiguous** block, and you want that block’s sum to be as large as possible.
+
+**Example inputs and outputs**
+
+```
+x = [ 2, -3, 4, -1, 2, -5, 3 ]
+best block = [ 4, -1, 2 ] → sum = 5
+```
+
+So the correct output is “maximum sum $=5$” and one optimal segment is positions $3$ through $5$ (1-based).
+
+*Baseline*
+
+Try every possible block and keep the best total. To sum any block $i..j$ quickly, precompute **prefix sums** $S_0=0$ and $S_j=\sum_{k=1}^j x_k$. Then
+
+$$
+\sum_{k=i}^j x_k = S_j - S_{i-1}
+$$
+
+Loop over all $j$ and all $i\le j$, compute $S_j-S_{i-1}$, and take the maximum. This is easy to reason about and always correct, but it does $O(n^2)$ block checks.
+
+**How it works**
+
+Walk left to right once and carry two simple numbers.
+
+* $S$: the running prefix sum up to the current position.
+* $M$: the **smallest** prefix seen so far (to the left of the current position).
+
+At each step $j$, the best block **ending at** $j$ is “current prefix minus the smallest older prefix”:
+
+$$
+\text{best ending at } j = S_j - \min_{0\le t < j} S_t
+$$
+
+So during the scan:
+
+1. Update $S \leftarrow S + x_j$.
+2. Update the answer with $S - M$.
+3. Update $M \leftarrow \min(M, S)$.
+
+This is the whole algorithm. In words: keep the lowest floor you’ve ever seen and measure how high you are above it now. If you dip to a new floor, remember it; if you rise, maybe you’ve set a new record.
+
+A widely used equivalent form keeps a “best sum ending here” value $E$: set $E \leftarrow \max(x_j,\, E+x_j)$ and track a global maximum. It’s the same idea written incrementally: if the running sum ever hurts you, you “reset” and start fresh at the current element.
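+
+That incremental form as a small sketch (non-empty-block convention):
+
+```python
+def max_contiguous_sum(x):
+    # Kadane's scan: E = best sum of a block ending here, best = global record
+    best = E = x[0]
+    for value in x[1:]:
+        E = max(value, E + value)   # extend the current block or restart at `value`
+        best = max(best, E)
+    return best
+```
+
+On the example sequence it returns $5$.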
+
+*Walkthrough*
+
+Sequence $x = [2,-3,4,-1,2,-5,3]$.
+
+Initialize $S=0$, $M=0$, and $\text{best}=-\infty$. Keep the index $t$ where the current $M$ occurred so we can reconstruct the block as $(t+1)..j$.
+
+```
+ j x_j S_j = S+x_j M (min prefix so far) S_j - M best chosen block
+--------------------------------------------------------------------------------
+ 1 2 2 0 2 2 (1..1)
+ update: M = min(0,2) = 0
+
+ 2 -3 -1 0 -1 2 (still 1..1)
+ update: M = min(0,-1) = -1 [new floor at t=2]
+
+ 3 4 3 -1 4 4 (3..3)
+ update: M = min(-1,3) = -1
+
+ 4 -1 2 -1 3 4 (still 3..3)
+ update: M = min(-1,2) = -1
+
+ 5 2 4 -1 5 5 (3..5) ✓
+ update: M = min(-1,4) = -1
+
+ 6 -5 -1 -1 0 5 (still 3..5)
+ update: M = min(-1,-1) = -1
+
+ 7 3 2 -1 3 5 (still 3..5)
+ update: M = min(-1,2) = -1
+```
+
+Final answer: maximum sum $=5$, achieved by indices $3..5$ (that’s $[4,-1,2]$).
+
+You can picture $S_j$ as a hilly skyline and $M$ as the lowest ground you’ve touched. The best block is the tallest vertical gap between the skyline and any earlier ground level.
+
+```
+prefix S: 0 → 2 → -1 → 3 → 2 → 4 → -1 → 2
+ground M: 0 0 -1 -1 -1 -1 -1 -1
+gap S-M: 0 2 0 4 3 5 0 3
+ ^ peak gap = 5 here
+```
+
+Pseudocode (prefix-floor form):
+
+```
+best = -∞ # or x[0] if you require non-empty
+S = 0
+M = 0 # 0 makes empty prefix available
+t = 0 # index where M happened (0 means before first element)
+best_i = best_j = None
+
+for j in 1..n:
+ S = S + x[j]
+ if S - M > best:
+ best = S - M
+ best_i = t + 1
+ best_j = j
+ if S < M:
+ M = S
+ t = j
+
+return best, (best_i, best_j)
+```
+
+*Edge cases*
+
+When all numbers are negative, the best block is the **least negative single element**. The scan handles this automatically because $M$ keeps dropping with every step, so the maximum of $S_j-M$ happens when you take just the largest entry.
+
+Empty-block conventions matter. If you define the answer to be strictly nonempty, initialize $\text{best}$ with $x_1$ and $E=x_1$ in the incremental form; if you allow empty blocks with sum $0$, initialize $\text{best}=0$ and $M=0$. Either way, the one-pass logic doesn’t change.
+
+*Complexity*
+
+* Time: $O(n)$
+* Space: $O(1)$
+
+### Scheduling themes
+
+Two classics:
+
+- Pick as many non-overlapping intervals as possible (one room, max meetings).
+- Keep maximum lateness small when jobs have deadlines.
+
+They’re both greedy—and both easy to run by hand.
+
+Imagine you have time intervals on a single line, and you can keep an interval only if it doesn’t overlap anything you already kept. The aim is to keep as many as possible.
+
+**Example inputs and outputs**
+
+Intervals (start, finish):
+
+```
+(1,3) (2,5) (4,7) (6,9) (8,10) (9,11)
+```
+
+A best answer keeps three intervals, for instance $(1,3),(4,7),(8,10)$.
+
+**Baseline (slow)**
+
+Try all subsets and keep the largest that has no overlaps. That’s conceptually simple and always correct, but it’s exponential in the number of intervals, which is a non-starter for anything but tiny inputs.
+
+**Greedy rule:**
+
+Sort by finish time and take what fits.
+
+- Scan from earliest finisher to latest.
+- Keep $(s,e)$ iff $s \ge \text{last end}$; then set $\text{last end}\leftarrow e$.
+
+Sorted by finish:
+
+$$
+(1,3), (2,5), (4,7), (6,9), (8,10), (9,11)
+$$
+
+Run the scan and track the end of the last kept interval.
+
+```
+last_end = -∞
+(1,3): 1 ≥ -∞ → keep → last_end = 3
+(2,5): 2 < 3 → skip
+(4,7): 4 ≥ 3 → keep → last_end = 7
+(6,9): 6 < 7 → skip
+(8,10): 8 ≥ 7 → keep → last_end = 10
+(9,11): 9 < 10 → skip
+```
+
+Kept intervals: $(1,3),(4,7),(8,10)$, so three are kept, which matches the claim. Because the rule is $s \ge \text{last end}$, an interval that started exactly at $10$ would also have been accepted had one existed.
+
+A tiny picture helps the “finish early” idea feel natural:
+
+```
+time →
+kept: [1────3) [4─────7) [8────10)
+skip: [2────5) [6──────9)[9─────11)
+ending earlier leaves more open space to the right
+```
+
+Why this works: at the first place an optimal schedule would choose a later-finishing interval, swapping in the earlier finisher cannot reduce what still fits afterward, so you can push the optimal schedule to match greedy without losing size.
+
+Handy pseudocode
+
+```python
+# Interval scheduling (max cardinality): sort by finish time, keep what fits
+intervals.sort(key=lambda iv: iv[1])      # iv = (start, end); sort by end
+last_end = float("-inf")
+keep = []
+for (s, e) in intervals:
+    if s >= last_end:
+        keep.append((s, e))
+        last_end = e
+```
+
+*Complexity*
+
+* Time: $O(n \log n)$ to sort by finishing time; $O(n)$ scan.
+* Space: $O(1)$ (beyond input storage).
+
+### Minimize the maximum lateness
+
+Now think of $n$ jobs, all taking the same amount of time (say one unit). Each job $i$ has a deadline $d_i$. When you run them in some order, the completion time of the $k$-th job is $C_k=k$ (since each takes one unit), and its lateness is
+
+$$
+L_i = C_i - d_i.
+$$
+
+Negative values mean you finished early; the quantity to control is the worst lateness $L_{\max}=\max_i L_i$. The goal is to order the jobs so $L_{\max}$ is as small as possible.
+
+**Example inputs and outputs**
+
+Jobs and deadlines:
+
+* $J_1: d_1=3$
+* $J_2: d_2=1$
+* $J_3: d_3=4$
+* $J_4: d_4=2$
+
+An optimal schedule is $J_2,J_4, J_1, J_3$. The maximum lateness there is $0$.
+
+**Baseline (slow)**
+
+Try all $n!$ orders, compute every job’s completion time and lateness, and take the order with the smallest $L_{\max}$. This explodes even for modest $n$.
+
+**Greedy rule**
+
+Order jobs by nondecreasing deadlines (earliest due date first, often called EDD). Fixing any “inversion” where a later deadline comes before an earlier one can only help the maximum lateness, so sorting by deadlines is safe.
+
+Deadlines in increasing order:
+
+$$
+J_2(d=1), J_4(d=2), J_1(d=3), J_3(d=4)
+$$
+
+Run them one by one and compute completion times and lateness.
+
+```
+slot 1: J2 finishes at C=1 → L2 = 1 - d2(=1) = 0
+slot 2: J4 finishes at C=2 → L4 = 2 - d4(=2) = 0
+slot 3: J1 finishes at C=3 → L1 = 3 - d1(=3) = 0
+slot 4: J3 finishes at C=4 → L3 = 4 - d3(=4) = 0
+L_max = 0
+```
+
+If you scramble the order, the worst lateness jumps. For example, $J_1,J_2,J_3,J_4$ gives
+
+```
+slot 1: J1 → L1 = 1 - 3 = -2
+slot 2: J2 → L2 = 2 - 1 = 1
+slot 3: J3 → L3 = 3 - 4 = -1
+slot 4: J4 → L4 = 4 - 2 = 2
+L_max = 2 (worse)
+```
+
+A quick timeline sketch shows how EDD keeps you out of trouble:
+
+```
+time → 1 2 3 4
+EDD: [J2][J4][J1][J3] deadlines: 1 2 3 4
+late? 0 0 0 0 → max lateness 0
+```
+
+Why this works: if two adjacent jobs are out of deadline order, swapping them never increases any completion time relative to its own deadline, and strictly improves at least one, so repeatedly fixing these inversions leads to the sorted-by-deadline order with no worse maximum lateness.
+
+Pseudocode
+
+```
+# Minimize L_max (EDD)
+sort jobs by increasing deadline d_j
+t = 0; Lmax = -∞
+for job j in order:
+ t += p_j # completion time C_j
+ L = t - d_j
+ Lmax = max(Lmax, L)
+return order, Lmax
+```
+
+*Complexity*
+
+* Time: $O(n \log n)$ to sort by deadlines; $O(n)$ evaluation.
+* Space: $O(1)$.
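+
+A tiny sketch that evaluates the EDD order (jobs given as `(name, processing_time, deadline)` triples, an assumed format that also covers non-unit processing times):
+
+```python
+def edd_schedule(jobs):
+    # jobs: list of (name, processing_time, deadline)
+    order = sorted(jobs, key=lambda job: job[2])     # earliest due date first
+    t, worst = 0, float("-inf")
+    for name, p, d in order:
+        t += p                                       # completion time C_j
+        worst = max(worst, t - d)                    # lateness L_j = C_j - d_j
+    return [name for name, _, _ in order], worst
+```
+
+With the four unit-time jobs above it returns the order `J2, J4, J1, J3` and maximum lateness `0`.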
+
+### Huffman coding
+
+You have symbols that occur with known frequencies $f_i>0$ and $\sum_i f_i=1$ (if you start with counts, first normalize by their total). The goal is to assign each symbol a binary codeword so that no codeword is a prefix of another (a **prefix code**, i.e., uniquely decodable without separators), and the average length
+
+$$
+\mathbb{E}[L]=\sum_i f_i\,L_i
+$$
+
+is as small as possible. Prefix codes correspond exactly to **full binary trees** (every internal node has two children) whose leaves are the symbols and whose leaf depths equal the codeword lengths $L_i$. The **Kraft inequality** $\sum_i 2^{-L_i}\le 1$ characterizes feasibility; equality holds for full trees (so an optimal prefix code “fills” the inequality).
+
+**Example inputs and outputs**
+
+Frequencies:
+
+$$
+A:0.40,\quad B:0.20,\quad C:0.20,\quad D:0.10,\quad E:0.10.
+$$
+
+A valid optimal answer will be a prefix code with expected length as small as possible. We will compute the exact minimum and one optimal set of lengths $L_A,\dots,L_E$, plus a concrete codebook. (There can be multiple optimal codebooks when there are ties in frequencies; they all achieve the same expected length, though the individual code lengths and bitstrings may differ.)
+
+**Baseline**
+
+One conceptual baseline is to enumerate all full binary trees with five labeled leaves and pick the one minimizing $\sum f_i\,L_i$. That is correct but explodes combinatorially as the number of symbols grows. A simpler but usually suboptimal baseline is to give every symbol the same length $\lceil \log_2 5\rceil=3$. That fixed-length code has $\mathbb{E}[L]=3$.
+
+**Greedy Approach**
+
+Huffman’s rule repeats one tiny step: always merge the two least frequent items. When you merge two “symbols” with weights $p$ and $q$, you create a parent of weight $p+q$. **Why does this change the objective by exactly $p+q$?** Every leaf in those two subtrees increases its depth (and thus its code length) by $1$, so the total increase in $\sum f_i L_i$ is $\sum_{\ell\in\text{subtrees}} f_\ell\cdot 1=(p+q)$ by definition of $p$ and $q$. Summing over all merges yields the final cost:
+
+$$
+\mathbb{E}[L]=\sum_{\text{merges}} (p+q)=\sum_{\text{internal nodes}} \text{weight}.
+$$
+
+**Why is the greedy choice optimal?** In some optimal tree the two least frequent symbols are siblings at the greatest depth: making the two deepest leaves siblings never increases any other depth, and whenever a heavier symbol sits deeper than a lighter one, swapping the two reduces the cost by $(f_{\text{heavy}}-f_{\text{light}})(d_{\text{deep}}-d_{\text{shallow}})>0$ (an **exchange argument**). Collapsing those siblings into a single pseudo-symbol reduces the problem size without changing optimality, so induction finishes the proof. (Ties can be broken arbitrarily; all tie-breaks achieve the same minimum $\mathbb{E}[L]$.)
+
+Start with the multiset $\{0.40, 0.20, 0.20, 0.10, 0.10\}$. At each line, merge the two smallest weights and add their sum to the running cost.
+
+```
+1) merge 0.10 + 0.10 → 0.20 cost += 0.20 (total 0.20)
+ multiset becomes {0.20, 0.20, 0.20, 0.40}
+
+2) merge 0.20 + 0.20 → 0.40 cost += 0.40 (total 0.60)
+ multiset becomes {0.20, 0.40, 0.40}
+
+3) merge 0.20 + 0.40 → 0.60 cost += 0.60 (total 1.20)
+ multiset becomes {0.40, 0.60}
+
+4) merge 0.40 + 0.60 → 1.00 cost += 1.00 (total 2.20)
+ multiset becomes {1.00} (done)
+```
+
+So the optimal expected length is $\boxed{\mathbb{E}[L]=2.20}$ bits per symbol. This already beats the naive fixed-length baseline $3$. It also respects the information-theoretic bound $H(f)\le \mathbb{E}[L] < H(f)+1$ (here $H(f)\approx 2.12$). One optimal tree, with internal nodes showing their merged weight and edges labeled by code bits:
+
+```
+[1.00]
++--0--> [0.60]
+|       +--0--> A(0.40)
+|       `--1--> [0.20]
+|               +--0--> D(0.10)
+|               `--1--> E(0.10)
+`--1--> [0.40]
+        +--0--> B(0.20)
+        `--1--> C(0.20)
+```
+
+One concrete codebook arises by reading left edges as 0 and right edges as 1 (the left/right choice is arbitrary; flipping all bits in a subtree yields an equivalent optimal code):
+
+* $A \mapsto 00$
+* $B \mapsto 10$
+* $C \mapsto 11$
+* $D \mapsto 010$
+* $E \mapsto 011$
+
+You can verify the prefix property immediately and recompute $\mathbb{E}[L]$ from these lengths to get $2.20$ again. (From these lengths you can also construct the **canonical Huffman code**, which orders codewords lexicographically—useful for compactly storing the codebook.)
+
+*Complexity*
+
+* Time: $O(k \log k)$ using a min-heap over $k$ symbol frequencies (each of the $k-1$ merges performs two extractions and one insertion).
+* Space: $O(k)$ for the heap and $O(k)$ for the resulting tree (plus $O(k)$ for an optional map from symbols to codewords).
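+
+A heap-based sketch that reproduces the merge trace and the $2.20$ figure (input as a `symbol -> frequency` dict with at least two symbols is an assumption):
+
+```python
+import heapq
+from itertools import count
+
+def huffman_lengths(freqs):
+    # freqs: dict symbol -> frequency (at least two symbols).
+    # Returns (dict symbol -> code length, expected length sum f_i * L_i).
+    ticket = count()                              # total tie-break, avoids comparing dicts
+    heap = [(f, next(ticket), {s: 0}) for s, f in freqs.items()]
+    heapq.heapify(heap)
+    expected = 0.0
+    while len(heap) > 1:
+        f1, _, d1 = heapq.heappop(heap)           # two least frequent items
+        f2, _, d2 = heapq.heappop(heap)
+        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
+        expected += f1 + f2                       # every leaf below gains one level: cost p + q
+        heapq.heappush(heap, (f1 + f2, next(ticket), merged))
+    return heap[0][2], expected
+```
+
+For the example frequencies it reports expected length $2.20$ with code lengths A:2, B:2, C:2, D:3, E:3 (one optimal assignment; ties permit others with the same expected length).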
diff --git a/notes/matrices.md b/notes/matrices.md
new file mode 100644
index 0000000..dee57a8
--- /dev/null
+++ b/notes/matrices.md
@@ -0,0 +1,529 @@
+## Matrices and 2D Grids
+
+Matrices represent images, game boards, and maps. Many classic problems reduce to transforming matrices, traversing them, or treating grids as graphs for search.
+
+### Conventions
+
+**Rows indexed $0..R-1$, columns $0..C-1$; cell $(r,c)$.**
+
+Rows increase **down**, columns increase **right**. Think “top-left is $(0,0)$”, not a Cartesian origin.
+
+Visual index map (example $R=6$, $C=8$; each cell labeled $rc$):
+
+```
+ c → 0 1 2 3 4 5 6 7
+r ↓ +----+----+----+----+----+----+----+----+
+0 | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 |
+ +----+----+----+----+----+----+----+----+
+1 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
+ +----+----+----+----+----+----+----+----+
+2 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
+ +----+----+----+----+----+----+----+----+
+3 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 |
+ +----+----+----+----+----+----+----+----+
+4 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 |
+ +----+----+----+----+----+----+----+----+
+5 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 |
+ +----+----+----+----+----+----+----+----+
+```
+
+Handy conversions (for linearization / array-of-arrays):
+
+* Linear index: $\text{id}=r\cdot C+c$.
+* From id: $r=\lfloor \text{id}/C \rfloor$, $c=\text{id}\bmod C$.
+* Row-major scan order (common in problems): for $r$ in $0..R-1$, for $c$ in $0..C-1$.
+
+**Row-major vs column-major arrows (same $3\times 6$ grid):**
+
+```
+Row-major (r, then c): Column-major (c, then r):
+→ → → → → → ↓ ↓ ↓
+↓ ↓ ↓ ↓ ↓
+← ← ← ← ← ← ↓ ↓ ↓
+↓ ↓ ↓ ↓ ↓
+→ → → → → → ↓ ↓ ↓
+```
+
+**Neighborhoods: $\mathbf{4}$-dir $\Delta=\{(-1,0),(1,0),(0,-1),(0,1)\}$; $\mathbf{8}$-dir adds diagonals.**
+
+The offsets $(\Delta r,\Delta c)$ are applied as $(r+\Delta r,\ c+\Delta c)$.
+
+**4-neighborhood (“+”):**
+
+```
+ (r-1,c)
+ ↑
+ (r,c-1) ← (r,c) → (r,c+1)
+ ↓
+ (r+1,c)
+```
+
+**8-neighborhood (“×” adds diagonals):**
+
+```
+ (r-1,c-1) (r-1,c) (r-1,c+1)
+ \ ↑ /
+ \ │ /
+ (r,c-1) ←——— (r,c) ———→ (r,c+1)
+ / │ \
+ / ↓ \
+ (r+1,c-1) (r+1,c) (r+1,c+1)
+```
+
+Typical direction arrays (keep them consistent to avoid bugs):
+
+```
+// 4-dir
+dr = [-1, 1, 0, 0]
+dc = [ 0, 0, -1, 1]
+
+// 8-dir
+dr8 = [-1,-1,-1, 0, 0, 1, 1, 1]
+dc8 = [-1, 0, 1,-1, 1,-1, 0, 1]
+```
+
+**Boundary checks** (always guard neighbors):
+
+```
+0 ≤ nr < R and 0 ≤ nc < C
+```
+
+**Edge/inside intuition:**
+
+```
+ out of bounds
+ ┌─────────────────┐
+ │ · · · · · · · · │
+ │ · +---+---+---+ │
+ │ · | a | b | c | │ ← valid cells
+ │ · +---+---+---+ │
+ │ · | d | e | f | │
+ │ · +---+---+---+ │
+ │ · · · · · · · · │
+ └─────────────────┘
+```
+
+### Basic Operations (Building Blocks)
+
+#### Transpose
+
+Swap across the main diagonal: $A_{r,c} \leftrightarrow A_{c,r}$ (square). For non-square, result shape is $C\times R$.
+
+**Example inputs and outputs**
+
+*Example 1 (square)*
+
+$$
+A = \begin{bmatrix}
+1 & 2 & 3 \\
+4 & 5 & 6 \\
+7 & 8 & 9
+\end{bmatrix}
+\quad\Rightarrow\quad
+A^{\mathsf{T}} =
+\begin{bmatrix}
+1 & 4 & 7 \\
+2 & 5 & 8 \\
+3 & 6 & 9
+\end{bmatrix}
+$$
+
+*Example 2 (rectangular)*
+
+$$
+\text{Input: } \quad
+A = \begin{bmatrix}
+1 & 2 & 3 \\
+4 & 5 & 6
+\end{bmatrix}
+\ (2 \times 3)
+$$
+
+$$
+\text{Output: } \quad
+A^{\mathsf{T}} = \begin{bmatrix}
+1 & 4 \\
+2 & 5 \\
+3 & 6
+\end{bmatrix}
+\ (3 \times 2)
+$$
+
+**How it works**
+
+Iterate over index pairs once and swap. For square matrices, this can be done in place by visiting only $c>r$.
+
+* Time: $O(R\cdot C)$
+* Space: $O(1)$ in-place (square), else $O(R\cdot C)$ to allocate
+
+#### Reverse Rows (Horizontal Flip)
+
+Reverse each row left $\leftrightarrow$ right.
+
+**Example inputs and outputs**
+
+*Example*
+
+$$
+\text{Input: }
+\begin{bmatrix}
+1 & 2 & 3 \\
+4 & 5 & 6
+\end{bmatrix}
+\quad\Rightarrow\quad
+\text{Output: }
+\begin{bmatrix}
+3 & 2 & 1 \\
+6 & 5 & 4
+\end{bmatrix}
+$$
+
+* Time: $O(R\cdot C)$
+* Space: $O(1)$
+
+#### Reverse Columns (Vertical Flip)
+
+Reverse each column top $\leftrightarrow$ bottom.
+
+**Example inputs and outputs**
+
+*Example*
+
+$$
+\text{Input: }
+\begin{bmatrix}
+1 & 2 & 3 \\
+4 & 5 & 6 \\
+7 & 8 & 9
+\end{bmatrix}
+\quad\Rightarrow\quad
+\text{Output: }
+\begin{bmatrix}
+7 & 8 & 9 \\
+4 & 5 & 6 \\
+1 & 2 & 3
+\end{bmatrix}
+$$
+
+* Time: $O(R\cdot C)$
+* Space: $O(1)$
+
+### Rotations (Composed from Basics)
+
+Use transpose + reversals for square in-place rotations; rectangular rotations produce new shape $(R\times C)\to(C\times R)$.
+
+#### 90° Clockwise (CW)
+
+Transpose, then reverse each row.
+
+**Example inputs and outputs**
+
+*Example 1 (3×3)*
+
+$$
+\text{Input: }
+\begin{bmatrix}
+1 & 2 & 3 \\
+4 & 5 & 6 \\
+7 & 8 & 9
+\end{bmatrix}
+\quad\Rightarrow\quad
+\text{Output: }
+\begin{bmatrix}
+7 & 4 & 1 \\
+8 & 5 & 2 \\
+9 & 6 & 3
+\end{bmatrix}
+$$
+
+*Example 2 (2×3 → 3×2)*
+
+$$
+\text{Input: }
+\begin{bmatrix}
+1 & 2 & 3 \\
+4 & 5 & 6
+\end{bmatrix}
+\quad\Rightarrow\quad
+\text{Output: }
+\begin{bmatrix}
+4 & 1 \\
+5 & 2 \\
+6 & 3
+\end{bmatrix}
+$$
+
+**How it works**
+
+Transpose swaps axes; reversing each row aligns columns to rows of the rotated image.
+
+* Time: $O(R\cdot C)$
+* Space: $O(1)$ in-place for square, else $O(R\cdot C)$ new
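+
+The same composition as a tiny sketch that allocates a new matrix (`zip(*A)` performs the transpose):
+
+```python
+def rotate_cw(A):
+    # transpose (zip(*A)), then reverse each row -> 90° clockwise copy
+    return [list(row)[::-1] for row in zip(*A)]
+```
+
+On the $2\times 3$ example it returns `[[4, 1], [5, 2], [6, 3]]`.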
+
+#### 90° Counterclockwise (CCW)
+
+Transpose, then reverse each column (or reverse rows, then transpose).
+
+**Example inputs and outputs**
+
+*Example*
+
+$$
+\text{Input: }
+\begin{bmatrix}
+1 & 2 & 3 \\
+4 & 5 & 6 \\
+7 & 8 & 9
+\end{bmatrix}
+\quad\Rightarrow\quad
+\text{Output: }
+\begin{bmatrix}
+3 & 6 & 9 \\
+2 & 5 & 8 \\
+1 & 4 & 7
+\end{bmatrix}
+$$
+
+**How it works**
+
+Transpose, then flip vertically to complete the counterclockwise rotation.
+
+* Time: $O(R\cdot C)$
+* Space: $O(1)$ (square) or $O(R\cdot C)$
+
+#### 180° Rotation
+
+Equivalent to reversing rows, then reversing columns (or two 90° rotations).
+
+**Example inputs and outputs**
+
+*Example*
+
+$$
+\text{Input: }
+\begin{bmatrix}
+1 & 2 & 3 \\
+4 & 5 & 6 \\
+7 & 8 & 9
+\end{bmatrix}
+\quad\Rightarrow\quad
+\text{Output: }
+\begin{bmatrix}
+9 & 8 & 7 \\
+6 & 5 & 4 \\
+3 & 2 & 1
+\end{bmatrix}
+$$
+
+**How it works**
+
+Horizontal + vertical flips relocate each element to $(R-1-r,\ C-1-c)$.
+
+* Time: $O(R\cdot C)$
+* Space: $O(1)$ (square) or $O(R\cdot C)$
+
+#### 270° Rotation
+
+270° CW = 90° CCW; 270° CCW = 90° CW. Reuse the 90° procedures.
+
+#### Layer-by-Layer (Square) 90° CW
+
+Rotate each ring by cycling 4 positions.
+
+**How it works**
+
+For layer $\ell$ with bounds $[\ell..n-1-\ell]$, for each offset move:
+
+```
+top ← left, left ← bottom, bottom ← right, right ← top
+```
+
+* Time: $O(n^{2})$
+* Space: $O(1)$
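+
+A minimal Python sketch of the ring-by-ring cycle for a square matrix (indices follow the layer bounds above; names are illustrative):
+
+```python
+def rotate_layers_90_cw(A):
+    # In-place 90° CW rotation of an n x n matrix, one 4-way cycle per cell.
+    n = len(A)
+    for layer in range(n // 2):
+        lo, hi = layer, n - 1 - layer
+        for k in range(lo, hi):
+            off = k - lo
+            top = A[lo][k]
+            A[lo][k] = A[hi - off][lo]         # top    <- left
+            A[hi - off][lo] = A[hi][hi - off]  # left   <- bottom
+            A[hi][hi - off] = A[k][hi]         # bottom <- right
+            A[k][hi] = top                     # right  <- top
+    return A
+
+print(rotate_layers_90_cw([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
+# [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
+```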
+
+### Traversal Patterns
+
+#### Spiral Order
+
+Read outer layer, then shrink bounds.
+
+**Example inputs and outputs**
+
+*Example*
+
+$$
+\text{Input: }
+\begin{bmatrix}
+1 & 2 & 3 & 4 \\
+5 & 6 & 7 & 8 \\
+9 & 10 & 11 & 12
+\end{bmatrix}
+$$
+
+$$
+\text{Output sequence: } 1,2,3,4,8,12,11,10,9,5,6,7
+$$
+
+**How it works**
+
+Maintain top, bottom, left, right. Walk edges in order; after each edge, move the corresponding bound inward.
+
+* Time: $O(R\cdot C)$; Space: $O(1)$ beyond output.
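+
+A minimal Python sketch of the shrinking-bounds walk (the two inner `if` checks prevent re-reading a single leftover row or column):
+
+```python
+def spiral_order(A):
+    if not A or not A[0]:
+        return []
+    top, bottom, left, right = 0, len(A) - 1, 0, len(A[0]) - 1
+    out = []
+    while top <= bottom and left <= right:
+        for c in range(left, right + 1):          # top edge, left -> right
+            out.append(A[top][c])
+        top += 1
+        for r in range(top, bottom + 1):          # right edge, top -> bottom
+            out.append(A[r][right])
+        right -= 1
+        if top <= bottom:
+            for c in range(right, left - 1, -1):  # bottom edge, right -> left
+                out.append(A[bottom][c])
+            bottom -= 1
+        if left <= right:
+            for r in range(bottom, top - 1, -1):  # left edge, bottom -> top
+                out.append(A[r][left])
+            left += 1
+    return out
+
+print(spiral_order([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]))
+# [1, 2, 3, 4, 8, 12, 11, 10, 9, 5, 6, 7]
+```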
+
+#### Diagonal Order (r+c layers)
+
+Visit cells grouped by $s=r+c$; alternate direction per diagonal to keep locality if desired.
+
+**Example inputs and outputs**
+
+*Example*
+
+$$
+\text{Input: }
+\begin{bmatrix}
+a & b & c \\
+d & e & f
+\end{bmatrix}
+\quad\Rightarrow\quad
+\text{One order: } a, b, d, e, c, f
+$$
+
+* Time: $O(R\cdot C)$; Space: $O(1)$.
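+
+A minimal Python sketch of the $s = r + c$ grouping with alternating direction, matching the order shown above:
+
+```python
+def diagonal_order(A):
+    R, C = len(A), len(A[0])
+    out = []
+    for s in range(R + C - 1):
+        # Cells on diagonal s, listed from top-right toward bottom-left.
+        cells = [(r, s - r) for r in range(max(0, s - C + 1), min(R, s + 1))]
+        if s % 2 == 0:
+            cells.reverse()  # alternate direction on every other diagonal
+        out.extend(A[r][c] for r, c in cells)
+    return out
+
+print(diagonal_order([["a", "b", "c"], ["d", "e", "f"]]))
+# ['a', 'b', 'd', 'e', 'c', 'f']
+```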
+
+### Grids as Graphs
+
+Each cell is a node; edges connect neighboring walkable cells.
+
+**Grid-as-graph view (4-dir edges).** Each cell is a node; edges connect neighbors that are “passable”. Great for BFS shortest paths on unweighted grids.
+
+**Example map (walls `#`, free `.`, start `S`, target `T`).**
+Left: the map. Right: BFS distances (4-dir) from `S` until `T` is reached.
+
+```
+Original Map:
+#####################
+#S..#....#....#.....#
+#.#.#.##.#.##.#.##..#
+#.#...#..#.......#.T#
+#...###.....###.....#
+#####################
+
+BFS layers (distance mod 10):
+#####################
+#012#8901#9012#45678#
+#1#3#7##2#8##1#3##89#
+#2#456#43#7890123#7X#
+#345###54567###34567#
+#####################
+
+Legend: walls (#), goal reached (X)
+```
+
+BFS explores in **expanding “rings”**: with 4-dir edges, each ring is one more step from `S` along a shortest path (walls can force detours, so this may exceed the straight-line Manhattan distance). Time $O(RC)$, space $O(RC)$ with a visited matrix/queue.
+
+**Obstacles / costs / diagonals.**
+
+* Obstacles: skip neighbors that are `#` (or where cost is $\infty$).
+* Weighted grids: Dijkstra / 0-1 BFS on the same neighbor structure.
+* 8-dir with Euclidean costs: use $1$ for orthogonal moves and $\sqrt{2}$ for diagonals (A\* often pairs well here with an admissible heuristic).
+
+**Common symbols:**
+
+```
+. = free cell # = wall/obstacle
+S = start T = target/goal
+V = visited * = on current path / frontier
+```
+
+#### BFS Shortest Path (Unweighted)
+
+Find the minimum steps from S to T.
+
+**Example inputs and outputs**
+
+*Example*
+
+$$
+\text{Grid (0 = open, 1 = wall), } S = (0,0), T = (2,3)
+$$
+
+$$
+\begin{bmatrix}
+S & 0 & 1 & 0 \\
+0 & 0 & 0 & 0 \\
+1 & 1 & 0 & T
+\end{bmatrix}
+\quad\Rightarrow\quad
+\text{Output: distance } = 5
+$$
+
+**How it works**
+
+Push S to a queue, expand in 4-dir layers, track distance/visited; stop when T is dequeued.
+
+* Time: $O(R\cdot C)$; Space: $O(R\cdot C)$.
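+
+A minimal Python sketch of the layered BFS on the example grid (0 = open, 1 = wall; returns −1 if the target is unreachable):
+
+```python
+from collections import deque
+
+def bfs_shortest_path(grid, start, target):
+    R, C = len(grid), len(grid[0])
+    dist = {start: 0}
+    q = deque([start])
+    while q:
+        r, c = q.popleft()
+        if (r, c) == target:
+            return dist[(r, c)]
+        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # 4-dir neighbors
+            nr, nc = r + dr, c + dc
+            if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == 0 and (nr, nc) not in dist:
+                dist[(nr, nc)] = dist[(r, c)] + 1
+                q.append((nr, nc))
+    return -1
+
+grid = [[0, 0, 1, 0],
+        [0, 0, 0, 0],
+        [1, 1, 0, 0]]
+print(bfs_shortest_path(grid, (0, 0), (2, 3)))  # 5
+```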
+
+#### Connected Components (Islands)
+
+Count regions of ‘1’s via DFS/BFS.
+
+**Example inputs and outputs**
+
+$$
+\text{Input: }
+\begin{bmatrix}
+1 & 1 & 0 \\
+0 & 1 & 0 \\
+0 & 0 & 1
+\end{bmatrix}
+\quad\Rightarrow\quad
+\text{Output: } 2 \ \text{islands}
+$$
+
+**How it works**
+
+Scan cells; when an unvisited ‘1’ is found, flood it (DFS/BFS) to mark the whole island.
+
+* Time: $O(R\cdot C)$
+* Space: $O(R\cdot C)$ worst-case
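+
+A minimal Python sketch using an iterative flood fill (a recursive DFS works the same way but can hit recursion limits on large grids):
+
+```python
+def count_islands(grid):
+    R, C = len(grid), len(grid[0])
+    seen = [[False] * C for _ in range(R)]
+    islands = 0
+    for r in range(R):
+        for c in range(C):
+            if grid[r][c] == 1 and not seen[r][c]:
+                islands += 1
+                stack = [(r, c)]          # flood the whole island
+                seen[r][c] = True
+                while stack:
+                    cr, cc = stack.pop()
+                    for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
+                        if 0 <= nr < R and 0 <= nc < C and grid[nr][nc] == 1 and not seen[nr][nc]:
+                            seen[nr][nc] = True
+                            stack.append((nr, nc))
+    return islands
+
+print(count_islands([[1, 1, 0], [0, 1, 0], [0, 0, 1]]))  # 2
+```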
+
+### Backtracking on Grids
+
+#### Word Search (Single Word)
+
+Find a word by moving to adjacent cells (4-dir), using each cell once per path.
+
+**Example inputs and outputs**
+
+$$
+\text{Board: }
+\begin{bmatrix}
+A & B & C & E \\
+S & F & C & S \\
+A & D & E & E
+\end{bmatrix},
+\quad
+\text{Word: } "ABCCED"
+\quad\Rightarrow\quad
+\text{Output: true}
+$$
+
+**How it works**
+
+From each starting match, DFS to next char; mark visited (temporarily), backtrack on failure.
+
+* Time: up to $O(R\cdot C\cdot b^{L})$ (branching $b\in[3,4]$, word length $L$)
+* Space: $O(L)$
+
+Pruning: early letter mismatch; frequency precheck; prefix trie when searching many words.
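+
+A minimal Python sketch of the backtracking DFS (cells are overwritten with a placeholder while on the current path and restored on the way back):
+
+```python
+def exist(board, word):
+    R, C = len(board), len(board[0])
+
+    def dfs(r, c, i):
+        if i == len(word):
+            return True
+        if not (0 <= r < R and 0 <= c < C) or board[r][c] != word[i]:
+            return False
+        board[r][c] = "#"                     # mark as used on this path
+        found = (dfs(r + 1, c, i + 1) or dfs(r - 1, c, i + 1) or
+                 dfs(r, c + 1, i + 1) or dfs(r, c - 1, i + 1))
+        board[r][c] = word[i]                 # unmark (backtrack)
+        return found
+
+    return any(dfs(r, c, 0) for r in range(R) for c in range(C))
+
+board = [list("ABCE"), list("SFCS"), list("ADEE")]
+print(exist(board, "ABCCED"))  # True
+```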
+
+#### Crossword-style Fill (Multiple Words)
+
+Place words to slots with crossings; verify consistency at intersections.
+
+**How it works**
+
+Backtrack over slot assignments; use a trie for prefix feasibility; order by most constrained slot first.
+
+* Time: exponential in slots; strong pruning and good heuristics are crucial.
+
diff --git a/notes/searching.md b/notes/searching.md
new file mode 100644
index 0000000..523307f
--- /dev/null
+++ b/notes/searching.md
@@ -0,0 +1,2128 @@
+## Searching
+
+Searching refers to the process of finding the location of a specific element within a collection of data, such as an array, list, tree, or graph. It underpins many applications, from databases and information retrieval to routing and artificial intelligence. Depending on the organization of the data, different search techniques are used—such as linear search for unsorted data, binary search for sorted data, and more advanced approaches like hash-based lookup or tree traversals for hierarchical structures. Efficient searching is important because it directly impacts the performance and scalability of software systems.
+
+### Linear & Sequential Search
+
+#### Linear Search
+
+Scan the list from left to right, comparing the target with each element until you either find a match (return its index) or finish the list (report “not found”).
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Input: } [7, 3, 5, 2, 9], \quad \text{target} = 5
+$$
+
+$$
+\text{Output: } \text{index} = 2
+$$
+
+*Example 2*
+
+$$
+\text{Input: } [4, 4, 4], \quad \text{target} = 4
+$$
+
+$$
+\text{Output: } \text{index} = 0 (\text{first match})
+$$
+
+*Example 3*
+
+$$
+\text{Input: } [10, 20, 30], \quad \text{target} = 25
+$$
+
+$$
+\text{Output: } \text{not found}
+$$
+
+**How Linear Search Works**
+
+We start at index `0`, compare the value with the target, and keep moving right until we either **find it** or reach the **end**.
+
+Target **5** in `[7, 3, 5, 2, 9]`
+
+```
+Indexes: 0 1 2 3 4
+List: [7] [3] [5] [2] [9]
+Target: 5
+```
+
+*Step 1:* pointer at index 0
+
+```
+|
+v
+7 3 5 2 9
+
+→ compare 7 vs 5 → no
+```
+
+*Step 2:* pointer moves to index 1
+
+```
+ |
+ v
+7 3 5 2 9
+
+→ compare 3 vs 5 → no
+```
+
+*Step 3:* pointer moves to index 2
+
+```
+ |
+ v
+7 3 5 2 9
+
+→ compare 5 vs 5 → YES ✅ → return index 2
+```
+
+**Worst Case (Not Found)**
+
+Target **9** in `[1, 2, 3]`
+
+```
+Indexes: 0 1 2
+List: [1] [2] [3]
+Target: 9
+```
+
+Checks:
+
+```
+→ 1 ≠ 9
+→ 2 ≠ 9
+→ 3 ≠ 9
+→ end
+→ not found ❌
+```
+
+* Works on any list; no sorting or structure required.
+* Returns the first index containing the target; if absent, reports “not found.”
+* Time: $O(n)$ comparisons on average and in the worst case; best case $O(1)$ if the first element matches.
+* Space: $O(1)$ extra memory.
+* Naturally finds the earliest occurrence when duplicates exist.
+* Simple and dependable for short or unsorted data.
+* Assumes 0-based indexing in these notes.
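+
+A minimal Python sketch (returns −1 instead of “not found”):
+
+```python
+def linear_search(items, target):
+    # Scan left to right; return the index of the first match, or -1.
+    for i, value in enumerate(items):
+        if value == target:
+            return i
+    return -1
+
+print(linear_search([7, 3, 5, 2, 9], 5))  # 2
+print(linear_search([4, 4, 4], 4))        # 0  (first match)
+print(linear_search([10, 20, 30], 25))    # -1 (not found)
+```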
+
+#### Sentinel Linear Search
+
+Place one copy of the target at the very end as a “sentinel” so the scan can run without checking bounds each step; afterward, decide whether the match was inside the original list or only at the sentinel position.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Input: } [12, 8, 6, 15], \quad \text{target} = 6
+$$
+
+$$
+\text{Output: } \text{index} = 2
+$$
+
+*Example 2*
+
+$$
+\text{Input: } [2, 4, 6, 8], \quad \text{target} = 5
+$$
+
+$$
+\text{Output: } \text{not found } (\text{only the sentinel matched})
+$$
+
+**How it works**
+
+Put the target at one extra slot at the end so the loop is guaranteed to stop on a match; afterward, check whether the match was inside the original range.
+
+Target **11** not in the list
+
+```
+Original list (n=5):
+[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ]
+Target: 11
+```
+
+Add sentinel (extra slot):
+
+```
+[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ][ 11 ]
+ 0 1 2 3 4 5 ← sentinel
+```
+
+Scan step by step:
+
+```
+4 ≠ 11 → pointer at 0
+9 ≠ 11 → pointer at 1
+1 ≠ 11 → pointer at 2
+7 ≠ 11 → pointer at 3
+6 ≠ 11 → pointer at 4
+11 = 11 → pointer at 5 (sentinel)
+```
+
+Therefore, **not found** in original list.
+
+Target **6** inside the list
+
+```
+Original list (n=4):
+[ 12 ][ 8 ][ 6 ][ 15 ]
+Target: 6
+```
+
+Add sentinel:
+
+```
+[ 12 ][ 8 ][ 6 ][ 15 ][ 6 ]
+ 0 1 2 3 4
+```
+
+Scan:
+
+```
+12 ≠ 6 → index 0
+ 8 ≠ 6 → index 1
+ 6 = 6 → index 2 ✅
+```
+
+* Removes the per-iteration “have we reached the end?” check; the sentinel guarantees termination.
+* Same $O(n)$ time in big-O terms, but slightly fewer comparisons in tight loops.
+* Space: needs one extra slot; if you cannot append, you can temporarily overwrite the last element (store it, write the target, then restore it).
+* After scanning, decide by index: if the first match index < original length, it’s a real match; otherwise, it’s only the sentinel.
+* Use when micro-optimizing linear scans over arrays where bounds checks are costly.
+* Behavior with duplicates: still returns the first occurrence within the original range.
+* Be careful to restore any overwritten last element if you used the in-place variant.
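+
+A minimal Python sketch of the append-a-sentinel variant (the list is restored before returning):
+
+```python
+def sentinel_search(items, target):
+    n = len(items)
+    items.append(target)          # sentinel guarantees the loop terminates
+    i = 0
+    while items[i] != target:
+        i += 1
+    items.pop()                   # restore the original list
+    return i if i < n else -1     # real hit, or only the sentinel matched?
+
+print(sentinel_search([12, 8, 6, 15], 6))  # 2
+print(sentinel_search([2, 4, 6, 8], 5))    # -1 (only the sentinel matched)
+```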
+
+### Divide & Conquer Search
+
+#### Binary Search
+
+On a sorted array, repeatedly halve the search interval by comparing the target to the middle element until found or the interval is empty.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Input: } A = [2, 5, 8, 12, 16, 23, 38], \quad \text{target} = 16
+$$
+
+$$
+\text{Output: } \text{index} = 4
+$$
+
+*Example 2*
+
+$$
+\text{Input: } A = [1, 3, 3, 3, 9], \quad \text{target} = 3
+$$
+
+$$
+\text{Output: } \text{index} = 2 \quad (\text{any valid match; first/last needs a variant})
+$$
+
+*Example 3*
+
+$$
+\text{Input: } A = [10, 20, 30, 40], \quad \text{target} = 35
+$$
+
+$$
+\text{Output: } \text{not found}
+$$
+
+**How it works**
+
+We repeatedly check the **middle** element, and then discard half the list based on comparison.
+
+Find **16** in:
+
+```
+A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
+i = 0 1 2 3 4 5 6
+```
+
+*Step 1*
+
+```
+low = 0, high = 6
+mid = (0+6)//2 = 3
+A[3] = 12 < 16 → target is to the RIGHT → new low = mid + 1 = 4
+
+A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
+i = 0 1 2 3 4 5 6
+ ↑L ↑M ↑H
+ 0 3 6
+Active range: indices 0..6
+```
+
+*Step 2*
+
+```
+low = 4, high = 6
+mid = (4+6)//2 = 5
+A[5] = 23 > 16 → target is to the LEFT → new high = mid - 1 = 4
+
+A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
+i = 0 1 2 3 4 5 6
+ ↑L ↑M ↑H
+ 4 5 6
+Active range: indices 4..6
+```
+
+*Step 3*
+
+```
+low = 4, high = 4
+mid = 4
+A[4] = 16 == target ✅
+
+A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ]
+i = 0 1 2 3 4 5 6
+ ↑LMH
+ 4
+Active range: indices 4..4
+```
+
+FOUND at index 4
+
+* Requires a sorted array (assume ascending here).
+* Time: $O(\log n)$; Space: $O(1)$ iterative.
+* Returns any one matching index by default; “first/last occurrence” is a small, common refinement.
+* Robust, cache-friendly, and a building block for many higher-level searches.
+* Beware of off-by-one errors when shrinking bounds.
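+
+A minimal iterative Python sketch (assumes `A` is sorted ascending; returns −1 when the target is absent):
+
+```python
+def binary_search(A, target):
+    low, high = 0, len(A) - 1
+    while low <= high:
+        mid = (low + high) // 2
+        if A[mid] == target:
+            return mid
+        if A[mid] < target:
+            low = mid + 1         # target is to the right
+        else:
+            high = mid - 1        # target is to the left
+    return -1
+
+print(binary_search([2, 5, 8, 12, 16, 23, 38], 16))  # 4
+print(binary_search([10, 20, 30, 40], 35))           # -1
+```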
+
+#### Ternary Search
+Like binary, but splits the current interval into three parts using two midpoints; used mainly for unimodal functions or very specific array cases.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Input: } A = [1, 4, 7, 9, 12, 15], \quad \text{target} = 9
+$$
+
+$$
+\text{Output: } \text{index} = 3
+$$
+
+*Example 2*
+
+$$
+\text{Input: } A = [2, 6, 10, 14], \quad \text{target} = 5
+$$
+
+$$
+\text{Output: } \text{not found}
+$$
+
+**How it works**
+
+We divide the array into **three parts** using two midpoints `m1` and `m2`.
+
+* If `target < A[m1]` → search $[low .. m1-1]$
+* Else if `target > A[m2]` → search $[m2+1 .. high]$
+* Else → search $[m1+1 .. m2-1]$
+
+```
+A = [ 1 ][ 4 ][ 7 ][ 9 ][ 12 ][ 15 ]
+i = 0 1 2 3 4 5
+Target: 9
+```
+
+*Step 1*
+
+```
+low = 0, high = 5
+
+m1 = low + (high - low)//3 = 0 + (5)//3 = 1
+m2 = low + 2*(high - low)//3 = 0 + 2*5//3 = 3
+
+A[m1] = 4
+A[m2] = 9
+
+A = [ 1 ][ 4 ][ 7 ][ 9 ][ 12 ][ 15 ]
+i = 0 1 2 3 4 5
+ ↑L ↑m1 ↑m2 ↑H
+ 0 1 3 5
+```
+
+FOUND at index 3
+
+* Also assumes a sorted array.
+* For discrete sorted arrays, it does **not** beat binary search asymptotically; it performs more comparisons per step.
+* Most valuable for searching the extremum of a **unimodal function** on a continuous domain; for arrays, prefer binary search.
+* Complexity: $O(\log n)$ steps but with larger constant factors than binary search.
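+
+Since sorted arrays are better served by binary search, here is a minimal Python sketch of the more typical use: locating the maximum of a unimodal function on a continuous interval (the example function is purely illustrative):
+
+```python
+def ternary_max(f, lo, hi, iters=100):
+    # Each iteration discards a third of [lo, hi]; f must be unimodal.
+    for _ in range(iters):
+        m1 = lo + (hi - lo) / 3
+        m2 = hi - (hi - lo) / 3
+        if f(m1) < f(m2):
+            lo = m1               # the maximum lies in [m1, hi]
+        else:
+            hi = m2               # the maximum lies in [lo, m2]
+    return (lo + hi) / 2
+
+print(round(ternary_max(lambda x: -(x - 3) ** 2, 0, 10), 3))  # ~3.0
+```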
+
+#### Jump Search
+On a sorted array, jump ahead in fixed block sizes to find the block that may contain the target, then do a linear scan inside that block.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Input: } A = [1, 4, 9, 16, 25, 36, 49], \quad \text{target} = 25, \quad \text{jump} = \lfloor \sqrt{7} \rfloor = 2
+$$
+
+$$
+\text{Output: } \text{index} = 4
+$$
+
+*Example 2*
+
+$$
+\text{Input: } A = [3, 8, 15, 20, 22, 27], \quad \text{target} = 21, \quad \text{jump} = 2
+$$
+
+$$
+\text{Output: } \text{not found}
+$$
+
+**How it works**
+
+We’re applying **jump search** to find $25$ in
+
+$$
+A = [1, 4, 9, 16, 25, 36, 49]
+$$
+
+with $n=7$, block size $\approx \sqrt{7} \approx 2$, so **jump=2**.
+
+We probe every 2nd index:
+
+* probe = 0 → $A[0] = 1 < 25$ → jump to 2
+* probe = 2 → $A[2] = 9 < 25$ → jump to 4
+* probe = 4 → $A[4] = 25 \geq 25$ → stop
+
+So target is in block $(2..4]$.
+
+```
+[ 1 ][ 4 ] | [ 9 ][16 ] | [25 ][36 ] | [49 ]
+ ^ ^ ^ ^
+ probe=0 probe=2 probe=4 probe=6
+```
+
+Linear Scan in block (indexes 3..4)
+
+* i = 3 → $A[3] = 16 < 25$
+* i = 4 → $A[4] = 25 = 25$ ✅ FOUND
+
+```
+Block [16 ][25 ]
+ ^ ^
+ i=3 i=4 (found!)
+```
+
+The element $25$ is found at **index 4**.
+
+* Works on sorted arrays; pick a jump size of about $\sqrt{n}$ for a good balance.
+* Time: $O(\sqrt{n})$ comparisons on average; Space: $O(1)$.
+* Useful when random access is cheap but full binary search isn’t desirable (e.g., limited CPU branch prediction, or when scanning in blocks is cache-friendly).
+* Degrades gracefully to “scan block then stop.”
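+
+A minimal Python sketch with block size $\lfloor\sqrt{n}\rfloor$:
+
+```python
+import math
+
+def jump_search(A, target):
+    n = len(A)
+    step = max(1, math.isqrt(n))
+    prev = 0
+    # Jump block by block until the block's last element reaches the target.
+    while prev < n and A[min(prev + step, n) - 1] < target:
+        prev += step
+    # Linear scan inside the candidate block.
+    for i in range(prev, min(prev + step, n)):
+        if A[i] == target:
+            return i
+    return -1
+
+print(jump_search([1, 4, 9, 16, 25, 36, 49], 25))  # 4
+print(jump_search([3, 8, 15, 20, 22, 27], 21))     # -1
+```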
+
+#### Exponential Search
+
+On a sorted array, grow the right boundary exponentially (1, 2, 4, 8, …) to find a containing range, then finish with binary search in that range.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Input: } A = [2, 3, 5, 7, 11, 13, 17, 19, 23], \quad \text{target} = 19
+$$
+
+$$
+\text{Output: } \text{index} = 7
+$$
+
+*Example 2*
+
+$$
+\text{Input: } A = [10, 20, 30, 40, 50], \quad \text{target} = 12
+$$
+
+$$
+\text{Output: } \text{not found}
+$$
+
+**How it works**
+
+```
+A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
+i = 0 1 2 3 4 5 6 7 8
+Target = 19
+```
+
+Find range by exponential jumps
+
+*Start* at `i=1`, double each step until `A[i] ≥ target` (or end).
+
+*Jump 1:* `i=1`
+
+```
+A[i]=3 ≤ 19 → continue
+A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
+i = 0 1 2 3 4 5 6 7 8
+ ↑
+```
+
+*Jump 2:* `i=2`
+
+```
+A[i]=5 ≤ 19 → continue
+A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
+i = 0 1 2 3 4 5 6 7 8
+ ↑
+```
+
+*Jump 3:* `i=4`
+
+```
+A[i]=11 ≤ 19 → continue
+A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
+i = 0 1 2 3 4 5 6 7 8
+ ↑
+```
+
+*Jump 4:* `i=8`
+
+```
+A[i]=23 > 19 → stop
+Range is (previous power of two .. i] = (4 .. 8] → search indices 5..8
+A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
+i = 0 1 2 3 4 5 6 7 8
+ ↑
+```
+
+*Range for binary search:* `low=5, high=8`.
+
+Binary search on $A[5..8]$
+
+```
+Subarray: [ 13 ][ 17 ][ 19 ][ 23 ]
+Indices : 5 6 7 8
+```
+
+*Step 1*
+
+```
+low=5, high=8 → mid=(5+8)//2=6
+A[6]=17 < 19 → move right → low=7
+A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
+i = 0 1 2 3 4 5 6 7 8
+ ↑L ↑M ↑H
+ 5 6 8
+```
+
+*Step 2*
+
+```
+low=7, high=8 → mid=(7+8)//2=7
+A[7]=19 == target ✅ → FOUND
+A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ]
+i = 0 1 2 3 4 5 6 7 8
+ ↑L M H
+ 7
+```
+
+Found at **index 7**.
+
+* Great when the target is likely to be near the beginning or when the array is **unbounded**/**stream-like** but sorted (you can probe indices safely).
+* Time: $O(\log p)$ to find the range, where $p$ is the final bound, plus $O(\log p)$ for the binary search → overall $O(\log p)$.
+* Space: $O(1)$.
+* Often paired with data sources where you can test “is index i valid?” while doubling i.
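+
+A minimal Python sketch: double the bound, then binary-search the bracketed range:
+
+```python
+def exponential_search(A, target):
+    n = len(A)
+    if n == 0:
+        return -1
+    if A[0] == target:
+        return 0
+    bound = 1
+    while bound < n and A[bound] < target:   # grow 1, 2, 4, 8, ...
+        bound *= 2
+    low, high = bound // 2 + 1, min(bound, n - 1)
+    while low <= high:                       # plain binary search on A[low..high]
+        mid = (low + high) // 2
+        if A[mid] == target:
+            return mid
+        if A[mid] < target:
+            low = mid + 1
+        else:
+            high = mid - 1
+    return -1
+
+print(exponential_search([2, 3, 5, 7, 11, 13, 17, 19, 23], 19))  # 7
+print(exponential_search([10, 20, 30, 40, 50], 12))              # -1
+```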
+
+#### Interpolation Search
+
+On a sorted (roughly uniformly distributed) array, estimate the likely position using the values themselves and probe there; repeat on the narrowed side.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Input: } A = [10, 20, 30, 40, 50, 60, 70], \quad \text{target} = 55
+$$
+
+$$
+\text{Output: } \text{not found } (\text{probes near indices } 4\text{–}5)
+$$
+
+*Example 2*
+
+$$
+\text{Input: } A = [5, 15, 25, 35, 45, 55, 65], \quad \text{target} = 45
+$$
+
+$$
+\text{Output: } \text{index} = 4
+$$
+
+*Example 3*
+
+$$
+\text{Input: } A = [1, 1000, 1001, 1002], \quad \text{target} = 2
+$$
+
+$$
+\text{Output: } \text{not found } (\text{bad distribution for interpolation})
+$$
+
+**How it works**
+
+* Guard against division by zero: if `A[high] == A[low]`, stop (or binary-search fallback).
+* Clamp the computed `pos` to `[low, high]` before probing.
+* Works best when values are **uniformly distributed**; otherwise it can degrade toward linear time.
+* Assumes `A` is sorted and values are uniform.
+
+Probe formula:
+
+```
+pos ≈ low + (high - low) * (target - A[low]) / (A[high] - A[low])
+```
+
+Let’s say we have the following array and target:
+
+```
+A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
+i = 0 1 2 3 4 5 6
+target = 45
+```
+
+*Step 1 — initial probe*
+
+```
+low=0 (A[0]=10), high=6 (A[6]=70)
+
+pos ≈ 0 + (6-0) * (45-10)/(70-10)
+ ≈ 6 * 35/60
+ ≈ 3.5 → probe near 3.5
+
+A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
+i = 0 1 2 3 4 5 6
+ ↑L ↑H
+ ↑pos≈3.5 → choose ⌊pos⌋=3 (or ⌈pos⌉=4)
+```
+
+Probe **index 3**: `A[3]=40 < 45` → set `low = 3 + 1 = 4`
+
+*Step 2 — after moving low*
+
+```
+A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ]
+i = 0 1 2 3 4 5 6
+ ↑L ↑H
+```
+
+At this point, an **early-stop check** already tells us `target (45) < A[low] (50)` → cannot exist in `A[4..6]` → **not found**.
+
+* Best on **uniformly distributed** sorted data; expected time $O(\log \log n)$.
+* Worst case can degrade to $O(n)$, especially on skewed or clustered values.
+* Space: $O(1)$.
+* Very fast when value-to-index mapping is close to linear (e.g., near-uniform numeric keys).
+* Requires careful handling when A\[high] = A\[low] (avoid division by zero); also sensitive to integer rounding in discrete arrays.
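+
+A minimal Python sketch with the divide-by-zero guard and clamping described above:
+
+```python
+def interpolation_search(A, target):
+    low, high = 0, len(A) - 1
+    while low <= high and A[low] <= target <= A[high]:
+        if A[high] == A[low]:                 # guard against division by zero
+            return low if A[low] == target else -1
+        # Estimate where the target "should" sit if values grow linearly.
+        pos = low + (high - low) * (target - A[low]) // (A[high] - A[low])
+        pos = max(low, min(pos, high))        # clamp to the active range
+        if A[pos] == target:
+            return pos
+        if A[pos] < target:
+            low = pos + 1
+        else:
+            high = pos - 1
+    return -1
+
+print(interpolation_search([5, 15, 25, 35, 45, 55, 65], 45))   # 4
+print(interpolation_search([10, 20, 30, 40, 50, 60, 70], 55))  # -1
+```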
+
+### Hash-based Search
+
+* **Separate chaining:** Easiest deletions, steady $O(1)$ with α≈1; good when memory fragmentation isn’t a concern.
+* **Open addressing (double hashing):** Best probe quality among OA variants; great cache locality; keep α < 0.8.
+* **Open addressing (linear/quadratic):** Simple and fast at low α; watch clustering and tombstones.
+* **Cuckoo hashing:** Tiny and predictable lookup cost; inserts costlier and may rehash; great for read-heavy workloads.
+* In all cases: pick strong hash functions and resize early to keep α healthy.
+
+#### Hash Table Search
+Map a key to an array index with a hash function; look at that bucket to find the key, giving expected $O(1)$ lookups under a good hash and healthy load factor.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Table size: } m = 7,
+\quad \text{Keys stored: } \{10, 24, 31\},
+\quad \text{Target: } 24
+$$
+
+$$
+\text{Output: } \text{found (bucket 3)}
+$$
+
+*Example 2*
+
+$$
+\text{Table size: } m = 7,
+\quad \text{Keys stored: } \{10, 24, 31\},
+\quad \text{Target: } 18
+$$
+
+$$
+\text{Output: } \text{not found}
+$$
+
+**How it works**
+
+```
++-----+ hash +-----------------+ search/compare +--------+
+| key | --------------> | index in array | ----------------------> | match? |
++-----+ +-----------------+ +--------+
+```
+
+* With chaining, the “collision path” is the **list inside one bucket**.
+* With linear probing, the “collision path” is the **probe sequence** across buckets (3 → 4 → 5 → …).
+* Both keep your original flow: hash → inspect bucket (and collision path) → match?
+
+```
+Array (buckets/indexes 0..6):
+
+Idx: 0 1 2 3 4 5 6
+ +---+-----+-----+-----+-----+-----+-----+
+ | | | | | | | |
+ +---+-----+-----+-----+-----+-----+-----+
+```
+
+**Example mapping with** `h(k) = k mod 7`, **stored keys** `{10, 24, 31}` all hash to index `3`.
+
+*Strategy A — Separate Chaining (linked list per bucket)*
+
+Insertions
+
+```
+10 -> 3
+24 -> 3 (collides with 10; append to bucket[3] list)
+31 -> 3 (collides again; append to bucket[3] list)
+
+Idx: 0 1 2 3 4 5 6
+ +---+-----+-----+-----+-----+-----+-----+
+ | | | | • | | | |
+ +---+-----+-----+-----+-----+-----+-----+
+
+bucket[3] chain: [10] → [24] → [31] → ∅
+```
+
+*Search(24)*
+
+```
+1) Compute index = h(24) = 3
+2) Inspect bucket 3's chain:
+ [10] → [24] → [31]
+ ↑ found here
+3) Return FOUND (bucket 3)
+```
+
+*Strategy B — Open Addressing (Linear Probing)*
+
+Insertions
+
+```
+10 -> 3 place at 3
+24 -> 3 (occupied) → probe 4 → place at 4
+31 -> 3 (occ) → 4 (occ) → probe 5 → place at 5
+
+Idx: 0 1 2 3 4 5 6
+ +---+-----+-----+-----+-----+-----+-----+
+ | | | | 10 | 24 | 31 | |
+ +---+-----+-----+-----+-----+-----+-----+
+```
+
+*Search(24)*
+
+```
+1) Compute index = h(24) = 3
+2) Probe sequence:
+ 3: 10 ≠ 24 → continue
+ 4: 24 = target → FOUND at index 4
+ (If not found, continue probing until an empty slot or wrap limit.)
+```
+
+* Quality hash + low load factor (α = n/m) ⇒ expected $O(1)$ search/insert/delete.
+* Collisions are inevitable; the collision strategy (open addressing vs. chaining vs. cuckoo) dictates actual steps.
+* Rehashing (growing and re-inserting) is used to keep α under control.
+* Uniform hashing assumption underpins the $O(1)$ expectation; adversarial keys or poor hashes can degrade performance.
+
+#### Open Addressing — Linear Probing
+
+Keep everything in one array; on collision, probe alternative positions in a deterministic sequence until an empty slot or the key is found.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+m = 10,
+\quad \text{Stored keys: } \{12, 22, 32\},
+\quad \text{Target: } 22
+$$
+
+$$
+\text{Output: } \text{found (index 3)}
+$$
+
+*Example 2*
+
+$$
+m = 10,
+\quad \text{Stored keys: } \{12, 22, 32\},
+\quad \text{Target: } 42
+$$
+
+$$
+\text{Output: } \text{not found}
+$$
+
+**How it works**
+
+*Hash function:*
+
+```
+h(k) = k mod 10
+Probe sequence: i, i+1, i+2, ... (wrap around)
+```
+
+*Insertions*
+
+* Insert 12 → `h(12)=2` → place at index 2
+* Insert 22 → `h(22)=2` occupied → probe 3 → place at 3
+* Insert 32 → `h(32)=2` occupied → probe 3 (occupied) → probe 4 → place at 4
+
+Resulting table (indexes 0..9):
+
+```
+Index: 0 1 2 3 4 5 6 7 8 9
+ +---+---+----+----+----+---+---+---+---+---+
+Value: | | | 12 | 22 | 32 | | | | | |
+ +---+---+----+----+----+---+---+---+---+---+
+```
+
+*Search(22)*
+
+* Start at `h(22)=2`
+* index 2 → 12 ≠ 22 → probe →
+* index 3 → 22 ✅ FOUND
+
+Path followed:
+
+```
+2 → 3
+```
+
+*Search(42)*
+
+* Start at `h(42)=2`
+* index 2 → 12 ≠ 42 → probe →
+* index 3 → 22 ≠ 42 → probe →
+* index 4 → 32 ≠ 42 → probe →
+* index 5 → empty slot → stop → ❌ NOT FOUND
+
+Path followed:
+
+```
+2 → 3 → 4 → 5 (∅)
+```
+
+* Simple and cache-friendly; clusters form (“primary clustering”) which can slow probes.
+* Deletion uses **tombstones** to keep probe chains intact.
+* Performance depends sharply on load factor; keep α well below 1 (e.g., α ≤ 0.7).
+* Expected search \~ $O(1)$ at low α; degrades as clusters grow.
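+
+A minimal Python sketch of insert and search with `h(k) = k mod m` (assumes the table never fills up and ignores deletion/tombstones):
+
+```python
+def lp_insert(table, key):
+    m = len(table)
+    i = key % m
+    while table[i] is not None:   # walk forward until an empty slot
+        i = (i + 1) % m
+    table[i] = key
+
+def lp_search(table, key):
+    m = len(table)
+    i = key % m
+    while table[i] is not None:   # an empty slot ends the probe sequence
+        if table[i] == key:
+            return i
+        i = (i + 1) % m
+    return -1
+
+table = [None] * 10
+for k in (12, 22, 32):
+    lp_insert(table, k)
+print(lp_search(table, 22))  # 3
+print(lp_search(table, 42))  # -1
+```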
+
+#### Open Addressing — Quadratic Probing
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+m = 11 (\text{prime}),
+\quad \text{Stored keys: } \{22, 33, 44\},
+\quad \text{Target: } 33
+$$
+
+$$
+h(k) = k \bmod m, \quad h(33) = 33 \bmod 11 = 0
+$$
+
+$$
+\text{Output: found (index 1)}
+$$
+
+*Example 2*
+
+$$
+m = 11 (\text{prime}),
+\quad \text{Stored keys: } \{22, 33, 44\},
+\quad \text{Target: } 55
+$$
+
+$$
+h(55) = 55 \bmod 11 = 0
+$$
+
+$$
+\text{Output: not found}
+$$
+
+**How it works**
+
+*Hash function:*
+
+```
+h(k) = k mod 11
+```
+
+*Probe sequence (relative offsets):*
+
+```
++1², +2², +3², ... mod 11
+= +1, +4, +9, +5, +3, +3, ... (the squared offsets wrap around the table size)
+```
+
+So from `h(k)`, we try slots in this order:
+
+```
+h, h+1, h+4, h+9, h+5, h+3, ... (all mod 11)
+```
+
+*Insertions*
+
+* Insert **22** → `h(22)=0` → place at index 0
+* Insert **33** → `h(33)=0` occupied → try `0+1²=1` → index 1 free → place at 1
+* Insert **44** → `h(44)=0` occupied → probe 1 (occupied) → probe `0+4=4` → place at 4
+
+Resulting table:
+
+```
+Idx:     0    1    2    3    4    5    6    7    8    9   10
+      +----+----+----+----+----+----+----+----+----+----+----+
+Val:  | 22 | 33 |    |    | 44 |    |    |    |    |    |    |
+      +----+----+----+----+----+----+----+----+----+----+----+
+```
+
+*Search(33)*
+
+* Start `h(33)=0` → slot 0 = 22 ≠ 33
+* Probe `0+1²=1` → slot 1 = 33 ✅ FOUND
+
+Path:
+
+```
+0 → 1
+```
+
+*Search(55)*
+
+* Start `h(55)=0` → slot 0 = 22 ≠ 55
+* Probe `0+1²=1` → slot 1 = 33 ≠ 55
+* Probe `0+2²=4` → slot 4 = 44 ≠ 55
+* Probe `0+3²=9` → slot 9 = empty → stop → ❌ NOT FOUND
+
+Path:
+
+```
+0 → 1 → 4 → 9 (∅)
+```
+
+* Reduces primary clustering but can exhibit **secondary clustering** (keys with same h(k) follow same probe squares).
+* Table size choice matters (often prime); ensure the probe sequence can reach many slots.
+* Keep α modest; deletion still needs tombstones.
+* Expected $O(1)$ at healthy α; simpler than double hashing.
+
+#### Open Addressing — Double Hashing
+
+**Example inputs and outputs**
+
+*Hash functions*
+
+$$
+h_{1}(k) = k \bmod 11,
+\quad h_{2}(k) = 1 + (k \bmod 10)
+$$
+
+Probing sequence:
+
+$$
+h(k,i) = \big(h_{1}(k) + i \cdot h_{2}(k)\big) \bmod 11
+$$
+
+*Example 1*
+
+$$
+m = 11,
+\quad \text{Stored keys: } \{22, 33, 44\},
+\quad \text{Target: } 33
+$$
+
+For $k = 33$:
+
+$$
+h_{1}(33) = 33 \bmod 11 = 0,
+\quad h_{2}(33) = 1 + (33 \bmod 10) = 1 + 3 = 4
+$$
+
+So probe sequence is
+
+$$
+h(33,0) = 0,
+h(33,1) = (0 + 1\cdot 4) \bmod 11 = 4,
+h(33,2) = (0 + 2\cdot 4) \bmod 11 = 8, \dots
+$$
+
+Since the stored layout places $33$ at index $4$, the search succeeds.
+
+$$
+\text{Output: found (index 4)}
+$$
+
+*Example 2*
+
+$$
+m = 11,
+\quad \text{Stored keys: } \{22, 33, 44\},
+\quad \text{Target: } 55
+$$
+
+For $k = 55$:
+
+$$
+h_{1}(55) = 55 \bmod 11 = 0,
+\quad h_{2}(55) = 1 + (55 \bmod 10) = 1 + 5 = 6
+$$
+
+Probing sequence:
+
+$$
+0, (0+6)\bmod 11 = 6, (0+2\cdot 6)\bmod 11 = 1, (0+3\cdot 6)\bmod 11 = 7, \dots
+$$
+
+No slot matches $55$.
+
+$$
+\text{Output: not found}
+$$
+
+**How it works**
+
+We use **two hash functions**:
+
+```
+h₁(k) = k mod m
+h₂(k) = 1 + (k mod 10)
+```
+
+*Probe sequence:*
+
+```
+i, i + h₂, i + 2·h₂, i + 3·h₂, ... (all mod m)
+```
+
+This ensures fewer clustering issues compared to linear or quadratic probing.
+
+*Insertions (m = 11)*
+
+Insert **22**
+
+* `h₁(22)=0` → place at index 0
+
+Insert **33**
+
+* `h₁(33)=0` (occupied)
+* `h₂(33)=1+(33 mod 10)=4`
+* Probe sequence: 0, 4 → place at index 4
+
+Insert **44**
+
+* `h₁(44)=0` (occupied)
+* `h₂(44)=1+(44 mod 10)=5`
+* Probe sequence: 0, 5 → place at index 5
+
+*Table State*
+
+```
+Idx:    0   1   2   3   4   5   6   7   8   9  10
+      +---+---+---+---+---+---+---+---+---+---+---+
+Val:  |22 |   |   |   |33 |44 |   |   |   |   |   |
+      +---+---+---+---+---+---+---+---+---+---+---+
+```
+
+*Search(33)*
+
+* Start at `h₁(33)=0` → slot 0 = 22 ≠ 33
+* Next: `0+1·h₂(33)=0+4=4` → slot 4 = 33 ✅ FOUND
+
+Path:
+
+```
+0 → 4
+```
+
+*Search(55)*
+
+* `h₁(55)=0`, `h₂(55)=1+(55 mod 10)=6`
+* slot 0 = 22 ≠ 55
+* slot 6 = empty → stop → ❌ NOT FOUND
+
+Path:
+
+```
+0 → 6 (∅)
+```
+
+* Minimizes clustering; probe steps depend on the key.
+* Choose h₂ so it’s **non-zero** and relatively prime to m, ensuring a full cycle.
+* Excellent performance at higher α than linear/quadratic, but still sensitive if α → 1.
+* Deletion needs tombstones; implementation slightly more complex.
+
+#### Separate Chaining
+
+Each array cell holds a small container (e.g., a linked list); colliding keys live together in that bucket.
+
+**Example inputs and outputs**
+
+*Setup*
+
+$$
+m = 5, \quad h(k) = k \bmod 5, \quad \text{buckets hold linked lists}
+$$
+
+Keys stored:
+
+$$
+\{12, 22, 7, 3, 14\}
+$$
+
+Bucket contents after hashing:
+
+$$
+\begin{aligned}
+h(12) &= 12 \bmod 5 = 2 &&\Rightarrow \text{bucket 2: } [12] \\
+h(22) &= 22 \bmod 5 = 2 &&\Rightarrow \text{bucket 2: } [12, 22] \\
+h(7) &= 7 \bmod 5 = 2 &&\Rightarrow \text{bucket 2: } [12, 22, 7] \\
+h(3) &= 3 \bmod 5 = 3 &&\Rightarrow \text{bucket 3: } [3] \\
+h(14) &= 14 \bmod 5 = 4 &&\Rightarrow \text{bucket 4: } [14]
+\end{aligned}
+$$
+
+*Example 1*
+
+$$
+\text{Target: } 22
+$$
+
+$$
+h(22) = 2 \Rightarrow \text{bucket 2} = [12, 22, 7]
+$$
+
+Found at **position 2** in the list.
+
+$$
+\text{Output: found (bucket 2, position 2)}
+$$
+
+*Example 2*
+
+$$
+\text{Target: } 9
+$$
+
+$$
+h(9) = 9 \bmod 5 = 4 \Rightarrow \text{bucket 4} = [14]
+$$
+
+No match.
+
+$$
+\text{Output: not found}
+$$
+
+
+**How it works**
+
+```
+h(k) = k mod 5
+Buckets store small lists (linked lists or dynamic arrays)
+
+Idx: 0 1 2 3 4
+ [ ] [ ] [ 12 → 22 → 7 ] [ 3 ] [ 14 ]
+
+Search(22):
+- Compute bucket b = h(22) = 2
+- Linearly scan bucket 2 → find 22
+
+Search(9):
+- b = h(9) = 4
+- Bucket 4: [14] → 9 not present → NOT FOUND
+```
+
+* Simple deletes (remove from a bucket) and no tombstones.
+* Expected $O(1 + α)$ time; with good hashing and α kept near/below 1, bucket lengths stay tiny.
+* Memory overhead for bucket nodes; cache locality worse than open addressing.
+* Buckets can use **ordered lists** or **small vectors** to accelerate scans.
+* Rehashing still needed as n grows; α = n/m controls performance.
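+
+A minimal Python sketch of the bucket-of-lists layout used in the example (`h(k) = k mod 5`):
+
+```python
+m = 5
+buckets = [[] for _ in range(m)]   # one small list per bucket
+
+def insert(key):
+    buckets[key % m].append(key)
+
+def search(key):
+    # Hash to a bucket, then linearly scan that bucket's list.
+    return key in buckets[key % m]
+
+for k in (12, 22, 7, 3, 14):
+    insert(k)
+print(buckets)     # [[], [], [12, 22, 7], [3], [14]]
+print(search(22))  # True
+print(search(9))   # False
+```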
+
+#### Cuckoo Hashing
+Keep two (or more) hash positions per key; insert by “kicking out” occupants to their alternate home so lookups check only a couple of places.
+
+**Example inputs and outputs**
+
+*Setup*
+Two hash tables $T_{1}$ and $T_{2}$, each of size
+
+$$
+m = 5
+$$
+
+Two independent hash functions:
+
+$$
+h_{1}(k), \quad h_{2}(k)
+$$
+
+Cuckoo hashing invariant:
+
+* Each key is stored either in $T_{1}[h_{1}(k)]$ or $T_{2}[h_{2}(k)]$.
+* On insertion, if a spot is occupied, the existing key is **kicked out** and reinserted into the other table.
+* If relocations form a cycle, the table is **rebuilt (rehash)** with new hash functions.
+
+*Example 1*
+
+$$
+\text{Target: } 15
+$$
+
+Lookup procedure:
+
+1. Check $T_{1}[h_{1}(15)]$.
+2. If not found, check $T_{2}[h_{2}(15)]$.
+
+Result:
+
+$$
+\text{found in } T_{2} \text{ at index } 4
+$$
+
+$$
+\text{Output: found (T₂, index 4)}
+$$
+
+*Example 2*
+
+If insertion causes repeated displacements and eventually loops:
+
+$$
+\text{Cycle detected } \Rightarrow \text{rehash with new } h_{1}, h_{2}
+$$
+
+$$
+\text{Output: rebuild / rehash required}
+$$
+
+**How it works**
+
+We keep **two hash tables (T₁, T₂)**, each with its own hash function. Every key can live in **exactly one of two possible slots**:
+
+Hash functions:
+
+$$
+h_1(k) = k \bmod 5, \quad h_2(k) = 1 + (k \bmod 4)
+$$
+
+Every key can live in **exactly one of two slots**: $T_1[h_1(k)]$ or $T_2[h_2(k)]$.
+If a slot is occupied, we **evict** the old occupant and reinsert it at its alternate location.
+
+*Start empty:*
+
+```
+T₁: [ ][ ][ ][ ][ ]
+T₂: [ ][ ][ ][ ][ ]
+```
+
+*Insert 10* → goes to $T_1[h_1(10)=0]$:
+
+```
+T₁: [10 ][ ][ ][ ][ ]
+T₂: [ ][ ][ ][ ][ ]
+```
+
+*Insert 15*
+
+* $T_1[0]$ already has 10 → evict 10
+* Place 15 at $T_1[0]$
+* Reinsert evicted 10 at $T_2[h_2(10)=3]$:
+
+```
+T₁: [15 ][ ][ ][ ][ ]
+T₂: [ ][ ][ ][10 ][ ]
+```
+
+*Insert 20*
+
+* $T_1[0]$ has 15 → evict 15
+* Place 20 at $T_1[0]$
+* Reinsert 15 at $T_2[h_2(15)=4]$:
+
+```
+T₁: [20 ][ ][ ][ ][ ]
+T₂: [ ][ ][ ][10 ][15 ]
+```
+
+*Insert 25*
+
+* $T_1[0]$ has 20 → evict 20
+* Place 25 at $T_1[0]$
+* Reinsert 20 at $T_2[h_2(20)=1]$:
+
+```
+T₁: [25 ][ ][ ][ ][ ]
+T₂: [ ][20 ][ ][10 ][15 ]
+```
+
+🔎 *Search(15)*
+
+* $T_1[h_1(15)=0] \to 25 \neq 15$
+* $T_2[h_2(15)=4] \to 15$ ✅ FOUND
+
+**FOUND in T₂ at index 4**
+
+* Lookups probe at **most two places** (with two hashes) → excellent constant factors.
+* Inserts may trigger a chain of evictions; detect cycles and **rehash** with new functions.
+* High load factors achievable (e.g., \~0.5–0.9 depending on variant and number of hashes/tables).
+* Deletions are easy (remove key); no tombstones, but ensure invariants remain.
+* Sensitive to hash quality; poor hashes increase cycle risk.
+
+### Probabilistic & Approximate Search
+
+#### Bloom Filter
+Space-efficient structure for fast membership tests; answers **“maybe present”** or **“definitely not present”** with a tunable false-positive rate and no false negatives (if built correctly, without deletions).
+
+**Example inputs and outputs**
+
+*Setup*
+
+$$
+m = 16 \ \text{bits},
+\quad k = 3 \ \text{hash functions } (h_{1}, h_{2}, h_{3})
+$$
+
+Inserted set:
+
+$$
+\{"cat", "dog"\}
+$$
+
+*Example 1*
+
+$$
+\text{Query: contains("cat")}
+$$
+
+All $h_{i}(\text{"cat"})$ bits are set → actual member.
+
+$$
+\text{Output: maybe present (true positive)}
+$$
+
+*Example 2*
+
+$$
+\text{Query: contains("cow")}
+$$
+
+One probed bit = 0 → cannot be present.
+
+$$
+\text{Output: definitely not present}
+$$
+
+*Example 3*
+
+$$
+\text{Query: contains("eel")}
+$$
+
+All $h_{i}(\text{"eel"})$ bits happen to be set, even though "eel" was never inserted.
+
+$$
+\text{Output: maybe present (false positive)}
+$$
+
+**How it works**
+
+*Initial state* (all zeros):
+
+```
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+A = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
+```
+
+Insert `"cat"`
+
+```
+h1(cat) = 3, h2(cat) = 7, h3(cat) = 12
+→ Set bits at 3, 7, 12
+
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+A = [0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0]
+ ^ ^ ^
+ 3 7 12
+```
+
+Insert `"dog"`
+
+```
+h1(dog) = 1, h2(dog) = 7, h3(dog) = 9
+→ Set bits at 1, 7, 9 (7 already set)
+
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0]
+ ^ ^ ^
+ 1 7 9
+```
+
+Query `"cow"`
+
+```
+h1(cow) = 1 → bit[1] = 1
+h2(cow) = 3 → bit[3] = 1
+h3(cow) = 6 → bit[6] = 0 ❌
+
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0]
+ ✓ ✓ ✗
+```
+
+At least one zero → **DEFINITELY NOT PRESENT**
+
+Query `"eel"`
+
+```
+h1(eel) = 7 → bit[7] = 1
+h2(eel) = 9 → bit[9] = 1
+h3(eel) = 12 → bit[12] = 1
+
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0]
+ ✓ ✓ ✓
+```
+
+All ones → **MAYBE PRESENT** (could be a **false positive**)
+
+* Answers: **maybe present** / **definitely not present**; never false negatives (without deletions).
+* False-positive rate is tunable via bit-array size **m**, number of hashes **k**, and items **n**; more space & good **k** → lower FPR.
+* Time: $O(k)$ per insert/lookup; Space: \~m bits.
+* No deletions in the basic form; duplicates are harmless (idempotent sets).
+* Union = bitwise OR; intersection = bitwise AND (for same m,k,hashes).
+* Choose independent, well-mixed hash functions to avoid correlated bits.
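+
+A minimal Python sketch; deriving the $k$ probe positions from salted SHA-256 digests is an illustrative choice, not a requirement:
+
+```python
+import hashlib
+
+class BloomFilter:
+    def __init__(self, m=16, k=3):
+        self.m, self.k = m, k
+        self.bits = [0] * m
+
+    def _positions(self, item):
+        # k pseudo-independent bit positions from salted hashes of the item.
+        for i in range(self.k):
+            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
+            yield int(digest, 16) % self.m
+
+    def add(self, item):
+        for p in self._positions(item):
+            self.bits[p] = 1
+
+    def might_contain(self, item):
+        # True = "maybe present", False = "definitely not present".
+        return all(self.bits[p] for p in self._positions(item))
+
+bf = BloomFilter()
+bf.add("cat")
+bf.add("dog")
+print(bf.might_contain("cat"))  # True (maybe present)
+print(bf.might_contain("cow"))  # most likely False; True would be a false positive
+```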
+
+#### Counting Bloom Filter
+Bloom filter variant that keeps a small counter per bit so you can **delete** by decrementing; still probabilistic and may have false positives.
+
+**Example inputs and outputs**
+
+*Setup*
+
+$$
+m = 12 \ \text{counters (each 2–4 bits)},
+\quad k = 3 \ \text{hash functions}
+$$
+
+Inserted set:
+
+$$
+\{\text{"alpha"}, \text{"beta"}\}
+$$
+
+Then delete `"alpha"`.
+
+*Example 1*
+
+$$
+\text{Query: contains("alpha")}
+$$
+
+Counters for `"alpha"` decremented; at least one probed counter is now $0$.
+
+$$
+\text{Output: definitely not present}
+$$
+
+*Example 2*
+
+$$
+\text{Query: contains("beta")}
+$$
+
+All three counters for `"beta"` remain $>0$.
+
+$$
+\text{Output: maybe present}
+$$
+
+*Example 3*
+
+$$
+\text{Query: contains("gamma")}
+$$
+
+At least one probed counter is $0$.
+
+$$
+\text{Output: definitely not present}
+$$
+
+**How it works**
+
+Each cell is a **small counter** (e.g. 4-bits, range 0..15).
+This allows **deletions**: increment on insert, decrement on delete.
+
+Initial state
+
+```
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11
+A = [0 0 0 0 0 0 0 0 0 0 0 0]
+```
+
+Insert `"alpha"`
+
+```
+Hashes: {2, 5, 9}
+→ Increment those counters
+
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11
+A = [0 0 1 0 0 1 0 0 0 1 0 0]
+ ↑ ↑ ↑
+ 2 5 9
+```
+
+Insert `"beta"`
+
+```
+Hashes: {3, 5, 11}
+→ Increment those counters
+
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11
+A = [0 0 1 1 0 2 0 0 0 1 0 1]
+ ↑ ↑ ↑
+ 3 5 11
+```
+
+Lookup `"beta"`
+
+```
+Hashes: {3, 5, 11}
+Counters = {1, 2, 1} → all > 0
+→ Result: MAYBE PRESENT
+
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11
+A = [0 0 1 1 0 2 0 0 0 1 0 1]
+ ✓ ✓ ✓
+```
+
+Delete `"alpha"`
+
+```
+Hashes: {2, 5, 9}
+→ Decrement those counters
+
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11
+A = [0 0 0 1 0 1 0 0 0 0 0 1]
+ ↓ ↓ ↓
+ 2 5 9
+```
+
+Lookup `"alpha"`
+
+```
+Hashes: {2, 5, 9}
+Counters = {0, 1, 0}
+→ At least one zero
+→ Result: DEFINITELY NOT PRESENT
+
+Idx: 0 1 2 3 4 5 6 7 8 9 10 11
+A = [0 0 0 1 0 1 0 0 0 0 0 1]
+ ✗ ✓ ✗
+```
+
+
+* Supports **deletion** by decrementing counters; insertion increments.
+* Still probabilistic: may return false positives; avoids false negatives **if counters never underflow** and hashes are consistent.
+* Space: more than Bloom (a few bits per counter instead of 1).
+* Watch for counter **saturation** (caps at max value) and **underflow** (don’t decrement below 0).
+* Good for dynamic sets with frequent inserts and deletes.
+
+#### Cuckoo Filter
+Hash-table–style filter that stores short **fingerprints** in two possible buckets; supports **insert, lookup, delete** with low false-positive rates and high load factors.
+
+**Example inputs and outputs**
+
+*Setup*
+
+$$
+b = 8 \ \text{buckets},
+\quad \text{bucket size} = 2,
+\quad \text{fingerprint size} = 8 \ \text{bits}
+$$
+
+Inserted set:
+
+$$
+\{\text{"cat"}, \text{"dog"}, \text{"eel"}\}
+$$
+
+Each element is stored as a short fingerprint in one of two candidate buckets.
+
+*Example 1*
+
+$$
+\text{Query: contains("cat")}
+$$
+
+Fingerprint for `"cat"` is present in one of its candidate buckets.
+
+$$
+\text{Output: maybe present (true positive)}
+$$
+
+*Example 2*
+
+$$
+\text{Query: contains("fox")}
+$$
+
+Fingerprint for `"fox"` is absent from both candidate buckets.
+
+$$
+\text{Output: definitely not present}
+$$
+
+*Example 3 (Deletion)*
+
+$$
+\text{Operation: remove("dog")}
+$$
+
+Fingerprint for `"dog"` is removed from its bucket.
+
+$$
+\text{Result: deletion supported directly by removing the fingerprint}
+$$
+
+**How it works**
+
+Each key `x` → short **fingerprint** `f = FP(x)`
+Two candidate buckets:
+
+* `i1 = H(x) mod b`
+* `i2 = i1 XOR H(f) mod b`
+ (`f` can be stored in either bucket; moving between buckets preserves the invariant.)
+
+Start (empty)
+
+```
+[0]: [ -- , -- ] [1]: [ -- , -- ] [2]: [ -- , -- ] [3]: [ -- , -- ]
+[4]: [ -- , -- ] [5]: [ -- , -- ] [6]: [ -- , -- ] [7]: [ -- , -- ]
+```
+
+Insert `"cat"`
+
+```
+f = 0xA7
+i1 = 1
+i2 = 1 XOR H(0xA7) = 5
+
+Bucket 1 has free slot → place 0xA7 in [1]
+
+[1]: [ A7 , -- ]
+```
+
+Insert `"dog"`
+
+```
+f = 0x3C
+i1 = 5
+i2 = 5 XOR H(0x3C) = 2
+
+Bucket 5 has free slot → place 0x3C in [5]
+
+[1]: [ A7 , -- ] [5]: [ 3C , -- ]
+```
+
+Insert `"eel"`
+
+```
+f = 0xD2
+i1 = 1
+i2 = 1 XOR H(0xD2) = 4
+
+Bucket 1 has one free slot → place 0xD2 in [1]
+
+[1]: [ A7 , D2 ] [5]: [ 3C , -- ]
+```
+
+Lookup `"cat"`
+
+```
+f = 0xA7
+Buckets: i1 = 1, i2 = 5
+Check: bucket[1] has A7 → found
+```
+
+Result: MAYBE PRESENT
+
+Lookup `"fox"`
+
+```
+f = 0x9B
+i1 = 0
+i2 = 0 XOR H(0x9B) = 7
+
+Check buckets 0 and 7 → fingerprint not found
+```
+
+Result: DEFINITELY NOT PRESENT
+
+* Stores **fingerprints**, not full keys; answers **maybe present** / **definitely not present**.
+* Supports **deletion** by removing a matching fingerprint from either bucket.
+* Very high load factors (often 90%+ with small buckets) and excellent cache locality.
+* False-positive rate controlled by fingerprint length (more bits → lower FPR).
+* Insertions can trigger **eviction chains**; worst case requires a **rehash/resize**.
+* Two buckets per item (or more in variants); lookups check a tiny, fixed set of places.
+
+### String Search Algorithms
+
+* **KMP:** Best all-rounder for guaranteed $O(n + m)$ and tiny memory.
+* **Boyer–Moore:** Fastest in practice on long patterns / large alphabets due to big skips.
+* **Rabin–Karp:** Great for **many patterns** or streaming; hashing enables batched checks.
+* **Naive:** Fine for tiny inputs or as a baseline; simplest to reason about.
+
+#### Naive String Search
+
+Slide the pattern one position at a time over the text; at each shift compare characters left-to-right until a mismatch or a full match.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Text: } "abracadabra",
+\quad \text{Pattern: } "abra"
+$$
+
+$$
+\text{Output: matches at indices } 0 \text{ and } 7
+$$
+
+*Example 2*
+
+$$
+\text{Text: } "aaaaa",
+\quad \text{Pattern: } "aaa"
+$$
+
+$$
+\text{Output: matches at indices } 0, 1, 2
+$$
+
+**How it works**
+
+*Text* (length 11):
+
+```
+Text: a b r a c a d a b r a
+Idx: 0 1 2 3 4 5 6 7 8 9 10
+```
+
+*Pattern* (length 4):
+
+```
+Pattern: a b r a
+```
+
+*Shift 0*
+
+```
+Text: a b r a
+Pattern: a b r a
+```
+
+✅ All match → **REPORT at index 0**
+
+*Shift 1*
+
+```
+Text: b r a c
+Pattern: a b r a
+```
+
+❌ Mismatch at first char → advance
+
+*Shift 2*
+
+```
+Text: r a c a
+Pattern: a b r a
+```
+
+❌ Mismatch → advance
+
+*Shift 3*
+
+```
+Text: a c a d
+Pattern: a b r a
+```
+
+❌ Mismatch → advance
+
+*Shift 4*
+
+```
+Text: c a d a
+Pattern: a b r a
+```
+
+❌ Mismatch → advance
+
+*Shift 5*
+
+```
+Text: a d a b
+Pattern: a b r a
+```
+
+❌ Mismatch → advance
+
+*Shift 6*
+
+```
+Text: d a b r
+Pattern: a b r a
+```
+
+❌ Mismatch → advance
+
+*Shift 7*
+
+```
+Text: a b r a
+Pattern: a b r a
+```
+
+✅ All match → **REPORT at index 7**
+
+* Works anywhere; no preprocessing.
+* Time: worst/average $O(n \cdot m)$ (text length $n$, pattern length $m$).
+* Space: $O(1)$.
+* Good for very short patterns or tiny inputs; otherwise use KMP/BM/RK.
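+
+A minimal Python sketch that reports every match position:
+
+```python
+def naive_search(text, pattern):
+    n, m = len(text), len(pattern)
+    matches = []
+    for s in range(n - m + 1):          # try every shift
+        if text[s:s + m] == pattern:    # compare the whole window
+            matches.append(s)
+    return matches
+
+print(naive_search("abracadabra", "abra"))  # [0, 7]
+print(naive_search("aaaaa", "aaa"))         # [0, 1, 2]
+```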
+
+#### Knuth–Morris–Pratt (KMP)
+
+Precompute a table (LPS / prefix-function) for the pattern so that on a mismatch you “jump” the pattern to the longest proper prefix that is also a suffix, avoiding rechecks.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Text: } "ababcabcabababd",
+\quad \text{Pattern: } "ababd"
+$$
+
+$$
+\text{Output: match at index } 10
+$$
+
+*Example 2*
+
+$$
+\text{Text: } "aaaaab",
+\quad \text{Pattern: } "aaab"
+$$
+
+$$
+\text{Output: match at index } 2
+$$
+
+**How it works**
+
+We want to find the pattern `"ababd"` in the text `"ababcabcabababd"`.
+
+*1) Precompute LPS (Longest Proper Prefix that is also a Suffix)*
+
+Pattern:
+
+```
+a b a b d
+0 1 2 3 4 ← index
+```
+
+LPS array:
+
+```
+0 0 1 2 0
+```
+
+Meaning:
+
+* At each position, how many chars can we “fall back” within the pattern itself if a mismatch happens.
+* Example: at index 3 (pattern `"abab"`), LPS=2 means if mismatch occurs, restart comparison from `"ab"` inside the pattern.
+
+*2) Scan Text with Two Pointers*
+
+* `i` = text index
+* `j` = pattern index
+
+Text:
+
+```
+a b a b c a b c a b a b a b d
+0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
+Pattern: a b a b d
+```
+
+*Step A: Initial matches*
+
+```
+i=0..3: "abab" matched → j=4 points to 'd'
+```
+
+*Step B: Mismatch at i=4*
+
+```
+text[i=4] = 'c'
+pattern[j=4] = 'd' → mismatch
+```
+
+Instead of restarting, use LPS:
+
+```
+j = LPS[j-1] = LPS[3] = 2
+```
+
+So pattern jumps back to `"ab"` (no wasted text comparisons).
+i stays at 4.
+
+*Step C: Continue scanning*
+
+The algorithm keeps moving forward, reusing LPS whenever mismatches occur.
+
+*Step D: Full match found*
+
+At `i=14`, j advances to 5 (pattern length).
+
+```
+→ FULL MATCH found!
+Start index = i - m + 1 = 14 - 5 + 1 = 10
+```
+
+✅ Pattern `"ababd"` occurs in the text starting at **index 10**.
+
+* Time: $O(n + m)$ (preprocessing + scan).
+* Space: $O(m)$ for LPS table.
+* Never moves i backward; avoids redundant comparisons.
+* Ideal for repeated searches with the same pattern.
+* LPS is also called prefix-function / failure-function.
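+
+A minimal Python sketch: build the LPS table, then scan the text once without ever moving the text pointer backward:
+
+```python
+def kmp_search(text, pattern):
+    m = len(pattern)
+    if m == 0:
+        return []
+    # 1) LPS table: lps[i] = length of the longest proper prefix of
+    #    pattern[:i+1] that is also a suffix.
+    lps, k = [0] * m, 0
+    for i in range(1, m):
+        while k and pattern[i] != pattern[k]:
+            k = lps[k - 1]
+        if pattern[i] == pattern[k]:
+            k += 1
+        lps[i] = k
+    # 2) Scan the text, falling back inside the pattern on mismatches.
+    matches, j = [], 0
+    for i, ch in enumerate(text):
+        while j and ch != pattern[j]:
+            j = lps[j - 1]
+        if ch == pattern[j]:
+            j += 1
+        if j == m:                        # full match ending at i
+            matches.append(i - m + 1)
+            j = lps[j - 1]
+    return matches
+
+print(kmp_search("ababcabcabababd", "ababd"))  # [10]
+print(kmp_search("aaaaab", "aaab"))            # [2]
+```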
+
+#### Boyer–Moore (BM)
+
+Compare the pattern right-to-left; on a mismatch, skip ahead using bad-character and good-suffix rules so many text characters are never touched.
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Text: } "HERE IS A SIMPLE EXAMPLE",
+\quad \text{Pattern: } "EXAMPLE"
+$$
+
+$$
+\text{Output: match at index } 17
+$$
+
+*Example 2*
+
+$$
+\text{Text: } "NEEDLE IN A HAYSTACK",
+\quad \text{Pattern: } "STACK"
+$$
+
+$$
+\text{Output: match at index } 15
+$$
+
+**How it works**
+
+* Align the pattern under the text.
+* Compare **right → left**.
+* On mismatch, shift the pattern by the **max** of:
+  * **Bad-character rule**: align the mismatched text char with its last occurrence in the pattern (or shift past it if absent).
+  * **Good-suffix rule**: if a suffix matched, align another occurrence of it (or a matching prefix).
+
+*Text* (with spaces shown as `_`):
+
+```
+0 10 20 30
+H E R E _ I S _ A _ S I M P L E _ E X A M P L E
+```
+
+**Pattern**: `"EXAMPLE"` (length = 7)
+
+*Step 1: Align the pattern so its last character sits under text[15] (the final `E` of `"SIMPLE"`)*
+
+```
+Text: ... S I M P L E ...
+Pattern: E X A M P L E
+ ↑ (start comparing right → left)
+```
+
+Compare right-to-left:
+
+```
+E=E, L=L, P=P, M=M,
+A vs I → mismatch
+```
+
+* Bad character = `I` (from text).
+* Does the pattern contain `I`? → ❌ no.
+* → Shift the pattern **past `I`**.
+
+*Step 2: Shift until pattern under `"EXAMPLE"`*
+
+```
+Text: ... E X A M P L E
+Pattern: E X A M P L E
+```
+
+Compare right-to-left:
+
+```
+E=E, L=L, P=P, M=M, A=A, X=X, E=E
+```
+
+✅ **Full match** found at **index 17**.
+
+* Average case sublinear (often skips large chunks of text).
+* Worst case can be $O(n \cdot m)$; with both rules plus Galil’s optimization, comparisons can be bounded by $O(n + m)$.
+* Space: $O(σ + m)$ for tables (σ = alphabet size).
+* Shines on long patterns over large alphabets (e.g., English text, logs).
+* Careful table prep (bad-character & good-suffix) is crucial.
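+
+Full Boyer–Moore needs both shift tables; the sketch below is the simpler Boyer–Moore–Horspool variant in Python, which keeps the right-to-left comparison but uses only a bad-character–style table:
+
+```python
+def horspool_search(text, pattern):
+    n, m = len(text), len(pattern)
+    if m == 0 or m > n:
+        return []
+    # Shift = distance from a character's last occurrence (excluding the final
+    # position) to the end of the pattern; characters not in the pattern shift by m.
+    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
+    matches, s = [], 0
+    while s <= n - m:
+        j = m - 1
+        while j >= 0 and text[s + j] == pattern[j]:   # compare right -> left
+            j -= 1
+        if j < 0:
+            matches.append(s)
+            s += 1
+        else:
+            s += shift.get(text[s + m - 1], m)  # skip by the char under the pattern's end
+    return matches
+
+print(horspool_search("HERE IS A SIMPLE EXAMPLE", "EXAMPLE"))  # [17]
+print(horspool_search("NEEDLE IN A HAYSTACK", "STACK"))        # [15]
+```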
+
+#### Rabin–Karp (RK)
+
+Compare rolling hashes of the current text window and the pattern; only if hashes match do a direct character check (to rule out collisions).
+
+**Example inputs and outputs**
+
+*Example 1*
+
+$$
+\text{Text: } "ABCDABCABCD",
+\quad \text{Pattern: } "ABC"
+$$
+
+$$
+\text{Output: matches at indices } 0, 4, 7
+$$
+
+*Example 2*
+
+$$
+\text{Text: } "ABCDE",
+\quad \text{Pattern: } "FG"
+$$
+
+$$
+\text{Output: no match}
+$$
+
+**How it works**
+
+We’ll use the classic choices:
+
+* Base **B = 256**
+* Modulus **M = 101** (prime)
+* Character value `val(c) = ASCII(c)` (e.g., `A=65, B=66, C=67, D=68`)
+* Pattern **P = "ABC"** (length **m = 3**)
+* Text **T = "ABCDABCABCD"** (length 11)
+
+```
+Text: A B C D A B C A B C D
+Index: 0 1 2 3 4 5 6 7 8 9 10
+Pattern: A B C (m = 3)
+```
+
+*Precompute*
+
+```
+pow = B^(m-1) mod M = 256^2 mod 101 = 88
+HP = H(P) = H("ABC")
+```
+
+Start `h=0`, then for each char `h = (B*h + val) mod M`:
+
+* After 'A': `(256*0 + 65) % 101 = 65`
+* After 'B': `(256*65 + 66) % 101 = 41`
+* After 'C': `(256*41 + 67) % 101 = 59`
+
+So **HP = 59**.
+
+*Rolling all windows*
+
+Initial window `T[0..2]="ABC"`:
+
+`h0 = 59` (matches HP → verify chars → ✅ match at 0)
+
+For rolling:
+
+`h_next = ( B * (h_curr − val(left) * pow) + val(new) ) mod M`
+
+(If the inner term is negative, add `M` before multiplying.)
+
+*First two rolls*
+
+From $[0..2]$ "ABC" $(h0=59)$ → $[1..3]$ "BCD":
+
+```
+left='A'(65), new='D'(68)
+inner = (59 − 65*88) mod 101 = (59 − 5720) mod 101 = 96
+h1 = (256*96 + 68) mod 101 = 24644 mod 101 = 0
+```
+
+From $[3..5]$ "DAB" $(h3=66)$ → $[4..6]$ "ABC":
+
+```
+left='D'(68), new='C'(67)
+inner = (66 − 68*88) mod 101 = (66 − 5984) mod 101 = 41
+h4 = (256*41 + 67) mod 101 = 10563 mod 101 = 59 (= HP)
+```
+
+*All windows (start index s)*
+
+```
+s window hash =HP?
+0 ABC 59 ✓ → verify → ✅ MATCH at 0
+1 BCD 0
+2 CDA 38
+3 DAB 66
+4 ABC 59 ✓ → verify → ✅ MATCH at 4
+5 BCA 98
+6 CAB 79
+7 ABC 59 ✓ → verify → ✅ MATCH at 7
+8 BCD 0
+```
+
+Matches at indices: **0, 4, 7**.
+
+* Expected time $O(n + m)$ with a good modulus and low collision rate; worst case $O(n \cdot m)$ if collisions are frequent.
+* Space: $O(1)$ beyond the text/pattern and precomputed powers.
+* Excellent for multi-pattern search (compute many pattern hashes, reuse rolling windows).
+* Choose modulus to reduce collisions; verify on hash hits to ensure correctness.
+* Works naturally on streams/very large texts since it needs only the current window.
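+
+A minimal Python sketch with the same parameters as above ($B = 256$, $M = 101$); every hash hit is verified character by character:
+
+```python
+def rabin_karp(text, pattern, B=256, M=101):
+    n, m = len(text), len(pattern)
+    if m == 0 or m > n:
+        return []
+    power = pow(B, m - 1, M)                 # B^(m-1) mod M, used to drop the left char
+    hp = ht = 0
+    for i in range(m):                       # hash the pattern and the first window
+        hp = (B * hp + ord(pattern[i])) % M
+        ht = (B * ht + ord(text[i])) % M
+    matches = []
+    for s in range(n - m + 1):
+        if ht == hp and text[s:s + m] == pattern:   # verify on hash hit
+            matches.append(s)
+        if s < n - m:                        # roll the window one step right
+            ht = (B * (ht - ord(text[s]) * power) + ord(text[s + m])) % M
+    return matches
+
+print(rabin_karp("ABCDABCABCD", "ABC"))  # [0, 4, 7]
+print(rabin_karp("ABCDE", "FG"))         # []
+```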
diff --git a/notes/sorting.md b/notes/sorting.md
index c697ba0..b515b6b 100644
--- a/notes/sorting.md
+++ b/notes/sorting.md
@@ -72,343 +72,887 @@ If you then did a second pass (say, sorting by rank or battle-honors) you’d on
### Bubble Sort
-Bubble sort, one of the simplest sorting algorithms, is often a go-to choice for teaching the foundational concepts of sorting due to its intuitive nature. The name "bubble sort" stems from the way larger elements "bubble up" towards the end of the array, much like how bubbles rise in a liquid.
+Bubble sort is one of the simplest sorting algorithms. It is often used as an **introductory algorithm** because it is easy to understand, even though it is not efficient for large datasets.
-#### Conceptual Overview
+The name comes from the way **larger elements "bubble up"** to the top (end of the list), just as bubbles rise in water.
-Imagine a sequence of numbers. Starting from the beginning of the sequence, we compare each pair of adjacent numbers and swap them if they are out of order. As a result, at the end of the first pass, the largest number will have "bubbled up" to the last position. Each subsequent pass ensures that the next largest number finds its correct position, and this continues until the whole array is sorted.
+The basic idea:
-#### Steps
+* Compare **adjacent elements**.
+* Swap them if they are in the wrong order.
+* Repeat until no swaps are needed.
-1. Start from the first item and compare it with its neighbor to the right.
-2. If the items are out of order (i.e., the left item is greater than the right), swap them.
-3. Move to the next item and repeat the above steps until the end of the array.
-4. After the first pass, the largest item will be at the last position. On the next pass, you can ignore the last item and consider the rest of the array.
-5. Continue this process for `n-1` passes to ensure the array is completely sorted.
+**Step-by-Step Walkthrough**
+1. Start from the **first element**.
+2. Compare it with its **neighbor to the right**.
+3. If the left is greater, **swap** them.
+4. Move to the next pair and repeat until the end of the list.
+5. After the **first pass**, the largest element is at the end.
+6. On each new pass, ignore the elements already in their correct place.
+7. Continue until the list is sorted.
+
+**Example Run**
+
+We will sort the array:
+
+```
+[ 5 ][ 1 ][ 4 ][ 2 ][ 8 ]
+```
+
+**Pass 1**
+
+Compare adjacent pairs and push the largest to the end.
+
+```
+Initial: [ 5 ][ 1 ][ 4 ][ 2 ][ 8 ]
+
+Compare 5 and 1 → swap
+ [ 1 ][ 5 ][ 4 ][ 2 ][ 8 ]
+
+Compare 5 and 4 → swap
+ [ 1 ][ 4 ][ 5 ][ 2 ][ 8 ]
+
+Compare 5 and 2 → swap
+ [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ]
+
+Compare 5 and 8 → no swap
+ [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ]
+```
+
+✔ Largest element **8** has bubbled to the end.
+
+**Pass 2**
+
+Now we only need to check the first 4 elements.
+
+```
+Start: [ 1 ][ 4 ][ 2 ][ 5 ] [8]
+
+Compare 1 and 4 → no swap
+ [ 1 ][ 4 ][ 2 ][ 5 ] [8]
+
+Compare 4 and 2 → swap
+ [ 1 ][ 2 ][ 4 ][ 5 ] [8]
+
+Compare 4 and 5 → no swap
+ [ 1 ][ 2 ][ 4 ][ 5 ] [8]
```
-Start: [ 5 ][ 1 ][ 4 ][ 2 ][ 8 ]
-Pass 1:
- [ 5 ][ 1 ][ 4 ][ 2 ][ 8 ] → swap(5,1) → [ 1 ][ 5 ][ 4 ][ 2 ][ 8 ]
- [ 1 ][ 5 ][ 4 ][ 2 ][ 8 ] → swap(5,4) → [ 1 ][ 4 ][ 5 ][ 2 ][ 8 ]
- [ 1 ][ 4 ][ 5 ][ 2 ][ 8 ] → swap(5,2) → [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ]
- [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ] → no swap → [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ]
+✔ Second largest element **5** is now in place.
+
+**Pass 3**
+
+Check only the first 3 elements.
+
+```
+Start: [ 1 ][ 2 ][ 4 ] [5][8]
+
+Compare 1 and 2 → no swap
+ [ 1 ][ 2 ][ 4 ] [5][8]
+
+Compare 2 and 4 → no swap
+ [ 1 ][ 2 ][ 4 ] [5][8]
+```
+
+✔ Sorted order is now reached.
+
+**Final Result**
+
+```
+[ 1 ][ 2 ][ 4 ][ 5 ][ 8 ]
+```
-Pass 2:
- [ 1 ][ 4 ][ 2 ][ 5 ] [8] → no swap → [ 1 ][ 4 ][ 2 ][ 5 ] [8]
- [ 1 ][ 4 ][ 2 ][ 5 ] [8] → swap(4,2) → [ 1 ][ 2 ][ 4 ][ 5 ] [8]
- [ 1 ][ 2 ][ 4 ][ 5 ] [8] → no swap → [ 1 ][ 2 ][ 4 ][ 5 ] [8]
+**Visual Illustration of Bubble Effect**
-Pass 3:
- [ 1 ][ 2 ][ 4 ] [5,8] → all comparisons OK
+Here’s how the **largest values "bubble up"** to the right after each pass:
-Result: [ 1 ][ 2 ][ 4 ][ 5 ][ 8 ]
```
+Pass 1: [ 5 1 4 2 8 ] → [ 1 4 2 5 8 ]
+Pass 2: [ 1 4 2 5 ] → [ 1 2 4 5 ] [8]
+Pass 3: [ 1 2 4 ] → [ 1 2 4 ] [5 8]
+```
+
+Sorted! ✅
+
+**Optimizations**
-#### Optimizations
+* By keeping track of whether any swaps were made during a pass, Bubble Sort can terminate early if the array is already sorted. This optimization makes Bubble Sort’s **best case** much faster ($O(n)$).
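+
+A minimal Python sketch of this early-exit variant (the linked implementations below are the full versions):
+
+```python
+def bubble_sort(arr):
+    n = len(arr)
+    for i in range(n - 1):
+        swapped = False
+        for j in range(n - 1 - i):          # last i elements are already in place
+            if arr[j] > arr[j + 1]:
+                arr[j], arr[j + 1] = arr[j + 1], arr[j]
+                swapped = True
+        if not swapped:                     # no swaps -> already sorted, O(n) best case
+            break
+    return arr
+
+print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
+```
+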
-An important optimization for bubble sort is to keep track of whether any swaps were made during a pass. If a pass completes without any swaps, it means the array is already sorted, and there's no need to continue further iterations.
+**Stability**
-#### Stability
+Bubble sort is **stable**.
-Bubble sort is stable. This means that two objects with equal keys will retain their relative order after sorting. Thus, if you had records sorted by name and then sorted them using bubble sort based on age, records with the same age would still maintain the name order.
+* If two elements have the same value, they remain in the same order relative to each other after sorting.
+* This is important when sorting complex records where a secondary key matters.
-#### Time Complexity
+**Complexity**
-- In the **worst-case** scenario, the time complexity of bubble sort is $O(n^2)$, which occurs when the array is in reverse order.
-- The **average-case** time complexity is also $O(n^2)$, as bubble sort generally requires quadratic time for typical unsorted arrays.
-- In the **best-case** scenario, the time complexity is $O(n)$, which happens when the array is already sorted, especially if an optimization like early exit is implemented.
+| Case | Time Complexity | Notes |
+|------------------|-----------------|----------------------------------------|
+| **Worst Case** | $O(n^2)$ | Array in reverse order |
+| **Average Case** | $O(n^2)$ | Typically quadratic comparisons |
+| **Best Case** | $O(n)$ | Already sorted + early exit optimization |
+| **Space** | $O(1)$ | In-place, requires no extra memory |
-#### Space Complexity
+**Implementation**
-$(O(1))$ - It sorts in place, so it doesn't require any additional memory beyond the input array.
+* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/bubble_sort/src/bubble_sort.cpp)
+* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/bubble_sort/src/bubble_sort.py)
### Selection Sort
-Selection sort is another intuitive algorithm, widely taught in computer science curricula due to its straightforward mechanism. The crux of selection sort lies in repeatedly selecting the smallest (or largest, depending on the desired order) element from the unsorted section of the array and swapping it with the first unsorted element.
+Selection sort is another simple sorting algorithm, often introduced right after bubble sort because it is equally easy to understand.
+
+Instead of repeatedly "bubbling" elements, **selection sort works by repeatedly selecting the smallest (or largest) element** from the unsorted portion of the array and placing it into its correct position.
+
+Think of it like arranging books:
-#### Conceptual Overview
+* Look through all the books, find the smallest one, and put it first.
+* Then, look through the rest, find the next smallest, and put it second.
+* Repeat until the shelf is sorted.
-Consider an array of numbers. The algorithm divides the array into two parts: a sorted subarray and an unsorted subarray. Initially, the sorted subarray is empty, while the entire array is unsorted. During each pass, the smallest element from the unsorted subarray is identified and then swapped with the first unsorted element. As a result, the sorted subarray grows by one element after each pass.
+**Step-by-Step Walkthrough**
-#### Steps
+1. Start at the **first position**.
+2. Search the **entire unsorted region** to find the smallest element.
+3. Swap it with the element in the current position.
+4. Move the boundary of the sorted region one step forward.
+5. Repeat until all elements are sorted.
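+
+Translating these steps into code gives a very short routine. The sketch below is illustrative only; the repository's implementations are linked at the end of this section:
+
+```python
+def selection_sort(arr):
+    """Sort arr in place by repeatedly selecting the minimum of the unsorted region."""
+    n = len(arr)
+    for i in range(n - 1):
+        min_idx = i
+        # Scan the unsorted region arr[i:] for the smallest element.
+        for j in range(i + 1, n):
+            if arr[j] < arr[min_idx]:
+                min_idx = j
+        # One swap per pass: place the minimum at the sorted/unsorted boundary.
+        if min_idx != i:
+            arr[i], arr[min_idx] = arr[min_idx], arr[i]
+    return arr
+
+print(selection_sort([64, 25, 12, 22, 11]))  # [11, 12, 22, 25, 64]
+```
+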
-1. Assume the first element is the smallest.
-2. Traverse the unsorted subarray and find the smallest element.
-3. Swap the found smallest element with the first element of the unsorted subarray.
-4. Move the boundary of the sorted and unsorted subarrays one element to the right.
-5. Repeat steps 1-4 until the entire array is sorted.
+**Example Run**
+
+We will sort the array:
```
-Start:
[ 64 ][ 25 ][ 12 ][ 22 ][ 11 ]
+```
-Pass 1: find min(64,25,12,22,11)=11, swap with first element
-[ 11 ][ 25 ][ 12 ][ 22 ][ 64 ]
+**Pass 1**
-Pass 2: find min(25,12,22,64)=12, swap with second element
-[ 11 ][ 12 ][ 25 ][ 22 ][ 64 ]
+
+Find the smallest element in the entire array and put it in the first position.
+
-Pass 3: find min(25,22,64)=22, swap with third element
-[ 11 ][ 12 ][ 22 ][ 25 ][ 64 ]
+```
+Initial: [ 64 ][ 25 ][ 12 ][ 22 ][ 11 ]
-Pass 4: find min(25,64)=25, swap with fourth element (self-swap)
-[ 11 ][ 12 ][ 22 ][ 25 ][ 64 ]
+Smallest = 11
+Swap 64 ↔ 11
-Pass 5: only one element remains, already in place
-[ 11 ][ 12 ][ 22 ][ 25 ][ 64 ]
+Result: [ 11 ][ 25 ][ 12 ][ 22 ][ 64 ]
+```
+
+✔ The first element is now in its correct place.
+
+**Pass 2**
+
+Find the smallest element in the remaining unsorted region.
+
+```
+Start: [ 11 ][ 25 ][ 12 ][ 22 ][ 64 ]
+
+Smallest in [25,12,22,64] = 12
+Swap 25 ↔ 12
+
+Result: [ 11 ][ 12 ][ 25 ][ 22 ][ 64 ]
+```
+
+✔ The second element is now in place.
+
+**Pass 3**
+
+Repeat for the next unsorted region.
+
+```
+Start: [ 11 ][ 12 ][ 25 ][ 22 ][ 64 ]
+
+Smallest in [25,22,64] = 22
+Swap 25 ↔ 22
+
+Result: [ 11 ][ 12 ][ 22 ][ 25 ][ 64 ]
+```
+
+✔ The third element is now in place.
+
+**Pass 4**
+
+Finally, sort the last two.
+
+```
+Start: [ 11 ][ 12 ][ 22 ][ 25 ][ 64 ]
+
+Smallest in [25,64] = 25
+Already in correct place → no swap
-Result:
+Result: [ 11 ][ 12 ][ 22 ][ 25 ][ 64 ]
+```
+
+✔ Array fully sorted.
+
+**Final Result**
+
+```
[ 11 ][ 12 ][ 22 ][ 25 ][ 64 ]
```
-#### Stability
+**Visual Illustration of Selection**
+
+Here’s how the **sorted region expands** from left to right:
+
+```
+Pass 1: [ 64 25 12 22 11 ] → [ 11 ] [ 25 12 22 64 ]
+Pass 2: [ 11 ][ 25 12 22 64 ] → [ 11 12 ] [ 25 22 64 ]
+Pass 3: [ 11 12 ][ 25 22 64 ] → [ 11 12 22 ] [ 25 64 ]
+Pass 4: [ 11 12 22 ][ 25 64 ] → [ 11 12 22 25 ] [ 64 ]
+```
-Selection sort is inherently unstable. When two elements have equal keys, their relative order might change post-sorting. This can be problematic in scenarios where stability is crucial.
+At each step:
-#### Time Complexity
+* The **left region is sorted** ✅
+* The **right region is unsorted** 🔄
-- In the **worst-case**, the time complexity is $O(n^2)$, as even if the array is already sorted, the algorithm still iterates through every element to find the smallest.
-- The **average-case** time complexity is also $O(n^2)$, since the algorithm's performance generally remains quadratic regardless of input arrangement.
-- In the **best-case**, the time complexity is still $O(n^2)$, unlike other algorithms, because selection sort always performs the same number of comparisons, regardless of the input's initial order.
+**Optimizations**
-#### Space Complexity
+* Unlike bubble sort, **early exit is not possible** because selection sort always scans the entire unsorted region to find the minimum.
+* It does, however, perform fewer swaps: **at most $n-1$ swaps**, compared with up to $O(n^2)$ swaps in bubble sort.
-$(O(1))$ - The algorithm sorts in-place, meaning it doesn't use any extra space beyond what's needed for the input.
+**Stability**
-#### Implementation
+* **Selection sort is NOT stable** in its classic form.
+* If two elements are equal, their order may change due to swapping.
+* Stability can be achieved by inserting instead of swapping, but this makes the algorithm more complex.
+
+**Complexity**
+
+| Case | Time Complexity | Notes |
+|------------------|-----------------|--------------------------------------------|
+| **Worst Case** | $O(n^2)$ | Scanning full unsorted region every pass |
+| **Average Case** | $O(n^2)$ | Quadratic comparisons |
+| **Best Case** | $O(n^2)$ | No improvement, still must scan every pass |
+| **Space** | $O(1)$ | In-place sorting |
+
+**Implementation**
* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/selection_sort/src/selection_sort.cpp)
* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/selection_sort/src/selection_sort.py)
### Insertion Sort
-Insertion sort works much like how one might sort a hand of playing cards. It builds a sorted array (or list) one element at a time by repeatedly taking one element from the input and inserting it into the correct position in the already-sorted section of the array. Its simplicity makes it a common choice for teaching the basics of algorithm design.
+Insertion sort is a simple, intuitive sorting algorithm that works the way people often sort playing cards in their hands.
+
+It builds the **sorted portion one element at a time**, by repeatedly taking the next element from the unsorted portion and inserting it into its correct position among the already sorted elements.
+
+The basic idea:
+
+1. Start with the **second element** (the first element by itself is trivially sorted).
+2. Compare it with elements to its **left**.
+3. Shift larger elements one position to the right.
+4. Insert the element into the correct spot.
+5. Repeat until all elements are processed.
+
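+The steps above map directly onto a short routine; the following Python sketch is illustrative only (the repository's implementations are linked at the end of this section):
+
+```python
+def insertion_sort(arr):
+    """Sort arr in place by inserting each element into the sorted prefix."""
+    for i in range(1, len(arr)):
+        key = arr[i]
+        j = i - 1
+        # Shift elements larger than key one position to the right.
+        while j >= 0 and arr[j] > key:
+            arr[j + 1] = arr[j]
+            j -= 1
+        arr[j + 1] = key  # insert key into its correct spot
+    return arr
+
+print(insertion_sort([12, 11, 13, 5, 6]))  # [5, 6, 11, 12, 13]
+```
+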
+**Example Run**
+
+We will sort the array:
+
+```
+[ 12 ][ 11 ][ 13 ][ 5 ][ 6 ]
+```
+
+**Pass 1: Insert 11**
+
+Compare 11 with 12 → shift 12 right → insert 11 before it.
+
+```
+Before: [ 12 ][ 11 ][ 13 ][ 5 ][ 6 ]
+Action: Insert 11 before 12
+After: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ]
+```
+
+✔ Sorted portion: [11, 12]
+
+**Pass 2: Insert 13**
+
+Compare 13 with 12 → already greater → stays in place.
+
+```
+Before: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ]
+After: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ]
+```
+
+✔ Sorted portion: [11, 12, 13]
+
+**Pass 3: Insert 5**
+
+* Compare 5 with 13 → shift 13
+* Compare 5 with 12 → shift 12
+* Compare 5 with 11 → shift 11
+* Insert 5 at the start.
+
-#### Conceptual Overview
+```
+Before: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ]
+Action: Move 13 → Move 12 → Move 11 → Insert 5
+After: [ 5 ][ 11 ][ 12 ][ 13 ][ 6 ]
+```
-Imagine you have a series of numbers. The algorithm begins with the second element (assuming the first element on its own is already sorted) and inserts it into the correct position relative to the first. With each subsequent iteration, the algorithm takes the next unsorted element and scans through the sorted subarray, finding the appropriate position to insert the new element.
+✔ Sorted portion: [5, 11, 12, 13]
-#### Steps
+**Pass 4: Insert 6**
-1. Start at the second element (index 1) assuming the element at index 0 is sorted.
-2. Compare the current element with the previous elements.
-3. If the current element is smaller than the previous element, compare it with the elements before until you reach an element smaller or until you reach the start of the array.
-4. Insert the current element into the correct position so that the elements before are all smaller.
-5. Repeat steps 2-4 for each element in the array.
+* Compare 6 with 13 → shift 13
+* Compare 6 with 12 → shift 12
+* Compare 6 with 11 → shift 11
+* Insert 6 after 5.
+
```
-Start:
-[ 12 ][ 11 ][ 13 ][ 5 ][ 6 ]
+Before: [ 5 ][ 11 ][ 12 ][ 13 ][ 6 ]
+Action: Move 13 → Move 12 → Move 11 → Insert 6
+After: [ 5 ][ 6 ][ 11 ][ 12 ][ 13 ]
+```
-Pass 1: key = 11, insert into [12]
-[ 11 ][ 12 ][ 13 ][ 5 ][ 6 ]
+✔ Sorted!
-Pass 2: key = 13, stays in place
-[ 11 ][ 12 ][ 13 ][ 5 ][ 6 ]
+**Final Result**
-Pass 3: key = 5, insert into [11,12,13]
-[ 5 ][ 11 ][ 12 ][ 13 ][ 6 ]
+```
+[ 5 ][ 6 ][ 11 ][ 12 ][ 13 ]
+```
-Pass 4: key = 6, insert into [5,11,12,13]
-[ 5 ][ 6 ][ 11 ][ 12 ][ 13 ]
+**Visual Growth of Sorted Region**
-Result:
-[ 5 ][ 6 ][ 11 ][ 12 ][ 13 ]
```
+Start: [ 12 | 11 13 5 6 ]
+Pass 1: [ 11 12 | 13 5 6 ]
+Pass 2: [ 11 12 13 | 5 6 ]
+Pass 3: [ 5 11 12 13 | 6 ]
+Pass 4: [ 5 6 11 12 13 ]
+```
+
+✔ The **bar ( | )** shows the boundary between **sorted** and **unsorted**.
-#### Stability
+**Optimizations**
-Insertion sort is stable. When two elements have equal keys, their relative order remains unchanged post-sorting. This stability is preserved since the algorithm only swaps elements if they are out of order, ensuring that equal elements never overtake each other.
+* Efficient for **small arrays**.
+* Useful as a **helper inside more complex sorts** (e.g., Quick Sort or Merge Sort) for small subarrays.
+* Can be optimized with **binary search** to find insertion positions faster (but shifting still takes linear time).
-#### Time Complexity
+**Stability**
-- In the **worst-case**, the time complexity is $O(n^2)$, which happens when the array is in reverse order, requiring every element to be compared with every other element.
-- The **average-case** time complexity is $O(n^2)$, as elements generally need to be compared with others, leading to quadratic performance.
-- In the **best-case**, the time complexity is $O(n)$, occurring when the array is already sorted, allowing the algorithm to simply pass through the array once without making any swaps.
+Insertion sort is **stable** (equal elements keep their relative order).
-#### Space Complexity
+**Complexity**
-$(O(1))$ - This in-place sorting algorithm doesn't need any additional storage beyond the input array.
+| Case | Time Complexity | Notes |
+|------------------|-----------------|---------------------------------------------------|
+| **Worst Case** | $O(n^2)$ | Reverse-sorted input |
+| **Average Case** | $O(n^2)$ | |
+| **Best Case** | $O(n)$ | Already sorted input — only comparisons, no shifts |
+| **Space** | $O(1)$ | In-place |
-#### Implementation
+**Implementation**
* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/insertion_sort/src/insertion_sort.cpp)
* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/insertion_sort/src/insertion_sort.py)
### Quick Sort
-Quick Sort, often simply referred to as "quicksort", is a divide-and-conquer algorithm that's renowned for its efficiency and is widely used in practice. Its name stems from its ability to sort large datasets quickly. The core idea behind quicksort is selecting a 'pivot' element and partitioning the other elements into two sub-arrays according to whether they are less than or greater than the pivot. The process is then recursively applied to the sub-arrays.
+Quick Sort is a **divide-and-conquer** algorithm. Unlike bubble sort or selection sort, which work by repeatedly scanning the whole array, Quick Sort works by **partitioning** the array into smaller sections around a "pivot" element and then sorting those sections independently.
+
+It is one of the **fastest sorting algorithms in practice**, widely used in libraries and systems.
+
+The basic idea:
+
+1. Choose a **pivot element** (commonly the last, first, middle, or random element).
+2. Rearrange (partition) the array so that:
+   * All elements **smaller than the pivot** come before it.
+   * All elements **larger than the pivot** come after it.
+3. The pivot is now in its **final sorted position**.
+4. Recursively apply Quick Sort to the **left subarray** and **right subarray**.
+
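+As a rough illustration of these steps, here is a minimal Python sketch using Lomuto partitioning with the last element as the pivot (the same convention the walkthrough below assumes). Production implementations add the pivot-selection and small-array tweaks discussed under Optimizations:
+
+```python
+def quick_sort(arr, low=0, high=None):
+    """Sort arr in place with Lomuto partitioning (last element as pivot)."""
+    if high is None:
+        high = len(arr) - 1
+    if low < high:
+        p = partition(arr, low, high)   # pivot lands at its final index p
+        quick_sort(arr, low, p - 1)     # sort elements left of the pivot
+        quick_sort(arr, p + 1, high)    # sort elements right of the pivot
+    return arr
+
+def partition(arr, low, high):
+    pivot = arr[high]
+    i = low - 1
+    for j in range(low, high):
+        if arr[j] < pivot:              # move smaller elements to the left block
+            i += 1
+            arr[i], arr[j] = arr[j], arr[i]
+    arr[i + 1], arr[high] = arr[high], arr[i + 1]  # place pivot after the left block
+    return i + 1
+
+print(quick_sort([10, 80, 30, 90, 40, 50, 70]))  # [10, 30, 40, 50, 70, 80, 90]
+```
+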
+**Example Run**
+
+We will sort the array:
+
+```
+[ 10 ][ 80 ][ 30 ][ 90 ][ 40 ][ 50 ][ 70 ]
+```
+
+**Step 1: Choose Pivot (last element = 70)**
+
+Partition around 70.
+
+```
+Initial: [ 10 ][ 80 ][ 30 ][ 90 ][ 40 ][ 50 ][ 70 ]
+
+→ Elements < 70: [ 10, 30, 40, 50 ]
+→ Pivot (70) goes here ↓
+Sorted split: [ 10 ][ 30 ][ 40 ][ 50 ][ 70 ][ 90 ][ 80 ]
+```
+
+*(ordering of right side may vary during partition; only pivot’s position is guaranteed)*
+
+✔ Pivot (70) is in correct place.
+
+**Step 2: Left Subarray [10, 30, 40, 50]**
+
+Choose pivot = 50.
+
+```
+[ 10 ][ 30 ][ 40 ][ 50 ] → pivot = 50
+
+→ Elements < 50: [10, 30, 40]
+→ Pivot at correct place
+
+Result: [ 10 ][ 30 ][ 40 ][ 50 ]
+```
+
+✔ Pivot (50) fixed.
+
+**Step 3: Left Subarray of Left [10, 30, 40]**
+
+Choose pivot = 40.
+
+```
+[ 10 ][ 30 ][ 40 ] → pivot = 40
-#### Conceptual Overview
+→ Elements < 40: [10, 30]
+→ Pivot at correct place
-1. The first step is to **choose a pivot** from the array, which is the element used to partition the array. The pivot selection method can vary, such as picking the first element, the middle element, a random element, or using a more advanced approach like the median-of-three.
-2. During **partitioning**, the elements in the array are rearranged so that all elements less than or equal to the pivot are placed before it, and all elements greater than the pivot are placed after it. At this point, the pivot reaches its final sorted position.
-3. Finally, **recursion** is applied by repeating the same process for the two sub-arrays: one containing elements less than the pivot and the other containing elements greater than the pivot.
+Result: [ 10 ][ 30 ][ 40 ]
+```
-#### Steps
+✔ Pivot (40) fixed.
-1. Choose a 'pivot' from the array.
-2. Partition the array around the pivot, ensuring all elements on the left are less than the pivot and all elements on the right are greater than it.
-3. Recursively apply steps 1 and 2 to the left and right partitions.
-4. Repeat until base case: the partition has only one or zero elements.
+**Step 4: [10, 30]**
+
+Choose pivot = 30.
+
```
-Start:
-[ 10 ][ 7 ][ 8 ][ 9 ][ 1 ][ 5 ]
+[ 10 ][ 30 ] → pivot = 30
-Partition around pivot = 5:
- • Compare and swap ↓
- [ 1 ][ 7 ][ 8 ][ 9 ][ 10 ][ 5 ]
- • Place pivot in correct spot ↓
- [ 1 ][ 5 ][ 8 ][ 9 ][ 10 ][ 7 ]
+→ Elements < 30: [10]
-Recurse on left [1] → already sorted
-Recurse on right [8, 9, 10, 7]:
+Result: [ 10 ][ 30 ]
+```
- Partition around pivot = 7:
- [ 7 ][ 9 ][ 10 ][ 8 ]
- Recurse left [] → []
- Recurse right [9, 10, 8]:
+✔ Sorted.
+
+**Step 5: Right Subarray [90, 80]**
+
+The right subarray from the first partition is handled the same way: with pivot = 80, the only remaining element (90) is larger, so 80 and 90 swap places, giving [ 80 ][ 90 ]. Every element is now in its final position.
- Partition around pivot = 8:
- [ 8 ][ 10 ][ 9 ]
- Recurse left [] → []
- Recurse right [10, 9]:
- Partition pivot = 9:
- [ 9 ][ 10 ]
- → both sides sorted
+**Final Result**
- → merge [8] + [9, 10] → [ 8 ][ 9 ][ 10 ]
+```
+[ 10 ][ 30 ][ 40 ][ 50 ][ 70 ][ 80 ][ 90 ]
+```
+
+**Visual Partition Illustration**
+
+Here’s how the array gets partitioned step by step:
+
+```
+Pass 1: [ 10 80 30 90 40 50 | 70 ]
+ ↓ pivot = 70
+ [ 10 30 40 50 | 70 | 90 80 ]
- → merge [7] + [8, 9, 10] → [ 7 ][ 8 ][ 9 ][ 10 ]
+Pass 2: [ 10 30 40 | 50 ] [70] [90 80]
+ ↓ pivot = 50
+ [ 10 30 40 | 50 ] [70] [90 80]
-→ merge [1, 5] + [7, 8, 9, 10] → [ 1 ][ 5 ][ 7 ][ 8 ][ 9 ][ 10 ]
+Pass 3: [ 10 30 | 40 ] [50] [70] [90 80]
+ ↓ pivot = 40
+ [ 10 30 | 40 ] [50] [70] [90 80]
-Result:
-[ 1 ][ 5 ][ 7 ][ 8 ][ 9 ][ 10 ]
+Pass 4: [ 10 | 30 ] [40] [50] [70] [90 80]
+         ↓ pivot = 30
+        [ 10 | 30 ] [40] [50] [70] [90 80]
+
+Pass 5: [10] [30] [40] [50] [70] [ 90 | 80 ]
+         ↓ pivot = 80
+        [10] [30] [40] [50] [70] [ 80 ][ 90 ]
```
-#### Stability
+✔ Each pivot splits the problem into smaller and smaller subarrays until the whole array is sorted.
-Quick sort is inherently unstable due to the long-distance exchanges of values. However, with specific modifications, it can be made stable, although this is not commonly done.
+**Optimizations**
-#### Time Complexity
+* **Pivot Choice:** Choosing a good pivot (e.g., median or random) improves performance.
+* **Small Subarrays:** For very small partitions, switch to Insertion Sort for efficiency.
+* **Tail Recursion:** Recursing on the smaller partition first and iterating over the larger one keeps the recursion depth at $O(\log n)$.
-- In the **worst-case**, the time complexity is $O(n^2)$, which can occur when the pivot is the smallest or largest element, resulting in highly unbalanced partitions. However, with effective pivot selection strategies, this scenario is rare in practice.
-- The **average-case** time complexity is $O(n \log n)$, which is expected when using a good pivot selection method that balances the partitions reasonably well.
-- In the **best-case**, the time complexity is also $O(n \log n)$, occurring when each pivot divides the array into two roughly equal-sized parts, leading to optimal partitioning.
+**Stability**
-#### Space Complexity
+* Quick Sort is **not stable** by default (equal elements may be reordered).
+* Stable versions exist, but require modifications.
-$(O(\log n))$ - Though quicksort sorts in place, it requires stack space for recursion, which in the best case is logarithmic.
+**Complexity**
-#### Implementation
+| Case | Time Complexity | Notes |
+|------------------|-----------------|----------------------------------------------------------------------|
+| **Worst Case** | $O(n^2)$ | Poor pivot choices (e.g., always smallest/largest in sorted array) |
+| **Average Case** | $O(n \log n)$ | Expected performance, very fast in practice |
+| **Best Case** | $O(n \log n)$ | Balanced partitions |
+| **Space** | $O(\log n)$ | Due to recursion stack |
+
+**Implementation**
* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/quick_sort/src/quick_sort.cpp)
* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/quick_sort/src/quick_sort.py)
### Heap sort
-Heap Sort is a comparison-based sorting technique performed on a binary heap data structure. It leverages the properties of a heap to efficiently sort a dataset. The essential idea is to build a heap from the input data, then continuously extract the maximum element from the heap and reconstruct the heap until it's empty. The result is a sorted list.
+Heap Sort is a **comparison-based sorting algorithm** that uses a special data structure called a **binary heap**.
+It is efficient, with guaranteed $O(n \log n)$ performance, and sorts **in-place** (no extra array needed).
+
+The basic idea:
-#### Conceptual Overview
+1. **Build a max heap** from the input array.
+   * In a max heap, every parent is greater than or equal to its children.
+   * This ensures the **largest element is at the root** (first index).
+2. Swap the **root (largest element)** with the **last element** of the heap.
+3. Reduce the heap size by 1 (ignore the last element, which is now in place).
+4. **Heapify** (restore heap property).
+5. Repeat until all elements are sorted.
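+
+A compact Python sketch of this procedure, using an iterative sift-down so the auxiliary space stays $O(1)$ (illustrative only; the linked implementations at the end of the section are the repository's versions):
+
+```python
+def heap_sort(arr):
+    """Sort arr in place: build a max heap, then repeatedly move the root to the end."""
+    n = len(arr)
+    # Build the max heap bottom-up.
+    for i in range(n // 2 - 1, -1, -1):
+        sift_down(arr, i, n)
+    # Repeatedly swap the root (maximum) with the last unsorted element.
+    for end in range(n - 1, 0, -1):
+        arr[0], arr[end] = arr[end], arr[0]
+        sift_down(arr, 0, end)          # restore the heap property on arr[:end]
+    return arr
+
+def sift_down(arr, root, size):
+    """Iteratively push arr[root] down until the max-heap property holds."""
+    while True:
+        largest = root
+        left, right = 2 * root + 1, 2 * root + 2
+        if left < size and arr[left] > arr[largest]:
+            largest = left
+        if right < size and arr[right] > arr[largest]:
+            largest = right
+        if largest == root:
+            return
+        arr[root], arr[largest] = arr[largest], arr[root]
+        root = largest
+
+print(heap_sort([4, 10, 3, 5, 1]))  # [1, 3, 4, 5, 10]
+```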
-1. The first step is to **build a max heap**, which involves transforming the list into a max heap (a complete binary tree where each node is greater than or equal to its children). This is typically achieved using a bottom-up approach to ensure the heap property is satisfied. *(Building the heap with Floyd’s bottom-up procedure costs Θ(*n*) time—lower than Θ(*n log n*)—so it never dominates the overall running time.)*
+**Example Run**
-2. During **sorting**, the maximum element (the root of the heap) is swapped with the last element of the unsorted portion of the array, placing the largest element in its final position. **After each swap, the newly “fixed” maximum stays at the end of the *same* array; the active heap is simply the prefix that remains unsorted.** The heap size is then reduced by one, and the unsorted portion is restructured into a max heap. This process continues until the heap size is reduced to one, completing the sort.
+We will sort the array:
+
+```
+[ 4 ][ 10 ][ 3 ][ 5 ][ 1 ]
+```
-#### Steps
+**Step 1: Build Max Heap**
-1. Construct a max heap from the given data. This will place the largest element at the root.
-2. Swap the root (maximum value) with the last element of the heap. This element is now considered sorted.
-3. Decrease the heap size by one (to exclude the sorted elements).
-4. "Heapify" the root of the tree, i.e., ensure the heap property is maintained.
-5. Repeat steps 2-4 until the size of the heap is one.
+
+Binary tree view:
```
-Initial array (size n = 5) index: 0 1 2 3 4
- 4 [4,10,3,5,1]
+ 4
/ \
10 3
/ \
5 1
+```
+
+Heapify → Largest at top:
-↓ BUILD MAX-HEAP (Θ(n)) —> heapSize = 5
- 10 [10,5,3,4,1]
- / \
- 5 3
+```
+ 10
+ / \
+ 5 3
/ \
4 1
+
+Array: [ 10 ][ 5 ][ 3 ][ 4 ][ 1 ]
```
-**Pass 1 extract-max**
+**Step 2: Swap Root with Last**
+
+Swap 10 ↔ 1 → the largest element (10) moves to its correct final place.
+
+```
+[ 1 ][ 5 ][ 3 ][ 4 ][ 10 ]
```
-swap 10 ↔ 1 [1,5,3,4 | 10] heapSize = 4
- ↑ live heap ↑ ↑fixed↑
-heapify (1↔5, 1↔4) → [5,4,3,1 | 10]
- 5
- / \
- 4 3
+Heapify the reduced heap [1, 5, 3, 4]:
+
+```
+ 5
+ / \
+ 4 3
/
1
+
+Array: [ 5 ][ 4 ][ 3 ][ 1 ][ 10 ]
```
-**Pass 2 extract-max**
+**Step 3: Swap Root with Last**
+
+Swap 5 ↔ 1.
+
+```
+[ 1 ][ 4 ][ 3 ][ 5 ][ 10 ]
```
-swap 5 ↔ 1 [1,4,3 | 5,10] heapSize = 3
-heapify (1↔4) → [4,1,3 | 5,10]
- 4
- / \
- 1 3
+Heapify reduced heap [1, 4, 3]:
+
+```
+ 4
+ / \
+ 1 3
+
+Array: [ 4 ][ 1 ][ 3 ][ 5 ][ 10 ]
+```
+
+**Step 4: Swap Root with Last**
+
+Swap 4 ↔ 3.
+
+```
+[ 3 ][ 1 ][ 4 ][ 5 ][ 10 ]
```
-**Pass 3 extract-max**
+Heapify reduced heap [3, 1]:
```
-swap 4 ↔ 3 [3,1 | 4,5,10] heapSize = 2
-(no heapify needed – root already ≥ child)
+ 3
+ /
+ 1
- 3
- /
- 1
+Array: [ 3 ][ 1 ][ 4 ][ 5 ][ 10 ]
```
-**Pass 4 extract-max**
+**Step 5: Swap Root with Last**
+
+Swap 3 ↔ 1.
```
-swap 3 ↔ 1 [1 | 3,4,5,10] heapSize = 1
-(heap of size 1 is trivially a heap)
+[ 1 ][ 3 ][ 4 ][ 5 ][ 10 ]
```
-**Pass 5 extract-max**
+✔ Sorted array achieved.
+
+**Final Result**
```
-Done – heapSize = 0
-Sorted array: [1,3,4,5,10]
+[ 1 ][ 3 ][ 4 ][ 5 ][ 10 ]
```
-#### Stability
+**Visual Progress**
+
+```
+Initial: [ 4 10 3 5 1 ]
+Heapify: [ 10 5 3 4 1 ]
+Step 2:  [ 5 4 3 1 | 10 ]
+Step 3:  [ 4 1 3 | 5 10 ]
+Step 4:  [ 3 1 | 4 5 10 ]
+Step 5:  [ 1 | 3 4 5 10 ]
+Sorted: [ 1 3 4 5 10 ]
+```
-Heap sort is inherently unstable. Similar to quicksort, the relative order of equal items is not preserved because of the long-distance exchanges.
+✔ Each step places the largest remaining element into its final position.
-#### Time Complexity
+**Optimizations**
-- In the **worst-case**, the time complexity is $O(n \log n)$, regardless of the arrangement of the input data.
-- The **average-case** time complexity is also $O(n \log n)$, as the algorithm's structure ensures consistent performance.
-- In the **best-case**, the time complexity remains $O(n \log n)$, since building and deconstructing the heap is still necessary, even if the input is already partially sorted.
+* Building the heap can be done in $O(n)$ time using bottom-up heapify.
+* After building, each extract-max + heapify step takes $O(\log n)$.
-#### Space Complexity
+**Stability**
-$O(1)$ – The sorting is done in-place, requiring only a constant amount of auxiliary space. **This assumes an *iterative* `siftDown/heapify`; a recursive version would add an \$O(\log n)\$ call stack.**
+Heap sort is **not stable**. Equal elements may not preserve their original order because of swaps.
-#### Implementation
+**Complexity**
+
+| Case | Time Complexity | Notes |
+|------------------|-----------------|--------------------------------|
+| **Worst Case** | $O(n \log n)$ | |
+| **Average Case** | $O(n \log n)$ | |
+| **Best Case** | $O(n \log n)$ | No early exit possible |
+| **Space** | $O(1)$ | In-place |
+
+**Implementation**
* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/heap_sort/src/heap_sort.cpp)
* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/heap_sort/src/heap_sort.py)
+
+### Radix Sort
+
+Radix Sort is a **non-comparison-based sorting algorithm**.
+Instead of comparing elements directly, it processes numbers digit by digit, from either the **least significant digit (LSD)** or the **most significant digit (MSD)**, using a stable intermediate sorting algorithm (commonly **Counting Sort**).
+
+Because it avoids comparisons, Radix Sort can achieve **linear time complexity** in many cases.
+
+The basic idea:
+
+1. Pick a **digit position** (units, tens, hundreds, etc.).
+2. Sort the array by that digit using a **stable sorting algorithm**.
+3. Move to the next digit.
+4. Repeat until all digits are processed.
+
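+A minimal Python sketch of the LSD variant for non-negative integers, using a stable counting pass per digit (illustrative only; the linked implementations below are the repository's versions):
+
+```python
+def radix_sort(arr):
+    """LSD radix sort for non-negative integers."""
+    if not arr:
+        return arr
+    exp = 1                               # current digit: 1s, 10s, 100s, ...
+    while max(arr) // exp > 0:
+        arr = counting_pass(arr, exp)
+        exp *= 10
+    return arr
+
+def counting_pass(arr, exp):
+    """Stable sort of arr by the digit selected by exp (buckets 0-9)."""
+    count = [0] * 10
+    for x in arr:
+        count[(x // exp) % 10] += 1
+    for d in range(1, 10):                # prefix sums -> end positions per digit
+        count[d] += count[d - 1]
+    output = [0] * len(arr)
+    for x in reversed(arr):               # right-to-left keeps the pass stable
+        d = (x // exp) % 10
+        count[d] -= 1
+        output[count[d]] = x
+    return output
+
+print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
+# [2, 24, 45, 66, 75, 90, 170, 802]
+```
+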
+**Example Run (LSD Radix Sort)**
+
+We will sort the array:
+
+```
+[ 170 ][ 45 ][ 75 ][ 90 ][ 802 ][ 24 ][ 2 ][ 66 ]
+```
+
+**Step 1: Sort by 1s place (units digit)**
+
+```
+Original: [170, 45, 75, 90, 802, 24, 2, 66]
+
+By 1s digit:
+[170][90] (0)
+[802][2] (2)
+[24] (4)
+[45][75] (5)
+[66] (6)
+
+Result: [170][90][802][2][24][45][75][66]
+```
+
+**Step 2: Sort by 10s place**
+
+```
+[170][90][802][2][24][45][75][66]
+
+By 10s digit:
+[802][2] (0)
+[24] (2)
+[45] (4)
+[66] (6)
+[170][75] (7)
+[90] (9)
+
+Result: [802][2][24][45][66][170][75][90]
+```
+
+**Step 3: Sort by 100s place**
+
+```
+[802][2][24][45][66][170][75][90]
+
+By 100s digit:
+[2][24][45][66][75][90] (0)
+[170] (1)
+[802] (8)
+
+Result: [2][24][45][66][75][90][170][802]
+```
+
+**Final Result**
+
+```
+[ 2 ][ 24 ][ 45 ][ 66 ][ 75 ][ 90 ][ 170 ][ 802 ]
+```
+
+**Visual Process**
+
+```
+Step 1 (1s): [170 90 802 2 24 45 75 66]
+Step 2 (10s): [802 2 24 45 66 170 75 90]
+Step 3 (100s): [2 24 45 66 75 90 170 802]
+```
+
+✔ Each pass groups by digit → final sorted order.
+
+**LSD vs MSD**
+
+* **LSD (Least Significant Digit first):** Process digits from the rightmost (units) toward the most significant digit. The most common and simpler variant.
+* **MSD (Most Significant Digit first):** Process from left to right, useful for variable-length data like strings.
+
+**Stability**
+
+* Radix Sort **is stable**, because it relies on a stable intermediate sort (like Counting Sort).
+* Equal elements remain in the same order across passes.
+
+**Complexity**
+
+* **Time Complexity:** $O(n \cdot k)$
+
+ * $n$ = number of elements
+ * $k$ = number of digits (or max digit length)
+
+* **Space Complexity:** $O(n + k)$ (depends on the stable sorting method used, e.g., Counting Sort).
+
+* For integers with a fixed number of digits, Radix Sort runs in effectively **linear time**.
+
+**Implementation**
+
+* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/radix_sort/src/radix_sort.cpp)
+* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/radix_sort/src/radix_sort.py)
+
+### Counting Sort
+
+Counting Sort is a **non-comparison-based sorting algorithm** that works by **counting occurrences** of each distinct element and then calculating their positions in the output array.
+
+It is especially efficient when:
+
+* The input values are integers.
+* The **range of values (k)** is not significantly larger than the number of elements (n).
+
+The basic idea:
+
+1. Find the **range** of the input (min to max).
+2. Create a **count array** to store the frequency of each number.
+3. Modify the count array to store **prefix sums** (cumulative counts).
+   * This gives the final position of each element.
+4. Place elements into the output array in order, using the count array.
+
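+These four steps translate almost directly into code. The following Python sketch is illustrative only; it also handles negative integers by offsetting counts from the minimum value, which the description above does not require:
+
+```python
+def counting_sort(arr):
+    """Stable counting sort for integers with a modest value range."""
+    if not arr:
+        return arr
+    lo, hi = min(arr), max(arr)
+    count = [0] * (hi - lo + 1)
+    for x in arr:                          # 1) count occurrences
+        count[x - lo] += 1
+    for i in range(1, len(count)):         # 2) prefix sums = final positions
+        count[i] += count[i - 1]
+    output = [0] * len(arr)
+    for x in reversed(arr):                # 3) place right-to-left for stability
+        count[x - lo] -= 1
+        output[count[x - lo]] = x
+    return output
+
+print(counting_sort([4, 2, 2, 8, 3, 3, 1]))  # [1, 2, 2, 3, 3, 4, 8]
+```
+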
+**Example Run**
+
+We will sort the array:
+
+```
+[ 4 ][ 2 ][ 2 ][ 8 ][ 3 ][ 3 ][ 1 ]
+```
+
+**Step 1: Count Frequencies**
+
+```
+Elements: 1 2 3 4 5 6 7 8
+Counts: 1 2 2 1 0 0 0 1
+```
+
+**Step 2: Prefix Sums**
+
+```
+Elements: 1 2 3 4 5 6 7 8
+Counts: 1 3 5 6 6 6 6 7
+```
+
+✔ Each prefix sum now gives the number of elements less than or equal to that value; decrementing it during placement yields the output index for each occurrence.
+
+**Step 3: Place Elements**
+
+Process input from right → left (for stability).
+
+```
+Input: [4,2,2,8,3,3,1]
+
+Place 1 → index 0
+Place 3 → index 4
+Place 3 → index 3
+Place 8 → index 6
+Place 2 → index 2
+Place 2 → index 1
+Place 4 → index 5
+```
+
+**Final Result**
+
+```
+[ 1 ][ 2 ][ 2 ][ 3 ][ 3 ][ 4 ][ 8 ]
+```
+
+**Visual Process**
+
+```
+Step 1 Count: [0,1,2,2,1,0,0,0,1]
+Step 2 Prefix: [0,1,3,5,6,6,6,6,7]
+Step 3 Output: [1,2,2,3,3,4,8]
+```
+
+✔ Linear-time sorting by counting positions.
+
+**Stability**
+
+Counting Sort is **stable** if we place elements **from right to left** into the output array.
+
+**Complexity**
+
+| Case | Time Complexity | Notes |
+|------------------|-----------------|------------------------------------------|
+| **Overall** | $O(n + k)$ | $n$ = number of elements, $k$ = value range |
+| **Space** | $O(n + k)$ | Extra array for counts + output |
+
+**Implementation**
+
+* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/counting_sort/src/counting_sort.cpp)
+* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/counting_sort/src/counting_sort.py)
+
+### Comparison Table
+
+Below is a consolidated **side-by-side comparison** of all the sorts we’ve covered so far:
+
+| Algorithm | Best Case | Average | Worst Case | Space | Stable? | Notes |
+|----------------|------------|-------------|-------------|-------------|---------|------------------------|
+| **Bubble Sort** | O(n) | O(n²) | O(n²) | O(1) | Yes | Simple, slow |
+| **Selection Sort** | O(n²) | O(n²) | O(n²) | O(1) | No | Few swaps |
+| **Insertion Sort** | O(n) | O(n²) | O(n²) | O(1) | Yes | Good for small inputs |
+| **Quick Sort** | O(n log n) | O(n log n) | O(n²) | O(log n) | No | Very fast in practice |
+| **Heap Sort** | O(n log n) | O(n log n) | O(n log n) | O(1) | No | Guaranteed performance |
+| **Counting Sort** | O(n + k) | O(n + k) | O(n + k) | O(n + k) | Yes | Integers only |
+| **Radix Sort** | O(nk) | O(nk) | O(nk) | O(n + k) | Yes | Uses Counting Sort |
diff --git a/resources/time_complexity.py b/resources/time_complexity.py
new file mode 100644
index 0000000..0191eed
--- /dev/null
+++ b/resources/time_complexity.py
@@ -0,0 +1,50 @@
+import numpy as np
+import matplotlib.pyplot as plt
+
+# Data range
+n = np.arange(2, 101)
+
+# Big O example: f(n) = n log n, upper bound g(n) = n^2 (showing f(n) = O(n^2))
+f_big_o = n * np.log2(n)
+upper_bound_big_o = n ** 2
+
+plt.figure()
+plt.scatter(n, f_big_o, label=r"$f(n) = n \log_2 n$ (data points)", s=10)
+plt.plot(n, upper_bound_big_o, label=r"Upper bound $g(n) = n^2$", linewidth=1.5)
+plt.title("Big O Notation: $f(n) = O(n^2)$")
+plt.xlabel("n")
+plt.ylabel("Time / Growth")
+plt.legend()
+plt.grid(True)
+
+# Big Omega example: f(n) = n log n, lower bound h(n) = n (showing f(n) = Ω(n))
+f_big_omega = n * np.log2(n)
+lower_bound_big_omega = n
+
+plt.figure()
+plt.scatter(n, f_big_omega, label=r"$f(n) = n \log_2 n$ (data points)", s=10)
+plt.plot(n, lower_bound_big_omega, label=r"Lower bound $h(n) = n$", linewidth=1.5)
+plt.title("Big Omega Notation: $f(n) = \Omega(n)$")
+plt.xlabel("n")
+plt.ylabel("Time / Growth")
+plt.legend()
+plt.grid(True)
+
+# Theta example: noisy f(n) around n log n, bounds 0.8*n log n and 1.2*n log n
+base_theta = n * np.log2(n)
+np.random.seed(42)
+f_theta = base_theta * (1 + np.random.uniform(-0.15, 0.15, size=n.shape))
+lower_theta = 0.8 * base_theta
+upper_theta = 1.2 * base_theta
+
+plt.figure()
+plt.scatter(n, f_theta, label=r"Noisy $f(n) \approx n \log_2 n$", s=10)
+plt.plot(n, lower_theta, label=r"Lower tight bound $0.8 \cdot n \log_2 n$", linewidth=1.5)
+plt.plot(n, upper_theta, label=r"Upper tight bound $1.2 \cdot n \log_2 n$", linewidth=1.5)
+plt.title("Theta Notation: $f(n) = \Theta(n \log n)$")
+plt.xlabel("n")
+plt.ylabel("Time / Growth")
+plt.legend()
+plt.grid(True)
+
+plt.show()