Linear Search and Binary Search in Python — A Practical, Deep Guide
This guide explains linear search and binary search in Python in depth: what they are, how they work, when to use each, full Python implementations, variations, complexity analysis, common pitfalls, testing and benchmarking advice, and practical recommendations for real-world use. It assumes you understand basic Python and algorithms but keeps explanations accessible and hands-on. Code examples are ready to run and adapt.
- Overview: what these searches are and why they matter
Linear search (or sequential search) is the simplest search method: check each element one by one until you find the target or exhaust the collection. It's general-purpose and works on any sequence, sorted or not.
Binary search is a divide-and-conquer algorithm that works on sorted sequences. It repeatedly cuts the search range in half by comparing the target to the middle element and discarding the half that cannot contain the target. Binary search is far more efficient than linear search for large, sorted arrays: O(log n) vs O(n).
Both algorithms are foundational. Linear search is the baseline: simple, robust, safe. Binary search is a dramatic example of how using the right assumptions (sorted data) yields huge gains. Understanding both — and when to apply them — is essential for writing efficient code.
- Linear search — definition, intuition, and Python implementations 2.1 Intuition
Imagine scanning a grocery list with your finger, left-to-right, looking for "eggs." You check each item until you either find "eggs" or reach the end. That’s linear search.
It’s appropriate when:
The data is unsorted.
The dataset is small.
You only need a single pass or the data structure doesn’t support random access (e.g., a linked list).
The search predicate is complex and cannot be expressed as a simple ordering.
2.2 Basic Python implementations Return index (iterative) def linear_search_index(arr, target): """Return index of target in arr, or -1 if not found.""" for i in range(len(arr)): if arr[i] == target: return i return -1
Return element (iterative) def linear_search_element(arr, target): """Return the element equal to target, or None if not found.""" for item in arr: if item == target: return item return None
Find first that matches a predicate def find_first(iterable, predicate): """Return first item satisfying predicate, or None.""" for item in iterable: if predicate(item): return item return None
Using Python built-ins (idiomatic) if target in arr: # membership check; uses linear search for lists idx = arr.index(target) # raises ValueError if not present
Using next with a generator for predicate-based search result = next((x for x in iterable if predicate(x)), None)
2.3 Complexity
Time (worst-case): O(n) — you may need to inspect every element.
Time (best-case): O(1) — the target is the first element.
Space: O(1) additional space (iterative).
2.4 Variations and micro-optimizations
Sentinel method: append a sentinel to avoid bound checks (rarely relevant in Python).
Return all matches: collect indices or elements — requires full scan.
Use set/dict for repeated membership tests: if you need many membership queries, convert the list to a set for average O(1) membership.
Numpy or C extensions: for numeric arrays, vectorized operations can be faster than Python-level loops.
2.5 When linear search is the right tool
Data is unsorted and not searched frequently.
Data is small or stored in structures requiring sequential traversal (linked lists, streams).
The predicate is arbitrary (e.g., first object with attribute x > 5).
- Binary search — definition, intuition, correctness 3.1 Intuition
Binary search requires a sorted array. Suppose you want to find x in a sorted list. Check the middle element:
If it equals x, you're done.
If it is less than x, you know that all elements left of the middle are also less than x (because the array is sorted), so you search the right half.
If it is greater than x, search the left half.
Repeat until the search range is empty.
This halving process gives binary search its efficiency. Each comparison eliminates half of the remaining possibilities.
3.2 Preconditions and correctness
Precondition: the collection must be sorted according to a total order consistent with the comparison operation used (e.g., ascending order).
Correctness can be shown by induction: at each step the algorithm maintains an invariant — the target, if present, lies within the current low..high interval. Halving preserves the invariant until the interval shrinks to size 0 or you find the target.
- Binary search in Python — multiple implementations
Binary search can be implemented in different forms depending on requirements:
Find any occurrence of a target.
Find the first or last occurrence in a sequence with duplicates.
Find the insertion point (lower bound, upper bound) — useful for sorted insertions or counting elements in a range.
Python’s bisect module provides ready-made functions (bisect_left, bisect_right) for insertion points. But implementing from scratch is instructive.
4.1 Classic iterative binary search (returns index or -1) def binary_search(arr, target): """Return index of target in sorted arr, or -1 if not found.""" lo, hi = 0, len(arr) - 1 while lo <= hi: mid = (lo + hi) // 2 if arr[mid] == target: return mid elif arr[mid] < target: lo = mid + 1 else: hi = mid - 1 return -1
Notes:
Use integer division //.
The lo <= hi condition ensures the search covers single-element ranges.
4.2 Recursive binary search def binary_search_recursive(arr, target, lo=0, hi=None): if hi is None: hi = len(arr) - 1 if lo > hi: return -1 mid = (lo + hi) // 2 if arr[mid] == target: return mid if arr[mid] < target: return binary_search_recursive(arr, target, mid + 1, hi) else: return binary_search_recursive(arr, target, lo, mid - 1)
Recursion is elegant but Python recursion depth can be a limit for very deep recursions. Iterative form is preferable in production.
4.3 Finding the leftmost (first) occurrence
When duplicates exist and you want the first index where arr[i] == target:
def binary_search_first(arr, target): lo, hi = 0, len(arr) while lo < hi: mid = (lo + hi) // 2 if arr[mid] < target: lo = mid + 1 else: hi = mid # lo is the insertion point for target; check if present if lo < len(arr) and arr[lo] == target: return lo return -1
This is essentially bisect_left. Note the convention hi = len(arr) and lo < hi loop.
4.4 Finding the rightmost (last) occurrence def binary_search_last(arr, target): lo, hi = 0, len(arr) while lo < hi: mid = (lo + hi) // 2 if arr[mid] <= target: lo = mid + 1 else: hi = mid # insertion point of target+epsilon is lo; last occurrence is lo-1 idx = lo - 1 if idx >= 0 and arr[idx] == target: return idx return -1
This is analogous to bisect_right(arr, target) - 1.
4.5 Using bisect from the standard library import bisect
i = bisect.bisect_left(arr, target) if i != len(arr) and arr[i] == target: print("Found at", i) else: print("Not found")
bisect is implemented in C for CPython and is reliable and fast.
- Complexity analysis: big-O, comparisons, and costs 5.1 Linear search
Worst-case time: O(n) comparisons.
Average-case (uniform): O(n/2) ~ O(n).
Space: O(1) additional.
5.2 Binary search
Worst-case time: O(log n) comparisons.
Average-case: O(log n).
Space: O(1) additional (iterative).
Binary search comparisons are cheap for primitives (numbers, strings). But for expensive comparisons (complex objects, custom comparators), the cost per comparison matters; e.g., binary search still drastically reduces the number of comparisons versus linear search.
5.3 Practical cost trade-offs
For small n (say, n < ~30–50 depending on hardware and data access patterns), linear search can be faster due to lower overhead and better cache behavior.
For many repeated queries on the same dataset, paying the cost to sort once (O(n log n)) or building an index (set/dict) will pay off.
Binary search requires random access (arrays, lists); it’s not directly usable on linked lists without additional indexing.
- Edge cases and common pitfalls 6.1 Off-by-one errors
Binary search implementations are notoriously sensitive to off-by-one mistakes. Two helpful strategies:
Use established templates (lo, hi = 0, len(arr) with while lo < hi for insertion-point style).
Test boundary cases thoroughly: empty array, single-element array, target at ends, target not present but within range, duplicates.
6.2 Integer mid computation overflow (historical)
In languages with fixed-width integers, computing mid = (lo + hi) // 2 could overflow. The safer version is mid = lo + (hi - lo) // 2. In Python this is not an issue because integers are unbounded, but the safer formula is still a good habit.
6.3 Sortedness assumption
Binary search requires that the data be sorted by the same comparison semantics used in the search. If the data is not sorted or sorting order differs, results are undefined.
6.4 Floating point and comparison tolerance
Searching for floats that were produced by computations may require tolerance-based comparisons (abs(a - b) < eps). With such predicates, binary search semantics may be trickier; use custom checks or transformations.
6.5 Mutable data
If the array changes between searches (insertions, deletions), you must maintain sortedness for binary search, or rebuild indexes accordingly.
- Variations and related techniques 7.1 Interpolation search
For uniformly distributed numeric data, interpolation search can be faster than binary search on average (O(log log n) expected), because it guesses a location proportional to the target value. It’s rarely used in standard libraries and sensitive to distribution assumptions.
7.2 Exponential (galloping) search
When the array is unbounded or you’re searching an unbounded sorted stream, exponential search finds a range by doubling until the target is within range, then uses binary search. Complexity: O(log n).
7.3 Fractional cascading and search structures
For multiple related sorted arrays, specialized data structures reduce repeated binary search cost. These are advanced techniques used in computational geometry.
7.4 Search in rotated sorted arrays
A classic interview variant: the array is a rotated sorted array (e.g., [4,5,6,7,0,1,2]). You can still find a target in O(log n) by modifying binary search logic to detect sorted halves.
- Practical Python advice: which to use when
Single, one-off search on an unsorted list: linear search (in, index) is simplest.
Many membership queries on the same dataset: build a set or dict for average O(1) membership.
Data sorted and many queries: use bisect or binary search; or consider sortedcontainers for dynamic sorted sequences.
Numeric arrays with heavy computation: consider numpy for vectorized search/filter; np.searchsorted is analogous to binary search.
Streaming or non-indexable data: linear traversal (possibly with generators) is the only option.
Need sorted order maintained with frequent inserts/deletes: balanced binary search trees (e.g., bisect on lists has O(n) insertion; consider bisect for mostly-static arrays or specialized structures for dynamic workloads).
- Testing and benchmarks 9.1 Unit tests
Cover:
Empty arrays
One-element arrays (target present/absent)
Target at beginning/end/middle
Duplicates (first, last)
Non-existent targets
Rotated arrays (if implementing that variant)
Example using pytest:
def test_binary_search_basic(): arr = [1,2,3,4,5] assert binary_search(arr, 3) != -1 assert binary_search(arr, 6) == -1
9.2 Benchmarking tips
Use timeit for microbenchmarks.
Compare built-in in/list.index vs manual loops vs bisect for realistic dataset sizes.
For large numeric arrays, measure numpy functions as well.
Repeat measurements and use timeit.repeat to reduce noise.
9.3 Representative benchmarks (conceptually)
Small arrays (n < 50): linear search may be faster.
Large arrays (n > 1e4): binary search wins dramatically if array is sorted.
Many repeated queries: building an index (set/dict) is typically best.
- Full worked example: code, tests, and micro-benchmarks
Below is a compact example that bundles linear search, binary search, and use of bisect. Runable in a Python script or notebook.
import bisect import random import timeit
def linear_search(arr, target): for i, v in enumerate(arr): if v == target: return i return -1
def binary_search(arr, target): lo, hi = 0, len(arr) - 1 while lo <= hi: mid = (lo + hi) // 2 if arr[mid] == target: return mid elif arr[mid] < target: lo = mid + 1 else: hi = mid - 1 return -1
n = 10000 arr = sorted(random.sample(range(n*10), n)) target_present = arr[n//2] target_absent = -1
print("linear present:", timeit.timeit(lambda: linear_search(arr, target_present), number=100)) print("binary present:", timeit.timeit(lambda: binary_search(arr, target_present), number=100)) print("bisect present:", timeit.timeit(lambda: bisect.bisect_left(arr, target_present), number=100))
This will show binary search and bisect being much faster than the linear approach for n=10000 when repeating searches.
- Summary and practical checklist
Linear search: simple, works on any sequence, O(n). Use for unsorted data or single-pass scans.
Binary search: requires sorted data, O(log n). Use when data is sorted and you need efficient lookups.
Use set/dict for repeated membership queries on unsorted data.
Use bisect for insertion points and fast binary-search-based utilities in Python (it's implemented in C and fast).
Test boundary cases thoroughly for binary search to avoid off-by-one bugs.
For small arrays, prefer simplicity (sometimes linear is faster). For large arrays, prefer algorithmic efficiency.