
# Lab 4: Big‑O Analysis

**Course:** CSCI 3143 Data Structures  

**Goals:**
1. **Formal Big‑O**: what it *means*, properties, quick practice.
2. **Sorting**: implement several classic sorts (with missing lines to fill), then analyze. Compare to Python’s `list.sort()` (Timsort).
3. **Anagram detection**: complete the approaches discussed in the textbook, analyze their Big‑O, and compare.




## Part 1 — Formal Big‑O (upper bounds)

**Definition (Big‑O):** For functions $f(n)$ and $g(n)$ with $f,g:\mathbb{N} \to \mathbb{R}_{\ge 0}$, we say  
$f(n) \in O(g(n))$ iff **there exist** constants $c>0$ and $n_0\ge 1$ such that  
$\forall n\ge n_0,\quad f(n) \le c\cdot g(n).$

**Intuition.** Beyond some threshold $n_0$, $g(n)$ is an *asymptotic upper bound* for $f(n)$ up to a constant multiplier.
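The definition can be spot-checked numerically. The sketch below uses an illustrative pair that is not one of the practice problems, $f(n) = 2n + 10$ and $g(n) = n$, with witnesses $c = 3$ and $n_0 = 10$. The helper name `holds_from` is made up for this example, and a finite check like this is only a sanity test, not a proof.

```python
def f(n: int) -> int:
    return 2 * n + 10


def g(n: int) -> int:
    return n


def holds_from(c: float, n0: int, upto: int = 10_000) -> bool:
    """Check f(n) <= c * g(n) for every n in [n0, upto]."""
    return all(f(n) <= c * g(n) for n in range(n0, upto + 1))


print(holds_from(c=3, n0=10))  # True: 2n + 10 <= 3n whenever n >= 10
print(holds_from(c=3, n0=1))   # False: at n = 1, f(1) = 12 > 3
```

The second call shows why the threshold $n_0$ matters: the inequality may fail for small $n$ and still hold for all large $n$.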

### Quick practice (answer briefly in Markdown right below)
1. Prove (in 2–3 lines) that $3n^2 + 7n + 100 \in O(n^2)$ by choosing explicit constants $c,n_0$.



2. Is $5n\log n \in O(n^{1+\epsilon})$ for any fixed $\epsilon>0$? Why?

3. Let $f(n)=n^2 + 20n$ and $g(n)=n^2$. Is $f(n) \in O(g(n))$? Is $g(n) \in O(f(n))$?

**Useful facts (no proofs here):**
- If $f(n) \in O(g(n))$ and $g(n) \in O(h(n))$, then $f(n) \in O(h(n))$. *(transitivity)*  
- For polynomials, the **highest degree** term dominates: $an^k + \cdots \in O(n^k)$ if $a>0$.
- Log base is irrelevant inside Big‑O: for bases $a,b>1$, $\log_a n = \frac{\log_b n}{\log_b a}$, so changing the base changes only a constant factor.
- Typical growth order: $1 \prec \log n \prec n \prec n\log n \prec n^2 \prec n^3 \prec 2^n \prec n!$.
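The log-base fact is easy to see numerically. As a small illustration (the sample values of `n` are arbitrary), the ratio between $\log_2 n$ and $\log_{10} n$ comes out the same for every $n$:

```python
import math

for n in (10, 1_000, 1_000_000):
    ratio = math.log2(n) / math.log10(n)
    print(n, round(ratio, 6))

# The ratio does not depend on n: it is the constant log2(10).
print(round(math.log2(10), 6))
```

Because the two logarithms differ only by that constant, $O(\log_2 n)$ and $O(\log_{10} n)$ describe the same class of functions.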


## Part 2 — Build (and analyze) Sorting Algorithms

Complete each function by filling the `TODO` lines. Keep implementations **simple and clear**.  
Then answer the analysis questions after each algorithm.

### Helpers

First let's build some helpers. 

In [None]:
import random


def demo_data(n=12, low=0, high=99, seed=0):
    random.seed(seed)
    return [random.randint(low, high) for _ in range(n)]


data1 = demo_data(n=6)
data1

4. Make a function that tests if data is sorted. Test it with your demo data.

In [None]:
from typing import List  # For type hints


def is_sorted(a: List[int]) -> bool:
    return False  # Complete function

The annotations in `is_sorted(a: List[int]) -> bool` are **type hints**:

- `a: List[int]` says the parameter `a` should be a list of integers, and
- `-> bool` says the function returns a Boolean value (`True` or `False`).

These hints don’t change how the code runs (Python won’t enforce them at runtime), but they make the code easier to read and help tools like editors and checkers catch mistakes — for example, warning you if you accidentally pass a string instead of a list of numbers.
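A minimal sketch of the "not enforced at runtime" point, using a made-up function `total` that is not part of the lab:

```python
from typing import List


def total(values: List[int]) -> int:
    return sum(values)


print(total([1, 2, 3]))       # matches the hint
print(total((1, 2, 3)))       # a tuple does not match the hint, yet Python runs it
print(total.__annotations__)  # the hints are stored as metadata on the function
```

Both calls return 6. A static checker such as mypy would be expected to flag the second call, but the interpreter itself does not.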

### Run time function

This function measures the running time of a sorting algorithm.

Note: `Callable[[list], list]` means "alg is expected to be a callable object (usually a function) that takes a list as its argument and returns a list."

In [3]:
import time
from typing import List, Callable


def is_sorted(a: List[int]) -> bool:
    return all(a[i] <= a[i + 1] for i in range(len(a) - 1))  # every adjacent pair is in order


def time_sort(alg: Callable[[list], list], a: list, repeats=3):
    """Return min runtime over a few repeats (seconds). alg must return a NEW list or sort in place and return it."""
    best = float("inf")
    for _ in range(repeats):
        b = list(a)  # fresh copy
        t0 = time.perf_counter()
        out = alg(b)
        dt = time.perf_counter() - t0
        assert is_sorted(out), "algorithm failed to sort"
        best = min(best, dt)
    return best
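For comparison, the standard library's `timeit` module supports the same "repeat and keep the minimum" pattern that `time_sort` uses. The sketch below is an alternative, not a replacement for `time_sort`; the input size and seed are arbitrary choices for illustration.

```python
import random
import timeit

random.seed(0)
data = [random.randint(0, 99) for _ in range(1_000)]

# sorted() returns a new list and never mutates `data`,
# so every repeat starts from the same unsorted input.
times = timeit.repeat(lambda: sorted(data), repeat=3, number=1)
best = min(times)
print(f"best of {len(times)} runs: {best:.6f}s")
```

Taking the minimum rather than the mean is a common choice because background activity on the machine can only make a run slower, never faster.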

5. Test the base Python sort on n=100, 1000, and 10000 random integers. What do you observe?

In [None]:
def base_sort(a: list) -> list:
    return sorted(a)


data1 = demo_data(10)

time_sort(base_sort, data1)

### 2.1 Bubble Sort (basic)

Complete the inner swap and loop control. (Naive version: keep sweeping until no swaps.)

In [None]:
def bubble_sort(a: list) -> list:
    a = list(a)
    n = len(a)

    return a


# Quick check
print(bubble_sort(demo_data()))


**Analysis:**
- Worst‑case time: ___  
- Best‑case time (already sorted): ___  
- Stable? ___ 
- In‑place? ___
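As a reminder of what "stable" means: a stable sort keeps elements with equal keys in their original relative order. The sketch below illustrates this with Python's built-in `sorted`, which is documented to be stable; the sample records are made up.

```python
records = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]

# Sort by the letter only. Records that tie on the letter
# keep the left-to-right order they had in the input.
by_letter = sorted(records, key=lambda r: r[0])
print(by_letter)  # [('a', 2), ('a', 4), ('b', 1), ('b', 3)]
```

To decide whether one of your own sorts is stable, ask whether it can ever move an element past another element with an equal key.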


### 2.2 Selection Sort

On each pass, select the minimum from the unsorted suffix and swap it into place.

In [None]:
def selection_sort(a: list) -> list:
    a = list(a)
    n = len(a)

    return a


print(selection_sort(demo_data()))


**Analysis:**
- Worst‑case time: ___  
- Best‑case time (already sorted): ___  
- Stable? ___ 
- In‑place? ___

### 2.3 Insertion Sort

Insert each element into the already‑sorted prefix `a[:i]`.

In [None]:
def insertion_sort(a: list) -> list:
    a = list(a)

    return a


print(insertion_sort(demo_data()))


**Analysis:**
- Worst‑case time: ___  
- Best‑case time (already sorted): ___  
- Stable? ___ 
- In‑place? ___

### 2.4 Merge Sort (divide & conquer)

Complete the merge logic. (Return a **new** sorted list.)

In [None]:
def merge_sort(a: list) -> list:
    n = len(a)
    if n <= 1:
        return list(a)
    mid = n // 2
    left = merge_sort(a[:mid])
    right = merge_sort(a[mid:])
    # TODO: merge two sorted lists left and right into result
    i = j = 0
    result = []

    # TODO: then extend result with any tail

    return result


print(merge_sort(demo_data()))


**Analysis:**
- Worst‑case time: ___  
- Best‑case time (already sorted): ___  
- Stable? ___ 
- In‑place? ___

### 2.5 Python’s `list.sort()` (Timsort)

Python uses **Timsort**, a hybrid of merge sort and insertion sort that exploits **existing runs** and is **stable**. Empirically test your algorithms vs `sorted` on random and nearly‑sorted data.
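To make "existing runs" concrete, the sketch below counts maximal non-decreasing runs in a list. The helper `count_runs` is hypothetical and much simpler than what Timsort does internally (the real algorithm also detects strictly descending runs and extends short runs with insertion sort), but it shows why nearly-sorted input has little work left to do.

```python
def count_runs(a: list) -> int:
    """Count maximal non-decreasing runs in a (illustrative helper)."""
    if not a:
        return 0
    runs = 1
    for prev, cur in zip(a, a[1:]):
        if cur < prev:  # order breaks, so a new run starts here
            runs += 1
    return runs


print(count_runs([1, 2, 3, 4, 5]))        # 1
print(count_runs([2, 1, 3, 4, 6, 5, 7]))  # 3: [2], [1, 3, 4, 6], [5, 7]
print(count_runs([5, 4, 3, 2, 1]))        # 5
```

Fewer runs means fewer merges, which is the intuition behind Timsort's speed on nearly-sorted data.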

In [None]:
def python_sort(a: list) -> list:
    a = list(a)
    a.sort()  # in place
    return a


arr_random = demo_data(n=2000, seed=42)
arr_near_sorted = list(range(2000))
# introduce a few small swaps
for i in range(0, 2000, 200):
    arr_near_sorted[i], arr_near_sorted[i + 1] = (
        arr_near_sorted[i + 1],
        arr_near_sorted[i],
    )

for name, alg in [
    ("bubble", bubble_sort),
    ("selection", selection_sort),
    ("insertion", insertion_sort),
    ("merge", merge_sort),
    ("python_sort", python_sort),
]:
    try:
        t1 = time_sort(alg, arr_random, repeats=1)
        t2 = time_sort(alg, arr_near_sorted, repeats=1)
        print(f"{name:12s} random ~ {t1:.4f}s   near‑sorted ~ {t2:.4f}s")
    except AssertionError:
        print(f"{name:12s} failed — complete your TODOs first!")


**Questions (brief):**
1. For random arrays, which algorithms are fastest/slowest? Does that match their theoretical Big‑O?
2. For nearly‑sorted arrays, which algorithm benefits most? Why is Timsort designed to exploit this?
3. When (and why) is $O(n\log n)$ comparison sorting the **best we can do** in general (decision‑tree argument)?



## Part 3 — Anagram Detection (four approaches)

Complete each function by filling in the missing lines, then analyze time complexity and compare.

We assume lowercase ASCII letters and equal lengths for simplicity.


### 3.1 Checking Off

In [None]:
def are_anagrams_checkoff(s1: str, s2: str) -> bool:
    if len(s1) != len(s2):
        return False
    checklist = list(s2)  # because strings are immutable
    for ch in s1:
        # TODO: scan checklist to find ch; if found, mark that slot (e.g., None) and break
        # if not found at all, return False
        pass
    return True


# quick sanity
print(are_anagrams_checkoff("heart", "earth"))

**Complexity:** 
- worst‑case time ___ 
- space ___

### 3.2 Sort and Compare

In [None]:
def are_anagrams_sort_compare(s1: str, s2: str) -> bool:
    # TODO: sort both strings and compare equality
    pass


print(are_anagrams_sort_compare("python", "typhon"))

**Complexity:** 
- worst‑case time ___ 
- space ___

### 3.3 Brute Force (generate permutations)

In [None]:
import itertools

perms = itertools.permutations("abc")
list(perms)

[('a', 'b', 'c'),
 ('a', 'c', 'b'),
 ('b', 'a', 'c'),
 ('b', 'c', 'a'),
 ('c', 'a', 'b'),
 ('c', 'b', 'a')]
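Two details of `itertools.permutations` are worth noting before writing the brute-force check: it yields tuples of characters rather than strings, and the number of results is $n!$. A small illustration with an arbitrary four-letter word:

```python
import itertools
import math

word = "abcd"
perms = list(itertools.permutations(word))

print(len(perms), math.factorial(len(word)))  # 24 24
print("".join(perms[0]))                      # first tuple joined into a string: abcd

# How quickly n! grows as the word gets longer.
for n in (5, 10, 15, 20):
    print(n, math.factorial(n))
```

`"".join(...)` is one way to turn a tuple of characters back into a string for comparison.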

In [None]:
def are_anagrams_bruteforce(s1: str, s2: str) -> bool:
    if len(s1) != len(s2):
        return False
    # TODO: generate permutations of s1 and check if any equals s2
    # WARNING: Only feasible for very small n.
    pass


print(are_anagrams_bruteforce("abc", "cba"))  # ok for n=3

**Complexity:**
- time: ___ 
- Why is this impractical beyond tiny n?

### 3.4 Count and Compare (frequency table)

We can index letters using `ord()`, which gives you the Unicode code point (an integer) for a character.

In [21]:
ord("a")

97

In [None]:
ord("b") - ord("a")

1
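The same offset trick maps every lowercase letter to a slot from 0 to 25. As a sketch for a single word (the helper `letter_counts` is hypothetical and assumes lowercase ASCII input, as stated at the start of Part 3):

```python
def letter_counts(word: str) -> list:
    """Frequency table for lowercase ASCII letters (illustrative helper)."""
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord("a")] += 1  # 'a' -> slot 0, 'b' -> slot 1, ...
    return counts


table = letter_counts("banana")
print(table[0], table[1], table[13])  # counts for a, b, n: 3 1 2
```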

In [15]:
def are_anagrams_count_compare(s1: str, s2: str) -> bool:
    if len(s1) != len(s2):
        return False
    counts = [0] * 26
    # TODO: for each position i, increment count for s1[i], decrement for s2[i]
    # finally, return True if all counts are zero
    pass


print(are_anagrams_count_compare("listen", "silent"))
print(are_anagrams_count_compare("lister", "silent"))

None
None



**Complexity:** 
- worst‑case time ___ 
- space ___

**Compare all four:** Which approach is asymptotically best? Which has the biggest constant factors?


## Summary of Asymptotic Notation

- **Big-O (O)**  
  Upper bound on growth.  
  *The algorithm takes at most this much time (up to constants).*  
  - Example: Bubble Sort is O(n²).

- **Big-Omega (Ω)**  
  Lower bound on growth.  
  *The algorithm takes at least this much time (up to constants).*  
  - Example: Bubble Sort is Ω(n) (when input is already sorted, it still has to scan once).

- **Big-Theta (Θ)**  
  Tight bound on growth.  
  *When an algorithm is both O(f(n)) and Ω(f(n)), it is Θ(f(n)).*  
  - Example: Merge Sort is Θ(n log n).
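
For reference, Ω and Θ can be written in the same style as the Big‑O definition in Part 1, with the same assumptions on $f$ and $g$. This is the standard formulation, stated here as a companion to that definition:

$$
f(n) \in \Omega(g(n)) \iff \exists\, c>0,\ n_0\ge 1 \ \text{such that}\ \forall n\ge n_0,\quad f(n) \ge c\cdot g(n)
$$

$$
f(n) \in \Theta(g(n)) \iff f(n) \in O(g(n)) \ \text{and}\ f(n) \in \Omega(g(n))
$$

The constants $c$ and $n_0$ used for the upper bound and the lower bound do not have to be the same.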

---

**Summary:**  
- **Ω** = guaranteed floor on running time  
- **O** = guaranteed ceiling  
- **Θ** = exact growth rate (tight bound)


---

## Short Reflection

- Summarize each sorting algorithm’s **Big‑O** and a one‑line justification.  

- When would you prefer insertion sort over merge sort in *practice*?  


- Why are comparison sorts bounded below by $\Omega(n\log n)$ in the worst case, and how does Timsort still do better on nearly‑sorted data?  

---

## Self‑Assessment
Please mark one option by editing the brackets to `[x]`:

- [ ] **10** – I completed all of this work on my own (learning from in‑class ideas/approaches).
- [ ] **8** – I completed most on my own, with some out‑of‑class help (peers/online).
- [ ] **6** – I needed significant help (peers/online/AI) to complete parts.
- [ ] **4** – I mostly copied code from others/AI and **do not** fully understand it.
- [ ] **2** – I copied almost everything without attempting to understand it.