# K Most Frequent Strings
Find the k most frequently occurring strings in an array, and return them sorted by frequency in descending order. If two strings have the same frequency, sort them in lexicographical order.

**Example:**
Input: strs = ['go', 'coding', 'byte', 'byte', 'go', 'interview', 'go'], k = 2<br/>
Output: ['go', 'byte']

Explanation: The strings "go" and "byte" appear the most frequently, with frequencies of 3 and 2, respectively.

**Constraints:**
- k <= n, where n denotes the length of the array.

## **Intuition - Max-Heap**
To solve this problem efficiently, we must address two main challenges:  
1. **Identifying the k most frequent strings**  
2. **Sorting these strings** first by frequency (descending) and then lexicographically (ascending in case of ties).

---

### **1. Identifying the k Most Frequent Strings**
We use a **hash map (dictionary)** to store string frequencies:
- **Keys** → Unique strings  
- **Values** → Corresponding frequencies  
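
As a minimal sketch of the counting step, assuming Python's `collections.Counter` serves as the hash map (the same class the solution cells import), the example input can be tallied like this:

```python
from collections import Counter

strs = ["go", "coding", "byte", "byte", "go", "interview", "go"]

# Counter maps each unique string to the number of times it occurs.
freqs = Counter(strs)

print(freqs["go"], freqs["byte"], freqs["coding"])  # 3 2 1
```

Building the map takes one pass over the array, so this step is O(n).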

A simple but inefficient approach is to:
1. **Sort all strings** by frequency in descending order.
2. **Select the first k strings** from the sorted list.

However, the list can hold up to **n** distinct strings, so sorting it takes **O(n log n) time** in the worst case, even though we only need the top k.
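
For reference, the sort-based baseline could look roughly like the sketch below. The function name `k_most_frequent_by_sorting` is a hypothetical one chosen for illustration, and the sort key `(-frequency, string)` is one way to get descending frequency with ascending lexicographic tie-breaks.

```python
from collections import Counter
from typing import List

def k_most_frequent_by_sorting(strs: List[str], k: int) -> List[str]:
    freqs = Counter(strs)
    # Sort the unique strings by frequency (descending), then lexicographically.
    ordered = sorted(freqs, key=lambda s: (-freqs[s], s))
    # Keep only the first k strings of the fully sorted list.
    return ordered[:k]

print(k_most_frequent_by_sorting(
    ["go", "coding", "byte", "byte", "go", "interview", "go"], 2
))  # ['go', 'byte']
```

This is handy as a correctness check for the heap-based versions, since it is short enough to verify by eye.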

---

### **2. Optimized Approach: Using a Max-Heap**
Instead of sorting the entire list, we leverage a **max-heap** (priority queue) to efficiently extract the top k elements.

#### **Why a Max-Heap?**
- The most frequent string **always stays on top**.
- By removing the top element k times, we efficiently get the k most frequent strings.

#### **Heap Operations**
1. **Build the heap**  
   - Create a **(string, frequency)** pair for each unique string.
   - Rather than pushing the pairs one by one (**O(n log n)**), collect them in a list and apply the **heapify operation**, which constructs the heap in **O(n) time**.

2. **Extract k elements from the heap**  
   - Pop the **most frequent** string from the heap **k times**.
   - Each pop operation takes **O(log n)**, making this step **O(k log n)**.

---

### **Handling Lexicographic Order**
When two strings have the same frequency, the one **earlier in lexicographic order** should have higher priority.  
- We define a **custom comparator** for the heap:
  - **Primary sorting** → By frequency (descending).  
  - **Secondary sorting** → By string value (ascending).  
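
In Python specifically, the same ordering can also be obtained without a comparator class. The sketch below is an alternative, not the approach used in the solution cell: it assumes `(-frequency, string)` tuples, relying on the fact that `heapq` is a min-heap and that tuples compare element by element. The function name `k_most_frequent_with_tuples` is hypothetical.

```python
import heapq
from collections import Counter
from typing import List

def k_most_frequent_with_tuples(strs: List[str], k: int) -> List[str]:
    freqs = Counter(strs)
    # Negating the frequency makes the most frequent string the smallest tuple;
    # equal frequencies fall through to the string, compared in ascending order.
    heap = [(-freq, s) for s, freq in freqs.items()]
    heapq.heapify(heap)
    # Never pop more entries than there are distinct strings.
    count = min(k, len(heap))
    return [heapq.heappop(heap)[1] for _ in range(count)]

print(k_most_frequent_with_tuples(["b", "a", "c", "c"], 2))  # ['c', 'a']
```

The tuple trick only works here because the tie-break direction (ascending strings) matches the heap's natural direction. When the two directions differ, a comparator class is the more direct option.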

---

### **Time & Space Complexity**
- **Total Time Complexity** → **O(n + k log n)**
- **Counting frequencies** → **O(n)**
- **Building the heap** → **O(n)**
- **Extracting k elements** → **O(k log n)**
- **Space Complexity** → **O(n)** (for storing the hash map and heap)

In [2]:
from typing import List
from collections import Counter
import heapq

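class Pair:
    """A (string, frequency) pair ordered so that heapq behaves as a max-heap."""

    def __init__(self, word: str, freq: int):
        self.word = word
        self.freq = freq

    def __lt__(self, other: "Pair") -> bool:
        # Equal frequencies: the lexicographically smaller string has priority.
        if self.freq == other.freq:
            return self.word < other.word
        # Otherwise the higher frequency has priority, so the comparison is inverted.
        return self.freq > other.freq

def k_most_frequent_strings(strs: List[str], k: int) -> List[str]:
    freqs = Counter(strs)
    # Build the heap in O(n) with heapify instead of pushing pairs one by one.
    max_heap = [Pair(word, freq) for word, freq in freqs.items()]
    heapq.heapify(max_heap)

    # There may be fewer than k distinct strings, so never pop more than the heap holds.
    return [heapq.heappop(max_heap).word for _ in range(min(k, len(max_heap)))]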


## **Intuition - Min-Heap**
Can we maintain a heap with a **space complexity of O(k)** while still efficiently retrieving the k most frequent strings?  

### **Key Observation**
We only need the **k most frequent** strings. This means that if our heap ever exceeds size k, we can **discard the least frequent elements**. Since these discarded elements have lower frequencies, they **cannot** be part of the final result.

### **Why Use a Min-Heap Instead of a Max-Heap?**
- A **max-heap** efficiently retrieves the most frequent elements, but **removing the least frequent element** is inefficient.  
- A **min-heap** allows us to **always remove the least frequent element** when the heap size exceeds k, ensuring that the heap only stores the top k elements.

### **Heap Operations**
1. **Insert elements into the heap**  
   - Maintain a **heap of size at most k**.  
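   - If the heap exceeds k, remove the **least frequent element** (heap root). Among strings with the same frequency, the root must be the lexicographically **largest** one, so the comparator's tie-break is the reverse of the max-heap version.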

2. **Retrieve elements from the heap**  
   - Pop elements one by one until empty.
   - Since we used a **min-heap**, we retrieve elements **from lowest to highest frequency**, so we must **reverse the order** before returning the result.
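
To make the eviction rule concrete, here is a small sketch with three strings of equal frequency and k = 2. The `Entry` class and the sample words are hypothetical, chosen only to show which string leaves the heap when ties occur.

```python
import heapq

class Entry:
    """Hypothetical wrapper: the 'smallest' entry is the weakest candidate."""

    def __init__(self, word, freq):
        self.word = word
        self.freq = freq

    def __lt__(self, other):
        # On a frequency tie, the lexicographically larger word is weaker.
        if self.freq == other.freq:
            return self.word > other.word
        return self.freq < other.freq

k = 2
min_heap = []
evicted = []
for word, freq in [("apple", 1), ("cherry", 1), ("banana", 1)]:
    heapq.heappush(min_heap, Entry(word, freq))
    # Once the heap holds more than k entries, drop the weakest one (the root).
    if len(min_heap) > k:
        evicted.append(heapq.heappop(min_heap).word)

kept = sorted(entry.word for entry in min_heap)
print(evicted, kept)  # ['cherry'] ['apple', 'banana']
```

"cherry" is evicted because it sorts last among the tied strings, which leaves the two strings that belong in the answer.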

### **Time & Space Complexity**
- **Total Time Complexity** → **O(n log k)**
- **Heap insertions/removals** → **O(n log k)** (each insertion/removal takes log k time)
- **Extracting elements** → **O(k log k)**
- **Space Complexity** → **O(n)** (for storing the hash map; the heap itself takes only O(k) space)

In [3]:
from typing import List
from collections import Counter
import heapq

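class Pair:
    """A (string, frequency) pair ordered so that the heap root is the weakest candidate."""

    def __init__(self, word: str, freq: int):
        self.word = word
        self.freq = freq

    def __lt__(self, other: "Pair") -> bool:
        # Equal frequencies: the lexicographically larger string is evicted first.
        if self.freq == other.freq:
            return self.word > other.word
        return self.freq < other.freq

def k_most_frequent_strings(strs: List[str], k: int) -> List[str]:
    freqs = Counter(strs)
    min_heap = []
    for word, freq in freqs.items():
        heapq.heappush(min_heap, Pair(word, freq))
        # Keep at most k entries by discarding the current weakest one.
        if len(min_heap) > k:
            heapq.heappop(min_heap)

    # Pop until empty (weakest first), then reverse to get the required order.
    # The heap may hold fewer than k entries if there are fewer than k distinct strings.
    res = [heapq.heappop(min_heap).word for _ in range(len(min_heap))]
    res.reverse()

    return res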