## 692. Top K Frequent Words
- Description:
  <blockquote>
    Given an array of strings words and an integer k, return the k most frequent strings.

  Return the answer sorted by the frequency from highest to lowest. Sort the words with the same frequency by their lexicographical order.

  Example 1:

  Input: words = ["i","love","leetcode","i","love","coding"], k = 2
  Output: ["i","love"]
  Explanation: "i" and "love" are the two most frequent words.
  Note that "i" comes before "love" due to a lower alphabetical order.

  Example 2:

  Input: words = ["the","day","is","sunny","the","the","the","sunny","is","is"], k = 4
  Output: ["the","is","sunny","day"]
  Explanation: "the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively.

  Constraints:

      1 <= words.length <= 500
      1 <= words[i].length <= 10
      words[i] consists of lowercase English letters.
      k is in the range [1, The number of unique words[i]]

  Follow-up: Could you solve it in O(n log(k)) time and O(n) extra space?

  </blockquote>

- URL: [Problem_URL](https://leetcode.com/problems/top-k-frequent-words/description/?envType=company&envId=attentive&favoriteSlug=attentive-all)

- Topics: Heap, Bucket Sort

- Difficulty: Medium

- Resources: [Top K Frequent Elements.py](Top%20K%20Frequent%20Elements.py)

### Solution Brute Force
Brute-force solution using a hash map and sorting.
- Time Complexity: O(NlogN)
  - We count the frequency of each word in O(N) time, and then we sort the unique words (at most N of them) in O(NlogN) time.
- Space Complexity: O(N)
  - The frequency map and the sorted list of unique words each hold up to N entries; the returned slice has length k.

In [None]:
from collections import Counter
from typing import List

class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        cnt = Counter(words)
        # Sort by descending frequency, then ascending lexicographical order; keep the first k.
        return sorted(cnt, key=lambda x: (-cnt[x], x))[:k]
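
As a small illustration of why the sort key `(-cnt[x], x)` produces the required order, the sketch below applies it to Example 1 without truncating to k. The helper name `rank_key` is only illustrative and is not part of the solution above.

```python
from collections import Counter

words = ["i", "love", "leetcode", "i", "love", "coding"]
cnt = Counter(words)


def rank_key(word):
    # Negating the count makes higher counts sort first;
    # the word itself breaks ties in ascending lexicographical order.
    return (-cnt[word], word)


ranked = sorted(cnt, key=rank_key)
print(ranked)  # ['i', 'love', 'coding', 'leetcode']
```

Tuples compare element by element, so the word is consulted only when two counts are equal, which is what places "coding" ahead of "leetcode".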

### Solution 1 Most Optimal
Min-heap solution capped at size k, using a Pair class whose custom `__lt__` magic method defines the heap comparison.
- Time Complexity: O(Nlogk)
  - where N is the length of words. We count the frequency of each word in O(N) time, then we push each unique word (at most N of them) onto the heap, each in O(logk) time because the heap never holds more than k+1 entries. Finally, we sort the k elements left in the heap for the returned result, which takes O(klogk). As k≤N, O(N)+O(Nlogk)+O(klogk)=O(Nlogk)
- Space Complexity: O(N)
  - O(N) space is used to store the Counter `count`, while O(k) space is used for the heap.

In [None]:
from collections import Counter
from typing import List
import heapq


class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        count = Counter(words)
        heap = []

        for word, freq in count.items():
            heapq.heappush(heap, Pair(freq, word))

            if len(heap) > k:
                heapq.heappop(heap)

        return [pair.word for pair in sorted(heap, reverse=True)]


class Pair:
    def __init__(self, freq, word):
        self.freq = freq
        self.word = word

    def __lt__(self, other):
        if self.freq == other.freq:
            # Same frequency: the lexicographically larger word counts as "smaller",
            # so the min-heap evicts it first and the lexicographically smaller word is kept.
            return self.word > other.word
        return self.freq < other.freq
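
To make the inverted comparison concrete, here is a minimal sketch that traces which word the size-k min-heap evicts on Example 1 with k = 3. `DemoPair` is a hypothetical stand-in that mirrors the `Pair` class above so the snippet runs on its own.

```python
import heapq


class DemoPair:
    """Hypothetical stand-in mirroring the Pair class above."""

    def __init__(self, freq, word):
        self.freq = freq
        self.word = word

    def __lt__(self, other):
        if self.freq == other.freq:
            # On a tie, the lexicographically larger word is treated as smaller.
            return self.word > other.word
        return self.freq < other.freq


k = 3
heap = []
evicted = []
for freq, word in [(2, "i"), (2, "love"), (1, "leetcode"), (1, "coding")]:
    heapq.heappush(heap, DemoPair(freq, word))
    if len(heap) > k:
        # The heap root is the least desirable candidate seen so far.
        evicted.append(heapq.heappop(heap).word)

kept = [pair.word for pair in sorted(heap, reverse=True)]
print(evicted)  # ['leetcode']
print(kept)     # ['i', 'love', 'coding']
```

"leetcode" and "coding" both occur once, and because "leetcode" is lexicographically larger it sits at the heap root and is the one removed.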

### Solution 2
Max Heap Solution
- Time Complexity: O(N+klogN)
  - We count the frequency of each word in O(N) time and then heapify the list of unique words in O(N) time. Each time we pop the top from the heap, it costs O(logN) time as the size of the heap is O(N), and we pop k times.
  - Overall time complexity:
    - Creating the counter: O(n)
    - Heapifying the unique words: O(m) where m is the number of unique words
    - Popping k times: O(k log m)
    - The total is O(n + k log m). In the worst case where all words are unique (m = n), this becomes O(n + k log n).
- Space Complexity: O(N)
  - The space used to store the counter `cnt` and the heap `heap`.

In [None]:
from collections import Counter
from typing import List
from heapq import heapify, heappop


class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        cnt = Counter(words)
        heap = [(-freq, word) for word, freq in cnt.items()]
        heapify(heap)

        return [heappop(heap)[1] for _ in range(k)]
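
The same `(-freq, word)` ordering can also be handed to the standard library's `heapq.nsmallest`, which returns the k smallest items in ascending key order. This is only an alternative sketch, not the approach used above; the function name `top_k_frequent` is an assumption made for illustration, and the exact running time depends on the `heapq` implementation.

```python
from collections import Counter
import heapq


def top_k_frequent(words, k):
    cnt = Counter(words)
    # Iterating a Counter yields its keys (the unique words).
    # The smallest keys under (-count, word) are the most frequent words,
    # with ties resolved in ascending lexicographical order.
    return heapq.nsmallest(k, cnt, key=lambda w: (-cnt[w], w))


print(top_k_frequent(["i", "love", "leetcode", "i", "love", "coding"], 2))
# ['i', 'love']
```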

In [None]:
sol = Solution()

test_cases = [
    (["i","love","leetcode","i","love","coding"], 2, ["i", "love"]),
    (["i","love","leetcode","i","love","coding"], 3, ["i","love","coding"]),
    # (["the","day","is","sunny","the","the","the","sunny","is","is"], 4, ["the", "is", "sunny", "day"]),
    # (["a", "b", "c", "a", "b", "a"], 2, ["a", "b"]),
    # (["apple", "banana", "apple", "orange"], 1, ["apple"]),
    # (["hello", "world", "hello"], 1, ["hello"]),
]

# `words` is used instead of `input` to avoid shadowing the built-in function.
for words, k, expected in test_cases:
    result = sol.topKFrequent(words, k)
    assert result == expected, f"Failed with input {words}, k {k}: got {result}, expected {expected}"

print("All tests passed!")