# Top K Frequent Elements

In [1]:
'''
Difficulty: Medium
'''


## Problem Statement

In [2]:
'''
Given an integer array nums and an integer k, return the k most frequent elements. You may return the answer in any order.

Example 1:
Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]

Example 2:
Input: nums = [1], k = 1
Output: [1]
 
Constraints:
1 <= nums.length <= 10^5
k is in the range [1, the number of unique elements in the array].
It is guaranteed that the answer is unique.
 
Follow up: Your algorithm's time complexity must be better than O(n log n), where n is the array's size.
'''

## Given Test Cases

In [3]:
'''
Example 1:
Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]

Example 2:
Input: nums = [1], k = 1
Output: [1]
'''


### Data Setup

In [4]:
nums1, k1 = [1,1,1,2,2,3], 2
nums2, k2 = [1], 1

## Strategy and Solution

### Brute Force Time = O(nlogn) | Space = O(n)

In [5]:
'''
CONCEPT
    the brute force solution is the most direct way to do this:
    count the freq of every number in the list, sort the numbers by their freq, and return the top k numbers

IMPLEMENTATION
    O(n) to traverse the list and build a freq dictionary, using each number as the key and its freq as the value; then convert the (number, freq) pairs to a list
    O(nlogn) to sort that list by freq in descending order
    O(k) to return the numbers from the first k entries of the sorted list

ANALYSIS
    as mentioned above, our runtime is nlogn + n + k, which is in the class of O(nlogn)
    in the worst case (every value is unique), we spend O(n) space on the dictionary and another O(n) on the sorted list, and O(2n) is in the class of O(n)
'''


In [6]:
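def topKFrequent(nums, k):
    freq_dict = {}

    # count how many times each number occurs
    # (the loop variable is named num so it does not shadow the builtin int)
    for num in nums:
        if num in freq_dict:
            freq_dict[num] += 1
        else:
            freq_dict[num] = 1

    # sort the (number, freq) pairs by freq, highest first
    freq_list = sorted(freq_dict.items(), key=lambda x: x[1], reverse=True)

    # keep only the numbers from the first k pairs
    return [freq_list[i][0] for i in range(k)]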

In [7]:
print(nums1)
topKFrequent(nums1, k1)

[1, 1, 1, 2, 2, 3]


[1, 2]

### Faster Solution Time = O(n + klogn) | Space = O(n)

In [8]:
'''
CONCEPT
    we can improve on the runtime because we don't need to sort the entire list; we only need the k most frequent elements, not a full ordering of all of them.
    so we want a structure that lets us extract just the top k elements without paying to order everything else.

IMPLEMENTATION
    one way to do this is to transform the freq counts into a heap.
    as above, O(n) to count the freq of every number
    then heapify the (freq, number) pairs into a max heap in O(n) time, using the freq as the heap key
    with the max heap established, we pop k times to get the k most frequent numbers. each pop is O(logn) in the worst case

ANALYSIS
    our total runtime is O(n) + O(n) + k * O(logn), which simplifies to O(n + klogn). this is never worse than the O(nlogn) brute force since k <= n, and it is strictly better when k grows slower than n (a constant k gives O(n))
    our memory is O(n) for the freq dictionary and the max heap
'''

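The heap approach described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name `top_k_frequent_heap` is hypothetical, and because Python's `heapq` module only provides a min heap, the sketch assumes that negating each freq is an acceptable way to simulate the max heap.

```python
import heapq
from collections import Counter

def top_k_frequent_heap(nums, k):
    # O(n): count how many times each number occurs
    freq = Counter(nums)

    # heapq is a min heap, so store negated freqs to pop the largest freq first
    heap = [(-count, num) for num, count in freq.items()]
    heapq.heapify(heap)  # linear in the number of unique values

    # k pops, each logarithmic in the heap size
    return [heapq.heappop(heap)[1] for _ in range(k)]

print(top_k_frequent_heap([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```

When two numbers share a freq, the tuple comparison falls back to the number itself, so ties come out smallest number first.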

### Fastest Solution Time = O(n) | Space = O(n)

In [9]:
'''
CONCEPT
    currently, the greatest timesink is the klogn spent popping elements from the heap
    we can avoid it by building, in O(n), a list that is already ordered from lowest to highest freq
    observe that in the previous solutions we already have a freq dictionary (the key is the number and the value is its freq)
    by transforming it into a list in which the index is the freq and the value is a list of all the numbers with that freq, we get a list that is ordered
    from lowest to highest freq already.

IMPLEMENTATION
    quick note: it sounds like we're sorting the freq dictionary by its values, which would take nlogn with a comparison sort.
    the key realization is that for any given list of length n, every possible freq falls in the range 1 to n.
    eg) for a list of length 4 such as [1,2,2,4]
    the highest freq possible is 4 (in the case where it's [1,1,1,1]) and the lowest is 1 (in the case where it's [1,2,3,4])
    because the range of freqs is bounded by the length of the list, we can transform our freq dictionary in O(n) by iterating through it and placing each number
    at the predefined index equal to its freq

    in implementation, we transform our given list into a freq dictionary, where (key, value) = (number, freq)
    next, we transform this freq dictionary into a "sorted k list" with exactly n + 1 slots (indices 0 to n), where n is the length of our given list.
    in the "sorted k list", (index, value) = (freq, sub list of all numbers that have this freq).

    because the index is the freq in ascending order, we iterate from the back of this "sorted k list" and collect numbers until we have k of them

ANALYSIS
    runtime: freq dictionary = O(n), "sorted k list" = O(n), lookup = O(n), so 3n = O(n)
    memory is 2*O(n) for the freq dictionary and "sorted k list", which = O(n)
'''

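To make the transformation concrete, here is a small illustration of the two intermediate structures for the first test case. The variable name `buckets` is only for illustration; it plays the role of the "sorted k list" described in the notes.

```python
nums = [1, 1, 1, 2, 2, 3]

# step 1: freq dictionary, (key, value) = (number, freq)
freq_dict = {}
for num in nums:
    freq_dict[num] = freq_dict.get(num, 0) + 1

# step 2: one slot per possible freq, indices 0..len(nums); index 0 stays empty
buckets = [[] for _ in range(len(nums) + 1)]
for num, freq in freq_dict.items():
    buckets[freq].append(num)

print(freq_dict)  # {1: 3, 2: 2, 3: 1}
print(buckets)    # [[], [3], [2], [1], [], [], []]
```

Reading `buckets` from the back, the first non-empty slots are `[1]` (freq 3) and `[2]` (freq 2), which gives the top 2 elements without any sorting.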

In [69]:
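def topKFrequent(nums, k):
    # count how many times each number occurs
    # (the loop variable is named num so it does not shadow the builtin int)
    freq_dict = {}
    for num in nums:
        if num in freq_dict:
            freq_dict[num] += 1
        else:
            freq_dict[num] = 1

    # index = freq, value = list of numbers with that freq (index 0 is never used)
    sorted_k_list = [[] for _ in range(len(nums) + 1)]
    for key, value in freq_dict.items():
        sorted_k_list[value].append(key)

    top_k = []
    for i in range(len(sorted_k_list) - 1, 0, -1):  # iterate from the highest freq down
        for num in sorted_k_list[i]:
            top_k.append(num)
            if len(top_k) == k:
                return top_k
    return top_k  # only reached if k exceeds the number of unique elements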

In [70]:
print(nums1)
topKFrequent(nums1, k1)

[1, 1, 1, 2, 2, 3]


[1, 2]

## Testing

In [71]:
print(f"\
{topKFrequent(nums1, k1)}\
{topKFrequent(nums2, k2)}")

[1, 2][1]
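Since the problem allows the answer in any order, comparing printed output by eye can be misleading. A minimal sketch of an order-insensitive check is shown here. The helper `check` and the stand-in `reference_top_k` are hypothetical names introduced for this sketch; `reference_top_k` relies on `collections.Counter.most_common` and is only there to keep the block self-contained, so any of the `topKFrequent` versions can be passed in its place.

```python
from collections import Counter

def reference_top_k(nums, k):
    # stand-in solution: most_common(k) returns the k (number, freq) pairs with the highest freq
    return [num for num, _ in Counter(nums).most_common(k)]

def check(fn, nums, k, expected):
    # sort both sides so the order of the returned elements does not matter
    return sorted(fn(nums, k)) == sorted(expected)

assert check(reference_top_k, [1, 1, 1, 2, 2, 3], 2, [1, 2])
assert check(reference_top_k, [1], 1, [1])
print("Passed all test cases")
```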


In [None]:
'''
Passed all test cases
'''
