### System Design Thinking 

<img width="823" alt="System Design Thinking" src="https://github.com/user-attachments/assets/49af07fb-992a-4d2e-bf4d-b3ab9e7a872f">

### Find the top-K most popular videos on YouTube at the moment:

**First, we write down the:** - **Functional Requirements**

We want our request to return a list of the top-k heavy hitters (the k most frequently occurring items) within a given time window:

* topk(k, startTime, endTime)
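To make this contract concrete, here is a minimal sketch of how such a call might behave on a small in-memory list. The `events` parameter, the `(timestamp, video_id)` tuple layout, and the half-open window `[start_time, end_time)` are assumptions made for illustration; the requirement above only names `k`, `startTime`, and `endTime`.

```python
from collections import Counter

def topk(events, k, start_time, end_time):
    """Return up to k (video_id, count) pairs for events whose
    timestamp satisfies start_time <= timestamp < end_time."""
    # Assumption: each event is a (timestamp, video_id) tuple.
    counts = Counter(
        video_id
        for timestamp, video_id in events
        if start_time <= timestamp < end_time
    )
    # most_common(k) returns pairs ordered from highest to lowest count.
    return counts.most_common(k)

# Hypothetical sample data: three views of video_a, one each of the others.
events = [
    (100, "video_a"),
    (105, "video_b"),
    (110, "video_a"),
    (200, "video_c"),
    (205, "video_a"),
]
print(topk(events, 1, 100, 200))  # [('video_a', 2)]
```

Only the two `video_a` views at timestamps 100 and 110 fall inside the window, so the view at 205 is not counted.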


**Second, we discuss the:** - **Non-Functional Requirements**

We should consider these things:

* Scalable (scales out as the amount of data grows: videos, tweets, posts, etc.)

* Highly Available (survives hardware/network failures, no single point of failure)

* High Performance (a few tens of milliseconds to return the top 100 list)

* Accurate (as accurate as we can get)

#### Hash Table, Single Host Solution

**Top k algorithm implementation**: **Single Host Solution**

In [None]:
import java.util.*;

// Assumes a HeavyHitter class with a (String, int) constructor and a getFrequency() getter.
public List<HeavyHitter> topK(String[] events, int k) {
    // Step 1: count how often each event occurs
    Map<String, Integer> frequencyTable = new HashMap<>();
    for (String event : events) {
        frequencyTable.put(event, frequencyTable.getOrDefault(event, 0) + 1);
    }

    // Step 2: keep the k most frequent entries in a min-heap ordered by frequency
    PriorityQueue<HeavyHitter> heap =
        new PriorityQueue<HeavyHitter>(Comparator.comparingInt(HeavyHitter::getFrequency));

    for (Map.Entry<String, Integer> entry : frequencyTable.entrySet()) {
        heap.offer(new HeavyHitter(entry.getKey(), entry.getValue()));

        // Evict the least frequent entry once the heap holds more than k
        if (heap.size() > k) {
            heap.poll();
        }
    }

    // Step 3: drain the heap; the result is in ascending order of frequency
    List<HeavyHitter> result = new ArrayList<>();
    while (!heap.isEmpty()) {
        result.add(heap.poll());
    }
    return result;
}

**Python Implementation of top-k algorithm**

In [6]:
import heapq
from collections import defaultdict

class HeavyHitters:
    def __init__(self, event, frequency):
        self.event = event
        self.frequency = frequency

    def __lt__(self, other):
        return self.frequency < other.frequency 

    def get_event(self):
        return self.event 

    def get_frequency(self):
        return self.frequency

    def __repr__(self):
        return f"({self.event}, {self.frequency})"

def topK(events, k):
    #Step 1: Build frequency table 
    frequency_table = defaultdict(int)

    for event in events:
        frequency_table[event] += 1

    #Step 2: Use a min-heap to store top k heavy hitters 
    heap = []

    for event, frequency in frequency_table.items():
        heavy_hitters = HeavyHitters(event, frequency)
        heapq.heappush(heap, heavy_hitters)

        # If the heap exceeds size k, remove the smallest element
        if len(heap) > k:
            heapq.heappop(heap)

    # Step 3: Extract the top k heavy hitters from the heap 
    result = []
    while heap:
        result.append(heapq.heappop(heap))

    # Since we used a min-heap, the results are in ascending order, so we reverse them
    result.reverse()

    return result

#Test Example 
if __name__ == '__main__':
    events = ["apple", "banana", "apple", "orange", "banana", "banana", "grape"]
    k = 2
    result = topK(events, k)
    print(result)


[(banana, 3), (apple, 2)]


**Explanation**

1) Step 1: A frequency table `frequency_table` is built using a `defaultdict` to store the number of occurrences of each event.

2) Step 2: A min-heap `heap` is used to keep track of the top `k` events. This is done by pushing each event onto the heap and popping the smallest (least frequent) event whenever the heap exceeds size `k`.

3) Step 3: The top `k` heavy hitters are extracted from the heap and stored in the result list. Since the heap yields the smallest element first, we reverse the list to get the events in descending order of frequency.
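As a cross-check on the three steps above, the same result can be obtained with standard-library helpers. This is a minimal sketch rather than a replacement for the hand-written version; the function name `top_k_stdlib` is hypothetical, and it returns plain `(event, frequency)` tuples instead of `HeavyHitters` objects.

```python
import heapq
from collections import Counter

def top_k_stdlib(events, k):
    # Step 1: Counter builds the frequency table in one pass.
    counts = Counter(events)
    # Steps 2 and 3: nlargest returns the k items with the highest
    # frequency, already sorted in descending order.
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])

events = ["apple", "banana", "apple", "orange", "banana", "banana", "grape"]
print(top_k_stdlib(events, 2))  # [('banana', 3), ('apple', 2)]
```

Comparing this output with the hand-written `topK` on the same input is a quick way to catch ordering or off-by-one mistakes in the heap logic.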

**Overall Time & Space Complexity**

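- **Time Complexity:** $O(n + m \log k)$, where $n$ is the number of events and $m$ is the number of distinct events. Since $m \le n$, this is $O(n \log k)$ in the worst case.

- **Space Complexity:** $O(m + k)$ for the frequency table and the heap, which is $O(n + k)$ in the worst case.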
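To see where these bounds come from, the sketch below instruments the same algorithm and reports how many heap pushes happen and how large the heap ever gets. Here $n$ is the number of events and $m$ the number of distinct events. The function name `top_k_instrumented` and the two counters are hypothetical additions for illustration only.

```python
import heapq
from collections import Counter

def top_k_instrumented(events, k):
    counts = Counter(events)      # one pass over n events, m table entries
    heap = []
    pushes = 0                    # ends up equal to m
    max_heap_size = 0             # never exceeds k + 1
    for event, frequency in counts.items():
        heapq.heappush(heap, (frequency, event))
        pushes += 1
        max_heap_size = max(max_heap_size, len(heap))
        if len(heap) > k:
            heapq.heappop(heap)   # evict the least frequent entry
    # Sort the surviving entries from most to least frequent.
    return sorted(heap, reverse=True), pushes, max_heap_size

events = ["apple", "banana", "apple", "orange", "banana", "banana", "grape"]
print(top_k_instrumented(events, 2))
# ([(3, 'banana'), (2, 'apple')], 4, 3)
```

With 7 events and 4 distinct values, there are 4 pushes (one per distinct event) and the heap peaks at $k + 1 = 3$ entries, which is why each heap operation costs $O(\log k)$ rather than $O(\log m)$.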

The **Single Host Solution** is easy to build, but it is not scalable: a single host must process every event and hold the entire frequency table in memory.

### Hash Table, Multiple Hosts