You are given a CSV file containing Instagram post data with the following columns: username, post_id, likes, comments, and shares. Compute an engagement score for each post using the formula:

engagementScore = (likes^1.2 + comments^1.3 + shares^1.4) / 3

Then determine the 10 posts with the highest scores and output each post_id with its computed engagement score, sorted in descending order of score.
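To get a feel for the formula, here is a small worked sketch. The helper name `engagement_score` is only illustrative, and `^` in the formula is taken to mean exponentiation. Because every exponent is greater than 1, the score grows faster than linearly in each input.

```python
def engagement_score(likes, comments, shares):
    # '^' in the problem statement is read as "raised to the power of"
    return (likes ** 1.2 + comments ** 1.3 + shares ** 1.4) / 3

# With one like, one comment, and one share, each term is 1, so the mean is 1
print(engagement_score(1, 1, 1))  # 1.0

# A post with no activity scores zero
print(engagement_score(0, 0, 0))  # 0.0

# Doubling likes more than doubles the likes contribution (2 ** 1.2 > 2)
print(engagement_score(200, 0, 0) > 2 * engagement_score(100, 0, 0))  # True
```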


Clarifying Questions to Ask
1. Are all fields guaranteed to be present and numeric?
2. What should we do with malformed lines or missing fields?
3. Should we round the score or show full precision?
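For question 2, it helps to know how Python's `csv.DictReader` presents a row that has fewer fields than the header: by default it fills the missing keys with `None`. The sketch below uses a made-up one-row sample to show that converting such a field raises `TypeError`, while non-numeric text raises `ValueError`.

```python
import csv
import io

# Made-up sample: the data row is missing its 'shares' field
data = io.StringIO("username,post_id,likes,comments,shares\nalice,p1,10,2\n")
row = next(csv.DictReader(data))

print(row["shares"])  # None (DictReader's default fill value for short rows)

try:
    float(row["shares"])
except TypeError:
    print("missing field -> TypeError")

try:
    float("abc")
except ValueError:
    print("non-numeric text -> ValueError")
```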


My initial approach: read all rows, compute the score for each, store them as a list of (score, post_id) tuples, sort the list in descending order, and return the first 10.
Time: O(n log n) for sorting all posts
Space: O(n)
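A minimal sketch of this sort-everything baseline, assuming clean input (no error handling). The function name `top_posts_by_sorting` and the in-memory sample rows are made up for illustration.

```python
import csv
import io

def top_posts_by_sorting(csv_file, k=10):
    """Score every row, sort all of them, and keep the first k."""
    scored = []
    for row in csv.DictReader(csv_file):
        likes = float(row["likes"])
        comments = float(row["comments"])
        shares = float(row["shares"])
        score = (likes ** 1.2 + comments ** 1.3 + shares ** 1.4) / 3
        scored.append((score, row["post_id"]))
    scored.sort(reverse=True)  # O(n log n): every post takes part in the sort
    return scored[:k]

# Made-up sample data standing in for the real file
sample = io.StringIO(
    "username,post_id,likes,comments,shares\n"
    "alice,p1,1,1,1\n"
    "bob,p2,0,0,0\n"
    "carol,p3,10,10,10\n"
)
result = top_posts_by_sorting(sample, k=2)
print([post_id for _, post_id in result])  # ['p3', 'p1']
```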

However, I can make it more efficient with a heap: I'd use a min-heap of size 10. As I read each row, I compute the score and push it onto the heap. If the heap size exceeds 10, I pop the smallest element. At the end, I sort the heap to get the top scores in descending order.
Time: O(n log k), where n = total number of posts (lines in the CSV) and k = size of the heap (10 here)
Space: O(k)
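The bounded min-heap idea can be seen in isolation on plain numbers. This is a sketch with a hypothetical helper `k_largest`. The root of the heap is always the smallest of the values kept so far, so popping whenever the heap grows past k discards exactly the value that can no longer be in the top k.

```python
import heapq

def k_largest(values, k):
    """Return the k largest values in descending order using a size-k min-heap."""
    heap = []
    for v in values:
        heapq.heappush(heap, v)   # O(log k)
        if len(heap) > k:
            heapq.heappop(heap)   # drop the smallest of the k + 1 candidates
    return sorted(heap, reverse=True)

print(k_largest([5, 1, 9, 3, 7, 8], 3))  # [9, 8, 7]
```

The standard library's `heapq.nlargest(k, values)` returns the same result and can serve as a cross-check.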

Approach:

1. For each line in the CSV, I parse the likes, comments, and shares, and compute the engagement score using the given formula.
2. I push a tuple of (score, post_id) into a min-heap.
3. If the heap exceeds size 10, I pop the smallest score.
4. This way, the heap always contains the 10 highest-scoring posts seen so far, maintained in O(log k) per insert.
5. At the end, I sort the heap in descending order and print the top post IDs with their scores.

import csv
import heapq  # Min-heap for tracking the top K scores
import math   # For math.isfinite when validating input

# Function to calculate engagement score
def compute_score(likes, comments, shares):
    return (likes**1.2 + comments**1.3 + shares**1.4) / 3

# Min-heap to track top 10 posts (smallest score at root)
top_posts = []  # Each element will be (score, post_id)

# Step 1: Open the CSV file
with open('instagram_posts.csv', newline='') as f:
    reader = csv.DictReader(f)

    for row in reader:
        try:
            # Step 2: Extract post fields
            post_id = row['post_id']
            likes = float(row['likes'])
            comments = float(row['comments'])
            shares = float(row['shares'])

            # Step 3: Skip negative, NaN, or infinite values. A negative base with
            # a fractional exponent yields a complex number, and NaN breaks heap ordering.
            values = (likes, comments, shares)
            if not all(math.isfinite(v) and v >= 0 for v in values):
                continue  # skip invalid data

            # Step 4: Compute engagement score
            score = compute_score(likes, comments, shares)

            # Step 5: Push to heap
            heapq.heappush(top_posts, (score, post_id))

            # Step 6: Keep heap size at most 10
            if len(top_posts) > 10:
                heapq.heappop(top_posts)

        except (KeyError, TypeError, ValueError, OverflowError):
            # KeyError: column missing from the header
            # TypeError: field missing from a short row (DictReader fills it with None)
            # ValueError: non-numeric text; OverflowError: value too large for float power
            continue

# Step 7: Sort top 10 posts in descending order of score
top_posts.sort(reverse=True)

# Step 8: Display results
print("Top 10 Posts by Engagement Score:")
for score, post_id in top_posts:
    print(f"{post_id}: {score:.4f}")
