# Building a Smart Data Aggregator - Assignment 2

## Part 1: User Data Processing with Lists

### Write a function to:
- Filter out users older than 30 from specific countries (‘USA’, ‘Canada’).

In [1]:
def filter_users(data):
    country_to_check = ("usa", "canada")
    out = []
    for user in data:
        if int(user[2]) > 30 and user[3].lower() in country_to_check:
            out.append(user)  # tuples are immutable and have no .copy(), so the row is appended as-is
    return out

- Extract their names into a new list.

In [2]:
def extract_names(data):
    filtered_users = filter_users(data)
    names = []
    for user in filtered_users:
        names.append(str(user[1]))
    return names
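
As a quick check of the filtering and name-extraction logic above, here is a minimal, self-contained sketch. It assumes each user row is a tuple shaped like `(user_id, name, age, country)`, which is inferred from the indexes used above. The sample rows and the helper name `names_over_30` are made up for illustration.

```python
# Hypothetical sample rows, assumed layout: (user_id, name, age, country)
users = [
    (1, "Alice", 34, "USA"),
    (2, "Bob", 28, "Canada"),
    (3, "Chloe", 41, "canada"),
    (4, "Dev", 52, "India"),
]

COUNTRIES = {"usa", "canada"}  # compared case-insensitively


def names_over_30(rows):
    """Return the names of users older than 30 from the selected countries."""
    return [
        str(name)
        for _, name, age, country in rows
        if int(age) > 30 and country.lower() in COUNTRIES
    ]


print(names_over_30(users))  # ['Alice', 'Chloe']
```

The list comprehension does the filtering and the name extraction in one pass, which may be easier to read once both steps are understood separately.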

### Implement a function that:
- Sorts the original list of tuples by age and returns the top 10 oldest users.

In [3]:
def by_age(row):
    return int(row[2])
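def sort_by_age(data):
    # sorted() returns a new list, so the original list is left unchanged.
    # reverse=True puts the oldest users first; the slice keeps the top 10.
    return sorted(data, key=by_age, reverse=True)[:10]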
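
An alternative sketch for the "top 10 oldest" task uses `heapq.nlargest` from the standard library, which returns the n largest items without sorting or modifying the input list. The row layout `(user_id, name, age, country)`, the sample data, and the name `top_oldest` are assumptions for illustration.

```python
import heapq

# Hypothetical sample rows, assumed layout: (user_id, name, age, country)
users = [
    (1, "Alice", 34, "USA"),
    (2, "Bob", 28, "Canada"),
    (3, "Chloe", 41, "canada"),
    (4, "Dev", 52, "India"),
]


def top_oldest(rows, n=10):
    """Return the n oldest users, oldest first, leaving rows untouched."""
    return heapq.nlargest(n, rows, key=lambda row: int(row[2]))


print(top_oldest(users, n=2))
# [(4, 'Dev', 52, 'India'), (3, 'Chloe', 41, 'canada')]
```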

- Checks if there are any users with duplicate names in the list. If duplicates are
found, output those names.

In [4]:
def check_duplicates(data):
    check = set()
    out = []
    for user in data:
        name = user[1].lower()
        if name not in check:
            check.add(name)
        else:
            out.append(name)
    return list(set(out))
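
The same duplicate check can also be sketched with `collections.Counter`, which counts how often each name occurs. As above, the name is assumed to sit at index 1 of each row. The sample rows and the name `duplicate_names` are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows, assumed layout: (user_id, name, age, country)
users = [
    (1, "Alice", 34, "USA"),
    (2, "bob", 28, "Canada"),
    (3, "ALICE", 41, "Canada"),
    (4, "Chloe", 52, "India"),
    (5, "Bob", 19, "USA"),
]


def duplicate_names(rows):
    """Return the lowercased names that appear more than once, sorted."""
    counts = Counter(str(row[1]).lower() for row in rows)
    return sorted(name for name, count in counts.items() if count > 1)


print(duplicate_names(users))  # ['alice', 'bob']
```

Sorting the result gives a stable output order, which a plain `list(set(...))` does not guarantee.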

## Part 2: Immutable Data Management with Tuples

### Write a function that:

- Takes a list of transactions (tuples) and finds the total number of unique users
involved in transactions.


- Ensures the integrity of the tuples by avoiding any changes to the original data.

In [5]:
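def get_unique_users(data):
    # user_id is assumed to be at index 1, as in get_ids below.
    # The tuples are only read, never modified, so the original data stays intact.
    unique_ids = set()
    for transaction in data:
        unique_ids.add(transaction[1])
    return len(unique_ids)  # total number of unique users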
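
A compact sketch of the unique-user count, assuming each transaction is a tuple shaped like `(transaction_id, user_id, amount)`. That layout is inferred from the indexes used elsewhere in this part; the sample data and the name `count_unique_users` are made up.

```python
# Hypothetical sample data, assumed layout: (transaction_id, user_id, amount)
transactions = [
    ("t1", "u1", 120.0),
    ("t2", "u2", 75.5),
    ("t3", "u1", 310.0),
]


def count_unique_users(rows):
    """Count distinct user_ids (index 1) without modifying the input."""
    return len({row[1] for row in rows})


print(count_unique_users(transactions))  # 2
```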

### Implement a function that:
- Identifies and returns the transaction with the highest amount without altering the
list of tuples.

In [6]:
def get_highest_amount(data):
    amount = float('-inf')
    my_tup = None
    for transaction in data:
        # float() keeps fractional amounts that int() would truncate
        if float(transaction[2]) > amount:
            amount = float(transaction[2])
            my_tup = transaction
    return my_tup
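
The built-in `max` with a `key` function offers a shorter way to express the same search. This sketch assumes the amount is at index 2 of each transaction tuple; the sample data and the name `highest_transaction` are hypothetical.

```python
# Hypothetical sample data, assumed layout: (transaction_id, user_id, amount)
transactions = [
    ("t1", "u1", 120.0),
    ("t2", "u2", 75.5),
    ("t3", "u1", 310.25),
]


def highest_transaction(rows):
    """Return the tuple with the largest amount, or None for an empty list."""
    return max(rows, key=lambda row: float(row[2]), default=None)


print(highest_transaction(transactions))  # ('t3', 'u1', 310.25)
```

`max` only reads the list, so the original tuples and their order are left as they were.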

- Receives a list of tuples and returns two separate lists: one containing all the
transaction_ids and the other containing all user_ids. What challenges might
arise if the tuple size is inconsistent?

>> If some tuples have an inconsistent size, the function below will not work correctly. A tuple that is too short raises an IndexError (index out of range). A tuple with extra or reordered fields may raise no error at all and silently produce incorrect output. The function assumes that transaction_id is at index 0 and user_id is at index 1 of every tuple, and that both values are present.

In [7]:
def get_ids(data):
    # Lists (not sets) keep every ID, including repeats, in the original order.
    transaction_ids = []
    user_ids = []
    for transaction in data:
        transaction_ids.append(transaction[0])
        user_ids.append(transaction[1])
    return [transaction_ids, user_ids]
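
One possible way to handle the inconsistent-size problem described above is to check each tuple's length before indexing and to set aside the rows that are too short. This is only a sketch: the name `split_ids`, the `min_fields` parameter, and the decision to collect skipped rows rather than raise an error are all assumptions.

```python
def split_ids(rows, min_fields=2):
    """Split rows into transaction_ids and user_ids, collecting malformed rows.

    Assumes transaction_id is at index 0 and user_id is at index 1.
    """
    transaction_ids, user_ids, skipped = [], [], []
    for row in rows:
        if len(row) < min_fields:
            skipped.append(row)  # too short to hold both IDs
            continue
        transaction_ids.append(row[0])
        user_ids.append(row[1])
    return transaction_ids, user_ids, skipped


# Hypothetical sample data with one malformed tuple
rows = [("t1", "u1", 120.0), ("t2",), ("t3", "u3")]
t_ids, u_ids, bad = split_ids(rows)
print(t_ids)  # ['t1', 't3']
print(u_ids)  # ['u1', 'u3']
print(bad)    # [('t2',)]
```

A length check catches short tuples only. It cannot detect fields that are present but in the wrong position, so that case still needs a known, fixed layout.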

## Part 3: Unique Data Handling with Sets
### Write a function that:
- Finds the users who visited both Page A and Page B.

In [8]:
def common_users(page_a, page_b):
    out = page_a.intersection(page_b)
    return out

- Finds users who visited either Page A or Page C, but not both.

In [9]:
def unique_users(page_a, page_c):
    out = page_a.symmetric_difference(page_c)
    return out
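
The two set methods used above also have operator forms: `&` for intersection and `^` for symmetric difference. A small sketch with made-up user IDs:

```python
# Hypothetical page-visit sets
page_a = {"u1", "u2", "u3"}
page_b = {"u2", "u3", "u4"}
page_c = {"u3", "u5"}

both_a_and_b = page_a & page_b   # same as page_a.intersection(page_b)
only_a_or_c = page_a ^ page_c    # same as page_a.symmetric_difference(page_c)

print(sorted(both_a_and_b))  # ['u2', 'u3']
print(sorted(only_a_or_c))   # ['u1', 'u2', 'u5']
```

The operators require both operands to be sets, while the method forms also accept any iterable, such as a list of IDs.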

### Implement a function that:
- Updates the set for Page A with new user IDs.

In [10]:
def update_a(page_a, new_users):
    # Equivalent to page_a.update(new_users)
    for user in new_users:
        page_a.add(user)

- Removes a list of user IDs from the set for Page B.

In [11]:
def remove_from_b(page_b, user_ids):
    # Equivalent to page_b.difference_update(user_ids)
    for user in user_ids:
        page_b.discard(user)  # discard() ignores missing IDs; remove() would raise KeyError
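
The sketch below illustrates the in-place set methods for both tasks, and shows why removing an ID that is not in the set needs care: `remove()` raises `KeyError` for a missing element, while `discard()` and `difference_update()` skip it silently. The user IDs are made up.

```python
# Hypothetical page-visit sets
page_a = {"u1"}
page_b = {"u1", "u2", "u3"}

# Add several new IDs to Page A in one call
page_a.update(["u2", "u3"])
print(sorted(page_a))  # ['u1', 'u2', 'u3']

# remove() fails on an ID that is not in the set
try:
    page_b.copy().remove("u9")
except KeyError:
    print("remove() raised KeyError for a missing ID")

# difference_update() drops the IDs that are present and ignores the rest
page_b.difference_update(["u2", "u9"])
print(sorted(page_b))  # ['u1', 'u3']
```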

## Part 4: Data Aggregation with Dictionaries
### Write a function that:
- Filters out users who rated 4 or higher and stores their user_id and rating in a
new dictionary.

In [12]:
def filter_high_ratings(data):
    # Named differently from filter_users in Part 1 so that the earlier function is not overwritten.
    out = {user_id: feedback['rating'] for user_id, feedback in data.items() if feedback['rating'] >= 4}
    return out

- Sorts the dictionary of user feedback by rating in descending order and returns the
top 5 users.

In [13]:
def top5_users(data):
    out_filtered = filter_high_ratings(data)
    out = dict(sorted(out_filtered.items(), key=lambda item: item[1], reverse=True)[:5])
    return out

### Implement a function that:
- Combines feedback from multiple dictionaries. If a user is present in more than
one dictionary, update their rating to the highest one and append their comments.

In [14]:
def update_feedback(out1):
    new_feedback = {}
    for feedback in out1:
        for user_id, comments in feedback.items():
            if user_id in new_feedback:
                new_feedback[user_id]['rating'] = max(new_feedback[user_id]['rating'], comments['rating'])
                new_feedback[user_id]['comments'].append(comments['comments'])
            else:
                new_feedback[user_id] = {
                    'rating': comments['rating'],
                    'comments': [comments['comments']]
                }
    return new_feedback
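
To make the merge rule concrete, here is a small self-contained sketch with sample input. It assumes each feedback dictionary maps a user_id to `{'rating': number, 'comments': string}`, which is inferred from the keys used above. The name `merge_feedback` and the sample data are hypothetical.

```python
def merge_feedback(*sources):
    """Merge feedback dicts: keep the highest rating, collect all comments."""
    merged = {}
    for source in sources:
        for user_id, feedback in source.items():
            # Create the entry on first sight, then update it in both cases
            entry = merged.setdefault(
                user_id, {"rating": feedback["rating"], "comments": []}
            )
            entry["rating"] = max(entry["rating"], feedback["rating"])
            entry["comments"].append(feedback["comments"])
    return merged


# Hypothetical feedback from two sources
source_1 = {
    "u1": {"rating": 3, "comments": "ok"},
    "u2": {"rating": 5, "comments": "great"},
}
source_2 = {"u1": {"rating": 4, "comments": "better now"}}

merged = merge_feedback(source_1, source_2)
print(merged["u1"])  # {'rating': 4, 'comments': ['ok', 'better now']}
print(merged["u2"])  # {'rating': 5, 'comments': ['great']}
```

Because the merged entries are new dictionaries, the input dictionaries are not changed by the merge.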

- Use dictionary comprehension to create a dictionary of user_id and rating for all
users whose rating is greater than 3.

In [15]:
def rating_above_3(out2):
    out = {user_id: comments['rating'] for user_id, comments in out2.items() if comments['rating'] > 3}
    return out