### K-Means

Use the k-means algorithm to return the `k` means (centroids) of the input features.

These features are the result of applying PCA, a dimensionality reduction technique, to user app interaction data. You will have access to a `USER_FEATURE_MAP` dictionary, which maps each `user_id` to a list of 4 features associated with that user.

Below is an example portion of the `USER_FEATURE_MAP`

Note that:

1. The initial centroid locations are selected for you to ensure consistency when verifying solutions.
2. You should execute at least 10 iterations of the k-means algorithm, not including the initialization of the centroids.
3. You should use Manhattan distance as the distance metric.
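
To illustrate why note 1 makes solutions comparable, the sketch below seeds Python's random generator and samples from a sorted list of ids, so the same users are picked regardless of the order in which the ids arrive. The ids in `user_ids` and the helper name `pick_initial_users` are hypothetical and only serve the illustration; the real keys come from `USER_FEATURE_MAP`.

```python
import random

# Hypothetical user ids (made up for illustration).
user_ids = ["u3", "u1", "u4", "u2"]


def pick_initial_users(ids, k, seed=42):
    """Seed the generator, then sample k ids from a sorted copy of ids."""
    random.seed(seed)
    return random.sample(sorted(ids), k)


# Same seed and same sorted input give the same picks on every call,
# even when the ids are supplied in a different order.
first = pick_initial_users(user_ids, 2)
second = pick_initial_users(list(reversed(user_ids)), 2)
print(first == second)
```

Sorting before sampling matters because `random.sample` picks by position, so an unsorted input could yield different users under the same seed.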

In [1]:
# Pseudo Code
'''
1. init centroids
2. for n in num_iterations
    2.1 for user in user_data
        2.1.1 for c in centroids
            2.1.1.1 distance = calculate_manhattan_distance(user, c)
        2.1.2 assign user to the centroid with the minimum distance
    2.2 for c in centroids
        2.2.1 update c to the average of its assigned users
3. return centroids
'''

In [None]:
import random
import math
class Centroid:
    def __init__(self, location):
        self.location = location
        self.closest_users = set()

NUM_FEATURES_PER_USER = 4

def get_k_means(user_feature_map, num_features_per_user, k):
    # Don't change the following two lines of code.
    random.seed(42)
    # Gets the initial users, to be used as centroids.
    inital_centroid_users = random.sample(sorted(list(user_feature_map.keys())), k)

    # Each sampled user's feature vector becomes a starting centroid location.
    centroids = [Centroid(user_feature_map[user_id]) for user_id in inital_centroid_users]
    for _ in range(10):
        for uid, features in user_feature_map.items():
            closest_centroid_distance = float("inf")
            closest_centroid = None
            for centroid in centroids:
                features_to_centroid_distance = get_manhattan_distance(features, centroid.location)
                if features_to_centroid_distance < closest_centroid_distance:
                    closest_centroid_distance = features_to_centroid_distance
                    closest_centroid = centroid
            closest_centroid.closest_users.add(uid)

        for centroid in centroids:
            centroid.location = get_centroid_average(centroid, user_feature_map)
            centroid.closest_users.clear()
    return [centroid.location for centroid in centroids]

def get_centroid_average(centroid, user_feature_map):
    # An empty cluster has no mean; keep the centroid where it is
    # instead of dividing by zero.
    if not centroid.closest_users:
        return centroid.location
    # Sum each feature over the assigned users, then divide by the cluster size.
    centroid_sum = [0.0] * NUM_FEATURES_PER_USER
    for user in centroid.closest_users:
        for i in range(NUM_FEATURES_PER_USER):
            centroid_sum[i] += user_feature_map[user][i]
    return [total / len(centroid.closest_users) for total in centroid_sum]
            
def get_manhattan_distance(features, other_features):
    # Manhattan (L1) distance: sum of absolute per-feature differences.
    return sum(abs(a - b) for a, b in zip(features, other_features))
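
One way to sanity-check a list of centroids is to reassign every user to the nearest one and inspect the resulting clusters. The sketch below is self-contained and does not call `get_k_means`; the map `toy_user_feature_map`, the list `toy_centroids`, and the helpers `manhattan` and `assign_to_nearest` are hypothetical names with made-up values, not data from the exercise.

```python
# Hypothetical toy data: user_id -> list of 4 features (values are made up).
toy_user_feature_map = {
    "u1": [0.0, 0.0, 0.0, 0.0],
    "u2": [1.0, 0.0, 1.0, 0.0],
    "u3": [10.0, 10.0, 10.0, 10.0],
    "u4": [9.0, 10.0, 9.0, 10.0],
}

# Hypothetical centroids, standing in for the list a k-means run would return.
toy_centroids = [
    [0.5, 0.0, 0.5, 0.0],
    [9.5, 10.0, 9.5, 10.0],
]


def manhattan(a, b):
    """Sum of absolute coordinate differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_to_nearest(user_feature_map, centroids):
    """Map each centroid index to the sorted user ids closest to it."""
    clusters = {index: [] for index in range(len(centroids))}
    for user_id, features in user_feature_map.items():
        nearest = min(
            range(len(centroids)),
            key=lambda index: manhattan(features, centroids[index]),
        )
        clusters[nearest].append(user_id)
    return {index: sorted(members) for index, members in clusters.items()}


clusters = assign_to_nearest(toy_user_feature_map, toy_centroids)
print(clusters)  # {0: ['u1', 'u2'], 1: ['u3', 'u4']}
```

With this toy input the two tight groups of users end up in separate clusters, which is the behavior a reasonable set of centroids should show.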