##### 1. A set of one-dimensional data points is given to you: 5, 10, 15, 20, 25, 30, 35. Assume that k = 2, that the first set of random centroids is 15, 32, and that the second set is 12, 30.
1. Using the k-means method, create two clusters for each set of centroids described above.
2. For each set of centroid values, calculate the SSE.

```python
import numpy as np

def k_means_clustering(data, k, centroids):
    """Run one k-means iteration on 1-D data: assign each point to its
    nearest centroid, then recompute each centroid as its cluster's mean."""
    clusters = [[] for _ in range(k)]

    # Assignment step: put each point in the cluster of its nearest centroid
    for point in data:
        distances = [abs(point - centroid) for centroid in centroids]
        nearest_centroid_idx = int(np.argmin(distances))
        clusters[nearest_centroid_idx].append(point)

    # Update step: new centroid = mean of the cluster's points
    # (an empty cluster keeps its previous centroid instead of becoming NaN)
    new_centroids = [
        float(np.mean(cluster)) if cluster else centroids[i]
        for i, cluster in enumerate(clusters)
    ]

    return clusters, new_centroids

def calculate_sse(clusters, centroids):
    """Sum of squared distances between each point and its cluster centroid."""
    sse = 0.0
    for cluster, centroid in zip(clusters, centroids):
        sse += sum((point - centroid) ** 2 for point in cluster)
    return sse

data = [5, 10, 15, 20, 25, 30, 35]
k = 2
centroid_set1 = [15, 32]
centroid_set2 = [12, 30]

clusters1, new_centroids1 = k_means_clustering(data, k, centroid_set1)
sse1 = calculate_sse(clusters1, new_centroids1)
print("Set 1 Clusters:", clusters1)
print("Set 1 SSE:", sse1)

clusters2, new_centroids2 = k_means_clustering(data, k, centroid_set2)
sse2 = calculate_sse(clusters2, new_centroids2)
print("Set 2 Clusters:", clusters2)
print("Set 2 SSE:", sse2)
```


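```text
Set 1 Clusters: [[5, 10, 15, 20], [25, 30, 35]]
Set 1 SSE: 175.0
Set 2 Clusters: [[5, 10, 15, 20], [25, 30, 35]]
Set 2 SSE: 175.0
```

Both starting sets produce the same partition, {5, 10, 15, 20} and {25, 30, 35}, with updated centroids 12.5 and 30. The SSE above is measured against these updated centroids: (7.5² + 2.5² + 2.5² + 7.5²) + (5² + 0² + 5²) = 125 + 50 = 175.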
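The question does not say whether the SSE should be measured against the starting centroids or the updated ones; the code above uses the updated centroids. As a hedged cross-check, the sketch below (hypothetical helpers `assign` and `sse`) measures the SSE against the initial centroids instead, and confirms that reassigning the points to the updated centroids 12.5 and 30 leaves the clusters unchanged.

```python
# Hypothetical cross-check: SSE against the *initial* centroids, plus one
# reassignment to confirm the clusters no longer change.
data = [5, 10, 15, 20, 25, 30, 35]

def assign(points, centroids):
    """Assign each point to its nearest centroid (ties go to the lower index)."""
    clusters = [[] for _ in centroids]
    for x in points:
        j = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        clusters[j].append(x)
    return clusters

def sse(clusters, centroids):
    """Sum of squared distances from each point to its cluster's centroid."""
    return sum((x - c) ** 2 for cl, c in zip(clusters, centroids) for x in cl)

initial_sse = {}
for name, init in {"Set 1": [15, 32], "Set 2": [12, 30]}.items():
    clusters = assign(data, init)
    initial_sse[name] = sse(clusters, init)
    print(name, "SSE w.r.t. initial centroids:", initial_sse[name])

# Updated centroids from either start: means of [5, 10, 15, 20] and [25, 30, 35]
updated = [12.5, 30.0]
stable = assign(data, updated) == [[5, 10, 15, 20], [25, 30, 35]]
print("Clusters unchanged after reassignment:", stable)
```

Measured against the initial centroids, the SSE is 212 for set 1 and 176 for set 2. Because the clusters do not change after reassignment, one iteration is already enough for k-means to converge here, and both runs end at the same SSE of 175.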


##### 2. Describe how market basket research makes use of association analysis concepts.

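Market basket research applies association analysis to transaction data. Each customer transaction (basket) is treated as an itemset, and the analysis first finds frequent itemsets, that is, groups of items that appear together in at least a minimum fraction of baskets (the support threshold). It then derives association rules of the form X → Y and ranks them by support (how often X and Y appear together), confidence (how often Y is bought when X is bought), and lift (how much more often X and Y co-occur than they would if purchases were independent). Retailers use the resulting rules to understand customer behavior and to guide decisions such as product placement, cross-selling, bundling, and promotions.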
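A minimal sketch of the three rule measures, assuming five made-up baskets (the classic diapers/beer toy example, not data from this text) and hypothetical helper names `support`, `confidence`, and `lift`:

```python
# Toy transactions (hypothetical data): each basket is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence divided by the consequent's baseline support (> 1: positive association)."""
    return confidence(antecedent, consequent) / support(consequent)

rule = ({"diapers"}, {"beer"})
s = support(rule[0] | rule[1])
c = confidence(*rule)
l = lift(*rule)
print(f"{{diapers}} -> {{beer}}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

For the rule {diapers} → {beer}, support is 0.6, confidence is 0.75, and lift is 1.25; a lift above 1 suggests the two items are bought together more often than chance alone would predict.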

##### 3. Give an example of the Apriori algorithm for learning association rules.

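The Apriori algorithm finds frequent itemsets level by level. It counts single items and keeps those that meet the minimum support, joins frequent k-itemsets into candidate (k+1)-itemsets, and prunes any candidate that has an infrequent subset (the Apriori property: every subset of a frequent itemset must also be frequent). Association rules are then generated from the frequent itemsets and kept if they meet a minimum confidence. For example, if {A, B, C} is a frequent itemset, Apriori can derive the rule {A, B} → {C}; if that rule has high confidence, customers who buy both A and B are likely to buy C as well.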
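The level-wise search and rule generation can be sketched in plain Python as follows. The transactions, the minimum support count of 3, and the minimum confidence of 0.7 are assumptions chosen for illustration; `apriori` and `rules` are hypothetical helpers, not a library API.

```python
from itertools import combinations

# Hypothetical toy baskets; items are letters as in the example above.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"B", "C"},
]
MIN_COUNT = 3          # assumed minimum support count
MIN_CONFIDENCE = 0.7   # assumed minimum confidence

def count(itemset):
    """Number of baskets containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def apriori():
    """Level-wise search: join frequent k-itemsets into (k+1)-candidates and
    prune any candidate with an infrequent k-subset (the Apriori property)."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if count(frozenset([i])) >= MIN_COUNT]
    frequent = list(level)
    k = 1
    while level:
        # Join step: unions of two frequent k-itemsets that have k + 1 items
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        prev = set(level)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        # Keep only candidates that meet the support threshold (sorted for stable output)
        level = sorted((c for c in candidates if count(c) >= MIN_COUNT), key=sorted)
        frequent.extend(level)
        k += 1
    return frequent

def rules(frequent):
    """Split each frequent itemset into antecedent -> consequent; keep confident rules."""
    out = []
    for itemset in frequent:
        for r in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), r):
                lhs = frozenset(lhs_items)
                conf = count(itemset) / count(lhs)
                if conf >= MIN_CONFIDENCE:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out

frequent = apriori()
found = rules(frequent)
for lhs, rhs, conf in found:
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

With these baskets, {D} is dropped at the first level because it appears in only 2 baskets, {A, B, C} survives with a support count of 3, and the rule {A, B} → {C} is reported with confidence 3/4 = 0.75.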

##### 4. In hierarchical clustering, how is the distance between clusters measured? Explain how this metric is used to decide when to end the iteration.

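In hierarchical clustering, the distance between two clusters is defined by a linkage criterion built on top of a point-to-point metric such as Euclidean or Manhattan distance. Common linkage criteria are single linkage (the distance between the closest pair of points, one from each cluster), complete linkage (the farthest such pair), average linkage (the mean distance over all cross-cluster pairs), centroid linkage (the distance between the cluster centroids), and Ward's method (the increase in within-cluster sum of squares caused by a merge). In agglomerative clustering, each iteration merges the two clusters with the smallest linkage distance. The iteration ends either when the desired number of clusters remains or when the smallest remaining inter-cluster distance exceeds a chosen threshold, which is equivalent to cutting the dendrogram at that height.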
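A minimal agglomerative sketch, assuming 1-D points with absolute difference as the point metric; `agglomerate` and `linkage_distance` are hypothetical names. It shows both stopping rules: a target number of clusters (`n_clusters`) and a distance threshold (`max_distance`).

```python
# Minimal bottom-up clustering on 1-D points (assumed metric: |x - y|).
def linkage_distance(a, b, method):
    """Distance between clusters a and b under the chosen linkage."""
    pair = [abs(x - y) for x in a for y in b]
    if method == "single":      # closest pair of points
        return min(pair)
    if method == "complete":    # farthest pair of points
        return max(pair)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pair) / len(pair)
    raise ValueError(f"unknown linkage: {method}")

def agglomerate(points, method="single", n_clusters=1, max_distance=float("inf")):
    """Repeatedly merge the two closest clusters. Stop when only `n_clusters`
    remain or when the closest pair is farther apart than `max_distance`."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance
        d, i, j = min(
            (linkage_distance(clusters[i], clusters[j], method), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > max_distance:    # distance threshold reached: stop merging
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

data = [1, 2, 4, 10, 11, 25]  # hypothetical points
print(agglomerate(data, n_clusters=3))
print(agglomerate(data, max_distance=1.5))
```

Stopping at three clusters gives [[1, 2, 4], [10, 11], [25]]; stopping once the closest pair of clusters is more than 1.5 apart gives [[1, 2], [4], [10, 11], [25]].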

##### 5. In the k-means algorithm, how do you recompute the cluster centroids?

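In the k-means algorithm, each centroid is recomputed as the mean of the data points currently assigned to its cluster, taken coordinate by coordinate: for a cluster containing the n points x1, x2, …, xn, the new centroid is (x1 + x2 + … + xn) / n. The points are then reassigned to their nearest new centroid, and the two steps repeat until the assignments stop changing.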
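A small NumPy sketch of the update step, assuming hypothetical 2-D points and the cluster labels a previous assignment step might have produced:

```python
import numpy as np

# Hypothetical 2-D points and their assigned cluster labels (illustrative only).
points = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [8.0, 8.0], [10.0, 12.0]])
labels = np.array([0, 0, 0, 1, 1])
k = 2

# Update step: each new centroid is the coordinate-wise mean of its members.
centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
print(centroids)
```

Cluster 0 moves to (3, 4) and cluster 1 to (9, 10).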

##### 6. At the start of the clustering exercise, discuss one method for determining the required number of clusters.

One method for determining the required number of clusters is the elbow method: run k-means for a range of k values, plot the SSE against k, and choose the k at which the curve bends (the "elbow"), that is, the point beyond which adding more clusters reduces the SSE only slightly.
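The elbow method can be sketched on the data from question 1. This assumes a plain Lloyd's k-means with a deterministic initialization (centroids spread across the sorted data), written as a hypothetical helper `kmeans_1d`; a real analysis would usually try several random starts per k.

```python
# Elbow-method sketch on the 1-D data from question 1.
# Assumption: Lloyd's k-means with a deterministic, quantile-style start.
data = [5, 10, 15, 20, 25, 30, 35]

def kmeans_1d(points, k, max_iter=100):
    """Run k-means to convergence and return the final SSE."""
    pts = sorted(points)
    n = len(pts)
    # Deterministic start: spread the initial centroids across the sorted data
    centroids = [float(pts[(2 * i + 1) * n // (2 * k)]) for i in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in pts:
            j = min(range(k), key=lambda i: abs(x - centroids[i]))
            clusters[j].append(x)
        # Recompute means; keep the old centroid if a cluster is empty
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return sum((x - centroids[i]) ** 2 for i, c in enumerate(clusters) for x in c)

sse = {k: kmeans_1d(data, k) for k in range(1, 5)}
for k, value in sse.items():
    print(f"k={k}: SSE={value}")
```

Here the SSE is 700.0, 175.0, 75.0, and 37.5 for k = 1 to 4. The drop from k = 1 to k = 2 (525) dwarfs the later drops (100, then 37.5), so the elbow points to k = 2.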

##### 7. Discuss the k-means algorithm's advantages and disadvantages.

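Advantages of the k-means algorithm: it is simple to implement and interpret, and it is efficient on large datasets, since each iteration costs roughly O(n·k·d) for n points, k clusters, and d dimensions. Disadvantages: the result depends on the initial centroids and can converge to a local optimum; the number of clusters k must be chosen in advance; outliers pull the cluster means and distort the clusters; and it works best on numeric data with roughly spherical clusters of similar size.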
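The sensitivity to initial centroids can be illustrated with a small hypothetical dataset and two starting sets, using a hypothetical `lloyd` helper that runs k-means until the centroids stop changing:

```python
# Sketch of one disadvantage: k-means converges to a local optimum that
# depends on the starting centroids. Data and starts are hypothetical.
data = [1, 2, 10, 11, 20, 21]

def lloyd(points, centroids, max_iter=100):
    """Run 1-D k-means to convergence; return the clusters and their SSE."""
    centroids = [float(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for x in points:
            j = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[j].append(x)
        # Recompute means; keep the old centroid if a cluster is empty
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    sse = sum((x - centroids[i]) ** 2 for i, c in enumerate(clusters) for x in c)
    return clusters, sse

good = lloyd(data, [1, 10, 20])
bad = lloyd(data, [1, 2, 15])
print("start [1, 10, 20]:", good)
print("start [1, 2, 15]: ", bad)
```

Starting from [1, 10, 20] reaches the clusters {1, 2}, {10, 11}, {20, 21} with SSE 1.5, while starting from [1, 2, 15] gets stuck with {1}, {2}, {10, 11, 20, 21} and SSE 101.0. Running several random starts and keeping the lowest SSE is a common mitigation.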

##### 9. During your study, you discovered seven findings, which are listed in the data points below. Using the k-means algorithm, you want to build three clusters from these observations. After the first iteration, the clusters C1, C2, and C3 contain the following observations:
- `C1: (2,2), (4,4), (6,6)`
- `C2: (0,4), (4,0)`
- `C3: (5,5), (9,9)`

What would the cluster centroids be if you were to run a second iteration? What would this clustering's SSE be?

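Second-iteration cluster centroids (the coordinate-wise mean of each first-iteration cluster):
- C1: ((2 + 4 + 6)/3, (2 + 4 + 6)/3) = (4, 4)
- C2: ((0 + 4)/2, (4 + 0)/2) = (2, 2)
- C3: ((5 + 9)/2, (5 + 9)/2) = (7, 7)

SSE for this clustering, using squared Euclidean distance to each centroid:
- C1: 8 + 0 + 8 = 16
- C2: 8 + 8 = 16
- C3: 8 + 8 = 16
- Total SSE = 48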
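The centroids and SSE can be checked with a short sketch, assuming squared Euclidean distance and the first-iteration clusters C1 = {(2,2), (4,4), (6,6)}, C2 = {(0,4), (4,0)}, C3 = {(5,5), (9,9)}; the helpers `mean` and `sq_dist` are hypothetical:

```python
# Q9 check. Assumptions: squared Euclidean distance, and these first-iteration
# clusters (seven observations in total).
clusters = {
    "C1": [(2, 2), (4, 4), (6, 6)],
    "C2": [(0, 4), (4, 0)],
    "C3": [(5, 5), (9, 9)],
}

def mean(points):
    """Coordinate-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

centroids = {name: mean(pts) for name, pts in clusters.items()}
per_cluster = {name: sum(sq_dist(p, centroids[name]) for p in pts)
               for name, pts in clusters.items()}
total_sse = sum(per_cluster.values())
print(centroids)
print(per_cluster, "total SSE =", total_sse)
```

Each cluster contributes 16 to the SSE, for a total of 48.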