
## Covariance Matrix

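A covariance matrix summarizes how the variables in a dataset vary together. It is a square, symmetric matrix: each off-diagonal element is the covariance between a pair of variables (a measure of how much the two vary together), and each diagonal element is the variance of a single variable, since $\text{Cov}(X, X) = \text{Var}(X)$. The sample covariance of two variables $X$ and $Y$ over $n$ observations, and the matrix $\Sigma$ that collects all pairwise covariances of $p$ variables $X_1, \dots, X_p$, are: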

$$
\text{Cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
$$

$$
\Sigma =
\begin{bmatrix}
\text{Cov}(X_1, X_1) & \text{Cov}(X_1, X_2) & \dots & \text{Cov}(X_1, X_p) \\
\text{Cov}(X_2, X_1) & \text{Cov}(X_2, X_2) & \dots & \text{Cov}(X_2, X_p) \\
\vdots & \vdots & \ddots & \vdots \\
\text{Cov}(X_p, X_1) & \text{Cov}(X_p, X_2) & \dots & \text{Cov}(X_p, X_p)
\end{bmatrix}
$$

For two variables $X$ and $Y$, the matrix reduces to:

$$
\Sigma =
\begin{bmatrix}
\text{Var}(X) & \text{Cov}(X, Y) \\
\text{Cov}(X, Y) & \text{Var}(Y)
\end{bmatrix}
$$
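
As a small worked example of the formulas above (the numbers are illustrative and not taken from any dataset in this notebook), take $X = (2, 4, 6)$ and $Y = (1, 3, 8)$, so $n = 3$:

$$
\bar{X} = 4, \quad \bar{Y} = 4
$$

$$
\text{Cov}(X, Y) = \frac{(-2)(-3) + (0)(-1) + (2)(4)}{3 - 1} = \frac{14}{2} = 7
$$

$$
\text{Var}(X) = \frac{4 + 0 + 4}{2} = 4, \quad \text{Var}(Y) = \frac{9 + 1 + 16}{2} = 13
$$

$$
\Sigma =
\begin{bmatrix}
4 & 7 \\
7 & 13
\end{bmatrix}
$$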




def calculate_covariance_matrix(vectors: list[list[float]]) -> list[list[float]]:
    # Each inner list holds all observations of one feature (rows = features).
    n_features = len(vectors)
    n_observations = len(vectors[0])
    if n_observations < 2:
        raise ValueError("at least two observations are required (divides by n - 1)")
    covariance_matrix = [[0.0] * n_features for _ in range(n_features)]

    means = [sum(feature) / n_observations for feature in vectors]
    for i in range(n_features):
        # The matrix is symmetric, so compute only the upper triangle and mirror it.
        for j in range(i, n_features):
            covariance = sum(
                (vectors[i][k] - means[i]) * (vectors[j][k] - means[j])
                for k in range(n_observations)
            ) / (n_observations - 1)
            covariance_matrix[i][j] = covariance_matrix[j][i] = covariance

    return covariance_matrix
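
One way to sanity-check a covariance routine is to compute the same quantity through a different route. The sketch below centers each feature first and then takes pairwise dot products of the centered rows, divided by $n - 1$. The name `covariance_via_centering` and the sample data are hypothetical and chosen only for illustration. Like the function above, it assumes each inner list is one feature and that there are at least two observations.

```python
def covariance_via_centering(vectors: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix computed from mean-centered rows.

    Hypothetical helper for cross-checking; each inner list is one feature.
    """
    n = len(vectors[0])
    # Subtract each feature's mean from its observations.
    centered = []
    for row in vectors:
        mean = sum(row) / n
        centered.append([x - mean for x in row])
    # Entry (i, j) is the dot product of centered rows i and j over n - 1.
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / (n - 1) for row_j in centered]
        for row_i in centered
    ]


# Illustrative data: two features, three observations each.
features = [[2.0, 4.0, 6.0], [1.0, 3.0, 8.0]]
print(covariance_via_centering(features))  # [[4.0, 7.0], [7.0, 13.0]]
```

The diagonal entries (4.0 and 13.0) are the variances of the two features, and the off-diagonal entries are equal because the matrix is symmetric.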

## K Means Clustering
1. **Initialization**
Use the provided initial_centroids as your starting point. This step is already done for you in the input.

2. **Assignment Step**
For each point in your dataset:

- Calculate its distance to each centroid (Hint: use Euclidean distance).
- Assign the point to the cluster of the nearest centroid.

Hint: Consider creating a helper function to calculate Euclidean distance between two points.

3. **Update Step**
For each cluster:

- Calculate the mean of all points assigned to the cluster.
- Update the centroid to this new mean position.

Hint: Be careful with potential empty clusters. Decide how you'll handle them (e.g., keep the previous centroid).

4. **Iteration**
Repeat steps 2 and 3 until either:

- The centroids no longer change significantly (this case does not need to be included in your solution), or
- You reach the max_iterations limit.

Hint: You might want to keep track of the previous centroids to check for significant changes.

5. **Result**
Return the list of final centroids, with each coordinate rounded to four decimal places.
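
The five steps above can be sketched as follows. This is a minimal illustration rather than a reference solution: the function names `euclidean_distance` and `k_means`, the `points` parameter, and the sample data are assumptions, while `initial_centroids` and `max_iterations` follow the names used in the steps. Empty clusters keep their previous centroid, and the loop stops early only when the centroids are exactly unchanged (an optional check, as noted in step 4).

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, initial_centroids, max_iterations):
    """Hypothetical sketch of steps 1-5; returns centroids rounded to 4 decimals."""
    # Step 1: start from the provided centroids.
    centroids = [tuple(c) for c in initial_centroids]

    for _ in range(max_iterations):
        # Step 2: assign each point to the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: euclidean_distance(point, centroids[i]),
            )
            clusters[nearest].append(point)

        # Step 3: move each centroid to the mean of its assigned points.
        new_centroids = []
        for old_centroid, members in zip(centroids, clusters):
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old_centroid)

        # Step 4: optional early exit when nothing moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    # Step 5: round every coordinate to four decimal places.
    return [tuple(round(c, 4) for c in centroid) for centroid in centroids]


# Illustrative data: two well-separated groups of 2-D points.
points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
print(k_means(points, [(1, 1), (10, 1)], max_iterations=10))
# [(1.0, 2.0), (10.0, 2.0)]
```

Using `min` with a `key` function picks the first centroid in case of a tie, so the result can depend on the order of `initial_centroids` when a point is equidistant from two centroids.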

