# Vector Applications

## Correlation

#### Correlation is one of the most fundamental and important analysis methods in statistics and machine learning. A correlation coefficient is a single number that quantifies the linear relationship between two variables. Correlation coefficients range from −1 to +1, with −1 indicating a perfect negative relationship, +1 a perfect positive relationship, and 0 no linear relationship. The Pearson correlation coefficient is the dot product between the two variables, with two modifications covered in the next two sections.
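In symbols, write $\bar{x}$ and $\bar{y}$ for the means of two $n$-element vectors and $\tilde{x} = x - \bar{x}$, $\tilde{y} = y - \bar{y}$ for the mean-centered vectors. The Pearson coefficient is then the dot product of the centered vectors divided by the product of their norms (the notation here is chosen for illustration):

$$
\rho_{x,y} = \frac{\tilde{x}^\top \tilde{y}}{\|\tilde{x}\|\,\|\tilde{y}\|}
= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$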

## Mean center each variable
#### Mean centering means subtracting a variable's average value from each of its data values, so that the centered variable has a mean of zero.
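A minimal NumPy sketch of mean centering on a small made-up vector. After subtracting the mean, the centered values average to zero.

```python
import numpy as np

# Made-up example data
x = np.array([2.0, 4.0, 6.0, 8.0])

# Mean centering: subtract the average (5.0) from every element
x_centered = x - np.mean(x)

print(x_centered)
print(np.mean(x_centered))  # → 0.0
```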

## Divide the dot product by the product of the vector norms

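#### This divisive normalization cancels the measurement units and bounds the result between −1 and +1: by the Cauchy-Schwarz inequality, the absolute value of a dot product can never exceed the product of the two vector norms.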
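The two steps can be sketched in NumPy and checked against `np.corrcoef`. The function name `pearson` and the height/weight numbers are made up for illustration. The last line shows the units cancelling: converting heights from centimeters to meters leaves the coefficient unchanged.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: dot product of the mean-centered vectors,
    divided by the product of their norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 1: mean center each variable
    xc = x - x.mean()
    yc = y - y.mean()
    # Step 2: divide the dot product by the product of the vector norms
    return np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Made-up data in different units (cm and kg)
height = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight = np.array([52.0, 60.0, 63.0, 75.0, 80.0])

r = pearson(height, weight)
print(r)
print(np.corrcoef(height, weight)[0, 1])  # should agree with r

# Rescaling one variable (cm -> m) does not change the correlation
print(pearson(height / 100, weight))
```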

## Cosine Similarity
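#### Correlation is not the only way to assess similarity between two variables. Another method is cosine similarity. Its formula is the geometric definition of the dot product, $x^\top y = \|x\|\,\|y\|\cos(\theta)$, solved for $\cos(\theta)$. Unlike Pearson correlation, cosine similarity does not mean center the variables first.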

$$
\cos(\theta_{x,y}) = \frac{x^\top y}{\|x\|\,\|y\|}
$$
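A minimal sketch of the formula in NumPy. The helper name `cosine_similarity` is illustrative. Vectors pointing the same way score 1, opposite vectors score −1, and orthogonal vectors score 0, regardless of their lengths.

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (no mean centering)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, 2 * a))                # same direction: ≈ 1.0
print(cosine_similarity(a, -a))                   # opposite direction: ≈ -1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 0.0
```

If you compare against SciPy, note that `scipy.spatial.distance.cosine` returns the cosine *distance*, which is 1 minus the cosine similarity.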



## k-Means Clustering
#### k-means clustering is an unsupervised method for classifying multivariate data into a relatively small number (k) of groups, or categories, by minimizing the distance from each data point to the center of its group.
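A minimal sketch of k-means using the standard alternating scheme (Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points, and repeat until the centers stop moving. The function names, the initialization (k randomly chosen data points), and the convergence test are assumptions made for illustration, not a prescribed implementation. The usage example runs it on two made-up, well-separated 2D blobs.

```python
import numpy as np

def assign_to_nearest(data, centroids):
    """Label each point with the index of its nearest centroid (Euclidean)."""
    # (n_points, 1, n_features) - (1, k, n_features) -> distances of shape (n_points, k)
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def kmeans(data, k, n_iters=100, seed=0):
    """Minimal k-means sketch. data: (n_points, n_features). Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct data points chosen at random
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        labels = assign_to_nearest(data, centroids)
        new_centroids = centroids.copy()
        for c in range(k):
            members = data[labels == c]
            if len(members) > 0:  # keep the old center if a group empties
                new_centroids[c] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged: centers stopped moving
        centroids = new_centroids
    # Final labels consistent with the returned centroids
    return assign_to_nearest(data, centroids), centroids

# Made-up data: two blobs centered near (0, 0) and (5, 5)
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
data = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(data, k=2)
print(centroids)  # one center near (0, 0), the other near (5, 5)
```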

## Exercise 4-1.
#### Write a Python function that takes two vectors as input and provides two numbers as output: the Pearson correlation coefficient and the cosine similarity value. Write code that follows the formulas presented in this chapter; don’t simply call `np.corrcoef` and `spatial.distance.cosine`. Check that the two output values are identical when the variables are already mean centered and different when the variables are not mean centered.



```python
import numpy as np

def pearson_cosine(x, y):
    """Return (Pearson correlation, cosine similarity) for vectors x and y."""
    # Pearson: mean center each variable first
    mean_x, mean_y = np.mean(x), np.mean(y)
    centered_x = x - mean_x
    centered_y = y - mean_y

    # Dot product of the centered vectors, divided by the product of their norms
    numerator = np.sum(centered_x * centered_y)
    denominator = np.sqrt(np.sum(centered_x**2) * np.sum(centered_y**2))
    pearson_correlation = numerator / denominator

    # Cosine similarity: same normalization, but on the raw (uncentered) vectors
    dot_product = np.dot(x, y)
    mag_x, mag_y = np.linalg.norm(x), np.linalg.norm(y)
    cosine_similarity = dot_product / (mag_x * mag_y)

    return pearson_correlation, cosine_similarity

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])

pearson, cosine = pearson_cosine(x, y)

print(f"Pearson Correlation Coefficient: {pearson}")
print(f"Cosine Similarity: {cosine}")
```

```
Pearson Correlation Coefficient: -1.0
Cosine Similarity: 0.6363636363636364
```
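The run above covers only the non-centered case. The exercise also asks to confirm that the two values coincide once the variables are mean centered. A self-contained sketch of that check follows, using a condensed version of the same formulas and made-up data. Without centering the two measures differ (0.8 versus 53/55 ≈ 0.96); after centering both equal 0.8.

```python
import numpy as np

def pearson_cosine(x, y):
    """Condensed version of the exercise function: (Pearson, cosine similarity)."""
    xc, yc = x - np.mean(x), y - np.mean(y)
    pearson = np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc))
    cosine = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return pearson, cosine

# Made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Not mean centered: the two measures differ
p_raw, c_raw = pearson_cosine(x, y)
print(p_raw, c_raw)

# Mean centered first: the two measures agree (up to rounding)
xc, yc = x - x.mean(), y - y.mean()
p, c = pearson_cosine(xc, yc)
print(p, c, np.isclose(p, c))
```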
