In this activity we will look at how the different distance measures differ.

Here's a short summary of all of the distance measures:

**Euclidean:** Given two points $(x_1, y_1)$ and $(x_2, y_2)$, the distance between them is
$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
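
As a quick check of the formula, the sketch below computes the Euclidean distance by hand and compares it with `dist.euclidean`. The points `p` and `q` are hypothetical values chosen to form a 3-4-5 right triangle.

```python
import numpy as np
import scipy.spatial.distance as dist

# Hypothetical example points (not from the activity data).
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

# Square root of the sum of squared coordinate differences: sqrt(9 + 16).
manual = np.sqrt(np.sum((p - q) ** 2))
library = dist.euclidean(p, q)

print(manual)   # 5.0
print(library)  # 5.0
```

The same expression works unchanged for vectors with more than two coordinates, since the sum simply runs over every dimension.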

---

**Manhattan:** Distance between two points is the sum of the absolute differences in their Cartesian coordinates.
![alt text](https://upload.wikimedia.org/wikipedia/commons/thumb/0/08/Manhattan_distance.svg/283px-Manhattan_distance.svg.png)
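
A minimal sketch of the definition, using hypothetical vectors `p` and `q`: the hand-computed sum of absolute differences should match `dist.cityblock`, which is SciPy's name for the Manhattan distance.

```python
import numpy as np
import scipy.spatial.distance as dist

# Hypothetical example vectors (not from the activity data).
p = np.array([1.0, 5.0, 2.0])
q = np.array([4.0, 1.0, 2.0])

# Sum of absolute coordinate differences: 3 + 4 + 0.
manual = np.sum(np.abs(p - q))
library = dist.cityblock(p, q)

print(manual)   # 7.0
print(library)  # 7.0
```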

---

**Chebyshev:** The distance between two vectors is the greatest of their differences along any coordinate dimension.
![](https://www.kdnuggets.com/wp-content/uploads/chebyshev-distance-chessboard.jpg)
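
To illustrate, the sketch below takes the largest absolute coordinate difference by hand and compares it with `dist.chebyshev`. The vectors `p` and `q` are hypothetical.

```python
import numpy as np
import scipy.spatial.distance as dist

# Hypothetical example vectors (not from the activity data).
p = np.array([1.0, 5.0, 2.0])
q = np.array([4.0, 1.0, 2.0])

# Largest absolute coordinate difference: max(3, 4, 0).
manual = np.max(np.abs(p - q))
library = dist.chebyshev(p, q)

print(manual)   # 4.0
print(library)  # 4.0
```

Because only the single largest difference counts, adding more coordinates can raise this value only if one of them differs by more than the current maximum.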

---

**Canberra**: Used for comparing ranked lists and for intrusion detection in computer security.
$d({\bf p},{\bf q})=\sum^n_{i=1}{\frac{|p_i-q_i|}{|p_i|+|q_i|}}$

where ${\bf p}$ and ${\bf q}$ are vectors.
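
The formula can be checked term by term with a small sketch. The vectors below are hypothetical and deliberately contain no zeros, so that no term has a zero denominator.

```python
import numpy as np
import scipy.spatial.distance as dist

# Hypothetical example vectors with strictly positive entries.
p = np.array([1.0, 2.0, 3.0])
q = np.array([3.0, 2.0, 1.0])

# Each term is |p_i - q_i| / (|p_i| + |q_i|): 2/4, 0/4, 2/4.
terms = np.abs(p - q) / (np.abs(p) + np.abs(q))
manual = np.sum(terms)
library = dist.canberra(p, q)

print(terms)    # [0.5 0.  0.5]
print(manual)   # 1.0
print(library)  # 1.0
```

Every term lies between 0 and 1, so the Canberra distance between two $n$-dimensional vectors can never exceed $n$.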

---

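**Cosine**: Similarity between two non-zero vectors of an inner product space, measured as the cosine of the angle between them.

 * The cosine of $0^\circ$ is 1, so two vectors pointing in the same direction have similarity 1.
 * It is a judgement of orientation and not magnitude.
 * `dist.cosine` returns the cosine *distance*, which is 1 minus the cosine similarity.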
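
The sketch below illustrates the "orientation, not magnitude" point with hypothetical vectors: scaling a vector leaves its cosine distance to the original at (approximately) 0, while an orthogonal vector gives a distance of 1. Note that `dist.cosine` returns 1 minus the cosine similarity.

```python
import numpy as np
import scipy.spatial.distance as dist

# Hypothetical example vectors (not from the activity data).
u = np.array([1.0, 2.0, 3.0])
v = 10 * u                       # same direction, ten times the magnitude
w = np.array([3.0, 0.0, -1.0])   # orthogonal to u: 3 + 0 - 3 = 0

# Cosine similarity computed by hand for the orthogonal pair.
similarity = np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w))

d_same = dist.cosine(u, v)   # close to 0: same orientation
d_orth = dist.cosine(u, w)   # close to 1: 1 - similarity, with similarity 0

print(similarity, d_same, d_orth)
```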



In [0]:
import scipy.spatial.distance as dist
import numpy as np

Begin the activity by initializing two random vectors with 10 dimensions (use np.random).


Next using the `dist` module apply each of the distance metrics discussed above.

In [0]:

# Prepare 2 vectors (data points) of 10 dimensions.
# Let data range randomly from 0 to 10
A = np.random.uniform(low=0, high=10,size=(1,10))
B = np.random.uniform(low=0, high=10,size=(1,10))

# Generate a matrix of size 1000x10
# Let data range randomly from 0 to 10
z = np.random.uniform(low=0, high=10,size=(1000,10))

# compute the covariance of the transpose for matrix above
covariance = np.cov(z.T)
# finally compute the inverse
inverse = np.linalg.inv(covariance)

print('\n2 10-dimensional vectors')
print('------------------------')
print(A)
print(B)

# Perform distance measurements 
print( '\nDistance measurements with 10-dimensional vectors')
print( '-------------------------------------------------')
# The scipy.spatial.distance functions expect 1-D vectors, so pass
# the single row of each (1, 10) array.
a, b = A[0], B[0]
print( '\nEuclidean distance is',dist.euclidean(a, b))
print( 'Manhattan distance is',  dist.cityblock(a, b))
print( 'Chebyshev distance is',  dist.chebyshev(a, b))
print( 'Canberra distance is',   dist.canberra(a, b))
print( 'Cosine distance is',     dist.cosine(a, b))
print( 'Mahalanobis distance is',dist.mahalanobis(a, b, inverse))

# Prepare 2 vectors (data points) of 100 dimensions.
# Let data range randomly from 0 to 10
C = np.random.uniform(low=0, high=10,size=(1,100))
D = np.random.uniform(low=0, high=10,size=(1,100))

# Generate a matrix of size 1000x100
# Let data range randomly from 0 to 10
E = np.random.uniform(low=0, high=10,size=(1000,100))

# compute the covariance of the transpose for matrix above
cov_E = np.cov(E.T)
# finally compute the inverse
inv_E = np.linalg.inv(cov_E)

# Perform distance measurements
print('\nDistance measurements with 100-dimensional vectors')
print('--------------------------------------------------')
# As before, pass the single row of each (1, 100) array as a 1-D vector.
c, d = C[0], D[0]
print( '\nEuclidean distance is',dist.euclidean(c, d))
print( 'Manhattan distance is',  dist.cityblock(c, d))
print( 'Chebyshev distance is',  dist.chebyshev(c, d))
print( 'Canberra distance is',   dist.canberra(c, d))
print( 'Cosine distance is',     dist.cosine(c, d))
print( 'Mahalanobis distance is',dist.mahalanobis(c, d, inv_E))



2 10-dimensional vectors
------------------------
[[8.66384712 4.72726184 1.54265761 6.31285923 5.48103724 0.43744585
  0.99102183 1.11605574 6.31051381 5.58001585]]
[[1.66403905 3.8734393  4.64248601 4.89104562 2.89919603 7.49578772
  2.50847024 4.51447111 5.23018206 1.72459163]]

Distance measurements with 10-dimensional vectors
-------------------------------------------------

Euclidean distance is 12.154266717673437
Manhattan distance is 31.867075464986165
Chebyshev distance is 7.058341870658279
Canberra distance is 4.261520572716161
Cosine distance is 0.3390871181008167
Mahalanobis distance is 4.197582980391878

Distance measurements with 100-dimensional vectors
--------------------------------------------------

Euclidean distance is 41.41668286558069
Manhattan distance is 334.0679107500733
Chebyshev distance is 8.673711062597544
Canberra distance is 35.697287385535304
Cosine distance is 0.2422302746405105
Mahalanobis distance is 15.363936735974413


Repeat the steps above, this time with two vectors of 100 dimensions.

1. Write a short statement: why is cosine the smallest distance?

2. Does anything change as the number of dimensions increases? If so, which distance is the most affected?


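**1. Write a short statement: why is cosine the smallest distance?**

Cosine similarity measures the angle between two vectors regardless of their magnitude: a similarity close to 1 means the angle is small and the vectors point in nearly the same direction. `dist.cosine` reports the cosine distance, which is 1 minus the similarity. Because every component of our vectors is non-negative, the similarity lies between 0 and 1, and so does the distance. The other measures grow with the size of the coordinate differences, and most of them also grow with the number of dimensions, so a value bounded by 1 ends up the smallest. In our run the cosine distances of 0.34 and 0.24 correspond to similarities of about 0.66 and 0.76, which are moderate rather than close to 1 because the two vectors are random.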

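**2. Does anything change as the number of dimensions increases? If so, which distance is the most affected?**

Yes. Comparing the two runs, every distance except cosine increases as the number of dimensions increases; the cosine distance actually drops slightly (0.34 to 0.24). Manhattan distance shows the largest change, rising from about 31.9 to about 334.1, because by definition it is the sum of the absolute differences along every dimension, so each added dimension contributes another term. Canberra distance is also a sum over dimensions, but each of its terms is at most 1, so its values stay much smaller (4.26 to 35.70). Euclidean distance grows more slowly because of the square root, and Chebyshev distance barely moves (7.06 to 8.67) because it keeps only the single largest difference, which cannot exceed 10 for this data. So dimensionality plays the most significant role in Manhattan distance.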
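
A single random pair can be noisy, so the sketch below averages each distance over many random pairs at 10 and 100 dimensions and prints the growth factor. It assumes the same setup as the activity (uniform data between 0 and 10); the seed, the number of trials, and the helper name `mean_distances` are arbitrary choices made for this illustration.

```python
import numpy as np
import scipy.spatial.distance as dist

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice

metrics = {
    "Euclidean": dist.euclidean,
    "Manhattan": dist.cityblock,
    "Chebyshev": dist.chebyshev,
    "Canberra": dist.canberra,
    "Cosine": dist.cosine,
}

def mean_distances(n_dims, n_trials=200):
    """Average each metric over n_trials random pairs of n_dims-vectors."""
    totals = {name: 0.0 for name in metrics}
    for _ in range(n_trials):
        u = rng.uniform(0, 10, size=n_dims)
        v = rng.uniform(0, 10, size=n_dims)
        for name, fn in metrics.items():
            totals[name] += fn(u, v)
    return {name: total / n_trials for name, total in totals.items()}

low = mean_distances(10)
high = mean_distances(100)
ratios = {name: high[name] / low[name] for name in metrics}

# Columns: metric, mean at 10 dims, mean at 100 dims, growth factor.
for name in metrics:
    print(f"{name:10s} {low[name]:9.3f} {high[name]:9.3f}  x{ratios[name]:.2f}")
```

Under these assumptions one would expect the Manhattan average to grow roughly in proportion to the number of dimensions, the Euclidean average roughly with its square root, and the Chebyshev average to stay below 10.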