# Similarity Matrix
The similarity matrix is a method that builds a two-dimensional matrix and fills it with the cosine similarity values returned by the `sentence_similarity` method.

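```python
def __similarity_matrix(self):
    """Build the pairwise cosine-similarity matrix for the items in self.__word."""
    # Start from an n x n matrix of zeros, where n = len(self.__word).
    similarity_matrix = np.zeros((len(self.__word), len(self.__word)))
    for index1 in range(len(self.__word)):
        for index2 in range(len(self.__word)):
            # Skip the diagonal: an item is not compared with itself.
            if index1 == index2:
                continue
            similarity_matrix[index1][index2] = self.__sentence_similarity(self.__word[index1], self.__word[index2])
    return similarity_matrix
```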

The first step is to initialize a two-dimensional matrix of zeros with `len(self.__word)` rows and `len(self.__word)` columns.
```python
similarity_matrix = np.zeros((len(self.__word), len(self.__word)))
```
The output of the snippet above is shown below.

```python
import numpy as np
word = "For oil spots on the floor, nothing beats parking a motorbike in the lounge. Tom got a small piece of pie. The hand sanitizer was actually clear glue".split()
similarity_matrix = np.zeros((len(word), len(word)))
print(similarity_matrix)
```

```
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0.]
 ...
```
The printed output (truncated here) is a 28 x 28 matrix of zeros, with one row and one column for each of the 28 tokens in `word`.

The next step is to compute the similarity of every pair of words and store it in `similarity_matrix`.
```python
for index1 in range(len(self.__word)):
    for index2 in range(len(self.__word)):
        if index1 == index2:
            continue
        similarity_matrix[index1][index2] = self.__sentence_similarity(self.__word[index1], self.__word[index2])
```
> Note the check `if index1 == index2: continue`. It keeps the program from computing the similarity between a word and itself, so the diagonal of the matrix stays 0.

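Cosine similarity is symmetric, so the nested loop above evaluates every pair twice. As a minimal sketch (not the notebook's own method), the variant below computes each unordered pair once and mirrors the result. The names `build_similarity_matrix` and `shared_letter_ratio` are hypothetical, plain Python lists stand in for the NumPy array, and the toy score is only there to keep the example self-contained.

```python
def build_similarity_matrix(items, similarity):
    """Fill an n x n matrix, calling `similarity` once per unordered pair."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # j starts at i + 1, so the diagonal is never touched and stays 0.
        for j in range(i + 1, n):
            score = similarity(items[i], items[j])
            matrix[i][j] = score
            matrix[j][i] = score  # mirror, valid only for a symmetric score
    return matrix


def shared_letter_ratio(a, b):
    """Hypothetical toy score: Jaccard overlap of the two character sets."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


matrix = build_similarity_matrix(["oil", "floor", "pie"], shared_letter_ratio)
for row in matrix:
    print(row)
```

This roughly halves the number of similarity calls, at the cost of assuming that `similarity(a, b)` always equals `similarity(b, a)`.
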
```python
# Already explained in another notebook.
from nltk.cluster.util import cosine_distance

def sentence_similarity(sentence1, sentence2):
    # Turn each input into a list of its elements
    # (characters, when the input is a single string).
    sentence1 = list(sentence1)
    sentence2 = list(sentence2)
    all_words = list(set(sentence1 + sentence2))
    # Count how often each element occurs in each input.
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    for w in sentence1:
        vector1[all_words.index(w)] += 1
    for w in sentence2:
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)

# This is the part of the code to focus on.
for index1 in range(len(word)):
    for index2 in range(len(word)):
        if index1 == index2:
            continue
        similarity_matrix[index1][index2] = sentence_similarity(word[index1], word[index2])
```
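
Because each element of `word` in this demo is a single string, `sentence_similarity` iterates over its characters, so the scores are cosine similarities between character-count vectors. The sketch below reproduces that calculation with the standard library only; `char_cosine_similarity` is a hypothetical helper written for illustration, and it assumes both inputs are non-empty.

```python
import math
from collections import Counter


def char_cosine_similarity(text1, text2):
    """Cosine similarity between the character-count vectors of two strings."""
    counts1, counts2 = Counter(text1), Counter(text2)
    # Only characters present in both strings contribute to the dot product.
    dot = sum(counts1[ch] * counts2[ch] for ch in counts1)
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    return dot / (norm1 * norm2)


# "For" and "oil" share only the letter "o": 1 / (sqrt(3) * sqrt(3)) = 1/3
print(round(char_cosine_similarity("For", "oil"), 8))    # 0.33333333
# "For" and "spots" share "o", and "s" occurs twice: 1 / (sqrt(3) * sqrt(7))
print(round(char_cosine_similarity("For", "spots"), 8))  # 0.21821789
```

These two values correspond to the pairs at positions `[0][1]` and `[0][2]` of the matrix built from `word`.
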

The last step is to return `similarity_matrix`.
```python 
return similarity_matrix
```
The resulting `similarity_matrix` is used in the next computation: building a graph that serves as the basis of the TextRank algorithm.
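
To give a rough idea of how such a matrix can drive a ranking, the sketch below treats it as a weighted graph and runs a PageRank-style power iteration over it. This is an assumption about the later step, not the notebook's actual code: `rank_nodes`, its parameters, and the tiny example matrix are all hypothetical, and it uses the normalized `(1 - damping) / n` form of the update.

```python
def rank_nodes(similarity_matrix, damping=0.85, iterations=100, tol=1e-8):
    """PageRank-style scores for a weighted graph given as a square matrix."""
    n = len(similarity_matrix)
    # Total outgoing edge weight of each node (its row sum).
    out_weight = [sum(row) for row in similarity_matrix]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = 0.0
            for j in range(n):
                # Nodes without outgoing weight pass nothing on (a simplification).
                if out_weight[j] > 0:
                    incoming += similarity_matrix[j][i] / out_weight[j] * scores[j]
            new_scores.append((1 - damping) / n + damping * incoming)
        change = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if change < tol:  # stop once the scores have settled
            break
    return scores


# Hypothetical 3-node graph: node 0 is similar to both other nodes.
example = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
scores = rank_nodes(example)
print(scores)
```

Node 0 ends up with the highest score because it is the only node linked to both others. In a summarizer, the items with the highest scores would be the ones selected.
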

```python
print(similarity_matrix)
```

[[0.         0.33333333 0.21821789 0.40824829 0.         0.61237244
  0.19245009 0.         0.21821789 0.         0.52223297 0.
  0.         0.21821789 0.33333333 0.33333333 0.         0.
  0.         0.40824829 0.         0.         0.         0.17407766
  0.         0.         0.25819889 0.        ]
 [0.33333333 0.         0.21821789 0.40824829 0.         0.61237244
  0.38490018 0.         0.21821789 0.         0.52223297 0.40824829
  0.         0.43643578 0.33333333 0.33333333 0.         0.43643578
  0.21821789 0.40824829 0.28867513 0.         0.         0.34815531
  0.         0.33333333 0.25819889 0.28867513]
 [0.21821789 0.21821789 0.         0.26726124 0.21821789 0.26726124
  0.25197632 0.50709255 0.14285714 0.         0.34188173 0.
  0.21821789 0.14285714 0.21821789 0.43643578 0.         0.28571429
  0.14285714 0.26726124 0.18898224 0.         0.         0.34188173
  0.43643578 0.10910895 0.         0.        ]
 [0.40824829 0.40824829 0.26726124 0.         0.         0.5
  0.70