# Ranking
Ranking is a *method* that assigns each sentence an importance score. The score is derived from word frequencies: sentences are compared through their word-count vectors, and a sentence that is similar to many other sentences receives a higher rank.

Here we use the TextRank algorithm, an adaptation of PageRank, the algorithm Google uses to rank web pages on the internet.
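
To make the idea behind PageRank on a similarity graph more concrete, here is a minimal sketch of the power iteration, assuming a small symmetric matrix of non-negative weights. The function name `pagerank_power_iteration`, its parameters, and the toy matrix are illustrative assumptions; the notebook itself relies on `nx.pagerank` rather than this code.

```python
import numpy as np

def pagerank_power_iteration(weights, damping=0.85, tol=1e-10, max_iter=200):
    """Score the nodes of a weighted graph given as a square matrix of edge weights."""
    weights = np.asarray(weights, dtype=float)
    n = weights.shape[0]
    # Normalise each row so that outgoing weights sum to 1.
    # Rows without any edge are replaced by a uniform distribution.
    row_sums = weights.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    transition = np.where(row_sums > 0, weights / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)  # start from a uniform score
    for _ in range(max_iter):
        # Each node keeps a small base score and receives a share from its neighbours.
        new_scores = (1.0 - damping) / n + damping * (transition.T @ scores)
        if np.abs(new_scores - scores).sum() < tol:
            return new_scores
        scores = new_scores
    return scores

# Hypothetical similarity values between three sentences.
toy_similarity = np.array([
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
])
toy_scores = pagerank_power_iteration(toy_similarity)
print(toy_scores)
```

In this toy matrix, the sentence at index 1 has the strongest links to the others, so it ends up with the highest score.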

The *method* explained in this section is shown below.

```python
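def __ranking(self):
    # Build a weighted graph: each sentence is a node, each similarity value an edge weight.
    similarity_graph = nx.from_numpy_array(self.__similarity_matrix())
    # PageRank returns a dict that maps each node index to its score.
    score = nx.pagerank(similarity_graph)
    # Pair each score with its sentence and sort from the highest score to the lowest.
    ranked_sentence = sorted(((score[i], s) for i, s in enumerate(self.__raw_sentence)), reverse=True)
    self.__ranked_sentence = ranked_sentence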
```

The first step in ranking sentences with TextRank is to build a **graph** from the NumPy array produced by the `similarity_matrix` method. The array is interpreted as an **adjacency matrix**: each row (and column) becomes a node, and each non-zero entry becomes a weighted edge between two nodes.

```python
similarity_graph = nx.from_numpy_array(self.__similarity_matrix())
```
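
As a small illustration of how an adjacency matrix turns into a graph, the sketch below feeds a hand-written 3×3 matrix to `nx.from_numpy_array`. The matrix values are made up for this example and do not come from the notebook.

```python
import networkx as nx
import numpy as np

# Hypothetical symmetric similarity matrix for three sentences.
toy_matrix = np.array([
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.2],
    [0.0, 0.2, 0.0],
])

toy_graph = nx.from_numpy_array(toy_matrix)

# Every row index becomes a node; every non-zero entry becomes an edge
# whose "weight" attribute holds the matrix value.
print(toy_graph.number_of_nodes())
for u, v, weight in toy_graph.edges(data="weight"):
    print(u, v, weight)
```

Entries equal to zero, such as the pair (0, 2) here, produce no edge at all.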

In [23]:
## Already explained in previous notebooks
from nltk.cluster.util import cosine_distance
def sentence_similarity(sentence1, sentence2):
    sentence1 = [word for word in sentence1]
    sentence2 = [word for word in sentence2]
    all_words = list(set(sentence1+sentence2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    for w in sentence1:
        vector1[all_words.index(w)] += 1
    for w in sentence2:
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)

import numpy as np
def similarity_matrix(sentences):
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
    for index1 in range(len(sentences)):
        for index2 in range(len(sentences)):
            if index1 == index2:
                continue
            similarity_matrix[index1][index2] = sentence_similarity(sentences[index1], sentences[index2])
    return similarity_matrix

In [22]:
import networkx as nx
sentences = "For oil spots on the floor, nothing beats parking a motorbike in the lounge. Tom got a small piece of pie. The hand sanitizer was actually clear glue. Here's the thing. She doesn't have anything to prove, but she is going to anyway. That's just her character. She knows she doesn't have to.".split()
similarity_graph = nx.from_numpy_array(similarity_matrix(sentences))
print(similarity_graph)

Graph with 53 nodes and 879 edges
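
Note that `str.split()` without arguments splits on whitespace, so the 53 nodes above are individual words, and `sentence_similarity` compares them character by character. To rank whole sentences instead, the text could be split at sentence boundaries and each sentence tokenized into a list of words before calling `similarity_matrix`. The following is a minimal sketch, assuming a naive regex-based splitter; `split_sentences` and `tokenize` are hypothetical helpers that are not part of the original notebook.

```python
import re

def split_sentences(text):
    # Split after '.', '!' or '?' when followed by whitespace.
    # Naive on purpose: abbreviations such as "e.g." would also be split.
    return [part for part in re.split(r"(?<=[.!?])\s+", text.strip()) if part]

def tokenize(sentence):
    # Lowercase the sentence and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", sentence.lower())

paragraph = "Tom got a small piece of pie. Here's the thing. That's just her character."
sentence_list = split_sentences(paragraph)
tokenized = [tokenize(s) for s in sentence_list]

print(sentence_list)
print(tokenized)
```

Passing `tokenized` to `similarity_matrix` would then produce one node per sentence rather than one node per word.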


After building the graph, the next step is to run the PageRank algorithm on it to find the most important sentences across the paragraphs.

```python
score = nx.pagerank(similarity_graph)
```
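
To show the shape of the result on a small input, the sketch below runs `nx.pagerank` on a three-node graph. The matrix values are hypothetical. By default `nx.pagerank` uses a damping factor `alpha=0.85` and reads edge weights from the `"weight"` attribute, which is the attribute `nx.from_numpy_array` fills in.

```python
import networkx as nx
import numpy as np

# Hypothetical similarity values between three sentences.
toy_matrix = np.array([
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
])
toy_graph = nx.from_numpy_array(toy_matrix)

# The result is a dict that maps each node index to a score; the scores sum to about 1.
toy_scores = nx.pagerank(toy_graph)
best_node = max(toy_scores, key=toy_scores.get)

print(toy_scores)
print(best_node)
```

The node with the strongest overall similarity to the others (index 1 in this toy matrix) receives the highest score.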

In [24]:
import scipy
score = nx.pagerank(similarity_graph)
print(score)

{0: 0.013264073638618985, 1: 0.01687341787174773, 2: 0.019685151847184654, 3: 0.017708810676597773, 4: 0.024363778769306246, 5: 0.016065544099709398, 6: 0.02432414127216918, 7: 0.02453289268025973, 8: 0.017874723027619038, 9: 0.01316778655753733, 10: 0.024218163126060236, 11: 0.013155132594049556, 12: 0.024363778769306246, 13: 0.022522777153703825, 14: 0.011740660788535306, 15: 0.019805019185185308, 16: 0.01316778655753733, 17: 0.013801861852264563, 18: 0.01814439012181301, 19: 0.01236659619749277, 20: 0.016511717013318543, 21: 0.018188921845412737, 22: 0.018204609595964823, 23: 0.02568793902942171, 24: 0.014664674449160874, 25: 0.01685844410459391, 26: 0.019560212352432046, 27: 0.016017272317714895, 28: 0.018759975973864678, 29: 0.024363778769306246, 30: 0.021639950016478834, 31: 0.01780494350431358, 32: 0.025885529748114963, 33: 0.021725746128206477, 34: 0.022984339343451652, 35: 0.020473806778946002, 36: 0.018285543969982766, 37: 0.011584719295154138, 38: 0.022303484610354002, 39: 0

Once every sentence has a score, the next step is to sort the sentences by that score, which measures how relevant each sentence is to the given paragraph. Because we want the most relevant sentences (those with the highest score) first, we pass `reverse=True` to `sorted()`.

```python
ranked_sentence = sorted(((score[i],s) for i,s in enumerate(self.__raw_sentence)), reverse=True)
```
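
The sorting step can be tried on its own with made-up values. In the sketch below, `scores` and `raw_sentences` are hypothetical stand-ins for the PageRank output and the sentence list.

```python
# Hypothetical PageRank output: node index -> score.
scores = {0: 0.2, 1: 0.5, 2: 0.3}
raw_sentences = ["First sentence.", "Second sentence.", "Third sentence."]

# Tuples are compared element by element, so the score decides the order
# and the sentence text only breaks ties between equal scores.
ranked = sorted(((scores[i], s) for i, s in enumerate(raw_sentences)), reverse=True)

# Keep only the text of the two highest-ranked sentences.
top_two = [sentence for _, sentence in ranked[:2]]
print(top_two)
```

Slicing the sorted list, as in `ranked[:2]`, is one simple way to pick how many sentences end up in a summary.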

In [30]:
ranked_sentence = sorted(((score[i], s) for i, s in enumerate(sentences)), reverse=True)

With the ranking in hand, the last step is to display the result. In this case we only display the most relevant words rather than sentences, because the given paragraph is small: it was split on whitespace with `.split()`, so each ranked item is a single word.

In [31]:
print(ranked_sentence)

[(0.025885529748114963, "doesn't"), (0.02588552974811496, "doesn't"), (0.02568793902942171, 'sanitizer'), (0.02453289268025973, 'beats'), (0.024363778769306246, 'the'), (0.024363778769306246, 'the'), (0.024363778769306246, 'the'), (0.02432414127216918, 'nothing'), (0.024218163126060236, 'motorbike'), (0.022984339343451652, 'anything'), (0.022522777153703825, 'lounge.'), (0.022303484610354002, 'she'), (0.022303484610354002, 'she'), (0.022241346306516852, 'character.'), (0.021725746128206477, 'have'), (0.021725746128206477, 'have'), (0.021639950016478834, 'thing.'), (0.02155706929209219, "That's"), (0.020473806778946002, 'to'), (0.020473806778946, 'to'), (0.020246922748182402, 'her'), (0.019805019185185308, 'got'), (0.019685151847184654, 'spots'), (0.019560212352432046, 'clear'), (0.019235373657329368, 'to.'), (0.018759975973864678, "Here's"), (0.018285543969982766, 'prove,'), (0.018204609595964823, 'hand'), (0.018188921845412737, 'The'), (0.01814439012181301, 'piece'), (0.01787472302761