[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MaxMitre/DeepLearning/blob/main/Semana01_Textrank.ipynb)

## Implementing TextRank for text summarization

In this notebook we implement TextRank to summarize a text by extracting its key sentences.

# Dependencies

In [None]:
%%capture
!pip install wikipedia git+https://github.com/neuml/txtai#egg=txtai[pipeline]

In [None]:
# It MAY be necessary to use an older version of pillow
#!pip install Pillow==9.0.0

In [None]:
import re

import pandas as pd
import numpy as np
import scipy.linalg as splinalg

import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

import wikipedia

from txtai.pipeline import Translation

In [None]:
nltk.download("punkt")
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

In [None]:
# Stemmer
stemmer = PorterStemmer()

# Lemmatizer (only to show how it works)
lemma = WordNetLemmatizer()

# Stopwords
cached_stopwords = stopwords.words('english')
print(cached_stopwords[:10])

# Translator
translate = Translation()

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]




In [None]:
translate('i have', 'es')


'Tengo'

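Example: stemming vs. lemmatization. Both ```stem``` and ```lemmatize``` expect a single word, so passing a whole sentence returns it essentially unchanged.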

In [None]:
nltk.download('wordnet')
nltk.download('omw-1.4')

[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


True

In [None]:
muestra = 'i have two feet, but some people only got one foot.'
print(stemmer.stem(muestra))
print('-----------')
print(lemma.lemmatize(muestra))

print(lemma.lemmatize('foot'))
print(lemma.lemmatize('feet'))
print('---------------')
print(stemmer.stem('constitutional'))

i have two feet, but some people only got one foot.
-----------
i have two feet, but some people only got one foot.
foot
foot
---------------
constitut


# Data

The data we will use is the text of Wikipedia pages. We download the text with the [```wikipedia```](https://pypi.org/project/wikipedia/) module, a wrapper around the Wikipedia API. We split this text into sentences, process each sentence, stem each word, and apply TextRank to obtain the most important sentences of the whole document.

## Reading the data

We download a Wikipedia article.

In [None]:
wiki = wikipedia.page('Expropiación del petróleo en México')
book = wiki.content
print(book)

The Mexican oil expropriation (Spanish: expropiación petrolera) was the nationalization of all petroleum reserves, facilities, and foreign oil companies in Mexico on March 18, 1938.  In accordance with Article 27 of the Constitution of 1917, President Lázaro Cárdenas declared that all mineral and oil reserves found within Mexico belong to "the nation", i.e., the federal government. The Mexican government established a state-owned petroleum company, Petróleos Mexicanos, or PEMEX.  For a short period, this measure caused an international boycott of Mexican products in the following years, especially by the United States, the United Kingdom, and the Netherlands, but with the outbreak of World War II and the alliance between Mexico and the Allies, the disputes with private companies over compensation were resolved. The anniversary, March 18, is now a Mexican civic holiday.

On August 16, 1935, the Petroleum Workers Union of Mexico (Sindicato de Trabajadores Petroleros de la República Mexic

## Processing

We split the text into sentences.

In [None]:
sentences = sent_tokenize(book)
print(f"# oraciones: {len(sentences)}")
for sentence in sentences[:3]:
    print(sentence)
    print()
    print("...Fin de la oración...")
    print()


# oraciones: 98
The Mexican oil expropriation (Spanish: expropiación petrolera) was the nationalization of all petroleum reserves, facilities, and foreign oil companies in Mexico on March 18, 1938.

...Fin de la oración...

In accordance with Article 27 of the Constitution of 1917, President Lázaro Cárdenas declared that all mineral and oil reserves found within Mexico belong to "the nation", i.e., the federal government.

...Fin de la oración...

The Mexican government established a state-owned petroleum company, Petróleos Mexicanos, or PEMEX.

...Fin de la oración...



In [None]:
# List comprehension example
lista = []
for i in range(9):
  lista.append('Hola')

# Another way to build it
otra_lista = ['Hola' for i in range(9)]

print(lista)
print(otra_lista)

['Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola']
['Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola', 'Hola']


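We convert to lowercase, remove stopwords, remove punctuation and stem each word. Note that the stopword filter is applied to each token before it is lowercased, so capitalized stopwords such as 'The' are kept, and tokens made up only of digits (e.g. '1938') become empty strings; both effects are visible in the output below.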

In [None]:
sent_low = [[stemmer.stem(re.sub('[^a-z]', "", word.lower())) for word in word_tokenize(sentence) if word not in cached_stopwords and len(word) > 2] for sentence in sentences]
sent_low[0]

['the',
 'mexican',
 'oil',
 'expropri',
 'spanish',
 'expropiacin',
 'petrolera',
 'nation',
 'petroleum',
 'reserv',
 'facil',
 'foreign',
 'oil',
 'compani',
 'mexico',
 'march',
 '']

In [None]:
sentences[0]

'The Mexican oil expropriation (Spanish: expropiación petrolera) was the nationalization of all petroleum reserves, facilities, and foreign oil companies in Mexico on March 18, 1938.'
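
The following is a minimal sketch of a stricter variant of this cleaning step, in which tokens are normalized first and filtered afterwards. It is only an illustration: `clean_tokens` and the tiny `STOPWORDS` set are hypothetical names (the notebook uses NLTK's English list), whitespace splitting stands in for `word_tokenize`, and stemming is left out to keep the example dependency-free.

```python
import re

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "was", "of", "all", "and", "in", "on"}

def clean_tokens(sentence, stopwords=STOPWORDS, min_len=3):
    """Lowercase, strip non-letters, then drop stopwords, short and empty tokens."""
    tokens = []
    for raw in sentence.split():
        word = re.sub("[^a-z]", "", raw.lower())  # normalize before filtering
        if len(word) >= min_len and word not in stopwords:
            tokens.append(word)
    return tokens

example = ("The Mexican oil expropriation was the nationalization "
           "of all petroleum reserves on March 18, 1938.")
print(clean_tokens(example))
```

Filtering after normalization removes 'The' and the empty strings left behind by numbers, at the cost of changing the token lists (and therefore the similarity matrix) compared with the cell above.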

# TextRank

We build the adjacency/similarity matrix A between sentences, taking the number of words that appear in both as the similarity between two sentences.

In [None]:
from tqdm import tqdm

In [None]:
A = np.zeros((len(sent_low), len(sent_low)))

for i in tqdm(range(len(sentences))):
    for j in range(i+1, len(sentences)):
        # The similarity between two sentences is the number of words they have in common
        A[i][j] = A[j][i] = len([x for x in sent_low[i] if x in sent_low[j]])

100%|██████████| 98/98 [00:00<00:00, 4287.18it/s]


This is what a fragment of matrix A looks like.

In [None]:
A[:5, :5]

array([[0., 6., 4., 3., 3.],
       [6., 0., 1., 1., 0.],
       [4., 1., 0., 2., 2.],
       [3., 1., 2., 0., 1.],
       [3., 0., 2., 1., 0.]])
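
One detail of this similarity is worth seeing on a toy case: the count runs over the tokens of the first sentence, repetitions included, so it can differ depending on which sentence comes first. The sketch below compares it with a count of distinct shared tokens; `overlap_count`, `shared_vocabulary` and the token lists are hypothetical names and data used only for illustration.

```python
def overlap_count(tokens_a, tokens_b):
    """Notebook-style count: tokens of a (with repetitions) that also occur in b."""
    return len([t for t in tokens_a if t in tokens_b])

def shared_vocabulary(tokens_a, tokens_b):
    """Order-independent alternative: number of distinct shared tokens."""
    return len(set(tokens_a) & set(tokens_b))

s1 = ["oil", "compani", "oil", "mexico"]
s2 = ["oil", "strike"]

print(overlap_count(s1, s2), overlap_count(s2, s1))
print(shared_vocabulary(s1, s2), shared_vocabulary(s2, s1))
```

Because the loop above stores the value computed from the lower-indexed sentence in both `A[i][j]` and `A[j][i]`, the matrix stays symmetric either way; the set-based version simply makes the value independent of sentence order.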

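We normalize the columns of A.

In [None]:
# Sentences are compared with each other, but not with themselves (zero diagonal)
suma = np.sum(A, axis=0)
# out= keeps columns whose sum is 0 at zero instead of leaving them uninitialized
A_norm = np.divide(A, suma, out=np.zeros_like(A), where=suma!=0)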
A_norm[:5, :5]

array([[0.        , 0.04285714, 0.03636364, 0.02654867, 0.06521739],
       [0.02459016, 0.        , 0.00909091, 0.00884956, 0.        ],
       [0.01639344, 0.00714286, 0.        , 0.01769912, 0.04347826],
       [0.01229508, 0.00714286, 0.01818182, 0.        , 0.02173913],
       [0.01229508, 0.        , 0.01818182, 0.00884956, 0.        ]])
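
As a quick check of what column normalization does, here is a small sketch on a toy matrix (`A_toy` is hypothetical) that includes a sentence sharing no words with the others. With `where=`, NumPy only writes the entries where the condition holds, so the sketch passes `out=` to fix the remaining entries at zero.

```python
import numpy as np

A_toy = np.array([[0., 2., 0.],
                  [2., 0., 0.],
                  [0., 0., 0.]])   # the third sentence shares no words

col_sums = A_toy.sum(axis=0)
# Entries in zero-sum columns are not computed; they keep the value given in `out`.
A_toy_norm = np.divide(A_toy, col_sums, out=np.zeros_like(A_toy), where=col_sums != 0)

print(A_toy_norm)
print(A_toy_norm.sum(axis=0))   # each non-empty column sums to 1
```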

We initialize the TextRank vector with ones and iterate until it converges, that is, until we obtain $\Pi$ such that (with $A$ the column-normalized matrix) $$\Pi = A~\Pi$$

In [None]:
# Prettier printing: avoids scientific notation in some cases
np.set_printoptions(suppress=True)

In [None]:
# Relative tolerance used by np.allclose as the convergence criterion
tol = 1e-5

PI_ = np.ones(A_norm.shape[1])

i = 0
while True:
    pi_ = A_norm.dot(PI_)
    print(i, abs(PI_- pi_).sum())
    if np.allclose(PI_, pi_, rtol=tol):
        break
    i += 1
    PI_ = pi_.copy()

0 285.4323576419214
1 227.4509568555338
2 47.41709379201733
3 15.416423336711173
4 5.888480216245359
5 2.5588083336710135
6 1.1297041229190943
7 0.5074491550313454
8 0.23040109731306604
9 0.1050876636219274
10 0.048070350882148435
11 0.02205581074946307
12 0.010140935212571603
13 0.0046690453713090635
14 0.0021527478715337098
15 0.0009936402194557908
16 0.0004589309760281088


In [None]:
PI_

array([10.74212943,  6.16350725,  4.84277311,  4.97483873,  2.02515954,
        8.23271691,  6.29561121,  5.63523381,  0.66038183,  2.68553963,
        2.905675  ,  3.78617928,  4.22642796,  6.51574394,  3.83020055,
        5.54717589,  5.54717903,  6.20754607,  7.17609545,  1.45282182,
        1.45282903,  2.94969943,  5.94341335,  7.08806246,  4.79875899,
        7.26414581,  2.77359573,  3.21385086,  2.28930146,  3.61005968,
        5.19498398,  5.4591289 ,  3.91825471,  1.18868351,  4.2264306 ,
        2.81762124,  5.1949624 ,  4.53459212,  6.11952592,  1.80504255,
        6.3396418 ,  2.28931851,  7.83649489,  4.8427748 ,  5.45912559,
        0.1320756 ,  1.58490519,  6.60376875,  9.46540005,  0.44025528,
        2.55344572,  4.0062886 ,  0.04402514,  0.        ,  6.07545473,
        0.5723253 ,  6.51572004,  4.79873425,  6.47169843,  2.28930806,
        5.81132137,  5.28300537,  7.44023091,  6.03144321,  1.18867628,
        1.0125763 ,  3.69810856,  0.83647655,  3.03773261,  2.99

Alternatively, we can obtain the left eigenvectors of the transpose of our normalized matrix, ```A_norm.T```. The PageRank values correspond to the stationary probability vector of the Markov chain defined by the normalized matrix A, which in turn is the eigenvector associated with eigenvalue 1. Writing $\Pi$ as a row vector:

$$\Pi = \Pi A^T$$

In [None]:
vals, vecs = splinalg.eig(A_norm.T, left=True, right=False)

In [None]:
vecs.shape

(98, 98)

In [None]:
vals[90]

(-0.051063422755659824+0j)

In [None]:
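# First column of the left-eigenvector matrix.
# Note: eig does not sort its eigenvalues, so column 0 is not guaranteed to
# correspond to eigenvalue 1; compare with vals[0] before relying on it.
pi_ = vecs[:, 0]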
pi_

array([-0.13815784,  0.01388035,  0.12558162,  0.17358284, -0.04296665,
       -0.0778107 , -0.16402883, -0.03467973, -0.01284784, -0.03105319,
       -0.07431268,  0.00100917, -0.19222469,  0.09389125,  0.22634928,
       -0.17992393, -0.24143928, -0.2160659 , -0.0546644 , -0.00021678,
        0.00039349,  0.00457717,  0.06552103,  0.06422254, -0.06553907,
        0.18594421,  0.07899857,  0.05843317,  0.00064922,  0.05816777,
       -0.30286526, -0.27668629, -0.00344645, -0.05033427,  0.19137139,
        0.0360442 ,  0.13224497,  0.29876163,  0.00709155, -0.01101443,
        0.13598213, -0.0200829 ,  0.05432253,  0.22840565,  0.06726318,
        0.00365755,  0.07722195,  0.10356672, -0.16299198,  0.00354534,
       -0.10888121, -0.        ,  0.00152571,  0.        ,  0.12976041,
       -0.01019842, -0.0255621 ,  0.0270018 , -0.0670729 , -0.00734195,
       -0.0699798 , -0.06787312, -0.10235514,  0.08356393, -0.02008727,
       -0.07271497, -0.0451704 ,  0.05321414,  0.01717771,  0.02
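
The vector above mixes positive and negative entries, which suggests that column 0 may not be the eigenvector associated with eigenvalue 1. A minimal sketch of selecting that eigenvector explicitly follows; it uses `np.linalg.eig` on a hypothetical toy matrix `M` and works with right eigenvectors of `M`, which for a real matrix play the same role as the left eigenvectors of its transpose used in the notebook.

```python
import numpy as np

M = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])   # hypothetical column-normalized matrix

vals, vecs = np.linalg.eig(M)     # columns of vecs satisfy M @ v = lambda * v
idx = np.argmax(vals.real)        # largest real part: eigenvalue 1 for such a matrix
pi = np.abs(vecs[:, idx].real)    # eigenvectors are defined up to sign
pi = pi / pi.sum()                # rescale so that the scores sum to 1

print(vals[idx].real)
print(pi)
```

Only the ordering of the scores matters for the summary, so the rescaling is optional; removing the sign is not, because `argsort` would otherwise rank a negated eigenvector in reverse.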

We take the indices of the k largest values in $\Pi$ and use them to retrieve the most relevant sentences.

In [None]:
k = 4
pi_.argsort()[-k:][::-1]

array([37, 43, 14, 34])
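
The indexing idiom above can be read step by step on a small hypothetical score vector: `argsort` returns indices in ascending order of score, `[-k:]` keeps the last k, and `[::-1]` reverses them so the best sentence comes first. Sorting those indices again is one way (an optional addition, not part of the notebook) to present the selected sentences in their original order.

```python
import numpy as np

scores = np.array([0.10, 0.30, 0.05, 0.40, 0.15])   # hypothetical TextRank scores
k = 2

top_by_score = scores.argsort()[-k:][::-1]   # highest score first
top_in_text_order = np.sort(top_by_score)    # same sentences, document order

print(top_by_score, top_in_text_order)
```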

In [None]:
summary = [sentences[idx] for idx in pi_.argsort()[-k:][::-1]]

In [None]:
summary

['After the publication of the findings, the oil companies threatened to leave Mexico and take all of their capital with them.',
 'Cárdenas offered to end the strike if the oil companies paid the sum.',
 'The companies, however, insisted the demands would cripple production and bankrupt them, and refused to pay.',
 'Cárdenas convinced the union to end the strike until a decision by the companies could be made.']

Finally, let's see which sentences TextRank considered the most important.

In [None]:
for bullet in summary:
    print('___________')
    print(bullet)

___________
After the publication of the findings, the oil companies threatened to leave Mexico and take all of their capital with them.
___________
Cárdenas offered to end the strike if the oil companies paid the sum.
___________
The companies, however, insisted the demands would cripple production and bankrupt them, and refused to pay.
___________
Cárdenas convinced the union to end the strike until a decision by the companies could be made.


We can translate the output.

In [None]:
# About 34 s for the first 10 sentences
for bullet in summary:
    print()
    print(translate(bullet, "es"))


Después de la publicación de los hallazgos, las compañías petroleras amenazaron con salir de México y llevarse todo su capital con ellas.

Cárdenas ofreció poner fin a la huelga si las compañías petroleras pagaban la suma.

Las empresas, sin embargo, insistieron en que las demandas paralizarían la producción y la bancarrota, y se negaron a pagar.

Cárdenas convenció al sindicato de poner fin a la huelga hasta que las empresas pudieran tomar una decisión.


# A function to create summaries

We can condense everything above into a function that takes a text and returns the most relevant sentences according to TextRank.

In [None]:
def summary(text, k, to_spanish = True, tol = 1e-5, d = .15, eig = False):
    # Note: the parameter d is accepted but not used by this implementation
    print("Paso 1. Obteniendo oraciones")
    sentences = sent_tokenize(text)

    print(f"# oraciones: {len(sentences)}")

    print("Paso 2. Procesando texto")
    sent_low = [[stemmer.stem(re.sub('[^a-z]', "", word.lower())) for word in word_tokenize(sentence) if word not in cached_stopwords and len(word) > 2] for sentence in sentences]

    print("Paso 3. Creando matriz de similitud")
    A = np.zeros((len(sent_low), len(sent_low)))

    for i in range(len(sentences)):
        for j in range(i+1, len(sentences)):
            # The similarity between two sentences is the number of words they have in common
            A[i][j] = A[j][i] = len([x for x in sent_low[i] if x in sent_low[j]])

    print("Paso 4. Estandariza matriz de similitud")
    suma = np.sum(A, axis=0)
    # out= keeps columns whose sum is 0 at zero instead of leaving them uninitialized
    A_norm = np.divide(A, suma, out=np.zeros_like(A), where=suma!=0)

    print("Paso 5. Ejecutando TextRank")
    if eig:
        vals, vecs = splinalg.eig(A_norm.T, left=True, right=False)
        # eig does not sort eigenvalues: take the one with the largest real part
        # (1 for a column-normalized matrix) and drop the arbitrary sign
        pi_f = np.abs(vecs[:, np.argmax(vals.real)].real)
    else:
        PI_f = np.ones(A_norm.shape[1])

        while True:
            pi_f = A_norm.dot(PI_f)
            if np.allclose(PI_f, pi_f, rtol=tol):
                break
            PI_f = pi_f.copy()

    print("\tPaso 5. Terminado")

    # Rank the sentences with the scores computed in this call
    top_k = pi_f.argsort()[-k:][::-1]

    if not to_spanish:
        return [sentences[idx] for idx in top_k]

    print("Paso 6. Traduciendo")
    return [translate(sentences[idx], "es") for idx in top_k]

def print_bullet_points(bullet_points):
    for point in bullet_points:
        print(f"- {point}\n")


In [None]:

wiki = wikipedia.page('Automatic summarization')
text = wiki.content
bullet_points = summary(text, 5, False, tol=1e-1, eig =True)

Paso 1. Obteniendo oraciones
# oraciones: 329
Paso 2. Procesando texto
Paso 3. Creando matriz de similitud
Paso 4. Estandariza matriz de similitud
Paso 5. Ejecutando TextRank
	Paso 5. Terminado


In [None]:
print_bullet_points(bullet_points)

- At a very high level, summarization algorithms try to find subsets of objects (like set of sentences, or a set of images), which cover information of the entire set.

- You are given a piece of text, such as a journal article, and you must produce a list of keywords or key[phrase]s that capture the primary topics discussed in the text.

- This has been applied mainly for text.

- Video summarization is a related domain, where the system automatically creates a trailer of a long video.

- Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs.



In [None]:
!wget https://www.gutenberg.org/files/84/84-0.txt -O book.txt

--2024-07-18 22:25:38--  https://www.gutenberg.org/files/84/84-0.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 448642 (438K) [text/plain]
Saving to: ‘book.txt’


2024-07-18 22:25:38 (3.06 MB/s) - ‘book.txt’ saved [448642/448642]



In [None]:
with open("book.txt") as f:
    book_raw = f.read()

print(book_raw[0:1000])

The Project Gutenberg eBook of Frankenstein, by Mary Wollstonecraft Shelley

This eBook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restrictions
whatsoever. You may copy it, give it away or re-use it under the terms
of the Project Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United States, you
will have to check the laws of the country where you are located before
using this eBook.

Title: Frankenstein
       or, The Modern Prometheus

Author: Mary Wollstonecraft Shelley

Release Date: October 31, 1993 [eBook #84]
[Most recently updated: December 2, 2022]

Language: English

Character set encoding: UTF-8

Produced by: Judith Boss, Christy Phillips, Lynn Hanninen and David Meltzer. HTML version by Al Haines.
Further corrections by Menno de Leeuw.

*** START OF THE PROJECT GUTENBERG EBOOK FRANKENSTEIN ***




Frankenstein;

or, the Modern Prometheus

by M

In [None]:
start = book_raw.rfind("Chapter 5\n")
end = book_raw.rfind('Chapter 6\n')

In [None]:
chapter_n = book_raw[start + len("Chapter 5\n"): end]

In [None]:
chapter_n

'\n\nIt was on a dreary night of November that I beheld the accomplishment\nof my toils. With an anxiety that almost amounted to agony, I\ncollected the instruments of life around me, that I might infuse a\nspark of being into the lifeless thing that lay at my feet. It was\nalready one in the morning; the rain pattered dismally against the\npanes, and my candle was nearly burnt out, when, by the glimmer of the\nhalf-extinguished light, I saw the dull yellow eye of the creature\nopen; it breathed hard, and a convulsive motion agitated its limbs.\n\nHow can I describe my emotions at this catastrophe, or how delineate\nthe wretch whom with such infinite pains and care I had endeavoured to\nform? His limbs were in proportion, and I had selected his features as\nbeautiful. Beautiful! Great God! His yellow skin scarcely covered\nthe work of muscles and arteries beneath; his hair was of a lustrous\nblack, and flowing; his teeth of a pearly whiteness; but these\nluxuriances only formed a more 

In [None]:
bullet_points = summary(chapter_n, 5, False, eig = True, tol=1e-2)

Paso 1. Obteniendo oraciones
# oraciones: 90
Paso 2. Procesando texto
Paso 3. Creando matriz de similitud
Paso 4. Estandariza matriz de similitud
Paso 5. Ejecutando TextRank
	Paso 5. Terminado


In [None]:
print_bullet_points(bullet_points)

- As it drew nearer I observed
that it was the Swiss diligence; it stopped just where I was standing, and
on the door being opened, I perceived Henry Clerval, who, on seeing me,
instantly sprung out.

- “You may easily believe,” said
he, “how great was the difficulty to persuade my father that all
necessary knowledge was not comprised in the noble art of book-keeping;
and, indeed, I believe I left him incredulous to the last, for his constant
answer to my unwearied entreaties was the same as that of the Dutch
schoolmaster in The Vicar of Wakefield: ‘I have ten thousand florins
a year without Greek, I eat heartily without Greek.’ But his
affection for me at length overcame his dislike of learning, and he has
permitted me to undertake a voyage of discovery to the land of
knowledge.”

“It gives me the greatest delight to see you; but tell me how you left
my father, brothers, and Elizabeth.”

“Very well, and very happy, only a little uneasy that they hear from
you so seldom.

- But it was 

In [None]:
translate(bullet_points, 'es') # Spanish

['Mientras se acercaba, observéque era la diligencia suiza; se detuvo justo donde yo estaba parado, yEn la puerta abierta, percibí a Henry Clerval, quien, al verme,instantáneamente salió.',
 'Puedes creer fácilmente, dijoél, “cuán grande fue la dificultad de persuadir a mi padre de que todoslos conocimientos necesarios no estaban comprendidos en el noble arte de la contabilidad;y, de hecho, creo que lo dejé incrédulo hasta el último, por su constantela respuesta a mis súplicas infatigables era la misma que la de los holandesesmaestro de escuela en El Vicario de Wakefield: ‘Tengo diez mil florinesun año sin griego, como de todo corazón sin griego.’ Pero suel afecto por mí al fin superó su disgusto por el aprendizaje, y él tieneme permitió emprender un viaje de descubrimiento a la tierra deconocimiento.”Me da la mayor delicia verte; pero dime cómo te fuistemi padre, hermanos, y Elizabeth.”“Muy bien, y muy feliz, sólo un poco incómodo que oyen deEres muy rara.',
 'Pero fue en vano; dormí,

# Exercises

## Similarity matrix between sentences*

The similarity between sentences was computed as the number of words that appear in both. **Replace it with cosine similarity** and compare the results.

A very good first approach could be to use Latent Semantic Analysis and compute the cosine similarity between all documents.

If you have a DataFrame with the columns ```[id_documento_1, id_documento_2, similitud]```, the function [```pandas.DataFrame.pivot```](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html) can help build the similarity matrix; it takes "index", "columns" and "values" as arguments.
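
As a possible starting point for this exercise, the sketch below computes cosine similarity on plain bag-of-words counts. It is a simplified baseline and not Latent Semantic Analysis; `cosine_similarity_matrix` and the token lists are hypothetical names and data.

```python
import numpy as np

def cosine_similarity_matrix(token_lists):
    """Bag-of-words cosine similarity between tokenized sentences (zero diagonal)."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    counts = np.zeros((len(token_lists), len(vocab)))
    for row, tokens in enumerate(token_lists):
        for t in tokens:
            counts[row, index[t]] += 1
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    # Sentences with no tokens keep an all-zero vector
    unit = np.divide(counts, norms, out=np.zeros_like(counts), where=norms != 0)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)   # a sentence is not compared with itself
    return sim

docs = [["oil", "compani"], ["oil", "strike"], ["union"], []]
S = cosine_similarity_matrix(docs)
print(S.round(2))
```

The resulting matrix can replace A before the normalization step; everything after that stays unchanged.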




## Language *

This example is built for English text because of the stopwords and the stemmer (PorterStemmer) it uses. Make the changes needed for it to accept Spanish text.

That is, change the stopwords (nltk ships Spanish stopwords) and the stemmer (hint: ```nltk.stem``` has more stemmers, and one of them implements an algorithm for Spanish).

## Sentences vs. words

In this notebook we used sentences to obtain the summary; had we used words instead, TextRank would give us the keywords of the text.

Implement TextRank with words. For the similarity (or adjacency) matrix, you can link consecutive words, or define a window of k consecutive words in each sentence (similar to skip-gram) and link all of those words. In this case, matrix A would have the dimension of the vocabulary (the list of unique words) and would hold a 1 where two words are linked.

Another alternative would be to use a word embedding (e.g. word2vec) and compute the cosine similarity between the word vectors to fill A.

After that, everything else stays the same.
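
A minimal sketch of the window-based adjacency matrix described in this exercise could look as follows. The function name `cooccurrence_matrix`, the `window` parameter and the token lists are hypothetical; `window=1` links only consecutive words.

```python
import numpy as np

def cooccurrence_matrix(token_lists, window=2):
    """Binary adjacency: 1 if two distinct words appear within `window` positions."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for tokens in token_lists:
        for pos, word in enumerate(tokens):
            # Link the word with the next `window` words of the same sentence
            for other in tokens[pos + 1: pos + 1 + window]:
                if other != word:
                    A[index[word], index[other]] = A[index[other], index[word]] = 1
    return vocab, A

vocab, A_words = cooccurrence_matrix(
    [["oil", "compani", "strike"], ["oil", "union"]], window=1)
print(vocab)
print(A_words)
```

Row i of the matrix corresponds to `vocab[i]`, so the top-ranked indices map back to keywords instead of sentences.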

## Summary of a topic *

Here we applied TextRank to a single document. We could instead take a corpus of documents on the same topic (e.g. news about cryptocurrency prices) and apply it to obtain the key points of the whole corpus.

Nothing needs to change in the current implementation; just concatenate the whole corpus into a single string.

Exercise: Build a corpus with 4 articles on a topic of interest, concatenate them and pass the result as a parameter to the ```summary``` function.

## Improving the ```summary``` function -

We can split the function's code into pieces that work as modules and allow some freedom at run time. For example, we could have several functions that compute matrix A in different ways and have ```summary``` run one of them depending on a parameter.

Exercise: Create a function for each step of ```summary```.
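
One way to approach the modular design, sketched under assumptions: a small registry maps a name to a similarity function, and the matrix builder picks one by parameter. The names `overlap`, `jaccard`, `SIMILARITIES` and `similarity_matrix` are hypothetical and only illustrate the idea for the matrix step.

```python
def overlap(a, b):
    """Number of distinct tokens shared by two sentences."""
    return float(len(set(a) & set(b)))

def jaccard(a, b):
    """Shared tokens divided by the size of the combined vocabulary."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0

# Hypothetical registry: add new similarity functions here
SIMILARITIES = {"overlap": overlap, "jaccard": jaccard}

def similarity_matrix(token_lists, method="overlap"):
    """Build a symmetric matrix (list of lists) with the chosen similarity."""
    sim = SIMILARITIES[method]
    n = len(token_lists)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = sim(token_lists[i], token_lists[j])
    return A

toy = [["oil", "compani"], ["oil", "strike"], ["union"]]
print(similarity_matrix(toy, "overlap"))
print(similarity_matrix(toy, "jaccard"))
```

The same pattern could be repeated for the other steps (tokenization, normalization, ranking), each selected through its own parameter.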

# On computing the PageRank values

https://nlp.stanford.edu/IR-book/html/htmledition/the-pagerank-computation-1.html

https://nlp.stanford.edu/IR-book/html/htmledition/markov-chains-1.html