# KeywordExtractor using TextRank algorithm
* The TextRank algorithm is inspired by the PageRank algorithm. The code in this notebook follows the TextRank paper by Mihalcea and Tarau (EMNLP 2004).
* Link to the paper: <a href="https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf">https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf</a>
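
For orientation, the paper scores each vertex of a word graph with $S(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{S(V_j)}{|Out(V_j)|}$, where $d$ is the damping factor. The block below is a minimal sketch of that idea on an unweighted, undirected co-occurrence graph. It is not the code inside `text_summary`; the function name `textrank_scores` and the `window`, `damping`, and `iterations` parameters are assumptions made for illustration.

```python
from collections import defaultdict


def textrank_scores(tokens, window=2, damping=0.85, iterations=50):
    """Rank tokens with a TextRank-style iteration (illustrative sketch only)."""
    # Link every pair of distinct words that co-occur within `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Start every vertex at 1.0 and apply the update rule a fixed number of times.
    scores = {word: 1.0 for word in set(tokens)}
    for _ in range(iterations):
        updated = {}
        for word in scores:
            rank = sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            updated[word] = (1 - damping) + damping * rank
        scores = updated

    # Highest score first, as a list of (word, score) pairs.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = ["data", "science", "data", "literacy", "data", "privacy"]
ranked = textrank_scores(tokens)
print(ranked[0][0])  # the word with the most co-occurrence links
```

With this unnormalized form of the update, the scores of a graph with no isolated vertices sum to the number of vertices, so a score above 1.0 marks a word that is better connected than average.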

In [3]:
from text_summary import KeywordExtractor

In [4]:
text = '''In 2019, the world was struck by a novel coronavirus, COVID-19, unleashing an unprecedented global health crisis. The virus rapidly spread across borders, impacting every facet of human life. Governments enforced strict measures to curb its transmission, including lockdowns, travel restrictions, and mass vaccination campaigns.
COVID-19, primarily transmitted through respiratory droplets, caused severe respiratory illness, leading to a significant loss of life. The pandemic strained healthcare systems, disrupted economies, and sparked social upheaval. Researchers and scientists worked tirelessly to develop effective vaccines, offering hope for a brighter future.
Though progress has been made, vigilance remains crucial. COVID-19 serves as a reminder of our vulnerability and the need for international cooperation to combat infectious diseases.
'''.replace("\n", "").lower()

In [5]:
extractor = KeywordExtractor()

In [6]:
extractor.analyze(text, pos_to_consider=['NOUN', 'PROPN'])
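
The values `'NOUN'` and `'PROPN'` are Universal POS tags (common noun and proper noun). How `analyze` applies `pos_to_consider` internally is not shown in this notebook; the sketch below only illustrates the general idea of keeping candidate words by tag. The helper `filter_by_pos` and the hand-tagged sample are hypothetical.

```python
def filter_by_pos(tagged_tokens, pos_to_consider):
    """Keep only tokens whose part-of-speech tag is in pos_to_consider."""
    allowed = set(pos_to_consider)
    return [token for token, pos in tagged_tokens if pos in allowed]


# Hand-tagged sample (an assumption; a real pipeline would get tags from a POS tagger).
tagged = [
    ("governments", "NOUN"),
    ("enforced", "VERB"),
    ("strict", "ADJ"),
    ("measures", "NOUN"),
    ("covid-19", "PROPN"),
]

candidates = filter_by_pos(tagged, ["NOUN", "PROPN"])
print(candidates)
```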

In [8]:
extractor.getKeywords(20)

[['life', 1.5393452380952382],
 ['travel', 1.220595238095238],
 ['restrictions', 1.1295238095238096],
 ['vaccination', 1.1295238095238096],
 ['campaigns.covid-19', 1.1295238095238096],
 ['mass', 1.1011904761904763],
 ['lockdowns', 1.0991666666666666],
 ['scientists', 1.085],
 ['vaccines', 1.085],
 ['hope', 1.085],
 ['progress', 1.085],
 ['droplets', 1.0080952380952382],
 ['world', 1.0],
 ['coronavirus', 1.0],
 ['covid-19', 1.0],
 ['health', 1.0],
 ['crisis', 1.0],
 ['healthcare', 1.0],
 ['systems', 1.0],
 ['economies', 1.0]]
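
The fused entry `campaigns.covid-19` in the list above is most likely a side effect of `.replace("\n", "")`, which deletes the line break between two paragraphs without leaving a space, so the last word of one sentence runs into the first word of the next. The same pattern shows up in the second example as `applications.however`. A small sketch of the effect and one possible fix, using a fragment of the input text:

```python
sample = "mass vaccination campaigns.\nCOVID-19, primarily transmitted"

# Removing the newline outright glues the two sentences together.
fused = sample.replace("\n", "").lower()

# Collapsing all whitespace runs into a single space keeps the words apart.
spaced = " ".join(sample.split()).lower()

print(fused)
print(spaced)
```

Replacing the newline with a space (`.replace("\n", " ")`) would have the same effect on this input. The scores printed in this notebook were produced with the original preprocessing, so they would change if the cells were re-run with the fix.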

In [9]:
text2 = '''In the era of digital transformation, data science has emerged as a game-changer, revolutionizing the way businesses operate and unlocking insights previously hidden in vast amounts of information. With the explosion of data and advancements in technology, organizations are leveraging data science to drive innovation, make informed decisions, and gain a competitive edge.
Data scientists employ sophisticated techniques to extract valuable insights from complex datasets. They utilize statistical analysis, machine learning, and artificial intelligence algorithms to uncover patterns, predict outcomes, and automate processes. The applications of data science span across industries, from healthcare and finance to marketing and transportation.
This new age of data science empowers businesses to optimize operations, personalize customer experiences, and improve products and services. It enables targeted marketing campaigns, fraud detection, demand forecasting, and precision medicine, among numerous other applications.
However, with great power comes great responsibility. Ethical considerations around data privacy, bias, and transparency must be at the forefront of data science practices. As data becomes the lifeblood of organizations, maintaining data integrity and safeguarding privacy are critical.'''.replace("\n", "").lower()

In [10]:
extractor.analyze(text2, pos_to_consider=['NOUN', 'PROPN'])

In [11]:
extractor.getKeywords(20)

[['data', 5.256035109081421],
 ['science', 3.61465921675224],
 ['businesses', 1.6940290853893503],
 ['insights', 1.6000595238095237],
 ['organizations', 1.3838849110881988],
 ['marketing', 1.3455658819261465],
 ['intelligence', 1.3197619047619047],
 ['algorithms', 1.3197619047619047],
 ['precision', 1.2711904761904762],
 ['forecasting', 1.2286904761904762],
 ['privacy', 1.2109147538508003],
 ['demand', 1.1295238095238096],
 ['medicine', 1.114345238095238],
 ['learning', 1.0870238095238096],
 ['patterns', 1.0870238095238096],
 ['operations', 1.011637310497575],
 ['transparency', 0.9811468158501037],
 ['detection', 0.9726785714285714],
 ['applications.however', 0.9726785714285714],
 ['decisions', 0.9354684793287439]]

In [12]:
text3 = '''In today's data-driven world, data literacy has become a fundamental skill for individuals to navigate and thrive in the digital landscape. Data literacy refers to the ability to understand, interpret, and communicate insights from data effectively. It empowers individuals to make informed decisions, identify trends, and critically evaluate information.
With the increasing availability of data, from social media metrics to business analytics, data literacy bridges the gap between raw data and actionable knowledge. It enables individuals to question assumptions, challenge biases, and draw meaningful conclusions from data sources.
By fostering data literacy, we equip individuals with the skills needed to engage with data responsibly, contributing to a more informed society. In an era of fake news and misinformation, data literacy serves as a shield against manipulation and empowers individuals to become discerning consumers of information.
As data continues to shape our lives, investing in data literacy is crucial. Organizations, educational institutions, and governments must prioritize initiatives that promote data literacy, ensuring that individuals can harness the power of data for personal and societal benefit.'''

In [13]:
extractor.analyze(text3, pos_to_consider=['NOUN', 'PROPN'], lower=True)

In [14]:
extractor.getKeywords(20)

[['data', 7.718808441558441],
 ['literacy', 4.604183982683982],
 ['individuals', 3.439297619047619],
 ['information', 1.1529816017316015],
 ['shield', 0.9807554112554113],
 ['business', 0.953047619047619],
 ['initiatives', 0.9018268398268399],
 ['biases', 0.871469696969697],
 ['challenge', 0.8714696969696969],
 ['conclusions', 0.8714696969696969],
 ['misinformation', 0.8075357142857142],
 ['metrics', 0.7962023809523809],
 ['manipulation', 0.7955768398268398],
 ['media', 0.7456071428571429],
 ['governments', 0.7399220779220779],
 ['analytics', 0.7253690476190476],
 ['gap', 0.7253690476190476],
 ['world', 0.7063268398268399],
 ['institutions', 0.672952380952381],
 ['news', 0.6577738095238095]]
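
The lists above contain single words only, so `data` and `literacy` are ranked separately even though they form one phrase in the text. The TextRank paper describes a post-processing step in which keywords that are adjacent in the original text are collapsed into multi-word keywords. Whether `KeywordExtractor` offers this is not shown here; the function `collapse_keyphrases` below is a hypothetical sketch of that step.

```python
def collapse_keyphrases(tokens, keywords):
    """Merge runs of adjacent keywords into phrases, preserving first-seen order."""
    phrases, current = [], []
    for token in tokens:
        if token in keywords:
            current.append(token)
        else:
            # A non-keyword ends the current run.
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(phrases))


tokens = "data literacy has become a fundamental skill for individuals".split()
keywords = {"data", "literacy", "individuals"}

phrases = collapse_keyphrases(tokens, keywords)
print(phrases)
```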