
Text Summarization
==================


Shows how to summarize text by extracting the most important sentences from it.

This module automatically summarizes the given text by extracting one or more important sentences from the text. Similarly, it can also extract keywords. 

This summarizer is based on the "TextRank" algorithm. 



**Extractive summarization** involves selecting and combining crucial sentences, phrases, or words from an original text to create a shorter version. The extracted material is generally left unchanged. (The relative order of sentences may change, but every sentence is taken from the input.)

In [31]:
from pprint import pprint as print
from gensim.summarization import summarize

`gensim` is an efficient open-source Python library for Natural Language Processing (NLP) tasks such as topic modeling, document indexing, and similarity retrieval over corpora. Note that the `gensim.summarization` module was removed in Gensim 4.0, so the code in this notebook requires a 3.x release.

`gensim.summarization.summarize` is the function that produces the summary. It uses a modified version of the TextRank algorithm to select the most important sentences from the input text and returns them as the summary.

**TextRank** is an unsupervised, graph-based ranking algorithm used for Natural Language Processing tasks such as keyword extraction and summary generation. It treats the text as a graph in which the nodes are words or sentences and the edges indicate how strongly the nodes relate to each other. TextRank then applies PageRank to this graph to find the most important nodes, which are used to build the summary or the keyword list.
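
The ranking idea can be sketched in a few lines of plain Python. This is a minimal illustration, not gensim's implementation: the `similarity` function (word overlap normalized by the log of the sentence lengths, loosely following the original TextRank formulation), the damping factor of 0.85, and the fixed iteration count are all assumptions made for this example.

```python
import math


def similarity(a, b):
    """Word overlap between two sentences, normalized by their log lengths."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) == 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, damping=0.85, iterations=50):
    """Return one PageRank-style score per sentence."""
    n = len(sentences)
    # Edge weights of the sentence graph (no self-loops).
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score,
            # proportional to the weight of the edge j -> i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


sentences = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "dogs bark loudly at night",
]
scores = textrank(sentences)
best = max(range(len(sentences)), key=lambda i: scores[i])
print(sentences[best])
```

The third sentence shares no words with the other two, so it has no edges and keeps only the baseline score; the two connected sentences reinforce each other and rank higher.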

Let's consider this example for better understanding.

Extractive summarization is a text summarization technique based on identifying and separating the primary sentences or phrases in the source text to create summary. The extractive summarization systems employ statistical algorithms and linguistic analysis to assess word frequency, sentence position, and keyword occurrence to gauge the importance of each type of textual input. The prioritized sentences are then placed together to develop a brief, information summary. The primary benefit of extractive summarization is its simplicity and the ability for computational deployment. Additionally, the process is relatively straight forward, as the summary is based on the pre-existing text and its extraction. However, in the operational mode, the summaries may lose interpersonal aspects and lack a wholistic context.


In [2]:
# text = (
#     "Extractive summarization is a text summarization technique based on identifying and separating the primary sentences or phrases in the source text to create summary. The extractive summarization systems employ statistical algorithms and linguistic analysis to assess word frequency, sentence position, and keyword occurrence to gauge the importance of each type of textual input. The prioritized sentences are then placed together to develop a brief, information summary. The primary benefit of extractive summarization is its simplicity and the ability for computational deployment. Additionally, the process is relatively straight forward, as the summary is based on the pre-existing text and its extraction. However, in the operational mode, the summaries may lose interpersonal aspects and lack a wholistic context."
# )
text = input("Enter text: ")
print(text)

('Extractive summarization is a text summarization technique based on '
 'identifying and separating the primary sentences or phrases in the source '
 'text to create summary. The extractive summarization systems employ '
 'statistical algorithms and linguistic analysis to assess word frequency, '
 'sentence position, and keyword occurrence to gauge the importance of each '
 'type of textual input. The prioritized sentences are then placed together to '
 'develop a brief, information summary. The primary benefit of extractive '
 'summarization is its simplicity and the ability for computational '
 'deployment. Additionally, the process is relatively straight forward, as the '
 'summary is based on the pre-existing text and its extraction. However, in '
 'the operational mode, the summaries may lose interpersonal aspects and lack '
 'a wholistic context')


When the input text is passed to `summarize` as a string, the function returns a summary built from that text. With the default `ratio=0.2`, about 20% of the sentences are kept.

In [3]:
print(summarize(text))

('Extractive summarization is a text summarization technique based on '
 'identifying and separating the primary sentences or phrases in the source '
 'text to create summary.')


Because this is extractive summarization, the output contains no newly written text: it is simply the part of the input that the algorithm ranks as most significant among all the sentences.
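
This property is easy to check. The sketch below uses an excerpt of the example text and the one-sentence summary shown above; the helper name `is_extractive` is hypothetical and only tests that each summary line occurs verbatim in the source.

```python
# Excerpt (first two sentences) of the example text used in this notebook.
text = (
    "Extractive summarization is a text summarization technique based on "
    "identifying and separating the primary sentences or phrases in the source "
    "text to create summary. The extractive summarization systems employ "
    "statistical algorithms and linguistic analysis to assess word frequency, "
    "sentence position, and keyword occurrence to gauge the importance of each "
    "type of textual input."
)

# Summary copied from the output of summarize(text) above.
summary = (
    "Extractive summarization is a text summarization technique based on "
    "identifying and separating the primary sentences or phrases in the source "
    "text to create summary."
)


def is_extractive(summary, source):
    """True if every non-empty summary line appears verbatim in the source."""
    return all(line in source for line in summary.split("\n") if line)


print(is_extractive(summary, text))
```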

In [4]:
print(summarize(text, ratio=0.6))

('Extractive summarization is a text summarization technique based on '
 'identifying and separating the primary sentences or phrases in the source '
 'text to create summary.\n'
 'The primary benefit of extractive summarization is its simplicity and the '
 'ability for computational deployment.\n'
 'Additionally, the process is relatively straight forward, as the summary is '
 'based on the pre-existing text and its extraction.')


In [5]:
print(summarize(text, split=True))

['Extractive summarization is a text summarization technique based on '
 'identifying and separating the primary sentences or phrases in the source '
 'text to create summary.']


In [6]:
print(summarize(text, word_count=50))

('Extractive summarization is a text summarization technique based on '
 'identifying and separating the primary sentences or phrases in the source '
 'text to create summary.\n'
 'Additionally, the process is relatively straight forward, as the summary is '
 'based on the pre-existing text and its extraction.')


Each call above passes the input text plus one optional parameter: `ratio`, `split`, or `word_count`.

*ratio* controls the length of the summary as a fraction of the number of sentences in the input text.

*split*, when set to True, returns the summary as a list of sentences instead of a single newline-separated string.

*word_count* specifies the maximum number of words in the summary.
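
To make the three parameters concrete, here is a rough sketch of how they could act on a list of already scored sentences. It is an illustration under assumptions, not gensim's code: the function name `select_sentences`, the strict word budget, and the rule that `word_count` takes precedence over `ratio` when both are given are choices made for this example.

```python
def select_sentences(sentences, scores, ratio=0.2, word_count=None, split=False):
    """Pick the top-scoring sentences, then restore document order."""
    # Sentence indices from highest to lowest score.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    if word_count is None:
        # Keep a fixed fraction of the sentences.
        chosen = ranked[: int(len(sentences) * ratio)]
    else:
        # Keep adding sentences while the word budget is not exceeded.
        chosen, total = [], 0
        for i in ranked:
            n_words = len(sentences[i].split())
            if total + n_words > word_count:
                break
            chosen.append(i)
            total += n_words

    picked = [sentences[i] for i in sorted(chosen)]
    return picked if split else "\n".join(picked)


demo_sentences = ["a b c", "d e", "f g h i", "j", "k l"]
demo_scores = [0.1, 0.5, 0.3, 0.9, 0.2]

print(select_sentences(demo_sentences, demo_scores, ratio=0.4))
print(select_sentences(demo_sentences, demo_scores, ratio=0.4, split=True))
print(select_sentences(demo_sentences, demo_scores, word_count=5))
```

With six sentences, a fraction of 0.2 rounds down to one sentence and 0.6 rounds down to three, which is consistent with the outputs shown above.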


Keyword Identification
======================

As mentioned earlier, this module also supports keyword extraction. Keyword extraction works in the same way as summary generation (i.e. sentence extraction), in that the algorithm tries to find words that are important or seem to be representative of the text as a whole.

In [7]:
from gensim.summarization import keywords

The `gensim.summarization.keywords` function identifies and extracts the most important words and phrases from the text.

Like `summarize`, it is based on a variation of TextRank: the words of the text form the nodes of a graph, and the highest-ranked nodes are returned as keywords.
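
As with sentence extraction, keyword ranking can be pictured as a graph, this time over words. The following is a simplified sketch and not gensim's implementation: the co-occurrence window of 2, the tiny stopword set, the punctuation stripping, and the function name `rank_keywords` are all assumptions made for illustration.

```python
from collections import defaultdict

# Illustrative stopword list only; a real system would use a fuller one.
STOPWORDS = frozenset({"the", "a", "an", "of", "to", "and", "is", "in", "on"})


def rank_keywords(text, window=2, damping=0.85, iterations=30):
    """Rank words by a PageRank-style score on a co-occurrence graph."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOPWORDS]

    # Link every word to the words that follow it within the window.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbour's score.
        scores = {
            w: (1 - damping)
            + damping * sum(scores[v] / len(neighbours[v]) for v in neighbours[w])
            for w in neighbours
        }
    return sorted(scores, key=scores.get, reverse=True)


sample = (
    "Extractive summarization selects sentences from the text. "
    "Keyword extraction selects words from the text."
)
print(rank_keywords(sample)[:3])
```

Words that co-occur with many different neighbours collect more score, which is why frequent, well-connected terms tend to surface as keywords.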

In [8]:
print(keywords(text))

('sentences\n'
 'sentence\n'
 'summarization\n'
 'text\n'
 'deployment\n'
 'straight\n'
 'interpersonal\n'
 'summary\n'
 'summaries\n'
 'information')


In [9]:
print(keywords(text, lemmatize=True))

('sentence\n'
 'summarization\n'
 'text\n'
 'straight\n'
 'deployment\n'
 'interpersonal\n'
 'information\n'
 'summaries')


In [10]:
print(keywords(text, split=True))

['sentences',
 'sentence',
 'summarization',
 'text',
 'straight',
 'deployment',
 'interpersonal',
 'summary',
 'summaries',
 'information']


In [11]:
print(keywords(text, ratio=0.4))

('summarization\n'
 'primary sentences\n'
 'frequency sentence\n'
 'text\n'
 'deployment\n'
 'interpersonal\n'
 'straight\n'
 'information\n'
 'summary\n'
 'summaries\n'
 'word\n'
 'technique\n'
 'systems employ statistical\n'
 'wholistic')


Similarly, each `keywords` call above passes the input text plus one optional parameter: `lemmatize`, `split`, or `ratio`.

*lemmatize*, when set to True, lemmatizes the keywords. Lemmatization reduces a word to its base form, so inflected variants are merged into a single entry (compare `sentences` and `sentence` in the default output with the single `sentence` in the lemmatized output).

*split*, when set to True, returns the keywords as a list instead of a single newline-separated string.

*ratio* specifies the number of keywords relative to the number of words in the input text.


Let's take a larger example, using a **text file** as input data.

In [12]:
with open('avatar.txt', 'r') as file:
    text1 = file.read()

print(text1)

("In 2154, humans have depleted Earth's natural resources, leading to a severe "
 'energy crisis. The Resources Development Administration (RDA) mines a '
 'valuable mineral Unobtanium on Pandora, a densely forested habitable moon '
 'orbiting Polyphemus, a fictional gas giant in the Alpha Centauri star '
 'system. Pandora, whose atmosphere is poisonous to humans, is inhabited by '
 "the Na'Vi, a species of 10-foot tall (3.0 m), blue-skinned, sapient "
 'humanoids that live in harmony with nature and worship a mother goddess '
 'named Eywa. It takes 6 years to get from Earth to Pandora in cryogenic '
 'sleep.\n'
 '\n'
 "To explore Pandora's biosphere, scientists use Na'Vi-human hybrids (grown "
 'from human + native DNA) called "avatars", operated by genetically matched '
 'humans. Jake Sully (Sam Worthington), a paraplegic former Marine, replaces '
 'his deceased identical twin brother as an operator of one. Jake was leading '
 'a purposeless life on Earth and was contacted by RDA whe

In [13]:
print(summarize(text1))

("To explore Pandora's biosphere, scientists use Na'Vi-human hybrids (grown "
 'from human + native DNA) called "avatars", operated by genetically matched '
 'humans.\n'
 'Tracy (Michelle Rodriguez) is the pilot assigned to Grace and her team of '
 "Na'Vis. While escorting the avatars of Grace and fellow scientist Dr. Norm "
 "Spellman (Joel David Moore), Jake's avatar is attacked by a Thanator (while "
 'they were visiting the school that Grace was operating to teach the '
 'Omaticaya.\n'
 "Colonel Miles Quaritch (Stephen Lang), head of RDA's private security force, "
 'promises Jake that the company will restore his legs if he gathers '
 "information about the Na'Vi and the clan's gathering place, a giant tree "
 'called Hometree, which stands above the richest deposit of Unobtanium in the '
 'area.\n'
 'She even takes Jake to the tree of souls, their most sacred site), he and '
 'Neytiri choose each other as mates.\n'
 "When Quaritch shows a video recording of Jake's attack on the b

In [14]:
print(summarize(text1, ratio=0.7))

("Pandora, whose atmosphere is poisonous to humans, is inhabited by the Na'Vi, "
 'a species of 10-foot tall (3.0 m), blue-skinned, sapient humanoids that live '
 'in harmony with nature and worship a mother goddess named Eywa.\n'
 "To explore Pandora's biosphere, scientists use Na'Vi-human hybrids (grown "
 'from human + native DNA) called "avatars", operated by genetically matched '
 'humans.\n'
 'Jake was leading a purposeless life on Earth and was contacted by RDA when '
 'his brother died.\n'
 'his brother represented a significant investment by RDA, since the avatars '
 'are linked to the human DNA/genome.\n'
 'Since Jake is a twin, he has the same exact DNA as his brother and can take '
 'his place in the Avatar program.\n'
 'Dr. Grace Augustine (Sigourney Weaver), head of the Avatar Program, '
 'considers Sully an inadequate replacement (as she considers Jake a mere '
 'Jarhead) but accepts his assignment as a bodyguard for excursions deep into '
 "Na'Vi territory.\n"
 'Tracy (

In [15]:
print(summarize(text1, split=True))

["To explore Pandora's biosphere, scientists use Na'Vi-human hybrids (grown "
 'from human + native DNA) called "avatars", operated by genetically matched '
 'humans.',
 'Tracy (Michelle Rodriguez) is the pilot assigned to Grace and her team of '
 "Na'Vis. While escorting the avatars of Grace and fellow scientist Dr. Norm "
 "Spellman (Joel David Moore), Jake's avatar is attacked by a Thanator (while "
 'they were visiting the school that Grace was operating to teach the '
 'Omaticaya.',
 "Colonel Miles Quaritch (Stephen Lang), head of RDA's private security force, "
 'promises Jake that the company will restore his legs if he gathers '
 "information about the Na'Vi and the clan's gathering place, a giant tree "
 'called Hometree, which stands above the richest deposit of Unobtanium in the '
 'area.',
 'She even takes Jake to the tree of souls, their most sacred site), he and '
 'Neytiri choose each other as mates.',
 "When Quaritch shows a video recording of Jake's attack on the bulld

In [16]:
print(summarize(text1, word_count=100))

("Colonel Miles Quaritch (Stephen Lang), head of RDA's private security force, "
 'promises Jake that the company will restore his legs if he gathers '
 "information about the Na'Vi and the clan's gathering place, a giant tree "
 'called Hometree, which stands above the richest deposit of Unobtanium in the '
 'area.\n'
 'The clan attempts to transfer Grace from her human body into her avatar with '
 'the aid of the Tree of Souls, but she dies before the process can be '
 'completed.\n'
 'Jake destroys a makeshift bomber before it can reach the Tree of Souls; '
 'Quaritch, wearing an AMP suit, escapes from his own damaged aircraft and '
 "breaks open the avatar link unit containing Jake's human body, exposing it "
 "to Pandora's poisonous atmosphere.")


In [17]:
print(keywords(text1))

('jake\n'
 'grace\n'
 'quaritch\n'
 'neytiri\n'
 'humans\n'
 'human\n'
 'calls\n'
 'hometree\n'
 'tree\n'
 'selfridge\n'
 'predator\n'
 'dna called\n'
 'scientists\n'
 'scientist\n'
 'native\n'
 'natives\n'
 'tracy\n'
 'trudy\n'
 'destroyed\n'
 'destroy\n'
 'destroying\n'
 'destroys\n'
 'avatars\n'
 'avatar\n'
 'resources\n'
 'neural\n'
 'brother\n'
 'chief\n'
 'replaces\n'
 'replacement\n'
 'energy\n'
 'slywanin\n'
 'centauri\n'
 'mineral\n'
 'vortex\n'
 'night\n'
 'sign\n'
 'banshee\n'
 'gathers\n'
 'gathering\n'
 'gather\n'
 'kill\n'
 'killed\n'
 'killing\n'
 'kills\n'
 'tsu\n'
 'pilot\n'
 'force\n'
 'forces\n'
 'forced\n'
 'wildlife unexpectedly\n'
 'sapient\n'
 'rda\n'
 'administration\n'
 'administrator\n'
 'takes\n'
 'escape\n'
 'escapes\n'
 'michelle\n'
 'sully\n'
 'miles\n'
 'forest\n'
 'orders\n'
 'forested habitable moon orbiting\n'
 'considers\n'
 'unites\n'
 'unit')


In [18]:
print(keywords(text1, lemmatize=True))

('jake\n'
 'grace\n'
 'quaritch\n'
 'neytiri\n'
 'human\n'
 'calls\n'
 'hometree\n'
 'tree\n'
 'selfridge\n'
 'predator\n'
 'scientist\n'
 'dna\n'
 'natives\n'
 'tracy\n'
 'trudy\n'
 'destroys\n'
 'avatar\n'
 'resources\n'
 'neural\n'
 'brother\n'
 'chief\n'
 'banshee\n'
 'vortex\n'
 'night\n'
 'slywanin\n'
 'centauri\n'
 'mineral\n'
 'replacement\n'
 'sign\n'
 'gather\n'
 'energy\n'
 'kills\n'
 'tsu\n'
 'pilot\n'
 'forced\n'
 'wildlife unexpectedly\n'
 'sapient\n'
 'rda\n'
 'administrator\n'
 'takes\n'
 'escapes\n'
 'michelle\n'
 'sully\n'
 'miles\n'
 'forest\n'
 'orders\n'
 'considers\n'
 'habitable moon orbiting\n'
 'unit')


In [19]:
print(keywords(text1, ratio=0.05))

('jake\n'
 'grace\n'
 'quaritch\n'
 'neytiri\n'
 'humans\n'
 'human\n'
 'calls\n'
 'hometree\n'
 'tree\n'
 'selfridge\n'
 'predator\n'
 'dna called\n'
 'scientists\n'
 'scientist\n'
 'native\n'
 'natives')


In [20]:
print(keywords(text1, split=True))

['jake',
 'grace',
 'quaritch',
 'neytiri',
 'humans',
 'human',
 'calls',
 'hometree',
 'tree',
 'selfridge',
 'predator',
 'dna called',
 'scientists',
 'scientist',
 'native',
 'natives',
 'tracy',
 'trudy',
 'destroyed',
 'destroy',
 'destroying',
 'destroys',
 'avatars',
 'avatar',
 'resources',
 'neural',
 'brother',
 'chief',
 'energy',
 'banshee',
 'slywanin',
 'sign',
 'mineral',
 'gathers',
 'gathering',
 'gather',
 'vortex',
 'night',
 'replaces',
 'replacement',
 'centauri',
 'kill',
 'killed',
 'killing',
 'kills',
 'tsu',
 'pilot',
 'force',
 'forces',
 'forced',
 'wildlife unexpectedly',
 'sapient',
 'rda',
 'administration',
 'administrator',
 'takes',
 'escape',
 'escapes',
 'michelle',
 'sully',
 'miles',
 'forest',
 'orders',
 'forested habitable moon orbiting',
 'considers',
 'unites',
 'unit']


Next, we use a **PDF** file as input data.

In [21]:
import PyPDF2

`PyPDF2` is a Python library for reading and manipulating PDF files. Its features include:
* Reading PDF files: extract text and other content from PDF files.
* Merging PDF files: combine multiple PDF files into a single file.
* Splitting PDF files: split a PDF file into multiple files.
* Encrypting and decrypting PDFs: add passwords to PDF files for security.
* Rotating pages: rotate pages in PDF files.

In [22]:
reader = PyPDF2.PdfReader('economics_chap1.pdf')

# Concatenate the extracted text of every page into one string.
fulltext = ""
for page in reader.pages:
    fulltext += page.extract_text()

print(fulltext)

('Macroeconomics\n'
 '1\n'
 'Macroeconomics\n'
 'MACRO → Greek word → Makros → Large/aggregate\n'
 'Study of macroeconomics as a separate branch is a recent origin. It started '
 'after the  \n'
 'classification of economics as microeconomics and macroeconomics by Prof. '
 'Ragnar  \n'
 'Frisch in 1933.\n'
 'Historical Review of Macroeconomy\n'
 '16th and 17th Century - Advisers of English merchantilist groups '
 'advocated  \n'
 'policies to the government based on macroeconomic approach\n'
 '18th Century - Physiocrats - the French thinkers tried to analyse the '
 'concept of  \n'
 'national income and wealth\n'
 'Classical Economic Theories - Theories of Adam Smith and J.S. Mill  \n'
 'discussed the determination of national income and wealth and division of  \n'
 'national income into total wages, total rent and total profit. However , '
 'their macro  \n'
 'analysis was combined with micro analysis.\n'
 'Neo-classical Economists - The increasing importance of macroeconomists  \n'
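
The extracted text above keeps the hard line breaks of the PDF layout, so a single sentence can be spread over several lines. A sentence splitter that treats newlines as boundaries may then see each line as a separate sentence. One possible preprocessing step, shown here only as a sketch, is to collapse the line breaks before summarizing. The helper name `unwrap_lines` is hypothetical, and this simple approach also merges headings into the surrounding sentences.

```python
import re


def unwrap_lines(raw):
    """Collapse every run of whitespace, including newlines, into one space."""
    return re.sub(r"\s+", " ", raw).strip()


# A few lines copied from the extracted PDF text above.
sample = (
    "Study of macroeconomics as a separate branch is a recent origin. It started "
    "after the  \n"
    "classification of economics as microeconomics and macroeconomics by Prof. "
    "Ragnar  \n"
    "Frisch in 1933.\n"
)

cleaned = unwrap_lines(sample)
print(cleaned)
```

The cleaned string could then be passed to `summarize` or `keywords` in place of `fulltext`.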
 

In [23]:
print(summarize(fulltext))

('Definition of Government Budget  - It is a financial statement showing '
 'estimated  \n'
 'receipts and estimated expenditures of the government during a fiscal year '
 '(from 1st  \n'
 'Budget impacts the economy through aggregate fiscal discipline and '
 'resource  \n'
 '3. Governement uses its fiscal tools i.e. taxes and subsidies to promote '
 'social  \n'
 '8. The government makes these provisions in its annual budget.\n'
 'expenditure and government receipts.\n'
 '5. During recession or deflation, government can increase its expenditure '
 'and  \n'
 '6. During inflation, government reduces its expenditure and increases its '
 'receipts  \n'
 'i.e government may enforce increase in tax and reduction in subsidies to  \n'
 'increases the rate of economic growth and development.\n'
 '4. Government budget can be an ef fective tool to ensure economic growth in '
 'the  \n'
 'infrastructural base of the economy like government increasing its '
 'expenditure  \n'
 '9. Government also

In [24]:
print(summarize(fulltext, split=True))

['Definition of Government Budget  - It is a financial statement showing '
 'estimated  ',
 'receipts and estimated expenditures of the government during a fiscal year '
 '(from 1st  ',
 'Budget impacts the economy through aggregate fiscal discipline and '
 'resource  ',
 '3. Governement uses its fiscal tools i.e. taxes and subsidies to promote '
 'social  ',
 '8. The government makes these provisions in its annual budget.',
 'expenditure and government receipts.',
 '5. During recession or deflation, government can increase its expenditure '
 'and  ',
 '6. During inflation, government reduces its expenditure and increases its '
 'receipts  ',
 'i.e government may enforce increase in tax and reduction in subsidies to  ',
 'increases the rate of economic growth and development.',
 '4. Government budget can be an ef fective tool to ensure economic growth in '
 'the  ',
 'infrastructural base of the economy like government increasing its '
 'expenditure  ',
 '9. Government also ensures tha

In [25]:
print(summarize(fulltext, ratio=0.1, split=True))

['receipts and estimated expenditures of the government during a fiscal year '
 '(from 1st  ',
 '3. Governement uses its fiscal tools i.e. taxes and subsidies to promote '
 'social  ',
 '5. During recession or deflation, government can increase its expenditure '
 'and  ',
 '6. During inflation, government reduces its expenditure and increases its '
 'receipts  ',
 'i.e government may enforce increase in tax and reduction in subsidies to  ',
 '4. Government budget can be an ef fective tool to ensure economic growth in '
 'the  ',
 'infrastructural base of the economy like government increasing its '
 'expenditure  ',
 'Fiscal Discipline refers to ideal balance between the government expenditure '
 'and  ',
 'Deficit Budget  - When government expenditure is more than receipts',
 '1. Fiscal Deficit refers to excess of total expenditure over total '
 'receipts  ',
 'a. Deficit Financing  - The Central bank may finance the government  ',
 'from the Central Bank i.e RBI against securities or

In [26]:
print(summarize(fulltext, word_count=150))

('receipts and estimated expenditures of the government during a fiscal year '
 '(from 1st  \n'
 '6. During inflation, government reduces its expenditure and increases its '
 'receipts  \n'
 'Deficit Budget  - When government expenditure is more than receipts\n'
 'a. Deficit Financing  - The Central bank may finance the government  \n'
 'Sometimes government borrows money to finance its Budget Deficit.\n'
 'deposits known as Cash Reserve Ratio (CRR) with the Central Bank.\n'
 'It enables the Central Bank to control credit creation done by the  \n'
 'A commercial bank can borrow from the Central  \n'
 'A decrease in Bank Rate by RBI reduces the cost of borrowings for '
 'commercial  \n'
 'to take loans so it increases the ability of commercial banks to create '
 'credit i.e.\n'
 'government securities in the open market by the Central Bank.\n'
 'reserves of the commercial bank which in turn increases the credit '
 'creation  \n'
 'An increase in CRR reduces the cash reserves of the comm

In [27]:
print(keywords(fulltext))

('bank\n'
 'banking\n'
 'government\n'
 'governments\n'
 'commercial banks\n'
 'governement uses\n'
 'macroeconomics\n'
 'macroeconomic\n'
 'budget\n'
 'economics\n'
 'economic\n'
 'general\n'
 'generates\n'
 'generation\n'
 'generally\n'
 'generations\n'
 'receipts\n'
 'receipt\n'
 'taxes\n'
 'tax\n'
 'deficit\n'
 'examples\n'
 'example\n'
 'rate\n'
 'like\n'
 'fiscal\n'
 'borrowing\n'
 'borrowings\n'
 'borrow\n'
 'borrows\n'
 'borrowers\n'
 'borrower\n'
 'main credit\n'
 'certain\n'
 'public\n'
 'rbi\n'
 'deposits\n'
 'deposit\n'
 'depositing\n'
 'deposited\n'
 'function\n'
 'functions\n'
 'functioning\n'
 'period\n'
 'periodic\n'
 'income\n'
 'securities\n'
 'security\n'
 'reducing\n'
 'reduce\n'
 'reduces\n'
 'reduced\n'
 'financial\n'
 'money\n'
 'loans\n'
 'loan\n'
 'foreign\n'
 'lending rates\n'
 'controls\n'
 'control\n'
 'controller\n'
 'controlling\n'
 'controlled\n'
 'increasing\n'
 'increase\n'
 'increases\n'
 'increased\n'
 'direct\n'
 'direction\n'
 'directions\n'
 'polic

In [28]:
print(keywords(fulltext, lemmatize=True))

('governments\n'
 'commercial banks\n'
 'macroeconomic\n'
 'budget\n'
 'economic\n'
 'generations\n'
 'receipt\n'
 'tax\n'
 'deficit\n'
 'example\n'
 'rates\n'
 'like\n'
 'fiscal\n'
 'borrower\n'
 'main credit\n'
 'certain\n'
 'public\n'
 'rbi\n'
 'deposited\n'
 'functioning\n'
 'periodic\n'
 'income\n'
 'security\n'
 'reduced\n'
 'financial\n'
 'money\n'
 'loan\n'
 'foreign\n'
 'controlled\n'
 'increased\n'
 'directions\n'
 'policy\n'
 'central\n'
 'sectors\n'
 'called\n'
 'expenditure\n'
 'margins\n'
 'reserves\n'
 'revenue\n'
 'total\n'
 'capital\n'
 'higher\n'
 'provident\n'
 'keynes\n'
 'type\n'
 'currencies\n'
 'investment\n'
 'india\n'
 'deflation\n'
 'inflation\n'
 'price\n'
 'crr\n'
 'legal\n'
 'sources\n'
 'lend\n'
 'making\n'
 'market\n'
 'words\n'
 'dif ferent\n'
 'schedule\n'
 'service\n'
 'small\n'
 'required\n'
 'creating capacity\n'
 'cash\n'
 'enterprises\n'
 'save\n'
 'objective\n'
 'productive\n'
 'growth\n'
 'liquid\n'
 'leads\n'
 'short\n'
 'measure\n'
 'burden\n'


In [29]:
print(keywords(fulltext, split=True))

['bank',
 'banking',
 'government',
 'governments',
 'commercial banks',
 'governement uses',
 'macroeconomics',
 'macroeconomic',
 'budget',
 'economics',
 'economic',
 'general',
 'generates',
 'generation',
 'generally',
 'generations',
 'receipts',
 'receipt',
 'taxes',
 'tax',
 'deficit',
 'examples',
 'example',
 'rate',
 'like',
 'fiscal',
 'borrowing',
 'borrowings',
 'borrow',
 'borrows',
 'borrowers',
 'borrower',
 'main credit',
 'certain',
 'public',
 'rbi',
 'deposits',
 'deposit',
 'depositing',
 'deposited',
 'function',
 'functions',
 'functioning',
 'period',
 'periodic',
 'income',
 'securities',
 'security',
 'reducing',
 'reduce',
 'reduces',
 'reduced',
 'financial',
 'money',
 'loans',
 'loan',
 'foreign',
 'lending rates',
 'controls',
 'control',
 'controller',
 'controlling',
 'controlled',
 'increasing',
 'increase',
 'increases',
 'increased',
 'direct',
 'direction',
 'directions',
 'policies',
 'policy',
 'central',
 'sector',
 'sectors',
 'called',
 'expen

In [30]:
print(keywords(fulltext, ratio=0.1))

('bank\n'
 'banking\n'
 'government\n'
 'governement\n'
 'governments\n'
 'commercial banks\n'
 'credit\n'
 'macroeconomics\n'
 'macroeconomic\n'
 'budget\n'
 'economics\n'
 'economic\n'
 'general\n'
 'generates\n'
 'generation\n'
 'generally\n'
 'generations\n'
 'receipts\n'
 'receipt\n'
 'taxes\n'
 'tax\n'
 'deficit\n'
 'examples\n'
 'example\n'
 'rate\n'
 'like\n'
 'fiscal\n'
 'borrowing\n'
 'borrowings\n'
 'borrow\n'
 'borrows\n'
 'borrowers\n'
 'borrower\n'
 'certain\n'
 'public\n'
 'rbi\n'
 'deposits\n'
 'deposit\n'
 'depositing\n'
 'deposited\n'
 'function\n'
 'functions\n'
 'functioning\n'
 'period\n'
 'periodic\n'
 'income\n'
 'securities\n'
 'security\n'
 'reducing\n'
 'reduce\n'
 'reduces\n'
 'reduced\n'
 'financial\n'
 'money\n'
 'loans\n'
 'loan\n'
 'foreign\n'
 'lending rates\n'
 'controls\n'
 'control\n'
 'controller\n'
 'controlling\n'
 'controlled\n'
 'increasing\n'
 'increase\n'
 'increases\n'
 'increased\n'
 'direct\n'
 'direction\n'
 'directions\n'
 'policies\n'
 'p