In [1]:
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation

In [2]:
nlp=spacy.load('en_core_web_sm')

#Text Cleaning

In [3]:
text = """ In recent years, Artificial Intelligence (AI) has transformed the way organizations operate across the globe. Companies such as Google, Microsoft, and Amazon have invested billions of dollars into research and development to improve intelligent systems that can learn from data and make decisions with minimal human intervention. These advancements have led to the rapid growth of fields like Machine Learning, Natural Language Processing, and Computer Vision.

In India, the adoption of AI has accelerated significantly, particularly in sectors such as healthcare, education, and finance. For example, hospitals in Bangalore and Delhi now use AI-powered diagnostic tools to detect diseases like cancer at an early stage. Similarly, banks such as State Bank of India (SBI) and HDFC Bank have deployed chatbots to assist customers with account-related queries, improving both efficiency and user satisfaction.

Education has also benefited from AI-driven platforms. Universities like IIT Bombay and IIT Delhi are incorporating data science and AI courses into their curricula to prepare students for future job markets. Online learning platforms such as Coursera, edX, and Udemy provide personalized learning experiences by analyzing user behavior and recommending relevant courses. As a result, students from remote areas of Rajasthan and Uttar Pradesh now have access to high-quality educational resources.

Despite these benefits, the rise of AI has raised several ethical and social concerns. Experts from organizations like the World Economic Forum (WEF) and UNESCO have warned about issues related to data privacy, algorithmic bias, and job displacement. In 2023, a report published by UNESCO emphasized the need for transparent and fair AI systems to ensure that technology benefits society as a whole.

Governments around the world are responding by introducing regulations and policies. The European Union proposed the AI Act, aiming to regulate high-risk AI applications, while the Government of India announced initiatives such as Digital India and National AI Strategy to promote responsible AI development. Policymakers believe that a balanced approach can foster innovation while protecting citizens' rights.

Looking ahead, the future of AI appears promising yet challenging. Researchers predict that AI will continue to evolve, enabling smarter cities, autonomous vehicles, and advanced healthcare systems. However, collaboration between governments, private organizations, and academic institutions will be essential to address ethical challenges and ensure sustainable growth. Ultimately, AI is not just a technological revolution but a societal transformation that will shape the way humans live and work in the decades to come. """


In [4]:
stopwords=list(STOP_WORDS)

In [5]:
doc=nlp(text) #Applied tokenization

In [6]:
tokens=[token.text for token in doc]

In [7]:
print(tokens)

[' ', 'In', 'recent', 'years', ',', 'Artificial', 'Intelligence', '(', 'AI', ')', 'has', 'transformed', 'the', 'way', 'organizations', 'operate', 'across', 'the', 'globe', '.', 'Companies', 'such', 'as', 'Google', ',', 'Microsoft', ',', 'and', 'Amazon', 'have', 'invested', 'billions', 'of', 'dollars', 'into', 'research', 'and', 'development', 'to', 'improve', 'intelligent', 'systems', 'that', 'can', 'learn', 'from', 'data', 'and', 'make', 'decisions', 'with', 'minimal', 'human', 'intervention', '.', 'These', 'advancements', 'have', 'led', 'to', 'the', 'rapid', 'growth', 'of', 'fields', 'like', 'Machine', 'Learning', ',', 'Natural', 'Language', 'Processing', ',', 'and', 'Computer', 'Vision', '.', '\n\n', 'In', 'India', ',', 'the', 'adoption', 'of', 'AI', 'has', 'accelerated', 'significantly', ',', 'particularly', 'in', 'sectors', 'such', 'as', 'healthcare', ',', 'education', ',', 'and', 'finance', '.', 'For', 'example', ',', 'hospitals', 'in', 'Bangalore', 'and', 'Delhi', 'now', 'use', 

In [8]:
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [9]:
punctuation=punctuation + '\n'

In [10]:
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

In [11]:
word_frequencies={}
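# Count every token that is neither a stop word nor punctuation.
# Keys keep the token's original casing (e.g. 'AI', 'India').
for word in doc:
  if word.text.lower() not in stopwords:
    if word.text.lower() not in punctuation:
      if word.text not in word_frequencies.keys():
        word_frequencies[word.text]=1
      else:
        word_frequencies[word.text]+=1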
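
The loop above stores counts under each token's original casing and also counts whitespace tokens such as ' ' and '\n\n'. A minimal sketch of a case-insensitive variant follows. It uses a plain token list as a stand-in for the spaCy `doc`; the `count_words` helper, the demo tokens and the tiny stop-word set are illustrative assumptions, not part of the notebook.

```python
from collections import Counter
from string import punctuation

def count_words(tokens, stopwords):
    """Count lowercase tokens, skipping stop words, punctuation and whitespace."""
    counts = Counter()
    for tok in tokens:
        key = tok.lower()
        if key in stopwords:
            continue
        # Whitespace-only tokens such as ' ' or '\n\n' are empty after strip()
        if not key.strip() or all(ch in punctuation for ch in key):
            continue
        counts[key] += 1
    return counts

# Hypothetical token list and stop-word set, for illustration only
demo_tokens = [' ', 'AI', 'has', 'transformed', 'the', 'way', ',', 'AI', '\n\n', 'ai', '.']
demo_counts = count_words(demo_tokens, stopwords={'has', 'the'})
print(demo_counts)
```

Keying on the lowercase form means a later lookup with `word.text.lower()` finds the same entry regardless of how the word was capitalized in the text.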

In [12]:
word_frequencies

{' ': 1,
 'recent': 1,
 'years': 1,
 'Artificial': 1,
 'Intelligence': 1,
 'AI': 14,
 'transformed': 1,
 'way': 2,
 'organizations': 3,
 'operate': 1,
 'globe': 1,
 'Companies': 1,
 'Google': 1,
 'Microsoft': 1,
 'Amazon': 1,
 'invested': 1,
 'billions': 1,
 'dollars': 1,
 'research': 1,
 'development': 2,
 'improve': 1,
 'intelligent': 1,
 'systems': 3,
 'learn': 1,
 'data': 3,
 'decisions': 1,
 'minimal': 1,
 'human': 1,
 'intervention': 1,
 'advancements': 1,
 'led': 1,
 'rapid': 1,
 'growth': 2,
 'fields': 1,
 'like': 4,
 'Machine': 1,
 'Learning': 1,
 'Natural': 1,
 'Language': 1,
 'Processing': 1,
 'Computer': 1,
 'Vision': 1,
 '\n\n': 5,
 'India': 4,
 'adoption': 1,
 'accelerated': 1,
 'significantly': 1,
 'particularly': 1,
 'sectors': 1,
 'healthcare': 2,
 'education': 1,
 'finance': 1,
 'example': 1,
 'hospitals': 1,
 'Bangalore': 1,
 'Delhi': 2,
 'use': 1,
 'powered': 1,
 'diagnostic': 1,
 'tools': 1,
 'detect': 1,
 'diseases': 1,
 'cancer': 1,
 'early': 1,
 'stage': 1,
 'Si

In [13]:
max_frequency=max(word_frequencies.values())

In [14]:
max_frequency

14

In [15]:
for word in word_frequencies.keys():
  word_frequencies[word]=word_frequencies[word]/max_frequency
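
Dividing every count by the largest count rescales the weights so that the most frequent word gets 1.0 and every other word a fraction of it. A small sketch using counts copied from the `word_frequencies` output above (the `normalize` helper is an illustrative name, not part of the notebook):

```python
def normalize(counts):
    """Scale each count by the maximum count so the top word gets 1.0."""
    top = max(counts.values())
    return {word: n / top for word, n in counts.items()}

# A few counts copied from the word_frequencies output above
sample = {'AI': 14, 'organizations': 3, 'way': 2, 'recent': 1}
weights = normalize(sample)
print(weights['AI'])             # 14/14
print(weights['organizations'])  # 3/14
```

Unlike the in-place loop, this version builds a new dictionary, so the raw counts stay available for inspection.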

In [16]:
print(word_frequencies)

{' ': 0.07142857142857142, 'recent': 0.07142857142857142, 'years': 0.07142857142857142, 'Artificial': 0.07142857142857142, 'Intelligence': 0.07142857142857142, 'AI': 1.0, 'transformed': 0.07142857142857142, 'way': 0.14285714285714285, 'organizations': 0.21428571428571427, 'operate': 0.07142857142857142, 'globe': 0.07142857142857142, 'Companies': 0.07142857142857142, 'Google': 0.07142857142857142, 'Microsoft': 0.07142857142857142, 'Amazon': 0.07142857142857142, 'invested': 0.07142857142857142, 'billions': 0.07142857142857142, 'dollars': 0.07142857142857142, 'research': 0.07142857142857142, 'development': 0.14285714285714285, 'improve': 0.07142857142857142, 'intelligent': 0.07142857142857142, 'systems': 0.21428571428571427, 'learn': 0.07142857142857142, 'data': 0.21428571428571427, 'decisions': 0.07142857142857142, 'minimal': 0.07142857142857142, 'human': 0.07142857142857142, 'intervention': 0.07142857142857142, 'advancements': 0.07142857142857142, 'led': 0.07142857142857142, 'rapid': 0.

#Sentence Tokenization

In [17]:
sentence_tokens=[sent for sent in doc.sents]
print(sentence_tokens)

[ In recent years, Artificial Intelligence (AI) has transformed the way organizations operate across the globe., Companies such as Google, Microsoft, and Amazon have invested billions of dollars into research and development to improve intelligent systems that can learn from data and make decisions with minimal human intervention., These advancements have led to the rapid growth of fields like Machine Learning, Natural Language Processing, and Computer Vision.

, In India, the adoption of AI has accelerated significantly, particularly in sectors such as healthcare, education, and finance., For example, hospitals in Bangalore and Delhi now use AI-powered diagnostic tools to detect diseases like cancer at an early stage., Similarly, banks such as State Bank of India (SBI) and HDFC Bank have deployed chatbots to assist customers with account-related queries, improving both efficiency and user satisfaction.

, Education has also benefited from AI-driven platforms., Universities like IIT Bo

In [18]:
len(sentence_tokens)

20

In [19]:
sentence_score={}
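# Score each sentence as the sum of the normalized weights of its words.
# NOTE: word_frequencies is keyed by the original token text, so this lowercase
# lookup misses words that only occur capitalized (e.g. 'AI', 'India').
for sent in sentence_tokens:
  for word in sent:
    if word.text.lower() in word_frequencies.keys():
      if sent not in sentence_score.keys():
        sentence_score[sent]=word_frequencies[word.text.lower()]
      else:
        sentence_score[sent]+=word_frequencies[word.text.lower()]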
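
Each sentence's score is the sum of the normalized weights of its words. The sketch below reproduces that idea on plain strings, with the weights keyed by lowercase words so the lookup casing matches. `score_sentences`, the regex tokenizer and the sample weights are assumptions for illustration, not the notebook's spaCy pipeline.

```python
import re

def score_sentences(sentences, weights):
    """Sum the weight of every known word in each sentence."""
    scores = {}
    for sent in sentences:
        words = re.findall(r"[a-z0-9]+", sent.lower())
        total = sum(weights.get(w, 0.0) for w in words)
        if total > 0:  # sentences with no weighted word get no entry
            scores[sent] = total
    return scores

# Hypothetical weights keyed by lowercase words
demo_weights = {'ai': 1.0, 'data': 0.25, 'systems': 0.25}
demo_sentences = [
    "AI systems learn from data.",
    "Education has also benefited.",
]
demo_scores = score_sentences(demo_sentences, demo_weights)
print(demo_scores)
```

Because the score is an unnormalized sum, longer sentences tend to score higher; dividing by the number of words is one possible adjustment if that bias is unwanted.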


In [20]:
print(sentence_score)

{ In recent years, Artificial Intelligence (AI) has transformed the way organizations operate across the globe.: 0.7857142857142856, Companies such as Google, Microsoft, and Amazon have invested billions of dollars into research and development to improve intelligent systems that can learn from data and make decisions with minimal human intervention.: 1.357142857142857, These advancements have led to the rapid growth of fields like Machine Learning, Natural Language Processing, and Computer Vision.

: 1.2142857142857142, In India, the adoption of AI has accelerated significantly, particularly in sectors such as healthcare, education, and finance.: 0.6428571428571428, For example, hospitals in Bangalore and Delhi now use AI-powered diagnostic tools to detect diseases like cancer at an early stage.: 1.0714285714285712, Similarly, banks such as State Bank of India (SBI) and HDFC Bank have deployed chatbots to assist customers with account-related queries, improving both efficiency and use

#Select 30% sentences


In [21]:
from heapq import nlargest

In [22]:
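# int() wraps only len(), so select_length is a float; it is truncated later via int(select_length)
select_length=int(len(sentence_tokens))*0.3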

In [23]:
print(select_length)

6.0


#Getting the summary

In [24]:
summary=nlargest(n=int(select_length),iterable=sentence_score,key=sentence_score.get)
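
`nlargest` returns the selected sentences ordered by score, so the summary does not follow the order of the source text. One possible variation, shown here on plain strings with made-up scores, re-sorts the picks by their original position (`pick_top` is a hypothetical helper, not part of the notebook):

```python
from heapq import nlargest

def pick_top(sentences, scores, n):
    """Return the n best-scoring sentences in their original order."""
    best = nlargest(n, scores, key=scores.get)
    return sorted(best, key=sentences.index)

# Illustrative sentences and scores
order = ["first", "second", "third", "fourth"]
demo_scores = {"first": 0.4, "second": 1.2, "third": 0.1, "fourth": 2.0}
by_score = nlargest(2, demo_scores, key=demo_scores.get)
by_position = pick_top(order, demo_scores, 2)
print(by_score)     # highest score first
print(by_position)  # same sentences, source order
```

Keeping the source order usually reads more naturally, at the cost of one extra sort over the selected sentences.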

In [25]:
print(summary)

[In 2023, a report published by UNESCO emphasized the need for transparent and fair AI systems to ensure that technology benefits society as a whole.

, Experts from organizations like the World Economic Forum (WEF) and UNESCO have warned about issues related to data privacy, algorithmic bias, and job displacement., Universities like IIT Bombay and IIT Delhi are incorporating data science and AI courses into their curricula to prepare students for future job markets., Companies such as Google, Microsoft, and Amazon have invested billions of dollars into research and development to improve intelligent systems that can learn from data and make decisions with minimal human intervention., Similarly, banks such as State Bank of India (SBI) and HDFC Bank have deployed chatbots to assist customers with account-related queries, improving both efficiency and user satisfaction.

, However, collaboration between governments, private organizations, and academic institutions will be essential to ad

In [26]:
#Combine these sentences together
final_summary=[sent.text for sent in summary]  # each item in summary is a sentence span

In [27]:
final_summary

['In 2023, a report published by UNESCO emphasized the need for transparent and fair AI systems to ensure that technology benefits society as a whole.\n\n',
 'Experts from organizations like the World Economic Forum (WEF) and UNESCO have warned about issues related to data privacy, algorithmic bias, and job displacement.',
 'Universities like IIT Bombay and IIT Delhi are incorporating data science and AI courses into their curricula to prepare students for future job markets.',
 'Companies such as Google, Microsoft, and Amazon have invested billions of dollars into research and development to improve intelligent systems that can learn from data and make decisions with minimal human intervention.',
 'Similarly, banks such as State Bank of India (SBI) and HDFC Bank have deployed chatbots to assist customers with account-related queries, improving both efficiency and user satisfaction.\n\n',
 'However, collaboration between governments, private organizations, and academic institutions wil

In [28]:
summary=' '.join(final_summary)

In [29]:
print(summary)

In 2023, a report published by UNESCO emphasized the need for transparent and fair AI systems to ensure that technology benefits society as a whole.

 Experts from organizations like the World Economic Forum (WEF) and UNESCO have warned about issues related to data privacy, algorithmic bias, and job displacement. Universities like IIT Bombay and IIT Delhi are incorporating data science and AI courses into their curricula to prepare students for future job markets. Companies such as Google, Microsoft, and Amazon have invested billions of dollars into research and development to improve intelligent systems that can learn from data and make decisions with minimal human intervention. Similarly, banks such as State Bank of India (SBI) and HDFC Bank have deployed chatbots to assist customers with account-related queries, improving both efficiency and user satisfaction.

 However, collaboration between governments, private organizations, and academic institutions will be essential to address 

In [30]:
#Comparing
len(text) #length of original text

2748

In [31]:
len(summary) #length of summary

1049
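
Measured in characters, the summary comes to a little over a third of the original, which is more than the 30% target. The target applies to the number of sentences, and because scores are unnormalized sums, longer sentences are favored. The arithmetic, using the two lengths printed above:

```python
original_len = 2748  # len(text) from the cell above
summary_len = 1049   # len(summary) from the cell above

ratio = summary_len / original_len
print(f"{ratio:.1%}")  # share of the original, by characters
```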

# Text Summarization on PDF

In [32]:
pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [33]:
import PyPDF2

In [36]:
pdf_text=[]
# The with-block closes the file even if extraction raises an error
with open('/content/Long_Text_for_NLP.pdf','rb') as f:
    pdf_reader=PyPDF2.PdfReader(f)
    for page in pdf_reader.pages:
        pdf_text.append(page.extract_text())

In [37]:
f=pdf_text
f

['In recent years, Artificial Intelligence (AI) has transformed the way organizations operate across the\nglobe. Companies such as Google, Microsoft, and Amazon have invested billions of dollars into\nresearch and development to improve intelligent systems that can learn from data and make\ndecisions with minimal human intervention. These advancements have led to the rapid growth of\nfields like Machine Learning, Natural Language Processing, and Computer Vision.\nIn India, the adoption of AI has accelerated significantly, particularly in sectors such as healthcare,\neducation, and finance. For example, hospitals in Bangalore and Delhi now use AI-powered\ndiagnostic tools to detect diseases like cancer at an early stage. Similarly, banks such as State\nBank of India (SBI) and HDFC Bank have deployed chatbots to assist customers with\naccount-related queries, improving both efficiency and user satisfaction.\nEducation has also benefited from AI-driven platforms. Universities like IIT Bom

#Text Cleaning

In [38]:
stopwords=list(STOP_WORDS)

In [39]:
nlp=spacy.load('en_core_web_sm')

In [41]:
doc=nlp(' '.join(f)) #Applied tokenization

In [40]:
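# Executed before the doc cell above (In [40] vs In [41]), so the tokens printed below still come from the first text
tokens=[token.text for token in doc]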

In [42]:
print(tokens)

[' ', 'In', 'recent', 'years', ',', 'Artificial', 'Intelligence', '(', 'AI', ')', 'has', 'transformed', 'the', 'way', 'organizations', 'operate', 'across', 'the', 'globe', '.', 'Companies', 'such', 'as', 'Google', ',', 'Microsoft', ',', 'and', 'Amazon', 'have', 'invested', 'billions', 'of', 'dollars', 'into', 'research', 'and', 'development', 'to', 'improve', 'intelligent', 'systems', 'that', 'can', 'learn', 'from', 'data', 'and', 'make', 'decisions', 'with', 'minimal', 'human', 'intervention', '.', 'These', 'advancements', 'have', 'led', 'to', 'the', 'rapid', 'growth', 'of', 'fields', 'like', 'Machine', 'Learning', ',', 'Natural', 'Language', 'Processing', ',', 'and', 'Computer', 'Vision', '.', '\n\n', 'In', 'India', ',', 'the', 'adoption', 'of', 'AI', 'has', 'accelerated', 'significantly', ',', 'particularly', 'in', 'sectors', 'such', 'as', 'healthcare', ',', 'education', ',', 'and', 'finance', '.', 'For', 'example', ',', 'hospitals', 'in', 'Bangalore', 'and', 'Delhi', 'now', 'use', 

In [43]:
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'

In [44]:
punctuation=punctuation + '\n'

In [45]:
punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n\n'

In [46]:
word_frequencies={}
for word in doc:
  if word.text.lower() not in stopwords:
    if word.text.lower() not in punctuation:
      if word.text not in word_frequencies.keys():
        word_frequencies[word.text]=1
      else:
        word_frequencies[word.text]+=1

In [47]:
word_frequencies

{'recent': 1,
 'years': 1,
 'Artificial': 1,
 'Intelligence': 1,
 'AI': 13,
 'transformed': 1,
 'way': 1,
 'organizations': 3,
 'operate': 1,
 'globe': 1,
 'Companies': 1,
 'Google': 1,
 'Microsoft': 1,
 'Amazon': 1,
 'invested': 1,
 'billions': 1,
 'dollars': 1,
 'research': 1,
 'development': 2,
 'improve': 1,
 'intelligent': 1,
 'systems': 3,
 'learn': 1,
 'data': 3,
 'decisions': 1,
 'minimal': 1,
 'human': 1,
 'intervention': 1,
 'advancements': 1,
 'led': 1,
 'rapid': 1,
 'growth': 2,
 'fields': 1,
 'like': 4,
 'Machine': 1,
 'Learning': 1,
 'Natural': 1,
 'Language': 1,
 'Processing': 1,
 'Computer': 1,
 'Vision': 1,
 'India': 4,
 'adoption': 1,
 'accelerated': 1,
 'significantly': 1,
 'particularly': 1,
 'sectors': 1,
 'healthcare': 2,
 'education': 1,
 'finance': 1,
 'example': 1,
 'hospitals': 1,
 'Bangalore': 1,
 'Delhi': 2,
 'use': 1,
 'powered': 1,
 'diagnostic': 1,
 'tools': 1,
 'detect': 1,
 'diseases': 1,
 'cancer': 1,
 'early': 1,
 'stage': 1,
 'Similarly': 1,
 'banks'

In [48]:
max_frequency=max(word_frequencies.values())

In [49]:
max_frequency

13

In [50]:
for word in word_frequencies.keys():
  word_frequencies[word]=word_frequencies[word]/max_frequency

In [51]:
print(word_frequencies)

{'recent': 0.07692307692307693, 'years': 0.07692307692307693, 'Artificial': 0.07692307692307693, 'Intelligence': 0.07692307692307693, 'AI': 1.0, 'transformed': 0.07692307692307693, 'way': 0.07692307692307693, 'organizations': 0.23076923076923078, 'operate': 0.07692307692307693, 'globe': 0.07692307692307693, 'Companies': 0.07692307692307693, 'Google': 0.07692307692307693, 'Microsoft': 0.07692307692307693, 'Amazon': 0.07692307692307693, 'invested': 0.07692307692307693, 'billions': 0.07692307692307693, 'dollars': 0.07692307692307693, 'research': 0.07692307692307693, 'development': 0.15384615384615385, 'improve': 0.07692307692307693, 'intelligent': 0.07692307692307693, 'systems': 0.23076923076923078, 'learn': 0.07692307692307693, 'data': 0.23076923076923078, 'decisions': 0.07692307692307693, 'minimal': 0.07692307692307693, 'human': 0.07692307692307693, 'intervention': 0.07692307692307693, 'advancements': 0.07692307692307693, 'led': 0.07692307692307693, 'rapid': 0.07692307692307693, 'growth

#Sentence Tokenization

In [52]:
sentence_tokens=[sent for sent in doc.sents]
print(sentence_tokens)

[In recent years, Artificial Intelligence (AI) has transformed the way organizations operate across the
globe., Companies such as Google, Microsoft, and Amazon have invested billions of dollars into
research and development to improve intelligent systems that can learn from data and make
decisions with minimal human intervention., These advancements have led to the rapid growth of
fields like Machine Learning, Natural Language Processing, and Computer Vision.
, In India, the adoption of AI has accelerated significantly, particularly in sectors such as healthcare,
education, and finance., For example, hospitals in Bangalore and Delhi now use AI-powered
diagnostic tools to detect diseases like cancer at an early stage., Similarly, banks such as State
Bank of India (SBI) and HDFC Bank have deployed chatbots to assist customers with
account-related queries, improving both efficiency and user satisfaction.
, Education has also benefited from AI-driven platforms., Universities like IIT Bomba

In [53]:
len(sentence_tokens)

18

In [54]:
sentence_score={}
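# Score each sentence as the sum of the normalized weights of its words.
# NOTE: word_frequencies is keyed by the original token text, so this lowercase
# lookup misses words that only occur capitalized (e.g. 'AI', 'India').
for sent in sentence_tokens:
  for word in sent:
    if word.text.lower() in word_frequencies.keys():
      if sent not in sentence_score.keys():
        sentence_score[sent]=word_frequencies[word.text.lower()]
      else:
        sentence_score[sent]+=word_frequencies[word.text.lower()]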

In [55]:
print(sentence_score)

{In recent years, Artificial Intelligence (AI) has transformed the way organizations operate across the
globe.: 0.6923076923076923, Companies such as Google, Microsoft, and Amazon have invested billions of dollars into
research and development to improve intelligent systems that can learn from data and make
decisions with minimal human intervention.: 1.4615384615384615, These advancements have led to the rapid growth of
fields like Machine Learning, Natural Language Processing, and Computer Vision.
: 0.9230769230769231, In India, the adoption of AI has accelerated significantly, particularly in sectors such as healthcare,
education, and finance.: 0.6923076923076923, For example, hospitals in Bangalore and Delhi now use AI-powered
diagnostic tools to detect diseases like cancer at an early stage.: 1.1538461538461537, Similarly, banks such as State
Bank of India (SBI) and HDFC Bank have deployed chatbots to assist customers with
account-related queries, improving both efficiency and user

#Select 30% sentences

In [56]:
from heapq import nlargest

In [57]:
select_length=int(len(sentence_tokens))*0.3

In [58]:
print(select_length)

5.3999999999999995
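
Because `0.3` has no exact binary representation, `18*0.3` comes out slightly below 5.4, and `int()` then truncates it to 5. A brief sketch of the options for turning the fraction into a sentence count follows; which rounding rule to use is a design choice, not something the notebook prescribes.

```python
import math

n_sentences = 18
fraction = 0.3

truncated = int(n_sentences * fraction)        # drops the fractional part
nearest = round(n_sentences * fraction)        # rounds to the nearest integer
rounded_up = math.ceil(n_sentences * fraction) # always rounds up
at_least_one = max(1, truncated)               # guards against selecting zero sentences

print(truncated, nearest, rounded_up, at_least_one)
```

The `max(1, ...)` guard matters for very short inputs: with three sentences or fewer, truncation alone would ask for zero sentences.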


#Getting the summary

In [61]:
summary=nlargest(n=int(select_length),iterable=sentence_score,key=sentence_score.get)

In [60]:
summary

[Experts from
 organizations like the World Economic Forum (WEF) and UNESCO have warned about issues
 related to data privacy, algorithmic bias, and job displacement.,
 Universities like IIT Bombay and IIT Delhi
 are incorporating data science and AI courses into their curricula to prepare students for future job
 markets.,
 Companies such as Google, Microsoft, and Amazon have invested billions of dollars into
 research and development to improve intelligent systems that can learn from data and make
 decisions with minimal human intervention.,
 However, collaboration between governments, private organizations, and academic
 institutions will be essential to address ethical challenges and ensure sustainable growth.,
 Online learning platforms such as Coursera, edX, and Udemy provide personalized
 learning experiences by analyzing user behavior and recommending relevant courses.]

In [62]:
#Combine these sentences together
final_summary=[sent.text for sent in summary]  # each item in summary is a sentence span

In [63]:
final_summary

['Experts from\norganizations like the World Economic Forum (WEF) and UNESCO have warned about issues\nrelated to data privacy, algorithmic bias, and job displacement.',
 'Universities like IIT Bombay and IIT Delhi\nare incorporating data science and AI courses into their curricula to prepare students for future job\nmarkets.',
 'Companies such as Google, Microsoft, and Amazon have invested billions of dollars into\nresearch and development to improve intelligent systems that can learn from data and make\ndecisions with minimal human intervention.',
 'However, collaboration between governments, private organizations, and academic\ninstitutions will be essential to address ethical challenges and ensure sustainable growth.\n',
 'Online learning platforms such as Coursera, edX, and Udemy provide personalized\nlearning experiences by analyzing user behavior and recommending relevant courses.']

In [64]:
summary=' '.join(final_summary)

In [65]:
print(summary)

Experts from
organizations like the World Economic Forum (WEF) and UNESCO have warned about issues
related to data privacy, algorithmic bias, and job displacement. Universities like IIT Bombay and IIT Delhi
are incorporating data science and AI courses into their curricula to prepare students for future job
markets. Companies such as Google, Microsoft, and Amazon have invested billions of dollars into
research and development to improve intelligent systems that can learn from data and make
decisions with minimal human intervention. However, collaboration between governments, private organizations, and academic
institutions will be essential to address ethical challenges and ensure sustainable growth.
 Online learning platforms such as Coursera, edX, and Udemy provide personalized
learning experiences by analyzing user behavior and recommending relevant courses.
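
The PDF-derived summary keeps the hard line breaks from the page layout. A minimal sketch of collapsing them, assuming it is acceptable to lose paragraph breaks as well (`collapse_whitespace` is an illustrative name, and the sample string is a shortened stand-in for the extracted text):

```python
def collapse_whitespace(text):
    """Replace every run of spaces and newlines with a single space."""
    return ' '.join(text.split())

# Shortened, hypothetical sample with layout line breaks
raw = 'Experts from\norganizations have warned\n about issues.'
clean = collapse_whitespace(raw)
print(clean)
```

Applying the same step to `' '.join(f)` before calling `nlp` would also keep line-wrap newlines out of the tokens, though whether that changes the sentence boundaries is something to check on the actual document.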
