In [1]:
!pip install wikipedia==1.4.0

Collecting wikipedia==1.4.0
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... [?25l[?25hdone
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11695 sha256=828915086d77b27f014514f984600cf522dc1974769ecf2e8bda1907ab9f62c2
  Stored in directory: /root/.cache/pip/wheels/15/93/6d/5b2c68b8a64c7a7a04947b4ed6d89fb557dcc6bc27d1d7f3ba
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


#**Load Data**

In [9]:
import wikipedia
wiki = wikipedia.page('Artificial Intelligence')
text=wiki.content

In [15]:
import textwrap
textwrap.shorten(text, width=1000, placeholder="...")

'Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to natural intelligence displayed by animals including humans. Leading AI textbooks define the field as the study of "intelligent agents": any system that perceives its environment and takes actions that maximize its chance of achieving its goals. Some popular accounts use the term "artificial intelligence" to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving", however, this definition is rejected by major AI researchers.AI applications include advanced web search engines (e.g., Google), recommendation systems (used by YouTube, Amazon and Netflix), understanding human speech (such as Siri and Alexa), self-driving cars (e.g., Tesla), automated decision-making and competing at the highest level in strategic game systems (such as chess and Go). As machines become increasingly capable, tasks considered to require...'

# **Summarizing text using TF/IDF topic representation**

https://towardsdatascience.com/text-summarization-using-tf-idf-e64a0644ace3


In [17]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

Summarise top N important sentence in tf/idf matrix

In [64]:
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk import tokenize


def tfidf_summary(text, num_summary_sentence):
    summary_sentence = []

    ## tokenise sentence
    sentences = tokenize.sent_tokenize(text) 

    tfidfVectorizer = TfidfVectorizer()

    ## tf/idf matrix on sentence tokens
    words_tfidf = tfidfVectorizer.fit_transform(sentences)
    ##print("Shape of tf/idf matrix :")
    ##print(words_tfidf.shape) 

    # Sort the sentences in descending order by the sum of TF-IDF values
    sentence_sum = words_tfidf.sum(axis=1)
    important_sentences = np.argsort(sentence_sum, axis=0)[::-1]

    ## Summary of top N tf/idf sentence 
    for i in range(0, len(sentences)):
        if i in important_sentences[:num_summary_sentence]: ## num_summary_sentence :  Parameter to specify number of summary sentences required
            summary_sentence.append(sentences[i])

    return summary_sentence

In [65]:
tfidf_summary(text,2)

['==== Bad actors and weaponized AI ====\n\nAI provides a number of tools that are particularly useful for authoritarian governments: smart spyware, face recognition and voice recognition allow widespread surveillance; such surveillance allows machine learning to classify potential enemies of the state and can prevent them from hiding; recommendation systems can precisely target propaganda and misinformation for maximum effect; deepfakes aid in producing misinformation; advanced AI can make centralized decision making more competitive with liberal and decentralized systems such as markets.Terrorists, criminals and rogue states may use other forms of weaponized AI such as advanced digital warfare and lethal autonomous weapons.',
 'Williams, R. J.; Zipser, D. (1994), "Gradient-based learning algorithms for recurrent networks and their computational complexity", Back-propagation: Theory, Architectures and Applications, Hillsdale, NJ: Erlbaum\nHochreiter, Sepp; Schmidhuber, Jürgen (1997), 

#**Text Summarization Using SUMY** 
Sumy offers several algorithms and methods for summarization such as:
*   Luhn – heurestic method
*   Latent Semantic Analysis
*   LexRank – Unsupervised approach inspired by algorithms PageRank and HITS
*   TextRank

In [None]:
!pip install sumy

In [69]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer

##**Summarizing text using Latent Semantic Analysis (LSA)**

https://iq.opengenus.org/latent-semantic-analysis-for-text-summarization/

https://towardsdatascience.com/document-summarization-using-latent-semantic-indexing-b747ef2d2af6


In [70]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words
from sumy.summarizers.lsa import LsaSummarizer

def lsa_summary(text, num_summary_sentence):
    summary_sentence = []
    LANGUAGE = "english"
    stemmer = Stemmer(LANGUAGE)
    parser = PlaintextParser.from_string(text, Tokenizer(LANGUAGE))
    summarizer = LsaSummarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)
    for sentence in summarizer(parser.document, num_summary_sentence):
        summary_sentence.append(str(sentence))
    return summary_sentence

In [71]:
lsa_summary(text,2)

['Hans Moravec and Marvin Minsky argue that work in different individual domains can be incorporated into an advanced multi-agent system or cognitive architecture with general intelligence.Pedro Domingos hopes that there is a conceptually straightforward, but mathematically difficult, "master algorithm" that could lead to AGI.',
 'Prominent tech titans including Peter Thiel (Amazon Web Services) and Musk have committed more than $1 billion to nonprofit companies that champion responsible AI development, such as OpenAI and the Future of Life Institute.Mark Zuckerberg (CEO, Facebook) has said that artificial intelligence is helpful in its current form and will continue to assist humans.']

##**Summarizing text using TextRankSummarizer**

https://medium.com/data-science-in-your-pocket/text-summarization-using-textrank-in-nlp-4bce52c5b390


In [75]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words
from sumy.summarizers.text_rank import TextRankSummarizer

def textrank_summary(text, num_summary_sentence):
    summary_sentence = []
    LANGUAGE = "english"
    stemmer = Stemmer(LANGUAGE)
    parser = PlaintextParser.from_string(text, Tokenizer(LANGUAGE))
    summarizer = TextRankSummarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)
    for sentence in summarizer(parser.document, num_summary_sentence):
        summary_sentence.append(str(sentence))
    return summary_sentence

In [76]:
textrank_summary(text,2)

['Some popular accounts use the term "artificial intelligence" to describe machines that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving", however, this definition is rejected by major AI researchers.AI applications include advanced web search engines (e.g., Google), recommendation systems (used by YouTube, Amazon and Netflix), understanding human speech (such as Siri and Alexa), self-driving cars (e.g., Tesla), automated decision-making and competing at the highest level in strategic game systems (such as chess and Go).',
 "AI researchers are divided as to whether to pursue the goals of artificial general intelligence and superintelligence (general AI) directly, or to solve as many specific problems as possible (narrow AI) in hopes these solutions will lead indirectly to the field's long-term goals General intelligence is difficult to define and difficult to measure, and modern AI has had more verifiable successes by focus

##**Summarizing text using LexRank**

* unsupervised approach to text summarization based on graph-based centrality scoring of sentences.
* The main idea is that sentences “recommend” other similar sentences to the reader. Thus, if one sentence is very similar to many others, it will likely be a sentence of great importanc

In [89]:
# Using LexRank

def plaintext_summary(text, num_summary_sentence):
  summary_sentence = []
  summarizer = LexRankSummarizer()
  parser = PlaintextParser.from_string(text,Tokenizer("english"))

  #Summarize the document with num_summary_sentence sentences
  summary = summarizer(parser.document, num_summary_sentence)

  for sentence in summary:
    summary_sentence.append(str(sentence))

  return summary_sentence

In [90]:
plaintext_summary(text,2)

['By the middle of the 1960s, research in the U.S. was heavily funded by the Department of Defense and laboratories had been established around the world.Researchers in the 1960s and the 1970s were convinced that symbolic approaches would eventually succeed in creating a machine with artificial general intelligence and considered this the goal of their field.Herbert Simon predicted, "machines will be capable, within twenty years, of doing any work a man can do".Marvin Minsky agreed, writing, "within a generation ... the problem of creating \'artificial intelligence\' will substantially be solved".They failed to recognize the difficulty of some of the remaining tasks.',
 'Computationalism is the position in the philosophy of mind that the human mind is an information processing system and that thinking is a form of computing.']

##**Summarizing text using LUHN**
* Based on frequency of most important words

In [91]:
from sumy.summarizers.luhn import LuhnSummarizer

def luhntext_summary(text, num_summary_sentence):
  summary_sentence=[]
  summarizer_luhn = LuhnSummarizer()
  summary_1 =summarizer_luhn(parser.document,num_summary_sentence)

  for sentence in summary:
    summary_sentence.append(str(sentence))

  return summary_sentence

In [92]:
luhntext_summary(text,2)

['By the middle of the 1960s, research in the U.S. was heavily funded by the Department of Defense and laboratories had been established around the world.Researchers in the 1960s and the 1970s were convinced that symbolic approaches would eventually succeed in creating a machine with artificial general intelligence and considered this the goal of their field.Herbert Simon predicted, "machines will be capable, within twenty years, of doing any work a man can do".Marvin Minsky agreed, writing, "within a generation ... the problem of creating \'artificial intelligence\' will substantially be solved".They failed to recognize the difficulty of some of the remaining tasks.',
 'Computationalism is the position in the philosophy of mind that the human mind is an information processing system and that thinking is a form of computing.']

#**Measuring the performance of Text Summarization methods**

In [97]:
!pip install rouge-score

Collecting rouge-score
  Downloading rouge_score-0.0.4-py2.py3-none-any.whl (22 kB)
Installing collected packages: rouge-score
Successfully installed rouge-score-0.0.4


In [98]:
from rouge_score import rouge_scorer
def print_rouge_score(rouge_score):
    for k,v in rouge_score.items():
        print (k, 'Precision:', "{:.2f}".format(v.precision), 'Recall:', "{:.2f}".format(v.recall), 'fmeasure:', "{:.2f}".format(v.fmeasure))

In [94]:
txt_gold_std= "Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to natural intelligence displayed by animals including humans. \
Leading AI textbooks define the field as the study of 'intelligent agents': any system that perceives its environment and takes actions \
that maximize its chance of achieving its goals.[a] Some popular accounts use the term 'artificial intelligence' to describe machines \
that mimic 'cognitive' functions that humans associate with the human mind, such as 'learning' and 'problem solving', \
however, this definition is rejected by major AI researchers.[b] AI applications include advanced web search engines (e.g., Google), \
recommendation systems (used by YouTube, Amazon and Netflix), understanding human speech (such as Siri and Alexa), self-driving cars (e.g., Tesla), \
automated decision-making and competing at the highest level in strategic game systems"

In [104]:
num_summary_sentence = 2 
gold_standard = txt_gold_std
summary = ""

print("\ntextrank_summary :")
summary = ''.join(textrank_summary(text, num_summary_sentence))
scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)
scores = scorer.score(gold_standard, summary)
print_rouge_score(scores)

print("\nluhntext_summary :")
summary = ''.join(luhntext_summary(text, num_summary_sentence))
scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)
scores = scorer.score(gold_standard, summary)
print_rouge_score(scores)

print("\nlsa_summary :")
summary = ''.join(lsa_summary(text, num_summary_sentence))
scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)
scores = scorer.score(gold_standard, summary)
print_rouge_score(scores)

print("\nplaintext_summary :")
summary = ''.join(plaintext_summary(text, num_summary_sentence))
scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)
scores = scorer.score(gold_standard, summary)
print_rouge_score(scores)

print("\ntfidf_summary :")
summary = ''.join(tfidf_summary(text, num_summary_sentence))
scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)
scores = scorer.score(gold_standard, summary)
print_rouge_score(scores)


textrank_summary :
rouge1 Precision: 0.63 Recall: 0.76 fmeasure: 0.69

luhntext_summary :
rouge1 Precision: 0.63 Recall: 0.76 fmeasure: 0.69

lsa_summary :
rouge1 Precision: 0.33 Recall: 0.25 fmeasure: 0.28

plaintext_summary :
rouge1 Precision: 0.30 Recall: 0.30 fmeasure: 0.30

tfidf_summary :
rouge1 Precision: 0.19 Recall: 0.28 fmeasure: 0.22
