# Basic content summarization

## Introduction to the algorithm

1. Find the frequency of each main (non-stop) word in the content
2. Score each sentence by summing the frequencies of the words it contains
3. Take the n highest-scoring sentences as the summary
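
The three steps above can be sketched end to end on a toy text. This is only an illustrative sketch: the regex-based splitting and the tiny `TOY_STOP_WORDS` set are assumptions that stand in for the NLTK tokenizers and stop-word list, and `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Hypothetical stand-in for a real stop-word list.
TOY_STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in", "are"}

def summarize(text, n=1):
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Step 1: frequencies of the main (non-stop) words.
    tokens = re.findall(r"[a-z]+", text.lower())
    freq = Counter(t for t in tokens if t not in TOY_STOP_WORDS)

    # Step 2: score a sentence by summing the frequencies of its words.
    def score(sentence):
        return sum(freq[t] for t in re.findall(r"[a-z]+", sentence.lower()))

    # Step 3: keep the n highest-scoring sentences.
    return sorted(sentences, key=score, reverse=True)[:n]

toy_text = "Cats chase mice. Dogs chase cats and mice. Birds sing."
print(summarize(toy_text, n=1))  # ['Dogs chase cats and mice.']
```

The second sentence wins because it contains the three most frequent words (`cats`, `chase`, `mice`) plus one more counted word.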

In [4]:
# Load the article text from Wikipedia
import wikipedia

content = wikipedia.page('Artificial Intelligence').content
print(content)

Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals. Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. A quip in Tesler's Theorem says "AI is whatever hasn't been done yet." For instance, optical character recognition is frequently excluded from things considered to be AI, having become a routine technology. Modern machine capabilities generally

In [5]:
# Download the NLTK data needed for tokenization and stop-word filtering

import nltk
nltk.download('punkt')
nltk.download('stopwords')

[nltk_data] Downloading package punkt to /home/amiresm/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /home/amiresm/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [7]:
# Tokenize the content into sentences, then each sentence into words
from nltk import sent_tokenize, word_tokenize

sentences = sent_tokenize(content)
words = []
for sent in sentences:
    words.extend(word_tokenize(sent))
print(words)

['Artificial', 'intelligence', '(', 'AI', ')', ',', 'sometimes', 'called', 'machine', 'intelligence', ',', 'is', 'intelligence', 'demonstrated', 'by', 'machines', ',', 'unlike', 'the', 'natural', 'intelligence', 'displayed', 'by', 'humans', 'and', 'animals', '.', 'Leading', 'AI', 'textbooks', 'define', 'the', 'field', 'as', 'the', 'study', 'of', '``', 'intelligent', 'agents', "''", ':', 'any', 'device', 'that', 'perceives', 'its', 'environment', 'and', 'takes', 'actions', 'that', 'maximize', 'its', 'chance', 'of', 'successfully', 'achieving', 'its', 'goals', '.', 'Colloquially', ',', 'the', 'term', '``', 'artificial', 'intelligence', "''", 'is', 'often', 'used', 'to', 'describe', 'machines', '(', 'or', 'computers', ')', 'that', 'mimic', '``', 'cognitive', "''", 'functions', 'that', 'humans', 'associate', 'with', 'the', 'human', 'mind', ',', 'such', 'as', '``', 'learning', "''", 'and', '``', 'problem', 'solving', "''", '.As', 'machines', 'become', 'increasingly', 'capable', ',', 'tasks'
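
The token `'.As'` in the output above comes from a missing space between two sentences in the source text, which also makes the sentence tokenizer merge them. One possible way to reduce this is to clean the text before tokenizing. The sketch below is an assumption, not part of the original notebook: `clean_wiki_text` is a hypothetical helper, and its two regular expressions are heuristics that may misfire on abbreviations.

```python
import re

def clean_wiki_text(text):
    # Drop section-heading lines such as "== Regulation ==".
    text = re.sub(r"^=+[^=\n]+=+[ \t]*$", "", text, flags=re.MULTILINE)
    # Insert the missing space when a sentence end runs straight into
    # a capitalized word, e.g. 'solving".As' -> 'solving". As'.
    text = re.sub(r'([a-z"\)])\.([A-Z])', r"\1. \2", text)
    return text

sample = (
    'such as "problem solving".As machines become capable.\n\n'
    "== Regulation ==\n\n"
    "The regulation of AI."
)
print(clean_wiki_text(sample))
```

The cleaned string could then be passed to `sent_tokenize` in place of the raw `content`.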

In [8]:
# Find the frequency of each word, skipping stop words and non-alphabetic tokens
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
words_freq = dict()
for word in words:
    # The comparison is case-sensitive, so capitalized stop words
    # such as 'A' and 'For' are still counted.
    if word not in stop_words and word.isalpha():
        words_freq[word] = words_freq.get(word, 0) + 1
print(words_freq)

{'Artificial': 10, 'intelligence': 90, 'AI': 151, 'sometimes': 3, 'called': 4, 'machine': 33, 'demonstrated': 1, 'machines': 29, 'unlike': 5, 'natural': 8, 'displayed': 2, 'humans': 26, 'animals': 2, 'Leading': 2, 'textbooks': 2, 'define': 1, 'field': 17, 'study': 5, 'intelligent': 25, 'agents': 14, 'device': 2, 'perceives': 3, 'environment': 8, 'takes': 4, 'actions': 10, 'maximize': 5, 'chance': 4, 'successfully': 5, 'achieving': 3, 'goals': 14, 'Colloquially': 1, 'term': 2, 'artificial': 55, 'often': 13, 'used': 16, 'describe': 3, 'computers': 9, 'mimic': 4, 'cognitive': 7, 'functions': 2, 'associate': 1, 'human': 60, 'mind': 19, 'learning': 28, 'problem': 26, 'solving': 7, 'become': 7, 'increasingly': 2, 'capable': 5, 'tasks': 5, 'considered': 8, 'require': 5, 'removed': 1, 'definition': 3, 'phenomenon': 3, 'known': 8, 'effect': 4, 'A': 29, 'quip': 1, 'Tesler': 1, 'Theorem': 1, 'says': 1, 'whatever': 2, 'done': 1, 'yet': 1, 'For': 13, 'instance': 2, 'optical': 1, 'character': 2, 're
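
The output above counts `'A'` and `'For'` because the stop-word list is lowercase while the tokens keep their original case. A possible variant, shown here as a sketch under that assumption, lowercases each token first and uses `collections.Counter`. The helper name `count_main_words` and the sample data are hypothetical; note that lowercasing also merges entries such as `'Artificial'` and `'artificial'` and turns `'AI'` into `'ai'`.

```python
from collections import Counter

def count_main_words(tokens, stop_words):
    """Count alphabetic, non-stop-word tokens, ignoring case."""
    lowered = (tok.lower() for tok in tokens)
    return Counter(
        tok for tok in lowered
        if tok.isalpha() and tok not in stop_words
    )

# Hypothetical sample data standing in for `words` and `stop_words`.
sample_tokens = ["A", "machine", "is", "a", "Machine", ",", "For", "AI"]
sample_stop_words = {"a", "is", "for"}
print(count_main_words(sample_tokens, sample_stop_words))
# Counter({'machine': 2, 'ai': 1})
```

If this variant is used, the sentence-scoring step would need to lowercase tokens as well so that lookups in the frequency table still match.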

In [32]:
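# Score each sentence by the normalized frequencies of its words
total_weight = sum(words_freq.values())
sent_score = dict()

for sent in sentences:
    score = 0
    # Use a separate name so the global `words` list is not overwritten
    sent_words = word_tokenize(sent)
    # Sentences with 50 or more tokens keep a score of 0 and are effectively excluded
    if len(sent_words) < 50:
        for word in sent_words:
            if word in words_freq:
                score += words_freq[word] / total_weight
    sent_score[sent] = score
print(sent_score)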

{'Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals.': 0.10143999999999999, 'Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.': 0.04400000000000001, 'Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect.': 0, 'A quip in Tesler\'s Theorem says "AI is whatever hasn\'t been done yet."': 0.030080000000000003, 'For instance, optical character recognition is frequently excluded from things considered 
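
To make the arithmetic behind these scores concrete, here is a small worked example with a toy frequency table. It is a sketch rather than the notebook's exact code: `score_sentence` is a hypothetical helper that sums first and divides once, which is mathematically the same as the loop above but may differ in the last floating-point digits.

```python
def score_sentence(tokens, words_freq, max_tokens=50):
    """Sum the frequencies of the tokens, normalized by the total weight."""
    # Long sentences are excluded by giving them a score of 0.
    if len(tokens) >= max_tokens:
        return 0
    total_weight = sum(words_freq.values())
    return sum(words_freq.get(tok, 0) for tok in tokens) / total_weight

# Toy frequency table: total weight is 3 + 1 = 4.
toy_freq = {"AI": 3, "machine": 1}

print(score_sentence(["AI", "is", "a", "machine"], toy_freq))  # (3 + 1) / 4 = 1.0
print(score_sentence(["AI", "AI"], toy_freq))                  # (3 + 3) / 4 = 1.5
print(score_sentence(["AI"] * 50, toy_freq))                   # too long -> 0
```

A word that occurs twice in a sentence contributes twice, so a score can exceed 1. Dividing by `total_weight` rescales every score by the same constant and therefore does not change the ranking.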

In [33]:
# Sort sentences by score in ascending order, so the best ones come last

sorted_scores = dict(sorted(sent_score.items(), key=lambda item: item[1]))

In [34]:
sorted_scores

{'Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving".As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect.': 0,
 'Modern machine capabilities generally classified as AI include successfully understanding human speech, competing at the highest level in strategic game systems (such as chess and Go), autonomously operating cars, intelligent routing in content delivery networks, and military simulations.Artificial intelligence was founded as an academic discipline in 1955, and in the years since has experienced several waves of optimism, followed by disappointment and the loss of funding (known as an "AI winter"), followed by new approaches, success and renewed funding.': 0,
 'Sub-fields have also been based on social

In [35]:
# Join the five highest-scoring sentences (the last five after the ascending sort)
' '.join(list(sorted_scores.keys())[-5:])

'In the early 1980s, AI research was revived by the commercial success of expert systems, a form of AI program that simulated the knowledge and analytical skills of human experts. == Regulation ==\n\nThe regulation of artificial intelligence is the development of public sector policies and laws for promoting and regulating artificial intelligence (AI); it is therefore related to the broader regulation of algorithms. Scientists from the Future of Life Institute, among others, described some short-term research goals to see how AI influences the economy, the laws and ethics that are involved with AI and how to minimize AI security risks. ==== Strong AI hypothesis ====\n\nThe philosophical position that John Searle has named "strong AI" states: "The appropriately programmed computer with the right inputs and outputs would thereby have a mind in exactly the same sense human beings have minds." Artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated
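
The summary above lists sentences in score order, so the article's opening sentence ends up last. One possible alternative, sketched here as an assumption rather than the notebook's method, picks the top n with `heapq.nlargest` and then restores the original document order. The helper `top_sentences` and the toy data are hypothetical; it expects a list of scores parallel to the list of sentences.

```python
import heapq

def top_sentences(sentences, scores, n=5):
    """Return the n highest-scoring sentences in their original order."""
    # Rank by position so that duplicate sentences stay distinct.
    best = heapq.nlargest(n, range(len(sentences)), key=lambda i: scores[i])
    # Sorting the chosen positions restores document order.
    return [sentences[i] for i in sorted(best)]

toy_sentences = ["First.", "Second.", "Third.", "Fourth."]
toy_scores = [0.2, 0.9, 0.1, 0.5]
print(" ".join(top_sentences(toy_sentences, toy_scores, n=2)))
# Second. Fourth.
```

With the notebook's variables, the scores list could be built as `[sent_score[s] for s in sentences]`.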