## Summarize a Wikipedia article using NLTK

Resource: https://stackabuse.com/text-summarization-with-nltk-in-python/

1. Split the article into sentences. <br>
2. Clean the text: remove stopwords, special characters, numbers, etc. <br>
3. Tokenize. <br>

In [27]:
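from bs4 import BeautifulSoup
import nltk
import requests
import re

# One-time setup if the NLTK data is not installed yet (resource names
# can differ between NLTK versions):
# nltk.download('punkt')
# nltk.download('stopwords')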

url = 'https://en.wikipedia.org/wiki/Artificial_intelligence'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')

paragraphs = soup.find_all('p')

article = ""

for paragraph in paragraphs:
    article += paragraph.text

In [24]:
# Strip citation markers with a regex.
article = re.sub(r'\[[0-9]*\]', ' ', article) # remove references such as [1], [2], etc.
article = re.sub(r'\s+', ' ', article) # collapse runs of whitespace into a single space
print(article)

 In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans. Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving". As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect. A quip in Tesler's Theorem says "AI is whatever hasn't been done yet." For instance, optical character recognition is frequently excluded from things considered to be AI, having become a routine technology. Modern machine capabilities generally classified as AI include successfully understanding human speech, competing at the highest level in strategic game systems (such as chess and Go), autonomously operating cars, in

In [26]:
# replace anything other than the letters a-z and A-Z with a space
formatted_article = re.sub(r'[^a-zA-Z]', ' ', article)
formatted_article = re.sub(r'\s+', ' ', formatted_article)
print(formatted_article)

 In computer science artificial intelligence AI sometimes called machine intelligence is intelligence demonstrated by machines in contrast to the natural intelligence displayed by humans Colloquially the term artificial intelligence is often used to describe machines or computers that mimic cognitive functions that humans associate with the human mind such as learning and problem solving As machines become increasingly capable tasks considered to require intelligence are often removed from the definition of AI a phenomenon known as the AI effect A quip in Tesler s Theorem says AI is whatever hasn t been done yet For instance optical character recognition is frequently excluded from things considered to be AI having become a routine technology Modern machine capabilities generally classified as AI include successfully understanding human speech competing at the highest level in strategic game systems such as chess and Go autonomously operating cars intelligent routing in content deliver
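The two cleaning passes above can be tried on a short string to see what each one removes. This is a minimal sketch: the `sample` text is made up for illustration and is not taken from the article.

```python
import re

# Hypothetical sample text standing in for the scraped article.
sample = "AI was founded in 1956.[1] It hasn't   stopped growing[23]\n since."

# Pass 1: drop bracketed numeric citations, then collapse whitespace runs.
no_refs = re.sub(r'\[[0-9]*\]', ' ', sample)
no_refs = re.sub(r'\s+', ' ', no_refs)

# Pass 2: replace every non-letter with a space, then collapse again.
letters_only = re.sub(r'[^a-zA-Z]', ' ', no_refs)
letters_only = re.sub(r'\s+', ' ', letters_only)

print(no_refs)
print(letters_only)
```

Note that the second pass also splits contractions ("hasn't" becomes "hasn t"), which matches the "hasn t" and "Tesler s" fragments in the output above.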

__NOTE:__ `word_tokenize` splits text into individual words. <br>
`sent_tokenize` splits a paragraph into individual sentences.
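To make the difference concrete without needing the NLTK data files, here is a rough sketch that uses plain regular expressions as stand-ins for the two tokenizers. The patterns are simplifying assumptions and do not reproduce NLTK's actual algorithms (for example, they would mis-split abbreviations such as "Dr.").

```python
import re

# Hypothetical sample text.
text = "AI is everywhere. It plays chess and Go! Can it drive cars?"

# Stand-in for sentence tokenization: split on whitespace that follows ., ! or ?
sentences = re.split(r'(?<=[.!?])\s+', text)

# Stand-in for word tokenization: runs of word characters, or single punctuation marks.
words = re.findall(r"\w+|[^\w\s]", sentences[0])

print(sentences)
print(words)
```

The first call yields one string per sentence; the second yields one string per word or punctuation mark of the first sentence.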

In [39]:
# Tokenize the article (punctuation intact) into sentences.
sentence_tokenized = nltk.sent_tokenize(article)

# Find the weighted frequency of each word.
# Note: the stopword list is lowercase, so capitalized forms such as 'In'
# are not filtered out here.
stopwords = nltk.corpus.stopwords.words('english')

word_frequencies = {}
for word in nltk.word_tokenize(formatted_article):
    if word not in stopwords:
        word_frequencies[word] = word_frequencies.get(word, 0) + 1

# Count of the most frequent word, used as the weighting denominator.
max_freq = max(word_frequencies.values())
# Divide every word count by max_freq.
for word in word_frequencies.keys():
    word_frequencies[word] = (word_frequencies[word]/max_freq)
print(word_frequencies)

{'In': 0.21787709497206703, 'computer': 0.15083798882681565, 'science': 0.0670391061452514, 'artificial': 0.33519553072625696, 'intelligence': 0.48044692737430167, 'AI': 1.0, 'sometimes': 0.0223463687150838, 'called': 0.0446927374301676, 'machine': 0.22905027932960895, 'demonstrated': 0.0111731843575419, 'machines': 0.13966480446927373, 'contrast': 0.027932960893854747, 'natural': 0.061452513966480445, 'displayed': 0.0111731843575419, 'humans': 0.12849162011173185, 'Colloquially': 0.00558659217877095, 'term': 0.0446927374301676, 'often': 0.0893854748603352, 'used': 0.16759776536312848, 'describe': 0.01675977653631285, 'computers': 0.0670391061452514, 'mimic': 0.0111731843575419, 'cognitive': 0.05027932960893855, 'functions': 0.027932960893854747, 'associate': 0.00558659217877095, 'human': 0.39106145251396646, 'mind': 0.08379888268156424, 'learning': 0.3128491620111732, 'problem': 0.1564245810055866, 'solving': 0.05027932960893855, 'As': 0.01675977653631285, 'become': 0.0279329608938547
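The weighting step divides each word count by the count of the most frequent word, so the top word always gets 1.0 (as 'AI' does above). A minimal sketch of the same idea with `collections.Counter`, which is a swapped-in alternative to the manual dictionary; the `tokens` and `stopwords` values are hypothetical:

```python
from collections import Counter

# Hypothetical stopword set and token list, for illustration only.
stopwords = {"is", "the", "of", "a"}
tokens = ["AI", "is", "the", "study", "of", "AI", "agents", "AI", "agents"]

# Count the non-stopword tokens.
counts = Counter(t for t in tokens if t not in stopwords)

# Normalize by the highest count so every weight falls in (0, 1].
max_freq = max(counts.values())
weighted = {word: count / max_freq for word, count in counts.items()}

print(weighted)
```

Here "AI" occurs 3 times, "agents" twice, and "study" once, so their weights are 1.0, 2/3, and 1/3.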

???? REVIEW:

In [40]:
# Score each sentence by summing the weighted frequencies of the words
# that occur in it.
# Note: sentences are lowercased here, but word_frequencies keeps the
# original case, so keys such as 'AI' or 'In' never match.
scores = {}
for sentence in sentence_tokenized:
    # only score sentences with fewer than 30 words
    if len(sentence.split()) >= 30:
        continue
    # lowercase the sentence, then tokenize it into words
    for word in nltk.word_tokenize(sentence.lower()):
        if word in word_frequencies:
            scores[sentence] = scores.get(sentence, 0) + word_frequencies[word]
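The scoring rule can be checked by hand on a toy input. The sketch below assumes lowercase weight keys and a plain whitespace split in place of `nltk.word_tokenize`; `word_weights`, `sentences`, and `MAX_WORDS` are hypothetical names chosen for the example.

```python
# Hypothetical weights (lowercase keys) and sentences, for illustration only.
word_weights = {"ai": 1.0, "machines": 0.5, "learning": 0.25}
sentences = ["AI helps machines", "Learning is fun", "Nothing relevant here"]
MAX_WORDS = 30  # same length cutoff as the cell above

toy_scores = {}
for sentence in sentences:
    words = sentence.lower().split()  # whitespace split, not nltk.word_tokenize
    if len(words) >= MAX_WORDS:
        continue  # skip long sentences entirely
    total = sum(word_weights.get(w, 0.0) for w in words)
    if total > 0:
        # only sentences containing at least one weighted word get a score
        toy_scores[sentence] = total

print(toy_scores)
```

"AI helps machines" scores 1.0 + 0.5 = 1.5, "Learning is fun" scores 0.25, and the third sentence gets no entry because none of its words carries a weight.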

In [46]:
# summarize with scores dictionary
import heapq

# grab the 7 sentences with the greatest scores
summarized = heapq.nlargest(7, scores, key=scores.get)
for sent in summarized:
    print(sent)
    print("----")
    
final_summary = " ".join(summarized)

 In computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans.
----
Neural networks can be applied to the problem of intelligent control (for robotics) or learning, using such techniques as Hebbian learning ("fire together, wire together"), GMDH or competitive learning.
----
Artificial intelligence can be classified into three different types of systems: analytical, human-inspired, and humanized artificial intelligence.
----
IBM has created its own artificial intelligence computer, the IBM Watson, which has beaten human intelligence (at some levels).
----
Musk also funds companies developing artificial intelligence such as Google DeepMind and Vicarious to "just keep an eye on what's going on with artificial intelligence.
----
Many of the problems in this article may also require general intelligence, if machines are to solve the problems as well as people d
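`heapq.nlargest` returns the selected sentences in descending score order, which is why the summary above does not follow the order of the article. If document order is preferred, one option is to reorder the selection afterwards. This is an optional sketch with hypothetical scores, and it relies on the dictionary preserving insertion order (Python 3.7+):

```python
import heapq

# Hypothetical scores, inserted in document order.
toy_scores = {
    "Sentence A.": 1.5,
    "Sentence B.": 0.25,
    "Sentence C.": 2.0,
    "Sentence D.": 0.75,
}

# Pick the 2 highest-scoring sentences (returned best-first).
top = heapq.nlargest(2, toy_scores, key=toy_scores.get)

# Optionally restore the order in which the sentences appeared.
ordered = [s for s in toy_scores if s in top]
summary = " ".join(ordered)

print(top)
print(summary)
```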