In [66]:
import spacy
from nltk.corpus import stopwords
from string import punctuation
import heapq

In [67]:
nlp=spacy.load("en_core_web_md")

In [90]:
text="""Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy.

IBM has a rich history with machine learning. One of its own, Arthur Samuel, is credited for coining the term, “machine learning” with his research (PDF, 481 KB) (link resides outside IBM) around the game of checkers. Robert Nealey, the self-proclaimed checkers master, played the game on an IBM 7094 computer in 1962, and he lost to the computer. Compared to what can be done today, this feat seems trivial, but it’s considered a major milestone in the field of artificial intelligence.

Over the last couple of decades, the technological advances in storage and processing power have enabled some innovative products based on machine learning, such as Netflix’s recommendation engine and self-driving cars.

Machine learning is an important component of the growing field of data science. Through the use of statistical methods, algorithms are trained to make classifications or predictions, and to uncover key insights in data mining projects. These insights subsequently drive decision making within applications and businesses, ideally impacting key growth metrics. As big data continues to expand and grow, the market demand for data scientists will increase. They will be required to help identify the most relevant business questions and the data to answer them.

Machine learning algorithms are typically created using frameworks that accelerate solution development, such as TensorFlow and PyTorch.

Since deep learning and machine learning tend to be used interchangeably, it’s worth noting the nuances between the two. Machine learning, deep learning, and neural networks are all sub-fields of artificial intelligence. However, neural networks is actually a sub-field of machine learning, and deep learning is a sub-field of neural networks.

The way in which deep learning and machine learning differ is in how each algorithm learns. "Deep" machine learning can use labeled datasets, also known as supervised learning, to inform its algorithm, but it doesn’t necessarily require a labeled dataset. Deep learning can ingest unstructured data in its raw form (e.g., text or images), and it can automatically determine the set of features which distinguish different categories of data from one another. This eliminates some of the human intervention required and enables the use of larger data sets. You can think of deep learning as "scalable machine learning" as Lex Fridman notes in this MIT lecture (01:08:05) (link resides outside IBM).

Classical, or "non-deep", machine learning is more dependent on human intervention to learn. Human experts determine the set of features to understand the differences between data inputs, usually requiring more structured data to learn.

Neural networks, or artificial neural networks (ANNs), are comprised of node layers, containing an input layer, one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to another and has an associated weight and threshold. If the output of any individual node is above the specified threshold value, that node is activated, sending data to the next layer of the network. Otherwise, no data is passed along to the next layer of the network by that node. The “deep” in deep learning is just referring to the number of layers in a neural network. A neural network that consists of more than three layers—which would be inclusive of the input and the output—can be considered a deep learning algorithm or a deep neural network. A neural network that only has three layers is just a basic neural network.

Deep learning and neural networks are credited with accelerating progress in areas such as computer vision, natural language processing, and speech recognition.

See the blog post “AI vs. Machine Learning vs. Deep Learning vs. Neural Networks: What’s the Difference?” for a closer look at how the different concepts relate."""

##### Collapsing Whitespace and Newlines

In [91]:
import re

def white_spaces(data):
    # Collapse every run of whitespace (spaces, tabs, newlines) into a single space.
    return re.sub(r'\s+', " ", data)

text=white_spaces(text)

In [92]:
doc=nlp(text)
stopwords_list=stopwords.words('english')

In [93]:
words=[word.text.lower() for word in doc if (word.text.lower() not in stopwords_list) and (word.text.lower() not in punctuation)]
print(len(words))
words

406


['machine',
 'learning',
 'branch',
 'artificial',
 'intelligence',
 'ai',
 'computer',
 'science',
 'focuses',
 'use',
 'data',
 'algorithms',
 'imitate',
 'way',
 'humans',
 'learn',
 'gradually',
 'improving',
 'accuracy',
 'ibm',
 'rich',
 'history',
 'machine',
 'learning',
 'one',
 'arthur',
 'samuel',
 'credited',
 'coining',
 'term',
 '“',
 'machine',
 'learning',
 '”',
 'research',
 'pdf',
 '481',
 'kb',
 'link',
 'resides',
 'outside',
 'ibm',
 'around',
 'game',
 'checkers',
 'robert',
 'nealey',
 'self',
 'proclaimed',
 'checkers',
 'master',
 'played',
 'game',
 'ibm',
 '7094',
 'computer',
 '1962',
 'lost',
 'computer',
 'compared',
 'done',
 'today',
 'feat',
 'seems',
 'trivial',
 '’s',
 'considered',
 'major',
 'milestone',
 'field',
 'artificial',
 'intelligence',
 'last',
 'couple',
 'decades',
 'technological',
 'advances',
 'storage',
 'processing',
 'power',
 'enabled',
 'innovative',
 'products',
 'based',
 'machine',
 'learning',
 'netflix',
 '’s',
 'recommendat
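
##### *Note on the punctuation filter: the list above still contains tokens such as '“' and '”'. `string.punctuation` only covers ASCII punctuation, and `in` on a string is a substring test, so curly quotes pass through. Below is a minimal sketch of a stricter check using only the standard library. The helper name `is_punctuation` and the sample token list are my own assumptions for illustration, not part of the pipeline above.*

```python
import unicodedata
from string import punctuation

def is_punctuation(token: str) -> bool:
    """Return True if every character of the token is Unicode punctuation or a symbol."""
    # Unicode general categories starting with "P" are punctuation, "S" are symbols.
    return bool(token) and all(unicodedata.category(ch)[0] in "PS" for ch in token)

# Hypothetical sample of tokens, similar to those in the output above.
tokens = ["machine", "“", "learning", "”", ",", "’s", "(", "481"]

# Curly quotes are not part of the ASCII-only string.punctuation constant.
print("“" in punctuation)

kept = [t for t in tokens if not is_punctuation(t)]
print(kept)
```

The token '’s' is kept by this check because it contains a letter; dropping it would need a separate rule.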

#### Word Frequency Score

In [94]:
word_frequencies={}
for word in words:
    if word not in word_frequencies:
        word_frequencies[word]=1
    else:
        word_frequencies[word]+=1

##### *Dividing every count by the highest count scales the frequencies into the range (0, 1]: the most frequent word gets a score of 1 and every other word gets a proportionally smaller score. These relative scores are what the summarizer uses to weigh how important each word is within the text.*

In [95]:
max_freq=max(word_frequencies.values())
for key in word_frequencies:
    word_frequencies[key]= word_frequencies[key]/max_freq
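
##### *The counting and normalization steps can also be written more compactly with `collections.Counter`. This is only a sketch on a made-up word list; the function name `normalized_frequencies` is an assumption of mine and is not used elsewhere in this notebook.*

```python
from collections import Counter

def normalized_frequencies(words):
    """Count each word, then divide every count by the highest count."""
    counts = Counter(words)
    if not counts:
        return {}  # avoid calling max() on an empty collection
    max_count = max(counts.values())
    return {word: count / max_count for word, count in counts.items()}

# Hypothetical sample input: "machine" x3, "learning" x2, "data" x1.
sample = ["machine", "learning", "machine", "data", "learning", "machine"]
freqs = normalized_frequencies(sample)
print(freqs)
```

Here "machine" ends up with a score of 1.0, "learning" with 2/3 and "data" with 1/3.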

In [96]:
sentence_score={}
for sent in doc.sents:
    for word in sent:
        token=word.text.lower()
        if token in word_frequencies:
            # Add the word's normalized frequency to its sentence's running total.
            sentence_score[sent]=sentence_score.get(sent,0)+word_frequencies[token]
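
##### *To make the scoring rule easier to follow, here is a small sketch of the same idea on plain Python lists instead of spaCy sentences. The sentences, the frequency values and the function name `score_sentences` are all invented for illustration. Sentences are identified by their index.*

```python
def score_sentences(sentences, word_frequencies):
    """Sum the normalized frequency of every known word in each sentence."""
    scores = {}
    for index, tokens in enumerate(sentences):
        total = sum(word_frequencies.get(tok.lower(), 0.0) for tok in tokens)
        # As in the loop above, a sentence with no scored word gets no entry.
        if total > 0:
            scores[index] = total
    return scores

# Hypothetical normalized frequencies and tokenized sentences.
word_freqs = {"machine": 1.0, "learning": 1.0, "data": 0.5}
sents = [
    ["Machine", "learning", "uses", "data"],
    ["The", "game", "of", "checkers"],
    ["Data", "is", "growing"],
]
scores = score_sentences(sents, word_freqs)
print(scores)
```

Because the score is a plain sum, longer sentences containing many frequent words tend to rank higher.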

In [103]:
len(sentence_score.values())

31

##### *To summarize the text down to 30% of its sentences, I use the formula below.*

##### Formula: number of sentences to keep = **int((Number of Sentences) × (Percentage))**

In [106]:
number_of_sentences = int(len(list(doc.sents))*0.3)
number_of_sentences

9
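
##### *As a worked example of the formula: with 31 sentences and 30%, 31 × 0.3 = 9.3, and `int()` drops the fractional part, giving 9. The sketch below wraps this in a helper. The name `summary_length` and the guard that keeps at least one sentence are my own assumptions, not something the cell above does.*

```python
def summary_length(n_sentences: int, ratio: float) -> int:
    """Number of sentences to keep: int(n * ratio), but never fewer than one."""
    # int() truncates toward zero, so 9.3 becomes 9.
    keep = int(n_sentences * ratio)
    # Assumption: a very short text should still yield a one-sentence summary.
    return max(1, keep) if n_sentences > 0 else 0

print(summary_length(31, 0.3))
print(summary_length(2, 0.3))
```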

In [107]:
count = 0
for sent in heapq.nlargest(number_of_sentences,sentence_score,key=sentence_score.get):
    print(sent)
    count+=1
count

However, neural networks is actually a sub-field of machine learning, and deep learning is a sub-field of neural networks.
See the blog post “AI vs. Machine Learning vs. Deep Learning vs. Neural Networks: What’s the Difference?” for a closer look at how the different concepts relate.
A neural network that consists of more than three layers—which would be inclusive of the input and the output—can be considered a deep learning algorithm or a deep neural network.
Machine learning, deep learning, and neural networks are all sub-fields of artificial intelligence.
"Deep" machine learning can use labeled datasets, also known as supervised learning, to inform its algorithm, but it doesn’t necessarily require a labeled dataset.
You can think of deep learning as "scalable machine learning" as Lex Fridman notes in this MIT lecture (01:08:05) (link resides outside IBM).
Since deep learning and machine learning tend to be used interchangeably, it’s worth noting the nuances between the two.
Deep lea

9
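
##### *`heapq.nlargest` returns the sentences ordered by score, which is why the output above does not follow the order of the original text. If the summary should read in document order, one option is to select the top sentences by index and then sort the indices. This is a sketch on made-up data; `top_sentences_in_order` is a hypothetical helper.*

```python
import heapq

def top_sentences_in_order(sentences, scores, k):
    """Pick the k best-scoring sentence indices, then restore document order."""
    top_indices = heapq.nlargest(k, scores, key=scores.get)
    return [sentences[i] for i in sorted(top_indices)]

# Hypothetical sentences and scores, keyed by sentence index.
sentences = ["A first.", "B second.", "C third.", "D fourth."]
scores = {0: 0.4, 1: 1.5, 2: 0.1, 3: 2.0}

summary = " ".join(top_sentences_in_order(sentences, scores, 2))
print(summary)
```

The highest-scoring sentence here is the fourth one, but the joined summary still lists the second sentence first because the indices are sorted before joining.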