### Importing libraries and dependencies

In [1]:
import nltk
import re
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

In [2]:
import warnings
warnings.filterwarnings('ignore')

### Specifying Paragraph

In [3]:
paragraph = "Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to natural intelligence displayed by animals including humans. Leading AI textbooks define the field as the study of intelligent agents: any system that perceivesits environment and takes actions that maximize its chance of achieving its goals.Some popular accounts use the term artificial intelligence to describe machines that mimic cognitive functions that humans associate with the human mind, such as learning and problem solving.AI applications include advanced web search engines (e.g., Google), recommendation systems (used by YouTube, Amazon and Netflix), understanding  human speech (such as Siri and Alexa), self-driving cars (e.g., Tesla), automated decision-making and competing at the highest level in strategic game systems (such as chess and Go). As machines become increasingly capable, tasks considered to require intelligence are often removed from the definition of AI, a phenomenon known as the AI effect.For instance, optical character recognition is frequently excluded from things considered to be AI, having become a routine technology.Artificial intelligence was founded as an academic discipline in 1956, and in the years since has experienced several waves of optimism,followed by disappointment and the loss of funding (known as an AI winter),followed by new approaches, success and renewed funding.AI research has tried and discarded many different approaches since its founding, including simulating the brain, modeling human problem solving, formal logic, large databases of knowledge and imitating animal behavior. In the first decades of the 21st century, highly mathematical statistical machine learning has dominated the field,and this technique has proved highly successful, helping to solve many challenging problems throughout industry and academia.The various sub-fields of AI research are centered around particular goals and the use of particular tools. 
The traditional goals of AI research include reasoning, knowledge representation, planning, learning, natural language processing,perception, and the ability to move and manipulate objects. General intelligence (the ability to solve an arbitrary problem)is among the fields long-term goals.To solve these problems, AI researchers have adapted and integrated a wide range of problem-solving techniques—including search and mathematical optimization, formal logic, artificial neural networksand methods based on statistics, probability and economics. AI also draws upon computer science, psychology, linguistics,philosophy, and many other fields.The field was founded on the assumption that human intelligence can be so precisely described that a machine can be made to imulate it. This raises philosophical arguments about the mind and the ethics of creating artificial beings endowed with  human -like intelligence. These issues have been explored by myth, fiction, and philosophy since antiquity.  Science fiction and futurology have also suggested that, with its enormous potential and power, AI may become an existential risk to humanity."

In [4]:
paragraph

'Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to natural intelligence displayed by animals including humans. Leading AI textbooks define the field as the study of intelligent agents: any system that perceivesits environment and takes actions that maximize its chance of achieving its goals.Some popular accounts use the term artificial intelligence to describe machines that mimic cognitive functions that humans associate with the human mind, such as learning and problem solving.AI applications include advanced web search engines (e.g., Google), recommendation systems (used by YouTube, Amazon and Netflix), understanding  human speech (such as Siri and Alexa), self-driving cars (e.g., Tesla), automated decision-making and competing at the highest level in strategic game systems (such as chess and Go). As machines become increasingly capable, tasks considered to require intelligence are often removed from the definition of AI, a phenomenon known as the A

### Initializing Stemmer

In [5]:
stemmer = PorterStemmer()
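
The stemmer is initialized here, but the cleaning loop further down calls the lemmatizer instead. As a rough illustration of what `PorterStemmer` does, the sketch below stems a few words; the word list is a made-up example, not taken from the notebook. Stems are produced by suffix-stripping rules, so they are not guaranteed to be dictionary words.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical sample words, chosen only for illustration
words = ["learning", "machines", "studies"]

# stem() applies rule-based suffix stripping; it needs no downloaded corpus data
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(f"{word} -> {stem}")
```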

### Initializing Lemmatizer

In [6]:
lemmatizer = WordNetLemmatizer()

### Tokenization 

In [7]:
sentences = nltk.sent_tokenize(paragraph)
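
Several sentences in the paragraph run together with no space after the period (for example `goals.Some`), which can keep `sent_tokenize` from splitting them; the second entry of `corpus` below shows several sentences merged into one. One possible pre-processing step, sketched here under the assumption that a lowercase letter or closing parenthesis followed by a period and an uppercase letter marks a sentence boundary, is to insert the missing space first. The helper name `add_missing_sentence_spaces` is hypothetical, and the rule would mis-split dotted acronyms or names written without spaces.

```python
import re

def add_missing_sentence_spaces(text):
    """Insert a space after a period that is directly followed by an uppercase letter."""
    # (?<=[a-z)]) : the period must follow a lowercase letter or ')'
    # (?=[A-Z])   : and must be directly followed by an uppercase letter
    return re.sub(r'(?<=[a-z)])\.(?=[A-Z])', '. ', text)

sample = "takes actions that maximize its goals.Some popular accounts use the term."
fixed = add_missing_sentence_spaces(sample)
print(fixed)
```

In the notebook this would be applied to `paragraph` before calling `nltk.sent_tokenize`; doing so changes the number of sentences and therefore the shape of the matrices shown later.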

### Cleaning Text

In [8]:
corpus = []

In [9]:
stop_words = set(stopwords.words("english"))  # build the stopword set once instead of once per word
for sentence in sentences:
    review = re.sub('[^a-zA-Z]', ' ', sentence)  # replace every non-letter character with a space
    review = review.lower()  # convert to lowercase
    review = review.split()  # split into a list of words
    review = [lemmatizer.lemmatize(word) for word in review if word not in stop_words]  # drop stopwords, lemmatize the rest
    review = ' '.join(review)  # join the words back into a single string
    corpus.append(review)
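
The cleaning steps in the loop above can also be wrapped in a small function, which makes them easier to try on a single sentence. This is a minimal sketch, not the notebook's own code: the function name `clean_sentence`, the `normalize` parameter, and the tiny `demo_stop_words` set are assumptions introduced so the example runs without downloading any NLTK data.

```python
import re

def clean_sentence(sentence, stop_words, normalize=lambda word: word):
    """Keep letters only, lowercase, drop stopwords, and normalize each remaining word."""
    letters_only = re.sub('[^a-zA-Z]', ' ', sentence)  # non-letters become spaces
    words = letters_only.lower().split()
    return ' '.join(normalize(word) for word in words if word not in stop_words)

# Tiny illustrative stopword set (an assumption; the notebook uses NLTK's English list)
demo_stop_words = {"is", "by", "as", "to"}

cleaned = clean_sentence(
    "Artificial intelligence (AI) is intelligence demonstrated by machines.",
    demo_stop_words,
)
print(cleaned)
```

To mirror the notebook, one would pass `set(stopwords.words("english"))` as `stop_words` and `lemmatizer.lemmatize` as `normalize`.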

In [10]:
corpus

['artificial intelligence ai intelligence demonstrated machine opposed natural intelligence displayed animal including human',
 'leading ai textbook define field study intelligent agent system perceivesits environment take action maximize chance achieving goal popular account use term artificial intelligence describe machine mimic cognitive function human associate human mind learning problem solving ai application include advanced web search engine e g google recommendation system used youtube amazon netflix understanding human speech siri alexa self driving car e g tesla automated decision making competing highest level strategic game system chess go',
 'machine become increasingly capable task considered require intelligence often removed definition ai phenomenon known ai effect instance optical character recognition frequently excluded thing considered ai become routine technology artificial intelligence founded academic discipline year since experienced several wave optimism follo

### TF-IDF Model

In [11]:
cv = TfidfVectorizer(max_features = 1500)

In [12]:
X = cv.fit_transform(corpus).toarray() # dense document-term matrix: one row per sentence, one column per term
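
To make the weights less of a black box, the sketch below recomputes TF-IDF by hand for a toy corpus and compares the result with `TfidfVectorizer`. It assumes the vectorizer's default settings (raw term counts, smoothed IDF of the form `ln((1 + n) / (1 + df)) + 1`, and L2 normalization of each row); the three toy documents are invented for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical); every token has at least two characters,
# so the default token pattern keeps all of them
docs = ["machine learning ai", "ai winter", "machine ai ai"]

vec = TfidfVectorizer()
X_sklearn = vec.fit_transform(docs).toarray()

# Vocabulary in column order
vocab = sorted(vec.vocabulary_, key=vec.vocabulary_.get)

# Raw term counts per document
counts = np.array([[doc.split().count(term) for term in vocab] for doc in docs], dtype=float)

n_docs = counts.shape[0]
doc_freq = (counts > 0).sum(axis=0)              # number of documents containing each term
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1  # smoothed IDF
weighted = counts * idf
X_manual = weighted / np.linalg.norm(weighted, axis=1, keepdims=True)  # L2-normalize rows

print(np.allclose(X_sklearn, X_manual))
```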

In [13]:
X

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.12127986],
       [0.        , 0.        , 0.11098695, ..., 0.11098695, 0.11098695,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])

In [14]:
X.shape

(10, 211)
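
The shape reads as 10 sentences by 211 distinct terms; since the vocabulary is smaller than `max_features = 1500`, the cap has no effect here. The sketch below, on an invented toy corpus, shows what the cap does when it is lower than the vocabulary size: only the terms with the highest total count across the corpus are kept.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical): "ai" occurs 4 times, "machine" twice, the rest once
docs = ["machine learning ai", "ai winter", "machine ai ai"]

capped = TfidfVectorizer(max_features=2)
X_capped = capped.fit_transform(docs).toarray()

print(X_capped.shape)              # rows = documents, columns = kept terms
print(sorted(capped.vocabulary_))  # the terms that survived the cap
```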

In [15]:
df = pd.DataFrame(X, columns=cv.get_feature_names_out())  # get_feature_names() was removed in scikit-learn 1.2

In [16]:
df

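| | ability | academia | academic | account | achieving | action | adapted | advanced | agent | ai | ... | upon | use | used | various | wave | web | wide | winter | year | youtube |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.149244 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 0.0 | 0.12128 | 0.12128 | 0.12128 | 0.0 | 0.12128 | 0.12128 | 0.107675 | ... | 0.0 | 0.103099 | 0.12128 | 0.0 | 0.0 | 0.12128 | 0.0 | 0.0 | 0.0 | 0.12128 |
| 2 | 0.0 | 0.0 | 0.110987 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.246342 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.110987 | 0.0 | 0.0 | 0.110987 | 0.110987 | 0.0 |
| 3 | 0.0 | 0.172201 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.076442 | ... | 0.0 | 0.146387 | 0.0 | 0.172201 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.222242 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.116053 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.150979 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.177603 | 0.0 | 0.0 | 0.07884 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.177603 | 0.0 | 0.0 | 0.0 |
| 6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.109673 | ... | 0.247059 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.127681 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |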


In [17]:
df.T

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| ability | 0.0 | 0.00000 | 0.000000 | 0.000000 | 0.222242 | 0.150979 | 0.0 | 0.0 | 0.0 | 0.0 |
| academia | 0.0 | 0.00000 | 0.000000 | 0.172201 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| academic | 0.0 | 0.00000 | 0.110987 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| account | 0.0 | 0.12128 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| achieving | 0.0 | 0.12128 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| web | 0.0 | 0.12128 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| wide | 0.0 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.177603 | 0.0 | 0.0 | 0.0 | 0.0 |
| winter | 0.0 | 0.00000 | 0.110987 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
| year | 0.0 | 0.00000 | 0.110987 | 0.000000 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 |
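
A table this sparse is hard to read by eye. One way to summarize it, sketched below on a small invented frame rather than the notebook's `df`, is to list the highest-weighted nonzero terms for each sentence. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical TF-IDF weights: rows are sentences, columns are terms
toy = pd.DataFrame(
    [[0.0, 0.15, 0.0],
     [0.12, 0.11, 0.10],
     [0.0, 0.25, 0.11]],
    columns=["account", "ai", "use"],
)

# For each sentence, keep the nonzero weights and take the two largest
top_terms = {
    index: row[row > 0].nlargest(2).index.tolist()
    for index, row in toy.iterrows()
}
print(top_terms)
```

Applied to the notebook's `df`, the same comprehension would give a short list of characteristic terms per sentence.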
