In [1]:
paragraph = '''AI, machine learning and deep learning are common terms in enterprise.
                IT and sometimes used interchangeably, especially by companies in their marketing materials. 
                But there are distinctions. The term AI, coined in the 1950s, refers to the simulation of human 
                intelligence by machines. It covers an ever-changing set of capabilities as new technologies 
                are developed. Technologies that come under the umbrella of AI include machine learning and 
                deep learning. Machine learning enables software applications to become more accurate at 
                predicting outcomes without being explicitly programmed to do so. Machine learning algorithms 
                use historical data as input to predict new output values. This approach became vastly more 
                effective with the rise of large data sets to train on. Deep learning, a subset of machine 
                learning, is based on our understanding of how the brain is structured. Deep learning's 
                use of artificial neural networks structure is the underpinning of recent advances in AI, 
                including self-driving cars and ChatGPT.'''

In [2]:
from nltk.tokenize import sent_tokenize
sentences = sent_tokenize(paragraph)
sentences

['AI, machine learning and deep learning are common terms in enterprise.',
 'IT and sometimes used interchangeably, especially by companies in their marketing materials.',
 'But there are distinctions.',
 'The term AI, coined in the 1950s, refers to the simulation of human \n                intelligence by machines.',
 'It covers an ever-changing set of capabilities as new technologies \n                are developed.',
 'Technologies that come under the umbrella of AI include machine learning and \n                deep learning.',
 'Machine learning enables software applications to become more accurate at \n                predicting outcomes without being explicitly programmed to do so.',
 'Machine learning algorithms \n                use historical data as input to predict new output values.',
 'This approach became vastly more \n                effective with the rise of large data sets to train on.',
 'Deep learning, a subset of machine \n                learning, is based on our
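
The tokenized sentences above still carry the line breaks and indentation of the triple-quoted string (a `\n` followed by a run of spaces). Below is a minimal sketch of one way to collapse that whitespace using only the standard library. The helper name `normalize_whitespace` is hypothetical and not part of the original notebook.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no arguments splits on any whitespace run and drops empty pieces.
    return ' '.join(text.split())


# One of the sentences returned by sent_tokenize, including the embedded newline.
raw = 'The term AI, coined in the 1950s, refers to the simulation of human \n                intelligence by machines.'
clean = normalize_whitespace(raw)
print(clean)
```

This is optional for the steps that follow: the word-level token lists shown later contain no whitespace tokens, so the stemming and lemmatization results should not depend on it.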

In [3]:
from nltk.stem import PorterStemmer, WordNetLemmatizer
ps = PorterStemmer()
lm = WordNetLemmatizer()

from nltk.corpus import stopwords
stp = stopwords.words('english')

### Porter Stemmer 
#### The following steps are applied:
1. Sentence tokenization  
e.g., 'The term AI, coined in the 1950s, refers to the simulation of human \n                intelligence by machines.'

2. Word tokenization of each sentence  
['The', 'term', 'AI', ',', 'coined', 'in', 'the', '1950s', ',', 'refers', 'to', 'the', 'simulation', 'of', 'human', 'intelligence', 'by', 'machines', '.']  

3. Removing stop words (each word is lower-cased for the comparison only, because the entries in stopwords.words('english') are all lower case)  
['term', 'AI', ',', 'coined', '1950s', ',', 'refers', 'simulation', 'human', 'intelligence', 'machines', '.']  

4. Stemming (the Porter stemmer also lower-cases each token)  
['term', 'ai', ',', 'coin', '1950', ',', 'refer', 'simul', 'human', 'intellig', 'machin', '.']  

5. Joining the stemmed words back into one string  
term ai , coin 1950 , refer simul human intellig machin .
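
Step 3 is the easiest one to get subtly wrong, so here is a small sketch of it in isolation. It assumes a tiny hand-picked stop-word set (`DEMO_STOP_WORDS`, a hypothetical stand-in for `stopwords.words('english')`) so that it runs without any NLTK data. It contrasts a lower-cased comparison with a case-sensitive one.

```python
# Tokens from step 2 above.
tokens = ['The', 'term', 'AI', ',', 'coined', 'in', 'the', '1950s', ',', 'refers',
          'to', 'the', 'simulation', 'of', 'human', 'intelligence', 'by', 'machines', '.']

# Assumption: a small illustrative subset, NOT the full NLTK stop-word list.
DEMO_STOP_WORDS = {'the', 'in', 'to', 'of', 'by'}

# Compare in lower case, but keep the original token when it survives.
case_insensitive = [t for t in tokens if t.lower() not in DEMO_STOP_WORDS]

# Comparing the raw token lets a capitalized stop word such as 'The' slip through.
case_sensitive = [t for t in tokens if t not in DEMO_STOP_WORDS]

print(case_insensitive)
print(case_sensitive)
```

The first list matches the result shown for step 3. The second still starts with 'The', because the capitalized form is not in the lower-case stop-word set.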

In [5]:
from nltk.tokenize import word_tokenize

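sentences_ps = []
for sentence in sentences:
    # Split the sentence into word and punctuation tokens.
    word_tkn_of_sent = word_tokenize(sentence)
    # Drop stop words (compared in lower case), then stem the remaining tokens.
    stemmed_words = [ps.stem(word) for word in word_tkn_of_sent if word.lower() not in stp]
    sentences_ps.append(' '.join(stemmed_words))
sentences_ps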

['ai , machin learn deep learn common term enterpris .',
 'sometim use interchang , especi compani market materi .',
 'distinct .',
 'term ai , coin 1950 , refer simul human intellig machin .',
 'cover ever-chang set capabl new technolog develop .',
 'technolog come umbrella ai includ machin learn deep learn .',
 'machin learn enabl softwar applic becom accur predict outcom without explicitli program .',
 'machin learn algorithm use histor data input predict new output valu .',
 'approach becam vastli effect rise larg data set train .',
 'deep learn , subset machin learn , base understand brain structur .',
 "deep learn 's use artifici neural network structur underpin recent advanc ai , includ self-driv car chatgpt ."]

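The stems in the output above are not always dictionary words ('machin', 'intellig'). The point of stemming is that different inflections of a word collapse to the same stem, as the following small sketch illustrates. It uses only `PorterStemmer`, which is purely algorithmic, and the word list is chosen for illustration.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# Illustrative word list: two inflections each of "machine" and "learn".
words = ['machine', 'machines', 'learning', 'learns']

# Map every word to its Porter stem.
stems = {word: ps.stem(word) for word in words}
for word, stem in stems.items():
    print(f'{word:10} -> {stem}')
```
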
### Lemmatization 
#### Steps: Same as above, except that the last step before joining is lemmatization instead of stemming  
term AI , coined 1950s , refers simulation human intelligence machine .

In [6]:
sentences_lm = []
for sentence in sentences:
    # Split the sentence into word and punctuation tokens.
    word_tkn_of_sent = word_tokenize(sentence)
    # Compare in lower case so capitalized stop words ('The', 'It', 'But') are removed too,
    # then lemmatize the remaining tokens (treated as nouns by default).
    lemmatized_words = [lm.lemmatize(word) for word in word_tkn_of_sent if word.lower() not in stp]
    sentences_lm.append(' '.join(lemmatized_words))
sentences_lm

['AI , machine learning deep learning common term enterprise .',
 'sometimes used interchangeably , especially company marketing material .',
 'distinction .',
 'term AI , coined 1950s , refers simulation human intelligence machine .',
 'cover ever-changing set capability new technology developed .',
 'Technologies come umbrella AI include machine learning deep learning .',
 'Machine learning enables software application become accurate predicting outcome without explicitly programmed .',
 'Machine learning algorithm use historical data input predict new output value .',
 'approach became vastly effective rise large data set train .',
 'Deep learning , subset machine learning , based understanding brain structured .',
 "Deep learning 's use artificial neural network structure underpinning recent advance AI , including self-driving car ChatGPT ."]