# Making Decisions and Taking Control

One of the most useful properties of programming, and therefore of natural language processing, is that a program can make decisions: it can execute instructions only when certain conditions are fulfilled, or loop repeatedly over textual data until some condition is satisfied.

# Conditionals

Python supports a range of comparison operators, such as <, <=, ==, !=, > and >=, for testing the relationship between values. We can use these operators to select different words from a sentence or any other piece of text.

In [1]:
# Import the sample texts bundled with the NLTK library
from nltk.book import *

*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908


In [2]:
# Note: sentN is the first sentence of textN, e.g. sent1 of text1, sent2 of text2, etc.
print(sent5)

['I', 'have', 'a', 'problem', 'with', 'people', 'PMing', 'me', 'to', 'lol', 'JOIN']


In [3]:
print("Usage of 'less-than' condition in a list-based operation: ")
sent5_with_fit1 = [word for word in sent5 if len(word) < 4]
print(sent5_with_fit1)

Usage of 'less-than' condition in a list-based operation: 
['I', 'a', 'me', 'to', 'lol']


In [4]:
print("Usage of 'less-than or equal-to' condition in a list-based operation: ")
sent5_with_fit1 = [word for word in sent5 if len(word) <= 4]
print(sent5_with_fit1)

Usage of 'less-than or equal-to' condition in a list-based operation: 
['I', 'have', 'a', 'with', 'me', 'to', 'lol', 'JOIN']


In [5]:
print("Usage of 'equal-to' condition in a list-based operation: ")
sent5_with_fit1 = [word for word in sent5 if len(word) == 4]
print(sent5_with_fit1)

Usage of 'equal-to' condition in a list-based operation: 
['have', 'with', 'JOIN']


In [6]:
print("Usage of 'not equal-to' condition in a list-based operation: ")
sent5_with_fit1 = [word for word in sent5 if len(word) != 4]
print(sent5_with_fit1)

Usage of 'not equal-to' condition in a list-based operation: 
['I', 'a', 'problem', 'people', 'PMing', 'me', 'to', 'lol']
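
The operators > and >= work the same way, and conditions can be combined with `and`. The following is a minimal sketch that hard-codes a copy of sent5 so that it runs without loading NLTK; the names `sentence`, `longer_than_4`, `at_least_4` and `between_2_and_4` are illustrative assumptions, not part of NLTK.

```python
# Hard-coded copy of sent5 so the sketch runs without loading nltk.book
sentence = ['I', 'have', 'a', 'problem', 'with', 'people',
            'PMing', 'me', 'to', 'lol', 'JOIN']

# 'greater-than': words longer than 4 characters
longer_than_4 = [word for word in sentence if len(word) > 4]

# 'greater-than or equal-to': words with at least 4 characters
at_least_4 = [word for word in sentence if len(word) >= 4]

# Two conditions combined with 'and': words of length 2 to 4
between_2_and_4 = [word for word in sentence
                   if len(word) >= 2 and len(word) <= 4]

print(longer_than_4)    # ['problem', 'people', 'PMing']
print(at_least_4)       # ['have', 'problem', 'with', 'people', 'PMing', 'JOIN']
print(between_2_and_4)  # ['have', 'with', 'me', 'to', 'lol', 'JOIN']
```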


# Operating on Every Element

To perform an operation on every element of a list, we use expressions of the form [f(w) for …] or [w.f() for …], where f is a function or method applied to each word of the list, for example to compute its length or to convert it to uppercase or lowercase.
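
As a small illustration of the two forms, the sketch below applies a function and then a method to each element of a short, hard-coded word list. The list `words` and the result names are assumptions made for this example only.

```python
# A short hand-written token list (illustrative, not taken from nltk.book)
words = ['Call', 'me', 'Ishmael', '.']

# [f(w) for ...]: apply a function to each element
lengths = [len(w) for w in words]

# [w.f() for ...]: call a method on each element
upper_words = [w.upper() for w in words]
lower_words = [w.lower() for w in words]

print(lengths)      # [4, 2, 7, 1]
print(upper_words)  # ['CALL', 'ME', 'ISHMAEL', '.']
print(lower_words)  # ['call', 'me', 'ishmael', '.']
```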

In [7]:
print("Number of tokens in text1: ")
len(text1)

Number of tokens in text1: 


260819

In [8]:
print("Size of vocabulary of text1 with set operation applied: ")
len(set(text1))

Size of vocabulary of text1 with set operation applied: 


19317

In [9]:
# Use the [w.f() for …] form to convert every word in the text to lowercase
text1_lower = [word.lower() for word in text1]
len(text1_lower)

260819

In [10]:
print("Size of vocabulary of text1 with set operation applied on lower_cased text: ")
len(set(text1_lower))

Size of vocabulary of text1 with set operation applied on lower_cased text: 


17231

Note: set() removes duplicates, which is why len(set(text1)) is far smaller than len(text1). Lowercasing first also merges words that differ only in case, which explains the further drop from len(set(text1)) to len(set(text1_lower)).

The following code goes a step further and keeps only purely alphabetic tokens (i.e., no numbers, punctuation marks or other special characters) before lowercasing and de-duplicating them.

In [11]:
len(set(word.lower() for word in text1 if word.isalpha()))

16948
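
To see why each step shrinks the count, the same three operations can be tried on a tiny hand-written token list. This is only a sketch: `tokens` and the three result names are made up for illustration, and the counts apply to this toy list, not to text1.

```python
# Toy token list containing case variants, punctuation and a number
tokens = ['The', 'whale', ',', 'the', 'Whale', '1851', '.', 'whale']

# Distinct tokens as they are: 'The'/'the' and 'whale'/'Whale' stay separate
distinct_raw = set(tokens)

# Lowercase first, so case variants collapse into one entry
distinct_lower = set(t.lower() for t in tokens)

# Additionally drop anything that is not purely alphabetic
distinct_alpha = set(t.lower() for t in tokens if t.isalpha())

print(len(tokens), len(distinct_raw), len(distinct_lower), len(distinct_alpha))
# 8 7 5 2
```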

# Nested Code Blocks

As the word "nested" suggests, nested code blocks are code blocks inside other code blocks. They are used when several conditions have to be checked in order to accomplish a task. Nesting lets a complex program be broken down logically into easy-to-handle blocks.

Question: Read a word as input and check whether all of its letters are in lowercase. If they are, check whether the word length is greater than 6.

In [12]:
word = input('Enter the word: ')

# Outer block: check the case of the letters first
if word.islower():
    # Inner block: only reached when the word is all lowercase
    if len(word) > 6:
        print('All letters of the word are in lowercase and the word length is greater than 6')
    else:
        print('All letters of the word are in lowercase but the word length is not greater than 6')
else:
    print('Not all letters of the word are in lowercase.')

Enter the word: car
All letters of the word are in lowercase but the word length is not greater than 6
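
The same nested logic can also be wrapped in a function, which makes it easy to try several words without typing each one in. This is a sketch only; the function name `describe_word` and its return messages are hypothetical and chosen for this example.

```python
def describe_word(word):
    """Return a short description of the case and length of word."""
    if word.islower():
        # Inner check runs only for all-lowercase words
        if len(word) > 6:
            return 'lowercase and longer than 6 characters'
        else:
            return 'lowercase but not longer than 6 characters'
    else:
        return 'not all lowercase'

for sample in ['car', 'elephant', 'Car']:
    print(sample, '->', describe_word(sample))
```

Note that str.islower() also returns False for strings with no cased characters at all (for example '123'), so such input falls into the outer else branch.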


# Looping with Conditions

A loop repeats a block of operations, with a loop variable taking a new value on each cycle (iteration). Conditions let us filter the data according to well-defined constraints. When we need to visit every element and apply such constraints at the same time, we use looping with conditions.

In [13]:
print("Original length of text1: ")
len(text1)

Original length of text1: 


260819

In [14]:
print("Length of text1 with set operation applied to the entire text1: ")
len(set(text1))

Length of text1 with set operation applied to the entire text1: 


19317

In [15]:
new_text1 = []
for word in text1:
    # Keep only purely alphabetic tokens, converted to lowercase
    if word.isalpha():
        new_text1.append(word.lower())

print("Length of text1 after transformation, i.e., new_text1: ")
len(new_text1)

Length of text1 after transformation, i.e., new_text1: 


218361

In [16]:
print("Length of new_text1 with set operation applied to the entire new_text1: ")
len(set(new_text1))

Length of new_text1 with set operation applied to the entire new_text1: 


16948
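
The final count matches the one obtained earlier with the one-line expression over text1, because an explicit loop with a condition and a comprehension with the same condition build the same list. The sketch below checks this on a short hand-written token list; `tokens`, `cleaned_loop` and `cleaned_comprehension` are names assumed for this example.

```python
# Short hand-written token list (illustrative, not taken from nltk.book)
tokens = ['Call', 'me', 'Ishmael', '.', 'Some', 'years', 'ago', '--', '1851']

# Explicit loop with a condition
cleaned_loop = []
for token in tokens:
    if token.isalpha():
        cleaned_loop.append(token.lower())

# Equivalent list comprehension with the same condition
cleaned_comprehension = [token.lower() for token in tokens if token.isalpha()]

print(cleaned_loop)
# ['call', 'me', 'ishmael', 'some', 'years', 'ago']
print(cleaned_loop == cleaned_comprehension)
# True
```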