## Approach to the problem

### Tokenizing
In a program, text is represented as a string of characters. How do we move one level of abstraction up, to the level of words, or tokens? To tokenize a sentence you may be tempted to use Python's .split() method, but then you have to code additional rules to remove hyphens, newlines and punctuation where appropriate.

We will use NLTK's regular expression tokenizer instead.
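
To make the difference concrete, here is a minimal sketch that compares .split() with a regular expression built from the same five alternatives used later in this document. It uses `re.findall` from the standard library as a stand-in for NLTK's tokenizer, on the assumption that the two behave alike for a pattern without capturing groups. The sample sentence and the names `naive`, `token_pattern` and `tokens` are made up for this illustration.

```python
import re

text = "That U.S.A. poster-print costs $12.40, really..."

# Naive approach: punctuation stays glued to the neighbouring word.
naive = text.split()

# Verbose pattern with one alternative per kind of token.
token_pattern = r"""(?x)
      [A-Z](?:\.[A-Z])+\.?     # abbreviations, e.g. U.S.A.
    | \w+(?:-\w+)*             # words with optional internal hyphens
    | \$?\d+(?:\.\d+)?%?       # currency and percentages
    | \.\.\.                   # ellipsis
    | [\[\].,;"'?():_`-]       # punctuation as separate tokens
"""
tokens = re.findall(token_pattern, text)

print(naive)
print(tokens)
```

With .split(), the comma and the ellipsis remain attached to "$12.40" and "really"; the pattern-based version returns them as tokens of their own.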


### Tagging
The next step is tagging. A statistical model assigns a part-of-speech (POS) tag to each token, e.g. JJ (adjective), NN (noun), and so on. Since the tagger is statistical, we need to either train our own model or use a pre-trained one. NLTK comes with a pretty good one for general use.
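
The tagger returns a list of (word, tag) pairs. As a small sketch of how that output can be inspected, the snippet below hard-codes the tagged tokens from the sample run at the end of this document, so it runs without NLTK or its tagger model. The variable names are chosen for this illustration only.

```python
from collections import Counter

# Tagged tokens copied from the sample run in this document.
tagged = [('Today', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('very', 'RB'),
          ('great', 'JJ'), ('day', 'NN'), ('.', '.'),
          ('Indian', 'JJ'), ('politicians', 'NNS')]

# How often each tag occurs.
tag_counts = Counter(tag for _word, tag in tagged)

# Noun tags all start with "NN" (NN, NNS, ...), adjective tags with "JJ".
nouns = [word for word, tag in tagged if tag.startswith("NN")]
adjectives = [word for word, tag in tagged if tag.startswith("JJ")]

print(tag_counts)
print(nouns)
print(adjectives)
```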


### Chunking
Now we can use the part-of-speech tags to lift out noun phrases (NP) based on patterns of tags. Chunking is the process of taking individual pieces of information (chunks) and grouping them into larger units.
We can define the form of our chunks using a regular expression over tags, and build a chunker from that.
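
To show what matching a pattern of tags means, here is a simplified sketch that encodes the tag sequence as one string and runs an ordinary regular expression over it. It mirrors the tag pattern <NN.*|JJ>*<NN.*> used in the grammar of this document, but it is only an illustration written with the standard library, not NLTK's implementation. The tagged tokens are copied from the sample run, and the names `tag_string`, `nbar_pattern` and `chunks` are assumptions of this example.

```python
import re

# Tagged tokens copied from the sample run in this document.
tagged = [('Today', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('very', 'RB'),
          ('great', 'JJ'), ('day', 'NN'), ('.', '.'),
          ('Indian', 'JJ'), ('politicians', 'NNS')]

# Encode the tag sequence as a single string: "<NN><VBZ><DT>..."
tag_string = "".join(f"<{tag}>" for _word, tag in tagged)

# Zero or more nouns/adjectives followed by a noun.
nbar_pattern = re.compile(r"(?:<(?:NN[^>]*|JJ)>)*<NN[^>]*>")

chunks = []
for match in nbar_pattern.finditer(tag_string):
    # Each "<" marks one token, so counting them maps the match back to indices.
    start = tag_string.count("<", 0, match.start())
    length = match.group().count("<")
    chunks.append([word for word, _tag in tagged[start:start + length]])

print(chunks)
```

Each match covers a run of adjacent tags, and the words at those positions form one chunk.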

### Walk the tree
The output of chunking is a tree. Each noun phrase (NP) node sits above the leaves, which are the (word, tag) pairs that make up the phrase; with the grammar below, an intermediate NBAR node lies between them:

import nltk

text = input("Enter the text:")
print()
print(text)


# Word-tokenization regex adapted from the NLTK book
sentence_re = r'''(?x)            # verbose mode, so the pattern can carry comments
      (?:[A-Z])(?:\.[A-Z])+\.?    # abbreviations, e.g. U.S.A. (with optional last period)
    | \w+(?:-\w+)*                # words with optional internal hyphens
    | \$?\d+(?:\.\d+)?%?          # currency and percentages, e.g. $12.40, 82%
    | \.\.\.                      # ellipsis
    | [][.,;"'?():-_`]            # these are separate tokens
'''

grammar = r"""
    NBAR:
        {<PRP>}                  # A personal pronoun on its own
        {<NN.*|JJ>*<NN.*>}       # Nouns and adjectives, terminated with nouns

    NP:
        {<DT>*<NBAR><IN><NBAR>}  # Two NBARs connected with in/of/etc. (tried first)
        {<DT>*<NBAR>}            # A single NBAR with optional leading determiners
"""
chunker = nltk.RegexpParser(grammar)
toks = nltk.regexp_tokenize(text, sentence_re)
postoks = nltk.tag.pos_tag(toks)

print()
print("The tokens with tags:")
print(postoks)

tree = chunker.parse(postoks)
#tree.draw()



def leaves(tree):
    """Yield the (word, tag) leaves of each NP (noun phrase) subtree of a chunk tree."""
    for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
        yield subtree.leaves()


def get_terms(tree):
    """Yield each noun phrase as a list of words, dropping the POS tags."""
    for leaf in leaves(tree):
        term = [w for w, t in leaf]
        yield term

terms = get_terms(tree)

print()
print("The noun phrases are as follows:")
for term in terms:
    print(" ".join(term))


Enter the text:Today is a very great day. Indian politicians 

Today is a very great day. Indian politicians 

The tokens with tags:
[('Today', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('very', 'RB'), ('great', 'JJ'), ('day', 'NN'), ('.', '.'), ('Indian', 'JJ'), ('politicians', 'NNS')]

The noun phrases are as follows:
Today 
great day 
Indian politicians 
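
The tree walk in the listing above relies on NLTK's tree objects. As a rough illustration of the same traversal without NLTK, the sketch below uses a hand-built nested tuple as a stand-in for the chunk tree of the sample sentence. The tuple layout and the helper names `is_leaf`, `iter_leaves` and `np_phrases` are assumptions made for this example and are not part of NLTK.

```python
# Toy stand-in for a chunk tree: an inner node is (label, [children]),
# a leaf is a (word, tag) pair of strings.
tree = ("S", [
    ("NP", [("NBAR", [("Today", "NN")])]),
    ("is", "VBZ"),
    ("a", "DT"),
    ("very", "RB"),
    ("NP", [("NBAR", [("great", "JJ"), ("day", "NN")])]),
    (".", "."),
    ("NP", [("NBAR", [("Indian", "JJ"), ("politicians", "NNS")])]),
])


def is_leaf(node):
    """A leaf holds a tag string in second position; an inner node holds a list."""
    return isinstance(node[1], str)


def iter_leaves(node):
    """Yield every (word, tag) leaf below a node, left to right."""
    if is_leaf(node):
        yield node
    else:
        for child in node[1]:
            yield from iter_leaves(child)


def np_phrases(node):
    """Yield the words under each NP node as a list."""
    if is_leaf(node):
        return
    label, children = node
    if label == "NP":
        yield [word for word, _tag in iter_leaves(node)]
    else:
        for child in children:
            yield from np_phrases(child)


phrases = list(np_phrases(tree))
for phrase in phrases:
    print(" ".join(phrase))
```

Collecting all leaves below an NP node, rather than only its direct children, is what lets the words be recovered through the intermediate NBAR level.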
