## NLTK Practice Cleaning
1. Installing and Importing NLTK
2. Tokenization
3. Stemming
4. Lemmatization
5. Stopwords
6. Parts of Speech Tagging
7. Named Entity Recognition



---
### 1. Installing and Importing NLTK

In [47]:
# !pip install nltk

In [48]:
import nltk 

In [49]:
# IMPORTANT DOWNLOADS (uncomment and run once)

# import nltk
#
# resources = ['punkt', 'punkt_tab', 'wordnet', 'stopwords', 'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng', 'maxent_ne_chunker', 'maxent_ne_chunker_tab', 'words']
# for resource in resources:
#     nltk.download(resource)

### 2. Tokenization
- **Tokenization** is the process of breaking down a text (like a sentence or paragraph) into smaller pieces called tokens. 
- It’s the first step in most NLP tasks (like translation, sentiment analysis, text classification).

#### 2.1 Turning CORPUS into DOCUMENTS

- If your corpus is one big chunk of text, you may want to split it into smaller pieces (documents) so you can process or analyze them individually. The terminology used here:
    - Corpus ---> the whole text (here, a paragraph)
    - Documents ---> the sentences in the corpus
    - Vocabulary ---> the set of unique words
    - Words ---> the individual tokens

In [50]:
corpus = """Hello welcome to my NLTK Prctice i.e., my rough work on nltk.
Let's explore what nltk can do.
I'm really excited! ready set go.
"""

In [51]:
print(corpus)

Hello welcome to my NLTK Prctice i.e., my rough work on nltk.
Let's explore what nltk can do.
I'm really excited! ready set go.



In [52]:
from nltk.tokenize import sent_tokenize
# nltk.download('punkt_tab')

In [53]:
documents = sent_tokenize(corpus)
documents

['Hello welcome to my NLTK Prctice i.e., my rough work on nltk.',
 "Let's explore what nltk can do.",
 "I'm really excited!",
 'ready set go.']

In [54]:
for sent in documents:
    print(sent)

Hello welcome to my NLTK Prctice i.e., my rough work on nltk.
Let's explore what nltk can do.
I'm really excited!
ready set go.


#### 2.2 Turning DOCUMENTS into WORDS
- Each document is a chunk of text, and **word tokenization** splits that text into individual words (tokens).

In [55]:
from nltk.tokenize import word_tokenize

In [56]:
word_tokenize(corpus)

['Hello',
 'welcome',
 'to',
 'my',
 'NLTK',
 'Prctice',
 'i.e.',
 ',',
 'my',
 'rough',
 'work',
 'on',
 'nltk',
 '.',
 'Let',
 "'s",
 'explore',
 'what',
 'nltk',
 'can',
 'do',
 '.',
 'I',
 "'m",
 'really',
 'excited',
 '!',
 'ready',
 'set',
 'go',
 '.']

In [57]:
for sent in documents:
    print(word_tokenize(sent))

['Hello', 'welcome', 'to', 'my', 'NLTK', 'Prctice', 'i.e.', ',', 'my', 'rough', 'work', 'on', 'nltk', '.']
['Let', "'s", 'explore', 'what', 'nltk', 'can', 'do', '.']
['I', "'m", 'really', 'excited', '!']
['ready', 'set', 'go', '.']
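The vocabulary mentioned in 2.1 can be derived from these token lists. The following is a minimal sketch in plain Python; the helper name `build_vocabulary` is hypothetical, and lowercasing and dropping non-alphabetic tokens are assumptions made for illustration, not something NLTK does for you.

```python
# Token lists copied from the word_tokenize output above
documents_tokens = [
    ['Hello', 'welcome', 'to', 'my', 'NLTK', 'Prctice', 'i.e.', ',', 'my',
     'rough', 'work', 'on', 'nltk', '.'],
    ['Let', "'s", 'explore', 'what', 'nltk', 'can', 'do', '.'],
    ['I', "'m", 'really', 'excited', '!'],
    ['ready', 'set', 'go', '.'],
]


def build_vocabulary(token_lists):
    """Return the sorted unique lowercase alphabetic tokens (hypothetical helper)."""
    vocab = set()
    for tokens in token_lists:
        for tok in tokens:
            # Skip punctuation and mixed tokens such as "'s" or "i.e."
            if tok.isalpha():
                vocab.add(tok.lower())
    return sorted(vocab)


vocabulary = build_vocabulary(documents_tokens)
print(len(vocabulary))
print(vocabulary)
```

Because of the lowercasing, 'NLTK' and 'nltk' count as a single vocabulary entry.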


In [58]:
from nltk.tokenize import wordpunct_tokenize

wordpunct_tokenize(corpus) # splits on every punctuation mark, so "i.e." and "Let's" are broken into several tokens

['Hello',
 'welcome',
 'to',
 'my',
 'NLTK',
 'Prctice',
 'i',
 '.',
 'e',
 '.,',
 'my',
 'rough',
 'work',
 'on',
 'nltk',
 '.',
 'Let',
 "'",
 's',
 'explore',
 'what',
 'nltk',
 'can',
 'do',
 '.',
 'I',
 "'",
 'm',
 'really',
 'excited',
 '!',
 'ready',
 'set',
 'go',
 '.']
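The splitting seen in this output can be reproduced with a single regular expression. This sketch uses only the standard `re` module; the pattern is assumed to mirror what `wordpunct_tokenize` applies internally, and the function name `wordpunct_split` is hypothetical.

```python
import re

# Assumed equivalent of wordpunct_tokenize: runs of word characters,
# or runs of characters that are neither word characters nor whitespace.
WORDPUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def wordpunct_split(text):
    """Split text into alphanumeric runs and punctuation runs."""
    return WORDPUNCT_PATTERN.findall(text)


abbreviation_tokens = wordpunct_split("i.e., my rough work")
contraction_tokens = wordpunct_split("Let's explore")
print(abbreviation_tokens)
print(contraction_tokens)
```

Adjacent punctuation marks are kept together, which is why `'.,'` shows up as one token.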

In [59]:
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()

tokenizer.tokenize(corpus)   # expects one sentence at a time: only the final full stop is split off, earlier ones stay attached to the preceding word

['Hello',
 'welcome',
 'to',
 'my',
 'NLTK',
 'Prctice',
 'i.e.',
 ',',
 'my',
 'rough',
 'work',
 'on',
 'nltk.',
 'Let',
 "'s",
 'explore',
 'what',
 'nltk',
 'can',
 'do.',
 'I',
 "'m",
 'really',
 'excited',
 '!',
 'ready',
 'set',
 'go',
 '.']
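`TreebankWordTokenizer` assumes its input has already been split into sentences, which is why `'nltk.'` and `'do.'` keep their full stops in the output above. A sketch of the usual workaround is to tokenize each sentence separately; the sentences are hard-coded here (copied from the `sent_tokenize` output in 2.1) so the example needs no downloaded data.

```python
from nltk.tokenize import TreebankWordTokenizer

# Sentences copied from the sent_tokenize output in section 2.1
sentence_list = [
    'Hello welcome to my NLTK Prctice i.e., my rough work on nltk.',
    "Let's explore what nltk can do.",
    "I'm really excited!",
    'ready set go.',
]

treebank = TreebankWordTokenizer()

# Tokenizing one sentence at a time lets the tokenizer detach
# the sentence-final punctuation in every sentence.
tokens_per_sentence = [treebank.tokenize(sent) for sent in sentence_list]

for tokens in tokens_per_sentence:
    print(tokens)
```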

### 3. Stemming
- **Stemming** is the process of reducing a word to its **stem**, the base form left after affixes (prefixes and suffixes) are stripped. Unlike a **lemma**, a stem does not have to be a valid dictionary word.
- **Stemming** is important in Natural Language Understanding (NLU) and Natural Language Processing (NLP)
- **Stemming** Examples
    - [eat, eats, eating] --> eat (root word, stem word)
    - [run, runs, running] --> run  (root word, stem word)
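To make the idea concrete, here is a deliberately naive suffix stripper in plain Python. It is a toy sketch, not how any NLTK stemmer is implemented; the function name `naive_stem`, the suffix list, and the minimum stem length are all assumptions chosen for illustration.

```python
def naive_stem(word, suffixes=("ing", "ed", "s"), min_stem=3):
    """Strip the first matching suffix if enough of the word remains."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["playing", "played", "plays", "flies", "ran", "is"]:
    print(f"{w} ---> {naive_stem(w)}")
```

Purely mechanical rules like these cannot relate irregular forms such as 'ran' to 'run', and they can leave non-words such as 'flie'. Real stemmers use more rules, but the same limitation applies.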

In [60]:
words = ['playing', 'played', 'plays', 'flying', 'flies', 'cried', 'crying', 'happier', 'happyly', 'studies', 'studying']

#### 3.1 Porter Stemming
- The **Porter Stemmer** is a widely used algorithm in natural language processing (NLP) for word stemming, that is, reducing words to their base or root form.

In [61]:
from nltk.stem import PorterStemmer
stemming = PorterStemmer()

In [62]:
for word in words :
    print(f"{word} ---> {stemming.stem(word)}")
# some stems are not real words, e.g. [flying ---> fli], [crying ---> cri]

playing ---> play
played ---> play
plays ---> play
flying ---> fli
flies ---> fli
cried ---> cri
crying ---> cri
happier ---> happier
happyly ---> happyli
studies ---> studi
studying ---> studi


In [63]:
stemming.stem('Congratulations') 
# returns 'congratul', which is not a real word and loses the original meaning

'congratul'

In [64]:
print(stemming.stem('sitting')) # returns sit
print(stemming.stem('ssitting')) # returns ssit

# This problem is addressed by lemmatization

sit
ssit


#### 3.2 RegexpStemmer
- The **RegexpStemmer** (Regular Expression Stemmer) is a simple and customizable rule-based stemmer that removes suffixes from words using regular expressions.
- Unlike more complex stemmers such as the **PorterStemmer**, which use rule sets and conditions, **RegexpStemmer** simply applies the regular expression you specify. That makes it very flexible but also very manual.

In [65]:
from nltk.stem import RegexpStemmer

reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [66]:
print(reg_stemmer.stem('eating'))
print(reg_stemmer.stem('ingeating')) # returns 'ingeat' because '$' anchors the match to the end of the word; with 'ing' unanchored (no '$') every 'ing' is removed and the result is 'eat'

eat
ingeat
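The effect of the `$` anchor can be checked with the standard `re` module alone. This sketch assumes that `RegexpStemmer` amounts to deleting every match of the pattern from words that are at least `min` characters long; the helper `regexp_stem` is a hypothetical stand-in, not the NLTK class.

```python
import re


def regexp_stem(word, pattern, min_len=0):
    """Delete every match of pattern, unless the word is shorter than min_len."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)


anchored = r"ing$|s$|e$|able$"   # suffixes only match at the end of the word
unanchored = r"ing"              # 'ing' matches anywhere in the word

anchored_result = regexp_stem("ingeating", anchored, min_len=4)
unanchored_result = regexp_stem("ingeating", unanchored, min_len=4)
short_result = regexp_stem("was", anchored, min_len=4)

print(anchored_result)    # ingeat
print(unanchored_result)  # eat
print(short_result)       # was (shorter than min_len, left untouched)
```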


In [67]:
for word in words :
    print(f"{word} ---> {reg_stemmer.stem(word)}")

playing ---> play
played ---> played
plays ---> play
flying ---> fly
flies ---> flie
cried ---> cried
crying ---> cry
happier ---> happier
happyly ---> happyly
studies ---> studie
studying ---> study


#### 3.3 Snowball Stemmer
-  The **Snowball Stemmer** uses an improved version of the original Porter algorithm (often called Porter2), which handles some cases better, for example 'fairly' ---> 'fair' rather than 'fairli'.
-  Unlike the **Porter Stemmer**, which primarily works for English, the **Snowball Stemmer** supports several languages.
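The supported languages can be listed from the class itself. This is a short sketch; the Spanish word 'corriendo' is just an arbitrary illustration, and its stem is printed rather than asserted here.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names accepted by the constructor
print(SnowballStemmer.languages)

english_stemmer = SnowballStemmer("english")
spanish_stemmer = SnowballStemmer("spanish")

english_stem = english_stemmer.stem("fairly")
spanish_stem = spanish_stemmer.stem("corriendo")

print(english_stem)
print(spanish_stem)
```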

In [68]:
from nltk.stem import SnowballStemmer

snowballstemmer = SnowballStemmer('english')

In [69]:
for word in words :
    print(f'{word} ---> {snowballstemmer.stem(word)}')

playing ---> play
played ---> play
plays ---> play
flying ---> fli
flies ---> fli
cried ---> cri
crying ---> cri
happier ---> happier
happyly ---> happyli
studies ---> studi
studying ---> studi


In [70]:
print('Porter : ' + stemming.stem('fairly'), stemming.stem('sportingly'))

print('Snowball : ' + snowballstemmer.stem('fairly'), snowballstemmer.stem('sportingly'))

Porter : fairli sportingli
Snowball : fair sport


### 4. Lemmatization

- **Lemmatization** is another text preprocessing technique in Natural Language Processing (NLP) that aims to reduce words to their base or dictionary form (called a lemma).
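- Unlike **stemming**, which removes suffixes in a mechanical, rule-based manner, **lemmatization** looks words up in a dictionary (WordNet, in NLTK) and takes the part of speech of a word into account, so the result is normally a real word. With `WordNetLemmatizer` the part of speech is not inferred from context; you pass it through the `pos` argument.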

In [71]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

In [72]:
# nltk.download('wordnet')

#### 4.1 The Part Of Speech tag. Valid options are :
- "n" : nouns **(By Default)**
- "v" : verbs
- "a" : adjectives 
- "r" : adverbs 
- "s" : satellite adjectives
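Taggers such as `nltk.pos_tag` return Penn Treebank tags (see section 6) rather than these single letters, so a small mapping is commonly used to bridge the two. The helper below is a sketch; the name `treebank_to_wordnet` is hypothetical, and falling back to `"n"` is an assumption that matches the lemmatizer's default.

```python
def treebank_to_wordnet(tag):
    """Map a Penn Treebank tag to the letter expected by lemmatize(pos=...)."""
    if tag.startswith("J"):
        return "a"   # JJ, JJR, JJS -> adjective
    if tag.startswith("V"):
        return "v"   # VB, VBD, VBG, VBN, VBP, VBZ -> verb
    if tag.startswith("R"):
        return "r"   # RB, RBR, RBS (and RP) -> adverb
    return "n"       # everything else falls back to noun


for tag in ["NN", "VBG", "JJR", "RB", "CD"]:
    print(f"{tag} ---> {treebank_to_wordnet(tag)}")
```

The result can then be passed as `lemmatizer.lemmatize(word, pos=treebank_to_wordnet(tag))`.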

In [73]:
lemmatizer.lemmatize('going', pos = 'v')

'go'

In [74]:
for word in words:
    print(f"{word} ---> {lemmatizer.lemmatize(word, pos='v')}")

playing ---> play
played ---> play
plays ---> play
flying ---> fly
flies ---> fly
cried ---> cry
crying ---> cry
happier ---> happier
happyly ---> happyly
studies ---> study
studying ---> study


### 5. Stopwords
- **Stopwords** are common words in a language that are usually filtered out before or after processing text in Natural Language Processing (NLP) tasks, because they are considered to carry little meaningful information.
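The filtering itself is a plain membership test. The sketch below uses a tiny hand-written stopword set (an assumption for illustration, not NLTK's list) and lowercases each token before the check, since stopword lists are normally lowercase.

```python
# Small illustrative stopword set; NLTK's English list is much longer
DEMO_STOP_WORDS = {"i", "have", "for", "the", "of", "we", "not", "to"}


def remove_stopwords(tokens, stop_words=DEMO_STOP_WORDS):
    """Keep only tokens whose lowercase form is not a stopword."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


demo_tokens = ["I", "have", "three", "visions", "for", "India", "."]
filtered_tokens = remove_stopwords(demo_tokens)
print(filtered_tokens)
```

Without the `.lower()` call, capitalized tokens such as 'I' would slip through the filter.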

In [75]:
paragraph = """I have three visions for India. In 3000 years of our history people from all over the world have come and invaded us, captured our lands, conquered our minds. From Alexander onwards the Greeks, the Turks, the Moguls, the Portuguese, the British, the French, the Dutch, all of them came and looted us, took over what was ours. Yet we have not done this to any other nation. We have not conquered anyone. We have not grabbed their land, their culture and their history and tried to enforce our way of life on them. Why? Because we respect the freedom of others. That is why my FIRST VISION is that of FREEDOM. I believe that India got its first vision of this in 1857, when we started the war of Independence. It is this freedom that we must protect and nurture and build on. If we are not free, no one will respect us.

We have 10 percent growth rate in most areas. Our poverty levels are falling. Our achievements are being globally recognised today. Yet we lack the self-confidence to see ourselves as a developed nation, self-reliant and self-assured. Isn't this incorrect? MY SECOND VISION for India is DEVELOPMENT. For fifty years we have been a developing nation. It is time we see ourselves as a developed nation. We are among top five nations in the world in terms of GDP.

I have a THIRD VISION. India must stand up to the world. Because I believe that unless India stands up to the world, no one will respect us. Only strength respects strength. We must be strong not only as a military power but also as an economic power. Both must go hand-in-hand. My good fortune was to have worked with three great minds. Dr.Vikram Sarabhai, of the Dept. of Space, Professor Satish Dhawan, who succeeded him and Dr. Brahm Prakash, father of nuclear material. I was lucky to have worked with all three of them closely and consider this the great opportunity of my life.

I was in Hyderabad giving this lecture, when a 14 year-old girl asked me for my autograph. I asked her what her goal in life is. She replied: I want to live in a developed India. For her, you and I will have to build this developed India. You must proclaim India is not an underdeveloped nation; it is a highly developed nation.

You say that our government is inefficient. You say that our laws are too old. You say that the municipality does not pick up the garbage. You say that the phones don't work, the railways are a joke, the airline is the worst in the world, and mails never reach their destination. You say that our country has been fed to the dogs and is the absolute pits. You say, say and say. What do you do about it?

Dear Indians, I am echoing J.F.Kennedy's words to his fellow Americans to relate to Indians ……. “ASK WHAT WE CAN DO FOR INDIA AND DO WHAT HAS TO BE DONE TO MAKE INDIA WHAT AMERICA AND OTHER WESTERN COUNTRIES ARE TODAY.”"""

In [76]:
# from nltk.stem import PorterStemmer
# from nltk.stem import SnowballStemmer
# from nltk.stem import WordNetLemmatizer
# from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize






In [77]:
import nltk
# nltk.download('stopwords')

In [78]:
from nltk.corpus import stopwords
stopwords.words('english')

['a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 "he'd",
 "he'll",
 'her',
 'here',
 'hers',
 'herself',
 "he's",
 'him',
 'himself',
 'his',
 'how',
 'i',
 "i'd",
 'if',
 "i'll",
 "i'm",
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it'd",
 "it'll",
 "it's",
 'its',
 'itself',
 "i've",
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'on

In [79]:
sentences = sent_tokenize(paragraph)

In [80]:
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))  # build the set once instead of once per word

sentences_p = sentences.copy()

for i in range(len(sentences_p)) :
    words = word_tokenize(sentences_p[i])
    # the check is case-sensitive, so capitalized stopwords such as 'I' or 'We' are kept
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentences_p[i] = ' '.join(words) # joining the stemmed words back into a sentence

In [81]:
sentences_p

['i three vision india .',
 'in 3000 year histori peopl world come invad us , captur land , conquer mind .',
 'from alexand onward greek , turk , mogul , portugues , british , french , dutch , came loot us , took .',
 'yet done nation .',
 'we conquer anyon .',
 'we grab land , cultur histori tri enforc way life .',
 'whi ?',
 'becaus respect freedom other .',
 'that first vision freedom .',
 'i believ india got first vision 1857 , start war independ .',
 'it freedom must protect nurtur build .',
 'if free , one respect us .',
 'we 10 percent growth rate area .',
 'our poverti level fall .',
 'our achiev global recognis today .',
 'yet lack self-confid see develop nation , self-reli self-assur .',
 "is n't incorrect ?",
 'my second vision india develop .',
 'for fifti year develop nation .',
 'it time see develop nation .',
 'we among top five nation world term gdp .',
 'i third vision .',
 'india must stand world .',
 'becaus i believ unless india stand world , one respect us .',
 'on

In [82]:
from nltk.stem import SnowballStemmer

snowballstemmer = SnowballStemmer('english')
stop_words = set(stopwords.words('english'))  # build the set once instead of once per word

sentences_s = sentences.copy()

for i in range(len(sentences_s)) :
    words = nltk.word_tokenize(sentences_s[i])
    # the check is case-sensitive, so capitalized stopwords such as 'I' or 'We' are kept
    words = [snowballstemmer.stem(word) for word in words if word not in stop_words]
    sentences_s[i] = ' '.join(words) # joining the stemmed words back into a sentence

In [83]:
sentences_s

['i three vision india .',
 'in 3000 year histori peopl world come invad us , captur land , conquer mind .',
 'from alexand onward greek , turk , mogul , portugues , british , french , dutch , came loot us , took .',
 'yet done nation .',
 'we conquer anyon .',
 'we grab land , cultur histori tri enforc way life .',
 'whi ?',
 'becaus respect freedom other .',
 'that first vision freedom .',
 'i believ india got first vision 1857 , start war independ .',
 'it freedom must protect nurtur build .',
 'if free , one respect us .',
 'we 10 percent growth rate area .',
 'our poverti level fall .',
 'our achiev global recognis today .',
 'yet lack self-confid see develop nation , self-reli self-assur .',
 "is n't incorrect ?",
 'my second vision india develop .',
 'for fifti year develop nation .',
 'it time see develop nation .',
 'we among top five nation world term gdp .',
 'i third vision .',
 'india must stand world .',
 'becaus i believ unless india stand world , one respect us .',
 'on

In [84]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))  # build the set once instead of once per word

sentences_l = sentences.copy()

for i in range(len(sentences_l)) :
    words = word_tokenize(sentences_l[i])
    # the check is case-sensitive, so capitalized stopwords such as 'I' or 'We' are kept
    words = [lemmatizer.lemmatize(word.lower(), pos='v') for word in words if word not in stop_words]
    sentences_l[i] = ' '.join(words) # joining the lemmatized words back into a sentence

In [85]:
sentences_l

['i three visions india .',
 'in 3000 years history people world come invade us , capture land , conquer mind .',
 'from alexander onwards greeks , turks , moguls , portuguese , british , french , dutch , come loot us , take .',
 'yet do nation .',
 'we conquer anyone .',
 'we grab land , culture history try enforce way life .',
 'why ?',
 'because respect freedom others .',
 'that first vision freedom .',
 'i believe india get first vision 1857 , start war independence .',
 'it freedom must protect nurture build .',
 'if free , one respect us .',
 'we 10 percent growth rate areas .',
 'our poverty level fall .',
 'our achievements globally recognise today .',
 'yet lack self-confidence see develop nation , self-reliant self-assured .',
 "be n't incorrect ?",
 'my second vision india development .',
 'for fifty years develop nation .',
 'it time see develop nation .',
 'we among top five nations world term gdp .',
 'i third vision .',
 'india must stand world .',
 'because i believe un
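The three loops above differ only in the function applied to each word, so they could be folded into one helper. This is a sketch under stated assumptions: `preprocess` is a hypothetical name, `str.split` and `str.lower` stand in for `word_tokenize` and a stemmer or lemmatizer so the example runs without NLTK data, and unlike the cells above it lowercases each token before the stopword check.

```python
def preprocess(sentences, tokenize, normalize, stop_words):
    """Tokenize each sentence, drop stopwords, normalize the rest, and rejoin."""
    cleaned = []
    for sentence in sentences:
        kept = [
            normalize(tok)
            for tok in tokenize(sentence)
            if tok.lower() not in stop_words  # case-insensitive check (assumption)
        ]
        cleaned.append(" ".join(kept))
    return cleaned


demo_stop_words = {"we", "have", "not", "the", "of"}
demo_sentences = [
    "We have not conquered anyone .",
    "Because we respect the freedom of others .",
]

demo_result = preprocess(
    demo_sentences,
    tokenize=str.split,    # stand-in for word_tokenize
    normalize=str.lower,   # stand-in for stemmer.stem or lemmatizer.lemmatize
    stop_words=demo_stop_words,
)
print(demo_result)
```

With NLTK loaded, passing `tokenize=word_tokenize` and `normalize=stemmer.stem` would give a stemming pipeline in one call.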

### 6. Parts of Speech Tagging
- **Part-of-Speech** (POS) tagging is the process of labeling each word in a sentence with its corresponding part of speech (such as noun, verb, adjective, etc.), based on both its definition and its context.


- Full list of POS tags :
 Penn Treebank POS Tagset (36 Tags)

| Tag   | Description                                | Example(s)               |
|--------|--------------------------------------------|---------------------------|
| **CC**  | Coordinating conjunction                   | and, but, or              |
| **CD**  | Cardinal number                            | one, two, 100             |
| **DT**  | Determiner                                 | the, a, an, this          |
| **EX**  | Existential "there"                        | there (in "there is")     |
| **FW**  | Foreign word                               | d'accord, coup d’état     |
| **IN**  | Preposition/subordinating conjunction      | in, of, like, because     |
| **JJ**  | Adjective                                  | green, quick, easy        |
| **JJR** | Adjective, comparative                     | better, faster            |
| **JJS** | Adjective, superlative                     | best, fastest             |
| **LS**  | List item marker                           | 1., A., a)                |
| **MD**  | Modal auxiliary verb                       | can, must, should         |
| **NN**  | Noun, singular or mass                     | cat, knowledge            |
| **NNS** | Noun, plural                               | cats, cars                |
| **NNP** | Proper noun, singular                      | John, London              |
| **NNPS**| Proper noun, plural                        | Americans, Beatles        |
| **PDT** | Predeterminer                              | all, both (in "all the children") |
| **POS** | Possessive ending                          | 's                        |
| **PRP** | Personal pronoun                           | I, you, he, she, it       |
| **PRP$**| Possessive pronoun                         | my, your, his             |
| **RB**  | Adverb                                     | quickly, very             |
| **RBR** | Adverb, comparative                        | faster, better            |
| **RBS** | Adverb, superlative                        | best, most                |
| **RP**  | Particle                                   | up, off (in "give up")    |
| **SYM** | Symbol                                     | $, %, &                   |
| **TO**  | "to"                                       | to (in "to go")           |
| **UH**  | Interjection                               | oh, wow, hey              |
| **VB**  | Verb, base form                            | go, eat                   |
| **VBD** | Verb, past tense                           | went, ate                 |
| **VBG** | Verb, gerund/present participle            | going, eating             |
| **VBN** | Verb, past participle                      | gone, eaten               |
| **VBP** | Verb, non-3rd person singular present      | go, eat                   |
| **VBZ** | Verb, 3rd person singular present          | goes, eats                |
| **WDT** | Wh-determiner                              | which, that               |
| **WP**  | Wh-pronoun                                 | who, what                 |
| **WP$** | Possessive wh-pronoun                      | whose                     |
| **WRB** | Wh-adverb                                  | where, when, why          |



In [86]:
import nltk
from nltk.corpus import stopwords
# nltk.download('averaged_perceptron_tagger_eng')

In [87]:
sentences_pos = nltk.sent_tokenize(paragraph)
sentences_pos

['I have three visions for India.',
 'In 3000 years of our history people from all over the world have come and invaded us, captured our lands, conquered our minds.',
 'From Alexander onwards the Greeks, the Turks, the Moguls, the Portuguese, the British, the French, the Dutch, all of them came and looted us, took over what was ours.',
 'Yet we have not done this to any other nation.',
 'We have not conquered anyone.',
 'We have not grabbed their land, their culture and their history and tried to enforce our way of life on them.',
 'Why?',
 'Because we respect the freedom of others.',
 'That is why my FIRST VISION is that of FREEDOM.',
 'I believe that India got its first vision of this in 1857, when we started the war of Independence.',
 'It is this freedom that we must protect and nurture and build on.',
 'If we are not free, no one will respect us.',
 'We have 10 percent growth rate in most areas.',
 'Our poverty levels are falling.',
 'Our achievements are being globally recognised

In [88]:
    
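stop_words = set(stopwords.words('english'))  # build the set once instead of once per word

for i in range(len(sentences_pos)):
    words = nltk.word_tokenize(sentences_pos[i])
    # the check is case-sensitive, so capitalized stopwords such as 'I' or 'We' are kept
    words = [word for word in words if word not in stop_words]
    pos_tag = nltk.pos_tag(words)
    print(pos_tag)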


[('I', 'PRP'), ('three', 'CD'), ('visions', 'NNS'), ('India', 'NNP'), ('.', '.')]
[('In', 'IN'), ('3000', 'CD'), ('years', 'NNS'), ('history', 'NN'), ('people', 'NNS'), ('world', 'NN'), ('come', 'VBP'), ('invaded', 'VBN'), ('us', 'PRP'), (',', ','), ('captured', 'VBD'), ('lands', 'NNS'), (',', ','), ('conquered', 'VBD'), ('minds', 'NNS'), ('.', '.')]
[('From', 'IN'), ('Alexander', 'NNP'), ('onwards', 'NNS'), ('Greeks', 'NNP'), (',', ','), ('Turks', 'NNP'), (',', ','), ('Moguls', 'NNP'), (',', ','), ('Portuguese', 'NNP'), (',', ','), ('British', 'NNP'), (',', ','), ('French', 'NNP'), (',', ','), ('Dutch', 'NNP'), (',', ','), ('came', 'VBD'), ('looted', 'JJ'), ('us', 'PRP'), (',', ','), ('took', 'VBD'), ('.', '.')]
[('Yet', 'RB'), ('done', 'VBN'), ('nation', 'NN'), ('.', '.')]
[('We', 'PRP'), ('conquered', 'VBD'), ('anyone', 'NN'), ('.', '.')]
[('We', 'PRP'), ('grabbed', 'VBD'), ('land', 'NN'), (',', ','), ('culture', 'NN'), ('history', 'NN'), ('tried', 'VBD'), ('enforce', 'JJ'), ('way',
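Each tagged sentence is a list of `(word, tag)` tuples, so ordinary list operations are enough to work with it. The sketch below takes the first line of the output above as hard-coded input and pulls out the nouns and a tag frequency count.

```python
from collections import Counter

# First tagged sentence, copied from the output above
tagged_sentence = [('I', 'PRP'), ('three', 'CD'), ('visions', 'NNS'),
                   ('India', 'NNP'), ('.', '.')]

# All noun tags in the Penn Treebank set start with 'NN'
nouns = [word for word, tag in tagged_sentence if tag.startswith('NN')]

# How often each tag occurs in the sentence
tag_counts = Counter(tag for _word, tag in tagged_sentence)

print(nouns)
print(tag_counts)
```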

### 7. Named Entity Recognition


NLTK’s `ne_chunk()` function identifies and labels the following named entities in text.

#### 📋 NER Labels (from `nltk.ne_chunk()`)

| Label          | Description                                        | Example(s)                       |
|----------------|----------------------------------------------------|----------------------------------|
| **PERSON**      | People, fictional characters                       | Barack Obama, Sherlock Holmes    |
| **ORGANIZATION**| Companies, agencies, institutions                  | Google, United Nations           |
| **GPE**         | Geo-Political Entities (countries, cities, states) | India, New York, California      |
| **LOCATION**    | Physical locations (non-GPE)                       | Mount Everest, Nile River        |
| **FACILITY**    | Buildings, airports, highways, etc. (less common) | Empire State Building, Heathrow  |
| **GSP**         | Geopolitical subgroup (rare in NLTK)              | The South, West Coast            |




In [89]:
sentence_ner = """The Eiffel Tower is a landmark in Paris.  It was designed by Gustave Eiffel's company for the 1889 World's Fair and initially intended as a temporary structure. In 1909, there were discussions regarding the potential dismantling of the Eiffel Tower. However, the decision to save this iconic structure was ultimately reached by city officials, who acknowledged its importance as a radiotelegraphy station.
The Tower cost 7,799,401.31 French gold francs to build in 1889, an amount equal to $1,495,139.89 at that time. Today, its cost would equal to $36,784,020.11"""

In [90]:
import nltk
# nltk.download('maxent_ne_chunker')
# nltk.download('maxent_ne_chunker_tab')
# nltk.download('words')

In [91]:
words_ner = nltk.word_tokenize(sentence_ner)
tag_elements = nltk.pos_tag(words_ner)

In [92]:
nltk.ne_chunk(tag_elements).draw()
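`.draw()` opens a separate GUI window, which is not always available (for example on a remote server). The entities can also be read directly from the returned tree. In the sketch below the tree is built by hand as a stand-in: its labels are hypothetical and are not the actual output of `ne_chunk` on the text above. The helper name `extract_entities` is also an assumption.

```python
from nltk.tree import Tree

# Hand-built stand-in for the kind of tree ne_chunk returns (hypothetical labels)
chunked = Tree('S', [
    ('a', 'DT'),
    ('landmark', 'NN'),
    ('in', 'IN'),
    Tree('GPE', [('Paris', 'NNP')]),
    ('designed', 'VBN'),
    ('by', 'IN'),
    Tree('PERSON', [('Gustave', 'NNP'), ('Eiffel', 'NNP')]),
])


def extract_entities(tree):
    """Collect (label, text) pairs from the top-level entity subtrees."""
    entities = []
    for node in tree:
        # Plain tokens are (word, tag) tuples; named entities are subtrees
        if isinstance(node, Tree):
            text = ' '.join(word for word, _tag in node.leaves())
            entities.append((node.label(), text))
    return entities


entities = extract_entities(chunked)
print(entities)
```

The same function can be applied to `nltk.ne_chunk(tag_elements)` once the chunker resources are downloaded.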