# Text analytics using NLTK

Install NLTK first with `pip install nltk`.
More information is available on the [NLTK website](http://www.nltk.org/).

This notebook walks through a short introductory example of using NLTK, to check that everything works as expected.

In [30]:
# Import NLTK package.
import nltk

# Download averaged perceptron tagger (for pos_tag)
nltk.download('averaged_perceptron_tagger')

# Download the named-entity chunker (for ne_chunk)
nltk.download('maxent_ne_chunker')

# Download popular packages
nltk.download('popular', halt_on_error=False)

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/ahmet/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /Users/ahmet/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading collection 'popular'
[nltk_data]    | 
[nltk_data]    | Downloading package cmudict to
[nltk_data]    |     /Users/ahmet/nltk_data...
[nltk_data]    |   Unzipping corpora/cmudict.zip.
[nltk_data]    | Downloading package gazetteers to
[nltk_data]    |     /Users/ahmet/nltk_data...
[nltk_data]    |   Unzipping corpora/gazetteers.zip.
[nltk_data]    | Downloading package genesis to
[nltk_data]    |     /Users/ahmet/nltk_data...
[nltk_data]    |   Unzipping corpora/genesis.zip.
[nltk_data]    | Downloading package gutenberg to
[nltk_data]    |     /Users/ahmet/nltk_data...
[nltk_data]    |   Unzipping corpora/gutenberg

True

The tags returned by `pos_tag` come from the Penn Treebank tag set; the full list is available [here](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html).

| Number | Tag | Description |
|---|---|---|
| 1 | CC | Coordinating conjunction |
| 2 | CD | Cardinal number |
| 3 | DT | Determiner |
| 4 | EX | Existential there |
| 5 | FW | Foreign word |
| 6 | IN | Preposition or subordinating conjunction |
| 7 | JJ | Adjective |
| 8 | JJR | Adjective, comparative |
| 9 | JJS | Adjective, superlative |
| 10 | LS | List item marker |
| 11 | MD | Modal |
| 12 | NN | Noun, singular or mass |
| 13 | NNS | Noun, plural |
| 14 | NNP | Proper noun, singular |
| 15 | NNPS | Proper noun, plural |
| 16 | PDT | Predeterminer |
| 17 | POS | Possessive ending |
| 18 | PRP | Personal pronoun |
| 19 | PRP$ | Possessive pronoun |
| 20 | RB | Adverb |
| 21 | RBR | Adverb, comparative |
| 22 | RBS | Adverb, superlative |
| 23 | RP | Particle |
| 24 | SYM | Symbol |
| 25 | TO | to |
| 26 | UH | Interjection |
| 27 | VB | Verb, base form |
| 28 | VBD | Verb, past tense |
| 29 | VBG | Verb, gerund or present participle |
| 30 | VBN | Verb, past participle |
| 31 | VBP | Verb, non-3rd person singular present |
| 32 | VBZ | Verb, 3rd person singular present |
| 33 | WDT | Wh-determiner |
| 34 | WP | Wh-pronoun |
| 35 | WP$ | Possessive wh-pronoun |
| 36 | WRB | Wh-adverb |

In [25]:
sentence = """At eight o'clock on Thursday morning Arthur didn't feel very good."""

# Extract word tokens
tokens = nltk.word_tokenize(sentence)
print(tokens)

# Tag each token with its part of speech
tagged = nltk.pos_tag(tokens)
print(tagged)

['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'NN'), ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN'), ('Arthur', 'NNP'), ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'), ('very', 'RB'), ('good', 'JJ'), ('.', '.')]
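
The tagged result is a plain list of `(word, tag)` tuples, so it can be filtered with ordinary Python. The following is a minimal sketch that reuses the output printed above (copied in by hand, not recomputed) and picks out the nouns and groups words by tag. The names `words_by_tag` and `nouns` are illustrative only.

```python
from collections import defaultdict

# Copied from the pos_tag output above so the sketch runs without NLTK data.
tagged = [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'NN'), ('on', 'IN'),
          ('Thursday', 'NNP'), ('morning', 'NN'), ('Arthur', 'NNP'),
          ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'), ('very', 'RB'),
          ('good', 'JJ'), ('.', '.')]

# Group the words under their tag.
words_by_tag = defaultdict(list)
for word, tag in tagged:
    words_by_tag[tag].append(word)

# All noun tags (NN, NNS, NNP, NNPS) start with 'NN'.
nouns = [word for word, tag in tagged if tag.startswith('NN')]

print(nouns)
print(words_by_tag['RB'])
```

Matching on the `NN` prefix relies on the tag naming in the table above, where every noun tag begins with those two letters.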


In [31]:
entities = nltk.chunk.ne_chunk(tagged)
print(entities)

(S
  At/IN
  eight/CD
  o'clock/NN
  on/IN
  Thursday/NNP
  morning/NN
  (PERSON Arthur/NNP)
  did/VBD
  n't/RB
  feel/VB
  very/RB
  good/JJ
  ./.)
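
The chunker returns an `nltk.tree.Tree` in which each recognised entity is a labelled subtree. As a rough sketch of how the entities could be pulled out, the tree printed above is rebuilt here by hand (so no model download is needed) and walked with `Tree.subtrees`. The variable `found` is illustrative, and the root label `S` is assumed from the output above.

```python
from nltk.tree import Tree

# Hand-built copy of the ne_chunk result shown above.
entities = Tree('S', [
    ('At', 'IN'), ('eight', 'CD'), ("o'clock", 'NN'), ('on', 'IN'),
    ('Thursday', 'NNP'), ('morning', 'NN'),
    Tree('PERSON', [('Arthur', 'NNP')]),
    ('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'), ('very', 'RB'),
    ('good', 'JJ'), ('.', '.'),
])

# subtrees() also yields the root, so skip the sentence-level 'S' node.
found = []
for subtree in entities.subtrees(filter=lambda t: t.label() != 'S'):
    name = ' '.join(word for word, tag in subtree.leaves())
    found.append((subtree.label(), name))

print(found)
```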


In [35]:
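import os

from nltk.corpus import treebank
from nltk.draw.tree import TreeView
from nltk.tree import Tree

# Draw the first parsed sentence of the treebank sample (opens a Tk window).
t = treebank.parsed_sents('wsj_0001.mrg')[0]
t.draw()

# Build a small tree by hand and save its drawing as PostScript.
t = Tree.fromstring('(S (NP this tree) (VP (V is) (AdjP pretty)))')
TreeView(t)._cframe.print_to_file('output.ps')

# Convert to PNG; this needs ImageMagick's `convert` command on the PATH.
os.system('convert output.ps output.png')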

32512
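
The `32512` printed above is the return value of `os.system`, not a success flag. Assuming a POSIX system (the `/Users/...` paths above suggest macOS), that value is a wait status whose high byte holds the exit code. The sketch below decodes it and checks whether `convert` is installed; the variable names are illustrative.

```python
import shutil

status = 32512  # value returned by os.system in the cell above

# On POSIX the exit code is stored in the high byte of the wait status.
exit_code = status >> 8
print(exit_code)

# shutil.which returns None when the executable is not on the PATH.
convert_path = shutil.which('convert')
if convert_path is None:
    print('convert not found; install ImageMagick to create output.png')
else:
    print('convert found at', convert_path)
```

An exit code of 127 is what the shell conventionally reports when a command cannot be found, so the PNG was most likely never written in the run shown above.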