# Exercise 03: Splitting sentences and PoS annotation

Let's start with a simple paragraph, copied from the course description:

In [None]:
text = """
Increasingly, customers send text to interact or leave comments, 
which provides a wealth of data for text mining. That’s a great 
starting point for developing custom search, content recommenders, 
and even AI applications.
"""
repr(text)

Notice how there are explicit *line breaks* in the text. Let's write some code to flow the paragraph without any line breaks:

In [None]:
text = " ".join(line.strip() for line in text.split("\n")).strip()
repr(text)
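
As an alternative sketch, calling `str.split()` with no arguments splits on any run of whitespace, so joining the pieces with single spaces flows the paragraph in one step. The names `sample`, `by_lines`, and `flowed` are only for illustration. Note that this variant also collapses repeated spaces inside a line, which the line-by-line version leaves alone.

```python
# Illustrative sample with explicit line breaks
sample = """
Increasingly, customers send text to interact or leave comments,
which provides a wealth of data for text mining.
"""

# Line-by-line approach: strip each line, then join with single spaces
by_lines = " ".join(line.strip() for line in sample.split("\n")).strip()

# split() with no arguments splits on any whitespace run and drops empty pieces
flowed = " ".join(sample.split())

print(repr(flowed))
print(flowed == by_lines)
```

For this sample the two approaches produce the same string, since no line contains repeated interior spaces.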

Now we can use [TextBlob](http://textblob.readthedocs.io/) to *split* the paragraph into sentences:

In [None]:
from textblob import TextBlob

for sent in TextBlob(text).sentences:
    print("> ", sent)

Next we take a sentence and *annotate* it with part-of-speech (PoS) tags:

In [None]:
# textblob_aptagger is a separate extension package for TextBlob
import textblob_aptagger as tag

sent = "Increasingly, customers send text to interact or leave comments, which provides a wealth of data for text mining."

# tag() returns a list of (token, tag) tuples
ts = tag.PerceptronTagger().tag(sent)
print(ts)

Given these part-of-speech annotations, we can *lemmatize* nouns and verbs to get their root forms. This also singularizes plural nouns:

In [None]:
from textblob import Word

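# A hand-picked subset of (token, tag) pairs
ts = [('InterAct', 'VB'), ('comments', 'NNS'), ('provides', 'VBZ'), ('mining', 'NN')]

for lex, pos in ts:
    # Lowercase the token before lookup
    w = Word(lex.lower())
    # The first letter of the tag, lowercased, gives 'n' for nouns and 'v' for verbs
    lemma = w.lemmatize(pos[0].lower())
    print(lex, pos, lemma)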
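
The `pos[0].lower()` shortcut works here because the Penn Treebank noun tags start with `N` and the verb tags with `V`, which happen to match the WordNet codes `n` and `v`. Adjective tags (`JJ`) and adverb tags (`RB`) do not line up the same way, since WordNet uses `a` and `r` for those. Below is a minimal sketch of a more explicit mapping. `penn_to_wordnet` is a hypothetical helper name, the noun fallback is an assumption, and the tagged pairs are hand-written for illustration rather than tagger output.

```python
def penn_to_wordnet(tag, default="n"):
    """Map a Penn Treebank tag to a WordNet part-of-speech code.

    Hypothetical helper: falls back to `default` (noun) for any tag
    that is not a noun, verb, adjective, or adverb.
    """
    prefixes = {"NN": "n", "VB": "v", "JJ": "a", "RB": "r"}
    return prefixes.get(tag[:2].upper(), default)


# Hand-written (token, tag) pairs, for illustration only
tags = [("interact", "VB"), ("comments", "NNS"), ("great", "JJ"), ("increasingly", "RB")]

for lex, pos in tags:
    print(lex, pos, penn_to_wordnet(pos))
```

The returned code could then be passed to `lemmatize()` in place of `pos[0].lower()`, assuming the installed TextBlob version accepts WordNet codes there.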

We can also look up synonyms and definitions for each word, using *synsets* from [WordNet](https://wordnet.princeton.edu/):

In [None]:
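# VERB is not used below; passing pos=VERB to get_synsets() and define()
# would restrict the results to verb senses
from textblob.wordnet import VERB

w = Word("comments")

# Pair each synset with its definition
for synset, definition in zip(w.get_synsets(), w.define()):
    print(synset, definition)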