# NLP with Python

Python's NLP ecosystem is already quite mature, and since we can call Python code from within Julia
via PyCall.jl, it is worth learning a bit of NLP with Python.

In [1]:
from nltk.tokenize import sent_tokenize, word_tokenize

mytext = '''In the previous chapter, we saw examples of some common NLP
applications that we might encounter in everyday life. If we were asked to
build such an application, think about how we would approach doing so at our
organization. We would normally walk through the requirements and break the
problem down into several sub-problems, then try to develop a step-by-step
procedure to solve them. Since language processing is involved, we would also
list all the forms of text processing needed at each step. This step-by-step
processing of text is known as pipeline. It is the series of steps involved in
building any NLP model. These steps are common in every NLP project, so it
makes sense to study them in this chapter. Understanding some common procedures
in any NLP pipeline will enable us to get started on any NLP problem encountered
in the workplace. Laying out and developing a text-processing pipeline is seen
as a starting point for any NLP application development process. In this
chapter, we will learn about the various steps involved and how they play
important roles in solving the NLP problem and we’ll see a few guidelines
about when and how to use which step. In later chapters, we’ll discuss
specific pipelines for various NLP tasks (e.g., Chapters 4–7).'''

my_sentences = sent_tokenize(mytext)
my_sentences

['In the previous chapter, we saw examples of some common NLP\napplications that we might encounter in everyday life.',
 'If we were asked to\nbuild such an application, think about how we would approach doing so at our\norganization.',
 'We would normally walk through the requirements and break the\nproblem down into several sub-problems, then try to develop a step-by-step\nprocedure to solve them.',
 'Since language processing is involved, we would also\nlist all the forms of text processing needed at each step.',
 'This step-by-step\nprocessing of text is known as pipeline.',
 'It is the series of steps involved in\nbuilding any NLP model.',
 'These steps are common in every NLP project, so it\nmakes sense to study them in this chapter.',
 'Understanding some common procedures\nin any NLP pipeline will enable us to get started on any NLP problem encountered\nin the workplace.',
 'Laying out and developing a text-processing pipeline is seen\nas a starting point for any NLP applicatio
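
The sentences returned above still carry the line breaks of the original string (the `\n` inside each item). A minimal sketch of a clean-up step follows, using only the standard library. The helpers `normalize_whitespace` and `simple_words` are hypothetical names introduced here for illustration; `simple_words` is a naive regex splitter, not NLTK's `word_tokenize`, and will behave differently on contractions, abbreviations and similar cases.

```python
import re

# Two sentences copied from the sent_tokenize output above, line breaks included.
sentences = [
    "This step-by-step\nprocessing of text is known as pipeline.",
    "It is the series of steps involved in\nbuilding any NLP model.",
]


def normalize_whitespace(sentence):
    """Collapse newlines and repeated spaces into single spaces."""
    return " ".join(sentence.split())


def simple_words(sentence):
    """Naive splitter (an assumption, not NLTK's algorithm):
    hyphenated words stay together, punctuation marks become single tokens."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)


clean = [normalize_whitespace(s) for s in sentences]
tokens = [simple_words(s) for s in clean]

print(clean[0])
print(tokens[0])
```

For real work, applying `word_tokenize` (already imported in the first cell) to each cleaned sentence is likely the better choice; the regex version only makes the idea visible.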

In [2]:
import nltk
import pandas as pd

In [3]:
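# Requires the corpus: nltk.download('machado'). Keeps the first 2000 characters of each short story (conto).
contos = [nltk.corpus.machado.raw(fileid)[:2000] for fileid in nltk.corpus.machado.fileids() if "contos/" in fileid]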

In [4]:
data = pd.DataFrame({'contos':contos})

In [5]:
import spacy
# The Portuguese model must be installed first: python -m spacy download pt_core_news_md
nlp = spacy.load("pt_core_news_md")

In [6]:
# Run the spaCy pipeline on every text; each cell of "tokens" holds a spaCy Doc.
data["tokens"] = data["contos"].apply(nlp)

In [7]:
doc = nlp('Exemplos de textos aqui. Vamos ver como funciona.')

In [8]:
token = doc[0]
token.lemma_  # lemma of the first token; not displayed, since only the last expression is shown
token

Exemplos

In [9]:
from spacy import displacy

displacy.render(data["tokens"][0], style='dep', jupyter=True, options={'distance': 120})

In [6]:
# Print per-token annotations: text, lemma, part of speech, shape, alphabetic flag, stop-word flag.
doc = nlp('Testando um texto aleatório. Vamos ver como se comporta a ferramenta.')
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.shape_, token.is_alpha, token.is_stop)

Testando Testar VERB Xxxxx True False
um um DET xx True True
texto texto NOUN xxxx True False
aleatório aleatório ADJ xxxx True False
. . PUNCT . False False
Vamos ir AUX Xxxxx True False
ver ver VERB xxx True True
como como ADV xxxx True True
se se PRON xx True True
comporta comportar VERB xxxx True False
a o DET x True True
ferramenta ferramenta NOUN xxxx True False
. . PUNCT . False False
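
The fourth column above (`token.shape_`) summarizes the orthographic form of each token. The sketch below reproduces the shapes printed above with plain Python, under the assumption that letters map to `x`/`X`, digits to `d`, other characters to themselves, and that runs of the same shape character are cut off after four. `word_shape` here is a standalone illustration written for this notebook, not a call into spaCy.

```python
def word_shape(text):
    """Approximate a token's shape: 'Testando' -> 'Xxxxx', 'um' -> 'xx'."""
    shape = []
    last = ""  # previous shape character
    run = 0    # how many times `last` has repeated so far
    for char in text:
        if char.isalpha():
            shape_char = "X" if char.isupper() else "x"
        elif char.isdigit():
            shape_char = "d"
        else:
            shape_char = char  # punctuation and symbols are kept as they are
        if shape_char == last:
            run += 1
        else:
            run = 0
            last = shape_char
        if run < 4:  # keep at most four repeats of the same shape character
            shape.append(shape_char)
    return "".join(shape)


for word in ["Testando", "um", "aleatório", ".", "a"]:
    print(word, word_shape(word))
```

Shapes are handy as coarse features: capitalized words, all-lowercase words and punctuation each fall into a small number of buckets regardless of the actual letters.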


In [10]:
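from spacy.lang.pt import Portuguese

# Portuguese() builds a blank pipeline (tokenizer only); it is not used below.
parser = Portuguese()
# The annotations printed here still come from the trained `nlp` pipeline loaded earlier.
doc = nlp('Testando um texto aleatório. Vamos ver como se comporta a ferramenta.')
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.shape_, token.is_alpha, token.is_stop)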

Testando Testar VERB Xxxxx True False
um um DET xx True True
texto texto NOUN xxxx True False
aleatório aleatório ADJ xxxx True False
. . PUNCT . False False
Vamos ir AUX Xxxxx True False
ver ver VERB xxx True True
como como ADV xxxx True True
se se PRON xx True True
comporta comportar VERB xxxx True False
a o DET x True True
ferramenta ferramenta NOUN xxxx True False
. . PUNCT . False False
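
The `is_alpha` and `is_stop` flags printed above are enough to reduce a text to its content words. The following is a minimal sketch of that filtering step; so that it runs without spaCy or the Portuguese model, the rows are copied by hand from the output above (shape column omitted). With a real `Doc`, the same condition could be applied directly to each `token`.

```python
from collections import Counter

# (text, lemma, pos, is_alpha, is_stop), transcribed from the printed output above.
rows = [
    ("Testando", "Testar", "VERB", True, False),
    ("um", "um", "DET", True, True),
    ("texto", "texto", "NOUN", True, False),
    ("aleatório", "aleatório", "ADJ", True, False),
    (".", ".", "PUNCT", False, False),
    ("Vamos", "ir", "AUX", True, False),
    ("ver", "ver", "VERB", True, True),
    ("como", "como", "ADV", True, True),
    ("se", "se", "PRON", True, True),
    ("comporta", "comportar", "VERB", True, False),
    ("a", "o", "DET", True, True),
    ("ferramenta", "ferramenta", "NOUN", True, False),
    (".", ".", "PUNCT", False, False),
]

# Keep lowercase lemmas of alphabetic tokens that are not stop words.
content_lemmas = [
    lemma.lower()
    for text, lemma, pos, is_alpha, is_stop in rows
    if is_alpha and not is_stop
]

lemma_counts = Counter(content_lemmas)
print(content_lemmas)
print(lemma_counts.most_common(3))
```

Counting lemmas rather than surface forms means that inflected variants of the same word are tallied together.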


## Bag-Of-Words