### This notebook summarizes some core NLP tasks I learned.

In [1]:
import spacy
nlp = spacy.load('en_core_web_lg')  # the small (sm) and medium (md) models work too

In [2]:
#random sentences
doc=nlp(u'I am Azka. I want to learn Machine Learning more. I want to dive deep in Deep Learning. I am passionate about Artificial Intelligence. I want to join this community!.')

# Tokenization

In [3]:
# Split the document into sentences, then print the tokens of the last sentence
sents = list(doc.sents)
print('Splitting into sentences.')
print(sents)
print('\nTokens of a single sentence')
for token in sents[4]:
    print(token.text)

Splitting into sentences.
[I am Azka., I want to learn Machine Learning more., I want to dive deep in Deep Learning., I am passionate about Artificial Intelligence., I want to join this community!.]

Tokens of a single sentence
I
want
to
join
this
community
!
.


# Named Entities

In [4]:
doc2=nlp(u'Google LLC is an American multinational technology company. CEO of Google is Sundar Pichai.It was founded in September 4, 1998 by Menlo Park, California, United States.')
for entities in doc2.ents:
    print(f'{entities.text:{35}} {entities.label_:{15}}{str(spacy.explain(entities.label_))}')

Google LLC                          ORG            Companies, agencies, institutions, etc.
American                            NORP           Nationalities or religious or political groups
Google                              ORG            Companies, agencies, institutions, etc.
Sundar Pichai                       PERSON         People, including fictional
September 4, 1998                   DATE           Absolute or relative dates or periods
Menlo Park                          GPE            Countries, cities, states
California                          GPE            Countries, cities, states
United States                       GPE            Countries, cities, states


### Visualizing entities

In [5]:
from spacy import displacy
# 'distance' is an option of the dependency ('dep') visualizer, so it is not needed for style='ent'
displacy.render(doc2, style='ent', jupyter=True)

In [6]:
# number of named entities found in the first document (doc, not doc2)
len(doc.ents)

4

# Noun Chunks

In [8]:
doc3=nlp(u'Lisa is wearing a sleeveless shirt today and eating hot soup.')
for chunks in doc3.noun_chunks:
    print(chunks.text)

Lisa
a sleeveless shirt
hot soup


# Lemmatization

In [16]:
# reduction of each word to its root form (lemma)
doc4=nlp(u'I was sick I sat I ate I ran.I am eating the cookies that I baked this morning.')
for token in doc4:
    print(f'{token.text:{20}} {token.pos_:{20}} {token.lemma_}')

I                    PRON                 -PRON-
was                  AUX                  be
sick                 ADJ                  sick
I                    PRON                 -PRON-
sat                  VERB                 sit
I                    PRON                 -PRON-
ate                  VERB                 eat
I                    PRON                 -PRON-
ran                  VERB                 run
.                    PUNCT                .
I                    PRON                 -PRON-
am                   AUX                  be
eating               VERB                 eat
the                  DET                  the
cookies              NOUN                 cookie
that                 PRON                 that
I                    PRON                 -PRON-
baked                VERB                 bake
this                 DET                  this
morning              NOUN                 morning
.                    PUNCT                .


# Stemming
spaCy does not include a stemmer, so this section uses NLTK for stemming.

In [18]:
import nltk
from nltk.stem.porter import PorterStemmer  # import only the class that is used

In [21]:
p_stemmer = PorterStemmer()
tokens = ['ate','eating','eater','eat','caught','catch','slowly','gradually','mutually']
for t in tokens:
    print(t+' _____ '+p_stemmer.stem(t))

ate _____ ate
eating _____ eat
eater _____ eater
eat _____ eat
caught _____ caught
catch _____ catch
slowly _____ slowli
gradually _____ gradual
mutually _____ mutual


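Stemming is a technique that applies fixed rules to cut off the end of a word. This does not work well for every word, as shown above: irregular forms such as "ate" and "caught" are left unchanged, and "slowly" becomes the non-word "slowli".
Lemmatization, however, draws on vocabulary and part-of-speech information, so it can recover the actual root word (for example, "ate" is mapped to "eat" in the lemmatization output).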
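
To make the contrast concrete, here is a minimal sketch in plain Python. It is a toy, not the Porter algorithm and not spaCy's lemmatizer: the names `SUFFIXES`, `TOY_LEMMAS`, `toy_stem`, and `toy_lemma` are hypothetical and exist only for this illustration. The idea is that a stemmer only looks at the characters at the end of a word, while a lemmatizer can also consult knowledge about specific words (here reduced to a tiny lookup table).

```python
# Toy illustration only: NOT the Porter algorithm and NOT spaCy's lemmatizer.
SUFFIXES = ("ing", "ly", "ed", "s")  # hypothetical rule list, checked in order


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table standing in for a lemmatizer's vocabulary knowledge.
TOY_LEMMAS = {"ate": "eat", "caught": "catch", "ran": "run"}


def toy_lemma(word):
    """Check the irregular-form table first, then fall back to the suffix rules."""
    return TOY_LEMMAS.get(word, toy_stem(word))


for w in ["eating", "ate", "caught", "slowly"]:
    print(f"{w:10} stem={toy_stem(w):10} lemma={toy_lemma(w)}")
```

With these toy rules, "eating" is reduced to "eat" by both functions, but only the lookup-based function maps "ate" to "eat" and "caught" to "catch"; the suffix rules alone leave them unchanged, much like the Porter output above.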