## Tokenization

* Tokenization is the task of splitting text into meaningful segments called tokens.
* The input to the tokenizer is Unicode text; the output is a `Doc` object.

Text -> `[Tokenizer -> Tagger -> Parser -> NER -> ...]` -> Doc
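
The tokenizer stage can be tried on its own. The sketch below assumes spaCy v3 is installed: `spacy.blank("en")` builds an English pipeline that contains only the rule-based tokenizer (no trained model download is needed), and calling `nlp.tokenizer` directly also returns a `Doc`. The sample sentence is illustrative.

```python
import spacy
from spacy.tokens import Doc

# A blank English pipeline: only the rule-based tokenizer, no trained components.
nlp_blank = spacy.blank("en")
print(nlp_blank.pipe_names)  # [] : no tagger, parser or NER in this pipeline

sentence = "Apple Inc. is headquartered in Cupertino, California."

# Running the (tokenizer-only) pipeline yields a Doc ...
doc_blank = nlp_blank(sentence)
# ... and so does calling the tokenizer by itself.
doc_tok = nlp_blank.tokenizer(sentence)

print(isinstance(doc_blank, Doc), isinstance(doc_tok, Doc))  # True True
print([token.text for token in doc_tok])
```

Loading `en_core_web_sm` instead and printing `nlp.pipe_names` lists the trained components that run after the tokenizer.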

In [1]:
import spacy

In [2]:
nlp = spacy.load('en_core_web_sm')

In [3]:
text = 'Apple Inc. is an American multinational technology company headquartered in Cupertino, California, that designs, develops and sells consumer electronics, computer software, and online services. It is considered one of the Big Tech technology companies, alongside Amazon, Google, Microsoft, and Facebook.[8][9][10]The companys hardware products include the iPhone smartphone, the iPad tablet computer, the Mac personal computer, the iPod portable media player, the Apple Watch smartwatch, the Apple TV digital media player, the AirPods wireless earbuds and the HomePod smart speaker. Apples software includes macOS, iOS, iPadOS, watchOS, and tvOS operating systems, the iTunes media player, the Safari web browser, the Shazam music identifier and the iLife and iWork creativity and productivity suites, as well as professional applications like Final Cut Pro, Logic Pro, and Xcode. Its online services include the iTunes Store, the iOS App Store, Mac App Store, Apple Music, Apple TV+, iMessage, and iCloud. Other services include Apple Store, Genius Bar, AppleCare, Apple Pay, Apple Pay Cash, and Apple Card.'

In [4]:
doc = nlp(text)

In [5]:
for token in doc:
    print(token.text)

Apple
Inc.
is
an
American
multinational
technology
company
headquartered
in
Cupertino
,
California
,
that
designs
,
develops
and
sells
consumer
electronics
,
computer
software
,
and
online
services
.
It
is
considered
one
of
the
Big
Tech
technology
companies
,
alongside
Amazon
,
Google
,
Microsoft
,
and
Facebook.[8][9][10]The
companys
hardware
products
include
the
iPhone
smartphone
,
the
iPad
tablet
computer
,
the
Mac
personal
computer
,
the
iPod
portable
media
player
,
the
Apple
Watch
smartwatch
,
the
Apple
TV
digital
media
player
,
the
AirPods
wireless
earbuds
and
the
HomePod
smart
speaker
.
Apples
software
includes
macOS
,
iOS
,
iPadOS
,
watchOS
,
and
tvOS
operating
systems
,
the
iTunes
media
player
,
the
Safari
web
browser
,
the
Shazam
music
identifier
and
the
iLife
and
iWork
creativity
and
productivity
suites
,
as
well
as
professional
applications
like
Final
Cut
Pro
,
Logic
Pro
,
and
Xcode
.
Its
online
services
include
the
iTunes
Store
,
the
iOS
App
Store
,
Mac
App
Store
,
Apple
Mu
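
Two details of this output are worth checking. First, spaCy's tokenization is non-destructive: every token records its trailing whitespace, so the original string can be rebuilt exactly. Second, `Facebook.[8][9][10]The` came out as a single token, most likely because the source text has no space after the citation markers and no punctuation rule split that chunk. `Tokenizer.explain`, available in recent spaCy versions, reports which rule produced each token. A small sketch, assuming spaCy v3 and a blank English pipeline so no model download is needed; the snippet is taken from the text above.

```python
import spacy

nlp_blank = spacy.blank("en")  # tokenizer-only English pipeline
snippet = "alongside Amazon, Google, Microsoft, and Facebook.[8][9][10]The company"
doc = nlp_blank(snippet)

# Non-destructive: token text plus its trailing whitespace rebuilds the input.
rebuilt = "".join(token.text_with_ws for token in doc)
print(rebuilt == snippet)  # True

# Token index, character offset, text and whether a space follows it.
for token in doc:
    print(f"{token.i:<3} {token.idx:<4} {token.text!r:<26} space={bool(token.whitespace_)}")

# Which tokenizer rule (prefix, suffix, infix, special case, plain token) made each token.
for rule, text in nlp_blank.tokenizer.explain(snippet):
    print(f"{rule:<12} {text}")
```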

## Part-of-Speech (POS) Tagging


In [8]:
for token in doc:
    # token.pos_ is the coarse-grained (Universal POS) tag; pad the token text to 15 columns
    print(f'{token.text:<15} {token.pos_}')

Apple           PROPN
Inc.            PROPN
is              AUX
an              DET
American        ADJ
multinational   ADJ
technology      NOUN
company         NOUN
headquartered   VERB
in              ADP
Cupertino       PROPN
,               PUNCT
California      PROPN
,               PUNCT
that            SCONJ
designs         VERB
,               PUNCT
develops        VERB
and             CCONJ
sells           VERB
consumer        NOUN
electronics     NOUN
,               PUNCT
computer        NOUN
software        NOUN
,               PUNCT
and             CCONJ
online          ADJ
services        NOUN
.               PUNCT
It              PRON
is              AUX
considered      VERB
one             NUM
of              ADP
the             DET
Big             ADJ
Tech            PROPN
technology      NOUN
companies       NOUN
,               PUNCT
alongside       ADP
Amazon          PROPN
,               PUNCT
Google          PROPN
,               PUNCT
Microsoft       PROPN
,    
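
To read the tag abbreviations and see how often each occurs, `spacy.explain` returns a short description for a label. The helper `pos_summary` below is a hypothetical name for illustration, not part of spaCy. So that the sketch runs without a trained model, it uses a small hand-annotated `Doc` (assuming the spaCy v3 `Doc(vocab, words=..., pos=...)` constructor); in the notebook you would pass the `doc` from `nlp(text)` instead.

```python
from collections import Counter

import spacy
from spacy.tokens import Doc


def pos_summary(doc):
    """Count coarse-grained POS tags and attach spaCy's short description of each."""
    counts = Counter(token.pos_ for token in doc)
    return {pos: (n, spacy.explain(pos)) for pos, n in counts.most_common()}


# Hand-annotated Doc so the example runs without downloading a trained model.
vocab = spacy.blank("en").vocab
words = ["Apple", "Inc.", "is", "an", "American", "company", "."]
pos = ["PROPN", "PROPN", "AUX", "DET", "ADJ", "NOUN", "PUNCT"]
demo = Doc(vocab, words=words, pos=pos)

for label, (n, meaning) in pos_summary(demo).items():
    print(f"{label:<6} {n:>2}  {meaning}")
```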

In [10]:
from spacy import displacy
# Visualize the dependency parse; in Jupyter the SVG is displayed inline
displacy.render(doc, style='dep')
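
The arcs in the dependency diagram come from two token attributes: `token.dep_` (the relation label) and `token.head` (the token it attaches to). A minimal sketch, assuming spaCy v3, where `Doc` accepts absolute head indices and the root token is its own head; the parse is hand-built for illustration and `dependency_triples` is a hypothetical helper name. Passing `jupyter=False` makes `displacy.render` return the SVG markup as a string, which is useful outside a notebook.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc


def dependency_triples(doc):
    """Return (token, dependency label, head token) for every token."""
    return [(token.text, token.dep_, token.head.text) for token in doc]


# Hand-built parse so the sketch runs without a trained model; heads are
# absolute token indices and the ROOT token points to itself.
vocab = spacy.blank("en").vocab
demo = Doc(
    vocab,
    words=["Apple", "designs", "consumer", "electronics", "."],
    heads=[1, 1, 3, 1, 1],
    deps=["nsubj", "ROOT", "compound", "dobj", "punct"],
)

for child, dep, head in dependency_triples(demo):
    print(f"{child:<12} {dep:<10} {head}")

# Outside a notebook, jupyter=False returns the SVG markup instead of displaying it.
svg = displacy.render(demo, style="dep", jupyter=False)
print(svg[:60])
```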

## NER (Named Entity Recognition)

* See the spaCy documentation on named entities for the full list of entity labels and what each one means.

In [11]:
for ent in doc.ents:
    # each entity is a Span; label_ is its entity type as a string
    print(f'{ent.text:<15} {ent.label_}')

Apple Inc.      ORG
American        NORP
Cupertino       GPE
California      GPE
Big Tech        ORG
Amazon          ORG
Google          ORG
Microsoft       ORG
iPhone          ORG
Mac             ORG
iPod            ORG
Apple Watch     ORG
Apple TV        ORG
AirPods         ORG
HomePod         ORG
watchOS         PRODUCT
tvOS            ORG
Safari          NORP
Shazam          LOC
iLife           ORG
Cut Pro         PERSON
Xcode           PERSON
the iTunes Store ORG
the iOS App Store ORG
Mac App Store   PERSON
Apple Music     ORG
Apple TV+, iMessage ORG
iCloud          ORG
Apple Store     ORG
Genius Bar      ORG
AppleCare       ORG
Apple           ORG
Apple           ORG
Pay Cash        PERSON
Apple Card      ORG
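
Grouping entities by label, and printing `spacy.explain` for each label, makes a long list like the one above easier to scan and makes labelling errors easier to spot. The helper `entities_by_label` is a hypothetical name for illustration. The sketch assumes spaCy v3 and uses a hand-labelled `Doc` (entities set through `Span` objects) so it runs without a trained model; in the notebook you would pass the `doc` from `nlp(text)`.

```python
from collections import defaultdict

import spacy
from spacy.tokens import Doc, Span


def entities_by_label(doc):
    """Group entity texts under their label, keeping document order."""
    groups = defaultdict(list)
    for ent in doc.ents:
        groups[ent.label_].append(ent.text)
    return dict(groups)


# Hand-labelled Doc so the sketch runs without downloading a trained model.
vocab = spacy.blank("en").vocab
words = ["Apple", "Inc.", "is", "an", "American", "company", "in",
         "Cupertino", ",", "California", "."]
spaces = [True, True, True, True, True, True, True, False, True, False, False]
demo = Doc(vocab, words=words, spaces=spaces)
demo.ents = [
    Span(demo, 0, 2, label="ORG"),   # Apple Inc.
    Span(demo, 4, 5, label="NORP"),  # American
    Span(demo, 7, 8, label="GPE"),   # Cupertino
    Span(demo, 9, 10, label="GPE"),  # California
]

for label, texts in entities_by_label(demo).items():
    # spacy.explain gives a short human-readable description of the label.
    print(f"{label:<5} {spacy.explain(label)}: {texts}")
```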


In [12]:
# Highlight the named entities inline in the text
displacy.render(doc, style='ent')
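
The entity visualizer also takes an `options` dict: `"ents"` restricts which labels are highlighted and `"colors"` overrides the background colour per label. A sketch assuming spaCy v3, with a small hand-labelled `Doc` and an arbitrary colour chosen for illustration; `jupyter=False` returns the HTML as a string instead of displaying it.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc, Span

# Hand-labelled Doc so the sketch runs without a trained model.
vocab = spacy.blank("en").vocab
demo = Doc(vocab, words=["Apple", "Inc.", "is", "based", "in", "Cupertino", "."],
           spaces=[True, True, True, True, True, False, False])
demo.ents = [Span(demo, 0, 2, label="ORG"), Span(demo, 5, 6, label="GPE")]

# Highlight only ORG entities and give them a custom background colour.
options = {"ents": ["ORG"], "colors": {"ORG": "#a6e3a1"}}
html = displacy.render(demo, style="ent", options=options, jupyter=False)
print(html[:80])
```

In the notebook, the same `options` dict can be passed as `displacy.render(doc, style='ent', options=options)`.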