## Using the spaCy Library

In [2]:
import spacy

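spaCy is object-oriented: calling `nlp(...)` on a text returns a `Doc` object made up of `Token` and `Span` objects. NLTK, by contrast, is primarily a string-processing library whose functions take and return plain strings and lists.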
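To make the object-oriented point concrete, here is a minimal sketch. It assumes a blank English pipeline from `spacy.blank("en")` (tokenizer only, so no trained model is needed), and the names `blank_nlp`, `sample`, and `first_two` are illustrative and not part of the notebook.

```python
import spacy

# A blank English pipeline: tokenizer only, no trained model to download.
# (Assumption for illustration; the notebook itself loads en_core_web_sm.)
blank_nlp = spacy.blank("en")
sample = blank_nlp("Dr. Smith will deliver the keynote.")

# Each item is a Token object carrying attributes, not a bare string.
for token in sample:
    print(token.i, token.text, token.is_punct)

# Slicing a Doc gives a Span object rather than a list of strings.
first_two = sample[0:2]
print(type(sample).__name__, type(first_two).__name__, type(sample[0]).__name__)
```

The attributes used here (`token.i`, `token.text`, `token.is_punct`) are only a small sample of what a `Token` exposes.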

In [3]:
nlp = spacy.load("en_core_web_sm")
doc = nlp("Dr. Smith, renowned for his groundbreaking research in neuroscience, will deliver the keynote address at the conference. After years of dedicated study, Sarah finally achieved her dream of becoming Dr. Johnson, earning her Ph.D. in astrophysics.")

#### 1. Sentence Tokenization

In [4]:
for sentence in doc.sents:
    print(sentence)

Dr. Smith, renowned for his groundbreaking research in neuroscience, will deliver the keynote address at the conference.
After years of dedicated study, Sarah finally achieved her dream of becoming Dr. Johnson, earning her Ph.D. in astrophysics.


Here, both sentences are segmented correctly: spaCy does not simply split on '.', so abbreviations such as 'Dr.' and 'Ph.D.' do not end a sentence.
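For contrast, a rough sketch of what naive splitting on '.' would produce, using only plain Python. The shortened text and the variable names `text` and `naive` are made up for illustration.

```python
# Shortened, illustrative text containing the same abbreviations as above.
text = ("Dr. Smith will deliver the keynote address at the conference. "
        "Sarah earned her Ph.D. in astrophysics.")

# Naive approach: split on every period and drop empty pieces.
naive = [piece.strip() for piece in text.split(".") if piece.strip()]

for piece in naive:
    print(repr(piece))

# The two real sentences are broken into five fragments,
# because the periods in "Dr." and "Ph.D." are treated as sentence ends.
print(len(naive))
```

A sentence segmenter has to treat the periods inside abbreviations differently from sentence-final ones, which is what the spaCy output above reflects.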

#### 2. Word Tokenization

In [6]:
for sentence in doc.sents:
    for word in sentence:
        print(word)

Dr.
Smith
,
renowned
for
his
groundbreaking
research
in
neuroscience
,
will
deliver
the
keynote
address
at
the
conference
.
After
years
of
dedicated
study
,
Sarah
finally
achieved
her
dream
of
becoming
Dr.
Johnson
,
earning
her
Ph.D.
in
astrophysics
.


## Using the NLTK Library

In [7]:
import nltk

NLTK is more like a DSLR camera, where we have to set everything up ourselves, while spaCy is like a mobile phone that comes with predefined settings.
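One way to see the "set everything up ourselves" side of NLTK is that we choose the tokenizer explicitly, and different choices give different results. The sketch below uses two rule-based tokenizers from `nltk.tokenize` that need no downloaded data; the sample text and variable names are illustrative.

```python
from nltk.tokenize import RegexpTokenizer, WhitespaceTokenizer

sample = "Dr. Smith earned her Ph.D."

# Option 1: split on whitespace only; punctuation stays attached to words.
whitespace_tokens = WhitespaceTokenizer().tokenize(sample)

# Option 2: keep only runs of word characters; all punctuation is dropped,
# which also breaks "Ph.D." into two separate pieces.
regexp_tokens = RegexpTokenizer(r"\w+").tokenize(sample)

print(whitespace_tokens)
print(regexp_tokens)
```

Neither option is right or wrong in general; which one fits depends on the task, and NLTK leaves that decision to us.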

#### 1. Sentence Tokenization

In [9]:
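# sent_tokenize relies on NLTK's Punkt sentence model. If it is missing, download it once
# with nltk.download("punkt") (newer NLTK releases may ask for "punkt_tab" instead).
from nltk.tokenize import sent_tokenize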

In [10]:
sent_tokenize("Dr. Smith, renowned for his groundbreaking research in neuroscience, will deliver the keynote address at the conference. After years of dedicated study, Sarah finally achieved her dream of becoming Dr. Johnson, earning her Ph.D. in astrophysics.")

['Dr. Smith, renowned for his groundbreaking research in neuroscience, will deliver the keynote address at the conference.',
 'After years of dedicated study, Sarah finally achieved her dream of becoming Dr. Johnson, earning her Ph.D. in astrophysics.']

#### 2. Word Tokenization

In [11]:
from nltk.tokenize import word_tokenize

In [12]:
word_tokenize("Dr. Smith, renowned for his groundbreaking research in neuroscience, will deliver the keynote address at the conference. After years of dedicated study, Sarah finally achieved her dream of becoming Dr. Johnson, earning her Ph.D. in astrophysics.")

['Dr.',
 'Smith',
 ',',
 'renowned',
 'for',
 'his',
 'groundbreaking',
 'research',
 'in',
 'neuroscience',
 ',',
 'will',
 'deliver',
 'the',
 'keynote',
 'address',
 'at',
 'the',
 'conference',
 '.',
 'After',
 'years',
 'of',
 'dedicated',
 'study',
 ',',
 'Sarah',
 'finally',
 'achieved',
 'her',
 'dream',
 'of',
 'becoming',
 'Dr.',
 'Johnson',
 ',',
 'earning',
 'her',
 'Ph.D.',
 'in',
 'astrophysics',
 '.']