# spaCy vs NLTK

spaCy and NLTK are natural language processing libraries for Python that allow us to extract information from text.

Here are the main differences between them:

<table>
    <thead>
        <tr>
            <th>spaCy</th>
            <th>NLTK</th>
        </tr>
    </thead>
    <tbody>
        <tr>
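            <td>Object-oriented: processing a text returns document, sentence, and token objects</td>
            <td>Mainly a string processing library: functions take strings and return strings or lists of strings</td>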
        </tr>
        <tr>
            <td>It provides one recommended, well-optimized algorithm for a given task. If you mainly care about the end result, spaCy is usually the better option (well suited to production)</td>
            <td>It offers access to many algorithms. If you care about choosing and customizing a specific algorithm, go for NLTK (well suited to research)</td>
        </tr>
        <tr>
            <td>User-friendly</td>
            <td>Also user-friendly, but less so than spaCy</td>
        </tr>
    </tbody>
</table>


### spaCy Demo


In [25]:
import spacy

In [2]:
nlp = spacy.load("en_core_web_sm")

In [4]:
text = "Dr. Strange likes Indonesian food a lot. He likes nasi goreng, ayam bakar, pempek, etc. He's planning to go to Indonesian one day"

In [24]:
# Split the text into sentences (each item is a spaCy Span object)
sentences = list(nlp(text).sents)
sentences


[Dr. Strange likes Indonesian food a lot.,
 He likes nasi goreng, ayam bakar, pempek, etc.,
 He's planning to go to Indonesian one day]

In [23]:
# Split the text into words (each item is a spaCy Token object)
words = list(nlp(text))
words


[Dr.,
 Strange,
 likes,
 Indonesian,
 food,
 a,
 lot,
 .,
 He,
 likes,
 nasi,
 goreng,
 ,,
 ayam,
 bakar,
 ,,
 pempek,
 ,,
 etc,
 .,
 He,
 's,
 planning,
 to,
 go,
 to,
 Indonesian,
 one,
 day]
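
The items in the list above are spaCy `Token` objects rather than plain strings, which is what the "object oriented" row in the table refers to. The following is a minimal sketch of reading a few attributes from each token. It assumes a blank English pipeline created with `spacy.blank("en")` (tokenizer only, not the `en_core_web_sm` model loaded above) and a shorter sample sentence, both chosen here purely for illustration.

```python
import spacy
from spacy.tokens import Token

# Assumption: a blank English pipeline (tokenizer only) is enough for this
# sketch, so no trained model needs to be downloaded.
blank_nlp = spacy.blank("en")
doc = blank_nlp("Dr. Strange likes nasi goreng.")

# Every item in the Doc is a Token object that carries its own attributes.
rows = [(token.i, token.text, token.is_alpha, token.is_punct) for token in doc]

for index, text_value, is_alpha, is_punct in rows:
    print(index, repr(text_value), "alpha" if is_alpha else "", "punct" if is_punct else "")

# The plain string is still available through the .text attribute.
plain_words = [token.text for token in doc]
```

Attributes that depend on a trained model, such as part-of-speech tags, would need a full pipeline like `en_core_web_sm` instead of the blank one assumed here.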

### NLTK Demo


In [28]:
import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

# Download the Punkt tokenizer data used by sent_tokenize and word_tokenize
nltk.download("punkt")


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\hp\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping tokenizers\punkt.zip.


True

In [30]:
# Split by sentences (returns a list of plain strings)
sent_tokenize(text)

['Dr.',
 'Strange likes Indonesian food a lot.',
 'He likes nasi goreng, ayam bakar, pempek, etc.',
 "He's planning to go to Indonesian one day"]

In [29]:
# Split by words (returns a list of plain strings)
word_tokenize(text)

['Dr',
 '.',
 'Strange',
 'likes',
 'Indonesian',
 'food',
 'a',
 'lot',
 '.',
 'He',
 'likes',
 'nasi',
 'goreng',
 ',',
 'ayam',
 'bakar',
 ',',
 'pempek',
 ',',
 'etc',
 '.',
 'He',
 "'s",
 'planning',
 'to',
 'go',
 'to',
 'Indonesian',
 'one',
 'day']
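
The table's point that NLTK offers access to many algorithms can be seen with tokenization itself. The sketch below runs three tokenizer classes from `nltk.tokenize` on the same sentence. The sample sentence and the `\w+` pattern are assumptions picked for illustration, and these particular classes are used because, as far as I know, they are rule-based and do not need the `punkt` download.

```python
from nltk.tokenize import RegexpTokenizer, TreebankWordTokenizer, WordPunctTokenizer

# Hypothetical sample sentence, shorter than the text used in the demo.
sample = "He's planning to go, one day."

# Three different tokenization algorithms exposed through the same interface.
tokenizers = {
    "treebank": TreebankWordTokenizer(),      # Penn Treebank style rules
    "wordpunct": WordPunctTokenizer(),        # splits on every punctuation run
    "regexp_words": RegexpTokenizer(r"\w+"),  # keeps only runs of word characters
}

# Each tokenizer returns a plain list of Python strings.
results = {name: tok.tokenize(sample) for name, tok in tokenizers.items()}

for name, tokens in results.items():
    print(f"{name}: {tokens}")
```

Comparing the three lists shows how the choice of algorithm changes the handling of the contraction and the punctuation, which is the kind of control the table attributes to NLTK.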