# Notes on learning NLTK

This notebook is a collection of notes and code written while studying the NLTK library.

## Index
* [Setting Up the Environment](#setting-up-the-environment)
* [Language Processing and Python](#language-processing-and-python)

-------------------------------------------------------------------------------
## Setting Up the Environment
NLTK stands for **Natural Language Toolkit**, one of the most widely used Python libraries for working with human language data. More information is available on the official website, [www.nltk.org](https://www.nltk.org).

At the time of writing, the following is installed system-wide:

```
C:\nltk>python
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:06:47) [MSC v.1914 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

C:\notebooks>pip list
Package            Version
------------------ -------
backcall           0.1.0
bleach             3.0.2
colorama           0.4.0
decorator          4.3.0
defusedxml         0.5.0
entrypoints        0.2.3
ipykernel          5.1.0
ipython            7.1.1
ipython-genutils   0.2.0
ipywidgets         7.4.2
jedi               0.13.1
Jinja2             2.10
jsonschema         2.6.0
jupyter            1.0.0
jupyter-client     5.2.3
jupyter-console    6.0.0
jupyter-core       4.4.0
MarkupSafe         1.0
mistune            0.8.4
nbconvert          5.4.0
nbformat           4.4.0
notebook           5.7.0
pandocfilters      1.4.2
parso              0.3.1
pickleshare        0.7.5
pip                18.1
prometheus-client  0.4.2
prompt-toolkit     2.0.7
Pygments           2.2.0
python-dateutil    2.7.5
pywinpty           0.5.4
pyzmq              17.1.2
qtconsole          4.4.2
Send2Trash         1.5.0
setuptools         40.5.0
six                1.11.0
terminado          0.8.1
testpath           0.4.2
tornado            5.1.1
traitlets          4.3.2
virtualenv         16.0.0
wcwidth            0.1.7
webencodings       0.5.1
widgetsnbextension 3.4.2
```

The first step in working with NLTK is to install it. Detailed instructions are available on the [Installing NLTK](http://www.nltk.org/install.html) page. In this section, I describe what I installed on my Windows system; if I ever install it on another system, I will add notes for that as well.


### Install NumPy
[NumPy](http://www.numpy.org/) is the fundamental package for scientific computing with Python. Install it with the following command:

```
C:\notebooks>pip install numpy
Collecting numpy
  Downloading https://files.pythonhosted.org/packages/42/5a/eaf3de1cd47a5a6baca41215fba0528ee277259604a50229190abf0a6dd2/numpy-1.15.4-cp37-none-win32.whl (9.9MB)
    100% |████████████████████████████████| 9.9MB 2.1MB/s
Installing collected packages: numpy
Successfully installed numpy-1.15.4
```
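A quick way to confirm that the installation worked is to import NumPy from Python and run a trivial computation. This is only a sanity check of my own, not a step from the official install instructions:

```python
import numpy as np

# Print the installed version; the transcript above installed 1.15.4.
print(np.__version__)

# A tiny computation to confirm the compiled extension loads correctly.
values = np.arange(5)      # array([0, 1, 2, 3, 4])
total = int(values.sum())
print(total)               # 10
```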

### Install NLTK
To install NLTK, run the following command:

```
(nltk) C:\GitHub\nltk>pip install nltk
Collecting nltk
Collecting six (from nltk)
  Using cached https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Installing collected packages: six, nltk
Successfully installed nltk-3.3 six-1.11.0
```
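When several Python installations or virtual environments coexist (as the `(nltk)` prompt above suggests), it is worth checking that the packages are importable from the interpreter you actually use. The sketch below relies only on the standard `importlib` module; the helper name `installed_version` is my own, and it simply reads each module's `__version__` attribute:

```python
import importlib


def installed_version(package_name):
    """Return the package's __version__, "unknown" if it has none, or None if it cannot be imported."""
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# Packages installed in this section (six was pulled in as an NLTK dependency).
for name in ("numpy", "six", "nltk"):
    version = installed_version(name)
    if version is None:
        print(name, "is NOT installed for this interpreter")
    else:
        print(name, version)
```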

With this, we are ready to start working with NLTK, and we'll start by following the [Natural Language Processing with Python](http://www.nltk.org/book/) book.

-------------------------------------------------------------------------------
## Language Processing and Python
In this notebook, I focus exclusively on NLTK topics, skipping the general Python parts of the book. **This is NOT a replacement for reading the book and working through it yourself!**

Let's start by downloading and installing the data required for the book. The following code downloads the `book` collection into the local directory `.\nltk-data`:

```python
import nltk

# Download every package in the 'book' collection into the local .\nltk-data directory.
nltk.download('book', download_dir='.\\nltk-data')
```

```
[nltk_data] Downloading collection 'book'
[nltk_data]    | 
[nltk_data]    | Downloading package abc to .\nltk-data...
[nltk_data]    |   Unzipping corpora\abc.zip.
[nltk_data]    | Downloading package brown to .\nltk-data...
[nltk_data]    |   Unzipping corpora\brown.zip.
[nltk_data]    | Downloading package chat80 to .\nltk-data...
[nltk_data]    |   Unzipping corpora\chat80.zip.
[nltk_data]    | Downloading package cmudict to .\nltk-data...
[nltk_data]    |   Unzipping corpora\cmudict.zip.
[nltk_data]    | Downloading package conll2000 to .\nltk-data...
[nltk_data]    |   Unzipping corpora\conll2000.zip.
[nltk_data]    | Downloading package conll2002 to .\nltk-data...
[nltk_data]    |   Unzipping corpora\conll2002.zip.
[nltk_data]    | Downloading package dependency_treebank to .\nltk-
[nltk_data]    |     data...
[nltk_data]    |   Unzipping corpora\dependency_treebank.zip.
[nltk_data]    | Downloading package genesis to .\nltk-data...
[nltk_data]    |   Unzipping corpora\genesis.z

True
```
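`nltk.download()` returns `True` when the download succeeds. One caveat: as far as I can tell, `.\nltk-data` is not among the directories NLTK searches by default. Those are listed in `nltk.data.path` (on Windows, for example, `C:\nltk_data` and `%APPDATA%\nltk_data`), plus any directories named in the `NLTK_DATA` environment variable. If loading the data later fails with a `LookupError`, appending the custom directory to `nltk.data.path` should help. A minimal sketch, assuming the notebook runs from the directory that contains `nltk-data`:

```python
import os

import nltk

# Absolute path of the directory used as download_dir above;
# assumes the current working directory is the one the download ran in.
DATA_DIR = os.path.abspath("nltk-data")

# nltk.data.path is a plain list of directories that NLTK searches, in order,
# when loading corpora and models; changing it affects the current session only.
if DATA_DIR not in nltk.data.path:
    nltk.data.path.append(DATA_DIR)

print(nltk.data.path)
```

Setting the `NLTK_DATA` environment variable to the same directory before starting Python is an alternative that persists across sessions.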

Now we can load the texts from the `book` collection:

```python
from nltk.book import *
```

```
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
```
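Chapter 1 of the book goes on to explore these texts with methods such as `concordance()` and `count()`, and defines small helper functions such as `lexical_diversity()` and `percentage()`. The sketch below mirrors those helpers and applies them to a toy `nltk.text.Text` built from a hand-made token list (hypothetical example data, not from the book), so it runs even without the downloaded corpora. With the book data loaded, the same calls work on `text1` ... `text9`:

```python
from nltk.text import Text


def lexical_diversity(text):
    """Share of distinct tokens among all tokens."""
    return len(set(text)) / len(text)


def percentage(count, total):
    """Express count as a percentage of total."""
    return 100 * count / total


# Toy token list standing in for text1 ... text9 (hypothetical data).
tokens = ["the", "whale", "and", "the", "sea", "and", "the", "ship"]
toy = Text(tokens)

print(len(toy))                                # 8
print(sorted(set(toy)))                        # ['and', 'sea', 'ship', 'the', 'whale']
print(toy.count("the"))                        # 3
print(lexical_diversity(toy))                  # 0.625
print(percentage(toy.count("the"), len(toy)))  # 37.5
toy.concordance("the")                         # keyword-in-context lines for "the"
```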
