In block 3, we will start interacting with the world outside of our neat Notebooks by working with files and importing external modules. While this is quite cool, it also brings new types of errors. These are often not logical errors in your program, but are caused by circumstances on your system. For example, you might try to write a file to a directory that does not exist, or use an external library that is not installed on your computer.

The purpose of this Notebook is to get you set up for block 3 and to catch such errors early, in case they occur on your computer.

### 1. Installation

We will use several existing libraries in this block. Luckily, these are standard libraries that are conveniently included in the Python and Anaconda installations. The only module that might cause us problems is the Natural Language Toolkit (NLTK), which relies on many separate sub-libraries and data packages, some of which might not come as part of the Anaconda installation. So let's check this to be sure.

First, we `import` NLTK (we will learn more about the import command in this block, though it is quite intuitive):

In [2]:
import nltk  # include NLTK in your program
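
If the import fails with a `ModuleNotFoundError`, NLTK is not installed in the Python environment that runs this Notebook. As a minimal sketch (the helper name `is_installed` is ours and not part of any library), you can check whether a top-level module is available without importing it, using `importlib.util.find_spec` from the standard library:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    # find_spec returns None when no module with that name is available.
    return importlib.util.find_spec(module_name) is not None


# 'os' is always available; 'nltk' is the module we care about here.
for name in ("os", "nltk"):
    status = "installed" if is_installed(name) else "NOT installed"
    print(f"{name}: {status}")
```

If `nltk` is reported as not installed, installing it into your environment (typically with `conda install nltk` or `pip install nltk`) and restarting the Notebook usually solves the problem.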

Successful?! If yes, we can now use various standard text processing functions in NLTK. Let's now check that the NLTK functions we are interested in run without errors:

In [3]:
text = "This example sentence is used for illustrating some basic NLP tasks. Language is awesome!"

# Tokenization
tokens = nltk.word_tokenize(text)

# Sentence splitting
sentences = nltk.sent_tokenize(text)

# POS tagging
tagged_tokens = nltk.pos_tag(tokens)

# Lemmatization
lmtzr = nltk.stem.wordnet.WordNetLemmatizer()
lemma = lmtzr.lemmatize(tokens[1], 'n')  # lemmatize the second token ("example") as a noun

In the assignment, we will use the VADER tool. Let's confirm that we have that one installed too:

In [4]:
# Import the sentiment analyzer class.
from nltk.sentiment.vader import SentimentIntensityAnalyzer
# NOTE: this may produce a warning, but you can safely ignore it.
sid = SentimentIntensityAnalyzer()



This might produce a warning, but you can safely ignore that. If there is an error, ask for help.
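
A common error at this point is a `LookupError` saying that a resource such as `punkt` or `vader_lexicon` was not found. This means that the NLTK code is installed but one of its data packages is missing. The sketch below shows one possible way to find and download missing packages. The helper names (`missing_resources`, `download_missing`) and the resource paths in `RESOURCES` are our assumptions for illustration. They correspond to the functions used above in many NLTK releases, but newer releases may ask for differently named packages (for example `punkt_tab`), so when in doubt, follow the name shown in the error message.

```python
# Assumed mapping: download name -> path that nltk.data.find() looks up.
# Adjust the entries if your LookupError message names a different resource.
RESOURCES = {
    "punkt": "tokenizers/punkt",                                          # tokenization
    "averaged_perceptron_tagger": "taggers/averaged_perceptron_tagger",  # POS tagging
    "wordnet": "corpora/wordnet",                                         # lemmatization
    "vader_lexicon": "sentiment/vader_lexicon",                           # VADER
}


def missing_resources(find, resources=RESOURCES):
    """Return the download names whose data the `find` function cannot locate."""
    missing = []
    for name, resource_path in resources.items():
        try:
            find(resource_path)
        except LookupError:  # raised by nltk.data.find when data is absent
            missing.append(name)
    return missing


def download_missing():
    """Download every data package from RESOURCES that is not yet available."""
    import nltk  # imported here so the helper above works without NLTK
    for name in missing_resources(nltk.data.find):
        nltk.download(name)
```

Calling `download_missing()` in a cell requires an internet connection. Afterwards, rerun the cells above to confirm that the error is gone.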

No error? Then NLTK is running smoothly on your machine. Way to go ;-)

### 2. Directories/folders

Problems such as writing to non-existent directories, reading non-existent files, or accessing directories without the required permissions are common and usually easy to fix. Let's make sure that the directories we need for this block exist on your machine.

In [10]:
import os.path as path

locations_to_test = ('../Data', '../Data/Charlie/charlie.txt', '../Data/Dreams/', 
                     '../Data/baby_names/names_by_state', '../Data/Debate/debate.csv', '../Data/LCohen/')

for location in locations_to_test:
    assert path.exists(location), f"{location} does not exist on your machine!"
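
Note that the `assert` above stops at the first missing location, and that the paths are relative, so they are resolved against the folder from which the Notebook is run. As a small sketch (the helper `find_missing` is hypothetical and introduced only for this illustration), the following lists every missing location together with the absolute path that was actually checked:

```python
import os


def find_missing(locations):
    """Return (location, absolute_path) pairs for locations that do not exist."""
    return [(loc, os.path.abspath(loc))
            for loc in locations
            if not os.path.exists(loc)]


# Relative paths are interpreted from the current working directory.
print("Current working directory:", os.getcwd())
for location, absolute in find_missing(('../Data', '../Data/Dreams/')):
    print(f"Missing: {location} (looked in {absolute})")
```

If the absolute paths point somewhere unexpected, the Notebook was probably opened from a different folder than the one the course material assumes.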


Also, let's make sure that the needed folders are not empty:

In [9]:
from os import listdir as ls

non_empty_dirs = ('../Data', '../Data/Dreams/', '../Data/baby_names/names_by_state/', '../Data/LCohen/')
for directory in non_empty_dirs:
    assert ls(directory), f"{directory} is empty"

If you have not encountered any errors so far, then you are all set!