# Basics of text processing with spaCy: First Notebook

Welcome to Jupyter notebooks! This is a practice notebook. You can edit any of these cells and run them with the `Run` button above, or by pressing `ctrl + enter`. You can find a full list of shortcuts under the `Help/Keyboard Shortcuts` menu above.

### I. Load spaCy

If you have already installed [spaCy](https://spacy.io/usage), run the following cell to *import spaCy*. If the installation worked, the cell runs without printing anything. Otherwise, you will get a `ModuleNotFoundError`.

In [9]:
import spacy

### II. Load spaCy's language models

Now, let's load the language models that we previously downloaded. If you did not download the language models, exit this notebook (close the window, then press `ctrl + c` in the terminal that you used to open it). Then go back to the README.md, or run `python -m spacy download en_core_web_sm` and `python -m spacy download en_core_web_trf`, and restart this notebook.

In [87]:
nlp_eff = spacy.load("en_core_web_sm") # Loads the web_sm model: smaller and more efficient
nlp_acc = spacy.load("en_core_web_trf") # Loads the web_trf model: larger, but more accurate
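
If the cell above raised an error, one way to see which model is missing is to look for it as an installed Python package, since `spacy download` installs each model as a regular package. The sketch below uses only the standard library; the helper name `missing_packages` is made up for this illustration and is not part of spaCy.

```python
import importlib.util

# The two model names used in this notebook.
MODEL_NAMES = ["en_core_web_sm", "en_core_web_trf"]

def missing_packages(names):
    """Return the names that cannot be found as installed Python packages.

    `missing_packages` is a hypothetical helper, not a spaCy function.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(MODEL_NAMES)
if missing:
    for name in missing:
        print(f"Missing model. Run: python -m spacy download {name}")
else:
    print("Both models are installed.")
```

This check only tells you whether the package can be found. It does not verify that the model version is compatible with your spaCy version.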

### III. Add some data and process it with the language models

Let's create a sentence and process it. You can alter the sentence below as you like, but remember to run the cell every time you alter it so that the code below knows the new sentence.

In [108]:
sentence = "Jane likes flying to Tokio for business."

# Now, let's process the sentence above using the two different language models for later comparison:
sentence_eff = nlp_eff(sentence) # en_core_web_sm
sentence_acc = nlp_acc(sentence) # en_core_web_trf
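
Since one model is described as the efficient one, you may want to measure how long each takes on your own machine. The following is a minimal sketch, assuming a simple wall-clock measurement is good enough; `best_time` is a hypothetical helper written for this notebook, not a spaCy function.

```python
import time

def best_time(fn, *args, repeat=3):
    """Call fn(*args) `repeat` times and return the fastest run in seconds.

    `best_time` is a hypothetical helper; taking the minimum reduces
    the effect of one-off slowdowns such as first-call warm-up.
    """
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

With the models loaded above, `best_time(nlp_eff, sentence)` and `best_time(nlp_acc, sentence)` give one number per model. The values depend on your hardware, so treat them as a rough comparison only.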

### IV. Dependencies

Let's take a look at the [dependencies](https://web.stanford.edu/~jurafsky/slp3/14.pdf) detected by spaCy. We will use spaCy's `displacy` submodule to graph these dependencies. But first, let's add some configuration for our graph. You can play with the options below, just remember to run the cell every time you alter something.

In [112]:
# `compact: True` will show a more compact version of the dependency tree, drawn with square arrows
# `add_lemma: True` will show the `base form` (lemma) of each word in the sentence
# `color: "#000000"` displays a black tree, but you can use color names or hex codes to change this
options_dep = {"compact": True, "add_lemma": True, "color": "#000000"}

Now, graph the dependency tree for our sentence. We'll use `sentence_eff` in this case, but you can use either.

In [113]:
spacy.displacy.render(sentence_eff, style="dep", jupyter=True, options=options_dep)
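
The graph is drawn from information stored on each token, so the same dependencies can also be read as plain text. Below is a minimal sketch that relies on the token attributes `text`, `lemma_`, `dep_`, and `head`; the helper names `dependency_rows` and `print_dependencies` are assumptions made for this example.

```python
def dependency_rows(doc):
    """Return one (word, lemma, relation, head word) tuple per token.

    `doc` is a processed sentence such as `sentence_eff`.
    `dependency_rows` is a hypothetical helper, not a spaCy function.
    """
    return [(token.text, token.lemma_, token.dep_, token.head.text) for token in doc]

def print_dependencies(doc):
    """Print the rows from `dependency_rows` as aligned columns."""
    for text, lemma, relation, head in dependency_rows(doc):
        print(f"{text:<10} {lemma:<10} {relation:<10} {head}")
```

Calling `print_dependencies(sentence_eff)` lists, for each word, its lemma, its dependency label, and the word it attaches to. The root of the sentence is its own head.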

### V. Named Entities

Now let's look at the named entities detected by spaCy. We'll compare the web_sm model against the web_trf model.

In [116]:
# Let's use a common configuration for named entities.
# `ents: None` will show all entities
# `colors: {}` allows us to assign colors to some of those entities for a better comparison
options_ent = {"ents": None, "colors": {"PERSON": "cyan", "ORG": "#F67F77", "GPE": "#7BCB7E"}}

Now we will use the displacy module to display named entities. First, let's start with `sentence_eff`, which uses the efficient, but not-so-accurate `web_sm` model:

In [120]:
spacy.displacy.render(sentence_eff, style="ent", jupyter=True, options=options_ent)
print("web_sm model")

web_sm model


Let's compare the visualization above with the one for `sentence_acc`, which uses the more accurate `web_trf` model:

In [121]:
spacy.displacy.render(sentence_acc, style="ent", jupyter=True, options=options_ent)
print("web_trf model")

web_trf model
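
Comparing two visualizations by eye works for one sentence, but it can also be done in code. The sketch below collects the entities of each processed sentence as (text, label) pairs and reports where the two models agree or differ; `entity_set` and `compare_entities` are hypothetical helpers, and only the `ents`, `text`, and `label_` attributes come from spaCy.

```python
def entity_set(doc):
    """Collect the (text, label) pairs of the named entities in a processed sentence."""
    return {(ent.text, ent.label_) for ent in doc.ents}

def compare_entities(doc_a, doc_b):
    """Split entities into those found in both docs and those found in only one.

    Both helpers are hypothetical and written for this notebook only.
    """
    a, b = entity_set(doc_a), entity_set(doc_b)
    return {"both": a & b, "only_first": a - b, "only_second": b - a}
```

For example, `compare_entities(sentence_eff, sentence_acc)` returns a dictionary whose `only_first` and `only_second` entries show the entities on which `web_sm` and `web_trf` disagree.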


### VI. End of our First Notebook

This is the end of our first notebook. It is a small example of what we will be working on during the workshop. More notebooks will be released during the workshop on May 21st, 2021.