# Installing spaCy

spaCy is a library for Natural Language Processing (NLP). For more information, see the [spaCy website](https://spacy.io/).

This notebook will help you install it, but you can also follow the official [installation instructions](https://spacy.io/usage) to pick the right options for your system.

## Two ways of installing spaCy

### 1. Jupyter notebook
You can install it from the notebook by running the install cells in the section "Installing spaCy and language model" in your Jupyter notebook. Remember that you only have to do this once.
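One common pitfall when installing from a notebook is that `pip` may install into a different Python than the one the notebook kernel is running. The sketch below is one way to avoid that, assuming you are comfortable calling pip through `subprocess`; the helper names `pip_install_command` and `install` are made up for this example and are not part of spaCy or pip.

```python
import subprocess
import sys


def pip_install_command(package: str) -> list:
    """Build a pip command that targets the interpreter running this notebook."""
    # sys.executable is the path of the Python that is executing this code
    return [sys.executable, "-m", "pip", "install", package]


def install(package: str) -> None:
    """Run the pip command; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Show the command without running it
print(pip_install_command("spacy"))

# Uncomment the next line to actually install (hypothetical usage):
# install("spacy")
```

Printing the command first lets you check which Python it points to before anything is installed.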

### 2. Command prompt
* On Windows, if you have Anaconda, you can open an Anaconda PowerShell prompt. If you don't have Anaconda, open Windows PowerShell in admin mode.
* On a Mac, open a Terminal window (open Spotlight and type "terminal"), or look for "Terminal" in your Applications folder.

Now that you have a command window open, go to the spaCy website, choose your operating system and the other options that apply to you, and copy and paste the commands it shows (one at a time): https://spacy.io/usage

## Compatibility problems
If you get an error that mentions numpy, there are two things you can try, described below.

Possible error messages:
* numpy.ndarray ...
* numpy.dtype ...

### 1. Follow instructions on the spaCy site
Go to the heading "Using build constraints when compiling from source" in https://spacy.io/usage. In a command prompt/terminal, type the two lines (one at a time) that start with `PIP_CONSTRAINT`.

### 2. Downgrade numpy
Type one of the two commands below, either in a notebook or in terminal/command prompt:

* In notebook: `!pip install numpy==1.26.4`
* In command prompt: `pip install numpy==1.26.4`
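Before or after downgrading, it can help to check which versions are actually installed. The following is a minimal sketch using only the standard library; `installed_version` is a helper name invented for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions of the two packages discussed in this section
for package in ("numpy", "spacy"):
    print(package, installed_version(package) or "not installed")
```

If numpy is still reported as a 2.x version after the downgrade command, the command probably ran in a different environment than the notebook kernel.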

### Installing spaCy and language model

If you are running this notebook locally, you'll only have to run the next three cells once.

In [1]:
pip install numpy==1.26.4

Collecting numpy==1.26.4
  Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl.metadata (61 kB)
Downloading numpy-1.26.4-cp311-cp311-win_amd64.whl (15.8 MB)
   ---------------------------------------- 0.0/15.8 MB ? eta -:--:--
   ----- ---------------------------------- 2.4/15.8 MB 12.3 MB/s eta 0:00:02
   ------------- -------------------------- 5.5/15.8 MB 13.4 MB/s eta 0:00:01
   ------------------- -------------------- 7.6/15.8 MB 12.4 MB/s eta 0:00:01
   ------------------------ --------------- 9.7/15.8 MB 12.1 MB/s eta 0:00:01
   ---------------------------- ----------- 11.3/15.8 MB 10.8 MB/s eta 0:00:01
   ----------------------------------- ---- 13.9/15.8 MB 11.2 MB/s eta 0:00:01
   ---------------------------------------- 15.8/15.8 MB 11.1 MB/s eta 0:00:00
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 2.0.2
    Uninstalling numpy-2.0.2:
      Successfully uninstalled numpy-2.0.2
Successfully installed numpy-1.26.4
Note


[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python.exe -m pip install --upgrade pip


In [2]:
!pip install spacy

Collecting spacy
  Downloading spacy-3.8.4-cp311-cp311-win_amd64.whl.metadata (27 kB)
Collecting spacy-legacy<3.1.0,>=3.0.11 (from spacy)
  Downloading spacy_legacy-3.0.12-py2.py3-none-any.whl.metadata (2.8 kB)
Collecting spacy-loggers<2.0.0,>=1.0.0 (from spacy)
  Downloading spacy_loggers-1.0.5-py3-none-any.whl.metadata (23 kB)
Collecting murmurhash<1.1.0,>=0.28.0 (from spacy)
  Downloading murmurhash-1.0.12-cp311-cp311-win_amd64.whl.metadata (2.2 kB)
Collecting cymem<2.1.0,>=2.0.2 (from spacy)
  Downloading cymem-2.0.11-cp311-cp311-win_amd64.whl.metadata (8.8 kB)
Collecting preshed<3.1.0,>=3.0.2 (from spacy)
  Downloading preshed-3.0.9-cp311-cp311-win_amd64.whl.metadata (2.2 kB)
Collecting thinc<8.4.0,>=8.3.4 (from spacy)
  Downloading thinc-8.3.4-cp311-cp311-win_amd64.whl.metadata (15 kB)
Collecting wasabi<1.2.0,>=0.9.1 (from spacy)
  Downloading wasabi-1.1.3-py3-none-any.whl.metadata (28 kB)
Collecting srsly<3.0.0,>=2.4.3 (from spacy)
  Downloading srsly-2.5.1-cp311-cp311-win_amd64




In [3]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ---------------------------------------- 0.0/12.8 MB ? eta -:--:--
     --------- ------------------------------ 2.9/12.8 MB 15.2 MB/s eta 0:00:01
     ------------------- -------------------- 6.3/12.8 MB 16.1 MB/s eta 0:00:01
     ------------------------------ -------- 10.0/12.8 MB 16.8 MB/s eta 0:00:01
     --------------------------------------- 12.8/12.8 MB 16.7 MB/s eta 0:00:00
Installing collected packages: en-core-web-sm
Successfully installed en-core-web-sm-3.8.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')





### Loading spaCy and language model
Installation (if local) only needs to be done once. However, you need to import the spaCy module and load the language model every time you want to use it. 

Here, we are loading the small model for English derived from web data. There are other [models](https://spacy.io/usage/models) for English and for other languages. 

In [4]:
import spacy

In [5]:
nlp = spacy.load("en_core_web_sm")
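If `spacy.load` fails, the most likely cause is that the model was never downloaded into the current environment. Because spaCy models are installed as regular Python packages, one way to check is to ask Python whether a package with that name can be found. This is a sketch under that assumption; `model_is_installed` is a hypothetical helper, not a spaCy function.

```python
import importlib.util


def model_is_installed(name: str) -> bool:
    """Return True if a top-level package called `name` can be found."""
    return importlib.util.find_spec(name) is not None


model_name = "en_core_web_sm"
if model_is_installed(model_name):
    print(f"{model_name} is installed")
else:
    print(f"{model_name} is missing; run: python -m spacy download {model_name}")
```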

### Testing installation

We'll define a sentence, process it with spaCy and check the output. This will test whether all the components are installed.

In [6]:
sentence = "This is a test sentence about Canada, but you can type whatever you want here."

### Converting string to doc with spaCy
spaCy has a special type of object, a `Doc`. It is a container that holds a processed text together with everything the processing pipeline has added to it. Calling `nlp()` on a text, e.g., `sentence`, applies all the NLP steps to it (tokenization, tagging, parsing, named entity recognition) and returns a `Doc`. Once you have converted a string (a sentence or a whole text) to a `Doc`, you can access everything that spaCy has done with it, i.e., all the language information that it has assigned, with labels. spaCy refers to that language information and those labels as 'linguistic annotations'.

![spaCy pipeline](https://spacy.io/images/pipeline.svg)

Image from https://spacy.io/usage/processing-pipelines

In [7]:
doc = nlp(sentence)
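To get a feel for what a `Doc` is, it helps to treat it like a sequence of tokens. The sketch below uses `spacy.blank("en")`, a tokenizer-only English pipeline, so that it runs without the downloaded model; that choice, the variable names, and the shortened sentence are assumptions made for this example.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer only) is enough to show
# how a Doc behaves as a container, so no model download is needed here.
blank_nlp = spacy.blank("en")
demo_doc = blank_nlp("This is a test sentence about Canada.")

print(type(demo_doc).__name__)  # class name of the object returned by the pipeline
print(len(demo_doc))            # number of tokens, punctuation included
print(demo_doc[0].text)         # indexing gives a single token
print(demo_doc[3:5].text)       # slicing gives a Span covering several tokens
print(demo_doc.text)            # the original string is preserved
```

The same operations work on the `doc` created with the full model; the difference is that the full pipeline also fills in lemmas, tags, and entities.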

### Accessing the information in the Doc object

`doc` contains lots of [useful information](https://spacy.io/api/doc):

* tokens (words)
* lemmas
* morphology
* part-of-speech tags (POS tags)
* syntactic structure (a parse tree)
* named entities

In [8]:
# print word tokens

for token in doc:
    print(token)
    

This
is
a
test
sentence
about
Canada
,
but
you
can
type
whatever
you
want
here
.


In [9]:
# lemmas

for token in doc:
    print(token.lemma_)

this
be
a
test
sentence
about
Canada
,
but
you
can
type
whatever
you
want
here
.


In [10]:
# morphology

for token in doc:
    print(token.text, token.morph)

This Number=Sing|PronType=Dem
is Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin
a Definite=Ind|PronType=Art
test Number=Sing
sentence Number=Sing
about 
Canada Number=Sing
, PunctType=Comm
but ConjType=Cmp
you Case=Nom|Person=2|PronType=Prs
can VerbForm=Fin
type VerbForm=Inf
whatever 
you Case=Nom|Person=2|PronType=Prs
want Tense=Pres|VerbForm=Fin
here PronType=Dem
. PunctType=Peri


In [11]:
# POS tags (more on this below)

for token in doc:
    print(token.text, token.pos_)

This PRON
is AUX
a DET
test NOUN
sentence NOUN
about ADP
Canada PROPN
, PUNCT
but CCONJ
you PRON
can AUX
type VERB
whatever PRON
you PRON
want VERB
here ADV
. PUNCT
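The separate loops above can also be combined so that each token is printed with all of its annotations on one line. This is a sketch rather than a recommended layout; `token_table` is a hypothetical helper, and the fallback to a blank pipeline is an assumption added so the code still runs if the model is not installed (in that case the lemma, POS, and morphology columns are empty).

```python
import spacy


def token_table(doc):
    """Return one (text, lemma, POS, morphology) tuple per token."""
    return [(t.text, t.lemma_, t.pos_, str(t.morph)) for t in doc]


try:
    table_nlp = spacy.load("en_core_web_sm")
except OSError:
    # Model not installed: fall back to a tokenizer-only pipeline
    table_nlp = spacy.blank("en")

table_doc = table_nlp("This is a test sentence about Canada.")

# One tab-separated row per token
for row in token_table(table_doc):
    print(*row, sep="\t")
```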


In [12]:
# named entities

for ent in doc.ents:
    print(ent.text, ent.label_)

Canada GPE
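Labels such as `GPE` or `ADP` are not self-explanatory. spaCy includes a small glossary that can be queried with `spacy.explain`, which does not need a loaded model. The list of labels below is simply copied from the outputs in this notebook.

```python
import spacy

# Labels taken from the POS and named entity outputs in this notebook
labels = ["PROPN", "AUX", "ADP", "CCONJ", "GPE"]

for label in labels:
    # spacy.explain returns a short description string for known labels
    print(label, "->", spacy.explain(label))
```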
