## What is spaCy?

spaCy is an open-source Python library for **natural language processing (NLP)**, built to parse human language in computational systems. NLP sits at the intersection of linguistics and computer science, and it supports applications across industries, from legal document analysis to financial forecasting. Ambiguity in language, such as double negatives, is hard for computers to resolve, so modern NLP systems typically rely on **artificial neural networks (ANNs)** and deep learning to handle that complexity. spaCy builds on these advances to process text efficiently, and it supports **transformer models**, which drive much of the current progress in NLP.

## How to Install spaCy

To install spaCy, visit the usage page at https://spacy.io/usage. It provides an interactive widget: select your configuration (for example macOS, Windows, or Linux), and the page generates the commands you need to run to get started. Because this is a Jupyter notebook, a terminal command can be run from a cell by prefixing it with `!`; a leading `#` only marks the line as a comment, so nothing is executed. I will be installing spaCy and the small English model, `en_core_web_sm`.

In [7]:
# Install the spaCy library with the pip package manager.
# The command is commented out so it does not rerun; uncomment it to execute.
# !pip install spacy

In [None]:
# Download the pre-trained small English model (en_core_web_sm).
# It includes components for tokenization, POS tagging, NER, and dependency parsing.
# The command is commented out so it does not rerun; uncomment it to execute.
# !python -m spacy download en_core_web_sm
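
Before importing anything, it can be useful to confirm that the notebook kernel can actually see both packages, since a common pitfall is installing into a different Python environment than the one the kernel uses. The cell below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not part of spaCy.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can locate `module_name`."""
    # find_spec returns None when a top-level module cannot be found
    return importlib.util.find_spec(module_name) is not None


# The model is distributed as a regular Python package, so the same check applies
for name in ("spacy", "en_core_web_sm"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either line reports `missing`, rerun the matching install command from inside the notebook so that it targets the kernel's environment.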

In [None]:
# Import the spaCy library for natural language processing
import spacy

In [None]:
# Load the pre-trained English language model
# The nlp object is the core component for processing text with spaCy
nlp = spacy.load("en_core_web_sm")

In [11]:
# Simple test: Process a text and display tokens
text = "A quick brown fox jumps over the lazy dog"
doc = nlp(text)

print("Tokens in the text:")
for token in doc:
    print(f"{token.text:10} -> {token.pos_:5} ({spacy.explain(token.pos_)})")

Tokens in the text:
A          -> DET   (determiner)
quick      -> ADJ   (adjective)
brown      -> ADJ   (adjective)
fox        -> NOUN  (noun)
jumps      -> VERB  (verb)
over       -> ADP   (adposition)
the        -> DET   (determiner)
lazy       -> ADJ   (adjective)
dog        -> NOUN  (noun)
