# Installing spaCy

When working with text, it is convenient to use language tools in your analysis. spaCy is one such tool.

spaCy is an open-source natural language processing (NLP) library for Python. It is designed to be fast, efficient, and easy to use for various NLP tasks. spaCy provides pre-trained language models and a wide range of tools and features for working with text data, making it a popular choice for NLP tasks in research and industry applications.

Here are some key features and use cases of spaCy:

- Tokenization: spaCy can segment text into words, punctuation, and other meaningful tokens. It can handle complex tokenization tasks, including splitting contractions and handling special cases like hyphenated words.

- Part-of-Speech Tagging: It can assign parts of speech (e.g., noun, verb, adjective) to each token in a text.

- Named Entity Recognition (NER): spaCy can identify and classify named entities in text, such as names of persons, organizations, locations, dates, and more.

- Dependency Parsing: It can analyze the grammatical structure of a sentence by identifying the relationships between words and their dependencies.

- Lemmatization: spaCy can convert words to their base or dictionary forms (lemmas). For example, it can transform "running" to "run" and "better" to "good."

- Text Classification: It supports text classification tasks, such as sentiment analysis and topic classification, using machine learning models.

- Entity Linking: It can link named entities in the text to external knowledge bases or databases.

- Training Custom Models: You can train custom models using spaCy, which allows you to create models for specific languages or domains.

spaCy is known for its performance and efficiency, making it suitable for both small-scale NLP tasks and large-scale text processing applications. It supports multiple languages and has a rich ecosystem of extensions and third-party packages to enhance its capabilities.

We will not cover all of it, but we will use some spaCy functionality throughout the course. You can read more about spaCy here: https://spacy.io/

To get access to this functionality, we need to download and install spaCy. This can be done directly from this Jupyter notebook by running the following code snippets. You can run the code either by 1) clicking on the code block and then on the "Run" button at the top of the page, or by 2) clicking on the code block and pressing Shift+Enter. You should see a star to the left of the code block while it is running, and a number once it has finished.

In [None]:
!pip install spacy

This code installs spaCy. It might take some time. If it takes more than 15 minutes, consider restarting the kernel (from the menu at the top of the page) and trying again.
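
If you want to confirm that the installation succeeded, one option is to ask Python which version of the package it can see. The snippet below is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this illustration and is not part of spaCy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing.

    Hypothetical helper for illustration; it only reads package metadata.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if spaCy is installed, otherwise None
print("spaCy version:", installed_version("spacy"))
```

If this prints `None`, the install step above probably did not finish, and it is worth running it again.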

We also need to download models for each of the languages we want to work with. In this case we will download English and Danish models in three sizes (small, medium, and large). Later in this notebook we load the large ones: the English model en_core_web_lg and the Danish model da_core_news_lg.

The English models

In [None]:
!python -m spacy download en_core_web_sm

In [None]:
!python -m spacy download en_core_web_md

In [None]:
!python -m spacy download en_core_web_lg

The Danish models

In [None]:
!python -m spacy download da_core_news_sm

In [None]:
!python -m spacy download da_core_news_md

In [None]:
!python -m spacy download da_core_news_lg
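
To check that the downloads above worked, you can test whether each model can be found by Python. This is a sketch that relies on the fact that spaCy models are installed as ordinary Python packages with the same name as the model; the list `MODEL_NAMES` and the helper `is_installed` are names chosen for this example.

```python
import importlib
import importlib.util

# The six models downloaded above
MODEL_NAMES = [
    "en_core_web_sm",
    "en_core_web_md",
    "en_core_web_lg",
    "da_core_news_sm",
    "da_core_news_md",
    "da_core_news_lg",
]


def is_installed(package_name):
    """Return True if Python can locate a package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Refresh the import system's caches so freshly installed packages are found
importlib.invalidate_caches()

status = {name: is_installed(name) for name in MODEL_NAMES}
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If a model is reported as missing, run its download cell again.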

This installation is only needed once. From now on, you only need to run the following code when using the spaCy library:

In [None]:
import spacy

If you work with English texts, you need to run the following code to specify the model:

In [None]:
nlp = spacy.load("en_core_web_lg")

Or, if you work with Danish texts, you need to run the following code to specify the model:

In [None]:
nlp = spacy.load("da_core_news_lg")
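
Once a model is loaded, you can try it on a short text. The following is a minimal sketch, not part of the course material: the example sentence is made up, and the fallback to `spacy.blank("en")` is a convenience added here so that the cell still runs if the large English model is missing. With the blank pipeline only the tokenization is meaningful; the part-of-speech tags, lemmas, and entities will be empty.

```python
import spacy

try:
    # The large English model downloaded earlier in this notebook
    nlp = spacy.load("en_core_web_lg")
except OSError:
    # Model not installed: fall back to a blank pipeline (tokenizer only)
    print("en_core_web_lg not found, using a blank English pipeline instead")
    nlp = spacy.blank("en")

# Example sentence chosen for illustration
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Tokenization: the text split into words and punctuation
tokens = [token.text for token in doc]
print(tokens)

# Part-of-speech tag and lemma for each token
for token in doc:
    print(token.text, token.pos_, token.lemma_)

# Named entities found in the text
for ent in doc.ents:
    print(ent.text, ent.label_)
```

The same pattern should work for Danish texts by loading "da_core_news_lg" and passing a Danish sentence.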

# Installing DaCy

In addition to spaCy, we will also use DaCy. DaCy is a Danish natural language preprocessing framework made with spaCy. You can read more about DaCy here: https://github.com/centre-for-humanities-computing/DaCy

Install DaCy with this code:

In [None]:
!pip install DaCy

We can now use DaCy by importing it, just as we did with spaCy:

In [None]:
import dacy

You don't need to download models for DaCy separately, since DaCy provides several models of its own, which you can list by running the following code:

In [None]:
for model in dacy.models():
    print(model)

If everything has worked so far (you see a number to the left of each code block), you are done installing and ready to get started!