# Preparing your environment

For this course, we will use only the Python programming language. I use Python 3.10, the latest versions of all packages, and the pip package manager. You are free to use other versions or other package managers, of course.

We will make extensive use of the following packages:
* spaCy
* pandas
* transformers 🤗
* datasets 🤗
* scikit-learn (imported as `sklearn`)
* matplotlib
* langchain

The following code installs the dependencies and tests that your environment works as intended, so that you don't lose time during the course.

Python dependencies can be real nasty!

## Check Python version

In [None]:
import sys
assert sys.version_info.major==3, "Python 3.x is required"
if sys.version_info.minor<10: print("Warning: Python 3.10 is recommended")
else: print("Python >= 3.10 👍")

## Installation

In [None]:
!test -f pyproject.toml || wget https://raw.githubusercontent.com/JosPolfliet/vlerick-mai-nlp-2025/main/pyproject.toml
!pip install uv
!uv pip install -e .

### Test basic imports

In [None]:
import spacy
import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import ConfusionMatrixDisplay

from spacy.lang.en.stop_words import STOP_WORDS
from spacy.lang.en import English
from spacy.vectors import Vectors

from matplotlib import pyplot as plt

from langchain_anthropic import ChatAnthropic
from langchain.prompts import ChatPromptTemplate

print("It works!")
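
If one of the imports above fails, it can help to see which packages actually got installed. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the list `COURSE_PACKAGES` are made up for illustration, and the distribution names are assumed to match the PyPI names of the packages listed at the top.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (an assumption for this sketch).
# Note that the module imported as `sklearn` is distributed as `scikit-learn`.
COURSE_PACKAGES = [
    "spacy", "pandas", "transformers", "datasets",
    "scikit-learn", "matplotlib", "langchain",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so a missing one is easy to spot.
for name, ver in installed_versions(COURSE_PACKAGES).items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

A package reported as `NOT INSTALLED` here usually means the installation step did not run in the same environment as the notebook kernel.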

## Download spaCy models

In [5]:
# If you use `uv` and get a "No module named pip" error, run `uv venv --seed` first
!python -m spacy download en_core_web_lg # used in word arithmetic and ESG classifier
!python -m spacy download en_core_web_md # used in BOW


Collecting en-core-web-lg==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.8.0/en_core_web_lg-3.8.0-py3-none-any.whl (400.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 400.7/400.7 MB 5.3 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')
Collecting en-core-web-md==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.8.0/en_core_web_md-3.8.0-py3-none-any.whl (33.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 33.5/33.5 MB 6.7 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_md')


### Test if that worked

In [4]:
import spacy  # repeated here so this cell also works if the import cell above was skipped

nlp = spacy.load('en_core_web_md')
nlp("Medium model works!")

In [None]:
nlp = spacy.load('en_core_web_lg')
nlp("Large model works!")
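
If either load above fails with an `OSError`, the model was most likely not installed into the environment your kernel is using. Models fetched with `spacy download` are installed as regular Python packages, so their presence can be checked without loading them. A minimal sketch, where the helper name `missing_models` and the list `MODELS` are made up for illustration:

```python
from importlib.util import find_spec

# The two models downloaded in the cell above.
MODELS = ["en_core_web_md", "en_core_web_lg"]


def missing_models(names):
    """Return the model package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_models(MODELS)
if missing:
    print("Missing spaCy models:", ", ".join(missing))
else:
    print("All spaCy models are installed")
```

This only confirms that the packages are installed; loading them with `spacy.load` remains the real test.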

## Download HuggingFace models

In [None]:
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3)
print("It works!")

## Download NLTK packages

In [None]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # recent NLTK releases (3.9+) load the tokenizer from 'punkt_tab'