# Preparing your environment

For this course, we will use only the Python programming language. I use Python 3.10 with the pip package manager and the latest versions of all packages. You are free to use other versions or other package managers, of course.

We will make extensive use of the following packages:
* spaCy
* pandas
* transformers 🤗
* datasets 🤗
* scikit-learn (imported as `sklearn`)
* matplotlib
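
To see which of the packages listed above are already present, and in which version, something like the following sketch can help. It relies only on the standard library (`importlib.metadata`). The names `COURSE_PACKAGES` and `installed_versions` are illustrative and not part of the course material. The list uses pip distribution names, so scikit-learn appears under that name rather than `sklearn`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pip distribution names (hypothetical helper list, not from the course).
COURSE_PACKAGES = [
    "spacy",
    "pandas",
    "transformers",
    "datasets",
    "scikit-learn",
    "matplotlib",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print a small report; missing packages are flagged instead of raising.
for name, ver in installed_versions(COURSE_PACKAGES).items():
    print(f"{name:<15} {ver or 'not installed'}")
```

Running this before and after the installation step shows what was missing and what you ended up with, which is useful to include if you need to ask for help.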

The following cells install the packages and test that your environment works as intended, so that you don't lose time during the course.

Python dependencies can be really nasty!

## Check Python version

In [None]:
import sys
# Python 3 is required; 3.10 is the version used to prepare the course.
assert sys.version_info.major == 3, "Python 3.x is required"
if sys.version_info.minor < 10:
    print("Warning: Python 3.10 is recommended")
else:
    print("Python >= 3.10 👍")

## Installation

In [None]:
!pip install -U spacy scikit-learn matplotlib transformers datasets pandas wordfreq spacy-transformers sentence-transformers faiss-cpu langchain tqdm accelerate torch nltk

### Test basic imports

In [None]:
import spacy
import pandas as pd
import numpy as np

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import ConfusionMatrixDisplay

from spacy.lang.en.stop_words import STOP_WORDS
from spacy.lang.en import English
from spacy.vectors import Vectors

from matplotlib import pyplot as plt

print("It works!")

## Download spaCy models

In [None]:
!python -m spacy download en_core_web_lg

In [None]:
!python -m spacy download en_core_web_md

In [None]:
!python -m spacy download en_core_web_sm
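
The downloads above can take a while, so it may be worth checking which models are still missing before re-running them. The sketch below assumes the models were installed as regular Python packages, which is what `spacy download` does. `SPACY_MODELS` and `missing_models` are illustrative names, not part of spaCy.

```python
from importlib.util import find_spec

# The three English pipelines downloaded above.
SPACY_MODELS = ["en_core_web_sm", "en_core_web_md", "en_core_web_lg"]


def missing_models(names):
    """Return the names that cannot be found as importable packages."""
    return [name for name in names if find_spec(name) is None]


missing = missing_models(SPACY_MODELS)
print("Missing spaCy models:", missing or "none")
```

This only checks that the package can be found; loading the model with `spacy.load` is still the real test.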

### Test if that worked

In [None]:
import spacy
nlp = spacy.load('en_core_web_sm')
nlp("Small model works!")

In [None]:
nlp = spacy.load('en_core_web_md')
nlp("Medium model works!")

In [None]:
nlp = spacy.load('en_core_web_lg')
nlp("Large model works!")

## Download Hugging Face models

In [None]:
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=3)
print("It works!")

## Download NLTK packages

In [None]:
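import nltk
nltk.download('punkt')
# Recent NLTK releases (3.9+) load the Punkt tokenizer data from 'punkt_tab'.
nltk.download('punkt_tab')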