# Course set-up

In [1]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Fall 2024"


This notebook covers the steps you'll need to take to get set up for [CS224u](http://web.stanford.edu/class/cs224u/).

## Contents

1. [Anaconda](#Anaconda)
1. [The course Github repository](#The-course-Github-repository)
1. [Services](#Services)
1. [Additional installations](#Additional-installations)
1. [Jupyter notebooks](#Jupyter-notebooks)

## Anaconda

We recommend installing [the free Anaconda Python distribution](https://www.anaconda.com/products/individual), which includes IPython, Numpy, Scipy, matplotlib, scikit-learn, NLTK, and many other useful packages. This is not required, but it's an easy way to get all these packages installed. Unless you're very comfortable with Python package management and like installing things, this is the option for you!

Please be sure that you download the __Python 3__ version, which currently installs Python 3.9. __Our codebase is not compatible with Python 2__.

Once you have Anaconda installed, create a virtual environment for the course. In a terminal, run

```conda create -n nlu python=3.9 anaconda```

to create an environment called `nlu`.

> NOTE: If you see the error message `PackagesNotFoundError: The following packages are not available from current channels: - anaconda`, run the above command without `anaconda`.


Then, to enter the environment, run

```conda activate nlu```

To leave it, you can just close the window, or run

```conda deactivate```

If your version of conda is older than 4.4 (check with `conda --version`), then replace `conda` with `source` in the activate and deactivate commands above (and consider upgrading your Anaconda!).

[This page](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) has more detailed instructions on managing virtual environments with Anaconda.
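
As a quick sanity check after activating the environment, the following minimal sketch reports which interpreter and conda environment Python sees. It assumes that conda exports the `CONDA_DEFAULT_ENV` variable on activation; the function name `describe_environment` and its parameters are hypothetical helpers, not part of the course code.

```python
import os
import sys


def describe_environment(expected_env="nlu", min_python=(3, 9)):
    """Summarize the interpreter and conda environment currently in use."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated;
    # outside conda the variable is absent, so we fall back to None.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= min_python,
        "active_env": active_env,
        "env_ok": active_env == expected_env,
    }


# Print one line per check so that problems are easy to spot.
for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `env_ok` is `False`, you are probably running Python from outside the `nlu` environment.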

## The course Github repository

The core materials for the course are on Github:

https://github.com/cgpotts/cs224u

We'll be working in this repository a lot, and it will receive updates throughout the quarter, as we add new materials and correct bugs.

If you're new to git and Github, we recommend using [Github's Desktop Apps](https://desktop.github.com). Then you just have to clone our repository and sync your local copy with the official one when there are updates. 

If you are comfortable with git in the command line, you can type the following command to clone the course's Github repo:

```git clone https://github.com/cgpotts/cs224u```

## Services

There are a variety of services that we recommend signing up for to help you with course work:

* [Google Colab](https://colab.research.google.com/signup/pricing): Browser-based system for working with notebooks. This is free, but for $9.99/month you get a substantial upgrade in performance and reliability. Consider subscribing for three months – this is probably less than your least expensive required textbook (we have no textbook)!

* [SageMaker Studio Lab](https://studiolab.sagemaker.aws/): Similar to Colab but often with better GPU support. This service is currently free for all users.

* [OpenAI](https://beta.openai.com): New accounts get <s>18</s> 5 dollars in free credits, and some of the models may still be free to use.

## Additional installations

Be sure to do these additional installations from [inside your virtual environment](#Anaconda) for the course! Before you proceed from here, perhaps run

```conda activate nlu```

to make sure you are in that environment.

In [2]:
print("conda")

conda


First, define the device used throughout the assignments. If you have an Nvidia GPU, the device should have the form `cuXXX`. Google Colab currently uses CUDA 12.1, so there the device would be `cu121`, but the value may differ depending on the CUDA version installed on your machine; run `nvidia-smi` to look it up. If you don't have an Nvidia GPU, set the device to `cpu`. Below we simply default to `cpu` for Windows/Linux systems; on MacOS this step is not required and can be skipped.

*NOTE:* uncomment the lines with `torch` in the `requirements.txt` file and then run the following commands to install the required packages.

**For Linux**:

```
export DEVICE=cpu
```

**For Windows**:

```
set DEVICE=cpu
```
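
If you do have an Nvidia GPU, the `cuXXX` value can be derived from the CUDA version that `nvidia-smi` reports. The sketch below assumes the tag is simply `cu` followed by the major and minor version digits, which matches the `cu121` example for CUDA 12.1; the helper name `cuda_device_tag` is hypothetical.

```python
def cuda_device_tag(cuda_version=None):
    """Return a DEVICE value such as 'cu121', or 'cpu' if no CUDA version is given."""
    if not cuda_version:
        return "cpu"
    # Keep only the major and minor components, e.g. "12.1" -> ("12", "1").
    major, minor = cuda_version.strip().split(".")[:2]
    return f"cu{int(major)}{int(minor)}"


print(cuda_device_tag("12.1"))  # cu121
print(cuda_device_tag())        # cpu
```

The printed value is what you would use in place of `cpu` in the `export DEVICE=...` or `set DEVICE=...` command.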

After setting the device we need to install the core packages required by the assignments.

**For MacOS**:

```
pip install -f https://download.pytorch.org/whl/torch_stable.html -r requirements.txt
pip install torch
```

**For Linux**:

```
pip install \
    -f https://download.pytorch.org/whl/torch/ \
    -r requirements.txt
```

**For Windows**:

```
pip install ^
    -f https://download.pytorch.org/whl/torch/ ^
    -r requirements.txt
```

Our most important and finicky installations relate to our deep learning code. The following will check that you have the desired versions of the core libraries ([PyTorch](https://pytorch.org/) and [Hugging Face](https://huggingface.co/) `transformers`):

In [3]:
import torch

assert torch.__version__.startswith('2.4.0'),\
    f"torch version is {torch.__version__}"

print(f"torch version is {torch.__version__}")

torch version is 2.4.0


In [4]:
import transformers
from packaging import version

assert version.parse(transformers.__version__) > version.parse("4.37"),\
    f"transformers version is {transformers.__version__}"

print(f"transformers version is {transformers.__version__}")


transformers version is 4.41.2


If the above tests didn't pass, you *might* be okay, but it is probably best to change your versions inside the `nlu` virtual environment. These are fast-changing libraries and we can't ensure complete backward compatibility.
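
If you would rather see what is installed without triggering an assertion error, a minimal sketch like the following reads the versions from package metadata using only the standard library. The helper names `parse_release` and `installed_version` are hypothetical, and the parsing is deliberately simplified: it drops local tags such as `+cu121` and stops at the first non-numeric component.

```python
from importlib import metadata


def parse_release(version_string):
    """Turn '2.4.0+cu121' or '4.41.2' into a tuple of ints, e.g. (2, 4, 0)."""
    public = version_string.split("+")[0]  # drop a local tag such as '+cu121'
    parts = []
    for piece in public.split("."):
        if not piece.isdigit():
            break  # stop at non-numeric components such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the versions instead of asserting on them.
for name in ("torch", "transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Tuples compare element by element, so `parse_release("4.41.2") > (4, 37)` mirrors the `transformers` check above.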

## Jupyter notebooks

The majority of the materials for this course are Jupyter notebooks, which allow you to work in a browser, mixing code and description. It's a powerful form of [literate programming](https://en.wikipedia.org/wiki/Literate_programming), and increasingly a standard for open science.

To start a notebook server, navigate to the directory where you want to work and run

```jupyter notebook --port 5656```

The port specification is optional. 

This should launch a browser that takes you to a view of the directory you're in. You can then open notebooks for working and create new notebooks.

A major advantage of working with Anaconda is that you can switch virtual environments from inside a notebook, via the __Kernel__ menu. If this isn't an option for you, then run this command while inside your virtual environment:

```
python -m ipykernel install --user --name nlu --display-name "nlu"
```

(If you named your environment something other than `nlu`, then change the `--name` and `--display-name` values.) 
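
For a differently named environment, the command can be assembled programmatically. This is only a sketch: `kernel_install_command` is a hypothetical helper that builds the same argument list as the command above, using the interpreter that is currently running.

```python
import sys


def kernel_install_command(env_name="nlu", display_name=None):
    """Build the argument list that registers the current interpreter as a kernel."""
    return [
        sys.executable, "-m", "ipykernel", "install", "--user",
        "--name", env_name,
        # Default the display name to the environment name.
        "--display-name", display_name or env_name,
    ]


# Show the command; this does not run it.
print(" ".join(kernel_install_command("nlu")))
```

To actually run it from Python, you could pass the list to `subprocess.run(..., check=True)` while inside the environment.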

[Additional discussion of Jupyter and kernels.](https://stackoverflow.com/a/44786736)

For some tips on getting started with notebooks, see [our Jupyter tutorial](tutorial_jupyter_notebooks.ipynb).

**Alternatively**, if you are a *Visual Studio Code* user, you can use the [Jupyter extension](https://code.visualstudio.com/docs/datascience/jupyter-notebooks) to run notebooks directly in the editor.