# Course set-up

In [1]:
__author__ = "Christopher Potts"
__version__ = "CS224u, Stanford, Spring 2022"

This notebook covers the steps you'll need to take to get set up for [CS224u](http://web.stanford.edu/class/cs224u/).

## Contents

1. [Anaconda](#Anaconda)
1. [The course Github repository](#The-course-Github-repository)
1. [Main data distribution](#Main-data-distribution)
1. [Additional installations](#Additional-installations)
1. [Jupyter notebooks](#Jupyter-notebooks)

## Anaconda

We recommend installing [the free Anaconda Python distribution](https://www.anaconda.com/products/individual), which includes IPython, NumPy, SciPy, matplotlib, scikit-learn, NLTK, and many other useful packages. This is not required, but it's an easy way to get all these packages installed. Unless you're very comfortable with Python package management and like installing things, this is the option for you!

Please be sure that you download the __Python 3__ version, which currently installs Python 3.9. __Our codebase is not compatible with Python 2__.

Once you have Anaconda installed, create a virtual environment for the course. In a terminal, run

```conda create -n nlu python=3.9 anaconda```

to create an environment called `nlu`.

Then, to enter the environment, run

```conda activate nlu```

To leave it, you can just close the window, or run

```conda deactivate```

If your version of conda is older than 4.4 (see `conda --version`), then replace `conda` with `source` in the activate and deactivate commands above (and consider upgrading your Anaconda!).

[This page](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) has more detailed instructions on managing virtual environments with Anaconda.
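Once the environment is active, a quick check from Python can confirm that you are where you expect to be. The following is a minimal sketch, not part of the course code: the function name `describe_environment` is hypothetical, and treating 3.9 as a minimum version is an assumption based on the `python=3.9` setting above. It relies on the `CONDA_DEFAULT_ENV` variable that conda sets when an environment is activated.

```python
import os
import sys


def describe_environment(expected_env="nlu", min_python=(3, 9)):
    """Report on the running interpreter and the active conda environment.

    `expected_env` and `min_python` are illustrative defaults, not
    requirements stated by the course.
    """
    # conda sets CONDA_DEFAULT_ENV on activation; it is absent otherwise.
    active_env = os.environ.get("CONDA_DEFAULT_ENV")
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= min_python,
        "active_env": active_env,
        "env_ok": active_env == expected_env,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If `env_ok` is `False`, you are probably running Python from outside the `nlu` environment.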

## The course Github repository

The core materials for the course are on Github:

https://github.com/cgpotts/cs224u

We'll be working in this repository a lot, and it will receive updates throughout the quarter, as we add new materials and correct bugs.

If you're new to git and Github, we recommend using [Github's Desktop Apps](https://desktop.github.com). Then you just have to clone our repository and sync your local copy with the official one when there are updates. 

If you are comfortable with git in the command line, you can type the following command to clone the course's Github repo:

```git clone https://github.com/cgpotts/cs224u```

## Main data distribution

The datasets needed to run the course notebooks and complete the assignments are in the following gzipped tar archive:

http://web.stanford.edu/class/cs224u/data/data.tgz

We recommend that you download it, unpack it, and place it in the same directory as your local copy of this Github repository. If you decide to put it somewhere else, you'll need to adjust the paths given in the "Set-up" sections of essentially all the notebooks.
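If you would rather unpack the archive from Python than from a file manager or the command line, the standard-library `tarfile` module can do it. This is a sketch under assumptions: the helper name `unpack_archive` is hypothetical, and it assumes you have already downloaded `data.tgz` into the current directory. It makes no claim about what the archive contains; it only reports the top-level names it finds.

```python
import tarfile
from pathlib import Path, PurePosixPath


def unpack_archive(archive_path, dest_dir="."):
    """Unpack a gzipped tar archive into dest_dir; return its top-level names."""
    top_level = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            if parts:  # skip a bare "." entry, which has no parts
                top_level.add(parts[0])
        # Only unpack archives from a source you trust.
        tar.extractall(dest_dir)
    return sorted(top_level)


# Only runs if the archive is present in the current directory.
if Path("data.tgz").exists():
    print(unpack_archive("data.tgz"))
```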

We recommend that you check the `md5` checksum of `data.tgz` after downloading it. For the current version (as of March 25, 2022), the checksum is `5e4a4e4c6b1aca47d711e25cb306a3aa`. If you see a different checksum, please report this to the teaching staff.
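One way to compute the checksum is with Python's standard `hashlib` module. The sketch below assumes that `data.tgz` is in the current directory; the helper name `md5_of_file` is hypothetical, and the expected value is the one quoted above.

```python
import hashlib
from pathlib import Path

# Checksum quoted in the text for the March 25, 2022 version of data.tgz.
EXPECTED_MD5 = "5e4a4e4c6b1aca47d711e25cb306a3aa"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Reading in chunks avoids loading a large archive into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Only runs if the archive is present in the current directory.
if Path("data.tgz").exists():
    actual = md5_of_file("data.tgz")
    if actual == EXPECTED_MD5:
        print("Checksum matches.")
    else:
        print(f"Checksum mismatch: got {actual}")
```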

## Additional installations

Be sure to do these additional installations from [inside your virtual environment](#Anaconda) for the course! Before you proceed, run

```conda activate nlu```

to make sure you are in that environment.

If you are running Anaconda, then you can simply run

```pip install -r requirements.txt```

from inside the course virtual environment to install the core additional packages.

People who aren't using Anaconda should first edit `requirements.txt` so that it also installs the prerequisites that come with Anaconda, and then run the above `pip` command from inside the course virtual environment.
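If you are not using Anaconda, it can help to see which of the packages named in the [Anaconda](#Anaconda) section are already importable before you edit `requirements.txt`. The following sketch covers only the packages that section names explicitly, not everything Anaconda ships, and the helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

# Import names for the packages listed in the Anaconda section.
# Note that scikit-learn is imported as `sklearn`.
ANACONDA_PACKAGES = ["IPython", "numpy", "scipy", "matplotlib", "sklearn", "nltk"]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(ANACONDA_PACKAGES)
if missing:
    print(f"Not found in this environment: {missing}")
else:
    print("All listed packages were found.")
```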

Our most important and finicky installations relate to our deep learning code. The following will check that you have the desired versions of the core libraries ([PyTorch](https://pytorch.org/) and [Hugging Face](https://huggingface.co/) `transformers`):

In [3]:
import torch

assert torch.__version__ == '1.10.1',\
    f"torch version is {torch.__version__}"

In [2]:
import transformers

assert transformers.__version__ == '4.17.0',\
    f"transformers version is {transformers.__version__}"

If the above tests didn't pass, you *might* be okay, but it is probably best to change your versions inside the `nlu` virtual environment. These are fast-changing libraries and we can't ensure complete backward compatibility.
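To see what is installed without stopping at the first failed assertion, you could report both versions at once. This is a minimal sketch using the standard `importlib.metadata` module; the helper name `check_versions` is hypothetical. Unlike the assertions above, it ignores any local build suffix (for example, a `+cu...` tag on a CUDA build of PyTorch) when comparing, which is a design choice of this sketch and not a course requirement.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the assertions above.
EXPECTED_VERSIONS = {"torch": "1.10.1", "transformers": "4.17.0"}


def check_versions(expected):
    """Map each package to a (wanted, found, matches) tuple."""
    report = {}
    for package, wanted in expected.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            found = None
        # Compare on the public version only, dropping any "+local" suffix.
        matches = found is not None and found.split("+")[0] == wanted
        report[package] = (wanted, found, matches)
    return report


for package, (wanted, found, matches) in check_versions(EXPECTED_VERSIONS).items():
    status = "OK" if matches else "MISMATCH"
    print(f"{package}: wanted {wanted}, found {found} [{status}]")
```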

If you have a [CUDA-enabled GPU](https://developer.nvidia.com/cuda-gpus), we recommend following the instructions posted here for installing PyTorch in a way that will let you take advantage of this:

https://pytorch.org/get-started/locally/
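After installing, you may want to confirm that PyTorch can actually see your GPU. The sketch below uses `torch.cuda.is_available()` and `torch.cuda.get_device_name()`; the wrapper function `describe_device` is hypothetical and is written so that it also runs when `torch` is not yet installed.

```python
def describe_device():
    """Return a one-line description of the device PyTorch would use."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        # Device 0 is the first visible CUDA device.
        return f"CUDA is available: {torch.cuda.get_device_name(0)}"
    return "CUDA is not available; PyTorch will run on the CPU"


print(describe_device())
```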

## Jupyter notebooks

The majority of the materials for this course are Jupyter notebooks, which allow you to work in a browser, mixing code and description. It's a powerful form of [literate programming](https://en.wikipedia.org/wiki/Literate_programming), and increasingly a standard for open science.

To start a notebook server, navigate to the directory where you want to work and run

```jupyter notebook --port 5656```

The port specification is optional. 

This should launch a browser that takes you to a view of the directory you're in. You can then open existing notebooks and create new ones.

A major advantage of working with Anaconda is that you can switch virtual environments from inside a notebook, via the __Kernel__ menu. If this isn't an option for you, then run this command while inside your virtual environment:

```python -m ipykernel install --user --name nlu --display-name "nlu"```

(If you named your environment something other than `nlu`, then change the `--name` and `--display-name` values.) 
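To confirm that the kernel was registered, you can list the kernels that Jupyter knows about. This sketch assumes the `jupyter_client` package is available (it is installed alongside Jupyter) and uses its `KernelSpecManager`; the wrapper `installed_kernels` is a hypothetical helper that returns an empty dictionary if `jupyter_client` cannot be imported.

```python
def installed_kernels():
    """Return a dict mapping kernel names to their resource directories."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return {}
    return KernelSpecManager().find_kernel_specs()


kernels = installed_kernels()
for name, path in sorted(kernels.items()):
    print(f"{name}: {path}")

# "nlu" is the kernel name used in the command above.
print("nlu kernel found" if "nlu" in kernels else "nlu kernel not found")
```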

[Additional discussion of Jupyter and kernels.](https://stackoverflow.com/a/44786736)

For some tips on getting started with notebooks, see [our Jupyter tutorial](tutorial_jupyter_notebooks.ipynb).