# Lesson 5: Introduction and Setup of the Experimental Environment

## Introduction (1 minute)

Welcome to our hands-on session on setting up the experimental environment for our LLM course. Over the next 14 minutes or so, we'll walk through the practical steps of preparing your workspace. Every exercise and project in the course depends on this setup.

## Lesson Objectives

By the end of this lesson, you will:
1. Understand the components of our course's experimental environment
2. Set up and configure the online server environment
3. Learn about model storage and invocation techniques
4. Troubleshoot common setup issues

Let's get our hands dirty!

## 1. Introduction to the Experimental Environment (2 minutes)

Our cloud-based environment ensures everyone has access to the necessary computational resources. Here's what we're working with:

- Ubuntu 20.04 LTS as our operating system
- Python 3.8+ for programming
- CUDA 11.2 for GPU support
- PyTorch 1.9+ and Hugging Face Transformers 4.10+ for LLM work
- Jupyter Lab for interactive coding
- Git for version control
- Docker for containerization

Let's visualize our setup:

In [None]:
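# Requires the `graphviz` Python package and the Graphviz system binaries,
# which the setup steps in this lesson do not install
from graphviz import Digraph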

dot = Digraph(comment='Course Environment')
dot.attr(rankdir='TB', size='8,8')

dot.node('A', 'Cloud Server (Ubuntu 20.04)')
dot.node('B', 'Python 3.8+')
dot.node('C', 'CUDA 11.2')
dot.node('D', 'PyTorch 1.9+')
dot.node('E', 'Transformers 4.10+')
dot.node('F', 'Jupyter Lab')
dot.node('G', 'Git')
dot.node('H', 'Docker')

dot.edge('A', 'B')
dot.edge('A', 'C')
dot.edge('B', 'D')
dot.edge('B', 'E')
dot.edge('A', 'F')
dot.edge('A', 'G')
dot.edge('A', 'H')

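# Write course_environment.png to the working directory (cleanup=True deletes the intermediate DOT source)
dot.render('course_environment', format='png', cleanup=True)
# Display the graph inline in Jupyter; dot.view() would need a desktop image viewer, which a remote server lacks
dot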

[Image Placeholder: Detailed diagram of the Course Environment]
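
Once the environment is installed, the version requirements listed above can be checked from Python. The following is a minimal sketch using only the standard library; the names `MINIMUMS`, `parse_version`, and `check_environment` are illustrative helpers, not part of the course repository, and the minimum versions are copied from the component list above.

```python
import sys
from importlib import metadata

# Minimum versions taken from the component list in this lesson.
MINIMUMS = {"torch": (1, 9), "transformers": (4, 10)}


def parse_version(text):
    """Turn a string such as '1.9.0+cu111' into (1, 9, 0), ignoring suffixes."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment(minimums=MINIMUMS):
    """Return {name: (installed_version_or_None, meets_minimum)}."""
    report = {"python": (sys.version.split()[0], sys.version_info >= (3, 8))}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, parse_version(installed) >= minimum)
    return report


if __name__ == "__main__":
    for name, (version, ok) in check_environment().items():
        print(f"{name:<14}{version or 'not installed':<18}{'OK' if ok else 'CHECK'}")
```

A package reported as `not installed` before the setup steps have been run is expected; the check is meant to be run inside the activated course environment.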

## 2. Setup and Configuration of the Online Server Environment (5 minutes)

Let's go through the setup process step-by-step:

1. Log in to the provided cloud server:
   ```
   ssh student@course-server.com
   ```
   Use the password provided in your welcome email.

2. Update the system:
   ```
   sudo apt-get update && sudo apt-get upgrade -y
   ```

3. Install required system packages:
   ```
   sudo apt-get install -y build-essential libssl-dev zlib1g-dev \
   libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
   libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev \
   liblzma-dev python3-openssl git
   ```

4. Install Miniconda:
   ```
   wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
   bash Miniconda3-latest-Linux-x86_64.sh
   ```
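   Follow the prompts to install Miniconda and accept the offer to initialize conda. Then restart your shell (or run `source ~/.bashrc`) so the `conda` command is available.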

5. Create a new conda environment:
   ```
   conda create -n llm_env python=3.8 -y
   conda activate llm_env
   ```

6. Install PyTorch with CUDA support:
   ```
   conda install pytorch torchvision torchaudio cudatoolkit=11.2 -c pytorch
   ```

7. Install Transformers and other required packages:
   ```
   pip install transformers datasets scikit-learn matplotlib jupyterlab
   ```

8. Clone the course repository:
   ```
   git clone https://github.com/course/llm-training.git
   cd llm-training
   ```

9. Start Jupyter Lab:
   ```
   jupyter lab --no-browser --port=8888
   ```

10. In a new terminal on your local machine (not on the server), set up port forwarding:
    ```
    ssh -N -f -L localhost:8888:localhost:8888 student@course-server.com
    ```

11. Open `http://localhost:8888` in your browser and enter the token provided in the terminal.
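
With Jupyter Lab open, a quick way to confirm that the steps above produced a working GPU setup is to ask PyTorch what it can see. This is a small sketch; `describe_gpu` is a hypothetical helper name, and the import is guarded so the cell also runs (and reports the problem) if PyTorch is missing.

```python
def describe_gpu():
    """Summarize what PyTorch can see; safe to call even if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    available = torch.cuda.is_available()
    devices = (
        [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
        if available
        else []
    )
    return {
        "torch_installed": True,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "devices": devices,
    }


print(describe_gpu())
```

If `cuda_available` is `False` on a GPU server, the usual suspects are a CPU-only PyTorch build (`cuda_build` is `None`) or a driver that does not match the installed CUDA toolkit.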

## 3. Model Storage and Invocation (3 minutes)

We'll be working with various pre-trained models throughout the course. Here's how we'll manage and use them:

1. Models will be stored in a dedicated `/models` directory on the server.

2. We'll use the Hugging Face `transformers` library to load and use models. Here's a quick example:

In [None]:
from transformers import AutoModel, AutoTokenizer
import torch

model_name = "bert-base-uncased"
model_path = "/models/" + model_name

# Load model and tokenizer
model = AutoModel.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Example usage
text = "Hello, world!"
inputs = tokenizer(text, return_tensors="pt")

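# Move the model and the input tensors to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Inference only: skip gradient tracking to save memory
with torch.no_grad():
    outputs = model(**inputs)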

print(f"Input text: {text}")
print(f"Output shape: {outputs.last_hidden_state.shape}")
print(f"Device used: {device}")

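This code loads a BERT model from `/models/bert-base-uncased`, moves it to the GPU if one is available, and runs a forward pass on a short input. It assumes the model has already been saved to that directory; loading fails if the directory does not exist.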
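
Because loading from `/models` only works once a model has been saved there, it can help to fall back to the Hugging Face Hub name when the local copy is missing. The sketch below shows one way to do that; `resolve_model_source` is a hypothetical helper, and it relies on the fact that `save_pretrained()` writes a `config.json` into the target directory.

```python
from pathlib import Path


def resolve_model_source(model_name, models_root="/models"):
    """Return the local directory if it holds a saved model, else the Hub name."""
    local_dir = Path(models_root) / model_name
    # save_pretrained() writes a config.json, so its presence is used as the marker.
    if (local_dir / "config.json").is_file():
        return str(local_dir)
    return model_name
```

The result can be passed straight to `from_pretrained`, for example `AutoModel.from_pretrained(resolve_model_source("bert-base-uncased"))`, which reads from `/models` when possible and downloads otherwise.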

3. To download a new model from the Hugging Face Hub and save it under `/models`, use the following code:

In [None]:
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"
save_path = f"/models/{model_name}"

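# Download from the Hugging Face Hub (cached locally after the first call).
# AutoModel loads the base model without a task head; use AutoModelForCausalLM for text generation.
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)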

model.save_pretrained(save_path)
tokenizer.save_pretrained(save_path)

print(f"Model and tokenizer saved to {save_path}")
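
To see which models have already been saved on the server, the `/models` directory can be scanned for saved configurations. This is a minimal sketch; `list_local_models` is a hypothetical helper, and it treats any directory containing a `config.json` as a saved model, which may also match nested checkpoint folders.

```python
from pathlib import Path


def list_local_models(models_root="/models"):
    """Return sorted model names (relative paths) that contain a config.json."""
    root = Path(models_root)
    if not root.is_dir():
        return []
    return sorted(
        cfg.parent.relative_to(root).as_posix() for cfg in root.rglob("config.json")
    )
```

Using `rglob` rather than a single directory listing means names with an organization prefix, which are saved as nested directories, are reported as `org/model`.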

## 4. Troubleshooting Common Issues (1 minute)

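1. If you encounter CUDA out-of-memory errors, reduce the batch size or switch to a smaller model.
2. For package conflicts, create a fresh conda environment and install the packages one at a time to find which one causes the conflict.
3. If Jupyter Lab doesn't start, check whether the port is already in use with `lsof -i :8888`, then stop the conflicting process or start Jupyter Lab on another port with `--port`.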
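
The batch-size advice for out-of-memory errors can be automated. The following sketch halves the batch size whenever a batch fails with an out-of-memory error and retries from the same position. `run_with_batch_backoff` and its parameters are hypothetical names, and `run_batch` stands for whatever function runs your model on a list of inputs; the sketch assumes the error surfaces as a `RuntimeError` whose message contains "out of memory".

```python
def run_with_batch_backoff(run_batch, items, batch_size, min_batch_size=1):
    """Process items in batches, halving the batch size after an out-of-memory error.

    Returns (results, final_batch_size).
    """
    results = []
    start = 0
    while start < len(items):
        batch = items[start:start + batch_size]
        try:
            results.extend(run_batch(batch))
        except RuntimeError as err:
            # Only out-of-memory errors trigger a retry; anything else is re-raised.
            if "out of memory" not in str(err).lower() or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
            continue  # retry the same position with the smaller batch
        start += len(batch)
    return results, batch_size
```

In real GPU code it is usually worth calling `torch.cuda.empty_cache()` before retrying, and the final batch size returned here is a reasonable starting point for later runs on the same hardware.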

## Additional Resources (1 minute)

1. Jupyter Lab Documentation: https://jupyterlab.readthedocs.io/en/stable/
2. Conda Cheat Sheet: https://docs.conda.io/projects/conda/en/latest/user-guide/cheatsheet.html
3. PyTorch Documentation: https://pytorch.org/docs/stable/index.html
4. Hugging Face Transformers Documentation: https://huggingface.co/transformers/
5. Git Cheat Sheet: https://education.github.com/git-cheat-sheet-education.pdf
6. CUDA Programming Guide: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

## Conclusion and Next Steps (1 minute)

Congratulations! You've now set up a powerful environment for LLM development. Before our next lesson:

1. Try loading different models and running inference
2. Experiment with GPU vs. CPU performance
3. Familiarize yourself with Jupyter Lab's interface

If you encounter any issues, please reach out to our support team at support@llmcourse.com.

Next, we'll dive into the fascinating world of tokenization and embeddings. Get ready for some exciting NLP adventures!