## **1: The Building Blocks - Environment & PyTorch Tensors**

In this notebook, we'll set up a clean, modern Python environment and master the fundamental data object in all of deep learning: the tensor. Understanding tensors is crucial: they are the foundation upon which all neural network operations are built.

### **Platform Disclaimer**

**Important:** This tutorial is designed for macOS, Linux, and Windows. The tooling (uv) is optimized for macOS and Linux. For GPU acceleration, PyTorch supports:

- `CUDA` (NVIDIA GPUs on Linux/Windows)
- `MPS` (Apple Silicon GPUs on macOS)
- `CPU` (fallback for all platforms)

The code will automatically detect and use the best available device.

---

### **1. Setting Up the Environment**

We'll use `uv` to create a clean Python environment. This ensures that our dependencies are isolated and won't conflict with other projects.

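```bash
# Create a new virtual environment in the 'dl-llm-majors' directory
uv venv dl-llm-majors
# Activate the environment (macOS/Linux)
source dl-llm-majors/bin/activate
# On Windows, run instead: dl-llm-majors\Scripts\activate
```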


### **2. Installing PyTorch**

Next, we'll install PyTorch. The installation command varies based on your platform and desired features (like GPU support). Visit the [PyTorch Get Started](https://pytorch.org/get-started/locally/) page to find the right command for your setup. For example, for CUDA support on Linux, you might use:

```bash
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```


### **3. Verifying the Installation**

To verify that PyTorch is installed correctly and to check if it can access the GPU, run the following code in a Python shell:

```python
import torch
print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("MPS available:", torch.backends.mps.is_available())
```

This will confirm that PyTorch is installed and whether it can utilize your GPU.
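
On older PyTorch builds the `torch.backends.mps` module may be missing, in which case the MPS check above would raise an `AttributeError`. A slightly more defensive sketch follows; the helper name `describe_backends` is hypothetical and not part of PyTorch.

```python
import torch

def describe_backends():
    """Return a dict describing which accelerators this PyTorch build can use."""
    # getattr guards against builds that predate the MPS backend.
    mps_backend = getattr(torch.backends, "mps", None)
    return {
        "version": torch.__version__,
        "cuda": torch.cuda.is_available(),
        "mps": bool(mps_backend is not None and mps_backend.is_available()),
    }

info = describe_backends()
for name, value in info.items():
    print(f"{name}: {value}")
```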



### **Understanding Tensors**

Tensors are multi-dimensional arrays that are the core data structure in PyTorch. They can be manipulated on both CPUs and GPUs, making them essential for deep learning. Here's how to create and manipulate tensors:

```python
import torch

# Create a tensor
x = torch.tensor([[1, 2], [3, 4]])
print("Tensor x:\n", x)

# Perform operations on tensors
y = x + 10
print("Tensor y (x + 10):\n", y)
z = x * 2
print("Tensor z (x * 2):\n", z)
```

---

## **Introduction to Tensors**

- What is a tensor? In the simplest terms, it's a multi-dimensional array, like a list of lists of lists. Think of it as a super-powered NumPy array that can run on GPUs.
- Tensors can be 0D (scalar), 1D (vector), 2D (matrix), or even higher-dimensional. They are the fundamental building blocks for all operations in deep learning, from simple arithmetic to complex neural network computations.
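
To make the dimension counts concrete, here is a small sketch that builds one tensor of each rank and prints its `ndim` and `shape`. The variable names are purely illustrative.

```python
import torch

scalar = torch.tensor(3.14)                 # 0D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])      # 1D: a list of numbers
matrix = torch.tensor([[1, 2], [3, 4]])     # 2D: rows and columns
cube = torch.zeros(2, 3, 4)                 # 3D: e.g. a batch of 2 matrices, each 3x4

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```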

Tensors are the backbone of deep learning frameworks like PyTorch. They allow efficient computation on large datasets and can be moved between CPUs and GPUs for faster processing. We can create tensors from:

- Python lists or explicit values, using `torch.tensor()`,
- NumPy arrays, using `torch.from_numpy()`,
- built-in factory functions such as `torch.rand()`, `torch.ones()`, and `torch.zeros()`,
- or data loaded from files.

Once we have our tensors, we can perform a wide range of operations on them, such as addition, multiplication, and more complex functions like matrix multiplication and convolution.
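
Element-wise multiplication and matrix multiplication are easy to confuse, so here is a minimal sketch contrasting the two on small hand-checkable inputs (the values are arbitrary).

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

elementwise = a * b      # multiplies matching entries
matmul = a @ b           # matrix product, same as torch.mm(a, b) for 2D inputs

print(elementwise)       # values: [[5, 12], [21, 32]]
print(matmul)            # values: [[19, 22], [43, 50]]
```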

In [1]:
import torch
import numpy as np

# Create tensor from a Python list
tensor_from_list = torch.tensor([1, 2, 3, 4, 5])
print("Tensor from list:", tensor_from_list)

Tensor from list: tensor([1, 2, 3, 4, 5])


In [2]:
# Create tensor from a NumPy array
numpy_array = np.array([[1, 2], [3, 4]])
tensor_from_numpy = torch.from_numpy(numpy_array)
print("Tensor from NumPy array:\n", tensor_from_numpy)

Tensor from NumPy array:
 tensor([[1, 2],
        [3, 4]])
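
One detail worth knowing: `torch.from_numpy()` shares memory with the source array, whereas `torch.tensor()` copies the data. The sketch below shows the difference; the variable names are illustrative.

```python
import numpy as np
import torch

source = np.array([1, 2, 3])
shared = torch.from_numpy(source)   # shares memory with `source`
copied = torch.tensor(source)       # makes an independent copy

source[0] = 100                     # modify the NumPy array in place

print(shared)   # first element is now 100
print(copied)   # still starts with 1
```

If you need a tensor that is safe from later changes to the array, copying is the simpler choice.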


In [3]:
# Create random tensor
random_tensor = torch.rand(3, 3)
print("Random tensor:\n", random_tensor)

Random tensor:
 tensor([[0.8013, 0.0548, 0.3567],
        [0.1011, 0.1140, 0.5162],
        [0.4225, 0.4603, 0.3205]])
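
`torch.rand()` draws values uniformly from the interval [0, 1), so the numbers above will differ from run to run. If you want repeatable results, one option is to seed the generator first, as in this small sketch (the seed value 42 is arbitrary).

```python
import torch

torch.manual_seed(42)          # fix the random number generator state
first = torch.rand(3, 3)

torch.manual_seed(42)          # reset to the same state
second = torch.rand(3, 3)

print(torch.equal(first, second))  # True: same seed, same values
```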


In [4]:
# Create tensor of ones
ones_tensor = torch.ones(2, 4)
print("Tensor of ones:\n", ones_tensor)

Tensor of ones:
 tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])


In [5]:
# Create tensor of zeros
zeros_tensor = torch.zeros(2, 4)
print("Tensor of zeros:\n", zeros_tensor)

Tensor of zeros:
 tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.]])


## **Tensor Attributes**

Every tensor has attributes that describe its properties:
- `shape`: The dimensions of the tensor (e.g., (2, 3) for a 2D tensor with 2 rows and 3 columns).
- `dtype`: The data type of the tensor (e.g., `torch.float32`, `torch.int64`).
- `device`: The device on which the tensor is stored (e.g., `cpu`, `cuda`, `mps`).

In [6]:
# Create a tensor and inspect its attributes
tensor = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32, device='cpu')

print("Tensor:\n", tensor)
print("Shape:", tensor.shape)
print("Data type:", tensor.dtype)
print("Device:", tensor.device)

Tensor:
 tensor([[1., 2.],
        [3., 4.]])
Shape: torch.Size([2, 2])
Data type: torch.float32
Device: cpu
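
When no `dtype` is given, PyTorch infers one from the data. The following sketch shows the inferred types for integer and float inputs and one way to convert between them; it assumes the default float type has not been changed with `torch.set_default_dtype()`.

```python
import torch

ints = torch.tensor([1, 2, 3])          # Python ints -> torch.int64
floats = torch.tensor([1.0, 2.0, 3.0])  # Python floats -> torch.float32 by default

converted = ints.to(torch.float32)      # returns a new tensor; `ints` is unchanged

print(ints.dtype, floats.dtype, converted.dtype)
```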


## **The Art of Shaping Data**

Tensor operations are the core skill for a code-first deep learning practitioner. Before two tensors can be combined, their shapes usually need to be compatible, and this is where tensor manipulation comes in: you can slice, reshape, permute, and broadcast tensors to make them fit together. For example, a tensor of shape (2, 3) can be added to one of shape (3,) because broadcasting treats the second as if it had shape (1, 3) and repeats it across both rows. We start with indexing and slicing, which select parts of a tensor.
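
As a quick illustration of the (2, 3) and (3,) case, this sketch adds the two tensors both implicitly and with the extra dimension spelled out. The names `grid` and `row` are made up for the example.

```python
import torch

grid = torch.tensor([[1, 2, 3],
                     [4, 5, 6]])        # shape (2, 3)
row = torch.tensor([10, 20, 30])        # shape (3,)

# Broadcasting stretches `row` across both rows of `grid`.
implicit = grid + row

# Making the extra dimension explicit gives the same result.
explicit = grid + row.reshape(1, 3)

print(implicit)                          # values: [[11, 22, 33], [14, 25, 36]]
print(torch.equal(implicit, explicit))   # True
```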

In [7]:
# Create a 4x4 tensor
matrix = torch.arange(16).reshape(4, 4)
print("Original matrix:\n", matrix)

Original matrix:
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]])


In [8]:
# Slice the first row
first_row = matrix[0, :]
print("First row:\n", first_row)

First row:
 tensor([0, 1, 2, 3])


In [9]:
# Slice the first column
first_column = matrix[:, 0]
print("First column:\n", first_column)

First column:
 tensor([ 0,  4,  8, 12])


In [10]:
# Slice the matrix to get the first two rows
sliced_matrix = matrix[:2, :]
print("Sliced matrix (first two rows):\n", sliced_matrix)

Sliced matrix (first two rows):
 tensor([[0, 1, 2, 3],
        [4, 5, 6, 7]])


In [11]:
# Slice the matrix to get the first two columns
sliced_matrix_columns = matrix[:, :2]
print("Sliced matrix (first two columns):\n", sliced_matrix_columns)

Sliced matrix (first two columns):
 tensor([[ 0,  1],
        [ 4,  5],
        [ 8,  9],
        [12, 13]])
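
Slices like the ones above are views into the original tensor rather than copies, so writing to a slice also changes the source. A minimal sketch of that behaviour, using a separate tensor named `demo_matrix` (a hypothetical name) so the notebook's `matrix` is left alone:

```python
import torch

demo_matrix = torch.arange(16).reshape(4, 4)
row_view = demo_matrix[0, :]        # a view, not a copy

row_view[0] = 99                    # writes through to `demo_matrix`
print(demo_matrix[0, 0])            # tensor(99)

independent = demo_matrix[0, :].clone()   # clone() makes a real copy
independent[1] = -1
print(demo_matrix[0, 1])            # tensor(1): the original is untouched
```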


## **Reshaping with View**

The `view()` method reshapes a tensor without copying its data: the result shares memory with the original, so the total number of elements must stay the same. For example, a tensor of shape `(4, 4)` can be viewed as `(2, 8)`.

The `-1` argument in `view()` is a special placeholder that tells PyTorch to infer the size of that dimension based on the total number of elements and the other specified dimensions. For example, if you have a tensor with 16 elements and you want to reshape it to have 4 rows, you can use `view(4, -1)`, and PyTorch will automatically calculate that the number of columns should be 4 (since 16 / 4 = 4). This makes it easy to reshape tensors without having to manually calculate the new dimensions.

In [12]:
# Create a tensor with initial shape
original_tensor = torch.arange(12)
print("Original tensor:", original_tensor)
print("Original shape:", original_tensor.shape)

Original tensor: tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Original shape: torch.Size([12])


In [13]:
# Reshape using view with -1 trick
reshaped_tensor = original_tensor.view(4, -1)
print("Reshaped tensor:\n", reshaped_tensor)
print("Reshaped shape:", reshaped_tensor.shape)

Reshaped tensor:
 tensor([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]])
Reshaped shape: torch.Size([4, 3])


In [14]:
# Reshape the tensor to (3, 4)
reshaped_tensor = original_tensor.view(3, 4)
print("Reshaped tensor (3, 4):\n", reshaped_tensor)

Reshaped tensor (3, 4):
 tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]])
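
`view()` only works when the requested shape can be expressed over the tensor's existing memory layout. After a transpose that is not always possible, and `reshape()` (which copies when it has to) is the usual alternative. A small sketch of the difference:

```python
import torch

t = torch.arange(12).view(3, 4)
transposed = t.t()                    # shape (4, 3), same storage, non-contiguous

print(transposed.is_contiguous())     # False

try:
    transposed.view(12)               # view() cannot express this layout
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = transposed.reshape(12)         # reshape() copies when it has to
same = transposed.contiguous().view(12)
print(torch.equal(flat, same))        # True
```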


## **Adding and Removing Dimensions**

You can add dimensions of size 1 to a tensor with `unsqueeze()` and remove them with `squeeze()`. This is often needed to add a batch dimension or to line shapes up for broadcasting.

- `unsqueeze(dim)`: Adds a dimension of size 1 at the specified position `dim`. For example, a tensor of shape (3,) becomes (1, 3) after `unsqueeze(0)` and (3, 1) after `unsqueeze(1)`.
- `squeeze(dim)`: Removes the dimension at position `dim` if it has size 1. For example, a tensor of shape (1, 3) becomes (3,) after `squeeze(0)`, and a tensor of shape (3, 1) becomes (3,) after `squeeze(1)`. If the dimension at `dim` does not have size 1, the tensor is returned unchanged.

In [15]:
# Create a tensor
tensor = torch.tensor([1, 2, 3, 4, 5])
print("Original tensor:", tensor)
print("Original shape:", tensor.shape)

Original tensor: tensor([1, 2, 3, 4, 5])
Original shape: torch.Size([5])


In [16]:
# Add a dimension at position 0 (adds batch dimension)
unsqueezed_tensor = tensor.unsqueeze(0)
print("Tensor after unsqueeze(0):\n", unsqueezed_tensor)
print("Shape after unsqueeze(0):", unsqueezed_tensor.shape)

Tensor after unsqueeze(0):
 tensor([[1, 2, 3, 4, 5]])
Shape after unsqueeze(0): torch.Size([1, 5])


In [17]:
# Remove the added dimension
squeezed_tensor = unsqueezed_tensor.squeeze(0)
print("Tensor after squeeze(0):\n", squeezed_tensor)
print("Shape after squeeze(0):", squeezed_tensor.shape)

Tensor after squeeze(0):
 tensor([1, 2, 3, 4, 5])
Shape after squeeze(0): torch.Size([5])
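
The cells above cover `unsqueeze(0)` and `squeeze(0)`. For completeness, this short sketch shows a few other cases: adding a trailing dimension, calling `squeeze()` with no argument, and squeezing a dimension whose size is not 1.

```python
import torch

v = torch.tensor([1, 2, 3])            # shape (3,)

column = v.unsqueeze(1)                # shape (3, 1)
batched = v.unsqueeze(0).unsqueeze(0)  # shape (1, 1, 3)

print(column.shape)                    # torch.Size([3, 1])
print(batched.squeeze().shape)         # torch.Size([3]): all size-1 dims removed
print(batched.squeeze(0).shape)        # torch.Size([1, 3]): only dim 0 removed
print(column.squeeze(0).shape)         # torch.Size([3, 1]): dim 0 has size 3, so nothing changes
```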


## **GPU vs. CPU Acceleration**

PyTorch can run tensor operations on both CPUs and GPUs. If you have a compatible GPU, moving your tensors to it with the `to()` method can make large computations much faster: `x.to('cuda')` targets an NVIDIA GPU, and `x.to('mps')` targets the GPU on an Apple Silicon Mac (M1 or later). PyTorch does not fall back to the CPU on its own; requesting a device that is not available raises an error. The usual pattern is therefore to check availability once, store the result in a `device` variable, and move tensors to that device, so the same code runs everywhere, only more slowly on a CPU.

`to(device)` returns the tensor on the specified device and leaves the original where it was. For example, `x.to('cuda')` gives a copy on the GPU, and `x.to('cpu')` brings it back to the CPU. The device types are:

- `CUDA` (NVIDIA GPUs on Linux/Windows)
- `MPS` (Apple Silicon GPUs on macOS)
- `CPU` (fallback for all platforms)
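
Two practical points follow from this: `to()` does not modify a tensor in place, and tensors generally need to be on the same device before they can be combined. A minimal sketch, which runs on any machine because it falls back to the CPU when no GPU is found:

```python
import torch

# Pick a device: CUDA, then MPS, then CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif getattr(torch.backends, "mps", None) is not None and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.ones(2, 2)          # created on the CPU by default
x_dev = x.to(device)          # returns a tensor on `device`; `x` itself stays on the CPU

print(x.device, "->", x_dev.device)

# Operands must live on the same device, so create or move both there first.
y_dev = torch.full((2, 2), 2.0, device=device)
result = (x_dev + y_dev).cpu()   # bring the result back, e.g. for NumPy or plotting
print(result)
```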

In the next cell, we'll:

- Detect the best available device (CUDA for NVIDIA GPUs, MPS for Apple Silicon, or CPU as a fallback).
- Create two large tensors and multiply them on the CPU, timing the operation.
- Move them to the GPU (if available), repeat the multiplication, and time it again.
- Compare the two timings to show the speedup a GPU can provide.

In [18]:
import time

# Detect the best available device: CUDA -> MPS (Apple Silicon) -> CPU
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using CUDA (GPU) for computations.")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using MPS (Apple Silicon GPU) for computations.")
else:
    device = torch.device("cpu")
    print("Using CPU for computations.")


size = 2000
a_cpu = torch.rand(size, size)
b_cpu = torch.rand(size, size)

# Time CPU computation
start_time = time.time()
result_cpu = torch.mm(a_cpu, b_cpu)
cpu_time = time.time() - start_time
print(f"CPU computation time: {cpu_time:.4f} seconds")


# Time GPU computation (if available)
if device.type != 'cpu':
    a_gpu = a_cpu.to(device)
    b_gpu = b_cpu.to(device)
    
    # Warm up (first operation can be slower due to initialization)
    _ = torch.mm(a_gpu, b_gpu)
    
    start_time = time.time()
    result_gpu = torch.mm(a_gpu, b_gpu)
    gpu_time = time.time() - start_time
    print(f"{device.type.upper()} computation time: {gpu_time:.4f} seconds")
    print(f"Speedup: {cpu_time / gpu_time:.2f}x faster than CPU")
else:
    print("GPU not available, skipping GPU computation.")

Using MPS (Apple Silicon GPU) for computations.
CPU computation time: 0.0239 seconds
MPS computation time: 0.0003 seconds
Speedup: 70.69x faster than CPU
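
A caveat when reading timings like these: GPU work in PyTorch is generally queued asynchronously, so a timer stopped right after `torch.mm` returns may capture little more than the launch overhead. Below is a minimal sketch of a timing helper that waits for the device before reading the clock. The helper names `sync` and `time_matmul` are hypothetical, and the MPS branch assumes a PyTorch version that provides `torch.mps.synchronize()`.

```python
import time
import torch

def sync(device):
    """Block until queued work on `device` has finished (no-op on CPU)."""
    if device.type == "cuda":
        torch.cuda.synchronize()
    elif device.type == "mps":
        torch.mps.synchronize()

def time_matmul(a, b, repeats=5):
    """Average wall-clock seconds for torch.mm(a, b), including device sync."""
    torch.mm(a, b)               # warm-up run, not timed
    sync(a.device)
    start = time.perf_counter()
    for _ in range(repeats):
        torch.mm(a, b)
    sync(a.device)               # wait for the device before stopping the clock
    return (time.perf_counter() - start) / repeats

# Small demo on the CPU; pass tensors on another device to time that device instead.
a = torch.rand(256, 256)
b = torch.rand(256, 256)
elapsed = time_matmul(a, b)
print(f"cpu: {elapsed:.6f} s per matmul")
```

Averaging over several repeats also smooths out run-to-run noise, which matters for operations this short.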
