# PyTorch Deep Dive: Introduction and Tensors

Welcome to the absolute foundation of Deep Learning. 

Before we write code, we need to define the **Objects** we are working with.

## Learning Objectives
- **The Vocabulary**: What are "Scalars", "Vectors", "Matrices", and "Tensors"?
- **The Intuition**: Tensors as "Excel Sheets on Steroids".
- **The Hardware**: CPU vs GPU (Minivan vs Sports Car).
- **The Mechanics**: Shapes, Dtypes, and Broadcasting.
- **The Deep Dive**: Strides and Memory Layout (What actually happens in RAM).


In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt

print("PyTorch version:", torch.__version__)

## Part 1: The Vocabulary (Definitions First)

In Data Science, we have specific names for containers of numbers based on their dimensions.

### 1. Scalar (0D)
- A single number.
- Example: `7`, `3.14`.
- Use case: A loss value, a learning rate.

### 2. Vector (1D)
- A list of numbers.
- Example: `[1, 2, 3]`.
- Use case: A row of data, a bias term.

### 3. Matrix (2D)
- A grid of numbers (Rows and Columns).
- Example: An Excel sheet, a grayscale image.
- Use case: A dataset, weights of a linear layer.

### 4. Tensor (ND)
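- The general term for an N-dimensional array. Scalars, vectors, and matrices are the 0D, 1D, and 2D cases; in practice "tensor" usually refers to 3D and up.
- Example: A color image (Height x Width x 3 Channels), a video (Time x H x W x C). Note that PyTorch image code usually puts channels first (C x H x W).
- Use case: Batches of images, video clips, and other data with more than two axes.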

## Part 2: The Intuition (Excel on Steroids)

Now that we know the terms, let's visualize them.

- **2D Tensor**: Imagine a single Excel sheet.
- **3D Tensor**: Imagine a **Workbook** containing multiple Excel sheets.
- **4D Tensor**: Imagine a **Filing Cabinet** containing multiple Workbooks.
- **5D Tensor**: Imagine a **Library** containing multiple Filing Cabinets.

PyTorch Tensors are just these containers, but optimized for math.

In [None]:
# 0D Tensor (Scalar)
scalar = torch.tensor(7)
print(f"Scalar: {scalar.item()} (Shape: {scalar.shape})")

# 1D Tensor (Vector)
vector = torch.tensor([7, 2, 5])
print(f"Vector: {vector} (Shape: {vector.shape})")

# 2D Tensor (Matrix)
matrix = torch.tensor([[1, 2], [3, 4]])
print(f"Matrix:\n{matrix} (Shape: {matrix.shape})")

# 3D Tensor (Image-like)
tensor3d = torch.rand(3, 4, 4) # 3 Channels, 4x4 pixels
print(f"3D Tensor Shape: {tensor3d.shape}")
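
The learning objectives also mention dtypes. Every tensor stores a single element type, and PyTorch infers it from the values you pass in. The following is a minimal sketch of that inference, assuming the default float dtype has not been changed from `float32`; the variable names are only illustrative.

```python
import torch

# Integer literals are inferred as 64-bit integers.
ints = torch.tensor([7, 2, 5])
# Float literals are inferred as the default float dtype (float32 unless changed).
floats = torch.tensor([7.0, 2.0, 5.0])

print(ints.dtype)    # torch.int64
print(floats.dtype)  # torch.float32

# Convert an existing tensor to another dtype with .to(...).
as_float = ints.to(torch.float32)
print(as_float.dtype)  # torch.float32

# Mixing an integer tensor with a float tensor promotes the result to float.
mixed = ints + floats
print(mixed.dtype)   # torch.float32

# ndim is the number of dimensions, i.e. len(shape).
batch = torch.zeros(8, 3, 4, 4)  # e.g. 8 images, 3 channels, 4x4 pixels
print(batch.ndim, batch.shape)
```

Dtype matters in practice because many operations (and most neural network layers) expect floating-point inputs, so integer data often needs an explicit conversion first.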

## Part 3: The Hardware (Minivan vs Sports Car)

Why do we use PyTorch Tensors instead of Python lists or NumPy arrays?

**The GPU.**

- **CPU (Central Processing Unit)**: Like a **Minivan**. It has a handful of powerful cores, so it carries a few things (instructions) at a time, quickly and flexibly. Good for sequential logic and branching.
- **GPU (Graphics Processing Unit)**: Like a **Train** or a **Fleet of Sports Cars**. It has thousands of simpler cores, so it can move MASSIVE amounts of data at once, but it's hard to turn (branch-heavy code runs poorly). Good for parallel math (matrix multiplication).

PyTorch Tensors can live on the GPU. NumPy arrays cannot.

In [None]:
# Check if GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
if device.type == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Move tensor to GPU
x = torch.tensor([1, 2, 3])
x_gpu = x.to(device)

print(f"x is on: {x.device}")
print(f"x_gpu is on: {x_gpu.device}")

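# Note: Tensors on different devices cannot be combined. They live in different worlds!
# If x_gpu is actually on the GPU, the line below raises a RuntimeError;
# move one of the tensors with .to(device) first.
# x + x_gpu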
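
Since NumPy arrays stay on the CPU, a common pattern is to wrap an array as a tensor, move it to the device for the math, and bring the result back. The sketch below illustrates that round trip; it falls back to the CPU when no GPU is present, and the variable names are illustrative.

```python
import numpy as np
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

arr = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# from_numpy wraps the NumPy buffer without copying (CPU only).
t_cpu = torch.from_numpy(arr)
arr[0] = 10.0
print(t_cpu)  # the change made through arr is visible in t_cpu

# .to(device) copies to the GPU when one is present;
# on a CPU-only machine it simply returns the same tensor.
t_dev = t_cpu.to(device)
result = t_dev * 2

# NumPy only understands CPU memory, so bring the result back first.
back = result.cpu().numpy()
print(back)
```

Calling `.numpy()` directly on a GPU tensor raises an error, which is why the `.cpu()` step comes first.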

## Part 4: The Mechanics (Broadcasting)

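Broadcasting feels like magic. It lets you do element-wise math on tensors of different shapes, as long as the shapes are compatible: comparing dimensions from the right, each pair must be equal or one of them must be 1 (missing dimensions count as 1).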

Imagine you have a matrix of students' scores (Rows=Students, Cols=Subjects).
You want to add 5 bonus points to **every** score.

You don't need to create a matrix of 5s. You just add `5`.
PyTorch "broadcasts" (stretches) the `5` to match the matrix shape, without actually copying it in memory.

In [None]:
matrix = torch.tensor([[10, 20, 30], 
                       [40, 50, 60]])

# Add scalar (Broadcasting 5 to match 2x3)
print("Matrix + 5:\n", matrix + 5)

# Add vector (Broadcasting [1, 2, 3] to every row)
vector = torch.tensor([1, 2, 3])
print("Matrix + Vector:\n", matrix + vector)

# Visualizing what happened:
# [10, 20, 30] + [1, 2, 3] = [11, 22, 33]
# [40, 50, 60] + [1, 2, 3] = [41, 52, 63]
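
Broadcasting works when, reading shapes from the right, each pair of dimensions is equal or one of them is 1. The sketch below shows one more compatible case (a column vector) and one incompatible case. It assumes a PyTorch version that provides `torch.broadcast_shapes`; `row_bonus` and `raised` are illustrative names.

```python
import torch

matrix = torch.tensor([[10, 20, 30],
                       [40, 50, 60]])      # shape (2, 3)

# A (2, 1) column lines up with (2, 3): the size-1 dimension is stretched.
row_bonus = torch.tensor([[1], [2]])       # shape (2, 1)
print(matrix + row_bonus)

# torch.broadcast_shapes reports the result shape without doing any math.
print(torch.broadcast_shapes((2, 3), (2, 1)))
print(torch.broadcast_shapes((2, 1), (3,)))

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails.
raised = False
try:
    matrix + torch.tensor([1, 2])
except RuntimeError as err:
    raised = True
    print("Broadcast error:", err)
```

Here the column adds 1 to every score in the first row and 2 to every score in the second row.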

## Part 5: The Deep Dive (Strides & Memory)

Here is the secret: **A Tensor is just a 1D array in memory.**

The "shape" (rows, columns) is an illusion created by **Strides**.

- **Stride**: How many elements in memory do I need to skip to reach the next element along this dimension? (PyTorch counts strides in elements, not bytes.)

Example: Matrix 2x3
`[[1, 2, 3], [4, 5, 6]]`

In memory (RAM): `[1, 2, 3, 4, 5, 6]`

To go from `1` to `2` (next column), jump 1 step.
To go from `1` to `4` (next row), jump 3 steps.

Stride = `(3, 1)`.
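
The jump arithmetic above can be sketched in plain Python: the position of an element in flat memory is the sum of each index multiplied by its stride. The `element` helper below is a hypothetical illustration of that idea, not PyTorch's actual implementation.

```python
# Flat storage for the 2x3 matrix [[1, 2, 3], [4, 5, 6]].
storage = [1, 2, 3, 4, 5, 6]

def element(storage, strides, index):
    """Look up an element by summing index * stride over each dimension."""
    offset = sum(i * s for i, s in zip(index, strides))
    return storage[offset]

strides = (3, 1)
print(element(storage, strides, (0, 1)))  # 2
print(element(storage, strides, (1, 0)))  # 4
print(element(storage, strides, (1, 2)))  # 6

# Reading the same storage with the strides swapped yields the transpose.
transposed = [[element(storage, (1, 3), (r, c)) for c in range(2)]
              for r in range(3)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```

Only the strides changed in the last step; the `storage` list was never touched.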

In [None]:
t = torch.tensor([[1, 2, 3], 
                  [4, 5, 6]])

print(f"Shape: {t.shape}")
print(f"Stride: {t.stride()}")

# Transpose it (swap rows/cols)
t_T = t.t()
print(f"\nTransposed Shape: {t_T.shape}")
print(f"Transposed Stride: {t_T.stride()}")

# Notice: The DATA in memory didn't move! PyTorch just swapped the strides.
# This makes transposing instant, even for huge matrices.
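
Because a transpose only swaps metadata, the original and the transposed tensor share the same memory. The following sketch checks that, and shows how `.contiguous()` makes a real row-by-row copy when one is needed.

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
t_T = t.t()

# Both tensors start at the same memory address.
print(t.data_ptr() == t_T.data_ptr())  # True

# Writing through one is visible through the other.
t_T[0, 1] = 40
print(t[1, 0].item())  # 40

# The transposed view is no longer laid out row by row.
print(t.is_contiguous(), t_T.is_contiguous())  # True False

# .contiguous() copies the data into a fresh row-major layout.
t_T_copy = t_T.contiguous()
print(t_T_copy.stride())                    # (2, 1)
print(t_T_copy.data_ptr() == t.data_ptr())  # False
```

The shared memory is what makes transposing cheap, but it also means an in-place edit to a view changes the original tensor.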

## Summary Checklist

1. **Tensor** = N-dimensional container for numbers.
2. **GPU** = Parallel processing beast (use `.to(device)`).
3. **Broadcasting** = Automatic stretching of size-1 (or missing) dimensions so shapes match for math.
4. **Strides** = The metadata that maps a shape onto linear memory.

You now understand the data structure. Next, we'll learn how to calculate gradients (Autograd).