# 📘 What Are Tensors? (Foundations for Deep Learning)

A **tensor** is a **specialized multi-dimensional array** designed for **mathematical and computational efficiency**.   

Tensors are the core data structure used in **PyTorch**, **TensorFlow**, and other modern deep learning frameworks.

---

## 🔢 Tensor Ranks (Dimensions)

The **rank** of a tensor refers to the **number of dimensions** it has.

---

### 1️⃣ Scalars (0D Tensor)

- A **single number**
- No dimensions

**Example use-case**
- Loss value after a forward pass

```text
5.0
-3.14
```

📌 Used to represent:

* Loss
* Accuracy
* Single metrics

---

### 2️⃣ Vectors (1D Tensor)

* A list or sequence of numbers

**Example use-case**

* Feature vectors
* Word embeddings in NLP

```text
[0.12, -0.84, 0.33]
```

📌 In NLP:

* Each word $\rightarrow$ vector
* Shape: `[embedding_dim]`

---

### 3️⃣ Matrices (2D Tensor)

* Grid of numbers (rows × columns)

**Example use-case**

* Grayscale images
* Tabular data

```text
[[0, 255, 128],
 [34,  90, 180]]
```

📌 Each value represents:

* Pixel intensity (for images)
* Feature value (for tables)

---

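### 4️⃣ 3D Tensors (Color Images)

* Adds a **channel dimension** (the shape below is channels-last; PyTorch image layers such as `nn.Conv2d` expect channels-first, `[channels, height, width]`)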

**Example use-case**

* RGB images

```text
Shape: [height, width, channels]
Example: [256, 256, 3]
```

📌 Channels:

* R → Red
* G → Green
* B → Blue

---

### 5️⃣ 4D Tensors (Batches of Images)

* Adds a **batch size** dimension (in PyTorch's channels-first convention this becomes `[batch_size, channels, height, width]`)

**Example use-case**

* Training multiple images at once

```text
Shape: [batch_size, height, width, channels]
Example: [32, 128, 128, 3]
```

📌 Batch dimension improves:

* GPU utilization
* Training efficiency

---

### 6️⃣ 5D Tensors (Video Data)

* Adds a **time/frame dimension**

**Example use-case**

* Video clips

```text
Shape: [batch, frames, height, width, channels]
Example: [10, 16, 64, 64, 3]
```

📌 Each frame is an RGB image.
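
A quick way to confirm the ranks described above is to build one tensor per rank and inspect `.ndim` and `.shape`. The sketch below is illustrative only: the dictionary keys are made-up names, and the image, batch, and video tensors are zero-filled placeholders that reuse the channels-last example shapes from this section.

```python
import torch

# One placeholder tensor per rank; names and zero-filled values are illustrative.
examples = {
    "scalar": torch.tensor(5.0),
    "vector": torch.tensor([0.12, -0.84, 0.33]),
    "matrix": torch.tensor([[0, 255, 128], [34, 90, 180]]),
    "rgb_image": torch.zeros(256, 256, 3),
    "image_batch": torch.zeros(32, 128, 128, 3),
    "video_batch": torch.zeros(10, 16, 64, 64, 3),
}

for name, t in examples.items():
    # .ndim is the rank; .shape lists the size of each dimension.
    print(f"{name:12s} rank={t.ndim} shape={tuple(t.shape)}")
```

The rank always equals the length of the shape, which is why a scalar reports an empty shape `()`.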

---

## 🧠 Why Are Tensors Useful?

---

### 1️⃣ Mathematical Operations

Tensors enable:

* Addition
* Multiplication
* Dot products
* Matrix multiplication

These operations form the backbone of:

* Neural networks
* Backpropagation
* Optimization

---

### 2️⃣ Representation of Real-World Data

| Data Type | Tensor Representation |
| --------- | --------------------- |
| Image     | 3D / 4D tensor        |
| Text      | 2D / 3D tensor        |
| Audio     | 1D / 2D tensor        |
| Video     | 5D tensor             |

📌 Everything becomes **numbers + shape**.

---

### 3️⃣ Efficient Computation

Tensors are:

* Optimized for **parallel computation**
* Executed efficiently on **GPUs & TPUs**

This makes large-scale deep learning feasible.

---

## 🧠 Where Are Tensors Used in Deep Learning?

---

### 1️⃣ Data Storage

* Holds training inputs (e.g., batches of images).

---

### 2️⃣ Model Parameters

* Stores learnable weights ($W$) and biases ($b$).

---

### 3️⃣ Matrix Operations

* Linear layers
* Attention mechanisms
* Convolutions

---

### 4️⃣ Training Process

* Forward pass → Data flows through the network's layers as tensors.

* Backward pass → Gradients are calculated and stored as tensors to update the model.
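
The forward and backward passes described above can be sketched with a single linear layer written by hand. This is a minimal illustration, not a training recipe: the sizes (4 samples, 3 features), the random data, and the names `inputs`, `targets`, `W`, and `b` are all assumptions made for the example.

```python
import torch

torch.manual_seed(0)

# Illustrative data: a batch of 4 samples with 3 features each.
inputs = torch.rand(4, 3)
targets = torch.rand(4, 1)

# Model parameters are tensors that track gradients.
W = torch.rand(3, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# Forward pass: tensors flow through the layer.
predictions = inputs @ W + b

# The loss is a 0D tensor (mean squared error here).
loss = ((predictions - targets) ** 2).mean()

# Backward pass: gradients are computed and stored as tensors on W and b.
loss.backward()

print("loss rank:", loss.ndim)
print("W.grad shape:", tuple(W.grad.shape))
print("b.grad shape:", tuple(b.grad.shape))
```

Each gradient has the same shape as the parameter it belongs to, which is what lets an optimizer update the parameter element by element.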

---

## 🔑 Core Mental Model

> **Everything in deep learning is a tensor.**

> Data → Tensor

> Model → Tensors

> Gradients → Tensors

If you understand tensors, you have the foundation for understanding deep learning.

---

## ✅ One-Line Summary

> **Tensor = numbers + shape + meaning**


In [2]:
# 1. Import the PyTorch library
# PyTorch is one of the most widely used deep learning frameworks, especially in research.
import torch

# 2. Check the Software Version
# It is crucial to check this to ensure compatibility with other libraries
# (like torchvision or torchaudio) and your specific CUDA version.
print(torch.__version__)

# 3. Check for Hardware Acceleration (CUDA)
# torch.cuda.is_available() returns True if:
#   a) You have an NVIDIA GPU.
#   b) The NVIDIA drivers are installed correctly.
#   c) Your PyTorch build includes CUDA support (the official CUDA wheels bundle their own CUDA runtime).
if torch.cuda.is_available():
    print("GPU is available!")

    # 4. Get the Hardware Name
    # .get_device_name(0) returns the specific name of your graphics card (e.g., "NVIDIA GeForce RTX 3060").
    # The index (0) refers to the FIRST GPU found. If you had two GPUs, the second would be index (1).
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    # 5. Fallback
    # If False, PyTorch will run on the CPU (much slower for training neural networks).
    print("GPU not available. Using CPU.")

2.9.0+cu126
GPU is available!
Using GPU: Tesla T4


Key Concepts:

1. CUDA (Compute Unified Device Architecture): This is NVIDIA's parallel computing platform. Deep Learning relies heavily on matrix multiplication, which GPUs (with thousands of tiny cores) can do much faster than CPUs (which have fewer, stronger cores).

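2. The Index (0): In multi-GPU setups (common in production or cloud training), you have to specify which GPU you want to query or use. Index 0 is the default device unless you change it (for example with `torch.cuda.set_device` or the `CUDA_VISIBLE_DEVICES` environment variable).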
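
A common way to use the availability check is to choose a device once and create tensors on it. The snippet below is a small sketch of that pattern; the variable names `device` and `x` are illustrative, and it falls back to the CPU when no GPU is usable.

```python
import torch

# Pick the GPU when one is usable, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor directly on the chosen device.
x = torch.ones(2, 3, device=device)
print(x.device)
```

Writing code against a `device` variable keeps the same script runnable on machines with and without a GPU.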

# Creating a Tensor

In [3]:
# 1. Using empty
# Creates a tensor of size 2x3 without initializing data.
# The values will be whatever "garbage" memory was already at that address.
# It is very fast because it skips the step of writing zeros or ones.
a = torch.empty(2,3)

In [4]:
# 2. Check type
# Returns the Python type of the object, which is <class 'torch.Tensor'>.
# To check the data type OF the numbers inside (e.g., float32), use a.dtype.
type(a)

torch.Tensor

In [5]:
# 3. Using zeros
# Creates a 2x3 tensor filled entirely with 0s.
# Commonly used to initialize bias terms, masks, and accumulators.
torch.zeros(2,3)

tensor([[0., 0., 0.],
        [0., 0., 0.]])

In [6]:
# 4. Using ones
# Creates a 2x3 tensor filled entirely with 1s.
torch.ones(2,3)

tensor([[1., 1., 1.],
        [1., 1., 1.]])

In [7]:
# 5. Using rand
# Creates a 2x3 tensor with random numbers from a Uniform Distribution
# between 0 and 1 (interval [0, 1)).
torch.rand(2,3)

tensor([[0.0543, 0.7557, 0.1464],
        [0.7087, 0.2508, 0.6681]])

In [8]:
# 6. Use of seed (Demonstration of non-reproducibility)
# Calling rand again WITHOUT resetting the seed will generate DIFFERENT numbers.
torch.rand(2,3)

tensor([[0.1168, 0.3304, 0.5598],
        [0.8645, 0.0376, 0.9821]])

In [9]:
# 7. manual_seed
# Sets the "seed" for the random number generator.
# A seed is a starting point; if you start from the same point, the sequence of
# "random" numbers is identical.
torch.manual_seed(100)
print(torch.rand(2,3)) # Prints Sequence A

tensor([[0.1117, 0.8158, 0.2626],
        [0.4839, 0.6765, 0.7539]])


In [10]:
# Resetting the seed to the SAME value (100) ensures the next call produces
# exactly Sequence A again. This is crucial for debugging ML models.
torch.manual_seed(100)
print(torch.rand(2,3)) # Prints Sequence A again

tensor([[0.1117, 0.8158, 0.2626],
        [0.4839, 0.6765, 0.7539]])


In [11]:
torch.manual_seed(100)
torch.rand(2,3)

tensor([[0.1117, 0.8158, 0.2626],
        [0.4839, 0.6765, 0.7539]])

In [12]:
# 8. Using tensor
# Creates a tensor from a standard Python list.
# PyTorch infers the data type (usually int64 or float32) based on the input.
torch.tensor([[1,2,3],[4,5,6]])

tensor([[1, 2, 3],
        [4, 5, 6]])

 ---
 ## Other ways to create tensors
 ---

In [13]:
# 9. arange (Array Range)
# Creates a 1D tensor starting at start (0), up to but NOT including end (10),
# stepping by step (2).
# Output: [0, 2, 4, 6, 8]
print("using arange ->", torch.arange(0,10,2))

using arange -> tensor([0, 2, 4, 6, 8])


In [14]:
# 10. linspace (Linear Space)
# Creates a 1D tensor starting at start (0) and ending EXACTLY at end (10).
# The third argument is the NUMBER OF STEPS (points) to generate, not the step size.
# Useful for creating graphs or time-series grids.
print("using linspace ->", torch.linspace(0,10,10))

using linspace -> tensor([ 0.0000,  1.1111,  2.2222,  3.3333,  4.4444,  5.5556,  6.6667,  7.7778,
         8.8889, 10.0000])


In [15]:
# 11. using eye (Identity Matrix)
# Creates a 2D square tensor (5x5) with 1s on the diagonal and 0s elsewhere.
# "Eye" sounds like "I" for Identity. Crucial in Linear Algebra.
print("using eye ->", torch.eye(5))

using eye -> tensor([[1., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 1.]])


In [16]:
# 12. using full
# Creates a tensor of specific shape (3x3) filled entirely with a specific value (5).
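# Same values as torch.ones(3,3) * 5, but the dtype follows the fill value:
# an integer fill gives int64, whereas torch.ones(3,3) * 5 gives float32.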
print("using full ->", torch.full((3, 3), 5))

using full -> tensor([[5, 5, 5],
        [5, 5, 5],
        [5, 5, 5]])



---

#### 1️⃣ `torch.empty()`

```python
torch.empty(2, 3)
```

##### What it does:

* Allocates memory for a tensor **without initializing values**
* Contents are **arbitrary leftover values** (whatever was already in that memory)

##### When to use:

* When performance matters and values will be overwritten
* Common in low-level or optimized code

⚠️ **Danger:** Never use `empty()` if you expect meaningful values!
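
One situation where `torch.empty()` is safe is when every element is overwritten before it is read. The sketch below assumes a pre-allocated output buffer filled through the `out=` argument of `torch.add`; the names `a`, `b`, and `buffer` are illustrative.

```python
import torch

a = torch.ones(2, 3)
b = torch.full((2, 3), 2.0)

# Allocate an uninitialized buffer; its contents are meaningless at this point.
buffer = torch.empty(2, 3)

# Write a + b into the buffer, overwriting every element.
torch.add(a, b, out=buffer)
print(buffer)
```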

---

#### 2️⃣ `torch.zeros()` and `torch.ones()`

```python
torch.zeros(2, 3)
torch.ones(2, 3)
```

##### What they do:

* Create tensors filled with `0`s or `1`s

##### When to use:

* Initializing biases, masks, and accumulators (all-zero weights are usually avoided because every unit would learn the same thing)
* Safe and predictable defaults

---

#### 3️⃣ `torch.rand()`

```python
torch.rand(2, 3)
```

##### What it does:

* Generates random numbers from **Uniform(0, 1)**

##### Use cases:

* Weight initialization
* Data augmentation
* Random sampling

---

#### 4️⃣ Reproducibility with `torch.manual_seed()`

```python
torch.manual_seed(100)
torch.rand(2, 3)
```

##### Why this matters:

* Same seed → same random numbers
* Critical for:

  * Debugging
  * Experiments
  * Research reproducibility

> Always set a seed when training models you want to compare.
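
The effect of reseeding can be checked directly with `torch.equal`. This is a small sketch; the names `first`, `second`, and `third` are illustrative.

```python
import torch

torch.manual_seed(100)
first = torch.rand(2, 3)

# Resetting the same seed restarts the random sequence.
torch.manual_seed(100)
second = torch.rand(2, 3)

# No reseed here, so this draw continues the sequence instead of restarting it.
third = torch.rand(2, 3)

print(torch.equal(first, second))
print(torch.equal(second, third))
```

The first comparison is `True` because both draws start from the same seed. The second draw after a seed comes from further along the sequence, so it differs.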

---

#### 5️⃣ `torch.tensor()`

```python
torch.tensor([[1, 2, 3],
              [4, 5, 6]])
```

##### What it does:

* Converts Python lists or NumPy arrays into tensors

##### Use cases:

* Loading small datasets
* Creating fixed reference tensors
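
Because `torch.tensor()` infers its dtype from the input, it helps to check what a few typical inputs produce. The sketch below assumes PyTorch's default settings (the default float dtype has not been changed); the variable names are illustrative.

```python
import torch

ints = torch.tensor([1, 2, 3])          # all Python ints
floats = torch.tensor([1.0, 2.0, 3.0])  # all Python floats
mixed = torch.tensor([1, 2.5])          # ints and floats mixed
flags = torch.tensor([True, False])     # Python bools

print(ints.dtype, floats.dtype, mixed.dtype, flags.dtype)
```

A single float in the list is enough to make the whole tensor floating point.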

---

#### 6️⃣ `torch.arange()`

```python
torch.arange(0, 10, 2)
```

##### Output:

```
tensor([0, 2, 4, 6, 8])
```

##### When to use:

* Indexing
* Loop counters
* Discrete sequences

---

#### 7️⃣ `torch.linspace()`

```python
torch.linspace(0, 10, 10)
```

##### Key difference from `arange()`:

* Specifies **number of points**, not step size

##### Common use:

* Plotting
* Continuous ranges
* Numerical methods
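
The difference between a step size and a point count is easiest to see side by side. In this sketch the names `by_step` and `by_count` are illustrative, and 6 points were chosen so that both calls step by 2.

```python
import torch

# arange: step size 2, end value 10 is EXCLUDED.
by_step = torch.arange(0, 10, 2)

# linspace: 6 evenly spaced points, end value 10 is INCLUDED.
by_count = torch.linspace(0, 10, 6)

print(by_step)
print(by_count)
```

Note that `linspace` returns floats even for integer endpoints, while `arange` with integer arguments returns integers.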

---

#### 8️⃣ `torch.eye()`

```python
torch.eye(5)
```

##### What it creates:

* Identity matrix

##### Why important:

* Linear algebra
* Neural network layers
* Matrix transformations

---

#### 9️⃣ `torch.full()`

```python
torch.full((3, 3), 5)
```

##### What it does:

* Creates a tensor with the same constant value everywhere

##### Use cases:

* Padding
* Masks
* Custom initialization
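
One detail worth checking with `torch.full()` is the dtype, which follows the fill value. The sketch below compares it with scaling a tensor of ones; the variable names are illustrative and default PyTorch settings are assumed.

```python
import torch

filled = torch.full((3, 3), 5)          # integer fill value
scaled = torch.ones(3, 3) * 5           # ones() defaults to float32
filled_float = torch.full((3, 3), 5.0)  # float fill value

print(filled.dtype, scaled.dtype, filled_float.dtype)
```

If a float tensor is wanted, passing `5.0` (or an explicit `dtype=`) avoids an accidental integer tensor.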

---

## 🧠 Mental Model

> **A tensor is just a box of numbers + shape + datatype + device**



# Tensor Shapes

In [17]:
# 1. Create a base tensor
# We create a 2D tensor (a matrix) with 2 rows and 3 columns.
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

In [18]:
# 2. Inspect the tensor
# Output will display the data:
# tensor([[1, 2, 3],
#         [4, 5, 6]])
x

tensor([[1, 2, 3],
        [4, 5, 6]])

In [19]:
# 3. Check the Shape
# Returns torch.Size([2, 3]).
# This tells us the dimensions: 2 rows (dimension 0) and 3 columns (dimension 1).
# In Deep Learning, checking .shape is the #1 way to debug errors.
x.shape

torch.Size([2, 3])

 ---
 ### The `"_like"` Functions
 ---

In [20]:
# These functions are shortcuts. Instead of manually specifying (2, 3),
# you just say "make me a new tensor with the same shape as x"

In [21]:
# 4. empty_like
# Creates a new 2x3 tensor with uninitialized data ("garbage values").
# It looks at 'x', sees it is 2x3, and allocates that much memory.
# Note: It inherits the dtype and device (CPU/GPU) of x, but the contents are uninitialized.
torch.empty_like(x)

tensor([[              0,      1102781216,      1074268576],
        [139242769659808,               0,               0]])

In [22]:
# 5. zeros_like
# Creates a new 2x3 tensor filled entirely with 0s.
# Highly useful for initializing "mask" tensors or accumulators.
torch.zeros_like(x)

tensor([[0, 0, 0],
        [0, 0, 0]])

In [23]:
# 6. ones_like
# Creates a new 2x3 tensor filled entirely with 1s.
torch.ones_like(x)

tensor([[1, 1, 1],
        [1, 1, 1]])

In [24]:
# 7. rand_like
# Creates a new 2x3 tensor filled with random numbers from 0 to 1.
# CRITICAL NOTE: 'rand' requires a floating-point dtype.
# Our original 'x' contains integers (1, 2, 3...), so its dtype is int64 (Long).
# rand_like copies the dtype of x by default, and uniform random floats cannot be
# generated into an integer tensor, so torch.rand_like(x) raises a RuntimeError.
# Passing dtype=torch.float32 overrides the inherited dtype.
torch.rand_like(x, dtype=torch.float32)

tensor([[0.2627, 0.0428, 0.2080],
        [0.1180, 0.1217, 0.7356]])

>The function `torch.rand` (uniform distribution) inherently generates floating-point numbers. If you try `torch.rand_like(x)` where `x` is an integer tensor, PyTorch hits a conflict: "You want a tensor like x (integers), but you want random values in [0, 1) (floats)." It raises an error rather than guessing. Explicitly passing `dtype=torch.float32` resolves the conflict.


Understanding tensor shapes and `_like()` functions is essential for:
- Weight initialization
- Mask creation
- Broadcasting
- Writing bug-free deep learning code

---

#### 1️⃣ Tensor Shape

```python
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
x.shape
```

##### Output

```text
torch.Size([2, 3])
```

##### Interpretation

* `2` → number of rows
* `3` → number of columns

> **Mental model:**
> Shape tells you **how the data is organized along each dimension**, not what the values mean.

---

#### 2️⃣ Why Shape Matters

Many PyTorch errors come from:

* Shape mismatch
* Unexpected broadcasting
* Wrong batch dimensions

Example:

```text
RuntimeError: size mismatch
```

If you understand `.shape`, you avoid a large share of these bugs.

---

#### 3️⃣ `_like()` Functions: The Core Idea

##### Key Principle

> `_like()` functions **copy the shape (and, by default, the dtype and device)** of an existing tensor.

In real deep learning code (like Transformers or CNNs), tensor shapes change dynamically based on the batch size or the length of a sentence. You often don't know the shape ahead of time.

1. Hard-coding: `torch.zeros(32, 10)` requires you to know the batch size is 32. If you change your batch size to 64 later, this line breaks.

2. Dynamic (Best Practice): `torch.zeros_like(input_batch)` automatically adapts. If `input_batch` has 64 rows, the zeros will have 64 rows. If it has 32, the zeros will have 32. This makes your code robust and reusable.

They answer:

> "Give me a tensor shaped exactly like this one."
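
The hard-coded versus dynamic contrast can be sketched with a small helper that never mentions a batch size. The function name `zero_buffer_for` and the feature size of 10 are hypothetical choices for this example.

```python
import torch


def zero_buffer_for(batch):
    """Return a zero tensor with the same shape, dtype, and device as `batch`."""
    return torch.zeros_like(batch)


for batch_size in (32, 64):
    # Illustrative input: `batch_size` rows, 10 features.
    input_batch = torch.rand(batch_size, 10)
    buffer = zero_buffer_for(input_batch)
    print(tuple(buffer.shape), buffer.dtype)
```

The helper works unchanged for both batch sizes because the shape is read from the input rather than written into the code.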

---

#### 4️⃣ `torch.empty_like()`

```python
torch.empty_like(x)
```

##### What it does:

* Same shape as `x`
* Values are **uninitialized**

##### When to use:

* Performance-critical code
* Values will be overwritten immediately

⚠️ Never assume values are zero!

---

#### 5️⃣ `torch.zeros_like()`

```python
torch.zeros_like(x)
```

##### What it does:

* Same shape as `x`
* All values = `0`

##### Common use cases:

* Initializing gradients
* Creating masks
* Resetting buffers

---

#### 6️⃣ `torch.ones_like()`

```python
torch.ones_like(x)
```

##### What it does:

* Same shape as `x`
* All values = `1`

##### Use cases:

* Bias initialization
* Scaling factors
* Boolean masks (after type conversion)

---

#### 7️⃣ `torch.rand_like()`

```python
torch.rand_like(x, dtype=torch.float32)
```

##### What it does:

* Same shape as `x`
* Random values from **Uniform(0, 1)**

##### Why specify `dtype`?

* Original tensor `x` is integer
* Random numbers must be floats
* Without it, `rand_like` inherits the integer dtype and raises an error
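
The dtype issue can be reproduced in a few lines. This sketch assumes a recent PyTorch release, where calling `torch.rand_like` on an integer tensor is expected to raise a `RuntimeError`; the flag `raised` and the name `r` are illustrative.

```python
import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])  # dtype is int64

try:
    # rand_like inherits int64 from x, which cannot hold uniform floats.
    torch.rand_like(x)
    raised = False
except RuntimeError as err:
    raised = True
    print("rand_like on an int64 tensor failed:", err)

# Overriding the dtype gives a float tensor with the same shape as x.
r = torch.rand_like(x, dtype=torch.float32)
print(r.shape, r.dtype)
```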

---

#### 8️⃣ Shape + dtype Summary Table

| Function     | Shape         | Values  | Initialized |
| ------------ | ------------- | ------- | ----------- |
| `empty_like` | Same as input | Garbage | ❌           |
| `zeros_like` | Same as input | All 0   | ✅           |
| `ones_like`  | Same as input | All 1   | ✅           |
| `rand_like`  | Same as input | Random  | ✅           |

---

##### 🧠 Core Mental Model

> `_like()` functions = **shape cloning tools**

Instead of manually remembering shapes:

```python
torch.zeros(2, 3)  # ❌ brittle
```

Do this:

```python
torch.zeros_like(x)  # ✅ safe and scalable
```

---

#### 🚀 Why This Matters in Deep Learning

* Neural network layers expect **exact shapes**
* `_like()` prevents:

  * Hard-coded dimensions
  * Shape mismatch bugs
  * Fragile refactors

If your code survives changing batch size → you did it right.




## Tensor Data Types

In [25]:
# Assume x is a tensor we created earlier
x = torch.tensor([1, 2, 3])

# 1. Find Data Type
# .dtype is an attribute (not a method, so no parentheses).
# It tells you how the numbers are stored in memory (e.g., torch.int64, torch.float32).
# PyTorch defaults to int64 (Long) for integers and float32 for decimals.
x.dtype

torch.int64

In [26]:
# 2. Assign Data Type during creation
# Here we pass a list of FLOATS [1.0, 2.0, 3.0] but force the type to INT32.
# Any fractional part would be truncated (these values have none), so precision can be lost.
# int32 uses half the memory of the default integer type (int64).
torch.tensor([1.0, 2.0, 3.0], dtype=torch.int32)

tensor([1, 2, 3], dtype=torch.int32)

In [27]:
# Here we pass a list of INTEGERS [1, 2, 3] but force the type to FLOAT64 (Double).
# This is useful when you need extreme numerical precision, though it is
# slower and takes up 2x memory compared to float32.
torch.tensor([1, 2, 3], dtype=torch.float64)

tensor([1., 2., 3.], dtype=torch.float64)

In [28]:
# 3. Using .to() for casting
# This is the most common way to convert an EXISTING tensor to a new type.
# It returns a NEW tensor with the requested type (or the same tensor if nothing needs to change).
# Note: .to() is also used to move tensors between CPU and GPU (e.g., .to("cuda")).
x = x.to(torch.float32)

### The `.to()` Method: The Swiss Army Knife
The .to() method is one of the most important commands in PyTorch because it handles two things simultaneously:

1. Type Conversion: x.to(torch.float32)
2. Device Movement: x.to('cuda')

You can even do both at once:

```python
# Move to GPU AND convert to float16 (half precision) in one step
x = x.to(device='cuda', dtype=torch.float16)
```
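
The type-conversion half of `.to()` can be tried on the CPU without a GPU. The sketch below uses illustrative names (`y`, `same`) and shows that the original tensor keeps its dtype, and that asking for a dtype the tensor already has returns the same object.

```python
import torch

x = torch.tensor([1, 2, 3])  # int64 by default

# Casting returns a new tensor; x itself is unchanged.
y = x.to(torch.float32)

# y is already float32 on the same device, so .to() hands back y itself.
same = y.to(torch.float32)

print(x.dtype, y.dtype, same is y)
```

This is why the result of `.to()` is normally assigned back to a variable, as in `x = x.to(torch.float32)`.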

| **Data Type**             | **Dtype**         | **Description**                                                                                                                                                                |
|---------------------------|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| **32-bit Floating Point** | `torch.float32`   | Standard floating-point type used for most deep learning tasks. Provides a balance between precision and memory usage.                                                         |
| **64-bit Floating Point** | `torch.float64`   | Double-precision floating point. Useful for high-precision numerical tasks but uses more memory.                                                                               |
| **16-bit Floating Point** | `torch.float16`   | Half-precision floating point. Commonly used in mixed-precision training to reduce memory and computational overhead on modern GPUs.                                            |
| **BFloat16**              | `torch.bfloat16`  | Brain floating-point format. Keeps the exponent range of `float32` but has fewer mantissa bits (less precision) than `float16`. Used in mixed-precision training on TPUs and recent GPUs. |
| **8-bit Floating Point**  | `torch.float8_e4m3fn`, `torch.float8_e5m2` | Ultra-low-precision floating point. PyTorch exposes specific variants rather than a single `torch.float8`, and operator support is limited, so it is mainly used in specialized low-precision training and inference. |
| **8-bit Integer**         | `torch.int8`      | 8-bit signed integer. Used for quantized models to save memory and computation in inference.                                                                                   |
| **16-bit Integer**        | `torch.int16`     | 16-bit signed integer. Useful for special numerical tasks requiring intermediate precision.                                                                                    |
| **32-bit Integer**        | `torch.int32`     | Signed 32-bit integer. A general-purpose integer type that uses half the memory of `int64`.                                                                                    |
| **64-bit Integer**        | `torch.int64`     | Long integer type and PyTorch's default for integers. Commonly used for indices and class labels.                                                                              |
| **8-bit Unsigned Integer**| `torch.uint8`     | 8-bit unsigned integer. Commonly used for image data (e.g., pixel values between 0 and 255).                                                                                    |
| **Boolean**               | `torch.bool`      | Boolean type, stores `True` or `False` values. Often used for masks in logical operations.                                                                                      |
| **Complex 64**            | `torch.complex64` | Complex number type with 32-bit real and 32-bit imaginary parts. Used for scientific and signal processing tasks.                                                               |
| **Complex 128**           | `torch.complex128`| Complex number type with 64-bit real and 64-bit imaginary parts. Offers higher precision but uses more memory.                                                                 |
| **Quantized Integer**     | `torch.qint8`     | Quantized signed 8-bit integer. Used in quantized models for efficient inference.                                                                                              |
| **Quantized Unsigned Integer** | `torch.quint8` | Quantized unsigned 8-bit integer. Often used for quantized tensors in image-related tasks.                                                                                     |
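
The memory cost behind the table can be measured with `Tensor.element_size()`, which reports bytes per element. This is a small sketch over a handful of the dtypes listed above; the dictionary name `bytes_per_element` is illustrative.

```python
import torch

dtypes = [
    torch.float64,
    torch.float32,
    torch.float16,
    torch.bfloat16,
    torch.int64,
    torch.uint8,
    torch.bool,
]

# Build a one-element tensor per dtype and ask how many bytes it uses.
bytes_per_element = {
    dtype: torch.zeros(1, dtype=dtype).element_size() for dtype in dtypes
}

for dtype, size in bytes_per_element.items():
    print(f"{str(dtype):16s} {size} byte(s) per element")
```

Multiplying the per-element size by `tensor.numel()` gives the storage needed for the values of a whole tensor.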


## Mathematical Operations (Element-wise)


PyTorch supports **vectorized mathematical operations**, meaning:
> The operation is applied to **every element** in the tensor automatically.

No loops required 🚀

### 1. Scalar operation

In [29]:
# 1. Create a random tensor
# Creates a 2x2 matrix with random floats between 0 and 1.
x = torch.rand(2,2)
x

tensor([[0.7118, 0.7876],
        [0.4183, 0.9014]])

In [30]:
# 2. Addition (Broadcasting)
# Adds 2 to EVERY element in the tensor.
# Internally, PyTorch "broadcasts" the scalar 2 to shape (2,2) and adds it.
# result[i][j] = x[i][j] + 2
x + 2

tensor([[2.7118, 2.7876],
        [2.4183, 2.9014]])

In [31]:
# 3. Subtraction
# Subtracts 2 from every element.
x - 2

tensor([[-1.2882, -1.2124],
        [-1.5817, -1.0986]])

In [32]:
# 4. Scalar Multiplication
# Multiplies every element by 3.
# Note: This is NOT matrix multiplication. It simply scales the values.
x * 3

tensor([[2.1353, 2.3627],
        [1.2549, 2.7042]])

In [33]:
# 5. Division
# Divides every element by 3. Result is a float tensor.
x / 3

tensor([[0.2373, 0.2625],
        [0.1394, 0.3005]])

In [34]:
# 6. Floor (Integer) Division
# First scales by 100, then divides by 3 and rounds DOWN to the nearest whole number.
# Example: 5.9 // 3 = 1.0 (1.9667 rounded down). For negative results, "down" means toward -infinity.
(x * 100) // 3

tensor([[23., 26.],
        [13., 30.]])
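
The difference between flooring and truncating only shows up with negative numbers. The sketch below assumes a recent PyTorch release (1.13 or later), where `//` on tensors performs true floor division; the names `values`, `floored`, and `truncated` are illustrative.

```python
import torch

values = torch.tensor([-5.9, 5.9])

# Floor division rounds toward -infinity.
floored = values // 3

# Truncation rounds toward zero instead.
truncated = torch.div(values, 3, rounding_mode="trunc")

print(floored)
print(truncated)
```

For the positive input both results agree (1.0), but for the negative input flooring gives -2.0 while truncation gives -1.0.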

In [35]:
# 7. Modulo (Remainder)
# Returns the remainder after division.
# Useful for checking parity (even/odd) or cycling indices.
((x * 100) // 3) % 2

tensor([[1., 0.],
        [1., 0.]])

In [36]:
# 8. Power / Exponentiation
# Squares EVERY individual element.
# result[i][j] = x[i][j] ** 2
# Note: This is element-wise squaring, NOT matrix squaring (x @ x).
x**2

tensor([[0.5066, 0.6203],
        [0.1750, 0.8125]])

### 2. Element-wise operations

Element-wise operations apply a function to **each corresponding element** of tensors.

> No loops. No indexing. Fully vectorized.

In [37]:
# 1. Setup Data
# Create two 2x3 matrices with random values between 0 and 1.
a = torch.rand(2,3)
b = torch.rand(2,3)

print("Tensor A:\n", a)
print("Tensor B:\n", b)

Tensor A:
 tensor([[0.9969, 0.7565, 0.2239],
        [0.3023, 0.1784, 0.8238]])
Tensor B:
 tensor([[0.5557, 0.9770, 0.4440],
        [0.9478, 0.7445, 0.4892]])


In [38]:
# --- Basic Arithmetic (Element-Wise) ---
# In all these cases, the operation happens between a[i][j] and b[i][j].
# The tensors MUST have the same shape (or be broadcastable).

# Addition: a[i] + b[i]
print(a + b)

tensor([[1.5526, 1.7335, 0.6679],
        [1.2502, 0.9229, 1.3130]])


In [39]:
# Subtraction: a[i] - b[i]
print(a - b)

tensor([[ 0.4411, -0.2205, -0.2201],
        [-0.6455, -0.5661,  0.3346]])


In [40]:
# Multiplication (Hadamard Product): a[i] * b[i]
# WARNING: This is NOT matrix multiplication (dot product).
print(a * b)

tensor([[0.5540, 0.7391, 0.0994],
        [0.2866, 0.1328, 0.4030]])


In [41]:
# Division: a[i] / b[i]
print(a / b)

tensor([[1.7938, 0.7743, 0.5042],
        [0.3190, 0.2397, 1.6841]])


In [42]:
# Exponentiation/Power: a[i] raised to the power of b[i]
print(a ** b)

tensor([[0.9983, 0.7614, 0.5145],
        [0.3218, 0.2771, 0.9096]])


In [43]:
# Modulo (Remainder): Remainder of division a[i] / b[i]
print(a % b)

tensor([[0.4411, 0.7565, 0.2239],
        [0.3023, 0.1784, 0.3346]])


In [44]:
# --- Math Functions ---

# Create a sample tensor with integers (positive and negative)
c = torch.tensor([1, -2, 3, -4])

In [45]:
# Absolute Value
# Converts all negative numbers to positive. |-2| -> 2.
print(torch.abs(c))

tensor([1, 2, 3, 4])


In [46]:
# Negation
# Flips the sign of every element. 1 -> -1, -2 -> 2.
print(torch.neg(c))

tensor([-1,  2, -3,  4])


In [47]:
# --- Rounding & Clamping ---
# Define 'd' with floating-point values so the effects of rounding are visible.
d = torch.tensor([1.1, 2.5, 2.9, 3.0])

In [48]:
# Round
# Rounds to the nearest integer.
# Note: PyTorch uses "Round to Even" for x.5 cases (2.5 -> 2, 3.5 -> 4).
print(torch.round(d))

tensor([1., 2., 3., 3.])


In [49]:
# Ceiling (Ceil)
# Rounds UP to the nearest integer (moves towards +infinity).
# 1.1 -> 2.0
print(torch.ceil(d))

tensor([2., 3., 3., 3.])


In [50]:
# Floor
# Rounds DOWN to the nearest integer (moves towards -infinity).
# 2.9 -> 2.0
print(torch.floor(d))

tensor([1., 2., 2., 3.])


In [51]:
# Clamp (Clip)
# Restricts values to be within a specific range [min, max].
# If x < min, replace with min.
# If x > max, replace with max.
# Example: clamp(d, min=2, max=3)
# 1.1 becomes 2.0 (too small)
# 2.5 stays 2.5 (in range)
# 2.9 stays 2.9 (in range)
# 3.0 stays 3.0 (in range)
print(torch.clamp(d, min=2, max=3))

tensor([2.0000, 2.5000, 2.9000, 3.0000])


### Key Concepts

1. **Element-Wise Operations:** All the operators shown above (`+`, `-`, `*`, `/`, `**`) work element-wise.
   * If you multiply a matrix A by a matrix B using `A * B`, PyTorch multiplies $A_{11}$ by $B_{11}$, $A_{12}$ by $B_{12}$, etc.
   * This is known as the **Hadamard Product**.
   * **Critical Warning**: If you want actual linear algebra matrix multiplication, you must use `torch.matmul(x, x)` or the `@` operator (`x @ x`).
2. **Broadcasting:** The mechanism that allows you to do `Matrix + Number` is called **Broadcasting**. PyTorch automatically expands the smaller tensor (the scalar $2$) to match the dimensions of the larger tensor (2x2) without actually copying the data in memory. This keeps such operations fast and memory-efficient.
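
Both ideas can be verified on small matrices whose results are easy to compute by hand. The values and names below (`A`, `B`, `row`) are made up for illustration.

```python
import torch

A = torch.tensor([[1., 2.],
                  [3., 4.]])
B = torch.tensor([[5., 6.],
                  [7., 8.]])

# Element-wise (Hadamard) product: A[i][j] * B[i][j].
hadamard = A * B

# Matrix multiplication: rows of A dotted with columns of B.
matmul = A @ B

# Broadcasting: the 1D row is applied to every row of A.
row = torch.tensor([10., 20.])
broadcast = A + row

print(hadamard)   # [[5, 12], [21, 32]]
print(matmul)     # [[19, 22], [43, 50]]
print(broadcast)  # [[11, 22], [13, 24]]
```

Broadcasting is not limited to scalars: any tensor whose trailing dimensions match (or are 1) can be stretched the same way.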

### 3. Reduction operations

Reduction operations **collapse a tensor** into:
- A scalar, or
- A lower-dimensional tensor

They are fundamental for:
- Loss computation
- Metrics
- Decision making (argmax / argmin)

### What is a Reduction?

> **Reduction = many values → fewer values**

Examples:
- Sum all values → scalar
- Mean per column → vector

---

In [52]:
# 1. Create a Tensor
# We generate random integers from 0 to 9 (high=10 is exclusive), but STORE them as floats (float32).
# Why float32? Functions like mean(), std(), and var() only work on floating-point (or complex) tensors.
# If you used the default integer dtype of randint (int64), those calls would raise an error.
e = torch.randint(size=(2,3), low=0, high=10, dtype=torch.float32)
# Example of what e might look like (values differ on every run):
# [[1., 5., 2.],
#  [8., 2., 4.]] (2 Rows, 3 Columns)
print(e)

tensor([[8., 0., 7.],
        [0., 0., 9.]])


In [100]:
# --- Summation ---
# Adds up EVERY element in the tensor -> returns a scalar (0-dimensional tensor).
torch.sum(e)

tensor(24.)

In [101]:
# Sum along Columns (dim=0)
# "Collapse the rows". We squash the tensor flat from top to bottom.
# Result shape: [3] (1 value for each column).
torch.sum(e, dim=0)

tensor([ 8.,  0., 16.])

In [102]:
# Sum along Rows (dim=1)
# "Collapse the columns". We squash the tensor from left to right.
# Result shape: [2] (1 value for each row).
torch.sum(e, dim=1)

tensor([15.,  9.])

In [103]:
# --- Statistics ---

# Mean (Average)
# Total Sum / Total Count
torch.mean(e)

tensor(4.)

In [104]:
# Mean along Columns (dim=0)
# Calculates the average for each column individually.
torch.mean(e, dim=0)

tensor([4., 0., 8.])

In [105]:
# Median
# Returns the middle value of the flattened tensor (the LOWER of the two middle values when the count is even).
torch.median(e)

tensor(0.)

In [106]:
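# --- Extremes ---

# Maximum and Minimum
# Returns the largest/smallest value in the entire tensor.
# Print both: a notebook cell only displays the value of its LAST expression.
print(torch.max(e))
print(torch.min(e))

tensor(9.)
tensor(0.)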

In [107]:
# --- Product ---

# Multiplies all elements together.
# Warning: If the tensor contains even a single 0, the result is 0.
torch.prod(e)

tensor(0.)

In [108]:
# --- Variance & Standard Deviation ---
# These measure the "spread" or "dispersion" of the data.

# Standard Deviation (std)
# How much do values typically differ from the mean?
# By default this is the SAMPLE standard deviation (divides by n - 1).
torch.std(e)

tensor(4.4272)

In [109]:
# Variance (var)
# The square of the Standard Deviation.
torch.var(e)

tensor(19.6000)
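
The values 4.4272 and 19.6 can be reproduced by hand, which also shows that the defaults divide by `n - 1` rather than `n`. The sketch below rebuilds the `e` printed earlier as a fixed tensor; `sample_var` and `population_var` are illustrative names.

```python
import torch

# Same values as the `e` printed above, written out so the result is repeatable.
e = torch.tensor([[8., 0., 7.],
                  [0., 0., 9.]])

n = e.numel()
squared_dev = (e - e.mean()) ** 2

sample_var = squared_dev.sum() / (n - 1)  # 98 / 5 = 19.6
population_var = squared_dev.sum() / n    # 98 / 6, about 16.33

print(torch.var(e), sample_var, population_var)
print(torch.std(e), sample_var.sqrt())
```

`torch.var(e)` matches the `n - 1` version, so it is the sample variance, and `torch.std(e)` is its square root.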

In [110]:
# --- Indices of Extremes (Argmax/Argmin) ---
# Crucial for Classification tasks!
# Instead of returning the *value* (e.g., "9.0"), these return the *position* (index).


In [61]:
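# Argmax
# Returns the flattened index of the highest value.
# A value at row r, col c of a 2x3 grid has flat index r*3 + c, so row 1, col 1 is index 4.
# For the e printed above, the max (9.) is at row 1, col 2 -> index 5; the recorded output
# below does not match that e, so it was likely produced by an earlier run of this cell.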
torch.argmax(e)

tensor(1)

In [111]:
# Argmin
# Returns the flattened index of the lowest value.
torch.argmin(e)

tensor(1)

### 🧠 Deep Dive: Understanding Dimensions (`dim`)

In Data Science (and libraries like NumPy/PyTorch), the `dim` parameter (sometimes called `axis`) tells the computer **which dimension to erase/collapse**.

* **`dim=0` (Vertical / Rows):**
  * Think: "Squash it top-down."
  * If you have a table of data, this calculates the stat for **each column**.
  * *Analogy:* With students as rows and exams as columns, this is the average score for **each exam** across all students.

* **`dim=1` (Horizontal / Columns):**
  * Think: "Squash it left-to-right."
  * This calculates the stat for **each row**.
  * *Analogy:* Calculating the average grade for **each student** (row) across all their subjects.
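
The collapsing behaviour of `dim` can be checked on a fixed tensor. The sketch below reuses the values of the `e` printed earlier and also shows `keepdim=True`, which keeps the collapsed dimension with size 1 so the result can be broadcast back against the input; `row_fractions` is an illustrative name.

```python
import torch

e = torch.tensor([[8., 0., 7.],
                  [0., 0., 9.]])

col_sums = e.sum(dim=0)  # collapse rows    -> shape [3]
row_sums = e.sum(dim=1)  # collapse columns -> shape [2]

# keepdim=True keeps the collapsed dimension with size 1 -> shape [2, 1].
row_sums_kept = e.sum(dim=1, keepdim=True)

# Shape [2, 3] divided by shape [2, 1]: each row is divided by its own sum.
row_fractions = e / row_sums_kept

print(col_sums, row_sums)
print(row_sums_kept.shape)
print(row_fractions)
```

After the division every row of `row_fractions` sums to 1, a pattern that appears whenever values are normalized per sample.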



### 💡 Why `argmax`?

You will use `torch.argmax` constantly in Deep Learning.

* **Scenario:** Your Neural Network classifies an image of a digit (0–9).
* **Output:** It outputs a probability vector of size 10: `[0.1, 0.05, 0.8, ...]`
* **Goal:** You don't care that the score is 0.8; you care that it's at **index 2** (representing the digit "2").
* **Code:** `predicted_digit = torch.argmax(model_output)`
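
In practice the model output is usually a batch, so `argmax` is applied along the class dimension. The sketch below uses a made-up batch of 2 samples with 4 classes and hypothetical labels to show prediction and accuracy together.

```python
import torch

# Illustrative scores: 2 samples (rows), 4 classes (columns).
model_output = torch.tensor([[0.10, 0.05, 0.80, 0.05],
                             [0.70, 0.10, 0.10, 0.10]])

# dim=1 collapses the class dimension: one predicted class index per sample.
predicted = torch.argmax(model_output, dim=1)

# Hypothetical ground-truth labels for the two samples.
labels = torch.tensor([2, 1])

# Accuracy is itself a reduction: the mean of the correct/incorrect flags.
accuracy = (predicted == labels).float().mean()

print(predicted)        # tensor([2, 0])
print(accuracy.item())  # 0.5
```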

#### 5️⃣ Shape Summary

| Operation          | Output            |
| ------------------ | ----------------- |
| `sum(e)`           | Scalar            |
| `sum(e, dim=0)`    | Vector            |
| `mean(e)`          | Scalar            |
| `argmax(e)`        | Scalar index      |
| `argmax(e, dim=1)` | Vector of indices |


**Reductions answer questions like:**

* How much?
* How large?
* Where is the largest?
* How spread out?

Neural networks turn large tensors → small decisions using reductions.

**🚀 Why This Matters in Deep Learning**

* Loss = reduction over the batch (per-sample losses are averaged or summed into one scalar)
* Accuracy = reduction over predictions (the mean of `prediction == target`)
* Backprop starts from that scalar loss, so it relies on reduction ops
* Argmax gives the final class prediction

**✅ Rule of Thumb**

* Use `dim` to control what collapses
* No `dim` → everything collapses
* `argmax` returns an index, not a value
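
The points above can be sketched in a few lines. The logits and targets below are made-up values for a hypothetical 4-sample, 3-class problem, and `cross_entropy` is used only as one example of a per-sample loss.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw scores for 4 samples and 3 classes, plus their true labels.
logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 3.0],
                       [1.0, 1.2, 0.9],
                       [0.1, 2.2, 0.3]])
targets = torch.tensor([0, 2, 0, 1])

# One loss value per sample, then a reduction over the batch -> scalar loss.
per_sample_loss = F.cross_entropy(logits, targets, reduction="none")
loss = per_sample_loss.mean()

# Argmax over the class dimension, then a reduction over the comparisons.
predictions = logits.argmax(dim=1)
accuracy = (predictions == targets).float().mean()

print(per_sample_loss.shape)  # torch.Size([4])
print(loss.shape)             # torch.Size([])
print(predictions)            # tensor([0, 2, 1, 1])
print(accuracy)               # tensor(0.7500)
```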

### 4. Matrix operations

Matrix operations follow **linear algebra rules**, not element-wise rules.

> If the shapes don't align, the operation is invalid and PyTorch raises a `RuntimeError`.

In [112]:
# 1. Setup Matrices
# Create a 2x3 matrix (2 rows, 3 columns) with random integers 0-9.
f = torch.randint(size=(2,3), low=0, high=10)

# Create a 3x2 matrix (3 rows, 2 columns).
# Note: For matrix multiplication (A x B), the columns of A must match the rows of B.
g = torch.randint(size=(3,2), low=0, high=10)

print("Matrix f:\n", f)
print("Matrix g:\n", g)

Matrix f:
 tensor([[8, 4, 6],
        [6, 9, 7]])
Matrix g:
 tensor([[7, 4],
        [8, 1],
        [3, 7]])


In [113]:
# 2. Matrix Multiplication (MatMul)
# Performs the linear algebra "dot product" of rows and columns.
# Shape: (2,3) x (3,2) -> Result is (2,2).
# This is the "Engine Room" of Deep Learning (Layers communicating with each other).
torch.matmul(f, g)

tensor([[106,  78],
        [135,  82]])

In [114]:
# --- Vector Operations ---

vector1 = torch.tensor([1, 2])
vector2 = torch.tensor([3, 4])

# 3. Dot Product
# Multiplies corresponding elements and sums them up.
# (1*3) + (2*4) = 3 + 8 = 11
# Conceptually: Measures how much two vectors point in the same direction (Similarity).
torch.dot(vector1, vector2)

tensor(11)

In [115]:
# --- Transformations ---

# 4. Transpose
# Flips the matrix over its diagonal. Rows become columns, columns become rows.
# Arguments: (input, dim0, dim1) -> Swap dimension 0 (rows) with dimension 1 (cols).
# Shape change: (2,3) -> (3,2).
torch.transpose(f, 0, 1)

tensor([[8, 6],
        [4, 9],
        [6, 7]])

In [117]:
# --- Linear Algebra Properties ---

# We need Float32 for these operations. Determinants/Inverses involve division,
# so they don't work on Integers.
h = torch.randint(size=(3,3), low=0, high=10, dtype=torch.float32)
print("Matrix h:\n", h)

Matrix h:
 tensor([[3., 8., 2.],
        [0., 8., 8.],
        [6., 1., 1.]])


In [118]:
# 5. Determinant (det)
# A scalar value describing the "scaling factor" of the linear transformation.
# If det is 0, the matrix "squishes" space into a lower dimension (and has no inverse).
torch.det(h)

tensor(288.)

In [119]:
# 6. Inverse
# Finds a matrix H' such that H @ H' = Identity Matrix.
# Conceptually: "Undoing" the transformation applied by H.
# Essential for solving systems of linear equations (though computationally expensive).
torch.inverse(h)

tensor([[ 0.0000, -0.0208,  0.1667],
        [ 0.1667, -0.0312, -0.0833],
        [-0.1667,  0.1562,  0.0833]])

### 🧠 Deep Dive: The Linear Algebra Connection

1. `matmul` vs `dot`

* `torch.dot`: Strictly for 1D tensors (vectors). It returns a single number (scalar). It tells you "how similar" two vectors are (an unnormalized cosine similarity).

* `torch.matmul` (or the `@` operator): The general-purpose tool. It handles 2D matrices (and higher-dimensional batches).

> Neural Network Context: When you see $y = Wx + b$, the $Wx$ part is `torch.matmul(W, x)`.

2. The Determinant (`det`)

* Grant Sanderson (3Blue1Brown) describes the determinant as the "Change in Area/Volume".

* If `torch.det(h)` is 2.5, the transformation `h` scales volumes by a factor of 2.5: the unit cube maps to a shape with volume 2.5.

* If `torch.det(h)` is 0, the volume collapses to zero (a flat sheet or line), meaning information is lost and you cannot reverse (invert) the process.

3. The Inverse (`inverse`)

* Calculating the inverse is the mathematical equivalent of "running the film backwards."

* Warning: In production Deep Learning (e.g., Hands-On Machine Learning), we rarely compute the explicit inverse because it is numerically less stable and slow for large matrices. To solve a linear system, a dedicated solver such as `torch.linalg.solve` is usually preferred, and for model training we use optimization techniques (like Gradient Descent) to find solutions instead.
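
As a small illustration of that warning, the sketch below solves a linear system two ways. The matrix reuses the values of `h` shown above; the right-hand side `b` is a made-up vector, and `torch.linalg.solve` is offered as one common alternative to an explicit inverse, not as what the original notebook used.

```python
import torch

A = torch.tensor([[3., 8., 2.],
                  [0., 8., 8.],
                  [6., 1., 1.]])   # same values as matrix h above
b = torch.tensor([1., 2., 3.])    # hypothetical right-hand side

x_inv = torch.inverse(A) @ b        # explicit inverse, then multiply
x_solve = torch.linalg.solve(A, b)  # solves A x = b without forming the inverse

print(torch.allclose(x_inv, x_solve, atol=1e-5))   # both routes agree here
print(torch.allclose(A @ x_solve, b, atol=1e-5))   # the solution satisfies A x = b
```

For a small, well-conditioned matrix like this one the two answers match; the difference in speed and accuracy only becomes noticeable for large or nearly singular systems.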



### 5. Comparison Operations

Comparison operations compare tensors **element by element** and return
**boolean tensors** (`True` / `False`).

In [121]:
# 1. Create Data
# Generate two random 2x3 matrices with integers between 0 and 9.
i = torch.randint(size=(2,3), low=0, high=10)
j = torch.randint(size=(2,3), low=0, high=10)

print("Tensor i:\n", i)
print("Tensor j:\n", j)

Tensor i:
 tensor([[6, 2, 0],
        [1, 8, 6]])
Tensor j:
 tensor([[5, 6, 4],
        [9, 2, 7]])


In [122]:
# --- Comparison Operations (Element-Wise) ---
# All these operations return a "Boolean Tensor" (True/False).
# They compare i[row][col] with j[row][col].

# 2. Greater Than (>)
# Returns True if the value in 'i' is strictly larger than 'j'.
i > j

tensor([[ True, False, False],
        [False,  True, False]])

In [123]:
# 3. Less Than (<)
# Returns True if the value in 'i' is strictly smaller than 'j'.
i < j

tensor([[False,  True,  True],
        [ True, False,  True]])

In [124]:
# 4. Equal To (==)
# Checks for exact equality.
# Crucial for calculating accuracy: sum(y_pred == y_true).
i == j

tensor([[False, False, False],
        [False, False, False]])

In [125]:
# 5. Not Equal To (!=)
# Returns True if values are different.
i != j

tensor([[True, True, True],
        [True, True, True]])

In [126]:
# 6. Greater Than or Equal To (>=)
# Returns True if 'i' is larger than OR equal to 'j'.
i >= j

tensor([[ True, False, False],
        [False,  True, False]])

In [127]:
# 7. Less Than or Equal To (<=)
# Returns True if 'i' is smaller than OR equal to 'j'.
i <= j

tensor([[False,  True,  True],
        [ True, False,  True]])

### 🧠 Deep Dive: Boolean Masks

In libraries like PyTorch and NumPy (the backbone of the Python Data Science Handbook), these operations create what we call a Boolean Mask.

1. What is a Boolean Mask?

> It is a tensor of the same shape as your data, but filled with True and False instead of numbers.

> * True = "Yes, this specific pixel/number satisfies your condition."

> * False = "No, it does not."

2. Real-World Use Cases

> **Calculating Accuracy**: To find out how well your model performed, compare predictions to targets and take the fraction of `True` values:

```python
correct_predictions = (predictions == targets)  # Boolean mask

accuracy = correct_predictions.sum() / len(targets)
```

> **ReLU Activation (Manual)**:

> The ReLU function is essentially "keep positive numbers, set negatives to zero."

```python
x[x < 0] = 0  # "Find all values < 0 and set them to 0" (modifies x in place)
```

Masking runs as a single vectorized operation, which is much faster than writing a Python `for` loop to check every number.
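
The two snippets above rely on tensors defined elsewhere. A self-contained version might look like the sketch below, where `predictions`, `targets`, and `x` are made-up values chosen for illustration.

```python
import torch

# Hypothetical class predictions and true labels for 5 samples.
predictions = torch.tensor([2, 0, 1, 1, 2])
targets = torch.tensor([2, 0, 2, 1, 0])

correct = predictions == targets     # boolean mask, same shape as the inputs
accuracy = correct.float().mean()    # fraction of True values

print(correct)    # tensor([ True,  True, False,  True, False])
print(accuracy)   # tensor(0.6000)

x = torch.tensor([-1.5, 0.0, 2.0, -0.3])
positives = x[x > 0]                                      # keep only the matching elements
manual_relu = torch.where(x < 0, torch.zeros_like(x), x)  # out-of-place, x is unchanged

print(positives)                                  # tensor([2.])
print(torch.equal(manual_relu, torch.relu(x)))    # True
```

`torch.where` builds a new tensor instead of overwriting `x`, which is the safer choice when the original values are still needed.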



### 6. Special functions

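Special functions (in particular the activation functions) introduce **non-linearity** into models.
Without them, a stack of layers would collapse into a single linear map, which is not expressive enough to model complex patterns.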

In [128]:
# 1. Create a Tensor
# We use float32 so that every function in this section accepts the tensor.
# Some of them (for example softmax) raise a runtime error on integer tensors.
k = torch.randint(size=(2,3), low=0, high=10, dtype=torch.float32)
print("Original Tensor k:\n", k)

Original Tensor k:
 tensor([[5., 6., 7.],
        [9., 9., 1.]])


In [129]:
# --- Mathematical Transformations ---

# 2. Natural Logarithm (ln)
# Computes the natural log (ln) of each element.
# Defined only for x > 0: log(0) returns -inf and log of a negative number returns nan.
# Used heavily in loss functions (like Cross-Entropy Loss).
torch.log(k)

tensor([[1.6094, 1.7918, 1.9459],
        [2.1972, 2.1972, 0.0000]])

In [130]:
# 3. Exponential (e^x)
# Calculates Euler's number (e ≈ 2.718) raised to the power of each element of k.
# This is the inverse of log. Used to undo log-transformations.
torch.exp(k)

tensor([[1.4841e+02, 4.0343e+02, 1.0966e+03],
        [8.1031e+03, 8.1031e+03, 2.7183e+00]])

In [131]:
# 4. Square Root
# Returns the square root of each element.
torch.sqrt(k)

tensor([[2.2361, 2.4495, 2.6458],
        [3.0000, 3.0000, 1.0000]])

In [132]:
# --- Activation Functions (The Heart of Deep Learning) ---

# 5. Sigmoid
# Squashes every number into the range (0, 1).
# Formula: σ(x) = 1 / (1 + e⁻ˣ)
# Use Case: Binary Classification (converting a raw score into a probability).
torch.sigmoid(k)

tensor([[0.9933, 0.9975, 0.9991],
        [0.9999, 0.9999, 0.7311]])

In [133]:
# 6. Softmax
# Converts a vector of numbers into a probability distribution that sums to 1.
# dim=0 normalizes down each column (every column sums to 1).
# dim=1 normalizes across each row (every row sums to 1).
# Use Case: Multi-Class Classification (e.g., "Is this image a Cat, Dog, or Bird?").
torch.softmax(k, dim=0)

tensor([[0.0180, 0.0474, 0.9975],
        [0.9820, 0.9526, 0.0025]])

In [134]:
# 7. ReLU (Rectified Linear Unit)
# One of the most widely used activation functions.
# Formula: max(0, x).
# Logic: "If it's positive, keep it. If it's negative, turn it off (make it 0)."
# Use Case: Hidden layers of many deep neural networks (CNNs, Transformers).
torch.relu(k)

tensor([[5., 6., 7.],
        [9., 9., 1.]])

### 🧠 Deep Dive: Activation Functions

These "special functions" are what let neural networks learn complex patterns. Without them, a stack of linear layers collapses into a single linear model, essentially a giant Linear Regression.

1. Sigmoid vs. ReLU

> * Sigmoid: Used to be popular, but it suffers from "Vanishing Gradients" (for very large positive or negative inputs, the slope becomes nearly zero, and the model stops learning).

> * ReLU: Largely mitigates this problem. It is computationally cheap (just a max check) and keeps the gradient at 1 for positive inputs. For negative inputs the gradient is 0.

2. The Softmax Magic

If your model outputs raw scores (logits) like `[2.0, 1.0, 0.1]`, you can't say "There is a 200% chance it's class A". Softmax fixes this by exponentiating the scores and normalizing them relative to each other:

* Input: `[2.0, 1.0, 0.1]`

* Output: `[0.66, 0.24, 0.10]` (approx) $\rightarrow$ "66% Class A, 24% Class B, 10% Class C".

* Note: Notice the `dim=0` in the code above. This is crucial. If you get the dimension wrong, you normalize the wrong group of numbers! For logits of shape `[batch, classes]` you would normally use `dim=1`.
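
The softmax example can be checked directly. The sketch below uses the logits from the text; the second tensor, `batch`, is a made-up 2-sample batch added to show how the choice of `dim` changes which numbers are normalized together.

```python
import torch

logits = torch.tensor([2.0, 1.0, 0.1])
probs = torch.softmax(logits, dim=0)   # a 1D tensor has only dim 0
print(probs)   # tensor([0.6590, 0.2424, 0.0986])

# Hypothetical batch: 2 samples (rows) x 3 classes (columns).
batch = torch.tensor([[2.0, 1.0, 0.1],
                      [0.5, 0.5, 0.5]])

per_sample = torch.softmax(batch, dim=1)  # each ROW sums to 1 (what you want for a batch)
per_column = torch.softmax(batch, dim=0)  # each COLUMN sums to 1 (mixes different samples)

print(per_sample.sum(dim=1))  # both entries are 1 (up to rounding)
print(per_column.sum(dim=0))  # all three entries are 1 (up to rounding)
```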

### Detailed Explanation

1. **Element-Wise vs. Matrix Operations**

   The most critical distinction in linear algebra libraries is between `*` and `matmul`.
   * **Element-wise (`a * b`)**: Matches corresponding pixels/cells. Used for masking (e.g., setting specific values to 0) or applying activation functions.
   * **Matrix Multiplication (`a @ b` or `torch.matmul`)**: The "Row times Column" rule used in neural network layers.

2. **Rounding Behavior**

   `torch.round` (like NumPy's `round`) uses **"Round Half to Even"** (also known as Banker's Rounding).
   * Standard Rounding: 2.5 $\rightarrow$ 3
   * Banker's Rounding: 2.5 $\rightarrow$ 2 (Even), 3.5 $\rightarrow$ 4 (Even).
   * Why? Always rounding halves up introduces a slight upward bias in large datasets. Banker's rounding averages out errors over time.

3. **The Power of `torch.clamp`**

   Clamping is essential for **Gradient Clipping** and **Probability Safety**.
   * **Exploding Gradients**: In RNNs, gradients can become huge (and eventually overflow to `inf` or `NaN`). Clamping them to a fixed range keeps them manageable.
   * **Log Safety**: If you try to calculate `log(x)` and `x` is 0, you get `-inf`. A common trick is `torch.log(torch.clamp(x, min=1e-9))` to keep the result finite.
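
A short sketch of the rounding and clamping behaviour described above. All input values (`halves`, `p`, `g`) are made up for illustration.

```python
import torch

# Round half to even: exact .5 values go to the nearest EVEN integer.
halves = torch.tensor([0.5, 1.5, 2.5, 3.5])
rounded = torch.round(halves)
print(rounded)   # tensor([0., 2., 2., 4.])

# Log safety: log(0) is -inf, clamping first keeps the result finite.
p = torch.tensor([0.0, 0.5, 1.0])
raw_log = torch.log(p)                          # first entry is -inf
safe_log = torch.log(torch.clamp(p, min=1e-9))  # first entry is about -20.7
print(raw_log)
print(safe_log)

# Clamping large values into a fixed range, as in value-based gradient clipping.
g = torch.tensor([-250.0, 0.3, 1e6])
clipped = torch.clamp(g, min=-1.0, max=1.0)
print(clipped)   # tensor([-1.0000,  0.3000,  1.0000])
```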

### ✅ Rule of Thumb

* `+ - * / ** %` $\rightarrow$ element-wise

* `@` or `torch.matmul()` $\rightarrow$ matrix multiplication
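
The difference is easiest to see on the same pair of inputs. The sketch below uses small made-up matrices; `f` and `g` here are illustrative all-ones tensors, not the random matrices from the earlier cells.

```python
import torch

a = torch.tensor([[1, 2],
                  [3, 4]])
b = torch.tensor([[10, 20],
                  [30, 40]])

elementwise = a * b   # multiplies matching cells
matrix = a @ b        # rows of a times columns of b

print(elementwise)    # tensor([[ 10,  40], [ 90, 160]])
print(matrix)         # tensor([[ 70, 100], [150, 220]])

# Shapes that are valid for matmul are not necessarily valid element-wise.
f = torch.ones(2, 3)
g = torch.ones(3, 2)
print((f @ g).shape)  # torch.Size([2, 2])

try:
    f * g             # (2, 3) and (3, 2) cannot be broadcast together
    broadcast_failed = False
except RuntimeError as err:
    broadcast_failed = True
    print("element-wise multiply failed:", err)
```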

## In-Place Operations

In [78]:
# 1. Initialize Tensors
# Create two random matrices of size 2x3
m = torch.rand(2,3)
n = torch.rand(2,3)

print("Original m:", m)
print("Original n:", n)

Original m: tensor([[0.6574, 0.3451, 0.0453],
        [0.9798, 0.5548, 0.6868]])
Original n: tensor([[0.4920, 0.0748, 0.9605],
        [0.3271, 0.0103, 0.9516]])


In [79]:
# 2. In-Place Addition
# The underscore (_) at the end of the method name is a PyTorch convention.
# It signifies that the operation happens IN-PLACE.
# This modifies 'm' directly in memory using the values from 'n'.
# No new memory is allocated for the result.
m.add_(n)

tensor([[1.1494, 0.4199, 1.0058],
        [1.3069, 0.5650, 1.6384]])

In [80]:
# 'm' has now changed.
print("Modified m:", m)
# 'n' remains the same.
print("Unchanged n:", n)

Modified m: tensor([[1.1494, 0.4199, 1.0058],
        [1.3069, 0.5650, 1.6384]])
Unchanged n: tensor([[0.4920, 0.0748, 0.9605],
        [0.3271, 0.0103, 0.9516]])


In [81]:
# 3. Standard ReLU (Out-of-Place)
# Applies the Rectified Linear Unit function, max(0, x).
# torch.relu(m) is an OUT-OF-PLACE operation: it returns a NEW tensor in which
# all negative values are replaced with 0. The original 'm' is NOT touched.
# (Every value in 'm' is already positive here, so the numbers do not change.)
new_tensor = torch.relu(m)

In [82]:
m

tensor([[1.1494, 0.4199, 1.0058],
        [1.3069, 0.5650, 1.6384]])

In [83]:
# 4. In-Place ReLU
# Note the underscore again: .relu_()
# This applies the function directly to the data inside 'm'.
# Any negative numbers in 'm' are replaced by 0 inside the same memory block.
m.relu_()

tensor([[1.1494, 0.4199, 1.0058],
        [1.3069, 0.5650, 1.6384]])

In [84]:
print("Final In-Place Modified m:", m)

Final In-Place Modified m: tensor([[1.1494, 0.4199, 1.0058],
        [1.3069, 0.5650, 1.6384]])


**`torch.relu(m)` vs `m.relu_()`**

| Operation       | Modifies Tensor? | Returns New Tensor?        |
| --------------- | ---------------- | -------------------------- |
| `torch.relu(m)` | ❌ No            | ✅ Yes                     |
| `m.relu_()`     | ✅ Yes           | ❌ No (returns `m` itself) |
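
Because the tensor `m` above contains only positive values, ReLU leaves it unchanged and the difference is hard to see. The sketch below repeats the comparison on a made-up tensor with negative entries and uses `data_ptr()` to check whether the underlying memory is shared.

```python
import torch

m = torch.tensor([-1.0, 2.0, -3.0])   # hypothetical values with negatives
ptr_before = m.data_ptr()             # address of m's underlying data

out_of_place = torch.relu(m)          # new tensor, m is untouched
print(m)                              # tensor([-1.,  2., -3.])
print(out_of_place.data_ptr() == ptr_before)   # False: separate memory

in_place = m.relu_()                  # overwrites m and returns m itself
print(m)                              # tensor([0., 2., 0.])
print(in_place is m)                  # True
print(m.data_ptr() == ptr_before)     # True: same memory block
```

In-place operations save memory, but they can interfere with autograd. For example, PyTorch raises an error if a leaf tensor that requires gradients is modified in place, so the out-of-place form is the safer default during training.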


## Copying a Tensor

In [85]:
# 1. Create a Tensor
a = torch.rand(2,3)
print("Original a:\n", a)

Original a:
 tensor([[0.2855, 0.2324, 0.9141],
        [0.7668, 0.1659, 0.4393]])


In [86]:
# --- SCENARIO 1: Reference Assignment (Aliasing) ---
# We assign 'a' to 'b'. In Python, this DOES NOT create a new tensor.
# It effectively creates a new label 'b' that points to the SAME memory address as 'a'.
b = a

In [87]:
# Modification:
# We change the value at row 0, col 0 of 'a'.
a[0][0] = 0

In [88]:
# Result:
# Because 'b' points to the same object, 'b' is ALSO modified.
print("\n--- After Reference Assignment ---")
print("Modified a:\n", a)
print("Modified b (reflects a):\n", b)


--- After Reference Assignment ---
Modified a:
 tensor([[0.0000, 0.2324, 0.9141],
        [0.7668, 0.1659, 0.4393]])
Modified b (reflects a):
 tensor([[0.0000, 0.2324, 0.9141],
        [0.7668, 0.1659, 0.4393]])


In [89]:
# Proof:
# id() returns the identity of the Python object (in CPython, its memory address).
# These two numbers will be IDENTICAL because 'a' and 'b' are the same object.
print(f"Address of a: {id(a)}")
print(f"Address of b: {id(b)}")

Address of a: 139246786257904
Address of b: 139246786257904


In [90]:
# --- SCENARIO 2: Cloning (True Copy) ---
# .clone() creates a brand new tensor in a NEW memory block,
# copying all the data from 'a' into it.
b = a.clone()

In [91]:
# Modification:
# We change 'a' again (setting top-left to 10).
a[0][0] = 10

In [92]:
# Result:
# 'a' changes, but 'b' REMAINS THE SAME as it was at the moment of cloning.
# They are now independent.
print("\n--- After Cloning ---")
print("Modified a (is 10):\n", a)
print("Independent b (is 0):\n", b)


--- After Cloning ---
Modified a (is 10):
 tensor([[10.0000,  0.2324,  0.9141],
        [ 0.7668,  0.1659,  0.4393]])
Independent b (is 0):
 tensor([[0.0000, 0.2324, 0.9141],
        [0.7668, 0.1659, 0.4393]])


In [93]:
# Proof:
# These two numbers will be DIFFERENT.
print(f"Address of a: {id(a)}")
print(f"Address of b: {id(b)}")

Address of a: 139246786257904
Address of b: 139242769653408
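
One caveat: `id()` identifies the Python object, not the tensor's data buffer. Two different tensor objects can still share memory, for example when one is a view of the other. The sketch below, using made-up tensors, compares the three cases with `data_ptr()`.

```python
import torch

a = torch.zeros(2, 3)
alias = a          # same Python object
view = a.view(6)   # NEW Python object, SAME underlying data
copy = a.clone()   # new object with its own data

a[0, 0] = 7.0      # modify the original

print(alias is a)                        # True
print(view is a)                         # False
print(view.data_ptr() == a.data_ptr())   # True: the view shares memory
print(copy.data_ptr() == a.data_ptr())   # False: the clone is independent
print(view[0].item(), copy[0, 0].item()) # 7.0 0.0
```

So a different `id()` does not by itself guarantee an independent copy; `.clone()` does.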


## Tensor Operations on GPU

In [94]:
torch.cuda.is_available()

True

In [95]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [96]:
# creating a new tensor on GPU
torch.rand((2, 3), device=device)

tensor([[0.3563, 0.0303, 0.7088],
        [0.2009, 0.0224, 0.9896]], device='cuda:0')

In [97]:
# moving an existing tensor to GPU
a = torch.rand((2, 3))
a

tensor([[0.2243, 0.8935, 0.0497],
        [0.1780, 0.3011, 0.1893]])

In [98]:
b = a.to(device)
b

tensor([[0.2243, 0.8935, 0.0497],
        [0.1780, 0.3011, 0.1893]], device='cuda:0')

In [99]:
import time
import torch

# -------------------------------
# Device selection
# -------------------------------

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

# -------------------------------
# Matrix size (SAFE)
# -------------------------------

size = 1000  # CPU-safe, GPU-safe

# -------------------------------
# CPU computation
# -------------------------------

matrix_cpu1 = torch.rand(size, size)
matrix_cpu2 = torch.rand(size, size)

start_time = time.time()
result_cpu = torch.matmul(matrix_cpu1, matrix_cpu2)
end_time = time.time()

print(f"CPU time: {end_time - start_time:.4f} seconds")

# -------------------------------
# GPU computation
# -------------------------------

matrix_gpu1 = matrix_cpu1.to(device)
matrix_gpu2 = matrix_cpu2.to(device)

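# IMPORTANT: CUDA kernels run asynchronously, so synchronize before timing.
# The guard lets this cell also run on a CPU-only machine.
if device.type == "cuda":
    torch.cuda.synchronize()
start_time = time.time()

result_gpu = torch.matmul(matrix_gpu1, matrix_gpu2)

# IMPORTANT: synchronize again so the timer includes the finished computation.
if device.type == "cuda":
    torch.cuda.synchronize()
end_time = time.time()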

print(f"GPU time: {end_time - start_time:.4f} seconds")


Using device: cuda
CPU time: 0.0398 seconds
GPU time: 0.1096 seconds
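
In the run above the GPU appears slower than the CPU. A likely reason (an assumption, since the original run was not profiled) is that the first CUDA call pays one-time start-up costs. A fairer comparison runs a few warm-up iterations and averages several timed runs. The helper `time_matmul` below is a hypothetical function written for this sketch, not part of PyTorch.

```python
import time
import torch

def time_matmul(device, size=1000, warmup=3, repeats=10):
    """Return the average seconds per matmul on `device`, after warm-up runs."""
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)

    for _ in range(warmup):          # absorb one-time start-up costs
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()     # wait for the queued kernels to finish

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

cpu_time = time_matmul(torch.device("cpu"))
print(f"CPU: {cpu_time:.6f} s per matmul")

if torch.cuda.is_available():
    gpu_time = time_matmul(torch.device("cuda"))
    print(f"GPU: {gpu_time:.6f} s per matmul")
```

The actual numbers depend on the hardware, and for small matrices the CPU can still win because the GPU's advantage only shows up once there is enough work to keep it busy.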



---

# 🧠 PyTorch Tensor Creation: Practice Exercises

> **Instructions**
> - Try to answer **without running code first**
> - Write your answer on paper or in a code cell
> - Expand the hidden section **only after committing**

---

## ❓ Question 1
What is the **shape** of the following tensor?

```python
torch.zeros(4, 2)
```

<details>
<summary>✅ Show Answer</summary>

**Shape:** `(4, 2)`

</details>

---

## ❓ Question 2

What is the **key difference** between `torch.empty()` and `torch.zeros()`?

<details>
<summary>✅ Show Answer</summary>

* `torch.empty()` allocates memory **without initializing values**
* `torch.zeros()` initializes **all values to zero**

</details>

---

## ❓ Question 3

What type of values does `torch.rand(3, 3)` generate?

<details>
<summary>✅ Show Answer</summary>

Random floating-point numbers drawn from a **uniform distribution on `[0, 1)`**

</details>

---

## ❓ Question 4

What will be the output of this code?

```python
torch.manual_seed(42)
a = torch.rand(2, 2)

torch.manual_seed(42)
b = torch.rand(2, 2)

a == b
```

<details>
<summary>✅ Show Answer</summary>

A 2×2 tensor of `True` values: both tensors are **identical** because the seed was reset.

</details>

---

## ❓ Question 5

How is `torch.arange()` **different** from `torch.linspace()`?

<details>
<summary>✅ Show Answer</summary>

* `arange()` → uses a **step size**
* `linspace()` → uses a **fixed number of points**

</details>

---

## ❓ Question 6

Predict the output:

```python
torch.arange(1, 10, 3)
```

<details>
<summary>✅ Show Answer</summary>

```text
tensor([1, 4, 7])
```

</details>

---

## ❓ Question 7

Predict the output:

```python
torch.linspace(0, 1, 5)
```

<details>
<summary>✅ Show Answer</summary>

```text
tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
```

</details>

---

## ❓ Question 8

What does `torch.eye(4)` create, and where are the `1`s located?

<details>
<summary>✅ Show Answer</summary>

A **4×4 identity matrix** with `1`s on the **main diagonal**

</details>

---

## ❓ Question 9

What is the output shape of this tensor?

```python
torch.full((2, 5), 7)
```

<details>
<summary>✅ Show Answer</summary>

**Shape:** `(2, 5)`
All values are `7`

</details>

---

## ❓ Question 10

Which function should you use if you want to **convert a Python list into a tensor**?

<details>
<summary>✅ Show Answer</summary>

`torch.tensor()`

</details>

---

## ❓ Question 11 (Conceptual)

Why is setting `torch.manual_seed()` important in machine learning experiments?

<details>
<summary>✅ Show Answer</summary>

It ensures **reproducibility**, making experiments debuggable and comparable.

</details>

---

## ❓ Question 12 (Tricky)

Is this statement true or false?

> Two calls to `torch.rand()` without setting a seed will always produce the same output.

<details>
<summary>✅ Show Answer</summary>

❌ **False**: outputs differ unless the random seed is fixed.

</details>

---

## 🧠 Self-Assessment

* 10–12 correct → 🔥 **Excellent**
* 7–9 correct → 👍 **Solid foundation**
* <7 → 🔁 Revisit tensor creation basics

---

## 🚀 Next-Level Challenge (Optional)

Try to create:

1. A **3×3 tensor** filled with `-1`
2. A tensor with values **[0, 0.1, 0.2, ..., 1.0]**
3. A **5×5 identity matrix multiplied by 2**

(Write code before checking docs!)

---

# 🧠 PyTorch Tensor Shapes & `_like()`: Practice Exercises

> **Instructions**
> - Do NOT run code first
> - Predict shapes, values, or behavior mentally
> - Expand answers only after committing

---

## ❓ Question 1
What is the **shape** of the tensor below?

```python
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
x.shape
```

<details>
<summary>✅ Show Answer</summary>

`torch.Size([2, 3])`

</details>

---

## ❓ Question 2

What does the **first number** in `x.shape` represent?

<details>
<summary>✅ Show Answer</summary>

Number of **rows** (or samples / batch elements)

</details>

---

## ❓ Question 3

Predict the **shape** of this tensor:

```python
torch.zeros_like(x)
```

<details>
<summary>✅ Show Answer</summary>

Same shape as `x` → `(2, 3)`

</details>

---

## ❓ Question 4

What is the **main danger** of using `torch.empty_like()`?

<details>
<summary>✅ Show Answer</summary>

It contains **uninitialized garbage values** from memory.

</details>

---

## ❓ Question 5

True or False:

> `_like()` functions automatically copy the shape of the input tensor.

<details>
<summary>✅ Show Answer</summary>

✅ **True**

</details>

---

## ❓ Question 6

Why does this code specify `dtype=torch.float32`?

```python
torch.rand_like(x, dtype=torch.float32)
```

<details>
<summary>✅ Show Answer</summary>

Because `x` is an **integer tensor**, but random values must be **floating-point**.

</details>

---

## ❓ Question 7

What happens if you run `torch.rand_like(x)` **without** specifying `dtype`?

<details>
<summary>✅ Show Answer</summary>

It raises a `RuntimeError`: `x` is an integer tensor, and uniform random values in `[0, 1)` cannot be stored in an integer dtype.

</details>

---

## ❓ Question 8

Which function would you use to create a tensor that:

* Has the same shape as `x`
* Is filled with `1`s

<details>
<summary>✅ Show Answer</summary>

`torch.ones_like(x)`

</details>

---

## ❓ Question 9

Predict the output shape:

```python
y = torch.rand_like(x, dtype=torch.float32)
y.shape
```

<details>
<summary>✅ Show Answer</summary>

Same as `x` → `(2, 3)`

</details>

---

## ❓ Question 10

Fill in the blank:

> `_like()` functions help avoid __________ bugs.

<details>
<summary>✅ Show Answer</summary>

**Shape mismatch** bugs

</details>

---

## ❓ Question 11 (Tricky)

Is this statement correct?

> `torch.zeros_like(x)` copies the values of `x` and replaces them with zeros.

<details>
<summary>✅ Show Answer</summary>

❌ **False**: it creates a new tensor with the same **shape and dtype** (and device) as `x`; no values are copied.

</details>

---

## ❓ Question 12 (Conceptual)

Why are `_like()` functions preferred over hard-coding dimensions?

<details>
<summary>✅ Show Answer</summary>

They make code **robust, scalable, and safe** when tensor shapes change (e.g., batch size).

</details>

---

## 🧠 Self-Check

* 10–12 correct → 🔥 Excellent grasp
* 7–9 correct → 👍 Solid foundation
* <7 → 🔁 Revisit shapes and `_like()` basics

---

## 🚀 Bonus Challenge (Optional)

Without running code, predict:

```python
a = torch.ones_like(x)
b = torch.zeros_like(a)
c = torch.rand_like(b, dtype=torch.float32)
```

* What is the shape of `c`?
* What is the datatype of `c`?
