### 🚀 Open in Google Colab
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Geekgineer/DL-Essentials-PyTorch/blob/main/1-Data-Manipulation-with-PyTorch/Data-Manipulation-with-PyTorch.ipynb)


In [None]:
import os

# List of required files
file_urls = [
    "https://raw.githubusercontent.com/Geekgineer/DL-Essentials-PyTorch/main/1-Data-Manipulation-with-PyTorch/Data_Manipulation.ipynb",
    "https://raw.githubusercontent.com/Geekgineer/DL-Essentials-PyTorch/main/1-Data-Manipulation-with-PyTorch/utils.py"
]

# Download each file
for url in file_urls:
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        os.system(f"wget -q {url}")

# Verify downloaded files
!ls





In [31]:
# Colab only: mount Google Drive and work from a folder inside it
from google.colab import drive
drive.mount('/content/drive/')

import os
os.chdir('/content/drive/My Drive/Colab Notebooks/DeepLearningEssentials')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


# Introduction



<img src="https://github.com/user-attachments/assets/29a6f82d-4084-4964-8964-cd851f55d567" alt="Overall Image" width="800" height="500">


1. **Deep Learning Revolution**:
   - Automates feature extraction that traditionally required manual engineering.
   - Uses large datasets to train models that approximate complex functions.

2. **PyTorch's Flexibility and Ease of Use**:
   - Praised for its simplicity and Pythonic nature.
   - Supports accelerated computation on GPUs, ideal for deep learning projects.

3. **Building Blocks of PyTorch**:
   - **Tensors**: For multidimensional arrays.
   - **Autograd Engine**: For automatic differentiation, essential for training neural networks.

4. **Training and Deployment**:
   - Supports the full cycle: data loading, training, and deployment.
   - Tools: TorchScript for deployment and ONNX format for exporting models.

5. **Hardware and Software Requirements**:
   - Basic tasks can be performed on standard hardware.
   - Advanced projects benefit from a CUDA-capable GPU for faster training.

6. **Evolution of Deep Learning Libraries**:
   - The ecosystem has consolidated around a small number of libraries.
   - PyTorch and TensorFlow are the dominant libraries.


# Data Manipulation with Tensors

Purpose of Data Manipulation
1. **Acquisition**: Gathering data from various sources.
2. **Processing**: Performing operations and transformations on data inside the computer.

Tensors: n-Dimensional Arrays
- **Definition**: Fundamental data structure for storing and manipulating data.
- **Similarity to NumPy**: Tensors resemble NumPy's `ndarray` but with additional features.

Features of Tensor Classes
- **Automatic Differentiation**: Facilitates gradient computation for optimization, crucial for training neural networks.
- **GPU Acceleration**: Speeds up numerical computations by utilizing GPUs, unlike NumPy which is CPU-based.

Advantages
- **Ease of Coding**: Simplifies the implementation of neural network operations.
- **Efficiency**: Enhances performance by leveraging hardware acceleration.

<img src="https://github.com/user-attachments/assets/7e7c6a12-4f14-49c8-b81f-3971a36fc3e8" alt="worldasfb Image" width="800" height="600">


Floating-Point Numbers
- **Encoding and Decoding**: Converting real-world data into a format digestible by neural networks and decoding output back to usable information.
- **Intermediate Representations**: Sequences of floating-point numbers that capture data characteristics at various stages of transformation.

PyTorch Tensors
- **Definition**: Generalization of vectors and matrices to multiple dimensions (multidimensional arrays).
- **Comparison with NumPy**: PyTorch tensors offer advantages such as GPU acceleration and efficient computation across devices.

Key Concepts
- **Tensor Basics**: Fundamental structure for storing and manipulating data in PyTorch.
- **Capabilities**: Includes fast operations, GPU support, and integration with NumPy and other scientific libraries.

<img src="https://github.com/user-attachments/assets/1caf174f-805d-4480-824a-3626aa1ebc73" alt="tensors Image" width="800" height="300">


To start, we import the PyTorch library.
Note that the package name is `torch`.


In [35]:
import torch
torch.__version__

'2.4.1+cu121'

**The Essence of Tensors**

- **Python Lists vs. Tensors**:
  - Python lists or tuples of numbers store individual Python objects, each with its own memory allocation.
  - PyTorch tensors or NumPy arrays, in contrast, are views over contiguous memory blocks with unboxed C numeric types, making them more memory efficient.

- **Memory Efficiency**:
  - For example, a 1D tensor with 1,000,000 32-bit floats requires 4,000,000 bytes (4 bytes per element) plus a small, fixed overhead for metadata.

- **Creating Tensors**:
  - A tensor can be initialized using `torch.zeros()` to create an appropriately sized array, which can then be filled with specific values.
  - Alternatively, tensors can be directly created from Python lists using `torch.tensor()`.

- **Tensor Operations**:
  - Accessing elements in a tensor can be done with indexing, e.g., `points[0, 1]` retrieves the value at the specified index.
  - A tensor's shape can be queried with `.shape`, indicating the size along each dimension.

- **Views and Memory**:
  - Basic indexing and slicing (e.g., selecting a row or column) does not copy data: it returns a new tensor object that is a view on the same storage. Advanced indexing (with a list of indices or a boolean mask) does copy the selected elements.

<img src="https://github.com/user-attachments/assets/638e0cd3-d1b4-4191-a6ef-cd89ce9229d9" alt="listtensormemory Image" width="800" height="300">
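The memory and view claims above can be checked directly. A minimal sketch, using an illustrative `points` tensor whose values are made up: `element_size()` times `numel()` gives the raw data size in bytes, writing through a row obtained by basic indexing changes the original tensor, and advanced indexing with a list of row indices produces an independent copy.

```python
import torch

# 1,000,000 float32 values: 4 bytes each, stored contiguously
big = torch.zeros(1_000_000, dtype=torch.float32)
nbytes = big.element_size() * big.numel()
print(nbytes)  # 4000000

# Illustrative 3x2 tensor of 2D points
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

# Basic indexing returns a view on the same storage, so writes show up in `points`
row = points[1]
row[0] = 10.0
print(points[1, 0])  # tensor(10.)

# Advanced indexing (a list of row indices) copies the selected data
picked = points[[0, 2]]
picked[0, 0] = -1.0
print(points[0, 0])  # tensor(4.): the original is unchanged
```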



A tensor represents a (possibly multidimensional) array of numerical values.
In the one-dimensional case, i.e., when only one axis is needed for the data,
a tensor is called a *vector*.
With two axes, a tensor is called a *matrix*.
With $k > 2$ axes, we drop the specialized names
and just refer to the object as a $k^\textrm{th}$-*order tensor*.


In [37]:
import torch

# Create a sample tensor for demonstration
tensor = torch.tensor([10, 20, 30, 40, 50, 60, 70, 80])

# Display the tensor
print("Original tensor:", tensor)

# 1. All elements in the list from element 1 inclusive to element 4 exclusive
sub_tensor1 = tensor[1:4]
print("Elements from 1 inclusive to 4 exclusive:", sub_tensor1)

# 2. From element 1 inclusive to the end of the list
sub_tensor2 = tensor[1:]
print("Elements from 1 inclusive to the end:", sub_tensor2)

# 3. From the start of the list to element 4 exclusive
sub_tensor3 = tensor[:4]
print("Elements from the start to 4 exclusive:", sub_tensor3)

# 4. From the start of the list to one before the last element
sub_tensor4 = tensor[:-1]
print("Elements from the start to one before the last:", sub_tensor4)

# 5. From element 1 inclusive to element 4 exclusive, in steps of 2
sub_tensor5 = tensor[1:4:2]
print("Elements from 1 to 4 exclusive, in steps of 2:", sub_tensor5)



Original tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80])
Elements from 1 inclusive to 4 exclusive: tensor([20, 30, 40])
Elements from 1 inclusive to the end: tensor([20, 30, 40, 50, 60, 70, 80])
Elements from the start to 4 exclusive: tensor([10, 20, 30, 40])
Elements from the start to one before the last: tensor([10, 20, 30, 40, 50, 60, 70])
Elements from 1 to 4 exclusive, in steps of 2: tensor([20, 40])


In [38]:
# Create a 2D tensor for the following operations
tensor_2d = torch.tensor([[1, 2, 3],
                          [4, 5, 6],
                          [7, 8, 9]])

# Display the 2D tensor
print("\n2D tensor:\n", tensor_2d)

# 6. All rows after the first; implicitly all columns
sub_tensor6 = tensor_2d[1:]
print("All rows after the first; implicitly all columns:\n", sub_tensor6)

# 7. All rows after the first; all columns
sub_tensor7 = tensor_2d[1:, :]
print("All rows after the first; all columns:\n", sub_tensor7)

# 8. All rows after the first; first column
sub_tensor8 = tensor_2d[1:, 0]
print("All rows after the first; first column:\n", sub_tensor8)

# 9. Add a dimension of size 1 at position 0
# (indexing with None, i.e. tensor_2d[None], gives the same result)
sub_tensor9 = tensor_2d.unsqueeze(0)
print("\nTensor with an added dimension (unsqueeze at position 0):\n", sub_tensor9)


2D tensor:
 tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
All rows after the first; implicitly all columns:
 tensor([[4, 5, 6],
        [7, 8, 9]])
All rows after the first; all columns:
 tensor([[4, 5, 6],
        [7, 8, 9]])
All rows after the first; first column:
 tensor([4, 7])

Tensor with an added dimension (unsqueeze at position 0):
 tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]])


In [39]:
sub_tensor9.size()

torch.Size([1, 3, 3])

PyTorch provides a variety of functions
for creating new tensors
prepopulated with values.
For example, by invoking `arange(n)`,
we can create a vector of evenly spaced values,
starting at 0 (included)
and ending at `n` (not included).
By default, the step size is $1$.
Unless otherwise specified,
new tensors are stored in main memory
and designated for CPU-based computation.
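A few `arange` calls illustrate the start/stop/step convention described above. This is a small sketch; the specific ranges are arbitrary, and the `float32` result for a float step assumes the default floating-point dtype has not been changed.

```python
import torch

print(torch.arange(5))           # stop only: 0, 1, 2, 3, 4
print(torch.arange(2, 10, 3))    # start, stop, step: 2, 5, 8
print(torch.arange(0, 1, 0.25))  # a float step gives float32 values: 0.0, 0.25, 0.5, 0.75

r = torch.arange(5)
print(r.dtype, r.device)         # torch.int64 cpu
```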


Specifying the numeric type with `dtype`

PyTorch Tensor Data Types

| `dtype` Argument          | Description                                 |
|---------------------------|---------------------------------------------|
| `torch.float32` or `torch.float`  | 32-bit floating-point                        |
| `torch.float64` or `torch.double` | 64-bit double-precision floating-point       |
| `torch.float16` or `torch.half`   | 16-bit half-precision floating-point          |
| `torch.int8`               | Signed 8-bit integers                        |
| `torch.uint8`              | Unsigned 8-bit integers                      |
| `torch.int16` or `torch.short`    | Signed 16-bit integers                       |
| `torch.int32` or `torch.int`      | Signed 32-bit integers                       |
| `torch.int64` or `torch.long`     | Signed 64-bit integers                       |
| `torch.bool`               | Boolean                                     |




In [72]:
x = torch.arange(12, dtype=torch.int64)
x

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [41]:
x = x.to(dtype=torch.short)
x.dtype

torch.int16
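The cells above set `dtype` explicitly. When it is omitted, PyTorch infers it from the data. The sketch below shows the usual defaults (assuming the default floating-point dtype has not been changed with `torch.set_default_dtype`) and two common casting spellings; the variable names are only illustrative.

```python
import torch

ints = torch.tensor([1, 2, 3])        # Python ints -> torch.int64 by default
floats = torch.tensor([1.0, 2.0])     # Python floats -> torch.float32 by default
print(ints.dtype, floats.dtype)       # torch.int64 torch.float32

# Each cast returns a new tensor; the original keeps its dtype
as_double = ints.double()             # same as ints.to(torch.float64)
as_half = ints.to(dtype=torch.half)   # torch.float16
print(as_double.dtype, as_half.dtype, ints.dtype)
```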

**Summary Table**

| Format    | Bits | Sign Bits | Exponent Bits | Mantissa Bits | Range (approx.)                                        | Precision                                   |
|-----------|------|-----------|---------------|---------------|--------------------------------------------------------|---------------------------------------------|
| Float32   | 32   | 1         | 8             | 23            | $\pm 3.4 \times 10^{38}$                               | ~7 decimal digits                           |
| Float64   | 64   | 1         | 11            | 52            | $\pm 1.8 \times 10^{308}$                              | ~15-17 decimal digits                       |
| BFloat16  | 16   | 1         | 8             | 7             | $\pm 3.4 \times 10^{38}$ (same exponent as Float32)    | ~2-3 decimal digits                         |
| TF32      | 19   | 1         | 8             | 10            | $\pm 3.4 \times 10^{38}$ (same exponent as Float32)    | ~3 decimal digits (same mantissa as Float16) |

**Use Cases**

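- **Float32** is commonly used for general floating-point arithmetic in many applications.
- **Float64** is used in scenarios requiring higher precision, such as scientific computations.
- **BFloat16** (`torch.bfloat16`) keeps Float32's range with far fewer mantissa bits; it is natively supported on TPUs and NVIDIA Ampere (and newer) GPUs and is popular for mixed-precision training.
- **TF32** is not a storage dtype (there is no `torch.tf32`): tensors remain Float32, and when it is enabled, Tensor Cores on Ampere or newer GPUs use the reduced 10-bit mantissa internally for matrix multiplications and convolutions, trading a little precision for speed.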
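The range and precision columns can be checked with `torch.finfo`, which reports the properties of each floating-point dtype. A minimal sketch; TF32 is a Tensor Core math mode rather than a dtype, so it cannot be inspected this way. The test values (1.001 and 1e38) are arbitrary choices that expose the trade-offs.

```python
import torch

# Machine epsilon (gap between 1.0 and the next representable value)
# and largest finite value for each floating-point format
for dt in (torch.float32, torch.float64, torch.bfloat16, torch.float16):
    info = torch.finfo(dt)
    print(f"{str(dt):16} bits={info.bits:2d} eps={info.eps:.3e} max={info.max:.3e}")

# bfloat16 keeps float32's range but only ~2-3 significant decimal digits
val = torch.tensor(1.001)                                    # float32
print(val.to(torch.bfloat16).item())                         # 1.0: the 0.001 is lost
print(torch.tensor(1e38).to(torch.bfloat16).isinf().item())  # False: still in range
print(torch.tensor(1e38).to(torch.float16).isinf().item())   # True: overflows float16
```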


In [42]:
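import torch

if torch.cuda.is_available():
    # TF32 is only used by Tensor Cores on Ampere (compute capability 8.x) or newer GPUs,
    # so enable it before running float32 matmuls/convolutions
    if torch.cuda.get_device_properties(0).major >= 8:
        torch.backends.cuda.matmul.allow_tf32 = True  # float32 matmuls may use TF32
        torch.backends.cudnn.allow_tf32 = True        # cuDNN convolutions may use TF32

    # Mixed precision: inside autocast, eligible ops such as matmul run in a 16-bit type
    amp_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    with torch.autocast(device_type="cuda", dtype=amp_dtype):
        m = torch.randn((1024, 1024), device="cuda")  # created as float32
        result = torch.matmul(m, m)                    # computed in amp_dtype
    print(result.dtype)
else:
    print("CUDA is not available; skipping the GPU mixed-precision example.")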

Each of these values is called
an *element* of the tensor.
The tensor `x` contains 12 elements.
We can inspect the total number of elements
in a tensor via its `numel` method.


In [43]:
x.numel()

12

We can access a tensor's *shape*
(the length along each axis)
by inspecting its `shape` attribute.
Because we are dealing with a vector here,
the `shape` contains just a single element,
which equals the total number of elements.


In [44]:
x.shape

torch.Size([12])

Tensors: Scenic Views of Storage


<img src="https://github.com/user-attachments/assets/ccb8c1c1-48b6-4da9-bed1-a9f76d393571" alt="tensorview Image" width="700" height="400">




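- **Storage Basics**:
  - A tensor's elements live in a contiguous block of memory managed by a storage object (`torch.UntypedStorage`, returned by `tensor.untyped_storage()`; older releases exposed typed `torch.Storage` classes).
  - The storage is a flat, one-dimensional buffer. In recent PyTorch versions it is untyped (a sequence of raw bytes), and the tensor's `dtype` (e.g., `float32`, `int64`) determines how those bytes are interpreted.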

- **Tensor as a View**:
  - A PyTorch `Tensor` is a view into a `Storage` instance.
  - Tensors index into storage using offsets and strides.

- **Multiple Views**:
  - Multiple tensors can reference the same `Storage`, even with different indexing.
  - This allows for creating different tensor views (e.g., 1D vs. 2D) of the same underlying data.

- **Efficiency**:
  - The underlying memory is allocated once, making it efficient to create different tensor views.


In [46]:
import torch

data = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
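# Inspect the storage: untyped_storage() exposes raw bytes, so 12 int64
# elements x 8 bytes = 96 bytes; on a little-endian machine each small value
# appears as its low byte followed by seven zero bytes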
storage = data.untyped_storage()
print("Length of storage:", len(storage))
print("Storage contents:", list(storage))


Length of storage: 96
Storage contents: [2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
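To see several tensors sharing one storage, the sketch below builds a copy of the same 3×4 matrix (named `mat` here so the notebook's `data` is left untouched) and takes two views of it. `stride()` and `storage_offset()` are the metadata that map each view's indices onto the shared buffer.

```python
import torch

mat = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])  # int64, like `data`
print(mat.element_size() * mat.numel())  # 96: 12 elements x 8 bytes

flat = mat.view(-1)        # 1D view of the same 12 elements
second_row = mat[1]        # 1D view that starts 4 elements into the storage
print(mat.stride(), flat.stride(), second_row.stride())  # (4, 1) (1,) (1,)
print(second_row.storage_offset())                       # 4

# All three tensors read the same memory, so a write through one is seen by all
flat[4] = 100
print(mat[1, 0].item(), second_row[0].item())            # 100 100
print(mat.untyped_storage().data_ptr() == second_row.untyped_storage().data_ptr())  # True
```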


## Indexing and Slicing

As with Python lists,
we can access tensor elements
by indexing (starting with 0).
To access an element based on its position
relative to the end of the list,
we can use negative indexing.
We can also access whole ranges of indices
via slicing (e.g., `X[start:stop]`),
where the returned value includes
the first index (`start`) *but not the last* (`stop`).
Finally, when only one index (or slice)
is specified for a $k^\textrm{th}$-order tensor,
it is applied along axis 0.
Thus, in the following code,
`[-1]` selects the last row and `[1:3]`
selects the second and third rows.


In [47]:
X = data

In [48]:
X[-1], X[1:3]

(tensor([4, 3, 2, 1]),
 tensor([[1, 2, 3, 4],
         [4, 3, 2, 1]]))

Beyond reading them, we can also *write* elements of a matrix by specifying indices.


In [49]:
X[1, 2] = 17
X

tensor([[ 2,  1,  4,  3],
        [ 1,  2, 17,  4],
        [ 4,  3,  2,  1]])

If we want to assign multiple elements the same value,
we apply the indexing on the left-hand side
of the assignment operation.
For instance, `[:2, :]` accesses
the first and second rows,
where `:` takes all the elements along axis 1 (columns).
While we discussed indexing for matrices,
this also works for vectors
and for tensors of more than two dimensions.


In [50]:
X[:2, :] = 12
X

tensor([[12, 12, 12, 12],
        [12, 12, 12, 12],
        [ 4,  3,  2,  1]])
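Slice assignment extends to higher-order tensors. The sketch below uses a hypothetical 2×3×4 tensor `T` and also shows Python's `...` (Ellipsis), which PyTorch indexing accepts as shorthand for "all remaining axes".

```python
import torch

T = torch.zeros(2, 3, 4)
T[:, 1, :] = 7   # row 1 of every 3x4 slice along axis 0
T[..., -1] = 1   # `...` covers the leading axes: the last column everywhere
T[0] = 5         # a single index applies along axis 0: the whole first 3x4 slice
print(T)
```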

We can change the shape of a tensor
without altering its size or values
by invoking `reshape`.
For example, we can transform
our vector `x` whose shape is (12,)
to a matrix `X` with shape (3, 4).
This new tensor retains all elements
but reconfigures them into a matrix.
Notice that the elements of our vector
are laid out one row at a time and thus
`x[3] == X[0, 3]`.


In [51]:
x

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11], dtype=torch.int16)

In [52]:
X = x.reshape(3, 4)
X

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]], dtype=torch.int16)

Note that specifying every shape component
to `reshape` is redundant.
Because we already know our tensor's size,
we can work out one component of the shape given the rest.
For example, given a tensor of size $n$
and target shape ($h$, $w$),
we know that $w = n/h$.
To automatically infer one component of the shape,
we can place a `-1` for the shape component
that should be inferred automatically.
In our case, instead of calling `x.reshape(3, 4)`,
we could have equivalently called `x.reshape(-1, 4)` or `x.reshape(3, -1)`.
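A short sketch of shape inference with `-1`, using a fresh vector `v` so the notebook's `x` is not modified. It also shows that `-1` only works when the known dimensions divide the total size evenly.

```python
import torch

v = torch.arange(12)
V1 = v.reshape(3, 4)
V2 = v.reshape(-1, 4)   # rows inferred: 12 / 4 = 3
V3 = v.reshape(3, -1)   # columns inferred: 12 / 3 = 4
print(V1.shape, V2.shape, V3.shape)                 # all torch.Size([3, 4])
print(torch.equal(V1, V2) and torch.equal(V1, V3))  # True
print(v[3] == V1[0, 3])                             # tensor(True): row-major layout

# 12 elements cannot be split into 5 equal rows, so inference fails
try:
    v.reshape(5, -1)
except RuntimeError as err:
    print("reshape failed:", err)
```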

Practitioners often need to work with tensors
initialized to contain all 0s or 1s.
We can construct a tensor with all elements set to 0
and a shape of (2, 3, 4) via the `zeros` function.


In [53]:
torch.zeros((2, 3, 4))

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

Similarly, we can create a tensor
with all 1s by invoking `ones`.


In [54]:
torch.ones((2, 3, 4))

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])
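`zeros` and `ones` return the default floating-point dtype unless told otherwise. A small sketch of requesting a dtype, copying shape and dtype from an existing tensor with the `*_like` variants (`zeros_like` reappears in the memory section below), and filling with an arbitrary constant via `full`; the shapes and the value 7 are illustrative.

```python
import torch

z = torch.zeros((2, 3, 4))
print(z.dtype)                               # torch.float32 (the default float dtype)

zi = torch.zeros((2, 3), dtype=torch.int64)  # request an integer dtype explicitly
o = torch.ones_like(zi)                      # same shape and dtype as zi, filled with 1s
f = torch.full((2, 2), 7.0)                  # any constant, not just 0 or 1
print(o.dtype, o.shape)                      # torch.int64 torch.Size([2, 3])
print(f)
```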

We often wish to
sample each element randomly (and independently)
from a given probability distribution.
For example, the parameters of neural networks
are often initialized randomly.
The following snippet creates a tensor
with elements drawn from
a standard Gaussian (normal) distribution
with mean 0 and standard deviation 1.


In [55]:
torch.randn(3, 4)

tensor([[ 1.0460,  1.2824, -0.8506, -0.5494],
        [ 0.2723,  0.1910,  1.1940,  0.8288],
        [ 0.7304, -0.0231, -1.7699,  0.9374]])
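The values above change on every run. When results need to be reproducible, a common approach is to fix the seed with `torch.manual_seed`; the sketch below (seed 0 is arbitrary) also checks that a large sample has mean and standard deviation close to 0 and 1.

```python
import torch

torch.manual_seed(0)
draw1 = torch.randn(3, 4)
torch.manual_seed(0)
draw2 = torch.randn(3, 4)
print(torch.equal(draw1, draw2))  # True: same seed, same draws on the same device

# With many samples, the empirical mean and std approach 0 and 1
samples = torch.randn(100_000)
print(samples.mean().item(), samples.std().item())
```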

Finally, we can construct tensors by
supplying the exact values for each element
in (possibly nested) Python list(s)
containing numerical literals.
Here, we construct a matrix with a list of lists,
where the outermost list corresponds to axis 0,
and the inner lists correspond to axis 1.


In [56]:
data = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
data

tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])

## Operations

- **Mathematical Operations**:
  - Tensors can be manipulated with various operations.
  - **Elementwise Operations** are a key type, applying scalar operations to each tensor element.

- **Elementwise Functions**:
  - Unary functions map a scalar to a scalar (e.g., $e^x$).
  - Apply elementwise to each tensor element.

- **Binary Operations**:
  - For functions with two tensor inputs, apply the operation to each pair of corresponding elements.

- **Notation**:
  - Unary operators: $f: \mathbb{R} \rightarrow \mathbb{R}$, mapping real numbers to real numbers.


PyTorch Tensor Operations

| Operation Group                          | Description                                             | Examples                   |
|------------------------------------------|---------------------------------------------------------|----------------------------|
| **Creation ops**                        | Functions for constructing tensors                     | `ones`, `from_numpy`       |
| **Indexing, slicing, joining, mutating ops** | Functions for modifying tensor shape, stride, or content | `transpose`                |
| **Math ops**                            | Functions for tensor computations                      |                            |
| - Pointwise ops                         | Apply a function to each element independently         | `abs`, `cos`               |
| - Reduction ops                         | Compute aggregate values across tensors                | `mean`, `std`, `norm`      |
| - Comparison ops                        | Evaluate numerical predicates over tensors             | `equal`, `max`             |
| - Spectral ops                          | Transform and operate in the frequency domain          | `stft`, `hamming_window`   |
| - Other operations                      | Special functions for vectors and matrices             | `cross`, `trace`           |
| - BLAS and LAPACK operations            | Scalar, vector, and matrix routines in the style of BLAS/LAPACK | `mm`, `matmul`             |
| **Random sampling**                     | Generate values from probability distributions          | `randn`, `normal`          |
| **Serialization**                       | Save and load tensors                                  | `load`, `save`             |
| **Parallelism**                         | Control number of threads for parallel CPU execution   | `set_num_threads`    |

The online documentation (http://pytorch.org/docs) provides an exhaustive, well-organized reference with examples.


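When an operation mixes input dtypes, PyTorch converts the inputs to a common dtype
using its type-promotion rules: a floating-point type wins over an integer type,
and the wider of two floating-point types wins (e.g., `float32` combined with `float64` gives `float64`).
Thus, if we want 32-bit computation, we need to make sure none of our floating-point inputs is 64-bit.
Unary floating-point functions such as `torch.exp` also accept integer tensors and return the
default floating-point dtype; for example, `torch.exp(x)` on the `int16` vector `x` returns `float32` values.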
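A minimal sketch of the promotion rules, using three illustrative tensors. `torch.result_type` reports the dtype an operation would produce without running it; note that a plain Python scalar does not upcast a tensor's dtype.

```python
import torch

i64 = torch.tensor([1, 2, 3])                             # int64
f32 = torch.tensor([0.5, 0.5, 0.5])                       # float32
f64 = torch.tensor([0.5, 0.5, 0.5], dtype=torch.float64)  # float64

print((i64 + f32).dtype)            # torch.float32: floating point beats integer
print((f32 + f64).dtype)            # torch.float64: the wider float wins
print((f32 * 2.5).dtype)            # torch.float32: a Python scalar does not upcast
print(torch.result_type(i64, f64))  # torch.float64
```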

In [57]:
torch.exp(x)

tensor([1.0000e+00, 2.7183e+00, 7.3891e+00, 2.0086e+01, 5.4598e+01, 1.4841e+02,
        4.0343e+02, 1.0966e+03, 2.9810e+03, 8.1031e+03, 2.2026e+04, 5.9874e+04])

Binary Scalar Operators

- **Binary Operators**:
  - Map pairs of real numbers to a single real number.
  - Notation: $f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}$.

- **Elementwise Binary Operation**:
  - Given two vectors $\mathbf{u}$ and $\mathbf{v}$ of the same shape, and a binary operator $f$:
    - Produce a vector $\mathbf{c} = F(\mathbf{u}, \mathbf{v})$.
    - Each element $c_i$ is computed as $c_i \gets f(u_i, v_i)$, where $c_i$, $u_i$, and $v_i$ are the $i$-th elements of $\mathbf{c}$, $\mathbf{u}$, and $\mathbf{v}$, respectively.

- **Function Lifting**:
  - Scalar function $f$ is "lifted" to operate elementwise on vectors.
  - Notation: $F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d$, extending the scalar function to vectors.

- **Standard Arithmetic Operators**:
  - Common operations like addition (`+`), subtraction (`-`), multiplication (`*`), division (`/`), and exponentiation (`**`) are lifted to handle elementwise operations on tensors of the same shape.


In [58]:
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x ** y

(tensor([ 3.,  4.,  6., 10.]),
 tensor([-1.,  0.,  2.,  6.]),
 tensor([ 2.,  4.,  8., 16.]),
 tensor([0.5000, 1.0000, 2.0000, 4.0000]),
 tensor([ 1.,  4., 16., 64.]))

Tensor Operations

- **Elementwise Computations**:
  - Perform operations like addition and multiplication on corresponding elements of tensors.
  - For details on linear algebraic operations (e.g., dot products, matrix multiplications), see the linear algebra section.

- **Concatenation**:
  - Combine multiple tensors by joining them along an existing axis.
  - Provide a list of tensors and specify the axis for concatenation.
  - Example: Concatenating two matrices along different axes:
    - **Along rows (axis 0)**: The resulting tensor's axis-0 length is the sum of the input tensors' axis-0 lengths.
    - **Along columns (axis 1)**: The resulting tensor's axis-1 length is the sum of the input tensors' axis-1 lengths.



In [59]:
X = torch.arange(12, dtype=torch.float32).reshape((3,4))

#   X = ([[ 0.,  1.,  2.,  3.],
#         [ 4.,  5.,  6.,  7.],
#         [ 8.,  9., 10., 11.]])

Y = torch.tensor([[2.0, 1, 4, 3],
                  [1  , 2, 3, 4],
                  [4  , 3, 2, 1]])

torch.cat((X, Y), dim=0), torch.cat((X, Y), dim=1)

(tensor([[ 0.,  1.,  2.,  3.],
         [ 4.,  5.,  6.,  7.],
         [ 8.,  9., 10., 11.],
         [ 2.,  1.,  4.,  3.],
         [ 1.,  2.,  3.,  4.],
         [ 4.,  3.,  2.,  1.]]),
 tensor([[ 0.,  1.,  2.,  3.,  2.,  1.,  4.,  3.],
         [ 4.,  5.,  6.,  7.,  1.,  2.,  3.,  4.],
         [ 8.,  9., 10., 11.,  4.,  3.,  2.,  1.]]))
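`torch.cat` joins along an axis that already exists, so the inputs must agree on every other axis. The related `torch.stack` instead creates a new axis. A sketch with two illustrative 3×4 tensors (named `M1`, `M2` so the notebook's `X` and `Y` are untouched):

```python
import torch

M1 = torch.arange(12, dtype=torch.float32).reshape((3, 4))
M2 = torch.ones((3, 4))

print(torch.cat((M1, M2), dim=0).shape)    # torch.Size([6, 4]): joined along existing axis 0
print(torch.cat((M1, M2), dim=1).shape)    # torch.Size([3, 8]): joined along existing axis 1
print(torch.stack((M1, M2), dim=0).shape)  # torch.Size([2, 3, 4]): a new axis 0 is created

# cat needs matching sizes on every axis except the one being joined
try:
    torch.cat((M1, torch.ones((2, 5))), dim=0)
except RuntimeError as err:
    print("cat failed:", err)
```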

Sometimes, we want to
construct a binary tensor via *logical statements*.
Take `X == Y` as an example.
For each position `i, j`, if `X[i, j]` and `Y[i, j]` are equal,
then the corresponding entry in the result is `True`,
otherwise it is `False` (the result has dtype `torch.bool`).


In [60]:
X == Y

tensor([[False,  True, False,  True],
        [False, False, False, False],
        [False, False, False, False]])

Summing all the elements in the tensor yields a tensor with only one element.


In [61]:
X.sum()

tensor(66.)
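Sums can also be taken along a single axis. A short sketch on an illustrative 3×4 matrix with the same values as `X` above: `dim` selects the axis that is collapsed, and `keepdim=True` keeps it with length 1, which is convenient for broadcasting later.

```python
import torch

grid = torch.arange(12, dtype=torch.float32).reshape(3, 4)
print(grid.sum())                            # tensor(66.): all elements
print(grid.sum(dim=0))                       # column sums: 12, 15, 18, 21
print(grid.sum(dim=1))                       # row sums: 6, 22, 38
print(grid.sum(dim=1, keepdim=True).shape)   # torch.Size([3, 1])
```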

Tensor metadata: Size, offset, and stride


- **Transposing**:
  - `points.t()` transposes a tensor without copying data.
  - Transposing changes strides but uses the same storage.

- **Storage and Strides**:
  - Transposed tensors share the same storage as the original.
  - Example: `points.stride()` vs. `points_t.stride()`

- **Higher-Dimensional Transposing**:
  - Apply `transpose(dim1, dim2)` to higher-dimensional tensors.

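- **Contiguous Tensors**:
  - A contiguous tensor stores its elements in row-major order matching its shape, with no gaps.
  - Use `tensor.is_contiguous()` to check and `tensor.contiguous()` to obtain a contiguous version (it returns the tensor itself if it is already contiguous).

- **Moving to GPU**:
  - Tensors can be moved to a GPU with `tensor.to('cuda')` (or `tensor.cuda()`); operations on them then run as GPU kernels.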
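The bullets above can be sketched with a small illustrative `points` tensor (the values are made up). Transposing swaps the strides without touching the storage, `contiguous()` copies into row-major order, and the GPU move falls back to the CPU when CUDA is not available.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])  # illustrative 3x2 data
points_t = points.t()                        # 2x3 transpose, no data copied

print(points.stride(), points_t.stride())    # (2, 1) (1, 2): strides are swapped
print(points.untyped_storage().data_ptr() == points_t.untyped_storage().data_ptr())  # True
print(points.is_contiguous(), points_t.is_contiguous())  # True False

points_t_cont = points_t.contiguous()        # copies into row-major order
print(points_t_cont.stride())                # (3, 1)

# transpose(dim0, dim1) generalizes .t() to any two axes
cube = torch.ones(3, 4, 5)
print(cube.transpose(0, 2).shape)            # torch.Size([5, 4, 3])

# Move to a GPU when one is available; otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
points_dev = points.to(device)
print(points_dev.device)
```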



Contiguity and Tensor Views

<img src="https://github.com/user-attachments/assets/c50a0940-f8c4-424e-b293-fcd6c28f11b1" alt="tensorview Image" width="700" height="400">

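Relationship between a tensor’s offset, size, and stride. Here the tensor is a view
of a larger storage, like one that might have been allocated when creating a larger tensor.
The offset is the storage index of the tensor's first element, the size gives the number of elements along each dimension,
and the stride gives how many storage elements to skip to move one step along each dimension.
For a 2D tensor, element `[i, j]` therefore lives at storage index `offset + i * stride[0] + j * stride[1]`.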

- **Tensor View**:
  - Provides a different shape or indexing perspective on the same data.
  - Created by operations like slicing or transposing.

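- **Contiguous Tensor**:
  - Elements are laid out in storage in row-major order for the tensor's shape, with no gaps.
  - Efficient for memory access, since iterating along the last axis reads consecutive addresses.

- **Non-Contiguous Tensor**:
  - Elements are not in that order, typically because strides were changed (e.g., by a transpose) or elements were skipped (e.g., by a strided slice), even though they may still live in a single storage block.

- **Views and Contiguity**:
  - A view can be contiguous or non-contiguous depending on its strides.
  - Some operations require compatible strides; for example, `view()` fails on a non-contiguous tensor whose strides cannot express the requested shape.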

- **Making Contiguous**:
  - Use `tensor.contiguous()` to create a contiguous version of a non-contiguous tensor.
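To make offset, size, and stride concrete, the sketch below takes a 2×2 window out of a hypothetical 3×4 tensor `base`, checks the address formula from the figure caption, and shows `view()` failing on the non-contiguous window until `contiguous()` copies it.

```python
import torch

base = torch.arange(12).reshape(3, 4)   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
sub = base[1:, 1:3]                     # view: [[5, 6], [9, 10]]

print(sub.size(), sub.storage_offset(), sub.stride())  # torch.Size([2, 2]) 5 (4, 1)
print(sub.is_contiguous())              # False: row 0 ends at 6, row 1 starts at 9

# Element [1, 1] sits at storage index offset + 1 * stride[0] + 1 * stride[1] = 10
idx = sub.storage_offset() + 1 * sub.stride(0) + 1 * sub.stride(1)
print(base.view(-1)[idx].item(), sub[1, 1].item())     # 10 10

# view() needs strides compatible with the new shape, so it fails here
try:
    sub.view(-1)
except RuntimeError as err:
    print("view failed:", err)

flat_copy = sub.contiguous().view(-1)   # copy into a contiguous block first
print(flat_copy.tolist())               # [5, 6, 9, 10]
```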

## Broadcasting

<img src="https://github.com/user-attachments/assets/fd4078db-00ba-4d63-be80-103a149bc1e4" alt="tensorbroadcast Image" width="700" height="600">

- **Elementwise Operations**:
  - Typically performed on tensors of the same shape.

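- **Broadcasting Mechanism**:
  - **Step 1**: Align the shapes starting from the trailing (rightmost) axis; a missing leading axis counts as length 1. Each pair of sizes must be equal, or one of them must be 1.
  - **Step 2**: Conceptually expand every axis of length 1 by copying its elements until the shapes match (PyTorch does this without actually copying data).
  - **Step 3**: Perform the elementwise operation on the expanded tensors.

- **Purpose**: Allows elementwise operations on tensors with different but compatible shapes.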



**Key Scenarios and Benefits of Broadcasting in Neural Networks**

| **Scenario**             | **Usage**                                                | **Benefit**                                              |
|--------------------------|----------------------------------------------------------|----------------------------------------------------------|
| **Elementwise Operations** | Adding a bias vector to a batch of activations | Expands the bias across the batch automatically, avoiding manual reshaping |
| **Normalization**         | Normalizing data with batch statistics (e.g., batch normalization) | Applies per-feature means and standard deviations across the whole batch without explicit reshaping |
| **Loss Functions**        | Comparing predictions to ground truths elementwise (e.g., squared error) | Handles compatible shapes directly; check shapes carefully, since an unintended broadcast such as `(N, 1)` vs. `(N,)` silently yields an `(N, N)` result |
| **Convolution Operations** | Adding a per-channel bias to feature maps | A bias of shape `(1, C, 1, 1)` broadcasts over the batch and spatial axes of an `(N, C, H, W)` tensor |
| **Parameter Updates**     | Scaling gradients by a learning rate during optimization | A scalar learning rate broadcasts over parameter tensors of any shape |
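A sketch of three of the scenarios in the table, with made-up shapes: a per-feature bias on a batch, a per-channel bias on a 4D feature map, and per-feature normalization where `keepdim=True` keeps the statistics broadcastable. These are illustrations of the broadcasting pattern, not the internals of any PyTorch layer.

```python
import torch

# Hypothetical batch of 4 examples with 3 features each, plus one bias per feature
acts = torch.zeros(4, 3)
bias = torch.tensor([0.1, 0.2, 0.3])
biased = acts + bias             # bias (3,) acts like (1, 3), repeated over the 4 rows
print(biased.shape)              # torch.Size([4, 3])

# Hypothetical (batch, channels, height, width) feature map with a per-channel bias
fmap = torch.zeros(2, 3, 5, 5)
ch_bias = torch.tensor([1.0, 2.0, 3.0]).view(1, 3, 1, 1)
shifted = fmap + ch_bias         # each channel gets its own constant
print(shifted[:, 1].unique())    # tensor([2.])

# Per-feature normalization with batch statistics
x_batch = torch.randn(8, 3)
normed = (x_batch - x_batch.mean(dim=0, keepdim=True)) / x_batch.std(dim=0, keepdim=True)
print(normed.shape)              # torch.Size([8, 3])
```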



In [64]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b

(tensor([[0],
         [1],
         [2]]),
 tensor([[0, 1]]))

Since `a` and `b` are $3\times1$
and $1\times2$ matrices, respectively,
their shapes do not match up.
Broadcasting produces a larger $3\times2$ matrix
by replicating matrix `a` along the columns
and matrix `b` along the rows
before adding them elementwise.


In [65]:
a + b

tensor([[0, 1],
        [1, 2],
        [2, 3]])
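The alignment rule can be checked without doing any arithmetic via `torch.broadcast_shapes`, and `expand` shows that the "copying" is virtual: the repeated axis gets stride 0. A sketch using fresh tensors `col` and `row` shaped like `a` and `b` above:

```python
import torch

col = torch.arange(3).reshape((3, 1))
row = torch.arange(2).reshape((1, 2))

# Shapes are compared from the trailing axis; sizes must match or one must be 1
print(torch.broadcast_shapes(col.shape, row.shape))   # torch.Size([3, 2])
print(torch.broadcast_shapes((5, 1, 4), (3, 1)))      # torch.Size([5, 3, 4])

# expand() returns a view with stride 0 on the repeated axis: no data is copied
expanded = col.expand(3, 2)
print(expanded.stride())                              # (1, 0)

# Incompatible trailing sizes (4 vs 3) cannot be broadcast
try:
    torch.ones(2, 4) + torch.ones(2, 3)
except RuntimeError as err:
    print("broadcast failed:", err)
```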

## Saving Memory

- **Memory Allocation**:
  - Operations may allocate new memory for results.
  - Example: `Y = Y + X` creates new memory for the result.

- **Memory Address**:
  - Python's `id()` returns an object's identity (in CPython, its memory address).
  - After `Y = Y + X`, `id(Y)` changes, indicating `Y` now points to new memory.

- **Explanation**:
  - Python evaluates `Y + X`, allocates new memory, and updates `Y` to point to this new location.


In [66]:
before = id(Y)
Y = Y + X
id(Y) == before

False

Issues with Unnecessary Memory Allocation

- **Memory Allocation Concerns**:
  - Frequent allocation of new memory can be inefficient.
  - In machine learning, large models update parameters rapidly, making in-place updates desirable.

- **In-Place Updates**:
  - Reduces memory usage by updating existing data rather than allocating new memory.
  - Essential for handling large parameters efficiently.

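- **Multiple References**:
  - Variables may point to the same parameters.
  - If one variable is rebound to a newly allocated tensor, the other references still point to the old values, which risks stale data and keeps the old memory alive.

- **Best Practice**:
  - Prefer in-place operations where safe, to avoid unnecessary memory allocation and keep all references consistent. (Autograd may raise an error if an in-place operation overwrites a tensor it needs for computing gradients.)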


Performing In-Place Operations

- **In-Place Operations**:
  - Assign results directly into an existing tensor using slice notation: `Y[:] = <expression>`.

- **Example**:
  - Create a tensor `Z` with the same shape and dtype as `Y` using `zeros_like`, then overwrite its values with `Z[:] = X + Y`.

- **Benefit**:
  - `Z` keeps its existing memory instead of being rebound to a new tensor. Note that `X + Y` is still computed into a temporary tensor before being copied into `Z`.


In [67]:
Z = torch.zeros_like(Y)
print('id(Z):', id(Z))
Z[:] = X + Y
print('id(Z):', id(Z))

id(Z): 134659203321984
id(Z): 134659203321984
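Two ways to avoid even the temporary created by `X + Y`: the `out=` argument that PyTorch math functions such as `torch.add` accept, and in-place methods whose names end in an underscore. The sketch uses fresh tensors `P`, `Q`, and `out` (illustrative names) and checks `data_ptr()`, the address of the data buffer, which is a more direct test than `id()` of the Python object.

```python
import torch

P = torch.arange(12, dtype=torch.float32).reshape(3, 4)
Q = torch.ones(3, 4)
out = torch.zeros_like(Q)
ptr = out.data_ptr()             # address of out's data buffer

torch.add(P, Q, out=out)         # writes P + Q directly into out
out.mul_(2)                      # trailing underscore: in-place method
print(out.data_ptr() == ptr)     # True: same buffer throughout
print(out[0, 0].item(), out[2, 3].item())  # 2.0 24.0
```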


If the value of `X` is not reused in subsequent computations,
we can also use `X[:] = X + Y` or `X += Y`
to reduce the memory overhead of the operation.


In [68]:
before = id(X)
X += Y
id(X) == before

True

## Conversion to Other Python Objects


Converting to a NumPy array (`ndarray`), or vice versa, is easy.
For CPU tensors, the torch tensor and NumPy array
will share their underlying memory,
and changing one through an in-place operation
will also change the other.


In [69]:
A = X.numpy()
B = torch.from_numpy(A)
type(A), type(B)

(numpy.ndarray, torch.Tensor)
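The shared-memory behavior can be observed directly. A sketch with an illustrative CPU tensor `t`: `numpy()` and `torch.from_numpy()` share memory, while `torch.tensor()` always copies its input.

```python
import numpy as np
import torch

t = torch.arange(4, dtype=torch.float32)
arr = t.numpy()                  # shares memory with t (CPU tensors only)
t.add_(1)                        # in-place change to the tensor...
print(arr)                       # ...is visible in the array: [1. 2. 3. 4.]

back = torch.from_numpy(arr)     # shares memory with arr (and therefore with t)
arr[0] = 100.0
print(back[0].item())            # 100.0

copied = torch.tensor(arr)       # torch.tensor() always copies
arr[1] = -1.0
print(copied[1].item())          # 2.0: the copy is unaffected
print(isinstance(arr, np.ndarray))  # True
```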

To convert a size-1 tensor to a Python scalar,
we can invoke the `item` method or Python's built-in functions such as `float` and `int` (note that `int` truncates toward zero).


In [70]:
a = torch.tensor([3.5])
a, a.item(), float(a), int(a)

(tensor([3.5000]), 3.5, 3.5, 3)

## Summary

The tensor class is the main interface for storing and manipulating data in deep learning libraries.
Tensors provide a variety of functionalities including construction routines; indexing and slicing; basic mathematical operations; broadcasting; memory-efficient assignment; and conversion to and from other Python objects.

