# Appendix A: The Little Book of Tensors
This notebook contains the listings for Appendix A, which introduces the essential properties of tensors and the operations you can perform on them.

# Listing A-1 Properties of Tensors
Tensors expose many properties that describe their structure, layout, data type, device placement, and relationship to automatic differentiation.

Here is a simple way to print many of them:

In [None]:
import torch

tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)

print("Shape:", tensor.shape)
print("Rank (Number of Dimensions):", tensor.dim())
print("Number of Elements:", tensor.numel())
print("Data Type:", tensor.dtype)
print("Device:", tensor.device)
print("Strides:", tensor.stride())
print("Requires Gradient:", tensor.requires_grad)
print("Gradient Function:", tensor.grad_fn)
print("Is Contiguous:", tensor.is_contiguous())
print("Element Size (bytes):", tensor.element_size())
print("Storage Offset:", tensor.storage_offset())
print("Data Pointer:", tensor.data_ptr())
print("Layout:", tensor.layout)
print("Is Sparse:", tensor.is_sparse)
print("Is Quantized:", tensor.is_quantized)
print("Is CUDA:", tensor.is_cuda)
print("Is Pinned:", tensor.is_pinned())
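
Shape, strides, and storage offset together determine where each element sits in the underlying storage. The plain-Python sketch below illustrates that relationship; it is not PyTorch's implementation, and the helper names `contiguous_strides` and `flat_offset` are hypothetical.

```python
def contiguous_strides(shape):
    """Row-major (C-order) strides, counted in elements rather than bytes."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def flat_offset(index, strides, storage_offset=0):
    """Position of `index` in the flat storage, counted in elements."""
    return storage_offset + sum(i * s for i, s in zip(index, strides))


shape = (2, 2)
strides = contiguous_strides(shape)

# The 2x2 tensor from the listing, laid out row by row in flat storage.
storage = [1.0, 2.0, 3.0, 4.0]

print("Strides:", strides)
print("Element [1, 0]:", storage[flat_offset((1, 0), strides)])
```

For the contiguous 2x2 tensor in the listing, `tensor.stride()` reports the same `(2, 1)` that this sketch computes: moving one row down skips two elements, and moving one column right skips one.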

# Listing A-2 Preparing Token Sequences with Indexing, Slicing, and Masking
This example prepares a batch of tokenized sentences for a language model that uses padding.

In [None]:
import torch

# Real-world example: preparing a batch of tokenized sentences
# for a language model that uses padding.

# tokens.shape = (batch_size, seq_len)
# 0 is the padding token.
tokens = torch.tensor([
    [101, 2009, 2003, 1037, 2154,   0,   0,   0],  # "[CLS] It is a nice [PAD] [PAD] [PAD]"
    [101, 1045, 2293, 3679, 3185, 102,   0,   0],  # "[CLS] I love reading books [SEP] [PAD] [PAD]"
])
print("tokens:\n", tokens)

PAD_ID = 0

# Indexing: pick specific samples
# Example: select the second sentence from the batch
second_sentence = tokens[1]
print("\nSecond sentence (indexing):\n", second_sentence)

# Slicing: pick ranges (time steps)
# Example: model only uses the first 5 tokens of each sentence
first_five_tokens = tokens[:, :5]
print("\nFirst 5 tokens of each sentence (slicing):\n", first_five_tokens)

# Masking: ignore padding tokens
# Create a mask where True means "real token" and False means "padding"
non_pad_mask = tokens != PAD_ID
print("\nNon-padding mask (masking):\n", non_pad_mask)

# Use the mask to gather all real tokens into a flat 1-D tensor
# (boolean indexing returns a copy, not a view)
real_tokens = tokens[non_pad_mask]
print("\nAll real tokens (flattened, padding removed):\n", real_tokens)

# In practice, the non_pad_mask is also used to:
# - compute loss only on real tokens
# - build attention masks for Transformer models
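
The closing comments mention computing loss only on real tokens. A minimal plain-Python sketch of that idea follows. The per-token loss values are made up for illustration, and `masked_mean` is a hypothetical helper, not a PyTorch function.

```python
PAD_ID = 0

tokens = [
    [101, 2009, 2003, 1037, 2154, 0, 0, 0],
    [101, 1045, 2293, 3679, 3185, 102, 0, 0],
]

# Hypothetical per-token losses, one per position in `tokens`.
# Padding positions get a large value (9.0) to make their effect visible.
token_losses = [
    [0.5, 1.0, 1.5, 2.0, 1.0, 9.0, 9.0, 9.0],
    [1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 9.0, 9.0],
]

# True marks a real token, False marks padding (same meaning as non_pad_mask).
mask = [[tok != PAD_ID for tok in row] for row in tokens]


def masked_mean(values, mask):
    """Average `values` over the positions where `mask` is True."""
    kept = [v
            for v_row, m_row in zip(values, mask)
            for v, m in zip(v_row, m_row)
            if m]
    return sum(kept) / len(kept)


naive_loss = sum(sum(row) for row in token_losses) / 16
masked_loss = masked_mean(token_losses, mask)

print("Mean over all 16 positions:", naive_loss)
print("Mean over the 11 real tokens:", masked_loss)
```

Averaging over every position lets the padding dominate the result, while the masked average reflects only the 11 real tokens.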


# Listing A-3 Concatenating and Splitting Features in a Real Model Workflow
Concatenation and splitting appear throughout deep learning workflows.

Concatenation is commonly used when combining feature representations from different sources. Examples include merging image features with tabular metadata, stitching together embeddings from multiple encoders, or joining sequence segments into a single input for a model.

Splitting is used for the opposite purpose. It divides data into manageable parts. Common uses include creating micro-batches when memory is limited, dividing long sequences into shorter windows, and splitting inputs for parallel or distributed processing.

The following example shows both operations in a realistic setting.



In [None]:
import torch

# Image features from a CNN
# Shape: (batch, image_feature_dim)
image_features = torch.randn(12, 128)

# Metadata features
# Shape: (batch, metadata_feature_dim)
metadata = torch.randn(12, 3)

# Concatenate features along the feature dimension
combined = torch.cat((image_features, metadata), dim=1)
print("Combined feature shape:", combined.shape)
# (12, 131)

# Split into micro-batches of size 4
micro_batches = torch.split(combined, 4)

print("Number of micro-batches:", len(micro_batches))
for i, mb in enumerate(micro_batches):
    print(f"Micro-batch {i} shape:", mb.shape)
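
With an integer argument, `torch.split` produces chunks of that many rows, and the final chunk is smaller when the batch size is not a multiple of the chunk size. The plain-Python sketch below mirrors both operations on nested lists so the shape arithmetic is easy to follow. `concat_features` and `split_rows` are hypothetical helpers written for this illustration.

```python
def concat_features(left, right):
    """Join two per-sample feature lists along the feature dimension."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def split_rows(rows, chunk_size):
    """Consecutive chunks of up to `chunk_size` rows; the last may be shorter."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


# Stand-ins for the (12, 128) image features and (12, 3) metadata.
image_features = [[0.0] * 128 for _ in range(12)]
metadata = [[1.0] * 3 for _ in range(12)]

combined = concat_features(image_features, metadata)
print("Combined shape:", (len(combined), len(combined[0])))

chunks_of_4 = split_rows(combined, 4)
chunks_of_5 = split_rows(combined, 5)
print("Chunk sizes for 4:", [len(c) for c in chunks_of_4])
print("Chunk sizes for 5:", [len(c) for c in chunks_of_5])
```

A chunk size of 4 divides the batch of 12 evenly into three micro-batches. A chunk size of 5 leaves a final micro-batch of only 2 samples, which matters for code that assumes every micro-batch has the same size.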


# Listing A-4 Matrix Operations in a Simple Forward Pass
Matrix operations appear throughout a model’s forward pass. Inputs are reshaped, dimensions are reordered, and values are combined through matrix multiplication to produce intermediate results and final outputs. While each operation is simple on its own, their repeated composition is what enables models to learn useful structure from data.

The following example shows how these operations appear inside a simple model:

In [None]:
import torch
import torch.nn as nn

# A tiny example: classify a 4×4 grayscale image
image = torch.randn(1, 1, 4, 4)   # (batch, channels, height, width)

flatten = nn.Flatten()            # reshapes (1, 1, 4, 4) → (1, 16)
classifier = nn.Linear(16, 3)     # matrix multiplication + bias

x = flatten(image)
logits = classifier(x)
print("Logits:", logits)
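
`nn.Linear` stores its weight with shape `(out_features, in_features)` and computes `y = x @ W.T + b`. The sketch below spells out that arithmetic in plain Python for a single 4×4 image. The weights and bias are hand-picked, hypothetical values chosen so the logits are easy to verify; a real layer would start from random initial values.

```python
def flatten(image):
    """Flatten one (channels, height, width) image into a flat list, row by row."""
    return [v for channel in image for row in channel for v in row]


def linear(x, weight, bias):
    """y[j] = sum_i x[i] * weight[j][i] + bias[j]; weight has shape (out, in)."""
    return [sum(xi * wi for xi, wi in zip(x, w_row)) + b
            for w_row, b in zip(weight, bias)]


# One grayscale 4x4 image whose pixels hold the values 0..15.
image = [[[float(r * 4 + c) for c in range(4)] for r in range(4)]]
x = flatten(image)

# Hypothetical parameters for 3 output classes.
weight = [
    [1.0] * 16,            # class 0: sums every pixel
    [0.0] * 16,            # class 1: ignores the input
    [1.0] + [0.0] * 15,    # class 2: looks at the first pixel only
]
bias = [0.0, 0.5, -1.0]

logits = linear(x, weight, bias)
print("Flattened length:", len(x))
print("Logits:", logits)
```

Each logit is a dot product between the flattened image and one row of the weight matrix, plus that row's bias term.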

# Listing A-5 Fourier Transforms and Gradient Computation in Practice
Fourier transforms appear in audio processing, imaging, and scientific analysis. Audio models often convert raw waveforms into frequency representations before feeding them to a neural network. Similar techniques are used in medical imaging and vibration analysis to detect repeating patterns or anomalies.

Gradient computations drive the training process for neural networks. Every training step relies on gradients to update model parameters, and automatic differentiation makes it practical to define complex models without manually deriving equations.
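
For a model with a single weight, the gradient can still be derived by hand, which gives a useful check on what automatic differentiation computes. The sketch below assumes a mean-squared-error loss, L(w) = mean((x·w − y)²), whose derivative is dL/dw = mean(2·(x·w − y)·x), and compares that formula with a finite-difference estimate. The data values and helper names are illustrative.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the one-weight model y_hat = x * w."""
    return sum((x * w - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Hand-derived dL/dw = mean(2 * (x * w - y) * x)."""
    return sum(2 * (x * w - y) * x for x, y in zip(xs, ys)) / len(xs)


def numeric_grad(w, xs, ys, eps=1e-6):
    """Central finite-difference estimate of dL/dw."""
    return (mse_loss(w + eps, xs, ys) - mse_loss(w - eps, xs, ys)) / (2 * eps)


xs = [1.0, 2.0]
ys = [2.0, 4.0]

# w = 2 fits the data exactly, so both the loss and the gradient are zero.
print("Gradient at w=2:", mse_grad(2.0, xs, ys))

# w = 3 overshoots, so the gradient is positive and points back toward w = 2.
print("Gradient at w=3:", mse_grad(3.0, xs, ys))
print("Finite-difference check at w=3:", numeric_grad(3.0, xs, ys))
```

Automatic differentiation produces the same numbers without anyone writing `mse_grad` by hand, which is what makes it workable for models with millions of parameters.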

The following example shows two realistic uses of these operations: computing a frequency representation of a short signal, and computing gradients for a simple linear regression model with explicit values.


In [None]:
import torch
import torch.nn.functional as F

# Example 1: Frequency analysis of a short signal
signal = torch.tensor([0.0, 1.0, 0.0, -1.0])
spectrum = torch.fft.fft(signal)
print("Frequency spectrum:", spectrum)

# Example 2: Gradients for a tiny regression model
weights = torch.tensor([[2.0]], requires_grad=True)
inputs  = torch.tensor([[1.0], [2.0]])
targets = torch.tensor([[2.0], [4.0]])

preds = inputs @ weights          # predictions: [[2.0], [4.0]]
loss = F.mse_loss(preds, targets) # zero loss here, so the gradient is zero too

loss.backward()
print("Gradient on weights:", weights.grad)
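
The spectrum printed in Example 1 can be checked against the definition of the discrete Fourier transform, X[k] = Σ x[n]·exp(−2πi·k·n/N), which is the unnormalized forward convention that `torch.fft.fft` uses by default. The plain-Python sketch below is a direct O(N²) evaluation of that sum for illustration only; `dft` is a hypothetical helper, and real code should use the FFT.

```python
import cmath


def dft(signal):
    """Direct evaluation of X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal))
        for k in range(n)
    ]


signal = [0.0, 1.0, 0.0, -1.0]
spectrum = dft(signal)

for k, value in enumerate(spectrum):
    print(f"bin {k}: magnitude {abs(value):.3f}")
```

Up to floating-point rounding, the result is `[0, -2j, 0, 2j]`. The signal is one cycle of a sine wave sampled four times, so all of its energy lands in bins 1 and 3 and the other bins are zero.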

# Listing A-6 Using Sparse, Quantized, and Named Tensors in a Recommendation Workflow
Advanced tensor properties matter in systems that operate at scale or under resource constraints. Sparse layouts reduce memory and computation when data contains many zeros. Quantized tensors allow models to run efficiently on hardware with limited memory or bandwidth. The .nonzero() method returns the indices of a tensor's non-zero entries, which gives direct access to the meaningful values. Named tensors improve correctness and maintainability in complex, multi-dimensional pipelines.

The following code demonstrates how these properties appear in a recommendation workflow. Sparse tensors represent user–item interactions efficiently. Dense model outputs derived from these interactions can be quantized for deployment. Named dimensions clarify tensor structure throughout the pipeline.


In [None]:
import torch

# 1. Sparse user–item interaction matrix
indices = torch.tensor([[0, 1, 2],
                        [1, 3, 4]])
values = torch.tensor([5.0, 3.0, 4.0])

interactions = torch.sparse_coo_tensor(indices, values, size=(3, 5))
print("Is sparse:", interactions.is_sparse)

# 2. Identify active interactions
active_positions = interactions.coalesce().indices().t()
print("Active user–item pairs:\n", active_positions)

# 3. Dense model output (random values stand in for real model scores)
dense_output = torch.randn(3, 5)

# Quantize dense output for deployment
qoutput = torch.quantize_per_tensor(dense_output, scale=0.1,
                                    zero_point=0, dtype=torch.qint8)
print("Is quantized:", qoutput.is_quantized)

# 4. Use named tensors for clarity
#    (a prototype PyTorch feature; expect a UserWarning when it runs)
named_output = dense_output.refine_names('user', 'item')
print("Named tensor dims:", named_output.names)
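
Two of the ideas in this listing reduce to simple arithmetic. A COO sparse tensor stores only (row, column, value) triples, and per-tensor quantization maps each float to an 8-bit integer through a scale and a zero point. The plain-Python sketch below assumes the simplified affine scheme q = clamp(round(x / scale) + zero_point, −128, 127); PyTorch's kernels may differ in rounding details, and all three helper names are hypothetical.

```python
def coo_to_dense(indices, values, size):
    """Expand COO (row indices, column indices, values) into a dense nested list."""
    rows, cols = size
    dense = [[0.0] * cols for _ in range(rows)]
    for r, c, v in zip(indices[0], indices[1], values):
        dense[r][c] += v   # duplicate coordinates add up, as coalescing does
    return dense


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to a signed 8-bit integer code (simplified affine scheme)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover an approximate float from its integer code."""
    return (q - zero_point) * scale


# Same indices, values, and size as the listing.
indices = [[0, 1, 2], [1, 3, 4]]
values = [5.0, 3.0, 4.0]
dense = coo_to_dense(indices, values, size=(3, 5))
print("Stored entries:", len(values), "of", 3 * 5)

# Quantize a few sample scores with the listing's scale and zero point.
for x in (0.26, -1.04, 20.0):
    q = quantize(x, scale=0.1, zero_point=0)
    print(x, "->", q, "->", dequantize(q, scale=0.1, zero_point=0))
```

The sparse form keeps 3 values instead of 15. With a scale of 0.1, quantization rounds each score to the nearest tenth, and values outside roughly ±12.8 saturate at the ends of the int8 range, as 20.0 does here.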
