# Tensors and Basic Operations 

## Tensors 
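A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, deep learning frameworks represent tensors as n-dimensional arrays of a base data type (listed under `tf.dtypes` in TensorFlow and as `torch.dtype` values in PyTorch).

In TensorFlow, all tensors are immutable like Python numbers and strings: you can never update the contents of a tensor, only create a new one. PyTorch tensors, by contrast, can be modified in place (for example with `x[0] = 1` or `x.add_(1)`).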



### Framework Used

**Note:** This notebook uses **PyTorch** to implement tensor operations and deep learning concepts. PyTorch was chosen for its simplicity, dynamic computation graph, and ease of debugging, which make it well suited to learning and experimentation.


**Data types include**: float32, int32, bool, and others. (TensorFlow also provides a string type; PyTorch has no string dtype.)

**Shape**: The number of elements along each axis of the tensor.

Just like vectors and matrices, tensors support operations such as addition, subtraction, dot product, and cross product.

### Scalar
- A tensor with **zero dimensions**
- Represents a single numerical value
- A scalar contains a single value, and no "axes".

Example:  
x = 5


In [1]:
import torch

scalar = torch.tensor(5)
print(scalar.shape)  # a scalar has no axes, so its shape is empty


torch.Size([])


### Vector
- A tensor with **one dimension**
- Represents a list of numbers
-  A vector has one axis

Example:  
x = [1, 2, 3]
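
The Vector section has no code cell of its own, so here is a minimal sketch in the same style as the scalar cell. The variable name `vector` is chosen only for illustration.

```python
import torch

# A vector is a 1-D tensor: one axis holding three elements
vector = torch.tensor([1, 2, 3])

print(vector.shape)  # torch.Size([3])
print(vector.ndim)   # 1
```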


### Matrix
- A tensor with **two dimensions**
- Represents rows and columns

Example:  
x = [[1, 2],  
     [3, 4]]


In [2]:
matrix = torch.tensor([[1, 2], [3, 4]])
print(matrix.shape)  # 2 rows and 2 columns



torch.Size([2, 2])


### Higher-Order Tensors
- Tensors with **three or more dimensions**
- Used to represent images, videos, and batches of data

Example:
- 3D tensor → RGB image
- 4D tensor → batch of images


In [4]:
tensor_3d = torch.randn(2, 3, 4)
tensor_3d.shape


torch.Size([2, 3, 4])

### 1. Creating a Tensor from Python Data

Tensors can be created directly from Python lists or nested lists.


In [5]:
import torch

# Scalar
scalar = torch.tensor(10)

# Vector
vector = torch.tensor([1, 2, 3, 4])

# Matrix
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])

scalar, vector, matrix


(tensor(10),
 tensor([1, 2, 3, 4]),
 tensor([[1, 2, 3],
         [4, 5, 6]]))

### 2. Creating Tensors with Specific Values

Deep learning frameworks provide functions to create tensors filled with predefined values.


In [7]:
# Tensor filled with zeros
zeros_tensor = torch.zeros(2, 3)

# Tensor filled with ones
ones_tensor = torch.ones(2, 3)

# Identity matrix
identity_tensor = torch.eye(3)

zeros_tensor, ones_tensor, identity_tensor


(tensor([[0., 0., 0.],
         [0., 0., 0.]]),
 tensor([[1., 1., 1.],
         [1., 1., 1.]]),
 tensor([[1., 0., 0.],
         [0., 1., 0.],
         [0., 0., 1.]]))

### 3. Creating Random Tensors

Random tensors are widely used to initialize weights in neural networks.


In [9]:
# Random values between 0 and 1
random_uniform = torch.rand(2, 3)

# Random values from normal distribution (mean=0, std=1)
random_normal = torch.randn(2, 3)

random_uniform, random_normal


(tensor([[0.9042, 0.0924, 0.8552],
         [0.7995, 0.1496, 0.0989]]),
 tensor([[-0.6224, -0.5790,  1.2360],
         [ 1.8613,  0.0829, -1.0737]]))
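
Random values differ on every run, so the numbers shown above will not match yours. If reproducible results are needed, one common approach is to fix the seed with `torch.manual_seed`. The sketch below assumes a seed of `0`, which is an arbitrary choice; `first` and `second` are illustrative names.

```python
import torch

torch.manual_seed(0)       # fix the random number generator state
first = torch.rand(2, 3)

torch.manual_seed(0)       # reset the generator to the same state
second = torch.rand(2, 3)

# Both draws started from the same state, so they are identical
print(torch.equal(first, second))  # True
```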

### 4. Creating Tensors with a Specific Data Type

Tensors can be created with specific data types such as integers or floating-point values.


In [10]:
float_tensor = torch.tensor([1, 2, 3], dtype=torch.float32)
int_tensor = torch.tensor([1, 2, 3], dtype=torch.int32)

float_tensor.dtype, int_tensor.dtype


(torch.float32, torch.int32)
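
An existing tensor can also be converted to another data type. The sketch below uses `Tensor.to`, and also shows the types PyTorch picks by default when no `dtype` is given (assuming the default dtype has not been changed). The name `as_float` is illustrative.

```python
import torch

int_tensor = torch.tensor([1, 2, 3], dtype=torch.int32)

# .to() returns a new tensor with the requested dtype; the original is unchanged
as_float = int_tensor.to(torch.float32)
print(as_float.dtype)    # torch.float32
print(int_tensor.dtype)  # torch.int32

# Without an explicit dtype, Python ints become int64 and Python floats float32
print(torch.tensor([1, 2]).dtype)      # torch.int64
print(torch.tensor([1.0, 2.0]).dtype)  # torch.float32
```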

### 5. Creating Tensors with a Specific Shape

Sometimes we need tensors of a particular shape for neural network layers.
(Shape: the length, i.e. the number of elements, of each axis of a tensor.)


In [8]:
import torch

rank_4_tensor = torch.zeros(3, 2, 4, 5)
print(rank_4_tensor)


tensor([[[[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]],

         [[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]],


        [[[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]],

         [[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]],


        [[[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]],

         [[0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.],
          [0., 0., 0., 0., 0.]]]])


## Understanding the output 
Each level of nested square brackets corresponds to one axis of the tensor.

The outermost level contains 3 large blocks, which represent the first axis. Inside each block, there are 2 sub-blocks, corresponding to the second axis. Each sub-block is a matrix with 4 rows, representing the third axis. Each row contains 5 values, which form the fourth axis.

All values in the tensor are `0.` (floating-point zero) because the tensor was created with `torch.zeros()`, which initializes every element to zero.

You can think of this tensor as:

- 3 groups of data
- each group containing 2 matrices
- each matrix having 4 rows
- each row containing 5 values

## Understanding a Rank-4 Tensor Using Axes

![4-Axis Tensor](https://www.tensorflow.org/static/guide/images/tensor/4-axis_block.png)

The image above visually represents a **rank-4 tensor**, which means the tensor has **four axes (dimensions)**. Each axis represents one level of data organization. Instead of thinking of a tensor as just numbers, it is helpful to think of it as data arranged across multiple directions.



In [11]:
# Create an empty tensor: memory is allocated but not initialized,
# so the values are arbitrary (the zeros in the output are incidental)
empty_tensor = torch.empty(3, 2)

# Create a tensor with the same shape as another tensor
reference = torch.ones(2, 2)
same_shape = torch.zeros_like(reference)

empty_tensor, same_shape


(tensor([[0., 0.],
         [0., 0.],
         [0., 0.]]),
 tensor([[0., 0.],
         [0., 0.]]))

### 6. Creating Tensors Using Ranges

Range-based tensors are useful for indexing and testing.


In [13]:
# Sequence of numbers
range_tensor = torch.arange(0, 10)

# Evenly spaced values
linspace_tensor = torch.linspace(0, 1, steps=5)

range_tensor, linspace_tensor


(tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000]))

### 7. Creating Tensors That Track Gradients

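To train a neural network, the tensors that hold its learnable parameters must track gradients. Setting `requires_grad=True` tells PyTorch to record the operations applied to a tensor so that gradients can be computed later.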


In [14]:
weights = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
weights


tensor([1., 2., 3.], requires_grad=True)
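
To see what gradient tracking gives you, here is a small sketch that computes a gradient with `backward()`. The sum-of-squares "loss" is an assumption made for illustration, not a loss used elsewhere in this notebook.

```python
import torch

weights = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar value built from the weights: sum of squares
loss = (weights ** 2).sum()

# Backpropagate: fills weights.grad with d(loss)/d(weights) = 2 * weights
loss.backward()

print(weights.grad)  # tensor([2., 4., 6.])
```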

## Key Points to Remember

- Tensors can be created from lists, predefined values, or random distributions
- Shape, data type, and device are important properties
- Random tensors are commonly used for weight initialization
- Gradient tracking is essential for training neural networks
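
The properties listed above can be inspected directly on any tensor. This is a minimal sketch: the name `t` is illustrative, and the tensor is moved to a GPU only if one happens to be available.

```python
import torch

t = torch.zeros(2, 3)

print(t.shape)   # torch.Size([2, 3])
print(t.ndim)    # 2
print(t.dtype)   # torch.float32
print(t.device)  # cpu, unless the default device was changed

# Pick a GPU when available, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
t = t.to(device)
```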


## Indexing and Slicing of Tensors

Indexing and slicing are used to access or extract specific elements or parts of a tensor.  
This is very important in deep learning for selecting features, batches, channels, and regions of data.

### What is Indexing?

Indexing means selecting **individual elements** of a tensor using their position (index).  
Indexing in tensors starts from **0**.






In [15]:
import torch

# Create a 1D tensor (vector)
x = torch.tensor([10, 20, 30, 40, 50])
x


tensor([10, 20, 30, 40, 50])

### 1. Indexing a 1D Tensor (Vector)


In [18]:
x[0]   # first element


tensor(10)

In [19]:
x[2]   # third element


tensor(30)

In [20]:
x[-1]  # last element


tensor(50)

- Positive index → counts from the beginning  
- Negative index → counts from the end


### 2. Slicing a 1D Tensor

Slicing extracts a **range of elements** from a tensor.
Syntax: `start : end` (end index is excluded)


In [21]:
x[1:4]   # elements from index 1 to 3


tensor([20, 30, 40])

In [22]:
x[:3]    # first three elements


tensor([10, 20, 30])

In [23]:
x[2:]    # elements from index 2 to end


tensor([30, 40, 50])

### Step Size in Slicing

Syntax: `start : end : step`


In [24]:
x[::2]   # every second element


tensor([10, 30, 50])

In [27]:
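import torch

x = torch.tensor([10, 20, 30, 40, 50])
# PyTorch slicing does not accept a negative step (x[::-1] raises an error),
# so torch.flip is used to reverse the tensor
torch.flip(x, dims=[0])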


tensor([50, 40, 30, 20, 10])

### 3. Indexing a 2D Tensor (Matrix)


In [28]:
# Create a 2D tensor
m = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
m


tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

### Accessing Rows


In [29]:
m[0]     # first row


tensor([1, 2, 3])

In [30]:
m[1]     # second row


tensor([4, 5, 6])

### Accessing Columns


In [31]:
m[:, 0]  # first column


tensor([1, 4, 7])

In [32]:
m[:, 2]  # third column


tensor([3, 6, 9])

### Slicing a 2D Tensor


In [33]:
m[0:2, 1:3]  # rows 0–1 and columns 1–2


tensor([[2, 3],
        [5, 6]])

In [34]:
m[:, 1:]     # all rows, columns from index 1 onward


tensor([[2, 3],
        [5, 6],
        [8, 9]])

### 4. Indexing Higher-Dimensional Tensors

Higher-order tensors are common in deep learning.
Example: 3D tensor (batch, rows, columns)


In [37]:
t = torch.randn(2, 3, 4)
t.shape


torch.Size([2, 3, 4])
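
The cell above only creates the tensor, so here is a sketch of how indexing works on it. It uses `torch.arange` instead of `torch.randn` (an assumption made so the results are easy to verify by hand); the shape is the same.

```python
import torch

# Values 0..23 arranged as (batch=2, rows=3, columns=4)
t = torch.arange(24).reshape(2, 3, 4)

print(t[0].shape)        # torch.Size([3, 4])  first item in the batch
print(t[0, 1])           # tensor([4, 5, 6, 7])  row 1 of batch item 0
print(t[1, 2, 3])        # tensor(23)  a single element
print(t[:, 0, :].shape)  # torch.Size([2, 4])  row 0 of every batch item
```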

### 5. Using Ellipsis (...)

The ellipsis (`...`) stands in for as many full slices (`:`) as are needed to cover the dimensions you do not write out explicitly.


In [38]:
t[..., 1]   # index 1 along the last dimension, keeping all other dimensions


tensor([[ 0.3199,  0.2449,  0.2132],
        [-2.2484,  0.3037, -0.0742]])

### 6. Boolean Indexing

Boolean indexing selects elements based on conditions.


In [40]:
x = torch.tensor([5, 10, 15, 20])
x[x > 10]


tensor([15, 20])

## Reshaping Tensors

Reshaping means changing the **shape (dimensions)** of a tensor **without changing its data**.  
In deep learning, reshaping is very common when preparing data for neural network layers.


### Why Reshaping is Needed in Deep Learning

- Neural network layers expect inputs in specific shapes  
- Images, text, and batches must be reshaped correctly  
- Helps convert data between layers (e.g., CNN → Fully Connected layer)


### Understanding Tensor Shape

The **shape** of a tensor tells:
- How many dimensions it has  
- How many elements are in each dimension


In [41]:
import torch

x = torch.tensor([1, 2, 3, 4, 5, 6])
x.shape


torch.Size([6])

### 1. Reshaping Using `reshape()`

The `reshape()` function changes the shape of a tensor.


In [42]:
x = torch.tensor([1, 2, 3, 4, 5, 6])
x_reshaped = x.reshape(2, 3)
x_reshaped


tensor([[1, 2, 3],
        [4, 5, 6]])

Rule:  
Total number of elements **must remain the same**.


### 2. Using `-1` to Automatically Infer Size

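Passing `-1` for one dimension tells PyTorch to infer its size from the total number of elements and the other dimensions. Only one dimension may be `-1`.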


In [43]:
x.reshape(3, -1)


tensor([[1, 2],
        [3, 4],
        [5, 6]])

Here, `x` has 6 elements and the first dimension is 3, so PyTorch infers `-1` as 2.


### 3. Flattening a Tensor

Flattening means converting a multi-dimensional tensor into a 1D tensor.


In [44]:
m = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

m.flatten()


tensor([1, 2, 3, 4, 5, 6])

### 4. `view()` vs `reshape()`

Both change the shape of a tensor, but they differ internally.


In [46]:
x = torch.tensor([1, 2, 3, 4])
x.view(2, 2)


tensor([[1, 2],
        [3, 4]])

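- `view()` never copies data; it works only when the requested shape is compatible with the tensor's memory layout (for example, when the tensor is contiguous) and raises an error otherwise  
- `reshape()` returns a view when possible and falls back to copying the data otherwise, so it is the safer default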
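
The difference shows up on a non-contiguous tensor. The sketch below uses a transposed matrix as an example of one; the names `base`, `transposed`, and `flat` are illustrative.

```python
import torch

base = torch.arange(6).reshape(2, 3)
transposed = base.t()               # shape (3, 2); shares memory with base

print(transposed.is_contiguous())   # False

try:
    transposed.view(6)              # layout is incompatible with a flat view
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = transposed.reshape(6)        # succeeds by copying the data
print(flat)                         # tensor([0, 3, 1, 4, 2, 5])
```

If a view is required, calling `.contiguous()` first produces a contiguous copy that `view()` accepts.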


### 5. Adding or Removing Dimensions

Sometimes we need to add or remove dimensions.


In [47]:
x = torch.tensor([1, 2, 3])

# Add a new dimension
x.unsqueeze(0)


tensor([[1, 2, 3]])

In [48]:
# Remove dimensions of size 1
x.unsqueeze(0).squeeze()


tensor([1, 2, 3])

## Tensor Arithmetic Operations

Tensor arithmetic operations are mathematical operations performed on tensors.  
The basic operators (`+`, `-`, `*`, `/`) are applied **element-wise** and are heavily used in deep learning for computing outputs, losses, and gradients. Matrix multiplication, covered below, is a separate operation.


In [49]:
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])


In [50]:
# Addition
a + b


tensor([5, 7, 9])

In [51]:
# Subtraction
a - b


tensor([-3, -3, -3])

In [52]:
# Multiplication (element-wise)
a * b


tensor([ 4, 10, 18])

In [53]:
# Division (true division: the result is floating-point even for integer inputs)
a / b


tensor([0.2500, 0.4000, 0.5000])

### Matrix Multiplication


In [54]:
m1 = torch.tensor([[1, 2],
                   [3, 4]])
m2 = torch.tensor([[5, 6],
                   [7, 8]])

torch.matmul(m1, m2)


tensor([[19, 22],
        [43, 50]])
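
Matrix multiplication is easy to confuse with element-wise multiplication. The sketch below contrasts the two and also shows the `@` operator and `torch.dot` for 1-D tensors; it reuses the values from the cells above.

```python
import torch

m1 = torch.tensor([[1, 2], [3, 4]])
m2 = torch.tensor([[5, 6], [7, 8]])

# The @ operator is shorthand for torch.matmul
print(torch.equal(m1 @ m2, torch.matmul(m1, m2)))  # True

# Element-wise product gives a different result: [[5, 12], [21, 32]]
print((m1 * m2).tolist())

# For matmul, the inner dimensions must agree: (2, 3) @ (3, 4) -> (2, 4)
print((torch.ones(2, 3) @ torch.ones(3, 4)).shape)  # torch.Size([2, 4])

# Dot product of two 1-D tensors: 1*4 + 2*5 + 3*6
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print(torch.dot(a, b))  # tensor(32)
```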

## Broadcasting 

Broadcasting allows arithmetic operations between tensors of **different shapes**.


### Rules of Broadcasting
1. Compare tensor shapes from right to left  
2. Dimensions are compatible if they are equal or one of them is 1  
3. A dimension of size 1 (or a missing leading dimension) is automatically stretched to match the other tensor


In [55]:
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
y = torch.tensor([1, 2, 3])

x + y


tensor([[2, 4, 6],
        [5, 7, 9]])

Here, `y` is broadcast across rows of `x`.


Broadcasting does not materialize the expanded copy of the smaller tensor in memory, which saves memory and improves performance.
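
The rules can be checked with a column-shaped tensor and with a pair of shapes that do not broadcast. This is a sketch; `col` and `bad` are illustrative names.

```python
import torch

x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])        # shape (2, 3)
col = torch.tensor([[10], [20]])     # shape (2, 1)

# The size-1 column dimension is stretched across the 3 columns of x
print((x + col).tolist())            # [[11, 12, 13], [24, 25, 26]]

# The resulting shape can be computed without doing the arithmetic
print(torch.broadcast_shapes((2, 3), (2, 1)))  # torch.Size([2, 3])

bad = torch.tensor([1, 2])           # shape (2,): trailing sizes 3 and 2 differ
try:
    x + bad
except RuntimeError:
    print("shapes (2, 3) and (2,) cannot be broadcast")
```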



## Key Concepts of Automatic Differentiation in TensorFlow
### 1. Computational Graph

Computational graphs are used to represent mathematical expressions in a structured way. In deep learning, they act like a descriptive language that explains how a model performs its computations. Formally, a computational graph is a directed graph that shows how values are computed and how data flows through different operations.

A computational graph supports two kinds of calculations: forward computation and backward computation. During forward computation, input values move through the graph and each operation produces an output, eventually giving the final result. During backward computation, gradients are calculated by moving backward through the graph, which is essential for training neural networks.

In a computational graph, variables are represented as nodes. These variables can be scalars, vectors, matrices, tensors, or other data structures. Edges represent data dependencies or function arguments, showing how one variable influences another. Operations such as addition, subtraction, or multiplication are also represented in the graph, and more complex functions are built by combining these simple operations.

For example, consider the expression

\[
Y = (a + b) * (b - c)
\]

To make the computation clearer, we introduce intermediate variables:

\[
d = a + b
\]

\[
e = b - c
\]

\[
Y = d * e
\]

Each of these steps becomes a node in the computational graph, and the arrows show how outputs from one operation are used as inputs for the next.

In deep learning, computational graphs explain why training is divided into two phases. The forward pass computes the output and loss of the neural network, while the backward pass computes gradients using the chain rule. These gradients tell us how changes in inputs or parameters affect the final output.

When computing derivatives, the key idea is understanding how a small change in one variable affects another variable that depends on it. If a variable directly or indirectly influences the output, its contribution is captured using partial derivatives. By applying the chain rule across the graph, backpropagation efficiently computes derivatives with respect to all input variables.
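
The example expression `Y = (a + b) * (b - c)` can be traced with PyTorch's autograd as a rough illustration of the forward and backward passes. The input values 2, 3, and 1 are assumptions chosen to keep the arithmetic easy to follow.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)
c = torch.tensor(1.0, requires_grad=True)

# Forward pass: each line adds a node to the graph
d = a + b        # 5
e = b - c        # 2
Y = d * e        # 10

# Backward pass: the chain rule is applied from Y back to the inputs
Y.backward()

print(a.grad)  # dY/da = e      = tensor(2.)
print(b.grad)  # dY/db = e + d  = tensor(7.)
print(c.grad)  # dY/dc = -d     = tensor(-5.)
```

Note that `b` feeds into both `d` and `e`, so its gradient is the sum of the contributions along both paths.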

There are two main types of computational graphs. 

**Static computational graphs** are defined completely before execution, which allows strong optimization and faster performance but makes them less flexible for variable-sized or dynamic data.

**Dynamic computational graphs** are built during execution, making them easier to debug and more flexible, though they offer fewer opportunities for global optimization.

![Computational Graph](https://media.geeksforgeeks.org/wp-content/uploads/20200527151747/e19.png)

![Computational Graph1](https://media.geeksforgeeks.org/wp-content/uploads/20200527173723/e34.png)




### 2. Gradient Boosting

Gradient Boosting is a powerful machine learning technique that builds a strong model by combining many weak models. The main idea is simple: instead of trying to build one perfect model, we build models step by step, where each new model focuses on correcting the mistakes made by the previous ones.

In Gradient Boosting, models are added sequentially, not independently. Each new model is trained to reduce the loss of the overall system. This is why it is called boosting: every new model boosts the performance of the existing ensemble.

The process begins with a simple model, often one that makes very rough predictions. We then calculate how wrong these predictions are using a loss function. The next model is trained to predict these errors. This process is repeated many times, gradually improving the model’s predictions.

The term *gradient* comes from optimization. At each step, Gradient Boosting uses the gradient of the loss function to decide how the next model should improve the predictions. In other words, the model moves in the direction that most reduces the error.

Decision trees are commonly used as the weak learners in Gradient Boosting. These trees are usually shallow, meaning they are simple and make limited decisions. Even though each tree is weak on its own, combining many of them results in a very strong predictive model.

Gradient Boosting works well for both regression and classification problems. It is widely used because it can capture complex patterns in data and often delivers high accuracy. However, it can be sensitive to noise and overfitting if not properly tuned.

Important parameters in Gradient Boosting include the number of trees, learning rate, and tree depth. The learning rate controls how much each new model contributes, while the number of trees determines how long the boosting process continues.

Because Gradient Boosting builds models sequentially, training can be slower compared to methods like bagging. However, the performance gains often justify the additional computation.


### Steps Involved in Gradient Boosting

Gradient Boosting builds a strong predictive model by adding weak models one after another. Each new model focuses on correcting the errors made by the previous models. The process happens step by step as follows.

#### Step 1: Initialize the Model
The process starts with a simple model that makes basic predictions.  
For regression problems, this is usually the **mean of the target values**.  
This initial model does not use any features and serves as a starting point.

#### Step 2: Compute the Loss
The predictions made by the initial model are compared with the actual target values using a **loss function**.  
The loss function measures how wrong the predictions are.

#### Step 3: Calculate the Residuals
Residuals represent the **errors** made by the model.  
They are calculated as the difference between the actual values and the predicted values.  
These residuals tell us what the model failed to learn.

#### Step 4: Train a Weak Learner on Residuals
A new weak model (usually a shallow decision tree) is trained to predict the residuals instead of the original target values.  
This model learns patterns in the errors made by the previous model.

#### Step 5: Update the Model
The predictions from the new weak learner are added to the existing model.  
A **learning rate** is used to control how much influence the new model has.  
This helps prevent overfitting.

#### Step 6: Repeat the Process
Steps 2 to 5 are repeated multiple times.  
Each new model focuses on reducing the remaining errors left by the previous models.

#### Step 7: Final Prediction
The final model is a combination of all the weak learners.  
Predictions are made by summing the contributions from each model in the sequence.
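
The steps above can be sketched in plain Python for a one-feature regression problem. This is a simplified illustration, not a production implementation: it assumes a squared-error loss (so the residuals are the negative gradient of the loss), uses a one-split "stump" as the weak learner, and all function names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump (a very shallow tree) to the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    base = sum(ys) / len(ys)                  # Step 1: start from the mean
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):                 # Step 6: repeat
        residuals = [y - p for y, p in zip(ys, preds)]   # Steps 2-3
        stump = fit_stump(xs, residuals)                 # Step 4
        stumps.append(stump)
        preds = [p + learning_rate * stump(x)            # Step 5
                 for x, p in zip(xs, preds)]

    def predict(x):                           # Step 7: sum all contributions
        return base + learning_rate * sum(s(x) for s in stumps)
    return predict


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1]

model = gradient_boost(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [model(x) for x in xs])
print(boosted_mse < baseline_mse)  # True
```

In practice a library implementation with real decision trees would be used; the point here is only to show how each round fits the remaining error and is added with a learning rate.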


### 3. Gradient Tape

Gradient Tape is a mechanism provided by TensorFlow to support automatic differentiation. It is implemented as a context manager using `tf.GradientTape()`. When computations are performed inside this context, TensorFlow records all operations involving tensors. This recording allows TensorFlow to later compute gradients with respect to selected variables.

The main idea behind Gradient Tape is simple: if TensorFlow knows how a value was computed, it can determine how that value changes when its inputs change. Gradient Tape keeps track of these dependencies by building a computational graph dynamically during execution.

Only tensors that are being watched by the tape are considered for gradient computation. By default, trainable variables are automatically watched, but other tensors can also be explicitly monitored. Once the forward computation is complete, gradients can be calculated by asking the tape to compute derivatives of an output with respect to the desired input variables.

Gradient Tape is especially useful because it allows developers to write normal Python code while still enabling backpropagation. This makes model development flexible and intuitive, particularly when experimenting with custom loss functions or training loops.
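
As a minimal sketch of the mechanism described above, the following records a small computation and asks the tape for its derivative. Unlike the rest of this notebook it uses TensorFlow, since `tf.GradientTape` is a TensorFlow API; the functions being differentiated are arbitrary examples.

```python
import tensorflow as tf

x = tf.Variable(3.0)          # trainable variables are watched automatically

with tf.GradientTape() as tape:
    y = x ** 2 + 2.0 * x      # operations are recorded while the tape is active

dy_dx = tape.gradient(y, x)   # dy/dx = 2x + 2, which is 8 at x = 3
print(dy_dx.numpy())          # 8.0

# A constant is not watched by default, so it must be watched explicitly
c = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(c)
    z = c ** 3

dz_dc = tape.gradient(z, c)   # dz/dc = 3c^2, which is 27 at c = 3
print(dz_dc.numpy())          # 27.0
```

The closest PyTorch counterpart is setting `requires_grad=True` on a tensor and calling `backward()` on the result.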

 

### 4. Gradient Descent

Gradient Descent is an optimization algorithm used to minimize a function, most commonly a loss function in machine learning. The goal of training a model is to find parameter values that produce the smallest possible loss, and gradient descent provides a systematic way to achieve this.

The key idea is based on the gradient, which represents the direction of the steepest increase of a function. To reduce the loss, parameters are updated in the opposite direction of the gradient. Each update moves the parameters slightly closer to the minimum of the loss function.

This process is iterative. Starting from an initial set of parameters, gradients are computed, parameters are updated, and the loss is recalculated. Over many iterations, these small updates gradually improve the model’s performance.

Gradient descent relies heavily on automatic differentiation. Without accurate gradients, the optimization process would not know how to adjust parameters. Together, automatic differentiation and gradient descent form the foundation of training neural networks.

Different variations of gradient descent exist, such as batch gradient descent, stochastic gradient descent, and mini-batch gradient descent, but they all follow the same core principle of learning by moving in the direction that reduces error.
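
The update rule can be sketched in PyTorch on a one-parameter problem. The function `(w - 3)^2`, the starting value, the learning rate, and the number of steps are all assumptions chosen for illustration; the minimum is at `w = 3`.

```python
import torch

w = torch.tensor(0.0, requires_grad=True)   # initial parameter value
learning_rate = 0.1

for step in range(100):
    loss = (w - 3.0) ** 2        # forward pass: compute the loss
    loss.backward()              # backward pass: d(loss)/dw is stored in w.grad
    with torch.no_grad():        # update the parameter without recording it
        w -= learning_rate * w.grad
    w.grad.zero_()               # gradients accumulate, so reset them each step

print(round(w.item(), 4))  # 3.0
```

In real training code the manual update and reset are usually handled by an optimizer such as `torch.optim.SGD`.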


## Task for the Reader  
Using a deep learning framework of your choice (TensorFlow or PyTorch), perform the following tasks to understand tensors and their basic operations.

a) Create a scalar tensor, a 1-D tensor, and a 2-D tensor. Print each tensor along with its shape and rank.

b) Perform element-wise addition and element-wise multiplication on two tensors of the same shape.

c) Multiply a tensor by a scalar value and observe how the values change.

d) Access a specific element from a tensor using indexing and extract a sub-part of a tensor using slicing.

e) Reshape a tensor into a different shape and explain how reshaping affects the tensor without changing the total number of elements.
