In [1]:
import torch

### Core concept:
- `tensor.view()` returns a new tensor that shares the same underlying data but has a different shape.
- It is memory-efficient: no new data is allocated.
- Shared memory: because the original tensor and its view share the same memory, any in-place change to the view also modifies the original tensor, and vice versa.
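
The shared-memory behaviour can be checked directly. The following is a minimal sketch; the names `base` and `v` are illustrative only.

```python
import torch

base = torch.arange(6)          # tensor([0, 1, 2, 3, 4, 5])
v = base.view(2, 3)             # same storage, new shape

v[0, 0] = 100                   # in-place write through the view
base[5] = -1                    # in-place write through the original

print(base)                     # both writes are visible here
print(v)                        # and here
print(base.data_ptr() == v.data_ptr())  # both start at the same memory address
```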

In [4]:
x  = torch.arange(12)
print(x.is_contiguous())

True


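### `.view()` works reliably only on <b>CONTIGUOUS TENSORS</b>
- If the tensor is not contiguous, `.view()` generally raises a `RuntimeError`, because the requested shape cannot be expressed with the existing strides.
- Strictly speaking, contiguity is sufficient but not necessary: `.view()` succeeds whenever the new shape is compatible with the tensor's current size and strides.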
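
Contiguity is the safe rule of thumb. As a refinement, the sketch below (with illustrative tensor names) suggests that what `.view()` really needs is a shape that fits the existing strides: splitting a dimension of a non-contiguous tensor works, while merging dimensions whose strides do not line up does not.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)
y = x.transpose(0, 1)            # shape (3, 2, 4), strides (4, 12, 1)
print(y.is_contiguous())         # False

# Splitting the last dimension needs no data movement, so view() succeeds.
z = y.view(3, 2, 2, 2)
print(z.shape)

# Merging the first two dimensions would require moving data, so view() refuses.
try:
    y.view(6, 4)
    merge_failed = False
except RuntimeError as err:
    merge_failed = True
    print("view failed:", err)
```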

In [None]:
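x = torch.randn(2, 3)
y = x.t()                 # transposed view: shape (3, 2), strides (1, 3)
print(y.is_contiguous())  # False
try:
    y.view(6)             # RuntimeError: shape is incompatible with y's strides
except RuntimeError as err:
    print(err)

y = y.contiguous().view(6)  # ✅ Works now: contiguous() copies into row-major order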

### CONTIGUITY in TENSORS
- A tensor is contiguous when the logical (row-major) order of its elements is the same as their physical order in memory.
- E.g. `a = [[1,2,3],[4,5,6]]`
- In memory, it looks like: `[1,2,3,4,5,6]`
- stride = `(s_x, s_y)` = `(3, 1)`: (number of places moved in memory to reach the same column in the next row, number of places moved in memory to reach the next column in the same row)
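
To make the role of the strides concrete, here is a small sketch that computes the memory offset of element `(i, j)` by hand as `i * s_x + j * s_y`. The variable names are illustrative, and the flattened copy stands in for the physical memory of this contiguous tensor.

```python
import torch

a = torch.tensor([[1, 2, 3], [4, 5, 6]])
flat = a.flatten()               # matches the physical order for a contiguous tensor
s_row, s_col = a.stride()        # (3, 1)

i, j = 1, 2                      # row 1, column 2
offset = i * s_row + j * s_col   # 1*3 + 2*1 = 5
print(a.stride())
print(flat[offset].item() == a[i, j].item())
```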


#### For a tensor with shape (s_0, s_1, ..., s_n-1), the strides (t_0, t_1, ..., t_n-1) of a contiguous layout must satisfy the following:
- Innermost dimension stride: the stride of the last dimension (`t_{n-1}`) must be 1. This means adjacent elements in the final dimension are also adjacent in memory.
- General stride condition: for any other dimension i, the stride t_i must be equal to the product of the sizes of all dimensions to its right, i.e. `t_i = s_{i+1} * s_{i+2} * ... * s_{n-1}`, or equivalently `t_i = t_{i+1} * s_{i+1}`.

E.g. consider a tensor x with shape (4, 3, 2), so `s_0 = 4`, `s_1 = 3`, `s_2 = 2`.
- Its expected contiguous strides would be (6, 2, 1).
- `t_2 = 1`; `t_1 = t_2 * s_2 = 2`; `t_0 = t_1 * s_1 = 6`
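
The recurrence can be sketched as a short function. `contiguous_strides` is a hypothetical helper written for illustration, not a PyTorch API; its result is compared against what PyTorch reports for a freshly allocated tensor.

```python
import torch

def contiguous_strides(shape):
    """Hypothetical helper: row-major (contiguous) strides for a given shape."""
    strides = []
    running = 1                      # stride of the innermost dimension
    for size in reversed(shape):
        strides.append(running)
        running *= size              # next stride = current stride * current size
    return tuple(reversed(strides))

shape = (4, 3, 2)
print(contiguous_strides(shape))     # (6, 2, 1)
print(torch.empty(shape).stride())   # (6, 2, 1)
```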

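### Operations that can destroy CONTIGUITY:
- `.permute`
- `.transpose` (including `.t()` and `.T`)

These operations only reorder the strides; the data in memory is left untouched.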

In [None]:
# Row/column matrices (one dimension of size 1) stay contiguous even when transposed

y = torch.rand((1,6))
print(y.is_contiguous())
print(y.T.is_contiguous())
print(y.reshape(2,3).is_contiguous())

print("- - - - - -- - - - - -- - - -- - - -")
x = torch.tensor(([1,2,3],[4,5,6]))
print(x.is_contiguous())
print(x.T.is_contiguous())
print(x.reshape(3,2).is_contiguous())

True
True
True
- - - - - -- - - - - -- - - -- - - -
True
False
True


### The `-1` magic dimension
- In `.view()` you can pass `-1` for one dimension to let PyTorch infer its size automatically from the total number of elements.

- PyTorch computes the missing dimension regardless of which position `-1` is in.

- Only one `-1` is allowed.

- The total number of elements must match, i.e. it must be divisible by the product of the dimensions you did specify.
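
A short sketch of these rules, including the two failure cases. The list `failed` is only there to record which shapes were rejected.

```python
import torch

a = torch.arange(12)
print(a.view(3, -1).shape)     # torch.Size([3, 4])
print(a.view(-1, 6).shape)     # torch.Size([2, 6])
print(a.view(2, -1, 3).shape)  # torch.Size([2, 2, 3])

failed = []
for bad_shape in [(-1, -1), (5, -1)]:   # two -1s; 12 is not divisible by 5
    try:
        a.view(*bad_shape)
    except RuntimeError:
        failed.append(bad_shape)
print(failed)
```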

In [18]:
a = torch.arange(12)
b = a.view(3,-1)

### When to use `.view()` vs `.reshape()`
- When you know that the tensor is contiguous, use `.view()`.
- If not sure, use `.reshape()`.

`.reshape()` can be less memory-efficient than `.view()` because it may create a new tensor in memory, although it is not guaranteed to do so:
- `.reshape()` creates a copy and allocates new memory when the requested shape is incompatible with the tensor's strides, which typically happens when the original tensor is not contiguous. A tensor becomes non-contiguous after certain operations, like `.permute()` or `.transpose()`, which reorder the dimensions without changing the underlying data layout.

- If the input tensor is already contiguous, `reshape()` simply returns a view of the original tensor, just like `view()` does. In this case, no new memory is allocated, and the operation is just as memory-efficient as `view()`.

- `.view()` is more restrictive: it never copies. If the requested shape cannot be expressed as a view, which is the usual case for a non-contiguous tensor, calling `.view()` results in a `RuntimeError`. This forces the user to handle contiguity explicitly, typically by calling `.contiguous()` first, which performs the memory copy manually.
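
Whether `.reshape()` returned a view or a copy can be checked by comparing data pointers. This is a minimal sketch; `shares_storage` is a hypothetical helper, not part of PyTorch.

```python
import torch

def shares_storage(t1, t2):
    """Hypothetical helper: True when both tensors start at the same memory address."""
    return t1.data_ptr() == t2.data_ptr()

a = torch.rand(2, 3)            # contiguous
b = a.T                         # non-contiguous view of a, shape (3, 2)

r1 = a.reshape(3, 2)            # compatible with a's strides: a view
r2 = b.reshape(2, 3)            # incompatible with b's strides: a copy

print(shares_storage(a, r1))    # True
print(shares_storage(a, r2))    # False
```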

In [None]:
a = torch.rand((2,3))
print(a.is_contiguous())
b = a.T
print(b.is_contiguous())


a.view(3,-1) # Will work because the tensor is contiguous
try:
    b.view(2,-1) # RuntimeError because b is not contiguous
except RuntimeError:
    pass         # expected; continue to the reshape example

print(b.reshape(2,-1)) # Will work even when the tensor is not contiguous

True
False
tensor([[0.0801, 0.7271, 0.3827],
        [0.2099, 0.8101, 0.2096]])


### Relationship between `.view()` and `.flatten()`

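For a contiguous `x`, `x.view(-1)` and `x.flatten()` give the same result.

The differences:
- `.flatten()` can flatten across specific dimensions (`start_dim`, `end_dim`).
- `.view(-1)` flattens the entire tensor.
- `.flatten()` also works on non-contiguous tensors (copying the data if it has to), whereas `.view(-1)` raises a `RuntimeError` there.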
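
The sketch below illustrates the comparison on a contiguous tensor, on a non-contiguous one, and with an explicit `end_dim` (which is inclusive). Tensor names are illustrative.

```python
import torch

x = torch.arange(6).reshape(2, 3)
same_when_contiguous = torch.equal(x.view(-1), x.flatten())
print(same_when_contiguous)

y = x.T                                   # non-contiguous, shape (3, 2)
print(y.flatten())                        # follows y's logical order: 0, 3, 1, 4, 2, 5
try:
    y.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True

z = torch.zeros(3, 4, 5, 6)
print(z.flatten(start_dim=1, end_dim=2).shape)  # dims 1 and 2 merged: (3, 20, 6)
```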


In [33]:
a = torch.rand((3,4,5,6,7)) # shape = (3,4,5,6,7)
b = a.flatten(start_dim=2) # shape = (3,4,210)
print(b.shape) # Flattens starting from the start_dim

c = a.view(-1) # flattens along all dimensions
print(c.shape) # [2520,]

torch.Size([3, 4, 210])
torch.Size([2520])


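### When `.contiguous().view()` copies silently (common trap)
- `.view()` itself never fails silently: it either returns a view or raises an error. The trap is the fix. If the tensor was non-contiguous and you called `.contiguous()` to make `.view()` work, the data was copied, so the result no longer shares memory with the original tensor even though the code still "looks like" a view.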

In [35]:
x = torch.randn(2, 3, 4)
y = x.permute(0, 2, 1)          # non-contiguous
z = y.contiguous().view(2, -1)  # makes a copy

# z no longer shares memory with x or y
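
One way to confirm the copy is to compare data pointers and to write through `z`. This is a minimal sketch using the same tensor names as the cell above.

```python
import torch

x = torch.randn(2, 3, 4)
y = x.permute(0, 2, 1)               # view of x, non-contiguous
z = y.contiguous().view(2, -1)       # contiguous() copies, then view() succeeds

z[0, 0] = 123.0                      # write into the copy only
print(y.data_ptr() == x.data_ptr())  # True: y is still a view of x
print(z.data_ptr() == x.data_ptr())  # False: z lives in new memory
print(x[0, 0, 0].item() == 123.0)    # False: x did not see the write
```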

### `.view_as()`
- `x.view_as(y)` returns a view of `x` with the same shape as `y`; it is equivalent to `x.view(y.size())`.
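
A quick sketch of that equivalence. Only the shape of `y` is used; its dtype and values play no role.

```python
import torch

x = torch.arange(24)                  # int64, shape (24,)
y = torch.zeros(2, 3, 4)              # float32, shape (2, 3, 4)

v1 = x.view_as(y)
v2 = x.view(y.size())
print(v1.shape)                       # torch.Size([2, 3, 4])
print(torch.equal(v1, v2))            # True
print(v1.dtype, y.dtype)              # x keeps its own dtype
```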


In [3]:
a = torch.rand((2,3,4))
a.view(3,2,4)  # returns a new view; a itself keeps its shape (2, 3, 4)
b = torch.zeros((2,4,3)).view_as(a)
print(b.shape)

torch.Size([2, 3, 4])


In practice, `.view()` is most often used for:

- Reshaping between CNN outputs and fully connected layers.
- Flattening (N, C, H, W) → (N, C*H*W) before Linear layers.
- Adding or aligning dimensions for broadcasting.

In [4]:
x = torch.randn(32, 512, 7, 7)
x = x.view(32, -1)  # shape: (32, 512*7*7)


### You can use `.view()` to add or align dimensions for broadcasting.
- `.view()` doesn't copy the data, which makes it memory-efficient and a common first choice for flattening CNN outputs.
- Because the view shares storage with the original, in-place computations on the view also update the original tensor.

In [5]:
x = torch.randn(32, 100)
bias = torch.randn(100)
x = x + bias.view(1, 100)  # (32, 100) + (1, 100) ✅ broadcast

In [6]:
print(bias.shape)
bias = bias.view(1, -1)
print(bias.shape)
bias = bias.expand(32, -1)
print(bias.shape)

torch.Size([100])
torch.Size([1, 100])
torch.Size([32, 100])


### Another time when `.view()` fails is after taking a transpose of the tensor.
- The reason is that the strides of the tensor change while the data stays where it is.
- The stride is an n-element tuple in which each entry gives the number of elements to move in memory to reach the next element along that dimension.
- When the transpose is taken, the strides are reordered, which makes the tensor non-contiguous.

In [10]:
x = torch.rand((2,3,4))
print(x.stride())
y = x.permute(2, 1, 0)  # reverses all dims, like x.T (which is deprecated for tensors that are not 2-D)
print(y.is_contiguous())
print(y.stride())


(12, 4, 1)
False
(1, 4, 12)


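## “A tensor can only be contiguous if stride[-1] == 1” (when the last dimension has more than one element)
'''
In contiguous memory layout, elements along the last dimension are stored next to each other.
That means their stride (the step to move by one element in memory) must be 1.
Dimensions of size 1 are the exception: their stride is never used to step anywhere, so it is ignored
when checking contiguity. This is why the transposed (1, 6) tensor shown earlier is still contiguous.
'''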



In [None]:
'''
You have a tensor x = torch.arange(12)
You call x.view(3, 4) and x.view(2, 6).
Both are valid — why? What is the general mathematical rule that decides whether a .view() operation is valid?
'''

'''
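The tensor has 12 elements stored contiguously in memory, and its stride is (1,).
A .view() is valid when (1) the product of the new sizes equals the number of elements (3*4 = 2*6 = 12)
and (2) the new shape can be described by strides over the existing memory, without moving any data.
For a contiguous tensor condition (2) always holds: (3, 4) gets strides (4, 1) and (2, 6) gets strides (6, 1).
Therefore both are valid ops.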
'''


In [None]:
x = torch.randn(2, 3)
x = x.t()
x.view(6)

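## Here, what does it mean that x is not contiguous?

'''
After transposing, x has shape (3, 2) and strides (1, 3), whereas a contiguous tensor of that shape
would have strides (2, 1). The logical order no longer matches the memory order, so x is not contiguous.
'''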

In [23]:
## If you modify a tensor created via .view(), will the original tensor change?
## Why might this be dangerous in backpropagation or data preprocessing pipelines?

import torch

# Create the initial tensor
a = torch.randn((2, 4, 5))

# Create a view of the tensor
b = a.view(2, -1)

# Get the memory address of the Python objects (will be different)
mem_a_obj = id(a)
mem_b_obj = id(b)

# Get the memory address of the underlying data (will be the same)
mem_a_data = a.data_ptr()
mem_b_data = b.data_ptr()

print(f"ID of Python object `a`: {hex(mem_a_obj)}")
print(f"ID of Python object `b`: {hex(mem_b_obj)}")
print("---")
print(f"Data pointer of tensor `a`: {hex(mem_a_data)}")
print(f"Data pointer of tensor `b`: {hex(mem_b_data)}")

# Check if the data pointers are equal
print(f"\nDo a and b share the same data? {a.data_ptr() == b.data_ptr()}")

'''
Yes, the original tensor will change. .view() does not copy anything: it returns a new tensor object that shares storage with the original. (.reshape() behaves the same way whenever it can return a view, and copies only when it cannot.)
Corrupted gradients: If you make an in-place modification on a tensor that is a view, you are also changing the original tensor, which may have been saved for a gradient calculation. If the backward() pass used the modified data instead of the original data, the gradients would be incorrect or unpredictable.
Failed checks: To prevent this, PyTorch tracks in-place modifications and checks for them during backpropagation. If it detects that a tensor saved for the backward pass has been modified in-place, it raises a RuntimeError to alert you of the invalid state.
'''


ID of Python object `a`: 0x121013ef0
ID of Python object `b`: 0x12157b350
---
Data pointer of tensor `a`: 0x1368d9c80
Data pointer of tensor `b`: 0x1368d9c80

Do a and b share the same data? True
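
The autograd check can be triggered with a few lines. This is a minimal sketch: `exp()` is chosen because it saves its output for the backward pass, and the flag `raised` exists only to record the outcome.

```python
import torch

a = torch.randn(3, requires_grad=True)
b = a.exp()                 # exp() saves its output b for the backward pass
v = b.view(-1)              # view sharing b's storage
v.add_(1)                   # in-place write through the view also changes b

try:
    b.sum().backward()
    raised = False
except RuntimeError as err:
    raised = True
    print("autograd error:", str(err)[:80])
```

Replacing `v.add_(1)` with the out-of-place `v = v + 1` leaves `b` untouched and lets `backward()` run.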


### How `.contiguous()` fixes it
- The `.contiguous()` method fixes the contiguity problem by forcing a memory copy when the tensor is not already contiguous; an already contiguous tensor is returned as-is, without copying.
- For a non-contiguous tensor, it allocates a fresh block of memory and copies the data into it, arranged in sequential (row-major) order for the tensor's current shape.
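
A small sketch of both cases, comparing strides and data pointers before and after `.contiguous()`. Tensor names are illustrative.

```python
import torch

x = torch.arange(6).reshape(2, 3)         # contiguous, strides (3, 1)
y = x.T                                   # shape (3, 2), strides (1, 3)
c = y.contiguous()                        # new memory, strides (2, 1)

print(y.stride(), c.stride())
print(c.data_ptr() == y.data_ptr())       # False: the data was copied
print(torch.equal(c, y))                  # True: same logical values

# On an already contiguous tensor no copy is made.
print(x.contiguous().data_ptr() == x.data_ptr())  # True
print(c.view(6))                          # view() now succeeds on the copy
```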



#### The problem: Sparse matrix compression
- You are given a 2D tensor M representing a sparse matrix. The matrix contains many zero values. To save memory and processing time, you want to compress this matrix into a 1D tensor C that only stores the non-zero values.
- The non-zero values should be arranged sequentially in C, preserving their original row-major order.
- The challenge is to perform this compression using only zero-copy operations (view(), permute(), narrow(), etc.) and basic tensor indexing. If a memory copy is necessary at any point, the solution is considered incorrect.
- You must then decompress the 1D tensor C back into a 2D tensor of the original matrix's shape. This decompression must also be a zero-copy operation.
- Finally, write a function is_valid_compression(M, C_decompressed) that checks if the decompression successfully reproduced the original non-zero values and their positions, without using any loops.





Constraints:
1. The original matrix M is a 2D tensor of shape (R, K).
2. You cannot use torch.reshape() on non-contiguous tensors.
3. The use of torch.flatten() or .contiguous() is strictly forbidden for both compression and decompression.
4. The final tensors for decompression and validation must be views of the original tensor data.

In [26]:
M = torch.rand((10,20))